Implementing Hive proxy, part 2: where the user appears when Hive operates on Hadoop

2017-11-07


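Hive permissions have two layers: Hive's own checks and Hadoop's checks. When customizing Hive's proxy feature, the changes to the Hive-level checks were already covered in
http://caiguangguang.blog.51cto.com/1652935/1587251

This post covers the places where a user shows up when Hive interacts with Hadoop and the local file system:
1. The job's log file

The log file is initialized when the session is initialized, mainly in the start method of SessionState: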

 
    public static SessionState start(SessionState startSs) {
      setCurrentSessionState(startSs);
      if (startSs.hiveHist == null) {
        if (startSs.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SESSION_HISTORY_ENABLED)) {
          // If hive.session.history.enabled is true, the log file is initialized; the default is false
          startSs.hiveHist = new HiveHistoryImpl(startSs); // HiveHistoryImpl creates the log file
        } else {
          // Hive history is disabled, create a no-op proxy
          startSs.hiveHist = HiveHistoryProxyHandler.getNoOpHiveHistoryProxy();
        }
      }
      ...

Next, look at the constructor of org.apache.hadoop.hive.ql.history.HiveHistoryImpl. It defines the log path and creates the log directory if it does not exist:

 
    public HiveHistoryImpl(SessionState ss) {
      try {
        console = new LogHelper(LOG);
        String conf_file_loc = ss.getConf().getVar(
            HiveConf.ConfVars.HIVEHISTORYFILELOC);
        // HIVEHISTORYFILELOC("hive.querylog.location", System.getProperty("java.io.tmpdir") + File.separator + System.getProperty("user.name")),
        // so the default is the /tmp/${user.name}/ directory
        if ((conf_file_loc == null) || conf_file_loc.length() == 0) {
          console.printError("No history file location given");
          return;
        }

        // Create directory
        File histDir = new File(conf_file_loc);
        if (!histDir.exists()) { // create the log directory
          if (!histDir.mkdirs()) {
            console.printError("Unable to create log directory " + conf_file_loc);
            return;
          }
        }

        do {
          histFileName = conf_file_loc + File.separator + "hive_job_log_" + ss.getSessionId() + "_"
              + Math.abs(randGen.nextInt()) + ".txt";
          // Full path of the log file, i.e. /tmp/hdfs/hive_job_log_<sessionid>_<random number>.txt,
          // for example /tmp/hdfs/hive_job_log_4f96f470-a6c1-41ae-9d30-def308e5412f_564454280.txt
        } while (!new File(histFileName).createNewFile());

        console.printInfo("Hive history file=" + histFileName);
        histStream = new PrintWriter(histFileName);

        HashMap<String, String> hm = new HashMap<String, String>();
        hm.put(Keys.SESSION_ID.name(), ss.getSessionId());
        log(RecordTypes.SessionStart, hm);
      } catch (IOException e) {
        console.printError("FAILED: Failed to open Query Log : " + histFileName
            + " " + e.getMessage(), "\n"
            + org.apache.hadoop.util.StringUtils.stringifyException(e));
      }
    }
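
Because the default of hive.querylog.location is built from the JVM's user.name property, the history file lands under the OS account that runs Hive, not under any proxied user. The sketch below reproduces that path logic with only the JDK so the effect is easy to see. HistoryPathSketch, defaultHistoryDir, and createHistoryFile are illustrative names, not Hive APIs, and the demo writes into a fresh temporary directory rather than the real log location.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Random;
import java.util.UUID;

// Illustrative only: mirrors the path logic of HiveHistoryImpl without depending on Hive.
class HistoryPathSketch {
    private static final Random RAND_GEN = new Random();

    // Same shape as the default of hive.querylog.location: <java.io.tmpdir>/<user.name>
    static String defaultHistoryDir() {
        return System.getProperty("java.io.tmpdir") + File.separator
                + System.getProperty("user.name");
    }

    // Creates hive_job_log_<sessionId>_<random>.txt under dir, retrying on a name clash.
    static File createHistoryFile(File dir, String sessionId) throws IOException {
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IOException("Unable to create log directory " + dir);
        }
        File candidate;
        do {
            // nextInt(bound) never returns a negative value, so Math.abs is not needed here
            candidate = new File(dir, "hive_job_log_" + sessionId + "_"
                    + RAND_GEN.nextInt(Integer.MAX_VALUE) + ".txt");
        } while (!candidate.createNewFile());
        return candidate;
    }

    public static void main(String[] args) throws IOException {
        // The directory name follows the OS user of this JVM
        System.out.println("Default history dir: " + defaultHistoryDir());
        File dir = Files.createTempDirectory("hive-history-sketch").toFile();
        File log = createHistoryFile(dir, UUID.randomUUID().toString());
        System.out.println("Hive history file=" + log.getPath());
    }
}
```

If the log should follow the proxied user instead, one option is to set hive.querylog.location explicitly per session rather than relying on the default.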

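2. The job's intermediate files
Intermediate files that Hive writes during execution go to scratch directories, defined by hive.exec.scratchdir (on HDFS) and hive.exec.local.scratchdir (on the local file system).
The scratch paths are resolved in the constructor of the org.apache.hadoop.hive.ql.Context class.
The relevant scratch directory settings: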

 
    SCRATCHDIR("hive.exec.scratchdir", "/tmp/hive-" + System.getProperty("user.name")),
    // the default is /tmp/hive-<currently logged-in user>
    LOCALSCRATCHDIR("hive.exec.local.scratchdir", System.getProperty("java.io.tmpdir") + File.separator + System.getProperty("user.name")),
    SCRATCHDIRPERMISSION("hive.scratch.dir.permission", "700"),

In the constructor of the org.apache.hadoop.hive.ql.Context class:

 
    // scratch path to use for all non-local (ie. hdfs) file system tmp folders
    private final Path nonLocalScratchPath;

    // scratch directory to use for local file system tmp folders
    private final String localScratchDir;

    // the permission to scratch directory (local and hdfs)
    private final String scratchDirPermission;

    ...

    public Context(Configuration conf, String executionId) {
      this.conf = conf;
      this.executionId = executionId;

      // local & non-local tmp location is configurable. however it is the same across
      // all external file systems
      nonLocalScratchPath =
          new Path(HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIR),
              executionId);
      localScratchDir = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.LOCALSCRATCHDIR),
          executionId).toUri().getPath();
      scratchDirPermission = HiveConf.getVar(conf, HiveConf.ConfVars.SCRATCHDIRPERMISSION);
    }
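
To see which user ends up in these paths, the following JDK-only sketch resolves the two scratch locations in the same order as the Context constructor: read the configured base directory, fall back to the user.name-based default, then append the execution ID. A plain Map stands in for HiveConf, and ScratchDirSketch and its methods are hypothetical names. Overriding hive.exec.scratchdir for a proxied user (the /tmp/hive-alice value and the execution ID are made up) is an assumption about how the path could be redirected, not something the original code does.

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a Map replaces HiveConf, and plain strings replace Hadoop's Path.
class ScratchDirSketch {
    // Default of hive.exec.scratchdir: /tmp/hive-<user.name>
    static String defaultScratchDir() {
        return "/tmp/hive-" + System.getProperty("user.name");
    }

    // Default of hive.exec.local.scratchdir: <java.io.tmpdir>/<user.name>
    static String defaultLocalScratchDir() {
        return System.getProperty("java.io.tmpdir") + File.separator
                + System.getProperty("user.name");
    }

    // HDFS-side scratch path: <base>/<executionId>
    static String nonLocalScratchPath(Map<String, String> conf, String executionId) {
        String base = conf.getOrDefault("hive.exec.scratchdir", defaultScratchDir());
        return base + "/" + executionId;
    }

    // Local scratch path: <base>/<executionId>, joined with the platform separator
    static String localScratchDir(Map<String, String> conf, String executionId) {
        String base = conf.getOrDefault("hive.exec.local.scratchdir", defaultLocalScratchDir());
        return Paths.get(base, executionId).toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String executionId = "hive_2014-12-01_00-00-00_000_1"; // made-up execution ID

        // With no override, both paths carry the OS user running the JVM
        System.out.println(nonLocalScratchPath(conf, executionId));
        System.out.println(localScratchDir(conf, executionId));

        // With an explicit override, the HDFS path follows the configured value instead
        conf.put("hive.exec.scratchdir", "/tmp/hive-alice");
        System.out.println(nonLocalScratchPath(conf, executionId));
    }
}
```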

This object is initialized in the compile method of Driver.

3. The user that submits the job

 
    // JobClient's init method
    UserGroupInformation clientUgi;
    ....
    public void init(JobConf conf) throws IOException {
      setConf(conf);
      cluster = new Cluster(conf);
      clientUgi = UserGroupInformation.getCurrentUser();
    }

Adding a proxy here is fairly easy: use the createRemoteUser method of UserGroupInformation.
For example, change the init method to:

 
    public void init(JobConf conf) throws IOException {
      setConf(conf);
      cluster = new Cluster(conf);
      if (conf.getBoolean("use.custom.proxy", false)) {
        String proxyUser = conf.get("custom.proxy.user");
        clientUgi = UserGroupInformation.createRemoteUser(proxyUser);
      } else {
        clientUgi = UserGroupInformation.getCurrentUser();
      }
      LOG.warn("clientUgi is " + clientUgi);
    }
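
One gap in the modified init method is worth noting: if use.custom.proxy is true but custom.proxy.user is unset, conf.get returns null, and createRemoteUser throws IllegalArgumentException for a null or empty user name. The JDK-only sketch below isolates the selection logic and falls back to the current user in that case. java.util.Properties stands in for JobConf, and ProxyUserSketch and resolveSubmitUser are hypothetical names; the two property keys are the ones used in the modified init method, and the user names are made up.

```java
import java.util.Properties;

// Illustrative only: Properties replaces JobConf, and user names replace UserGroupInformation.
class ProxyUserSketch {
    // Returns the proxy user when proxying is enabled and a non-blank name is configured,
    // otherwise the current user.
    static String resolveSubmitUser(Properties conf, String currentUser) {
        boolean useProxy = Boolean.parseBoolean(conf.getProperty("use.custom.proxy", "false"));
        String proxyUser = conf.getProperty("custom.proxy.user");
        if (useProxy && proxyUser != null && !proxyUser.trim().isEmpty()) {
            return proxyUser.trim();
        }
        return currentUser;
    }

    public static void main(String[] args) {
        String currentUser = System.getProperty("user.name");

        Properties conf = new Properties();
        System.out.println("proxy disabled  -> " + resolveSubmitUser(conf, currentUser));

        conf.setProperty("use.custom.proxy", "true");
        System.out.println("proxy unset     -> " + resolveSubmitUser(conf, currentUser));

        conf.setProperty("custom.proxy.user", "alice"); // made-up user name
        System.out.println("proxy set       -> " + resolveSubmitUser(conf, currentUser));
    }
}
```

In the real init method the same guard would sit in front of the createRemoteUser call. Whether to fall back silently or fail fast when the proxy user is missing is a design choice; failing fast may be safer if jobs must never run as the service account.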

 
This article is reposted from 菜菜光's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1589874. To repost it, please contact the original author.
