hive0.13 rows loaded为空问题源码分析及fix-阿里云开发者社区

hive0.13 rows loaded为空问题源码分析及fix

2017-11-15 1109

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介：

升级hive0.13之后发现job运行完成后Rows loaded的信息没有了。

rows loaded的信息在hive0.11中由HiveHistory类的printRowCount输出。HiveHistory类的主要用途是记录job运行的信息，包括task的counter等。默认的目录在/tmp/$user中。

hive0.11在SessionState 的start方法中会初始化HiveHistory的对象

 
        if 
        (startSs. hiveHist == 
        null
        ) { 
       
        startSs. hiveHist = 
        new 
        HiveHistory(startSs); 
       
        }

而在hive0.13中HiveHistory是一个抽象类，其具体的实现在HiveHistoryImpl类中，其中初始化HiveHistoryImpl对象时增加了一层判断,判断hive.session.history.enabled的设置（默认为false），导致不会实例化HiveHistoryImpl类

 
        if
        (startSs.hiveHist == 
        null
        ){ 
       
        if 
        (startSs.getConf().getBoolVar(HiveConf.ConfVars.HIVE_SESSION_HISTORY_ENABLED)) { 
       
        startSs.hiveHist = 
        new 
        HiveHistoryImpl (startSs); 
       
        }
        else 
        { 
       
        //Hive history is disabled, create a no-op proxy 
       
        startSs.hiveHist = HiveHistoryProxyHandler .getNoOpHiveHistoryProxy(); 
       
        } 
       
        }

在fix这个配置之后，仍然没有发现rows loaded的信息，通过分析源码

printRowCount方法的实现如下：

 
        public 
        void 
        printRowCount(String queryId) { 
       
        QueryInfo ji = queryInfoMap.get(queryId); 
       
        if 
        (ji == 
        null
        ) {   
        // 如果ji为空，则直接返回 
       
        return
        ; 
       
        } 
       
        for 
        (String tab : ji. rowCountMap.keySet()) { 
       
        console.printInfo(ji. rowCountMap.get(tab) + 
        " Rows loaded to " 
        + tab); 
        // 从hashmap中获取数据 
       
        } 
       
        }

在hive0.13中，这里获取的ji对象是空值。

近一步发现，是由于counter中没有TABLE_ID_(\\d+)_ROWCOUNT，导致不能匹配ROW_COUNT_PATTERN的正则。就不能正常获取的row count的值。

其中获取tasker count的rows loaded信息的getRowCountTableName方法内容如下:

 
        private 
        static 
        final 
        String ROW_COUNT_PATTERN = 
        "TABLE_ID_(\\d+)_ROWCOUNT"
        ; 
       
        private 
        static 
        final 
        Pattern rowCountPattern = Pattern.compile(ROW_COUNT_PATTERN); 
       
        ......
       
        String getRowCountTableName(String name) { 
       
        if 
        (idToTableMap == 
        null
        ) { 
       
        return 
        null
        ; 
       
        } 
       
        Matcher m = rowCountPattern.matcher(name); 
       
        if 
        (m.find()) {  
        // //没有和TABLE_ID_xxxx match的counter导致，即counter没有打印出TABLE_ID_(\\d+)_ROWCOUNT导致。。 
       
        String tuple = m.group(
        1
        ); 
       
        return 
        idToTableMap.get(tuple); 
       
        } 
       
        return 
        null
        ; 
       
        }

而TABLE_ID_(\\d+)_ROWCOUNT是由FileSinkOperator类负责写入的。hive0.11中相关的代码如下：

 
        protected 
        void 
        initializeOp(Configuration hconf) 
        throws 
        HiveException { 
       
        ..........
       
        int 
        id = conf.getDestTableId(); 
       
        if 
        ((id != 
        0
        ) && (id <= TableIdEnum. values().length)) { 
       
        String enumName = 
        "TABLE_ID_" 
        + String.valueOf(id) + 
        "_ROWCOUNT"
        ; 
       
        tabIdEnum = TableIdEnum.valueOf(enumName); 
       
        row_count = 
        new 
        LongWritable(); 
       
        statsMap.put( tabIdEnum, row_count ); 
       
        }

而在hive0.13中这部分代码都被去掉了，找到了原因，fix也比较简单，把这个counter加回去就可了。

patch如下：

 
        diff --git a/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java b/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
       
        index 1dde78e..96860f7 
        100644 
       
        --- a/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
       
        +++ b/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java
       
        @@ -
        68
        ,
        13 
        +
        68
        ,
        16 
        @@ 
       
        import 
        org.apache.hadoop.util.ReflectionUtils; 
       
        import 
        com.google.common.collect.Lists; 
       
        +
        import 
        org.apache.commons.logging.Log; 
       
        +
        import 
        org.apache.commons.logging.LogFactory; 
       
        +
       
        /**
       
        * File Sink operator implementation. 
       
        **/ 
       
        public 
        class 
        FileSinkOperator 
        extends 
        TerminalOperator<FileSinkDesc> 
        implements 
       
        Serializable { 
       
        -
       
        +  
        public 
        static 
        Log LOG = LogFactory.getLog(
        "FileSinkOperator.class"
        ); 
       
        protected 
        transient 
        HashMap<String, FSPaths> valToPaths; 
       
        protected 
        transient 
        int 
        numDynParts; 
       
        protected 
        transient 
        List<String> dpColNames; 
       
        @@ -
        214
        ,
        6 
        +
        217
        ,
        7 
        @@ 
        public 
        Stat getStat() { 
       
        protected 
        transient 
        FileSystem fs; 
       
        protected 
        transient 
        Serializer serializer; 
       
        protected 
        transient 
        LongWritable row_count; 
       
        +  
        protected 
        transient 
        TableIdEnum tabIdEnum = 
        null
        ; 
       
        private 
        transient 
        boolean 
        isNativeTable = 
        true
        ; 
       
        /** 
       
        @@ -
        241
        ,
        6 
        +
        245
        ,
        23 
        @@ 
        public 
        Stat getStat() { 
       
        protected 
        transient 
        JobConf jc; 
       
        Class<? 
        extends 
        Writable> outputClass; 
       
        String taskId; 
       
        +  
        public 
        static 
        enum 
        TableIdEnum { 
       
        +       TABLE_ID_1_ROWCOUNT,
       
        +       TABLE_ID_2_ROWCOUNT,
       
        +       TABLE_ID_3_ROWCOUNT,
       
        +       TABLE_ID_4_ROWCOUNT,
       
        +       TABLE_ID_5_ROWCOUNT,
       
        +       TABLE_ID_6_ROWCOUNT,
       
        +       TABLE_ID_7_ROWCOUNT,
       
        +       TABLE_ID_8_ROWCOUNT,
       
        +       TABLE_ID_9_ROWCOUNT,
       
        +       TABLE_ID_10_ROWCOUNT,
       
        +       TABLE_ID_11_ROWCOUNT,
       
        +       TABLE_ID_12_ROWCOUNT,
       
        +       TABLE_ID_13_ROWCOUNT,
       
        +       TABLE_ID_14_ROWCOUNT,
       
        +       TABLE_ID_15_ROWCOUNT;
       
        +  }
       
        protected 
        boolean 
        filesCreated = 
        false
        ; 
       
        @@ -
        317
        ,
        7 
        +
        338
        ,
        15 
        @@ 
        protected 
        void 
        initializeOp(Configuration hconf) 
        throws 
        HiveException { 
       
        prtner = (HivePartitioner<HiveKey, Object>) ReflectionUtils.newInstance( 
       
        jc.getPartitionerClass(), 
        null
        ); 
       
        } 
       
        -      row_count = 
        new 
        LongWritable(); 
       
        +      
        //row_count = new LongWritable(); 
       
        +         
        int 
        id = conf.getDestTableId(); 
       
        +         
        if 
        ((id != 
        0
        ) && (id <= TableIdEnum.values().length)) { 
       
        +               String enumName = 
        "TABLE_ID_" 
        + String.valueOf(id) + 
        "_ROWCOUNT"
        ;   
       
        +               tabIdEnum = TableIdEnum.valueOf(enumName);
       
        +               row_count = 
        new 
        LongWritable(); 
       
        +               statsMap.put(tabIdEnum, row_count);
       
        +         }
       
        +
       
        if 
        (dpCtx != 
        null
        ) { 
       
        dpSetup(); 
       
        }

打完patch后，重新打包，替换线上的hive-exec-xxx.jar包之后测试，rows loaded的数据又回来了。

本文转自菜菜光 51CTO博客，原文链接：http://blog.51cto.com/caiguangguang/1528516，如需转载请自行联系原作者

hive0.13 rows loaded为空问题源码分析及fix

热门文章

最新文章

相关课程

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

hive0.13 rows loaded为空问题源码分析及fix

热门文章

最新文章

相关课程

相关电子书