Implementing multivalue support in hive2solr

2017-11-22


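Earlier posts introduced the hive2solr project on GitHub and Solr's multivalue feature.
In production we compute the data in Hive and then push it to Solr. When a multivalued field is needed, the stock hive2solr has a problem: even if the corresponding Hive field contains several words, after the import Solr holds them as one whole string. For example, take a table with the following data: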

 
        id      test_s  test_ss
        3       d       f d h

Here test_ss is a multivalued field. After the import into Solr:

 
        {
          "test_ss": [
            "f d h"  // treated as a single element
          ],
          "test_s": "d",
          "id": "3",
          "_version_": 1472413953618346000
        }

Building the array directly in Hive and inserting it into Solr fails with an error saying the array cannot be converted to a string.

 
  
    
      
      
        select id, test_s, split(test_ss, ' ') from t2;

        FAILED: NoMatchingMethodException No matching method for class org.apache.hadoop.hive.ql.udf.UDFToString
        with (array<string>). Possible choices: _FUNC_(void)  _FUNC_(boolean)  _FUNC_(tinyint)  _FUNC_(smallint)
        _FUNC_(int)  _FUNC_(bigint)  _FUNC_(float)  _FUNC_(double)  _FUNC_(string)  _FUNC_(timestamp)  _FUNC_(decimal)  _FUNC_(binary)
       
 
    

   
 

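Hive writes data to Solr mainly through the write method of SolrWriter, which ultimately calls the setField method of SolrInputDocument. The problem can be worked around by changing the code as shown below.
The original write method of SolrWriter: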

 
        @Override
        public void write(Writable w) throws IOException {
            MapWritable map = (MapWritable) w;
            SolrInputDocument doc = new SolrInputDocument();
            for (final Map.Entry<Writable, Writable> entry : map.entrySet()) {
                String key = entry.getKey().toString();
                // The whole value is passed to setField as a single string
                doc.setField(key, entry.getValue().toString());
            }
            table.save(doc);
        }

Changed to:

 
        @Override
        public void write(Writable w) throws IOException {
            MapWritable map = (MapWritable) w;
            SolrInputDocument doc = new SolrInputDocument();
            for (final Map.Entry<Writable, Writable> entry : map.entrySet()) {
                String key = entry.getKey().toString();
                String value = entry.getValue().toString();
                // Split the value coming from Hive on whitespace into an array
                // (the Hive SQL only needs to concat the values with spaces)
                String[] sl = value.split("\\s+");
                List<String> valuesl = java.util.Arrays.asList(sl);
                log.info("add entry value lists:" + valuesl);
                for (String vl : valuesl) {
                    // Call addField instead of setField so earlier values are not overwritten
                    doc.addField(key, vl);
                }
            }
            table.save(doc);
        }
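The change works because setField replaces whatever a field currently holds, while addField appends to it. The sketch below illustrates that difference with a small stand-in class; FieldSemanticsSketch is a hypothetical name made up for this example and is not the SolrJ SolrInputDocument, it only mimics the replace and append behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Solr document; NOT the SolrJ class. */
class FieldSemanticsSketch {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    /** Replaces whatever the field held with a single value. */
    void setField(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        fields.put(name, values);
    }

    /** Appends a value and keeps the ones already present. */
    void addField(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> getValues(String name) {
        return fields.getOrDefault(name, new ArrayList<>());
    }

    public static void main(String[] args) {
        // Original behavior: the whole string becomes one element.
        FieldSemanticsSketch before = new FieldSemanticsSketch();
        before.setField("test_ss", "f d h");
        System.out.println(before.getValues("test_ss")); // prints [f d h]

        // Modified behavior: split on whitespace, then append each token.
        FieldSemanticsSketch after = new FieldSemanticsSketch();
        for (String token : "f d h".split("\\s+")) {
            after.addField("test_ss", token);
        }
        System.out.println(after.getValues("test_ss")); // prints [f, d, h]
    }
}
```

Calling setField inside the loop instead would leave only the last token, which is why the modified method switches to addField.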

Result of the import test:

 
        {
          "test_ss": [
            "f",
            "d",
            "h"
          ],
          "test_s": "d",
          "id": "3",
          "_version_": 1472422023801077800
        }
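One thing to keep in mind with this workaround: the modified write method splits every field on whitespace, so a single-valued field whose text contains spaces would also be broken into several values, and a value with leading whitespace produces an empty first token. A possible refinement, which is not part of the original post, is to split only the fields known to be multivalued. The sketch below assumes a hypothetical MultiValueSplitSketch helper and a caller-supplied set of multivalued field names; neither exists in hive2solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of hive2solr. */
class MultiValueSplitSketch {
    // Assumption: the caller lists the fields declared multiValued in the Solr schema.
    private final Set<String> multiValuedFields;

    MultiValueSplitSketch(Set<String> multiValuedFields) {
        this.multiValuedFields = multiValuedFields;
    }

    /** Returns the values to add for one field of one row. */
    List<String> valuesFor(String field, String raw) {
        if (!multiValuedFields.contains(field)) {
            // Single-valued field: keep the text intact, spaces included.
            return Collections.singletonList(raw);
        }
        List<String> tokens = new ArrayList<>();
        // trim() avoids the empty leading token that split() yields for " f d h".
        for (String token : raw.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> multi = new HashSet<>(Arrays.asList("test_ss"));
        MultiValueSplitSketch splitter = new MultiValueSplitSketch(multi);
        System.out.println(splitter.valuesFor("test_ss", " f d  h ")); // prints [f, d, h]
        System.out.println(splitter.valuesFor("test_s", "a b"));       // prints [a b]
    }
}
```

Inside write, each returned value would then be passed to doc.addField(key, value) as in the modified method.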

 
This article is reposted from 菜菜光's 51CTO blog. Original link: http://blog.51cto.com/caiguangguang/1433770. To republish it, please contact the original author.
