[Original] Count queries fail after mapping a Hive table to an HBase table

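Environment:
I have a working hadoop + hbase + zookeeper cluster. Because HBase does not support SELECT statements for querying data, I set up Hive (a data warehouse) on top of it. I won't go into the Hive installation here; it uses an external MySQL database to store the Hive metadata. In Hive, I mapped the HBase table xyz to the Hive table hbase_table_1. Running select * against the table returns data, but select count(*) fails every time because the MapReduce job does not complete successfully. The session output is shown below: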
First, check the data in the HBase table xyz:
hbase(main):001:0> scan 'xyz'
ROW                  COLUMN+CELL                                               
10000                column=cf1:val, timestamp=1340091488116, value=China     
1 row(s) in 0.6730 seconds
hbase(main):002:0>
Next, check the data in the Hive table hbase_table_1:
hive> select * from hbase_table_1;
OK
10000   China
Time taken: 4.133 seconds
hive>
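The post does not show how hbase_table_1 was created, so the following is only a sketch of a typical mapping. It assumes the Hive columns are named key and value (both names and types are guesses) and that the HBase column cf1:val maps to value; the table names xyz and hbase_table_1 come from the sessions above. The script writes the DDL to a file for review and does not run Hive itself.

```shell
#!/bin/sh
# Sketch: write an assumed DDL statement to a file so it can be reviewed first.
# Column names/types (key int, value string) are assumptions, not from the post.
ddl_file="${TMPDIR:-/tmp}/create_hbase_table_1.hql"

cat > "$ddl_file" <<'EOF'
CREATE EXTERNAL TABLE hbase_table_1 (key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
EOF

# Print the generated statement for inspection.
cat "$ddl_file"

# To apply it (requires a working Hive installation):
#   hive -f "$ddl_file"
```

EXTERNAL is used here because the xyz table already exists in HBase, so Hive should not try to create or drop it.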
Finally, here is the row-count query I ran in Hive and the error it produced:
hive> select count(*) from  hbase_table_1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201206190956_0001, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201206190956_0001
Kill Command = /opt/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=master:9002 -kill job_201206190956_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-06-20 10:46:58,214 Stage-1 map = 0%,  reduce = 0%
2012-06-20 10:47:58,795 Stage-1 map = 0%,  reduce = 0%
2012-06-20 10:48:03,875 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201206190956_0001 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201206190956_0001_m_000002 (and more) from job job_201206190956_0001
Exception in thread "Thread-36" java.lang.RuntimeException: Error while reading from task log url
        at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
        at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
        at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://slave1:50060/tasklog?taskid=attempt_201206190956_0001_m_000000_1&start=-8193
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)
        at java.net.URL.openStream(URL.java:1009)
        at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
        ... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
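Note that the stack trace above comes from Hive's JobDebugger failing to download the task log (HTTP 400), so it hides the actual task error. A minimal sketch for looking at the log by hand follows. It assumes the TaskTracker web UI on port 50060 accepts an attemptid parameter in place of taskid, which is an assumption about this Hadoop version and may not hold on yours; the helper name tasklog_url is made up for illustration.

```shell
#!/bin/sh
# Build a TaskTracker task-log URL for one task attempt.
# Assumption: this Hadoop version expects "attemptid" rather than "taskid".
tasklog_url() {
    host="$1"
    attempt="$2"
    printf 'http://%s:50060/tasklog?attemptid=%s\n' "$host" "$attempt"
}

# Host and attempt id are taken from the failed job output above.
tasklog_url slave1 attempt_201206190956_0001_m_000000_1

# To fetch the log itself (requires network access to the TaskTracker):
#   curl -s "$(tasklog_url slave1 attempt_201206190956_0001_m_000000_1)"
```

If the URL still returns an error, the same log can usually be reached by following the Tracking URL printed in the job output and clicking through to the failed attempt.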
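After several days of searching online and talking with colleagues, I found the fix. select * most likely worked because Hive serves it straight from the client without launching a MapReduce job, whereas count(*) runs as a MapReduce job whose tasks also need the HBase jars and configuration. Solving the problem takes two steps: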
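1. Add the following to Hive's configuration file hive-site.xml, adjusting the paths in the value to match your own installation:

<property>
    <name>hive.aux.jars.path</name>
    <value>file:///opt/hive/lib/hive-hbase-handler-0.8.1.jar,file:///opt/hive/lib/hbase-0.92.1.jar,file:///opt/hive/lib/zookeeper-3.3.1.jar</value>
</property>
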
2. Copy the HBase configuration file hbase-site.xml on the namenode into Hadoop's conf directory, then use rsync to sync it to all datanode nodes.
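The two steps above can be sketched as a small script. This is only an illustration under stated assumptions: the helper names aux_jars_path and sync_hbase_site are made up, /opt/hadoop/conf is inferred from the Hadoop path in the job output, and slave2 is a hypothetical second datanode (only slave1 appears in the logs). By default the script prints the rsync commands instead of running them.

```shell
#!/bin/sh
# Step 1 helper: join jar paths into a hive.aux.jars.path value
# (file:// URIs separated by commas).
aux_jars_path() {
    result=""
    for jar in "$@"; do
        result="${result:+$result,}file://$jar"
    done
    printf '%s\n' "$result"
}

# Step 2 helper: print the rsync command for each host, or run it
# when DRY_RUN=0 is set in the environment.
sync_hbase_site() {
    src="$1"
    dest_dir="$2"
    shift 2
    for host in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rsync -av $src $host:$dest_dir/"
        else
            rsync -av "$src" "$host:$dest_dir/"
        fi
    done
}

# Value to paste into hive-site.xml (jar names taken from step 1).
aux_jars_path /opt/hive/lib/hive-hbase-handler-0.8.1.jar \
              /opt/hive/lib/hbase-0.92.1.jar \
              /opt/hive/lib/zookeeper-3.3.1.jar

# Hosts are assumptions: slave1 appears in the logs, slave2 is hypothetical.
sync_hbase_site /opt/hadoop/conf/hbase-site.xml /opt/hadoop/conf slave1 slave2
```

Running it with DRY_RUN=0 performs the actual copy; keeping the dry run as the default makes it easy to check the host list and paths first.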
Now let's run the query again:
hive> select count(*) from hbase_table_1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapred.reduce.tasks=
Starting Job = job_201206190956_0003, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201206190956_0003
Kill Command = /opt/hadoop/libexec/../bin/hadoop job  -Dmapred.job.tracker=master:9002 -kill job_201206190956_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-06-20 12:04:12,499 Stage-1 map = 0%,  reduce = 0%
2012-06-20 12:04:27,668 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:28,682 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:29,703 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:30,713 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:31,724 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:32,734 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:33,757 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:34,768 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:35,777 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:36,788 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:37,798 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:38,808 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:39,869 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:40,880 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:42,126 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:43,136 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:44,145 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:45,155 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:46,164 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:47,174 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:48,183 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.82 sec
2012-06-20 12:04:49,236 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:50,247 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:51,267 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:52,277 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:53,288 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:54,320 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:55,330 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:56,341 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:57,364 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
2012-06-20 12:04:58,375 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.48 sec
MapReduce Total cumulative CPU time: 7 seconds 480 msec
Ended Job = job_201206190956_0003
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   Accumulative CPU: 7.48 sec   HDFS Read: 240 HDFS Write: 2 SUCESS
Total MapReduce CPU Time Spent: 7 seconds 480 msec
OK
1
Time taken: 92.04 seconds
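It works! The count is 1 because my table has only one row. The query took a while because I'm running on virtual machines, which are rather slow.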
 


 