鱼跟猫 2016-05-05
Hive on E-MapReduce stores its metadata, by default, in a MySQL database on the cluster's Master node. Typically, users keep their data in the cluster's HDFS and use Hive to process it there. When the cluster is released, all data on its nodes is deleted, including both the HDFS data and the Hive metadata. As I wrote in an earlier post, we encourage users to store data in OSS instead: this separates storage from compute and lets you benefit from OSS's elasticity and high availability (see that earlier article for more details). Beyond that, we may run multiple clusters, and it is natural to want them to share a single Hive metastore. In short, we want the Hive metastore to live outside the E-MapReduce cluster. How can we achieve that? Anyone familiar with the Alibaba Cloud product ecosystem will naturally ask: can RDS serve as the Hive metastore? The answer is yes, and below I demonstrate how to set up a Hive metastore on RDS for E-MapReduce.
I won't go into how to create a database instance on RDS; see the RDS documentation if needed. Once the instance is created, we need the following three pieces of information:
Database account: hive
Database password: Hive001
Database internal endpoint: rm-bp************735.mysql.rds.aliyuncs.com
Create the Hive metastore database hivemeta, choose latin1 as its character set, and grant the hive account read and write privileges on it.
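If you prefer the command line to the RDS console, here is a minimal sketch of the equivalent SQL. It assumes the mysql client is installed, the instance is reachable from where you run it, and that you have a privileged account (the name root below is hypothetical) allowed to create databases and grant rights; on RDS these steps are normally performed in the console instead.

# Sketch: create the metastore database and grant the hive account access.
# "root" is a hypothetical privileged account; substitute your own.
mysql -h rm-bp************735.mysql.rds.aliyuncs.com -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hivemeta DEFAULT CHARACTER SET latin1;
GRANT ALL PRIVILEGES ON hivemeta.* TO 'hive'@'%';
SQL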
As mentioned above, E-MapReduce uses MySQL on the Master node as the metastore by default. To make Hive use RDS instead, we have to override the default Hive configuration, which means preparing a custom configuration file. The format of custom configuration files is described in the official E-MapReduce documentation. Here is my configuration file, hive-site.json:
{
  "configurations": [
    {
      "classification": "hive-site",
      "properties": {
        "javax.jdo.option.ConnectionUserName": "hive",
        "javax.jdo.option.ConnectionPassword": "Hive001",
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://rm-bp************735.mysql.rds.aliyuncs.com:3306/hivemeta?createDatabaseIfNotExist=true",
        "hive.metastore.uris": "thrift://localhost:9083"
      }
    }
  ]
}
Upload this file to any directory in OSS; we will point the cluster at it in the next step.
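For example, with the ossutil command-line tool (uploading through the OSS web console works just as well; the bucket and path below are placeholders):

# Sketch: copy the local config file into an OSS bucket of your choosing.
ossutil cp hive-site.json oss://your-bucket/conf/hive-site.json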
I won't go into the cluster creation process either; see the E-MapReduce documentation if needed. The important point is that in step 3, "Software Configuration", you must select the hive-site.json file from OSS under the optional "Software Configuration (optional)" item.
Add the internal IP addresses of the machines in the newly created cluster to the RDS whitelist.
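Before continuing, it is worth verifying from the Master node that the whitelist actually lets the cluster through. A minimal check, assuming the mysql client is available on the node:

# Sketch: a successful "SELECT 1" confirms the cluster can reach the RDS metastore database.
mysql -h rm-bp************735.mysql.rds.aliyuncs.com -u hive -p hivemeta -e "SELECT 1;"

Now log in to the Master node and start Hive: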
Last login: Thu May 5 10:02:12 2016 from 42.120.74.97
Welcome to aliyun Elastic Compute Service!
[root@emr-header-1 ~]#
[root@emr-header-1 ~]# su hadoop
[hadoop@emr-header-1 root]$
[hadoop@emr-header-1 ~]$ hive
Logging initialized using configuration in file:/etc/emr/hive-conf-1.0.1/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apps/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/hbase-1.1.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apps/apache-hive-1.0.1-bin/lib/hive-jdbc-1.0.1-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive>
hive> CREATE EXTERNAL TABLE emrusers (
> userid INT,
> movieid INT,
> rating INT,
> unixtime STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> LOCATION 'oss://y***********n:m************************4@xxx.oss-cn-hangzhou-internal.aliyuncs.com/tmp/hive';
hive> select count(*) from emrusers;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
Query ID = hadoop_20160505102931_a476ce8d-7c4e-45f8-a953-4e8e37c91354
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1462363452366_0004, Tracking URL = http://xxxxxxxxxx:20888/proxy/application_1462363452366_0004/
Kill Command = /usr/lib/hadoop-current/bin/hadoop job -kill job_1462363452366_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-05-05 10:35:06,061 Stage-1 map = 0%, reduce = 0%
2016-05-05 10:35:14,163 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.59 sec
2016-05-05 10:35:20,453 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.1 sec
MapReduce Total cumulative CPU time: 5 seconds 100 msec
Ended Job = job_1462363452366_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.1 sec HDFS Read: 8168 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 100 msec
OK
100000
Time taken: 36.085 seconds, Fetched: 1 row(s)
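As a final check, you can confirm that the table's metadata actually landed in RDS rather than on the Master node. A minimal sketch, assuming the mysql client and the credentials above; TBLS is the standard table in the Hive metastore schema that records table definitions, so emrusers should appear there as an EXTERNAL_TABLE:

# Sketch: list the tables recorded in the metastore database on RDS.
mysql -h rm-bp************735.mysql.rds.aliyuncs.com -u hive -p hivemeta \
  -e "SELECT TBL_NAME, TBL_TYPE FROM TBLS;"

If the query returns emrusers, the Hive metastore now lives in RDS and will survive the release of this cluster.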