I. Environment
JDK 1.8
II. Install Hadoop
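1. Download Hadoop
Pick a suitable version from http://mirror.bit.edu.cn/apache/hadoop/ and download it:
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
Extract the archive (in /opt, which HADOOP_HOME points to below), then rename the extracted directory with mv for convenience:
tar -xzvf hadoop-3.3.0.tar.gz
mv hadoop-3.3.0 hadoop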
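The extract-and-rename step can be wrapped in a small helper so the same command also works for the Hive tarball later. This is a minimal bash sketch: the function name `install_tarball` is hypothetical, and it assumes the archive unpacks into a single top-level directory (as the Hadoop and Hive release tarballs do).

```shell
# install_tarball ARCHIVE DEST_DIR NEW_NAME
# Hypothetical helper: unpack ARCHIVE into DEST_DIR and rename its single
# top-level directory to NEW_NAME (e.g. hadoop-3.3.0 -> hadoop).
install_tarball() {
  local archive=$1 dest=$2 name=$3 top
  # The first entry listed in the archive names its top-level directory.
  top=$(tar -tzf "$archive" | head -n 1 | cut -d/ -f1)
  if [ -z "$top" ]; then
    echo "cannot determine top-level directory of $archive" >&2
    return 1
  fi
  tar -xzf "$archive" -C "$dest" || return 1
  # Rename the extracted directory, not the .tar.gz file.
  mv "$dest/$top" "$dest/$name"
}

# Example:
#   install_tarball hadoop-3.3.0.tar.gz /opt hadoop
#   install_tarball apache-hive-3.1.2-bin.tar.gz /opt hive
```

If DEST_DIR/NEW_NAME already exists as a directory, mv moves the extracted directory inside it instead of renaming, so remove old installs first.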
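2. Set environment variables
Add the Hadoop settings to the environment by appending these lines to /etc/profile (vim /etc/profile):
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
Run source /etc/profile to apply the changes.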
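When the setup is repeated, editing /etc/profile by hand tends to leave duplicate export lines. A minimal bash sketch that appends a line only if it is not already present; the function name `append_once` is hypothetical.

```shell
# append_once FILE LINE
# Hypothetical helper: append LINE to FILE only if that exact line is not
# already there, so re-running the setup does not duplicate exports.
append_once() {
  local file=$1 line=$2
  touch "$file" || return 1
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $HADOOP_HOME literal so it expands at login):
#   append_once /etc/profile 'export HADOOP_HOME=/opt/hadoop'
#   append_once /etc/profile 'export PATH=$HADOOP_HOME/bin:$PATH'
#   source /etc/profile
```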
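3. Edit the configuration files
In hadoop-env.sh (vim etc/hadoop/hadoop-env.sh, relative to the Hadoop directory), set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64
To check that Hadoop is installed correctly, run the grep example that ships with Hadoop. It reads from an input directory, so create one from Hadoop's own config files first (from the Hadoop directory; the output directory must not exist yet):
mkdir input
cp etc/hadoop/*.xml input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'dfs[a-z]'
cat output/*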
III. Install Hive
1. Download Hive
wget http://mirror.bit.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
Extract it: tar -zxvf apache-hive-3.1.2-bin.tar.gz
Rename it: mv apache-hive-3.1.2-bin hive
2. Set environment variables
vim /etc/profile
export HIVE_HOME=/opt/hive
export PATH=$MAVEN_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
source /etc/profile
3. Configure hive-site.xml
The snippet below comes from conf/hive-default.xml.template copied to conf/hive-site.xml:
<!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
<!-- WARNING!!! Any changes you make to this file will be ignored by Hive. -->
<!-- WARNING!!! You must make your changes in hive-site.xml instead. -->
<!-- Hive Execution Parameters -->
<!-- All of the properties below already exist in the original file: search for each one and edit it, or delete it and add them together in one place -->
<property>
  <!-- user name -->
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <!-- password -->
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <!-- MySQL connection URL -->
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://127.0.0.1:3306/hive</value>
</property>
<property>
  <!-- MySQL JDBC driver class -->
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>hive.exec.script.wrapper</name>
  <value/>
  <description/>
</property>
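Instead of editing the full template, a common alternative is to start from a minimal hive-site.xml that holds only the metastore connection properties; Hive uses its built-in defaults for everything else. This also sidesteps the template-related problems recorded in Section V. A bash sketch: the function name `write_min_hive_site` is hypothetical, and the credentials are just the example values from this guide.

```shell
# write_min_hive_site FILE JDBC_URL USER PASSWORD
# Hypothetical helper: write a minimal hive-site.xml containing only the
# MySQL metastore connection settings; Hive uses built-in defaults otherwise.
# Note: a literal "&" in the URL must be written as "&amp;" in XML.
write_min_hive_site() {
  local file=$1 url=$2 user=$3 password=$4
  cat > "$file" <<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>${url}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${password}</value>
  </property>
</configuration>
EOF
}

# Example (values from this guide):
#   write_min_hive_site /opt/hive/conf/hive-site.xml \
#     'jdbc:mysql://127.0.0.1:3306/hive' root 123456
```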
Copy the MySQL JDBC driver jar into hive/lib, then go to the hive/bin directory and initialize the metastore schema:
schematool -dbType mysql -initSchema
4. Verify the installation
hive --version shows the installed version.
Run hive; if you get the hive> command prompt, the installation succeeded.
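IV. Hive data integration
Once the Hive hook is configured, every operation performed in Hive is picked up by the hook and published to Kafka as an event. Atlas's Ingest module then consumes these Kafka messages, parses them into the corresponding Atlas metadata, and writes that metadata to the underlying JanusGraph graph database, which stores and manages it.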
1. Hive hook configuration
Edit hive-env.sh to tell Hive where the hook jars are. The hook jars and tools are generated when Atlas is built, under apache-atlas-sources-2.1.0/distro/target/:
export HIVE_AUX_JARS_PATH=/opt/apache-atlas-2.1.0/hook/hive
Edit hive-site.xml to register the hook class:
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
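The hook property can also be added with a script instead of by hand. A minimal bash sketch, assuming the closing `</configuration>` tag sits on its own line; the function name `add_property` is hypothetical. If the property already exists it is left untouched (hive.exec.post.hooks takes a comma-separated list, so an existing hook would need the Atlas class appended by hand).

```shell
# add_property FILE NAME VALUE
# Hypothetical helper: insert a <property> block just before </configuration>
# unless a property with the same <name> is already present.
add_property() {
  local file=$1 name=$2 value=$3 tmp
  if grep -qF "<name>${name}</name>" "$file"; then
    echo "property ${name} already present in ${file}" >&2
    return 0
  fi
  if ! grep -q '</configuration>' "$file"; then
    echo "no </configuration> tag in ${file}" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # awk -v interprets backslash escapes, so VALUE should not contain backslashes.
  awk -v n="$name" -v v="$value" '
    /<\/configuration>/ {
      print "  <property>"
      print "    <name>" n "</name>"
      print "    <value>" v "</value>"
      print "  </property>"
    }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example:
#   add_property /opt/hive/conf/hive-site.xml \
#     hive.exec.post.hooks org.apache.atlas.hive.hook.HiveHook
```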
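Note that this registers a post-execution hook; Hive also accepts hooks at other stages (for example hive.exec.pre.hooks, which run before a query executes). In effect it is a callback on the query execution lifecycle.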
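2. Full synchronization
Copy the Atlas configuration file atlas-application.properties (from /opt/apache-atlas-2.1.0/conf) into Hive's configuration directory (/opt/hive/conf), then add these two lines to it:
atlas.hook.hive.synchronous=false
atlas.rest.address=http://doit33:21000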
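Appending the two lines blindly can leave a key defined twice if the copied file already contains it. A minimal bash sketch that updates a key in place or appends it; the function name `set_prop` is hypothetical, and it only matches lines written as `key=value` with no spaces around `=`.

```shell
# set_prop FILE KEY VALUE
# Hypothetical helper: set KEY=VALUE in a .properties file, replacing an
# existing "KEY=" line or appending a new one at the end.
set_prop() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    BEGIN { done = 0 }
    index($0, k "=") == 1 { print k "=" v; done = 1; next }
    { print }
    END { if (!done) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example (values from this guide):
#   set_prop /opt/hive/conf/atlas-application.properties atlas.hook.hive.synchronous false
#   set_prop /opt/hive/conf/atlas-application.properties atlas.rest.address http://doit33:21000
```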
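Tables that already existed in Hive before Atlas was installed are not picked up by the hook automatically. Their metadata can be imported with an Atlas tool that also ships in the hive-hook package produced by the Atlas build:
bin/import-hive.sh
The import asks for an Atlas user name and password and then starts importing.
When it prints "Hive Meta Data imported successfully!!!", the import has succeeded. Sample output: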
sh import-hive.sh
Using Hive configuration directory [/opt/hive/conf]
Log file for import is /opt/apache-atlas-2.1.0/logs/import-hive.log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-01-15T11:41:01,614 INFO [main] org.apache.atlas.ApplicationProperties - Looking for atlas-application.properties in classpath
2021-01-15T11:41:01,619 INFO [main] org.apache.atlas.ApplicationProperties - Loading atlas-application.properties from file:/opt/hive/conf/atlas-application.properties
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Using graphdb backend 'janus'
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Using storage backend 'hbase2'
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Using index backend 'solr'
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Atlas is running in MODE: PROD.
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Setting solr-wait-searcher property 'true'
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Setting index.search.map-name property 'false'
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Setting atlas.graph.index.search.max-result-set-size = 150
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache = true
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache-clean-wait = 20
2021-01-15T11:41:01,660 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.db-cache-size = 0.5
2021-01-15T11:41:01,661 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-cache-size = 15000
2021-01-15T11:41:01,661 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-dirty-size = 120
Enter username for atlas :- admin    # type the Atlas user name and password manually
Enter password for atlas :-
2021-01-15T11:41:05,721 INFO [main] org.apache.atlas.AtlasBaseClient - Trying with address http://127.0.0.1:21000
2021-01-15T11:41:05,831 INFO [main] org.apache.atlas.AtlasBaseClient - method=GET path=api/atlas/admin/status contentType=application/json; charset=UTF-8 accept=application/json status=200
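3. Testing the hook
With all hooks configured, create a test table in Hive and check whether Atlas can find it. If it can, the configuration works.
Before creating the table, check the table list currently shown in Atlas.
Then create a new table in Hive:
hive> CREATE TABLE `teache` (
    >   `id` int,
    >   `name` string,
    >   `age` int,
    >   `sex` string,
    >   `peoject` string
    > );
OK
Time taken: 0.645 seconds
hive> show tables;
OK
class
student
teache
Time taken: 0.108 seconds, Fetched: 3 row(s)
The new table then shows up in Atlas automatically.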
V. Error log
1. Invalid characters in the configuration file
The following error is reported:
Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at org.apache.hadoop.fs.Path.initialize(Path.java:263)
    at org.apache.hadoop.fs.Path.<init>(Path.java:221)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:710)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:627)
    at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    at java.net.URI.checkPath(URI.java:1823)
    at java.net.URI.<init>(URI.java:745)
    at org.apache.hadoop.fs.Path.initialize(Path.java:260)
    ... 12 more
Solution:
Find the properties in hive-site.xml whose values contain the ${system:java.io.tmpdir} or ${system:user.name} placeholders and replace those values with absolute paths:
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/hive/local</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/hive/resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
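Because the template uses these placeholders in several properties, a search-and-replace over the whole file is an alternative to editing each one. A minimal bash sketch: the function name `fix_system_vars` is hypothetical, and the resulting paths differ slightly from the hand-edited ones above (for example `/tmp/hive/root` rather than `/tmp/hive/local`); any writable absolute path should do.

```shell
# fix_system_vars FILE TMP_DIR USER_NAME
# Hypothetical helper: replace the ${system:java.io.tmpdir} and
# ${system:user.name} placeholders, which are left unresolved in the error
# above, with literal values. Keeps a backup copy as FILE.bak.
fix_system_vars() {
  local file=$1 tmpdir=$2 user=$3 tmp
  cp "$file" "$file.bak" || return 1
  tmp=$(mktemp) || return 1
  # '#' is the sed delimiter, so TMP_DIR and USER_NAME must not contain '#' or '&'.
  sed -e 's#\${system:java\.io\.tmpdir}#'"$tmpdir"'#g' \
      -e 's#\${system:user\.name}#'"$user"'#g' \
      "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example:
#   fix_system_vars /opt/hive/conf/hive-site.xml /tmp/hive root
```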
2. Guava version mismatch
Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8 at [row,col,system-id]: [3215,96,"file:/opt/hive/conf/hive-site.xml"]
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3051)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:3000)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2875)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1484)
    at org.apache.hadoop.hive.conf.HiveConf.getVar(HiveConf.java:4996)
    at org.apache.hadoop.hive.conf.HiveConf.getVar(HiveConf.java:5069)
    at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5156)
    at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5104)
    at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
    at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8 at [row,col,system-id]: [3215,96,"file:/opt/hive/conf/hive-site.xml"]
    at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:621)
    at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:491)
    at com.ctc.wstx.sr.StreamScanner.reportIllegalChar(StreamScanner.java:2456)
    at com.ctc.wstx.sr.StreamScanner.validateChar(StreamScanner.java:2403)
    at com.ctc.wstx.sr.StreamScanner.resolveCharEnt(StreamScanner.java:2369)
    at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1515)
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2828)
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1123)
    at org.apache.hadoop.conf.Configuration$Parser.parseNext(Configuration.java:3347)
    at org.apache.hadoop.conf.Configuration$Parser.parse(Configuration.java:3141)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3034)
    ... 15 more
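Note: the trace above is actually an XML parse error caused by an illegal character reference (code 0x8, written as &#8;) at row 3215, column 96 of hive-site.xml; deleting that character fixes it. The guava conflict itself typically shows up as java.lang.NoSuchMethodError on com.google.common.base.Preconditions.checkArgument.
Solution for the guava conflict: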
1. The method com.google.common.base.Preconditions.checkArgument comes from the guava jar.
2. Hadoop (hadoop-3.2.1, path hadoop/share/hadoop/common/lib) ships guava-27.0-jre.jar, while hive-3.1.2 (path hive/lib) ships guava-19.0.jar.
3. Make the versions match: delete the older jar from Hive's lib directory and copy Hadoop's newer jar into it.
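The three steps above can be scripted. A minimal bash sketch: the function name `sync_guava` is hypothetical, and unlike the manual steps it moves Hive's old jar to a backup directory next to lib (for example /opt/hive/lib.guava-backup) instead of deleting it, so the change is easy to undo.

```shell
# sync_guava HADOOP_LIB HIVE_LIB
# Hypothetical helper: move Hive's guava jar(s) aside and copy Hadoop's
# guava jar into Hive's lib directory so both use the same version.
sync_guava() {
  local hadoop_lib=$1 hive_lib=$2 src="" backup jar
  # Take the first guava jar found in Hadoop's lib dir (e.g. guava-27.0-jre.jar).
  for jar in "$hadoop_lib"/guava-*.jar; do
    [ -e "$jar" ] && { src=$jar; break; }
  done
  if [ -z "$src" ]; then
    echo "no guava jar found in $hadoop_lib" >&2
    return 1
  fi
  # Backup directory sits beside lib, outside Hive's classpath.
  backup="${hive_lib%/}.guava-backup"
  mkdir -p "$backup" || return 1
  for jar in "$hive_lib"/guava-*.jar; do
    [ -e "$jar" ] && mv "$jar" "$backup/"
  done
  cp "$src" "$hive_lib/"
}

# Example:
#   sync_guava /opt/hadoop/share/hadoop/common/lib /opt/hive/lib
```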
Start Hive again and the problem is gone.
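The trace under the guava heading reports an illegal character (code 0x8) at row 3215 of hive-site.xml. That character is written in the file as the reference `&#8;` (a backspace, which XML 1.0 does not allow), so the Woodstox parser rejects the file. A minimal bash sketch that lists the occurrences and replaces them with a space; the function name `strip_char_ref` is hypothetical.

```shell
# strip_char_ref FILE
# Hypothetical helper: report and remove the "&#8;" character reference
# that makes the XML parser reject hive-site.xml. Keeps a backup as FILE.bak.
strip_char_ref() {
  local file=$1 tmp
  # Print matching line numbers to stderr so the offending spot is visible.
  grep -n '&#8;' "$file" >&2
  cp "$file" "$file.bak" || return 1
  tmp=$(mktemp) || return 1
  # Replace each reference with a plain space.
  sed -e 's/&#8;/ /g' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example:
#   strip_char_ref /opt/hive/conf/hive-site.xml
```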