A guide to installing Hive 3.1.2 and using Tez as the execution engine
Hive is a SQL-based data warehouse built on top of Hadoop: it stores data on HDFS and translates SQL queries into distributed jobs (MapReduce, Tez, or Spark).
Before installing Hive, make sure Hadoop, a JDK, and Tez are already installed; the paths below assume /data/hadoop, /data/jdk8, and /data/tez.
Download and extract the installation package
cd /data
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
tar zxvf apache-hive-3.1.2-bin.tar.gz
ln -s /data/apache-hive-3.1.2-bin /data/hive
Modify the configuration files
1 Modify /etc/profile
vi /etc/profile
# Add the following lines
export HIVE_HOME=/data/hive
export PATH=$PATH:$HIVE_HOME/bin
# Reload the environment
source /etc/profile
2 Check the Hive version
hive --version
Hive 3.1.2
Git git://HW13934/Users/gates/tmp/hive-branch-3.1/hive -r 8190d2be7b7165effa62bd21b7d60ef81fb0e4af
Compiled by gates on Thu Aug 22 15:01:18 PDT 2019
From source with checksum 0492c08f784b188c349f6afb1d8d9847
3 Copy hive-default.xml.template to create hive-site.xml (both under /data/hive/conf)
cd /data/hive/conf
cp hive-default.xml.template hive-site.xml
4 Copy hive-env.sh.template to create hive-env.sh
cp hive-env.sh.template hive-env.sh
Add the following to hive-env.sh
JAVA_HOME=/data/jdk8
HADOOP_HOME=/data/hadoop
HIVE_HOME=/data/hive
export TEZ_CONF_DIR=/data/tez/conf
export TEZ_JARS=/data/tez/*:/data/tez/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
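The three Tez lines above assume Tez is already installed under /data/tez; a quick sanity check of those assumed paths (adjust them to your own layout):
# Confirm the Tez config directory and jars referenced above actually exist
ls -d /data/tez/conf
ls /data/tez/*.jar /data/tez/lib/*.jar | head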
- Hive needs a relational database to store its metadata. The default is Derby, but here MySQL is used. If you have not installed MySQL yet, install it first (see a MySQL installation guide), and grant the hadoop1 and hadoop2 nodes access to it, as sketched below.
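A minimal sketch of those grants, assuming MySQL 5.x on hadoop2 (matching the 5.1.x JDBC connector used later) and the root / Pass-123-root credentials configured below; adjust to your own setup:
# Run on the MySQL host (hadoop2); creates the metastore database and
# allows connections from the hadoop1 and hadoop2 nodes
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS hive DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'hadoop1' IDENTIFIED BY 'Pass-123-root';
GRANT ALL PRIVILEGES ON hive.* TO 'root'@'hadoop2' IDENTIFIED BY 'Pass-123-root';
FLUSH PRIVILEGES;
SQL
The CREATE DATABASE line is optional, since the JDBC URL below uses createDatabaseIfNotExist=true.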
Next, modify hive-site.xml. First create a log directory:
mkdir -p /data/hive/logs
Change its permissions to 777
chmod -R 777 /data/hive/logs
5.1 Configure the MySQL metastore
The MySQL instance used here:
hostname: hadoop2
username: root
password: Pass-123-root
# Modify the following properties in hive-site.xml
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop2:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>Pass-123-root</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/data/hive/logs/hive/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/data/hive/logs/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/data/hive/logs/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/data/hive/logs/hive</value>
<description>Location of Hive run time structured log file</description>
</property>
5.2 Set the execution engine to tez
<property>
<name>hive.execution.engine</name>
<value>tez</value>
<description>
Expects one of [mr, tez, spark].
Chooses execution engine. Options are: mr (Map reduce, default), tez, spark. While MR
remains the default engine for historical reasons, it is itself a historical engine
and is deprecated in Hive 2 line. It may be removed without further warning.
</description>
</property>
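Switching the engine to tez only works if Tez itself is fully set up: tez-site.xml (under /data/tez/conf) must point, via tez.lib.uris, at Tez libraries that have been uploaded to HDFS. A quick hedged check, using /apps/tez purely as an example path:
# Confirm tez.lib.uris points at Tez libraries that really exist on HDFS
grep -A1 tez.lib.uris /data/tez/conf/tez-site.xml
# replace /apps/tez with whatever path tez.lib.uris actually references
hdfs dfs -ls /apps/tez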
6 Download the MySQL JDBC driver into hive/lib
cd /data/hive/lib && wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.49/mysql-connector-java-5.1.49.jar
Initialize the metastore schema
schematool -dbType mysql -initSchema
During schema initialization you may hit the error below. To fix it, simply delete the line that the error message points to in hive-site.xml (note the row, col, and system-id values); a sketch of the fix follows the log.
2021-08-12 16:15:58,896 INFO [main] conf.HiveConf (HiveConf.java:findConfigFile(187)) - Found configuration file file:/data/apache-hive-3.1.2-bin/conf/hive-site.xml
2021-08-12 16:15:59,118 ERROR [main] conf.Configuration (Configuration.java:loadResource(2980)) - error parsing conf file:/data/apache-hive-3.1.2-bin/conf/hive-site.xml
org.apache.hadoop.shaded.com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8
at [row,col,system-id]: [3215,96,"file:/data/apache-hive-3.1.2-bin/conf/hive-site.xml"]
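The offending character was copied in from hive-default.xml.template (an invalid character entity inside one of the property descriptions). Using the row number reported above as an example, inspect the line, delete it, and rerun the initialization:
# Show the line reported under [row,col,system-id], then remove it
sed -n '3215p' /data/hive/conf/hive-site.xml
sed -i '3215d' /data/hive/conf/hive-site.xml
schematool -dbType mysql -initSchema
Afterwards, schematool -dbType mysql -info should report the initialized schema version.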
Modify the Hadoop configuration
Add the following properties to Hadoop's core-site.xml (the updated file must be present on every node; see the copy step after the snippet)
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>hadoop2</value>
</property>
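These proxy-user settings allow HiveServer2 to impersonate the connecting user. A sketch of distributing the updated file to the other node, assuming Hadoop lives at /data/hadoop on both nodes as in hive-env.sh above:
# Copy the updated core-site.xml to hadoop1 before restarting the daemons
scp /data/hadoop/etc/hadoop/core-site.xml hadoop1:/data/hadoop/etc/hadoop/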
Restart HDFS and YARN
Run on the hadoop2 node:
hdfs --daemon stop namenode
hdfs --daemon start namenode
hdfs --daemon stop datanode
hdfs --daemon start datanode
yarn --daemon stop resourcemanager
yarn --daemon start resourcemanager
yarn --daemon stop nodemanager
yarn --daemon start nodemanager
Run on the hadoop1 node:
hdfs --daemon stop namenode
hdfs --daemon start namenode
hdfs --daemon stop datanode
hdfs --daemon start datanode
yarn --daemon stop nodemanager
yarn --daemon start nodemanager
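A quick way to confirm the daemons came back up on each node:
# Expect NameNode/DataNode for HDFS and ResourceManager/NodeManager for YARN
jps | grep -E 'NameNode|DataNode|ResourceManager|NodeManager'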
Create a hive user, create the /user/hive directory on HDFS, and change its ownership
useradd hive
hdfs dfs -mkdir /user/hive
hdfs dfs -chown -R hive:supergroup /user/hive
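Hive also writes job scratch data to /tmp/hive on HDFS by default (hive.exec.scratchdir); if that directory is missing or not writable by the hive user, queries can fail. An extra step that is often needed:
# Default HDFS scratch directory for Hive jobs
hdfs dfs -mkdir -p /tmp/hive
hdfs dfs -chmod -R 777 /tmp/hive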
Start the Hive metastore and HiveServer2
11 Switch to the hive user and start the Hive metastore and HiveServer2 in the background
su hive
nohup hive --service metastore > /data/hive/logs/hive-metastore.log 2>&1 &
nohup hive --service hiveserver2 > /data/hive/logs/hiveserver2.log 2>&1 &
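Before connecting with a client, confirm both services are listening on their default ports (9083 for the metastore, 10000 for HiveServer2); startup can take a minute or two:
# Check the listening ports and skim the logs for errors
ss -lnt | grep -E ':9083|:10000'
tail -n 20 /data/hive/logs/hive-metastore.log /data/hive/logs/hiveserver2.log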
12 Connect to HiveServer2 with Beeline
[hive@hadoop2 logs]$ beeline
Beeline version 3.1.2 by Apache Hive
beeline> !connect jdbc:hive2://hadoop2:10000/default
Connecting to jdbc:hive2://hadoop2:10000/default
Enter username for jdbc:hive2://hadoop2:10000/default: hive
Enter password for jdbc:hive2://hadoop2:10000/default: ****
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://hadoop2:10000/default>
Basic Hive functionality test
create database test;
use test;
create table test(a string);
insert into test values("tom");
select * from test group by a;
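To confirm the queries actually ran on Tez rather than MapReduce, check the engine setting and a query plan over the same HiveServer2 connection (a sketch; supply a password if your authentication setting requires one):
# The explain output should show a Tez DAG instead of MapReduce stages
beeline -u jdbc:hive2://hadoop2:10000/default -n hive \
  -e "set hive.execution.engine;" \
  -e "explain select a, count(*) from test.test group by a;"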