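Use case
After setting up Hadoop by following the Hadoop fully distributed cluster setup post, what you have is a bare Hadoop with only YARN, MapReduce, and HDFS. In practice these are rarely used directly; other components of the Hadoop ecosystem are deployed alongside them. For example, data stored in HDFS is just files, which is awkward to work with. Hive maps that data onto table structures so it can be queried with SQL statements. Likewise, using the MapReduce distributed computation model directly means writing MapReduce programs, which is fairly complex. With Hive you write SQL statements, and Hive carries out the equivalent MapReduce work.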
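Steps
Note: first make sure the fully distributed Hadoop environment is up and running. Hive then only needs to be installed on the NameNode node.
1. Download the Hive package
2. Extract the archive and configure environment variables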
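# cd /opt # place the Hive package in the server's /opt directory
# tar -xzvf apache-hive-2.1.1-bin.tar.gz # extract the archive
# mv apache-hive-2.1.1-bin hive2.1.1 # rename the Hive directory to hive2.1.1
# vim /etc/profile # edit the environment variable configuration file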
export JAVA_HOME=/opt/jdk1.8
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/opt/hadoop2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HIVE_HOME=/opt/hive2.1.1
export HIVE_CONF_DIR=$HIVE_HOME/conf
export CLASSPATH=.:$HIVE_HOME/lib:$CLASSPATH
export PATH=$PATH:$HIVE_HOME/bin
# source /etc/profile # apply the changes
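A quick way to confirm that the variables above took effect is to check that each one points at an existing directory. The following is a minimal sketch only; the helper name check_home is hypothetical and is not part of Hive or Hadoop.

```shell
# check_home NAME: report whether the environment variable NAME is set
# and points to an existing directory. (Hypothetical helper.)
check_home() {
    name="$1"
    eval "value=\${$name:-}"        # look up the variable whose name is in $name
    if [ -z "$value" ]; then
        echo "$name: not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name: $value does not exist"
        return 1
    fi
    echo "$name: $value ok"
}

# Check the three homes exported in /etc/profile.
missing=0
for var in JAVA_HOME HADOOP_HOME HIVE_HOME; do
    check_home "$var" || missing=1
done
echo "missing=$missing"
```

If any line reports "not set", the shell has probably not re-read /etc/profile yet.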
3. Configure Hive
3.1 Create the hive-site.xml configuration file
# cd /opt/hive2.1.1/conf/
# cp hive-default.xml.template hive-site.xml
3.2 Create the HDFS directories
Note: create the two directories /user/hive/warehouse and /tmp/hive in HDFS, because the hive-site.xml configuration refers to them.
# hdfs dfs -mkdir -p /user/hive/warehouse # create the warehouse directory
# hdfs dfs -chmod 777 /user/hive/warehouse # set permissions on the warehouse directory
# hdfs dfs -mkdir -p /tmp/hive/ # create the /tmp/hive directory
# hdfs dfs -chmod 777 /tmp/hive # set permissions on the /tmp/hive directory
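The four commands above follow the same pattern for each directory, so they can be wrapped in a small loop. The sketch below prints the commands by default instead of running them; the RUN variable and the make_hive_dir helper are illustrative names, not part of Hadoop.

```shell
# Commands are only printed by default. Set RUN to an empty string
# (RUN= sh script.sh) to execute them on a node with a configured hdfs client.
RUN="${RUN-echo}"

# make_hive_dir PATH: create an HDFS directory and open its permissions,
# mirroring the two commands used for each directory in this post.
make_hive_dir() {
    $RUN hdfs dfs -mkdir -p "$1"
    $RUN hdfs dfs -chmod 777 "$1"
}

for dir in /user/hive/warehouse /tmp/hive; do
    make_hive_dir "$dir"
done
```

After a real run, hdfs dfs -ls -d /user/hive/warehouse /tmp/hive should list both directories with their permissions.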
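3.3 Change the temporary directories in hive-site.xml
Replace every ${system:java.io.tmpdir} with /opt/hive2.1.1/tmp (this directory must be created by hand), and every ${system:user.name} with root. The affected properties before the change: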
<property>
<name>hive.exec.local.scratchdir</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>${system:java.io.tmpdir}/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
After replacement:
<property>
<name>hive.exec.local.scratchdir</name>
<value>/opt/hive2.1.1/tmp/root</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/opt/hive2.1.1/tmp/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/opt/hive2.1.1/tmp/root</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/opt/hive2.1.1/tmp/root/operation_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
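Because hive-site.xml is copied from a large template, doing this replacement by hand is error-prone. One possible way to automate it is with sed, as sketched below. The helper name rewrite_tmp_placeholders is hypothetical, and the target directory and user name are simply the ones used in this post.

```shell
# rewrite_tmp_placeholders FILE: print FILE with the two placeholders replaced.
# ${hive.session.id} is deliberately left alone, as in the snippets above.
rewrite_tmp_placeholders() {
    sed -e 's#\${system:java\.io\.tmpdir}#/opt/hive2.1.1/tmp#g' \
        -e 's#\${system:user\.name}#root#g' "$1"
}

# Demo on a throwaway file containing one of the affected values.
demo=$(mktemp)
printf '%s\n' '<value>${system:java.io.tmpdir}/${system:user.name}/operation_logs</value>' > "$demo"
rewrite_tmp_placeholders "$demo"
rm -f "$demo"
```

For the real file, it is safer to write the result to a new file (for example hive-site.xml.new), review it, and only then move it over hive-site.xml. The local directory itself can be created with mkdir -p /opt/hive2.1.1/tmp.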
3.4 Edit hive-site.xml to set the metastore connection URL, driver, user name, and password
Metastore settings in hive-site.xml:
javax.jdo.option.ConnectionDriverName: set the value to the MySQL driver class name;
javax.jdo.option.ConnectionURL: set the value to the MySQL address;
javax.jdo.option.ConnectionUserName: set the value to the MySQL login name;
javax.jdo.option.ConnectionPassword: set the value to the MySQL login password:
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.210.70:3306/hive?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>11111</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description> Enforce metastore schema version consistency. True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
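Since these properties are scattered through a very long file, it can help to read back the values you just edited. The sketch below extracts the value of a single property with awk. It assumes that each name and value sits on its own line, as in the snippets above; get_property is a hypothetical helper name.

```shell
# get_property FILE NAME: print the <value> of the property called NAME.
# Assumes <name> and <value> each occupy a single line.
get_property() {
    awk -v want="$2" '
        index($0, "<name>" want "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Demo on a small temporary file shaped like hive-site.xml.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
EOF
get_property "$demo" javax.jdo.option.ConnectionUserName
rm -f "$demo"
```

Running get_property against /opt/hive2.1.1/conf/hive-site.xml for each of the four javax.jdo.option names gives a quick check before the schema is initialized.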
3.5 Download the MySQL driver and upload it to Hive
After downloading, upload it to the /opt/hive2.1.1/lib directory
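A missing driver jar only shows up later as a class-not-found error during schema initialization, so a quick check at this point can save time. The sketch below looks for a connector jar in the lib directory. The file-name pattern mysql-connector-*.jar is an assumption and should be adjusted to the jar you actually downloaded; has_mysql_driver is a hypothetical helper name.

```shell
# has_mysql_driver LIB_DIR: succeed if LIB_DIR holds a MySQL connector jar.
# The file-name pattern is an assumption; adjust it to your download.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-*.jar; do
        [ -f "$jar" ] && return 0
    done
    return 1
}

if has_mysql_driver "${HIVE_HOME:-/opt/hive2.1.1}/lib"; then
    echo "MySQL driver found"
else
    echo "MySQL driver missing"
fi
```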
3.6 Edit the hive-env.sh file
# cd /opt/hive2.1.1/conf
# cp hive-env.sh.template hive-env.sh
Open hive-env.sh and add the following:
export HADOOP_HOME=/opt/hadoop2.6.0
export HIVE_CONF_DIR=/opt/hive2.1.1/conf
export HIVE_AUX_JARS_PATH=/opt/hive2.1.1/lib
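If this step is scripted rather than done in an editor, the lines should be appended only when they are not already present, so that re-running the script does not duplicate them. A minimal sketch, with append_once as a hypothetical helper name:

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already there.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demo against a temporary file; on a real node, point this at
# /opt/hive2.1.1/conf/hive-env.sh instead.
env_file=$(mktemp)
for line in \
    'export HADOOP_HOME=/opt/hadoop2.6.0' \
    'export HIVE_CONF_DIR=/opt/hive2.1.1/conf' \
    'export HIVE_AUX_JARS_PATH=/opt/hive2.1.1/lib'
do
    append_once "$env_file" "$line"
    append_once "$env_file" "$line"   # second call is a no-op
done
cat "$env_file"
rm -f "$env_file"
```

The demo prints each export exactly once, even though every line is submitted twice.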
4. Start Hive
# cd /opt/hive2.1.1/bin
# schematool -initSchema -dbType mysql # initialize the metastore database
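Once the schema has been initialized, a short check can confirm that Hive is able to reach the metastore. The sketch below is one possible smoke test and is not part of the original steps: it asks schematool for the recorded schema information and then runs a single query through the hive command line. The function name smoke_test is hypothetical.

```shell
# smoke_test: verify the metastore schema and run one trivial query.
# Skips on machines where Hive is not installed.
smoke_test() {
    if ! command -v hive >/dev/null 2>&1; then
        echo "hive not found on PATH; skipping"
        return 0
    fi
    # Print the schema information recorded in MySQL, then run one query.
    schematool -info -dbType mysql || return 1
    hive -e 'SHOW DATABASES;'
}

smoke_test
```

On a fresh installation the query should list the built-in default database. If it fails with a connection error, the JDBC settings from step 3.4 are the first thing to recheck.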