Installing and Deploying Hue with Hive Integration

2023-09-14

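1 Introduction to Apache Hue

Hue is a mature open-source SQL workbench for data warehouses. Through Hue's web console in the browser, we can work with data interactively. Supported environments include Hadoop, Hive, HBase, MapReduce jobs, Solr, JDBC-accessible data sources, and more.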

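2 About Apache Hue

2.1 What Hue Is

HUE = Hadoop User Experience. Hue is an open-source UI system for Apache Hadoop. It evolved from Cloudera Desktop, which Cloudera later released as open source for the Hadoop community under the Apache License. It is built on the Python web framework Django. With Hue, you can interact with a Hadoop cluster from a web console in the browser to analyze and process data: for example, working with data on HDFS, running MapReduce jobs, executing Hive SQL statements, and browsing HBase tables.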

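2.2 What Hue Can Do

Access HDFS and browse files
Develop and debug Hive queries on the web and display their results
Query Solr, display results, and generate reports
Develop and debug Impala interactive SQL queries on the web
Develop and debug Spark jobs
Develop and debug Pig scripts
Develop and monitor Oozie jobs, and coordinate and schedule workflows
Query and modify HBase data and display it
Query Hive metadata (metastore)
View MapReduce job progress and trace logs
Create and submit MapReduce, Streaming, and Java jobs
Develop and debug Sqoop2 jobs
Browse and edit ZooKeeper data
Query and display data from databases (MySQL, PostgreSQL, SQLite, Oracle)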

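2.3 Hue Architecture

Hue is a user-friendly framework that brings many big data components together behind a single interface, so one UI is enough to inspect and run work across all of them. The functionality Hue provides is friendlier than the native interfaces of the individual Hadoop ecosystem components, but for debugging you may still need the native tools to trace a problem to its root cause.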

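3 Installing Hue

3.1 Upload and extract the archive

Hue can be installed in several ways, including from rpm packages, from a tar.gz archive, or through Cloudera Manager. Here we install from the tar.gz archive. Hue archives can be downloaded from: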

http://archive.cloudera.com/cdh5/cdh/5/

We use the build that matches CDH 5.14.0. The direct download link is:

http://archive.cloudera.com/cdh5/cdh/5/hue-3.9.0-cdh5.14.0.tar.gz

cd /export/servers/
tar -zxvf hue-3.9.0-cdh5.14.0.tar.gz

3.2 Preparing for the build

3.2.1 Install the required dependencies (network access needed)

yum install -y asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make openldap-devel python-devel sqlite-devel gmp-devel

3.2.2 Initial Hue configuration

cd /export/servers/hue-3.9.0-cdh5.14.0/desktop/conf
vim hue.ini
# General settings
[desktop]
secret_key=jFE93j;2[290-eiw.KEiwN2s3['d;/.q[eIW^y#e=+Iei*@Mn<qW5o
http_host=node-1
is_hue_4=true
time_zone=Asia/Shanghai
server_user=root
server_group=root
default_user=root
default_hdfs_superuser=root
# Use MySQL as Hue's backing database (around line 587 of hue.ini)
[[database]]
engine=mysql
host=node-1
port=3306
user=root
password=Hadoop
name=hue
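
The secret_key above is a fixed string copied into the configuration. For a fresh deployment it is generally safer to generate your own random value. The following is a minimal Python sketch of one way to do that; the function name, the default length, and the character set are assumptions made for illustration and are not taken from Hue itself.

```python
import secrets
import string

# Characters used for the key. '#', ';' and '%' are left out as a precaution,
# since some config parsers treat them specially (an assumption, not a
# documented Hue requirement).
ALPHABET = string.ascii_letters + string.digits + "!@$^&*()-_=+"


def generate_secret_key(length: int = 60) -> str:
    """Return a random string that can be pasted in as the secret_key value."""
    if length < 30:
        raise ValueError("use at least 30 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    # Print a line that can be copied into the [desktop] section of hue.ini.
    print("secret_key=" + generate_secret_key())
```

Each run prints a different key; paste the printed line into the [desktop] section in place of the sample value.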

3.2.3 Create the database Hue uses in MySQL

create database hue default character set utf8 default collate utf8_general_ci;

3.3 Build Hue

cd /export/servers/hue-3.9.0-cdh5.14.0
make apps

After a successful build, a number of initial tables are created in the hue database.

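3.4 Start Hue and open the web UI

cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

Web UI address:

http://node-1:8888

On the first visit you are asked to set the superuser name and password. Make a note of them.

To stop Hue, press Ctrl+C in the terminal window where it is running.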

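4 Integrating Hue with Other Components

4.1 Integrating Hue with HDFS

Note: after changing the HDFS configuration, copy the modified files to every machine in the cluster with scp and restart the HDFS cluster.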

4.1.1 Edit core-site.xml

<!-- Hosts from which the proxy user root may access HDFS (via HttpFS/WebHDFS) -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<!-- User groups the proxy user root may act for when accessing HDFS -->
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
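
Before restarting the cluster, it can help to confirm that both proxyuser properties actually ended up in the file. The sketch below parses Hadoop-style configuration XML with Python's standard library and reports any missing proxyuser keys. The function names and the inline SAMPLE string are illustrative assumptions; in practice you would read the text of your own core-site.xml.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict:
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_proxyuser_settings(props: dict, user: str = "root") -> list:
    """Return the proxyuser keys for `user` that are absent from props."""
    required = [
        f"hadoop.proxyuser.{user}.hosts",
        f"hadoop.proxyuser.{user}.groups",
    ]
    return [key for key in required if key not in props]


# Illustrative stand-in for the contents of core-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    # An empty list means both settings are present.
    print(missing_proxyuser_settings(read_properties(SAMPLE)))
```

The same helper can be called with user="hbase" to check the proxy settings used for the HBase integration later in this article.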

4.1.2 Edit hdfs-site.xml

<property>
 <name>dfs.webhdfs.enabled</name>
 <value>true</value>
</property>

4.1.3 Edit hue.ini

cd /export/servers/hue-3.9.0-cdh5.14.0/desktop/conf
vim hue.ini
# These settings live inside the [hadoop] section
[[hdfs_clusters]]
 [[[default]]]
 fs_defaultfs=hdfs://node-1:9000
 webhdfs_url=http://node-1:50070/webhdfs/v1
 hadoop_hdfs_home=/export/servers/hadoop-2.7.5
 hadoop_bin=/export/servers/hadoop-2.7.5/bin
 hadoop_conf_dir=/export/servers/hadoop-2.7.5/etc/hadoop
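
The webhdfs_url value is the base of the WebHDFS REST API that Hue uses to browse files. As a rough illustration of how a request against that base is formed, the sketch below builds a LISTSTATUS URL for a given path and user. The helper name is an assumption for illustration; op and user.name are standard WebHDFS query parameters.

```python
from urllib.parse import urlencode


def webhdfs_request_url(webhdfs_url: str, path: str, op: str, user: str) -> str:
    """Build a WebHDFS REST URL for `path` under the configured base URL."""
    base = webhdfs_url.rstrip("/")
    query = urlencode({"op": op, "user.name": user})
    return f"{base}/{path.lstrip('/')}?{query}"


if __name__ == "__main__":
    # Uses the webhdfs_url value from the hue.ini fragment above.
    print(webhdfs_request_url(
        "http://node-1:50070/webhdfs/v1", "/", "LISTSTATUS", "root"))
```

Opening the printed URL in a browser or with curl, once HDFS has been restarted, should return a JSON listing of the root directory if WebHDFS is enabled.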

4.1.4 Restart HDFS and Hue

start-dfs.sh
cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

4.2 Integrating Hue with YARN

4.2.1 Edit hue.ini

[[yarn_clusters]]
 [[[default]]]
 resourcemanager_host=node-1
 resourcemanager_port=8032
 submit_to=True
 resourcemanager_api_url=http://node-1:8088
 history_server_api_url=http://node-1:19888
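
Both API URLs above point at REST services that can be probed independently of Hue. As a small sketch, the helper below derives the cluster-info and history-info endpoints from the two configured values. The function name is hypothetical; the /ws/v1/cluster/info and /ws/v1/history/info paths come from the YARN ResourceManager and MapReduce History Server REST APIs.

```python
def yarn_check_urls(resourcemanager_api_url: str, history_server_api_url: str) -> dict:
    """Return REST endpoints that can be requested to confirm both services answer."""
    return {
        "resourcemanager": resourcemanager_api_url.rstrip("/") + "/ws/v1/cluster/info",
        "history_server": history_server_api_url.rstrip("/") + "/ws/v1/history/info",
    }


if __name__ == "__main__":
    # Values taken from the [[yarn_clusters]] fragment above.
    urls = yarn_check_urls("http://node-1:8088", "http://node-1:19888")
    for name, url in urls.items():
        print(name, url)
```

If either URL does not respond, Hue's job browser is unlikely to work, so this is a quick first check before looking at Hue's own logs.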

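4.2.2 Enable YARN log aggregation

MapReduce tasks run on many machines, and the logs they produce stay on each of those machines. To view them in one place, YARN can collect the logs and store them centrally on HDFS. This process is called log aggregation. The two properties below belong in yarn-site.xml.

<!-- Whether to enable log aggregation. -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- How long to keep aggregated logs, in seconds. -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>106800</value>
</property>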
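
Because the retention period is given in raw seconds, it is easy to misjudge how long logs are actually kept. The sketch below converts in both directions; the helper names are assumptions for illustration. It shows that the value 106800 used above is 1 day, 5 hours and 40 minutes, and that a full week would be 604800.

```python
def to_seconds(days: int = 0, hours: int = 0, minutes: int = 0) -> int:
    """Convert a retention period into the seconds value YARN expects."""
    return days * 86400 + hours * 3600 + minutes * 60


def breakdown(seconds: int) -> tuple:
    """Split a seconds value into (days, hours, minutes, seconds)."""
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return days, hours, minutes, secs


if __name__ == "__main__":
    print(breakdown(106800))   # the value configured above
    print(to_seconds(days=7))  # a one-week retention period
```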

4.2.3 Restart YARN and Hue

build/env/bin/supervisor

4.3 Integrating Hue with Hive

To integrate Hue with Hive, both the Hive metastore service and the HiveServer2 service must be running (Impala depends on the Hive metastore service; Hue depends on Hive's HiveServer2 service).

4.3.1 Edit hue.ini

[beeswax]
 hive_server_host=node-1
 hive_server_port=10000
 hive_conf_dir=/export/servers/hive/conf
 server_conn_timeout=120
 auth_username=root
 auth_password=123456
[metastore]
 # Allow creating databases, tables, and similar objects in Hive from Hue
 enable_new_create_table=true

4.3.2 Start the Hive services and restart Hue

On node-1, start the Hive metastore and HiveServer2 services:

cd /export/servers/hive
nohup bin/hive --service metastore &
nohup bin/hive --service hiveserver2 &
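
Both services are started in the background and can take a while before they accept connections, so restarting Hue immediately may lead to connection errors. The following is a minimal sketch of a wait-for-port check in Python. The function names, attempt count, and delay are illustrative assumptions; port 10000 matches hive_server_port above, and 9083 is the default metastore port, which may differ in your installation.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll host:port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For example, calling wait_for_port("node-1", 10000) and wait_for_port("node-1", 9083) before restarting Hue gives HiveServer2 and the metastore up to about a minute each to come up.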

Restart Hue:

cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

4.4 Integrating Hue with MySQL

4.4.1 Edit hue.ini

Uncomment the mysql section, located around line 1546 of hue.ini:
[[[mysql]]]
 nice_name="My SQL DB"
 engine=mysql
 host=node-1
 port=3306
 user=root
 password=hadoop

4.4.2 Restart Hue

cd /export/servers/hue-3.9.0-cdh5.14.0/
build/env/bin/supervisor

4.5 Integrating Hue with Oozie

See the separate article: Big Data Oozie Job Scheduling.

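4.6 Integrating Hue with HBase

4.6.1 Edit the HBase configuration

Add the following to hbase-site.xml to enable the HBase Thrift service.

After editing, copy the file with scp to the HBase installation on the other machines.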

<property>
<name>hbase.thrift.support.proxyuser</name>
<value>true</value>
</property>
<property>
<name>hbase.regionserver.thrift.http</name>
<value>true</value>
</property>

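4.6.2 Edit the Hadoop configuration

In core-site.xml, make sure HBase is authorized as a proxy user by adding the following.

Copy the modified file with scp to the other machines and to the conf directory of the HBase installation.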

<property>
<name>hadoop.proxyuser.hbase.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hbase.groups</name>
<value>*</value>
</property>

4.6.3 Edit the Hue configuration

[hbase]
 # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
 # Use full hostname with security.
 # If using Kerberos we assume GSSAPI SASL, not PLAIN.
 hbase_clusters=(Cluster|node-1:9090)
 # HBase configuration directory, where hbase-site.xml is located.
 hbase_conf_dir=/export/servers/hbase-1.2.1/conf
 # Hard limit of rows or columns per row fetched before truncating.
 ## truncate_limit = 500
 # 'buffered' is the default of the HBase Thrift Server and supports security.
 # 'framed' can be used to chunk up responses,
 # which is useful when used in conjunction with the nonblocking server in Thrift.
 thrift_transport=buffered
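
The hbase_clusters value packs a display name, host, and Thrift port into one string, and a misplaced bracket or separator is an easy mistake. As a sanity check, the sketch below splits a value in the '(name|host:port)' format into its parts. The regular expression and function name are assumptions written for this illustration and are not Hue's own parser.

```python
import re

# Matches one "(name|host:port)" entry; several entries may be comma-separated.
CLUSTER_RE = re.compile(r"\(([^|()]+)\|([^:()]+):(\d+)\)")


def parse_hbase_clusters(value: str) -> list:
    """Return a list of (name, host, port) tuples found in an hbase_clusters value."""
    return [
        (name.strip(), host.strip(), int(port))
        for name, host, port in CLUSTER_RE.findall(value)
    ]


if __name__ == "__main__":
    # Value taken from the [hbase] fragment above.
    print(parse_hbase_clusters("(Cluster|node-1:9090)"))
```

An empty result for a non-empty value suggests the entry does not follow the expected format.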

4.6.4 Start HBase (including the Thrift service) and Hue

Start HDFS and HBase first, then start the Thrift server.

start-dfs.sh

start-hbase.sh

hbase-daemon.sh start thrift