Hive: A Detailed Introduction, Installation, and Deployment - Alibaba Cloud Developer Community

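Introduction:

I. What is Hive?

1. Hive is a data warehouse tool built on top of Hadoop.

2. It maps structured data files to database tables and provides SQL-like query capabilities.

3. It converts SQL statements into MapReduce jobs for execution.

4. It can be used for extract, transform, load (ETL) work.

5. Hive is a SQL parsing engine: it translates SQL statements into M/R jobs and runs them on Hadoop.

A Hive table is really just a directory (folder) in HDFS.

The data in a Hive table is the set of files in that HDFS directory, with one directory per table name. For a partitioned table, each partition value is a subdirectory, and M/R jobs can use this data directly.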

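6. Advantages and disadvantages of Hive:

Advantage: SQL-like statements make it quick to implement simple MapReduce statistics, with no need to develop a dedicated MapReduce application.

Disadvantage: real-time queries are not supported.

7. Hive data is divided into the actual stored data and the metadata.

The actual data is stored in HDFS; in this setup, the metadata is stored in MySQL.

metastore: the database that stores the metadata

Hive stores its metadata in a database such as MySQL or Derby.

Hive metadata includes table names, each table's columns and partitions with their attributes, table attributes (for example, whether the table is external), and the directory where each table's data resides.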
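The table-to-directory mapping from item 5 can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the helper names table_dir and partition_dir are hypothetical, the tables are assumed to live in the default database, and WAREHOUSE_DIR is assumed to be Hive's default warehouse location (/user/hive/warehouse).

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print the HDFS directory that backs a table or one of its partitions.
# Assumption: tables are in the default database under Hive's default warehouse directory.
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/user/hive/warehouse}"

table_dir() {
    # $1 = table name
    printf '%s/%s\n' "$WAREHOUSE_DIR" "$1"
}

partition_dir() {
    # $1 = table name, $2 = partition column, $3 = partition value
    # A partition is a key=value subdirectory of the table directory.
    printf '%s/%s=%s\n' "$(table_dir "$1")" "$2" "$3"
}

table_dir teacher
partition_dir td_part logdate 2016-12-05
```

Because the layout is just directories and files, other tools such as M/R jobs can read the same paths without going through Hive.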

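II. Hive architecture:

User interfaces: CLI (shell), JDBC/ODBC, and WebUI (through a browser).

Metadata storage: usually a relational database such as MySQL or Derby.

The interpreter, compiler, optimizer, and executor take an HQL query through syntax analysis, compilation, optimization, and query plan generation. The generated query plan is stored in HDFS and subsequently executed by MapReduce.

Hadoop: HDFS is used for storage and MapReduce for computation. (A plain star query such as select * from teacher does not generate a MapReduce job; it only performs a full table scan.)

Important:

Hadoop, ZooKeeper, Spark, Kafka, and MySQL must already be running normally.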

III. Installing and deploying Hive

Base dependencies:

    1. jdk 1.6+

    2. hadoop 2.x

    3. hive 0.13-0.19

    4. mysql (mysql-connector-jar)

The installation details are as follows:

 
    #java
    export JAVA_HOME=/soft/jdk1.7.0_79/
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

    #hadoop
    export HADOOP_HOME=/usr/local/hadoop/hadoop

    #scala
    export SCALA_HOME=/usr/local/hadoop/scala

    #spark
    export SPARK_HOME=/usr/local/hadoop/spark

    #hive
    export HIVE_HOME=/usr/local/hadoop/hive

    #bin (set last, so every *_HOME variable above is already defined when PATH is expanded)
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$HIVE_HOME/bin
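Before going further it can help to confirm that the exported variables are actually visible in the current shell. The following is a minimal sketch; check_env is a hypothetical helper name, and the list of variables it inspects is an assumption based on the exports in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the required variables are unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_env() {
    local missing=0 name value
    for name in JAVA_HOME HADOOP_HOME HIVE_HOME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_env || echo "fix the exports above before starting Hive"
```

An unset *_HOME variable silently turns an entry such as $HADOOP_HOME/bin into plain /bin on the PATH, so a check like this catches the problem early.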

I. Installation:

1. Download:

    https://hive.apache.org/downloads.html

Extract:

    tar xvf apache-hive-2.1.0-bin.tar.gz -C /usr/local/hadoop/
    cd /usr/local/hadoop/
    mv apache-hive-2.1.0-bin hive

2. Modify the configuration

Modify the startup environment:

    cd /usr/local/hadoop/hive
    vim bin/hive-config.sh

    #java
    export JAVA_HOME=/soft/jdk1.7.0_79/

    #hadoop
    export HADOOP_HOME=/usr/local/hadoop/hadoop

    #hive
    export HIVE_HOME=/usr/local/hadoop/hive

Modify the default configuration file:

    cd /usr/local/hadoop/hive
    vim conf/hive-site.xml

    <configuration>
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>xujun</value>
        <description>password to use against metastore database</description>
      </property>
    </configuration>
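If the file is generated rather than edited by hand, the same four properties can be written from a script. The sketch below is an illustration only: write_hive_site is a hypothetical helper, port 3306 and the database name hive are taken from the configuration above, and the password placeholder is made up. It also assumes the values contain no characters that need XML escaping (such as & or <).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal hive-site.xml for a MySQL-backed metastore.
# $1 = output file, $2 = MySQL host, $3 = metastore user, $4 = metastore password
# Assumption: none of the values contain characters that need XML escaping.
write_hive_site() {
    local out="$1" host="$2" user="$3" pass="$4"
    cat > "$out" <<EOF
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${host}:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${pass}</value>
  </property>
</configuration>
EOF
}

# Write to a temporary file first so an existing conf/hive-site.xml is not overwritten.
tmp_conf="$(mktemp)"
write_hive_site "$tmp_conf" master hive 'placeholder-password'
cat "$tmp_conf"
```

Once the output looks right, the temporary file can be copied to conf/hive-site.xml under the Hive directory.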

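3. Modify the tmp dir

Change the value of every configuration item that contains "system:java.io.tmpdir" to the following path:

/tmp/hive

4. Install the MySQL driver

Download the driver mysql-connector-java-5.1.40.zip from the official MySQL website.

unzip mysql-connector-java-5.1.40.zip

cp mysql-connector-java-5.1.40-bin.jar /usr/local/hadoop/hive/lib/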

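II. Install MySQL and start it

1. Create the database

    create database hive;
    -- the password set here must match javax.jdo.option.ConnectionPassword in hive-site.xml
    grant all on *.* to hive@'%' identified by 'hive';
    flush privileges;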

III. Initialize Hive (initialize the metadata)

    cd /usr/local/hadoop/hive
    bin/schematool -initSchema -dbType mysql

    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL: jdbc:mysql://hadoop3:3306/hive?createDatabaseInfoNotExist=true
    Metastore Connection Driver : com.mysql.jdbc.Driver
    Metastore connection User: hive
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.mysql.sql
    Initialization script completed
    schemaTool completed

IV. Start Hive

    [hadoop@hadoop1 hadoop]$ hive/bin/hive
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin://soft/jdk1.7.0_79//bin:/bin:/bin:/bin:/usr/local/hadoop/hive/bin:/home/hadoop/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hadoop/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

    Logging initialized using configuration in jar:file:/usr/local/hadoop/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
    hive> show databases;
    OK
    default
    Time taken: 1.184 seconds, Fetched: 1 row(s)
    hive>
        
V. Hands-on practice

Creating tables with Hive

The following two settings apply only to the current terminal session.

1. Show the current database name in the prompt:

    hive> set hive.cli.print.current.db=true;
    hive (default)>

2. Include the column names in the results of select queries:

    hive (default)> set hive.cli.print.header=true;

3. Create tables and import data

    hive> create table teacherq(id bigint,name string) row format delimited fields terminated by '\t';
    OK
    hive> create table people (id int ,name string);
    OK
    Time taken: 3.363 seconds
    hive> SHOW TABLES;
    OK
    people
    teacherq
    student
    Time taken: 0.283 seconds, Fetched: 1 row(s)

Import data:

    hive> load data local inpath '/root/stdent.txt' into table teacherq;

Note: if you start hive as a regular user, use a relative path to import local data:

    mv stdent.txt /usr/local/hadoop/hive/
    cd /usr/local/hadoop/hive

    > load data local inpath 'stdent.txt' into table teacherq;
    Loading data to table default.teacherq
    OK
    Time taken: 2.631 seconds
    hive> select * from teacherq;
    OK
    1   zhangsan
    2   lisi
    3   wangwu
    4   libai
    Time taken: 1.219 seconds, Fetched: 4 row(s)
    hive>
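The load above only works cleanly if the local file really uses a tab between the id and the name, because the table was declared with fields terminated by '\t'. A minimal sketch of producing and checking such a file follows; write_tsv is a hypothetical helper, and the id:name argument format is just a convention chosen for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write "id<TAB>name" lines, the layout expected by
# "row format delimited fields terminated by '\t'".
# $1 = output file; the remaining arguments are id:name pairs.
write_tsv() {
    local out="$1" pair
    shift
    : > "$out"    # create the file, or truncate it if it already exists
    for pair in "$@"; do
        # ${pair%%:*} is the text before the first colon, ${pair#*:} the text after it.
        printf '%s\t%s\n' "${pair%%:*}" "${pair#*:}" >> "$out"
    done
}

work_dir="$(mktemp -d)"
write_tsv "$work_dir/stdent.txt" 1:zhangsan 2:lisi 3:wangwu 4:libai

# Sanity check: every line must have exactly two tab-separated fields.
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad + 0 }' "$work_dir/stdent.txt" \
    && echo "stdent.txt is well-formed"
```

A file that uses spaces instead of tabs would still load, but every row would end up with the whole line in the first column (or NULL for a numeric column), so the check is worth running before the load.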

4. Create a table (an internal table by default)

Suitable when the table is created first and the data is loaded afterwards.

    create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t';

Loading data into a regular (default) table:

    load data local inpath '/root/student.txt' into table student;

Create an external table

Suitable when the data already exists in HDFS and the table is created afterwards to query, analyze, and manage it.

    create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by '\t' location '/td_ext';

Loading data into an external table:

    load data local inpath '/root/student.txt' into table student;

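Create a partitioned table

Method 1: create the partitioned table first, then load the data

A partition assists queries: it narrows the range a query has to scan, speeds up data retrieval, and lets data be managed according to defined rules and conditions.

    create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by '\t';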

Loading data into a partitioned table

load data local inpath '/root/data.am' into table beauty partition (nation="USA");

hive (itcast)> select * from beat;

beat.id    beat.name    beat.size    beat.nation

1          glm          22.0         china

2          slsl         21.0         china

3          sdsd         20.0         china

NULL       www          19.0         china

Time taken: 0.22 seconds, Fetched: 4 row(s)

Method 2: create the directory in HDFS first, import the data, and finally update Hive's metadata

1. Create the partition directory

hive (itcast)> dfs -mkdir /beat/nation=japan;

hive (itcast)> dfs -ls /beat;

Found 2 items

drwxr-xr-x - hadoop supergroup 0 2016-12-05 16:07 /beat/nation=china

drwxr-xr-x - hadoop supergroup 0 2016-12-05 16:16 /beat/nation=japan

2. Put data into the partition directory

hive (itcast)> dfs -put d.c /beat/nation=japan;

Querying at this point shows that the data has not been loaded yet: the file is in HDFS, but Hive does not know about the new partition.

hive (itcast)> dfs -ls /beat/nation=japan;

Found 1 items

-rw-r--r-- 3 hadoop supergroup 20 2016-12-05 16:16 /beat/nation=japan/d.c

hive (itcast)> select * from beat;

beat.id    beat.name    beat.size    beat.nation

1          glm          22.0         china

2          slsl         21.0         china

3          sdsd         20.0         china

NULL       www          19.0         china

Time taken: 0.198 seconds, Fetched: 4 row(s)

3. Manually alter the Hive table to add the partition information

hive (itcast)> alter table beat add partition (nation='japan') location "/beat/nation=japan";

Time taken: 0.089 seconds

hive (itcast)> select * from beat;

beat.id    beat.name    beat.size    beat.nation

1          glm          22.0         china

2          slsl         21.0         china

3          sdsd         20.0         china

NULL       www          19.0         china

7          ab           111.0        japan

8          rb           23234.0      japan

Time taken: 0.228 seconds, Fetched: 6 row(s)

The data is now loaded.
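When many partition directories are created this way, the ALTER TABLE statements can be generated from the directory names instead of being typed by hand. The sketch below is illustrative only: add_partition_sql is a hypothetical helper, and it assumes the last path component has the key=value form used above and that the value contains no single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a partition directory such as /beat/nation=japan
# into the ALTER TABLE statement that registers it in the metastore.
# $1 = table name, $2 = partition directory (last component must be key=value)
add_partition_sql() {
    local table="$1" dir="$2"
    local leaf="${dir##*/}"     # last path component, e.g. nation=japan
    local key="${leaf%%=*}"     # text before the first '=', e.g. nation
    local value="${leaf#*=}"    # text after the first '=', e.g. japan
    printf "ALTER TABLE %s ADD PARTITION (%s='%s') LOCATION '%s';\n" \
        "$table" "$key" "$value" "$dir"
}

add_partition_sql beat /beat/nation=japan
```

The printed statement can be pasted into the Hive CLI. Hive also provides MSCK REPAIR TABLE, which scans a table's directory and registers partitions that exist in HDFS but are missing from the metastore.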

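Dropping a partition

A partition can be dropped with ALTER TABLE DROP PARTITION. For an internal table, both the partition's metadata and its data are deleted.

Example:

ALTER TABLE beat DROP PARTITION (nation='japan');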

Special case:

1. A column of the table also needs to serve as the partition column. By default Hive does not allow this:

hive (itcast)> create table sms(id bigint ,content string,area string) partitioned by (area string) row format delimited fields terminated by '\t' ;

FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns

Solution:

Create a redundant column, that is, partition on a separately named column such as area_pat, or modify the Hive source code.

hive (itcast)> create table sms(id bigint ,content string,area string) partitioned by (area_pat string) row format delimited fields terminated by '\t' ;
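With the redundant-column approach, the partition value simply repeats what is already in the area column, so a source file can be split by that column before loading. The following is a rough sketch under several assumptions: split_by_area and the part.txt file name are hypothetical, the input is a tab-separated file with the columns id, content, area, and the area values contain no whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a tab-separated sms file (id, content, area) into one
# directory per area value, named the way a partition on area_pat would be.
# $1 = input file, $2 = output root directory
# Assumption: area values contain no whitespace.
split_by_area() {
    local in="$1" root="$2" area
    for area in $(cut -f3 "$in" | sort -u); do
        mkdir -p "$root/area_pat=$area"
        # Keep the full row, including the area column, so the table column stays populated.
        awk -F'\t' -v a="$area" '$3 == a' "$in" > "$root/area_pat=$area/part.txt"
    done
}

# Made-up sample data, for illustration only.
work_dir="$(mktemp -d)"
printf '1\thello\tbeijing\n2\thi\tshanghai\n3\they\tbeijing\n' > "$work_dir/sms.txt"
split_by_area "$work_dir/sms.txt" "$work_dir/sms"
ls "$work_dir/sms"
```

Each resulting file could then be loaded with load data ... partition (area_pat='...'), one statement per directory.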

This article is reposted from crazy_charles's 51CTO blog. Original link: http://blog.51cto.com/douya/1878779. To republish it, please contact the original author.
