Hive学习-数据定义语句-阿里云开发者社区

一、概要

1. Hive SQL和SQL之间的几个主要区别

1)存储位置：Hive的数据存储在Hadoop上，而像MySQL这样的关系型数据库将数据存储在设备或本地系统中。

2)Hive不支持等值连接，通常使用left join、right join 或者inner join替代。例如，在SQL中内关联可以这样写：'select * from a, b where a.key = b.key'，而在Hive中则不能这样做。

3)Hive不支持“Insert into 表 Values ()”、UPDATA、DELETE操作，而标准的SQL是支持这些操作的。

4)Hive不支持事务，这与许多关系型数据库的支持事务的特性不同。

2 常见Hive SQL语句

创建库：create database 库名；

删除库：drop database 库名；

修改库：alter database 库名 set dbproperties('键'='值');

查看库：desc database 库名；

创建表：create table 表名；

删除表：drop table 表名；

修改表：alter table 表名 SET TBLPROPERTIES('external'='true'修改为外部表/'external'='false'修改为内部表);

查看表：select * from 表名；

查看表类型：desc formatted 表名；

插入数据：insert into table 表名 partition(分区位置) values(值1),(值2);

删除表中的数据：truncate table 表名；

追加覆盖数据：insert overwrite table 表名 partition(分区位置) select 列名1，列名2 from 表名 where 位置;

查询数据：select * from 表名；

二、对数据库的操作

1 创建数据库

代码语言：javascript

复制

hive>create database if not exists table_name [WITH DBPROPERTIES('createtime'='20240329')] [location 'home/myhive2'];

说明：hive的表默认存放位置模式是由hive-site.xml当中的一个属性指定的 :hive.metastore.warehouse.dir创建数据库并指定hdfs存储位置。

2修改数据库

代码语言：javascript

复制

hive>alter database table_name set dbproperties('createtime'='20240329');

来修改数据库的一些属性。但是数据库的元数据信息是不可更改的，包括数据库的名称以及数据可以使用alter database 命令库所在的位置。

3查看数据库信息

1）查看数据库基本信息

代码语言：javascript

复制

hive>desc database table_name;

2）查看数据库详细信息

代码语言：javascript

复制

hive>desc database extended table_name;

4删除数据库

1）删除一个空数据库

如果数据库下面有数据表，那么就会报错。

代码语言：javascript

复制

hive> drop database table_name;

2）强制删除数据库

包含数据库下面的表一起删除。

代码语言：javascript

复制

hive> drop database table_name  cascade;

5是否显示当前数据库？

代码语言：javascript

复制

hive>set hive.cli.print.current.db=true;

三、对数据表的操作

1 创建数据表语法

代码语言：javascript

复制

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
-- (注意：Hive 0.14.0及更高版本中提供临时)
  [(col_name data_type [column_constraint_specification] [COMMENT col_comment], ... [constraint_specification])]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) 
-- (注意：Hive 0.10.0及更高版本中可用)]
     ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
     [STORED AS DIRECTORIES]    
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
   | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]
-- (注意：以及Hive 0.6.0以及更高版本中可用)
  ]
[LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)]
-- (注意：Hive 0.6.0以及更高版本中可用)  [AS select_statement]; 
-- (注意：Hive 0.5.0及更高版本中可用,不支持外表)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];

2 内部表

1）创建内部表案例

案例1：创建内部表（管理表）

代码语言：javascript

复制

hive>use demo; -- 使用myhive数据库
hive> create table student(id int comment 'student id',
                         > name string comment 'student name')
TBLPROPERTIES('created'='jery', 'created_at'='2024-8-20 11:49:00') ;
hive> insert into student values (1,"zhangsan");    
hive> insert into student values (1,"zhangsan"),(2,"lisi");  -- 一次插入多条数据
hive> select * from student;

注意：创建表后会自动增加两个属性，last_modified_by和last_modified_time

案例2：创建表并指定字段之间的分隔符

代码语言：javascript

复制

create  table if not exists student2(id int ,name string)
row format delimited fields terminated by '\t' --Tab键分割字段
stored as textfile --存储文件类型
location '/user/student2'; --存储路径

案例3：根据查询结果创建表

Create Table As Select (CTAS)

代码语言：javascript

复制

create table student3 as select * from student2;

注意：不能用于外部表。

案例4：根据已经存在的表结构创建表。

代码语言：javascript

复制

create table student4 like student3;
hive> select * from student3;
OK
student3.id       student3.name
1       zhangsan
2       lisi
Time taken: 0.093 seconds, Fetched: 2 row(s)
hive> select * from student4;
OK    
student4.id       student4.name
Time taken: 0.061 seconds
hive>

可以看出student3 有数据有结构，student4只有结构无数据。

案例5：一个完整案例

下面给出创建一个表，注入数据，查询一个完整的例子。注入数据，查询具体内容在后面介绍。

（1）创建一个表

代码语言：javascript

复制

CREATE TABLE person(
id INT,
name STRING,
age INT,
likes ARRAY,
address MAP
 )
ROW FORMAT DELIMITED--行格式分隔
FIELDS TERMINATED BY ','--字段以逗号分割
COLLECTION ITEMS TERMINATED BY '-' --集合分隔符-
MAP KEYS TERMINATED BY ':' --MAP以冒号分开
LINES TERMINATED BY '\n'; --行结束以回车

（2）建立数据文件

代码语言：javascript

复制

#mkdir -p /var/bigdata/hive/    
#cd /var/bigdata/hive/
#gedit data.txt

代码语言：javascript

复制

1,elite0,10,basketball-music-dance,adderss:xx
2,elite1,20,basketball-music-dance,adderss:xx
3,elite2,10,basketball-music-dance,adderss:xx
4,elite3,20,basketball-music-dance,adderss:xx
5,elite4,10,basketball-music-dance,adderss:xx
6,elite5,20,basketball-music-dance,adderss:xx

（3）查询

代码语言：javascript

复制

hive> select * from person;
OK
1       elite0       10       ["basketball","music","dance"]       {"adderss":"xx"}
2       elite1       20       ["basketball","music","dance"]       {"adderss":"xx"}
3       elite2       10       ["basketball","music","dance"]       {"adderss":"xx"}
4       elite3       20       ["basketball","music","dance"]       {"adderss":"xx"}
5       elite4       10       ["basketball","music","dance"]       {"adderss":"xx"}
6       elite5       20       ["basketball","music","dance"]       {"adderss":"xx"}
Time taken: 0.862 seconds, Fetched: 6 row(s)

2）查询表的结构

（1）只查询表内字段及属性

语法

代码语言：javascript

复制

hive>desc table_name;

案例

代码语言：javascript

复制

hive> desc student;
OK
id      int student id   
name    string     student name
Time taken: 0.042 seconds, Fetched: 2 row(s)

（2）查询详细查询(一)

语法

代码语言：javascript

复制

desc formatted table_name;

案例

代码语言：javascript

复制

desc formatted student;
OK
# col_name            data_type           comment      
id      int          student id   
name    string       student name
# Detailed Table Information
Database:           demo
Owner:root
CreateTime:         Tue Aug 20 11:53:05 CST 2024
LastAccessTime:     UNKNOWN
Retention:          0
Location:           hdfs://localhost:9000/user/hive/warehouse/demo.db/student    
Table Type:         MANAGED_TABLE       
Table Parameters:
COLUMN_STATS_ACCURATE       {\"BASIC_STATS\":\"true\"}
       created             jery
       created_at          2024-8-20 11:49:00
       numFiles            1
       numRows             2
       rawDataSize         16
       totalSize           18
       transient_lastDdlTime       1724126030
# Storage Information
SerDe Library:    org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:        org.apache.hadoop.mapred.TextInputFormat
OutputFormat:       org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:         No
Num Buckets:        -1
Bucket Columns:            []
Sort Columns:       []
Storage Desc Params:
serialization.format       1    
Time taken: 0.044 seconds, Fetched: 33 row(s)

（3）详细详细查询(二)

语法

代码语言：javascript

复制

hive> desc extended table_name;

案例

代码语言：javascript

复制

hive> desc extended person;
OK
id                 int
name               string
age                int
likes              array
address            map
                
Detailed Table Information       Table(tableName:person, dbName:demo, owner:root, createTime:1723004992, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:age, type:int, comment:null),
comment:null)], location:hdfs://localhost:9000/user/hive/warehouse/demo.db/person, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1,    
serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{colelction.delim=-, mapkey.delim=:, serialization.format=,, line.delim=        
, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1723005757, totalSize=276, numRows=0, rawDataSize=0, numFiles=1}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)               
Time taken: 0.054 seconds, Fetched: 8 row(s)

formatted比extended显示更详细内容，Formatted用得更多。

3）查询创建表的语句

语法

代码语言：javascript

复制

hive> show create table table_name;

案例

代码语言：javascript

复制

hive> show create table person;
OK
createtab_stmt
CREATE TABLE `person`(
  `id` int, 
  `name` string, 
  `age` int, 
  `likes` array<string>, 
  `address` map<string,string>)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'colelction.delim'='-', 
  'field.delim'=',', 
  'line.delim'='\n', 
  'mapkey.delim'=':', 
  'serialization.format'=',') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/demo.db/person'
TBLPROPERTIES (
  'transient_lastDdlTime'='1723005757')
Time taken: 0.169 seconds, Fetched: 22 row(s)

4）显示数据库中有哪些表

（1) 显示当前库内的表

代码语言：javascript

复制

hive>show tables;

（2) 显示某个库内的表

代码语言：javascript

复制

hive>show tables IN db_name;

（3) 模糊查询显示当前库内的表

代码语言：javascript

复制

hive> show tables 'stu.*';

注意：（2)，（3)不能同时使用。

5）字段类型

分类	类型	描述	字面量示例
原始类型	BOOLEAN	true/false	TRUE
	TINYINT	1字节的有符号整数 -128~127	1Y
	SMALLINT	2个字节的有符号整数，-32768~32767	1S
	INT	4个字节的带符号整数	1
	BIGINT	8字节带符号整数	1L
	FLOAT	4字节单精度浮点数1.0	1.0
	DOUBLE	8字节双精度浮点数	1.0
	DEICIMAL	任意精度的带符号小数	1.11
	STRING	字符串，变长	"a",'b'
	VARCHAR	变长字符串	"a",'b'
	CHAR	固定长度字符串	"a",'b'
	BINARY	字节数组	无法表示
	TIMESTAMP	时间戳，毫秒值精度	①
	DATE	日期	②
	INTERVAL	时间频率间隔
复杂类型	ARRAY	有序的的同类型的集合	array(1,2)
	MAP	key-value,key必须为原始类型，value可以任意类型	map('a',1, 'b',2)
	STRUCT	字段集合,类型可以不同	③
	UNION	在有限取值范围内的一个值	④

①整数：距离Unit时间（1970-1-1 00:00:00）的秒数:1327882394

浮点数:Unit时间（1970-1-1 00:00:00）的秒数+纳秒:1327882394.123456789

字符串：2024-09-03 12:56:43:123456789

②'2016-03-29'

③struct('1',1,1.0), named_stract('col1','1','col2',1,'clo3',1.0)

④create_union(1,'a',63)

对decimal类型简单解释下：

用法：decimal(11,2) 代表最多有11位数字，其中后2位是小数，整数部分是9位；如果整数部分超过9位，则这个字段就会变成null；如果小数部分不足2位，则后面用0补齐两位，如果小数部分超过两位，则超出部分四舍五入

也可直接写 decimal，后面不指定位数，默认是 decimal(10,0) 整数10位，没有小数。

6）官方对字段的定义

代码语言：javascript

复制

# 数据类型
data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (注意：Hive 0.7.0 及更高版本中可用)
 # 基本数据类型
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN    
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (注意：Hive 2.2.0 及更高版本中可用)
  | STRING
  | BINARY      -- (注意：Hive 0.8.0 以及更高版本中可用)
  | TIMESTAMP   -- (注意：Hive 0.8.0 以及更高版本中可用)
  | DECIMAL     -- (注意：Hive 0.11.0 以及更高版本中可用)
  | DECIMAL(precision, scale)  -- (注意：Hive 0.13.0 及更高版本中可用)
  | DATE        -- (注意：Hive 0.12.0 以及更高版本中可用)
  | VARCHAR     -- (注意：Hive 0.12.0 以及更高版本中可用)
  | CHAR        -- (注意：Hive 0.13.0 以及更高版本中可用)
 ## List集合
array_type
  : ARRAY < data_type >
# map集合
map_type
  : MAP < primitive_type, data_type >
# 结构
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (注意：Hive 0.7.0 及更高版本中可用)    
# 行格式，分隔符
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
        [NULL DEFINED AS char]   -- (注意：Hive 0.13.0 以及更高版本中可用)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
 # 文件格式
file_format:
  : SEQUENCEFILE
  | TEXTFILE    -- (默认值，取决于hive.Default.fileformat配置)
  | RCFILE      -- (注意：Hive 0.6.0 以及更高版本中可用)
  | ORC         -- (注意：Hive 0.11.0 以及更高版本中可用)
  | PARQUET     -- (注意：Hive 0.13.0 以及更高版本中可用)
  | AVRO        -- (注意：Hive 0.14.0 以及更高版本中可用)
  | JSONFILE    -- (注意：Hive 4.0.0 以及更高版本中可用)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
# 列约束
column_constraint_specification:    
  : [ PRIMARY KEY|UNIQUE|NOT NULL|DEFAULT [default_value]|CHECK  [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
 
default_value:
  : [ LITERAL|CURRENT_USER()|CURRENT_DATE()|CURRENT_TIMESTAMP()|NULL ]
constraint_specification:
  : [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE
    [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
    [, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]
           
NOVALIDATE
    [, CONSTRAINT constraint_name UNIQUE (col_name, ...) DISABLE NOVALIDATE RELY/NORELY ]
[, CONSTRAINT constraint_name CHECK [check_expression] ENABLE|DISABLE NOVALIDATE RELY/NORELY ]

3 外部表

1）外部表与内部表

	内部表	外部表
创建语法	CREATE TABLE [IF NOT EXISTS] table_name	CREATE EXTERNAL TABLE [IF NOT EXISTS] table_name
区别	删除表时，元数据与数据都会被删除	删除外部表只删除metastore的元数据，不删除hdfs中的表数据

外部表因为是指定其他的hdfs路径的数据加载到表当中来，所以hive表会认为自己不完全独占这份数据，所以删除hive表的时候，数据仍然存放在hdfs当中，不会删掉，只会删除表的元数据。

2）构建外部表

代码语言：javascript

复制

create external table ext_student(s_id int,s_name string) row format delimited fields terminated by ',' location '/home/jerry/data';

3) 追加数据操作

数据文件

代码语言：javascript

复制

1,Jerry
2,Tom
3,Esabel
4,Cindy

语法

代码语言：javascript

复制

hive>load data local inpath…into table

案例

代码语言：javascript

复制

hive>load data local inpath '/home/jerry/hive/student.csv' into table ext_student;

4) 覆盖数据操作

语法

代码语言：javascript

复制

hive>load data local inpath…overwrite into table

案例

代码语言：javascript

复制

hive>load data local inpath '/home/jerry/hive/student.csv' overwrite into table ext_student;

5）从hdfs文件系统向表中加载数据

代码语言：javascript

复制

hive>load data inpath '/hivedatas/techer.csv' into table teacher;

6）从hdfs文件系统加载数据到指定分区

代码语言：javascript

复制

hive>load data inpath '/hivedatas/techer.csv' into table teacher partition(cur_date=20201210);

使用 load data local 表示从本地文件系统加载，文件会拷贝到hdfs上
使用 load data 表示从hdfs文件系统加载，文件会直接移动到hive相关目录下，注意不是拷贝过去，因为hive认为hdfs文件已经有3副本了，没必要再次拷贝了
如果表是分区表，load 时不指定分区会报错
如果加载相同文件名的文件，会被自动重命名。

7) 显示是否为外部表还是内部表

语法

代码语言：javascript

复制

desc extended student;

案例

代码语言：javascript

复制

hive> desc extended student;
OK    
id  int student id          
name           string             student name
…        
tableType:MANAGED_TABLE, rewriteEnabled:false)
…

tableType:MANAGED_TABLE表示管理表

代码语言：javascript

复制

hive> desc extended ext_student;
OK
s_id       int
s_name                     string
…                             
tableType:EXTERNAL_TABLE, rewriteEnabled:fals)
…

tableType:EXTERNAL_TABLE表示外部表

8）表结构复制

我们在2）案例4讲过管理表表结构复制，这里结合外部表讲一下。

语法

代码语言：javascript

复制

hive>create [external] table if not exists table1 like tablet2 location '/path/';

案例

代码语言：javascript

复制

hive> create external table if not exists demo.ext_student1 like demo.student location '/home/jerry/data';
hive> desc extended demo.ext_student1;    
…tableType:EXTERNAL_TABLE…
hive> create table if not exists demo.ext_student2 like demo.student location '/home/jerry/data';
hive> desc extended demo.ext_student2;
…tableType:MANAGED_TABLE…

可见

是否加external	原表类型	目标表类型
是	MANAGED_TABLE	EXTERNAL_TABLE
否	MANAGED_TABLE	MANAGED_TABLE

代码语言：javascript

复制

ive> select * from ext_student1;
OK
NULL       NULL
NULL       NULL
NULL       NULL
NULL       NULL
Time taken: 1.142 seconds, Fetched: 4 row(s)
hive> select * from ext_student2;
OK
NULL       NULL
NULL       NULL
NULL       NULL
NULL       NULL    
Time taken: 0.13 seconds, Fetched: 4 row(s)

可见，仅复制表结构，不复制表内容。

4 分区

1）创建分区表

l单分区

代码语言：javascript

复制

hive> create table table_name (s_id string, s_score int) partitioned by (month string);

l多分区

代码语言：javascript

复制

hive> create table table_name (s_id string, s_score int) partitioned by (year string,month string,day string);

hive表创建的时候可以用 location 指定一个文件或者文件夹，当指定文件夹时，hive会加载文件夹下的所有文件，当表中无分区时，这个文件夹下不能再有文夹，否则报错

当表是分区表时，比如 partitioned by (day string)，则这个文件夹下的每一个文件夹就是一个分区，且文件夹名为 day=20201123 这种格式，然后使用：

代码语言：javascript

复制

hive> msck repair table table_name;

修复表结构，成功之后即可看到数据已经全部加载到表当中去了。

2）加载数据到分区的表中

l加载数据到一个分区的表中

代码语言：javascript

复制

hive> load data local inpath '/export/servers/hivedatas/score.csv' into table table_name partition (month='201806');

l加载数据到一个多分区的表中

代码语言：javascript

复制

hive> load data local inpath '/export/servers/hivedatas/score.csv' into table table_name partition(year='2018',month='06',day='01');

3）添加分区

l添加一个分区

代码语言：javascript

复制

hive> alter table table_name add partition(month='201805');

l添加多个分区

代码语言：javascript

复制

hive> alter table table_name add partition(month='201804') partition(month = '201803');
hive> alter table log add if not exists
partition (year=2022,month=3,day=1) location '/logs/2022-3-1'
partition (year=2022,month=3,day=2) location '/logs/2022-3-2'
partition (year=2022,month=3,day=3) location '/logs/2022-3-3'
partition (year=2022,month=3,day=4) location '/logs/2022-3-4'

4）修改分区路径

语法

代码语言：javascript

复制

hive> alter table table_name partition (…) set location '/…/';

案例

代码语言：javascript

复制

hive> alter  table log partition(year=2022,month=3,day=1) set location '/s3n://ourbucket/logs/2022/3/1/';

5）删除分区

语法

代码语言：javascript

复制

hive>alert…drop…

案例

代码语言：javascript

复制

hive> alter table table_name  drop partition(month = '201806')

6）案例

代码语言：javascript

复制

CREATE TABLE part_student (
      id INT,
      info String)
partitioned by (year INT, month INT, day INT)
ROW FORMAT DELIMITED  
FIELDS TERMINATED BY '\t'  
MAP KEYS TERMINATED BY ':'  
LINES TERMINATED BY '\n';

在/home/jerry/hive/stu/建立文件：2022-2-1，2022-9-1，2023-2-1，2023-9-1，2024-2-1，2024-9-1。

代码语言：javascript

复制

load data local inpath '/home/jerry/hive/stu/2022-2-1' into table part_student partition(year=2022,month=2,day=1);
load data local inpath '/home/jerry/hive/stu/2022-9-1' into table part_student partition(year=2022,month=9,day=1);
load data local inpath '/home/jerry/hive/stu/2023-2-1' into table part_student partition(year=2023,month=2,day=1);
load data local inpath '/home/jerry/hive/stu/2023-9-1' into table part_student partition(year=2023,month=9,day=1);
load data local inpath '/home/jerry/hive/stu/2024-2-1' into table part_student partition(year=2024,month=2,day=1);    
load data local inpath '/home/jerry/hive/stu/2024-9-1' into table part_student partition(year=2024,month=9,day=1);

查询

代码语言：javascript

复制

SELECT id,
    get_json_object(info, '$.name') AS name,  
    get_json_object(info, '$.age') AS age
FROM   
   part_student;

7）优化查询

语法

代码语言：javascript

复制

hive>set hive.mapred.mode=strict;

案例

代码语言：javascript

复制

hive>set hive.mapred.mode=strict;
hive>select * from part_student;
FAILED: SemanticException Queries against …
part_student.id       part_student.info       part_student.year       part_student.month       part_student.day
21       {"name":"Qian","age":"20"}       2024       2       1
22       {"name":"Yong","age":"20"}       2024       2       1
…
hive> set hive.mapred.mode=nonstrict;
hive>select * from part_student;    
OK
part_student.id       part_student.info       part_student.year       part_student.month       part_student.day
1       {"name":"Jerry","age":"19"}       2022       2       1
2       {"name":"Tom","age":"19"}         2022       2       1

结论：hive.mapred.mode=strict; 不允许全表扫描，默认nonstrict

8）查看分区

l显示所有分区

语法

代码语言：javascript

复制

hive> show partitions table_name;

案例

代码语言：javascript

复制

hive> show partitions part_student;
OK
year=2022/month=2/day=1
year=2022/month=9/day=1
year=2023/month=2/day=1
year=2023/month=9/day=1
year=2024/month=2/day=1
year=2024/month=9/day=1
Time taken: 0.053 seconds, Fetched: 6 row(s)

l显示某个分区

语法

代码语言：javascript

复制

> show partitions table_name partition(…);

案例

代码语言：javascript

复制

hive> show partitions part_student partition(year=2022);
OK
partition
year=2022/month=2/day=1
year=2022/month=9/day=1
Time taken: 0.075 seconds, Fetched: 2 row(s)

l显示详细分区内容

语法

代码语言：javascript

复制

hive> desc extended table_name;

案例

代码语言：javascript

复制

hive> desc extended  part_student;
OK
col_name       data_type       comment
id         int
info       string
year       int
month int
day        int
                
# Partition Information    
# col_name                   data_type                  comment
year       int
month  int
day        int
Detailed Table Information       Table(tableName:part_student, dbName:demo, owner:root, createTime:1724235141, lastAccessTime:0, retention:0,
…
partitionKeys:[FieldSchema(name:year, type:int, comment:null), FieldSchema(name:month, type:int, comment:null), FieldSchema(name:day, type:int, comment:null)], parameters:{transient_lastDdlTime=1724235141, totalSize=904, numRows=0, rawDataSize=0, numFiles=6, numPartitions=6}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false)       
Time taken: 0.078 seconds, Fetched: 15 row(s)

5. 表的修改

1）修改表名称

语法

代码语言：javascript

复制

hive> alter table old_table_name rename to new_table_name;

案例

代码语言：javascript

复制

hive> alter table student rename to students;

2）添加列

语法

代码语言：javascript

复制

hive> alter table table_name add columns (…);

案例

代码语言：javascript

复制

hive> alter table student add columns (length int, weight flat);

3）更新列

语法

代码语言：javascript

复制

hive> alter table table_name change column old_col new_col type comment 'new coimments';

案例

代码语言：javascript

复制

hive> alter table student change column age current_age int comment 'new coimments';

4）删除或者替换列

语法

代码语言：javascript

复制

hive>alter table table_name replace column(
…
);

案例

代码语言：javascript

复制

hive>alter  table  person replace column(
id int comment '…'
name string comment '…'
age int comment '…'
like string comment '…'
);

5）修改表属性

语法

代码语言：javascript

复制

hive>alter  table  table_name  SET TBLPROPERTIES(
'atribute' = '…'
);

案例

代码语言：javascript

复制

hive>alter  table  person  SET TBLPROPERTIES(
'note' = 'this table is …'
);

只能添加，不可删除

6）删除表、清空表

l删除表操作

代码语言：javascript

复制

hive>drop table if exists table_name;

l清空表操作

代码语言：javascript

复制

hive>truncate table table_name;

外部表：元数据删除，表中数据不删除

Hive元数据是指描述和存储有关Hive数据仓库中数据和表结构的信息。

注意：只能清空管理表，也就是内部表；清空外部表，会产生错误。L

注意：truncate和drop：

如果 hdfs 开启了回收站，drop 删除的表数据是可以从回收站恢复的，表结构恢复不了，需要自己重新创建；

truncate 清空的表是不进回收站的，所以无法恢复truncate清空的表。所以truncate一定慎用，一旦清空将无力回天。

元数据在MySQLMariaDB [hive]> select * from PARTITIONS;+---------+-------------+------------------+-------------------------+-------+--------+| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |+---------+-------------+------------------+-------------------------+-------+--------+| 1 | 1723006998 | 0 | age=10 | 10 | 9 || 2 | 1723007024 | 0 | age=20 | 11 | 9 || 17 | 1724140001 | 0 | year=2022/month=2/day=1 | 47 | 30 || 18 | 1724140242 | 0 | year=2023/month=2/day=1 | 48 | 30 || 19 | 1724140242 | 0 | year=2023/month=9/day=1 | 49 | 30 || 20 | 1724140242 | 0 | year=2024/month=2/day=1 | 50 | 30 || 21 | 1724140243 | 0 | year=2024/month=9/day=1 | 51 | 30 |+---------+-------------+------------------+-------------------------+-------+--------+ 7 rows in set (0.000 sec)MariaDB [hive]> select * from dbs;ERROR 1146 (42S02): Table 'hive.dbs' doesn't existMariaDB [hive]> select * from DBS;+-------+-----------------------+---------------------------------------------------+---------+------------+------------+| DB_ID | DESC | DB_LOCATION_URI | NAME | OWNER_NAME | OWNER_TYPE |+-------+-----------------------+---------------------------------------------------+---------+------------+------------+| 1 | Default Hive database | hdfs://localhost:9000/user/hive/warehouse | default | public | ROLE || 6 | NULL | hdfs://localhost:9000/user/hive/warehouse/demo.db | demo | root | USER |+-------+-----------------------+---------------------------------------------------+---------+------------+------------+2 rows in set (0.000 sec)

6 加载数据

加载数据在前面介绍过一些，现在总结如下。

1）LOAD DATA [LOCAL] INPATH

代码语言：javascript

复制

hive>LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

#加载本地文件到表

代码语言：javascript

复制

hive>load data local inpath '/home/jerry/data.txt' into table person;

# 查询加载的数据

代码语言：javascript

复制

hive>select * from person;

使用OVERWRITE：原先存在，覆盖

不使用OVERWRITE：原先存在，追加

2）直接向分区表中插入数据

代码语言：javascript

复制

hive>insert into table table_name partition(…) values (value1, value2,value3);

3）通过查询方式加载数据

代码语言：javascript

复制

hive>insert overwrite table table_name2 select s_id,c_id,s_score from table_name1;

4）查询语句中创建表并加载数据

代码语言：javascript

复制

hive>create table new_table_name as select * from old_table_name ;

5）在通过location指定加载数据的路径创建表

代码语言：javascript

复制

hive> create external table table_name (s_id string,c_id string,s_score int) row format delimited fields terminated by ',' location '/myscore';

6）export导出与import 导入 hive表数据（内部表操作）

代码语言：javascript

复制

hive> create table new_table_name  like old_table_name; --依据已有表结构创建表    
hive>export table teacher to  '/export/teacher ';
hive>import table teacher2 from '/export/teacher ';

7）使用select…where语句加载数据

代码语言：javascript

复制

from  log
insert overwrite table my_log
    partiton (year=2022,month=3,day=1)
    select * where log.year =2022 and log.month=3 and log.day=1
insert overwrite table my_log
    partiton (year=2022,month=3,day=2)
    select * where log.year =2022 and log.month=3 and log.day=2
insert overwrite table my_log
    partiton (year=2022,month=3,day=3)
    select * where log.year =2022 and log.month=3 and log.day=3
insert overwrite table my_log
    partiton (year=2022,month=3,day=4)
select * where log.year =2022 and log.month=3 and log.day=4

7 动态分区

代码语言：javascript

复制

insert overwrite table my_log
partiton (year,month,day)
select …log.year,log. month,log.day
from log;

year，month，day均是动态的。

代码语言：javascript

复制

insert overwrite table my_log
partiton (year=2022,month,day)
select …log.year,log. month,log.day
from log
where log.year=2022;

year是静态，month，day均是动态的。

动态分区默认没有开启

代码语言：javascript

复制

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;

8. hive表中数据导出

1）insert导出

l将查询的结果导出到本地

代码语言：javascript

复制

hive>insert overwrite local directory
hive>insert overwrite local directory '/export/servers/exporthive' select * from student;
hive>insert overwrite local directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from student;

l将查询的结果导出到HDFS上(没有local)

代码语言：javascript

复制

hive>insert overwrite directory '/export/servers/exporthive' row format delimited fields terminated by '\t' collection items terminated by '#' select * from score;

l案例

代码语言：javascript

复制

hive>from part_student st
insert overwrite local directory '/home/jerry/hive/2022'
    select * where st.year=2022
insert overwrite local directory '/home/jerry/hive/2023'
    select * where st.year=2023
insert overwrite local directory '/home/jerry/hive/2024'
select * where st.year=2024;

2）hive表中数据导出

lHadoop命令导出到本地

代码语言：javascript

复制

#dfs -get /export/servers/exporthive/000000_0 /export/servers/exporthive/local.txt;

lhive shell 命令导出

基本语法：（hive -f/-e 执行语句或者脚本 > file）

代码语言：javascript

复制

#hive -e "select * from myhive.score;" > /export/servers/exporthive/score.txt
#hive -f export.sh > /export/servers/exporthive/score.txt

lexport导出到HDFS上

代码语言：javascript

复制

#export table score to '/export/exporthive/score';

Hive学习-数据定义语句

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

Hive学习-数据定义语句

热门文章

最新文章

相关课程

相关电子书

相关实验场景