大数据技术之Hive5-阿里云开发者社区

大数据技术之Hive5

2023-06-12 171 发布于吉林

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生大数据计算服务 MaxCompute，5000CU*H 100GB 3个月

云原生大数据计算服务MaxCompute，500CU*H 100GB 3个月

简介： 大数据技术之Hive5

8.2.5 流程控制函数

1）case when：条件判断函数

语法一：case when a then b [when c then d]* [else e] end

返回值：T

说明：如果a为true，则返回b；如果c为true，则返回d；否则返回 e

hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end from tableName;

mary

语法二： case a when b then c [when d then e]* [else f] end

返回值: T

说明：如果a等于b，那么返回c；如果a等于d，那么返回e；否则返回f

hive> select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end from t ableName;

mary

2）if: 条件判断，类似于Java中三元运算符

语法：if（boolean testCondition, T valueTrue, T valueFalseOrNull）

返回值：T

说明：当条件testCondition为true时，返回valueTrue；否则返回valueFalseOrNull

（1）条件满足，输出正确

hive> select if(10 > 5,'正确','错误');

输出：正确

（2）条件满足，输出错误

hive> select if(10 < 5,'正确','错误');

输出：错误

8.2.6 集合函数

1）size：集合中元素的个数

hive> select size(friends) from test; --2/2 每一行数据中的friends集合里的个数

2）map：创建map集合

语法：map (key1, value1, key2, value2, …)

说明：根据输入的key和value对构建map类型

案例实操：

hive> select map('xiaohai',1,'dahai',2);

输出：

hive> {"xiaohai":1,"dahai":2}

3）map_keys：返回map中的key

hive> select map_keys(map('xiaohai',1,'dahai',2));

输出：

hive>["xiaohai","dahai"]

4）map_values: 返回map中的value

hive> select map_values(map('xiaohai',1,'dahai',2));

输出：

hive>[1,2]

5）array 声明array集合

语法：array(val1, val2, …)

说明：根据输入的参数构建数组array类

案例实操：

hive> select array('1','2','3','4');

输出：

hive>["1","2","3","4"]

6）array_contains: 判断array中是否包含某个元素

hive> select array_contains(array('a','b','c','d'),'a');

输出：

hive> true

7）sort_array：将array中的元素排序

hive> select sort_array(array('a','d','c'));

输出：

hive> ["a","c","d"]

8）struct声明struct中的各属性

语法：struct(val1, val2, val3, …)

说明：根据输入的参数构建结构体struct类

案例实操：

hive> select struct('name','age','weight');

输出：

hive> {"col1":"name","col2":"age","col3":"weight"}

9）named_struct声明struct的属性和值

hive> select named_struct('name','xiaosong','age',18,'weight',80);

输出：

hive> {"name":"xiaosong","age":18,"weight":80}

8.2.7 案例演示

1. 数据准备

1）表结构

2）建表语句

hive>

create table employee(

name string, --姓名

sex string, --性别

birthday string, --出生年月

hiredate string, --入职日期

job string, --岗位

salary double, --薪资

bonus double, --奖金

friends array<string>, --朋友

children map<string,int> --孩子

)

3）插入数据

hive> insert into employee

values('张无忌','男','1980/02/12','2022/08/09','销售',3000,12000,array('阿朱','小昭'),map('张小无',8,'张小忌',9)),

('赵敏','女','1982/05/18','2022/09/10','行政',9000,2000,array('阿三','阿四'),map('赵小敏',8)),

('宋青书','男','1981/03/15','2022/04/09','研发',18000,1000,array('王五','赵六'),map('宋小青',7,'宋小书',5)),

('周芷若','女','1981/03/17','2022/04/10','研发',18000,1000,array('王五','赵六'),map('宋小青',7,'宋小书',5)),

('郭靖','男','1985/03/11','2022/07/19','销售',2000,13000,array('南帝','北丐'),map('郭芙',5,'郭襄',4)),

('黄蓉','女','1982/12/13','2022/06/11','行政',12000,null,array('东邪','西毒'),map('郭芙',5,'郭襄',4)),

('杨过','男','1988/01/30','2022/08/13','前台',5000,null,array('郭靖','黄蓉'),map('杨小过',2)),

('小龙女','女','1985/02/12','2022/09/24','前台',6000,null,array('张三','李四'),map('杨小过',2))

2. 需求

1）统计每个月的入职人数

（1）期望结果

month	cnt
4	2
6	1
7	1
8	2
9	2

（2）需求实现

select

month(replace(hiredate,'/','-')) as month,

count(*) as cn

from

employee

group by

month(replace(hiredate,'/','-'))

2）查询每个人的年龄（年+ 月）

（1）期望结果

name	age
张无忌	42年8月
赵敏	40年5月
宋青书	41年7月
周芷若	41年7月
郭靖	37年7月
黄蓉	39年10月
杨过	34年9月
小龙女	37年8月

（2）需求实现

-- 转换日期

select
name,
replace(birthday,'/','-') birthday
from
employee t1
-- 求出年和月
select
name,
year(current_date())-year(t1.birthday) year,
month(current_date())-month(t1.birthday) month
from
(
select
name,
replace(birthday,'/','-') birthday
from
employee
)t1 t2
-- 根据月份正负决定年龄
select
name,
concat(if(month>=0,year,year-1),'年',if(month>=0,month,12+month),'月') age
from
(
select
name,
year(current_date())-year(t1.birthday) year,
month(current_date())-month(t1.birthday) month
from
(
select
name,
replace(birthday,'/','-') birthday
from
employee
)t1
)t2

3）按照薪资，奖金的和进行倒序排序，如果奖金为null，置位0

（1）期望结果

name	sal
周芷若	19000
宋青书	19000
郭靖	15000
张无忌	15000
黄蓉	12000
赵敏	11000
小龙女	6000
杨过	5000

（2）需求实现

select

name,

salary + nvl(bonus,0) sal

from

employee

order by

sal desc

4）查询每个人有多少个朋友

（1）期望结果

name	cnt
张无忌	2
赵敏	2
宋青书	2
周芷若	2
郭靖	2
黄蓉	2
杨过	2
小龙女	2

（2）需求实现

select

name,

size(friends) cnt

from

employee;

5）查询每个人的孩子的姓名

（1）期望结果

name	ch_name
张无忌	["张小无","张小忌"]
赵敏	["赵小敏"]
宋青书	["宋小青","宋小书"]
周芷若	["宋小青","宋小书"]
郭靖	["郭芙","郭襄"]
黄蓉	["郭芙","郭襄"]
杨过	["杨小过"]
小龙女	["杨小过"]

（2）需求实现

hive>

select

name,

map_keys(children) ch_name

from

employee;

6）查询每个岗位男女各多少人

（1）期望结果

job	male	female
前台	1	1
研发	1	1
行政	0	2
销售	2	0

（2）需求实现

select
job,
sum(if(sex='男',1,0)) male,
sum(if(sex='女',1,0)) female
from
employee
group by
job

*8.3 高级聚合函数

多进一出（多行传入，一个行输出）。

1）普通聚合 count/sum.... 见第6章 6.2.4

2）collect_list 收集并形成list集合，结果不去重

ive>

select

sex,

collect_list(job)

from

employee

group by

sex

结果：

女 ["行政","研发","行政","前台"]

男 ["销售","研发","销售","前台"]

3）collect_set 收集并形成set集合，结果去重

hive>

select

sex,

collect_set(job)

from

employee

group by

sex

结果：

女 ["行政","研发","前台"]

男 ["销售","研发","前台"]

8.3.1 案例演示

1）每个月的入职人数以及姓名

hive>
select
month(replace(hiredate,'/','-')) as month,
count(*) as cn,
Collect_list(name) as name_list
from
employee
group by
month(replace(hiredate,'/','-'))
结果：
month cn name_list
4 2 ["宋青书","周芷若"]
6 1 ["黄蓉"]
7 1 ["郭靖"]
8 2 ["张无忌","杨过"]
9 2 ["赵敏","小龙女"]

8.4 炸裂函数

8.4.1 概述

8.4.2 案例演示

1.数据准备

1）表结构

movie	category
《疑犯追踪》	悬疑，动作，科幻，剧情
《Lie to me》	悬疑，警匪，动作，心理，剧情
《战狼2》	战争，动作，灾难

2）建表语句

1. hive (hive (default)>
create table movie_info(
    movie string,     --电影名称
    category string   --电影分类
) 
row format delimited fields terminated by "\t";

3）装载语句

insert overwrite table movie_info
values ("《疑犯追踪》", "悬疑,动作,科幻,剧情"),
       ("《Lie to me》", "悬疑,警匪,动作,心理,剧情"),
       ("《战狼2》", "战争,动作,灾难");

2. 需求

1）需求说明

根据上述电影信息表，统计各分类的电影数量，期望结果如下：

剧情	2
动作	3
心理	1
悬疑	2
战争	1
灾难	1
科幻	1
警匪	1

2）答案

select
    cate,
    count(*)
from
(
    select
        movie,
        cate
    from
    (
        select
            movie,
            split(category,',') cates
        from movie_info
    )t1 lateral view explode(cates) tmp as cate
)t2
group by cate;

*8.5 窗口函数（开窗函数）

8.5.1 概述

8.5.2 常用窗口函数

按照功能，常用窗口可划分为如下几类：聚合函数、跨行取值函数、排名函数。

1）聚合函数

max：最大值。

min：最小值。

sum：求和。

avg：平均值。

count：计数。

2）跨行取值函数

（1）lead和lag

注：lag和lead函数不支持自定义窗口。

（2）first_value和last_value

3）排名函数

注：rank 、dense_rank、row_number不支持自定义窗口。

*8.5.3 案例演示

1.数据准备

1）表结构

2）建表语句

create table order_info

(

order_id string, --订单id

user_id string, -- 用户id

user_name string, -- 用户姓名

order_date string, -- 下单日期

order_amount int -- 订单金额

);

3）装载语句

insert overwrite table order_info

values ('1', '1001', '小元', '2022-01-01', '10'),

('2', '1002', '小海', '2022-01-02', '15'),

('3', '1001', '小元', '2022-02-03', '23'),

('4', '1002', '小海', '2022-01-04', '29'),

('5', '1001', '小元', '2022-01-05', '46'),

('6', '1001', '小元', '2022-04-06', '42'),

('7', '1002', '小海', '2022-01-07', '50'),

('8', '1001', '小元', '2022-01-08', '50'),

('9', '1003', '小辉', '2022-04-08', '62'),

('10', '1003', '小辉', '2022-04-09', '62'),

('11', '1004', '小猛', '2022-05-10', '12'),

('12', '1003', '小辉', '2022-04-11', '75'),

('13', '1004', '小猛', '2022-06-12', '80'),

('14', '1003', '小辉', '2022-04-13', '94');

2. 需求

1）统计每个用户截至每次下单的累积下单总额

（1）期望结果

（2）需求实现

select

order_id,

user_id,

user_name,

order_date,

order_amount,

sum(order_amount) over(partition by user_id order by order_date rows between unbounded preceding and current row) sum_so_far

from order_info;

2）统计每个用户截至每次下单的当月累积下单总额

（1）期望结果

（2）需求实现

select

order_id,

user_id,

user_name,

order_date,

order_amount,

sum(order_amount) over(partition by user_id,substring(order_date,1,7) order by order_date rows between unbounded preceding and current row) sum_so_far

from order_info;

3）统计每个用户每次下单距离上次下单相隔的天数（首次下单按0天算）

（1）期望结果

（2）需求实现

select

order_id,

user_id,

user_name,

order_date,

order_amount,

nvl(datediff(order_date,last_order_date),0) diff

from

(

select

order_id,

user_id,

user_name,

order_date,

order_amount,

lag(order_date,1,null) over(partition by user_id order by order_date) last_order_date

from order_info

)t1

4）查询所有下单记录以及每个用户的每个下单记录所在月份的首/末次下单日期

（1）期望结果

（2）需求实现

select

order_id,

user_id,

user_name,

order_date,

order_amount,

first_value(order_date) over(partition by user_id,substring(order_date,1,7) order by order_date) first_date,

last_value(order_date) over(partition by user_id,substring(order_date,1,7) order by order_date rows between unbounded preceding and unbounded following) last_date

from order_info;

大数据技术之Hive5

8.2.5 流程控制函数

8.2.6 集合函数

8.2.7 案例演示

*8.3 高级聚合函数

8.3.1 案例演示

8.4 炸裂函数

8.4.1 概述

8.4.2 案例演示

*8.5 窗口函数（开窗函数）

8.5.1 概述

8.5.2 常用窗口函数

*8.5.3 案例演示

热门文章

最新文章

相关课程

相关电子书

相关实验场景

探索云世界

热门

云计算

大数据

云原生

人工智能

数据库

开发与运维

活动广场

任务中心

训练营

直播

乘风者计划

下载

镜像站

技术资料

大数据技术之Hive5

8.2.5 流程控制函数

8.2.6 集合函数

8.2.7 案例演示

*8.3 高级聚合函数

8.3.1 案例演示

8.4 炸裂函数

8.4.1 概述

8.4.2 案例演示

*8.5 窗口函数（开窗函数）

8.5.1 概述

8.5.2 常用窗口函数

*8.5.3 案例演示

热门文章

最新文章

相关课程

相关电子书

相关实验场景