Hive - grouping sets 示例与详解-阿里云开发者社区

Hive - grouping sets 示例与详解

2022-08-22 538

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介： 介绍 group by 以及 grouping sets 相关用法。

一.引言

现有超市用户购物数据表一张，其字段与数据如下：

编辑

用户 id 为连续数字编号，性别分别用 0、1表示，年龄使用 xxs 表示，cost 代表用户在超市的消费总额：

1 0 00s 100 20220505
2 1 90s 200 20220505
3 1 00s 300 20220505
4 1 70s 400 20220505
5 0 60s 500 20220505
6 0 80s 600 20220505
7 1 80s 700 20220505
8 0 60s 800 20220505
9 0 70s 900 20220505
10  1 10s 1000  20220505

下面使用 group by 和 grouping sets 对表中数据进行提取。

二.Group By

对 hive 表中的数据进行分组汇总时常用 group by 语法

1.按年龄分组

select user_age,sum(buy_cost) from user_purchasing_info group by user_age;

2.按性别分组

select user_gender,user_age,sum(buy_cost) from user_purchasing_info group by user_gender,user_age;

0 2900.0
1 2600.0

3.按性别、年龄分组

按照多个变量进行 group by 时需要注意 group by 后的字段也需要全部添加：

select user_gender,user_age,sum(buy_cost) from user_purchasing_info group by user_gender,user_age;

0 00s 100.0
0 60s 1300.0
0 70s 900.0
0 80s 600.0
1 00s 300.0
1 10s 1000.0
1 70s 400.0
1 80s 700.0
1 90s 200.0

三.Grouping Sets

1.Grouping Sets 示例

grouping sets 方法提供一个 () sets，内部支持添加 group by 字段的不同组合，按不同维度分组聚合后，最后将数据合并，类似于根据 sets 内的字段进行多次 group by 操作再最后执行 union all。下面对超市购买表进行聚合，一次性查询按年龄、性别、年龄&性别的聚合结果：

select user_gender,user_age,sum(buy_cost) from user_purchasing_info 
where dt='20220505' 
group by user_gender,user_age 
grouping sets((user_gender),(user_age),(user_gender, user_age));

编辑

可以与上述查询结果相对应，其中蓝框内的结果为按年龄 user_age 进行分组的结果，红框内为按性别 user_gender + 年龄 user_age 进行分组的结果，剩余两行为按性别 user_gender 进行分组的结果。

2.Grouping Sets 实战

上面示例展示了一次获取不同分组的数据方法，实战场景下用法例如给定数据求按照某个维度的分数据和汇总总数据，对应到上述超市数据即求当天营业额按性别 + 年龄分组和总和数据：

select 
concat_ws('_',nvl(user_gender,'total'),nvl(user_age,'total')),sum(buy_cost) 
from user_purchasing_info where dt='20220505' 
group by user_gender,user_age 
grouping sets((user_gender),(user_gender, user_age),());

通过 concat_ws 和 nvl 函数连接了分组字段并使用 nvl 函数排除 NULL 的数据，非常的实用 👍

total_total 5500.0
0_total 2900.0
0_00s 100.0
0_60s 1300.0
0_70s 900.0
0_80s 600.0
1_total 2600.0
1_00s 300.0
1_10s 1000.0
1_70s 400.0
1_80s 700.0
1_90s 200.0

四.总结

简言之 Grouping Sets = Group By + Union，有几个点需要注意：

A.NULL 值

上面 Grouping Sets 中分别采用 A、B、A+B 的字段聚合，但是 Select 时 A，B 都添加了，所以对应单独 GroupBy A 或者 B 的数据，最终结果会默认补 NULL，所以上面结果中会有很多 NULL 值。

B.空 Set ()

Grouping Sets 实战中，我们添加了一个空 Sets()，其等价于：

select sum(buy_cost) from user_purchasing_info where dt='20220505';

即对整个表计算 sum，此时 user_age、user_gender 都为 NULL，通过 nvl 函数转换为 total_total，所以最终数据得到 total_total: 5500.0 的数据。

C.分组函数

除了 Grouping Sets 这个聚合功能函数外，类似的函数还有 Grouping_Id、CUBE、RollUp，可以参考：Hive - Cube, Rollup, GroupingId 示例与详解。

文章标签：

SQL

HIVE

关键词：

Hive示例

Hive - grouping sets 示例与详解

一.引言

二.Group By

1.按年龄分组

2.按性别分组

3.按性别、年龄分组

三.Grouping Sets

1.Grouping Sets 示例

2.Grouping Sets 实战

四.总结

热门文章

最新文章

相关课程

相关电子书

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

Hive - grouping sets 示例与详解

一.引言

二.Group By

1.按年龄分组

2.按性别分组

3.按性别、年龄分组

三.Grouping Sets

1.Grouping Sets 示例

2.Grouping Sets 实战

四.总结

热门文章

最新文章

相关课程

相关电子书