级联报表查询
假如,有如下数据(文件名:accumulate.dat):
A,2015-01,5 A,2015-01,15 B,2015-01,5 A,2015-01,8 B,2015-01,25 A,2015-01,5 C,2015-01,10 C,2015-01,20 A,2015-02,4 A,2015-02,6 C,2015-02,30 C,2015-02,10 B,2015-02,10 B,2015-02,5 A,2015-03,14 A,2015-03,6 B,2015-03,20 B,2015-03,25 C,2015-03,10 C,2015-03,20
需求:统计每个用户的月总额、累计到当月的总额:
-----统计结果如下累计报表
用户 |
月份 |
月总额 |
累计到当月的总额 |
A |
2015-01 |
33 |
33 |
A |
2015-02 |
10 |
43 |
A |
2015-03 |
30 |
73 |
B |
2015-01 |
30 |
30 |
B |
2015-02 |
15 |
45 |
... |
... |
... |
... |
解决方法1: 复杂sql--累计报表
建表映射:
create table t_access_times(username string,month string,counts int) row format delimited fields terminated by ',';
导入数据(accumulate.dat):
load data local inpath '/root/accumulate.dat' into table t_access_times;
1、第一步,先求个用户的月总金额
select username,month,sum(salary) as salary from t_access_times group by username,month
输出结果
+-----------+----------+---------+--+ | username | month | salary | +-----------+----------+---------+--+ | A | 2015-01 | 33 | | A | 2015-02 | 10 | | B | 2015-01 | 30 | | B | 2015-02 | 15 | +-----------+----------+---------+--+
2、第二步,将月总金额表自己连接 自己连接
select A.*,B.* FROM (select username,month,sum(salary) as salary from t_access_times group by username,month) A inner join (select username,month,sum(salary) as salary from t_access_times group by username,month) B on A.username=B.username where B.month <= A.month
输出结果
+-------------+----------+-----------+-------------+----------+-----------+--+ | a.username | a.month | a.salary | b.username | b.month | b.salary | +-------------+----------+-----------+-------------+----------+-----------+--+ | A | 2015-01 | 33 | A | 2015-01 | 33 | | A | 2015-01 | 33 | A | 2015-02 | 10 | | A | 2015-02 | 10 | A | 2015-01 | 33 | | A | 2015-02 | 10 | A | 2015-02 | 10 | | B | 2015-01 | 30 | B | 2015-01 | 30 | | B | 2015-01 | 30 | B | 2015-02 | 15 | | B | 2015-02 | 15 | B | 2015-01 | 30 | | B | 2015-02 | 15 | B | 2015-02 | 15 | +-------------+----------+-----------+-------------+----------+-----------+--+
第3步:得到最终结果
select auname,amonth,acnts,sum(bcnts) from t_tmp2 group by auname,amonth,acnts;
当然,也可以把整个逻辑过程写成一个SQL语句:
select A.username,A.month,max(A.salary) as salary,sum(B.salary) as accumulate from (select username,month,sum(salary) as salary from t_access_times group by username,month) A inner join (select username,month,sum(salary) as salary from t_access_times group by username,month) B on A.username=B.username where B.month <= A.month group by A.username,A.month order by A.username,A.month;
解决2:使用窗口分析函数--累计报表
-- 窗口分析函数 sum() over() :可以实现在窗口中进行逐行累加
0: jdbc:hive2://localhost:10000> select * from t_access_amount;
+----------------------+------------------------+-------------------------+--+ | t_access_amount.uid | t_access_amount.month | t_access_amount.amount | +----------------------+------------------------+-------------------------+--+ | A | 2015-01 | 33 | | A | 2015-02 | 10 | | A | 2015-03 | 20 | | B | 2015-01 | 30 | | B | 2015-02 | 15 | | B | 2015-03 | 45 | | C | 2015-01 | 30 | | C | 2015-02 | 40 | | C | 2015-03 | 30 | +----------------------+------------------------+-------------------------+--+
求出每个人截止到每个月的总额
select uid,month,amount, sum(amount) over(partition by uid order by month rows between unbounded preceding and current row) as accumulate from t_access_amount;