hive查询语法
提示:在做小数据量查询测试时,可以让hive将mrjob提交给本地运行器运行,可以在hive会话中设置如下参数:
hive> set hive.exec.mode.local.auto=true;
基本查询示例
select * from t_access;
select count(*) from t_access;
select max(ip) from t_access;
条件查询
select * from t_access where access_time<'2017-08-06 15:30:20'
select * from t_access where access_time<'2017-08-06 16:30:20' and ip>'192.168.33.3';
join关联查询示例
假如有a.txt文件
a,1 b,2 c,3 d,4
假如有b.txt文件
a,xx b,yy d,zz e,pp
进行各种join查询:
inner join(join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a join t_b b on a.name=b.name
结果:
+--------+--------+--------+--------+--+ | aname | anumb | bname | bnick | +--------+--------+--------+--------+--+ | a | 1 | a | xx | | b | 2 | b | yy | | d | 4 | d | zz | +--------+--------+--------+--------+--+
left outer join(left join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a left outer join t_b b on a.name=b.name
结果:
right outer join(right join)
select a.name as aname, a.numb as anumb, b.name as bname, b.nick as bnick from t_a a right outer join t_b b on a.name=b.name
结果:
full outer join(full join)
结果:
left semi join
hive中不支持exist/IN子查询,可以用left semi join来实现同样的效果:
select a.name as aname, a.numb as anumb from t_a a left semi join t_b b on a.name=b.name;
结果:
注意: left semi join的 select子句中,不能有右表的字段
group by分组聚合
select dt,count(*),max(ip) as cnt from t_access group by dt; select dt,count(*),max(ip) as cnt from t_access group by dt having dt>'20170804'; select dt,count(*),max(ip) as cnt from t_access where url='http://www.edu360.cn/job' group by dt having dt>'20170804';
注意: 一旦有group by子句,那么,在select子句中就不能有 (分组字段,聚合函数) 以外的字段
如果想要【select非group by的字段】,可以参考https://blog.csdn.net/fly910905/article/details/80865000
## 为什么where必须写在group by的前面,为什么group by后面的条件只能用having
因为,where是用于在真正执行查询逻辑之前过滤数据用的
having是对group by聚合之后的结果进行再过滤;
上述语句的执行逻辑:
- where过滤不满足条件的数据
- 用聚合函数和group by进行数据运算聚合,得到聚合结果
- 用having条件过滤掉聚合结果中不满足条件的数据
子查询
select id,name,father from (select id,name,family_members['brother'] as father from t_person) tmp where father is not null;