标签
PostgreSQL , 10.0 , hashed aggregation with grouping sets
背景
grouping sets 是多维分析语法,PostgreSQL 从9.5开始支持这种语法,常被用于OLAP系统,数据透视等应用场景。
《PostgreSQL 9.5 new feature - Support GROUPING SETS, CUBE and ROLLUP.》
由于多维分析的一个QUERY涉及多个GROUP,所以如果使用hash agg的话,需要多个HASH table,并行计算. 9.5, 9.6的时候,还不支持一个QUERY使用多个HASH TABLE并行计算。
10.0 扩展了聚合NODE,支持hashAggregate并行开多个hashtable,以及MixedAggregate策略用于sort grouping时哈希表的数据倒腾。
使用时对用户完全透明,同时优化器在使用hash agg, multi hashtable,时,会尽量的减少重复SORT。
总而言之,grouping set多维分析会更快(即使包含排序),更省内存。
Support hashed aggregation with grouping sets.
This extends the Aggregate node with two new features:
HashAggregate can now run multiple hashtables concurrently,
and a new strategy MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
例子
+explain (costs off) select a, b, grouping(a,b), sum(v), count(*), max(v)
+ from gstest1 group by grouping sets ((a),(b)) order by 3,1,2;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------
+ Sort
+ Sort Key: (GROUPING("*VALUES*".column1, "*VALUES*".column2)), "*VALUES*".column1, "*VALUES*".column2
+ -> HashAggregate
+ Hash Key: "*VALUES*".column1
+ Hash Key: "*VALUES*".column2
+ -> Values Scan on "*VALUES*"
+(6 rows)
这个patch的讨论,详见邮件组,本文末尾URL。
PostgreSQL社区的作风非常严谨,一个patch可能在邮件组中讨论几个月甚至几年,根据大家的意见反复的修正,patch合并到master已经非常成熟,所以PostgreSQL的稳定性也是远近闻名的。