distinct xx和count(distinct xx)的变态递归优化方法

本文涉及的产品
RDS PostgreSQL Serverless,0.5-4RCU 50GB 3个月
推荐场景:
对影评进行热评分析
云数据库 RDS SQL Server,基础系列 2核4GB
RDS MySQL Serverless 基础系列,0.5-2RCU 50GB
简介: 今天要说的这个优化是从前面一篇讲解《performance tuning case :use cursor or trigger replace group by and order by》http://blog.163.com/digoal@126/blog/static/16387704020.

今天要说的这个优化是从前面一篇讲解《performance tuning case :use cursor or trigger replace group by and order by》
http://blog.163.com/digoal@126/blog/static/16387704020128142829610/
的延展.

CASE

例如一个表中有一个字段是性别, 这个表不管有多少条记录, 性别这个字段一般来说也就2个值
select count(distinct sex) from table; 得到的结果当然是2. 但是如果数据量很大的情况下, 这种运算就非常耗时, 下面来测试一下 :

PostgreSQL

测试表

digoal=> create table sex (sex char(1), otherinfo text);  
CREATE TABLE  

测试数据

digoal=> insert into sex select 'm', generate_series(1,10000000)||'this is test';  
INSERT 0 10000000  
digoal=> insert into sex select 'w', generate_series(1,10000000)||'this is test';  
INSERT 0 10000000  

测试SQL1

digoal=> \timing on  
digoal=> select count(distinct sex) from sex;  
 count   
-------  
     2  
(1 row)  
Time: 47254.221 ms  

测试SQL2

digoal=> select sex from sex t group by sex;  
 sex   
-----  
 w  
 m  
(2 rows)  
Time: 6534.433 ms  

执行计划

digoal=> explain select count(distinct sex) from sex;  
                             QUERY PLAN                                
---------------------------------------------------------------------  
 Aggregate  (cost=377385.25..377385.26 rows=1 width=2)  
   ->  Seq Scan on sex  (cost=0.00..327386.00 rows=19999700 width=2)  
  
digoal=> explain select sex from sex t group by sex;  
                              QUERY PLAN                                 
-----------------------------------------------------------------------  
 HashAggregate  (cost=377385.25..377385.27 rows=2 width=2)  
   ->  Seq Scan on sex t  (cost=0.00..327386.00 rows=19999700 width=2)  

创建索引

digoal=> create index idx_sex_1 on sex(sex);  
CREATE INDEX  
digoal=> set enable_seqscan=off;  
SET  

使用索引后的执行计划, PostgreSQL可以使用Index Only Scan.

digoal=> explain select count(distinct sex) from sex;  
                                         QUERY PLAN                                           
--------------------------------------------------------------------------------------------  
 Aggregate  (cost=532235.01..532235.02 rows=1 width=2)  
   ->  Index Only Scan using idx_sex_1 on sex  (cost=0.00..482234.97 rows=20000016 width=2)  
  
digoal=> explain select sex from sex t group by sex;  
                                          QUERY PLAN                                            
----------------------------------------------------------------------------------------------  
 Group  (cost=0.00..532235.01 rows=2 width=2)  
   ->  Index Only Scan using idx_sex_1 on sex t  (cost=0.00..482234.97 rows=20000016 width=2)  

创建索引后SQL耗时

digoal=> select count(distinct sex) from sex;  
 count   
-------  
     2  
(1 row)  
Time: 49589.947 ms  
  
digoal=> select sex from sex t group by sex;  
 sex   
-----  
 m  
 w  
(2 rows)  
Time: 6608.053 ms  

O

测试表

SQL> create table sex(sex char(1), otherinfo varchar2(64));  
Table created.  

测试数据

SQL> insert into sex select 'm', rownum||'this is test' from dual connect by level <=10000001;  
10000001 rows created.  
SQL> commit;  
Commit complete.  
SQL> insert into sex select 'w', rownum||'this is test' from dual connect by level <=10000001;  
10000001 rows created.  
SQL> commit;  
Commit complete.  

测试SQL1:

SQL> set autotrace on  
SQL> set timing on  
SQL> select count(distinct sex) from sex;  
COUNT(DISTINCTSEX)  
------------------  
                 2  
Elapsed: 00:00:03.62  
  
Execution Plan  
----------------------------------------------------------  
Plan hash value: 2096505595  
---------------------------------------------------------------------------  
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------  
|   0 | SELECT STATEMENT   |      |     1 |     3 | 13106   (3)| 00:02:38 |  
|   1 |  SORT GROUP BY     |      |     1 |     3 |            |          |  
|   2 |   TABLE ACCESS FULL| SEX  |    24M|    69M| 13106   (3)| 00:02:38 |  
---------------------------------------------------------------------------  
Note  
-----  
   - dynamic sampling used for this statement  
Statistics  
----------------------------------------------------------  
          0  recursive calls  
          0  db block gets  
      74074  consistent gets  
          0  physical reads  
          0  redo size  
        525  bytes sent via SQL*Net to client  
        487  bytes received via SQL*Net from client  
          2  SQL*Net roundtrips to/from client  
          1  sorts (memory)  
          0  sorts (disk)  
          1  rows processed  

测试SQL2

SQL> select sex from sex t group by sex;  
S  
-  
w  
m  
Elapsed: 00:00:03.23  
Execution Plan  
----------------------------------------------------------  
Plan hash value: 2807610159  
---------------------------------------------------------------------------  
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------  
|   0 | SELECT STATEMENT   |      |    24M|    69M| 14908  (14)| 00:02:59 |  
|   1 |  HASH GROUP BY     |      |    24M|    69M| 14908  (14)| 00:02:59 |  
|   2 |   TABLE ACCESS FULL| SEX  |    24M|    69M| 13106   (3)| 00:02:38 |  
---------------------------------------------------------------------------  
Note  
-----  
   - dynamic sampling used for this statement  
Statistics  
----------------------------------------------------------  
          0  recursive calls  
          0  db block gets  
      74074  consistent gets  
          0  physical reads  
          0  redo size  
        563  bytes sent via SQL*Net to client  
        487  bytes received via SQL*Net from client  
          2  SQL*Net roundtrips to/from client  
          0  sorts (memory)  
          0  sorts (disk)  
          2  rows processed  

创建索引

SQL> create index idx_sex_1 on sex(sex);  
Index created.  
Elapsed: 00:00:33.40  

创建索引后的测试, 执行时间没有明显变化.

SQL> select count(distinct sex) from sex;  
COUNT(DISTINCTSEX)  
------------------  
                 2  
Elapsed: 00:00:04.32  
Execution Plan  
----------------------------------------------------------  
Plan hash value: 1805173869  
-----------------------------------------------------------------------------------  
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |  
-----------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT      |           |     1 |     3 |  6465   (3)| 00:01:18 |  
|   1 |  SORT GROUP BY        |           |     1 |     3 |            |          |  
|   2 |   INDEX FAST FULL SCAN| IDX_SEX_1 |    24M|    69M|  6465   (3)| 00:01:18 |  
-----------------------------------------------------------------------------------  
Note  
-----  
   - dynamic sampling used for this statement  
Statistics  
----------------------------------------------------------  
          5  recursive calls  
          0  db block gets  
      36421  consistent gets  
      36300  physical reads  
          0  redo size  
        525  bytes sent via SQL*Net to client  
        487  bytes received via SQL*Net from client  
          2  SQL*Net roundtrips to/from client  
          1  sorts (memory)  
          0  sorts (disk)  
          1  rows processed  
SQL> select sex from sex t group by sex;  
S  
-  
w  
m  
Elapsed: 00:00:03.21  
Execution Plan  
----------------------------------------------------------  
Plan hash value: 2807610159  
---------------------------------------------------------------------------  
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------  
|   0 | SELECT STATEMENT   |      |    24M|    69M| 14908  (14)| 00:02:59 |  
|   1 |  HASH GROUP BY     |      |    24M|    69M| 14908  (14)| 00:02:59 |  
|   2 |   TABLE ACCESS FULL| SEX  |    24M|    69M| 13106   (3)| 00:02:38 |  
---------------------------------------------------------------------------  
Note  
-----  
   - dynamic sampling used for this statement  
Statistics  
----------------------------------------------------------  
          5  recursive calls  
          0  db block gets  
      74170  consistent gets  
          0  physical reads  
          0  redo size  
        563  bytes sent via SQL*Net to client  
        487  bytes received via SQL*Net from client  
          2  SQL*Net roundtrips to/from client  
          0  sorts (memory)  
          0  sorts (disk)  
          2  rows processed  

对比以上测试, O的性能要明显优于PostgreSQL.
将count(distinct sex)修改如下后PostgreSQL的执行速度有明显改善, 但是性能还是低于O一截, 约一半.

digoal=> select count(*) from (select sex from sex t group by sex) t;  
 count   
-------  
     2  
(1 row)  
Time: 6231.965 ms  

开始优化咯

那么如何优化呢?
在PostgreSQL中的递归SQL在这里就派上大用场了, 结合btree索引扫描. 性能可以提升几万倍.
来看如下优化过程 :
创建测试表 :

create table user_download_log (user_id int not null, listid int not null, apkid int not null, get_time timestamp(0) not null, otherinfo text);  

插入测试数据

insert into user_download_log select generate_series(0,10000000),generate_series(0,10000000),generate_series(0,10000000),generate_series(clock_timestamp(),clock_timestamp()+interval '10000000 min',interval '1 min'), 'this is test';  

创建索引 :

create index i1 on user_download_log (user_id);  
create index i2 on user_download_log (otherinfo);  

查看数据分布 :
用来说明递归SQL适合哪种场景的优化.

select count(distinct user_id), count(distinct otherinfo) from user_download_log;  
  count   | count   
----------+-------  
 10000001 |     1  

查看未优化时以下SQL的执行计划以及耗时.

digoal=> explain analyze select count(distinct otherinfo) from user_download_log;  
                                                               QUERY PLAN                                                             
      
------------------------------------------------------------------------------------------------------------------------------------  
----  
 Aggregate  (cost=208334.36..208334.37 rows=1 width=13) (actual time=6295.493..6295.494 rows=1 loops=1)  
   ->  Seq Scan on user_download_log  (cost=0.00..183334.29 rows=10000029 width=13) (actual time=0.014..1612.333 rows=10000001 loops  
=1)  
 Total runtime: 6295.550 ms  

优化后的SQL :

digoal=> with recursive skip as (  
digoal(>   (  
digoal(>     select min(t.otherinfo) as otherinfo from user_download_log t where t.otherinfo is not null  
digoal(>   )  
digoal(>   union all  
digoal(>   (  
digoal(>     select (select min(t.otherinfo) from user_download_log t where t.otherinfo > s.otherinfo and t.otherinfo is not null)   
digoal(>       from skip s where s.otherinfo is not null  
digoal(>   )  -- 这里的where s.otherinfo is not null 一定要加,否则就死循环了.  
digoal(> )   
digoal-> select count(distinct otherinfo) from skip;  
 count   
-------  
     1  
(1 row)  

优化后的SQL执行计划以及耗时, 性能提升了36390倍, 相比O也提升了上万倍.

digoal=> explain analyze with recursive skip as (  
  (  
    select min(t.otherinfo) as otherinfo from user_download_log t where t.otherinfo is not null  
  )  
  union all  
  (  
    select (select min(t.otherinfo) from user_download_log t where t.otherinfo > s.otherinfo and t.otherinfo is not null)   
      from skip s where s.otherinfo is not null  
  )  -- 这里的where s.otherinfo is not null 一定要加,否则就死循环了.  
)   
select count(distinct otherinfo) from skip;  
                                                                                 QUERY PLAN                                           
                                           
------------------------------------------------------------------------------------------------------------------------------------  
-----------------------------------------  
 Aggregate  (cost=10.55..10.56 rows=1 width=32) (actual time=0.094..0.094 rows=1 loops=1)  
   CTE skip  
     ->  Recursive Union  (cost=0.03..8.28 rows=101 width=32) (actual time=0.044..0.073 rows=2 loops=1)  
           ->  Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.042..0.042 rows=1 loops=1)  
                 InitPlan 1 (returns $1)  
                   ->  Limit  (cost=0.00..0.03 rows=1 width=13) (actual time=0.038..0.039 rows=1 loops=1)  
                         ->  Index Only Scan using i2 on user_download_log t  (cost=0.00..296844.61 rows=10000029 width=13) (actual   
time=0.037..0.037 rows=1 loops=1)  
                               Index Cond: (otherinfo IS NOT NULL)  
                               Heap Fetches: 1  
           ->  WorkTable Scan on skip s  (cost=0.00..0.62 rows=10 width=32) (actual time=0.013..0.013 rows=0 loops=2)  
                 Filter: (otherinfo IS NOT NULL)  
                 Rows Removed by Filter: 0  
                 SubPlan 3  
                   ->  Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.018..0.018 rows=1 loops=1)  
                         InitPlan 2 (returns $3)  
                           ->  Limit  (cost=0.00..0.03 rows=1 width=13) (actual time=0.017..0.017 rows=0 loops=1)  
                                 ->  Index Only Scan using i2 on user_download_log t  (cost=0.00..107284.96 rows=3333343 width=13) (  
actual time=0.015..0.015 rows=0 loops=1)  
                                       Index Cond: ((otherinfo > s.otherinfo) AND (otherinfo IS NOT NULL))  
                                       Heap Fetches: 0  
   ->  CTE Scan on skip  (cost=0.00..2.02 rows=101 width=32) (actual time=0.047..0.077 rows=2 loops=1)  
 Total runtime: 0.173 ms  
(21 rows)  

换一个字段, 数据分布广泛的字段上使用以上优化方法, 看是否妥当, 以下是原始SQL的执行计划以及耗时 :

digoal=> explain analyze select count(distinct user_id) from user_download_log;  
                                                              QUERY PLAN                                                              
     
------------------------------------------------------------------------------------------------------------------------------------  
---  
 Aggregate  (cost=208334.36..208334.37 rows=1 width=4) (actual time=4008.858..4008.858 rows=1 loops=1)  
   ->  Seq Scan on user_download_log  (cost=0.00..183334.29 rows=10000029 width=4) (actual time=0.014..1606.607 rows=10000001 loops=  
1)  
 Total runtime: 4008.916 ms  

换一个字段, 数据分布广泛的字段上使用以上优化方法, 看是否妥当, 以下是采用递归SQL后的执行计划以及耗时 :
显然性能是下降的, 所以使用递归SQL不适合数据分布广泛的字段的group by或者count(distinct)操作.

digoal=> explain analyze with recursive skip as (  
  (  
    select min(t.user_id) as user_id from user_download_log t where t.user_id is not null  
  )  
  union all  
  (  
    select (select min(t.user_id) from user_download_log t where t.user_id > s.user_id and t.user_id is not null)   
      from skip s where s.user_id is not null  
  )  -- 这里的where s.user_id is not null 一定要加,否则就死循环了.  
)   
select count(distinct user_id) from skip;  
                                                                                    QUERY PLAN                                        
                                                 
------------------------------------------------------------------------------------------------------------------------------------  
-----------------------------------------------  
 Aggregate  (cost=10.44..10.45 rows=1 width=4) (actual time=186741.338..186741.339 rows=1 loops=1)  
   CTE skip  
     ->  Recursive Union  (cost=0.03..8.17 rows=101 width=4) (actual time=0.047..178296.238 rows=10000002 loops=1)  
           ->  Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.046..0.046 rows=1 loops=1)  
                 InitPlan 1 (returns $1)  
                   ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual time=0.042..0.042 rows=1 loops=1)  
                         ->  Index Only Scan using i1 on user_download_log t  (cost=0.00..285759.50 rows=10000029 width=4) (actual t  
ime=0.040..0.040 rows=1 loops=1)  
                               Index Cond: (user_id IS NOT NULL)  
                               Heap Fetches: 1  
           ->  WorkTable Scan on skip s  (cost=0.00..0.61 rows=10 width=4) (actual time=0.017..0.017 rows=1 loops=10000002)  
                 Filter: (user_id IS NOT NULL)  
                 Rows Removed by Filter: 0  
                 SubPlan 3  
                   ->  Result  (cost=0.03..0.04 rows=1 width=0) (actual time=0.016..0.016 rows=1 loops=10000001)  
                         InitPlan 2 (returns $3)  
                           ->  Limit  (cost=0.00..0.03 rows=1 width=4) (actual time=0.015..0.015 rows=1 loops=10000001)  
                                 ->  Index Only Scan using i1 on user_download_log t  (cost=0.00..103588.85 rows=3333343 width=4) (a  
ctual time=0.014..0.014 rows=1 loops=10000001)  
                                       Index Cond: ((user_id > s.user_id) AND (user_id IS NOT NULL))  
                                       Heap Fetches: 10000000  
   ->  CTE Scan on skip  (cost=0.00..2.02 rows=101 width=4) (actual time=0.050..183449.391 rows=10000002 loops=1)  
 Total runtime: 186909.323 ms  
(21 rows)  
Time: 186910.482 ms  

以下是同样的数据结构以及测试数据在O下的测试.

SQL> create table test (id int, otherinfo varchar2(32)) nologging;  
Table created.  
SQL> insert into test select rownum,'this is test' from dual connect by level <=10000001;  
10000001 rows created.  
SQL> commit;  
SQL> create index i1 on test(id);  
SQL> create index i2 on test(otherinfo);  
SQL> explain plan for select count(distinct id) from test;  
Explained.  
SQL> select * from table(dbms_xplan.display());  
PLAN_TABLE_OUTPUT  
--------------------------------------------------------------------------------------------------------------------------------------------  
Plan hash value: 1403727100  
------------------------------------------------------------------------------  
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |  
------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT      |      |     1 |    13 |  4178   (3)| 00:00:51 |  
|   1 |  SORT GROUP BY        |      |     1 |    13 |            |          |  
|   2 |   INDEX FAST FULL SCAN| I1   |  9834K|   121M|  4178   (3)| 00:00:51 |  
------------------------------------------------------------------------------  
Note  
-----  
   - dynamic sampling used for this statement  
13 rows selected.  
SQL> explain plan for select count(distinct otherinfo) from test;  
Explained.  
SQL> select * from table(dbms_xplan.display());  
PLAN_TABLE_OUTPUT  
--------------------------------------------------------------------------------------------------------------------------------------------  
Plan hash value: 2603667166  
---------------------------------------------------------------------------  
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------  
|   0 | SELECT STATEMENT   |      |     1 |    18 |  5837   (3)| 00:01:11 |  
|   1 |  SORT GROUP BY     |      |     1 |    18 |            |          |  
|   2 |   TABLE ACCESS FULL| TEST |  9834K|   168M|  5837   (3)| 00:01:11 |  
---------------------------------------------------------------------------  
Note  
-----  
   - dynamic sampling used for this statement  
13 rows selected.  
SQL> set timing on  
SQL> select count(distinct otherinfo) from test;  
COUNT(DISTINCTOTHERINFO)  
------------------------  
                       1  
Elapsed: 00:00:02.49  
SQL> select count(distinct id) from test;  
COUNT(DISTINCTID)  
-----------------  
         10000001  
Elapsed: 00:00:07.13  

从执行耗时可以看出PostgreSQL在数据分布稀疏的字段上使用递归SQL优化后的性能相比O有41213倍的性能提升.

补充

递归查询中不允许使用聚合函数 :

with recursive skip as (  
  (  
    select min(t.otherinfo) as otherinfo from user_download_log t where t.otherinfo is not null  
  )  
  union all  
  (  
    select min(t.otherinfo) from user_download_log t, skip s   
      where t.otherinfo > s.otherinfo   
      and t.otherinfo is not null  
      and s.otherinfo is not null  
  )  -- 这里的where s.otherinfo is not null 一定要加,否则就死循环了.  
)   
select * from skip;  
ERROR:  aggregate functions not allowed in a recursive query's recursive term  
LINE 7:     select min(t.otherinfo) from user_download_log t, skip s...  
                   ^  
Time: 0.581 ms  

修改如下即可 :

with recursive skip as (  
  (  
    select min(t.otherinfo) as otherinfo from user_download_log t where t.otherinfo is not null  
  )  
  union all  
  (  
    select (select min(t.otherinfo) from user_download_log t where t.otherinfo > s.otherinfo and t.otherinfo is not null)   
      from skip s where s.otherinfo is not null  
  )  -- 这里的where s.otherinfo is not null 一定要加,否则就死循环了.  
)   
select * from skip;  

细心的朋友发现O测试中未对表进行分析, 以下是分析后的结果, 执行计划无变化 :

SQL> analyze table sex estimate statistics for all columns sample 10 percent;  
Table analyzed.  
SQL> analyze index idx_sex_1 estimate statistics sample 10 percent;  
Index analyzed.  
SQL> select sex from sex t group by sex;  
  
S  
-  
w  
m  
  
Elapsed: 00:00:03.17  
  
Execution Plan  
----------------------------------------------------------  
Plan hash value: 2807610159  
  
---------------------------------------------------------------------------  
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------  
|   0 | SELECT STATEMENT   |      |     2 |     2 | 14519  (12)| 00:02:55 |  
|   1 |  HASH GROUP BY     |      |     2 |     2 | 14519  (12)| 00:02:55 |  
|   2 |   TABLE ACCESS FULL| SEX  |    20M|    19M| 13062   (2)| 00:02:37 |  
---------------------------------------------------------------------------  
  
  
Statistics  
----------------------------------------------------------  
          1  recursive calls  
          0  db block gets  
      74074  consistent gets  
          0  physical reads  
          0  redo size  
        563  bytes sent via SQL*Net to client  
        487  bytes received via SQL*Net from client  
          2  SQL*Net roundtrips to/from client  
          0  sorts (memory)  
          0  sorts (disk)  
          2  rows processed  
  
SQL> select count(distinct sex) from sex;  
  
COUNT(DISTINCTSEX)  
------------------  
                 2  
  
Elapsed: 00:00:03.85  
  
Execution Plan  
----------------------------------------------------------  
Plan hash value: 1805173869  
  
-----------------------------------------------------------------------------------  
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |  
-----------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT      |           |     1 |     1 |  6454   (3)| 00:01:18 |  
|   1 |  SORT GROUP BY        |           |     1 |     1 |            |          |  
|   2 |   INDEX FAST FULL SCAN| IDX_SEX_1 |    20M|    19M|  6454   (3)| 00:01:18 |  
-----------------------------------------------------------------------------------  
  
  
Statistics  
----------------------------------------------------------  
          1  recursive calls  
          0  db block gets  
      36325  consistent gets  
          0  physical reads  
          0  redo size  
        525  bytes sent via SQL*Net to client  
        487  bytes received via SQL*Net from client  
          2  SQL*Net roundtrips to/from client  
          1  sorts (memory)  
          0  sorts (disk)  
          1  rows processed  

O在这类应用场景中还有一个选择,使用位图索引。
摘录一段O位图索引的介绍
位图索引 Bitmap index
场合:列的基数很少,可枚举,重复值很多,数据不会被经常更新
原理:一个键值对应很多行(rowid), 格式:键值 start_rowid end_rowid 位图
优点:OLAP 例如报表类数据库 重复率高的数据 特定类型的查询例如count、or、and等逻辑操作因为只需要进行位运算即可得到我们需要的结果
缺点:不适合重复率低的字段,还有经常DML操作(insert,update,delete),因为位图索引的锁代价极高,修改一个位图索引段影响整个位图段,例如修改
一个键值,会影响同键值的多行,所以对于OLTP 系统位图索引基本上是不适用的

因bitmap在OLTP使用场景较少,PostgreSQL 没有实现这个类型的索引。
http://www.postgresql.org/message-id/flat/27879.1098227105@sss.pgh.pa.us#27879.1098227105@sss.pgh.pa.us
https://en.wikipedia.org/wiki/Bitmap_index
http://grokbase.com/t/postgresql/pgsql-hackers/051xeh5b0a/implementing-bitmap-indexes
想了解更多PG索引的情况,请参考
http://leopard.in.ua/2015/04/13/postgresql-indexes

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
7月前
|
存储 算法 C++
带用查找算法-countc++讲解
带用查找算法-countc++讲解
78 2
|
SQL 架构师 关系型数据库
不可置信!SQL 优化终于干掉了“distinct”
sql 优化之多表联合查询干掉 “distinct” 去重关键字 在我提交了代码的时候,架构师给我指出我这个sql这样写会有问题。因为在分库分表的时候,是不支持子查询的。 所以需要把多表的子查询的 sql 结构进行优化。 是不是挺恐怖的;(此处为了脱敏,我把相关的 sql 关键词都给打码掉了)
|
SQL 算法 搜索推荐
为排序使用索引 OrderBy 优化|学习笔记
快速学习为排序使用索引 OrderBy 优化
216 0
为排序使用索引 OrderBy 优化|学习笔记
count去重和distinct去重
count去重和distinct去重
6997 0
|
SQL 关系型数据库 MySQL
索引合并Intersection、union (3)--单表访问方法(三十八)
索引合并Intersection、union (3)--单表访问方法(三十八)
|
SQL 数据库
除了会排序,你对ORDER BY的用法可能一无所知!(一)
小伙伴们在进行SQL排序时,都能很自然的使用到ORDER BY。不管是默认ASC的升序,还是DESC降序,几乎都是信手拈来。 今天给大家分享一些你可能不知道的ORDER BY用法。
除了会排序,你对ORDER BY的用法可能一无所知!(一)
|
XML SQL 数据库
除了会排序,你对ORDER BY的用法可能一无所知!(二)
小伙伴们在进行SQL排序时,都能很自然的使用到ORDER BY。不管是默认ASC的升序,还是DESC降序,几乎都是信手拈来。 今天给大家分享一些你可能不知道的ORDER BY用法。
除了会排序,你对ORDER BY的用法可能一无所知!(二)
|
索引 SQL
Select Count 聚合函数三种不同写法区别与效率性能比较
Select Count 聚合函数三种不同写法区别与效率性能比较
509 0
Select Count 聚合函数三种不同写法区别与效率性能比较
|
关系型数据库 MySQL 数据库
select distinct去掉重复查询结果|学习笔记
快速学习select distinct去掉重复查询结果
272 0
|
算法 搜索推荐 关系型数据库
Orderby 排序优化
在日常的业务开发中,order by 排序是少不了的。但要写出高效的排序SQL,需要先花点精力和时间来了解排序的底层原理,这样才能找到优化排序的好策略。
1312 0
Orderby 排序优化