PostgreSQL 9.6 内核优化 - sort 性能增强

本文涉及的产品
云数据库 RDS MySQL,集群系列 2核4GB
推荐场景:
搭建个人博客
云原生数据库 PolarDB 分布式版,标准版 2核8GB
云数据库 RDS SQL Server,基础系列 2核4GB
简介:

PostgreSQL 9.6 内核优化 - sort 性能增强

作者

digoal

日期

2016-10-09

标签

PostgreSQL , sort , 内核优化


背景

PostgreSQL 9.6在排序这块做了一些性能增强,前面一篇主要讲了排序算法的变更。

《PostgreSQL 9.6 内核优化 - sort性能增强(batch化quicksort代替replacement selection when work_mem small)》

本文针对另外几个SORT增强的点再简单介绍一下。

PostgreSQL 9.6 排序增强点

1. 文本排序性能增强,当相同的文本多次出现时,性能有所增强。

Speed up text sorts where the same string occurs multiple times (Peter Geoghegan)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0e57b4d8bd9674adaf5747421b3255b85e385534

Speed up text sorts where the same strings occur multiple times.

Cache strxfrm() blobs across calls made to the text SortSupport abbreviation routine.  
This can speed up sorting if the same string needs to be abbreviated many times in a row.

Also, cache the result of the previous strcoll() comparison, so that if we're asked to compare the same strings agin, we do need to call strcoll() again.

Perhaps surprisingly, these optimizations don't seem to hurt even when they don't help.  
memcmp() is really cheap compared to strcoll() or strxfrm().

Peter Geoghegan, reviewed by me.  

2. 对uuid,bytea,char(n)的排序性能增强,使用abb keys, 整型比较算法取代memcmp。

Speed up sorting of uuid, bytea, and char(n) fields by using "abbreviated" keys (Peter Geoghegan)

Support for abbreviated keys has also been added to the non-default operator classes text_pattern_ops, varchar_pattern_ops, and bpchar_pattern_ops.

Processing of ordered-set aggregates can also now exploit abbreviated keys.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a76ef15d9fc9207a0758e8d6f6700dc8c931a934

Add sort support routine for the UUID data type.

This introduces a simple encoding scheme to produce abbreviated keys:
pack as many bytes of each UUID as will fit into a Datum.  
On little-endian machines, a byteswap is also performed; 
the abbreviated comparator can therefore just consist of a simple 3-way unsigned integer comparison.

The purpose of this change is to speed up sorting data on a column of type UUID.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bfb54ff15a447fb22e9deae096e0d45b3e4bd56f

Make abbreviated key comparisons for text a bit cheaper.

If we do some byte-swapping while abbreviating, 
we can do comparisons using integer arithmetic rather than memcmp.

3. 并行创建索引(非堵塞式)的性能增强

Speed up CREATE INDEX CONCURRENTLY by treating TIDs as 64-bit integers during sorting (Peter Geoghegan)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b648b70342fbe712383e8cd76dc8f7feaba9aaa3

Speed up CREATE INDEX CONCURRENTLY's TID sort. 

Encode TIDs as 64-bit integers to speed up comparisons.   
This seems to speed things up on all platforms, but is even more beneficial when 8-byte integers are passed by value. 

Peter Geoghegan.  Design suggestions and review by Tom Lane.  
Review also by Simon Riggs and by me. 

测试例子

1. 高度重复文本排序

1000万记录

postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test select generate_series(1,10000000), '111111111111111111111111111111111111111111111111111111111111';
INSERT 0 10000000

1.1 PostgreSQL 9.5

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test order by info ;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1854886.10..1879886.14 rows=10000017 width=65) (actual time=9035.319..10444.277 rows=10000000 loops=1)
   Output: id, info
   Sort Key: test.info
   Sort Method: external sort  Disk: 733144kB
   Buffers: shared hit=123457, temp read=91643 written=91643
   ->  Seq Scan on public.test  (cost=0.00..223457.17 rows=10000017 width=65) (actual time=0.007..1135.602 rows=10000000 loops=1)
         Output: id, info
         Buffers: shared hit=123457
 Planning time: 0.095 ms
 Execution time: 11151.855 ms
(10 rows)

排序耗时7.9秒

1.2 PostgreSQL 9.6

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test order by info ;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4544431.28..4612332.63 rows=27160540 width=9) (actual time=8141.378..9660.753 rows=10000000 loops=1)
   Output: id, info
   Sort Key: test.info
   Sort Method: external sort  Disk: 733144kB
   Buffers: shared hit=123457, temp read=91643 written=91643
   ->  Seq Scan on public.test  (cost=0.00..395062.40 rows=27160540 width=9) (actual time=0.011..1188.940 rows=10000000 loops=1)
         Output: id, info
         Buffers: shared hit=123457
 Planning time: 0.053 ms
 Execution time: 10387.210 ms
(10 rows)

排序耗时6.9秒

2. 离散文本排序

1000万记录

postgres=# create table test_rand(id int, info text);
CREATE TABLE
postgres=# insert into test_rand select generate_series(1,10000000), md5(random()::text);
INSERT 0 10000000

2.1 PostgreSQL 9.5

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test_rand order by info ;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1754736.47..1781195.02 rows=10583418 width=36) (actual time=21266.183..26347.778 rows=10000000 loops=1)
   Output: id, info
   Sort Key: test_rand.info
   Sort Method: external merge  Disk: 459464kB
   Buffers: shared hit=83334, temp read=150352 written=150352
   ->  Seq Scan on public.test_rand  (cost=0.00..189168.18 rows=10583418 width=36) (actual time=0.013..1363.646 rows=10000000 loops=1)
         Output: id, info
         Buffers: shared hit=83334
 Planning time: 0.069 ms
 Execution time: 27005.070 ms
(10 rows)

排序耗时19.9秒

2.2 PostgreSQL 9.6

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test_rand order by info ;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1658485.88..1683485.52 rows=9999858 width=37) (actual time=16424.010..21024.141 rows=10000000 loops=1)
   Output: id, info
   Sort Key: test_rand.info
   Sort Method: external merge  Disk: 459464kB
   Buffers: shared hit=83334, temp read=179142 written=179142
   ->  Seq Scan on public.test_rand  (cost=0.00..183332.58 rows=9999858 width=37) (actual time=0.011..1025.182 rows=10000000 loops=1)
         Output: id, info
         Buffers: shared hit=83334
 Planning time: 0.053 ms
 Execution time: 21691.354 ms
(10 rows)

排序耗时15秒

3. char(n)排序

1000万记录

postgres=# create table char_32(id char(32));
CREATE TABLE
postgres=# insert into char_32 select md5(random()::text) from generate_series(1,10000000);
INSERT 0 10000000

3.1 PostgreSQL 9.5

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from char_32 order by id;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=925778.75..936403.83 rows=4250034 width=132) (actual time=29156.470..35327.059 rows=10000000 loops=1)
   Output: id
   Sort Key: char_32.id
   Sort Method: external merge  Disk: 420408kB
   Buffers: shared hit=83339, temp read=137591 written=137591
   ->  Seq Scan on public.char_32  (cost=0.00..125834.34 rows=4250034 width=132) (actual time=0.009..1324.482 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=83334
 Planning time: 0.085 ms
 Execution time: 35961.840 ms
(10 rows)

排序耗时27.8秒

3.2 PostgreSQL 9.6

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from char_32 order by id;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1658523.51..1683523.71 rows=10000080 width=33) (actual time=17423.580..22348.735 rows=10000000 loops=1)
   Output: id
   Sort Key: char_32.id
   Sort Method: external merge  Disk: 420432kB
   Buffers: shared hit=83339, temp read=163911 written=163911
   ->  Seq Scan on public.char_32  (cost=0.00..183334.80 rows=10000080 width=33) (actual time=0.019..1203.205 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=83336
 Planning time: 0.200 ms
 Execution time: 23013.343 ms
(10 rows)

排序耗时16.2秒

4. 非堵塞式创建索引

5000万随机数据

postgres=# create table curr_idx_test(id int);
CREATE TABLE
postgres=# insert into curr_idx_test select 50000000*random() from generate_series(1,50000000);
INSERT 0 50000000
postgres=# \dt+ curr_idx_test 
                         List of relations
 Schema |     Name      | Type  |  Owner   |  Size   | Description 
--------+---------------+-------+----------+---------+-------------
 public | curr_idx_test | table | postgres | 1729 MB | 
(1 row)

4.1 PostgreSQL 9.5

postgres=# set maintenance_work_mem='16GB';
SET
postgres=# \timing
Timing is on.
postgres=# create index CONCURRENTLY idx1_curr_idx_test on curr_idx_test(id);
CREATE INDEX
Time: 92591.646 ms

4.2 PostgreSQL 9.6

postgres=# set maintenance_work_mem='16GB';
SET
postgres=# \timing
Timing is on.
postgres=# create index CONCURRENTLY idx1_curr_idx_test on curr_idx_test(id);
CREATE INDEX
Time: 56483.672 ms

5. 堵塞式创建索引

5.1 PostgreSQL 9.5

postgres=# create index idx2_curr_idx_test on curr_idx_test(id);
CREATE INDEX
Time: 40733.880 ms

5.2 PostgreSQL 9.6

postgres=# create index idx2_curr_idx_test on curr_idx_test(id);
CREATE INDEX
Time: 40713.340 ms

结果对比

1. 高度重复文本排序(1000万记录)
PostgreSQL 9.5 排序耗时7.9秒
PostgreSQL 9.6 排序耗时6.9秒

2. 离散文本排序(1000万记录),quicksort的优化
PostgreSQL 9.5 排序耗时19.9秒
PostgreSQL 9.6 排序耗时15秒

3. char(n)排序(1000万记录)
PostgreSQL 9.5 排序耗时27.8秒
PostgreSQL 9.6 排序耗时16.2秒

4. 非堵塞式创建索引(5000万记录)
PostgreSQL 9.5 排序耗时92.6秒
PostgreSQL 9.6 排序耗时56.5秒

5. 堵塞式创建索引(5000万记录)
PostgreSQL 9.5 排序耗时40.7秒
PostgreSQL 9.6 排序耗时40.7秒

Count

相关实践学习
使用PolarDB和ECS搭建门户网站
本场景主要介绍基于PolarDB和ECS实现搭建门户网站。
阿里云数据库产品家族及特性
阿里云智能数据库产品团队一直致力于不断健全产品体系,提升产品性能,打磨产品功能,从而帮助客户实现更加极致的弹性能力、具备更强的扩展能力、并利用云设施进一步降低企业成本。以云原生+分布式为核心技术抓手,打造以自研的在线事务型(OLTP)数据库Polar DB和在线分析型(OLAP)数据库Analytic DB为代表的新一代企业级云原生数据库产品体系, 结合NoSQL数据库、数据库生态工具、云原生智能化数据库管控平台,为阿里巴巴经济体以及各个行业的企业客户和开发者提供从公共云到混合云再到私有云的完整解决方案,提供基于云基础设施进行数据从处理、到存储、再到计算与分析的一体化解决方案。本节课带你了解阿里云数据库产品家族及特性。
目录
相关文章
|
1月前
|
存储 关系型数据库 MySQL
提高MySQL查询性能的方法有很多
提高MySQL查询性能的方法有很多
138 7
|
1月前
|
存储 关系型数据库 MySQL
提高MySQL的查询性能
提高MySQL的查询性能
62 4
|
2月前
|
SQL 关系型数据库 MySQL
MySQL 8.0:filesort 性能退化的问题分析
用户将 RDS MySQL 实例从 5.6 升级到 8.0 后,发现相同 SQL 的执行时间增长了十几倍。本文就该问题逐步展开排查,并最终定位根因。
|
3天前
|
缓存 监控 关系型数据库
如何根据监控结果调整 MySQL 数据库的参数以提高性能?
【10月更文挑战第28天】根据MySQL数据库的监控结果来调整参数以提高性能,需要综合考虑多个方面的因素
29 1
|
3天前
|
监控 关系型数据库 MySQL
如何监控和诊断 MySQL 数据库的性能问题?
【10月更文挑战第28天】监控和诊断MySQL数据库的性能问题是确保数据库高效稳定运行的关键
8 1
|
3天前
|
缓存 关系型数据库 MySQL
如何优化 MySQL 数据库的性能?
【10月更文挑战第28天】
13 1
|
15天前
|
存储 关系型数据库 MySQL
优化 MySQL 的锁机制以提高并发性能
【10月更文挑战第16天】优化 MySQL 锁机制需要综合考虑多个因素,根据具体的应用场景和需求进行针对性的调整。通过不断地优化和改进,可以提高数据库的并发性能,提升系统的整体效率。
18 1
|
4天前
|
监控 关系型数据库 MySQL
数据库优化:MySQL索引策略与查询性能调优实战
【10月更文挑战第27天】本文深入探讨了MySQL的索引策略和查询性能调优技巧。通过介绍B-Tree索引、哈希索引和全文索引等不同类型,以及如何创建和维护索引,结合实战案例分析查询执行计划,帮助读者掌握提升查询性能的方法。定期优化索引和调整查询语句是提高数据库性能的关键。
20 0
|
5天前
|
监控 关系型数据库 MySQL
数据库优化:MySQL索引策略与查询性能调优实战
【10月更文挑战第26天】数据库作为现代应用系统的核心组件,其性能优化至关重要。本文主要探讨MySQL的索引策略与查询性能调优。通过合理创建索引(如B-Tree、复合索引)和优化查询语句(如使用EXPLAIN、优化分页查询),可以显著提升数据库的响应速度和稳定性。实践中还需定期审查慢查询日志,持续优化性能。
25 0
|
30天前
|
SQL 关系型数据库 MySQL
MySQL中的 where 1=1会不会影响性能?看完官方文档就悟了!
本文探讨了在Mybatis中使用`where 1=1`进行动态SQL拼接是否会影响性能。通过MySQL官方资料和实际测试表明,`where 1=1`在MySQL 5.7及以上版本中会被优化器优化,因此对性能影响不大。文中详细对比了`where 1=1`与`<where>`标签的使用方法,并建议根据MySQL版本和团队需求选择合适的方式。最后,推荐查找官方资料以确保技术路线正确。
30 4

相关产品

  • 云原生数据库 PolarDB
  • 云数据库 RDS PostgreSQL 版