开发者社区> 德哥> 正文
阿里云
为了无法计算的价值
打开APP
阿里云APP内打开

PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析

简介:
+关注继续查看

PostgreSQL 9.6 vacuum freeze大幅性能提升 代码浅析

作者

digoal

日期

2016-10-02

标签

PostgreSQL , 9.6 , vacuum freeze , visibility map , skip frozen page


背景

PostgreSQL的tuple(即记录)头信息中有两个字段分别为XMIN,XMAX用于标记行产生与变更的事务号,以标示记录的版本号,事务的可见性等。

这个事务号是32BIT的长度,因此PG设计了一个事务存活的最长时间是约20亿,如果超过20亿必须将这个事务置为frozen。

被置为frozen状态的记录,对所有的事务可见,从而解决了32BIT的XID可以用于实现MVCC的目的。

因此PostgreSQL 需要周期性的对表进行扫描,检查是否需要将记录置为frozen。

PostgreSQL 9.4以前的版本,FROZEN是通过一个等于2的XID来表示的。

从9.4开始改成了通过tuple 头部的t_infomask中的两个互斥的比特位来表示HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID。

参见
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37484ad2aacef5ec794f4dd3d5cf814475180a78

有些插入操作,也可以直接将记录置为freeze,例如大批量的COPY数据。 insert into等。

src/include/access/heapam.h

/* "options" flag bits for heap_insert */
#define HEAP_INSERT_SKIP_WAL    0x0001
#define HEAP_INSERT_SKIP_FSM    0x0002
#define HEAP_INSERT_FROZEN              0x0004
#define HEAP_INSERT_SPECULATIVE 0x0008

表的全局年龄则记录在pg_class中,即使表没有任何变化,在年龄到达一定的值(参数配置)后,也需要发起frozen的动作,对表的记录进行扫描。

9.6对这块做了改进,当数据页中的所有记录已经是FROZEN状态时,在发起vacuum freeze时会跳过这个页的扫描,从而大幅提升静态数据的freeze操作,减少IO扫描。

目前这个page frozen标记放在表对应的VM文件中。

vacuum freeze改进代码分析

1. 如何将记录标记为freeze状态

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=37484ad2aacef5ec794f4dd3d5cf814475180a78

Change the way we mark tuples as frozen.

Instead of changing the tuple xmin to FrozenTransactionId, the combination
of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never
set together, is now defined as HEAP_XMIN_FROZEN.  A variety of previous
proposals to freeze tuples opportunistically before vacuum_freeze_min_age
is reached have foundered on the objection that replacing xmin by
FrozenTransactionId might hinder debugging efforts when things in this
area go awry; this patch is intended to solve that problem by keeping
the XID around (but largely ignoring the value to which it is set).

Third-party code that checks for HEAP_XMIN_INVALID on tuples where
HEAP_XMIN_COMMITTED might be set will be broken by this change.  To fix,
use the new accessor macros in htup_details.h rather than consulting the
bits directly.  HeapTupleHeaderGetXmin has been modified to return
FrozenTransactionId when the infomask bits indicate that the tuple is
frozen; use HeapTupleHeaderGetRawXmin when you already know that the
tuple isn't marked commited or frozen, or want the raw value anyway.
We currently do this in routines that display the xmin for user consumption,
in tqual.c where it's known to be safe and important for the avoidance of
extra cycles, and in the function-caching code for various procedural
languages, which shouldn't invalidate the cache just because the tuple
gets frozen.

Robert Haas and Andres Freund

src/include/access/htup_details.h

/*
 * information stored in t_infomask:
 */
#define HEAP_XMIN_FROZEN                (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

2. 9.6在VM文件中,新增了对FROZEN页的标记,当整页数据都是静态的时,并且都已经是freeze状态,则该页标记为FROZEN,在执行vacuum freeze或自动触发freeze时,就会跳过这个页的扫描。

src/backend/access/heap/visibilitymap.c

 * The visibility map is a bitmap with two bits (all-visible and all-frozen)
 * per heap page. A set all-visible bit means that all tuples on the page are
 * known visible to all transactions, and therefore the page doesn't need to
 * be vacuumed. A set all-frozen bit means that all tuples on the page are
 * completely frozen, and therefore the page doesn't need to be vacuumed even
 * if whole table scanning vacuum is required (e.g. anti-wraparound vacuum).
 * The all-frozen bit must be set only when the page is already all-visible.

src/include/access/visibilitymap.h

/* Number of bits for one heap page */
#define BITS_PER_HEAPBLOCK 2

/* Flags for bit map */
#define VISIBILITYMAP_ALL_VISIBLE       0x01
#define VISIBILITYMAP_ALL_FROZEN        0x02
#define VISIBILITYMAP_VALID_BITS        0x03            /* OR of all valid
                                                                                                 * visiblitymap flags bits */

/* Macros for visibilitymap test */
#define VM_ALL_VISIBLE(r, b, v) \
        ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_VISIBLE) != 0)
#define VM_ALL_FROZEN(r, b, v) \
        ((visibilitymap_get_status((r), (b), (v)) & VISIBILITYMAP_ALL_FROZEN) != 0)

3. 如果vm页损坏了,我们可以通过vacuum DISABLE_PAGE_SKIPPING强制扫描所有的页。
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=ede62e56fbe809baa1a7bc3873d82f12ffe7540b

If you really want to vacuum every single page in the relation,
regardless of apparent visibility status or anything else, you can use
this option.  In previous releases, this behavior could be achieved
using VACUUM (FREEZE), but because we can now recognize all-frozen
pages as not needing to be frozen again, that no longer works.  There
should be no need for routine use of this option, but maybe bugs or
disaster recovery will necessitate its use.

postgres=# \h vacuum
Command:     VACUUM
Description: garbage-collect and optionally analyze a database
Syntax:
VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ table_name [ (column_name [, ...] ) ] ]

9.6 vs 9.5 FREEZE操作

测试用例

postgres=# create table test(id int, info text);
CREATE TABLE

插入6400万记录

$ vi test.sql
insert into test values (1,'test');

$ pgbench -M prepared -n -r -f ./test.sql -c 64 -j 64 -t 1000000


transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 64
number of threads: 64
number of transactions per client: 1000000
number of transactions actually processed: 64000000/64000000
latency average = 0.164 ms
tps = 389383.803477 (including connections establishing)
tps = 389393.063237 (excluding connections establishing)
script statistics:
 - statement latencies in milliseconds:
         0.162  insert into test values (1,'test');

1. 9.6测试结果
生成vm文件

postgres=# vacuum analyze test;
VACUUM
Time: 8629.180 ms

确认VM文件已生成

postgres=# select * from pg_stat_file(pg_relation_filepath('test')||'_vm');
 size  |         access         |      modification      |         change         | creation | isdir 
-------+------------------------+------------------------+------------------------+----------+-------
 90112 | 2016-10-02 13:34:24+08 | 2016-10-02 13:34:32+08 | 2016-10-02 13:34:32+08 |          | f
(1 row)

记录当前XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/87E3C58
(1 row)

执行第一次vacuum freeze

postgres=# vacuum freeze test;
VACUUM
Time: 3487.945 ms

记录XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C3F30
(1 row)

计算第一次VACUUM FREEZE产生了多少XLOG

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C3F30','3/87E3C58'));
 pg_size_pretty 
----------------
 604 MB
(1 row)

不产生数据变更,开始接下来的vacuum freeze。
9.6接下来的VACUUM FREEZE非常快,已经自动跳过了frozen page,并且不会产生XLOG。

postgres=# vacuum freeze test;
VACUUM
Time: 16.581 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C3F30
(1 row)
Time: 0.079 ms

postgres=# vacuum freeze test;
VACUUM
Time: 16.555 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C3F30
(1 row)
Time: 0.115 ms

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 0 nonremovable row versions in 0 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 16.566 ms

消耗几个事务后,重新vacuum freeze依旧很快,产生非常少量的xlog(这些xlog实际上是变更test表的pg_class.relfrozenxid字段产生的)。

postgres=# select txid_current();
 txid_current 
--------------
     64001925
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
     64001926
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
     64001927
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 0 nonremovable row versions in 0 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 18.020 ms
postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C4130
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C4130', '3/2E3C3F30'));
 pg_size_pretty 
----------------
 512 bytes
(1 row)

使用强制vacuum freeze,不跳过froze pages.
当VM文件损坏时,可以这样使用。

postgres=# vacuum (freeze, verbose, DISABLE_PAGE_SKIPPING) test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 345957 out of 345957 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
2 pages are entirely empty.
CPU 0.68s/3.70u sec elapsed 4.38 sec.
INFO:  vacuuming "pg_toast.pg_toast_16415"
INFO:  index "pg_toast_16415_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16415": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 4397.821 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 3/2E3C42F0
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('3/2E3C42F0','3/2E3C4130'));
 pg_size_pretty 
----------------
 448 bytes
(1 row)

通过测试,我们可以看到PostgreSQL 9.6在vacuum freeze这块改进非常实用,特别是当数据库很庞大,并且存储了大量的历史静态数据时,在发生vacuum freeze时,IO不会像以前那样飙升了。

2. 9.5测试结果
生成vm文件

postgres=# vacuum analyze test;
VACUUM
Time: 17441.652 ms

确认VM文件已生成

postgres=# select * from pg_stat_file(pg_relation_filepath('test')||'_vm');
 size  |         access         |      modification      |         change         | creation | isdir 
-------+------------------------+------------------------+------------------------+----------+-------
 49152 | 2016-10-02 14:11:17+08 | 2016-10-02 14:11:34+08 | 2016-10-02 14:11:34+08 |          | f
(1 row)

记录当前XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/A7FD5AC8
(1 row)

执行第一次vacuum freeze

postgres=# vacuum freeze test;
VACUUM
Time: 9889.702 ms

记录XLOG位置

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DAB8
(1 row)

计算第一次VACUUM FREEZE产生了多少XLOG

postgres=# select pg_size_pretty(pg_xlog_location_diff('5/CCB2DAB8', '5/A7FD5AC8'));
 pg_size_pretty 
----------------
 587 MB
(1 row)

不产生数据变更,开始接下来的vacuum freeze。
9.5的版本,接下来的VACUUM FREEZE同样需要扫描全表,因为VM中未记录frozen pages。

postgres=# vacuum freeze test;
VACUUM
Time: 7191.695 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DAB8
(1 row)

postgres=# vacuum freeze test;
VACUUM
Time: 7159.769 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DAB8
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 346248 out of 346248 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/7.18u sec elapsed 7.19 sec.
INFO:  vacuuming "pg_toast.pg_toast_16682"
INFO:  index "pg_toast_16682_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16682": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 7206.649 ms

消耗几个事务后,重新vacuum freeze。

postgres=# select txid_current();
 txid_current 
--------------
    128001924
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
    128001925
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
    128001926
(1 row)

postgres=# select txid_current();
 txid_current 
--------------
    128001927
(1 row)

postgres=# vacuum freeze verbose test;
INFO:  vacuuming "public.test"
INFO:  "test": found 0 removable, 64000000 nonremovable row versions in 346248 out of 346248 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/7.18u sec elapsed 7.19 sec.
INFO:  vacuuming "pg_toast.pg_toast_16682"
INFO:  index "pg_toast_16682_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16682": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 7208.527 ms

postgres=# select pg_current_xlog_location();
 pg_current_xlog_location 
--------------------------
 5/CCB2DEA8
(1 row)

postgres=# select pg_size_pretty(pg_xlog_location_diff('5/CCB2DEA8', '5/CCB2DAB8'));
 pg_size_pretty 
----------------
 1008 bytes
(1 row)

小结

PostgreSQL 9.6对VM文件的功能进行了扩展,实用2个比特位标记该页的记录数是否是clean的,对所有事务可见;以及标记该页的所有记录数是否都是FREEZE的。

在进行freeze操作时,或者触发autovacuum freeze prevent wrapped object时,会自动跳过标记为frozen的page,从而大幅减少vacuum freeze的IO。

对于数据库中有大量静态数据,并且又有高并发的写事务并行存在时,特别实用。

同时PostgreSQL 9.6还提供了强制vacuum freeze的接口,不跳过任何页。 保留这样的接口,目的是在vm文件损坏时可用,VM文件也可以通过这种方法自动修复。

Count

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

相关文章
RDS审计中心--数据库安全监控利器
依托阿里云SLS日志服务及阿里云RDS SQL审计,以极低的代价采集对SQL执行的所有操作,RDS审计中心提供丰富的内置告警规则,覆盖绝大部分场景的安全、运营、性能报表,只需要做最少的配置即可实现对数据库的全方位监控。
619 0
商业化十周年,阿里云RDS推出企业级自治数据库
近日,阿里云宣布RDS数据库品牌升级计划,推出云原生企业级自治数据库。今天也是阿里云RDS商业化十周年。据了解,阿里云是国内首家提供自治服务的数据库厂商,基于人工智能和机器学习技术,阿里云RDS数据库提供自动升级、自动调优等100%数据库自动驾驶能力。云原生数据库2.0时代,阿里云RDS通过企业级的自治能力为客户提供更快、更稳、更安全的数据库服务。
744 0
免费享受同城双可用区高可用容错能力!阿里云云数据库RDS新增可用区6月汇总(内含福利)
6月份,阿里云云数据库 MySQL 版,云数据库 PPAS 版,云数据库 SQL Server 版,云数据库 PostgreSQL 版均宣布新增可用区,用户在控制台上按需求创建实例,即可享受同城双可用区高可用容错能力。接下来小编将为大家详细列出新增可用区。
2729 0
MSSQL · 最佳实践 · 实例级别数据库上云RDS SQL Server
摘要 到目前,我们完成了SQL Server备份还原专题系列八篇月报分享:三种常见的数据库备份、备份策略的制定、查找备份链、数据库的三种恢复模式与备份之间的关系、利用文件组实现冷热数据隔离备份方案、如何监控备份还原进度、阿里云RDS SQL自动化迁移上云的一种解决方案以及上个月分享的RDS SDK实现数据库迁移上阿里云,本期我们分享如何将用户线下或者ECS上自建实例级别数据库一键迁移上阿里云RDS SQL Server。
1675 0
阿里云云数据库RDS秒级监控功能解锁,通宵加班找故障将成为过去式
每一个奋斗在前线的数据库管理员和运维人员似乎运气都不太好,这些人都绝对经历过的诡异事件就是:逢年过节必出故障,明明眼看着要休假了,又接到故障通知,只好通宵加班找问题。没问题的时候可能大家都不会想到你,一出问题就先拿运维试问,于是每逢佳节便出现拜数据库的戏谑图片。
2716 0
重磅干货免费下载!阿里云RDS团队论文被数据库顶会SIGMOD 2018收录
来自阿里云RDS团队的论文“**TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time” (TcpRT:面向大规模海量云数据库的服务质量实时采集与诊断系统)**被数据库顶会SIGMOD 2018收录。
10381 0
参与 API 创新应用大赛,体验RDS CloudDBA数据库性能优化 API
阿里云的RDS数据库,有开发者所需要的一系列的功能,但很多功能很多开发者可能并没有使用过。这里,介绍一个RDS比较有用的功能:CloudDBA数据库性能优化 API。
12948 0
同步RDS数据库到自建mysql数据库
同步RDS数据库到自建mysql数据库
4410 0
从运维的角度分析使用阿里云数据库RDS的必要性--你不应该在阿里云上使用自建的MySQL/SQL Server/Oracle/PostgreSQL数据库
开宗明义,你不应该在阿里云上使用自建的MySQL or SQL Server数据库,对了,还有Oracle or PostgreSQL数据库。 云数据库 RDS(Relational Database Service)是一种稳定可靠、可弹性伸缩的在线数据库服务。
4454 0
+关注
德哥
公益是一辈子的事, I am digoal, just do it.
文章
问答
来源圈子
更多
让用户数据永远在线,让数据无缝的自由流动
+ 订阅
文章排行榜
最热
最新
相关电子书
更多
PostgresChina2018_权宗亮_基于odyssey连接池实现企业级PostgreSQL数据分布中间件
立即下载
金融级 PostgreSQL监控及优化
立即下载
PostgresChina2018_曾文旌_阿里云RDS_for_PostgreSQL在PostgreSQL功能和性能改进
立即下载