PostgreSQL的序列奇快无比的秘密-阿里云开发者社区

PostgreSQL的序列奇快无比的秘密

2016-02-19 4673

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云原生数据库 PolarDB MySQL 版，通用型 2核4GB 50GB

云原生数据库 PolarDB PostgreSQL 版，标准版 2核4GB 50GB

简介： 1. 引言曾经有篇流传较广的文章Don’t Assume PostgreSQL is Slow 展示了PostgreSQL生成序列的速度不亚于redis的INCRs。而在此之前我就曾做过相关的测试（参考PostgreSQL的序列的性能验证），发现PG生成序列的速度远高于同类的关系数据库。根

1. 引言

曾经有篇流传较广的文章 Don’t Assume PostgreSQL is Slow 展示了PostgreSQL生成序列的速度不亚于redis的INCR s。而在此之前我就曾做过相关的测试（参考 PostgreSQL的序列的性能验证），发现PG生成序列的速度远高于同类的关系数据库。根据 PostgreSQL的序列的性能验证中测试结果，在没有启用序列cache的情况下， PG的每次调用nextval('seq1')的额外时间消耗大概是0.3us，也就是333w/s，所以即使做批量数据加载也不用担心序列拖后腿；而Oracle的nocache序列生成速度大概只有5w/s,当Oracle序列 cache了50以上时，速度才开始接近pg。
这个结果很惊人，但细一想，PG快得有点离谱。为什么这么说？因为当时测试的select nextval('seq1')在4核虚机上达到了7w/s的qps，而那个测试环境估计支撑不了这么高的iops，所以猜测PG一定对序列做了某种优化而不是每次刷盘。

2. 代码分析

关键代码见src/backend/commands/sequence.c的nextval_internal()函数，有个叫SEQ_LOG_VALS的常量，控制PG每产生32个序列值才记一次WAL。这相当于PG对序列做了全局缓存，而PG的create sequence语法上的cache是指每个进程（也就是连接）的本地cache。由于全局缓存优化的已经足够好了，所以一般不需要再启用本地cache。

src/backend/commands/sequence.c

点击(此处)折叠或打开

fetch = cache = seq->cache_value;
log = seq->log_cnt;
...
/*
* Decide whether we should emit a WAL log record. If so, force up the
* fetch count to grab SEQ_LOG_VALS more values than we actually need to
* cache. (These will then be usable without logging.)
*
* If this is the first nextval after a checkpoint, we must force a new
* WAL record to be written anyway, else replay starting from the
* checkpoint would fail to advance the sequence past the logged values.
* In this case we may as well fetch extra values.
*/
if (log < fetch || !seq->is_called)//此处fetch值为1.每次调nextval()log_cnt会递减,减到0时设置logit标志位
{
/* forced log to satisfy local demand for values */
fetch = log = fetch + SEQ_LOG_VALS;
logit = true;
}
else
{
XLogRecPtr redoptr = GetRedoRecPtr();
if (PageGetLSN(page) <= redoptr)
{
/* last update of seq was before checkpoint */
fetch = log = fetch + SEQ_LOG_VALS;
logit = true;
}
}
...
if (logit && RelationNeedsWAL(seqrel))
{
xl_seq_rec xlrec;
XLogRecPtr recptr;
/*
* We don't log the current state of the tuple, but rather the state
* as it would appear after "log" more fetches. This lets us skip
* that many future WAL records, at the cost that we lose those
* sequence values if we crash.
*/
XLogBeginInsert();
XLogRegisterBuffer(0, buf, REGBUF_WILL_INIT);
/* set values that will be saved in xlog */
seq->last_value = next;//WAL中记录的last_value是下一轮的序列值，所以pg crash再通过WAL恢复后，新产生的序列会跳过几个值
seq->is_called = true;
seq->log_cnt = 0;
xlrec.node = seqrel->rd_node;
XLogRegisterData((char *) &xlrec, sizeof(xl_seq_rec));
XLogRegisterData((char *) seqtuple.t_data, seqtuple.t_len);
recptr = XLogInsert(RM_SEQ_ID, XLOG_SEQ_LOG);
PageSetLSN(page, recptr);
}

3. 实测验证

3.1 WAL写入时机

PG通过内部log_cnt计数器控制是否要记录序列更新的WAL，新建的序列，计数器初始值为0。

postgres=# create sequence seq1;
CREATE SEQUENCE
postgres=# \d seq1
Sequence "public.seq1"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | seq1
last_value | bigint | 1
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 0
is_cycled | boolean | f
is_called | boolean | f

取得第一个序列值后，log_cnt变成32。

postgres=# select nextval('seq1');
nextval
---------
1
(1 row)
postgres=# \d seq1
Sequence "public.seq1"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | seq1
last_value | bigint | 1
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 32
is_cycled | boolean | f
is_called | boolean | t

这个过程中，通过strace监视"wal writer process"进程,可以发现发生了WAL写入和刷盘。

[root@localhost ~]# strace -efsync,write,fdatasync -p 2997
Process 2997 attached
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3209, si_uid=1001} ---
write(11, "\0", 1) = 1
write(3, "\207\320\5\0\1\0\0\0\0\300\273\t\0\0\0\0I\10\0\0\0\0\0\0\n\0\0004-\0\0\0"..., 8192) = 8192
fdatasync(3) = 0

以后每次获取序列值，log_cnt会减1，但只有log_cnt减到0，并从0重新跳到32的时候，strace中才能看到WAL写入。

postgres=# select nextval('seq1');
nextval
---------
2
(1 row)
postgres=# \d seq1
Sequence "public.seq1"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | seq1
last_value | bigint | 2
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 31
is_cycled | boolean | f
is_called | boolean | t

3.2 PG crash后的序列值

PG在记录序列的WAL时，记录的是当前值+32。所以如果PG crash再恢复后，将跳过一部分从未使用的序列值。这样做避免了产生重复序列的可能，但不能保证序列的连续，这是优化WAL写入而付出的必要代价。
下面是使用kill -9杀PG进程的情况。

点击(此处)折叠或打开

postgres=# select nextval('seq1');
nextval
---------
20
(1 row)
postgres=# \d seq1
Sequence "public.seq1"
Column | Type | Value
---------------+---------+---------------------
sequence_name | name | seq1
last_value | bigint | 20
start_value | bigint | 1
increment_by | bigint | 1
max_value | bigint | 9223372036854775807
min_value | bigint | 1
cache_value | bigint | 1
log_cnt | bigint | 28
is_cycled | boolean | f
is_called | boolean | t
postgres=# select nextval('seq1');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=# select nextval('seq1');
nextval
---------
49
(1 row)

不过，正常关闭或重启PG是不会出现这种问题的。

PostgreSQL的序列奇快无比的秘密

1. 引言

2. 代码分析

3. 实测验证

3.1 WAL写入时机

3.2 PG crash后的序列值

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

PostgreSQL的序列奇快无比的秘密

1. 引言

2. 代码分析

3. 实测验证

3.1 WAL写入时机

3.2 PG crash后的序列值

热门文章

最新文章

相关课程

相关电子书

相关实验场景

推荐镜像