Tags
PostgreSQL, audit logging, duration, performance impact, syslogger, BUFFER, log_statement, log_duration, track_io_timing, osq_lock
Background
How much performance overhead does enabling audit logging, or per-statement duration logging, add to the database?
1. Audit logging is controlled by log_statement, which can be set to all (every statement), ddl (DDL only), mod (DDL plus data-modifying statements), or none (no auditing).
With auditing enabled, an entry like the following is written to the database log when a statement is executed (at the time the request is submitted):
2018-06-16 14:48:23.760 CST,"postgres","postgres",19448,"[local]",5b24b270.4bf8,8,"idle",2018-06-16 14:47:12 CST,3/1484181,0,LOG,00000,"statement: select count(*) from pg_class;",,,,,,,,"exec_simple_query, postgres.c:940","psql"
2. Statement duration logging is controlled by the log_duration parameter, which is set to on or off.
When it is on, the statement's elapsed time is logged once execution completes. (This does not include the time spent sending the result set to the client.)
2018-06-16 14:48:23.760 CST,"postgres","postgres",19448,"[local]",5b24b270.4bf8,9,"SELECT",2018-06-16 14:47:12 CST,3/0,0,LOG,00000,"duration: 0.609 ms",,,,,,,,"exec_simple_query, postgres.c:1170","psql"
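Both switches can be changed at runtime without a restart. A minimal sketch of turning them on (assumes superuser access; the exact invocation is an illustration, not taken from the original test):

# Enable statement auditing and duration logging, then reload the configuration.
psql -c "ALTER SYSTEM SET log_statement = 'all';"
psql -c "ALTER SYSTEM SET log_duration = on;"
psql -c "SELECT pg_reload_conf();"
# Confirm the effective values.
psql -c "SHOW log_statement;" -c "SHOW log_duration;"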
How much performance overhead do these two switches actually add?
1. Enabling auditing on its own is fairly cheap: log messages are written asynchronously through a pipe to the syslogger, and the buffers involved are already sized fairly generously, so the loss is small.
Be aware, however, that at very high concurrency the volume of log writes can trigger mutex lock contention and make performance drop sharply. The syslogger-related code in PostgreSQL still has room for improvement.
src/backend/postmaster/syslogger.c
/*
* Buffers for saving partial messages from different backends.
*
* Keep NBUFFER_LISTS lists of these, with the entry for a given source pid
* being in the list numbered (pid % NBUFFER_LISTS), so as to cut down on
* the number of entries we have to examine for any one incoming message.
* There must never be more than one entry for the same source pid.
*
* An inactive buffer is not removed from its list, just held for re-use.
* An inactive buffer has pid == 0 and undefined contents of data.
*/
typedef struct
{
int32 pid; /* PID of source process */
StringInfoData data; /* accumulated data, as a StringInfo */
} save_buffer;
#define NBUFFER_LISTS 256
static List *buffer_lists[NBUFFER_LISTS];
2. Enabling duration logging: the cost here is dominated by the extra overhead of reading the clock.
pg_test_timing can be used to measure the cost of reading the system time.
The overhead of I/O timing in the database (track_io_timing = on), and of auto_explain's log timing, is of the same nature.
pg_test_timing -d 10
Testing timing overhead for 10 seconds.
Per loop time including overhead: 42.38 ns
Histogram of timing durations:
< us % of total count
1 95.76853 225960491
2 4.22695 9973261
4 0.00430 10145
8 0.00005 109
16 0.00016 370
32 0.00001 32
64 0.00000 3
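On a virtual machine the cost of a clock read depends heavily on the clocksource the hypervisor exposes. A quick way to inspect it on Linux (standard sysfs paths, shown here as an assumption about the test host):

# Show the clocksource in use and the alternatives
# (tsc is usually cheap; xen, hpet or acpi_pm can be an order of magnitude slower).
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
# Re-run the timing micro-benchmark after changing anything.
pg_test_timing -d 10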
Benchmark
The test environment is an ECS host (a cloud VM); the tests also exposed a performance weakness of the virtual machine.
1. Create the test data
pgbench -i -s 300
2. Run the small-transaction (select-only) test
pgbench -M prepared -v -r -P 1 -c 56 -j 56 -T 120 -S
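Steps 3 to 7 below repeat this run under different logging settings. A sketch of how such a matrix can be driven (the helper function and its name are hypothetical; the GUCs and pgbench flags are the ones used above):

# Hypothetical helper: apply one logging configuration, reload, and re-run the workload.
run_case() {
  psql -c "ALTER SYSTEM SET log_statement = '$1';"
  psql -c "ALTER SYSTEM SET log_duration = $2;"
  psql -c "ALTER SYSTEM SET track_io_timing = $3;"
  psql -c "SELECT pg_reload_conf();"
  pgbench -M prepared -v -r -P 1 -c 56 -j 56 -T 120 -S
}
run_case none off off    # 3. baseline
run_case all  off off    # 4. audit only
run_case none on  off    # 5. duration only
run_case all  on  off    # 6. audit + duration
run_case all  on  on     # 7. audit + duration + I/O timing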
3. log_statement and log_duration both off
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 25058528
latency average = 0.268 ms
latency stddev = 0.217 ms
tps = 208818.253316 (including connections establishing)
tps = 208834.342644 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.025 \set aid random(1, 100000 * :scale)
0.252 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
4. log_statement=all
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 23088368
latency average = 0.291 ms
latency stddev = 1.445 ms
tps = 192401.524926 (including connections establishing)
tps = 192413.921654 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.018 \set aid random(1, 100000 * :scale)
0.282 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
5. log_statement off, log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 12842222
latency average = 0.523 ms
latency stddev = 1.516 ms
tps = 107017.184761 (including connections establishing)
tps = 107025.759343 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.019 \set aid random(1, 100000 * :scale)
0.520 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
6. log_statement=all and log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 7484093
latency average = 0.898 ms
latency stddev = 1.386 ms
tps = 62366.260357 (including connections establishing)
tps = 62370.603530 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.016 \set aid random(1, 100000 * :scale)
0.897 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
7. log_statement=all, log_duration=on, and track_io_timing=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 7161693
latency average = 0.938 ms
latency stddev = 1.150 ms
tps = 59679.955252 (including connections establishing)
tps = 59684.540717 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.015 \set aid random(1, 100000 * :scale)
0.938 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
Diagnosis
The results cannot be explained by clock overhead alone: reading the clock costs only a few tens of nanoseconds, yet the measured slowdown is clearly much larger than that.
profiling
Samples: 1M of event 'cpu-clock', Event count (approx.): 234709749074
Overhead Shared Object Symbol
41.15% [kernel] [k] osq_lock
19.00% libc-2.17.so [.] __mcount_internal
10.37% [kernel] [k] _raw_spin_unlock_irqrestore
perf
The slowdown is caused mainly by lock contention.
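The call graph below was captured with perf. A plausible way to take a similar profile while the pgbench run is in progress (commands assumed, not quoted from the original environment):

# System-wide sampling with call graphs for 30 seconds, then the report;
# the pipe_write -> mutex_lock -> osq_lock chain stands out immediately.
perf record -a -g -- sleep 30
perf report -g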
# Children Self Command Shared Object Symbol
# ........ ........ .............. ..................... ...............................................
#
55.47% 0.00% postgres postgres [.] PostmasterMain
|
---PostmasterMain
|
|--54.42%--ServerLoop
| |
| --54.20%--PostgresMain
| |
| |--40.68%--errfinish
| | |
| | --40.56%--EmitErrorReport
| | |
| | |--38.51%--__write_nocancel
| | | |
| | | --38.44%--system_call_fastpath
| | | |
| | | --38.43%--sys_write
| | | |
| | | --38.39%--vfs_write
| | | |
| | | --38.32%--do_sync_write
| | | |
| | | --38.27%--pipe_write
| | | |
| | | |--21.55%--pipe_wait
| | | | |
| | | | |--20.07%--mutex_lock
| | | | | |
| | | | | --20.02%--__mutex_lock_slowpath
| | | | | |
| | | | | |--17.89%--osq_lock
| | | | | |
| | | | | |--1.44%--apic_timer_interrupt
| | | | | | smp_apic_timer_interrupt
| | | | | | irq_exit
| | | | | | do_softirq
| | | | | | call_softirq
| | | | | | |
| | | | | | --1.25%--__do_softirq
| | | | | |
| | | | | --0.61%--mutex_spin_on_owner
| | | | |
| | | | --0.94%--prepare_to_wait
| | | | |
| | | | --0.91%--_raw_spin_unlock_irqrestore
| | | |
| | | |--11.44%--mutex_lock
| | | | |
| | | | --11.35%--__mutex_lock_slowpath
| | | | |
| | | | |--10.11%--osq_lock
| | | | |
| | | | --0.69%--apic_timer_interrupt
| | | | smp_apic_timer_interrupt
| | | | irq_exit
| | | | do_softirq
| | | | call_softirq
| | | | |
| | | | --0.60%--__do_softirq
| | | |
| | | --4.73%--__wake_up_sync_key
| | | |
| | | --4.68%--_raw_spin_unlock_irqrestore
| | |
| | |--0.72%--log_line_prefix.isra.1
| | |
| | --0.52%--_mcount
| |
| |--4.05%--PortalRun
| | |
| | --4.01%--PortalRunSelect
| | |
| | --3.91%--standard_ExecutorRun
| | |
| | --3.27%--ExecScan
| | |
| | --3.18%--IndexNext
| | |
| | --2.95%--index_getnext
| | |
| | |--2.19%--index_getnext_tid
| | | |
| | | --2.16%--btgettuple
| | | |
| | | --2.08%--_bt_first
| | | |
| | | --1.65%--_bt_search
| | | |
| | | --0.80%--_bt_relandgetbuf
| | |
| | --0.73%--index_fetch_heap
| |
| |--2.14%--PortalStart
| | |
| | --1.90%--standard_ExecutorStart
| | |
| | --1.64%--ExecInitNode
| | |
| | --1.59%--ExecInitIndexScan
| |
| |--1.65%--finish_xact_command.part.4
| | |
| | --1.64%--CommitTransactionCommand
| | |
| | --1.49%--CommitTransaction
| | |
| | --0.72%--PreCommit_Portals
| | |
| | --0.63%--PortalDrop
| | |
| | --0.53%--PortalCleanup
| |
| |--0.96%--pq_getbyte
| | |
| | --0.95%--pq_recvbuf
| | |
| | --0.62%--secure_read
| |
| --0.90%--ReadyForQuery
| |
| --0.83%--socket_flush
| |
| --0.82%--internal_flush
| |
| --0.78%--__libc_send
| |
| --0.76%--system_call_fastpath
| |
| --0.75%--sys_sendto
| |
| --0.75%--SYSC_sendto
| |
| --0.73%--sock_sendmsg
| |
| --0.69%--unix_stream_sendmsg
| |
| --0.57%--sock_def_readable
| |
| --0.57%--__wake_up_sync_key
| |
| --0.54%--_raw_spin_unlock_irqrestore
|
--1.04%--SysLogger_Start
|
--0.84%--__read_nocancel
|
--0.84%--system_call_fastpath
|
--0.84%--sys_read
|
--0.84%--vfs_read
|
--0.84%--do_sync_read
|
--0.82%--pipe_read
Why didn't enabling only log_statement trigger this lock contention? Simply because the concurrency, and therefore the rate of elog calls, was not high enough. (With log_duration also on, the number of elog calls doubles, so the same concurrency triggers the contention much more easily.)
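A simple way to sanity-check the doubling claim is to count the log lines each setting produces: every audited statement emits one "statement:" line, and log_duration adds one "duration:" line per statement. A rough sketch (the csvlog file name is hypothetical):

grep -c '"statement: ' $PGDATA/log/postgresql.csv
grep -c '"duration: '  $PGDATA/log/postgresql.csv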
1. Lowering the concurrency reduces the lock contention, and the relative performance loss becomes small.
Samples: 672K of event 'cpu-clock', Event count (approx.): 146760374621
Overhead Shared Object Symbol
29.73% libc-2.17.so [.] __mcount_internal
6.37% [kernel] [k] _raw_spin_unlock_irqrestore
5.93% libc-2.17.so [.] _mcount
4.18% libc-2.17.so [.] vfprintf
2.96% [kernel] [k] osq_lock
2.92% [kernel] [k] __do_softirq
1.80% [kernel] [k] finish_task_switch
1.34% postgres [.] hash_search_with_hash_value
1.20% libc-2.17.so [.] _IO_default_xsputn
1. log_statement and log_duration both off
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 17770447
latency average = 0.189 ms
latency stddev = 0.077 ms
tps = 148086.294302 (including connections establishing)
tps = 148093.564454 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.013 \set aid random(1, 100000 * :scale)
0.176 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
2. log_statement=all
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 17236212
latency average = 0.195 ms
latency stddev = 0.321 ms
tps = 143634.319683 (including connections establishing)
tps = 143642.034125 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.010 \set aid random(1, 100000 * :scale)
0.185 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
3. log_statement off, log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 16774800
latency average = 0.200 ms
latency stddev = 0.977 ms
tps = 139789.393297 (including connections establishing)
tps = 139795.661393 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.008 \set aid random(1, 100000 * :scale)
0.193 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
4. log_statement=all and log_duration=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 14696887
latency average = 0.229 ms
latency stddev = 0.277 ms
tps = 122473.319859 (including connections establishing)
tps = 122479.381454 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.007 \set aid random(1, 100000 * :scale)
0.224 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
5. log_statement=all, log_duration=on, and track_io_timing=on
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 14606445
latency average = 0.230 ms
latency stddev = 0.726 ms
tps = 121719.147924 (including connections establishing)
tps = 121725.278733 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.007 \set aid random(1, 100000 * :scale)
0.226 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
2. With high enough concurrency, even enabling only log_statement triggers the lock problem.
With only log_statement on, elog is called half as often as with log_statement and log_duration both on, so doubling the client count to 112 is enough to observe the problem.
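This run presumably used the same command as before with the client count doubled (reconstructed from the parameters shown in the output, not quoted from the original):

pgbench -M prepared -v -r -P 1 -c 112 -j 112 -T 120 -S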
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 112
number of threads: 112
duration: 120 s
number of transactions actually processed: 11832722
latency average = 1.135 ms
latency stddev = 2.437 ms
tps = 98604.216501 (including connections establishing)
tps = 98621.113426 (excluding connections establishing)
script statistics:
- statement latencies in milliseconds:
0.021 \set aid random(1, 100000 * :scale)
1.149 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
The perf profile:
Samples: 557K of event 'cpu-clock', Event count (approx.): 94808854918
Children Self Shared Object Symbol
+ 67.35% 0.22% postgres [.] PostgresMain
+ 45.50% 0.01% [kernel] [k] system_call_fastpath
+ 41.27% 0.02% postgres [.] errfinish
+ 41.14% 0.27% postgres [.] EmitErrorReport
+ 39.62% 0.01% [kernel] [k] sys_write
+ 39.56% 0.03% [kernel] [k] vfs_write
+ 39.48% 0.02% [kernel] [k] do_sync_write
+ 39.31% 0.01% libpthread-2.17.so [.] __write_nocancel
+ 39.06% 0.08% [kernel] [k] pipe_write
+ 35.00% 0.09% [kernel] [k] mutex_lock
+ 34.88% 0.12% [kernel] [k] __mutex_lock_slowpath
+ 33.71% 0.00% libc-2.17.so [.] __libc_start_main
+ 33.61% 33.61% [kernel] [k] osq_lock
+ 32.61% 0.00% postgres [.] PostmasterMain
+ 32.61% 0.00% postgres [.] main
+ 32.33% 0.00% postgres [.] ServerLoop
+ 31.63% 2.75% libc-2.17.so [.] _mcount
+ 29.31% 29.14% libc-2.17.so [.] __mcount_internal
+ 27.19% 0.02% [kernel] [k] pipe_wait
+ 21.76% 0.27% pgbench [.] threadRun
+ 11.30% 0.00% libpthread-2.17.so [.] start_thread
syslogger-related code
src/backend/tcop/postgres.c
934 /* Log immediately if dictated by log_statement */
935 if (check_log_statement(parsetree_list))
936 {
937 ereport(LOG,
938 (errmsg("statement: %s", query_string),
939 errhidestmt(true),
940 errdetail_execute(parsetree_list)));
941 was_logged = true;
942 }
1162 /*
1163 * Emit duration logging if appropriate.
1164 */
1165 switch (check_log_duration(msec_str, was_logged))
1166 {
1167 case 1:
1168 ereport(LOG,
1169 (errmsg("duration: %s ms", msec_str),
1170 errhidestmt(true)));
1171 break;
1172 case 2:
1173 ereport(LOG,
1174 (errmsg("duration: %s ms statement: %s",
1175 msec_str, query_string),
1176 errhidestmt(true),
1177 errdetail_execute(parsetree_list)));
1178 break;
1179 }
1946 /* Log immediately if dictated by log_statement */
1947 if (check_log_statement(portal->stmts))
1948 {
1949 ereport(LOG,
1950 (errmsg("%s %s%s%s: %s",
1951 execute_is_fetch ?
1952 _("execute fetch from") :
1953 _("execute"),
1954 prepStmtName,
1955 *portal_name ? "/" : "",
1956 *portal_name ? portal_name : "",
1957 sourceText),
1958 errhidestmt(true),
1959 errdetail_params(portalParams)));
1960 was_logged = true;
1961 }
src/backend/utils/error/elog.c
/*
* Constructs the error message, depending on the Errordata it gets, in a CSV
* format which is described in doc/src/sgml/config.sgml.
*/
static void
write_csvlog(ErrorData *edata)
{
............
/* If in the syslogger process, try to write messages direct to file */
if (am_syslogger)
write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
else
write_pipe_chunks(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
pfree(buf.data);
}
/*
* Send data to the syslogger using the chunked protocol
*
* Note: when there are multiple backends writing into the syslogger pipe,
* it's critical that each write go into the pipe indivisibly, and not
* get interleaved with data from other processes. Fortunately, the POSIX
* spec requires that writes to pipes be atomic so long as they are not
* more than PIPE_BUF bytes long. So we divide long messages into chunks
* that are no more than that length, and send one chunk per write() call.
* The collector process knows how to reassemble the chunks.
*
* Because of the atomic write requirement, there are only two possible
* results from write() here: -1 for failure, or the requested number of
* bytes. There is not really anything we can do about a failure; retry would
* probably be an infinite loop, and we can't even report the error usefully.
* (There is noplace else we could send it!) So we might as well just ignore
* the result from write(). However, on some platforms you get a compiler
* warning from ignoring write()'s result, so do a little dance with casting
* rc to void to shut up the compiler.
*/
static void
write_pipe_chunks(char *data, int len, int dest)
{
PipeProtoChunk p;
int fd = fileno(stderr);
int rc;
Assert(len > 0);
p.proto.nuls[0] = p.proto.nuls[1] = '\0';
p.proto.pid = MyProcPid;
/* write all but the last chunk */
while (len > PIPE_MAX_PAYLOAD)
{
p.proto.is_last = (dest == LOG_DESTINATION_CSVLOG ? 'F' : 'f');
p.proto.len = PIPE_MAX_PAYLOAD;
memcpy(p.proto.data, data, PIPE_MAX_PAYLOAD);
rc = write(fd, &p, PIPE_HEADER_SIZE + PIPE_MAX_PAYLOAD);
(void) rc;
data += PIPE_MAX_PAYLOAD;
len -= PIPE_MAX_PAYLOAD;
}
/* write the last chunk */
p.proto.is_last = (dest == LOG_DESTINATION_CSVLOG ? 'T' : 't');
p.proto.len = len;
memcpy(p.proto.data, data, len);
rc = write(fd, &p, PIPE_HEADER_SIZE + len);
(void) rc;
}
# Overhead Command Shared Object
# ........ ............... ................................... ............................
#
32.77% postgres [kernel.kallsyms] [k] __mutex_lock_slowpath
|
--- __mutex_lock_slowpath
|
|--100.00%-- mutex_lock
| |
| |--96.97%-- pipe_lock
| | pipe_wait
| | |
| | |--99.99%-- pipe_write
| | | do_sync_write
| | | vfs_write
| | | sys_write
| | | system_call_fastpath
| | | __write_nocancel
| | | |
| | | --100.00%-- write_csvlog
| | | EmitErrorReport
| | | errfinish
| | | PostgresMain
| | | PostmasterMain
| | | main
| | | __libc_start_main
| | --0.01%-- [...]
| |
| |--2.84%-- pipe_write
| | do_sync_write
| | vfs_write
| | sys_write
| | system_call_fastpath
| | |
| | |--99.58%-- __write_nocancel
| | | |
| | | --100.00%-- write_csvlog
| | | EmitErrorReport
| | | errfinish
| | | PostgresMain
| | | PostmasterMain
| | | main
| | | __libc_start_main
| | --0.42%-- [...]
| --0.19%-- [...]
--0.00%-- [...]
17.76% postgres [kernel.kallsyms] [k] mutex_spin_on_owner
|
--- mutex_spin_on_owner
|
|--99.96%-- __mutex_lock_slowpath
| mutex_lock
| |
| |--98.91%-- pipe_lock
| | pipe_wait
| | |
| | |--100.00%-- pipe_write
| | | do_sync_write
| | | vfs_write
| | | sys_write
| | | system_call_fastpath
| | | __write_nocancel
| | | |
| | | --100.00%-- write_csvlog
| | | EmitErrorReport
| | | errfinish
| | | PostgresMain
| | | PostmasterMain
| | | main
| | | __libc_start_main
| | --0.00%-- [...]
| |
| |--0.96%-- pipe_write
| | do_sync_write
| | vfs_write
| | sys_write
| | system_call_fastpath
| | |
| | |--97.72%-- __write_nocancel
| | | |
| | | --100.00%-- write_csvlog
| | | EmitErrorReport
| | | errfinish
| | | PostgresMain
| | | PostmasterMain
| | | main
| | | __libc_start_main
| | |
| | --2.28%-- 0x7f89375b59fd
| | |
| | |--58.85%-- 0x3731206e754a205d
| | |
| | |--24.32%-- 0x6574616572635f74
| | |
| | --16.82%-- 0xa746e756f6d65
| --0.13%-- [...]
--0.04%-- [...]
Summary
The syslogger / elog path in PostgreSQL still has room for optimization: when elog is called at very high concurrency, lock contention is triggered and performance drops. (Some configuration-level mitigations are sketched after the result tables below.)
Test result 1:
High concurrency: with log_statement and log_duration both on, 56 concurrent clients were enough to trigger the lock contention and hurt performance.
Test case | QPS |
---|---|
Audit off, duration off | 208834 |
Audit on | 192413 |
SQL duration on | 107025 |
Audit + SQL duration on | 62370 |
Audit + SQL duration + I/O timing on | 59684 |
Test result 2:
Low concurrency (28 clients): with auditing and duration logging on, the lock contention was not triggered and the performance loss stayed small.
Test case | QPS |
---|---|
Audit off, duration off | 148093 |
Audit on | 143642 |
SQL duration on | 139795 |
Audit + SQL duration on | 122479 |
Audit + SQL duration + I/O timing on | 121725 |
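Until the syslogger path itself is improved, the practical mitigations are configuration-level: cut down the number of elog calls per transaction, or route log output away from the single collector pipe. A sketch of the common options (standard GUCs; the threshold value is only an example):

# Option 1: log only slow statements in a single line instead of one "statement:"
# plus one "duration:" line per query; 0 logs everything, -1 disables the feature.
psql -c "ALTER SYSTEM SET log_statement = 'none';"
psql -c "ALTER SYSTEM SET log_duration = off;"
psql -c "ALTER SYSTEM SET log_min_duration_statement = '100ms';"
# Option 2: send messages to syslog, bypassing the logging collector's pipe
# (syslog has its own overhead and may drop messages under load).
psql -c "ALTER SYSTEM SET log_destination = 'syslog';"
psql -c "SELECT pg_reload_conf();"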
References
https://www.postgresql.org/docs/11/static/pgtesttiming.html
《Linux 时钟精度 与 PostgreSQL auto_explain (explain timing 时钟开销估算)》 (Linux clock precision and PostgreSQL auto_explain: estimating the clock overhead of explain timing)