测试PostgreSQL数据库性能的方法很多,例如pgbench, sysbench。
sysbench因为使用lua脚本编程,支持多线程,灵活度更高,测试复杂的业务逻辑建议用sysbench。
pgbench其实也很好,纯C写的,本身的开销小,测高并发低延迟的场景建议用pgbench。
首先要购买RDS PG数据库实例
创建数据库用户
还需要购买同机房,与RDS PG同VPC网络ECS或者同经典网络的ECS
在ECS端安装PostgreSQL客户端
useradd digoal
su - digoal
wget https://ftp.postgresql.org/pub/source/v9.5.2/postgresql-9.5.2.tar.bz2
tar -jxvf postgresql-9.5.2.tar.bz2
cd postgresql-9.5.2
./configure --prefix=/home/digoal/pgsql9.5
gmake world -j 16
gmake install-world -j 16
vi ~/env_pg.sh
export PS1="$USER@`/bin/hostname -s`-> "
export PGPORT=1921
export LANG=en_US.utf8
export PGHOME=/home/digoal/pgsql9.5
export LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib:$LD_LIBRARY_PATH
export DATE=`date +"%Y%m%d%H%M"`
export PATH=$PGHOME/bin:$PATH:.
export MANPATH=$PGHOME/share/man:$MANPATH
export PGHOST=$PGDATA
export PGUSER=postgres
export PGDATABASE=postgres
alias rm='rm -i'
alias ll='ls -lh'
unalias vi
. ~/env_pg.sh
安装sysbench
cd ~
mkdir sysbench
cd sysbench
git clone https://github.com/digoal/sysbench.git
cd sysbench
gcc -o gendata gendata.c
初始化测试数据
./sysbench_pg --test=lua/parallel_init_pg.lua \
--db-driver=pgsql \
--pgsql-host=xxx.xxx.xxx.xxx \
--pgsql-port=3432 \
--pgsql-user=digoal \
--pgsql-password=pwd \
--pgsql-db=postgres \
--oltp-tables-count=16 \
--oltp-table-size=1000000 \
--num-threads=16 \
cleanup
./sysbench_pg --test=lua/parallel_init_pg.lua \
--db-driver=pgsql \
--pgsql-host=xxx.xxx.xxx.xxx \
--pgsql-port=3432 \
--pgsql-user=digoal \
--pgsql-password=pwd \
--pgsql-db=postgres \
--oltp-tables-count=16 \
--oltp-table-size=1000000 \
--num-threads=16 \
run
测试oltp_pg.lua的内容,包含SQL如下,其中第一条SQL循环10次 :
-- select c from tbl where id = $1;
-- select id,k,c,pad from tbl where id in ($1,...$n);
-- select c from tbl where id between $1 and $2;
-- select sum(k) from tbl where id between $1 and $2;
-- select c from tbl where id between $1 and $2 order by c;
-- select distinct c from tbl where id between $1 and $2 order by c;
-- update tbl set k=k+1 where id = $1;
-- update tbl set c=$2 where id = $1;
-- delete from tbl where id = $1;
-- insert into tbl(id, k, c, pad) values ($1,$2,$3,$4);
一个事务执行19条SQL。
./sysbench_pg --test=lua/oltp_pg.lua \
--db-driver=pgsql \
--pgsql-host=xxx.xxx.xxx.xxx \
--pgsql-port=3432 \
--pgsql-user=digoal \
--pgsql-password=pwd \
--pgsql-db=postgres \
--oltp-tables-count=16 \
--oltp-table-size=1000000 \
--num-threads=16 \
--max-time=120 \
--max-requests=0 \
--report-interval=1 \
run
OLTP test statistics:
queries performed:
read: 0
write: 0
other: 566572
total: 566572
transactions: 26972 (224.62 per sec.)
deadlocks: 0 (0.00 per sec.)
read/write requests: 0 (0.00 per sec.)
other operations: 566572 (4718.32 per sec.)
General statistics:
total time: 120.0791s
total number of events: 26972
total time taken by event execution: 1919.7217s
response time:
min: 39.35ms
avg: 71.17ms
max: 3159.62ms
approx. 95 percentile: 124.54ms
Threads fairness:
events (avg/stddev): 1685.7500/85.94
execution time (avg/stddev): 119.9826/0.02
下面是本次测试的瓶颈分析
连接到阿里云RDS管控平台,观察压测时间段的资源开销,哪个到了瓶颈就升级哪个资源。
如果是网络的问题,可以增加测试的并发来提升TPS。
因为单个会话的链路延迟已经是没法降低的。
关于链路延迟量化分析的文章可参考
https://yq.aliyun.com/articles/35176
RDS PG的优化手段
alter role all set random_page_cost=1.2;
alter role all set synchronous_commit=off;
因为RDS链路较长,延迟会比本地延迟大很多。
但是如何量化这个延迟呢?
因为rds pg数据库服务器我们没法用qperf来测试,所以需要借助数据库本身来测试延迟。
alter role all set random_page_cost=1.2;
alter role all set synchronous_commit=off;
重连数据库,测试数据库本身处理SQL的RT
create table test(crt_time timestamp);
do language plpgsql
$$
declare
begin
for i in 1..10000 loop
insert into test values (clock_timestamp());
end loop;
end;
$$
;
postgres=> select avg(rt) from (select lead(extract(microseconds from crt_time)) over (order by crt_time)-extract(microseconds from crt_time) rt from test) t;
avg
------------------
10.1338133813381
(1 row)
数据库处理RT平均约10微秒。
创建用于测试网络RT的函数。
create or replace function f() returns void as
$$
insert into test values(clock_timestamp());
$$
language sql;
清除数据
truncate test;
在ECS主机上创建测试脚本
vi test.sql
select f();
压测
export PGPASSWORD=pwd; pgbench -M prepared -n -r -P 1 -f ./test.sql -c 1 -j 1 -T 10 -h xxx.xxx.xxx.xxx -p 3432 -U digoal postgres
tps = 197.976441 (including connections establishing)
计算RT
postgres=> select avg(rt) from (select lead(extract(microseconds from crt_time)) over (order by crt_time)-extract(microseconds from crt_time) rt from test) t;
avg
------------------
5045.96513390601
(1 row)
扣除数据库自身处理开销10微秒,网络的RT约5.036毫秒。
延迟不小。
使用并发可以弥补这个链路延迟的短板问题,例如开启300个并发,再次测试。
truncate test;
export PGPASSWORD=pwd; pgbench -M prepared -n -r -P 1 -f ./test.sql -c 300 -j 300 -T 10 -h xxx.xxx.xxx.xxx -p 3432 -U digoal postgres
tps = 27368.404844 (including connections establishing)
postgres=> select avg(rt) from (select lead(extract(microseconds from crt_time)) over (order by crt_time)-extract(microseconds from crt_time) rt from test) t;
avg
------------------
37.5476444551323
(1 row)
吞吐量上来了,但是单个事务的RT还是摆在那里的。
另外一点,使用云数据库,建议多用UDF,减少应用程序和数据库的交互次数,从而缩短整个业务逻辑的响应时间。