背景
前面一篇将EXT4 单机多实例在使用cgroup限制IOPS时,出现了IO HANG, 即使使用了data=writeback问题依旧。
从D状态的进程打印的PSTACK可以看到,问题卡在ext4上面。
详见 《PostgreSQL 9.6 检查点SYNC_FILE_RANGE 在单机多实例下的IO Hang问题浅析与优化》
阿里云的RDS PostgreSQL通过优化检查点调度解决了这个问题,原理也可以参考上文。
XFS是一个非常不错的文件系统,特别是在高并发的场景下面,性能比EXT4要好。
我之前在测试PostgreSQL 9.6的并行计算时,也能体现出XFS更优的一面。
本文将给大家展示XFS在PG单机多实例的场景,表现如何。
test cast请参考 《PostgreSQL 主机性能测试方法 - 单机多实例》
XFS 单机多实例测试
逻辑卷
parted -s /dev/dfa mklabel gpt
parted -s /dev/dfb mklabel gpt
parted -s /dev/dfa mkpart primary 1MiB 6400GB
parted -s /dev/dfb mkpart primary 1MiB 6400GB
pvcreate /dev/df[ab]1
vgcreate -s 128M vgdata01 /dev/df[ab]1
lvcreate -i 2 -I 8 -n lv01 -L 2GiB vgdata01
lvcreate -i 2 -I 8 -n lv02 -L 2GiB vgdata01
lvcreate -i 2 -I 8 -n lv03 -L 4TiB vgdata01
lvcreate -i 2 -I 8 -n lv04 -l 100%FREE vgdata01
设备号
#dmsetup ls
vgdata01-lv04 (253, 3) pg_root
vgdata01-lv03 (253, 2) pg_xlog
vgdata01-lv02 (253, 1)
vgdata01-lv01 (253, 0)
文件系统
mkfs.xfs -f -b size=4096 -l logdev=/dev/mapper/vgdata01-lv01,size=2136997888,sunit=16 -d agsize=524280k,sunit=16,swidth=32 /dev/mapper/vgdata01-lv03
mkfs.xfs -f -b size=4096 -l logdev=/dev/mapper/vgdata01-lv02,size=2136997888,sunit=16 -d agsize=524280k,sunit=16,swidth=32 /dev/mapper/vgdata01-lv04
mount -t xfs -o allocsize=1GiB,inode64,nobarrier,largeio,logbufs=8,logbsize=262144,noatime,nodiratime,swalloc,logdev=/dev/mapper/vgdata01-lv01 /dev/mapper/vgdata01-lv03 /u01
mount -t xfs -o allocsize=1GiB,inode64,nobarrier,largeio,logbufs=8,logbsize=262144,noatime,nodiratime,swalloc,logdev=/dev/mapper/vgdata01-lv02 /dev/mapper/vgdata01-lv04 /u02
目录
#mkdir /data01/digoal
#mkdir /data02/digoal
#chown digoal /data01/digoal
#chown digoal /data02/digoal
IOPS限制不一样的地方,不限制XFS的LOG设备(只是xfs metadata journal)。
$ vi start.sh
for ((i=1921;i<1921+$1;i++))
do
. /home/digoal/env.sh $i
cgcreate -g cpu:RULE$i
cgcreate -g cpuacct:RULE$i
cgcreate -g memory:RULE$i
cgcreate -g blkio:RULE$i
echo "253:2 4000" > /cgroup/blkio/RULE$i/blkio.throttle.write_iops_device
echo "253:2 4000" > /cgroup/blkio/RULE$i/blkio.throttle.read_iops_device
echo "253:3 800" > /cgroup/blkio/RULE$i/blkio.throttle.write_iops_device
echo "253:3 800" > /cgroup/blkio/RULE$i/blkio.throttle.read_iops_device
echo "70" > /cgroup/cpu/RULE$i/cpu.shares
echo "1000000" > /cgroup/cpu/RULE$i/cpu.cfs_period_us
echo "700000" > /cgroup/cpu/RULE$i/cpu.cfs_quota_us
echo "1000000" > /cgroup/cpu/RULE$i/cpu.rt_period_us
echo "1000" > /cgroup/cpu/RULE$i/cpu.rt_runtime_us
echo "4294967296" > /cgroup/memory/RULE$i/memory.limit_in_bytes
cgexec -g cpu:RULE$i -g cpuacct:RULE$i -g memory:RULE$i -g blkio:RULE$i su - digoal -c ". ~/env.sh $i ; nohup postgres -B 1GB -c port=$i -c listen_addresses='0.0.0.0' -c synchronous_commit=on -c full_page_writes=on -c wal_buffers=128MB -c wal_writer_flush_after=0 -c bgwriter_delay=10ms -c max_connections=100 -c bgwriter_lru_maxpages=1000 -c bgwriter_lru_multiplier=10.0 -c unix_socket_directories='.' -c max_wal_size=16GB -c checkpoint_timeout=50min -c checkpoint_completion_target=0.00001 -c log_checkpoints=on -c log_connections=on -c log_disconnections=on -c log_error_verbosity=verbose -c autovacuum_vacuum_scale_factor=0.002 -c autovacuum_max_workers=4 -c autovacuum_naptime=5s -c random_page_cost=1.0 -c constraint_exclusion=on -c log_destination='csvlog' -c logging_collector=on -c maintenance_work_mem=256MB -c autovacuum_work_mem=256MB -D $PGDATA -k $PGDATA >/dev/null 2>&1 &"
done
性能和稳定性表现
同样的测试方法下面,我们来看看
润滑性,在测试了7个小时后,表现很不错,没有出现完全hang的情况,而在EXT4下面则出现了长时间hang住的情况,此时PSTACK指向的也是ext4的操作。
检查点fsync时长都这样了,只是性能略微下降,没有出现hang死。
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 47243 buffers (36.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=59.004 s, sync=0.995 s, total=60.129 s; sync files=39, longest=0.696 s, average=0.025 s; distance=998569 kB, estimate=998569 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 91660 buffers (69.9%); 0 transaction log file(s) added, 0 removed, 61 recycled; write=82.958 s, sync=33.343 s, total=116.809 s; sync files=8, longest=32.943 s, average=4.167 s; distance=1362303 kB, estimate=1362303 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 128232 buffers (97.8%); 0 transaction log file(s) added, 0 removed, 83 recycled; write=2.489 s, sync=165.609 s, total=168.324 s; sync files=10, longest=162.110 s, average=16.560 s; distance=2058155 kB, estimate=2058155 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 114122 buffers (87.1%); 0 transaction log file(s) added, 0 removed, 126 recycled; write=3.176 s, sync=149.814 s, total=153.458 s; sync files=9, longest=146.011 s, average=16.646 s; distance=2543291 kB, estimate=2543291 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 114991 buffers (87.7%); 0 transaction log file(s) added, 0 removed, 155 recycled; write=5.199 s, sync=168.901 s, total=176.056 s; sync files=9, longest=149.837 s, average=18.766 s; distance=3013867 kB, estimate=3013867 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 112645 buffers (85.9%); 0 transaction log file(s) added, 0 removed, 184 recycled; write=2.360 s, sync=193.048 s, total=196.096 s; sync files=9, longest=154.145 s, average=21.449 s; distance=3550585 kB, estimate=3550585 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 61261 buffers (46.7%); 0 transaction log file(s) added, 33 removed, 183 recycled; write=222.717 s, sync=194.187 s, total=424.427 s; sync files=10, longest=89.013 s, average=19.417 s; distance=3990374 kB, estimate=3990374 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
checkpoint starting: time",,,,,,,,"LogCheckpointStart, xlog.c:7996",""
checkpoint complete: wrote 50192 buffers (38.3%); 0 transaction log file(s) added, 115 removed, 129 recycled; write=251.693 s, sync=546.324 s, total=811.781 s; sync files=8, longest=256.978 s, average=68.290 s; distance=4305749 kB, estimate=4305749 kB",,,,,,,,"LogCheckpointEnd, xlog.c:8078",""
性能方面,XFS表现完全超越EXT4。
如果你不想通过修改PG内核,优化检查点代码来提升稳定性的话,推荐使用XFS