In a virtual machine with a single block device, tuned btrfs delivers roughly 1/3 of ext4's fdatasync throughput.
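PostgreSQL calls fsync in many places, for example when flushing the WAL (xlog), at checkpoints, when creating a database, for ALTER DATABASE name SET TABLESPACE, when rewriting a table, and for pg_clog.
Reference: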
http://blog.163.com/digoal@126/blog/static/1638770402015840480734/
fsync performance therefore has a direct impact on database performance.
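Where pg_test_fsync is not at hand, a crude approximation of its 8kB sync-write test can be made with GNU dd. This is a minimal sketch assuming GNU coreutils (dd with oflag=dsync and conv=fsync, date with %N); dsync_bench is a hypothetical helper, not part of PostgreSQL. It is closest to the open_datasync line of pg_test_fsync, but it does not use O_DIRECT and it overwrites sequential blocks, so its numbers are indicative only.

```shell
#!/bin/sh
# dsync_bench DIR [COUNT]: rough stand-in for pg_test_fsync's 8kB
# open_datasync test (hypothetical helper, not part of PostgreSQL).
# Assumes GNU dd (oflag=dsync, conv=fsync) and GNU date (%N).
dsync_bench() {
    dir=$1
    count=${2:-1000}
    f="$dir/dsync_bench.$$"
    # Pre-fill and flush the file so the timed run overwrites existing
    # blocks instead of extending the file (which adds metadata updates).
    dd if=/dev/zero of="$f" bs=8192 count="$count" conv=fsync 2>/dev/null || return 1
    start=$(date +%s%N)
    # Each 8kB write goes through a descriptor opened with O_DSYNC, so it
    # must be durable before dd issues the next one. No O_DIRECT here.
    dd if=/dev/zero of="$f" bs=8192 count="$count" conv=notrunc oflag=dsync 2>/dev/null || return 1
    end=$(date +%s%N)
    rm -f "$f"
    elapsed_us=$(( (end - start) / 1000 ))   # includes dd start-up time
    [ "$elapsed_us" -gt 0 ] || elapsed_us=1
    awk -v n="$count" -v us="$elapsed_us" \
        'BEGIN { printf "%.3f ops/sec %d usecs/op\n", n * 1e6 / us, us / n }'
}

# Usage, e.g. on the filesystem under test:
# dsync_bench /data01 2000
```

Because the timing also covers starting dd, short runs slightly understate throughput; a larger COUNT reduces that effect.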
The comparison below was run on CentOS 7 x64, using btrfs-progs v4.3.1 built from source:
http://blog.163.com/digoal@126/blog/static/16387704020151025102118544/
ext4:
[root@digoal ~]# mkfs.ext4 /dev/sdb1
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
2621440 inodes, 10485504 blocks
524275 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2157969408
320 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
[root@digoal ~]# mount /dev/sdb1 /data01 -o defaults,noatime,nodiratime,discard,data=ordered
[root@digoal ~]# cd /data01/
[root@digoal data01]# /opt/pgsql9.5/bin/pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 5496.006 ops/sec 182 usecs/op
fdatasync 5357.773 ops/sec 187 usecs/op
fsync 2872.555 ops/sec 348 usecs/op
fsync_writethrough n/a
open_sync 3059.961 ops/sec 327 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 2997.891 ops/sec 334 usecs/op
fdatasync 4980.309 ops/sec 201 usecs/op
fsync 2934.537 ops/sec 341 usecs/op
fsync_writethrough n/a
open_sync 1608.287 ops/sec 622 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 2909.899 ops/sec 344 usecs/op
2 * 8kB open_sync writes 1565.073 ops/sec 639 usecs/op
4 * 4kB open_sync writes 830.664 ops/sec 1204 usecs/op
8 * 2kB open_sync writes 459.544 ops/sec 2176 usecs/op
16 * 1kB open_sync writes 227.552 ops/sec 4395 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 3082.501 ops/sec 324 usecs/op
write, close, fsync 2798.324 ops/sec 357 usecs/op
Non-sync'ed 8kB writes:
write 300198.383 ops/sec 3 usecs/op
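pg_test_fsync lists the methods in wal_sync_method preference order, not by speed. A small awk filter can pick the fastest method from the one-8kB-write section. fastest_sync_method is a hypothetical name, and the parsing assumes the output layout shown above. Speed alone should not decide wal_sync_method: a method is only useful if the storage stack really makes the data durable when the call returns.

```shell
#!/bin/sh
# fastest_sync_method: read pg_test_fsync output on stdin and print the
# method with the highest ops/sec in the "one 8kB write" section.
# Lines such as "fsync_writethrough n/a" are skipped automatically.
fastest_sync_method() {
    awk '
        BEGIN { best = -1 }
        /using one 8kB write/  { in_sec = 1; next }
        /using two 8kB writes/ { in_sec = 0 }
        in_sec && $3 == "ops/sec" && $2 + 0 > best { best = $2 + 0; name = $1 }
        END { if (name != "") printf "%s %.3f\n", name, best }
    '
}

# Usage:
# /opt/pgsql9.5/bin/pg_test_fsync | fastest_sync_method
```

In the ext4 run above this picks open_datasync (5496 ops/sec), yet fdatasync is well ahead in the two-write section (4980 vs 2998 ops/sec), so both sections are worth reading before changing the setting.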
btrfs with default settings:
[root@digoal ~]# mkfs.btrfs /dev/sdb1 -f
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: 26f9fd42-0933-4382-8124-437091e1cddf
Node size: 16384
Sector size: 4096
Filesystem size: 40.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 1.01GiB
System: DUP 12.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 40.00GiB /dev/sdb1
[root@digoal ~]# mount /dev/sdb1 /data01
[root@digoal ~]# cd /data01/
[root@digoal data01]# /opt/pgsql9.5/bin/pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 672.325 ops/sec 1487 usecs/op
fdatasync 460.352 ops/sec 2172 usecs/op
fsync 385.227 ops/sec 2596 usecs/op
fsync_writethrough n/a
open_sync 392.941 ops/sec 2545 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 179.161 ops/sec 5582 usecs/op
fdatasync 358.958 ops/sec 2786 usecs/op
fsync 518.578 ops/sec 1928 usecs/op
fsync_writethrough n/a
open_sync 273.567 ops/sec 3655 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 566.545 ops/sec 1765 usecs/op
2 * 8kB open_sync writes 268.357 ops/sec 3726 usecs/op
4 * 4kB open_sync writes 144.014 ops/sec 6944 usecs/op
8 * 2kB open_sync writes 79.028 ops/sec 12654 usecs/op
16 * 1kB open_sync writes 31.814 ops/sec 31433 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 570.831 ops/sec 1752 usecs/op
write, close, fsync 562.849 ops/sec 1777 usecs/op
Non-sync'ed 8kB writes:
write 225085.038 ops/sec 4 usecs/op
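The open_sync write-size section is easier to read per write: dividing usecs/op by the number of writes gives the cost of one synchronous write. A minimal awk sketch, assuming the line layout shown above; per_write_cost is a hypothetical name.

```shell
#!/bin/sh
# per_write_cost: for lines like
#   "2 * 8kB open_sync writes 268.357 ops/sec 3726 usecs/op"
# print usecs/op divided by the number of writes ($1).
per_write_cost() {
    awk '$2 == "*" && $4 == "open_sync" {
        printf "%s x %s: %.1f usecs per sync write\n", $1, $3, $(NF - 1) / $1
    }'
}

# Usage:
# /opt/pgsql9.5/bin/pg_test_fsync | per_write_cost
```

For the default btrfs run this gives roughly 1.6 to 2.0 ms per synchronous write whether the write is 1kB or 16kB, so the total cost follows the number of sync writes rather than the number of bytes.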
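btrfs after tuning:
(Tuning: keep a single copy of metadata instead of DUP with -m single, use a 4K node size with -n 4096 to reduce write-lock contention, disable compression, enable space_cache, and disable data copy-on-write with nodatacow.)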
[root@digoal ~]# mkfs.btrfs /dev/sdb1 -m single -n 4096 -f
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.
Label: (null)
UUID: 1e859a5c-570b-4426-83ac-b73a473d1936
Node size: 4096
Sector size: 4096
Filesystem size: 40.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: single 8.00MiB
System: single 4.00MiB
SSD detected: no
Incompat features: extref, skinny-metadata
Number of devices: 1
Devices:
ID SIZE PATH
1 40.00GiB /dev/sdb1
[root@digoal ~]# mount /dev/sdb1 /data01 -o ssd,discard,nodatacow,noatime,nodiratime,compress=no,space_cache
[root@digoal ~]# cd /data01/
[root@digoal data01]# /opt/pgsql9.5/bin/pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 1424.383 ops/sec 702 usecs/op
fdatasync 1870.474 ops/sec 535 usecs/op
fsync 1816.084 ops/sec 551 usecs/op
fsync_writethrough n/a
open_sync 1458.938 ops/sec 685 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync 750.109 ops/sec 1333 usecs/op
fdatasync 1747.257 ops/sec 572 usecs/op
fsync 1729.970 ops/sec 578 usecs/op
fsync_writethrough n/a
open_sync 723.056 ops/sec 1383 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
1 * 16kB open_sync write 1413.624 ops/sec 707 usecs/op
2 * 8kB open_sync writes 720.379 ops/sec 1388 usecs/op
4 * 4kB open_sync writes 352.704 ops/sec 2835 usecs/op
8 * 2kB open_sync writes 157.877 ops/sec 6334 usecs/op
16 * 1kB open_sync writes 73.355 ops/sec 13632 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
write, fsync, close 1827.975 ops/sec 547 usecs/op
write, close, fsync 1664.630 ops/sec 601 usecs/op
Non-sync'ed 8kB writes:
write 243183.732 ops/sec 4 usecs/op
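The 1/3 figure at the top can be checked against the fdatasync line of the one-8kB-write section in each run. A small arithmetic sketch; ratio is a hypothetical helper, and the numbers are copied from the three runs above.

```shell
#!/bin/sh
# ratio A B: print A / B rounded to two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# fdatasync ops/sec, "one 8kB write" section of each run
ext4=5357.773
btrfs_default=460.352
btrfs_tuned=1870.474

echo "tuned btrfs / ext4:   $(ratio "$btrfs_tuned" "$ext4")"            # 0.35
echo "default btrfs / ext4: $(ratio "$btrfs_default" "$ext4")"          # 0.09
echo "tuned / default:      $(ratio "$btrfs_tuned" "$btrfs_default")"   # 4.06
```

So tuning roughly quadruples btrfs fdatasync throughput here, but it still ends up at about a third of ext4.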
btrfs performance can be improved further by striping across multiple devices.
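A minimal dry-run sketch of a striped layout, assuming two or more spare devices; striped_mkfs_cmd and /dev/sdc1 are hypothetical. It only prints the mkfs.btrfs command: -d raid0 stripes data across the devices, while -m single and -n 4096 repeat the tuned single-device settings above. raid0 has no redundancy, so losing any one device loses the filesystem, and no measurement of this layout appears above.

```shell
#!/bin/sh
# striped_mkfs_cmd DEV...: print (do not run) a mkfs.btrfs command that
# stripes data (-d raid0) across all given devices.
striped_mkfs_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "striping needs at least two devices" >&2
        return 1
    fi
    echo "mkfs.btrfs -f -d raid0 -m single -n 4096 $*"
}

# Usage (hypothetical devices):
striped_mkfs_cmd /dev/sdb1 /dev/sdc1
```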
btrfs mount options:
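According to the btrfs documentation, nodatacow applies only to newly created files and implies nodatasum (no data checksums) and no compression, so compress=no in the tuned mount above is already implied. Since the kernel may report options differently from how they were passed, it helps to check what a mount actually ended up with. A minimal sketch reading /proc/mounts (Linux only); mount_opts and has_mount_opt are hypothetical helper names.

```shell
#!/bin/sh
# mount_opts MOUNTPOINT: print the option string the kernel reports for it.
# If a mount point is stacked, the last (visible) entry wins.
# Returns non-zero if MOUNTPOINT is not listed in /proc/mounts.
mount_opts() {
    awk -v m="$1" '$2 == m { opts = $4; found = 1 }
                   END { if (!found) exit 1; print opts }' /proc/mounts
}

# has_mount_opt MOUNTPOINT OPTION: succeed if OPTION appears verbatim
has_mount_opt() {
    mount_opts "$1" | tr ',' '\n' | grep -qx -- "$2"
}

# Usage against the tuned mount above:
# mount_opts /data01
# has_mount_opt /data01 nodatacow && echo "nodatacow in effect"
```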