btrfs vs ext4 fsync

Summary:

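In a virtual machine with a single block device, btrfs fdatasync throughput is about one third of ext4's after tuning (about 1870 vs. 5358 ops/sec), and less than one tenth of ext4's with default settings (about 460 ops/sec).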

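PostgreSQL calls fsync in many places, for example when flushing the xlog (WAL), at checkpoints, when creating a database, during ALTER DATABASE ... SET TABLESPACE, when rewriting a table, when writing pg_clog, and so on.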

Reference:

http://blog.163.com/digoal@126/blog/static/1638770402015840480734/

The performance of fsync directly affects the performance of the database.
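
To make concrete what "fsync performance" means here, the following is a minimal sketch of the kind of loop such a benchmark times: write one 8kB block, force it to stable storage, repeat, and report operations per second. It is not pg_test_fsync's implementation; the helper name `sync_rate` and its parameters are hypothetical, and the 1-second duration is an arbitrary choice.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8kB block, the write size used in the tests below


def sync_rate(sync_fn, directory=None, seconds=1.0):
    """Return write+sync operations per second for one 8kB block.

    sync_fn is os.fsync or os.fdatasync; directory selects the filesystem
    under test (None means the default temporary directory).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        ops = 0
        start = time.perf_counter()
        deadline = start + seconds
        while time.perf_counter() < deadline:
            os.pwrite(fd, BLOCK, 0)  # overwrite the same block each time
            sync_fn(fd)              # block until the data is on stable storage
            ops += 1
        elapsed = time.perf_counter() - start
        return ops / elapsed
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # os.fdatasync is missing on some platforms; fall back to fsync there.
    fdatasync = getattr(os, "fdatasync", os.fsync)
    for name, fn in (("fdatasync", fdatasync), ("fsync", os.fsync)):
        print(f"{name:10s} {sync_rate(fn):12.3f} ops/sec")
```

Passing `directory="/data01"` would point the sketch at the mount point used in the runs below. The absolute numbers will differ from pg_test_fsync, which is the tool actually used in this comparison.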

The following comparison was run on CentOS 7 x64, with btrfs 4.3.1 compiled from source.

http://blog.163.com/digoal@126/blog/static/16387704020151025102118544/


ext4:

[root@digoal ~]# mkfs.ext4 /dev/sdb1
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
2621440 inodes, 10485504 blocks
524275 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2157969408
320 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

[root@digoal ~]# mount /dev/sdb1 /data01 -o defaults,noatime,nodiratime,discard,data=ordered
[root@digoal ~]# cd /data01/
[root@digoal data01]# /opt/pgsql9.5/bin/pg_test_fsync 
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      5496.006 ops/sec     182 usecs/op
        fdatasync                          5357.773 ops/sec     187 usecs/op
        fsync                              2872.555 ops/sec     348 usecs/op
        fsync_writethrough                            n/a
        open_sync                          3059.961 ops/sec     327 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      2997.891 ops/sec     334 usecs/op
        fdatasync                          4980.309 ops/sec     201 usecs/op
        fsync                              2934.537 ops/sec     341 usecs/op
        fsync_writethrough                            n/a
        open_sync                          1608.287 ops/sec     622 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write          2909.899 ops/sec     344 usecs/op
         2 *  8kB open_sync writes         1565.073 ops/sec     639 usecs/op
         4 *  4kB open_sync writes          830.664 ops/sec    1204 usecs/op
         8 *  2kB open_sync writes          459.544 ops/sec    2176 usecs/op
        16 *  1kB open_sync writes          227.552 ops/sec    4395 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close                3082.501 ops/sec     324 usecs/op
        write, close, fsync                2798.324 ops/sec     357 usecs/op

Non-sync'ed 8kB writes:
        write                            300198.383 ops/sec       3 usecs/op
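
The open_sync write-size table above shows that splitting the same 16kB into more O_SYNC writes costs more, because each write has to reach stable storage before the call returns. As a small worked example using only the ext4 numbers above, this sketch divides each latency by the number of writes and compares each split against the single 16kB write. The function names are made up for illustration.

```python
# (number of writes, usecs per 16kB operation) copied from the ext4 run above
EXT4_OPEN_SYNC = [(1, 344), (2, 639), (4, 1204), (8, 2176), (16, 4395)]


def per_write_usecs(rows):
    """Split each 16kB operation's latency evenly across its write calls."""
    return {writes: usecs / writes for writes, usecs in rows}


def slowdown_vs_single(rows):
    """How many times slower each split is than one 16kB write."""
    base = dict(rows)[1]
    return {writes: usecs / base for writes, usecs in rows}


if __name__ == "__main__":
    per_write = per_write_usecs(EXT4_OPEN_SYNC)
    slowdown = slowdown_vs_single(EXT4_OPEN_SYNC)
    for writes, usecs in EXT4_OPEN_SYNC:
        print(f"{writes:2d} writes: {per_write[writes]:6.1f} usecs/write, "
              f"{slowdown[writes]:5.2f}x one 16kB write")
```

On these numbers each synchronous write costs roughly 270 to 350 usecs regardless of its size, so sixteen 1kB writes come out about 12.8 times slower than one 16kB write.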

btrfs with default settings:

[root@digoal ~]# mkfs.btrfs /dev/sdb1 -f
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               26f9fd42-0933-4382-8124-437091e1cddf
Node size:          16384
Sector size:        4096
Filesystem size:    40.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP               1.01GiB
  System:           DUP              12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    40.00GiB  /dev/sdb1

[root@digoal ~]# mount /dev/sdb1 /data01
[root@digoal ~]# cd /data01/
[root@digoal data01]# /opt/pgsql9.5/bin/pg_test_fsync 
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                       672.325 ops/sec    1487 usecs/op
        fdatasync                           460.352 ops/sec    2172 usecs/op
        fsync                               385.227 ops/sec    2596 usecs/op
        fsync_writethrough                            n/a
        open_sync                           392.941 ops/sec    2545 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                       179.161 ops/sec    5582 usecs/op
        fdatasync                           358.958 ops/sec    2786 usecs/op
        fsync                               518.578 ops/sec    1928 usecs/op
        fsync_writethrough                            n/a
        open_sync                           273.567 ops/sec    3655 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write           566.545 ops/sec    1765 usecs/op
         2 *  8kB open_sync writes          268.357 ops/sec    3726 usecs/op
         4 *  4kB open_sync writes          144.014 ops/sec    6944 usecs/op
         8 *  2kB open_sync writes           79.028 ops/sec   12654 usecs/op
        16 *  1kB open_sync writes           31.814 ops/sec   31433 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close                 570.831 ops/sec    1752 usecs/op
        write, close, fsync                 562.849 ops/sec    1777 usecs/op

Non-sync'ed 8kB writes:
        write                            225085.038 ops/sec       4 usecs/op
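
To put the default btrfs run next to the ext4 run, the sketch below divides the "one 8kB write" throughput of each sync method on btrfs by the same figure on ext4. The numbers are copied from the two transcripts above; the function name `relative_throughput` is only illustrative.

```python
# ops/sec for "one 8kB write", copied from the two runs above
EXT4 = {"open_datasync": 5496.006, "fdatasync": 5357.773,
        "fsync": 2872.555, "open_sync": 3059.961}
BTRFS_DEFAULT = {"open_datasync": 672.325, "fdatasync": 460.352,
                 "fsync": 385.227, "open_sync": 392.941}


def relative_throughput(candidate, baseline):
    """Return candidate ops/sec as a fraction of baseline, per sync method."""
    return {method: candidate[method] / baseline[method] for method in baseline}


if __name__ == "__main__":
    for method, ratio in relative_throughput(BTRFS_DEFAULT, EXT4).items():
        print(f"{method:15s} btrfs/ext4 = {ratio:6.1%}")
```

With default settings, every sync method on btrfs lands between roughly 9% and 13% of the ext4 figure in this test.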

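btrfs after tuning:
(Metadata stored as a single copy instead of DUP, a 4K node size to reduce write lock contention, compression disabled, space cache enabled, data COW disabled.)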

[root@digoal ~]# mkfs.btrfs /dev/sdb1 -m single -n 4096 -f
btrfs-progs v4.3.1
See http://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               1e859a5c-570b-4426-83ac-b73a473d1936
Node size:          4096
Sector size:        4096
Filesystem size:    40.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    40.00GiB  /dev/sdb1

[root@digoal ~]# mount /dev/sdb1 /data01 -o ssd,discard,nodatacow,noatime,nodiratime,compress=no,space_cache
[root@digoal ~]# cd /data01/
[root@digoal data01]# /opt/pgsql9.5/bin/pg_test_fsync 
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      1424.383 ops/sec     702 usecs/op
        fdatasync                          1870.474 ops/sec     535 usecs/op
        fsync                              1816.084 ops/sec     551 usecs/op
        fsync_writethrough                            n/a
        open_sync                          1458.938 ops/sec     685 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                       750.109 ops/sec    1333 usecs/op
        fdatasync                          1747.257 ops/sec     572 usecs/op
        fsync                              1729.970 ops/sec     578 usecs/op
        fsync_writethrough                            n/a
        open_sync                           723.056 ops/sec    1383 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB in different write
open_sync sizes.)
         1 * 16kB open_sync write          1413.624 ops/sec     707 usecs/op
         2 *  8kB open_sync writes          720.379 ops/sec    1388 usecs/op
         4 *  4kB open_sync writes          352.704 ops/sec    2835 usecs/op
         8 *  2kB open_sync writes          157.877 ops/sec    6334 usecs/op
        16 *  1kB open_sync writes           73.355 ops/sec   13632 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        write, fsync, close                1827.975 ops/sec     547 usecs/op
        write, close, fsync                1664.630 ops/sec     601 usecs/op

Non-sync'ed 8kB writes:
        write                            243183.732 ops/sec       4 usecs/op
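
Reading ratios off these transcripts by hand is error-prone, so here is a small sketch that parses pg_test_fsync result rows and compares the tuned fdatasync figure with the other two runs. The parser is an illustration written against the output format shown above, not a tool shipped with PostgreSQL; `parse_pg_test_fsync` and the constant names are hypothetical.

```python
import re

# Matches result rows such as "fdatasync   1870.474 ops/sec   535 usecs/op".
ROW = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<ops>\d+\.\d+) ops/sec\s+\d+ usecs/op\s*$")


def parse_pg_test_fsync(text):
    """Map each section header to {test name: ops/sec}; "n/a" rows are skipped."""
    results = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        match = ROW.match(line)
        if match and section is not None:
            results[section][match["name"]] = float(match["ops"])
        elif stripped.endswith(":") and not line[:1].isspace():
            # Unindented lines ending in ":" start a new group of tests.
            section = stripped.rstrip(":")
            results.setdefault(section, {})
    return results


TUNED_BTRFS = """\
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
        open_datasync                      1424.383 ops/sec     702 usecs/op
        fdatasync                          1870.474 ops/sec     535 usecs/op
        fsync                              1816.084 ops/sec     551 usecs/op
        fsync_writethrough                            n/a
        open_sync                          1458.938 ops/sec     685 usecs/op
"""

EXT4_FDATASYNC = 5357.773          # ops/sec, from the ext4 run
DEFAULT_BTRFS_FDATASYNC = 460.352  # ops/sec, from the default btrfs run

if __name__ == "__main__":
    parsed = parse_pg_test_fsync(TUNED_BTRFS)
    tuned = parsed["Compare file sync methods using one 8kB write"]["fdatasync"]
    print(f"tuned btrfs / ext4:          {tuned / EXT4_FDATASYNC:.2f}")
    print(f"tuned btrfs / default btrfs: {tuned / DEFAULT_BTRFS_FDATASYNC:.2f}")
```

On the figures above, tuning raises btrfs fdatasync throughput about fourfold over the default settings, which still leaves it at roughly 35% of ext4.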

Using striping with btrfs (across multiple devices) can improve performance further; this was not measured here.

btrfs mount options:

https://btrfs.wiki.kernel.org/index.php/Mount_options
