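Overview
XFS performance tuning falls into four areas:
1. Logical volume / RAID tuning
2. XFS mkfs tuning
3. XFS mount tuning
4. xfsctl tuning
For each area, understand the underlying mechanism first and then tune for your specific workload. The man pages explain the mechanisms.
Relevant man pages: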
man lvcreate
man xfs
man mkfs.xfs
man mount
man xfsctl
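The tuning steps are described below.
1. Logical volume tuning
1.1
Before creating the PV, align the partition on the block device. Leave the first 1 MB unallocated and start the first partition at sector 2048 (assuming 512-byte sectors).
fdisk -c -u /dev/dfa
start: 2048
end: start + (2048*n) - 1, so that the partition covers a whole number of 1 MB units
Alternatively, create the partition with parted.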
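The alignment rule can be checked with plain shell arithmetic. The following is a minimal sketch, assuming 512-byte sectors (so 2048 sectors = 1 MiB); the function names is_aligned and end_sector are illustrative and not part of fdisk or parted.

```shell
#!/bin/sh
# Assumption: 512-byte sectors, so 2048 sectors = 1 MiB.
ALIGN_SECTORS=2048

# Succeeds (exit 0) when the given start sector is on a 1 MiB boundary.
is_aligned() {
    [ $(( $1 % ALIGN_SECTORS )) -eq 0 ]
}

# Print the last sector of a partition that starts at sector $1
# and spans $2 whole 1 MiB units: end = start + 2048*n - 1.
end_sector() {
    echo $(( $1 + ALIGN_SECTORS * $2 - 1 ))
}

is_aligned 2048 && echo "start sector 2048 is aligned"
is_aligned 63 || echo "start sector 63 is not aligned"
end_sector 2048 1024   # a 1 GiB partition starting at sector 2048
```

Because the end sector is always one less than a multiple of 2048, the next partition can start on the following sector and stay aligned.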
1.2
Two lvcreate parameters affect performance:
1. Number of stripes: set it equal to the number of PVs.
-i, --stripes Stripes
Gives the number of stripes. This is equal to the number of physical volumes to scatter the logical volume.
2. Stripe size: match the database block size, for example 8 KB, the PostgreSQL default.
-I, --stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes.
StripeSize must be 2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the stripe size may be a larger power of 2 but must not exceed the physical extent size.
3. When creating a snapshot, specify the chunk size.
The chunk size should also match the database block size, for example 8 KB, the PostgreSQL default.
-c, --chunksize ChunkSize
Power of 2 chunk size for the snapshot logical volume between 4k and 512k.
For example:
#lvcreate -i 3 -I 8 -n lv01 -l 100%VG vgdata01
Logical volume "lv01" created
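The parameter choices above can be captured in a small dry-run helper. This sketch only prints the lvcreate commands and does not run them; the function names, the snapshot name lv01_snap and the snapshot size 10G are hypothetical values chosen for illustration.

```shell
#!/bin/sh
# Print (do not run) an lvcreate command for a striped LV.
# $1 = number of PVs (stripes), $2 = stripe size in KB,
# $3 = LV name, $4 = VG name.
striped_lv_cmd() {
    echo "lvcreate -i $1 -I $2 -n $3 -l 100%VG $4"
}

# Print (do not run) an lvcreate command for a snapshot.
# $1 = chunk size in KB, $2 = snapshot size, $3 = snapshot name,
# $4 = origin LV path. Name and size are illustrative assumptions.
snapshot_cmd() {
    echo "lvcreate -s -c ${1}k -L $2 -n $3 $4"
}

striped_lv_cmd 3 8 lv01 vgdata01
snapshot_cmd 8 10G lv01_snap /dev/vgdata01/lv01
```

Note that an LV created with -l 100%VG leaves no free extents in the volume group, so a snapshot would only be possible if the LV were sized smaller or the VG extended first.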
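2. XFS mkfs tuning
XFS layout:
An XFS filesystem has up to three sections: data, log, and realtime.
By default the log lives inside the data section and there is no realtime section. Every section is made up of blocks, the smallest unit, whose size is set with -b when the filesystem is created.
2.1 data
Holds the metadata (inodes, directories, indirect blocks) and the user file data of ordinary (non-realtime) files.
The data section is split into allocation groups (AGs). mkfs.xfs lets you specify either the number of groups or the size of a single group.
The more groups there are, the more file and block allocations can proceed in parallel. Think of operations within one group as serialized and operations across groups as parallel.
More groups also cost more CPU, so this is a trade-off. For highly concurrent write workloads, use more groups (for example, one host running many small, busy databases).
2.2 log
Stores the metadata log. A metadata change must be recorded in the log before it is applied to the metadata in the data section.
The log is also used for recovery after a crash.
2.3 realtime
Divided into many small fixed-size extents. To place a file in the realtime section, set its realtime attribute bit through xfsctl on the open file descriptor, and do so before any data is written. Each file in the realtime section has an extent size that is a multiple of the realtime extent size.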
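mkfs.xfs tuning
The number of allocation groups multiplied by the AG size equals the size of the block device. Choose the number according to the allocation parallelism you need.
The AG count is best a multiple of the number of PVs under the logical volume. For example, with 3 PVs the AG count could be 9 or 900.
Put the log on an SSD if possible; the faster the better. Avoid throttling the IOPS of the log device with cgroups.
If you do not need a realtime section, do not create one.
-b size=8192: block size, matching the database block size.
-d agcount=9000,sunit=16,swidth=48
agcount=9000: assumes about 9000 concurrent writers, hence 9000 allocation groups. xfs(5) warns that a very high AG count costs CPU time, so treat this as an illustrative value.
sunit=16: stripe unit in 512-byte sectors (16 * 512 bytes = 8 KB), aligned with the stripe size of the LVM or RAID device (-I 8).
swidth=48: stripe width in 512-byte sectors (48 * 512 bytes = 24 KB = 3 * 8 KB), aligned with the full stripe of the LVM or RAID device (-i 3 -I 8).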
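The sunit and swidth values follow directly from the LVM layout, since both are expressed in 512-byte sectors. A minimal sketch of that arithmetic, with an illustrative function name:

```shell
#!/bin/sh
# Derive mkfs.xfs stripe geometry from the LVM layout.
# $1 = LVM stripe size in KB (lvcreate -I)
# $2 = number of stripes (lvcreate -i)
# sunit and swidth are expressed in 512-byte sectors.
stripe_geometry() {
    sunit=$(( $1 * 1024 / 512 ))
    swidth=$(( sunit * $2 ))
    echo "sunit=$sunit,swidth=$swidth"
}

stripe_geometry 8 3   # prints sunit=16,swidth=48
```

mkfs.xfs also accepts the same geometry as su=8k,sw=3, which avoids the sector conversion.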
Example
#mkfs.xfs -f -b size=8192 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01
meta-data=/dev/mapper/vgdata01-lv01 isize=256 agcount=9000, agsize=260417 blks
= sectsz=512 attr=2
data = bsize=8192 blocks=2343748608, imaxpct=5
= sunit=1 swidth=3 blks
naming =version 2 bsize=8192 ascii-ci=0
log =internal log bsize=8192 blocks=260413, version=2
= sectsz=512 sunit=1 blks, lazy-count=1
realtime =none extsz=8192 blocks=0, rtextents=0
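The AG guideline (a multiple of the PV count) and the agsize reported above can be reproduced with integer arithmetic. This is a sketch with illustrative function names; the ceiling division matches the agsize in this output, but mkfs.xfs may adjust the value for alignment in other cases.

```shell
#!/bin/sh
# Round a desired AG count up to the next multiple of the PV count.
# $1 = desired AG count, $2 = number of PVs.
round_agcount() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Approximate AG size in filesystem blocks: ceil(blocks / agcount).
# $1 = total data blocks, $2 = AG count.
agsize_blocks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

round_agcount 10 3              # prints 12
agsize_blocks 2343748608 9000   # prints 260417
```

The second call uses blocks=2343748608 and agcount=9000 from the output above and yields the reported agsize=260417.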
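3. XFS mount tuning
nobarrier: disables write barriers. Use it only when the storage has a battery-backed or otherwise non-volatile write cache.
largeio: for data warehouse and streaming workloads dominated by large sequential reads.
nolargeio: for OLTP.
logbsize=262144: size of each in-memory log buffer (256 KB).
logdev=: block device that holds the log section; use the fastest SSD available.
noatime,nodiratime: do not update access times on files and directories.
swalloc: stripe-aligned allocation.
Example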
#mount -t xfs -o nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc /dev/mapper/vgdata01-lv01 /data01
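The choice between largeio and nolargeio is the only option here that depends on the workload, so the option string can be assembled by a small helper. This is a sketch; the workload labels oltp and dw and the function name are illustrative, not XFS terminology, and the command is only printed.

```shell
#!/bin/sh
# Assemble an XFS mount option string for a workload type.
# $1 = "oltp" (random I/O) or "dw" (large sequential reads).
xfs_mount_opts() {
    case "$1" in
        oltp) io_hint=nolargeio ;;
        dw)   io_hint=largeio ;;
        *)    echo "unknown workload: $1" >&2; return 1 ;;
    esac
    echo "nobarrier,$io_hint,logbsize=262144,noatime,nodiratime,swalloc"
}

# Print (do not run) the resulting mount command.
echo "mount -t xfs -o $(xfs_mount_opts oltp) /dev/mapper/vgdata01-lv01 /data01"
```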
4. xfsctl tuning
xfsctl controls per-file policies on open files; it is not covered here.
[Troubleshooting]
#mount -o noatime,swalloc /dev/mapper/vgdata01-lv01 /data01
mount: Function not implemented
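The filesystem was created with a block size the running kernel does not support: the kernel log shows that the block size must not exceed the page size (4096 bytes). Recreating the filesystem with -b size=4096 resolves the problem.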
[ 5736.642924] XFS (dm-0): File system with blocksize 8192 bytes. Only pagesize (4096) or less will currently work.
[ 5736.695146] XFS (dm-0): SB validate failed with error -38.
Fix
#mkfs.xfs -f -b size=4096 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01
meta-data=/dev/mapper/vgdata01-lv01 isize=256 agcount=9000, agsize=520834 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=4687497216, imaxpct=5
= sunit=2 swidth=6 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=520830, version=2
= sectsz=512 sunit=2 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
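The failure can be caught before running mkfs.xfs by comparing the intended block size with the page size. A minimal sketch, assuming the limit stated in the kernel message above applies to the running kernel; the function name is illustrative.

```shell
#!/bin/sh
# Succeeds when an XFS block size does not exceed the page size,
# the limit reported in the kernel message above.
# $1 = block size in bytes, $2 = page size in bytes.
blocksize_mountable() {
    [ "$1" -le "$2" ]
}

page_size=$(getconf PAGESIZE)   # page size of the running system
for bs in 4096 8192; do
    if blocksize_mountable "$bs" "$page_size"; then
        echo "bs=$bs: OK (page size $page_size)"
    else
        echo "bs=$bs: too large for page size $page_size"
    fi
done
```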
[References]
1.
xfs(5) xfs(5)
NAME
xfs - layout of the XFS filesystem
DESCRIPTION
An XFS filesystem can reside on a regular disk partition or on a logical volume. An XFS filesystem has up to three parts: a data section, a log section, and a realtime section. Using the default
mkfs.xfs(8) options, the realtime section is absent, and the log area is contained within the data section. The log section can be either separate from the data section or contained within it. The
filesystem sections are divided into a certain number of blocks, whose size is specified at mkfs.xfs(8) time with the -b option.
The data section contains all the filesystem metadata (inodes, directories, indirect blocks) as well as the user file data for ordinary (non-realtime) files and the log area if the log is internal to the
data section. The data section is divided into a number of allocation groups. The number and size of the allocation groups are chosen by mkfs.xfs(8) so that there is normally a small number of equal-sized
groups. The number of allocation groups controls the amount of parallelism available in file and block allocation. It should be increased from the default if there is sufficient memory and a lot of
allocation activity. The number of allocation groups should not be set very high, since this can cause large amounts of CPU time to be used by the filesystem, especially when the filesystem is nearly full.
More allocation groups are added (of the original size) when xfs_growfs(8) is run.
The log section (or area, if it is internal to the data section) is used to store changes to filesystem metadata while the filesystem is running until those changes are made to the data section. It is
written sequentially during normal operation and read only during mount. When mounting a filesystem after a crash, the log is read to complete operations that were in progress at the time of the crash.
The realtime section is used to store the data of realtime files. These files had an attribute bit set through xfsctl(3) after file creation, before any data was written to the file. The realtime section
is divided into a number of extents of fixed size (specified at mkfs.xfs(8) time). Each file in the realtime section has an extent size that is a multiple of the realtime section extent size.
Each allocation group contains several data structures. The first sector contains the superblock. For allocation groups after the first, the superblock is just a copy and is not updated after mkfs.xfs(8).
The next three sectors contain information for block and inode allocation within the allocation group. Also contained within each allocation group are data structures to locate free blocks and inodes;
these are located through the header structures.
Each XFS filesystem is labeled with a Universal Unique Identifier (UUID). The UUID is stored in every allocation group header and is used to help distinguish one XFS filesystem from another, therefore you
should avoid using dd(1) or other block-by-block copying programs to copy XFS filesystems. If two XFS filesystems on the same machine have the same UUID, xfsdump(8) may become confused when doing
incremental and resumed dumps. xfsdump(8) and xfsrestore(8) are recommended for making copies of XFS filesystems.
OPERATIONS
Some functionality specific to the XFS filesystem is accessible to applications through the xfsctl(3) and by-handle (see open_by_handle(3)) interfaces.
MOUNT OPTIONS
Refer to the mount(8) manual entry for descriptions of the individual XFS mount options.
SEE ALSO
xfsctl(3), mount(8), mkfs.xfs(8), xfs_info(8), xfs_admin(8), xfsdump(8), xfsrestore(8).