test zfs dedup vs compress which suit in your environment-阿里云开发者社区

test zfs dedup vs compress which suit in your environment

2016-04-05 2196

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

简介：

你的数据适合压缩还是适合开启去重?

这个可以拿到你的数据进行评估, 在一个已有使用zdb -S zpname进行评估, 然后使用zdb -DD产生一个报告.

例如 :

对zp1 pool的数据块, 采样评估

[root@db- ~]# zdb -S zp1

报告deduplicate table

[root@db- ~]# zdb -DD zp1

DDT-sha256-zap-duplicate: 272229 entries, size 291 on disk, 141 in core

DDT-sha256-zap-unique: 71346264 entries, size 314 on disk, 167 in core

DDT histogram (aggregated over all DDTs):

bucket allocated referenced

______ ______________________________ ______________________________

refcnt blocks LSIZE PSIZE DSIZE blocks LSIZE PSIZE DSIZE

------ ------ ----- ----- ----- ------ ----- ----- -----

1 68.0M 8.50T 1.18T 1.18T 68.0M 8.50T 1.18T 1.18T

2 264K 32.8G 19.3G 19.3G 529K 65.7G 38.5G 38.5G

4 1.64K 210M 886K 886K 7.93K 1015M 4.16M 4.16M

8 330 41.0M 167K 167K 3.18K 404M 1.61M 1.61M

16 59 7.25M 29.5K 29.5K 1.33K 168M 684K 684K

32 69 8.62M 34.5K 34.5K 3.18K 406M 1.59M 1.59M

64 85 10.6M 42.5K 42.5K 8.07K 1.01G 4.04M 4.04M

128 72 9M 36K 36K 11.4K 1.43G 5.72M 5.72M

256 59 7.38M 29.5K 29.5K 17.5K 2.19G 8.76M 8.76M

512 5 640K 2.50K 2.50K 4.26K 546M 2.13M 2.13M

Total 68.3M 8.54T 1.19T 1.19T 68.6M 8.58T 1.21T 1.21T

dedup = 1.02, compress = 7.06, copies = 1.00, dedup * compress / copies = 7.18

从结果来看, 压缩比为7.06, dedup为1.02, 显然这份数据更适合开启压缩.

另外, dedup需要维持数据块的内存表(DDT), 跟踪deduplicate.

那么需要多少内存呢? 一般每个数据块需要320字节的内存.

根据zdb -DD zp1的输出, blocks总共有68.3M个, 那么DDT的大小是68.3M*320约21GB内存.

如果内存不够, 又需要开启dedup的话, 建议增加ssd作为L2ARC来保持DDT.

在没有数据的情况下, 没有办法对其进行评估. 所以至少需要一些测试数据.

我这里的测试数据基本上是一些PostgreSQL csvlog文本文件和tar.bz2文件, 看起来并不适合dedup.

[注意]

1. 压缩和去重一样, 都只对enable后的数据提交生效, 已经存在的数据不会被压缩或去重. 只有enable后加入的数据或修改的数据才能被压缩和去重.

2. 去重的flush操作不是原子操作, 所以断电可能导致数据受损.

Further, deduplicated data is not flushed to disk as an atomic transaction. Instead, the blocks are written to disk serially, one block at a time. Thus, this does open you up for corruption in the event of a power failure before the blocks have been written.

3. 去重的属性是在dataset级别设置的, 但是去重是整个zpool中进行的.

查看去重率也是在zpool get all中看到的.

[root@db- ~]# zpool get all

NAME PROPERTY VALUE SOURCE

zp1 size 9.75T -

zp1 capacity 12% -

zp1 altroot - default

zp1 health ONLINE -

zp1 guid 5877722976139588848 default

zp1 version - default

zp1 bootfs - default

zp1 delegation on default

zp1 autoreplace off default

zp1 cachefile - default

zp1 failmode wait default

zp1 listsnapshots off default

zp1 autoexpand off default

zp1 dedupditto 0 default

zp1 dedupratio 1.01x -

zp1 free 8.52T -

zp1 allocated 1.23T -

zp1 readonly off -

zp1 ashift 0 default

zp1 comment - default

zp1 expandsize 0 -

zp1 freeing 0 default

zp1 feature@async_destroy enabled local

zp1 feature@empty_bpobj active local

zp1 feature@lz4_compress active local

[参考]

1. https://pthree.org/2013/12/18/zfs-administration-appendix-d-the-true-cost-of-deduplication/

2. http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html

3. https://pthree.org/2012/12/18/zfs-administration-part-xi-compression-and-deduplication/

4. http://blog.163.com/digoal@126/blog/static/163877040201441992944647/

5. man zdb

-S

Simulate the effects of deduplication, constructing a DDT and then display that DDT as with -DD.

-D

Display deduplication statistics, including the deduplication ratio (dedup), compression ratio (compress),

inflation due to the zfs copies property (copies), and an overall effective ratio (dedup * compress /

copies).

If specified twice, display a histogram of deduplication statistics, showing the allocated (physically

present on disk) and referenced (logically referenced in the pool) block counts and sizes by reference

count.

If specified a third time, display the statistics independently for each deduplication table.

If specified a fourth time, dump the contents of the deduplication tables describing duplicate blocks.

If specified a fifth time, also dump the contents of the deduplication tables describing unique blocks.

文章标签：

C++

test zfs dedup vs compress which suit in your environment

热门文章

最新文章

相关电子书