ZFS ARC & L2ARC (zfs-$ver/module/zfs/arc.c)

Introduction:
The tunable parameters, their meanings, and their default values can be found in arc.c or under /sys/module/zfs/parameters/$parm_name (a quick way to inspect them on Linux is shown after the parameter list below).
On FreeBSD and other systems with native ZFS support, adjust them through sysctl.conf instead.
parm:           zfs_arc_min:Min arc size (ulong)
parm:           zfs_arc_max:Max arc size (ulong)
parm:           zfs_arc_meta_limit:Meta limit for arc size (ulong)
parm:           zfs_arc_meta_prune:Bytes of meta data to prune (int)
parm:           zfs_arc_grow_retry:Seconds before growing arc size (int)
parm:           zfs_arc_shrink_shift:log2(fraction of arc to reclaim) (int)
parm:           zfs_arc_p_min_shift:arc_c shift to calc min/max arc_p (int)
parm:           zfs_disable_dup_eviction:disable duplicate buffer eviction (int)
parm:           zfs_arc_memory_throttle_disable:disable memory throttle (int)
parm:           zfs_arc_min_prefetch_lifespan:Min life of prefetch block (int)
parm:           l2arc_write_max:Max write bytes per interval (ulong)
parm:           l2arc_write_boost:Extra write bytes during device warmup (ulong)
parm:           l2arc_headroom:Number of max device writes to precache (ulong)
parm:           l2arc_headroom_boost:Compressed l2arc_headroom multiplier (ulong)
parm:           l2arc_feed_secs:Seconds between L2ARC writing (ulong)
parm:           l2arc_feed_min_ms:Min feed interval in milliseconds (ulong)
parm:           l2arc_noprefetch:Skip caching prefetched buffers (int)
parm:           l2arc_nocompress:Skip compressing L2ARC buffers (int)
parm:           l2arc_feed_again:Turbo L2ARC warmup (int)
parm:           l2arc_norw:No reads during writes (int)
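On Linux (ZFS on Linux) the full list of module parameters with their descriptions can be dumped with modinfo, and the value currently in effect read from sysfs. A minimal sketch:

modinfo -p zfs                                      # list every zfs module parameter and its description
cat /sys/module/zfs/parameters/zfs_arc_max          # ARC size ceiling in bytes (0 means the built-in default is used)
cat /sys/module/zfs/parameters/l2arc_write_max      # per-interval L2ARC write limit in bytes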


A few points worth noting about the L2ARC:
1. The L2ARC is populated by the l2arc_feed_thread function, which periodically and proactively pulls data from the ARC. Anything that is not in the ARC therefore cannot end up in the L2ARC.
2. The L2ARC never stores dirty data, so it never has to be written back to disk. For the same reason it is a poor fit for workloads whose data changes frequently (e.g. update-heavy OLTP).
3. If a block cached in the L2ARC becomes dirty in the ARC, the now stale copy is simply dropped from the L2ARC.
4. The L2ARC tuning parameters (set them persistently in /etc/modprobe.d/zfs.conf, or change them at runtime via /sys/module/zfs/parameters/$PARM_NAME; an example of both methods follows this list):
 *      l2arc_write_max         max write bytes per interval (the maximum amount fed to the L2ARC per feed cycle)
 *      l2arc_write_boost       extra write bytes during device warmup
 *      l2arc_noprefetch        skip caching prefetched buffers
 *      l2arc_nocompress        skip compressing buffers
 *      l2arc_headroom          number of max device writes to precache
 *      l2arc_headroom_boost    when we find compressed buffers during ARC
 *                              scanning, we multiply headroom by this
 *                              percentage factor for the next scan cycle,
 *                              since more compressed buffers are likely to
 *                              be present
 *      l2arc_feed_secs         seconds between L2ARC writing (shorten this interval to feed data from the ARC into the L2ARC faster)
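For example, a persistent setting plus an equivalent on-the-fly change might look like the sketch below (the values are purely illustrative, not recommendations):

# /etc/modprobe.d/zfs.conf -- takes effect the next time the zfs module is loaded
options zfs l2arc_write_max=134217728 l2arc_feed_secs=1

# apply the same values immediately on a running system
echo 134217728 > /sys/module/zfs/parameters/l2arc_write_max
echo 1 > /sys/module/zfs/parameters/l2arc_feed_secs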


See also:
zfs-0.6.2/module/zfs/arc.c

ARC
/*
 * DVA-based Adjustable Replacement Cache
 *
 * While much of the theory of operation used here is
 * based on the self-tuning, low overhead replacement cache
 * presented by Megiddo and Modha at FAST 2003, there are some
 * significant differences:
 *
 * 1. The Megiddo and Modha model assumes any page is evictable.
 * Pages in its cache cannot be "locked" into memory.  This makes
 * the eviction algorithm simple: evict the last page in the list.
 * This also makes the performance characteristics easy to reason
 * about.  Our cache is not so simple.  At any given moment, some
 * subset of the blocks in the cache are un-evictable because we
 * have handed out a reference to them.  Blocks are only evictable
 * when there are no external references active.  This makes
 * eviction far more problematic:  we choose to evict the evictable
 * blocks that are the "lowest" in the list.
 *
 * There are times when it is not possible to evict the requested
 * space.  In these circumstances we are unable to adjust the cache
 * size.  To prevent the cache growing unbounded at these times we
 * implement a "cache throttle" that slows the flow of new data
 * into the cache until we can make space available.
 *
 * 2. The Megiddo and Modha model assumes a fixed cache size.
 * Pages are evicted when the cache is full and there is a cache
 * miss.  Our model has a variable sized cache.  It grows with
 * high use, but also tries to react to memory pressure from the
 * operating system: decreasing its size when system memory is
 * tight.
 *
 * 3. The Megiddo and Modha model assumes a fixed page size. All
 * elements of the cache are therefore exactly the same size.  So
 * when adjusting the cache size following a cache miss, it's simply
 * a matter of choosing a single page to evict.  In our model, we
 * have variable sized cache blocks (ranging from 512 bytes to
 * 128K bytes).  We therefore choose a set of blocks to evict to make
 * space for a cache miss that approximates as closely as possible
 * the space used by the new block.
 *
 * See also:  "ARC: A Self-Tuning, Low Overhead Replacement Cache"
 * by N. Megiddo & D. Modha, FAST 2003
 */
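The adaptive sizing described above (a variable-size cache whose target c moves between zfs_arc_min and zfs_arc_max, with p splitting it between the MRU and MFU lists) can be observed on ZFS on Linux through the arcstats kstat. A small sketch, assuming the usual three-column name/type/value layout of /proc/spl/kstat/zfs/arcstats:

# current ARC size, the adaptive target c with its bounds, and the MRU/MFU split point p
awk '$1 ~ /^(size|c|c_min|c_max|p)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats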

/*
 * The locking model:
 *
 * A new reference to a cache buffer can be obtained in two
 * ways: 1) via a hash table lookup using the DVA as a key,
 * or 2) via one of the ARC lists.  The arc_read() interface
 * uses method 1, while the internal arc algorithms for
 * adjusting the cache use method 2.  We therefore provide two
 * types of locks: 1) the hash table lock array, and 2) the
 * arc list locks.
 *
 * Buffers do not have their own mutexes, rather they rely on the
 * hash table mutexes for the bulk of their protection (i.e. most
 * fields in the arc_buf_hdr_t are protected by these mutexes).
 *
 * buf_hash_find() returns the appropriate mutex (held) when it
 * locates the requested buffer in the hash table.  It returns
 * NULL for the mutex if the buffer was not in the table.
 *
 * buf_hash_remove() expects the appropriate hash mutex to be
 * already held before it is invoked.
 *
 * Each arc state also has a mutex which is used to protect the
 * buffer list associated with the state.  When attempting to
 * obtain a hash table lock while holding an arc list lock you
 * must use: mutex_tryenter() to avoid deadlock.  Also note that
 * the active state mutex must be held before the ghost state mutex.
 *
 * Arc buffers may have an associated eviction callback function.
 * This function will be invoked prior to removing the buffer (e.g.
 * in arc_do_user_evicts()).  Note however that the data associated
 * with the buffer may be evicted prior to the callback.  The callback
 * must be made with *no locks held* (to prevent deadlock).  Additionally,
 * the users of callbacks must ensure that their private data is
 * protected from simultaneous callbacks from arc_buf_evict()
 * and arc_do_user_evicts().
 *
 * It is also possible to register a callback which is run when the
 * arc_meta_limit is reached and no buffers can be safely evicted.  In
 * this case the arc user should drop a reference on some arc buffers so
 * they can be reclaimed and the arc_meta_limit honored.  For example,
 * when using the ZPL each dentry holds a reference on a znode.  These
 * dentries must be pruned before the arc buffer holding the znode can
 * be safely evicted.
 *
 * Note that the majority of the performance stats are manipulated
 * with atomic operations.
 *
 * The L2ARC uses the l2arc_buflist_mtx global mutex for the following:
 *
 *      - L2ARC buflist creation
 *      - L2ARC buflist eviction
 *      - L2ARC write completion, which walks L2ARC buflists
 *      - ARC header destruction, as it removes from L2ARC buflists
 *      - ARC header release, as it removes from L2ARC buflists
 */


L2ARC
/*
 * Level 2 ARC
 *
 * The level 2 ARC (L2ARC) is a cache layer in-between main memory and disk.
 * It uses dedicated storage devices to hold cached data, which are populated
 * using large infrequent writes.  The main role of this cache is to boost
 * the performance of random read workloads.  The intended L2ARC devices
 * include short-stroked disks, solid state disks, and other media with
 * substantially faster read latency than disk.
 *
 *                 +-----------------------+
 *                 |         ARC           |
 *                 +-----------------------+
 *                    |         ^     ^
 *                    |         |     |
 *      l2arc_feed_thread()    arc_read()
 *                    |         |     |
 *                    |  l2arc read   |
 *                    V         |     |
 *               +---------------+    |
 *               |     L2ARC     |    |
 *               +---------------+    |
 *                   |    ^           |
 *          l2arc_write() |           |
 *                   |    |           |
 *                   V    |           |
 *                 +-------+      +-------+
 *                 | vdev  |      | vdev  |
 *                 | cache |      | cache |
 *                 +-------+      +-------+
 *                 +=========+     .-----.
 *                 :  L2ARC  :    |-_____-|
 *                 : devices :    | Disks |
 *                 +=========+    `-_____-'
 *
 * Read requests are satisfied from the following sources, in order:
 *
 *      1) ARC
 *      2) vdev cache of L2ARC devices
 *      3) L2ARC devices
 *      4) vdev cache of disks
 *      5) disks
 *
 * Some L2ARC device types exhibit extremely slow write performance.
 * To accommodate this there are some significant differences between
 * the L2ARC and traditional cache design:
 *
 * 1. There is no eviction path from the ARC to the L2ARC.  Evictions from
 * the ARC behave as usual, freeing buffers and placing headers on ghost
 * lists.  The ARC does not send buffers to the L2ARC during eviction as
 * this would add inflated write latencies for all ARC memory pressure.
 *
 * 2. The L2ARC attempts to cache data from the ARC before it is evicted.
 * It does this by periodically scanning buffers from the eviction-end of
 * the MFU and MRU ARC lists, copying them to the L2ARC devices if they are
 * not already there. It scans until a headroom of buffers is satisfied,
 * which itself is a buffer for ARC eviction. If a compressible buffer is
 * found during scanning and selected for writing to an L2ARC device, we
 * temporarily boost scanning headroom during the next scan cycle to make
 * sure we adapt to compression effects (which might significantly reduce
 * the data volume we write to L2ARC). The thread that does this is
 * l2arc_feed_thread(), illustrated below; example sizes are included to
 * provide a better sense of ratio than this diagram:
 *
 *             head -->                        tail
 *              +---------------------+----------+
 *      ARC_mfu |:::::#:::::::::::::::|o#o###o###|-->.   # already on L2ARC
 *              +---------------------+----------+   |   o L2ARC eligible
 *      ARC_mru |:#:::::::::::::::::::|#o#ooo####|-->|   : ARC buffer
 *              +---------------------+----------+   |
 *                   15.9 Gbytes      ^ 32 Mbytes    |
 *                                 headroom          |
 *                                            l2arc_feed_thread()
 *                                                   |
 *                       l2arc write hand <--[oooo]--'
 *                               |           8 Mbyte
 *                               |          write max
 *                               V
 *                +==============================+
 *      L2ARC dev |####|#|###|###|    |####| ... |
 *                +==============================+
 *                           32 Gbytes
 *
 * 3. If an ARC buffer is copied to the L2ARC but then hit instead of
 * evicted, then the L2ARC has cached a buffer much sooner than it probably
 * needed to, potentially wasting L2ARC device bandwidth and storage.  It is
 * safe to say that this is an uncommon case, since buffers at the end of
 * the ARC lists have moved there due to inactivity.
 *
 * 4. If the ARC evicts faster than the L2ARC can maintain a headroom,
 * then the L2ARC simply misses copying some buffers.  This serves as a
 * pressure valve to prevent heavy read workloads from both stalling the ARC
 * with waits and clogging the L2ARC with writes.  This also helps prevent
 * the potential for the L2ARC to churn if it attempts to cache content too
 * quickly, such as during backups of the entire pool.
 *
 * 5. After system boot and before the ARC has filled main memory, there are
 * no evictions from the ARC and so the tails of the ARC_mfu and ARC_mru
 * lists can remain mostly static.  Instead of searching from the tail of these
 * lists as pictured, the l2arc_feed_thread() will search from the list heads
 * for eligible buffers, greatly increasing its chance of finding them.
 *
 * The L2ARC device write speed is also boosted during this time so that
 * the L2ARC warms up faster.  Since there have been no ARC evictions yet,
 * there are no L2ARC reads, and no fear of degrading read performance
 * through increased writes.
 *
 * 6. Writes to the L2ARC devices are grouped and sent in-sequence, so that
 * the vdev queue can aggregate them into larger and fewer writes.  Each
 * device is written to in a rotor fashion, sweeping writes through
 * available space then repeating.
 *
 * 7. The L2ARC does not store dirty content.  It never needs to flush
 * write buffers back to disk based storage.
 *
 * 8. If an ARC buffer is written (and dirtied) which also exists in the
 * L2ARC, the now stale L2ARC buffer is immediately dropped.
 *
 * The performance of the L2ARC can be tweaked by a number of tunables, which
 * may be necessary for different workloads:
 *
 *      l2arc_write_max         max write bytes per interval
 *      l2arc_write_boost       extra write bytes during device warmup
 *      l2arc_noprefetch        skip caching prefetched buffers
 *      l2arc_nocompress        skip compressing buffers
 *      l2arc_headroom          number of max device writes to precache
 *      l2arc_headroom_boost    when we find compressed buffers during ARC
 *                              scanning, we multiply headroom by this
 *                              percentage factor for the next scan cycle,
 *                              since more compressed buffers are likely to
 *                              be present
 *      l2arc_feed_secs         seconds between L2ARC writing
 *
 * Tunables may be removed or added as future performance improvements are
 * integrated, and also may become zpool properties.
 *
 * There are three key functions that control how the L2ARC warms up:
 *
 *      l2arc_write_eligible()  check if a buffer is eligible to cache
 *      l2arc_write_size()      calculate how much to write
 *      l2arc_write_interval()  calculate sleep delay between writes
 *
 * These three functions determine what to write, how much, and how quickly
 * to send writes.
 */
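Whether these tunables are paying off can be checked from the same kstat file: the l2_* counters track L2ARC hits, misses and bytes moved. Again a sketch, assuming the standard ZFS-on-Linux arcstats field names:

# L2ARC hit/miss counters, current size, and bytes read from / written to the cache devices
awk '$1 ~ /^l2_(hits|misses|size|read_bytes|write_bytes)$/ {print $1, $3}' /proc/spl/kstat/zfs/arcstats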
