Redis 源码分析字典（dict）（中）

2022-04-26 177

版权

本文内容由阿里云实名注册用户自发贡献，版权归原作者所有，阿里云开发者社区不拥有其著作权，亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容，填写侵权投诉表单进行举报，一经查实，本社区将立刻删除涉嫌侵权内容。

本文涉及的产品

云数据库 Tair（兼容Redis），内存型 2GB

Redis 开源版，标准版 2GB

简介： R上edis 源码分析字典（dict）

哈希算法

哈希算法

在 redis 中 hash 算法默认使用的是 siphash 算法（虽然也可以自定义）。计算哈希值和索引值的方法如下：

1、使用字典设置的哈希函数，计算键 key 的哈希值

hash = dict -> type -> hashFunction(Key);

2、使用哈希表的 sizemake 属性和第一步得到哈希值，计算索引值

#define DICTHT_SIZE_MASK(exp) ((exp) == -1 ? 0 : (DICTHT_SIZE(exp))-1)
index = hash & DICTHT_SIZE_MASK(d->ht_size_exp[table]);

哈希冲突

redis 解决哈希冲突的方法是链地址法，而且采用的是头插式

哈希拓容

采用链地址法来解决哈希冲突，如果数据量太大，大量的冲突会导致某个桶上的链表非常长，不利于数据查询，因此需要更具负载因子（负载因子用来评估键冲突的概率）来判断是否需要进行拓容量，就执行函数 _dictExpandIfNeeded。其中 resize 还与是否进行持久化有关。

void updateDictResizePolicy(void) {
    if (!hasActiveChildProcess())
        dictEnableResize();
    else
        dictDisableResize();
}

_dictExpandIfNeeded 函数

static int _dictExpandIfNeeded(dict *d)
{
    // 如果 rehash 在进行就不需要拓容
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;
    //如果 ht_size_exp[0] 为空，那么就拓展为默认大小
    /* If the hash table is empty expand it to the initial size. */
    if (DICTHT_SIZE(d->ht_size_exp[0]) == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);
    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht_used[0] >= DICTHT_SIZE(d->ht_size_exp[0]) &&
        (dict_can_resize ||
         d->ht_used[0]/ DICTHT_SIZE(d->ht_size_exp[0]) > dict_force_resize_ratio) &&
        dictTypeExpandAllowed(d))
    {
        return dictExpand(d, d->ht_used[0] + 1);
    }
    return DICT_OK;
}

思考一下：为什么持久化下，尽量减少 resize?

缩容

Redis 中的 reszie 的条件是由两处决定 htNeedsResize 和 dictResize

如果了初始值填充小于 10% ，这个说明需要缩容

#define DICT_HT_INITIAL_EXP      2
#define DICT_HT_INITIAL_SIZE     (1<<(DICT_HT_INITIAL_EXP))
/* Hash table parameters */
#define HASHTABLE_MIN_FILL        10      /* Minimal hash table fill 10% */
int htNeedsResize(dict *dict) {
    long long size, used;
    size = dictSlots(dict);
    used = dictSize(dict);
    return (size > DICT_HT_INITIAL_SIZE &&
            (used*100/size < HASHTABLE_MIN_FILL));
}

发生的时机主要是在每次数据移除 serverCron 定时任务中

serverCron 定时任务：

void databasesCron(void) {
    /* Expire keys by random sampling. Not required for slaves
     * as master will synthesize DELs for us. */
    if (server.active_expire_enabled) {
        if (iAmMaster()) {
            activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW);
        } else {
            expireSlaveKeys();
        }
    }
    /* Defrag keys gradually. */
    activeDefragCycle();
    /* Perform hash tables rehashing if needed, but only if there are no
     * other processes saving the DB on disk. Otherwise rehashing is bad
     * as will cause a lot of copy-on-write of memory pages. */
    if (!hasActiveChildProcess()) {
        /* We use global counters so if we stop the computation at a given
         * DB we'll be able to start from the successive in the next
         * cron loop iteration. */
        static unsigned int resize_db = 0;
        static unsigned int rehash_db = 0;
        int dbs_per_call = CRON_DBS_PER_CALL;
        int j;
        /* Don't test more DBs than we have. */
        if (dbs_per_call > server.dbnum) dbs_per_call = server.dbnum;
        /* Resize */
        for (j = 0; j < dbs_per_call; j++) {
            tryResizeHashTables(resize_db % server.dbnum);
            resize_db++;
        }
        /* Rehash */
        if (server.activerehashing) {
            for (j = 0; j < dbs_per_call; j++) {
                int work_done = incrementallyRehash(rehash_db);
                if (work_done) {
                    /* If the function did some work, stop here, we'll do
                     * more at the next cron loop. */
                    break;
                } else {
                    /* If this db didn't need rehash, we'll try the next one. */
                    rehash_db++;
                    rehash_db %= server.dbnum;
                }
            }
        }
    }
}

rehash 优化

rehash 过程，ht_table[0] 中会存在空节点，为了防止阻塞，每次限定如果连续访问 10 * N （N 表示步长）个空字节后，直接返回。见 dictRehash 函数

int dictRehash(dict *d, int n) {
    int empty_visits = n*10; /* Max number of empty buckets to visit. */
    if (!dictIsRehashing(d)) return 0;
    while(n-- && d->ht_used[0] != 0) {
        dictEntry *de, *nextde;
        /* Note that rehashidx can't overflow as we are sure there are more
         * elements because ht[0].used != 0 */
        assert(DICTHT_SIZE(d->ht_size_exp[0]) > (unsigned long)d->rehashidx);
        while(d->ht_table[0][d->rehashidx] == NULL) {
            d->rehashidx++;
            if (--empty_visits == 0) return 1;
        }
        de = d->ht_table[0][d->rehashidx];
        /* Move all the keys in this bucket from the old to the new hash HT */
        while(de) {
            uint64_t h;
            nextde = de->next;
            /* Get the index in the new hash table */
            h = dictHashKey(d, de->key) & DICTHT_SIZE_MASK(d->ht_size_exp[1]);
            de->next = d->ht_table[1][h];
            d->ht_table[1][h] = de;
            d->ht_used[0]--;
            d->ht_used[1]++;
            de = nextde;
        }
        d->ht_table[0][d->rehashidx] = NULL;
        d->rehashidx++;
    }
    /* Check if we already rehashed the whole table... */
    if (d->ht_used[0] == 0) {
        zfree(d->ht_table[0]);
        /* Copy the new ht onto the old one */
        d->ht_table[0] = d->ht_table[1];
        d->ht_used[0] = d->ht_used[1];
        d->ht_size_exp[0] = d->ht_size_exp[1];
        _dictReset(d, 1);
        d->rehashidx = -1;
        return 0;
    }
    /* More to rehash... */
    return 1;
}

渐进式 hash , 是指 rehash 操作不是一次性、集中式完成的。

第一类：正在操作上面的 dictRehash 函数。但是对于 redis. 来说如果 hash 表的 key 太多，这样可能导致 rehash 操作需要很长的时间进行，阻塞服务器，所以 redis 本身将 rehash 操作分散在了后续的每次增删改查（以桶为单位）。

第二类：针对一类渐进式 rehash , 存在一个问题；如果 a 服务器长期处理空闲状态，导致哈希表长期使用 0 和 1 号两个表。为了解决这个问题，在 serverCron 定时函数中，每次拿出 1ms 时间来执行 rehash 操作，每次步长为 100 ，但是需要开启 activerehashing 。见 databaseCron 函数。

Redis 源码分析字典（dict）（中）

哈希算法

rehash 优化

热门文章

最新文章

相关课程

相关电子书

相关实验场景

热门

活动广场

任务中心

开发者评测

高校计划

乘风者计划

训练营

阿里云MVP

话题

直播

下载

镜像站

技术资料

插件

Redis 源码分析字典（dict）（中）

哈希算法

rehash 优化

热门文章

最新文章

相关课程

相关电子书

相关实验场景