3

Redis源码阅读:字典

 3 years ago
source link: https://jiajunhuang.com/articles/2021_05_24-redis_source_code_3.md.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Redis源码阅读:字典

dict,也就是哈希表这个数据结构,在Redis中的作用非常广泛,比如,Redis用它来存储支持的命令,这篇文章我们会看一下Redis是 如何实现dict的。

上一篇我们讲到,Redis读取网络请求的内容,解析出命令后,开始处理。其中就有一个函数叫做 processCommand ,就是用来处理客户端请求的:

int processCommand(client *c) {
    // ...

    /* Now lookup the command and check ASAP about trivial error conditions
     * such as wrong arity, bad command name and so forth. */
    c->cmd = c->lastcmd = lookupCommand(c->argv[0]->ptr);

    // ...
}

这里的 lookupCommand 就是我开始讲的,Redis用dict来存储命令。那么是在哪里存储的呢?在 initServerConfig 里有这么一行:

    populateCommandTable();

点进去看:

/* Populates the Redis Command Table starting from the hard coded list
 * we have on top of server.c file. */
void populateCommandTable(void) {
    int j;
    int numcommands = sizeof(redisCommandTable)/sizeof(struct redisCommand);

    for (j = 0; j < numcommands; j++) {
        struct redisCommand *c = redisCommandTable+j;
        int retval1, retval2;

        /* Translate the command string flags description into an actual
         * set of flags. */
        if (populateCommandTableParseFlags(c,c->sflags) == C_ERR)
            serverPanic("Unsupported command flag");

        c->id = ACLGetCommandID(c->name); /* Assign the ID used for ACL. */
        retval1 = dictAdd(server.commands, sdsnew(c->name), c);
        /* Populate an additional dictionary that will be unaffected
         * by rename-command statements in redis.conf. */
        retval2 = dictAdd(server.orig_commands, sdsnew(c->name), c);
        serverAssert(retval1 == DICT_OK && retval2 == DICT_OK);
    }
}

其实就是相当于在Go里,声明一个 map[string]Command{} 然后把所有的命令名字作为key,Command结构体作为value存进去。 最上面的 redisCommandTable 就是Redis支持的所有命令的一个列表:

struct redisCommand redisCommandTable[] = {
    {"module",moduleCommand,-2,
     "admin no-script",
     0,NULL,0,0,0,0,0,0},

    {"get",getCommand,2,
     "read-only fast @string",
     0,NULL,1,1,1,0,0,0},

    // ...

现在我们知道dict在Redis中是非常有用的一个数据结构了,偷偷告诉你,Redis中的expire命令也用到了dict:

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
    unsigned long expires_cursor; /* Cursor of the active expire cycle. */
    list *defrag_later;         /* List of key names to attempt to defrag one by one, gradually. */
} redisDb;

不过这都扯远了,我们以后再来探索这些。回到正题,从上面的用法来看,我们知道,Redis的dict实现是类型无关的,那么 它到底是怎么实现的呢?让我们深入探索一下:

typedef struct dict {
    dictType *type;
    void *privdata;
    dictht ht[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    int16_t pauserehash; /* If >0 rehashing is paused (<0 indicates coding error) */
} dict;

可以看到,dict的结构体里,有 dictType *typedictht ht[2]long rehashidx 三个很重要的结构,其余两个我暂时 还不知道是干啥用的,但是先不管,我们继续看,先看 dictType 到底是什么:

typedef struct dictType {
    uint64_t (*hashFunction)(const void *key);
    void *(*keyDup)(void *privdata, const void *key);
    void *(*valDup)(void *privdata, const void *obj);
    int (*keyCompare)(void *privdata, const void *key1, const void *key2);
    void (*keyDestructor)(void *privdata, void *key);
    void (*valDestructor)(void *privdata, void *obj);
    int (*expandAllowed)(size_t moreMem, double usedRatio);
} dictType;

原来是一个结构体,这个结构体里全是函数指针,这是C里实现类型无关代码的常用方式,也就是不管具体类型,调用者传入处理具体 数据的函数指针,然后数据都以 void * 的指针来传递。我们知道 type 是一个 dictType * 的类型就可以了,我们继续看 dictht ht[2] 是什么:

/* This is our hash table structure. Every dictionary has two of this as we
 * implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
    dictEntry **table;
    unsigned long size;
    unsigned long sizemask;
    unsigned long used;
} dictht;

注释非常重要,它说了,这是我们哈希表的真正结构,每一个dict都有两个表,因为这样才可以实现渐进式哈希,渐进式哈希是什么呢? 试想,如果你是数据结构的实现者,当你的哈希表容量不够,冲突率严重上升时,你是不是应该选择扩容,那么扩容怎么做呢?当然是 创建一个容量更大的哈希表,然后把数据搬过去,替换老的表,然后把老表free。

这样可以吗?这样当然是一种方式,可是这样做有一个缺点,如果这个哈希表的数据量非常大,那么处理就会很耗时间,Redis的设计 是一个高并发的数据结构服务器,如果那样做的话,就势必突然有一些客户端的某些请求特别慢,因为恰巧触发了rehash。所以Redis 没有采用这种方式,而是采用了渐进式哈希,也就是每一次都搬一部分数据,直到搬完为止。

我们继续看Redis代码,dictht 的第一个属性,dictEntry **table 就是存储真正的数据的地方,size 是当前所申请的存储 数据的大小,哈希表说白了,就是用一个大数组,然后计算哈希值,对数组长度取余,放到对应的slot上,对于C来说,dictEntry **table 的意思就是,table 是一个指针,指向一块内存,而这块内存存储的数据的类型是 dictEntry *,可能有点绕,但是习惯了就好了。 sizemask 是用来做位运算的掩码,used 是已经存储的数据的大小。为啥要有 sizemask 呢?这是因为,当大小为2的幂时, 取余操作 num % size 可以直接用位运算 num & (size - 1) 来做,这样会更快,我们来看看:

#include <stdio.h>
#include <time.h>
#include <sys/time.h>

#define LOOP_TIME (8589934592 * 3)

double timeit(void (*f)()) {
    struct timeval start, end;

    gettimeofday(&start, NULL);
    (*f)();
    gettimeofday(&end, NULL);

    return end.tv_sec + end.tv_usec / 1e6 - start.tv_sec - start.tv_usec / 1e6; // in seconds
}

void use_bitwise(void) {
    long size = 64;
    long mask = size - 1;

    long num = 4294967296;
    for (long i = 0; i < LOOP_TIME; i++) {
        num & mask;
    }
}

void use_mod(void) {
    long size = 64;
    long mask = size - 1;

    long num = 4294967296;
    for (long i = 0; i < LOOP_TIME; i++) {
        num % size;
    }
}

int main(void) {
    for (int i = 0; i < 3; i++) {
        printf("use_bitwise took %f seconds, use_mod took %f seconds\n", timeit(use_bitwise), timeit(use_mod));
    }
}

执行一下:

$ cc -O0 main.c && ./a.out 
use_bitwise took 10.128805 seconds, use_mod took 10.356709 seconds
use_bitwise took 10.151692 seconds, use_mod took 10.334191 seconds
use_bitwise took 10.304045 seconds, use_mod took 10.360759 seconds

well,还是更快一丢丢的。我们再次回到主题,看下dict是如何使用的,上面说了,populateCommandTable() 把命令从列表存到 dict里,但是这个dict并不是在这里创建的,而是 populateCommandTable() 上面:

    /* Command table -- we initialize it here as it is part of the
     * initial configuration, since command names may be changed via
     * redis.conf using the rename-command directive. */
    server.commands = dictCreate(&commandTableDictType,NULL);

我们来看看 dictCreate

/* Create a new hash table */
dict *dictCreate(dictType *type,
        void *privDataPtr)
{
    dict *d = zmalloc(sizeof(*d));

    _dictInit(d,type,privDataPtr);
    return d;
}

/* Initialize the hash table */
int _dictInit(dict *d, dictType *type,
        void *privDataPtr)
{
    _dictReset(&d->ht[0]);
    _dictReset(&d->ht[1]);
    d->type = type;
    d->privdata = privDataPtr;
    d->rehashidx = -1;
    d->pauserehash = 0;
    return DICT_OK;
}

/* Reset a hash table already initialized with ht_init().
 * NOTE: This function should only be called by ht_destroy(). */
static void _dictReset(dictht *ht)
{
    ht->table = NULL;
    ht->size = 0;
    ht->sizemask = 0;
    ht->used = 0;
}

可以看到,dictCreate 的时候,没怎么申请内存,那么肯定是在添加第一个元素的时候申请的,我们来看下,我翻到 populateCommandTable 函数里,添加是调用的 dictAdd

/* Add an element to the target hash table */
int dictAdd(dict *d, void *key, void *val)
{
    dictEntry *entry = dictAddRaw(d,key,NULL);

    if (!entry) return DICT_ERR;
    dictSetVal(d, entry, val);
    return DICT_OK;
}

dictEntry *dictAddRaw(dict *d, void *key, dictEntry **existing)
{
    long index;
    dictEntry *entry;
    dictht *ht;

    if (dictIsRehashing(d)) _dictRehashStep(d);  // 渐进式哈希在这里

    /* Get the index of the new element, or -1 if
     * the element already exists. */
    if ((index = _dictKeyIndex(d, key, dictHashKey(d,key), existing)) == -1) // 计算应当落在哪个slot
        return NULL;

    /* Allocate the memory and store the new entry.
     * Insert the element in top, with the assumption that in a database
     * system it is more likely that recently added entries are accessed
     * more frequently. */
    // 放进去
    ht = dictIsRehashing(d) ? &d->ht[1] : &d->ht[0];
    entry = zmalloc(sizeof(*entry));
    entry->next = ht->table[index];
    ht->table[index] = entry;
    ht->used++;

    /* Set the hash entry fields. */
    dictSetKey(d, entry, key);
    return entry;
}

上面我们看到了是如何添加,那么啥时候会检查是否需要扩容以及开始渐进式哈希呢?原来这个逻辑在 _dictKeyIndex 里:

static long _dictKeyIndex(dict *d, const void *key, uint64_t hash, dictEntry **existing)
{
    unsigned long idx, table;
    dictEntry *he;
    if (existing) *existing = NULL;

    /* Expand the hash table if needed */
    if (_dictExpandIfNeeded(d) == DICT_ERR)
        return -1;
    // ...
}

/* Expand the hash table if needed */
static int _dictExpandIfNeeded(dict *d)
{
    /* Incremental rehashing already in progress. Return. */
    if (dictIsRehashing(d)) return DICT_OK;

    /* If the hash table is empty expand it to the initial size. */
    if (d->ht[0].size == 0) return dictExpand(d, DICT_HT_INITIAL_SIZE);

    /* If we reached the 1:1 ratio, and we are allowed to resize the hash
     * table (global setting) or we should avoid it but the ratio between
     * elements/buckets is over the "safe" threshold, we resize doubling
     * the number of buckets. */
    if (d->ht[0].used >= d->ht[0].size &&
        (dict_can_resize ||
         d->ht[0].used/d->ht[0].size > dict_force_resize_ratio) &&
        dictTypeExpandAllowed(d))
    {
        return dictExpand(d, d->ht[0].used + 1);
    }
    return DICT_OK;
}

/* return DICT_ERR if expand was not performed */
int dictExpand(dict *d, unsigned long size) {
    return _dictExpand(d, size, NULL);
}

/* Expand or create the hash table,
 * when malloc_failed is non-NULL, it'll avoid panic if malloc fails (in which case it'll be set to 1).
 * Returns DICT_OK if expand was performed, and DICT_ERR if skipped. */
int _dictExpand(dict *d, unsigned long size, int* malloc_failed)
{
    if (malloc_failed) *malloc_failed = 0;

    /* the size is invalid if it is smaller than the number of
     * elements already inside the hash table */
    if (dictIsRehashing(d) || d->ht[0].used > size)
        return DICT_ERR;

    dictht n; /* the new hash table */
    unsigned long realsize = _dictNextPower(size);

    /* Rehashing to the same table size is not useful. */
    if (realsize == d->ht[0].size) return DICT_ERR;

    /* Allocate the new hash table and initialize all pointers to NULL */
    n.size = realsize;
    n.sizemask = realsize-1;
    if (malloc_failed) {
        n.table = ztrycalloc(realsize*sizeof(dictEntry*));
        *malloc_failed = n.table == NULL;
        if (*malloc_failed)
            return DICT_ERR;
    } else
        n.table = zcalloc(realsize*sizeof(dictEntry*));

    n.used = 0;

    /* Is this the first initialization? If so it's not really a rehashing
     * we just set the first hash table so that it can accept keys. */
    if (d->ht[0].table == NULL) {
        d->ht[0] = n;
        return DICT_OK;
    }

    /* Prepare a second hash table for incremental rehashing */
    d->ht[1] = n;
    d->rehashidx = 0;
    return DICT_OK;
}

剩下的命令怎么实现的我就不讲了,其实道理都差不多。

这篇文章我们看了一下Redis是如何实现dict也就是哈希表的,哦对了,Redis的冲突解决是用链接法去避免的。


微信公众号
关注公众号,获得及时更新

Go设计模式:桥接模式和策略模式

Go设计模式:单例模式、原型模式和Builder模式

操作系统也是CRUD

Go设计模式:简单工厂模式

把USB设备穿透给虚拟机里的系统

debug故事之:事务让生活更美好

Go设计模式:模板模式

Go设计模式:适配器模式

Go设计模式:Iterator

glusterfs 笔记

用peewee代替SQLAlchemy

Go的slice工作机制

Linux系统迁移记录(从HDD到SSD)

Redis是如何工作的?

virsh自动关闭windows虚拟机




About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK