
etcd Source Code Reading (Part 3): wal

 5 years ago
source link: https://jiajunhuang.com/articles/2018_11_24-etcd_source_code_analysis_wal.md.html?amp%3Butm_medium=referral

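Today let's look at the WAL (Write-Ahead Logging). It is a common technique databases use to guarantee durability: before data is actually modified, a log record describing the change is first appended to disk. Because the log is append-only, the writes are sequential rather than random, so write throughput stays high. The payoff is this: if a crash happens before the record reaches disk, the change was certainly never applied; if the crash happens after the record is written, the change can still be recovered from the WAL.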

First, let's see what the wal package contains:

$ tree
.
├── decoder.go
├── doc.go
├── encoder.go
├── file_pipeline.go
├── file_pipeline_test.go
├── metrics.go
├── record_test.go
├── repair.go
├── repair_test.go
├── util.go
├── wal.go
├── wal_bench_test.go
├── wal_test.go
└── walpb
    ├── record.go
    ├── record.pb.go
    └── record.proto

1 directory, 16 files

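Reading doc.go first tells us the following:

w.Save appends the raft state (HardState) and log entries to the WAL
w.Close must be called once the WAL is no longer needed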

Let's look at the implementation of Save:

func (w *WAL) Save(st raftpb.HardState, ents []raftpb.Entry) error {
    w.mu.Lock()
    defer w.mu.Unlock()

    // short cut, do not call sync
    if raft.IsEmptyHardState(st) && len(ents) == 0 {
        return nil
    }

    mustSync := raft.MustSync(st, w.state, len(ents))

    // TODO(xiangli): no more reference operator
    for i := range ents {
        if err := w.saveEntry(&ents[i]); err != nil {
            return err
        }
    }
    if err := w.saveState(&st); err != nil {
        return err
    }

    curOff, err := w.tail().Seek(0, io.SeekCurrent)
    if err != nil {
        return err
    }
    if curOff < SegmentSizeBytes {
        if mustSync {
            return w.sync()
        }
        return nil
    }

    return w.cut()
}
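Save fsyncs only when raft.MustSync says it must. As I read etcd's raft package, that rule is: sync if new entries were appended, or if Term or Vote changed. A change to the commit index alone does not force a sync (the Raft paper treats commitIndex as volatile state). The sketch below restates the rule with a hypothetical local `HardState` type standing in for raftpb.HardState.

```go
package main

import "fmt"

// HardState is a hypothetical stand-in for raftpb.HardState.
type HardState struct {
	Term, Vote, Commit uint64
}

// mustSync restates the raft.MustSync rule as I understand it:
// persist synchronously when entries were appended or Term/Vote changed.
func mustSync(st, prev HardState, entsnum int) bool {
	return entsnum != 0 || st.Vote != prev.Vote || st.Term != prev.Term
}

func main() {
	prev := HardState{Term: 2, Vote: 1, Commit: 10}
	fmt.Println(mustSync(HardState{Term: 2, Vote: 1, Commit: 11}, prev, 0)) // false: only Commit moved
	fmt.Println(mustSync(HardState{Term: 3, Vote: 1, Commit: 10}, prev, 0)) // true: Term changed
	fmt.Println(mustSync(prev, prev, 5))                                     // true: new entries
}
```

This is why Save can return without calling w.sync even after writing a non-empty HardState.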

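Save therefore writes each entry with w.saveEntry and then the HardState with w.saveState. If the write offset in the current segment is still below SegmentSizeBytes, it calls w.sync, but only when raft.MustSync reports that a sync is required. Otherwise it calls w.cut to roll over to a new segment file. Here is w.sync: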

func (w *WAL) sync() error {
    if w.encoder != nil {
        if err := w.encoder.flush(); err != nil {
            return err
        }
    }
    start := time.Now()
    err := fileutil.Fdatasync(w.tail().File)

    took := time.Since(start)
    if took > warnSyncDuration {
        if w.lg != nil {
            w.lg.Warn(
                "slow fdatasync",
                zap.Duration("took", took),
                zap.Duration("expected-duration", warnSyncDuration),
            )
        } else {
            plog.Warningf("sync duration of %v, expected less than %v", took, warnSyncDuration)
        }
    }
    walFsyncSec.Observe(took.Seconds())

    return err
}
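sync first flushes the encoder's buffered bytes, then calls fileutil.Fdatasync to force the data onto disk. On Linux this is the fdatasync system call, a variant of fsync that skips metadata not needed to read the data back; on other platforms etcd uses fsync-style calls instead. If the sync takes longer than warnSyncDuration, a "slow fdatasync" warning is logged, and every sync duration is recorded in the walFsyncSec metric.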

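Saving a snapshot works the same way: SaveSnapshot encodes one snapshotType record, advances w.enti only if the snapshot index is ahead of it, and then syncs.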

func (w *WAL) SaveSnapshot(e walpb.Snapshot) error {
    b := pbutil.MustMarshal(&e)

    w.mu.Lock()
    defer w.mu.Unlock()

    rec := &walpb.Record{Type: snapshotType, Data: b}
    if err := w.encoder.encode(rec); err != nil {
        return err
    }
    // update enti only when snapshot is ahead of last index
    if w.enti < e.Index {
        w.enti = e.Index
    }
    return w.sync()
}
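SaveSnapshot hands its record to w.encoder.encode. doc.go describes the on-disk frame: a 64-bit length field whose low 56 bits hold the data length and whose most significant byte records padding, so every record is 8-byte aligned and the length field is never torn, plus a CRC chained across records. The sketch below models that layout under assumptions: `record` and `marshal` are hypothetical (a fixed byte layout replaces etcd's protobuf encoding), and the chained Castagnoli CRC32 is my reading of encoder.go.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

// record is a hypothetical stand-in for walpb.Record.
type record struct {
	Type int64
	Crc  uint32
	Data []byte
}

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// frameSize puts the data length in the low 56 bits and, when padding is
// needed, 0x80|padBytes in the top byte so each frame ends 8-byte aligned.
func frameSize(n int) (lenField uint64, padBytes int) {
	lenField = uint64(n)
	padBytes = (8 - n%8) % 8
	if padBytes != 0 {
		lenField |= uint64(0x80|padBytes) << 56
	}
	return lenField, padBytes
}

// marshal is a hypothetical fixed layout (8-byte type, 4-byte CRC, data)
// used here instead of protobuf.
func marshal(rec record) []byte {
	out := make([]byte, 12, 12+len(rec.Data))
	binary.LittleEndian.PutUint64(out[0:8], uint64(rec.Type))
	binary.LittleEndian.PutUint32(out[8:12], rec.Crc)
	return append(out, rec.Data...)
}

// encoder keeps a running CRC, so each record's CRC covers all earlier data too.
type encoder struct {
	buf bytes.Buffer
	crc uint32
}

func (e *encoder) encode(rec *record) {
	e.crc = crc32.Update(e.crc, castagnoli, rec.Data)
	rec.Crc = e.crc
	data := marshal(*rec)
	lenField, pad := frameSize(len(data))
	var hdr [8]byte
	binary.LittleEndian.PutUint64(hdr[:], lenField)
	e.buf.Write(hdr[:])
	e.buf.Write(data)
	e.buf.Write(make([]byte, pad)) // zero padding up to the 8-byte boundary
}

func main() {
	var e encoder
	e.encode(&record{Type: 1, Data: []byte("hello")})
	e.encode(&record{Type: 2, Data: []byte("snapshot")})
	fmt.Println(e.buf.Len(), e.buf.Len()%8 == 0) // 64 true
}
```

Because the CRC is chained, a decoder can detect not only a corrupted record but also a record that does not follow the one before it.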

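Beyond that, the WAL type is mostly about managing multiple WAL segment files, named $seq-$index.wal with both fields written as 16-digit hexadecimal. The first file is 0000000000000000-0000000000000000.wal. After a Save, once the write offset in the current segment reaches SegmentSizeBytes (64 MB), the WAL performs a cut. For example, if the last saved raft index is 0x20 at the first cut, the new file is named 0000000000000001-0000000000000021.wal: the sequence number is incremented, and the index field holds the index of the next entry.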
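The naming scheme and the cut decision at the end of Save fit in a few lines. This is a hedged sketch: `segmentSize` stands in for SegmentSizeBytes (assumed here to be 64 * 1000 * 1000), and `nextSegment` is a hypothetical helper, not an etcd function.

```go
package main

import "fmt"

// segmentSize is a stand-in for etcd's SegmentSizeBytes (assumed 64 * 1000 * 1000).
const segmentSize int64 = 64 * 1000 * 1000

// walName formats $seq-$index.wal with both fields as 16-digit zero-padded hex.
func walName(seq, index uint64) string {
	return fmt.Sprintf("%016x-%016x.wal", seq, index)
}

// nextSegment mirrors the end of Save: below the size limit there is no cut;
// otherwise the new segment is seq+1 and is named after the next index.
func nextSegment(offset int64, seq, lastIndex uint64) (string, bool) {
	if offset < segmentSize {
		return "", false
	}
	return walName(seq+1, lastIndex+1), true
}

func main() {
	fmt.Println(walName(0, 0)) // 0000000000000000-0000000000000000.wal
	name, cut := nextSegment(segmentSize, 0, 0x20)
	fmt.Println(name, cut) // 0000000000000001-0000000000000021.wal true
}
```

Comparing the write offset rather than the file size matters because etcd preallocates segment files, so a fresh segment can already be that large on disk.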

That's all for the WAL.

