Reading the etcd Source Code (3): WAL
source link: https://jiajunhuang.com/articles/2018_11_24-etcd_source_code_analysis_wal.md.html?amp%3Butm_medium=referral
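Today let's look at the WAL (Write-Ahead Logging). This is a common technique databases use to guarantee durability: before actually modifying the data, a log record is first appended to disk. Because the log is append-only, the writes are sequential rather than random, so write performance stays high. The point is that if a crash happens before the log record reaches disk, the data has certainly not been written either; if a crash happens after the log record is written, the change can still be recovered from the WAL.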
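The append-then-apply idea can be sketched with the Go standard library alone. This is a minimal illustration, not etcd's implementation: `kvStore`, `openStore`, `put`, and `demo` are hypothetical names, the log format is a plain `key=value` line, and keys are assumed not to contain `=` or newlines.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// kvStore is a hypothetical in-memory map protected by a write-ahead log.
type kvStore struct {
	log  *os.File
	data map[string]string
}

// openStore replays any existing log to rebuild the map, then opens the
// log for appending.
func openStore(path string) (*kvStore, error) {
	s := &kvStore{data: make(map[string]string)}
	if f, err := os.Open(path); err == nil {
		sc := bufio.NewScanner(f)
		for sc.Scan() {
			parts := strings.SplitN(sc.Text(), "=", 2)
			if len(parts) == 2 {
				s.data[parts[0]] = parts[1]
			}
		}
		f.Close()
		if err := sc.Err(); err != nil {
			return nil, err
		}
	}
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return nil, err
	}
	s.log = f
	return s, nil
}

// put appends the record and syncs it to disk before touching the map,
// so a change is never visible in memory without being in the log.
func (s *kvStore) put(k, v string) error {
	if _, err := fmt.Fprintf(s.log, "%s=%s\n", k, v); err != nil {
		return err
	}
	if err := s.log.Sync(); err != nil {
		return err
	}
	s.data[k] = v
	return nil
}

// demo writes one key, discards the in-memory state, and recovers the
// value by replaying the log.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "wal-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "store.log")

	s, err := openStore(path)
	if err != nil {
		return "", err
	}
	if err := s.put("foo", "bar"); err != nil {
		return "", err
	}
	s.log.Close() // simulate a restart: the map in s is thrown away

	recovered, err := openStore(path)
	if err != nil {
		return "", err
	}
	defer recovered.log.Close()
	return recovered.data["foo"], nil
}

func main() {
	v, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("recovered foo =", v)
}
```

Syncing on every `put` is the simplest choice for a sketch; a real log would normally batch several records per sync.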
First, let's see what is inside the wal directory:
    $ tree
    .
    ├── decoder.go
    ├── doc.go
    ├── encoder.go
    ├── file_pipeline.go
    ├── file_pipeline_test.go
    ├── metrics.go
    ├── record_test.go
    ├── repair.go
    ├── repair_test.go
    ├── util.go
    ├── wal.go
    ├── wal_bench_test.go
    ├── wal_test.go
    └── walpb
        ├── record.go
        ├── record.pb.go
        └── record.proto

    1 directory, 16 files
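Reading doc.go first tells us the basics of the API: records are appended with w.Save, and a WAL must be closed with w.Close once we are done with it.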
Let's look at the implementation of Save:
    func (w *WAL) Save(st raftpb.HardState, ents []raftpb.Entry) error {
        w.mu.Lock()
        defer w.mu.Unlock()

        // short cut, do not call sync
        if raft.IsEmptyHardState(st) && len(ents) == 0 {
            return nil
        }

        mustSync := raft.MustSync(st, w.state, len(ents))

        // TODO(xiangli): no more reference operator
        for i := range ents {
            if err := w.saveEntry(&ents[i]); err != nil {
                return err
            }
        }
        if err := w.saveState(&st); err != nil {
            return err
        }

        curOff, err := w.tail().Seek(0, io.SeekCurrent)
        if err != nil {
            return err
        }
        if curOff < SegmentSizeBytes {
            if mustSync {
                return w.sync()
            }
            return nil
        }

        return w.cut()
    }
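As the code shows, Save appends the entries and the hard state as records and then calls w.sync when a sync is required; once the current file has reached SegmentSizeBytes, it calls w.cut instead. Here is what w.sync does: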
    func (w *WAL) sync() error {
        if w.encoder != nil {
            if err := w.encoder.flush(); err != nil {
                return err
            }
        }
        start := time.Now()
        err := fileutil.Fdatasync(w.tail().File)

        took := time.Since(start)
        if took > warnSyncDuration {
            if w.lg != nil {
                w.lg.Warn(
                    "slow fdatasync",
                    zap.Duration("took", took),
                    zap.Duration("expected-duration", warnSyncDuration),
                )
            } else {
                plog.Warningf("sync duration of %v, expected less than %v", took, warnSyncDuration)
            }
        }
        walFsyncSec.Observe(took.Seconds())

        return err
    }
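It flushes the encoder and then calls fileutil.Fdatasync. On Linux, fileutil.Fdatasync issues the fdatasync system call; on other platforms it falls back to an fsync-style sync. Either way, the call guarantees that the data has been written to disk before it returns.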
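The flush-then-sync sequence can be sketched with the standard library. This is an illustration under assumptions, not etcd's code: `syncedAppend`, `demo`, and the one-second `warnSyncDuration` threshold are names chosen here, and the portable `(*os.File).Sync` stands in for `fileutil.Fdatasync`.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// warnSyncDuration is an illustrative threshold for reporting slow syncs.
const warnSyncDuration = time.Second

// syncedAppend writes rec into the buffered writer, flushes the buffer
// into the file, and then asks the OS to persist the file contents.
// It returns how long the sync itself took.
func syncedAppend(f *os.File, w *bufio.Writer, rec []byte) (time.Duration, error) {
	if _, err := w.Write(rec); err != nil {
		return 0, err
	}
	// Without Flush the record may still sit in the user-space buffer,
	// and Sync would have nothing to persist.
	if err := w.Flush(); err != nil {
		return 0, err
	}
	start := time.Now()
	err := f.Sync()
	return time.Since(start), err
}

// demo appends one record to a temporary file and reads it back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "sync-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "segment.wal")

	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o600)
	if err != nil {
		return "", err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	took, err := syncedAppend(f, w, []byte("record-1\n"))
	if err != nil {
		return "", err
	}
	if took > warnSyncDuration {
		fmt.Println("slow sync:", took)
	}

	b, err := os.ReadFile(path)
	return string(b), err
}

func main() {
	s, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("file contents: %q\n", s)
}
```

The order matters: flushing moves bytes from the process into the kernel, and only the sync that follows makes them durable.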
Snapshots are handled the same way: write one record, then sync.
    func (w *WAL) SaveSnapshot(e walpb.Snapshot) error {
        b := pbutil.MustMarshal(&e)

        w.mu.Lock()
        defer w.mu.Unlock()

        rec := &walpb.Record{Type: snapshotType, Data: b}
        if err := w.encoder.encode(rec); err != nil {
            return err
        }
        // update enti only when snapshot is ahead of last index
        if w.enti < e.Index {
            w.enti = e.Index
        }
        return w.sync()
    }
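Beyond that, most of the WAL code is about managing multiple WAL files. WAL files are named $seq-$index.wal, and the first file is 0000000000000000-0000000000000000.wal. After that, whenever a file reaches 64 MB, a cut is performed and a new file is started. For example, if the raft index is 20 at the first cut, the new file is named 0000000000000001-0000000000000021.wal. Both fields are zero-padded hexadecimal, so the 20 and 21 in this example should be read as hex values.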
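The naming rule is easy to check with a small sketch. It assumes both fields are formatted with `%016x`, as in the file names above; `walName` and `parseWALName` are modeled on helpers of the same names in util.go, but treat the bodies here as an illustration, not a copy of the source.

```go
package main

import (
	"fmt"
)

// walName builds a file name from a sequence number and a raft index,
// both rendered as 16-digit zero-padded hexadecimal.
func walName(seq, index uint64) string {
	return fmt.Sprintf("%016x-%016x.wal", seq, index)
}

// parseWALName recovers the sequence number and index from a file name.
func parseWALName(name string) (seq, index uint64, err error) {
	_, err = fmt.Sscanf(name, "%016x-%016x.wal", &seq, &index)
	return seq, index, err
}

func main() {
	fmt.Println(walName(0, 0))    // the very first file
	fmt.Println(walName(1, 0x21)) // first cut when the index is 0x20
	fmt.Println(walName(1, 21))   // decimal 21 is rendered as hex 15

	seq, index, err := parseWALName("0000000000000001-0000000000000021.wal")
	fmt.Println(seq, index, err)
}
```

Because the fields are fixed-width and zero-padded, sorting file names lexicographically also sorts them by sequence number, which makes it straightforward to find the segment that contains a given index.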
That's it for the WAL.