

How RavenDB uses gossip protocol to replicate documents efficiently in a cluster
source link: https://ayende.com/blog/194082-B/how-ravendb-uses-gossip-protocol-to-replicate-documents-efficiently-in-a-cluster?Key=d519a137-683b-4cf3-b831-18f225076446&utm_campaign=Feed%3A+AyendeRahien+%28Ayende+%40+Rahien%29

A RavenDB database can reside on multiple nodes in the cluster. RavenDB uses a multi-master protocol to handle writes: any node holding the database is able to accept writes. This is in contrast to databases that use the leader/follower model, where only a single instance can accept writes at any given point in time.
The node that accepted the write is then responsible for disseminating it to the rest of the cluster. This should work even when there are breaks in communication, mind, which makes things more interesting.
Consider the case of a write to node A. Node A will accept the write and then replicate that as part of its normal operations to the other nodes in the cluster.
In other words, we’ll have:
- A –> B
- A –> C
- A –> D
- A –> E
In a distributed system, we need to be prepared for all sorts of strange failures. Consider the case where node A cannot talk to node C, but the other nodes can. In this scenario, we still expect node C to end up with the full data, even though node A cannot send the data to it directly.
The simple solution would be to have each node replicate the data it gets from any source to all of its siblings. However, consider the cost involved:
- A write to node A (a 1KB document) results in 4 replications (4KB).
- Node B will replicate to its 4 siblings (including A, mind), so that's another 4KB.
- Node C will replicate to its 4 siblings, another 4KB.
- Node D will replicate to its 4 siblings, another 4KB.
- Node E will replicate to its 4 siblings, another 4KB.
In other words, in a five-node cluster, a single 1KB write generates 20KB of network traffic, the vast majority of it totally unnecessary.
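The arithmetic behind that figure can be made explicit. This is a small illustrative sketch (the function name and structure are mine, not RavenDB's): under naive flooding, every node forwards each write to all of its siblings.

```python
def naive_flood_traffic(doc_size_kb: int, nodes: int) -> int:
    """Total traffic, in KB, for one write when every node
    re-replicates the document to all of its n-1 siblings."""
    per_node = doc_size_kb * (nodes - 1)  # each node sends to n-1 siblings
    return per_node * nodes               # and all n nodes do this

print(naive_flood_traffic(1, 5))  # 20 (KB), matching the five-node example above
```

The traffic grows quadratically with cluster size, which is exactly why flooding every write is a non-starter.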
There are many gossip algorithms, and they are quite interesting, but they are usually not meant for a continuous stream of updates. They focus on robustness over efficiency.
RavenDB takes the following approach: when a node accepts a write directly from a client, it sends the new write to all its siblings immediately. However, if a node accepts a write from replication, the situation is different. We assume that the node that replicated the document to us will also replicate it to the other nodes in the cluster. As such, we won't initiate replication immediately. Instead, we'll let all the nodes that replicate to us know that we got the new document.
If we don't have any writes on the node, we'll check every 15 seconds whether we have documents that aren't present on our siblings. Remember that the siblings proactively report to us which documents they currently have, so there is no need to chat over the network about that.
In other words, during normal operations, what we’ll have is node A replicating the document to all the other nodes. They’ll each inform the other nodes that they have this document and nothing further needs to be done. However, in the case of a break between node A and node C, the other nodes will realize that they have a document that isn’t on node C, in which case they’ll complete the cycle and send it to node C, healing the gap in the network.
I'm using the term "tell the other nodes what documents we have", but that isn't quite what is actually going on. We use change vectors to track the replication state across the cluster. We don't need to send each individual document write to the other side; instead, we can send a single change vector (a short string) that tells the other side, in one shot, which documents we have. You can read more about change vectors here.
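To make the idea concrete, here is a minimal sketch (not RavenDB's actual implementation) of how comparing change vectors can reveal that a sibling is missing writes. A change vector is modeled here as a map from node id to the highest etag seen from that node; RavenDB's real format and comparison rules are richer.

```python
def missing_writes(ours: dict, theirs: dict) -> bool:
    """True if the sibling's change vector lags ours on any node's etag,
    i.e. the sibling is missing at least one write that we have."""
    return any(theirs.get(node, 0) < etag for node, etag in ours.items())

ours = {"A": 10, "B": 7, "C": 3}
# Sibling never received A's latest writes (e.g. the A -> C link is down):
theirs = {"A": 6, "B": 7, "C": 3}
print(missing_writes(ours, theirs))  # True: sibling lags on node A
```

A single short comparison like this replaces chatting about individual documents, which is what keeps the protocol cheap during normal operations.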
In short, the behavior on the part of the node is simple:
- On document write to the node, replicate the document to all siblings immediately.
- On document replication, notify all siblings about the new change vector.
- Every 15 seconds, replicate to siblings the documents that they missed.
Just these rules give us a sophisticated system in practice: we avoid excessive writes over the network, yet we bypass any errors in the network layer without issue.
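The three rules above can be walked through with a toy simulation (purely illustrative; the class and method names are mine, and rule 2's change-vector notification is elided): node A replicates a client write to B, D, and E, the A to C link is broken, and the periodic check lets a sibling heal the gap.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.docs = set()
        self.siblings = []

    def client_write(self, doc, broken_links=()):
        # Rule 1: a direct client write is replicated to all siblings at once.
        self.docs.add(doc)
        for s in self.siblings:
            if (self.name, s.name) not in broken_links:
                s.docs.add(doc)

    def periodic_check(self):
        # Rule 3: every 15 seconds, send siblings the documents they missed.
        for s in self.siblings:
            s.docs |= self.docs

nodes = {n: Node(n) for n in "ABCDE"}
for n in nodes.values():
    n.siblings = [m for m in nodes.values() if m is not n]

nodes["A"].client_write("doc/1", broken_links={("A", "C")})
print(nodes["C"].docs)  # set(): the A -> C link dropped the write
nodes["B"].periodic_check()
print(nodes["C"].docs)  # {'doc/1'}: node B healed the gap
```

Note that in the steady state (no broken links) the periodic check finds nothing to send, so the healing path costs nothing extra during normal operations.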