Anti-Entropy Protocols

简介:

Anti-Entropy Protocols, Gossips

Anti-entropy, or gossip, is an attractive way of replicating state that does not have strong consistency requirements. 
在不需要强一致性的条件下的(或者说, 只需要达到最终一致性), 一种高容错的, 分布式的一致性同步协议. 
为什么叫Anti-entropy? 很久我都不明白反熵的意思, 借用上面引用的说法

Gossip算法又被称为反熵(Anti-Entropy),熵是物理学上的一个概念,代表杂乱无章,而反熵就是在杂乱无章中寻求一致,这充分说明了 Gossip的特点:在一个有界网络中,每个节点都随机地与其他节点通信,经过一番杂乱无章的通信,最终所有节点的状态都会达成一致。每个节点可能知道所有其他节点,也可能仅知道几个邻居节点,只要这些节可以通过网络连通,最终他们的状态都是一致的,当然这也是疫情传播的特点。

 

Let us start our study with the following problem statement:

There is a set of nodes and each data item is replicated to a subset of nodes. Each node serves update requests even if there is no network connection to other nodes. Each node periodically synchronizes its state with other nodes is such a way that if no updates take place for a long time, all replicas will gradually become consistent. How this synchronization should be organized – when synchronization is triggered, how a peer to synchronize with is chosen, what is the data exchange protocol? Let us assume that two nodes can always merge their versions of data selecting a newest version or preserving both versions for further application-side resolution.

问题是什么?

This problem appears both in data consistency maintenance and in synchronization of a cluster state (propagation of the cluster membership information and so on). Although the problem above can be solved by means of a global coordinator that monitors a database and builds a global synchronization plan or schedule, decentralized databases take advantage of more fault-tolerant approach. The main idea is to use well-studied epidemic protocols [7] that are relatively simple, provide a pretty good convergence time, and can tolerate almost any failures or network partitions. Although there are different classes of epidemic algorithms, we focus on anti-entropy protocols because of their intensive usage in NoSQL databases.

Anti-entropy protocols assume that synchronization is performed by a fixed schedule – every node regularly chooses another node at random or by some rule and exchanges database contents, resolving differences. There are three flavors of anti-entropy protocols: push, pull, and push-pull. The idea of the push protocol is to simply select a random peer and push a current state of data to it. In practice, it is quite silly to push the entire database, so nodes typically work in accordance with the protocol which is depicted in the figure below.

这个问题当然可以通过global coordinator来解决, 但是decentralized设计可以提供more fault-tolerant approach的设计.

其实算法很简单, 就是epidemic protocols, 这儿选了在Nosql中广泛应用的anti-entropy protocols.

image

Push, 问B你有什么和我不同, B告诉我, 我把不同部分push给B

Pull, 告诉B我有什么, B把我没有的发给我

Push-pull, 把上面两个同时结合做了, 图的下面两条线箭头画反了

 

Anti-entropy protocols provide reasonable good convergence time and scalability. The following figure shows simulation results for propagation of an update in the cluster of 100 nodes. On each iteration, each node contacts one randomly selected peer.

image

One can see that the pull style provides better convergence than the push, and this can be proven theoretically [7]. Also, push has a problem with a “convergence tail” when a small percent of nodes remains unaffected during many iterations, although almost all nodes are already touched. The Push-Pull approach greatly improves efficiency in comparison with the original push or pulls techniques, so it is typically used in practice. Anti-entropy is scalable because the average conversion time grows as a logarithmic function of the cluster size.

Although these techniques look pretty simple, there are many studies [5] regarding performance of anti-entropy protocols under different constraints. One can leverage knowledge of the network topology to replace a random peer selection by a more efficient schema [10]; adjust transmit rates or use advanced rules to select data to be synchronized if the network bandwidth is limited [9]. Computation of digest can also be challenging, so a database can maintain a journal of the recent updates to facilitate digests computing.

怎么衡量Anti-entropy protocols, 当然是通过convergence time,  肯定是Push-pull效率最高


本文章摘自博客园,原文发布日期:2013-04-02

目录
相关文章
|
9月前
|
机器学习/深度学习 TensorFlow 算法框架/工具
交叉验证(Cross-Validation)
交叉验证(Cross-Validation)是一种常用的评估机器学习模型性能的技术。它通过将数据集分为训练集和验证集,并多次重复这个过程,以获得对模型性能的更准确估计。
139 2
|
机器学习/深度学习 编解码 算法
Self-Training using Selection Network for Semi-supervised Learning
Self-Training using Selection Network for Semi-supervised Learning
116 0
Self-Training using Selection Network for Semi-supervised Learning
|
SQL PyTorch 算法框架/工具
pytorch损失函数binary_cross_entropy和binary_cross_entropy_with_logits的区别
binary_cross_entropy和binary_cross_entropy_with_logits都是来自torch.nn.functional的函数
1398 0
|
数据可视化 算法 数据挖掘
Evaluation of Clustering|学习笔记
快速学习 Evaluation of Clustering
126 0
Evaluation of Clustering|学习笔记
|
机器学习/深度学习 自然语言处理 数据挖掘
Re7:读论文 FLA/MLAC/FactLaw Learning to Predict Charges for Criminal Cases with Legal Basis
Re7:读论文 FLA/MLAC/FactLaw Learning to Predict Charges for Criminal Cases with Legal Basis
Re7:读论文 FLA/MLAC/FactLaw Learning to Predict Charges for Criminal Cases with Legal Basis
交叉熵初识-cross entropy
交叉熵初识-cross entropy
126 0
交叉熵初识-cross entropy
|
算法
Signalling entropy: A novel network-theoretical fram
摘要 系统生物学的一个关键挑战是阐明决定细胞表型的基本原理或基本定律。了解如何在癌症等疾病中改变这些基本原则对于将基础科学知识转化为临床进展非常重要。虽然正在取得重大进展,但通过系统生物学方法确定了新的药物靶点和治疗方法,我们仍然缺乏基本系统对某些治疗成功和其他治疗失败的理解。
1210 0
|
SQL
CROSS JOIN
原文:CROSS JOIN 最近在讲到T-SQL查询的Join部分时,一下子没有想起来CROSS JOIN的用法,因为其实平常也确实基本不用到。特意找了一个例子,以供参考 CROSS JOIN又称为笛卡尔乘积,实际上是把两个表乘起来。
1290 0