In order to further explore the capabilities and limitations of Go, I thought it would be interesting to try implementing something that was practical, non-trivial, and of interest on its own. With that in mind, we're going to spend the next few posts creating an implementation of the Kademlia DHT in Go.
A DHT, or Distributed Hash Table is a peer-to-peer networking primitive that, in its most basic formulation, permits storage and lookup of key, value pairs - that is, it's a hash table that's distributed across many nodes. The name is not entirely accurate for some newer DHTs, as some formulations permit many other operations besides those focused around data storage and retrieval.
Kademlia is a good example of a basic DHT, because unlike some competing algorithms, it's extremely simple. There are no explicit routing update messages, and the internal state it maintains is fairly straightforward and easy to understand. Lookups are also accomplished in an obvious - yet very efficient - manner. In return for this simplicity, Kademlia sacrifices a few of the features of competitors like Pastry and Chord - it's not as practical to implement other primitives such as pubsub over it.
The reference we're going to use for building our implementation is XLattice's excellent protocol spec. This covers the important facets of implementing a Kademlia DHT, while being easier to read than the original paper (PDF).
The central concept of any DHT is the Node ID. In the case of Kademlia, this is a 160 bit opaque identifier, used to identify nodes on the network, as well as content hashes. Let's define it in Go, using Go's handy ability to extend any arbitrary type:
constIdLength=20 type NodeID[IdLength]byte
In order to create NodeIDs, we need a couple of functions - one to decode them from a hex string, and one to generate a new random ID:
func NewNodeID(data string)(ret NodeID){ decoded, _ := hex.DecodeString(data); for i :=0; i <IdLength; i++{ ret[i]= decoded[i]; } return; } func NewRandomNodeID()(ret NodeID){ for i :=0; i <IdLength; i++{ ret[i]= uint8(rand.Intn(256)); } return; }
This should be fairly self explanatory so far. Note we're using Go's support for named return arguments in order to avoid the need to declare one of our own in both of these functions. We're also making use of theencoding/hex and rand modules from the Go standard library.
We should also define a function to convert a Node ID back into hex, for printing and so forth:
func (node NodeID)String()string{ return hex.EncodeToString(node[0:IdLength]); }
Since this function has the correct signature, our NodeID struct now implements the fmt.Stringer interface, meaning that our String function will be used to convert our NodeID for display any time we use Printf or similar functions on it.
We should also define a couple of common operations on our NodeID - equality and ordering:
func (node NodeID)Equals(other NodeID)bool{ for i :=0; i <IdLength; i++{ if node[i]!= other[i]{ returnfalse; } } returntrue; } func (node NodeID)Less(other interface{})bool{ for i :=0; i <IdLength; i++{ if node[i]!= other.(NodeID)[i]{ return node[i]< other.(NodeID)[i]; } } returnfalse; }
Together, these methods define a well-ordering for NodeIDs. NodeIDs are big-endian, with the most significant byte being the low-order one. This will allow us to sort NodeIDs, which will be important later.
All DHTs rely on having a 'distance metric' - a way of comparing two NodeIDs or hashes to determine how far apart they are. Kademlia uses the 'XOR metric': The distance between two NodeIDs is the XOR of those two NodeIDs, interpreted as a number. For that, we'll need to define an Xor method for our NodeIDs:
func (node NodeID)Xor(other NodeID)(ret NodeID){ for i :=0; i <IdLength; i++{ ret[i]= node[i]^ other[i]; } return; }
Now that we've defined our NodeID thoroughly, we can start on the routing table implementation. To enable efficient traversal through a DHT, a routing table needs to contain a selection of nodes both close to and far away from our own node. Kademlia accomplishes this by breaking up the routing table into 'buckets', with each 'bucket' corresponding to a particular range of distances between the nodes stored in that bucket and ourselves. Bucket 0 contains nodes that differ in the first bit, bucket 1 contains nodes that differ in the second bit, and so forth.
This exponential sizing of buckets has a couple of implications. First, fully half the nodes in the DHT should be expected to end up in bucket 0; half of the remainder in bucket 1, and so forth. This means that we should have a complete set of all the nodes nearest us, gradually getting sparser over increasing distance. This is necessary in order to ensure we can always find data if it exists in the DHT. The second implication of this approach is that the number of the bucket a given node should be placed in is determined by the number of leading 0 bits in the XOR of our node ID with the target node ID, which makes for easy implementation. Let's add a function to our NodeID struct to facilitate this:
func (node NodeID)PrefixLen()(ret int){ for i :=0; i <IdLength; i++{ for j :=0; j <8; j++{ if(node[i]>> uint8(7- j))&0x1!=0{ return i *8+ j; } } } returnIdLength*8-1; }
Now we can begin defining our routing table proper:
constBucketSize=20; type Contactstruct{ id NodeID; } type RoutingTablestruct{ node NodeID; buckets [IdLength*8]*list.List; } func NewRoutingTable(node NodeID)(ret RoutingTable){ for i :=0; i <IdLength*8; i++{ ret.buckets[i]= list.New(); } ret.node = node; return; }
First, we define a Contact struct, which contains the data we need to store in the routing table. For now, this is just the Node ID, but a practical implementation will also need to store some way of contacting the node in question, such as an IP address and port. We'll come back to that in future posts.
Our RoutingTable struct stores the current node's ID, and an array of list instances, one for each bit in the ID. The list package implements doubly-linked lists, and includes convenient methods on the list, such asMoveToFront.. We'll make use of this in our Update method, which takes a Contact and updates the routing table with it:
func (table *RoutingTable)Update(contact *Contact){ prefix_length := contact.id.Xor(table.node.id).PrefixLen(); bucket := table.buckets[prefix_length]; element := iterable.Find(bucket, func(x interface{})bool{ return x.(*Contact).id.Equals(table.node.id); }); if element ==nil{ if bucket.Len()<=BucketSize{ bucket.PushFront(contact); } // TODO: Handle insertion when the list is full by evicting old elements if // they don't respond to a ping. }else{ bucket.MoveToFront(element.(*list.Element)); } }
This function starts by finding the appropriate bucket, then using the iterable module's Find method to locate the contact inside the bucket if it already exists. If it does, it moves the contact to the front of the bucket. This behaviour is an important part of Kademlia's robustness: Nodes that have been around for a long time are more likely to remain in the DHT than new nodes, so Kademlia has a large preference for established nodes. If the node already in the bucket, it's appended to the end if there's room. If there's not, it's simply discarded.
The TODO here is also important: If adding our contact to the bucket would make the bucket too full, we should asynchronously ping the least-recently-seen element in the bucket (the one at the back of the list). If it doesn't respond, we should evict it from the bucket.
Finally, we need a way to query the DHT, returning the closest nodes to a particular hash or ID. We do this by first finding the bucket that the queried ID would fall into and adding that bucket's elements to the values to return. We then work outward from there, adding elements from the buckets on either side, until our list contains at least enough elements or we run out of buckets to check. Then, we sort the list of results, and return those closest to the original ID. Here's the implementation:
type ContactRecordstruct{ node *Contact; sortKey NodeID; } func (rec *ContactRecord)Less(other interface{})bool{ return rec.sortKey.Less(other.(*ContactRecord).sortKey); } func copyToVector(start,end*list.Element, vec *vector.Vector, target NodeID){ for elt := start; elt !=end; elt = elt.Next(){ contact := elt.Value.(*Contact); vec.Push(&ContactRecord{contact, contact.id.Xor(target)}); } } func (table *RoutingTable)FindClosest(target NodeID, count int)(ret *vector.Vector){ ret =new(vector.Vector).Resize(0, count); bucket_num := target.Xor(table.node.id).PrefixLen(); bucket := table.buckets[bucket_num]; copyToVector(bucket.Front(),nil, ret, target); for i :=1;(bucket_num-i >=0|| bucket_num+i <IdLength*8)&& ret.Len()< count; i++{ if bucket_num - i >=0{ bucket = table.buckets[bucket_num - i]; copyToVector(bucket.Front(),nil, ret, target); } if bucket_num + i <IdLength*8{ bucket = table.buckets[bucket_num + i]; copyToVector(bucket.Front(),nil, ret, target); } } sort.Sort(ret); if ret.Len()> count { ret.Cut(count, ret.Len()); } return; }
Note that sorting the vector is a simple matter of calling .Sort() on it; this is thanks to our use of the 'ContactRecord' struct type, which has a Less() method that compares items based on their distance to the target. Also note the conditional call to Cut() to eliminate excess elements; this is conditional due to a current bug in the Go library that causes it to expand a Vector if you call Cut with a start index larger than the current dimensions.
That's it for the implementation of our routing table; you can see the entire code, with some basic unit tests,here. In the next post, we'll go over Go's support for RPCs, and make use of it to begin implementing the RPCs that form the basis of our DHT.
相关推荐
Implementing Cisco IP Telephony and Video, Part 1 (CIPTV1), Foundation Learning Guide(3rd) 英文mobi 第3版 本资源转载自网络,如有侵权,请联系上传者或csdn删除 查看此书详细信息请在美国亚马逊官网搜索...
This will let us cover a fairly broad range of language design and LLVM-specific usage issues, showing and explaining the code for it all along the way, without overwhelming you with tons of details ...
VMware vRealize Automation Handbook Implementing Cloud Management in the Enterprise Environment 英文azw3 本资源转载自网络,如有侵权,请联系上传者或csdn删除 查看此书详细信息请在美国亚马逊官网搜索...
#### 1. 概述 本教程详细介绍了如何在客户修改(CMOD)环境中实施业务添加接口(BAdI),并从中获得BAdI所带来的灵活性,同时避免与CMOD相关的所有限制。 #### 2. SAP 增强功能概述 SAP提供了多种增强功能,用于...
Implementing Cisco IP Telephony and Video, Part 1 (CIPTV1), Foundation Learning Guide(3rd) 英文epub 第3版 本资源转载自网络,如有侵权,请联系上传者或csdn删除 查看此书详细信息请在美国亚马逊官网搜索...
Data Governance is the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archiving and deletion of information....
在微软实施零信任安全模型,Implementing a Zero Trust security model at Microsoft
Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R by V Kishore Ayyadevara Bridge the gap between a high-level understanding of how an algorithm works and...
Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R by V Kishore Ayyadevara Bridge the gap between a high-level understanding of how an algorithm works and...
Implementing Cisco IP Telephony and Video, Part 1 (CIPTV1), Foundation Learning Guide(3rd) 英文无水印原版pdf 第3版 pdf所有页面使用FoxitReader、PDF-XChangeViewer、SumatraPDF和Firefox测试都可以打开 ...
本书《70-463 Implementing a Data Warehouse with Microsoft SQL Server 2012》是一本针对数据仓库考试的认证书籍,专门为希望通过70-463认证考试的读者准备。该认证考试是针对Microsoft SQL Server 2012中的数据...
VMware vRealize Automation Handbook Implementing Cloud Management in the Enterprise Environment 英文epub 本资源转载自网络,如有侵权,请联系上传者或csdn删除 查看此书详细信息请在美国亚马逊官网搜索...
1. **配置Kerberos基础设施**:这一步骤通常包括设置Kerberos域、创建Kerberos服务主体、配置Kerberos密钥库等。 2. **集成Kerberos与WebSphere**:需要在WebSphere应用服务器上配置Kerberos相关的安全性策略,...
system, without written permission from the publisher, except for the inclusion of brief quotations in a review. Printed in the United States of America First Printing June 2010 Library of Congress ...
《Implementing Useful Algorithms in C++》是Dmytro Kedyk撰写的一本关于C++编程中实现实用算法的书籍。本书旨在帮助读者深入理解如何在C++中有效地编写和优化算法,提高编程技能。 首先,书中提到算法设计的基本...
A class for implementing a thread with a message pump on it. There is an example derived class and an example MFC application. The class itself does not require MFC执行一个弹出消息的线程
TheAlgorithms/C++ 1.0.0 - All the algorithms implemented in C++ # 概述 - 这是一个开源实现的集合,包含各种用C++实现的算法,并在MIT许可证下授权。这些算法涵盖了计算机科学、数学和统计学、数据科学、机器...
Implementing OpenShift will walk the reader through how to easily develop and deploy upon an open source OpenShift Platform-as-a-Service. We will then discuss the architecture of the platform so that ...
Welcome to Planning, Implementing, and Maintaining a Microsoft Windows Server 2003 Active Directory Infrastructure (70-294), a part of the Microsoft Official Academic Course (MOAC) series. Through ...
T he median filter is a popular image processing technique for removing salt and pepper (“shot”) noise from images. With this technique, the eight direct neigh- bors and center point of a sliding 3-...