forked from abrosen/thesis
-
Notifications
You must be signed in to change notification settings - Fork 0
/
intro.tex
433 lines (310 loc) · 29.6 KB
/
intro.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
\chapter{Introduction}
\label{chapter:intro}
% % % layout
% % % Distributed Computing Challenges
% % % Qualities of DHTs
% % % Hypothesis = These Problems + These qualiteies -> solution
% % % Framework of what these solutions are and what they can do
Distributed Hash Tables (DHTs) are protocols and frameworks used by peer-to-peer (P2P) systems.
They are used as the organizational backbone for many P2P file-sharing systems due to their scalability, fault-tolerance, and load-balancing properties.
These same properties are highly desirable in a distributed computing environment, especially one that wants to use heterogeneous components.
We will show that DHTs can be used not only as the framework to build a P2P file-sharing service, but a more generic distributed computing platform.
% What do I want to do?
\section{Objective}
Our goal is to create a framework to further generalize Distributed Hash Tables (DHTs) to be used for distributed computing.
Distributed computing platforms need to be scalable, fault-tolerant, and load-balancing.
%The ability to incorporate heterogeneous hardware is a definite benefit.
We will discuss what each of these mean and why they are important in section \ref{sec:challenges}, but briefly:
\begin{itemize}
\item The system should be able to work effectively no matter how large it gets.
As the system grows in size, we can expect the overhead to grow in size as well, but at an extremely slower rate.
\item The more machines integrated into the system, the more we can expect to see hardware failures.
The system needs to be able to automatically handle these hardware failures.
\item Having a large number of machines to use is worthless if the amount of work is divided unevenly among the system.
The same is true if the system hands out larger jobs to less powerful machines or smaller jobs to the more powerful machines.
%\item We cannot assume that we be able to replace our broken machines with exact replicas, nor do we assume we would want to.
\end{itemize}
These are many of the same challenges that Peer-to-peer (P2P) file sharing applications have.
Many P2P applications use DHTs to address these challenges, since DHTs are designed with these problems in mind.
We propose that DHTs can be used to create P2P distributed computing platforms that are completely decentralized.
%Rather than keys being assigned to some data, we can assign keys to tasks and automatically distribute those tasks to the responsible nodes
There would be no need for some central organizer or scheduler to coordinate the nodes in the network.
Our framework would not be limited to only a P2P context, but could be applied in data centers, a normally centrally organized context.
A successful DHT-based computing platform would need to address the problem of dynamic load-balancing.
This is currently an unsolved\footnote{As far as we know.} problem. %I have to check a couple of papers of to confirm.
If an application can dynamically reassign work to nodes added at runtime, this opens up new options for resource management.
Similarly. if a distributed computation is running too slow, new nodes can be added to the network during runtime or idle nodes can boot up more virtual nodes. %(now that I think of it this is two different but highly related problems: internal and external).
%Move this bit to the end?
Chapter \ref{chapter:background} will delve into how DHTs work and examine specific DHTs.
The remainder of the dissertation will then discuss the work we have completed and plan on doing to demonstrate the viability of using DHTs for distributed computing and other non-traditional tasks.
% How I will do it is in the experiments chapter
% Why should you care?
\section{Applications of Distributed Hash Tables}
Distributed Hash Tables have been used in numerous applications:
\begin{itemize}
\item \textit{P2P file sharing} is by far the most prominent use of DHTs.
The most well-known application is BitTorrent \cite{bittorrent}, which is built on Mainline DHT \cite{mainline}.
\item DHTs have been used for \textit{distributed storage} systems \cite{CFS}.
\item \textit{Distributed Domain Name Systems} (DNS) have been built upon DHTs \cite{cox2002serving} \cite{pappas2006comparative}.
Distributed DNSs are much more robust that DNS to orchestrated attacks, but otherwise require more overhead.
%\item Distributed search (faroo)
\item DHT was used as the name resolution layer of a large \textit{distributed database} \cite{Mateescu2011440}.
\item Distributed \textit{machine learning} \cite{liparameter}.
\item Many \textit{botnets} are now P2P based and built using well established DHTs \cite{saad2011detecting}.
This is because the decentralized nature of P2P systems means there is no single vulnerable location in the botnet.
\item \textit{Live video streaming} (BitTorrent Live) \cite{mol2009design}.
\end{itemize}
We can see from this list that DHTs are primarily used in P2P applications, but other applications, such as botnets, use DHTs for their decentralization.
We want to use DHTs primarily for their intuitive way of organizing a distributed system.
Our goal was to further extend the use of DHTs.
In previous work \cite{chordreduce}, we showed that a DHT can be to create a distributed computing framework.
We used the same mechanism used in P2P applications that assigns nodes their location in the network to evenly distribute work among members of a DHT.
The most direct application of a DHT distributed computing framework is a quick and intuitive way to solve embarrassingly parallel problems, such as:
\begin{itemize}
\item Brute force cryptography.
\item Genetic algorithms.
\item Markov chain Monte Carlo methods.
\item Random forests.
\item Any problem that could be phrased as a MapReduce problem.
\end{itemize}
Unlike the current distributed applications that utilize DHTs, we want to create a complete framework that can be used to build decentralized applications.
We have found no existing projects that provide a means of building your own DHT or DHT based applications. %without a given DHT in mind at least
% So you're diong these things with this tool
\section{Why Use Distributed Hash Tables in Distributed Computing}
% Okay so this is all great, but what's special about a DHT
% First, let's talk about the problems with distributed computing
Using Distributed Hash Tables for distributed computing is not necessarily the most intuitive step.
To understand why we want to use DHTs for distributed computing, we will first examine some of the more prominent challenges in distributed computing.
\subsection{General Challenges of Distributed Computing}
\label{sec:challenges}
As we mentioned earlier, distributed computing platforms need to be scalable, fault-tolerant, and load-balancing.
We will look at these individually:
\begin{description}
\item[Scalability] - Distributed computing platforms should not be completely static and should grow to accommodate new needs.
However, as systems grow in size, the cost of keeping that system organized grows too.
The challenge of scalability is designing a protocol that grows this organizational cost at an extremely slow rate.
For example, a single node keeping track of all members of the system might be a tenable situation up to a certain point, but eventually, the cost becomes too high for a single node.
%not sure about this line
We want this organizational cost spread among many nodes to the point where this cost is insignificant. %
\item[Fault Tolerance]
The quality of fault-tolerance or \textit{robustness} means that the system still works even after a component breaks (or many components break).
We want our platform to gracefully handle failures during runtime and be able to quickly reassign work to other workers.
In addition, the network should be equally graceful in handling the introduction of new nodes during runtime.
\item[Load-Balancing]
The challenge of load balancing is to evenly distribute the work among nodes in the network.
This is always an approximation; rarely are there exactly enough pieces for every node to get the same amount of work.
The system needs an efficient set of rules for dividing arbitrary jobs into small pieces and sending those pieces to the nodes, without incurring a large overhead.
A subproblem here is handling \textit{heterogeneity},\footnote{It could even be considered a problem in its own right.} or how should the system should handle different pieces of hardware with different amounts of computational power.
\end{description}
Note that there is some crossover between these categories.
For example, adding new nodes to the system needs to have a low organizational overhead (scalability) and will change the network configuration, which will need to be updated (fault-tolerance).
%move below
%One question we are particularly interested in answering touches on all three categories: can we do load balancing during run-time?
%A goal we have is t
\subsection{How DHTs Address these Challenges}
%Without getting into the details of what a DHT is, what do they do?
Distributed Hash Tables are essentially distributed lookup tables.
DHTs use a consistent hashing algorithm, such as SHA-1 \cite{sha1}, to associate nodes and file identifiers with keys.
These keys dictate where the nodes and files will be located on the network.
The connections between nodes are organized such that any node can efficiently lookup the value associated with any given key, even though the node only knows a small portion of the network.
We discuss the specifics of this in Chapter \ref{chapter:background}.
Nearly every DHT was designed with large P2P applications in mind, with millions of nodes in the network and new nodes entering and leaving continuously.
%This has lead to DHTs being designed with specific qualities in mind.
\paragraph{Scalability}
The organizational responsibility in DHTs is spread among all members of the network.
Each node only knows a small subset of the network,\footnote{Except for ZHT \cite{li2013zht}, which breaks this rule deliberately by giving each node a full copy of the routing table.} but can use the nodes it knows to efficiently find any other node in the network.
Because each individual node only knows a small part of the network, the maintenance costs associated with organization are correspondingly small.
Using consistent hashing allows the network to scale up incrementally, adding one node at a time \cite{dynamo}.
In addition, each join operation has minimal impact on the network, since a node affects only its immediate neighbors on a join operation.
Similarly, the only nodes that need to react to a node leaving are its neighbors.
Other nodes can be notified of the missing node passively through maintenance or in response to a lookup.
There have been multiple proposed strategies for tackling scalability, and it is these strategies that play the greatest role in driving the variety of DHT architectures.
Each DHT must strike a balance between the size of the lookup table and lookup time.
The vast majority of DHTs choose to use $\lg(n)$ sized tables and $\lg(n)$ hops, where $ n $ is the number of nodes in the network.
Chapter \ref{chapter:background} discusses these tradeoffs in greater detail and how they affect the each DHT.
\paragraph{Fault-Tolerance}
One of the most important assumptions of DHTs is that they are deployed on a constantly changing network.
DHTs are built to account for a high level of \textit{churn}.\footnote{Again, except for ZHT.}
\textit{Churn} is the disruption of routing caused by the constant joining and leaving of nodes.
In other words, the network topology is assumed to always be in flux.
This is mitigated by a few factors.
First, the network is decentralized, with no single node acting as a single point of failure.
This is accomplished by each node in the routing table having a small portion of the both the routing table and the data stored on the DHT.
Second is that each DHT has an inexpensive maintenance processes that mitigates the damage caused by churn.
DHTs often integrate a backup process into their protocols so that when a node goes down, one of the neighboring nodes can immediately assume responsibility.
The join process also slightly disrupts the topology, as affected nodes must adjust their the list of peers they know to accommodate the joiner.
The last property is that the hashing algorithm used to distribute content evenly across the DHT also distributes nodes evenly across the DHT.
This means that nodes in the same geographic region occupy vastly different locations in the network.
If an entire geographic region is affected by a network outage, this damage is spread evenly across the DHT, and can be handled, rather than if a contiguous portion were lost.
%This property is the most important, as it deals with failure of entire sections of the network, rather than a single node.
%Recent research in using DHTs for High End Computing \cite{li2013zht} shows what can happen if we remove this assumption by working with an almost completely static network.
The fault tolerance mechanisms in DHTs also provide near constant availability for P2P applications.
The node that is responsible for a particular key can always be found, even when numerous failures or joins occur \cite{chord}.
\paragraph{Load-Balancing}
Consistent hashing is also used to ensure load-balancing in DHTs.
Consistent hashing algorithms associate nodes and file identifiers with keys.
These keys are generated by passing the identifiers into a hash function, typically SHA-160.
The chosen hash function is typically large enough to avoid hash collisions\footnote{A hash collision occurs when the hashing algorithm outputs the same key for two different inputs. This is so exceedingly rare, that most authors working on DHTs do not even account for this possibility.} and generates keys in a uniform manner.
Essentially, both nodes and data are spread about the network uniformly at random.
Nodes are responsible for the files with keys ``close'' to their own.
What ``close'' means depends on the specific implementation.
For example, ``close'' might mean ``closest without going over'' or ``shortest distance.''
We found defining the meaning of ``close'' equivalent choosing a metric for Voronoi tessellation \cite{vhash}.
However, because this is a random process, not all values are evenly distributed, but enough hash keys yield a close enough approximation.
Heterogenity presents a challenge for load-balancing DHTs due to conflicting assumptions and goals.
DHTs assume that members are usually going to be varied in hardware, but the load-balancing process defined in DHTs treats each node equally.
In other words, DHTs support heterogeneity, but do not attempt to exploit it.
This does not mean that heterogeneity cannot be exploited.
Nodes can be given addition responsibilities manually, by running multiple instances of the P2P application on the same machine or creating more virtual nodes.
We will take advantage of this for distributing the workload automatically.
%However, this is not a feasible option for any kind of truly decentralized system and would need to be done automatically.
%There is no well-known mechanism to that exists to automatically allocate virtual nodes on the fly \footnote{citation needed, although this can be similar to IRM}.
\section{Outline}
In this section, we give a brief overall summary overview of our work, followed by a summary of each individual chapter in the dissertation.
We also provide a list of publications for this work.
\subsection{Summary of Dissertation}
%chord reduce
One of our first projects was to create a distributed computing platform using the Chord DHT \cite{chordreduce}.
Our goal here was to create a completely decentralized distributed computing framework that was fault-tolerant during job execution.
We did this by implementing MapReduce over Chord.
We then tested our prototype's fault-tolerance by executing MapReduce jobs under churn.
Our experiments with excessively high levels of churn created an anomaly in the runtime of our computations.
Under beyond practical levels of experimental churn, we found that our computation was quicker than our experiments without churn.
We hypothesized that this is because the random churn is acting as a (inefficient) process for autonomous load-balancing.
This phenomena is described in detail in Chapter \ref{chapter:auto-balance}, but suggested to us that there was a way to dynamically load-balance during execution.
Our second project was to develop VHash \cite{dgvh} \cite{vhash}, a distributed hash table based on Delaunay Triangulation.
VHash is unique due to the way it could work in multidimensional spaces.
Other DHTs typically use a space with a single dimension and optimize for the number of hops.
VHash can optimize for whatever attributes are used to define the space.
Our experiments showed that VHash outperforms Chord in terms of routing latency.
VHash eventually morphed into a more generic DHT, UrDHT \cite{urdht}, which is the subject of Chapter \ref{chapter:urdht}.
Our third project which analyzed the amount of effort that would be required to attack a DHT using a method known as the Sybil attack \cite{sybil-analysis}.
The Sybil attack \cite{sybil} is a well known attack against distributed systems, but it had not been fully analyzed from the perspective of an attacker.
Our results showed that attackers required relatively few resources to compromise a much larger network.
We believe that some of the components that are used to perform a Sybil attack can be used for autonomous load balancing.
Finally, we studied how we could use churn and intentional injection of benign Sybil nodes into the network to speed up execution of a distributed process or dynamically load-balance a network during runtime.
This process could be done completely autonomously by each node without any control from a centralized source.
In other words, we have developed completely decentralized strategies that a DHT can use to load-balance the network.
\subsection{Chapter Summaries}
The dissertation is divided into distinct, but mutually dependent parts.
Chapter \ref{chapter:background} covers the requisite background material.
It is also suitable as a primer on Distributed Hash Tables.
Each subsequent chapter, summarized below, covers a specific publication.
Chapter \ref{chapter:conclusion}, closes with remarks on our work.
\subsubsection{ChordReduce - DHT Distributed Computing}
We present our first project, ChordReduce \cite{chordreduce}, in Chapter \ref{chapter:chordreduce}.
ChordReduce utilized Chord \cite{chord} to create a distributed computing platform using the MapReduce \cite{mapreduce} platform.
The novel contribution of ChordReduce was it's ability to perform completely decentralized MapReduce operations in either a P2P environment or a datacenter.
Using our created framework, we will create implement and test distributed computing problems on different DHT implementations, such as Chord \cite{chord} and Kademlia \cite{kademlia}.
\subsubsection{DGVH}
Chapter \ref{chapter:dgvh} covers the origin of our Distributed Greedy Voronoi Heuristic (DGVH).
This provides an efficient way to create (with some error) Voronoi Tessellations and the corresponding Delaunay Triangulation.
While DGVH produces only approximates the solution for Voronoi Tessellations, it can do so in any geometric space with distance function in any number of dimensions.
In addition, the computational complexity is independent of the number of dimensions and can be performed locally at each node, rather than requiring global knowledge of the network's state.
This directly leads into Chapter \ref{chapter:urdht}, which using DGVH to abstract DHTs.
\subsubsection{UrDHT -- an Abstract DHT Framework}
In Chapter \ref{chapter:urdht}, we found that DHTs can be mapped to the constructs of Delaunay Triangulation and Voronoi Tessellation.
Thus, creating the topology of a DHT can be done by solving for the appropriate Delaunay Triangulation.
UrDHT is an open source project for creating DHTs using this principle.
%significance is abstractions and our Voronoi definitations
\subsubsection{Distributed Decentralized Domain Name Service}
D$^{3}$NS (Chapter \ref{chapter:d3ns}) is a system to replace the current top level DNS system and certificate authorities, offering increased scalability, security and robustness.
D$^{3}$NS is based on a distributed hash table and utilizes a domain name ownership system based on the Bitcoin blockchain \cite{bitcoin} and addresses previous criticism that a DHT would not suffice as a DNS replacement.
Our system provides solutions to current DNS vulnerabilities such as DDOS attacks, DNS spoofing and censorship by local governments.
It eliminates the need for certificate authorities by providing a decentralized authenticated record of domain name ownership.
Unlike many other DNS replacement proposals, D$^{3}$NS is reverse compatible with DNS and allows for incremental implementation within the current system.
\subsubsection{Sybil Analysis}
Chapter \ref{chapter:sybil} analyses the cost of performing a Sybil attack from the perspective of an adversary.
Previous analyses have all focused on defending from the aforementioned adversary, but little to no work has been performed quantifying the attacker's capabilities.
Our analysis places reasonable constraints on the attacker and evaluates the resources the attacker needs to effectively own a target network.
\subsubsection{Autonomous Load-Balancing}
Chapter \ref{chapter:auto-balance} presents strategies for nodes to balance the workload among members of the DHT.
Load balancing schemes do exist for file storage, but none exist for computation.
What makes our strategies completely different is that they can be executed at runtime.
Furthermore, we wanted to develop a system that takes into account the heterogeneity of a given system, allowing more powerful nodes to take on more responsibility.
Our strategies use churn and a benign variation of the Sybil attack to perform a decentralized redistribution of work.
\subsection{Publications}
\begin{itemize}
\item Andrew Rosen, Brendan Benshoof, Robert W. Harrison, Anu G. Bourgeois ``MapReduce on a Chord Distributed Hash Table'' Poster at IPDPS 2014 PhD Forum \cite{chordreduce}
\item Andrew Rosen, Brendan Benshoof, Robert W. Harrison, Anu G. Bourgeois ``MapReduce on a Chord Distributed Hash Table'' Presentation ICA CON 2014
\item Brendan Benshoof, Andrew Rosen, Anu G. Bourgeois, Robert W. Harrison ``VHASH: Spatial DHT based on Voronoi Tessellation'' Short Paper ICA CON 2014 \cite{vhash}
\item Brendan Benshoof, Andrew Rosen, Anu G. Bourgeois, Robert W. Harrison ``VHASH: Spatial DHT based on Voronoi Tessellation'' Poster ICA CON 2014
\item Brendan Benshoof, Andrew Rosen, Anu G. Bourgeois, Robert W. Harrison ``A Distributed Greedy Heuristic for Computing Voronoi Tessellations With Applications Towards Peer-to-Peer Networks'' IEEE IPDPS 2015 - Workshop on Dependable Parallel, Distributed and Network-Centric Systems \cite{dgvh}
\item Brendan Benshoof, Andrew Rosen, Anu G. Bourgeois, Robert W. Harrison
``Distributed Decentralized Domain Name Service''
IEEE IPDPS 2016 - Workshop on Dependable Parallel, Distributed and Network-Centric Systems
\end{itemize}
The following papers are in progress:
\begin{itemize}
\item Brendan Benshoof, Andrew Rosen, Anu G. Bourgeois, Robert W. Harrison ``UrDHT: A Generalized DHT''
\item Andrew Rosen, Brendan Benshoof, Robert W. Harrison, Anu G. Bourgeois ``The Sybil Attack on Peer-to-Peer Networks From the Attacker's Perspective''
\item Andrew Rosen, Brendan Benshoof, Robert W. Harrison, Anu G. Bourgeois ``Autonomous Load Balancing in Distributed Hash Tables Using Churn and The Sybil Attack''
\item Chaoyang Li, Andrew Rosen, Anu G. Bourgeois ``On Minimum Camera Set Problem in Camera Sensor Networks''
\end{itemize}
Below are publications with other authors not relevant to the work discussed in this dissertation.
\begin{itemize}
\item Erin-Elizabeth A. Durham, Andrew Rosen, Robert W. Harrison
``A Model Architecture for Big Data applications using Relational Databases''
2014 IEEE BigData - C4BD2014 - Workshop on Complexity for Big Data \cite{durham2014model}
\item Chinua Umoja, J.T. Torrance, Erin-Elizabeth A. Durham, Andrew Rosen, Dr. Robert Harrison
``A Novel Approach to Determine Docking Locations Using Fuzzy Logic and Shape Determination''
2014 IEEE BigData - Poster and Short Paper \cite{umoja2014novel}
\item Erin-Elizabeth A. Durham, Andrew Rosen, Robert W. Harrison
``Optimization of Relational Database Usage Involving Big Data''
IEEE SSCI 2014 - CIDM 2014 - The IEEE Symposium Series on Computational Intelligence and Data Mining \cite{durham2014optimization}
\end{itemize}
%\section{Old stuff starts here}
%Distributed Hash Tables (DHTs) are traditionally used as the backbone of structured Peer-to-Peer (P2P) file-sharing applications.
%The largest such application is Bittorrent \cite{bittorrent}, which is built using Mainline DHT \cite{mainline}, a derivative of Kademlia \cite{kademlia}.
%The number of users on Bittorrent ranges from 15 million to 27 million users daily, with a turnover of 10 million users a day \cite{mainlineMeasure}.
%Most research on DHTs assumes that DHTs will be used in the context of a large P2P file-sharing application (or at least, an application \textit{potentially} incorporating millions of nodes).
%This leads the DHT to having particular qualities.
%The application must be scalable and no single node knows every other node in the network.
%The network must be able to handle members joining and leaving arbitrarily.
%The resulting application must be agnostic towards hardware.
%The network must be decentralized and split whatever burden there is equally among its members.
%In other words, distributed hash tables provide scalability, fault-tolerance, and load-balancing to an application.
%Recent applications have leveraged these qualities, since these qualities are desirable in many different frameworks.
%For example, one paper \cite{Mateescu2011440} used a DHT as the name resolution layer of a large distributed database.
%Research has also been done in using DHTs as an organizing mechanism in distributed machine learning \cite{liparameter}.
%e describe each of the aforementioned qualities and their ramifications below in sections \ref{subsec:ft}, \ref{subsec:lb}, \ref{subsec:scalability}, and \ref{subsec:hetero} .
% While these properties are individually enumerated, they are greatly intertwined and the division between their impacts can be somewhat arbitrary.
%\subsubsection{Load Balancing}
%\label{subsec:lb}
%This appears to be a weakness, but can be turned into an advantage in heterogeneous systems by using \textit{virtual nodes} \cite{dynamo} \cite{godfrey2005heterogeneity} .
%When a node joins the network, it joins not at one position, but multiple virtual positions in the network \cite{dynamo}.
%Using virtual nodes allows load-balance optimization in a heterogeneous network; more powerful machines can create more virtual nodes and handle more of the overall responsibility in the network.
%Load balancing is still an active area of research for DHTs.
%DeCandia et al\. discussed various load balancing techniques that were tested on Dynamo \cite{dynamo}.
%Each node was assigned a certain number of tokens and the node would create a virtual node for each token.
%The realization DeCandia et al\. had was that there was no reason to use the same scheme for data partitioning and data placement.
%DeCandia et al\. introduced two new strategies which work off assigning nodes equally sized partitions.
%While most DHTs follow this scheme, there is some minor variation on how keys are generated.
%Some DHTs can or do use geographic information to generate keys.
%In VHash, keys are not static, but move according to a simplified spring model.
%\paragraph{Heterogeneity}
%\label{subsec:hetero}
%A few options present themselves. % and are discuessed in \ref{}
%add the above if the below is moved
%This might be the wrong place for this
%One is to use adapt a request tracking mechanism, such as what is used in IRM, except instead of tracking file requests, it tracks requests that are directed to a particular (real) node.
%If a particular (real) node receives an inordinate amount of requests, the node doing the detecting suggests that the node obtain another token/create another virtual node.
%Another strategy is to use the preference lists/successor predecessor lists, and observe the distribution of the workload, adjusting the virtual nodes based on that.
%Dynamic load balancing may not be essential to P2P file-sharing applications, but is absolutely essential to any kind of P2P distributed computation.
%In our ChordReduce experiments, we observed that just approximating dynamic load-balancing by simulating high levels of churn noticeably improved results\footnote{We found this by accident, just by testing the network's fault tolerance in regards to a high level of churn}.
%\section{Different or subproblem: Certain DHTs are better at one application than another due to differences}
%\subsection{Design Differences Impacts}
%\subsection{Geometries}
%\subsection{Routing Table Construction}
%\subsection{Implementation Differences Impacts}
%\paragraph{Recursive or iterative seek}
%Add these bullets to the above paragraph
%\begin{itemize}
% \item DHTs can use consistent hashing supplemented by virtual nodes to efficiently load-balance.
% The larger the network grows, the more evenly distributed the load becomes.
% \item DHTs are highly resilient to damage and can handle abnormally high rates of disruption.
% This is extremely desirable in any kind of distributed application %
% \item Large-scale P2P file sharing applications have been using DHTs for a long time and
% \item DHTs are extremely good if your problem is embarrassingly par
% \item Heterogeneity
%\end{itemize}