Accountability for Distributed Systems


Motivation: Faults in distributed systems

Finding faults in distributed systems is difficult. Imagine a large-scale distributed system, like the Internet's routing infrastructure or the international banking network. A system like this can consist of thousands of nodes, run by hundreds of different organizations. Suppose we suspect that some of the nodes may have become faulty - maybe the hardware is malfunctioning, or the nodes have been misconfigured, or a hacker has compromised them and has changed the software they are running. However, we do not know which nodes are faulty, or what the symptoms of the fault are. How can we deal with such a situation?

There are three important problems we must solve:

Our approach: Accountability

We are exploring accountability as a new approach to this problem. In an accountable system, all actions - such as the transmission of a message - are cryptographically linked to the node that performs them, and the system maintains a secure record of past actions. Nodes can use this record to check each other's actions for correctness. Using this approach, it is possible to guarantee that any fault that is observable by a correct node is eventually detected. Moreover, at least one node obtains irrefutable evidence that is linked to a faulty node. This evidence can then be used to convince other nodes (to isolate the faulty node) or a human administrator (to repair the node). At the same time, a correct node can always defend itself against false accusations.
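The idea of a secure record that other nodes can check can be illustrated with a hash-chained log. The following is only a minimal sketch under stated assumptions, not the project's actual implementation: the names `AccountableLog`, `entry_hash`, and `audit` are hypothetical, and an HMAC with a shared key stands in for the public-key signature a real system would need. A shared-key MAC does not by itself yield evidence that convinces a third party. The sketch also checks only that the record is intact, not that the logged actions were correct.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(node_id, seq, action, prev_hash):
    """Hash an entry together with the hash of its predecessor."""
    payload = json.dumps([node_id, seq, action, prev_hash]).encode()
    return hashlib.sha256(payload).hexdigest()


class AccountableLog:
    """Hypothetical append-only log that binds each action to one node."""

    def __init__(self, node_id, key):
        self.node_id = node_id
        self._key = key  # assumption: stands in for a private signing key
        self.entries = []

    def append(self, action):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        seq = len(self.entries)
        h = entry_hash(self.node_id, seq, action, prev_hash)
        # The authenticator links the entry (and, through the chain,
        # all earlier entries) to the node that holds the key.
        auth = hmac.new(self._key, h.encode(), hashlib.sha256).hexdigest()
        entry = {"seq": seq, "action": action, "prev_hash": prev_hash,
                 "hash": h, "authenticator": auth}
        self.entries.append(entry)
        return entry


def audit(entries, node_id, key):
    """Return the index of the first inconsistent entry, or None if intact."""
    prev_hash = GENESIS
    for i, e in enumerate(entries):
        expected = entry_hash(node_id, i, e["action"], prev_hash)
        if e["seq"] != i or e["prev_hash"] != prev_hash or e["hash"] != expected:
            return i
        expected_auth = hmac.new(key, expected.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(e["authenticator"], expected_auth):
            return i
        prev_hash = expected
    return None
```

In this sketch, changing any logged action after the fact breaks the chain at that entry, so an auditor that replays the hashes notices the modification. Checking the logged actions themselves against the node's expected behavior would be a separate step.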

Accountability has both technical and non-technical benefits. On the technical side, accountability provides reliable information about faults, which can be used to build more resilient systems. Because accountability works for such a general class of faults, it can be used as a "safety net" that works even when other techniques fail. On the non-technical side, the ability to inspect a principal's record helps to build trust and reputation. Also, the very presence of an accountability system discourages some types of faults, such as rational/selfish behavior, since every fault that is observable by a correct node is guaranteed to be detected eventually.


We are working on several projects related to accountability:


Latest PeerReview source code (v1.0.8):

We are currently adding PeerReview support to FreePastry.





Andreas Haeberlen
Rodrigo Rodrigues
Petr Kuznetsov
Peter Druschel
