Introduction to Distributed Systems and Consensus

    Intro lecture [slides] [video]

Proving the Correctness of Multiprocess Programs1977 – Leslie Lamport’s seminal paper describing liveness and safety properties of distributed systems; includes informal and formal proofs of correctness. – 19 pages

Impossibility of Distributed Consensus with One Faulty Process1983 – Fischer-Lynch-Paterson impossibility result. “The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the ‘Byzantine Generals’ problem.” – 9 pages

The Byzantine Generals Problem1982 – Lamport, Shostak, and Pease. Complete description of BGP, its implications upon distributed systems, and solutions. – 20 pages

Byzantine Fault Tolerance Video2016 – Simple and precise explanation of the byzantine generals problem. How many byzantine node failures can a system survive, and how can you build such a system? This is part of a great online series by Chris Colohan. – 26 minutes

CAP Theorem Video2010 – Consistency Availability Partition tolerance (CAP) Theorem proof; this theorem states that only two of these three promises of distributed systems can be satisfied at once – 5 minutes

FLP and CAP aren’t the same thing2012 – Blog post exploring differences between FLP and CAP theorems. – 11 minute read

Proof of Stake FAQ – Guided FAQ, best read in order; superb explanation of CAP, FLP, Nakamoto vs BFT PoS, common attacks, Casper and other variants – 42 minute read

Proof of Stake: How I Learned to Love Weak Subjectivity2014 – How Vitalik’s Slasher solves long-range attacks with exponential subjective scoring and weak subjectivity, along with PoS basics. – 18 minute read

BFT Algorithms

Practical Byzantine Fault Tolerance1999 – “This paper describes a new replication algorithm that is able to tolerate Byzantine faults… Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude.” – 14 pages

The Part-Time Parliament1998 – Original Paxos paper by Leslie Lamport. See his comment on it here. Abstract: “Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistent copies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament’s protocol provides a new way of implementing the state-machine approach to the design of distributed systems.” – 33 pages

Paxos videoJuly 2017 – Chris Colohan’s explanatory presentation of the Paxos algorithm. – 33 minutes

In Search of an Understandable Consensus Algorithm – “Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to (multi-)Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems.” – 18 pages

Federated Consensus

Ripple Consensus Whitepaper2014 – “While several consensus algorithms exist for the Byzantine Generals Problem, specifically as it pertains to distributed payment systems, many suffer from high latency induced by the requirement that all nodes within the network communicate synchronously … we present a novel consensus algorithm that circumvents this requirement by utilizing collectively-trusted subnetworks within the larger network. We show that the “trust” required of these subnetworks is in fact minimal and can be further reduced with principled choice of the member nodes. In addition, we show that minimal connectivity is required to maintain agreement throughout the whole network. The result is a low-latency consensus algorithm which still maintains robustness in the face of Byzantine failures.” – 8 pages

Ripple FAQA short faq on the Ripple set explaining what a validator is, what a validator set is and other questions. – 3 minute read

Stellar Consensus Protocol – “This paper introduces a new model for consensus called federated Byzantine agreement (FBA). FBA achieves robustness through quorum slices—individual trust decisions made by each node that together determine system-level quorums. Slices bind the system together much the way individual networks’ peering and transit decisions now unify the Internet…” – 32 pages

On Worldwide Consensus2015 – Succinct summary of the Stellar Consensus Protocol whitepaper. – 9 minute read

Chain ConsensusChain’s whitepaper section on consensus. – 7 minute read

Ripple vs Stellar – 2017 – A trading article succinctly telling the differences between    Ripple and Stellar. – 5 minute read

Tendermint

Consensus in the Presence of Partial Synchrony1988 – Tendermint is based on a modified version of the protocol presented in this paper by Dwork, Lynch, and Stockmeyer, known as the “DLS Protocol”. – 34 pages

Tendermint Consensus2014 – Original paper describing the mostly asynchronous, BFT consensus protocol with voting power denominated in validator stake (note: protocol slightly outdated, see revisions) – 16 minute read

Tendermint Consensus Overview2017 – Current documentation on the tendermint consensus protocol. – 5 minute read

Specification – Byzantine Consensus Algorithm2017 – Formal Specification of the tendermint consensus algorithm. – 15 minute read

CAP Theorem Twelve Years Later: How the “Rules” Have Changed2012 – Reflective article by the creator of the CAP Theorem, about its interpretation and application to modern distributed systems. – 30 minute read

How the Internet works: Submarine fiber, brains in jars, and coaxial cables2016 – Ars Technica article about the physical infrastructure that makes a global internet possible. Useful information for thinking about global scale network partitions. – 40 minute read

Cosmos whitepaper2017 – Whitepaper for the Cosmos network, which is built on tendermint. Aims to be an “internet of blockchains” allowing complex interaction between existing and future blockchain systems. – 1 hour read

Tendermint: Byzantine Fault Tolerance in the Age of Blockchains2016 – Master’s Thesis by Ethan Buchman, as in-depth as it can get, covers all aspects of Tendermint. – 2-3 hour read

Introducing Casper2015 – Introduction to Casper “the friendly GHOST” (in reference to it borrowing properties of Greedy Heaviest Observed Sub-Tree protocol), the proof of stake protocol to be implemented in Ethereum’s Serenity release – 12 minute read

Vlad Zamfir’s History of Casper Series [Part 1] [Part 2] [Part 3] [Part 4] [Part 5] – 2016 – “the Casper tech story, given as a chronological history of the evolution of the key technology, ideas and language that are involved in Casper research” – 7-10 minutes each

A Proof of Stake Design Philosophy2016 – Vitalik’s core proof of stake principles that Casper is based on, including the cypherpunk spirit, and a balance of social and economic consensus – 7 minute read

Minimal Slashing ConditionsMarch 2017 – Explanation of slashing conditions (i.e. set of unbreakable rules) that allow Casper to achieve economic finality – 14 minute read

Casper, as I understand It2016 – Third party analysis  of Casper compared to Tendermint consensus  – 5 minute read

Understanding Casper  – 2016 – Explanation of Casper PoS Economics – 20 minute read