On the Efficiency of Durable State Machine Replication

abstract

Projects in this category follow a common pattern:

1. Students study the corresponding paper and try to understand how the consensus algorithm works;
2. They create a proof-of-concept implementation of it using a technology of their choice;
3. They test the consensus algorithm's robustness against crash and Byzantine faults.
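When planning the fault-injection tests of the third step, the number of replicas to deploy follows from the fault model. The sketch below uses the standard bounds: tolerating f crash faults needs n ≥ 2f + 1 replicas, while tolerating f Byzantine faults needs n ≥ 3f + 1. In these minimal configurations a quorum has f + 1 and 2f + 1 replicas respectively. The function names are hypothetical helpers for a test harness, not part of any library.

```python
def replicas_needed(f: int, byzantine: bool) -> int:
    """Minimum number of replicas that tolerates f faulty ones."""
    return 3 * f + 1 if byzantine else 2 * f + 1


def max_faults(n: int, byzantine: bool) -> int:
    """Largest number of faulty replicas that n replicas can tolerate."""
    return (n - 1) // 3 if byzantine else (n - 1) // 2


def quorum_size(f: int, byzantine: bool) -> int:
    """Quorum size in the minimal configuration (n = 3f+1 or n = 2f+1)."""
    return 2 * f + 1 if byzantine else f + 1


if __name__ == "__main__":
    for f in (1, 2):
        print(f"f={f}: BFT n={replicas_needed(f, True)} q={quorum_size(f, True)}, "
              f"CFT n={replicas_needed(f, False)} q={quorum_size(f, False)}")
```

With n = 4, a test can crash (or make arbitrarily faulty) one replica and still expect the service to work; faulting two replicas at once exceeds what the protocol is designed to tolerate.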

There is an implicit step hidden in the procedure above: consensus algorithms are usually meant to implement some kind of SMR system (e.g., a replicated database). Even if the goal of a project in this category is to study and implement a consensus algorithm, students may need to actually use it in order to test it. To do so, they will have to build a simple SMR system, e.g., a replicated key-value store supporting at least two operations: put and get.
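The SMR system mentioned above can be very small. The sketch below assumes the consensus layer already delivers commands in a single total order (a plain Python list stands in for that ordered log): every replica runs the same deterministic put/get state machine, so replicas that execute the same ordered commands end up in the same state and return the same replies. The command encoding and all names are hypothetical.

```python
from typing import Optional


class KVStateMachine:
    """Deterministic put/get state machine; one instance per replica."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def execute(self, cmd: tuple) -> Optional[str]:
        # Hypothetical command encoding: ("put", key, value) or ("get", key).
        op = cmd[0]
        if op == "put":
            _, key, value = cmd
            previous = self.store.get(key)
            self.store[key] = value
            return previous              # old value, or None if the key was new
        if op == "get":
            _, key = cmd
            return self.store.get(key)   # None if the key is absent
        raise ValueError(f"unknown operation: {op!r}")


def run_replicas(ordered_log: list, n: int) -> tuple[list, list]:
    """Feed the same totally ordered log to n replicas; return replicas and their replies."""
    replicas = [KVStateMachine() for _ in range(n)]
    replies = [[r.execute(cmd) for cmd in ordered_log] for r in replicas]
    return replicas, replies


if __name__ == "__main__":
    log = [("put", "x", "1"), ("put", "x", "2"), ("get", "x"), ("get", "y")]
    replicas, replies = run_replicas(log, n=4)
    print(replies[0])                                    # [None, '1', '2', None]
    print(all(r.store == {"x": "2"} for r in replicas))  # True
```

Determinism is the key design constraint: an `execute` that consulted the local clock or a random number generator could make replicas diverge even when they receive identical ordered input.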

Paper abstract:

State Machine Replication (SMR) is a fundamental technique for ensuring the dependability of critical services in modern internet-scale infrastructures. SMR alone does not protect from full crashes, and thus in practice it is employed together with secondary storage to ensure the durability of the data managed by these services. In this work we show that the classical durability enforcing mechanisms – logging, checkpointing, state transfer – can have a high impact on the performance of SMR-based services even if SSDs are used instead of disks. To alleviate this impact, we propose three techniques that can be used in a transparent manner, i.e., without modifying the SMR programming model or requiring extra resources: parallel logging, sequential checkpointing, and collaborative state transfer. We show the benefits of these techniques experimentally by implementing them in an open-source replication library, and evaluating them in the context of a consistent key-value store and a coordination service.
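The abstract names logging and checkpointing among the classical durability mechanisms. As a baseline for experiments, here is a minimal sketch of those two for a single replica of the put/get store: every put is appended to a log and fsync'ed before being applied, every `checkpoint_every` puts the whole state is written to a snapshot and the log is truncated, and recovery loads the snapshot and replays the logged suffix. The class, file names and parameter are hypothetical; this is not the paper's or the library's implementation, and it covers neither state transfer nor the proposed parallel logging, sequential checkpointing or collaborative state transfer.

```python
import json
import os
import tempfile


class DurableKV:
    """Put/get store made durable with a write-ahead log plus periodic checkpoints.

    Hypothetical layout: <dir>/log.jsonl holds puts executed since the last
    checkpoint; <dir>/checkpoint.json holds the full state at that checkpoint.
    """

    def __init__(self, directory: str, checkpoint_every: int = 3) -> None:
        self.log_path = os.path.join(directory, "log.jsonl")
        self.ckpt_path = os.path.join(directory, "checkpoint.json")
        self.checkpoint_every = checkpoint_every
        self.store: dict = {}
        self.since_ckpt = 0
        self.recover()

    def put(self, key: str, value: str) -> None:
        # 1. Log the command durably before applying it.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Apply it to the in-memory state.
        self.store[key] = value
        self.since_ckpt += 1
        # 3. Periodically snapshot the state and truncate the log.
        if self.since_ckpt >= self.checkpoint_every:
            self.checkpoint()

    def get(self, key: str):
        return self.store.get(key)

    def checkpoint(self) -> None:
        tmp = self.ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.store, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.ckpt_path)   # atomic rename of the new snapshot
        # A crash before the next line only replays puts the snapshot already
        # contains; replaying them in order yields the same state.
        open(self.log_path, "w").close()
        self.since_ckpt = 0

    def recover(self) -> None:
        # State after a restart = last checkpoint + replay of the logged suffix.
        self.store = {}
        if os.path.exists(self.ckpt_path):
            with open(self.ckpt_path) as f:
                self.store = json.load(f)
        self.since_ckpt = 0
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.store[key] = value
                    self.since_ckpt += 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        kv = DurableKV(d, checkpoint_every=3)
        for i in range(5):
            kv.put(f"k{i}", str(i))   # checkpoint after k2; k3 and k4 stay in the log
        restarted = DurableKV(d)      # simulate a full crash and restart
        print(restarted.get("k4"), restarted.since_ckpt)   # 4 2
```

The synchronous `fsync` on every put and the checkpoint that blocks execution while the whole state is written are the kind of costs the abstract says can weigh on SMR performance, which makes this sketch a reasonable starting point for measuring them.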

keywords

BFT

references

http://www.di.fc.ul.pt/~bessani/publications/usenix13-dsmr.pdf
https://github.com/bft-smart/library
http://www.di.fc.ul.pt/~bessani/publications/edcc12-modsmart.pdf
http://www.di.fc.ul.pt/~bessani/publications/dsn14-bftsmart.pdf

The following overviews and surveys allow a quick evaluation and comparison of the protocols above:
https://arxiv.org/pdf/1707.01873.pdf
https://infoscience.epfl.ch/record/121590/files/TR-700-2009.pdf
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8014672
https://www.cs.unc.edu/~reiter/papers/1998/DC.pdf

Finally, the following readings may be of interest to anyone who wants to understand the many impossibility results affecting distributed consensus:
https://www.the-paper-trail.org/post/2008-08-13-a-brief-tour-of-flp-impossibility/
https://people.eecs.berkeley.edu/~luca/cs174/byzantine.pdf