Decentralized Replication Using CRDTs In Systems with Limited Connection and Bandwidth
Typ
Examensarbete för masterexamen
Program
Publicerad
2021
Författare
Johansson, Fredrik
Perzon, André
Modellbyggare
Tidskriftstitel
ISSN
Volymtitel
Utgivare
Sammanfattning
Distributed storage systems use replication to deal with problems such as fault tolerance,
availability of data, lowering latencies and scalability. However, a prominent
problem with distributed systems is how to keep replicas consistent during partitioning
where ad-hoc solutions have proven to be error prone. For applications that do
not require strict consistency, Conflict-Free Replicated Data Types (CRDTs) can be
used. CRDTs provide a theoretically sound approach to replicating data under the
eventual consistency model. Strict consistency is traded for improved availability,
better response time and support for disconnected operations. Simple mathematical
properties are used to solve inconsistencies between replicas that may occur due to
the eventual consistency.
The State-Based CRDT provides full availability, is commutative, associative and
idempotent making it suitable for applications on unreliable networks. The State-
CRDT sends its whole state straining the network as the state increase. To deal with
the weakness of the State-CRDT, Delta-CRDT was proposed. The Delta-CRDT has
the same properties as the State-CRDT but splits states into deltas, limiting the load
on the network. The Delta-CRDT is more complex to implement than State-CRDT
and is more vulnerable to message loss and requires more communication overhead.
This research aims to examine the viability of replicating data by applying CRDTs
in the same environment as an existing centralized system. The questions answered
in this project are: Can CRDTs be used in a real-life environment with large data
sizes and a low number of updates on a low bandwidth network with connection
losses? Which type of CRDTs are suitable for this environment? How does the
performance of the CRDT systems compare to the existing system? What are the
trade offs between the different types of systems? The metrics used to evaluate
the systems are operation latency, message latency, time to reach consistency, bytes
sent and tolerance to message loss by looking at implemented quality arrays. Testing
was conducted on a virtual network with data collected from a real-life scenario in
which vehicles operating in a mine synchronize their databases through a centralized
system on an unreliable network with low bandwidth.
The results show that the State-Based CRDT system is not viable due to flooding
the network with too many and too large messages. The Delta-State CRDT system
perform better than the existing system in most metrics and is a viable replacement
for the existing system if availability is prioritized.
Beskrivning
Ämne/nyckelord
CRDT , Distributed Systems , Centralized , Decentralized , Replication , Strong Eventual Consistency