Database Replication in Distributed Systems

Data Replication

Replication

Replication refers to keeping multiple copies of the data at various nodes (preferably geographically distributed) to achieve availability, scalability, and performance. Replication is generally straightforward when the data being replicated does not require regular modifications. The primary challenge with replication becomes evident when there is a need to manage alterations in the replicated data over a period of time.

How do we ensure consistency across all replicas?
How do we deal with replica failures?
Should we replicate synchronously or asynchronously?
- Asynchronous replication introduces replication lag.
How do we handle concurrent writes?
Which consistency model do we expose to the application programmer?

Synchronous vs. Asynchronous Replication

Synchronous replication involves the primary node waiting for acknowledgments from secondary nodes after data updates. It only signals success to the client after receiving acknowledgment from all secondary nodes. However, asynchronous replication is different; the primary node doesn't wait for acknowledgments from secondary nodes and indicates success to the client after updating itself.

Synchronous replication's advantage is that it ensures secondary nodes are entirely synchronized with the primary node. But there's a downside: if a secondary node doesn't respond due to failure or network issues, the primary node can't signal success to the client until it gets a successful acknowledgment from the faulty node, leading to high latency in response.

Conversely, asynchronous replication allows the primary node to continue operations even if all secondary nodes are down, which is its main advantage. But it has a disadvantage: if the primary node crashes, any write operations not yet replicated to the secondary nodes will be permanently lost.
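
The difference between the two modes can be sketched with a small in-memory model. The class names (Primary, Secondary), the pending queue, and the flush method are illustrative assumptions, not taken from any particular database. To keep the sketch short, a missing acknowledgment in synchronous mode is reported as a failed write rather than as a blocked call.

```python
class Secondary:
    """A replica that stores whatever the primary sends it."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def apply(self, key, value):
        # Returning True models the acknowledgment; a down node never acknowledges.
        if not self.up:
            return False
        self.data[key] = value
        return True


class Primary:
    def __init__(self, secondaries, synchronous):
        self.secondaries = secondaries
        self.synchronous = synchronous
        self.data = {}
        self.pending = []  # writes not yet shipped (asynchronous mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Send to every secondary, then succeed only if all acknowledged.
            acks = [s.apply(key, value) for s in self.secondaries]
            return all(acks)
        # Asynchronous: report success immediately and replicate later.
        self.pending.append((key, value))
        return True

    def flush(self):
        """Ship queued writes to every reachable secondary."""
        for key, value in self.pending:
            for s in self.secondaries:
                s.apply(key, value)
        self.pending.clear()
```

In this model, an asynchronous primary that crashes before flush runs takes the contents of pending with it, which is the data-loss case described above.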

Replication models

Central/Single leader
Multiple leader
Peer-to-peer (No leader)

Single leader

The primary-secondary replication model is well suited to read-heavy workloads. We can distribute the read load evenly across all available secondary nodes.

Nonetheless, replicating data across numerous follower nodes can lead to a primary node becoming a bottleneck. Moreover, primary-secondary replication may not be suitable for workloads that are heavily focused on write operations.

An additional benefit of primary-secondary replication is that reads remain available when the primary fails. In the event of a primary node failure, secondary nodes can still process read requests. Therefore, it's a beneficial strategy for applications that are read-intensive.

However, this method of replication can lead to data inconsistency, especially if asynchronous replication is in use. Clients accessing different replicas might encounter inconsistent data if the primary node fails before it can propagate updated data to the secondary nodes. Thus, if the primary node experiences a failure, any updates that weren't transferred to the secondary nodes could be lost.

What happens when the primary node fails?
Manual approach. An operator decides which node should become the new primary node.
Automatic approach. The secondary nodes elect a new primary node among themselves. This is known as leader election.
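
As a rough illustration of the automatic approach, the sketch below promotes the live secondary that has applied the most of the primary's log. The function name, the input shape, and the tie-breaking rule are assumptions made for this example. Production systems typically rely on a consensus protocol for this step, which the sketch does not attempt to model.

```python
def elect_new_primary(secondaries):
    """Pick the live secondary with the highest replicated log position.

    `secondaries` maps node name -> (is_alive, last_applied_position).
    Ties are broken by node name so that every node computes the same answer.
    """
    candidates = [
        (position, name)
        for name, (alive, position) in secondaries.items()
        if alive
    ]
    if not candidates:
        raise RuntimeError("no live secondary to promote")
    best_position = max(position for position, _ in candidates)
    return min(name for position, name in candidates if position == best_position)


# "s3" has the most data but is down, so "s2" is promoted.
new_primary = elect_new_primary(
    {"s1": (True, 10), "s2": (True, 12), "s3": (False, 15)}
)
```

Choosing the most up-to-date secondary limits how many unreplicated writes are lost, but it cannot recover writes that never left the failed primary.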

Methods

Statement-based replication

The primary node logs every write statement that it executes (INSERT, UPDATE, DELETE, and so on) and sends the statements to the secondary nodes, which execute them in turn. MySQL used this approach before version 5.1.

The biggest drawbacks are:

A nondeterministic function such as NOW() can produce a different result on each node.
If a write statement depends on a prior write and the two reach a secondary node in the wrong order, the statement may fail or produce a different result.
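
The first drawback can be reproduced with a toy model in which a "statement" is a function of the executing node's local clock. The execute helper, the tuple representation of a statement, and the clock values are all assumptions made for illustration.

```python
def execute(statement, table, clock):
    """Run a tiny 'statement': a key plus a function of the node's local clock."""
    key, value_fn = statement
    table[key] = value_fn(clock)


# Hypothetical stand-in for: INSERT ... VALUES ('created_at', NOW())
statement = ("created_at", lambda now: now)

primary_table, secondary_table = {}, {}
execute(statement, primary_table, clock=1000)    # primary runs it at t=1000
execute(statement, secondary_table, clock=1003)  # secondary replays it at t=1003

# The same statement produced different data on the two nodes.
diverged = primary_table != secondary_table

# One workaround: the primary evaluates NOW() once and ships the literal value.
literal = ("created_at", lambda _now, v=primary_table["created_at"]: v)
execute(literal, secondary_table, clock=1003)
consistent = primary_table == secondary_table
```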

Write-ahead log (WAL)

This is a commonly used technique in many databases that require durability. The primary node appends every change to a log file (the WAL) before applying it, and then ships the same log to the secondary nodes so they can reproduce the data. The WAL stores all the information needed to redo or undo a transaction. It records the result of the transaction, not the SQL statement.

The problem is that the WAL describes data at a very low level, for example which bytes changed in which disk blocks. It is tightly coupled to the internal structure of the database engine, which makes it complicated to upgrade nodes to a different engine version.

Logical (row-based) log

The secondary nodes replicate the actual data changes. The binary log records changes to database tables on the primary node at the row level. To stay a replica of the primary node, the secondary node reads this log and changes its own rows accordingly. Row-based replication doesn't have the same difficulties as WAL because it doesn't require information about the data layout inside the database engine. The trade-off is cost: a statement that touches many rows produces one log entry per changed row, so the log can grow much larger.
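
A minimal sketch of how a secondary might apply a row-level log follows. The event format (op, key, row) is a made-up structure for illustration and does not correspond to any real binary log format.

```python
def apply_row_event(table, event):
    """Apply one row-level change to `table`, a dict keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # The event carries the full new row, so no SQL is re-executed.
        table[key] = dict(event["row"])
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


log = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "visits": 1}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "visits": 2}},
    {"op": "insert", "key": 2, "row": {"name": "Bob", "visits": 1}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in log:
    apply_row_event(replica, event)
```

Because each event carries concrete values rather than a statement to re-run, nondeterministic functions evaluated on the primary cannot cause the replica to diverge.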

Multi-leader replication

In this scenario, multiple leaders (primary nodes) accept writes from clients, which removes the single-primary write bottleneck. Each leader propagates its changes to the other primary nodes, and the primary nodes propagate the changes to the secondary nodes as well. The secondary nodes can be divided into groups, with each group assigned to a different primary node.

As for the topology, an all-to-all topology is a common choice because changes can still reach every other leader if one of the nodes fails.

How to handle conflicts?

A conflict occurs when two primary nodes write different values to the same data at the same time.

Conflict avoidance

The application ensures that all writes for a given record go to the same leader.

However, if a user moves to a different location, a conflict may still occur.
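
One possible way to keep all writes for a record on the same leader is to derive the leader from the record's key. The sketch below assumes a fixed list of leaders and uses a stable hash; the leader_for function and the leader names are hypothetical. Routing by key rather than by user location is a substitution made for this example, and it gives up the low latency of writing to the nearest leader.

```python
import hashlib


def leader_for(record_id, leaders):
    """Map a record ID to one leader, the same one on every call."""
    # hashlib is used instead of the built-in hash(), whose value for
    # strings changes between Python processes.
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(leaders)
    return leaders[index]


leaders = ["leader-eu", "leader-us", "leader-asia"]  # hypothetical names
first = leader_for("user:42", leaders)
second = leader_for("user:42", leaders)
```

The mapping changes for many records if the list of leaders changes, so this simple scheme only suits a stable set of leaders.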

Last-write wins

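The update with the latest timestamp is kept and the others are discarded. However, this relies on the nodes' clocks, and keeping clocks synchronized across nodes is difficult, so a write that happened later in real time can lose to an earlier one.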
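
The clock problem can be shown with a short sketch. The tuple layout (timestamp, node_id, value), the tie-breaking rule, and the 5-second skew are assumptions chosen for illustration.

```python
def last_write_wins(versions):
    """Return the winning (timestamp, node_id, value) tuple.

    The highest timestamp wins; node_id breaks ties so that every
    replica picks the same winner.
    """
    return max(versions, key=lambda v: (v[0], v[1]))


# Node "a" has a clock running 5 seconds fast (an assumed skew).
write_a = (105.0, "a", "draft")      # actually happened at t=100
write_b = (102.0, "b", "published")  # actually happened at t=102, i.e. later
winner = last_write_wins([write_a, write_b])
```

Here the winner is the write from node "a", even though the write from node "b" happened later in real time, so the newer value is silently discarded.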

Custom logic

The developer can create their own logic to handle the conflict.
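
What such logic looks like depends entirely on the application. As one hypothetical example, two conflicting versions of a shopping cart could be merged by keeping every item from both. The merge_carts function and the cart representation are assumptions for this sketch.

```python
def merge_carts(cart_a, cart_b):
    """Resolve a conflict between two cart versions by keeping every item.

    Each cart maps item name -> quantity. For items present in both
    versions, the larger quantity is kept.
    """
    merged = dict(cart_a)
    for item, quantity in cart_b.items():
        merged[item] = max(merged.get(item, 0), quantity)
    return merged


merged = merge_carts({"book": 1, "pen": 2}, {"pen": 3, "mug": 1})
```

A merge like this never loses an added item, but an item removed on one leader can reappear after the merge, so deletions need separate handling.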

Peer-to-peer replication

All the nodes have equal weight and can accept both read and write requests. Amazon popularized such a scheme in its Dynamo data store. Like primary-secondary replication, this form of replication can also yield inconsistency. A helpful approach to this problem is called quorums.

Quorums

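The idea rests on a simple counting argument. Suppose we have n replica nodes, and a write counts as successful only once at least w nodes have acknowledged it. If every read then queries at least r nodes, the read is guaranteed to reach at least one node holding the latest value as long as w + r > n, because any set of w nodes and any set of r nodes must then share at least one node.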
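
The overlap argument can be checked by brute force for small clusters. The function names below are illustrative, and the version numbers attached to read responses are an assumption about how a reader would tell the newest value from stale ones.

```python
from itertools import combinations


def is_strict_quorum(n, w, r):
    """The quorum condition: write and read sets are forced to overlap."""
    return w + r > n


def always_overlap(n, w, r):
    """Check every possible write set against every possible read set."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


def quorum_read(responses):
    """Given (version, value) pairs from r nodes, return the newest value."""
    return max(responses, key=lambda response: response[0])[1]


# With n=3, w=2, r=2 every read set shares a node with every write set.
overlap_ok = always_overlap(3, 2, 2)
# With n=3, w=1, r=2 a read can miss the only node that took the write.
overlap_weak = always_overlap(3, 1, 2)
latest = quorum_read([(1, "old"), (2, "new")])
```

A common configuration is n = 3 with w = 2 and r = 2, which tolerates one unavailable node for both reads and writes.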