Replication & Eventual Consistency

Suppose your whole application depends on a single database server. The moment that one machine dies, loses a disk, or needs maintenance, your entire site goes down with it. And even while it's healthy, every read and every write in the system has to funnel through that one box — so as traffic grows, it becomes the bottleneck everyone is waiting on.

Replication is the answer to both problems. Instead of trusting one copy of your data, you keep several identical copies on different machines and keep them in sync. If one machine fails, another can take over. If reads pile up, you can spread them across the copies. This is one of the foundational tricks behind almost every large-scale system.

Why replicate at all

There are three big reasons to keep multiple copies of the same data. The first is availability and failover: if the primary database crashes, a replica that already holds the data can be promoted to take its place, so the system keeps running instead of going dark. The second is scaling reads: most applications read far more often than they write, and by letting many replicas answer read queries you multiply your read capacity without overloading a single machine.

The third reason is putting data closer to users. If your customers are spread across the globe, a single database in one region means everyone outside that region waits on a long network round-trip. By placing replicas in different regions, a reader in Tokyo can hit a nearby copy instead of crossing an ocean to reach a server in Virginia, cutting latency dramatically.

Leader and follower

The most common arrangement is leader/follower replication, also called primary–replica. One node is designated the leader (the primary), and it is the only one allowed to accept writes. Whenever a client changes data, the change is applied on the leader and then sent out to every follower (replica), which applies the same change to its own copy.

Reads, on the other hand, can be served by any node — the leader or any of the followers. This split is what makes the pattern so useful: writes stay simple because there's a single authoritative copy to coordinate them, while reads scale out across all the replicas. The diagram below shows how a single write to the primary fans out to the replicas, and how reads can land on any of them.

How it works

Picture a client sending a write to the primary database. The primary records the change in its own storage, then forwards that change to each of its replicas over the network. The replicas replay the change so their copies match. Meanwhile, read requests from clients are spread across all the copies — some hit the primary, but many are answered by the replicas, taking load off the leader.

The crucial detail is timing. This propagation is usually asynchronous: the primary confirms the write to the client immediately, before all the replicas have caught up. For a brief window, a replica may still be holding the old value while the primary already has the new one. The animation below shows writes flowing from the primary out to several replicas, with reads able to hit any replica — including one that's momentarily a step behind.

Writes flow from primary to replicas

write

Writer

Primary

Replica

The primary copies each write to its replicas — which may briefly lag (eventually consistent).

Note

The gap has a name: replication lag. It's the short delay between a write landing on the primary and that same change appearing on a replica. Usually it's milliseconds, but under heavy load or network trouble it can stretch to seconds — long enough for a user to notice stale data.

Eventual consistency

Because replicas lag behind the primary for a moment, a read served from a replica might return data that's slightly out of date. This is the heart of eventual consistency: reads may be stale for a short time, but if writes stop, all copies will converge to the same value given enough time. Nothing is permanently wrong — the system just isn't instantaneously identical everywhere.

This is a deliberate trade-off against strong consistency, where every read is guaranteed to see the most recent write no matter which copy answers it. Strong consistency is what the strict transactional model gives you on a single server — see ACID & transactions. Distributed systems often relax that guarantee in favor of the BASE philosophy (Basically Available, Soft state, Eventually consistent), accepting brief staleness so the system can stay fast and available.

Why accept staleness at all? The CAP theorem explains the bind: when the network between nodes breaks (a partition), a distributed system has to choose between staying consistent and staying available. Eventually-consistent systems lean toward availability, answering with possibly-stale data rather than refusing to respond — a fine choice for a like count or a feed, but a dangerous one for a bank balance.

Pitfalls and variations

Replication lag creates some subtle bugs. The classic one is read-your-own-writes: a user updates their profile, the write goes to the primary, but their next read is served by a replica that hasn't caught up yet — so they see their old profile and assume the save failed. Common fixes are to route a user's own reads to the primary (or to a replica known to be up to date) for a short time after they write, so they always see their latest changes.

Leader/follower isn't the only model. In multi-leader replication, more than one node can accept writes — useful when you have datacenters in several regions and want each to take local writes. The catch is conflict resolution: if two leaders accept conflicting changes to the same record at the same time, the system has to decide which one wins, which is genuinely hard to get right. For most applications, a single leader with several read replicas is the simplest setup that delivers the big wins.

Watch out

Don't blindly read from a replica after a write. If correctness depends on seeing the freshest data — a payment confirmation, an inventory check at checkout — route that read to the primary or wait for the replica to catch up. Eventual consistency is great for tolerant data and quietly wrong for the data that really matters.