
Caching

Keep a copy of expensive answers close by, so most requests never touch the slow path.

Some answers are expensive to produce: a complex database query, a call to a slow third-party API, a rendered page. If the same answer is requested over and over, recomputing it every time is wasteful — and it puts the slow component under load it didn't need.

A cache keeps a copy of the answer somewhere fast (usually memory) so that the next time someone asks the same question, you can hand back the saved copy instead of doing the work again.

The problem: doing the same expensive work twice

Imagine a product page that runs a heavy database query on every view. The data barely changes minute to minute, yet each visitor triggers the full query. As traffic grows, the database becomes the bottleneck — even though almost every request is asking for the same thing.

The insight: reads usually outnumber writes, and the same few items tend to be requested far more than the rest (hot keys). That repetition is exactly what a cache exploits.

How it works

A cache sits between the application and the slow source (a database, an API). On each read, the app checks the cache first:

  • Cache hit — the answer is already there. Return it immediately. The database is never touched.
  • Cache miss — the answer isn't there. Fetch it from the source, store it in the cache, then return it. The next identical request will be a hit.

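This pattern is called cache-aside (or lazy loading): the application itself checks the cache and fills it on a miss. In the closely related read-through variant, the cache layer fetches from the source on the application's behalf. Either way, the first request for an item pays the full cost; everyone after rides for nearly free, until the entry expires.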
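The hit/miss flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production cache: the in-process dict stands in for the cache, and `slow_product_query`, `make_cached_reader`, and the `sku-1` key are hypothetical names invented for the example.

```python
from typing import Callable, Dict, Tuple


def make_cached_reader(
    load_from_source: Callable[[str], str],
) -> Tuple[Callable[[str], str], Dict[str, int]]:
    """Wrap a slow loader with a cache-aside read path."""
    cache: Dict[str, str] = {}            # stand-in for Redis/Memcached
    stats = {"hits": 0, "misses": 0}

    def read(key: str) -> str:
        if key in cache:                  # cache hit: the source is never touched
            stats["hits"] += 1
            return cache[key]
        stats["misses"] += 1              # cache miss: fall back to the source
        value = load_from_source(key)
        cache[key] = value                # store it so the next read is a hit
        return value

    return read, stats


source_calls = []


def slow_product_query(product_id: str) -> str:
    # Hypothetical stand-in for the heavy database query.
    source_calls.append(product_id)
    return f"product page for {product_id}"


read, stats = make_cached_reader(slow_product_query)
first = read("sku-1")    # miss: runs the query
second = read("sku-1")   # hit: served from the cache
```

Here the second read returns the same value without calling `slow_product_query` again. Nothing in this sketch ever expires or evicts an entry.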

Diagram: Reading through a cache (Clients → App Server → Cache → Database). Most reads are served straight from the cache, with no database needed.
Note

Why it's so much faster: an in-memory cache (like Redis or Memcached) typically answers in well under a millisecond, while a heavy database query can take tens or hundreds of milliseconds, often 100–1000× slower. If 90% of reads are hits, only 10% of reads reach the database, so you've removed about 90% of its read load and made most requests dramatically faster at the same time.
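To see how the hit rate drives the numbers, here is a small back-of-the-envelope model. It assumes a miss pays for the failed cache lookup plus the source query; the latencies used (0.1 ms for the cache, 20 ms for the database) are illustrative assumptions, not measurements, and the function names are made up for this sketch.

```python
def average_read_latency_ms(hit_rate: float, cache_ms: float, source_ms: float) -> float:
    """Expected latency per read under a simple model.

    Every read pays for the cache lookup; only misses also pay for the source.
    """
    return cache_ms + (1.0 - hit_rate) * source_ms


def source_load_fraction(hit_rate: float) -> float:
    """Fraction of reads that still reach the slow source."""
    return 1.0 - hit_rate


# Assumed numbers: 90% hit rate, 0.1 ms cache lookup, 20 ms database query.
with_cache = average_read_latency_ms(0.9, 0.1, 20.0)     # about 2.1 ms
without_cache = average_read_latency_ms(0.0, 0.0, 20.0)  # 20 ms: every read hits the database
```

Under these assumptions the average read drops from 20 ms to roughly 2 ms. The misses dominate what is left, which is why raising the hit rate from 90% to 99% matters more than shaving time off the cache lookup.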

Keeping the cache fresh

A cached copy is a snapshot. The moment the underlying data changes, the cached copy can be stale and serve an old answer. Two mechanisms keep the cache under control: one bounds how stale an entry can get, the other bounds how much memory the cache uses.

  • TTL (time to live) — each entry is stamped with an expiry. After the TTL elapses, the entry is dropped and the next read is a miss, refreshing it from the source. A short TTL means fresher data but more misses; a long TTL means fewer misses but more staleness.
  • Eviction — memory is finite, so when the cache fills up it must drop something. LRU (least-recently-used) is the common choice: evict whatever hasn't been touched in the longest time, keeping the hot keys resident.
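Both mechanisms fit in one small class. The sketch below is an illustrative in-process cache built on `collections.OrderedDict`, not how Redis or Memcached implement expiry or eviction. `TTLLRUCache` and its parameters are names invented here, and the clock is injectable only so the example can simulate time passing.

```python
import time
from collections import OrderedDict
from typing import Any, Callable


class TTLLRUCache:
    """Tiny cache with per-entry TTL and least-recently-used eviction."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()      # key -> (expires_at, value), oldest first

    def get(self, key: str) -> Any:
        item = self._entries.get(key)
        if item is None:
            return None                    # miss (note: a stored None looks the same)
        expires_at, value = item
        if self._clock() >= expires_at:    # TTL elapsed: drop the entry, report a miss
            del self._entries[key]
            return None
        self._entries.move_to_end(key)     # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)   # evict the least recently used entry


now = [0.0]                                # fake clock, in seconds
cache = TTLLRUCache(max_entries=2, ttl_seconds=60, clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")                             # touch "a", so "b" is now least recently used
cache.set("c", 3)                          # over capacity: "b" is evicted
b_after_eviction = cache.get("b")          # None
a_before_expiry = cache.get("a")           # 1
now[0] = 61.0                              # move past the 60-second TTL
a_after_expiry = cache.get("a")            # None
```

Expired entries are only removed lazily, when they are next read; a real cache would usually also clean them up in the background.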
Watch out

Cache invalidation is famously hard. When data changes, stale entries must be expired or updated, and getting this wrong means users see old data. Prefer a sensible TTL as a safety net, and invalidate explicitly on writes only where correctness demands it. As the saying usually attributed to Phil Karlton goes, there are only two hard things in computer science: cache invalidation and naming things.
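Explicit invalidation on write can be as simple as deleting the cached entry after updating the source. The sketch below uses plain dicts as stand-ins for the database and the cache; `read_description` and `update_description` are hypothetical helpers, not part of any library.

```python
database = {"sku-1": "Blue mug"}   # hypothetical stand-in for the source of truth
cache = {}                         # stand-in for the cache


def read_description(key: str) -> str:
    if key not in cache:           # miss: load from the source and remember it
        cache[key] = database[key]
    return cache[key]


def update_description(key: str, value: str) -> None:
    database[key] = value          # 1. write to the source of truth
    cache.pop(key, None)           # 2. drop the cached copy so the next read reloads it


before = read_description("sku-1")            # caches "Blue mug"
update_description("sku-1", "Red mug")
after = read_description("sku-1")             # miss, reloads "Red mug"
```

Deleting the entry rather than overwriting it keeps the write path simple. With concurrent readers and writers, a read that started before the write can still put the old value back afterwards, which is one reason to keep a TTL as a safety net.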

The trade-offs

Caching is not free:

  • Staleness — you accept that reads may be slightly out of date. Fine for a product description; dangerous for an account balance.
  • Cold starts — an empty cache (after a deploy or restart) sends a burst of misses straight to the database. A related failure is the cache stampede: many concurrent requests miss on the same hot key, for example right after it expires, and all hit the source at once.
  • Complexity — another moving part to run, monitor, and reason about, plus the invalidation logic.
  • Memory cost — fast storage isn't free; you cache the valuable subset, not everything.
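One common way to soften a stampede is to let only one caller recompute a missing key while the others wait for its result. The following is a single-process sketch using one lock per key; the names (`read`, `slow_query`, the `"hot"` key) are invented for illustration, and a fleet of app servers would need a shared mechanism instead, which is not shown.

```python
import threading
import time
from collections import defaultdict

cache = {}
key_locks = defaultdict(threading.Lock)   # one lock per key
locks_guard = threading.Lock()            # protects the key_locks dict itself
source_calls = []


def slow_query(key: str) -> str:
    # Hypothetical stand-in for the expensive source.
    source_calls.append(key)
    time.sleep(0.05)
    return f"value for {key}"


def read(key: str) -> str:
    value = cache.get(key)
    if value is not None:
        return value                      # fast path: hit, no locking
    with locks_guard:
        lock = key_locks[key]
    with lock:                            # only one thread per key gets past here at a time
        value = cache.get(key)            # re-check: another thread may have filled it
        if value is None:
            value = slow_query(key)
            cache[key] = value
        return value


# Eight concurrent readers miss on the same hot key.
threads = [threading.Thread(target=read, args=("hot",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the threads finish, `source_calls` holds a single entry: the first thread ran the query and the other seven found the value on the re-check inside the lock.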

When to reach for it

Caching pays off when reads vastly outnumber writes, when the same items are requested repeatedly, and when slightly stale data is acceptable. It's one of the highest-leverage performance tools available — often a few lines in front of a hot query.

It pairs naturally with load balancing: the load balancer spreads requests across servers, and a shared cache keeps each of those servers from re-doing the same expensive work.

Key takeaways

  • A cache stores the results of expensive work so repeated requests can skip it.
  • Reads check the cache first: a hit returns instantly; a miss falls back to the source and then stores the result.
  • Caching trades freshness for speed — cached data can be stale until it expires or is invalidated.
  • A TTL (time to live) bounds staleness by expiring entries; eviction policies like LRU bound memory.
  • Caching shines for read-heavy workloads with hot keys and tolerable staleness.
