Some answers are expensive to produce: a complex database query, a call to a slow third-party API, a rendered page. If the same answer is requested over and over, recomputing it every time is wasteful — and it puts the slow component under load it didn't need.
A cache keeps a copy of the answer somewhere fast (usually memory) so that the next time someone asks the same question, you can hand back the saved copy instead of doing the work again.
The problem: doing the same expensive work twice
Imagine a product page that runs a heavy database query on every view. The data barely changes minute to minute, yet each visitor triggers the full query. As traffic grows, the database becomes the bottleneck — even though almost every request is asking for the same thing.
The insight: reads usually outnumber writes, and the same few items tend to be requested far more than the rest (hot keys). That repetition is exactly what a cache exploits.
How it works
A cache sits between the application and the slow source (a database, an API). On each read, the app checks the cache first:
- Cache hit — the answer is already there. Return it immediately. The database is never touched.
- Cache miss — the answer isn't there. Fetch it from the source, store it in the cache, then return it. The next identical request will be a hit.
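This flow, where the application itself checks the cache and fills it on a miss, is called cache-aside (or lazy loading). When the cache layer performs the fetch on the application's behalf, the same idea is called read-through. Either way, the first request for an item pays the full cost; everyone after rides for nearly free, until the entry expires.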
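The hit and miss paths described above can be sketched in a few lines. This is a minimal illustration, not a production client: a plain dict stands in for Redis or Memcached, and `CacheAside`, `slow_product_query`, and the `sku-1` key are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List


class CacheAside:
    """Check the cache first; on a miss, load from the source and store the result."""

    def __init__(self, load_from_source: Callable[[str], str]) -> None:
        self._load = load_from_source      # the slow path (database, API, ...)
        self._store: Dict[str, str] = {}   # stand-in for an in-memory cache
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:             # cache hit: the source is never touched
            self.hits += 1
            return self._store[key]
        self.misses += 1                   # cache miss: pay the full cost once
        value = self._load(key)
        self._store[key] = value           # the next identical request will be a hit
        return value


source_calls: List[str] = []


def slow_product_query(product_id: str) -> str:
    """Hypothetical expensive query; records each call so it can be counted."""
    source_calls.append(product_id)
    return f"product page for {product_id}"


cache = CacheAside(slow_product_query)
first = cache.get("sku-1")    # miss: runs the query
second = cache.get("sku-1")   # hit: served from memory
```

After the two reads, `source_calls` holds a single entry: the second request never reached the slow function.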
The components in this flow:
- Clients: users or apps sending read requests to the app.
- App Server: runs your application; checks the cache before hitting the database.
- Cache: a fast in-memory store (e.g. Redis) holding recently used answers.
- Database: the source of truth. Slower, and consulted only on a cache miss.
Why it's so much faster: a memory cache (like Redis or Memcached) looks up a key in microseconds and typically responds in well under a millisecond even over the network, while a non-trivial database query can take milliseconds or more, often 100–1000× slower. If 90% of reads are hits, only 10% of reads reach the database: you've removed 90% of its read load and made most requests dramatically faster at the same time.
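To make the effect of the hit ratio concrete, here is a small back-of-the-envelope calculation. The latencies (0.1 ms for a cache lookup, 20 ms for the database query) are assumed numbers chosen for illustration, not measurements, and the helper name is hypothetical.

```python
def average_read_latency_ms(hit_ratio: float, cache_ms: float, source_ms: float) -> float:
    """Expected latency per read.

    A hit costs one cache lookup; a miss costs the lookup plus the source fetch.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * (cache_ms + source_ms)


# Assumed numbers: 0.1 ms cache lookup, 20 ms database query.
no_cache = average_read_latency_ms(0.0, 0.0, 20.0)    # every read goes to the database
with_cache = average_read_latency_ms(0.9, 0.1, 20.0)  # 90% of reads are hits
```

Under these assumptions the average read drops from 20 ms to about 2.1 ms, and the database serves one read in ten instead of all of them.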
Keeping the cache fresh
A cached copy is a snapshot. The moment the underlying data changes, the cache can be stale — serving an old answer. Two mechanisms keep this under control:
- TTL (time to live) — each entry is stamped with an expiry. After the TTL elapses, the entry is dropped and the next read is a miss, refreshing it from the source. A short TTL means fresher data but more misses; a long TTL means fewer misses but more staleness.
- Eviction — memory is finite, so when the cache fills up it must drop something. LRU (least-recently-used) is the common choice: evict whatever hasn't been touched in the longest time, keeping the hot keys resident.
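Both mechanisms fit in one small in-process cache. The sketch below assumes a single TTL for all entries and a fixed entry limit; `TTLLRUCache` and its parameters are hypothetical names, and a real cache server handles expiry and eviction internally rather than in application code. The clock is injectable so the expiry behaviour can be exercised without waiting.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class TTLLRUCache:
    """In-process cache with per-entry expiry (TTL) and LRU eviction."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        # Ordered from least recently used (front) to most recently used (back).
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None                      # miss: never cached, or already evicted
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]              # TTL elapsed: drop it and report a miss
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max:
            self._data.popitem(last=False)   # evict the least recently used entry


now = [0.0]                                  # fake clock, advanced by hand
cache = TTLLRUCache(max_entries=2, ttl_seconds=60, clock=lambda: now[0])
cache.set("a", "A")
cache.set("b", "B")
cache.get("a")        # touch "a", so "b" becomes the least recently used
cache.set("c", "C")   # cache is full: "b" is evicted, "a" and "c" stay
```

Advancing the fake clock past 60 seconds would turn the remaining entries into misses as well, which is the TTL half of the behaviour.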
Cache invalidation is famously hard. When data changes, stale entries must be expired or updated — and getting this wrong means users see old data. Prefer a sensible TTL as a safety net, and invalidate explicitly on writes only where correctness demands it. As the saying goes, there are only two hard things in computer science: cache invalidation and naming things.
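Explicit invalidation on a write can be as simple as deleting the cached key after updating the source. The sketch below uses two dicts as stand-ins for the database and the cache; the function names and the `sku-1` key are hypothetical.

```python
from typing import Dict

database: Dict[str, str] = {"sku-1": "old description"}  # stand-in source of truth
cache: Dict[str, str] = {}                                # stand-in cache


def read_product(key: str) -> str:
    """Cache-aside read: fill the cache on a miss."""
    if key not in cache:
        cache[key] = database[key]
    return cache[key]


def update_product(key: str, value: str) -> None:
    """Write to the source of truth, then drop the cached copy."""
    database[key] = value
    cache.pop(key, None)   # the next read is a miss and refills with fresh data


before = read_product("sku-1")                # caches the old description
update_product("sku-1", "new description")    # invalidates the cached entry
after = read_product("sku-1")                 # miss: reloads the new description
```

Deleting the entry rather than overwriting it keeps the write path simple and leaves the refill to the normal read path. With concurrent readers and writers a stale value can still slip in between the two steps, which is one reason a TTL remains useful as a safety net.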
The trade-offs
Caching is not free:
- Staleness — you accept that reads may be slightly out of date. Fine for a product description; dangerous for an account balance.
- Cold starts — an empty cache (after a deploy or restart) sends a burst of misses straight to the database. A sudden flood of misses for a hot key is called a cache stampede.
- Complexity — another moving part to run, monitor, and reason about, plus the invalidation logic.
- Memory cost — fast storage isn't free; you cache the valuable subset, not everything.
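One common mitigation for the stampede mentioned above is to let only one caller per key reload the value while the others wait for it. The following is a minimal single-process sketch using a lock per key; `SingleFlightCache` and `slow_query` are hypothetical names, the per-key locks are never cleaned up, and coordinating across several servers would need a different mechanism.

```python
import threading
import time
from typing import Callable, Dict, List


class SingleFlightCache:
    """Cache-aside reads where only one thread per key loads from the source."""

    def __init__(self, load_from_source: Callable[[str], str]) -> None:
        self._load = load_from_source
        self._store: Dict[str, str] = {}
        self._key_locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()       # protects the _key_locks dict

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> str:
        value = self._store.get(key)
        if value is not None:
            return value                     # fast path: hit, no locking needed
        with self._lock_for(key):            # only one thread per key gets past here
            value = self._store.get(key)     # re-check: another thread may have filled it
            if value is None:
                value = self._load(key)
                self._store[key] = value
            return value


load_calls: List[str] = []


def slow_query(key: str) -> str:
    """Hypothetical expensive query; records each call and simulates latency."""
    load_calls.append(key)
    time.sleep(0.05)
    return f"value for {key}"


cache = SingleFlightCache(slow_query)
threads = [threading.Thread(target=cache.get, args=("hot",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight threads ask for the same cold key at once, but the slow query runs a single time; the other seven block briefly on the lock and then read the freshly cached value.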
When to reach for it
Caching pays off when reads vastly outnumber writes, when the same items are requested repeatedly, and when slightly stale data is acceptable. It's one of the highest-leverage performance tools available — often a few lines in front of a hot query.
It pairs naturally with load balancing: the load balancer spreads requests across servers, and a shared cache keeps each of those servers from re-doing the same expensive work.