Explainstuff.mebeta
All concepts
Data Fundamentalsbeginner7 min

Database Indexing

A small sorted lookup structure that lets the database jump straight to the rows you want instead of reading the whole table.

A database table is, underneath, just a pile of rows. When you ask it to find every customer named "Patel" or every order placed yesterday, it has to figure out where those rows actually are. How it does that — quickly, or painfully slowly — comes down to whether the right index exists.

Indexing is one of the highest-leverage ideas in working with data: the same query can take milliseconds or minutes depending on whether the database can jump straight to the answer or has to hunt for it.

The problem: the full table scan

Imagine a users table with ten million rows and you run SELECT * FROM users WHERE email = 'amy@example.com'. Without any help, the database has no idea where Amy's row lives. So it does the only thing it can: it reads every single row, top to bottom, checking each one's email against what you asked for.

This is called a full table scan. It's the brute-force approach — correct, but slow. Its cost grows linearly with the size of the table: this is O(n) work, meaning if the table doubles, the query takes roughly twice as long.

On a tiny table you'd never notice. But tables grow, and a scan that was instant at a thousand rows becomes a real problem at ten million. The scene below shows the painful version: the query has to walk past every row in the table just to find the few that match.

Full table scan
scan
Query
row
row
row
row
With no index, the query checks every row in turn — fine when small, slow as the table grows.
Watch out

A full table scan is the default, not an error. If you query a column that has no index, the database silently falls back to scanning the entire table — it still returns the right answer, so nothing looks broken. The query just gets slower and slower as the table grows, and you often don't notice until production traffic makes it hurt.

The solution: an index

An index is a separate, sorted lookup structure that the database keeps alongside the table. Think of the index at the back of a textbook: instead of flipping through every page to find "photosynthesis," you go to the back, find the word in alphabetical order, and it tells you exactly which pages to turn to.

A database index works the same way. It maps each value of a column (like email) to the location of the rows that hold it. Because the entries are kept in sorted order, the database can find any value without reading everything.

Most indexes are built as a B-tree — a balanced tree structure that stays shallow even when it holds millions of entries. To find a value, the database starts at the top and follows a handful of branches down to the right leaf, discarding huge chunks of the data at every step. The scene below shows the payoff: instead of scanning the whole table, the query jumps almost straight to the matching rows.

Index lookup
lookup
Query
Index
Matching row
An index jumps almost straight to the matching rows — a scan becomes a fast lookup.

This is the big win. A lookup through a B-tree index is roughly O(log n) — the work grows with the logarithm of the table size, not the size itself. Doubling the table adds barely any cost. Concretely, finding one row among ten million might touch only a few dozen index entries instead of ten million rows, turning a slow scan into a near-instant lookup. That's a dramatic latency improvement for reads.

Indexes aren't free

If indexes only made things faster, you'd put one on every column and be done. The catch is the cost on the write side. An index is extra data, so it takes additional storage on disk — sometimes a lot of it for a big table.

More importantly, the index has to stay in sync with the table. Every time you insert, update, or delete a row, the database must also update every index that covers the affected columns, keeping each one correctly sorted. So indexes make reads faster but make writes slower — a classic trade-off.

Tip

Don't index everything. Each index you add speeds up some reads but slows down every write and consumes storage. A table buried under redundant indexes can write more slowly than one with none — and the planner may not even use most of them. Add indexes deliberately, in response to real query patterns, not just in case.

When to add an index

The rule of thumb is to index the columns the database actually searches by. In practice that means columns used in:

  • WHERE clauses — the columns you filter on, like email or status.
  • JOIN conditions — the keys you join tables on, so the database can match rows quickly.
  • ORDER BY clauses — because a sorted index can hand back rows already in order, skipping a separate sort step.

Columns that never appear in any of these don't benefit from an index, so leave them alone.

You can also build a composite index that covers several columns at once — for example, an index on (last_name, first_name). This is ideal when your queries filter or sort on those columns together. The order matters: a composite index on (last_name, first_name) helps queries that filter by last name (or by last name and first name), but not ones that filter by first name alone, much like a phone book sorted by surname.

Indexing is one of two complementary tools for making reads fast. An index speeds up how the database finds data on disk; caching avoids touching the database at all by keeping hot results in fast memory. They solve the same problem — slow reads — at different layers, and real systems often lean on both: a well-chosen index for the queries that reach the database, and a cache in front of the ones that run again and again.

Key takeaways

  • Without an index, finding rows by a column means a full table scan — checking every row, which is O(n) and gets slow as the table grows.
  • An index is a separate sorted lookup structure (usually a B-tree, like the index at the back of a book) that maps column values to row locations.
  • An index turns a scan into a fast lookup — roughly O(log n) instead of O(n) — a big latency win for reads.
  • Indexes aren't free: they take extra storage and must be updated on every write, so they slow down inserts and updates.
  • Index the columns you actually filter, join, or sort on; over-indexing is a real anti-pattern that wastes space and drags down writes.

Keep going