Pipes and Filters

Think of a factory assembly line for bottling juice: one station washes the bottles, the next fills them, another caps them, a fourth slaps on a label. No single worker does everything — each handles one step and slides the bottle along to the next. New bottles keep entering while finished ones roll off the end.

Pipes and Filters is that assembly line for data. A big processing job is split into a series of small stages (the filters), connected by channels (the pipes) that carry the output of one stage into the input of the next.

The problem

When all the steps of a complex task live inside one big function or service, they fuse into a tangle. Validation, transformation, enrichment, and formatting all share the same code and state, so you can't change one step without risking the others, and you can't reuse a step elsewhere because it's welded to its neighbors.

Scaling gets awkward too. Maybe one step is CPU-heavy and slow while the rest are trivial — but because they're bundled together, you have to scale the whole monolith just to give that one hot step more room. You end up paying for capacity the other steps don't need.

Figure: Before pipes and filters, one monolithic black box

Raw items -> Monolithic processor (validate + transform + enrich + format) -> Sink

All the steps live inside a single processor, so no stage can be tested, reused, or scaled on its own, and one heavy step holds up everything else.

How it works

You pull each step out into its own self-contained filter. A filter receives data on its input pipe, performs exactly one transformation, and writes the result to its output pipe — and that's all it knows about. It doesn't know who fed it or who consumes it, only the shape of the data flowing through.

That independence is the whole payoff. You can reorder stages, drop a stage in or out, reuse a stage in another pipeline, and scale each stage on its own — give the slow one more workers while the fast ones run lean. And because every stage processes a stream, they all run at once: while stage three works on item one, stage one is already pulling in item three. This is map/filter/reduce thinking stretched into a distributed pipeline. The diagram below shows data flowing left to right through a chain of filter stages.

Figure: Pipes and Filters, an assembly line for data

Raw items -> Validate -> Transform -> Enrich -> Sink

Each filter does one focused step and passes the result down the pipe; items stream left to right, so every stage stays busy on a different item at once.
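The chain in the diagram can be sketched inside a single Python process using generators. This is a minimal illustration rather than a distributed implementation: the filter names mirror the diagram, but the item shape (a dict with a "name" field), the pipeline helper, and the rules inside each filter are assumptions invented for the example. Generators pull items lazily, so the stages interleave item by item on one thread instead of running in parallel.

```python
from typing import Callable, Iterable, Iterator

Item = dict
Filter = Callable[[Iterable[Item]], Iterator[Item]]


def validate(items: Iterable[Item]) -> Iterator[Item]:
    """Pass along only items that have a non-empty 'name' (hypothetical rule)."""
    for item in items:
        if item.get("name"):
            yield item


def transform(items: Iterable[Item]) -> Iterator[Item]:
    """Normalize the name: trim whitespace and title-case it."""
    for item in items:
        yield {**item, "name": item["name"].strip().title()}


def enrich(items: Iterable[Item]) -> Iterator[Item]:
    """Attach a derived field computed from the item itself."""
    for item in items:
        yield {**item, "name_length": len(item["name"])}


def pipeline(source: Iterable[Item], *filters: Filter) -> Iterator[Item]:
    """Connect filters in order; each one sees only the stream handed to it."""
    stream = source
    for stage in filters:
        stream = stage(stream)
    return iter(stream)


raw_items = [{"name": "  ada lovelace "}, {"name": ""}, {"name": "alan turing"}]

# The sink here is simply a list that drains the final stream.
results = list(pipeline(raw_items, validate, transform, enrich))
print(results)
```

Because each filter depends only on the shape of the stream, a call such as pipeline(raw_items, validate, enrich) drops the transform stage without touching any other code.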

Tip

Make filters idempotent and let pipes buffer.

If a stage crashes partway through, the item should be safe to reprocess without corrupting anything, so design each filter to be idempotent. Using a durable queue as the pipe between stages also lets you fan a busy stage out to multiple competing consumers, which absorbs bursts and makes recovery from failures cleaner.
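A rough sketch of both ideas, under loud assumptions: Python's in-memory queue.Queue stands in for the pipe (it buffers but is not durable, so a real deployment would swap in a persistent queue), and the names start_workers, enrich_and_store, STOP, and the id/price/qty fields are hypothetical. The filter writes its result keyed by the item's id, so a redelivered item overwrites the same entry instead of creating a duplicate.

```python
import queue
import threading

STOP = object()  # sentinel that tells one worker to shut down


def start_workers(inbox: queue.Queue, handle, count: int) -> list:
    """Start `count` competing consumers that all read from the same pipe."""
    def loop():
        while True:
            item = inbox.get()
            if item is STOP:
                return
            handle(item)

    threads = [threading.Thread(target=loop) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


results = {}
results_lock = threading.Lock()


def enrich_and_store(item: dict) -> None:
    """Idempotent filter: the output is an upsert keyed by the item id."""
    enriched = {**item, "total": item["price"] * item["qty"]}
    with results_lock:
        results[item["id"]] = enriched


# A bounded queue buffers bursts and blocks the producer when it is full.
pipe = queue.Queue(maxsize=4)
workers = start_workers(pipe, enrich_and_store, count=3)

deliveries = [
    {"id": 1, "price": 5, "qty": 2},
    {"id": 2, "price": 3, "qty": 4},
    {"id": 1, "price": 5, "qty": 2},  # simulated redelivery after a crash
]
for item in deliveries:
    pipe.put(item)

# One sentinel per worker, queued after the real items, then wait for exit.
for _ in workers:
    pipe.put(STOP)
for thread in workers:
    thread.join()

print(sorted(results))
```

Changing count is the only thing needed to give this stage more or fewer consumers; the producer and the filter logic stay the same.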

When to use it

Pipes and filters fits naturally when a task is a clear sequence of distinct steps that operate on a stream of data — ETL jobs, image and video processing, log enrichment, or any workflow where stages have different resource appetites and you want to scale them independently.

It's overkill for a quick task that runs in a few milliseconds inside one process; the pipes themselves add latency and operational overhead. It's also a poor fit when the steps are tightly interdependent and need to share lots of state, since the whole point is that filters stay isolated. And you'll need to think hard about failures and ordering up front — a stage dying mid-stream is a question the pattern asks you to answer deliberately, not by accident.
