
Pipes and Filters

Break a complex processing job into a chain of small, independent stages connected by pipes — each does one thing and passes the result on.

Think of a factory assembly line for bottling juice: one station washes the bottles, the next fills them, another caps them, a fourth slaps on a label. No single worker does everything — each handles one step and slides the bottle along to the next. New bottles keep entering while finished ones roll off the end.

Pipes and Filters is that assembly line for data. A big processing job is split into a series of small stages (the filters), connected by channels (the pipes) that carry the output of one stage into the input of the next.

The problem

When all the steps of a complex task live inside one big function or service, they fuse into a tangle. Validation, transformation, enrichment, and formatting all share the same code and state, so you can't change one step without risking the others, and you can't reuse a step elsewhere because it's welded to its neighbors.

Scaling gets awkward too. Maybe one step is CPU-heavy and slow while the rest are trivial — but because they're bundled together, you have to scale the whole monolith just to give that one hot step more room. You end up paying for capacity the other steps don't need.

Figure: Before pipes and filters, one monolithic black box. Raw items → Monolithic processor (validate + transform + enrich + format) → Sink. All the steps live inside a single processor, so no stage can be tested, reused, or scaled on its own, and one heavy step holds up everything else.

How it works

You pull each step out into its own self-contained filter. A filter receives data on its input pipe, performs exactly one transformation, and writes the result to its output pipe — and that's all it knows about. It doesn't know who fed it or who consumes it, only the shape of the data flowing through.
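Here is a minimal single-process sketch of that idea in Python. It assumes records are plain dicts with hypothetical `name` and `qty` fields. Each filter is a generator that reads from an input iterable and yields results to its output, so the chain of generators plays the role of the pipes. The names `validate`, `transform`, `enrich`, and `pipeline` are illustrative and do not come from any library.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

Record = Dict[str, Any]  # hypothetical record shape: {"name": str, "qty": int}
Filter = Callable[[Iterable[Record]], Iterator[Record]]


def validate(items: Iterable[Record]) -> Iterator[Record]:
    """Pass through only records with a non-empty name and a non-negative int qty."""
    for item in items:
        if item.get("name") and isinstance(item.get("qty"), int) and item["qty"] >= 0:
            yield item


def transform(items: Iterable[Record]) -> Iterator[Record]:
    """Normalize the name, returning a new dict instead of mutating the input."""
    for item in items:
        yield {**item, "name": item["name"].strip().lower()}


def enrich(items: Iterable[Record]) -> Iterator[Record]:
    """Add a derived field; this filter never sees who produced its input."""
    for item in items:
        yield {**item, "bulk": item["qty"] >= 100}


def pipeline(source: Iterable[Record], *filters: Filter) -> Iterator[Record]:
    """Chain filters in order; each generator is the pipe feeding the next one."""
    stream: Iterator[Record] = iter(source)
    for f in filters:
        stream = f(stream)
    return stream


raw = [
    {"name": "  Apple ", "qty": 150},
    {"name": "", "qty": 3},       # rejected by validate: empty name
    {"name": "Pear", "qty": -1},  # rejected by validate: negative qty
    {"name": "PLUM", "qty": 12},
]

results = list(pipeline(raw, validate, transform, enrich))
print(results)
# [{'name': 'apple', 'qty': 150, 'bulk': True}, {'name': 'plum', 'qty': 12, 'bulk': False}]
```

Each filter depends only on the record shape. Reordering, dropping, or reusing a stage is therefore just a change to the arguments passed to `pipeline`. This version runs in a single thread. Generators are lazy, so each item is pulled through every stage in turn, but no two stages ever do work at the same moment.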

That independence is the whole payoff. You can reorder stages, drop a stage in or out, reuse a stage in another pipeline, and scale each stage on its own — give the slow one more workers while the fast ones run lean. And because every stage processes a stream, they all run at once: while stage three works on item one, stage one is already pulling in item three. This is map/filter/reduce thinking stretched into a distributed pipeline. The diagram below shows data flowing left to right through a chain of filter stages.

Figure: Pipes and Filters, an assembly line for data. Raw items → Validate → Transform → Enrich → Sink. Each filter does one focused step and passes the result down the pipe; items stream left to right, so every stage stays busy on a different item at once.
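Getting every stage busy at once requires each stage to have its own thread of control. The minimal sketch below makes some stand-in assumptions. Python threads play the role of separate workers, and bounded `queue.Queue` objects play the role of the pipes. The names `run_stage`, `run_pipeline`, and `slow_square` are hypothetical. The slow stage is given four competing workers, while the cheap stage gets one.

```python
import queue
import threading
import time

_DONE = object()  # sentinel that marks the end of the stream


def run_stage(fn, inbox, outbox, workers=1):
    """Start `workers` threads that each pull from inbox, apply fn, and push to outbox."""
    remaining = workers
    lock = threading.Lock()

    def worker():
        nonlocal remaining
        while True:
            item = inbox.get()
            if item is _DONE:
                inbox.put(_DONE)  # re-post so sibling workers also see end-of-stream
                with lock:
                    remaining -= 1
                    last = remaining == 0
                if last:
                    outbox.put(_DONE)  # the last worker out closes the downstream pipe
                return
            outbox.put(fn(item))

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    return threads


def run_pipeline(source, stages, buffer=8):
    """Wire stages [(fn, workers), ...] together with bounded queues as the pipes.

    A full queue blocks its producer, so a slow stage applies back-pressure upstream.
    """
    pipes = [queue.Queue(maxsize=buffer) for _ in range(len(stages) + 1)]
    threads = []
    for (fn, workers), inbox, outbox in zip(stages, pipes, pipes[1:]):
        threads += run_stage(fn, inbox, outbox, workers)

    def feed():
        for item in source:
            pipes[0].put(item)
        pipes[0].put(_DONE)

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()
    threads.append(feeder)

    results = []
    while (item := pipes[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results


def slow_square(n):
    time.sleep(0.01)  # stand-in for an expensive step
    return n * n


# Cheap stage: one worker. Slow stage: four competing workers on the same pipe.
out = run_pipeline(range(1, 21), [(lambda n: n + 1, 1), (slow_square, 4)])
print(sorted(out))
```

Two caveats apply. First, once a stage has more than one worker, output order is no longer guaranteed, which is why the result is sorted before printing. Second, in CPython, threads overlap I/O waits (simulated here with `time.sleep`) but not CPU-bound Python code. In a distributed pipeline, each stage would more likely run as its own process or service reading from a message queue.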
Tip

Make filters idempotent and let pipes buffer. If a stage crashes partway through an item, that item may be delivered again, so reprocessing it must not corrupt anything: design each filter so that handling the same item twice has the same effect as handling it once. Using a durable queue as the pipe between stages also lets a busy stage fan out to multiple competing consumers, absorbing bursts and recovering cleanly from failures.
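Here is a minimal sketch of an idempotent filter. It assumes the upstream pipe delivers each message at least once and that every message carries a unique ID (`msg_id` is a hypothetical field, and `IdempotentLedgerFilter` is an illustrative name). The filter records which IDs it has already applied and ignores redeliveries, so a message replayed after a crash is not counted twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    msg_id: str  # hypothetical unique ID assigned by the producer
    account: str
    amount: int


class IdempotentLedgerFilter:
    """Applies each message at most once, even if the pipe redelivers it."""

    def __init__(self) -> None:
        self.balances: dict = {}
        self.seen: set = set()  # in-memory here; a real filter would persist this

    def handle(self, msg: Message) -> None:
        if msg.msg_id in self.seen:
            return  # duplicate delivery: already applied, so do nothing
        self.balances[msg.account] = self.balances.get(msg.account, 0) + msg.amount
        self.seen.add(msg.msg_id)


# Simulate at-least-once delivery: m2 is redelivered after a crash before its ack.
m1 = Message("m1", "alice", 50)
m2 = Message("m2", "alice", 25)
ledger = IdempotentLedgerFilter()
for msg in [m1, m2, m2]:
    ledger.handle(msg)
print(ledger.balances)  # {'alice': 75}
```

In a real deployment, the `seen` set would live in durable storage. The balance update and the ID record would also need to commit together, for example in one database transaction. Otherwise, a crash between the two steps reopens the same gap.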

When to use it

Pipes and filters fits naturally when a task is a clear sequence of distinct steps that operate on a stream of data — ETL jobs, image and video processing, log enrichment, or any workflow where stages have different resource appetites and you want to scale them independently.

It's overkill for a quick task that runs in a few milliseconds inside one process; the pipes themselves add latency and operational overhead. It's also a poor fit when the steps are tightly interdependent and need to share lots of state, since the whole point is that filters stay isolated. And you'll need to think hard about failures and ordering up front — a stage dying mid-stream is a question the pattern asks you to answer deliberately, not by accident.
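One common way to answer the ordering question is offered here as an assumption, not as something the pattern itself prescribes. At the source, tag each item with a sequence number. Then place a resequencing stage (the Resequencer from Enterprise Integration Patterns) after any stage that runs with several workers. The sketch below assumes sequence numbers start at 0, are unique, and have no gaps. `resequence` is a hypothetical name.

```python
import heapq
from typing import Any, Iterable, Iterator, List, Tuple


def resequence(tagged: Iterable[Tuple[int, Any]]) -> Iterator[Any]:
    """Yield values in sequence-number order, holding back any that arrive early.

    Assumes sequence numbers start at 0, are unique, and have no gaps; a missing
    number would hold back every later item, so a real stage would need a timeout.
    """
    pending: List[Tuple[int, Any]] = []  # min-heap keyed on sequence number
    next_seq = 0
    for seq, value in tagged:
        heapq.heappush(pending, (seq, value))
        # Release every buffered item that is now contiguous with what was emitted.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)[1]
            next_seq += 1


# Items as they might leave a multi-worker stage: correct tags, scrambled order.
arrivals = [(2, "c"), (0, "a"), (1, "b"), (4, "e"), (3, "d")]
ordered = list(resequence(arrivals))
print(ordered)  # ['a', 'b', 'c', 'd', 'e']
```

The trade-off is latency and memory. An item that arrives early waits in the buffer until every item before it has shown up.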

Key takeaways

  • Pipes and filters decomposes a task into a sequence of independent stages (filters) connected by channels (pipes) that carry data between them.
  • Each filter does one focused transformation, knows nothing about its neighbors, and can be developed and tested in isolation.
  • Because stages are independent, you can reorder, add, remove, or reuse them — and scale each one separately.
  • Stages can run in parallel on a stream of items, so the pipeline keeps every stage busy instead of idling.
  • It shines for data-processing workloads; the cost is extra plumbing and the need to handle a stage failing mid-stream.
