Pipelines

A jetro query is a pipeline of stages. The shape is always:

Source → Stage* → Sink

The Source produces values one at a time. Each Stage consumes one value and produces zero, one, or many. The Sink collects the results.
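As a rough mental model (not jetro's implementation), the shape can be sketched with Python generators. The names source, keep_even, double, explode, and sink are hypothetical; they stand in for a source, a zero-or-one stage, a one-to-one stage, a many-per-input stage, and a sink.

```python
from typing import Iterable, Iterator, List


def source(values: Iterable[int]) -> Iterator[int]:
    """Source: yields values one at a time."""
    for v in values:
        yield v


def keep_even(xs: Iterator[int]) -> Iterator[int]:
    """Stage producing zero or one output per input."""
    for x in xs:
        if x % 2 == 0:
            yield x


def double(xs: Iterator[int]) -> Iterator[int]:
    """Stage producing exactly one output per input."""
    for x in xs:
        yield x * 2


def explode(xs: Iterator[int]) -> Iterator[int]:
    """Stage producing many outputs per input (here: x and x + 1)."""
    for x in xs:
        yield x
        yield x + 1


def sink(xs: Iterator[int]) -> List[int]:
    """Sink: collects whatever reaches the end of the pipeline."""
    return list(xs)


# Source -> Stage* -> Sink
result = sink(explode(double(keep_even(source([1, 2, 3, 4])))))
print(result)  # [4, 5, 8, 9]
```

Each function only knows about the iterator it reads from, which is why stages can be chained in any number between the source and the sink.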

What counts as a stage

Stage	Examples	Output
One-to-one	`.map`, `.enumerate`, `.lag`, `.zscore`	One out per in
Filter	`.filter`, `.find`, `.compact`, `.takewhile`	Zero or one out per in
Expander	`.flat_map`, `.flatten`, `.split`, `.lines`, `.chars`	Many out per in
Reducer	`.sum`, `.count`, `.min`, `.any`, `.find_index`	One total
Positional	`.first`, `.last`, `.nth(i)`, `.collect`	One or N
Barrier	`.sort`, `.unique`, `.group_by`, `.window`, `.chunk`	Buffers, then emits

A reducer or positional terminator ends the pipeline; further methods chain on the result (a scalar or array) rather than streaming.

Streaming vs. barrier

Most stages stream: they process one value, emit, repeat. Because the backend is pull-based, each value travels end-to-end before the next is fetched. This is what makes early termination work: once .first or .find has its answer, nothing more is pulled from the source.
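The end-to-end travel of each value can be made visible with a small trace. This is a sketch using Python generators as a stand-in for a pull-based backend; source, map_stage, filter_stage, and first are hypothetical helpers, not jetro APIs.

```python
from typing import Callable, Iterator, List, Optional

trace: List[str] = []  # records the order in which work happens


def source(n: int) -> Iterator[int]:
    for i in range(n):
        trace.append(f"fetch {i}")
        yield i


def map_stage(f: Callable[[int], int], xs: Iterator[int]) -> Iterator[int]:
    for x in xs:
        trace.append(f"map {x}")
        yield f(x)


def filter_stage(p: Callable[[int], bool], xs: Iterator[int]) -> Iterator[int]:
    for x in xs:
        trace.append(f"filter {x}")
        if p(x):
            yield x


def first(xs: Iterator[int]) -> Optional[int]:
    """Pull until one value arrives, then stop asking for more."""
    return next(xs, None)


# A million-element source, but only three elements are ever fetched.
pipeline = filter_stage(lambda x: x > 15,
                        map_stage(lambda x: x * 10, source(1_000_000)))
result = first(pipeline)

print(result)      # 20
print(len(trace))  # 9
print(trace[:3])   # ['fetch 0', 'map 0', 'filter 0']
```

The trace interleaves fetch, map, and filter for one value before the next fetch begins, and it stops as soon as the first match is found.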

Barriers cannot stream: .sort must see every element before it can emit any. The pipeline buffers up to the barrier, runs the barrier as a unit, then resumes streaming if more stages follow.

$.xs.map(f).filter(p).sort(@.x).take(10).map(g)
    \_______________/          \______________/
        streaming              streaming again
                         ↑
                   barrier point

In the implementation, a barrier is a builtin that carries an apply_barrier method.
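The buffer-then-resume behaviour can be modelled in the same generator style. This is an illustrative sketch only: sort_barrier, take, and square are hypothetical names, and the code does not reflect how apply_barrier is written in jetro.

```python
from typing import Iterator, List

events: List[str] = []  # records what happens, in order


def source(values: List[int]) -> Iterator[int]:
    for v in values:
        events.append(f"fetch {v}")
        yield v


def sort_barrier(xs: Iterator[int]) -> Iterator[int]:
    """Barrier: drain the upstream completely, sort, then emit one by one."""
    buffered = sorted(xs)  # consumes every upstream value before emitting
    events.append("barrier done")
    for x in buffered:
        events.append(f"emit {x}")
        yield x


def take(n: int, xs: Iterator[int]) -> Iterator[int]:
    """Streaming stage: stop pulling once n values have passed through."""
    if n <= 0:
        return
    for count, x in enumerate(xs, start=1):
        yield x
        if count >= n:
            return


def square(xs: Iterator[int]) -> Iterator[int]:
    for x in xs:
        yield x * x


result = list(square(take(2, sort_barrier(source([3, 1, 2])))))

print(result)  # [1, 4]
print(events)
# ['fetch 3', 'fetch 1', 'fetch 2', 'barrier done', 'emit 1', 'emit 2']
```

All three fetches happen before the barrier emits anything, but downstream of the barrier the pipeline streams again: the third sorted value is never emitted because take(2) stops pulling.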

Sources

The most common source is a path, such as $.users. Other shapes:

An array literal ([1,2,3].map(f))
A range ((0..10).map(f))
A method that returns a sequence ($.text.lines().map(...))

Backend selection

Source	Preferred backends
`FieldChain` (e.g. `$.a.b.c`)	tape-view → tape-rows → materialised → val-view → interpreted
Generic expression	fast-children → interpreted
Deep search	structural index → interpreted
Single root path	tape-path → interpreted

You don't pick the backend; the planner does. But knowing that these backends exist explains why simple queries are fast: they often run zero-copy over the simd-json tape.
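One way to read the table, assuming each arrow means "fall back to the next backend if this one does not apply", is as an ordered preference list per source kind. The sketch below encodes that reading; pick_backend and the usable set are hypothetical, and the planner's real applicability checks are not described in this chapter.

```python
from typing import Dict, List, Set

# Preference lists copied from the table above, most preferred first.
PREFERENCES: Dict[str, List[str]] = {
    "FieldChain": ["tape-view", "tape-rows", "materialised", "val-view", "interpreted"],
    "Generic expression": ["fast-children", "interpreted"],
    "Deep search": ["structural index", "interpreted"],
    "Single root path": ["tape-path", "interpreted"],
}


def pick_backend(source_kind: str, usable: Set[str]) -> str:
    """Return the first preferred backend that is usable (assumed fallback order)."""
    for backend in PREFERENCES[source_kind]:
        if backend in usable:
            return backend
    raise LookupError(f"no usable backend for {source_kind!r}")


# Hypothetical situation: the tape-based backends do not apply to this query.
chosen = pick_backend("FieldChain", {"materialised", "val-view", "interpreted"})
print(chosen)  # materialised
```

Under this reading, the interpreted backend sits at the end of every list as the general fallback.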

When to think about pipeline shape

In practice, almost never. Two cases:

Don't sort until you have to. A barrier placed early defeats early termination, because everything upstream of it must be consumed before anything is emitted. Push .filter, .take, .first before .sort if the semantics allow.
Avoid full materialisation in the middle. Chains of streaming stages stay zero-copy. A .collect() mid-chain forces a full pass.
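The cost of sorting too early can be illustrated by counting how many elements the barrier has to buffer. This is a plain Python sketch with hypothetical helper names, not a jetro benchmark.

```python
from typing import Callable, List, Tuple


def sort_then_filter(xs: List[int], keep: Callable[[int], bool]) -> Tuple[List[int], int]:
    """Barrier first: the sort buffers every element."""
    buffered = sorted(xs)
    return [x for x in buffered if keep(x)], len(buffered)


def filter_then_sort(xs: List[int], keep: Callable[[int], bool]) -> Tuple[List[int], int]:
    """Filter first: the sort buffers only the survivors."""
    buffered = sorted(x for x in xs if keep(x))
    return buffered, len(buffered)


data = [9, 2, 7, 4, 1, 8, 3, 6, 5, 0]


def keep(x: int) -> bool:
    return x >= 7


late, late_cost = sort_then_filter(data, keep)
early, early_cost = filter_then_sort(data, keep)

print(late, late_cost)    # [7, 8, 9] 10
print(early, early_cost)  # [7, 8, 9] 3
```

Both orders give the same result here because the filter does not depend on position. Moving .take across .sort is different: taking the first three elements and then sorting them is not the same as sorting and then taking three, which is the "if the semantics allow" caveat.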

The next chapter, Demand Propagation, explains why these heuristics work.
