Introduction

Jetro is a query, transform, and patch engine for JSON, written in Rust. It parses a small dot-syntax DSL, plans the query through a multi-tier optimizer, and routes each subtree to whichever execution backend will run it fastest — zero-copy borrowed views over a simd-json tape, a bitmap structural index, a streaming pull pipeline, or the universal interpreted fallback.

If you have used jq, jetro will feel familiar but takes a different shape:

Dot syntax, not pipe-of-filters. $.users.filter(active).map(name) reads left-to-right and chains methods. The | operator exists, but it is for passing a value into an arbitrary expression — not for calling methods with arguments.
One source of truth per builtin. Every method is one impl Builtin for X block: identity, demand law, optimizer hints, and runtime layers all co-located. There are 181 of them.
Demand-driven planning. .first() doesn't materialise the whole array. .filter(p).take(3) doesn't filter the whole array. The planner walks backward from the sink, telling each operator what its source actually needs to produce.
Writes are first-class. $.users[0].name.set("Ada") rewrites to a fused patch over the document. Multiple chain-writes batch through a single fused pass.
Pattern match with guards. match x with { {kind: "err"} -> .msg, _ -> "" } compiles to a Maranget decision tree and runs over Val, borrowed View, and tape domains; deep ..match is bitmap-accelerated.
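
The effect of demand-driven planning can be pictured with ordinary lazy iterators. The sketch below is plain Python, not jetro's planner; the `lazy_filter` helper, the `seen` counter, and the sample data are illustrative assumptions. It shows a filter feeding a "take 3" sink and inspecting only as many elements as the sink asks for.

```python
from itertools import islice

def lazy_filter(pred, source, seen):
    """Yield matching items one at a time, counting how many were inspected."""
    for item in source:
        seen[0] += 1
        if pred(item):
            yield item

seen = [0]
data = range(1, 1_000_001)                    # one million candidate values
evens = lazy_filter(lambda n: n % 2 == 0, data, seen)
first_three = list(islice(evens, 3))          # the sink demands only 3 items

print(first_three)  # [2, 4, 6]
print(seen[0])      # 6 items inspected, not 1,000,000
```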

What this book covers

Part	What you get
Language Reference	Every grammar form with at least one runnable example.
Concepts	Pipelines, demand propagation, the cache hierarchy.
Builtin Reference	One section per builtin — input, output, behavior, examples, demand law, common pitfalls.
Recipes	Real chained queries, pattern-match cookbook, write-fusion.
Appendix	The public Rust API (`Jetro`, `JetroEngine`), and a glossary.

What this book doesn't cover

Implementation internals (the IR layer, the bytecode VM, plan caching, peephole passes) are documented in the source. This book stops at the user-facing surface, with one exception: the demand-propagation chapter, because demand is what makes "obvious" queries fast, and not understanding it leads to surprising benchmarks.

Conventions

Examples use this layout:

DOC:    {"books": [{"title": "Dune", "year": 1965}, {"title": "Foundation", "year": 1951}]}
QUERY:  $.books.filter(@.year < 1960).map(@.title)
OUT:    ["Foundation"]

Where the document matters, you'll see DOC:. Where it's obvious from the query, only QUERY: and OUT: appear. Method aliases are listed inline: unique (alias distinct).

Ready? Start with the Quick Tour, or jump to the Builtin Reference if you already know jetro and need a specific method.

A few v0.5 sharp edges worth noting up front. This book documents jetro's stable semantics; the behaviours listed below are intentional design choices for v0.5. See Known Limitations for the canonical fix-list.

replace(needle, with) replaces only the first occurrence (JavaScript-style); use replace_all for substitute-every behaviour.

There is no in operator ("x" in xs is a parse error) because in doubles as the binder in let and for; use xs has "x" or xs.includes("x") instead.

Regex specials use single backslash inside string literals ("\d" works); double-backslash also parses but matches the same class.

rec(fn) caps at 10 000 iterations when the step never reaches a structural fixpoint; pass rec(fn, cond) to bound the loop.
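
Two of these edges can be modelled in a few lines of plain Python. This is a sketch of the described behaviour, not jetro's code: the helper names are hypothetical, the fixpoint test uses simple equality as a stand-in for "structural fixpoint", and returning the last value when the cap is hit is an assumption (the engine may report it differently).

```python
def replace_first(s, needle, repl):
    # Python's count argument limits the number of substitutions to one.
    return s.replace(needle, repl, 1)

def replace_all(s, needle, repl):
    return s.replace(needle, repl)

def capped_fixpoint(step, value, cap=10_000):
    """Apply step until the value stops changing or cap iterations have run."""
    for _ in range(cap):
        nxt = step(value)
        if nxt == value:        # fixpoint reached
            return value
        value = nxt
    return value                # assumption: give up quietly at the cap

print(replace_first("a-b-c", "-", "+"))        # a+b-c
print(replace_all("a-b-c", "-", "+"))          # a+b+c
print(capped_fixpoint(lambda n: n // 2, 40))   # 0
```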

Installation

Jetro ships as three artifacts:

Artifact	What it is	Audience
`jetro` (crate)	Rust library — query/transform JSON in-process	Rust developers
`jetro-py`	Python bindings (PyPI)	Python users
`jetrocli`	Standalone CLI for shell use	Anyone with JSON in a terminal

Rust library

Add to Cargo.toml:

[dependencies]
jetro = "0.5"

The simd-json feature is on by default and gives a ~4× cold-start win by parsing bytes directly into Val (no serde_json::Value intermediate). To fall back to the legacy serde-only path:

[dependencies]
jetro = { version = "0.5", default-features = false }

Quick sanity check:

// Assumes `anyhow` and `serde_json` are also listed under [dependencies].
use jetro::Jetro;

fn main() -> anyhow::Result<()> {
    let bytes = br#"{"books":[{"title":"Dune","year":1965}]}"#;
    let j = Jetro::from_bytes(bytes)?;
    let titles: serde_json::Value = j.collect("$.books.map(@.title)")?;
    println!("{}", titles);  // ["Dune"]
    Ok(())
}

Long-lived engine

If you process many documents with overlapping queries, keep a JetroEngine around. It holds shared plan and VM caches:

use jetro::JetroEngine;

let eng = JetroEngine::default();
for doc in docs {
    let v = eng.collect(&doc, "$.users.filter(active).count()")?;
    println!("{}", v);
}

Plan-cache default capacity is 256 entries; it evicts wholesale when full.
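
"Evicts wholesale" means the cache is emptied in one step rather than dropping one entry at a time. The following Python sketch illustrates that policy only; `WholesaleCache` and its method names are hypothetical and do not mirror JetroEngine's internals.

```python
class WholesaleCache:
    """Cache that clears itself completely when it is full."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.entries = {}

    def get_or_insert(self, key, build):
        if key in self.entries:
            return self.entries[key]
        if len(self.entries) >= self.capacity:
            self.entries.clear()            # evict everything at once
        value = build(key)
        self.entries[key] = value
        return value

cache = WholesaleCache(capacity=256)
for i in range(256):
    cache.get_or_insert(f"$.q{i}", lambda q: ("plan", q))
print(len(cache.entries))                   # 256: the cache is now full
cache.get_or_insert("$.one_more", lambda q: ("plan", q))
print(len(cache.entries))                   # 1: the earlier 256 plans were dropped
```

Under this model, a workload that cycles through more than 256 distinct query strings would re-plan often, so a larger capacity may be worth considering in that case.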

Python bindings

pip install jetro-py

import jetro

doc = {"books": [{"title": "Dune", "year": 1965}]}
print(jetro.collect(doc, "$.books.map(@.title)"))   # ['Dune']

The Python wheel embeds the same Rust core, so query syntax is identical.

CLI (jetrocli)

Install via Homebrew:

brew install mitghi/jetrocli/jetrocli

Or build from source:

git clone https://github.com/mitghi/jetrocli
cd jetrocli && cargo install --path .

Use it like jq:

echo '{"x":[1,2,3]}' | jetrocli '$.x.sum()'
# 6

cat data.json | jetrocli '$.users.filter(@.active).map(@.email)'

Building from source

git clone https://github.com/mitghi/jetro
cd jetro
cargo build --release         # build everything
cargo test                    # full suite
cargo bench -p jetro-core     # micro-benchmarks

Workspace layout:

jetro/             facade crate (re-exports + public API)
jetro-core/        engine: parser, planner, executor, builtins, runtime
jetro-core/fuzz/   cargo-fuzz harness (feature-gated)

Verifying your install

Run the tour from the next chapter against your install. If every query produces the printed output, you're ready.

A 5-Minute Tour

This page is a working tour of jetro. Every example has a document, a query, and an output. Run them in your shell with jetrocli, in Rust with Jetro::collect, or in Python with jetro.collect.

The document for this tour

{
  "books": [
    {"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"]},
    {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"]},
    {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"]},
    {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"]}
  ],
  "active": true
}

1. Root, field, index

QUERY:  $.books[0].title
OUT:    "Dune"

$ is the root, .books is field access, [0] is index. Negative indices work: $.books[-1].title is "Snow Crash".

2. The whole array

QUERY:  $.books[*].title
OUT:    ["Dune","Foundation","Hyperion","Snow Crash"]

[*] produces every element.

3. Filter

QUERY:  $.books.filter(@.year > 1980).map(@.title)
OUT:    ["Hyperion","Snow Crash"]

Inside .filter, .map, and similar method args, the current item is @. Use @.field to walk into it; the leading-dot shorthand .field is also accepted and desugars to @.field.

4. Four lambda forms

These are all equivalent:

$.books.filter(@.year > 1980)
$.books.filter(.year > 1980)
$.books.filter(b => b.year > 1980)
$.books.filter(lambda b: b.year > 1980)

Pick whichever reads best. The named-lambda and @-forms compile to identical bytecode; benchmarks confirm them perf-equal.

5. Reducers

QUERY:  $.books.count()
OUT:    4

QUERY:  $.books.map(@.year).min()
OUT:    1951

QUERY:  $.books.map(@.year).avg()
OUT:    1974.25

Reducers terminate the streaming pipeline.

6. Group / count / sort

QUERY:  $.books.count_by(@.author)
OUT:    {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}

QUERY:  $.books.sort(@.year).map(@.title)
OUT:    ["Foundation","Dune","Hyperion","Snow Crash"]
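
For readers who think in Python, the two queries above correspond roughly to a `Counter` and a keyed `sorted`. This is only an analogy over a trimmed copy of the tour document, not a description of how jetro evaluates them.

```python
from collections import Counter

books = [
    {"title": "Dune", "year": 1965, "author": "Herbert"},
    {"title": "Foundation", "year": 1951, "author": "Asimov"},
    {"title": "Hyperion", "year": 1989, "author": "Simmons"},
    {"title": "Snow Crash", "year": 1992, "author": "Stephenson"},
]

# count_by(@.author): one counter per distinct key value
counts = dict(Counter(b["author"] for b in books))

# sort(@.year).map(@.title): order by a key projection, then project
titles_by_year = [b["title"] for b in sorted(books, key=lambda b: b["year"])]

print(counts)
print(titles_by_year)  # ['Foundation', 'Dune', 'Hyperion', 'Snow Crash']
```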

7. Object projection

QUERY:  $.books[0].pick(title, author)
OUT:    {"title":"Dune","author":"Herbert"}

QUERY:  $.books.map(b => b.pick(title, year))
OUT:    [{"title":"Dune","year":1965}, ...]

.pick(name, alias: src) also renames: .pick(t: title, y: year).

8. Deep search

QUERY:  $..find(@.year < 1960)
OUT:    [{"title":"Foundation","year":1951,...}]

QUERY:  $..like({author: "Asimov"})
OUT:    [{"title":"Foundation","year":1951,...}]

..find, ..shape, and ..like are DFS pre-order over the whole document. Equivalent named forms: .deep_find, .deep_shape, .deep_like.

9. Pipe and ternary

QUERY:  $.books.count() | "found " + (@ as string) + " books"
OUT:    "found 4 books"

QUERY:  $.books[0] | "old" if @.year < 1980 else "modern"
OUT:    "old"

| passes a value through an expression; it is not method-call sugar. Use .method() for methods.

10. F-strings

QUERY:  $.books.map(b => f"{b.title} ({b.year})")
OUT:    ["Dune (1965)","Foundation (1951)","Hyperion (1989)","Snow Crash (1992)"]

11. Pattern match

QUERY:
  match $.books[0] with {
    {year: y} when y < 1970 -> f"classic {y}",
    {year: y} -> f"modern {y}",
    _ -> "unknown"
  }
OUT:    "classic 1965"

Patterns include literals, ranges (1900..2000), or-patterns, guards, object shape, array shape, and rest captures.

12. Writes

QUERY:  $.books[0].year.set(1900)
OUT:    full document with books[0].year now 1900

QUERY:  $.books[*].tags.append("read")
OUT:    full document with "read" added to every book's tags

QUERY:  $.books[0].unset(tags)
OUT:    full document with books[0].tags removed

Multiple writes in one query batch through a single fused pass.

13. Engine entrypoint (Rust)

use jetro::JetroEngine;
use serde_json::json;

let eng = JetroEngine::default();
let doc = json!({"x":[1,2,3,4,5]});
let v = eng.collect_value(doc, "$.x.filter(@ > 2).sum()")?;
assert_eq!(v, json!(12));

That's the tour. Next: the Grammar Overview, or skip straight to the Builtin Index.

Grammar Overview

The jetro DSL is a small, expression-oriented language. There are no statements at the top level — every program is an expression that produces a value (or, in the case of patches, a rewritten document).

The grammar lives in grammar.pest and is parsed by pest.

Five things that make jetro different

Method calls use dot syntax. xs.map(f), not xs | map(f).
Pipe | is value-flow. x | expr evaluates expr with @ bound to x.
@ is the current value. Inside .filter(...) it's the row; at the top level it's the input.
Bare paths inside method args. .filter(.age > 18) is sugar for .filter(@.age > 18).
Writes are queries. $.x.set(v) is parsed as a query that produces a patched document, not a mutation.

Categories of syntax

Category	Forms	Chapter
Paths	`$`, `@`, `.field`, `[idx]`, `[*]`, `[start:end:step]`, `..desc`, `{pred}`	Paths
Operators	arithmetic, comparison, logical, pipe, coalesce, ternary, kind, cast	Operators
Methods	`.name(args)`, lambdas (`@`, `=>`, `lambda`)	Lambdas
Literals	numbers, strings, f-strings, arrays, objects, regex	Literals
Control flow	`match`, ternary, `try`, comprehensions	Control Flow
Writes	`patch $ {…}`, chain-write terminals	Patch

A handy precedence table sits at the end of this part.

A worked sample

$.users
  .filter(u => u.active and u.age >= 18)
  .map(u => { id: u.id, name: u.name, email: u.email })
  .sort(@.name)
  .take(10)

That's: root, field users, predicate filter (named lambda), object-mapping, sort by name, take first 10.

Comments

There are no comments inside a query. Strip them client-side before calling jetro, or factor commentary into the surrounding host program.

Whitespace

Whitespace and newlines are insignificant between tokens. Keep queries on one line in CLIs; break across multiple lines in source.

Paths

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5]}

A path is the part of a query that walks into the document. Paths start at a root marker ($, @, or an identifier inside a lambda) and chain steps left-to-right.

Roots

Form	Meaning
`$`	The whole input document (top-level root)
`@`	The current value (set by `.filter`, `.map`, `\|`, etc.)
`name`	A let-bound name or lambda parameter

DOC:    {"x": 10}
QUERY:  $
OUT:    {"x":10}

QUERY:  $.x | @ + 1
OUT:    11

Field access

DOC:    {"user": {"name": "Ada"}}
QUERY:  $.user.name
OUT:    "Ada"

Field names may also use string keys via ["name"]:

QUERY:  $["user"]["name"]

Use the bracket form when the key contains characters disallowed in identifiers (-, spaces, dots inside the key, leading digits).

Indexing arrays

DOC:    {"xs": [10, 20, 30, 40]}
QUERY:  $.xs[0]
OUT:    10

QUERY:  $.xs[-1]
OUT:    40

Negative indices count from the end.

Slicing

QUERY:  $.xs[1:3]
OUT:    [20,30]

QUERY:  $.xs[:2]
OUT:    [10,20]

QUERY:  $.xs[2:]
OUT:    [30,40]

QUERY:  $.xs[0:4:2]
OUT:    [10,30]
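
The index and slice examples above produce the same results as Python's list indexing on the same data, which makes Python a convenient way to check a slice before writing it as a query. The comparison covers only the forms shown here.

```python
xs = [10, 20, 30, 40]

print(xs[-1])     # 40        negative index counts from the end
print(xs[1:3])    # [20, 30]  half-open: start included, end excluded
print(xs[:2])     # [10, 20]
print(xs[2:])     # [30, 40]
print(xs[0:4:2])  # [10, 30]  third component is the step
```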

Wildcards

QUERY:  $.xs[*]
OUT:    [10,20,30,40]

[*] is "every element". Most users prefer chained methods (.filter, .map) which already iterate.

Filtered wildcard `[* if pred]`

A predicated wildcard — keeps only elements satisfying pred (with @ bound to the candidate).

DOC:    {"books": [{"title": "Dune", "year": 1965}, {"title": "Hyperion", "year": 1989}]}
QUERY:  $.books[* if year > 1980]
OUT:    [{"title":"Hyperion","year":1989}]

Equivalent to [*] immediately followed by an inline-filter {cond}, but stays on the path side of parsing. Particularly useful inside .update selectors and quoted patch path keys (see Patch).

Chaining a bare field step after a filtered wildcard collapses to null — chain a method instead:

QUERY:  $.books[* if year > 1980].map(@.title)
OUT:    ["Hyperion"]

Inline filter

{predicate} after a path step keeps only matching elements:

DOC:    {"books": [{"year": 1965}, {"year": 1989}]}
QUERY:  $.books{@.year > 1970}
OUT:    [{"year":1989}]

This is shorthand for .filter(@.year > 1970). Use .filter when you want named-lambda forms.

Descendant search

.. walks every descendant value in DFS pre-order:

DOC:    {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
QUERY:  $..x
OUT:    [1,2,3]
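
The pre-order visit that yields [1,2,3] for $..x can be sketched as a short recursive walk. The `deep_field` helper is hypothetical and written in plain Python; it models the traversal order only and says nothing about the bitmap index.

```python
def deep_field(node, name):
    """Collect every value stored under key `name`, in DFS pre-order."""
    found = []
    if isinstance(node, dict):
        for key, child in node.items():
            if key == name:
                found.append(child)
            found.extend(deep_field(child, name))
    elif isinstance(node, list):
        for child in node:
            found.extend(deep_field(child, name))
    return found

doc = {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
print(deep_field(doc, "x"))  # [1, 2, 3]
```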

Combine with method calls (no space):

QUERY:  $..find(@.year < 1960)
QUERY:  $..shape({year, title})
QUERY:  $..like({author: "Asimov"})

The deep variants are bitmap-accelerated when a structural index is available.

Dynamic keys

Compute a key at runtime:

DOC:    {"realnames": {"abc": "Ada"}, "post": {"author": "abc"}}
QUERY:  $.realnames[$.post.author]
OUT:    "Ada"

Inside a lambda:

DOC:    {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY:  $.posts.map(p => $.realnames[p.author])
OUT:    ["Ada"]

Quantifiers (postfix)

Form	Meaning
`step?`	Optional — return null instead of error if missing
`step!`	Exactly-one — error if zero or many

DOC:    {"xs": [42]}
QUERY:  $.xs!
OUT:    [42]

QUERY:  $.maybe?
OUT:    null      # absent, no error

Path after a method

Paths and methods are interchangeable steps:

$.users.filter(@.active).pick(name, email)[0]

That's: field, method, method, index. There is no special "tail position".

Paths inside method args need a root

Inside method-call arguments, paths must start with @ (current item), $ (document root), or a bound name. Bare-path forms like .field do not parse:

$.users.filter(@.age > 18)        # ✓ @-form
$.users.filter(u => u.age > 18)   # ✓ named lambda
$.users.filter(.age > 18)         # ✗ parse error
$.users.map(@.name)               # ✓
$.users.map(.name)                # ✗

The same rule applies to inline filters: $.xs{@.k > 1} works, $.xs{.k > 1} does not.

Top-level paths still need $.

Summary

Step	Example	Notes
Root	`$`, `@`	One per chain (or implicit `@` in args)
Field	`.name`	Use `["..."]` for tricky keys
Index	`[3]`, `[-1]`	Negative counts from end
Slice	`[1:5]`, `[::2]`	Half-open like Python
Wildcard	`[*]`	Whole array
Filtered wildcard	`[* if pred]`	Wildcard restricted by predicate (`@` = element)
Descendant	`..name`, `..`	DFS pre-order
Inline filter	`{cond}`	Sugar for `.filter`
Dynamic key	`[expr]`	Expression resolves to key
Quantifier	`?`, `!`	Postfix on a step

Operators

Jetro has the operators you'd expect plus a small number of extras that come up in JSON work.

Arithmetic

1 + 2          # 3
3 - 1          # 2
2 * 3          # 6
6 / 2          # 3
7 % 3          # 1
-x             # unary negation

+ on strings concatenates: "foo" + "bar" → "foobar".

+ on arrays concatenates: [1,2] + [3] → [1,2,3].

Comparison

a == b         # equality
a != b         # inequality
a < b          # less than
a <= b
a > b
a >= b

== and != compare values of the same type (strings to strings, numbers to numbers, and so on). A cross-type comparison returns false for == and true for !=.

Logical

a and b        # short-circuit AND
a or b         # short-circuit OR
not a          # negation

Truthiness: null, false, 0, "", [], {} are falsy. Everything else is truthy.
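
For JSON-shaped data, Python's own truth test happens to agree with the falsy list above, so it can serve as a quick mental model. The `truthy` and `probe` helpers are illustrative; treating 0.0 as falsy as well is Python's behaviour and only an assumption about jetro.

```python
def truthy(value):
    # None (null), False, 0, "", [] and {} are falsy; everything else is truthy.
    return bool(value)

evaluated = []

def probe(label, value):
    """Record that an operand was evaluated, then return it."""
    evaluated.append(label)
    return value

# Short-circuit AND: the right operand is skipped once the left one is falsy.
probe("left", 0) and probe("right", 1)
print(evaluated)  # ['left']
```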

Pipe

value | expr

Evaluates expr with @ bound to value. It is not a method-call shorthand.

DOC:    {"x": 10}
QUERY:  $.x | @ * 2
OUT:    20

QUERY:  $.x | f"got {@}"
OUT:    "got 10"

To call a method, use dot syntax: $.x.upper(), not $.x | upper.

Coalesce

a ?? b

Returns a unless it is null, in which case it returns b.

DOC:    {"name": null}
QUERY:  $.name ?? "anon"
OUT:    "anon"
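
Because ?? tests for null rather than truthiness, it differs from a logical or when the left side is a falsy non-null value such as 0 or "". A small Python model of the rule as stated (the `coalesce` name is illustrative):

```python
def coalesce(a, b):
    """Return a unless it is None (JSON null), in which case return b."""
    return b if a is None else a

print(coalesce(None, "anon"))  # anon
print(coalesce(0, 9999))       # 0
print(0 or 9999)               # 9999, because `or` tests truthiness instead
```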

Ternary

Python-style — postfix condition:

"hot" if temp > 30 else "cool"

DOC:    {"temp": 35}
QUERY:  "hot" if $.temp > 30 else "cool"
OUT:    "hot"

Kind tests

v is number
v is string
v is array
v is object
v is null
v is bool

Returns boolean.

QUERY:  $.x is number

Cast

x as int
x as float
x as string
x as bool
x as array
x as object

Coerces the value (or returns null if the cast is impossible — depends on the specific cast).

"42" as int        # 42
42 as string       # "42"

Membership

xs has v           # array membership: true if v is in xs
o  has "k"         # object membership: true if key "k" exists

There is no v in xs operator; that form is a parse error. Use the infix has operator above, or call .includes(v) (arrays/strings) explicitly:

$.tags.includes("hugo")    # ✓
"hugo" in $.tags           # ✗ parse error

Regex match

s ~= "pattern"

Returns boolean. Uses Rust regex syntax. Bind captures with .captures or .match_first for richer info — see String Search.

Boolean shortcut on patches

In a patch $ { … } body, a key when condition clause skips the assignment when condition is falsy. See Patch.

Examples

DOC:    {"books": [{"year": 1965, "tags": ["sf"]}, {"year": 1989, "tags": ["sf","hugo"]}], "year_floor": 2000}

QUERY:  $.books.filter((@.year > 1970 and @.tags.includes("hugo")) or @.year >= $.year_floor)
OUT:    [{"year":1989,"tags":["sf","hugo"]}]

QUERY:  $.books[0].year ?? 9999
OUT:    1965

QUERY:  $.books.map(b => "old" if b.year < 1970 else "new")
OUT:    ["old","new"]

No in operator. Membership in jetro is xs.includes(v) (or xs.has(v) for objects/arrays). There is no v in xs operator; that form is a parse error.

Wrap and/or mixes in parens to make precedence unambiguous. Jetro follows standard binding (and binds tighter than or), but parens read clearer.

Lambdas and Method Calls

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5], "pairs": [["a", 1], ["b", 2], ["c", 3]]}

Methods take arguments. Most arguments are values; one common one is a lambda — a small function evaluated per element. Jetro accepts three lambda syntaxes; pick whichever reads best.

The `@`-form

@ is the current item. Inside method args, prefix paths with @ to walk into it:

$.users.filter(@.age >= 18)
$.users.map(@.name)
$.xs{@.active}                  # inline filter must also use @

Leading-dot shorthand .age inside method args desugars to @.age — the two forms are equivalent and the planner sees identical opcodes.

$.users.filter(.age >= 18)
$.users.map(.name)
$.xs{.active}                    # works inside inline filters too

Arrow-form named lambda

$.users.filter(u => u.age >= 18)
$.users.map((u) => u.name)

The parens around the parameter are optional for one parameter.

For multiple parameters:

$.pairs.map(([k, v]) => k + ":" + v)

Python-style `lambda` keyword

$.users.filter(lambda u: u.age >= 18)
$.users.map(lambda u: u.name)

Functionally identical to the arrow form. Useful when porting from Python.

Performance

Named lambdas (u => u.x, lambda u: u.x) and the @-form compile to the same bytecode. Benchmarks confirm parity (3.42 ms vs 3.44 ms / 100K rows in the lambda regression suite). Pick what reads best — there is no perf reason to prefer @.

Method call basics

.method()                       # no args
.method(arg)                    # one positional
.method(arg1, arg2)             # multiple
.method(name=value)             # named (a few methods support these)
.method(arg1, name=value)       # mixed

Examples:

$.xs.take(3)
$.xs.replace("foo", "bar")
$.xs.join(",")
$.xs.sort(@.year)                # sort by key projection

Methods inside method args

Lambdas can chain methods just like top-level queries:

$.posts.map(p => p.tags.unique().count())
$.users.filter(u => u.email.starts_with("admin"))

Multi-arg lambdas with destructuring

Some barriers (e.g. pairwise) yield 2-tuples. Destructure them:

$.xs.pairwise().map(([a, b]) => b - a)

Captured `$`

Inside a lambda, $ still means "the document root" — it does not get shadowed by the lambda parameter:

DOC:    {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY:  $.posts.map(p => $.realnames[p.author])
OUT:    ["Ada"]

First-class lambdas via `let`

Bind a lambda once, use it many times:

let by_year = (b => b.year < 1970) in
  $.books.filter(by_year)

The let-bound lambda is inlined at every method-arg use before compilation, so it has zero closure overhead — exactly the same code as if you'd written the body directly in .filter(...).

Outside method-arg position, the binding is a normal name reference.

Literals

Scalars

null
true     false
42       3.14     -7    1.5e3
"double-quoted"   'single-quoted'

Strings allow standard escapes (\n, \t, \\, \", \uXXXX).

F-strings

f"…" interpolates {expression}:

DOC:    {"name": "Ada", "age": 36}
QUERY:  f"hi {$.name}, you are {$.age + 1} next year"
OUT:    "hi Ada, you are 37 next year"

Inside a lambda:

$.users.map(u => f"{u.name} <{u.email}>")

Escape literal braces with {{ and }}:

f"{{not interpolated}}"      # "{not interpolated}"

Arrays

[1, 2, 3]
["a", "b"]
[$.x, $.y, 99]              # values can be expressions

[...$.xs, 4, 5]             # spread
[1, ...mid, 9]              # spread anywhere

Heterogeneous arrays are fine: [1, "a", null, [2,3]].

Objects

{name: "Ada", age: 36}            # bare-key (identifier-like)
{"name": "Ada"}                   # quoted-key (any string)

{x, y}                            # shorthand: same as {x: x, y: y}

{[dyn_key]: 1}                    # computed key
{...obj, extra: 1}                # spread
{...**deep}                       # deep recursive spread

{name: "Ada", role: "admin" when $.is_admin}
                                  # conditional value (omit if cond falsy)
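
Spread and conditional entries have close Python counterparts, which may help when porting expressions. The sketch below covers the shallow array and object spreads and the conditional entry; it does not attempt the deep recursive spread, and the variable names are illustrative.

```python
xs = [1, 2, 3]
obj = {"name": "Ada"}
is_admin = False

arr = [*xs, 4, 5]                 # like [...$.xs, 4, 5]
merged = {**obj, "extra": 1}      # like {...obj, extra: 1}
# Conditional entry: spread an empty dict when the condition is falsy.
record = {"name": "Ada", **({"role": "admin"} if is_admin else {})}

print(arr)     # [1, 2, 3, 4, 5]
print(merged)  # {'name': 'Ada', 'extra': 1}
print(record)  # {'name': 'Ada'}
```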

Regex literals

Regex appear as the right operand of ~= or as arguments to regex builtins:

$.s ~= "^[A-Z]+$"
$.text.scan("\d+")

Patterns use Rust's regex crate syntax.

Numeric notes

Jetro distinguishes integers from floats internally where possible. 42 and 42.0 compare equal but a downstream sink that requires "integer" (e.g. indexing) will only accept the former.

Negative literals: -7 is a unary-negated literal — the parser handles this correctly without ambiguity in arithmetic positions (a - 7 is subtraction, a + -7 is addition with -7).

Control Flow

Ternary

Python-style:

expr if condition else fallback

DOC:    {"x": 10}
QUERY:  "big" if $.x > 5 else "small"
OUT:    "big"

Right-associative; chain via parens for clarity.

Try / else

Catch evaluation errors:

try expr else fallback

QUERY:  try $.maybe.deep.path else "missing"
OUT:    "missing"

QUERY:  try $.xs[0].name.upper() else "n/a"

? quantifier handles the "missing field" subset more concisely: $.maybe? returns null instead of erroring.

`let … in …`

Local bindings:

let x = $.users.count() in
  f"there are {x} users"

Multi-binding:

let a = 1, b = 2 in a + b   # equiv: let a=1 in let b=2 in a+b

let shines for first-class lambdas — see Lambdas.

Pattern match

match value with {
  pattern1 -> expr1,
  pattern2 when guard -> expr2,
  _ -> default
}

Patterns

Pattern	Matches
`42`, `"x"`, `true`, `null`	Equal literal
`_`	Any value
`name`	Any value, bound to `name`
`1..10`	Number ≥ 1 and < 10
`1..=10`	Number ≥ 1 and ≤ 10
`{k1: p1, k2: p2}`	Object with these keys, each matching (no shorthand `{k1, k2}` in v0.5)
`[p1, p2]`	Array of length 2, each matching
`[h, ...t]`	Head + tail
`p1 \| p2`	Either pattern (or-pattern)
`x: number`	Kind-bound: matches if `x` is a number

Guards

match $.x with {
  v when v > 100 -> "big",
  v when v > 10 -> "medium",
  _ -> "small"
}

Worked example

DOC:    {"event": {"kind": "click", "x": 100, "y": 200}}
QUERY:
  match $.event with {
    {kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
    {kind: "key",   code: c}       -> f"key:{c}",
    _ -> "unknown"
  }
OUT:    "click@100,200"

Deep match

$..match { pattern -> expr, _ -> null }

Walks every descendant; returns matched results as an array.

$..match! { pattern -> expr }      # first match only, early-stops

The bang variant terminates as soon as one match succeeds (uses the bitmap structural index when available).
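
The difference between collecting every match and stopping at the first one can be pictured with a lazy tree walk. This is a plain-Python sketch under assumptions: `walk`, `is_err`, and the sample document are hypothetical, and non-matching nodes are simply skipped here.

```python
def walk(node):
    """Yield node and every descendant value in DFS pre-order."""
    yield node
    if isinstance(node, dict):
        for child in node.values():
            yield from walk(child)
    elif isinstance(node, list):
        for child in node:
            yield from walk(child)

def is_err(v):
    return isinstance(v, dict) and v.get("kind") == "err"

doc = {"a": {"kind": "err", "msg": "disk"},
       "b": [{"kind": "ok"}, {"kind": "err", "msg": "net"}]}

# Collect every match (the whole tree is visited).
all_msgs = [v["msg"] for v in walk(doc) if is_err(v)]
# First match only: next() stops pulling from the walk after one hit.
first_msg = next((v["msg"] for v in walk(doc) if is_err(v)), None)

print(all_msgs)   # ['disk', 'net']
print(first_msg)  # disk
```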

Comprehensions

Jetro supports list, dict, set, and generator comprehensions over both literal and path-rooted sources. Pair destructure works in two interchangeable forms (for k, v in ... and for [k, v] in ...), and multiple if clauses are folded with and.

List

[expr for x in source if cond1 if cond2 ...]

DOC:    {"xs": [1, 2, 3, 4, 5]}

QUERY:  [n*n for n in $.xs if n > 2]
OUT:    [9,16,25]

QUERY:  [n for n in $.xs if n > 1 if n < 5]
OUT:    [2,3,4]
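
The list form reads the same way in Python, including stacked `if` clauses, so the two examples above can be checked directly. The `folded` variable spells out the "folded with and" reading.

```python
xs = [1, 2, 3, 4, 5]

squares = [n * n for n in xs if n > 2]
stacked = [n for n in xs if n > 1 if n < 5]
folded = [n for n in xs if n > 1 and n < 5]   # same filter, written with `and`

print(squares)  # [9, 16, 25]
print(stacked)  # [2, 3, 4]
```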

Object

{key: value for x in source if cond}
{k: v for [k, v] in pairs}
{k: v for k, v in pairs}

DOC:    {"pairs": [["a", 1], ["b", 2]]}

QUERY:  {k: v for [k, v] in $.pairs}
OUT:    {"a":1,"b":2}

QUERY:  {n: n*n for n in [1,2,3]}
OUT:    {"1":1,"2":4,"3":9}

Iterating an object yields {key, value} records:

DOC:    {"o": {"a": 1, "b": 2}}
QUERY:  {e.key: e.value*10 for e in $.o}
OUT:    {"a":10,"b":20}
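
A Python rendering of the three object-comprehension examples, for comparison. Two details are modelled explicitly and should be read as assumptions about the mechanics: numeric keys are stringified because JSON object keys are strings, and object iteration is represented as a list of {key, value} records.

```python
pairs = [["a", 1], ["b", 2]]
from_pairs = {k: v for k, v in pairs}

squares = {str(n): n * n for n in [1, 2, 3]}

o = {"a": 1, "b": 2}
entries = [{"key": k, "value": v} for k, v in o.items()]
scaled = {e["key"]: e["value"] * 10 for e in entries}

print(from_pairs)  # {'a': 1, 'b': 2}
print(squares)     # {'1': 1, '2': 4, '3': 9}
print(scaled)      # {'a': 10, 'b': 20}
```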

Set

Deduplicating comprehension. Returns an array of unique values.

QUERY:  {n*n for n in [-2, -1, 0, 1, 2]}
OUT:    [4,1,0]
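
The output above keeps each value at the position of its first occurrence. In Python, that behaviour can be reproduced with an insertion-ordered dict; a built-in `set` would not guarantee the order. First-occurrence ordering is inferred from the example, not from a stated guarantee.

```python
squares = [n * n for n in [-2, -1, 0, 1, 2]]   # [4, 1, 0, 1, 4]
unique = list(dict.fromkeys(squares))           # dicts keep insertion order

print(unique)  # [4, 1, 0]
```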

Generator

(x for x in items)

Same semantics as the list form; useful as a lazy source for a downstream reducer or barrier.

`if`-on-patch

Inside a patch $ {…} body, key: expr when cond skips the assignment when cond is falsy:

patch $ {
  status: "active" when $.verified
}

See Patch.

Patch and Writes

Fixture

Examples below run against:

DOC:    {"user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "xs": [1, 2, 3, 4, 5]}

Jetro treats writes as queries: a write returns the patched document. There are two equivalent surfaces.

Chain-write terminals

Add a write method at the end of a rooted path:

Method	Effect
`.set(v)`	Replace the value at this path with `v`
`.modify(expr)`	Replace, with `@` bound to the current value
`.delete()`	Remove the leaf
`.unset(key)`	Remove `key` from the leaf object
`.merge({…})`	Shallow-merge into the leaf object
`.deep_merge({…})`	Recursive merge
`.append(v)`	Push to the leaf array
`.prepend(v)`	Unshift onto the leaf array

DOC:    {"user": {"name": "Ada", "tags": ["math"]}}

QUERY:  $.user.name.set("Ada Lovelace")
OUT:    {"user":{"name":"Ada Lovelace","tags":["math"]}}

QUERY:  $.user.tags.append("code")
OUT:    {"user":{"name":"Ada","tags":["math","code"]}}

QUERY:  $.user.unset(tags)
OUT:    {"user":{"name":"Ada"}}

QUERY:  $.user.modify(u => u.merge({active: true}))
OUT:    {"user":{"active":true,"name":"Ada","tags":["math"]}}

The classifier fires only when the base of the chain is $. Inside lambdas ($.xs.map(@.set(...))) it remains a regular method call — useful when a sub-pipeline wants the old "return the new value" semantics.
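
"Writes are queries" means the input document is left alone and a patched copy comes back. A minimal Python model of that contract, using a hypothetical `set_path` helper and a deep copy where a real engine would likely share unchanged structure:

```python
import copy

def set_path(doc, path, value):
    """Return a copy of doc with the value at path replaced; doc is untouched."""
    out = copy.deepcopy(doc)
    node = out
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value
    return out

doc = {"user": {"name": "Ada", "tags": ["math"]}}
patched = set_path(doc, ["user", "name"], "Ada Lovelace")

print(patched)              # {'user': {'name': 'Ada Lovelace', 'tags': ['math']}}
print(doc["user"]["name"])  # Ada
```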

`patch $ { … }` block

The same operation expressed as a block:

patch $ {
  user.name: "Ada Lovelace",
  user.tags: DELETE
}

Block syntax is best for multiple writes — it batches them through a single fused pass (see Write Fusion).

Block clause	Meaning
`path: value`	Assignment
`path: DELETE`	Removal
`path: value when cond`	Conditional
`path[*]: value`	Broadcast over an array

Conditional writes

patch $ {
  status: "active" when $.verified,
  retired_at: now() when $.retired
}

If the condition is falsy, the assignment is skipped entirely — neither written nor zeroed.
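
The clause semantics (assign, delete, skip when the condition is falsy) can be sketched as a loop over clauses. This is a plain-Python illustration with hypothetical names; the sample document and the fixed date string are invented for the example, and conditions are evaluated against the original document as an assumption.

```python
import copy

DELETE = object()   # sentinel standing in for the DELETE marker

def apply_patch(doc, clauses):
    """Apply (path, value, cond) clauses to a copy of doc and return the copy."""
    out = copy.deepcopy(doc)
    for path, value, cond in clauses:
        if not cond:
            continue                      # skipped: neither written nor zeroed
        node = out
        for key in path[:-1]:
            node = node[key]
        if value is DELETE:
            node.pop(path[-1], None)
        else:
            node[path[-1]] = value
    return out

doc = {"verified": True, "retired": False, "draft": "x"}
patched = apply_patch(doc, [
    (["status"], "active", doc["verified"]),
    (["retired_at"], "2024-01-01", doc["retired"]),
    (["draft"], DELETE, True),
])
print(patched)  # {'verified': True, 'retired': False, 'status': 'active'}
```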

Broadcast over arrays

DOC:    {"items": [{"x": 1}, {"x": 2}, {"x": 3}]}

QUERY:  $.items[*].x.set(0)
OUT:    {"items":[{"x":0},{"x":0},{"x":0}]}

Pipe form preserves "return-the-new-value"

Some users prefer the v1 behavior where a write inside a .map returned the written value, not the patched root:

$.items.map(item => item | set(item.x + 1))

The pipe form value | set(new) keeps that meaning.

Modify with pipe

$.user.modify(u => u.merge({last_seen: now()}))

modify evaluates its argument with @ bound to the current value, then writes the result back at the same path.

Multiple writes in one query

Either chain them:

$.user.name.set("Ada").tags.append("admin")

or use a block:

patch $ {
  user.name: "Ada",
  user.tags[*]: "active"   # broadcast
}

The planner detects multi-write patterns and routes them through the patch-fusion optimizer, which lowers repeated path traversals into a single fused write pass.

Functional `.update({...})`

A third surface, written as a method call:

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}

QUERY:  $.books[*].update({tags: tags.append("modern") when year > 1980, reviewed: true})
OUT:    {"books":[{"reviewed":true,"tags":["sf"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

Use .update when you want all of the following at once:

A selector chosen with chain syntax ($.books[*], $.books[* if year > 1980])
An object body listing multiple field updates evaluated against each selected snapshot
The same when / DELETE semantics as patch $ { ... }
Quoted path keys ("books[*].tags") when the receiver is $, giving root-level batched updates without an explicit selector

.update parses to its own AST node (UpdateBatch) so the planner can keep the user-level shape — useful for selector pushdown, demand analysis, and fusion. See Path Mutation → update for the full argument matrix.

Filtered wildcard `[* if pred]`

A predicated wildcard inside a path. Available wherever [*] is, and particularly useful inside .update selectors and quoted path keys:

DOC:    {"books": [
  {"title": "Dune", "year": 1965},
  {"title": "Hyperion", "year": 1989}
]}

QUERY:  $.books[* if year > 1980]
OUT:    [{"title":"Hyperion","year":1989}]

The predicate runs against @ = the candidate element. Falsy elements are skipped from the path traversal entirely.

Wildcard `.modify` chains

Wildcard chain-writes are now lowered to a fused patch:

DOC:    {"books": [{"tags": ["sf"]}, {"tags": ["hugo"]}]}
QUERY:  $.books[*].tags.modify(@.append("test"))
OUT:    {"books":[{"tags":["sf","test"]},{"tags":["hugo","test"]}]}

Caveats

.replace(needle, with) is not a write terminal — it is the string-replace builtin.
The classifier only triggers on chains rooted at $. Use the block syntax when the base path is computed.
DELETE is a marker, not a value — you can't store it in a binding.

Precedence Table

Lowest precedence at the top. Operators on the same row associate left unless noted.

Level	Operators	Associativity	Notes
1	`if … else …`, `try … else …`	right	Ternary, try-else
2	`|`, `|>`	left	Pipe (value-flow)
3	`??`, `?|`	right	Coalesce
4	`or`	left	Logical OR (short-circuit)
5	`and`	left	Logical AND (short-circuit)
6	`not`	n/a	Logical NOT (prefix)
7	`is`, `kind`, `is not`	n/a	Kind test
8	`has`	left	Membership operator (no `in` — use `.includes(v)`)
9	`==`, `!=`, `<`, `<=`, `>`, `>=`, `~=`	left	Comparison
10	`+`, `-`	left	Additive (and string/array concat)
11	`*`, `/`, `%`	left	Multiplicative
12	`as`	left	Cast
13	`-` (unary)	n/a	Negation
14	`.field`, `.method()`, `[idx]`, `{cond}`, `?`, `!`	left	Postfix steps
15	`$`, `@`, literal, `(...)`, `lambda`, `let`, `match`, `patch`, comp	n/a	Primary

Common pitfalls

Pipe vs method call.

$.x | upper           # ✗ — interprets `upper` as a name to pipe into
$.x.upper()           # ✓ — method call

Comparison chains.

1 < x < 10            # ✗ — parses as `(1 < x) < 10`
1 < x and x < 10      # ✓

Ternary mid-chain.

$.x.upper() if cond else $.x   # parses fine — the ternary wraps the whole
                                # left expression

Negation tightness.

not a == b            # parses as `(not a) == b` — surprising!
not (a == b)          # parens are clearer
a != b                # cleanest

Coalesce + comparison.

$.x ?? 0 > 5          # parses as `$.x ?? (0 > 5)` (low-precedence coalesce)
($.x ?? 0) > 5        # parenthesise to compare the coalesced value

Try captures errors only.

try $.x.parse_int() else 0

try does not catch falsy-as-error — only actual evaluation errors (missing field, bad cast, regex failure, etc.).

Pipelines

A jetro query is a pipeline of stages. The shape is always:

Source → Stage* → Sink

Source produces values one at a time. Each Stage consumes one value and produces zero, one, or many. The Sink collects results.

What counts as a stage

Stage	Examples	Output
One-to-one	`.map`, `.enumerate`, `.lag`, `.zscore`	One out per in
Filter	`.filter`, `.find`, `.compact`, `.takewhile`	Zero or one out per in
Expander	`.flat_map`, `.flatten`, `.split`, `.lines`, `.chars`	Many out per in
Reducer	`.sum`, `.count`, `.min`, `.any`, `.find_index`	One total
Positional	`.first`, `.last`, `.nth(i)`, `.collect`	One or N
Barrier	`.sort`, `.unique`, `.group_by`, `.window`, `.chunk`	Buffers, then emits

A reducer or positional terminator ends the pipeline; further methods chain on the result (a scalar or array) rather than streaming.

Streaming vs. barrier

Most stages stream — they process one value, emit, repeat. The pull-based backend means each value travels end-to-end before the next is fetched. This is what makes early termination work (.first, .find).

Barriers cannot stream: .sort must see every element before it can emit any. The pipeline buffers up to the barrier, runs the barrier as a unit, then resumes streaming if more stages follow.

$.xs.map(f).filter(p).sort(@.x).take(10).map(g)
        \________________/   \____________/
            streaming         streaming again
                          ↑
                    barrier point

Barriers carry an apply_barrier method on the builtin.
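
The difference between streaming and a barrier can be imitated with Rust's lazy iterators. This is only an analogy, not jetro's pipeline backend: the function names are hypothetical, and the `Cell` counter exists purely to observe how many source elements get pulled.

```rust
use std::cell::Cell;

/// Lazily runs map -> filter -> first and reports how many source
/// elements were actually pulled.
fn first_match_with_pull_count(source: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0usize);
    let result = source
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1)) // count each pull from the source
        .map(|x| x * 2)
        .filter(|x| x % 3 == 0)
        .next(); // analogous to `.first()`: stop at the first output
    (result, pulled.get())
}

/// Same stages, but a sort (a barrier) sits before the first element is taken.
fn sorted_first_with_pull_count(source: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0usize);
    let mut buffered: Vec<i64> = source
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1))
        .map(|x| x * 2)
        .filter(|x| x % 3 == 0)
        .collect(); // the barrier has to buffer every element first
    buffered.sort();
    (buffered.first().copied(), pulled.get())
}

fn main() {
    let xs: Vec<i64> = (1..=100).collect();
    println!("{:?}", first_match_with_pull_count(&xs));  // 3 elements pulled
    println!("{:?}", sorted_first_with_pull_count(&xs)); // all 100 pulled
}
```

Both functions return the same value here, but the streaming version stops after three pulls while the barrier version reads the whole input.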

Sources

The most common source is a path: $.users is a source. Other shapes:

An array literal ([1,2,3].map(f))
A range ((0..10).map(f))
A method that returns a sequence ($.text.lines().map(...))

Sinks

If your final stage is a reducer, the sink is the reducer's accumulator. If it's a streaming stage, the sink collects into an array.

.collect() is the explicit sink: scalar in → [scalar], array in → identity, null in → []. Use it when you need a deterministic array shape.
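
The three cases of .collect() can be written down as a small Rust function. The `Value` enum below is a hypothetical stand-in for illustration, not jetro's `Val` type.

```rust
/// Hypothetical miniature value type (not jetro's `Val`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Number(f64),
    Array(Vec<Value>),
}

/// Models the documented shape rules of `.collect()`.
fn collect(v: Value) -> Vec<Value> {
    match v {
        Value::Null => Vec::new(),     // null in -> []
        Value::Array(items) => items,  // array in -> identity
        scalar => vec![scalar],        // scalar in -> [scalar]
    }
}

fn main() {
    println!("{:?}", collect(Value::Null));
    println!("{:?}", collect(Value::Number(7.0)));
    println!("{:?}", collect(Value::Array(vec![Value::Number(1.0)])));
}
```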

Composed stages

Adjacent stages get composed when possible: two Stages fold into one virtual call per element. This is Composed<A, B> under the hood; the optimizer fuses chains of .maps, .filters, and .picks aggressively.

User-visible effect: writing many short stages costs roughly the same as one big lambda — write for clarity.
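
The idea behind composing two stages into one call per element can be sketched with plain closures. The `compose` helper is hypothetical and says nothing about how `Composed<A, B>` is actually implemented; it only shows why two short stages and one combined stage produce the same result.

```rust
/// Hypothetical helper: fold two per-element functions into one.
fn compose<A, B, C>(f: impl Fn(A) -> B, g: impl Fn(B) -> C) -> impl Fn(A) -> C {
    move |x| g(f(x))
}

fn double(x: i64) -> i64 {
    x * 2
}

fn add_one(x: i64) -> i64 {
    x + 1
}

fn main() {
    let xs = [1, 2, 3];

    // Two separate stages.
    let two_stage: Vec<i64> = xs.iter().map(|&x| double(x)).map(add_one).collect();

    // One composed stage: a single call per element.
    let fused = compose(double, add_one);
    let one_stage: Vec<i64> = xs.iter().map(|&x| fused(x)).collect();

    assert_eq!(two_stage, one_stage);
    println!("{:?}", one_stage);
}
```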

Backend selection

Each pipeline node carries a list of preferred backends. The router tries them in order; the first to declare it can run the node wins.

Source	Preferred backends
`FieldChain` (e.g. `$.a.b.c`)	tape-view → tape-rows → materialised → val-view → interpreted
Generic expression	fast-children → interpreted
Deep search	structural index → interpreted
Single root path	tape-path → interpreted

You don't pick the backend — the planner does. But knowing they exist explains why simple queries are fast: they often run zero-copy over the simd-json tape.

When to think about pipeline shape

In practice, almost never. Two cases:

Don't sort until you have to. A pre-sort barrier defeats early termination. Push .filter, .take, .first before .sort if the semantics allow.
Avoid full materialisation in the middle. Chains of streaming stages stay zero-copy. A .collect() mid-chain forces a full pass.

The next chapter, Demand Propagation, explains why these heuristics work.

Demand Propagation

Demand propagation is the planner pass that makes "obvious" queries fast. It walks the pipeline backward — from sink to source — asking each operator: given what comes after you, what do you actually need from your source?

The answer is encoded in three lanes per stage and then used at execution time to skip work.

The three lanes

1. `PullDemand` — how many inputs?

Variant	Meaning
`All`	Read everything
`FirstInput(n)`	Stop after `n` inputs
`LastInput(n)`	Seek to the end, take last `n`
`NthInput(i)`	Jump to a single index
`UntilOutput(n)`	Keep reading until `n` outputs are produced

2. `ValueNeed` — what payload from each input?

Variant	Meaning
`None`	Don't decode the row at all
`Predicate`	Only what the predicate touches
`Projection`	Only the fields used in a projection
`Numeric`	Only numeric content
`Whole`	The full row (default pessimistic)

3. `order: bool` — does input order matter?

Some sinks (e.g. .sum()) don't care about order. The planner can use this to enable parallel-friendly access patterns when supported.

Backward walk

For a pipeline s1 → s2 → … → sN → sink, the planner does:

demand = sink_demand
for op in [sN, …, s2, s1]:        # reverse order
    upstream = op.propagate_demand(demand)
    record (op, downstream=demand, upstream)
    demand = upstream

The final demand is what the source must satisfy. The source backend chooses an access strategy that matches.

Operator laws

Every builtin declares one of these laws (in defs.rs):

Law	Effect on demand
`Identity`	Pass through unchanged (e.g. `.upper`, `.lower`)
`MapLike`	Preserve pull, force `ValueNeed::Whole`
`FilterLike`	`FirstInput(n)` becomes `UntilOutput(n)`
`TakeWhile`	Same as filter, but bounded
`UniqueLike`	Must scan until N distinct outputs
`Take(n)`	Cap pull at `FirstInput(n)`
`First`	Always `FirstInput(1)`
`Last`	Always `LastInput(1)`
`Count`	All inputs, `ValueNeed::None`
`NumericReducer`	All inputs, `ValueNeed::Numeric`

Six worked examples

A. Early termination on `.first`

$.items.map(name).first()

first() declares FirstInput(1) to its source
.map(name) is MapLike: preserves pull, demands Whole from items
Source receives: read 1 item, decode fully

Without demand: read all items, decode all, take first.

B. Bounded filter

$.items.filter(active).take(3)

take(3) ← FirstInput(3)
filter(active) ← UntilOutput(3) (read until 3 pass)
Source: read until 3 active items found

Without demand: filter the entire array, then slice.

C. Field-level projection

$.users.map(u => {id, name})

The map projection touches id and name
Source: decode only id, name from each user

Other fields are not allocated. Over a wide-record document, this is the biggest win.

D. Last-element scan

$.logs.filter(severity >= 3).last()

last() ← LastInput(1)
filter(...) ← UntilOutput(1) from the end
Source: scan backward, stop after first match

Without demand: scan forward, materialise all matches, take last.

E. Count without payloads

$.items.filter(status == "done").count()

count() declares ValueNeed::None
filter(...) declares Predicate on status
Source: decode only status, no other fields

F. Reverse + take

$.items.reverse().take(2)

take(2) ← FirstInput(2)
reverse() flips: source receives LastInput(2)
Source: seek to end, read 2 backward, then reverse
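
The backward walk for the pull lane can be replayed in a few lines of Rust. This is a simplified model under stated assumptions, not the planner's code: the `Op` enum and `propagate` function are hypothetical, only the pull lane is tracked, and the backward-scanning case from example D is left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PullDemand {
    All,
    FirstInput(usize),
    LastInput(usize),
    UntilOutput(usize),
}

/// Hypothetical operator tags covering a subset of the laws above.
#[derive(Debug, Clone, Copy)]
enum Op {
    Map,
    Filter,
    Take(usize),
    First,
    Reverse,
    Sort,
}

/// Given the demand from downstream, return what this operator asks of its source.
fn propagate(op: Op, downstream: PullDemand) -> PullDemand {
    use PullDemand::*;
    match (op, downstream) {
        (Op::Map, d) => d,                                    // MapLike: preserve pull
        (Op::Filter, FirstInput(n)) => UntilOutput(n),        // FilterLike
        (Op::Filter, d) => d,
        (Op::Take(n), FirstInput(m)) => FirstInput(n.min(m)), // tighter cap wins
        (Op::Take(n), _) => FirstInput(n),
        (Op::First, _) => FirstInput(1),
        (Op::Reverse, FirstInput(n)) => LastInput(n),         // example F
        (Op::Reverse, LastInput(n)) => FirstInput(n),
        (Op::Reverse, d) => d,
        (Op::Sort, _) => All,                                 // barrier: needs every input
    }
}

/// Walk the pipeline from sink to source, starting from an unconstrained sink.
fn source_demand(pipeline: &[Op]) -> PullDemand {
    pipeline
        .iter()
        .rev()
        .fold(PullDemand::All, |demand, &op| propagate(op, demand))
}

fn main() {
    println!("{:?}", source_demand(&[Op::Map, Op::First]));       // example A
    println!("{:?}", source_demand(&[Op::Filter, Op::Take(3)]));  // example B
    println!("{:?}", source_demand(&[Op::Reverse, Op::Take(2)])); // example F
    println!("{:?}", source_demand(&[Op::Sort, Op::Take(10)]));   // barrier
}
```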

What demand does not do

It does not change result semantics. Two pipelines with identical text produce identical output regardless of demand state.
It does not optimise across barriers (.sort, .group_by). A barrier forces All upstream — it must see every input.
It does not move work between stages. The demand pass itself never fuses or reorders operators (stage composition is a separate optimizer step); it only gates what each operator reads.

When you'll feel demand kick in

Three rough rules of thumb:

Put take/first/find near the end. That's how their pull demand reaches back to the source.
Project early when possible. map(@.field) upstream of a barrier reduces the buffered set.
Avoid unnecessary collect(). It forces full materialisation and resets the demand walk.

Demand is invisible most of the time — your queries get faster than they "should" be, and that's exactly the goal.

Lazy Evaluation and Caches

Jetro is lazy in three places that matter to users.

1. Document parsing

Jetro::from_bytes does not fully parse the document up front when the default simd-json feature is enabled. Instead it builds a tape — a flat array of tokens — and lazily decodes parts as queries demand them.

What this means:

Cold-start is ~4× faster than the legacy serde_json::Value path.
A query that touches only $.x.y decodes the rest of the doc only when asked.
Borrowed string slices (Val::StrSlice) avoid a copy when the value is read-only.

If you want eager full parsing (e.g. for serde_json::Value round-trips):

let doc: serde_json::Value = serde_json::from_slice(bytes)?;
let v = engine.collect_value(doc, "$.x")?;

2. Streaming pipelines

The pull-based pipeline backend processes one element at a time. A stage doesn't run until its downstream consumer pulls. This is what enables .first() and .find() to terminate early.

A consequence: side effects in lambdas are not guaranteed to fire for every element. (Lambdas in jetro have no I/O, so this is mostly an academic concern, but worth knowing if you write a custom builtin.)

3. Plan caches

Two caches matter:

Plan cache (per `JetroEngine`)

When you call engine.collect(&doc, query) repeatedly with the same query, the parsed AST → IR → bytecode pipeline is computed once and reused. Default capacity: 256 entries, evicted wholesale when full.

For workloads with a small fixed set of queries and many documents, this is a big speedup. For ad-hoc one-shot queries, it's a no-op.
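
A cache that is "evicted wholesale when full" can be pictured as a map that is cleared before a new entry would exceed capacity. The sketch below is an assumption about what that phrase means, not the engine's implementation; `PlanCache`, its fields, and the placeholder "plan" value are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical plan cache: query text -> compiled plan (a placeholder number here).
struct PlanCache {
    capacity: usize,
    plans: HashMap<String, usize>,
    compilations: usize, // how many times a query had to be compiled
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, plans: HashMap::new(), compilations: 0 }
    }

    fn get_or_compile(&mut self, query: &str) -> usize {
        if let Some(&plan) = self.plans.get(query) {
            return plan; // cache hit: no compilation
        }
        if self.plans.len() >= self.capacity {
            self.plans.clear(); // wholesale eviction: drop everything at once
        }
        self.compilations += 1;
        let plan = query.len(); // placeholder for parse -> IR -> bytecode
        self.plans.insert(query.to_string(), plan);
        plan
    }
}

fn main() {
    let mut cache = PlanCache::new(256);
    for _ in 0..1000 {
        cache.get_or_compile("$.users.filter(@.active).count()");
    }
    println!("compilations: {}", cache.compilations); // one compile, 999 hits
}
```

With a small fixed query set the compilation count stays at the number of distinct queries, which is the speedup described above.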

Path cache (per VM)

The bytecode VM caches resolved pointer paths per document. The cache key hashes both structure and primitive leaf values bounded at depth 8 — two documents with identical shape but different leaves produce different hashes, so the cache stays correct across calls.

You don't manage this directly. It's amortised over many queries on the same document.

When laziness backfires

It rarely does, but two pitfalls:

Forcing materialisation. Methods like .collect(), .sort(), .unique(), .group_by() are barriers — they materialise. Putting them mid-chain when they aren't needed defeats laziness.

Holding onto Vals. A Val is Arc-wrapped, so cloning is O(1), but the Arc keeps the underlying data alive. If you query a giant doc, hold onto a small projection, and let the doc go, you may be surprised that the original data is still resident — the projection's Val::StrSlices borrow into the tape.

Use .to_json() (or serde_json::Value round-trip) to disconnect a projection from the source tape when you really need to release memory.

Practical recipe

For long-lived servers:

// At startup
let engine = JetroEngine::default();

// Per request
let result = engine.collect_bytes(req_body, "$.users.filter(@.active).count()")?;

Plans get cached, parsing is lazy, the pipeline early-terminates. There's typically nothing else to tune.

Builtin Reference — Overview

Jetro ships 181 builtin methods. They fall into 18 categories. Every method has the same shape:

.method(arg1, arg2, …)

…or, when the parser routes through inline path filters and sugar:

$.path.method(...)

This part documents every method. Each entry follows the format:

name (aliases: …)

Signature: what it takes and returns

Behavior: one-paragraph description

Example: at least one minimal runnable example

Demand law / Notes: when relevant

Index

Category	What goes here	Page
Value introspection	`type`, `len`, `schema`, JSON round-trip	Introspection
Numeric scalars	`ceil`, `floor`, `round`, `abs`	Numeric
String transforms	`upper`, `trim`, `pad_*`, `slice`, `replace` …	String
String search / regex	`starts_with`, `match_*`, `captures`, `split_re`	String Search
Conversion	`to_number`, `parse_int`, `parse_bool`	Conversion
Streaming one-to-one	`map`, `enumerate`, `pairwise`, `lag`, `zscore`	Streaming
Filtering	`filter`, `find`, `compact`, `takewhile`	Filtering
Expanding	`flat_map`, `flatten`, `lines`, `chars`	Expanding
Reducers	`sum`, `count`, `any`, `max_by`	Reducers
Positional	`first`, `last`, `nth`, `collect`	Positional
Barriers	`sort`, `unique`, `group_by`, `window`	Barrier
Arrays / sets	`append`, `diff`, `union`, `zip`	Arrays
Objects	`keys`, `pick`, `merge`, `transform_values`	Objects
Path mutation	`get_path`, `set_path`, `set`, `update`	Path Mutation
Deep traversal	`deep_find`, `walk`, `rec`	Deep
Predicates	`has`, `missing`, `includes`, `index`	Predicates
Tabular	`to_csv`, `to_tsv`	Tabular
Relational	`equi_join`	Relational

Notation in this part

aliases — alternative names accepted by the parser. They lower to the same builtin and behave identically.
"demand law" — what kind of Demand this builtin propagates upstream. See Demand Propagation for the model.
"barrier" / "stream" / "scalar" — execution shape (does it buffer, stream, or run once on a single value).

When a method appears under multiple categories (e.g. .find is both a filter and positional), it lives in the most specific chapter and is cross-linked.

Sharp edges

A small set of v0.5 design choices is documented in Known Limitations: replace is single-occurrence (use replace_all for substitute-every), there is no in operator (use xs has v), and rec(fn) caps at 10 000 iterations when the step never converges (use rec(fn, cond) to bound). Two engine items remain on the fix-list: rec() no-arg and a stronger runaway-iteration guard.

Aliases at a glance

Canonical	Aliases
`any`	`exists`
`chunk`	`batch`
`drop_while`	`dropwhile`
`take_while`	`takewhile`
`includes`	`contains`
`skip`	`drop`
`sort`	`sort_by`
`unique`	`distinct`
`deep_find`	`..find` (deep-method form)
`deep_shape`	`..shape`
`deep_like`	`..like`

These pairs are interchangeable. Pick whichever reads better.

Value Introspection

Methods that report on the kind and shape of a value, plus JSON round-trip.

`type`

Signature: Any -> String
Behavior: Returns the kind of value as a string: "null", "bool", "number", "string", "array", "object".

QUERY:  $.x.type()
DOC:    {"x": [1,2,3]}
OUT:    "array"

`len`

Signature: (String|Array|Object) -> Number
Behavior: Length: chars for strings, elements for arrays, key count for objects. Errors on null/bool/number.

DOC:    {"s": "hello", "xs": [1,2,3], "o": {"a":1,"b":2}}

QUERY:  $.s.len()     OUT: 5
QUERY:  $.xs.len()     OUT: 3
QUERY:  $.o.len()     OUT: 2

`to_string`

Signature: Any -> String
Behavior: Stringifies a scalar (42 → "42", true → "true", null → "null"). For arrays/objects, returns the JSON serialisation.

QUERY:  42.to_string()     OUT: "42"
QUERY:  ([1, 2]).to_string()     OUT: "[1,2]"

`to_json`

Signature: Any -> String
Behavior: Compact JSON serialisation of any value.

QUERY:  $.user.to_json()

Distinguish from to_string: for compound values, the two are equivalent; for scalars, to_json always quotes strings ("foo" → "\"foo\""), to_string does not.

`from_json`

Signature: String -> Any
Behavior: Parse a JSON string into a value.

QUERY:  '{"x":1}'.from_json()
OUT:    {"x":1}

QUERY:  $.encoded.from_json().x

Errors on malformed input. Wrap in try if the source is untrusted:

try $.s.from_json() else null

`schema`

Signature: Any -> Object
Behavior: Infers a schema sketch — keys, kinds, nullable flags. Useful for "what does this document look like?" probes.

DOC:    [{"id": 1, "name": "a"}, {"id": 2, "name": null}]
QUERY:  $.schema()
OUT:    {"items":{"fields":{"id":{"type":"Int"},"name":{"nullable":true,"type":"String"}},"required":["id"],"type":"Object"},"len":2,"type":"Array"}

The exact output format is documented in builtins/ops/schema.rs; treat it as advisory rather than a stable contract.

Demand notes

len over an array is ValueNeed::None upstream — it doesn't decode rows.
type is Identity demand-wise.
from_json/to_json are scalar transforms with no demand interaction.

Practical examples

# Quick shape check
$.payload.type()                        # → "object"
$.payload.len()                         # for object: number of keys

# Distinguish array length vs string length
$.items.len()                           # array element count
$.title.len()                           # number of characters

# Safe deserialization of a payload field
try $.body.from_json() else null

# Compact serialization
$.event.to_json()

# Stringify any value
$.x.to_string()

# Probe an unknown payload's schema
$.events[0].schema()

Numeric Scalars

Fixture

Examples below run against:

DOC:    {"products": [{"id": 1, "price": 3.7}, {"id": 2, "price": 4.2}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "deltas": [-1, 2, -3, 4], "xs": [1, 2, 3, 4, 5]}

Pure scalar transforms over numbers.

`ceil`

Signature: Number -> Number
Behavior: Smallest integer ≥ x.

QUERY:  3.2.ceil()     OUT: 4
QUERY:  (-3.2).ceil() OUT: -3

`floor`

Signature: Number -> Number
Behavior: Largest integer ≤ x.

QUERY:  3.7.floor()     OUT: 3
QUERY:  (-3.7).floor() OUT: -4

`round`

Signature: Number -> Number
Behavior: Round to nearest; ties round half-away-from-zero.

QUERY:  3.5.round()     OUT: 4
QUERY:  3.4.round()     OUT: 3
QUERY:  (-3.5).round() OUT: -4

`abs`

Signature: Number -> Number
Behavior: Absolute value.

QUERY:  (-7).abs()     OUT: 7
QUERY:  3.5.abs()     OUT: 3.5
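
For readers checking results on the host side, Rust's standard `f64` methods follow the same rules listed above, including rounding ties away from zero. Whether jetro delegates to these exact methods is an assumption; the helper name below is hypothetical.

```rust
/// Returns (ceil, floor, round, abs) for one input using Rust's f64 methods.
fn numeric_table(x: f64) -> (f64, f64, f64, f64) {
    (x.ceil(), x.floor(), x.round(), x.abs())
}

fn main() {
    // f64::round rounds half-way cases away from zero.
    for x in [3.2, 3.5, -3.2, -3.5, -3.7] {
        println!("{:>5}: {:?}", x, numeric_table(x));
    }
}
```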

Mapping over arrays

These are scalar; lift them with .map:

DOC:    {"xs": [1.4, 2.6, -3.5]}

QUERY:  $.xs.map(@.round())
OUT:    [1,3,-4]

QUERY:  $.xs.map(@.abs()).sum()
OUT:    7.5

Practical examples

# Round every price up to the nearest dollar
$.products.map(p => p.merge({price_ceil: p.price.ceil()}))

# Percent → integer percent
$.metric.pct.map(@ * 100).map(@.round())

# Magnitudes (drop sign)
$.deltas.map(@.abs())

# Banker-style splits
$.amount.floor()                   # cents component, etc.

# Build a histogram with binned values
$.measurements.map(m => (m / 10).floor() * 10).count_by(@)
# → {0: 12, 10: 5, 20: 3, ...}

String Transforms

Scalar string operations. Lift with .map to apply to an array of strings.

Case

Method	What	Example
`upper`	ASCII uppercase	`"foo".upper()` → `"FOO"`
`lower`	ASCII lowercase	`"FOO".lower()` → `"foo"`
`capitalize`	First char upper, rest lower	`"foo bar".capitalize()` → `"Foo bar"`
`title_case`	Each word capitalised	`"foo bar".title_case()` → `"Foo Bar"`
`snake_case`	Words lowercased and joined with `_`	`"FooBar".snake_case()` → `"foo_bar"`
`kebab_case`	Words joined with `-`	`"FooBar".kebab_case()` → `"foo-bar"`
`camel_case`	`fooBar` style	`"foo_bar".camel_case()` → `"fooBar"`
`pascal_case`	`FooBar` style	`"foo_bar".pascal_case()` → `"FooBar"`
`reverse_str`	Reverse char order	`"abc".reverse_str()` → `"cba"`

Trim

Method	What
`trim`	Strip whitespace from both ends
`trim_left`	Strip leading whitespace
`trim_right`	Strip trailing whitespace

QUERY:  "  hi  ".trim()     OUT: "hi"
QUERY:  "  hi  ".trim_left()     OUT: "hi  "

Padding and centering

Method	Behavior	Example
`pad_left(width, char?)`	Right-align by padding left	`"7".pad_left(3, "0")` → `"007"`
`pad_right(width, char?)`	Left-align by padding right	`"hi".pad_right(5)` → `"hi   "`
`center(width, char?)`	Center within width	`"hi".center(6)` → `"  hi  "`

If char is omitted, space is used.
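
The padding rules can be sketched in Rust as follows. These helpers are hypothetical models, not jetro's builtins. Two details are assumptions the text above does not settle: a string already wider than `width` is returned unchanged, and `center` puts any odd leftover character on the right.

```rust
/// Build a run of `n` fill characters.
fn padding(fill: char, n: usize) -> String {
    std::iter::repeat(fill).take(n).collect()
}

fn pad_left(s: &str, width: usize, fill: char) -> String {
    let missing = width.saturating_sub(s.chars().count());
    format!("{}{}", padding(fill, missing), s)
}

fn pad_right(s: &str, width: usize, fill: char) -> String {
    let missing = width.saturating_sub(s.chars().count());
    format!("{}{}", s, padding(fill, missing))
}

fn center(s: &str, width: usize, fill: char) -> String {
    let missing = width.saturating_sub(s.chars().count());
    let left = missing / 2;
    let right = missing - left; // assumption: the extra character goes right
    format!("{}{}{}", padding(fill, left), s, padding(fill, right))
}

fn main() {
    println!("[{}]", pad_left("7", 3, '0'));
    println!("[{}]", pad_right("hi", 5, ' '));
    println!("[{}]", center("hi", 6, ' '));
}
```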

Indent / dedent

indent(n) takes an integer and prefixes every line with n spaces; the prefix is always spaces and cannot be changed.

QUERY:  "line1\nline2".indent(2)
OUT:    "  line1\n  line2"

dedent() strips the first line's leading whitespace from every subsequent line that begins with the same prefix. It is not a common-prefix dedent across all lines:

QUERY:  "  a\n  b".dedent()
OUT:    "a\nb"
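
The difference from a common-prefix dedent is easiest to see in code. The function below is a sketch of the rule as described above (the first line's leading whitespace is the prefix, and it is stripped only from lines that start with it); it is not jetro's implementation.

```rust
/// Strip the first line's leading whitespace from every line that begins with it.
fn dedent(text: &str) -> String {
    // `split` always yields at least one piece, even for an empty string.
    let first = text.split('\n').next().unwrap_or("");
    let prefix_len = first.len() - first.trim_start().len();
    let prefix = &first[..prefix_len];

    text.split('\n')
        .map(|line| line.strip_prefix(prefix).unwrap_or(line)) // other lines stay as-is
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("{:?}", dedent("  a\n  b"));
    // The first line has four spaces, the second only two, so the second
    // line does not start with the prefix and is left untouched.
    println!("{:?}", dedent("    a\n  b"));
}
```

A common-prefix dedent would turn the second input into "  a\nb"; the first-line rule yields "a\n  b" instead.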

Slice

"hello world".slice(0, 5)      # "hello"
"hello world".slice(6)         # "world"
"hello".slice(-3)              # "llo"

slice(start, end?) mirrors Python; end is exclusive.
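
Python-style slicing over characters can be modelled like this. The function is a hypothetical sketch: it counts `char`s rather than bytes and clamps out-of-range indices the way Python does, which is an assumption about jetro's edge-case behavior.

```rust
/// Python-like slice over characters: negative indices count from the end,
/// `end` is exclusive, and out-of-range indices are clamped.
fn slice(s: &str, start: isize, end: Option<isize>) -> String {
    let chars: Vec<char> = s.chars().collect();
    let len = chars.len() as isize;

    let normalise = |i: isize| -> usize {
        let i = if i < 0 { i + len } else { i };
        i.clamp(0, len) as usize
    };

    let from = normalise(start);
    let to = normalise(end.unwrap_or(len));
    if from >= to {
        String::new()
    } else {
        chars[from..to].iter().collect()
    }
}

fn main() {
    println!("{}", slice("hello world", 0, Some(5)));
    println!("{}", slice("hello world", 6, None));
    println!("{}", slice("hello", -3, None));
}
```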

Repeat

"ab".repeat(3)        # "ababab"

Replace

Method	Behavior
`replace(needle, with)`	Replace first literal occurrence
`replace_all(needle, with)`	Replace all literal occurrences
`replace_re(pattern, with)`	Regex-aware single replacement
`replace_all_re(pattern, with)`	Regex-aware all replacements

QUERY:  "hello hello".replace("hello", "hi")
OUT:    "hi hello"

QUERY:  "hello hello".replace_all("hello", "hi")
OUT:    "hi hi"

QUERY:  "abc123def".replace_all_re("\d+", "#")
OUT:    "abc#def"

Regex escapes inside jetro string literals. Use a single backslash: "\d", "\w+", "\s". Jetro string literals pass backslashes through to the regex engine unchanged, so doubling ("\\d") hands the engine the literal sequence \\d. That pattern means an escaped backslash followed by the letter d, not the digit class, and it silently fails to match. This differs from host languages like Python or JavaScript, where a regular (non-raw) string literal needs the backslash doubled.
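
When a query is embedded in Rust source, the host language's own escaping sits on top of this rule. A raw string literal keeps the query text exactly as jetro should see it. The function names below are hypothetical; the sketch only compares the resulting query strings and does not call the engine.

```rust
/// Raw string literal: the single backslash reaches jetro unchanged.
fn raw_query() -> &'static str {
    r#""a1b2".match_all("\d+")"#
}

/// Regular literal: Rust needs `\\` to produce one backslash and `\"` for quotes.
fn escaped_query() -> &'static str {
    "\"a1b2\".match_all(\"\\d+\")"
}

/// Doubling inside a raw literal puts two backslashes into the query text,
/// which is the over-escaped form described above.
fn over_escaped_query() -> &'static str {
    r#""a1b2".match_all("\\d+")"#
}

fn main() {
    println!("{}", raw_query());
    println!("{}", escaped_query());
    println!("{}", over_escaped_query());
}
```

The first two functions produce identical text with one backslash; the third carries two.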

Strip

"prefix-foo".strip_prefix("prefix-")  # "foo"
"foo.txt".strip_suffix(".txt")        # "foo"

If the prefix/suffix isn't present, returns the input unchanged.

Encoding

Method	What
`to_base64`	Standard base64 encode
`from_base64`	Standard base64 decode
`url_encode`	Percent-encode
`url_decode`	Percent-decode
`html_escape`	`&` → `&amp;`, `<` → `&lt;`, etc.
`html_unescape`	Reverse of `html_escape`

QUERY:  "hello world".to_base64()     OUT: "aGVsbG8gd29ybGQ="
QUERY:  "a b".url_encode()     OUT: "a%20b"
QUERY:  "<b>".html_escape()     OUT: "&lt;b&gt;"

Demand notes

All string transforms are Identity demand-wise: they don't change what the upstream needs to produce.

Practical examples

# Normalise display names
$.users.map(u => u.name.trim().title_case())

# Build a URL-safe slug
"My Article Title".lower().replace_all(" ", "-")
# → "my-article-title"

# CamelCase to snake_case migration
"FooBarBaz".snake_case()                # → "foo_bar_baz"

# Truncate with ellipsis
$.posts.map(p => p.body.slice(0, 100) + "..." if p.body.len() > 100 else p.body)

# Parse a comma-separated tag list
$.tags_csv.split(",").map(@.trim())

# Encode for URL
$.query.url_encode()

# Encode binary as base64
$.bytes.to_base64()

# HTML-escape user input
$.comments.map(c => c.text.html_escape())

# Pad a numeric ID for fixed-width keys
($.id as string).pad_left(8, "0")
# → "00000042" for id=42

# Strip a known prefix
"https://example.com/path".strip_prefix("https://")
# → "example.com/path"

# Build a banner
"=".repeat(40)                          # → "========================================"

# Indent a nested message
$.message.indent(4)

String Search and Regex

Predicates (return boolean)

Method	Behavior
`is_blank`	True if empty or only whitespace
`is_numeric`	True if all chars are digits
`is_alpha`	True if all chars are letters
`is_ascii`	True if all bytes < 128
`starts_with(prefix)`	Prefix check
`ends_with(suffix)`	Suffix check

QUERY:  "  ".is_blank()     OUT: true
QUERY:  "abc123".is_numeric()     OUT: false
QUERY:  "hello".starts_with("he")     OUT: true

Position

Method	Returns
`index_of(needle)`	First index of `needle`, or `-1`
`last_index_of(needle)`	Last index of `needle`, or `-1`

QUERY:  "hello world".index_of("o")     OUT: 4
QUERY:  "hello world".last_index_of("o")     OUT: 7

Substring search

"foo bar foo".matches("foo")    # 2 (count of literal occurrences)
"abc 12 cd 34".scan("\d+")     # ["12", "34"] (regex matches as strings)

Regex match

Method	Returns
`re_match(pattern)`	Boolean
`match_first(pattern)`	First match string, or null
`match_all(pattern)`	Array of all match strings
`captures(pattern)`	First match with groups: `[full, g1, g2, …]`
`captures_all(pattern)`	Array of `captures` results

QUERY:  "a1b2".re_match("\d")     OUT: true
QUERY:  "a1b2".match_first("\d+")     OUT: "1"
QUERY:  "a1b2".match_all("\d+")     OUT: ["1","2"]

QUERY:  "key=val".captures("(\w+)=(\w+)")
OUT:    ["key=val","key","val"]

The ~= operator is sugar for re_match and returns the same boolean.

Splitting

Method	Behavior
`split(sep)`	Split on literal separator
`split_re(pattern)`	Split on regex

QUERY:  "a,b,c".split(",")     OUT: ["a","b","c"]
QUERY:  "a,,b".split_re(",+")     OUT: ["a","b"]

Multi-needle membership

"abc def".contains_any(["abc", "xyz"])    # true (matches first)
"abc def".contains_all(["abc", "def"])    # true (all match)

Demand notes

Regex builtins are scalar. Lift across an array with .map(...). The underlying regex is compiled once per query and reused — no per-element re-compilation cost.

Conversion and Parsing

Coerce between value kinds.

`to_number`

Signature: Any -> Number | null
Behavior: Coerce to number. "42" → 42, "3.14" → 3.14, true → 1, false → 0. Returns null for unparseable strings.

QUERY:  "42".to_number()     OUT: 42
QUERY:  "3.14".to_number()     OUT: 3.14
QUERY:  "abc".to_number()      OUT: null

`to_bool`

Signature: Any -> Boolean
Behavior: Truthiness: false/null/0/""/[]/{} → false, everything else → true.

QUERY:  $.maybe.to_bool()

`parse_int(radix?)`

Signature: String -> Number | null
Behavior: Parse a string as integer, optional radix (default 10).

QUERY:  "42".parse_int()     OUT: 42
QUERY:  "ff".parse_int(16)     OUT: 255
QUERY:  "0b101".parse_int(2)     OUT: 5

`parse_float`

Signature: String -> Number | null
Behavior: Parse a string as float (IEEE 754 double).

QUERY:  "3.14".parse_float()     OUT: 3.14
QUERY:  "1e6".parse_float()     OUT: 1000000.0

`parse_bool`

Signature: String -> Boolean | null
Behavior: Strict parse: only "true" and "false" (lowercase) match; everything else returns null.

QUERY:  "true".parse_bool()     OUT: true
QUERY:  "TRUE".parse_bool()     OUT: null

`as` cast (operator)

The as operator does the same coercions as to_*:

"42" as int          # 42
42 as string         # "42"
true as int          # 1

Use as when the type is statically known; use to_number/parse_* when parsing untrusted strings (since as errors on failure rather than returning null).
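
In host-language terms, the two styles correspond to a parser that returns an optional value and one that returns an error. The sketch below models that distinction with Rust's standard parsers; the function names are hypothetical, and Rust's accepted number formats are not guaranteed to match jetro's.

```rust
use std::num::ParseIntError;

/// Models `to_number` / `parse_*`: failure becomes "null" (None).
fn to_number(s: &str) -> Option<f64> {
    s.parse::<f64>().ok()
}

/// Models `as int`: failure is an error the caller must handle.
fn cast_int(s: &str) -> Result<i64, ParseIntError> {
    s.parse::<i64>()
}

fn main() {
    // Null-on-failure pairs naturally with a default, like `?? 0` in a query.
    println!("{}", to_number("abc").unwrap_or(0.0));

    // Error-on-failure needs explicit handling, like `try … else …`.
    match cast_int("abc") {
        Ok(n) => println!("{}", n),
        Err(e) => println!("cast failed: {}", e),
    }
}
```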

Round-trip JSON

For full document round-trip, see from_json/to_json.

Practical examples

# Coerce strings collected from a CSV
$.rows.map(r => r.merge({age: r.age.to_number(), price: r.price.parse_float()}))

# Defensive parse — null on garbage
$.user_input.parse_int() ?? 0

# Boolean coercion of a flag string
"true".parse_bool() ?? false

# Truthiness coercion
$.value.to_bool()               # null/0/""/empty → false; else true

# Cast operator for static conversions
($.id as string).pad_left(8, "0")

# Round-trip number → string → back
(3.14 as string).parse_float()  # → 3.14

Streaming One-to-One

Each input produces exactly one output. These compose freely; the planner fuses adjacent stages into a single composed stage when possible.

Fixture

Examples in this chapter run against:

{
  "users": [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}],
  "xs":    [1, 2, 3, 4, 5],
  "prices":[100, 105, 102, 110, 108, 115]
}

`map`

Signature: Array<A> -> Array<B> (with f: A -> B)
Demand law: MapLike — preserves pull, forces Whole.

QUERY:  $.users.map(u => u.name)
OUT:    ["Ada","Bob"]

QUERY:  $.xs.map(@ * 2)
OUT:    [2, 4, 6, 8, 10]

QUERY:  $.users.map(@.name.upper())
OUT:    ["ADA","BOB"]

map is the workhorse. The lambda may use any of the three forms.

`enumerate`

Signature: Array<A> -> Array<{index: Number, value: A}>
Behavior: Pair each element with its zero-based index. Output is a record {index, value} per element.

QUERY:  $.xs.enumerate()
OUT:    [{"index":0,"value":1},{"index":1,"value":2},{"index":2,"value":3},{"index":3,"value":4},{"index":4,"value":5}]

QUERY:  $.users.map(@.name).enumerate()
OUT:    [{"index":0,"value":"Ada"},{"index":1,"value":"Bob"}]

`pairwise`

Signature: Array<A> -> Array<[A, A]>
Behavior: Yield consecutive pairs [xs[0], xs[1]], [xs[1], xs[2]], …

QUERY:  [1,2,3,4].pairwise()
OUT:    [[1,2],[2,3],[3,4]]

QUERY:  $.xs.pairwise().map(p => p[1] - p[0])
OUT:    [1, 1, 1, 1]

`lag(n=1)` and `lead(n=1)`

Signature: Array<Number> -> Array<Number | null>
Behavior: Shift by n positions; out-of-range positions become null.
Numeric: Output values are returned as floats regardless of input numeric type.

QUERY:  $.xs.lag()
OUT:    [null, 1.0, 2.0, 3.0, 4.0]

QUERY:  $.xs.lead()
OUT:    [2.0, 3.0, 4.0, 5.0, null]

QUERY:  $.xs.lag(2)
OUT:    [null, null, 1.0, 2.0, 3.0]

`diff_window(n=1)`

Signature: Array<Number> -> Array<Number | null>
Behavior: xs[i] - xs[i - n], with null until lag is satisfied.

QUERY:  $.prices.diff_window()
OUT:    [null, 5.0, -3.0, 8.0, -2.0, 7.0]
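
The shift and difference rules are small enough to state directly in Rust. These functions are illustrative models of the documented behavior, with `None` standing in for null; they are not jetro's implementation.

```rust
/// `lag(n)`: element i becomes xs[i - n], or None while i < n.
fn lag(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some(xs[i - n]) } else { None })
        .collect()
}

/// `diff_window(n)`: xs[i] - xs[i - n], or None until the lag is satisfied.
fn diff_window(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some(xs[i] - xs[i - n]) } else { None })
        .collect()
}

fn main() {
    let prices = [100.0, 105.0, 102.0, 110.0, 108.0, 115.0];
    println!("{:?}", lag(&prices, 1));
    println!("{:?}", diff_window(&prices, 1));
}
```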

`pct_change(n=1)`

Signature: Array<Number> -> Array<Number | null>
Behavior: (xs[i] - xs[i-n]) / xs[i-n] — relative change.

QUERY:  [100.0, 110.0, 121.0].pct_change()
OUT:    [null, 0.1, 0.09999999999999998]

`cummax` and `cummin`

Signature: Array<Number> -> Array<Number>
Behavior: Running max / min up to and including the current position.

QUERY:  $.prices.cummax()
OUT:    [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]

QUERY:  $.prices.cummin()
OUT:    [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]

`zscore`

Signature: Array<Number> -> Array<Number>
Behavior: Standardise: (x - mean) / stddev. Two passes (one for stats, one for transform); not strictly streaming, but presented as a one-to-one stage at the user surface.

QUERY:  [1.0, 2.0, 3.0, 4.0, 5.0].zscore()
OUT:    [-1.414213562373095, -0.7071067811865475, 0.0, 0.7071067811865475, 1.414213562373095]
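
The two passes can be written out as below. The sketch divides by n (population standard deviation), which is inferred from the example output above rather than stated by the documentation; empty or constant input is not handled and would produce NaN.

```rust
/// Two-pass z-score using the population standard deviation (an inference
/// from the documented example, not a confirmed implementation detail).
fn zscore(xs: &[f64]) -> Vec<f64> {
    let n = xs.len() as f64;

    // Pass 1: statistics.
    let mean = xs.iter().sum::<f64>() / n;
    let variance = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let stddev = variance.sqrt();

    // Pass 2: transform each element.
    xs.iter().map(|x| (x - mean) / stddev).collect()
}

fn main() {
    println!("{:?}", zscore(&[1.0, 2.0, 3.0, 4.0, 5.0]));
}
```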

`accumulate`

See Barriers — accumulate is a barrier because it requires a custom reducer over the full input.

Practical examples

DOC:    {"prices":[100, 105, 102, 110, 108, 115]}

# Apply tax to every price
QUERY:  $.prices.map(@ * 1.08)
OUT:    [108.0, 113.4, 110.16000000000001, 118.80000000000001, 116.64000000000001, 124.2]

# Day-over-day deltas
QUERY:  [100,105,102,110,108].pairwise().map(p => p[1] - p[0])
OUT:    [5, -3, 8, -2]

# Running max ("high-water mark")
QUERY:  $.prices.cummax()
OUT:    [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]

# Lag-1 to compare current vs previous
QUERY:  $.prices.lag()
OUT:    [null, 100.0, 105.0, 102.0, 110.0, 108.0]

Filtering

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "xs": [1, 2, 3, 4, 5]}

Methods that drop elements based on a predicate.

`filter`

Signature: Array<A> -> Array<A> (with pred: A -> Bool)
Demand law: FilterLike — FirstInput(n) from downstream becomes UntilOutput(n) upstream.

$.users.filter(u => u.active)
$.users.filter(@.age >= 18)
$.users.filter(@.email ~= "@admin\.")

filter is the universal predicate stage. Combine with .take(n) for bounded scans:

$.events.filter(@.sev >= 3).take(10)

The planner stops reading from the source as soon as 10 events pass — no full scan.

`find`

Signature: Array<A> -> A | null (first match only)
Demand law: FilterLike with FirstInput(1) → source.

DOC:    {"users": [{"id":1,"role":"user"},{"id":2,"role":"admin"}]}
QUERY:  $.users.find(@.role == "admin")
OUT:    {"id":2,"role":"admin"}

find returns the first match (or null if none), not an array. Use find_all for the array form.

`find_all`

Signature: Array<A> -> Array<A>
Behavior: Like filter. Alias kept for readability.

$.users.find_all(@.role == "admin")

Equivalent to .filter(@.role == "admin"). The two are interchangeable.

`compact`

Signature: Array<Any> -> Array<Any>
Behavior: Drop nulls.

QUERY:  [1, null, 2, null, 3].compact()
OUT:    [1,2,3]

Equivalent to .filter(@ != null), but reads better and avoids a lambda.

`take_while` (alias `takewhile`)

Signature: Array<A> -> Array<A>
Behavior: Take elements while pred is true; stop at the first false (don't keep checking).

QUERY:  [1, 2, 3, 4, 1, 2].take_while(@ < 3)
OUT:    [1,2]

Demand law: bounded — terminates the source as soon as pred flips.

`drop_while` (alias `dropwhile`)

Signature: Array<A> -> Array<A>
Behavior: Drop the leading run where pred holds; emit the rest.

QUERY:  [1, 2, 3, 4, 1, 2].drop_while(@ < 3)
OUT:    [3,4,1,2]
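
For comparison, Python's standard library has the same pair of operations. This sketch uses `itertools.takewhile` and `itertools.dropwhile` to reproduce the two examples above; it is an analogy for the semantics, not a description of jetro internals.

```python
from itertools import dropwhile, takewhile

xs = [1, 2, 3, 4, 1, 2]

# take_while: stop at the first element that fails the predicate.
print(list(takewhile(lambda x: x < 3, xs)))  # [1, 2]

# drop_while: skip the leading run that passes, then emit everything else,
# including later elements that would pass the predicate again.
print(list(dropwhile(lambda x: x < 3, xs)))  # [3, 4, 1, 2]
```

Note that neither operation is a filter: the trailing 1 and 2 are excluded by take_while and kept by drop_while, purely because of their position.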

`remove`

Signature: Array<A> -> Array<A>
Behavior: Inverse of filter. Drop elements where pred is true.

QUERY:  $.xs.remove(@ < 0)

Useful when the negated predicate reads worse than the affirmative.

Filtering objects

For object filtering, see filter_keys and filter_values in Objects. They take a predicate over keys / values and return a filtered object.

Practical examples

DOC:    {"users":[
  {"id":1,"name":"Ada","active":true,"age":30},
  {"id":2,"name":"Bob","active":false,"age":24},
  {"id":3,"name":"Cy", "active":true,"age":42}
]}

# Active users only
QUERY:  $.users.filter(@.active)
OUT:    [{"id":1,"name":"Ada","active":true,"age":30},{"id":3,"name":"Cy","active":true,"age":42}]

# Active users over 30, just names
QUERY:  $.users.filter(@.active and @.age >= 30).map(@.name)
OUT:    ["Ada","Cy"]

# First active user (early-exit)
QUERY:  $.users.find(@.active).name
OUT:    "Ada"

# Take while a streak holds
QUERY:  [1,2,3,4,1,2].take_while(@ < 3)
OUT:    [1,2]

# Negate a predicate
QUERY:  $.users.remove(@.active).count()
OUT:    1

# Drop nulls
QUERY:  [1, null, 2, null, 3].compact()
OUT:    [1,2,3]

Worked demand example

DOC:    {"events": [
  {"sev": 1, "msg": "ok"},
  {"sev": 2, "msg": "warn"},
  {"sev": 3, "msg": "err"},
  {"sev": 1, "msg": "ok2"}
]}

QUERY:  $.events.filter(@.sev >= 2).map(@.msg).take(2)
OUT:    ["warn","err"]

Demand walks back: take(2) → FirstInput(2), map → preserves, filter → UntilOutput(2). Source reads events one-by-one, stops after the second match.
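
The early exit can be imitated with Python generators, which are also pull-driven. The sketch below is an analogy, not jetro's planner: the names `run_pipeline` and `source` are hypothetical, and the read counter exists only to make the stop point visible.

```python
from itertools import islice

EVENTS = [
    {"sev": 1, "msg": "ok"},
    {"sev": 2, "msg": "warn"},
    {"sev": 3, "msg": "err"},
    {"sev": 1, "msg": "ok2"},
]


def run_pipeline(events, limit=2):
    reads = 0

    def source():
        nonlocal reads
        for e in events:
            reads += 1  # count every element pulled from the source
            yield e

    matches = (e for e in source() if e["sev"] >= 2)  # filter
    msgs = (e["msg"] for e in matches)                # map
    out = list(islice(msgs, limit))                   # take(2)
    return out, reads


print(run_pipeline(EVENTS))  # (['warn', 'err'], 3)
```

Three of the four events are read: the first is rejected by the filter, the next two satisfy it, and the fourth is never touched.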

Expanding Sequences

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}]}

Each input produces zero or many outputs.

`flat_map`

Signature: Array<A> -> Array<B> (with f: A -> Array<B>)
Behavior: Map then concatenate.

QUERY:  [[1,2],[3,4]].flat_map(@)
OUT:    [1,2,3,4]

QUERY:  $.users.flat_map(u => u.tags)

If f returns a non-array, it's wrapped first (flat_map(@ + 1) works on numbers).
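
The map-then-concatenate rule, including the wrapping of non-array results, can be modelled in a few lines of Python. This is a sketch of the documented behavior only; how jetro treats null results is not covered here.

```python
def flat_map(xs, f):
    """Apply f to each element and concatenate; wrap non-list results first."""
    out = []
    for x in xs:
        y = f(x)
        out.extend(y if isinstance(y, list) else [y])
    return out


print(flat_map([[1, 2], [3, 4]], lambda x: x))  # [1, 2, 3, 4]
print(flat_map([1, 2, 3], lambda x: x + 1))     # [2, 3, 4]
```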

`flatten`

Signature: Array<Array<A>> -> Array<A>
Behavior: One level of flattening.

QUERY:  [[1,2],[3],[4,5]].flatten()
OUT:    [1,2,3,4,5]

To flatten more levels, chain: .flatten().flatten(). Or use walk for full recursive flatten of arbitrary structure.

`explode`

⚠ v0.5 status: explode requires an argument in v0.5 (a no-argument call errors with "explode: missing argument"). The spec is intended to mirror chars / to_pairs for the common cases; until the no-argument form is supported, use those builtins directly.

Signature (intended): (Array | Object | String) -> Array<...>
Behavior (intended): Convert to a flat sequence of elements / pairs / chars.
- Array: identity
- Object: array of [key, value] pairs (= to_pairs)
- String: array of single-char strings (= chars)

`split(sep)`

Signature: String -> Array<String>
Behavior: Split a string on a literal separator. (See split_re for regex.)

QUERY:  "a,b,c".split(",")
OUT:    ["a","b","c"]

`lines`

Signature: String -> Array<String>
Behavior: Split on newline (\n or \r\n).

QUERY:  "a\nb\nc".lines()
OUT:    ["a","b","c"]

`words`

Signature: String -> Array<String>
Behavior: Split on whitespace (any run).

QUERY:  "  hello  world  ".words()
OUT:    ["hello","world"]

`chars`

Signature: String -> Array<String>
Behavior: Array of single-character strings.

QUERY:  "abc".chars()
OUT:    ["a","b","c"]

`chars_of(s)`

Signature: String -> Array<String>
Behavior: Equivalent to s.chars(). Useful when the source is the argument:

QUERY:  ($.text).chars_of()

`bytes`

Signature: String -> Array<Number>
Behavior: UTF-8 byte values, 0–255.

QUERY:  "abc".bytes()
OUT:    [97,98,99]

Demand notes

Expanding stages declare an indeterminate output count. Pull demand from downstream still flows back, but the planner can't tightly bound how many inputs are needed — it pulls one input at a time and yields outputs lazily.

.flat_map(...) followed by .first() will read inputs until the first flat-mapped output appears, then stop.

Practical examples

# Flatten one level
[[1,2],[3,4],[5]].flatten()                # → [1, 2, 3, 4, 5]

# Tags across all books
$.books.flat_map(@.tags)

# Distinct hashtags across tweets
$.tweets.flat_map(t => t.entities.hashtags.map(@.text)).unique()

# Word histogram from a paragraph
$.text.words().map(@.lower()).count_by(@)

# Parse CSV headers
"id,name,email".split(",")

# Process logs line by line
$.log_blob.lines().filter(@.contains_any(["ERROR","WARN"]))

# Char-level analysis
$.password.chars().count_by(@)             # frequency of each char

# Bytes for a binary diff
"hello".bytes()                            # → [104, 101, 108, 108, 111]

Reducers and Aggregates

Reducers consume the whole stream and emit a single value. They terminate the streaming pipeline.

Numeric

Method	Signature	Notes
`sum`	`Array<Number> -> Number`	Empty → `0`
`avg`	`Array<Number> -> Number`	Empty → `null`
`min`	`Array<Number | String> -> ...`	Empty → `null`
`max`	`Array<Number | String> -> ...`	Empty → `null`

QUERY:  [1,2,3,4].sum()     OUT: 10
QUERY:  [1,2,3,4].avg()     OUT: 2.5
QUERY:  [3,1,4,1,5].min()     OUT: 1.0
QUERY:  ["b","a","c"].max()   OUT: "c"

Demand law: NumericReducer — ValueNeed::Numeric, pull = All.

`count`

Signature: Array -> Number
Behavior: Element count.
Demand law: All inputs, ValueNeed::None (no payload decoded).

QUERY:  $.users.count()
QUERY:  $.users.filter(@.active).count()

This is the cheapest reducer — the source skips deserialisation entirely.

`approx_count_distinct`

⚠ Not yet supported in v0.5 — runtime returns "ApproxCountDistinct: builtin unsupported". Spec exists; HyperLogLog backend pending.

Signature (planned): Array<Any> -> Number
Behavior (planned): Approximate count of distinct values via HLL.

For now, use .unique().count() for exact distinct count.

`any` (alias `exists`)

Signature: Array<A> -> Bool (with pred: A -> Bool)
Behavior: True if any element matches. Short-circuits.

DOC:    {"users": [{"id":1,"role":"user"},{"id":2,"role":"admin"}]}
QUERY:  $.users.any(@.role == "admin")
OUT:    true

`all`

Signature: Array<A> -> Bool
Behavior: True if every element matches. Short-circuits on first false.

QUERY:  $.flags.all(@ == true)

`find_index`

Signature: Array<A> -> Number | null
Behavior: Zero-based index of first match, or null.

QUERY:  ["a","b","c"].find_index(@ == "b")
OUT:    1

`indices_where`

Signature: Array<A> -> Array<Number>
Behavior: All indices where pred matches.

QUERY:  [10, 20, 5, 30, 8].indices_where(@ < 15)
OUT:    [0,2,4]

`max_by` and `min_by`

Signature: Array<A> -> A | null
Behavior: Element with the maximum / minimum projected key.

QUERY:  $.books.max_by(@.year)
QUERY:  $.users.min_by(@.age)

Distinguish from .sort(@.key).last() (or .first() for the minimum): max_by and min_by make a single pass, while the sort form materialises the whole sorted array first.
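
The cost difference has a direct Python analogue: `max(..., key=...)` is a single pass that keeps one candidate, while sorting builds a full sorted copy before the last element is picked. The sketch below is illustrative and uses a small hypothetical `books` list; tie-breaking in jetro is not specified here.

```python
books = [
    {"title": "Dune", "year": 1965},
    {"title": "Foundation", "year": 1951},
    {"title": "Hyperion", "year": 1989},
    {"title": "Snow Crash", "year": 1992},
]


def max_by(xs, key):
    """Single pass; returns None for an empty input."""
    return max(xs, key=key, default=None)


def min_by(xs, key):
    return min(xs, key=key, default=None)


# One pass, constant extra memory.
print(max_by(books, lambda b: b["year"])["title"])  # Snow Crash

# Same answer via sorting, but the whole sorted list is allocated first.
print(sorted(books, key=lambda b: b["year"])[-1]["title"])  # Snow Crash
```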

When to use which

Goal	Use
Sum/avg numbers	`sum`, `avg`
Count rows	`count`
Exact distinct count	`.unique().count()`
Existence check	`any`
Universal check	`all`
Find index	`find_index`
Pick single max/min element	`max_by`, `min_by`

Practical examples

DOC:    {"books":[
  {"title":"Dune","year":1965,"price":15},
  {"title":"Foundation","year":1951,"price":10},
  {"title":"Hyperion","year":1989,"price":18},
  {"title":"Snow Crash","year":1992,"price":12}
]}

# Total revenue across all books
QUERY:  $.books.map(@.price).sum()
OUT:    55

# Mean price
QUERY:  $.books.map(@.price).avg()
OUT:    13.75

# Earliest and most expensive
QUERY:  $.books.min_by(b => b.year).title
OUT:    "Foundation"

QUERY:  $.books.max_by(b => b.price).title
OUT:    "Hyperion"

# Any cyberpunk in the catalog?
QUERY:  $.books.any(@.tags? and @.tags.includes("cyberpunk"))
# (where @.tags? guards against missing field)

# Count books published before 1970
QUERY:  $.books.filter(@.year < 1970).count()
OUT:    2

# Position of the first 1990s book
QUERY:  $.books.find_index(@.year >= 1990)
OUT:    3

# Indices of all books where price > 12
QUERY:  $.books.indices_where(@.price > 12)
OUT:    [0,2]

Positional Access

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "transactions": [{"ts": "01"}, {"ts": "02"}, {"ts": "03"}]}

Bounded extraction by position.

`first`

Signature: Array<A> -> A | null
Demand law: First — always FirstInput(1).

QUERY:  [10,20,30].first()     OUT: 10
QUERY:  [].first()              OUT: null

QUERY:  $.users.filter(@.active).first()
# Source reads only enough to get one active user.

Equivalent to .nth(0) but reads better and is the canonical "early-exit" sink.

`last`

Signature: Array<A> -> A | null
Demand law: Last — always LastInput(1).

QUERY:  [10,20,30].last()     OUT: 30

When the source supports it (an in-memory array, or a tape with known length), last seeks to the end; for streams it must drain.

`nth(i)`

Signature: Array<A> -> A | null
Demand law: NthInput(i) if i is non-negative; LastInput(-i) otherwise.

QUERY:  [10,20,30,40].nth(2)     OUT: 30
QUERY:  [10,20,30,40].nth(-1)     OUT: 40

`find_first(pred)`

Signature: Array<A> -> A | null
Behavior: Same as find — kept for naming clarity. Use find in new code.

`find_one(pred)`

Signature: Array<A> -> A | null
Behavior: Asserts at most one match; errors if more than one matches. Useful for "exactly one user with this id" shapes.

QUERY:  $.users.find_one(@.id == 1)
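
The "at most one" contract can be sketched in Python as follows. Returning None when nothing matches is an assumption based on the A | null signature, and the error type is a placeholder; jetro's actual error message is not shown in this book.

```python
def find_one(xs, pred):
    """Return the single match, None if there is none, and fail on ambiguity."""
    matches = [x for x in xs if pred(x)]
    if len(matches) > 1:
        raise ValueError(f"find_one: expected at most one match, got {len(matches)}")
    return matches[0] if matches else None


users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
print(find_one(users, lambda u: u["id"] == 1))  # {'id': 1, 'name': 'Ada'}
```

Unlike find, this cannot stop at the first match: it has to keep scanning to prove there is no second one.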

`collect`

Signature: Any -> Array<Any>
Behavior: Coerce to array. Scalar → [scalar]; array → identity; null → [].

QUERY:  42.collect()     OUT: [42]
QUERY:  [1,2].collect()     OUT: [1,2]
QUERY:  null.collect()     OUT: []

Use collect to guarantee an array shape at a pipeline boundary — useful for callers that always want to iterate.
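
The three coercion rules fit in a short Python model. This is a sketch of the documented cases only; treating an object as a scalar (wrapping it in a one-element list) is an assumption.

```python
def collect(v):
    """Coerce to a list: None -> [], list -> unchanged, anything else -> [v]."""
    if v is None:
        return []
    if isinstance(v, list):
        return v
    return [v]


print(collect(42))      # [42]
print(collect([1, 2]))  # [1, 2]
print(collect(None))    # []
```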

When to use a positional vs. a reducer

first() is a positional sink (returns one element). count() is a reducer (returns one number). Both terminate the pipeline. Use whichever matches your output type.

Worked example

DOC:    {"orders": [
  {"id": 1, "total": 100},
  {"id": 2, "total": 50},
  {"id": 3, "total": 200}
]}

QUERY:  $.orders.filter(@.total > 75).first().id
OUT:    1

QUERY:  $.orders.sort_by(@.total).last().id
OUT:    3

The first query early-exits (one filter pass, one match). The second sorts (barrier), then takes the last — the planner can't avoid the sort.

Practical examples

# First active user — early-exit, demand-aware
$.users.find(@.active).name

# Last log entry of severity 3+ (when the source supports random access)
$.logs.filter(@.sev >= 3).last().msg

# Get a user at known index
$.users.nth(2).email

# Negative-index array tail
$.transactions.nth(-1).ts

# Coerce-or-empty: scalar source becomes a 1-element array
"hello".collect()      # → ["hello"]
null.collect()         # → []

# Use collect() at a method-call boundary so callers always iterate
$.config.tags.collect().map(@.lower())

Barrier Operators

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}], "daily": [{"day": 1, "value": 10}, {"day": 2, "value": 12}]}

Barriers must see the full input before emitting any output. They materialise. Place them late in pipelines when possible.

Sort

`sort` (alias `sort_by`)

Signature: Array<A> -> Array<A>
Behavior: Stable ascending sort. With a projection, sorts by the projected key.

QUERY:  [3,1,4,1,5].sort()
OUT:    [1,1,3,4,5]

QUERY:  $.books.sort(@.year)
QUERY:  $.books.sort(b => -b.year)
QUERY:  $.users.sort(@.last_name, @.first_name)

Multi-arg form sorts by a tuple of keys.

Distinct

`unique` (alias `distinct`)

Signature: Array<A> -> Array<A>
Behavior: Remove duplicates by structural equality, preserving first occurrence order.

QUERY:  [3,1,4,1,5,9,2,6,5].unique()
OUT:    [3,1,4,5,9,2,6]

`unique_by(f)`

Signature: Array<A> -> Array<A>
Behavior: Dedup by projected key.

QUERY:  $.books.unique_by(@.author)
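
Order-preserving dedup can be modelled in Python with a seen-set. Using a canonical JSON string as the equality key is an assumption made for this sketch so that nested arrays and objects compare structurally; it is not a claim about how jetro hashes values.

```python
import json


def unique_by(xs, key):
    """Keep the first element for each distinct key, in original order."""
    seen, out = set(), []
    for x in xs:
        k = json.dumps(key(x), sort_keys=True)  # canonical form for structural equality
        if k not in seen:
            seen.add(k)
            out.append(x)
    return out


def unique(xs):
    return unique_by(xs, lambda x: x)


print(unique([3, 1, 4, 1, 5, 9, 2, 6, 5]))  # [3, 1, 4, 5, 9, 2, 6]
```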

Group / count / index

`group_by(key)`

Signature: Array<A> -> Object<KeyString, Array<A>>
Behavior: Bucket by projected key.

QUERY:  $.books.group_by(@.author)
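# → object keyed by author ("Herbert", "Asimov", "Simmons", "Stephenson"); each value is an array holding that author's books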

`count_by(key)`

Signature: Array<A> -> Object<KeyString, Number>
Behavior: Bucket counts.

QUERY:  $.books.count_by(@.author)
# → object mapping each author to a count; every author in the fixture appears once, so each count is 1

`index_by(key)`

Signature: Array<A> -> Object<KeyString, A>
Behavior: Index by key. Last wins on collision.

QUERY:  $.users.index_by(@.id)
# → object keyed by id ("1", "2", "3"); each value is the user with that id
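
The three bucket builders differ only in what they store per key. The Python sketch below models the documented behavior; converting keys with `str` is an assumption standing in for KeyString (Python would render a missing key as "None" rather than "null"), and output key order in jetro is not asserted.

```python
def group_by(xs, key):
    """Bucket elements by key; each bucket keeps the full elements."""
    out = {}
    for x in xs:
        out.setdefault(str(key(x)), []).append(x)
    return out


def count_by(xs, key):
    """Bucket counts only; no element payloads are retained."""
    out = {}
    for x in xs:
        k = str(key(x))
        out[k] = out.get(k, 0) + 1
    return out


def index_by(xs, key):
    """One element per key; a later element overwrites an earlier one."""
    return {str(key(x)): x for x in xs}


books = [
    {"title": "Dune", "author": "Herbert"},
    {"title": "Foundation", "author": "Asimov"},
    {"title": "I, Robot", "author": "Asimov"},  # hypothetical extra row to show a collision
]
print(count_by(books, lambda b: b["author"]))  # {'Herbert': 1, 'Asimov': 2}
print(index_by(books, lambda b: b["author"])["Asimov"]["title"])  # I, Robot
```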

`group_shape`

⚠ Not yet supported in v0.5 — runtime returns "GroupShape: builtin unsupported". Tracked for a future release.

Signature: Array<Object> -> Array<Object>
Behavior (planned): Group by structural shape (key set).

Partition

`partition(pred)`

⚠ Not yet supported in v0.5 for chained / pipeline use. The apply_* trait dispatch isn't wired through the streaming planner; calling it inside a chain like $.store.books.partition(@.x) is unreliable. Spec exists but output shape and execution path are subject to change.

Signature (planned): Array<A> -> [Array<A>, Array<A>]
Behavior (planned): [matching, non-matching].

Window / chunk

`window(size)`

Signature: Array<A> -> Array<Array<A>>
Behavior: Sliding window of size.

QUERY:  [1,2,3,4,5].window(3)
OUT:    [[1,2,3],[2,3,4],[3,4,5]]

`chunk(size)` (alias `batch`)

Signature: Array<A> -> Array<Array<A>>
Behavior: Non-overlapping chunks. Last chunk may be shorter.

QUERY:  [1,2,3,4,5,6,7].chunk(3)
OUT:    [[1,2,3],[4,5,6],[7]]
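
The difference between overlapping windows and disjoint chunks comes down to the step size, as this Python sketch shows. It assumes a positive size and is a model of the examples above, not jetro's implementation.

```python
def window(xs, size):
    """Sliding windows: step 1, every window has exactly `size` elements."""
    return [xs[i:i + size] for i in range(len(xs) - size + 1)]


def chunk(xs, size):
    """Disjoint chunks: step `size`, the last chunk may be shorter."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]


print(window([1, 2, 3, 4, 5], 3))       # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(chunk([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```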

Rolling aggregates

Method	Behavior
`rolling_sum(n)`	Sum over a window of size `n`
`rolling_avg(n)`	Average over a window
`rolling_min(n)`	Min over a window
`rolling_max(n)`	Max over a window

QUERY:  [1,2,3,4,5].rolling_sum(3)
OUT:    [null,null,6.0,9.0,12.0]

The leading n-1 positions emit null until the window fills.
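
A rolling sum does not need to re-add each window from scratch. The Python sketch below keeps a running total and subtracts the element that leaves the window; whether jetro uses this incremental form is not stated here, so treat it as one possible approach.

```python
def rolling_sum(xs, n):
    """Sum over the last n elements; None until the window is full."""
    out, acc = [], 0.0
    for i, x in enumerate(xs):
        acc += x
        if i >= n:
            acc -= xs[i - n]  # drop the element that just left the window
        out.append(acc if i >= n - 1 else None)
    return out


print(rolling_sum([1, 2, 3, 4, 5], 3))  # [None, None, 6.0, 9.0, 12.0]
```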

`accumulate(init, fn)`

⚠ Not yet supported in v0.5 — runtime returns "accumulate: builtin not migrated to builtins.rs AST adapter". Spec exists; runtime hookup pending.

Signature (planned): Array<A> -> Array<B> (with fn: (B, A) -> B, init: B)
Behavior (planned): Streaming fold producing intermediate states.

For now, use cummax / cummin for running min/max, or build the fold with a let + recursive helper if absolutely needed.

When to barrier

You have to barrier when:

Order needs computation (sort, unique)
Output is grouped / indexed (group_by, index_by)
A window crosses element boundaries (window, rolling_*)

You don't need a barrier for:

Per-element transforms (map)
Predicates (filter)
Numeric reducers (sum, count) — they're streaming reducers, not barriers

Practical examples

DOC:    {"books":[
  {"title":"Dune","year":1965,"author":"Herbert","price":15},
  {"title":"Foundation","year":1951,"author":"Asimov","price":10},
  {"title":"Hyperion","year":1989,"author":"Simmons","price":18},
  {"title":"Snow Crash","year":1992,"author":"Stephenson","price":12}
]}

# Sort by year ascending
QUERY:  $.books.sort(b => b.year).map(@.title)
OUT:    ["Foundation","Dune","Hyperion","Snow Crash"]

# Sort by price descending (negate the key)
QUERY:  $.books.sort(b => -b.price).map(@.title)
OUT:    ["Hyperion","Dune","Snow Crash","Foundation"]

# Distinct tags across books
QUERY:  $.books.flat_map(@.tags).unique()

# How many distinct authors
QUERY:  $.books.unique_by(b => b.author).count()
OUT:    4

# Group by author
QUERY:  $.books.group_by(b => b.author)
# → object with one key per author; each value is an array containing that author's single book

# Histogram of authors (prefer count_by — no buffering of bucket payloads)
QUERY:  $.books.count_by(b => b.author)
# → object mapping each of the four authors to 1

# Build a quick lookup table
QUERY:  $.users.index_by(u => u.id)

# Sliding-3 windows for moving stats
QUERY:  $.measurements.window(3).map(w => w.sum() / 3)

# Split into batches of 10 for paginated processing
QUERY:  $.records.chunk(10)

# 7-day moving average over a numeric series
QUERY:  $.daily.rolling_avg(7)

Array and Set Operations

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "tags_today": ["a", "b", "c"], "tags_yesterday": ["b", "c", "d"], "left_tags": ["a", "b", "c"], "right_tags": ["b", "c", "d"]}

Operations that take an array and produce a derivative array (or join two arrays).

`append(v)` and `prepend(v)`

Signature: Array<A> -> Array<A>
Behavior: Add v to the end / front.

QUERY:  [1,2,3].append(4)     OUT: [1,2,3,4]
QUERY:  [1,2,3].prepend(0)     OUT: [0,1,2,3]

When used as chain-write terminals ($.path.append(v)), they patch the document — see Patch.

`reverse`

Signature: Array<A> -> Array<A>
Behavior: Reverse element order. Also works on strings (calls reverse_str).

QUERY:  [1,2,3].reverse()     OUT: [3,2,1]
QUERY:  "abc".reverse()     OUT: "cba"

Set-like operations

Method	Behavior
`diff(other)`	Elements in self not in other
`intersect(other)`	Elements in both
`union(other)`	Elements in either, deduped

QUERY:  [1,2,3,4].diff([3,4,5])     OUT: [1,2]
QUERY:  [1,2,3,4].intersect([3,4,5])     OUT: [3,4]
QUERY:  [1,2,3].union([3,4,5])     OUT: [1,2,3,4,5]

Equality is structural. Order: result preserves first-occurrence order from the left operand.
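
The ordering rule is what separates these from mathematical sets. A Python sketch that follows the left operand's order looks like this; Python's `==` on lists and dicts stands in for structural equality (with the caveat that Python treats True and 1 as equal), and the handling of duplicates inside the left operand is an assumption.

```python
def diff(a, b):
    """Elements of a that do not appear in b, in a's order."""
    return [x for x in a if x not in b]


def intersect(a, b):
    """Elements of a that also appear in b, in a's order."""
    return [x for x in a if x in b]


def union(a, b):
    """Elements of a then b, keeping only the first occurrence of each."""
    out = []
    for x in list(a) + list(b):
        if x not in out:
            out.append(x)
    return out


print(diff([1, 2, 3, 4], [3, 4, 5]))       # [1, 2]
print(intersect([1, 2, 3, 4], [3, 4, 5]))  # [3, 4]
print(union([1, 2, 3], [3, 4, 5]))         # [1, 2, 3, 4, 5]
```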

`join(sep)`

Signature: Array<String> -> String
Behavior: Concatenate strings with separator.

QUERY:  ["a","b","c"].join(", ")
OUT:    "a, b, c"

QUERY:  $.users.map(@.name).join(" / ")

For non-string elements, lift with .map(@.to_string()) first.

`zip(other)` and `zip_longest(other, fill?)`

Signature: Array<A>, Array<B> -> Array<[A, B]>
Behavior: Pair element-wise.

QUERY:  [1,2,3].zip(["a","b","c"])
OUT:    [[1,"a"],[2,"b"],[3,"c"]]

QUERY:  [1,2,3].zip(["a","b"])     OUT: [[1,"a"],[2,"b"]]
QUERY:  [1,2,3].zip_longest(["a","b"])     OUT: [[1,"a"],[2,"b"],[3,null]]
QUERY:  [1,2,3].zip_longest(["a"], "x")     OUT: [[1,"a"],[2,"x"],[3,"x"]]
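
Python's built-in `zip` and `itertools.zip_longest` behave the same way, which makes them a convenient mental model. The wrapper names `zip_pairs` and `zip_longest_pairs` below are hypothetical; they only convert Python's tuples to lists so the output matches the JSON shape shown above.

```python
from itertools import zip_longest


def zip_pairs(a, b):
    """Pair element-wise; stop at the shorter input."""
    return [list(p) for p in zip(a, b)]


def zip_longest_pairs(a, b, fill=None):
    """Pair element-wise; pad the shorter input with `fill`."""
    return [list(p) for p in zip_longest(a, b, fillvalue=fill)]


print(zip_pairs([1, 2, 3], ["a", "b"]))            # [[1, 'a'], [2, 'b']]
print(zip_longest_pairs([1, 2, 3], ["a", "b"]))    # [[1, 'a'], [2, 'b'], [3, None]]
print(zip_longest_pairs([1, 2, 3], ["a"], "x"))    # [[1, 'a'], [2, 'x'], [3, 'x']]
```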

`fanout(...lambdas)`

Signature: A -> Array<...>
Behavior: Apply each lambda to the same input; collect results.

DOC:    {"x": 10}
QUERY:  $.x.fanout(@ * 2, @ + 1, @.to_string())
OUT:    [20,11,"10"]

Useful for building multi-shape projections without repeating subexpressions.

`zip_shape(arrays)`

⚠ Not yet supported in v0.5 — runtime returns "ZipShape: builtin unsupported". Spec exists; runtime hookup pending.

Signature (planned): Object<KeyString, Array<A>> -> Array<Object>
Behavior (planned): Combine parallel arrays under shared keys into an array of objects.

The inverse is pivot — see Objects.

Demand notes

Set operations and join are barriers (they consume both inputs fully). reverse is a barrier too — but it's cheap and well-supported by demand: reverse().take(n) is rewritten so the source seeks to the end.

Practical examples

# Add an item to a tag list
$.user.tags.append("admin")             # patches the doc

# Build a "label = value" string
$.user.pick(name, email).values().join(" = ")

# CSV row from selected fields
[$.user.id, $.user.name, $.user.email].join(",")

# Set difference — find items missing from a baseline
[1,2,3,4,5].diff([2,4])                 # → [1, 3, 5]

# Set intersection — common items
$.left_tags.intersect($.right_tags)

# Merge unique values, preserving first-occurrence order
$.tags_today.union($.tags_yesterday)

# Reverse and take last 5 (demand-aware: seeks end)
$.events.reverse().take(5)

# Pair two arrays positionally
[1,2,3].zip(["a","b","c"])              # → [[1,"a"],[2,"b"],[3,"c"]]

# Pad shorter array with default
[1,2,3].zip_longest(["a","b"], "?")     # → [[1,"a"],[2,"b"],[3,"?"]]

# Run several projections at once
$.metric.value.fanout(@ * 2, @ + 1, @ - 1)    # → [v*2, v+1, v-1]

Object Projection and Transform

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

Methods that read or rewrite objects.

Keys and values

Method	Signature	Result
`keys`	`Object -> Array<String>`	Insertion-order key list
`values`	`Object -> Array<Any>`	Insertion-order value list
`entries`	`Object -> Array<[String, Any]>`	Key-value pairs
`to_pairs`	`Object -> Array<[String, Any]>`	Alias of `entries`

DOC:    {"a": 1, "b": 2}
QUERY:  $.keys()     OUT: ["a","b"]
QUERY:  $.values()     OUT: [1,2]
QUERY:  $.entries()     OUT: [["a",1],["b",2]]

`from_pairs`

Signature: Array<[String, Any]> -> Object
Behavior: Inverse of to_pairs.

QUERY:  [["a",1],["b",2]].from_pairs()
OUT:    {"a":1,"b":2}

`invert`

Signature: Object<K, V> -> Object<V, K>
Behavior: Swap keys and values. Values must be coercible to keys (string-like).

QUERY:  {"a":"x","b":"y"}.invert()
OUT:    {"x":"a","y":"b"}

`pick(field, ...)`

Signature: Object -> Object
Behavior: Keep only the named keys. Supports alias: src rename.

DOC:    {"id": 1, "name": "Ada", "secret": "!"}

QUERY:  $.pick(id, name)
OUT:    {"id":1,"name":"Ada"}

QUERY:  $.pick(uid: id, name)
OUT:    {"name":"Ada","uid":1}

Maps over arrays of objects:

$.users.pick(id, email)

is equivalent to $.users.map(u => u.pick(id, email)).

`omit(field, ...)`

Signature: Object -> Object
Behavior: Inverse of pick. Drop the named keys.

QUERY:  $.user.omit(secret, password)

Merge

Method	Behavior
`merge(other)`	Shallow merge — `other`'s keys win on collision
`deep_merge(other)`	Recursive merge — sub-objects merged, arrays replaced
`defaults(other)`	Reverse merge — keep self's keys, fill missing from `other`

QUERY:  {"a":1,"b":2}.merge({"b":99,"c":3})
OUT:    {"a":1,"b":99,"c":3}

QUERY:  {"a":{"x":1}}.deep_merge({"a":{"y":2}})
OUT:    {"a":{"x":1,"y":2}}

QUERY:  {"a":1}.defaults({"a":99,"b":2})
OUT:    {"a":1,"b":2}
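
The three merge flavours can be modelled on Python dicts. This is a sketch of the rules in the table above (other wins, recursive other wins, self wins), not jetro's code; output key order is not modelled.

```python
def merge(a, b):
    """Shallow merge: b's keys win on collision."""
    return {**a, **b}


def deep_merge(a, b):
    """Recursive merge: nested objects are merged, everything else is replaced."""
    out = dict(a)
    for k, v in b.items():
        if isinstance(out.get(k), dict) and isinstance(v, dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v  # scalars and arrays are replaced, not combined
    return out


def defaults(a, b):
    """Reverse merge: a's keys win, b only fills in what is missing."""
    return {**b, **a}


print(merge({"a": 1, "b": 2}, {"b": 99, "c": 3}))      # {'a': 1, 'b': 99, 'c': 3}
print(deep_merge({"a": {"x": 1}}, {"a": {"y": 2}}))    # {'a': {'x': 1, 'y': 2}}
print(defaults({"a": 1}, {"a": 99, "b": 2}))
```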

`rename(...mapping)`

Signature: Object -> Object
Behavior: Rename keys per a {old: new, ...} mapping.

QUERY:  $.user.rename({user_id: id, full_name: name})

`transform_keys(fn)` and `transform_values(fn)`

Signature: Object -> Object
Behavior: Apply fn to every key / value.

QUERY:  {"foo": 1, "bar": 2}.transform_keys(@.upper())
OUT:    {"BAR":2,"FOO":1}

QUERY:  {"a": 1, "b": 2}.transform_values(@ * 10)
OUT:    {"a":10,"b":20}

`filter_keys(pred)` and `filter_values(pred)`

Signature: Object -> Object
Behavior: Keep entries whose key / value matches the predicate.

QUERY:  $.config.filter_keys(k => k.starts_with("aws_"))
QUERY:  $.scores.filter_values(@ >= 50)

`pivot(rows, cols, value)`

Signature: Array<Object> -> Object<KeyString, Object>
Behavior: Pivot a table-shaped array into a nested object indexed by rows then cols, with value as the leaf.

DOC:    [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY:  $.pivot("y", "q", "v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15}}
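
The reshaping can be sketched in Python as a nested dictionary build. Converting the row and column values with `str` is an assumption that mirrors the string keys in the output above; what jetro does when two records share the same row and column is not covered (this sketch lets the later record win).

```python
def pivot(records, rows, cols, value):
    """Nest records as out[row][col] = value, with string keys."""
    out = {}
    for r in records:
        out.setdefault(str(r[rows]), {})[str(r[cols])] = r[value]
    return out


data = [
    {"y": 2024, "q": 1, "v": 10},
    {"y": 2024, "q": 2, "v": 20},
    {"y": 2025, "q": 1, "v": 15},
]
print(pivot(data, "y", "q", "v"))  # {'2024': {'1': 10, '2': 20}, '2025': {'1': 15}}
```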

`implode(joiner=",")`

Signature: Array<String> -> String
Behavior: Like join, but works on object values too:

QUERY:  {"a":"x","b":"y"}.values().implode("/")
OUT:    "x/y"

Demand notes

pick is a powerful demand signal — it tells the source which fields are needed. Over a wide-record document, pick(id, name) upstream of the rest of the pipeline avoids decoding all the other fields.

keys over an array stage emits one row per element, while keys over a single object returns a single array of that object's keys.

Practical examples

DOC:    {"users":[
  {"id":1,"name":"Ada","email":"ada@x.com","secret":"!"},
  {"id":2,"name":"Bob","email":"bob@y.org","secret":"?"}
]}

# Project safe public fields
QUERY:  $.users.map(u => u.pick(id, name, email))

# Drop sensitive keys
QUERY:  $.users.map(u => u.omit(secret))

# Rename in flight
QUERY:  $.users.map(u => u.pick(uid: id, full_name: name, email))

# Keys / values / entries
QUERY:  $.users[0].keys()                  → ["id","name","email","secret"]
QUERY:  $.users[0].values().count()        → 4
QUERY:  $.users[0].entries().count()       → 4

# Round-trip through entries
QUERY:  $.users[0].entries().from_pairs()  → equivalent to $.users[0]

# Merge with defaults (existing keys win)
QUERY:  $.config.defaults({timeout: 30, retries: 3})

# Deep-merge config layers
QUERY:  $.base_config.deep_merge($.user_config)

# Filter object by key prefix
QUERY:  $.env.filter_keys(k => k.starts_with("AWS_"))

# Filter values
QUERY:  $.scores.filter_values(@ >= 50)

# Apply transform to every value
QUERY:  $.prices.transform_values(@ * 1.08)

# Normalise keys to snake_case
QUERY:  $.payload.transform_keys(k => k.snake_case())

# Invert a code-to-name table
QUERY:  $.country_codes.invert()           # {"US":"United States",...} → {"United States":"US",...}

# Pivot long-format records
DOC:    [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY:  $.pivot("y","q","v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15}}

Path and Structural Mutation

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

Methods that read, set, delete, or rewrite values at specific paths within a document. These work on whole documents or sub-trees.

For chain-write terminals ($.path.set(v)) see Patch. This chapter documents the method-call versions.

`get_path(path)`

⚠ v0.5 quirk: only resolves a single key — get_path("a/b/c") returns null even when $.a.b.c exists. Use direct path navigation ($.a.b.c) when the path is statically known. For dynamic paths, walk manually with let + chained [expr].

Signature (intended): Any, String -> Any | null
Behavior (intended): Read a value at a slash-separated path.

DOC:    {"user": {"profile": {"name": "Ada"}}}
QUERY:  $.get_path("user")
OUT:    {"profile":{"name":"Ada"}}
QUERY:  $.get_path("user/profile")
OUT:    {"name":"Ada"}     # intended result; per the quirk above, v0.5 returns null for multi-segment paths

`set_path(path, value)`

Signature: Any, String, Any -> Any
Behavior: Return a copy with value written at path. Creates intermediate objects as needed.

QUERY:  $.set_path("user/profile/email", "ada@example.com")

`del_path(path)`

Signature: Any, String -> Any
Behavior: Return a copy with the leaf at path removed.

QUERY:  $.del_path("user/secret")

`del_paths(paths)`

Signature: Any, Array<String> -> Any
Behavior: Remove all listed paths in one pass. Cheaper than chained del_path for many removals.

QUERY:  $.del_paths(["user/secret", "user/temp", "session/csrf"])

`has_path(path)`

Signature: Any, String -> Bool
Behavior: True if a value exists at path. Distinguishes "missing" from "explicit null":

DOC:    {"a": null}
QUERY:  $.has_path("a")     OUT: true
QUERY:  $.has_path("b")     OUT: false

`flatten_keys(sep="/")`

Signature: Object -> Object
Behavior: Flatten a nested object into a single-level object with joined keys.

DOC:    {"a": {"b": 1, "c": 2}, "d": 3}
QUERY:  $.flatten_keys()
OUT:    {"a/b":1,"a/c":2,"d":3}

QUERY:  $.flatten_keys(".")
OUT:    {"a.b":1,"a.c":2,"d":3}

`unflatten_keys(sep="/")`

Signature: Object -> Object
Behavior: Inverse of flatten_keys.

QUERY:  {"a/b": 1, "a/c": 2}.unflatten_keys()
OUT:    {"a":{"b":1,"c":2}}
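
To make the flatten/unflatten relationship concrete, here is a minimal Python model of the two operations. It is a sketch under stated assumptions, not jetro's code: the separator is passed explicitly, arrays and empty objects are treated as leaves, and the names `flatten_keys_model` and `unflatten_keys_model` are hypothetical.

```python
def flatten_keys_model(obj, sep, prefix=""):
    """Join nested object keys with sep into a single-level dict."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_keys_model(value, sep, path))
        else:
            # Assumption: arrays, scalars and empty objects are leaves.
            flat[path] = value
    return flat

def unflatten_keys_model(flat, sep):
    """Rebuild nesting by splitting each key on sep."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested

doc = {"a": {"b": 1, "c": 2}, "d": 3}
flat = flatten_keys_model(doc, "/")
print(flat)
print(unflatten_keys_model(flat, "/"))
```

The round trip only holds when no original key contains the separator, which is worth checking before flattening user-supplied data.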

`set(path, value)` (method-call form)

Signature: Any, String, Any -> Any
Behavior: Same as set_path. Kept for ergonomic chains.

The chain-write terminal $.path.set(v) is different — it's parsed as a patch and operates on the rooted document path.

`update`

update is jetro's functional batched update. Two surfaces:

Object body — `update({k: expr, ...})`

Apply a set of field updates to one or more selected subtrees. Plain keys update fields below the receiver; quoted keys carry full paths.

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf", "hugo"]}
]}

QUERY:  $.books[*].update({tags: tags.append("test"), reviewed: true})
OUT:    {"books":[{"reviewed":true,"tags":["sf","test"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","hugo","test"],"title":"Hyperion","year":1989}]}

Each selected book gets both fields written. Plain identifiers (tags, reviewed) are read against the selected snapshot — not the mid-batch document — so two ops on the same target both see the original field values.

Body forms:

Form	Meaning
`field: expr`	Write `expr` into `field` of each selected target
`"a.b.c": expr`	Write into a nested path inside each selected target
`"books[*].tags": expr`	Quoted path key — full root-relative path with wildcards/filters
`field: expr when cond`	Skip when `cond` is falsy
`field: DELETE`	Remove the field (with optional `when`)

@ inside the body is the current value at the target field (handy inside path keys); $ is the original root.

QUERY:  $.books[*].update({tags: tags.append("modern") when year > 1980})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","hugo","modern"],"title":"Hyperion","year":1989}]}

Root-level batch with quoted paths

When the receiver is $, quoted keys carry full paths, including wildcards and DELETE:

DOC:    {"books": [{"tags": ["sf"]}], "active": true}
QUERY:  $.update({"books[*].tags": @.append("test"), active: false})
OUT:    {"active":false,"books":[{"tags":["sf","test"]}]}

DOC:    {"users": [{"id":1,"secret":"a"}, {"id":2,"secret":"b"}]}
QUERY:  $.update({"users[*].secret": DELETE})
OUT:    {"users":[{"id":1},{"id":2}]}

Filtered wildcard `[* if pred]`

Both selectors and quoted path keys support a filtered wildcard:

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}

QUERY:  $.books[* if year > 1980].update({tags: tags.append("modern")})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

QUERY:  $.update({"books[* if year > 1980].tags": @.append("modern")})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

Two-argument path form — `update(path, expr)`

The classic shape: a slash- or dot-separated path plus an expression. @ inside the expression is the current value at path.

DOC:    {"counters": {"visits": 10, "clicks": 3}}
QUERY:  $.update("counters.visits", @ + 1)
OUT:    {"counters":{"clicks":3,"visits":11}}

QUERY:  $.update("counters/visits", @ + 1)
OUT:    {"counters":{"clicks":3,"visits":11}}

Semantics

Property	Behavior
Snapshot reads	Each body expression sees the pre-batch values, not partial mid-batch state
Order	Ops apply in source order — last write wins on overlap
Selectors	Index, wildcard `[*]`, filtered wildcard `[* if pred]`, nested chains all OK
Scalar targets	An update with object body promotes scalar elements to objects (`{seen: true}` over `[1,2]` → `[{seen:true},{seen:true}]`)
Untouched subtrees	Preserved by `Arc` sharing — no deep copy of unrelated fields
Empty body	`.update({})` is a no-op — returns the doc unchanged
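
The snapshot-read and source-order rules in the table can be illustrated with a small Python model. This is a hedged sketch of the semantics as described, not the engine's implementation; `update_model` and the `tag_count` field are hypothetical names.

```python
def update_model(target, body):
    """body: dict of field -> function(snapshot) returning the new value."""
    snapshot = dict(target)  # pre-batch view that every expression reads
    result = dict(target)    # shallow copy; untouched values stay shared
    for field, expr in body.items():  # dicts keep insertion (source) order
        result[field] = expr(snapshot)
    return result

book = {"title": "Dune", "year": 1965, "tags": ["sf"]}
updated = update_model(book, {
    "tags": lambda b: b["tags"] + ["test"],
    # Hypothetical field: it reads the original one-element tags list,
    # not the list written by the op above.
    "tag_count": lambda b: len(b["tags"]),
})
print(updated)
```

Because both expressions read the snapshot, `tag_count` is 1 even though `tags` has two elements in the result.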

Worked example

DOC:    {"users": [
  {"id": 1, "secret": "a", "name": "Ada"},
  {"id": 2, "secret": "b", "name": "Bob"}
]}

QUERY:  $.users.map(u => u.del_paths(["secret"]).set_path("display", u.name))
OUT:    [{"display":"Ada","id":1,"name":"Ada"},{"display":"Bob","id":2,"name":"Bob"}]

Demand notes

Path-mutation methods produce a full result and can't tell the source what fields they need (the path is data, not statically analysable). When the path is a literal, prefer pick/omit/set over get_path/set_path — the planner can use literal field names.

Practical examples

# Single-key write (preferred over set_path for v0.5)
$.user.name.set("Ada Lovelace")                  # chain-write

# Set a field deep
patch $ { user.profile.email: "ada@x.com" }

# Bulk delete
$.del_paths(["secret","temp","csrf"])

# Flatten a nested config for environment-variable export
$.config.flatten_keys(".")                       # {"db.host":..., "db.port":..., ...}

# Round-trip via flatten/unflatten
$.config.flatten_keys().unflatten_keys()         # ≈ $.config

# Existence test before write
patch $ {
  email: $.user.email when $.has_path("user/email")
}

# Flat-key patches
$.patch_set.flatten_keys().entries().map(e => $.set_path(e[0], e[1]))

Deep Traversal and Recursion

Walk every descendant value in DFS pre-order. The deep methods are also available as ..method(...) syntax sugar in path position.

`deep_find(pred)` (or `..find(pred)`)

Signature: Any -> Array<Any>
Behavior: Every descendant satisfying pred. Order: DFS pre-order.

DOC:    {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
QUERY:  $..find(@.x?)
OUT:    [{"x":1},{"x":2}]

QUERY:  $.deep_find(@ is number)
OUT:    [1,2,3]

When the structural index is available, deep_find runs over a bitmap representation in jetro-experimental rather than walking Val nodes — significantly faster for shallow predicates.
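
For readers who want to check the ordering claim, DFS pre-order collection can be modelled in Python as below. This is a sketch of the Val-walking behavior only (it says nothing about the bitmap path), and it assumes the root itself is also tested; `deep_find_model` is a hypothetical name.

```python
def deep_find_model(node, pred):
    """Collect matching nodes in DFS pre-order (parent before children)."""
    found = [node] if pred(node) else []
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = ()
    for child in children:
        found.extend(deep_find_model(child, pred))
    return found

doc = {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
has_x = lambda n: isinstance(n, dict) and "x" in n
# bool is a subclass of int in Python, so exclude it explicitly.
is_number = lambda n: isinstance(n, (int, float)) and not isinstance(n, bool)

print(deep_find_model(doc, has_x))
print(deep_find_model(doc, is_number))
```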

`deep_shape({k1, k2, ...})` (or `..shape({...})`)

Signature: Any -> Array<Object>
Behavior: Every object that has all listed keys (regardless of value).

DOC:    [{"id":1,"name":"a"},{"id":2},{"name":"c","id":3}]
QUERY:  $..shape({id, name})
OUT:    [{"id":1,"name":"a"},{"id":3,"name":"c"}]

`deep_like({k1: v1, ...})` (or `..like({...})`)

Signature: Any -> Array<Object>
Behavior: Every object whose listed keys equal the listed literal values.

DOC:    [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942},{"author":"Herbert","year":1965}]
QUERY:  $..like({author: "Asimov"})
OUT:    [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942}]

`walk(fn)`

Signature: Any, (Any -> Any) -> Any
Behavior: Apply fn to every node bottom-up; rebuild the tree.

QUERY:  $.walk(node => node.upper() if node is string else node)
# Returns the document with every string node uppercased.

`walk_pre(fn)`

Signature: Any, (Any -> Any) -> Any
Behavior: Like walk, but pre-order — fn sees parent before children.

Use walk_pre when the transform decides whether to recurse based on the node's identity (e.g. "stop at leaves of kind X").
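
The difference between the two traversal orders is easiest to see by counting how often fn runs. The Python model below is a sketch of the described semantics, not jetro's code; `walk_model`, `walk_pre_model` and `stop_at_x` are hypothetical names.

```python
def walk_model(node, fn):
    """Bottom-up: rebuild the children first, then apply fn to the parent."""
    if isinstance(node, dict):
        node = {k: walk_model(v, fn) for k, v in node.items()}
    elif isinstance(node, list):
        node = [walk_model(v, fn) for v in node]
    return fn(node)

def walk_pre_model(node, fn):
    """Pre-order: apply fn to the parent, then recurse into what it returned."""
    node = fn(node)
    if isinstance(node, dict):
        return {k: walk_pre_model(v, fn) for k, v in node.items()}
    if isinstance(node, list):
        return [walk_pre_model(v, fn) for v in node]
    return node

upper = lambda n: n.upper() if isinstance(n, str) else n

seen = []
def stop_at_x(node):
    """Replace any object of kind X with a marker and record the visit."""
    seen.append(node)
    return "pruned" if isinstance(node, dict) and node.get("kind") == "X" else node

tree = {"kind": "X", "child": {"kind": "X"}}
walk_pre_model(tree, stop_at_x)
pre_calls = len(seen)   # the root is replaced, so nothing below it is visited
seen.clear()
walk_model(tree, stop_at_x)
post_calls = len(seen)  # every node is visited before the root is replaced
print(pre_calls, post_calls)
```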

`rec(pattern, fn)`

⚠ Unstable in v0.5 — observed runtime error "rec: exceeded 10000 iterations without reaching fixpoint" even on simple inputs. Spec exists but the fixpoint loop is buggy. Avoid in production until fixed; track migration progress in the issue tracker.

Signature (planned): Any, Pattern, (Any -> Any) -> Any
Behavior (planned): Match-and-rewrite. Recursively walks; replaces every match with fn(match).

This is the recursive sibling of Pattern Match; useful for AST rewrites and document migrations.

`trace_path(pred)`

Signature: Any, (Any -> Bool) -> Array<Array<Step>>
Behavior: For every node matching pred, return the path from root to the node as an array of steps.

DOC:    {"a": {"x": 1}, "b": [{"x": 2}]}
QUERY:  $.trace_path(@.x?)
OUT:    [{"path":"$.a","value":{"x":1}},{"path":"$.b[0]","value":{"x":2}}]

The steps are the keys/indices to walk to reach the match. Pair with set_path for find-and-replace operations.

Deep `match`

The pattern-match construct has deep variants ..match and ..match! — see Control Flow and the pattern-match cookbook.

When the bitmap kicks in

Deep search uses the structural index when:

The query is rooted at $.. or .deep_*
The predicate is a shape/key check (not a complex lambda)
The document was loaded with the simd-json tape (default)

You don't enable this — it's selected by the planner.

Demand notes

Deep traversals declare All upstream by nature. The optimisation surface is the predicate: shape and like checks bypass the per-node lambda evaluation entirely.

Practical examples

# Find every node with an "id" key (anywhere in the tree)
$..find(@.id?)

# Find all numbers
$..find(@ is number)

# Every object that has both id + name keys
$..shape({id, name})

# Every object where a field equals a specific value
$..like({status: "error"})

# Locate an event by ID inside a deeply nested tree
$..match! { {id: 42} -> @, _ -> null }

# Walk every node, transforming strings to upper
$.walk(node => node.upper() if node is string else node)

# Trace paths from root to nodes matching a predicate
$.trace_path(@.is_admin?)
# → [["users",0],["users",2]]

# Bulk audit: find every "secret"-named field
$..find(@.secret?)

Membership and Predicates

Tests and small helpers.

`or(default)`

Signature: Any, Any -> Any
Behavior: If self is null, return default. Otherwise return self.

QUERY:  null.or("default")     OUT: "default"
QUERY:  "hi".or("default")     OUT: "hi"

Equivalent to ?? default but reads better in chains:

$.user.name.or("anon")

`has(key)`

Signature: Object|Array, KeyOrIndex -> Bool
Behavior: True if the key exists (objects) or index is in range (arrays).

QUERY:  {"a":1,"b":2}.has("a")     OUT: true
QUERY:  {"a":1}.has("b")     OUT: false
QUERY:  [1,2,3].has(2)     OUT: true
QUERY:  [1,2,3].has(5)     OUT: false

The has operator (x has y) is sugar for x.includes(y) — distinct from this method.

`missing(...keys)`

⚠ Broken in v0.5 — empirically returns false instead of the array of missing keys. Compute manually until fixed:
["host", "port", "user"].filter(k => not $.config.has_path(k))

Signature (intended): Object, ...String -> Array<String>
Behavior (intended): Return the subset of provided keys that are not present.

`includes(value)` (alias `contains`)

Signature: Array|String, Any -> Bool
Behavior: Membership.

QUERY:  [1,2,3].includes(2)           OUT: true
QUERY:  "hello".includes("ell")       OUT: true

`index(value)`

Signature: Array|String, Any -> Number | null
Behavior: Index of first occurrence; null if not found.

QUERY:  [10,20,30].index(20)          OUT: 1
QUERY:  [10,20,30].index(99)          OUT: null

For strings, see also index_of in String Search.

`indices_of(value)`

Signature: Array|String, Any -> Array<Number>
Behavior: All indices of value.

QUERY:  [1,2,3,2,1].indices_of(2)
OUT:    [1, 3]

Quick comparison: predicates that look similar

Pattern	Returns
`xs.has("foo")`	Bool — does the key/index exist?
`xs.includes("foo")`	Bool — is the value present?
`xs.index("foo")`	Number|null — where?
`xs.indices_of("foo")`	Array — all positions
`xs.find(p)`	A|null — first matching element
`xs.find_index(p)`	Number|null — first matching index

Practical examples

# Default for missing field
$.user.email.or("no-email@example.com")

# Existence check on key
$.config.has("aws_region")

# Index of a value (not the predicate form)
$.tags.index("admin")

# All positions of duplicates
[1, 2, 1, 3, 1].indices_of(1)            # → [0, 2, 4]

# Membership in a set
$.tags.includes("urgent")

# Allow-list / deny-list patterns
$.role.includes("admin") and not $.banned_users.includes($.id)

Tabular Output

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}]}

Serialise sequences of objects to row-oriented text formats.

`to_csv(headers?)`

Signature: Array<Object> -> String
Behavior: RFC-4180-ish CSV. Without arguments, the header set is the union of the object keys, ordered by first appearance.

DOC:    [{"name":"Ada","age":36},{"name":"Bob","age":42}]
QUERY:  $.to_csv()
OUT:
"name,age
Ada,36
Bob,42"

With explicit headers:

QUERY:  $.to_csv(["age","name"])
OUT:
"age,name
36,Ada
42,Bob"

Strings containing commas, quotes, or newlines are quoted and escaped per RFC 4180.
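
The header and quoting rules can be cross-checked against Python's standard csv module, which applies the same minimal-quoting convention. This is an illustrative model, not jetro's serializer; `to_csv_model` is a hypothetical name, and rendering a missing key as an empty cell is an assumption.

```python
import csv
import io

def to_csv_model(rows, headers=None):
    """Serialise a list of dicts to CSV text without a trailing newline."""
    if headers is None:
        # Union of keys, ordered by first appearance across all rows.
        headers = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # QUOTE_MINIMAL by default
    writer.writerow(headers)
    for row in rows:
        # Assumption: a key missing from a row becomes an empty cell.
        writer.writerow([row.get(h, "") for h in headers])
    return buf.getvalue().rstrip("\n")

people = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 42}]
print(to_csv_model(people))
print(to_csv_model(people, ["age", "name"]))
print(to_csv_model([{"q": 'say "hi", ok'}]))
```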

`to_tsv(headers?)`

Signature: Array<Object> -> String
Behavior: Same as to_csv but tab-separated. No quoting (tab-in-value is replaced with a space).

QUERY:  $.users.to_tsv(["id","email"])

Composing with the rest of the pipeline

Build a report:

$.users
  .filter(@.active)
  .map(u => u.pick(id, name, email))
  .sort(@.id)
  .to_csv()

Pipe to a file from the CLI:

jetrocli '$.users.filter(@.active).pick(id,name).to_csv()' < users.json > out.csv

Limitations

Nested values are JSON-encoded into the cell. For deeply-nested structures, flatten first with flatten_keys:
```
$.records.map(r => r.flatten_keys()).to_csv()
```
The format is row-major. To reshape long-format records into a wide table, use pivot / zip_shape first.
For Excel-flavored CSV (BOM, CRLF), post-process the result.

Practical examples

# Active-user export
$.users.filter(@.active).map(u => u.pick(id, name, email)).sort(u => u.id).to_csv()

# Daily sales report (use e[0]/e[1] indexing — array-pattern destructure
# inside a lambda doesn't parse in v0.5)
$.sales.group_by(s => s.day).entries().map(e => {
  day:   e[0],
  total: e[1].map(@.amount).sum(),
  count: e[1].count()
}).to_csv()

# Hashtag frequency CSV
$.tweets.flat_map(t => t.entities.hashtags.map(@.text))
  .count_by(@)
  .entries()
  .map(e => {tag: e[0], count: e[1]})
  .to_csv()

# TSV for log shipping
$.logs.map(l => l.pick(ts, sev, msg)).to_tsv()

Relational

Fixture

Examples below run against:

DOC:    {"orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "customers": [{"id": 1, "name": "Ada", "email": "ada@x.com"}, {"id": 2, "name": "Bob", "email": "bob@y.org"}], "left": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "right": [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}]}

Operations that combine two arrays of objects on a key.

`equi_join(other, leftKey, rightKey, fn?)`

Signature: Array<L>, Array<R>, KeyL, KeyR, ((L, R) -> Any)? -> Array<Any>
Behavior: Inner equi-join: for every pair (l, r) where l[leftKey] == r[rightKey], emit a result. If fn is omitted, the result is the merged object l.merge(r).

LEFT:   [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}]
RIGHT:  [{"uid":1,"role":"admin"},{"uid":2,"role":"user"}]

QUERY:  $.left.equi_join($.right, "id", "uid")
OUT:    [{"id":1,"name":"Ada","uid":1,"role":"admin"},
         {"id":2,"name":"Bob","uid":2,"role":"user"}]

QUERY:  $.left.equi_join($.right, "id", "uid", (l, r) => {
          name: l.name,
          role: r.role
        })
OUT:    [{"name":"Ada","role":"admin"},{"name":"Bob","role":"user"}]

Worked example: orders + customers

DOC:
{
  "customers": [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"}
  ],
  "orders": [
    {"customer": 1, "amount": 100},
    {"customer": 1, "amount": 50},
    {"customer": 2, "amount": 75}
  ]
}

QUERY:
  $.orders.equi_join($.customers, "customer", "id", (o, c) => {
    customer: c.name,
    amount: o.amount
  })

OUT:
  [
    {"customer":"Ada","amount":100},
    {"customer":"Ada","amount":50},
    {"customer":"Bob","amount":75}
  ]

Notes and limitations

Inner only. No outer joins. For "all left, fill missing right with null" you can hand-roll:
```
$.left.map(l =>
  l.merge($.right.find(@.uid == l.id).or({role: null}))
)
```
Equality only. No range, prefix, or function joins.

One key on each side. For multi-key joins, project a tuple key first:

$.left.map(l => l.merge({_k: [l.a, l.b]}))
     .equi_join($.right.map(r => r.merge({_k: [r.x, r.y]})), "_k", "_k")

The implementation builds a hash table on the right side and streams the left. If either side is large and only a subset matters, pre-filter it before joining.
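
The build-then-probe strategy described above is a classic hash join. The Python sketch below models it under simple assumptions (join keys are hashable and present on every row); `equi_join_model` is a hypothetical name and this is not jetro's implementation.

```python
from collections import defaultdict

def equi_join_model(left, right, left_key, right_key, fn=None):
    """Inner equi-join: hash the right side once, then stream the left."""
    index = defaultdict(list)
    for r in right:                       # build phase, O(m)
        index[r[right_key]].append(r)
    combine = fn or (lambda l, r: {**l, **r})  # default: merged object
    # Probe phase, O(n): left rows with no partner produce nothing.
    return [combine(l, r) for l in left for r in index.get(l[left_key], [])]

left = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
right = [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}]

print(equi_join_model(left, right, "id", "uid"))
print(equi_join_model(left, right, "id", "uid",
                      lambda l, r: {"name": l["name"], "role": r["role"]}))
```

The third left row has no partner on the right and is dropped, which is the inner-join behavior noted above.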

When to choose join vs. lookup

For "many left rows, lookup one field on each":

$.orders.map(o => o.merge({customer_name: $.customers.find(@.id == o.customer).name}))

This nested find is O(n×m) — fine for small data. For large data, use equi_join (O(n+m)) or build a lookup table first:

let by_id = $.customers.index_by(@.id) in
  $.orders.map(o => o.merge({customer_name: by_id[o.customer].name}))

Practical examples

# Enrich orders with customer info
$.orders.equi_join($.customers, "customer_id", "id")

# Custom result shape
$.orders.equi_join($.customers, "customer_id", "id", (o, c) => {
  order_id: o.id,
  total: o.amount,
  buyer: c.name,
  email: c.email
})

# Self-join: pair records that share a key
$.events.equi_join($.events, "kind", "kind", (a, b) => {a, b})

# Multi-key join via tuple projection
let lk = $.left.map(l => l.merge({_k: f"{l.a}-{l.b}"})) in
  let rk = $.right.map(r => r.merge({_k: f"{r.x}-{r.y}"})) in
    lk.equi_join(rk, "_k", "_k")

# Filter-then-join (drop rows before paying join cost)
$.orders.filter(@.status == "paid").equi_join($.customers, "cid", "id")

Chained Pipelines

Real-world queries assembled from the building blocks. Each recipe uses one small document and shows the query chain plus a sentence on what the planner does.

1. Top-N by aggregate

DOC:    {"sales": [
  {"region": "NA", "amount": 100},
  {"region": "EU", "amount": 200},
  {"region": "NA", "amount": 50},
  {"region": "AS", "amount": 300},
  {"region": "EU", "amount": 75}
]}

QUERY:  $.sales
          .group_by(@.region)
          .entries()
          .map(e => {region: e[0], total: e[1].map(@.amount).sum()})
          .sort(@.total)
          .reverse()
          .take(2)

OUT:    [{"region":"AS","total":300},{"region":"EU","total":275}]

group_by and sort are barriers; take(2) after the sort doesn't help — the sort must complete first. Push the demand earlier where possible.
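
As a cross-check of the recipe's result, the same pipeline can be written out step by step in Python. This is only a model of what the query computes, with a hypothetical `top_regions` helper; it does not describe how the planner executes it.

```python
from collections import defaultdict

def top_regions(sales, n):
    """group_by region, sum amounts, sort descending, keep the first n."""
    totals = defaultdict(int)
    for s in sales:                      # grouping needs every input row
        totals[s["region"]] += s["amount"]
    rows = [{"region": r, "total": t} for r, t in totals.items()]
    # The full sort runs before anything is taken, as the note above says.
    return sorted(rows, key=lambda r: r["total"], reverse=True)[:n]

sales = [
    {"region": "NA", "amount": 100},
    {"region": "EU", "amount": 200},
    {"region": "NA", "amount": 50},
    {"region": "AS", "amount": 300},
    {"region": "EU", "amount": 75},
]
print(top_regions(sales, 2))
```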

2. Active users + role-based count

DOC:    {"users": [
  {"id":1,"role":"admin","active":true},
  {"id":2,"role":"user","active":false},
  {"id":3,"role":"user","active":true},
  {"id":4,"role":"admin","active":true}
]}

QUERY:  $.users
          .filter(@.active)
          .count_by(@.role)

OUT:    {"admin":2,"user":1}

Streaming filter + barrier count_by. The filter passes only what's needed; count_by buffers but with ValueNeed::Predicate (only the role key) — the rest of the user object is never decoded.

3. Histogram of word frequency

DOC:    {"text": "the quick brown fox jumps over the lazy dog the end"}

QUERY:  $.text
          .words()
          .map(@.lower())
          .count_by(@)

OUT:    {"the": 3, "quick": 1, "brown": 1, ...}

4. Customer order summary

QUERY:  $.orders
          .group_by(@.customer_id)
          .entries()
          .map(e => {
            customer_id: e[0],
            total: e[1].map(@.amount).sum(),
            count: e[1].count(),
            recent: e[1].sort(@.date).last().date
          })
          .sort_by(@.total)
          .reverse()

The inner .sort(@.date).last() is wasteful: it sorts every group to grab the last. Rewrite with max_by:

QUERY:  ...
          .map(e => {
            customer_id: e[0],
            total: e[1].map(@.amount).sum(),
            count: e[1].count(),
            recent: e[1].max_by(@.date).date
          })

5. Unique recent active sessions

QUERY:  $.events
          .filter(@.kind == "login" and @.at >= "2026-01-01")
          .map(@.user_id)
          .unique()
          .count()

6. Pretty-print a CSV from objects

QUERY:  $.users
          .filter(@.active)
          .map(u => u.pick(id: id, name: full_name, email))
          .sort(@.id)
          .to_csv()

7. Find a needle in a deep document

QUERY:  $..find(@.id == 42)

If the document was loaded from bytes (default), this hits the structural index — no full traversal.

8. Compute deltas with `pairwise`

DOC:    {"prices": [100, 105, 102, 110, 108]}

QUERY:  $.prices.pairwise().map(p => p[1] - p[0])
OUT:    [5,-3,8,-2]
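
A quick way to verify the deltas is to model pairwise as adjacent pairs in Python. This is a sketch, and `pairwise_model` is a hypothetical name.

```python
def pairwise_model(xs):
    """Adjacent pairs: [x0, x1], [x1, x2], ... (one fewer than the input)."""
    return [[a, b] for a, b in zip(xs, xs[1:])]

prices = [100, 105, 102, 110, 108]
deltas = [b - a for a, b in pairwise_model(prices)]
print(deltas)
```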

9. Rolling 3-point moving average

QUERY:  $.measurements.rolling_avg(3)

The first two outputs are null until the window fills.
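
The null-until-full behavior can be modelled in Python as follows. This sketch assumes a trailing window (the current element plus the n-1 before it) and a plain arithmetic mean; `rolling_avg_model` is a hypothetical name, not jetro's code.

```python
def rolling_avg_model(xs, n):
    """Trailing n-point mean; positions before the window fills are None."""
    out = []
    for i in range(len(xs)):
        if i + 1 < n:
            out.append(None)               # window not yet full
        else:
            window = xs[i + 1 - n : i + 1]
            out.append(sum(window) / n)
    return out

print(rolling_avg_model([10, 20, 30, 40], 3))
```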

10. Build a lookup, then enrich

QUERY:  let by_id = $.users.index_by(@.id) in
          $.events.map(e => e.merge({user: by_id[e.user_id].name}))

index_by is a barrier that runs once; the .map streams.

11. Select rows with all required fields

# missing() is broken in v0.5 (see Membership and Predicates), so test each key with has
QUERY:  $.records.filter(r => r.has("id") and r.has("name") and r.has("email"))

12. Re-shape a long-format table

DOC:    [
  {"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},
  {"y":2025,"q":1,"v":15},{"y":2025,"q":2,"v":25}
]
QUERY:  $.pivot("y", "q", "v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15,"2":25}}

13. Mask sensitive fields

QUERY:  $.users.map(u => u.omit("password", "ssn", "token"))

14. Delta + cumulative sum

QUERY:  $.daily.pairwise().map(p => p[1].value - p[0].value)

Cumulative-sum form (.accumulate(0, (a, x) => a + x)) isn't yet wired up in v0.5 — see the Limitations page. Until then, cummax / cummin cover running min/max; full fold needs a host loop.

15. Migrate a document shape

⚠ rec is unstable in v0.5 (fixpoint loop bug). For now, prefer walk / walk_pre with a manual shape check, or do the rewrite host-side.

QUERY (planned, currently broken):
  $.rec({type: "v1"}, doc =>
    doc.merge({type: "v2"})
       .rename({old_field: "new_field"})
       .omit("legacy_blob"))

rec walks the document, finds every node matching the shape, and rewrites in place.

Pattern Match Cookbook

Fixture

Examples below run against:

DOC:    {"xs": [1, 2, 3, 4, 5], "row": {"k": "foo", "data": {"a": 1, "b": 2}}, "doc": {"a": 1, "b": 2, "type": "v1"}, "tree": {"x": 1, "children": [{"x": 2}]}, "value": 3.14}

Pattern matching is one of jetro's most expressive features. It compiles to a Maranget decision tree at lower-time and runs over all three execution domains (Val, borrowed View, tape).

Anatomy

match scrutinee with {
  pattern1 -> expr1,
  pattern2 when guard -> expr2,
  _ -> default
}

Arms checked top-down.
First match wins.
_ is the universal fallback.
when guards run after the structural match succeeds.

Pattern reference

Pattern	Matches
`42`, `"x"`, `true`, `null`	Equal literal
`_`	Anything
`name`	Anything, binds to `name`
`1..10`	Number ≥ 1 and < 10
`1..=10`	Number ≥ 1 and ≤ 10
`{k: p, ...}`	Object with key `k`, value matches `p`
`[p1, p2]`	Array of length 2
`[h, ...t]`	Head + tail
`p1 | p2`	Either
`x: number`	Kind-bind

v0.5 note: object shorthand {id, name} binds each key to a same-name local, and rest-capture is spelled ...*rest (object) or ...tail (array): {id, name, ...*rest}, [h, ...tail]. See Limitations for the canonical pattern grammar.

1. Discriminated union

match $.event with {
  {kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
  {kind: "key",   code: c}       -> f"key:{c}",
  {kind: "scroll", dy: d}        -> f"scroll:{d}",
  _ -> "unknown"
}

In v0.5 every object pattern key needs an explicit key: binding form; the bare {kind: "click", x, y} shorthand fails to parse.

2. Numeric ranges

match $.score with {
  s when s < 0 -> "invalid",
  0..50 -> "low",
  50..80 -> "medium",
  80..=100 -> "high",
  _ -> "out of range"
}
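
The boundary behavior of the half-open and inclusive ranges, together with first-match-wins ordering, can be checked with a small Python model. It is a sketch of the arm semantics listed in the pattern reference, not of the compiled decision tree; `classify` is a hypothetical name.

```python
def classify(score):
    """Arms are tried top-down; the first one whose test passes wins."""
    arms = [
        (lambda s: s < 0, "invalid"),          # s when s < 0
        (lambda s: 0 <= s < 50, "low"),        # 0..50   (upper bound excluded)
        (lambda s: 50 <= s < 80, "medium"),    # 50..80  (upper bound excluded)
        (lambda s: 80 <= s <= 100, "high"),    # 80..=100 (upper bound included)
    ]
    for test, label in arms:
        if test(score):
            return label
    return "out of range"                      # the _ arm

for s in (-1, 0, 50, 80, 100, 101):
    print(s, classify(s))
```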

3. Or-patterns

match $.day with {
  "sat" | "sun" -> "weekend",
  _ -> "weekday"
}

4. Rest capture

⚠ Not yet supported in v0.5. The ..rest pattern parse-errors. Bind the keys you care about explicitly and compute rest outside the match if needed:

match $.config with {
  {host: h, port: p} -> {host: h, port: p, extras: $.config.omit("host", "port")},
  _ -> null
}

5. Array shape

match $.coords with {
  [x, y] -> {x, y},
  [x, y, z] -> {x, y, z},
  _ -> null
}

6. Head + tail

match $.xs with {
  [] -> "empty",
  [first, ...rest] -> f"head={first}, count={rest.count()}",
}

7. Kind-bound + guard

match $.value with {
  s: string when s.len() > 100 -> "long string",
  s: string -> "short string",
  n: number when n > 0 -> "positive",
  n: number -> "non-positive",
  _: array -> "array",
  _ -> "other"
}

8. Deep match (`..match`)

Walk every descendant; collect results.

$.tree..match {
  {kind: "leaf", value} -> value,
  _ -> null
} | .compact()

The trailing .compact() drops the nulls from non-leaf descendants.

9. First-match deep (`..match!`)

Stops at the first match — the bang variant uses early termination via the structural index where possible.

$.tree..match! {
  {role: "admin", id} -> id,
  _ -> null
}

10. Migration / rewrite (`rec`)

$.doc.rec({type: "v1"}, node => node.merge({type: "v2"}))

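rec is the recursive sibling of match: it descends and rewrites every matching node. It is unstable in v0.5 (see the rec entry under Deep Traversal and Recursion), so treat this as the planned form.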

11. Many arms, shared discriminant

When multiple arms test the same prefix ({kind: "x", ...}, {kind: "y", ...}), the lowering shares the discriminant test. You don't write anything special; the planner does it for you. Practically: write many narrow arms; they cost about as much as one big switch.

12. Guards over deep patterns

match $.row with {
  {user: {age, role: "admin"}} when age >= 18 -> "adult admin",
  {user: {age}} when age < 18 -> "minor",
  _ -> "other"
}

Bench tips

Patterns with literal-only discriminants (no guards) compile to switch-like decision trees and run as fast as a hand-written if/else if.
Guards add a per-arm conditional; cheap, but don't put expensive computation in them.
Deep ..match over a large doc benefits a lot from the structural index; deep ..match! (first-match) is even better.

Write Fusion

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

When a query contains multiple chain-writes, jetro fuses them into a single pass over the document. This is the patch-fusion optimizer.

What gets fused

Any sequence of chain-write terminals on the same document:

$.user.name.set("Ada")
   .user.email.set("ada@x.com")
   .user.tags.append("admin")

Or the equivalent block form (preferred for many writes):

patch $ {
  user.name: "Ada",
  user.email: "ada@x.com",
  user.tags[*]: "admin"
}

Without fusion

Naively, three writes mean three traversals from $:

$ → user → name      (write)
$ → user → email     (write)
$ → user → tags[*]   (write)

Each rebuilds the path from the root. For deeply-nested documents, the cost adds up.

With fusion

The optimizer collects effects, walks the document once, and applies all relevant rewrites at each visited node:

$ → user → {set name, set email, append tags}

Three writes, one walk.
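
The idea of collecting writes first and then walking once can be sketched in Python by merging the write paths into a nested plan. This is a simplified model under stated assumptions (plain object keys only, leaf "set" writes only, no wildcards or conditions), not the actual patch-fusion pass; `build_plan` and `apply_plan` are hypothetical names.

```python
def build_plan(writes):
    """Merge (path, value) writes into one nested plan keyed by path segment."""
    plan = {}
    for path, value in writes:
        *parents, leaf = path
        node = plan
        for seg in parents:
            node = node.setdefault(seg, {})
        node[leaf] = ("set", value)   # a later write to the same path wins
    return plan

def apply_plan(doc, plan, visited):
    """Apply the plan in one recursive walk; record each object descended into."""
    out = dict(doc)                   # shallow copy; untouched subtrees are shared
    for key, step in plan.items():
        if isinstance(step, tuple):
            out[key] = step[1]
        else:
            visited.append(key)
            out[key] = apply_plan(doc.get(key, {}), step, visited)
    return out

doc = {"user": {"id": 42, "name": "A"}, "config": {"theme": "dark"}}
writes = [
    (("user", "name"), "Ada"),
    (("user", "email"), "ada@x.com"),
    (("user", "name"), "Ada Lovelace"),   # same path again: last write wins
]
visited = []
result = apply_plan(doc, build_plan(writes), visited)
print(result, visited)
```

Three writes descend into `user` once, and the `config` subtree is carried over by reference rather than copied.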

Phases

The patch-fusion pass has internal phases (Phase C, Phase E in the source); the user-visible properties are:

Same-base writes group together. Writes under $.user.* batch.
Disjoint paths don't interfere. Writes to $.user.name and $.config.theme execute in one walk but at different nodes.
Conflicts are resolved last-wins. Two writes to the same path: the later one wins.
Conditional writes (when) are evaluated per-write. They short-circuit per clause; the walk doesn't redo work.

Worked example

DOC:
{
  "users": [
    {"id": 1, "name": "Ada", "active": false},
    {"id": 2, "name": "Bob", "active": true}
  ]
}

QUERY:
patch $ {
  users[*].active: true,                        # broadcast write
  users[0].name: "Ada Lovelace",                # specific write
  users[*].last_seen: "2026-05-08" when .active # conditional broadcast
}

What happens:

One walk visits every user.
For each, three potential writes evaluate. Per element:
- active: true always applies.
- name only at index 0.
- last_seen only when active is true after the first write has applied, so it lands on every element.

Output:

{
  "users": [
    {"id": 1, "name": "Ada Lovelace", "active": true, "last_seen": "2026-05-08"},
    {"id": 2, "name": "Bob",          "active": true, "last_seen": "2026-05-08"}
  ]
}
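The same per-element evaluation can be written out by hand. The sketch below assumes a hypothetical `User` struct that mirrors the document above; it shows the order of evaluation inside one loop, not how jetro executes the patch.

```rust
#[derive(Debug, Clone, PartialEq)]
struct User {
    id: u32,
    name: String,
    active: bool,
    last_seen: Option<String>,
}

/// One pass over the array; all three writes are considered at each element.
fn fused_patch(users: &mut [User]) {
    for (i, u) in users.iter_mut().enumerate() {
        // Broadcast write: applies to every element.
        u.active = true;
        // Specific write: applies only at index 0.
        if i == 0 {
            u.name = "Ada Lovelace".to_string();
        }
        // Conditional broadcast: the guard reads `active` after the write above.
        if u.active {
            u.last_seen = Some("2026-05-08".to_string());
        }
    }
}
```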

When fusion doesn't fire

The chain isn't rooted at $ (parser doesn't classify it as a write).
The writes are gated by data-dependent conditions that change document shape mid-pipeline.
Mixed read/write — $.users[0].name.set("A").upper() keeps standard method semantics.

Tips

Prefer the block form (patch $ { … }) when you have ≥ 3 writes — easier to read, and the optimizer treats it identically.
Use broadcast (xs[*].field: v) instead of a .map that calls .set per element.
Conditionals (when) are fine — they don't break fusion.

jq vs jetro Cheatsheet

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}]}

For users coming from jq. The job is the same (query JSON from a terminal), but the philosophy differs in places; this page calls out the differences where they matter.

Big differences at a glance

Topic	jq	jetro
Calling methods	Pipe-of-filters: `. | length`	Dot syntax: `.len()`
Pipe `|`	Sole composition operator	Value-flow only: passes `@` to RHS
Iteration	Implicit on `.[]`	Explicit on chained methods
Lambdas	None — uses `.` rebinding	Three forms: `@`, `r =>`, `lambda r:`
Pattern matching	None	First-class with guards and ranges
Writes	`|=`, `=`, `del()`	`.set()`, `patch $ {}`, chain-writes
Backend	Single interpreter	Six backends, planner-selected
Caching	None	Plan + path caches in `JetroEngine`

One-liner translations

Identity / projection

jq:     .
jetro:  $

jq:     .x
jetro:  $.x

jq:     .x.y[0]
jetro:  $.x.y[0]

Iteration

jq:     .users[]
jetro:  $.users[*]                  # explicit; or just .users for chained methods

jq:     .users[].name
jetro:  $.users.map(@.name)

Field selection / projection

jq:     {id, name}
jetro:  .pick(id, name)            # method form, maps over arrays

jq:     .users | map({id, name})
jetro:  $.users.map(u => u.pick(id, name))
        # or
        $.users.pick(id, name)

jq:     del(.password)
jetro:  $.omit(password)            # or $.password.delete()

Filter

jq:     .users | map(select(.active))
jetro:  $.users.filter(@.active)

jq:     .users[] | select(.age > 18)
jetro:  $.users.filter(@.age > 18)

Aggregates

jq:     length
jetro:  .len()                      # for arrays, objects, strings
        .count()                    # explicit array-count reducer

jq:     [.[] | .price] | add
jetro:  $.map(@.price).sum()

jq:     [.[] | .age] | min
jetro:  $.map(@.age).min()
        # or
        $.min_by(@.age).age           # one-pass, returns whole element

Sort / unique / group

jq:     sort
jetro:  .sort()

jq:     sort_by(.year)
jetro:  .sort(@.year)

jq:     unique
jetro:  .unique()

jq:     group_by(.author)
jetro:  .group_by(@.author)
        # jq returns array-of-arrays; jetro returns object indexed by key

jq:     [group_by(.k)[] | {k: .[0].k, n: length}]
jetro:  .count_by(@.k).entries().map(([k,n]) => {k, n})

Slice and take

jq:     .[0:3]
jetro:  $[0:3]

jq:     .[0]
jetro:  $[0]
        # or
        $.first()                    # demand-aware sink

jq:     .[-1]
jetro:  $[-1]
        # or
        $.last()

Has / index / membership

jq:     has("foo")
jetro:  .has("foo")

jq:     .tags | index("admin")
jetro:  $.tags.index("admin")

jq:     .tags | contains(["admin"])
jetro:  $.tags.includes("admin")

Strings

jq:     ascii_upcase
jetro:  .upper()

jq:     ltrimstr("foo")
jetro:  .strip_prefix("foo")

jq:     split(",")
jetro:  .split(",")

jq:     test("regex")
jetro:  @ ~= "regex"
        # or
        .re_match("regex")

jq:     match("(\\d+)").captures
jetro:  .captures("(\d+)")

Recursive descent

jq:     ..
jetro:  ..                           # same notation

jq:     .. | strings
jetro:  $..find(@ is string)

jq:     .. | objects | select(.id?)
jetro:  $..find(@.id?)
        # or
        $..shape({id})

String formatting

jq:     "Hello, \(.name)!"
jetro:  f"Hello, {$.name}!"

Conditional

jq:     if .x > 5 then "big" else "small" end
jetro:  "big" if $.x > 5 else "small"

jq:     .x // "default"
jetro:  $.x ?? "default"

Variables

jq:     . as $doc | $doc.x + $doc.y
jetro:  let doc = $ in doc.x + doc.y

Reduce / fold

jq:     reduce .[] as $x (0; . + $x)
jetro:  $.sum()                      # for sum specifically
        # or general fold:
        $.accumulate(0, (a, x) => a + x).last()

Object construction

jq:     {users: [.[] | {id, name}]}
jetro:  {users: $.map(u => u.pick(id, name))}

Modification

jq:     .x = 1
jetro:  $.x.set(1)
        # or
        patch $ {x: 1}

jq:     .x |= . + 1
jetro:  $.x.modify(@ + 1)

jq:     del(.x)
jetro:  $.x.delete()

jq:     .users[].active = true
jetro:  $.users[*].active.set(true)
        # or
        patch $ {users[*].active: true}

Multiple writes

jq:     .x = 1 | .y = 2 | del(.z)
jetro:  patch $ {x: 1, y: 2, z: DELETE}

jetro fuses these into one document walk. jq evaluates each pipe stage independently.

Complex pipeline translations

Real-world jq queries from the wild. Originals are taken verbatim from the jq manual and the Programming Historian "Reshaping JSON with jq" lesson; all credit to those sources. Each shows the original jq alongside an idiomatic jetro rewrite.

1. Alternative-binding destructure (jq manual)

Flatten a list of resources whose events field may be either a single object or an array of objects, into one row per (resource, event) pair. jq uses its alternative-destructuring operator ?// to try both shapes:

.resources[] as {$id, $kind, events: {$user_id, $ts}} ?// {$id, $kind, events: [{$user_id, $ts}]}
  | {$user_id, $kind, $id, $ts}

jetro has no ?//. Use kind-test + flat_map to normalise:

$.resources.flat_map(r =>
  let evts = (r.events if r.events is array else [r.events]) in
    evts.map(e => {
      user_id: e.user_id,
      kind:    r.kind,
      id:      r.id,
      ts:      e.ts
    })
)

…or with a match to make the two shapes explicit:

$.resources.flat_map(r =>
  match r.events with {
    arr: array -> arr.map(e => {user_id: e.user_id, kind: r.kind, id: r.id, ts: e.ts}),
    {user_id, ts} -> [{user_id, kind: r.kind, id: r.id, ts}],
    _ -> []
  }
)

The match form is more explicit and surfaces the "single object" branch as its own arm — easier to extend (e.g. add a third event-shape later).
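For readers who think in Rust, the match-based normalisation corresponds to a sum type with one arm per event shape. The sketch below is an analogy under assumed types (`Event`, `Events`, `Resource`, and `Row` are all hypothetical); it is not jetro code.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Event {
    user_id: u32,
    ts: u64,
}

/// The shapes the `events` field may take, one variant per match arm.
#[derive(Debug, Clone)]
enum Events {
    One(Event),
    Many(Vec<Event>),
    Missing,
}

#[derive(Debug, Clone)]
struct Resource {
    id: u32,
    kind: &'static str,
    events: Events,
}

#[derive(Debug, PartialEq)]
struct Row {
    user_id: u32,
    kind: &'static str,
    id: u32,
    ts: u64,
}

/// Emit one row per (resource, event) pair, whatever shape `events` has.
fn flatten(resources: &[Resource]) -> Vec<Row> {
    resources
        .iter()
        .flat_map(|r| {
            // Normalise every shape to a list of borrowed events.
            let evts: Vec<&Event> = match &r.events {
                Events::Many(v) => v.iter().collect(),
                Events::One(e) => vec![e],
                Events::Missing => Vec::new(),
            };
            evts.into_iter().map(move |e| Row {
                user_id: e.user_id,
                kind: r.kind,
                id: r.id,
                ts: e.ts,
            })
        })
        .collect()
}
```

Adding a third event shape means adding one variant and one arm, which is the extensibility argument made above.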

2. Tweet hashtags as semicolon-joined CSV (Programming Historian)

Take an array of tweets, project id plus a semicolon-joined string of hashtag texts, emit as CSV. Original jq, threaded through five pipe stages:

{id: .id, hashtags: .entities.hashtags}
| {id: .id, hashtags: [.hashtags[].text]}
| {id: .id, hashtags: .hashtags | join(";")}
| [.id, .hashtags]
| @csv

Each pipe stage rebuilds the object — jq has no nested method chaining, so projection accumulates by reassignment.

jetro collapses it to one chain:

$.map(t => {
  id:       t.id,
  hashtags: t.entities.hashtags.map(@.text).join(";")
}).to_csv()

to_csv already emits the row, headers and all. To match jq's headerless output:

$.map(t => [t.id, t.entities.hashtags.map(@.text).join(";")])
 .map(row => row.map(@.to_string()).join(","))
 .join("\n")

3. Hashtag frequency CSV (Programming Historian)

Explode each tweet into one row per hashtag, group by hashtag, count, emit (tag, count) as CSV. Original jq:

[.[] | {id: .id, hashtag: .entities.hashtags} | {id: .id, hashtag: .hashtag[].text}]
| group_by(.hashtag)
| .[]
| {tag: .[0].hashtag, count: . | length}
| [.tag, .count]
| @csv

jq's group_by returns an array-of-arrays, so the trailing .[] and .[0].hashtag extract the key from the first element of each group.

jetro uses count_by, which already produces a {tag: count} map:

$.flat_map(t => t.entities.hashtags.map(@.text))
 .count_by(@)
 .entries()
 .map(([tag, count]) => {tag, count})
 .to_csv()

The pipeline reads top-to-bottom: explode → tally → reshape → emit. count_by is one of several jetro idioms (also index_by, unique_by, max_by) that fold a common jq pattern (group_by | map(...)) into a single barrier.
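As a rough analogy, the explode, tally, and reshape steps look like this in plain Rust. The function name and input shape are assumptions for illustration (each tweet is reduced to its list of hashtag texts), and a `BTreeMap` is used only to make the output order deterministic; the order jetro's count_by produces is not specified here.

```rust
use std::collections::BTreeMap;

/// Explode tweets into hashtags, tally each tag, and reshape into
/// (tag, count) pairs sorted by tag.
fn hashtag_counts(tweets: &[Vec<&str>]) -> Vec<(String, usize)> {
    let mut tally: BTreeMap<String, usize> = BTreeMap::new();
    // Explode and tally in a single pass over every hashtag.
    for tag in tweets.iter().flatten() {
        *tally.entry(tag.to_string()).or_insert(0) += 1;
    }
    // Reshape: map entries become rows.
    tally.into_iter().collect()
}
```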

Why these examples are shorter in jetro

Three patterns recur:

Method chaining. jq's ... | {...} | {...} style rebuilds the object at each stage; jetro's .map(t => {...}) builds it once.
Specialised barriers. count_by, index_by, unique_by, max_by, min_by collapse group_by | map(...) chains into one call.
First-class lambdas. jq's . rebinding inside as / [] becomes plain t => t.field in jetro, with no positional gymnastics.

The trade-off: jq's pipe-of-filters is more uniform — every stage is a filter that takes one input and produces zero-or-more outputs. jetro's methods are typed (one-to-one, filter, expander, reducer, barrier), so the pipeline shape is more visible but the surface is bigger.

Things jq has that jetro doesn't

@base64, @uri, @csv formatters as suffix. jetro spells these as methods: .to_base64(), .url_encode(), .to_csv().
SQL-style modules. No equivalent.
input, inputs, nul-separated streaming. jetro is in-process; no streaming-input model.
recurse(f; cond). Use walk_pre or rec with a pattern.

Things jetro has that jq doesn't

Pattern matching with guards, ranges, kind binding, deep ..match.
Demand propagation. .first(), .find(), .take(n) cut off the source; no full materialization.
Bitmap structural index. ..find, ..shape, ..like skip non-matching subtrees in O(1) per node.
First-class lambdas (r => body, lambda r: body) with let-binding + inlining.
Write fusion. Many writes batch into one walk.
Backends. Tape-zero-copy, structural index, columnar — selected by the planner.

Pitfalls when porting

.[] doesn't exist. Replace with [*] or just chain methods (most jetro methods auto-iterate over arrays).
Pipe is not composition. .x | .y in jq means "x then y". In jetro it's "evaluate .y with @ = .x". For chaining methods, use dot syntax: .x.y().
Method calls need parens. length is .len(), not .len.
select(p) becomes filter(p), and works on whole arrays — no need to first iterate with .[].
group_by returns an object, not an array of arrays. Use .entries() for jq-shaped output.
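The difference between the two group_by result shapes is easy to see in a small Rust analogy. This is a sketch with hypothetical function names and (author, title) tuples as input; it models the shapes only, not either tool's implementation.

```rust
use std::collections::BTreeMap;

/// Object-shaped grouping: key -> members, as jetro's group_by is described.
fn group_by_author<'a>(books: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut groups: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (author, title) in books {
        groups.entry(*author).or_default().push(*title);
    }
    groups
}

/// jq-shaped grouping: an array of arrays ordered by key, with the keys dropped.
fn jq_shape<'a>(groups: BTreeMap<&'a str, Vec<&'a str>>) -> Vec<Vec<&'a str>> {
    groups.into_values().collect()
}
```

With the object shape the key is available directly; with the jq shape it has to be recovered from the first member of each group, which is what `.[0].k` does in the jq examples above.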

Quick reference card

Need	jq	jetro
Project	`{a, b}`	`.pick(a, b)`
Drop key	`del(.k)`	`.omit(k)`
Filter	`select(p)`	`.filter(p)`
Map	`map(f)`	`.map(f)`
Iterate	`.[]`	`[*]` or implicit
Length	`length`	`.len()`
Sort	`sort_by(.k)`	`.sort(@.k)`
Unique	`unique`	`.unique()`
First	`.[0]`	`.first()`
Last	`.[-1]`	`.last()`
String concat	`"\(.x)"`	`f"{$.x}"`
Default	`// d`	`?? d`
If	`if c then a else b end`	`a if c else b`
Var	`as $x`	`let x = ...`
Set	`.x = v`	`.x.set(v)`
Update	`.x |= f`	`.x.modify(f)`
Delete	`del(.x)`	`.x.delete()`

Performance Guide

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "rows": [{"age": "30", "price": "3.14"}]}

How to write jetro queries that the planner can run fast, and how to read the benchmarks.

Mental model

Jetro picks one of six backends per pipeline node. Fast paths share three properties:

The source is a path of pure field accesses. $.a.b.c triggers tape backends (zero-copy over simd-json output).
The pipeline ends in a sink that bounds demand. .first(), .take(n), .find(p), .count() propagate backward and gate source reads.
No mid-pipeline materialization. .collect(), .sort(), .group_by() flush the tape access pattern back to a Val walk.

If you write to those three rules, queries land on the fast path automatically.

Backend selection (cheat-sheet)

Source / shape	Primary backend
`$.a.b.c` (field-chain)	tape-view (zero-copy)
`$..find(...)`, `$..shape({...})`	bitmap structural index
Single `$.a.b` (path only)	tape-path
Generic expr / lambda body	fast-children
Any backend declines	interpreted (universal fallback)

You don't pick — the planner does. Knowing the table tells you why a query is fast.

Demand: the killer feature

Every demand-aware sink lets the source skip work. Concrete impact:

Pattern	Speedup vs. naive
`xs.first()`	~N× (reads 1 element)
`xs.find(p)`	up to ~N× (stops at first match)
`xs.filter(p).take(k)`	up to N/k×
`xs.count()`	2-5× (no payload decoded)
`xs.sum()`, `xs.avg()`	2-3× (only numeric leaves)
`xs.last()` (random-access source)	~N× (seek to end)
`xs.reverse().take(k)`	rewritten to `LastInput(k)`
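Rust's lazy iterators show the same effect in miniature: a bounded sink stops pulling from the source once it has enough output. The sketch below counts how many source elements are actually read by a filter-then-take chain. It is an analogy for demand propagation, not jetro's planner; the function name and the even-number predicate are arbitrary choices.

```rust
use std::cell::Cell;

/// Return the first `k` even numbers in `xs`, plus the number of source
/// elements that had to be read to find them.
fn first_k_even(xs: &[i64], k: usize) -> (Vec<i64>, usize) {
    let reads = Cell::new(0usize);
    let out: Vec<i64> = xs
        .iter()
        .inspect(|_| reads.set(reads.get() + 1)) // count every element pulled
        .filter(|x| **x % 2 == 0)
        .take(k) // bounded sink: stops pulling after k outputs
        .copied()
        .collect();
    (out, reads.get())
}
```

On the input 1..=100 with k = 3, the chain reads six elements and never touches the other 94.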

For wide objects, field projection is the other big win:

$.users.map(u => u.pick(id, name))

The source decodes only id and name per row. Other fields stay as raw tape tokens.

What kills performance

Mid-chain materialization

$.users
  .filter(@.active)
  .collect()                # unnecessary
  .map(@.email)

The .collect() forces a full pass before .map. Drop it.

Pre-sort barriers blocking demand

$.events.sort(@.sev).first()

.sort is a barrier: it must see every element, so the .first() cannot cut the source short. Rewrite with min_by:

$.events.min_by(@.sev)

One pass, no allocation of the sorted array.
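The same trade-off exists in ordinary Rust code, which makes the cost difference concrete. In this sketch `Event` and its `key` field are hypothetical stand-ins for whatever the sort key is.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Event {
    key: u64,
    msg: &'static str,
}

/// Barrier version: copy, sort everything, then take the first element.
fn min_via_sort(events: &[Event]) -> Option<Event> {
    let mut sorted = events.to_vec(); // allocates a full copy
    sorted.sort_by_key(|e| e.key); // O(n log n), must see every element
    sorted.into_iter().next()
}

/// One-pass version: a single scan that tracks the running minimum.
fn min_one_pass(events: &[Event]) -> Option<&Event> {
    events.iter().min_by_key(|e| e.key)
}
```

Both return the same element; the one-pass version does it in O(n) without allocating.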

Per-element joins (O(n×m))

$.orders.map(o => o.merge({name: $.users.find(@.id == o.customer_id).name}))

Each find rescans $.users, so the join costs O(n×m). For large data, build a lookup once:

let by_id = $.users.index_by(@.id) in
  $.orders.map(o => o.merge({name: by_id[o.customer_id].name}))

Or use equi_join.
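The scan-versus-index difference is the familiar one from any language. A minimal Rust sketch, with hypothetical `User` and `Order` structs and an assumed `owner_id` foreign key:

```rust
use std::collections::HashMap;

struct User {
    id: u32,
    name: &'static str,
}

struct Order {
    id: u32,
    owner_id: u32,
}

/// O(n×m): every order rescans the user list.
fn join_by_scan(users: &[User], orders: &[Order]) -> Vec<(u32, Option<&'static str>)> {
    orders
        .iter()
        .map(|o| (o.id, users.iter().find(|u| u.id == o.owner_id).map(|u| u.name)))
        .collect()
}

/// Roughly O(n+m): build the index once, then one hash lookup per order.
fn join_by_index(users: &[User], orders: &[Order]) -> Vec<(u32, Option<&'static str>)> {
    let by_id: HashMap<u32, &'static str> = users.iter().map(|u| (u.id, u.name)).collect();
    orders
        .iter()
        .map(|o| (o.id, by_id.get(&o.owner_id).copied()))
        .collect()
}
```

Both functions return the same rows; only the amount of work per order differs.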

Repeated sub-expressions

$.user.profile.name + " <" + $.user.profile.email + ">"

Two tape walks of $.user.profile. Bind once:

let p = $.user.profile in
  f"{p.name} <{p.email}>"

Heavy lambdas in barriers

$.rows.unique_by(@.to_string())

unique_by calls the lambda once per row. If the projection is non-trivial (regex, deep traversal), pre-project once:

$.rows.map(r => r.merge({_k: r.to_string()}))
     .unique_by(@._k)
     .map(@.omit(_k))

Engine tuning

Plan cache

JetroEngine caches (query, context) → compiled pipeline. Default 256 entries, wholesale eviction.

For a small fixed query set with high doc volume — the typical web-server shape — every call after the first is a cache hit. Don't fight it.

For unique-per-call queries (CLI ad-hoc), the cache never hits; just use Jetro directly.
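To make "wholesale eviction" concrete, here is a minimal sketch of a bounded cache that clears itself entirely when a miss arrives at capacity. It is an assumption-laden toy: `PlanCache` is a hypothetical type, it is keyed by query text alone rather than (query, context), and a string stands in for the compiled pipeline.

```rust
use std::collections::HashMap;

struct PlanCache {
    capacity: usize,
    plans: HashMap<String, String>, // query text -> stand-in for a compiled plan
    compiles: usize,                // how many misses did full compile work
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, plans: HashMap::new(), compiles: 0 }
    }

    /// Return the cached plan, compiling on a miss. When the cache is full,
    /// every entry is dropped at once instead of evicting one at a time.
    fn get_or_compile(&mut self, query: &str) -> &str {
        if !self.plans.contains_key(query) {
            if self.plans.len() >= self.capacity {
                self.plans.clear(); // wholesale eviction
            }
            self.compiles += 1;
            self.plans.insert(query.to_string(), format!("plan({})", query));
        }
        &self.plans[query]
    }
}
```

A small fixed query set never reaches capacity, so every call after the first is a hit. A stream of unique queries pays the compile cost every time and periodically throws the whole map away.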

Path cache

The VM caches resolved pointer paths per document. The hash key includes both structure and primitive values bounded at depth 8 — so two docs with the same shape but different leaves stay distinct. You don't manage this.

simd-json (default)

The simd-json feature gives a roughly 4× faster cold start. Disable it only if you need to round-trip serde_json::Value and the conversion cost dominates.

Benchmarks

cargo bench -p jetro-core

The harness covers:

Field access ($.a.b.c) — tape-view zero-copy
Filter / map / take pipelines — demand propagation
Deep search (..find, ..shape) — bitmap structural index
Pattern match — Maranget tree
Lambda forms — @ vs. => vs. lambda parity
Write fusion — single vs. fused multi-writes

To compare your changes against main:

git checkout main
cargo bench -p jetro-core -- --save-baseline main
git checkout your-branch
cargo bench -p jetro-core -- --baseline main

Reading the output: with --baseline, criterion reports each benchmark's estimated change in time against the saved baseline, with a confidence interval. A regression above 5% should have a clear cause.

Profiling

For Rust workloads:

cargo bench -p jetro-core --bench <name> -- --profile-time 10

Then attach with samply or cargo flamegraph. Hot paths usually live in:

exec/pipeline/exec.rs — pipeline driver
exec/view/*.rs — borrowed view stages
exec/router.rs — backend selection
vm/exec.rs — bytecode VM (interpreted fallback)

If the interpreter (vm::execute) shows up hot, the planner is falling through to the universal fallback. Check the query — usually a non-$ source or a generic expr inside a method arg.

Quick checklist

Before benchmarking a query, ask:

Can .first() / .take() / .find() replace a full materialization?
Is there a barrier (sort, unique, group_by) before the bound? Push the bound earlier or use a one-pass equivalent (min_by, count_by).
Does a lookup repeat per row? Pre-build with index_by.
Are wide rows projected early with pick?
Are sub-expressions duplicated? Bind with let.
Is simd-json enabled (default)?
Is the same query run many times? Use JetroEngine.

If every item checks out, the query is on the fast path.

Public API and Engine

The full public surface of the jetro crate is two types and a handful of methods. Everything else is implementation detail.

`Jetro` — single-document handle

For one document, possibly many queries:

use jetro::Jetro;

let bytes = br#"{"x":[1,2,3]}"#;
let j = Jetro::from_bytes(bytes)?;          // lazy parse via simd-json tape
let v: serde_json::Value = j.collect("$.x.sum()")?;
assert_eq!(v, serde_json::json!(6));

Constructors

Method	Input	Notes
`Jetro::from_bytes(&[u8])`	Raw JSON bytes	Lazy parse — fastest path
`Jetro::from_value(serde_json::Value)`	Parsed value	Skip simd-json
`Jetro::from_val(Val)`	Internal `Val`	Advanced — re-using engine state

Methods

Method	Returns
`j.collect(query)`	`Result<serde_json::Value, EvalError>`
`j.collect_typed::<T>(query)`	`Result<T, EvalError>` (deserialize directly)

Jetro uses a thread-local VM with a path cache. A Jetro is cheap to construct, so when you move to a new document, drop the old handle and create a fresh one; that keeps the cache key valid.

`JetroEngine` — long-lived multi-doc handle

For many documents and many queries with overlap, share the plan/VM caches:

use jetro::JetroEngine;

let eng = JetroEngine::default();

for doc_bytes in inputs {
    let v = eng.collect_bytes(doc_bytes, "$.users.filter(@.active).count()")?;
    println!("{}", v);
}

Methods

Method	Input	Notes
`eng.collect(&doc, q)`	`&Val`	Document already in `Val` form
`eng.collect_value(serde_value, q)`	`serde_json::Value`	Round-trips
`eng.collect_bytes(&[u8], q)`	Raw bytes	Lazy parse

All three return Result<serde_json::Value, JetroEngineError>, a wider error type that may also wrap JSON-parse errors.

Configuration

Option	Default	Effect
Plan-cache capacity	256	Wholesale-evicted when full

The engine's plan cache amortises parse + lower + compile across calls. Hits are O(hash); misses do full work.

Errors

pub enum EvalError {
    /* … */
}

pub enum JetroEngineError {
    Json(serde_json::Error),
    Eval(EvalError),
}

Error messages include the query position when available.

Feature flags

Feature	Default	What it does
`simd-json`	on	Direct `bytes → Val` parse, skipping `serde_json::Value`
`fuzz_internal`	off	Re-exports parser + planner for fuzz harness — not stable

To disable simd-json:

[dependencies]
jetro = { version = "0.5", default-features = false }

Python binding

jetro_py exposes a collect(doc, query) function. Internals are identical to the Rust crate.

import jetro

result = jetro.collect({"x": [1,2,3]}, "$.x.sum()")
# result == 6

CLI

jetrocli '$.x.sum()' < input.json

The CLI is a thin wrapper around Jetro::from_bytes.

Threading

Jetro is Send + Sync: multiple threads can share one Jetro and run different read-only queries concurrently.
JetroEngine is Send + Sync and intended for shared-engine workloads.
The VM path-cache is thread-local, so each thread populates and reads its own cache.
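The practical consequence of a thread-local cache is that a path warmed on one thread is still cold on another. The sketch below demonstrates this with `std::thread_local!`; `PATH_CACHE`, `resolve`, and the "resolved value" (just the segment count) are hypothetical and only illustrate the mechanism.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::thread;

thread_local! {
    // Each thread gets its own independent map.
    static PATH_CACHE: RefCell<HashMap<String, usize>> = RefCell::new(HashMap::new());
}

/// Resolve a dotted path, returning (segment count, whether it was a cache hit).
fn resolve(path: &str) -> (usize, bool) {
    PATH_CACHE.with(|cache| {
        let mut cache = cache.borrow_mut();
        if let Some(v) = cache.get(path) {
            return (*v, true);
        }
        let v = path.split('.').count();
        cache.insert(path.to_string(), v);
        (v, false)
    })
}

/// Run the same lookup on a freshly spawned thread and report whether it hit.
fn hit_in_fresh_thread(path: &'static str) -> bool {
    thread::spawn(move || resolve(path).1).join().unwrap()
}
```

Sharing a handle across threads is therefore safe, but each thread pays its own warm-up cost.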

Stability

The query DSL is stable as of jetro 0.5.x.
The Rust API surface (Jetro, JetroEngine, error types) is stable.
BuiltinMethod, opcodes, IR types are internal and may change in any minor release.
The fuzz_internal feature is explicitly unstable.

Known Limitations and Behavior Surprises (v0.5)

Empirically validated against jetro 0.5.5. This page is the canonical fix-list — every entry is a known gap between intended and actual behavior. Use it as a backlog: items here should drop as the runtime catches up.

v0.5.5 — fixed in this release

The 14 audit-surfaced bugs were addressed, along with fixes from three follow-up sweeps:

✅ [*] wildcard parses (mid-chain expands to .map(@ + rest)).
✅ [a:b:c] and [::n] (incl. [::-1] reverse) — Python-style step slicing.
✅ Lambda array-pattern destructure ([k, v]) => body and rest form ([h, ...tail]) => body.
✅ Object patterns in match accept reserved words as keys ({kind: "click"}).
✅ Object pattern shorthand {id, name} ≡ {id: id, name: name} in match.
✅ Val::StrSlice + Val::Str → string concat. Path-rooted concat works.
✅ entries()/keys()/values() no longer triple-wrap their array result.
✅ parse_int(radix) — base-aware integer parsing with prefix stripping.
✅ to_csv(headers) / to_tsv(headers) — explicit header column ordering.
✅ accumulate(init, fn) and accumulate(fn) — both forms.
✅ partition(pred) — chained and standalone.
✅ approx_count_distinct() — HyperLogLog.
✅ missing("k1", "k2", ...) — returns missing-keys array.
✅ get_path("a/b/c") and get_path("a.b.c") — multi-segment paths.
✅ dedent() — common-prefix removal.
✅ remove(pred) — predicate evaluated.
✅ enumerate() — survives composition with map / filter.
✅ pairwise() — works on path sources.
✅ .has(v) returns boolean.
✅ rec(fn) fixpoint via deep structural equality.
✅ rec(fn, cond) — iterate while cond(@) holds, capped at 10 000 iters.
✅ update(path, fn) and functional .update({...}) — see Path Mutation.
✅ Filtered wildcard [* if pred].
✅ Wildcard chain modify $.xs[*].field.modify(@).
✅ Object literal as method receiver {a: 1}.keys() and ({a: 1}).keys().
✅ Regex escape: "\d" and "\\d" both parse as digit class.
✅ Path-call scalar unwrap: $.s.upper() → "HELLO" (was ["HELLO"]). Scalar OneToOne builtins on path receivers dispatch directly via apply_one; opt out per-builtin with BuiltinSpec::never_unwrap().
✅ to_json on array path: $.users.to_json() → single JSON document (was per-element JSON strings).
✅ zip_shape({a, b}) object-shape arg form.
✅ group_shape(key) 1-arg key projection (lambda or bare ident).
✅ indent("> ") accepts a string prefix in addition to integer count.
✅ Bare-path .field inside method args ($.users.filter(.active) ≡ (@.active)).
✅ Double-quoted string escape "{\"a\":1}".from_json() parses.

Items below are still outstanding.

Organized into:

Open engine items
Design choices: intentional, won't change
Argument / receiver shape rules: which call forms each method accepts

1. Open engine items

1.1 `rec()` no-arg

rec requires a step expression — there is no defined no-arg semantic. The closest match is walk(fn) for traversal-style transforms or rec(fn) for fixpoint iteration. May be retired or aliased to a default walker in a later release.

1.2 `rec(fn)` runaway iteration cap

Calls to rec(fn) where fn is non-idempotent and never reaches a deep-structural fixed point are bounded at 10 000 iterations and then error. The new error message names the cap and recommends rec(fn, cond) for explicit bounding. No guard short of analytic decidability prevents the worst case; document the cap and surface it loudly.
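The capped fixpoint behaviour can be sketched generically. This is an illustration of the technique under stated assumptions, not jetro's code: `fixpoint` is a hypothetical helper, `PartialEq` stands in for deep structural equality, and the cap is passed in rather than fixed at 10 000.

```rust
/// Apply `step` repeatedly until the value stops changing, or give up after
/// `cap` iterations with an error that names the cap.
fn fixpoint<T: PartialEq, F: Fn(&T) -> T>(init: T, step: F, cap: usize) -> Result<T, String> {
    let mut cur = init;
    for _ in 0..cap {
        let next = step(&cur);
        if next == cur {
            return Ok(cur); // fixed point reached
        }
        cur = next;
    }
    Err(format!("no fixed point after {} iterations", cap))
}
```

A halving step converges to 0 in a handful of iterations; an incrementing step never converges and hits the cap.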

2. Design choices

2.1 No `in` operator

in would be ambiguous with let X = Y in Z and for x in xs. Use the infix has operator or the .includes(v) method:

xs has "x"             # ✓ operator
xs.includes("x")       # ✓ method
"x" in xs              # ✗ parse error (intentional)

2.2 `replace` is single-occurrence

.replace(needle, with) replaces only the first match — JavaScript-style. Use .replace_all for substitute-every behaviour:

"hello hello".replace("hello", "hi")          # → "hi hello"
"hello hello".replace_all("hello", "hi")      # → "hi hi"

2.3 Comments

There are no comments inside a query. Strip client-side.

2.4 `[expr]` vs `{expr}`

Inline filter is {predicate}. [expr] is index/slice.

$.xs{@.active}        # ✓ inline filter
$.xs[@.active]        # ✗ index expression

3. Argument / receiver shape rules

3.1 Methods accepting lambda forms

Method	Working forms
`filter`, `find`, `find_all`, `find_first`, `find_one`, `find_index`, `indices_where`, `any`, `all`, `take_while`, `drop_while`, `remove`	`(@.x op v)`, `(.x op v)`, `(b => b.x op v)`, `(lambda b: ...)`
`map`, `flat_map`, `transform_keys`, `transform_values`, `filter_keys`, `filter_values`	Same
`sort`, `unique_by`, `group_by`, `count_by`, `index_by`, `max_by`, `min_by`	Same; `(b => b.x)` named lambda preferred for readability

$.books.sort(b => b.year)             # named lambda
$.books.sort(@.year)                  # @-form
$.books.sort(.year)                   # bare-path sugar (≡ @-form)

3.2 Methods that take bare identifiers (no `@`)

Method	Form
`pick(field, alias: src, ...)`	Bare identifiers. Not `@.field`.
`omit(field, ...)`	Same
`rename({old: new, ...})`	Object map
`missing("k1", "k2", ...)`	String literals

$.user.pick(id, name)                 # ✓
$.user.pick(@.id, @.name)             # ✗ parse error
$.user.pick(uid: id)                  # ✓ alias

3.3 Multi-arg lambdas

Two-arg lambdas use parens:

$.orders.equi_join($.customers, "cid", "id", (o, c) => {buyer: c.name})
$.xs.accumulate(0, (a, b) => a + b)

Single-arg array destructure (with optional rest) is supported:

$.entries.map(([k, v]) => {k, v})         # ✓
$.rows.map(([h, ...tail]) => tail)        # ✓ rest binding

Versions

This page reflects v0.5.5 behavior as empirically tested. As the engine catches up, entries here drop.

Open count: 2 engine items + 4 design choices documented.

Glossary

Backend. One of the execution paths the planner can route a node through: Structural, TapeView, TapeRows, TapePath, ValView, MaterializedSource, FastChildren, Interpreted. Selected automatically based on shape and capabilities.

Barrier. A stage that must see all input before emitting output. sort, unique, group_by, window, etc.

Bitmap structural index. A bit-packed index over the simd-json tape that lets ..find, ..shape, ..like, and ..match skip non-matching subtrees in O(1) per node. Used when the document is loaded with the simd-json tape (default).

Borrowed view. A ValueView — a read-only borrowed reference into a parsed document. Zero-copy substrings via Val::StrSlice.

Builtin. One of the 181 methods in jetro's catalog. Each is one impl Builtin for X block in defs.rs with identity, demand law, and runtime layers co-located.

Chain-write. A query ending in a write terminal (.set, .modify, .delete, .unset, .merge, .deep_merge, .append, .prepend) on a rooted path. Rewritten to Expr::Patch by the parser.

Composed stage. A Composed<A, B> pair that fuses two adjacent stages into one virtual call per element.

Demand. The triple (pull, value, order) describing what an operator needs from its source. See Demand Propagation.

Demand law. The rule by which a builtin transforms downstream demand into upstream demand. Encoded in the builtin's BuiltinDemandLaw.

Effect lifting. The patch-fusion pass that batches multiple chain-writes into a single document walk.

Engine. A JetroEngine — a long-lived handle that caches parsed and compiled queries for reuse across documents.

F-string. f"text {expr}" — string with embedded expression interpolation.

Field chain. A path of pure field accesses, e.g. $.a.b.c. Recognised by the planner and routed to fast tape backends.

Jetro. Single-document handle. Jetro::from_bytes(bytes)?.collect(q).

JetroEngine. Multi-document handle with plan/VM caches.

Lambda. A small function value: @, r => body, lambda r: body. All three forms compile identically.

Maranget tree. The decision-tree compilation strategy used for pattern matching. Cross-arm sharing of common discriminant tests.

Patch. The internal write operation. Generated by both patch $ { … } blocks and chain-write classification.

Patch fusion. The optimizer pass that batches multiple writes into a single walk.

Pipeline. The streaming execution model: Source → Stage* → Sink. One element at a time.

Plan / Logical Plan. Tree-shaped IR between AST and bytecode. Lives in ir/logical.rs.

Plan cache. A cache in JetroEngine that maps (query, context) to a compiled Pipeline. Default capacity 256.

Pull demand. The first lane of Demand: how many inputs must be read. Variants: All, FirstInput(n), LastInput(n), NthInput(i), UntilOutput(n).

Quantifier. A postfix operator on a path step. ? = optional, ! = exactly-one.

Sink. The terminal stage of a pipeline. Reducers, positional, and implicit collectors.

Source. The first stage of a pipeline. Usually a path or array literal.

Streaming. Per-element execution; no buffering.

Tape. The simd-json output: a flat array of tokens describing structural positions in the JSON byte buffer. Used for zero-copy access.

Val. The internal value type. Arc-wrapped compound nodes ensure cheap clones.

Value need. The second lane of Demand: how much of each row's content is required. Variants: None, Predicate, Projection, Numeric, Whole.

View. A ValueView — borrowed read-only access to a value.

VM. The bytecode executor. Used as the universal fallback backend; also provides the path-cache.

Write fusion. Same as patch fusion. See above.

The Jetro Book