Introduction
Jetro is a query, transform, and patch engine for JSON, written in Rust. It parses a small dot-syntax DSL, plans the query through a multi-tier optimizer, and routes each subtree to whichever execution backend will run it fastest — zero-copy borrowed views over a simd-json tape, a bitmap structural index, a streaming pull pipeline, or the universal interpreted fallback.
If you have used jq, jetro will feel familiar but takes a different shape:
- Dot syntax, not pipe-of-filters. `$.users.filter(active).map(name)` reads left-to-right and chains methods. The `|` operator exists, but it is for passing a value into an arbitrary expression — not for calling methods with arguments.
- One source of truth per builtin. Every method is one `impl Builtin for X` block: identity, demand law, optimizer hints, and runtime layers all co-located. There are 181 of them.
- Demand-driven planning. `.first()` doesn't materialise the whole array. `.filter(p).take(3)` doesn't filter the whole array. The planner walks backward from the sink, telling each operator what its source actually needs to produce.
- Writes are first-class. `$.users[0].name.set("Ada")` rewrites to a fused patch over the document. Multiple chain-writes batch through a single fused pass.
- Pattern match with guards. `match x with { {kind: "err"} -> .msg, _ -> "" }` compiles to a Maranget decision tree and runs over `Val`, borrowed `View`, and tape domains; deep `..match` is bitmap-accelerated.
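The backward walk behind demand-driven planning can be sketched in a few lines. This is an illustrative model with made-up types (`Demand`, `Op`, `propagate`), not jetro's actual IR:

```rust
// Hypothetical sketch: the sink states what it needs, and each operator,
// walking backward, translates that into a demand on its own source.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Demand {
    All,          // source must produce every element
    First(usize), // source may stop once n elements survive downstream
}

enum Op {
    Map,         // 1:1, passes demand through unchanged
    Filter,      // drops rows; downstream still counts survivors, so the
                 // pipeline can stop pulling early
    Take(usize), // bounds demand to n
}

fn propagate(ops: &[Op], sink: Demand) -> Demand {
    // Walk backward from the sink toward the source.
    ops.iter().rev().fold(sink, |d, op| match (op, d) {
        (Op::Take(n), Demand::All) => Demand::First(*n),
        (Op::Take(n), Demand::First(m)) => Demand::First((*n).min(m)),
        (Op::Map, d) | (Op::Filter, d) => d,
    })
}

fn main() {
    // .filter(p).take(3): the source stops as soon as 3 rows survive,
    // so nothing filters the whole array.
    assert_eq!(propagate(&[Op::Filter, Op::Take(3)], Demand::All), Demand::First(3));
    // Stacked takes compose by minimum.
    assert_eq!(propagate(&[Op::Take(5), Op::Take(3)], Demand::All), Demand::First(3));
}
```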
What this book covers
| Part | What you get |
|---|---|
| Language Reference | Every grammar form with at least one runnable example. |
| Concepts | Pipelines, demand propagation, the cache hierarchy. |
| Builtin Reference | One section per builtin — input, output, behavior, examples, demand law, common pitfalls. |
| Recipes | Real chained queries, pattern-match cookbook, write-fusion. |
| Appendix | The public Rust API (Jetro, JetroEngine), and a glossary. |
What this book doesn't cover
Implementation internals — the IR layer, the bytecode VM, plan caching, peephole passes — are documented in the source. This book stops at the user-facing surface, with one exception: the demand-propagation chapter, because demand is what makes "obvious" queries fast, and not understanding it leads to surprising benchmark results.
Conventions
Examples use this layout:
DOC: {"books": [{"title": "Dune", "year": 1965}, {"title": "Foundation", "year": 1951}]}
QUERY: $.books.filter(@.year < 1960).map(@.title)
OUT: ["Foundation"]
Where the document matters, you'll see DOC:. Where it's obvious from the
query, only QUERY: and OUT: appear. Method aliases are listed inline:
unique (alias distinct).
Ready? Start with the Quick Tour, or jump to the Builtin Reference if you already know jetro and need a specific method.
A few v0.5 sharp edges worth noting up front. This book documents jetro's stable semantics; the behaviours listed below are intentional design choices for v0.5. See Known Limitations for the canonical fix-list.
- `replace(needle, with)` replaces only the first occurrence (JavaScript-style); use `replace_all` for substitute-every behaviour.
- There is no `in` operator (`"x" in xs` is a parse error) because `in` doubles as the binder in `let` and `for`; use `xs has "x"` or `xs.includes("x")` instead.
- Regex specials use a single backslash inside string literals (`"\d"` works); double-backslash also parses but matches the same class.
- `rec(fn)` caps at 10 000 iterations when the step never reaches a structural fixpoint; pass `rec(fn, cond)` to bound the loop.
Installation
Jetro ships as three artifacts:
| Artifact | What it is | Audience |
|---|---|---|
| jetro (crate) | Rust library — query/transform JSON in-process | Rust developers |
| jetro-py | Python bindings (PyPI) | Python users |
| jetrocli | Standalone CLI for shell use | Anyone with JSON in a terminal |
Rust library
Add to Cargo.toml:
[dependencies]
jetro = "0.5"
The simd-json feature is on by default and gives a ~4× cold-start win by
parsing bytes directly into Val (no serde_json::Value intermediate). To
fall back to the legacy serde-only path:
[dependencies]
jetro = { version = "0.5", default-features = false }
Quick sanity check:
use jetro::Jetro;
fn main() -> anyhow::Result<()> {
let bytes = br#"{"books":[{"title":"Dune","year":1965}]}"#;
let j = Jetro::from_bytes(bytes)?;
let titles: serde_json::Value = j.collect("$.books.map(@.title)")?;
println!("{}", titles); // ["Dune"]
Ok(())
}
Long-lived engine
If you process many documents with overlapping queries, keep a JetroEngine
around. It holds shared plan and VM caches:
use jetro::JetroEngine;
let eng = JetroEngine::default();
for doc in docs {
let v = eng.collect(&doc, "$.users.filter(active).count()")?;
println!("{}", v);
}
Plan-cache default capacity is 256 entries; it evicts wholesale when full.
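Wholesale eviction is the simplest possible policy: when the cache is full, drop every entry and start fresh. A minimal sketch of the idea (the `PlanCache` type and its fields are hypothetical, not jetro's internals):

```rust
// Capacity-bounded cache that evicts wholesale (clears everything) when
// full, as the plan cache is described to do. Illustrative only.
use std::collections::HashMap;

struct PlanCache {
    cap: usize,
    map: HashMap<String, String>, // query text -> "compiled plan" stand-in
}

impl PlanCache {
    fn new(cap: usize) -> Self {
        PlanCache { cap, map: HashMap::new() }
    }

    fn get_or_insert(&mut self, query: &str, compile: impl FnOnce() -> String) -> String {
        if let Some(plan) = self.map.get(query) {
            return plan.clone(); // cache hit: no recompilation
        }
        if self.map.len() >= self.cap {
            self.map.clear(); // wholesale eviction: drop every entry at once
        }
        let plan = compile();
        self.map.insert(query.to_string(), plan.clone());
        plan
    }
}

fn main() {
    let mut cache = PlanCache::new(2);
    cache.get_or_insert("$.a", || "plan-a".into());
    cache.get_or_insert("$.b", || "plan-b".into());
    // A third distinct query exceeds capacity: the whole cache clears first.
    cache.get_or_insert("$.c", || "plan-c".into());
    assert_eq!(cache.map.len(), 1);
    assert!(cache.map.contains_key("$.c"));
}
```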
Python bindings
pip install jetro-py
import jetro
doc = {"books": [{"title": "Dune", "year": 1965}]}
print(jetro.collect(doc, "$.books.map(@.title)")) # ['Dune']
The Python wheel embeds the same Rust core, so query syntax is identical.
CLI (jetrocli)
Install via Homebrew:
brew install mitghi/jetrocli/jetrocli
Or build from source:
git clone https://github.com/mitghi/jetrocli
cd jetrocli && cargo install --path .
Use it like jq:
echo '{"x":[1,2,3]}' | jetrocli '$.x.sum()'
# 6
cat data.json | jetrocli '$.users.filter(@.active).map(@.email)'
Building from source
git clone https://github.com/mitghi/jetro
cd jetro
cargo build --release # build everything
cargo test # full suite
cargo bench -p jetro-core # micro-benchmarks
Workspace layout:
jetro/ facade crate (re-exports + public API)
jetro-core/ engine: parser, planner, executor, builtins, runtime
jetro-core/fuzz/ cargo-fuzz harness (feature-gated)
Verifying your install
Run the tour from the next chapter against your install. If every query produces the printed output, you're ready.
A 5-Minute Tour
This page is a working tour of jetro. Every example has a document, a query,
and an output. Run them in your shell with jetrocli, in Rust with
Jetro::collect, or in Python with jetro.collect.
The document for this tour
{
"books": [
{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"]},
{"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"]},
{"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"]},
{"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"]}
],
"active": true
}
1. Path navigation
QUERY: $.books[0].title
OUT: "Dune"
$ is the root, .books is field access, [0] is index. Negative indices
work: [-1] is "Snow Crash".
2. The whole array
QUERY: $.books[*].title
OUT: ["Dune","Foundation","Hyperion","Snow Crash"]
[*] produces every element.
3. Filter
QUERY: $.books.filter(@.year > 1980).map(@.title)
OUT: ["Hyperion","Snow Crash"]
Inside .filter, .map, and similar method args, the current item is @.
Use @.field to walk into it; the leading-dot shorthand .field is also
accepted and desugars to @.field.
4. Four lambda forms
These are all equivalent:
$.books.filter(@.year > 1980)
$.books.filter(.year > 1980)
$.books.filter(b => b.year > 1980)
$.books.filter(lambda b: b.year > 1980)
Pick whichever reads best. The named-lambda and @-forms compile to
identical bytecode; benchmarks confirm they perform identically.
5. Reducers
QUERY: $.books.count()
OUT: 4
QUERY: $.books.map(@.year).min()
OUT: 1951
QUERY: $.books.map(@.year).avg()
OUT: 1724.25
Reducers terminate the streaming pipeline.
6. Group / count / sort
QUERY: $.books.count_by(@.author)
OUT: {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}
QUERY: $.books.sort(@.year).map(@.title)
OUT: ["Foundation","Dune","Hyperion","Snow Crash"]
7. Object projection
QUERY: $.books[0].pick(title, author)
OUT: {"title":"Dune","author":"Herbert"}
QUERY: $.books.map(b => b.pick(title, year))
OUT: [{"title":"Dune","year":1965}, ...]
.pick(name, alias: src) also renames: .pick(t: title, y: year).
8. Deep search
QUERY: $..find(@.year < 1960)
OUT: [{"title":"Foundation","year":1951,...}]
QUERY: $..like({author: "Asimov"})
OUT: [{"title":"Foundation","year":1951,...}]
..find, ..shape, and ..like are DFS pre-order over the whole document.
Equivalent named forms: .deep_find, .deep_shape, .deep_like.
9. Pipe and ternary
QUERY: $.books.count() | "found " + (@ as string) + " books"
OUT: "found 4 books"
QUERY: $.books[0] | "old" if @.year < 1980 else "modern"
OUT: "old"
| passes a value through an expression — not a method-call sugar. Use .method() for methods.
10. F-strings
QUERY: $.books.map(b => f"{b.title} ({b.year})")
OUT: ["Dune (1965)","Foundation (1951)","Hyperion (1989)","Snow Crash (1992)"]
11. Pattern match
QUERY:
match $.books[0] with {
{year: y} when y < 1970 -> f"classic {y}",
{year: y} -> f"modern {y}",
_ -> "unknown"
}
OUT: "classic 1965"
Patterns include literals, ranges (1900..2000), or-patterns, guards, object
shape, array shape, and rest captures.
12. Writes
QUERY: $.books[0].year.set(1900)
OUT: full document with books[0].year now 1900
QUERY: $.books[*].tags.append("read")
OUT: full document with "read" added to every book's tags
QUERY: $.books[0].unset(tags)
OUT: full document with books[0].tags removed
Multiple writes in one query batch through a single fused pass.
13. Engine entrypoint (Rust)
use jetro::JetroEngine;
use serde_json::json;
let eng = JetroEngine::default();
let doc = json!({"x":[1,2,3,4,5]});
let v = eng.collect_value(doc, "$.x.filter(@ > 2).sum()")?;
assert_eq!(v, json!(12));
That's the tour. Next: the Grammar Overview, or skip straight to the Builtin Index.
Grammar Overview
The jetro DSL is a small, expression-oriented language. There are no statements at the top level — every program is an expression that produces a value (or, in the case of patches, a rewritten document).
The grammar lives in
grammar.pest
and is parsed by pest.
Five things that make jetro different
- Method calls use dot syntax. `xs.map(f)`, not `xs | map(f)`.
- Pipe `|` is value-flow. `x | expr` evaluates `expr` with `@` bound to `x`.
- `@` is the current value. Inside `.filter(...)` it's the row; at the top level it's the input.
- Bare paths inside method args. `.filter(.age > 18)` is sugar for `.filter(@.age > 18)`.
- Writes are queries. `$.x.set(v)` is parsed as a query that produces a patched document, not a mutation.
Categories of syntax
| Category | Forms | Chapter |
|---|---|---|
| Paths | $, @, .field, [idx], [*], [start:end:step], ..desc, {pred} | Paths |
| Operators | arithmetic, comparison, logical, pipe, coalesce, ternary, kind, cast | Operators |
| Methods | .name(args), lambdas (@, =>, lambda) | Lambdas |
| Literals | numbers, strings, f-strings, arrays, objects, regex | Literals |
| Control flow | match, ternary, try, comprehensions | Control Flow |
| Writes | patch $ {…}, chain-write terminals | Patch |
A handy precedence table sits at the end of this part.
A worked sample
$.users
.filter(u => u.active and u.age >= 18)
.map(u => { id: u.id, name: u.name, email: u.email })
.sort(@.name)
.take(10)
That's: root, field users, predicate filter (named lambda), object-mapping,
sort by name, take first 10.
Comments
There are no comments inside a query. Strip them client-side before calling jetro, or factor commentary into the surrounding host program.
Whitespace
Whitespace and newlines are insignificant between tokens. Keep queries on one line in CLIs; break across multiple lines in source.
Paths and Navigation
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5]}
A path is the part of a query that walks into the document. Paths start at
a root marker ($, @, or an identifier inside a lambda) and chain steps
left-to-right.
Roots
| Form | Meaning |
|---|---|
$ | The whole input document (top-level root) |
@ | The current value (set by .filter, .map, |, etc.) |
name | A let-bound name or lambda parameter |
DOC: {"x": 10}
QUERY: $
OUT: {"x":10}
QUERY: $.x | @ + 1
OUT: 11
Field access
DOC: {"user": {"name": "Ada"}}
QUERY: $.user.name
OUT: "Ada"
Field names may also use string keys via ["name"]:
QUERY: $["user"]["name"]
Use the bracket form when the key contains characters disallowed in identifiers
(-, spaces, dots inside the key, leading digits).
Indexing arrays
DOC: {"xs": [10, 20, 30, 40]}
QUERY: $.xs[0]
OUT: 10
QUERY: $.xs[-1]
OUT: 40
Negative indices count from the end.
Slicing
QUERY: $.xs[1:3]
OUT: [20,30]
QUERY: $.xs[:2]
OUT: [10,20]
QUERY: $.xs[2:]
OUT: [30,40]
QUERY: $.xs[0:4:2]
OUT: [10,30]
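The slice rules above (half-open bounds, negative indices counting from the end, a step) can be modeled in a few lines; this is an illustrative sketch, not the engine's code, and `slice` is a hypothetical helper:

```rust
// Model of half-open slicing with negative-index normalisation and step.
fn slice(xs: &[i64], start: Option<i64>, end: Option<i64>, step: usize) -> Vec<i64> {
    let len = xs.len() as i64;
    // Negative bounds count from the end; results clamp into [0, len].
    let norm = |i: i64| -> usize {
        let i = if i < 0 { i + len } else { i };
        i.clamp(0, len) as usize
    };
    let s = norm(start.unwrap_or(0));
    let e = norm(end.unwrap_or(len));
    if s >= e {
        return Vec::new();
    }
    xs[s..e].iter().copied().step_by(step).collect()
}

fn main() {
    let xs = [10, 20, 30, 40];
    assert_eq!(slice(&xs, Some(1), Some(3), 1), vec![20, 30]); // $.xs[1:3]
    assert_eq!(slice(&xs, None, Some(2), 1), vec![10, 20]);    // $.xs[:2]
    assert_eq!(slice(&xs, Some(0), Some(4), 2), vec![10, 30]); // $.xs[0:4:2]
    assert_eq!(slice(&xs, Some(-1), None, 1), vec![40]);       // $.xs[-1:]
}
```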
Wildcards
QUERY: $.xs[*]
OUT: [10,20,30,40]
[*] is "every element". Most users prefer chained methods (.filter,
.map) which already iterate.
Filtered wildcard [* if pred]
A predicated wildcard — keeps only elements satisfying pred (with @
bound to the candidate).
DOC: {"books": [{"title": "Dune", "year": 1965}, {"title": "Hyperion", "year": 1989}]}
QUERY: $.books[* if year > 1980]
OUT: [{"title":"Hyperion","year":1989}]
Equivalent to [*] immediately followed by an inline-filter {cond},
but stays on the path side of parsing. Particularly useful inside
.update selectors and quoted patch path keys (see
Patch).
Chaining a bare field step after a filtered wildcard collapses to
null — chain a method instead:
QUERY: $.books[* if year > 1980].map(@.title)
OUT: ["Hyperion"]
Inline filter
{predicate} after a path step keeps only matching elements:
DOC: {"books": [{"year": 1965}, {"year": 1989}]}
QUERY: $.books{@.year > 1970}
OUT: [{"year":1989}]
This is shorthand for .filter(@.year > 1970). Use .filter when you want
named-lambda forms.
Descendant search
.. walks every descendant value in DFS pre-order:
DOC: {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
QUERY: $..x
OUT: [1,2,3]
Combine with method calls (no space):
QUERY: $..find(@.year < 1960)
QUERY: $..shape({year, title})
QUERY: $..like({author: "Asimov"})
The deep variants are bitmap-accelerated when a structural index is available.
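The traversal order can be modeled directly. This sketch (with a stand-in `Val` type and a hypothetical `descend` helper) collects every value stored under a given key, visiting descendants in DFS pre-order:

```rust
// Model of `..`: DFS pre-order over a document, collecting values under
// a given key. `Val` is a stand-in, not jetro's real value type.
enum Val {
    Num(i64),
    Arr(Vec<Val>),
    Obj(Vec<(String, Val)>), // ordered pairs keep traversal deterministic
}

fn descend(v: &Val, key: &str, out: &mut Vec<i64>) {
    match v {
        Val::Num(_) => {}
        Val::Arr(items) => {
            for item in items {
                descend(item, key, out); // children in document order
            }
        }
        Val::Obj(fields) => {
            for (k, child) in fields {
                // Pre-order: record the match before recursing deeper.
                if k.as_str() == key {
                    if let Val::Num(n) = child {
                        out.push(*n);
                    }
                }
                descend(child, key, out);
            }
        }
    }
}

fn main() {
    // {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
    let doc = Val::Obj(vec![
        ("a".into(), Val::Obj(vec![("b".into(), Val::Obj(vec![("x".into(), Val::Num(1))]))])),
        ("c".into(), Val::Arr(vec![
            Val::Obj(vec![("x".into(), Val::Num(2))]),
            Val::Obj(vec![("x".into(), Val::Num(3))]),
        ])),
    ]);
    let mut out = Vec::new();
    descend(&doc, "x", &mut out);
    assert_eq!(out, vec![1, 2, 3]); // $..x
}
```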
Dynamic keys
Compute a key at runtime:
DOC: {"realnames": {"abc": "Ada"}, "post": {"author": "abc"}}
QUERY: $.realnames[$.post.author]
OUT: "Ada"
Inside a lambda:
DOC: {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY: $.posts.map(p => $.realnames[p.author])
OUT: ["Ada"]
Quantifiers (postfix)
| Form | Meaning |
|---|---|
step? | Optional — return null instead of error if missing |
step! | Exactly-one — error if zero or many |
DOC: {"xs": [42]}
QUERY: $.xs!
OUT: [42]
QUERY: $.maybe?
OUT: null # absent, no error
Path after a method
Paths and methods are interchangeable steps:
$.users.filter(@.active).pick(name, email)[0]
That's: field, method, method, index. There is no special "tail position".
Paths inside method args
Inside method-call arguments, paths start with @ (the current item),
$ (the document root), or a bound name. The leading-dot shorthand
.field is also accepted and desugars to @.field:
$.users.filter(@.age > 18)        # ✓ @-form
$.users.filter(u => u.age > 18)   # ✓ named lambda
$.users.filter(.age > 18)         # ✓ shorthand for @.age
$.users.map(@.name)               # ✓
$.users.map(.name)                # ✓ shorthand for @.name
The same shorthand applies to inline filters: $.xs{@.k > 1} and
$.xs{.k > 1} are equivalent.
Top-level paths still need $.
Summary
| Step | Example | Notes |
|---|---|---|
| Root | $, @ | One per chain (or implicit @ in args) |
| Field | .name | Use ["..."] for tricky keys |
| Index | [3], [-1] | Negative counts from end |
| Slice | [1:5], [::2] | Half-open like Python |
| Wildcard | [*] | Whole array |
| Filtered wildcard | [* if pred] | Wildcard restricted by predicate (@ = element) |
| Descendant | ..name, .. | DFS pre-order |
| Inline filter | {cond} | Sugar for .filter |
| Dynamic key | [expr] | Expression resolves to key |
| Quantifier | ?, ! | Postfix on a step |
Operators
Jetro has the operators you'd expect plus a small number of extras that come up in JSON work.
Arithmetic
1 + 2 # 3
3 - 1 # 2
2 * 3 # 6
6 / 2 # 3
7 % 3 # 1
-x # unary negation
+ on strings concatenates: "foo" + "bar" → "foobar".
+ on arrays concatenates: [1,2] + [3] → [1,2,3].
Comparison
a == b # equality
a != b # inequality
a < b # less than
a <= b
a > b
a >= b
== and != compare within a type (strings to strings, numbers to numbers, etc.).
Cross-type comparison returns false for == and true for !=.
Logical
a and b # short-circuit AND
a or b # short-circuit OR
not a # negation
Truthiness: null, false, 0, "", [], {} are falsy. Everything else
is truthy.
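The truthiness rule is small enough to model directly. `Val` here is a stand-in enum and `is_truthy` a hypothetical helper, not jetro's real types:

```rust
// Model of jetro's truthiness: null, false, 0, "", [], {} are falsy,
// everything else is truthy.
use std::collections::BTreeMap;

enum Val {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Val>),
    Obj(BTreeMap<String, Val>),
}

fn is_truthy(v: &Val) -> bool {
    match v {
        Val::Null => false,
        Val::Bool(b) => *b,
        Val::Num(n) => *n != 0.0,
        Val::Str(s) => !s.is_empty(),
        Val::Arr(a) => !a.is_empty(),
        Val::Obj(o) => !o.is_empty(),
    }
}

fn main() {
    assert!(!is_truthy(&Val::Null));
    assert!(!is_truthy(&Val::Num(0.0)));
    assert!(!is_truthy(&Val::Str(String::new())));
    assert!(!is_truthy(&Val::Obj(BTreeMap::new())));
    assert!(is_truthy(&Val::Num(-1.0)));
    assert!(is_truthy(&Val::Arr(vec![Val::Null]))); // non-empty array is truthy
}
```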
Pipe
value | expr
Evaluates expr with @ bound to value. It is not a method-call
shorthand.
DOC: {"x": 10}
QUERY: $.x | @ * 2
OUT: 20
QUERY: $.x | f"got {@}"
OUT: "got 10"
To call a method, use dot syntax: $.x.upper(), not $.x | upper.
Coalesce
a ?? b
Return a unless it is null, in which case b.
DOC: {"name": null}
QUERY: $.name ?? "anon"
OUT: "anon"
Ternary
Python-style — postfix condition:
"hot" if temp > 30 else "cool"
DOC: {"temp": 35}
QUERY: "hot" if $.temp > 30 else "cool"
OUT: "hot"
Kind tests
v is number
v is string
v is array
v is object
v is null
v is bool
Returns boolean.
QUERY: $.x is number
Cast
x as int
x as float
x as string
x as bool
x as array
x as object
Coerces the value, or returns null when the coercion is impossible (exact behaviour depends on the specific cast).
"42" as int # 42
42 as string # "42"
Membership
xs has v # array membership: true if v is in xs
o has "k" # object membership: true if key "k" exists
There is no v in xs operator — that form is a parse error. Use the
postfix has operator above, or call .includes(v) (arrays/strings)
explicitly:
$.tags.includes("hugo") # ✓
"hugo" in $.tags # ✗ parse error
Regex match
s ~= "pattern"
Returns boolean. Uses Rust regex syntax. Bind captures with .captures or
.match_first for richer info — see String Search.
Boolean shortcut on patches
In a patch $ { … } body, a key when condition clause skips the assignment
when condition is falsy. See Patch.
Examples
DOC: {"books": [{"year": 1965, "tags": ["sf"]}, {"year": 1989, "tags": ["sf","hugo"]}], "year_floor": 2000}
QUERY: $.books.filter((@.year > 1970 and @.tags.includes("hugo")) or @.year >= $.year_floor)
OUT: [{"year":1989,"tags":["sf","hugo"]}]
QUERY: $.books[0].year ?? 9999
OUT: 1965
QUERY: $.books.map(b => "old" if b.year < 1970 else "new")
OUT: ["old","new"]
- No in operator. Membership in jetro is `xs.includes(v)` (or `xs.has(v)` for objects/arrays). There is no `v in xs` operator — that form is a parse error.
- Wrap `and`/`or` mixes in parens to make precedence unambiguous; jetro follows standard binding (`and` tighter than `or`), but parens read clearer.
Lambdas and Method Calls
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5], "pairs": [["a", 1], ["b", 2], ["c", 3]]}
Methods take arguments. Most arguments are values; one common one is a lambda — a small function evaluated per element. Jetro accepts three lambda syntaxes; pick whichever reads best.
The @-form
@ is the current item. Inside method args, prefix paths with @ to walk
into it:
$.users.filter(@.age >= 18)
$.users.map(@.name)
$.xs{@.active} # inline filter must also use @
Leading-dot shorthand .age inside method args desugars to @.age — the
two forms are equivalent and the planner sees identical opcodes.
$.users.filter(.age >= 18)
$.users.map(.name)
$.xs{.active} # works inside inline filters too
Arrow-form named lambda
$.users.filter(u => u.age >= 18)
$.users.map((u) => u.name)
The parens around the parameter are optional for one parameter.
For multiple parameters:
$.pairs.map(([k, v]) => k + ":" + v)
Python-style lambda keyword
$.users.filter(lambda u: u.age >= 18)
$.users.map(lambda u: u.name)
Functionally identical to the arrow form. Useful when porting from Python.
Performance
Named lambdas (u => u.x, lambda u: u.x) and the @-form compile to the
same bytecode. Benchmarks confirm parity (3.42 ms vs 3.44 ms / 100K rows in
the lambda regression suite). Pick what reads best — there is no perf reason
to prefer @.
Method call basics
.method() # no args
.method(arg) # one positional
.method(arg1, arg2) # multiple
.method(name=value) # named (a few methods support these)
.method(arg1, name=value) # mixed
Examples:
$.xs.take(3)
$.xs.replace("foo", "bar")
$.xs.join(",")
$.xs.sort(@.year) # sort by key projection
Methods inside method args
Lambdas can chain methods just like top-level queries:
$.posts.map(p => p.tags.unique().count())
$.users.filter(u => u.email.starts_with("admin"))
Multi-arg lambdas with destructuring
Some barriers (e.g. pairwise) yield 2-tuples. Destructure them:
$.xs.pairwise().map(([a, b]) => b - a)
Captured $
Inside a lambda, $ still means "the document root" — it does not get
shadowed by the lambda parameter:
DOC: {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY: $.posts.map(p => $.realnames[p.author])
OUT: ["Ada"]
First-class lambdas via let
Bind a lambda once, use it many times:
let by_year = (b => b.year < 1970) in
$.books.filter(by_year)
The let-bound lambda is inlined at every method-arg use before
compilation, so it has zero closure overhead — exactly the same code as if
you'd written the body directly in .filter(...).
Outside method-arg position, the binding is a normal name reference.
Literals
Scalars
null
true false
42 3.14 -7 1.5e3
"double-quoted" 'single-quoted'
Strings allow standard escapes (\n, \t, \\, \", \uXXXX).
F-strings
f"…" interpolates {expression}:
DOC: {"name": "Ada", "age": 36}
QUERY: f"hi {$.name}, you are {$.age + 1} next year"
OUT: "hi Ada, you are 37 next year"
Inside a lambda:
$.users.map(u => f"{u.name} <{u.email}>")
Escape literal braces with {{ and }}:
f"{{not interpolated}}" # "{not interpolated}"
Arrays
[1, 2, 3]
["a", "b"]
[$.x, $.y, 99] # values can be expressions
[...$.xs, 4, 5] # spread
[1, ...mid, 9] # spread anywhere
Heterogeneous arrays are fine: [1, "a", null, [2,3]].
Objects
{name: "Ada", age: 36} # bare-key (identifier-like)
{"name": "Ada"} # quoted-key (any string)
{x, y} # shorthand: same as {x: x, y: y}
{[dyn_key]: 1} # computed key
{...obj, extra: 1} # spread
{...**deep} # deep recursive spread
{name: "Ada", role: "admin" when $.is_admin}
# conditional value (omit if cond falsy)
Regex literals
Regex appear as the right operand of ~= or as arguments to regex builtins:
$.s ~= "^[A-Z]+$"
$.text.scan("\d+")
Patterns use Rust's regex crate syntax.
Numeric notes
Jetro distinguishes integers from floats internally where possible. 42 and
42.0 compare equal but a downstream sink that requires "integer" (e.g.
indexing) will only accept the former.
Negative literals: -7 is a unary-negated literal — the parser handles this
correctly without ambiguity in arithmetic positions (a - 7 is subtraction,
a + -7 is addition with -7).
Control Flow
Ternary
Python-style:
expr if condition else fallback
DOC: {"x": 10}
QUERY: "big" if $.x > 5 else "small"
OUT: "big"
Right-associative; chain via parens for clarity.
Try / else
Catch evaluation errors:
try expr else fallback
QUERY: try $.maybe.deep.path else "missing"
OUT: "missing"
QUERY: try $.xs[0].name.upper() else "n/a"
? quantifier handles the "missing field" subset more concisely:
$.maybe? returns null instead of erroring.
let … in …
Local bindings:
let x = $.users.count() in
f"there are {x} users"
Multi-binding:
let a = 1, b = 2 in a + b # equiv: let a=1 in let b=2 in a+b
let shines for first-class lambdas — see Lambdas.
Pattern match
match value with {
pattern1 -> expr1,
pattern2 when guard -> expr2,
_ -> default
}
Patterns
| Pattern | Matches |
|---|---|
42, "x", true, null | Equal literal |
_ | Any value |
name | Any value, bound to name |
1..10 | Number ≥ 1 and < 10 |
1..=10 | Number ≥ 1 and ≤ 10 |
{k1: p1, k2: p2} | Object with these keys, each matching (no shorthand {k1, k2} in v0.5) |
[p1, p2] | Array of length 2, each matching |
[h, ...t] | Head + tail |
p1 | p2 | Either pattern (or-pattern) |
x: number | Kind-bound: matches if x is a number |
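Jetro's array patterns map closely onto Rust's own slice patterns, which makes their semantics easy to model; this sketch (with a hypothetical `describe` function) mirrors the length-2 form `[p1, p2]` and the head-plus-rest form `[h, ...t]`:

```rust
// Arms are tried top to bottom, exactly like jetro's match clauses.
fn describe(xs: &[i64]) -> String {
    match xs {
        [] => "empty".to_string(),
        [a, b] => format!("pair {a},{b}"),       // like [p1, p2]
        [h, t @ ..] => format!("head {h}, {} more", t.len()), // like [h, ...t]
    }
}

fn main() {
    assert_eq!(describe(&[]), "empty");
    assert_eq!(describe(&[1, 2]), "pair 1,2");
    assert_eq!(describe(&[5, 6, 7]), "head 5, 2 more");
}
```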
Guards
match $.x with {
v when v > 100 -> "big",
v when v > 10 -> "medium",
_ -> "small"
}
Worked example
DOC: {"event": {"kind": "click", "x": 100, "y": 200}}
QUERY:
match $.event with {
{kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
{kind: "key", code: c} -> f"key:{c}",
_ -> "unknown"
}
OUT: "click@100,200"
Deep match
$..match { pattern -> expr, _ -> null }
Walks every descendant; returns matched results as an array.
$..match! { pattern -> expr } # first match only, early-stops
The bang variant terminates as soon as one match succeeds (uses the bitmap structural index when available).
Comprehensions
Jetro supports list, dict, set, and generator comprehensions over both
literal and path-rooted sources. Pair destructure works in two
interchangeable forms (for k, v in ... and for [k, v] in ...), and
multiple if clauses are folded with and.
List
[expr for x in source if cond1 if cond2 ...]
DOC: {"xs": [1, 2, 3, 4, 5]}
QUERY: [n*n for n in $.xs if n > 2]
OUT: [9,16,25]
QUERY: [n for n in $.xs if n > 1 if n < 5]
OUT: [2,3,4]
Object
{key: value for x in source if cond}
{k: v for [k, v] in pairs}
{k: v for k, v in pairs}
DOC: {"pairs": [["a", 1], ["b", 2]]}
QUERY: {k: v for [k, v] in $.pairs}
OUT: {"a":1,"b":2}
QUERY: {n: n*n for n in [1,2,3]}
OUT: {"1":1,"2":4,"3":9}
Iterating an object yields {key, value} records:
DOC: {"o": {"a": 1, "b": 2}}
QUERY: {e.key: e.value*10 for e in $.o}
OUT: {"a":10,"b":20}
Set
Deduplicating comprehension. Returns an array of unique values.
QUERY: {n*n for n in [-2, -1, 0, 1, 2]}
OUT: [4,1,0]
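The set form is equivalent to mapping and then keeping the first occurrence of each value in encounter order. A sketch of that behaviour (`set_comprehension` is a hypothetical helper):

```rust
// {f(n) for n in xs}: map, then dedup preserving first-seen order.
use std::collections::HashSet;

fn set_comprehension(xs: &[i64], f: impl Fn(i64) -> i64) -> Vec<i64> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &x in xs {
        let y = f(x);
        if seen.insert(y) {
            out.push(y); // first time this value appears
        }
    }
    out
}

fn main() {
    // {n*n for n in [-2, -1, 0, 1, 2]}
    assert_eq!(set_comprehension(&[-2, -1, 0, 1, 2], |n| n * n), vec![4, 1, 0]);
}
```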
Generator
(x for x in items)
Same semantics as the list form; useful as a lazy source for a downstream reducer or barrier.
if-on-patch
Inside a patch $ {…} body, key: expr when cond skips the assignment when
cond is falsy:
patch $ {
status: "active" when $.verified
}
See Patch.
Patch and Writes
Fixture
Examples below run against:
DOC: {"user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "xs": [1, 2, 3, 4, 5]}
Jetro treats writes as queries: a write returns the patched document. There are two equivalent surfaces.
Chain-write terminals
Add a write method at the end of a rooted path:
| Method | Effect |
|---|---|
.set(v) | Replace the value at this path with v |
.modify(expr) | Replace, with @ bound to the current value |
.delete() | Remove the leaf |
.unset(key) | Remove key from the leaf object |
.merge({…}) | Shallow-merge into the leaf object |
.deep_merge({…}) | Recursive merge |
.append(v) | Push to the leaf array |
.prepend(v) | Unshift onto the leaf array |
DOC: {"user": {"name": "Ada", "tags": ["math"]}}
QUERY: $.user.name.set("Ada Lovelace")
OUT: {"user":{"name":"Ada Lovelace","tags":["math"]}}
QUERY: $.user.tags.append("code")
OUT: {"user":{"name":"Ada","tags":["math","code"]}}
QUERY: $.user.unset(tags)
OUT: {"user":{"name":"Ada"}}
QUERY: $.user.modify(u => u.merge({active: true}))
OUT: {"user":{"active":true,"name":"Ada","tags":["math"]}}
The classifier fires only when the base of the chain is $. Inside
lambdas ($.xs.map(@.set(...))) it remains a regular method call — useful
when a sub-pipeline wants the old "return the new value" semantics.
patch $ { … } block
The same operation expressed as a block:
patch $ {
user.name: "Ada Lovelace",
user.tags: DELETE
}
Block syntax is best for multiple writes — it batches them through a single fused pass (see Write Fusion).
| Block clause | Meaning |
|---|---|
path: value | Assignment |
path: DELETE | Removal |
path: value when cond | Conditional |
path[*]: value | Broadcast over an array |
Conditional writes
patch $ {
status: "active" when $.verified,
retired_at: now() when $.retired
}
If the condition is falsy, the assignment is skipped entirely — neither written nor zeroed.
Broadcast over arrays
DOC: {"items": [{"x": 1}, {"x": 2}, {"x": 3}]}
QUERY: $.items[*].x.set(0)
OUT: {"items":[{"x":0},{"x":0},{"x":0}]}
Pipe form preserves "return-the-new-value"
Some users prefer the v1 behavior where a write inside a .map returned the
written value, not the patched root:
$.items.map(item => item | set(item.x + 1))
The pipe form value | set(new) keeps that meaning.
Modify with pipe
$.user.modify(u => u.merge({last_seen: now()}))
modify evaluates its argument with @ bound to the current value, then
writes the result back at the same path.
Multiple writes in one query
Either chain them:
$.user.name.set("Ada").tags.append("admin")
or use a block:
patch $ {
user.name: "Ada",
user.tags[*]: "active" # broadcast
}
The planner detects multi-write patterns and routes them through the patch-fusion optimizer, which lowers repeated path traversals into a single fused write pass.
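The fusion idea can be sketched as grouping patches by their leading path segment, so each subtree is visited once instead of once per patch. The types, the `apply_fused` helper, and the grouping strategy here are illustrative; jetro's fused pass operates on its IR:

```rust
// Apply many (path, value) patches in a single recursive pass by
// grouping them on their first path segment.
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Val {
    Num(i64),
    Obj(BTreeMap<String, Val>),
}

fn apply_fused(v: &mut Val, patches: &[(&[&str], i64)]) {
    let mut by_head: BTreeMap<&str, Vec<(&[&str], i64)>> = BTreeMap::new();
    for &(path, val) in patches {
        match path.split_first() {
            None => {}
            // Leaf segment: write directly into this object.
            Some((&leaf, rest)) if rest.is_empty() => {
                if let Val::Obj(o) = v {
                    o.insert(leaf.to_string(), Val::Num(val));
                }
            }
            // Deeper path: bucket by first segment for one shared descent.
            Some((&head, rest)) => by_head.entry(head).or_default().push((rest, val)),
        }
    }
    if let Val::Obj(o) = v {
        for (head, sub) in by_head {
            if let Some(child) = o.get_mut(head) {
                apply_fused(child, &sub); // one visit serves every patch in the bucket
            }
        }
    }
}

fn main() {
    let mut doc = Val::Obj(BTreeMap::from([(
        "user".to_string(),
        Val::Obj(BTreeMap::from([
            ("age".to_string(), Val::Num(30)),
            ("score".to_string(), Val::Num(40)),
        ])),
    )]));
    let age: &[&str] = &["user", "age"];
    let score: &[&str] = &["user", "score"];
    // Two patches sharing the `user` prefix: one descent serves both.
    apply_fused(&mut doc, &[(age, 31), (score, 99)]);
    let expected = Val::Obj(BTreeMap::from([(
        "user".to_string(),
        Val::Obj(BTreeMap::from([
            ("age".to_string(), Val::Num(31)),
            ("score".to_string(), Val::Num(99)),
        ])),
    )]));
    assert_eq!(doc, expected);
}
```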
Functional .update({...})
A third surface, written as a method call:
DOC: {"books": [
{"title": "Dune", "year": 1965, "tags": ["sf"]},
{"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}
QUERY: $.books[*].update({tags: tags.append("modern") when year > 1980, reviewed: true})
OUT: {"books":[{"reviewed":true,"tags":["sf"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","modern"],"title":"Hyperion","year":1989}]}
Use .update when you want all of the following at once:
- A selector chosen with chain syntax (`$.books[*]`, `$.books[* if year > 1980]`)
- An object body listing multiple field updates evaluated against each selected snapshot
- The same `when`/`DELETE` semantics as `patch $ { ... }`
- Quoted path keys (`"books[*].tags"`) when the receiver is `$`, giving root-level batched updates without an explicit selector
.update parses to its own AST node (UpdateBatch) so the planner can
keep the user-level shape — useful for selector pushdown, demand
analysis, and fusion. See
Path Mutation → update for the
full argument matrix.
Filtered wildcard [* if pred]
A predicated wildcard inside a path. Available wherever [*] is, and
particularly useful inside .update selectors and quoted path keys:
DOC: {"books": [
{"title": "Dune", "year": 1965},
{"title": "Hyperion", "year": 1989}
]}
QUERY: $.books[* if year > 1980]
OUT: [{"title":"Hyperion","year":1989}]
The predicate runs against @ = the candidate element. Falsy elements
are skipped from the path traversal entirely.
Wildcard .modify chains
Wildcard chain-writes are now lowered to a fused patch:
DOC: {"books": [{"tags": ["sf"]}, {"tags": ["hugo"]}]}
QUERY: $.books[*].tags.modify(@.append("test"))
OUT: {"books":[{"tags":["sf","test"]},{"tags":["hugo","test"]}]}
Caveats
- `.replace(needle, with)` is not a write terminal — it is the string-replace builtin.
- The classifier only triggers on chains rooted at `$`. Use the block syntax when the base path is computed.
- `DELETE` is a marker, not a value — you can't store it in a binding.
Precedence Table
Lowest precedence at the top. Operators on the same row associate left unless noted.
| Level | Operators | Associativity | Notes |
|---|---|---|---|
| 1 | if … else …, try … else … | right | Ternary, try-else |
| 2 | |, |> | left | Pipe (value-flow) |
| 3 | ??, ?| | right | Coalesce |
| 4 | or | left | Logical OR (short-circuit) |
| 5 | and | left | Logical AND (short-circuit) |
| 6 | not | n/a | Logical NOT (prefix) |
| 7 | is, kind, is not | n/a | Kind test |
| 8 | has | left | Membership operator (no in — use .includes(v)) |
| 9 | ==, !=, <, <=, >, >=, ~= | left | Comparison |
| 10 | +, - | left | Additive (and string/array concat) |
| 11 | *, /, % | left | Multiplicative |
| 12 | as | left | Cast |
| 13 | - (unary) | n/a | Negation |
| 14 | .field, .method(), [idx], {cond}, ?, ! | left | Postfix steps |
| 15 | $, @, literal, (...), lambda, let, match, patch, comp | n/a | Primary |
Common pitfalls
Pipe vs method call.
$.x | upper # ✗ — interprets `upper` as a name to pipe into
$.x.upper() # ✓ — method call
Comparison chains.
1 < x < 10 # ✗ — parses as `(1 < x) < 10`
1 < x and x < 10 # ✓
Ternary mid-chain.
$.x.upper() if cond else $.x # parses fine — the ternary wraps the whole
# left expression
Negation tightness.
not a == b # parses as `(not a) == b` — surprising!
not (a == b) # parens are clearer
a != b # cleanest
Coalesce + comparison.
$.x ?? 0 > 5     # ✗ — parses as `$.x ?? (0 > 5)` (low-precedence coalesce)
($.x ?? 0) > 5   # ✓ — parenthesise to compare the defaulted value
Try captures errors only.
try $.x.parse_int() else 0
try does not catch falsy-as-error — only actual evaluation errors (missing
field, bad cast, regex failure, etc.).
Pipelines
A jetro query is a pipeline of stages. The shape is always:
Source → Stage* → Sink
Source produces values one at a time. Each Stage consumes one value and
produces zero, one, or many. The Sink collects results.
What counts as a stage
| Stage | Examples | Output |
|---|---|---|
| One-to-one | .map, .enumerate, .lag, .zscore | One out per in |
| Filter | .filter, .find, .compact, .takewhile | Zero or one out per in |
| Expander | .flat_map, .flatten, .split, .lines, .chars | Many out per in |
| Reducer | .sum, .count, .min, .any, .find_index | One total |
| Positional | .first, .last, .nth(i), .collect | One or N |
| Barrier | .sort, .unique, .group_by, .window, .chunk | Buffers, then emits |
A reducer or positional terminator ends the pipeline; further methods chain on the result (a scalar or array) rather than streaming.
Streaming vs. barrier
Most stages stream — they process one value, emit, repeat. The pull-based
backend means each value travels end-to-end before the next is fetched. This
is what makes early termination work (.first, .find).
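A toy Python model of the pull discipline (illustrative only): each stage is a generator, so a `FirstInput(1)`-style sink pulls exactly one value through the whole chain.

```python
# Toy model of the pull pipeline: stages are generators, so one value
# travels end-to-end before the next is fetched, and a first() sink pulls
# exactly one element from the source.

def source(items, log):
    for x in items:
        log.append(x)               # record each element actually read
        yield x

def map_stage(f, upstream):
    return (f(x) for x in upstream)

def first(upstream):
    return next(upstream, None)     # pull a single value, then stop

log = []
result = first(map_stage(lambda x: x * 10, source([1, 2, 3, 4], log)))
assert result == 10
assert log == [1]                   # only one element was ever read
```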
Barriers cannot stream: .sort must see every element before it can emit
any. The pipeline buffers up to the barrier, runs the barrier as a unit,
then resumes streaming if more stages follow.
$.xs.map(f).filter(p).sort(@.x).take(10).map(g)
    \_______________/\________/\_______________/
        streaming      barrier  streaming again
Barriers carry an apply_barrier method on the builtin.
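The buffering behavior can be sketched like this (not jetro's implementation): the barrier drains its upstream completely, runs the whole-input operation, then re-emits so downstream stages stream again.

```python
# Sketch of a barrier stage: drain upstream, apply the whole-input
# operation, then yield so streaming resumes downstream.

def sort_barrier(upstream, key=None):
    buffered = list(upstream)       # must see every element before emitting
    buffered.sort(key=key)
    yield from buffered             # streaming resumes here

def take(n, upstream):
    for i, x in enumerate(upstream):
        if i >= n:
            break
        yield x

assert list(take(2, sort_barrier(iter([3, 1, 2])))) == [1, 2]
```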
Sources
The most common source is a path: $.users is a source. Other shapes:
- An array literal (`[1,2,3].map(f)`)
- A range (`(0..10).map(f)`)
- A method that returns a sequence (`$.text.lines().map(...)`)
Sinks
If your final stage is a reducer, the sink is the reducer's accumulator. If it's a streaming stage, the sink collects into an array.
.collect() is the explicit sink: scalar in → [scalar], array in → identity,
null in → []. Use it when you need a deterministic array shape.
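Those normalisation rules, modeled directly (a sketch):

```python
# The .collect() normalisation rules from the paragraph above:

def collect(v):
    if v is None:
        return []                   # null in -> []
    if isinstance(v, list):
        return v                    # array in -> identity
    return [v]                      # scalar in -> [scalar]

assert collect(None) == []
assert collect([1, 2]) == [1, 2]
assert collect(42) == [42]
```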
Composed stages
Adjacent stages get composed when possible: two Stages fold into one
virtual call per element. This is Composed<A, B> under the hood; the
optimizer fuses chains of .maps, .filters, and .picks aggressively.
User-visible effect: writing many short stages costs roughly the same as one big lambda — write for clarity.
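A minimal sketch of what composition means here — two adjacent one-to-one stages collapse into a single virtual call per element:

```python
# Minimal sketch of stage composition (illustrative, not jetro's Composed):
# two stages become one call site per element.

class Composed:
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __call__(self, x):
        return self.b(self.a(x))    # one virtual call instead of two stages

double = lambda x: x * 2
inc = lambda x: x + 1
fused = Composed(double, inc)
assert [fused(x) for x in [1, 2, 3]] == [3, 5, 7]
```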
Backend selection
Each pipeline node carries a list of preferred backends. The router tries them in order; the first to declare it can run the node wins.
| Source | Preferred backends |
|---|---|
| `FieldChain` (e.g. `$.a.b.c`) | tape-view → tape-rows → materialised → val-view → interpreted |
| Generic expression | fast-children → interpreted |
| Deep search | structural index → interpreted |
| Single root path | tape-path → interpreted |
You don't pick the backend — the planner does. But knowing they exist explains why simple queries are fast: they often run zero-copy over the simd-json tape.
When to think about pipeline shape
In practice, almost never. Two cases:
- Don't sort until you have to. A sort barrier before a `.take` or `.first` defeats early termination. Push `.filter`, `.take`, `.first` before `.sort` if the semantics allow.
- Avoid full materialisation in the middle. Chains of streaming stages stay zero-copy. A `.collect()` mid-chain forces a full pass.
The next chapter, Demand Propagation, explains why these heuristics work.
Demand Propagation
Demand propagation is the planner pass that makes "obvious" queries fast. It walks the pipeline backward — from sink to source — asking each operator: given what comes after you, what do you actually need from your source?
The answer is encoded in three lanes per stage and then used at execution time to skip work.
The three lanes
1. PullDemand — how many inputs?
| Variant | Meaning |
|---|---|
| `All` | Read everything |
| `FirstInput(n)` | Stop after n inputs |
| `LastInput(n)` | Seek to the end, take last n |
| `NthInput(i)` | Jump to a single index |
| `UntilOutput(n)` | Keep reading until n outputs are produced |
2. ValueNeed — what payload from each input?
| Variant | Meaning |
|---|---|
| `None` | Don't decode the row at all |
| `Predicate` | Only what the predicate touches |
| `Projection` | Only the fields used in a projection |
| `Numeric` | Only numeric content |
| `Whole` | The full row (default pessimistic) |
3. order: bool — does input order matter?
Some sinks (e.g. .sum()) don't care about order. The planner can use this
to enable parallel-friendly access patterns when supported.
Backward walk
For a pipeline s1 → s2 → … → sN → sink, the planner does:
demand = sink_demand
for op in [sN, …, s2, s1]: # reverse order
upstream = op.propagate_demand(demand)
record (op, downstream=demand, upstream)
demand = upstream
The final demand is what the source must satisfy. The source backend
chooses an access strategy that matches.
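The walk can be modeled in a few lines of Python. The two laws below are hard-coded for illustration; in jetro each builtin declares its own law (see the table that follows).

```python
# Toy model of the backward demand walk; `propagate` hard-codes two laws.

def propagate(op, downstream):
    kind, n = downstream
    if op == "map":            # MapLike: preserve the pull demand
        return (kind, n)
    if op == "filter":         # FilterLike: FirstInput(n) -> UntilOutput(n)
        return ("UntilOutput", n) if kind == "FirstInput" else (kind, n)
    return ("All", None)       # pessimistic default for unknown stages

def plan(stages, sink_demand):
    demand = sink_demand
    for op in reversed(stages):          # walk sink -> source
        demand = propagate(op, demand)
    return demand                        # what the source must satisfy

# $.items.filter(p).take(3): take emits FirstInput(3) toward the source
assert plan(["filter"], ("FirstInput", 3)) == ("UntilOutput", 3)
# $.items.map(f).first(): map preserves FirstInput(1)
assert plan(["map"], ("FirstInput", 1)) == ("FirstInput", 1)
```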
Operator laws
Every builtin declares one of these laws (in defs.rs):
| Law | Effect on demand |
|---|---|
| `Identity` | Pass through unchanged (e.g. `.upper`, `.lower`) |
| `MapLike` | Preserve pull, force `ValueNeed::Whole` |
| `FilterLike` | `FirstInput(n)` becomes `UntilOutput(n)` |
| `TakeWhile` | Same as filter, but bounded |
| `UniqueLike` | Must scan until N distinct outputs |
| `Take(n)` | Cap pull at `FirstInput(n)` |
| `First` | Always `FirstInput(1)` |
| `Last` | Always `LastInput(1)` |
| `Count` | All inputs, `ValueNeed::None` |
| `NumericReducer` | All inputs, `ValueNeed::Numeric` |
Six worked examples
A. Early termination on .first
$.items.map(name).first()
- `first()` declares `FirstInput(1)` to its source
- `map(name)` is `MapLike`: preserves pull, demands `Whole` from items
- Source receives: read 1 item, decode fully
Without demand: read all items, decode all, take first.
B. Bounded filter
$.items.filter(active).take(3)
- `take(3)` ← `FirstInput(3)`
- `filter(active)` ← `UntilOutput(3)` (read until 3 pass)
- Source: read until 3 active items found
Without demand: filter the entire array, then slice.
C. Field-level projection
$.users.map(u => {id, name})
- The map projection touches `id` and `name`
- Source: decode only `id`, `name` from each user
Other fields are not allocated. Over a wide-record document, this is the biggest win.
D. Last-element scan
$.logs.filter(severity >= 3).last()
- `last()` ← `LastInput(1)`
- `filter(...)` ← `UntilOutput(1)` from the end
- Source: scan backward, stop after first match
Without demand: scan forward, materialise all matches, take last.
E. Count without payloads
$.items.filter(status == "done").count()
- `count()` declares `ValueNeed::None`
- `filter(...)` declares `Predicate` on `status`
- Source: decode only `status`, no other fields
F. Reverse + take
$.items.reverse().take(2)
- `take(2)` ← `FirstInput(2)`
- `reverse()` flips: source receives `LastInput(2)`
- Source: seek to end, read 2 backward, then reverse
What demand does not do
- It does not change result semantics. Two pipelines with identical text produce identical output regardless of demand state.
- It does not optimise across barriers (`.sort`, `.group_by`). A barrier forces `All` upstream — it must see every input.
- It does not move work between stages. Operators don't fuse; demand only gates what they read.
When you'll feel demand kick in
Three rough rules of thumb:
- Put `take`/`first`/`find` near the end. That's how their pull demand reaches back to the source.
- Project early when possible. `map(@.field)` upstream of a barrier reduces the buffered set.
- Avoid unnecessary `collect()`. It forces full materialisation and resets the demand walk.
Demand is invisible most of the time — your queries get faster than they "should" be, and that's exactly the goal.
Lazy Evaluation and Caches
Jetro is lazy in three places that matter to users.
1. Document parsing
Jetro::from_bytes does not fully parse the document up front when the
default simd-json feature is enabled. Instead it builds a tape — a flat
array of tokens — and lazily decodes parts as queries demand them.
What this means:
- Cold-start is ~4× faster than the legacy `serde_json::Value` path.
- A query that touches only `$.x.y` leaves the rest of the document undecoded until something asks for it.
- Borrowed string slices (`Val::StrSlice`) avoid a copy when the value is read-only.
If you want eager full parsing (e.g. for serde_json::Value round-trips):
let doc: serde_json::Value = serde_json::from_slice(bytes)?;
let v = engine.collect_value(doc, "$.x")?;
2. Streaming pipelines
The pull-based pipeline backend processes one element at a time. A stage
doesn't run until its downstream consumer pulls. This is what enables
.first() and .find() to terminate early.
A consequence: side effects in lambdas are not guaranteed to fire for every element. (Lambdas in jetro have no I/O, so this is mostly an academic concern, but worth knowing if you write a custom builtin.)
3. Plan caches
Two caches matter:
Plan cache (per JetroEngine)
When you call engine.collect(&doc, query) repeatedly with the same query,
the parsed AST → IR → bytecode pipeline is computed once and reused. Default
capacity: 256 entries, evicted wholesale when full.
For workloads with a small fixed set of queries and many documents, this is a big speedup. For ad-hoc one-shot queries, it's a no-op.
Path cache (per VM)
The bytecode VM caches resolved pointer paths per document. The cache key hashes both structure and primitive leaf values bounded at depth 8 — two documents with identical shape but different leaves produce different hashes, so the cache stays correct across calls.
You don't manage this directly. It's amortised over many queries on the same document.
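The key construction can be sketched like this (the depth-8 bound comes from the paragraph above; the string encoding and sorted-key traversal order are assumptions for illustration):

```python
# Sketch of a structure-plus-leaves cache key, bounded at depth 8.

def doc_key(value, depth=0, max_depth=8):
    if depth >= max_depth or not isinstance(value, (dict, list)):
        return repr(value)                            # leaf contributes its value
    if isinstance(value, dict):
        parts = (k + ":" + doc_key(v, depth + 1) for k, v in sorted(value.items()))
        return "{" + ",".join(parts) + "}"
    return "[" + ",".join(doc_key(v, depth + 1) for v in value) + "]"

a = {"x": 1}
b = {"x": 2}                              # identical shape, different leaf
assert doc_key(a) != doc_key(b)           # different keys -> no stale hit
assert doc_key(a) == doc_key({"x": 1})    # identical docs share a key
```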
When laziness backfires
It rarely does, but two pitfalls:
Forcing materialisation. Methods like .collect(), .sort(),
.unique(), .group_by() are barriers — they materialise. Putting them
mid-chain when they aren't needed defeats laziness.
Holding onto Vals. A Val is Arc-wrapped, so cloning is O(1), but the
Arc keeps the underlying data alive. If you query a giant doc, hold onto a
small projection, and let the doc go, you may be surprised that the original
data is still resident — the projection's Val::StrSlices borrow into the
tape.
Use .to_json() (or serde_json::Value round-trip) to disconnect a
projection from the source tape when you really need to release memory.
Practical recipe
For long-lived servers:
// At startup
let engine = JetroEngine::default();
// Per request
let result = engine.collect_bytes(req_body, "$.users.filter(@.active).count()")?;
Plans get cached, parsing is lazy, the pipeline early-terminates. There's typically nothing else to tune.
Builtin Reference — Overview
Jetro ships 181 builtin methods. They fall into 18 categories. Every method has the same shape:
.method(arg1, arg2, …)
…or, when the parser routes through inline path filters and sugar:
$.path.method(...)
This part documents every method. Each entry follows the format:
name(aliases: …)
- Signature: what it takes and returns
- Behavior: one-paragraph description
- Example: at least one minimal runnable example
- Demand law / Notes: when relevant
Index
| Category | What goes here | Page |
|---|---|---|
| Value introspection | type, len, schema, JSON round-trip | Introspection |
| Numeric scalars | ceil, floor, round, abs | Numeric |
| String transforms | upper, trim, pad_*, slice, replace … | String |
| String search / regex | starts_with, match_*, captures, split_re | String Search |
| Conversion | to_number, parse_int, parse_bool | Conversion |
| Streaming one-to-one | map, enumerate, pairwise, lag, zscore | Streaming |
| Filtering | filter, find, compact, takewhile | Filtering |
| Expanding | flat_map, flatten, lines, chars | Expanding |
| Reducers | sum, count, any, max_by | Reducers |
| Positional | first, last, nth, collect | Positional |
| Barriers | sort, unique, group_by, window | Barrier |
| Arrays / sets | append, diff, union, zip | Arrays |
| Objects | keys, pick, merge, transform_values | Objects |
| Path mutation | get_path, set_path, set, update | Path Mutation |
| Deep traversal | deep_find, walk, rec | Deep |
| Predicates | has, missing, includes, index | Predicates |
| Tabular | to_csv, to_tsv | Tabular |
| Relational | equi_join | Relational |
Notation in this part
- aliases — alternative names accepted by the parser. They lower to the same builtin and behave identically.
- "demand law" — what kind of
Demandthis builtin propagates upstream. See Demand Propagation for the model. - "barrier" / "stream" / "scalar" — execution shape (does it buffer, stream, or run once on a single value).
When a method appears under multiple categories (e.g. .find is both a
filter and positional), it lives in the most specific chapter and is
cross-linked.
Sharp edges
A small set of v0.5 design choices is documented in
Known Limitations: replace is
single-occurrence (use replace_all for substitute-every), there is no
in operator (use xs has v), and rec(fn) caps at 10 000 iterations
when the step never converges (use rec(fn, cond) to bound). Two engine
items remain on the fix-list: rec() no-arg and a stronger
runaway-iteration guard.
Aliases at a glance
| Canonical | Aliases |
|---|---|
| `any` | `exists` |
| `chunk` | `batch` |
| `drop_while` | `dropwhile` |
| `take_while` | `takewhile` |
| `includes` | `contains` |
| `skip` | `drop` |
| `sort` | `sort_by` |
| `unique` | `distinct` |
| `deep_find` | `..find` (deep-method form) |
| `deep_shape` | `..shape` |
| `deep_like` | `..like` |
These pairs are interchangeable. Pick whichever reads better.
Value Introspection
Methods that report on the kind and shape of a value, plus JSON round-trip.
type
- Signature: `Any -> String`
- Behavior: Returns the kind of value as a string: `"null"`, `"bool"`, `"number"`, `"string"`, `"array"`, `"object"`.
QUERY: $.x.type()
DOC: {"x": [1,2,3]}
OUT: "array"
len
- Signature: `(String|Array|Object) -> Number`
- Behavior: Length: chars for strings, elements for arrays, key count for objects. Errors on `null`/`bool`/`number`.
DOC: {"s": "hello", "xs": [1,2,3], "o": {"a":1,"b":2}}
QUERY: $.s.len() OUT: 5
QUERY: $.xs.len() OUT: 3
QUERY: $.o.len() OUT: 2
to_string
- Signature: `Any -> String`
- Behavior: Stringifies a scalar (`42` → `"42"`, `true` → `"true"`, `null` → `"null"`). For arrays/objects, returns the JSON serialisation.
QUERY: 42.to_string() OUT: "42"
QUERY: ([1, 2]).to_string() OUT: "[1,2]"
to_json
- Signature: `Any -> String`
- Behavior: Compact JSON serialisation of any value.
QUERY: $.user.to_json()
Distinguish from to_string: for compound values, the two are equivalent;
for scalars, to_json always quotes strings ("foo" → "\"foo\""),
to_string does not.
from_json
- Signature: `String -> Any`
- Behavior: Parse a JSON string into a value.
QUERY: '{"x":1}'.from_json()
OUT: {"x":1}
QUERY: $.encoded.from_json().x
Errors on malformed input. Wrap in try if the source is untrusted:
try $.s.from_json() else null
schema
- Signature: `Any -> Object`
- Behavior: Infers a schema sketch — keys, kinds, nullable flags. Useful for "what does this document look like?" probes.
DOC: [{"id": 1, "name": "a"}, {"id": 2, "name": null}]
QUERY: $.schema()
OUT: {"items":{"fields":{"id":{"type":"Int"},"name":{"nullable":true,"type":"String"}},"required":["id"],"type":"Object"},"len":2,"type":"Array"}
The exact output format is documented in
builtins/ops/schema.rs;
treat it as advisory rather than a stable contract.
Demand notes
- `len` over an array is `ValueNeed::None` upstream — it doesn't decode rows.
- `type` is `Identity` demand-wise.
- `from_json`/`to_json` are scalar transforms with no demand interaction.
Practical examples
# Quick shape check
$.payload.type() # → "object"
$.payload.len() # for object: number of keys
# Distinguish array length vs string length
$.items.len() # array element count
$.title.len() # number of characters
# Safe deserialization of a payload field
try $.body.from_json() else null
# Compact serialization
$.event.to_json()
# Stringify any value
$.x.to_string()
# Probe an unknown payload's schema
$.events[0].schema()
Numeric Scalars
Fixture
Examples below run against:
DOC: {"products": [{"id": 1, "price": 3.7}, {"id": 2, "price": 4.2}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "deltas": [-1, 2, -3, 4], "xs": [1, 2, 3, 4, 5]}
Pure scalar transforms over numbers.
ceil
- Signature: `Number -> Number`
- Behavior: Smallest integer ≥ x.
QUERY: 3.2.ceil() OUT: 4
QUERY: (-3.2).ceil() OUT: -3
floor
- Signature: `Number -> Number`
- Behavior: Largest integer ≤ x.
QUERY: 3.7.floor() OUT: 3
QUERY: (-3.7).floor() OUT: -4
round
- Signature: `Number -> Number`
- Behavior: Round to nearest; ties round half-away-from-zero.
QUERY: 3.5.round() OUT: 4
QUERY: 3.4.round() OUT: 3
QUERY: (-3.5).round() OUT: -4
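For comparison, Python's built-in `round()` uses banker's rounding (round-half-to-even), so a faithful model of the half-away-from-zero rule has to be explicit:

```python
import math

# Half-away-from-zero rounding, matching the tie behavior described above.

def round_half_away(x):
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

assert round_half_away(3.5) == 4
assert round_half_away(3.4) == 3
assert round_half_away(-3.5) == -4
```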
abs
- Signature: `Number -> Number`
- Behavior: Absolute value.
QUERY: (-7).abs() OUT: 7
QUERY: 3.5.abs() OUT: 3.5
Mapping over arrays
These are scalar; lift them with .map:
DOC: {"xs": [1.4, 2.6, -3.5]}
QUERY: $.xs.map(@.round())
OUT: [1,3,-4]
QUERY: $.xs.map(@.abs()).sum()
OUT: 7.5
See also
Numeric reducers (sum, avg, min, max) live in
Reducers. Streaming numeric transforms (zscore,
pct_change, cummax, cummin) live in Streaming.
Practical examples
# Round every price up to the nearest dollar
$.products.map(p => p.merge({price_ceil: p.price.ceil()}))
# Percent → integer percent
($.metric.pct * 100).round()
# Magnitudes (drop sign)
$.deltas.map(@.abs())
# Split an amount into whole and fractional parts
$.amount.floor()         # whole-unit component
# Build a histogram with binned values
$.measurements.map(m => (m / 10).floor() * 10).count_by(@)
# → {0: 12, 10: 5, 20: 3, ...}
String Transforms
Scalar string operations. Lift with .map to apply to an array of strings.
Case
| Method | What | Example |
|---|---|---|
| `upper` | ASCII uppercase | `"foo".upper()` → `"FOO"` |
| `lower` | ASCII lowercase | `"FOO".lower()` → `"foo"` |
| `capitalize` | First char upper, rest lower | `"foo bar".capitalize()` → `"Foo bar"` |
| `title_case` | Each word capitalised | `"foo bar".title_case()` → `"Foo Bar"` |
| `snake_case` | Convert to lower_snake_case | `"FooBar".snake_case()` → `"foo_bar"` |
| `kebab_case` | Words joined with `-` | `"FooBar".kebab_case()` → `"foo-bar"` |
| `camel_case` | `fooBar` style | `"foo_bar".camel_case()` → `"fooBar"` |
| `pascal_case` | `FooBar` style | `"foo_bar".pascal_case()` → `"FooBar"` |
| `reverse_str` | Reverse char order | `"abc".reverse_str()` → `"cba"` |
Trim
| Method | What |
|---|---|
| `trim` | Strip whitespace from both ends |
| `trim_left` | Strip leading whitespace |
| `trim_right` | Strip trailing whitespace |
QUERY: " hi ".trim() OUT: "hi"
QUERY: " hi ".trim_left() OUT: "hi "
Padding and centering
| Method | Signature | Example |
|---|---|---|
| `pad_left(width, char?)` | Right-align by padding left | `"7".pad_left(3, "0")` → `"007"` |
| `pad_right(width, char?)` | Left-align by padding right | `"hi".pad_right(5)` → `"hi   "` |
| `center(width, char?)` | Center within width | `"hi".center(6)` → `"  hi  "` |
If char is omitted, space is used.
Indent / dedent
indent(n) takes an integer (the number of spaces); each line is prefixed
with n spaces.
QUERY: "line1\nline2".indent(2)
OUT: " line1\n line2"
dedent() strips the first line's leading whitespace from every
subsequent line that begins with the same prefix. It is not a
common-prefix dedent across all lines:
QUERY: " a\n b".dedent()
OUT: "a\nb"
Slice
"hello world".slice(0, 5) # "hello"
"hello world".slice(6) # "world"
"hello".slice(-3) # "llo"
slice(start, end?) mirrors Python; end is exclusive.
Repeat
"ab".repeat(3) # "ababab"
Replace
| Method | Behavior |
|---|---|
| `replace(needle, with)` | Replace first literal occurrence |
| `replace_all(needle, with)` | Replace all literal occurrences |
| `replace_re(pattern, with)` | Regex-aware single replacement |
| `replace_all_re(pattern, with)` | Regex-aware all replacements |
QUERY: "hello hello".replace("hello", "hi")
OUT: ["hi hello"]
QUERY: "hello hello".replace_all("hello", "hi")
OUT: ["hi hi"]
QUERY: "abc123def".replace_all_re("\d+", "#")
OUT: "abc#def"
Regex escapes inside jetro string literals. Use a single backslash: `"\d"`, `"\w+"`, `"\s"`. Jetro string literals don't eat backslashes separately; doubling (`"\\d"`) sends the regex engine the literal two-char sequence `\\d`, which is not the digit class and silently fails to match. This differs from host languages like Python or JavaScript, where you must double-escape.
Strip
"prefix-foo".strip_prefix("prefix-") # "foo"
"foo.txt".strip_suffix(".txt") # "foo"
If the prefix/suffix isn't present, returns the input unchanged.
Encoding
| Method | What |
|---|---|
| `to_base64` | Standard base64 encode |
| `from_base64` | Standard base64 decode |
| `url_encode` | Percent-encode |
| `url_decode` | Percent-decode |
| `html_escape` | `&` → `&amp;`, `<` → `&lt;`, etc. |
| `html_unescape` | Reverse of `html_escape` |
QUERY: "hello world".to_base64() OUT: "aGVsbG8gd29ybGQ="
QUERY: "a b".url_encode() OUT: "a%20b"
QUERY: "<b>".html_escape() OUT: "<b>"
Demand notes
All string transforms are Identity demand-wise: they don't change what the
upstream needs to produce.
Practical examples
# Normalise display names
$.users.map(u => u.name.trim().title_case())
# Build an URL-safe slug
"My Article Title".lower().replace_all(" ", "-")
# → "my-article-title"
# CamelCase to snake_case migration
"FooBarBaz".snake_case() # → "foo_bar_baz"
# Truncate with ellipsis
$.posts.map(p => p.body.slice(0, 100) + "..." if p.body.len() > 100 else p.body)
# Parse a comma-separated tag list
$.tags_csv.split(",").map(@.trim())
# Encode for URL
$.query.url_encode()
# Encode binary as base64
$.bytes.to_base64()
# HTML-escape user input
$.comments.map(c => c.text.html_escape())
# Pad a numeric ID for fixed-width keys
($.id as string).pad_left(8, "0")
# → "00000042" for id=42
# Strip a known prefix
"https://example.com/path".strip_prefix("https://")
# → "example.com/path"
# Build a banner
"=".repeat(40) # → "========================================"
# Indent a nested message
$.message.indent(4)
String Search and Regex
Predicates (return boolean)
| Method | Behavior |
|---|---|
| `is_blank` | True if empty or only whitespace |
| `is_numeric` | True if all chars are digits |
| `is_alpha` | True if all chars are letters |
| `is_ascii` | True if all bytes < 128 |
| `starts_with(prefix)` | Prefix check |
| `ends_with(suffix)` | Suffix check |
QUERY: " ".is_blank() OUT: true
QUERY: "abc123".is_numeric() OUT: false
QUERY: "hello".starts_with("he") OUT: true
Position
| Method | Returns |
|---|---|
| `index_of(needle)` | First index of needle, or -1 |
| `last_index_of(needle)` | Last index of needle, or -1 |
QUERY: "hello world".index_of("o") OUT: 4
QUERY: "hello world".last_index_of("o") OUT: 7
Substring search
"foo bar foo".matches("foo") # 2 (count of literal occurrences)
"abc 12 cd 34".scan("\d+") # ["12", "34"] (regex matches as strings)
Regex match
| Method | Returns |
|---|---|
| `re_match(pattern)` | Boolean |
| `match_first(pattern)` | First match string, or null |
| `match_all(pattern)` | Array of all match strings |
| `captures(pattern)` | First match with groups: `[full, g1, g2, …]` |
| `captures_all(pattern)` | Array of `captures` results |
QUERY: "a1b2".re_match("\d") OUT: true
QUERY: "a1b2".match_first("\d+") OUT: "1"
QUERY: "a1b2".match_all("\d+") OUT: ["1","2"]
QUERY: "key=val".captures("(\\w+)=(\\w+)")
OUT: ["key=val","key","val"]
The ~= operator is sugar for re_match and returns the same boolean.
Splitting
| Method | Behavior |
|---|---|
| `split(sep)` | Split on literal separator |
| `split_re(pattern)` | Split on regex |
QUERY: "a,b,c".split(",") OUT: ["a","b","c"]
QUERY: "a,,b".split_re(",+") OUT: ["a","b"]
Multi-needle membership
"abc def".contains_any(["abc", "xyz"]) # true (matches first)
"abc def".contains_all(["abc", "def"]) # true (all match)
Demand notes
Regex builtins are scalar. Lift across an array with .map(...). The
underlying regex is compiled once per query and reused — no per-element
re-compilation cost.
Conversion and Parsing
Coerce between value kinds.
to_number
- Signature: `Any -> Number | null`
- Behavior: Coerce to number. `"42"` → `42`, `"3.14"` → `3.14`, `true` → `1`, `false` → `0`. Returns null for unparseable strings.
QUERY: "42".to_number() OUT: 42
QUERY: "3.14".to_number() OUT: 3.14
QUERY: "abc".to_number() OUT: null
to_bool
- Signature: `Any -> Boolean`
- Behavior: Truthiness: `false`/`null`/`0`/`""`/`[]`/`{}` → `false`, everything else → `true`.
QUERY: $.maybe.to_bool()
parse_int(radix?)
- Signature: `String -> Number | null`
- Behavior: Parse a string as integer, optional radix (default 10).
QUERY: "42".parse_int() OUT: 42
QUERY: "ff".parse_int(16) OUT: 255
QUERY: "0b101".parse_int(2) OUT: 5
parse_float
- Signature: `String -> Number | null`
- Behavior: Parse a string as float (IEEE 754 double).
QUERY: "3.14".parse_float() OUT: 3.14
QUERY: "1e6".parse_float() OUT: 1000000.0
parse_bool
- Signature: `String -> Boolean | null`
- Behavior: Strict parse: only `"true"` and `"false"` (lowercase) match; everything else returns null.
QUERY: "true".parse_bool() OUT: true
QUERY: "TRUE".parse_bool() OUT: true
as cast (operator)
The as operator does the same coercions as to_*:
"42" as int # 42
42 as string # "42"
true as int # 1
Use as when the type is statically known; use to_number/parse_* when
parsing untrusted strings (since as errors on failure rather than returning
null).
Round-trip JSON
For full document round-trip, see from_json/to_json.
Practical examples
# Coerce strings collected from a CSV
$.rows.map(r => r.merge({age: r.age.to_number(), price: r.price.parse_float()}))
# Defensive parse — null on garbage
$.user_input.parse_int() ?? 0
# Boolean coercion of a flag string
"true".parse_bool() ?? false
# Truthiness coercion
$.value.to_bool() # null/0/""/empty → false; else true
# Cast operator for static conversions
($.id as string).pad_left(8, "0")
# Round-trip number → string → back
(3.14 as string).parse_float() # → 3.14
Streaming One-to-One
Each input produces exactly one output. These compose freely; the planner fuses adjacent stages into a single composed stage when possible.
Fixture
Examples in this chapter run against:
{
"users": [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}],
"xs": [1, 2, 3, 4, 5],
"prices":[100, 105, 102, 110, 108, 115]
}
map
- Signature: `Array<A> -> Array<B>` (with `f: A -> B`)
- Demand law: `MapLike` — preserves pull, forces `Whole`.
QUERY: $.users.map(u => u.name)
OUT: ["Ada","Bob"]
QUERY: $.xs.map(@ * 2)
OUT: [2, 4, 6, 8, 10]
QUERY: $.users.map(@.name.upper())
OUT: ["ADA","BOB"]
map is the workhorse. The lambda may use any of the three forms.
enumerate
- Signature: `Array<A> -> Array<{index: Number, value: A}>`
- Behavior: Pair each element with its zero-based index. Output is a record `{index, value}` per element.
QUERY: $.xs.enumerate()
OUT: [{"index":0,"value":1},{"index":1,"value":2},{"index":2,"value":3},{"index":3,"value":4},{"index":4,"value":5}]
QUERY: $.users.map(@.name).enumerate()
OUT: [{"index":0,"value":"Ada"},{"index":1,"value":"Bob"}]
pairwise
- Signature: `Array<A> -> Array<[A, A]>`
- Behavior: Yield consecutive pairs `[xs[0], xs[1]]`, `[xs[1], xs[2]]`, …
QUERY: [1,2,3,4].pairwise()
OUT: [[1,2],[2,3],[3,4]]
QUERY: $.xs.pairwise().map(p => p[1] - p[0])
OUT: [1, 1, 1, 1]
lag(n=1) and lead(n=1)
- Signature: `Array<Number> -> Array<Number | null>`
- Behavior: Shift by `n` positions; out-of-range positions become `null`.
- Numeric: Output values are returned as floats regardless of input numeric type.
QUERY: $.xs.lag()
OUT: [null, 1.0, 2.0, 3.0, 4.0]
QUERY: $.xs.lead()
OUT: [2.0, 3.0, 4.0, 5.0, null]
QUERY: $.xs.lag(2)
OUT: [null, null, 1.0, 2.0, 3.0]
diff_window(n=1)
- Signature: `Array<Number> -> Array<Number | null>`
- Behavior: `xs[i] - xs[i - n]`, with `null` until the lag is satisfied.
QUERY: $.prices.diff_window()
OUT: [null, 5.0, -3.0, 8.0, -2.0, 7.0]
pct_change(n=1)
- Signature: `Array<Number> -> Array<Number | null>`
- Behavior: `(xs[i] - xs[i-n]) / xs[i-n]` — relative change.
QUERY: [100.0, 110.0, 121.0].pct_change()
OUT: [null, 0.1, 0.09999999999999998]
cummax and cummin
- Signature: `Array<Number> -> Array<Number>`
- Behavior: Running max / min up to and including the current position.
QUERY: $.prices.cummax()
OUT: [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]
QUERY: $.prices.cummin()
OUT: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
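Reference models of the shifted and running transforms above, matching the null-padding and float outputs shown (sketches; they assume `n` is smaller than the input length):

```python
# lag / diff_window / cummax modeled directly from the descriptions above.

def lag(xs, n=1):
    return [None] * n + [float(x) for x in xs[: max(len(xs) - n, 0)]]

def diff_window(xs, n=1):
    return [None if i < n else float(xs[i] - xs[i - n]) for i in range(len(xs))]

def cummax(xs):
    out, best = [], float("-inf")
    for x in xs:
        best = max(best, float(x))        # running high-water mark
        out.append(best)
    return out

prices = [100, 105, 102, 110, 108, 115]
assert lag(prices) == [None, 100.0, 105.0, 102.0, 110.0, 108.0]
assert diff_window(prices) == [None, 5.0, -3.0, 8.0, -2.0, 7.0]
assert cummax(prices) == [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]
```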
zscore
- Signature: `Array<Number> -> Array<Number>`
- Behavior: Standardise: `(x - mean) / stddev`. Two passes (one for stats, one for transform); not strictly streaming, but presented as a one-to-one stage at the user surface.
QUERY: [1.0, 2.0, 3.0, 4.0, 5.0].zscore()
OUT: [-1.414213562373095, -0.7071067811865475, 0.0, 0.7071067811865475, 1.414213562373095]
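The two-pass shape can be modeled directly; population (not sample) standard deviation is inferred from the example output above (a sketch, not jetro's code):

```python
import math

# Two-pass zscore: pass 1 for mean and population stddev, pass 2 transforms.

def zscore(xs):
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / std for x in xs]

out = zscore([1.0, 2.0, 3.0, 4.0, 5.0])
assert out[2] == 0.0
assert abs(out[0] + math.sqrt(2)) < 1e-12   # first value is -sqrt(2)
```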
accumulate
See Barriers — accumulate is a barrier because it requires
a custom reducer over the full input.
Practical examples
DOC: {"prices":[100, 105, 102, 110, 108, 115]}
# Apply tax to every price
QUERY: $.prices.map(@ * 1.08)
OUT: [108.0, 113.4, 110.16000000000001, 118.80000000000001, 116.64000000000001, 124.2]
# Day-over-day deltas
QUERY: [100,105,102,110,108].pairwise().map(p => p[1] - p[0])
OUT: [5, -3, 8, -2]
# Running max ("high-water mark")
QUERY: $.prices.cummax()
OUT: [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]
# Lag-1 to compare current vs previous
QUERY: $.prices.lag()
OUT: [null, 100.0, 105.0, 102.0, 110.0, 108.0]
Filtering
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "xs": [1, 2, 3, 4, 5]}
Methods that drop elements based on a predicate.
filter
- Signature: `Array<A> -> Array<A>` (with `pred: A -> Bool`)
- Demand law: `FilterLike` — `FirstInput(n)` from downstream becomes `UntilOutput(n)` upstream.
$.users.filter(u => u.active)
$.users.filter(@.age >= 18)
$.users.filter(@.email ~= "@admin\.")
filter is the universal predicate stage. Combine with .take(n) for
bounded scans:
$.events.filter(@.sev >= 3).take(10)
The planner stops reading from the source as soon as 10 events pass — no full scan.
find
- Signature: Array<A> -> A | null (first match only on this branch)
- Demand law: FilterLike with FirstInput(1) → source.
DOC: {"users": [{"id":1,"role":"user"},{"id":2,"role":"admin"}]}
QUERY: $.users.find(@.role == "admin")
OUT: {"id":2,"role":"admin"}
find returns the first match (or null if none), not an array. Use
find_all for the array form.
find_all
- Signature: Array<A> -> Array<A>
- Behavior: Like filter. Alias kept for readability.
$.users.find_all(@.role == "admin")
Equivalent to .filter(@.role == "admin"). The two are interchangeable.
compact
- Signature: Array<Any> -> Array<Any>
- Behavior: Drop nulls.
QUERY: [1, null, 2, null, 3].compact()
OUT: [1,2,3]
Equivalent to .filter(@ != null), but reads better and avoids a lambda.
take_while (alias takewhile)
- Signature: Array<A> -> Array<A>
- Behavior: Take elements while pred is true; stop at the first false (don't keep checking).
QUERY: [1, 2, 3, 4, 1, 2].take_while(@ < 3)
OUT: [1,2]
Demand law: bounded — terminates the source as soon as pred flips.
drop_while (alias dropwhile)
- Signature: Array<A> -> Array<A>
- Behavior: Drop the leading run where pred holds; emit the rest.
QUERY: [1, 2, 3, 4, 1, 2].drop_while(@ < 3)
OUT: [3,4,1,2]
remove
- Signature: Array<A> -> Array<A>
- Behavior: Inverse of filter. Drop elements where pred is true.
QUERY: $.xs.remove(@ < 0)
Useful when the negated predicate reads worse than the affirmative.
Filtering objects
For object filtering, see filter_keys and filter_values in
Objects. They take a predicate over keys / values and return
a filtered object.
Practical examples
DOC: {"users":[
{"id":1,"name":"Ada","active":true,"age":30},
{"id":2,"name":"Bob","active":false,"age":24},
{"id":3,"name":"Cy", "active":true,"age":42}
]}
# Active users only
QUERY: $.users.filter(@.active)
OUT: [{"active":true,"age":30,"id":1,"name":"Ada"},{"active":true,"age":42,"id":3,"name":"Cy"}]
# Active users over 30, just names
QUERY: $.users.filter(@.active and @.age >= 30).map(@.name)
OUT: ["Ada","Cy"]
# First active user (early-exit)
QUERY: $.users.find(@.active).name
OUT: "Ada"
# Take while a streak holds
QUERY: [1,2,3,4,1,2].take_while(@ < 3)
OUT: [1,2]
# Negate a predicate
QUERY: $.users.remove(@.active).count()
OUT: 1
# Drop nulls
QUERY: [1, null, 2, null, 3].compact()
OUT: [1,2,3]
Worked demand example
DOC: {"events": [
{"sev": 1, "msg": "ok"},
{"sev": 2, "msg": "warn"},
{"sev": 3, "msg": "err"},
{"sev": 1, "msg": "ok2"}
]}
QUERY: $.events.filter(@.sev >= 2).map(@.msg).take(2)
OUT: ["warn","err"]
Demand walks back: take(2) → FirstInput(2), map → preserves, filter → UntilOutput(2). Source reads events one-by-one, stops after the second match.
Expanding Sequences
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}]}
Each input produces zero or many outputs.
flat_map
- Signature: Array<A> -> Array<B> (with f: A -> Array<B>)
- Behavior: Map then concatenate.
QUERY: [[1,2],[3,4]].flat_map(@)
OUT: [1,2,3,4]
QUERY: $.users.flat_map(u => u.tags)
If f returns a non-array, it's wrapped first (flat_map(@ + 1) works on
numbers).
flatten
- Signature: Array<Array<A>> -> Array<A>
- Behavior: One level of flattening.
QUERY: [[1,2],[3],[4,5]].flatten()
OUT: [1,2,3,4,5]
To flatten more levels, chain: .flatten().flatten(). Or use walk for full
recursive flatten of arbitrary structure.
explode
⚠ v0.5 status: explode requires an argument in v0.5 (errors with "explode: missing argument" on a no-arg call). The spec is intended to mirror chars / to_pairs for the common cases; until then, use those builtins directly.
- Signature (intended): (Array | Object | String) -> Array<...>
- Behavior (intended): Convert to a flat sequence of elements / pairs / chars.
  - Array: identity
  - Object: array of [key, value] pairs (= to_pairs)
  - String: array of single-char strings (= chars)
split(sep)
- Signature: String -> Array<String>
- Behavior: Split a string on a literal separator. (See split_re for regex.)
QUERY: "a,b,c".split(",")
OUT: ["a","b","c"]
lines
- Signature: String -> Array<String>
- Behavior: Split on newline (\n or \r\n).
QUERY: "a\nb\nc".lines()
OUT: ["a","b","c"]
words
- Signature: String -> Array<String>
- Behavior: Split on whitespace (any run).
QUERY: " hello world ".words()
OUT: ["hello","world"]
chars
- Signature: String -> Array<String>
- Behavior: Array of single-character strings.
QUERY: "abc".chars()
OUT: ["a","b","c"]
chars_of(s)
- Signature: String -> Array<String>
- Behavior: Equivalent to s.chars(). Useful when the source is the argument:
QUERY: ($.text).chars_of()
bytes
- Signature: String -> Array<Number>
- Behavior: UTF-8 byte values, 0–255.
QUERY: "abc".bytes()
OUT: [97,98,99]
Demand notes
Expanding stages declare an indeterminate output count. Pull demand from downstream still flows back, but the planner can't tightly bound how many inputs are needed — it pulls one input at a time and yields outputs lazily.
.flat_map(...) followed by .first() will read inputs until the first
flat-mapped output appears, then stop.
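The lazy pull can be modeled with Python generators. A sketch of the semantics described above, not jetro's engine; `flat_map` and `tracked` are hypothetical helper names:

```python
def flat_map(f, items):
    """Yield each mapped element's outputs one at a time (lazy concatenate)."""
    for x in items:
        res = f(x)
        # Non-array results are wrapped first, per the flat_map rule above.
        yield from (res if isinstance(res, list) else [res])

pulled = []
def tracked(items):
    """Record which inputs the pipeline actually pulls."""
    for x in items:
        pulled.append(x)
        yield x

# Taking the first output pulls inputs only until one flat-mapped value appears:
first = next(flat_map(lambda x: x, tracked([[], [], [1, 2], [3]])))
# The trailing [3] is never pulled from the source.
```

Empty inner arrays force extra pulls (the stage can't know in advance how many inputs yield outputs), which is exactly why the planner treats expanding stages as having an indeterminate output count.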
Practical examples
# Flatten one level
[[1,2],[3,4],[5]].flatten() # → [1, 2, 3, 4, 5]
# Tags across all books
$.books.flat_map(@.tags)
# Distinct hashtags across tweets
$.tweets.flat_map(t => t.entities.hashtags.map(@.text)).unique()
# Word histogram from a paragraph
$.text.words().map(@.lower()).count_by(@)
# Parse CSV headers
"id,name,email".split(",")
# Process logs line by line
$.log_blob.lines().filter(@.contains_any(["ERROR","WARN"]))
# Char-level analysis
$.password.chars().count_by(@) # frequency of each char
# Bytes for a binary diff
"hello".bytes() # → [104, 101, 108, 108, 111]
Reducers and Aggregates
Reducers consume the whole stream and emit a single value. They terminate the streaming pipeline.
Numeric
| Method | Signature | Notes |
|---|---|---|
| sum | Array<Number> -> Number | Empty → 0 |
| avg | Array<Number> -> Number | Empty → null |
| min | Array<Number \| String> -> ... | Empty → null |
| max | Array<Number \| String> -> ... | Empty → null |
QUERY: [1,2,3,4].sum() OUT: 10
QUERY: [1,2,3,4].avg() OUT: 2.5
QUERY: [3,1,4,1,5].min() OUT: 1.0
QUERY: ["b","a","c"].max() OUT: "c"
Demand law: NumericReducer — ValueNeed::Numeric, pull = All.
count
- Signature: Array -> Number
- Behavior: Element count.
- Demand: All inputs, ValueNeed::None (no payload decoded).
QUERY: $.users.count()
QUERY: $.users.filter(@.active).count()
This is the cheapest reducer — the source skips deserialisation entirely.
approx_count_distinct
⚠ Not yet supported in v0.5 — runtime returns "ApproxCountDistinct: builtin unsupported". Spec exists; HyperLogLog backend pending.
- Signature (planned): Array<Any> -> Number
- Behavior (planned): Approximate count of distinct values via HLL.
For now, use .unique().count() for exact distinct count.
any (alias exists)
- Signature: Array<A> -> Bool (with pred: A -> Bool)
- Behavior: True if any element matches. Short-circuits.
QUERY: $.users.any(@.role == "admin")
OUT: true
all
- Signature: Array<A> -> Bool
- Behavior: True if every element matches. Short-circuits on the first false.
QUERY: $.flags.all(@ == true)
find_index
- Signature: Array<A> -> Number | null
- Behavior: Zero-based index of the first match, or null.
QUERY: ["a","b","c"].find_index(@ == "b")
OUT: 1
indices_where
- Signature: Array<A> -> Array<Number>
- Behavior: All indices where pred matches.
QUERY: [10, 20, 5, 30, 8].indices_where(@ < 15)
OUT: [0,2,4]
max_by and min_by
- Signature: Array<A> -> A | null
- Behavior: Element with the maximum / minimum projected key.
QUERY: $.books.max_by(@.year)
QUERY: $.users.min_by(@.age)
Distinguish from .sort(@.key).first() / .last() — max_by and min_by are one
pass; the sort form allocates the sorted array first.
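The one-pass shape is easy to see in a Python model. A sketch, assuming ties keep the first occurrence; the `max_by` helper is hypothetical:

```python
def max_by(items, key):
    """Single pass: track the best element, never materialise a sorted copy."""
    best, best_key = None, None
    for x in items:
        k = key(x)
        if best is None or k > best_key:
            best, best_key = x, k
    return best  # None on empty input, matching the A | null signature

books = [{"title": "Dune", "year": 1965},
         {"title": "Hyperion", "year": 1989},
         {"title": "Foundation", "year": 1951}]
# max_by(books, lambda b: b["year"]) picks Hyperion without sorting.
```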
When to use which
| Goal | Use |
|---|---|
| Sum/avg numbers | sum, avg |
| Count rows | count |
| Exact distinct count | .unique().count() |
| Existence check | any |
| Universal check | all |
| Find index | find_index |
| Pick single max/min element | max_by, min_by |
Practical examples
DOC: {"books":[
{"title":"Dune","year":1965,"price":15},
{"title":"Foundation","year":1951,"price":10},
{"title":"Hyperion","year":1989,"price":18},
{"title":"Snow Crash","year":1992,"price":12}
]}
# Total revenue across all books
QUERY: $.books.map(@.price).sum()
OUT: 55
# Mean price
QUERY: $.books.map(@.price).avg()
OUT: 13.75
# Earliest and most expensive
QUERY: $.books.min_by(b => b.year).title
OUT: "Foundation"
QUERY: $.books.max_by(b => b.price).title
OUT: "Hyperion"
# Any cyberpunk in the catalog?
QUERY: $.books.any(@.tags? and @.tags.includes("cyberpunk"))
# (where @.tags? guards against missing field)
# Count books published before 1970
QUERY: $.books.filter(@.year < 1970).count()
OUT: 2
# Position of the first 1990s book
QUERY: $.books.find_index(@.year >= 1990)
OUT: 3
# Indices of books where price > 12
QUERY: $.books.indices_where(@.price > 12)
OUT: [0,2]
Positional Access
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "transactions": [{"ts": "01"}, {"ts": "02"}, {"ts": "03"}]}
Bounded extraction by position.
first
- Signature: Array<A> -> A | null
- Demand law: First — always FirstInput(1).
QUERY: [10,20,30].first() OUT: 10
QUERY: [].first() OUT: null
QUERY: $.users.filter(@.active).first()
# Source reads only enough to get one active user.
Equivalent to .nth(0) but reads better and is the canonical "early-exit"
sink.
last
- Signature: Array<A> -> A | null
- Demand law: Last — always LastInput(1).
QUERY: [10,20,30].last() OUT: 30
When the source supports it (an in-memory array, or a tape with known
length), last seeks to the end; for streams it must drain.
nth(i)
- Signature: Array<A> -> A | null
- Demand law: NthInput(i) if i is non-negative; LastInput(-i) otherwise.
QUERY: [10,20,30,40].nth(2) OUT: 30
QUERY: [10,20,30,40].nth(-1) OUT: 40
find_first(pred)
- Signature: Array<A> -> A | null
- Behavior: Same as find — kept for naming clarity. Use find in new code.
find_one(pred)
- Signature: Array<A> -> A | null
- Behavior: Asserts at most one match; errors if more than one matches. Useful for "exactly one user with this id" shapes.
QUERY: $.users.find_one(@.id == 1)
collect
- Signature: Any -> Array<Any>
- Behavior: Coerce to array. Scalar → [scalar]; array → identity; null → [].
QUERY: 42.collect() OUT: [42]
QUERY: [1,2].collect() OUT: [1,2]
QUERY: null.collect() OUT: []
Use collect to guarantee an array shape at a pipeline boundary —
useful for callers that always want to iterate.
When to use a positional vs. a reducer
first() is a positional sink (returns one element). count() is a reducer
(returns one number). Both terminate the pipeline. Use whichever matches
your output type.
Worked example
DOC: {"orders": [
{"id": 1, "total": 100},
{"id": 2, "total": 50},
{"id": 3, "total": 200}
]}
QUERY: $.orders.filter(@.total > 75).first().id
OUT: 1
QUERY: $.orders.sort_by(@.total).last().id
OUT: 3
The first query early-exits (one filter pass, one match). The second sorts (barrier), then takes the last — the planner can't avoid the sort.
Practical examples
# First active user — early-exit, demand-aware
$.users.find(@.active).name
# Last log entry of severity 3+ (when the source supports random access)
$.logs.filter(@.sev >= 3).last().msg
# Get a user at known index
$.users.nth(2).email
# Negative-index array tail
$.transactions.nth(-1).ts
# Coerce-or-empty: scalar source becomes a 1-element array
"hello".collect() # → ["hello"]
null.collect() # → []
# Use collect() at a method-call boundary so callers always iterate
$.config.tags.collect().map(@.lower())
Barrier Operators
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}], "daily": [{"day": 1, "value": 10}, {"day": 2, "value": 12}]}
Barriers must see the full input before emitting any output. They materialise. Place them late in pipelines when possible.
Sort
sort (alias sort_by)
- Signature: Array<A> -> Array<A>
- Behavior: Stable ascending sort. With a projection, sorts by the projected key.
QUERY: [3,1,4,1,5].sort()
OUT: [1,1,3,4,5]
QUERY: $.books.sort(@.year)
QUERY: $.books.sort(b => -b.year)
QUERY: $.users.sort(@.last_name, @.first_name)
Multi-arg form sorts by a tuple of keys.
Distinct
unique (alias distinct)
- Signature: Array<A> -> Array<A>
- Behavior: Remove duplicates by structural equality, preserving first-occurrence order.
QUERY: [3,1,4,1,5,9,2,6,5].unique()
OUT: [3,1,4,5,9,2,6]
unique_by(f)
- Signature: Array<A> -> Array<A>
- Behavior: Dedup by projected key.
QUERY: $.books.unique_by(@.author)
Group / count / index
group_by(key)
- Signature: Array<A> -> Object<KeyString, Array<A>>
- Behavior: Bucket by projected key.
QUERY: $.books.group_by(@.author)
count_by(key)
- Signature: Array<A> -> Object<KeyString, Number>
- Behavior: Bucket counts.
QUERY: $.books.count_by(@.author)
OUT: {"Asimov":1,"Herbert":1,"Simmons":1,"Stephenson":1}
index_by(key)
- Signature: Array<A> -> Object<KeyString, A>
- Behavior: Index by key. Last wins on collision.
QUERY: $.users.index_by(@.id)
group_shape
⚠ Not yet supported in v0.5 — runtime returns "GroupShape: builtin unsupported". Tracked for a future release.
- Signature (planned): Array<Object> -> Array<Object>
- Behavior (planned): Group by structural shape (key set).
Partition
partition(pred)
⚠ Not yet supported in v0.5 for chained / pipeline use. The apply_* trait dispatch isn't wired through the streaming planner; calling it inside a chain like $.store.books.partition(@.x) is unreliable. Spec exists, but the output shape and execution path are subject to change.
- Signature (planned): Array<A> -> [Array<A>, Array<A>]
- Behavior (planned): [matching, non-matching].
Window / chunk
window(size)
- Signature: Array<A> -> Array<Array<A>>
- Behavior: Sliding window of size.
QUERY: [1,2,3,4,5].window(3)
OUT: [[1,2,3],[2,3,4],[3,4,5]]
chunk(size) (alias batch)
- Signature: Array<A> -> Array<Array<A>>
- Behavior: Non-overlapping chunks. The last chunk may be shorter.
QUERY: [1,2,3,4,5,6,7].chunk(3)
OUT: [[1,2,3],[4,5,6],[7]]
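Both shapes fall out of simple slicing. A Python sketch of the semantics shown above, assuming inputs shorter than the window size yield no windows; the helper names are hypothetical:

```python
def window(xs, size):
    """Sliding windows of `size`; fewer than `size` elements yields none."""
    return [xs[i:i + size] for i in range(max(0, len(xs) - size + 1))]

def chunk(xs, size):
    """Non-overlapping chunks; the last chunk may be shorter."""
    return [xs[i:i + size] for i in range(0, len(xs), size)]
```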
Rolling aggregates
| Method | Behavior |
|---|---|
| rolling_sum(n) | Sum over a window of size n |
| rolling_avg(n) | Average over a window |
| rolling_min(n) | Min over a window |
| rolling_max(n) | Max over a window |
QUERY: [1,2,3,4,5].rolling_sum(3)
OUT: [null,null,6.0,9.0,12.0]
The leading n-1 positions emit null until the window fills.
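The null-padding rule can be sketched in a few lines of Python (None standing in for null; `rolling_sum` here is a hypothetical model, not jetro's implementation):

```python
def rolling_sum(xs, n):
    """Emit None for the leading n-1 slots, then window sums as floats."""
    return [None if i < n - 1 else float(sum(xs[i - n + 1:i + 1]))
            for i in range(len(xs))]
```

The output always has the same length as the input, which keeps rolling aggregates alignable with the original series (e.g. for zipping back against timestamps).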
accumulate(init, fn)
⚠ Not yet supported in v0.5 — runtime returns "accumulate: builtin not migrated to builtins.rs AST adapter". Spec exists; runtime hookup pending.
- Signature (planned): Array<A> -> Array<B> (with fn: (B, A) -> B, init: B)
- Behavior (planned): Streaming fold producing intermediate states.
For now, use cummax / cummin for running min/max, or build the fold
with a let + recursive helper if absolutely needed.
When to barrier
You have to barrier when:
- Order needs computation (sort, unique)
- Output is grouped / indexed (group_by, index_by)
- A window crosses element boundaries (window, rolling_*)
You don't need a barrier for:
- Per-element transforms (map)
- Predicates (filter)
- Numeric reducers (sum, count) — they're streaming reducers, not barriers
Practical examples
DOC: {"books":[
{"title":"Dune","year":1965,"author":"Herbert","price":15},
{"title":"Foundation","year":1951,"author":"Asimov","price":10},
{"title":"Hyperion","year":1989,"author":"Simmons","price":18},
{"title":"Snow Crash","year":1992,"author":"Stephenson","price":12}
]}
# Sort by year ascending
QUERY: $.books.sort(b => b.year).map(@.title)
OUT: ["Foundation","Dune","Hyperion","Snow Crash"]
# Sort by price descending (negate the key)
QUERY: $.books.sort(b => -b.price).map(@.title)
OUT: ["Hyperion","Dune","Snow Crash","Foundation"]
# Distinct tags across books
QUERY: $.books.flat_map(@.tags).unique()
# How many distinct authors
QUERY: $.books.unique_by(b => b.author).count()
OUT: 4
# Group by author
QUERY: $.books.group_by(b => b.author)
OUT: {"Asimov":[{"author":"Asimov","price":10,"title":"Foundation","year":1951}],"Herbert":[{"author":"Herbert","price":15,"title":"Dune","year":1965}],"Simmons":[{"author":"Simmons","price":18,"title":"Hyperion","year":1989}],"Stephenson":[{"author":"Stephenson","price":12,"title":"Snow Crash","year":1992}]}
# Histogram of authors (prefer count_by — no buffering of bucket payloads)
QUERY: $.books.count_by(b => b.author)
OUT: {"Asimov":1,"Herbert":1,"Simmons":1,"Stephenson":1}
# Build a quick lookup table
QUERY: $.users.index_by(u => u.id)
# Sliding-3 windows for moving stats
QUERY: $.measurements.window(3).map(w => w.sum() / 3)
# Split into batches of 10 for paginated processing
QUERY: $.records.chunk(10)
# 7-day moving average over a numeric series
QUERY: $.daily.rolling_avg(7)
Array and Set Operations
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "tags_today": ["a", "b", "c"], "tags_yesterday": ["b", "c", "d"], "left_tags": ["a", "b", "c"], "right_tags": ["b", "c", "d"]}
Operations that take an array and produce a derivative array (or join two arrays).
append(v) and prepend(v)
- Signature: Array<A> -> Array<A>
- Behavior: Add v to the end / front.
QUERY: [1,2,3].append(4) OUT: [1,2,3,4]
QUERY: [1,2,3].prepend(0) OUT: [0,1,2,3]
When used as chain-write terminals ($.path.append(v)), they patch the
document — see Patch.
reverse
- Signature: Array<A> -> Array<A>
- Behavior: Reverse element order. Also works on strings (calls reverse_str).
QUERY: [1,2,3].reverse() OUT: [3,2,1]
QUERY: "abc".reverse() OUT: "cba"
Set-like operations
| Method | Behavior |
|---|---|
| diff(other) | Elements in self not in other |
| intersect(other) | Elements in both |
| union(other) | Elements in either, deduped |
QUERY: [1,2,3,4].diff([3,4,5]) OUT: [1,2]
QUERY: [1,2,3,4].intersect([3,4,5]) OUT: [3,4]
QUERY: [1,2,3].union([3,4,5]) OUT: [1,2,3,4,5]
Equality is structural. Order: result preserves first-occurrence order from the left operand.
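The ordering rule can be modeled in Python, using canonical JSON text as a stand-in for jetro's structural equality (an assumption of this sketch; the helper names are hypothetical):

```python
import json

def _key(v):
    # Canonical JSON with sorted keys approximates structural equality.
    return json.dumps(v, sort_keys=True)

def diff(a, b):
    right = {_key(x) for x in b}
    return [x for x in a if _key(x) not in right]

def intersect(a, b):
    right = {_key(x) for x in b}
    return [x for x in a if _key(x) in right]

def union(a, b):
    out, seen = [], set()
    for x in a + b:          # left operand first, so its order wins
        if _key(x) not in seen:
            seen.add(_key(x))
            out.append(x)
    return out
```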
join(sep)
- Signature: Array<String> -> String
- Behavior: Concatenate strings with separator.
QUERY: ["a","b","c"].join(", ")
OUT: "a, b, c"
QUERY: $.users.map(@.name).join(" / ")
For non-string elements, lift with .map(@.to_string()) first.
zip(other) and zip_longest(other, fill?)
- Signature: Array<A>, Array<B> -> Array<[A, B]>
- Behavior: Pair element-wise.
QUERY: [1,2,3].zip(["a","b","c"])
OUT: [[1,"a"],[2,"b"],[3,"c"]]
QUERY: [1,2,3].zip(["a","b"]) OUT: [[1,"a"],[2,"b"]]
QUERY: [1,2,3].zip_longest(["a","b"]) OUT: [[1,"a"],[2,"b"],[3,null]]
QUERY: [1,2,3].zip_longest(["a"], "x") OUT: [[1,"a"],[2,"x"],[3,"x"]]
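Python's itertools expresses both variants directly; a sketch of the pairing rules above (the wrapper names are hypothetical, and None stands in for null):

```python
from itertools import zip_longest

def zip_pairs(a, b):
    """Stop at the shorter input, like .zip()."""
    return [list(p) for p in zip(a, b)]

def zip_pairs_longest(a, b, fill=None):
    """Pad the shorter input with `fill` (null when omitted), like .zip_longest()."""
    return [list(p) for p in zip_longest(a, b, fillvalue=fill)]
```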
fanout(...lambdas)
- Signature: A -> Array<...>
- Behavior: Apply each lambda to the same input; collect results.
DOC: {"x": 10}
QUERY: $.x.fanout(@ * 2, @ + 1, @.to_string())
OUT: [20,11,"10"]
Useful for building multi-shape projections without repeating subexpressions.
zip_shape(arrays)
⚠ Not yet supported in v0.5 — runtime returns "ZipShape: builtin unsupported". Spec exists; runtime hookup pending.
- Signature (planned): Object<KeyString, Array<A>> -> Array<Object>
- Behavior (planned): Combine parallel arrays under shared keys into an array of objects.
The inverse is pivot — see Objects.
Demand notes
Set operations and join are barriers (they consume both inputs fully).
reverse is a barrier too — but it's cheap and well-supported by demand:
reverse().take(n) is rewritten so the source seeks to the end.
Practical examples
# Add an item to a tag list
$.user.tags.append("admin") # patches the doc
# Build a "label = value" string
$.user.pick(name, email).values().join(" = ")
# CSV row from selected fields
[$.user.id, $.user.name, $.user.email].join(",")
# Set difference — find items missing from a baseline
[1,2,3,4,5].diff([2,4]) # → [1, 3, 5]
# Set intersection — common items
$.left_tags.intersect($.right_tags)
# Merge unique values, preserving first-occurrence order
$.tags_today.union($.tags_yesterday)
# Reverse and take last 5 (demand-aware: seeks end)
$.events.reverse().take(5)
# Pair two arrays positionally
[1,2,3].zip(["a","b","c"]) # → [[1,"a"],[2,"b"],[3,"c"]]
# Pad shorter array with default
[1,2,3].zip_longest(["a","b"], "?") # → [[1,"a"],[2,"b"],[3,"?"]]
# Run several projections at once
$.metric.value.fanout(@ * 2, @ + 1, @ - 1) # → [v*2, v+1, v-1]
Object Projection and Transform
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}
Methods that read or rewrite objects.
Keys and values
| Method | Signature | Result |
|---|---|---|
| keys | Object -> Array<String> | Insertion-order key list |
| values | Object -> Array<Any> | Insertion-order value list |
| entries | Object -> Array<[String, Any]> | Key-value pairs |
| to_pairs | Object -> Array<[String, Any]> | Alias of entries |
DOC: {"a": 1, "b": 2}
QUERY: $.keys() OUT: ["a","b"]
QUERY: $.values() OUT: [1,2]
QUERY: $.entries() OUT: [["a",1],["b",2]]
from_pairs
- Signature: Array<[String, Any]> -> Object
- Behavior: Inverse of to_pairs.
QUERY: [["a",1],["b",2]].from_pairs()
OUT: {"a":1,"b":2}
invert
- Signature: Object<K, V> -> Object<V, K>
- Behavior: Swap keys and values. Values must be coercible to keys (string-like).
QUERY: {"a":"x","b":"y"}.invert()
OUT: {"x":"a","y":"b"}
pick(field, ...)
- Signature: Object -> Object
- Behavior: Keep only the named keys. Supports alias: src rename.
DOC: {"id": 1, "name": "Ada", "secret": "!"}
QUERY: $.pick(id, name)
OUT: {"id":1,"name":"Ada"}
QUERY: $.pick(uid: id, name)
OUT: {"name":"Ada","uid":1}
Maps over arrays of objects:
$.users.pick(id, email)
is equivalent to $.users.map(u => u.pick(id, email)).
omit(field, ...)
- Signature: Object -> Object
- Behavior: Inverse of pick. Drop the named keys.
QUERY: $.user.omit(secret, password)
Merge
| Method | Behavior |
|---|---|
| merge(other) | Shallow merge — other's keys win on collision |
| deep_merge(other) | Recursive merge — sub-objects merged, arrays replaced |
| defaults(other) | Reverse merge — keep self's keys, fill missing from other |
QUERY: {"a":1,"b":2}.merge({"b":99,"c":3})
OUT: {"a":1,"b":99,"c":3}
QUERY: {"a":{"x":1}}.deep_merge({"a":{"y":2}})
OUT: {"a":{"x":1,"y":2}}
QUERY: {"a":1}.defaults({"a":99,"b":2})
OUT: {"a":1,"b":2}
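The three merge policies reduce to a few lines of Python; a sketch of the semantics in the table above (helper names hypothetical):

```python
def merge(a, b):
    """Shallow merge: b's keys win on collision."""
    return {**a, **b}

def deep_merge(a, b):
    """Recurse into sub-objects; arrays and scalars are replaced, not merged."""
    out = dict(a)
    for k, v in b.items():
        if isinstance(out.get(k), dict) and isinstance(v, dict):
            out[k] = deep_merge(out[k], v)
        else:
            out[k] = v
    return out

def defaults(a, b):
    """Reverse merge: keep a's keys, fill only the missing ones from b."""
    return {**b, **a}
```

Note that defaults is just merge with the operands swapped, which is why the table calls it a "reverse merge".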
rename(...mapping)
- Signature: Object -> Object
- Behavior: Rename keys per a {old: new, ...} mapping.
QUERY: $.user.rename({user_id: id, full_name: name})
transform_keys(fn) and transform_values(fn)
- Signature: Object -> Object
- Behavior: Apply fn to every key / value.
QUERY: {"foo": 1, "bar": 2}.transform_keys(@.upper())
OUT: {"BAR":2,"FOO":1}
QUERY: {"a": 1, "b": 2}.transform_values(@ * 10)
OUT: {"a":10,"b":20}
filter_keys(pred) and filter_values(pred)
- Signature: Object -> Object
- Behavior: Keep entries whose key / value matches the predicate.
QUERY: $.config.filter_keys(k => k.starts_with("aws_"))
QUERY: $.scores.filter_values(@ >= 50)
pivot(rows, cols, value)
- Signature: Array<Object> -> Object<KeyString, Object>
- Behavior: Pivot a table-shaped array into a nested object indexed by rows then cols, with value as the leaf.
DOC: [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY: $.pivot("y", "q", "v")
OUT: {"2024":{"1":10,"2":20},"2025":{"1":15}}
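The pivot shape is a two-level dictionary build; a Python sketch of the transformation above, assuming keys are coerced to strings as the output shows (the `pivot` helper is hypothetical):

```python
def pivot(rows, row_key, col_key, val_key):
    """Nested object: row value -> col value -> leaf; keys coerced to strings."""
    out = {}
    for r in rows:
        out.setdefault(str(r[row_key]), {})[str(r[col_key])] = r[val_key]
    return out

table = [{"y": 2024, "q": 1, "v": 10},
         {"y": 2024, "q": 2, "v": 20},
         {"y": 2025, "q": 1, "v": 15}]
```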
implode(joiner=",")
- Signature: Array<String> -> String
- Behavior: Like join, but works on object values too:
QUERY: {"a":"x","b":"y"}.values().implode("/")
OUT: "x/y"
Demand notes
pick is a powerful demand signal — it tells the source which fields are
needed. Over a wide-record document, pick(id, name) upstream of the rest
of the pipeline avoids decoding all the other fields.
keys over an array stage emits one key list per element; keys over a
single object yields a single array value.
Practical examples
DOC: {"users":[
{"id":1,"name":"Ada","email":"ada@x.com","secret":"!"},
{"id":2,"name":"Bob","email":"bob@y.org","secret":"?"}
]}
# Project safe public fields
QUERY: $.users.map(u => u.pick(id, name, email))
# Drop sensitive keys
QUERY: $.users.map(u => u.omit(secret))
# Rename in flight
QUERY: $.users.map(u => u.pick(uid: id, full_name: name, email))
# Keys / values / entries
QUERY: $.users[0].keys() → ["id","name","email","secret"]
QUERY: $.users[0].values().count() → 4
QUERY: $.users[0].entries().count() → 4
# Round-trip through entries
QUERY: $.users[0].entries().from_pairs() → equivalent to $.users[0]
# Merge with defaults (existing keys win)
QUERY: $.config.defaults({timeout: 30, retries: 3})
# Deep-merge config layers
QUERY: $.base_config.deep_merge($.user_config)
# Filter object by key prefix
QUERY: $.env.filter_keys(k => k.starts_with("AWS_"))
# Filter values
QUERY: $.scores.filter_values(@ >= 50)
# Apply transform to every value
QUERY: $.prices.transform_values(@ * 1.08)
# Normalise keys to snake_case
QUERY: $.payload.transform_keys(k => k.snake_case())
# Invert a code-to-name table
QUERY: $.country_codes.invert() # {"US":"United States",...} → {"United States":"US",...}
# Pivot long-format records
DOC: [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY: $.pivot("y","q","v")
OUT: {"2024":{"1":10,"2":20},"2025":{"1":15}}
Path and Structural Mutation
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}
Methods that read, set, delete, or rewrite values at specific paths within a document. These work on whole documents or sub-trees.
For chain-write terminals ($.path.set(v)) see Patch.
This chapter documents the method-call versions.
get_path(path)
⚠ v0.5 quirk: only resolves a single key — get_path("a/b/c") returns null even when $.a.b.c exists. Use direct path navigation ($.a.b.c) when the path is statically known. For dynamic paths, walk manually with let + chained [expr].
- Signature (intended): Any, String -> Any | null
- Behavior (intended): Read a value at a slash-separated path.
DOC: {"user": {"profile": {"name": "Ada"}}}
QUERY: $.get_path("user")
OUT: {"profile":{"name":"Ada"}}
QUERY: $.get_path("user/profile")
OUT (intended): {"name":"Ada"} (v0.5 actually returns null; see the quirk above)
set_path(path, value)
- Signature: Any, String, Any -> Any
- Behavior: Return a copy with value written at path. Creates intermediate objects as needed.
QUERY: $.set_path("user/profile/email", "ada@example.com")
del_path(path)
- Signature: Any, String -> Any
- Behavior: Return a copy with the leaf at path removed.
QUERY: $.del_path("user/secret")
del_paths(paths)
- Signature: Any, Array<String> -> Any
- Behavior: Remove all listed paths in one pass. Cheaper than chained del_path for many removals.
QUERY: $.del_paths(["user/secret", "user/temp", "session/csrf"])
has_path(path)
- Signature: Any, String -> Bool
- Behavior: True if a value exists at path. Distinguishes "missing" from "explicit null":
DOC: {"a": null}
QUERY: $.has_path("a") OUT: true
QUERY: $.has_path("b") OUT: false
flatten_keys(sep="/")
- Signature: Object -> Object
- Behavior: Flatten a nested object into a single-level object with joined keys.
DOC: {"a": {"b": 1, "c": 2}, "d": 3}
QUERY: $.flatten_keys()
OUT: {"a/b":1,"a/c":2,"d":3}
QUERY: $.flatten_keys(".")
OUT: {"a.b":1,"a.c":2,"d":3}
unflatten_keys(sep="/")
- Signature: Object -> Object
- Behavior: Inverse of flatten_keys.
QUERY: {"a/b": 1, "a/c": 2}.unflatten_keys()
OUT: {"a":{"b":1,"c":2}}
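flatten_keys and unflatten_keys form a round-trip pair. A reference-semantics sketch in Python (illustrative only, not jetro's implementation):

```python
def flatten_keys(obj, sep="/", prefix=""):
    """Flatten nested objects into a single level with sep-joined keys."""
    out = {}
    for k, v in obj.items():
        key = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict) and v:
            out.update(flatten_keys(v, sep, key))   # recurse into sub-objects
        else:
            out[key] = v                            # leaves copy through
    return out

def unflatten_keys(flat, sep="/"):
    """Inverse: split each joined key and rebuild the nesting."""
    out = {}
    for k, v in flat.items():
        node = out
        *parents, leaf = k.split(sep)
        for p in parents:
            node = node.setdefault(p, {})
        node[leaf] = v
    return out

doc = {"a": {"b": 1, "c": 2}, "d": 3}
assert flatten_keys(doc) == {"a/b": 1, "a/c": 2, "d": 3}
assert unflatten_keys(flatten_keys(doc)) == doc     # round-trips
```

The round-trip is only exact when no original key contains the separator — the same caveat applies in the DSL.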
set(path, value) (method-call form)
- Signature: Any, String, Any -> Any
- Behavior: Same as set_path. Kept for ergonomic chains.
The chain-write terminal $.path.set(v) is different — it's parsed as
a patch and operates on the rooted document path.
update
update is jetro's functional batched update. Two surfaces:
Object body — update({k: expr, ...})
Apply a set of field updates to one or more selected subtrees. Plain keys update fields below the receiver; quoted keys carry full paths.
DOC: {"books": [
{"title": "Dune", "year": 1965, "tags": ["sf"]},
{"title": "Hyperion", "year": 1989, "tags": ["sf", "hugo"]}
]}
QUERY: $.books[*].update({tags: tags.append("test"), reviewed: true})
OUT: {"books":[{"reviewed":true,"tags":["sf","test"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","hugo","test"],"title":"Hyperion","year":1989}]}
Each selected book gets both fields written. Plain identifiers (tags,
reviewed) are read against the selected snapshot — not the
mid-batch document — so two ops on the same target both see the original
field values.
Body forms:
| Form | Meaning |
|---|---|
| field: expr | Write expr into field of each selected target |
| "a.b.c": expr | Write into a nested path inside each selected target |
| "books[*].tags": expr | Quoted path key — full root-relative path with wildcards/filters |
| field: expr when cond | Skip when cond is falsy |
| field: DELETE | Remove the field (with optional when) |
@ inside the body is the current value at the target field (handy
inside path keys); $ is the original root.
QUERY: $.books[*].update({tags: tags.append("modern") when year > 1980})
OUT: {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","hugo","modern"],"title":"Hyperion","year":1989}]}
Root-level batch with quoted paths
When the receiver is $, quoted keys carry full paths, including
wildcards and DELETE:
QUERY: $.update({"books[*].tags": @.append("test"), active: false})
DOC: {"books": [{"tags": ["sf"]}], "active": true}
OUT: {"active":false,"books":[{"tags":["sf","test"]}]}
DOC: {"users": [{"id":1,"secret":"a"}, {"id":2,"secret":"b"}]}
QUERY: $.update({"users[*].secret": DELETE})
OUT: {"users":[{"id":1},{"id":2}]}
Filtered wildcard [* if pred]
Both selectors and quoted path keys support a filtered wildcard:
DOC: {"books": [
{"title": "Dune", "year": 1965, "tags": ["sf"]},
{"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}
QUERY: $.books[* if year > 1980].update({tags: tags.append("modern")})
OUT: {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}
QUERY: $.update({"books[* if year > 1980].tags": @.append("modern")})
OUT: {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}
Two-argument path form — update(path, expr)
The classic shape: a slash- or dot-separated path plus an expression.
@ inside the expression is the current value at path.
DOC: {"counters": {"visits": 10, "clicks": 3}}
QUERY: $.update("counters.visits", @ + 1)
OUT: {"counters":{"clicks":3,"visits":11}}
QUERY: $.update("counters/visits", @ + 1)
OUT: {"counters":{"clicks":3,"visits":11}}
Semantics
| Property | Behavior |
|---|---|
| Snapshot reads | Each body expression sees the pre-batch values, not partial mid-batch state |
| Order | Ops apply in source order — last write wins on overlap |
| Selectors | Index, wildcard [*], filtered wildcard [* if pred], nested chains all OK |
| Scalar targets | An update with object body promotes scalar elements to objects ({seen: true} over [1,2] → [{seen:true},{seen:true}]) |
| Untouched subtrees | Preserved by Arc sharing — no deep copy of unrelated fields |
| Empty body | .update({}) is a no-op — returns the doc unchanged |
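The snapshot-read and last-write-wins rows can be sketched as a tiny Python model (illustrative semantics only — batched_update is a hypothetical helper, not jetro API):

```python
import copy

def batched_update(target: dict, ops):
    """Snapshot semantics: every op's expression reads the pre-batch value;
    writes apply in source order, so the last write wins on overlap.
    ops is a list of (field, fn) where fn receives the snapshot."""
    snapshot = copy.deepcopy(target)   # what every expression sees
    result = copy.deepcopy(target)
    for field, expr in ops:            # source order
        result[field] = expr(snapshot)
    return result

book = {"title": "Dune", "tags": ["sf"]}
out = batched_update(book, [
    ("tags", lambda s: s["tags"] + ["test"]),   # reads pre-batch tags
    ("reviewed", lambda s: True),
])
assert out == {"title": "Dune", "tags": ["sf", "test"], "reviewed": True}
```

Two ops targeting the same field both read the snapshot, and the second write overwrites the first — exactly the "Snapshot reads" + "Order" rows above.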
Worked example
DOC: {"users": [
{"id": 1, "secret": "a", "name": "Ada"},
{"id": 2, "secret": "b", "name": "Bob"}
]}
QUERY: $.users.map(u => u.del_paths(["secret"]).set_path("display", u.name))
OUT: [{"display":"Ada","id":1,"name":"Ada"},{"display":"Bob","id":2,"name":"Bob"}]
Demand notes
Path-mutation methods produce a full result and can't tell the source what
fields they need (the path is data, not statically analysable). When the
path is a literal, prefer pick/omit/set over get_path/set_path —
the planner can use literal field names.
Practical examples
# Single-key write (preferred over set_path for v0.5)
$.user.name.set("Ada Lovelace") # chain-write
# Set a field deep
patch $ { user.profile.email: "ada@x.com" }
# Bulk delete
$.del_paths(["secret","temp","csrf"])
# Flatten a nested config for environment-variable export
$.config.flatten_keys(".") # {"db.host":..., "db.port":..., ...}
# Round-trip via flatten/unflatten
$.config.flatten_keys().unflatten_keys() # ≈ $.config
# Existence test before write
patch $ {
email: $.user.email when $.has_path("user.email")
}
# Flat-key patches
$.patch_set.flatten_keys().entries().map(e => $.set_path(e[0], e[1]))
Deep Traversal and Recursion
Walk every descendant value in DFS pre-order. The deep methods are also
available as ..method(...) syntax sugar in path position.
deep_find(pred) (or ..find(pred))
- Signature: Any, (Any -> Bool) -> Array<Any>
- Behavior: Every descendant satisfying pred. Order: DFS pre-order.
DOC: {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
QUERY: $..find(@.x?)
OUT: [{"x":1},{"x":2}]
QUERY: $.deep_find(@ is number)
OUT: [1,2,3]
When the structural index is available, deep_find runs over a bitmap
representation in jetro-experimental rather than walking Val nodes —
significantly faster for shallow predicates.
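The traversal itself is plain DFS pre-order. A Python reference model of deep_find (the bitmap fast path is not modeled here):

```python
def deep_find(node, pred):
    """DFS pre-order: test each node, then descend into object values
    and array elements in document order, collecting matches."""
    found = []
    def visit(n):
        if pred(n):
            found.append(n)
        if isinstance(n, dict):
            for v in n.values():
                visit(v)
        elif isinstance(n, list):
            for v in n:
                visit(v)
    visit(node)
    return found

doc = {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
assert deep_find(doc, lambda n: isinstance(n, dict) and "x" in n) \
    == [{"x": 1}, {"x": 2}]
```

Pre-order is why the `{"x": 1}` under `a` precedes the `{"x": 2}` under `b` in the output above.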
deep_shape({k1, k2, ...}) (or ..shape({...}))
- Signature: Any -> Array<Object>
- Behavior: Every object that has all listed keys (regardless of value).
DOC: [{"id":1,"name":"a"},{"id":2},{"name":"c","id":3}]
QUERY: $..shape({id, name})
OUT: [{"id":1,"name":"a"},{"id":3,"name":"c"}]
deep_like({k1: v1, ...}) (or ..like({...}))
- Signature: Any -> Array<Object>
- Behavior: Every object whose listed keys equal the listed literal values.
DOC: [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942},{"author":"Herbert","year":1965}]
QUERY: $..like({author: "Asimov"})
OUT: [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942}]
walk(fn)
- Signature: Any, (Any -> Any) -> Any
- Behavior: Apply fn to every node bottom-up; rebuild the tree.
QUERY: $.walk(node => node.upper() if node is string else node)
# Returns the document with every string node uppercased.
walk_pre(fn)
- Signature: Any, (Any -> Any) -> Any
- Behavior: Like walk, but pre-order — fn sees parent before children.
Use walk_pre when the transform decides whether to recurse based on the
node's identity (e.g. "stop at leaves of kind X").
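The difference between the two orders can be sketched in Python (reference semantics, not jetro's implementation — fn visits every node; walk rebuilds children first, walk_pre applies fn to the parent first):

```python
def walk(node, fn):
    """Bottom-up: rebuild children, then apply fn to the (new) parent."""
    if isinstance(node, dict):
        node = {k: walk(v, fn) for k, v in node.items()}
    elif isinstance(node, list):
        node = [walk(v, fn) for v in node]
    return fn(node)

def walk_pre(node, fn):
    """Pre-order: fn sees the parent first; its result is what we recurse into."""
    node = fn(node)
    if isinstance(node, dict):
        return {k: walk_pre(v, fn) for k, v in node.items()}
    if isinstance(node, list):
        return [walk_pre(v, fn) for v in node]
    return node

doc = {"name": "ada", "tags": ["sf", "math"]}
upper = lambda n: n.upper() if isinstance(n, str) else n
assert walk(doc, upper) == {"name": "ADA", "tags": ["SF", "MATH"]}
```

For a pure leaf transform the results coincide; they diverge when fn rewrites containers, because in walk_pre the rewritten container is what gets recursed into.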
rec(pattern, fn)
⚠ Unstable in v0.5 — observed runtime error "rec: exceeded 10000 iterations without reaching fixpoint" even on simple inputs. Spec exists but the fixpoint loop is buggy. Avoid in production until fixed; track migration progress in the issue tracker.
- Signature (planned): Any, Pattern, (Any -> Any) -> Any
- Behavior (planned): Match-and-rewrite. Recursively walks; replaces every match with fn(match).
This is the recursive sibling of Pattern Match; useful for AST rewrites and document migrations.
trace_path(pred)
- Signature: Any, (Any -> Bool) -> Array<Array<Step>>
- Behavior: For every node matching pred, return the path from root to the node as an array of steps.
DOC: {"a": {"x": 1}, "b": [{"x": 2}]}
QUERY: $.trace_path(@.x?)
OUT: [["a"],["b",0]]
The steps are the keys/indices to walk to reach the match. Pair with
set_path for find-and-replace operations.
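A Python sketch of the step-list semantics implied by the Array<Array<Step>> signature (a reference model, not jetro's implementation):

```python
def trace_path(doc, pred):
    """Collect the root-to-node step list (keys and indices) for every
    matching descendant, in DFS pre-order."""
    paths = []
    def visit(node, steps):
        if pred(node):
            paths.append(steps)
        if isinstance(node, dict):
            for k, v in node.items():
                visit(v, steps + [k])
        elif isinstance(node, list):
            for i, v in enumerate(node):
                visit(v, steps + [i])
    visit(doc, [])
    return paths

doc = {"a": {"x": 1}, "b": [{"x": 2}]}
assert trace_path(doc, lambda n: isinstance(n, dict) and "x" in n) \
    == [["a"], ["b", 0]]
```

Each step list is exactly what a subsequent set_path-style write needs to address the matched node.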
Deep match
The pattern-match construct has deep variants ..match and ..match! —
see Control Flow and the pattern-match
cookbook.
When the bitmap kicks in
Deep search uses the structural index when:
- The query is rooted at $.. or .deep_*
- The predicate is a shape/key check (not a complex lambda)
- The document was loaded with the simd-json tape (default)
You don't enable this — it's selected by the planner.
Demand notes
Deep traversals declare All upstream by nature. The optimisation surface
is the predicate: shape and like checks bypass the per-node lambda
evaluation entirely.
Practical examples
# Find every node with an "id" key (anywhere in the tree)
$..find(@.id?)
# Find all numbers
$..find(@ is number)
# Every object that has both id + name keys
$..shape({id, name})
# Every object where a field equals a specific value
$..like({status: "error"})
# Locate an event by ID inside a deeply nested tree
$..match! { {id: 42} -> @, _ -> null }
# Walk every node, transforming strings to upper
$.walk(node => node.upper() if node is string else node)
# Trace paths from root to nodes matching a predicate
$.trace_path(@.is_admin?)
# → [["users",0]] (only Ada's is_admin is truthy)
# Bulk audit: find every "secret"-named field
$..find(@.secret?)
Membership and Predicates
Tests and small helpers.
or(default)
- Signature: Any, Any -> Any
- Behavior: If self is null, return default. Otherwise return self.
QUERY: null.or("default") OUT: "default"
QUERY: "hi".or("default") OUT: "hi"
Equivalent to ?? default but reads better in chains:
$.user.name.or("anon")
has(key)
- Signature: Object|Array, KeyOrIndex -> Bool
- Behavior: True if the key exists (objects) or index is in range (arrays).
QUERY: {"a":1,"b":2}.has("a") OUT: true
QUERY: {"a":1}.has("b") OUT: false
QUERY: [1,2,3].has(2) OUT: true
QUERY: [1,2,3].has(5) OUT: false
The has operator (x has y) is sugar for x.includes(y) — distinct
from this method.
missing(...keys)
⚠ Broken in v0.5 — empirically returns false instead of the array of missing keys. Compute manually until fixed: ["host", "port", "user"].filter(k => not $.config.has_path(k))
- Signature (intended): Object, ...String -> Array<String>
- Behavior (intended): Return the subset of provided keys that are not present.
includes(value) (alias contains)
- Signature: Array|String, Any -> Bool
- Behavior: Membership.
QUERY: [1,2,3].includes(2) OUT: true
QUERY: "hello".includes("ell") OUT: true
index(value)
- Signature: Array|String, Any -> Number | null
- Behavior: Index of first occurrence; null if not found.
QUERY: [10,20,30].index(20) OUT: 1
QUERY: [10,20,30].index(99) OUT: null
For strings, see also index_of in String Search.
indices_of(value)
- Signature: Array|String, Any -> Array<Number>
- Behavior: All indices of value.
QUERY: [1,2,3,2,1].indices_of(2)
OUT: [1, 3]
Quick comparison: predicates that look similar
| Pattern | Returns |
|---|---|
| xs.has("foo") | Bool — does the key/index exist? |
| xs.includes("foo") | Bool — is the value present? |
| xs.index("foo") | Number \| null — where? |
| xs.indices_of("foo") | Array — all positions |
| xs.find(p) | A \| null — first matching element |
| xs.find_index(p) | Number \| null — first matching index |
Practical examples
# Default for missing field
$.user.email.or("no-email@example.com")
# Existence check on key
$.config.has("aws_region")
# Index of a value (not the predicate form)
$.tags.index("admin")
# All positions of duplicates
[1, 2, 1, 3, 1].indices_of(1) # → [0, 2, 4]
# Membership in a set
$.tags.includes("urgent")
# Allow-list / deny-list patterns
$.role.includes("admin") and not $.banned_users.includes($.id)
Tabular Output
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}]}
Serialise sequences of objects to row-oriented text formats.
to_csv(headers?)
- Signature: Array<Object> -> String
- Behavior: RFC-4180-ish CSV. Without arguments, the header set is the union of object keys, ordered by first appearance.
DOC: [{"name":"Ada","age":36},{"name":"Bob","age":42}]
QUERY: $.to_csv()
OUT:
"name,age
Ada,36
Bob,42"
With explicit headers:
QUERY: $.to_csv(["age","name"])
OUT:
"age,name
36,Ada
42,Bob"
Strings containing commas, quotes, or newlines are quoted and escaped per RFC 4180.
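The header-union and quoting rules can be modeled with Python's standard csv module (a reference sketch of the described behavior, not jetro's serializer):

```python
import csv
import io
import json

def to_csv(rows, headers=None):
    """Headers: union of keys, ordered by first appearance (unless given).
    The csv writer quotes/escapes cells per RFC 4180; nested values are
    JSON-encoded into their cell, matching the Limitations note."""
    if headers is None:
        headers = []
        for row in rows:
            for k in row:
                if k not in headers:
                    headers.append(k)
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator="\n")
    w.writerow(headers)
    for row in rows:
        w.writerow([json.dumps(v) if isinstance(v, (dict, list)) else v
                    for v in (row.get(k, "") for k in headers)])
    return buf.getvalue().rstrip("\n")

rows = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 42}]
assert to_csv(rows) == "name,age\nAda,36\nBob,42"
assert to_csv(rows, ["age", "name"]) == "age,name\n36,Ada\n42,Bob"
```

Missing keys serialize as empty cells, which is the usual union-of-headers convention.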
to_tsv(headers?)
- Signature: Array<Object> -> String
- Behavior: Same as to_csv but tab-separated. No quoting (tab-in-value is replaced with a space).
QUERY: $.users.to_tsv(["id","email"])
Composing with the rest of the pipeline
Build a report:
$.users
.filter(@.active)
.map(u => u.pick(id, name, email))
.sort(@.id)
.to_csv()
Pipe to a file from the CLI:
jetrocli '$.users.filter(@.active).pick(id,name).to_csv()' < users.json > out.csv
Limitations
- Nested values are JSON-encoded into the cell. For deeply-nested structures, flatten first with flatten_keys: $.records.map(r => r.flatten_keys()).to_csv()
- The format is row-major. For long/wide reshaping, use pivot/zip_shape first.
- For Excel-flavored CSV (BOM, CRLF), post-process the result.
Practical examples
# Active-user export
$.users.filter(@.active).map(u => u.pick(id, name, email)).sort(u => u.id).to_csv()
# Daily sales report (use e[0]/e[1] indexing — array-pattern destructure
# inside a lambda doesn't parse in v0.5)
$.sales.group_by(s => s.day).entries().map(e => {
day: e[0],
total: e[1].map(@.amount).sum(),
count: e[1].count()
}).to_csv()
# Hashtag frequency CSV
$.tweets.flat_map(t => t.entities.hashtags.map(@.text))
.count_by(@)
.entries()
.map(e => {tag: e[0], count: e[1]})
.to_csv()
# TSV for log shipping
$.logs.map(l => l.pick(ts, sev, msg)).to_tsv()
Relational
Fixture
Examples below run against:
DOC: {"orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "customers": [{"id": 1, "name": "Ada", "email": "ada@x.com"}, {"id": 2, "name": "Bob", "email": "bob@y.org"}], "left": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "right": [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}]}
Operations that combine two arrays of objects on a key.
equi_join(other, leftKey, rightKey, fn?)
- Signature: Array<L>, Array<R>, KeyL, KeyR, ((L, R) -> Any)? -> Array<Any>
- Behavior: Inner equi-join: for every pair (l, r) where l[leftKey] == r[rightKey], emit a result. If fn is omitted, the result is the merged object l.merge(r).
LEFT: [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}]
RIGHT: [{"uid":1,"role":"admin"},{"uid":2,"role":"user"}]
QUERY: $.left.equi_join($.right, "id", "uid")
OUT: [{"id":1,"name":"Ada","uid":1,"role":"admin"},
{"id":2,"name":"Bob","uid":2,"role":"user"}]
QUERY: $.left.equi_join($.right, "id", "uid", (l, r) => {
name: l.name,
role: r.role
})
OUT: [{"name":"Ada","role":"admin"},{"name":"Bob","role":"user"}]
Worked example: orders + customers
DOC:
{
"customers": [
{"id": 1, "name": "Ada"},
{"id": 2, "name": "Bob"}
],
"orders": [
{"customer": 1, "amount": 100},
{"customer": 1, "amount": 50},
{"customer": 2, "amount": 75}
]
}
QUERY:
$.orders.equi_join($.customers, "customer", "id", (o, c) => {
customer: c.name,
amount: o.amount
})
OUT:
[
{"customer":"Ada","amount":100},
{"customer":"Ada","amount":50},
{"customer":"Bob","amount":75}
]
Notes and limitations
- Inner only. No outer joins. For "all left, fill missing right with null" you can hand-roll: $.left.map(l => l.merge($.right.find(@.uid == l.id).or({role: null})))
- Equality only. No range, prefix, or function joins.
- One key on each side. For multi-key joins, project a tuple key first: $.left.map(l => l.merge({_k: [l.a, l.b]})).equi_join($.right.map(r => r.merge({_k: [r.x, r.y]})), "_k", "_k")
- The implementation builds a hash on the right side; left is streamed. Pre-sort or pre-filter before joining if either side is large and only a subset matters.
When to choose join vs. lookup
For "many left rows, lookup one field on each":
$.orders.map(o => o.merge({customer_name: $.customers.find(@.id == o.customer).name}))
This nested find is O(n×m) — fine for small data. For large data, use
equi_join (O(n+m)) or build a lookup table first:
let by_id = $.customers.index_by(@.id) in
$.orders.map(o => o.merge({customer_name: by_id[o.customer].name}))
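The complexity claim is the classic hash-join argument. A Python sketch of the reference semantics for equi_join (illustrative, not the Rust implementation):

```python
from collections import defaultdict

def equi_join(left, right, lkey, rkey, fn=None):
    """Hash join: build an index over the right side (O(m)), then stream
    the left side probing it (O(n)) — O(n+m) total, vs O(n*m) for a
    nested find per left row. Default result is the merged object."""
    index = defaultdict(list)
    for r in right:
        index[r[rkey]].append(r)          # build phase
    out = []
    for l in left:                        # probe phase
        for r in index.get(l[lkey], []):
            out.append(fn(l, r) if fn else {**l, **r})
    return out

left = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
right = [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}]
assert equi_join(left, right, "id", "uid") == [
    {"id": 1, "name": "Ada", "uid": 1, "role": "admin"},
    {"id": 2, "name": "Bob", "uid": 2, "role": "user"},
]
```

Note that duplicate keys on the right fan out — every matching pair is emitted, which is standard inner-join semantics.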
Practical examples
# Enrich orders with customer info
$.orders.equi_join($.customers, "customer_id", "id")
# Custom result shape
$.orders.equi_join($.customers, "customer_id", "id", (o, c) => {
order_id: o.id,
total: o.amount,
buyer: c.name,
email: c.email
})
# Self-join: pair adjacent records via shared key
$.events.equi_join($.events, "session_id", "session_id", (a, b) => {a, b})
# Multi-key join via tuple projection
let lk = $.left.map(l => l.merge({_k: f"{l.a}-{l.b}"})) in
let rk = $.right.map(r => r.merge({_k: f"{r.x}-{r.y}"})) in
lk.equi_join(rk, "_k", "_k")
# Filter-then-join (drop rows before paying join cost)
$.orders.filter(@.status == "paid").equi_join($.customers, "cid", "id")
Chained Pipelines
Real-world queries assembled from the building blocks. Each recipe uses one small document and shows the query chain plus a sentence on what the planner does.
1. Top-N by aggregate
DOC: {"sales": [
{"region": "NA", "amount": 100},
{"region": "EU", "amount": 200},
{"region": "NA", "amount": 50},
{"region": "AS", "amount": 300},
{"region": "EU", "amount": 75}
]}
QUERY: $.sales
.group_by(@.region)
.entries()
.map(e => {region: e[0], total: e[1].map(@.amount).sum()})
.sort(@.total)
.reverse()
.take(2)
OUT: [{"region":"AS","total":300},{"region":"EU","total":275}]
group_by and sort are barriers; take(2) after the sort doesn't help —
the sort must complete first. Push the demand earlier where possible.
2. Active users + role-based count
DOC: {"users": [
{"id":1,"role":"admin","active":true},
{"id":2,"role":"user","active":false},
{"id":3,"role":"user","active":true},
{"id":4,"role":"admin","active":true}
]}
QUERY: $.users
.filter(@.active)
.count_by(@.role)
OUT: {"admin":2,"user":1}
Streaming filter + barrier count_by. The filter passes only what's needed;
count_by buffers but with ValueNeed::Predicate (only the role key) — the
rest of the user object is never decoded.
3. Histogram of word frequency
DOC: {"text": "the quick brown fox jumps over the lazy dog the end"}
QUERY: $.text
.words()
.map(@.lower())
.count_by(@)
OUT: {"the": 3, "quick": 1, "brown": 1, ...}
4. Customer order summary
QUERY: $.orders
.group_by(@.customer_id)
.entries()
.map(e => {
customer_id: e[0],
total: e[1].map(@.amount).sum(),
count: e[1].count(),
recent: e[1].sort(@.date).last().date
})
.sort(@.total)
.reverse()
The inner .sort(@.date).last() is wasteful: it sorts every group to grab
the last. Rewrite with max_by:
QUERY: ...
.map(e => {
customer_id: e[0],
total: e[1].map(@.amount).sum(),
count: e[1].count(),
recent: e[1].max_by(@.date).date
})
5. Unique recent active sessions
QUERY: $.events
.filter(@.kind == "login" and @.at >= "2026-01-01")
.map(@.user_id)
.unique()
.count()
6. Pretty-print a CSV from objects
QUERY: $.users
.filter(@.active)
.map(u => u.pick(id: id, name: full_name, email))
.sort(@.id)
.to_csv()
7. Find a needle in a deep document
QUERY: $..find(@.id == 42)
If the document was loaded from bytes (default), this hits the structural index — no full traversal.
8. Compute deltas with pairwise
DOC: {"prices": [100, 105, 102, 110, 108]}
QUERY: $.prices.pairwise().map(e => e[1] - e[0])
OUT: [5,-3,8,-2]
9. Rolling 3-point moving average
QUERY: $.measurements.rolling_avg(3)
The first two outputs are null until the window fills.
10. Build a lookup, then enrich
QUERY: let by_id = $.users.index_by(@.id) in
$.events.map(e => e.merge({user: by_id[e.user_id].name}))
index_by is a barrier that runs once; the .map streams.
11. Select rows with all required fields
QUERY: $.records.filter(r => ["id", "name", "email"].filter(k => not r.has(k)).count() == 0)
# (missing() is broken in v0.5 — test the keys directly)
12. Re-shape a long-format table
DOC: [
{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},
{"y":2025,"q":1,"v":15},{"y":2025,"q":2,"v":25}
]
QUERY: $.pivot("y", "q", "v")
OUT: {"2024":{"1":10,"2":20},"2025":{"1":15,"2":25}}
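pivot's long-to-wide reshape can be modeled in a few lines of Python (reference semantics; the string coercion of keys matches the output shown above):

```python
def pivot(rows, row_key, col_key, val_key):
    """Long-to-wide: one output object per distinct row_key, with one
    column per distinct col_key, holding that row's val_key."""
    out = {}
    for r in rows:
        out.setdefault(str(r[row_key]), {})[str(r[col_key])] = r[val_key]
    return out

rows = [
    {"y": 2024, "q": 1, "v": 10}, {"y": 2024, "q": 2, "v": 20},
    {"y": 2025, "q": 1, "v": 15}, {"y": 2025, "q": 2, "v": 25},
]
assert pivot(rows, "y", "q", "v") == {
    "2024": {"1": 10, "2": 20},
    "2025": {"1": 15, "2": 25},
}
```

Duplicate (row_key, col_key) pairs resolve last-wins in this sketch; whether jetro does the same is not specified here.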
13. Mask sensitive fields
QUERY: $.users.map(u => u.omit("password", "ssn", "token"))
14. Delta + cumulative sum
QUERY: $.daily.pairwise().map(e => e[1].value - e[0].value)
Cumulative-sum form (.accumulate(0, (a, x) => a + x)) isn't yet wired up in v0.5 — see the Limitations page. Until then, cummax/cummin cover running min/max; a full fold needs a host loop.
15. Migrate a document shape
⚠
recis unstable in v0.5 (fixpoint loop bug). For now, preferwalk/walk_prewith a manual shape check, or do the rewrite host-side.
QUERY (planned, currently broken):
$.rec({type: "v1"}, doc =>
doc.merge({type: "v2"})
.rename({old_field: "new_field"})
.omit("legacy_blob"))
rec walks the document, finds every node matching the shape, and rewrites
in place.
Pattern Match Cookbook
Fixture
Examples below run against:
DOC: {"xs": [1, 2, 3, 4, 5], "row": {"k": "foo", "data": {"a": 1, "b": 2}}, "doc": {"a": 1, "b": 2, "type": "v1"}, "tree": {"x": 1, "children": [{"x": 2}]}, "value": 3.14}
Pattern matching is one of jetro's most expressive features. It compiles to
a Maranget decision tree at lower-time and runs over all three execution
domains (Val, borrowed View, tape).
Anatomy
match scrutinee with {
pattern1 -> expr1,
pattern2 when guard -> expr2,
_ -> default
}
- Arms checked top-down.
- First match wins. _ is the universal fallback.
- when guards run after the structural match succeeds.
Pattern reference
| Pattern | Matches |
|---|---|
| 42, "x", true, null | Equal literal |
| _ | Anything |
| name | Anything, binds to name |
| 1..10 | Number ≥ 1 and < 10 |
| 1..=10 | Number ≥ 1 and ≤ 10 |
| {k: p, ...} | Object with key k, value matches p |
| [p1, p2] | Array of length 2 |
| [h, ...t] | Head + tail |
| p1 \| p2 | Either |
| x: number | Kind-bind |
v0.5 note: object shorthand {id, name} binds each key to a same-name local, and rest-capture is spelled ...*rest (object) or ...tail (array): {id, name, ...*rest}, [h, ...tail]. See Limitations for the canonical pattern grammar.
1. Discriminated union
match $.event with {
{kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
{kind: "key", code: c} -> f"key:{c}",
{kind: "scroll", dy: d} -> f"scroll:{d}",
_ -> "unknown"
}
In v0.5 every object pattern key needs an explicit key: binding form; the bare {kind: "click", x, y} shorthand parse-errors.
2. Numeric ranges
match $.score with {
s when s < 0 -> "invalid",
0..50 -> "low",
50..80 -> "medium",
80..=100 -> "high",
_ -> "out of range"
}
3. Or-patterns
match $.day with {
"sat" | "sun" -> "weekend",
_ -> "weekday"
}
4. Rest capture
⚠ Not yet supported in v0.5. The ..rest pattern parse-errors. Bind the keys you care about explicitly and compute rest outside the match if needed:
match $.config with {
{host: h, port: p} -> {host: h, port: p, extras: $.config.omit("host", "port")},
_ -> null
}
5. Array shape
match $.coords with {
[x, y] -> {x, y},
[x, y, z] -> {x, y, z},
_ -> null
}
6. Head + tail
match $.xs with {
[] -> "empty",
[first, ...rest] -> f"head={first}, count={rest.count()}",
}
7. Kind-bound + guard
match $.value with {
s: string when s.len() > 100 -> "long string",
s: string -> "short string",
n: number when n > 0 -> "positive",
n: number -> "non-positive",
_: array -> "array",
_ -> "other"
}
8. Deep match (..match)
Walk every descendant; collect results.
$.tree..match {
{kind: "leaf", value} -> value,
_ -> null
} | .compact()
The trailing .compact() drops the nulls from non-leaf descendants.
9. First-match deep (..match!)
Stops at the first match — the bang variant uses early termination via the structural index where possible.
$.tree..match! {
{role: "admin", id} -> id,
_ -> null
}
10. Migration / rewrite (rec)
$.doc.rec({type: "v1"}, node => node.merge({type: "v2"}))
rec is the recursive sibling of match — it descends and rewrites every matching node. ⚠ Unstable in v0.5 (fixpoint-loop bug) — see the rec reference; treat this as intended usage until fixed.
11. Cross-arm sharing
When multiple arms test the same prefix ({kind: "x", ...},
{kind: "y", ...}), the lowering shares the discriminant test. You don't
write anything special — the planner does it for you. Practically: write
many narrow arms; they cost about as much as one big switch.
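What sharing buys can be illustrated in Python: the naive lowering re-reads and re-compares the discriminant per arm, while the shared lowering reads it once and dispatches (illustrative only — jetro's Maranget trees are built at lower-time, not via a dict):

```python
def match_naive(event):
    """Arm-by-arm: every arm re-tests the 'kind' discriminant."""
    if isinstance(event, dict) and event.get("kind") == "click":
        return f"click@{event['x']},{event['y']}"
    if isinstance(event, dict) and event.get("kind") == "key":
        return f"key:{event['code']}"
    return "unknown"

def match_shared(event):
    """Shared discriminant: one read of 'kind', then switch-like dispatch."""
    arms = {
        "click": lambda e: f"click@{e['x']},{e['y']}",
        "key":   lambda e: f"key:{e['code']}",
    }
    if isinstance(event, dict):
        arm = arms.get(event.get("kind"))
        if arm is not None:
            return arm(event)
    return "unknown"

e = {"kind": "click", "x": 3, "y": 4}
assert match_naive(e) == match_shared(e) == "click@3,4"
```

Both return the same results; the shared form does one discriminant test regardless of arm count, which is why narrow arms stay cheap.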
12. Guards over deep patterns
match $.row with {
{user: {age, role: "admin"}} when age >= 18 -> "adult admin",
{user: {age}} when age < 18 -> "minor",
_ -> "other"
}
Bench tips
- Patterns with literal-only discriminants (no guards) compile to switch-like decision trees and run as fast as a hand-written if/else if.
- Guards add a per-arm conditional; cheap, but don't put expensive computation in them.
- Deep ..match over a large doc benefits a lot from the structural index; deep ..match! (first-match) is even better.
Write Fusion
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}
When a query contains multiple chain-writes, jetro fuses them into a single pass over the document. This is the patch-fusion optimizer.
What gets fused
Any sequence of chain-write terminals on the same document:
$.user.name.set("Ada")
.user.email.set("ada@x.com")
.user.tags.append("admin")
Or the equivalent block form (preferred for many writes):
patch $ {
user.name: "Ada",
user.email: "ada@x.com",
user.tags[*]: "admin"
}
Without fusion
Naively, three writes mean three traversals from $:
$ → user → name (write)
$ → user → email (write)
$ → user → tags[*] (write)
Each rebuilds the path from the root. For deeply-nested documents, the cost adds up.
With fusion
The optimizer collects effects, walks the document once, and applies all relevant rewrites at each visited node:
$ → user → {set name, set email, append tags}
Three writes, one walk.
Phases
The patch-fusion pass has internal phases (Phase C, Phase E in the source); the user-visible properties are:
- Same-base writes group together. Writes under $.user.* batch.
- Disjoint paths don't interfere. Writes to $.user.name and $.config.theme execute in one walk but at different nodes.
- Conflicts are resolved last-wins. Two writes to the same path: the later one wins.
- Conditional writes (when) are evaluated per-write. They short-circuit per clause; the walk doesn't redo work.
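The collect-then-walk idea behind fusion can be sketched in Python (an illustrative model — existing-field writes only, no wildcards or conditionals):

```python
def fused_patch(doc, writes):
    """Collect every (path, value) write up front — later writes to the
    same path overwrite earlier ones (last wins) — then apply all of them
    in a single recursive walk, instead of one root-to-leaf traversal
    per write."""
    effects = {tuple(p): v for p, v in writes}   # dict insert order: last wins
    def apply(node, prefix):
        if not isinstance(node, dict):
            return node
        return {
            k: effects[prefix + (k,)] if prefix + (k,) in effects
               else apply(v, prefix + (k,))
            for k, v in node.items()
        }
    return apply(doc, ())

doc = {"user": {"name": "?", "email": "?"}, "config": {"theme": "dark"}}
out = fused_patch(doc, [
    (["user", "name"], "Ada"),
    (["user", "email"], "ada@x.com"),
    (["user", "name"], "Ada Lovelace"),   # same path — last write wins
])
assert out["user"] == {"name": "Ada Lovelace", "email": "ada@x.com"}
```

Disjoint writes (user.* and config.*) land during the same walk at different nodes, which is the whole point of the fusion pass.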
Worked example
DOC:
{
"users": [
{"id": 1, "name": "Ada", "active": false},
{"id": 2, "name": "Bob", "active": true}
]
}
QUERY:
patch $ {
users[*].active: true, # broadcast write
users[0].name: "Ada Lovelace", # specific write
users[*].last_seen: "2026-05-08" when .active # conditional broadcast
}
What happens:
- One walk visits every user.
- For each, three potential writes evaluate. Per element: active: true always applies; name only at index 0; last_seen only when the post-write active is true (so all of them).
Output:
{
"users": [
{"id": 1, "name": "Ada Lovelace", "active": true, "last_seen": "2026-05-08"},
{"id": 2, "name": "Bob", "active": true, "last_seen": "2026-05-08"}
]
}
When fusion doesn't fire
- The chain isn't rooted at $ (the parser doesn't classify it as a write).
- The writes are gated by data-dependent conditions that change document shape mid-pipeline.
- Mixed read/write — $.users[0].name.set("A").upper() keeps standard method semantics.
Tips
- Prefer the block form (patch $ { … }) when you have ≥ 3 writes — easier to read, and the optimizer treats it identically.
- Use broadcast (xs[*].field: v) instead of a .map that calls .set per element.
- Conditionals (when) are fine — they don't break fusion.
jq vs jetro Cheatsheet
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}]}
For users coming from jq. Same shape: query JSON in a terminal. Different
philosophy in places — call this out where it matters.
Big differences at a glance
| Topic | jq | jetro |
|---|---|---|
| Calling methods | Pipe-of-filters: `. \| length` | Dot syntax: `.len()` |
| Pipe `\|` | Sole composition operator | Value-flow only — passes `@` to RHS |
| Iteration | Implicit on `.[]` | Explicit on chained methods |
| Lambdas | None — uses `.` rebinding | Three forms: `@`, `r =>`, `lambda r:` |
| Pattern matching | None | First-class with guards and ranges |
| Writes | `\|=`, `=`, `del()` | `.set()`, `patch $ {}`, chain-writes |
| Backend | Single interpreter | Six backends, planner-selected |
| Caching | None | Plan + path caches in JetroEngine |
One-liner translations
Identity / projection
jq: .
jetro: $
jq: .x
jetro: $.x
jq: .x.y[0]
jetro: $.x.y[0]
Iteration
jq: .users[]
jetro: $.users[*] # explicit; or just .users for chained methods
jq: .users[].name
jetro: $.users.map(@.name)
Field selection / projection
jq: {id, name}
jetro: .pick(id, name) # method form, maps over arrays
jq: .users | map({id, name})
jetro: $.users.map(u => u.pick(id, name))
# or
$.users.pick(id, name)
jq: del(.password)
jetro: $.omit(password) # or $.password.delete()
Filter
jq: .users | map(select(.active))
jetro: $.users.filter(@.active)
jq: .users[] | select(.age > 18)
jetro: $.users.filter(@.age > 18)
Aggregates
jq: length
jetro: .len() # for arrays, objects, strings
.count() # explicit array-count reducer
jq: [.[] | .price] | add
jetro: $.map(@.price).sum()
jq: [.[] | .age] | min
jetro: $.map(@.age).min()
# or
$.min_by(@.age).age # one-pass, returns whole element
Sort / unique / group
jq: sort
jetro: .sort()
jq: sort_by(.year)
jetro: .sort(@.year)
jq: unique
jetro: .unique()
jq: group_by(.author)
jetro: .group_by(@.author)
# jq returns array-of-arrays; jetro returns object indexed by key
jq: [group_by(.k)[] | {k: .[0].k, n: length}]
jetro: .count_by(@.k).entries().map(([k,n]) => {k, n})
Slice and take
jq: .[0:3]
jetro: $[0:3]
jq: .[0]
jetro: $[0]
# or
$.first() # demand-aware sink
jq: .[-1]
jetro: $[-1]
# or
$.last()
Has / index / membership
jq: has("foo")
jetro: .has("foo")
jq: .tags | index("admin")
jetro: $.tags.index("admin")
jq: .tags | contains(["admin"])
jetro: $.tags.includes("admin")
Strings
jq: ascii_upcase
jetro: .upper()
jq: ltrimstr("foo")
jetro: .strip_prefix("foo")
jq: split(",")
jetro: .split(",")
jq: test("regex")
jetro: @ ~= "regex"
# or
.re_match("regex")
jq: match("(\\d+)").captures
jetro: .captures("(\d+)")
Recursive descent
jq: ..
jetro: .. # same notation
jq: .. | strings
jetro: $..find(@ is string)
jq: .. | objects | select(.id?)
jetro: $..find(@.id?)
# or
$..shape({id})
String formatting
jq: "Hello, \(.name)!"
jetro: f"Hello, {$.name}!"
Conditional
jq: if .x > 5 then "big" else "small" end
jetro: "big" if $.x > 5 else "small"
jq: .x // "default"
jetro: $.x ?? "default"
Variables
jq: . as $doc | $doc.x + $doc.y
jetro: let doc = $ in doc.x + doc.y
Reduce / fold
jq: reduce .[] as $x (0; . + $x)
jetro: $.sum() # for sum specifically
# or general fold:
$.accumulate(0, (a, x) => a + x).last()
Object construction
jq: {users: [.[] | {id, name}]}
jetro: {users: $.map(u => u.pick(id, name))}
Modification
jq: .x = 1
jetro: $.x.set(1)
# or
patch $ {x: 1}
jq: .x |= . + 1
jetro: $.x.modify(@ + 1)
jq: del(.x)
jetro: $.x.delete()
jq: .users[].active = true
jetro: $.users[*].active.set(true)
# or
patch $ {users[*].active: true}
Multiple writes
jq: .x = 1 | .y = 2 | del(.z)
jetro: patch $ {x: 1, y: 2, z: DELETE}
jetro fuses these into one document walk. jq evaluates each pipe stage independently.
Complex pipeline translations
Real jq queries from the wild. Originals are taken verbatim from the jq manual and the Programming Historian "Reshaping JSON with jq" lesson; all credit to those sources. Each shows the original jq alongside an idiomatic jetro rewrite.
1. Alternative-binding destructure (jq manual)
Flatten a list of resources whose events field may be either a single
object or an array of objects, into one row per (resource, event) pair.
jq uses its alternative-destructuring operator ?// to try both shapes:
.resources[] as {$id, $kind, events: {$user_id, $ts}} ?// {$id, $kind, events: [{$user_id, $ts}]}
| {$user_id, $kind, $id, $ts}
jetro has no ?//. Use kind-test + flat_map to normalise:
$.resources.flat_map(r =>
let evts = (r.events if r.events is array else [r.events]) in
evts.map(e => {
user_id: e.user_id,
kind: r.kind,
id: r.id,
ts: e.ts
})
)
…or with a match to make the two shapes explicit:
$.resources.flat_map(r =>
match r.events with {
arr: array -> arr.map(e => {user_id: e.user_id, kind: r.kind, id: r.id, ts: e.ts}),
{user_id, ts} -> [{user_id, kind: r.kind, id: r.id, ts}],
_ -> []
}
)
The match form is more explicit and surfaces the "single object" branch as
its own arm — easier to extend (e.g. add a third event-shape later).
2. Tweet hashtags as semicolon-joined CSV (Programming Historian)
Take an array of tweets, project id plus a semicolon-joined string of
hashtag texts, emit as CSV. Original jq, threaded through five pipe stages:
{id: .id, hashtags: .entities.hashtags}
| {id: .id, hashtags: [.hashtags[].text]}
| {id: .id, hashtags: .hashtags | join(";")}
| [.id, .hashtags]
| @csv
Each pipe stage rebuilds the object — jq has no nested method chaining, so projection accumulates by reassignment.
jetro collapses it to one chain:
$.map(t => {
id: t.id,
hashtags: t.entities.hashtags.map(@.text).join(";")
}).to_csv()
to_csv already emits the row, headers and all. To match jq's headerless
output:
$.map(t => [t.id, t.entities.hashtags.map(@.text).join(";")])
.map(row => row.map(@.to_string()).join(","))
.join("\n")
3. Hashtag frequency CSV (Programming Historian)
Explode each tweet into one row per hashtag, group by hashtag, count, emit
(tag, count) as CSV. Original jq:
[.[] | {id: .id, hashtag: .entities.hashtags} | {id: .id, hashtag: .hashtag[].text}]
| group_by(.hashtag)
| .[]
| {tag: .[0].hashtag, count: . | length}
| [.tag, .count]
| @csv
jq's group_by returns an array-of-arrays, so the trailing .[] and
.[0].hashtag extract the key from the first element of each group.
jetro uses count_by, which already produces a {tag: count} map:
$.flat_map(t => t.entities.hashtags.map(@.text))
.count_by(@)
.entries()
.map(([tag, count]) => {tag, count})
.to_csv()
The pipeline reads top-to-bottom: explode → tally → reshape → emit.
count_by is one of several jetro idioms (also index_by, unique_by,
max_by) that fold a common jq pattern (group_by | map(...)) into a
single barrier.
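On this chapter's fixture, the tally idiom collapses to a single call. A sketch (key order of the result object is unspecified):

```
# jq shape: group_by(.role) | map({role: .[0].role, n: length})
$.users.count_by(@.role)
# → {"admin": 1, "user": 2}
```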
Why these examples are shorter in jetro
Three patterns recur:
- Method chaining. jq's `... | {...} | {...}` style rebuilds the object at each stage; jetro's `.map(t => {...})` builds it once.
- Specialised barriers. `count_by`, `index_by`, `unique_by`, `max_by`, `min_by` collapse `group_by | map(...)` chains into one call.
- First-class lambdas. jq's `.` rebinding inside `as`/`[]` becomes plain `t => t.field` in jetro, with no positional gymnastics.
The trade-off: jq's pipe-of-filters is more uniform — every stage is a filter that takes one input and produces zero-or-more outputs. jetro's methods are typed (one-to-one, filter, expander, reducer, barrier), so the pipeline shape is more visible but the surface is bigger.
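The stage types are easiest to see annotated on a chain — a sketch against this chapter's fixture, where the comments name the method kinds described above:

```
$.users                  # source (path)
  .filter(@.active)      # filter — drops rows
  .flat_map(@.tags)      # expander — one row in, many out
  .unique()              # barrier — must see all input
  .count()               # reducer — sink
```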
Things jq has that jetro doesn't
- `@base64`, `@uri`, `@csv` formatters as suffix. jetro spells these as methods: `.to_base64()`, `.url_encode()`, `.to_csv()`.
- SQL-style modules. No equivalent.
- `input`, `inputs`, nul-separated streaming. jetro is in-process; no streaming-input model.
- `recurse(f; cond)`. Use `walk_pre` or `rec` with a pattern.
Things jetro has that jq doesn't
- Pattern matching with guards, ranges, kind binding, deep `..match`.
- Demand propagation. `.first()`, `.find()`, `.take(n)` cut off the source; no full materialization.
- Bitmap structural index. `..find`, `..shape`, `..like` skip non-matching subtrees in O(1) per node.
- First-class lambdas (`r => body`, `lambda r: body`) with let-binding + inlining.
- Write fusion. Many writes batch into one walk.
- Backends. Tape-zero-copy, structural index, columnar — selected by the planner.
Pitfalls when porting
- `.[]` doesn't exist. Replace with `[*]` or just chain methods (most jetro methods auto-iterate over arrays).
- Pipe is not composition. `.x | .y` in jq means "x then y". In jetro it means "evaluate `.y` with `@` = `.x`". For chaining methods, use `.`: `.x.y()`.
- Method calls need parens. `length` is `.len()`, not `.len`.
- `select(p)` becomes `filter(p)`, and works on whole arrays — no need to first iterate with `.[]`.
- `group_by` returns an object, not an array of arrays. Use `.entries()` for jq-shaped output.
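Putting the pitfalls together, a typical jq one-liner ports like this (using the translations above):

```
# jq
.users[] | select(.active) | .name

# jetro
$.users.filter(@.active).map(@.name)
```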
Quick reference card
| Need | jq | jetro |
|---|---|---|
| Project | `{a, b}` | `.pick(a, b)` |
| Drop key | `del(.k)` | `.omit(k)` |
| Filter | `select(p)` | `.filter(p)` |
| Map | `map(f)` | `.map(f)` |
| Iterate | `.[]` | `[*]` or implicit |
| Length | `length` | `.len()` |
| Sort | `sort_by(.k)` | `.sort(@.k)` |
| Unique | `unique` | `.unique()` |
| First | `.[0]` | `.first()` |
| Last | `.[-1]` | `.last()` |
| String concat | `"\(.x)"` | `f"{$.x}"` |
| Default | `// d` | `?? d` |
| If | `if c then a else b end` | `a if c else b` |
| Var | `. as $x` | `let x = ...` |
| Set | `.x = v` | `.x.set(v)` |
| Update | `.x \|= f` | `.x.modify(f)` |
| Delete | `del(.x)` | `.x.delete()` |
Performance Guide
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "rows": [{"age": "30", "price": "3.14"}]}
How to write jetro queries that the planner can run fast, and how to read the benchmarks.
Mental model
Jetro picks one of six backends per pipeline node. Fast paths share three properties:
- The source is a path of pure field accesses.
$.a.b.ctriggers tape backends (zero-copy over simd-json output). - The pipeline ends in a sink that bounds demand.
.first(),.take(n),.find(p),.count()propagate backward and gate source reads. - No mid-pipeline materialization.
.collect(),.sort(),.group_by()flush the tape access pattern back to aValwalk.
If you write to those three rules, queries land on the fast path automatically.
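For example, this query satisfies all three rules — a sketch against the fixture above:

```
$.users                  # rule 1: pure field-access source
  .filter(@.active)      # streaming filter — no materialization
  .map(@.email)          # rule 3: no mid-pipeline barrier
  .first()               # rule 2: demand-bounding sink
```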
Backend selection (cheat-sheet)
| Source / shape | Primary backend |
|---|---|
| `$.a.b.c` (field-chain) | tape-view (zero-copy) |
| `$..find(...)`, `$..shape({...})` | bitmap structural index |
| Single `$.a.b` (path only) | tape-path |
| Generic expr / lambda body | fast-children |
| Any backend declines | interpreted (universal fallback) |
You don't pick — the planner does. Knowing the table tells you why a query is fast.
Demand: the killer feature
Every demand-aware sink lets the source skip work. Concrete impact:
| Pattern | Speedup vs. naive |
|---|---|
| `xs.first()` | ~N× (reads 1 element) |
| `xs.find(p)` | up to ~N× (stops at first match) |
| `xs.filter(p).take(k)` | up to N/k× |
| `xs.count()` | 2–5× (no payload decoded) |
| `xs.sum()`, `xs.avg()` | 2–3× (only numeric leaves) |
| `xs.last()` (random-access source) | ~N× (seek to end) |
| `xs.reverse().take(k)` | rewritten to `LastInput(k)` |
For wide objects, field projection is the other big win:
$.users.map(u => u.pick(id, name))
The source decodes only id and name per row. Other fields stay as raw
tape tokens.
What kills performance
Mid-chain materialization
$.users
.filter(@.active)
.collect() # unnecessary
.map(@.email)
The .collect() forces a full pass before .map. Drop it.
Pre-sort barriers blocking demand
$.events.sort(@.ts).first()
.sort is a barrier — must see every element. The .first() doesn't help.
Rewrite with min_by:
$.events.min_by(@.ts)
One pass, no allocation of the sorted array.
Per-element joins (O(n×m))
$.orders.map(o => o.merge({name: $.users.find(@.id == o.user_id).name}))
Each find rescans $.users. For large data, build a lookup once:
let by_id = $.users.index_by(@.id) in
$.orders.map(o => o.merge({name: by_id[o.user_id].name}))
Or use equi_join.
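A sketch of the `equi_join` form against this chapter's fixture (`o` and `u` are binding names chosen here; the argument order follows the two-arg lambda example later in this book):

```
$.orders.equi_join($.users, "customer_id", "id",
                   (o, u) => {order: o.id, buyer: u.name})
```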
Repeated sub-expressions
$.user.profile.name + " <" + $.user.profile.email + ">"
Three tape walks. Bind once:
let p = $.user.profile in
f"{p.name} <{p.email}>"
Heavy lambdas in barriers
$.rows.unique_by(@.to_string())
unique_by calls the lambda once per row. If the projection is
non-trivial (regex, deep traversal), pre-project once:
$.rows.map(r => r.merge({_k: r.to_string()}))
.unique_by(@._k)
.map(@.omit(_k))
Engine tuning
Plan cache
JetroEngine caches (query, context) → compiled pipeline. Default 256
entries, wholesale eviction.
For a small fixed query set with high doc volume — the typical web-server shape — every call after the first is a cache hit. Don't fight it.
For unique-per-call queries (CLI ad-hoc), the cache is a no-op; just use
Jetro directly.
Path cache
The VM caches resolved pointer paths per document. The hash key includes both structure and primitive values, bounded at depth 8 — so two docs with the same shape but different leaves stay distinct. You don't manage this cache.
simd-json (default)
The `simd-json` feature gives roughly 4× faster cold-start parsing. Disable it only if you need to round-trip `serde_json::Value` and the conversion cost dominates.
Benchmarks
cargo bench -p jetro-core
The harness covers:
- Field access (`$.a.b.c`) — tape-view zero-copy
- Filter / map / take pipelines — demand propagation
- Deep search (`..find`, `..shape`) — bitmap structural index
- Pattern match — Maranget tree
- Lambda forms — `@` vs `=>` vs `lambda` parity
- Write fusion — single vs. fused multi-writes
To compare your changes against main:
git checkout main
cargo bench -p jetro-core -- --save-baseline main
git checkout your-branch
cargo bench -p jetro-core -- --baseline main
Reading the output: criterion reports geometric mean ratios. >5% regression should have a clear cause.
Profiling
For Rust workloads:
cargo bench -p jetro-core --bench <name> -- --profile-time 10
Then attach with samply or cargo flamegraph. Hot paths usually live in:
- `exec/pipeline/exec.rs` — pipeline driver
- `exec/view/*.rs` — borrowed view stages
- `exec/router.rs` — backend selection
- `vm/exec.rs` — bytecode VM (interpreted fallback)
If the interpreter (vm::execute) shows up hot, the planner is falling
through to the universal fallback. Check the query — usually a non-$
source or a generic expr inside a method arg.
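Two sketches of the usual culprits, per the note above — the first likely lands in the fallback (no `$`-rooted tape to walk), the second stays on the tape fast path:

```
# Likely interpreted: array-literal source, not a $-rooted path
[1, 2, 3].map(@ * 2)

# Tape fast path: $-rooted field chain
$.users.map(@.score)
```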
Quick checklist
Before benchmarking a query, ask:
- Can `.first()` / `.take()` / `.find()` replace a full materialization?
- Is there a barrier (`sort`, `unique`, `group_by`) before the bound? Push the bound earlier or use a one-pass equivalent (`min_by`, `count_by`).
- Does a lookup repeat per row? Pre-build with `index_by`.
- Are wide rows projected early with `pick`?
- Are sub-expressions duplicated? Bind with `let`.
- Is `simd-json` enabled (default)?
- Is the same query run many times? Use `JetroEngine`.
If all yes, the query is on the fast path.
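A query that checks every box — a sketch against the performance-guide fixture:

```
let by_user = $.users.index_by(@.id) in   # lookup built once, not per row
$.orders
  .filter(@.status == "paid")             # streaming filter
  .map(o => {id: o.id, buyer: by_user[o.customer_id].name})
  .take(2)                                # demand-bounding sink
```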
Public API and Engine
The full public surface of the jetro crate is two types and a handful of
methods. Everything else is implementation detail.
Jetro — single-document handle
For one document, possibly many queries:
use jetro::Jetro;
let bytes = br#"{"x":[1,2,3]}"#;
let j = Jetro::from_bytes(bytes)?; // lazy parse via simd-json tape
let v: serde_json::Value = j.collect("$.x.sum()")?;
assert_eq!(v, serde_json::json!(6));
Constructors
| Method | Input | Notes |
|---|---|---|
| `Jetro::from_bytes(&[u8])` | Raw JSON bytes | Lazy parse — fastest path |
| `Jetro::from_value(serde_json::Value)` | Parsed value | Skip simd-json |
| `Jetro::from_val(Val)` | Internal `Val` | Advanced — re-using engine state |
Methods
| Method | Returns |
|---|---|
| `j.collect(query)` | `Result<serde_json::Value, EvalError>` |
| `j.collect_typed::<T>(query)` | `Result<T, EvalError>` (deserialize directly) |
Jetro uses a thread-local VM with a path cache. Cheap to construct;
prefer to drop it when you move to a new document so the cache key stays
valid.
JetroEngine — long-lived multi-doc handle
For many documents and many queries with overlap, share the plan/VM caches:
use jetro::JetroEngine;
let eng = JetroEngine::default();
for doc_bytes in inputs {
let v = eng.collect_bytes(doc_bytes, "$.users.filter(@.active).count()")?;
println!("{}", v);
}
Methods
| Method | Input | Notes |
|---|---|---|
| `eng.collect(&doc, q)` | `&Val` | Document already in `Val` form |
| `eng.collect_value(serde_value, q)` | `serde_json::Value` | Round-trips |
| `eng.collect_bytes(&[u8], q)` | Raw bytes | Lazy parse |
Returns Result<serde_json::Value, JetroEngineError> — a wider error type
that may also wrap JSON-parse errors.
Configuration
| Option | Default | Effect |
|---|---|---|
| Plan-cache capacity | 256 | Wholesale-evicted when full |
The engine's plan cache amortises parse + lower + compile across calls. Hits are O(hash); misses do full work.
Errors
pub enum EvalError {
/* … */
}
pub enum JetroEngineError {
Json(serde_json::Error),
Eval(EvalError),
}
Error messages include the query position when available.
Feature flags
| Feature | Default | What it does |
|---|---|---|
| `simd-json` | on | Direct bytes → `Val` parse, skipping `serde_json::Value` |
| `fuzz_internal` | off | Re-exports parser + planner for fuzz harness — not stable |
To disable simd-json:
[dependencies]
jetro = { version = "0.5", default-features = false }
Python binding
jetro_py exposes a collect(doc, query) function. Internals are identical
to the Rust crate.
import jetro
result = jetro.collect({"x": [1,2,3]}, "$.x.sum()")
# result == 6
CLI
jetrocli '$.x.sum()' < input.json
The CLI is a thin wrapper around Jetro::from_bytes.
Threading
- `Jetro` is `Send + Sync` for read-only queries — multiple threads can share a `Jetro` and run different queries concurrently.
- `JetroEngine` is `Send + Sync` and intended for shared-engine workloads.
- The VM path-cache is thread-local; cross-thread access goes through separate caches.
Stability
- The query DSL is stable as of jetro 0.5.x.
- The Rust API surface (`Jetro`, `JetroEngine`, error types) is stable.
- `BuiltinMethod`, opcodes, and IR types are internal and may change in any minor release.
- The `fuzz_internal` feature is explicitly unstable.
Known Limitations and Behavior Surprises (v0.5)
Empirically validated against jetro 0.5.5. This page is the canonical fix-list — every entry is a known gap between intended and actual behavior. Use it as a backlog: items here should drop as the runtime catches up.
v0.5.5 — fixed in this release
The 14 audit-surfaced bugs were addressed, plus three follow-up sweeps:
- ✅ `[*]` wildcard parses (mid-chain expands to `.map(@ + rest)`).
- ✅ `[a:b:c]` and `[::n]` (incl. `[::-1]` reverse) — Python-style step slicing.
- ✅ Lambda array-pattern destructure `([k, v]) => body` and rest form `([h, ...tail]) => body`.
- ✅ Object patterns in `match` accept reserved words as keys (`{kind: "click"}`).
- ✅ Object pattern shorthand `{id, name}` ≡ `{id: id, name: name}` in `match`.
- ✅ `Val::StrSlice + Val::Str` → string concat. Path-rooted concat works.
- ✅ `entries()` / `keys()` / `values()` no longer triple-wrap their array result.
- ✅ `parse_int(radix)` — base-aware integer parsing with prefix stripping.
- ✅ `to_csv(headers)` / `to_tsv(headers)` — explicit header column ordering.
- ✅ `accumulate(init, fn)` and `accumulate(fn)` — both forms.
- ✅ `partition(pred)` — chained and standalone.
- ✅ `approx_count_distinct()` — HyperLogLog.
- ✅ `missing("k1", "k2", ...)` — returns missing-keys array.
- ✅ `get_path("a/b/c")` and `get_path("a.b.c")` — multi-segment paths.
- ✅ `dedent()` — common-prefix removal.
- ✅ `remove(pred)` — predicate evaluated.
- ✅ `enumerate()` — survives composition with `map`/`filter`.
- ✅ `pairwise()` — works on path sources.
- ✅ `.has(v)` returns boolean.
- ✅ `rec(fn)` fixpoint via deep structural equality.
- ✅ `rec(fn, cond)` — iterate while `cond(@)` holds, capped at 10 000 iters.
- ✅ `update(path, fn)` and functional `.update({...})` — see Path Mutation.
- ✅ Filtered wildcard `[* if pred]`.
- ✅ Wildcard chain modify `$.xs[*].field.modify(@)`.
- ✅ Object literal as method receiver `{a: 1}.keys()` and `({a: 1}).keys()`.
- ✅ Regex escape: `"\d"` and `"\\d"` both parse as digit class.
- ✅ Path-call scalar unwrap: `$.s.upper()` → `"HELLO"` (was `["HELLO"]`). `ScalarOneToOne` builtins on path receivers dispatch directly via `apply_one`; opt out per-builtin with `BuiltinSpec::never_unwrap()`.
- ✅ `to_json` on array path: `$.users.to_json()` → single JSON document (was per-element JSON strings).
- ✅ `zip_shape({a, b})` object-shape arg form.
- ✅ `group_shape(key)` 1-arg key projection (lambda or bare ident).
- ✅ `indent("> ")` accepts a string prefix in addition to integer count.
- ✅ Bare-path `.field` inside method args (`$.users.filter(.active)` ≡ `(@.active)`).
- ✅ Double-quoted string escape `"{\"a\":1}".from_json()` parses.
Items below are still outstanding.
Organized into:
- Open engine items
- Design choices — intentional, won't change
1. Open engine items
1.1 rec() no-arg
rec requires a step expression — there is no defined no-arg semantic.
The closest match is walk(fn) for traversal-style transforms or
rec(fn) for fixpoint iteration. May be retired or aliased to a default
walker in a later release.
1.2 rec(fn) runaway iteration cap
Calls to rec(fn) where fn is non-idempotent and never reaches a
deep-structural fixed point are bounded at 10 000 iterations and then
error. The new error message names the cap and recommends rec(fn, cond)
for explicit bounding. No guard short of analytic decidability prevents
the worst case; document the cap and surface it loudly.
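With a hypothetical numeric field `n`, the two forms compare as follows (a sketch; `n` is not in any fixture here):

```
# Non-idempotent step, no structural fixed point:
# stops at the 10 000-iteration cap, then errors.
$.n.rec(@ * 2)

# Explicitly bounded: iterate while the condition holds.
$.n.rec(@ * 2, @ < 1000)
```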
2. Design choices
2.1 No in operator
in would be ambiguous with let X = Y in Z and for x in xs. Use the
postfix has operator or .includes(v) method:
xs has "x" # ✓ operator
xs.includes("x") # ✓ method
"x" in xs # ✗ parse error (intentional)
2.2 replace is single-occurrence
.replace(needle, with) replaces only the first match — JavaScript-style.
Use .replace_all for substitute-every behaviour:
"hello hello".replace("hello", "hi") # → "hi hello"
"hello hello".replace_all("hello", "hi") # → "hi hi"
2.3 Comments
There are no comments inside a query. Strip client-side.
2.4 [expr] vs {expr}
Inline filter is {predicate}. [expr] is index/slice.
$.xs{@.active} # ✓ inline filter
$.xs[@.active] # ✗ index expression
3. Argument / receiver shape rules
3.1 Methods accepting lambda forms
| Method | Working forms |
|---|---|
| `filter`, `find`, `find_all`, `find_first`, `find_one`, `find_index`, `indices_where`, `any`, `all`, `take_while`, `drop_while`, `remove` | `(@.x op v)`, `(.x op v)`, `(b => b.x op v)`, `(lambda b: ...)` |
| `map`, `flat_map`, `transform_keys`, `transform_values`, `filter_keys`, `filter_values` | Same |
| `sort`, `unique_by`, `group_by`, `count_by`, `index_by`, `max_by`, `min_by` | Same; `(b => b.x)` named lambda preferred for readability |
$.books.sort(b => b.year) # named lambda
$.books.sort(@.year) # @-form
$.books.sort(.year) # bare-path sugar (≡ @-form)
3.2 Methods that take bare identifiers (no @)
| Method | Form |
|---|---|
| `pick(field, alias: src, ...)` | Bare identifiers. Not `@.field`. |
| `omit(field, ...)` | Same |
| `rename({old: new, ...})` | Object map |
| `missing("k1", "k2", ...)` | String literals |
$.user.pick(id, name) # ✓
$.user.pick(@.id, @.name) # ✗ parse error
$.user.pick(uid: id) # ✓ alias
3.3 Multi-arg lambdas
Two-arg lambdas use parens:
$.orders.equi_join($.customers, "cid", "id", (o, c) => {buyer: c.name})
$.xs.accumulate(0, (a, b) => a + b)
Single-arg array destructure (with optional rest) is supported:
$.entries.map(([k, v]) => {k, v}) # ✓
$.rows.map(([h, ...tail]) => tail) # ✓ rest binding
Versions
This page reflects v0.5.5 behavior empirically tested. As the engine catches up, entries here drop.
Open count: 2 engine items + 4 design choices documented.
Glossary
Backend. One of the execution paths the planner can route a node
through: Structural, TapeView, TapeRows, TapePath, ValView,
MaterializedSource, FastChildren, Interpreted. Selected automatically
based on shape and capabilities.
Barrier. A stage that must see all input before emitting output. sort,
unique, group_by, window, etc.
Bitmap structural index. A bit-packed index over the simd-json tape that
lets ..find, ..shape, ..like, and ..match skip non-matching subtrees
in O(1) per node. Used when the document is loaded with the simd-json tape
(default).
Borrowed view. A ValueView — a read-only borrowed reference into a
parsed document. Zero-copy substrings via Val::StrSlice.
Builtin. One of the 181 methods in jetro's catalog. Each is one
impl Builtin for X block in defs.rs with identity, demand law, and
runtime layers co-located.
Chain-write. A query ending in a write terminal (.set, .modify,
.delete, .unset, .merge, .deep_merge, .append, .prepend) on a
rooted path. Rewritten to Expr::Patch by the parser.
Composed stage. A Composed<A, B> pair that fuses two adjacent stages
into one virtual call per element.
Demand. The triple (pull, value, order) describing what an operator
needs from its source. See Demand Propagation.
Demand law. The rule by which a builtin transforms downstream demand
into upstream demand. Encoded in the builtin's BuiltinDemandLaw.
Effect lifting. The patch-fusion pass that batches multiple chain-writes into a single document walk.
Engine. A JetroEngine — a long-lived handle that caches parsed and
compiled queries for reuse across documents.
F-string. f"text {expr}" — string with embedded expression
interpolation.
Field chain. A path of pure field accesses, e.g. $.a.b.c. Recognised
by the planner and routed to fast tape backends.
Jetro. Single-document handle. Jetro::from_bytes(bytes)?.collect(q).
JetroEngine. Multi-document handle with plan/VM caches.
Lambda. A small function value: @, r => body, lambda r: body. All
three forms compile identically.
Maranget tree. The decision-tree compilation strategy used for pattern matching. Cross-arm sharing of common discriminant tests.
Patch. The internal write operation. Generated by both patch $ { … }
blocks and chain-write classification.
Patch fusion. The optimizer pass that batches multiple writes into a single walk.
Pipeline. The streaming execution model: Source → Stage* → Sink. One
element at a time.
Plan / Logical Plan. Tree-shaped IR between AST and bytecode. Lives in
ir/logical.rs.
Plan cache. A cache in JetroEngine that maps (query, context) to a
compiled Pipeline. Default capacity 256.
Pull demand. The first lane of Demand: how many inputs must be read.
Variants: All, FirstInput(n), LastInput(n), NthInput(i),
UntilOutput(n).
Quantifier. A postfix operator on a path step. ? = optional,
! = exactly-one.
Sink. The terminal stage of a pipeline. Reducers, positional, and implicit collectors.
Source. The first stage of a pipeline. Usually a path or array literal.
Streaming. Per-element execution; no buffering.
Tape. The simd-json output: a flat array of tokens describing structural positions in the JSON byte buffer. Used for zero-copy access.
Val. The internal value type. Arc-wrapped compound nodes ensure cheap
clones.
Value need. The second lane of Demand: how much of each row's content
is required. Variants: None, Predicate, Projection, Numeric,
Whole.
View. A ValueView — borrowed read-only access to a value.
VM. The bytecode executor. Used as the universal fallback backend; also provides the path-cache.
Write fusion. Same as patch fusion. See above.