Introduction
Jetro is a JSON processor written in Rust that provides querying, transformation, and patching. It parses a small dot-syntax DSL, plans the query through a multi-tier optimizer, and routes each subtree to whichever execution backend is expected to run it fastest: zero-copy borrowed views over a simd-json tape, a bitmap structural index, a streaming pull pipeline, or the universal interpreted fallback.
Jetro's shape is deliberately different from a small jq clone. Method chains
compose with lambdas, pattern matching, f-strings, reducers, and document
updates inside one expression language. It also has distinctive features such
as demand propagation, more often associated with lazy languages such as
Haskell, so sinks like first, last, and take(n) can change how much
upstream work is performed. For mutation-heavy workflows,
update can batch compatible path rewrites into one document patch instead of
forcing callers to round-trip through host-language object editing.
jetrocli -e '$.services.filter(@.enabled).map({name: @.name, p95: @.latency_ms})' < services.json
[
{"name":"api","p95":42},
{"name":"worker","p95":85}
]
If you have used jq, Jetro will feel familiar but takes a different shape: it
is method-chain oriented, closer to the collection APIs most application
developers already use.
$.services
.filter(@.enabled)
.sort_by(-latency_ms)
.take(1)
.map({service: name, alert: errors > 5})
That query reads like the code you would otherwise write by hand: keep enabled services, sort by latency, keep the slowest one, return only the fields the next system needs.
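For comparison, here is roughly what the hand-written version could look like in Python. This is only a sketch: the sample records are an assumption modeled on the service inventory used in this book, and the function name slowest_enabled is hypothetical.

```python
# Hand-written equivalent of the query above, step for step.
# The records below are illustrative sample data, not Jetro output.
services = [
    {"name": "api", "latency_ms": 42, "enabled": True, "errors": 2},
    {"name": "worker", "latency_ms": 85, "enabled": True, "errors": 9},
    {"name": "admin", "latency_ms": 130, "enabled": False, "errors": 0},
]


def slowest_enabled(services):
    enabled = [s for s in services if s["enabled"]]            # filter(@.enabled)
    ranked = sorted(enabled, key=lambda s: -s["latency_ms"])   # sort_by(-latency_ms)
    top = ranked[:1]                                           # take(1)
    # map({service: name, alert: errors > 5})
    return [{"service": s["name"], "alert": s["errors"] > 5} for s in top]


result = slowest_enabled(services)
print(result)  # [{'service': 'worker', 'alert': True}]
```

Each line of the function corresponds to one step of the method chain, which is the point of the comparison: the query carries the same intent without the surrounding scaffolding.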
Why Developers Reach For Jetro
Use Jetro when the shape of the data matters more than the ceremony around it:
- Inspect production JSON without writing a script. Pull out the one field, row, group, or summary you need from a real payload.
- Embed dynamic transformations. Let users, pipelines, or config files define data-shaping rules without recompiling your service.
- Normalize API and event payloads. Filter, project, rename, aggregate, and label JSON before it crosses a boundary.
- Patch documents deliberately. Use update expressions for migrations, fixture generation, and config rewrites.
- Process NDJSON files from the terminal. Run row-local expressions over logs and event streams with jetrocli --ndjson.
What Makes Jetro Different
Jetro is small at the surface, but it is not a toy interpreter.
- The syntax is expression-first. Objects, arrays, lambdas, filters, reducers, string formatting, pattern matching, and updates compose inside one expression language.
- The planner tries to do less work. Queries like first, take, and bounded projections can tell earlier stages how much data is actually needed.
- Writes are part of the language. Updating JSON is not bolted on as a separate API; document rewrites are planned alongside reads.
- There is a real Rust API. Jetro is the byte-oriented document handle. JetroEngine is the long-lived engine for reusable plans, VM state, and streaming workflows.
A Small Taste
Here is a document shaped like the kind of service inventory developers often meet in scripts, dashboards, and deploy tooling:
{
"services": [
{"name":"api","lang":"rust","latency_ms":42,"owner":"platform","enabled":true,"errors":2},
{"name":"worker","lang":"go","latency_ms":85,"owner":"data","enabled":true,"errors":9},
{"name":"admin","lang":"ts","latency_ms":130,"owner":"platform","enabled":false,"errors":0}
],
"deploys": [
{"service":"api","sha":"a1","status":"ok"},
{"service":"worker","sha":"b2","status":"fail"}
],
"meta": {"env":"prod","version":7}
}
Project the active services:
$.services.filter(@.enabled).map({name: @.name, p95: @.latency_ms, owner: @.owner})
Count ownership:
$.services.count_by(@.owner)
Turn deploy states into operator messages:
$.deploys.map(d => match d with {
{status:"fail",service:s} -> f"rollback {s}",
{status:"ok",service:s} -> f"ship {s}",
_ -> "inspect"
})
Patch the document:
$.update({"meta.version": @ + 1, "services[*].checked": true})
The rest of this book teaches the language from that practical angle: how to read JSON, reshape it, aggregate it, update it, and embed the same behavior in Rust.
Example Conventions
Examples use this layout:
DOC: {"services": [{"name": "api", "enabled": true}, {"name": "admin", "enabled": false}]}
QUERY: $.services.filter(@.enabled).map(@.name)
OUT: ["api"]
Where the input document matters, examples include DOC:. Where the source is
already clear from the section, examples usually show only QUERY: and OUT:.
Method aliases are listed inline, for example unique (alias distinct).
Start with the Quick Tour, then use the Builtin Reference when you need exact method behavior.
Installation
Jetro ships as three artifacts:
| Artifact | What it is | Audience |
|---|---|---|
| jetro (crate) | Rust library: query/transform JSON in-process | Rust developers |
| jetro-py | Python bindings (PyPI) | Python users |
| jetrocli | Standalone CLI jetrocli for shell use | Anyone with JSON in a terminal |
Rust library
Add to Cargo.toml:
[dependencies]
jetro = "0.5.11"
The simd-json feature is on by default and gives a ~4× cold-start win by
parsing bytes directly into Val (no serde_json::Value intermediate). To
fall back to the legacy serde-only path:
[dependencies]
jetro = { version = "0.5.11", default-features = false }
Quick sanity check:
use jetro::Jetro;
fn main() -> anyhow::Result<()> {
let bytes = br#"{"books":[{"title":"Dune","year":1965}]}"#;
let j = Jetro::from_bytes(bytes)?;
let titles: serde_json::Value = j.collect("$.books.map(@.title)")?;
println!("{}", titles); // ["Dune"]
Ok(())
}
Long-lived engine
If you process many documents with overlapping queries, keep a JetroEngine
around. It holds shared plan and VM caches:
use jetro::JetroEngine;
let eng = JetroEngine::default();
for doc in docs {
let v = eng.collect(&doc, "$.users.filter(active).count()")?;
println!("{}", v);
}
Plan-cache default capacity is 256 entries; it evicts wholesale when full.
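To make "evicts wholesale" concrete, the sketch below models a cache that drops every entry at once when it is full, rather than evicting one entry at a time as an LRU would. It is a toy model in Python, not Jetro's implementation: the class name PlanCache, the compiles counter, and the exact moment eviction triggers are all assumptions made for illustration.

```python
class PlanCache:
    """Toy cache that clears itself entirely once it reaches capacity."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.plans = {}
        self.compiles = 0  # how many times a query had to be planned

    def get(self, query):
        plan = self.plans.get(query)
        if plan is None:
            if len(self.plans) >= self.capacity:
                self.plans.clear()  # wholesale eviction: drop every entry at once
            plan = ("plan", query)  # stand-in for a compiled plan
            self.plans[query] = plan
            self.compiles += 1
        return plan


cache = PlanCache(capacity=2)
cache.get("$.a")
cache.get("$.b")
cache.get("$.a")       # hit: still cached
cache.get("$.c")       # cache is full, so everything is dropped first
cache.get("$.a")       # miss again: "$.a" was evicted with the rest
print(cache.compiles)  # 4
```

The practical consequence is that a workload cycling through more distinct queries than the capacity will periodically re-plan all of them, so it helps to keep the set of hot queries below the limit.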
Python bindings
pip install jetro-py
import jetro
doc = {"books": [{"title": "Dune", "year": 1965}]}
print(jetro.collect(doc, "$.books.map(@.title)")) # ['Dune']
The Python wheel embeds the same Rust core, so query syntax is identical.
CLI (jetrocli)
Install via Homebrew:
brew install mitghi/jetrocli/jetrocli
Or build from source:
git clone https://github.com/mitghi/jetrocli
cd jetrocli && cargo install --path .
Use it like jq:
echo '{"x":[1,2,3]}' | jetrocli -e '$.x.sum()'
# 6
cat data.json | jetrocli -e '$.users.filter(@.active).map(@.email)'
For file-backed NDJSON, add --ndjson, -i, and -e:
jetrocli --ndjson -i events.ndjson -e '$.id'
jetrocli --ndjson -i events.ndjson \
-e '$.rows().reverse().distinct_by($.id).take(100)'
Building from source
git clone https://github.com/mitghi/jetro
cd jetro
cargo build --release # build everything
cargo test # full suite
cargo bench -p jetro-core # micro-benchmarks
Workspace layout:
jetro/ facade crate (re-exports + public API)
jetro-core/ engine: parser, planner, executor, builtins, runtime
jetro-core/fuzz/ cargo-fuzz harness (feature-gated)
Verifying your install
Run the tour from the next chapter against your install. If every query produces the printed output, you're ready.
A Practical Tour
This tour teaches Jetro the way you will probably use it: grab a real JSON
payload, ask a precise question, reshape the answer, and move on. Every query in
this chapter was checked with the release build of jetrocli 0.2.9.
Run a query against a JSON file:
jetrocli -e '$.services.filter(@.enabled).count()' < services.json
Run a row-local query against NDJSON:
jetrocli --ndjson -i events.ndjson -e '$.service + ":" + $.level'
The Working Document
Save this as services.json:
{
"services": [
{"name":"api","lang":"rust","latency_ms":42,"owner":"platform","enabled":true,"errors":2,"tags":["edge","json"]},
{"name":"worker","lang":"go","latency_ms":85,"owner":"data","enabled":true,"errors":9,"tags":["queue"]},
{"name":"admin","lang":"ts","latency_ms":130,"owner":"platform","enabled":false,"errors":0,"tags":["internal"]}
],
"deploys": [
{"service":"api","sha":"a1","status":"ok"},
{"service":"worker","sha":"b2","status":"fail"}
],
"meta": {"env":"prod","version":7}
}
1. Start With Paths
Use $ for the root document, then walk fields and indexes.
QUERY: $.services[0].name
OUT: "api"
Wildcards collect the same field from many array items:
QUERY: $.services[*].name
OUT: ["api","worker","admin"]
2. Filter Like You Would In Code
Inside filter, map, and similar methods, @ is the current item.
QUERY: $.services.filter(@.enabled).count()
OUT: 2
That is the basic Jetro shape: start from a path, chain operations, return the value you actually need.
3. Return A Useful Shape
Projection objects let you rename fields, drop noise, and compute small derived values in one pass.
QUERY:
$.services
.filter(@.enabled)
.map({name: @.name, p95: @.latency_ms, owner: @.owner})
OUT:
[
{"name":"api","owner":"platform","p95":42},
{"name":"worker","owner":"data","p95":85}
]
This is where Jetro starts paying rent in developer workflows: the output is already shaped for the next command, dashboard, test assertion, or API boundary.
4. Sort, Bound, Then Project
Use sort_by, take, and map for top-N questions.
QUERY:
$.services
.filter(@.enabled)
.sort_by(-latency_ms)
.take(1)
.map({service: name, alert: errors > 5})
OUT:
[
{"alert":true,"service":"worker"}
]
The minus sign sorts descending by latency. take(1) makes the intended demand
explicit: you only want the worst enabled service.
5. Aggregate When A List Is Too Much
Reducers consume a sequence and return a single value.
QUERY: $.services.map(@.latency_ms).avg()
OUT: 85.66666666666667
Group-style reducers return summaries that are easy to scan:
QUERY: $.services.count_by(@.owner)
OUT: {"data":1,"platform":2}
6. Build Operator-Friendly Strings
F-strings are useful for logs, labels, report fields, and shell output.
QUERY:
$.services
.filter(@.errors > 0)
.map(f"{@.name}: {@.errors} errors")
OUT:
["api: 2 errors","worker: 9 errors"]
7. Classify Data With Pattern Matching
Pattern matching is a good fit for status payloads, event kinds, and tagged objects.
QUERY:
$.deploys.map(d => match d with {
{status:"fail",service:s} -> f"rollback {s}",
{status:"ok",service:s} -> f"ship {s}",
_ -> "inspect"
})
OUT:
["ship api","rollback worker"]
Arms are checked top-down. Put specific cases before the fallback arm.
8. Search Deeply When The Path Is Not Stable
When you know the condition but not the exact location, use recursive descent.
QUERY: $..find(@.status == "fail")
OUT:
[
{"service":"worker","sha":"b2","status":"fail"}
]
For known schemas, prefer direct paths. For exploratory work over unfamiliar payloads, deep search is often the fastest way to ask the first question.
9. Patch Documents
update returns the full document with the selected changes applied.
QUERY: $.update({"meta.version": @ + 1, "services[*].checked": true})
OUT:
{
"deploys":[
{"service":"api","sha":"a1","status":"ok"},
{"service":"worker","sha":"b2","status":"fail"}
],
"meta":{"env":"prod","version":8},
"services":[
{"checked":true,"enabled":true,"errors":2,"lang":"rust","latency_ms":42,"name":"api","owner":"platform","tags":["edge","json"]},
{"checked":true,"enabled":true,"errors":9,"lang":"go","latency_ms":85,"name":"worker","owner":"data","tags":["queue"]},
{"checked":true,"enabled":false,"errors":0,"lang":"ts","latency_ms":130,"name":"admin","owner":"platform","tags":["internal"]}
]
}
The object keys are paths to update. The expression on the right is evaluated
against the value at that path, so "meta.version": @ + 1 increments the
current version.
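The path-keyed update rule can be modeled in a few lines of Python. This is a sketch of the semantics described above, not of Jetro's planner: it supports only dotted fields and [*], and the helper names parse_path, apply_steps, and update are hypothetical. Each right-hand side is a function of the current value at the path, playing the role of @.

```python
import copy
import re


def parse_path(path):
    # "services[*].checked" -> ["services", "*", "checked"]
    return re.findall(r"[A-Za-z_]\w*|\*", path)


def apply_steps(node, steps, fn):
    head, rest = steps[0], steps[1:]
    if head == "*":
        # Wildcard: visit every array element.
        for i, item in enumerate(node):
            if rest:
                apply_steps(item, rest, fn)
            else:
                node[i] = fn(item)
    elif rest:
        apply_steps(node[head], rest, fn)
    else:
        # Last step: evaluate fn against the current value (None if absent).
        node[head] = fn(node.get(head))


def update(doc, changes):
    out = copy.deepcopy(doc)  # return a new document, leave the input intact
    for path, fn in changes.items():
        apply_steps(out, parse_path(path), fn)
    return out


doc = {
    "meta": {"env": "prod", "version": 7},
    "services": [{"name": "api"}, {"name": "worker"}],
}
patched = update(doc, {
    "meta.version": lambda cur: cur + 1,        # "meta.version": @ + 1
    "services[*].checked": lambda cur: True,    # "services[*].checked": true
})
print(patched["meta"]["version"])  # 8
```

Untouched subtrees such as meta.env come through unchanged, and the original document is not modified.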
10. Row-Local NDJSON
Save this as events.ndjson:
{"ts":"10:00","service":"api","level":"info","ms":38}
{"ts":"10:01","service":"worker","level":"error","ms":220}
{"ts":"10:02","service":"api","level":"error","ms":91}
Run:
jetrocli --ndjson -i events.ndjson -e '$.service + ":" + $.level'
Output:
"api:info"
"worker:error"
"api:error"
Without $.rows(), NDJSON mode evaluates the expression once per line.
11. Whole-Stream NDJSON
Use $.rows() when the expression should see the NDJSON file as one stream.
jetrocli --ndjson -i events.ndjson \
-e '$.rows().filter($.level == "error").map({service: $.service, ms: $.ms})'
Output:
{"service":"worker","ms":220}
{"service":"api","ms":91}
This is the mode for file-level filtering, slicing, grouping, latest-record queries, and compacted-topic inspection.
12. Latest Record Per Key
For Kafka-style records where the payload starts after |:
1|{"id":1,"name":"api old","active":false}
2|{"id":2,"name":"worker","active":true}
1|{"id":1,"name":"api","active":true}
Run:
jetrocli --ndjson -i topic.ndjson --payload-after '|' \
-e '$.rows().reverse().distinct_by($.id).filter($.active).map({id: $.id, name: $.name})'
Output:
{"id":1,"name":"api"}
{"id":2,"name":"worker"}
Read from the end, keep the first row for each id, then filter and project. That is a compacted-topic audit query in one expression.
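The same steps, written out by hand in Python, show what the single expression replaces. This is a sketch over the three sample records above; the function name latest_active is hypothetical and the code does not call Jetro.

```python
import json

lines = [
    '1|{"id":1,"name":"api old","active":false}',
    '2|{"id":2,"name":"worker","active":true}',
    '1|{"id":1,"name":"api","active":true}',
]


def latest_active(lines, sep="|"):
    seen = set()
    out = []
    for line in reversed(lines):                  # rows().reverse()
        payload = line.split(sep, 1)[1]           # --payload-after '|'
        row = json.loads(payload)
        if row["id"] in seen:                     # distinct_by($.id)
            continue
        seen.add(row["id"])
        if row["active"]:                         # filter($.active)
            out.append({"id": row["id"], "name": row["name"]})  # map({...})
    return out


latest = latest_active(lines)
print(latest)  # [{'id': 1, 'name': 'api'}, {'id': 2, 'name': 'worker'}]
```

Because the scan runs from the end, the first row seen for each id is the most recent one, which is why the stale "api old" record never reaches the output.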
A Few Power Moves
The tour above keeps to the common path. These examples are worth knowing once you start writing longer queries.
Lambda Forms
The shorthand @ form is usually enough, but named lambdas are useful when an
expression gets dense:
QUERY: $.services.filter(s => s.latency_ms > 80).map(s => s.name)
OUT: ["worker","admin"]
These forms are equivalent where a single current item is in scope:
$.services.filter(@.enabled)
$.services.filter(.enabled)
$.services.filter(lambda s: s.enabled)
Schema Checks
Use has_key for object-key existence, includes for value membership, and
missing for compact schema checks:
QUERY:
$.services.map(s => {
name: s.name,
has_json_tag: s.tags.includes("json"),
missing: s.missing("owner", "tags", "runtime")
})
OUT:
[
{"has_json_tag":true,"missing":["runtime"],"name":"api"},
{"has_json_tag":false,"missing":["runtime"],"name":"worker"},
{"has_json_tag":false,"missing":["runtime"],"name":"admin"}
]
Guards In Pattern Matching
Patterns can bind fields, and guards can refine the match:
QUERY:
$.services.map(s => match s with {
{enabled:false,name:n} -> f"disabled {n}",
{latency_ms:ms,name:n} when ms > 100 -> f"slow {n}",
{name:n} -> f"ok {n}"
})
OUT:
["ok api","ok worker","disabled admin"]
Pipe Value Flow
| passes the value on the left into the right expression as @. It is value
flow, not method dispatch:
QUERY: $.services.count() | "found " + (@ as string) + " services"
OUT: "found 3 services"
Conditional Updates
Filtered wildcards let updates target many items without writing a host loop:
QUERY:
$.services[* if errors > 5].update({
tags: tags.append("hot"),
checked: true
})
The result is still the full document with untouched subtrees preserved.
Demand-Aware Queries
These are ordinary queries:
$.services.map(@.name).last()
$.services.filter(@.enabled).first()
$.services.sort_by(-latency_ms).take(2)
Jetro plans from the demanded result backward. Pure one-to-one maps can be
delayed, first and take can bound input, and tape-backed sources can avoid
materializing values until a stage actually needs them.
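Python generators give a rough analogy for how a sink can bound upstream work. The sketch below is not how Jetro's planner is implemented; it only illustrates the idea that first and take(n) pull as many rows as they need and no more. The traced helper and the log lists are hypothetical instrumentation.

```python
from itertools import islice


def traced(items, log):
    for item in items:
        log.append(item["name"])  # record every row the source actually yields
        yield item


services = [
    {"name": "api", "enabled": True},
    {"name": "worker", "enabled": True},
    {"name": "admin", "enabled": False},
]

# .filter(@.enabled).first(): stops after the first matching row.
log_first = []
enabled = (s for s in traced(services, log_first) if s["enabled"])
first = next(enabled, None)
print(first["name"], log_first)  # api ['api']

# .map(@.name).take(2): the map runs only for the rows that are demanded.
log_take = []
names = list(islice((s["name"] for s in traced(services, log_take)), 2))
print(names, log_take)  # ['api', 'worker'] ['api', 'worker']
```

In both cases the third record is never read, because nothing downstream asked for it.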
Rust Embedding
Use the small facade for one document:
let j = jetro::Jetro::from_bytes(bytes)?;
let out = j.collect("$.services.filter(@.enabled).map(@.name)")?;
Use JetroEngine when you want a long-lived engine with plan and VM reuse:
use jetro::JetroEngine;
use serde_json::json;
let eng = JetroEngine::default();
let doc = json!({"services":[{"latency_ms":42},{"latency_ms":85}]});
let v = eng.collect_value(doc, "$.services.map(@.latency_ms).sum()")?;
assert_eq!(v, json!(127));
What To Read Next
You now have the core mental model: path, chain, project, reduce, patch, and stream.
Grammar Overview
The jetro DSL is a small, expression-oriented language. There are no statements at the top level — every program is an expression that produces a value (or, in the case of patches, a rewritten document).
The grammar lives in grammar.pest and is parsed by pest.
Five things that make jetro different
- Method calls use dot syntax. xs.map(f), not xs | map(f).
- Pipe | is value-flow. x | expr evaluates expr with @ bound to x.
- @ is the current value. Inside .filter(...) it's the row; at the top level it's the input.
- Bare paths inside method args. .filter(age > 18) is sugar for .filter(@.age > 18).
- Writes are queries. $.x.set(v) is parsed as a query that produces a patched document, not a mutation.
Categories of syntax
| Category | Forms | Chapter |
|---|---|---|
| Paths | $, @, .field, [idx], [*], [start:end:step], ..desc, {pred} | Paths |
| Operators | arithmetic, comparison, logical, pipe, coalesce, ternary, kind, cast | Operators |
| Methods | .name(args), lambdas (@, =>, lambda) | Lambdas |
| Literals | numbers, strings, f-strings, arrays, objects, regex | Literals |
| Control flow | match, ternary, try, comprehensions | Control Flow |
| Writes | patch $ {…}, chain-write terminals | Patch |
A handy precedence table sits at the end of this part.
A worked sample
$.users
.filter(u => u.active and u.age >= 18)
.map(u => { id: u.id, name: u.name, email: u.email })
.sort(@.name)
.take(10)
That's: root, field users, predicate filter (named lambda), object-mapping,
sort by name, take first 10.
Comments
There are no comments inside a query. Strip them client-side before calling jetro, or factor commentary into the surrounding host program.
Whitespace
Whitespace and newlines are insignificant between tokens. Keep queries on one line in CLIs; break across multiple lines in source.
Paths and Navigation
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5]}
A path is the part of a query that walks into the document. Paths start at
a root marker ($, @, or an identifier inside a lambda) and chain steps
left-to-right.
Roots
| Form | Meaning |
|---|---|
| $ | The whole input document (top-level root) |
| @ | The current value (set by .filter, .map, \|, etc.) |
| name | A let-bound name or lambda parameter |
DOC: {"x": 10}
QUERY: $
OUT: {"x":10}
QUERY: $.x | @ + 1
OUT: 11
Field access
DOC: {"user": {"name": "Ada"}}
QUERY: $.user.name
OUT: "Ada"
Field names may also use string keys via ["name"]:
QUERY: $["user"]["name"]
Use the bracket form when the key contains characters disallowed in identifiers
(-, spaces, dots inside the key, leading digits).
Indexing arrays
DOC: {"xs": [10, 20, 30, 40]}
QUERY: $.xs[0]
OUT: 10
QUERY: $.xs[-1]
OUT: 40
Negative indices count from the end.
Slicing
QUERY: $.xs[1:3]
OUT: [20,30]
QUERY: $.xs[:2]
OUT: [10,20]
QUERY: $.xs[2:]
OUT: [30,40]
QUERY: $.xs[0:4:2]
OUT: [10,30]
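The Summary table describes slices as half-open like Python, so Python's own list indexing is a convenient way to check your expectations for the cases shown above. The snippet uses plain Python lists only; it says nothing about Jetro's behavior for cases not listed in this chapter, such as out-of-range bounds.

```python
xs = [10, 20, 30, 40]

print(xs[0])      # 10           same as $.xs[0]
print(xs[-1])     # 40           same as $.xs[-1]
print(xs[1:3])    # [20, 30]     start included, end excluded
print(xs[:2])     # [10, 20]     omitted start means "from the beginning"
print(xs[2:])     # [30, 40]     omitted end means "to the end"
print(xs[0:4:2])  # [10, 30]     third component is the step
```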
Wildcards
QUERY: $.xs[*]
OUT: [10,20,30,40]
[*] is "every element". Most users prefer chained methods (.filter,
.map) which already iterate.
Filtered wildcard [* if pred]
A predicated wildcard — keeps only elements satisfying pred (with @
bound to the candidate).
DOC: {"books": [{"title": "Dune", "year": 1965}, {"title": "Hyperion", "year": 1989}]}
QUERY: $.books[* if year > 1980]
OUT: [{"title":"Hyperion","year":1989}]
Equivalent to [*] immediately followed by an inline-filter {cond},
but stays on the path side of parsing. Particularly useful inside
.update selectors and quoted patch path keys (see
Patch).
Chaining a bare field step after a filtered wildcard collapses to
null — chain a method instead:
QUERY: $.books[* if year > 1980].map(@.title)
OUT: ["Hyperion"]
Inline filter
{predicate} after a path step keeps only matching elements:
DOC: {"books": [{"year": 1965}, {"year": 1989}]}
QUERY: $.books{@.year > 1970}
OUT: [{"year":1989}]
This is shorthand for .filter(@.year > 1970). Use .filter when you want
named-lambda forms.
Descendant search
.. walks every descendant value in DFS pre-order:
DOC: {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
QUERY: $..x
OUT: [1,2,3]
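DFS pre-order means a matching value is reported before the walk descends further into it, and siblings are visited in document order. A minimal Python sketch of that traversal, assuming key order is preserved as written in the document (the function name descend is hypothetical):

```python
def descend(node, key):
    """Yield every value stored under `key`, visiting parents before children."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v                      # report the match first...
            yield from descend(v, key)       # ...then walk into the value
    elif isinstance(node, list):
        for item in node:
            yield from descend(item, key)


doc = {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
found = list(descend(doc, "x"))
print(found)  # [1, 2, 3]
```

The order of the result follows the order of the document, which is why the nested value under a.b comes before the two array elements under c.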
Combine with method calls (no space):
QUERY: $..find(@.year < 1960)
QUERY: $..shape({year, title})
QUERY: $..like({author: "Asimov"})
The deep variants are bitmap-accelerated when a structural index is available.
Dynamic keys
Compute a key at runtime:
DOC: {"realnames": {"abc": "Ada"}, "post": {"author": "abc"}}
QUERY: $.realnames[$.post.author]
OUT: "Ada"
Inside a lambda:
DOC: {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY: $.posts.map(p => $.realnames[p.author])
OUT: ["Ada"]
Quantifiers (postfix)
| Form | Meaning |
|---|---|
| step? | Optional: return null instead of error if missing |
| step! | Exactly-one: error if zero or many |
DOC: {"xs": [42]}
QUERY: $.xs!
OUT: [42]
QUERY: $.maybe?
OUT: null # absent, no error
Path after a method
Paths and methods are interchangeable steps:
$.users.filter(@.active).pick(name, email)[0]
That's: field, method, method, index. There is no special "tail position".
Paths inside method args need a root
Inside method-call arguments, paths must start with @ (current item),
$ (document root), or a bound name. Bare-path forms like .field do not
parse:
$.users.filter(@.age > 18) # ✓ @-form
$.users.filter(u => u.age > 18) # ✓ named lambda
$.users.filter(.age > 18) # ✗ parse error
$.users.map(@.name) # ✓
$.users.map(.name) # ✗
The same rule applies to inline filters: $.xs{@.k > 1} works,
$.xs{.k > 1} does not.
Top-level paths still need $.
Summary
| Step | Example | Notes |
|---|---|---|
| Root | $, @ | One per chain (or implicit @ in args) |
| Field | .name | Use ["..."] for tricky keys |
| Index | [3], [-1] | Negative counts from end |
| Slice | [1:5], [::2] | Half-open like Python |
| Wildcard | [*] | Whole array |
| Filtered wildcard | [* if pred] | Wildcard restricted by predicate (@ = element) |
| Descendant | ..name, .. | DFS pre-order |
| Inline filter | {cond} | Sugar for .filter |
| Dynamic key | [expr] | Expression resolves to key |
| Quantifier | ?, ! | Postfix on a step |
Operators
Jetro has the operators you'd expect plus a small number of extras that come up in JSON work.
Arithmetic
1 + 2 # 3
3 - 1 # 2
2 * 3 # 6
6 / 2 # 3
7 % 3 # 1
-x # unary negation
+ on strings concatenates: "foo" + "bar" → "foobar".
+ on arrays concatenates: [1,2] + [3] → [1,2,3].
Comparison
a == b # equality
a != b # inequality
a < b # less than
a <= b
a > b
a >= b
== and != compare values of the same type (strings to strings, numbers to
numbers, and so on). Comparing values of different types returns false for ==
and true for !=.
Logical
a and b # short-circuit AND
a or b # short-circuit OR
not a # negation
Truthiness: null, false, 0, "", [], {} are falsy. Everything else
is truthy.
Pipe
value | expr
Evaluates expr with @ bound to value. It is not a method-call
shorthand.
DOC: {"x": 10}
QUERY: $.x | @ * 2
OUT: 20
QUERY: $.x | f"got {@}"
OUT: "got 10"
To call a method, use dot syntax: $.x.upper(), not $.x | upper.
Coalesce
a ?? b
Return a unless it is null, in which case b.
DOC: {"name": null}
QUERY: $.name ?? "anon"
OUT: "anon"
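The rule "only null triggers the fallback" is easy to confuse with a truthiness check. The Python model below follows the rule exactly as stated above and contrasts it with Python's or, which falls back on any falsy value. The results for 0 and "" are derived from that stated rule rather than checked against Jetro, and the function name coalesce is hypothetical.

```python
def coalesce(a, b):
    """Return a unless it is None (JSON null), in which case return b."""
    return b if a is None else a


print(coalesce(None, "anon"))  # anon   null falls through to the default
print(coalesce(0, 99))         # 0      zero is a real value, so it is kept
print(repr(coalesce("", "anon")))  # ''  empty string is kept as well

# A truthiness-based fallback behaves differently:
print(0 or 99)                 # 99
```

This distinction matters for fields such as counters or flags, where 0 or an empty string is meaningful data rather than a missing value.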
Ternary
Python-style — postfix condition:
"hot" if temp > 30 else "cool"
DOC: {"temp": 35}
QUERY: "hot" if $.temp > 30 else "cool"
OUT: "hot"
Kind tests
v is number
v is string
v is array
v is object
v is null
v is bool
Returns boolean.
QUERY: $.x is number
Cast
x as int
x as float
x as string
x as bool
x as array
x as object
Coerces the value. What happens when a cast is impossible depends on the specific cast; some casts return null.
"42" as int # 42
42 as string # "42"
Membership
xs has v # array membership: true if v is in xs
o has "k" # object membership: true if key "k" exists
There is no v in xs operator; that form is a parse error. Use the infix has
operator above, or call .includes(v) (arrays/strings) explicitly:
$.tags.includes("hugo") # ✓
"hugo" in $.tags # ✗ parse error
Regex match
s ~= "pattern"
Returns boolean. Uses Rust regex syntax. Bind captures with .captures or
.match_first for richer info — see String Search.
Boolean shortcut on patches
In a patch $ { … } body, a key when condition clause skips the assignment
when condition is falsy. See Patch.
Examples
DOC: {"books": [{"year": 1965, "tags": ["sf"]}, {"year": 1989, "tags": ["sf","hugo"]}], "year_floor": 2000}
QUERY: $.books.filter((@.year > 1970 and @.tags.includes("hugo")) or @.year >= $.year_floor)
OUT: [{"tags":["sf","hugo"],"year":1989}]
QUERY: $.books[0].year ?? 9999
OUT: 1965
QUERY: $.books.map(b => "old" if b.year < 1970 else "new")
OUT: ["old","new"]
No in operator. Membership in jetro is xs.includes(v) (or xs.has(v) for objects/arrays). There is no v in xs operator; that form is a parse error.
Wrap and/or mixes in parens to make precedence unambiguous; jetro follows standard binding (and tighter than or), but parens read clearer.
Lambdas and Method Calls
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5], "pairs": [["a", 1], ["b", 2], ["c", 3]]}
Methods take arguments. Most arguments are plain values; the common exception is a lambda, a small function evaluated per element. Jetro accepts three lambda syntaxes; pick whichever reads best.
The @-form
@ is the current item. Inside method args, prefix paths with @ to walk
into it:
$.users.filter(@.age >= 18)
$.users.map(@.name)
$.xs{@.active} # inline filter must also use @
Leading-dot shorthand .age inside method args desugars to @.age — the
two forms are equivalent and the planner sees identical opcodes.
$.users.filter(.age >= 18)
$.users.map(.name)
$.xs{.active} # works inside inline filters too
Arrow-form named lambda
$.users.filter(u => u.age >= 18)
$.users.map((u) => u.name)
The parens around the parameter are optional for one parameter.
For multiple parameters:
$.pairs.map(([k, v]) => k + ":" + v)
Python-style lambda keyword
$.users.filter(lambda u: u.age >= 18)
$.users.map(lambda u: u.name)
Functionally identical to the arrow form. Useful when porting from Python.
Performance
Named lambdas (u => u.x, lambda u: u.x) and the @-form compile to the
same bytecode. Benchmarks confirm parity (3.42 ms vs 3.44 ms / 100K rows in
the lambda regression suite). Pick what reads best — there is no perf reason
to prefer @.
Method call basics
.method() # no args
.method(arg) # one positional
.method(arg1, arg2) # multiple
.method(name=value) # named (a few methods support these)
.method(arg1, name=value) # mixed
Examples:
$.xs.take(3)
$.xs.replace("foo", "bar")
$.xs.join(",")
$.xs.sort(@.year) # sort by key projection
Methods inside method args
Lambdas can chain methods just like top-level queries:
$.posts.map(p => p.tags.unique().count())
$.users.filter(u => u.email.starts_with("admin"))
Multi-arg lambdas with destructuring
Some barriers (e.g. pairwise) yield 2-tuples. Destructure them:
$.xs.pairwise().map(([a, b]) => b - a)
Captured $
Inside a lambda, $ still means "the document root" — it does not get
shadowed by the lambda parameter:
DOC: {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY: $.posts.map(p => $.realnames[p.author])
OUT: ["Ada"]
First-class lambdas via let
Bind a lambda once, use it many times:
let by_year = (b => b.year < 1970) in
$.books.filter(by_year)
The let-bound lambda is inlined at every method-arg use before
compilation, so it has zero closure overhead — exactly the same code as if
you'd written the body directly in .filter(...).
Outside method-arg position, the binding is a normal name reference.
Literals
Scalars
null
true false
42 3.14 -7 1.5e3
"double-quoted" 'single-quoted'
Strings allow standard escapes (\n, \t, \\, \", \uXXXX).
F-strings
f"…" interpolates {expression}:
DOC: {"name": "Ada", "age": 36}
QUERY: f"hi {$.name}, you are {$.age + 1} next year"
OUT: "hi Ada, you are 37 next year"
Inside a lambda:
$.users.map(u => f"{u.name} <{u.email}>")
Escape literal braces with {{ and }}:
f"{{not interpolated}}" # "{not interpolated}"
Arrays
[1, 2, 3]
["a", "b"]
[$.x, $.y, 99] # values can be expressions
[...$.xs, 4, 5] # spread
[1, ...mid, 9] # spread anywhere
Heterogeneous arrays are fine: [1, "a", null, [2,3]].
Objects
{name: "Ada", age: 36} # bare-key (identifier-like)
{"name": "Ada"} # quoted-key (any string)
{x, y} # shorthand: same as {x: x, y: y}
{[dyn_key]: 1} # computed key
{...obj, extra: 1} # spread
{...**deep} # deep recursive spread
{name: "Ada", role: "admin" when $.is_admin}
# conditional value (omit if cond falsy)
Regex literals
Regex patterns appear as the right operand of ~= or as arguments to regex builtins:
$.s ~= "^[A-Z]+$"
$.text.scan("\d+")
Patterns use Rust's regex crate syntax.
Numeric notes
Jetro distinguishes integers from floats internally where possible. 42 and
42.0 compare equal but a downstream sink that requires "integer" (e.g.
indexing) will only accept the former.
Negative literals: -7 is a unary-negated literal — the parser handles this
correctly without ambiguity in arithmetic positions (a - 7 is subtraction,
a + -7 is addition with -7).
Control Flow
Ternary
Python-style:
expr if condition else fallback
DOC: {"x": 10}
QUERY: "big" if $.x > 5 else "small"
OUT: "big"
Right-associative; chain via parens for clarity.
Try / else
Catch evaluation errors:
try expr else fallback
QUERY: try $.maybe.deep.path else "missing"
OUT: "missing"
QUERY: try $.xs[0].name.upper() else "n/a"
The ? quantifier handles the "missing field" subset more concisely:
$.maybe? returns null instead of erroring.
let … in …
Local bindings:
let x = $.users.count() in
f"there are {x} users"
Multi-binding:
let a = 1, b = 2 in a + b # equiv: let a=1 in let b=2 in a+b
let shines for first-class lambdas — see Lambdas.
Pattern match
match value with {
pattern1 -> expr1,
pattern2 when guard -> expr2,
_ -> default
}
Patterns
| Pattern | Matches |
|---|---|
| 42, "x", true, null | Equal literal |
| _ | Any value |
| name | Any value, bound to name |
| 1..10 | Number ≥ 1 and < 10 |
| 1..=10 | Number ≥ 1 and ≤ 10 |
| {k1: p1, k2: p2} | Object with these keys, each matching |
| {id, name} | Object shorthand; binds id and name |
| {id, ...*rest} | Object with rest capture |
| [p1, p2] | Array of length 2, each matching |
| [h, ...t] | Head + tail |
| p1 \| p2 | Either pattern (or-pattern) |
| x: number | Kind-bound: matches if x is a number |
Guards
match $.x with {
v when v > 100 -> "big",
v when v > 10 -> "medium",
_ -> "small"
}
Worked example
DOC: {"event": {"kind": "click", "x": 100, "y": 200}}
QUERY:
match $.event with {
{kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
{kind: "key", code: c} -> f"key:{c}",
_ -> "unknown"
}
OUT: "click@100,200"
Deep match
$..match { pattern -> expr, _ -> null }
Walks every descendant; returns matched results as an array.
$..match! { pattern -> expr } # first match only, early-stops
The bang variant terminates as soon as one match succeeds (uses the bitmap structural index when available).
Comprehensions
Jetro supports list, dict, set, and generator comprehensions over both
literal and path-rooted sources. Pair destructure works in two
interchangeable forms (for k, v in ... and for [k, v] in ...), and
multiple if clauses are folded with and.
List
[expr for x in source if cond1 if cond2 ...]
DOC: {"xs": [1, 2, 3, 4, 5]}
QUERY: [n*n for n in $.xs if n > 2]
OUT: [9,16,25]
QUERY: [n for n in $.xs if n > 1 if n < 5]
OUT: [2,3,4]
Object
{key: value for x in source if cond}
{k: v for [k, v] in pairs}
{k: v for k, v in pairs}
DOC: {"pairs": [["a", 1], ["b", 2]]}
QUERY: {k: v for [k, v] in $.pairs}
OUT: {"a":1,"b":2}
QUERY: {n: n*n for n in [1,2,3]}
OUT: {"1":1,"2":4,"3":9}
Iterating an object yields {key, value} records:
DOC: {"o": {"a": 1, "b": 2}}
QUERY: {e.key: e.value*10 for e in $.o}
OUT: {"a":10,"b":20}
Set
Deduplicating comprehension. Returns an array of unique values.
QUERY: {n*n for n in [-2, -1, 0, 1, 2]}
OUT: [4,1,0]
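The output above suggests that duplicates are dropped while the first occurrence of each value keeps its position. A minimal Rust sketch of that behaviour follows; the function name `set_comprehension` is hypothetical and the first-occurrence ordering is an assumption read off the example, not a documented guarantee.

```rust
use std::collections::HashSet;

/// Toy model of `{f(x) for x in xs}`: apply `f`, keep each result once.
/// Assumption: first-occurrence order is preserved, as the OUT above suggests.
fn set_comprehension(xs: &[i64], f: impl Fn(i64) -> i64) -> Vec<i64> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &x in xs {
        let y = f(x);
        // `insert` returns true only the first time a value is seen.
        if seen.insert(y) {
            out.push(y);
        }
    }
    out
}
```

Calling `set_comprehension(&[-2, -1, 0, 1, 2], |n| n * n)` yields `[4, 1, 0]`, matching the query output.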
Generator
(x for x in items)
Same semantics as the list form; useful as a lazy source for a downstream reducer or barrier.
if-on-patch
Inside a patch $ {…} body, key: expr when cond skips the assignment when
cond is falsy:
patch $ {
status: "active" when $.verified
}
See Patch.
Patch and Writes
Fixture
Examples below run against:
DOC: {"user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "xs": [1, 2, 3, 4, 5]}
Jetro treats writes as queries: a write returns the patched document. There are two equivalent surfaces.
Chain-write terminals
Add a write method at the end of a rooted path:
| Method | Effect |
|---|---|
| .set(v) | Replace the value at this path with v |
| .modify(expr) | Replace, with @ bound to the current value |
| .delete() | Remove the leaf |
| .unset(key) | Remove key from the leaf object |
| .merge({…}) | Shallow-merge into the leaf object |
| .deep_merge({…}) | Recursive merge |
| .append(v) | Push to the leaf array |
| .prepend(v) | Unshift onto the leaf array |
DOC: {"user": {"name": "Ada", "tags": ["math"]}}
QUERY: $.user.name.set("Ada Lovelace")
OUT: {"user":{"name":"Ada Lovelace","tags":["math"]}}
QUERY: $.user.tags.append("code")
OUT: ["math","code"]
QUERY: $.user.unset(tags)
OUT: {"user":{"name":"Ada"}}
QUERY: $.user.modify(u => u.merge({active: true}))
OUT: {"user":{"active":true,"name":"Ada","tags":["math"]}}
The classifier fires only when the base of the chain is $. Inside
lambdas ($.xs.map(@.set(...))) it remains a regular method call — useful
when a sub-pipeline wants the old "return the new value" semantics.
patch $ { … } block
The same operation expressed as a block:
patch $ {
user.name: "Ada Lovelace",
user.tags: DELETE
}
Block syntax is best for multiple writes — it batches them through a single fused pass (see Write Fusion).
| Block clause | Meaning |
|---|---|
| path: value | Assignment |
| path: DELETE | Removal |
| path: value when cond | Conditional |
| path[*]: value | Broadcast over an array |
Conditional writes
patch $ {
status: "active" when $.verified,
retired_at: now() when $.retired
}
If the condition is falsy, the assignment is skipped entirely — neither written nor zeroed.
Broadcast over arrays
DOC: {"items": [{"x": 1}, {"x": 2}, {"x": 3}]}
QUERY: $.items[*].x.set(0)
OUT: [0,0,0]
Pipe form preserves "return-the-new-value"
Some users prefer the v1 behavior where a write inside a .map returned the
written value, not the patched root:
$.items.map(item => item | set(item.x + 1))
The pipe form value | set(new) keeps that meaning.
Modify with pipe
$.user.modify(u => u.merge({last_seen: now()}))
modify evaluates its argument with @ bound to the current value, then
writes the result back at the same path.
Multiple writes in one query
Either chain them:
$.user.name.set("Ada").tags.append("admin")
or use a block:
patch $ {
user.name: "Ada",
user.tags[*]: "active" # broadcast
}
The planner detects multi-write patterns and routes them through the patch-fusion optimizer, which lowers repeated path traversals into a single fused write pass.
Functional .update({...})
A third surface, written as a method call:
DOC: {"books": [
{"title": "Dune", "year": 1965, "tags": ["sf"]},
{"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}
QUERY: $.books[*].update({tags: tags.append("modern") when year > 1980, reviewed: true})
OUT: {"books":[{"reviewed":true,"tags":["sf"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","modern"],"title":"Hyperion","year":1989}]}
Use .update when you want all of the following at once:
- A selector chosen with chain syntax ($.books[*], $.books[* if year > 1980])
- An object body listing multiple field updates evaluated against each selected snapshot
- The same when/DELETE semantics as patch $ { ... }
- Quoted path keys ("books[*].tags") when the receiver is $, giving root-level batched updates without an explicit selector
.update parses to its own AST node (UpdateBatch) so the planner can
keep the user-level shape — useful for selector pushdown, demand
analysis, and fusion. See
Path Mutation → update for the
full argument matrix.
Filtered wildcard [* if pred]
A predicated wildcard inside a path. Available wherever [*] is, and
particularly useful inside .update selectors and quoted path keys:
DOC: {"books": [
{"title": "Dune", "year": 1965},
{"title": "Hyperion", "year": 1989}
]}
QUERY: $.books[* if year > 1980]
OUT: [{"title":"Hyperion","year":1989}]
The predicate runs against @ = the candidate element. Falsy elements
are skipped from the path traversal entirely.
Wildcard .modify chains
Wildcard chain-writes are now lowered to a fused patch:
DOC: {"books": [{"tags": ["sf"]}, {"tags": ["hugo"]}]}
QUERY: $.books[*].tags.modify(@.append("test"))
OUT: {"books":[{"tags":["sf","test"]},{"tags":["hugo","test"]}]}
Caveats
- .replace(needle, with) is not a write terminal; it is the string-replace builtin.
- The classifier only triggers on chains rooted at $. Use the block syntax when the base path is computed.
- DELETE is a marker, not a value; you can't store it in a binding.
Precedence Table
Lowest precedence at the top. Operators on the same row associate left unless noted.
| Level | Operators | Associativity | Notes |
|---|---|---|---|
| 1 | if … else …, try … else … | right | Ternary, try-else |
| 2 | \|, \|> | left | Pipe (value-flow) |
| 3 | ??, ?\| | right | Coalesce |
| 4 | or | left | Logical OR (short-circuit) |
| 5 | and | left | Logical AND (short-circuit) |
| 6 | not | n/a | Logical NOT (prefix) |
| 7 | is, kind, is not | n/a | Kind test |
| 8 | has | left | Membership operator (no in — use .includes(v)) |
| 9 | ==, !=, <, <=, >, >=, ~= | left | Comparison |
| 10 | +, - | left | Additive (and string/array concat) |
| 11 | *, /, % | left | Multiplicative |
| 12 | as | left | Cast |
| 13 | - (unary) | n/a | Negation |
| 14 | .field, .method(), [idx], {cond}, ?, ! | left | Postfix steps |
| 15 | $, @, literal, (...), lambda, let, match, patch, comp | n/a | Primary |
Common pitfalls
Pipe vs method call.
$.x | upper # ✗ — interprets `upper` as a name to pipe into
$.x.upper() # ✓ — method call
Comparison chains.
1 < x < 10 # ✗ — parses as `(1 < x) < 10`
1 < x and x < 10 # ✓
Ternary mid-chain.
$.x.upper() if cond else $.x # parses fine — the ternary wraps the whole
# left expression
Negation tightness.
not a == b # parses as `(not a) == b` — surprising!
not (a == b) # parens are clearer
a != b # cleanest
Coalesce + comparison.
$.x ?? 0 > 5 # parses as `$.x ?? (0 > 5)` (low-precedence coalesce); write `($.x ?? 0) > 5`
Try captures errors only.
try $.x.parse_int() else 0
try does not catch falsy-as-error — only actual evaluation errors (missing
field, bad cast, regex failure, etc.).
Pipelines
A jetro query is a pipeline of stages. The shape is always:
Source → Stage* → Sink
Source produces values one at a time. Each Stage consumes one value and
produces zero, one, or many. The Sink collects results.
What counts as a stage
| Stage | Examples | Output |
|---|---|---|
| One-to-one | .map, .enumerate, .lag, .zscore | One out per in |
| Filter | .filter, .find, .compact, .takewhile | Zero or one out per in |
| Expander | .flat_map, .flatten, .split, .lines, .chars | Many out per in |
| Reducer | .sum, .count, .min, .any, .find_index | One total |
| Positional | .first, .last, .nth(i), .collect | One or N |
| Barrier | .sort, .unique, .group_by, .window, .chunk | Buffers, then emits |
A reducer or positional terminator ends the pipeline; further methods chain on the result (a scalar or array) rather than streaming.
Streaming vs. barrier
Most stages stream — they process one value, emit, repeat. The pull-based
backend means each value travels end-to-end before the next is fetched. This
is what makes early termination work (.first, .find).
Barriers cannot stream: .sort must see every element before it can emit
any. The pipeline buffers up to the barrier, runs the barrier as a unit,
then resumes streaming if more stages follow.
$.xs.map(f).filter(p).sort(@.x).take(10).map(g)
\________________/ \____________/
streaming streaming again
↑
barrier point
Barriers carry an apply_barrier method on the builtin.
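The difference between streaming and barrier stages can be illustrated with plain Rust iterators, which are also pull-based. This is a sketch of the general idea only, not Jetro's pipeline code; both function names are hypothetical, and the counter records how many source elements were actually pulled.

```rust
use std::cell::Cell;

/// Streaming: map -> filter -> first. Stops pulling once one output exists.
fn first_even_square(xs: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0usize);
    let result = xs
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1)) // count source pulls
        .map(|x| x * x)
        .filter(|sq| sq % 2 == 0)
        .next();
    (result, pulled.get())
}

/// Barrier: a sort must buffer every element before it can emit the first one.
fn smallest_after_sort(xs: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0usize);
    let mut buf: Vec<i64> = xs
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1))
        .copied()
        .collect();
    buf.sort();
    (buf.first().copied(), pulled.get())
}
```

For the input `[1, 3, 4, 5, 6]` the streaming version pulls three elements and stops; the sorting version always pulls the whole input.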
Sources
The most common source is a path: $.users is a source. Other shapes:
- An array literal ([1,2,3].map(f))
- A range ((0..10).map(f))
- A method that returns a sequence ($.text.lines().map(...))
Sinks
If your final stage is a reducer, the sink is the reducer's accumulator. If it's a streaming stage, the sink collects into an array.
.collect() is the explicit sink: scalar in → [scalar], array in → identity,
null in → []. Use it when you need a deterministic array shape.
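The three cases of .collect() can be written down as a small match. The `Toy` enum below is a hypothetical stand-in for a JSON value, used only to make the sketch self-contained; it is not Jetro's value type.

```rust
/// Hypothetical, minimal JSON-like value for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Toy {
    Null,
    Int(i64),
    Arr(Vec<Toy>),
}

/// Models the documented shape rule: null -> [], array -> identity, scalar -> [scalar].
fn collect(v: Toy) -> Toy {
    match v {
        Toy::Null => Toy::Arr(vec![]),
        Toy::Arr(items) => Toy::Arr(items),
        scalar => Toy::Arr(vec![scalar]),
    }
}
```

Whatever goes in, the result is always an array, which is the "deterministic array shape" the text refers to.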
Composed stages
Adjacent stages get composed when possible: two Stages fold into one
virtual call per element. This is Composed<A, B> under the hood; the
optimizer fuses chains of .maps, .filters, and .picks aggressively.
User-visible effect: writing many short stages costs roughly the same as one big lambda — write for clarity.
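As a rough illustration of stage composition, the sketch below models a stage as a function that returns zero or one output per input and folds two such stages into a single call. The names `compose` and `double_then_keep_large` are hypothetical; the real Composed<A, B> type may be structured differently.

```rust
/// Fold two "zero or one output" stages into one function per element.
fn compose<A, B, C>(
    f: impl Fn(A) -> Option<B>,
    g: impl Fn(B) -> Option<C>,
) -> impl Fn(A) -> Option<C> {
    move |x| f(x).and_then(|y| g(y))
}

/// Equivalent of `.map(x * 2).filter(y > 4)` run as a single fused stage.
fn double_then_keep_large(xs: Vec<i64>) -> Vec<i64> {
    let double = |x: i64| Some(x * 2); // map-like: always one output
    let keep_large = |y: i64| if y > 4 { Some(y) } else { None }; // filter-like
    let fused = compose(double, keep_large);
    xs.into_iter().filter_map(fused).collect()
}
```

Each element passes through one combined call instead of two separate passes, which is why short stages need not cost more than one large lambda.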
Backend selection
Each pipeline node carries a list of preferred backends. The router tries them in order; the first to declare it can run the node wins.
| Source | Preferred backends |
|---|---|
| FieldChain (e.g. $.a.b.c) | tape-view → tape-rows → materialised → val-view → interpreted |
| Generic expression | fast-children → interpreted |
| Deep search | structural index → interpreted |
| Single root path | tape-path → interpreted |
You don't pick the backend — the planner does. But knowing they exist explains why simple queries are fast: they often run zero-copy over the simd-json tape.
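The "first backend that can run the node wins" rule is a first-fit search over an ordered list. A minimal sketch, assuming a hypothetical `Backend` record with a capability check; the capability rules in `demo_backends` are invented for illustration and do not describe what each real backend accepts.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum NodeKind {
    FieldChain,
    DeepSearch,
}

/// Hypothetical backend descriptor: a name plus a "can I run this?" check.
struct Backend {
    name: &'static str,
    can_run: fn(NodeKind) -> bool,
}

/// Try backends in preference order; the first that accepts the node wins.
fn route(preferred: &[Backend], node: NodeKind) -> Option<&'static str> {
    preferred.iter().find(|b| (b.can_run)(node)).map(|b| b.name)
}

/// Illustrative preference list; the capability rules are assumptions.
fn demo_backends() -> Vec<Backend> {
    vec![
        Backend { name: "tape-view", can_run: |k: NodeKind| k == NodeKind::FieldChain },
        Backend { name: "structural index", can_run: |k: NodeKind| k == NodeKind::DeepSearch },
        Backend { name: "interpreted", can_run: |_: NodeKind| true }, // universal fallback
    ]
}
```

Because the interpreted backend accepts everything, placing it last guarantees that routing always succeeds for a non-empty list.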
When to think about pipeline shape
In practice, almost never. Two cases:
- Don't sort until you have to. A pre-sort barrier defeats early termination. Push .filter, .take, .first before .sort if the semantics allow.
- Avoid full materialisation in the middle. Chains of streaming stages stay zero-copy. A .collect() mid-chain forces a full pass.
The next chapter, Demand Propagation, explains why these heuristics work.
Demand Propagation
Demand propagation is the planner pass that makes "obvious" queries fast. It walks the pipeline backward — from sink to source — asking each operator: given what comes after you, what do you actually need from your source?
The answer is encoded in three lanes per stage and then used at execution time to skip work.
The three lanes
1. PullDemand — how many inputs?
| Variant | Meaning |
|---|---|
| All | Read everything |
| FirstInput(n) | Stop after n inputs |
| LastInput(n) | Seek to the end, take last n |
| NthInput(i) | Jump to a single index |
| UntilOutput(n) | Keep reading until n outputs are produced |
2. ValueNeed — what payload from each input?
| Variant | Meaning |
|---|---|
| None | Don't decode the row at all |
| Predicate | Only what the predicate touches |
| Projection | Only the fields used in a projection |
| Numeric | Only numeric content |
| Whole | The full row (default pessimistic) |
3. order: bool — does input order matter?
Some sinks (e.g. .sum()) don't care about order. The planner can use this
to enable parallel-friendly access patterns when supported.
Backward walk
For a pipeline s1 → s2 → … → sN → sink, the planner does:
demand = sink_demand
for op in [sN, …, s2, s1]: # reverse order
upstream = op.propagate_demand(demand)
record (op, downstream=demand, upstream)
demand = upstream
The final demand is what the source must satisfy. The source backend
chooses an access strategy that matches.
Operator laws
Every builtin declares one of these laws (in defs.rs):
| Law | Effect on demand |
|---|---|
| Identity | Pass through unchanged (e.g. .upper, .lower) |
| MapLike | Preserve pull, force ValueNeed::Whole |
| FilterLike | FirstInput(n) becomes UntilOutput(n) |
| TakeWhile | Same as filter, but bounded |
| UniqueLike | Must scan until N distinct outputs |
| Take(n) | Cap pull at FirstInput(n) |
| First | Always FirstInput(1) |
| Last | Always LastInput(1) |
| Count | All inputs, ValueNeed::None |
| NumericReducer | All inputs, ValueNeed::Numeric |
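The backward walk and a few of the laws can be sketched in Rust as a fold over the stages in reverse. This models only the pull lane, and only a subset of laws; the `Pull` and `Law` enums, the `Barrier` law, and both function names are hypothetical simplifications rather than the definitions in defs.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Pull {
    All,
    FirstInput(usize),
    LastInput(usize),
    UntilOutput(usize),
}

#[derive(Debug, Clone, Copy)]
enum Law {
    MapLike,
    FilterLike,
    Take(usize),
    Barrier, // assumption: stands in for sort / group_by style stages
}

/// Given what downstream needs, compute what this stage needs from upstream.
fn propagate(law: Law, downstream: Pull) -> Pull {
    match (law, downstream) {
        (Law::MapLike, d) => d, // one out per in: pull is preserved
        (Law::FilterLike, Pull::FirstInput(n)) => Pull::UntilOutput(n),
        (Law::FilterLike, d) => d,
        (Law::Take(n), Pull::FirstInput(m)) => Pull::FirstInput(n.min(m)),
        (Law::Take(n), _) => Pull::FirstInput(n),
        (Law::Barrier, _) => Pull::All, // must see every input
    }
}

/// Walk the stages from sink to source, threading the demand through.
fn source_demand(stages: &[Law], sink: Pull) -> Pull {
    stages.iter().rev().fold(sink, |d, &law| propagate(law, d))
}
```

With these rules, a filter followed by take(3) asks the source for `UntilOutput(3)`, while any pipeline containing a barrier ahead of the limiting stage ends up asking for `All`.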
Six worked examples
A. Early termination on .first
$.items.map(name).first()
- first() declares FirstInput(1) to its source
- map(name) is MapLike: preserves pull, demands Whole from items
- Source receives: read 1 item, decode fully
Without demand: read all items, decode all, take first.
B. Bounded filter
$.items.filter(active).take(3)
- take(3) ← FirstInput(3)
- filter(active) ← UntilOutput(3) (read until 3 pass)
- Source: read until 3 active items found
Without demand: filter the entire array, then slice.
C. Field-level projection
$.users.map(u => {id, name})
- The map projection touches id and name
- Source: decode only id, name from each user
Other fields are not allocated. Over a wide-record document, this is the biggest win.
D. Last-element scan
$.logs.filter(severity >= 3).last()
- last() ← LastInput(1)
- filter(...) ← UntilOutput(1) from the end
- Source: scan backward, stop after first match
Without demand: scan forward, materialise all matches, take last.
E. Count without payloads
$.items.filter(status == "done").count()
- count() declares ValueNeed::None
- filter(...) declares Predicate on status
- Source: decode only status, no other fields
F. Reverse + take
$.items.reverse().take(2)
- take(2) ← FirstInput(2)
- reverse() flips: source receives LastInput(2)
- Source: seek to end, read 2 backward, then reverse
What demand does not do
- It does not change result semantics. Two pipelines with identical text produce identical output regardless of demand state.
- It does not optimise across barriers (.sort, .group_by). A barrier forces All upstream; it must see every input.
- It does not move work between stages. Operators don't fuse; demand only gates what they read.
When you'll feel demand kick in
Three rough rules of thumb:
- Put take/first/find near the end. That's how their pull demand reaches back to the source.
- Project early when possible. map(@.field) upstream of a barrier reduces the buffered set.
- Avoid unnecessary collect(). It forces full materialisation and resets the demand walk.
Demand is invisible most of the time — your queries get faster than they "should" be, and that's exactly the goal.
Lazy Evaluation and Caches
Jetro is lazy in three places that matter to users.
1. Document parsing
Jetro::from_bytes does not fully parse the document up front when the
default simd-json feature is enabled. Instead it builds a tape — a flat
array of tokens — and lazily decodes parts as queries demand them.
What this means:
- Cold-start is ~4× faster than the legacy serde_json::Value path.
- A query that touches only $.x.y decodes the rest of the doc only when asked.
- Borrowed string slices (Val::StrSlice) avoid a copy when the value is read-only.
If you want eager full parsing (e.g. for serde_json::Value round-trips):
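// Parse eagerly into a serde_json::Value, bypassing the lazy tape.
let doc: serde_json::Value = serde_json::from_slice(bytes)?;
// Run the query against the fully materialised value.
let v = engine.collect_value(doc, "$.x")?;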
2. Streaming pipelines
The pull-based pipeline backend processes one element at a time. A stage
doesn't run until its downstream consumer pulls. This is what enables
.first() and .find() to terminate early.
A consequence: side effects in lambdas are not guaranteed to fire for every element. (Lambdas in jetro have no I/O, so this is mostly an academic concern, but worth knowing if you write a custom builtin.)
3. Plan caches
Two caches matter:
Plan cache (per JetroEngine)
When you call engine.collect(&doc, query) repeatedly with the same query,
the parsed AST → IR → bytecode pipeline is computed once and reused. Default
capacity: 256 entries, evicted wholesale when full.
For workloads with a small fixed set of queries and many documents, this is a big speedup. For ad-hoc one-shot queries, it's a no-op.
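"Evicted wholesale when full" means the cache is cleared in one step rather than dropping entries one at a time. A minimal sketch of that policy follows; `PlanCache`, its fields, and the stand-in "plan" (just the query length) are hypothetical, and clearing just before the insert that would overflow is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical plan cache with wholesale eviction.
struct PlanCache {
    capacity: usize,
    plans: HashMap<String, usize>, // query text -> stand-in for a compiled plan
    compiles: usize,               // how many times we had to compile
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, plans: HashMap::new(), compiles: 0 }
    }

    fn get_or_compile(&mut self, query: &str) -> usize {
        if let Some(&plan) = self.plans.get(query) {
            return plan; // cache hit: no parsing or planning
        }
        if self.plans.len() >= self.capacity {
            self.plans.clear(); // wholesale eviction, no per-entry bookkeeping
        }
        self.compiles += 1;
        let plan = query.len(); // placeholder for parse -> IR -> bytecode
        self.plans.insert(query.to_string(), plan);
        plan
    }
}
```

The trade-off of this policy is simplicity over hit rate: a workload that cycles through slightly more distinct queries than the capacity will recompile often.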
Path cache (per VM)
The bytecode VM caches resolved pointer paths per document. The cache key hashes both structure and primitive leaf values bounded at depth 8 — two documents with identical shape but different leaves produce different hashes, so the cache stays correct across calls.
You don't manage this directly. It's amortised over many queries on the same document.
When laziness backfires
It rarely does, but two pitfalls:
Forcing materialisation. Methods like .collect(), .sort(),
.unique(), .group_by() are barriers — they materialise. Putting them
mid-chain when they aren't needed defeats laziness.
Holding onto Vals. A Val is Arc-wrapped, so cloning is O(1), but the
Arc keeps the underlying data alive. If you query a giant doc, hold onto a
small projection, and let the doc go, you may be surprised that the original
data is still resident — the projection's Val::StrSlices borrow into the
tape.
Use .to_json() (or serde_json::Value round-trip) to disconnect a
projection from the source tape when you really need to release memory.
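The retention effect comes from ordinary Arc semantics and can be reproduced without Jetro. In the sketch below, `StrSlice` is a hypothetical stand-in for a borrowed slice into a shared buffer; it is not Jetro's Val::StrSlice, and `detach` plays the role that .to_json() plays in the text.

```rust
use std::sync::Arc;

/// Hypothetical slice that keeps the whole source buffer alive.
struct StrSlice {
    source: Arc<str>,
    start: usize,
    end: usize,
}

impl StrSlice {
    fn as_str(&self) -> &str {
        &self.source[self.start..self.end]
    }

    /// Copy the bytes out so the source buffer can be freed.
    fn detach(&self) -> String {
        self.as_str().to_owned()
    }
}

/// Returns (owners before dropping the doc, owners after, detached copy).
fn demo() -> (usize, usize, String) {
    let doc: Arc<str> = Arc::from("{\"name\":\"Ada\",\"blob\":\"...\"}");
    let slice = StrSlice { source: Arc::clone(&doc), start: 9, end: 12 };
    let before = Arc::strong_count(&slice.source);
    drop(doc); // the caller lets the document go...
    let after = Arc::strong_count(&slice.source); // ...but the buffer is still owned
    (before, after, slice.detach())
}
```

After `drop(doc)` the count falls from 2 to 1, not to 0: the small projection alone keeps the full buffer resident until it is detached or dropped.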
Practical recipe
For long-lived servers:
// At startup
let engine = JetroEngine::default();
// Per request
let result = engine.collect_bytes(req_body, "$.users.filter(@.active).count()")?;
Plans get cached, parsing is lazy, the pipeline early-terminates. There's typically nothing else to tune.
Builtin Reference — Overview
Jetro ships 181 builtin methods. They fall into 18 categories. Every method has the same shape:
.method(arg1, arg2, …)
…or, when the parser routes through inline path filters and sugar:
$.path.method(...)
This part documents every method. Each entry follows the format:
name(aliases: …)
- Signature: what it takes and returns
- Behavior: one-paragraph description
- Example: at least one minimal runnable example
- Demand law / Notes: when relevant
Index
| Category | What goes here | Page |
|---|---|---|
| Value introspection | type, len, schema, JSON round-trip | Introspection |
| Numeric scalars | ceil, floor, round, abs | Numeric |
| String transforms | upper, trim, pad_*, slice, replace … | String |
| String search / regex | starts_with, match_*, captures, split_re | String Search |
| Conversion | to_number, parse_int, parse_bool | Conversion |
| Streaming one-to-one | map, enumerate, pairwise, lag, zscore | Streaming |
| Filtering | filter, find, compact, takewhile | Filtering |
| Expanding | flat_map, flatten, lines, chars | Expanding |
| Reducers | sum, count, any, max_by | Reducers |
| Positional | first, last, nth, collect | Positional |
| Barriers | sort, unique, group_by, window | Barrier |
| Arrays / sets | append, diff, union, zip | Arrays |
| Objects | keys, pick, merge, transform_values | Objects |
| Path mutation | get_path, set_path, set, update | Path Mutation |
| Deep traversal | deep_find, walk, rec | Deep |
| Predicates | has, missing, includes, index | Predicates |
| Tabular | to_csv, to_tsv | Tabular |
| Relational | equi_join | Relational |
Notation in this part
- aliases: alternative names accepted by the parser. They lower to the same builtin and behave identically.
- "demand law": what kind of Demand this builtin propagates upstream. See Demand Propagation for the model.
- "barrier" / "stream" / "scalar": execution shape (does it buffer, stream, or run once on a single value).
When a method appears under multiple categories (e.g. .find is both a
filter and positional), it lives in the most specific chapter and is
cross-linked.
Sharp edges
A small set of design choices in the 0.5 release line is documented in
Known Limitations: replace is
single-occurrence (use replace_all for substitute-every), there is no
in operator (use xs has v), and rec(fn) caps at 10 000 iterations
when the step never converges (use rec(fn, cond) to bound). Two engine
items remain on the fix-list: rec() no-arg and a stronger
runaway-iteration guard.
Aliases at a glance
| Canonical | Aliases |
|---|---|
| any | exists |
| chunk | batch |
| drop_while | dropwhile |
| take_while | takewhile |
| includes | contains |
| skip | drop |
| sort | sort_by |
| unique | distinct |
| deep_find | ..find (deep-method form) |
| deep_shape | ..shape |
| deep_like | ..like |
These pairs are interchangeable. Pick whichever reads better.
Value Introspection
Methods that report on the kind and shape of a value, plus JSON round-trip.
type
- Signature: Any -> String
- Behavior: Returns the kind of value as a string: "null", "bool", "number", "string", "array", "object".
QUERY: $.x.type()
DOC: {"x": [1,2,3]}
OUT: "array"
len
- Signature: (String|Array|Object) -> Number
- Behavior: Length: chars for strings, elements for arrays, key count for objects. Errors on null/bool/number.
DOC: {"s": "hello", "xs": [1,2,3], "o": {"a":1,"b":2}}
QUERY: $.s.len() OUT: 5
QUERY: $.xs.len() OUT: 3
QUERY: $.o.len() OUT: 2
to_string
- Signature: Any -> String
- Behavior: Stringifies a scalar (42 → "42", true → "true", null → "null"). For arrays/objects, returns the JSON serialisation.
QUERY: 42.to_string() OUT: "42"
QUERY: ([1, 2]).to_string() OUT: "[1,2]"
to_json
- Signature: Any -> String
- Behavior: Compact JSON serialisation of any value.
QUERY: $.user.to_json()
Distinguish from to_string: for compound values, the two are equivalent;
for scalars, to_json always quotes strings ("foo" → "\"foo\""),
to_string does not.
from_json
- Signature: String -> Any
- Behavior: Parse a JSON string into a value.
QUERY: '{"x":1}'.from_json()
OUT: {"x":1}
QUERY: $.encoded.from_json().x
Errors on malformed input. Wrap in try if the source is untrusted:
try $.s.from_json() else null
schema
- Signature: Any -> Object
- Behavior: Infers a schema sketch (keys, kinds, nullable flags). Useful for "what does this document look like?" probes.
DOC: [{"id": 1, "name": "a"}, {"id": 2, "name": null}]
QUERY: $.schema()
OUT: {"items":{"fields":{"id":{"type":"Int"},"name":{"nullable":true,"type":"String"}},"required":["id"],"type":"Object"},"len":2,"type":"Array"}
The exact output format is documented in
builtins/ops/schema.rs;
treat it as advisory rather than a stable contract.
Demand notes
- len over an array is ValueNeed::None upstream: it doesn't decode rows.
- type is Identity demand-wise.
- from_json / to_json are scalar transforms with no demand interaction.
Practical examples
# Quick shape check
$.payload.type() # → "object"
$.payload.len() # for object: number of keys
# Distinguish array length vs string length
$.items.len() # array element count
$.title.len() # number of characters
# Safe deserialization of a payload field
try $.body.from_json() else null
# Compact serialization
$.event.to_json()
# Stringify any value
$.x.to_string()
# Probe an unknown payload's schema
$.events[0].schema()
Numeric Scalars
Fixture
Examples below run against:
DOC: {"products": [{"id": 1, "price": 3.7}, {"id": 2, "price": 4.2}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "deltas": [-1, 2, -3, 4], "xs": [1, 2, 3, 4, 5]}
Pure scalar transforms over numbers.
ceil
- Signature: Number -> Number
- Behavior: Smallest integer ≥ x.
QUERY: 3.2.ceil() OUT: 4
QUERY: (-3.2).ceil() OUT: -3
floor
- Signature: Number -> Number
- Behavior: Largest integer ≤ x.
QUERY: 3.7.floor() OUT: 3
QUERY: (-3.7).floor() OUT: -4
round
- Signature: Number -> Number
- Behavior: Round to nearest; ties round half-away-from-zero.
QUERY: 3.5.round() OUT: 4
QUERY: 3.4.round() OUT: 3
QUERY: (-3.5).round() OUT: -4
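The tie rule shown in these examples is the same one Rust's `f64::round` uses. The sketch below contrasts it with half-to-even ("banker's") rounding, which some other tools use and which gives different answers on ties. Both helper names are hypothetical, and whether Jetro delegates to `f64::round` internally is an assumption.

```rust
/// Half-away-from-zero: the behaviour of f64::round in Rust's standard library.
fn round_half_away(x: f64) -> f64 {
    x.round()
}

/// Half-to-even, written by hand for contrast (not what the examples above show).
fn round_half_even(x: f64) -> f64 {
    let r = x.round();
    let is_tie = (x - x.trunc()).abs() == 0.5;
    if is_tie && r % 2.0 != 0.0 {
        r - x.signum() // step back toward zero to land on the even neighbour
    } else {
        r
    }
}
```

The two rules agree on 3.5 (both give 4) but differ on 2.5: half-away gives 3, half-even gives 2.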
abs
- Signature: Number -> Number
- Behavior: Absolute value.
QUERY: (-7).abs() OUT: 7
QUERY: 3.5.abs() OUT: 3.5
Mapping over arrays
These are scalar; lift them with .map:
DOC: {"xs": [1.4, 2.6, -3.5]}
QUERY: $.xs.map(@.round())
OUT: [1,3,-4]
QUERY: $.xs.map(@.abs()).sum()
OUT: 7.5
See also
Numeric reducers (sum, avg, min, max) live in
Reducers. Streaming numeric transforms (zscore,
pct_change, cummax, cummin) live in Streaming.
Practical examples
# Round every price up to the nearest dollar
$.products.map(p => p.merge({price_ceil: p.price.ceil()}))
# Percent → integer percent
$.metric.pct.map(@ * 100).map(@.round())
# Magnitudes (drop sign)
$.deltas.map(@.abs())
# Banker-style splits
$.amount.floor() # cents component, etc.
# Build a histogram with binned values
$.measurements.map(m => (m / 10).floor() * 10).count_by(@)
# → {0: 12, 10: 5, 20: 3, ...}
String Transforms
Scalar string operations. Lift with .map to apply to an array of strings.
Case
| Method | What | Example |
|---|---|---|
| upper | ASCII uppercase | "foo".upper() → "FOO" |
| lower | ASCII lowercase | "FOO".lower() → "foo" |
| capitalize | First char upper, rest lower | "foo bar".capitalize() → "Foo bar" |
| title_case | Each word capitalised | "foo bar".title_case() → "Foo Bar" |
| snake_case | lower_snake_case style | "FooBar".snake_case() → "foo_bar" |
| kebab_case | Words joined with - | "FooBar".kebab_case() → "foo-bar" |
| camel_case | fooBar style | "foo_bar".camel_case() → "fooBar" |
| pascal_case | FooBar style | "foo_bar".pascal_case() → "FooBar" |
| reverse_str | Reverse char order | "abc".reverse_str() → "cba" |
Trim
| Method | What |
|---|---|
| trim | Strip whitespace from both ends |
| trim_left | Strip leading whitespace |
| trim_right | Strip trailing whitespace |
QUERY: " hi ".trim() OUT: "hi"
QUERY: " hi ".trim_left() OUT: "hi "
Padding and centering
| Method | Behavior | Example |
|---|---|---|
| pad_left(width, char?) | Right-align by padding left | "7".pad_left(3, "0") → "007" |
| pad_right(width, char?) | Left-align by padding right | "hi".pad_right(5) → "hi   " |
| center(width, char?) | Center within width | "hi".center(6) → "  hi  " |
If char is omitted, space is used.
Indent / dedent
indent(n) takes an integer (number of spaces); the prefix is fixed
spaces.
QUERY: "line1\nline2".indent(2)
OUT: "  line1\n  line2"
dedent() strips the first line's leading whitespace from every
subsequent line that begins with the same prefix. It is not a
common-prefix dedent across all lines:
QUERY: " a\n b".dedent()
OUT: "a\nb"
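The difference from a common-prefix dedent is easiest to see in code. The sketch below follows the rule as described: take the first line's leading whitespace and strip exactly that prefix from every line that starts with it. The function name is hypothetical and edge cases (tabs, blank lines) may behave differently in the real builtin.

```rust
/// Strip the first line's leading whitespace from each line that begins with it.
fn dedent_first_line_prefix(s: &str) -> String {
    let first = s.split('\n').next().unwrap_or("");
    let prefix_len = first.len() - first.trim_start().len();
    let prefix = &first[..prefix_len];
    s.split('\n')
        // Lines that do not start with the prefix are left untouched.
        .map(|line| line.strip_prefix(prefix).unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}
```

With the input `"    a\n  b\n      c"` the prefix is four spaces, so the second line (two spaces) is unchanged and the third keeps its two extra spaces; a common-prefix dedent would instead strip two spaces from every line.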
Slice
"hello world".slice(0, 5) # "hello"
"hello world".slice(6) # "world"
"hello".slice(-3) # "llo"
slice(start, end?) mirrors Python; end is exclusive.
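For readers unfamiliar with Python's convention, here is a small Rust sketch of the index rules: negative indices count from the end, out-of-range indices are clamped, and end is exclusive. The function is a hypothetical model operating on chars, and the clamping behaviour is an assumption carried over from Python rather than confirmed for Jetro.

```rust
/// Python-style slice over characters: negative indices count from the end.
fn slice(s: &str, start: i64, end: Option<i64>) -> String {
    let chars: Vec<char> = s.chars().collect();
    let len = chars.len() as i64;
    // Resolve a possibly negative index and clamp it into 0..=len.
    let resolve = |i: i64| -> usize { (if i < 0 { len + i } else { i }).clamp(0, len) as usize };
    let a = resolve(start);
    let b = resolve(end.unwrap_or(len));
    if a >= b {
        String::new()
    } else {
        chars[a..b].iter().collect()
    }
}
```

The three examples above correspond to `slice("hello world", 0, Some(5))`, `slice("hello world", 6, None)`, and `slice("hello", -3, None)`.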
Repeat
"ab".repeat(3) # "ababab"
Replace
| Method | Behavior |
|---|---|
| replace(needle, with) | Replace first literal occurrence |
| replace_all(needle, with) | Replace all literal occurrences |
| replace_re(pattern, with) | Regex-aware single replacement |
| replace_all_re(pattern, with) | Regex-aware all replacements |
QUERY: "hello hello".replace("hello", "hi")
OUT: "hi hello"
QUERY: "hello hello".replace_all("hello", "hi")
OUT: "hi hi"
QUERY: "abc123def".replace_all_re("\d+", "#")
OUT: "abc#def"
Regex escapes inside jetro string literals. Use a single backslash: "\d", "\w+", "\s". Jetro string literals don't eat backslashes separately; doubling ("\\d") sends the regex engine the literal two-char sequence \\d, which is not the digit class and silently fails to match. This differs from host languages like Python or JavaScript where you must double-escape.
Strip
"prefix-foo".strip_prefix("prefix-") # "foo"
"foo.txt".strip_suffix(".txt") # "foo"
If the prefix/suffix isn't present, returns the input unchanged.
Encoding
| Method | What |
|---|---|
| to_base64 | Standard base64 encode |
| from_base64 | Standard base64 decode |
| url_encode | Percent-encode |
| url_decode | Percent-decode |
| html_escape | & → &amp;, < → &lt;, etc. |
| html_unescape | Reverse of html_escape |
QUERY: "hello world".to_base64() OUT: "aGVsbG8gd29ybGQ="
QUERY: "a b".url_encode() OUT: "a%20b"
QUERY: "<b>".html_escape() OUT: "&lt;b&gt;"
Demand notes
All string transforms are Identity demand-wise: they don't change what the
upstream needs to produce.
Practical examples
# Normalise display names
$.users.map(u => u.name.trim().title_case())
# Build a URL-safe slug
"My Article Title".lower().replace_all(" ", "-")
# → "my-article-title"
# CamelCase to snake_case migration
"FooBarBaz".snake_case() # → "foo_bar_baz"
# Truncate with ellipsis
$.posts.map(p => p.body.slice(0, 100) + "..." if p.body.len() > 100 else p.body)
# Parse a comma-separated tag list
$.tags_csv.split(",").map(@.trim())
# Encode for URL
$.query.url_encode()
# Encode binary as base64
$.bytes.to_base64()
# HTML-escape user input
$.comments.map(c => c.text.html_escape())
# Pad a numeric ID for fixed-width keys
($.id as string).pad_left(8, "0")
# → "00000042" for id=42
# Strip a known prefix
"https://example.com/path".strip_prefix("https://")
# → "example.com/path"
# Build a banner
"=".repeat(40) # → "========================================"
# Indent a nested message
$.message.indent(4)
String Search and Regex
Predicates (return boolean)
| Method | Behavior |
|---|---|
| is_blank | True if empty or only whitespace |
| is_numeric | True if all chars are digits |
| is_alpha | True if all chars are letters |
| is_ascii | True if all bytes < 128 |
| starts_with(prefix) | Prefix check |
| ends_with(suffix) | Suffix check |
QUERY: " ".is_blank() OUT: true
QUERY: "abc123".is_numeric() OUT: false
QUERY: "hello".starts_with("he") OUT: true
Position
| Method | Returns |
|---|---|
| index_of(needle) | First index of needle, or -1 |
| last_index_of(needle) | Last index of needle, or -1 |
QUERY: "hello world".index_of("o") OUT: 4
QUERY: "hello world".last_index_of("o") OUT: 7
Substring search
"foo bar foo".matches("foo") # 2 (count of literal occurrences)
"abc 12 cd 34".scan("\d+") # ["12", "34"] (regex matches as strings)
Regex match
| Method | Returns |
|---|---|
| re_match(pattern) | Boolean |
| match_first(pattern) | First match string, or null |
| match_all(pattern) | Array of all match strings |
| captures(pattern) | First match with groups: [full, g1, g2, …] |
| captures_all(pattern) | Array of captures results |
QUERY: "a1b2".re_match("\d") OUT: true
QUERY: "a1b2".match_first("\d+") OUT: "1"
QUERY: "a1b2".match_all("\d+") OUT: ["1","2"]
QUERY: "key=val".captures("(\w+)=(\w+)")
OUT: ["key=val","key","val"]
The ~= operator is sugar for re_match and returns the same boolean.
Splitting
| Method | Behavior |
|---|---|
| split(sep) | Split on literal separator |
| split_re(pattern) | Split on regex |
QUERY: "a,b,c".split(",") OUT: ["a","b","c"]
QUERY: "a,,b".split_re(",+") OUT: ["a","b"]
Multi-needle membership
"abc def".contains_any(["abc", "xyz"]) # true (matches first)
"abc def".contains_all(["abc", "def"]) # true (all match)
Demand notes
Regex builtins are scalar. Lift across an array with .map(...). The
underlying regex is compiled once per query and reused — no per-element
re-compilation cost.
Conversion and Parsing
Coerce between value kinds.
to_number
- Signature: Any -> Number | null
- Behavior: Coerce to number. "42" → 42, "3.14" → 3.14, true → 1, false → 0. Returns null for unparseable strings.
QUERY: "42".to_number() OUT: 42
QUERY: "3.14".to_number() OUT: 3.14
QUERY: "abc".to_number() OUT: null
to_bool
- Signature: Any -> Boolean
- Behavior: Truthiness: false / null / 0 / "" / [] / {} → false, everything else → true.
QUERY: $.maybe.to_bool()
parse_int(radix?)
- Signature: String -> Number | null
- Behavior: Parse a string as integer, optional radix (default 10).
QUERY: "42".parse_int() OUT: 42
QUERY: "ff".parse_int(16) OUT: 255
QUERY: "0b101".parse_int(2) OUT: 5
parse_float
- Signature: String -> Number | null
- Behavior: Parse a string as float (IEEE 754 double).
QUERY: "3.14".parse_float() OUT: 3.14
QUERY: "1e6".parse_float() OUT: 1000000.0
parse_bool
- Signature: String -> Boolean | null
- Behavior: Strict parse: only "true" and "false" (lowercase) match; everything else returns null.
QUERY: "true".parse_bool() OUT: true
QUERY: "TRUE".parse_bool() OUT: null
as cast (operator)
The as operator does the same coercions as to_*:
"42" as int # 42
42 as string # "42"
true as int # 1
Use as when the type is statically known; use to_number/parse_* when
parsing untrusted strings (since as errors on failure rather than returning
null).
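The difference between the lenient and the strict style can be sketched in Python. This is a model of the behavior described above, not Jetro code: to_number mirrors the documented null-on-failure coercion, while cast_int is a hypothetical stand-in for an as cast that fails loudly. Truncation of non-integer values in cast_int is an assumption of this sketch.

```python
def to_number(value):
    """Lenient coercion: returns None (null) when a string cannot be parsed."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return 1 if value else 0
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            try:
                return float(value)
            except ValueError:
                return None
    return None                          # assumption: other kinds are not coercible

def cast_int(value):
    """Strict cast: raises instead of returning None."""
    number = to_number(value)
    if number is None:
        raise ValueError(f"cannot cast {value!r} to int")
    return int(number)                   # assumption: fractional part is truncated

print(to_number("abc"))   # None
print(cast_int("42"))     # 42
```

With untrusted input, the lenient form pairs naturally with a default, as in $.user_input.parse_int() ?? 0.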
Round-trip JSON
For full document round-trip, see from_json/to_json.
Practical examples
# Coerce strings collected from a CSV
$.rows.map(r => r.merge({age: r.age.to_number(), price: r.price.parse_float()}))
# Defensive parse — null on garbage
$.user_input.parse_int() ?? 0
# Boolean coercion of a flag string
"true".parse_bool() ?? false
# Truthiness coercion
$.value.to_bool() # null/0/""/empty → false; else true
# Cast operator for static conversions
($.id as string).pad_left(8, "0")
# Round-trip number → string → back
(3.14 as string).parse_float() # → 3.14
Row Stream Source
rows() is a source builtin. It changes what the receiver means: instead of
querying one document value, it exposes a stream of rows.
rows()
- Signature: Source -> Stream<Row>
- Arity: zero
- Demand behavior: forwards retained-row demand to the source
- Supported stream stages: reverse, filter, find, distinct_by, take, first, map
Normal JSON
On a normal JSON document, $.rows() treats the document itself as one row:
DOC: {"id":1,"name":"Ada"}
QUERY: $.rows().map({id: $.id, name: $.name})
OUT: [{"id":1,"name":"Ada"}]
Top-level arrays are also one document row in normal JSON mode. Use normal array methods directly when the input document is an array.
NDJSON
In NDJSON mode, root $.rows() means all rows in the file or reader:
jetrocli --ndjson -i events.ndjson \
-e '$.rows().filter($.active).take(10).map({id: $.id, name: $.name})'
Without $.rows(), the same CLI mode is row-local:
jetrocli --ndjson -i events.ndjson -e '$.id'
Reverse
For file-backed NDJSON, reverse() scans from the end:
jetrocli --ndjson -i app.log \
-e '$.rows().reverse().find($.level == "error").first()'
Reader-backed reverse streams are unsupported because readers cannot seek.
Latest Per Key
For Kafka compacted-topic dumps, scan newest-to-oldest and keep the first row seen for each key:
jetrocli --ndjson -i users.ndjson --payload-after '|' \
-e '$.rows().reverse().distinct_by($.id).take(100).map({id: $.id, name: $.name})'
distinct_by in this stream order keeps the newest row for each key and drops
older duplicates immediately.
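The newest-first dedup pattern can be modeled in a few lines of Python. This is a sketch of the semantics only: the rows are held in an in-memory list rather than read backwards from a file, and latest_per_key is a hypothetical helper name.

```python
from itertools import islice

def latest_per_key(rows, key, limit):
    """Scan newest-to-oldest, keep the first row seen per key, stop at `limit`."""
    def distinct_newest_first():
        seen = set()
        for row in reversed(rows):   # newest rows sit at the end of the log
            k = key(row)
            if k not in seen:        # older duplicates are dropped immediately
                seen.add(k)
                yield row
    return list(islice(distinct_newest_first(), limit))

rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Ada L."},
]
latest = latest_per_key(rows, key=lambda r: r["id"], limit=100)
print(latest)  # [{'id': 1, 'name': 'Ada L.'}, {'id': 2, 'name': 'Bob'}]
```

Because the generator is lazy, a small limit stops the backward scan as soon as enough distinct keys have been seen.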
Notes
- rows() is currently root-level: use $.rows(), not $.books.rows().
- map is delayed or direct-written only when it is semantically safe.
- Unsupported stream methods fail before scanning input.
- For more examples, see NDJSON and Whole-Stream Queries.
Streaming One-to-One
Each input produces exactly one output. These compose freely; the planner fuses adjacent stages into a single composed stage when possible.
Fixture
Examples in this chapter run against:
{
"users": [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}],
"xs": [1, 2, 3, 4, 5],
"prices":[100, 105, 102, 110, 108, 115]
}
map
- Signature: Array<A> -> Array<B> (with f: A -> B)
- Demand law: MapLike. Preserves pull, forces Whole.
QUERY: $.users.map(u => u.name)
OUT: ["Ada","Bob"]
QUERY: $.xs.map(@ * 2)
OUT: [2, 4, 6, 8, 10]
QUERY: $.users.map(@.name.upper())
OUT: ["ADA","BOB"]
map is the workhorse. The lambda may use any of the three forms.
enumerate
- Signature: Array<A> -> Array<{index: Number, value: A}>
- Behavior: Pair each element with its zero-based index. Output is a record {index, value} per element.
QUERY: $.xs.enumerate()
OUT: [{"index":0,"value":1},{"index":1,"value":2},{"index":2,"value":3},{"index":3,"value":4},{"index":4,"value":5}]
QUERY: $.users.map(@.name).enumerate()
OUT: [{"index":0,"value":"Ada"},{"index":1,"value":"Bob"}]
pairwise
- Signature: Array<A> -> Array<[A, A]>
- Behavior: Yield consecutive pairs [xs[0], xs[1]], [xs[1], xs[2]], …
QUERY: [1,2,3,4].pairwise()
OUT: [[1,2],[2,3],[3,4]]
QUERY: $.xs.pairwise().map(p => p[1] - p[0])
OUT: [1, 1, 1, 1]
lag(n=1) and lead(n=1)
- Signature: Array<Number> -> Array<Number | null>
- Behavior: Shift by n positions; out-of-range positions become null.
- Numeric: Output values are returned as floats regardless of input numeric type.
QUERY: $.xs.lag()
OUT: [null, 1.0, 2.0, 3.0, 4.0]
QUERY: $.xs.lead()
OUT: [2.0, 3.0, 4.0, 5.0, null]
QUERY: $.xs.lag(2)
OUT: [null, null, 1.0, 2.0, 3.0]
diff_window(n=1)
- Signature: Array<Number> -> Array<Number | null>
- Behavior: xs[i] - xs[i - n], with null until the lag is satisfied.
QUERY: $.prices.diff_window()
OUT: [null, 5.0, -3.0, 8.0, -2.0, 7.0]
pct_change(n=1)
- Signature: Array<Number> -> Array<Number | null>
- Behavior: (xs[i] - xs[i-n]) / xs[i-n], the relative change.
QUERY: [100.0, 110.0, 121.0].pct_change()
OUT: [null, 0.1, 0.09999999999999998]
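lag, lead, diff_window, and pct_change all read one element at a fixed offset from the current position. The Python sketch below models that shared pattern with None standing in for null. It assumes n >= 1 and a nonzero denominator, and it is not Jetro's implementation, so the last floating-point digit of pct_change may differ from what the runtime prints.

```python
def lag(xs, n=1):
    """Value n positions earlier, or None when there is no such position."""
    return [float(xs[i - n]) if i - n >= 0 else None for i in range(len(xs))]

def lead(xs, n=1):
    """Value n positions later, or None when there is no such position."""
    return [float(xs[i + n]) if i + n < len(xs) else None for i in range(len(xs))]

def diff_window(xs, n=1):
    """xs[i] - xs[i - n], with None until the lag is satisfied."""
    return [float(xs[i] - xs[i - n]) if i - n >= 0 else None for i in range(len(xs))]

def pct_change(xs, n=1):
    """(xs[i] - xs[i - n]) / xs[i - n], with None until the lag is satisfied."""
    return [(xs[i] - xs[i - n]) / xs[i - n] if i - n >= 0 else None
            for i in range(len(xs))]

prices = [100, 105, 102, 110, 108, 115]
print(lag([1, 2, 3, 4, 5], 2))        # [None, None, 1.0, 2.0, 3.0]
print(diff_window(prices))            # [None, 5.0, -3.0, 8.0, -2.0, 7.0]
print(pct_change([2.0, 3.0, 6.0]))    # [None, 0.5, 1.0]
```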
cummax and cummin
- Signature: Array<Number> -> Array<Number>
- Behavior: Running max / min up to and including the current position.
QUERY: $.prices.cummax()
OUT: [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]
QUERY: $.prices.cummin()
OUT: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
zscore
- Signature: Array<Number> -> Array<Number>
- Behavior: Standardise: (x - mean) / stddev. Two passes (one for stats, one for transform); not strictly streaming, but presented as a one-to-one stage at the user surface.
QUERY: [1.0, 2.0, 3.0, 4.0, 5.0].zscore()
OUT: [-1.414213562373095, -0.7071067811865475, 0.0, 0.7071067811865475, 1.414213562373095]
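The two passes can be sketched in Python. The example output above is consistent with a population standard deviation (divide by n, not n - 1), so the sketch assumes that; it is a model of the arithmetic, not Jetro's code, and it does not handle empty input or zero variance.

```python
import math

def zscore(xs):
    """Standardise a numeric list using the population standard deviation."""
    n = len(xs)
    mean = sum(xs) / n                                   # pass 1: statistics
    variance = sum((x - mean) ** 2 for x in xs) / n      # population variance
    std = math.sqrt(variance)
    return [(x - mean) / std for x in xs]                # pass 2: transform

scores = zscore([1.0, 2.0, 3.0, 4.0, 5.0])
print(scores)
```

For this input the mean is 3 and the variance is 2, so the first element is -2 / sqrt(2), roughly -1.4142.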
accumulate
See Barriers — accumulate is a barrier because it requires
a custom reducer over the full input.
Practical examples
DOC: {"prices":[100, 105, 102, 110, 108, 115]}
# Apply tax to every price
QUERY: $.prices.map(@ * 1.08)
OUT: [108.0, 113.4, 110.16000000000001, 118.80000000000001, 116.64000000000001, 124.2]
# Day-over-day deltas
QUERY: [100,105,102,110,108].pairwise().map(p => p[1] - p[0])
OUT: [5, -3, 8, -2]
# Running max ("high-water mark")
QUERY: $.prices.cummax()
OUT: [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]
# Lag-1 to compare current vs previous
QUERY: $.prices.lag()
OUT: [null, 100.0, 105.0, 102.0, 110.0, 108.0]
Filtering
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "xs": [1, 2, 3, 4, 5]}
Methods that drop elements based on a predicate.
filter
- Signature: Array<A> -> Array<A> (with pred: A -> Bool)
- Demand law: FilterLike. FirstInput(n) from downstream becomes UntilOutput(n) upstream.
$.users.filter(u => u.active)
$.users.filter(@.age >= 18)
$.users.filter(@.email ~= "@admin\.")
filter is the universal predicate stage. Combine with .take(n) for
bounded scans:
$.events.filter(@.sev >= 3).take(10)
The planner stops reading from the source as soon as 10 events pass — no full scan.
find
- Signature: Array<A> -> A | null (first match only on this branch)
- Demand law: FilterLike with FirstInput(1) → source.
DOC: {"users": [{"id":1,"role":"user"},{"id":2,"role":"admin"}]}
QUERY: $.users.find(@.role == "admin")
OUT: {"id":2,"role":"admin"}
find returns the first match (or null if none), not an array. Use
find_all for the array form.
find_all
- Signature: Array<A> -> Array<A>
- Behavior: Like filter. Alias kept for readability.
$.users.find_all(@.role == "admin")
Equivalent to .filter(@.role == "admin"). The two are interchangeable.
compact
- Signature: Array<Any> -> Array<Any>
- Behavior: Drop nulls.
QUERY: [1, null, 2, null, 3].compact()
OUT: [1,2,3]
Equivalent to .filter(@ != null), but reads better and avoids a lambda.
take_while (alias takewhile)
- Signature: Array<A> -> Array<A>
- Behavior: Take elements while pred is true; stop at the first false (don't keep checking).
QUERY: [1, 2, 3, 4, 1, 2].take_while(@ < 3)
OUT: [1,2]
Demand law: bounded — terminates the source as soon as pred flips.
drop_while (alias dropwhile)
- Signature: Array<A> -> Array<A>
- Behavior: Drop the leading run where pred holds; emit the rest.
QUERY: [1, 2, 3, 4, 1, 2].drop_while(@ < 3)
OUT: [3,4,1,2]
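take_while and drop_while have the same semantics as the standard-library functions of the same name in Python's itertools, which makes for a convenient mental model (a comparison only, not a statement about how Jetro implements them):

```python
from itertools import dropwhile, takewhile

xs = [1, 2, 3, 4, 1, 2]

# Keep the leading run where the predicate holds, then stop for good.
taken = list(takewhile(lambda x: x < 3, xs))

# Skip the leading run where the predicate holds, then emit everything else.
dropped = list(dropwhile(lambda x: x < 3, xs))

print(taken)    # [1, 2]
print(dropped)  # [3, 4, 1, 2]
```

Note that the trailing 1 and 2 are not taken by take_while and are not dropped by drop_while: only the leading run is affected.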
remove
- Signature: Array<A> -> Array<A>
- Behavior: Inverse of filter. Drop elements where pred is true.
QUERY: $.xs.remove(@ < 0)
Useful when the negated predicate reads worse than the affirmative.
Filtering objects
For object filtering, see filter_keys and filter_values in
Objects. They take a predicate over keys / values and return
a filtered object.
Practical examples
DOC: {"users":[
{"id":1,"name":"Ada","active":true,"age":30},
{"id":2,"name":"Bob","active":false,"age":24},
{"id":3,"name":"Cy", "active":true,"age":42}
]}
# Active users only
QUERY: $.users.filter(@.active)
OUT: [{"id":1,"name":"Ada","active":true,"age":30},{"id":3,"name":"Cy","active":true,"age":42}]
# Active users over 30, just names
QUERY: $.users.filter(@.active and @.age >= 30).map(@.name)
OUT: ["Ada","Cy"]
# First active user (early-exit)
QUERY: $.users.find(@.active).name
OUT: "Ada"
# Take while a streak holds
QUERY: [1,2,3,4,1,2].take_while(@ < 3)
OUT: [1,2]
# Negate a predicate
QUERY: $.users.remove(@.active).count()
OUT: 1
# Drop nulls
QUERY: [1, null, 2, null, 3].compact()
OUT: [1,2,3]
Worked demand example
DOC: {"events": [
{"sev": 1, "msg": "ok"},
{"sev": 2, "msg": "warn"},
{"sev": 3, "msg": "err"},
{"sev": 1, "msg": "ok2"}
]}
QUERY: $.events.filter(@.sev >= 2).map(@.msg).take(2)
OUT: ["warn","err"]
Demand walks back: take(2) → FirstInput(2), map → preserves, filter → UntilOutput(2). Source reads events one-by-one, stops after the second match.
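The same early exit can be observed with Python generators, which are pull-based in a similar way. The sketch below is an analogy, not Jetro's planner: the source counts how many events it hands out, and itertools.islice plays the role of take(2).

```python
from itertools import islice

events = [
    {"sev": 1, "msg": "ok"},
    {"sev": 2, "msg": "warn"},
    {"sev": 3, "msg": "err"},
    {"sev": 1, "msg": "ok2"},
]

stats = {"reads": 0}

def source(items):
    """Yield items one by one, counting how many were actually read."""
    for item in items:
        stats["reads"] += 1
        yield item

filtered = (e for e in source(events) if e["sev"] >= 2)   # filter stage
mapped = (e["msg"] for e in filtered)                     # map stage
result = list(islice(mapped, 2))                          # take(2) sink

print(result)          # ['warn', 'err']
print(stats["reads"])  # 3
```

Three of the four events are read: one that fails the predicate and the two matches. The fourth is never touched.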
Expanding Sequences
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}]}
Each input produces zero or many outputs.
flat_map
- Signature: Array<A> -> Array<B> (with f: A -> Array<B>)
- Behavior: Map then concatenate.
QUERY: [[1,2],[3,4]].flat_map(@)
OUT: [1,2,3,4]
QUERY: $.users.flat_map(u => u.tags)
If f returns a non-array, it's wrapped first (flat_map(@ + 1) works on
numbers).
flatten
- Signature: Array<Array<A>> -> Array<A>
- Behavior: One level of flattening.
QUERY: [[1,2],[3],[4,5]].flatten()
OUT: [1,2,3,4,5]
To flatten more levels, chain: .flatten().flatten(). Or use walk for full
recursive flatten of arbitrary structure.
explode
⚠ 0.5.11 status:
explode requires an argument (errors with "explode: missing argument" on no-arg call). Spec is intended to mirror chars / to_pairs for the common cases; until then, use those builtins directly.
- Signature (intended): (Array | Object | String) -> Array<...>
- Behavior (intended): Convert to a flat sequence of elements / pairs / chars.
  - Array: identity
  - Object: array of [key, value] pairs (= to_pairs)
  - String: array of single-char strings (= chars)
split(sep)
- Signature: String -> Array<String>
- Behavior: Split a string on a literal separator. (See split_re for regex.)
QUERY: "a,b,c".split(",")
OUT: ["a","b","c"]
lines
- Signature: String -> Array<String>
- Behavior: Split on newline (\n or \r\n).
QUERY: "a\nb\nc".lines()
OUT: ["a","b","c"]
words
- Signature: String -> Array<String>
- Behavior: Split on whitespace (any run).
QUERY: " hello world ".words()
OUT: ["hello","world"]
chars
- Signature: String -> Array<String>
- Behavior: Array of single-character strings.
QUERY: "abc".chars()
OUT: ["a","b","c"]
chars_of(s)
- Signature: String -> Array<String>
- Behavior: Equivalent to s.chars(). Useful when the source is the argument:
QUERY: ($.text).chars_of()
bytes
- Signature: String -> Array<Number>
- Behavior: UTF-8 byte values, 0–255.
QUERY: "abc".bytes()
OUT: [97,98,99]
Demand notes
Expanding stages declare an indeterminate output count. Pull demand from downstream still flows back, but the planner can't tightly bound how many inputs are needed — it pulls one input at a time and yields outputs lazily.
.flat_map(...) followed by .first() will read inputs until the first
flat-mapped output appears, then stop.
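A lazy flat_map followed by first can be modeled with a Python generator. This is an analogy for the demand behavior described above, with a hypothetical reads list added only to make the amount of consumed input visible.

```python
def flat_map(items, f, reads):
    """Lazily expand each input into zero or more outputs."""
    for item in items:
        reads.append(item)       # record that this input was pulled
        yield from f(item)       # emit its outputs one at a time

books = [
    {"title": "A", "tags": []},                 # expands to nothing
    {"title": "B", "tags": ["sf", "hugo"]},     # first output comes from here
    {"title": "C", "tags": ["cyberpunk"]},      # never read
]

reads = []
first = next(flat_map(books, lambda b: b["tags"], reads), None)

print(first)       # sf
print(len(reads))  # 2
```

The first input contributes no outputs, so a second input has to be pulled; once one output exists, reading stops.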
Practical examples
# Flatten one level
[[1,2],[3,4],[5]].flatten() # → [1, 2, 3, 4, 5]
# Tags across all books
$.books.flat_map(@.tags)
# Distinct hashtags across tweets
$.tweets.flat_map(t => t.entities.hashtags.map(@.text)).unique()
# Word histogram from a paragraph
$.text.words().map(@.lower()).count_by(@)
# Parse CSV headers
"id,name,email".split(",")
# Process logs line by line
$.log_blob.lines().filter(@.contains_any(["ERROR","WARN"]))
# Char-level analysis
$.password.chars().count_by(@) # frequency of each char
# Bytes for a binary diff
"hello".bytes() # → [104, 101, 108, 108, 111]
Reducers and Aggregates
Reducers consume the whole stream and emit a single value. They terminate the streaming pipeline.
Numeric
| Method | Signature | Notes |
|---|---|---|
| sum | Array<Number> -> Number | Empty → 0 |
| avg | Array<Number> -> Number | Empty → null |
| min | Array<Number \| String> -> ... | Empty → null |
| max | Array<Number \| String> -> ... | Empty → null |
QUERY: [1,2,3,4].sum() OUT: 10
QUERY: [1,2,3,4].avg() OUT: 2.5
QUERY: [3,1,4,1,5].min() OUT: 1.0
QUERY: ["b","a","c"].max() OUT: "c"
Demand law: NumericReducer — ValueNeed::Numeric, pull = All.
count
- Signature: Array -> Number
- Behavior: Element count.
- Demand: All inputs, ValueNeed::None (no payload decoded).
QUERY: $.users.count()
QUERY: $.users.filter(@.active).count()
This is the cheapest reducer — the source skips deserialisation entirely.
approx_count_distinct
⚠ Not yet supported in 0.5.11 — runtime returns
"ApproxCountDistinct: builtin unsupported". Spec exists; HyperLogLog backend pending.
- Signature (planned): Array<Any> -> Number
- Behavior (planned): Approximate count of distinct values via HLL.
For now, use .unique().count() for exact distinct count.
any (alias exists)
- Signature: Array<A> -> Bool (with pred: A -> Bool)
- Behavior: True if any element matches. Short-circuits.
QUERY: $.users.any(@.role == "admin")
OUT: true
all
- Signature: Array<A> -> Bool
- Behavior: True if every element matches. Short-circuits on first false.
QUERY: $.flags.all(@ == true)
find_index
- Signature: Array<A> -> Number | null
- Behavior: Zero-based index of first match, or null.
QUERY: ["a","b","c"].find_index(@ == "b")
OUT: 1
indices_where
- Signature: Array<A> -> Array<Number>
- Behavior: All indices where pred matches.
QUERY: [10, 20, 5, 30, 8].indices_where(@ < 15)
OUT: [0,2,4]
max_by and min_by
- Signature: Array<A> -> A | null
- Behavior: Element with the maximum / minimum projected key.
QUERY: $.books.max_by(@.year)
QUERY: $.users.min_by(@.age)
Distinguish from .sort(@.key).last(): max_by is one pass, while the sort form
allocates the sorted array first (and an ascending sort puts the maximum last, not first).
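The cost difference is easy to see in a Python model. The sketch below is illustrative only: max_by here is a hand-written single pass, and keeping the first element on ties is an assumption of the sketch, not documented Jetro behavior.

```python
def max_by(items, key):
    """Single pass: remember the best element seen so far."""
    best = None
    best_key = None
    for item in items:
        k = key(item)
        if best is None or k > best_key:   # strict '>' keeps the first on ties
            best, best_key = item, k
    return best                            # None models null for empty input

books = [
    {"title": "Dune", "year": 1965},
    {"title": "Foundation", "year": 1951},
    {"title": "Hyperion", "year": 1989},
    {"title": "Snow Crash", "year": 1992},
]

one_pass = max_by(books, key=lambda b: b["year"])
via_sort = sorted(books, key=lambda b: b["year"])[-1]   # builds a full sorted copy

print(one_pass["title"])  # Snow Crash
print(via_sort["title"])  # Snow Crash
```

Both forms agree on the result here; the single pass simply never materialises an intermediate array.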
When to use which
| Goal | Use |
|---|---|
| Sum/avg numbers | sum, avg |
| Count rows | count |
| Exact distinct count | .unique().count() |
| Existence check | any |
| Universal check | all |
| Find index | find_index |
| Pick single max/min element | max_by, min_by |
Practical examples
DOC: {"books":[
{"title":"Dune","year":1965,"price":15},
{"title":"Foundation","year":1951,"price":10},
{"title":"Hyperion","year":1989,"price":18},
{"title":"Snow Crash","year":1992,"price":12}
]}
# Total revenue across all books
QUERY: $.books.map(@.price).sum()
OUT: 55
# Mean price
QUERY: $.books.map(@.price).avg()
OUT: 13.75
# Earliest and most expensive
QUERY: $.books.min_by(b => b.year).title
OUT: "Foundation"
QUERY: $.books.max_by(b => b.price).title
OUT: "Hyperion"
# Any cyberpunk in the catalog?
QUERY: $.books.any(@.tags? and @.tags.includes("cyberpunk"))
# (where @.tags? guards against missing field)
# Count books published before 1970
QUERY: $.books.filter(@.year < 1970).count()
OUT: 2
# Position of the first 1990s book
QUERY: $.books.find_index(@.year >= 1990)
OUT: 3
# Indices of all books where price > 12
QUERY: $.books.indices_where(@.price > 12)
OUT: [0,2]
Positional Access
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "transactions": [{"ts": "01"}, {"ts": "02"}, {"ts": "03"}]}
Bounded extraction by position.
first
- Signature: Array<A> -> A | null
- Demand law: First. Always FirstInput(1).
QUERY: [10,20,30].first() OUT: 10
QUERY: [].first() OUT: null
QUERY: $.users.filter(@.active).first()
# Source reads only enough to get one active user.
Equivalent to .nth(0) but reads better and is the canonical "early-exit"
sink.
last
- Signature: Array<A> -> A | null
- Demand law: Last. Always LastInput(1).
QUERY: [10,20,30].last() OUT: 30
When the source supports it (an in-memory array, or a tape with known
length), last seeks to the end; for streams it must drain.
nth(i)
- Signature: Array<A> -> A | null
- Demand law: NthInput(i) if i is non-negative; LastInput(-i) otherwise.
QUERY: [10,20,30,40].nth(2) OUT: 30
QUERY: [10,20,30,40].nth(-1) OUT: 40
find_first(pred)
- Signature: Array<A> -> A | null
- Behavior: Same as find; kept for naming clarity. Use find in new code.
find_one(pred)
- Signature: Array<A> -> A | null
- Behavior: Asserts at most one match; errors if more than one matches. Useful for "exactly one user with this id" shapes.
QUERY: $.users.find_one(@.id == 1)
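The at-most-one contract can be sketched in Python. This models the behavior described above and nothing more: the error type and message are hypothetical, and returning None for zero matches follows from the A | null signature.

```python
def find_one(items, pred):
    """Return the single matching item, None if there is none, error if several."""
    match = None
    found = False
    for item in items:
        if pred(item):
            if found:
                # Hypothetical error; Jetro's actual message is not shown in the text.
                raise ValueError("find_one: more than one match")
            match, found = item, True
    return match

users = [{"id": 1, "role": "admin"}, {"id": 2, "role": "user"}, {"id": 3, "role": "user"}]

print(find_one(users, lambda u: u["id"] == 1))   # {'id': 1, 'role': 'admin'}
print(find_one(users, lambda u: u["id"] == 9))   # None
```

Unlike find, this form cannot stop at the first match: it has to keep scanning to prove that no second match exists.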
collect
- Signature: Any -> Array<Any>
- Behavior: Coerce to array. Scalar → [scalar]; array → identity; null → [].
QUERY: 42.collect() OUT: [42]
QUERY: [1,2].collect() OUT: [1,2]
QUERY: null.collect() OUT: []
Use collect to guarantee an array shape at a pipeline boundary —
useful for callers that always want to iterate.
When to use a positional vs. a reducer
first() is a positional sink (returns one element). count() is a reducer
(returns one number). Both terminate the pipeline. Use whichever matches
your output type.
Worked example
DOC: {"orders": [
{"id": 1, "total": 100},
{"id": 2, "total": 50},
{"id": 3, "total": 200}
]}
QUERY: $.orders.filter(@.total > 75).first().id
OUT: 1
QUERY: $.orders.sort_by(@.total).last().id
OUT: 3
The first query early-exits (one filter pass, one match). The second sorts (barrier), then takes the last — the planner can't avoid the sort.
Practical examples
# First active user — early-exit, demand-aware
$.users.find(@.active).name
# Last log entry of severity 3+ (when the source supports random access)
$.logs.filter(@.sev >= 3).last().msg
# Get a user at known index
$.users.nth(2).email
# Negative-index array tail
$.transactions.nth(-1).ts
# Coerce-or-empty: scalar source becomes a 1-element array
"hello".collect() # → ["hello"]
null.collect() # → []
# Use collect() at a method-call boundary so callers always iterate
$.config.tags.collect().map(@.lower())
Barrier Operators
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}], "daily": [{"day": 1, "value": 10}, {"day": 2, "value": 12}]}
Barriers must see the full input before emitting any output. They materialise. Place them late in pipelines when possible.
Sort
sort (alias sort_by)
- Signature: Array<A> -> Array<A>
- Behavior: Stable ascending sort. With a projection, sorts by the projected key.
QUERY: [3,1,4,1,5].sort()
OUT: [1,1,3,4,5]
QUERY: $.books.sort(@.year)
QUERY: $.books.sort(b => -b.year)
QUERY: $.users.sort(@.last_name, @.first_name)
Multi-arg form sorts by a tuple of keys.
Distinct
unique (alias distinct)
- Signature: Array<A> -> Array<A>
- Behavior: Remove duplicates by structural equality, preserving first occurrence order.
QUERY: [3,1,4,1,5,9,2,6,5].unique()
OUT: [3,1,4,5,9,2,6]
unique_by(f)
- Signature: Array<A> -> Array<A>
- Behavior: Dedup by projected key.
QUERY: $.books.unique_by(@.author)
Group / count / index
group_by(key)
- Signature: Array<A> -> Object<KeyString, Array<A>>
- Behavior: Bucket by projected key.
QUERY: $.books.group_by(@.author)
OUT: {"Herbert":[{"title":"Dune",...}],"Asimov":[{"title":"Foundation",...}],...}
count_by(key)
- Signature: Array<A> -> Object<KeyString, Number>
- Behavior: Bucket counts.
QUERY: $.books.count_by(@.author)
OUT: {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}
index_by(key)
- Signature: Array<A> -> Object<KeyString, A>
- Behavior: Index by key. Last wins on collision.
QUERY: $.users.index_by(@.id)
OUT: {"1":{"id":1,"name":"Ada",...},"2":{"id":2,"name":"Bob",...},"3":{"id":3,"name":"Cy",...}}
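The three keyed barriers differ only in what each bucket holds. The Python sketch below models them with plain dictionaries; keys are stringified to match the object keys in the outputs above, and the sample rows are illustrative, not part of the fixture.

```python
def group_by(items, key):
    """Bucket whole items under their stringified key."""
    groups = {}
    for item in items:
        groups.setdefault(str(key(item)), []).append(item)
    return groups

def count_by(items, key):
    """Keep only a counter per key; no item payloads are buffered."""
    counts = {}
    for item in items:
        k = str(key(item))
        counts[k] = counts.get(k, 0) + 1
    return counts

def index_by(items, key):
    """Keep one item per key; a later item overwrites an earlier one."""
    index = {}
    for item in items:
        index[str(key(item))] = item
    return index

rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Ada L."},
]

print(count_by(rows, lambda r: r["id"]))   # {'1': 2, '2': 1}
print(index_by(rows, lambda r: r["id"]))   # {'1': {'id': 1, 'name': 'Ada L.'}, '2': {'id': 2, 'name': 'Bob'}}
```

All three must see the full input before the result object is complete, which is why they are classed as barriers.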
group_shape
⚠ Not yet supported in 0.5.11 — runtime returns
"GroupShape: builtin unsupported". Tracked for a future release.
- Signature: Array<Object> -> Array<Object>
- Behavior (planned): Group by structural shape (key set).
Partition
partition(pred)
- Signature: Array<A> -> [Array<A>, Array<A>]
- Behavior: Split into [matching, non_matching].
QUERY: $.books.partition(@.year < 1970)
OUT: [[{"title":"Dune",...},{"title":"Foundation",...}],[{"title":"Hyperion",...},{"title":"Snow Crash",...}]]
Window / chunk
window(size)
- Signature: Array<A> -> Array<Array<A>>
- Behavior: Sliding window of size elements.
QUERY: [1,2,3,4,5].window(3)
OUT: [[1,2,3],[2,3,4],[3,4,5]]
chunk(size) (alias batch)
- Signature: Array<A> -> Array<Array<A>>
- Behavior: Non-overlapping chunks. Last chunk may be shorter.
QUERY: [1,2,3,4,5,6,7].chunk(3)
OUT: [[1,2,3],[4,5,6],[7]]
Rolling aggregates
| Method | Behavior |
|---|---|
| rolling_sum(n) | Sum over a window of size n |
| rolling_avg(n) | Average over a window |
| rolling_min(n) | Min over a window |
| rolling_max(n) | Max over a window |
QUERY: [1,2,3,4,5].rolling_sum(3)
OUT: [null,null,6.0,9.0,12.0]
The leading n-1 positions emit null until the window fills.
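A rolling sum with the leading-null convention can be sketched in Python. This is a model of the documented output, not Jetro's implementation; it keeps a running total and subtracts the element that leaves the window, and None stands in for null.

```python
def rolling_sum(xs, n):
    """Sum over a trailing window of size n; None until the window fills."""
    out = []
    total = 0.0
    for i, x in enumerate(xs):
        total += x
        if i >= n:
            total -= xs[i - n]              # drop the element that left the window
        out.append(total if i >= n - 1 else None)
    return out

def rolling_avg(xs, n):
    """Average over the same window, derived from the rolling sum."""
    return [s / n if s is not None else None for s in rolling_sum(xs, n)]

print(rolling_sum([1, 2, 3, 4, 5], 3))  # [None, None, 6.0, 9.0, 12.0]
print(rolling_avg([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```

The running-total approach is O(1) per element; on long float series it can accumulate rounding error, which a recompute-per-window variant would avoid at higher cost.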
accumulate(init, fn)
- Signature: Array<A> -> Array<B> with fn: (B, A) -> B
- Behavior: Streaming fold producing intermediate states.
QUERY: [1,2,3,4].accumulate(0, (a, x) => a + x)
OUT: [1,3,6,10]
QUERY: [1,2,3,4].accumulate((a, x) => a + x)
OUT: [1,3,6,10]
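Both call forms can be modeled with one Python function. This sketch reproduces the two outputs shown above: with an initial value the first state is fn(init, xs[0]), and without one the first element seeds the fold. Note that Python's itertools.accumulate with initial= would also emit the initial value itself, which the documented output does not, so the sketch is written by hand.

```python
_MISSING = object()   # sentinel so that None can still be a legitimate init value

def accumulate(xs, fn, init=_MISSING):
    """Running fold that returns every intermediate state."""
    out = []
    it = iter(xs)
    if init is _MISSING:
        try:
            state = next(it)          # seed with the first element
        except StopIteration:
            return out                # empty input, empty output
        out.append(state)
    else:
        state = init
    for x in it:
        state = fn(state, x)
        out.append(state)
    return out

print(accumulate([1, 2, 3, 4], lambda a, x: a + x, 0))  # [1, 3, 6, 10]
print(accumulate([1, 2, 3, 4], lambda a, x: a + x))     # [1, 3, 6, 10]
```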
When to barrier
You have to barrier when:
- Order needs computation (sort, unique)
- Output is grouped / indexed (group_by, index_by)
- A window crosses element boundaries (window, rolling_*)
You don't need a barrier for:
- Per-element transforms (map)
- Predicates (filter)
- Numeric reducers (sum, count): they're streaming reducers, not barriers
Practical examples
DOC: {"books":[
{"title":"Dune","year":1965,"author":"Herbert","price":15},
{"title":"Foundation","year":1951,"author":"Asimov","price":10},
{"title":"Hyperion","year":1989,"author":"Simmons","price":18},
{"title":"Snow Crash","year":1992,"author":"Stephenson","price":12}
]}
# Sort by year ascending
QUERY: $.books.sort(b => b.year).map(@.title)
OUT: ["Foundation","Dune","Hyperion","Snow Crash"]
# Sort by price descending (negate the key)
QUERY: $.books.sort(b => -b.price).map(@.title)
OUT: ["Hyperion","Dune","Snow Crash","Foundation"]
# Distinct tags across books
QUERY: $.books.flat_map(@.tags).unique()
# How many distinct authors
QUERY: $.books.unique_by(b => b.author).count()
OUT: 4
# Group by author
QUERY: $.books.group_by(b => b.author)
OUT: {"Herbert":[{"title":"Dune",...}],"Asimov":[{"title":"Foundation",...}],...}
# Histogram of authors (prefer count_by — no buffering of bucket payloads)
QUERY: $.books.count_by(b => b.author)
OUT: {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}
# Build a quick lookup table
QUERY: $.users.index_by(u => u.id)
# Sliding-3 windows for moving stats
QUERY: $.measurements.window(3).map(w => w.sum() / 3)
# Split into batches of 10 for paginated processing
QUERY: $.records.chunk(10)
# 7-day moving average over a numeric series
QUERY: $.daily.rolling_avg(7)
Array and Set Operations
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "tags_today": ["a", "b", "c"], "tags_yesterday": ["b", "c", "d"], "left_tags": ["a", "b", "c"], "right_tags": ["b", "c", "d"]}
Operations that take an array and produce a derivative array (or join two arrays).
append(v) and prepend(v)
- Signature: Array<A> -> Array<A>
- Behavior: Add v to the end / front.
QUERY: [1,2,3].append(4) OUT: [1,2,3,4]
QUERY: [1,2,3].prepend(0) OUT: [0,1,2,3]
When used as chain-write terminals ($.path.append(v)), they patch the
document — see Patch.
reverse
- Signature: Array<A> -> Array<A>
- Behavior: Reverse element order. Also works on strings (calls reverse_str).
QUERY: [1,2,3].reverse() OUT: [3,2,1]
QUERY: "abc".reverse() OUT: "cba"
Set-like operations
| Method | Behavior |
|---|---|
| diff(other) | Elements in self not in other |
| intersect(other) | Elements in both |
| union(other) | Elements in either, deduped |
QUERY: [1,2,3,4].diff([3,4,5]) OUT: [1,2]
QUERY: [1,2,3,4].intersect([3,4,5]) OUT: [3,4]
QUERY: [1,2,3].union([3,4,5]) OUT: [1,2,3,4,5]
Equality is structural. Order: result preserves first-occurrence order from the left operand.
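The ordering rule can be made concrete with a Python model that uses list membership instead of hashing, so nested arrays and objects compare by value. It is a sketch under two stated assumptions: Python's == only approximates JSON structural equality (for example, True == 1 in Python), and whether diff and intersect also dedupe the left operand is not specified in the text, so the sketch leaves left-side duplicates in place.

```python
def diff(left, right):
    """Elements of left that do not appear in right, in left order."""
    return [x for x in left if x not in right]

def intersect(left, right):
    """Elements of left that also appear in right, in left order."""
    return [x for x in left if x in right]

def union(left, right):
    """Elements of either side, deduped, in first-occurrence order."""
    out = []
    for x in list(left) + list(right):
        if x not in out:
            out.append(x)
    return out

print(diff([1, 2, 3, 4], [3, 4, 5]))        # [1, 2]
print(intersect([1, 2, 3, 4], [3, 4, 5]))   # [3, 4]
print(union([3, 1, 2], [2, 5, 1]))          # [3, 1, 2, 5]
```

The list-membership test is quadratic; it is chosen here for clarity and because it works on unhashable values such as nested lists.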
join(sep)
- Signature: Array<String> -> String
- Behavior: Concatenate strings with separator.
QUERY: ["a","b","c"].join(", ")
OUT: "a, b, c"
QUERY: $.users.map(@.name).join(" / ")
For non-string elements, lift with .map(@.to_string()) first.
zip(other) and zip_longest(other, fill?)
- Signature: Array<A>, Array<B> -> Array<[A, B]>
- Behavior: Pair element-wise.
QUERY: [1,2,3].zip(["a","b","c"])
OUT: [[1,"a"],[2,"b"],[3,"c"]]
QUERY: [1,2,3].zip(["a","b"]) OUT: [[1,"a"],[2,"b"]]
QUERY: [1,2,3].zip_longest(["a","b"]) OUT: [[1,"a"],[2,"b"],[3,null]]
QUERY: [1,2,3].zip_longest(["a"], "x") OUT: [[1,"a"],[2,"x"],[3,"x"]]
fanout(...lambdas)
- Signature: A -> Array<...>
- Behavior: Apply each lambda to the same input; collect results.
DOC: {"x": 10}
QUERY: $.x.fanout(@ * 2, @ + 1, @.to_string())
OUT: [20,11,"10"]
Useful for building multi-shape projections without repeating subexpressions.
zip_shape(arrays)
⚠ Not yet supported in 0.5.11 — runtime returns
"ZipShape: builtin unsupported". Spec exists; runtime hookup pending.
- Signature (planned): Object<KeyString, Array<A>> -> Array<Object>
- Behavior (planned): Combine parallel arrays under shared keys into an array of objects.
The inverse is pivot — see Objects.
Demand notes
Set operations and join are barriers (they consume both inputs fully).
reverse is a barrier too — but it's cheap and well-supported by demand:
reverse().take(n) is rewritten so the source seeks to the end.
Practical examples
# Add an item to a tag list
$.user.tags.append("admin") # patches the doc
# Build a "label = value" string
$.user.pick(name, email).values().join(" = ")
# CSV row from selected fields
[$.user.id, $.user.name, $.user.email].join(",")
# Set difference — find items missing from a baseline
[1,2,3,4,5].diff([2,4]) # → [1, 3, 5]
# Set intersection — common items
$.left_tags.intersect($.right_tags)
# Merge unique values, preserving first-occurrence order
$.tags_today.union($.tags_yesterday)
# Reverse and take last 5 (demand-aware: seeks end)
$.events.reverse().take(5)
# Pair two arrays positionally
[1,2,3].zip(["a","b","c"]) # → [[1,"a"],[2,"b"],[3,"c"]]
# Pad shorter array with default
[1,2,3].zip_longest(["a","b"], "?") # → [[1,"a"],[2,"b"],[3,"?"]]
# Run several projections at once
$.metric.value.fanout(@ * 2, @ + 1, @ - 1) # → [v*2, v+1, v-1]
Object Projection and Transform
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}
Methods that read or rewrite objects.
Keys and values
| Method | Signature | Result |
|---|---|---|
| keys | Object -> Array<String> | Insertion-order key list |
| values | Object -> Array<Any> | Insertion-order value list |
| entries | Object -> Array<[String, Any]> | Key-value pairs |
| to_pairs | Object -> Array<[String, Any]> | Alias of entries |
DOC: {"a": 1, "b": 2}
QUERY: $.keys() OUT: ["a","b"]
QUERY: $.values() OUT: [1,2]
QUERY: $.entries() OUT: [["a",1],["b",2]]
from_pairs
- Signature: Array<[String, Any]> -> Object
- Behavior: Inverse of to_pairs.
QUERY: [["a",1],["b",2]].from_pairs()
OUT: {"a":1,"b":2}
invert
- Signature: Object<K, V> -> Object<V, K>
- Behavior: Swap keys and values. Values must be coercible to keys (string-like).
QUERY: {"a":"x","b":"y"}.invert()
OUT: {"x":"a","y":"b"}
pick(field, ...)
- Signature: Object -> Object
- Behavior: Keep only the named keys. Supports alias: src rename.
DOC: {"id": 1, "name": "Ada", "secret": "!"}
QUERY: $.pick(id, name)
OUT: {"id":1,"name":"Ada"}
QUERY: $.pick(uid: id, name)
OUT: {"name":"Ada","uid":1}
Maps over arrays of objects:
$.users.pick(id, email)
is equivalent to $.users.map(u => u.pick(id, email)).
omit(field, ...)
- Signature: Object -> Object
- Behavior: Inverse of pick. Drop the named keys.
QUERY: $.user.omit(secret, password)
Merge
| Method | Behavior |
|---|---|
| merge(other) | Shallow merge: other's keys win on collision |
| deep_merge(other) | Recursive merge: sub-objects merged, arrays replaced |
| defaults(other) | Reverse merge: keep self's keys, fill missing from other |
QUERY: {"a":1,"b":2}.merge({"b":99,"c":3})
OUT: {"a":1,"b":99,"c":3}
QUERY: {"a":{"x":1}}.deep_merge({"a":{"y":2}})
OUT: {"a":{"x":1,"y":2}}
QUERY: {"a":1}.defaults({"a":99,"b":2})
OUT: {"a":1,"b":2}
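The three merge flavors differ only in which side wins and whether nesting is followed. The Python sketch below models the table above with plain dicts; it is an illustration of the documented behavior, not Jetro's code, and the function names are simply borrowed from the methods.

```python
def merge(a, b):
    """Shallow merge: b's keys win on collision."""
    return {**a, **b}


def deep_merge(a, b):
    """Recursive merge: nested objects are merged, everything else is replaced."""
    out = dict(a)
    for key, value in b.items():
        if isinstance(out.get(key), dict) and isinstance(value, dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value  # scalars and arrays are replaced, not combined
    return out


def defaults(a, b):
    """Reverse merge: keep a's keys, fill only the missing ones from b."""
    return {**b, **a}


print(merge({"a": 1, "b": 2}, {"b": 99, "c": 3}))
print(deep_merge({"a": {"x": 1}}, {"a": {"y": 2}}))
print(defaults({"a": 1}, {"a": 99, "b": 2}))
```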
rename(...mapping)
- Signature: Object -> Object
- Behavior: Rename keys per a {old: new, ...} mapping.
QUERY: $.user.rename({user_id: id, full_name: name})
transform_keys(fn) and transform_values(fn)
- Signature: Object -> Object
- Behavior: Apply fn to every key / value.
QUERY: {"foo": 1, "bar": 2}.transform_keys(@.upper())
OUT: {"BAR":2,"FOO":1}
QUERY: {"a": 1, "b": 2}.transform_values(@ * 10)
OUT: {"a":10,"b":20}
filter_keys(pred) and filter_values(pred)
- Signature: Object -> Object
- Behavior: Keep entries whose key / value matches the predicate.
QUERY: $.config.filter_keys(k => k.starts_with("aws_"))
QUERY: $.scores.filter_values(@ >= 50)
pivot(rows, cols, value)
- Signature: Array<Object> -> Object<KeyString, Object>
- Behavior: Pivot a table-shaped array into a nested object indexed by rows then cols, with value as the leaf.
DOC: [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY: $.pivot("y", "q", "v")
OUT: {"2024":{"1":10,"2":20},"2025":{"1":15}}
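The pivot above amounts to a two-level grouping. A minimal Python sketch follows, assuming row and column values are stringified to become object keys (as the OUT line suggests); the helper is illustrative and not Jetro's implementation.

```python
def pivot(rows, row_key, col_key, value_key):
    """Build {row: {col: value}} from a list of flat records."""
    out = {}
    for record in rows:
        # JSON object keys are strings, so numeric row/col values are stringified.
        out.setdefault(str(record[row_key]), {})[str(record[col_key])] = record[value_key]
    return out


doc = [
    {"y": 2024, "q": 1, "v": 10},
    {"y": 2024, "q": 2, "v": 20},
    {"y": 2025, "q": 1, "v": 15},
]
print(pivot(doc, "y", "q", "v"))
```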
implode(joiner=",")
- Signature: Array<String> -> String
- Behavior: Like join, but works on object values too:
QUERY: {"a":"x","b":"y"}.values().implode("/")
OUT: "x/y"
Demand notes
pick is a powerful demand signal — it tells the source which fields are
needed. Over a wide-record document, pick(id, name) upstream of the rest
of the pipeline avoids decoding all the other fields.
keys over an array stage emits one row per element, but keys over a
single object is a scalar.
Practical examples
DOC: {"users":[
{"id":1,"name":"Ada","email":"ada@x.com","secret":"!"},
{"id":2,"name":"Bob","email":"bob@y.org","secret":"?"}
]}
# Project safe public fields
QUERY: $.users.map(u => u.pick(id, name, email))
# Drop sensitive keys
QUERY: $.users.map(u => u.omit(secret))
# Rename in flight
QUERY: $.users.map(u => u.pick(uid: id, full_name: name, email))
# Keys / values / entries
QUERY: $.users[0].keys() → ["id","name","email","secret"]
QUERY: $.users[0].values().count() → 4
QUERY: $.users[0].entries().count() → 4
# Round-trip through entries
QUERY: $.users[0].entries().from_pairs() → equivalent to $.users[0]
# Merge with defaults (existing keys win)
QUERY: $.config.defaults({timeout: 30, retries: 3})
# Deep-merge config layers
QUERY: $.base_config.deep_merge($.user_config)
# Filter object by key prefix
QUERY: $.env.filter_keys(k => k.starts_with("AWS_"))
# Filter values
QUERY: $.scores.filter_values(@ >= 50)
# Apply transform to every value
QUERY: $.prices.transform_values(@ * 1.08)
# Normalise keys to snake_case
QUERY: $.payload.transform_keys(k => k.snake_case())
# Invert a code-to-name table
QUERY: $.country_codes.invert() # {"US":"United States",...} → {"United States":"US",...}
# Pivot long-format records
DOC: [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY: $.pivot("y","q","v")
OUT: {"2024":{"1":10,"2":20},"2025":{"1":15}}
Path and Structural Mutation
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}
Methods that read, set, delete, or rewrite values at specific paths within a document. These work on whole documents or sub-trees.
For chain-write terminals ($.path.set(v)) see Patch.
This chapter documents the method-call versions.
get_path(path)
- Signature: Any, String -> Any | null
- Behavior: Read a value at a slash- or dot-separated path.
DOC: {"user": {"profile": {"name": "Ada"}}}
QUERY: $.get_path("user")
OUT: {"profile":{"name":"Ada"}}
QUERY: $.get_path("user/profile")
OUT: {"name":"Ada"}
set_path(path, value)
- Signature: Any, String, Any -> Any
- Behavior: Return a copy with value written at path. Creates intermediate objects as needed.
QUERY: $.set_path("user/profile/email", "ada@example.com")
del_path(path)
- Signature: Any, String -> Any
- Behavior: Return a copy with the leaf at path removed.
QUERY: $.del_path("user/secret")
del_paths(paths)
- Signature: Any, Array<String> -> Any
- Behavior: Remove all listed paths in one pass. Cheaper than chained del_path for many removals.
QUERY: $.del_paths(["user/secret", "user/temp", "session/csrf"])
has_path(path)
- Signature: Any, String -> Bool
- Behavior: True if a path exists and resolves to a non-null value.
Current 0.5.11 behavior treats a present null like a missing path:
DOC: {"a": null}
QUERY: $.has_path("a") OUT: false
QUERY: $.has_path("b") OUT: false
flatten_keys(sep="/")
- Signature: Object -> Object
- Behavior: Flatten a nested object into a single-level object with joined keys.
DOC: {"a": {"b": 1, "c": 2}, "d": 3}
QUERY: $.flatten_keys()
OUT: {"a/b":1,"a/c":2,"d":3}
QUERY: $.flatten_keys(".")
OUT: {"a.b":1,"a.c":2,"d":3}
unflatten_keys(sep="/")
- Signature: Object -> Object
- Behavior: Inverse of flatten_keys.
QUERY: {"a/b": 1, "a/c": 2}.unflatten_keys()
OUT: {"a":{"b":1,"c":2}}
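To make the flatten/unflatten relationship concrete, here is a small Python model. It assumes the documented default separator "/" and treats empty objects as leaves; both helpers are illustrative sketches rather than Jetro's implementation, and a key that itself contains the separator would not round-trip.

```python
def flatten_keys(obj, sep="/", prefix=""):
    """Collapse nested dicts into one level, joining key segments with `sep`."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten_keys(value, sep, path))
        else:
            flat[path] = value  # scalars, arrays, and empty objects stay as leaves
    return flat


def unflatten_keys(flat, sep="/"):
    """Rebuild nesting by splitting each key on `sep`."""
    out = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = out
        for segment in parents:
            node = node.setdefault(segment, {})
        node[leaf] = value
    return out


nested = {"a": {"b": 1, "c": 2}, "d": 3}
print(flatten_keys(nested))
print(flatten_keys(nested, "."))
print(unflatten_keys(flatten_keys(nested)))
```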
set(path, value) (method-call form)
- Signature: Any, String, Any -> Any
- Behavior: Same as set_path. Kept for ergonomic chains.
The chain-write terminal $.path.set(v) is different — it's parsed as
a patch and operates on the rooted document path.
update
update is jetro's functional batched update. Two surfaces:
Object body — update({k: expr, ...})
Apply a set of field updates to one or more selected subtrees. Plain keys update fields below the receiver; quoted keys carry full paths.
DOC: {"books": [
{"title": "Dune", "year": 1965, "tags": ["sf"]},
{"title": "Hyperion", "year": 1989, "tags": ["sf", "hugo"]}
]}
QUERY: $.books[*].update({tags: tags.append("test"), reviewed: true})
OUT: {"books":[{"reviewed":true,"tags":["sf","test"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","hugo","test"],"title":"Hyperion","year":1989}]}
Each selected book gets both fields written. Plain identifiers (tags,
reviewed) are read against the selected snapshot — not the
mid-batch document — so two ops on the same target both see the original
field values.
Body forms:
| Form | Meaning |
|---|---|
| field: expr | Write expr into field of each selected target |
| "a.b.c": expr | Write into a nested path inside each selected target |
| "books[*].tags": expr | Quoted path key: full root-relative path with wildcards/filters |
| field: expr when cond | Skip when cond is falsy |
| field: DELETE | Remove the field (with optional when) |
@ inside the body is the current value at the target field (handy
inside path keys); $ is the original root.
QUERY: $.books[*].update({tags: tags.append("modern") when year > 1980})
OUT: {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","hugo","modern"],"title":"Hyperion","year":1989}]}
Root-level batch with quoted paths
When the receiver is $, quoted keys carry full paths, including
wildcards and DELETE:
DOC: {"books": [{"tags": ["sf"]}], "active": true}
QUERY: $.update({"books[*].tags": @.append("test"), active: false})
OUT: {"active":false,"books":[{"tags":["sf","test"]}]}
DOC: {"users": [{"id":1,"secret":"a"}, {"id":2,"secret":"b"}]}
QUERY: $.update({"users[*].secret": DELETE})
OUT: {"users":[{"id":1},{"id":2}]}
Filtered wildcard [* if pred]
Both selectors and quoted path keys support a filtered wildcard:
DOC: {"books": [
{"title": "Dune", "year": 1965, "tags": ["sf"]},
{"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}
QUERY: $.books[* if year > 1980].update({tags: tags.append("modern")})
OUT: {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}
QUERY: $.update({"books[* if year > 1980].tags": @.append("modern")})
OUT: {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}
Two-argument path form — update(path, expr)
The classic shape: a slash- or dot-separated path plus an expression.
@ inside the expression is the current value at path.
DOC: {"counters": {"visits": 10, "clicks": 3}}
QUERY: $.update("counters.visits", @ + 1)
OUT: {"counters":{"clicks":3,"visits":11}}
QUERY: $.update("counters/visits", @ + 1)
OUT: {"counters":{"clicks":3,"visits":11}}
Semantics
| Property | Behavior |
|---|---|
| Snapshot reads | Each body expression sees the pre-batch values, not partial mid-batch state |
| Order | Ops apply in source order — last write wins on overlap |
| Selectors | Index, wildcard [*], filtered wildcard [* if pred], nested chains all OK |
| Scalar targets | An update with object body promotes scalar elements to objects ({seen: true} over [1,2] → [{seen:true},{seen:true}]) |
| Untouched subtrees | Preserved by Arc sharing — no deep copy of unrelated fields |
| Empty body | .update({}) is a no-op — returns the doc unchanged |
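The snapshot-read and last-write-wins rows of the table can be illustrated with a toy model. The Python below is a sketch under stated assumptions: batched_update is a hypothetical helper that handles flat fields on a single target, and it does not model selectors, DELETE, or Arc sharing.

```python
import copy


def batched_update(target, ops):
    """Apply (field, fn) ops in order; every fn reads the pre-batch snapshot."""
    snapshot = copy.deepcopy(target)   # frozen view that all expressions read
    result = dict(target)              # shallow copy that receives the writes
    for field, fn in ops:
        result[field] = fn(snapshot)   # a later write to the same field wins
    return result


book = {"title": "Dune", "year": 1965, "tags": ["sf"]}
updated = batched_update(book, [
    ("tags", lambda b: b["tags"] + ["test"]),
    ("tag_count", lambda b: len(b["tags"])),     # reads the original tags
    ("tags", lambda b: b["tags"] + ["later"]),   # overlaps the first op
])
print(updated)
```

In this model tag_count is 1 because it reads the snapshot, and the final tags value comes from the last op because overlapping writes are applied in source order.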
Worked example
DOC: {"users": [
{"id": 1, "secret": "a", "name": "Ada"},
{"id": 2, "secret": "b", "name": "Bob"}
]}
QUERY: $.users.map(u => u.omit("secret").set_path("display", u.name))
OUT: [{"display":"Ada","id":1,"name":"Ada"},{"display":"Bob","id":2,"name":"Bob"}]
Demand notes
Path-mutation methods produce a full result and can't tell the source what
fields they need (the path is data, not statically analysable). When the
path is a literal, prefer pick/omit/set over get_path/set_path —
the planner can use literal field names.
Practical examples
# Single-key write
$.user.name.set("Ada Lovelace") # chain-write
# Set a field deep
patch $ { user.profile.email: "ada@x.com" }
# Bulk delete
$.del_paths(["secret","temp","csrf"])
# Flatten a nested config for environment-variable export
$.config.flatten_keys(".") # {"db.host":..., "db.port":..., ...}
# Round-trip via flatten/unflatten
$.config.flatten_keys().unflatten_keys() # ≈ $.config
# Existence test before write
patch $ {
email: $.user.email when $.has_path("user.email")
}
# Flat-key patches
$.patch_set.flatten_keys().entries().map(([k,v]) => $.set_path(k, v))
# Batched functional update
$.books[*].update({
reviewed: true,
tags: tags.append("classic") when year < 1970,
tmp: DELETE
})
Deep Traversal and Recursion
Walk every descendant value in DFS pre-order. The deep methods are also
available as ..method(...) syntax sugar in path position.
deep_find(pred) (or ..find(pred))
- Signature: Any -> Array<Any>
- Behavior: Every descendant satisfying pred. Order: DFS pre-order.
DOC: {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
QUERY: $..find(@.x?)
OUT: [{"x":1},{"x":2}]
QUERY: $.deep_find(@ is number)
OUT: [1,2,3]
When the structural index is available, deep_find runs over a bitmap
representation in jetro-experimental rather than walking Val nodes —
significantly faster for shallow predicates.
deep_shape({k1, k2, ...}) (or ..shape({...}))
- Signature: Any -> Array<Object>
- Behavior: Every object that has all listed keys (regardless of value).
DOC: [{"id":1,"name":"a"},{"id":2},{"name":"c","id":3}]
QUERY: $..shape({id, name})
OUT: [{"id":1,"name":"a"},{"id":3,"name":"c"}]
deep_like({k1: v1, ...}) (or ..like({...}))
- Signature: Any -> Array<Object>
- Behavior: Every object whose listed keys equal the listed literal values.
DOC: [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942},{"author":"Herbert","year":1965}]
QUERY: $..like({author: "Asimov"})
OUT: [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942}]
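The three deep searches above share one traversal and differ only in the predicate. The Python sketch below models that with a DFS pre-order generator; it is an illustration of the documented results, not the bitmap-backed implementation, and it assumes the root itself is also a candidate.

```python
def descendants(node):
    """Yield the node, then its children, in DFS pre-order."""
    yield node
    if isinstance(node, dict):
        for child in node.values():
            yield from descendants(child)
    elif isinstance(node, list):
        for child in node:
            yield from descendants(child)


def deep_find(doc, pred):
    return [n for n in descendants(doc) if pred(n)]


def deep_shape(doc, keys):
    """Objects that contain every listed key, whatever the values are."""
    return deep_find(doc, lambda n: isinstance(n, dict) and all(k in n for k in keys))


def deep_like(doc, wanted):
    """Objects whose listed keys equal the listed literal values."""
    return deep_find(
        doc,
        lambda n: isinstance(n, dict) and all(k in n and n[k] == v for k, v in wanted.items()),
    )


def is_number(n):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(n, (int, float)) and not isinstance(n, bool)


doc = {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
print(deep_find(doc, lambda n: isinstance(n, dict) and "x" in n))
print(deep_find(doc, is_number))
```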
walk(fn)
- Signature: Any, (Any -> Any) -> Any
- Behavior: Apply fn to every node bottom-up; rebuild the tree.
QUERY: $.walk(node => node.upper() if node is string else node)
# Returns the document with every string node uppercased.
walk_pre(fn)
- Signature: Any, (Any -> Any) -> Any
- Behavior: Like walk, but pre-order: fn sees the parent before its children.
Use walk_pre when the transform decides whether to recurse based on the
node's identity (e.g. "stop at leaves of kind X").
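The difference between the two orders is easiest to see by recording visit order. The Python below is a sketch, not Jetro's implementation; it assumes only values (not keys) are nodes, and that walk_pre descends into whatever fn returned for the parent.

```python
def walk(node, fn):
    """Bottom-up: children are rewritten first, then fn sees the rebuilt node."""
    if isinstance(node, dict):
        node = {k: walk(v, fn) for k, v in node.items()}
    elif isinstance(node, list):
        node = [walk(v, fn) for v in node]
    return fn(node)


def walk_pre(node, fn):
    """Pre-order: fn sees the parent first, then the walk descends into its result."""
    node = fn(node)
    if isinstance(node, dict):
        return {k: walk_pre(v, fn) for k, v in node.items()}
    if isinstance(node, list):
        return [walk_pre(v, fn) for v in node]
    return node


def record_order(walker, doc):
    """Return the kinds of nodes in the order the walker hands them to fn."""
    seen = []

    def visit(node):
        seen.append(type(node).__name__)
        return node

    walker(doc, visit)
    return seen


print(record_order(walk, {"a": [1]}))      # leaves first
print(record_order(walk_pre, {"a": [1]}))  # root first
print(walk({"name": "ada", "tags": ["x"], "n": 1},
           lambda n: n.upper() if isinstance(n, str) else n))
```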
rec(pattern, fn)
⚠ Limited in 0.5.11: recursive rewrites are guarded with a 10 000 iteration cap. Prefer walk or walk_pre for one-pass document traversal, and keep rec for bounded fixpoint-style rewrites.
- Signature (planned): Any, Pattern, (Any -> Any) -> Any
- Behavior (planned): Match-and-rewrite. Recursively walks; replaces every match with fn(match).
This is the recursive sibling of Pattern Match; useful for AST rewrites and document migrations.
trace_path(pred)
- Signature: Any, (Any -> Bool) -> Array<Array<Step>>
- Behavior: For every node matching pred, return the path from root to the node as an array of steps.
DOC: {"a": {"x": 1}, "b": [{"x": 2}]}
QUERY: $.trace_path(@.x?)
OUT: [{"path":"$.a","value":{"x":1}},{"path":"$.b[0]","value":{"x":2}}]
The steps are the keys/indices to walk to reach the match. Pair with
set_path for find-and-replace operations.
Deep match
The pattern-match construct has deep variants ..match and ..match! —
see Control Flow and the pattern-match
cookbook.
When the bitmap kicks in
Deep search uses the structural index when:
- The query is rooted at $.. or .deep_*
- The predicate is a shape/key check (not a complex lambda)
- The document was loaded with the simd-json tape (default)
You don't enable this — it's selected by the planner.
Demand notes
Deep traversals declare All upstream by nature. The optimisation surface
is the predicate: shape and like checks bypass the per-node lambda
evaluation entirely.
Practical examples
# Find every node with an "id" key (anywhere in the tree)
$..find(@.id?)
# Find all numbers
$..find(@ is number)
# Every object that has both id + name keys
$..shape({id, name})
# Every object where a field equals a specific value
$..like({status: "error"})
# Locate an event by ID inside a deeply nested tree
$..match! { {id: 42} -> @, _ -> null }
# Walk every node, transforming strings to upper
$.walk(node => node.upper() if node is string else node)
# Trace paths from root to nodes matching a predicate
$.trace_path(@.is_admin?)
# → [["users",0],["users",2]]
# Bulk audit: find every "secret"-named field
$..find(@.secret?)
Membership and Predicates
Tests and small helpers.
or(default)
- Signature: Any, Any -> Any
- Behavior: If self is null, return default. Otherwise return self.
QUERY: null.or("default") OUT: "default"
QUERY: "hi".or("default") OUT: "hi"
Equivalent to ?? default but reads better in chains:
$.user.name.or("anon")
has(key)
- Signature: Object|Array, KeyOrIndex -> Bool
- Behavior: True if the key exists (objects) or index is in range (arrays).
QUERY: {"a":1,"b":2}.has("a") OUT: true
QUERY: {"a":1}.has("b") OUT: false
QUERY: [1,2,3].has(2) OUT: true
QUERY: [1,2,3].has(5) OUT: false
The has operator (x has y) is sugar for x.includes(y) — distinct
from this method.
has_key(key)
- Signature: Object, String -> Bool
- Behavior: True if the receiver is an object and the key exists.
QUERY: {"a":1,"b":null}.has_key("a") OUT: true
QUERY: {"a":1,"b":null}.has_key("b") OUT: true
QUERY: {"a":1}.has_key("z") OUT: false
QUERY: [1,2,3].has_key("0") OUT: false
Use has_key when you specifically mean object-key existence. It is narrower
than has, which makes direct object-key checks easier for the planner to optimize.
missing(...keys)
- Signature: Object, ...String -> Array<String>
- Behavior: Return the subset of provided keys that are not present.
QUERY: {"host":"localhost","port":5432}.missing("host", "port", "user")
OUT: ["user"]
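The three existence helpers above are easy to confuse, so here is a compact Python model of the documented results. It is a sketch under assumptions: negative indices are treated as out of range, and a key whose value is null still counts as present.

```python
def has(value, key):
    """Key exists (objects) or index is in range (arrays)."""
    if isinstance(value, dict):
        return key in value
    if isinstance(value, list):
        # Assumption: only non-negative integer indices count as in range.
        return isinstance(key, int) and 0 <= key < len(value)
    return False


def has_key(value, key):
    """True only when the receiver is an object and the key exists."""
    return isinstance(value, dict) and key in value


def missing(obj, *keys):
    """Subset of the requested keys that the object does not contain."""
    return [k for k in keys if k not in obj]


print(has([1, 2, 3], 2), has([1, 2, 3], 5))
print(has_key({"a": 1, "b": None}, "b"), has_key([1, 2, 3], "0"))
print(missing({"host": "localhost", "port": 5432}, "host", "port", "user"))
```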
includes(value) (alias contains)
- Signature: Array|String, Any -> Bool
- Behavior: Membership.
QUERY: [1,2,3].includes(2) OUT: true
QUERY: "hello".includes("ell") OUT: true
index(value)
- Signature: Array|String, Any -> Number | null
- Behavior: Index of first occurrence; null if not found.
QUERY: [10,20,30].index(20) OUT: 1
QUERY: [10,20,30].index(99) OUT: null
For strings, see also index_of in String Search.
indices_of(value)
- Signature: Array|String, Any -> Array<Number>
- Behavior: All indices of value.
QUERY: [1,2,3,2,1].indices_of(2)
OUT: [1, 3]
Quick comparison: predicates that look similar
| Pattern | Returns |
|---|---|
| obj.has_key("foo") | Bool: does this object key exist? |
| xs.has("foo") | Bool: key/index style existence helper |
| xs.includes("foo") | Bool: is the value present? |
| x has y | Bool: membership/containment operator |
| doc.has_path("a.b") | Bool: does this nested path exist? |
| xs.index("foo") | Number or null: where? |
| xs.indices_of("foo") | Array: all positions |
| xs.find(p) | A or null: first matching element |
| xs.find_index(p) | Number or null: first matching index |
Practical examples
# Default for missing field
$.user.email.or("no-email@example.com")
# Existence check on key
$.config.has_key("aws_region")
# Which required config keys are absent
$.config.missing("host", "port", "user")
# Index of a value (not the predicate form)
$.tags.index("admin")
# All positions of duplicates
[1, 2, 1, 3, 1].indices_of(1) # → [0, 2, 4]
# Membership in a set
$.tags.includes("urgent")
# Allow-list / deny-list patterns
$.role.includes("admin") and not $.banned_users.includes($.id)
Tabular Output
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}]}
Serialise sequences of objects to row-oriented text formats.
to_csv(headers?)
- Signature: Array<Object> -> String
- Behavior: RFC-4180-ish CSV. Without arguments, the header set is the union of object keys, ordered by first appearance.
DOC: [{"name":"Ada","age":36},{"name":"Bob","age":42}]
QUERY: $.to_csv()
OUT:
"name,age
Ada,36
Bob,42"
With explicit headers:
QUERY: $.to_csv(["age","name"])
OUT:
"age,name
36,Ada
42,Bob"
Strings containing commas, quotes, or newlines are quoted and escaped per RFC 4180.
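The header and quoting rules described above can be approximated with Python's csv module. This is a sketch of the documented behavior, not Jetro's serializer; rendering a missing or null cell as an empty string and using "\n" as the row terminator are assumptions made for the example.

```python
import csv
import io
import json


def cell(value):
    """Render one cell: strings as-is, everything else as compact JSON."""
    if value is None:
        return ""  # assumption: missing or null values become empty cells
    if isinstance(value, str):
        return value
    return json.dumps(value, separators=(",", ":"))


def to_csv(rows, headers=None):
    if headers is None:
        headers = []
        for row in rows:          # union of keys, ordered by first appearance
            for key in row:
                if key not in headers:
                    headers.append(key)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # quotes commas, quotes, newlines
    writer.writerow(headers)
    for row in rows:
        writer.writerow([cell(row.get(h)) for h in headers])
    return buf.getvalue().rstrip("\n")


people = [{"name": "Ada", "age": 36}, {"name": "Bob", "age": 42}]
print(to_csv(people))
print(to_csv(people, ["age", "name"]))
print(to_csv([{"q": 'say "hi", ok'}]))
```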
to_tsv(headers?)
- Signature:
Array<Object> -> String - Behavior: Same as
to_csvbut tab-separated. No quoting (tab-in-value is replaced with a space).
QUERY: $.users.to_tsv(["id","email"])
Composing with the rest of the pipeline
Build a report:
$.users
.filter(@.active)
.map(u => u.pick(id, name, email))
.sort(@.id)
.to_csv()
Pipe to a file from the CLI:
jetrocli -e '$.users.filter(@.active).pick(id,name).to_csv()' < users.json > out.csv
Limitations
- Nested values are JSON-encoded into the cell. For deeply nested structures, flatten first with flatten_keys:
$.records.map(r => r.flatten_keys()).to_csv()
- The format is row-major. To reshape between long and wide formats, use pivot / zip_shape first.
- For Excel-flavored CSV (BOM, CRLF), post-process the result.
Practical examples
# Active-user export
$.users.filter(@.active).map(u => u.pick(id, name, email)).sort(u => u.id).to_csv()
# Daily sales report
$.sales.group_by(s => s.day).entries().map(e => {
day: e[0],
total: e[1].map(@.amount).sum(),
count: e[1].count()
}).to_csv()
# Hashtag frequency CSV
$.tweets.flat_map(t => t.entities.hashtags.map(@.text))
.count_by(@)
.entries()
.map(e => {tag: e[0], count: e[1]})
.to_csv()
# TSV for log shipping
$.logs.map(l => l.pick(ts, sev, msg)).to_tsv()
Relational
Fixture
Examples below run against:
DOC: {"orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "customers": [{"id": 1, "name": "Ada", "email": "ada@x.com"}, {"id": 2, "name": "Bob", "email": "bob@y.org"}], "left": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "right": [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}]}
Operations that combine two arrays of objects on a key.
equi_join(other, leftKey, rightKey, fn?)
- Signature: Array<L>, Array<R>, KeyL, KeyR, ((L, R) -> Any)? -> Array<Any>
- Behavior: Inner equi-join: for every pair (l, r) where l[leftKey] == r[rightKey], emit a result. If fn is omitted, the result is the merged object l.merge(r).
LEFT: [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}]
RIGHT: [{"uid":1,"role":"admin"},{"uid":2,"role":"user"}]
QUERY: $.left.equi_join($.right, "id", "uid")
OUT: [{"id":1,"name":"Ada","uid":1,"role":"admin"},
{"id":2,"name":"Bob","uid":2,"role":"user"}]
QUERY: $.left.equi_join($.right, "id", "uid", (l, r) => {
name: l.name,
role: r.role
})
OUT: [{"name":"Ada","role":"admin"},{"name":"Bob","role":"user"}]
Worked example: orders + customers
DOC:
{
"customers": [
{"id": 1, "name": "Ada"},
{"id": 2, "name": "Bob"}
],
"orders": [
{"customer": 1, "amount": 100},
{"customer": 1, "amount": 50},
{"customer": 2, "amount": 75}
]
}
QUERY:
$.orders.equi_join($.customers, "customer", "id", (o, c) => {
customer: c.name,
amount: o.amount
})
OUT:
[
{"customer":"Ada","amount":100},
{"customer":"Ada","amount":50},
{"customer":"Bob","amount":75}
]
Notes and limitations
- Inner only. No outer joins. For "all left, fill missing right with null" you can hand-roll:
$.left.map(l => l.merge($.right.find(@.uid == l.id).or({role: null})))
- Equality only. No range, prefix, or function joins.
- One key on each side. For multi-key joins, project a tuple key first:
$.left.map(l => l.merge({_k: [l.a, l.b]}))
.equi_join($.right.map(r => r.merge({_k: [r.x, r.y]})), "_k", "_k")
- The implementation builds a hash on the right side; left is streamed. Pre-sort or pre-filter before joining if either side is large and only a subset matters.
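The "hash on the right, stream the left" strategy mentioned in the notes is a classic hash join. The Python below sketches that idea with the same argument order as equi_join; it models the documented results, not Jetro's actual code, and it assumes join keys are hashable scalars and that rows lacking the key never match.

```python
from collections import defaultdict


def equi_join(left, right, left_key, right_key, fn=None):
    """Inner equi-join: build an index over `right`, then stream `left` once."""
    if fn is None:
        fn = lambda l, r: {**l, **r}       # default result: l.merge(r)
    index = defaultdict(list)
    for r in right:                        # build phase, O(len(right))
        if right_key in r:
            index[r[right_key]].append(r)
    out = []
    for l in left:                         # probe phase, O(len(left))
        if left_key not in l:
            continue                       # assumption: a missing key never matches
        for r in index.get(l[left_key], []):
            out.append(fn(l, r))
    return out


customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"customer": 1, "amount": 100},
    {"customer": 1, "amount": 50},
    {"customer": 2, "amount": 75},
]
print(equi_join(orders, customers, "customer", "id",
                lambda o, c: {"customer": c["name"], "amount": o["amount"]}))
```

Output order follows the left input, which is why the streamed side determines the shape of the result.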
When to choose join vs. lookup
For "many left rows, lookup one field on each":
$.orders.map(o => o.merge({customer_name: $.customers.find(@.id == o.customer).name}))
This nested find is O(n×m) — fine for small data. For large data, use
equi_join (O(n+m)) or build a lookup table first:
let by_id = $.customers.index_by(@.id) in
$.orders.map(o => o.merge({customer_name: by_id[o.customer].name}))
Practical examples
# Enrich orders with customer info
$.orders.equi_join($.customers, "customer_id", "id")
# Custom result shape
$.orders.equi_join($.customers, "customer_id", "id", (o, c) => {
order_id: o.id,
total: o.amount,
buyer: c.name,
email: c.email
})
# Self-join: pair records that share a key
$.events.equi_join($.events, "session_id", "session_id", (a, b) => {a, b})
# Multi-key join via tuple projection
let lk = $.left.map(l => l.merge({_k: f"{l.a}-{l.b}"})) in
let rk = $.right.map(r => r.merge({_k: f"{r.x}-{r.y}"})) in
lk.equi_join(rk, "_k", "_k")
# Filter-then-join (drop rows before paying join cost)
$.orders.filter(@.status == "paid").equi_join($.customers, "cid", "id")
Chained Pipelines
Real-world queries assembled from the building blocks. Each recipe uses one small document and shows the query chain plus a sentence on what the planner does.
1. Top-N by aggregate
DOC: {"sales": [
{"region": "NA", "amount": 100},
{"region": "EU", "amount": 200},
{"region": "NA", "amount": 50},
{"region": "AS", "amount": 300},
{"region": "EU", "amount": 75}
]}
QUERY: $.sales
.group_by(@.region)
.entries()
.map(([region, rows]) => {region, total: rows.map(@.amount).sum()})
.sort(@.total)
.reverse()
.take(2)
OUT: [{"region":"AS","total":300},{"region":"EU","total":275}]
group_by and sort are barriers; take(2) after the sort doesn't help —
the sort must complete first. Push the demand earlier where possible.
2. Active users + role-based count
DOC: {"users": [
{"id":1,"role":"admin","active":true},
{"id":2,"role":"user","active":false},
{"id":3,"role":"user","active":true},
{"id":4,"role":"admin","active":true}
]}
QUERY: $.users
.filter(@.active)
.count_by(@.role)
OUT: {"admin":2,"user":1}
Streaming filter + barrier count_by. The filter passes only what's needed;
count_by buffers but with ValueNeed::Predicate (only the role key) — the
rest of the user object is never decoded.
3. Histogram of word frequency
DOC: {"text": "the quick brown fox jumps over the lazy dog the end"}
QUERY: $.text
.words()
.map(@.lower())
.count_by(@)
OUT: {"the": 3, "quick": 1, "brown": 1, ...}
4. Customer order summary
QUERY: $.orders
.group_by(@.customer_id)
.entries()
.map(([cid, orders]) => {
customer_id: cid,
total: orders.map(@.amount).sum(),
count: orders.count(),
recent: orders.sort(@.date).last().date
})
.sort_by(@.total)
.reverse()
The inner .sort(@.date).last() is wasteful: it sorts every group to grab
the last. Rewrite with max_by:
QUERY: ...
.map(([cid, orders]) => {
customer_id: cid,
total: orders.map(@.amount).sum(),
count: orders.count(),
recent: orders.max_by(@.date).date
})
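The cost difference behind this rewrite is the usual sort-versus-scan trade-off. A small Python comparison, assuming ISO-formatted date strings so that lexicographic order equals chronological order; the helper names are illustrative only.

```python
def latest_by_sort(orders):
    """Sort the whole group just to read the last element: O(n log n)."""
    return sorted(orders, key=lambda o: o["date"])[-1]["date"]


def latest_by_max(orders):
    """Single pass that tracks the maximum: O(n), no intermediate array."""
    return max(orders, key=lambda o: o["date"])["date"]


orders = [
    {"date": "2024-01-01", "amount": 100},
    {"date": "2024-03-01", "amount": 75},
    {"date": "2024-02-01", "amount": 50},
]
print(latest_by_sort(orders), latest_by_max(orders))
```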
5. Unique recent active sessions
QUERY: $.events
.filter(@.kind == "login" and @.at >= "2026-01-01")
.map(@.user_id)
.unique()
.count()
6. Pretty-print a CSV from objects
QUERY: $.users
.filter(@.active)
.map(u => u.pick(id: id, name: full_name, email))
.sort(@.id)
.to_csv()
7. Find a needle in a deep document
QUERY: $..find(@.id == 42)
If the document was loaded from bytes (default), this hits the structural index — no full traversal.
8. Compute deltas with pairwise
DOC: {"prices": [100, 105, 102, 110, 108]}
QUERY: $.prices.pairwise().map(([a, b]) => b - a)
OUT: [5,-3,8,-2]
9. Rolling 3-point moving average
QUERY: $.measurements.rolling_avg(3)
The first two outputs are null until the window fills.
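The null-until-full behavior can be pictured with a short Python model. This is a sketch of the described output shape only (None stands in for null); the sample data is made up and the real rolling_avg may differ in numeric details such as integer versus float results.

```python
def rolling_avg(xs, n):
    """Trailing n-point average; positions before the window fills are None."""
    out = []
    for i in range(len(xs)):
        if i + 1 < n:
            out.append(None)                    # window not full yet
        else:
            window = xs[i + 1 - n : i + 1]      # the last n values up to i
            out.append(sum(window) / n)
    return out


print(rolling_avg([3, 6, 9, 12], 3))
```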
10. Build a lookup, then enrich
QUERY: let by_id = $.users.index_by(@.id) in
$.events.map(e => e.merge({user: by_id[e.user_id].name}))
index_by is a barrier that runs once; the .map streams.
11. Select rows with all required fields
QUERY: $.records.filter(r => r.missing("id", "name", "email").count() == 0)
12. Re-shape a long-format table
DOC: [
{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},
{"y":2025,"q":1,"v":15},{"y":2025,"q":2,"v":25}
]
QUERY: $.pivot("y", "q", "v")
OUT: {"2024":{"1":10,"2":20},"2025":{"1":15,"2":25}}
13. Mask sensitive fields
QUERY: $.users.map(u => u.omit("password", "ssn", "token"))
14. Delta + cumulative sum
DOC: {"daily":[{"value":10},{"value":15},{"value":12},{"value":20}]}
QUERY: $.daily
.pairwise()
.map(([a, b]) => b.value - a.value)
OUT: [5,-3,8]
For a running total, use accumulate:
DOC: {"amounts":[10,12,9]}
QUERY: $.amounts.accumulate(0, (total, x) => total + x)
OUT: [10,22,31]
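Deltas and running totals are inverse-style operations, which a few lines of Python make visible. The helpers below are illustrative models of pairwise and accumulate as used in this recipe, not Jetro's implementations.

```python
def pairwise(xs):
    """Adjacent pairs: [x0, x1], [x1, x2], ..."""
    return [[a, b] for a, b in zip(xs, xs[1:])]


def deltas(xs):
    return [b - a for a, b in pairwise(xs)]


def accumulate(xs, start, fn):
    """Running fold that emits the accumulator after each element."""
    out, total = [], start
    for x in xs:
        total = fn(total, x)
        out.append(total)
    return out


print(deltas([10, 15, 12, 20]))
print(accumulate([10, 12, 9], 0, lambda total, x: total + x))
```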
15. Classify rows with match
DOC: {"books": [
{"title":"Dune","year":1965,"tags":["sf"]},
{"title":"Snow Crash","year":1992,"tags":["sf","cyberpunk"]},
{"title":"Foundation","year":1951,"tags":["sf","hugo"]}
]}
QUERY: $.books
.map(book => {
title: book.title,
era: match book with {
{year: y} when y < 1970 -> f"classic {y}",
{year: y} -> f"modern {y}",
_ -> "unknown"
},
tag_count: book.tags.count()
})
OUT: [
{"title":"Dune","era":"classic 1965","tag_count":1},
{"title":"Snow Crash","era":"modern 1992","tag_count":2},
{"title":"Foundation","era":"classic 1951","tag_count":2}
]
16. Latest active rows from NDJSON
jetrocli --ndjson -i users.topic --payload-after '|' -e '
$.rows()
.reverse()
.distinct_by(@.id)
.filter(@.active)
.take(100)
.map({
id: @.id,
name: @.profile.name,
city: @.profile.address.city
})
'
On a compacted Kafka-style file, reversing the rows makes the newest record for
each key appear first. distinct_by(@.id) keeps that first row and discards older
duplicates as soon as the key has been seen.
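The order of stages matters here: de-duplicating before filtering means a key whose newest record is inactive is dropped entirely. The Python below sketches that pipeline over an in-memory list; latest_active is a hypothetical helper and says nothing about how Jetro reads the file.

```python
def latest_active(rows, limit):
    """Newest-first scan: keep the first row per id, then filter, then stop early."""
    seen = set()
    out = []
    for row in reversed(rows):        # reverse(): newest record first
        if row["id"] in seen:         # distinct_by(@.id): older duplicates are skipped
            continue
        seen.add(row["id"])
        if row["active"]:             # filter(@.active) runs on the newest row only
            out.append(row)
            if len(out) == limit:     # take(n): stop as soon as enough rows exist
                break
    return out


rows = [
    {"id": 1, "active": True, "v": "old"},
    {"id": 2, "active": True, "v": "only"},
    {"id": 1, "active": False, "v": "new"},
    {"id": 3, "active": True, "v": "a"},
    {"id": 3, "active": True, "v": "b"},
]
print(latest_active(rows, 100))
```

In this sample, id 1 does not appear in the result even though an older active record exists, because its newest record is inactive.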
17. Patch several paths in one pass
DOC: {"books":[
{"title":"Dune","year":1965,"tags":["sf"],"tmp":true},
{"title":"Snow Crash","year":1992,"tags":["sf"],"tmp":true}
]}
QUERY: $.update({
"books[*].tags": @.append("catalog"),
"books[*].reviewed": true,
"books[*].tmp": DELETE
})
OUT: {"books":[
{"title":"Dune","year":1965,"tags":["sf","catalog"],"reviewed":true},
{"title":"Snow Crash","year":1992,"tags":["sf","catalog"],"reviewed":true}
]}
The planner can batch compatible rooted writes so shared ancestors are cloned once and all writes under that prefix are applied together.
18. Migrate a document shape
Use walk when every nested object with a matching shape must be rewritten:
QUERY:
$.walk(node =>
node.merge({type: "v2"})
.rename({old_field: "new_field"})
.omit("legacy_blob")
if node is object and node.type == "v1" else node)
For query-local rewrites on known paths, prefer update(...); for broad shape
migration, walk makes the traversal explicit.
Pattern Match Cookbook
Fixture
Examples below run against:
DOC: {"xs": [1, 2, 3, 4, 5], "row": {"k": "foo", "data": {"a": 1, "b": 2}}, "doc": {"a": 1, "b": 2, "type": "v1"}, "tree": {"x": 1, "children": [{"x": 2}]}, "value": 3.14}
Pattern matching is one of jetro's most expressive features. It compiles to
a Maranget decision tree at lowering time and runs over all three execution
domains (Val, borrowed View, tape).
Anatomy
match scrutinee with {
pattern1 -> expr1,
pattern2 when guard -> expr2,
_ -> default
}
- Arms checked top-down.
- First match wins.
- _ is the universal fallback.
- when guards run after the structural match succeeds.
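The evaluation rules above (top-down, first match wins, guards checked only after the structure matches) can be sketched as a tiny interpreter in Python. Everything here is hypothetical illustration: Bind, match_pattern and run_match are made-up names, and the sketch scans arms linearly rather than building the decision tree Jetro compiles to.

```python
class Bind:
    """Marks a pattern position that captures whatever value it sees."""
    def __init__(self, name):
        self.name = name

def match_pattern(pattern, value, env):
    """Return True if value matches pattern, recording captures in env."""
    if isinstance(pattern, Bind):
        env[pattern.name] = value
        return True
    if isinstance(pattern, dict):
        # Object pattern: every listed key must exist and match; extra keys are ignored.
        return isinstance(value, dict) and all(
            k in value and match_pattern(p, value[k], env) for k, p in pattern.items()
        )
    return pattern == value  # literal pattern

def run_match(scrutinee, arms):
    """arms is a list of (pattern, guard_or_None, body); the first matching arm wins."""
    for pattern, guard, body in arms:
        env = {}
        # The guard only runs once the structural match has succeeded.
        if match_pattern(pattern, scrutinee, env) and (guard is None or guard(env)):
            return body(env)
    raise ValueError("no arm matched")

era_arms = [
    ({"year": Bind("y")}, lambda e: e["y"] < 1970, lambda e: f"classic {e['y']}"),
    ({"year": Bind("y")}, None, lambda e: f"modern {e['y']}"),
    (Bind("_"), None, lambda e: "unknown"),
]

print(run_match({"title": "Dune", "year": 1965}, era_arms))        # classic 1965
print(run_match({"title": "Snow Crash", "year": 1992}, era_arms))  # modern 1992
print(run_match({"title": "Untitled"}, era_arms))                  # unknown
```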
Pattern reference
| Pattern | Matches |
|---|---|
| 42, "x", true, null | Equal literal |
| _ | Anything |
| name | Anything, binds to name |
| 1..10 | Number ≥ 1 and < 10 |
| 1..=10 | Number ≥ 1 and ≤ 10 |
| {k: p, ...} | Object with key k, value matches p |
| [p1, p2] | Array of length 2 |
| [h, ...t] | Head + tail |
| p1 \| p2 | Either |
| x: number | Kind-bind |
Object shorthand {id, name} binds each key to a same-name local. Rest
captures are spelled ...*rest for objects and ...tail for arrays:
{id, name, ...*rest}, [h, ...tail].
1. Discriminated union
match $.event with {
{kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
{kind: "key", code: c} -> f"key:{c}",
{kind: "scroll", dy: d} -> f"scroll:{d}",
_ -> "unknown"
}
Literal discriminants and shorthand captures can be mixed, so the click arm
could also be written as {kind: "click", x, y}.
2. Numeric ranges
match $.score with {
s when s < 0 -> "invalid",
0..50 -> "low",
50..80 -> "medium",
80..=100 -> "high",
_ -> "out of range"
}
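The boundary behaviour of the range arms is easy to get wrong, so here is the same classification written out with explicit comparisons in Python. It follows the pattern reference (a..b excludes b, a..=b includes it); the function name classify is just for illustration.

```python
def classify(score):
    if score < 0:            # guard arm: s when s < 0
        return "invalid"
    if 0 <= score < 50:      # 0..50   (upper bound excluded)
        return "low"
    if 50 <= score < 80:     # 50..80  (upper bound excluded)
        return "medium"
    if 80 <= score <= 100:   # 80..=100 (upper bound included)
        return "high"
    return "out of range"

for s in (-1, 49.9, 50, 80, 100, 100.5):
    print(s, classify(s))
```

Because the half-open ranges meet exactly at 50 and 80, every score from 0 to 100 lands in exactly one arm.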
3. Or-patterns
match $.day with {
"sat" | "sun" -> "weekend",
_ -> "weekday"
}
4. Object rest capture
match $.config with {
{host, port, ...*extra} -> {host, port, extra},
_ -> null
}
5. Array shape
match $.coords with {
[x, y] -> {x, y},
[x, y, z] -> {x, y, z},
_ -> null
}
6. Head + tail
match $.xs with {
[] -> "empty",
[first, ...rest] -> f"head={first}, count={rest.count()}",
}
7. Kind-bound + guard
match $.value with {
s: string when s.len() > 100 -> "long string",
s: string -> "short string",
n: number when n > 0 -> "positive",
n: number -> "non-positive",
_: array -> "array",
_ -> "other"
}
8. Deep match (..match)
Walk every descendant; collect results.
$.tree..match {
{kind: "leaf", value} -> value,
_ -> null
} | .compact()
The trailing .compact() drops the nulls from non-leaf descendants.
9. First-match deep (..match!)
Stops at the first match — the bang variant uses early termination via the structural index where possible.
$.tree..match! {
{role: "admin", id} -> id,
_ -> null
}
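The difference between collecting every deep match and stopping at the first one can be modelled with a lazy generator in Python. This is a sketch under assumptions: it treats the root as one of the visited nodes, walks depth-first, and reads "first match" as the first non-null result; the sample tree is invented, and Jetro's structural index is not modelled at all.

```python
def descendants(node):
    """Lazily yield node and every nested value, depth-first."""
    yield node
    if isinstance(node, dict):
        for child in node.values():
            yield from descendants(child)
    elif isinstance(node, list):
        for child in node:
            yield from descendants(child)

def leaf_value(node):
    # Arm 1: {kind: "leaf", value} -> value.  Arm 2: _ -> null.
    if isinstance(node, dict) and node.get("kind") == "leaf" and "value" in node:
        return node["value"]
    return None

tree = {"kind": "branch", "children": [
    {"kind": "leaf", "value": 1},
    {"kind": "branch", "children": [{"kind": "leaf", "value": 2}]},
]}

# Collect-all form followed by compact(): visit everything, drop the nulls.
all_leaves = [v for v in map(leaf_value, descendants(tree)) if v is not None]

# First-match form: stop pulling from the generator once a result appears.
first_leaf = next((v for v in map(leaf_value, descendants(tree)) if v is not None), None)

print(all_leaves)  # [1, 2]
print(first_leaf)  # 1
```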
10. Migration / rewrite (rec)
$.doc.rec({type: "v1"}, node => node.merge({type: "v2"}))
rec is the recursive sibling of match — it descends and rewrites every
matching node.
11. Cross-arm sharing
When multiple arms test the same prefix ({kind: "x", ...},
{kind: "y", ...}), the lowering shares the discriminant test. You don't
write anything special — the planner does it for you. Practically: write
many narrow arms; they cost about as much as one big switch.
12. Guards over deep patterns
match $.row with {
{user: {age, role: "admin"}} when age >= 18 -> "adult admin",
{user: {age}} when age < 18 -> "minor",
_ -> "other"
}
Bench tips
- Patterns with literal-only discriminants (no guards) compile to switch-like decision trees and run as fast as a hand-written if/else if.
- Guards add a per-arm conditional; cheap, but don't put expensive computation in them.
- Deep ..match over a large doc benefits a lot from the structural index; deep ..match! (first-match) is even better.
Kafka Compacted Topic Dumps
Kafka compacted topics keep the latest value for each key logically. A file dump can still contain older values earlier in the file:
user-a|{"id":"a","version":1,"name":"Ada"}
user-b|{"id":"b","version":1,"name":"Bob"}
user-a|{"id":"a","version":2,"name":"Ada Lovelace"}
user-c|null
Here user-c|null is a tombstone. With jetrocli, query only the JSON
payload after the separator and skip tombstones:
jetrocli --ndjson -i users.topic --payload-after '|' -e '$.id'
Latest N Unique Keys
Scan from the tail, keep the first row seen for each logical id, then project only the retained rows:
jetrocli --ndjson -i users.topic --payload-after '|' \
-e '$.rows()
.reverse()
.distinct_by($.id)
.take(100)
.map({id: $.id, version: $.version, name: $.name})'
Why this works:
- $.rows() switches from row-local mode to one stream over the file.
- reverse() starts at the newest records.
- distinct_by($.id) keeps the first row per key in that reverse order.
- take(100) stops after 100 retained unique keys.
- map(...) shapes only the rows that survived selection.
Find One Recent Record
jetrocli --ndjson -i users.topic --payload-after '|' \
-e '$.rows().reverse().find($.id == "user-42").first()'
This can stop as soon as the newest matching record is found.
Keep Only Active Latest Records
Filter before de-duplication when the key should be unique among active rows:
jetrocli --ndjson -i users.topic --payload-after '|' \
-e '$.rows()
.reverse()
.filter($.active)
.distinct_by($.id)
.take(500)
.map({id: $.id, email: $.email})'
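Whether filter comes before or after distinct_by changes the answer, not just the cost. The Python sketch below models both orders on a tiny invented dump; distinct_by here is a hand-written helper with the "keep the first row per key" behaviour described above, not the Jetro implementation.

```python
def distinct_by(rows, key):
    """Keep the first row seen for each key; later duplicates are dropped."""
    seen = set()
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield row

# Oldest first, as in the dump file. The newest record for "a" is inactive.
rows = [
    {"id": "a", "version": 1, "active": True},
    {"id": "b", "version": 1, "active": True},
    {"id": "a", "version": 2, "active": False},
]
newest_first = rows[::-1]

# filter -> distinct_by: the latest *active* row per key.
filter_then_dedup = list(
    distinct_by((r for r in newest_first if r["active"]), key=lambda r: r["id"])
)

# distinct_by -> filter: the latest row per key, kept only if it is active.
dedup_then_filter = [
    r for r in distinct_by(newest_first, key=lambda r: r["id"]) if r["active"]
]

print([(r["id"], r["version"]) for r in filter_then_dedup])  # [('b', 1), ('a', 1)]
print([(r["id"], r["version"]) for r in dedup_then_filter])  # [('b', 1)]
```

Filtering first resurrects the older active version of "a"; de-duplicating first treats the inactive version 2 as the final word on that key.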
If tombstones carry important delete semantics for your workload, use
--null-payload keep and handle null explicitly. The default skip policy is
best when you only want live JSON payloads.
Write Fusion
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}
When a query contains multiple chain-writes, jetro fuses them into a single pass over the document. This is the patch-fusion optimizer.
What gets fused
Any sequence of chain-write terminals on the same document:
$.user.name.set("Ada")
.user.email.set("ada@x.com")
.user.tags.append("admin")
Or the equivalent block form (preferred for many writes):
patch $ {
user.name: "Ada",
user.email: "ada@x.com",
user.tags[*]: "admin"
}
Without fusion
Naively, three writes mean three traversals from $:
$ → user → name (write)
$ → user → email (write)
$ → user → tags[*] (write)
Each rebuilds the path from the root. For deeply-nested documents, the cost adds up.
With fusion
The optimizer collects effects, walks the document once, and applies all relevant rewrites at each visited node:
$ → user → {set name, set email, append tags}
Three writes, one walk.
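A rough Python model of the difference, counting how many objects get cloned on the way to each write. It is a sketch under assumptions: documents are nested dicts only, writes are plain sets, no write targets both a node and one of its descendants, and the function names are invented. It says nothing about Jetro's actual optimizer phases.

```python
def set_path(doc, path, value, stats):
    """Copy-on-write set: clone every object from the root down to the target."""
    stats["clones"] += 1
    out = dict(doc)
    head, rest = path[0], path[1:]
    out[head] = value if not rest else set_path(doc.get(head, {}), rest, value, stats)
    return out

def apply_unfused(doc, writes, stats):
    """One root-to-leaf traversal per write."""
    for path, value in writes:
        doc = set_path(doc, path, value, stats)
    return doc

def apply_fused(doc, writes, stats):
    """Group writes by their next path segment so each shared ancestor is cloned once."""
    stats["clones"] += 1
    out = dict(doc)
    nested = {}
    for path, value in writes:
        head, rest = path[0], path[1:]
        if rest:
            nested.setdefault(head, []).append((rest, value))
        else:
            out[head] = value  # a later write to the same key overwrites: last wins
    for head, sub_writes in nested.items():
        out[head] = apply_fused(out.get(head, {}), sub_writes, stats)
    return out

doc = {"user": {"id": 42, "name": "?"}, "config": {"theme": "dark"}}
writes = [
    (("user", "name"), "Ada"),
    (("user", "email"), "ada@x.com"),
    (("user", "verified"), True),
]

unfused_stats, fused_stats = {"clones": 0}, {"clones": 0}
unfused_doc = apply_unfused(doc, writes, unfused_stats)
fused_doc = apply_fused(doc, writes, fused_stats)

print(unfused_doc == fused_doc)                          # True
print(unfused_stats["clones"], fused_stats["clones"])    # 6 2
```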
Phases
The patch-fusion pass has internal phases (Phase C, Phase E in the source); the user-visible properties are:
- Same-base writes group together. Writes under $.user.* batch.
- Disjoint paths don't interfere. Writes to $.user.name and $.config.theme execute in one walk but at different nodes.
- Conflicts are resolved last-wins. Two writes to the same path: the later one wins.
- Conditional writes (when) are evaluated per-write. They short-circuit per clause; the walk doesn't redo work.
Worked example
DOC:
{
"users": [
{"id": 1, "name": "Ada", "active": false},
{"id": 2, "name": "Bob", "active": true}
]
}
QUERY:
patch $ {
users[*].active: true, # broadcast write
users[0].name: "Ada Lovelace", # specific write
users[*].last_seen: "2026-05-08" when .active # conditional broadcast
}
What happens:
- One walk visits every user.
- For each, three potential writes evaluate. Per element:
  - active: true always applies.
  - name only at index 0.
  - last_seen only when active is true after the active write (so all of them).
Output:
{
"users": [
{"id": 1, "name": "Ada Lovelace", "active": true, "last_seen": "2026-05-08"},
{"id": 2, "name": "Bob", "active": true, "last_seen": "2026-05-08"}
]
}
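The per-element evaluation order in this example can be reproduced with a short Python model. The tuple encoding of writes (index or None for broadcast, field, value, optional condition) is invented for the sketch; the one behaviour taken from the text is that a condition sees the results of earlier writes to the same element.

```python
import copy

doc = {"users": [
    {"id": 1, "name": "Ada", "active": False},
    {"id": 2, "name": "Bob", "active": True},
]}

# (index or None for broadcast, field, value, optional condition on the element)
writes = [
    (None, "active", True, None),
    (0, "name", "Ada Lovelace", None),
    (None, "last_seen", "2026-05-08", lambda u: u["active"]),
]

def patch_users(doc, writes):
    out = copy.deepcopy(doc)
    for i, user in enumerate(out["users"]):           # one walk over the array
        for index, field, value, cond in writes:      # writes apply in source order
            if index is not None and index != i:
                continue                              # specific write, other element
            if cond is not None and not cond(user):
                continue                              # condition sees earlier writes
            user[field] = value
    return out

patched = patch_users(doc, writes)
print(patched)
```

Ada starts inactive, but because the broadcast active write runs first, the last_seen condition is true for her as well.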
When fusion doesn't fire
- The chain isn't rooted at $ (parser doesn't classify it as a write).
- The writes are gated by data-dependent conditions that change document shape mid-pipeline.
- Mixed read/write: $.users[0].name.set("A").upper() keeps standard method semantics.
Tips
- Prefer the block form (patch $ { … }) when you have ≥ 3 writes: easier to read, and the optimizer treats it identically.
- Use broadcast (xs[*].field: v) instead of a .map that calls .set per element.
- Conditionals (when) are fine; they don't break fusion.
jq vs jetro Cheatsheet
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}]}
This page is for users coming from jq. The shape is the same: query JSON in a
terminal. The philosophy differs in places, and those differences are called
out where they matter.
In the CLI, use -e for direct expression execution:
jetrocli -e '$.users.filter(@.active).map(@.email)' < users.json
jetrocli --ndjson -i events.ndjson -e '$.id'
Big differences at a glance
| Topic | jq | jetro |
|---|---|---|
| Calling methods | Pipe-of-filters: . \| length | Dot syntax: .len() |
| Pipe \| | Sole composition operator | Value-flow only: passes @ to RHS |
| Iteration | Implicit on .[] | Explicit on chained methods |
| Lambdas | None — uses . rebinding | Three forms: @, r =>, lambda r: |
| Pattern matching | None | First-class with guards and ranges |
| Writes | \|=, =, del() | .set(), patch $ {}, chain-writes |
| Backend | Single interpreter | Six backends, planner-selected |
| Caching | None | Plan + path caches in JetroEngine |
Jetro favors functional method chains over jq's pipe-of-filters style:
$.users
.filter(@.active)
.map({id: @.id, email: @.email})
.take(100)
One-liner translations
Identity / projection
jq: .
jetro: $
jq: .x
jetro: $.x
jq: .x.y[0]
jetro: $.x.y[0]
Iteration
jq: .users[]
jetro: $.users[*] # explicit; or just .users for chained methods
jq: .users[].name
jetro: $.users.map(@.name)
Field selection / projection
jq: {id, name}
jetro: .pick(id, name) # method form, maps over arrays
jq: .users | map({id, name})
jetro: $.users.map(u => u.pick(id, name))
# or
$.users.pick(id, name)
jq: del(.password)
jetro: $.omit(password) # or $.password.delete()
Filter
jq: .users | map(select(.active))
jetro: $.users.filter(@.active)
jq: .users[] | select(.age > 18)
jetro: $.users.filter(@.age > 18)
Aggregates
jq: length
jetro: .len() # for arrays, objects, strings
.count() # explicit array-count reducer
jq: [.[] | .price] | add
jetro: $.map(@.price).sum()
jq: [.[] | .age] | min
jetro: $.map(@.age).min()
# or
$.min_by(@.age).age # one-pass, returns whole element
Sort / unique / group
jq: sort
jetro: .sort()
jq: sort_by(.year)
jetro: .sort(@.year)
jq: unique
jetro: .unique()
jq: group_by(.author)
jetro: .group_by(@.author)
# jq returns array-of-arrays; jetro returns object indexed by key
jq: [group_by(.k)[] | {k: .[0].k, n: length}]
jetro: .count_by(@.k).entries().map(([k,n]) => {k, n})
Slice and take
jq: .[0:3]
jetro: $[0:3]
jq: .[0]
jetro: $[0]
# or
$.first() # demand-aware sink
jq: .[-1]
jetro: $[-1]
# or
$.last()
Has / index / membership
jq: has("foo")
jetro: .has("foo")
jq: .tags | index("admin")
jetro: $.tags.index("admin")
jq: .tags | contains(["admin"])
jetro: $.tags.includes("admin")
Strings
jq: ascii_upcase
jetro: .upper()
jq: ltrimstr("foo")
jetro: .strip_prefix("foo")
jq: split(",")
jetro: .split(",")
jq: test("regex")
jetro: @ ~= "regex"
# or
.re_match("regex")
jq: match("(\\d+)").captures
jetro: .captures("(\d+)")
Recursive descent
jq: ..
jetro: .. # same notation
jq: .. | strings
jetro: $..find(@ is string)
jq: .. | objects | select(.id?)
jetro: $..find(@.id?)
# or
$..shape({id})
String formatting
jq: "Hello, \(.name)!"
jetro: f"Hello, {$.name}!"
Conditional
jq: if .x > 5 then "big" else "small" end
jetro: "big" if $.x > 5 else "small"
jq: .x // "default"
jetro: $.x ?? "default"
Variables
jq: . as $doc | $doc.x + $doc.y
jetro: let doc = $ in doc.x + doc.y
Reduce / fold
jq: reduce .[] as $x (0; . + $x)
jetro: $.sum() # for sum specifically
# or general fold:
$.accumulate(0, (a, x) => a + x).last()
Object construction
jq: {users: [.[] | {id, name}]}
jetro: {users: $.map(u => u.pick(id, name))}
Modification
jq: .x = 1
jetro: $.x.set(1)
# or
patch $ {x: 1}
jq: .x |= . + 1
jetro: $.x.modify(@ + 1)
jq: del(.x)
jetro: $.x.delete()
jq: .users[].active = true
jetro: $.users[*].active.set(true)
# or
patch $ {users[*].active: true}
Multiple writes
jq: .x = 1 | .y = 2 | del(.z)
jetro: patch $ {x: 1, y: 2, z: DELETE}
jetro fuses these into one document walk. jq evaluates each pipe stage independently.
NDJSON
jq: jaq -c '.id' events.ndjson
jetro: jetrocli --ndjson -i events.ndjson -e '$.id'
For whole-file stream operations, use $.rows():
jq: tac events.ndjson | jaq -c 'select(.level == "error") | ., halt'
jetro: jetrocli --ndjson -i events.ndjson \
-e '$.rows().reverse().find($.level == "error").first()'
For Kafka compacted-topic dumps:
jetrocli --ndjson -i users.topic --payload-after '|' \
-e '$.rows().reverse().distinct_by($.id).take(100)'
Complex pipeline translations
Real-world jq queries from the wild. Originals are taken verbatim from the jq manual and the Programming Historian "Reshaping JSON with jq" lesson; all credit to those sources. Each shows the original jq alongside an idiomatic jetro rewrite.
1. Alternative-binding destructure (jq manual)
Flatten a list of resources whose events field may be either a single
object or an array of objects, into one row per (resource, event) pair.
jq uses its alternative-destructuring operator ?// to try both shapes:
.resources[] as {$id, $kind, events: {$user_id, $ts}} ?// {$id, $kind, events: [{$user_id, $ts}]}
| {$user_id, $kind, $id, $ts}
jetro has no ?//. Use kind-test + flat_map to normalise:
$.resources.flat_map(r =>
let evts = (r.events if r.events is array else [r.events]) in
evts.map(e => {
user_id: e.user_id,
kind: r.kind,
id: r.id,
ts: e.ts
})
)
…or with a match to make the two shapes explicit:
$.resources.flat_map(r =>
match r.events with {
arr: array -> arr.map(e => {user_id: e.user_id, kind: r.kind, id: r.id, ts: e.ts}),
{user_id, ts} -> [{user_id, kind: r.kind, id: r.id, ts}],
_ -> []
}
)
The match form is more explicit and surfaces the "single object" branch as
its own arm — easier to extend (e.g. add a third event-shape later).
2. Tweet hashtags as semicolon-joined CSV (Programming Historian)
Take an array of tweets, project id plus a semicolon-joined string of
hashtag texts, emit as CSV. Original jq, threaded through five pipe stages:
{id: .id, hashtags: .entities.hashtags}
| {id: .id, hashtags: [.hashtags[].text]}
| {id: .id, hashtags: .hashtags | join(";")}
| [.id, .hashtags]
| @csv
Each pipe stage rebuilds the object — jq has no nested method chaining, so projection accumulates by reassignment.
jetro collapses it to one chain:
$.map(t => {
id: t.id,
hashtags: t.entities.hashtags.map(@.text).join(";")
}).to_csv()
to_csv already emits the row, headers and all. To match jq's headerless
output:
$.map(t => [t.id, t.entities.hashtags.map(@.text).join(";")])
.map(row => row.map(@.to_string()).join(","))
.join("\n")
3. Hashtag frequency CSV (Programming Historian)
Explode each tweet into one row per hashtag, group by hashtag, count, emit
(tag, count) as CSV. Original jq:
[.[] | {id: .id, hashtag: .entities.hashtags} | {id: .id, hashtag: .hashtag[].text}]
| group_by(.hashtag)
| .[]
| {tag: .[0].hashtag, count: . | length}
| [.tag, .count]
| @csv
jq's group_by returns an array-of-arrays, so the trailing .[] and
.[0].hashtag extract the key from the first element of each group.
jetro uses count_by, which already produces a {tag: count} map:
$.flat_map(t => t.entities.hashtags.map(@.text))
.count_by(@)
.entries()
.map(([tag, count]) => {tag, count})
.to_csv()
The pipeline reads top-to-bottom: explode → tally → reshape → emit.
count_by is one of several jetro idioms (also index_by, unique_by,
max_by) that fold a common jq pattern (group_by | map(...)) into a
single barrier.
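The explode, tally, reshape steps map directly onto a few lines of Python, which may help when checking a port. The tweets below are invented sample data, and Counter stands in for count_by only in the sense that both produce a tag-to-count mapping.

```python
from collections import Counter

tweets = [
    {"id": 1, "entities": {"hashtags": [{"text": "rust"}, {"text": "json"}]}},
    {"id": 2, "entities": {"hashtags": [{"text": "rust"}]}},
]

# explode: one tag per (tweet, hashtag) pair
tags = [h["text"] for t in tweets for h in t["entities"]["hashtags"]]

# tally: tag -> count, in first-seen order
counts = Counter(tags)

# reshape: one {tag, count} row per entry
rows = [{"tag": tag, "count": n} for tag, n in counts.items()]

print(rows)  # [{'tag': 'rust', 'count': 2}, {'tag': 'json', 'count': 1}]
```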
Why these examples are shorter in jetro
Three patterns recur:
- Method chaining. jq's ... | {...} | {...} style rebuilds the object at each stage; jetro's .map(t => {...}) builds it once.
- Specialised barriers. count_by, index_by, unique_by, max_by, min_by collapse group_by | map(...) chains into one call.
- First-class lambdas. jq's . rebinding inside as / [] becomes plain t => t.field in jetro, with no positional gymnastics.
The trade-off: jq's pipe-of-filters is more uniform — every stage is a filter that takes one input and produces zero-or-more outputs. jetro's methods are typed (one-to-one, filter, expander, reducer, barrier), so the pipeline shape is more visible but the surface is bigger.
Things jq has that jetro doesn't
- @base64, @uri, @csv formatters as suffix. jetro spells these as methods: .to_base64(), .url_encode(), .to_csv().
- SQL-style modules. No equivalent.
- input, inputs, nul-separated streaming. jetro is in-process; no streaming-input model.
- recurse(f; cond). Use walk_pre or rec with a pattern.
Things jetro has that jq doesn't
- Pattern matching with guards, ranges, kind binding, deep ..match.
- Demand propagation. .first(), .find(), .take(n) cut off the source; no full materialization.
- Bitmap structural index. ..find, ..shape, ..like skip non-matching subtrees in O(1) per node.
- First-class lambdas (r => body, lambda r: body) with let-binding + inlining.
- Write fusion. Many writes batch into one walk.
- Backends. Tape-zero-copy, structural index, columnar: selected by the planner.
Pitfalls when porting
- .[] doesn't exist. Replace with [*] or just chain methods (most jetro methods auto-iterate over arrays).
- Pipe is not composition. .x | .y in jq means "x then y". In jetro it's "evaluate .y with @ = .x". For chaining methods, use .: .x.y().
- Method calls need parens. length is .len(), not .len.
- select(p) becomes filter(p), and works on whole arrays; no need to first iterate with .[].
- group_by returns an object, not an array of arrays. Use .entries() for jq-shaped output.
Quick reference card
| Need | jq | jetro |
|---|---|---|
| Project | {a, b} | .pick(a, b) |
| Drop key | del(.k) | .omit(k) |
| Filter | select(p) | .filter(p) |
| Map | map(f) | .map(f) |
| Iterate | .[] | [*] or implicit |
| Length | length | .len() |
| Sort | sort_by(.k) | .sort(@.k) |
| Unique | unique | .unique() |
| First | .[0] | .first() |
| Last | .[-1] | .last() |
| String concat | "\(.x)" | f"{$.x}" |
| Default | // d | ?? d |
| If | if c then a else b end | a if c else b |
| Var | as $x | let x = ... |
| Set | .x = v | .x.set(v) |
| Update | .x \|= f | .x.modify(f) |
| Delete | del(.x) | .x.delete() |
NDJSON and Whole-Stream Queries
jetrocli --ndjson reads newline-delimited JSON from a file: one JSON
document per physical line, one compact JSON result per output line.
Use -e to run an expression directly and stay out of the interactive TUI:
jetrocli --ndjson -i events.ndjson -e '$.id'
jetrocli --ndjson -i events.ndjson -e '$.user.name.upper()'
jetrocli --ndjson -i events.ndjson -e '$.attributes.first().value'
This row-local mode evaluates the expression independently for each line. It is the fastest path for projections, scalar transforms, small array operations, and filters that do not need to coordinate across rows.
Payload Framing
Many log and Kafka dump formats store metadata before the JSON payload:
customer-42|{"id":42,"name":"Ada","active":true}
customer-17|null
Use --payload-after to query only the JSON payload after a one-byte
separator:
jetrocli --ndjson -i topic.ndjson --payload-after '|' -e '$.id'
Literal null payloads are tombstones in many Kafka compacted topics. They
are skipped by default:
jetrocli --ndjson -i topic.ndjson \
--payload-after '|' \
-e '$.name'
The null policy is configurable:
jetrocli --ndjson -i topic.ndjson \
--payload-after '|' \
--null-payload keep \
-e '$'
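What payload framing and the null policy do to each line can be modelled in a few lines of Python. This is a behavioural sketch only: the payloads helper and its parameters are invented, it assumes every line contains the separator, and it assumes the split happens at the first occurrence.

```python
import json

def payloads(lines, sep="|", null_payload="skip"):
    """Yield the parsed JSON that follows the first `sep` on each line."""
    for line in lines:
        _, _, raw = line.rstrip("\n").partition(sep)
        value = json.loads(raw)
        if value is None and null_payload == "skip":
            continue  # tombstone: dropped under the default policy
        yield value

lines = [
    'customer-42|{"id":42,"name":"Ada","active":true}\n',
    "customer-17|null\n",
]

live = list(payloads(lines))
kept = list(payloads(lines, null_payload="keep"))

print([row["id"] for row in live])  # [42]
print(len(kept), kept[1])           # 2 None
```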
$.rows() Whole-Stream Mode
Use $.rows() when the expression should operate on the whole file as one
stream instead of running independently per line:
jetrocli --ndjson -i events.ndjson \
-e '$.rows().filter($.active).take(10).map({id: $.id, name: $.name})'
The expression is now a stream program:
- read rows from the NDJSON source
- filter active rows
- keep the first ten retained rows
- project only those rows
No extra CLI flags are needed for filtering, limiting, mapping, or de-duplication.
Reverse Streams
For file inputs, $.rows().reverse() scans from the end of the file:
jetrocli --ndjson -i app.log \
-e '$.rows().reverse().find($.level == "error").first()'
This is useful for append-only logs and Kafka compacted-topic dumps where the newest record for a key is physically last.
Latest Record Per Key
Kafka compacted topics keep the newest value for each key logically, but a dump file can still contain older values earlier in the file. Scan backward and keep the first row seen per key:
jetrocli --ndjson -i users.ndjson --payload-after '|' \
-e '$.rows()
.reverse()
.distinct_by($.id)
.take(100)
.map({id: $.id, name: $.name, updated_at: $.updated_at})'
For rows:
{"id":"a","version":1}
{"id":"b","version":1}
{"id":"a","version":2}
the reverse distinct stream sees a@2 first, then b@1, and discards
a@1.
Performance Expectations
On the 1 GB benchmark used by jetrocli, simple row-local projections are
usually tens of times faster than jaq; the best direct byte paths are near
100x faster. Whole-stream $.rows() queries keep the same mmap and direct
byte/tape foundation, but total time depends on how much of the file must be
inspected.
Fastest shapes:
jetrocli --ndjson -i big.ndjson -e '$.name'
jetrocli --ndjson -i big.ndjson -e '$.attributes.first().value'
jetrocli --ndjson -i big.ndjson \
-e '$.rows().reverse().find($.name == "user_355617").first()'
Naturally heavier shapes:
jetrocli --ndjson -i big.ndjson \
-e '$.rows().filter($.active).distinct_by($.id).map({id: $.id, name: $.name})'
Those must inspect many rows and maintain stream state. They should still avoid unnecessary materialization, but they cannot be as cheap as a direct single-field projection.
Normal JSON Documents
$.rows() is not NDJSON-only. On a normal JSON document, it treats the
document itself as one row:
DOC: {"id":1}
QUERY: $.rows().map($.id)
OUT: [1]
Top-level arrays are one document row in normal JSON mode; use normal array
methods directly when the input document is an array. In NDJSON mode,
$.rows() means the whole input stream.
Performance Guide
Fixture
Examples below run against:
DOC: {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "rows": [{"age": "30", "price": "3.14"}]}
How to write jetro queries that the planner can run fast, and how to read the benchmarks.
Jetro is optimized for cold, file-backed workloads as well as long-lived embedded engines. The fastest paths avoid building full JSON trees: they read raw bytes, simd-json tape, or borrowed views and materialize only the requested result.
Mental model
Jetro picks one of six backends per pipeline node. Fast paths share three properties:
- The source is a path of pure field accesses. $.a.b.c triggers tape backends (zero-copy over simd-json output).
- The pipeline ends in a sink that bounds demand. .first(), .take(n), .find(p), .count() propagate backward and gate source reads.
- No mid-pipeline materialization. .collect(), .sort(), .group_by() flush the tape access pattern back to a Val walk.
If you write to those three rules, queries land on the fast path automatically.
Backend selection (cheat-sheet)
| Source / shape | Primary backend |
|---|---|
| $.a.b.c (field-chain) | tape-view (zero-copy) |
| $..find(...), $..shape({...}) | bitmap structural index |
| Single $.a.b (path only) | tape-path |
| Generic expr / lambda body | fast-children |
| NDJSON direct projection | byte/tape writer |
| $.rows().filter(...).take(n) over a file | demand-aware row stream, sometimes partitioned |
| Any backend declines | interpreted (universal fallback) |
You don't pick — the planner does. Knowing the table tells you why a query is fast.
Demand: the killer feature
Every Demand-aware sink lets the source skip work. Concrete impact:
| Pattern | Speedup vs. naive |
|---|---|
| xs.first() | ~N× (reads 1 element) |
| xs.find(p) | up to ~N× (stops at first match) |
| xs.filter(p).take(k) | up to N/k× |
| xs.count() | 2-5× (no payload decoded) |
| xs.sum(), xs.avg() | 2-3× (only numeric leaves) |
| xs.last() (random-access source) | ~N× (seek to end) |
| xs.reverse().take(k) | rewritten to LastInput(k) |
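The effect of a bounding sink is easy to see with Python generators, which are pull-based in a similar spirit. The sketch counts how many source elements are actually read for a filter-then-take query, with and without a materializing step in the middle. The counted wrapper and the numbers are illustrative only and are not measurements of Jetro.

```python
from itertools import islice

def counted(xs, reads):
    """Wrap a source so every element actually pulled is counted."""
    for x in xs:
        reads[0] += 1
        yield x

N = 100_000

# filter(p).take(3): the sink bounds how far the source is read.
reads = [0]
lazy = list(islice((x for x in counted(range(N), reads) if x % 10 == 0), 3))
lazy_reads = reads[0]

# Same query with a materializing barrier before the filter.
reads = [0]
materialized = list(counted(range(N), reads))
eager = [x for x in materialized if x % 10 == 0][:3]
eager_reads = reads[0]

print(lazy, lazy_reads)    # [0, 10, 20] 21
print(eager, eager_reads)  # [0, 10, 20] 100000
```

Both forms return the same three values; only the amount of input touched differs.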
For wide objects, field projection is the other big win:
$.users.map(u => u.pick(id, name))
The source decodes only id and name per row. Other fields stay as raw
tape tokens.
NDJSON cold path
In jetrocli --ndjson, a row-local expression runs once per line:
jetrocli --ndjson -i big.ndjson -e '$.name'
jetrocli --ndjson -i big.ndjson -e '$.attributes.first().value'
The best row-local shapes are direct byte/tape plans. They can project fields, evaluate simple scalar calls, and write compact JSON output without converting the whole row to an owned tree.
On the 1 GB jetrocli benchmark, expect:
| Shape | Typical expectation vs jaq |
|---|---|
| Root field projection, string scalar calls | Tens of times faster; best cases near 100x |
| Nested first/last field access | Usually tens of times faster |
| Small array map/projection | Strong, but bounded by output bytes |
| Filtered nested array reductions | Strong when predicates stay direct |
| Large derived arrays or fallback lambdas | Slower; more allocation and VM work |
Use $.rows() when the query needs whole-file stream state:
jetrocli --ndjson -i events.ndjson \
-e '$.rows().filter($.active).take(100).map({id: $.id, name: $.name})'
For append-only logs and Kafka compacted-topic dumps, reverse streams can stop near the tail:
jetrocli --ndjson -i topic.ndjson --payload-after '|' \
-e '$.rows().reverse().distinct_by($.id).take(1000)'
The important distinction is how much input must be inspected. take(10) and
tail-first find(...) can stop early. Broad filter, distinct_by, or
fallback expressions may need to inspect the full file, even though they still
avoid unnecessary materialization.
What kills performance
Mid-chain materialization
$.users
.filter(@.active)
.collect() # unnecessary
.map(@.email)
The .collect() forces a full pass before .map. Drop it.
Pre-sort barriers blocking demand
$.events.sort(@.ts).first()
.sort is a barrier — must see every element. The .first() doesn't help.
Rewrite with min_by:
$.events.min_by(@.ts)
One pass, no allocation of the sorted array.
Per-element joins (O(n×m))
$.orders.map(o => o.merge({name: $.users.find(@.id == o.customer_id).name}))
Each find rescans $.users. For large data, build a lookup once:
let by_id = $.users.index_by(@.id) in
$.orders.map(o => o.merge({name: by_id[o.customer_id].name}))
Or use equi_join.
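The same trade-off in plain Python, for readers who want to see the shape of the fix outside the DSL. The data is a cut-down version of the fixture and customer_id is assumed to be the join key; a dict plays the role of the index.

```python
users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 1, "customer_id": 1},
    {"id": 2, "customer_id": 1},
    {"id": 3, "customer_id": 2},
]

# O(n*m): rescan users for every order.
slow = [
    {**o, "name": next(u["name"] for u in users if u["id"] == o["customer_id"])}
    for o in orders
]

# O(n+m): build the index once, then do constant-time lookups.
by_id = {u["id"]: u for u in users}
fast = [{**o, "name": by_id[o["customer_id"]]["name"]} for o in orders]

print(fast == slow)                 # True
print([o["name"] for o in fast])    # ['Ada', 'Ada', 'Bob']
```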
Repeated sub-expressions
$.user.profile.name + " <" + $.user.profile.email + ">"
Two tape walks over the same $.user.profile prefix. Bind once:
let p = $.user.profile in
f"{p.name} <{p.email}>"
Heavy lambdas in barriers
$.rows.unique_by(@.to_string())
unique_by calls the lambda once per row. If the projection is
non-trivial (regex, deep traversal), pre-project once:
$.rows.map(r => r.merge({_k: r.to_string()}))
.unique_by(@._k)
.map(@.omit(_k))
Engine tuning
Plan cache
JetroEngine caches (query, context) → compiled pipeline. Default 256
entries, wholesale eviction.
For a small fixed query set with high doc volume — the typical web-server shape — every call after the first is a cache hit. Don't fight it.
For unique-per-call queries (CLI ad-hoc), the cache is a no-op; just use
Jetro directly.
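A toy model of why the cache helps one workload and not the other, written in Python. Several things here are assumptions made for the sketch: PlanCache is an invented class, the key is the query string alone rather than (query, context), and "wholesale eviction" is read as clearing the whole cache when it is full.

```python
class PlanCache:
    def __init__(self, compile_fn, capacity=256):
        self.compile_fn = compile_fn
        self.capacity = capacity
        self.plans = {}
        self.compiles = 0

    def get(self, query):
        plan = self.plans.get(query)
        if plan is None:
            if len(self.plans) >= self.capacity:
                self.plans.clear()  # wholesale eviction: drop everything at once
            plan = self.compile_fn(query)
            self.compiles += 1
            self.plans[query] = plan
        return plan

def fake_compile(query):
    """Stand-in for the real planner."""
    return ("plan", query)

# Web-server shape: a small fixed query set, many documents.
hot = PlanCache(fake_compile)
fixed = ["$.a", "$.b.c", "$.users.count()"]
for i in range(1000):
    hot.get(fixed[i % len(fixed)])

# Ad-hoc shape: every query is different, so nothing is ever reused.
cold = PlanCache(fake_compile)
for i in range(1000):
    cold.get(f"$.field_{i}")

print(hot.compiles)   # 3
print(cold.compiles)  # 1000
```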
Path cache
The VM caches resolved pointer paths per document. The hash key includes both structure and primitive values bounded at depth 8 — so two docs with the same shape but different leaves stay distinct. You don't manage this.
simd-json (default)
The simd-json feature gives ~4× cold-start. Disable only if you need to
round-trip serde_json::Value and the conversion cost dominates.
Benchmarks
cargo bench -p jetro-core
The harness covers:
- Field access ($.a.b.c): tape-view zero-copy
- Filter / map / take pipelines: demand propagation
- Deep search (..find, ..shape): bitmap structural index
- Pattern match: Maranget tree
- Lambda forms: @ vs. => vs. lambda parity
- Write fusion: single vs. fused multi-writes
To compare your changes against main:
git checkout main
cargo bench -p jetro-core -- --save-baseline main
git checkout your-branch
cargo bench -p jetro-core -- --baseline main
Reading the output: criterion reports geometric mean ratios. >5% regression should have a clear cause.
Profiling
For Rust workloads:
cargo bench -p jetro-core --bench <name> -- --profile-time 10
Then attach with samply or cargo flamegraph. Hot paths usually live in:
- exec/pipeline/exec.rs: pipeline driver
- exec/view/*.rs: borrowed view stages
- exec/router.rs: backend selection
- vm/exec.rs: bytecode VM (interpreted fallback)
If the interpreter (vm::execute) shows up hot, the planner is falling
through to the universal fallback. Check the query — usually a non-$
source or a generic expr inside a method arg.
Quick checklist
Before benchmarking a query, ask:
- Can .first() / .take() / .find() replace a full materialization?
- Is there a barrier (sort, unique, group_by) before the bound? Push the bound earlier or use a one-pass equivalent (min_by, count_by).
- Does a lookup repeat per row? Pre-build with index_by.
- Are wide rows projected early with pick?
- Are sub-expressions duplicated? Bind with let.
- Is simd-json enabled (default)?
- Is the same query run many times? Use JetroEngine.
If all yes, the query is on the fast path.
Public API and Engine
The full public surface of the jetro crate is two types and a handful of
methods. Everything else is implementation detail.
Jetro — single-document handle
For one document, possibly many queries:
use jetro::Jetro;
let bytes = br#"{"x":[1,2,3]}"#;
let j = Jetro::from_bytes(bytes)?; // lazy parse via simd-json tape
let v: serde_json::Value = j.collect("$.x.sum()")?;
assert_eq!(v, serde_json::json!(6));
Constructors
| Method | Input | Notes |
|---|---|---|
| Jetro::from_bytes(&[u8]) | Raw JSON bytes | Lazy parse; fastest path |
| Jetro::from_value(serde_json::Value) | Parsed value | Skip simd-json |
| Jetro::from_val(Val) | Internal Val | Advanced: re-using engine state |
Methods
| Method | Returns |
|---|---|
| j.collect(query) | Result<serde_json::Value, EvalError> |
| j.collect_typed::<T>(query) | Result<T, EvalError> (deserialize directly) |
Jetro owns its per-document lazy state: raw bytes, tape/value caches, object
vector promotion cache, and an instance VM used for fallback execution. It is
cheap to construct for a document and can answer many queries over the same
bytes without reparsing.
JetroEngine — long-lived multi-doc handle
For many documents and many queries with overlap, share the plan/VM caches:
use jetro::JetroEngine;
let eng = JetroEngine::default();
for doc_bytes in inputs {
    let v = eng.collect_bytes(doc_bytes, "$.users.filter(@.active).count()")?;
    println!("{}", v);
}
Methods
| Method | Input | Notes |
|---|---|---|
| eng.collect(&doc, q) | &Val | Document already in Val form |
| eng.collect_value(serde_value, q) | serde_json::Value | Round-trips |
| eng.collect_bytes(&[u8], q) | Raw bytes | Lazy parse |
| eng.run_ndjson(...) | Reader, query, writer | Row-local NDJSON execution |
| eng.run_ndjson_file(...) | File path, query, writer | File-backed NDJSON, including $.rows() stream mode |
| eng.run_ndjson_source(...) | Reader or file source | Dispatches reader/file behavior explicitly |
The collect methods return Result<serde_json::Value, JetroEngineError>, a wider
error type that may also wrap JSON-parse errors.
NDJSON options
NDJSON helpers accept NdjsonOptions variants for file and reader workloads:
| Option | Effect |
|---|---|
| row_frame | Plain JSON lines or delimited payloads such as `key |
| null_output | Skip or emit expression results that are JSON null |
| parallelism | Automatic or disabled partition execution for eligible file streams |
| parallel_min_bytes | Minimum file size before parallel partitions are considered |
| max_line_len | Per-line safety cap |
| reverse_chunk_size | Reverse file-reader chunk size |
Expression-level $.rows() switches NDJSON from row-local execution to a
whole-source stream plan. On files, $.rows().reverse() uses reverse file
traversal; reader-backed reverse streams return a clear unsupported-source
error.
Configuration
| Option | Default | Effect |
|---|---|---|
| Plan-cache capacity | 256 | Wholesale-evicted when full |
The engine's plan cache amortises parse + lower + compile across calls. Hits are O(hash); misses do full work.
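A capacity-bounded cache with wholesale eviction can be sketched in a few lines. `PlanCache`, `Plan`, and `get_or_compile` are hypothetical names for this illustration, and the "compile" step is a toy; the engine's real cache key and plan type are not reproduced here.

```rust
use std::collections::HashMap;

/// Hypothetical compiled plan; here it only records the number of path steps.
#[derive(Clone, Debug, PartialEq)]
struct Plan {
    steps: usize,
}

struct PlanCache {
    capacity: usize,
    plans: HashMap<String, Plan>,
    misses: usize,
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, plans: HashMap::new(), misses: 0 }
    }

    /// Hit: one hash lookup. Miss: do the full (toy) compile and store it.
    fn get_or_compile(&mut self, query: &str) -> Plan {
        if let Some(plan) = self.plans.get(query) {
            return plan.clone();
        }
        self.misses += 1;
        if self.plans.len() >= self.capacity {
            // Wholesale eviction: drop everything instead of tracking recency.
            self.plans.clear();
        }
        let plan = Plan { steps: query.split('.').count() };
        self.plans.insert(query.to_string(), plan.clone());
        plan
    }
}
```

Wholesale eviction trades a burst of recompiles after the cache fills for a hit path with no bookkeeping, which suits workloads with a small, stable set of queries.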
Errors
pub enum EvalError {
    /* … */
}
pub enum JetroEngineError {
    Json(serde_json::Error),
    Eval(EvalError),
}
Error messages include the query position when available.
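The two-variant wrapper pattern can be shown with stand-in types. Everything below is hypothetical: `ParseFailure`, `EvalFailure`, and `EngineFailure` only mirror the shape of the enums above, the `position` field is an assumption about how a query offset might be carried, and the toy `run` function is not part of jetro.

```rust
use std::fmt;

/// Stand-in for a JSON parse failure (the real variant wraps serde_json::Error).
#[derive(Debug)]
struct ParseFailure(String);

/// Stand-in for an evaluation failure; the real EvalError variants are not shown.
#[derive(Debug)]
struct EvalFailure {
    message: String,
    position: Option<usize>,
}

#[derive(Debug)]
enum EngineFailure {
    Json(ParseFailure),
    Eval(EvalFailure),
}

impl From<ParseFailure> for EngineFailure {
    fn from(e: ParseFailure) -> Self {
        EngineFailure::Json(e)
    }
}

impl From<EvalFailure> for EngineFailure {
    fn from(e: EvalFailure) -> Self {
        EngineFailure::Eval(e)
    }
}

impl fmt::Display for EngineFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EngineFailure::Json(ParseFailure(msg)) => write!(f, "invalid JSON: {msg}"),
            EngineFailure::Eval(e) => match e.position {
                Some(pos) => write!(f, "query error at offset {pos}: {}", e.message),
                None => write!(f, "query error: {}", e.message),
            },
        }
    }
}

/// Toy "parse" step: the document is a single integer.
fn parse_number(doc: &str) -> Result<i64, ParseFailure> {
    doc.trim().parse::<i64>().map_err(|e| ParseFailure(e.to_string()))
}

/// Toy "eval" step: only one query is understood.
fn eval_toy(n: i64, query: &str) -> Result<i64, EvalFailure> {
    match query {
        "$.double()" => Ok(n * 2),
        _ => Err(EvalFailure {
            message: "unknown method".to_string(),
            position: query.find('.'),
        }),
    }
}

/// `?` converts either failure into the wider engine-level error.
fn run(doc: &str, query: &str) -> Result<i64, EngineFailure> {
    let n = parse_number(doc)?;
    Ok(eval_toy(n, query)?)
}
```

Callers that care about the difference can match on the variant; callers that do not can print the error and move on.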
Feature flags
| Feature | Default | What it does |
|---|---|---|
| simd-json | on | Direct bytes → Val parse, skipping serde_json::Value |
| fuzz_internal | off | Re-exports parser + planner for fuzz harness; not stable |
To disable simd-json:
[dependencies]
jetro = { version = "0.5.11", default-features = false }
Python binding
jetro_py exposes a collect(doc, query) function. Internals are identical
to the Rust crate.
import jetro
result = jetro.collect({"x": [1,2,3]}, "$.x.sum()")
# result == 6
CLI
jetrocli -e '$.x.sum()' < input.json
jetrocli --ndjson -i events.ndjson -e '$.rows().take(10)'
The CLI is a thin wrapper around the Rust APIs, with -e selecting
non-interactive expression execution.
Threading
- Jetro is intended as a document handle. Prefer one handle per document owner; use JetroEngine for shared multi-document workloads.
- JetroEngine is Send + Sync and intended for shared-engine workloads.
- The engine owns shared plan/VM caches so repeated queries over many documents avoid parse/lower/compile cost.
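The shared-engine pattern looks roughly like the sketch below. `SharedEngine` is a hypothetical stand-in with a mutex-guarded plan map, used here so the example runs with the standard library alone; with the real crate, the same `Arc` plus worker-thread structure would presumably wrap a `JetroEngine` instead.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy plan: the only two "queries" this stand-in understands.
#[derive(Clone, Copy)]
enum Plan {
    Sum,
    Count,
}

/// Hypothetical stand-in for a shared engine; this is not `JetroEngine`.
struct SharedEngine {
    plans: Mutex<HashMap<String, Plan>>,
}

impl SharedEngine {
    fn new() -> Self {
        SharedEngine { plans: Mutex::new(HashMap::new()) }
    }

    /// Look up the plan, compiling and caching it on first use.
    fn plan_for(&self, query: &str) -> Option<Plan> {
        let mut plans = self.plans.lock().unwrap();
        if let Some(plan) = plans.get(query) {
            return Some(*plan);
        }
        let plan = match query {
            "sum" => Plan::Sum,
            "count" => Plan::Count,
            _ => return None,
        };
        plans.insert(query.to_string(), plan);
        Some(plan)
    }

    fn collect(&self, doc: &[i64], query: &str) -> Option<i64> {
        Some(match self.plan_for(query)? {
            Plan::Sum => doc.iter().sum::<i64>(),
            Plan::Count => doc.len() as i64,
        })
    }

    fn cached_plans(&self) -> usize {
        self.plans.lock().unwrap().len()
    }
}

/// One engine, one thread per document, results in input order.
fn run_shared(docs: Vec<Vec<i64>>, query: &'static str) -> (Vec<i64>, usize) {
    let engine = Arc::new(SharedEngine::new());
    let handles: Vec<_> = docs
        .into_iter()
        .map(|doc| {
            let engine = Arc::clone(&engine);
            thread::spawn(move || engine.collect(&doc, query).unwrap())
        })
        .collect();
    let results: Vec<i64> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    (results, engine.cached_plans())
}
```

Each worker owns its document while all workers share one cache, so the query is compiled once no matter how many documents run through it.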
Stability
- The query DSL is stable as of jetro 0.5.x.
- The Rust API surface (Jetro, JetroEngine, error types) is stable.
- BuiltinMethod, opcodes, IR types are internal and may change in any minor release.
- The fuzz_internal feature is explicitly unstable.
Known Limitations and Behavior Notes (0.5.11)
This page documents current boundaries and intentional language choices for jetro 0.5.11. It is not a bug graveyard: fixed audit items have moved back into their normal reference pages.
Current Boundaries
$.rows() is a root stream source
$.rows() starts a source-level stream. In NDJSON mode it means "all rows in
the file or reader"; in normal JSON mode it means "the top-level array
elements" or one row for an object/scalar.
Supported:
$.rows().filter($.active).take(10)
$.rows().reverse().distinct_by($.id).take(100)
Not yet supported:
$.books.rows().take(10)
Nested stream sources need a separate design because they mix document-local arrays with source-level IO and reverse traversal.
Reader-backed reverse NDJSON is unsupported
$.rows().reverse() needs a seekable file-backed source. It works with
run_ndjson_file, NdjsonSource::file, and jetrocli --ndjson -i file.
Reader-backed NDJSON sources return a clear error instead of materializing the
whole stream implicitly.
Row-stream operators are deliberately small
Current $.rows() stream mode supports the operators needed for retained-row
workloads:
- reverse()
- filter(pred)
- find(pred) / find_first(pred) / find_one(pred)
- distinct_by(key)
- take(n) / first()
- map(expr)
Operators such as sort, group_by, windows, joins, and multi-source
streaming are normal array/document operators, but not yet source-level
$.rows() stages.
Parallel NDJSON is selective
File-backed row-stream partitioning is automatic only for plans where it is
expected to help. For example, selective filter(...).take(n) can benefit
from partitioned scanning. Plain map(...).take(n) stays sequential because
it can stop after the first n rows without scanning unrelated partitions.
Public observability is still minimal
The engine records internal rows-stream stats for tests and future explain
output, but 0.5.11 does not expose a stable public explain() API yet.
Intentional Language Choices
No in operator
in would conflict with let x = y in z and for x in xs. Use has,
includes, or has_key:
$.tags.includes("urgent")
$.user.has_key("email")
$.users has {id: 1}
has, has_key, includes, and has_path differ
| Form | Meaning |
|---|---|
| obj.has_key("k") | Object key exists |
| obj.has("k") | Key/index style existence helper |
| xs.includes(v) | Value membership |
| doc.has_path("a.b") | Path exists in a nested structure |
| x has y | Membership/containment operator sugar |
Use has_key when you specifically want an object-key check.
replace is single-occurrence
.replace(needle, with) replaces only the first match. Use replace_all
for every occurrence:
"hello hello".replace("hello", "hi") # "hi hello"
"hello hello".replace_all("hello", "hi") # "hi hi"
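Rust embedders should note that the naming is the reverse of the standard library's: Rust's `str::replace` rewrites every occurrence, and `str::replacen` with a count of 1 rewrites only the first. The helper names below are hypothetical and exist only to line the two conventions up side by side.

```rust
/// Matches the documented jetro `.replace`: only the first occurrence changes.
fn replace_first(s: &str, needle: &str, with: &str) -> String {
    s.replacen(needle, with, 1)
}

/// Matches the documented jetro `.replace_all`: every occurrence changes.
fn replace_every(s: &str, needle: &str, with: &str) -> String {
    s.replace(needle, with)
}
```

With the input `"hello hello"`, `replace_first` yields `"hi hello"` and `replace_every` yields `"hi hi"`, matching the two query results above.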
Comments are outside the query language
Jetro expressions do not contain comments. Keep query comments in the host language, shell script, or documentation.
Safety Limits
rec(fn) has an iteration cap
rec(fn) runs until a deep structural fixpoint. If the function never
converges, jetro stops at the iteration cap and reports an error. Prefer
rec(fn, cond) when the loop has an explicit bound.
$.state.rec(step, done)
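The two termination strategies can be sketched as ordinary Rust functions. `fixpoint` and `iterate_until` are hypothetical helpers, and the exact semantics of `rec(fn, cond)` (for example, whether the condition is checked before or after each step) are an assumption here.

```rust
/// Apply `step` until the value stops changing, or fail after `max_iters` rounds.
fn fixpoint<T, F>(start: T, step: F, max_iters: usize) -> Result<T, String>
where
    T: PartialEq,
    F: Fn(&T) -> T,
{
    let mut current = start;
    for _ in 0..max_iters {
        let next = step(&current);
        if next == current {
            return Ok(current);
        }
        current = next;
    }
    Err(format!("no fixpoint after {max_iters} iterations"))
}

/// Apply `step` until `done` holds (checked before each step), with the same cap.
fn iterate_until<T, F, D>(start: T, step: F, done: D, max_iters: usize) -> Result<T, String>
where
    F: Fn(&T) -> T,
    D: Fn(&T) -> bool,
{
    let mut current = start;
    for _ in 0..max_iters {
        if done(&current) {
            return Ok(current);
        }
        current = step(&current);
    }
    Err(format!("condition not met after {max_iters} iterations"))
}
```

Halving converges on its own, so `fixpoint` suffices; a step that grows forever only terminates when paired with an explicit condition or when it hits the cap.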
NDJSON line size is bounded
NDJSON readers enforce a per-line byte cap to avoid unbounded memory use on
malformed input. Tune it with NdjsonOptions or the CLI flag when processing
legitimately huge rows.
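One way to enforce a per-line cap without buffering an oversized line is to limit each read to one byte more than the cap. The function below is a sketch using only the standard library; `read_capped_lines` and its error message are hypothetical and say nothing about how jetro's reader is actually written.

```rust
use std::io::{self, BufRead, Read};

/// Read newline-delimited rows, rejecting any line longer than `max_line_len` bytes.
fn read_capped_lines<R: BufRead>(mut reader: R, max_line_len: usize) -> io::Result<Vec<String>> {
    let mut rows = Vec::new();
    loop {
        let mut buf = Vec::new();
        // Allow one byte beyond the cap so an over-long line is detected
        // without reading the rest of it into memory.
        let n = reader
            .by_ref()
            .take(max_line_len as u64 + 1)
            .read_until(b'\n', &mut buf)?;
        if n == 0 {
            break; // end of input
        }
        if buf.last() == Some(&b'\n') {
            buf.pop();
        }
        if buf.len() > max_line_len {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "NDJSON line exceeds the configured cap",
            ));
        }
        let line = String::from_utf8(buf)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        rows.push(line);
    }
    Ok(rows)
}
```

A line of exactly `max_line_len` bytes is accepted, with or without a trailing newline; one byte more produces an error instead of an unbounded allocation.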
Version Note
This page reflects jetro 0.5.11. If a page elsewhere still carries an older audit note, prefer this page and the current builtin reference.
Glossary
Backend. One of the execution paths the planner can route a node
through: Structural, TapeView, TapeRows, TapePath, ValView,
MaterializedSource, FastChildren, Interpreted. Selected automatically
based on shape and capabilities.
Barrier. A stage that must see all input before emitting output. sort,
unique, group_by, window, etc.
Bitmap structural index. A bit-packed index over the simd-json tape that
lets ..find, ..shape, ..like, and ..match skip non-matching subtrees
in O(1) per node. Used when the document is loaded with the simd-json tape
(default).
Borrowed view. A ValueView — a read-only borrowed reference into a
parsed document. Zero-copy substrings via Val::StrSlice.
Builtin. One of the 181 methods in jetro's catalog. Each is one
impl Builtin for X block in defs.rs with identity, demand law, and
runtime layers co-located.
Chain-write. A query ending in a write terminal (.set, .modify,
.delete, .unset, .merge, .deep_merge, .append, .prepend) on a
rooted path. Rewritten to Expr::Patch by the parser.
Composed stage. A Composed<A, B> pair that fuses two adjacent stages
into one virtual call per element.
Demand. The triple (pull, value, order) describing what an operator
needs from its source. See Demand Propagation.
Demand law. The rule by which a builtin transforms downstream demand
into upstream demand. Encoded in the builtin's BuiltinDemandLaw.
Effect lifting. The patch-fusion pass that batches multiple chain-writes into a single document walk.
Engine. A JetroEngine — a long-lived handle that caches parsed and
compiled queries for reuse across documents.
F-string. f"text {expr}" — string with embedded expression
interpolation.
Field chain. A path of pure field accesses, e.g. $.a.b.c. Recognised
by the planner and routed to fast tape backends.
Jetro. Single-document handle. Jetro::from_bytes(bytes)?.collect(q).
JetroEngine. Multi-document handle with plan/VM caches.
Lambda. A small function value: @, r => body, lambda r: body. All
three forms compile identically.
Maranget tree. The decision-tree compilation strategy used for pattern matching. It shares common discriminant tests across match arms.
Patch. The internal write operation. Generated by both patch $ { … }
blocks and chain-write classification.
Patch fusion. The optimizer pass that batches multiple writes into a single walk.
Pipeline. The streaming execution model: Source → Stage* → Sink. One
element at a time.
Plan / Logical Plan. Tree-shaped IR between AST and bytecode. Lives in
ir/logical.rs.
Plan cache. A cache in JetroEngine that maps (query, context) to a
compiled Pipeline. Default capacity 256.
Pull demand. The first lane of Demand: how many inputs must be read.
Variants: All, FirstInput(n), LastInput(n), NthInput(i),
UntilOutput(n).
Quantifier. A postfix operator on a path step. ? = optional,
! = exactly-one.
Sink. The terminal stage of a pipeline. Reducers, positional, and implicit collectors.
Source. The first stage of a pipeline. Usually a path or array literal.
Streaming. Per-element execution; no buffering.
Tape. The simd-json output: a flat array of tokens describing structural positions in the JSON byte buffer. Used for zero-copy access.
Val. The internal value type. Arc-wrapped compound nodes ensure cheap
clones.
Value need. The second lane of Demand: how much of each row's content
is required. Variants: None, Predicate, Projection, Numeric,
Whole.
View. A ValueView — borrowed read-only access to a value.
VM. The bytecode executor. Used as the universal fallback backend; also provides the path-cache.
Write fusion. Same as patch fusion. See above.