Introduction

Jetro is a JSON query, transformation, and patching engine written in Rust. It parses a small dot-syntax DSL, plans each query through a multi-tier optimizer, and routes each subtree to whichever execution backend will run it fastest: zero-copy borrowed views over a simd-json tape, a bitmap structural index, a streaming pull pipeline, or the universal interpreted fallback.

Jetro is deliberately shaped differently from a small jq clone. Method chains compose with lambdas, pattern matching, f-strings, reducers, and document updates inside one expression language. It also supports demand propagation, a feature more often associated with lazy languages such as Haskell: sinks like first, last, and take(n) can reduce how much upstream work is performed. For mutation-heavy workflows, update can batch compatible path rewrites into one document patch instead of forcing callers to round-trip through host-language object editing.

jetrocli -e '$.services.filter(@.enabled).map({name: @.name, p95: @.latency_ms})' < services.json
[
  {"name":"api","p95":42},
  {"name":"worker","p95":85}
]

If you have used jq, Jetro will feel familiar but takes a different shape: it is method-chain oriented, closer to the collection APIs most application developers already use.

$.services
  .filter(@.enabled)
  .sort_by(-latency_ms)
  .take(1)
  .map({service: name, alert: errors > 5})

That query reads like the code you would otherwise write by hand: keep enabled services, sort by latency, keep the slowest one, return only the fields the next system needs.

Why Developers Reach For Jetro

Use Jetro when the shape of the data matters more than the ceremony around it:

  • Inspect production JSON without writing a script. Pull out the one field, row, group, or summary you need from a real payload.
  • Embed dynamic transformations. Let users, pipelines, or config files define data-shaping rules without recompiling your service.
  • Normalize API and event payloads. Filter, project, rename, aggregate, and label JSON before it crosses a boundary.
  • Patch documents deliberately. Use update expressions for migrations, fixture generation, and config rewrites.
  • Process NDJSON files from the terminal. Run row-local expressions over logs and event streams with jetrocli --ndjson.

What Makes Jetro Different

Jetro is small at the surface, but it is not a toy interpreter.

  • The syntax is expression-first. Objects, arrays, lambdas, filters, reducers, string formatting, pattern matching, and updates compose inside one expression language.
  • The planner tries to do less work. Queries like first, take, and bounded projections can tell earlier stages how much data is actually needed.
  • Writes are part of the language. Updating JSON is not bolted on as a separate API; document rewrites are planned alongside reads.
  • There is a real Rust API. Jetro is the byte-oriented document handle. JetroEngine is the long-lived engine for reusable plans, VM state, and streaming workflows.

A Small Taste

Here is a document shaped like the kind of service inventory developers often meet in scripts, dashboards, and deploy tooling:

{
  "services": [
    {"name":"api","lang":"rust","latency_ms":42,"owner":"platform","enabled":true,"errors":2},
    {"name":"worker","lang":"go","latency_ms":85,"owner":"data","enabled":true,"errors":9},
    {"name":"admin","lang":"ts","latency_ms":130,"owner":"platform","enabled":false,"errors":0}
  ],
  "deploys": [
    {"service":"api","sha":"a1","status":"ok"},
    {"service":"worker","sha":"b2","status":"fail"}
  ],
  "meta": {"env":"prod","version":7}
}

Project the active services:

$.services.filter(@.enabled).map({name: @.name, p95: @.latency_ms, owner: @.owner})

Count ownership:

$.services.count_by(@.owner)

Turn deploy states into operator messages:

$.deploys.map(d => match d with {
  {status:"fail",service:s} -> f"rollback {s}",
  {status:"ok",service:s} -> f"ship {s}",
  _ -> "inspect"
})

Patch the document:

$.update({"meta.version": @ + 1, "services[*].checked": true})

The rest of this book teaches the language from that practical angle: how to read JSON, reshape it, aggregate it, update it, and embed the same behavior in Rust.

Example Conventions

Examples use this layout:

DOC:    {"services": [{"name": "api", "enabled": true}, {"name": "admin", "enabled": false}]}
QUERY:  $.services.filter(@.enabled).map(@.name)
OUT:    ["api"]

Where the input document matters, examples include DOC:. Where the source is already clear from the section, examples usually show only QUERY: and OUT:. Method aliases are listed inline, for example unique (alias distinct).

Start with the Quick Tour, then use the Builtin Reference when you need exact method behavior.

Installation

Jetro ships as three artifacts:

Artifact        What it is                                        Audience
jetro (crate)   Rust library: query/transform JSON in-process     Rust developers
jetro-py        Python bindings (PyPI)                            Python users
jetrocli        Standalone CLI (jetrocli) for shell use           Anyone with JSON in a terminal

Rust library

Add to Cargo.toml:

[dependencies]
jetro = "0.5.11"

The simd-json feature is on by default. It parses bytes directly into Val, with no serde_json::Value intermediate, for a roughly 4× cold-start speedup. To fall back to the legacy serde-only path, disable default features:

[dependencies]
jetro = { version = "0.5.11", default-features = false }

Quick sanity check:

use jetro::Jetro;

// Besides jetro, this example uses the anyhow and serde_json crates.
fn main() -> anyhow::Result<()> {
    let bytes = br#"{"books":[{"title":"Dune","year":1965}]}"#;
    let j = Jetro::from_bytes(bytes)?;
    let titles: serde_json::Value = j.collect("$.books.map(@.title)")?;
    println!("{}", titles);  // ["Dune"]
    Ok(())
}

Long-lived engine

If you process many documents with overlapping queries, keep a JetroEngine around. It holds shared plan and VM caches:

use jetro::JetroEngine;

let eng = JetroEngine::default();
for doc in docs {
    let v = eng.collect(&doc, "$.users.filter(active).count()")?;
    println!("{}", v);
}

The plan cache holds 256 entries by default. When it fills up, it is cleared wholesale rather than evicting entries one at a time.
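To make the eviction policy concrete, here is a minimal Rust sketch of a capacity-bounded cache that clears itself when full. PlanCache, get_or_compile, and the generic plan type P are hypothetical stand-ins, not JetroEngine's actual internals.

```rust
use std::collections::HashMap;

/// Hypothetical plan cache illustrating "evicts wholesale when full".
/// `P` stands in for a compiled plan; it is not Jetro's real type.
pub struct PlanCache<P> {
    capacity: usize,
    entries: HashMap<String, P>,
}

impl<P> PlanCache<P> {
    pub fn new(capacity: usize) -> Self {
        PlanCache { capacity, entries: HashMap::new() }
    }

    /// Returns the cached plan for `query`, compiling and inserting it on a miss.
    pub fn get_or_compile(&mut self, query: &str, compile: impl FnOnce(&str) -> P) -> &P {
        if !self.entries.contains_key(query) {
            // Wholesale eviction: when full, drop every entry instead of one.
            if self.entries.len() >= self.capacity {
                self.entries.clear();
            }
            let plan = compile(query);
            self.entries.insert(query.to_string(), plan);
        }
        &self.entries[query]
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In this sketch, wholesale clearing needs no recency bookkeeping, at the cost of recompiling every hot query once after a reset.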

Python bindings

pip install jetro-py

import jetro

doc = {"books": [{"title": "Dune", "year": 1965}]}
print(jetro.collect(doc, "$.books.map(@.title)"))   # ['Dune']

The Python wheel embeds the same Rust core, so query syntax is identical.

CLI (jetrocli)

Install via Homebrew:

brew install mitghi/jetrocli/jetrocli

Or build from source:

git clone https://github.com/mitghi/jetrocli
cd jetrocli && cargo install --path .

Use it like jq:

echo '{"x":[1,2,3]}' | jetrocli -e '$.x.sum()'
# 6

cat data.json | jetrocli -e '$.users.filter(@.active).map(@.email)'

For file-backed NDJSON, add --ndjson, -i, and -e:

jetrocli --ndjson -i events.ndjson -e '$.id'
jetrocli --ndjson -i events.ndjson \
  -e '$.rows().reverse().distinct_by($.id).take(100)'

Building from source

git clone https://github.com/mitghi/jetro
cd jetro
cargo build --release         # build everything
cargo test                    # full suite
cargo bench -p jetro-core     # micro-benchmarks

Workspace layout:

jetro/             facade crate (re-exports + public API)
jetro-core/        engine: parser, planner, executor, builtins, runtime
jetro-core/fuzz/   cargo-fuzz harness (feature-gated)

Verifying your install

Run the tour from the next chapter against your install. If every query produces the printed output, you're ready.

A Practical Tour

This tour teaches Jetro the way you will probably use it: grab a real JSON payload, ask a precise question, reshape the answer, and move on. Every query in this chapter was checked with the release build of jetrocli 0.2.9.

Run a query against a JSON file:

jetrocli -e '$.services.filter(@.enabled).count()' < services.json

Run a row-local query against NDJSON:

jetrocli --ndjson -i events.ndjson -e '$.service + ":" + $.level'

The Working Document

Save this as services.json:

{
  "services": [
    {"name":"api","lang":"rust","latency_ms":42,"owner":"platform","enabled":true,"errors":2,"tags":["edge","json"]},
    {"name":"worker","lang":"go","latency_ms":85,"owner":"data","enabled":true,"errors":9,"tags":["queue"]},
    {"name":"admin","lang":"ts","latency_ms":130,"owner":"platform","enabled":false,"errors":0,"tags":["internal"]}
  ],
  "deploys": [
    {"service":"api","sha":"a1","status":"ok"},
    {"service":"worker","sha":"b2","status":"fail"}
  ],
  "meta": {"env":"prod","version":7}
}

1. Start With Paths

Use $ for the root document, then walk fields and indexes.

QUERY:  $.services[0].name
OUT:    "api"

Wildcards collect the same field from many array items:

QUERY:  $.services[*].name
OUT:    ["api","worker","admin"]

2. Filter Like You Would In Code

Inside filter, map, and similar methods, @ is the current item.

QUERY:  $.services.filter(@.enabled).count()
OUT:    2

That is the basic Jetro shape: start from a path, chain operations, return the value you actually need.

3. Return A Useful Shape

Projection objects let you rename fields, drop noise, and compute small derived values in one pass.

QUERY:
  $.services
    .filter(@.enabled)
    .map({name: @.name, p95: @.latency_ms, owner: @.owner})
OUT:
  [
    {"name":"api","owner":"platform","p95":42},
    {"name":"worker","owner":"data","p95":85}
  ]

This is where Jetro starts paying rent in developer workflows: the output is already shaped for the next command, dashboard, test assertion, or API boundary.

4. Sort, Bound, Then Project

Use sort_by, take, and map for top-N questions.

QUERY:
  $.services
    .filter(@.enabled)
    .sort_by(-latency_ms)
    .take(1)
    .map({service: name, alert: errors > 5})
OUT:
  [
    {"alert":true,"service":"worker"}
  ]

The minus sign sorts descending by latency. take(1) makes the intended demand explicit: you only want the worst enabled service.

5. Aggregate When A List Is Too Much

Reducers consume a sequence and return a single value.

QUERY:  $.services.map(@.latency_ms).avg()
OUT:    85.66666666666667

Group-style reducers return summaries that are easy to scan:

QUERY:  $.services.count_by(@.owner)
OUT:    {"data":1,"platform":2}

6. Build Operator-Friendly Strings

F-strings are useful for logs, labels, report fields, and shell output.

QUERY:
  $.services
    .filter(@.errors > 0)
    .map(f"{@.name}: {@.errors} errors")
OUT:
  ["api: 2 errors","worker: 9 errors"]

7. Classify Data With Pattern Matching

Pattern matching is a good fit for status payloads, event kinds, and tagged objects.

QUERY:
  $.deploys.map(d => match d with {
    {status:"fail",service:s} -> f"rollback {s}",
    {status:"ok",service:s} -> f"ship {s}",
    _ -> "inspect"
  })
OUT:
  ["ship api","rollback worker"]

Arms are checked top-down. Put specific cases before the fallback arm.

8. Search Deeply When The Path Is Not Stable

When you know the condition but not the exact location, use recursive descent.

QUERY:  $..find(@.status == "fail")
OUT:
  [
    {"service":"worker","sha":"b2","status":"fail"}
  ]

For known schemas, prefer direct paths. For exploratory work over unfamiliar payloads, deep search is often the fastest way to ask the first question.

9. Patch Documents

update returns the full document with the selected changes applied.

QUERY:  $.update({"meta.version": @ + 1, "services[*].checked": true})
OUT:
  {
    "deploys":[
      {"service":"api","sha":"a1","status":"ok"},
      {"service":"worker","sha":"b2","status":"fail"}
    ],
    "meta":{"env":"prod","version":8},
    "services":[
      {"checked":true,"enabled":true,"errors":2,"lang":"rust","latency_ms":42,"name":"api","owner":"platform","tags":["edge","json"]},
      {"checked":true,"enabled":true,"errors":9,"lang":"go","latency_ms":85,"name":"worker","owner":"data","tags":["queue"]},
      {"checked":true,"enabled":false,"errors":0,"lang":"ts","latency_ms":130,"name":"admin","owner":"platform","tags":["internal"]}
    ]
  }

The object keys are paths to update. The expression on the right is evaluated against the value at that path, so "meta.version": @ + 1 increments the current version.
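The path-keyed update rule can be modeled in a few lines. This Rust sketch walks a dotted object path such as "meta.version" and replaces the value there with a function of the old value, leaving other subtrees untouched. The Json enum and update_path are hypothetical; they cover plain object keys only, not wildcards like services[*], and are not Jetro's API.

```rust
/// Hypothetical JSON model; objects keep insertion order.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Num(f64),
    Bool(bool),
    Obj(Vec<(String, Json)>),
}

/// Replaces the value at a dotted object path (e.g. "meta.version") with
/// `f(old_value)`. Other subtrees are untouched. Returns false if a key on
/// the path is missing or a non-object is reached before the end.
pub fn update_path(doc: &mut Json, path: &str, f: impl FnOnce(&Json) -> Json) -> bool {
    let mut node = doc;
    for key in path.split('.') {
        match node {
            Json::Obj(fields) => match fields.iter_mut().find(|(k, _)| k == key) {
                Some((_, v)) => node = v,
                None => return false,
            },
            _ => return false,
        }
    }
    // Like `"meta.version": @ + 1`, the new value is computed from the old one.
    *node = f(node);
    true
}
```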

10. Row-Local NDJSON

Save this as events.ndjson:

{"ts":"10:00","service":"api","level":"info","ms":38}
{"ts":"10:01","service":"worker","level":"error","ms":220}
{"ts":"10:02","service":"api","level":"error","ms":91}

Run:

jetrocli --ndjson -i events.ndjson -e '$.service + ":" + $.level'

Output:

"api:info"
"worker:error"
"api:error"

Without $.rows(), NDJSON mode evaluates the expression once per line.

11. Whole-Stream NDJSON

Use $.rows() when the expression should see the NDJSON file as one stream.

jetrocli --ndjson -i events.ndjson \
  -e '$.rows().filter($.level == "error").map({service: $.service, ms: $.ms})'

Output:

{"service":"worker","ms":220}
{"service":"api","ms":91}

This is the mode for file-level filtering, slicing, grouping, latest-record queries, and compacted-topic inspection.

12. Latest Record Per Key

For Kafka-style records where the payload starts after |:

1|{"id":1,"name":"api old","active":false}
2|{"id":2,"name":"worker","active":true}
1|{"id":1,"name":"api","active":true}

Run:

jetrocli --ndjson -i topic.ndjson --payload-after '|' \
  -e '$.rows().reverse().distinct_by($.id).filter($.active).map({id: $.id, name: $.name})'

Output:

{"id":1,"name":"api"}
{"id":2,"name":"worker"}

Read from the end, keep the first row for each id, then filter and project. That is a compacted-topic audit query in one expression.
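The same latest-per-key logic, written by hand, shows why the order of stages matters. This Rust sketch assumes the payloads have already been decoded into a hypothetical Row struct; it is an illustration, not how jetrocli processes the file.

```rust
use std::collections::HashSet;

/// One decoded record payload (the part after `|`).
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub id: u64,
    pub name: String,
    pub active: bool,
}

/// reverse().distinct_by(id).filter(active): walk newest-first, keep the
/// first row seen for each id, then drop inactive rows.
pub fn latest_active(rows: &[Row]) -> Vec<Row> {
    let mut seen = HashSet::new();
    rows.iter()
        .rev()
        .filter(|r| seen.insert(r.id)) // insert returns false for a repeated id
        .filter(|r| r.active)
        .cloned()
        .collect()
}
```

Deduplicating before filtering is what makes this a latest-record query: if filter(active) ran first, an id whose newest record is inactive would surface an older, stale active record instead of disappearing.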

A Few Power Moves

The tour above keeps to the common path. These examples are worth knowing once you start writing longer queries.

Lambda Forms

The shorthand @ form is usually enough, but named lambdas are useful when an expression gets dense:

QUERY:  $.services.filter(s => s.latency_ms > 80).map(s => s.name)
OUT:    ["worker","admin"]

These forms are equivalent where a single current item is in scope:

$.services.filter(@.enabled)
$.services.filter(.enabled)
$.services.filter(lambda s: s.enabled)

Schema Checks

Use has_key for object-key existence, includes for value membership, and missing for compact schema checks:

QUERY:
  $.services.map(s => {
    name: s.name,
    has_json_tag: s.tags.includes("json"),
    missing: s.missing("owner", "tags", "runtime")
  })
OUT:
  [
    {"has_json_tag":true,"missing":["runtime"],"name":"api"},
    {"has_json_tag":false,"missing":["runtime"],"name":"worker"},
    {"has_json_tag":false,"missing":["runtime"],"name":"admin"}
  ]

Guards In Pattern Matching

Patterns can bind fields, and guards can refine the match:

QUERY:
  $.services.map(s => match s with {
    {enabled:false,name:n} -> f"disabled {n}",
    {latency_ms:ms,name:n} when ms > 100 -> f"slow {n}",
    {name:n} -> f"ok {n}"
  })
OUT:
  ["ok api","ok worker","disabled admin"]

Pipe Value Flow

| passes the value on the left into the right expression as @. It is value flow, not method dispatch:

QUERY:  $.services.count() | "found " + (@ as string) + " services"
OUT:    "found 3 services"

Conditional Updates

Filtered wildcards let updates target many items without writing a host loop:

QUERY:
  $.services[* if errors > 5].update({
    tags: tags.append("hot"),
    checked: true
  })

The result is still the full document with untouched subtrees preserved.

Demand-Aware Queries

These are ordinary queries:

$.services.map(@.name).last()
$.services.filter(@.enabled).first()
$.services.sort_by(-latency_ms).take(2)

Jetro plans from the demanded result backward. Pure one-to-one maps can be delayed, first and take can bound input, and tape-backed sources can avoid materializing values until a stage actually needs them.
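Rust's own iterators give a rough analogy for demand propagation. In this sketch, the map closure counts how often it runs; because take(n) pulls only n items, the closure runs n times no matter how long the input is. This illustrates the idea only; it is not how Jetro's planner is implemented.

```rust
use std::cell::Cell;

/// Runs `xs.map(x * 10).take(n)` lazily and reports how many times the map
/// body actually executed. A Rust-iterator analogy, not Jetro's planner.
pub fn mapped_calls_for_take(xs: &[i64], n: usize) -> (Vec<i64>, usize) {
    let calls = Cell::new(0usize);
    let out: Vec<i64> = xs
        .iter()
        .map(|x| {
            calls.set(calls.get() + 1); // one evaluation of the map stage
            x * 10
        })
        .take(n) // the sink's demand: stop pulling after n items
        .collect();
    (out, calls.get())
}
```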

Rust Embedding

Use the small facade for one document:

let j = jetro::Jetro::from_bytes(bytes)?;
let out = j.collect("$.services.filter(@.enabled).map(@.name)")?;

Use JetroEngine when you want a long-lived engine with plan and VM reuse:

use jetro::JetroEngine;
use serde_json::json;

let eng = JetroEngine::default();
let doc = json!({"services":[{"latency_ms":42},{"latency_ms":85}]});
let v = eng.collect_value(doc, "$.services.map(@.latency_ms).sum()")?;
assert_eq!(v, json!(127));

You now have the core mental model: path, chain, project, reduce, patch, and stream.

Grammar Overview

The jetro DSL is a small, expression-oriented language. There are no statements at the top level — every program is an expression that produces a value (or, in the case of patches, a rewritten document).

The grammar lives in grammar.pest and is parsed by pest.

Five things that make jetro different

  1. Method calls use dot syntax. xs.map(f), not xs | map(f).
  2. Pipe | is value-flow. x | expr evaluates expr with @ bound to x.
  3. @ is the current value. Inside .filter(...) it's the row; at the top level it's the input.
  4. Bare paths inside method args. .filter(age > 18) is sugar for .filter(@.age > 18).
  5. Writes are queries. $.x.set(v) is parsed as a query that produces a patched document, not a mutation.

Categories of syntax

Category       Forms                                                                  Chapter
Paths          $, @, .field, [idx], [*], [start:end:step], ..desc, {pred}             Paths
Operators      arithmetic, comparison, logical, pipe, coalesce, ternary, kind, cast   Operators
Methods        .name(args), lambdas (@, =>, lambda)                                   Lambdas
Literals       numbers, strings, f-strings, arrays, objects, regex                    Literals
Control flow   match, ternary, try, comprehensions                                    Control Flow
Writes         patch $ {…}, chain-write terminals                                     Patch

A handy precedence table sits at the end of this part.

A worked sample

$.users
  .filter(u => u.active and u.age >= 18)
  .map(u => { id: u.id, name: u.name, email: u.email })
  .sort(@.name)
  .take(10)

That's: root, field users, predicate filter (named lambda), object-mapping, sort by name, take first 10.

Comments

There are no comments inside a query. Strip them client-side before calling jetro, or factor commentary into the surrounding host program.

Whitespace

Whitespace and newlines are insignificant between tokens. Keep queries on one line in CLIs; break across multiple lines in source.

Paths and Navigation

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5]}

A path is the part of a query that walks into the document. Paths start at a root marker ($, @, or an identifier inside a lambda) and chain steps left-to-right.

Roots

Form   Meaning
$      The whole input document (top-level root)
@      The current value (set by .filter, .map, |, etc.)
name   A let-bound name or lambda parameter

DOC:    {"x": 10}
QUERY:  $
OUT:    {"x":10}

QUERY:  $.x | @ + 1
OUT:    11

Field access

DOC:    {"user": {"name": "Ada"}}
QUERY:  $.user.name
OUT:    "Ada"

Field names may also use string keys via ["name"]:

QUERY:  $["user"]["name"]

Use the bracket form when the key contains characters disallowed in identifiers (-, spaces, dots inside the key, leading digits).

Indexing arrays

DOC:    {"xs": [10, 20, 30, 40]}
QUERY:  $.xs[0]
OUT:    10

QUERY:  $.xs[-1]
OUT:    40

Negative indices count from the end.

Slicing

QUERY:  $.xs[1:3]
OUT:    [20,30]

QUERY:  $.xs[:2]
OUT:    [10,20]

QUERY:  $.xs[2:]
OUT:    [30,40]

QUERY:  $.xs[0:4:2]
OUT:    [10,30]
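For reference, here is a Rust sketch of the index and slice rules shown above: negative positions count from the end and slices are half-open. It assumes a positive step and Python-style clamping of out-of-range bounds; slice_range and index_at are hypothetical helpers, not part of Jetro.

```rust
/// Resolves a possibly negative position against `len`, Python-style:
/// -1 is the last element. The result is clamped into 0..=len.
fn resolve(i: i64, len: usize) -> usize {
    let len = len as i64;
    let i = if i < 0 { i + len } else { i };
    i.clamp(0, len) as usize
}

/// Half-open slice `xs[start:end:step]` for a positive step.
/// An omitted bound is `None`, as in `[:2]` or `[2:]`.
pub fn slice_range<T: Clone>(xs: &[T], start: Option<i64>, end: Option<i64>, step: usize) -> Vec<T> {
    assert!(step > 0, "this sketch only models positive steps");
    let lo = start.map_or(0, |s| resolve(s, xs.len()));
    let hi = end.map_or(xs.len(), |e| resolve(e, xs.len()));
    if lo >= hi {
        return Vec::new();
    }
    xs[lo..hi].iter().step_by(step).cloned().collect()
}

/// Single index `xs[i]`; negative indices count from the end.
pub fn index_at<T>(xs: &[T], i: i64) -> Option<&T> {
    let len = xs.len() as i64;
    let i = if i < 0 { i + len } else { i };
    if (0..len).contains(&i) { xs.get(i as usize) } else { None }
}
```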

Wildcards

QUERY:  $.xs[*]
OUT:    [10,20,30,40]

[*] is "every element". Most users prefer chained methods (.filter, .map) which already iterate.

Filtered wildcard [* if pred]

A predicated wildcard — keeps only elements satisfying pred (with @ bound to the candidate).

DOC:    {"books": [{"title": "Dune", "year": 1965}, {"title": "Hyperion", "year": 1989}]}
QUERY:  $.books[* if year > 1980]
OUT:    [{"title":"Hyperion","year":1989}]

Equivalent to [*] immediately followed by an inline-filter {cond}, but stays on the path side of parsing. Particularly useful inside .update selectors and quoted patch path keys (see Patch).

Chaining a bare field step after a filtered wildcard collapses to null — chain a method instead:

QUERY:  $.books[* if year > 1980].map(@.title)
OUT:    ["Hyperion"]

Inline filter

{predicate} after a path step keeps only matching elements:

DOC:    {"books": [{"year": 1965}, {"year": 1989}]}
QUERY:  $.books{@.year > 1970}
OUT:    [{"year":1989}]

This is shorthand for .filter(@.year > 1970). Use .filter when you want named-lambda forms.

Recursive descent

.. walks every descendant value in DFS pre-order:

DOC:    {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
QUERY:  $..x
OUT:    [1,2,3]

Combine with method calls (no space):

QUERY:  $..find(@.year < 1960)
QUERY:  $..shape({year, title})
QUERY:  $..like({author: "Asimov"})

The deep variants are bitmap-accelerated when a structural index is available.
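To pin down what DFS pre-order means for $..x, this Rust sketch walks a hypothetical Json enum (objects keep insertion order) and collects every value stored under a key, parents before children. It is a naive recursive walk, not the bitmap-accelerated path.

```rust
/// Hypothetical JSON model; objects keep insertion order.
#[derive(Debug, PartialEq)]
pub enum Json {
    Num(f64),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Collects every value stored under `key` anywhere below `node`,
/// visiting nodes in depth-first pre-order (parent before children).
pub fn descendants<'a>(node: &'a Json, key: &str, out: &mut Vec<&'a Json>) {
    match node {
        Json::Obj(fields) => {
            for (k, v) in fields {
                if k == key {
                    out.push(v); // record the match before descending into it
                }
                descendants(v, key, out);
            }
        }
        Json::Arr(items) => {
            for item in items {
                descendants(item, key, out);
            }
        }
        Json::Num(_) => {}
    }
}
```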

Dynamic keys

Compute a key at runtime:

DOC:    {"realnames": {"abc": "Ada"}, "post": {"author": "abc"}}
QUERY:  $.realnames[$.post.author]
OUT:    "Ada"

Inside a lambda:

DOC:    {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY:  $.posts.map(p => $.realnames[p.author])
OUT:    ["Ada"]

Quantifiers (postfix)

Form    Meaning
step?   Optional: return null instead of an error if missing
step!   Exactly one: error if zero or many

DOC:    {"xs": [42]}
QUERY:  $.xs!
OUT:    [42]

QUERY:  $.maybe?
OUT:    null      # absent, no error

Path after a method

Paths and methods are interchangeable steps:

$.users.filter(@.active).pick(name, email)[0]

That's: field, method, method, index. There is no special "tail position".

Paths inside method args need a root

Inside method-call arguments, paths must start with @ (current item), $ (document root), or a bound name. Bare-path forms like .field do not parse:

$.users.filter(@.age > 18)        # ✓ @-form
$.users.filter(u => u.age > 18)   # ✓ named lambda
$.users.filter(.age > 18)         # ✗ parse error
$.users.map(@.name)               # ✓
$.users.map(.name)                # ✗

The same rule applies to inline filters: $.xs{@.k > 1} works, $.xs{.k > 1} does not.

Top-level paths still need $.

Summary

Step                Example          Notes
Root                $, @             One per chain (or implicit @ in args)
Field               .name            Use ["..."] for tricky keys
Index               [3], [-1]        Negative counts from end
Slice               [1:5], [::2]     Half-open like Python
Wildcard            [*]              Whole array
Filtered wildcard   [* if pred]      Wildcard restricted by predicate (@ = element)
Descendant          ..name, ..       DFS pre-order
Inline filter       {cond}           Sugar for .filter
Dynamic key         [expr]           Expression resolves to key
Quantifier          ?, !             Postfix on a step

Operators

Jetro has the operators you'd expect plus a small number of extras that come up in JSON work.

Arithmetic

1 + 2          # 3
3 - 1          # 2
2 * 3          # 6
6 / 2          # 3
7 % 3          # 1
-x             # unary negation

+ on strings concatenates: "foo" + "bar" → "foobar".

+ on arrays concatenates: [1,2] + [3] → [1,2,3].

Comparison

a == b         # equality
a != b         # inequality
a < b          # less than
a <= b
a > b
a >= b

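== and != accept operands of any type. Values of the same kind compare as you would expect (strings with strings, numbers with numbers); values of different kinds are never equal, so a cross-type comparison returns false for == and true for !=.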
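The equality rule can be written as a small Rust function over a hypothetical value enum. This sketch uses a single number variant for simplicity; json_eq and json_ne are illustrative names, not Jetro functions.

```rust
/// Hypothetical value model used only to spell out the equality rule.
#[derive(Debug)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
}

/// `a == b`: same-kind values compare normally; any cross-kind pair is unequal.
pub fn json_eq(a: &Json, b: &Json) -> bool {
    use Json::*;
    match (a, b) {
        (Null, Null) => true,
        (Bool(x), Bool(y)) => x == y,
        (Num(x), Num(y)) => x == y,
        (Str(x), Str(y)) => x == y,
        _ => false, // cross-type: == is false
    }
}

/// `a != b` is the exact negation, so cross-type != is true.
pub fn json_ne(a: &Json, b: &Json) -> bool {
    !json_eq(a, b)
}
```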

Logical

a and b        # short-circuit AND
a or b         # short-circuit OR
not a          # negation

Truthiness: null, false, 0, "", [], {} are falsy. Everything else is truthy.
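Spelled out as a Rust function over a hypothetical Json enum, the truthiness table looks like this. Only the empty string is falsy; a non-empty string such as "0" is truthy.

```rust
/// Hypothetical JSON model used only to spell out the truthiness table.
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// null, false, 0, "", [] and {} are falsy; everything else is truthy.
pub fn truthy(v: &Json) -> bool {
    match v {
        Json::Null => false,
        Json::Bool(b) => *b,
        Json::Num(n) => *n != 0.0,
        Json::Str(s) => !s.is_empty(),
        Json::Arr(items) => !items.is_empty(),
        Json::Obj(fields) => !fields.is_empty(),
    }
}
```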

Pipe

value | expr

Evaluates expr with @ bound to value. It is not a method-call shorthand.

DOC:    {"x": 10}
QUERY:  $.x | @ * 2
OUT:    20

QUERY:  $.x | f"got {@}"
OUT:    "got 10"

To call a method, use dot syntax: $.x.upper(), not $.x | upper.

Coalesce

a ?? b

Returns a unless it is null, in which case it returns b.

DOC:    {"name": null}
QUERY:  $.name ?? "anon"
OUT:    "anon"
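?? only checks for null, which is narrower than truthiness. The Rust sketch below, over a hypothetical three-variant Json enum, shows that a present-but-falsy value such as 0 is kept rather than replaced.

```rust
/// Hypothetical JSON model, trimmed to what the example needs.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Num(f64),
    Str(String),
}

/// `a ?? b`: keep `a` unless it is null. Only null triggers the fallback,
/// so falsy-but-present values such as 0 or "" are kept.
pub fn coalesce(a: Json, b: Json) -> Json {
    match a {
        Json::Null => b,
        other => other,
    }
}
```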

Ternary

Python-style — postfix condition:

"hot" if temp > 30 else "cool"
DOC:    {"temp": 35}
QUERY:  "hot" if $.temp > 30 else "cool"
OUT:    "hot"

Kind tests

v is number
v is string
v is array
v is object
v is null
v is bool

Returns boolean.

QUERY:  $.x is number

Cast

x as int
x as float
x as string
x as bool
x as array
x as object

Coerces the value. Whether an impossible cast returns null depends on the specific cast.

"42" as int        # 42
42 as string       # "42"

Membership

xs has v           # array membership: true if v is in xs
o  has "k"         # object membership: true if key "k" exists

There is no v in xs operator; that form is a parse error. Use the infix has operator above, or call .includes(v) (arrays/strings) explicitly:

$.tags.includes("hugo")    # ✓
"hugo" in $.tags           # ✗ parse error

Regex match

s ~= "pattern"

Returns a boolean. Patterns use Rust regex syntax. When you need capture groups or match details, use .captures or .match_first instead; see String Search.

Boolean shortcut on patches

In a patch $ { … } body, a key when condition clause skips the assignment when condition is falsy. See Patch.

Examples

DOC:    {"books": [{"year": 1965, "tags": ["sf"]}, {"year": 1989, "tags": ["sf","hugo"]}], "year_floor": 2000}

QUERY:  $.books.filter((@.year > 1970 and @.tags.includes("hugo")) or @.year >= $.year_floor)
OUT:    [{"tags":["sf","hugo"],"year":1989}]

QUERY:  $.books[0].year ?? 9999
OUT:    1965

QUERY:  $.books.map(b => "old" if b.year < 1970 else "new")
OUT:    ["old","new"]

Membership in jetro is xs.includes(v) (or xs.has(v) for objects/arrays); there is no in operator. Wrap mixed and/or expressions in parentheses: jetro follows standard binding (and binds tighter than or), but explicit parens read more clearly.

Lambdas and Method Calls

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5], "pairs": [["a", 1], ["b", 2], ["c", 3]]}

Methods take arguments. Most are plain values; a common exception is a lambda, a small function evaluated per element. Jetro accepts three lambda syntaxes; pick whichever reads best.

The @-form

@ is the current item. Inside method args, prefix paths with @ to walk into it:

$.users.filter(@.age >= 18)
$.users.map(@.name)
$.xs{@.active}                  # inline filter must also use @

Leading-dot shorthand .age inside method args desugars to @.age — the two forms are equivalent and the planner sees identical opcodes.

$.users.filter(.age >= 18)
$.users.map(.name)
$.xs{.active}                    # works inside inline filters too

Arrow-form named lambda

$.users.filter(u => u.age >= 18)
$.users.map((u) => u.name)

The parens around the parameter are optional for one parameter.

For multiple parameters, destructure an array-shaped item such as a [key, value] pair:

$.pairs.map(([k, v]) => k + ":" + (v as string))

Python-style lambda keyword

$.users.filter(lambda u: u.age >= 18)
$.users.map(lambda u: u.name)

Functionally identical to the arrow form. Useful when porting from Python.

Performance

Named lambdas (u => u.x, lambda u: u.x) and the @-form compile to the same bytecode. Benchmarks confirm parity (3.42 ms vs 3.44 ms / 100K rows in the lambda regression suite). Pick what reads best — there is no perf reason to prefer @.

Method call basics

.method()                       # no args
.method(arg)                    # one positional
.method(arg1, arg2)             # multiple
.method(name=value)             # named (a few methods support these)
.method(arg1, name=value)       # mixed

Examples:

$.xs.take(3)
$.xs.replace("foo", "bar")
$.xs.join(",")
$.xs.sort(@.year)                # sort by key projection

Methods inside method args

Lambdas can chain methods just like top-level queries:

$.posts.map(p => p.tags.unique().count())
$.users.filter(u => u.email.starts_with("admin"))

Multi-arg lambdas with destructuring

Some barriers (e.g. pairwise) yield 2-tuples. Destructure them:

$.xs.pairwise().map(([a, b]) => b - a)

Captured $

Inside a lambda, $ still means "the document root" — it does not get shadowed by the lambda parameter:

DOC:    {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY:  $.posts.map(p => $.realnames[p.author])
OUT:    ["Ada"]

First-class lambdas via let

Bind a lambda once, use it many times:

let by_year = (b => b.year < 1970) in
  $.books.filter(by_year)

The let-bound lambda is inlined at every method-arg use before compilation, so it has zero closure overhead — exactly the same code as if you'd written the body directly in .filter(...).

Outside method-arg position, the binding is a normal name reference.

Literals

Scalars

null
true     false
42       3.14     -7    1.5e3
"double-quoted"   'single-quoted'

Strings allow standard escapes (\n, \t, \\, \", \uXXXX).

F-strings

f"…" interpolates {expression}:

DOC:    {"name": "Ada", "age": 36}
QUERY:  f"hi {$.name}, you are {$.age + 1} next year"
OUT:    "hi Ada, you are 37 next year"

Inside a lambda:

$.users.map(u => f"{u.name} <{u.email}>")

Escape literal braces with {{ and }}:

f"{{not interpolated}}"      # "{not interpolated}"
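The brace-escape rule can be sketched as a tiny Rust interpolator. It is deliberately limited: interpolate and its variable map are hypothetical, and only bare names are substituted, whereas real f-strings evaluate full expressions.

```rust
use std::collections::HashMap;

/// Expands `{name}` from `vars` and turns `{{` / `}}` into literal braces.
/// Returns None on an unknown name or an unbalanced brace.
pub fn interpolate(template: &str, vars: &HashMap<&str, String>) -> Option<String> {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '{' if chars.peek() == Some(&'{') => {
                chars.next(); // `{{` -> literal `{`
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next(); // `}}` -> literal `}`
                out.push('}');
            }
            '{' => {
                // Read the placeholder name up to the closing brace.
                let mut name = String::new();
                loop {
                    match chars.next()? {
                        '}' => break,
                        ch => name.push(ch),
                    }
                }
                out.push_str(vars.get(name.trim())?);
            }
            '}' => return None, // stray closing brace
            _ => out.push(c),
        }
    }
    Some(out)
}
```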

Arrays

[1, 2, 3]
["a", "b"]
[$.x, $.y, 99]              # values can be expressions

[...$.xs, 4, 5]             # spread
[1, ...mid, 9]              # spread anywhere

Heterogeneous arrays are fine: [1, "a", null, [2,3]].

Objects

{name: "Ada", age: 36}            # bare-key (identifier-like)
{"name": "Ada"}                   # quoted-key (any string)

{x, y}                            # shorthand: same as {x: x, y: y}

{[dyn_key]: 1}                    # computed key
{...obj, extra: 1}                # spread
{...**deep}                       # deep recursive spread

{name: "Ada", role: "admin" when $.is_admin}
                                  # conditional value (omit if cond falsy)

Regex literals

Regex patterns are written as strings and appear as the right operand of ~= or as arguments to regex builtins:

$.s ~= "^[A-Z]+$"
$.text.scan("\d+")

Patterns use Rust's regex crate syntax.

Numeric notes

Jetro distinguishes integers from floats internally where possible. 42 and 42.0 compare equal but a downstream sink that requires "integer" (e.g. indexing) will only accept the former.

Negative literals: -7 is a unary-negated literal — the parser handles this correctly without ambiguity in arithmetic positions (a - 7 is subtraction, a + -7 is addition with -7).
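The integer/float distinction can be modeled with a two-variant Rust enum: equality compares by value, while an integer-only sink such as indexing rejects floats. Num, num_eq, and index_into are hypothetical names for this sketch, which also follows the negative-index rule from Paths.

```rust
/// Hypothetical number representation that keeps ints and floats apart.
#[derive(Debug, Clone, Copy)]
pub enum Num {
    Int(i64),
    Float(f64),
}

/// 42 and 42.0 compare equal by value.
pub fn num_eq(a: Num, b: Num) -> bool {
    match (a, b) {
        (Num::Int(x), Num::Int(y)) => x == y,
        (Num::Float(x), Num::Float(y)) => x == y,
        (Num::Int(i), Num::Float(f)) | (Num::Float(f), Num::Int(i)) => i as f64 == f,
    }
}

/// An integer-only sink: Int(1) indexes, Float(1.0) is rejected.
pub fn index_into<T>(xs: &[T], n: Num) -> Option<&T> {
    match n {
        Num::Int(i) => {
            let len = xs.len() as i64;
            let i = if i < 0 { i + len } else { i }; // negative counts from the end
            if (0..len).contains(&i) { xs.get(i as usize) } else { None }
        }
        Num::Float(_) => None,
    }
}
```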

Control Flow

Ternary

Python-style:

expr if condition else fallback
DOC:    {"x": 10}
QUERY:  "big" if $.x > 5 else "small"
OUT:    "big"

Right-associative; chain via parens for clarity.

Try / else

Catch evaluation errors:

try expr else fallback
QUERY:  try $.maybe.deep.path else "missing"
OUT:    "missing"

QUERY:  try $.xs[0].name.upper() else "n/a"

The ? quantifier handles the "missing field" subset more concisely: $.maybe? returns null instead of erroring.

let … in …

Local bindings:

let x = $.users.count() in
  f"there are {x} users"

Multi-binding:

let a = 1, b = 2 in a + b   # equiv: let a=1 in let b=2 in a+b

let shines for first-class lambdas — see Lambdas.

Pattern match

match value with {
  pattern1 -> expr1,
  pattern2 when guard -> expr2,
  _ -> default
}

Patterns

Pattern               Matches
42, "x", true, null   Equal literal
_                     Any value
name                  Any value, bound to name
1..10                 Number ≥ 1 and < 10
1..=10                Number ≥ 1 and ≤ 10
{k1: p1, k2: p2}      Object with these keys, each matching
{id, name}            Object shorthand; binds id and name
{id, ...*rest}        Object with rest capture
[p1, p2]              Array of length 2, each matching
[h, ...t]             Head + tail
p1 | p2               Either pattern (or-pattern)
x: number             Kind-bound: matches if x is a number

Guards

match $.x with {
  v when v > 100 -> "big",
  v when v > 10 -> "medium",
  _ -> "small"
}

Worked example

DOC:    {"event": {"kind": "click", "x": 100, "y": 200}}
QUERY:
  match $.event with {
    {kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
    {kind: "key",   code: c}       -> f"key:{c}",
    _ -> "unknown"
  }
OUT:    "click@100,200"

Deep match

$..match { pattern -> expr, _ -> null }

Walks every descendant; returns matched results as an array.

$..match! { pattern -> expr }      # first match only, early-stops

The bang variant terminates as soon as one match succeeds (uses the bitmap structural index when available).

Comprehensions

Jetro supports list, dict, set, and generator comprehensions over both literal and path-rooted sources. Pair destructure works in two interchangeable forms (for k, v in ... and for [k, v] in ...), and multiple if clauses are folded with and.

List

[expr for x in source if cond1 if cond2 ...]
DOC:    {"xs": [1, 2, 3, 4, 5]}

QUERY:  [n*n for n in $.xs if n > 2]
OUT:    [9,16,25]

QUERY:  [n for n in $.xs if n > 1 if n < 5]
OUT:    [2,3,4]

Object

{key: value for x in source if cond}
{k: v for [k, v] in pairs}
{k: v for k, v in pairs}
DOC:    {"pairs": [["a", 1], ["b", 2]]}

QUERY:  {k: v for [k, v] in $.pairs}
OUT:    {"a":1,"b":2}

QUERY:  {n: n*n for n in [1,2,3]}
OUT:    {"1":1,"2":4,"3":9}

Iterating an object yields {key, value} records:

DOC:    {"o": {"a": 1, "b": 2}}
QUERY:  {e.key: e.value*10 for e in $.o}
OUT:    {"a":10,"b":20}

Set

Deduplicating comprehension. Returns an array of unique values.

QUERY:  {n*n for n in [-2, -1, 0, 1, 2]}
OUT:    [4,1,0]
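Judging by the [4,1,0] output, the set form keeps the first occurrence of each value. A minimal Rust sketch of that order-preserving deduplication, assuming first-seen order is the rule; set_comprehension is an illustrative name, not a Jetro API:

```rust
use std::collections::HashSet;

/// Apply `f` to each element and keep only the first occurrence of each
/// result, preserving the order in which results were first produced.
fn set_comprehension<F>(source: &[i64], f: F) -> Vec<i64>
where
    F: Fn(i64) -> i64,
{
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &x in source {
        let y = f(x);
        // `insert` returns false when the value was already present.
        if seen.insert(y) {
            out.push(y);
        }
    }
    out
}

fn main() {
    // Mirrors {n*n for n in [-2, -1, 0, 1, 2]}
    let squares = set_comprehension(&[-2, -1, 0, 1, 2], |n| n * n);
    println!("{:?}", squares); // [4, 1, 0]
}
```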

Generator

(x for x in items)

Same semantics as the list form; useful as a lazy source for a downstream reducer or barrier.

if-on-patch

Inside a patch $ {…} body, key: expr when cond skips the assignment when cond is falsy:

patch $ {
  status: "active" when $.verified
}

See Patch.

Patch and Writes

Fixture

Examples below run against:

DOC:    {"user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "xs": [1, 2, 3, 4, 5]}

Jetro treats writes as queries: a write returns the patched document. Chain-write terminals and patch blocks are two equivalent surfaces for the same writes.

Chain-write terminals

Add a write method at the end of a rooted path:

Method             Effect
.set(v)            Replace the value at this path with v
.modify(expr)      Replace, with @ bound to the current value
.delete()          Remove the leaf
.unset(key)        Remove key from the leaf object
.merge({…})        Shallow-merge into the leaf object
.deep_merge({…})   Recursive merge
.append(v)         Push to the leaf array
.prepend(v)        Unshift onto the leaf array

DOC:    {"user": {"name": "Ada", "tags": ["math"]}}

QUERY:  $.user.name.set("Ada Lovelace")
OUT:    {"user":{"name":"Ada Lovelace","tags":["math"]}}

QUERY:  $.user.tags.append("code")
OUT:    {"user":{"name":"Ada","tags":["math","code"]}}

QUERY:  $.user.unset(tags)
OUT:    {"user":{"name":"Ada"}}

QUERY:  $.user.modify(u => u.merge({active: true}))
OUT:    {"user":{"active":true,"name":"Ada","tags":["math"]}}

The write-terminal classifier fires only when the base of the chain is $. Inside lambdas ($.xs.map(@.set(...))) the method remains a regular call, which is useful when a sub-pipeline wants the old "return the new value" semantics.

patch $ { … } block

The same operation expressed as a block:

patch $ {
  user.name: "Ada Lovelace",
  user.tags: DELETE
}

Block syntax is best for multiple writes — it batches them through a single fused pass (see Write Fusion).

Block clause            Meaning
path: value             Assignment
path: DELETE            Removal
path: value when cond   Conditional
path[*]: value          Broadcast over an array

Conditional writes

patch $ {
  status: "active" when $.verified,
  retired_at: now() when $.retired
}

If the condition is falsy, the assignment is skipped entirely — neither written nor zeroed.

Broadcast over arrays

DOC:    {"items": [{"x": 1}, {"x": 2}, {"x": 3}]}

QUERY:  $.items[*].x.set(0)
OUT:    {"items":[{"x":0},{"x":0},{"x":0}]}

Pipe form preserves "return-the-new-value"

Some users prefer the v1 behavior where a write inside a .map returned the written value, not the patched root:

$.items.map(item => item | set(item.x + 1))

The pipe form value | set(new) keeps that meaning.

Modify with a lambda

$.user.modify(u => u.merge({last_seen: now()}))

modify evaluates its argument with the current value bound to @ (or to the lambda's parameter, u here), then writes the result back at the same path.

Multiple writes in one query

Either chain them:

$.user.name.set("Ada").tags.append("admin")

or use a block:

patch $ {
  user.name: "Ada",
  user.tags[*]: "active"   # broadcast
}

The planner detects multi-write patterns and routes them through the patch-fusion optimizer, which lowers repeated path traversals into a single fused write pass.

Functional .update({...})

A third surface, written as a method call:

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}

QUERY:  $.books[*].update({tags: tags.append("modern") when year > 1980, reviewed: true})
OUT:    {"books":[{"reviewed":true,"tags":["sf"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

Use .update when you want all of the following at once:

  • A selector chosen with chain syntax ($.books[*], $.books[* if year > 1980])
  • An object body listing multiple field updates evaluated against each selected snapshot
  • The same when / DELETE semantics as patch $ { ... }
  • Quoted path keys ("books[*].tags") when the receiver is $, giving root-level batched updates without an explicit selector

.update parses to its own AST node (UpdateBatch) so the planner can keep the user-level shape — useful for selector pushdown, demand analysis, and fusion. See Path Mutation → update for the full argument matrix.

Filtered wildcard [* if pred]

A predicated wildcard inside a path. Available wherever [*] is, and particularly useful inside .update selectors and quoted path keys:

DOC:    {"books": [
  {"title": "Dune", "year": 1965},
  {"title": "Hyperion", "year": 1989}
]}

QUERY:  $.books[* if year > 1980]
OUT:    [{"title":"Hyperion","year":1989}]

The predicate runs with @ bound to the candidate element. Elements for which it is falsy are skipped from the path traversal entirely.

Wildcard .modify chains

Wildcard chain-writes are lowered to a fused patch:

DOC:    {"books": [{"tags": ["sf"]}, {"tags": ["hugo"]}]}
QUERY:  $.books[*].tags.modify(@.append("test"))
OUT:    {"books":[{"tags":["sf","test"]},{"tags":["hugo","test"]}]}

Caveats

  • .replace(needle, with) is not a write terminal — it is the string-replace builtin.
  • The classifier only triggers on chains rooted at $. Use the block syntax when the base path is computed.
  • DELETE is a marker, not a value — you can't store it in a binding.

Precedence Table

Lowest precedence at the top. Operators on the same row associate left unless noted.

Level  Operators                                              Associativity  Notes
1      if … else …, try … else …                              right          Ternary, try-else
2      |, |>                                                  left           Pipe (value-flow)
3      ??, ?|                                                 right          Coalesce
4      or                                                     left           Logical OR (short-circuit)
5      and                                                    left           Logical AND (short-circuit)
6      not                                                    n/a            Logical NOT (prefix)
7      is, kind, is not                                       n/a            Kind test
8      has                                                    left           Membership operator (no in; use .includes(v))
9      ==, !=, <, <=, >, >=, ~=                               left           Comparison
10     +, -                                                   left           Additive (and string/array concat)
11     *, /, %                                                left           Multiplicative
12     as                                                     left           Cast
13     - (unary)                                              n/a            Negation
14     .field, .method(), [idx], {cond}, ?, !                 left           Postfix steps
15     $, @, literal, (...), lambda, let, match, patch, comp  n/a            Primary

Common pitfalls

Pipe vs method call.

$.x | upper           # ✗ — interprets `upper` as a name to pipe into
$.x.upper()           # ✓ — method call

Comparison chains.

1 < x < 10            # ✗ — parses as `(1 < x) < 10`
1 < x and x < 10      # ✓

Ternary mid-chain.

$.x.upper() if cond else $.x   # parses fine — the ternary wraps the whole
                                # left expression

Negation tightness.

not a == b            # parses as `not (a == b)`: `not` binds looser than `==`
not (a == b)          # parens make the grouping explicit
a != b                # cleanest

Coalesce + comparison.

$.x ?? 0 > 5          # parses as `$.x ?? (0 > 5)`: `??` binds looser than `>`; write `($.x ?? 0) > 5`

Try captures errors only.

try $.x.parse_int() else 0

try does not catch falsy-as-error — only actual evaluation errors (missing field, bad cast, regex failure, etc.).

Pipelines

A jetro query is a pipeline of stages. The shape is always:

Source → Stage* → Sink

Source produces values one at a time. Each Stage consumes one value and produces zero, one, or many. The Sink collects results.

What counts as a stage

Stage        Examples                                      Output
One-to-one   .map, .enumerate, .lag, .zscore               One out per in
Filter       .filter, .find, .compact, .takewhile          Zero or one out per in
Expander     .flat_map, .flatten, .split, .lines, .chars   Many out per in
Reducer      .sum, .count, .min, .any, .find_index         One total
Positional   .first, .last, .nth(i), .collect              One or N
Barrier      .sort, .unique, .group_by, .window, .chunk    Buffers, then emits

A reducer or positional terminator ends the pipeline; further methods chain on the result (a scalar or array) rather than streaming.

Streaming vs. barrier

Most stages stream — they process one value, emit, repeat. The pull-based backend means each value travels end-to-end before the next is fetched. This is what makes early termination work (.first, .find).

Barriers cannot stream: .sort must see every element before it can emit any. The pipeline buffers up to the barrier, runs the barrier as a unit, then resumes streaming if more stages follow.

$.xs.map(f).filter(p).sort(@.x).take(10).map(g)
    \_______________/\________/\______________/
        streaming     barrier  streaming again

Barriers carry an apply_barrier method on the builtin.
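The difference between streaming and a barrier can be seen with ordinary Rust iterators, which are also pull-based. This is an analogy, not Jetro's pipeline code: each function counts how many source elements it pulls before producing its first match, once through a lazy chain and once through a sort barrier.

```rust
use std::cell::Cell;

/// Streaming: map -> filter -> first. Returns the first match and how many
/// source elements were pulled to find it.
fn streaming_first(xs: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0usize);
    let first = xs
        .iter()
        .map(|&x| {
            pulled.set(pulled.get() + 1); // count every element the source hands out
            x * 2
        })
        .filter(|x| x % 6 == 0)
        .next(); // early termination: stop at the first passing element
    (first, pulled.get())
}

/// Barrier: sorting must see every element before it can emit any, so the
/// whole source is pulled even though only one value is kept.
fn barrier_first(xs: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0usize);
    let mut buffered: Vec<i64> = xs
        .iter()
        .map(|&x| {
            pulled.set(pulled.get() + 1);
            x * 2
        })
        .collect(); // the barrier point: buffer everything
    buffered.sort();
    let first = buffered.into_iter().filter(|x| x % 6 == 0).next();
    (first, pulled.get())
}

fn main() {
    let xs: Vec<i64> = (1..=10).collect();
    println!("streaming: {:?}", streaming_first(&xs)); // (Some(6), 3)
    println!("barrier:   {:?}", barrier_first(&xs)); // (Some(6), 10)
}
```

Both return the same value, but the streaming shape stops after three pulls while the sorted shape pulls all ten.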

Sources

The most common source is a path: $.users is a source. Other shapes:

  • An array literal ([1,2,3].map(f))
  • A range ((0..10).map(f))
  • A method that returns a sequence ($.text.lines().map(...))

Sinks

If your final stage is a reducer, the sink is the reducer's accumulator. If it's a streaming stage, the sink collects into an array.

.collect() is the explicit sink: scalar in → [scalar], array in → identity, null in → []. Use it when you need a deterministic array shape.

Composed stages

Adjacent stages get composed when possible: two Stages fold into one virtual call per element. This is Composed<A, B> under the hood; the optimizer fuses chains of .maps, .filters, and .picks aggressively.

User-visible effect: writing many short stages costs roughly the same as one big lambda — write for clarity.
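A sketch of what composing two stages means, using a hypothetical Stage trait rather than Jetro's real internals: Composed<A, B> runs A and then B inside a single apply call per element, so a chain of small stages becomes one nested call instead of several passes.

```rust
/// Hypothetical stage interface: one input becomes zero or one output
/// (None means "filtered out"). Not Jetro's actual trait.
trait Stage {
    fn apply(&self, v: i64) -> Option<i64>;
}

struct Map<F>(F);
struct Filter<P>(P);

impl<F: Fn(i64) -> i64> Stage for Map<F> {
    fn apply(&self, v: i64) -> Option<i64> {
        Some((self.0)(v))
    }
}

impl<P: Fn(i64) -> bool> Stage for Filter<P> {
    fn apply(&self, v: i64) -> Option<i64> {
        if (self.0)(v) {
            Some(v)
        } else {
            None
        }
    }
}

/// Two adjacent stages fused into one: a single `apply` per element,
/// with B only running when A produced a value.
struct Composed<A, B>(A, B);

impl<A: Stage, B: Stage> Stage for Composed<A, B> {
    fn apply(&self, v: i64) -> Option<i64> {
        self.0.apply(v).and_then(|x| self.1.apply(x))
    }
}

/// Drive any stage over a slice, collecting whatever it emits.
fn run<S: Stage>(stage: &S, xs: &[i64]) -> Vec<i64> {
    xs.iter().filter_map(|&x| stage.apply(x)).collect()
}

fn main() {
    // Roughly .map(@ * 10).filter(@ > 20).map(@ + 1), fused into one stage.
    let fused = Composed(
        Composed(Map(|x: i64| x * 10), Filter(|x: i64| x > 20)),
        Map(|x: i64| x + 1),
    );
    println!("{:?}", run(&fused, &[1, 2, 3, 4])); // [31, 41]
}
```

Because the nesting is resolved at compile time in this sketch, each element costs one chain of direct calls, which is why many short stages need not be slower than one large lambda.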

Backend selection

Each pipeline node carries a list of preferred backends. The router tries them in order; the first to declare it can run the node wins.

Source                       Preferred backends
FieldChain (e.g. $.a.b.c)    tape-view → tape-rows → materialised → val-view → interpreted
Generic expression           fast-children → interpreted
Deep search                  structural index → interpreted
Single root path             tape-path → interpreted

You don't pick the backend — the planner does. But knowing they exist explains why simple queries are fast: they often run zero-copy over the simd-json tape.

When to think about pipeline shape

In practice, almost never. Two cases:

  1. Don't sort until you have to. A pre-sort barrier defeats early termination. Push .filter, .take, .first before .sort if the semantics allow.
  2. Avoid full materialisation in the middle. Chains of streaming stages stay zero-copy. A .collect() mid-chain forces a full pass.

The next chapter, Demand Propagation, explains why these heuristics work.

Demand Propagation

Demand propagation is the planner pass that makes "obvious" queries fast. It walks the pipeline backward — from sink to source — asking each operator: given what comes after you, what do you actually need from your source?

The answer is encoded in three lanes per stage and then used at execution time to skip work.

The three lanes

1. PullDemand — how many inputs?

Variant          Meaning
All              Read everything
FirstInput(n)    Stop after n inputs
LastInput(n)     Seek to the end, take last n
NthInput(i)      Jump to a single index
UntilOutput(n)   Keep reading until n outputs are produced

2. ValueNeed — what payload from each input?

Variant      Meaning
None         Don't decode the row at all
Predicate    Only what the predicate touches
Projection   Only the fields used in a projection
Numeric      Only numeric content
Whole        The full row (default pessimistic)

3. order: bool — does input order matter?

Some sinks (e.g. .sum()) don't care about order. The planner can use this to enable parallel-friendly access patterns when supported.

Backward walk

For a pipeline s1 → s2 → … → sN → sink, the planner does:

demand = sink_demand
for op in [sN, …, s2, s1]:        # reverse order
    upstream = op.propagate_demand(demand)
    record (op, downstream=demand, upstream)
    demand = upstream

The final demand is what the source must satisfy. The source backend chooses an access strategy that matches.

Operator laws

Every builtin declares one of these laws (in defs.rs):

Law              Effect on demand
Identity         Pass through unchanged (e.g. .upper, .lower)
MapLike          Preserve pull, force ValueNeed::Whole
FilterLike       FirstInput(n) becomes UntilOutput(n)
TakeWhile        Same as filter, but bounded
UniqueLike       Must scan until N distinct outputs
Take(n)          Cap pull at FirstInput(n)
First            Always FirstInput(1)
Last             Always LastInput(1)
Count            All inputs, ValueNeed::None
NumericReducer   All inputs, ValueNeed::Numeric
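The backward walk and a few of these laws fit in a short model. This Rust sketch tracks only the PullDemand lane; the Op enum, its four variants, and the rules for demands the table does not mention (for example a filter receiving UntilOutput) are simplifying assumptions, not a transcription of defs.rs.

```rust
/// PullDemand lane only; ValueNeed and order are left out of this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PullDemand {
    All,
    FirstInput(usize),
    UntilOutput(usize),
}

/// Hypothetical operator set, one variant per law being modelled.
#[derive(Debug, Clone, Copy)]
enum Op {
    Map,         // MapLike: preserve pull
    Filter,      // FilterLike: FirstInput(n) becomes UntilOutput(n)
    Take(usize), // Take(n): cap pull at FirstInput(n)
    Sort,        // barrier: must see every input
}

impl Op {
    /// Given what the downstream stage needs, return what this stage needs
    /// from its own source.
    fn propagate_demand(self, downstream: PullDemand) -> PullDemand {
        match self {
            Op::Map => downstream,
            Op::Filter => match downstream {
                PullDemand::FirstInput(n) => PullDemand::UntilOutput(n),
                other => other, // simplification: other demands pass through
            },
            Op::Take(n) => match downstream {
                PullDemand::FirstInput(m) => PullDemand::FirstInput(n.min(m)),
                _ => PullDemand::FirstInput(n),
            },
            Op::Sort => PullDemand::All,
        }
    }
}

/// The backward walk: start from the sink's demand and fold from the last
/// stage to the first. The result is what the source must satisfy.
fn source_demand(ops: &[Op], sink: PullDemand) -> PullDemand {
    ops.iter()
        .rev()
        .fold(sink, |demand, op| op.propagate_demand(demand))
}

fn main() {
    // $.items.map(name).first(): first() contributes FirstInput(1)
    println!("{:?}", source_demand(&[Op::Map], PullDemand::FirstInput(1))); // FirstInput(1)
    // $.items.filter(active).take(3), collected into an array
    println!("{:?}", source_demand(&[Op::Filter, Op::Take(3)], PullDemand::All)); // UntilOutput(3)
    // $.xs.filter(p).sort(@.x).first(): the barrier forces All
    println!("{:?}", source_demand(&[Op::Filter, Op::Sort], PullDemand::FirstInput(1))); // All
}
```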

Six worked examples

A. Early termination on .first

$.items.map(name).first()
  • first() declares FirstInput(1) to its source
  • .map(name) is MapLike: preserves pull, demands Whole from items
  • Source receives: read 1 item, decode fully

Without demand: read all items, decode all, take first.

B. Bounded filter

$.items.filter(active).take(3)
  • take(3) → FirstInput(3)
  • filter(active) → UntilOutput(3) (read until 3 pass)
  • Source: read until 3 active items found

Without demand: filter the entire array, then slice.

C. Field-level projection

$.users.map(u => {id, name})
  • The map projection touches id and name
  • Source: decode only id, name from each user

Other fields are not allocated. Over a wide-record document, this is the biggest win.

D. Last-element scan

$.logs.filter(severity >= 3).last()
  • last() → LastInput(1)
  • filter(...) → UntilOutput(1) from the end
  • Source: scan backward, stop after first match

Without demand: scan forward, materialise all matches, take last.

E. Count without payloads

$.items.filter(status == "done").count()
  • count() declares ValueNeed::None
  • filter(...) declares Predicate on status
  • Source: decode only status, no other fields

F. Reverse + take

$.items.reverse().take(2)
  • take(2) → FirstInput(2)
  • reverse() flips: source receives LastInput(2)
  • Source: seek to end, read 2 backward, then reverse

What demand does not do

  • It does not change result semantics. Two pipelines with identical text produce identical output regardless of demand state.
  • It does not optimise across barriers (.sort, .group_by). A barrier forces All upstream — it must see every input.
  • It does not move work between stages. Demand propagation does not fuse operators (stage composition is a separate optimisation); it only gates what they read.

When you'll feel demand kick in

Three rough rules of thumb:

  1. Put take/first/find near the end. That's how their pull demand reaches back to the source.
  2. Project early when possible. map(@.field) upstream of a barrier shrinks what the barrier has to buffer.
  3. Avoid unnecessary collect(). It forces full materialisation and resets the demand walk.

Demand is invisible most of the time — your queries get faster than they "should" be, and that's exactly the goal.

Lazy Evaluation and Caches

Jetro is lazy in three places that matter to users.

1. Document parsing

Jetro::from_bytes does not fully parse the document up front when the default simd-json feature is enabled. Instead it builds a tape — a flat array of tokens — and lazily decodes parts as queries demand them.

What this means:

  • Cold-start is ~4× faster than the legacy serde_json::Value path.
  • A query that touches only $.x.y decodes the rest of the doc only when asked.
  • Borrowed string slices (Val::StrSlice) avoid a copy when the value is read-only.

If you want eager full parsing (e.g. for serde_json::Value round-trips):

let doc: serde_json::Value = serde_json::from_slice(bytes)?;
let v = engine.collect_value(doc, "$.x")?;

2. Streaming pipelines

The pull-based pipeline backend processes one element at a time. A stage doesn't run until its downstream consumer pulls. This is what enables .first() and .find() to terminate early.

A consequence: side effects in lambdas are not guaranteed to fire for every element. (Lambdas in jetro have no I/O, so this is mostly an academic concern, but worth knowing if you write a custom builtin.)

3. Plan caches

Two caches matter:

Plan cache (per JetroEngine)

When you call engine.collect(&doc, query) repeatedly with the same query, the parsed AST → IR → bytecode pipeline is computed once and reused. Default capacity: 256 entries, evicted wholesale when full.

For workloads with a small fixed set of queries and many documents, this is a big speedup. For ad-hoc one-shot queries, it's a no-op.
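A minimal sketch of a bounded cache with wholesale eviction, assuming a plain HashMap keyed by query text. The PlanCache type and the string standing in for a compiled plan are hypothetical; only the 256-entry capacity and the evict-everything behaviour come from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical plan cache: query text -> compiled plan. A String stands in
/// for the parsed AST -> IR -> bytecode result.
struct PlanCache {
    capacity: usize,
    plans: HashMap<String, String>,
    compiles: usize, // how many times a plan had to be built
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, plans: HashMap::new(), compiles: 0 }
    }

    fn get_or_compile(&mut self, query: &str) -> &str {
        if !self.plans.contains_key(query) {
            if self.plans.len() >= self.capacity {
                // Wholesale eviction: drop every entry instead of picking a victim.
                self.plans.clear();
            }
            self.compiles += 1;
            let plan = format!("plan for {query}"); // stand-in for real compilation
            self.plans.insert(query.to_string(), plan);
        }
        self.plans[query].as_str()
    }
}

fn main() {
    let mut cache = PlanCache::new(256);
    for _ in 0..1_000 {
        cache.get_or_compile("$.users.filter(@.active).count()");
    }
    println!("compiled {} time(s)", cache.compiles); // compiled 1 time(s)
}
```

Clearing everything is cheaper to implement than LRU bookkeeping and costs little when the working set of queries is small, which matches the workload described above.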

Path cache (per VM)

The bytecode VM caches resolved pointer paths per document. The cache key hashes both structure and primitive leaf values bounded at depth 8 — two documents with identical shape but different leaves produce different hashes, so the cache stays correct across calls.

You don't manage this directly. It's amortised over many queries on the same document.
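To illustrate a key that covers both shape and leaf values, here is a Rust sketch over a hypothetical Value enum (not Jetro's Val). It feeds type tags, lengths, object keys, and primitive leaves into std's DefaultHasher and stops descending past depth 8; the exact bounding rule is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical JSON-like value for this sketch (not Jetro's Val type).
enum Value {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// Assumed depth bound, taken from the description above.
const MAX_DEPTH: usize = 8;

/// Feed structure (type tags, lengths, keys) and primitive leaves into the
/// hasher, ignoring anything nested deeper than MAX_DEPTH.
fn hash_value(v: &Value, depth: usize, h: &mut DefaultHasher) {
    if depth > MAX_DEPTH {
        return;
    }
    match v {
        Value::Null => 0u8.hash(h),
        Value::Bool(b) => {
            1u8.hash(h);
            b.hash(h);
        }
        Value::Num(n) => {
            2u8.hash(h);
            n.hash(h);
        }
        Value::Str(s) => {
            3u8.hash(h);
            s.hash(h);
        }
        Value::Arr(items) => {
            4u8.hash(h);
            items.len().hash(h);
            for item in items {
                hash_value(item, depth + 1, h);
            }
        }
        Value::Obj(fields) => {
            5u8.hash(h);
            fields.len().hash(h);
            for (key, val) in fields {
                key.hash(h);
                hash_value(val, depth + 1, h);
            }
        }
    }
}

/// Cache key for a whole document.
fn doc_key(doc: &Value) -> u64 {
    let mut h = DefaultHasher::new();
    hash_value(doc, 0, &mut h);
    h.finish()
}

fn main() {
    let a = Value::Obj(vec![("x".to_string(), Value::Num(1))]);
    let b = Value::Obj(vec![("x".to_string(), Value::Num(2))]);
    // Same shape, different leaf: the keys differ, so paths cached for `a`
    // are not reused for `b`.
    println!("{:x} vs {:x}", doc_key(&a), doc_key(&b));
}
```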

When laziness backfires

It rarely does, but two pitfalls:

Forcing materialisation. Methods like .collect(), .sort(), .unique(), .group_by() are barriers — they materialise. Putting them mid-chain when they aren't needed defeats laziness.

Holding onto Vals. A Val is Arc-wrapped, so cloning is O(1), but the Arc keeps the underlying data alive. If you query a giant doc, hold onto a small projection, and let the doc go, you may be surprised that the original data is still resident — the projection's Val::StrSlices borrow into the tape.

Use .to_json() (or serde_json::Value round-trip) to disconnect a projection from the source tape when you really need to release memory.
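The retention effect can be reproduced with std::sync::Arc alone. In this sketch a hypothetical StrSlice shares the whole buffer and records a byte range into it, loosely mirroring a borrowed tape slice; copying the bytes out is what finally lets the buffer go.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical borrowed-string value: it keeps the whole source buffer
/// alive through the Arc and remembers which bytes it refers to.
#[derive(Clone)]
struct StrSlice {
    tape: Arc<str>,
    range: Range<usize>,
}

impl StrSlice {
    fn as_str(&self) -> &str {
        &self.tape[self.range.clone()]
    }

    /// Copy the bytes out so the projection no longer pins the buffer.
    fn detach(&self) -> String {
        self.as_str().to_owned()
    }
}

fn main() {
    let doc: Arc<str> = Arc::from(r#"{"name":"Ada","bio":"imagine megabytes here"}"#);
    let name = StrSlice { tape: Arc::clone(&doc), range: 9..12 };

    drop(doc); // the caller lets the document go...
    // ...but the small projection still holds the entire buffer.
    println!("{} pins {} bytes", name.as_str(), name.tape.len());

    let owned = name.detach();
    drop(name); // last Arc handle gone: the buffer is freed here
    println!("{owned}");
}
```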

Practical recipe

For long-lived servers:

// At startup
let engine = JetroEngine::default();

// Per request
let result = engine.collect_bytes(req_body, "$.users.filter(@.active).count()")?;

Plans get cached, parsing is lazy, the pipeline early-terminates. There's typically nothing else to tune.

Builtin Reference — Overview

Jetro ships 181 builtin methods. They fall into 18 categories. Every method has the same shape:

.method(arg1, arg2, …)

…or, when the parser routes through inline path filters and sugar:

$.path.method(...)

This part documents every method. Each entry follows the format:

name (aliases: …)

  • Signature: what it takes and returns
  • Behavior: one-paragraph description
  • Example: at least one minimal runnable example
  • Demand law / Notes: when relevant

Index

Category                What goes here                             Page
Value introspection     type, len, schema, JSON round-trip         Introspection
Numeric scalars         ceil, floor, round, abs                    Numeric
String transforms       upper, trim, pad_*, slice, replace         String
String search / regex   starts_with, match_*, captures, split_re   String Search
Conversion              to_number, parse_int, parse_bool           Conversion
Streaming one-to-one    map, enumerate, pairwise, lag, zscore      Streaming
Filtering               filter, find, compact, takewhile           Filtering
Expanding               flat_map, flatten, lines, chars            Expanding
Reducers                sum, count, any, max_by                    Reducers
Positional              first, last, nth, collect                  Positional
Barriers                sort, unique, group_by, window             Barrier
Arrays / sets           append, diff, union, zip                   Arrays
Objects                 keys, pick, merge, transform_values        Objects
Path mutation           get_path, set_path, set, update            Path Mutation
Deep traversal          deep_find, walk, rec                       Deep
Predicates              has, missing, includes, index              Predicates
Tabular                 to_csv, to_tsv                             Tabular
Relational              equi_join                                  Relational

Notation in this part

  • aliases — alternative names accepted by the parser. They lower to the same builtin and behave identically.
  • "demand law" — what kind of Demand this builtin propagates upstream. See Demand Propagation for the model.
  • "barrier" / "stream" / "scalar" — execution shape (does it buffer, stream, or run once on a single value).

When a method appears under multiple categories (e.g. .find is both a filter and positional), it lives in the most specific chapter and is cross-linked.

Sharp edges

A small set of 0.5-line design choices is documented in Known Limitations: replace is single-occurrence (use replace_all for substitute-every), there is no in operator (use xs has v), and rec(fn) caps at 10 000 iterations when the step never converges (use rec(fn, cond) to bound). Two engine items remain on the fix-list: rec() no-arg and a stronger runaway-iteration guard.

Aliases at a glance

Canonical    Aliases
any          exists
chunk        batch
drop_while   dropwhile
take_while   takewhile
includes     contains
skip         drop
sort         sort_by
unique       distinct
deep_find    ..find (deep-method form)
deep_shape   ..shape
deep_like    ..like

These pairs are interchangeable. Pick whichever reads better.

Value Introspection

Methods that report on the kind and shape of a value, plus JSON round-trip.

type

  • Signature: Any -> String
  • Behavior: Returns the kind of value as a string: "null", "bool", "number", "string", "array", "object".
DOC:    {"x": [1,2,3]}
QUERY:  $.x.type()
OUT:    "array"

len

  • Signature: (String|Array|Object) -> Number
  • Behavior: Length: chars for strings, elements for arrays, key count for objects. Errors on null/bool/number.
DOC:    {"s": "hello", "xs": [1,2,3], "o": {"a":1,"b":2}}

QUERY:  $.s.len()     OUT: 5
QUERY:  $.xs.len()     OUT: 3
QUERY:  $.o.len()     OUT: 2

to_string

  • Signature: Any -> String
  • Behavior: Stringifies a scalar (42 → "42", true → "true", null → "null"). For arrays/objects, returns the JSON serialisation.
QUERY:  42.to_string()     OUT: "42"
QUERY:  ([1, 2]).to_string()     OUT: "[1,2]"

to_json

  • Signature: Any -> String
  • Behavior: Compact JSON serialisation of any value.
QUERY:  $.user.to_json()

Distinguish from to_string: for compound values, the two are equivalent; for scalars, to_json always quotes strings ("foo" → "\"foo\""), to_string does not.

from_json

  • Signature: String -> Any
  • Behavior: Parse a JSON string into a value.
QUERY:  '{"x":1}'.from_json()
OUT:    {"x":1}

QUERY:  $.encoded.from_json().x

Errors on malformed input. Wrap in try if the source is untrusted:

try $.s.from_json() else null

schema

  • Signature: Any -> Object
  • Behavior: Infers a schema sketch — keys, kinds, nullable flags. Useful for "what does this document look like?" probes.
DOC:    [{"id": 1, "name": "a"}, {"id": 2, "name": null}]
QUERY:  $.schema()
OUT:    {"items":{"fields":{"id":{"type":"Int"},"name":{"nullable":true,"type":"String"}},"required":["id"],"type":"Object"},"len":2,"type":"Array"}

The exact output format is documented in builtins/ops/schema.rs; treat it as advisory rather than a stable contract.

Demand notes

  • len over an array is ValueNeed::None upstream — it doesn't decode rows.
  • type is Identity demand-wise.
  • from_json/to_json are scalar transforms with no demand interaction.

Practical examples

# Quick shape check
$.payload.type()                        # → "object"
$.payload.len()                         # for object: number of keys

# Distinguish array length vs string length
$.items.len()                           # array element count
$.title.len()                           # number of characters

# Safe deserialization of a payload field
try $.body.from_json() else null

# Compact serialization
$.event.to_json()

# Stringify any value
$.x.to_string()

# Probe an unknown payload's schema
$.events[0].schema()

Numeric Scalars

Fixture

Examples below run against:

DOC:    {"products": [{"id": 1, "price": 3.7}, {"id": 2, "price": 4.2}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "deltas": [-1, 2, -3, 4], "xs": [1, 2, 3, 4, 5]}

Pure scalar transforms over numbers.

ceil

  • Signature: Number -> Number
  • Behavior: Smallest integer ≥ x.
QUERY:  3.2.ceil()     OUT: 4
QUERY:  (-3.2).ceil() OUT: -3

floor

  • Signature: Number -> Number
  • Behavior: Largest integer ≤ x.
QUERY:  3.7.floor()     OUT: 3
QUERY:  (-3.7).floor() OUT: -4

round

  • Signature: Number -> Number
  • Behavior: Round to nearest; ties round half-away-from-zero.
QUERY:  3.5.round()     OUT: 4
QUERY:  3.4.round()     OUT: 3
QUERY:  (-3.5).round() OUT: -4
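Rust's f64::round uses the same ties-away-from-zero rule, which makes it a convenient reference. The sketch below also implements ties-to-even (banker's rounding) for contrast; that second function is illustrative only and is not what round does.

```rust
/// Ties away from zero: the rule described for `round`. Rust's f64::round
/// already behaves this way.
fn round_half_away(x: f64) -> f64 {
    x.round()
}

/// Ties to even ("banker's rounding"), shown only for contrast; this is not
/// what `round` above does.
fn round_half_even(x: f64) -> f64 {
    let r = x.round();
    let is_tie = (x - x.trunc()).abs() == 0.5;
    // On an exact tie, `round` went away from zero; step back toward zero
    // when that landed on an odd number.
    if is_tie && r % 2.0 != 0.0 {
        r - x.signum()
    } else {
        r
    }
}

fn main() {
    for x in [3.5, 3.4, -3.5, 2.5] {
        println!("{x}: away = {}, even = {}", round_half_away(x), round_half_even(x));
    }
}
```

The two rules only disagree on exact ties such as 2.5, which is where the choice matters for totals and splits.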

abs

  • Signature: Number -> Number
  • Behavior: Absolute value.
QUERY:  (-7).abs()     OUT: 7
QUERY:  3.5.abs()     OUT: 3.5

Mapping over arrays

These are scalar; lift them with .map:

DOC:    {"xs": [1.4, 2.6, -3.5]}

QUERY:  $.xs.map(@.round())
OUT:    [1,3,-4]

QUERY:  $.xs.map(@.abs()).sum()
OUT:    7.5

See also

Numeric reducers (sum, avg, min, max) live in Reducers. Streaming numeric transforms (zscore, pct_change, cummax, cummin) live in Streaming.

Practical examples

# Round every price up to the nearest dollar
$.products.map(p => p.merge({price_ceil: p.price.ceil()}))

# Fraction → integer percent
($.metric.pct * 100).round()       # → 50 with the fixture above

# Magnitudes (drop sign)
$.deltas.map(@.abs())

# Whole-unit part of an amount
$.amount.floor()                   # the fractional remainder is $.amount - $.amount.floor()

# Build a histogram with binned values
$.measurements.map(m => (m / 10).floor() * 10).count_by(@)
# → {0: 12, 10: 5, 20: 3, ...}

String Transforms

Scalar string operations. Lift with .map to apply to an array of strings.

Case

Method        What                                Example
upper         ASCII uppercase                     "foo".upper() → "FOO"
lower         ASCII lowercase                     "FOO".lower() → "foo"
capitalize    First char upper, rest lower        "foo bar".capitalize() → "Foo bar"
title_case    Each word capitalised               "foo bar".title_case() → "Foo Bar"
snake_case    Lowercase words joined with _       "FooBar".snake_case() → "foo_bar"
kebab_case    Words joined with -                 "FooBar".kebab_case() → "foo-bar"
camel_case    fooBar style                        "foo_bar".camel_case() → "fooBar"
pascal_case   FooBar style                        "foo_bar".pascal_case() → "FooBar"
reverse_str   Reverse char order                  "abc".reverse_str() → "cba"

Trim

Method       What
trim         Strip whitespace from both ends
trim_left    Strip leading whitespace
trim_right   Strip trailing whitespace

QUERY:  "  hi  ".trim()     OUT: "hi"
QUERY:  "  hi  ".trim_left()     OUT: "hi  "

Padding and centering

Signature                Behavior                      Example
pad_left(width, char?)   Right-align by padding left   "7".pad_left(3, "0") → "007"
pad_right(width, char?)  Left-align by padding right   "hi".pad_right(5) → "hi   "
center(width, char?)     Center within width           "hi".center(6) → "  hi  "

If char is omitted, space is used.

Indent / dedent

indent(n) takes an integer (number of spaces); the prefix is fixed spaces.

QUERY:  "line1\nline2".indent(2)
OUT:    "  line1\n  line2"

dedent() takes the first line's leading whitespace as the prefix and strips it from the first line and from every subsequent line that begins with the same prefix; lines without that prefix are left as-is. It is not a common-prefix dedent across all lines:

QUERY:  "  a\n  b".dedent()
OUT:    "a\nb"
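A Rust sketch of the dedent rule as described above, assuming lines are split on newlines and rejoined with \n (a trailing newline is not preserved in this sketch):

```rust
/// Dedent as described above (a sketch): the first line's leading whitespace
/// is the prefix, and it is stripped from every line that starts with it.
/// Lines without that prefix are left alone, so this is not a
/// common-prefix dedent.
fn dedent(s: &str) -> String {
    let first = s.lines().next().unwrap_or("");
    let prefix_len = first.len() - first.trim_start().len();
    let prefix = &first[..prefix_len];
    s.lines()
        .map(|line| line.strip_prefix(prefix).unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("{:?}", dedent("  a\n  b")); // "a\nb"
    println!("{:?}", dedent("    a\n  b")); // "a\n  b": the second line lacks the 4-space prefix
}
```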

Slice

"hello world".slice(0, 5)      # "hello"
"hello world".slice(6)         # "world"
"hello".slice(-3)              # "llo"

slice(start, end?) mirrors Python; end is exclusive.
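A sketch of those Python-style semantics in Rust, assuming indices count characters rather than bytes and that out-of-range bounds clamp as they do in Python:

```rust
/// Python-style slice over characters (a sketch): negative indices count
/// from the end, bounds are clamped to the string, and `end` is exclusive.
fn slice(s: &str, start: i64, end: Option<i64>) -> String {
    let chars: Vec<char> = s.chars().collect();
    let len = chars.len() as i64;
    // Resolve a possibly negative index, then clamp it into 0..=len.
    let resolve = |i: i64| if i < 0 { (len + i).max(0) } else { i.min(len) };
    let (from, to) = (resolve(start), resolve(end.unwrap_or(len)));
    if from >= to {
        return String::new();
    }
    chars[from as usize..to as usize].iter().collect()
}

fn main() {
    println!("{}", slice("hello world", 0, Some(5))); // hello
    println!("{}", slice("hello world", 6, None)); // world
    println!("{}", slice("hello", -3, None)); // llo
}
```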

Repeat

"ab".repeat(3)        # "ababab"

Replace

Method                          Behavior
replace(needle, with)           Replace first literal occurrence
replace_all(needle, with)       Replace all literal occurrences
replace_re(pattern, with)       Regex-aware single replacement
replace_all_re(pattern, with)   Regex-aware all replacements

QUERY:  "hello hello".replace("hello", "hi")
OUT:    "hi hello"

QUERY:  "hello hello".replace_all("hello", "hi")
OUT:    "hi hi"

QUERY:  "abc123def".replace_all_re("\d+", "#")
OUT:    "abc#def"

Regex escapes inside jetro string literals. Use a single backslash: "\d", "\w+", "\s". Jetro string literals don't eat backslashes separately; doubling ("\\d") hands the regex engine \\d, an escaped backslash followed by d, which matches a literal backslash rather than a digit, so the pattern silently fails to match. This differs from host languages like Python or JavaScript where you must double-escape.

Strip

"prefix-foo".strip_prefix("prefix-")  # "foo"
"foo.txt".strip_suffix(".txt")        # "foo"

If the prefix/suffix isn't present, returns the input unchanged.

Encoding

Method          What
to_base64       Standard base64 encode
from_base64     Standard base64 decode
url_encode      Percent-encode
url_decode      Percent-decode
html_escape     & → &amp;, < → &lt;, etc.
html_unescape   Reverse of html_escape

QUERY:  "hello world".to_base64()     OUT: "aGVsbG8gd29ybGQ="
QUERY:  "a b".url_encode()     OUT: "a%20b"
QUERY:  "<b>".html_escape()     OUT: "&lt;b&gt;"
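For reference, standard base64 (the RFC 4648 alphabet with = padding) fits in a short function. This is an illustrative encoder showing what to_base64 computes, not Jetro's implementation:

```rust
/// The standard base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard base64 with '=' padding.
fn to_base64(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (u32::from(b[0]) << 16) | (u32::from(b[1]) << 8) | u32::from(b[2]);
        // A chunk of k bytes yields k + 1 symbols; the rest of the 4 are '='.
        for i in 0..4 {
            if i <= chunk.len() {
                let idx = ((n >> (18 - 6 * i)) & 0x3f) as usize;
                out.push(ALPHABET[idx] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

fn main() {
    println!("{}", to_base64(b"hello world")); // aGVsbG8gd29ybGQ=
}
```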

Demand notes

All string transforms are Identity demand-wise: they don't change what the upstream needs to produce.

Practical examples

# Normalise display names
$.users.map(u => u.name.trim().title_case())

# Build a URL-safe slug
"My Article Title".lower().replace_all(" ", "-")
# → "my-article-title"

# CamelCase to snake_case migration
"FooBarBaz".snake_case()                # → "foo_bar_baz"

# Truncate with ellipsis
$.posts.map(p => p.body.slice(0, 100) + "..." if p.body.len() > 100 else p.body)

# Parse a comma-separated tag list
$.tags_csv.split(",").map(@.trim())

# Encode for URL
$.query.url_encode()

# Encode binary as base64
$.bytes.to_base64()

# HTML-escape user input
$.comments.map(c => c.text.html_escape())

# Pad a numeric ID for fixed-width keys
($.id as string).pad_left(8, "0")
# → "00000042" for id=42

# Strip a known prefix
"https://example.com/path".strip_prefix("https://")
# → "example.com/path"

# Build a banner
"=".repeat(40)                          # → "========================================"

# Indent a nested message
$.message.indent(4)

String Search and Regex

Predicates (return boolean)

Method                Behavior
is_blank              True if empty or only whitespace
is_numeric            True if all chars are digits
is_alpha              True if all chars are letters
is_ascii              True if all bytes < 128
starts_with(prefix)   Prefix check
ends_with(suffix)     Suffix check

QUERY:  "  ".is_blank()     OUT: true
QUERY:  "abc123".is_numeric()     OUT: false
QUERY:  "hello".starts_with("he")     OUT: true

Position

Method                  Returns
index_of(needle)        First index of needle, or -1
last_index_of(needle)   Last index of needle, or -1

QUERY:  "hello world".index_of("o")     OUT: 4
QUERY:  "hello world".last_index_of("o")     OUT: 7
"foo bar foo".matches("foo")    # 2 (count of literal occurrences)
"abc 12 cd 34".scan("\d+")     # ["12", "34"] (regex matches as strings)

Regex match

Method                  Returns
re_match(pattern)       Boolean
match_first(pattern)    First match string, or null
match_all(pattern)      Array of all match strings
captures(pattern)       First match with groups: [full, g1, g2, …]
captures_all(pattern)   Array of captures results

QUERY:  "a1b2".re_match("\d")     OUT: true
QUERY:  "a1b2".match_first("\d+")     OUT: "1"
QUERY:  "a1b2".match_all("\d+")     OUT: ["1","2"]

QUERY:  "key=val".captures("(\\w+)=(\\w+)")
OUT:    ["key=val","key","val"]

The ~= operator is sugar for re_match and returns the same boolean.

Splitting

Method             Behavior
split(sep)         Split on literal separator
split_re(pattern)  Split on regex
QUERY:  "a,b,c".split(",")     OUT: ["a","b","c"]
QUERY:  "a,,b".split_re(",+")     OUT: ["a","b"]

Multi-needle membership

"abc def".contains_any(["abc", "xyz"])    # true (matches first)
"abc def".contains_all(["abc", "def"])    # true (all match)

Demand notes

Regex builtins are scalar. Lift across an array with .map(...). The underlying regex is compiled once per query and reused — no per-element re-compilation cost.

Conversion and Parsing

Coerce between value kinds.

to_number

  • Signature: Any -> Number | null
  • Behavior: Coerce to number: "42" → 42, "3.14" → 3.14, true → 1, false → 0. Returns null for unparseable strings.
QUERY:  "42".to_number()     OUT: 42
QUERY:  "3.14".to_number()     OUT: 3.14
QUERY:  "abc".to_number()      OUT: null

to_bool

  • Signature: Any -> Boolean
  • Behavior: Truthiness: false/null/0/""/[]/{} → false, everything else → true.
QUERY:  $.maybe.to_bool()
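To make the truthiness rule concrete, here is a minimal Rust sketch. The Value enum is a hypothetical stand-in for a JSON value, not Jetro's internal type; the function only mirrors the documented falsy set.

```rust
// Hypothetical stand-in for a JSON value; not Jetro's actual type.
#[derive(Debug)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Truthiness as documented for to_bool: false, null, 0, "", [] and {}
/// are falsy; every other value is truthy (so the string "0" is truthy).
fn to_bool(v: &Value) -> bool {
    match v {
        Value::Null => false,
        Value::Bool(b) => *b,
        Value::Number(n) => *n != 0.0,
        Value::Str(s) => !s.is_empty(),
        Value::Array(items) => !items.is_empty(),
        Value::Object(fields) => !fields.is_empty(),
    }
}
```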

parse_int(radix?)

  • Signature: String -> Number | null
  • Behavior: Parse a string as integer, optional radix (default 10).
QUERY:  "42".parse_int()     OUT: 42
QUERY:  "ff".parse_int(16)     OUT: 255
QUERY:  "0b101".parse_int(2)     OUT: 5

parse_float

  • Signature: String -> Number | null
  • Behavior: Parse a string as float (IEEE 754 double).
QUERY:  "3.14".parse_float()     OUT: 3.14
QUERY:  "1e6".parse_float()     OUT: 1000000.0

parse_bool

  • Signature: String -> Boolean | null
  • Behavior: Strict parse: only "true" and "false" (lowercase) match; everything else returns null.
QUERY:  "true".parse_bool()     OUT: true
QUERY:  "TRUE".parse_bool()     OUT: null

as cast (operator)

The as operator does the same coercions as to_*:

"42" as int          # 42
42 as string         # "42"
true as int          # 1

Use as when the type is statically known; use to_number/parse_* when parsing untrusted strings (since as errors on failure rather than returning null).

Round-trip JSON

For full document round-trip, see from_json/to_json.

Practical examples

# Coerce strings collected from a CSV
$.rows.map(r => r.merge({age: r.age.to_number(), price: r.price.parse_float()}))

# Defensive parse — null on garbage
$.user_input.parse_int() ?? 0

# Boolean coercion of a flag string
"true".parse_bool() ?? false

# Truthiness coercion
$.value.to_bool()               # null/0/""/empty → false; else true

# Cast operator for static conversions
($.id as string).pad_left(8, "0")

# Round-trip number → string → back
(3.14 as string).parse_float()  # → 3.14

Row Stream Source

rows() is a source builtin. It changes what the receiver means: instead of querying one document value, it exposes a stream of rows.

rows()

  • Signature: Source -> Stream<Row>
  • Arity: zero
  • Demand behavior: forwards retained-row demand to the source
  • Supported stream stages: reverse, filter, find, distinct_by, take, first, map

Normal JSON

On a normal JSON document, $.rows() treats the document itself as one row:

DOC:    {"id":1,"name":"Ada"}
QUERY:  $.rows().map({id: $.id, name: $.name})
OUT:    [{"id":1,"name":"Ada"}]

A top-level array is also treated as a single document row in normal JSON mode. When the input document is an array, use the normal array methods directly.

NDJSON

In NDJSON mode, root $.rows() means all rows in the file or reader:

jetrocli --ndjson -i events.ndjson \
  -e '$.rows().filter($.active).take(10).map({id: $.id, name: $.name})'

Without $.rows(), the same CLI mode is row-local:

jetrocli --ndjson -i events.ndjson -e '$.id'

Reverse

For file-backed NDJSON, reverse() scans from the end:

jetrocli --ndjson -i app.log \
  -e '$.rows().reverse().find($.level == "error").first()'

Reader-backed reverse streams are unsupported because readers cannot seek.

Latest Per Key

For Kafka compacted-topic dumps, scan newest-to-oldest and keep the first row seen for each key:

jetrocli --ndjson -i users.ndjson --payload-after '|' \
  -e '$.rows().reverse().distinct_by($.id).take(100).map({id: $.id, name: $.name})'

distinct_by in this stream order keeps the newest row for each key and drops older duplicates immediately.
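The reverse-then-distinct_by pattern can be modelled with a plain Rust iterator. This is a sketch under assumptions: rows are hypothetical (id, payload) tuples already held in a slice, whereas jetrocli scans them from the end of the file, and the name latest_per_key is illustrative.

```rust
use std::collections::HashSet;

/// Sketch of `$.rows().reverse().distinct_by(key).take(limit)`:
/// walk rows newest-to-oldest, keep the first row seen for each key,
/// and stop once `limit` rows have been kept.
fn latest_per_key<'a>(rows: &'a [(u64, &'a str)], limit: usize) -> Vec<(u64, &'a str)> {
    let mut seen = HashSet::new();
    rows.iter()
        .rev() // newest row first
        .filter(|(id, _)| seen.insert(*id)) // insert returns false for a key already seen
        .take(limit)
        .copied()
        .collect()
}
```

Because the newest row for a key is always met first, older duplicates are rejected by the set lookup without any buffering of earlier rows.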

Notes

  • rows() is currently root-level: use $.rows(), not $.books.rows().
  • map is delayed or direct-written only when it is semantically safe.
  • Unsupported stream methods fail before scanning input.
  • For more examples, see NDJSON and Whole-Stream Queries.

Streaming One-to-One

Each input produces exactly one output. These compose freely; the planner fuses adjacent stages into a single composed stage when possible.

Fixture

Examples in this chapter run against:

{
  "users": [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}],
  "xs":    [1, 2, 3, 4, 5],
  "prices":[100, 105, 102, 110, 108, 115]
}

map

  • Signature: Array<A> -> Array<B> (with f: A -> B)
  • Demand law: MapLike — preserves pull, forces Whole.
QUERY:  $.users.map(u => u.name)
OUT:    ["Ada","Bob"]

QUERY:  $.xs.map(@ * 2)
OUT:    [2, 4, 6, 8, 10]

QUERY:  $.users.map(@.name.upper())
OUT:    ["ADA","BOB"]

map is the workhorse. The lambda may use any of the three forms.

enumerate

  • Signature: Array<A> -> Array<{index: Number, value: A}>
  • Behavior: Pair each element with its zero-based index. Output is a record {index, value} per element.
QUERY:  $.xs.enumerate()
OUT:    [{"index":0,"value":1},{"index":1,"value":2},{"index":2,"value":3},{"index":3,"value":4},{"index":4,"value":5}]

QUERY:  $.users.map(@.name).enumerate()
OUT:    [{"index":0,"value":"Ada"},{"index":1,"value":"Bob"}]

pairwise

  • Signature: Array<A> -> Array<[A, A]>
  • Behavior: Yield consecutive pairs [xs[0], xs[1]], [xs[1], xs[2]], …
QUERY:  [1,2,3,4].pairwise()
OUT:    [[1,2],[2,3],[3,4]]

QUERY:  $.xs.pairwise().map(p => p[1] - p[0])
OUT:    [1, 1, 1, 1]

lag(n=1) and lead(n=1)

  • Signature: Array<Number> -> Array<Number | null>
  • Behavior: Shift by n positions; out-of-range positions become null.
  • Numeric: Output values are returned as floats regardless of input numeric type.
QUERY:  $.xs.lag()
OUT:    [null, 1.0, 2.0, 3.0, 4.0]

QUERY:  $.xs.lead()
OUT:    [2.0, 3.0, 4.0, 5.0, null]

QUERY:  $.xs.lag(2)
OUT:    [null, null, 1.0, 2.0, 3.0]

diff_window(n=1)

  • Signature: Array<Number> -> Array<Number | null>
  • Behavior: xs[i] - xs[i - n], with null until lag is satisfied.
QUERY:  $.prices.diff_window()
OUT:    [null, 5.0, -3.0, 8.0, -2.0, 7.0]
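The shift rules for lag, lead and diff_window can be sketched over a Rust slice. Assumptions: inputs are already f64 (the docs say outputs are floats) and None stands in for null; the helper names mirror the builtins but are not Jetro's code.

```rust
/// lag(n): the value n positions earlier, None until the lag is satisfied.
fn lag(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some(xs[i - n]) } else { None })
        .collect()
}

/// lead(n): the value n positions later, None past the end.
fn lead(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len()).map(|i| xs.get(i + n).copied()).collect()
}

/// diff_window(n): xs[i] - xs[i - n], None until the lag is satisfied.
fn diff_window(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some(xs[i] - xs[i - n]) } else { None })
        .collect()
}
```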

pct_change(n=1)

  • Signature: Array<Number> -> Array<Number | null>
  • Behavior: (xs[i] - xs[i-n]) / xs[i-n] — relative change.
QUERY:  [100.0, 110.0, 121.0].pct_change()
OUT:    [null, 0.1, 0.09999999999999998]

cummax and cummin

  • Signature: Array<Number> -> Array<Number>
  • Behavior: Running max / min up to and including the current position.
QUERY:  $.prices.cummax()
OUT:    [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]

QUERY:  $.prices.cummin()
OUT:    [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]

zscore

  • Signature: Array<Number> -> Array<Number>
  • Behavior: Standardise: (x - mean) / stddev. Two passes (one for stats, one for transform); not strictly streaming, but presented as a one-to-one stage at the user surface.
QUERY:  [1.0, 2.0, 3.0, 4.0, 5.0].zscore()
OUT:    [-1.414213562373095, -0.7071067811865475, 0.0, 0.7071067811865475, 1.414213562373095]
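The two passes behind zscore look roughly like this in Rust. The sketch assumes the population standard deviation (dividing by n), which is what reproduces the output above; it does not guard against an all-equal input, where the standard deviation is zero.

```rust
/// zscore sketch: pass 1 computes the mean and population standard
/// deviation, pass 2 maps each value to (x - mean) / stddev.
fn zscore(xs: &[f64]) -> Vec<f64> {
    let n = xs.len() as f64;
    // Pass 1: statistics over the whole input.
    let mean = xs.iter().sum::<f64>() / n;
    let variance = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let stddev = variance.sqrt();
    // Pass 2: transform each element.
    xs.iter().map(|x| (x - mean) / stddev).collect()
}
```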

accumulate

See Barrier Operators: accumulate is a barrier because it requires a custom reducer over the full input.

Practical examples

DOC:    {"prices":[100, 105, 102, 110, 108, 115]}

# Apply tax to every price
QUERY:  $.prices.map(@ * 1.08)
OUT:    [108.0, 113.4, 110.16000000000001, 118.80000000000001, 116.64000000000001, 124.2]

# Day-over-day deltas
QUERY:  [100,105,102,110,108].pairwise().map(p => p[1] - p[0])
OUT:    [5, -3, 8, -2]

# Running max ("high-water mark")
QUERY:  $.prices.cummax()
OUT:    [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]

# Lag-1 to compare current vs previous
QUERY:  $.prices.lag()
OUT:    [null, 100.0, 105.0, 102.0, 110.0, 108.0]

Filtering

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "xs": [1, 2, 3, 4, 5]}

Methods that drop elements based on a predicate.

filter

  • Signature: Array<A> -> Array<A> (with pred: A -> Bool)
  • Demand law: FilterLike; FirstInput(n) from downstream becomes UntilOutput(n) upstream.
$.users.filter(u => u.active)
$.users.filter(@.age >= 18)
$.users.filter(@.email ~= "@admin\.")

filter is the universal predicate stage. Combine with .take(n) for bounded scans:

$.events.filter(@.sev >= 3).take(10)

The planner stops reading from the source as soon as 10 events pass — no full scan.

find

  • Signature: Array<A> -> A | null (first match only on this branch)
  • Demand law: FilterLike with FirstInput(1) → source.
DOC:    {"users": [{"id":1,"role":"user"},{"id":2,"role":"admin"}]}
QUERY:  $.users.find(@.role == "admin")
OUT:    {"id":2,"role":"admin"}

find returns the first match (or null if none), not an array. Use find_all for the array form.

find_all

  • Signature: Array<A> -> Array<A>
  • Behavior: Like filter. Alias kept for readability.
$.users.find_all(@.role == "admin")

Equivalent to .filter(@.role == "admin"). The two are interchangeable.

compact

  • Signature: Array<Any> -> Array<Any>
  • Behavior: Drop nulls.
QUERY:  [1, null, 2, null, 3].compact()
OUT:    [1,2,3]

Equivalent to .filter(@ != null), but reads better and avoids a lambda.

take_while (alias takewhile)

  • Signature: Array<A> -> Array<A>
  • Behavior: Take elements while pred is true; stop at the first false (don't keep checking).
QUERY:  [1, 2, 3, 4, 1, 2].take_while(@ < 3)
OUT:    [1,2]

Demand law: bounded — terminates the source as soon as pred flips.

drop_while (alias dropwhile)

  • Signature: Array<A> -> Array<A>
  • Behavior: Drop the leading run where pred holds; emit the rest.
QUERY:  [1, 2, 3, 4, 1, 2].drop_while(@ < 3)
OUT:    [3,4,1,2]
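Rust's iterator adapters take_while and skip_while have the same leading-run semantics, so they make a compact model of these two builtins (the wrapper functions and the i64 element type are illustrative):

```rust
/// take_while: keep the leading run where pred holds, then stop pulling.
fn take_while(xs: &[i64], pred: impl Fn(i64) -> bool) -> Vec<i64> {
    xs.iter().copied().take_while(|&x| pred(x)).collect()
}

/// drop_while: skip the leading run where pred holds, emit everything after it,
/// including later elements for which pred would be true again.
fn drop_while(xs: &[i64], pred: impl Fn(i64) -> bool) -> Vec<i64> {
    xs.iter().copied().skip_while(|&x| pred(x)).collect()
}
```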

remove

  • Signature: Array<A> -> Array<A>
  • Behavior: Inverse of filter. Drop elements where pred is true.
QUERY:  $.xs.remove(@ < 0)

Useful when the negated predicate reads worse than the affirmative.

Filtering objects

For object filtering, see filter_keys and filter_values in Objects. They take a predicate over keys / values and return a filtered object.

Practical examples

DOC:    {"users":[
  {"id":1,"name":"Ada","active":true,"age":30},
  {"id":2,"name":"Bob","active":false,"age":24},
  {"id":3,"name":"Cy", "active":true,"age":42}
]}

# Active users only
QUERY:  $.users.filter(@.active)
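OUT:    [{"id":1,"name":"Ada","active":true,"age":30},{"id":3,"name":"Cy","active":true,"age":42}]

# Active users aged 30 or over, just names
QUERY:  $.users.filter(@.active and @.age >= 30).map(@.name)
OUT:    ["Ada","Cy"]

# First active user (early-exit)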
QUERY:  $.users.find(@.active).name
OUT:    "Ada"

# Take while a streak holds
QUERY:  [1,2,3,4,1,2].take_while(@ < 3)
OUT:    [1,2]

# Negate a predicate
QUERY:  $.users.remove(@.active).count()
OUT:    1

# Drop nulls
QUERY:  [1, null, 2, null, 3].compact()
OUT:    [1,2,3]

Worked demand example

DOC:    {"events": [
  {"sev": 1, "msg": "ok"},
  {"sev": 2, "msg": "warn"},
  {"sev": 3, "msg": "err"},
  {"sev": 1, "msg": "ok2"}
]}

QUERY:  $.events.filter(@.sev >= 2).map(@.msg).take(2)
OUT:    ["warn","err"]

Demand walks back: take(2) → FirstInput(2), map → preserves, filter → UntilOutput(2). Source reads events one-by-one, stops after the second match.
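Rust iterators are also pull-based, so the same walk-back can be observed directly. This is an analogy rather than Jetro's planner: the events are hypothetical (sev, msg) tuples, and inspect counts how many of them the pipeline pulls from the source.

```rust
/// Runs filter(sev >= 2).map(msg).take(2) lazily over `events` and
/// returns the outputs plus how many events were pulled from the source.
fn first_two_warnings(events: &[(u8, &'static str)]) -> (Vec<&'static str>, usize) {
    let mut reads = 0usize;
    let out: Vec<&'static str> = events
        .iter()
        .inspect(|_| reads += 1) // count every source read
        .filter(|(sev, _)| *sev >= 2)
        .map(|(_, msg)| *msg)
        .take(2) // once two outputs exist, nothing more is pulled
        .collect();
    (out, reads)
}
```

For the four events above, three reads are enough: the fourth event is never touched.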

Expanding Sequences

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}]}

Each input produces zero or many outputs.

flat_map

  • Signature: Array<A> -> Array<B> (with f: A -> Array<B>)
  • Behavior: Map then concatenate.
QUERY:  [[1,2],[3,4]].flat_map(@)
OUT:    [1,2,3,4]

QUERY:  $.users.flat_map(u => u.tags)

If f returns a non-array, it's wrapped first (flat_map(@ + 1) works on numbers).

flatten

  • Signature: Array<Array<A>> -> Array<A>
  • Behavior: One level of flattening.
QUERY:  [[1,2],[3],[4,5]].flatten()
OUT:    [1,2,3,4,5]

To flatten more levels, chain: .flatten().flatten(). Or use walk for full recursive flatten of arbitrary structure.

explode

0.5.11 status: explode requires an argument (errors with "explode: missing argument" on no-arg call). Spec is intended to mirror chars / to_pairs for the common cases; until then, use those builtins directly.

  • Signature (intended): (Array | Object | String) -> Array<...>
  • Behavior (intended): Convert to a flat sequence of elements / pairs / chars.
    • Array: identity
    • Object: array of [key, value] pairs (= to_pairs)
    • String: array of single-char strings (= chars)

split(sep)

  • Signature: String -> Array<String>
  • Behavior: Split a string on a literal separator. (See split_re for regex.)
QUERY:  "a,b,c".split(",")
OUT:    ["a","b","c"]

lines

  • Signature: String -> Array<String>
  • Behavior: Split on newline (\n or \r\n).
QUERY:  "a\nb\nc".lines()
OUT:    ["a","b","c"]

words

  • Signature: String -> Array<String>
  • Behavior: Split on whitespace (any run).
QUERY:  "  hello  world  ".words()
OUT:    ["hello","world"]

chars

  • Signature: String -> Array<String>
  • Behavior: Array of single-character strings.
QUERY:  "abc".chars()
OUT:    ["a","b","c"]

chars_of(s)

  • Signature: String -> Array<String>
  • Behavior: Equivalent to s.chars(). Useful when the source is the argument:
QUERY:  ($.text).chars_of()

bytes

  • Signature: String -> Array<Number>
  • Behavior: UTF-8 byte values, 0–255.
QUERY:  "abc".bytes()
OUT:    [97,98,99]

Demand notes

Expanding stages declare an indeterminate output count. Pull demand from downstream still flows back, but the planner can't tightly bound how many inputs are needed — it pulls one input at a time and yields outputs lazily.

.flat_map(...) followed by .first() will read inputs until the first flat-mapped output appears, then stop.
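A small Rust analogy shows the same behaviour: FlatMap pulls one outer input at a time and stops as soon as next() has a value. The input shape (a slice of integer vectors) and the function name are illustrative assumptions.

```rust
/// Sketch of `.flat_map(f).first()` over a pull-based source:
/// returns the first flattened output and how many inputs were pulled.
fn flat_map_first(inputs: &[Vec<i64>]) -> (Option<i64>, usize) {
    let mut pulled = 0;
    let first = inputs
        .iter()
        .inspect(|_| pulled += 1) // count outer inputs consumed
        .flat_map(|inner| inner.iter().copied())
        .next(); // stop at the first flattened element
    (first, pulled)
}
```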

Practical examples

# Flatten one level
[[1,2],[3,4],[5]].flatten()                # → [1, 2, 3, 4, 5]

# Tags across all books
$.books.flat_map(@.tags)

# Distinct hashtags across tweets
$.tweets.flat_map(t => t.entities.hashtags.map(@.text)).unique()

# Word histogram from a paragraph
$.text.words().map(@.lower()).count_by(@)

# Parse CSV headers
"id,name,email".split(",")

# Process logs line by line
$.log_blob.lines().filter(@.contains_any(["ERROR","WARN"]))

# Char-level analysis
$.password.chars().count_by(@)             # frequency of each char

# Bytes for a binary diff
"hello".bytes()                            # → [104, 101, 108, 108, 111]

Reducers and Aggregates

Reducers consume the whole stream and emit a single value. They terminate the streaming pipeline.

Numeric

Method  Signature                    Notes
sum     Array<Number> -> Number      Empty → 0
avg     Array<Number> -> Number      Empty → null
min     Array<Number|String> -> ...  Empty → null
max     Array<Number|String> -> ...  Empty → null
QUERY:  [1,2,3,4].sum()     OUT: 10
QUERY:  [1,2,3,4].avg()     OUT: 2.5
QUERY:  [3,1,4,1,5].min()     OUT: 1.0
QUERY:  ["b","a","c"].max()   OUT: "c"

Demand law: NumericReducer (ValueNeed::Numeric, pull = All).

count

  • Signature: Array -> Number
  • Behavior: Element count.
  • Demand: All inputs, ValueNeed::None (no payload decoded).
QUERY:  $.users.count()
QUERY:  $.users.filter(@.active).count()

This is the cheapest reducer — the source skips deserialisation entirely.

approx_count_distinct

Not yet supported in 0.5.11 — runtime returns "ApproxCountDistinct: builtin unsupported". Spec exists; HyperLogLog backend pending.

  • Signature (planned): Array<Any> -> Number
  • Behavior (planned): Approximate count of distinct values via HLL.

For now, use .unique().count() for exact distinct count.

any (alias exists)

  • Signature: Array<A> -> Bool (with pred: A -> Bool)
  • Behavior: True if any element matches. Short-circuits.
QUERY:  $.users.any(@.role == "admin")
OUT:    true

all

  • Signature: Array<A> -> Bool
  • Behavior: True if every element matches. Short-circuits on first false.
QUERY:  $.flags.all(@ == true)

find_index

  • Signature: Array<A> -> Number | null
  • Behavior: Zero-based index of first match, or null.
QUERY:  ["a","b","c"].find_index(@ == "b")
OUT:    1

indices_where

  • Signature: Array<A> -> Array<Number>
  • Behavior: All indices where pred matches.
QUERY:  [10, 20, 5, 30, 8].indices_where(@ < 15)
OUT:    [0,2,4]

max_by and min_by

  • Signature: Array<A> -> A | null
  • Behavior: Element with the maximum / minimum projected key.
QUERY:  $.books.max_by(@.year)
QUERY:  $.users.min_by(@.age)

Prefer these over the sort-based forms .sort(@.key).last() and .sort(@.key).first(): max_by and min_by make one pass, while the sort form allocates the whole sorted array first.

When to use which

Goal                         Use
Sum/avg numbers              sum, avg
Count rows                   count
Exact distinct count         .unique().count()
Existence check              any
Universal check              all
Find index                   find_index
Pick single max/min element  max_by, min_by

Practical examples

DOC:    {"books":[
  {"title":"Dune","year":1965,"price":15},
  {"title":"Foundation","year":1951,"price":10},
  {"title":"Hyperion","year":1989,"price":18},
  {"title":"Snow Crash","year":1992,"price":12}
]}

# Total revenue across all books
QUERY:  $.books.map(@.price).sum()
OUT:    55

# Mean price
QUERY:  $.books.map(@.price).avg()
OUT:    13.75

# Earliest and most expensive
QUERY:  $.books.min_by(b => b.year).title
OUT:    "Foundation"

QUERY:  $.books.max_by(b => b.price).title
OUT:    "Hyperion"

# Any cyberpunk in the catalog?
QUERY:  $.books.any(@.tags? and @.tags.includes("cyberpunk"))
# (where @.tags? guards against missing field)

# Count books published before 1970
QUERY:  $.books.filter(@.year < 1970).count()
OUT:    2

# Position of the first 1990s book
QUERY:  $.books.find_index(@.year >= 1990)
OUT:    3

# Indices of books priced above 12
QUERY:  $.books.indices_where(@.price > 12)
OUT:    [0,2]

Positional Access

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "transactions": [{"ts": "01"}, {"ts": "02"}, {"ts": "03"}]}

Bounded extraction by position.

first

  • Signature: Array<A> -> A | null
  • Demand law: First — always FirstInput(1).
QUERY:  [10,20,30].first()     OUT: 10
QUERY:  [].first()              OUT: null

QUERY:  $.users.filter(@.active).first()
# Source reads only enough to get one active user.

Equivalent to .nth(0) but reads better and is the canonical "early-exit" sink.

last

  • Signature: Array<A> -> A | null
  • Demand law: Last — always LastInput(1).
QUERY:  [10,20,30].last()     OUT: 30

When the source supports it (an in-memory array, or a tape with known length), last seeks to the end; for streams it must drain.

nth(i)

  • Signature: Array<A> -> A | null
  • Demand law: NthInput(i) if i is non-negative; LastInput(-i) otherwise.
QUERY:  [10,20,30,40].nth(2)     OUT: 30
QUERY:  [10,20,30,40].nth(-1)     OUT: 40
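The index rule can be sketched over a Rust slice, assuming nth(-k) counts k positions back from the end (so nth(-1) is the last element, as in the example above) and that out-of-range indices yield null, shown here as None:

```rust
use std::convert::TryFrom;

/// nth(i): zero-based from the front when i >= 0, from the end when i < 0.
/// Out-of-range indices give None.
fn nth<T: Copy>(xs: &[T], i: i64) -> Option<T> {
    let idx = if i >= 0 {
        usize::try_from(i).ok()?
    } else {
        // -1 maps to len - 1, -2 to len - 2, and so on.
        let back = usize::try_from(i.unsigned_abs()).ok()?;
        xs.len().checked_sub(back)?
    };
    xs.get(idx).copied()
}
```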

find_first(pred)

  • Signature: Array<A> -> A | null
  • Behavior: Same as find — kept for naming clarity. Use find in new code.

find_one(pred)

  • Signature: Array<A> -> A | null
  • Behavior: Asserts at most one match; errors if more than one matches. Useful for "exactly one user with this id" shapes.
QUERY:  $.users.find_one(@.id == 1)

collect

  • Signature: Any -> Array<Any>
  • Behavior: Coerce to array. Scalar → [scalar]; array → identity; null → [].
QUERY:  42.collect()     OUT: [42]
QUERY:  [1,2].collect()     OUT: [1,2]
QUERY:  null.collect()     OUT: []

Use collect to guarantee an array shape at a pipeline boundary — useful for callers that always want to iterate.

When to use a positional vs. a reducer

first() is a positional sink (returns one element). count() is a reducer (returns one number). Both terminate the pipeline. Use whichever matches your output type.

Worked example

DOC:    {"orders": [
  {"id": 1, "total": 100},
  {"id": 2, "total": 50},
  {"id": 3, "total": 200}
]}

QUERY:  $.orders.filter(@.total > 75).first().id
OUT:    1

QUERY:  $.orders.sort_by(@.total).last().id
OUT:    3

The first query early-exits (one filter pass, one match). The second sorts (barrier), then takes the last — the planner can't avoid the sort.

Practical examples

# First active user — early-exit, demand-aware
$.users.find(@.active).name

# Last log entry of severity 3+ (when the source supports random access)
$.logs.filter(@.sev >= 3).last().msg

# Get a user at known index
$.users.nth(2).email

# Negative-index array tail
$.transactions.nth(-1).ts

# Coerce-or-empty: scalar source becomes a 1-element array
"hello".collect()      # → ["hello"]
null.collect()         # → []

# Use collect() at a method-call boundary so callers always iterate
$.config.tags.collect().map(@.lower())

Barrier Operators

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}], "daily": [{"day": 1, "value": 10}, {"day": 2, "value": 12}]}

Barriers must see the full input before emitting any output, so they materialise it. Place them as late in a pipeline as possible, after filters and projections have reduced the input.

Sort

sort (alias sort_by)

  • Signature: Array<A> -> Array<A>
  • Behavior: Stable ascending sort. With a projection, sorts by the projected key.
QUERY:  [3,1,4,1,5].sort()
OUT:    [1,1,3,4,5]

QUERY:  $.books.sort(@.year)
QUERY:  $.books.sort(b => -b.year)
QUERY:  $.users.sort(@.last_name, @.first_name)

Multi-arg form sorts by a tuple of keys.
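A tuple-of-keys sort maps directly onto Rust's stable sort_by. In this sketch each user is a hypothetical (first_name, last_name) pair, and comparing (last_name, first_name) tuples mirrors $.users.sort(@.last_name, @.first_name):

```rust
/// Sort by last name, then first name. `sort_by` is stable, so users
/// with identical keys keep their original relative order.
fn sort_by_name(users: &mut [(&str, &str)]) {
    users.sort_by(|a, b| (a.1, a.0).cmp(&(b.1, b.0)));
}
```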

Distinct

unique (alias distinct)

  • Signature: Array<A> -> Array<A>
  • Behavior: Remove duplicates by structural equality, preserving first occurrence order.
QUERY:  [3,1,4,1,5,9,2,6,5].unique()
OUT:    [3,1,4,5,9,2,6]

unique_by(f)

  • Signature: Array<A> -> Array<A>
  • Behavior: Dedup by projected key.
QUERY:  $.books.unique_by(@.author)

Group / count / index

group_by(key)

  • Signature: Array<A> -> Object<KeyString, Array<A>>
  • Behavior: Bucket by projected key.
QUERY:  $.books.group_by(@.author)
OUT:    {"Herbert":[{"title":"Dune",...}],"Asimov":[{"title":"Foundation",...}],...}

count_by(key)

  • Signature: Array<A> -> Object<KeyString, Number>
  • Behavior: Bucket counts.
QUERY:  $.books.count_by(@.author)
OUT:    {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}

index_by(key)

  • Signature: Array<A> -> Object<KeyString, A>
  • Behavior: Index by key. Last wins on collision.
QUERY:  $.users.index_by(@.id)
OUT:    {"1":{"id":1,"name":"Ada",...},"2":{"id":2,"name":"Bob",...},"3":{"id":3,"name":"Cy",...}}

group_shape

Not yet supported in 0.5.11 — runtime returns "GroupShape: builtin unsupported". Tracked for a future release.

  • Signature: Array<Object> -> Array<Object>
  • Behavior (planned): Group by structural shape (key set).

Partition

partition(pred)

  • Signature: Array<A> -> [Array<A>, Array<A>]
  • Behavior: Split into [matching, non_matching].
QUERY:  $.books.partition(@.year < 1970)
OUT:    [[{"title":"Dune",...},{"title":"Foundation",...}],[{"title":"Hyperion",...},{"title":"Snow Crash",...}]]

Window / chunk

window(size)

  • Signature: Array<A> -> Array<Array<A>>
  • Behavior: Sliding window of size.
QUERY:  [1,2,3,4,5].window(3)
OUT:    [[1,2,3],[2,3,4],[3,4,5]]

chunk(size) (alias batch)

  • Signature: Array<A> -> Array<Array<A>>
  • Behavior: Non-overlapping chunks. Last chunk may be shorter.
QUERY:  [1,2,3,4,5,6,7].chunk(3)
OUT:    [[1,2,3],[4,5,6],[7]]
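Rust's slice methods windows and chunks have the same shapes as window and chunk, so a sketch is mostly a wrapper. Integer elements are an assumption, and both std methods panic if size is 0:

```rust
/// window(size): overlapping slices of exactly `size` elements.
fn window(xs: &[i64], size: usize) -> Vec<Vec<i64>> {
    xs.windows(size).map(|w| w.to_vec()).collect()
}

/// chunk(size): non-overlapping slices; the last one may be shorter.
fn chunk(xs: &[i64], size: usize) -> Vec<Vec<i64>> {
    xs.chunks(size).map(|c| c.to_vec()).collect()
}
```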

Rolling aggregates

Method          Behavior
rolling_sum(n)  Sum over a window of size n
rolling_avg(n)  Average over a window
rolling_min(n)  Min over a window
rolling_max(n)  Max over a window
QUERY:  [1,2,3,4,5].rolling_sum(3)
OUT:    [null,null,6.0,9.0,12.0]

The leading n-1 positions emit null until the window fills.
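One way to compute rolling_sum in a single pass is to keep a running total, adding the value that enters the window and subtracting the one that leaves. This Rust sketch assumes f64 inputs and n >= 1, uses None for the leading nulls, and is not necessarily how Jetro implements it:

```rust
/// rolling_sum(n): sum over the last n values, None for the first n - 1
/// positions while the window is still filling.
fn rolling_sum(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    let mut out = Vec::with_capacity(xs.len());
    let mut acc = 0.0;
    for (i, &x) in xs.iter().enumerate() {
        acc += x; // value entering the window
        if i >= n {
            acc -= xs[i - n]; // value leaving the window
        }
        out.push(if i + 1 >= n { Some(acc) } else { None });
    }
    out
}
```

The running total avoids re-summing each window, at the cost of possible floating-point drift on long series.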

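accumulate(init?, fn)

  • Signature: Array<A> -> Array<B> with fn: (B, A) -> B
  • Behavior: Streaming fold producing intermediate states; the initial value itself is not emitted. init is optional: without it, the first element seeds the fold, as the second example shows.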
QUERY:  [1,2,3,4].accumulate(0, (a, x) => a + x)
OUT:    [1,3,6,10]

QUERY:  [1,2,3,4].accumulate((a, x) => a + x)
OUT:    [1,3,6,10]
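accumulate with an explicit init behaves like Rust's Iterator::scan: each step updates the state and emits a copy of it, and the initial value itself is never emitted. A generic sketch (the signature is illustrative, not Jetro's):

```rust
/// Fold that emits every intermediate state after applying `f`.
fn accumulate<A, B: Clone>(xs: &[A], init: B, f: impl Fn(&B, &A) -> B) -> Vec<B> {
    xs.iter()
        .scan(init, |state, x| {
            *state = f(state, x); // advance the fold
            Some(state.clone()) // emit the new state
        })
        .collect()
}
```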

When to barrier

You have to barrier when:

  • Order needs computation (sort, unique)
  • Output is grouped / indexed (group_by, index_by)
  • A window crosses element boundaries (window, rolling_*)

You don't need a barrier for:

  • Per-element transforms (map)
  • Predicates (filter)
  • Numeric reducers (sum, count) — they're streaming reducers, not barriers

Practical examples

DOC:    {"books":[
  {"title":"Dune","year":1965,"author":"Herbert","price":15},
  {"title":"Foundation","year":1951,"author":"Asimov","price":10},
  {"title":"Hyperion","year":1989,"author":"Simmons","price":18},
  {"title":"Snow Crash","year":1992,"author":"Stephenson","price":12}
]}

# Sort by year ascending
QUERY:  $.books.sort(b => b.year).map(@.title)
OUT:    ["Foundation","Dune","Hyperion","Snow Crash"]

# Sort by price descending (negate the key)
QUERY:  $.books.sort(b => -b.price).map(@.title)
OUT:    ["Hyperion","Dune","Snow Crash","Foundation"]

# Distinct tags across books
QUERY:  $.books.flat_map(@.tags).unique()

# How many distinct authors
QUERY:  $.books.unique_by(b => b.author).count()
OUT:    4

# Group by author
QUERY:  $.books.group_by(b => b.author)
OUT:    {"Herbert":[{"title":"Dune",...}],"Asimov":[{"title":"Foundation",...}],...}

# Histogram of authors (prefer count_by — no buffering of bucket payloads)
QUERY:  $.books.count_by(b => b.author)
OUT:    {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}

# Build a quick lookup table
QUERY:  $.users.index_by(u => u.id)

# Sliding-3 windows for moving stats
QUERY:  $.measurements.window(3).map(w => w.sum() / 3)

# Split into batches of 10 for paginated processing
QUERY:  $.records.chunk(10)

# 7-day moving average over a numeric series
QUERY:  $.daily.map(@.value).rolling_avg(7)

Array and Set Operations

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "tags_today": ["a", "b", "c"], "tags_yesterday": ["b", "c", "d"], "left_tags": ["a", "b", "c"], "right_tags": ["b", "c", "d"]}

Operations that take an array and produce a derivative array (or join two arrays).

append(v) and prepend(v)

  • Signature: Array<A> -> Array<A>
  • Behavior: Add v to the end / front.
QUERY:  [1,2,3].append(4)     OUT: [1,2,3,4]
QUERY:  [1,2,3].prepend(0)     OUT: [0,1,2,3]

When used as chain-write terminals ($.path.append(v)), they patch the document — see Patch.

reverse

  • Signature: Array<A> -> Array<A>
  • Behavior: Reverse element order. Also works on strings (calls reverse_str).
QUERY:  [1,2,3].reverse()     OUT: [3,2,1]
QUERY:  "abc".reverse()     OUT: "cba"

Set-like operations

Method            Behavior
diff(other)       Elements in self not in other
intersect(other)  Elements in both
union(other)      Elements in either, deduped
QUERY:  [1,2,3,4].diff([3,4,5])     OUT: [1,2]
QUERY:  [1,2,3,4].intersect([3,4,5])     OUT: [3,4]
QUERY:  [1,2,3].union([3,4,5])     OUT: [1,2,3,4,5]

Equality is structural. Results keep first-occurrence order from the left operand; union then appends the new elements of other in their original order.
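The equality and ordering rules above can be sketched in Python. This is a minimal illustration, not Jetro's implementation: structural equality is approximated by comparing canonical JSON text (so objects match regardless of key order, although 1 and 1.0 do not), and the function names are hypothetical. Duplicate handling in diff and intersect is an assumption; this sketch keeps the left operand's duplicates.

```python
import json

def _key(v):
    # Canonical JSON text stands in for structural equality.
    return json.dumps(v, sort_keys=True)

def diff(xs, other):
    """Elements of xs not in other, in xs order."""
    drop = {_key(v) for v in other}
    return [v for v in xs if _key(v) not in drop]

def intersect(xs, other):
    """Elements of xs that also appear in other, in xs order."""
    keep = {_key(v) for v in other}
    return [v for v in xs if _key(v) in keep]

def union(xs, other):
    """Left elements first, then new elements of other; first occurrence wins."""
    seen, out = set(), []
    for v in list(xs) + list(other):
        k = _key(v)
        if k not in seen:
            seen.add(k)
            out.append(v)
    return out

print(diff([1, 2, 3, 4], [3, 4, 5]))                  # [1, 2]
print(intersect([1, 2, 3, 4], [3, 4, 5]))             # [3, 4]
print(union([1, 2, 3], [3, 4, 5]))                    # [1, 2, 3, 4, 5]
print(union([{"a": 1, "b": 2}], [{"b": 2, "a": 1}]))  # [{'a': 1, 'b': 2}]
```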

join(sep)

  • Signature: Array<String> -> String
  • Behavior: Concatenate strings with separator.
QUERY:  ["a","b","c"].join(", ")
OUT:    "a, b, c"

QUERY:  $.users.map(@.name).join(" / ")

For non-string elements, lift with .map(@.to_string()) first.

zip(other) and zip_longest(other, fill?)

  • Signature: Array<A>, Array<B> -> Array<[A, B]>
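  • Behavior: Pair element-wise. zip stops at the end of the shorter array; zip_longest continues to the end of the longer one and pads missing positions with fill (null by default).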
QUERY:  [1,2,3].zip(["a","b","c"])
OUT:    [[1,"a"],[2,"b"],[3,"c"]]

QUERY:  [1,2,3].zip(["a","b"])     OUT: [[1,"a"],[2,"b"]]
QUERY:  [1,2,3].zip_longest(["a","b"]) OUT: [[1,"a"],[2,"b"],[3,null]]
QUERY:  [1,2,3].zip_longest(["a"], "x") OUT: [[1,"a"],[2,"x"],[3,"x"]]
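For readers coming from Python, these pairing rules map onto the built-in zip and itertools.zip_longest. A hedged sketch: None stands in for JSON null, the wrapper names are hypothetical, and it assumes padding applies to whichever side is shorter.

```python
import json
from itertools import zip_longest as _zip_longest

def zip_pairs(xs, ys):
    # Stops at the end of the shorter input.
    return [[x, y] for x, y in zip(xs, ys)]

def zip_longest_pairs(xs, ys, fill=None):
    # Runs to the end of the longer input, padding with `fill`.
    return [[x, y] for x, y in _zip_longest(xs, ys, fillvalue=fill)]

print(json.dumps(zip_pairs([1, 2, 3], ["a", "b"])))
# [[1, "a"], [2, "b"]]
print(json.dumps(zip_longest_pairs([1, 2, 3], ["a", "b"])))
# [[1, "a"], [2, "b"], [3, null]]
print(json.dumps(zip_longest_pairs([1, 2, 3], ["a"], "x")))
# [[1, "a"], [2, "x"], [3, "x"]]
```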

fanout(...lambdas)

  • Signature: A -> Array<...>
  • Behavior: Apply each lambda to the same input; collect results.
DOC:    {"x": 10}
QUERY:  $.x.fanout(@ * 2, @ + 1, @.to_string())
OUT:    [20,11,"10"]

Useful for building multi-shape projections without repeating subexpressions.

zip_shape(arrays)

Not yet supported in 0.5.11 — runtime returns "ZipShape: builtin unsupported". Spec exists; runtime hookup pending.

  • Signature (planned): Object<KeyString, Array<A>> -> Array<Object>
  • Behavior (planned): Combine parallel arrays under shared keys into an array of objects.

The inverse is pivot — see Objects.

Demand notes

Set operations and join are barriers (they consume both inputs fully). reverse is a barrier too — but it's cheap and well-supported by demand: reverse().take(n) is rewritten so the source seeks to the end.

Practical examples

# Add an item to a tag list
$.user.tags.append("admin")             # patches the doc

# Build a "label = value" string
$.user.pick(name, email).values().join(" = ")

# CSV row from selected fields
[$.user.id.to_string(), $.user.name, $.user.email].join(",")

# Set difference — find items missing from a baseline
[1,2,3,4,5].diff([2,4])                 # → [1, 3, 5]

# Set intersection — common items
$.left_tags.intersect($.right_tags)

# Merge unique values, preserving first-occurrence order
$.tags_today.union($.tags_yesterday)

# Last 5 events, newest first (demand-aware: the source seeks to the end)
$.events.reverse().take(5)

# Pair two arrays positionally
[1,2,3].zip(["a","b","c"])              # → [[1,"a"],[2,"b"],[3,"c"]]

# Pad shorter array with default
[1,2,3].zip_longest(["a","b"], "?")     # → [[1,"a"],[2,"b"],[3,"?"]]

# Run several projections at once
$.metric.value.fanout(@ * 2, @ + 1, @ - 1)    # → [14, 8, 6]

Object Projection and Transform

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

Methods that read or rewrite objects.

Keys and values

Method    Signature                        Result
keys      Object -> Array<String>          Insertion-order key list
values    Object -> Array<Any>             Insertion-order value list
entries   Object -> Array<[String, Any]>   Key-value pairs
to_pairs  Object -> Array<[String, Any]>   Alias of entries
DOC:    {"a": 1, "b": 2}
QUERY:  $.keys()     OUT: ["a","b"]
QUERY:  $.values()     OUT: [1,2]
QUERY:  $.entries()     OUT: [["a",1],["b",2]]

from_pairs

  • Signature: Array<[String, Any]> -> Object
  • Behavior: Inverse of to_pairs.
QUERY:  [["a",1],["b",2]].from_pairs()
OUT:    {"a":1,"b":2}

invert

  • Signature: Object<K, V> -> Object<V, K>
  • Behavior: Swap keys and values. Values must be coercible to keys (string-like).
QUERY:  {"a":"x","b":"y"}.invert()
OUT:    {"x":"a","y":"b"}

pick(field, ...)

  • Signature: Object -> Object
  • Behavior: Keep only the named keys. An alias: src entry keeps src under the new key alias.
DOC:    {"id": 1, "name": "Ada", "secret": "!"}

QUERY:  $.pick(id, name)
OUT:    {"id":1,"name":"Ada"}

QUERY:  $.pick(uid: id, name)
OUT:    {"name":"Ada","uid":1}

Maps over arrays of objects:

$.users.pick(id, email)

is equivalent to $.users.map(u => u.pick(id, email)).

omit(field, ...)

  • Signature: Object -> Object
  • Behavior: Inverse of pick. Drop the named keys.
QUERY:  $.user.omit(secret, password)

Merge

Method             Behavior
merge(other)       Shallow merge; other's keys win on collision
deep_merge(other)  Recursive merge; sub-objects are merged, arrays replaced
defaults(other)    Reverse merge; keep self's keys, fill missing ones from other
QUERY:  {"a":1,"b":2}.merge({"b":99,"c":3})
OUT:    {"a":1,"b":99,"c":3}

QUERY:  {"a":{"x":1}}.deep_merge({"a":{"y":2}})
OUT:    {"a":{"x":1,"y":2}}

QUERY:  {"a":1}.defaults({"a":99,"b":2})
OUT:    {"a":1,"b":2}
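The three merge flavors differ only in who wins on collision and how deep the merge goes. A minimal Python sketch of the rules in the table, with hypothetical function names; it models JSON objects as dicts and does not attempt to reproduce Jetro's output key ordering.

```python
def merge(a, b):
    # Shallow merge: b's top-level keys replace a's on collision.
    return {**a, **b}

def deep_merge(a, b):
    out = dict(a)  # shallow copy so `a` is not mutated
    for k, v in b.items():
        if isinstance(out.get(k), dict) and isinstance(v, dict):
            out[k] = deep_merge(out[k], v)  # both sides are objects: recurse
        else:
            out[k] = v  # scalars and arrays from b replace a's value
    return out

def defaults(a, b):
    # Reverse merge: a's keys win, b only fills keys that a lacks.
    return {**b, **a}

print(merge({"a": 1, "b": 2}, {"b": 99, "c": 3}))    # {'a': 1, 'b': 99, 'c': 3}
print(deep_merge({"a": {"x": 1}}, {"a": {"y": 2}}))  # {'a': {'x': 1, 'y': 2}}
print(deep_merge({"a": [1, 2]}, {"a": [3]}))         # {'a': [3]}
print(defaults({"a": 1}, {"a": 99, "b": 2}))         # {'a': 1, 'b': 2}
```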

rename(...mapping)

  • Signature: Object -> Object
  • Behavior: Rename keys per a {old: new, ...} mapping.
QUERY:  $.user.rename({user_id: id, full_name: name})

transform_keys(fn) and transform_values(fn)

  • Signature: Object -> Object
  • Behavior: Apply fn to every key / value.
QUERY:  {"foo": 1, "bar": 2}.transform_keys(@.upper())
OUT:    {"BAR":2,"FOO":1}

QUERY:  {"a": 1, "b": 2}.transform_values(@ * 10)
OUT:    {"a":10,"b":20}

filter_keys(pred) and filter_values(pred)

  • Signature: Object -> Object
  • Behavior: Keep entries whose key / value matches the predicate.
QUERY:  $.config.filter_keys(k => k.starts_with("aws_"))
QUERY:  $.scores.filter_values(@ >= 50)

pivot(rows, cols, value)

  • Signature: Array<Object> -> Object<KeyString, Object>
  • Behavior: Pivot a table-shaped array into a nested object indexed by rows then cols, with value as the leaf.
DOC:    [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY:  $.pivot("y", "q", "v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15}}
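The reshaping rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Jetro's code: row and column values are stringified with str() to become object keys (which matches "2024" and "1" in the OUT above for integers), and a repeated (row, col) pair is assumed to overwrite the earlier value.

```python
import json

def pivot(rows, row_key, col_key, value_key):
    out = {}
    for r in rows:
        # Row and column values become object keys, so stringify them.
        inner = out.setdefault(str(r[row_key]), {})
        inner[str(r[col_key])] = r[value_key]  # a repeated (row, col) overwrites
    return out

rows = [{"y": 2024, "q": 1, "v": 10}, {"y": 2024, "q": 2, "v": 20}, {"y": 2025, "q": 1, "v": 15}]
print(json.dumps(pivot(rows, "y", "q", "v"), separators=(",", ":")))
# {"2024":{"1":10,"2":20},"2025":{"1":15}}
```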

implode(joiner=",")

  • Signature: Array<String> -> String
  • Behavior: Like join, but works on object values too:
QUERY:  {"a":"x","b":"y"}.values().implode("/")
OUT:    "x/y"

Demand notes

pick is a powerful demand signal — it tells the source which fields are needed. Over a wide-record document, pick(id, name) upstream of the rest of the pipeline avoids decoding all the other fields.

keys over an array stage emits one row per element, whereas keys over a single object produces one value (the array of its keys).

Practical examples

DOC:    {"users":[
  {"id":1,"name":"Ada","email":"ada@x.com","secret":"!"},
  {"id":2,"name":"Bob","email":"bob@y.org","secret":"?"}
]}

# Project safe public fields
QUERY:  $.users.map(u => u.pick(id, name, email))

# Drop sensitive keys
QUERY:  $.users.map(u => u.omit(secret))

# Rename in flight
QUERY:  $.users.map(u => u.pick(uid: id, full_name: name, email))

# Keys / values / entries
QUERY:  $.users[0].keys()                  → ["id","name","email","secret"]
QUERY:  $.users[0].values().count()        → 4
QUERY:  $.users[0].entries().count()       → 4

# Round-trip through entries
QUERY:  $.users[0].entries().from_pairs()  → equivalent to $.users[0]

# Merge with defaults (existing keys win)
QUERY:  $.config.defaults({timeout: 30, retries: 3})

# Deep-merge config layers
QUERY:  $.base_config.deep_merge($.user_config)

# Filter object by key prefix
QUERY:  $.env.filter_keys(k => k.starts_with("AWS_"))

# Filter values
QUERY:  $.scores.filter_values(@ >= 50)

# Apply transform to every value
QUERY:  $.prices.transform_values(@ * 1.08)

# Normalise keys to snake_case
QUERY:  $.payload.transform_keys(k => k.snake_case())

# Invert a code-to-name table
QUERY:  $.country_codes.invert()           # {"US":"United States",...} → {"United States":"US",...}

# Pivot long-format records
DOC:    [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY:  $.pivot("y","q","v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15}}

Path and Structural Mutation

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

Methods that read, set, delete, or rewrite values at specific paths within a document. These work on whole documents or sub-trees.

For chain-write terminals ($.path.set(v)) see Patch. This chapter documents the method-call versions.

get_path(path)

  • Signature: Any, String -> Any | null
  • Behavior: Read a value at a slash- or dot-separated path.
DOC:    {"user": {"profile": {"name": "Ada"}}}
QUERY:  $.get_path("user")
OUT:    {"profile":{"name":"Ada"}}
QUERY:  $.get_path("user/profile")
OUT:    {"name":"Ada"}

set_path(path, value)

  • Signature: Any, String, Any -> Any
  • Behavior: Return a copy with value written at path. Creates intermediate objects as needed.
QUERY:  $.set_path("user/profile/email", "ada@example.com")

del_path(path)

  • Signature: Any, String -> Any
  • Behavior: Return a copy with the leaf at path removed.
QUERY:  $.del_path("user/secret")

del_paths(paths)

  • Signature: Any, Array<String> -> Any
  • Behavior: Remove all listed paths in one pass. Cheaper than chained del_path for many removals.
QUERY:  $.del_paths(["user/secret", "user/temp", "session/csrf"])

has_path(path)

  • Signature: Any, String -> Bool
  • Behavior: True if a path exists and resolves to a non-null value. Current 0.5.11 behavior treats a present null like a missing path:
DOC:    {"a": null}
QUERY:  $.has_path("a")     OUT: false
QUERY:  $.has_path("b")     OUT: false

flatten_keys(sep=".")

  • Signature: Object -> Object
  • Behavior: Flatten a nested object into a single-level object with joined keys.
DOC:    {"a": {"b": 1, "c": 2}, "d": 3}
QUERY:  $.flatten_keys()
OUT:    {"a.b":1,"a.c":2,"d":3}

QUERY:  $.flatten_keys(".")
OUT:    {"a.b":1,"a.c":2,"d":3}

unflatten_keys(sep=".")

  • Signature: Object -> Object
  • Behavior: Inverse of flatten_keys.
QUERY:  {"a.b": 1, "a.c": 2}.unflatten_keys()
OUT:    {"a":{"b":1,"c":2}}
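A minimal Python sketch of the flatten/unflatten pair, assuming "." as the default separator (which is what the flatten_keys() output above shows). The function names mirror the methods but are illustrative; arrays and empty objects are treated as leaves, and conflicting keys such as {"a": 1, "a.b": 2} are not handled.

```python
def flatten_keys(obj, sep="."):
    out = {}
    def go(prefix, value):
        if isinstance(value, dict) and value:
            for key, child in value.items():
                go(prefix + sep + key, child)
        else:
            out[prefix] = value  # scalars, arrays and empty objects are leaves
    for key, value in obj.items():
        go(key, value)
    return out

def unflatten_keys(flat, sep="."):
    out = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = out
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate objects
        node[leaf] = value
    return out

nested = {"a": {"b": 1, "c": 2}, "d": 3}
print(flatten_keys(nested))                            # {'a.b': 1, 'a.c': 2, 'd': 3}
print(unflatten_keys(flatten_keys(nested)) == nested)  # True
print(unflatten_keys({"a/b": 1, "a/c": 2}))            # {'a/b': 1, 'a/c': 2}
```

The last call shows that keys built with a different separator pass through untouched when unflattening with ".".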

set(path, value) (method-call form)

  • Signature: Any, String, Any -> Any
  • Behavior: Same as set_path. Kept for ergonomic chains.

The chain-write terminal $.path.set(v) is different — it's parsed as a patch and operates on the rooted document path.

update

update is Jetro's functional batched update. It has two surfaces:

Object body — update({k: expr, ...})

Apply a set of field updates to one or more selected subtrees. Plain keys update fields below the receiver; quoted keys carry full paths.

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf", "hugo"]}
]}

QUERY:  $.books[*].update({tags: tags.append("test"), reviewed: true})
OUT:    {"books":[{"reviewed":true,"tags":["sf","test"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","hugo","test"],"title":"Hyperion","year":1989}]}

Each selected book gets both fields written. Plain identifiers (tags, reviewed) are read against the selected snapshot — not the mid-batch document — so two ops on the same target both see the original field values.

Body forms:

Form                    Meaning
field: expr             Write expr into field of each selected target
"a.b.c": expr           Write into a nested path inside each selected target
"books[*].tags": expr   Quoted path key: full root-relative path with wildcards/filters
field: expr when cond   Skip when cond is falsy
field: DELETE           Remove the field (with optional when)

@ inside the body is the current value at the target field (handy inside path keys); $ is the original root.

QUERY:  $.books[*].update({tags: tags.append("modern") when year > 1980})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","hugo","modern"],"title":"Hyperion","year":1989}]}

Root-level batch with quoted paths

When the receiver is $, quoted keys carry full paths, including wildcards and DELETE:

DOC:    {"books": [{"tags": ["sf"]}], "active": true}
QUERY:  $.update({"books[*].tags": @.append("test"), active: false})
OUT:    {"active":false,"books":[{"tags":["sf","test"]}]}

DOC:    {"users": [{"id":1,"secret":"a"}, {"id":2,"secret":"b"}]}
QUERY:  $.update({"users[*].secret": DELETE})
OUT:    {"users":[{"id":1},{"id":2}]}

Filtered wildcard [* if pred]

Both selectors and quoted path keys support a filtered wildcard:

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}

QUERY:  $.books[* if year > 1980].update({tags: tags.append("modern")})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

QUERY:  $.update({"books[* if year > 1980].tags": @.append("modern")})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

Two-argument path form — update(path, expr)

The classic shape: a slash- or dot-separated path plus an expression. @ inside the expression is the current value at path.

DOC:    {"counters": {"visits": 10, "clicks": 3}}
QUERY:  $.update("counters.visits", @ + 1)
OUT:    {"counters":{"clicks":3,"visits":11}}

QUERY:  $.update("counters/visits", @ + 1)
OUT:    {"counters":{"clicks":3,"visits":11}}

Semantics

Property            Behavior
Snapshot reads      Each body expression sees the pre-batch values, not partial mid-batch state
Order               Ops apply in source order; last write wins on overlap
Selectors           Index, wildcard [*], filtered wildcard [* if pred], and nested chains are all supported
Scalar targets      An update with an object body promotes scalar elements to objects ({seen: true} over [1,2] gives [{seen:true},{seen:true}])
Untouched subtrees  Preserved by Arc sharing; no deep copy of unrelated fields
Empty body          .update({}) is a no-op and returns the doc unchanged
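The snapshot-read and last-write-wins rules are easiest to see in a small model. The Python sketch below applies an object-body update to a list of selected targets. It is an illustration of the semantics table, not Jetro's implementation: DELETE is a hypothetical sentinel, each op is a hypothetical (field, fn, cond) triple, and Arc sharing is only loosely mirrored by the shallow copy.

```python
DELETE = object()  # hypothetical stand-in for the DELETE keyword

def update_each(targets, ops):
    """Apply (field, fn, cond) ops to every target; fn and cond read the snapshot."""
    out = []
    for target in targets:
        if isinstance(target, dict):
            snapshot, new = target, dict(target)  # shallow copy shares untouched values
        else:
            snapshot, new = {}, {}  # scalar targets are promoted to objects
        for field, fn, cond in ops:  # source order, so the last write wins
            if cond is not None and not cond(snapshot):
                continue  # `field: expr when cond` skips when cond is falsy
            value = fn(snapshot)  # reads pre-batch values, never `new`
            if value is DELETE:
                new.pop(field, None)
            else:
                new[field] = value
        out.append(new)
    return out

books = [
    {"title": "Dune", "year": 1965, "tags": ["sf"]},
    {"title": "Hyperion", "year": 1989, "tags": ["sf", "hugo"]},
]
result = update_each(books, [
    ("tags", lambda b: b["tags"] + ["modern"], lambda b: b["year"] > 1980),
    ("n_tags", lambda b: len(b["tags"]), None),  # sees the original tag count
])
print(result[1])
# {'title': 'Hyperion', 'year': 1989, 'tags': ['sf', 'hugo', 'modern'], 'n_tags': 2}
```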

Worked example

DOC:    {"users": [
  {"id": 1, "secret": "a", "name": "Ada"},
  {"id": 2, "secret": "b", "name": "Bob"}
]}

QUERY:  $.users.map(u => u.omit("secret").set_path("display", u.name))
OUT:    [{"display":"Ada","id":1,"name":"Ada"},{"display":"Bob","id":2,"name":"Bob"}]

Demand notes

Path-mutation methods produce a full result and can't tell the source what fields they need (the path is data, not statically analysable). When the path is a literal, prefer pick/omit/set over get_path/set_path — the planner can use literal field names.

Practical examples

# Single-key write
$.user.name.set("Ada Lovelace")                  # chain-write

# Set a field deep
patch $ { user.profile.email: "ada@x.com" }

# Bulk delete
$.del_paths(["secret","temp","csrf"])

# Flatten a nested config for environment-variable export
$.config.flatten_keys(".")                       # {"db.host":..., "db.port":..., ...}

# Round-trip via flatten/unflatten
$.config.flatten_keys().unflatten_keys()         # ≈ $.config

# Existence test before write
patch $ {
  email: $.user.email when $.has_path("user.email")
}

# Flat-key patches (map yields one copy of the document per entry, each with a single path written)
$.patch_set.flatten_keys().entries().map(([k,v]) => $.set_path(k, v))

# Batched functional update
$.books[*].update({
  reviewed: true,
  tags: tags.append("classic") when year < 1970,
  tmp: DELETE
})

Deep Traversal and Recursion

Walk every descendant value in DFS pre-order. The deep methods are also available as ..method(...) syntax sugar in path position.

deep_find(pred) (or ..find(pred))

  • Signature: Any -> Array<Any>
  • Behavior: Every descendant satisfying pred. Order: DFS pre-order.
DOC:    {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
QUERY:  $..find(@.x?)
OUT:    [{"x":1},{"x":2}]

QUERY:  $.deep_find(@ is number)
OUT:    [1,2,3]

When the structural index is available, deep_find runs over a bitmap representation in jetro-experimental rather than walking Val nodes — significantly faster for shallow predicates.

deep_shape({k1, k2, ...}) (or ..shape({...}))

  • Signature: Any -> Array<Object>
  • Behavior: Every object that has all listed keys (regardless of value).
DOC:    [{"id":1,"name":"a"},{"id":2},{"name":"c","id":3}]
QUERY:  $..shape({id, name})
OUT:    [{"id":1,"name":"a"},{"id":3,"name":"c"}]

deep_like({k1: v1, ...}) (or ..like({...}))

  • Signature: Any -> Array<Object>
  • Behavior: Every object whose listed keys equal the listed literal values.
DOC:    [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942},{"author":"Herbert","year":1965}]
QUERY:  $..like({author: "Asimov"})
OUT:    [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942}]
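The three deep searches share one traversal and differ only in the test applied to each node. A Python sketch under stated assumptions: descendants yields nodes in DFS pre-order starting with the root itself (whether Jetro tests the root is not specified above), and the helper names are hypothetical. It visits every node, so it models the interpreted path rather than the bitmap index.

```python
def descendants(node):
    """Yield node and every value below it in DFS pre-order."""
    yield node
    if isinstance(node, dict):
        for child in node.values():
            yield from descendants(child)
    elif isinstance(node, list):
        for child in node:
            yield from descendants(child)

def deep_find(doc, pred):
    return [n for n in descendants(doc) if pred(n)]

def deep_shape(doc, keys):
    # Objects that carry every listed key, whatever the values.
    return [n for n in descendants(doc)
            if isinstance(n, dict) and all(k in n for k in keys)]

def deep_like(doc, pattern):
    # Objects whose listed keys equal the listed literal values.
    return [n for n in descendants(doc)
            if isinstance(n, dict) and all(k in n and n[k] == v for k, v in pattern.items())]

def is_number(n):
    return isinstance(n, (int, float)) and not isinstance(n, bool)

doc = {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
print(deep_find(doc, lambda n: isinstance(n, dict) and "x" in n))  # [{'x': 1}, {'x': 2}]
print(deep_find(doc, is_number))                                    # [1, 2, 3]
```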

walk(fn)

  • Signature: Any, (Any -> Any) -> Any
  • Behavior: Apply fn to every node bottom-up; rebuild the tree.
QUERY:  $.walk(node => node.upper() if node is string else node)
# Returns the document with every string node uppercased.

walk_pre(fn)

  • Signature: Any, (Any -> Any) -> Any
  • Behavior: Like walk, but pre-order — fn sees parent before children.

Use walk_pre when the transform decides whether to recurse based on the node's identity (e.g. "stop at leaves of kind X").
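The only difference between walk and walk_pre is when fn runs relative to the recursion. A minimal Python sketch with illustrative names; for walk_pre it assumes recursion continues into whatever fn returns, so replacing a subtree with a scalar is how a transform stops descending.

```python
def walk(node, fn):
    """Bottom-up: rebuild the children first, then hand the rebuilt node to fn."""
    if isinstance(node, dict):
        node = {k: walk(v, fn) for k, v in node.items()}
    elif isinstance(node, list):
        node = [walk(v, fn) for v in node]
    return fn(node)

def walk_pre(node, fn):
    """Top-down: fn sees the parent first, then recursion enters fn's result."""
    node = fn(node)
    if isinstance(node, dict):
        return {k: walk_pre(v, fn) for k, v in node.items()}
    if isinstance(node, list):
        return [walk_pre(v, fn) for v in node]
    return node

def upper_strings(n):
    return n.upper() if isinstance(n, str) else n

print(walk({"name": "ada", "tags": ["x", 1]}, upper_strings))
# {'name': 'ADA', 'tags': ['X', 1]}

order = []
def record(n):
    order.append(n)
    return n

walk({"a": [1]}, record)
print(order)      # [1, [1], {'a': [1]}]
order.clear()
walk_pre({"a": [1]}, record)
print(order)      # [{'a': [1]}, [1], 1]
```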

rec(pattern, fn)

Limited in 0.5.11 — recursive rewrites are guarded with a 10 000 iteration cap. Prefer walk or walk_pre for one-pass document traversal, and keep rec for bounded fixpoint-style rewrites.

  • Signature (planned): Any, Pattern, (Any -> Any) -> Any
  • Behavior (planned): Match-and-rewrite. Recursively walks; replaces every match with fn(match).

This is the recursive sibling of Pattern Match; useful for AST rewrites and document migrations.

trace_path(pred)

  • Signature: Any, (Any -> Bool) -> Array<{path: String, value: Any}>
  • Behavior: For every node matching pred, return its path from the root (as a path string such as $.b[0]) together with the matched value.
DOC:    {"a": {"x": 1}, "b": [{"x": 2}]}
QUERY:  $.trace_path(@.x?)
OUT:    [{"path":"$.a","value":{"x":1}},{"path":"$.b[0]","value":{"x":2}}]

Each path spells out the keys and indices walked from the root to reach the match. Pair with set_path for find-and-replace operations.
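A Python sketch of how such path strings can be built during a pre-order walk. It reproduces the {path, value} records from the OUT line above for that document, but keys containing dots, spaces or quotes are not escaped here and Jetro's path syntax for them may differ; the function names are illustrative.

```python
def trace_path(doc, pred):
    out = []
    def go(node, path):
        if pred(node):
            out.append({"path": path, "value": node})
        if isinstance(node, dict):
            for key, child in node.items():
                go(child, f"{path}.{key}")
        elif isinstance(node, list):
            for i, child in enumerate(node):
                go(child, f"{path}[{i}]")
    go(doc, "$")
    return out

def has_x(n):
    # Stands in for the @.x? predicate.
    return isinstance(n, dict) and "x" in n

doc = {"a": {"x": 1}, "b": [{"x": 2}]}
print(trace_path(doc, has_x))
# [{'path': '$.a', 'value': {'x': 1}}, {'path': '$.b[0]', 'value': {'x': 2}}]
```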

Deep match

The pattern-match construct has deep variants ..match and ..match! — see Control Flow and the pattern-match cookbook.

When the bitmap kicks in

Deep search uses the structural index when:

  • The query is rooted at $.. or .deep_*
  • The predicate is a shape/key check (not a complex lambda)
  • The document was loaded with the simd-json tape (default)

You don't enable this — it's selected by the planner.

Demand notes

Deep traversals declare All upstream by nature. The optimisation surface is the predicate: shape and like checks bypass the per-node lambda evaluation entirely.

Practical examples

# Find every node with an "id" key (anywhere in the tree)
$..find(@.id?)

# Find all numbers
$..find(@ is number)

# Every object that has both id + name keys
$..shape({id, name})

# Every object where a field equals a specific value
$..like({status: "error"})

# Locate an event by ID inside a deeply nested tree
$..match! { {id: 42} -> @, _ -> null }

# Walk every node, transforming strings to upper
$.walk(node => node.upper() if node is string else node)

# Trace paths from root to nodes matching a predicate
$.trace_path(@.is_admin == true)
# → [{"path":"$.users[0]","value":{...}}]

# Bulk audit: find every "secret"-named field
$..find(@.secret?)

Membership and Predicates

Tests and small helpers.

or(default)

  • Signature: Any, Any -> Any
  • Behavior: If self is null, return default. Otherwise return self.
QUERY:  null.or("default")     OUT: "default"
QUERY:  "hi".or("default")     OUT: "hi"

Equivalent to ?? default but reads better in chains:

$.user.name.or("anon")

has(key)

  • Signature: Object|Array, KeyOrIndex -> Bool
  • Behavior: True if the key exists (objects) or index is in range (arrays).
QUERY:  {"a":1,"b":2}.has("a")     OUT: true
QUERY:  {"a":1}.has("b")     OUT: false
QUERY:  [1,2,3].has(2)     OUT: true
QUERY:  [1,2,3].has(5)     OUT: false

The has operator (x has y) is sugar for x.includes(y) — distinct from this method.

has_key(key)

  • Signature: Object, String -> Bool
  • Behavior: True if the receiver is an object and the key exists.
QUERY:  {"a":1,"b":null}.has_key("a")     OUT: true
QUERY:  {"a":1,"b":null}.has_key("b")     OUT: true
QUERY:  {"a":1}.has_key("z")              OUT: false
QUERY:  [1,2,3].has_key("0")              OUT: false

Use has_key when you specifically mean object-key existence. It is narrower than has and easier for direct object-key checks to optimize.

missing(...keys)

  • Signature: Object, ...String -> Array<String>
  • Behavior: Return the subset of provided keys that are not present.
QUERY:  {"host":"localhost","port":5432}.missing("host", "port", "user")
OUT:    ["user"]

includes(value) (alias contains)

  • Signature: Array|String, Any -> Bool
  • Behavior: Membership.
QUERY:  [1,2,3].includes(2)           OUT: true
QUERY:  "hello".includes("ell")       OUT: true

index(value)

  • Signature: Array|String, Any -> Number | null
  • Behavior: Index of first occurrence; null if not found.
QUERY:  [10,20,30].index(20)          OUT: 1
QUERY:  [10,20,30].index(99)          OUT: null

For strings, see also index_of in String Search.

indices_of(value)

  • Signature: Array|String, Any -> Array<Number>
  • Behavior: All indices of value.
QUERY:  [1,2,3,2,1].indices_of(2)
OUT:    [1, 3]

Quick comparison: predicates that look similar

Pattern               Returns
obj.has_key("foo")    Bool: does this object key exist?
xs.has("foo")         Bool: key/index-style existence helper
xs.includes("foo")    Bool: is the value present?
x has y               Bool: membership/containment operator
doc.has_path("a.b")   Bool: does this nested path exist with a non-null value?
xs.index("foo")       Number|null: where?
xs.indices_of("foo")  Array: all positions
xs.find(p)            A|null: first matching element
xs.find_index(p)      Number|null: first matching index

Practical examples

# Default for missing field
$.user.email.or("no-email@example.com")

# Existence check on key
$.config.has_key("aws_region")

# Which required config keys are absent
$.config.missing("host", "port", "user")

# Index of a value (not the predicate form)
$.tags.index("admin")

# All positions of duplicates
[1, 2, 1, 3, 1].indices_of(1)            # → [0, 2, 4]

# Membership in a set
$.tags.includes("urgent")

# Allow-list / deny-list patterns
$.role.includes("admin") and not $.banned_users.includes($.id)

Tabular Output

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}]}

Serialise sequences of objects to row-oriented text formats.

to_csv(headers?)

  • Signature: Array<Object> -> String
  • Behavior: RFC-4180-ish CSV. Without arguments, the header set is the union of object keys, ordered by first appearance.
DOC:    [{"name":"Ada","age":36},{"name":"Bob","age":42}]
QUERY:  $.to_csv()
OUT:
"name,age
Ada,36
Bob,42"

With explicit headers:

QUERY:  $.to_csv(["age","name"])
OUT:
"age,name
36,Ada
42,Bob"

Strings containing commas, quotes, or newlines are quoted and escaped per RFC 4180.
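Python's standard csv module applies the same RFC 4180 quoting rules, which makes it a convenient way to sketch the behavior described above. Assumptions, not documented Jetro behavior: missing keys and nulls become empty cells, non-string values are JSON-encoded, and rows end with \n rather than RFC 4180's CRLF (matching the sample output above). The function names are illustrative.

```python
import csv
import io
import json

def _cell(value):
    if value is None:
        return ""                # assumption: missing keys and nulls -> empty cell
    if isinstance(value, str):
        return value
    return json.dumps(value)     # numbers, booleans and nested values

def to_csv(rows, headers=None):
    if headers is None:
        headers = []
        for row in rows:         # union of keys, in first-appearance order
            for key in row:
                if key not in headers:
                    headers.append(key)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # QUOTE_MINIMAL by default
    writer.writerow(headers)
    for row in rows:
        writer.writerow([_cell(row.get(h)) for h in headers])
    return buf.getvalue().rstrip("\n")  # drop the final line terminator

print(to_csv([{"name": "Ada", "age": 36}, {"name": "Bob", "age": 42}]))
print(to_csv([{"note": 'said "hi", left'}]))
```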

to_tsv(headers?)

  • Signature: Array<Object> -> String
  • Behavior: Same as to_csv but tab-separated. No quoting (tab-in-value is replaced with a space).
QUERY:  $.users.to_tsv(["id","email"])

Composing with the rest of the pipeline

Build a report:

$.users
  .filter(@.active)
  .map(u => u.pick(id, name, email))
  .sort(@.id)
  .to_csv()

Pipe to a file from the CLI:

jetrocli -e '$.users.filter(@.active).pick(id,name).to_csv()' < users.json > out.csv

Limitations

  • Nested values are JSON-encoded into the cell. For deeply-nested structures, flatten first with flatten_keys:
    $.records.map(r => r.flatten_keys()).to_csv()
    
  • The format is row-major. To reshape long-format records into a wide table, use pivot first (zip_shape is not yet supported in 0.5.11).
  • For Excel-flavored CSV (BOM, CRLF), post-process the result.

Practical examples

# Active-user export
$.users.filter(@.active).map(u => u.pick(id, name, email)).sort(u => u.id).to_csv()

# Daily sales report
$.sales.group_by(s => s.day).entries().map(e => {
  day:   e[0],
  total: e[1].map(@.amount).sum(),
  count: e[1].count()
}).to_csv()

# Hashtag frequency CSV
$.tweets.flat_map(t => t.entities.hashtags.map(@.text))
  .count_by(@)
  .entries()
  .map(e => {tag: e[0], count: e[1]})
  .to_csv()

# TSV for log shipping
$.logs.map(l => l.pick(ts, sev, msg)).to_tsv()

Relational

Fixture

Examples below run against:

DOC:    {"orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "customers": [{"id": 1, "name": "Ada", "email": "ada@x.com"}, {"id": 2, "name": "Bob", "email": "bob@y.org"}], "left": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "right": [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}]}

Operations that combine two arrays of objects on a key.

equi_join(other, leftKey, rightKey, fn?)

  • Signature: Array<L>, Array<R>, KeyL, KeyR, ((L, R) -> Any)? -> Array<Any>
  • Behavior: Inner equi-join: for every pair (l, r) where l[leftKey] == r[rightKey], emit a result. If fn is omitted, the result is the merged object l.merge(r).
LEFT:   [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}]
RIGHT:  [{"uid":1,"role":"admin"},{"uid":2,"role":"user"}]

QUERY:  $.left.equi_join($.right, "id", "uid")
OUT:    [{"id":1,"name":"Ada","uid":1,"role":"admin"},
         {"id":2,"name":"Bob","uid":2,"role":"user"}]

QUERY:  $.left.equi_join($.right, "id", "uid", (l, r) => {
          name: l.name,
          role: r.role
        })
OUT:    [{"name":"Ada","role":"admin"},{"name":"Bob","role":"user"}]

Worked example: orders + customers

DOC:
{
  "customers": [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"}
  ],
  "orders": [
    {"customer": 1, "amount": 100},
    {"customer": 1, "amount": 50},
    {"customer": 2, "amount": 75}
  ]
}

QUERY:
  $.orders.equi_join($.customers, "customer", "id", (o, c) => {
    customer: c.name,
    amount: o.amount
  })

OUT:
  [
    {"customer":"Ada","amount":100},
    {"customer":"Ada","amount":50},
    {"customer":"Bob","amount":75}
  ]

Notes and limitations

  • Inner only. No outer joins. For "all left, fill missing right with null" you can hand-roll:
    $.left.map(l =>
      l.merge($.right.find(@.uid == l.id).or({role: null}))
    )
    
  • Equality only. No range, prefix, or function joins.
  • One key on each side. For multi-key joins, project a tuple key first:
    $.left.map(l => l.merge({_k: [l.a, l.b]}))
         .equi_join($.right.map(r => r.merge({_k: [r.x, r.y]})), "_k", "_k")
    
  • The implementation builds a hash on the right side and streams the left. Pre-filter before joining if either side is large and only a subset matters.
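The hash-the-right, stream-the-left strategy can be sketched in Python. This is an illustration, not Jetro's code: keys are compared via canonical JSON text so that array keys like the _k tuple above are hashable, rows whose key is missing or null are assumed not to match, and the default result mirrors l.merge(r). The function names are hypothetical.

```python
import json

def _join_key(v):
    return json.dumps(v, sort_keys=True)

def equi_join(left, right, left_key, right_key, fn=None):
    index = {}
    for r in right:  # build phase: hash the right side once
        if r.get(right_key) is not None:
            index.setdefault(_join_key(r[right_key]), []).append(r)
    out = []
    for l in left:  # probe phase: stream the left side
        if l.get(left_key) is None:
            continue
        for r in index.get(_join_key(l[left_key]), []):
            out.append(fn(l, r) if fn else {**l, **r})  # default: l.merge(r)
    return out

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"customer": 1, "amount": 100}, {"customer": 1, "amount": 50}, {"customer": 2, "amount": 75}]
print(equi_join(orders, customers, "customer", "id",
                lambda o, c: {"customer": c["name"], "amount": o["amount"]}))
# [{'customer': 'Ada', 'amount': 100}, {'customer': 'Ada', 'amount': 50}, {'customer': 'Bob', 'amount': 75}]
```

Building the index touches each right row once and each probe is an average constant-time lookup, so the work grows roughly with n + m plus the number of emitted pairs.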

When to choose join vs. lookup

For "many left rows, lookup one field on each":

$.orders.map(o => o.merge({customer_name: $.customers.find(@.id == o.customer).name}))

This nested find is O(n×m) — fine for small data. For large data, use equi_join (O(n+m)) or build a lookup table first:

let by_id = $.customers.index_by(@.id) in
  $.orders.map(o => o.merge({customer_name: by_id[o.customer].name}))

Practical examples

# Enrich orders with customer info
$.orders.equi_join($.customers, "customer_id", "id")

# Custom result shape
$.orders.equi_join($.customers, "customer_id", "id", (o, c) => {
  order_id: o.id,
  total: o.amount,
  buyer: c.name,
  email: c.email
})

# Self-join: pair every two records that share a session_id (each record also pairs with itself)
$.events.equi_join($.events, "session_id", "session_id", (a, b) => {a, b})

# Multi-key join via tuple projection
let lk = $.left.map(l => l.merge({_k: f"{l.a}-{l.b}"})) in
  let rk = $.right.map(r => r.merge({_k: f"{r.x}-{r.y}"})) in
    lk.equi_join(rk, "_k", "_k")

# Filter-then-join (drop rows before paying join cost)
$.orders.filter(@.status == "paid").equi_join($.customers, "cid", "id")

Chained Pipelines

Real-world queries assembled from the building blocks. Most recipes pair a small document with the query chain, and several add a note on what the planner does.

1. Top-N by aggregate

DOC:    {"sales": [
  {"region": "NA", "amount": 100},
  {"region": "EU", "amount": 200},
  {"region": "NA", "amount": 50},
  {"region": "AS", "amount": 300},
  {"region": "EU", "amount": 75}
]}

QUERY:  $.sales
          .group_by(@.region)
          .entries()
          .map(([region, rows]) => {region, total: rows.map(@.amount).sum()})
          .sort(@.total)
          .reverse()
          .take(2)

OUT:    [{"region":"AS","total":300},{"region":"EU","total":275}]

group_by and sort are barriers; take(2) after the sort doesn't help — the sort must complete first. Push the demand earlier where possible.

2. Active users + role-based count

DOC:    {"users": [
  {"id":1,"role":"admin","active":true},
  {"id":2,"role":"user","active":false},
  {"id":3,"role":"user","active":true},
  {"id":4,"role":"admin","active":true}
]}

QUERY:  $.users
          .filter(@.active)
          .count_by(@.role)

OUT:    {"admin":2,"user":1}

Streaming filter followed by a barrier count_by. count_by buffers, but with ValueNeed::Predicate it reads only the role key; apart from the active flag the filter checks, the rest of each user object is never decoded.

3. Histogram of word frequency

DOC:    {"text": "the quick brown fox jumps over the lazy dog the end"}

QUERY:  $.text
          .words()
          .map(@.lower())
          .count_by(@)

OUT:    {"the": 3, "quick": 1, "brown": 1, ...}

4. Customer order summary

QUERY:  $.orders
          .group_by(@.customer_id)
          .entries()
          .map(([cid, orders]) => {
            customer_id: cid,
            total: orders.map(@.amount).sum(),
            count: orders.count(),
            recent: orders.sort(@.date).last().date
          })
          .sort_by(@.total)
          .reverse()

The inner .sort(@.date).last() is wasteful: it sorts every group to grab the last. Rewrite with max_by:

QUERY:  ...
          .map(([cid, orders]) => {
            customer_id: cid,
            total: orders.map(@.amount).sum(),
            count: orders.count(),
            recent: orders.max_by(@.date).date
          })
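The difference between the two forms, sketched in plain Rust under the assumption that dates are ISO-8601 strings (so string order is date order). Order, most_recent, and most_recent_by_sort are hypothetical names used only for this illustration:

```rust
/// Hypothetical order row with just the fields the recipe reads.
#[derive(Debug, PartialEq)]
struct Order {
    amount: i64,
    date: &'static str, // "YYYY-MM-DD": lexicographic order is chronological order
}

/// Like `orders.max_by(@.date)`: one pass, no extra allocation.
/// On ties, Iterator::max_by_key returns the last maximal element.
fn most_recent(orders: &[Order]) -> Option<&Order> {
    orders.iter().max_by_key(|o| o.date)
}

/// Like `orders.sort(@.date).last()`: copies and sorts the whole group first.
/// The sort is stable, so on ties this also returns the last maximal element.
fn most_recent_by_sort(orders: &[Order]) -> Option<&Order> {
    let mut sorted: Vec<&Order> = orders.iter().collect(); // O(n) allocation
    sorted.sort_by_key(|o| o.date); // O(n log n)
    sorted.last().copied()
}
```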

5. Unique recent active sessions

QUERY:  $.events
          .filter(@.kind == "login" and @.at >= "2026-01-01")
          .map(@.user_id)
          .unique()
          .count()

6. Export objects as CSV

QUERY:  $.users
          .filter(@.active)
          .map(u => u.pick(id: id, name: full_name, email))
          .sort(@.id)
          .to_csv()

7. Find a needle in a deep document

QUERY:  $..find(@.id == 42)

If the document was loaded from bytes (the default), this runs on the bitmap structural index, which skips subtrees that cannot match instead of traversing the whole document.

8. Compute deltas with pairwise

DOC:    {"prices": [100, 105, 102, 110, 108]}

QUERY:  $.prices.pairwise().map(([a, b]) => b - a)
OUT:    [5,-3,8,-2]
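The same computation in plain Rust, as an illustrative sketch (deltas is a hypothetical name): slice::windows(2) yields each overlapping pair, so n inputs produce n - 1 deltas.

```rust
/// Model of `$.prices.pairwise().map(([a, b]) => b - a)`.
fn deltas(xs: &[i64]) -> Vec<i64> {
    // windows(2) yields [a, b] for every adjacent pair; fewer than 2 inputs yield nothing.
    xs.windows(2).map(|pair| pair[1] - pair[0]).collect()
}
```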

9. Rolling 3-point moving average

QUERY:  $.measurements.rolling_avg(3)

The first two outputs are null until the window fills.
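One way to compute a trailing window average in a single pass, sketched in plain Rust. rolling_avg is a hypothetical name and None stands in for jetro's null; whether jetro keeps a running sum like this is not stated here.

```rust
/// Trailing n-point average; positions before the window fills are None.
fn rolling_avg(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    assert!(n > 0, "window size must be positive");
    let mut out = Vec::with_capacity(xs.len());
    let mut sum = 0.0;
    for (i, &x) in xs.iter().enumerate() {
        sum += x;
        if i >= n {
            sum -= xs[i - n]; // drop the value that just left the window
        }
        out.push(if i + 1 >= n { Some(sum / n as f64) } else { None });
    }
    out
}
```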

10. Build a lookup, then enrich

QUERY:  let by_id = $.users.index_by(@.id) in
          $.events.map(e => e.merge({user: by_id[e.user_id].name}))

index_by is a barrier that builds the lookup once; the .map then streams over events, doing one keyed lookup per event instead of rescanning users.

11. Select rows with all required fields

QUERY:  $.records.filter(r => r.missing("id", "name", "email").count() == 0)

12. Re-shape a long-format table

DOC:    [
  {"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},
  {"y":2025,"q":1,"v":15},{"y":2025,"q":2,"v":25}
]
QUERY:  $.pivot("y", "q", "v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15,"2":25}}
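A plain-Rust sketch of the reshaping, assuming integer y, q, and v columns and that a repeated (y, q) pair overwrites the earlier cell (the recipe does not say). pivot here is a hypothetical free function, not jetro's method:

```rust
use std::collections::BTreeMap;

/// Model of `$.pivot("y", "q", "v")` over (y, q, v) rows: y becomes the outer key,
/// q the inner key, and v the cell. Keys are stringified, like JSON object keys.
fn pivot(rows: &[(i64, i64, i64)]) -> BTreeMap<String, BTreeMap<String, i64>> {
    let mut out: BTreeMap<String, BTreeMap<String, i64>> = BTreeMap::new();
    for &(row_key, col_key, value) in rows {
        out.entry(row_key.to_string())
            .or_default()
            .insert(col_key.to_string(), value); // a later duplicate (y, q) overwrites
    }
    out
}
```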

13. Drop sensitive fields

QUERY:  $.users.map(u => u.omit("password", "ssn", "token"))

14. Delta + cumulative sum

DOC:    {"daily":[{"value":10},{"value":15},{"value":12},{"value":20}]}

QUERY:  $.daily
          .pairwise()
          .map(([a, b]) => b.value - a.value)

OUT:    [5,-3,8]

For a running total, use accumulate:

DOC:    {"amounts":[10,12,9]}

QUERY:  $.amounts.accumulate(0, (total, x) => total + x)

OUT:    [10,22,31]
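accumulate behaves like a scan: a fold that emits every intermediate value. A generic plain-Rust sketch (the function name and signature are hypothetical); as in the recipe's output, the seed itself is not emitted:

```rust
/// Fold `xs` with `f`, starting from `seed`, and collect every running value.
fn accumulate<T: Copy, A: Copy>(xs: &[T], seed: A, f: impl Fn(A, T) -> A) -> Vec<A> {
    xs.iter()
        .scan(seed, |acc, &x| {
            *acc = f(*acc, x); // update the running state
            Some(*acc) // and emit it
        })
        .collect()
}
```

Taking the last element of the result gives the plain fold, which is how the jq cheatsheet below expresses reduce.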

15. Classify rows with match

DOC:    {"books": [
  {"title":"Dune","year":1965,"tags":["sf"]},
  {"title":"Snow Crash","year":1992,"tags":["sf","cyberpunk"]},
  {"title":"Foundation","year":1951,"tags":["sf","hugo"]}
]}

QUERY:  $.books
          .map(book => {
            title: book.title,
            era: match book with {
              {year: y} when y < 1970 -> f"classic {y}",
              {year: y} -> f"modern {y}",
              _ -> "unknown"
            },
            tag_count: book.tags.count()
          })

OUT:    [
  {"title":"Dune","era":"classic 1965","tag_count":1},
  {"title":"Snow Crash","era":"modern 1992","tag_count":2},
  {"title":"Foundation","era":"classic 1951","tag_count":2}
]

16. Latest active rows from NDJSON

jetrocli --ndjson -i users.topic --payload-after '|' -e '
  $.rows()
    .reverse()
    .distinct_by(@.id)
    .filter(@.active)
    .take(100)
    .map({
      id: $.id,
      name: $.profile.name,
      city: $.profile.address.city
    })
'

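On a compacted Kafka-style file, reversing the rows makes the newest record for each key appear first. distinct_by(@.id) keeps that first row and discards older duplicates as soon as the key has been seen. Because filter runs after distinct_by, a key whose newest record is inactive is dropped rather than falling back to an older active record.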

17. Patch several paths in one pass

DOC:    {"books":[
  {"title":"Dune","year":1965,"tags":["sf"],"tmp":true},
  {"title":"Snow Crash","year":1992,"tags":["sf"],"tmp":true}
]}

QUERY:  $.update({
          books[*].tags: @.append("catalog"),
          books[*].reviewed: true,
          books[*].tmp: DELETE
        })

OUT:    {"books":[
  {"title":"Dune","year":1965,"tags":["sf","catalog"],"reviewed":true},
  {"title":"Snow Crash","year":1992,"tags":["sf","catalog"],"reviewed":true}
]}

The planner can batch compatible rooted writes so shared ancestors are cloned once and all writes under that prefix are applied together.

18. Migrate a document shape

Use walk when every nested object with a matching shape must be rewritten:

QUERY:
  $.walk(node =>
    node.merge({type: "v2"})
        .rename({old_field: "new_field"})
        .omit("legacy_blob")
    if node is object and node.type == "v1" else node)

For query-local rewrites on known paths, prefer update(...); for broad shape migration, walk makes the traversal explicit.

Pattern Match Cookbook

Fixture

Examples below run against:

DOC:    {"xs": [1, 2, 3, 4, 5], "row": {"k": "foo", "data": {"a": 1, "b": 2}}, "doc": {"a": 1, "b": 2, "type": "v1"}, "tree": {"x": 1, "children": [{"x": 2}]}, "value": 3.14}

Pattern matching is one of jetro's most expressive features. It compiles to a Maranget decision tree at lowering time and runs over all three execution domains (Val, borrowed View, tape).

Anatomy

match scrutinee with {
  pattern1 -> expr1,
  pattern2 when guard -> expr2,
  _ -> default
}
  • Arms checked top-down.
  • First match wins.
  • _ is the universal fallback.
  • when guards run after the structural match succeeds.

Pattern reference

Pattern               Matches
42, "x", true, null   Equal literal
_                     Anything
name                  Anything, binds to name
1..10                 Number ≥ 1 and < 10
1..=10                Number ≥ 1 and ≤ 10
{k: p, ...}           Object with key k, value matches p
[p1, p2]              Array of length 2
[h, ...t]             Head + tail
p1 | p2               Either
x: number             Kind-bind

Object shorthand {id, name} binds each key to a same-name local. Rest captures are spelled ...*rest for objects and ...tail for arrays: {id, name, ...*rest}, [h, ...tail].

1. Discriminated union

match $.event with {
  {kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
  {kind: "key",   code: c}       -> f"key:{c}",
  {kind: "scroll", dy: d}        -> f"scroll:{d}",
  _ -> "unknown"
}

Literal discriminants and shorthand captures can be mixed, so the click arm could also be written as {kind: "click", x, y}.

2. Numeric ranges

match $.score with {
  s when s < 0 -> "invalid",
  0..50 -> "low",
  50..80 -> "medium",
  80..=100 -> "high",
  _ -> "out of range"
}
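Rust's match follows the same rules listed under Anatomy (top-down, first match wins, guards run after the pattern matches), so the arms above translate almost one-for-one. This sketch is an illustration, not how jetro lowers patterns; score_band is a hypothetical name, and it restricts scores to integers, so jetro's exclusive 0..50 becomes 0..=49:

```rust
/// The arms of the jetro example above, written as a Rust match.
fn score_band(score: i64) -> &'static str {
    match score {
        s if s < 0 => "invalid", // guard runs only after the binding pattern matched
        0..=49 => "low",         // jetro: 0..50 (exclusive upper bound)
        50..=79 => "medium",     // jetro: 50..80
        80..=100 => "high",      // jetro: 80..=100 (inclusive)
        _ => "out of range",     // universal fallback
    }
}
```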

3. Or-patterns

match $.day with {
  "sat" | "sun" -> "weekend",
  _ -> "weekday"
}

4. Object rest capture

match $.config with {
  {host, port, ...*extra} -> {host, port, extra},
  _ -> null
}

5. Array shape

match $.coords with {
  [x, y] -> {x, y},
  [x, y, z] -> {x, y, z},
  _ -> null
}

6. Head + tail

match $.xs with {
  [] -> "empty",
  [first, ...rest] -> f"head={first}, count={rest.count()}",
}

7. Kind-bound + guard

match $.value with {
  s: string when s.len() > 100 -> "long string",
  s: string -> "short string",
  n: number when n > 0 -> "positive",
  n: number -> "non-positive",
  _: array -> "array",
  _ -> "other"
}

8. Deep match (..match)

Walk every descendant; collect results.

$.tree..match {
  {kind: "leaf", value} -> value,
  _ -> null
} | .compact()

The trailing .compact() drops the nulls from non-leaf descendants.

9. First-match deep (..match!)

Stops at the first match. The bang variant uses early termination via the structural index where possible.

$.tree..match! {
  {role: "admin", id} -> id,
  _ -> null
}
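A plain-Rust sketch of the two traversal styles over a hypothetical Node tree: deep_match collects every non-null arm result in pre-order (the ..match plus .compact() shape), and deep_match_first stops at the first one. As a simplification, it assumes a null arm result counts as "no match", and it does not model the structural index:

```rust
/// A toy tree standing in for a JSON document (hypothetical type).
enum Node {
    Leaf { kind: &'static str, value: i64 },
    Branch(Vec<Node>),
}

/// Visit the node and every descendant in pre-order, keeping non-null arm results.
/// `arm` returning None models the `_ -> null` arm that .compact() drops.
fn deep_match<T, F: Fn(&Node) -> Option<T>>(node: &Node, arm: &F, out: &mut Vec<T>) {
    if let Some(v) = arm(node) {
        out.push(v);
    }
    if let Node::Branch(children) = node {
        for child in children {
            deep_match(child, arm, out);
        }
    }
}

/// Same traversal, but return as soon as one arm result is non-null.
fn deep_match_first<T, F: Fn(&Node) -> Option<T>>(node: &Node, arm: &F) -> Option<T> {
    if let Some(v) = arm(node) {
        return Some(v);
    }
    match node {
        Node::Branch(children) => children.iter().find_map(|child| deep_match_first(child, arm)),
        Node::Leaf { .. } => None,
    }
}
```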

10. Migration / rewrite (rec)

$.doc.rec({type: "v1"}, node => node.merge({type: "v2"}))

rec is the recursive sibling of match: it descends through the document and rewrites every node that matches the pattern.

11. Cross-arm sharing

When multiple arms test the same prefix ({kind: "x", ...}, {kind: "y", ...}), the lowering shares the discriminant test. No special syntax is needed; the planner does it for you. In practice, write many narrow arms: they cost about as much as one big switch.

12. Guards over deep patterns

match $.row with {
  {user: {age, role: "admin"}} when age >= 18 -> "adult admin",
  {user: {age}} when age < 18 -> "minor",
  _ -> "other"
}

Bench tips

  • Patterns with literal-only discriminants (no guards) compile to switch-like decision trees and run about as fast as a hand-written if/else if chain.
  • Guards add a per-arm conditional; cheap, but don't put expensive computation in them.
  • Deep ..match over a large doc benefits a lot from the structural index; deep ..match! (first-match) does even better, because it can stop at the first hit.

Kafka Compacted Topic Dumps

Kafka compacted topics keep the latest value for each key logically. A file dump can still contain older values earlier in the file:

user-a|{"id":"a","version":1,"name":"Ada"}
user-b|{"id":"b","version":1,"name":"Bob"}
user-a|{"id":"a","version":2,"name":"Ada Lovelace"}
user-c|null

Here user-c|null is a tombstone. With jetrocli, query only the JSON payload after the separator; tombstones are skipped by default:

jetrocli --ndjson -i users.topic --payload-after '|' -e '$.id'

Latest N Unique Keys

Scan from the tail, keep the first row seen for each logical id, then project only the retained rows:

jetrocli --ndjson -i users.topic --payload-after '|' \
  -e '$.rows()
    .reverse()
    .distinct_by($.id)
    .take(100)
    .map({id: $.id, version: $.version, name: $.name})'

Why this works:

  1. $.rows() switches from row-local mode to one stream over the file.
  2. reverse() starts at the newest records.
  3. distinct_by($.id) keeps the first row per key in that reverse order.
  4. take(100) stops after 100 retained unique keys.
  5. map(...) shapes only the rows that survived selection.
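The same pipeline as a plain-Rust model over the sample dump above, for illustration only. latest_unique is a hypothetical name; rows are (id, version) pairs in file order, with None standing in for a tombstone that the default null policy skips:

```rust
use std::collections::HashSet;

/// Model of `$.rows().reverse().distinct_by($.id).take(n)`.
fn latest_unique<'a>(rows: &[(&'a str, Option<u32>)], n: usize) -> Vec<(&'a str, u32)> {
    let mut seen = HashSet::new();
    rows.iter()
        .rev() // newest records first
        .filter_map(|&(id, payload)| payload.map(|version| (id, version))) // skip tombstones
        .filter(|&(id, _)| seen.insert(id)) // first sighting of a key wins
        .take(n) // stop pulling once n unique keys are retained
        .collect()
}
```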

Find One Recent Record

jetrocli --ndjson -i users.topic --payload-after '|' \
  -e '$.rows().reverse().find($.id == "user-42").first()'

This can stop as soon as the newest matching record is found.

Keep Only Active Latest Records

Filter before de-duplication to get the newest active record for each key, even when a newer inactive record exists:

jetrocli --ndjson -i users.topic --payload-after '|' \
  -e '$.rows()
    .reverse()
    .filter($.active)
    .distinct_by($.id)
    .take(500)
    .map({id: $.id, email: $.email})'

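If tombstones carry delete semantics for your workload, use --null-payload keep and handle null explicitly. With the default skip policy, a key whose newest record is a tombstone still returns its last live value from the queries above; that is fine when you only want live JSON payloads, but wrong if a delete must hide the key.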
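Where the filter sits changes the answer. A plain-Rust sketch of both orderings on hypothetical rows (Row and both function names are illustrative, not jetro API):

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Row {
    id: &'static str,
    version: u32,
    active: bool,
}

/// `.reverse().filter($.active).distinct_by($.id)`: newest *active* record per key.
fn filter_then_dedup(rows: &[Row]) -> Vec<Row> {
    let mut seen = HashSet::new();
    rows.iter()
        .rev()
        .filter(|r| r.active)
        .filter(|r| seen.insert(r.id))
        .copied()
        .collect()
}

/// `.reverse().distinct_by($.id).filter($.active)`: newest record per key,
/// kept only if that newest record is itself active.
fn dedup_then_filter(rows: &[Row]) -> Vec<Row> {
    let mut seen = HashSet::new();
    rows.iter()
        .rev()
        .filter(|r| seen.insert(r.id))
        .filter(|r| r.active)
        .copied()
        .collect()
}
```

With key a active at version 1 and inactive at version 2, the first ordering returns a@1, while the second drops a entirely.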

Write Fusion

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

When a query contains multiple chain-writes, jetro fuses them into a single pass over the document. This is the patch-fusion optimizer.

What gets fused

Any sequence of chain-write terminals on the same document:

$.user.name.set("Ada")
   .user.email.set("ada@x.com")
   .user.tags.append("admin")

Or the equivalent block form (preferred for many writes):

patch $ {
  user.name: "Ada",
  user.email: "ada@x.com",
  user.tags: @.append("admin")
}

Without fusion

Naively, three writes mean three traversals from $:

$ → user → name      (write)
$ → user → email     (write)
$ → user → tags      (write)

Each write walks from the root and rebuilds every ancestor on its path, so $ and user are rebuilt three times. For deeply nested documents, that repeated work adds up.

With fusion

The optimizer collects effects, walks the document once, and applies all relevant rewrites at each visited node:

$ → user → {set name, set email, append tags}

Three writes, one walk.
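A plain-Rust sketch of the idea, under simplifying assumptions: a toy Json type with only strings and objects, writes given as (path, value) pairs, and no broadcasts, appends, or conditions. Writes are grouped into a trie by shared prefix and applied in one recursive walk; the visits counter records the nodes touched so the saving is visible. None of these names are jetro's:

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like tree; only what this sketch needs.
#[derive(Clone, Debug, PartialEq)]
enum Json {
    Str(String),
    Obj(BTreeMap<String, Json>),
}

fn s(x: &str) -> Json {
    Json::Str(x.to_string())
}

fn obj(pairs: &[(&str, Json)]) -> Json {
    Json::Obj(pairs.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}

/// Writes grouped by shared path prefix: a trie keyed by field name.
#[derive(Default)]
struct WriteTrie {
    value: Option<Json>,
    children: BTreeMap<String, WriteTrie>,
}

fn build_trie(writes: &[(Vec<&str>, Json)]) -> WriteTrie {
    let mut root = WriteTrie::default();
    for (path, value) in writes {
        let mut node = &mut root;
        for key in path {
            node = node.children.entry(key.to_string()).or_default();
        }
        node.value = Some(value.clone()); // same path written twice: last write wins
    }
    root
}

/// One walk: each ancestor is rebuilt once, however many writes sit below it.
fn apply(doc: &Json, trie: &WriteTrie, visits: &mut usize) -> Json {
    *visits += 1;
    if let Some(v) = &trie.value {
        return v.clone(); // a set replaces this subtree (nested writes ignored here)
    }
    let mut map = match doc {
        Json::Obj(m) => m.clone(),
        _ => BTreeMap::new(), // simplification: writing into a non-object replaces it
    };
    for (key, sub) in &trie.children {
        let child = map.get(key).cloned().unwrap_or_else(|| Json::Obj(BTreeMap::new()));
        let updated = apply(&child, sub, visits);
        map.insert(key.clone(), updated);
    }
    Json::Obj(map)
}
```

For two writes under user, the fused walk touches 4 nodes ($, user, and the two leaves); applying the same writes one at a time touches 3 nodes each, 6 in total.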

Phases

The patch-fusion pass has internal phases (Phase C, Phase E in the source); the user-visible properties are:

  1. Same-base writes group together. Writes under $.user.* batch.
  2. Disjoint paths don't interfere. Writes to $.user.name and $.config.theme execute in one walk but at different nodes.
  3. Conflicts are resolved last-wins. Two writes to the same path: the later one wins.
  4. Conditional writes (when) are evaluated per-write. They short-circuit per clause; the walk doesn't redo work.

Worked example

DOC:
{
  "users": [
    {"id": 1, "name": "Ada", "active": false},
    {"id": 2, "name": "Bob", "active": true}
  ]
}

QUERY:
patch $ {
  users[*].active: true,                        # broadcast write
  users[0].name: "Ada Lovelace",                # specific write
  users[*].last_seen: "2026-05-08" when .active # conditional broadcast
}

What happens:

  • One walk visits every user.
  • For each, three potential writes evaluate. Per element:
    • active: true always applies.
    • name only at index 0.
    • last_seen only where active is true; the guard sees the value after the active: true write, so that is every user.

Output:

{
  "users": [
    {"id": 1, "name": "Ada Lovelace", "active": true, "last_seen": "2026-05-08"},
    {"id": 2, "name": "Bob",          "active": true, "last_seen": "2026-05-08"}
  ]
}

When fusion doesn't fire

  • The chain isn't rooted at $ (parser doesn't classify it as a write).
  • The writes are gated by data-dependent conditions that change document shape mid-pipeline.
  • Mixed read/write: $.users[0].name.set("A").upper() keeps standard method semantics.

Tips

  • Prefer the block form (patch $ { … }) when you have three or more writes: it is easier to read, and the optimizer treats it identically.
  • Use broadcast (xs[*].field: v) instead of a .map that calls .set per element.
  • Conditionals (when) are fine; they don't break fusion.

jq vs jetro Cheatsheet

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}]}

This section is for users coming from jq. Both tools query JSON from a terminal, but their philosophies differ in places, and the notes below call those differences out.

In the CLI, use -e for direct expression execution:

jetrocli -e '$.users.filter($.active).map($.email)' < users.json
jetrocli --ndjson -i events.ndjson -e '$.id'

Big differences at a glance

Topic              jq                            jetro
Calling methods    Pipe-of-filters: . | length   Dot syntax: .len()
Pipe |             Sole composition operator     Value-flow only; passes @ to RHS
Iteration          Implicit on .[]               Explicit on chained methods
Lambdas            None; uses . rebinding        Three forms: @, r =>, lambda r:
Pattern matching   None                          First-class with guards and ranges
Writes             |=, =, del()                  .set(), patch $ {}, chain-writes
Backend            Single interpreter            Six backends, planner-selected
Caching            None                          Plan + path caches in JetroEngine

Jetro favors functional method chains over jq's pipe-of-filters style:

$.users
  .filter($.active)
  .map({id: $.id, email: $.email})
  .take(100)

One-liner translations

Identity / projection

jq:     .
jetro:  $

jq:     .x
jetro:  $.x

jq:     .x.y[0]
jetro:  $.x.y[0]

Iteration

jq:     .users[]
jetro:  $.users[*]                  # explicit; or just $.users for chained methods

jq:     .users[].name
jetro:  $.users.map(@.name)

Field selection / projection

jq:     {id, name}
jetro:  .pick(id, name)            # method form, maps over arrays

jq:     .users | map({id, name})
jetro:  $.users.map(u => u.pick(id, name))
        # or
        $.users.pick(id, name)

jq:     del(.password)
jetro:  $.omit(password)            # or $.password.delete()

Filter

jq:     .users | map(select(.active))
jetro:  $.users.filter(@.active)

jq:     .users[] | select(.age > 18)
jetro:  $.users.filter(@.age > 18)

Aggregates

jq:     length
jetro:  .len()                      # for arrays, objects, strings
        .count()                    # explicit array-count reducer

jq:     [.[] | .price] | add
jetro:  $.map(@.price).sum()

jq:     [.[] | .age] | min
jetro:  $.map(@.age).min()
        # or
        $.min_by(@.age).age           # one-pass, returns whole element

Sort / unique / group

jq:     sort
jetro:  .sort()

jq:     sort_by(.year)
jetro:  .sort(@.year)

jq:     unique
jetro:  .unique()

jq:     group_by(.author)
jetro:  .group_by(@.author)
        # jq returns array-of-arrays; jetro returns object indexed by key

jq:     [group_by(.k)[] | {k: .[0].k, n: length}]
jetro:  .count_by(@.k).entries().map(([k,n]) => {k, n})

Slice and take

jq:     .[0:3]
jetro:  $[0:3]

jq:     .[0]
jetro:  $[0]
        # or
        $.first()                    # demand-aware sink

jq:     .[-1]
jetro:  $[-1]
        # or
        $.last()

Has / index / membership

jq:     has("foo")
jetro:  .has("foo")

jq:     .tags | index("admin")
jetro:  $.tags.index("admin")

jq:     .tags | contains(["admin"])
jetro:  $.tags.includes("admin")

Strings

jq:     ascii_upcase
jetro:  .upper()

jq:     ltrimstr("foo")
jetro:  .strip_prefix("foo")

jq:     split(",")
jetro:  .split(",")

jq:     test("regex")
jetro:  @ ~= "regex"
        # or
        .re_match("regex")

jq:     match("(\\d+)").captures
jetro:  .captures("(\d+)")

Recursive descent

jq:     ..
jetro:  ..                           # same notation

jq:     .. | strings
jetro:  $..find(@ is string)

jq:     .. | objects | select(.id?)
jetro:  $..find(@.id?)
        # or
        $..shape({id})

String formatting

jq:     "Hello, \(.name)!"
jetro:  f"Hello, {$.name}!"

Conditional

jq:     if .x > 5 then "big" else "small" end
jetro:  "big" if $.x > 5 else "small"

jq:     .x // "default"
jetro:  $.x ?? "default"

Variables

jq:     . as $doc | $doc.x + $doc.y
jetro:  let doc = $ in doc.x + doc.y

Reduce / fold

jq:     reduce .[] as $x (0; . + $x)
jetro:  $.sum()                      # for sum specifically
        # or general fold:
        $.accumulate(0, (a, x) => a + x).last()

Object construction

jq:     {users: [.[] | {id, name}]}
jetro:  {users: $.map(u => u.pick(id, name))}

Modification

jq:     .x = 1
jetro:  $.x.set(1)
        # or
        patch $ {x: 1}

jq:     .x |= . + 1
jetro:  $.x.modify(@ + 1)

jq:     del(.x)
jetro:  $.x.delete()

jq:     .users[].active = true
jetro:  $.users[*].active.set(true)
        # or
        patch $ {users[*].active: true}

Multiple writes

jq:     .x = 1 | .y = 2 | del(.z)
jetro:  patch $ {x: 1, y: 2, z: DELETE}

jetro fuses these into one document walk. jq evaluates each pipe stage independently.

NDJSON

jq:     jq -c '.id' events.ndjson
jetro:  jetrocli --ndjson -i events.ndjson -e '$.id'

For whole-file stream operations, use $.rows():

jq:     tac events.ndjson | jq -cn 'first(inputs | select(.level == "error"))'
jetro:  jetrocli --ndjson -i events.ndjson \
          -e '$.rows().reverse().find($.level == "error").first()'

For Kafka compacted-topic dumps:

jetrocli --ndjson -i users.topic --payload-after '|' \
  -e '$.rows().reverse().distinct_by($.id).take(100)'

Complex pipeline translations

Real-world jq queries from the wild. Originals are taken verbatim from the jq manual and the Programming Historian "Reshaping JSON with jq" lesson; all credit to those sources. Each shows the original jq alongside an idiomatic jetro rewrite.

1. Alternative-binding destructure (jq manual)

Flatten a list of resources whose events field may be either a single object or an array of objects, into one row per (resource, event) pair. jq uses its alternative-destructuring operator ?// to try both shapes:

.resources[] as {$id, $kind, events: {$user_id, $ts}} ?// {$id, $kind, events: [{$user_id, $ts}]}
  | {$user_id, $kind, $id, $ts}

jetro has no ?//. Use kind-test + flat_map to normalise:

$.resources.flat_map(r =>
  let evts = (r.events if r.events is array else [r.events]) in
    evts.map(e => {
      user_id: e.user_id,
      kind:    r.kind,
      id:      r.id,
      ts:      e.ts
    })
)

…or with a match to make the two shapes explicit:

$.resources.flat_map(r =>
  match r.events with {
    arr: array -> arr.map(e => {user_id: e.user_id, kind: r.kind, id: r.id, ts: e.ts}),
    {user_id, ts} -> [{user_id, kind: r.kind, id: r.id, ts}],
    _ -> []
  }
)

The match form is more explicit and surfaces the "single object" branch as its own arm — easier to extend (e.g. add a third event-shape later).
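In a typed host language, the "one object or an array" shape is usually an enum, and the normalising step becomes a match. A plain-Rust sketch with hypothetical Resource, Event, and Events types; unlike the jetro match, it has no arm for a missing events field:

```rust
/// Hypothetical typed model of the resources input.
struct Event {
    user_id: u32,
    ts: u64,
}

enum Events {
    One(Event),
    Many(Vec<Event>),
}

struct Resource {
    id: u32,
    kind: &'static str,
    events: Events,
}

/// One output row per (resource, event) pair: (user_id, kind, id, ts).
fn flatten(resources: &[Resource]) -> Vec<(u32, &'static str, u32, u64)> {
    resources
        .iter()
        .flat_map(|r| {
            // Normalise both shapes to a list, like wrapping a single object in [ ].
            let events: Vec<&Event> = match &r.events {
                Events::One(e) => vec![e],
                Events::Many(es) => es.iter().collect(),
            };
            events.into_iter().map(move |e| (e.user_id, r.kind, r.id, e.ts))
        })
        .collect()
}
```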

2. Tweet hashtags as semicolon-joined CSV (Programming Historian)

Take an array of tweets, project id plus a semicolon-joined string of hashtag texts, emit as CSV. Original jq, threaded through five pipe stages:

{id: .id, hashtags: .entities.hashtags}
| {id: .id, hashtags: [.hashtags[].text]}
| {id: .id, hashtags: .hashtags | join(";")}
| [.id, .hashtags]
| @csv

Each pipe stage rebuilds the object — jq has no nested method chaining, so projection accumulates by reassignment.

jetro collapses it to one chain:

$.map(t => {
  id:       t.id,
  hashtags: t.entities.hashtags.map(@.text).join(";")
}).to_csv()

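to_csv emits a header row followed by the data rows. To match jq's headerless output, join the fields yourself (unlike @csv, this join does not quote or escape fields, so it is only safe when no field contains a comma or quote):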

$.map(t => [t.id, t.entities.hashtags.map(@.text).join(";")])
 .map(row => row.map(@.to_string()).join(","))
 .join("\n")
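If fields may contain commas or quotes, each row needs CSV quoting. A plain-Rust sketch of jq's @csv convention (strings double-quoted, embedded quotes doubled, numbers bare); Field and csv_row are hypothetical names, and whether jetro's to_csv quotes exactly this way is not stated here:

```rust
/// A value in one CSV row: numbers are written bare, strings are quoted.
enum Field<'a> {
    Num(i64),
    Str(&'a str),
}

/// Quote every string and double any embedded quote, so commas and quotes
/// inside a field stay inside that field.
fn csv_row(fields: &[Field]) -> String {
    fields
        .iter()
        .map(|f| match f {
            Field::Num(n) => n.to_string(),
            Field::Str(s) => format!("\"{}\"", s.replace('"', "\"\"")),
        })
        .collect::<Vec<_>>()
        .join(",")
}
```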

3. Hashtag frequency CSV (Programming Historian)

Explode each tweet into one row per hashtag, group by hashtag, count, emit (tag, count) as CSV. Original jq:

[.[] | {id: .id, hashtag: .entities.hashtags} | {id: .id, hashtag: .hashtag[].text}]
| group_by(.hashtag)
| .[]
| {tag: .[0].hashtag, count: . | length}
| [.tag, .count]
| @csv

jq's group_by returns an array-of-arrays, so the trailing .[] and .[0].hashtag extract the key from the first element of each group.

jetro uses count_by, which already produces a {tag: count} map:

$.flat_map(t => t.entities.hashtags.map(@.text))
 .count_by(@)
 .entries()
 .map(([tag, count]) => {tag, count})
 .to_csv()

The pipeline reads top-to-bottom: explode → tally → reshape → emit. count_by is one of several jetro idioms (also index_by, unique_by, max_by) that fold a common jq pattern (group_by | map(...)) into a single barrier.

Why these examples are shorter in jetro

Three patterns recur:

  1. Method chaining. jq's ... | {...} | {...} style rebuilds the object at each stage; jetro's .map(t => {...}) builds it once.
  2. Specialised barriers. count_by, index_by, unique_by, max_by, min_by collapse group_by | map(...) chains into one call.
  3. First-class lambdas. jq's . rebinding inside as / [] becomes plain t => t.field in jetro, with no positional gymnastics.

The trade-off: jq's pipe-of-filters is more uniform — every stage is a filter that takes one input and produces zero-or-more outputs. jetro's methods are typed (one-to-one, filter, expander, reducer, barrier), so the pipeline shape is more visible but the surface is bigger.

Things jq has that jetro doesn't

  • @base64, @uri, @csv formatters as suffix. jetro spells these as methods: .to_base64(), .url_encode(), .to_csv().
  • SQL-style operators (INDEX, IN) and the import/include module system. No direct equivalent.
  • input, inputs, and nul-separated streaming. jetro has no equivalent builtins; for NDJSON files, jetrocli's $.rows() provides whole-stream processing.
  • recurse(f; cond). Use walk_pre or rec with a pattern.

Things jetro has that jq doesn't

  • Pattern matching with guards, ranges, kind binding, deep ..match.
  • Demand propagation. .first(), .find(), .take(n) cut off the source; no full materialization.
  • Bitmap structural index. ..find, ..shape, ..like skip non-matching subtrees in O(1) per node.
  • First-class lambdas (r => body, lambda r: body) with let-binding + inlining.
  • Write fusion. Many writes batch into one walk.
  • Backends. Tape-zero-copy, structural index, columnar — selected by the planner.

Pitfalls when porting

  • .[] doesn't exist. Replace with [*] or just chain methods (most jetro methods auto-iterate over arrays).
  • Pipe is not composition. .x | .y in jq means "x, then y". In jetro it means "evaluate .y with @ = .x". To chain methods, use a dot: .x.y().
  • Method calls need parens. length is .len(), not .len.
  • select(p) becomes filter(p), and works on whole arrays — no need to first iterate with .[].
  • group_by returns an object, not an array of arrays. Use .entries() for [key, rows] pairs, closer to jq's shape.

Quick reference card

Need            jq                       jetro
Project         {a, b}                   .pick(a, b)
Drop key        del(.k)                  .omit(k)
Filter          select(p)                .filter(p)
Map             map(f)                   .map(f)
Iterate         .[]                      [*] or implicit
Length          length                   .len()
Sort            sort_by(.k)              .sort(@.k)
Unique          unique                   .unique()
First           .[0]                     .first()
Last            .[-1]                    .last()
Interpolation   "\(.x)"                  f"{$.x}"
Default         // d                     ?? d
If              if c then a else b end   a if c else b
Var             as $x                    let x = ...
Set             .x = v                   .x.set(v)
Update          .x |= f                  .x.modify(f)
Delete          del(.x)                  .x.delete()

NDJSON and Whole-Stream Queries

jetrocli --ndjson reads newline-delimited JSON from a file: one JSON document per physical line, one compact JSON result per output line.

Use -e to run an expression directly and stay out of the interactive TUI:

jetrocli --ndjson -i events.ndjson -e '$.id'
jetrocli --ndjson -i events.ndjson -e '$.user.name.upper()'
jetrocli --ndjson -i events.ndjson -e '$.attributes.first().value'

This row-local mode evaluates the expression independently for each line. It is the fastest path for projections, scalar transforms, small array operations, and filters that do not need to coordinate across rows.

Payload Framing

Many log and Kafka dump formats store metadata before the JSON payload:

customer-42|{"id":42,"name":"Ada","active":true}
customer-17|null

Use --payload-after to query only the JSON payload after a one-byte separator:

jetrocli --ndjson -i topic.ndjson --payload-after '|' -e '$.id'

Literal null payloads are tombstones in many Kafka compacted topics. They are skipped by default:

jetrocli --ndjson -i topic.ndjson \
  --payload-after '|' \
  -e '$.name'

The null policy is configurable:

jetrocli --ndjson -i topic.ndjson \
  --payload-after '|' \
  --null-payload keep \
  -e '$'
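A plain-Rust model of the framing rule with the default null policy, for illustration only (payloads is a hypothetical name). Lines without a separator are skipped here as an assumption; this guide does not say what jetrocli does with them:

```rust
/// Keep the text after the first separator on each line and skip
/// literal `null` payloads (tombstones).
fn payloads(input: &str, sep: char) -> Vec<&str> {
    input
        .lines()
        // Split at the FIRST separator only, so a separator inside the JSON survives.
        .filter_map(|line| line.split_once(sep).map(|(_key, payload)| payload))
        .filter(|payload| *payload != "null")
        .collect()
}
```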

$.rows() Whole-Stream Mode

Use $.rows() when the expression should operate on the whole file as one stream instead of running independently per line:

jetrocli --ndjson -i events.ndjson \
  -e '$.rows().filter($.active).take(10).map({id: $.id, name: $.name})'

The expression is now a stream program:

  1. read rows from the NDJSON source
  2. filter active rows
  3. keep the first ten retained rows
  4. project only those rows

No extra CLI flags are needed for filtering, limiting, mapping, or de-duplication.

Reverse Streams

For file inputs, $.rows().reverse() scans from the end of the file:

jetrocli --ndjson -i app.log \
  -e '$.rows().reverse().find($.level == "error").first()'

This is useful for append-only logs and Kafka compacted-topic dumps where the newest record for a key is physically last.
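A plain-Rust sketch of a tail-first search over NDJSON text that counts how many lines it reads before stopping. newest_match is a hypothetical name, and the substring test in the usage stands in for a real field comparison; jetrocli's actual reverse scan over file bytes is not shown:

```rust
/// Walk lines from the end and stop at the first one satisfying `pred`.
/// Returns the match (if any) and how many lines were inspected.
fn newest_match<'a>(ndjson: &'a str, pred: impl Fn(&str) -> bool) -> (Option<&'a str>, usize) {
    let mut inspected = 0;
    let hit = ndjson.lines().rev().find(|line| {
        inspected += 1;
        pred(line)
    });
    (hit, inspected)
}
```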

Latest Record Per Key

Kafka compacted topics keep the newest value for each key logically, but a dump file can still contain older values earlier in the file. Scan backward and keep the first row seen per key:

jetrocli --ndjson -i users.ndjson --payload-after '|' \
  -e '$.rows()
    .reverse()
    .distinct_by($.id)
    .take(100)
    .map({id: $.id, name: $.name, updated_at: $.updated_at})'

For rows:

{"id":"a","version":1}
{"id":"b","version":1}
{"id":"a","version":2}

the reverse distinct stream sees a@2 first, then b@1, and discards a@1.

Performance Expectations

On the 1 GB benchmark used by jetrocli, simple row-local projections are usually tens of times faster than jaq; the best direct byte paths are near 100x faster. Whole-stream $.rows() queries keep the same mmap and direct byte/tape foundation, but total time depends on how much of the file must be inspected.

Fastest shapes:

jetrocli --ndjson -i big.ndjson -e '$.name'
jetrocli --ndjson -i big.ndjson -e '$.attributes.first().value'
jetrocli --ndjson -i big.ndjson \
  -e '$.rows().reverse().find($.name == "user_355617").first()'

Naturally heavier shapes:

jetrocli --ndjson -i big.ndjson \
  -e '$.rows().filter($.active).distinct_by($.id).map({id: $.id, name: $.name})'

Those must inspect many rows and maintain stream state. They should still avoid unnecessary materialization, but they cannot be as cheap as a direct single-field projection.

Normal JSON Documents

$.rows() is not NDJSON-only. On a normal JSON document, it treats the document itself as one row:

DOC:    {"id":1}
QUERY:  $.rows().map($.id)
OUT:    [1]

Top-level arrays are one document row in normal JSON mode; use normal array methods directly when the input document is an array. In NDJSON mode, $.rows() means the whole input stream.

Performance Guide

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "rows": [{"age": "30", "price": "3.14"}]}

How to write jetro queries that the planner can run fast, and how to read the benchmarks.

Jetro is optimized for cold, file-backed workloads as well as long-lived embedded engines. The fastest paths avoid building full JSON trees: they read raw bytes, simd-json tape, or borrowed views and materialize only the requested result.

Mental model

Jetro picks one of six backends per pipeline node. Fast paths share three properties:

  1. The source is a path of pure field accesses. $.a.b.c triggers tape backends (zero-copy over simd-json output).
  2. The pipeline ends in a sink that bounds demand. .first(), .take(n), .find(p), .count() propagate backward and gate source reads.
  3. No mid-pipeline materialization. .collect(), .sort(), .group_by() flush the tape access pattern back to a Val walk.

If a query follows those three rules, the planner can usually put it on the fast path.

Backend selection (cheat-sheet)

Source / shape                                Primary backend
$.a.b.c (field-chain)                         tape-view (zero-copy)
$..find(...), $..shape({...})                 bitmap structural index
Single $.a.b (path only)                      tape-path
Generic expr / lambda body                    fast-children
NDJSON direct projection                      byte/tape writer
$.rows().filter(...).take(n) over a file      demand-aware row stream, sometimes partitioned
Any backend declines                          interpreted (universal fallback)

You don't pick the backend; the planner does. The table tells you why a given query is fast.

Demand: the killer feature

Every demand-aware sink lets the source skip work. Concrete impact:

Pattern                            Speedup vs. naive
xs.first()                         ~N× (reads 1 element)
xs.find(p)                         up to ~N× (stops at first match)
xs.filter(p).take(k)               up to N/k×
xs.count()                         2-5× (no payload decoded)
xs.sum(), xs.avg()                 2-3× (only numeric leaves)
xs.last() (random-access source)   ~N× (seek to end)
xs.reverse().take(k)               rewritten to LastInput(k)
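Rust iterators are pull-based, which makes them a convenient model of demand: a downstream take(k) stops pulling from the source once k items have passed. A sketch (filter_take is a hypothetical name, not jetro API) that counts source reads:

```rust
/// `(0..n).filter(even).take(k)`, counting how many source elements are pulled.
fn filter_take(n: u64, k: usize) -> (Vec<u64>, usize) {
    let mut pulled = 0;
    let out: Vec<u64> = (0..n)
        .inspect(|_| pulled += 1) // every element read from the source
        .filter(|x| x % 2 == 0)
        .take(k) // once k items pass, nothing more is pulled
        .collect();
    (out, pulled)
}
```

With a million-element source and k = 3, only five elements (0 through 4) are ever read.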

For wide objects, field projection is the other big win:

$.users.map(u => u.pick(id, name))

The source decodes only id and name per row. Other fields stay as raw tape tokens.

NDJSON cold path

In jetrocli --ndjson, a row-local expression runs once per line:

jetrocli --ndjson -i big.ndjson -e '$.name'
jetrocli --ndjson -i big.ndjson -e '$.attributes.first().value'

The best row-local shapes are direct byte/tape plans. They can project fields, evaluate simple scalar calls, and write compact JSON output without converting the whole row to an owned tree.

On the 1 GB jetrocli benchmark, expect:

Shape                                       Typical expectation vs jaq
Root field projection, string scalar calls  Tens of times faster; best cases near 100x
Nested first/last field access              Usually tens of times faster
Small array map/projection                  Strong, but bounded by output bytes
Filtered nested array reductions            Strong when predicates stay direct
Large derived arrays or fallback lambdas    Slower; more allocation and VM work

Use $.rows() when the query needs whole-file stream state:

jetrocli --ndjson -i events.ndjson \
  -e '$.rows().filter($.active).take(100).map({id: $.id, name: $.name})'

For append-only logs and Kafka compacted-topic dumps, reverse streams can stop near the tail:

jetrocli --ndjson -i topic.ndjson --payload-after '|' \
  -e '$.rows().reverse().distinct_by($.id).take(1000)'

The important distinction is how much input must be inspected. take(10) and a tail-first find(...) can stop early. A broad filter, distinct_by, or a fallback expression may need to inspect the full file, although it still avoids unnecessary materialization.
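The tail-first pattern can be sketched in std Rust over an in-memory slice of (id, payload) records standing in for parsed NDJSON rows. That is an assumption: the real engine reads a file backward in chunks. Walking from the end and keeping the first record seen per id yields the latest version of each key, and the limit stops the walk before older rows are touched. The function name is hypothetical.

```rust
use std::collections::HashSet;

/// Latest-version-per-key scan: reverse, distinct by id, take `limit`.
/// Returns the kept records and how many rows were inspected.
fn latest_per_key<'a>(rows: &'a [(u32, &'a str)], limit: usize) -> (Vec<(u32, &'a str)>, usize) {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let mut inspected = 0;
    for &(id, payload) in rows.iter().rev() {
        if out.len() == limit {
            break; // demand satisfied: stop before reading older rows
        }
        inspected += 1;
        if seen.insert(id) {
            out.push((id, payload)); // first sighting from the tail is the newest
        }
    }
    (out, inspected)
}
```

With a limit of 2 over five rows, only the last two rows are inspected; a limit larger than the number of distinct keys forces a full scan, which is the distinct_by caveat above.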

What kills performance

Mid-chain materialization

$.users
  .filter(@.active)
  .collect()
  .map(@.email)

The .collect() forces a full pass before .map. Drop it.

Pre-sort barriers blocking demand

$.events.sort(@.ts).first()

.sort is a barrier: it must see every element, so the trailing .first() cannot limit the work. Rewrite with min_by:

$.events.min_by(@.ts)

One pass, no allocation of the sorted array.
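The same trade-off in plain Rust, as a sketch with a hypothetical Event type: sorting a copy and taking the head costs O(n log n) plus an allocation, while min_by_key is a single O(n) scan. Rust's min_by_key returns the first of several equal minima, so both functions agree on ties here; whether jetro's min_by breaks ties the same way is not stated on this page.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Event {
    ts: u64,
    name: &'static str,
}

/// Barrier version: sorts a full copy, then takes the head.
fn earliest_by_sort(events: &[Event]) -> Option<Event> {
    let mut sorted = events.to_vec();
    sorted.sort_by_key(|e| e.ts); // stable sort: ties keep input order
    sorted.into_iter().next()
}

/// One-pass version: a single scan, no sorted copy.
fn earliest_by_min(events: &[Event]) -> Option<Event> {
    // min_by_key returns the first of several equally minimal elements.
    events.iter().min_by_key(|e| e.ts).cloned()
}
```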

Per-element joins (O(n×m))

$.orders.map(o => o.merge({name: $.users.find(@.id == o.user_id).name}))

Each find rescans $.users. For large data, build a lookup once:

let by_id = $.users.index_by(@.id) in
  $.orders.map(o => o.merge({name: by_id[o.user_id].name}))

Or use equi_join.
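A std-only Rust sketch of the two join strategies, with hypothetical User and Order types standing in for the JSON rows. It assumes user ids are unique; with duplicates, find keeps the first match while the HashMap keeps the last.

```rust
use std::collections::HashMap;

struct User { id: u32, name: &'static str }
struct Order { id: u32, user_id: u32 }

/// O(n × m): rescans `users` for every order, like map(... find ...).
fn join_nested(orders: &[Order], users: &[User]) -> Vec<(u32, Option<&'static str>)> {
    orders
        .iter()
        .map(|o| (o.id, users.iter().find(|u| u.id == o.user_id).map(|u| u.name)))
        .collect()
}

/// O(n + m): build the lookup once, like index_by(@.id), then probe it per order.
fn join_indexed(orders: &[Order], users: &[User]) -> Vec<(u32, Option<&'static str>)> {
    let by_id: HashMap<u32, &'static str> = users.iter().map(|u| (u.id, u.name)).collect();
    orders.iter().map(|o| (o.id, by_id.get(&o.user_id).copied())).collect()
}
```

Both return the same rows; only the indexed version stays linear as both inputs grow.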

Repeated sub-expressions

$.user.profile.name + " <" + $.user.profile.email + ">"

Each $.user.profile reference walks the tape from the root again. Bind once:

let p = $.user.profile in
  f"{p.name} <{p.email}>"

Heavy lambdas in barriers

$.rows.unique_by(@.to_string())

unique_by calls the lambda once per row. If the projection is non-trivial (regex, deep traversal), pre-project once:

$.rows.map(r => r.merge({_k: r.to_string()}))
     .unique_by(@._k)
     .map(@.omit(_k))

Engine tuning

Plan cache

JetroEngine caches compiled pipelines keyed by (query, context). The default capacity is 256 entries; when the cache is full it is cleared wholesale rather than evicting entries one at a time.

For a small, fixed query set with high document volume (the typical web-server shape), every call after the first for a given query is a cache hit. Don't fight it.

For queries that are unique per call (ad-hoc CLI use), the cache never gets a hit; use Jetro directly.
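A minimal sketch of a plan cache with wholesale eviction, assuming the context is an opaque string and using a hypothetical Plan type in place of a compiled pipeline. It is not JetroEngine's implementation; it only shows why a fixed query set keeps hitting while a stream of unique queries keeps recompiling and periodically wipes the cache.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a compiled pipeline.
#[derive(Clone, Debug, PartialEq)]
struct Plan(String);

/// Plan cache keyed by (query, context), cleared entirely when full.
struct PlanCache {
    capacity: usize,
    entries: HashMap<(String, String), Plan>,
    compiles: usize, // how many misses paid the full compile cost
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, entries: HashMap::new(), compiles: 0 }
    }

    fn get_or_compile(&mut self, query: &str, context: &str) -> Plan {
        let key = (query.to_string(), context.to_string());
        if let Some(plan) = self.entries.get(&key) {
            return plan.clone(); // hit: one hash lookup
        }
        if self.entries.len() >= self.capacity {
            self.entries.clear(); // wholesale eviction: drop every entry at once
        }
        self.compiles += 1;
        let plan = Plan(format!("compiled:{}", query)); // stand-in for parse + lower + compile
        self.entries.insert(key, plan.clone());
        plan
    }
}
```

Wholesale eviction keeps the bookkeeping trivial (no LRU list to maintain) at the cost of occasional bursts of recompilation after the cache fills.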

Path cache

The VM caches resolved pointer paths per document. The cache key hashes both structure and primitive values down to depth 8, so two documents with the same shape but different leaf values get distinct entries. You don't need to manage this.
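One way to read "bounded at depth 8" is sketched below: hash each node's tag, length, keys, and leaf values, and ignore anything deeper than the bound. This is an illustration under that assumption, with a hypothetical Value type and function names; it is not jetro's path-cache code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal JSON-like value for the sketch.
enum Value {
    Num(i64),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// Hashes structure and primitive leaves, stopping below `max_depth`.
fn hash_bounded(v: &Value, depth: usize, max_depth: usize, h: &mut DefaultHasher) {
    if depth > max_depth {
        return; // content below the bound does not affect the key
    }
    match v {
        Value::Num(n) => { 1u8.hash(h); n.hash(h); }
        Value::Str(s) => { 2u8.hash(h); s.hash(h); }
        Value::Arr(xs) => {
            3u8.hash(h);
            xs.len().hash(h);
            for x in xs { hash_bounded(x, depth + 1, max_depth, h); }
        }
        Value::Obj(kvs) => {
            4u8.hash(h);
            kvs.len().hash(h);
            for (k, x) in kvs { k.hash(h); hash_bounded(x, depth + 1, max_depth, h); }
        }
    }
}

/// Document key with the depth bound fixed at 8.
fn doc_key(v: &Value) -> u64 {
    let mut h = DefaultHasher::new();
    hash_bounded(v, 0, 8, &mut h);
    h.finish()
}
```

Same-shape documents with different shallow leaves get different keys, while differences nested deeper than the bound do not change the key.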

simd-json (default)

The simd-json feature gives roughly 4× faster cold starts. Disable it only if you need to round-trip serde_json::Value and the conversion cost dominates.

Benchmarks

cargo bench -p jetro-core

The harness covers:

  • Field access ($.a.b.c) — tape-view zero-copy
  • Filter / map / take pipelines — demand propagation
  • Deep search (..find, ..shape) — bitmap structural index
  • Pattern match — Maranget tree
  • Lambda forms — @ vs. => vs. lambda parity
  • Write fusion — single vs. fused multi-writes

To compare your changes against main:

git checkout main
cargo bench -p jetro-core -- --save-baseline main
git checkout your-branch
cargo bench -p jetro-core -- --baseline main

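Reading the output: with --baseline, Criterion reports each benchmark's change against the saved baseline as a percentage estimate with a confidence interval. A regression above 5% should have a clear cause.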

Profiling

For Rust workloads:

cargo bench -p jetro-core --bench <name> -- --profile-time 10

Then attach with samply or cargo flamegraph. Hot paths usually live in:

  • exec/pipeline/exec.rs — pipeline driver
  • exec/view/*.rs — borrowed view stages
  • exec/router.rs — backend selection
  • vm/exec.rs — bytecode VM (interpreted fallback)

If the interpreter (vm::execute) shows up hot, the planner is falling through to the universal fallback. Check the query: the usual causes are a source that is not rooted at $ or a generic expression inside a method argument.

Quick checklist

Before benchmarking a query, ask:

  • Can .first() / .take() / .find() replace a full materialization?
  • Is there a barrier (sort, unique, group_by) before the bound? Push the bound earlier or use a one-pass equivalent (min_by, count_by).
  • Does a lookup repeat per row? Pre-build with index_by.
  • Are wide rows projected early with pick?
  • Are sub-expressions duplicated? Bind with let.
  • Is simd-json enabled (default)?
  • Is the same query run many times? Use JetroEngine.

If every item on the checklist is addressed, the query is on the fast path.

Public API and Engine

The public surface of the jetro crate is small: two handle types, Jetro and JetroEngine, a handful of methods, and the error and NDJSON option types described below. Everything else is implementation detail.

Jetro — single-document handle

For one document, possibly many queries:

use jetro::Jetro;

let bytes = br#"{"x":[1,2,3]}"#;
let j = Jetro::from_bytes(bytes)?;          // lazy parse via simd-json tape
let v: serde_json::Value = j.collect("$.x.sum()")?;
assert_eq!(v, serde_json::json!(6));

Constructors

Method                                Input           Notes
Jetro::from_bytes(&[u8])              Raw JSON bytes  Lazy parse; fastest path
Jetro::from_value(serde_json::Value)  Parsed value    Skips simd-json
Jetro::from_val(Val)                  Internal Val    Advanced: reuses engine state

Methods

Method                       Returns
j.collect(query)             Result<serde_json::Value, EvalError>
j.collect_typed::<T>(query)  Result<T, EvalError> (deserializes directly)

Jetro owns its per-document lazy state: raw bytes, tape/value caches, object vector promotion cache, and an instance VM used for fallback execution. It is cheap to construct for a document and can answer many queries over the same bytes without reparsing.

JetroEngine — long-lived multi-doc handle

For many documents and many queries with overlap, share the plan/VM caches:

use jetro::JetroEngine;

let eng = JetroEngine::default();

for doc_bytes in inputs {
    let v = eng.collect_bytes(doc_bytes, "$.users.filter(@.active).count()")?;
    println!("{}", v);
}

Methods

Method                             Input                     Notes
eng.collect(&doc, q)               &Val                      Document already in Val form
eng.collect_value(serde_value, q)  serde_json::Value         Round-trips
eng.collect_bytes(&[u8], q)        Raw bytes                 Lazy parse
eng.run_ndjson(...)                Reader, query, writer     Row-local NDJSON execution
eng.run_ndjson_file(...)           File path, query, writer  File-backed NDJSON, including $.rows() stream mode
eng.run_ndjson_source(...)         Reader or file source     Dispatches reader/file behavior explicitly

The collect methods return Result<serde_json::Value, JetroEngineError>, a wider error type that can also wrap JSON parse errors.

NDJSON options

NDJSON helpers accept NdjsonOptions variants for file and reader workloads:

Option              Effect
row_frame           Plain JSON lines or delimited payloads such as `key
null_output         Skip or emit expression results that are JSON null
parallelism         Automatic or disabled partition execution for eligible file streams
parallel_min_bytes  Minimum file size before parallel partitions are considered
max_line_len        Per-line safety cap
reverse_chunk_size  Reverse file-reader chunk size

Expression-level $.rows() switches NDJSON from row-local execution to a whole-source stream plan. On files, $.rows().reverse() uses reverse file traversal; reader-backed reverse streams return a clear unsupported-source error.

Configuration

Option               Default  Effect
Plan-cache capacity  256      Cleared wholesale when full

The engine's plan cache amortises parse, lower, and compile work across calls. A hit costs one key hash and lookup; a miss does the full work.

Errors

pub enum EvalError {
    /* … */
}

pub enum JetroEngineError {
    Json(serde_json::Error),
    Eval(EvalError),
}

Error messages include the query position when available.

Feature flags

Feature        Default  What it does
simd-json      on       Direct bytes → Val parse, skipping serde_json::Value
fuzz_internal  off      Re-exports parser + planner for the fuzz harness (not stable)

To disable simd-json:

[dependencies]
jetro = { version = "0.5.11", default-features = false }

Python binding

jetro_py exposes a collect(doc, query) function. Internals are identical to the Rust crate.

import jetro

result = jetro.collect({"x": [1,2,3]}, "$.x.sum()")
# result == 6

CLI

jetrocli -e '$.x.sum()' < input.json
jetrocli --ndjson -i events.ndjson -e '$.rows().take(10)'

The CLI is a thin wrapper around the Rust APIs, with -e selecting non-interactive expression execution.

Threading

  • Jetro is intended as a document handle. Prefer one handle per document owner; use JetroEngine for shared multi-document workloads.
  • JetroEngine is Send + Sync and intended for shared-engine workloads.
  • The engine owns shared plan/VM caches so repeated queries over many documents avoid parse/lower/compile cost.
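Because JetroEngine is Send + Sync, one instance can sit behind an Arc and serve many threads. The sketch below shows that sharing pattern with a hypothetical SharedEngine whose plan cache is a Mutex<HashMap>; it illustrates the std Rust pattern only and makes no claim about how JetroEngine synchronizes its caches internally.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared engine: the plan cache sits behind a Mutex,
/// so the whole struct is Send + Sync and can be shared via Arc.
#[derive(Default)]
struct SharedEngine {
    plans: Mutex<HashMap<String, usize>>, // query -> compiled plan id (stand-in)
}

impl SharedEngine {
    /// Returns the cached plan id for `query`, "compiling" it on first use.
    fn plan_id(&self, query: &str) -> usize {
        let mut plans = self.plans.lock().unwrap();
        let next = plans.len();
        *plans.entry(query.to_string()).or_insert(next)
    }
}

/// Runs the same query list on several threads against one engine and
/// returns how many distinct plans were compiled.
fn run_shared(queries: &[&str], threads: usize) -> usize {
    let engine = Arc::new(SharedEngine::default());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let engine = Arc::clone(&engine);
            let qs: Vec<String> = queries.iter().map(|q| q.to_string()).collect();
            thread::spawn(move || {
                for q in &qs {
                    engine.plan_id(q);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let compiled = engine.plans.lock().unwrap().len();
    compiled
}
```

However many threads run, each distinct query is compiled once and every later call is served from the shared cache.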

Stability

  • The query DSL is stable as of jetro 0.5.x.
  • The Rust API surface (Jetro, JetroEngine, error types) is stable.
  • BuiltinMethod, opcodes, IR types are internal and may change in any minor release.
  • The fuzz_internal feature is explicitly unstable.

Known Limitations and Behavior Notes (0.5.11)

This page documents current boundaries and intentional language choices for jetro 0.5.11. It is not a bug graveyard: fixed audit items have moved back into their normal reference pages.

Current Boundaries

$.rows() is a root stream source

$.rows() starts a source-level stream. In NDJSON mode it means all rows in the file or reader; in normal JSON mode it means the elements of a top-level array, or a single row when the root is an object or scalar.
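The normal-JSON rule can be written as a tiny function. The Json enum and rows helper below are hypothetical, a sketch of the semantics stated above rather than jetro's source code.

```rust
/// Minimal JSON-like root value for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(f64),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Rows of a normal (non-NDJSON) document: the top-level array's
/// elements, otherwise the root itself as a single row.
fn rows(root: &Json) -> Vec<Json> {
    match root {
        Json::Arr(items) => items.clone(),
        other => vec![other.clone()],
    }
}
```

An empty top-level array therefore yields zero rows, while an object or scalar root always yields exactly one.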

Supported:

$.rows().filter($.active).take(10)
$.rows().reverse().distinct_by($.id).take(100)

Not yet supported:

$.books.rows().take(10)

Nested stream sources need a separate design because they mix document-local arrays with source-level IO and reverse traversal.

Reader-backed reverse NDJSON is unsupported

$.rows().reverse() needs a seekable file-backed source. It works with run_ndjson_file, NdjsonSource::file, and jetrocli --ndjson -i file. Reader-backed NDJSON sources return a clear error instead of materializing the whole stream implicitly.

Row-stream operators are deliberately small

Current $.rows() stream mode supports the operators needed for retained-row workloads:

  • reverse()
  • filter(pred)
  • find(pred) / find_first(pred) / find_one(pred)
  • distinct_by(key)
  • take(n) / first()
  • map(expr)

Operators such as sort, group_by, windows, joins, and multi-source streaming are normal array/document operators, but not yet source-level $.rows() stages.

Parallel NDJSON is selective

File-backed row-stream partitioning is automatic only for plans where it is expected to help. For example, selective filter(...).take(n) can benefit from partitioned scanning. Plain map(...).take(n) stays sequential because it can stop after the first n rows without scanning unrelated partitions.
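Why a selective filter(...).take(n) benefits from partitions while a plain map(...).take(n) does not can be sketched with std scoped threads over an in-memory slice. This is an illustration, not jetro's partitioner: each hypothetical partition independently finds up to n matches, and concatenating results in partition order keeps the output identical to a sequential scan. Unlike a real engine, this sketch never cancels later partitions once earlier ones have enough matches.

```rust
use std::thread;

/// Partitioned filter(pred).take(n): partitions scan in parallel, then
/// results are concatenated in partition order and truncated to n.
fn partitioned_filter_take(rows: &[i64], parts: usize, n: usize, pred: fn(i64) -> bool) -> Vec<i64> {
    let parts = parts.max(1);
    let chunk = ((rows.len() + parts - 1) / parts).max(1);
    let per_part: Vec<Vec<i64>> = thread::scope(|s| {
        let handles: Vec<_> = rows
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().copied().filter(|&x| pred(x)).take(n).collect::<Vec<i64>>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    per_part.into_iter().flatten().take(n).collect()
}

/// Sequential reference: stops as soon as n matches are found.
fn sequential_filter_take(rows: &[i64], n: usize, pred: fn(i64) -> bool) -> Vec<i64> {
    rows.iter().copied().filter(|&x| pred(x)).take(n).collect()
}
```

When matches are rare, the sequential scan must read deep into the input anyway, so parallel partitions pay off; a map(...).take(n) needs only the first n rows, so extra partitions would be wasted work.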

Public observability is still minimal

The engine records internal rows-stream stats for tests and future explain output, but 0.5.11 does not expose a stable public explain() API yet.

Intentional Language Choices

No in operator

The keyword in would conflict with let x = y in z and for x in xs. Use has, includes, or has_key instead:

$.tags.includes("urgent")
$.user.has_key("email")
$.users has {id: 1}

has, has_key, includes, and has_path differ

Form                 Meaning
obj.has_key("k")     Object key exists
obj.has("k")         Key/index style existence helper
xs.includes(v)       Value membership
doc.has_path("a.b")  Path exists in a nested structure
x has y              Membership/containment operator sugar

Use has_key when you specifically want an object-key check.

replace is single-occurrence

.replace(needle, with) replaces only the first match. Use replace_all for every occurrence:

"hello hello".replace("hello", "hi")      # "hi hello"
"hello hello".replace_all("hello", "hi")  # "hi hi"

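Rust's standard library draws the same line between first-occurrence and all-occurrence replacement, which makes a handy cross-check: str::replacen with a count of 1 behaves like replace in the example above, and str::replace behaves like replace_all. The wrapper names are illustrative.

```rust
/// First occurrence only, like jetro's replace.
fn replace_first(s: &str, needle: &str, with: &str) -> String {
    s.replacen(needle, with, 1)
}

/// Every occurrence, like jetro's replace_all.
fn replace_every(s: &str, needle: &str, with: &str) -> String {
    s.replace(needle, with)
}
```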
Comments are outside the query language

Jetro expressions do not contain comments. Keep query comments in the host language, shell script, or documentation.

Safety Limits

rec(fn) has an iteration cap

rec(fn) applies fn repeatedly until the value reaches a deep structural fixpoint, where one more application no longer changes it. If fn never converges, jetro stops at the iteration cap and reports an error. Prefer rec(fn, cond) when the loop has an explicit stopping condition.

$.state.rec(step, done)
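The control flow can be sketched in Rust as capped fixpoint iteration. MAX_ITERS is a hypothetical value, since the real cap is not given here. Two further details are also assumptions: that cond is checked before each step, and that the cap applies to both forms.

```rust
/// Hypothetical cap; the real limit is not documented on this page.
const MAX_ITERS: usize = 10_000;

/// Capped fixpoint iteration mirroring rec(fn) and rec(fn, cond).
fn rec<T: PartialEq>(
    mut state: T,
    step: impl Fn(&T) -> T,
    done: Option<&dyn Fn(&T) -> bool>,
) -> Result<T, String> {
    for _ in 0..MAX_ITERS {
        if let Some(stop) = done {
            if stop(&state) {
                return Ok(state); // explicit stopping condition met
            }
        }
        let next = step(&state);
        if done.is_none() && next == state {
            return Ok(state); // structural fixpoint reached
        }
        state = next;
    }
    Err(format!("rec: no fixpoint after {} iterations", MAX_ITERS))
}
```

Halving an integer converges to 0 on its own, while a counter that only ever increments needs an explicit condition or it hits the cap.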

NDJSON line size is bounded

NDJSON readers enforce a per-line byte cap to avoid unbounded memory use on malformed input. When processing legitimately huge rows, raise it with the max_line_len setting in NdjsonOptions or the corresponding CLI flag.
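A per-line cap can be sketched with std's BufRead by limiting each read to max_len + 1 bytes, so an over-long line is detected without buffering the rest of it. The function name and error message are hypothetical, and a production reader would also need to skip the remainder of the rejected line before continuing.

```rust
use std::io::{self, BufRead, Read};

/// Reads one line into `buf`, refusing lines longer than `max_len` bytes
/// (excluding the newline). Returns the number of bytes consumed.
fn read_bounded_line<R: BufRead>(reader: &mut R, max_len: usize, buf: &mut Vec<u8>) -> io::Result<usize> {
    buf.clear();
    // Read at most max_len + 1 bytes so an over-long line is detectable.
    let n = reader.by_ref().take(max_len as u64 + 1).read_until(b'\n', buf)?;
    let content = if buf.ends_with(b"\n") { n - 1 } else { n };
    if content > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "NDJSON line exceeds max_line_len"));
    }
    Ok(n)
}
```

Memory per line stays bounded by max_len + 1 bytes regardless of how long a malformed line actually is.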

Version Note

This page reflects jetro 0.5.11. If a page elsewhere still carries an older audit note, prefer this page and the current builtin reference.

Glossary

Backend. One of the execution paths the planner can route a node through: Structural, TapeView, TapeRows, TapePath, ValView, MaterializedSource, FastChildren, Interpreted. Selected automatically based on shape and capabilities.

Barrier. A stage that must see all input before emitting output. sort, unique, group_by, window, etc.

Bitmap structural index. A bit-packed index over the simd-json tape that lets ..find, ..shape, ..like, and ..match skip non-matching subtrees in O(1) per node. Used when the document is loaded with the simd-json tape (default).

Borrowed view. A ValueView — a read-only borrowed reference into a parsed document. Zero-copy substrings via Val::StrSlice.

Builtin. One of the 181 methods in jetro's catalog. Each is one impl Builtin for X block in defs.rs with identity, demand law, and runtime layers co-located.

Chain-write. A query ending in a write terminal (.set, .modify, .delete, .unset, .merge, .deep_merge, .append, .prepend) on a rooted path. Rewritten to Expr::Patch by the parser.

Composed stage. A Composed<A, B> pair that fuses two adjacent stages into one virtual call per element.

Demand. The triple (pull, value, order) describing what an operator needs from its source. See Demand Propagation.

Demand law. The rule by which a builtin transforms downstream demand into upstream demand. Encoded in the builtin's BuiltinDemandLaw.

Effect lifting. The patch-fusion pass that batches multiple chain-writes into a single document walk.

Engine. A JetroEngine — a long-lived handle that caches parsed and compiled queries for reuse across documents.

F-string. f"text {expr}" — string with embedded expression interpolation.

Field chain. A path of pure field accesses, e.g. $.a.b.c. Recognised by the planner and routed to fast tape backends.

Jetro. Single-document handle. Jetro::from_bytes(bytes)?.collect(q).

JetroEngine. Multi-document handle with plan/VM caches.

Lambda. A small function value: @, r => body, lambda r: body. All three forms compile identically.

Maranget tree. The decision-tree compilation strategy used for pattern matching. It shares common discriminant tests across match arms.

Patch. The internal write operation. Generated by both patch $ { … } blocks and chain-write classification.

Patch fusion. The optimizer pass that batches multiple writes into a single walk.

Pipeline. The streaming execution model: Source → Stage* → Sink. One element at a time.

Plan / Logical Plan. Tree-shaped IR between AST and bytecode. Lives in ir/logical.rs.

Plan cache. A cache in JetroEngine that maps (query, context) to a compiled Pipeline. Default capacity 256.

Pull demand. The first lane of Demand: how many inputs must be read. Variants: All, FirstInput(n), LastInput(n), NthInput(i), UntilOutput(n).

Quantifier. A postfix operator on a path step. ? = optional, ! = exactly-one.

Sink. The terminal stage of a pipeline. Reducers, positional, and implicit collectors.

Source. The first stage of a pipeline. Usually a path or array literal.

Streaming. Per-element execution; no buffering.

Tape. The simd-json output: a flat array of tokens describing structural positions in the JSON byte buffer. Used for zero-copy access.

Val. The internal value type. Arc-wrapped compound nodes ensure cheap clones.

Value need. The second lane of Demand: how much of each row's content is required. Variants: None, Predicate, Projection, Numeric, Whole.

View. A ValueView — borrowed read-only access to a value.

VM. The bytecode executor. Used as the universal fallback backend; also provides the path-cache.

Write fusion. Same as patch fusion. See above.
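To make the Pull demand and Demand law entries concrete, here is a hypothetical model of the pull lane in Rust. The Pull variants follow the glossary; the Stage enum and the rules in upstream are a simplified guess at how such laws compose, not jetro's BuiltinDemandLaw. Folding from the sink back to the source reproduces two rows of the Demand table: filter(p).take(k) reads until k outputs, and reverse().take(k) becomes LastInput(k).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Pull {
    All,
    FirstInput(usize),
    LastInput(usize),
    NthInput(usize),
    UntilOutput(usize),
}

#[derive(Debug, Clone, Copy)]
enum Stage {
    Map,
    Filter,
    Take(usize),
    Reverse,
}

/// Demand law: given what downstream needs, what must this stage pull?
fn upstream(stage: Stage, downstream: Pull) -> Pull {
    use Pull::*;
    match (stage, downstream) {
        // map is 1:1, so positional demand passes through unchanged
        (Stage::Map, d) => d,
        // filter may drop rows: "k outputs" becomes "read until k outputs"
        (Stage::Filter, FirstInput(k) | UntilOutput(k)) => UntilOutput(k),
        (Stage::Filter, _) => All,
        // take(n) never needs more than its first n inputs
        (Stage::Take(n), FirstInput(k) | UntilOutput(k)) => FirstInput(n.min(k)),
        (Stage::Take(n), _) => FirstInput(n),
        // reverse turns head demand into tail demand and vice versa
        (Stage::Reverse, FirstInput(k) | UntilOutput(k)) => LastInput(k),
        (Stage::Reverse, LastInput(k)) => FirstInput(k),
        (Stage::Reverse, _) => All,
    }
}

/// Propagates sink demand back through the stages to the source.
fn source_demand(stages: &[Stage], sink: Pull) -> Pull {
    stages.iter().rev().fold(sink, |d, &s| upstream(s, d))
}
```

Where a rule cannot prove a bound (for example filter under LastInput), it falls back to All, which is the conservative choice a real demand law also has to make.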