Introduction

Jetro is a query, transform, and patch engine for JSON, written in Rust. It parses a small dot-syntax DSL, plans the query through a multi-tier optimizer, and routes each subtree to whichever execution backend will run it fastest — zero-copy borrowed views over a simd-json tape, a bitmap structural index, a streaming pull pipeline, or the universal interpreted fallback.

If you have used jq, jetro will feel familiar but takes a different shape:

  • Dot syntax, not pipe-of-filters. $.users.filter(active).map(name) reads left-to-right and chains methods. The | operator exists, but it is for passing a value into an arbitrary expression — not for calling methods with arguments.
  • One source of truth per builtin. Every method is one impl Builtin for X block: identity, demand law, optimizer hints, and runtime layers all co-located. There are 181 of them.
  • Demand-driven planning. .first() doesn't materialise the whole array. .filter(p).take(3) doesn't filter the whole array. The planner walks backward from the sink, telling each operator what its source actually needs to produce.
  • Writes are first-class. $.users[0].name.set("Ada") rewrites to a fused patch over the document. Multiple chain-writes batch through a single fused pass.
  • Pattern match with guards. match x with { {kind: "err"} -> .msg, _ -> "" } compiles to a Maranget decision tree and runs over Val, borrowed View, and tape domains; deep ..match is bitmap-accelerated.
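Demand-driven planning is easiest to picture by analogy with a pull-based iterator. The Rust sketch below uses only the standard library, not jetro. Because take(n) stops pulling once it holds n items, the filter predicate never sees the tail of the input. The function name first_n_even is hypothetical.

```rust
/// Returns the first `n` even numbers from `xs`, plus how many elements the
/// predicate actually inspected. Rust iterators are pull-based, so `take(n)`
/// stops requesting items from `filter` once it has `n` of them.
fn first_n_even(xs: &[i64], n: usize) -> (Vec<i64>, usize) {
    let mut inspected = 0;
    let out: Vec<i64> = xs
        .iter()
        .copied()
        .filter(|x| {
            inspected += 1; // count every element the predicate sees
            x % 2 == 0
        })
        .take(n)
        .collect();
    (out, inspected)
}
```

With the numbers 1 through 100 and n = 3, the predicate inspects only six elements (1 through 6) before the pipeline stops.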

What this book covers

Part                What you get
Language Reference  Every grammar form with at least one runnable example.
Concepts            Pipelines, demand propagation, the cache hierarchy.
Builtin Reference   One section per builtin: input, output, behavior, examples, demand law, common pitfalls.
Recipes             Real chained queries, pattern-match cookbook, write-fusion.
Appendix            The public Rust API (Jetro, JetroEngine), and a glossary.

What this book doesn't cover

Implementation internals (the IR layer, the bytecode VM, plan caching, peephole passes) are documented in the source. This book stops at the user-facing surface, with one exception: the demand-propagation chapter, because demand is what makes "obvious" queries fast, and not understanding it leads to surprising benchmark results.

Conventions

Examples use this layout:

DOC:    {"books": [{"title": "Dune", "year": 1965}, {"title": "Foundation", "year": 1951}]}
QUERY:  $.books.filter(@.year < 1960).map(@.title)
OUT:    ["Foundation"]

Where the document matters, you'll see DOC:. Where it's obvious from the query, only QUERY: and OUT: appear. Method aliases are listed inline: unique (alias distinct).

Ready? Start with the Quick Tour, or jump to the Builtin Reference if you already know jetro and need a specific method.

A few v0.5 sharp edges are worth noting up front. This book documents jetro's stable semantics; the behaviours listed below are intentional design choices for v0.5. See Known Limitations for the canonical fix-list.

  • replace(needle, with) replaces only the first occurrence (JavaScript-style); use replace_all for substitute-every behaviour.
  • There is no in operator ("x" in xs is a parse error) because in doubles as the binder in let and for; use xs has "x" or xs.includes("x") instead.
  • Regex specials use single backslash inside string literals ("\d" works); double-backslash also parses but matches the same class.
  • rec(fn) caps at 10 000 iterations when the step never reaches a structural fixpoint; pass rec(fn, cond) to bound the loop.

Installation

Jetro ships as three artifacts:

Artifact        What it is                                     Audience
jetro (crate)   Rust library: query/transform JSON in-process  Rust developers
jetro-py        Python bindings (PyPI)                         Python users
jetrocli        Standalone CLI jetrocli for shell use          Anyone with JSON in a terminal

Rust library

Add to Cargo.toml:

[dependencies]
jetro = "0.5"

The simd-json feature is on by default and gives a ~4× cold-start win by parsing bytes directly into Val (no serde_json::Value intermediate). To fall back to the legacy serde-only path:

[dependencies]
jetro = { version = "0.5", default-features = false }

Quick sanity check (the example also uses the anyhow and serde_json crates, so add them to [dependencies] as well):

use jetro::Jetro;

fn main() -> anyhow::Result<()> {
    let bytes = br#"{"books":[{"title":"Dune","year":1965}]}"#;
    let j = Jetro::from_bytes(bytes)?;
    let titles: serde_json::Value = j.collect("$.books.map(@.title)")?;
    println!("{}", titles);  // ["Dune"]
    Ok(())
}

Long-lived engine

If you process many documents with overlapping queries, keep a JetroEngine around. It holds shared plan and VM caches:

use jetro::JetroEngine;

let eng = JetroEngine::default();
for doc in docs {
    let v = eng.collect(&doc, "$.users.filter(active).count()")?;
    println!("{}", v);
}

Plan-cache default capacity is 256 entries; it evicts wholesale when full.
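A minimal sketch of what "evicts wholesale" means, assuming a plain hash map keyed by query text. WholesaleCache and get_or_compile are hypothetical names used only for illustration; jetro's real cache types are not part of this book's surface.

```rust
use std::collections::HashMap;

/// Hypothetical plan cache mapping query text to a compiled plan of type `P`.
/// When a miss arrives and the cache is full, the whole map is cleared
/// instead of evicting one least-recently-used entry.
struct WholesaleCache<P> {
    capacity: usize,
    entries: HashMap<String, P>,
}

impl<P: Clone> WholesaleCache<P> {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new() }
    }

    /// Returns the cached plan for `query`, compiling and storing it on a miss.
    fn get_or_compile(&mut self, query: &str, compile: impl FnOnce(&str) -> P) -> P {
        if let Some(plan) = self.entries.get(query) {
            return plan.clone(); // cache hit
        }
        if self.entries.len() >= self.capacity {
            self.entries.clear(); // wholesale eviction
        }
        let plan = compile(query);
        self.entries.insert(query.to_string(), plan.clone());
        plan
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Clearing everything avoids per-entry recency bookkeeping; the likely trade-off is a burst of recompilation right after the cache fills.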

Python bindings

pip install jetro-py

import jetro

doc = {"books": [{"title": "Dune", "year": 1965}]}
print(jetro.collect(doc, "$.books.map(@.title)"))   # ['Dune']

The Python wheel embeds the same Rust core, so query syntax is identical.

CLI (jetrocli)

Install via Homebrew:

brew install mitghi/jetrocli/jetrocli

Or build from source:

git clone https://github.com/mitghi/jetrocli
cd jetrocli && cargo install --path .

Use it like jq:

echo '{"x":[1,2,3]}' | jetrocli '$.x.sum()'
# 6

cat data.json | jetrocli '$.users.filter(@.active).map(@.email)'

Building from source

git clone https://github.com/mitghi/jetro
cd jetro
cargo build --release         # build everything
cargo test                    # full suite
cargo bench -p jetro-core     # micro-benchmarks

Workspace layout:

jetro/             facade crate (re-exports + public API)
jetro-core/        engine: parser, planner, executor, builtins, runtime
jetro-core/fuzz/   cargo-fuzz harness (feature-gated)

Verifying your install

Run the tour from the next chapter against your install. If every query produces the printed output, you're ready.

A 5-Minute Tour

This page is a working tour of jetro. Every example has a document, a query, and an output. Run them in your shell with jetrocli, in Rust with Jetro::collect, or in Python with jetro.collect.

The document for this tour

{
  "books": [
    {"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"]},
    {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"]},
    {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"]},
    {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"]}
  ],
  "active": true
}

1. Path navigation

QUERY:  $.books[0].title
OUT:    "Dune"

$ is the root, .books is field access, and [0] is an index. Negative indices count from the end: $.books[-1].title is "Snow Crash".

2. The whole array

QUERY:  $.books[*].title
OUT:    ["Dune","Foundation","Hyperion","Snow Crash"]

[*] produces every element.

3. Filter

QUERY:  $.books.filter(@.year > 1980).map(@.title)
OUT:    ["Hyperion","Snow Crash"]

Inside .filter, .map, and similar method args, the current item is @. Use @.field to walk into it; the leading-dot shorthand .field is also accepted and desugars to @.field.

4. Four lambda forms

These are all equivalent:

$.books.filter(@.year > 1980)
$.books.filter(.year > 1980)
$.books.filter(b => b.year > 1980)
$.books.filter(lambda b: b.year > 1980)

Pick whichever reads best. The named-lambda and @-forms compile to identical bytecode, and benchmarks show no performance difference between them.

5. Reducers

QUERY:  $.books.count()
OUT:    4

QUERY:  $.books.map(@.year).min()
OUT:    1951

QUERY:  $.books.map(@.year).avg()
OUT:    1974.25

Reducers terminate the streaming pipeline.
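The reducer outputs can be checked by hand. This Rust sketch computes count, min, and avg over the four publication years in the tour document (1965, 1951, 1989, 1992). It is an arithmetic check, not jetro's implementation; returning None for an empty input is a choice made for the sketch, not documented jetro behaviour.

```rust
/// Number of elements, like `.count()`.
fn count(xs: &[i64]) -> usize {
    xs.len()
}

/// Smallest element, like `.min()`; None for an empty slice.
fn min(xs: &[i64]) -> Option<i64> {
    xs.iter().copied().min()
}

/// Arithmetic mean, like `.avg()`; None for an empty slice.
fn avg(xs: &[i64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<i64>() as f64 / xs.len() as f64)
}
```

The years sum to 7897, and 7897 / 4 = 1974.25.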

6. Group / count / sort

QUERY:  $.books.count_by(@.author)
OUT:    {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}

QUERY:  $.books.sort(@.year).map(@.title)
OUT:    ["Foundation","Dune","Hyperion","Snow Crash"]

7. Object projection

QUERY:  $.books[0].pick(title, author)
OUT:    {"title":"Dune","author":"Herbert"}

QUERY:  $.books.map(b => b.pick(title, year))
OUT:    [{"title":"Dune","year":1965}, ...]

.pick(name, alias: src) also renames: .pick(t: title, y: year).

8. Deep search

QUERY:  $..find(@.year < 1960)
OUT:    [{"title":"Foundation","year":1951,...}]

QUERY:  $..like({author: "Asimov"})
OUT:    [{"title":"Foundation","year":1951,...}]

..find, ..shape, and ..like are DFS pre-order over the whole document. Equivalent named forms: .deep_find, .deep_shape, .deep_like.

9. Pipe and ternary

QUERY:  $.books.count() | "found " + (@ as string) + " books"
OUT:    "found 4 books"

QUERY:  $.books[0] | "old" if @.year < 1980 else "modern"
OUT:    "old"

| passes a value into an expression; it is not method-call sugar. Use .method() to call methods.

10. F-strings

QUERY:  $.books.map(b => f"{b.title} ({b.year})")
OUT:    ["Dune (1965)","Foundation (1951)","Hyperion (1989)","Snow Crash (1992)"]

11. Pattern match

QUERY:
  match $.books[0] with {
    {year: y} when y < 1970 -> f"classic {y}",
    {year: y} -> f"modern {y}",
    _ -> "unknown"
  }
OUT:    "classic 1965"

Patterns include literals, ranges (1900..2000), or-patterns, guards, object shape, array shape, and rest captures.

12. Writes

QUERY:  $.books[0].year.set(1900)
OUT:    full document with books[0].year now 1900

QUERY:  $.books[*].tags.append("read")
OUT:    full document with "read" added to every book's tags

QUERY:  $.books[0].unset(tags)
OUT:    full document with books[0].tags removed

Multiple writes in one query batch through a single fused pass.

13. Engine entrypoint (Rust)

use jetro::JetroEngine;
use serde_json::json;

let eng = JetroEngine::default();
let doc = json!({"x":[1,2,3,4,5]});
let v = eng.collect_value(doc, "$.x.filter(@ > 2).sum()")?;
assert_eq!(v, json!(12));

That's the tour. Next: the Grammar Overview, or skip straight to the Builtin Index.

Grammar Overview

The jetro DSL is a small, expression-oriented language. There are no statements at the top level — every program is an expression that produces a value (or, in the case of patches, a rewritten document).

The grammar lives in grammar.pest and is parsed by pest.

Five things that make jetro different

  1. Method calls use dot syntax. xs.map(f), not xs | map(f).
  2. Pipe | is value-flow. x | expr evaluates expr with @ bound to x.
  3. @ is the current value. Inside .filter(...) it's the row; at the top level it's the input.
  4. Bare paths inside method args. .filter(.age > 18) is sugar for .filter(@.age > 18).
  5. Writes are queries. $.x.set(v) is parsed as a query that produces a patched document, not a mutation.

Categories of syntax

Category      Chapter       Forms
Paths         Paths         $, @, .field, [idx], [*], [start:end:step], ..desc, {pred}
Operators     Operators     arithmetic, comparison, logical, pipe, coalesce, ternary, kind, cast
Methods       Lambdas       .name(args), lambdas (@, =>, lambda)
Literals      Literals      numbers, strings, f-strings, arrays, objects, regex
Control flow  Control Flow  match, ternary, try, comprehensions
Writes        Patch         patch $ {…}, chain-write terminals

A handy precedence table sits at the end of this part.

A worked sample

$.users
  .filter(u => u.active and u.age >= 18)
  .map(u => { id: u.id, name: u.name, email: u.email })
  .sort(@.name)
  .take(10)

That's: root, field users, predicate filter (named lambda), object-mapping, sort by name, take first 10.

Comments

There are no comments inside a query. Strip them client-side before calling jetro, or factor commentary into the surrounding host program.

Whitespace

Whitespace and newlines are insignificant between tokens. Keep queries on one line in CLIs; break across multiple lines in source.

Paths and Navigation

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5]}

A path is the part of a query that walks into the document. Paths start at a root marker ($, @, or an identifier inside a lambda) and chain steps left-to-right.

Roots

Form   Meaning
$      The whole input document (top-level root)
@      The current value (set by .filter, .map, |, etc.)
name   A let-bound name or lambda parameter

DOC:    {"x": 10}
QUERY:  $
OUT:    {"x":10}

QUERY:  $.x | @ + 1
OUT:    11

Field access

DOC:    {"user": {"name": "Ada"}}
QUERY:  $.user.name
OUT:    "Ada"

Field names may also use string keys via ["name"]:

QUERY:  $["user"]["name"]

Use the bracket form when the key contains characters disallowed in identifiers (-, spaces, dots inside the key, leading digits).

Indexing arrays

DOC:    {"xs": [10, 20, 30, 40]}
QUERY:  $.xs[0]
OUT:    10

QUERY:  $.xs[-1]
OUT:    40

Negative indices count from the end.

Slicing

QUERY:  $.xs[1:3]
OUT:    [20,30]

QUERY:  $.xs[:2]
OUT:    [10,20]

QUERY:  $.xs[2:]
OUT:    [30,40]

QUERY:  $.xs[0:4:2]
OUT:    [10,30]
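The slice rules above (half-open bounds, negative indices counted from the end, optional step) can be sketched in Rust as follows. The sketch assumes Python-style clamping of out-of-range bounds, which "Half-open like Python" suggests but this page does not spell out, and it handles only positive steps. The helper name slice is hypothetical.

```rust
/// Resolves a Python-style slice `[start:end:step]` over `xs`.
/// Missing bounds are `None`; negative bounds count from the end.
/// Only positive steps are supported in this sketch.
fn slice<T: Clone>(xs: &[T], start: Option<i64>, end: Option<i64>, step: Option<i64>) -> Vec<T> {
    let len = xs.len() as i64;
    // Normalise a bound: negative counts from the end, then clamp to [0, len].
    let norm = |b: i64| if b < 0 { (b + len).max(0) } else { b.min(len) };
    let s = start.map(norm).unwrap_or(0);
    let e = end.map(norm).unwrap_or(len);
    let step = step.unwrap_or(1);
    assert!(step > 0, "this sketch only supports positive steps");
    // Half-open range: s is included, e is excluded.
    (s..e).step_by(step as usize).map(|i| xs[i as usize].clone()).collect()
}
```

Called on [10, 20, 30, 40], it reproduces the four outputs above, and slice(&xs, Some(-2), None, None) gives [30, 40].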

Wildcards

QUERY:  $.xs[*]
OUT:    [10,20,30,40]

[*] is "every element". Most users prefer chained methods (.filter, .map) which already iterate.

Filtered wildcard [* if pred]

A predicated wildcard — keeps only elements satisfying pred (with @ bound to the candidate).

DOC:    {"books": [{"title": "Dune", "year": 1965}, {"title": "Hyperion", "year": 1989}]}
QUERY:  $.books[* if year > 1980]
OUT:    [{"title":"Hyperion","year":1989}]

Equivalent to [*] immediately followed by an inline-filter {cond}, but stays on the path side of parsing. Particularly useful inside .update selectors and quoted patch path keys (see Patch).

Chaining a bare field step after a filtered wildcard collapses to null — chain a method instead:

QUERY:  $.books[* if year > 1980].map(@.title)
OUT:    ["Hyperion"]

Inline filter

{predicate} after a path step keeps only matching elements:

DOC:    {"books": [{"year": 1965}, {"year": 1989}]}
QUERY:  $.books{@.year > 1970}
OUT:    [{"year":1989}]

This is shorthand for .filter(@.year > 1970). Use .filter when you want named-lambda forms.

Descendants

.. walks every descendant value in DFS pre-order:

DOC:    {"a": {"b": {"x": 1}}, "c": [{"x": 2}, {"x": 3}]}
QUERY:  $..x
OUT:    [1,2,3]

Combine with method calls (no space):

QUERY:  $..find(@.year < 1960)
QUERY:  $..shape({year, title})
QUERY:  $..like({author: "Asimov"})

The deep variants are bitmap-accelerated when a structural index is available.
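A sketch of DFS pre-order descendant lookup in Rust, over a hypothetical minimal Json enum whose objects keep their keys in insertion order. It is not jetro's bitmap-accelerated implementation; it only shows the visit order that yields [1,2,3] for $..x on the document above.

```rust
/// Hypothetical minimal JSON model; object fields keep insertion order.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(Vec<(String, Json)>),
}

/// Collects every value stored under `key` anywhere in `v`, in DFS pre-order:
/// a matching value is recorded before its own children are searched, and
/// siblings are visited left to right.
fn descendants<'a>(v: &'a Json, key: &str, out: &mut Vec<&'a Json>) {
    match v {
        Json::Obj(fields) => {
            for (k, child) in fields {
                if k == key {
                    out.push(child);
                }
                descendants(child, key, out);
            }
        }
        Json::Arr(items) => {
            for child in items {
                descendants(child, key, out);
            }
        }
        _ => {} // scalars have no descendants
    }
}
```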

Dynamic keys

Compute a key at runtime:

DOC:    {"realnames": {"abc": "Ada"}, "post": {"author": "abc"}}
QUERY:  $.realnames[$.post.author]
OUT:    "Ada"

Inside a lambda:

DOC:    {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY:  $.posts.map(p => $.realnames[p.author])
OUT:    ["Ada"]

Quantifiers (postfix)

Form    Meaning
step?   Optional: return null instead of an error if missing
step!   Exactly-one: error if zero or many

DOC:    {"xs": [42]}
QUERY:  $.xs!
OUT:    [42]

QUERY:  $.maybe?
OUT:    null      # absent, no error

Path after a method

Paths and methods are interchangeable steps:

$.users.filter(@.active).pick(name, email)[0]

That's: field, method, method, index. There is no special "tail position".

Paths inside method args need a root

Inside method-call arguments, paths must start with @ (current item), $ (document root), or a bound name. Bare-path forms like .field do not parse:

$.users.filter(@.age > 18)        # ✓ @-form
$.users.filter(u => u.age > 18)   # ✓ named lambda
$.users.filter(.age > 18)         # ✗ parse error
$.users.map(@.name)               # ✓
$.users.map(.name)                # ✗

The same rule applies to inline filters: $.xs{@.k > 1} works, $.xs{.k > 1} does not.

Top-level paths still need $.

Summary

Step               Example       Notes
Root               $, @          One per chain (or implicit @ in args)
Field              .name         Use ["..."] for tricky keys
Index              [3], [-1]     Negative counts from end
Slice              [1:5], [::2]  Half-open like Python
Wildcard           [*]           Whole array
Filtered wildcard  [* if pred]   Wildcard restricted by predicate (@ = element)
Descendant         ..name, ..    DFS pre-order
Inline filter      {cond}        Sugar for .filter
Dynamic key        [expr]        Expression resolves to key
Quantifier         ?, !          Postfix on a step

Operators

Jetro has the operators you'd expect plus a small number of extras that come up in JSON work.

Arithmetic

1 + 2          # 3
3 - 1          # 2
2 * 3          # 6
6 / 2          # 3
7 % 3          # 1
-x             # unary negation

+ on strings concatenates: "foo" + "bar" → "foobar".

+ on arrays concatenates: [1,2] + [3] → [1,2,3].

Comparison

a == b         # equality
a != b         # inequality
a < b          # less than
a <= b
a > b
a >= b

== and != accept operands of any type. Values of the same type compare by value (strings with strings, numbers with numbers, and so on); values of different types are never equal, so == returns false and != returns true.

Logical

a and b        # short-circuit AND
a or b         # short-circuit OR
not a          # negation

Truthiness: null, false, 0, "", [], {} are falsy. Everything else is truthy.
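The truthiness rule written out as a Rust function over a hypothetical Value enum (not jetro's own Val type). Each falsy case listed above gets its own arm; everything else falls through to true.

```rust
/// Hypothetical minimal JSON value model, used only to illustrate truthiness.
enum Value {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Value>),
    Obj(Vec<(String, Value)>),
}

/// null, false, 0, "", [] and {} are falsy; every other value is truthy.
fn is_truthy(v: &Value) -> bool {
    match v {
        Value::Null => false,
        Value::Bool(b) => *b,
        Value::Num(n) => *n != 0.0,
        Value::Str(s) => !s.is_empty(),
        Value::Arr(a) => !a.is_empty(),
        Value::Obj(o) => !o.is_empty(),
    }
}
```

Note that the string "0" is truthy under this rule: only the empty string is falsy.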

Pipe

value | expr

Evaluates expr with @ bound to value. It is not a method-call shorthand.

DOC:    {"x": 10}
QUERY:  $.x | @ * 2
OUT:    20

QUERY:  $.x | f"got {@}"
OUT:    "got 10"

To call a method, use dot syntax: $.x.upper(), not $.x | upper.

Coalesce

a ?? b

Returns a unless it is null, in which case it returns b.

DOC:    {"name": null}
QUERY:  $.name ?? "anon"
OUT:    "anon"

Ternary

Python-style — postfix condition:

"hot" if temp > 30 else "cool"
DOC:    {"temp": 35}
QUERY:  "hot" if $.temp > 30 else "cool"
OUT:    "hot"

Kind tests

v is number
v is string
v is array
v is object
v is null
v is bool

Returns boolean.

QUERY:  $.x is number

Cast

x as int
x as float
x as string
x as bool
x as array
x as object

Coerces the value. If the cast is impossible the result may be null; the exact behaviour depends on the specific cast.

"42" as int        # 42
42 as string       # "42"

Membership

xs has v           # array membership: true if v is in xs
o  has "k"         # object membership: true if key "k" exists

There is no v in xs operator — that form is a parse error. Use the postfix has operator above, or call .includes(v) (arrays/strings) explicitly:

$.tags.includes("hugo")    # ✓
"hugo" in $.tags           # ✗ parse error

Regex match

s ~= "pattern"

Returns boolean. Uses Rust regex syntax. To extract captures or other match details, use .captures or .match_first instead (see String Search).

Boolean shortcut on patches

In a patch $ { … } body, a key when condition clause skips the assignment when condition is falsy. See Patch.

Examples

DOC:    {"books": [{"year": 1965, "tags": ["sf"]}, {"year": 1989, "tags": ["sf","hugo"]}], "year_floor": 2000}

QUERY:  $.books.filter((@.year > 1970 and @.tags.includes("hugo")) or @.year >= $.year_floor)
OUT:    [{"year":1989,"tags":["sf","hugo"]}]

QUERY:  $.books[0].year ?? 9999
OUT:    1965

QUERY:  $.books.map(b => "old" if b.year < 1970 else "new")
OUT:    ["old","new"]

There is no in operator: v in xs is a parse error. Test membership with xs.includes(v) or the infix has operator (xs has v for arrays, o has "k" for object keys). Wrap and/or mixes in parentheses to make precedence unambiguous; jetro follows standard binding (and binds tighter than or), but parentheses read more clearly.

Lambdas and Method Calls

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "xs": [1, 2, 3, 4, 5], "pairs": [["a", 1], ["b", 2], ["c", 3]]}

Methods take arguments. Most arguments are plain values; a common exception is a lambda, a small function evaluated once per element. Jetro accepts three lambda syntaxes; pick whichever reads best.

The @-form

@ is the current item. Inside method args, prefix paths with @ to walk into it:

$.users.filter(@.age >= 18)
$.users.map(@.name)
$.xs{@.active}                  # inline filter must also use @

Leading-dot shorthand .age inside method args desugars to @.age — the two forms are equivalent and the planner sees identical opcodes.

$.users.filter(.age >= 18)
$.users.map(.name)
$.xs{.active}                    # works inside inline filters too

Arrow-form named lambda

$.users.filter(u => u.age >= 18)
$.users.map((u) => u.name)

The parens around the parameter are optional for one parameter.

For a pair argument, destructure it in the parameter list:

$.pairs.map(([k, v]) => k + ":" + v)

Python-style lambda keyword

$.users.filter(lambda u: u.age >= 18)
$.users.map(lambda u: u.name)

Functionally identical to the arrow form. Useful when porting from Python.

Performance

Named lambdas (u => u.x, lambda u: u.x) and the @-form compile to the same bytecode. Benchmarks confirm parity (3.42 ms vs 3.44 ms / 100K rows in the lambda regression suite). Pick what reads best — there is no perf reason to prefer @.

Method call basics

.method()                       # no args
.method(arg)                    # one positional
.method(arg1, arg2)             # multiple
.method(name=value)             # named (a few methods support these)
.method(arg1, name=value)       # mixed

Examples:

$.xs.take(3)
$.xs.replace("foo", "bar")
$.xs.join(",")
$.xs.sort(@.year)                # sort by key projection

Methods inside method args

Lambdas can chain methods just like top-level queries:

$.posts.map(p => p.tags.unique().count())
$.users.filter(u => u.email.starts_with("admin"))

Multi-arg lambdas with destructuring

Some barriers (e.g. pairwise) yield 2-tuples. Destructure them:

$.xs.pairwise().map(([a, b]) => b - a)

Captured $

Inside a lambda, $ still means "the document root" — it does not get shadowed by the lambda parameter:

DOC:    {"realnames": {"abc": "Ada"}, "posts": [{"author": "abc"}]}
QUERY:  $.posts.map(p => $.realnames[p.author])
OUT:    ["Ada"]

First-class lambdas via let

Bind a lambda once, use it many times:

let by_year = (b => b.year < 1970) in
  $.books.filter(by_year)

The let-bound lambda is inlined at every method-arg use before compilation, so it has zero closure overhead — exactly the same code as if you'd written the body directly in .filter(...).

Outside method-arg position, the binding is a normal name reference.

Literals

Scalars

null
true     false
42       3.14     -7    1.5e3
"double-quoted"   'single-quoted'

Strings allow standard escapes (\n, \t, \\, \", \uXXXX).

F-strings

f"…" interpolates {expression}:

DOC:    {"name": "Ada", "age": 36}
QUERY:  f"hi {$.name}, you are {$.age + 1} next year"
OUT:    "hi Ada, you are 37 next year"

Inside a lambda:

$.users.map(u => f"{u.name} <{u.email}>")

Escape literal braces with {{ and }}:

f"{{not interpolated}}"      # "{not interpolated}"
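The brace-escaping rule can be sketched as a small template renderer in Rust. This is a simplification: placeholders here are plain names resolved through a lookup closure rather than full jetro expressions, and treating a lone } or an unknown name as an error (None) is a choice of the sketch, not documented jetro behaviour. The function name render is hypothetical.

```rust
/// Renders `template`, replacing `{name}` via `lookup`; `{{` and `}}` emit
/// literal braces. Returns None on an unbalanced brace or an unknown name.
fn render(template: &str, lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    let mut out = String::new();
    let mut chars = template.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // Doubled braces are escapes for a single literal brace.
            '{' if chars.peek() == Some(&'{') => {
                chars.next();
                out.push('{');
            }
            '}' if chars.peek() == Some(&'}') => {
                chars.next();
                out.push('}');
            }
            '{' => {
                // Read the placeholder name up to the closing brace.
                let mut name = String::new();
                loop {
                    match chars.next()? {
                        '}' => break,
                        ch => name.push(ch),
                    }
                }
                out.push_str(&lookup(name.trim())?);
            }
            '}' => return None, // a lone closing brace is treated as an error
            other => out.push(other),
        }
    }
    Some(out)
}
```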

Arrays

[1, 2, 3]
["a", "b"]
[$.x, $.y, 99]              # values can be expressions

[...$.xs, 4, 5]             # spread
[1, ...mid, 9]              # spread anywhere

Heterogeneous arrays are fine: [1, "a", null, [2,3]].

Objects

{name: "Ada", age: 36}            # bare-key (identifier-like)
{"name": "Ada"}                   # quoted-key (any string)

{x, y}                            # shorthand: same as {x: x, y: y}

{[dyn_key]: 1}                    # computed key
{...obj, extra: 1}                # spread
{...**deep}                       # deep recursive spread

{name: "Ada", role: "admin" when $.is_admin}
                                  # conditional value (omit if cond falsy)

Regex literals

Regex patterns are written as ordinary string literals and appear as the right operand of ~= or as arguments to regex builtins:

$.s ~= "^[A-Z]+$"
$.text.scan("\d+")

Patterns use Rust's regex crate syntax.

Numeric notes

Jetro distinguishes integers from floats internally where possible. 42 and 42.0 compare equal but a downstream sink that requires "integer" (e.g. indexing) will only accept the former.

Negative literals: -7 is a unary-negated literal — the parser handles this correctly without ambiguity in arithmetic positions (a - 7 is subtraction, a + -7 is addition with -7).

Control Flow

Ternary

Python-style:

expr if condition else fallback
DOC:    {"x": 10}
QUERY:  "big" if $.x > 5 else "small"
OUT:    "big"

Right-associative; chain via parens for clarity.

Try / else

Catch evaluation errors:

try expr else fallback
QUERY:  try $.maybe.deep.path else "missing"
OUT:    "missing"

QUERY:  try $.xs[0].name.upper() else "n/a"

The ? quantifier handles the "missing field" case more concisely: $.maybe? returns null instead of raising an error.

let … in …

Local bindings:

let x = $.users.count() in
  f"there are {x} users"

Multi-binding:

let a = 1, b = 2 in a + b   # equiv: let a=1 in let b=2 in a+b

let shines for first-class lambdas — see Lambdas.

Pattern match

match value with {
  pattern1 -> expr1,
  pattern2 when guard -> expr2,
  _ -> default
}

Patterns

Pattern              Matches
42, "x", true, null  Equal literal
_                    Any value
name                 Any value, bound to name
1..10                Number ≥ 1 and < 10
1..=10               Number ≥ 1 and ≤ 10
{k1: p1, k2: p2}     Object with these keys, each matching (no shorthand {k1, k2} in v0.5)
[p1, p2]             Array of length 2, each matching
[h, ...t]            Head + tail
p1 | p2              Either pattern (or-pattern)
x: number            Kind-bound: matches if x is a number
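Rust's standard ranges follow the same half-open versus inclusive convention as these range patterns, so Range::contains and RangeInclusive::contains make a convenient way to check the boundary cases. The helper functions below are hypothetical and only mirror the table rows for 1..10 and 1..=10.

```rust
use std::ops::{Range, RangeInclusive};

/// Like the pattern `1..10`: start included, end excluded.
fn in_half_open(n: f64, r: Range<f64>) -> bool {
    r.contains(&n)
}

/// Like the pattern `1..=10`: both ends included.
fn in_inclusive(n: f64, r: RangeInclusive<f64>) -> bool {
    r.contains(&n)
}
```

The boundary value 10 is the only place the two forms disagree.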

Guards

match $.x with {
  v when v > 100 -> "big",
  v when v > 10 -> "medium",
  _ -> "small"
}
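The guard example relies on first-match ordering: 150 satisfies both v > 100 and v > 10, and the earlier arm wins. Rust's match has the same top-to-bottom semantics, so an analogous sketch (with the hypothetical name size_label) is:

```rust
/// Arms are tried in order; the first arm whose guard holds wins.
fn size_label(x: f64) -> &'static str {
    match x {
        v if v > 100.0 => "big",
        v if v > 10.0 => "medium",
        _ => "small",
    }
}
```

Swapping the first two arms would make "big" unreachable, since every value above 100 also exceeds 10.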

Worked example

DOC:    {"event": {"kind": "click", "x": 100, "y": 200}}
QUERY:
  match $.event with {
    {kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
    {kind: "key",   code: c}       -> f"key:{c}",
    _ -> "unknown"
  }
OUT:    "click@100,200"

Deep match

$..match { pattern -> expr, _ -> null }

Walks every descendant; returns matched results as an array.

$..match! { pattern -> expr }      # first match only, early-stops

The bang variant terminates as soon as one match succeeds (uses the bitmap structural index when available).

Comprehensions

Jetro supports list, dict, set, and generator comprehensions over both literal and path-rooted sources. Pair destructure works in two interchangeable forms (for k, v in ... and for [k, v] in ...), and multiple if clauses are folded with and.

List

[expr for x in source if cond1 if cond2 ...]
DOC:    {"xs": [1, 2, 3, 4, 5]}

QUERY:  [n*n for n in $.xs if n > 2]
OUT:    [9,16,25]

QUERY:  [n for n in $.xs if n > 1 if n < 5]
OUT:    [2,3,4]
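For readers coming from Rust, the two list comprehensions above correspond to ordinary iterator chains, with multiple if clauses folded into a single && predicate. The function names are illustrative only.

```rust
/// `[n*n for n in xs if n > 2]`
fn squares_over_two(xs: &[i64]) -> Vec<i64> {
    xs.iter().filter(|&&n| n > 2).map(|&n| n * n).collect()
}

/// `[n for n in xs if n > 1 if n < 5]`: the two `if` clauses fold with `and`.
fn between_one_and_five(xs: &[i64]) -> Vec<i64> {
    xs.iter().copied().filter(|&n| n > 1 && n < 5).collect()
}
```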

Object

{key: value for x in source if cond}
{k: v for [k, v] in pairs}
{k: v for k, v in pairs}
DOC:    {"pairs": [["a", 1], ["b", 2]]}

QUERY:  {k: v for [k, v] in $.pairs}
OUT:    {"a":1,"b":2}

QUERY:  {n: n*n for n in [1,2,3]}
OUT:    {"1":1,"2":4,"3":9}

Iterating an object yields {key, value} records:

DOC:    {"o": {"a": 1, "b": 2}}
QUERY:  {e.key: e.value*10 for e in $.o}
OUT:    {"a":10,"b":20}

Set

Deduplicating comprehension. Returns an array of unique values.

QUERY:  {n*n for n in [-2, -1, 0, 1, 2]}
OUT:    [4,1,0]
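The output [4,1,0] is consistent with keeping each value the first time it appears. The sketch below assumes that first-seen ordering; unique_map is a hypothetical helper, not a jetro builtin.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Maps each item with `f` and keeps only the first occurrence of each
/// result, preserving first-seen order: `{f(x) for x in xs}`.
fn unique_map<T, U, F>(xs: &[T], f: F) -> Vec<U>
where
    U: Eq + Hash + Clone,
    F: Fn(&T) -> U,
{
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for x in xs {
        let y = f(x);
        // `insert` returns true only the first time a value is added.
        if seen.insert(y.clone()) {
            out.push(y);
        }
    }
    out
}
```

Squaring [-2, -1, 0, 1, 2] gives 4, 1, 0, 1, 4; the repeated 1 and 4 are dropped.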

Generator

(x for x in items)

Same semantics as the list form; useful as a lazy source for a downstream reducer or barrier.

if-on-patch

Inside a patch $ {…} body, key: expr when cond skips the assignment when cond is falsy:

patch $ {
  status: "active" when $.verified
}

See Patch.

Patch and Writes

Fixture

Examples below run against:

DOC:    {"user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "xs": [1, 2, 3, 4, 5]}

Jetro treats writes as queries: a write returns the patched document. There are two equivalent surfaces.

Chain-write terminals

Add a write method at the end of a rooted path:

Method            Effect
.set(v)           Replace the value at this path with v
.modify(expr)     Replace, with @ bound to the current value
.delete()         Remove the leaf
.unset(key)       Remove key from the leaf object
.merge({…})       Shallow-merge into the leaf object
.deep_merge({…})  Recursive merge
.append(v)        Push to the leaf array
.prepend(v)       Unshift onto the leaf array

DOC:    {"user": {"name": "Ada", "tags": ["math"]}}

QUERY:  $.user.name.set("Ada Lovelace")
OUT:    {"user":{"name":"Ada Lovelace","tags":["math"]}}

QUERY:  $.user.tags.append("code")
OUT:    {"user":{"name":"Ada","tags":["math","code"]}}

QUERY:  $.user.unset(tags)
OUT:    {"user":{"name":"Ada"}}

QUERY:  $.user.modify(u => u.merge({active: true}))
OUT:    {"user":{"active":true,"name":"Ada","tags":["math"]}}

Chain-write terminals are recognised only when the chain is rooted at $. Inside lambdas ($.xs.map(@.set(...))) the same name is a regular method call, which is useful when a sub-pipeline wants the older "return the new value" semantics.

patch $ { … } block

The same operation expressed as a block:

patch $ {
  user.name: "Ada Lovelace",
  user.tags: DELETE
}

Block syntax is best for multiple writes — it batches them through a single fused pass (see Write Fusion).

Block clause           Meaning
path: value            Assignment
path: DELETE           Removal
path: value when cond  Conditional
path[*]: value         Broadcast over an array

Conditional writes

patch $ {
  status: "active" when $.verified,
  retired_at: now() when $.retired
}

If the condition is falsy, the assignment is skipped entirely — neither written nor zeroed.

Broadcast over arrays

DOC:    {"items": [{"x": 1}, {"x": 2}, {"x": 3}]}

QUERY:  $.items[*].x.set(0)
OUT:    {"items":[{"x":0},{"x":0},{"x":0}]}

Pipe form preserves "return-the-new-value"

Some users prefer the v1 behavior where a write inside a .map returned the written value, not the patched root:

$.items.map(item => item | set(item.x + 1))

The pipe form value | set(new) keeps that meaning.

Modify with pipe

$.user.modify(u => u.merge({last_seen: now()}))

modify evaluates its argument with @ bound to the current value, then writes the result back at the same path.

Multiple writes in one query

Either chain them:

$.user.name.set("Ada").tags.append("admin")

or use a block:

patch $ {
  user.name: "Ada",
  user.tags[*]: "active"   # broadcast
}

The planner detects multi-write patterns and routes them through the patch-fusion optimizer, which lowers repeated path traversals into a single fused write pass.

Functional .update({...})

A third surface, written as a method call:

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}

QUERY:  $.books[*].update({tags: tags.append("modern") when year > 1980, reviewed: true})
OUT:    {"books":[{"reviewed":true,"tags":["sf"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

Use .update when you want all of the following at once:

  • A selector chosen with chain syntax ($.books[*], $.books[* if year > 1980])
  • An object body listing multiple field updates evaluated against each selected snapshot
  • The same when / DELETE semantics as patch $ { ... }
  • Quoted path keys ("books[*].tags") when the receiver is $, giving root-level batched updates without an explicit selector

.update parses to its own AST node (UpdateBatch) so the planner can keep the user-level shape — useful for selector pushdown, demand analysis, and fusion. See Path Mutation → update for the full argument matrix.
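The snapshot rule for .update can be sketched in Rust, with a typed record standing in for a JSON object. Book and update_book are hypothetical names, and the body hard-codes the example above. The point is that every right-hand side reads the element as it was before the update, while all writes go to a separate copy.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Book {
    title: String,
    year: u32,
    tags: Vec<String>,
    reviewed: bool,
}

// Mirrors `{tags: tags.append("modern") when year > 1980, reviewed: true}`.
// `snap` is the pre-update snapshot: conditions and values read from it,
// writes go to `out`.
fn update_book(snap: &Book) -> Book {
    let mut out = snap.clone();
    if snap.year > 1980 {
        let mut tags = snap.tags.clone();
        tags.push("modern".to_string());
        out.tags = tags;
    }
    out.reviewed = true;
    out
}

fn main() {
    let books = vec![
        Book { title: "Dune".into(), year: 1965, tags: vec!["sf".into()], reviewed: false },
        Book { title: "Hyperion".into(), year: 1989, tags: vec!["sf".into()], reviewed: false },
    ];
    // `$.books[*]` selects every element; the body runs once per selected element.
    let updated: Vec<Book> = books.iter().map(update_book).collect();
    for b in &updated {
        println!("{} {:?} reviewed={}", b.title, b.tags, b.reviewed);
    }
}
```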

Filtered wildcard [* if pred]

A predicated wildcard inside a path. Available wherever [*] is, and particularly useful inside .update selectors and quoted path keys:

DOC:    {"books": [
  {"title": "Dune", "year": 1965},
  {"title": "Hyperion", "year": 1989}
]}

QUERY:  $.books[* if year > 1980]
OUT:    [{"title":"Hyperion","year":1989}]

The predicate runs with @ bound to the candidate element. Elements for which it is falsy are skipped from the path traversal entirely.

Wildcard .modify chains

Wildcard chain-writes are now lowered to a fused patch:

DOC:    {"books": [{"tags": ["sf"]}, {"tags": ["hugo"]}]}
QUERY:  $.books[*].tags.modify(@.append("test"))
OUT:    {"books":[{"tags":["sf","test"]},{"tags":["hugo","test"]}]}

Caveats

  • .replace(needle, with) is not a write terminal — it is the string-replace builtin.
  • The classifier only triggers on chains rooted at $. Use the block syntax when the base path is computed.
  • DELETE is a marker, not a value — you can't store it in a binding.

Precedence Table

Lowest precedence at the top. Operators on the same row associate left unless noted.

| Level | Operators | Associativity | Notes |
|---|---|---|---|
| 1 | if … else …, try … else … | right | Ternary, try-else |
| 2 | \|, \|> | left | Pipe (value-flow) |
| 3 | ??, ?\| | right | Coalesce |
| 4 | or | left | Logical OR (short-circuit) |
| 5 | and | left | Logical AND (short-circuit) |
| 6 | not | n/a | Logical NOT (prefix) |
| 7 | is, kind, is not | n/a | Kind test |
| 8 | has | left | Membership operator (no in; use .includes(v)) |
| 9 | ==, !=, <, <=, >, >=, ~= | left | Comparison |
| 10 | +, - | left | Additive (and string/array concat) |
| 11 | *, /, % | left | Multiplicative |
| 12 | as | left | Cast |
| 13 | - (unary) | n/a | Negation |
| 14 | .field, .method(), [idx], {cond}, ?, ! | left | Postfix steps |
| 15 | $, @, literal, (...), lambda, let, match, patch, comp | n/a | Primary |

Common pitfalls

Pipe vs method call.

$.x | upper           # ✗ — interprets `upper` as a name to pipe into
$.x.upper()           # ✓ — method call

Comparison chains.

1 < x < 10            # ✗ — parses as `(1 < x) < 10`
1 < x and x < 10      # ✓

Ternary mid-chain.

$.x.upper() if cond else $.x   # parses fine — the ternary wraps the whole
                                # left expression

Negation tightness.

not a == b            # parses as `not (a == b)`: not (level 6) binds looser than == (level 9)
not (a == b)          # same meaning, with the grouping made explicit
a != b                # cleanest

Coalesce + comparison.

$.x ?? 0 > 5          # parses as `$.x ?? (0 > 5)`: ?? binds looser than >
($.x ?? 0) > 5        # ✓ parenthesise to compare the coalesced value

Try captures errors only.

try $.x.parse_int() else 0

try does not catch falsy-as-error — only actual evaluation errors (missing field, bad cast, regex failure, etc.).

Pipelines

A jetro query is a pipeline of stages. The shape is always:

Source → Stage* → Sink

Source produces values one at a time. Each Stage consumes one value and produces zero, one, or many. The Sink collects results.

What counts as a stage

| Stage | Examples | Output |
|---|---|---|
| One-to-one | .map, .enumerate, .lag, .zscore | One out per in |
| Filter | .filter, .find, .compact, .takewhile | Zero or one out per in |
| Expander | .flat_map, .flatten, .split, .lines, .chars | Many out per in |
| Reducer | .sum, .count, .min, .any, .find_index | One total |
| Positional | .first, .last, .nth(i), .collect | One or N |
| Barrier | .sort, .unique, .group_by, .window, .chunk | Buffers, then emits |

A reducer or positional terminator ends the pipeline; further methods chain on the result (a scalar or array) rather than streaming.

Streaming vs. barrier

Most stages stream — they process one value, emit, repeat. The pull-based backend means each value travels end-to-end before the next is fetched. This is what makes early termination work (.first, .find).

Barriers cannot stream: .sort must see every element before it can emit any. The pipeline buffers up to the barrier, runs the barrier as a unit, then resumes streaming if more stages follow.

$.xs.map(f).filter(p).sort(@.x).take(10).map(g)
     \______________/           \_____________/
        streaming               streaming again
                          ↑
                    barrier point

Barriers carry an apply_barrier method on the builtin.
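Rust's lazy iterators make a convenient analogy for the pull model; they stand in here for Jetro's backend and are not its implementation. The sketch counts how many source elements each shape pulls. The streaming chain stops at the first match, while the sort barrier has to buffer every element before it can emit anything.

```rust
use std::cell::Cell;

// Streaming: map -> filter -> first. `find(p)` is filter(p) followed by first(),
// and the source is pulled lazily, so reading stops at the first match.
fn streaming_pulls(xs: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0);
    let first = xs
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1))
        .map(|x| x * 10)
        .find(|x| *x > 20);
    (first, pulled.get())
}

// Barrier: sort must buffer every element before it can emit the smallest.
fn barrier_pulls(xs: &[i64]) -> (Option<i64>, usize) {
    let pulled = Cell::new(0);
    let mut buf: Vec<i64> = xs
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1))
        .copied()
        .collect();
    buf.sort();
    (buf.first().copied(), pulled.get())
}

fn main() {
    let xs: [i64; 8] = [1, 2, 3, 4, 5, 6, 7, 8];
    println!("{:?}", streaming_pulls(&xs)); // (Some(30), 3)
    println!("{:?}", barrier_pulls(&xs));   // (Some(1), 8)
}
```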

Sources

The most common source is a path: $.users is a source. Other shapes:

  • An array literal ([1,2,3].map(f))
  • A range ((0..10).map(f))
  • A method that returns a sequence ($.text.lines().map(...))

Sinks

If your final stage is a reducer, the sink is the reducer's accumulator. If it's a streaming stage, the sink collects into an array.

.collect() is the explicit sink: scalar in → [scalar], array in → identity, null in → []. Use it when you need a deterministic array shape.
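The three cases fit in a few lines of Rust. The Val enum below is a hypothetical, cut-down stand-in for Jetro's value type (objects are left out), used only to make the shape rules concrete.

```rust
// Cut-down, hypothetical value type: enough to show the three cases.
#[derive(Clone, Debug, PartialEq)]
enum Val {
    Null,
    Num(f64),
    Str(String),
    Arr(Vec<Val>),
}

// .collect(): null -> [], array -> unchanged, any other scalar -> [scalar].
fn collect(v: Val) -> Val {
    match v {
        Val::Null => Val::Arr(Vec::new()),
        Val::Arr(items) => Val::Arr(items),
        scalar => Val::Arr(vec![scalar]),
    }
}

fn main() {
    println!("{:?}", collect(Val::Num(1.0)));        // Arr([Num(1.0)])
    println!("{:?}", collect(Val::Null));            // Arr([])
    println!("{:?}", collect(Val::Str("a".into()))); // Arr([Str("a")])
}
```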

Composed stages

Adjacent stages get composed when possible: two Stages fold into one virtual call per element. This is Composed<A, B> under the hood; the optimizer fuses chains of .maps, .filters, and .picks aggressively.

User-visible effect: writing many short stages costs roughly the same as one big lambda — write for clarity.
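The following sketch shows how two stages can be fused so that each element costs a single call. The Stage trait, MapStage, FilterStage and run are hypothetical. Composed echoes the name used above, but this is not Jetro's definition of it.

```rust
// A stage turns one input into zero or one output (enough for map and filter).
trait Stage {
    fn apply(&self, x: i64) -> Option<i64>;
}

struct MapStage<F: Fn(i64) -> i64>(F);
struct FilterStage<P: Fn(i64) -> bool>(P);

impl<F: Fn(i64) -> i64> Stage for MapStage<F> {
    fn apply(&self, x: i64) -> Option<i64> {
        Some((self.0)(x))
    }
}

impl<P: Fn(i64) -> bool> Stage for FilterStage<P> {
    fn apply(&self, x: i64) -> Option<i64> {
        if (self.0)(x) {
            Some(x)
        } else {
            None
        }
    }
}

// Two stages fused into one: a single `apply` per element runs both.
struct Composed<A, B>(A, B);

impl<A: Stage, B: Stage> Stage for Composed<A, B> {
    fn apply(&self, x: i64) -> Option<i64> {
        self.0.apply(x).and_then(|y| self.1.apply(y))
    }
}

// Drive any stage, simple or composed, over a slice.
fn run<S: Stage>(stage: &S, xs: &[i64]) -> Vec<i64> {
    xs.iter().filter_map(|&x| stage.apply(x)).collect()
}

fn main() {
    // Roughly .map(@ * 2).filter(@ > 4)
    let fused = Composed(MapStage(|x: i64| x * 2), FilterStage(|x: i64| x > 4));
    println!("{:?}", run(&fused, &[1, 2, 3, 4])); // [6, 8]
}
```

Because Composed is itself a Stage, longer chains nest (Composed(Composed(a, b), c)) and still cost one outer call per element.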

Backend selection

Each pipeline node carries a list of preferred backends. The router tries them in order; the first to declare it can run the node wins.

| Source | Preferred backends |
|---|---|
| FieldChain (e.g. $.a.b.c) | tape-view → tape-rows → materialised → val-view → interpreted |
| Generic expression | fast-children → interpreted |
| Deep search | structural index → interpreted |
| Single root path | tape-path → interpreted |

You don't pick the backend — the planner does. But knowing they exist explains why simple queries are fast: they often run zero-copy over the simd-json tape.
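The ordered fallback can be sketched as a linear search over the preference list. Node and the capability rules in can_run are invented for illustration, since the real checks are internal to Jetro. Only the FieldChain ordering is taken from the table above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    TapeView,
    TapeRows,
    Materialised,
    ValView,
    Interpreted,
}

// Preference order for a FieldChain node, as listed in the table above.
const FIELD_CHAIN_PREFS: [Backend; 5] = [
    Backend::TapeView,
    Backend::TapeRows,
    Backend::Materialised,
    Backend::ValView,
    Backend::Interpreted,
];

// Toy description of a plan node (invented for this sketch).
struct Node {
    tape_available: bool,    // was the document parsed into a tape?
    plain_field_chain: bool, // only .field steps, no lambdas or calls
}

// Invented capability rules; the interpreter accepts everything.
fn can_run(backend: Backend, node: &Node) -> bool {
    match backend {
        Backend::TapeView | Backend::TapeRows => node.tape_available && node.plain_field_chain,
        Backend::Materialised | Backend::ValView => node.plain_field_chain,
        Backend::Interpreted => true,
    }
}

// Walk the preference list; the first backend that accepts the node wins.
fn route(prefs: &[Backend], node: &Node) -> Backend {
    prefs
        .iter()
        .copied()
        .find(|&b| can_run(b, node))
        .unwrap_or(Backend::Interpreted)
}

fn main() {
    let node = Node { tape_available: false, plain_field_chain: true };
    println!("{:?}", route(&FIELD_CHAIN_PREFS, &node)); // Materialised
}
```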

When to think about pipeline shape

In practice, almost never. Two cases:

  1. Don't sort until you have to. A barrier ahead of .first or .take forces the source to read everything, which defeats early termination. Push .filter, .take, .first before .sort if the semantics allow.
  2. Avoid full materialisation in the middle. Chains of streaming stages stay zero-copy. A .collect() mid-chain forces a full pass.

The next chapter, Demand Propagation, explains why these heuristics work.

Demand Propagation

Demand propagation is the planner pass that makes "obvious" queries fast. It walks the pipeline backward — from sink to source — asking each operator: given what comes after you, what do you actually need from your source?

The answer is encoded in three lanes per stage and then used at execution time to skip work.

The three lanes

1. PullDemand — how many inputs?

| Variant | Meaning |
|---|---|
| All | Read everything |
| FirstInput(n) | Stop after n inputs |
| LastInput(n) | Seek to the end, take the last n |
| NthInput(i) | Jump to a single index |
| UntilOutput(n) | Keep reading until n outputs are produced |

2. ValueNeed — what payload from each input?

| Variant | Meaning |
|---|---|
| None | Don't decode the row at all |
| Predicate | Only what the predicate touches |
| Projection | Only the fields used in a projection |
| Numeric | Only numeric content |
| Whole | The full row (the pessimistic default) |

3. order: bool — does input order matter?

Some sinks (e.g. .sum()) don't care about order. The planner can use this to enable parallel-friendly access patterns when supported.

Backward walk

For a pipeline s1 → s2 → … → sN → sink, the planner does:

demand = sink_demand
for op in [sN, …, s2, s1]:        # reverse order
    upstream = op.propagate_demand(demand)
    record (op, downstream=demand, upstream)
    demand = upstream

The final demand is what the source must satisfy. The source backend chooses an access strategy that matches.
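The same walk can be sketched in Rust for the pull lane only. PullDemand mirrors the table above. Law, propagate and plan are hypothetical names, and the per-law rules are simplified assumptions. For instance, this FilterLike falls back to All when asked for LastInput, instead of scanning from the end.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum PullDemand {
    All,
    FirstInput(usize),
    LastInput(usize),
    UntilOutput(usize),
}

#[derive(Debug, Clone, Copy)]
enum Law {
    MapLike,
    FilterLike,
    Take(usize),
    First,
    Last,
    Barrier,
}

// Given what the consumer asks of an operator, what must its source produce?
fn propagate(law: Law, downstream: PullDemand) -> PullDemand {
    use PullDemand::*;
    match law {
        // One output per input: the pull passes through unchanged.
        Law::MapLike => downstream,
        // Unknown selectivity: "first n outputs" becomes "read until n pass".
        Law::FilterLike => match downstream {
            FirstInput(n) | UntilOutput(n) => UntilOutput(n),
            _ => All,
        },
        // take(n) never needs more than n inputs.
        Law::Take(n) => match downstream {
            FirstInput(m) | UntilOutput(m) => FirstInput(n.min(m)),
            _ => FirstInput(n),
        },
        Law::First => FirstInput(1),
        Law::Last => LastInput(1),
        // A barrier must see every input.
        Law::Barrier => All,
    }
}

// Walk the stages from sink to source, threading the demand backward.
fn plan(stages: &[Law], sink: PullDemand) -> PullDemand {
    let mut demand = sink;
    for law in stages.iter().rev() {
        demand = propagate(*law, demand);
    }
    demand
}

fn main() {
    // $.items.filter(active).take(3)
    println!("{:?}", plan(&[Law::FilterLike, Law::Take(3)], PullDemand::All)); // UntilOutput(3)
    // $.items.map(name).first()
    println!("{:?}", plan(&[Law::MapLike, Law::First], PullDemand::All)); // FirstInput(1)
    // $.items.sort().take(3)
    println!("{:?}", plan(&[Law::Barrier, Law::Take(3)], PullDemand::All)); // All
}
```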

Operator laws

Every builtin declares one of these laws (in defs.rs):

| Law | Effect on demand |
|---|---|
| Identity | Pass through unchanged (e.g. .upper, .lower) |
| MapLike | Preserve pull, force ValueNeed::Whole |
| FilterLike | FirstInput(n) becomes UntilOutput(n) |
| TakeWhile | Same as filter, but bounded |
| UniqueLike | Must scan until N distinct outputs |
| Take(n) | Cap pull at FirstInput(n) |
| First | Always FirstInput(1) |
| Last | Always LastInput(1) |
| Count | All inputs, ValueNeed::None |
| NumericReducer | All inputs, ValueNeed::Numeric |

Six worked examples

A. Early termination on .first

$.items.map(name).first()
  • first() declares FirstInput(1) to its source
  • .map(name) is MapLike: preserves pull, demands Whole from items
  • Source receives: read 1 item, decode fully

Without demand: read all items, decode all, take first.

B. Bounded filter

$.items.filter(active).take(3)
  • take(3) → FirstInput(3)
  • filter(active) → UntilOutput(3) (read until 3 pass)
  • Source: read until 3 active items found

Without demand: filter the entire array, then slice.
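The effect of UntilOutput(3) can be observed with a lazy Rust iterator standing in for the source; this is an analogy, not Jetro's executor. The Item type is hypothetical. The counter shows that the source is read only up to the third active item.

```rust
use std::cell::Cell;

struct Item {
    active: bool,
}

// filter(active).take(3): returns (items kept, items read from the source).
fn first_three_active(items: &[Item]) -> (usize, usize) {
    let pulled = Cell::new(0);
    let kept = items
        .iter()
        .inspect(|_| pulled.set(pulled.get() + 1))
        .filter(|it| it.active)
        .take(3)
        .count();
    (kept, pulled.get())
}

fn main() {
    let flags = [true, false, true, false, true, true, true, false];
    let items: Vec<Item> = flags.iter().map(|&active| Item { active }).collect();
    // The third active item sits at index 4, so only 5 of 8 items are read.
    println!("{:?}", first_three_active(&items)); // (3, 5)
}
```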

C. Field-level projection

$.users.map(u => {id, name})
  • The map projection touches id and name
  • Source: decode only id, name from each user

Other fields are not allocated. Over a wide-record document, this is the biggest win.

D. Last-element scan

$.logs.filter(severity >= 3).last()
  • last() → LastInput(1)
  • filter(...) → UntilOutput(1) from the end
  • Source: scan backward, stop after first match

Without demand: scan forward, materialise all matches, take last.
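Example D's strategy can be sketched in plain Rust. Scanning from the end returns the same element as filtering everything and taking the last match, but it reads only the tail. The function names and the u8 severity type are illustrative.

```rust
// Demand-aware filter(sev >= 3).last(): scan from the end, stop at the first hit.
// Returns the match and how many rows were read.
fn last_severe(sevs: &[u8]) -> (Option<u8>, usize) {
    let mut read = 0;
    for &s in sevs.iter().rev() {
        read += 1;
        if s >= 3 {
            return (Some(s), read);
        }
    }
    (None, read)
}

// Naive version: keep every match, then take the last one.
fn last_severe_naive(sevs: &[u8]) -> Option<u8> {
    sevs.iter().copied().filter(|&s| s >= 3).last()
}

fn main() {
    let sevs: [u8; 6] = [1, 3, 2, 4, 1, 1];
    println!("{:?}", last_severe(&sevs));       // (Some(4), 3)
    println!("{:?}", last_severe_naive(&sevs)); // Some(4)
}
```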

E. Count without payloads

$.items.filter(status == "done").count()
  • count() declares ValueNeed::None
  • filter(...) declares Predicate on status
  • Source: decode only status, no other fields

F. Reverse + take

$.items.reverse().take(2)
  • take(2) → FirstInput(2)
  • reverse() flips the demand: its source receives LastInput(2)
  • Source: seek to the end and read only the last 2 elements, which reverse() then emits last-first
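A quick Rust check shows that the LastInput(2) strategy matches the naive one. Both functions are illustrative. The first reverses a full copy, while the second touches only the last n elements of the slice.

```rust
// Naive: reverse everything, then keep n.
fn reverse_take_naive(xs: &[i32], n: usize) -> Vec<i32> {
    let mut v = xs.to_vec();
    v.reverse();
    v.truncate(n);
    v
}

// Demand-aware: read only the last n inputs and emit them last-first.
fn reverse_take_last_input(xs: &[i32], n: usize) -> Vec<i32> {
    let start = xs.len().saturating_sub(n);
    xs[start..].iter().rev().copied().collect()
}

fn main() {
    let xs: [i32; 5] = [1, 2, 3, 4, 5];
    println!("{:?}", reverse_take_last_input(&xs, 2)); // [5, 4]
    assert_eq!(reverse_take_naive(&xs, 2), reverse_take_last_input(&xs, 2));
}
```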

What demand does not do

  • It does not change result semantics. Two pipelines with identical text produce identical output regardless of demand state.
  • It does not optimise across barriers (.sort, .group_by). A barrier forces All upstream — it must see every input.
  • It does not move work between stages. Fusing adjacent stages is a separate optimizer step (see Composed stages); demand only gates what each operator reads.

When you'll feel demand kick in

Three rough rules of thumb:

  1. End streaming chains with take/first/find. With no barrier in between, their pull demand reaches all the way back to the source.
  2. Project early when possible. map(@.field) upstream of a barrier reduces the buffered set.
  3. Avoid unnecessary collect(). It forces full materialisation and resets the demand walk.

Demand is invisible most of the time — your queries get faster than they "should" be, and that's exactly the goal.

Lazy Evaluation and Caches

Jetro is lazy in three places that matter to users.

1. Document parsing

Jetro::from_bytes does not fully parse the document up front when the default simd-json feature is enabled. Instead it builds a tape — a flat array of tokens — and lazily decodes parts as queries demand them.

What this means:

  • Cold-start is ~4× faster than the legacy serde_json::Value path.
  • A query that touches only $.x.y decodes the rest of the doc only when asked.
  • Borrowed string slices (Val::StrSlice) avoid a copy when the value is read-only.

If you want eager full parsing (e.g. for serde_json::Value round-trips):

let doc: serde_json::Value = serde_json::from_slice(bytes)?;
let v = engine.collect_value(doc, "$.x")?;

2. Streaming pipelines

The pull-based pipeline backend processes one element at a time. A stage doesn't run until its downstream consumer pulls. This is what enables .first() and .find() to terminate early.

A consequence: side effects in lambdas are not guaranteed to fire for every element. (Lambdas in jetro have no I/O, so this is mostly an academic concern, but worth knowing if you write a custom builtin.)

3. Plan caches

Two caches matter:

Plan cache (per JetroEngine)

When you call engine.collect(&doc, query) repeatedly with the same query, the parsed AST → IR → bytecode pipeline is computed once and reused. Default capacity: 256 entries, evicted wholesale when full.

For workloads with a small fixed set of queries and many documents, this is a big speedup. For ad-hoc one-shot queries, it's a no-op.
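The wholesale-eviction policy can be sketched as follows. PlanCache, Plan and get_or_compile are hypothetical, and compilation is faked. Only two details come from the text above: the 256-entry default and the clear-everything-when-full behavior.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for a compiled plan; the real pipeline is AST -> IR -> bytecode.
#[derive(Debug)]
struct Plan {
    query: String,
}

struct PlanCache {
    capacity: usize,
    plans: HashMap<String, Arc<Plan>>,
    compiles: usize, // how many times a plan was actually built
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, plans: HashMap::new(), compiles: 0 }
    }

    fn get_or_compile(&mut self, query: &str) -> Arc<Plan> {
        if let Some(plan) = self.plans.get(query) {
            return Arc::clone(plan);
        }
        // Full: drop every entry at once instead of tracking recency.
        if self.plans.len() >= self.capacity {
            self.plans.clear();
        }
        self.compiles += 1;
        let plan = Arc::new(Plan { query: query.to_string() });
        self.plans.insert(query.to_string(), Arc::clone(&plan));
        plan
    }
}

fn main() {
    let mut cache = PlanCache::new(256);
    for _ in 0..1000 {
        cache.get_or_compile("$.users.filter(@.active).count()");
    }
    let plan = cache.get_or_compile("$.users.filter(@.active).count()");
    println!("{} compiled {} time(s)", plan.query, cache.compiles); // ... compiled 1 time(s)
}
```

Clearing everything at once avoids per-hit recency bookkeeping, at the cost of recompiling the working set after a flush. That trade-off is cheap when the query set is small and fixed.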

Path cache (per VM)

The bytecode VM caches resolved pointer paths per document. The cache key hashes both structure and primitive leaf values bounded at depth 8 — two documents with identical shape but different leaves produce different hashes, so the cache stays correct across calls.

You don't manage this directly. It's amortised over many queries on the same document.
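The following sketch shows one way such a bounded structural hash could look. It rests on several assumptions: the Val enum and cache_key are hypothetical, the code uses std's DefaultHasher, and it reads "bounded at depth 8" as ignoring every node more than eight levels below the root. Under this reading, same-shaped documents with different leaves get different keys, while differences below the bound do not change the key.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Minimal, hypothetical stand-in for a JSON value.
enum Val {
    Null,
    Bool(bool),
    Num(i64),
    Str(String),
    Arr(Vec<Val>),
    Obj(Vec<(String, Val)>),
}

const MAX_DEPTH: usize = 8;

// Hash kind tags, lengths, keys and leaf values, skipping nodes deeper than MAX_DEPTH.
fn feed(v: &Val, depth: usize, h: &mut DefaultHasher) {
    if depth > MAX_DEPTH {
        return;
    }
    match v {
        Val::Null => 0u8.hash(h),
        Val::Bool(b) => (1u8, b).hash(h),
        Val::Num(n) => (2u8, n).hash(h),
        Val::Str(s) => (3u8, s).hash(h),
        Val::Arr(items) => {
            (4u8, items.len()).hash(h);
            for it in items {
                feed(it, depth + 1, h);
            }
        }
        Val::Obj(fields) => {
            (5u8, fields.len()).hash(h);
            for (k, it) in fields {
                k.hash(h);
                feed(it, depth + 1, h);
            }
        }
    }
}

fn cache_key(v: &Val) -> u64 {
    let mut h = DefaultHasher::new();
    feed(v, 0, &mut h);
    h.finish()
}

// Wrap a numeric leaf in `depth` single-element arrays.
fn nest(depth: usize, leaf: i64) -> Val {
    let mut v = Val::Num(leaf);
    for _ in 0..depth {
        v = Val::Arr(vec![v]);
    }
    v
}

fn main() {
    // Same shape, different leaf near the top: different keys.
    assert_ne!(cache_key(&nest(2, 1)), cache_key(&nest(2, 2)));
    // Leaves below the depth bound do not contribute.
    assert_eq!(cache_key(&nest(10, 1)), cache_key(&nest(10, 2)));
    println!("ok");
}
```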

When laziness backfires

It rarely does, but two pitfalls:

Forcing materialisation. Methods like .collect(), .sort(), .unique(), .group_by() are barriers — they materialise. Putting them mid-chain when they aren't needed defeats laziness.

Holding onto Vals. A Val is Arc-wrapped, so cloning is O(1), but the Arc keeps the underlying data alive. If you query a giant doc, hold onto a small projection, and let the doc go, you may be surprised that the original data is still resident — the projection's Val::StrSlices borrow into the tape.

Use .to_json() (or serde_json::Value round-trip) to disconnect a projection from the source tape when you really need to release memory.
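The pinning effect can be reproduced with std's Arc. StrSlice here is a hypothetical stand-in for Val::StrSlice: a shared buffer plus a byte range. Copying the text into an owned String plays the role that .to_json() plays in Jetro.

```rust
use std::sync::{Arc, Weak};

// Hypothetical borrowed view: a byte range into a shared tape buffer.
struct StrSlice {
    tape: Arc<str>,
    start: usize,
    end: usize,
}

impl StrSlice {
    fn as_str(&self) -> &str {
        &self.tape[self.start..self.end]
    }

    // Copy the text out so the result no longer pins the tape.
    fn to_owned_string(&self) -> String {
        self.as_str().to_string()
    }
}

fn main() {
    let tape: Arc<str> = Arc::from(r#"{"name":"Ada","bio":"...imagine megabytes here..."}"#);
    let watch: Weak<str> = Arc::downgrade(&tape);

    // A small projection that borrows into the big buffer.
    let name = StrSlice { tape: Arc::clone(&tape), start: 9, end: 12 };
    drop(tape);
    // The caller dropped the document, but the projection still pins it.
    assert!(watch.upgrade().is_some());

    // Copy the bytes out, then drop the borrowed view: the buffer is freed.
    let owned = name.to_owned_string();
    drop(name);
    assert!(watch.upgrade().is_none());
    println!("{}", owned); // Ada
}
```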

Practical recipe

For long-lived servers:

// At startup
let engine = JetroEngine::default();

// Per request
let result = engine.collect_bytes(req_body, "$.users.filter(@.active).count()")?;

Plans get cached, parsing is lazy, the pipeline early-terminates. There's typically nothing else to tune.

Builtin Reference — Overview

Jetro ships 181 builtin methods. They fall into 18 categories. Every method has the same shape:

.method(arg1, arg2, …)

…or, when the parser routes through inline path filters and sugar:

$.path.method(...)

This part documents every method. Each entry follows the format:

name (aliases: …)

  • Signature: what it takes and returns
  • Behavior: one-paragraph description
  • Example: at least one minimal runnable example
  • Demand law / Notes: when relevant

Index

| Category | What goes here | Chapter |
|---|---|---|
| Value introspection | type, len, schema, JSON round-trip | Introspection |
| Numeric scalars | ceil, floor, round, abs | Numeric |
| String transforms | upper, trim, pad_*, slice, replace | String |
| String search / regex | starts_with, match_*, captures, split_re | String Search |
| Conversion | to_number, parse_int, parse_bool | Conversion |
| Streaming one-to-one | map, enumerate, pairwise, lag, zscore | Streaming |
| Filtering | filter, find, compact, takewhile | Filtering |
| Expanding | flat_map, flatten, lines, chars | Expanding |
| Reducers | sum, count, any, max_by | Reducers |
| Positional | first, last, nth, collect | Positional |
| Barriers | sort, unique, group_by, window | Barrier |
| Arrays / sets | append, diff, union, zip | Arrays |
| Objects | keys, pick, merge, transform_values | Objects |
| Path mutation | get_path, set_path, set, update | Path Mutation |
| Deep traversal | deep_find, walk, rec | Deep |
| Predicates | has, missing, includes, index | Predicates |
| Tabular | to_csv, to_tsv | Tabular |
| Relational | equi_join | Relational |

Notation in this part

  • aliases — alternative names accepted by the parser. They lower to the same builtin and behave identically.
  • "demand law" — what kind of Demand this builtin propagates upstream. See Demand Propagation for the model.
  • "barrier" / "stream" / "scalar" — execution shape (does it buffer, stream, or run once on a single value).

When a method appears under multiple categories (e.g. .find is both a filter and positional), it lives in the most specific chapter and is cross-linked.

Sharp edges

A small set of v0.5 design choices is documented in Known Limitations: replace substitutes only the first occurrence (use replace_all to replace every occurrence), there is no in operator (use xs has v), and rec(fn) stops after 10 000 iterations when the step never converges (use rec(fn, cond) to bound it). Two engine items remain on the fix-list: rec() with no argument, and a stronger runaway-iteration guard.

Aliases at a glance

| Canonical | Aliases |
|---|---|
| any | exists |
| chunk | batch |
| drop_while | dropwhile |
| take_while | takewhile |
| includes | contains |
| skip | drop |
| sort | sort_by |
| unique | distinct |
| deep_find | ..find (deep-method form) |
| deep_shape | ..shape |
| deep_like | ..like |

These pairs are interchangeable. Pick whichever reads better.

Value Introspection

Methods that report on the kind and shape of a value, plus JSON round-trip.

type

  • Signature: Any -> String
  • Behavior: Returns the kind of value as a string: "null", "bool", "number", "string", "array", "object".
QUERY:  $.x.type()
DOC:    {"x": [1,2,3]}
OUT:    "array"

len

  • Signature: (String|Array|Object) -> Number
  • Behavior: Length: chars for strings, elements for arrays, key count for objects. Errors on null/bool/number.
DOC:    {"s": "hello", "xs": [1,2,3], "o": {"a":1,"b":2}}

QUERY:  $.s.len()     OUT: 5
QUERY:  $.xs.len()    OUT: 3
QUERY:  $.o.len()     OUT: 2

to_string

  • Signature: Any -> String
  • Behavior: Stringifies a scalar (42 → "42", true → "true", null → "null"). For arrays/objects, returns the JSON serialisation.
QUERY:  42.to_string()     OUT: "42"
QUERY:  ([1, 2]).to_string()     OUT: "[1,2]"

to_json

  • Signature: Any -> String
  • Behavior: Compact JSON serialisation of any value.
QUERY:  $.user.to_json()

Distinguish from to_string: for compound values, the two are equivalent; for scalars, to_json always quotes strings ("foo" → "\"foo\""), while to_string does not.

from_json

  • Signature: String -> Any
  • Behavior: Parse a JSON string into a value.
QUERY:  '{"x":1}'.from_json()
OUT:    {"x":1}

QUERY:  $.encoded.from_json().x

Errors on malformed input. Wrap in try if the source is untrusted:

try $.s.from_json() else null

schema

  • Signature: Any -> Object
  • Behavior: Infers a schema sketch — keys, kinds, nullable flags. Useful for "what does this document look like?" probes.
DOC:    [{"id": 1, "name": "a"}, {"id": 2, "name": null}]
QUERY:  $.schema()
OUT:    {"items":{"fields":{"id":{"type":"Int"},"name":{"nullable":true,"type":"String"}},"required":["id"],"type":"Object"},"len":2,"type":"Array"}

The exact output format is documented in builtins/ops/schema.rs; treat it as advisory rather than a stable contract.

Demand notes

  • len over an array is ValueNeed::None upstream — it doesn't decode rows.
  • type is Identity demand-wise.
  • from_json/to_json are scalar transforms with no demand interaction.

Practical examples

# Quick shape check
$.payload.type()                        # → "object"
$.payload.len()                         # for object: number of keys

# Distinguish array length vs string length
$.items.len()                           # array element count
$.title.len()                           # number of characters

# Safe deserialization of a payload field
try $.body.from_json() else null

# Compact serialization
$.event.to_json()

# Stringify any value
$.x.to_string()

# Probe an unknown payload's schema
$.events[0].schema()

Numeric Scalars

Fixture

Examples below run against:

DOC:    {"products": [{"id": 1, "price": 3.7}, {"id": 2, "price": 4.2}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "deltas": [-1, 2, -3, 4], "xs": [1, 2, 3, 4, 5]}

Pure scalar transforms over numbers.

ceil

  • Signature: Number -> Number
  • Behavior: Smallest integer ≥ x.
QUERY:  3.2.ceil()     OUT: 4
QUERY:  (-3.2).ceil() OUT: -3

floor

  • Signature: Number -> Number
  • Behavior: Largest integer ≤ x.
QUERY:  3.7.floor()     OUT: 3
QUERY:  (-3.7).floor() OUT: -4

round

  • Signature: Number -> Number
  • Behavior: Round to nearest; ties round half-away-from-zero.
QUERY:  3.5.round()     OUT: 4
QUERY:  3.4.round()     OUT: 3
QUERY:  (-3.5).round() OUT: -4

abs

  • Signature: Number -> Number
  • Behavior: Absolute value.
QUERY:  (-7).abs()     OUT: 7
QUERY:  3.5.abs()     OUT: 3.5
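These rules match Rust's own f64 methods, which is handy when checking results on the host side. The block uses only the standard library. Note that f64::round, like round above, sends ties away from zero rather than to the nearest even integer.

```rust
fn main() {
    // ceil: smallest integer >= x
    assert_eq!(3.2_f64.ceil(), 4.0);
    assert_eq!((-3.2_f64).ceil(), -3.0);
    // floor: largest integer <= x
    assert_eq!(3.7_f64.floor(), 3.0);
    assert_eq!((-3.7_f64).floor(), -4.0);
    // round: ties go away from zero, not to even
    assert_eq!(3.5_f64.round(), 4.0);
    assert_eq!(3.4_f64.round(), 3.0);
    assert_eq!((-3.5_f64).round(), -4.0);
    assert_eq!(2.5_f64.round(), 3.0); // round-half-to-even would give 2.0
    // abs
    assert_eq!((-7.0_f64).abs(), 7.0);
    println!("all checks passed");
}
```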

Mapping over arrays

These are scalar; lift them with .map:

DOC:    {"xs": [1.4, 2.6, -3.5]}

QUERY:  $.xs.map(@.round())
OUT:    [1,3,-4]

QUERY:  $.xs.map(@.abs()).sum()
OUT:    7.5

See also

Numeric reducers (sum, avg, min, max) live in Reducers. Streaming numeric transforms (zscore, pct_change, cummax, cummin) live in Streaming.

Practical examples

# Round every price up to the nearest dollar
$.products.map(p => p.merge({price_ceil: p.price.ceil()}))

# Fraction → integer percent
($.metric.pct * 100).round()       # 0.5 → 50

# Magnitudes (drop sign)
$.deltas.map(@.abs())

# Whole-unit part of an amount (e.g. dollars from dollars-and-cents)
$.amount.floor()                   # 12.34 → 12

# Build a histogram with binned values
$.measurements.map(m => (m / 10).floor() * 10).count_by(@)
# → {0: 12, 10: 5, 20: 3, ...}

String Transforms

Scalar string operations. Lift with .map to apply to an array of strings.

Case

| Method | What | Example |
|---|---|---|
| upper | ASCII uppercase | "foo".upper() → "FOO" |
| lower | ASCII lowercase | "FOO".lower() → "foo" |
| capitalize | First char upper, rest lower | "foo bar".capitalize() → "Foo bar" |
| title_case | Each word capitalised | "foo bar".title_case() → "Foo Bar" |
| snake_case | Convert to lower_snake_case | "FooBar".snake_case() → "foo_bar" |
| kebab_case | Words joined with - | "FooBar".kebab_case() → "foo-bar" |
| camel_case | fooBar style | "foo_bar".camel_case() → "fooBar" |
| pascal_case | FooBar style | "foo_bar".pascal_case() → "FooBar" |
| reverse_str | Reverse char order | "abc".reverse_str() → "cba" |

Trim

| Method | What |
|---|---|
| trim | Strip whitespace from both ends |
| trim_left | Strip leading whitespace |
| trim_right | Strip trailing whitespace |
QUERY:  "  hi  ".trim()     OUT: "hi"
QUERY:  "  hi  ".trim_left()     OUT: "hi  "

Padding and centering

| Method | Signature | What | Example |
|---|---|---|---|
| pad_left | (width, char?) | Right-align by padding on the left | "7".pad_left(3, "0") → "007" |
| pad_right | (width, char?) | Left-align by padding on the right | "hi".pad_right(5) → "hi   " |
| center | (width, char?) | Center within width | "hi".center(6) → "  hi  " |

If char is omitted, space is used.

Indent / dedent

indent(n) prefixes every line with n spaces; n must be an integer.

QUERY:  "line1\nline2".indent(2)
OUT:    "  line1\n  line2"

dedent() takes the first line's leading whitespace as the prefix and strips it from the first line and from every following line that begins with the same prefix. It is not a common-prefix dedent across all lines:

QUERY:  "  a\n  b".dedent()
OUT:    "a\nb"
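Because this rule is easy to misread, here it is as a Rust sketch. It follows the description above: the first line's indentation is the prefix, and only lines that start with that prefix are changed. dedent is a hypothetical helper. Details such as \r\n line endings and mixed tabs and spaces are not covered.

```rust
// Strip the first line's leading whitespace from every line that starts with it.
fn dedent(s: &str) -> String {
    let first = s.split('\n').next().unwrap_or("");
    let indent_len = first.len() - first.trim_start().len();
    let prefix = &first[..indent_len];
    s.split('\n')
        .map(|line| line.strip_prefix(prefix).unwrap_or(line))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    println!("{:?}", dedent("  a\n  b"));      // "a\nb"
    println!("{:?}", dedent("  a\n    b\nc")); // "a\n  b\nc"
}
```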

Slice

"hello world".slice(0, 5)      # "hello"
"hello world".slice(6)         # "world"
"hello".slice(-3)              # "llo"

slice(start, end?) mirrors Python; end is exclusive.
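The following Rust sketch shows the Python-style rules. It assumes that indices count characters rather than bytes, and that out-of-range indices clamp the way Python's do. slice is a hypothetical helper, not Jetro's implementation.

```rust
// Python-style slice over characters. Negative indices count from the end,
// out-of-range indices are clamped, and `end` is exclusive.
fn slice(s: &str, start: i64, end: Option<i64>) -> String {
    let chars: Vec<char> = s.chars().collect();
    let len = chars.len() as i64;
    let norm = |i: i64| -> usize {
        let i = if i < 0 { i + len } else { i };
        i.clamp(0, len) as usize
    };
    let (a, b) = (norm(start), norm(end.unwrap_or(len)));
    if a >= b {
        return String::new();
    }
    chars[a..b].iter().collect()
}

fn main() {
    println!("{}", slice("hello world", 0, Some(5))); // hello
    println!("{}", slice("hello world", 6, None));    // world
    println!("{}", slice("hello", -3, None));         // llo
}
```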

Repeat

"ab".repeat(3)        # "ababab"

Replace

| Method | Behavior |
|---|---|
| replace(needle, with) | Replace first literal occurrence |
| replace_all(needle, with) | Replace all literal occurrences |
| replace_re(pattern, with) | Regex-aware single replacement |
| replace_all_re(pattern, with) | Regex-aware all replacements |
QUERY:  "hello hello".replace("hello", "hi")
OUT:    "hi hello"

QUERY:  "hello hello".replace_all("hello", "hi")
OUT:    "hi hi"

QUERY:  "abc123def".replace_all_re("\d+", "#")
OUT:    "abc#def"

Regex escapes inside jetro string literals. Use a single backslash: "\d", "\w+", "\s". Jetro string literals pass these sequences to the regex engine unchanged. Doubling ("\\d") sends the regex \\d, which matches a literal backslash followed by d rather than a digit, so the pattern silently fails to match. This differs from host languages like Python or JavaScript, where ordinary string literals require doubled backslashes.

Strip

"prefix-foo".strip_prefix("prefix-")  # "foo"
"foo.txt".strip_suffix(".txt")        # "foo"

If the prefix/suffix isn't present, returns the input unchanged.

Encoding

| Method | What |
|---|---|
| to_base64 | Standard base64 encode |
| from_base64 | Standard base64 decode |
| url_encode | Percent-encode |
| url_decode | Percent-decode |
| html_escape | & → &amp;, < → &lt;, etc. |
| html_unescape | Reverse of html_escape |
QUERY:  "hello world".to_base64()     OUT: "aGVsbG8gd29ybGQ="
QUERY:  "a b".url_encode()     OUT: "a%20b"
QUERY:  "<b>".html_escape()     OUT: "&lt;b&gt;"

Demand notes

All string transforms are Identity demand-wise: they don't change what the upstream needs to produce.

Practical examples

# Normalise display names
$.users.map(u => u.name.trim().title_case())

# Build a URL-safe slug
"My Article Title".lower().replace_all(" ", "-")
# → "my-article-title"

# CamelCase to snake_case migration
"FooBarBaz".snake_case()                # → "foo_bar_baz"

# Truncate with ellipsis
$.posts.map(p => p.body.slice(0, 100) + "..." if p.body.len() > 100 else p.body)

# Parse a comma-separated tag list
$.tags_csv.split(",").map(@.trim())

# Encode for URL
$.query.url_encode()

# Encode binary as base64
$.bytes.to_base64()

# HTML-escape user input
$.comments.map(c => c.text.html_escape())

# Pad a numeric ID for fixed-width keys
($.id as string).pad_left(8, "0")
# → "00000042" for id=42

# Strip a known prefix
"https://example.com/path".strip_prefix("https://")
# → "example.com/path"

# Build a banner
"=".repeat(40)                          # → "========================================"

# Indent a nested message
$.message.indent(4)

String Search and Regex

Predicates (return boolean)

| Method | Behavior |
|---|---|
| is_blank | True if empty or only whitespace |
| is_numeric | True if all chars are digits |
| is_alpha | True if all chars are letters |
| is_ascii | True if all bytes < 128 |
| starts_with(prefix) | Prefix check |
| ends_with(suffix) | Suffix check |
QUERY:  "  ".is_blank()     OUT: true
QUERY:  "abc123".is_numeric()     OUT: false
QUERY:  "hello".starts_with("he")     OUT: true

Position

| Method | Returns |
|---|---|
| index_of(needle) | First index of needle, or -1 |
| last_index_of(needle) | Last index of needle, or -1 |
QUERY:  "hello world".index_of("o")     OUT: 4
QUERY:  "hello world".last_index_of("o")     OUT: 7
"foo bar foo".matches("foo")    # 2 (count of literal occurrences)
"abc 12 cd 34".scan("\d+")     # ["12", "34"] (regex matches as strings)

Regex match

| Method | Returns |
|---|---|
| re_match(pattern) | Boolean |
| match_first(pattern) | First match string, or null |
| match_all(pattern) | Array of all match strings |
| captures(pattern) | First match with groups: [full, g1, g2, …] |
| captures_all(pattern) | Array of captures results |
QUERY:  "a1b2".re_match("\d")     OUT: true
QUERY:  "a1b2".match_first("\d+")     OUT: "1"
QUERY:  "a1b2".match_all("\d+")     OUT: ["1","2"]

QUERY:  "key=val".captures("(\w+)=(\w+)")
OUT:    ["key=val","key","val"]

The ~= operator is sugar for re_match and returns the same boolean.

Splitting

| Method | Behavior |
|---|---|
| split(sep) | Split on literal separator |
| split_re(pattern) | Split on regex |
QUERY:  "a,b,c".split(",")     OUT: ["a","b","c"]
QUERY:  "a,,b".split_re(",+")     OUT: ["a","b"]

Multi-needle membership

"abc def".contains_any(["abc", "xyz"])    # true (matches first)
"abc def".contains_all(["abc", "def"])    # true (all match)

Demand notes

Regex builtins are scalar. Lift across an array with .map(...). The underlying regex is compiled once per query and reused — no per-element re-compilation cost.

Conversion and Parsing

Coerce between value kinds.

to_number

  • Signature: Any -> Number | null
  • Behavior: Coerce to number. "42" → 42, "3.14" → 3.14, true → 1, false → 0. Returns null for unparseable strings.
QUERY:  "42".to_number()     OUT: 42
QUERY:  "3.14".to_number()     OUT: 3.14
QUERY:  "abc".to_number()      OUT: null

to_bool

  • Signature: Any -> Boolean
  • Behavior: Truthiness: false/null/0/""/[]/{} → false, everything else → true.
QUERY:  $.maybe.to_bool()

parse_int(radix?)

  • Signature: String -> Number | null
  • Behavior: Parse a string as integer, optional radix (default 10).
QUERY:  "42".parse_int()     OUT: 42
QUERY:  "ff".parse_int(16)     OUT: 255
QUERY:  "101".parse_int(2)     OUT: 5
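For comparison, Rust's i64::from_str_radix gives the same "number or nothing" shape when its error is mapped to None. parse_int below is a hypothetical helper with the radix defaulting to 10. Two caveats apply to from_str_radix: it rejects prefixes such as 0x or 0b, and it panics on a radix outside 2..=36, hence the guard.

```rust
// parse_int(radix?) modelled as Option: None where Jetro would return null.
fn parse_int(s: &str, radix: Option<u32>) -> Option<i64> {
    let radix = radix.unwrap_or(10);
    if !(2..=36).contains(&radix) {
        return None;
    }
    i64::from_str_radix(s, radix).ok()
}

fn main() {
    println!("{:?}", parse_int("42", None));     // Some(42)
    println!("{:?}", parse_int("ff", Some(16))); // Some(255)
    println!("{:?}", parse_int("101", Some(2))); // Some(5)
    println!("{:?}", parse_int("abc", None));    // None
}
```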

parse_float

  • Signature: String -> Number | null
  • Behavior: Parse a string as float (IEEE 754 double).
QUERY:  "3.14".parse_float()     OUT: 3.14
QUERY:  "1e6".parse_float()     OUT: 1000000.0

parse_bool

  • Signature: String -> Boolean | null
  • Behavior: Strict parse: only "true" and "false" (lowercase) match; everything else returns null.
QUERY:  "true".parse_bool()     OUT: true
QUERY:  "TRUE".parse_bool()     OUT: null

as cast (operator)

The as operator does the same coercions as to_*:

"42" as int          # 42
42 as string         # "42"
true as int          # 1

Use as when the type is statically known; use to_number/parse_* when parsing untrusted strings (since as errors on failure rather than returning null).

Round-trip JSON

For full document round-trip, see from_json/to_json.

Practical examples

# Coerce strings collected from a CSV
$.rows.map(r => r.merge({age: r.age.to_number(), price: r.price.parse_float()}))

# Defensive parse — null on garbage
$.user_input.parse_int() ?? 0

# Boolean coercion of a flag string
"true".parse_bool() ?? false

# Truthiness coercion
$.value.to_bool()               # null/0/""/empty → false; else true

# Cast operator for static conversions
($.id as string).pad_left(8, "0")

# Round-trip number → string → back
(3.14 as string).parse_float()  # → 3.14

Streaming One-to-One

Each input produces exactly one output. These compose freely; the planner fuses adjacent stages into a single composed stage when possible.

Fixture

Examples in this chapter run against:

{
  "users": [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}],
  "xs":    [1, 2, 3, 4, 5],
  "prices":[100, 105, 102, 110, 108, 115]
}

map

  • Signature: Array<A> -> Array<B> (with f: A -> B)
  • Demand law: MapLike — preserves pull, forces Whole.
QUERY:  $.users.map(u => u.name)
OUT:    ["Ada","Bob"]

QUERY:  $.xs.map(@ * 2)
OUT:    [2, 4, 6, 8, 10]

QUERY:  $.users.map(@.name.upper())
OUT:    ["ADA","BOB"]

map is the workhorse. The lambda may use any of the three forms.

enumerate

  • Signature: Array<A> -> Array<{index: Number, value: A}>
  • Behavior: Pair each element with its zero-based index. Output is a record {index, value} per element.
QUERY:  $.xs.enumerate()
OUT:    [{"index":0,"value":1},{"index":1,"value":2},{"index":2,"value":3},{"index":3,"value":4},{"index":4,"value":5}]

QUERY:  $.users.map(@.name).enumerate()
OUT:    [{"index":0,"value":"Ada"},{"index":1,"value":"Bob"}]

pairwise

  • Signature: Array<A> -> Array<[A, A]>
  • Behavior: Yield consecutive pairs [xs[0], xs[1]], [xs[1], xs[2]], …
QUERY:  [1,2,3,4].pairwise()
OUT:    [[1,2],[2,3],[3,4]]

QUERY:  $.xs.pairwise().map(p => p[1] - p[0])
OUT:    [1, 1, 1, 1]
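In Rust the same shape falls out of slice::windows(2). pairwise below is a small illustrative wrapper, not Jetro code.

```rust
// Consecutive overlapping pairs: [a, b, c] -> [(a, b), (b, c)].
fn pairwise<T: Clone>(xs: &[T]) -> Vec<(T, T)> {
    xs.windows(2).map(|w| (w[0].clone(), w[1].clone())).collect()
}

fn main() {
    println!("{:?}", pairwise(&[1, 2, 3, 4])); // [(1, 2), (2, 3), (3, 4)]
    // Deltas, as in the second example above.
    let deltas: Vec<i32> = pairwise(&[1, 2, 3, 4, 5]).iter().map(|(a, b)| b - a).collect();
    println!("{:?}", deltas); // [1, 1, 1, 1]
}
```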

lag(n=1) and lead(n=1)

  • Signature: Array<Number> -> Array<Number | null>
  • Behavior: Shift by n positions; out-of-range positions become null.
  • Numeric: Output values are returned as floats regardless of input numeric type.
QUERY:  $.xs.lag()
OUT:    [null, 1.0, 2.0, 3.0, 4.0]

QUERY:  $.xs.lead()
OUT:    [2.0, 3.0, 4.0, 5.0, null]

QUERY:  $.xs.lag(2)
OUT:    [null, null, 1.0, 2.0, 3.0]
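The following Rust sketch shows the shifting rule. None stands in for null, and every value is widened to f64 as described above. lag and lead are illustrative helpers.

```rust
// lag(n): value from n positions earlier; None where that position does not exist.
fn lag(xs: &[i64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some(xs[i - n] as f64) } else { None })
        .collect()
}

// lead(n): value from n positions later.
fn lead(xs: &[i64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| xs.get(i + n).map(|&x| x as f64))
        .collect()
}

fn main() {
    let xs: [i64; 5] = [1, 2, 3, 4, 5];
    println!("{:?}", lag(&xs, 1));  // [None, Some(1.0), Some(2.0), Some(3.0), Some(4.0)]
    println!("{:?}", lead(&xs, 1)); // [Some(2.0), Some(3.0), Some(4.0), Some(5.0), None]
}
```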

diff_window(n=1)

  • Signature: Array<Number> -> Array<Number | null>
  • Behavior: xs[i] - xs[i - n], with null until lag is satisfied.
QUERY:  $.prices.diff_window()
OUT:    [null, 5.0, -3.0, 8.0, -2.0, 7.0]

pct_change(n=1)

  • Signature: Array<Number> -> Array<Number | null>
  • Behavior: (xs[i] - xs[i-n]) / xs[i-n] — relative change.
QUERY:  [100.0, 110.0, 121.0].pct_change()
OUT:    [null, 0.1, 0.09999999999999998]
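Both formulas can be written as a Rust sketch using illustrative helpers, with None standing in for null. This version does not special-case a zero base in pct_change, so it would produce an infinity or NaN there. That behavior is an assumption of the sketch, not documented Jetro behavior.

```rust
// diff_window(n): xs[i] - xs[i - n], None until i >= n.
fn diff_window(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some(xs[i] - xs[i - n]) } else { None })
        .collect()
}

// pct_change(n): the same delta divided by the earlier value.
fn pct_change(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    (0..xs.len())
        .map(|i| if i >= n { Some((xs[i] - xs[i - n]) / xs[i - n]) } else { None })
        .collect()
}

fn main() {
    let prices = [100.0, 105.0, 102.0, 110.0, 108.0, 115.0];
    println!("{:?}", diff_window(&prices, 1));
    println!("{:?}", pct_change(&[100.0, 110.0], 1)); // [None, Some(0.1)]
}
```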

cummax and cummin

  • Signature: Array<Number> -> Array<Number>
  • Behavior: Running max / min up to and including the current position.
QUERY:  $.prices.cummax()
OUT:    [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]

QUERY:  $.prices.cummin()
OUT:    [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
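Running extremes are a single left-to-right scan. The following Rust sketch uses illustrative helpers and reproduces both outputs above.

```rust
// Each output is the largest value seen so far, including the current one.
fn cummax(xs: &[f64]) -> Vec<f64> {
    let mut best = f64::NEG_INFINITY;
    xs.iter()
        .map(|&x| {
            best = best.max(x);
            best
        })
        .collect()
}

// Each output is the smallest value seen so far, including the current one.
fn cummin(xs: &[f64]) -> Vec<f64> {
    let mut best = f64::INFINITY;
    xs.iter()
        .map(|&x| {
            best = best.min(x);
            best
        })
        .collect()
}

fn main() {
    let prices = [100.0, 105.0, 102.0, 110.0, 108.0, 115.0];
    println!("{:?}", cummax(&prices)); // [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]
    println!("{:?}", cummin(&prices)); // [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
}
```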

zscore

  • Signature: Array<Number> -> Array<Number>
  • Behavior: Standardise: (x - mean) / stddev. Two passes (one for stats, one for transform); not strictly streaming, but presented as a one-to-one stage at the user surface.
QUERY:  [1.0, 2.0, 3.0, 4.0, 5.0].zscore()
OUT:    [-1.414213562373095, -0.7071067811865475, 0.0, 0.7071067811865475, 1.414213562373095]
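A two-pass Rust sketch follows. The example output above (±1.414… for 1 through 5) corresponds to the population standard deviation, which divides by n rather than n - 1, so that is what this version uses. Empty input or zero spread yields NaN here, which may differ from Jetro.

```rust
// Pass 1: mean and population standard deviation. Pass 2: standardise.
fn zscore(xs: &[f64]) -> Vec<f64> {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let sd = var.sqrt();
    xs.iter().map(|x| (x - mean) / sd).collect()
}

fn main() {
    println!("{:?}", zscore(&[1.0, 2.0, 3.0, 4.0, 5.0]));
}
```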

accumulate

See Barriers: accumulate is a barrier because it requires a custom reducer over the full input.

Practical examples

DOC:    {"prices":[100, 105, 102, 110, 108, 115]}

# Apply tax to every price
QUERY:  $.prices.map(@ * 1.08)
OUT:    [108.0, 113.4, 110.16000000000001, 118.80000000000001, 116.64000000000001, 124.2]

# Day-over-day deltas
QUERY:  [100,105,102,110,108].pairwise().map(p => p[1] - p[0])
OUT:    [5, -3, 8, -2]

# Running max ("high-water mark")
QUERY:  $.prices.cummax()
OUT:    [100.0, 105.0, 105.0, 110.0, 110.0, 115.0]

# Lag-1 to compare current vs previous
QUERY:  $.prices.lag()
OUT:    [null, 100.0, 105.0, 102.0, 110.0, 108.0]

Filtering

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "xs": [1, 2, 3, 4, 5]}

Methods that drop elements based on a predicate.

filter

  • Signature: Array<A> -> Array<A> (with pred: A -> Bool)
  • Demand law: FilterLike: FirstInput(n) from downstream becomes UntilOutput(n) upstream.
$.users.filter(u => u.active)
$.users.filter(@.age >= 18)
$.users.filter(@.email ~= "@admin\.")

filter is the universal predicate stage. Combine with .take(n) for bounded scans:

$.events.filter(@.sev >= 3).take(10)

The planner stops reading from the source as soon as 10 events pass — no full scan.

find

  • Signature: Array<A> -> A | null (returns the first match only)
  • Demand law: FilterLike with FirstInput(1) → source.
DOC:    {"users": [{"id":1,"role":"user"},{"id":2,"role":"admin"}]}
QUERY:  $.users.find(@.role == "admin")
OUT:    {"id":2,"role":"admin"}

find returns the first match (or null if none), not an array. Use find_all for the array form.

find_all

  • Signature: Array<A> -> Array<A>
  • Behavior: Like filter. Alias kept for readability.
$.users.find_all(@.role == "admin")

Equivalent to .filter(@.role == "admin"). The two are interchangeable.

compact

  • Signature: Array<Any> -> Array<Any>
  • Behavior: Drop nulls.
QUERY:  [1, null, 2, null, 3].compact()
OUT:    [1,2,3]

Equivalent to .filter(@ != null), but reads better and avoids a lambda.

take_while (alias takewhile)

  • Signature: Array<A> -> Array<A>
  • Behavior: Take elements while pred is true; stop at the first false (don't keep checking).
QUERY:  [1, 2, 3, 4, 1, 2].take_while(@ < 3)
OUT:    [1,2]

Demand law: bounded — terminates the source as soon as pred flips.

drop_while (alias dropwhile)

  • Signature: Array<A> -> Array<A>
  • Behavior: Drop the leading run where pred holds; emit the rest.
QUERY:  [1, 2, 3, 4, 1, 2].drop_while(@ < 3)
OUT:    [3,4,1,2]
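Rust's standard iterator adapters follow the same two rules, so they make a useful mental model. In this sketch, std's `take_while` plays the role of take_while and `skip_while` plays drop_while. The wrapper function names are hypothetical.

```rust
/// Analogue of take_while(@ < bound): keep elements until the predicate first
/// fails, then stop pulling from the source.
pub fn take_while_lt(xs: &[i64], bound: i64) -> Vec<i64> {
    xs.iter().copied().take_while(|&x| x < bound).collect()
}

/// Analogue of drop_while(@ < bound): skip only the leading run where the
/// predicate holds; later elements that match it are kept.
pub fn drop_while_lt(xs: &[i64], bound: i64) -> Vec<i64> {
    xs.iter().copied().skip_while(|&x| x < bound).collect()
}
```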

remove

  • Signature: Array<A> -> Array<A>
  • Behavior: Inverse of filter. Drop elements where pred is true.
QUERY:  $.xs.remove(@ < 0)

Useful when the negated predicate reads worse than the affirmative.

Filtering objects

For object filtering, see filter_keys and filter_values in Objects. They take a predicate over keys / values and return a filtered object.

Practical examples

DOC:    {"users":[
  {"id":1,"name":"Ada","active":true,"age":30},
  {"id":2,"name":"Bob","active":false,"age":24},
  {"id":3,"name":"Cy", "active":true,"age":42}
]}

# Active users only
QUERY:  $.users.filter(@.active)
OUT:    [{"id":1,"name":"Ada","active":true,"age":30},{"id":3,"name":"Cy","active":true,"age":42}]

# Active users over 30, just names
QUERY:  $.users.filter(@.active and @.age >= 30).map(@.name)
OUT:    ["Ada","Cy"]

# First active user (early-exit)
QUERY:  $.users.find(@.active).name
OUT:    "Ada"

# Take while a streak holds
QUERY:  [1,2,3,4,1,2].take_while(@ < 3)
OUT:    [1,2]

# Negate a predicate
QUERY:  $.users.remove(@.active).count()
OUT:    1

# Drop nulls
QUERY:  [1, null, 2, null, 3].compact()
OUT:    [1,2,3]

Worked demand example

DOC:    {"events": [
  {"sev": 1, "msg": "ok"},
  {"sev": 2, "msg": "warn"},
  {"sev": 3, "msg": "err"},
  {"sev": 1, "msg": "ok2"}
]}

QUERY:  $.events.filter(@.sev >= 2).map(@.msg).take(2)
OUT:    ["warn","err"]

Demand walks back: take(2) → FirstInput(2), map → preserves, filter → UntilOutput(2). Source reads events one-by-one, stops after the second match.
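Rust's lazy iterators show the same demand walk in miniature. The sketch below counts source pulls with `inspect`. `Event` and `first_two_warnings` are hypothetical names, and this is an analogy for the behavior described above, not the planner's implementation.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for one element of $.events.
pub struct Event {
    pub sev: u8,
    pub msg: &'static str,
}

/// Lazy analogue of $.events.filter(@.sev >= 2).map(@.msg).take(2).
/// Returns the outputs plus the number of source elements actually pulled.
pub fn first_two_warnings(events: &[Event]) -> (Vec<&'static str>, usize) {
    let reads = Cell::new(0usize);
    let out: Vec<&'static str> = events
        .iter()
        .inspect(|_| reads.set(reads.get() + 1)) // count each pull from the source
        .filter(|e| e.sev >= 2)
        .map(|e| e.msg)
        .take(2) // once two outputs exist, nothing more is pulled
        .collect();
    (out, reads.get())
}
```

With the four events above it returns `(["warn", "err"], 3)`, so the fourth event is never read.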

Expanding Sequences

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}]}

Each input produces zero or many outputs.

flat_map

  • Signature: Array<A> -> Array<B> (with f: A -> Array<B>)
  • Behavior: Map then concatenate.
QUERY:  [[1,2],[3,4]].flat_map(@)
OUT:    [1,2,3,4]

QUERY:  $.users.flat_map(u => u.tags)

If f returns a non-array, it's wrapped first (flat_map(@ + 1) works on numbers).

flatten

  • Signature: Array<Array<A>> -> Array<A>
  • Behavior: One level of flattening.
QUERY:  [[1,2],[3],[4,5]].flatten()
OUT:    [1,2,3,4,5]

To flatten more levels, chain: .flatten().flatten(). Or use walk for full recursive flatten of arbitrary structure.

explode

v0.5 status: calling explode without an argument errors with "explode: missing argument". The spec is intended to mirror chars / to_pairs for the common cases; until it is implemented, use those builtins directly.

  • Signature (intended): (Array | Object | String) -> Array<...>
  • Behavior (intended): Convert to a flat sequence of elements / pairs / chars.
    • Array: identity
    • Object: array of [key, value] pairs (= to_pairs)
    • String: array of single-char strings (= chars)

split(sep)

  • Signature: String -> Array<String>
  • Behavior: Split a string on a literal separator. (See split_re for regex.)
QUERY:  "a,b,c".split(",")
OUT:    ["a","b","c"]

lines

  • Signature: String -> Array<String>
  • Behavior: Split on newline (\n or \r\n).
QUERY:  "a\nb\nc".lines()
OUT:    ["a","b","c"]

words

  • Signature: String -> Array<String>
  • Behavior: Split on whitespace (any run).
QUERY:  "  hello  world  ".words()
OUT:    ["hello","world"]

chars

  • Signature: String -> Array<String>
  • Behavior: Array of single-character strings.
QUERY:  "abc".chars()
OUT:    ["a","b","c"]

chars_of(s)

  • Signature: String -> Array<String>
  • Behavior: Equivalent to s.chars(). Useful when the source is the argument:
QUERY:  ($.text).chars_of()

bytes

  • Signature: String -> Array<Number>
  • Behavior: UTF-8 byte values, 0–255.
QUERY:  "abc".bytes()
OUT:    [97,98,99]

Demand notes

Expanding stages declare an indeterminate output count. Pull demand from downstream still flows back, but the planner can't tightly bound how many inputs are needed — it pulls one input at a time and yields outputs lazily.

.flat_map(...) followed by .first() will read inputs until the first flat-mapped output appears, then stop.
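Here is a Rust iterator analogy for the same behavior: `flat_map` followed by `next()` pulls inputs only until one of them expands to something. The function name and the `Vec<Vec<i32>>` input are illustrative assumptions.

```rust
use std::cell::Cell;

/// Analogue of .flat_map(...).first(): inputs are pulled one at a time and the
/// pipeline stops as soon as the first flattened output appears.
/// Returns that output (None if every input expands to nothing) and the input count read.
pub fn first_flat(groups: &[Vec<i32>]) -> (Option<i32>, usize) {
    let reads = Cell::new(0usize);
    let first = groups
        .iter()
        .inspect(|_| reads.set(reads.get() + 1)) // count each input pulled
        .flat_map(|g| g.iter().copied())
        .next();
    (first, reads.get())
}
```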

Practical examples

# Flatten one level
[[1,2],[3,4],[5]].flatten()                # → [1, 2, 3, 4, 5]

# Tags across all books
$.books.flat_map(@.tags)

# Distinct hashtags across tweets
$.tweets.flat_map(t => t.entities.hashtags.map(@.text)).unique()

# Word histogram from a paragraph
$.text.words().map(@.lower()).count_by(@)

# Parse CSV headers
"id,name,email".split(",")

# Process logs line by line
$.log_blob.lines().filter(@.contains_any(["ERROR","WARN"]))

# Char-level analysis
$.password.chars().count_by(@)             # frequency of each char

# Bytes for a binary diff
"hello".bytes()                            # → [104, 101, 108, 108, 111]

Reducers and Aggregates

Reducers consume the whole stream and emit a single value. They terminate the streaming pipeline.

Numeric

Method  Signature                       Notes
sum     Array<Number> -> Number         Empty → 0
avg     Array<Number> -> Number         Empty → null
min     Array<Number|String> -> ...     Empty → null
max     Array<Number|String> -> ...     Empty → null
QUERY:  [1,2,3,4].sum()     OUT: 10
QUERY:  [1,2,3,4].avg()     OUT: 2.5
QUERY:  [3,1,4,1,5].min()     OUT: 1.0
QUERY:  ["b","a","c"].max()   OUT: "c"

Demand law: NumericReducer: ValueNeed::Numeric, pull = All.

count

  • Signature: Array -> Number
  • Behavior: Element count.
  • Demand: All inputs, ValueNeed::None (no payload decoded).
QUERY:  $.users.count()
QUERY:  $.users.filter(@.active).count()

This is the cheapest reducer — the source skips deserialisation entirely.

approx_count_distinct

Not yet supported in v0.5 — runtime returns "ApproxCountDistinct: builtin unsupported". Spec exists; HyperLogLog backend pending.

  • Signature (planned): Array<Any> -> Number
  • Behavior (planned): Approximate count of distinct values via HLL.

For now, use .unique().count() for exact distinct count.

any (alias exists)

  • Signature: Array<A> -> Bool (with pred: A -> Bool)
  • Behavior: True if any element matches. Short-circuits.
QUERY:  $.users.any(@.role == "admin")
OUT:    true

all

  • Signature: Array<A> -> Bool (with pred: A -> Bool)
  • Behavior: True if every element matches. Short-circuits on first false.
QUERY:  $.flags.all(@ == true)

find_index

  • Signature: Array<A> -> Number | null
  • Behavior: Zero-based index of first match, or null.
QUERY:  ["a","b","c"].find_index(@ == "b")
OUT:    1

indices_where

  • Signature: Array<A> -> Array<Number>
  • Behavior: All indices where pred matches.
QUERY:  [10, 20, 5, 30, 8].indices_where(@ < 15)
OUT:    [0,2,4]

max_by and min_by

  • Signature: Array<A> -> A | null
  • Behavior: Element with the maximum / minimum projected key.
QUERY:  $.books.max_by(@.year)
QUERY:  $.users.min_by(@.age)

Prefer these over .sort(@.key).first() / .last(): max_by and min_by make a single pass, while the sort form allocates the sorted array first.
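Below is a minimal Rust sketch of the one-pass approach. It assumes that ties keep the first maximal element, since this chapter does not specify tie-breaking. The function name is hypothetical.

```rust
/// One-pass max_by: track the best element and its key; no sorted copy is built.
/// Ties keep the first maximal element (an assumption of this sketch).
/// For min_by, negate the key or wrap it in std::cmp::Reverse.
pub fn max_by_key_first<'a, T, K: PartialOrd>(
    items: &'a [T],
    key: impl Fn(&T) -> K,
) -> Option<&'a T> {
    let mut iter = items.iter();
    let mut best = iter.next()?; // empty input → None
    let mut best_key = key(best);
    for item in iter {
        let k = key(item);
        if k > best_key {
            best = item;
            best_key = k;
        }
    }
    Some(best)
}
```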

When to use which

Goal                          Use
Sum/avg numbers               sum, avg
Count rows                    count
Exact distinct count          .unique().count()
Existence check               any
Universal check               all
Find index                    find_index
Pick single max/min element   max_by, min_by

Practical examples

DOC:    {"books":[
  {"title":"Dune","year":1965,"price":15},
  {"title":"Foundation","year":1951,"price":10},
  {"title":"Hyperion","year":1989,"price":18},
  {"title":"Snow Crash","year":1992,"price":12}
]}

# Total revenue across all books
QUERY:  $.books.map(@.price).sum()
OUT:    55

# Mean price
QUERY:  $.books.map(@.price).avg()
OUT:    13.75

# Earliest and most expensive
QUERY:  $.books.min_by(b => b.year).title
OUT:    "Foundation"

QUERY:  $.books.max_by(b => b.price).title
OUT:    "Hyperion"

# Any cyberpunk in the catalog?
QUERY:  $.books.any(@.tags? and @.tags.includes("cyberpunk"))
# (where @.tags? guards against missing field)

# Count books published before 1970
QUERY:  $.books.filter(@.year < 1970).count()
OUT:    2

# Position of the first 1990s book
QUERY:  $.books.find_index(@.year >= 1990)
OUT:    3

# Indices of books priced above 12
QUERY:  $.books.indices_where(@.price > 12)
OUT:    [0,2]

Positional Access

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "transactions": [{"ts": "01"}, {"ts": "02"}, {"ts": "03"}]}

Bounded extraction by position.

first

  • Signature: Array<A> -> A | null
  • Demand law: First — always FirstInput(1).
QUERY:  [10,20,30].first()     OUT: 10
QUERY:  [].first()              OUT: null

QUERY:  $.users.filter(@.active).first()
# Source reads only enough to get one active user.

Equivalent to .nth(0) but reads better and is the canonical "early-exit" sink.

last

  • Signature: Array<A> -> A | null
  • Demand law: Last — always LastInput(1).
QUERY:  [10,20,30].last()     OUT: 30

When the source supports it (an in-memory array, or a tape with known length), last seeks to the end; for streams it must drain.

nth(i)

  • Signature: Array<A> -> A | null
  • Demand law: NthInput(i) if i is non-negative; LastInput(-i) otherwise.
QUERY:  [10,20,30,40].nth(2)     OUT: 30
QUERY:  [10,20,30,40].nth(-1)     OUT: 40

find_first(pred)

  • Signature: Array<A> -> A | null
  • Behavior: Same as find — kept for naming clarity. Use find in new code.

find_one(pred)

  • Signature: Array<A> -> A | null
  • Behavior: Asserts at most one match; errors if more than one matches. Useful for "exactly one user with this id" shapes.
QUERY:  $.users.find_one(@.id == 1)

collect

  • Signature: Any -> Array<Any>
  • Behavior: Coerce to array. Scalar → [scalar]; array → identity; null → [].
QUERY:  42.collect()     OUT: [42]
QUERY:  [1,2].collect()     OUT: [1,2]
QUERY:  null.collect()     OUT: []

Use collect to guarantee an array shape at a pipeline boundary — useful for callers that always want to iterate.

When to use a positional vs. a reducer

first() is a positional sink (returns one element). count() is a reducer (returns one number). Both terminate the pipeline. Use whichever matches your output type.

Worked example

DOC:    {"orders": [
  {"id": 1, "total": 100},
  {"id": 2, "total": 50},
  {"id": 3, "total": 200}
]}

QUERY:  $.orders.filter(@.total > 75).first().id
OUT:    1

QUERY:  $.orders.sort_by(@.total).last().id
OUT:    3

The first query early-exits: the very first order already passes the filter, so the source stops after one element. The second sorts (a barrier), then takes the last; the planner can't avoid the sort.

Practical examples

# First active user — early-exit, demand-aware
$.users.find(@.active).name

# Last log entry of severity 3+ (when the source supports random access)
$.logs.filter(@.sev >= 3).last().msg

# Get a user at known index
$.users.nth(2).email

# Negative-index array tail
$.transactions.nth(-1).ts

# Coerce-or-empty: scalar source becomes a 1-element array
"hello".collect()      # → ["hello"]
null.collect()         # → []

# Use collect() at a method-call boundary so callers always iterate
$.config.tags.collect().map(@.lower())

Barrier Operators

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "books": [{"title": "Dune", "year": 1965, "author": "Herbert", "tags": ["sf"], "price": 15, "genre": "sci-fi"}, {"title": "Foundation", "year": 1951, "author": "Asimov", "tags": ["sf", "hugo"], "price": 10, "genre": "sci-fi"}, {"title": "Hyperion", "year": 1989, "author": "Simmons", "tags": ["sf", "hugo"], "price": 18, "genre": "cyberpunk"}, {"title": "Snow Crash", "year": 1992, "author": "Stephenson", "tags": ["sf", "cyberpunk"], "price": 12, "genre": "cyberpunk"}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}], "daily": [{"day": 1, "value": 10}, {"day": 2, "value": 12}]}

Barriers must see the full input before emitting any output. They materialise. Place them late in pipelines when possible.

Sort

sort (alias sort_by)

  • Signature: Array<A> -> Array<A>
  • Behavior: Stable ascending sort. With a projection, sorts by the projected key.
QUERY:  [3,1,4,1,5].sort()
OUT:    [1,1,3,4,5]

QUERY:  $.books.sort(@.year)
QUERY:  $.books.sort(b => -b.year)
QUERY:  $.users.sort(@.last_name, @.first_name)

Multi-arg form sorts by a tuple of keys: the first key decides the order, and each later key only breaks ties.
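In Rust terms, a multi-key sort is a sort on a tuple key, because tuples compare lexicographically. Here is a sketch using a hypothetical `User` record with just the two name fields:

```rust
/// Hypothetical record with the two fields used in the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub first_name: &'static str,
    pub last_name: &'static str,
}

/// Analogue of .sort(@.last_name, @.first_name): order by the tuple
/// (last_name, first_name), so first_name only breaks ties on last_name.
/// slice::sort_by_key is stable, so fully equal keys keep their input order.
pub fn sort_by_name(users: &mut [User]) {
    users.sort_by_key(|u| (u.last_name, u.first_name));
}
```

For a descending key, `std::cmp::Reverse(key)` plays the role of the negated key in `sort(b => -b.year)`.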

Distinct

unique (alias distinct)

  • Signature: Array<A> -> Array<A>
  • Behavior: Remove duplicates by structural equality, preserving first occurrence order.
QUERY:  [3,1,4,1,5,9,2,6,5].unique()
OUT:    [3,1,4,5,9,2,6]
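The first-occurrence rule can be sketched in Rust with a hash set of already-seen elements. In this sketch, `Hash + Eq` stands in for Jetro's structural equality, and that substitution is an assumption.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Order-preserving dedup: keep each element the first time it is seen.
/// Hash + Eq stands in for structural equality on JSON values.
pub fn unique<T: Hash + Eq + Clone>(xs: &[T]) -> Vec<T> {
    let mut seen: HashSet<&T> = HashSet::new();
    // insert returns false for repeats, so filter drops them.
    xs.iter().filter(|x| seen.insert(*x)).cloned().collect()
}
```

`f64` does not implement `Hash`, so a version over real JSON values would need a canonical key for numbers.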

unique_by(f)

  • Signature: Array<A> -> Array<A>
  • Behavior: Dedup by projected key.
QUERY:  $.books.unique_by(@.author)

Group / count / index

group_by(key)

  • Signature: Array<A> -> Object<KeyString, Array<A>>
  • Behavior: Bucket by projected key.
QUERY:  $.books.group_by(@.author)
# → one bucket per author; every fixture book has a distinct author, so each bucket holds one book

count_by(key)

  • Signature: Array<A> -> Object<KeyString, Number>
  • Behavior: Bucket counts.
QUERY:  $.books.count_by(@.author)
OUT:    {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}

index_by(key)

  • Signature: Array<A> -> Object<KeyString, A>
  • Behavior: Index by key. Last wins on collision.
QUERY:  $.users.index_by(@.id)
# → an object keyed by "1", "2" and "3", each mapping to the full user record

group_shape

Not yet supported in v0.5 — runtime returns "GroupShape: builtin unsupported". Tracked for a future release.

  • Signature (planned): Array<Object> -> Array<Object>
  • Behavior (planned): Group by structural shape (key set).

Partition

partition(pred)

Not yet supported in v0.5 for chained / pipeline use. The apply_* trait dispatch isn't wired through the streaming planner; calling it inside a chain like $.store.books.partition(@.x) is unreliable. Spec exists but output shape and execution path are subject to change.

  • Signature (planned): Array<A> -> [Array<A>, Array<A>]
  • Behavior (planned): [matching, non-matching].

Window / chunk

window(size)

  • Signature: Array<A> -> Array<Array<A>>
  • Behavior: Sliding window of size.
QUERY:  [1,2,3,4,5].window(3)
OUT:    [[1,2,3],[2,3,4],[3,4,5]]

chunk(size) (alias batch)

  • Signature: Array<A> -> Array<Array<A>>
  • Behavior: Non-overlapping chunks. Last chunk may be shorter.
QUERY:  [1,2,3,4,5,6,7].chunk(3)
OUT:    [[1,2,3],[4,5,6],[7]]
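Rust slices provide both shapes directly, which makes a compact reference model. The wrappers in this sketch are hypothetical. std's `windows` and `chunks` panic on a size of 0, and how Jetro handles that case is not covered here.

```rust
/// window(size): overlapping slices, each starting one element later.
/// Yields nothing when size > xs.len(); panics if size == 0 (std behavior).
pub fn window<T: Clone>(xs: &[T], size: usize) -> Vec<Vec<T>> {
    xs.windows(size).map(|w| w.to_vec()).collect()
}

/// chunk(size): non-overlapping slices; the last one may be shorter.
/// Panics if size == 0 (std behavior).
pub fn chunk<T: Clone>(xs: &[T], size: usize) -> Vec<Vec<T>> {
    xs.chunks(size).map(|c| c.to_vec()).collect()
}
```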

Rolling aggregates

Method           Behavior
rolling_sum(n)   Sum over a window of size n
rolling_avg(n)   Average over a window
rolling_min(n)   Min over a window
rolling_max(n)   Max over a window
QUERY:  [1,2,3,4,5].rolling_sum(3)
OUT:    [null,null,6.0,9.0,12.0]

The leading n-1 positions emit null until the window fills.
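Here is a running-total sketch of rolling_sum in Rust, with `None` standing in for null. It is one possible implementation, not necessarily Jetro's. It updates the sum in constant time per element, which can accumulate floating-point rounding on long series. Recomputing each window avoids that drift at a cost proportional to the window size per element.

```rust
/// rolling_sum(n): None until n elements have been seen, then the sum of the
/// last n elements. Assumes n >= 1.
pub fn rolling_sum(xs: &[f64], n: usize) -> Vec<Option<f64>> {
    let mut out = Vec::with_capacity(xs.len());
    let mut acc = 0.0;
    for (i, &x) in xs.iter().enumerate() {
        acc += x;
        if i >= n {
            acc -= xs[i - n]; // drop the element that slid out of the window
        }
        out.push(if i + 1 >= n { Some(acc) } else { None });
    }
    out
}
```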

accumulate(init, fn)

Not yet supported in v0.5 — runtime returns "accumulate: builtin not migrated to builtins.rs AST adapter". Spec exists; runtime hookup pending.

  • Signature (planned): Array<A> -> Array<B> (with fn: (B, A) -> B, init: B)
  • Behavior (planned): Streaming fold producing intermediate states.

For now, use cummax / cummin for running min/max, or build the fold with a let + recursive helper if absolutely needed.
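The planned semantics can be modeled with Rust's `Iterator::scan`, which is a fold that emits its intermediate states. This sketch assumes the output has one state per input element and does not include `init` itself. The eventual Jetro behavior may differ.

```rust
/// Model of the planned accumulate(init, fn): fold left, emitting the state
/// after each element (init itself is not emitted in this sketch).
pub fn accumulate<A, B: Clone>(xs: &[A], init: B, f: impl Fn(&B, &A) -> B) -> Vec<B> {
    xs.iter()
        .scan(init, |state, x| {
            *state = f(state, x); // advance the fold
            Some(state.clone()) // emit the intermediate state
        })
        .collect()
}
```

A running max via `accumulate(xs, i32::MIN, |acc, x| (*acc).max(*x))` reproduces what cummax computes.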

When to barrier

You have to barrier when:

  • Order needs computation (sort, unique)
  • Output is grouped / indexed (group_by, index_by)
  • A window crosses element boundaries (window, rolling_*)

You don't need a barrier for:

  • Per-element transforms (map)
  • Predicates (filter)
  • Numeric reducers (sum, count) — they're streaming reducers, not barriers

Practical examples

DOC:    {"books":[
  {"title":"Dune","year":1965,"author":"Herbert","price":15},
  {"title":"Foundation","year":1951,"author":"Asimov","price":10},
  {"title":"Hyperion","year":1989,"author":"Simmons","price":18},
  {"title":"Snow Crash","year":1992,"author":"Stephenson","price":12}
]}

# Sort by year ascending
QUERY:  $.books.sort(b => b.year).map(@.title)
OUT:    ["Foundation","Dune","Hyperion","Snow Crash"]

# Sort by price descending (negate the key)
QUERY:  $.books.sort(b => -b.price).map(@.title)
OUT:    ["Hyperion","Dune","Snow Crash","Foundation"]

# Distinct tags across books
QUERY:  $.books.flat_map(@.tags).unique()

# How many distinct authors
QUERY:  $.books.unique_by(b => b.author).count()
OUT:    4

# Group by author
QUERY:  $.books.group_by(b => b.author)
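OUT:    {"Herbert":[{"title":"Dune","year":1965,"author":"Herbert","price":15}],"Asimov":[{"title":"Foundation","year":1951,"author":"Asimov","price":10}],"Simmons":[{"title":"Hyperion","year":1989,"author":"Simmons","price":18}],"Stephenson":[{"title":"Snow Crash","year":1992,"author":"Stephenson","price":12}]}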

# Histogram of authors (prefer count_by — no buffering of bucket payloads)
QUERY:  $.books.count_by(b => b.author)
OUT:    {"Herbert":1,"Asimov":1,"Simmons":1,"Stephenson":1}

# Build a quick lookup table
QUERY:  $.users.index_by(u => u.id)

# Sliding-3 windows for moving stats
QUERY:  $.measurements.window(3).map(w => w.sum() / 3)

# Split into batches of 10 for paginated processing
QUERY:  $.records.chunk(10)

# 7-day moving average over a numeric series (project the numbers first)
QUERY:  $.daily.map(@.value).rolling_avg(7)

Array and Set Operations

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "metric": {"pct": 0.5, "value": 7, "x": 10}, "tags_today": ["a", "b", "c"], "tags_yesterday": ["b", "c", "d"], "left_tags": ["a", "b", "c"], "right_tags": ["b", "c", "d"]}

Operations that take an array and produce a derivative array (or join two arrays).

append(v) and prepend(v)

  • Signature: Array<A> -> Array<A>
  • Behavior: Add v to the end / front.
QUERY:  [1,2,3].append(4)     OUT: [1,2,3,4]
QUERY:  [1,2,3].prepend(0)     OUT: [0,1,2,3]

When used as chain-write terminals ($.path.append(v)), they patch the document — see Patch.

reverse

  • Signature: Array<A> -> Array<A>
  • Behavior: Reverse element order. Also works on strings (calls reverse_str).
QUERY:  [1,2,3].reverse()     OUT: [3,2,1]
QUERY:  "abc".reverse()     OUT: "cba"

Set-like operations

Method             Behavior
diff(other)        Elements in self not in other
intersect(other)   Elements in both
union(other)       Elements in either, deduped
QUERY:  [1,2,3,4].diff([3,4,5])     OUT: [1,2]
QUERY:  [1,2,3,4].intersect([3,4,5])     OUT: [3,4]
QUERY:  [1,2,3].union([3,4,5])     OUT: [1,2,3,4,5]

Equality is structural. Order: result preserves first-occurrence order from the left operand.
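Here is a small Rust sketch of the three operations with the ordering rule above. It uses linear `contains` checks for clarity, which makes it quadratic. It also assumes that only union removes duplicates, since the table mentions deduping only for union. The function names are hypothetical.

```rust
/// diff: elements of `a` not present in `b`, in a's order.
pub fn set_diff<T: PartialEq + Clone>(a: &[T], b: &[T]) -> Vec<T> {
    a.iter().filter(|x| !b.contains(*x)).cloned().collect()
}

/// intersect: elements of `a` also present in `b`, in a's order.
pub fn set_intersect<T: PartialEq + Clone>(a: &[T], b: &[T]) -> Vec<T> {
    a.iter().filter(|x| b.contains(*x)).cloned().collect()
}

/// union: a's elements, then b's, keeping only each value's first occurrence.
pub fn set_union<T: PartialEq + Clone>(a: &[T], b: &[T]) -> Vec<T> {
    let mut out: Vec<T> = Vec::new();
    for x in a.iter().chain(b) {
        if !out.contains(x) {
            out.push(x.clone());
        }
    }
    out
}
```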

join(sep)

  • Signature: Array<String> -> String
  • Behavior: Concatenate strings with separator.
QUERY:  ["a","b","c"].join(", ")
OUT:    "a, b, c"

QUERY:  $.users.map(@.name).join(" / ")

For non-string elements, lift with .map(@.to_string()) first.

zip(other) and zip_longest(other, fill?)

  • Signature: Array<A>, Array<B> -> Array<[A, B]>
  • Behavior: Pair element-wise.
QUERY:  [1,2,3].zip(["a","b","c"])
OUT:    [[1,"a"],[2,"b"],[3,"c"]]

QUERY:  [1,2,3].zip(["a","b"])     OUT: [[1,"a"],[2,"b"]]
QUERY:  [1,2,3].zip_longest(["a","b"]) OUT: [[1,"a"],[2,"b"],[3,null]]
QUERY:  [1,2,3].zip_longest(["a"], "x") OUT: [[1,"a"],[2,"x"],[3,"x"]]
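Here is a Rust sketch of the padding rule. For simplicity, both arrays share one element type. It assumes padding applies to whichever side is shorter, because the examples above only show a shorter right-hand side. In std, `Iterator::zip` already has the truncating behavior of plain zip.

```rust
/// zip_longest(other, fill): pair positionally up to the longer length,
/// substituting `fill` for missing elements on either side.
pub fn zip_longest<T: Clone>(a: &[T], b: &[T], fill: T) -> Vec<(T, T)> {
    let len = a.len().max(b.len());
    (0..len)
        .map(|i| {
            let x = a.get(i).cloned().unwrap_or_else(|| fill.clone());
            let y = b.get(i).cloned().unwrap_or_else(|| fill.clone());
            (x, y)
        })
        .collect()
}
```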

fanout(...lambdas)

  • Signature: A -> Array<...>
  • Behavior: Apply each lambda to the same input; collect results.
DOC:    {"x": 10}
QUERY:  $.x.fanout(@ * 2, @ + 1, @.to_string())
OUT:    [20,11,"10"]

Useful for building multi-shape projections without repeating subexpressions.

zip_shape(arrays)

Not yet supported in v0.5 — runtime returns "ZipShape: builtin unsupported". Spec exists; runtime hookup pending.

  • Signature (planned): Object<KeyString, Array<A>> -> Array<Object>
  • Behavior (planned): Combine parallel arrays under shared keys into an array of objects.

The inverse is pivot — see Objects.

Demand notes

Set operations consume both input arrays fully and join consumes its whole input, so all of them are barriers. reverse is a barrier too, but it's cheap and well-supported by demand: reverse().take(n) is rewritten so the source seeks to the end.

Practical examples

# Add an item to a tag list
$.user.tags.append("admin")             # patches the doc

# Join selected field values with a separator
$.user.pick(name, email).values().join(" = ")

# CSV row from selected fields (lift non-strings first; id is a number)
[$.user.id, $.user.name, $.user.email].map(@.to_string()).join(",")

# Set difference — find items missing from a baseline
[1,2,3,4,5].diff([2,4])                 # → [1, 3, 5]

# Set intersection — common items
$.left_tags.intersect($.right_tags)

# Merge unique values, preserving first-occurrence order
$.tags_today.union($.tags_yesterday)

# Last 5 events, newest first (demand-aware: seeks end)
$.events.reverse().take(5)

# Pair two arrays positionally
[1,2,3].zip(["a","b","c"])              # → [[1,"a"],[2,"b"],[3,"c"]]

# Pad shorter array with default
[1,2,3].zip_longest(["a","b"], "?")     # → [[1,"a"],[2,"b"],[3,"?"]]

# Run several projections at once
$.metric.value.fanout(@ * 2, @ + 1, @ - 1)    # → [14, 8, 6]

Object Projection and Transform

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

Methods that read or rewrite objects.

Keys and values

Method     Signature                        Result
keys       Object -> Array<String>          Insertion-order key list
values     Object -> Array<Any>             Insertion-order value list
entries    Object -> Array<[String, Any]>   Key-value pairs
to_pairs   Object -> Array<[String, Any]>   Alias of entries
DOC:    {"a": 1, "b": 2}
QUERY:  $.keys()     OUT: ["a","b"]
QUERY:  $.values()     OUT: [1,2]
QUERY:  $.entries()     OUT: [["a",1],["b",2]]

from_pairs

  • Signature: Array<[String, Any]> -> Object
  • Behavior: Inverse of to_pairs.
QUERY:  [["a",1],["b",2]].from_pairs()
OUT:    {"a":1,"b":2}

invert

  • Signature: Object<K, V> -> Object<V, K>
  • Behavior: Swap keys and values. Values must be coercible to keys (string-like).
QUERY:  {"a":"x","b":"y"}.invert()
OUT:    {"x":"a","y":"b"}

pick(field, ...)

  • Signature: Object -> Object
  • Behavior: Keep only the named keys. Supports renaming with the alias: src form.
DOC:    {"id": 1, "name": "Ada", "secret": "!"}

QUERY:  $.pick(id, name)
OUT:    {"id":1,"name":"Ada"}

QUERY:  $.pick(uid: id, name)
OUT:    {"name":"Ada","uid":1}

Maps over arrays of objects:

$.users.pick(id, email)

is equivalent to $.users.map(u => u.pick(id, email)).

omit(field, ...)

  • Signature: Object -> Object
  • Behavior: Inverse of pick. Drop the named keys.
QUERY:  $.user.omit(secret, password)

Merge

Method              Behavior
merge(other)        Shallow merge; other's keys win on collision
deep_merge(other)   Recursive merge; sub-objects merged, arrays replaced
defaults(other)     Reverse merge; keep self's keys, fill missing from other
QUERY:  {"a":1,"b":2}.merge({"b":99,"c":3})
OUT:    {"a":1,"b":99,"c":3}

QUERY:  {"a":{"x":1}}.deep_merge({"a":{"y":2}})
OUT:    {"a":{"x":1,"y":2}}

QUERY:  {"a":1}.defaults({"a":99,"b":2})
OUT:    {"a":1,"b":2}
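To make the three merge rules concrete, here is a Rust sketch over a hypothetical minimal value type `Val`. It uses `BTreeMap`, which keeps keys sorted rather than in insertion order. Only the merge semantics carry over, not the key order.

```rust
use std::collections::BTreeMap;

/// Hypothetical, minimal JSON-like value for this sketch.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
pub enum Val {
    Num(f64),
    Arr(Vec<Val>),
    Obj(BTreeMap<String, Val>),
}

pub type Map = BTreeMap<String, Val>;

/// merge: shallow; each key from `other` replaces self's value wholesale.
pub fn merge(mut this: Map, other: Map) -> Map {
    this.extend(other);
    this
}

/// deep_merge: two objects under the same key merge recursively; any other
/// pair, including two arrays, resolves to the right-hand value.
pub fn deep_merge(left: Val, right: Val) -> Val {
    match (left, right) {
        (Val::Obj(mut l), Val::Obj(r)) => {
            for (k, rv) in r {
                let merged = match l.remove(&k) {
                    Some(lv) => deep_merge(lv, rv),
                    None => rv,
                };
                l.insert(k, merged);
            }
            Val::Obj(l)
        }
        (_, r) => r,
    }
}

/// defaults: keep every key self already has; copy only missing keys from `other`.
pub fn defaults(mut this: Map, other: Map) -> Map {
    for (k, v) in other {
        this.entry(k).or_insert(v);
    }
    this
}
```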

rename(...mapping)

  • Signature: Object -> Object
  • Behavior: Rename keys per a {old: new, ...} mapping.
QUERY:  $.user.rename({user_id: id, full_name: name})

transform_keys(fn) and transform_values(fn)

  • Signature: Object -> Object
  • Behavior: Apply fn to every key / value.
QUERY:  {"foo": 1, "bar": 2}.transform_keys(@.upper())
OUT:    {"BAR":2,"FOO":1}

QUERY:  {"a": 1, "b": 2}.transform_values(@ * 10)
OUT:    {"a":10,"b":20}

filter_keys(pred) and filter_values(pred)

  • Signature: Object -> Object
  • Behavior: Keep entries whose key / value matches the predicate.
QUERY:  $.config.filter_keys(k => k.starts_with("aws_"))
QUERY:  $.scores.filter_values(@ >= 50)

pivot(rows, cols, value)

  • Signature: Array<Object> -> Object<KeyString, Object>
  • Behavior: Pivot a table-shaped array into a nested object indexed by rows then cols, with value as the leaf.
DOC:    [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY:  $.pivot("y", "q", "v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15}}

implode(joiner=",")

  • Signature: Array<String> -> String
  • Behavior: Like join, but works on object values too:
QUERY:  {"a":"x","b":"y"}.values().implode("/")
OUT:    "x/y"

Demand notes

pick is a powerful demand signal — it tells the source which fields are needed. Over a wide-record document, pick(id, name) upstream of the rest of the pipeline avoids decoding all the other fields.

keys over an array stage emits one row per element, but keys over a single object yields a single value: one array of that object's keys.

Practical examples

DOC:    {"users":[
  {"id":1,"name":"Ada","email":"ada@x.com","secret":"!"},
  {"id":2,"name":"Bob","email":"bob@y.org","secret":"?"}
]}

# Project safe public fields
QUERY:  $.users.map(u => u.pick(id, name, email))

# Drop sensitive keys
QUERY:  $.users.map(u => u.omit(secret))

# Rename in flight
QUERY:  $.users.map(u => u.pick(uid: id, full_name: name, email))

# Keys / values / entries
QUERY:  $.users[0].keys()                  → ["id","name","email","secret"]
QUERY:  $.users[0].values().count()        → 4
QUERY:  $.users[0].entries().count()       → 4

# Round-trip through entries
QUERY:  $.users[0].entries().from_pairs()  → equivalent to $.users[0]

# Merge with defaults (existing keys win)
QUERY:  $.config.defaults({timeout: 30, retries: 3})

# Deep-merge config layers
QUERY:  $.base_config.deep_merge($.user_config)

# Filter object by key prefix
QUERY:  $.env.filter_keys(k => k.starts_with("AWS_"))

# Filter values
QUERY:  $.scores.filter_values(@ >= 50)

# Apply transform to every value
QUERY:  $.prices.transform_values(@ * 1.08)

# Normalise keys to snake_case
QUERY:  $.payload.transform_keys(k => k.snake_case())

# Invert a code-to-name table
QUERY:  $.country_codes.invert()           # {"US":"United States",...} → {"United States":"US",...}

# Pivot long-format records
DOC:    [{"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},{"y":2025,"q":1,"v":15}]
QUERY:  $.pivot("y","q","v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15}}

Path and Structural Mutation

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

Methods that read, set, delete, or rewrite values at specific paths within a document. These work on whole documents or sub-trees.

For chain-write terminals ($.path.set(v)) see Patch. This chapter documents the method-call versions.

get_path(path)

v0.5 quirk: only resolves a single key — get_path("a/b/c") returns null even when $.a.b.c exists. Use direct path navigation ($.a.b.c) when the path is statically known. For dynamic paths, walk manually with let + chained [expr].

  • Signature (intended): Any, String -> Any | null
  • Behavior (intended): Read a value at a slash-separated path.
DOC:    {"user": {"profile": {"name": "Ada"}}}
QUERY:  $.get_path("user")
OUT:    {"profile":{"name":"Ada"}}
QUERY:  $.get_path("user/profile")
OUT:    {"name":"Ada"}               # intended; v0.5 returns null (single-key quirk above)

set_path(path, value)

  • Signature: Any, String, Any -> Any
  • Behavior: Return a copy with value written at path. Creates intermediate objects as needed.
QUERY:  $.set_path("user/profile/email", "ada@example.com")

del_path(path)

  • Signature: Any, String -> Any
  • Behavior: Return a copy with the leaf at path removed.
QUERY:  $.del_path("user/secret")

del_paths(paths)

  • Signature: Any, Array<String> -> Any
  • Behavior: Remove all listed paths in one pass. Cheaper than chained del_path for many removals.
QUERY:  $.del_paths(["user/secret", "user/temp", "session/csrf"])

has_path(path)

  • Signature: Any, String -> Bool
  • Behavior: True if a value exists at path. Distinguishes "missing" from "explicit null":
DOC:    {"a": null}
QUERY:  $.has_path("a")     OUT: true
QUERY:  $.has_path("b")     OUT: false

flatten_keys(sep=".")

  • Signature: Object -> Object
  • Behavior: Flatten a nested object into a single-level object with joined keys.
DOC:    {"a": {"b": 1, "c": 2}, "d": 3}
QUERY:  $.flatten_keys()
OUT:    {"a.b":1,"a.c":2,"d":3}

QUERY:  $.flatten_keys(".")
OUT:    {"a.b":1,"a.c":2,"d":3}
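A sketch of the flattening rule, assuming a hypothetical Json enum with only numbers and objects (arrays are left out). The rule is to recurse into non-empty objects, join each key onto its parent's key with sep, and emit everything else as a leaf. Treating empty nested objects as leaves is a choice made by this sketch, not documented jetro behavior.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value: numbers and objects only.
#[derive(Clone, Debug, PartialEq)]
enum Json {
    Num(f64),
    Obj(BTreeMap<String, Json>),
}

/// Flatten a nested object into a single-level object with sep-joined keys.
fn flatten_keys(obj: &BTreeMap<String, Json>, sep: &str) -> BTreeMap<String, Json> {
    let mut out = BTreeMap::new();
    for (k, v) in obj {
        flatten_into(v, sep, k.clone(), &mut out);
    }
    out
}

fn flatten_into(value: &Json, sep: &str, key: String, out: &mut BTreeMap<String, Json>) {
    match value {
        // Non-empty objects: descend, extending the key with "<sep><child>".
        Json::Obj(map) if !map.is_empty() => {
            for (k, v) in map {
                flatten_into(v, sep, format!("{}{}{}", key, sep, k), out);
            }
        }
        // Everything else (numbers, empty objects) is emitted under the joined key.
        leaf => {
            out.insert(key, leaf.clone());
        }
    }
}
```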

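unflatten_keys(sep=".")

  • Signature: Object -> Object
  • Behavior: Inverse of flatten_keys. Keys are split on sep, so with the default "." separator, slash-joined keys pass through unchanged; pass "/" to split them.
QUERY:  {"a/b": 1, "a/c": 2}.unflatten_keys()
OUT:    {"a/b":1,"a/c":2}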

set(path, value) (method-call form)

  • Signature: Any, String, Any -> Any
  • Behavior: Same as set_path. Kept for ergonomic chains.

The chain-write terminal $.path.set(v) is different — it's parsed as a patch and operates on the rooted document path.

update

update is jetro's functional batched update. Two surfaces:

Object body — update({k: expr, ...})

Apply a set of field updates to one or more selected subtrees. Plain keys update fields below the receiver; quoted keys carry full paths.

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf", "hugo"]}
]}

QUERY:  $.books[*].update({tags: tags.append("test"), reviewed: true})
OUT:    {"books":[{"reviewed":true,"tags":["sf","test"],"title":"Dune","year":1965},{"reviewed":true,"tags":["sf","hugo","test"],"title":"Hyperion","year":1989}]}

Each selected book gets both fields written. Plain identifiers (tags, reviewed) are read against the selected snapshot — not the mid-batch document — so two ops on the same target both see the original field values.

Body forms:

Form                       Meaning
field: expr                Write expr into field of each selected target
"a.b.c": expr              Write into a nested path inside each selected target
"books[*].tags": expr      Quoted path key: full root-relative path with wildcards/filters
field: expr when cond      Skip when cond is falsy
field: DELETE              Remove the field (with optional when)

@ inside the body is the current value at the target field (handy inside path keys); $ is the original root.

QUERY:  $.books[*].update({tags: tags.append("modern") when year > 1980})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","hugo","modern"],"title":"Hyperion","year":1989}]}

Root-level batch with quoted paths

When the receiver is $, quoted keys carry full paths, including wildcards and DELETE:

DOC:    {"books": [{"tags": ["sf"]}], "active": true}
QUERY:  $.update({"books[*].tags": @.append("test"), active: false})
OUT:    {"active":false,"books":[{"tags":["sf","test"]}]}

DOC:    {"users": [{"id":1,"secret":"a"}, {"id":2,"secret":"b"}]}
QUERY:  $.update({"users[*].secret": DELETE})
OUT:    {"users":[{"id":1},{"id":2}]}

Filtered wildcard [* if pred]

Both selectors and quoted path keys support a filtered wildcard:

DOC:    {"books": [
  {"title": "Dune", "year": 1965, "tags": ["sf"]},
  {"title": "Hyperion", "year": 1989, "tags": ["sf"]}
]}

QUERY:  $.books[* if year > 1980].update({tags: tags.append("modern")})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

QUERY:  $.update({"books[* if year > 1980].tags": @.append("modern")})
OUT:    {"books":[{"tags":["sf"],"title":"Dune","year":1965},{"tags":["sf","modern"],"title":"Hyperion","year":1989}]}

Two-argument path form — update(path, expr)

The classic shape: a slash- or dot-separated path plus an expression. @ inside the expression is the current value at path.

DOC:    {"counters": {"visits": 10, "clicks": 3}}
QUERY:  $.update("counters.visits", @ + 1)
OUT:    {"counters":{"clicks":3,"visits":11}}

QUERY:  $.update("counters/visits", @ + 1)
OUT:    {"counters":{"clicks":3,"visits":11}}

Semantics

Property             Behavior
Snapshot reads       Each body expression sees the pre-batch values, not partial mid-batch state
Order                Ops apply in source order; last write wins on overlap
Selectors            Index, wildcard [*], filtered wildcard [* if pred], and nested chains all work
Scalar targets       An update with an object body promotes scalar elements to objects ({seen: true} over [1,2] → [{seen:true},{seen:true}])
Untouched subtrees   Preserved by Arc sharing; unrelated fields are not deep-copied
Empty body           .update({}) is a no-op and returns the doc unchanged
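The snapshot-read and last-write-wins rows can be modelled in two phases. First evaluate every body expression against the unchanged target, then apply the results in source order. The sketch below makes simplifying assumptions: a record is a map of integer fields, Expr is a hypothetical two-case expression type, and a missing field reads as 0 rather than null. The swap case shows why snapshot reads matter: {a: b, b: a} exchanges the two fields instead of copying one value into both.

```rust
use std::collections::BTreeMap;

type Record = BTreeMap<String, i64>;

/// Hypothetical body expression: a constant or a read of another field.
enum Expr {
    Const(i64),
    Field(&'static str),
}

fn eval(expr: &Expr, snapshot: &Record) -> i64 {
    match expr {
        Expr::Const(n) => *n,
        // Assumption of this sketch: a missing field reads as 0.
        Expr::Field(f) => snapshot.get(*f).copied().unwrap_or(0),
    }
}

fn update(target: &Record, body: &[(&str, Expr)]) -> Record {
    // Phase 1: every expression sees the pre-batch snapshot.
    let results: Vec<(&str, i64)> = body
        .iter()
        .map(|(field, expr)| (*field, eval(expr, target)))
        .collect();
    // Phase 2: apply in source order; a later write to the same field wins.
    let mut out = target.clone();
    for (field, value) in results {
        out.insert(field.to_string(), value);
    }
    out
}
```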

Worked example

DOC:    {"users": [
  {"id": 1, "secret": "a", "name": "Ada"},
  {"id": 2, "secret": "b", "name": "Bob"}
]}

QUERY:  $.users.map(u => u.del_paths(["secret"]).set_path("display", u.name))
OUT (intended): [{"display":"Ada","id":1,"name":"Ada"},{"display":"Bob","id":2,"name":"Bob"}]

Demand notes

Path-mutation methods produce a full result and can't tell the source what fields they need (the path is data, not statically analysable). When the path is a literal, prefer pick/omit/set over get_path/set_path — the planner can use literal field names.

Practical examples

# Single-key write (preferred over set_path for v0.5)
$.user.name.set("Ada Lovelace")                  # chain-write

# Set a field deep
patch $ { user.profile.email: "ada@x.com" }

# Bulk delete
$.del_paths(["secret","temp","csrf"])

# Flatten a nested config for environment-variable export
$.config.flatten_keys(".")                       # {"db.host":..., "db.port":..., ...}

# Round-trip via flatten/unflatten
$.config.flatten_keys().unflatten_keys()         # ≈ $.config

# Existence test before write
patch $ {
  email: $.user.email when $.has_path("user.email")
}

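# Flat-key patches: each entry yields its own copy of $ with one path written.
# Flatten with "/" so the keys use the separator set_path expects.
$.patch_set.flatten_keys("/").entries().map(e => $.set_path(e[0], e[1]))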

Deep Traversal and Recursion

Walk every descendant value in DFS pre-order. The deep methods are also available as ..method(...) syntax sugar in path position.

deep_find(pred) (or ..find(pred))

  • Signature: Any, (Any -> Bool) -> Array<Any>
  • Behavior: Every descendant satisfying pred. Order: DFS pre-order.
DOC:    {"a": {"x": 1}, "b": [{"x": 2}, {"y": 3}]}
QUERY:  $..find(@.x?)
OUT:    [{"x":1},{"x":2}]

QUERY:  $.deep_find(@ is number)
OUT:    [1,2,3]

When the structural index is available, deep_find runs over a bitmap representation in jetro-experimental rather than walking Val nodes — significantly faster for shallow predicates.

deep_shape({k1, k2, ...}) (or ..shape({...}))

  • Signature: Any -> Array<Object>
  • Behavior: Every object that has all listed keys (regardless of value).
DOC:    [{"id":1,"name":"a"},{"id":2},{"name":"c","id":3}]
QUERY:  $..shape({id, name})
OUT:    [{"id":1,"name":"a"},{"id":3,"name":"c"}]

deep_like({k1: v1, ...}) (or ..like({...}))

  • Signature: Any -> Array<Object>
  • Behavior: Every object whose listed keys equal the listed literal values.
DOC:    [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942},{"author":"Herbert","year":1965}]
QUERY:  $..like({author: "Asimov"})
OUT:    [{"author":"Asimov","year":1951},{"author":"Asimov","year":1942}]

walk(fn)

  • Signature: Any, (Any -> Any) -> Any
  • Behavior: Apply fn to every node bottom-up; rebuild the tree.
QUERY:  $.walk(node => node.upper() if node is string else node)
# Returns the document with every string node uppercased.

walk_pre(fn)

  • Signature: Any, (Any -> Any) -> Any
  • Behavior: Like walk, but pre-order — fn sees parent before children.

Use walk_pre when the transform decides whether to recurse based on the node's identity (e.g. "stop at leaves of kind X").
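To make the bottom-up vs pre-order distinction concrete, here is a sketch over a hypothetical two-variant tree (strings and lists standing in for JSON). walk_post rewrites the children first and then hands the rebuilt parent to fn. walk_pre hands the parent to fn first and then recurses into whatever fn returned, so in this sketch a pre-order transform can stop recursion by returning a leaf. The names mirror jetro's builtins, but the code is illustrative, not jetro's implementation.

```rust
/// Hypothetical minimal tree: string leaves and lists.
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Str(String),
    List(Vec<Node>),
}

/// Bottom-up: children are rewritten first, then `f` sees the rebuilt parent.
fn walk_post(node: Node, f: &mut impl FnMut(Node) -> Node) -> Node {
    let rebuilt = match node {
        Node::List(items) => {
            let mut out = Vec::with_capacity(items.len());
            for item in items {
                out.push(walk_post(item, f));
            }
            Node::List(out)
        }
        leaf => leaf,
    };
    f(rebuilt)
}

/// Pre-order: `f` sees the parent first; recursion continues into its result.
fn walk_pre(node: Node, f: &mut impl FnMut(Node) -> Node) -> Node {
    match f(node) {
        Node::List(items) => {
            let mut out = Vec::with_capacity(items.len());
            for item in items {
                out.push(walk_pre(item, f));
            }
            Node::List(out)
        }
        leaf => leaf,
    }
}

/// Short label used to record visit order.
fn label(n: &Node) -> String {
    match n {
        Node::Str(s) => s.clone(),
        Node::List(items) => format!("list{}", items.len()),
    }
}
```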

rec(pattern, fn)

Unstable in v0.5 — observed runtime error "rec: exceeded 10000 iterations without reaching fixpoint" even on simple inputs. Spec exists but the fixpoint loop is buggy. Avoid in production until fixed; track migration progress in the issue tracker.

  • Signature (planned): Any, Pattern, (Any -> Any) -> Any
  • Behavior (planned): Match-and-rewrite. Recursively walks; replaces every match with fn(match).

This is the recursive sibling of Pattern Match; useful for AST rewrites and document migrations.

trace_path(pred)

  • Signature: Any, (Any -> Bool) -> Array<Object>
  • Behavior: For every node matching pred, return an object holding the path from root to the node (as a path string) and the matched value.
DOC:    {"a": {"x": 1}, "b": [{"x": 2}]}
QUERY:  $.trace_path(@.x?)
OUT:    [{"path":"$.a","value":{"x":1}},{"path":"$.b[0]","value":{"x":2}}]

Each path is written in $.key[i] notation, not the slash-separated form set_path takes, so convert it before pairing the two for find-and-replace.
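A minimal sketch of the traversal trace_path describes, assuming a simplified Json enum (numbers, arrays, and objects only). It does a DFS pre-order walk, builds the path string as it descends ($.key for object fields, [i] for array elements), and records every node the predicate accepts. The sketch visits object keys in sorted order because it uses BTreeMap, and it also tests the root itself; the source does not specify either detail for the jetro builtin.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Json {
    Num(f64),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Return (path, value) for every node accepted by `pred`, in DFS pre-order.
fn trace_path(root: &Json, pred: &dyn Fn(&Json) -> bool) -> Vec<(String, Json)> {
    let mut hits = Vec::new();
    visit(root, "$".to_string(), pred, &mut hits);
    hits
}

fn visit(node: &Json, path: String, pred: &dyn Fn(&Json) -> bool, hits: &mut Vec<(String, Json)>) {
    // Pre-order: test the node before descending into its children.
    if pred(node) {
        hits.push((path.clone(), node.clone()));
    }
    match node {
        Json::Obj(map) => {
            for (k, v) in map {
                visit(v, format!("{}.{}", path, k), pred, hits);
            }
        }
        Json::Arr(items) => {
            for (i, v) in items.iter().enumerate() {
                visit(v, format!("{}[{}]", path, i), pred, hits);
            }
        }
        Json::Num(_) => {}
    }
}
```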

Deep match

The pattern-match construct has deep variants ..match and ..match! — see Control Flow and the pattern-match cookbook.

When the bitmap kicks in

Deep search uses the structural index when:

  • The query is rooted at $.. or .deep_*
  • The predicate is a shape/key check (not a complex lambda)
  • The document was loaded with the simd-json tape (default)

You don't enable this — it's selected by the planner.

Demand notes

Deep traversals declare All upstream by nature. The optimisation surface is the predicate: shape and like checks bypass the per-node lambda evaluation entirely.

Practical examples

# Find every node with an "id" key (anywhere in the tree)
$..find(@.id?)

# Find all numbers
$..find(@ is number)

# Every object that has both id + name keys
$..shape({id, name})

# Every object where a field equals a specific value
$..like({status: "error"})

# Locate an event by ID inside a deeply nested tree
$..match! { {id: 42} -> @, _ -> null }

# Walk every node, transforming strings to upper
$.walk(node => node.upper() if node is string else node)

# Trace paths from root to nodes matching a predicate
$.trace_path(@.is_admin?)
# → one {"path": ..., "value": ...} object per match, e.g. {"path":"$.users[0]","value":{...}}

# Bulk audit: find every "secret"-named field
$..find(@.secret?)

Membership and Predicates

Tests and small helpers.

or(default)

  • Signature: Any, Any -> Any
  • Behavior: If self is null, return default. Otherwise return self.
QUERY:  null.or("default")     OUT: "default"
QUERY:  "hi".or("default")     OUT: "hi"

Equivalent to ?? default but reads better in chains:

$.user.name.or("anon")

has(key)

  • Signature: Object|Array, KeyOrIndex -> Bool
  • Behavior: True if the key exists (objects) or index is in range (arrays).
QUERY:  {"a":1,"b":2}.has("a")     OUT: true
QUERY:  {"a":1}.has("b")     OUT: false
QUERY:  [1,2,3].has(2)     OUT: true
QUERY:  [1,2,3].has(5)     OUT: false

The has operator (x has y) is sugar for x.includes(y) — distinct from this method.

missing(...keys)

Broken in v0.5: empirically returns false instead of the array of missing keys. Compute it manually until fixed:

["host", "port", "user"].filter(k => not $.config.has(k))

  • Signature (intended): Object, ...String -> Array<String>
  • Behavior (intended): Return the subset of provided keys that are not present.

includes(value) (alias contains)

  • Signature: Array|String, Any -> Bool
  • Behavior: Membership.
QUERY:  [1,2,3].includes(2)           OUT: true
QUERY:  "hello".includes("ell")       OUT: true

index(value)

  • Signature: Array|String, Any -> Number | null
  • Behavior: Index of first occurrence; null if not found.
QUERY:  [10,20,30].index(20)          OUT: 1
QUERY:  [10,20,30].index(99)          OUT: null

For strings, see also index_of in String Search.

indices_of(value)

  • Signature: Array|String, Any -> Array<Number>
  • Behavior: All indices of value.
QUERY:  [1,2,3,2,1].indices_of(2)
OUT:    [1, 3]

Quick comparison: predicates that look similar

Pattern                  Returns
xs.has("foo")            Bool: does the key/index exist?
xs.includes("foo")       Bool: is the value present?
xs.index("foo")          Number|null: where?
xs.indices_of("foo")     Array: all positions
xs.find(p)               A|null: first matching element
xs.find_index(p)         Number|null: first matching index

Practical examples

# Default for missing field
$.user.email.or("no-email@example.com")

# Existence check on key
$.config.has("aws_region")

# Index of a value (not the predicate form)
$.tags.index("admin")

# All positions of duplicates
[1, 2, 1, 3, 1].indices_of(1)            # → [0, 2, 4]

# Membership in a set
$.tags.includes("urgent")

# Allow-list / deny-list patterns
$.role.includes("admin") and not $.banned_users.includes($.id)

Tabular Output

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "logs": [{"ts": "10:00", "sev": 1, "msg": "start"}, {"ts": "10:05", "sev": 3, "msg": "fail"}, {"ts": "10:10", "sev": 2, "msg": "warn"}], "tweets": [{"id": 1, "text": "#foo", "entities": {"hashtags": [{"text": "foo"}]}}, {"id": 2, "text": "#bar #foo", "entities": {"hashtags": [{"text": "bar"}, {"text": "foo"}]}}], "records": [{"id": 1, "name": "a", "email": "x@y.com"}, {"id": 2, "name": "b", "email": "u@v.com"}]}

Serialise sequences of objects to row-oriented text formats.

to_csv(headers?)

  • Signature: Array<Object> -> String
  • Behavior: RFC-4180-ish CSV. Without arguments, the header set is the union of all object keys, in order of first appearance.
DOC:    [{"name":"Ada","age":36},{"name":"Bob","age":42}]
QUERY:  $.to_csv()
OUT:
"name,age
Ada,36
Bob,42"

With explicit headers:

QUERY:  $.to_csv(["age","name"])
OUT:
"age,name
36,Ada
42,Bob"

Strings containing commas, quotes, or newlines are quoted and escaped per RFC 4180.
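The header rule and the quoting rule can be sketched together. This sketch assumes each row is an ordered list of (key, cell) pairs with values already rendered as text, and that a row missing a header key gets an empty cell; the source documents neither detail for jetro. Quoting follows RFC 4180: wrap the field in double quotes when it contains a comma, quote, or line break, and double any embedded quote. Lines are joined with "\n" to match the output above, rather than the CRLF that RFC 4180 specifies.

```rust
/// Quote a field per RFC 4180 when it contains a comma, a double quote,
/// or a line break; embedded quotes are doubled.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| c == ',' || c == '"' || c == '\n' || c == '\r') {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Rows are ordered (key, cell) lists standing in for JSON objects.
fn to_csv(rows: &[Vec<(&str, String)>]) -> String {
    // Header set: union of keys, in order of first appearance.
    let mut headers: Vec<&str> = Vec::new();
    for row in rows {
        for (k, _) in row {
            if !headers.contains(k) {
                headers.push(*k);
            }
        }
    }
    let mut lines = vec![headers.iter().map(|h| csv_field(h)).collect::<Vec<_>>().join(",")];
    for row in rows {
        // One cell per header; a missing key becomes an empty cell (sketch assumption).
        let cells: Vec<String> = headers
            .iter()
            .map(|h| row.iter().find(|(k, _)| k == h).map(|(_, v)| csv_field(v)).unwrap_or_default())
            .collect();
        lines.push(cells.join(","));
    }
    lines.join("\n")
}
```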

to_tsv(headers?)

  • Signature: Array<Object> -> String
  • Behavior: Same as to_csv but tab-separated. No quoting (tab-in-value is replaced with a space).
QUERY:  $.users.to_tsv(["id","email"])

Composing with the rest of the pipeline

Build a report:

$.users
  .filter(@.active)
  .map(u => u.pick(id, name, email))
  .sort(@.id)
  .to_csv()

Pipe to a file from the CLI:

jetrocli '$.users.filter(@.active).map(u => u.pick(id, name)).to_csv()' < users.json > out.csv

Limitations

  • Nested values are JSON-encoded into the cell. For deeply-nested structures, flatten first with flatten_keys:
    $.records.map(r => r.flatten_keys()).to_csv()
    
  • The format is row-major. To reshape long-format records into a wide table first, use pivot / zip_shape.
  • For Excel-flavored CSV (BOM, CRLF), post-process the result.

Practical examples

# Active-user export
$.users.filter(@.active).map(u => u.pick(id, name, email)).sort(u => u.id).to_csv()

# Daily sales report (use e[0]/e[1] indexing — array-pattern destructure
# inside a lambda doesn't parse in v0.5)
$.sales.group_by(s => s.day).entries().map(e => {
  day:   e[0],
  total: e[1].map(@.amount).sum(),
  count: e[1].count()
}).to_csv()

# Hashtag frequency CSV
$.tweets.flat_map(t => t.entities.hashtags.map(@.text))
  .count_by(@)
  .entries()
  .map(e => {tag: e[0], count: e[1]})
  .to_csv()

# TSV for log shipping
$.logs.map(l => l.pick(ts, sev, msg)).to_tsv()

Relational

Fixture

Examples below run against:

DOC:    {"orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "customers": [{"id": 1, "name": "Ada", "email": "ada@x.com"}, {"id": 2, "name": "Bob", "email": "bob@y.org"}], "left": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}], "right": [{"uid": 1, "role": "admin"}, {"uid": 2, "role": "user"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}]}

Operations that combine two arrays of objects on a key.

equi_join(other, leftKey, rightKey, fn?)

  • Signature: Array<L>, Array<R>, KeyL, KeyR, ((L, R) -> Any)? -> Array<Any>
  • Behavior: Inner equi-join: for every pair (l, r) where l[leftKey] == r[rightKey], emit a result. If fn is omitted, the result is the merged object l.merge(r).
LEFT:   [{"id":1,"name":"Ada"},{"id":2,"name":"Bob"}]
RIGHT:  [{"uid":1,"role":"admin"},{"uid":2,"role":"user"}]

QUERY:  $.left.equi_join($.right, "id", "uid")
OUT:    [{"id":1,"name":"Ada","uid":1,"role":"admin"},
         {"id":2,"name":"Bob","uid":2,"role":"user"}]

QUERY:  $.left.equi_join($.right, "id", "uid", (l, r) => {
          name: l.name,
          role: r.role
        })
OUT:    [{"name":"Ada","role":"admin"},{"name":"Bob","role":"user"}]

Worked example: orders + customers

DOC:
{
  "customers": [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob"}
  ],
  "orders": [
    {"customer": 1, "amount": 100},
    {"customer": 1, "amount": 50},
    {"customer": 2, "amount": 75}
  ]
}

QUERY:
  $.orders.equi_join($.customers, "customer", "id", (o, c) => {
    customer: c.name,
    amount: o.amount
  })

OUT:
  [
    {"customer":"Ada","amount":100},
    {"customer":"Ada","amount":50},
    {"customer":"Bob","amount":75}
  ]

Notes and limitations

  • Inner only. No outer joins. For "all left, fill missing right with null" you can hand-roll:
    $.left.map(l =>
      l.merge($.right.find(@.uid == l.id).or({role: null}))
    )
    
  • Equality only. No range, prefix, or function joins.
  • One key on each side. For multi-key joins, project a tuple key first:
    $.left.map(l => l.merge({_k: [l.a, l.b]}))
         .equi_join($.right.map(r => r.merge({_k: [r.x, r.y]})), "_k", "_k")
    
  • The implementation builds a hash on the right side and streams the left. If either side is large and only a subset matters, filter it before joining.
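The build-right, stream-left shape described above is a classic hash join. Below is a generic Rust sketch; the function name and key-extractor closures are illustrative, not jetro's API. It indexes the right side once and then does one lookup per left row, which keeps the cost linear in the input sizes plus the number of matches instead of the O(n×m) of a nested find. Right rows with duplicate keys each produce a result, and unmatched left rows are dropped, as in any inner join.

```rust
use std::collections::HashMap;
use std::hash::Hash;

struct Left { id: u32, name: &'static str }
struct Right { uid: u32, role: &'static str }

/// Inner equi-join: build a hash index on `right`, then stream `left`.
fn equi_join<L, R, K, Out>(
    left: &[L],
    right: &[R],
    left_key: impl Fn(&L) -> K,
    right_key: impl Fn(&R) -> K,
    combine: impl Fn(&L, &R) -> Out,
) -> Vec<Out>
where
    K: Hash + Eq,
{
    // Build phase: key -> every right row with that key.
    let mut index: HashMap<K, Vec<&R>> = HashMap::new();
    for r in right {
        index.entry(right_key(r)).or_default().push(r);
    }
    // Probe phase: one hash lookup per left row.
    let mut out = Vec::new();
    for l in left {
        if let Some(matches) = index.get(&left_key(l)) {
            for &r in matches {
                out.push(combine(l, r));
            }
        }
    }
    out
}
```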

When to choose join vs. lookup

For "many left rows, lookup one field on each":

$.orders.map(o => o.merge({customer_name: $.customers.find(@.id == o.customer).name}))

This nested find is O(n×m) — fine for small data. For large data, use equi_join (O(n+m)) or build a lookup table first:

let by_id = $.customers.index_by(@.id) in
  $.orders.map(o => o.merge({customer_name: by_id[o.customer].name}))

Practical examples

# Enrich orders with customer info
$.orders.equi_join($.customers, "customer_id", "id")

# Custom result shape
$.orders.equi_join($.customers, "customer_id", "id", (o, c) => {
  order_id: o.id,
  total: o.amount,
  buyer: c.name,
  email: c.email
})

# Self-join: pair every two records that share a key (each record also pairs with itself)
$.events.equi_join($.events, "session_id", "session_id", (a, b) => {a, b})

# Multi-key join via a composite string key
let lk = $.left.map(l => l.merge({_k: f"{l.a}-{l.b}"})) in
  let rk = $.right.map(r => r.merge({_k: f"{r.x}-{r.y}"})) in
    lk.equi_join(rk, "_k", "_k")

# Filter-then-join (drop rows before paying join cost)
$.orders.filter(@.status == "paid").equi_join($.customers, "cid", "id")

Chained Pipelines

Real-world queries assembled from the building blocks. Each recipe uses one small document and shows the query chain plus a sentence on what the planner does.

1. Top-N by aggregate

DOC:    {"sales": [
  {"region": "NA", "amount": 100},
  {"region": "EU", "amount": 200},
  {"region": "NA", "amount": 50},
  {"region": "AS", "amount": 300},
  {"region": "EU", "amount": 75}
]}

QUERY:  $.sales
          .group_by(@.region)
          .entries()
          .map(e => {region: e[0], total: e[1].map(@.amount).sum()})
          .sort(@.total)
          .reverse()
          .take(2)

OUT:    [{"region":"AS","total":300},{"region":"EU","total":275}]

group_by and sort are barriers; take(2) after the sort doesn't help — the sort must complete first. Push the demand earlier where possible.

2. Active users + role-based count

DOC:    {"users": [
  {"id":1,"role":"admin","active":true},
  {"id":2,"role":"user","active":false},
  {"id":3,"role":"user","active":true},
  {"id":4,"role":"admin","active":true}
]}

QUERY:  $.users
          .filter(@.active)
          .count_by(@.role)

OUT:    {"admin":2,"user":1}

Streaming filter + barrier count_by. The filter passes only what's needed; count_by buffers but with ValueNeed::Predicate (only the role key) — the rest of the user object is never decoded.

3. Histogram of word frequency

DOC:    {"text": "the quick brown fox jumps over the lazy dog the end"}

QUERY:  $.text
          .words()
          .map(@.lower())
          .count_by(@)

OUT:    {"the": 3, "quick": 1, "brown": 1, ...}

4. Customer order summary

QUERY:  $.orders
          .group_by(@.customer_id)
          .entries()
          .map(e => {
            customer_id: e[0],
            total: e[1].map(@.amount).sum(),
            count: e[1].count(),
            recent: e[1].sort(@.date).last().date
          })
          .sort(@.total)
          .reverse()

The inner .sort(@.date).last() is wasteful: it sorts every group to grab the last. Rewrite with max_by:

QUERY:  ...
          .map(e => {
            customer_id: e[0],
            total: e[1].map(@.amount).sum(),
            count: e[1].count(),
            recent: e[1].max_by(@.date).date
          })
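For intuition on the cost difference, here are the same two strategies in Rust over a hypothetical Order struct. Sorting is O(n log n) and copies the group, while max_by is a single O(n) pass with no copy. Both versions rely on ISO-8601 date strings comparing correctly as plain strings. On ties, Rust's stable sort_by followed by last and Iterator::max_by both return the last of the equal elements; whether jetro's max_by breaks ties the same way is not documented here.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Order {
    date: &'static str, // ISO-8601, so string order equals date order
    amount: u32,
}

/// Sort-then-last: O(n log n) plus a full copy of the group.
fn latest_by_sort(orders: &[Order]) -> Option<Order> {
    let mut sorted = orders.to_vec();
    sorted.sort_by(|a, b| a.date.cmp(b.date));
    sorted.last().cloned()
}

/// Single pass: O(n), no copy.
fn latest_by_max(orders: &[Order]) -> Option<&Order> {
    orders.iter().max_by(|a, b| a.date.cmp(b.date))
}
```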

5. Unique recent active sessions

QUERY:  $.events
          .filter(@.kind == "login" and @.at >= "2026-01-01")
          .map(@.user_id)
          .unique()
          .count()

6. Pretty-print a CSV from objects

QUERY:  $.users
          .filter(@.active)
          .map(u => u.pick(id: id, name: full_name, email))
          .sort(@.id)
          .to_csv()

7. Find a needle in a deep document

QUERY:  $..find(@.id == 42)

If the document was loaded from bytes (default), this hits the structural index — no full traversal.

8. Compute deltas with pairwise

DOC:    {"prices": [100, 105, 102, 110, 108]}

QUERY:  $.prices.pairwise().map(p => p[1] - p[0])
OUT:    [5,-3,8,-2]
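The pairwise-then-subtract step corresponds to slice::windows(2) in Rust. Each adjacent pair yields one difference, so n prices give n - 1 deltas. This is a plain-Rust illustration of the arithmetic, not jetro code.

```rust
/// Consecutive differences: each adjacent pair (a, b) becomes b - a.
fn deltas(xs: &[i64]) -> Vec<i64> {
    xs.windows(2).map(|w| w[1] - w[0]).collect()
}
```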

9. Rolling 3-point moving average

QUERY:  $.measurements.rolling_avg(3)

The first two outputs are null until the window fills.
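A sketch of a trailing moving average that emits None (standing in for null) until the window fills. It keeps a running sum, so each step costs O(1). It assumes rolling_avg uses a trailing window. jetro's floating-point results may differ slightly, because a running sum can accumulate rounding error that recomputing each window from scratch would avoid.

```rust
/// Trailing moving average over `window` points; None until the window is full.
fn rolling_avg(xs: &[f64], window: usize) -> Vec<Option<f64>> {
    assert!(window > 0, "window must be positive");
    let mut out = Vec::with_capacity(xs.len());
    let mut sum = 0.0;
    for (i, &x) in xs.iter().enumerate() {
        sum += x;
        if i >= window {
            sum -= xs[i - window]; // drop the value that just left the window
        }
        out.push(if i + 1 >= window { Some(sum / window as f64) } else { None });
    }
    out
}
```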

10. Build a lookup, then enrich

QUERY:  let by_id = $.users.index_by(@.id) in
          $.events.map(e => e.merge({user: by_id[e.user_id].name}))

index_by is a barrier that runs once; the .map streams.

11. Select rows with all required fields

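QUERY:  $.records.filter(r => ["id", "name", "email"].filter(k => not r.has(k)).count() == 0)

missing("id", "name", "email") would be the natural fit here, but it is broken in v0.5 (see Membership and Predicates), so this form checks each key with has.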

12. Re-shape a long-format table

DOC:    [
  {"y":2024,"q":1,"v":10},{"y":2024,"q":2,"v":20},
  {"y":2025,"q":1,"v":15},{"y":2025,"q":2,"v":25}
]
QUERY:  $.pivot("y", "q", "v")
OUT:    {"2024":{"1":10,"2":20},"2025":{"1":15,"2":25}}

13. Mask sensitive fields

QUERY:  $.users.map(u => u.omit("password", "ssn", "token"))

14. Delta + cumulative sum

QUERY:  $.daily.pairwise().map(p => p[1].value - p[0].value)

Cumulative-sum form (.accumulate(0, (a, x) => a + x)) isn't yet wired up in v0.5 — see the Limitations page. Until then, cummax / cummin cover running min/max; full fold needs a host loop.
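Until accumulate is available, the host-side loop is short. The sketch below computes a running total (prefix sum) in Rust over values already extracted from a jetro result; how you get those values out of the engine is not shown here. Iterator::scan threads the accumulator through the sequence and emits it after each step.

```rust
/// Running (prefix) sum: out[i] = xs[0] + ... + xs[i].
fn cumulative_sum(xs: &[i64]) -> Vec<i64> {
    xs.iter()
        .scan(0i64, |acc, &x| {
            *acc += x;
            Some(*acc)
        })
        .collect()
}
```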

15. Migrate a document shape

rec is unstable in v0.5 (fixpoint loop bug). For now, prefer walk / walk_pre with a manual shape check, or do the rewrite host-side.

QUERY (planned, currently broken):
  $.rec({type: "v1"}, doc =>
    doc.merge({type: "v2"})
       .rename({old_field: "new_field"})
       .omit("legacy_blob"))

Once the fixpoint bug is fixed, rec is meant to walk the document, find every node matching the shape, and rewrite each match.

Pattern Match Cookbook

Fixture

Examples below run against:

DOC:    {"xs": [1, 2, 3, 4, 5], "row": {"k": "foo", "data": {"a": 1, "b": 2}}, "doc": {"a": 1, "b": 2, "type": "v1"}, "tree": {"x": 1, "children": [{"x": 2}]}, "value": 3.14}

Pattern matching is one of jetro's most expressive features. It compiles to a Maranget decision tree at lower-time and runs over all three execution domains (Val, borrowed View, tape).

Anatomy

match scrutinee with {
  pattern1 -> expr1,
  pattern2 when guard -> expr2,
  _ -> default
}

  • Arms checked top-down.
  • First match wins.
  • _ is the universal fallback.
  • when guards run after the structural match succeeds.

Pattern reference

Pattern                  Matches
42, "x", true, null      Equal literal
_                        Anything
name                     Anything, binds to name
1..10                    Number ≥ 1 and < 10
1..=10                   Number ≥ 1 and ≤ 10
{k: p, ...}              Object with key k whose value matches p
[p1, p2]                 Array of length 2
[h, ...t]                Head + tail
p1 | p2                  Either
x: number                Kind-bind

v0.5 note: object shorthand {id, name} binds each key to a same-name local, and rest-capture is spelled ...*rest (object) or ...tail (array): {id, name, ...*rest}, [h, ...tail]. See Limitations for the canonical pattern grammar.

1. Discriminated union

match $.event with {
  {kind: "click", x: cx, y: cy} -> f"click@{cx},{cy}",
  {kind: "key",   code: c}       -> f"key:{c}",
  {kind: "scroll", dy: d}        -> f"scroll:{d}",
  _ -> "unknown"
}

In v0.5 every object pattern key needs an explicit key: binding form; the bare {kind: "click", x, y} shorthand is a parse error.

2. Numeric ranges

match $.score with {
  s when s < 0 -> "invalid",
  0..50 -> "low",
  50..80 -> "medium",
  80..=100 -> "high",
  _ -> "out of range"
}
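jetro's 1..10 and 1..=10 follow the same half-open vs closed convention as Rust ranges. The Rust sketch below mirrors the bucketing above (integer scores are an assumption) and makes the boundary values explicit: 50 lands in "medium", not "low", and 100 is still "high".

```rust
/// Mirrors the jetro match above with Rust range semantics:
/// `a..b` is half-open (a <= s < b), `a..=b` is closed (a <= s <= b).
fn bucket(score: i64) -> &'static str {
    match score {
        s if s < 0 => "invalid",
        s if (0..50).contains(&s) => "low",
        s if (50..80).contains(&s) => "medium",
        s if (80..=100).contains(&s) => "high",
        _ => "out of range",
    }
}
```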

3. Or-patterns

match $.day with {
  "sat" | "sun" -> "weekend",
  _ -> "weekday"
}

4. Rest capture

⚠ Not yet supported in v0.5. The ..rest pattern parse-errors. Bind the keys you care about explicitly and compute rest outside the match if needed:

match $.config with {
  {host: h, port: p} -> {host: h, port: p, extras: $.config.omit("host", "port")},
  _ -> null
}

5. Array shape

match $.coords with {
  [x, y] -> {x, y},
  [x, y, z] -> {x, y, z},
  _ -> null
}

6. Head + tail

match $.xs with {
  [] -> "empty",
  [first, ...rest] -> f"head={first}, count={rest.count()}",
}

7. Kind-bound + guard

match $.value with {
  s: string when s.len() > 100 -> "long string",
  s: string -> "short string",
  n: number when n > 0 -> "positive",
  n: number -> "non-positive",
  _: array -> "array",
  _ -> "other"
}

8. Deep match (..match)

Walk every descendant; collect results.

$.tree..match {
  {kind: "leaf", value: v} -> v,
  _ -> null
} | .compact()

The trailing .compact() drops the nulls from non-leaf descendants.

9. First-match deep (..match!)

Stops at the first match — the bang variant uses early termination via the structural index where possible.

$.tree..match! {
  {role: "admin", id: i} -> i,
  _ -> null
}

10. Migration / rewrite (rec)

$.doc.rec({type: "v1"}, node => node.merge({type: "v2"}))

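rec is the recursive sibling of match: it descends and rewrites every matching node. It is unstable in v0.5 (fixpoint loop bug); see rec under Deep Traversal and Recursion before relying on it.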

11. Cross-arm sharing

When multiple arms test the same prefix ({kind: "x", ...}, {kind: "y", ...}), the lowering shares the discriminant test. You don't write anything special — the planner does it for you. Practically: write many narrow arms; they cost about as much as one big switch.

12. Guards over deep patterns

match $.row with {
  {user: {age: a, role: "admin"}} when a >= 18 -> "adult admin",
  {user: {age: a}} when a < 18 -> "minor",
  _ -> "other"
}

Bench tips

  • Patterns with literal-only discriminants (no guards) compile to switch-like decision trees and run as fast as a hand-written if/else if.
  • Guards add a per-arm conditional; cheap, but don't put expensive computation in them.
  • Deep ..match over a large doc benefits a lot from the structural index; deep ..match! (first-match) is even better.

Write Fusion

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}}

When a query contains multiple chain-writes, jetro fuses them into a single pass over the document. This is the patch-fusion optimizer.

What gets fused

Any sequence of chain-write terminals on the same document:

$.user.name.set("Ada")
   .user.email.set("ada@x.com")
   .user.tags.append("admin")

Or the equivalent block form (preferred for many writes):

patch $ {
  user.name: "Ada",
  user.email: "ada@x.com",
  user.tags: $.user.tags.append("admin")
}

Without fusion

Naively, three writes mean three traversals from $:

$ → user → name      (write)
$ → user → email     (write)
$ → user → tags[*]   (write)

Each rebuilds the path from the root. For deeply-nested documents, the cost adds up.

With fusion

The optimizer collects effects, walks the document once, and applies all relevant rewrites at each visited node:

$ → user → {set name, set email, append tags}

Three writes, one walk.

Phases

The patch-fusion pass has internal phases (Phase C, Phase E in the source); the user-visible properties are:

  1. Same-base writes group together. Writes under $.user.* batch.
  2. Disjoint paths don't interfere. Writes to $.user.name and $.config.theme execute in one walk but at different nodes.
  3. Conflicts are resolved last-wins. Two writes to the same path: the later one wins.
  4. Conditional writes (when) are evaluated per-write. They short-circuit per clause; the walk doesn't redo work.
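The idea can be sketched as follows. This is a minimal model, not jetro's patch-fusion pass. It assumes plain object paths (no [*] or when) and a hypothetical Json enum. Pending writes are grouped into a trie keyed by path segment, so one recursive walk visits $.user once and applies both of its writes there. Re-inserting the same path overwrites the stored value, which gives last-wins, and disjoint paths such as $.config.theme land in separate branches of the same walk. Missing intermediate objects are created along the way.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal JSON value: enough to show path writes.
#[derive(Clone, Debug, PartialEq)]
enum Json {
    Null,
    Str(String),
    Obj(BTreeMap<String, Json>),
}

/// Pending writes grouped by path segment, so shared prefixes are stored once.
#[derive(Default)]
struct WriteTrie {
    value: Option<Json>,
    children: BTreeMap<String, WriteTrie>,
}

impl WriteTrie {
    fn insert(&mut self, path: &[&str], v: Json) {
        match path.split_first() {
            // A later write to the same path replaces the earlier one: last wins.
            None => self.value = Some(v),
            Some((head, rest)) => self
                .children
                .entry(head.to_string())
                .or_default()
                .insert(rest, v),
        }
    }
}

/// One walk: each node named in the trie is visited once and gets all its writes.
fn apply(doc: Json, trie: &WriteTrie) -> Json {
    let base = match &trie.value {
        Some(v) => v.clone(),
        None => doc,
    };
    if trie.children.is_empty() {
        return base;
    }
    // Writes below this node: descend into an object, creating one if needed.
    let mut fields = match base {
        Json::Obj(map) => map,
        _ => BTreeMap::new(),
    };
    for (key, child) in &trie.children {
        let current = fields.remove(key).unwrap_or(Json::Null);
        fields.insert(key.clone(), apply(current, child));
    }
    Json::Obj(fields)
}

fn fuse(doc: Json, writes: Vec<(Vec<&str>, Json)>) -> Json {
    let mut trie = WriteTrie::default();
    for (path, v) in writes {
        trie.insert(&path, v);
    }
    apply(doc, &trie)
}
```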

Worked example

DOC:
{
  "users": [
    {"id": 1, "name": "Ada", "active": false},
    {"id": 2, "name": "Bob", "active": true}
  ]
}

QUERY:
patch $ {
  users[*].active: true,                        # broadcast write
  users[0].name: "Ada Lovelace",                # specific write
  users[*].last_seen: "2026-05-08" when .active # conditional broadcast
}

What happens:

  • One walk visits every user.
  • For each, three potential writes evaluate. Per element:
    • active: true always applies.
    • name only at index 0.
    • last_seen only when post-active write is true (so all of them).

Output:

{
  "users": [
    {"id": 1, "name": "Ada Lovelace", "active": true, "last_seen": "2026-05-08"},
    {"id": 2, "name": "Bob",          "active": true, "last_seen": "2026-05-08"}
  ]
}

When fusion doesn't fire

  • The chain isn't rooted at $ (parser doesn't classify it as a write).
  • The writes are gated by data-dependent conditions that change document shape mid-pipeline.
  • Mixed read/write — $.users[0].name.set("A").upper() keeps standard method semantics.

Tips

  • Prefer the block form (patch $ { … }) when you have ≥ 3 writes — easier to read, and the optimizer treats it identically.
  • Use broadcast (xs[*].field: v) instead of a .map that calls .set per element.
  • Conditionals (when) are fine — they don't break fusion.

jq vs jetro Cheatsheet

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}]}

This page is for users coming from jq. Both tools have the same shape (query JSON from a terminal), but the philosophy differs in places, and those differences are called out where they matter.

Big differences at a glance

Topic             jq                           jetro
----------------  ---------------------------  ----------------------------------
Calling methods   Pipe-of-filters: . | length  Dot syntax: .len()
Pipe |            Sole composition operator    Value-flow only: passes @ to RHS
Iteration         Implicit on .[]              Explicit on chained methods
Lambdas           None (uses . rebinding)      Three forms: @, r =>, lambda r:
Pattern matching  None                         First-class with guards and ranges
Writes            |=, =, del()                 .set(), patch $ {}, chain-writes
Backend           Single interpreter           Six backends, planner-selected
Caching           None                         Plan + path caches in JetroEngine


One-liner translations

Identity / projection

jq:     .
jetro:  $

jq:     .x
jetro:  $.x

jq:     .x.y[0]
jetro:  $.x.y[0]

Iteration

jq:     .users[]
jetro:  $.users[*]                  # explicit; or just $.users when chaining methods

jq:     .users[].name
jetro:  $.users.map(@.name)

Field selection / projection

jq:     {id, name}
jetro:  .pick(id, name)            # method form, maps over arrays

jq:     .users | map({id, name})
jetro:  $.users.map(u => u.pick(id, name))
        # or
        $.users.pick(id, name)

jq:     del(.password)
jetro:  $.omit(password)            # or $.password.delete()

Filter

jq:     .users | map(select(.active))
jetro:  $.users.filter(@.active)

jq:     .users[] | select(.age > 18)
jetro:  $.users.filter(@.age > 18)

Aggregates

jq:     length
jetro:  .len()                      # for arrays, objects, strings
        .count()                    # explicit array-count reducer

jq:     [.[] | .price] | add
jetro:  $.map(@.price).sum()

jq:     [.[] | .age] | min
jetro:  $.map(@.age).min()
        # or
        $.min_by(@.age).age           # one-pass, returns whole element

Sort / unique / group

jq:     sort
jetro:  .sort()

jq:     sort_by(.year)
jetro:  .sort(@.year)

jq:     unique
jetro:  .unique()

jq:     group_by(.author)
jetro:  .group_by(@.author)
        # jq returns array-of-arrays; jetro returns object indexed by key

jq:     [group_by(.k)[] | {k: .[0].k, n: length}]
jetro:  .count_by(@.k).entries().map(([k,n]) => {k, n})

Slice and take

jq:     .[0:3]
jetro:  $[0:3]

jq:     .[0]
jetro:  $[0]
        # or
        $.first()                    # demand-aware sink

jq:     .[-1]
jetro:  $[-1]
        # or
        $.last()

Has / index / membership

jq:     has("foo")
jetro:  .has("foo")

jq:     .tags | index("admin")
jetro:  $.tags.index("admin")

jq:     .tags | contains(["admin"])
jetro:  $.tags.includes("admin")

Strings

jq:     ascii_upcase
jetro:  .upper()

jq:     ltrimstr("foo")
jetro:  .strip_prefix("foo")

jq:     split(",")
jetro:  .split(",")

jq:     test("regex")
jetro:  @ ~= "regex"
        # or
        .re_match("regex")

jq:     match("(\\d+)").captures
jetro:  .captures("(\d+)")

Recursive descent

jq:     ..
jetro:  ..                           # same notation

jq:     .. | strings
jetro:  $..find(@ is string)

jq:     .. | objects | select(.id?)
jetro:  $..find(@.id?)
        # or
        $..shape({id})

String formatting

jq:     "Hello, \(.name)!"
jetro:  f"Hello, {$.name}!"

Conditional

jq:     if .x > 5 then "big" else "small" end
jetro:  "big" if $.x > 5 else "small"

jq:     .x // "default"
jetro:  $.x ?? "default"

Variables

jq:     . as $doc | $doc.x + $doc.y
jetro:  let doc = $ in doc.x + doc.y

Reduce / fold

jq:     reduce .[] as $x (0; . + $x)
jetro:  $.sum()                      # for sum specifically
        # or general fold:
        $.accumulate(0, (a, x) => a + x).last()

Object construction

jq:     {users: [.[] | {id, name}]}
jetro:  {users: $.map(u => u.pick(id, name))}

Modification

jq:     .x = 1
jetro:  $.x.set(1)
        # or
        patch $ {x: 1}

jq:     .x |= . + 1
jetro:  $.x.modify(@ + 1)

jq:     del(.x)
jetro:  $.x.delete()

jq:     .users[].active = true
jetro:  $.users[*].active.set(true)
        # or
        patch $ {users[*].active: true}

Multiple writes

jq:     .x = 1 | .y = 2 | del(.z)
jetro:  patch $ {x: 1, y: 2, z: DELETE}

jetro fuses these into one document walk. jq evaluates each pipe stage independently.

Complex pipeline translations

Real-world jq queries from the wild. Originals are taken verbatim from the jq manual and the Programming Historian "Reshaping JSON with jq" lesson; all credit to those sources. Each shows the original jq alongside an idiomatic jetro rewrite.

1. Alternative-binding destructure (jq manual)

Flatten a list of resources whose events field may be either a single object or an array of objects, into one row per (resource, event) pair. jq uses its alternative-destructuring operator ?// to try both shapes:

.resources[] as {$id, $kind, events: {$user_id, $ts}} ?// {$id, $kind, events: [{$user_id, $ts}]}
  | {$user_id, $kind, $id, $ts}

jetro has no ?//. Use kind-test + flat_map to normalise:

$.resources.flat_map(r =>
  let evts = (r.events if r.events is array else [r.events]) in
    evts.map(e => {
      user_id: e.user_id,
      kind:    r.kind,
      id:      r.id,
      ts:      e.ts
    })
)

…or with a match to make the two shapes explicit:

$.resources.flat_map(r =>
  match r.events with {
    arr: array -> arr.map(e => {user_id: e.user_id, kind: r.kind, id: r.id, ts: e.ts}),
    {user_id, ts} -> [{user_id, kind: r.kind, id: r.id, ts}],
    _ -> []
  }
)

The match form is more explicit and surfaces the "single object" branch as its own arm — easier to extend (e.g. add a third event-shape later).
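The same normalisation reads naturally in Rust when the two accepted shapes are an enum. This is an analogy, not jetro code: Event, Events, Resource, Row, and flatten are hypothetical names, and the input is assumed to be already decoded.

```rust
// Hypothetical decoded shapes: `events` is either one object or an array of them.
struct Event {
    user_id: u32,
    ts: u64,
}

enum Events {
    One(Event),
    Many(Vec<Event>),
}

struct Resource {
    id: u32,
    kind: &'static str,
    events: Events,
}

#[derive(Debug, PartialEq)]
struct Row {
    user_id: u32,
    kind: &'static str,
    id: u32,
    ts: u64,
}

// One row per (resource, event). The match makes both accepted shapes explicit,
// like the jetro match form; a third shape would be a new enum variant and arm.
fn flatten(resources: &[Resource]) -> Vec<Row> {
    resources
        .iter()
        .flat_map(|r| {
            let events: Vec<&Event> = match &r.events {
                Events::One(e) => vec![e],
                Events::Many(es) => es.iter().collect(),
            };
            events
                .into_iter()
                .map(move |e| Row { user_id: e.user_id, kind: r.kind, id: r.id, ts: e.ts })
        })
        .collect()
}
```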

2. Tweet hashtags as semicolon-joined CSV (Programming Historian)

Take an array of tweets, project id plus a semicolon-joined string of hashtag texts, emit as CSV. Original jq, threaded through five pipe stages:

{id: .id, hashtags: .entities.hashtags}
| {id: .id, hashtags: [.hashtags[].text]}
| {id: .id, hashtags: .hashtags | join(";")}
| [.id, .hashtags]
| @csv

Each pipe stage rebuilds the object — jq has no nested method chaining, so projection accumulates by reassignment.

jetro collapses it to one chain:

$.map(t => {
  id:       t.id,
  hashtags: t.entities.hashtags.map(@.text).join(";")
}).to_csv()

to_csv already emits every row, header line included. To get headerless rows like jq's output:

$.map(t => [t.id, t.entities.hashtags.map(@.text).join(";")])
 .map(row => row.map(@.to_string()).join(","))
 .join("\n")

3. Hashtag frequency CSV (Programming Historian)

Explode each tweet into one row per hashtag, group by hashtag, count, emit (tag, count) as CSV. Original jq:

[.[] | {id: .id, hashtag: .entities.hashtags} | {id: .id, hashtag: .hashtag[].text}]
| group_by(.hashtag)
| .[]
| {tag: .[0].hashtag, count: . | length}
| [.tag, .count]
| @csv

jq's group_by returns an array-of-arrays, so the trailing .[] and .[0].hashtag extract the key from the first element of each group.

jetro uses count_by, which already produces a {tag: count} map:

$.flat_map(t => t.entities.hashtags.map(@.text))
 .count_by(@)
 .entries()
 .map(([tag, count]) => {tag, count})
 .to_csv()

The pipeline reads top-to-bottom: explode → tally → reshape → emit. count_by is one of several jetro idioms (also index_by, unique_by, max_by) that fold a common jq pattern (group_by | map(...)) into a single barrier.
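The explode → tally → emit shape is easy to see in a plain Rust fold. This sketch only mirrors the idea of count_by; Tweet and hashtag_csv are hypothetical names, rows are sorted by tag via BTreeMap for deterministic output (jetro's ordering is not implied), and real CSV quoting is omitted.

```rust
use std::collections::BTreeMap;

// Hypothetical tweet shape: only the hashtag texts are kept.
struct Tweet {
    hashtags: Vec<&'static str>,
}

// explode -> tally -> emit, mirroring flat_map / count_by / entries / to_csv.
fn hashtag_csv(tweets: &[Tweet]) -> String {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    // Explode: one item per hashtag across all tweets. Tally: one counter per tag.
    for tag in tweets.iter().flat_map(|t| t.hashtags.iter().copied()) {
        *counts.entry(tag).or_insert(0) += 1;
    }
    // Emit: header line, then one "tag,count" row per distinct tag.
    let mut out = String::from("tag,count\n");
    for (tag, count) in &counts {
        out.push_str(&format!("{tag},{count}\n"));
    }
    out
}
```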

Why these examples are shorter in jetro

Three patterns recur:

  1. Method chaining. jq's ... | {...} | {...} style rebuilds the object at each stage; jetro's .map(t => {...}) builds it once.
  2. Specialised barriers. count_by, index_by, unique_by, max_by, min_by collapse group_by | map(...) chains into one call.
  3. First-class lambdas. jq's . rebinding inside as / [] becomes plain t => t.field in jetro, with no positional gymnastics.

The trade-off: jq's pipe-of-filters is more uniform — every stage is a filter that takes one input and produces zero-or-more outputs. jetro's methods are typed (one-to-one, filter, expander, reducer, barrier), so the pipeline shape is more visible but the surface is bigger.

Things jq has that jetro doesn't

  • @base64, @uri, @csv format strings. jetro spells these as methods: .to_base64(), .url_encode(), .to_csv().
  • SQL-style modules. No equivalent.
  • input, inputs, nul-separated streaming. jetro is in-process; no streaming-input model.
  • recurse(f; cond). Use walk_pre or rec with a pattern.

Things jetro has that jq doesn't

  • Pattern matching with guards, ranges, kind binding, deep ..match.
  • Demand propagation. .first(), .find(), .take(n) cut off the source; no full materialization.
  • Bitmap structural index. ..find, ..shape, ..like skip non-matching subtrees in O(1) per node.
  • First-class lambdas (r => body, lambda r: body) with let-binding + inlining.
  • Write fusion. Many writes batch into one walk.
  • Backends. Tape-zero-copy, structural index, columnar — selected by the planner.

Pitfalls when porting

  • .[] doesn't exist. Replace with [*] or just chain methods (most jetro methods auto-iterate over arrays).
  • Pipe is not composition. .x | .y in jq means "x then y". In jetro it's "evaluate .y with @ = .x". To chain, use dot access instead: .x.y().
  • Method calls need parens. length is .len(), not .len.
  • select(p) becomes filter(p), and works on whole arrays — no need to first iterate with .[].
  • group_by returns an object, not an array of arrays. Use .entries() for jq-shaped output.

Quick reference card

Need            jq                       jetro
--------------  -----------------------  ----------------
Project         {a, b}                   .pick(a, b)
Drop key        del(.k)                  .omit(k)
Filter          select(p)                .filter(p)
Map             map(f)                   .map(f)
Iterate         .[]                      [*] or implicit
Length          length                   .len()
Sort            sort_by(.k)              .sort(@.k)
Unique          unique                   .unique()
First           .[0]                     .first()
Last            .[-1]                    .last()
String concat   "\(.x)"                  f"{$.x}"
Default         // d                     ?? d
If              if c then a else b end   a if c else b
Var             as $x                    let x = ...
Set             .x = v                   .x.set(v)
Update          .x |= f                  .x.modify(f)
Delete          del(.x)                  .x.delete()

Performance Guide

Fixture

Examples below run against:

DOC:    {"users": [{"id": 1, "name": "Ada", "email": "ada@x.com", "active": true, "age": 30, "role": "admin", "secret": "a", "is_admin": true, "profile": {"name": "Ada", "email": "ada@x.com"}, "score": 85, "first_name": "Ada", "last_name": "Lovelace", "tags": ["math", "code"]}, {"id": 2, "name": "Bob", "email": "bob@y.org", "active": false, "age": 24, "role": "user", "secret": "b", "is_admin": false, "profile": {"name": "Bob", "email": "bob@y.org"}, "score": 40, "first_name": "Bob", "last_name": "Smith"}, {"id": 3, "name": "Cy", "email": "cy@x.com", "active": true, "age": 42, "role": "user", "secret": "c", "is_admin": false, "score": 90, "first_name": "Cy", "last_name": "Young"}], "user": {"id": 42, "name": "Ada", "email": "ada@x.com", "tags": ["math", "code"], "profile": {"name": "Ada", "email": "ada@x.com"}, "active": true, "verified": true}, "orders": [{"id": 1, "customer": 1, "customer_id": 1, "cid": 1, "amount": 100, "status": "paid", "total": 100, "date": "2024-01-01"}, {"id": 2, "customer": 1, "customer_id": 1, "cid": 1, "amount": 50, "status": "open", "total": 50, "date": "2024-02-01"}, {"id": 3, "customer": 2, "customer_id": 2, "cid": 2, "amount": 75, "status": "paid", "total": 75, "date": "2024-03-01"}], "events": [{"sev": 1, "msg": "ok", "kind": "start"}, {"sev": 2, "msg": "warn", "kind": "end"}, {"sev": 3, "msg": "err", "kind": "start"}], "rows": [{"age": "30", "price": "3.14"}]}

How to write jetro queries that the planner can run fast, and how to read the benchmarks.

Mental model

Jetro picks one of six backends per pipeline node. Fast paths share three properties:

  1. The source is a path of pure field accesses. $.a.b.c triggers tape backends (zero-copy over simd-json output).
  2. The pipeline ends in a sink that bounds demand. .first(), .take(n), .find(p), .count() propagate backward and gate source reads.
  3. No mid-pipeline materialization. .collect(), .sort(), .group_by() flush the tape access pattern back to a Val walk.

Queries that follow these three rules usually land on the fast path without further tuning.

Backend selection (cheat-sheet)

Source / shape                   Primary backend
-------------------------------  --------------------------------
$.a.b.c (field-chain)            tape-view (zero-copy)
$..find(...), $..shape({...})    bitmap structural index
Single $.a.b (path only)         tape-path
Generic expr / lambda body       fast-children
Any backend declines             interpreted (universal fallback)

You don't pick — the planner does. Knowing the table tells you why a query is fast.

Demand: the killer feature

Every demand-aware sink lets the source skip work. Typical impact:

Pattern                             Speedup vs. naive
----------------------------------  --------------------------------
xs.first()                          ~N× (reads 1 element)
xs.find(p)                          up to ~N× (stops at first match)
xs.filter(p).take(k)                up to N/k×
xs.count()                          2-5× (no payload decoded)
xs.sum(), xs.avg()                  2-3× (only numeric leaves)
xs.last() (random-access source)    ~N× (seek to end)
xs.reverse().take(k)                rewritten to LastInput(k)
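Rust's lazy iterators show the same effect in miniature: a bounding sink such as take limits how many elements the upstream stages pull. This is an analogy, not jetro's pipeline; the function names are hypothetical and inspect is used only to count pulls.

```rust
// Runs `filter(even).take(k)` and reports how many source elements were pulled.
fn filter_take_reads(xs: &[i32], k: usize) -> (Vec<i32>, usize) {
    let mut reads = 0;
    let out: Vec<i32> = xs
        .iter()
        .inspect(|_| reads += 1) // counts every element the pipeline pulls
        .filter(|x| **x % 2 == 0)
        .take(k)
        .copied()
        .collect();
    (out, reads)
}

// `rev().take(k)` on a slice pulls only the last k elements, in the spirit of
// rewriting xs.reverse().take(k) to LastInput(k).
fn rev_take_reads(xs: &[i32], k: usize) -> (Vec<i32>, usize) {
    let mut reads = 0;
    let out: Vec<i32> = xs.iter().rev().inspect(|_| reads += 1).take(k).copied().collect();
    (out, reads)
}
```

Over 1..=1000, filter(even).take(3) pulls 6 elements rather than 1000, and rev().take(2) pulls 2.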

For wide objects, field projection is the other big win:

$.users.map(u => u.pick(id, name))

The source decodes only id and name per row. Other fields stay as raw tape tokens.

What kills performance

Mid-chain materialization

$.users
  .filter(@.active)
  .collect()                # unnecessary
  .map(@.email)

The .collect() forces a full pass before .map. Drop it.

Pre-sort barriers blocking demand

$.events.sort(@.ts).first()

.sort is a barrier: it must see every element before emitting, so the trailing .first() cannot bound the source. Rewrite with min_by:

$.events.min_by(@.ts)

One pass, no allocation of the sorted array.
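The cost difference is the same as in plain Rust, sketched below with hypothetical names. Sorting copies and orders every element to keep one; min_by_key is a single scan with no buffer. In Rust both pick the earliest of equal keys (stable sort, and min_by_key returns the first minimum); whether jetro's min_by breaks ties the same way is not stated here.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Event {
    ts: u64,
    msg: &'static str,
}

// Barrier version: copies and sorts every event (O(n log n) time, O(n) extra
// memory) just to keep the first one.
fn earliest_by_sort(events: &[Event]) -> Option<Event> {
    let mut sorted = events.to_vec();
    sorted.sort_by_key(|e| e.ts);
    sorted.into_iter().next()
}

// One-pass version: a single scan keeping the running minimum, no extra buffer.
fn earliest_by_min(events: &[Event]) -> Option<&Event> {
    events.iter().min_by_key(|e| e.ts)
}
```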

Per-element joins (O(n×m))

$.orders.map(o => o.merge({name: $.users.find(@.id == o.user_id).name}))

Each find rescans $.users. For large data, build a lookup once:

let by_id = $.users.index_by(@.id) in
  $.orders.map(o => o.merge({name: by_id[o.user_id].name}))

Or use equi_join.
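The same trade-off in Rust terms: a per-row linear find is O(n×m), while building a hash index once and probing it is O(n + m) on average. User, Order, and both join functions are hypothetical, and ids are assumed unique (with duplicates, find keeps the first match while the collected HashMap keeps the last).

```rust
use std::collections::HashMap;

struct User {
    id: u32,
    name: &'static str,
}

struct Order {
    id: u32,
    user_id: u32,
}

// O(n*m): every order rescans the users slice, like a per-row `find`.
fn join_scan(orders: &[Order], users: &[User]) -> Vec<(u32, Option<&'static str>)> {
    orders
        .iter()
        .map(|o| (o.id, users.iter().find(|u| u.id == o.user_id).map(|u| u.name)))
        .collect()
}

// O(n + m) on average: build the id -> name lookup once (the `index_by` step),
// then each order does a single hash probe.
fn join_indexed(orders: &[Order], users: &[User]) -> Vec<(u32, Option<&'static str>)> {
    let by_id: HashMap<u32, &'static str> = users.iter().map(|u| (u.id, u.name)).collect();
    orders.iter().map(|o| (o.id, by_id.get(&o.user_id).copied())).collect()
}
```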

Repeated sub-expressions

$.user.profile.name + " <" + $.user.profile.email + ">"

Each rooted path walks the tape from $ again. Bind the shared prefix once:

let p = $.user.profile in
  f"{p.name} <{p.email}>"

Heavy lambdas in barriers

$.rows.unique_by(@.to_string())

unique_by calls the lambda once per row. If the projection is non-trivial (regex, deep traversal), pre-project once:

$.rows.map(r => r.merge({_k: r.to_string()}))
     .unique_by(@._k)
     .map(@.omit(_k))

Engine tuning

Plan cache

JetroEngine caches (query, context) → compiled pipeline. The default capacity is 256 entries; when the cache is full it is cleared wholesale rather than evicting one entry at a time.

For a small fixed set of queries over a high volume of documents (the typical web-server shape), every call after the first for a given query is a cache hit. Don't fight it.

For unique-per-call queries (CLI ad-hoc), the cache is a no-op; just use Jetro directly.
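A capacity-bounded cache with wholesale eviction can be sketched in a few lines. This is not JetroEngine's code: PlanCache, Plan, and get_or_compile are hypothetical, the context string is treated as an opaque key component, and "compiling" is a stand-in format! call. It shows why a small, repeated query set stays hot while a stream of unique queries keeps refilling and clearing the cache.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a compiled pipeline.
#[derive(Clone, Debug, PartialEq)]
struct Plan(String);

// (query, context) -> plan, cleared entirely when full instead of tracking recency.
struct PlanCache {
    capacity: usize,
    entries: HashMap<(String, String), Plan>,
    misses: usize,
}

impl PlanCache {
    fn new(capacity: usize) -> Self {
        PlanCache { capacity, entries: HashMap::new(), misses: 0 }
    }

    fn get_or_compile(&mut self, query: &str, context: &str) -> Plan {
        let key = (query.to_string(), context.to_string());
        if let Some(plan) = self.entries.get(&key) {
            return plan.clone(); // hit: one hash lookup
        }
        self.misses += 1;
        if self.entries.len() >= self.capacity {
            self.entries.clear(); // wholesale eviction
        }
        let plan = Plan(format!("compiled({query})")); // stand-in for parse + lower + compile
        self.entries.insert(key, plan.clone());
        plan
    }
}
```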

Path cache

The VM caches resolved pointer paths per document. The hash key includes both structure and primitive values bounded at depth 8 — so two docs with the same shape but different leaves stay distinct. You don't manage this.

simd-json (default)

The simd-json feature (on by default) gives roughly a 4× cold-start speedup. Disable it only if you need to round-trip serde_json::Value and the conversion cost dominates.

Benchmarks

cargo bench -p jetro-core

The harness covers:

  • Field access ($.a.b.c) — tape-view zero-copy
  • Filter / map / take pipelines — demand propagation
  • Deep search (..find, ..shape) — bitmap structural index
  • Pattern match — Maranget tree
  • Lambda forms — @ vs. => vs. lambda parity
  • Write fusion — single vs. fused multi-writes

To compare your changes against main:

git checkout main
cargo bench -p jetro-core -- --save-baseline main
git checkout your-branch
cargo bench -p jetro-core -- --baseline main

Reading the output: for each benchmark, criterion reports the estimated change in time against the saved baseline, with a confidence interval. Any regression above 5% should have a clear cause.

Profiling

For Rust workloads:

cargo bench -p jetro-core --bench <name> -- --profile-time 10

Then attach with samply or cargo flamegraph. Hot paths usually live in:

  • exec/pipeline/exec.rs — pipeline driver
  • exec/view/*.rs — borrowed view stages
  • exec/router.rs — backend selection
  • vm/exec.rs — bytecode VM (interpreted fallback)

If the interpreter (vm::execute) shows up hot, the planner is falling through to the universal fallback. Check the query — usually a non-$ source or a generic expr inside a method arg.

Quick checklist

Before benchmarking a query, ask:

  • Can .first() / .take() / .find() replace a full materialization?
  • Is there a barrier (sort, unique, group_by) before the bound? Push the bound earlier or use a one-pass equivalent (min_by, count_by).
  • Does a lookup repeat per row? Pre-build with index_by.
  • Are wide rows projected early with pick?
  • Are sub-expressions duplicated? Bind with let.
  • Is simd-json enabled (default)?
  • Is the same query run many times? Use JetroEngine.

Once every item on the list has been addressed, the query is most likely on the fast path.

Public API and Engine

The full public surface of the jetro crate is two types and a handful of methods. Everything else is implementation detail.

Jetro — single-document handle

For one document, possibly many queries:

use jetro::Jetro;

let bytes = br#"{"x":[1,2,3]}"#;
let j = Jetro::from_bytes(bytes)?;          // lazy parse via simd-json tape
let v: serde_json::Value = j.collect("$.x.sum()")?;
assert_eq!(v, serde_json::json!(6));

Constructors

Method                                 Input           Notes
-------------------------------------  --------------  ------------------------------
Jetro::from_bytes(&[u8])               Raw JSON bytes  Lazy parse; fastest path
Jetro::from_value(serde_json::Value)   Parsed value    Skips simd-json
Jetro::from_val(Val)                   Internal Val    Advanced: re-uses engine state

Methods

Method                        Returns
----------------------------  --------------------------------------------
j.collect(query)              Result<serde_json::Value, EvalError>
j.collect_typed::<T>(query)   Result<T, EvalError> (deserializes directly)

Jetro uses a thread-local VM with a path cache. Cheap to construct; prefer to drop it when you move to a new document so the cache key stays valid.

JetroEngine — long-lived multi-doc handle

For many documents and many queries with overlap, share the plan/VM caches:

use jetro::JetroEngine;

let eng = JetroEngine::default();

for doc_bytes in inputs {
    let v = eng.collect_bytes(doc_bytes, "$.users.filter(@.active).count()")?;
    println!("{}", v);
}

Methods

Method                              Input              Notes
----------------------------------  -----------------  ----------------------------
eng.collect(&doc, q)                &Val               Document already in Val form
eng.collect_value(serde_value, q)   serde_json::Value  Round-trips
eng.collect_bytes(&[u8], q)         Raw bytes          Lazy parse

Returns Result<serde_json::Value, JetroEngineError> — a wider error type that may also wrap JSON-parse errors.

Configuration

Option               Default  Effect
-------------------  -------  ---------------------------
Plan-cache capacity  256      Wholesale-evicted when full

The engine's plan cache amortises parse + lower + compile across calls. Hits are O(hash); misses do full work.

Errors

pub enum EvalError {
    /* … */
}

pub enum JetroEngineError {
    Json(serde_json::Error),
    Eval(EvalError),
}

Error messages include the query position when available.

Feature flags

Feature        Default  What it does
-------------  -------  ----------------------------------------------------------
simd-json      on       Direct bytes → Val parse, skipping serde_json::Value
fuzz_internal  off      Re-exports parser + planner for fuzz harness; not stable

To disable simd-json:

[dependencies]
jetro = { version = "0.5", default-features = false }

Python binding

jetro_py exposes a collect(doc, query) function. Internals are identical to the Rust crate.

import jetro

result = jetro.collect({"x": [1,2,3]}, "$.x.sum()")
# result == 6

CLI

jetrocli '$.x.sum()' < input.json

The CLI is a thin wrapper around Jetro::from_bytes.

Threading

  • Jetro is Send + Sync: multiple threads can share one Jetro and run different read-only queries concurrently.
  • JetroEngine is Send + Sync and intended for shared-engine workloads.
  • The VM path-cache is thread-local; cross-thread access goes through separate caches.

Stability

  • The query DSL is stable as of jetro 0.5.x.
  • The Rust API surface (Jetro, JetroEngine, error types) is stable.
  • BuiltinMethod, opcodes, IR types are internal and may change in any minor release.
  • The fuzz_internal feature is explicitly unstable.

Known Limitations and Behavior Surprises (v0.5)

Empirically validated against jetro 0.5.5. This page is the canonical fix-list: section 1 lists known gaps between intended and actual behavior, while sections 2 and 3 document intentional design choices and argument-shape rules. Use section 1 as a backlog; its items should drop as the runtime catches up.

v0.5.5 — fixed in this release

All 14 bugs surfaced by the audit were fixed, along with issues found in three follow-up sweeps:

  • [*] wildcard parses (mid-chain expands to .map(@ + rest)).
  • [a:b:c] and [::n] (incl. [::-1] reverse) — Python-style step slicing.
  • ✅ Lambda array-pattern destructure ([k, v]) => body and rest form ([h, ...tail]) => body.
  • ✅ Object patterns in match accept reserved words as keys ({kind: "click"}).
  • ✅ Object pattern shorthand {id, name} ≡ {id: id, name: name} in match.
  • Val::StrSlice + Val::Str → string concat. Path-rooted concat works.
  • entries()/keys()/values() no longer triple-wrap their array result.
  • parse_int(radix) — base-aware integer parsing with prefix stripping.
  • to_csv(headers) / to_tsv(headers) — explicit header column ordering.
  • accumulate(init, fn) and accumulate(fn) — both forms.
  • partition(pred) — chained and standalone.
  • approx_count_distinct() — HyperLogLog.
  • missing("k1", "k2", ...) — returns missing-keys array.
  • get_path("a/b/c") and get_path("a.b.c") — multi-segment paths.
  • dedent() — common-prefix removal.
  • remove(pred) — predicate evaluated.
  • enumerate() — survives composition with map / filter.
  • pairwise() — works on path sources.
  • .has(v) returns boolean.
  • rec(fn) fixpoint via deep structural equality.
  • rec(fn, cond) — iterate while cond(@) holds, capped at 10 000 iters.
  • update(path, fn) and functional .update({...}) — see Path Mutation.
  • ✅ Filtered wildcard [* if pred].
  • ✅ Wildcard chain modify $.xs[*].field.modify(@).
  • ✅ Object literal as method receiver {a: 1}.keys() and ({a: 1}).keys().
  • ✅ Regex escape: "\d" and "\\d" both parse as digit class.
  • ✅ Path-call scalar unwrap: $.s.upper() → "HELLO" (was ["HELLO"]). Scalar OneToOne builtins on path receivers dispatch directly via apply_one; opt out per-builtin with BuiltinSpec::never_unwrap().
  • to_json on array path: $.users.to_json() → single JSON document (was per-element JSON strings).
  • zip_shape({a, b}) object-shape arg form.
  • group_shape(key) 1-arg key projection (lambda or bare ident).
  • indent("> ") accepts a string prefix in addition to integer count.
  • ✅ Bare-path .field inside method args ($.users.filter(.active) ≡ $.users.filter(@.active)).
  • ✅ Double-quoted string escape "{\"a\":1}".from_json() parses.

The remaining sections cover what is still open and what is intentional:

  1. Open engine items
  2. Design choices (intentional, won't change)
  3. Argument / receiver shape rules

1. Open engine items

1.1 rec() no-arg

rec requires a step expression — there is no defined no-arg semantic. The closest match is walk(fn) for traversal-style transforms or rec(fn) for fixpoint iteration. May be retired or aliased to a default walker in a later release.

1.2 rec(fn) runaway iteration cap

Calls to rec(fn) where fn never reaches a deep-structural fixed point are bounded at 10 000 iterations and then error. The error message names the cap and recommends rec(fn, cond) for explicit bounding. Whether an arbitrary fn converges is undecidable in general, so no static check can rule out the worst case; the documented, loudly reported cap is the safeguard.
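The capped fixpoint loop described above looks roughly like this in Rust. It is a sketch of the technique, not jetro's rec: fixpoint and MAX_ITERS are hypothetical names, and PartialEq stands in for deep structural equality.

```rust
const MAX_ITERS: usize = 10_000;

// Fixpoint iteration in the spirit of rec(fn): apply `step` until the value no
// longer changes (structural equality via PartialEq), erroring at the cap.
fn fixpoint<T: PartialEq>(init: T, step: impl Fn(&T) -> T) -> Result<T, String> {
    let mut cur = init;
    for _ in 0..MAX_ITERS {
        let next = step(&cur);
        if next == cur {
            return Ok(cur); // reached a fixed point
        }
        cur = next;
    }
    Err(format!(
        "no fixed point after {} iterations; bound the loop with an explicit condition",
        MAX_ITERS
    ))
}
```

Halving an integer converges to 0 within a dozen steps; incrementing never converges and hits the cap.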


2. Design choices

2.1 No in operator

in would be ambiguous with let X = Y in Z and for x in xs. Use the infix has operator or the .includes(v) method:

xs has "x"             # ✓ operator
xs.includes("x")       # ✓ method
"x" in xs              # ✗ parse error (intentional)

2.2 replace is single-occurrence

.replace(needle, with) replaces only the first match — JavaScript-style. Use .replace_all for substitute-every behaviour:

"hello hello".replace("hello", "hi")          # → "hi hello"
"hello hello".replace_all("hello", "hi")      # → "hi hi"

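Readers coming from Rust should note that the standard library draws the same line with the names reversed: str::replace substitutes every match, and str::replacen with a count of 1 is the first-only form. The helper names below are hypothetical; only the std calls are real.

```rust
// First occurrence only, like jetro's .replace.
fn replace_first(s: &str, needle: &str, with: &str) -> String {
    s.replacen(needle, with, 1)
}

// Every occurrence, like jetro's .replace_all (and Rust's own str::replace).
fn replace_every(s: &str, needle: &str, with: &str) -> String {
    s.replace(needle, with)
}
```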
2.3 Comments

There are no comments inside a query; strip them client-side. The # annotations in this book's examples are for the reader and are not part of the query.

2.4 [expr] vs {expr}

Inline filter is {predicate}. [expr] is index/slice.

$.xs{@.active}        # ✓ inline filter
$.xs[@.active]        # ✗ index expression

3. Argument / receiver shape rules

3.1 Methods accepting lambda forms

Methods and the lambda forms they accept:

  • filter, find, find_all, find_first, find_one, find_index, indices_where, any, all, take_while, drop_while, remove: (@.x op v), (.x op v), (b => b.x op v), (lambda b: ...)
  • map, flat_map, transform_keys, transform_values, filter_keys, filter_values: same forms
  • sort, unique_by, group_by, count_by, index_by, max_by, min_by: same forms; the named lambda (b => b.x) is preferred for readability

$.books.sort(b => b.year)             # named lambda
$.books.sort(@.year)                  # @-form
$.books.sort(.year)                   # bare-path sugar (≡ @-form)

3.2 Methods that take bare identifiers (no @)

Method                        Form
----------------------------  -----------------------------
pick(field, alias: src, ...)  Bare identifiers, not @.field
omit(field, ...)              Same
rename({old: new, ...})       Object map
missing("k1", "k2", ...)      String literals

$.user.pick(id, name)                 # ✓
$.user.pick(@.id, @.name)             # ✗ parse error
$.user.pick(uid: id)                  # ✓ alias

3.3 Multi-arg lambdas

Two-arg lambdas use parens:

$.orders.equi_join($.customers, "cid", "id", (o, c) => {buyer: c.name})
$.xs.accumulate(0, (a, b) => a + b)

Single-arg array destructure (with optional rest) is supported:

$.entries.map(([k, v]) => {k, v})         # ✓
$.rows.map(([h, ...tail]) => tail)        # ✓ rest binding

Versions

This page reflects v0.5.5 behavior as tested empirically. As the engine catches up, entries here drop.

Open count: 2 engine items + 4 design choices documented.

Glossary

Backend. One of the execution paths the planner can route a node through: Structural, TapeView, TapeRows, TapePath, ValView, MaterializedSource, FastChildren, Interpreted. Selected automatically based on shape and capabilities.

Barrier. A stage that must see all input before emitting output. sort, unique, group_by, window, etc.

Bitmap structural index. A bit-packed index over the simd-json tape that lets ..find, ..shape, ..like, and ..match skip non-matching subtrees in O(1) per node. Used when the document is loaded with the simd-json tape (default).

Borrowed view. A ValueView — a read-only borrowed reference into a parsed document. Zero-copy substrings via Val::StrSlice.

Builtin. One of the 181 methods in jetro's catalog. Each is one impl Builtin for X block in defs.rs with identity, demand law, and runtime layers co-located.

Chain-write. A query ending in a write terminal (.set, .modify, .delete, .unset, .merge, .deep_merge, .append, .prepend) on a rooted path. Rewritten to Expr::Patch by the parser.

Composed stage. A Composed<A, B> pair that fuses two adjacent stages into one virtual call per element.

Demand. The triple (pull, value, order) describing what an operator needs from its source. See Demand Propagation.

Demand law. The rule by which a builtin transforms downstream demand into upstream demand. Encoded in the builtin's BuiltinDemandLaw.

Effect lifting. The patch-fusion pass that batches multiple chain-writes into a single document walk.

Engine. A JetroEngine — a long-lived handle that caches parsed and compiled queries for reuse across documents.

F-string. f"text {expr}" — string with embedded expression interpolation.

Field chain. A path of pure field accesses, e.g. $.a.b.c. Recognised by the planner and routed to fast tape backends.

Jetro. Single-document handle. Jetro::from_bytes(bytes)?.collect(q).

JetroEngine. Multi-document handle with plan/VM caches.

Lambda. A small function value: @, r => body, lambda r: body. All three forms compile identically.

Maranget tree. The decision-tree compilation strategy used for pattern matching. Discriminant tests common to several arms are shared rather than repeated.

Patch. The internal write operation. Generated by both patch $ { … } blocks and chain-write classification.

Patch fusion. The optimizer pass that batches multiple writes into a single walk.

Pipeline. The streaming execution model: Source → Stage* → Sink. One element at a time.

Plan / Logical Plan. Tree-shaped IR between AST and bytecode. Lives in ir/logical.rs.

Plan cache. A cache in JetroEngine that maps (query, context) to a compiled Pipeline. Default capacity 256.

Pull demand. The first lane of Demand: how many inputs must be read. Variants: All, FirstInput(n), LastInput(n), NthInput(i), UntilOutput(n).

Quantifier. A postfix operator on a path step. ? = optional, ! = exactly-one.

Sink. The terminal stage of a pipeline: a reducer, a positional sink, or an implicit collector.

Source. The first stage of a pipeline. Usually a path or array literal.

Streaming. Per-element execution; no buffering.

Tape. The simd-json output: a flat array of tokens describing structural positions in the JSON byte buffer. Used for zero-copy access.

Val. The internal value type. Arc-wrapped compound nodes ensure cheap clones.

Value need. The second lane of Demand: how much of each row's content is required. Variants: None, Predicate, Projection, Numeric, Whole.

View. A ValueView — borrowed read-only access to a value.

VM. The bytecode executor. Used as the universal fallback backend; also provides the path-cache.

Write fusion. Same as patch fusion. See above.