Clinker

Clinker is a pure-Rust, memory-bounded CLI ETL engine for streaming transformation of CSV, JSON, XML, and fixed-width data. It ships as a single static binary with no interpreter, no runtime, and no install dependencies.

Pipelines are declared in YAML. Data transformation logic is written in CXL, a custom expression language purpose-built for ETL. Together they replace legacy tools like Informatica, SSIS, Talend, and NiFi with something deterministic, lightweight, and easy to reason about.

Why Clinker?

Single binary, zero dependencies. Download it, run it. No JVM, no Python, no package manager. Works on any Linux server out of the box.

Good neighbor on busy servers. Clinker enforces a strict memory ceiling (default 256 MB) so it can run alongside JVM applications, databases, and other services without competing for RAM. Aggregation spills to disk when memory pressure rises.

Reproducible output. Given the same input and pipeline, Clinker produces byte-identical output across runs. No nondeterminism from thread scheduling, hash randomization, or floating-point reordering.

Operability-first design. Per-stage metrics, dead-letter queues for error records, explain plans for understanding execution, and structured exit codes for scripting. Built for production from day one.

Two binaries:

Binary	Purpose
`clinker`	Run pipelines against real data
`cxl`	Check, evaluate, and format CXL expressions interactively

A taste of Clinker

Here is a complete pipeline that reads a customer CSV, filters to active customers, classifies them into tiers, and writes the result:

pipeline:
  name: customer_etl

nodes:
  - type: source
    name: customers
    config:
      name: customers
      type: csv
      path: "./data/customers.csv"
      schema:
        - { name: customer_id, type: int }
        - { name: first_name, type: string }
        - { name: last_name, type: string }
        - { name: status, type: string }
        - { name: lifetime_value, type: float }

  - type: transform
    name: enrich
    input: customers
    config:
      cxl: |
        filter status == "active"
        emit customer_id = customer_id
        emit full_name = first_name + " " + last_name
        emit tier = if lifetime_value >= 10000 then "gold" else "standard"

  - type: output
    name: result
    input: enrich
    config:
      name: enriched
      type: csv
      path: "./output/enriched_customers.csv"

Run it:

clinker run customer_etl.yaml

That is the entire workflow. No project scaffolding, no configuration files, no compile step. One YAML file, one command.

Next steps

Installation – download the binary and verify it works
Your First Pipeline – build and run a pipeline step by step
Key Concepts – understand the mental model behind Clinker pipelines

Keyboard shortcuts

Clinker User Guide

Clinker

Why Clinker?

A taste of Clinker

Next steps