Observer is a deterministic, artifact-oriented verification platform.
It is built for teams whose verification has outgrown a single language runner, a pile of shell glue, and a purely local pass/fail loop.
Observer gives you one explicit model for:
- discovering verification targets through explicit providers
- lowering them into canonical inventory
- expressing expectations in suites
- running them deterministically
- emitting structured reports and derived analytics artifacts
If your current setup feels too magical, too fragile, too hard to compare across builds, or too weak to serve as an operational contract, Observer is aimed directly at that problem.
```mermaid
flowchart LR
  A[Language-native tests or workflow cases] --> B[Explicit provider or filesystem discovery]
  B --> C[Canonical inventory or case set]
  C --> D[Observer suite\nsimple or full]
  D --> E[Deterministic run records\nJSONL report]
  E --> F[Cube / Compare / Compare Index]
  E --> G[Console UX]
  F --> H[Self-contained HTML explorers]
```
Most test tooling is optimized for one runtime, one local feedback loop, and one pass/fail moment.
Observer is for the cases where that is no longer enough.
It is designed for projects that need:
- deterministic execution and canonical artifacts
- explicit provider boundaries instead of implicit discovery conventions
- workflow verification and product certification, not just unit-style test execution
- machine-readable reports that can be analyzed and compared later
- one platform that can span multiple ecosystems cleanly
- one maintained verification topology that can answer product questions directly
Observer behaves more like a build artifact pipeline and product certification layer than a bag of conventions.
Observer is a strong fit if your project has one or more of these problems:
- verification spans more than one language or runtime
- your CI outputs need to be reproducible and mechanically comparable
- shell glue and ad hoc harness code are becoming the real testing framework
- you need to verify workflows, artifacts, or staged pipelines, not just function calls
- you want machine-readable run artifacts that can feed later analysis
Observer is probably not the right first tool if all you need is:
- a lightweight unit test runner for one language
- a purely local red-green loop with no artifact discipline
- no need for canonical inventory, derived reports, or cross-build comparison
This is the shape of a real local flow using the runnable Rust starter already in the repository:
```shell
cd lib/rust/starter
make list
make inventory
cat tests.inv
make run
make verify
```
What that gives you, in order:
- raw provider host discovery
- derived canonical inventory
- the exact public execution contract Observer will run against
- a real suite execution with the human console
- hash and JSONL verification against checked-in expected artifacts
That is the product in miniature.
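Conceptually, the final `make verify` step is a golden check: hash the freshly produced report and compare it against a checked-in expectation. A minimal Python sketch of that idea, where the record text and the use of SHA-256 over raw bytes are illustrative assumptions rather than Observer's exact verification contract:

```python
import hashlib


def report_digest(report_text: str) -> str:
    """Hash a report byte-for-byte; deterministic runs make this stable."""
    return hashlib.sha256(report_text.encode("utf-8")).hexdigest()


def verify(report_text: str, expected_digest: str) -> bool:
    """Golden check: the run verifies iff its digest matches the expectation."""
    return report_digest(report_text) == expected_digest


# Illustrative report line; in practice the golden digest is a checked-in file.
report = '{"k":"case","id":"ledger/applies-ordered-postings","status":"pass"}\n'
golden = report_digest(report)

assert verify(report, golden)
assert not verify(report + "tampered", golden)
```

The point of the pattern is that determinism makes byte-level comparison meaningful: if the run cannot drift, neither can its digest.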
This is the kind of terminal loop Observer is designed to make normal:
```
$ observer run --inventory tests.inv --suite tests.obs --ui compact --report jsonl > report.jsonl
PASS ledger/applies-ordered-postings
PASS ledger/rejects-overdraft
FAIL format/renders-balance-line
Summary 2 pass 1 fail exit 1
Failed:
  format/renders-balance-line

$ observer cube --report report.jsonl --out build-1234.cube.json
{"k":"observer_cube_result","v":"0","out":"build-1234.cube.json","status":"ok"}

$ observer view --cube build-1234.cube.json --out build-1234.html
{"k":"observer_view_result","v":"0","out":"build-1234.html","view_kind":"cube","status":"ok"}
```
The point is not just that a run passed or failed.
The point is that the run became a stable artifact you can inspect, compare, publish, revisit later, and use as part of a larger product verdict.
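One payoff of a JSONL report is that downstream tooling can fold it mechanically. A small Python sketch, assuming a hypothetical record shape with `id` and `status` fields; Observer's actual report schema may differ:

```python
import json


def summarize(jsonl_text: str) -> dict:
    """Fold per-case records into a pass/fail summary plus a failed-case list."""
    passed, failed = 0, []
    for line in jsonl_text.splitlines():
        rec = json.loads(line)
        if rec["status"] == "pass":
            passed += 1
        else:
            failed.append(rec["id"])
    return {"pass": passed, "fail": len(failed), "failed": failed}


report = "\n".join([
    '{"id": "ledger/applies-ordered-postings", "status": "pass"}',
    '{"id": "ledger/rejects-overdraft", "status": "pass"}',
    '{"id": "format/renders-balance-line", "status": "fail"}',
])
print(summarize(report))
# {'pass': 2, 'fail': 1, 'failed': ['format/renders-balance-line']}
```

Because the report is a stable artifact rather than console scrollback, this kind of fold can run in CI, in a dashboard, or months later against an archived build.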
Observer is built around deterministic ordering, canonical normalization, and stable derivation.
That means you can:
- trust outputs in CI
- regenerate goldens mechanically
- diff one build against another
- explain what changed without hand-waving
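The general pattern behind canonical normalization and stable derivation can be sketched in a few lines of Python. This shows the technique only, not Observer's exact normalization rules:

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize with sorted keys and fixed separators so the same logical
    document always yields the same bytes, regardless of insertion order."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def stable_hash(obj) -> str:
    """Hash of the canonical form: equal content implies equal digest."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


# Same logical content, different key order: the digests still agree.
a = {"suite": "tests.obs", "cases": ["x", "y"]}
b = {"cases": ["x", "y"], "suite": "tests.obs"}
assert stable_hash(a) == stable_hash(b)
```

Once every derived artifact passes through a canonical form like this, regenerating goldens and diffing builds become mechanical operations instead of judgment calls.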
Observer does not blur discovery and execution together.
Discovered targets are first lowered into canonical inventory.
That inventory becomes the explicit contract that suites run against.
Observer supports both:
- a simple suite surface for routine expectations
- a full suite surface for richer verification flows involving workflows, artifacts, extraction, branching, and publication
Both lower to one semantic core.
Observer emits machine-readable reports and can derive:
- telemetry summaries
- build cubes
- pairwise compares
- compare-index artifacts across build sets
- self-contained HTML explorer views
That makes post-run analysis a first-class part of the product instead of an afterthought.
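A pairwise compare boils down to a mechanical diff over per-case outcomes. A Python sketch, under the assumption that a build cube can be reduced to a case-to-status map; the shape is hypothetical, not Observer's cube schema:

```python
def compare(prev: dict, curr: dict) -> dict:
    """Mechanical diff of two per-case status maps: what regressed, what was
    fixed, and what appeared or disappeared between two builds."""
    out = {"regressed": [], "fixed": [], "added": [], "removed": []}
    for case in sorted(set(prev) | set(curr)):
        if case not in prev:
            out["added"].append(case)
        elif case not in curr:
            out["removed"].append(case)
        elif prev[case] == "pass" and curr[case] == "fail":
            out["regressed"].append(case)
        elif prev[case] == "fail" and curr[case] == "pass":
            out["fixed"].append(case)
    return out


build_1234 = {"ledger/a": "pass", "format/b": "fail"}
build_1235 = {"ledger/a": "pass", "format/b": "pass", "ledger/c": "fail"}
print(compare(build_1234, build_1235))
# {'regressed': [], 'fixed': ['format/b'], 'added': ['ledger/c'], 'removed': []}
```

Sorting the case union is what keeps the diff itself deterministic, so a compare artifact can be hashed and golden-checked like any other output.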
Observer now lives above individual suite runs too.
It can declare:
- what certifies a product
- which ordered stages define release health
- which artifacts and reports come out of those stages
- what exact contract was satisfied when the product passed
That is a different category of value from simply "we ran some tests".
Observer is not tied to one language runtime or one authoring surface.
The repo currently includes real onboarding paths for:
- shell-oriented workflow verification
- C providers
- Go providers
- .NET providers
- Rust providers
- TypeScript provider authoring
- Python provider authoring
The language-specific APIs are allowed to feel native. The platform contract stays explicit and deterministic.
Observer can run tests, but that is the least interesting thing about it.
JUnit-class tools answer a narrower question:
- did these tests pass in this ecosystem
Observer answers broader product questions:
- what is the explicit execution contract
- what certifies this product
- which staged proofs define release health
- what artifacts were emitted
- how do two runs compare mechanically
If you position Observer as a generic test runner, it sounds interchangeable.
If you position it as a verification platform with canonical contracts, product certification, and derived analytics, its actual teeth become visible.
The observer CLI already ships real operator tooling for:
- `derive-inventory`
- `hash-inventory`
- `hash-suite`
- `report-header`
- `hash-product`
- `certify`
- `run`
- `summarize-telemetry`
- `cube`
- `cube-product`
- `compare`
- `view`
- `doctor`
- `completion`
- `manpage`
It also includes:
- serious built-in help and runnable examples
- human-oriented console modes plus machine-oriented report output
- version plus build stamping
- licensing output
- shell completions
- manpage generation
```shell
observer derive-inventory --config observer.toml --provider rust > tests.inv
observer run --inventory tests.inv --suite tests.obs --surface simple --analytics --report jsonl > build-1234.report.jsonl
observer cube --report build-1234.report.jsonl --out build-1234.cube.json
observer compare --cube build-1234.cube.json --cube build-1235.cube.json --out compare.json
observer view --compare compare.json --out compare.html
```
That flow is a good summary of the product thesis:
- explicit provider boundary
- canonical execution contract
- deterministic run records
- derived analysis artifacts
- local, inspectable outputs
Observer now has a first-class product layer above individual suites.
This is the new part of the system.
It exists for products that are only considered ready when several heterogeneous verification areas pass together, such as:
- unit suites plus workflow corpus suites
- producer and consumer compatibility suites
- server, client, and contract suites
- compiler unit, golden, and pipeline suites
Instead of encoding that rule in shell glue or CI YAML, you can now declare one product definition that names the stages, their working directories, and the certification rule.
A product definition is a canonical JSON file, typically `product.json`, that declares:
- one stable `product_id`
- one certification rule such as `all_pass`
- an ordered list of certification stages
- one runner contract per stage

In v0, each stage is an `observer_suite` runner. That means the product layer reuses normal Observer suites as the stage-level verification mechanism.
Typical shape:
```json
{
  "k": "observer_product",
  "v": "0",
  "product_id": "demo",
  "certification_rule": "all_pass",
  "stages": [
    {
      "stage_id": "unit",
      "runner": {
        "k": "observer_suite",
        "cwd": "unit",
        "suite": "tests.obs",
        "inventory": "tests.inv",
        "surface": "simple",
        "mode": "default"
      }
    },
    {
      "stage_id": "workflow",
      "runner": {
        "k": "observer_suite",
        "cwd": "workflow",
        "suite": "tests.obs",
        "surface": "full",
        "mode": "default"
      }
    }
  ]
}
```

The product layer adds three important CLI commands.
`hash-product`
- parses a product definition
- normalizes it canonically
- emits one stable `product_sha256`

`certify`
- executes the declared stages in source order
- changes into each stage's declared `cwd`
- runs the stage suite using the stage's own suite, inventory, config, surface, and mode
- writes each child suite report under that stage's local `.observer/product/` directory
- emits one canonical product report on stdout
- returns one final exit code for the product verdict

`cube-product`
- reads a product report
- resolves the child suite reports recorded by `certify`
- derives one build cube per stage
- derives one compare-index across those stage cubes
- lets existing `view` flows render the product analytics outputs directly
`certify` produces two layers of evidence:
- a product report on stdout describing the product header, stage outcomes, and final summary
- one child suite report per stage, written locally under that stage's `.observer/product/` directory
That split is deliberate.
The product report explains the product-level verdict.
The child reports preserve the normal suite-level evidence for each certification stage.
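The product-level verdict itself is a small function of the stage outcomes. A Python sketch of the v0 `all_pass` rule, with illustrative stage names and verdict strings rather than Observer's exact report records:

```python
def certify_verdict(rule: str, stage_outcomes: dict) -> str:
    """Apply a certification rule over per-stage verdicts.
    Under 'all_pass', the product passes only if every stage passed."""
    if rule == "all_pass":
        ok = all(v == "pass" for v in stage_outcomes.values())
        return "pass" if ok else "fail"
    raise ValueError(f"unknown certification rule: {rule}")


print(certify_verdict("all_pass", {"unit": "pass", "workflow": "pass"}))  # pass
print(certify_verdict("all_pass", {"unit": "pass", "workflow": "fail"}))  # fail
```

Keeping the rule this explicit is the point: the product verdict is computed from named stage outcomes, not implied by whichever CI job happened to run last.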
```shell
observer hash-product --product product.json
observer certify --product product.json > product.default.jsonl
observer cube-product --report product.default.jsonl --root . --out analytics/product
observer view --compare-index analytics/product/product.compare-index.json --out product.html
```
This is the new top-level workflow when one product is certified by multiple Observer suites together.
Observer now uses that same product layer on itself, through the repo-owned `product.json` contract and the stage tree under `tests`.
The implementation-level contract for the product layer lives in its own spec, which defines the canonical product JSON shape, normalization and hashing semantics, product report records, and the initial CLI surface.
If you want to get hands-on quickly, start here:
- product.json plus tests for the repo-owned Observer self-certification flow
- examples/README.md for the structural and operational manual that explains how examples hand off into a real product-owned verification tree
- lib/shell/starter-pipeline for staged artifact workflows without writing a provider library
- examples/product-certify for the new top-level product-certification flow over a unit stage plus a workflow stage
- lib/c/starter for a standalone C provider host
- lib/go/starter for a standalone Go provider host
- lib/java/starter for a standalone Java provider host
- lib/java/starter-embedded if your Java application already owns its CLI and you want `myapp observe ...`
- examples/java-consumer-maven for a normal Maven-shaped Java consumer project
- examples/java-consumer-gradle for a normal Gradle-shaped Java consumer project using the optional JUnit 5 bridge
- lib/go/starter-embedded if your Go application already owns its CLI and you want `myapp observe ...`
- lib/go/starter-embedded-failure if you want the Go embedded path plus an intentional failing example
- lib/dotnet/starter for a standalone .NET provider host
- lib/dotnet/starter-embedded if your .NET application already owns its CLI and you want `myapp observe ...`
- lib/python/starter for a standalone Python provider host
- lib/python/starter-embedded if your Python application already owns its CLI and you want `myapp observe ...`
- lib/rust/starter for a standalone Rust provider host
- lib/rust/starter-embedded if your Rust application already owns its CLI and you want `myapp observe ...`
- lib/rust/starter-embedded-failure if you want the embedded path plus an intentional failing example
Practical manuals live in:
- lib/shell/HOWTO.md and lib/shell/README.md
- lib/c/HOWTO.md and lib/c/README.md
- lib/go/HOWTO.md and lib/go/README.md
- lib/java/HOWTO.md and lib/java/README.md
- lib/java-junit5/README.md for the optional Java JUnit 5 bridge
- lib/dotnet/HOWTO.md and lib/dotnet/README.md
- lib/python/HOWTO.md and lib/python/README.md
- lib/rust/HOWTO.md and lib/rust/README.md
Use Observer if you want:
- a verification platform with explicit contracts instead of magical discovery assumptions
- one model that covers tests, workflows, artifacts, and analysis together
- deterministic and canonical artifacts you can hash, diff, and trust
- a provider model that lets language-native authoring surfaces plug into one core platform
- local-first analytics and comparison flows without backend infrastructure
- a CLI that serves both automation and humans cleanly
Observer is a particularly good fit for:
- language tooling and compiler projects
- staged build and artifact pipelines
- teams that care about goldens, reproducibility, and determinism
- polyglot environments where one language-specific runner is not enough
- projects that need both execution and post-run analysis
A normal test runner is often enough if all you want is:
- one language
- one runtime
- one local pass/fail loop
- no canonical inventory layer
- no workflow artifact verification
- no structured post-run analytics
Observer is for the cases where that simplicity stops being enough.
It is for teams that need a stronger execution contract, stronger determinism, and better artifact discipline.
Further reading:
- FEATURES.md for the product-level feature pitch
- OBSERVER.md for the platform definition and spec map
- specs/00-architecture.md for the architectural core
- specs/13-provider-authoring.md for the provider authoring model
- specs/30-suite.md for the suite model
- specs/40-reporting.md for reporting semantics
- specs/50-workflow-verification.md for workflow verification
Observer uses an explicit split licensing model:
- core platform and repository materials: `GPL-3.0-or-later`
- files under `lib/*`: `MIT`

See LICENSING.md for the exact boundary and lib/LICENSE for the MIT text used by the library subtree.
This repository already contains a working reference implementation, a serious CLI, runnable starters, provider libraries, published sample artifacts, and conformance coverage.
It is not a vague concept repo.
It is the beginning of a real verification platform.