frogfishio/observer

Observer

Observer is a deterministic, artifact-oriented verification platform.

It is built for teams whose verification has outgrown a single language runner, a pile of shell glue, and a purely local pass/fail loop.

Observer gives you one explicit model for:

  • discovering verification targets through explicit providers
  • lowering them into canonical inventory
  • expressing expectations in suites
  • running them deterministically
  • emitting structured reports and derived analytics artifacts

If your current setup feels too magical, too fragile, too hard to compare across builds, or too weak to serve as an operational contract, Observer is aimed directly at that problem.

flowchart LR
	A[Language-native tests or workflow cases] --> B[Explicit provider or filesystem discovery]
	B --> C[Canonical inventory or case set]
	C --> D[Observer suite\nsimple or full]
	D --> E[Deterministic run records\nJSONL report]
	E --> F[Cube / Compare / Compare Index]
	E --> G[Console UX]
	F --> H[Self-contained HTML explorers]

Why It Exists

Most test tooling is optimized for one runtime, one local feedback loop, and one pass/fail moment.

Observer is for the cases where that is no longer enough.

It is designed for projects that need:

  • deterministic execution and canonical artifacts
  • explicit provider boundaries instead of implicit discovery conventions
  • workflow verification and product certification, not just unit-style test execution
  • machine-readable reports that can be analyzed and compared later
  • one platform that can span multiple ecosystems cleanly
  • one maintained verification topology that can answer product questions directly

Observer behaves more like a build artifact pipeline and product certification layer than a bag of conventions.

Who It Is For

Observer is a strong fit if your project has one or more of these problems:

  • verification spans more than one language or runtime
  • your CI outputs need to be reproducible and mechanically comparable
  • shell glue and ad hoc harness code are becoming the real testing framework
  • you need to verify workflows, artifacts, or staged pipelines, not just function calls
  • you want machine-readable run artifacts that can feed later analysis

Observer is probably not the right first tool if all you need is:

  • a lightweight unit test runner for one language
  • a purely local red-green loop with no artifact discipline
  • test results without canonical inventory, derived reports, or cross-build comparison

See It In 60 Seconds

This is the shape of a real local flow using the runnable Rust starter already in the repository:

cd lib/rust/starter
make list
make inventory
cat tests.inv
make run
make verify

What that gives you, in order:

  • raw provider host discovery
  • derived canonical inventory
  • the exact public execution contract Observer will run against
  • a real suite execution with the human console
  • hash and JSONL verification against checked-in expected artifacts

That is the product in miniature.
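The `make verify` step compares freshly generated artifacts against checked-in expected ones. The sketch below shows that idea in Python under one stated assumption: verification means byte-level SHA-256 equality. The function names are hypothetical, and the starter's actual Makefile may hash or normalize differently.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_against_golden(actual: Path, expected: Path) -> bool:
    """Hypothetical check: True when the new artifact is byte-identical to the golden."""
    return sha256_file(actual) == sha256_file(expected)
```

Byte equality only works as a check because Observer's outputs are designed to be deterministic. With nondeterministic output, a golden comparison like this would fail spuriously.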

What It Looks Like In Practice

This is the kind of terminal loop Observer is designed to make normal:

$ observer run --inventory tests.inv --suite tests.obs --ui compact --report jsonl > report.jsonl
PASS  ledger/applies-ordered-postings
PASS  ledger/rejects-overdraft
FAIL  format/renders-balance-line

Summary  2 pass  1 fail  exit 1
Failed:
	format/renders-balance-line

$ observer cube --report report.jsonl --out build-1234.cube.json
{"k":"observer_cube_result","v":"0","out":"build-1234.cube.json","status":"ok"}

$ observer view --cube build-1234.cube.json --out build-1234.html
{"k":"observer_view_result","v":"0","out":"build-1234.html","view_kind":"cube","status":"ok"}

The point is not just that a run passed or failed.

The point is that the run became a stable artifact you can inspect, compare, publish, revisit later, and use as part of a larger product verdict.
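Because the report is JSONL, any tool can rebuild the console summary from it later. The sketch below derives the "2 pass 1 fail exit 1" summary shown above from report lines. The record shape (`k: "case_result"`, `id`, `status`) is a hypothetical stand-in, not Observer's documented report schema.

```python
import json


def summarize(jsonl_text: str) -> tuple[int, int, list[str], int]:
    """Return (pass count, fail count, failed ids, exit code) from JSONL text.

    Assumes hypothetical records like {"k": "case_result", "id": ..., "status": ...};
    all other record kinds are skipped.
    """
    passed = 0
    failed: list[str] = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        if rec.get("k") != "case_result":  # hypothetical record kind
            continue
        if rec["status"] == "pass":
            passed += 1
        else:
            failed.append(rec["id"])
    # Mirror the console convention from the example: any failure means exit 1.
    exit_code = 1 if failed else 0
    return passed, len(failed), failed, exit_code
```

Usage would look like `summarize(Path("report.jsonl").read_text())`. The same report can be summarized again at any later time without rerunning anything.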

What Makes Observer Different

Deterministic by default

Observer is built around deterministic ordering, canonical normalization, and stable derivation.

That means you can:

  • trust outputs in CI
  • regenerate goldens mechanically
  • diff one build against another
  • explain what changed without hand-waving
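Deterministic ordering and canonical normalization together mean that the same results always serialize to the same bytes, whatever order they were produced in. The sketch below illustrates the principle and assumes each record carries a unique `id` to sort by. It is not Observer's actual normalization.

```python
import json


def canonical_records(records: list[dict]) -> str:
    """Serialize records as JSONL with a stable record order and stable key order.

    Assumes (hypothetically) that every record has a unique "id" field.
    """
    ordered = sorted(records, key=lambda r: r["id"])
    lines = (
        # sort_keys fixes key order; compact separators remove whitespace variation.
        json.dumps(r, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for r in ordered
    )
    return "\n".join(lines) + "\n"
```

Two runs that produce identical results in a different order yield identical output. That property is what makes mechanical diffing and golden regeneration safe.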

Canonical inventory as the execution contract

Observer does not blur discovery and execution together.

Discovered targets are first lowered into canonical inventory.

That inventory becomes the explicit contract that suites run against.
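To illustrate what "lowering" means, the sketch below turns raw discovered target names into a sorted, duplicate-free, one-name-per-line list. The format is a hypothetical simplification. The real tests.inv format and the rules `derive-inventory` applies are defined by Observer, not by this sketch.

```python
def lower_to_inventory(discovered: list[str]) -> str:
    """Lower raw discovered names into a canonical one-name-per-line inventory.

    Hypothetical format: trimmed names, blanks dropped, duplicates rejected, sorted.
    """
    seen: set[str] = set()
    for raw in discovered:
        name = raw.strip()
        if not name:
            continue  # ignore blank discovery output
        if name in seen:
            # An explicit contract should not silently merge two targets with one name.
            raise ValueError(f"duplicate target name: {name}")
        seen.add(name)
    # Sorting makes the inventory independent of provider discovery order.
    return "".join(f"{name}\n" for name in sorted(seen))
```

Rejecting duplicates instead of merging them is a deliberate choice in this sketch. When the inventory is the contract, an ambiguous name is better surfaced as an error than resolved silently.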

Verification beyond “run this test file”

Observer supports both:

  • a simple suite surface for routine expectations
  • a full suite surface for richer verification flows involving workflows, artifacts, extraction, branching, and publication

Both lower to one semantic core.

Structured artifacts, not just console text

Observer emits machine-readable reports and can derive:

  • telemetry summaries
  • build cubes
  • pairwise compares
  • compare-index artifacts across build sets
  • self-contained HTML explorer views

That makes post-run analysis a first-class part of the product instead of an afterthought.
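A pairwise compare answers the question "what changed between build A and build B". The sketch below shows that question over two maps of case id to status. The input shape and output keys are hypothetical and are not the structure of Observer's cube or compare artifacts.

```python
def compare_builds(base: dict[str, str], head: dict[str, str]) -> dict[str, list[str]]:
    """Classify differences between two builds' case statuses (hypothetical shape)."""
    shared = base.keys() & head.keys()
    return {
        "added": sorted(head.keys() - base.keys()),
        "removed": sorted(base.keys() - head.keys()),
        # Passed in the base build but not in the head build.
        "regressed": sorted(c for c in shared if base[c] == "pass" and head[c] != "pass"),
        # Failed in the base build but passes in the head build.
        "fixed": sorted(c for c in shared if base[c] != "pass" and head[c] == "pass"),
    }
```

Every list is sorted, so the compare result is itself deterministic and can be hashed or diffed like any other artifact.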

Product-level operational truth

Observer now also operates above the level of individual suite runs.

It can declare:

  • what certifies a product
  • which ordered stages define release health
  • which artifacts and reports come out of those stages
  • what exact contract was satisfied when the product passed

That is a different category of value from simply "we ran some tests".

Polyglot by design

Observer is not tied to one language runtime or one authoring surface.

The repo currently includes real onboarding paths for:

  • shell-oriented workflow verification
  • C providers
  • Go providers
  • .NET providers
  • Rust providers
  • TypeScript provider authoring
  • Python provider authoring

The language-specific APIs are allowed to feel native. The platform contract stays explicit and deterministic.

Not JUnit

Observer can run tests, but that is the least interesting thing about it.

JUnit-class tools answer a narrower question:

  • did these tests pass in this ecosystem

Observer answers broader product questions:

  • what is the explicit execution contract
  • what certifies this product
  • which staged proofs define release health
  • what artifacts were emitted
  • how do two runs compare mechanically

If you position Observer as a generic test runner, it sounds interchangeable.

If you position it as a verification platform with canonical contracts, product certification, and derived analytics, its actual teeth become visible.

What You Can Do Today

The observer CLI already ships real operator tooling for:

  • derive-inventory
  • hash-inventory
  • hash-suite
  • report-header
  • hash-product
  • certify
  • run
  • summarize-telemetry
  • cube
  • cube-product
  • compare
  • view
  • doctor
  • completion
  • manpage

It also includes:

  • serious built-in help and runnable examples
  • human-oriented console modes plus machine-oriented report output
  • version and build stamping
  • licensing output
  • shell completions
  • manpage generation

A Typical Observer Flow

observer derive-inventory --config observer.toml --provider rust > tests.inv
observer run --inventory tests.inv --suite tests.obs --surface simple --analytics --report jsonl > build-1234.report.jsonl
observer cube --report build-1234.report.jsonl --out build-1234.cube.json
observer compare --cube build-1234.cube.json --cube build-1235.cube.json --out compare.json
observer view --compare compare.json --out compare.html

That flow is a good summary of the product thesis:

  • explicit provider boundary
  • canonical execution contract
  • deterministic run records
  • derived analysis artifacts
  • local, inspectable outputs

New: Product Certification

Observer now has a first-class product layer above individual suites.

This is the new part of the system.

It exists for products that are only considered ready when several heterogeneous verification areas pass together, such as:

  • unit suites plus workflow corpus suites
  • producer and consumer compatibility suites
  • server, client, and contract suites
  • compiler unit, golden, and pipeline suites

Instead of encoding that rule in shell glue or CI YAML, you can now declare one product definition that names the stages, their working directories, and the certification rule.

What A Product Definition Is

A product definition is a canonical JSON file, typically product.json, that declares:

  • one stable product_id
  • one certification rule such as all_pass
  • an ordered list of certification stages
  • one runner contract per stage

In v0, each stage is an observer_suite runner. That means the product layer reuses normal Observer suites as the stage-level verification mechanism.

Typical shape:

{
	"k": "observer_product",
	"v": "0",
	"product_id": "demo",
	"certification_rule": "all_pass",
	"stages": [
		{
			"stage_id": "unit",
			"runner": {
				"k": "observer_suite",
				"cwd": "unit",
				"suite": "tests.obs",
				"inventory": "tests.inv",
				"surface": "simple",
				"mode": "default"
			}
		},
		{
			"stage_id": "workflow",
			"runner": {
				"k": "observer_suite",
				"cwd": "workflow",
				"suite": "tests.obs",
				"surface": "full",
				"mode": "default"
			}
		}
	]
}
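Here is a rough sketch of the structural checks the example above implies. It covers the `observer_product` kind, a non-empty `product_id`, the `all_pass` rule, unique `stage_id` values, and `observer_suite` runners. The choice of required runner keys (`cwd`, `suite`, `surface`) is an assumption drawn from the example. The authoritative rules live in the product-layer spec.

```python
def validate_product(doc: dict) -> list[str]:
    """Return human-readable problems with a product definition (empty list = ok).

    Hypothetical checks inferred from the README example, not the official spec.
    """
    errors: list[str] = []
    if doc.get("k") != "observer_product":
        errors.append("k must be 'observer_product'")
    if not isinstance(doc.get("product_id"), str) or not doc["product_id"]:
        errors.append("product_id must be a non-empty string")
    if doc.get("certification_rule") != "all_pass":
        errors.append("this sketch only handles the all_pass rule")
    stages = doc.get("stages")
    if not isinstance(stages, list) or not stages:
        errors.append("stages must be a non-empty list")
        return errors
    seen: set = set()
    for i, stage in enumerate(stages):
        sid = stage.get("stage_id")
        if sid in seen:
            errors.append(f"stage {i}: duplicate stage_id {sid!r}")
        seen.add(sid)
        runner = stage.get("runner", {})
        if runner.get("k") != "observer_suite":
            errors.append(f"stage {sid!r}: v0 runners must be observer_suite")
        for key in ("cwd", "suite", "surface"):  # assumed required keys
            if key not in runner:
                errors.append(f"stage {sid!r}: runner missing {key!r}")
    return errors
```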

How The New Commands Fit Together

The product layer adds three important CLI commands.

hash-product

  • parses a product definition
  • normalizes it canonically
  • emits one stable product_sha256
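The reason for normalizing before hashing is that the hash should track meaning, not formatting. The sketch below shows this under a simple assumed normalization: keys sorted at every level, compact separators, UTF-8. Array order is preserved because stage order is significant. Observer's spec defines its own normalization, so this will not reproduce the digest `hash-product` emits.

```python
import hashlib
import json


def product_sha256(doc: dict) -> str:
    """Hash a product definition after a simple, assumed canonical normalization."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Reordering object keys or reindenting the file leaves the hash unchanged. Reordering stages changes it, because certify runs stages in source order.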

certify

  • executes the declared stages in source order
  • changes into each stage's declared cwd
  • runs the stage suite using the stage's own suite, inventory, config, surface, and mode
  • writes each child suite report under that stage's local .observer/product/ directory
  • emits one canonical product report on stdout
  • returns one final exit code for the product verdict
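The `all_pass` rule reduces the per-stage outcomes to one verdict and one exit code. The sketch below makes three assumptions that the README does not state: outcomes arrive in declared order, an empty stage list does not certify, and the exit codes are 0 and 1. It does not claim to model how certify actually handles a failing stage partway through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageOutcome:
    stage_id: str
    passed: bool


def certify_all_pass(outcomes: list[StageOutcome]) -> tuple[str, list[str], int]:
    """Apply an all_pass rule: return (verdict, failing stage ids, exit code)."""
    # Outcomes are assumed to be in declared source order; failing keeps that order.
    failing = [o.stage_id for o in outcomes if not o.passed]
    # Assumption: an empty stage list certifies nothing, so it counts as a failure.
    certified = bool(outcomes) and not failing
    return ("pass" if certified else "fail"), failing, (0 if certified else 1)
```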

cube-product

  • reads a product report
  • resolves the child suite reports recorded by certify
  • derives one build cube per stage
  • derives one compare-index across those stage cubes
  • lets existing view flows render the product analytics outputs directly

Product Evidence Model

certify produces two layers of evidence:

  • a product report on stdout describing the product header, stage outcomes, and final summary
  • one child suite report per stage written locally under that stage's .observer/product/ directory

That split is deliberate.

The product report explains the product-level verdict.

The child reports preserve the normal suite-level evidence for each certification stage.
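Since each child report is written under its stage's own `.observer/product/` directory, you can find the child reports directly from the product definition. The sketch below maps each stage to that directory and assumes each stage `cwd` is resolved relative to a product root. The child report filenames are not documented here, so the sketch stops at the directory.

```python
from pathlib import Path


def child_report_dirs(product: dict, root: Path) -> dict[str, Path]:
    """Map each stage_id to the directory where certify writes that stage's child report.

    Assumes stage cwd values are relative to `root`; filenames are intentionally not guessed.
    """
    return {
        stage["stage_id"]: root / stage["runner"]["cwd"] / ".observer" / "product"
        for stage in product["stages"]
    }
```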

End-To-End Product Flow

observer hash-product --product product.json
observer certify --product product.json > product.default.jsonl
observer cube-product --report product.default.jsonl --root . --out analytics/product
observer view --compare-index analytics/product/product.compare-index.json --out product.html

This is the new top-level workflow when one product is certified by multiple Observer suites together.

Observer now certifies itself with this same product layer, using the repository's own product.json contract and the stage tree under tests.

Where To Read The Full Contract

The implementation-level contract for the product layer lives in:

That spec defines the canonical product JSON shape, normalization and hashing semantics, product report records, and the initial CLI surface.

Pick A Starting Path

If you want to get hands-on quickly, start here:

Practical manuals live in:

Why Use Observer

Use Observer if you want:

  • a verification platform with explicit contracts instead of magical discovery assumptions
  • one model that covers tests, workflows, artifacts, and analysis together
  • deterministic and canonical artifacts you can hash, diff, and trust
  • a provider model that lets language-native authoring surfaces plug into one core platform
  • local-first analytics and comparison flows without backend infrastructure
  • a CLI that serves both automation and humans cleanly

Good Fit

Observer is a particularly good fit for:

  • language tooling and compiler projects
  • staged build and artifact pipelines
  • teams that care about goldens, reproducibility, and determinism
  • polyglot environments where one language-specific runner is not enough
  • projects that need both execution and post-run analysis

Why Not Just Use A Normal Test Runner

A normal test runner is often enough if all you want is:

  • one language
  • one runtime
  • one local pass/fail loop
  • no canonical inventory layer
  • no workflow artifact verification
  • no structured post-run analytics

Observer is for the cases where that simplicity stops being enough.

It is for teams that need a stronger execution contract, stronger determinism, and better artifact discipline.

Learn More

Licensing

Observer uses an explicit split licensing model:

  • core platform and repository materials: GPL-3.0-or-later
  • files under lib/*: MIT

See LICENSING.md for the exact boundary and lib/LICENSE for the MIT text used by the library subtree.

Status

This repository already contains a working reference implementation, a serious CLI, runnable starters, provider libraries, published sample artifacts, and conformance coverage.

It is not a vague concept repo.

It is the beginning of a real verification platform.
