diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 297cbbe..b091779 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -111,11 +111,24 @@ These are the invariants the codebase should preserve. ## Known Architectural Debt -The deepest remaining architecture gap is still in protocol execution semantics. - -`protocols/` now owns richer semantic descriptors for controls, windows, metrics, ranking, figures, and artifacts, but not all of that meaning is compiled from one executable typed analysis program yet. Some assay semantics still live partly in protocol metadata and partly in compiler behavior. - -That is the next rooted cut. When that work lands, `reader inspect`, `reader protocols`, and `reader explain` should report the semantic analysis program directly instead of primarily reporting compiled plugin mechanics. +The deepest remaining debt is now concentration, not split semantic ownership. + +The compiled semantic program is explicit end-to-end, from bound protocol +through compiled plan to experiment semantics and inspection payloads. The +remaining architecture pressure is that too much assay detail still collects in +four places: + +- `src/reader/protocols/builtins.py` +- `src/reader/protocols/_builtins_plate_reader_variants.py` +- `src/reader/protocols/compiler.py` +- `src/reader/workbench/notebooks/` for retron-review flows + +The public builtin catalog is smaller than before because the heavier +plate-reader variants now live in a family helper, and notebook launch +preflight/runtime state is split from the planner. Even so, those surfaces are +still large enough that future assay families can turn them into semantic +monoliths if new logic is not pushed down into domain modules and +family-specific helpers. ## Extension Guide diff --git a/QUALITY.md b/QUALITY.md index f18226b..0788040 100644 --- a/QUALITY.md +++ b/QUALITY.md @@ -124,7 +124,14 @@ These are the failure classes that quality work should continue to reduce.
## Current Open Quality Debt -The main unresolved quality debt is still semantic, not documentation. Protocol windows, controls, metrics, and ranking are not yet compiled from one executable typed analysis program. Until that changes, there is still some duplicate truth between protocol metadata and compiler behavior. +The main unresolved quality debt is now concentration and documentation drift. + +Semantic ownership is explicit, but two areas still need sustained pressure: + +- large protocol surfaces can accumulate too much family-specific behavior in a + few files +- maintainer docs can drift unless they keep linking to real code surfaces and + the docs integrity check stays part of the gate ## Definition Of Done diff --git a/README.md b/README.md index 52c16d4..3391546 100644 --- a/README.md +++ b/README.md @@ -4,16 +4,31 @@ ![reader banner](assets/reader-banner.svg) -`reader` is a toolkit for organizing experiment directories and running config-driven analysis pipelines over structured assay data. Each experiment has a clear working layout: raw inputs live in `inputs/`, optional notebooks live in `notebooks/`, generated results live in `outputs/`, and a `reader/v7` `config.yaml` describes what should be run. +`reader` organizes experiment directories and runs config-driven analysis +pipelines over structured assay data. Each experiment uses a fixed layout: +raw inputs in `inputs/`, optional notebooks in `notebooks/`, generated results +in `outputs/`, and a `reader/v7` `config.yaml` that declares the run. --- ## Documentation -- [Documentation index](docs/README.md): complete map of user docs, reference docs, and maintainer docs. -- [Getting started](docs/guides/getting_started.md): install `reader`, verify the environment, and inspect the first experiment. -- [Preflight, run, verify](docs/guides/preflight_run_verify.md): deterministic path for inspecting, executing, and checking one experiment. 
-- [Automation and JSON](docs/guides/automation.md): machine-readable discovery, inspection, and preflight surfaces. +- [Documentation index](docs/README.md): full map of user, reference, and + maintainer docs. +- [Getting started](docs/guides/getting_started.md): install `reader`, check + the environment, and inspect a first experiment. +- [Preflight, run, verify](docs/guides/preflight_run_verify.md): inspect, + validate, and execute one experiment. +- [Automation and JSON](docs/guides/automation.md): machine-readable + discovery, inspection, and preflight routes. +- [Data Operations Plan](docs/guides/data_operations_plan.md): classify data + before intake and capture the minimum metadata needed for reliable reuse. +- [Experiment bootstrap](docs/guides/experiment_bootstrap.md): create an + experiment from local or Drive-backed inputs. +- [Workbench gardening](docs/guides/workbench_gardening.md): maintainer + workflow for architecture and docs cleanup. - [CLI reference](docs/core/cli.md): full command reference. -- [Configuring `reader/v7`](docs/core/pipeline.md): the public authoring surface for experiment configs. -- [Repo maintenance](docs/repo-maintenance.md): maintainer verification and CI lanes. +- [Configuring `reader/v7`](docs/core/pipeline.md): schema and protocol-owned + config surface. +- [Repo maintenance](docs/repo-maintenance.md): repo-wide checks, CI, and + maintainer routines. diff --git a/RELIABILITY.md b/RELIABILITY.md index ab6ca9a..8cc9b7f 100644 --- a/RELIABILITY.md +++ b/RELIABILITY.md @@ -123,10 +123,10 @@ Use the cheapest command that answers the current question. 
- use `reader validate --no-files` for schema-only checks - use `reader run --dry-run` before executing a slice -- use `reader ls --details --format json` for fleet-wide inspection +- use `reader ls --details --format json` for workbench-wide inspection - use `reader inspect --format json` for one experiment’s current state -This follows the same harness principle highlighted in OpenAI’s harness-engineering article: better harnesses reduce retries by shortening the path from action to trustworthy feedback. +Cheaper routes reduce retries because they shorten the path from action to trustworthy feedback. Use [docs/guides/automation.md](./docs/guides/automation.md) for the compact JSON route. @@ -140,9 +140,14 @@ Use [docs/guides/automation.md](./docs/guides/automation.md) for the compact JSO ## Current Reliability Debt -The largest remaining reliability debt is semantic rather than operational. Protocol controls, windows, metrics, and ranking are still not one executable typed analysis DAG. That means some assay truth is still split between protocol metadata and compiler behavior. +The largest remaining reliability debt is no longer semantic ambiguity. It is +change-surface concentration and documentation freshness. -Operationally, the workbench is more reliable when inspection and dry-run routes are used first. Architecturally, full reliability requires finishing that semantic cut. +Operationally, the workbench is reliable when inspection and dry-run routes are +used first. Structurally, the higher-risk failures now come from oversized +maintainer surfaces such as the protocol kernel and retron notebook bundle, or +from docs that stop matching those surfaces closely enough for operators and +agents to trust them. ## Related Docs diff --git a/docs/README.md b/docs/README.md index 8e1460c..687b289 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,39 +1,57 @@ # Documentation index -Use this index to find the smallest document that matches what you need.
Start -with the guides if you want to understand how `reader` is organized or how an -experiment moves from inputs to outputs. Use the reference pages when you need -exact CLI or config details. +Use this index to find the smallest doc that answers the current question. +Start with the guides if you want to see how `reader` moves from inputs to +outputs. Use the reference pages when you need exact CLI or config details. ## Start here -- [Getting started](./guides/getting_started.md): install `reader`, check the environment, and inspect the first experiment. -- [Common tasks](./guides/common_routes.md): shortest command paths for discovery, validation, execution, and JSON output. +- [Getting started](./guides/getting_started.md): install `reader`, check the + environment, and inspect a first experiment. +- [Common tasks](./guides/common_routes.md): shortest routes for discovery, + validation, execution, and JSON output. ## Core workflows -- [Preflight, run, verify](./guides/preflight_run_verify.md): deterministic operating path for one experiment. -- [Automation and JSON](./guides/automation.md): machine-readable discovery, inspection, and preflight surfaces. +- [Preflight, run, verify](./guides/preflight_run_verify.md): inspect, + validate, and execute one experiment. +- [Automation and JSON](./guides/automation.md): machine-readable discovery, + inspection, and preflight routes. +- [Data Operations Plan](./guides/data_operations_plan.md): classify datasets, + capture metadata minimums, and keep intake decisions explicit. +- [Experiment bootstrap](./guides/experiment_bootstrap.md): create an + experiment from local or Drive-backed inputs and verify the run. - [End-to-end demo](./guides/demo.md): one concrete walkthrough from discovery to outputs. ## User guides -- [Retron sponge screen guide](./guides/retron_sponge_screen.md): matched-control sponge assay setup, runtime flow, plots, and exports. 
-- [Notebooks](./guides/notebooks.md): notebook scaffolding and Marimo usage in experiment directories. +- [Retron sponge screen guide](./guides/retron_sponge_screen.md): matched-control + sponge assay setup, runtime flow, plots, and exports. +- [Notebooks](./guides/notebooks.md): notebook scaffolding and Marimo usage in + experiment directories. - [Marimo reference](./guides/marimo_reference.md): notebook widgets, patterns, and examples. ## Reference - [CLI reference](./core/cli.md): full command reference. -- [Configuring `reader/v7`](./core/pipeline.md): config schema and protocol-owned authoring surface. +- [Configuring `reader/v7`](./core/pipeline.md): config schema and protocol-owned settings. ## Maintainer docs -- [Repo change gate](./repo-change-gate.md): minimum gate before landing tracked changes. -- [Repo maintenance](./repo-maintenance.md): repo-wide verification, CI lanes, and maintenance surfaces. -- [Plugin development](./core/plugins.md): add or extend ingest, transform, plot, export, and validator plugins. +- [Repo change gate](./repo-change-gate.md): minimum gate before landing + tracked changes. +- [Repo maintenance](./repo-maintenance.md): repo-wide checks, CI, and + maintenance guidance. +- [Workbench gardening](./guides/workbench_gardening.md): maintainer workflow + for architecture, docs, and verification-surface cleanup. +- [SFXI triptych sequence plot spec](./dev/sfxi_triptych_sequence_plugin_spec.md): + dev spec for promoting the SFXI triptych/BaseRender preview into a formal + plot plugin with a versioned dnadesign render contract. +- [Plugin development](./core/plugins.md): add or extend ingest, transform, + plot, export, and validator plugins. - [Architecture](../ARCHITECTURE.md): system structure, ownership boundaries, and invariants. -- [Design](../DESIGN.md): product and information-design rules for the public surface. +- [Design](../DESIGN.md): product and information-design rules for the public + UI and docs. 
- [Quality](../QUALITY.md): quality bar, evidence expectations, and failure taxonomy. - [Reliability](../RELIABILITY.md): preflight, run, verify, and recovery expectations. - [Security](../SECURITY.md): trust boundaries and safe defaults. diff --git a/docs/core/cli.md b/docs/core/cli.md index bd15e72..9526831 100644 --- a/docs/core/cli.md +++ b/docs/core/cli.md @@ -3,7 +3,7 @@ This page is the full CLI reference. For setup and the shortest common paths, start with [Getting started](../guides/getting_started.md) and [Common tasks](../guides/common_routes.md). For the operating loop and -machine-readable routes, use [Preflight, run, verify](../guides/preflight_run_verify.md) +machine-readable output, use [Preflight, run, verify](../guides/preflight_run_verify.md) and [Automation and JSON](../guides/automation.md). A typical order is: @@ -23,8 +23,9 @@ uv run reader CONFIG|DIR|INDEX [options] If `CONFIG|DIR|INDEX` is omitted, `uv run reader` searches upward from the current working directory for `config.yaml`. If a numeric index is provided, it is resolved against the nearest `experiments/` -directory (or `./experiments` if none is found) using the same default experiment inventory as -`uv run reader ls`; indices shown by `uv run reader ls --all` are accepted too. +directory (or `./experiments` if none is found) using the same default experiment list as +`uv run reader ls`. Hidden scaffold/template entries shown by `uv run reader ls --all` must be +addressed by explicit path. 
--- @@ -36,20 +37,20 @@ List experiments: uv run reader ls --root experiments ``` -Show protocol ids, selected-plan summaries, and current output counts: +Show protocol ids, selected step summaries, and current output counts: ```bash uv run reader ls --root experiments --details ``` -Add readiness state so the inventory tells you whether each experiment is -draft/template, blocked, ready to run, or already has a record catalog: +Add readiness state so the list tells you whether each experiment is +draft/template, blocked, ready to run, or already has a records catalog: ```bash uv run reader ls --root experiments --details --readiness ``` -Emit the same inventory as JSON for agents or automation: +Emit the same list as JSON for agents or automation: ```bash uv run reader ls --root experiments --details --format json @@ -57,14 +58,14 @@ uv run reader ls --root experiments --details --readiness --format json ``` The JSON payload uses explicit `catalog`, `selection`, `summary`, and -`experiments` blocks so agents do not need to reconstruct fleet state by +`experiments` blocks so agents do not need to reconstruct the experiment list by walking every row or guessing which filters produced the current view. When `--readiness` is enabled, `selection.readiness` is `true`, each experiment -entry gains a `readiness` block, and `summary.by_readiness` counts the fleet by +entry gains a `readiness` block, and `summary.by_readiness` counts experiments by `config_error`, `draft`, `template`, `dependency_blocked`, `blocked`, `runnable`, `legacy_outputs_present`, or `records_ready`. 
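Automation can gate on `summary.by_readiness` without walking every experiment entry. The sketch below parses the JSON text printed by `uv run reader ls --root experiments --details --readiness --format json`. It assumes `by_readiness` is a flat mapping from state name to count. Which states count as blocking is a choice made for this example, not a `reader` rule:

```python
import json

# States treated as blocking in this sketch (an assumption, not reader policy).
BLOCKING_STATES = ("config_error", "dependency_blocked", "blocked")


def blocking_counts(payload_text: str) -> dict[str, int]:
    """Return non-zero counts of blocking readiness states from `reader ls` JSON."""
    payload = json.loads(payload_text)
    # Without --readiness, the summary carries no readiness counts to check.
    if not payload.get("selection", {}).get("readiness"):
        raise ValueError("payload was produced without --readiness")
    by_readiness = payload.get("summary", {}).get("by_readiness", {})
    return {
        state: by_readiness[state]
        for state in BLOCKING_STATES
        if by_readiness.get(state, 0) > 0
    }
```

In a script, pass it the captured stdout of that command (for example `subprocess.run(cmd, capture_output=True, text=True).stdout`) and fail the job when the returned dict is non-empty.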
-Filter the inventory down to one assay family, one lifecycle, or just broken configs: +Filter the list down to one assay family, one lifecycle, or just broken configs: ```bash uv run reader ls --root experiments --details --protocol plate_reader/dual_reporter_screen @@ -80,6 +81,9 @@ Include scaffold/template directories too: uv run reader ls --root experiments --all ``` +Use explicit paths for scaffold/template configs when acting on them. Numeric +indexes only target the default `uv run reader ls` experiment list. + If `--root` is omitted, `uv run reader` auto-detects the nearest `experiments/` directory. Inspect plugins, protocols, and notebook templates: @@ -97,6 +101,9 @@ uv run reader protocols plate_reader/retron_sponge_screen uv run reader protocols --example-config uv run reader protocols --family screen_analysis uv run reader protocols --family matched_control_screen +uv run reader dop classes +uv run reader dop classes --protocol plate_reader/retron_sponge_screen --format json +uv run reader dop ready-specs --format json uv run reader notebook --list-templates ``` @@ -108,7 +115,7 @@ uv run reader init ./experiments/20260317_new_assay --protocol Use `plate_reader/dual_reporter_screen` for CFP/YFP-style dual-reporter panels. Use `plate_reader/single_reporter_screen` for RFP-or-other single-reporter panels normalized to a configured denominator. Use `plate_reader/retron_sponge_screen` when the assay contract depends on matched same-sensor tetO controls plus compiled burden, leakiness, induced-effect, and cross-sensor ranking nodes. -For the matched-control sponge workflow itself, use the [Retron sponge screen guide](../guides/retron_sponge_screen.md). That guide maps the direct-ratio analysis sequence, the compiled semantic tables, and the retron-specific plot/export surface. +For the matched-control sponge workflow itself, use the [Retron sponge screen guide](../guides/retron_sponge_screen.md). 
That guide maps the direct-ratio analysis sequence, the compiled assay tables, and the retron-specific plots and exports. Inspect one experiment end to end: @@ -116,13 +123,18 @@ Inspect one experiment end to end: uv run reader inspect CONFIG|DIR|INDEX ``` -Emit the experiment as layered JSON with `authoring`, `semantics`, and +Emit the experiment as structured JSON with `authoring`, `semantics`, and `implementation`: ```bash uv run reader inspect CONFIG|DIR|INDEX --format json ``` +In JSON mode, `semantics.program` is the authored view of the active semantic +program for the experiment. The same program, with execution bindings and +coverage, lives under `implementation.compiled.semantic_program` beside the +compiled plugin wiring. + List just the pipeline chain and bindings: ```bash @@ -136,16 +148,16 @@ Guided walkthrough: uv run reader demo ``` -Protocol descriptions are the main discovery surface for assay-specific inputs +Protocol descriptions are the main place to check assay-specific inputs and outputs. For the compact route, use [Common tasks](../guides/common_routes.md). -For machine-readable contracts, use [Automation and JSON](../guides/automation.md). +For machine-readable output, use [Automation and JSON](../guides/automation.md). In short: -- `uv run reader protocols ` shows the protocol authoring surface, selected outputs, and compiled defaults. +- `uv run reader protocols ` shows the protocol inputs, selected outputs, and compiled defaults. - `uv run reader protocols --example-config` prints a starter `reader/v7` outline. - `uv run reader inspect`, `config`, `steps`, and `explain` show one bound experiment; JSON mode uses shared `authoring`, `semantics`, and `implementation` sections. -- `uv run reader ls --details --readiness` is the fleet-level inventory and preflight view. +- `uv run reader ls --details --readiness` is the experiment list with preflight state. 
- `uv run reader plot --list`, `uv run reader export --list`, and `uv run reader records` show selected outputs and generated records. - `uv run reader plugins --protocol --category ...` scopes registry inspection to the plugins a protocol uses by default. @@ -176,7 +188,8 @@ uv run reader config CONFIG|DIR|INDEX --format json ``` In JSON mode, `authoring` is the full `reader/v7` document, while -`implementation` carries the compiled plan. +`implementation` carries the compiled plan and the execution-bound semantic +program. Validate schema, wiring, and inputs: @@ -190,7 +203,7 @@ then separates overall status/counts into `summary` from file-check details in `validation`. `uv run reader validate --no-files --format json` still reports declared file and auto-root counts even when the checks are skipped. -If you want the same preflight signal while browsing the whole workbench, use +If you want the same preflight signal while browsing the whole experiment list, use `uv run reader ls --details --readiness`. If you want the readiness view beside one experiment’s compiled plan and current outputs, use `uv run reader inspect`. For the full operating loop, use [Preflight, run, verify](../guides/preflight_run_verify.md). @@ -230,7 +243,7 @@ uv run reader run CONFIG|DIR|INDEX --from step_a --until step_c --dry-run --form `uv run reader run` fails fast if `--from` comes after `--until` in pipeline order. -Inspect the emitted record catalog: +Inspect the emitted records catalog: ```bash uv run reader records CONFIG|DIR|INDEX @@ -241,7 +254,7 @@ uv run reader records CONFIG|DIR|INDEX --all --format json In JSON mode, `uv run reader records` keeps experiment identity at the top level, then adds the record-manifest path, a summary by record kind and producer, and the latest record entries. 
`--all` does not dump every historical revision; it adds -per-record revision counts and a total revision summary so the surface stays +per-record revision counts and a total revision summary so the output stays compact. Useful flags: @@ -268,13 +281,17 @@ Run plots for all experiments in a year (expects `experiments/YYYY`): uv run reader plot --year 2025 ``` +For mutating runs, `reader plot --year` preflights the full batch first. If any +selected experiment is not runnable, the command aborts before writing plot +files so the year run does not leave partial state behind. + Override the experiments root when using `--year`: ```bash uv run reader plot --year 2025 --root /path/to/experiments ``` -List resolved semantic plot outputs and their upstream dataframe bindings: +List plot outputs and their upstream dataframe bindings: ```bash uv run reader plot CONFIG|DIR|INDEX --list @@ -319,7 +336,7 @@ Run export specs only: uv run reader export CONFIG|DIR|INDEX ``` -List resolved semantic export artifacts and their upstream dataframe bindings: +List exports and their upstream dataframe bindings: ```bash uv run reader export CONFIG|DIR|INDEX --list @@ -353,13 +370,10 @@ uv run reader export CONFIG|DIR|INDEX --only crosstalk_pairs_table --set with.pa ## Notebooks -Scaffold a marimo notebook (no pipeline execution). If `--template` is omitted, the CLI -uses the first configured `notebooks.specs` entry, otherwise auto-picks a default -template from declared template capabilities: - -- plot-capable template when plots exist -- cytometry EDA template when the pipeline is cytometry-shaped -- fallback basic template otherwise +Scaffold a marimo notebook (no pipeline execution). If `--template` is omitted, +the CLI resolves the protocol default notebook template after applying any +`protocol.outputs.notebook.template` override from the experiment config. It +does not auto-pick a template from notebook specs or template capabilities. 
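The resolution order above can be sketched as a small function. The sketch assumes the experiment config has already been loaded into a plain dict and that the protocol default template name is known. `resolve_notebook_template` and the template names in the usage are hypothetical, not the CLI's internals:

```python
from typing import Any, Optional


def resolve_notebook_template(
    cli_template: Optional[str],
    config: dict[str, Any],
    protocol_default: str,
) -> str:
    """Pick a template: explicit --template, then config override, then protocol default."""
    if cli_template:
        return cli_template
    override = (
        config.get("protocol", {})
        .get("outputs", {})
        .get("notebook", {})
        .get("template")
    )
    # No fallback to notebook specs or template capabilities: the protocol
    # default is the final step.
    return override or protocol_default
```

For example, `resolve_notebook_template(None, {"protocol": {"outputs": {"notebook": {"template": "eda"}}}}, "basic")` returns `"eda"`.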
Notebooks are written under `outputs/notebooks/`. @@ -400,6 +414,7 @@ Runtime notes: - If the notebook or runtime has drifted, it restarts the stale session instead of silently reusing it. - It prunes older reader-managed sessions for the same experiment and launch mode before starting a new one. - For agent review, prefer `--mode run --headless`, then open the printed URL in Chrome MCP. +- Static HTML export can catch execution failures, but it does not validate live widget behavior. Use a served Marimo app for dropdown, slider, export-button, and chart-rerender checks. See templates: @@ -443,7 +458,7 @@ List pipeline steps (resolved): uv run reader steps CONFIG|DIR|INDEX ``` -List workbench records from `outputs/manifests/records.json`: +List records from `outputs/manifests/records.json`: ```bash uv run reader records CONFIG|DIR|INDEX diff --git a/docs/core/pipeline.md b/docs/core/pipeline.md index 71f9ae1..9e2241d 100644 --- a/docs/core/pipeline.md +++ b/docs/core/pipeline.md @@ -2,9 +2,9 @@ `reader` now has three explicit layers: -- authoring: experiment `config.yaml` -- semantics: `reader.protocols` -- execution: compiled workbench IR +- config: experiment `config.yaml` +- protocol: `reader.protocols` +- execution: compiled workbench plan The config should describe assay inputs, analysis choices, and requested outputs in domain terms, not plugin or graph terms. @@ -31,7 +31,7 @@ resources: path: ./inputs/metadata.xlsx ``` -## Top-level surface +## Top-level keys - `schema` Must be `reader/v7`. @@ -53,18 +53,18 @@ resources: There is no public `graph_patch`, no top-level `pipeline` / `plots` / `exports`, and no `protocol.with`. -## Protocol surface +## Protocol block The protocol block is split by role: - `protocol.inputs` Assay-family input bindings and protocol-owned knobs. - `protocol.analysis` - Analysis toggles and semantic policy choices. + Analysis toggles and protocol policy choices. - `protocol.outputs` Notebook, plot, and export selection. 
-## Plot and artifact registries +## Plot and export choices Protocols expose two user-facing registries: @@ -76,7 +76,7 @@ Protocols expose two user-facing registries: `logic_summary_workbook` Plot outputs can also be grouped into named plot profiles. A profile is just a -semantic bundle of figure ids chosen by the protocol author. +named group of figure ids chosen by the protocol author. Users do not select plugins directly. They choose: @@ -86,7 +86,7 @@ Users do not select plugins directly. They choose: - optional export `include` / `exclude` - optional per-artifact `artifacts` config -Unknown keys on the public authoring surface now fail fast. `reader/v7` no +Unknown keys in the public config now fail fast. `reader/v7` no longer silently drops misspelled `protocol` keys, unknown plot/export output blocks, or malformed annotation collections. @@ -165,7 +165,7 @@ protocol: require_non_null: true ``` -## Inspect the config surface +## Inspect the config Use the CLI to inspect one protocol or one experiment: diff --git a/docs/core/plugins.md b/docs/core/plugins.md index 1b740f8..a32c9ed 100644 --- a/docs/core/plugins.md +++ b/docs/core/plugins.md @@ -1,423 +1,184 @@ - # Extending reader with plugins -Plugins exist so repeated parsing/transforms/plots can be reused across experiments. -This is a maintainer-facing surface. Ordinary experiment authors should stay in -`config.yaml`, `reader init`, `reader protocols`, `reader inspect`, -`reader plot --list`, and `reader export --list`; they should not need plugin -ids for normal workbench use. When you do need registry-level inspection, start -with `reader plugins --protocol ` so you see the plugin kernel in -assay context instead of as a flat global dump. - -### Contents - -1. [Plugin categories](#plugin-categories) -2. [Example of adding new plugins](#example-of-adding-new-plugins) -3. [Flow cytometry ingest plugin](#flow-cytometry-ingest-plugin) -4. [Adding a transform plugin](#adding-a-transform-plugin) -5. 
[Adding a plot/export plugin](#adding-a-plotexport-plugin) - ---- - -### Plugin categories - -A good plugin is thin orchestration: - -- keep instrument/file parsing in `domains//io/` when it is domain-owned, - or in `plugins/ingest/discovery_policy.py` when it is genuinely shared raw-file - autodiscovery policy for ingest adapters -- keep reusable computation in `domains//...` instead of `plugins/` -- keep derived domain tables with the producing domain, not in shared contract buckets -- keep plugins focused on wiring inputs → computation → declared outputs -- if multiple plugins in one category share orchestration-only behavior, keep - that in `plugins//_*.py`; do not duplicate file discovery, - partition resolution, or figure-save plumbing across plugin modules -- for plotting, keep figure selection/layout in the domain package - (for example `domains/plate_reader/analysis/`, `domains/plate_reader/ordering.py`, - and `domains/plate_reader/plots/`), keep axes rendering in - `domains//plots/panels/`, and keep plugin modules as config adapters only -- if a plot needs semantic input preparation, keep that in the plotting library - next to the figure package rather than in a plugin-private helper - -Current examples of this convention: - -- `plugins/ingest/_discovery.py` owns shared ingest auto-discovery and pick - logic -- `plugins/ingest/discovery_policy.py` owns raw-file discovery defaults and search helpers -- `plugins/plot/_shared.py` owns shared figure-plot adapter behavior -- `plotting/style.py` owns shared palette/style helpers used by workbench and domain plot code -- `plugins/transform/_labeling.py` owns reusable dataframe label-application mechanics for generic labeling transforms -- `domains/semantics.py` owns the shared domain-semantic access surface used by workbench and plugins -- `plugins/transform/_*.py` owns transform-local adapter support only when the code is not shared outside one plugin - -Each plugin now has a small workbench ontology entry 
declared in the explicit -built-in plugin manifest: - -- `category` = execution stage (`ingest`, `transform`, `validator`, `plot`, `export`) -- `domain` = canonical problem domain (`plate_reader`, `cytometry`, `logic`, `generic`) -- `family` = semantic plugin type within that domain (`time_series`, `metadata_merge`, `derived_channel`, ...) - -That ontology is first-class in the registry and CLI. `reader plugins` is no longer -just a flat key dump; it is the package’s semantic catalog. - -Built-in plugins live under: - -```bash -src/reader/plugins// -``` - -You’ll typically see plugins grouped as: - -* `ingest/*` — read raw instrument/files into a tidy table -* `transform/*` — operate on tidy tables (derive new channels, attach metadata, filter, normalize, etc.) -* `validator/*` — enforce or upgrade schema/shape -* `plot/*` — render plots (plot specs) -* `export/*` — write exports (export specs) - -Built-in plugin registration is explicit in: - -```bash -src/reader/workbench/assets/plugin_manifest.py -``` - -`src/reader/plugins/` now contains implementations only; the runtime does not -discover built-ins by scanning that package tree. - -External plugins use the `reader.plugins` entry-point group and must expose an -explicit plugin descriptor, not just a `Plugin` subclass. - -Plugin I/O is now declared through the explicit port kernel in -`reader.workbench.ports`, not through string conventions. That means: - -- input optionality is `optional=True`, not a `?` suffix -- dataframe ports declare `contract="tidy.v1"` or `contract=None` -- file inputs use `file_path` ports -- plot/export outputs use explicit `file_path` or `file_bundle` ports -- the removed legacy conventions `"none"` and `"files"` are not valid plugin API - surface anymore - ---- - -### Example of adding new plugins - -**Generic ingestion** - -1. 
Keep parsing logic in a domain package: - - ```python - # src/reader/domains/plate_reader/io/my_format.py - import pandas as pd - from pathlib import Path - - def parse_my_format(path: str | Path) -> pd.DataFrame: - # return tidy long table - # required columns depend on your chosen contract(s) - ... - return df - ``` - -2. Wire it up as a plugin implementation: - - ```python - # src/reader/plugins/ingest/my_format.py - from typing import Any - from reader.workbench.ports import dataframe_output, file_path_input - from reader.workbench.registry import Plugin, PluginConfig - from reader.domains.plate_reader.io.my_format import parse_my_format - - class MyCfg(PluginConfig): - pass - - class MyIngest(Plugin): - ConfigModel = MyCfg - - @classmethod - def input_ports(cls): - return {"raw": file_path_input("raw")} - - @classmethod - def output_ports(cls): - return {"df": dataframe_output("df", "tidy.v1")} - - def run(self, ctx, inputs: dict[str, Any], cfg: MyCfg): - return {"df": parse_my_format(inputs["raw"])} - ``` - -3. Register it explicitly in the built-in manifest: - - ```python - # src/reader/workbench/assets/plugin_manifest.py - from reader.plugins.ingest.my_format import MyIngest - from reader.workbench import PluginSemantics - from reader.workbench.assets import build_plugin_asset - - build_plugin_asset( - plugin_id="ingest/my_format", - semantics=PluginSemantics( - domain="plate_reader", - family="workbook_ingest", - summary="Parse my custom workbook format into tidy traces.", - ), - plugin_cls=MyIngest, - ) - ``` - -4. Use it in an experiment: - - ```yaml - - id: "ingest_custom" - plugin: "ingest/my_format" - reads: - raw: - file: "./inputs/run001.ext" - ``` - -### Flow cytometry ingest plugin - -For flow cytometry `.fcs` files, use `ingest/flow_cytometer`. It emits a tidy table with: +Plugins are the execution layer of `reader`. They are for maintainers, not the +public authoring surface. 
Experiment authors should stay in +[`reader/v7`](./pipeline.md), protocol selection, and protocol-owned output +choices. Public configs do not list raw plugin ids. + +## Plugin categories + +Built-in plugin implementations live under +[`src/reader/plugins/`](../../src/reader/plugins/): + +- [`ingest/`](../../src/reader/plugins/ingest/) + read raw files into tidy tables +- [`transform/`](../../src/reader/plugins/transform/) + derive or enrich dataframe records +- [`validator/`](../../src/reader/plugins/validator/) + enforce or promote contracts +- [`plot/`](../../src/reader/plugins/plot/) + render file-bundle plot outputs +- [`export/`](../../src/reader/plugins/export/) + write file-bundle export artifacts + +Built-in registration is explicit in +[`src/reader/workbench/assets/plugin_manifest.py`](../../src/reader/workbench/assets/plugin_manifest.py). +The runtime does not discover built-ins by scanning package trees. + +External plugins are still supported. `reader` loads third-party plugin +descriptors from the [`reader.plugins` entry-point group](../../src/reader/workbench/registry.py) +after it registers the built-in manifest. + +## Ownership rules + +A good plugin is thin orchestration. + +- Keep domain parsing and math in + [`src/reader/domains/`](../../src/reader/domains/). +- Keep shared plotting mechanics in + [`src/reader/plotting/`](../../src/reader/plotting/). +- Keep shared ingest autodiscovery in + [`src/reader/plugins/ingest/discovery_policy.py`](../../src/reader/plugins/ingest/discovery_policy.py) + or [`src/reader/plugins/ingest/_discovery.py`](../../src/reader/plugins/ingest/_discovery.py), + not duplicated across adapters. +- Keep plugin metadata in the asset manifest and ontology types: + [`src/reader/workbench/assets/types.py`](../../src/reader/workbench/assets/types.py) + and [`src/reader/workbench/ontology.py`](../../src/reader/workbench/ontology.py). 
+- Keep protocol-facing defaults and output selection in + [`src/reader/protocols/`](../../src/reader/protocols/), not in ad hoc CLI or + docs-only conventions. + +If maintainers need to widen the public config just to reach a plugin, the +design is probably heading in the wrong direction. + +## How a plugin reaches users + +The maintainer path is: + +1. Implement the plugin class under `src/reader/plugins//`. +2. Register it in the built-in manifest, or expose an external + [`reader.plugins` entry point](../../src/reader/workbench/registry.py) that + resolves to an `AssetDescriptor`. +3. Wire it into a + [`protocol compiler`](../../src/reader/protocols/compiler.py) or recipe so + the protocol owns when it runs and what semantic output it represents. +4. Expose it through protocol inputs, analysis knobs, plot profiles, or export + artifacts rather than raw plugin ids in user config. + +That last step matters. `reader` is intentionally protocol-driven. A new plugin +is not a public feature until a protocol gives it a semantic role. + +## Minimal implementation pattern -* `sample_id` (from filename) and `position = sample_id` -* `time` set to a constant (default `0.0`, since cytometry is snapshot data) -* long-form `channel` / `value` pairs per event - -The raw FCS parsing currently lives in `reader.domains.cytometry.io.fcs`; the plugin is just the workbench adapter -for config, auto-discovery, output contracts, and logging. 
- -Example: - -```yaml -- id: ingest_cytometer - plugin: ingest/flow_cytometer - with: - auto_roots: ["./inputs"] - channel_name_field: "pns" - auto_pick: "merge" -``` - -To attach metadata keyed by `sample_id`: - -```yaml -- id: attach_metadata - plugin: transform/sample_metadata - reads: - df: - record: "ingest_cytometer/df" - metadata: - file: "./metadata.csv" - with: - require_columns: ["design_id", "treatment"] -``` - -If the merged table satisfies the annotated plate-reader contract, reader -stores it as `plate_reader.annotated.v1` instead of plain `tidy.v1`. - -`reader explain` shows this as a minimum contract with a possible runtime -promotion; execution decides the actual stored contract from the emitted data. - -**Note:** `flowio` is now a core dependency. If cytometry parsing fails because -the package is missing, re-sync the environment with `uv sync --locked`. - ---- +```python +from typing import Any -### Adding a transform plugin +from reader.workbench.ports import dataframe_output, file_path_input +from reader.workbench.registry import Plugin, PluginConfig -Transforms typically declare a minimal table contract such as `tidy.v1`. -When a transform preserves richer metadata semantics, it can resolve a stricter -runtime output contract instead of collapsing back to the minimum. -If that promotion matters to users, expose it through the dataframe output -port surface -so `reader explain` reports the planned semantic range instead of only the floor. 
+from reader.domains.plate_reader.io.my_format import parse_my_format
-```python -import pandas as pd -from reader.workbench.ports import dataframe_input, dataframe_output -from reader.workbench.registry import Plugin, PluginConfig +class MyCfg(PluginConfig): + pass -class Cfg(PluginConfig): - factor: float = 2.0 -class ScaleValues(Plugin): - ConfigModel = Cfg +class MyIngest(Plugin): + ConfigModel = MyCfg @classmethod def input_ports(cls): - return {"df": dataframe_input("df", "tidy.v1")} + return {"raw": file_path_input("raw")} @classmethod def output_ports(cls): - return cls.passthrough_output_ports( - outputs={"df": dataframe_output("df", "tidy.v1")}, - passthrough={"df": "df"}, - promoted_examples={"df": ("plate_reader.annotated.v1",)}, - ) - - def resolve_output_ports(self, *, inputs, outputs, cfg, where): - del cfg - return self.inherit_dataframe_output_ports( - inputs=inputs, - outputs=outputs, - passthrough={"df": "df"}, - where=where, - ) - - def run(self, ctx, inputs: dict[str, Any], cfg: Cfg): - df = inputs["df"].copy() - df["value"] = pd.to_numeric(df["value"], errors="coerce") * cfg.factor - return {"df": df} -``` - ---- - -### Crosstalk pairing (transform/crosstalk_pairs) - -Compute pairwise crosstalk-safe design pairings using a `fold_change.v1` table. This transform -summarizes per-design selectivity and evaluates pairs where each design responds strongly to its -own treatment while responding weakly to others (including non-self treatments). -If your design-to-treatment mapping lives in metadata, include that column in the fold-change -step via `attach_metadata` so it is available to this transform. - -Time selection is explicit and assertive: -- `time_mode: single` requires exactly one time in the fold-change table. -- `time_mode: exact|nearest` requires `time` or `times` to be provided (tolerance applies only to `nearest`). -- `time_mode: latest` uses the latest time present in the fold-change table. -- `time_mode: all` evaluates every time present in the fold-change table. 
-- `time_policy: all` (optional) keeps only pairs that pass at *every* evaluated time. - -Mapping strategies are explicit and documented in config: -- `mapping_mode: explicit` uses `design_treatment_map` (stable, recommended for ground-truth mapping). -- `mapping_mode: column` uses a metadata column (keeps mapping in data; good for reuse). -- `mapping_mode: top1` uses the top response in the data (data-driven, but can change across runs/time). - Use `top1_tie_policy` and `top1_tie_tolerance` to control how ties are handled. - -For library-level API details and column semantics, see `docs/lib/crosstalk_pairs.md`. - -```yaml -protocol: - id: plate_reader/dual_reporter_screen - with: - workflow: - include_crosstalk_pairs: true - include_crosstalk_export: true - plot_set: yfp_time_series - fold_change: - report_times: [12.0] - use_global_baseline: true - global_baseline_value: negative - plugins: - transform/crosstalk_pairs: - value_column: log2FC - value_scale: log2 - target: YFP/CFP - time_mode: all - time_policy: per_time - mapping_mode: column - design_treatment_column: cognate_treatment - min_self: 1.0 - max_cross: 0.5 - max_other: 0.5 - min_self_minus_best_other: 1.0 - min_selectivity_delta: 1.0 - require_self_is_top1: true -``` + return {"df": dataframe_output("df", "tidy.v1")} -To export pairings: - -```yaml -protocol: - id: plate_reader/dual_reporter_screen - analysis: - crosstalk_pairs: - enabled: true - export: true - outputs: - exports: - include: [crosstalk_pairs_table] - artifacts: - crosstalk_pairs_table: - path: crosstalk_pairs.csv + def run(self, ctx, inputs: dict[str, Any], cfg: MyCfg): + del ctx, cfg + return {"df": parse_my_format(inputs["raw"])} ``` ---- +Register it in the manifest: -### Adding a plot/export plugin +```python +build_plugin_asset( + plugin_id="ingest/my_format", + semantics=PluginSemantics( + domain="plate_reader", + family="workbook_ingest", + summary="Parse a custom workbook format into tidy traces.", + ), + plugin_cls=MyIngest, 
+) +``` -Plot and export plugins now enter configs through protocol compilation plus -`protocol.outputs` selection/settings. +Then wire it through a protocol compiler: -They are run by: +```python +return CompiledProtocolPlan( + semantic_program=protocol.semantic_program(), + pipeline=( + PluginStepDecl( + id="ingest", + plugin="ingest/my_format", + reads={"raw": FileInputDecl(path="./inputs/run001.ext")}, + ), + ), +) +``` -* `reader plot` (save plot files only) -* `reader export` (exports only) +The important contract is not the example code. It is the layering: +domain logic -> plugin adapter -> asset registration -> protocol-owned exposure. -Guidelines: +## Port and contract rules -* Plot/export plugins should be deterministic and pure: read declared inputs, produce deterministic outputs. -* Avoid experiment-specific logic inside plot plugins; keep bespoke logic in `domains//`. -* Declare typed input/output ports; write under `outputs/plots` or `outputs/exports`. -* Plot specs are assertive: missing required columns raise an error. -* If a selection is empty, emit a warning and skip (don’t silently write an empty plot). -* Plot/export outputs are tracked as `file_bundle` records in `outputs/manifests/records.json`. +Plugin I/O is declared through +[`reader.workbench.ports`](../../src/reader/workbench/ports/), not string +conventions. -Plot plugins implement a **single render path** that powers file output: +- Optional inputs use `optional=True`, not `?` suffixes. +- Dataframe ports declare a contract id or `None`. +- Plot/export outputs use `file_bundle` ports. +- Removed legacy conventions such as `"none"` and `"files"` are not valid. -* `render(ctx, inputs, cfg) -> PlotFigure | list[PlotFigure]` -* `run(...)` should call `render(...)` and then save via `save_plot_figures(...)`. +Runtime validation then checks reads, writes, and contract compatibility before +execution. 
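A minimal sketch of what that pre-execution check amounts to, assuming a simplified port model. `PortSpec` and `check_reads` are hypothetical stand-ins, not the real `reader.workbench.ports` API, and contract compatibility is reduced to exact matching, whereas reader's contract catalog also performs lineage checks.

```python
from dataclasses import dataclass
from typing import Optional

LEGACY_CONTRACT_VALUES = {"none", "files"}  # removed conventions, rejected outright


@dataclass(frozen=True)
class PortSpec:
    """Hypothetical stand-in for a declared input port (not reader's real type)."""

    name: str
    kind: str  # e.g. "dataframe" or "file_path"
    contract: Optional[str] = None  # dataframe contract id, or None for "no contract"
    optional: bool = False  # optionality is a flag, never a "?" name suffix


def check_reads(
    ports: dict[str, PortSpec], supplied: dict[str, Optional[str]]
) -> list[str]:
    """Validate a step's reads before execution.

    `supplied` maps each wired input name to the contract id of the record
    feeding it (None for plain files). Compatibility here is exact match only.
    """
    errors: list[str] = []
    for name, port in ports.items():
        if name.endswith("?"):
            errors.append(f"{name!r}: use optional=True, not a '?' suffix")
        if port.contract in LEGACY_CONTRACT_VALUES:
            errors.append(f"{name!r}: legacy contract {port.contract!r} is not valid")
        if name not in supplied:
            if not port.optional:
                errors.append(f"missing required input {name!r}")
            continue
        if port.contract is not None and supplied[name] != port.contract:
            errors.append(
                f"{name!r}: expected {port.contract!r}, got {supplied[name]!r}"
            )
    # Reads that target a port the plugin never declared are errors too.
    errors.extend(f"undeclared input {n!r}" for n in supplied if n not in ports)
    return errors
```

For example, wiring `df` to a `fold_change.v1` record when the port declares `tidy.v1` produces one compatibility error, while leaving an optional `metadata` input unwired produces none.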
-Minimal plot plugin pattern: +## Plot and export guidance -```python -from reader.plotting.sinks import PlotFigure, normalize_plot_figures, save_plot_figures -from reader.workbench.ports import dataframe_input, file_bundle_output +Plot and export plugins are invoked through protocol-owned surfaces: -class MyPlot(Plugin): - ConfigModel = MyCfg +- `reader plot` +- `reader export` +- `protocol.outputs.plots` +- `protocol.outputs.exports` - @classmethod - def input_ports(cls): - return {"df": dataframe_input("df", "tidy.v1")} +They should be deterministic, assertive, and provenance-friendly: - @classmethod - def output_ports(cls): - return {"artifacts": file_bundle_output("artifacts")} - - def render(self, ctx, inputs, cfg: MyCfg) -> list[PlotFigure]: - fig = build_plot(inputs["df"]) - return [PlotFigure(fig=fig, filename=cfg.filename or "my_plot")] - - def run(self, ctx, inputs, cfg: MyCfg): - figures = normalize_plot_figures(self.render(ctx, inputs, cfg), where=f"plot/{self.plugin_key}") - saved = save_plot_figures(figures, ctx.plots_dir) - return {"artifacts": [str(p) for p in saved]} -``` +- read only declared inputs +- fail fast on missing required columns or invalid config +- write file bundles under `outputs/plots` or `outputs/exports` +- let records/manifests explain what was produced -Common plot config knobs (shared across most plot plugins): +For file output helpers, start with +[`src/reader/plotting/sinks.py`](../../src/reader/plotting/sinks.py). 
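As an illustration of those rules, here is a hypothetical CSV export step written as a plain function. `export_rows_csv` is not a `reader` API; it assumes rows arrive as dictionaries and shows failing fast on missing columns, writing under an exports directory with a stable filename, and returning the written paths as the file bundle. Warning and skipping on an empty selection is an assumption of this sketch.

```python
import csv
import warnings
from pathlib import Path
from typing import Mapping, Sequence


def export_rows_csv(
    rows: Sequence[Mapping[str, object]],
    required: Sequence[str],
    exports_dir: Path,
    stem: str,
) -> list[Path]:
    """Hypothetical export step that returns the written files as a bundle."""
    if not rows:
        # Assumption of this sketch: warn and skip rather than write an empty file.
        warnings.warn(f"{stem}: empty selection, nothing exported")
        return []
    columns = list(rows[0])
    missing = [col for col in required if col not in columns]
    if missing:
        # Fail fast before touching the filesystem.
        raise ValueError(f"{stem}: missing required columns {missing}")
    exports_dir.mkdir(parents=True, exist_ok=True)
    out = exports_dir / f"{stem}.csv"  # stable name: no timestamps or random suffixes
    with out.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return [out]
```

Because the filename and column order depend only on the inputs, rerunning the step over the same rows rewrites the same bundle, which keeps record provenance meaningful.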
-* `filename`: override the output filename stub -* `fig.ext`: file extension (default `pdf`) -* `fig.dpi`: raster resolution for PNGs (ignored for vector PDFs) - -Inspect plugins: +## Useful inspection routes ```bash uv run reader plugins uv run reader plugins --category plot -uv run reader plugins --domain plate_reader -uv run reader plugins --family time_series +uv run reader plugins --protocol plate_reader/dual_reporter_screen --category transform +uv run reader protocols +uv run reader explain ``` -Export plugins are intentionally permissive about input contracts; the built‑in -`export/csv` and `export/xlsx` accept any dataframe record and write it to disk. - -Example export spec: - -```yaml -protocol: - id: logic/sfxi_screen - outputs: - exports: - include: [logic_summary_workbook] - artifacts: - logic_summary_workbook: - path: sfxi_vec8.xlsx - sheet_name: vec8 -``` +Use `reader plugins` to inspect the registry. Use `reader protocols` and +`reader explain` to verify that a protocol actually exposes the new plugin in a +maintainable way. + +## Related docs + +- [Configuring `reader/v7`](./pipeline.md) +- [reader specification](./spec.md) +- [Architecture](../../ARCHITECTURE.md) +- [Crosstalk pairs](../lib/crosstalk_pairs.md) diff --git a/docs/core/spec.md b/docs/core/spec.md index ea394b8..21962d0 100644 --- a/docs/core/spec.md +++ b/docs/core/spec.md @@ -1,227 +1,169 @@ # reader specification -This document is the developer‑oriented source of truth for how **reader** is structured, how configs map to execution, and how dependencies are managed. - ---- - -### Scope - -- **Experiment directory** = unit of work. -- **Pipeline steps** produce dataframe records; **plot/export specs** render file outputs tracked in the record catalog. -- **Notebooks** are optional and read outputs for interactive exploration. 
- ---- - -### Repo layout - -```text -reader/ - experiments/ # workbench directories (inputs, notebooks, outputs) - docs/ # documentation (index + grouped references) - index.md # docs map - core/ # core reference (CLI, pipeline, plugins, spec) - guides/ # how-to + walkthroughs - lib/ # library-level references - audits/ # audits and investigations - src/reader/ # library + CLI - protocols/ # explicit experiment analysis protocol kernel - workbench/ # experiment lifecycle, config, decl/graph IR, records, notebooks, CLI - assets/ # unified asset registry + capability model for plugins/templates - config/ # wire schema + YAML loading only - decl/ # compiled declaration IR - experiment/ # typed experiment-local semantics (protocol binding, annotations, resources, output layout) - engine/ # planning, validation, contracts, runtime execution - graph/ # runtime graph nodes and typed refs - ports/ # typed plugin input/output port ontology - ontology.py # shared workbench semantic types - notebooks/ # notebook scaffold + launch flows - records/ # record catalog store + dataset discovery helpers - domains/ # protocol/data semantics by domain - plate_reader/ - analysis/ # derived plate-reader summary logic (e.g. fold_change) - ordering.py # dose/treatment ordering semantics - io/ # Synergy H1 parsing - plots/ # plate-reader plotting primitives and figure builders - cytometry/ - io/ # FCS parsing - logic/ - sfxi/ # SFXI math, selection, reference handling, writer - logic_symmetry/ # logic-symmetry plotting/metrics helpers - crosstalk/ # pairwise crosstalk ranking helpers - contracts/ # explicit dataframe contract kernel - builtins/ # built-in contract declarations by semantic domain - plugins/ # ingest/transform/plot/export/validator - plotting/ # shared plotting/style/cache infrastructure - tests/ -``` - ---- - -### Contracts - -Plugins declare input/output contracts (schema identifiers). 
The engine: -- asserts required inputs are present -- validates declared outputs -- fails fast on mismatches (unless runtime strictness is relaxed, in which case mismatches are logged as warnings) - -Built‑in contracts now live entirely under `src/reader/contracts/`. -The contract ontology is explicit and centralized: - -- `src/reader/contracts/model.py` defines contract identity and dataframe rules -- `src/reader/contracts/catalog.py` owns `ContractCatalog` and lineage checks -- `src/reader/contracts/builtins/` owns built-in declarations for: - - `generic` - - `plate_reader` - - `logic` - - `cytometry` -- `src/reader/contracts/__init__.py` exports the explicit built-in catalog - constructor `builtin_contract_catalog()` - -`domains/` no longer declares built-in dataframe contracts. Domain packages now -own algorithms, IO, and semantics only. - -The workbench engine is now organized as a package instead of a monolithic module: - -- `workbench/engine/planning.py` owns explain/plan rendering -- `workbench/engine/validation.py` owns config/reference checks -- `workbench/engine/contracts.py` owns runtime contract enforcement -- `workbench/engine/inputs.py` owns dataframe-record/file input resolution -- `workbench/engine/runtime.py` owns execution orchestration - -That split keeps plan-time semantics, runtime semantics, and filesystem concerns orthogonal. 
- -The workbench asset surface now follows one model: - -- `workbench/assets/` is the single semantic registry surface for plugins and - notebook templates -- `workbench/registry.py` owns executable plugin discovery only; plugin assets - are exposed through the shared asset model -- `workbench/decl/` owns the internal authored declaration layer for bound - experiments, recipe-expanded step declarations, and notebook template calls -- `workbench/experiment/` owns experiment-local semantics: - explicit protocol binding, typed annotation vocabulary, explicit resource catalogs, and output layout -- `protocols/` owns built-in experiment analysis protocols and the typed - `ProtocolCatalog` used by runtime composition -- `workbench/graph/` owns typed workbench references and normalized runtime - nodes: - `AssetRef`, `InputRef`, `OutputRef`, plugin-step nodes, notebook-template - calls, and typed `source_recipe` provenance for recipe-expanded steps -- `workbench/ports/` owns typed plugin I/O semantics: - input/output port names, optionality, port kind, and dataframe-contract - attachment -- `workbench/config/` is wire-schema parsing only; it no longer doubles as the - internal authored model or the runtime graph model -- `workbench/records/model.py` owns persisted artifact provenance types instead - of opaque input strings -- `workbench/templates/builtins/*` are static template assets -- `workbench/recipes/*` are internal workflow macros used by protocol - compilers, not user-facing config surfaces -- `workbench/model/` was deleted; the remaining semantic types now live under - `workbench/ontology.py`, `workbench/assets/`, `workbench/decl/`, and - `workbench/graph/` -- operator behavior such as notebook auto-pick and protocol-level plugin - defaults now comes from protocol execution plans instead of heuristic CLI - branches or repeated per-step config blobs -- template-local capabilities still own template-specific behavior such as plot - filtering or injected plot specs, 
but protocol policy now owns which - templates are valid by default for a bound experiment - -The plate-reader plotting library now follows the domain ontology directly: - -- `domains/plate_reader/analysis/fold_change.py` owns fold-change table construction -- `domains/plate_reader/analysis/timepoints.py` owns nearest-time and snapshot selection helpers -- `domains/plate_reader/ordering.py` owns dose/treatment ordering semantics -- `domains/plate_reader/io/sample_map.py` for plate-map parsing -- `domains/plate_reader/plots/common.py` owns plot-shared dataframe/layout/color/output helpers -- `domains/plate_reader/plots/grouping.py` owns figure-group resolution helpers -- `domains/plate_reader/plots/panels/` for axes-level drawing primitives -- figure-specific packages such as `domains/plate_reader/plots/snapshot_barplot/` - and `domains/plate_reader/plots/snapshot_heatmap/` for figure planning plus - render orchestration - -The logic domain now follows the same rule: - -- `domains/semantics.py` owns the canonical plugin domain vocabulary -- `domains/logic/sfxi/` owns vec8 config parsing, selection, math, and output writing -- `domains/logic/logic_symmetry/` owns logic-symmetry preparation, metrics, overlays, and rendering -- `domains/logic/crosstalk/` owns pairwise crosstalk ranking logic - -The cytometry domain now follows the same rule: - -- `domains/cytometry/io/` owns raw FCS parsing - -Raw ingest autodiscovery no longer lives under `workbench/`. -That policy now lives with ingest adapters: - -- `plugins/ingest/discovery_policy.py` owns raw-file auto-discovery defaults - and file search helpers - -The shared plotting infrastructure now lives under `plotting/` instead of any domain: - -- `plotting/style.py` owns palettes and shared figure construction helpers -- `plotting/mpl.py` owns Matplotlib cache setup and rc defaults - ---- - -### Matplotlib cache - -Plotting plugins require a writable Matplotlib cache directory. 
`reader` sets -`MPLCONFIGDIR` automatically when plotting is needed. - -Defaults: -- Commands that resolve a config/experiment (run/explain/validate/plot/export) use - `/.cache/matplotlib`. -- Other commands that load plot plugins without a config (e.g., `reader plugins`) - use `$XDG_CACHE_HOME/reader/matplotlib` (or `~/.cache/reader/matplotlib`). - -Override with `MPLCONFIGDIR` or `READER_MPLCONFIGDIR` if you need a custom path. - ---- - -### Dependency management (uv) - -This repo uses **uv**: - -```bash -uv sync --locked -``` - -Developer tooling (lint + tests + notebooks): +This page is the maintainer-facing map of how `reader` is organized today. Use +it to answer three questions quickly: + +1. Where does authored experiment intent live? +2. Which package owns assay semantics versus execution mechanics? +3. Which surfaces are public contract versus internal IR? + +## System layers + +`reader` stays legible when these layers remain separate: + +- Authoring contract: + [`reader/v7` config](./pipeline.md) in `experiments//config.yaml` +- Protocol semantics: + [`src/reader/protocols/`](../../src/reader/protocols/) +- Experiment-local semantics: + [`src/reader/workbench/experiment/`](../../src/reader/workbench/experiment/) +- Data Operations Plan overlay: + [`src/reader/workbench/dop/`](../../src/reader/workbench/dop/) +- Execution IR and runtime: + [`src/reader/workbench/decl/`](../../src/reader/workbench/decl/), + [`src/reader/workbench/graph/`](../../src/reader/workbench/graph/), + [`src/reader/workbench/engine/`](../../src/reader/workbench/engine/) +- Extension mechanics: + [`src/reader/plugins/`](../../src/reader/plugins/), + [`src/reader/contracts/`](../../src/reader/contracts/), + [`src/reader/plotting/`](../../src/reader/plotting/) +- Operator surfaces: + [`src/reader/workbench/cli/`](../../src/reader/workbench/cli/), + [`docs/guides/preflight_run_verify.md`](../guides/preflight_run_verify.md), + experiment `outputs/` + +The important rule is that authored 
config should name assays, inputs, +analysis choices, and requested outputs in domain terms. It should not mirror +plugin wiring or internal graph structure. + +## Package ownership + +- [`src/reader/protocols/model.py`](../../src/reader/protocols/model.py) + owns the typed protocol contract: config fields, plots, artifacts, semantic + nodes, notebook policy, and the compiled semantic program contract. +- [`src/reader/protocols/compiler.py`](../../src/reader/protocols/compiler.py) + owns protocol-specific compilation from bound protocol config into pipeline, + plots, exports, notebooks, and step assembly. +- [`src/reader/protocols/builtins.py`](../../src/reader/protocols/builtins.py) + remains the public builtin catalog surface, while + [`src/reader/protocols/_builtins_plate_reader_variants.py`](../../src/reader/protocols/_builtins_plate_reader_variants.py) + owns the heavier single-reporter and retron matched-control descriptor + assembly so family-specific assay detail does not keep accreting in the + catalog façade. +- [`src/reader/protocols/semantic_coverage.py`](../../src/reader/protocols/semantic_coverage.py) + owns execution-bound semantic coverage mapping so semantic status/materialized + record ids do not stay buried inside the step compiler. +- [`src/reader/workbench/config/`](../../src/reader/workbench/config/) + parses YAML and validates the wire schema only. +- [`src/reader/workbench/decl/build.py`](../../src/reader/workbench/decl/build.py) + binds a `reader/v7` document to a protocol and produces the compiled + workbench declaration. +- [`src/reader/workbench/experiment/model.py`](../../src/reader/workbench/experiment/model.py) + owns experiment-local semantics: protocol binding, annotations, resources, + layout, and the compiled protocol semantic program. 
+- [`src/reader/workbench/dop/`](../../src/reader/workbench/dop/) + owns the read-only Data Operations Plan overlay: data-class selection, + metadata minimums, stop conditions, transfer rules, and readiness evidence + gates. It references protocol ids but does not own protocol execution. +- [`src/reader/workbench/inspection/`](../../src/reader/workbench/inspection/) + owns read-only payloads and reports for `inspect`, `steps`, `records`, and + related CLI surfaces. +- [`src/reader/workbench/assets/plugin_manifest.py`](../../src/reader/workbench/assets/plugin_manifest.py) + is the explicit built-in plugin registry. +- [`src/reader/workbench/templates/catalog.py`](../../src/reader/workbench/templates/catalog.py) + owns notebook template selection and compatibility checks. +- [`src/reader/workbench/notebooks/launch.py`](../../src/reader/workbench/notebooks/launch.py) + owns Marimo launch orchestration, while + [`src/reader/workbench/notebooks/_launch_runtime.py`](../../src/reader/workbench/notebooks/_launch_runtime.py) + and + [`src/reader/workbench/notebooks/_launch_registry.py`](../../src/reader/workbench/notebooks/_launch_registry.py) + keep runtime-path/env setup and managed-session state separate from the + planner itself. +- [`src/reader/domains/`](../../src/reader/domains/) + owns domain math, parsing, ordering, and figure-planning logic. +- [`src/reader/plugins/`](../../src/reader/plugins/) + owns thin execution adapters only. +- [`src/reader/contracts/`](../../src/reader/contracts/) + owns dataframe contract identities, validation rules, and built-in contract + catalogs. + +## Runtime flow + +The canonical path is: + +`config -> protocol binding -> compiled semantic program + compiled workbench plan -> graph/runtime execution -> records and file bundles` + +More concretely: + +1. [`src/reader/workbench/config/load.py`](../../src/reader/workbench/config/load.py) + loads and validates `reader/v7`. +2. 
[`src/reader/workbench/decl/build.py`](../../src/reader/workbench/decl/build.py) + binds the protocol and stores the compiled semantic program on the + experiment semantics object. +3. [`src/reader/workbench/graph/normalize.py`](../../src/reader/workbench/graph/normalize.py) + normalizes declarations into runtime nodes and refs. +4. [`src/reader/workbench/engine/`](../../src/reader/workbench/engine/) + validates inputs, resolves records/resources, and executes the selected + slice. +5. [`src/reader/workbench/records/store.py`](../../src/reader/workbench/records/store.py) + persists dataframe records and file-bundle provenance under + `outputs/manifests/records.json`. + +The semantic program is now part of that compiled contract. Inspection surfaces +should read the compiled program snapshot directly instead of reconstructing it +through fallback branches. That removes one major split-ownership path, even +though deeper staleness checks for same-protocol snapshots would still be a +separate hardening step. + +## Information architecture rules + +- Public config lives in [`docs/core/pipeline.md`](./pipeline.md), not in plugin + docs. +- Protocols own user-facing output vocabulary such as figures, artifacts, and + notebook policy. +- Plugins stay mechanical. If a maintainer needs assay meaning to understand a + plugin, that logic probably belongs in a domain or protocol package instead. +- `inspection/` is presentation-only. It should not recompile or “repair” + semantic state. +- Generated artifacts live under `outputs/` and are never the source of truth. +- When docs name a code surface, prefer linking to the actual file or package so + `tools/check_docs.py` can catch drift. 
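A rough sketch of the kind of drift check that linking rule enables, assuming inline Markdown links with repository-relative targets. This is a hypothetical illustration, not the contents of `tools/check_docs.py`: it resolves each relative link target against the linking file's directory and reports targets that no longer exist.

```python
import re
from pathlib import Path

# Inline Markdown links of the form [text](target); reference-style links are ignored.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*:")  # http:, https:, mailto:, ...


def broken_links(doc: Path) -> list[str]:
    """Return relative link targets in `doc` that do not exist on disk."""
    broken: list[str] = []
    for target in LINK.findall(doc.read_text(encoding="utf-8")):
        if SCHEME.match(target) or target.startswith("#"):
            continue  # external URLs and same-page anchors are out of scope
        path = target.split("#", 1)[0]  # drop any "#section" fragment
        if not (doc.parent / path).exists():
            broken.append(target)
    return broken
```

Run over every Markdown file under `docs/`, a moved or renamed module then surfaces as a broken link instead of silently stale prose.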
+ +## Current pressure points + +The package is no longer suffering from split semantic ownership between +compiled plans and experiment semantics, but two maintainability hotspots +remain: + +- Protocol concentration: + [`src/reader/protocols/builtins.py`](../../src/reader/protocols/builtins.py), + [`src/reader/protocols/_builtins_plate_reader_variants.py`](../../src/reader/protocols/_builtins_plate_reader_variants.py), + [`src/reader/protocols/compiler.py`](../../src/reader/protocols/compiler.py), + [`src/reader/protocols/model.py`](../../src/reader/protocols/model.py), and + [`src/reader/protocols/semantic_coverage.py`](../../src/reader/protocols/semantic_coverage.py) + still carry a large share of assay semantics. The plate-reader variants are + now in a private family helper instead of the public façade, but new assay + families should keep pushing descriptor, compiler, and semantic-coverage + logic down into family-specific helpers instead of back into shared catalog + files. +- Retron notebook concentration: + [`src/reader/workbench/notebooks/`](../../src/reader/workbench/notebooks/) + remains the biggest local cluster of large files. Launch preflight/runtime + state is now split from the planner, but the retron review stack is still the + highest-risk area for future monolith drift. + +## Dependency management + +This repo uses `uv`. ```bash uv sync --locked --group dev --group notebooks -uv run ruff check . uv run pytest -q -uv run pytest -q -m smoke -uv run pytest -q -m repo_matrix -uv run pytest -q -m fleet -uv run pytest -q -m integration -``` - -The default `uv run pytest -q` lane excludes only the full data-backed `fleet` matrix so local feedback stays bounded while still keeping ordinary integration checks, the repo-wide config sweep, and a few real temp-copy smoke runs in the default suite. 
- -Add/remove dependencies: - -```bash -uv add -uv add --group dev -uv remove -``` - -If you edit `pyproject.toml` manually, regenerate the lockfile: - -```bash -uv lock -``` - ---- - -### Upgrading dependencies - -To upgrade a pinned package: - -```bash -uv sync --upgrade-package +uv run ruff check . +uv run ruff format . --check ``` -Commit `pyproject.toml` and `uv.lock` together. +For the maintainer gate and docs integrity loop, see +[Repo maintenance](../repo-maintenance.md), +[Quality](../../QUALITY.md), and +[Reliability](../../RELIABILITY.md). diff --git a/docs/dev/journal.md b/docs/dev/journal.md index da1feb3..832f34b 100644 --- a/docs/dev/journal.md +++ b/docs/dev/journal.md @@ -1,5 +1,37 @@ # Dev Journal +## 2026-04-19: Protocol Variant Split + Notebook Launch Fail-Fast Hardening + +Reduced two remaining information-architecture hotspots without changing the +public CLI or protocol catalog surface. + +- Split the heavier single-reporter and retron matched-control protocol + descriptors out of `src/reader/protocols/builtins.py` into + `src/reader/protocols/_builtins_plate_reader_variants.py`. +- Kept `reader.protocols.builtin_protocol_catalog()` and the public builtin + ordering unchanged so runtime/CLI imports still see the same stable catalog. +- Split Marimo launch runtime-path/env setup and managed-session registry logic + out of `src/reader/workbench/notebooks/launch.py` into: + - `src/reader/workbench/notebooks/_launch_runtime.py` + - `src/reader/workbench/notebooks/_launch_registry.py` +- Reordered notebook launch planning so missing targets fail before repo-local + `.cache/marimo/` state is created. 
+- Added regression coverage for: + - missing notebook targets not creating launch runtime dirs + - managed-session register/unregister round trips + - staged retron experiment CLI preflight surfaces (`validate --no-files`, + `run --dry-run`, `plot --list`, `export --list`, `inspect`) +- Tightened the repo-local `reader-experiment-bootstrap` skill so its output + contract and trigger boundaries match the current skill design bar. + +The placement rule is clearer again: + +- `builtins.py` is the public protocol catalog façade, not the place where the + largest assay-family descriptor blocks should keep growing +- notebook launch planning fails fast before mutating repo-local runtime state +- repo-local skills route to docs with an explicit output contract instead of + acting like loose prose notes + ## 2026-03-16: Progressive-Disclosure CLI + Docs Hardening Slice Audited the live `reader/v7` surface as a real operator and agent harness, then diff --git a/docs/dev/sfxi_triptych_sequence_plugin_spec.md b/docs/dev/sfxi_triptych_sequence_plugin_spec.md new file mode 100644 index 0000000..5581f71 --- /dev/null +++ b/docs/dev/sfxi_triptych_sequence_plugin_spec.md @@ -0,0 +1,541 @@ +# SFXI Triptych Sequence Plot Plugin Spec + +Status: Initial implementation slice landed + +Audience: `reader` and `dnadesign` maintainers + +## Summary + +The current SFXI triptych sequence renderer is useful as an +experiment-scoped preview, but it is not yet a durable `reader` integration. +It owns a standalone JSON config, custom manifest, dry-run behavior, output +cleanup, BaseRender style shims, and raster normalization path outside the +canonical `reader` experiment and records surfaces. + +This spec promotes the workflow into a formal `reader` plot plugin while +preserving the package boundary: + +- `reader` owns experiment config, protocol semantics, plate-reader plots, + figure composition, output persistence, records, dry-run, and validation. 
+- `dnadesign` owns sequence records, USR/DenseGen projections, GenBank + overlays, BaseRender styles, render profiles, and sequence-render contract + versions. + +The target outcome is a plot that "just works" through normal `reader` +surfaces, without private `dnadesign.*.src.*` imports, silent fallbacks, or +manual output cleanup. + +## Implementation Status + +The initial vertical slice is implemented: + +- `plot/sfxi_triptych_sequence` is registered as a formal reader plot plugin. +- `logic/sfxi_screen` can expose `sfxi_triptych_sequence` as a semantic plot + output when configured. +- The plot emits one canonical file-bundle record through reader's existing + file-bundle record path. +- Rendering publishes through a staging directory so a failed render does not + delete the previous successful bundle. +- Reader imports only public dnadesign surfaces. +- dnadesign exposes `dnadesign.baserender.sequence_panel.v1`, + `promoter_compact_slide.v1`, public style helpers, and + `render_sequence_panel_image`. 
+ +Remaining follow-up work: + +- replace any remaining experiment-local historical scripts with the plugin + path where appropriate +- broaden visual smoke coverage beyond the synthetic fixture +- decide whether the protocol default profile should eventually include this + heavier bundle or keep it opt-in + +## Problem + +The current preview lives at: + +```text +experiments/2026/20260501_sfxi_promoter_setpoint_scatter/ +``` + +It successfully renders an SFXI review bundle, but it has contract drift from +the rest of `reader`: + +- it is not discoverable as a first-class `reader/v7` experiment plot surface +- it writes a sidecar manifest instead of one canonical bundle record +- config validation is shallow and mostly top-level +- output cleanup happens before successful replacement +- BaseRender style details and image normalization live in experiment code +- dry-run is not a machine-readable preflight surface + +The fix should not move SFXI scoring or sequence rendering into the wrong +package. The fix is to narrow and version the boundary. + +## Goals + +- Add a formal `reader` plot plugin: + + ```text + plot/sfxi_triptych_sequence + ``` + +- Persist one canonical figure bundle record containing all subplot outputs and + provenance. +- Keep user-facing authoring semantic and protocol-owned, not plugin-shaped. +- Keep sequence rendering behind a public `dnadesign` contract. +- Fail fast when required data, package APIs, or contract versions are missing. +- Preserve the current preview output as the behavior and visual baseline unless + intentionally revised. +- Make dry-run and validate useful for CI and agent workflows. +- Make failed renders preserve the last successful bundle. + +## Non-Goals + +- Do not change SFXI vec8 math or SFXI objective scoring. +- Do not introduce reader-side reimplementations of BaseRender. +- Do not import `dnadesign.*.src.*` from `reader`. +- Do not add hidden fallback rendering or hidden scoring compatibility modes. 
+- Do not generalize every future sequence visualization before this SFXI plot is + stable. +- Do not hand-edit generated files under `experiments/**/outputs/`. + +## Ownership Boundaries + +| Surface | Owner | Notes | +| --- | --- | --- | +| Experiment config | `reader` | Public config remains `reader/v7`. | +| Protocol semantics | `reader` | `logic/sfxi_screen` decides when this plot is exposed. | +| Plate-reader traces | `reader` | OD600, YFP/CFP, snapshot selection, CIs, and labels. | +| Figure composition | `reader` | Multi-row triptych layout and bundle persistence. | +| Canonical artifact records | `reader` | One bundle record in `outputs/manifests/records.json`. | +| Sequence records | `dnadesign` | USR/DenseGen/GenBank source of sequence truth. | +| Sequence render semantics | `dnadesign` | BaseRender adapters, labels, styles, and render diagnostics. | +| Style profiles | `dnadesign` | Public named profile, not copied dict shims in `reader`. | +| SFXI scalar objective | `dnadesign` | Existing OPAL public scoring API remains authoritative. | + +## Public Contract Names + +Use stable, consumer-neutral names: + +```text +dnadesign.baserender.sequence_panel.v1 +promoter_compact_slide.v1 +reader.sfxi_triptych_sequence_bundle.v1 +``` + +Rationale: + +- `sequence_panel.v1` describes the abstraction, not the current notebook or + SFXI use case. +- `promoter_compact_slide.v1` describes the visual profile and can survive new + assay consumers. +- `reader.sfxi_triptych_sequence_bundle.v1` describes the persisted figure + bundle contract owned by `reader`. + +Avoid names such as `notebook_render_contract`, `densegen_promoter_only`, or +`sfxi_baserender_hack`; those encode an implementation path instead of a stable +boundary. 
+ +## Proposed Reader Surfaces + +### Domain Module + +```text +src/reader/domains/logic/sfxi/triptych_sequence.py +``` + +Responsibilities: + +- validate the typed triptych config +- resolve input artifacts and required columns +- build the row-level render plan +- join vec8 rows to sequence metadata +- call the optional dnadesign adapter +- compose each promoter figure +- build the canonical bundle manifest payload + +It should not: + +- own BaseRender feature layout rules +- parse private dnadesign record internals +- mutate generated outputs before successful render completion +- expose plugin IDs as public assay semantics + +### Plot Plugin + +```text +src/reader/plugins/plot/sfxi_triptych_sequence.py +``` + +Plugin id: + +```text +plot/sfxi_triptych_sequence +``` + +Plugin responsibilities: + +- declare input ports and file-bundle output port +- load `PluginConfig`/Pydantic config +- call the domain primitive +- participate in `reader plot --list`, `reader validate`, and dry-run surfaces +- report missing optional dependencies as actionable preflight issues + +The plugin should stay a thin adapter. Plot mechanics and validation belong in +the domain module. + +### Protocol Exposure + +Expose through `logic/sfxi_screen` as a semantic plot output: + +```yaml +protocol: + id: logic/sfxi_screen + outputs: + plots: + include: + - sfxi_triptych_sequence +``` + +User config should not need to name internal plugin config fields unless using a +maintainer or expert override. + +## Proposed dnadesign Surfaces + +Expose a public sequence-panel contract from `dnadesign.baserender`. 
+ +Candidate public API: + +```python +from dnadesign.baserender import ( + BASERENDER_SEQUENCE_PANEL_CONTRACT_VERSION, + SequencePanelConfig, + SequencePanelDiagnostics, + list_style_presets, + render_sequence_panel_image, + resolve_style, +) +``` + +Required behavior: + +- contract id: `dnadesign.baserender.sequence_panel.v1` +- default profile: `promoter_compact_slide.v1` +- input: public sequence render records or adapter-supported public records +- output: image/renderable plus diagnostics +- diagnostics include bounds, strand count, feature count, legend entries, + profile id, and warnings +- invalid style or palette values fail during style resolution, not after render + +This API should absorb the current need for reader-side raster crop thresholds, +manual canvas sizing, large copied style dictionaries, and source-string +sentinel knowledge. + +## Canonical Bundle Artifact + +The canonical output is one bundle record. Individual PNG/PDF/MP4/index files +are members of the bundle, not independent top-level records. + +Record identity: + +```yaml +artifact_type: figure_bundle +artifact_subtype: sfxi_triptych_sequence +contract_version: reader.sfxi_triptych_sequence_bundle.v1 +``` + +Expected bundle members: + +```text +outputs/plots/sfxi_triptych_sequence/.png +outputs/plots/sfxi_triptych_sequence/.pdf +outputs/plots/sfxi_triptych_sequence/.mp4 +outputs/exports/sfxi_triptych_sequence/_index.csv +outputs/manifests/records.json +``` + +Required manifest fields: + +| Field | Meaning | +| --- | --- | +| `bundle_id` | Stable id for this rendered bundle. | +| `contract_version` | `reader.sfxi_triptych_sequence_bundle.v1`. | +| `plot_id` | `sfxi_triptych_sequence`. | +| `protocol_id` | Expected `logic/sfxi_screen` when protocol-bound. | +| `source_experiment_id` | Reader experiment id. | +| `source_vec8_artifact` | Source vec8 artifact path or record id. | +| `row_count` | Number of promoter panels rendered. 
| +| `row_order` | Ordered promoter/design ids in the bundle. | +| `reference_rows` | Reference promoter ids and degraded/full status. | +| `vec8_selected_time_h` | Time used for vec8 derivation. | +| `snapshot_target_time_h` | Requested visual snapshot time. | +| `snapshot_observed_time_h` | Actual snapshot time used. | +| `snapshot_fell_back` | Boolean fallback flag. | +| `snapshot_fallback_delta_h` | Difference between target and observed time. | +| `dnadesign_contract_id` | `dnadesign.baserender.sequence_panel.v1`. | +| `dnadesign_contract_version` | Version reported by dnadesign. | +| `sequence_profile_id` | Example: `promoter_compact_slide.v1`. | +| `outputs` | Map of PNG/PDF/MP4/index paths. | +| `created_at` | Timestamp. | + +## Config Shape + +The plugin should use a typed config model. Public protocol config should stay +semantic; plugin config is a maintainer surface. + +Sketch: + +```yaml +analysis: + sfxi_triptych_sequence: + vec8_source: sfxi.vec8.v2 + sequence_source: + provider: dnadesign.usr + dataset: usr_sfxi_pdual10_densegen_promoters + overlay: densegen_promoter_annotations + references: + include: + - pDual-10-spyp + - pDual-10-sulAp + time: + snapshot_target_time_h: 12.0 + induction_time_h: 12.0 + render: + sequence_contract: dnadesign.baserender.sequence_panel.v1 + sequence_profile: promoter_compact_slide.v1 + movie_fps: 0.85 +``` + +Validation rules: + +- `snapshot_target_time_h` must be numeric and finite. +- `induction_time_h` must be numeric and finite when shown. +- `sequence_contract` must match a supported dnadesign public contract. +- `sequence_profile` must exist according to `dnadesign.baserender`. +- reference ids must be resolved explicitly or reported as degraded references. +- required vec8 columns must be present with canonical names. + +## Runtime Lifecycle + +1. Load and validate reader config. +2. Resolve plugin input records and declared artifacts. +3. Check optional `dnadesign` dependency. +4. 
Check `dnadesign.baserender.sequence_panel.v1` compatibility. +5. Validate SFXI vec8 input columns and row identity. +6. Resolve USR/DenseGen/GenBank sequence records. +7. Assert sequence equality where both reader and dnadesign provide sequence + strings. +8. Build row-level render plan. +9. In dry-run mode, emit JSON plan and stop before rendering. +10. Render figures into a staging directory. +11. Verify expected PNG/PDF/MP4/index members exist. +12. Atomically publish the bundle. +13. Register one canonical bundle record. + +## Failure and Degraded-Mode Contract + +No silent fallback is allowed. + +| Condition | Behavior | +| --- | --- | +| `dnadesign` not installed | Fail fast with `reader[dnadesign]` install/update guidance. | +| Missing sequence-panel API | Fail fast with expected public API and version. | +| Incompatible contract version | Fail fast with expected and actual version. | +| Missing candidate sequence overlay | Fail unless explicitly configured as optional. | +| Missing reference annotation | Mark reference as degraded, include reason in manifest, do not pretend full sequence evidence exists. | +| Snapshot target not present | Use nearest only under explicit policy, record fallback fields. | +| Render error | Keep last successful bundle; staging output is discarded. | +| Bad style/palette | Fail during config/style validation. | + +## Atomic Publication + +Rendering should never delete the previous good bundle before the new bundle is +complete. + +Required approach: + +1. Create a staging directory under `outputs/.staging/` or an equivalent temp + location. +2. Render all row images and bundle outputs into staging. +3. Validate expected files and manifest payload. +4. Move or copy into final `outputs/plots`, `outputs/exports`, and + `outputs/manifests` paths. +5. Clean staging only after success. + +If any step fails before publication, final outputs remain untouched. 
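The publication steps can be sketched in Python under a few stated assumptions: staging lives under `outputs/.staging/` on the same filesystem as the final outputs, the renderer is a callable that writes bundle members at relative paths, and `publish_bundle` is a hypothetical name rather than an existing `reader` function. `os.replace` is atomic per file, not across the whole bundle, so this sketch guarantees that a failed or incomplete render never touches final outputs, while a crash during the move loop could still leave a mixed bundle. Following the failure table, staging is discarded on both success and failure; registering the single bundle record would happen only after this returns.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def publish_bundle(
    outputs_root: Path,
    expected_members: list[str],
    render: Callable[[Path], None],
) -> list[Path]:
    """Render into staging, verify every member, then move members into place.

    `expected_members` are paths relative to `outputs_root`, for example
    "plots/sfxi_triptych_sequence/demo.png" (illustrative name).
    """
    staging_parent = outputs_root / ".staging"
    staging_parent.mkdir(parents=True, exist_ok=True)
    staging = Path(tempfile.mkdtemp(dir=str(staging_parent)))
    try:
        render(staging)  # any exception here leaves final outputs untouched
        missing = [rel for rel in expected_members if not (staging / rel).is_file()]
        if missing:
            raise FileNotFoundError(f"render did not produce: {missing}")
        published: list[Path] = []
        for rel in expected_members:
            final = outputs_root / rel
            final.parent.mkdir(parents=True, exist_ok=True)
            # Atomic per file on one filesystem; replaces the previous member.
            os.replace(staging / rel, final)
            published.append(final)
        return published
    finally:
        # Staging output is discarded whether or not publication succeeded.
        shutil.rmtree(staging, ignore_errors=True)
```

Verifying every expected member before the first `os.replace` is the key ordering choice: it turns "render error" and "missing member" into the same safe outcome, where the last successful bundle stays in place.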
+ +## Test Plan + +### Reader Unit Tests + +- typed config accepts valid minimal config +- typed config rejects unknown contract ids +- typed config rejects invalid time values +- vec8 validation rejects missing canonical columns +- row-order planner places references first when requested +- snapshot fallback metadata is recorded + +### Reader Plugin Tests + +- `reader plot --list` exposes `sfxi_triptych_sequence` +- missing dnadesign API appears in validate/dry-run preflight +- dry-run JSON includes bundle id, row count, row order, output paths, and + contract versions +- full render writes one canonical bundle record +- interrupted render preserves previous outputs + +### dnadesign Tests + +- public root facade exports sequence-panel contract helpers +- `promoter_compact_slide.v1` resolves through public style APIs +- invalid palette values fail during style resolution +- near-feature labels render from public hints, not private source sentinels +- render diagnostics report two-strand state when requested + +### Integration Smoke + +- render a two-row fixture with one DenseGen promoter and one GenBank reference +- assert nonempty PNG/PDF outputs +- assert MP4 is produced when movie output is enabled +- assert canonical `records.json` contains one bundle record +- assert no `dnadesign.*.src.*` import appears in reader implementation + +## Delivery Slices + +### Slice 1: Reader Safety Wrapper + +Goal: harden the current behavior without changing plot appearance. + +In scope: + +- typed config model +- machine-readable dry-run +- atomic staging and publication +- canonical bundle record bridge + +Done when: + +- the current preview output regenerates +- failed render preserves previous outputs +- dry-run reports planned row count and bundle paths +- `records.json` contains the bundle record + +### Slice 2: Formal Plot Plugin + +Goal: expose the plot through normal reader plugin and protocol surfaces. 
+ +In scope: + +- `reader.domains.logic.sfxi.triptych_sequence` +- `reader.plugins.plot.sfxi_triptych_sequence` +- built-in plugin manifest registration +- `logic/sfxi_screen` semantic plot exposure +- `reader plot --list` and `reader validate` support + +Done when: + +- users can request the plot semantically from `logic/sfxi_screen` +- plugin internals remain thin +- current experiment script can be removed or reduced to a wrapper + +### Slice 3: dnadesign Sequence-Panel Contract + +Goal: remove copied BaseRender style and image-normalization logic from reader. + +In scope: + +- public `dnadesign.baserender.sequence_panel.v1` +- `promoter_compact_slide.v1` profile +- public style helpers +- early style/palette validation +- render diagnostics + +Done when: + +- reader calls only public dnadesign APIs +- reader selects a profile instead of passing low-level style dictionaries +- sequence-panel diagnostics are recorded in the bundle manifest + +### Slice 4: Cleanup and Regression Hardening + +Goal: remove parallel preview contracts. + +In scope: + +- remove or retire sidecar-only manifest behavior +- remove experiment-local raster crop/style shim code +- add smoke/golden sanity coverage +- update docs and dev journal + +Done when: + +- the formal plugin is the maintained path +- the old experiment preview is either deleted or documented as historical +- all validation commands for the slice pass + +## Acceptance Criteria + +- `reader plot --list` shows `sfxi_triptych_sequence`. +- `reader validate` catches missing dnadesign, missing contract, bad config, and + missing required inputs. +- `reader plot --dry-run --format json` emits the planned bundle, row count, + output paths, and contract versions. +- Full render produces one canonical bundle record. +- No reader import reaches into `dnadesign.*.src.*`. +- Failed render does not delete the previous good PNG/PDF/MP4. +- Current visual output remains the baseline unless a visual change is + intentionally approved. 
+- The sequence panel uses a named dnadesign style profile rather than copied + reader-side style dictionaries. + +## Validation Commands + +Docs-only changes to this spec: + +```bash +uv run python tools/check_docs.py +git diff --check +``` + +Reader implementation slices: + +```bash +uv run pytest -q src/reader/tests/domains/logic/sfxi +uv run pytest -q src/reader/tests/plugins/plot +uv run pytest -q src/reader/tests/cli/test_plot_export.py +uv run ruff check . +uv run ruff format . --check +git diff --check +``` + +dnadesign implementation slices: + +```bash +uv run pytest -q src/dnadesign/baserender/tests +uv run ruff check src/dnadesign/baserender +uv run ruff format src/dnadesign/baserender --check +git diff --check +``` + +## Evidence Links + +- `reader/ARCHITECTURE.md`: generated outputs, records, dry-run, and domain + ownership expectations. +- `reader/DESIGN.md`: protocol-owned semantics and fail-fast behavior. +- `reader/docs/core/plugins.md`: plugin layering and registration rules. +- `reader/docs/lib/sfxi_vec8_in_reader.md`: SFXI vec8 ownership and OPAL + handoff. +- `dnadesign/DESIGN.md`: cross-tool coupling through documented artifacts or + public APIs only. +- `dnadesign/src/dnadesign/baserender/docs/reference.md`: public BaseRender + boundary and private import warning. + +## Open Risks + +- The exact dnadesign API shape may need adjustment once implementation starts. +- A full image golden test may be brittle; prefer structural diagnostics plus a + small nonblank render smoke test unless pixel stability is proven. +- Existing local experiment outputs are generated and should not be treated as + source-of-truth code artifacts. +- If USR dataset resolution still depends on a sibling checkout path, a package + resource or explicit dataset registry path will be needed for installed + `reader[dnadesign]` workflows. 
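The snapshot rule from the failure contract (use the nearest time point only under an explicit policy, and record the fallback fields) can be sketched as a small resolver that returns exactly the four `snapshot_*` manifest fields and applies the "numeric and finite" validation rule. This is an illustrative sketch: `resolve_snapshot`, its `allow_nearest` flag, and the tolerance parameter are hypothetical, and the sign convention for `snapshot_fallback_delta_h` (observed minus target) is an assumption that the spec does not fix.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def resolve_snapshot(
    target_time_h: float,
    observed_times_h: Sequence[float],
    *,
    allow_nearest: bool,
    tolerance_h: float = 1e-9,
) -> dict[str, float | bool]:
    """Pick the snapshot time point and return the manifest fallback fields."""
    if isinstance(target_time_h, bool) or not isinstance(target_time_h, (int, float)):
        raise TypeError(f"snapshot_target_time_h must be numeric, got {target_time_h!r}")
    if not math.isfinite(target_time_h):
        raise ValueError(f"snapshot_target_time_h must be finite, got {target_time_h!r}")
    candidates = [float(t) for t in observed_times_h if math.isfinite(t)]
    if not candidates:
        raise ValueError("no finite observed time points to choose from")
    observed = min(candidates, key=lambda t: abs(t - target_time_h))
    delta = observed - target_time_h  # assumed sign: observed minus target
    fell_back = abs(delta) > tolerance_h
    if fell_back and not allow_nearest:
        # No silent fallback: nearest-time selection needs an explicit policy.
        raise ValueError(
            f"no snapshot at {target_time_h} h (nearest {observed} h); "
            "enable the nearest-time policy explicitly"
        )
    return {
        "snapshot_target_time_h": float(target_time_h),
        "snapshot_observed_time_h": observed,
        "snapshot_fell_back": fell_back,
        "snapshot_fallback_delta_h": delta,
    }
```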
diff --git a/docs/guides/automation.md b/docs/guides/automation.md index 20aa44b..8249276 100644 --- a/docs/guides/automation.md +++ b/docs/guides/automation.md @@ -3,13 +3,13 @@ Use JSON output when another tool needs stable discovery, inspection, or preflight data from `reader`. -## Fleet discovery +## Experiment list ```bash uv run reader ls --root experiments --details --readiness --format json ``` -Use this as the fleet-level inventory and readiness surface. It includes +Use this as the machine-readable experiment list with readiness data. It includes `catalog`, `selection`, `summary`, and `experiments`. ## Protocol discovery @@ -19,7 +19,7 @@ uv run reader protocols --format json uv run reader plugins --protocol --category --format json ``` -Use `protocols` for the public assay surface and compiled defaults. Use +Use `protocols` for the public assay definition and compiled defaults. Use `plugins` only when you need registry-level inspection for one protocol. ## Single experiment inspection @@ -48,7 +48,7 @@ Use `validate --no-files` for schema and wiring only, `validate` when input files matter, and `run --dry-run` to inspect the execution slice without mutation. -## Result inventory +## Records ```bash uv run reader records --format json diff --git a/docs/guides/common_routes.md b/docs/guides/common_routes.md index 6a2f69c..3cd193d 100644 --- a/docs/guides/common_routes.md +++ b/docs/guides/common_routes.md @@ -10,7 +10,10 @@ uv run reader ls --root experiments --details uv run reader ls --root experiments --details --readiness ``` -The first command lists discovered experiments. `--details` adds selected pipeline, plot, and export summaries. `--readiness` adds blocked, draft, runnable, and records-ready state. +The first command lists discovered experiments. `--details` adds selected +pipeline, plot, and export summaries. 
`--readiness` adds the current preflight +state, including `config_error`, `template`, `draft`, `dependency_blocked`, +`blocked`, `runnable`, `legacy_outputs_present`, and `records_ready`. ## Inspect one experiment @@ -31,7 +34,7 @@ uv run reader protocols --example-config uv run reader init ./experiments/ --protocol ``` -Use the protocol commands to see the public assay surface before you scaffold a new experiment. +Use the protocol commands to see the public assay definition before you scaffold a new experiment. ## Validate before execution diff --git a/docs/guides/data_operations_plan.md b/docs/guides/data_operations_plan.md new file mode 100644 index 0000000..65aada3 --- /dev/null +++ b/docs/guides/data_operations_plan.md @@ -0,0 +1,73 @@ +# Data Operations Plan + +Use this guide as the `reader`-local overlay for deciding what must be +captured before an experiment is run. It adapts the lab-facing parts of a +[Data Operations Plan](https://merelogic.net/data_operations_plans/how) to the +workbench without turning `reader/v7` into an organization-wide policy schema. + +The short path is: + +1. Classify the dataset. +2. Capture the minimum metadata for that class. +3. Stage raw files under the standard experiment layout. +4. Use `reader` preflight, execution, and records to prove what happened. + +## Choose the Smallest Needed Reference + +- [Operating model](./data_operations_plan/operating_model.md): understand what + belongs in repo-local DOP policy, what remains outside `reader`, and which + surface owns each fact. +- [Data classes](./data_operations_plan/data_classes.md): choose the protocol + family or draft/template path before copying configs. +- [Metadata minimums](./data_operations_plan/metadata_minimums.md): decide what + must be captured and when to stop for clarification. +- [Transfer and verification](./data_operations_plan/transfer_and_verification.md): + stage inputs, run checks, and verify generated evidence. 
+ +For the concrete intake workflow, continue with +[Experiment bootstrap](./experiment_bootstrap.md). For the execution loop, use +[Preflight, run, verify](./preflight_run_verify.md). + +Machine-readable inspection is available through the read-only +[`reader` DOP registry](../../src/reader/workbench/dop/): + +```bash +uv run reader dop classes +uv run reader dop classes --format json +uv run reader dop ready-specs --format json +``` + +Repo-local agent routing lives in +[reader-data-operations-plan](../../skills/reader-data-operations-plan/SKILL.md). +Use that skill when the task is DOP classification, DOP registry/docs +maintenance, or checking that experiment-intake guidance still matches this +overlay. + +## Operating Contract + +- Data classes are decision aids, not new config schema. +- `config.yaml`, `inputs/`, and hand-authored notes are the source of truth. +- Generated artifacts under `outputs/` are evidence, not source material. +- If well identity, treatment meaning, channel semantics, or control + interpretation is ambiguous, stop and ask instead of encoding a guess. +- Add new config fields or CLI surfaces only after the docs-level contract has + proven stable across real experiments. +- Keep the four DOP concerns separate: requirements explain why capture + matters, design explains the repo operating path, configuration defines + classes and canonical names, and instructions tell users or agents what to do + next. + +## Maintenance + +Update this overlay when: + +- a new protocol family is added; +- a recurring metadata ambiguity appears during experiment intake; +- a new external transfer path becomes common; +- validation misses a class of preventable failure; or +- a long-tail assay graduates from draft/template handling into a formal + protocol. + +Keep changes small: update one reference page first, then update +[Experiment bootstrap](./experiment_bootstrap.md) or repo-local skills only when +the operating workflow changes. 
diff --git a/docs/guides/data_operations_plan/data_classes.md b/docs/guides/data_operations_plan/data_classes.md new file mode 100644 index 0000000..8f1a866 --- /dev/null +++ b/docs/guides/data_operations_plan/data_classes.md @@ -0,0 +1,43 @@ +# Data Classes + +Use this page first during intake. Choose the class that best fits the +dataset, then use the matching protocol family or draft/template path. + +Return to the [Data Operations Plan](../data_operations_plan.md) when you only +need the overview. + +For machine-readable data-class and protocol-candidate output, use: + +```bash +uv run reader dop classes --format json +``` + +| Data class | Use when | Preferred `reader` route | Minimum capture | +| --- | --- | --- | --- | +| Plate-reader screen | Raw input is a Synergy/plate-reader export with well-level measurements and assay metadata. | `plate_reader/dual_reporter_screen`, `plate_reader/single_reporter_screen`, or `plate_reader/retron_sponge_screen` | Raw workbook/export, sample map, channel semantics, treatment/control meaning, plate/well coverage | +| Flow-cytometry panel | Raw input is FCS or cytometry panel data. | `cytometry/flow_panel` | Raw FCS roots/files, channel naming field, sample metadata, required metadata columns | +| Logic/SFXI analysis | Dataset is a logic-response or SFXI-style screen with explicit response/intensity channels and logic maps. | `logic/sfxi_screen` | Raw files, metadata map, response/intensity channel choices, reference design, logic-map corners | +| Aggregate/review workspace | Inputs are prior `reader` records, plots, exports, or hand-authored review material rather than one raw assay run. | `workbench/generic` or a draft/template experiment | Source experiment ids, record/export paths, review purpose, expected notebook template | +| Unsupported long-tail assay | The assay does not fit an existing protocol contract. | Start as a draft/template; add a protocol only after the metadata and execution contract are clear.
| Raw source path, intended analysis, required metadata, missing protocol decision, owner for follow-up | + +If a dataset fits multiple classes, prefer the class with the strictest +control and metadata contract. For example, a matched-control retron sponge +plate should use `plate_reader/retron_sponge_screen` instead of the more +general dual-reporter route. + +## Decision Rules + +- Prefer an existing protocol when its metadata and control assumptions match + the run. +- Prefer a nearest-neighbor config only when it preserves real assay semantics, + not just plot appearance. +- Use `draft` or `template` when the run is not yet executable or when the + protocol contract is still being discovered. +- Do not make a long-tail assay look runnable by forcing it into an adjacent + protocol with different control semantics. +- Treat context as part of the class decision. An instrument calibration, + failed setup run, or review bundle can need a different route than an + experiment even when the raw instrument family is the same. + +After choosing a class, move to +[Metadata minimums](./metadata_minimums.md). diff --git a/docs/guides/data_operations_plan/metadata_minimums.md b/docs/guides/data_operations_plan/metadata_minimums.md new file mode 100644 index 0000000..2230be0 --- /dev/null +++ b/docs/guides/data_operations_plan/metadata_minimums.md @@ -0,0 +1,51 @@ +# Metadata Minimums + +Use this page when building or reviewing `config.yaml`, sample maps, metadata +workbooks, or intake handoffs. + +Return to the [Data Operations Plan](../data_operations_plan.md) when you only +need the overview. + +## Required Before Execution + +- Usage context: why the dataset is being captured, likely downstream + consumers, and any immediate decision the outputs must support. +- Dataset identity: experiment id, date/slug, assay family, and lifecycle + (`active`, `draft`, or `template`). 
+- Raw provenance: original filename, source location, and whether the file was + copied from local storage, Drive, or an instrument export. +- Assay semantics: instrument/readout family, channel labels, denominators or + ratios, and protocol-specific analysis choices. +- Sample map: every measured well, position, or sample accounted for when the + protocol expects complete coverage. +- Controls: blank, reference, negative, positive, paired-control, and treatment + meanings when the assay uses them. +- Canonical labels: design ids, strain ids, treatments, aliases, orders, + collections, and logic-map corners. +- Requested outputs: plot profile, export artifacts, and notebook template only + when they differ from protocol defaults. + +## Stop Conditions + +Stop intake and ask for clarification when any of these affect interpretation: + +- well coordinates or sample positions conflict; +- treatment meaning is incomplete or overloaded; +- blank/control rows are present but their role is unclear; +- channel labels drift from the selected protocol; +- reference design ids or logic-map corners cannot be reconstructed; or +- the closest existing protocol would silently change the assay meaning. + +## Storage Rules + +Metadata belongs in `config.yaml`, `inputs/` metadata files, or hand-authored +notes under `notebooks/`. Generated artifacts under `outputs/` are evidence, +not the source of truth. + +Keep lab-wide owners, approval policy, retention rules, and enterprise catalog +policy outside `reader/v7` until there is a concrete repo-local behavior to +validate. Record those details in the handoff or organization system of record +instead of widening experiment config. + +After metadata is stable, move to +[Transfer and verification](./transfer_and_verification.md). 
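The sample-map minimum and the first stop condition (conflicting well coordinates) lend themselves to a mechanical pre-check before anyone authors config. The sketch below assumes a standard 96-well layout (rows A to H, columns 1 to 12) and a sample map given as `(well, sample_id)` pairs; `normalize_well` and `audit_sample_map` are hypothetical helpers, not part of `reader`. It only reports problems and leaves the decision to stop intake with the operator.

```python
from __future__ import annotations

import re
from collections import defaultdict
from collections.abc import Iterable

# Assumed 96-well layout: rows A-H, columns 1-12, optional zero padding ("A01").
_WELL_RE = re.compile(r"^([A-H])(0?[1-9]|1[0-2])$")
_ALL_WELLS = {f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)}


def normalize_well(raw: str) -> str:
    match = _WELL_RE.match(raw.strip().upper())
    if match is None:
        raise ValueError(f"not a 96-well coordinate: {raw!r}")
    return f"{match.group(1)}{int(match.group(2))}"


def audit_sample_map(
    entries: Iterable[tuple[str, str]], *, expect_complete: bool
) -> list[str]:
    """Return stop-condition messages; an empty list means nothing blocked."""
    problems: list[str] = []
    samples_by_well: dict[str, set[str]] = defaultdict(set)
    for raw_well, sample_id in entries:
        try:
            well = normalize_well(raw_well)
        except ValueError as exc:
            problems.append(str(exc))
            continue
        samples_by_well[well].add(sample_id)
    for well, samples in sorted(samples_by_well.items()):
        if len(samples) > 1:
            problems.append(f"{well} maps to conflicting samples: {sorted(samples)}")
    if expect_complete:
        missing = sorted(_ALL_WELLS.difference(samples_by_well))
        if missing:
            problems.append(f"wells without a sample entry: {', '.join(missing)}")
    return problems
```

Normalizing `A01` and `a1` to `A1` before comparing means a padded and an unpadded spelling of the same well count as one assignment, not as a conflict.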
diff --git a/docs/guides/data_operations_plan/operating_model.md b/docs/guides/data_operations_plan/operating_model.md new file mode 100644 index 0000000..044000d --- /dev/null +++ b/docs/guides/data_operations_plan/operating_model.md @@ -0,0 +1,83 @@ +# Operating Model + +Use this page when changing the `reader` Data Operations Plan overlay or when a +new intake task does not fit cleanly into the existing data-class pages. + +Return to the [Data Operations Plan](../data_operations_plan.md) when you only +need the overview. + +## Component Boundaries + +Merelogic frames a Data Operations Plan as four related documents. In `reader`, +those concerns map to repo-local surfaces instead of becoming one large policy +document. + +| DOP concern | `reader` surface | Contract | +| --- | --- | --- | +| Requirements | Intake prompt, handoff notes, and experiment context | Capture why the data matters, who will consume it, and what ambiguity would make it unusable later. Do not store organization-wide requirements in `reader/v7`. | +| Design | Standard experiment layout, workbench architecture, and preflight/run/verify loop | Keep the operating path stable: `config.yaml`, `inputs/`, `notebooks/`, generated `outputs/`, and manifest-backed records. | +| Configuration | DOP registry, protocol catalog, config files, sample maps, and canonical labels | Define data classes, protocol candidates, metadata minimums, naming expectations, and stop conditions in one owned place. | +| Instructions | DOP guide pages, experiment bootstrap guide, repo-local skills, and CLI commands | Give humans and agents the shortest safe next step without duplicating every detail into one monolith. | + +## Ownership Rules + +- `src/reader/workbench/dop/` owns the machine-readable data-class and + ready-spec registry. +- `docs/guides/data_operations_plan/` owns human-facing explanations, + checklists, and examples. 
+- `skills/reader-data-operations-plan/` owns agent routing for DOP + classification and maintenance. +- `src/reader/protocols/` owns executable assay semantics. +- `reader/v7` config owns authored experiment intent, not lab-wide policy. +- `outputs/manifests/records.json` owns generated evidence after execution. + +When a fact must be consumed by automation, put it in the registry first and +let docs summarize it. When a fact is explanatory or procedural, keep it in the +smallest guide page that owns that decision. + +## Change Contract + +Use this order for a DOP change: + +1. Identify whether the change affects requirements, design, configuration, or + instructions. +2. Update the smallest owned surface. +3. Keep any registry change read-only and fail-fast until a later slice proves + it should affect execution. +4. Link from the overview or skill only when the route is recurring. +5. Run the docs, skill, and targeted registry checks before broad tests. + +Do not use a DOP change to widen `reader/v7`, revive legacy config keys, or +encode a guessed metadata interpretation. If a new assay needs different +execution semantics, add or change a protocol after the intake contract is +clear. + +## Maintenance Triggers + +Update the DOP overlay when: + +- a protocol is added, removed, or renamed; +- an intake task repeatedly stops on the same metadata ambiguity; +- a data class needs a different prescribed order or stop condition; +- transfer paths change for raw inputs or metadata files; +- generated evidence no longer answers the review question; or +- a long-tail assay graduates from draft/template handling to an executable + protocol. 
+ +## Verification + +For DOP docs or skill changes: + +```bash +uv run python tools/audit_repo_skills.py +uv run python tools/check_docs.py +git diff --check +``` + +For registry or CLI changes, add: + +```bash +uv run pytest -q src/reader/tests/workbench/test_dop_registry.py +uv run reader dop classes --format json +uv run reader dop ready-specs --format json +``` diff --git a/docs/guides/data_operations_plan/transfer_and_verification.md b/docs/guides/data_operations_plan/transfer_and_verification.md new file mode 100644 index 0000000..501b3a3 --- /dev/null +++ b/docs/guides/data_operations_plan/transfer_and_verification.md @@ -0,0 +1,49 @@ +# Transfer and Verification + +Use this page after the data class and metadata contract are clear. + +Return to the [Data Operations Plan](../data_operations_plan.md) when you only +need the overview. + +## Transfer Rules + +- Put raw inputs in `inputs/` with the original filename when practical. +- Keep hand-authored notebooks in `notebooks/`; generated scaffolds belong in + `outputs/notebooks/`. +- Keep generated records, plots, exports, and manifests under `outputs/`. +- Use explicit `resources` entries for files or directories consumed by the + compiled plan. +- When materializing from Google Drive or another external system, record the + source and staged path in the handoff. +- Do not copy generated outputs from an old experiment into a new one. Copy + config/metadata intent only when the new run is semantically close, then + regenerate outputs. +- If source ownership, source location, or transfer status is unknown, leave the + intake blocked instead of inventing a path that makes preflight look cleaner. 
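The last transfer rule says to keep the intake blocked rather than invent a path. That rule fits a small fail-fast check. The sketch below is hypothetical: the `TransferRecord` fields and the `intake_status` helper are illustrative, and `reader` does not necessarily model handoffs this way.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


# Hypothetical handoff record; None means "not known yet".
@dataclass(frozen=True)
class TransferRecord:
    source: Optional[str]        # where the raw file came from (e.g. a Drive folder)
    staged_path: Optional[str]   # experiment-relative path where it was staged
    transferred: Optional[bool]  # whether the copy actually finished


def intake_status(record: TransferRecord) -> str:
    """Return 'ready' or a 'blocked: ...' reason; never guess a missing value."""
    missing = [
        name
        for name, value in (
            ("source", record.source),
            ("staged_path", record.staged_path),
            ("transferred", record.transferred),
        )
        if value is None
    ]
    if missing:
        return "blocked: unknown " + ", ".join(missing)
    if record.transferred is False:
        return "blocked: transfer not complete"
    # Raw inputs belong under inputs/, ideally with the original filename.
    if PurePosixPath(record.staged_path).parts[:1] != ("inputs",):
        return "blocked: raw inputs must be staged under inputs/"
    return "ready"


print(intake_status(TransferRecord("Google Drive", "inputs/plate1.xlsx", True)))
```

A "blocked" status goes into the handoff as is. It is a prompt to ask for the missing fact, not to fill it in.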
+ +## Verification Commands + +Use the cheapest check that answers the next question, then broaden only when +the surface is ready: + +```bash +uv run reader validate --no-files +uv run reader validate +uv run reader run --dry-run --format json +uv run reader plot --list +uv run reader export --list +uv run reader run +uv run reader records +``` + +## Evidence Bar + +Verification is complete only when it proves: + +- the config schema and protocol binding are valid; +- declared files/resources exist or the experiment is intentionally non-active; +- the compiled pipeline, plots, exports, and notebooks match the intended data + class; +- `outputs/manifests/records.json` records the generated dataframe and + file-bundle evidence; and +- unresolved metadata assumptions are visible in the final handoff. diff --git a/docs/guides/experiment_bootstrap.md b/docs/guides/experiment_bootstrap.md new file mode 100644 index 0000000..fba9576 --- /dev/null +++ b/docs/guides/experiment_bootstrap.md @@ -0,0 +1,212 @@ +# Experiment Bootstrap + +Use this guide when creating a new `reader` experiment from raw assay data or +when auditing the local experiment list. This is the main creation and intake workflow +for the recurring "classify the data, find a similar experiment, materialize +inputs, wire config, build metadata, preflight, run, verify" loop. + +## Principles + +- Keep `AGENTS.md` as the map, not the encyclopedia. +- Start with the [Data Operations Plan](./data_operations_plan.md) data class + before copying templates or authoring config. +- Prefer existing protocol contracts and nearby experiment templates over + bespoke config authoring. +- Do not hand-edit generated `outputs/`; regenerate instead. +- Do not silently infer plate semantics when ambiguity changes well identity, + treatment meaning, or control interpretation. +- Treat tracked repo fixtures and local experiments as different audit + scopes. 
CI should stay stable; local experiment list audits should include ignored + experiments. + +## 1. Classify the data class + +Start by selecting the first matching class from +[Data classes](./data_operations_plan/data_classes.md): + +- plate-reader screen +- flow-cytometry panel +- logic/SFXI analysis +- aggregate/review workspace +- unsupported long-tail assay + +The selected class should determine the preferred protocol family, metadata +minimums, and transfer expectations. If no class fits, keep the experiment as +`draft` or `template` and document the missing protocol/metadata contract +instead of forcing the data into a nearby protocol. + +## 2. Discover the assay family and nearest template + +Start with the local experiment list: + +```bash +uv run reader ls --root experiments --details --readiness +uv run reader ls --root experiments --details --readiness --format json +``` + +Prefer JSON when another tool or agent will consume the output. + +Then narrow to the assay family you need: + +```bash +uv run reader ls --root experiments --details --protocol plate_reader/single_reporter_screen +uv run reader ls --root experiments --details --protocol plate_reader/dual_reporter_screen +uv run reader protocols +uv run reader protocols --example-config +``` + +Pick the closest prior experiment by: + +- data class +- protocol id +- raw instrument family +- channel semantics +- metadata shape +- plot and export choices + +Inspect before copying: + +```bash +uv run reader inspect +uv run reader steps +uv run reader explain +``` + +## 3. Create the workspace + +Use `reader init` when protocol defaults are the main starting point: + +```bash +uv run reader init ./experiments/YYYY/YYYYMMDD_shortslug --protocol +``` + +Use a nearest-neighbor config when the new run is semantically close to a prior +experiment and you need to preserve annotations, plot ids, or export behavior. 
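You can make the choice of nearest neighbor explicit by comparing the attributes listed in step 2. This minimal sketch assumes each prior experiment has already been summarized as a dictionary of those attributes. It also assumes that the order of the list is the priority order, so a protocol-id match outranks any number of lower-priority matches. The keys and `nearest_template` are hypothetical, not part of the `reader` CLI.

```python
from typing import Optional

# Attributes from step 2, in the order listed there. Treating that order as
# priority order is an assumption of this sketch.
KEYS = (
    "data_class",
    "protocol_id",
    "instrument_family",
    "channel_semantics",
    "metadata_shape",
    "outputs",
)


def match_vector(target: dict, candidate: dict) -> tuple[bool, ...]:
    """One boolean per attribute; tuples of booleans compare lexicographically."""
    return tuple(target.get(k) == candidate.get(k) for k in KEYS)


def nearest_template(target: dict, candidates: list[dict]) -> Optional[dict]:
    """Hypothetical helper: best-matching prior experiment, or None."""
    # Only experiments of the same data class are eligible templates.
    eligible = [c for c in candidates if c.get("data_class") == target.get("data_class")]
    if not eligible:
        return None  # keep the new experiment as draft/template instead
    # max() returns the first of several equally good candidates.
    return max(eligible, key=lambda c: match_vector(target, c))
```

A `None` result maps to the guidance in step 1: document the missing template instead of forcing the data into a nearby protocol.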
+ +Either way, keep the standard layout: + +```text +experiments/YYYY/YYYYMMDD_shortslug/ + config.yaml + inputs/ + notebooks/ + outputs/ +``` + +Keep hand-authored notebooks in `notebooks/`. `reader notebook` writes generated +scaffolds under `outputs/notebooks/`. + +## 4. Intake raw data + +Use the raw workbook or instrument export as the source of truth and keep the +original filename in `inputs/`. + +If the user explicitly points at Google Drive and `gws` is available, use the +local Google Workspace tooling instead of manual browser instructions: + +```bash +gws-account run bu drive files list ... +gws-account run bu drive files get --params '{"fileId":"...","alt":"media"}' --output +``` + +When the workbook schema drifts from prior experiments, inspect it before +editing config: + +- sheet names +- channel labels +- whether the file is kinetic-only or multi-part +- whether the data is a native workbook or an imported Google file + +If the channel labels differ from the template, add an explicit +`protocol.inputs.ingest.channel_map` instead of relying on inferred names. + +## 5. Build metadata deliberately + +Use the nearest prior metadata workbook or CSV as the formatting template, but +rewrite the semantic content for the new experiment. + +Preserve these contracts: + +- Every measured position must be accounted for. +- If the sample-map workflow expects full plate coverage, keep every well in the + metadata table and leave truly unused wells metadata-empty rather than + deleting rows. +- Keep blanks explicit only when the assay semantics actually require them. +- Preserve workbook structure when downstream parsing depends on it. + +Ask the user for missing or conflicting metadata when any of these are unclear: + +- well coordinates +- design ids / strain ids +- treatment lattice +- blank/control interpretation +- desired alias labels + +Do not silently resolve collisions like overlapping well assignments. Surface +the ambiguity and get confirmation first. 
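Two metadata contracts from step 5 are cheap to check before preflight: full plate coverage and non-overlapping well assignments. The sketch below assumes a 96-well plate (rows A to H, columns 1 to 12) and metadata already parsed into `(well, design_id)` pairs. The helper name and input shape are hypothetical. The function only reports problems, so collisions are surfaced for confirmation and never resolved automatically.

```python
from collections import defaultdict

# Assumption: a standard 96-well layout.
ROWS = "ABCDEFGH"
COLS = range(1, 13)
ALL_WELLS = [f"{r}{c}" for r in ROWS for c in COLS]  # A1 .. H12


def audit_plate_metadata(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Hypothetical check: report uncovered, unknown, and conflicting wells.

    `rows` holds (well, design_id) pairs. An empty design_id marks a well that
    stays in the table but is intentionally metadata-empty.
    """
    assigned: dict[str, set[str]] = defaultdict(set)
    for well, design_id in rows:
        assigned[well.strip().upper()].add(design_id)

    missing = [w for w in ALL_WELLS if w not in assigned]
    unknown = sorted(w for w in assigned if w not in ALL_WELLS)
    # Two different labels on one well is a collision to surface, not resolve.
    conflicts = sorted(w for w, ids in assigned.items() if len(ids) > 1)
    return {"missing": missing, "unknown": unknown, "conflicts": conflicts}
```

A non-empty `missing` list usually means rows were deleted instead of left metadata-empty. A non-empty `conflicts` list is a question for the user.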
+ +## 6. Preflight the smallest slice first + +Use the normal `reader` loop: + +```bash +uv run reader validate --no-files +uv run reader validate +uv run reader run --dry-run --format json +uv run reader plot --list +uv run reader export --list +``` + +Use the cheapest command that answers the next question: + +- config shape only: `validate --no-files` +- file presence and dependency readiness: `validate` +- compiled execution slice: `run --dry-run` +- output portfolio: `plot --list` / `export --list` + +## 7. Execute and verify + +Run only after preflight is clean: + +```bash +uv run reader run +uv run reader plot +uv run reader export +uv run reader records +``` + +Verification should include: + +- `outputs/manifests/records.json` +- expected plot files +- expected export files +- any key fold-change or summary tables the experiment is supposed to produce + +## 8. Audit the local experiment list + +The repo test suite only covers tracked fixture experiments. To audit the real +local experiment directories under `experiments/`, use the local audit tool. +Omit `--years` to audit every numeric year directory under `experiments/`, or +pass explicit years when you want a narrower run: + +```bash +uv run python tools/audit_local_experiments.py +uv run python tools/audit_local_experiments.py --format json +uv run python tools/audit_local_experiments.py --years [ ...] +uv run python tools/audit_local_experiments.py --include-non-active +``` + +This tool stages experiments into temporary copies so the audit does not mutate +the original experiment outputs. By default it skips non-active lifecycles and +reports them separately. Use `--include-non-active` only when you intentionally +want to pressure-test draft/template configs too. + +## Common friction + +- The local experiment list is broader than the tracked repo fixture set. +- Workbook channel names drift more often than protocol ids do. 
+- Sample-map failures usually come from incomplete plate coverage, not parser + bugs. +- Draft experiments should not be forced through an end-to-end run. +- Google Drive materialization is external state; keep that step explicit in the + audit output or final handoff. diff --git a/docs/guides/getting_started.md b/docs/guides/getting_started.md index 0fe2b83..8d4a8d3 100644 --- a/docs/guides/getting_started.md +++ b/docs/guides/getting_started.md @@ -18,16 +18,16 @@ uv run ruff check . uv run ruff format . --check ``` -`uv run pytest -q` is the default test lane. It excludes only the full data-backed `fleet` matrix. The Ruff commands check lint and formatting. +`uv run pytest -q` is the default test run. It excludes only the full data-backed active-experiment run. The Ruff commands check lint and formatting. -## Inspect the experiment inventory +## Inspect the experiment list ```bash uv run reader ls --root experiments uv run reader ls --root experiments --details --readiness ``` -Start with `reader ls` to see the experiment catalog. Add `--details --readiness` when you need protocol, output, and readiness state in the same view. +Start with `reader ls` to see the experiment list. Add `--details --readiness` when you need protocol, output, and readiness state in the same view. ## Inspect one experiment before execution diff --git a/docs/guides/notebooks.md b/docs/guides/notebooks.md index 46aa42a..9bc3eab 100644 --- a/docs/guides/notebooks.md +++ b/docs/guides/notebooks.md @@ -88,6 +88,9 @@ What the scaffolded notebook includes: The default `notebook/eda`, `notebook/basic`, and `notebook/microplate` templates are intentionally minimal record explorers. They do not currently scaffold ad-hoc plotting controls or Altair chart builders. +`notebook/dual_reporter_triptych` is a neutral plate-reader review surface for dual-reporter assays. 
It renders +OD600 kinetics, YFP/CFP kinetics, and a YFP/CFP snapshot bar plot for one selected design without assuming SFXI +four-corner logic or vec8 export semantics. `plate_reader/retron_sponge_screen` instead defaults to `notebook/retron_sponge`, which adds an experiment-scoped plot-portfolio review, transform ladder, and semantic-table walkthrough on top of the record explorer. For cross-run retron library review, `notebook/retron_sponge_aggregate` is available as an explicit opt-in template @@ -114,13 +117,15 @@ Notes: - `uv run reader notebook --mode run --headless` - open the printed URL in Chrome MCP - or run `uv run marimo check ` for a static validation pass +* Static HTML export is useful as an execution/shareability smoke check, but it is not an interaction check. Validate dropdowns, sliders, export buttons, and chart rerenders from a live `marimo run` app. * Record discovery is catalog-first. If `outputs/manifests/records.json` is missing, the scaffolded notebook will show no datasets unless you regenerate records with `uv run reader run` or opt in with `uv run reader notebook --scan-records`. -* Common templates include `notebook/retron_sponge`, `notebook/retron_sponge_aggregate`, `notebook/eda`, `notebook/basic`, `notebook/microplate`, `notebook/cytometry`, and `notebook/sfxi_eda`. +* Common templates include `notebook/retron_sponge`, `notebook/retron_sponge_aggregate`, `notebook/eda`, `notebook/basic`, `notebook/microplate`, `notebook/dual_reporter_triptych`, `notebook/cytometry`, and `notebook/sfxi_eda`. 
* Template behavior is capability-driven: - plot filtering is only available for templates that declare plot-filter support - auto-pick chooses a template from declared default rules instead of hardcoded CLI branching - template applicability checks are declared on the template asset itself * `notebook/sfxi_eda` requires SFXI-capable context declared through asset requirements: either an SFXI-tagged pipeline transform or compatible dataframe records. +* `notebook/sfxi_eda` reuses the neutral dual-reporter triptych for visualization, then layers SFXI-specific vec8 recomputation, reference anchoring, and XLSX/JSON export on top. * The SFXI template draws a red dashed induction marker on the time-series plot when an induction time can be inferred from dataframe records: - preferred: an explicit column like `induction_time_h` (or `induction_time`) in the tidy dataframe - fallback: Synergy H1 ingest columns (`sheet_index` + `time`), where the first time in the second sheet is treated as the induction time diff --git a/docs/guides/preflight_run_verify.md b/docs/guides/preflight_run_verify.md index 2bbb22d..ccfabf6 100644 --- a/docs/guides/preflight_run_verify.md +++ b/docs/guides/preflight_run_verify.md @@ -45,11 +45,14 @@ uv run reader export --list uv run reader run uv run reader plot uv run reader export -uv run reader notebook +uv run reader notebook --mode none +uv run reader notebook --mode run --headless ``` -`run` materializes records. `plot`, `export`, and `notebook` materialize their -own output surfaces after the experiment is ready. +`run` materializes records. `plot` and `export` materialize their output +surfaces after the experiment is ready. `notebook --mode none` scaffolds a +review notebook without launching Marimo, and `--mode run --headless` prints a +loopback URL for agent/browser review. ## 5. 
Verify outputs and provenance diff --git a/docs/guides/retron_sponge_screen.md b/docs/guides/retron_sponge_screen.md index 9e83cff..651c0c3 100644 --- a/docs/guides/retron_sponge_screen.md +++ b/docs/guides/retron_sponge_screen.md @@ -33,7 +33,9 @@ The transform `transform/retron_sponge_metrics` materializes two typed assay rec - contract: `plate_reader.sponge_summary.v1` - carries `R_pre`, `P_pre`, `C_AUC`, `C_END`, `D_AUC`, `D_END`, `D_abs_AUC`, `D_abs_END`, `D_growth_AUC`, `D_growth_END`, `M_AUC`, `M_END`, `O_AUC`, `O_abs_AUC`, `S_AUC`, `S_abs_AUC`, `L_pre`, `L_post_AUC`, `T_ratio_AUC`, `T_growth_AUC`, and `T_finalOD` -The internal config key is still `protocol.analysis.semantic_metrics` for compatibility. In the user-facing docs and notebooks, treat those outputs as derived assay metrics rather than a separate semantic layer. +The canonical internal config key is `protocol.analysis.semantic_metrics`. In +the user-facing docs and notebooks, treat those outputs as derived assay metrics +rather than a separate semantic layer. ## Metric flow diff --git a/docs/guides/workbench_gardening.md b/docs/guides/workbench_gardening.md new file mode 100644 index 0000000..db78bb5 --- /dev/null +++ b/docs/guides/workbench_gardening.md @@ -0,0 +1,275 @@ +# Workbench gardening + +Use this guide when the task is to keep `reader` easy to change, assay +extensible, and operationally clear for maintainers. The matching repo-local +skill routes here; this document is the primary workflow. 
+ +## When to use this + +Use this guide for: + +- architecture and information-architecture audits +- semantic monolith pressure around protocols, compiler surfaces, notebooks, or + registries +- assay lock-in risk in config, CLI, docs, or code organization +- stale semantics, stale docs, or legacy behavior that no longer matches + `reader/v7` +- maintainer ergonomics and CLI or JSON surface hardening + +Do not use this guide for new-experiment intake, result interpretation, or +hand-editing generated outputs under `experiments/**/outputs/`. + +## Adjacent routes + +Use a narrower or broader route when the task is not primarily about workbench +architecture or maintainability: + +- use [Experiment bootstrap](./experiment_bootstrap.md) for new-experiment + intake, metadata staging, or local experiment audits +- use [Repo maintenance](../repo-maintenance.md) when branch state, publish + flow, or CI topology becomes part of the task +- use [Plugin development](../core/plugins.md) when the real work is adding a + new plugin mechanic rather than reducing workbench drift + +## Gardening modes + +Pick one mode before you start so the cycle stays reviewable: + +- `audit-only` + - map ownership and pressure, then stop with a ranked next slice +- `docs-sync` + - bring maintainer docs and routes back in sync with actual runtime behavior +- `boundary-hardening` + - make one small structural cut that reduces concentration or lock-in +- `surface-contracts` + - tighten CLI, JSON, or preflight/run/verify evidence surfaces + +## Core invariants + +Start every gardening cycle by checking the current invariants instead of +rephrasing them from memory: + +- [ARCHITECTURE.md](../../ARCHITECTURE.md) +- [DESIGN.md](../../DESIGN.md) +- [QUALITY.md](../../QUALITY.md) +- [RELIABILITY.md](../../RELIABILITY.md) +- [docs/repo-change-gate.md](../repo-change-gate.md) + +The invariants that usually matter most are: + +- experiment-scoped IO remains the workbench unit of work +- `reader/v7` stays 
the only public config schema +- protocols own assay semantics and output vocabulary +- plugins stay mechanical adapters around domain or runtime logic +- discovery, validation, and dry-run surfaces stay first-class +- generated outputs remain generated and manifest-backed + +## Skill composition + +Pair this maintainer workflow with global skills when the cycle needs a deeper +specialized lens: + +- `deep-introspection` to map current architecture, ownership, and runtime flow +- `pragmatic-programming-principles` to choose boundaries, contracts, and + fail-fast behavior +- `code-review` when the main deliverable is findings rather than edits +- `harness-engineering` to tighten CLI, JSON, or end-to-end verification + contracts +- `deslop` only when cleaning maintainer prose after the technical content is + correct + +## Evidence discipline + +Do not run this workflow from memory or from abstract preferences alone. +Ground the cycle in: + +- canonical repo docs +- the changed code surface +- representative CLI evidence when the claim touches runtime behavior + +The output should make clear which statements are verified facts, which are +inferences, and which are deferred follow-ups. + +## Harness endpoints for this workflow + +When hardening the workflow itself, treat these three endpoints as primary: + +- `knowledge-integrity` + - docs, routes, and source tables stay current and cross-linked +- `autonomy-capability` + - agents can follow a bounded workflow with deterministic checks +- `architecture-invariants` + - the guide keeps routing work toward canonical `reader` boundaries instead + of smearing them together + +## Workflow + +### 1. 
Define the cycle scope + +State: + +- the target surface +- the gardening mode +- whether the cycle is audit-only or includes code or docs changes +- the workbench invariant you are protecting +- the representative assay family, experiment, or CLI surface if runtime + verification is needed + +Prefer one small slice over a repo-wide cleanup. If the target is broad, split +it by ownership boundary first. + +### 2. Map current ownership + +Trace the surface through the workbench layers described in +[ARCHITECTURE.md](../../ARCHITECTURE.md): + +1. authored config +2. protocol semantics +3. compiled declaration and runtime execution +4. CLI, notebooks, records, and generated outputs + +Use repository docs and implementation together. For runtime-facing mapping, +prefer machine-readable CLI evidence before making architectural claims: + +```bash +uv run reader ls --root experiments --details --readiness --format json +uv run reader inspect --format json +uv run reader explain --format json +``` + +The goal is to name where meaning lives, where mechanics live, and where those +two are being confused. + +### 3. Identify pressure and drift + +Look for these failure modes: + +- monolith pressure + - one module or helper is collecting semantics, mechanics, and rendering +- assay lock-in + - config or code assumes one assay family is the normal case +- stale semantics or docs + - docs, routes, or invariants no longer match current behavior +- harness drift + - CLI or JSON surfaces are brittle, inconsistent, or not fail-fast +- directory drift + - ownership boundaries in code placement no longer match the architecture +- legacy creep + - removed behavior or hidden fallback is trying to return through shims or + ambiguous docs + +Use the repo-local checklist at +[skills/reader-workbench-gardening/references/checklists.md](../../skills/reader-workbench-gardening/references/checklists.md) +to keep this pass concrete. + +### 4. 
Choose the smallest reversible slice + +Typical gardening moves are: + +- move assay semantics out of generic runtime or CLI code +- split a growing family helper before it becomes the only place new assay work + can land +- replace duplicated docs with canonical routes to existing architecture docs +- tighten validation or fail-fast behavior instead of carrying compatibility + shims +- improve JSON or preflight surfaces so agents can inspect behavior without + mutation + +Avoid wide cleanup passes that mix unrelated layers. If the slice would touch +many ownership boundaries at once, it is probably too large. + +When in doubt, prefer: + +- a doc-route repair over a new overview document +- a fail-fast validation improvement over a compatibility shim +- a family-specific helper split over a generic abstraction that only moves the + complexity +- one representative CLI contract improvement over a broad surface rewrite + +### 5. Verify the slice + +Use the smallest verification bundle that matches the risk. Start with the repo +change gate and then add representative runtime proof only where needed. + +Docs or routing changes: + +```bash +uv run python tools/audit_repo_skills.py +uv run python tools/check_docs.py +git diff --check +``` + +Code or CLI changes: + +```bash +uv run ruff check . +uv run ruff format . 
--check +uv run pytest -q +git diff --check +``` + +Runtime, contract, or harness changes should also include a representative CLI +preflight path: + +```bash +uv run reader ls --root experiments --details --readiness --format json +uv run reader inspect --format json +uv run reader validate --no-files --format json +uv run reader explain --format json +uv run reader run --dry-run --format json +``` + +When the gardening cycle changes end-to-end experiment behavior, add the +smallest repo marker that proves the surface: + +```bash +uv run pytest -q -m repo_matrix +uv run pytest -q -m integration +uv run pytest -q -m active_experiments +``` + +Use only the smallest marker set that matches the risk. If plots, exports, or +notebooks changed, add the matching `plot --list`, `export --list`, `records`, +or notebook command for one representative experiment. + +### 6. Close the cycle + +Before finalizing: + +1. review the diff +2. confirm no generated outputs were hand-edited +3. state skipped checks explicitly +4. route through [docs/repo-change-gate.md](../repo-change-gate.md) when + tracked docs or code changed + +If the task includes landing changes, do the normal branch, commit, and publish +steps after the gate passes. Commit and push are delivery steps, not the +identity of this workflow. + +If the task expands into branch state, CI behavior, or remote publish steps, +continue into [Repo maintenance](../repo-maintenance.md) rather than trying to +hide that broader scope inside this guide. 
+ +## Deliverables + +A good gardening cycle produces: + +- the selected gardening mode +- a scoped statement of the invariant or boundary under review +- an evidence summary: canonical docs, code paths, and CLI probes used +- an ownership or drift summary +- the smallest reversible slice selected +- verification evidence +- residual risks and the next likely maintenance pass + +## Related docs + +- [Preflight, run, verify](./preflight_run_verify.md) +- [Automation and JSON](./automation.md) +- [Experiment bootstrap](./experiment_bootstrap.md) +- [Repo change gate](../repo-change-gate.md) +- [Repo maintenance](../repo-maintenance.md) +- [Architecture](../../ARCHITECTURE.md) +- [Design](../../DESIGN.md) +- [Quality](../../QUALITY.md) +- [Reliability](../../RELIABILITY.md) diff --git a/docs/index.md b/docs/index.md index 2fe2004..a505524 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,27 +1,23 @@ # Documentation index -Use [docs/README.md](./README.md) for the full documentation map. If you -reached `docs/index.md` directly, start here for the guide or reference page -that matches the question. +Use [docs/README.md](./README.md) for the full documentation map. This page +stays thin so it does not turn into a second, drifting docs index. 
-## Start here +Most common routes: - [Getting started](./guides/getting_started.md) - [Common tasks](./guides/common_routes.md) - -## Core workflows - - [Preflight, run, verify](./guides/preflight_run_verify.md) - [Automation and JSON](./guides/automation.md) - -## Reference - +- [Data Operations Plan](./guides/data_operations_plan.md) +- [Experiment bootstrap](./guides/experiment_bootstrap.md) - [CLI reference](./core/cli.md) - [Configuring `reader/v7`](./core/pipeline.md) -## Maintainer docs +Maintainer routes: -- [Documentation index](./README.md) +- [Full docs map](./README.md) - [Repo change gate](./repo-change-gate.md) - [Repo maintenance](./repo-maintenance.md) +- [Workbench gardening](./guides/workbench_gardening.md) - [Architecture](../ARCHITECTURE.md) diff --git a/docs/lib/sfxi_vec8_in_reader.md b/docs/lib/sfxi_vec8_in_reader.md index 3f3f09c..206d4b7 100644 --- a/docs/lib/sfxi_vec8_in_reader.md +++ b/docs/lib/sfxi_vec8_in_reader.md @@ -1,6 +1,6 @@ ## Generating SFXI 8-vectors in `reader` -This document describes how **reader** processes **Setpoint Fidelity x Intensity** (SFXI) 8-vectors from experimental measurements. The objective/scalar spec is outside of reader (for more details on SFXI see [here](https://github.com/e-south/dnadesign/blob/main/src/dnadesign/opal/docs/setpoint_fidelity_x_intensity.md)). The process here involves collecting microplate reader data, selecting a timepoint, and then deriving an 8‑vector per *design_id* in the fixed state order **00, 10, 01, 11**. +This document describes how **reader** processes **Setpoint Fidelity x Intensity** (SFXI) 8-vectors from experimental measurements. The objective/scalar spec is outside of reader and is owned by **dnadesign** (for more details on SFXI see the OPAL objective docs in `dnadesign/src/dnadesign/opal/docs/plugins/objective-sfxi.md`). 
The process here involves collecting microplate reader data, selecting a timepoint, and then deriving an 8‑vector per *design_id* in the fixed state order **00, 10, 01, 11**. 8-vector definition: @@ -22,8 +22,9 @@ This document describes how **reader** processes **Setpoint Fidelity x Intensity 6. [Logic channel](#logic-channel) 7. [Intensity channel](#intensity-channel) 8. [Output](#output) -9. [Configuration entry point](#configuration-entry-point) -10. [Usage demo](#usage-demo) +9. [Setpoint scatter plot](#setpoint-scatter-plot) +10. [Configuration entry point](#configuration-entry-point) +11. [Usage demo](#usage-demo) --- @@ -37,6 +38,7 @@ This document describes how **reader** processes **Setpoint Fidelity x Intensity * **Important:** The `delta` used in **reader** (`log2_offset_delta`) must match OPAL’s `intensity_log2_offset_delta`. * **Reader makes this explicit** by writing an `intensity_log2_offset_delta` column into every vec8 row. When `log2_offset_delta` is left at its default (`0.0`), this column will be all zeros. * **If OPAL uses a different delta, recovered linear intensities and downstream scores will be inconsistent.** Keep the values in sync (preferably by validating against the vec8 column at ingest time). + * Reader plot code imports only the public scoring boundary `dnadesign.opal.api.sfxi`, never OPAL internals. - The **reader** transform plugin (`src/reader/plugins/transform/sfxi.py`) delegates to `reader.domains.logic.sfxi.*` and adds pipeline plumbing and logging. 
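The page recommends validating `intensity_log2_offset_delta` against the vec8 column at ingest time. The sketch below shows what that could look like. It assumes vec8 rows arrive as dictionaries (a stand-in for the real dataframe) and that the consumer knows its own configured delta. `check_vec8_delta` is a hypothetical helper, not part of `reader` or OPAL. A mismatched delta silently skews recovered linear intensities, so the check raises instead of coercing.

```python
import math

COLUMN = "intensity_log2_offset_delta"


def check_vec8_delta(rows: list[dict], expected_delta: float, tol: float = 1e-12) -> float:
    """Hypothetical ingest check: every vec8 row must carry the expected delta."""
    if not rows:
        raise ValueError("no vec8 rows to validate")
    if any(COLUMN not in row for row in rows):
        raise ValueError(f"vec8 rows are missing the {COLUMN} column")
    deltas = {row[COLUMN] for row in rows}
    if len(deltas) != 1:
        raise ValueError(f"vec8 rows disagree on {COLUMN}: {sorted(deltas)}")
    (delta,) = deltas
    if not math.isclose(delta, expected_delta, rel_tol=0.0, abs_tol=tol):
        raise ValueError(
            f"vec8 delta {delta} != consumer delta {expected_delta}; "
            "recovered linear intensities would be inconsistent"
        )
    return delta
```

With reader's default `log2_offset_delta` of `0.0`, the column is all zeros, so a consumer configured with any other delta fails immediately.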
@@ -63,6 +65,10 @@ Key modules in `src/reader/domains/logic/sfxi/`: * `build_vec8_from_tidy(...)`, `run_sfxi(...)` * `write_outputs(...)` +* **Setpoint scoring plot prep/rendering:** `setpoint_scatter.py` + + * `score_sfxi_setpoints(...)` + * `render_sfxi_setpoint_scatter(...)` --- @@ -437,6 +443,26 @@ See `src/reader/contracts/builtins/` for the canonical contracts referenced by t --- +### Setpoint scatter plot + +The protocol figure `sfxi_setpoint_scatter` consumes the typed `sfxi.vec8.v2` +record at `sfxi_vec8/vec8`, calls the public dnadesign scorer, and writes plot +files through reader's plot sink under `outputs/plots` by default. + +Persisted score columns and plot axes keep the OPAL objective channel names: + +* `logic_fidelity` +* `effect_scaled` +* `sfxi` + +Reader does not persist compatibility aliases such as `f_logic`, `e_scaled`, or +`score` for this plot surface. If `dnadesign.opal.api.sfxi` is unavailable or +has an unsupported `SFXI_API_VERSION`, `reader validate` and +`reader plot --dry-run` report the missing optional dependency before plot +execution. Install or sync `reader[dnadesign]` for this figure. 
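According to this section, a missing or unsupported `dnadesign.opal.api.sfxi` is reported during preflight, before plot execution. A minimal sketch of such a gate needs only the standard library. The module path and the `SFXI_API_VERSION` attribute come from this page. The supported-version set and the `sfxi_api_problem` name are hypothetical, and reader's real check may be structured differently.

```python
import importlib
from typing import Optional

# Hypothetical: the API versions this reader build knows how to call.
SUPPORTED_SFXI_API_VERSIONS = frozenset({1})


def sfxi_api_problem(module_name: str = "dnadesign.opal.api.sfxi") -> Optional[str]:
    """Return a human-readable problem, or None when the scorer looks usable.

    Intended to run during validate/dry-run so the figure fails before plotting.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return f"missing optional dependency {module_name}; install or sync reader[dnadesign]"
    version = getattr(module, "SFXI_API_VERSION", None)
    if version not in SUPPORTED_SFXI_API_VERSIONS:
        return f"unsupported SFXI_API_VERSION {version!r} in {module_name}"
    return None
```

The gate imports only the public boundary module. This keeps the check consistent with the rule that reader plot code never reaches into OPAL internals.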
+ +--- + ### Configuration entry point In `reader/v7`, SFXI is normally configured through the bound protocol plus @@ -462,7 +488,22 @@ protocol: reference: design_id: REF stat: mean + analysis: + sfxi_objective: + setpoints: + and: [0.0, 0.0, 0.0, 1.0] + or: [0.0, 1.0, 1.0, 1.0] + scaling: + percentile: 95 + min_n: 5 + eps: 1.0e-8 + exponents: + logic_exponent_beta: 1.0 + intensity_exponent_gamma: 1.0 + intensity_log2_offset_delta: 0.0 outputs: + plots: + include: [sfxi_setpoint_scatter] exports: include: [logic_summary_workbook] @@ -530,7 +571,16 @@ The following example uses the SFXI-capable experiment * `outputs/exports/sfxi/vec8.xlsx` -3) Launch the SFXI notebook template (interactive vec8 inspection + export panel): +3) Render the SFXI setpoint scatter figure when configured: + + ```bash + uv run reader plot experiments/2025/20250915_sfxi_pSingle_ref/config.yaml --only sfxi_setpoint_scatter + ``` + + This writes plot files under `outputs/plots/`, such as + `outputs/plots/sfxi_setpoint_scatter.pdf`. + +4) Launch the SFXI notebook template (interactive vec8 inspection + export panel): ```bash uv run reader notebook experiments/2025/20250915_sfxi_pSingle_ref/config.yaml --template notebook/sfxi_eda --mode edit @@ -542,4 +592,4 @@ The following example uses the SFXI-capable experiment `transform/sfxi` step or existing SFXI dataframe records. * You can repeat the same workflow with any of the other SFXI-capable experiments in `experiments/2025/`. -4) (Optional) export vec8 from the notebook UI: +5) (Optional) export vec8 from the notebook UI: diff --git a/docs/repo-change-gate.md b/docs/repo-change-gate.md index 6d357e0..3dc6f28 100644 --- a/docs/repo-change-gate.md +++ b/docs/repo-change-gate.md @@ -4,9 +4,9 @@ Use this as the minimum maintainer gate before landing tracked changes in `reade ## Scope -This gate is for ordinary repo-local changes to code, docs, tests, or CLI behavior. 
It is the shortest path for checking that a change is reviewable and does not violate the workbench contract. +This gate is for ordinary repo-local changes to code, docs, tests, or CLI behavior. It is the shortest path for checking that a change is reviewable and does not break the expected workbench behavior. -For deeper repo surfaces, publish flow, or CI topology, continue to [repo-maintenance.md](./repo-maintenance.md). +For broader repo behavior, publish flow, or CI topology, continue to [repo-maintenance.md](./repo-maintenance.md). ## Minimum Gate @@ -14,10 +14,10 @@ Before finalizing a change: 1. Review the diff. 2. Confirm you did not hand-edit `experiments/**/outputs/`. -3. Run the smallest verification bundle that matches the change: +3. Run the smallest verification set that matches the change: - docs-only: run `uv run python tools/check_docs.py` and `git diff --check` - - CLI/code: targeted tests plus `uv run ruff check .` - - runtime/contract changes: targeted tests, lint, and a representative CLI preflight path + - CLI/code: targeted tests plus `uv run ruff check .`, `uv run ruff format . --check`, and `git diff --check` + - runtime or contract changes: targeted tests, `uv run ruff check .`, `uv run ruff format . --check`, a representative CLI preflight command, and `git diff --check` 4. State any skipped verification explicitly. ## Non-Negotiable Invariants @@ -37,12 +37,12 @@ uv run ruff format . --check uv run pytest -q uv run pytest -q -m smoke uv run pytest -q -m repo_matrix -uv run pytest -q -m fleet +uv run pytest -q -m active_experiments uv run pytest -q -m integration git diff --check ``` -`uv run pytest -q` is the fast default lane: it excludes only the full data-backed `fleet` matrix while still running ordinary integration coverage and the repo-wide config sweep. 
Use `uv run pytest -q -m repo_matrix` when the change mainly touches repo config invariants, `uv run pytest -q -m fleet` for the full active-experiment end-to-end matrix, and `uv run pytest -q -m integration` when you intentionally want the full integration surface. Use the smallest subset that matches the risk of the change and explain any omission. +`uv run pytest -q` is the fast default test run: it excludes only the full data-backed active-experiment run while still covering ordinary integration checks and the repo-wide config sweep. Use `uv run pytest -q -m repo_matrix` when the change mainly touches repo config invariants, `uv run pytest -q -m active_experiments` for the full active-experiment end-to-end run, and `uv run pytest -q -m integration` when you intentionally want the full integration set. Use the smallest subset that matches the risk of the change and explain any omission. ## Related Docs diff --git a/docs/repo-maintenance.md b/docs/repo-maintenance.md index 7d37448..e80e7a1 100644 --- a/docs/repo-maintenance.md +++ b/docs/repo-maintenance.md @@ -1,17 +1,17 @@ # Repo Maintenance -This document is the maintainer guide for repo-wide changes, publish flow, and ongoing workbench hygiene. +This document is the maintainer guide for repo-wide changes, publish flow, and ongoing repo hygiene. ## Use This When - the change crosses multiple package boundaries - branch or publish state matters - CI or verification policy needs to change -- docs, CLI, and runtime contracts need to be kept in sync across the repo +- docs, CLI, and runtime behavior need to stay in sync across the repo For the smaller tracked-change workflow, start with [repo-change-gate.md](./repo-change-gate.md). -## Maintenance Surfaces +## Reference Points - Repo entry point: [README.md](../README.md) @@ -32,13 +32,13 @@ For the smaller tracked-change workflow, start with [repo-change-gate.md](./repo - Keep the workbench discoverable from the CLI before requiring people to read source. 
- Prefer explicit registries and typed contracts over implicit discovery. -- Keep docs aligned with the actual runtime and CLI surface. +- Keep docs aligned with the actual runtime and CLI behavior. - Keep protocol semantics tighter than plugin mechanics. -- Favor small, reviewable changes over broad rewrites unless the rooted cut is clear. +- Favor small, reviewable changes over broad rewrites unless the broader cut is clearly justified. ## Verification Strategy -Choose the cheapest verification bundle that still exercises the risk: +Choose the smallest verification set that still exercises the risk: - docs and routing - CLI discovery and preflight @@ -46,7 +46,7 @@ Choose the cheapest verification bundle that still exercises the risk: - plugin or contract boundary changes - repo-wide smoke and lint checks -The quality bar for those bundles is defined in [QUALITY.md](../QUALITY.md). +The quality bar for those checks is defined in [QUALITY.md](../QUALITY.md). For docs and routing changes, start with: @@ -59,14 +59,17 @@ git diff --check `reader` uses two GitHub Actions workflows: -- `CI` in [.github/workflows/ci.yaml](../.github/workflows/ci.yaml): pull-request and push feedback. It runs docs integrity, lockfile drift checks, lint, format, compile, build, and the default test lane with coverage. The default lane is `uv run pytest -q`, which excludes only `fleet`. -- `Integration` in [.github/workflows/integration.yaml](../.github/workflows/integration.yaml): slower main-branch, nightly, and manual validation. It runs `pytest -m integration` with `--durations=25` and uploads the experiment readiness inventory as an artifact. +- `CI` in [.github/workflows/ci.yaml](../.github/workflows/ci.yaml): pull-request and push feedback. It runs docs integrity, lockfile drift checks, lint, format, compile, build, and the default test run with coverage. The default run is `uv run pytest -q`, which excludes only the active-experiment run. 
+- `Integration` in [.github/workflows/integration.yaml](../.github/workflows/integration.yaml): slower main-branch, nightly, and manual validation. It runs `pytest -m integration` with `--durations=25` and uploads the experiment readiness summary as an artifact. -Local command contract: +Local commands: +- `uv run ruff check .`: repo-wide lint +- `uv run ruff format . --check`: formatting check - `uv run python tools/check_docs.py`: docs links and routing integrity -- `uv run pytest -q`: fast default lane, excludes only `fleet` +- `uv run pytest -q`: fast default test run, excludes only the active-experiment run - `uv run pytest -q -m repo_matrix`: repo-wide config and metadata sweeps - `uv run pytest -q -m smoke`: representative real-experiment smoke tests -- `uv run pytest -q -m fleet`: full active-experiment end-to-end matrix -- `uv run pytest -q -m integration`: full integration surface, including `repo_matrix` and `fleet` +- `uv run pytest -q -m active_experiments`: full active-experiment end-to-end run +- `uv run pytest -q -m integration`: full integration set, including `repo_matrix` and `active_experiments` +- `git diff --check`: whitespace and merge-marker hygiene diff --git a/pyproject.toml b/pyproject.toml index 89c80c1..6e87a0f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -101,11 +101,11 @@ addopts = [ "--strict-markers", "-ra", "-m", - "not integration and not fleet", + "not integration and not active_experiments", ] markers = [ - "fleet: full data-backed active-experiment end-to-end matrix excluded from the default lane", + "active_experiments: full data-backed active-experiment end-to-end matrix excluded from the default lane", "integration: cross-surface integration coverage", "repo_matrix: repo-wide config and metadata sweeps kept in the default lane", "smoke: representative real-experiment runtime coverage kept in the default test lane", diff --git a/skills/README.md b/skills/README.md new file mode 100644 index 0000000..76dddcd --- 
/dev/null +++ b/skills/README.md @@ -0,0 +1,23 @@ +# Repo-local skills + +Repo-local skills live here when a recurring agent workflow is specific to +`reader` and worth keeping as a reusable pattern. + +Guidelines: + +- Keep `AGENTS.md` short and route to a skill or doc when the task is recurring. +- Treat the matching guide in `docs/` as the primary workflow and keep detailed workflow guidance there; each skill stays thin and points to its guide. +- Prefer one narrowly owned skill over a broad catch-all. +- Pair repo-local skills with existing global skills when the task also needs + external tooling such as Google Workspace access or spreadsheet editing. + +Current repo-local skills: + +- [`reader-data-operations-plan`](./reader-data-operations-plan/SKILL.md): + DOP data-class classification, DOP registry/docs alignment, and DOP + maintenance routing. + Primary workflow: [docs/guides/data_operations_plan.md](../docs/guides/data_operations_plan.md) +- [`reader-experiment-bootstrap`](./reader-experiment-bootstrap/SKILL.md): new experiment intake, metadata mapping, Drive-backed input staging, and local experiment audits. + Primary workflow: [docs/guides/experiment_bootstrap.md](../docs/guides/experiment_bootstrap.md) +- [`reader-workbench-gardening`](./reader-workbench-gardening/SKILL.md): maintain `reader`'s information architecture, semantic boundaries, and verification surfaces without locking the repo into one assay family. + Primary workflow: [docs/guides/workbench_gardening.md](../docs/guides/workbench_gardening.md) diff --git a/skills/reader-data-operations-plan/SKILL.md b/skills/reader-data-operations-plan/SKILL.md new file mode 100644 index 0000000..917c037 --- /dev/null +++ b/skills/reader-data-operations-plan/SKILL.md @@ -0,0 +1,167 @@ +--- +name: reader-data-operations-plan +description: Classifies reader datasets and maintains DOP registry/docs alignment.
Use when selecting data classes, auditing ready-spec gates, or updating rules. Do not use for full experiment creation, result interpretation, or edits to generated outputs. +metadata: + version: 0.2.0 + category: scientific-workbench + tags: [reader, data-operations-plan, metadata, intake, registry] +--- + +# Reader Data Operations Plan + +## Purpose + +Keep `reader` Data Operations Plan classification and maintenance explicit by +routing agents to the DOP registry, owned docs, and source-backed checks. + +## Scope + +In scope: + +- classifying a dataset before experiment bootstrap +- checking DOP data classes, metadata minimums, stop conditions, transfer rules, + and ready-spec gates +- maintaining alignment between DOP docs, the read-only DOP registry, and + repo-local skills +- source-backed updates to DOP guidance from the Merelogic Data Operations Plan + resource + +Out of scope: + +- organization-wide ELN/LIMS, archive, retention, or role-assignment policy +- widening `reader/v7` without a separate code-change contract +- generic scientific result interpretation +- hand-editing generated `experiments/**/outputs/` + +## Skill Composition + +- Pair with `reader-experiment-bootstrap` when the user is creating or staging + an experiment. +- Pair with `reader-workbench-gardening` when the user is reorganizing DOP + surfaces or reducing monolith pressure. +- Pair with `code-change-discipline` for registry, CLI, or test changes. +- Pair with `harness-engineering` when docs, skills, or CLI evidence routes need + stronger deterministic checks.
+ +## Required Inputs + +- target dataset, protocol, DOP registry entry, or docs/skill surface +- whether the task is classification, intake support, or DOP maintenance +- raw input provenance or representative protocol id when classifying data +- explicit constraints around schema changes, generated outputs, and external + systems + +Clarification policy: + +- ask only when missing context would change data-class selection, metadata + interpretation, or whether a change belongs in code versus docs +- otherwise proceed with explicit assumptions and record them + +## Success Criteria + +- exactly one DOP mode is selected before work starts +- automation-facing DOP facts come from `uv run reader dop ...` or + `src/reader/workbench/dop/` +- docs and skill routes point to owned surfaces instead of duplicating registry + facts +- stop conditions block ambiguous metadata instead of encouraging inference +- external DOP claims map to dated rows in + [External sources](./references/external-sources.md) + +## Workflow + +1. Choose one mode: `classification`, `intake-support`, or `maintenance`. +2. Start with + [docs/guides/data_operations_plan.md](../../docs/guides/data_operations_plan.md) + and load only the smallest referenced page needed for the decision. +3. Use [Operating model](../../docs/guides/data_operations_plan/operating_model.md) + when the task changes DOP ownership, repo routing, or maintenance policy. +4. Use `uv run reader dop classes --format json` for data-class and + protocol-candidate facts instead of parsing prose tables. +5. Use `uv run reader dop ready-specs --format json` for ready-spec gates and + evidence expectations. +6. For experiment creation, route into + [Experiment bootstrap](../../docs/guides/experiment_bootstrap.md) after the + data class and metadata stop conditions are known. +7. For DOP maintenance, use [Workflow reference](./references/workflow.md) and + keep each fact in its owned surface. +8. 
Use [Endpoint contracts](./references/endpoint-contracts.md) when changing + docs, skills, registry, or CLI evidence routes. +9. Use [Test matrix](./references/test-matrix.md) for trigger, functional, and + deterministic checks. +10. Use [External sources](./references/external-sources.md) when external DOP + claims shape the update. + +## Guardrails + +- Do not infer well identity, treatment meaning, channel semantics, control + interpretation, or source provenance to make an experiment appear ready. +- Do not treat DOP data classes as new `reader/v7` schema fields. +- Do not duplicate the DOP registry into prose when automation needs the fact. +- Do not copy generated outputs as source material for a new experiment. +- Do not turn this skill into broad workbench gardening; route architecture + maintenance to `reader-workbench-gardening`. + +## Required Deliverables + +- chosen DOP mode: classification, intake support, or maintenance +- data class or registry surface reviewed +- metadata minimums and stop conditions checked +- source evidence used for any external DOP claim +- changed files or explicit audit-only result +- verification evidence and skipped checks +- assumptions, open questions, and residual risks + +## Output Contract + +Return: + +1. Decision summary + - mode, target surface, selected data class or maintenance surface, and + assumptions +2. DOP contract check + - requirements, design, configuration, and instructions concerns touched +3. Evidence bundle + - registry commands, docs or skill routes, source rows, and stop conditions +4. Change summary + - files changed or audit-only finding, with ownership boundary notes +5. Verification bundle + - commands run, pass/fail status, skipped checks, and residual risks + +## Trigger Tests + +Should trigger: + +- "Classify this dataset against the reader DOP." +- "Check whether the DOP registry and docs are aligned." +- "Update the reader DOP skill and source evidence." 
+- "Add a DOP data-class route for a new protocol." +- "Audit DOP metadata minimums before experiment bootstrap." + +Should not trigger: + +- "Interpret these generated plots." +- "Create a new experiment from this workbook." +- "Refactor the protocol compiler." +- "Clean up my Downloads folder." +- "Hand-edit files under outputs/." + +## Troubleshooting + +- Data class fits multiple routes: + - choose the strictest class whose protocol assumptions match the assay, then + record why broader classes were rejected +- Metadata is missing: + - keep intake blocked and ask for the missing semantic fact +- DOP docs and registry disagree: + - update the owned source first, then repair summaries and route checks +- The request becomes architecture maintenance: + - route to `reader-workbench-gardening` and preserve this skill as DOP policy + routing + +## Additional Resources + +- [Workflow reference](./references/workflow.md) +- [Endpoint contracts](./references/endpoint-contracts.md) +- [Test matrix](./references/test-matrix.md) +- [External sources](./references/external-sources.md) diff --git a/skills/reader-data-operations-plan/references/endpoint-contracts.md b/skills/reader-data-operations-plan/references/endpoint-contracts.md new file mode 100644 index 0000000..cf94ab2 --- /dev/null +++ b/skills/reader-data-operations-plan/references/endpoint-contracts.md @@ -0,0 +1,61 @@ +# Endpoint Contracts + +Use these endpoint contracts when hardening or validating +`reader-data-operations-plan`. + +## `knowledge-integrity` + +Goal: +- Keep DOP policy discoverable, source-backed, and cross-linked to owned repo + surfaces. 
+ +Required evidence: + +- [Data Operations Plan](../../../docs/guides/data_operations_plan.md) remains + the primary DOP overview +- [Operating model](../../../docs/guides/data_operations_plan/operating_model.md) + owns DOP component boundaries +- [External sources](./external-sources.md) records dated source rows for + external DOP claims +- `uv run python tools/check_docs.py` passes after docs or route edits + +Failure handling: +- Repair missing routes before changing prose. If the source claim is stale or + unavailable, mark the claim as unverified instead of strengthening it. + +## `autonomy-capability` + +Goal: +- Let agents classify DOP data classes and check ready gates without scraping + prose or guessing from prior memory. + +Required evidence: + +- `uv run reader dop classes --format json` works for data-class facts +- `uv run reader dop ready-specs --format json` works for evidence gates +- [Test matrix](./test-matrix.md) names trigger, functional, and deterministic + checks +- `uv run python tools/audit_repo_skills.py` passes after skill edits + +Failure handling: +- If JSON registry output is unavailable, stop the automation-facing claim and + report the registry check failure. Do not replace registry facts with prose + inference. + +## `architecture-invariants` + +Goal: +- Keep DOP policy decoupled from experiment execution and public config schema. + +Required evidence: + +- The skill routes full experiment creation to `reader-experiment-bootstrap` +- The skill routes broad architecture maintenance to + `reader-workbench-gardening` +- The DOP registry remains read-only guidance unless a separate code-change + contract changes execution behavior +- Generated `experiments/**/outputs/` stay out of scope + +Failure handling: +- Stop when a DOP update would widen `reader/v7`, hand-edit generated outputs, + or encode guessed metadata semantics. Route to the owning workflow instead. 
diff --git a/skills/reader-data-operations-plan/references/external-sources.md b/skills/reader-data-operations-plan/references/external-sources.md new file mode 100644 index 0000000..ea303e6 --- /dev/null +++ b/skills/reader-data-operations-plan/references/external-sources.md @@ -0,0 +1,21 @@ +# External Sources + +This skill is grounded first in repository sources: + +- [Data Operations Plan](../../../docs/guides/data_operations_plan.md) +- [Operating model](../../../docs/guides/data_operations_plan/operating_model.md) +- [DOP registry](../../../src/reader/workbench/dop/) +- [Experiment bootstrap](../../../docs/guides/experiment_bootstrap.md) +- [Repo change gate](../../../docs/repo-change-gate.md) + +External source rows: + +| URL | Retrieved | Mapped update | +| --- | --- | --- | +| https://merelogic.net/data_operations_plans/how | 2026-05-01 | Source for the DOP framing: group long-tail assays into simple data classes, separate requirements/design/configuration/instructions, keep instructions easy to follow, and maintain the plan from real use. | +| https://merelogic.net/static/js/main.18e51494.js.map | 2026-05-01 | Used to verify the JS-rendered page text for the DOP component and maintenance sections because the public route returns a JavaScript app shell. | + +Use external sources to shape repo-local guidance, not to import an +organization-wide DOP wholesale. Claims in the skill should stay paraphrased and +mapped to `reader` behavior, with the repo docs and registry remaining the +operating source of truth. diff --git a/skills/reader-data-operations-plan/references/test-matrix.md b/skills/reader-data-operations-plan/references/test-matrix.md new file mode 100644 index 0000000..879b08a --- /dev/null +++ b/skills/reader-data-operations-plan/references/test-matrix.md @@ -0,0 +1,60 @@ +# Test Matrix + +Use this matrix when validating `reader-data-operations-plan` after edits. 
+ +## Trigger Checks + +Should trigger: + +- "Classify this plate-reader workbook against the reader DOP." +- "Audit the DOP registry for protocol coverage." +- "Refresh the DOP skill against the Merelogic resource." +- "Check whether DOP metadata minimums are enough before bootstrap." +- "Add DOP guidance for a long-tail assay without changing reader/v7." + +Should not trigger: + +- "Interpret the assay results." +- "Implement a new transform plugin." +- "Create the experiment workspace now." +- "Refactor the notebook launcher." +- "Delete duplicate files in Downloads." + +## Functional Checks + +- top-level `SKILL.md` routes to the DOP overview before deeper references +- top-level `SKILL.md` exposes [Endpoint contracts](./endpoint-contracts.md) +- the skill names exactly one mode: `classification`, `intake-support`, or + `maintenance` +- DOP registry commands are preferred for data-class and ready-spec facts +- generated outputs remain out of scope +- organization-wide policy is recorded as external unless `reader` has concrete + behavior to validate + +## Deterministic Checks + +Run: + +```bash +uv run python tools/audit_repo_skills.py +uv run python tools/check_docs.py +uv run pytest -q src/reader/tests/repo/test_docs_routes.py +uv run reader dop classes --format json +uv run reader dop ready-specs --format json +git diff --check +``` + +When registry or CLI behavior changes, add: + +```bash +uv run pytest -q src/reader/tests/workbench/test_dop_registry.py +``` + +## Content-Correctness Checks + +- `references/external-sources.md` has URL, retrieved date, and mapped update + rows for external DOP claims +- Merelogic-derived guidance is paraphrased and mapped to `reader` behavior + instead of copied wholesale +- repo-local claims point to docs or code surfaces rather than stale memory + summaries diff --git a/skills/reader-data-operations-plan/references/workflow.md b/skills/reader-data-operations-plan/references/workflow.md new file mode 100644 index 
0000000..fea97fb --- /dev/null +++ b/skills/reader-data-operations-plan/references/workflow.md @@ -0,0 +1,115 @@ +# Workflow Reference + +Use this reference for DOP classification or DOP maintenance after the +top-level skill has routed the task. + +## Mode Selection + +- `classification` + - select a DOP data class, protocol candidate set, and stop conditions for a + dataset before experiment bootstrap +- `intake-support` + - check metadata minimums, transfer rules, and ready-spec gates for an + experiment that is being created or repaired +- `maintenance` + - update DOP docs, registry facts, or skill routes while keeping ownership + boundaries explicit + +Choose exactly one mode. If a task starts as classification and then becomes +workspace creation, finish the DOP decision and hand off to +`reader-experiment-bootstrap` instead of continuing inside this skill. + +## Read Order + +1. [Data Operations Plan](../../../docs/guides/data_operations_plan.md) +2. [Operating model](../../../docs/guides/data_operations_plan/operating_model.md) + when the task changes ownership or maintenance policy +3. [Data classes](../../../docs/guides/data_operations_plan/data_classes.md) + for class and protocol selection +4. [Metadata minimums](../../../docs/guides/data_operations_plan/metadata_minimums.md) + for required capture and stop conditions +5. [Transfer and verification](../../../docs/guides/data_operations_plan/transfer_and_verification.md) + for staging and evidence checks +6. [Experiment bootstrap](../../../docs/guides/experiment_bootstrap.md) only + after the DOP decision is made + +## Command Loop + +Use CLI output for stable facts. 
Start with the smallest command that answers +the current mode: + +```bash +uv run reader dop classes --format json +uv run reader dop ready-specs --format json +``` + +For intake-support work, add only the relevant preflight command: + +```bash +uv run reader protocols --example-config +uv run reader validate --no-files --format json +uv run reader validate --format json +uv run reader run --dry-run --format json +``` + +Use only the commands that match the current task. Do not execute a run just to +prove DOP classification. + +## Ownership Map + +| Fact type | Owned by | Maintenance rule | +| --- | --- | --- | +| DOP data-class ids, protocol candidates, stop conditions, transfer rules, ready-spec gates | `src/reader/workbench/dop/` | Update registry and targeted tests first when automation consumes the fact. | +| DOP overview and operator explanation | `docs/guides/data_operations_plan.md` and subpages | Keep pages short and link to the owned source instead of duplicating tables. | +| Experiment creation procedure | `docs/guides/experiment_bootstrap.md` and `reader-experiment-bootstrap` | Start after DOP classification and metadata stop conditions are known. | +| Agent routing | `skills/reader-data-operations-plan/` | Route to docs and CLI; do not become a long policy document. | +| Executable assay semantics | `src/reader/protocols/` | Add or change protocols only when intake policy is not enough. | +| Generated evidence | `outputs/manifests/records.json` | Verify outputs through records; do not use generated files as source inputs. 
| + +## Merelogic Principle Map + +- Data classes: + - use a prescribed order and simple criteria so classification is faster than + inventing local rules +- Requirements: + - capture usage expectations, data requirements, consumers, sources, existing + infrastructure, and user constraints before designing the route +- Design: + - keep the storage, metadata, compute, version-control, transfer, and + convention decisions distinct from changing assay details +- Configuration: + - maintain data categories, canonical names, role references, and technical + references as living facts +- Instructions: + - make the first-use path complete and the later-reference path short +- Maintenance: + - expect updates from real use; do not treat the first DOP pass as final + +In `reader`, only the repo-local subset belongs in code or docs. Organization +ownership, training rituals, retention, enterprise catalog policy, and ELN/LIMS +governance remain external unless a later feature gives them concrete behavior. + +## Maintenance Checklist + +For a DOP maintenance change: + +1. Name the changed concern: requirements, design, configuration, or + instructions. +2. Check whether the fact is automation-facing. If yes, update the DOP registry + and targeted tests. +3. Update the smallest doc page that explains the behavior. +4. Update skill routes only when the recurring agent workflow changes. +5. Add or refresh source rows when external DOP claims changed. +6. Check [Endpoint contracts](./endpoint-contracts.md) for required evidence. +7. Run the deterministic checks listed in the DOP operating model. 
+ +## Stop Conditions + +Stop and either ask for the missing fact or report the task as blocked when: + +- using the closest protocol would change assay semantics; +- metadata ambiguity affects identity, treatment, channels, controls, or source + provenance; +- a requested change would make generated outputs source material; +- the change requires lab-wide policy that `reader` cannot enforce; or +- the change would widen `reader/v7` without a separate schema decision. diff --git a/skills/reader-experiment-bootstrap/SKILL.md b/skills/reader-experiment-bootstrap/SKILL.md new file mode 100644 index 0000000..d73a612 --- /dev/null +++ b/skills/reader-experiment-bootstrap/SKILL.md @@ -0,0 +1,142 @@ +--- +name: reader-experiment-bootstrap +description: Bootstraps `reader` experiment workspaces by selecting a matching protocol/template, materializing raw assay inputs, building sample metadata, and running the preflight/run/verify loop. Use when creating a new experiment, cloning prior assay semantics into a new run, or auditing local experiments. Do not use for generic result interpretation, ad hoc config edits with no bootstrap objective, or hand-editing generated outputs. +metadata: + version: 0.3.0 + category: scientific-workbench + tags: [reader, experiments, metadata, google-drive, audit] +--- + +# Reader Experiment Bootstrap + +## Purpose + +Create or audit `reader` experiment workspaces without re-deriving the same +data-class, protocol, metadata, and verification decisions every time.
+ +## Scope + +In scope: + +- selecting the nearest matching `reader` experiment or protocol template +- classifying the dataset against the repo Data Operations Plan +- materializing raw inputs into a new experiment workspace +- building or rewriting metadata maps for the new run +- running `reader` preflight, execution, plot, and export steps +- auditing the local experiment list under `experiments/` + +Out of scope: + +- hand-editing generated `outputs/` +- generic scientific interpretation disconnected from experiment setup +- destructive data cleanup + +## Skill Composition + +- Pair with `reader-data-operations-plan` when the task is primarily about + DOP classification, metadata stop conditions, or DOP registry/docs + maintenance before experiment creation. +- Pair with `gws-cli` when the raw input lives on Google Drive. +- Pair with `xlsx` when workbook inspection or metadata workbook rewriting is + required. +- Pair with `pragmatic-programming-principles` when changing experiment + config or audit structure. + +## Inputs + +- target experiment date/slug or enough context to create one +- data class, assay family, or a nearby prior experiment +- raw input location +- known metadata semantics: layout, treatments, controls, aliases + +Clarification policy: + +- ask for missing metadata only when it changes well identity, treatment + meaning, or control semantics +- otherwise proceed with explicit assumptions and record them + +## Workflow + +1. Start with the [Data Operations Plan](../../docs/guides/data_operations_plan.md) + overview before copying a template or authoring config. +2. Load [Data classes](../../docs/guides/data_operations_plan/data_classes.md) + only for the class/protocol decision, then load + [Metadata minimums](../../docs/guides/data_operations_plan/metadata_minimums.md) + or + [Transfer and verification](../../docs/guides/data_operations_plan/transfer_and_verification.md) + only when that part of the intake is active. +3.
Treat [docs/guides/experiment_bootstrap.md](../../docs/guides/experiment_bootstrap.md) + as the primary workflow. This skill stays thin and routes to that guide. +4. Use [Workflow reference](./references/workflow.md) only for the concrete + command list while following the guide. +5. Use `uv run reader dop classes --format json` when an agent needs the stable + data-class and protocol-candidate registry instead of parsing prose tables. +6. Prefer JSON command output whenever another tool or agent will consume the + result. +7. For repo-wide local experiment checks, use + `uv run python tools/audit_local_experiments.py [--years [ ...]]` + and add `--include-non-active` only when draft/template configs are + intentionally in scope. +8. Prefer `reader notebook --mode none` when the task needs a generated review + scaffold without launching Marimo during intake or audit. + +## Guardrails + +- Do not invent plate semantics to get past metadata ambiguity. +- Do not treat repo fixture tests as proof that the local experiment list is + healthy. +- Do not copy generated outputs between experiments. +- Do not hide channel/schema drift; encode it explicitly in config. + +## Required Deliverables + +- chosen data class, template/protocol, and why +- raw input provenance and staged path +- metadata contract summary and unresolved assumptions +- preflight evidence +- execution + verification evidence +- local experiment audit summary when the task is repo-wide + +## Output Contract + +Return: + +1. Decision summary + - target experiment, chosen data class, chosen protocol/template, and copied prior context +2. Metadata contract + - key columns, control semantics, unresolved assumptions, and explicit user confirmations still needed +3. Preflight bundle + - commands run, JSON/table evidence, and whether the experiment is blocked, runnable, or draft +4. Execution + verification bundle + - run/plot/export/notebook steps executed plus records/artifact checks +5. 
Local audit bundle + - local experiment audit results when the task is repo-wide + +## Trigger Tests + +Should trigger: + +- "Bootstrap a new reader experiment from this workbook." +- "Find the nearest matching experiment and stage a new run." +- "Audit local experiments under experiments/." +- "Materialize raw inputs and wire metadata for a new reader run." + +Should not trigger: + +- "Interpret these results." +- "Hand-edit the generated outputs." +- "Tweak one plot label in an existing notebook." + +## Troubleshooting + +- Missing workbook or Drive provenance: + - keep the raw-input step explicit and incomplete instead of guessing filenames or source state +- Metadata ambiguity: + - stop when well identity, treatment meaning, or control semantics are unclear +- Preflight passes but outputs drift: + - re-run the smallest failing surface and verify `outputs/manifests/records.json` instead of trusting filesystem shape alone + +## Additional resources + +- [Workflow reference](./references/workflow.md) +- [External sources](./references/external-sources.md) diff --git a/skills/reader-experiment-bootstrap/references/external-sources.md b/skills/reader-experiment-bootstrap/references/external-sources.md new file mode 100644 index 0000000..0355f31 --- /dev/null +++ b/skills/reader-experiment-bootstrap/references/external-sources.md @@ -0,0 +1,8 @@ +# External sources + +| URL | Retrieved | Mapped update | +| --- | --- | --- | +| https://openai.com/business/guides-and-resources/how-openai-uses-codex/ | 2026-04-19 | Keep `AGENTS.md` short, use it as persistent context, and structure recurring tasks like issue-style workflows. | +| https://openai.com/index/harness-engineering/ | 2026-04-19 | Treat `AGENTS.md` as a table of contents, keep durable knowledge in repo-local docs, and use progressive disclosure instead of monolithic instructions. 
| +| https://openai.com/business/guides-and-resources/a-practical-guide-to-building-ai-agents/ | 2026-04-19 | Prefer one controlling workflow with specialized tools/skills only where they materially reduce ambiguity or repeated failures. | +| https://merelogic.net/data_operations_plans/how | 2026-05-01 | Classify lab datasets, keep requirements/configuration/instructions distinct, and route long-tail assays through explicit metadata and transfer rules before execution. | diff --git a/skills/reader-experiment-bootstrap/references/workflow.md b/skills/reader-experiment-bootstrap/references/workflow.md new file mode 100644 index 0000000..86ca7d6 --- /dev/null +++ b/skills/reader-experiment-bootstrap/references/workflow.md @@ -0,0 +1,84 @@ +# Workflow reference + +Use this reference when the top-level skill needs a reminder of the concrete +`reader` commands and files involved in experiment bootstrapping. + +## Discovery + +Start with the Data Operations Plan overview and load only the reference needed +for the current decision: + +- [Data classes](../../../docs/guides/data_operations_plan/data_classes.md) + for the class/protocol decision. +- [Metadata minimums](../../../docs/guides/data_operations_plan/metadata_minimums.md) + when building or reviewing sample maps and config metadata. +- [Transfer and verification](../../../docs/guides/data_operations_plan/transfer_and_verification.md) + when staging inputs and proving outputs. + +```bash +uv run reader dop classes --format json +uv run reader dop ready-specs --format json +uv run reader ls --root experiments --details --readiness +uv run reader ls --root experiments --details --readiness --format json +uv run reader inspect +uv run reader inspect --format json +uv run reader steps +uv run reader explain +uv run reader protocols +``` + +Prefer JSON whenever another tool or agent will consume the output. 
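When a script or agent consumes these commands, a small wrapper can fail fast on a non-zero exit or non-JSON stdout instead of scraping table output. This is a minimal sketch. `run_json` is a hypothetical helper, not part of `reader`, and it assumes only that `--format json` prints a single JSON document on stdout.

```python
import json
import subprocess
from typing import Any


def run_json(cmd: list[str]) -> Any:
    """Run a command that prints one JSON document on stdout and return it parsed.

    Hypothetical helper for illustration; it is not part of the reader CLI.
    Fails fast instead of guessing when the command errors or prints non-JSON.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    label = " ".join(cmd)
    if proc.returncode != 0:
        raise RuntimeError(f"{label!r} exited with {proc.returncode}: {proc.stderr.strip()}")
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"{label!r} did not print valid JSON") from exc
```

For example, `run_json(["uv", "run", "reader", "dop", "classes", "--format", "json"])` returns whatever structure that command emits. The sketch deliberately makes no assumption about the payload schema.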
+ +## Workspace creation + +```bash +uv run reader init ./experiments/YYYY/YYYYMMDD_shortslug --protocol +``` + +Use a copied nearest-neighbor config only when protocol defaults would lose +meaningful assay-specific behavior. + +## Input intake + +- Keep source filenames intact in `inputs/`. +- When Drive-backed intake is requested, use local `gws-account` commands rather + than browser narration. +- Inspect workbook sheet names and channel labels before editing config. + +## Metadata + +- Preserve full plate coverage when the existing assay family expects it. +- Keep blanks explicit only when the assay semantics require blank subtraction + or blank QC. +- Ask before resolving conflicting well assignments. +- Use the Data Operations Plan stop conditions when well identity, treatment + meaning, controls, or channel semantics are ambiguous. + +## Preflight + +```bash +uv run reader validate --no-files +uv run reader validate +uv run reader run --dry-run --format json +uv run reader plot --list +uv run reader export --list +``` + +## Execution + +```bash +uv run reader run +uv run reader plot +uv run reader export +uv run reader records +``` + +## Local experiment audit + +```bash +uv run python tools/audit_local_experiments.py [--years [ ...]] [--include-non-active] +``` + +This audit stages experiments into temporary copies so the original experiment +directories are not mutated during verification. By default it skips non-active +lifecycles and reports them separately. diff --git a/skills/reader-workbench-gardening/SKILL.md b/skills/reader-workbench-gardening/SKILL.md new file mode 100644 index 0000000..f460bf3 --- /dev/null +++ b/skills/reader-workbench-gardening/SKILL.md @@ -0,0 +1,223 @@ +--- +name: reader-workbench-gardening +description: Audit and garden `reader`'s workbench architecture, semantic boundaries, docs, and verification surfaces so the repo stays assay-extensible, decoupled, fail-fast, and easy to change. 
Use when auditing monolith pressure, assay lock-in, stale semantics or docs, CLI or harness drift, or maintainer ergonomics before or during change cycles. Do not use for new-experiment intake, one-off feature work with no architecture objective, result interpretation, or hand-editing generated outputs. +metadata: + version: 0.3.0 + category: scientific-workbench + tags: [reader, architecture, workbench, maintenance, harness, semantics] +--- + +# Reader Workbench Gardening + +## Purpose + +Keep `reader` easy to extend and operate by auditing information ownership, +reducing monolith pressure, and tightening maintainer-facing workbench +surfaces. + +## Scope + +In scope: + +- workbench architecture and information-architecture audits +- assay lock-in, semantic drift, and stale-doc cleanup +- CLI, JSON, and preflight/run/verify surface hardening +- small, reversible maintainability improvements + +Out of scope: + +- new experiment intake or metadata staging +- hand-editing generated `outputs/` +- one-off feature delivery with no architecture or maintainability objective +- scientific interpretation disconnected from repo structure +- branch, publish, or CI topology work with no workbench-architecture objective + +## Skill composition + +- Pair with `deep-introspection` to map current boundaries, ownership, and + runtime flow before changing them. +- Pair with `reader-data-operations-plan` when the architecture or docs pass is + specifically about DOP policy, DOP registry coverage, or DOP intake + contracts. +- Pair with `pragmatic-programming-principles` when boundary, contract, or + fail-fast decisions need to be made explicit. +- Pair with `code-review` when the main deliverable is severity-ranked + findings rather than code changes. +- Pair with `harness-engineering` when CLI, JSON, or end-to-end verification + surfaces need stronger contracts or evidence. 
+- Pair with `deslop` only when cleaning maintainer docs or skill prose after + the technical content is settled. + +## Inputs + +- target surface or audit scope +- desired mode: audit-only, docs-sync, boundary-hardening, or + surface-contracts +- known architectural pressure points or regressions +- representative experiment or protocol when runtime verification is needed +- explicit constraints: reliability, extensibility, delivery urgency, or + publish boundaries + +Clarification policy: + +- ask only when missing context changes the target boundary or verification + surface +- otherwise proceed with explicit assumptions and record them + +## Success Criteria + +- the cycle stays bounded to one ownership surface or one reversible slice +- findings and decisions are traceable to repository docs, code, or CLI + evidence +- the chosen slice reduces coupling, drift, or retry cost without widening + `reader/v7` +- verification matches the changed surface and skipped checks are explicit +- output separates verified facts, inferences, and follow-up work +- knowledge-integrity, autonomy-capability, and architecture-invariants checks + remain satisfiable through deterministic repo-local commands + +## Harness endpoints + +Use [Endpoint contracts](./references/endpoint-contracts.md) to keep the skill +aligned with: + +- `knowledge-integrity` +- `autonomy-capability` +- `architecture-invariants` + +## Workflow + +1. Treat + [docs/guides/workbench_gardening.md](../../docs/guides/workbench_gardening.md) + as the primary workflow. This skill stays thin and routes to that guide. +2. Use [Workflow reference](./references/workflow.md) to choose the mode and + read order before making claims or edits. +3. If the task is really experiment intake, route to + `reader-experiment-bootstrap`. If the task is really DOP classification, + registry coverage, or DOP policy maintenance, route to + `reader-data-operations-plan`. 
If branch state, publish flow, or CI topology + becomes material, continue into + [docs/repo-maintenance.md](../../docs/repo-maintenance.md). +4. Use [Checklists](./references/checklists.md) to look for monolith pressure, + assay lock-in, stale semantics, doc drift, harness drift, and directory + boundary violations. +5. Use [Endpoint contracts](./references/endpoint-contracts.md) to confirm + which evidence surfaces must hold for this skill cycle. +6. Use [Verification](./references/verification.md) to choose the smallest + verification bundle that matches the risk of the change. +7. Use [Test matrix](./references/test-matrix.md) for trigger checks, + deterministic checks, and repeated-run consistency checks. +8. Use [External sources](./references/external-sources.md) when the cycle + introduces claims that rely on tooling, standards, or behavior outside this + repository. +9. If tracked docs or code changed, close through + [docs/repo-change-gate.md](../../docs/repo-change-gate.md). + +## Guardrails + +- Do not restate `ARCHITECTURE.md` or `DESIGN.md` when the task only needs to + point at their invariants. +- Do not treat a single assay family's needs as the workbench architecture. +- Do not hand-edit generated artifacts under `experiments/**/outputs/`. +- Do not widen public config or CLI surfaces when a narrower semantic move will + solve the problem. +- Do not bundle unrelated refactors into the same gardening cycle. +- Do not let a gardening pass turn into generic repo-maintenance or release + work unless the task explicitly expands. 
+ +## Required Deliverables + +- scoped audit target, mode, and invariants +- endpoint contract coverage +- evidence summary: canonical docs, code surfaces, or CLI probes consulted +- ownership map or change target summary +- pressure or drift findings by category +- chosen smallest reversible slice or explicit audit-only recommendation +- verification evidence and skipped checks +- assumptions and residual risks + +## Output Contract + +Return: + +1. Decision summary + - target surface, mode, paired skills or docs used, invariants in scope, and + selected endpoints +2. Evidence summary + - canonical docs, code paths, and CLI probes used for this cycle +3. Endpoint coverage + - how `knowledge-integrity`, `autonomy-capability`, and + `architecture-invariants` were satisfied or deferred +4. Ownership and pressure summary + - where meaning, mechanics, and docs currently live plus the drift or + monolith signal +5. Selected slice + - audit-only findings or the concrete change surface chosen, including why a + broader slice was rejected +6. Verification bundle + - commands run, CLI evidence, and skipped checks +7. Residual risks + - deferred work, lock-in risk, and next maintenance pass + +## Trigger Tests + +Should trigger: + +- "Audit `reader` for monolith pressure and assay lock-in." +- "Garden this workbench so it stays easy to extend." +- "Review the information architecture around protocols, plugins, and docs." +- "Tighten the CLI and verification surfaces after this maintainer cleanup." +- "Sync stale maintainer docs back to the current runtime and architecture." + +Should not trigger: + +- "Bootstrap a new experiment from these inputs." +- "Interpret the assay results." +- "Hand-edit the generated notebook outputs." +- "Add this feature with no architecture or maintainability objective." +- "Handle release, branch, or CI publish steps for this repo." 
+ +## Examples + +Example 1: audit-only + +- User says: "Audit the protocol and notebook surfaces for monolith pressure." +- Result: ownership map, pressure findings, and a smallest-next-slice + recommendation with representative CLI evidence. + +Example 2: docs-sync + +- User says: "Bring the maintainer docs back in sync with current `reader/v7` + behavior." +- Result: stale-route findings, canonical-doc fixes, and docs-only + verification evidence. + +Example 3: boundary-hardening + +- User says: "Split this growing helper so a new assay family does not have to + land in the same file." +- Result: one reversible boundary cut plus targeted verification and residual + risk notes. + +## Troubleshooting + +- Scope keeps expanding: + - narrow to one ownership boundary or one CLI/runtime surface +- Findings are correct but not actionable: + - pick the smallest reversible slice and defer the rest +- The task overlaps adjacent routes: + - hand experiment intake to `reader-experiment-bootstrap` and hand + branch/publish or CI work to `docs/repo-maintenance.md` +- Verification is too broad: + - use the smallest bundle from `references/verification.md` +- Docs start duplicating architecture prose: + - point to the canonical document instead of restating it + +## Additional resources + +- [Workflow reference](./references/workflow.md) +- [Endpoint contracts](./references/endpoint-contracts.md) +- [Checklists](./references/checklists.md) +- [Verification](./references/verification.md) +- [Test matrix](./references/test-matrix.md) +- [External sources](./references/external-sources.md) diff --git a/skills/reader-workbench-gardening/references/checklists.md b/skills/reader-workbench-gardening/references/checklists.md new file mode 100644 index 0000000..25a890b --- /dev/null +++ b/skills/reader-workbench-gardening/references/checklists.md @@ -0,0 +1,70 @@ +# Gardening checklists + +Use these lenses to keep the audit concrete and to avoid collapsing the task +into a vague 
"architecture review." + +## Information ownership + +- Does `reader/v7` stay the authored source of truth? +- Are assay-facing semantics owned by protocols rather than plugins? +- Is domain math or parsing living in `domains/` instead of CLI or plugin glue? +- Are docs pointing to canonical sources instead of forking duplicate guidance? + +## Monolith pressure + +- Are protocol families or compiler branches collecting too many + responsibilities? +- Are notebook or report flows hiding domain semantics inside one large helper? +- Does one file or module own configuration, behavior, and rendering at once? +- Would a new assay family force edits across too many unrelated files? + +## Assay lock-in + +- Do naming, defaults, or CLI surfaces assume one assay family is the norm? +- Are new behaviors exposed semantically or through raw plugin-shaped config? +- Would adding a new assay require compatibility shims instead of new protocol + ownership? + +## Legacy creep and silent fallback + +- Are removed legacy keys or behaviors trying to return through compatibility + shims? +- Does any changed surface quietly coerce, infer, or ignore invalid states + instead of failing fast? +- Do JSON or CLI surfaces hide empty or invalid selections as success? + +## Docs and semantics drift + +- Do docs still describe the current `reader/v7` surface? +- Are removed legacy keys or behaviors still documented or silently accepted? +- Does `AGENTS.md` route to the current maintainer workflow instead of stale + instructions? + +## Harness and CLI drift + +- Are JSON surfaces deterministic and aligned with table surfaces? +- Can agents discover, inspect, validate, and dry-run without mutation? +- Do representative commands fail fast when invariants are violated? +- Are records and outputs still traceable through manifest-backed provenance? + +## Directory and boundary drift + +- Are generated outputs still treated as generated? 
+- Does code placement match the layer described in `ARCHITECTURE.md`? +- Are new helpers reducing coupling, or just moving the monolith around? + +## Adjacent route boundaries + +- Is this really new-experiment intake, metadata staging, or local experiment + auditing instead of workbench gardening? +- Is the real task plugin implementation or protocol feature delivery rather + than boundary hardening? +- Has the task expanded into branch, publish, or CI topology and therefore + needs `docs/repo-maintenance.md`? + +## Evidence capture + +- Which canonical docs were checked first? +- Which code paths or modules support the claim? +- Which CLI probes confirm the runtime-facing statement? +- Which findings are verified fact versus inference or follow-up hypothesis? diff --git a/skills/reader-workbench-gardening/references/endpoint-contracts.md b/skills/reader-workbench-gardening/references/endpoint-contracts.md new file mode 100644 index 0000000..c358d2e --- /dev/null +++ b/skills/reader-workbench-gardening/references/endpoint-contracts.md @@ -0,0 +1,49 @@ +# Endpoint contracts + +Use these endpoint contracts when hardening or validating +`reader-workbench-gardening`. + +Start with these three endpoints: + +## `knowledge-integrity` + +This skill should keep repo-local maintainer guidance current, cross-linked, +and anchored to canonical docs. + +Required evidence: + +- [docs/guides/workbench_gardening.md](../../../docs/guides/workbench_gardening.md) + remains the primary workflow +- [skills/reader-workbench-gardening/SKILL.md](../SKILL.md) routes to the guide + instead of duplicating it +- [references/external-sources.md](./external-sources.md) records official + source rows when external claims shape the contract +- `uv run python tools/check_docs.py` passes after docs or routing edits + +## `autonomy-capability` + +This skill should be executable by an agent with a small, deterministic read +surface and explicit deliverables. 
+ +Required evidence: + +- top-level `SKILL.md` stays router-first and points deeper detail into + `references/` +- a gardening cycle can choose one explicit mode and one bounded slice +- [references/test-matrix.md](./test-matrix.md) provides should/should-not + trigger prompts and consistency checks +- `uv run python tools/audit_repo_skills.py` passes + +## `architecture-invariants` + +This skill must reinforce `reader`'s workbench architecture instead of +smearing boundaries across docs and instructions. + +Required evidence: + +- the skill routes new-experiment intake to `reader-experiment-bootstrap` +- the skill routes publish, branch, or CI topology work to + [docs/repo-maintenance.md](../../../docs/repo-maintenance.md) +- the skill keeps generated outputs out of scope +- the guide points to `ARCHITECTURE.md`, `DESIGN.md`, `QUALITY.md`, and + `RELIABILITY.md` as canonical invariants diff --git a/skills/reader-workbench-gardening/references/external-sources.md b/skills/reader-workbench-gardening/references/external-sources.md new file mode 100644 index 0000000..b2e49e6 --- /dev/null +++ b/skills/reader-workbench-gardening/references/external-sources.md @@ -0,0 +1,21 @@ +# External sources + +This repo-local skill is grounded first in repository sources: + +- [ARCHITECTURE.md](../../../ARCHITECTURE.md) +- [DESIGN.md](../../../DESIGN.md) +- [QUALITY.md](../../../QUALITY.md) +- [RELIABILITY.md](../../../RELIABILITY.md) +- [docs/repo-change-gate.md](../../../docs/repo-change-gate.md) + +For the skill-development and harness pass on this skill, these official +sources informed the contract: + +| URL | Retrieved | Mapped update | +| --- | --- | --- | +| https://openai.com/index/harness-engineering/ | 2026-04-19 | Add explicit endpoint contracts, deterministic validation, and feedback-loop framing rather than relying on generic architecture prose. 
| +| https://openai.com/business/guides-and-resources/a-practical-guide-to-building-ai-agents/ | 2026-04-19 | Keep the workflow incremental, tool-backed, and guardrail-driven, with clear instructions and bounded orchestration rather than jumping straight to a broad multi-agent pattern. | +| https://openai.com/business/guides-and-resources/how-openai-uses-codex/ | 2026-04-19 | Keep persistent repo context and agent instructions explicit, and improve the local environment with deterministic validation commands that reduce repeated errors. | + +When future gardening cycles depend on behavior, standards, or tooling outside +this repository, add new rows with URL, retrieval date, and mapped update. diff --git a/skills/reader-workbench-gardening/references/test-matrix.md b/skills/reader-workbench-gardening/references/test-matrix.md new file mode 100644 index 0000000..eb977c0 --- /dev/null +++ b/skills/reader-workbench-gardening/references/test-matrix.md @@ -0,0 +1,53 @@ +# Test matrix + +Use this matrix when validating `reader-workbench-gardening` after edits. + +## Trigger checks + +Should trigger: + +- "Audit `reader` for monolith pressure and assay lock-in." +- "Sync stale maintainer docs back to current runtime behavior." +- "Harden the preflight and JSON surfaces so agents can verify changes faster." +- "Choose one small boundary cut to keep the workbench assay-extensible." + +Should not trigger: + +- "Bootstrap a new experiment from this workbook." +- "Interpret these assay results." +- "Implement this new plugin." +- "Handle release, branch, or CI publish steps." 
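The trigger lists above are table data, so one way to keep them checkable is a harness that takes any routing predicate and reports misses and false triggers. This is a hedged sketch. `trigger_failures` is hypothetical, and in practice the predicate would wrap whatever agent or skill router is under test rather than a toy keyword match.

```python
from collections.abc import Callable, Iterable


def trigger_failures(
    triggers: Callable[[str], bool],
    should: Iterable[str],
    should_not: Iterable[str],
) -> list[str]:
    """Return one message per prompt the predicate routes incorrectly.

    `triggers` is any callable answering "would this skill fire for this prompt?".
    An empty result means every should/should-not expectation held.
    """
    failures = [f"missed: {prompt}" for prompt in should if not triggers(prompt)]
    failures += [f"false trigger: {prompt}" for prompt in should_not if triggers(prompt)]
    return failures
```

Running the same harness several times against a nondeterministic router is one way to approximate the repeated-run consistency check in this matrix.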
+ +## Functional checks + +- the top-level skill routes to the primary guide before deeper references +- the skill names one mode per cycle: `audit-only`, `docs-sync`, + `boundary-hardening`, or `surface-contracts` +- the output contract requires evidence, a selected slice, verification, and + residual risks +- adjacent routes are explicit for experiment bootstrap, plugin work, and repo + maintenance + +## Deterministic checks + +Run: + +```bash +uv run python tools/audit_repo_skills.py +uv run python tools/check_docs.py +git diff --check +``` + +For repeated-run consistency, use the same prompt three times and confirm the +response keeps: + +- the same mode selection +- the same adjacent-route decision +- the same required deliverable structure + +## Content-correctness checks + +- external source rows have URL, retrieval date, and mapped update +- external claims are official-source-backed when the skill depends on them +- repo-local invariants are cited from canonical repo docs instead of restated + from memory diff --git a/skills/reader-workbench-gardening/references/verification.md b/skills/reader-workbench-gardening/references/verification.md new file mode 100644 index 0000000..6d49c91 --- /dev/null +++ b/skills/reader-workbench-gardening/references/verification.md @@ -0,0 +1,80 @@ +# Verification + +Choose the smallest bundle that proves the gardening cycle did not weaken the +workbench. + +## Docs-only or routing-only changes + +```bash +uv run python tools/audit_repo_skills.py +uv run python tools/check_docs.py +git diff --check +``` + +Also confirm that changed routes point to current canonical docs and that any +embedded commands still match the current CLI surface. + +## Skill and maintainer-doc changes + +```bash +uv run python tools/audit_repo_skills.py +uv run python tools/check_docs.py +git diff --check +``` + +If the changed guidance includes concrete commands, run the smallest command in +scope to confirm it still matches reality. 
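To see which embedded commands a changed doc actually asks readers to run, the `bash` fences can be collected before picking the smallest one to re-run. A minimal sketch, assuming commands sit one per line in `bash`-tagged fences as in these references. `embedded_commands` is a hypothetical helper, not an existing repo tool.

```python
FENCE = "`" * 3  # built at runtime so the marker never closes an enclosing Markdown fence


def embedded_commands(markdown: str, lang: str = "bash") -> list[str]:
    """Return the non-blank, non-comment lines inside fences tagged with `lang`.

    Raises ValueError on an unterminated fence instead of returning a partial list.
    """
    commands: list[str] = []
    inside = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if not inside:
            # Only an opening fence with the requested language tag starts a block.
            inside = stripped == FENCE + lang
        elif stripped == FENCE:
            inside = False
        elif stripped and not stripped.startswith("#"):
            commands.append(stripped)
    if inside:
        raise ValueError(f"unterminated {lang} fence")
    return commands
```

Fences tagged with another language are skipped, so Python snippets in the same doc are not mistaken for shell commands.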
+ +## Code, CLI, or contract changes + +```bash +uv run ruff check . +uv run ruff format . --check +uv run pytest -q +uv run python -m compileall src/reader +git diff --check +``` + +Add a representative CLI preflight bundle when the change affects runtime or +maintainer-facing command surfaces: + +```bash +uv run reader ls --root experiments --details --readiness --format json +uv run reader inspect --format json +uv run reader explain --format json +uv run reader validate --no-files --format json +uv run reader run --dry-run --format json +``` + +## End-to-end or experiment-surface changes + +Start with the code and CLI bundle above, then add the smallest repo marker +that matches the risk: + +```bash +uv run pytest -q -m repo_matrix +uv run pytest -q -m integration +uv run pytest -q -m active_experiments +``` + +Use only the smallest marker set that proves the changed surface. If plots, +exports, or notebooks changed, add the matching `reader plot --list`, +`reader export --list`, `reader records`, or notebook mode command for one +representative experiment. 
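The tiers above amount to a lookup from the kind of change to the smallest bundle. A hedged sketch of that lookup, assuming (for illustration only) that a change touching only Markdown files is docs-only and anything else needs the code bundle. `smallest_bundle` and the `.md` rule are hypothetical, and repo markers for experiment-surface changes are still chosen by hand.

```python
from pathlib import PurePosixPath

DOCS_BUNDLE = (
    "uv run python tools/audit_repo_skills.py",
    "uv run python tools/check_docs.py",
    "git diff --check",
)
CODE_BUNDLE = (
    "uv run ruff check .",
    "uv run ruff format . --check",
    "uv run pytest -q",
    "uv run python -m compileall src/reader",
    "git diff --check",
)


def smallest_bundle(changed_paths: list[str]) -> tuple[str, ...]:
    """Pick the smallest verification bundle for a set of changed repo paths.

    Assumption for this sketch: only `.md` files count as docs-only changes.
    """
    if not changed_paths:
        # Fail fast: an empty change set should not read as "verified".
        raise ValueError("no changed paths; nothing to verify")
    if all(PurePosixPath(path).suffix == ".md" for path in changed_paths):
        return DOCS_BUNDLE
    return CODE_BUNDLE
```

Raising on an empty change set, rather than returning an empty bundle, keeps a no-op from being reported as verified.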
+ +## Continue into repo maintenance when + +- branch state or remote publish readiness is part of the task +- CI workflow behavior or policy changed +- the gardening slice now spans multiple unrelated repo surfaces +- the change needs the broader maintainer workflow in + `docs/repo-maintenance.md` + +## Evidence expectations + +- name the experiment or protocol used for representative CLI checks +- name the mode used for the gardening cycle +- name which endpoint contracts were in scope +- state which checks were skipped and why +- do not claim success from filesystem shape alone when records or manifests + are available diff --git a/skills/reader-workbench-gardening/references/workflow.md b/skills/reader-workbench-gardening/references/workflow.md new file mode 100644 index 0000000..da84e7a --- /dev/null +++ b/skills/reader-workbench-gardening/references/workflow.md @@ -0,0 +1,69 @@ +# Workflow reference + +Use this reference while following +[docs/guides/workbench_gardening.md](../../../docs/guides/workbench_gardening.md). + +## Mode selection + +- `audit-only` + - map ownership, identify pressure or drift, and stop with a ranked next + slice +- `docs-sync` + - fix stale maintainer docs, routing, and command examples without changing + runtime behavior +- `boundary-hardening` + - make one small code or documentation cut that reduces concentration, + coupling, or lock-in +- `surface-contracts` + - tighten CLI, JSON, or preflight/run/verify evidence surfaces for agents and + maintainers + +If the task expands into branch state, publish flow, or CI topology, continue +into [docs/repo-maintenance.md](../../../docs/repo-maintenance.md). + +## Read order + +1. [ARCHITECTURE.md](../../../ARCHITECTURE.md) +2. [DESIGN.md](../../../DESIGN.md) +3. [QUALITY.md](../../../QUALITY.md) when the task changes quality or review + expectations +4. [RELIABILITY.md](../../../RELIABILITY.md) when the task changes preflight, + run, verify, or recovery behavior +5. 
[docs/repo-maintenance.md](../../../docs/repo-maintenance.md) when the task + crosses repo boundaries, CI, or publish surfaces +6. [docs/repo-change-gate.md](../../../docs/repo-change-gate.md) before + finalizing tracked changes + +## Minimal audit loop + +1. State the workbench invariant or boundary under review. +2. Trace the surface through authored config, protocols, compiled declaration, + runtime execution, and workbench outputs. +3. Use + `uv run reader ls --root experiments --details --readiness --format json` + when the cycle needs repo-wide discovery evidence. +4. Use + `uv run reader inspect --format json` and + `uv run reader explain --format json` when runtime + mapping matters. +5. Record monolith pressure, assay lock-in, stale semantics, doc drift, or + harness drift using [checklists.md](./checklists.md). +6. Confirm which endpoint contracts matter using + [endpoint-contracts.md](./endpoint-contracts.md). +7. Choose the smallest reversible slice that improves ownership clarity or + verification discipline. +8. Run the smallest matching verification bundle from + [verification.md](./verification.md). + +## Default commands + +```bash +uv run reader ls --root experiments --details --readiness --format json +uv run reader inspect --format json +uv run reader explain --format json +uv run reader validate --no-files --format json +uv run reader run --dry-run --format json +``` + +Add `plot --list`, `export --list`, `records`, or a real execution slice only +when the gardening cycle changes those surfaces. diff --git a/src/reader/domains/logic/sfxi/setpoint_scatter.py b/src/reader/domains/logic/sfxi/setpoint_scatter.py new file mode 100644 index 0000000..be662a1 --- /dev/null +++ b/src/reader/domains/logic/sfxi/setpoint_scatter.py @@ -0,0 +1,261 @@ +""" +SFXI setpoint-scatter scoring and plotting helpers. 
+""" + +from __future__ import annotations + +import importlib +import math +from collections.abc import Mapping, Sequence +from typing import Any + +import pandas as pd + +from reader.errors import SFXIError +from reader.plotting.sinks import PlotFigure + +VEC8_COLUMNS = ( + "v00", + "v10", + "v01", + "v11", + "y00_star", + "y10_star", + "y01_star", + "y11_star", +) +READER_SUPPORTED_SFXI_API_VERSION = "1" + + +def _load_dnadesign_sfxi_api(): + try: + module = importlib.import_module("dnadesign.opal.api.sfxi") + except ImportError as exc: + _raise_dnadesign_sfxi_import_error(exc) + api_version = getattr(module, "SFXI_API_VERSION", None) + if str(api_version) != READER_SUPPORTED_SFXI_API_VERSION: + raise SFXIError( + "Unsupported dnadesign SFXI API version " + f"{api_version!r}; reader expects {READER_SUPPORTED_SFXI_API_VERSION!r}. " + "Update the reader lockfile or install a compatible dnadesign build." + ) + for attr in ("SFXIScoringConfig", "score_vec8"): + _require_public_attr(module, attr) + return module + + +def _require_public_attr(module, attr: str) -> None: + try: + getattr(module, attr) + except AttributeError as exc: + raise SFXIError(f"dnadesign.opal.api.sfxi is missing required public API: {attr}.") from exc + except ImportError as exc: + _raise_dnadesign_sfxi_import_error(exc) + + +def _raise_dnadesign_sfxi_import_error(exc: ImportError) -> None: + raise SFXIError( + "SFXI setpoint scatter requires the public dnadesign SFXI API. " + "Install or sync the optional dependency with `uv sync --extra dnadesign` " + "or install `reader[dnadesign]`." 
+ ) from exc + + +def require_dnadesign_sfxi_api(): + return _load_dnadesign_sfxi_api() + + +def _require_vec8_columns(df: pd.DataFrame) -> None: + missing = [col for col in VEC8_COLUMNS if col not in df.columns] + if missing: + raise SFXIError(f"SFXI setpoint scatter requires vec8 columns: {', '.join(missing)}.") + + +def _coerce_setpoints(setpoints: Mapping[str, Sequence[float]]) -> dict[str, list[float]]: + if not isinstance(setpoints, Mapping) or not setpoints: + raise SFXIError("SFXI setpoint scatter requires at least one named setpoint.") + out: dict[str, list[float]] = {} + for raw_name, raw_vector in setpoints.items(): + name = str(raw_name).strip() + if not name: + raise SFXIError("SFXI setpoint names must be non-empty strings.") + vector = [float(value) for value in raw_vector] + if len(vector) != 4 or not all(math.isfinite(value) for value in vector): + raise SFXIError(f"SFXI setpoint {name!r} must be a finite length-4 vector.") + out[name] = vector + return out + + +def _metadata_columns(df: pd.DataFrame) -> list[str]: + preferred = [ + "design_id", + "sequence", + "id", + "sequence_source_id", + "experiment_id", + "experiment_date", + "time_selected_h", + "reference_design_id", + "r_logic", + "flat_logic", + ] + return [col for col in preferred if col in df.columns] + + +def score_sfxi_setpoints( + vec8: pd.DataFrame, + *, + setpoints: Mapping[str, Sequence[float]], + scaling_percentile: int = 95, + scaling_min_n: int = 5, + scaling_eps: float = 1.0e-8, + logic_exponent_beta: float = 1.0, + intensity_exponent_gamma: float = 1.0, + intensity_log2_offset_delta: float = 0.0, +) -> pd.DataFrame: + _require_vec8_columns(vec8) + setpoint_map = _coerce_setpoints(setpoints) + api = require_dnadesign_sfxi_api() + + vec8_values = vec8.loc[:, VEC8_COLUMNS].astype(float).to_numpy() + metadata_cols = _metadata_columns(vec8) + metadata = vec8.loc[:, metadata_cols].reset_index(drop=True) + + frames: list[pd.DataFrame] = [] + for setpoint_name, setpoint_vector in 
setpoint_map.items(): + cfg = api.SFXIScoringConfig( + setpoint_vector=tuple(setpoint_vector), + scaling_percentile=int(scaling_percentile), + scaling_min_n=int(scaling_min_n), + scaling_eps=float(scaling_eps), + logic_exponent_beta=float(logic_exponent_beta), + intensity_exponent_gamma=float(intensity_exponent_gamma), + intensity_log2_offset_delta=float(intensity_log2_offset_delta), + ) + result = api.score_vec8(vec8_values, cfg, scaling_vec8=vec8_values) + scored = pd.DataFrame(result.to_records()) + scored.insert(0, "setpoint_name", setpoint_name) + frames.append(pd.concat([metadata.copy(), scored], axis=1)) + + if not frames: + return pd.DataFrame() + return pd.concat(frames, axis=0, ignore_index=True) + + +def _layout_for_panels(n_panels: int) -> tuple[int, int]: + if n_panels <= 1: + return 1, 1 + if n_panels <= 4: + return 2, 2 + return math.ceil(n_panels / 3), 3 + + +def _setpoint_label(row: pd.Series) -> str: + vector = row.get("setpoint_vector") + if isinstance(vector, list): + return "[" + ",".join(f"{float(value):g}" for value in vector) + "]" + return str(vector) + + +def render_sfxi_setpoint_scatter( + *, + vec8: pd.DataFrame, + setpoints: Mapping[str, Sequence[float]], + scaling_percentile: int = 95, + scaling_min_n: int = 5, + scaling_eps: float = 1.0e-8, + logic_exponent_beta: float = 1.0, + intensity_exponent_gamma: float = 1.0, + intensity_log2_offset_delta: float = 0.0, + fig_kwargs: Mapping[str, Any] | None = None, + filename: str | None = None, + formats: Sequence[str] = ("pdf",), + dpi: int = 300, + label_points: bool = False, +) -> list[PlotFigure]: + try: + import matplotlib.pyplot as plt # noqa: PLC0415 + except Exception as exc: # pragma: no cover - dependency guard + raise SFXIError("SFXI setpoint scatter requires matplotlib.") from exc + + scored = score_sfxi_setpoints( + vec8, + setpoints=setpoints, + scaling_percentile=scaling_percentile, + scaling_min_n=scaling_min_n, + scaling_eps=scaling_eps, + 
logic_exponent_beta=logic_exponent_beta, + intensity_exponent_gamma=intensity_exponent_gamma, + intensity_log2_offset_delta=intensity_log2_offset_delta, + ) + if scored.empty: + raise SFXIError("SFXI setpoint scatter has no rows to plot.") + + setpoint_names = list(dict.fromkeys(scored["setpoint_name"].astype(str).tolist())) + n_rows, n_cols = _layout_for_panels(len(setpoint_names)) + fig_opts = dict(fig_kwargs or {}) + figsize = fig_opts.pop("figsize", (4.4 * n_cols, 3.9 * n_rows)) + fig_opts.setdefault("constrained_layout", True) + fig, axes_grid = plt.subplots(n_rows, n_cols, figsize=figsize, squeeze=False, **fig_opts) + axes = [ax for row in axes_grid for ax in row] + mappable = None + for ax, setpoint_name in zip(axes, setpoint_names, strict=False): + subset = scored[scored["setpoint_name"].astype(str) == setpoint_name].copy() + sort_cols = [col for col in ("design_id", "time_selected_h") if col in subset.columns] + if sort_cols: + subset = subset.sort_values(sort_cols) + if "design_id" in subset.columns and "time_selected_h" in subset.columns: + for _, group in subset.groupby("design_id", dropna=False): + if len(group) > 1: + ax.plot( + group["logic_fidelity"], + group["effect_scaled"], + color="0.72", + linewidth=0.9, + zorder=1, + ) + mappable = ax.scatter( + subset["logic_fidelity"], + subset["effect_scaled"], + c=subset["sfxi"], + cmap="viridis", + vmin=0.0, + vmax=1.0, + s=42, + edgecolors="white", + linewidths=0.5, + zorder=2, + ) + if label_points and "design_id" in subset.columns: + for _, row in subset.iterrows(): + ax.annotate( + str(row["design_id"]), + (float(row["logic_fidelity"]), float(row["effect_scaled"])), + xytext=(3, 3), + textcoords="offset points", + fontsize=7, + ) + label = _setpoint_label(subset.iloc[0]) + ax.set_title(f"{setpoint_name} {label}") + ax.set_xlabel("logic_fidelity") + ax.set_ylabel("effect_scaled") + ax.set_xlim(-0.03, 1.03) + ax.set_ylim(-0.03, 1.03) + ax.grid(True, color="0.9", linewidth=0.8) + + for ax in 
axes[len(setpoint_names) :]: + ax.axis("off") + if mappable is not None: + fig.colorbar(mappable, ax=axes[: len(setpoint_names)], label="sfxi", shrink=0.88) + + base = filename or "sfxi_setpoint_scatter" + return [PlotFigure(fig=fig, filename=base, ext=str(ext).lstrip(".").lower(), dpi=dpi) for ext in formats] + + +__all__ = [ + "READER_SUPPORTED_SFXI_API_VERSION", + "VEC8_COLUMNS", + "render_sfxi_setpoint_scatter", + "require_dnadesign_sfxi_api", + "score_sfxi_setpoints", +] diff --git a/src/reader/domains/logic/sfxi/triptych_sequence.py b/src/reader/domains/logic/sfxi/triptych_sequence.py new file mode 100644 index 0000000..80fc467 --- /dev/null +++ b/src/reader/domains/logic/sfxi/triptych_sequence.py @@ -0,0 +1,672 @@ +""" +SFXI triptych sequence figure assembly. + +Reader owns the plate-reader figure composition. Sequence rendering and +artifact publication are delegated to narrow sibling adapters. +""" + +from __future__ import annotations + +import hashlib +import json +import math +from collections.abc import Mapping, Sequence +from dataclasses import asdict +from pathlib import Path +from tempfile import mkdtemp +from typing import Any + +import numpy as np +import pandas as pd +from matplotlib import pyplot as plt +from matplotlib.backends.backend_pdf import PdfPages + +from reader.domains.logic.sfxi.triptych_sequence_dnadesign import ( + DNADESIGN_SEQUENCE_PANEL_CONTRACT_ID, + READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION, + require_dnadesign_sequence_panel_api, +) +from reader.domains.logic.sfxi.triptych_sequence_dnadesign import ( + draw_sequence_panel as _draw_sequence_panel, +) +from reader.domains.logic.sfxi.triptych_sequence_dnadesign import ( + load_usr_sequence_rows as _load_usr_rows, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + TRIPTYCH_BUNDLE_CONTRACT_VERSION, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + bundle_paths as _bundle_paths, +) +from 
reader.domains.logic.sfxi.triptych_sequence_outputs import ( + cleanup_staging_root as _cleanup_staging_root, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + manifest_payload as _manifest_payload, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + publish_bundle as _publish_bundle, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + relative_to_outputs as _relative_to_outputs, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + staging_parent as _staging_parent, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + staging_paths as _staging_paths, +) +from reader.domains.logic.sfxi.triptych_sequence_outputs import ( + write_movie as _write_movie, +) +from reader.domains.plate_reader.analysis.timepoints import infer_induction_time_h +from reader.domains.plate_reader.plots.panels import ( + draw_time_series_panel, + marker_map_for_levels, + select_snapshot_rows, + summarize_snapshot_values, +) +from reader.errors import SFXIError + +DEFAULT_TREATMENTS = ( + {"state": "00", "label": "negative", "short_label": "00", "color": "#8E8E8E"}, + {"state": "10", "label": "3% EtOH", "short_label": "EtOH", "color": "#4C78A8"}, + {"state": "01", "label": "100 nM ciprofloxacin", "short_label": "Cipro", "color": "#F58518"}, + {"state": "11", "label": "3% EtOH + 100 nM ciprofloxacin", "short_label": "Dual", "color": "#54A24B"}, +) + + +def render_sfxi_triptych_sequence_bundle( + *, + ctx, + vec8: pd.DataFrame, + assay: pd.DataFrame, + config: Mapping[str, Any], +) -> list[Path]: + baserender, usr = require_dnadesign_sequence_panel_api() + cfg = _normalize_config(config) + _require_columns(vec8, ["design_id"], where="sfxi triptych vec8") + _require_columns( + assay, + [cfg["design_col"], cfg["time_col"], "channel", "value", cfg["treatment_col"]], + where="sfxi triptych assay", + ) + sequence_rows = _load_usr_rows(usr=usr, cfg=cfg, exp_dir=ctx.exp_dir) + plan = 
_build_candidate_plan(vec8=vec8, sequence_rows=sequence_rows, cfg=cfg) + if plan.empty: + raise SFXIError("SFXI triptych sequence has no candidate rows to render.") + if cfg["limit"] is not None: + plan = plan.iloc[: int(cfg["limit"])].copy() + + scales = _compute_render_scales(assay=assay, render_plan=plan, cfg=cfg) + bundle_id = _slug(cfg["bundle_id"]) + final = _bundle_paths(ctx=ctx, bundle_id=bundle_id, movie_enabled=cfg["movie_enabled"]) + staging_root = Path(mkdtemp(prefix=f"{bundle_id}__", dir=str(_staging_parent(ctx.outputs_dir)))) + staging = _staging_paths(staging_root=staging_root, bundle_id=bundle_id, movie_enabled=cfg["movie_enabled"]) + try: + records: list[dict[str, Any]] = [] + with PdfPages(staging["pdf"]) as pdf: + for _, row in plan.iterrows(): + fig, record = _render_one( + baserender=baserender, + assay=assay, + row=row, + cfg=cfg, + scales=scales, + ) + png_path = staging["frames_dir"] / f"{_slug(row['display_label'])}.png" + fig.savefig(png_path, dpi=cfg["dpi"], facecolor="white") + pdf.savefig(fig, facecolor="white") + if len(records) == 0: + fig.savefig(staging["poster"], dpi=cfg["dpi"], facecolor="white") + _close_figure(fig) + records.append({**record, "png_path": _relative_to_outputs(png_path, outputs_dir=ctx.outputs_dir)}) + + movie_path = _write_movie(cfg=cfg, records=records, staging=staging, outputs_dir=ctx.outputs_dir) + index = pd.DataFrame(records) + index.to_csv(staging["index"], index=False) + manifest = _manifest_payload( + ctx=ctx, + cfg=cfg, + records=records, + outputs=final, + movie_path=final.get("movie") if movie_path is not None else None, + scales=scales, + ) + staging["manifest"].write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n", encoding="utf-8") + _publish_bundle(staging=staging, final=final) + except Exception: + _cleanup_staging_root(staging_root) + raise + + paths = [final["poster"], final["pdf"], final["index"], final["manifest"]] + if cfg["movie_enabled"]: + paths.append(final["movie"]) + 
paths.extend(sorted(final["frames_dir"].glob("*.png"))) + return paths + + +def _normalize_config(config: Mapping[str, Any]) -> dict[str, Any]: + cfg = dict(config or {}) + sequence_source = dict(cfg.get("sequence_source") or {}) + channels = dict(cfg.get("channels") or {}) + sequence_panel = dict(cfg.get("sequence_panel") or {}) + time_series = dict(cfg.get("time_series") or {}) + axis_limits = dict(cfg.get("axis_limits") or {}) + treatments = list(cfg.get("treatments") or DEFAULT_TREATMENTS) + if not sequence_source.get("dataset"): + raise SFXIError("sfxi_triptych_sequence requires sequence_source.dataset.") + return { + "bundle_id": str(cfg.get("bundle_id") or "sfxi_triptych_sequence"), + "design_col": str(cfg.get("design_col") or "design_id"), + "sequence_id_col": str(cfg.get("sequence_id_col") or "id"), + "sequence_col": str(cfg.get("sequence_col") or "sequence"), + "time_col": str(cfg.get("time_col") or "time"), + "treatment_col": str(cfg.get("treatment_col") or "treatment_alias"), + "snapshot_target_time_h": _finite_float(cfg.get("snapshot_target_time_h"), default=12.0), + "induction_time_h": _finite_float(cfg.get("induction_time_h"), default=None), + "time_tolerance_h": _finite_float(cfg.get("time_tolerance_h"), default=0.51), + "limit": cfg.get("limit"), + "dpi": int(cfg.get("dpi") or 220), + "movie_enabled": bool(cfg.get("movie_enabled", False)), + "movie_fps": float(cfg.get("movie_fps", 0.85)), + "channels": { + "growth": str(channels.get("growth") or "OD600"), + "ratio": str(channels.get("ratio") or "YFP/CFP"), + "snapshot": str(channels.get("snapshot") or channels.get("ratio") or "YFP/CFP"), + }, + "sequence_source": { + "root": sequence_source.get("root"), + "dataset": str(sequence_source["dataset"]), + "required_overlays": list(sequence_source.get("required_overlays") or []), + "id_column": str(sequence_source.get("id_column") or "id"), + "sequence_column": str(sequence_source.get("sequence_column") or "sequence"), + "label_column": 
str(sequence_source.get("label_column") or "usr_label__primary"), + "annotations_column": str(sequence_source.get("annotations_column") or "densegen__used_tfbs_detail"), + "adapter_kind": str(sequence_source.get("adapter_kind") or "densegen_tfbs"), + }, + "sequence_panel": { + "profile": str(sequence_panel.get("profile") or "promoter_compact_slide.v1"), + "target_width_px": int(sequence_panel.get("target_width_px") or 2200), + "target_height_px": int(sequence_panel.get("target_height_px") or 310), + "vertical_anchor": str(sequence_panel.get("vertical_anchor") or "center"), + "canvas_top_pad_px": int(sequence_panel.get("canvas_top_pad_px") or 0), + "style_overrides": dict(sequence_panel.get("style_overrides") or {}), + }, + "time_series": time_series, + "axis_limits": axis_limits, + "treatments": _normalize_treatments(treatments), + } + + +def _build_candidate_plan(*, vec8: pd.DataFrame, sequence_rows: pd.DataFrame, cfg: Mapping[str, Any]) -> pd.DataFrame: + design_col = str(cfg["design_col"]) + seq_id_col = str(cfg["sequence_id_col"]) + vec = vec8.copy() + vec[design_col] = vec[design_col].astype(str) + if vec[design_col].duplicated().any(): + dupes = sorted(vec.loc[vec[design_col].duplicated(), design_col].astype(str).unique()) + raise SFXIError(f"SFXI triptych requires one vec8 row per design; duplicates: {dupes}") + + if seq_id_col in vec.columns: + plan = vec.merge(sequence_rows, left_on=seq_id_col, right_on="usr_sequence_id", how="left") + elif str(cfg["sequence_col"]) in vec.columns: + sequence_col = str(cfg["sequence_col"]) + seq = sequence_rows.copy() + seq["__join_sequence"] = seq["usr_sequence"].astype(str).str.upper() + vec["__join_sequence"] = vec[sequence_col].astype(str).str.upper() + plan = vec.merge(seq, on="__join_sequence", how="left").drop(columns=["__join_sequence"]) + else: + raise SFXIError( + f"SFXI triptych sequence requires vec8 to include either {seq_id_col!r} or {cfg['sequence_col']!r}." 
+ ) + + missing = plan[plan["usr_sequence_id"].isna()] + if not missing.empty: + values = missing[[design_col]].to_dict(orient="records") + raise SFXIError(f"Missing USR sequence rows for selected vec8 designs: {values}") + if str(cfg["sequence_col"]) in plan.columns: + mismatch = plan[ + plan[str(cfg["sequence_col"])].astype(str).str.upper() != plan["usr_sequence"].astype(str).str.upper() + ] + if not mismatch.empty: + values = mismatch[[design_col, "usr_sequence_id"]].to_dict(orient="records") + raise SFXIError(f"Vec8 sequence does not match USR sequence for rows: {values}") + plan["display_label"] = plan[design_col].map(_display_label) + plan["row_kind"] = "candidate" + plan["score_status"] = "canonical_sfxi_vec8" + plan["snapshot_time_h"] = float(cfg["snapshot_target_time_h"]) + plan["snapshot_time_source"] = "sfxi_triptych_sequence.snapshot_target_time_h" + if "time_selected_h" in plan.columns: + plan["vec8_time_selected_h"] = pd.to_numeric(plan["time_selected_h"], errors="coerce") + else: + plan["vec8_time_selected_h"] = math.nan + return _sort_designs(plan, design_col=design_col) + + +def _render_one(*, baserender, assay: pd.DataFrame, row: pd.Series, cfg: Mapping[str, Any], scales: Mapping[str, Any]): + design_col = str(cfg["design_col"]) + design_id = str(row[design_col]) + assay_design = assay[assay[design_col].astype(str) == design_id].copy() + if assay_design.empty: + raise SFXIError(f"No assay rows available for design_id={design_id!r}.") + assay_design["plot_time_h"] = pd.to_numeric(assay_design[cfg["time_col"]], errors="coerce") + assay_design["value"] = pd.to_numeric(assay_design["value"], errors="coerce") + assay_design = assay_design.dropna(subset=["plot_time_h", "value", "channel", cfg["treatment_col"]]) + + treatments = list(cfg["treatments"]) + treatment_order = [item["label"] for item in treatments] + color_map = {item["label"]: item["color"] for item in treatments} + short_label_map = {item["label"]: item["short_label"] for item in 
treatments} + marker_map = marker_map_for_levels(treatment_order) + time_meta = _time_metadata(assay_design, cfg=cfg) + + fig = plt.figure(figsize=(10.7, 5.82), dpi=180) + gs = fig.add_gridspec(2, 3, height_ratios=[1.0, 0.43], hspace=0.20, wspace=0.25) + axes = [fig.add_subplot(gs[0, idx]) for idx in range(3)] + seq_ax = fig.add_subplot(gs[1, :]) + for ax in axes: + ax.set_box_aspect(1) + + snapshot_meta = _draw_snapshot_panel( + axes[2], + assay=assay_design, + channel=str(cfg["channels"]["snapshot"]), + snapshot_time_h=float(row["snapshot_time_h"]), + tolerance_h=float(cfg["time_tolerance_h"]), + treatments=treatments, + cfg=cfg, + y_limits=scales["y_limits"], + ) + time_meta["snapshot_display_time_h"] = float(snapshot_meta["time_used_h"]) + x_limits = _compute_row_x_limits(assay=assay_design, time_meta=time_meta, cfg=cfg) + for ax, channel, ylabel, title, event_labels in ( + (axes[0], str(cfg["channels"]["growth"]), "OD$_{600}$", "Growth", False), + (axes[1], str(cfg["channels"]["ratio"]), "YFP/CFP", "Reporter ratio", True), + ): + _draw_time_panel( + ax, + assay=assay_design, + channel=channel, + ylabel=ylabel, + title=title, + treatment_order=treatment_order, + color_map=color_map, + marker_map=marker_map, + label_map=short_label_map, + time_meta=time_meta, + y_limits=scales["y_limits"], + x_limits=x_limits, + cfg=cfg, + show_event_labels=event_labels, + ) + + diagnostics = _draw_sequence_panel(seq_ax, row=row, baserender=baserender, cfg=cfg) + fig.suptitle(str(row["display_label"]), x=0.5, y=0.984, ha="center", fontsize=19.0, fontweight="semibold") + fig.subplots_adjust(left=0.062, right=0.988, bottom=0.060, top=0.902) + + record = { + "design_id": design_id, + "display_label": str(row["display_label"]), + "row_kind": str(row["row_kind"]), + "score_status": str(row["score_status"]), + "sequence_source_id": str(row["usr_sequence_id"]), + "usr_dataset": str(row["usr_dataset"]), + "sequence_adapter_kind": str(row["sequence_adapter_kind"]), + 
"snapshot_target_time_h": float(row["snapshot_time_h"]), + "snapshot_time_source": str(row.get("snapshot_time_source", "unknown")), + "snapshot_observed_time_h": float(snapshot_meta["time_used_h"]), + "snapshot_fell_back": bool(snapshot_meta["fell_back"]), + "snapshot_fallback_delta_h": snapshot_meta.get("fallback_delta_h"), + "vec8_selected_time_h": _json_float_or_none(row.get("vec8_time_selected_h")), + "induction_time_h": _json_float_or_none(time_meta.get("induction_display_time_h")), + "x_min_h": x_limits[0], + "x_max_h": x_limits[1], + "sequence_panel": asdict(diagnostics), + } + return fig, record + + +def _time_metadata(assay: pd.DataFrame, *, cfg: Mapping[str, Any]) -> dict[str, float]: + configured = cfg.get("induction_time_h") + induction = float(configured) if configured is not None else infer_induction_time_h(assay, time_col="plot_time_h") + return { + "induction_display_time_h": float(induction) if induction is not None else math.nan, + "snapshot_display_time_h": float(cfg["snapshot_target_time_h"]), + } + + +def _draw_time_panel( + ax, + *, + assay: pd.DataFrame, + channel: str, + ylabel: str, + title: str, + treatment_order: list[str], + color_map: dict[str, str], + marker_map: dict[str, str], + label_map: dict[str, str], + time_meta: dict[str, float], + y_limits: dict[str, list[float]], + x_limits: list[float], + cfg: Mapping[str, Any], + show_event_labels: bool = False, +) -> None: + treatment_col = str(cfg["treatment_col"]) + data = assay[assay["channel"].astype(str) == channel].copy() + observed = set(data[treatment_col].astype(str)) + missing = [label for label in treatment_order if label not in observed] + if missing: + raise SFXIError(f"{channel} panel missing treatments: {missing}") + data = data[data[treatment_col].astype(str).isin(treatment_order)].copy() + segment_col = _add_segment_column(data) + ts_cfg = dict(cfg.get("time_series") or {}) + draw_time_series_panel( + ax, + data=data, + x_col="plot_time_h", + hue_col=treatment_col, + 
hue_levels=treatment_order, + color_map=color_map, + marker_map=marker_map, + segment_col=segment_col, + show_replicates=bool(ts_cfg.get("show_replicates", False)), + ci=float(ts_cfg.get("ci", 95.0)), + ci_alpha=float(ts_cfg.get("ci_alpha", 0.16)), + ci_boot=int(ts_cfg.get("ci_boot", 300)), + ci_seed=0, + line_alpha=0.92, + mean_marker_alpha=0.86, + replicate_alpha=0.18, + add_sheet_lines=math.isfinite(float(time_meta["induction_display_time_h"])), + sheet_lines=[float(time_meta["induction_display_time_h"])], + sheet_line_kwargs={"color": "#5F5F5F", "linestyle": "--", "linewidth": 2.15, "alpha": 0.98, "zorder": 0.4}, + log_y=False, + xlabel="Time (h)", + ylabel=ylabel, + legend_loc="upper left", + show_legend=False, + legend_label_map=label_map, + marked_time=float(time_meta["snapshot_display_time_h"]), + marked_time_kwargs={"color": "#5F5F5F", "linestyle": "--", "linewidth": 2.15, "alpha": 0.98, "zorder": 0.6}, + line_width=2.2, + mean_marker_size=30.0, + axis_label_size=13.4, + tick_label_size=12.3, + legend_fontsize=8.5, + legend_marker_size=5.8, + ) + ax.set_xlim(x_limits) + ax.set_ylim(y_limits[channel]) + if show_event_labels: + _draw_time_event_labels(ax, time_meta=time_meta, font_size=float(ts_cfg.get("event_label_font_size", 10.0))) + ax.set_title(title, loc="center", fontsize=14.3, pad=7) + _style_axis(ax) + + +def _draw_snapshot_panel( + ax, + *, + assay: pd.DataFrame, + channel: str, + snapshot_time_h: float, + tolerance_h: float, + treatments: list[dict[str, str]], + cfg: Mapping[str, Any], + y_limits: dict[str, list[float]], +) -> dict[str, Any]: + treatment_col = str(cfg["treatment_col"]) + key_cols = [str(cfg["design_col"]), treatment_col, "channel", "position"] + key_cols = [column for column in key_cols if column in assay.columns] + snapshot_df = assay.copy() + snapshot_df["time"] = snapshot_df["plot_time_h"] + selection = select_snapshot_rows( + df=snapshot_df, + target_time=float(snapshot_time_h), + keys=key_cols, + channel=channel, + 
tolerance=float(tolerance_h), + ) + if selection.rows.empty: + raise SFXIError(f"No snapshot rows for channel={channel!r} near t={snapshot_time_h:.2f} h") + stats = summarize_snapshot_values(df=selection.rows, group_cols=[treatment_col], err="sem") + order = [item["label"] for item in treatments] + stats = stats.set_index(treatment_col).reindex(order) + missing = stats[stats["mean"].isna()].index.astype(str).tolist() + if missing: + raise SFXIError(f"Snapshot panel missing treatments: {missing}") + positions = np.arange(len(order), dtype=float) + means = stats["mean"].astype(float).to_numpy() + sem = stats["sem"].astype(float).fillna(0.0).to_numpy() + colors = [item["color"] for item in treatments] + ax.bar(positions, means, yerr=sem, color=colors, edgecolor="#BDBDBD", linewidth=0.75, capsize=3, zorder=2) + seed = _stable_seed("|".join(order) + f"|{snapshot_time_h:.6f}") + rng = np.random.default_rng(seed) + for idx, treatment in enumerate(order): + points = selection.rows[selection.rows[treatment_col].astype(str) == treatment] + jitter = rng.uniform(-0.09, 0.09, size=len(points)) + ax.scatter( + positions[idx] + jitter, + points["value"].astype(float), + s=18, + facecolors="white", + edgecolors="#4D4D4D", + linewidths=0.55, + alpha=0.92, + zorder=3, + ) + ax.set_xticks(positions) + ax.set_xticklabels([_snapshot_tick_label(item["short_label"]) for item in treatments], ha="center", fontsize=11.5) + ax.set_xlabel("") + ax.set_ylabel(channel, fontsize=13.4) + ax.set_ylim(y_limits[channel]) + ax.set_title(f"Snapshot ({float(selection.time_used):.1f} h)", loc="center", fontsize=14.3, pad=7) + ax.tick_params(axis="y", labelsize=12.3) + ax.yaxis.grid(True, which="major", color="#E1E1E1", linewidth=0.75) + ax.xaxis.grid(False) + _style_axis(ax) + return { + "time_used_h": float(selection.time_used), + "fell_back": bool(selection.fell_back), + "fallback_delta_h": selection.fallback_delta, + } + + +def _compute_render_scales(*, assay: pd.DataFrame, render_plan: pd.DataFrame, 
cfg: Mapping[str, Any]) -> dict[str, Any]: + values_by_channel: dict[str, list[float]] = { + str(cfg["channels"]["growth"]): [], + str(cfg["channels"]["ratio"]): [], + str(cfg["channels"]["snapshot"]): [], + } + treatment_labels = {item["label"] for item in cfg["treatments"]} + treatment_col = str(cfg["treatment_col"]) + design_col = str(cfg["design_col"]) + design_ids = set(render_plan[design_col].astype(str)) + sub = assay[assay[design_col].astype(str).isin(design_ids)].copy() + for channel, values in values_by_channel.items(): + channel_rows = sub[ + (sub["channel"].astype(str) == channel) & sub[treatment_col].astype(str).isin(treatment_labels) + ] + values.extend(pd.to_numeric(channel_rows["value"], errors="coerce").dropna().astype(float).tolist()) + axis_cfg = dict(cfg.get("axis_limits") or {}) + pad_fraction = float(axis_cfg.get("y_padding_fraction", 0.08)) + upper_quantile = float(axis_cfg.get("upper_quantile", 1.0)) + if not 0 < upper_quantile <= 1: + raise SFXIError("axis_limits.upper_quantile must be > 0 and <= 1.") + y_limits: dict[str, list[float]] = {} + for channel, values in values_by_channel.items(): + if not values: + raise SFXIError(f"No values available for global y-axis scaling: {channel}") + upper_source = float(np.quantile(np.asarray(values, dtype=float), upper_quantile)) + upper = max(upper_source * (1.0 + pad_fraction), 1e-9) + y_limits[channel] = [0.0, _nice_upper(upper)] + return {"x_policy": "per_row_raw_assay_time", "y_limits": y_limits} + + +def _compute_row_x_limits( + *, assay: pd.DataFrame, time_meta: Mapping[str, float], cfg: Mapping[str, Any] +) -> list[float]: + values = pd.to_numeric(assay["plot_time_h"], errors="coerce").dropna().astype(float).tolist() + for key in ("induction_display_time_h", "snapshot_display_time_h"): + value = float(time_meta[key]) + if math.isfinite(value): + values.append(value) + if not values: + raise SFXIError("Cannot compute x-axis limits: no finite time values.") + x_pad = float((cfg.get("time_axis") 
or {}).get("x_padding_h", 0.5)) + return [float(min(0.0, min(values))), _nice_upper(max(values) + max(0.0, x_pad))] + + +def _normalize_treatments(rows: Sequence[Mapping[str, Any]]) -> list[dict[str, str]]: + out: list[dict[str, str]] = [] + for row in rows: + item = {key: str(row[key]) for key in ("state", "label", "short_label", "color") if key in row} + missing = sorted({"state", "label", "short_label", "color"} - set(item)) + if missing: + raise SFXIError(f"SFXI triptych treatment row missing {missing}: {row}") + out.append(item) + if not out: + raise SFXIError("SFXI triptych requires at least one treatment row.") + return out + + +def _draw_time_event_labels(ax, *, time_meta: Mapping[str, float], font_size: float) -> None: + transform = ax.get_xaxis_transform() + for label, value, x_offset in ( + ("induction", time_meta.get("induction_display_time_h"), -12), + ("snapshot", time_meta.get("snapshot_display_time_h"), 12), + ): + try: + x = float(value) + except (TypeError, ValueError): + continue + if not math.isfinite(x): + continue + ax.annotate( + label, + xy=(x, 0.5), + xycoords=transform, + xytext=(x_offset, 0), + textcoords="offset points", + clip_on=True, + va="center", + ha="center", + fontsize=font_size, + color="#555555", + rotation=90, + rotation_mode="anchor", + zorder=4.5, + bbox={"boxstyle": "round,pad=0.16", "facecolor": "white", "edgecolor": "none", "alpha": 0.74}, + ) + + +def _add_segment_column(data: pd.DataFrame) -> str | None: + segment_parts = [column for column in ("source", "sheet_name", "sheet_index") if column in data.columns] + if not segment_parts: + return None + segment_col = "__plot_segment" + segments = data[segment_parts].copy() + for column in segment_parts: + segments[column] = segments[column].astype(str) + data[segment_col] = segments.agg("::".join, axis=1) + return segment_col + + +def _style_axis(ax) -> None: + ax.spines["top"].set_visible(False) + ax.spines["right"].set_visible(False) + 
ax.spines["left"].set_color("#868686") + ax.spines["bottom"].set_color("#868686") + ax.spines["left"].set_linewidth(1.45) + ax.spines["bottom"].set_linewidth(1.45) + ax.tick_params(colors="#2F2F2F", width=1.15) + ax.yaxis.grid(True, which="major", color="#E1E1E1", linewidth=0.75) + ax.xaxis.grid(True, which="major", color="#ECECEC", linewidth=0.65) + + +def _snapshot_tick_label(label: str) -> str: + return str(label).replace(" ", "\n", 1) + + +def _display_label(design_id: object) -> str: + text = str(design_id) + prefix = "pDual-10-" + return text[len(prefix) :] if text.startswith(prefix) else text + + +def _sort_designs(df: pd.DataFrame, *, design_col: str) -> pd.DataFrame: + out = df.copy() + out["__design_sort"] = out[design_col].map(_design_sort_key) + out = out.sort_values("__design_sort", kind="stable").drop(columns=["__design_sort"]) + return out.reset_index(drop=True) + + +def _design_sort_key(value: object) -> tuple[int, str]: + text = str(value) + digits = "".join(ch for ch in text if ch.isdigit()) + return (int(digits) if digits else 10**9, text) + + +def _slug(value: object) -> str: + text = str(value) + keep = [ch.lower() if ch.isalnum() else "_" for ch in text] + slug = "".join(keep).strip("_") + while "__" in slug: + slug = slug.replace("__", "_") + return slug or "item" + + +def _stable_seed(text: str) -> int: + return int(hashlib.sha256(text.encode("utf-8")).hexdigest()[:8], 16) + + +def _finite_float(value: Any, *, default: float | None) -> float | None: + if value is None: + return default + try: + out = float(value) + except (TypeError, ValueError) as exc: + raise SFXIError(f"Expected a finite numeric value, got {value!r}") from exc + if not math.isfinite(out): + raise SFXIError(f"Expected a finite numeric value, got {value!r}") + return out + + +def _json_float_or_none(value: Any) -> float | None: + try: + out = float(value) + except (TypeError, ValueError): + return None + return out if math.isfinite(out) else None + + +def _nice_upper(value: 
float) -> float: + if not math.isfinite(value) or value <= 0: + return 1.0 + if value <= 1.5: + step = 0.1 + elif value <= 5: + step = 0.25 + else: + step = 1.0 + return float(math.ceil(value / step) * step) + + +def _close_figure(fig) -> None: + plt.close(fig) + + +def _require_columns(df: pd.DataFrame, columns: Sequence[str], *, where: str) -> None: + missing = [column for column in columns if column not in df.columns] + if missing: + raise SFXIError(f"{where}: missing required columns {missing}") + + +__all__ = [ + "DNADESIGN_SEQUENCE_PANEL_CONTRACT_ID", + "READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION", + "TRIPTYCH_BUNDLE_CONTRACT_VERSION", + "render_sfxi_triptych_sequence_bundle", + "require_dnadesign_sequence_panel_api", +] diff --git a/src/reader/domains/logic/sfxi/triptych_sequence_dnadesign.py b/src/reader/domains/logic/sfxi/triptych_sequence_dnadesign.py new file mode 100644 index 0000000..eb359e1 --- /dev/null +++ b/src/reader/domains/logic/sfxi/triptych_sequence_dnadesign.py @@ -0,0 +1,153 @@ +""" +dnadesign boundary adapter for the SFXI triptych sequence plot. + +This module is intentionally narrow: reader owns plot semantics and bundle +publication, while dnadesign owns USR access and BaseRender sequence panels. 
+""" + +from __future__ import annotations + +import importlib +from collections.abc import Mapping +from pathlib import Path +from typing import Any, NoReturn + +import pandas as pd +import pyarrow as pa + +from reader.errors import SFXIError + +DNADESIGN_SEQUENCE_PANEL_CONTRACT_ID = "dnadesign.baserender.sequence_panel.v1" +READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION = "1" + + +def require_dnadesign_sequence_panel_api(): + try: + baserender = importlib.import_module("dnadesign.baserender") + usr = importlib.import_module("dnadesign.usr") + except ImportError as exc: + _raise_dnadesign_import_error(exc) + + actual = getattr(baserender, "BASERENDER_SEQUENCE_PANEL_CONTRACT_VERSION", None) + if str(actual) != READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION: + raise SFXIError( + "Unsupported dnadesign BaseRender sequence-panel contract version " + f"{actual!r}; reader expects {READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION!r}. " + "Update reader[dnadesign] or sync a compatible dnadesign checkout." + ) + for attr in ("render_sequence_panel_image", "sequence_panel_config_for_adapter"): + _require_public_attr(baserender, attr, module_name="dnadesign.baserender") + for attr in ("Dataset", "default_usr_root"): + _require_public_attr(usr, attr, module_name="dnadesign.usr") + return baserender, usr + + +def _require_public_attr(module, attr: str, *, module_name: str) -> None: + try: + getattr(module, attr) + except AttributeError as exc: + raise SFXIError(f"{module_name} is missing required public API: {attr}.") from exc + except ImportError as exc: + _raise_dnadesign_import_error(exc) + + +def _raise_dnadesign_import_error(exc: ImportError) -> NoReturn: + raise SFXIError( + "SFXI triptych sequence requires dnadesign public APIs. Install or sync the optional dependency " + "with `uv sync --extra dnadesign` or install `reader[dnadesign]`."
+ ) from exc + + +def resolve_usr_root(*, usr, root: object, exp_dir: Path) -> Path: + if root in (None, ""): + return Path(usr.default_usr_root()) + root_path = Path(str(root)).expanduser() + return root_path if root_path.is_absolute() else (exp_dir / root_path).resolve() + + +def require_usr_sequence_dataset(*, usr, root: object, dataset_name: str, exp_dir: Path) -> Path: + usr_root = resolve_usr_root(usr=usr, root=root, exp_dir=exp_dir) + dataset = usr.Dataset.open(usr_root, dataset_name) + records_path = Path(dataset.records_path) + if not records_path.exists(): + raise SFXIError(f"sfxi_triptych_sequence could not find USR dataset {dataset_name!r} at {records_path}.") + return records_path + + +def load_usr_sequence_rows(*, usr, cfg: Mapping[str, Any], exp_dir: Path) -> pd.DataFrame: + source = cfg["sequence_source"] + usr_root = resolve_usr_root(usr=usr, root=source.get("root"), exp_dir=exp_dir) + dataset = usr.Dataset.open(Path(usr_root), str(source["dataset"])) + columns = [ + str(source["id_column"]), + str(source["sequence_column"]), + str(source["label_column"]), + str(source["annotations_column"]), + ] + include_overlays: bool | list[str] = list(source["required_overlays"]) or True + batches = list(dataset.scan(columns=columns, include_overlays=include_overlays)) + if not batches: + raise SFXIError(f"USR dataset has no rows: {Path(usr_root) / str(source['dataset'])}") + frame = pa.Table.from_batches(batches).to_pandas() + _require_columns(frame, columns, where=f"USR dataset {source['dataset']}") + out = frame.rename( + columns={ + str(source["id_column"]): "usr_sequence_id", + str(source["sequence_column"]): "usr_sequence", + str(source["label_column"]): "usr_label", + str(source["annotations_column"]): "usr_annotations", + } + ) + out["usr_dataset"] = str(source["dataset"]) + out["sequence_adapter_kind"] = str(source["adapter_kind"]) + if out["usr_sequence_id"].duplicated().any(): + dupes = sorted(out.loc[out["usr_sequence_id"].duplicated(), 
"usr_sequence_id"].astype(str).unique()) + raise SFXIError(f"USR dataset {source['dataset']} has duplicate ids: {dupes}") + return out + + +def draw_sequence_panel(ax, *, row: pd.Series, baserender, cfg: Mapping[str, Any]): + panel = cfg["sequence_panel"] + record_row = { + "id": str(row["usr_sequence_id"]), + "sequence": str(row["usr_sequence"]), + } + adapter_kind = str(row["sequence_adapter_kind"]) + if adapter_kind == "densegen_tfbs": + record_row["densegen__used_tfbs_detail"] = row["usr_annotations"] + elif adapter_kind == "usr_genbank_annotations_v1": + record_row["seq_annot__features"] = row["usr_annotations"] + record_row["usr_label__primary"] = str(row["usr_label"] if "usr_label" in row.index else row["display_label"]) + else: + raise SFXIError(f"Unsupported sequence adapter kind: {adapter_kind!r}") + + result = baserender.render_sequence_panel_image( + record_row, + adapter_kind=adapter_kind, + style_profile=str(panel["profile"]), + style_overrides=dict(panel["style_overrides"]), + target_width_px=int(panel["target_width_px"]), + target_height_px=int(panel["target_height_px"]), + vertical_anchor=str(panel["vertical_anchor"]), + canvas_top_pad_px=int(panel["canvas_top_pad_px"]), + ) + ax.imshow(result.image) + ax.set_axis_off() + return result.diagnostics + + +def _require_columns(df: pd.DataFrame, columns: list[str], *, where: str) -> None: + missing = [column for column in columns if column not in df.columns] + if missing: + raise SFXIError(f"{where}: missing required columns {missing}") + + +__all__ = [ + "DNADESIGN_SEQUENCE_PANEL_CONTRACT_ID", + "READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION", + "draw_sequence_panel", + "load_usr_sequence_rows", + "require_dnadesign_sequence_panel_api", + "require_usr_sequence_dataset", + "resolve_usr_root", +] diff --git a/src/reader/domains/logic/sfxi/triptych_sequence_outputs.py b/src/reader/domains/logic/sfxi/triptych_sequence_outputs.py new file mode 100644 index 0000000..ed8055e --- /dev/null +++
b/src/reader/domains/logic/sfxi/triptych_sequence_outputs.py @@ -0,0 +1,190 @@ +""" +Artifact publication helpers for the SFXI triptych sequence bundle. +""" + +from __future__ import annotations + +import shutil +import subprocess +from collections.abc import Mapping +from datetime import UTC, datetime +from pathlib import Path + +from reader.errors import SFXIError + +from .triptych_sequence_dnadesign import ( + DNADESIGN_SEQUENCE_PANEL_CONTRACT_ID, + READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION, +) + +TRIPTYCH_BUNDLE_CONTRACT_VERSION = "reader.sfxi_triptych_sequence_bundle.v1" + + +def bundle_paths(*, ctx, bundle_id: str, movie_enabled: bool) -> dict[str, Path]: + plot_dir = ctx.plots_dir / "sfxi_triptych_sequence" + export_dir = ctx.exports_dir / "sfxi_triptych_sequence" + paths = { + "poster": plot_dir / f"{bundle_id}.png", + "pdf": plot_dir / f"{bundle_id}.pdf", + "frames_dir": plot_dir / f"{bundle_id}__frames", + "index": export_dir / f"{bundle_id}_index.csv", + "manifest": ctx.outputs_dir / "manifests" / f"{bundle_id}_manifest.json", + } + if movie_enabled: + paths["movie"] = plot_dir / f"{bundle_id}.mp4" + return paths + + +def staging_parent(outputs_dir: Path) -> Path: + path = outputs_dir / ".staging" + path.mkdir(parents=True, exist_ok=True) + return path + + +def staging_paths(*, staging_root: Path, bundle_id: str, movie_enabled: bool) -> dict[str, Path]: + frames = staging_root / f"{bundle_id}__frames" + frames.mkdir(parents=True, exist_ok=True) + paths = { + "poster": staging_root / f"{bundle_id}.png", + "pdf": staging_root / f"{bundle_id}.pdf", + "frames_dir": frames, + "index": staging_root / f"{bundle_id}_index.csv", + "manifest": staging_root / f"{bundle_id}_manifest.json", + "frames_txt": staging_root / "_movie_frames.txt", + } + if movie_enabled: + paths["movie"] = staging_root / f"{bundle_id}.mp4" + return paths + + +def write_movie( + *, cfg: Mapping[str, object], records: list[dict[str, object]], staging: Mapping[str, Path], outputs_dir: 
Path +): + if not bool(cfg["movie_enabled"]): + return None + ffmpeg = shutil.which("ffmpeg") + if ffmpeg is None: + raise SFXIError("sfxi_triptych_sequence movie output requires ffmpeg on PATH.") + duration = 1.0 / max(float(cfg["movie_fps"]), 1e-9) + frame_list = staging["frames_txt"] + lines: list[str] = [] + for record in records: + png = outputs_dir / str(record["png_path"]) + png = staging["frames_dir"] / png.name + lines.append(f"file '{png}'") + lines.append(f"duration {duration:.6f}") + if records: + last = staging["frames_dir"] / Path(str(records[-1]["png_path"])).name + lines.append(f"file '{last}'") + frame_list.write_text("\n".join(lines) + "\n", encoding="utf-8") + cmd = [ + ffmpeg, + "-y", + "-hide_banner", + "-loglevel", + "error", + "-f", + "concat", + "-safe", + "0", + "-i", + str(frame_list), + "-vf", + "format=yuv420p", + "-movflags", + "+faststart", + str(staging["movie"]), + ] + subprocess.run(cmd, check=True) + return staging["movie"] + + +def manifest_payload( + *, + ctx, + cfg: Mapping[str, object], + records: list[dict[str, object]], + outputs: Mapping[str, Path], + movie_path: Path | None, + scales: Mapping[str, object], +) -> dict[str, object]: + output_map = { + "png": relative_to_outputs(outputs["poster"], outputs_dir=ctx.outputs_dir), + "pdf": relative_to_outputs(outputs["pdf"], outputs_dir=ctx.outputs_dir), + "index_csv": relative_to_outputs(outputs["index"], outputs_dir=ctx.outputs_dir), + "manifest_json": relative_to_outputs(outputs["manifest"], outputs_dir=ctx.outputs_dir), + } + if movie_path is not None: + output_map["movie_mp4"] = relative_to_outputs(movie_path, outputs_dir=ctx.outputs_dir) + return { + "schema": TRIPTYCH_BUNDLE_CONTRACT_VERSION, + "created_at": datetime.now(UTC).isoformat(), + "bundle_id": str(cfg["bundle_id"]), + "plot_id": "sfxi_triptych_sequence", + "protocol_id": getattr(getattr(ctx, "protocol", None), "id", None), + "source_experiment_id": getattr(getattr(ctx, "experiment", None), "id", None), + 
"row_count": len(records), + "row_order": [record["design_id"] for record in records], + "reference_rows": [record for record in records if record.get("row_kind") == "reference"], + "snapshot_target_time_h": float(cfg["snapshot_target_time_h"]), + "dnadesign_contract_id": DNADESIGN_SEQUENCE_PANEL_CONTRACT_ID, + "dnadesign_contract_version": READER_SUPPORTED_SEQUENCE_PANEL_CONTRACT_VERSION, + "sequence_profile_id": str(cfg["sequence_panel"]["profile"]), + "axis_scales": scales, + "outputs": output_map, + "records": records, + } + + +def publish_bundle(*, staging: Mapping[str, Path], final: Mapping[str, Path]) -> None: + for key in ("poster", "pdf", "index", "manifest"): + final[key].parent.mkdir(parents=True, exist_ok=True) + if "movie" in final: + final["movie"].parent.mkdir(parents=True, exist_ok=True) + final["frames_dir"].parent.mkdir(parents=True, exist_ok=True) + + frames_dir = final["frames_dir"] + previous_frames_dir = frames_dir.with_name(f"{frames_dir.name}.__previous") + if previous_frames_dir.exists(): + shutil.rmtree(previous_frames_dir) + + try: + if frames_dir.exists(): + frames_dir.rename(previous_frames_dir) + shutil.move(str(staging["frames_dir"]), str(frames_dir)) + for key in ("poster", "pdf", "index", "manifest", "movie"): + if key in staging and key in final: + Path(staging[key]).replace(final[key]) + except Exception: + if frames_dir.exists(): + shutil.rmtree(frames_dir) + if previous_frames_dir.exists(): + previous_frames_dir.rename(frames_dir) + raise + finally: + shutil.rmtree(previous_frames_dir, ignore_errors=True) + shutil.rmtree(Path(staging["poster"]).parent, ignore_errors=True) + + +def cleanup_staging_root(staging_root: Path) -> None: + shutil.rmtree(staging_root, ignore_errors=True) + + +def relative_to_outputs(path: Path, *, outputs_dir: Path) -> str: + try: + return str(path.resolve().relative_to(outputs_dir.resolve())) + except ValueError: + return str(path.resolve()) + + +__all__ = [ + "TRIPTYCH_BUNDLE_CONTRACT_VERSION", + 
"bundle_paths", + "cleanup_staging_root", + "manifest_payload", + "publish_bundle", + "relative_to_outputs", + "staging_parent", + "staging_paths", + "write_movie", +] diff --git a/src/reader/domains/plate_reader/analysis/timepoints.py b/src/reader/domains/plate_reader/analysis/timepoints.py index a11695d..2a081ff 100644 --- a/src/reader/domains/plate_reader/analysis/timepoints.py +++ b/src/reader/domains/plate_reader/analysis/timepoints.py @@ -60,4 +60,24 @@ def choose_nearest_time( return chosen_time -__all__ = ["choose_nearest_time", "nearest_time_per_key"] +def infer_induction_time_h(df: pd.DataFrame, *, time_col: str) -> float | None: + for column in ("induction_time_h", "induction_time", "time_of_induction_h", "time_of_induction"): + if column in df.columns: + values = pd.to_numeric(df[column], errors="coerce").dropna() + if not values.empty: + return float(values.iloc[0]) + + if "sheet_index" not in df.columns: + return None + sheet_values = pd.to_numeric(df["sheet_index"], errors="coerce").dropna() + if sheet_values.empty: + return None + min_sheet = float(sheet_values.min()) + sheet_series = pd.to_numeric(df["sheet_index"], errors="coerce") + times = pd.to_numeric(df.loc[sheet_series > min_sheet, time_col], errors="coerce").dropna() + if times.empty: + return None + return float(times.min()) + + +__all__ = ["choose_nearest_time", "infer_induction_time_h", "nearest_time_per_key"] diff --git a/src/reader/plugins/plot/sfxi_setpoint_scatter.py b/src/reader/plugins/plot/sfxi_setpoint_scatter.py new file mode 100644 index 0000000..7095050 --- /dev/null +++ b/src/reader/plugins/plot/sfxi_setpoint_scatter.py @@ -0,0 +1,70 @@ +""" +SFXI setpoint scatter plot plugin. 
+""" + +from __future__ import annotations + +from typing import Any + +import pandas as pd +from pydantic import Field + +from reader.errors import SFXIError +from reader.plotting.sinks import PlotFigure +from reader.plugins.plot._shared import FigurePlotPlugin +from reader.workbench.ports import dataframe_input +from reader.workbench.registry import PluginConfig, PreflightIssue + + +class SFXISetpointScatterCfg(PluginConfig): + setpoints: dict[str, list[float]] = Field(default_factory=lambda: {"and": [0.0, 0.0, 0.0, 1.0]}) + scaling_percentile: int = 95 + scaling_min_n: int = 5 + scaling_eps: float = 1.0e-8 + logic_exponent_beta: float = 1.0 + intensity_exponent_gamma: float = 1.0 + intensity_log2_offset_delta: float = 0.0 + fig: dict[str, Any] = Field(default_factory=dict) + filename: str | None = None + format: list[str] = Field(default_factory=lambda: ["pdf"]) + dpi: int = 300 + label_points: bool = False + + +class SFXISetpointScatterPlot(FigurePlotPlugin): + ConfigModel = SFXISetpointScatterCfg + + @classmethod + def input_ports(cls): + return {"vec8": dataframe_input("vec8", "sfxi.vec8.v2")} + + @classmethod + def preflight_readiness(cls, *, exp_dir, cfg: SFXISetpointScatterCfg, reads): + del exp_dir, cfg, reads + from reader.domains.logic.sfxi.setpoint_scatter import require_dnadesign_sfxi_api # noqa: PLC0415 + + try: + require_dnadesign_sfxi_api() + except SFXIError as exc: + return (PreflightIssue(kind="dependency", message=str(exc)),) + return () + + def render(self, ctx, inputs, cfg: SFXISetpointScatterCfg) -> list[PlotFigure]: + vec8: pd.DataFrame = inputs["vec8"] + from reader.domains.logic.sfxi.setpoint_scatter import render_sfxi_setpoint_scatter # noqa: PLC0415 + + return render_sfxi_setpoint_scatter( + vec8=vec8, + setpoints=cfg.setpoints, + scaling_percentile=cfg.scaling_percentile, + scaling_min_n=cfg.scaling_min_n, + scaling_eps=cfg.scaling_eps, + logic_exponent_beta=cfg.logic_exponent_beta, + 
intensity_exponent_gamma=cfg.intensity_exponent_gamma, + intensity_log2_offset_delta=cfg.intensity_log2_offset_delta, + fig_kwargs=cfg.fig, + filename=cfg.filename, + formats=cfg.format, + dpi=cfg.dpi, + label_points=cfg.label_points, + ) diff --git a/src/reader/plugins/plot/sfxi_triptych_sequence.py b/src/reader/plugins/plot/sfxi_triptych_sequence.py new file mode 100644 index 0000000..547e76a --- /dev/null +++ b/src/reader/plugins/plot/sfxi_triptych_sequence.py @@ -0,0 +1,131 @@ +""" +SFXI triptych sequence bundle plot plugin. +""" + +from __future__ import annotations + +from typing import Any + +import pandas as pd +from pydantic import BaseModel, Field + +from reader.errors import SFXIError +from reader.plugins.plot._shared import FigurePlotPlugin +from reader.workbench.ports import dataframe_input +from reader.workbench.registry import PluginConfig, PreflightIssue + + +class _StrictModel(BaseModel): + model_config = {"extra": "forbid"} + + +class SFXITriptychSequenceSourceCfg(_StrictModel): + dataset: str + root: str | None = None + required_overlays: list[str] = Field(default_factory=list) + id_column: str = "id" + sequence_column: str = "sequence" + label_column: str = "usr_label__primary" + annotations_column: str = "densegen__used_tfbs_detail" + adapter_kind: str = "densegen_tfbs" + + +class SFXITriptychChannelsCfg(_StrictModel): + growth: str = "OD600" + ratio: str = "YFP/CFP" + snapshot: str = "YFP/CFP" + + +class SFXITriptychSequencePanelCfg(_StrictModel): + profile: str = "promoter_compact_slide.v1" + target_width_px: int = 2200 + target_height_px: int = 310 + vertical_anchor: str = "center" + canvas_top_pad_px: int = 0 + style_overrides: dict[str, Any] = Field(default_factory=dict) + + +class SFXITriptychTreatmentCfg(_StrictModel): + state: str + label: str + short_label: str + color: str + + +class SFXITriptychSequenceCfg(PluginConfig): + sequence_source: SFXITriptychSequenceSourceCfg + bundle_id: str = "sfxi_triptych_sequence" + design_col: str = 
"design_id" + sequence_id_col: str = "id" + sequence_col: str = "sequence" + time_col: str = "time" + treatment_col: str = "treatment_alias" + snapshot_target_time_h: float = 12.0 + induction_time_h: float | None = 12.0 + time_tolerance_h: float = 0.51 + channels: SFXITriptychChannelsCfg = Field(default_factory=SFXITriptychChannelsCfg) + sequence_panel: SFXITriptychSequencePanelCfg = Field(default_factory=SFXITriptychSequencePanelCfg) + treatments: list[SFXITriptychTreatmentCfg] = Field(default_factory=list) + time_series: dict[str, Any] = Field(default_factory=dict) + axis_limits: dict[str, Any] = Field(default_factory=dict) + movie_enabled: bool = False + movie_fps: float = 0.85 + dpi: int = 220 + limit: int | None = None + + +class SFXITriptychSequencePlot(FigurePlotPlugin): + ConfigModel = SFXITriptychSequenceCfg + + @classmethod + def input_ports(cls): + return { + "vec8": dataframe_input("vec8", "sfxi.vec8.v2"), + "assay": dataframe_input("assay", "plate_reader.annotated.v1"), + } + + @classmethod + def preflight_readiness(cls, *, exp_dir, cfg: SFXITriptychSequenceCfg, reads): + del reads + from reader.domains.logic.sfxi.triptych_sequence_dnadesign import ( # noqa: PLC0415 + require_dnadesign_sequence_panel_api, + require_usr_sequence_dataset, + ) + + if not cfg.sequence_source.dataset.strip(): + return ( + PreflightIssue(kind="dependency", message="sfxi_triptych_sequence requires sequence_source.dataset."), + ) + try: + _baserender, usr = require_dnadesign_sequence_panel_api() + except SFXIError as exc: + return (PreflightIssue(kind="dependency", message=str(exc)),) + try: + require_usr_sequence_dataset( + usr=usr, + root=cfg.sequence_source.root, + dataset_name=cfg.sequence_source.dataset, + exp_dir=exp_dir, + ) + except SFXIError as exc: + return (PreflightIssue(kind="dependency", message=str(exc)),) + return () + + def render(self, ctx, inputs, cfg: SFXITriptychSequenceCfg): + del ctx, inputs, cfg + raise NotImplementedError("SFXITriptychSequencePlot uses 
run() to write an atomic file bundle.") + + def run(self, ctx, inputs, cfg: SFXITriptychSequenceCfg): + vec8: pd.DataFrame = inputs["vec8"] + assay: pd.DataFrame = inputs["assay"] + from reader.domains.logic.sfxi.triptych_sequence import ( # noqa: PLC0415 + render_sfxi_triptych_sequence_bundle, + ) + + paths = render_sfxi_triptych_sequence_bundle( + ctx=ctx, + vec8=vec8, + assay=assay, + config=cfg.model_dump(mode="python"), + ) + return {"artifacts": [str(path) for path in paths]} diff --git a/src/reader/protocols/_builtins_plate_reader_variants.py b/src/reader/protocols/_builtins_plate_reader_variants.py new file mode 100644 index 0000000..d608b6d --- /dev/null +++ b/src/reader/protocols/_builtins_plate_reader_variants.py @@ -0,0 +1,1099 @@ +from __future__ import annotations + +from collections.abc import Callable + +from reader.domains.plate_reader.analysis._retron_sponge_contract import DEFAULT_PRIMARY_POST_STRESS_HOURS +from reader.plugins.ingest.discovery_policy import DEFAULT_EXCLUDE, DEFAULT_INCLUDE + +from .compiler import ( + compile_plate_reader_retron_sponge_screen, + compile_plate_reader_single_reporter_screen, +) +from .model import ( + ProtocolArtifactSpec, + ProtocolConfigFieldSpec, + ProtocolControlRule, + ProtocolDescriptor, + ProtocolEffectSignSpec, + ProtocolExecutionPlan, + ProtocolFactorSpec, + ProtocolFigureSpec, + ProtocolMetricSpec, + ProtocolNotebookPolicy, + ProtocolPlotProfileSpec, + ProtocolPluginDefaultsSpec, + ProtocolRankingSpec, + ProtocolSemanticProfileOverride, + ProtocolSemanticProfileSpec, + ProtocolWindowSpec, + binding_value, +) + + +def build_plate_reader_variant_protocols( + *, + dual_reporter_protocol: ProtocolDescriptor, + field_builder: Callable[..., ProtocolConfigFieldSpec], +) -> tuple[ProtocolDescriptor, ProtocolDescriptor]: + field = field_builder + dual_strict_field = next(item for item in dual_reporter_protocol.analysis_fields if item.key == "strict") + dual_preprocessing_field = next( + item for item in 
dual_reporter_protocol.analysis_fields if item.key == "preprocessing" + ) + measurement_field = field( + "measurement", + "Primary matched-control measurement family.", + kind="string", + choices=("yfp_cfp", "single_reporter"), + default="yfp_cfp", + ) + + retron_sponge_figures = ( + ProtocolFigureSpec( + id="raw_kinetics", + kind="qc", + summary="Raw growth and reporter kinetics for early QC before matched-control normalization.", + primary=True, + ), + ProtocolFigureSpec( + id="support_kinetics", + kind="qc", + summary="Growth-normalized support ratios that contextualize broad physiology vs reporter-specific effects.", + primary=True, + ), + ProtocolFigureSpec( + id="control_burden_panel", + kind="qc", + summary="tetO-only burden panel over the primary readout and growth-rate traces across the full run.", + primary=True, + ), + ProtocolFigureSpec( + id="baseline_shifted_kinetics", + kind="kinetics", + summary="Baseline-shifted kinetics that isolate post-stress movement from pre-stress offsets.", + primary=True, + ), + ProtocolFigureSpec( + id="matched_control_kinetics", + kind="kinetics", + summary="Per-arm matched-control-normalized kinetics that show deviation from same-sensor tetO controls across the full run.", + primary=True, + ), + ProtocolFigureSpec( + id="induced_effect_kinetics", + kind="kinetics", + summary="Per-arm post-stress increment trajectories after matched-control normalization, paired with a compact expected-direction positive-area score.", + primary=True, + ), + ProtocolFigureSpec( + id="absolute_effect_kinetics", + kind="kinetics", + summary="Per-arm matched-tetO separation trajectories that preserve pre-stress preload differences, paired with a compact expected-direction total-area score.", + primary=True, + ), + ProtocolFigureSpec( + id="control_anchored_decomposition", + kind="summary", + summary="Per-pair sponge-versus-matched-tetO assay summary with relevant-stress traces, H2O context, pre-stress ΔR, and expected-direction state-area 
summaries.", + primary=True, + ), + ProtocolFigureSpec( + id="interaction_summary", + kind="summary", + summary="2x2 state summary over the matched-control-normalized endpoint or AUC surface.", + primary=True, + ), + ProtocolFigureSpec( + id="library_heatmaps", + kind="summary", + summary="Library-wide heatmaps over expected-direction total area, expected-direction post-stress area, and preload shift.", + primary=True, + ), + ProtocolFigureSpec( + id="stress_modulation_scores", + kind="summary", + summary="Stress-modulation score review across on-target sponge/sensor pairs.", + primary=True, + ), + ProtocolFigureSpec( + id="pareto_ranking", + kind="summary", + summary="Pareto-style ranking of expected-direction total area against burden and leakiness.", + primary=True, + ), + ) + + retron_sponge_plot_profiles = ( + ProtocolPlotProfileSpec( + id="screen_overview", + summary="Reader-first default set for matched-control sponge screens from QC through decision and ranking.", + figures=( + "raw_kinetics", + "support_kinetics", + "control_burden_panel", + "control_anchored_decomposition", + "absolute_effect_kinetics", + "induced_effect_kinetics", + "library_heatmaps", + "pareto_ranking", + ), + ), + ProtocolPlotProfileSpec( + id="kinetics_qc", + summary="QC-first review over raw, support, and tetO burden traces.", + figures=("raw_kinetics", "support_kinetics", "control_burden_panel"), + ), + ProtocolPlotProfileSpec( + id="analysis_review", + summary="Expanded semantic review over compiled sponge metrics, intermediate transforms, and rankings.", + figures=( + "baseline_shifted_kinetics", + "matched_control_kinetics", + "absolute_effect_kinetics", + "induced_effect_kinetics", + "control_anchored_decomposition", + "interaction_summary", + "library_heatmaps", + "stress_modulation_scores", + "pareto_ranking", + ), + ), + ) + + single_reporter_protocol = ProtocolDescriptor( + protocol="plate_reader/single_reporter_screen", + domain="plate_reader", + family="screen_analysis", 
+ summary=( + "Single-reporter plate-reader panel protocol with configurable reporter/normalizer channels and " + "compiled fold-change summaries." + ), + tags=("plate_reader", "single_reporter", "screen", "ratio", "fold_change"), + input_fields=dual_reporter_protocol.input_fields, + analysis_fields=( + field( + "reporter_channel", + "Primary reporter channel to normalize against the configured normalizer.", + kind="string", + default="RFP", + ), + field( + "normalizer_channel", + "Denominator channel used to normalize the reporter signal.", + kind="string", + default="OD600", + ), + field("include_fold_change", "Build the fold-change comparison table.", kind="bool", default=True), + dual_strict_field, + dual_preprocessing_field, + ), + factors=dual_reporter_protocol.factors, + semantic_profiles=( + ProtocolSemanticProfileSpec( + id="single_reporter_raw", + family="single_reporter_panel", + summary="Single-reporter panel semantics over a configured reporter/normalizer ratio.", + primary_metric="Reporter_Normalizer", + primary_readout="reporter / normalizer", + tags=("single_reporter", "ratio", "panel"), + ), + ProtocolSemanticProfileSpec( + id="single_reporter_fold_change", + family="single_reporter_panel", + summary="Single-reporter panel semantics with compiled fold-change summaries.", + primary_metric="log2FC", + primary_readout="reporter / normalizer", + tags=("single_reporter", "ratio", "panel", "fold_change"), + ), + ), + control_rules=(), + windows=(), + metrics=( + ProtocolMetricSpec( + id="Normalizer", + stage="raw", + summary="Raw configured normalizer trace.", + formula="configured_normalizer_channel", + profiles=("single_reporter_raw", "single_reporter_fold_change"), + ), + ProtocolMetricSpec( + id="Reporter", + stage="raw", + summary="Raw configured reporter trace.", + formula="configured_reporter_channel", + profiles=("single_reporter_raw", "single_reporter_fold_change"), + ), + ProtocolMetricSpec( + id="Reporter_Normalizer", + stage="support", + 
summary="Configured reporter normalized by the configured denominator channel.", + formula="configured_reporter_channel / configured_normalizer_channel", + depends_on=("Reporter", "Normalizer"), + value_space="linear_ratio", + unit="ratio", + comparable_group="primary_ratio_linear", + profiles=("single_reporter_raw", "single_reporter_fold_change"), + ), + ProtocolMetricSpec( + id="FC", + stage="summary", + summary="Nearest-time fold-change relative to the configured baseline treatment.", + formula="Reporter_Normalizer(t*) / baseline(Reporter_Normalizer)", + depends_on=("Reporter_Normalizer",), + value_space="fold_change_ratio", + unit="ratio", + comparable_group="fold_change_linear", + profiles=("single_reporter_fold_change",), + ), + ProtocolMetricSpec( + id="log2FC", + stage="summary", + summary="Log2 fold-change relative to the configured baseline treatment.", + formula="log2(FC)", + depends_on=("FC",), + value_space="log2_fold_change", + unit="log2_ratio", + comparable_group="fold_change_log2", + profiles=("single_reporter_fold_change",), + ), + ), + effect_signs=(), + figures=( + ProtocolFigureSpec( + id="raw_kinetics", + kind="qc", + summary="Raw kinetics view over the configured normalizer, reporter, and reporter ratio channels.", + primary=True, + ), + ProtocolFigureSpec( + id="endpoint_by_condition", + kind="summary", + summary="Endpoint comparison grouped by treatment/condition.", + primary=True, + ), + ProtocolFigureSpec( + id="endpoint_by_design", + kind="summary", + summary="Endpoint comparison grouped by construct/design.", + primary=True, + ), + ProtocolFigureSpec( + id="intensity_overview", + kind="kinetics", + summary="Combined time-series and endpoint view of the primary single-reporter ratio.", + primary=True, + ), + ProtocolFigureSpec( + id="value_distributions", + kind="qc", + summary="Distribution view of the primary single-reporter ratio.", + ), + ), + plot_profiles=( + ProtocolPlotProfileSpec( + id="screen_overview", + summary="Balanced 
default set for single-reporter plate-reader experiments.", + figures=("raw_kinetics", "endpoint_by_condition", "endpoint_by_design", "intensity_overview"), + ), + ProtocolPlotProfileSpec( + id="kinetics_qc", + summary="Kinetics-first QC view with raw traces and distributions.", + figures=("raw_kinetics", "value_distributions"), + ), + ), + default_plot_profile="screen_overview", + execution=ProtocolExecutionPlan( + notebook=ProtocolNotebookPolicy( + default_template="notebook/eda", + allowed_templates=("notebook/eda", "notebook/microplate", "notebook/basic"), + summary="Single-reporter plate-reader screens default to the EDA notebook with plot support.", + ), + plugin_defaults=( + ProtocolPluginDefaultsSpec( + plugin="ingest/synergy_h1", + summary=( + "Single-reporter screens inherit generic ingest settings here; " + "the compiler derives the required reporter/normalizer channels." + ), + with_={ + "mode": binding_value("ingest.mode", "auto"), + "channel_map": binding_value("ingest.channel_map", None), + "sheet_names": binding_value("ingest.sheet_names", None), + "add_sheet": binding_value("ingest.add_sheet", False), + "time_round_decimals": binding_value("ingest.time_round_decimals", 12), + "time_step_h": binding_value("ingest.time_step_h", None), + "auto_roots": binding_value("ingest.auto_roots", None), + "auto_include": binding_value("ingest.auto_include", list(DEFAULT_INCLUDE)), + "auto_exclude": binding_value("ingest.auto_exclude", list(DEFAULT_EXCLUDE)), + "auto_pick": binding_value("ingest.auto_pick", "single"), + "auto_recursive": binding_value("ingest.auto_recursive", False), + "add_source_column": binding_value("ingest.add_source_column", False), + "source_col": binding_value("ingest.source_col", "source_file"), + "print_summary": binding_value("ingest.print_summary", True), + }, + ), + ProtocolPluginDefaultsSpec( + plugin="transform/fold_change", + summary=( + "Single-reporter fold-change inherits generic comparison settings here; " + "the compiler sets 
the target to the configured reporter/normalizer ratio." + ), + with_={ + "report_times": binding_value("fold_change.report_times"), + "time_tolerance": binding_value("fold_change.time_tolerance", 0.51), + "agg": binding_value("fold_change.agg", "median"), + "treatment_column": binding_value("fold_change.treatment_column", "treatment"), + "group_by": binding_value("fold_change.group_by", ["design_id"]), + "use_global_baseline": binding_value("fold_change.use_global_baseline", False), + "global_baseline_value": binding_value("fold_change.global_baseline_value", None), + "overrides": binding_value("fold_change.overrides", []), + "fc_column": binding_value("fold_change.fc_column", "FC"), + "log2fc_column": binding_value("fold_change.log2fc_column", "log2FC"), + }, + ), + ), + compiler=compile_plate_reader_single_reporter_screen, + ), + ) + + retron_protocol = ProtocolDescriptor( + protocol="plate_reader/retron_sponge_screen", + domain="plate_reader", + family="matched_control_screen", + summary=( + "Plate-reader retron sponge screen with explicit matched-control kinetics, burden, leakiness, " + "and cross-sensor ranking summaries." 
+ ), + tags=("plate_reader", "retron", "sponge", "matched_control", "screen", "ratio"), + input_fields=dual_reporter_protocol.input_fields, + analysis_fields=( + measurement_field, + field( + "reporter_channel", + "Reporter channel used when measurement=single_reporter.", + kind="string", + default="RFP", + ), + field( + "growth_channel", + "Growth / biomass proxy channel used when measurement=single_reporter.", + kind="string", + default="OD600", + ), + field( + "include_fold_change", + "Optionally build the fold-change comparison table.", + kind="bool", + default=False, + ), + dual_strict_field, + dual_preprocessing_field, + field( + "semantic_metrics", + "Matched-control sponge-analysis settings.", + children=( + field( + "design_column", + "Design label column used to derive sensor/sponge identities.", + kind="string", + default="design_id_alias", + ), + field("state_column", "2x2 state label column.", kind="string", default="treatment_alias"), + field( + "raw_treatment_column", + "Raw treatment column used to recover the actual stress label.", + kind="string", + default="treatment", + ), + field( + "plate_column", + "Plate-normalization boundary column. 
Set to null when workbook sheets are acquisition segments of one plate; set it explicitly when sheets encode distinct biological plates.", + kind="string", + allow_none=True, + default=None, + ), + field("replicate_column", "Replicate-well identifier column.", kind="string", default="position"), + field("sensor_column", "Optional explicit sensor column.", kind="string", allow_none=True), + field("sponge_column", "Optional explicit sponge column.", kind="string", allow_none=True), + field("genotype_column", "Optional explicit genotype-id column.", kind="string", allow_none=True), + field( + "stress_condition_column", + "Optional explicit stress-condition column when raw treatment parsing is not canonical.", + kind="string", + allow_none=True, + ), + field( + "relevant_stress_column", + "Optional explicit boolean column marking relevant stress rows.", + kind="string", + allow_none=True, + ), + field( + "expected_sign_column", + "Optional explicit sign column (-1/+1) for cross-sensor ranking.", + kind="string", + allow_none=True, + ), + field( + "relevant_sensor_pair_column", + "Optional explicit boolean column marking on-target sensor/sponge pairs.", + kind="string", + allow_none=True, + ), + field( + "matched_control_group_column", + "Optional explicit grouping column for same-sensor tetO control matching.", + kind="string", + allow_none=True, + ), + field( + "sponge_family_size_column", + "Optional explicit sponge-family size/category column.", + kind="string", + allow_none=True, + ), + field( + "design_separator", + "Separator used when deriving sensor/sponge from the design label.", + kind="string", + default="/", + ), + field( + "control_name", + "Control sponge label used for same-sensor matching.", + kind="string", + default="tetO", + ), + field( + "no_stress_label", + "Canonical no-stress label for summary outputs.", + kind="string", + default="H2O", + ), + field( + "stress_time_zero_policy", + "How to resolve the stress-addition boundary on the assay 
clock.", + kind="string", + choices=("explicit", "largest_gap_midpoint"), + default="largest_gap_midpoint", + ), + field( + "stress_time_zero_h", + "Explicit stress-addition time in hours on the assay clock when policy=explicit.", + kind="number", + allow_none=True, + default=None, + ), + field( + "max_post_stress_hours", + "Optional cap on the primary post-stress window, measured in hours after stress addition, " + "before both AUC and endpoint summaries are computed.", + kind="number", + allow_none=True, + default=DEFAULT_PRIMARY_POST_STRESS_HOURS, + ), + field( + "pre_reads", + "Number of pre-stress reads used for the baseline window.", + kind="integer", + default=3, + ), + field("endpoint_reads", "Number of reads used in the endpoint window.", kind="integer", default=3), + field( + "states", + "Explicit 2x2 IPTG/stress state labels.", + children=( + field( + "uninduced_unstressed", + "Label for the H2O, -IPTG state.", + kind="string", + default="-IPTG/-stress", + ), + field( + "induced_unstressed", + "Label for the H2O, +IPTG state.", + kind="string", + default="+IPTG/-stress", + ), + field( + "uninduced_stressed", + "Label for the relevant-stress, -IPTG state.", + kind="string", + default="-IPTG/+stress", + ), + field( + "induced_stressed", + "Label for the relevant-stress, +IPTG state.", + kind="string", + default="+IPTG/+stress", + ), + ), + ), + field( + "plateau", + "Primary post-stress window policy.", + children=( + field( + "mode", + "Window selector: full trace after stress, or stop once the matched tetO control plateaus.", + kind="string", + choices=("full_post_stress", "control_plateau"), + default="full_post_stress", + ), + field( + "slope_tolerance", + "Absolute OD slope threshold used for plateau detection.", + kind="number", + default=0.01, + ), + field( + "min_intervals", + "Minimum number of trailing low-slope intervals before calling plateau.", + kind="integer", + default=2, + ), + ), + ), + field( + "relevant_stress_map", + "Sensor -> 
relevant stress label mapping.", + kind="mapping", + allow_unknown=True, + ), + field( + "sensor_target_map", + "Sensor -> cognate sponge motif list.", + kind="mapping", + allow_unknown=True, + ), + field( + "expected_sign_map", + "Optional explicit sign overrides for cross-sensor ranking.", + kind="mapping", + allow_unknown=True, + ), + ), + ), + ), + factors=( + ProtocolFactorSpec(name="sensor", role="sensor", summary="Reporter promoter / sensor arm."), + ProtocolFactorSpec(name="sponge", role="construct", summary="Real or tetO sponge arm."), + ProtocolFactorSpec(name="stress_condition", role="stress", summary="Relevant stress or H2O control."), + ProtocolFactorSpec(name="IPTG", role="induction", summary="IPTG-driven retron-expression state."), + ProtocolFactorSpec(name="replicate_id", role="replicate", summary="Replicate well identifier."), + ProtocolFactorSpec(name="time", role="time", summary="Time on the assay clock in hours."), + ProtocolFactorSpec(name="plate_id", role="plate", summary="Plate-local normalization boundary."), + ProtocolFactorSpec(name="genotype_id", role="construct", summary="Sensor/sponge genotype identifier."), + ), + semantic_profiles=( + ProtocolSemanticProfileSpec( + id="yfp_cfp", + family="matched_control_dual_reporter", + summary="Dual-reporter sponge-screen semantics on the log2(YFP/CFP) axis.", + primary_metric="O_abs_AUC", + primary_readout="log2(YFP / CFP)", + tags=("dual_reporter", "matched_control", "sponge"), + ), + ProtocolSemanticProfileSpec( + id="single_reporter", + family="matched_control_single_reporter", + summary="Single-reporter sponge-screen semantics on the log2(configured reporter / configured growth channel) axis.", + primary_metric="O_abs_AUC", + primary_readout="log2(configured_reporter_channel / configured_growth_channel)", + tags=("single_reporter", "matched_control", "sponge"), + ), + ), + control_rules=( + ProtocolControlRule( + id="matched_same_sensor_control", + summary=( + "Normalize every real sponge well 
to the same-sensor tetO control on the same plate, " + "matched by stress state, IPTG state, and timepoint." + ), + match_on=("sensor", "plate_id", "stress_condition", "IPTG", "time"), + control_selector="matched_tetO_group", + profiles=("yfp_cfp", "single_reporter"), + ), + ), + windows=( + ProtocolWindowSpec( + id="pre_stress_last_n", + summary="Use the last N reads before stress addition as the baseline window.", + anchor="stress_time_zero", + selector="last_n_before", + params={"n": 3}, + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolWindowSpec( + id="primary_post_stress", + summary="Use the post-stress kinetic window through the configured end-of-window policy.", + anchor="stress_time_zero", + selector="configured_post_stress_window", + params={"policy": "semantic_metrics.plateau"}, + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolWindowSpec( + id="endpoint_last_n", + summary="Use the last N reads inside the primary post-stress window as the endpoint window after any " + "configured post-stress time cap is applied.", + anchor="primary_post_stress", + selector="last_n_within", + params={"n": 3}, + profiles=("yfp_cfp", "single_reporter"), + ), + ), + metrics=( + ProtocolMetricSpec( + id="OD", + stage="raw", + summary="Raw OD600 trace.", + formula="OD600", + profile_overrides={ + "single_reporter": ProtocolSemanticProfileOverride( + summary="Raw configured growth-proxy trace.", + formula="configured_growth_channel", + ) + }, + ), + ProtocolMetricSpec(id="CFP", stage="raw", summary="Raw CFP trace.", formula="CFP", profiles=("yfp_cfp",)), + ProtocolMetricSpec(id="YFP", stage="raw", summary="Raw YFP trace.", formula="YFP", profiles=("yfp_cfp",)), + ProtocolMetricSpec( + id="Reporter", + stage="raw", + summary="Raw configured reporter trace.", + formula="configured_reporter_channel", + profiles=("single_reporter",), + ), + ProtocolMetricSpec( + id="YFP_OD", + stage="support", + summary="Supporting YFP per biomass proxy.", + formula="YFP / OD600", + 
depends_on=("YFP", "OD"), + value_space="linear_ratio", + unit="ratio", + comparable_group="support_ratio_linear", + profiles=("yfp_cfp",), + ), + ProtocolMetricSpec( + id="CFP_OD", + stage="support", + summary="Supporting CFP per biomass proxy.", + formula="CFP / OD600", + depends_on=("CFP", "OD"), + value_space="linear_ratio", + unit="ratio", + comparable_group="support_ratio_linear", + profiles=("yfp_cfp",), + ), + ProtocolMetricSpec( + id="Reporter_OD", + stage="support", + summary="Supporting configured reporter per biomass proxy.", + formula="configured_reporter_channel / configured_growth_channel", + depends_on=("Reporter", "OD"), + value_space="linear_ratio", + unit="ratio", + comparable_group="support_ratio_linear", + profiles=("single_reporter",), + ), + ProtocolMetricSpec( + id="R", + stage="derived", + summary="Primary within-well log2 ratio.", + formula="log2(YFP / CFP)", + depends_on=("YFP", "CFP"), + value_space="log2_ratio", + unit="log2_ratio", + comparable_group="primary_ratio_log2", + profile_overrides={ + "single_reporter": ProtocolSemanticProfileOverride( + summary="Primary within-well single-reporter log2 ratio.", + formula="log2(configured_reporter_channel / configured_growth_channel)", + depends_on=("Reporter", "OD", "Reporter_OD"), + value_space="log2_ratio", + unit="log2_ratio", + comparable_group="primary_ratio_log2", + ) + }, + ), + ProtocolMetricSpec( + id="R_pre", + stage="summary", + summary="Mean of the primary ratio in the pre-stress window.", + formula="mean(R over pre_stress_last_n)", + depends_on=("R", "pre_stress_last_n"), + value_space="log2_ratio", + unit="log2_ratio", + comparable_group="primary_ratio_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="P_pre", + stage="summary", + summary="Pre-stress matched-control preload shift between +IPTG and -IPTG states.", + formula="mean(R_pre - R_pre_tetO,matched)(+IPTG) - mean(R_pre - R_pre_tetO,matched)(-IPTG)", + depends_on=("R_pre", 
"matched_same_sensor_control"), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="mu", + stage="support", + summary="Approximate growth-rate trace from the slope of log(OD600).", + formula="d(log(OD600)) / dt", + depends_on=("OD",), + profile_overrides={ + "single_reporter": ProtocolSemanticProfileOverride( + summary="Approximate growth-rate trace from the slope of log(configured growth channel).", + formula="d(log(configured_growth_channel)) / dt", + ) + }, + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="B", + stage="derived", + summary="Baseline-shifted reporter ratio relative to the well's own pre-stress state.", + formula="R(t) - R_pre", + depends_on=("R", "R_pre"), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="C", + stage="comparison", + summary="Matched-control-normalized sponge deviation.", + formula="B(t) - mean(B matched_same_sensor_control at t)", + depends_on=("B", "matched_same_sensor_control"), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="C_AUC", + stage="summary", + summary="AUC of the matched-control-normalized trace over the primary post-stress window.", + formula="AUC(C over primary_post_stress)", + depends_on=("C", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="C_END", + stage="summary", + summary="Endpoint mean of the matched-control-normalized trace.", + formula="mean(C over endpoint_last_n)", + depends_on=("C", "endpoint_last_n"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D", + stage="comparison", + summary="IPTG-state effect after 
matched-control normalization.", + formula="mean(C +IPTG) - mean(C -IPTG)", + depends_on=("C",), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_AUC", + stage="summary", + summary="AUC of the IPTG-state effect.", + formula="AUC(D over primary_post_stress)", + depends_on=("D", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_END", + stage="summary", + summary="Endpoint mean of the IPTG-state effect.", + formula="mean(D over endpoint_last_n)", + depends_on=("D", "endpoint_last_n"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_abs", + stage="comparison", + summary="Absolute matched-control IPTG-state effect that retains pre-stress preload differences.", + formula="mean(R - R_tetO,matched)(+IPTG) - mean(R - R_tetO,matched)(-IPTG)", + depends_on=("R", "matched_same_sensor_control"), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_abs_AUC", + stage="summary", + summary="AUC of the absolute matched-control IPTG-state effect.", + formula="AUC(D_abs over primary_post_stress)", + depends_on=("D_abs", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_abs_END", + stage="summary", + summary="Endpoint mean of the absolute matched-control IPTG-state effect.", + formula="mean(D_abs over endpoint_last_n)", + depends_on=("D_abs", "endpoint_last_n"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_growth", + stage="burden", + summary="Construct-specific growth burden after same-sensor tetO subtraction.", + formula="mean(mu - mu_tetO,matched)(+IPTG) - mean(mu - mu_tetO,matched)(-IPTG)", + depends_on=("mu", "matched_same_sensor_control"), + 
profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_growth_AUC", + stage="burden", + summary="AUC of construct-specific growth burden over the primary window.", + formula="AUC(D_growth over primary_post_stress)", + depends_on=("D_growth", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="D_growth_END", + stage="burden", + summary="Endpoint mean of construct-specific growth burden.", + formula="mean(D_growth over endpoint_last_n)", + depends_on=("D_growth", "endpoint_last_n"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="M", + stage="comparison", + summary="Stress modulation of the IPTG-state effect after stress addition.", + formula="D(relevant_stress) - D(H2O)", + depends_on=("D",), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="M_AUC", + stage="summary", + summary="AUC of stress modulation over the post-stress window.", + formula="AUC(M over primary_post_stress)", + depends_on=("M", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="M_END", + stage="summary", + summary="Endpoint mean of the stress modulation trace.", + formula="mean(M over endpoint_last_n)", + depends_on=("M", "endpoint_last_n"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="O", + stage="ranking", + summary="Expected-direction-aligned IPTG-state effect.", + formula="expected_decoy_sign * D", + depends_on=("D",), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="O_AUC", + stage="ranking", + summary="Positive-area integral of the expected-direction-aligned IPTG-state effect.", + formula="∫ max(O, 0) dt over primary_post_stress", + depends_on=("O", 
"primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="O_abs", + stage="ranking", + summary="Expected-direction-aligned absolute matched-control IPTG-state effect.", + formula="expected_decoy_sign * D_abs", + depends_on=("D_abs",), + value_space="delta_log2_ratio", + unit="log2_ratio_delta", + comparable_group="response_delta_log2", + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="O_abs_AUC", + stage="ranking", + summary="Positive-area integral of the expected-direction-aligned absolute matched-control IPTG-state effect.", + formula="∫ max(O_abs, 0) dt over primary_post_stress", + depends_on=("O_abs", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="G_sensor", + stage="summary", + summary="Native tetO sensor response used for cross-sensor scaling.", + formula="AUC(mean(B tetO,-IPTG,relevant stress) - mean(B tetO,-IPTG,H2O))", + depends_on=("B", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="S_AUC", + stage="ranking", + summary="Cross-sensor scaled expected-direction post-stress area relative to the native sensor response.", + formula="O_AUC / abs(G_sensor)", + depends_on=("O_AUC", "G_sensor"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="S_abs_AUC", + stage="ranking", + summary="Cross-sensor scaled expected-direction total area relative to the native sensor response.", + formula="O_abs_AUC / abs(G_sensor)", + depends_on=("O_abs_AUC", "G_sensor"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="L_pre", + stage="leakiness", + summary="Pre-stress leakiness relative to the matched control.", + formula="R_pre(real,-IPTG) - mean(R_pre tetO,-IPTG)", + depends_on=("R_pre", "matched_same_sensor_control"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="L_post_AUC", + stage="leakiness", + summary="Uninduced 
post-stress leakiness over the primary window.", + formula="AUC(mean(C -IPTG))", + depends_on=("C", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="T_ratio_AUC", + stage="burden", + summary="tetO ratio burden from the +IPTG versus -IPTG state contrast.", + formula="AUC(mean(B tetO,+IPTG) - mean(B tetO,-IPTG))", + depends_on=("B", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="T_growth_AUC", + stage="burden", + summary="tetO growth burden from the +IPTG versus -IPTG state contrast.", + formula="AUC(mean(mu tetO,+IPTG) - mean(mu tetO,-IPTG))", + depends_on=("mu", "primary_post_stress"), + profiles=("yfp_cfp", "single_reporter"), + ), + ProtocolMetricSpec( + id="T_finalOD", + stage="burden", + summary="Endpoint OD burden for the tetO control.", + formula="mean(OD tetO,+IPTG,end) - mean(OD tetO,-IPTG,end)", + depends_on=("OD", "endpoint_last_n"), + profiles=("yfp_cfp", "single_reporter"), + ), + ), + effect_signs=( + ProtocolEffectSignSpec( + target="spyP", + expected_sign="negative", + summary="Effective decoys reduce the spyP ratio after sign correction.", + ), + ProtocolEffectSignSpec( + target="sulAp", + expected_sign="positive", + summary="Effective LexA decoys increase the sulAp ratio.", + ), + ProtocolEffectSignSpec( + target="soxSp", + expected_sign="negative", + summary="Effective SoxR/SoxS decoys reduce the soxSp ratio after sign correction.", + ), + ), + figures=retron_sponge_figures, + plot_profiles=retron_sponge_plot_profiles, + default_plot_profile="screen_overview", + artifacts=( + ProtocolArtifactSpec( + id="semantic_trace_table", + summary="CSV export of the matched-control sponge trace table.", + default=True, + ), + ProtocolArtifactSpec( + id="semantic_summary_table", + summary="CSV export of the matched-control sponge summary table.", + default=True, + ), + ), + ranking=ProtocolRankingSpec( + primary_metric="O_abs_AUC", + 
direction="higher_is_better", + penalties=("T_ratio_AUC", "T_finalOD", "L_pre", "L_post_AUC"), + supporting_metrics=("S_abs_AUC", "P_pre", "O_AUC", "M_AUC"), + summary="Rank hits by expected-direction total area, then inspect preload, expected-direction post-stress area, burden, and leakiness.", + profiles=("yfp_cfp", "single_reporter"), + ), + execution=ProtocolExecutionPlan( + notebook=ProtocolNotebookPolicy( + default_template="notebook/retron_sponge", + allowed_templates=("notebook/retron_sponge", "notebook/eda", "notebook/microplate", "notebook/basic"), + summary=( + "Retron sponge screens default to the protocol-specific review notebook and keep the generic " + "record explorers available as fallbacks." + ), + ), + plugin_defaults=( + ProtocolPluginDefaultsSpec( + plugin="ingest/synergy_h1", + summary=( + "Retron sponge screens inherit generic ingest settings here; " + "the compiler derives the required measurement-family channels." + ), + with_={ + "mode": binding_value("ingest.mode", "auto"), + "channel_map": binding_value("ingest.channel_map", None), + "sheet_names": binding_value("ingest.sheet_names", None), + "add_sheet": binding_value("ingest.add_sheet", False), + "time_round_decimals": binding_value("ingest.time_round_decimals", 12), + "time_step_h": binding_value("ingest.time_step_h", None), + "auto_roots": binding_value("ingest.auto_roots", None), + "auto_include": binding_value("ingest.auto_include", list(DEFAULT_INCLUDE)), + "auto_exclude": binding_value("ingest.auto_exclude", list(DEFAULT_EXCLUDE)), + "auto_pick": binding_value("ingest.auto_pick", "single"), + "auto_recursive": binding_value("ingest.auto_recursive", False), + "add_source_column": binding_value("ingest.add_source_column", False), + "source_col": binding_value("ingest.source_col", "source_file"), + "print_summary": binding_value("ingest.print_summary", True), + }, + ), + ProtocolPluginDefaultsSpec( + plugin="transform/fold_change", + summary=( + "Retron sponge fold-change inherits 
generic comparison settings here; " + "the compiler sets the target to the compiled primary ratio." + ), + with_={ + "report_times": binding_value("fold_change.report_times"), + "time_tolerance": binding_value("fold_change.time_tolerance", 0.51), + "agg": binding_value("fold_change.agg", "median"), + "treatment_column": binding_value("fold_change.treatment_column", "treatment"), + "group_by": binding_value("fold_change.group_by", ["design_id"]), + "use_global_baseline": binding_value("fold_change.use_global_baseline", False), + "global_baseline_value": binding_value("fold_change.global_baseline_value", None), + "overrides": binding_value("fold_change.overrides", []), + "fc_column": binding_value("fold_change.fc_column", "FC"), + "log2fc_column": binding_value("fold_change.log2fc_column", "log2FC"), + }, + ), + ), + compiler=compile_plate_reader_retron_sponge_screen, + ), + ) + + return single_reporter_protocol, retron_protocol diff --git a/src/reader/protocols/builtins.py b/src/reader/protocols/builtins.py index 82722b7..97f59a9 100644 --- a/src/reader/protocols/builtins.py +++ b/src/reader/protocols/builtins.py @@ -2,16 +2,14 @@ from functools import cache -from reader.domains.plate_reader.analysis._retron_sponge_contract import DEFAULT_PRIMARY_POST_STRESS_HOURS from reader.plugins.ingest.discovery_policy import DEFAULT_EXCLUDE, DEFAULT_INCLUDE +from ._builtins_plate_reader_variants import build_plate_reader_variant_protocols from .compiler import ( compile_cytometry_flow_panel, compile_generic_protocol, compile_logic_sfxi_screen, compile_plate_reader_dual_reporter_screen, - compile_plate_reader_retron_sponge_screen, - compile_plate_reader_single_reporter_screen, ) from .model import ( ProtocolArtifactSpec, @@ -19,7 +17,6 @@ ProtocolConfigFieldSpec, ProtocolControlRule, ProtocolDescriptor, - ProtocolEffectSignSpec, ProtocolExecutionPlan, ProtocolFactorSpec, ProtocolFigureSpec, @@ -28,7 +25,6 @@ ProtocolPlotProfileSpec, ProtocolPluginDefaultsSpec, 
ProtocolRankingSpec, - ProtocolSemanticProfileOverride, ProtocolSemanticProfileSpec, ProtocolWindowSpec, binding_value, @@ -100,6 +96,7 @@ def _field( "notebook/retron_sponge_aggregate", "notebook/eda", "notebook/microplate", + "notebook/dual_reporter_triptych", "notebook/cytometry", "notebook/sfxi_eda", ), @@ -537,7 +534,12 @@ def _field( execution=ProtocolExecutionPlan( notebook=ProtocolNotebookPolicy( default_template="notebook/eda", - allowed_templates=("notebook/eda", "notebook/microplate", "notebook/basic"), + allowed_templates=( + "notebook/eda", + "notebook/dual_reporter_triptych", + "notebook/microplate", + "notebook/basic", + ), summary="Dual-reporter plate-reader screens default to the EDA notebook with plot support.", ), plugin_defaults=( @@ -768,6 +770,76 @@ def _field( _field("include_vec8", "Build the vec8 summary table.", kind="bool", default=True), _field("include_export", "Emit the workbook export when vec8 is present.", kind="bool", default=True), _field("strict", "Treat runtime contract mismatches as hard errors.", kind="bool", default=True), + _field( + "sfxi_objective", + "OPAL-compatible SFXI objective settings for setpoint scoring plots.", + children=( + _field( + "setpoints", + "Named SFXI setpoint vectors in 00/10/01/11 order.", + kind="mapping", + allow_unknown=True, + default={"and": [0.0, 0.0, 0.0, 1.0]}, + ), + _field( + "scaling", + "SFXI effect scaling controls.", + children=( + _field("percentile", "Effect scaling percentile.", kind="integer", default=95), + _field( + "min_n", "Minimum sample count before percentile scaling.", kind="integer", default=5 + ), + _field("eps", "Small denominator floor used during scaling.", kind="number", default=1e-8), + ), + ), + _field( + "exponents", + "SFXI objective exponents.", + children=( + _field( + "logic_exponent_beta", + "Exponent applied to logic_fidelity.", + kind="number", + default=1.0, + ), + _field( + "intensity_exponent_gamma", + "Exponent applied to effect_scaled.", + kind="number", 
+ default=1.0, + ), + ), + ), + _field( + "intensity_log2_offset_delta", + "Log2 offset applied before raw effect scaling.", + kind="number", + default=0.0, + ), + ), + ), + _field( + "sfxi_triptych_sequence", + "SFXI triptych sequence bundle settings for kinetics, snapshot, and sequence panels.", + allow_unknown=True, + children=( + _field( + "sequence_source", + "Public dnadesign USR dataset and overlay settings for sequence panels.", + allow_unknown=True, + ), + _field("bundle_id", "Stable bundle id used for generated file names.", kind="string"), + _field("snapshot_target_time_h", "Visual snapshot target time in hours.", kind="number"), + _field("induction_time_h", "Induction marker time in hours.", kind="number", allow_none=True), + _field( + "sequence_panel", + "Public dnadesign BaseRender sequence-panel profile and sizing settings.", + allow_unknown=True, + ), + _field("movie_enabled", "Emit an MP4 review movie in the bundle.", kind="bool", default=False), + _field("movie_fps", "Review movie frame cadence.", kind="number", default=0.85), + ), + ), _field( "preprocessing", "Pre-ingest cleanup policy for blanks and overflow.", @@ -896,6 +968,16 @@ def _field( kind="summary", summary="Logic symmetry geometry over the configured response channel.", ), + ProtocolFigureSpec( + id="sfxi_setpoint_scatter", + kind="summary", + summary="OPAL-compatible SFXI objective scatter over logic_fidelity and effect_scaled by setpoint.", + ), + ProtocolFigureSpec( + id="sfxi_triptych_sequence", + kind="summary", + summary="Bundle each SFXI promoter's kinetics, snapshot, and sequence architecture panels.", + ), ), plot_profiles=( ProtocolPlotProfileSpec( @@ -908,6 +990,16 @@ def _field( summary="Geometry-only logic symmetry review.", figures=("logic_symmetry",), ), + ProtocolPlotProfileSpec( + id="sfxi_objective", + summary="SFXI objective setpoint scatter review.", + figures=("sfxi_setpoint_scatter",), + ), + ProtocolPlotProfileSpec( + id="sfxi_sequence_review", + summary="SFXI 
promoter triptych sequence review.", + figures=("sfxi_triptych_sequence",), + ), ProtocolPlotProfileSpec( id="logic_full", summary="Full logic review with kinetics and symmetry geometry.", @@ -917,6 +1009,8 @@ def _field( "endpoint_by_design", "intensity_overview", "logic_symmetry", + "sfxi_setpoint_scatter", + "sfxi_triptych_sequence", ), ), ), @@ -935,7 +1029,12 @@ def _field( execution=ProtocolExecutionPlan( notebook=ProtocolNotebookPolicy( default_template="notebook/sfxi_eda", - allowed_templates=("notebook/sfxi_eda", "notebook/eda", "notebook/basic"), + allowed_templates=( + "notebook/sfxi_eda", + "notebook/dual_reporter_triptych", + "notebook/eda", + "notebook/basic", + ), summary="SFXI logic screens default to the vec8-aware notebook scaffold.", ), plugin_defaults=( @@ -1151,1054 +1250,9 @@ def _field( _DUAL_REPORTER_PROTOCOL = next( item for item in BUILTIN_PROTOCOLS if item.protocol == "plate_reader/dual_reporter_screen" ) -_DUAL_STRICT_FIELD = next(item for item in _DUAL_REPORTER_PROTOCOL.analysis_fields if item.key == "strict") -_DUAL_PREPROCESSING_FIELD = next( - item for item in _DUAL_REPORTER_PROTOCOL.analysis_fields if item.key == "preprocessing" -) -_PLATE_READER_MEASUREMENT_FIELD = _field( - "measurement", - "Primary matched-control measurement family.", - kind="string", - choices=("yfp_cfp", "single_reporter"), - default="yfp_cfp", -) - -_RETRON_SPONGE_FIGURES = ( - ProtocolFigureSpec( - id="raw_kinetics", - kind="qc", - summary="Raw growth and reporter kinetics for early QC before matched-control normalization.", - primary=True, - ), - ProtocolFigureSpec( - id="support_kinetics", - kind="qc", - summary="Growth-normalized support ratios that contextualize broad physiology vs reporter-specific effects.", - primary=True, - ), - ProtocolFigureSpec( - id="control_burden_panel", - kind="qc", - summary="tetO-only burden panel over the primary readout and growth-rate traces across the full run.", - primary=True, - ), - ProtocolFigureSpec( - 
id="baseline_shifted_kinetics", - kind="kinetics", - summary="Baseline-shifted kinetics that isolate post-stress movement from pre-stress offsets.", - primary=True, - ), - ProtocolFigureSpec( - id="matched_control_kinetics", - kind="kinetics", - summary="Per-arm matched-control-normalized kinetics that show deviation from same-sensor tetO controls across the full run.", - primary=True, - ), - ProtocolFigureSpec( - id="induced_effect_kinetics", - kind="kinetics", - summary="Per-arm post-stress increment trajectories after matched-control normalization, paired with a compact expected-direction positive-area score.", - primary=True, - ), - ProtocolFigureSpec( - id="absolute_effect_kinetics", - kind="kinetics", - summary="Per-arm matched-tetO separation trajectories that preserve pre-stress preload differences, paired with a compact expected-direction total-area score.", - primary=True, - ), - ProtocolFigureSpec( - id="control_anchored_decomposition", - kind="summary", - summary="Per-pair sponge-versus-matched-tetO assay summary with relevant-stress traces, H2O context, pre-stress ΔR, and expected-direction state-area summaries.", - primary=True, - ), - ProtocolFigureSpec( - id="interaction_summary", - kind="summary", - summary="2x2 state summary over the matched-control-normalized endpoint or AUC surface.", - primary=True, - ), - ProtocolFigureSpec( - id="library_heatmaps", - kind="summary", - summary="Library-wide heatmaps over expected-direction total area, expected-direction post-stress area, and preload shift.", - primary=True, - ), - ProtocolFigureSpec( - id="stress_modulation_scores", - kind="summary", - summary="Stress-modulation score review across on-target sponge/sensor pairs.", - primary=True, - ), - ProtocolFigureSpec( - id="pareto_ranking", - kind="summary", - summary="Pareto-style ranking of expected-direction total area against burden and leakiness.", - primary=True, - ), -) - -_RETRON_SPONGE_PLOT_PROFILES = ( - ProtocolPlotProfileSpec( - 
id="screen_overview", - summary="Reader-first default set for matched-control sponge screens from QC through decision and ranking.", - figures=( - "raw_kinetics", - "support_kinetics", - "control_burden_panel", - "control_anchored_decomposition", - "absolute_effect_kinetics", - "induced_effect_kinetics", - "library_heatmaps", - "pareto_ranking", - ), - ), - ProtocolPlotProfileSpec( - id="kinetics_qc", - summary="QC-first review over raw, support, and tetO burden traces.", - figures=("raw_kinetics", "support_kinetics", "control_burden_panel"), - ), - ProtocolPlotProfileSpec( - id="analysis_review", - summary="Expanded semantic review over compiled sponge metrics, intermediate transforms, and rankings.", - figures=( - "baseline_shifted_kinetics", - "matched_control_kinetics", - "absolute_effect_kinetics", - "induced_effect_kinetics", - "control_anchored_decomposition", - "interaction_summary", - "library_heatmaps", - "stress_modulation_scores", - "pareto_ranking", - ), - ), -) - -_PLATE_READER_SINGLE_REPORTER_PROTOCOL = ProtocolDescriptor( - protocol="plate_reader/single_reporter_screen", - domain="plate_reader", - family="screen_analysis", - summary=( - "Single-reporter plate-reader panel protocol with configurable reporter/normalizer channels and " - "compiled fold-change summaries." 
- ), - tags=("plate_reader", "single_reporter", "screen", "ratio", "fold_change"), - input_fields=_DUAL_REPORTER_PROTOCOL.input_fields, - analysis_fields=( - _field( - "reporter_channel", - "Primary reporter channel to normalize against the configured normalizer.", - kind="string", - default="RFP", - ), - _field( - "normalizer_channel", - "Denominator channel used to normalize the reporter signal.", - kind="string", - default="OD600", - ), - _field("include_fold_change", "Build the fold-change comparison table.", kind="bool", default=True), - _DUAL_STRICT_FIELD, - _DUAL_PREPROCESSING_FIELD, - ), - factors=_DUAL_REPORTER_PROTOCOL.factors, - semantic_profiles=( - ProtocolSemanticProfileSpec( - id="single_reporter_raw", - family="single_reporter_panel", - summary="Single-reporter panel semantics over a configured reporter/normalizer ratio.", - primary_metric="Reporter_Normalizer", - primary_readout="reporter / normalizer", - tags=("single_reporter", "ratio", "panel"), - ), - ProtocolSemanticProfileSpec( - id="single_reporter_fold_change", - family="single_reporter_panel", - summary="Single-reporter panel semantics with compiled fold-change summaries.", - primary_metric="log2FC", - primary_readout="reporter / normalizer", - tags=("single_reporter", "ratio", "panel", "fold_change"), - ), - ), - control_rules=(), - windows=(), - metrics=( - ProtocolMetricSpec( - id="Normalizer", - stage="raw", - summary="Raw configured normalizer trace.", - formula="configured_normalizer_channel", - profiles=("single_reporter_raw", "single_reporter_fold_change"), - ), - ProtocolMetricSpec( - id="Reporter", - stage="raw", - summary="Raw configured reporter trace.", - formula="configured_reporter_channel", - profiles=("single_reporter_raw", "single_reporter_fold_change"), - ), - ProtocolMetricSpec( - id="Reporter_Normalizer", - stage="support", - summary="Configured reporter normalized by the configured denominator channel.", - formula="configured_reporter_channel / 
configured_normalizer_channel", - depends_on=("Reporter", "Normalizer"), - value_space="linear_ratio", - unit="ratio", - comparable_group="primary_ratio_linear", - profiles=("single_reporter_raw", "single_reporter_fold_change"), - ), - ProtocolMetricSpec( - id="FC", - stage="summary", - summary="Nearest-time fold-change relative to the configured baseline treatment.", - formula="Reporter_Normalizer(t*) / baseline(Reporter_Normalizer)", - depends_on=("Reporter_Normalizer",), - value_space="fold_change_ratio", - unit="ratio", - comparable_group="fold_change_linear", - profiles=("single_reporter_fold_change",), - ), - ProtocolMetricSpec( - id="log2FC", - stage="summary", - summary="Log2 fold-change relative to the configured baseline treatment.", - formula="log2(FC)", - depends_on=("FC",), - value_space="log2_fold_change", - unit="log2_ratio", - comparable_group="fold_change_log2", - profiles=("single_reporter_fold_change",), - ), - ), - effect_signs=(), - figures=( - ProtocolFigureSpec( - id="raw_kinetics", - kind="qc", - summary="Raw kinetics view over the configured normalizer, reporter, and reporter ratio channels.", - primary=True, - ), - ProtocolFigureSpec( - id="endpoint_by_condition", - kind="summary", - summary="Endpoint comparison grouped by treatment/condition.", - primary=True, - ), - ProtocolFigureSpec( - id="endpoint_by_design", - kind="summary", - summary="Endpoint comparison grouped by construct/design.", - primary=True, - ), - ProtocolFigureSpec( - id="intensity_overview", - kind="kinetics", - summary="Combined time-series and endpoint view of the primary single-reporter ratio.", - primary=True, - ), - ProtocolFigureSpec( - id="value_distributions", - kind="qc", - summary="Distribution view of the primary single-reporter ratio.", - ), - ), - plot_profiles=( - ProtocolPlotProfileSpec( - id="screen_overview", - summary="Balanced default set for single-reporter plate-reader experiments.", - figures=("raw_kinetics", "endpoint_by_condition", 
"endpoint_by_design", "intensity_overview"), - ), - ProtocolPlotProfileSpec( - id="kinetics_qc", - summary="Kinetics-first QC view with raw traces and distributions.", - figures=("raw_kinetics", "value_distributions"), - ), - ), - default_plot_profile="screen_overview", - execution=ProtocolExecutionPlan( - notebook=ProtocolNotebookPolicy( - default_template="notebook/eda", - allowed_templates=("notebook/eda", "notebook/microplate", "notebook/basic"), - summary="Single-reporter plate-reader screens default to the EDA notebook with plot support.", - ), - plugin_defaults=( - ProtocolPluginDefaultsSpec( - plugin="ingest/synergy_h1", - summary=( - "Single-reporter screens inherit generic ingest settings here; " - "the compiler derives the required reporter/normalizer channels." - ), - with_={ - "mode": binding_value("ingest.mode", "auto"), - "channel_map": binding_value("ingest.channel_map", None), - "sheet_names": binding_value("ingest.sheet_names", None), - "add_sheet": binding_value("ingest.add_sheet", False), - "time_round_decimals": binding_value("ingest.time_round_decimals", 12), - "time_step_h": binding_value("ingest.time_step_h", None), - "auto_roots": binding_value("ingest.auto_roots", None), - "auto_include": binding_value("ingest.auto_include", list(DEFAULT_INCLUDE)), - "auto_exclude": binding_value("ingest.auto_exclude", list(DEFAULT_EXCLUDE)), - "auto_pick": binding_value("ingest.auto_pick", "single"), - "auto_recursive": binding_value("ingest.auto_recursive", False), - "add_source_column": binding_value("ingest.add_source_column", False), - "source_col": binding_value("ingest.source_col", "source_file"), - "print_summary": binding_value("ingest.print_summary", True), - }, - ), - ProtocolPluginDefaultsSpec( - plugin="transform/fold_change", - summary=( - "Single-reporter fold-change inherits generic comparison settings here; " - "the compiler sets the target to the configured reporter/normalizer ratio." 
- ), - with_={ - "report_times": binding_value("fold_change.report_times"), - "time_tolerance": binding_value("fold_change.time_tolerance", 0.51), - "agg": binding_value("fold_change.agg", "median"), - "treatment_column": binding_value("fold_change.treatment_column", "treatment"), - "group_by": binding_value("fold_change.group_by", ["design_id"]), - "use_global_baseline": binding_value("fold_change.use_global_baseline", False), - "global_baseline_value": binding_value("fold_change.global_baseline_value", None), - "overrides": binding_value("fold_change.overrides", []), - "fc_column": binding_value("fold_change.fc_column", "FC"), - "log2fc_column": binding_value("fold_change.log2fc_column", "log2FC"), - }, - ), - ), - compiler=compile_plate_reader_single_reporter_screen, - ), -) - -_PLATE_READER_RETRON_SPONGE_PROTOCOL = ProtocolDescriptor( - protocol="plate_reader/retron_sponge_screen", - domain="plate_reader", - family="matched_control_screen", - summary=( - "Plate-reader retron sponge screen with explicit matched-control kinetics, burden, leakiness, " - "and cross-sensor ranking summaries." 
- ), - tags=("plate_reader", "retron", "sponge", "matched_control", "screen", "ratio"), - input_fields=_DUAL_REPORTER_PROTOCOL.input_fields, - analysis_fields=( - _PLATE_READER_MEASUREMENT_FIELD, - _field( - "reporter_channel", - "Reporter channel used when measurement=single_reporter.", - kind="string", - default="RFP", - ), - _field( - "growth_channel", - "Growth / biomass proxy channel used when measurement=single_reporter.", - kind="string", - default="OD600", - ), - _field("include_fold_change", "Optionally build the fold-change comparison table.", kind="bool", default=False), - _DUAL_STRICT_FIELD, - _DUAL_PREPROCESSING_FIELD, - _field( - "semantic_metrics", - "Matched-control sponge-analysis settings.", - children=( - _field( - "design_column", - "Design label column used to derive sensor/sponge identities.", - kind="string", - default="design_id_alias", - ), - _field("state_column", "2x2 state label column.", kind="string", default="treatment_alias"), - _field( - "raw_treatment_column", - "Raw treatment column used to recover the actual stress label.", - kind="string", - default="treatment", - ), - _field( - "plate_column", - "Plate-normalization boundary column. 
Set to null when workbook sheets are acquisition segments of one plate; set it explicitly when sheets encode distinct biological plates.", - kind="string", - allow_none=True, - default=None, - ), - _field("replicate_column", "Replicate-well identifier column.", kind="string", default="position"), - _field("sensor_column", "Optional explicit sensor column.", kind="string", allow_none=True), - _field("sponge_column", "Optional explicit sponge column.", kind="string", allow_none=True), - _field("genotype_column", "Optional explicit genotype-id column.", kind="string", allow_none=True), - _field( - "stress_condition_column", - "Optional explicit stress-condition column when raw treatment parsing is not canonical.", - kind="string", - allow_none=True, - ), - _field( - "relevant_stress_column", - "Optional explicit boolean column marking relevant stress rows.", - kind="string", - allow_none=True, - ), - _field( - "expected_sign_column", - "Optional explicit sign column (-1/+1) for cross-sensor ranking.", - kind="string", - allow_none=True, - ), - _field( - "relevant_sensor_pair_column", - "Optional explicit boolean column marking on-target sensor/sponge pairs.", - kind="string", - allow_none=True, - ), - _field( - "matched_control_group_column", - "Optional explicit grouping column for same-sensor tetO control matching.", - kind="string", - allow_none=True, - ), - _field( - "sponge_family_size_column", - "Optional explicit sponge-family size/category column.", - kind="string", - allow_none=True, - ), - _field( - "design_separator", - "Separator used when deriving sensor/sponge from the design label.", - kind="string", - default="/", - ), - _field( - "control_name", "Control sponge label used for same-sensor matching.", kind="string", default="tetO" - ), - _field( - "no_stress_label", "Canonical no-stress label for summary outputs.", kind="string", default="H2O" - ), - _field( - "stress_time_zero_policy", - "How to resolve the stress-addition boundary on the assay 
clock.", - kind="string", - choices=("explicit", "largest_gap_midpoint"), - default="largest_gap_midpoint", - ), - _field( - "stress_time_zero_h", - "Explicit stress-addition time in hours on the assay clock when policy=explicit.", - kind="number", - allow_none=True, - default=None, - ), - _field( - "max_post_stress_hours", - "Optional cap on the primary post-stress window, measured in hours after stress addition, " - "before both AUC and endpoint summaries are computed.", - kind="number", - allow_none=True, - default=DEFAULT_PRIMARY_POST_STRESS_HOURS, - ), - _field( - "pre_reads", "Number of pre-stress reads used for the baseline window.", kind="integer", default=3 - ), - _field("endpoint_reads", "Number of reads used in the endpoint window.", kind="integer", default=3), - _field( - "states", - "Explicit 2x2 IPTG/stress state labels.", - children=( - _field( - "uninduced_unstressed", - "Label for the H2O, -IPTG state.", - kind="string", - default="-IPTG/-stress", - ), - _field( - "induced_unstressed", - "Label for the H2O, +IPTG state.", - kind="string", - default="+IPTG/-stress", - ), - _field( - "uninduced_stressed", - "Label for the relevant-stress, -IPTG state.", - kind="string", - default="-IPTG/+stress", - ), - _field( - "induced_stressed", - "Label for the relevant-stress, +IPTG state.", - kind="string", - default="+IPTG/+stress", - ), - ), - ), - _field( - "plateau", - "Primary post-stress window policy.", - children=( - _field( - "mode", - "Window selector: full trace after stress, or stop once the matched tetO control plateaus.", - kind="string", - choices=("full_post_stress", "control_plateau"), - default="full_post_stress", - ), - _field( - "slope_tolerance", - "Absolute OD slope threshold used for plateau detection.", - kind="number", - default=0.01, - ), - _field( - "min_intervals", - "Minimum number of trailing low-slope intervals before calling plateau.", - kind="integer", - default=2, - ), - ), - ), - _field( - "relevant_stress_map", - "Sensor -> 
relevant stress label mapping.", - kind="mapping", - allow_unknown=True, - ), - _field( - "sensor_target_map", - "Sensor -> cognate sponge motif list.", - kind="mapping", - allow_unknown=True, - ), - _field( - "expected_sign_map", - "Optional explicit sign overrides for cross-sensor ranking.", - kind="mapping", - allow_unknown=True, - ), - ), - ), - ), - factors=( - ProtocolFactorSpec(name="sensor", role="sensor", summary="Reporter promoter / sensor arm."), - ProtocolFactorSpec(name="sponge", role="construct", summary="Real or tetO sponge arm."), - ProtocolFactorSpec(name="stress_condition", role="stress", summary="Relevant stress or H2O control."), - ProtocolFactorSpec( - name="IPTG", - role="induction", - summary="IPTG-driven retron-expression state.", - ), - ProtocolFactorSpec(name="replicate_id", role="replicate", summary="Replicate well identifier."), - ProtocolFactorSpec(name="time", role="time", summary="Time on the assay clock in hours."), - ProtocolFactorSpec(name="plate_id", role="plate", summary="Plate-local normalization boundary."), - ProtocolFactorSpec(name="genotype_id", role="construct", summary="Sensor/sponge genotype identifier."), - ), - semantic_profiles=( - ProtocolSemanticProfileSpec( - id="yfp_cfp", - family="matched_control_dual_reporter", - summary="Dual-reporter sponge-screen semantics on the log2(YFP/CFP) axis.", - primary_metric="O_abs_AUC", - primary_readout="log2(YFP / CFP)", - tags=("dual_reporter", "matched_control", "sponge"), - ), - ProtocolSemanticProfileSpec( - id="single_reporter", - family="matched_control_single_reporter", - summary="Single-reporter sponge-screen semantics on the log2(configured reporter / configured growth channel) axis.", - primary_metric="O_abs_AUC", - primary_readout="log2(configured_reporter_channel / configured_growth_channel)", - tags=("single_reporter", "matched_control", "sponge"), - ), - ), - control_rules=( - ProtocolControlRule( - id="matched_same_sensor_control", - summary=( - "Normalize every 
real sponge well to the same-sensor tetO control on the same plate, " - "matched by stress state, IPTG state, and timepoint." - ), - match_on=("sensor", "plate_id", "stress_condition", "IPTG", "time"), - control_selector="matched_tetO_group", - profiles=("yfp_cfp", "single_reporter"), - ), - ), - windows=( - ProtocolWindowSpec( - id="pre_stress_last_n", - summary="Use the last N reads before stress addition as the baseline window.", - anchor="stress_time_zero", - selector="last_n_before", - params={"n": 3}, - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolWindowSpec( - id="primary_post_stress", - summary="Use the post-stress kinetic window through the configured end-of-window policy.", - anchor="stress_time_zero", - selector="configured_post_stress_window", - params={"policy": "semantic_metrics.plateau"}, - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolWindowSpec( - id="endpoint_last_n", - summary="Use the last N reads inside the primary post-stress window as the endpoint window after any " - "configured post-stress time cap is applied.", - anchor="primary_post_stress", - selector="last_n_within", - params={"n": 3}, - profiles=("yfp_cfp", "single_reporter"), - ), - ), - metrics=( - ProtocolMetricSpec( - id="OD", - stage="raw", - summary="Raw OD600 trace.", - formula="OD600", - profile_overrides={ - "single_reporter": ProtocolSemanticProfileOverride( - summary="Raw configured growth-proxy trace.", - formula="configured_growth_channel", - ) - }, - ), - ProtocolMetricSpec(id="CFP", stage="raw", summary="Raw CFP trace.", formula="CFP", profiles=("yfp_cfp",)), - ProtocolMetricSpec(id="YFP", stage="raw", summary="Raw YFP trace.", formula="YFP", profiles=("yfp_cfp",)), - ProtocolMetricSpec( - id="Reporter", - stage="raw", - summary="Raw configured reporter trace.", - formula="configured_reporter_channel", - profiles=("single_reporter",), - ), - ProtocolMetricSpec( - id="YFP_OD", - stage="support", - summary="Supporting YFP per biomass proxy.", - 
formula="YFP / OD600", - depends_on=("YFP", "OD"), - value_space="linear_ratio", - unit="ratio", - comparable_group="support_ratio_linear", - profiles=("yfp_cfp",), - ), - ProtocolMetricSpec( - id="CFP_OD", - stage="support", - summary="Supporting CFP per biomass proxy.", - formula="CFP / OD600", - depends_on=("CFP", "OD"), - value_space="linear_ratio", - unit="ratio", - comparable_group="support_ratio_linear", - profiles=("yfp_cfp",), - ), - ProtocolMetricSpec( - id="Reporter_OD", - stage="support", - summary="Supporting configured reporter per biomass proxy.", - formula="configured_reporter_channel / configured_growth_channel", - depends_on=("Reporter", "OD"), - value_space="linear_ratio", - unit="ratio", - comparable_group="support_ratio_linear", - profiles=("single_reporter",), - ), - ProtocolMetricSpec( - id="R", - stage="derived", - summary="Primary within-well log2 ratio.", - formula="log2(YFP / CFP)", - depends_on=("YFP", "CFP"), - value_space="log2_ratio", - unit="log2_ratio", - comparable_group="primary_ratio_log2", - profile_overrides={ - "single_reporter": ProtocolSemanticProfileOverride( - summary="Primary within-well single-reporter log2 ratio.", - formula="log2(configured_reporter_channel / configured_growth_channel)", - depends_on=("Reporter", "OD", "Reporter_OD"), - value_space="log2_ratio", - unit="log2_ratio", - comparable_group="primary_ratio_log2", - ) - }, - ), - ProtocolMetricSpec( - id="R_pre", - stage="summary", - summary="Mean of the primary ratio in the pre-stress window.", - formula="mean(R over pre_stress_last_n)", - depends_on=("R", "pre_stress_last_n"), - value_space="log2_ratio", - unit="log2_ratio", - comparable_group="primary_ratio_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="P_pre", - stage="summary", - summary="Pre-stress matched-control preload shift between +IPTG and -IPTG states.", - formula="mean(R_pre - R_pre_tetO,matched)(+IPTG) - mean(R_pre - R_pre_tetO,matched)(-IPTG)", - 
depends_on=("R_pre", "matched_same_sensor_control"), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="mu", - stage="support", - summary="Approximate growth-rate trace from the slope of log(OD600).", - formula="d(log(OD600)) / dt", - depends_on=("OD",), - profile_overrides={ - "single_reporter": ProtocolSemanticProfileOverride( - summary="Approximate growth-rate trace from the slope of log(configured growth channel).", - formula="d(log(configured_growth_channel)) / dt", - ) - }, - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="B", - stage="derived", - summary="Baseline-shifted reporter ratio relative to the well's own pre-stress state.", - formula="R(t) - R_pre", - depends_on=("R", "R_pre"), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="C", - stage="comparison", - summary="Matched-control-normalized sponge deviation.", - formula="B(t) - mean(B matched_same_sensor_control at t)", - depends_on=("B", "matched_same_sensor_control"), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="C_AUC", - stage="summary", - summary="AUC of the matched-control-normalized trace over the primary post-stress window.", - formula="AUC(C over primary_post_stress)", - depends_on=("C", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="C_END", - stage="summary", - summary="Endpoint mean of the matched-control-normalized trace.", - formula="mean(C over endpoint_last_n)", - depends_on=("C", "endpoint_last_n"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D", - stage="comparison", - summary="IPTG-state 
effect after matched-control normalization.", - formula="mean(C +IPTG) - mean(C -IPTG)", - depends_on=("C",), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_AUC", - stage="summary", - summary="AUC of the IPTG-state effect.", - formula="AUC(D over primary_post_stress)", - depends_on=("D", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_END", - stage="summary", - summary="Endpoint mean of the IPTG-state effect.", - formula="mean(D over endpoint_last_n)", - depends_on=("D", "endpoint_last_n"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_abs", - stage="comparison", - summary="Absolute matched-control IPTG-state effect that retains pre-stress preload differences.", - formula="mean(R - R_tetO,matched)(+IPTG) - mean(R - R_tetO,matched)(-IPTG)", - depends_on=("R", "matched_same_sensor_control"), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_abs_AUC", - stage="summary", - summary="AUC of the absolute matched-control IPTG-state effect.", - formula="AUC(D_abs over primary_post_stress)", - depends_on=("D_abs", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_abs_END", - stage="summary", - summary="Endpoint mean of the absolute matched-control IPTG-state effect.", - formula="mean(D_abs over endpoint_last_n)", - depends_on=("D_abs", "endpoint_last_n"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_growth", - stage="burden", - summary="Construct-specific growth burden after same-sensor tetO subtraction.", - formula="mean(mu - mu_tetO,matched)(+IPTG) - mean(mu - mu_tetO,matched)(-IPTG)", - depends_on=("mu", "matched_same_sensor_control"), - 
profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_growth_AUC", - stage="burden", - summary="AUC of construct-specific growth burden over the primary window.", - formula="AUC(D_growth over primary_post_stress)", - depends_on=("D_growth", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="D_growth_END", - stage="burden", - summary="Endpoint mean of construct-specific growth burden.", - formula="mean(D_growth over endpoint_last_n)", - depends_on=("D_growth", "endpoint_last_n"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="M", - stage="comparison", - summary="Stress modulation of the IPTG-state effect after stress addition.", - formula="D(relevant_stress) - D(H2O)", - depends_on=("D",), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="M_AUC", - stage="summary", - summary="AUC of stress modulation over the post-stress window.", - formula="AUC(M over primary_post_stress)", - depends_on=("M", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="M_END", - stage="summary", - summary="Endpoint mean of the stress modulation trace.", - formula="mean(M over endpoint_last_n)", - depends_on=("M", "endpoint_last_n"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="O", - stage="ranking", - summary="Expected-direction-aligned IPTG-state effect.", - formula="expected_decoy_sign * D", - depends_on=("D",), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="O_AUC", - stage="ranking", - summary="Positive-area integral of the expected-direction-aligned IPTG-state effect.", - formula="∫ max(O, 0) dt over primary_post_stress", - depends_on=("O", 
"primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="O_abs", - stage="ranking", - summary="Expected-direction-aligned absolute matched-control IPTG-state effect.", - formula="expected_decoy_sign * D_abs", - depends_on=("D_abs",), - value_space="delta_log2_ratio", - unit="log2_ratio_delta", - comparable_group="response_delta_log2", - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="O_abs_AUC", - stage="ranking", - summary="Positive-area integral of the expected-direction-aligned absolute matched-control IPTG-state effect.", - formula="∫ max(O_abs, 0) dt over primary_post_stress", - depends_on=("O_abs", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="G_sensor", - stage="summary", - summary="Native tetO sensor response used for cross-sensor scaling.", - formula="AUC(mean(B tetO,-IPTG,relevant stress) - mean(B tetO,-IPTG,H2O))", - depends_on=("B", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="S_AUC", - stage="ranking", - summary="Cross-sensor scaled expected-direction post-stress area relative to the native sensor response.", - formula="O_AUC / abs(G_sensor)", - depends_on=("O_AUC", "G_sensor"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="S_abs_AUC", - stage="ranking", - summary="Cross-sensor scaled expected-direction total area relative to the native sensor response.", - formula="O_abs_AUC / abs(G_sensor)", - depends_on=("O_abs_AUC", "G_sensor"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="L_pre", - stage="leakiness", - summary="Pre-stress leakiness relative to the matched control.", - formula="R_pre(real,-IPTG) - mean(R_pre tetO,-IPTG)", - depends_on=("R_pre", "matched_same_sensor_control"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="L_post_AUC", - stage="leakiness", - summary="Uninduced 
post-stress leakiness over the primary window.", - formula="AUC(mean(C -IPTG))", - depends_on=("C", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="T_ratio_AUC", - stage="burden", - summary="tetO ratio burden from the +IPTG versus -IPTG state contrast.", - formula="AUC(mean(B tetO,+IPTG) - mean(B tetO,-IPTG))", - depends_on=("B", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="T_growth_AUC", - stage="burden", - summary="tetO growth burden from the +IPTG versus -IPTG state contrast.", - formula="AUC(mean(mu tetO,+IPTG) - mean(mu tetO,-IPTG))", - depends_on=("mu", "primary_post_stress"), - profiles=("yfp_cfp", "single_reporter"), - ), - ProtocolMetricSpec( - id="T_finalOD", - stage="burden", - summary="Endpoint OD burden for the tetO control.", - formula="mean(OD tetO,+IPTG,end) - mean(OD tetO,-IPTG,end)", - depends_on=("OD", "endpoint_last_n"), - profiles=("yfp_cfp", "single_reporter"), - ), - ), - effect_signs=( - ProtocolEffectSignSpec( - target="spyP", - expected_sign="negative", - summary="Effective decoys reduce the spyP ratio after sign correction.", - ), - ProtocolEffectSignSpec( - target="sulAp", - expected_sign="positive", - summary="Effective LexA decoys increase the sulAp ratio.", - ), - ProtocolEffectSignSpec( - target="soxSp", - expected_sign="negative", - summary="Effective SoxR/SoxS decoys reduce the soxSp ratio after sign correction.", - ), - ), - figures=_RETRON_SPONGE_FIGURES, - plot_profiles=_RETRON_SPONGE_PLOT_PROFILES, - default_plot_profile="screen_overview", - artifacts=( - ProtocolArtifactSpec( - id="semantic_trace_table", - summary="CSV export of the matched-control sponge trace table.", - default=True, - ), - ProtocolArtifactSpec( - id="semantic_summary_table", - summary="CSV export of the matched-control sponge summary table.", - default=True, - ), - ), - ranking=ProtocolRankingSpec( - primary_metric="O_abs_AUC", - 
direction="higher_is_better", - penalties=("T_ratio_AUC", "T_finalOD", "L_pre", "L_post_AUC"), - supporting_metrics=("S_abs_AUC", "P_pre", "O_AUC", "M_AUC"), - summary="Rank hits by expected-direction total area, then inspect preload, expected-direction post-stress area, burden, and leakiness.", - profiles=("yfp_cfp", "single_reporter"), - ), - execution=ProtocolExecutionPlan( - notebook=ProtocolNotebookPolicy( - default_template="notebook/retron_sponge", - allowed_templates=("notebook/retron_sponge", "notebook/eda", "notebook/microplate", "notebook/basic"), - summary=( - "Retron sponge screens default to the protocol-specific review notebook and keep the generic " - "record explorers available as fallbacks." - ), - ), - plugin_defaults=( - ProtocolPluginDefaultsSpec( - plugin="ingest/synergy_h1", - summary=( - "Retron sponge screens inherit generic ingest settings here; " - "the compiler derives the required measurement-family channels." - ), - with_={ - "mode": binding_value("ingest.mode", "auto"), - "channel_map": binding_value("ingest.channel_map", None), - "sheet_names": binding_value("ingest.sheet_names", None), - "add_sheet": binding_value("ingest.add_sheet", False), - "time_round_decimals": binding_value("ingest.time_round_decimals", 12), - "time_step_h": binding_value("ingest.time_step_h", None), - "auto_roots": binding_value("ingest.auto_roots", None), - "auto_include": binding_value("ingest.auto_include", list(DEFAULT_INCLUDE)), - "auto_exclude": binding_value("ingest.auto_exclude", list(DEFAULT_EXCLUDE)), - "auto_pick": binding_value("ingest.auto_pick", "single"), - "auto_recursive": binding_value("ingest.auto_recursive", False), - "add_source_column": binding_value("ingest.add_source_column", False), - "source_col": binding_value("ingest.source_col", "source_file"), - "print_summary": binding_value("ingest.print_summary", True), - }, - ), - ProtocolPluginDefaultsSpec( - plugin="transform/fold_change", - summary=( - "Retron sponge fold-change inherits 
generic comparison settings here; " - "the compiler sets the target to the compiled primary ratio." - ), - with_={ - "report_times": binding_value("fold_change.report_times"), - "time_tolerance": binding_value("fold_change.time_tolerance", 0.51), - "agg": binding_value("fold_change.agg", "median"), - "treatment_column": binding_value("fold_change.treatment_column", "treatment"), - "group_by": binding_value("fold_change.group_by", ["design_id"]), - "use_global_baseline": binding_value("fold_change.use_global_baseline", False), - "global_baseline_value": binding_value("fold_change.global_baseline_value", None), - "overrides": binding_value("fold_change.overrides", []), - "fc_column": binding_value("fold_change.fc_column", "FC"), - "log2fc_column": binding_value("fold_change.log2fc_column", "log2FC"), - }, - ), - ), - compiler=compile_plate_reader_retron_sponge_screen, - ), +_PLATE_READER_SINGLE_REPORTER_PROTOCOL, _PLATE_READER_RETRON_SPONGE_PROTOCOL = build_plate_reader_variant_protocols( + dual_reporter_protocol=_DUAL_REPORTER_PROTOCOL, + field_builder=_field, ) BUILTIN_PROTOCOLS = ( diff --git a/src/reader/protocols/compiler.py b/src/reader/protocols/compiler.py index 5fb5eed..3b1902c 100644 --- a/src/reader/protocols/compiler.py +++ b/src/reader/protocols/compiler.py @@ -4,11 +4,13 @@ from typing import Any from reader.errors import ConfigError -from reader.protocols.model import ( - CompiledProtocolPlan, - ProtocolSemanticExecution, - ProtocolSemanticNode, - ProtocolSemanticProgram, +from reader.protocols.model import CompiledProtocolPlan +from reader.protocols.semantic_coverage import ( + _cytometry_semantic_program, + _logic_semantic_program, + _plate_reader_retron_sponge_semantic_program, + _plate_reader_semantic_program, + _plate_reader_single_reporter_semantic_program, ) from reader.workbench.decl.model import ( NotebookTemplateCallDecl, @@ -317,12 +319,15 @@ def compile_logic_sfxi_screen(protocol: Any): "endpoint_by_design", "intensity_overview", 
"logic_symmetry", + "sfxi_setpoint_scatter", + "sfxi_triptych_sequence", }, ) - requires_promoted_df = include_vec8 or "logic_symmetry" in selected_plot_ids + requires_vec8 = include_vec8 or bool({"sfxi_setpoint_scatter", "sfxi_triptych_sequence"} & set(selected_plot_ids)) + requires_promoted_df = requires_vec8 or bool({"logic_symmetry", "sfxi_triptych_sequence"} & set(selected_plot_ids)) if requires_promoted_df: pipeline.append(_sfxi_promote_step()) - if include_vec8: + if requires_vec8: pipeline.append(_sfxi_vec8_step()) plots = [ @@ -348,7 +353,7 @@ def compile_logic_sfxi_screen(protocol: Any): plots=tuple(plots), exports=tuple(exports), notebooks=(default_notebook_call(template),), - semantic_program=_logic_semantic_program(protocol, include_vec8=include_vec8), + semantic_program=_logic_semantic_program(protocol, include_vec8=requires_vec8), ) @@ -393,412 +398,6 @@ def compile_cytometry_flow_panel(protocol: Any): ) -def _semantic_program( - protocol: Any, - *, - overrides: dict[str, ProtocolSemanticExecution], - active_profile: str | None = None, -) -> ProtocolSemanticProgram: - descriptor_program = protocol.descriptor.semantic_program(active_profile=active_profile) - valid_ids = { - *(node.id for node in descriptor_program.controls), - *(node.id for node in descriptor_program.windows), - *(node.id for node in descriptor_program.metrics), - } - if descriptor_program.ranking is not None: - valid_ids.add(descriptor_program.ranking.id) - unknown_override_ids = sorted(set(overrides) - valid_ids) - if unknown_override_ids: - options = ", ".join(sorted(valid_ids)) or "—" - raise ConfigError( - f"Semantic execution overrides reference unknown ids {unknown_override_ids} for protocol {protocol.id!r}. 
" - f"Known semantic ids: {options}" - ) - - def _apply(nodes: tuple[ProtocolSemanticNode, ...]) -> tuple[ProtocolSemanticNode, ...]: - return tuple( - ProtocolSemanticNode( - id=node.id, - kind=node.kind, - summary=node.summary, - profiles=node.profiles, - stage=node.stage, - formula=node.formula, - depends_on=node.depends_on, - value_space=node.value_space, - unit=node.unit, - comparable_group=node.comparable_group, - anchor=node.anchor, - selector=node.selector, - params=node.params, - match_on=node.match_on, - control_selector=node.control_selector, - primary_metric=node.primary_metric, - direction=node.direction, - penalties=node.penalties, - supporting_metrics=node.supporting_metrics, - execution=overrides.get(node.id, node.execution), - ) - for node in nodes - ) - - ranking = descriptor_program.ranking - if ranking is not None: - ranking = ProtocolSemanticNode( - id=ranking.id, - kind=ranking.kind, - summary=ranking.summary, - profiles=ranking.profiles, - stage=ranking.stage, - formula=ranking.formula, - depends_on=ranking.depends_on, - value_space=ranking.value_space, - unit=ranking.unit, - comparable_group=ranking.comparable_group, - anchor=ranking.anchor, - selector=ranking.selector, - params=ranking.params, - match_on=ranking.match_on, - control_selector=ranking.control_selector, - primary_metric=ranking.primary_metric, - direction=ranking.direction, - penalties=ranking.penalties, - supporting_metrics=ranking.supporting_metrics, - execution=overrides.get(ranking.id, ranking.execution), - ) - - return ProtocolSemanticProgram( - protocol=descriptor_program.protocol, - profiles=descriptor_program.profiles, - active_profile=descriptor_program.active_profile, - controls=_apply(descriptor_program.controls), - windows=_apply(descriptor_program.windows), - metrics=_apply(descriptor_program.metrics), - ranking=ranking, - ) - - -def _plate_reader_semantic_program( - protocol: Any, - *, - include_crosstalk_pairs: bool, - include_fold_change: bool, -) -> 
ProtocolSemanticProgram: - active_profile = _dual_reporter_semantic_profile( - include_fold_change=include_fold_change, - include_crosstalk_pairs=include_crosstalk_pairs, - ) - overrides: dict[str, ProtocolSemanticExecution] = { - "OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note="Raw OD600 values are materialized on the ingest dataframe.", - ), - } - overrides.update( - { - "CFP": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note="Raw CFP values are materialized on the ingest dataframe.", - ), - "YFP": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note="Raw YFP values are materialized on the ingest dataframe.", - ), - "CFP_OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_cfp_od600",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_cfp_od600/df",), - note="The CFP/OD600 support channel is materialized as a ratio step output.", - ), - "YFP_OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_yfp_od600",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_yfp_od600/df",), - note="The YFP/OD600 support channel is materialized as a ratio step output.", - ), - "Ratio": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_yfp_cfp",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_yfp_cfp/df",), - note="The primary YFP/CFP ratio is materialized as a ratio step output.", - ), - } - ) - if include_fold_change: - fold_change_step_id = "fold_change__yfp_over_cfp" - fold_change_record_id = "fold_change__yfp_over_cfp/table" - fold_change_note = "Nearest-time fold-change summaries are materialized from the primary ratio channel." 
- overrides.update( - { - "FC": ProtocolSemanticExecution( - status="compiled", - step_ids=(fold_change_step_id,), - plugin_ids=("transform/fold_change",), - record_ids=(fold_change_record_id,), - note=fold_change_note, - ), - "log2FC": ProtocolSemanticExecution( - status="compiled", - step_ids=(fold_change_step_id,), - plugin_ids=("transform/fold_change",), - record_ids=(fold_change_record_id,), - note=fold_change_note, - ), - } - ) - if include_crosstalk_pairs: - overrides["ranking"] = ProtocolSemanticExecution( - status="compiled", - step_ids=("crosstalk_pairs",), - plugin_ids=("transform/crosstalk_pairs",), - record_ids=("crosstalk_pairs/table",), - config_paths=("protocol.analysis.crosstalk_pairs",), - note="When crosstalk pair analysis is enabled, pair selection is compiled from fold-change output.", - ) - return _semantic_program(protocol, overrides=overrides, active_profile=active_profile) - - -def _plate_reader_single_reporter_semantic_program( - protocol: Any, - *, - reporter_channel: str, - normalizer_channel: str, - include_fold_change: bool, -) -> ProtocolSemanticProgram: - ratio_label = _single_reporter_ratio_label( - reporter_channel=reporter_channel, - normalizer_channel=normalizer_channel, - ) - ratio_note = f"The primary {ratio_label} ratio is materialized as a ratio step output." - fold_change_note = f"Nearest-time fold-change summaries are materialized from the primary {ratio_label} channel." 
- overrides: dict[str, ProtocolSemanticExecution] = { - "Normalizer": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note=f"Raw {normalizer_channel} values are materialized on the ingest dataframe.", - ), - "Reporter": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note=f"Raw {reporter_channel} values are materialized on the ingest dataframe.", - ), - "Reporter_Normalizer": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_reporter_normalizer",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_reporter_normalizer/df",), - note=ratio_note, - ), - } - if include_fold_change: - overrides.update( - { - "FC": ProtocolSemanticExecution( - status="compiled", - step_ids=("fold_change__single_reporter",), - plugin_ids=("transform/fold_change",), - record_ids=("fold_change__single_reporter/table",), - note=fold_change_note, - ), - "log2FC": ProtocolSemanticExecution( - status="compiled", - step_ids=("fold_change__single_reporter",), - plugin_ids=("transform/fold_change",), - record_ids=("fold_change__single_reporter/table",), - note=fold_change_note, - ), - } - ) - return _semantic_program( - protocol, overrides=overrides, active_profile=_single_reporter_semantic_profile(include_fold_change) - ) - - -def _plate_reader_retron_sponge_semantic_program( - protocol: Any, - *, - measurement: str, - reporter_channel: str, - growth_channel: str, -) -> ProtocolSemanticProgram: - trace_binding = ProtocolSemanticExecution( - status="compiled", - step_ids=("semantic_metrics",), - plugin_ids=("transform/retron_sponge_metrics",), - record_ids=("semantic_metrics/trace",), - config_paths=("protocol.analysis.semantic_metrics",), - note="Matched-control sponge kinetics are materialized as a typed trace table.", - ) - summary_binding = ProtocolSemanticExecution( - status="compiled", - 
step_ids=("semantic_metrics",), - plugin_ids=("transform/retron_sponge_metrics",), - record_ids=("semantic_metrics/summary",), - config_paths=("protocol.analysis.semantic_metrics",), - note="Matched-control sponge summaries are materialized as a typed summary table.", - ) - overrides: dict[str, ProtocolSemanticExecution] = { - "matched_same_sensor_control": trace_binding, - "pre_stress_last_n": trace_binding, - "primary_post_stress": trace_binding, - "endpoint_last_n": trace_binding, - "OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note=f"Raw {growth_channel} values are materialized on the ingest dataframe.", - ), - "R": trace_binding, - "R_pre": summary_binding, - "P_pre": summary_binding, - "B": trace_binding, - "C": trace_binding, - "C_AUC": summary_binding, - "C_END": summary_binding, - "mu": trace_binding, - "D": trace_binding, - "D_AUC": summary_binding, - "D_END": summary_binding, - "D_abs": trace_binding, - "D_abs_AUC": summary_binding, - "D_abs_END": summary_binding, - "D_growth": trace_binding, - "D_growth_AUC": summary_binding, - "D_growth_END": summary_binding, - "M": trace_binding, - "M_AUC": summary_binding, - "M_END": summary_binding, - "O": trace_binding, - "O_AUC": summary_binding, - "O_abs": trace_binding, - "O_abs_AUC": summary_binding, - "G_sensor": summary_binding, - "S_AUC": summary_binding, - "S_abs_AUC": summary_binding, - "L_pre": summary_binding, - "L_post_AUC": summary_binding, - "T_ratio_AUC": summary_binding, - "T_growth_AUC": summary_binding, - "T_finalOD": summary_binding, - "ranking": summary_binding, - } - if measurement == "yfp_cfp": - overrides.update( - { - "CFP": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note="Raw CFP values are materialized on the ingest dataframe.", - ), - "YFP": ProtocolSemanticExecution( - status="compiled", - 
step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note="Raw YFP values are materialized on the ingest dataframe.", - ), - "CFP_OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_cfp_od600",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_cfp_od600/df",), - note="The CFP/OD600 support channel is materialized as a ratio step output.", - ), - "YFP_OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_yfp_od600",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_yfp_od600/df",), - note="The YFP/OD600 support channel is materialized as a ratio step output.", - ), - } - ) - else: - overrides.update( - { - "Reporter": ProtocolSemanticExecution( - status="compiled", - step_ids=("ingest",), - plugin_ids=("ingest/synergy_h1",), - record_ids=("ingest/df",), - note=f"Raw {reporter_channel} values are materialized on the ingest dataframe.", - ), - "Reporter_OD": ProtocolSemanticExecution( - status="compiled", - step_ids=("ratio_reporter_normalizer",), - plugin_ids=("transform/ratio",), - record_ids=("ratio_reporter_normalizer/df",), - note=( - "The " - f"{reporter_channel}/{growth_channel} support channel is materialized as a ratio step output." 
- ), - ), - } - ) - return _semantic_program(protocol, overrides=overrides, active_profile=measurement) - - -def _logic_semantic_program(protocol: Any, *, include_vec8: bool) -> ProtocolSemanticProgram: - overrides: dict[str, ProtocolSemanticExecution] = {} - if include_vec8: - vec8_binding = ProtocolSemanticExecution( - status="compiled", - step_ids=("sfxi_vec8",), - plugin_ids=("transform/sfxi",), - record_ids=("sfxi_vec8/vec8",), - config_paths=( - "protocol.inputs.response", - "protocol.inputs.reference", - "protocol.inputs.design_by", - "protocol.inputs.logic_map_ref", - "protocol.inputs.time_mode", - "protocol.inputs.target_time_h", - "protocol.inputs.time_tolerance_h", - ), - note="The SFXI vec8 transform materializes the protocol control rule, summary window, metric, and ranking surface.", - ) - overrides.update( - { - "logic_corner_map": vec8_binding, - "summary_timepoint": vec8_binding, - "vec8": vec8_binding, - "ranking": vec8_binding, - } - ) - return _semantic_program(protocol, overrides=overrides) - - -def _cytometry_semantic_program(protocol: Any) -> ProtocolSemanticProgram: - return _semantic_program( - protocol, - overrides={ - "ranking": ProtocolSemanticExecution( - status="descriptive_only", - note="Cytometry ranking remains domain-defined until a typed analysis program is introduced.", - ) - }, - ) - - def _analysis_options(protocol: Any) -> dict[str, Any]: raw = getattr(protocol, "analysis", {}) or {} if not isinstance(raw, dict): @@ -882,18 +481,6 @@ def _configured_fold_change_target(protocol: Any, *, expected: str) -> str: return target -def _dual_reporter_semantic_profile(*, include_fold_change: bool, include_crosstalk_pairs: bool) -> str: - if include_crosstalk_pairs: - return "yfp_cfp_crosstalk" - if include_fold_change: - return "yfp_cfp_fold_change" - return "yfp_cfp_raw" - - -def _single_reporter_semantic_profile(include_fold_change: bool) -> str: - return "single_reporter_fold_change" if include_fold_change else "single_reporter_raw" 
- - def _cfg_bool(raw: dict[str, Any], *, key: str, default: bool) -> bool: value = raw.get(key, default) if isinstance(value, bool): @@ -1145,6 +732,25 @@ def _sfxi_vec8_step() -> PluginStepDecl: ) +def _sfxi_setpoint_scatter_defaults(protocol: Any) -> dict[str, Any]: + objective = _analysis_mapping(_analysis_options(protocol), key="sfxi_objective") + scaling = _analysis_mapping(objective, key="scaling") + exponents = _analysis_mapping(objective, key="exponents") + return { + "setpoints": deepcopy(objective.get("setpoints", {"and": [0.0, 0.0, 0.0, 1.0]})), + "scaling_percentile": int(scaling.get("percentile", 95)), + "scaling_min_n": int(scaling.get("min_n", 5)), + "scaling_eps": float(scaling.get("eps", 1.0e-8)), + "logic_exponent_beta": float(exponents.get("logic_exponent_beta", 1.0)), + "intensity_exponent_gamma": float(exponents.get("intensity_exponent_gamma", 1.0)), + "intensity_log2_offset_delta": float(objective.get("intensity_log2_offset_delta", 0.0)), + } + + +def _sfxi_triptych_sequence_defaults(protocol: Any) -> dict[str, Any]: + return deepcopy(_analysis_mapping(_analysis_options(protocol), key="sfxi_triptych_sequence")) + + def _plate_reader_plot_output(protocol: Any, *, output_id: str, measurement: str) -> PluginStepDecl: settings = protocol.plot_view_config(figure_id=output_id) plot_reads = _plate_reader_plot_reads(measurement=measurement) @@ -1275,6 +881,23 @@ def _plate_reader_plot_output(protocol: Any, *, output_id: str, measurement: str reads={"df": RecordInputDecl(record_id="promote_to_tidy_plus_map/df")}, with_=_deep_merge(defaults, settings), ) + if output_id == "sfxi_setpoint_scatter": + return _step( + id="sfxi_setpoint_scatter", + plugin="plot/sfxi_setpoint_scatter", + reads={"vec8": RecordInputDecl(record_id="sfxi_vec8/vec8")}, + with_=_deep_merge(_sfxi_setpoint_scatter_defaults(protocol), settings), + ) + if output_id == "sfxi_triptych_sequence": + return _step( + id="sfxi_triptych_sequence", + plugin="plot/sfxi_triptych_sequence", + 
reads={ + "vec8": RecordInputDecl(record_id="sfxi_vec8/vec8"), + "assay": RecordInputDecl(record_id="promote_to_tidy_plus_map/df"), + }, + with_=_deep_merge(_sfxi_triptych_sequence_defaults(protocol), settings), + ) raise ConfigError(f"Unknown plate-reader plot output {output_id!r}") diff --git a/src/reader/protocols/model.py b/src/reader/protocols/model.py index 9246be2..40e14e2 100644 --- a/src/reader/protocols/model.py +++ b/src/reader/protocols/model.py @@ -2,7 +2,7 @@ from collections.abc import Callable from copy import deepcopy -from dataclasses import dataclass, field +from dataclasses import dataclass, field, replace from typing import Any, Literal from reader.domains.semantics import PluginDomain, validate_plugin_domain @@ -507,6 +507,10 @@ class ProtocolSemanticExecution: note: str = "" def __post_init__(self) -> None: + status = str(self.status).strip() + if status not in {"compiled", "descriptive_only"}: + raise ValueError("ProtocolSemanticExecution.status must be 'compiled' or 'descriptive_only'.") + object.__setattr__(self, "status", status) object.__setattr__(self, "step_ids", tuple(str(value) for value in self.step_ids if str(value).strip())) object.__setattr__(self, "plugin_ids", tuple(str(value) for value in self.plugin_ids if str(value).strip())) object.__setattr__(self, "record_ids", tuple(str(value) for value in self.record_ids if str(value).strip())) @@ -563,6 +567,9 @@ def __post_init__(self) -> None: tuple(str(value) for value in self.supporting_metrics if str(value).strip()), ) + def with_execution(self, execution: ProtocolSemanticExecution) -> ProtocolSemanticNode: + return replace(self, execution=execution) + @dataclass(frozen=True) class ProtocolSemanticProgram: @@ -584,6 +591,16 @@ def __post_init__(self) -> None: profile_ids.add(profile.id) if self.active_profile is not None and self.active_profile not in profile_ids: raise ValueError(f"Unknown active semantic profile {self.active_profile!r}.") + + def 
_assert_known_profiles(node_profiles: tuple[str, ...], *, where: str) -> None: + unknown_profiles = sorted(set(node_profiles) - profile_ids) + if unknown_profiles: + options = ", ".join(sorted(profile_ids)) or "—" + raise ValueError( + f"{where} references unknown semantic profiles: {', '.join(unknown_profiles)}. " + f"Known profiles: {options}." + ) + for group_name, nodes in ( ("controls", self.controls), ("windows", self.windows), @@ -594,8 +611,11 @@ def __post_init__(self) -> None: if node.id in seen: raise ValueError(f"Duplicate semantic node {node.id!r} in {group_name}.") seen.add(node.id) + _assert_known_profiles(node.profiles, where=f"ProtocolSemanticProgram.{group_name} item {node.id!r}") if self.ranking is not None and self.ranking.kind != "ranking": raise ValueError("ProtocolSemanticProgram.ranking must have kind='ranking'.") + if self.ranking is not None: + _assert_known_profiles(self.ranking.profiles, where="ProtocolSemanticProgram.ranking") node_ids = {node.id for node in (*self.controls, *self.windows, *self.metrics)} metric_ids = {node.id for node in self.metrics} @@ -623,6 +643,49 @@ def __post_init__(self) -> None: f"Known metrics: {options}." 
) + @property + def has_nodes(self) -> bool: + return bool(self.controls or self.windows or self.metrics or self.ranking is not None) + + def with_execution_overrides( + self, + overrides: dict[str, ProtocolSemanticExecution] | None, + *, + protocol_id: str | None = None, + ) -> ProtocolSemanticProgram: + overrides = dict(overrides or {}) + if not overrides: + return self + valid_ids = { + *(node.id for node in self.controls), + *(node.id for node in self.windows), + *(node.id for node in self.metrics), + } + if self.ranking is not None: + valid_ids.add(self.ranking.id) + unknown_override_ids = sorted(set(overrides) - valid_ids) + if unknown_override_ids: + options = ", ".join(sorted(valid_ids)) or "—" + protocol_label = protocol_id or self.protocol + raise ConfigError( + f"Semantic execution overrides reference unknown ids {unknown_override_ids} for protocol " + f"{protocol_label!r}. Known semantic ids: {options}" + ) + + def _apply(nodes: tuple[ProtocolSemanticNode, ...]) -> tuple[ProtocolSemanticNode, ...]: + return tuple(node.with_execution(overrides.get(node.id, node.execution)) for node in nodes) + + ranking = self.ranking + if ranking is not None: + ranking = ranking.with_execution(overrides.get(ranking.id, ranking.execution)) + return replace( + self, + controls=_apply(self.controls), + windows=_apply(self.windows), + metrics=_apply(self.metrics), + ranking=ranking, + ) + @dataclass(frozen=True) class ProtocolPluginDefaultsSpec: @@ -666,14 +729,16 @@ def __post_init__(self) -> None: @dataclass(frozen=True) class CompiledProtocolPlan: + semantic_program: ProtocolSemanticProgram runtime: dict[str, Any] = field(default_factory=dict) pipeline: tuple[PluginStepDecl, ...] = () plots: tuple[PluginStepDecl, ...] = () exports: tuple[PluginStepDecl, ...] = () notebooks: tuple[NotebookTemplateCallDecl, ...] 
= () - semantic_program: ProtocolSemanticProgram | None = None def __post_init__(self) -> None: + if not isinstance(self.semantic_program, ProtocolSemanticProgram): + raise ValueError("CompiledProtocolPlan.semantic_program must be a ProtocolSemanticProgram instance.") object.__setattr__(self, "runtime", dict(self.runtime or {})) object.__setattr__(self, "pipeline", tuple(self.pipeline or ())) object.__setattr__(self, "plots", tuple(self.plots or ())) @@ -1087,6 +1152,15 @@ def allowed_notebook_templates(self) -> tuple[str, ...]: def allows_notebook_template(self, template: str) -> bool: return template in self.allowed_notebook_templates + def semantic_program( + self, + *, + active_profile: str | None = None, + execution_overrides: dict[str, ProtocolSemanticExecution] | None = None, + ) -> ProtocolSemanticProgram: + program = self.descriptor.semantic_program(active_profile=active_profile) + return program.with_execution_overrides(execution_overrides, protocol_id=self.id) + def resolve_notebook_template( self, *, @@ -1116,13 +1190,17 @@ def compile(self) -> CompiledProtocolPlan: notebooks = (NotebookTemplateCallDecl(id="default", template=selected_template),) for entry in notebooks: self.resolve_notebook_template(explicit_template=entry.template) + if plan.semantic_program.protocol != self.id: + raise ConfigError( + f"Protocol {self.id!r} compiler returned semantic program for {plan.semantic_program.protocol!r}." 
+ ) return CompiledProtocolPlan( + semantic_program=plan.semantic_program, runtime=plan.runtime, pipeline=plan.pipeline, plots=plan.plots, exports=plan.exports, notebooks=notebooks, - semantic_program=plan.semantic_program or self.descriptor.semantic_program(), ) def effective_plugin_config(self, *, plugin_id: str, step_with: dict[str, Any] | None = None) -> dict[str, Any]: diff --git a/src/reader/protocols/semantic_coverage.py b/src/reader/protocols/semantic_coverage.py new file mode 100644 index 0000000..5d6f89e --- /dev/null +++ b/src/reader/protocols/semantic_coverage.py @@ -0,0 +1,351 @@ +from __future__ import annotations + +from typing import Any + +from reader.protocols.model import ProtocolSemanticExecution, ProtocolSemanticProgram + + +def _semantic_program( + protocol: Any, + *, + overrides: dict[str, ProtocolSemanticExecution], + active_profile: str | None = None, +) -> ProtocolSemanticProgram: + return protocol.semantic_program(active_profile=active_profile, execution_overrides=overrides) + + +def _plate_reader_semantic_program( + protocol: Any, + *, + include_crosstalk_pairs: bool, + include_fold_change: bool, +) -> ProtocolSemanticProgram: + active_profile = _dual_reporter_semantic_profile( + include_fold_change=include_fold_change, + include_crosstalk_pairs=include_crosstalk_pairs, + ) + overrides: dict[str, ProtocolSemanticExecution] = { + "OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note="Raw OD600 values are materialized on the ingest dataframe.", + ), + } + overrides.update( + { + "CFP": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note="Raw CFP values are materialized on the ingest dataframe.", + ), + "YFP": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + 
note="Raw YFP values are materialized on the ingest dataframe.", + ), + "CFP_OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_cfp_od600",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_cfp_od600/df",), + note="The CFP/OD600 support channel is materialized as a ratio step output.", + ), + "YFP_OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_yfp_od600",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_yfp_od600/df",), + note="The YFP/OD600 support channel is materialized as a ratio step output.", + ), + "Ratio": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_yfp_cfp",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_yfp_cfp/df",), + note="The primary YFP/CFP ratio is materialized as a ratio step output.", + ), + } + ) + if include_fold_change: + fold_change_step_id = "fold_change__yfp_over_cfp" + fold_change_record_id = "fold_change__yfp_over_cfp/table" + fold_change_note = "Nearest-time fold-change summaries are materialized from the primary ratio channel." 
+ overrides.update( + { + "FC": ProtocolSemanticExecution( + status="compiled", + step_ids=(fold_change_step_id,), + plugin_ids=("transform/fold_change",), + record_ids=(fold_change_record_id,), + note=fold_change_note, + ), + "log2FC": ProtocolSemanticExecution( + status="compiled", + step_ids=(fold_change_step_id,), + plugin_ids=("transform/fold_change",), + record_ids=(fold_change_record_id,), + note=fold_change_note, + ), + } + ) + if include_crosstalk_pairs: + overrides["ranking"] = ProtocolSemanticExecution( + status="compiled", + step_ids=("crosstalk_pairs",), + plugin_ids=("transform/crosstalk_pairs",), + record_ids=("crosstalk_pairs/table",), + config_paths=("protocol.analysis.crosstalk_pairs",), + note="When crosstalk pair analysis is enabled, pair selection is compiled from fold-change output.", + ) + return _semantic_program(protocol, overrides=overrides, active_profile=active_profile) + + +def _plate_reader_single_reporter_semantic_program( + protocol: Any, + *, + reporter_channel: str, + normalizer_channel: str, + include_fold_change: bool, +) -> ProtocolSemanticProgram: + ratio_label = _single_reporter_ratio_label( + reporter_channel=reporter_channel, + normalizer_channel=normalizer_channel, + ) + ratio_note = f"The primary {ratio_label} ratio is materialized as a ratio step output." + fold_change_note = f"Nearest-time fold-change summaries are materialized from the primary {ratio_label} channel." 
+ overrides: dict[str, ProtocolSemanticExecution] = { + "Normalizer": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note=f"Raw {normalizer_channel} values are materialized on the ingest dataframe.", + ), + "Reporter": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note=f"Raw {reporter_channel} values are materialized on the ingest dataframe.", + ), + "Reporter_Normalizer": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_reporter_normalizer",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_reporter_normalizer/df",), + note=ratio_note, + ), + } + if include_fold_change: + overrides.update( + { + "FC": ProtocolSemanticExecution( + status="compiled", + step_ids=("fold_change__single_reporter",), + plugin_ids=("transform/fold_change",), + record_ids=("fold_change__single_reporter/table",), + note=fold_change_note, + ), + "log2FC": ProtocolSemanticExecution( + status="compiled", + step_ids=("fold_change__single_reporter",), + plugin_ids=("transform/fold_change",), + record_ids=("fold_change__single_reporter/table",), + note=fold_change_note, + ), + } + ) + return _semantic_program( + protocol, overrides=overrides, active_profile=_single_reporter_semantic_profile(include_fold_change) + ) + + +def _plate_reader_retron_sponge_semantic_program( + protocol: Any, + *, + measurement: str, + reporter_channel: str, + growth_channel: str, +) -> ProtocolSemanticProgram: + trace_binding = ProtocolSemanticExecution( + status="compiled", + step_ids=("semantic_metrics",), + plugin_ids=("transform/retron_sponge_metrics",), + record_ids=("semantic_metrics/trace",), + config_paths=("protocol.analysis.semantic_metrics",), + note="Matched-control sponge kinetics are materialized as a typed trace table.", + ) + summary_binding = ProtocolSemanticExecution( + status="compiled", + 
step_ids=("semantic_metrics",), + plugin_ids=("transform/retron_sponge_metrics",), + record_ids=("semantic_metrics/summary",), + config_paths=("protocol.analysis.semantic_metrics",), + note="Matched-control sponge summaries are materialized as a typed summary table.", + ) + overrides: dict[str, ProtocolSemanticExecution] = { + "matched_same_sensor_control": trace_binding, + "pre_stress_last_n": trace_binding, + "primary_post_stress": trace_binding, + "endpoint_last_n": trace_binding, + "OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note=f"Raw {growth_channel} values are materialized on the ingest dataframe.", + ), + "R": trace_binding, + "R_pre": summary_binding, + "P_pre": summary_binding, + "B": trace_binding, + "C": trace_binding, + "C_AUC": summary_binding, + "C_END": summary_binding, + "mu": trace_binding, + "D": trace_binding, + "D_AUC": summary_binding, + "D_END": summary_binding, + "D_abs": trace_binding, + "D_abs_AUC": summary_binding, + "D_abs_END": summary_binding, + "D_growth": trace_binding, + "D_growth_AUC": summary_binding, + "D_growth_END": summary_binding, + "M": trace_binding, + "M_AUC": summary_binding, + "M_END": summary_binding, + "O": trace_binding, + "O_AUC": summary_binding, + "O_abs": trace_binding, + "O_abs_AUC": summary_binding, + "G_sensor": summary_binding, + "S_AUC": summary_binding, + "S_abs_AUC": summary_binding, + "L_pre": summary_binding, + "L_post_AUC": summary_binding, + "T_ratio_AUC": summary_binding, + "T_growth_AUC": summary_binding, + "T_finalOD": summary_binding, + "ranking": summary_binding, + } + if measurement == "yfp_cfp": + overrides.update( + { + "CFP": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note="Raw CFP values are materialized on the ingest dataframe.", + ), + "YFP": ProtocolSemanticExecution( + status="compiled", + 
step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note="Raw YFP values are materialized on the ingest dataframe.", + ), + "CFP_OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_cfp_od600",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_cfp_od600/df",), + note="The CFP/OD600 support channel is materialized as a ratio step output.", + ), + "YFP_OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_yfp_od600",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_yfp_od600/df",), + note="The YFP/OD600 support channel is materialized as a ratio step output.", + ), + } + ) + else: + overrides.update( + { + "Reporter": ProtocolSemanticExecution( + status="compiled", + step_ids=("ingest",), + plugin_ids=("ingest/synergy_h1",), + record_ids=("ingest/df",), + note=f"Raw {reporter_channel} values are materialized on the ingest dataframe.", + ), + "Reporter_OD": ProtocolSemanticExecution( + status="compiled", + step_ids=("ratio_reporter_normalizer",), + plugin_ids=("transform/ratio",), + record_ids=("ratio_reporter_normalizer/df",), + note=( + "The " + f"{reporter_channel}/{growth_channel} support channel is materialized as a ratio step output." 
+ ), + ), + } + ) + return _semantic_program(protocol, overrides=overrides, active_profile=measurement) + + +def _logic_semantic_program(protocol: Any, *, include_vec8: bool) -> ProtocolSemanticProgram: + overrides: dict[str, ProtocolSemanticExecution] = {} + if include_vec8: + vec8_binding = ProtocolSemanticExecution( + status="compiled", + step_ids=("sfxi_vec8",), + plugin_ids=("transform/sfxi",), + record_ids=("sfxi_vec8/vec8",), + config_paths=( + "protocol.inputs.response", + "protocol.inputs.reference", + "protocol.inputs.design_by", + "protocol.inputs.logic_map_ref", + "protocol.inputs.time_mode", + "protocol.inputs.target_time_h", + "protocol.inputs.time_tolerance_h", + ), + note="The SFXI vec8 transform materializes the protocol control rule, summary window, metric, and ranking surface.", + ) + overrides.update( + { + "logic_corner_map": vec8_binding, + "summary_timepoint": vec8_binding, + "vec8": vec8_binding, + "ranking": vec8_binding, + } + ) + return _semantic_program(protocol, overrides=overrides) + + +def _cytometry_semantic_program(protocol: Any) -> ProtocolSemanticProgram: + return _semantic_program( + protocol, + overrides={ + "ranking": ProtocolSemanticExecution( + status="descriptive_only", + note="Cytometry ranking remains domain-defined until a typed analysis program is introduced.", + ) + }, + ) + + +def _dual_reporter_semantic_profile(*, include_fold_change: bool, include_crosstalk_pairs: bool) -> str: + if include_crosstalk_pairs: + return "yfp_cfp_crosstalk" + if include_fold_change: + return "yfp_cfp_fold_change" + return "yfp_cfp_raw" + + +def _single_reporter_semantic_profile(include_fold_change: bool) -> str: + return "single_reporter_fold_change" if include_fold_change else "single_reporter_raw" + + +def _single_reporter_ratio_label(*, reporter_channel: str, normalizer_channel: str) -> str: + return f"{reporter_channel}/{normalizer_channel}" diff --git a/src/reader/tests/cli/test_audit_local_experiments.py 
b/src/reader/tests/cli/test_audit_local_experiments.py new file mode 100644 index 0000000..118c06c --- /dev/null +++ b/src/reader/tests/cli/test_audit_local_experiments.py @@ -0,0 +1,87 @@ +from __future__ import annotations + +import json +import os +import subprocess +import sys +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parents[4] +TOOL_PATH = REPO_ROOT / "tools" / "audit_local_experiments.py" + + +def test_audit_local_experiments_auto_discovers_numeric_year_dirs(tmp_path: Path) -> None: + experiments_root = tmp_path / "experiments" + experiment_dir = experiments_root / "2027" / "exp_auto" + experiment_dir.mkdir(parents=True) + (experiments_root / "template").mkdir(parents=True) + (experiment_dir / "config.yaml").write_text( + "schema: reader/v7\nexperiment:\n id: exp_auto\n lifecycle: draft\nprotocol:\n id: workbench/generic\n", + encoding="utf-8", + ) + + result = subprocess.run( + [sys.executable, str(TOOL_PATH), "--root", str(experiments_root), "--format", "json"], + cwd=REPO_ROOT, + env={**os.environ, "PYTHONPATH": str(REPO_ROOT / "src")}, + check=False, + capture_output=True, + text=True, + ) + + assert result.returncode == 0, result.stderr + payload = json.loads(result.stdout) + assert payload["years"] == ["2027"] + assert payload["summary"] == {"experiments": 1, "passed": 0, "failed": 0, "skipped": 1} + assert payload["results"][0]["config"].endswith("2027/exp_auto/config.yaml") + + +def test_audit_local_experiments_include_non_active_flag(tmp_path: Path) -> None: + experiments_root = tmp_path / "experiments" + experiment_dir = experiments_root / "2027" / "exp_auto" + experiment_dir.mkdir(parents=True) + (experiment_dir / "config.yaml").write_text( + "schema: reader/v7\nexperiment:\n id: exp_auto\n lifecycle: draft\nprotocol:\n id: workbench/generic\n", + encoding="utf-8", + ) + + result = subprocess.run( + [sys.executable, str(TOOL_PATH), "--root", str(experiments_root), "--format", "json", "--include-non-active"], + 
cwd=REPO_ROOT, + env={**os.environ, "PYTHONPATH": str(REPO_ROOT / "src")}, + check=False, + capture_output=True, + text=True, + ) + + assert result.returncode == 1, result.stderr + payload = json.loads(result.stdout) + assert payload["summary"] == {"experiments": 1, "passed": 0, "failed": 1, "skipped": 0} + assert payload["results"][0]["status"] == "failed" + + +def test_audit_local_experiments_does_not_mutate_source_outputs(tmp_path: Path) -> None: + experiments_root = tmp_path / "experiments" + experiment_dir = experiments_root / "2027" / "exp_active" + experiment_dir.mkdir(parents=True) + (experiment_dir / "config.yaml").write_text( + "schema: reader/v7\nexperiment:\n id: exp_active\n lifecycle: active\nprotocol:\n id: workbench/generic\n", + encoding="utf-8", + ) + outputs_dir = experiment_dir / "outputs" + outputs_dir.mkdir() + sentinel = outputs_dir / "sentinel.txt" + sentinel.write_text("keep", encoding="utf-8") + + result = subprocess.run( + [sys.executable, str(TOOL_PATH), "--root", str(experiments_root), "--format", "json"], + cwd=REPO_ROOT, + env={**os.environ, "PYTHONPATH": str(REPO_ROOT / "src")}, + check=False, + capture_output=True, + text=True, + ) + + assert result.returncode == 1, result.stderr + assert sentinel.read_text(encoding="utf-8") == "keep" + assert not (outputs_dir / "manifests").exists() diff --git a/src/reader/tests/cli/test_helpers.py b/src/reader/tests/cli/test_helpers.py index 0a4cba4..fb19a67 100644 --- a/src/reader/tests/cli/test_helpers.py +++ b/src/reader/tests/cli/test_helpers.py @@ -4,10 +4,10 @@ import pytest -from reader.errors import RecordError +from reader.errors import ReaderError, RecordError from reader.protocols.model import binding_value from reader.runtime import builtin_runtime -from reader.workbench.cli.helpers import dataframe_record_contracts +from reader.workbench.cli.helpers import append_journal, dataframe_record_contracts from reader.workbench.cli.shared import json_friendly @@ -27,3 +27,29 @@ def 
test_json_friendly_serializes_protocol_binding_value_ref() -> None: "binding_value": "sample_map", "default": "metadata.xlsx", } + + +def test_append_journal_migrates_legacy_lowercase_filename(tmp_path: Path) -> None: + job_path = tmp_path / "config.yaml" + job_path.write_text("schema: reader/v7\n", encoding="utf-8") + legacy = tmp_path / "journal.md" + legacy.write_text("# Experiment Journal\n\nlegacy entry\n", encoding="utf-8") + + append_journal(job_path, "uv run reader run config.yaml") + + canonical = tmp_path / "JOURNAL.md" + assert canonical.exists() + text = canonical.read_text(encoding="utf-8") + assert "legacy entry" in text + assert "uv run reader run config.yaml" in text + + +def test_append_journal_rejects_split_case_journal_files(monkeypatch, tmp_path: Path) -> None: + job_path = tmp_path / "config.yaml" + job_path.write_text("schema: reader/v7\n", encoding="utf-8") + (tmp_path / "JOURNAL.md").write_text("# Experiment Journal\n", encoding="utf-8") + (tmp_path / "journal.md").write_text("# Experiment Journal\n", encoding="utf-8") + monkeypatch.setattr(Path, "samefile", lambda self, other: False) + + with pytest.raises(ReaderError, match="Both JOURNAL.md and journal.md exist"): + append_journal(job_path, "uv run reader run config.yaml") diff --git a/src/reader/tests/cli/test_plot_export.py b/src/reader/tests/cli/test_plot_export.py index ccf4654..484cc70 100644 --- a/src/reader/tests/cli/test_plot_export.py +++ b/src/reader/tests/cli/test_plot_export.py @@ -9,6 +9,7 @@ from __future__ import annotations +import importlib import json import re from pathlib import Path @@ -60,6 +61,21 @@ def _logic_plot_config() -> dict: ) +def _logic_sfxi_scatter_config() -> dict: + cfg = _logic_plot_config() + cfg["protocol"]["analysis"] = { + "include_vec8": True, + "include_fold_change": False, + "sfxi_objective": { + "setpoints": {"and": [0.0, 0.0, 0.0, 1.0]}, + "scaling": {"percentile": 95, "min_n": 1, "eps": 1e-8}, + "exponents": {"logic_exponent_beta": 1.0, 
"intensity_exponent_gamma": 1.0}, + }, + } + cfg["protocol"]["outputs"]["plots"] = {"profile": "none", "include": ["sfxi_setpoint_scatter"]} + return cfg + + def _retron_config() -> dict: return base_reader_config( experiment_id="exp_retron", @@ -126,7 +142,7 @@ def test_plot_list_empty(tmp_path: Path) -> None: runner = CliRunner() result = runner.invoke(app, ["plot", str(cfg_path), "--list"]) assert result.exit_code == 0 - assert "No plot specs configured" in result.output + assert "No plots configured" in result.output def test_plot_list_json(tmp_path: Path) -> None: @@ -175,6 +191,63 @@ def test_plot_list_json_surfaces_source_contract_metadata(tmp_path: Path) -> Non assert read["source"]["surface"]["rendered"] == "plate_reader.annotated.v1" +def test_logic_sfxi_plot_list_surfaces_setpoint_scatter(tmp_path: Path) -> None: + cfg = write_config(tmp_path, _logic_sfxi_scatter_config()) + runner = CliRunner() + + result = runner.invoke(app, ["plot", str(cfg), "--list", "--format", "json"]) + + assert result.exit_code == 0 + payload = json.loads(result.output) + assert payload["summary"]["plots"] == 1 + assert payload["summary"]["by_plugin"] == {"plot/sfxi_setpoint_scatter": 1} + assert payload["plots"][0]["id"] == "sfxi_setpoint_scatter" + read = payload["plots"][0]["reads"][0] + assert read["ref"] == {"record": "sfxi_vec8/vec8"} + assert read["contract"] == "sfxi.vec8.v2" + + +def test_logic_sfxi_plot_dry_run_reports_missing_dnadesign_public_api(tmp_path: Path, monkeypatch) -> None: + real_import_module = importlib.import_module + + def _fake_import_module(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + raise ModuleNotFoundError(name) + return real_import_module(name, package) + + monkeypatch.setattr(importlib, "import_module", _fake_import_module) + cfg = write_config(tmp_path, _logic_sfxi_scatter_config()) + runner = CliRunner() + + result = runner.invoke(app, ["plot", str(cfg), "--dry-run"]) + + assert result.exit_code != 0 + 
assert "reader[dnadesign]" in _plain(result.output) + + +def test_logic_sfxi_validate_reports_missing_dnadesign_public_api(tmp_path: Path, monkeypatch) -> None: + real_import_module = importlib.import_module + + def _fake_import_module(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + raise ModuleNotFoundError(name) + return real_import_module(name, package) + + monkeypatch.setattr(importlib, "import_module", _fake_import_module) + cfg = write_config(tmp_path, _logic_sfxi_scatter_config()) + inputs_dir = tmp_path / "inputs" + inputs_dir.mkdir(parents=True) + (inputs_dir / "metadata.xlsx").write_text("stub", encoding="utf-8") + runner = CliRunner() + + result = runner.invoke(app, ["validate", str(cfg), "--format", "json"]) + + assert result.exit_code == 1 + payload = json.loads(result.output) + assert payload["summary"]["status"] == "error" + assert any("reader[dnadesign]" in message for message in payload["validation"]["errors"]) + + def test_retron_plot_list_json(tmp_path: Path) -> None: cfg = write_config(tmp_path, _retron_config()) runner = CliRunner() @@ -235,6 +308,33 @@ def test_plot_json_requires_list(tmp_path: Path) -> None: assert "only supported with --list" in _plain(result.output) +@pytest.mark.parametrize( + ("args", "expected"), + [ + (["--list", "--dry-run"], "--dry-run cannot be combined with --list"), + (["--list", "--input", "df={record: ratio_yfp_od600/df}"], "--input cannot be combined with --list"), + (["--list", "--set", "with.time=6.0"], "--set cannot be combined with --list"), + ], +) +def test_plot_list_rejects_ignored_execution_flags(tmp_path: Path, args: list[str], expected: str) -> None: + cfg = write_config(tmp_path, _base_config()) + runner = CliRunner() + result = runner.invoke(app, ["plot", str(cfg), *args]) + assert result.exit_code != 0 + assert expected in _plain(result.output) + + +def test_plot_rejects_empty_selection_after_filters(tmp_path: Path) -> None: + cfg = write_config(tmp_path, 
_base_config()) + runner = CliRunner() + result = runner.invoke( + app, + ["plot", str(cfg), "--exclude", "raw_kinetics", "--exclude", "endpoint_by_condition", "--dry-run"], + ) + assert result.exit_code != 0 + assert "No plots selected" in _plain(result.output) + + def test_plot_requires_records(tmp_path: Path) -> None: cfg = write_config(tmp_path, _base_config()) runner = CliRunner() @@ -254,6 +354,14 @@ def test_plot_dry_run_does_not_require_records(tmp_path: Path) -> None: assert "raw_kinetics" in result.output +def test_plot_dry_run_allows_non_active_lifecycle(tmp_path: Path) -> None: + cfg = write_config(tmp_path, {**_base_config(), "experiment": {"id": "exp_cli", "lifecycle": "draft"}}) + runner = CliRunner() + result = runner.invoke(app, ["plot", str(cfg), "--dry-run"]) + assert result.exit_code == 0 + assert "raw_kinetics" in result.output + + def test_export_requires_records(tmp_path: Path) -> None: cfg = write_config(tmp_path, _base_config()) runner = CliRunner() @@ -273,6 +381,31 @@ def test_export_dry_run_does_not_require_records(tmp_path: Path) -> None: assert "crosstalk_pairs_table" in result.output +def test_export_dry_run_allows_non_active_lifecycle(tmp_path: Path) -> None: + cfg = write_config(tmp_path, {**_base_config(), "experiment": {"id": "exp_cli", "lifecycle": "draft"}}) + runner = CliRunner() + result = runner.invoke(app, ["export", str(cfg), "--dry-run"]) + assert result.exit_code == 0 + assert "crosstalk_pairs_table" in result.output + + +@pytest.mark.parametrize("command", ["plot", "export"]) +def test_plot_export_surfaces_corrupt_record_catalog_error(tmp_path: Path, command: str) -> None: + cfg = write_config(tmp_path, _base_config()) + records_path = tmp_path / "outputs" / "manifests" / "records.json" + records_path.parent.mkdir(parents=True, exist_ok=True) + records_path.write_text("{not-json", encoding="utf-8") + + runner = CliRunner() + result = runner.invoke(app, [command, str(cfg)]) + + assert result.exit_code != 0 + text = 
_plain(result.output) + assert "Could not read record catalog" in text + assert "records.json is not valid JSON" in text + assert "Run 'uv run reader run" not in text + + def test_plot_year_list(tmp_path: Path, monkeypatch) -> None: runner = CliRunner() year_dir = tmp_path / "experiments" / "2025" @@ -303,6 +436,84 @@ def test_plot_year_json_requires_single_experiment_listing(tmp_path: Path, monke assert "single-experiment plot" in result.output +def test_plot_year_dry_run_preflights_batch_before_execution(tmp_path: Path, monkeypatch) -> None: + runner = CliRunner() + year_dir = tmp_path / "experiments" / "2025" + exp_a = year_dir / "exp_a" + exp_b = year_dir / "exp_b" + exp_a.mkdir(parents=True) + exp_b.mkdir(parents=True) + write_config(exp_a, _base_config()) + cfg_b = _base_config() + cfg_b["protocol"]["outputs"]["plots"] = {"profile": "none"} + write_config(exp_b, cfg_b) + + calls: list[str] = [] + + def _fake_run_plot_job(job_path: Path, **kwargs) -> None: + calls.append(job_path.parent.name) + + monkeypatch.chdir(tmp_path) + monkeypatch.setattr("reader.workbench.cli.surfaces._run_plot_job", _fake_run_plot_job) + result = runner.invoke(app, ["plot", "--year", "2025", "--dry-run"]) + + assert result.exit_code != 0 + assert "No plots configured in this experiment" in _plain(result.output) + assert calls == [] + + +def test_plot_year_run_preflights_batch_before_mutation(tmp_path: Path, monkeypatch) -> None: + runner = CliRunner() + year_dir = tmp_path / "experiments" / "2025" + exp_a = year_dir / "exp_a" + exp_b = year_dir / "exp_b" + exp_a.mkdir(parents=True) + exp_b.mkdir(parents=True) + write_config(exp_a, _base_config()) + cfg_b = _base_config() + cfg_b["experiment"] = {"id": "exp_b", "lifecycle": "draft"} + write_config(exp_b, cfg_b) + + calls: list[str] = [] + + def _fake_run_plot_job(job_path: Path, **kwargs) -> None: + calls.append(job_path.parent.name) + + monkeypatch.chdir(tmp_path) + 
monkeypatch.setattr("reader.workbench.cli.surfaces.require_dataframe_records", lambda decl, job_path, runtime: None) + monkeypatch.setattr("reader.workbench.cli.surfaces._run_plot_job", _fake_run_plot_job) + result = runner.invoke(app, ["plot", "--year", "2025"]) + + assert result.exit_code != 0 + assert "lifecycle 'draft'" in _plain(result.output) + assert calls == [] + + +def test_plot_year_run_preflights_override_errors_before_mutation(tmp_path: Path, monkeypatch) -> None: + runner = CliRunner() + year_dir = tmp_path / "experiments" / "2025" + exp_a = year_dir / "exp_a" + exp_b = year_dir / "exp_b" + exp_a.mkdir(parents=True) + exp_b.mkdir(parents=True) + write_config(exp_a, _base_config()) + write_config(exp_b, _base_config()) + + calls: list[str] = [] + + def _fake_run_plot_job(job_path: Path, **kwargs) -> None: + calls.append(job_path.parent.name) + + monkeypatch.chdir(tmp_path) + monkeypatch.setattr("reader.workbench.cli.surfaces.require_dataframe_records", lambda decl, job_path, runtime: None) + monkeypatch.setattr("reader.workbench.cli.surfaces._run_plot_job", _fake_run_plot_job) + result = runner.invoke(app, ["plot", "--year", "2025", "--set", "bad.path=1"]) + + assert result.exit_code != 0 + assert "--set path must start with reads., with., or writes." 
in _plain(result.output) + assert calls == [] + + def test_export_list_filters(tmp_path: Path) -> None: cfg = write_config(tmp_path, _base_config()) runner = CliRunner() @@ -328,7 +539,7 @@ def test_export_list_empty(tmp_path: Path) -> None: runner = CliRunner() result = runner.invoke(app, ["export", str(cfg_path), "--list"]) assert result.exit_code == 0 - assert "No export specs configured" in result.output + assert "No exports configured" in result.output def test_export_list_json(tmp_path: Path) -> None: @@ -383,6 +594,30 @@ def test_export_json_requires_list(tmp_path: Path) -> None: assert "only supported with --list" in _plain(result.output) +@pytest.mark.parametrize( + ("args", "expected"), + [ + (["--list", "--dry-run"], "--dry-run cannot be combined with --list"), + (["--list", "--input", "df={record: ratio_yfp_od600/df}"], "--input cannot be combined with --list"), + (["--list", "--set", "with.path=exports/crosstalk_pairs.csv"], "--set cannot be combined with --list"), + ], +) +def test_export_list_rejects_ignored_execution_flags(tmp_path: Path, args: list[str], expected: str) -> None: + cfg = write_config(tmp_path, _base_config()) + runner = CliRunner() + result = runner.invoke(app, ["export", str(cfg), *args]) + assert result.exit_code != 0 + assert expected in _plain(result.output) + + +def test_export_rejects_empty_selection_after_filters(tmp_path: Path) -> None: + cfg = write_config(tmp_path, _base_config()) + runner = CliRunner() + result = runner.invoke(app, ["export", str(cfg), "--exclude", "crosstalk_pairs_table", "--dry-run"]) + assert result.exit_code != 0 + assert "No exports selected" in _plain(result.output) + + def test_validate_checks_files_by_default(tmp_path: Path) -> None: cfg_path = write_config(tmp_path, _base_config()) inputs_dir = tmp_path / "inputs" diff --git a/src/reader/tests/cli/test_records.py b/src/reader/tests/cli/test_records.py index 0bf1d4f..736ea6e 100644 --- a/src/reader/tests/cli/test_records.py +++ 
b/src/reader/tests/cli/test_records.py @@ -24,6 +24,34 @@ def test_records_requires_catalog(tmp_path) -> None: result = runner.invoke(app, ["records", str(config)]) assert result.exit_code == 1 assert "No outputs/manifests/records.json found" in result.output + text = " ".join(result.output.split()) + assert "Run 'uv run reader run" in text + assert config.parent.name in text + + +def test_records_lists_catalog_for_non_active_lifecycle(tmp_path) -> None: + config = tmp_path / "config.yaml" + config.write_text( + "schema: reader/v7\nexperiment:\n id: exp\n lifecycle: draft\nprotocol:\n id: workbench/generic\n", + encoding="utf-8", + ) + outputs = tmp_path / "outputs" + store = RecordStore(outputs, contracts=builtin_contract_catalog()) + df = pd.DataFrame({"position": ["A1"], "time": [0.0], "channel": ["OD600"], "value": [1.0]}) + store.persist_dataframe( + producer_id="ingest", + producer_plugin="ingest/synergy_h1", + out_name="df", + record_id="ingest/df", + df=df, + contract_id="tidy.v1", + inputs=[], + config_digest="sha256:test", + ) + runner = CliRunner() + result = runner.invoke(app, ["records", str(config)]) + assert result.exit_code == 0 + assert "ingest/df" in result.output def test_records_lists_dataframe_and_file_bundle_entries(tmp_path) -> None: diff --git a/src/reader/tests/cli/test_run.py b/src/reader/tests/cli/test_run.py index 5c871d3..4981768 100644 --- a/src/reader/tests/cli/test_run.py +++ b/src/reader/tests/cli/test_run.py @@ -4,10 +4,13 @@ import re from pathlib import Path +import pytest +import typer from typer.testing import CliRunner -from reader.tests.support import base_reader_config, write_config +from reader.tests.support import base_reader_config, load_decl, write_config from reader.workbench.cli import app +from reader.workbench.cli.helpers import resolve_pipeline_step_id def _plain(text: str) -> str: @@ -59,3 +62,88 @@ def test_run_json_requires_dry_run(tmp_path: Path) -> None: result = runner.invoke(app, ["run", str(cfg), "--format", 
"json"]) assert result.exit_code != 0 assert "only supported with --dry-run" in _plain(result.output) + + +def test_run_dry_run_allows_non_active_lifecycle(tmp_path: Path) -> None: + cfg = write_config(tmp_path, {**_run_config(), "experiment": {"id": "exp_run", "lifecycle": "draft"}}) + runner = CliRunner() + result = runner.invoke(app, ["run", str(cfg), "--dry-run"]) + assert result.exit_code == 0 + assert "DRY RUN" in _plain(result.output) + + +def test_run_only_shows_next_steps(tmp_path: Path, monkeypatch) -> None: + cfg = write_config(tmp_path, _run_config()) + captured: dict[str, object] = {} + + def _fake_run_job(*args, **kwargs): + captured["args"] = args + captured["kwargs"] = kwargs + + monkeypatch.setattr("reader.workbench.engine.run_job", _fake_run_job) + + runner = CliRunner() + result = runner.invoke(app, ["run", str(cfg), "--only", "ingest"]) + + assert result.exit_code == 0 + kwargs = dict(captured["kwargs"]) + assert kwargs["resume_from"] == "ingest" + assert kwargs["until"] == "ingest" + assert kwargs["show_next_steps"] is True + + +def test_resolve_pipeline_step_id_hint_includes_target_config(tmp_path: Path) -> None: + cfg = write_config(tmp_path, _run_config()) + decl = load_decl(cfg) + + with pytest.raises(typer.BadParameter, match="uv run reader steps") as exc_info: + resolve_pipeline_step_id(decl, "missing_step", job_path=cfg) + + assert str(cfg) in str(exc_info.value) + + +def test_read_only_commands_do_not_create_journal(tmp_path: Path) -> None: + cfg_payload = base_reader_config( + experiment_id="exp_read_only", + protocol_id="plate_reader/dual_reporter_screen", + protocol_inputs={"fold_change": {"report_times": [14.0]}}, + protocol_analysis={"crosstalk_pairs": {"enabled": True, "export": True}}, + protocol_outputs={ + "plots": {"profile": "none", "include": ["raw_kinetics"]}, + "exports": {"include": ["crosstalk_pairs_table"]}, + }, + resources={"sample_map": {"kind": "file", "path": "./inputs/metadata.xlsx"}}, + ) + cfg = 
write_config(tmp_path, cfg_payload) + runner = CliRunner() + + commands = [ + ["explain", str(cfg)], + ["validate", str(cfg), "--no-files"], + ["run", str(cfg), "--dry-run"], + ["plot", str(cfg), "--list"], + ["plot", str(cfg), "--dry-run"], + ["export", str(cfg), "--list"], + ["export", str(cfg), "--dry-run"], + ] + + for command in commands: + result = runner.invoke(app, command) + assert result.exit_code == 0, _plain(result.output) + + assert not (tmp_path / "JOURNAL.md").exists() + assert not (tmp_path / "journal.md").exists() + + +def test_run_reports_split_case_journal_conflict_without_traceback(monkeypatch, tmp_path: Path) -> None: + cfg = write_config(tmp_path, base_reader_config(experiment_id="exp_run")) + (tmp_path / "JOURNAL.md").write_text("# Experiment Journal\n", encoding="utf-8") + (tmp_path / "journal.md").write_text("# Experiment Journal\n", encoding="utf-8") + monkeypatch.setattr(Path, "samefile", lambda self, other: False) + + runner = CliRunner() + result = runner.invoke(app, ["run", str(cfg)]) + + assert result.exit_code == 1 + assert "Both JOURNAL.md and journal.md exist" in _plain(result.output) + assert "Traceback" not in result.output diff --git a/src/reader/tests/cli/test_ux.py b/src/reader/tests/cli/test_ux.py index d19dc2d..b3057d7 100644 --- a/src/reader/tests/cli/test_ux.py +++ b/src/reader/tests/cli/test_ux.py @@ -14,13 +14,15 @@ from pathlib import Path import pandas as pd +import pytest +import typer from rich.console import Console from typer.testing import CliRunner from reader.contracts import builtin_contract_catalog from reader.protocols import ProtocolBinding, builtin_protocol_catalog from reader.runtime import ReaderRuntime -from reader.tests.support import base_reader_config, build_decl, write_config +from reader.tests.support import base_reader_config, build_decl, default_notebook_name, write_config from reader.workbench import PluginSemantics, cli from reader.workbench.assets import AssetCatalog, build_plugin_asset from 
reader.workbench.config import ReaderSpec @@ -34,6 +36,10 @@ def _plain(text: str) -> str: return re.sub(r"\x1b\[[0-?]*[ -/]*[@-~]", "", text) +def _compiled_semantic_program(payload: dict) -> dict: + return payload["implementation"]["compiled"]["semantic_program"] + + def _tidy_df() -> pd.DataFrame: return pd.DataFrame( { @@ -107,7 +113,7 @@ def test_numeric_job_index_ignores_template_dirs(monkeypatch, tmp_path: Path) -> assert cli._infer_job_path("1") == (year_dir / "config.yaml").resolve() -def test_numeric_job_index_can_resolve_scaffold_index_from_ls_all(monkeypatch, tmp_path: Path) -> None: +def test_numeric_job_index_rejects_hidden_scaffold_index(monkeypatch, tmp_path: Path) -> None: exp_root = tmp_path / "experiments" year_dir = exp_root / "2025" / "real_exp" template_dir = exp_root / "2025" / "_template_alpha" @@ -117,7 +123,8 @@ def test_numeric_job_index_can_resolve_scaffold_index_from_ls_all(monkeypatch, t write_config(year_dir / "config.yaml", _base_config()) monkeypatch.chdir(tmp_path) - assert cli._infer_job_path("1") == (template_dir / "config.yaml").resolve() + with pytest.raises(typer.BadParameter, match="hidden scaffold/template config"): + cli._infer_job_path("1") assert cli._infer_job_path("2") == (year_dir / "config.yaml").resolve() @@ -186,8 +193,10 @@ def test_ls_details_shows_protocol_and_output_counts(monkeypatch, tmp_path: Path output = test_console.export_text() assert "Protocol" in output assert "plate_reader/dual_repo" in output - assert "Selected" in output - assert "Generated" in output + assert "Selec" in output + assert "ted" in output + assert "Gener" in output + assert "ated" in output assert "1 rec" in output @@ -299,6 +308,30 @@ def test_ls_json_surfaces_legacy_outputs_without_record_catalog(tmp_path: Path) assert entry["readiness"]["records"]["legacy_outputs_present"] is True +def test_ls_json_does_not_treat_notebook_only_scaffolds_as_legacy_outputs(tmp_path: Path) -> None: + exp_root = tmp_path / "experiments" + exp_dir = 
exp_root / "2025" / "notebook_only" + exp_dir.mkdir(parents=True) + cfg_path = write_config(exp_dir / "config.yaml", base_reader_config(experiment_id="notebook_only")) + + runner = CliRunner() + scaffold_result = runner.invoke(cli.app, ["notebook", str(cfg_path), "--mode", "none"]) + assert scaffold_result.exit_code == 0, scaffold_result.output + assert (exp_dir / "outputs" / "notebooks" / default_notebook_name()).exists() + + result = runner.invoke( + cli.app, + ["ls", "--root", str(exp_root), "--details", "--readiness", "--format", "json"], + ) + assert result.exit_code == 0 + payload = json.loads(result.output) + assert payload["summary"]["by_readiness"] == {"runnable": 1} + entry = payload["experiments"][0] + assert entry["readiness"]["state"] == "runnable" + assert entry["readiness"]["records"]["catalog"] is False + assert entry["readiness"]["records"]["legacy_outputs_present"] is False + + def test_ls_can_filter_by_protocol_and_status(tmp_path: Path) -> None: exp_root = tmp_path / "experiments" good_dir = exp_root / "2025" / "good_plate" @@ -432,14 +465,16 @@ def test_steps_json_surfaces_pipeline_bindings(tmp_path: Path) -> None: result = runner.invoke(cli.app, ["steps", str(cfg), "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) assert payload["experiment"]["protocol"] == "plate_reader/dual_reporter_screen" assert payload["authoring"]["inputs"]["fold_change"]["report_times"] == [14.0] assert payload["semantics"]["program"]["metrics"][0]["id"] == "OD" - assert payload["semantics"]["program"]["metrics"][0]["execution"]["status"] == "compiled" - assert payload["semantics"]["program"]["summary"]["compiled"] >= 1 - assert payload["semantics"]["program"]["summary"]["descriptive_only"] == 0 assert payload["semantics"]["program"]["active_profile"] == "yfp_cfp_crosstalk" - assert payload["semantics"]["program"]["ranking"]["execution"]["status"] == "compiled" + assert 
compiled_program["metrics"][0]["execution"]["status"] == "compiled" + assert compiled_program["summary"]["compiled"] >= 1 + assert compiled_program["summary"]["descriptive_only"] == 0 + assert compiled_program["active_profile"] == "yfp_cfp_crosstalk" + assert compiled_program["ranking"]["execution"]["status"] == "compiled" assert payload["implementation"]["plan"]["pipeline_count"] >= 1 assert payload["implementation"]["plan"]["plots"] == [] assert payload["implementation"]["compiled"]["plots"] == [] @@ -459,15 +494,17 @@ def test_config_json_surfaces_authoring_semantics_and_implementation(tmp_path: P result = runner.invoke(cli.app, ["config", str(cfg), "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) assert payload["experiment"]["protocol"] == "plate_reader/dual_reporter_screen" assert payload["authoring"]["schema"] == "reader/v7" assert payload["authoring"]["protocol"]["id"] == "plate_reader/dual_reporter_screen" assert payload["semantics"]["program"]["metrics"][0]["id"] == "OD" - assert payload["semantics"]["program"]["metrics"][0]["execution"]["status"] == "compiled" assert payload["semantics"]["program"]["active_profile"] == "yfp_cfp_crosstalk" assert payload["semantics"]["program"]["controls"] == [] assert payload["semantics"]["program"]["windows"] == [] assert payload["semantics"]["program"]["ranking"]["primary_metric"] == "log2FC" + assert compiled_program["metrics"][0]["execution"]["status"] == "compiled" + assert compiled_program["active_profile"] == "yfp_cfp_crosstalk" assert payload["implementation"]["plan"]["pipeline_flow"][0] == "ingest" assert payload["implementation"]["compiled"]["pipeline"][0]["id"] == "ingest" assert payload["implementation"]["compiled"]["plots"][0]["id"] == "raw_kinetics" @@ -481,13 +518,14 @@ def test_explain_json_surfaces_compiled_plan(tmp_path: Path) -> None: result = runner.invoke(cli.app, ["explain", str(cfg), "--format", "json"]) 
assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) assert payload["experiment"]["protocol"] == "plate_reader/dual_reporter_screen" assert payload["authoring"]["inputs"]["fold_change"]["report_times"] == [14.0] assert payload["semantics"]["program"]["active_profile"] == "yfp_cfp_crosstalk" assert payload["semantics"]["program"]["controls"] == [] assert payload["semantics"]["program"]["windows"] == [] - assert payload["semantics"]["program"]["ranking"]["execution"]["status"] == "compiled" assert payload["semantics"]["program"]["summary"]["total"] >= 1 + assert compiled_program["ranking"]["execution"]["status"] == "compiled" assert payload["implementation"]["plan"]["pipeline_flow"][0] == "ingest" assert "sample_map" in payload["implementation"]["plan"]["resources"] assert payload["implementation"]["compiled"]["plots"][0]["semantics"]["category"] == "plot" @@ -601,8 +639,10 @@ def test_plugins_command_shows_workbench_semantics(monkeypatch) -> None: output = test_console.export_text() assert "plate_reader" in output assert "test_plot" in output - assert "Synthetic plot plugin" in output - assert "for CLI tests." in output + assert "Synthetic" in output + assert "plot plugin" in output + assert "for CLI" in output + assert "tests." 
in output def test_protocols_command_filters_by_family() -> None: @@ -622,9 +662,9 @@ def test_protocols_command_lists_builtin_protocols() -> None: assert "plate_reader/dual_reporter_screen" in result.output assert "Dual-reporter plate-reader panel protocol" in result.output assert "notebook/eda" in result.output - assert "Inputs Surface" in result.output + assert "Inputs" in result.output assert "ingest.mode" in result.output - assert "Analysis Surface" in result.output + assert "Analysis" in result.output assert "Semantic Program" in result.output assert "Plot Profiles" in result.output assert "Plot Outputs" in result.output @@ -633,6 +673,14 @@ def test_protocols_command_lists_builtin_protocols() -> None: assert "Plot Implementations" in result.output +def test_protocols_command_lists_semantic_nodes_without_ranking() -> None: + runner = CliRunner() + result = runner.invoke(cli.app, ["protocols", "plate_reader/single_reporter_screen"], terminal_width=160) + assert result.exit_code == 0 + assert "Semantic nodes" in result.output + assert "Compiled Semantic Execution" in result.output + + def test_protocols_command_can_render_example_config() -> None: runner = CliRunner() result = runner.invoke( @@ -778,14 +826,14 @@ def test_inspect_command_surfaces_pipeline_and_outputs(tmp_path: Path) -> None: assert "Experiment overview" in result.output assert "Readiness" in result.output assert "records ready" in result.output - assert "Authoring bindings" in result.output + assert "Config values" in result.output assert "Semantic Program" in result.output assert "fold_change.report_times" in result.output assert "Pipeline chain" in result.output assert "Plot outputs" in result.output - assert "Export artifacts" in result.output + assert "Exports" in result.output assert "Generated outputs" in result.output - assert "Record catalog" in result.output + assert "Records" in result.output assert "raw_kinetics" in result.output assert "crosstalk_pairs_table" in result.output @@ 
-813,13 +861,15 @@ def test_inspect_command_can_emit_json(tmp_path: Path) -> None: result = runner.invoke(cli.app, ["inspect", str(cfg_path), "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) assert payload["experiment"]["protocol"] == "plate_reader/dual_reporter_screen" assert payload["experiment"]["lifecycle"] == "active" assert payload["semantics"]["program"]["metrics"][0]["id"] == "OD" - assert payload["semantics"]["program"]["metrics"][0]["execution"]["status"] == "compiled" - assert payload["semantics"]["program"]["summary"]["descriptive_only"] == 0 assert payload["semantics"]["program"]["active_profile"] == "yfp_cfp_crosstalk" - assert payload["semantics"]["program"]["ranking"]["execution"]["status"] == "compiled" + assert compiled_program["metrics"][0]["execution"]["status"] == "compiled" + assert compiled_program["summary"]["descriptive_only"] == 0 + assert compiled_program["active_profile"] == "yfp_cfp_crosstalk" + assert compiled_program["ranking"]["execution"]["status"] == "compiled" assert payload["authoring"]["inputs"]["fold_change"]["report_times"] == [14.0] assert "sample_map" in payload["implementation"]["plan"]["resources"] assert payload["implementation"]["inputs"]["counts"]["files"] == 2 @@ -844,8 +894,13 @@ def test_plugins_command_can_filter_by_protocol(monkeypatch) -> None: output = test_console.export_text() assert "plate_reader/dual_reporter_screen" in output - assert "Attach well-position sample maps" in output - assert "Summarize nearest-time fold-change tables" in output + assert "Attach" in output + assert "well-posit" in output + assert "ion sample" in output + assert "maps" in output + assert "Summarize" in output + assert "nearest-ti" in output + assert "fold-chang" in output assert "validator" not in output @@ -854,21 +909,23 @@ def test_protocols_command_can_emit_json() -> None: result = runner.invoke(cli.app, ["protocols", 
"plate_reader/dual_reporter_screen", "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) metrics = {item["id"]: item for item in payload["semantics"]["program"]["metrics"]} assert payload["protocol"] == "plate_reader/dual_reporter_screen" - assert metrics["OD"]["execution"]["status"] == "compiled" assert metrics["Ratio"]["formula"] == "YFP / CFP" assert metrics["Ratio"]["value_space"] == "linear_ratio" assert metrics["Ratio"]["unit"] == "ratio" assert metrics["Ratio"]["comparable_group"] == "primary_ratio_linear" - assert metrics["Ratio"]["execution"]["step_ids"] == ["ratio_yfp_cfp"] assert payload["semantics"]["program"]["active_profile"] == "yfp_cfp_fold_change" - assert payload["semantics"]["program"]["summary"]["descriptive_only"] == 0 assert payload["semantics"]["program"]["controls"] == [] assert payload["semantics"]["program"]["windows"] == [] - assert metrics["FC"]["execution"]["step_ids"] == ["fold_change__yfp_over_cfp"] - assert metrics["log2FC"]["execution"]["record_ids"] == ["fold_change__yfp_over_cfp/table"] assert payload["semantics"]["program"]["ranking"] is None + compiled_metrics = {item["id"]: item for item in compiled_program["metrics"]} + assert compiled_metrics["OD"]["execution"]["status"] == "compiled" + assert compiled_metrics["Ratio"]["execution"]["step_ids"] == ["ratio_yfp_cfp"] + assert compiled_program["summary"]["descriptive_only"] == 0 + assert compiled_metrics["FC"]["execution"]["step_ids"] == ["fold_change__yfp_over_cfp"] + assert compiled_metrics["log2FC"]["execution"]["record_ids"] == ["fold_change__yfp_over_cfp/table"] assert payload["authoring"]["starter_config"]["schema"] == "reader/v7" assert payload["implementation"]["compiled"]["pipeline"][0]["id"] == "ingest" assert any(item["id"] == "screen_overview" for item in payload["authoring"]["outputs"]["plot_profiles"]) @@ -882,14 +939,16 @@ def 
test_protocols_command_json_surfaces_compiled_logic_semantic_program() -> No result = runner.invoke(cli.app, ["protocols", "logic/sfxi_screen", "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) assert payload["protocol"] == "logic/sfxi_screen" - assert payload["semantics"]["program"]["summary"]["compiled"] == 4 - assert payload["semantics"]["program"]["summary"]["descriptive_only"] == 0 - assert payload["semantics"]["program"]["controls"][0]["execution"]["status"] == "compiled" - assert payload["semantics"]["program"]["controls"][0]["execution"]["step_ids"] == ["sfxi_vec8"] - assert payload["semantics"]["program"]["windows"][0]["execution"]["status"] == "compiled" - assert payload["semantics"]["program"]["metrics"][0]["execution"]["record_ids"] == ["sfxi_vec8/vec8"] - assert payload["semantics"]["program"]["ranking"]["execution"]["status"] == "compiled" + assert payload["semantics"]["program"]["summary"]["total"] == 4 + assert compiled_program["summary"]["compiled"] == 4 + assert compiled_program["summary"]["descriptive_only"] == 0 + assert compiled_program["controls"][0]["execution"]["status"] == "compiled" + assert compiled_program["controls"][0]["execution"]["step_ids"] == ["sfxi_vec8"] + assert compiled_program["windows"][0]["execution"]["status"] == "compiled" + assert compiled_program["metrics"][0]["execution"]["record_ids"] == ["sfxi_vec8/vec8"] + assert compiled_program["ranking"]["execution"]["status"] == "compiled" def test_protocols_command_json_surfaces_retron_sponge_semantics() -> None: @@ -897,8 +956,10 @@ def test_protocols_command_json_surfaces_retron_sponge_semantics() -> None: result = runner.invoke(cli.app, ["protocols", "plate_reader/retron_sponge_screen", "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) program = payload["semantics"]["program"] metrics = 
{item["id"]: item for item in program["metrics"]} + compiled_metrics = {item["id"]: item for item in compiled_program["metrics"]} figure_ids = {item["id"] for item in payload["authoring"]["outputs"]["figures"]} plot_profile_ids = {item["id"] for item in payload["authoring"]["outputs"]["plot_profiles"]} assert payload["protocol"] == "plate_reader/retron_sponge_screen" @@ -933,16 +994,16 @@ def test_protocols_command_json_surfaces_retron_sponge_semantics() -> None: assert metrics["R"]["value_space"] == "log2_ratio" assert metrics["R"]["unit"] == "log2_ratio" assert metrics["R"]["comparable_group"] == "primary_ratio_log2" - assert metrics["R"]["execution"]["status"] == "compiled" - assert metrics["R"]["execution"]["record_ids"] == ["semantic_metrics/trace"] - assert metrics["D_AUC"]["execution"]["status"] == "compiled" - assert metrics["D_AUC"]["execution"]["record_ids"] == ["semantic_metrics/summary"] - assert metrics["D_abs_AUC"]["execution"]["record_ids"] == ["semantic_metrics/summary"] - assert metrics["D_growth_AUC"]["execution"]["record_ids"] == ["semantic_metrics/summary"] - assert program["controls"][0]["execution"]["status"] == "compiled" - assert program["windows"][0]["execution"]["step_ids"] == ["semantic_metrics"] + assert compiled_metrics["R"]["execution"]["status"] == "compiled" + assert compiled_metrics["R"]["execution"]["record_ids"] == ["semantic_metrics/trace"] + assert compiled_metrics["D_AUC"]["execution"]["status"] == "compiled" + assert compiled_metrics["D_AUC"]["execution"]["record_ids"] == ["semantic_metrics/summary"] + assert compiled_metrics["D_abs_AUC"]["execution"]["record_ids"] == ["semantic_metrics/summary"] + assert compiled_metrics["D_growth_AUC"]["execution"]["record_ids"] == ["semantic_metrics/summary"] + assert compiled_program["controls"][0]["execution"]["status"] == "compiled" + assert compiled_program["windows"][0]["execution"]["step_ids"] == ["semantic_metrics"] assert program["ranking"]["primary_metric"] == "O_abs_AUC" - assert 
program["ranking"]["execution"]["record_ids"] == ["semantic_metrics/summary"] + assert compiled_program["ranking"]["execution"]["record_ids"] == ["semantic_metrics/summary"] assert payload["implementation"]["compiled"]["pipeline"][-1]["id"] == "semantic_metrics" @@ -966,33 +1027,38 @@ def test_inspect_json_surfaces_active_single_reporter_semantic_profile(tmp_path: result = runner.invoke(cli.app, ["inspect", str(cfg_path), "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) program = payload["semantics"]["program"] metrics = {item["id"]: item for item in program["metrics"]} + compiled_metrics = {item["id"]: item for item in compiled_program["metrics"]} assert program["active_profile"] == "single_reporter_raw" assert {item["id"] for item in program["profiles"]} == { "single_reporter_raw", "single_reporter_fold_change", } assert set(metrics) == {"Normalizer", "Reporter", "Reporter_Normalizer"} - assert metrics["Reporter"]["execution"]["status"] == "compiled" assert metrics["Normalizer"]["formula"] == "configured_normalizer_channel" - assert metrics["Normalizer"]["execution"]["note"] == "Raw OD700 values are materialized on the ingest dataframe." assert metrics["Reporter_Normalizer"]["formula"] == "configured_reporter_channel / configured_normalizer_channel" - assert metrics["Reporter_Normalizer"]["execution"]["step_ids"] == ["ratio_reporter_normalizer"] + assert compiled_metrics["Reporter"]["execution"]["status"] == "compiled" + assert ( + compiled_metrics["Normalizer"]["execution"]["note"] + == "Raw OD700 values are materialized on the ingest dataframe." 
+ ) + assert compiled_metrics["Reporter_Normalizer"]["execution"]["step_ids"] == ["ratio_reporter_normalizer"] assert program["controls"] == [] assert program["windows"] == [] assert program["ranking"] is None assert program["summary"] == { "total": 3, - "compiled": 3, - "descriptive_only": 0, "by_kind": { - "control_rule": {"total": 0, "compiled": 0, "descriptive_only": 0}, - "window": {"total": 0, "compiled": 0, "descriptive_only": 0}, - "metric": {"total": 3, "compiled": 3, "descriptive_only": 0}, - "ranking": {"total": 0, "compiled": 0, "descriptive_only": 0}, + "control_rule": 0, + "window": 0, + "metric": 3, + "ranking": 0, }, } + assert compiled_program["summary"]["compiled"] == 3 + assert compiled_program["summary"]["descriptive_only"] == 0 def test_inspect_json_surfaces_active_single_reporter_retron_sponge_profile(tmp_path: Path) -> None: @@ -1019,8 +1085,10 @@ def test_inspect_json_surfaces_active_single_reporter_retron_sponge_profile(tmp_ result = runner.invoke(cli.app, ["inspect", str(cfg_path), "--format", "json"]) assert result.exit_code == 0 payload = json.loads(result.output) + compiled_program = _compiled_semantic_program(payload) program = payload["semantics"]["program"] metrics = {item["id"]: item for item in program["metrics"]} + compiled_metrics = {item["id"]: item for item in compiled_program["metrics"]} assert program["active_profile"] == "single_reporter" assert {item["id"] for item in program["profiles"]} == {"yfp_cfp", "single_reporter"} @@ -1043,20 +1111,20 @@ def test_inspect_json_surfaces_active_single_reporter_retron_sponge_profile(tmp_ assert "CFP" not in metrics assert metrics["OD"]["formula"] == "configured_growth_channel" assert metrics["OD"]["summary"] == "Raw configured growth-proxy trace." 
- assert metrics["Reporter"]["execution"]["status"] == "compiled" assert metrics["R"]["formula"] == "log2(configured_reporter_channel / configured_growth_channel)" assert metrics["mu"]["formula"] == "d(log(configured_growth_channel)) / dt" assert metrics["R"]["value_space"] == "log2_ratio" - assert metrics["R"]["execution"]["step_ids"] == ["semantic_metrics"] assert metrics["Reporter_OD"]["formula"] == "configured_reporter_channel / configured_growth_channel" - assert metrics["Reporter_OD"]["execution"]["step_ids"] == ["ratio_reporter_normalizer"] + assert compiled_metrics["Reporter"]["execution"]["status"] == "compiled" + assert compiled_metrics["R"]["execution"]["step_ids"] == ["semantic_metrics"] + assert compiled_metrics["Reporter_OD"]["execution"]["step_ids"] == ["ratio_reporter_normalizer"] assert ( - metrics["Reporter_OD"]["execution"]["note"] + compiled_metrics["Reporter_OD"]["execution"]["note"] == "The mCherry/OD700 support channel is materialized as a ratio step output." ) - assert program["controls"][0]["execution"]["step_ids"] == ["semantic_metrics"] + assert compiled_program["controls"][0]["execution"]["step_ids"] == ["semantic_metrics"] assert program["ranking"]["primary_metric"] == "O_abs_AUC" - assert program["ranking"]["execution"]["record_ids"] == ["semantic_metrics/summary"] + assert compiled_program["ranking"]["execution"]["record_ids"] == ["semantic_metrics/summary"] def test_plate_reader_single_reporter_compiler_derives_channels_from_analysis() -> None: diff --git a/src/reader/tests/domains/logic/sfxi/test_setpoint_scatter.py b/src/reader/tests/domains/logic/sfxi/test_setpoint_scatter.py new file mode 100644 index 0000000..3b61ed6 --- /dev/null +++ b/src/reader/tests/domains/logic/sfxi/test_setpoint_scatter.py @@ -0,0 +1,150 @@ +from __future__ import annotations + +import importlib +from types import SimpleNamespace + +import pandas as pd +import pytest + +from reader.domains.logic.sfxi.setpoint_scatter import score_sfxi_setpoints +from 
reader.errors import SFXIError + + +def _vec8_df() -> pd.DataFrame: + return pd.DataFrame( + { + "design_id": ["p01", "p02"], + "sequence": ["AAAA", "CCCC"], + "experiment_id": ["exp_a", "exp_b"], + "experiment_date": ["20260101", "20260102"], + "time_selected_h": [10.0, 10.0], + "v00": [0.0, 0.0], + "v10": [0.0, 0.0], + "v01": [0.0, 0.0], + "v11": [1.0, 1.0], + "y00_star": [0.0, 0.0], + "y10_star": [0.0, 0.0], + "y01_star": [0.0, 0.0], + "y11_star": [1.0, 0.0], + "r_logic": [8.0, 4.0], + "flat_logic": [False, False], + } + ) + + +def test_score_sfxi_setpoints_uses_canonical_objective_names(monkeypatch) -> None: + class _FakeConfig: + def __init__(self, **kwargs): + self.kwargs = kwargs + + class _FakeResult: + api_version = "1" + objective_name = "sfxi_v1" + + def to_records(self): + return [ + { + "objective_name": self.objective_name, + "api_version": self.api_version, + "state_order": ["00", "10", "01", "11"], + "setpoint_vector": [0.0, 0.0, 0.0, 1.0], + "denom_percentile": 95, + "denom_used": 2.0, + "logic_fidelity": 1.0, + "effect_raw": 2.0, + "effect_scaled": 1.0, + "sfxi": 1.0, + "clip_lo_mask": False, + "clip_hi_mask": True, + "intensity_disabled": False, + }, + { + "objective_name": self.objective_name, + "api_version": self.api_version, + "state_order": ["00", "10", "01", "11"], + "setpoint_vector": [0.0, 0.0, 0.0, 1.0], + "denom_percentile": 95, + "denom_used": 2.0, + "logic_fidelity": 1.0, + "effect_raw": 1.0, + "effect_scaled": 0.5, + "sfxi": 0.5, + "clip_lo_mask": False, + "clip_hi_mask": False, + "intensity_disabled": False, + }, + ] + + fake_api = SimpleNamespace( + SFXI_API_VERSION="1", + SFXIScoringConfig=_FakeConfig, + score_vec8=lambda *args, **kwargs: _FakeResult(), + ) + real_import = importlib.import_module + + def _fake_import(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + return fake_api + return real_import(name, package) + + monkeypatch.setattr(importlib, "import_module", _fake_import) + + scored = 
score_sfxi_setpoints( + _vec8_df(), + setpoints={"and": [0.0, 0.0, 0.0, 1.0]}, + scaling_min_n=1, + ) + + assert list(scored["design_id"]) == ["p01", "p02"] + assert list(scored["experiment_id"]) == ["exp_a", "exp_b"] + assert list(scored["experiment_date"]) == ["20260101", "20260102"] + assert list(scored["setpoint_name"]) == ["and", "and"] + assert scored["logic_fidelity"].tolist() == pytest.approx([1.0, 1.0]) + assert scored["effect_scaled"].tolist() == pytest.approx([1.0, 0.5]) + assert scored["sfxi"].tolist() == pytest.approx([1.0, 0.5]) + assert "score" not in scored.columns + assert "f_logic" not in scored.columns + assert "e_scaled" not in scored.columns + + +def test_score_sfxi_setpoints_reports_missing_public_dnadesign_api(monkeypatch) -> None: + real_import = importlib.import_module + + def _blocked(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + raise ModuleNotFoundError(name) + return real_import(name, package) + + monkeypatch.setattr(importlib, "import_module", _blocked) + + with pytest.raises(SFXIError, match=r"reader\[dnadesign\]"): + score_sfxi_setpoints(_vec8_df(), setpoints={"and": [0.0, 0.0, 0.0, 1.0]}, scaling_min_n=1) + + +def test_score_sfxi_setpoints_wraps_transitive_public_api_import_failures(monkeypatch) -> None: + def _blocked(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + raise ImportError("missing transitive dependency") + raise AssertionError((name, package)) + + monkeypatch.setattr(importlib, "import_module", _blocked) + + with pytest.raises(SFXIError, match=r"reader\[dnadesign\]") as exc_info: + score_sfxi_setpoints(_vec8_df(), setpoints={"and": [0.0, 0.0, 0.0, 1.0]}, scaling_min_n=1) + + assert isinstance(exc_info.value.__cause__, ImportError) + + +def test_score_sfxi_setpoints_rejects_unsupported_public_api_version(monkeypatch) -> None: + fake_api = SimpleNamespace(SFXI_API_VERSION="2", SFXIScoringConfig=object, score_vec8=lambda *args, **kwargs: None) + 
real_import = importlib.import_module + + def _fake_import(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + return fake_api + return real_import(name, package) + + monkeypatch.setattr(importlib, "import_module", _fake_import) + + with pytest.raises(SFXIError, match="Unsupported dnadesign SFXI API version"): + score_sfxi_setpoints(_vec8_df(), setpoints={"and": [0.0, 0.0, 0.0, 1.0]}, scaling_min_n=1) diff --git a/src/reader/tests/domains/plate_reader/plots/test_plot_hardening.py b/src/reader/tests/domains/plate_reader/plots/test_plot_hardening.py index 72a87da..0c50584 100644 --- a/src/reader/tests/domains/plate_reader/plots/test_plot_hardening.py +++ b/src/reader/tests/domains/plate_reader/plots/test_plot_hardening.py @@ -28,7 +28,7 @@ from reader.errors import ConfigError from reader.plugins.plot.snapshot_barplot import SnapshotBarCfg from reader.plugins.plot.snapshot_heatmap import HeatmapCfg, SnapshotHeatmapPlot -from reader.protocols import ProtocolBinding +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram from reader.tests.support import load_decl, write_config from reader.workbench.engine import validate as validate_job from reader.workbench.experiment import ( @@ -213,6 +213,7 @@ def test_snapshot_heatmap_render_resolves_order_refs_from_semantics() -> None: logger=logging.getLogger("reader.tests"), experiment=ExperimentSemantics( protocol=ProtocolBinding(id="plate_reader/dual_reporter_screen"), + protocol_program=ProtocolSemanticProgram(protocol="plate_reader/dual_reporter_screen"), annotations=AnnotationSemantics( orders=AnnotationOrders( by_id={ @@ -261,6 +262,7 @@ def test_snapshot_heatmap_render_rejects_unknown_order_ref() -> None: logger=logging.getLogger("reader.tests"), experiment=ExperimentSemantics( protocol=ProtocolBinding(id="plate_reader/dual_reporter_screen"), + protocol_program=ProtocolSemanticProgram(protocol="plate_reader/dual_reporter_screen"), 
annotations=AnnotationSemantics(orders=AnnotationOrders()), resources=ResourceCatalog(), layout=OutputLayout( diff --git a/src/reader/tests/engine/test_explain.py b/src/reader/tests/engine/test_explain.py index 215acb5..c845f65 100644 --- a/src/reader/tests/engine/test_explain.py +++ b/src/reader/tests/engine/test_explain.py @@ -12,7 +12,7 @@ from rich.console import Console from reader.contracts import OutputContractSurface, builtin_contract_catalog -from reader.protocols import ProtocolBinding +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram from reader.tests.support import base_reader_config, build_decl from reader.workbench import PluginSemantics from reader.workbench.assets import build_plugin_asset @@ -61,6 +61,7 @@ def _workbench_decl( ) -> WorkbenchDecl: semantics = ExperimentSemantics( protocol=ProtocolBinding(id="workbench/generic"), + protocol_program=ProtocolSemanticProgram(protocol="workbench/generic"), annotations=AnnotationSemantics(), resources=ResourceCatalog(), layout=OutputLayout( diff --git a/src/reader/tests/notebooks/test_dual_reporter_triptych.py b/src/reader/tests/notebooks/test_dual_reporter_triptych.py new file mode 100644 index 0000000..569c4a6 --- /dev/null +++ b/src/reader/tests/notebooks/test_dual_reporter_triptych.py @@ -0,0 +1,181 @@ +from __future__ import annotations + +import pandas as pd +import pytest + +from reader.protocols import ProtocolBinding, builtin_protocol_catalog +from reader.workbench.notebooks.dual_reporter_triptych import ( + build_dual_reporter_triptych_chart, + build_triptych_data, + summarize_design_context, +) +from reader.workbench.templates import ( + compatible_notebook_templates, + resolve_notebook_template_descriptor, +) + + +def _dual_reporter_df() -> pd.DataFrame: + rows: list[dict[str, object]] = [] + for treatment in ("water", "EtOH"): + for time in (0.0, 1.0, 2.0): + for rep in (1, 2): + rows.append( + { + "design_id": "pTest", + "treatment": treatment, + "time": time, + 
"channel": "OD600", + "value": 0.1 + time + rep * 0.01, + "position": f"A{rep}", + } + ) + rows.append( + { + "design_id": "pTest", + "treatment": treatment, + "time": time, + "channel": "YFP/CFP", + "value": (2.0 if treatment == "EtOH" else 1.0) + time + rep * 0.1, + "position": f"A{rep}", + } + ) + return pd.DataFrame(rows) + + +def test_dual_reporter_triptych_builds_three_panel_data() -> None: + result = build_triptych_data( + _dual_reporter_df(), + time_col="time", + treatment_col="treatment", + growth_channel="OD600", + ratio_channel="YFP/CFP", + snapshot_channel="YFP/CFP", + snapshot_time=1.0, + treatment_order=["water", "EtOH"], + ) + + assert list(result.treatment_order) == ["water", "EtOH"] + assert set(result.od600_time["treatment"]) == {"water", "EtOH"} + assert set(result.ratio_time["treatment"]) == {"water", "EtOH"} + assert list(result.snapshot_stats["treatment"]) == ["water", "EtOH"] + assert result.snapshot_points.shape[0] == 4 + + +def test_dual_reporter_triptych_explicit_treatment_order_is_closed() -> None: + df = _dual_reporter_df() + extra_rows = df[df["treatment"] == "water"].copy() + extra_rows["treatment"] = "unexpected" + df = pd.concat([df, extra_rows], ignore_index=True) + + result = build_triptych_data( + df, + time_col="time", + treatment_col="treatment", + growth_channel="OD600", + ratio_channel="YFP/CFP", + snapshot_channel="YFP/CFP", + snapshot_time=1.0, + treatment_order=["water", "EtOH", "AND"], + ) + + assert list(result.treatment_order) == ["water", "EtOH", "AND"] + assert list(result.missing_treatments) == ["AND"] + assert "unexpected" not in set(result.od600_time["treatment"]) + assert "unexpected" not in set(result.ratio_time["treatment"]) + assert "unexpected" not in set(result.snapshot_points["treatment"]) + + +def test_dual_reporter_triptych_chart_uses_square_panels_and_full_treatment_domain() -> None: + alt = pytest.importorskip("altair") + result = build_triptych_data( + _dual_reporter_df(), + time_col="time", + 
treatment_col="treatment", + growth_channel="OD600", + ratio_channel="YFP/CFP", + snapshot_channel="YFP/CFP", + snapshot_time=1.0, + treatment_order=["water", "EtOH", "AND"], + ) + + chart = build_dual_reporter_triptych_chart( + alt=alt, + pd_module=pd, + data=result, + time_col="time", + treatment_col="treatment", + ) + spec = chart.to_dict() + + assert spec["spacing"] == 16 + assert [panel["width"] for panel in spec["hconcat"]] == [260, 260, 260] + assert [panel["height"] for panel in spec["hconcat"]] == [260, 260, 260] + assert spec["hconcat"][2]["layer"][0]["encoding"]["x"]["scale"]["domain"] == ["water", "EtOH", "AND"] + + +def test_dual_reporter_triptych_design_context_summarizes_identity_columns() -> None: + df = _dual_reporter_df() + df["design_id_alias"] = "alias-A" + df["id"] = "uuid-1" + df["sequence"] = "A" * 120 + + rows = summarize_design_context( + df, + primary_col="design_id", + primary_value="pTest", + preferred_columns=("design_id_alias", "design_id", "id", "sequence"), + ) + + assert rows[0] == ("design_id", "pTest") + assert ("design_id_alias", "alias-A") in rows + assert ("id", "uuid-1") in rows + sequence_value = dict(rows)["sequence"] + assert len(sequence_value) < 90 + assert sequence_value.startswith("AAAA") + assert sequence_value.endswith("AAAA") + + +def test_dual_reporter_triptych_rejects_missing_channels() -> None: + df = _dual_reporter_df() + df = df[df["channel"] != "YFP/CFP"] + + try: + build_triptych_data( + df, + time_col="time", + treatment_col="treatment", + growth_channel="OD600", + ratio_channel="YFP/CFP", + snapshot_channel="YFP/CFP", + snapshot_time=1.0, + treatment_order=["water", "EtOH"], + ) + except ValueError as exc: + assert "YFP/CFP" in str(exc) + else: # pragma: no cover - explicit failure path + raise AssertionError("missing ratio channel should fail fast") + + +def test_dual_reporter_triptych_template_is_protocol_neutral() -> None: + descriptor = 
resolve_notebook_template_descriptor("notebook/dual_reporter_triptych") + + assert descriptor.domain == "plate_reader" + assert "sfxi" not in descriptor.tags + body = descriptor.load_body() + assert "Dual-reporter triptych" in body + assert "debounce=True" in body + assert "chart_selection=False" in body + assert "mo.output.replace(_chart_panel)" in body + assert "Selected design" in body + assert "Triptych context" not in body + assert "summarize_design_context" in body + assert "Export 8-vector" not in body + + +def test_dual_reporter_screen_allows_triptych_without_sfxi_vec8() -> None: + protocol = builtin_protocol_catalog().bind(ProtocolBinding(id="plate_reader/dual_reporter_screen")) + templates = [item.template for item in compatible_notebook_templates(protocol=protocol)] + + assert "notebook/dual_reporter_triptych" in templates + assert "notebook/sfxi_eda" not in templates diff --git a/src/reader/tests/notebooks/test_launch.py b/src/reader/tests/notebooks/test_launch.py index 70276b2..c281764 100644 --- a/src/reader/tests/notebooks/test_launch.py +++ b/src/reader/tests/notebooks/test_launch.py @@ -1,6 +1,7 @@ from __future__ import annotations import json +from dataclasses import asdict from pathlib import Path import pytest @@ -40,6 +41,16 @@ def test_plan_marimo_launch_uses_repo_local_runtime_dirs(tmp_path: Path) -> None assert "--no-token" in plan.cmd +def test_plan_marimo_launch_missing_target_fails_before_creating_runtime_dirs(tmp_path: Path) -> None: + repo_root, notebook = _make_repo(tmp_path) + missing = notebook.with_name("missing.py") + + with pytest.raises(FileNotFoundError): + launch.plan_marimo_launch(mode="run", target=missing, headless=True, base_env={}) + + assert not (repo_root / ".cache" / "marimo").exists() + + def test_plan_marimo_launch_reuses_live_session_for_same_notebook(monkeypatch, tmp_path: Path) -> None: _, notebook = _make_repo(tmp_path) runtime_paths = launch._runtime_paths_for_target(notebook) @@ -58,7 +69,7 @@ def 
test_plan_marimo_launch_reuses_live_session_for_same_notebook(monkeypatch, t notebook_size_bytes=22, runtime_fingerprint="fp-current", ) - runtime_paths.registry_path.write_text(json.dumps([launch.asdict(record)]), encoding="utf-8") + runtime_paths.registry_path.write_text(json.dumps([asdict(record)]), encoding="utf-8") monkeypatch.setattr(launch, "_pid_is_live", lambda pid: True) monkeypatch.setattr(launch, "_port_is_open", lambda host, port, timeout=0.15: True) @@ -88,7 +99,7 @@ def test_plan_marimo_launch_restarts_stale_same_notebook_session_on_runtime_drif notebook_size_bytes=22, runtime_fingerprint="fp-stale", ) - runtime_paths.registry_path.write_text(json.dumps([launch.asdict(record)]), encoding="utf-8") + runtime_paths.registry_path.write_text(json.dumps([asdict(record)]), encoding="utf-8") monkeypatch.setattr(launch, "_pid_is_live", lambda pid: True) monkeypatch.setattr(launch, "_target_signature", lambda target: (11, 22)) monkeypatch.setattr(launch, "_runtime_fingerprint", lambda repo_root: "fp-current") @@ -128,7 +139,7 @@ def test_plan_marimo_launch_prunes_same_experiment_sessions(monkeypatch, tmp_pat repo_root=str(notebook.parents[4].resolve()), launched_at=1.0, ) - runtime_paths.registry_path.write_text(json.dumps([launch.asdict(record)]), encoding="utf-8") + runtime_paths.registry_path.write_text(json.dumps([asdict(record)]), encoding="utf-8") monkeypatch.setattr(launch, "_pid_is_live", lambda pid: True) monkeypatch.setattr(launch, "_port_is_open", lambda host, port, timeout=0.15: False) terminated: list[int] = [] @@ -157,3 +168,30 @@ def test_plan_marimo_launch_rejects_busy_explicit_port(monkeypatch, tmp_path: Pa preferred_port=9999, base_env={}, ) + + +def test_register_and_unregister_managed_session_round_trip(monkeypatch, tmp_path: Path) -> None: + _, notebook = _make_repo(tmp_path) + runtime_paths = launch._runtime_paths_for_target(notebook) + monkeypatch.setattr(launch, "_pid_is_live", lambda pid: True) + + launch.register_managed_session( + 
registry_path=runtime_paths.registry_path, + pid=1234, + port=2718, + host="127.0.0.1", + mode="edit", + target=notebook, + ) + + records = launch._load_registry(runtime_paths.registry_path) + assert len(records) == 1 + record = records[0] + assert record.pid == 1234 + assert record.port == 2718 + assert record.notebook == str(notebook.resolve()) + assert record.experiment_root == str(notebook.parents[2].resolve()) + + launch.unregister_managed_session(registry_path=runtime_paths.registry_path, pid=1234) + + assert launch._load_registry(runtime_paths.registry_path) == [] diff --git a/src/reader/tests/notebooks/test_templates.py b/src/reader/tests/notebooks/test_templates.py index e2e6696..4577755 100644 --- a/src/reader/tests/notebooks/test_templates.py +++ b/src/reader/tests/notebooks/test_templates.py @@ -105,10 +105,42 @@ def test_notebook_template_uses_explicit_record_scan_placeholder() -> None: assert "allow_scan=__ALLOW_RECORD_SCAN__" in template +def test_triptych_notebook_templates_debounce_snapshot_time_slider() -> None: + for template_name in ("notebook/dual_reporter_triptych", "notebook/sfxi_eda"): + template = resolve_notebook_template_descriptor(template_name).load_body() + assert "debounce=True" in template + assert "chart_selection=False" in template + assert "legend_selection=False" in template + assert "min-height" in template + assert "mo.output.replace(_chart_panel)" in template + assert "Selected design" in template + assert "Triptych context" not in template + assert "Design alias" in template + assert "Sequence" in template + + +def test_sfxi_notebook_uses_protocol_bound_transform_config() -> None: + template = resolve_notebook_template_descriptor("notebook/sfxi_eda").load_body() + + assert "bind_protocol(decl.experiment_semantics.protocol)" in template + assert "effective_plugin_config(" in template + + +def test_sfxi_notebook_triptych_uses_closed_corner_condition_labels() -> None: + template = 
resolve_notebook_template_descriptor("notebook/sfxi_eda").load_body() + + assert "sfxi_condition_order" in template + assert 'f"{_corner}: {sfxi_cfg.treatment_map[_corner]}"' in template + assert 'sfxi_triptych_treatment_col = "sfxi_condition"' in template + assert "sfxi_triptych_rows[sfxi_triptych_treatment_col].isin(sfxi_condition_order)" in template + assert "treatment_order=sfxi_condition_order" in template + + def test_notebook_template_catalog_exposes_domain_semantics() -> None: descriptors = {item.template: item for item in builtin_notebook_template_catalog().all()} assert descriptors["notebook/eda"].domain == "generic" assert descriptors["notebook/microplate"].domain == "plate_reader" + assert descriptors["notebook/dual_reporter_triptych"].domain == "plate_reader" assert descriptors["notebook/retron_sponge"].domain == "plate_reader" assert descriptors["notebook/retron_sponge_aggregate"].domain == "generic" assert descriptors["notebook/cytometry"].domain == "cytometry" @@ -148,7 +180,7 @@ def test_notebook_template_default_selection_uses_protocol_policy() -> None: def test_notebook_template_catalog_filters_by_protocol() -> None: protocol = builtin_protocol_catalog().bind(ProtocolBinding(id="logic/sfxi_screen")) templates = [item.template for item in compatible_notebook_templates(protocol=protocol)] - assert templates == ["notebook/sfxi_eda", "notebook/eda", "notebook/basic"] + assert templates == ["notebook/sfxi_eda", "notebook/dual_reporter_triptych", "notebook/eda", "notebook/basic"] descriptor = require_notebook_template_for_protocol("notebook/sfxi_eda", protocol=protocol) assert descriptor.template == "notebook/sfxi_eda" with pytest.raises(ConfigError, match="does not allow notebook template"): @@ -169,6 +201,7 @@ def test_generic_notebook_template_catalog_includes_retron_aggregate_review() -> "notebook/retron_sponge_aggregate", "notebook/eda", "notebook/microplate", + "notebook/dual_reporter_triptych", "notebook/cytometry", "notebook/sfxi_eda", ] diff 
--git a/src/reader/tests/plots/test_render_path.py b/src/reader/tests/plots/test_render_path.py index d5431ac..4a697f4 100644 --- a/src/reader/tests/plots/test_render_path.py +++ b/src/reader/tests/plots/test_render_path.py @@ -14,7 +14,7 @@ import pandas as pd from reader.contracts import builtin_contract_catalog -from reader.protocols import ProtocolBinding, builtin_protocol_catalog +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram, builtin_protocol_catalog from reader.runtime import ReaderRuntime from reader.workbench import PluginSemantics, resolve_workbench from reader.workbench.assets import AssetCatalog, build_plugin_asset @@ -86,6 +86,7 @@ def test_plot_save_calls_render(tmp_path: Path) -> None: experiment=ExperimentDecl(id="exp_plot", title="exp_plot", lifecycle="active", root=tmp_path), experiment_semantics=ExperimentSemantics( protocol=ProtocolBinding(id="workbench/generic"), + protocol_program=ProtocolSemanticProgram(protocol="workbench/generic"), annotations=AnnotationSemantics(), resources=ResourceCatalog(), layout=OutputLayout( diff --git a/src/reader/tests/plugins/plot/test_sfxi_setpoint_scatter.py b/src/reader/tests/plugins/plot/test_sfxi_setpoint_scatter.py new file mode 100644 index 0000000..3d1ba6f --- /dev/null +++ b/src/reader/tests/plugins/plot/test_sfxi_setpoint_scatter.py @@ -0,0 +1,188 @@ +from __future__ import annotations + +import importlib +from pathlib import Path +from types import SimpleNamespace + +import pandas as pd + +from reader.contracts import builtin_contract_catalog +from reader.plugins.plot.sfxi_setpoint_scatter import SFXISetpointScatterCfg, SFXISetpointScatterPlot +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram +from reader.runtime import builtin_runtime +from reader.workbench import PluginSemantics +from reader.workbench.assets import build_plugin_asset +from reader.workbench.decl.model import ( + ExperimentDecl, + NotebookDecl, + PipelineDecl, + PluginStepDecl, + 
RecordInputDecl, + SurfaceDecl, + WorkbenchDecl, +) +from reader.workbench.engine import run_spec +from reader.workbench.experiment import AnnotationSemantics, ExperimentSemantics, OutputLayout, ResourceCatalog +from reader.workbench.graph import resolve_workbench + + +def _vec8_df() -> pd.DataFrame: + return pd.DataFrame( + { + "design_id": ["p01", "p02"], + "reference_design_id": ["REF", "REF"], + "time_selected_h": [10.0, 10.0], + "r_logic": [8.0, 4.0], + "v00": [0.0, 0.0], + "v10": [0.0, 0.0], + "v01": [0.0, 0.0], + "v11": [1.0, 1.0], + "y00_star": [0.0, 0.0], + "y10_star": [0.0, 0.0], + "y01_star": [0.0, 0.0], + "y11_star": [1.0, 0.0], + "flat_logic": [False, False], + } + ) + + +def _install_fake_dnadesign_api(monkeypatch) -> None: + class _FakeConfig: + def __init__(self, **kwargs): + self.kwargs = kwargs + + class _FakeResult: + api_version = "1" + objective_name = "sfxi_v1" + + def to_records(self): + return [ + { + "objective_name": "sfxi_v1", + "api_version": "1", + "state_order": ["00", "10", "01", "11"], + "setpoint_vector": [0.0, 0.0, 0.0, 1.0], + "denom_percentile": 95, + "denom_used": 2.0, + "logic_fidelity": 1.0, + "effect_raw": 2.0, + "effect_scaled": 1.0, + "sfxi": 1.0, + "clip_lo_mask": False, + "clip_hi_mask": True, + "intensity_disabled": False, + }, + { + "objective_name": "sfxi_v1", + "api_version": "1", + "state_order": ["00", "10", "01", "11"], + "setpoint_vector": [0.0, 0.0, 0.0, 1.0], + "denom_percentile": 95, + "denom_used": 2.0, + "logic_fidelity": 1.0, + "effect_raw": 1.0, + "effect_scaled": 0.5, + "sfxi": 0.5, + "clip_lo_mask": False, + "clip_hi_mask": False, + "intensity_disabled": False, + }, + ] + + fake_api = SimpleNamespace( + SFXI_API_VERSION="1", + SFXIScoringConfig=_FakeConfig, + score_vec8=lambda *args, **kwargs: _FakeResult(), + ) + real_import = importlib.import_module + + def _fake_import(name: str, package: str | None = None): + if name == "dnadesign.opal.api.sfxi": + return fake_api + return real_import(name, package) + 
+ monkeypatch.setattr(importlib, "import_module", _fake_import) + + +def test_sfxi_setpoint_scatter_plot_saves_artifact(tmp_path, monkeypatch) -> None: + _install_fake_dnadesign_api(monkeypatch) + ctx = SimpleNamespace(plots_dir=tmp_path, palette_book=None) + cfg = SFXISetpointScatterCfg(setpoints={"and": [0.0, 0.0, 0.0, 1.0]}, scaling_min_n=1) + plugin = SFXISetpointScatterPlot() + plugin.bind_runtime( + descriptor=build_plugin_asset( + plugin_id="plot/sfxi_setpoint_scatter", + semantics=PluginSemantics( + domain="logic", + family="sfxi_objective_scatter", + summary="SFXI setpoint scatter plot.", + ), + plugin_cls=SFXISetpointScatterPlot, + ), + contracts=builtin_contract_catalog(), + ) + + output = plugin.run(ctx, {"vec8": _vec8_df()}, cfg) + + assert output["artifacts"] == [str(tmp_path / "sfxi_setpoint_scatter.pdf")] + assert (tmp_path / "sfxi_setpoint_scatter.pdf").exists() + + +def test_sfxi_setpoint_scatter_runtime_persists_plot_bundle_record(tmp_path: Path, monkeypatch) -> None: + _install_fake_dnadesign_api(monkeypatch) + runtime = builtin_runtime() + outputs = tmp_path / "outputs" + store = runtime.record_store(outputs, plots_subdir="plots", exports_subdir="exports") + store.persist_dataframe( + producer_id="sfxi_vec8", + producer_plugin="transform/sfxi", + out_name="vec8", + record_id="sfxi_vec8/vec8", + df=_vec8_df(), + contract_id="sfxi.vec8.v2", + inputs=[], + config_digest="sha256:test", + ) + decl = WorkbenchDecl( + experiment=ExperimentDecl(id="exp_sfxi_plot", title="exp_sfxi_plot", lifecycle="active", root=tmp_path), + experiment_semantics=ExperimentSemantics( + protocol=ProtocolBinding(id="workbench/generic"), + protocol_program=ProtocolSemanticProgram(protocol="workbench/generic"), + annotations=AnnotationSemantics(), + resources=ResourceCatalog(), + layout=OutputLayout( + outputs_dir=outputs, + plots_subdir="plots", + exports_subdir="exports", + notebooks_subdir="notebooks", + ), + ), + plotting_palette=None, + pipeline=PipelineDecl(runtime={}, 
steps=()), + plots=SurfaceDecl( + specs=( + PluginStepDecl( + id="sfxi_setpoint_scatter", + plugin="plot/sfxi_setpoint_scatter", + reads={"vec8": RecordInputDecl(record_id="sfxi_vec8/vec8")}, + with_={"setpoints": {"and": [0.0, 0.0, 0.0, 1.0]}, "scaling_min_n": 1}, + ), + ) + ), + exports=SurfaceDecl(specs=()), + notebooks=NotebookDecl(specs=()), + ) + + run_spec( + decl, + include_pipeline=False, + include_plots=True, + include_exports=False, + plot_specs=resolve_workbench(decl).plots, + log_level="ERROR", + runtime=runtime, + ) + + latest_ids = {record.record_id for record in store.iter_latest_records()} + assert "plot:sfxi_setpoint_scatter" in latest_ids + assert (outputs / "plots" / "sfxi_setpoint_scatter.pdf").exists() diff --git a/src/reader/tests/plugins/plot/test_sfxi_triptych_sequence.py b/src/reader/tests/plugins/plot/test_sfxi_triptych_sequence.py new file mode 100644 index 0000000..576bac7 --- /dev/null +++ b/src/reader/tests/plugins/plot/test_sfxi_triptych_sequence.py @@ -0,0 +1,268 @@ +from __future__ import annotations + +import json +from dataclasses import dataclass +from pathlib import Path +from types import SimpleNamespace + +import numpy as np +import pandas as pd +import pytest +from typer.testing import CliRunner + +from reader.domains.logic.sfxi import triptych_sequence, triptych_sequence_dnadesign +from reader.errors import SFXIError +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram +from reader.runtime import builtin_runtime +from reader.tests.support import base_reader_config, write_config +from reader.workbench.cli import app +from reader.workbench.decl.model import ( + ExperimentDecl, + NotebookDecl, + PipelineDecl, + PluginStepDecl, + RecordInputDecl, + SurfaceDecl, + WorkbenchDecl, +) +from reader.workbench.engine import run_spec +from reader.workbench.experiment import AnnotationSemantics, ExperimentSemantics, OutputLayout, ResourceCatalog +from reader.workbench.graph import resolve_workbench + + +def _vec8_df() 
-> pd.DataFrame: + return pd.DataFrame( + { + "design_id": ["pDual-10-test01"], + "id": ["seq01"], + "sequence": ["ACGTACGTACGT"], + "reference_design_id": ["pDual-10"], + "time_selected_h": [12.0], + "v00": [0.0], + "v10": [1.0], + "v01": [0.0], + "v11": [0.0], + "y00_star": [0.0], + "y10_star": [1.0], + "y01_star": [0.0], + "y11_star": [0.0], + "r_logic": [4.0], + "flat_logic": [False], + } + ) + + +def _assay_df() -> pd.DataFrame: + rows: list[dict[str, object]] = [] + treatments = ["negative", "3% EtOH", "100 nM ciprofloxacin", "3% EtOH + 100 nM ciprofloxacin"] + for treatment_idx, treatment in enumerate(treatments, start=1): + for time in (0.0, 12.0, 24.0): + for rep in (1, 2): + rows.append( + { + "design_id": "pDual-10-test01", + "position": f"A{treatment_idx}{rep}", + "time": time, + "channel": "OD600", + "value": 0.12 + 0.018 * time + 0.01 * rep, + "treatment": treatment, + "treatment_alias": treatment, + } + ) + rows.append( + { + "design_id": "pDual-10-test01", + "position": f"B{treatment_idx}{rep}", + "time": time, + "channel": "YFP/CFP", + "value": 1.0 + 0.05 * treatment_idx + 0.02 * time + 0.01 * rep, + "treatment": treatment, + "treatment_alias": treatment, + } + ) + return pd.DataFrame(rows) + + +def _sequence_rows() -> pd.DataFrame: + return pd.DataFrame( + { + "usr_sequence_id": ["seq01"], + "usr_sequence": ["ACGTACGTACGT"], + "usr_label": ["pDual-10-test01"], + "usr_annotations": [[]], + "usr_dataset": ["usr_test_promoters"], + "sequence_adapter_kind": ["densegen_tfbs"], + } + ) + + +@dataclass(frozen=True) +class _FakeDiagnostics: + contract_id: str = "dnadesign.baserender.sequence_panel.v1" + contract_version: str = "1" + style_profile: str = "promoter_compact_slide.v1" + style_preset: str = "presentation_default" + adapter_kind: str = "densegen_tfbs" + renderer_name: str = "sequence_rows" + sequence_length_bp: int = 12 + feature_count: int = 1 + strand_count: int = 2 + legend_entries: tuple[str, ...] 
= ("promoter",) + image_width_px: int = 220 + image_height_px: int = 60 + + +class _FakeBaseRender: + BASERENDER_SEQUENCE_PANEL_CONTRACT_VERSION = "1" + + @staticmethod + def render_sequence_panel_image(*args, **kwargs): + del args, kwargs + image = np.full((60, 220, 4), 255, dtype=np.uint8) + image[15:45, 20:200, :3] = 100 + return SimpleNamespace(image=image, diagnostics=_FakeDiagnostics()) + + @staticmethod + def sequence_panel_config_for_adapter(*args, **kwargs): + del args, kwargs + return object() + + +class _FakeUsrWithTransitiveImportFailure: + def __getattr__(self, name: str): + if name == "Dataset": + raise ModuleNotFoundError("No module named 'Bio'") + raise AttributeError(name) + + +def _install_fake_sequence_panel(monkeypatch) -> None: + monkeypatch.setattr(triptych_sequence, "require_dnadesign_sequence_panel_api", lambda: (_FakeBaseRender, object())) + monkeypatch.setattr(triptych_sequence, "_load_usr_rows", lambda *, usr, cfg, exp_dir: _sequence_rows()) + + +def test_logic_sfxi_plot_list_surfaces_triptych_sequence(tmp_path: Path) -> None: + cfg = base_reader_config( + experiment_id="exp_logic_triptych", + protocol_id="logic/sfxi_screen", + protocol_inputs={"logic_map_ref": "induction_logic"}, + protocol_analysis={ + "include_vec8": True, + "include_fold_change": False, + "sfxi_triptych_sequence": {"sequence_source": {"dataset": "usr_test_promoters"}}, + }, + protocol_outputs={"plots": {"profile": "none", "include": ["sfxi_triptych_sequence"]}}, + resources={"sample_map": {"kind": "file", "path": "./inputs/metadata.xlsx"}}, + annotations={ + "logic_maps": { + "induction_logic": { + "column": "treatment", + "corners": {"00": "A", "10": "B", "01": "C", "11": "D"}, + } + } + }, + ) + cfg_path = write_config(tmp_path, cfg) + result = CliRunner().invoke(app, ["plot", str(cfg_path), "--list", "--format", "json"]) + + assert result.exit_code == 0 + payload = json.loads(result.output) + assert payload["summary"]["by_plugin"] == {"plot/sfxi_triptych_sequence": 
1} + assert payload["plots"][0]["id"] == "sfxi_triptych_sequence" + reads = {item["label"]: item for item in payload["plots"][0]["reads"]} + assert reads["vec8"]["ref"] == {"record": "sfxi_vec8/vec8"} + assert reads["assay"]["ref"] == {"record": "promote_to_tidy_plus_map/df"} + + +def test_sfxi_triptych_sequence_dependency_check_wraps_transitive_import_failures(monkeypatch) -> None: + def fake_import_module(name: str): + if name == "dnadesign.baserender": + return _FakeBaseRender + if name == "dnadesign.usr": + return _FakeUsrWithTransitiveImportFailure() + raise AssertionError(name) + + monkeypatch.setattr(triptych_sequence_dnadesign.importlib, "import_module", fake_import_module) + + with pytest.raises(SFXIError, match="requires dnadesign public APIs") as exc_info: + triptych_sequence_dnadesign.require_dnadesign_sequence_panel_api() + + assert isinstance(exc_info.value.__cause__, ModuleNotFoundError) + + +def test_sfxi_triptych_sequence_runtime_persists_bundle_record(tmp_path: Path, monkeypatch) -> None: + _install_fake_sequence_panel(monkeypatch) + runtime = builtin_runtime() + outputs = tmp_path / "outputs" + store = runtime.record_store(outputs, plots_subdir="plots", exports_subdir="exports") + store.persist_dataframe( + producer_id="sfxi_vec8", + producer_plugin="transform/sfxi", + out_name="vec8", + record_id="sfxi_vec8/vec8", + df=_vec8_df(), + contract_id="sfxi.vec8.v2", + inputs=[], + config_digest="sha256:test-vec8", + ) + store.persist_dataframe( + producer_id="promote_to_tidy_plus_map", + producer_plugin="validator/to_tidy_plus_map", + out_name="df", + record_id="promote_to_tidy_plus_map/df", + df=_assay_df(), + contract_id="plate_reader.annotated.v1", + inputs=[], + config_digest="sha256:test-assay", + ) + decl = WorkbenchDecl( + experiment=ExperimentDecl(id="exp_sfxi_triptych", title="exp_sfxi_triptych", lifecycle="active", root=tmp_path), + experiment_semantics=ExperimentSemantics( + protocol=ProtocolBinding(id="workbench/generic"), + 
protocol_program=ProtocolSemanticProgram(protocol="workbench/generic"), + annotations=AnnotationSemantics(), + resources=ResourceCatalog(), + layout=OutputLayout( + outputs_dir=outputs, + plots_subdir="plots", + exports_subdir="exports", + notebooks_subdir="notebooks", + ), + ), + plotting_palette=None, + pipeline=PipelineDecl(runtime={}, steps=()), + plots=SurfaceDecl( + specs=( + PluginStepDecl( + id="sfxi_triptych_sequence", + plugin="plot/sfxi_triptych_sequence", + reads={ + "vec8": RecordInputDecl(record_id="sfxi_vec8/vec8"), + "assay": RecordInputDecl(record_id="promote_to_tidy_plus_map/df"), + }, + with_={"sequence_source": {"dataset": "usr_test_promoters"}}, + ), + ) + ), + exports=SurfaceDecl(specs=()), + notebooks=NotebookDecl(specs=()), + ) + + run_spec( + decl, + include_pipeline=False, + include_plots=True, + include_exports=False, + plot_specs=resolve_workbench(decl).plots, + log_level="ERROR", + runtime=runtime, + ) + + latest = {record.record_id: record for record in store.iter_latest_records()} + assert "plot:sfxi_triptych_sequence" in latest + assert (outputs / "plots" / "sfxi_triptych_sequence" / "sfxi_triptych_sequence.pdf").exists() + assert (outputs / "plots" / "sfxi_triptych_sequence" / "sfxi_triptych_sequence.png").exists() + manifest_path = outputs / "manifests" / "sfxi_triptych_sequence_manifest.json" + assert manifest_path.exists() + manifest = json.loads(manifest_path.read_text(encoding="utf-8")) + assert manifest["schema"] == "reader.sfxi_triptych_sequence_bundle.v1" + assert manifest["row_order"] == ["pDual-10-test01"] diff --git a/src/reader/tests/plugins/transform/test_assay_labels.py b/src/reader/tests/plugins/transform/test_assay_labels.py index f4a283e..d23663f 100644 --- a/src/reader/tests/plugins/transform/test_assay_labels.py +++ b/src/reader/tests/plugins/transform/test_assay_labels.py @@ -7,7 +7,7 @@ import pytest from reader.plugins.transform.assay_labels import AnnotationLabelsCfg, AnnotationLabelsTransform -from 
reader.protocols import ProtocolBinding +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram from reader.workbench.experiment import ( AnnotationLabels, AnnotationLabelSpec, @@ -22,6 +22,7 @@ def _ctx(labels): logger = SimpleNamespace(info=lambda *args, **kwargs: None, debug=lambda *args, **kwargs: None) semantics = ExperimentSemantics( protocol=ProtocolBinding(id="workbench/generic"), + protocol_program=ProtocolSemanticProgram(protocol="workbench/generic"), annotations=AnnotationSemantics( labels=AnnotationLabels( by_id={ diff --git a/src/reader/tests/plugins/transform/test_sfxi.py b/src/reader/tests/plugins/transform/test_sfxi.py index 40025a0..f20b7e2 100644 --- a/src/reader/tests/plugins/transform/test_sfxi.py +++ b/src/reader/tests/plugins/transform/test_sfxi.py @@ -9,7 +9,7 @@ from reader.domains.logic.sfxi.run import build_vec8_from_tidy from reader.plugins.transform.sfxi import SFXICfg, SFXITransform -from reader.protocols import ProtocolBinding +from reader.protocols import ProtocolBinding, ProtocolSemanticProgram from reader.workbench.experiment import ( AnnotationSemantics, ExperimentSemantics, @@ -25,6 +25,7 @@ def _ctx(): logger=logging.getLogger("reader.tests.sfxi"), experiment=ExperimentSemantics( protocol=ProtocolBinding(id="logic/sfxi_screen"), + protocol_program=ProtocolSemanticProgram(protocol="logic/sfxi_screen"), annotations=AnnotationSemantics( logic_maps=LogicMaps( by_id={ diff --git a/src/reader/tests/protocols/test_semantic_invariants.py b/src/reader/tests/protocols/test_semantic_invariants.py index 4919740..236b6a5 100644 --- a/src/reader/tests/protocols/test_semantic_invariants.py +++ b/src/reader/tests/protocols/test_semantic_invariants.py @@ -1,10 +1,11 @@ from __future__ import annotations +from pathlib import Path + import pytest from reader.errors import ConfigError -from reader.protocols import ProtocolBinding, builtin_protocol_catalog -from reader.protocols.compiler import _semantic_program +from reader.protocols 
import BUILTIN_PROTOCOLS, BoundProtocol, ProtocolBinding, builtin_protocol_catalog from reader.protocols.model import ( CompiledProtocolPlan, ProtocolDescriptor, @@ -12,8 +13,24 @@ ProtocolMetricSpec, ProtocolNotebookPolicy, ProtocolSemanticExecution, + ProtocolSemanticNode, ProtocolSemanticProfileSpec, + ProtocolSemanticProgram, ) +from reader.protocols.semantic_coverage import _semantic_program +from reader.workbench.decl.model import NotebookTemplateCallDecl +from reader.workbench.experiment import AnnotationSemantics, ExperimentSemantics, OutputLayout, ResourceCatalog + + +def test_builtin_protocol_tuple_keeps_public_order_stable() -> None: + assert [descriptor.protocol for descriptor in BUILTIN_PROTOCOLS] == [ + "workbench/generic", + "plate_reader/dual_reporter_screen", + "plate_reader/single_reporter_screen", + "plate_reader/retron_sponge_screen", + "logic/sfxi_screen", + "cytometry/flow_panel", + ] def test_semantic_program_rejects_profile_scoped_missing_dependencies() -> None: @@ -49,7 +66,7 @@ def test_semantic_program_rejects_profile_scoped_missing_dependencies() -> None: allowed_templates=("notebook/basic",), summary="Test notebook policy.", ), - compiler=lambda protocol: CompiledProtocolPlan(), + compiler=lambda protocol: CompiledProtocolPlan(semantic_program=protocol.semantic_program()), ), ) @@ -66,3 +83,95 @@ def test_compiler_rejects_unknown_semantic_override_ids() -> None: overrides={"missing_metric": ProtocolSemanticExecution(status="compiled")}, active_profile="yfp_cfp_fold_change", ) + + +def test_bound_protocol_semantic_program_applies_execution_overrides_without_changing_structure() -> None: + protocol = builtin_protocol_catalog().bind(ProtocolBinding(id="plate_reader/dual_reporter_screen")) + + authored = protocol.semantic_program(active_profile="yfp_cfp_fold_change") + compiled = protocol.semantic_program( + active_profile="yfp_cfp_fold_change", + execution_overrides={ + "OD": ProtocolSemanticExecution( + status="compiled", + 
step_ids=("ingest",), + record_ids=("ingest/df",), + ) + }, + ) + authored_metrics = {node.id: node for node in authored.metrics} + compiled_metrics = {node.id: node for node in compiled.metrics} + + assert compiled.active_profile == authored.active_profile + assert [node.id for node in compiled.metrics] == [node.id for node in authored.metrics] + assert compiled_metrics["OD"].summary == authored_metrics["OD"].summary + assert compiled_metrics["OD"].formula == authored_metrics["OD"].formula + assert authored_metrics["OD"].execution.status == "descriptive_only" + assert compiled_metrics["OD"].execution.status == "compiled" + assert compiled_metrics["OD"].execution.step_ids == ("ingest",) + + +def test_bound_protocol_compile_injects_default_notebook_when_compiler_omits_notebooks() -> None: + descriptor = ProtocolDescriptor( + protocol="test/default_notebook", + domain="generic", + family="test_protocol", + summary="Compiler notebook fallback contract.", + execution=ProtocolExecutionPlan( + notebook=ProtocolNotebookPolicy( + default_template="notebook/basic", + allowed_templates=("notebook/basic", "notebook/eda"), + summary="Notebook policy.", + ), + compiler=lambda protocol: CompiledProtocolPlan(semantic_program=protocol.semantic_program()), + ), + ) + + compiled = BoundProtocol(descriptor=descriptor).compile() + + assert compiled.notebooks == (NotebookTemplateCallDecl(id="default", template="notebook/basic"),) + + +def test_protocol_semantic_execution_rejects_unknown_status() -> None: + with pytest.raises(ValueError, match="must be 'compiled' or 'descriptive_only'"): + ProtocolSemanticExecution(status="typo") + + +def test_semantic_program_rejects_profiles_without_declared_catalog() -> None: + with pytest.raises(ValueError, match="references unknown semantic profiles"): + ProtocolSemanticProgram( + protocol="test/missing_profiles", + metrics=( + ProtocolSemanticNode( + id="M", + kind="metric", + summary="Metric with undeclared profile.", + profiles=("profile_a",), + 
stage="raw", + formula="value", + ), + ), + ) + + +def test_experiment_semantics_rejects_mismatched_protocol_program() -> None: + compiled_program = ( + builtin_protocol_catalog() + .bind(ProtocolBinding(id="plate_reader/dual_reporter_screen")) + .compile() + .semantic_program + ) + + with pytest.raises(ValueError, match="must target the bound protocol"): + ExperimentSemantics( + protocol=ProtocolBinding(id="logic/sfxi_screen"), + annotations=AnnotationSemantics(), + resources=ResourceCatalog(), + layout=OutputLayout( + outputs_dir=Path("outputs"), + plots_subdir="plots", + exports_subdir="exports", + notebooks_subdir="notebooks", + ), + protocol_program=compiled_program, + ) diff --git a/src/reader/tests/repo/test_docs_routes.py b/src/reader/tests/repo/test_docs_routes.py new file mode 100644 index 0000000..576b4cd --- /dev/null +++ b/src/reader/tests/repo/test_docs_routes.py @@ -0,0 +1,68 @@ +from __future__ import annotations + +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parents[4] + + +def test_reader_experiment_bootstrap_skill_routes_to_primary_guide() -> None: + skill_path = REPO_ROOT / "skills" / "reader-experiment-bootstrap" / "SKILL.md" + text = skill_path.read_text(encoding="utf-8") + assert "docs/guides/experiment_bootstrap.md" in text + assert "docs/guides/data_operations_plan.md" in text + assert "docs/guides/data_operations_plan/data_classes.md" in text + + +def test_experiment_bootstrap_routes_through_data_operations_plan() -> None: + guide_path = REPO_ROOT / "docs" / "guides" / "experiment_bootstrap.md" + text = guide_path.read_text(encoding="utf-8") + assert "./data_operations_plan.md" in text + assert "./data_operations_plan/data_classes.md" in text + assert "Classify the data class" in text + + +def test_data_operations_plan_uses_progressive_disclosure() -> None: + guide_path = REPO_ROOT / "docs" / "guides" / "data_operations_plan.md" + text = guide_path.read_text(encoding="utf-8") + assert 
"./data_operations_plan/operating_model.md" in text + assert "./data_operations_plan/data_classes.md" in text + assert "./data_operations_plan/metadata_minimums.md" in text + assert "./data_operations_plan/transfer_and_verification.md" in text + + +def test_data_operations_plan_routes_to_machine_readable_registry() -> None: + guide_path = REPO_ROOT / "docs" / "guides" / "data_operations_plan.md" + text = guide_path.read_text(encoding="utf-8") + assert "../../src/reader/workbench/dop/" in text + assert "uv run reader dop classes --format json" in text + + +def test_data_operations_plan_routes_to_repo_skill() -> None: + guide_path = REPO_ROOT / "docs" / "guides" / "data_operations_plan.md" + text = guide_path.read_text(encoding="utf-8") + assert "../../skills/reader-data-operations-plan/SKILL.md" in text + + +def test_reader_data_operations_plan_skill_routes_to_owned_surfaces() -> None: + skill_path = REPO_ROOT / "skills" / "reader-data-operations-plan" / "SKILL.md" + text = skill_path.read_text(encoding="utf-8") + assert "docs/guides/data_operations_plan.md" in text + assert "docs/guides/data_operations_plan/operating_model.md" in text + assert "uv run reader dop classes --format json" in text + assert "./references/endpoint-contracts.md" in text + assert "./references/external-sources.md" in text + + +def test_reader_data_operations_plan_skill_routes_away_from_adjacent_workflows() -> None: + skill_path = REPO_ROOT / "skills" / "reader-data-operations-plan" / "SKILL.md" + text = skill_path.read_text(encoding="utf-8") + assert "Do not use for full experiment creation" in text + assert "reader-experiment-bootstrap" in text + assert "reader-workbench-gardening" in text + assert "## Success Criteria" in text + + +def test_repo_skill_index_lists_data_operations_plan_skill() -> None: + skill_index_path = REPO_ROOT / "skills" / "README.md" + text = skill_index_path.read_text(encoding="utf-8") + assert "./reader-data-operations-plan/SKILL.md" in text diff --git 
a/src/reader/tests/repo/test_experiment_smoke_runs.py b/src/reader/tests/repo/test_experiment_smoke_runs.py index 876b839..93cbaca 100644 --- a/src/reader/tests/repo/test_experiment_smoke_runs.py +++ b/src/reader/tests/repo/test_experiment_smoke_runs.py @@ -1,17 +1,20 @@ from __future__ import annotations +import json import shutil from pathlib import Path import pytest from rich.console import Console +from typer.testing import CliRunner from reader.contracts import builtin_contract_catalog from reader.tests.repo.experiment_matrix import END_TO_END_RUNNABLE_CONFIGS, repo_rel -from reader.tests.support import REPO_ROOT, load_decl +from reader.tests.support import REPO_ROOT, default_notebook_name, load_decl from reader.workbench import resolve_workbench +from reader.workbench.cli import app from reader.workbench.engine import run_spec -from reader.workbench.records import RecordStore +from reader.workbench.records import RecordStore, record_paths pytestmark = pytest.mark.integration @@ -51,7 +54,15 @@ def _run( ) -@pytest.mark.fleet +def _assert_file_bundle_records_exist(store: RecordStore, record_ids: set[str]) -> None: + for record_id in sorted(record_ids): + record = store.read_record(record_id) + paths = record_paths(record) + assert paths, f"Record {record_id} did not include any materialized files." 
+ assert all(path.is_file() for path in paths), f"Record {record_id} referenced missing files: {paths!r}" + + +@pytest.mark.active_experiments @pytest.mark.parametrize("config_path", END_TO_END_RUNNABLE_CONFIGS, ids=repo_rel) def test_repo_data_backed_experiments_run_end_to_end(tmp_path: Path, config_path: Path) -> None: rel_dir = str(config_path.parent.relative_to(REPO_ROOT / "experiments")) @@ -74,17 +85,12 @@ def test_repo_data_backed_experiments_run_end_to_end(tmp_path: Path, config_path latest_ids = {record.record_id for record in store.iter_latest_records()} expected_plot_ids = {f"plot:{plot.id}" for plot in workbench.plots} expected_export_ids = {f"export:{export.id}" for export in workbench.exports} - plots_dir = outputs / layout.plots_subdir - exports_dir = outputs / layout.exports_subdir assert (outputs / "manifests" / "records.json").exists() assert "ingest/df" in latest_ids assert expected_plot_ids.issubset(latest_ids) assert expected_export_ids.issubset(latest_ids) - if expected_plot_ids: - assert any(path.is_file() for path in plots_dir.rglob("*")) - if expected_export_ids: - assert any(path.is_file() for path in exports_dir.rglob("*")) + _assert_file_bundle_records_exist(store, expected_plot_ids | expected_export_ids) @pytest.mark.smoke @@ -100,7 +106,6 @@ def test_plate_reader_panel_v3_generates_records_and_plots_from_clean_temp_copy( layout = decl.experiment_semantics.layout outputs = layout.outputs_dir manifests = outputs / "manifests" - plots_dir = outputs / layout.plots_subdir store = RecordStore( outputs, contracts=builtin_contract_catalog(), @@ -117,7 +122,7 @@ def test_plate_reader_panel_v3_generates_records_and_plots_from_clean_temp_copy( assert not (manifests / "exports_manifest.json").exists() assert "ingest/df" in latest_ids assert "plot:raw_kinetics" in latest_ids - assert any(plots_dir.glob("*.pdf")) + _assert_file_bundle_records_exist(store, {"plot:raw_kinetics"}) @pytest.mark.smoke @@ -149,6 +154,7 @@ def 
test_sfxi_v3_generates_records_and_export_from_clean_temp_copy(tmp_path: Pat assert not (manifests / "exports_manifest.json").exists() assert "sfxi_vec8/vec8" in latest_ids assert "export:logic_summary_workbook" in latest_ids + _assert_file_bundle_records_exist(store, {"export:logic_summary_workbook"}) assert (outputs / layout.exports_subdir / "sfxi" / "vec8.xlsx").exists() @@ -165,7 +171,6 @@ def test_sfxi_logic_geometry_experiment_runs_and_plots_from_clean_temp_copy(tmp_ layout = decl.experiment_semantics.layout outputs = layout.outputs_dir manifests = outputs / "manifests" - plots_dir = outputs / layout.plots_subdir store = RecordStore( outputs, contracts=builtin_contract_catalog(), @@ -180,7 +185,7 @@ def test_sfxi_logic_geometry_experiment_runs_and_plots_from_clean_temp_copy(tmp_ assert "promote_to_tidy_plus_map/df" in latest_ids assert "sfxi_vec8/vec8" not in latest_ids assert "plot:logic_symmetry" in latest_ids - assert any(plots_dir.glob("*.pdf")) + _assert_file_bundle_records_exist(store, {"plot:logic_symmetry"}) @pytest.mark.smoke @@ -213,7 +218,121 @@ def test_retron_sponge_experiment_generates_semantic_outputs_from_clean_temp_cop assert expected_plot_ids.issubset(latest_ids) assert expected_export_ids.issubset(latest_ids) assert "plot:baseline_shifted_kinetics" in latest_ids - assert any(plots_dir.glob("*.pdf")) + _assert_file_bundle_records_exist(store, expected_plot_ids | expected_export_ids) assert any(plots_dir.glob("raw_kinetics*.pdf")) assert (outputs / layout.exports_subdir / "retron" / "semantic_summary.csv").exists() assert (outputs / layout.exports_subdir / "retron" / "semantic_trace.csv").exists() + + +@pytest.mark.smoke +def test_cli_notebook_scaffold_on_staged_experiment_preserves_runnable_readiness(tmp_path: Path) -> None: + cfg_path = _stage_experiment(tmp_path, "2025/20250614_sensor_panel_M9_glu") + runner = CliRunner() + + notebook_result = runner.invoke(app, ["notebook", str(cfg_path), "--mode", "none"], env={"COLUMNS": "200"}) + 
assert notebook_result.exit_code == 0, notebook_result.output + + notebook_path = cfg_path.parent / "outputs" / "notebooks" / default_notebook_name() + assert notebook_path.exists() + + inspect_result = runner.invoke(app, ["inspect", str(cfg_path), "--format", "json"]) + assert inspect_result.exit_code == 0 + inspect_payload = json.loads(inspect_result.output) + assert inspect_payload["implementation"]["readiness"]["state"] == "runnable" + + +@pytest.mark.smoke +def test_cli_preflight_surface_contracts_on_staged_retron_experiment(tmp_path: Path) -> None: + cfg_path = _stage_experiment(tmp_path, "2026/20260317_tetra_functional_sponges") + runner = CliRunner() + + validate_no_files = runner.invoke(app, ["validate", str(cfg_path), "--no-files", "--format", "json"]) + validate_files = runner.invoke(app, ["validate", str(cfg_path), "--format", "json"]) + dry_run = runner.invoke(app, ["run", str(cfg_path), "--dry-run", "--format", "json"]) + plot_list = runner.invoke(app, ["plot", str(cfg_path), "--list", "--format", "json"]) + export_list = runner.invoke(app, ["export", str(cfg_path), "--list", "--format", "json"]) + inspect_result = runner.invoke(app, ["inspect", str(cfg_path), "--format", "json"]) + + assert validate_no_files.exit_code == 0, validate_no_files.output + assert validate_files.exit_code == 0, validate_files.output + assert dry_run.exit_code == 0, dry_run.output + assert plot_list.exit_code == 0, plot_list.output + assert export_list.exit_code == 0, export_list.output + assert inspect_result.exit_code == 0, inspect_result.output + + validate_no_files_payload = json.loads(validate_no_files.output) + validate_files_payload = json.loads(validate_files.output) + dry_run_payload = json.loads(dry_run.output) + plot_payload = json.loads(plot_list.output) + export_payload = json.loads(export_list.output) + inspect_payload = json.loads(inspect_result.output) + + assert validate_no_files_payload["summary"]["status"] == "ok" + assert 
validate_files_payload["summary"]["status"] == "ok" + assert dry_run_payload["dry_run"] is True + assert ( + dry_run_payload["implementation"]["compiled"]["semantic_program"]["ranking"]["execution"]["status"] + == "compiled" + ) + assert "baseline_shifted_kinetics" in {item["id"] for item in plot_payload["plots"]} + assert "semantic_summary_table" in {item["id"] for item in export_payload["exports"]} + assert inspect_payload["experiment"]["notebook_template"] == "notebook/retron_sponge" + + +@pytest.mark.smoke +def test_cli_retron_sponge_experiment_runs_end_to_end_and_writes_artifact_journal(tmp_path: Path) -> None: + cfg_path = _stage_experiment(tmp_path, "2026/20260317_tetra_functional_sponges") + runner = CliRunner() + + validate_result = runner.invoke(app, ["validate", str(cfg_path), "--format", "json"]) + assert validate_result.exit_code == 0 + validate_payload = json.loads(validate_result.output) + assert validate_payload["summary"]["status"] == "ok" + + for command in ( + ["run", str(cfg_path)], + ["plot", str(cfg_path)], + ["export", str(cfg_path)], + ): + result = runner.invoke(app, command, env={"COLUMNS": "200"}) + assert result.exit_code == 0, result.output + + inspect_result = runner.invoke(app, ["inspect", str(cfg_path), "--format", "json"]) + assert inspect_result.exit_code == 0 + inspect_payload = json.loads(inspect_result.output) + assert inspect_payload["implementation"]["readiness"]["state"] == "records_ready" + assert ( + inspect_payload["implementation"]["compiled"]["semantic_program"]["ranking"]["execution"]["status"] + == "compiled" + ) + + records_result = runner.invoke(app, ["records", str(cfg_path), "--format", "json"]) + assert records_result.exit_code == 0 + records_payload = json.loads(records_result.output) + record_ids = {item["record_id"] for item in records_payload["records"]} + assert "semantic_metrics/trace" in record_ids + assert "semantic_metrics/summary" in record_ids + assert "plot:baseline_shifted_kinetics" in record_ids + 
assert "export:semantic_summary_table" in record_ids + + decl = load_decl(cfg_path) + workbench = resolve_workbench(decl) + layout = decl.experiment_semantics.layout + outputs = layout.outputs_dir + store = RecordStore( + outputs, + contracts=builtin_contract_catalog(), + plots_subdir=layout.plots_subdir, + exports_subdir=layout.exports_subdir, + create=False, + ) + expected_plot_ids = {f"plot:{plot.id}" for plot in workbench.plots} + expected_export_ids = {f"export:{export.id}" for export in workbench.exports} + _assert_file_bundle_records_exist(store, expected_plot_ids | expected_export_ids) + + journal = cfg_path.parent / "JOURNAL.md" + assert journal.exists() + journal_text = journal.read_text(encoding="utf-8") + assert "uv run reader run" in journal_text + assert "uv run reader plot" in journal_text + assert "uv run reader export" in journal_text diff --git a/src/reader/tests/workbench/test_assets.py b/src/reader/tests/workbench/test_assets.py index b5135f3..6bded81 100644 --- a/src/reader/tests/workbench/test_assets.py +++ b/src/reader/tests/workbench/test_assets.py @@ -81,15 +81,10 @@ def test_select_default_notebook_template_uses_protocol_policy() -> None: def test_static_asset_catalog_only_exposes_templates() -> None: catalog = static_asset_catalog() - assert [item.kind for item in catalog.all()] == [ - "template", - "template", - "template", - "template", - "template", - "template", - "template", - ] + items = catalog.all() + assert items + assert {item.kind for item in items} == {"template"} + assert catalog.resolve("notebook/dual_reporter_triptych", kind="template").kind == "template" def test_build_workbench_asset_catalog_requires_explicit_plugin_registry() -> None: diff --git a/src/reader/tests/workbench/test_dop_registry.py b/src/reader/tests/workbench/test_dop_registry.py new file mode 100644 index 0000000..147f794 --- /dev/null +++ b/src/reader/tests/workbench/test_dop_registry.py @@ -0,0 +1,87 @@ +from __future__ import annotations + +import json 
+ +from typer.testing import CliRunner + +from reader.protocols import builtin_protocol_catalog +from reader.workbench import cli +from reader.workbench.dop import builtin_dop_registry +from reader.workbench.inspection.readiness import READINESS_CAPABILITY_KEYS, READINESS_STATES + + +def test_dop_registry_protocol_candidates_resolve_against_builtin_catalog() -> None: + registry = builtin_dop_registry() + protocols = builtin_protocol_catalog() + + registry.validate_protocol_refs(descriptor.protocol for descriptor in protocols.all()) + + covered_protocols = { + protocol_id for data_class in registry.data_classes() for protocol_id in data_class.protocol_candidates + } + assert {descriptor.protocol for descriptor in protocols.all()} <= covered_protocols + + +def test_dop_registry_ready_specs_match_reader_readiness_contract() -> None: + registry = builtin_dop_registry() + + registry.validate_ready_refs( + readiness_states=READINESS_STATES, + capability_keys=READINESS_CAPABILITY_KEYS, + ) + + assert [spec.id for spec in registry.ready_specs()] == [ + "classified", + "metadata_ready", + "staged", + "preflight_ok", + "runnable", + "records_ready", + "review_ready", + ] + + +def test_dop_classes_cli_emits_json() -> None: + runner = CliRunner() + result = runner.invoke(cli.app, ["dop", "classes", "--format", "json"]) + + assert result.exit_code == 0 + payload = json.loads(result.output) + ids = [item["id"] for item in payload["data_classes"]] + assert ids[0] == "plate_reader_screen" + assert "unsupported_long_tail_assay" in ids + assert "ready_specs" not in payload + + +def test_dop_classes_cli_filters_by_protocol() -> None: + runner = CliRunner() + result = runner.invoke( + cli.app, + ["dop", "classes", "--protocol", "plate_reader/retron_sponge_screen", "--format", "json"], + ) + + assert result.exit_code == 0 + payload = json.loads(result.output) + assert [item["id"] for item in payload["data_classes"]] == ["plate_reader_screen"] + + +def 
test_dop_ready_specs_cli_emits_json() -> None: + runner = CliRunner() + result = runner.invoke(cli.app, ["dop", "ready-specs", "--format", "json"]) + + assert result.exit_code == 0 + payload = json.loads(result.output) + assert [item["id"] for item in payload["ready_specs"]] == [ + "classified", + "metadata_ready", + "staged", + "preflight_ok", + "runnable", + "records_ready", + "review_ready", + ] + assert payload["ready_specs"][-1]["required_capabilities"] == [ + "records", + "plot", + "notebook_scaffold", + ] diff --git a/src/reader/workbench/assets/plugin_manifest.py b/src/reader/workbench/assets/plugin_manifest.py index ed910a0..727bb5e 100644 --- a/src/reader/workbench/assets/plugin_manifest.py +++ b/src/reader/workbench/assets/plugin_manifest.py @@ -8,6 +8,8 @@ from reader.plugins.plot.logic_symmetry import LogicSymmetryPlot from reader.plugins.plot.retron_summary import RetronSummaryPlot from reader.plugins.plot.retron_trace import RetronTracePlot +from reader.plugins.plot.sfxi_setpoint_scatter import SFXISetpointScatterPlot +from reader.plugins.plot.sfxi_triptych_sequence import SFXITriptychSequencePlot from reader.plugins.plot.snapshot_barplot import SnapshotBarplot from reader.plugins.plot.snapshot_heatmap import SnapshotHeatmapPlot from reader.plugins.plot.time_series import TimeSeriesPlot @@ -101,6 +103,26 @@ ), plugin_cls=LogicSymmetryPlot, ), + build_plugin_asset( + plugin_id="plot/sfxi_setpoint_scatter", + semantics=PluginSemantics( + domain="logic", + family="sfxi_objective_scatter", + summary="Render OPAL-compatible SFXI setpoint scatter plots over logic_fidelity and effect_scaled.", + tags=("logic", "sfxi", "setpoint", "scatter"), + ), + plugin_cls=SFXISetpointScatterPlot, + ), + build_plugin_asset( + plugin_id="plot/sfxi_triptych_sequence", + semantics=PluginSemantics( + domain="logic", + family="sfxi_triptych_sequence", + summary="Render SFXI promoter kinetics, snapshot, and sequence annotation figure bundles.", + tags=("logic", "sfxi", 
"triptych", "sequence", "baserender"), + ), + plugin_cls=SFXITriptychSequencePlot, + ), build_plugin_asset( plugin_id="plot/retron_trace", semantics=PluginSemantics( diff --git a/src/reader/workbench/cli/__init__.py b/src/reader/workbench/cli/__init__.py index 510ee7a..7c43927 100644 --- a/src/reader/workbench/cli/__init__.py +++ b/src/reader/workbench/cli/__init__.py @@ -1,6 +1,7 @@ from __future__ import annotations from . import demo as _demo # noqa: F401 +from . import dop as _dop # noqa: F401 from . import experiments as _experiments # noqa: F401 from . import notebooks as notebook_commands from . import protocols as _protocols # noqa: F401 diff --git a/src/reader/workbench/cli/_records_view.py b/src/reader/workbench/cli/_records_view.py new file mode 100644 index 0000000..deda0e2 --- /dev/null +++ b/src/reader/workbench/cli/_records_view.py @@ -0,0 +1,107 @@ +from __future__ import annotations + +from rich import box +from rich.panel import Panel + +from reader.errors import ReaderError +from reader.workbench.commands import reader_command + +from . import shared +from ._lazy import load as _load +from .helpers import load_job_models +from .shared import emit_json, normalize_output_format, table + + +def render_records( + *, + job_path, + all_revisions: bool, + format: str, +) -> None: + _, decl = load_job_models(job_path) + outputs_dir = decl.experiment_semantics.layout.outputs_dir + store = ( + _load("reader.runtime") + .builtin_runtime() + .record_store( + outputs_dir, + plots_subdir=decl.experiment_semantics.layout.plots_subdir, + exports_subdir=decl.experiment_semantics.layout.exports_subdir, + create=False, + ) + ) + if not store.catalog_exists(): + raise ReaderError( + f"No outputs/manifests/records.json found. Run '{reader_command('run', job_path)}' first to produce records." 
+ ) + + fmt = normalize_output_format(format) + if fmt == "json": + emit_json( + _load("reader.workbench.inspection.results").record_catalog_payload( + experiment=_load("reader.workbench.inspection.experiments").experiment_identity_payload( + job_path=job_path, decl=decl + ), + store=store, + outputs_dir=outputs_dir, + base=decl.experiment.root, + include_history=all_revisions, + ) + ) + return + + latest_records = store.iter_latest_records() + if all_revisions: + if not latest_records: + shared.console.print( + Panel.fit( + ( + "No record history listed in outputs/manifests/records.json. " + f"Run '{reader_command('run', job_path)}' first." + ), + border_style="warn", + box=box.ROUNDED, + ) + ) + return + revision_counts = store.revision_counts(record.record_id for record in latest_records) + listing = table("Records • history") + listing.add_column("Record") + listing.add_column("Kind", style="accent") + listing.add_column("Producer") + listing.add_column("Revisions", justify="right") + for record in latest_records: + listing.add_row( + record.record_id, + record.kind, + f"{record.producer.kind}:{record.producer.id}", + str(revision_counts[record.record_id]), + ) + shared.console.print(Panel(listing, border_style="accent", box=box.ROUNDED)) + return + + if not latest_records: + shared.console.print( + Panel.fit( + ( + "No records listed in outputs/manifests/records.json. " + f"Run '{reader_command('run', job_path)}' first." 
+ ), + border_style="warn", + box=box.ROUNDED, + ) + ) + return + listing = table("Records • latest") + listing.add_column("Record") + listing.add_column("Kind", style="accent") + listing.add_column("Producer") + listing.add_column("Details", style="path") + for record in latest_records: + detail = ( + f"{record.contract_id} • {record.path}" + if record.kind == "dataframe_artifact" + else ", ".join(str(path) for path in record.files) + ) + listing.add_row(record.record_id, record.kind, f"{record.producer.kind}:{record.producer.id}", detail) + shared.console.print(Panel(listing, border_style="accent", box=box.ROUNDED)) diff --git a/src/reader/workbench/cli/_surface_execution.py b/src/reader/workbench/cli/_surface_execution.py new file mode 100644 index 0000000..510e27f --- /dev/null +++ b/src/reader/workbench/cli/_surface_execution.py @@ -0,0 +1,426 @@ +from __future__ import annotations + +from collections.abc import Callable +from pathlib import Path + +import typer +from rich import box +from rich.panel import Panel + +from reader.workbench.commands import reader_command + +from . 
import shared +from ._lazy import load as _load +from .helpers import bind_decl_protocol, format_job_arg, load_job_models +from .shared import emit_json, normalize_output_format, table + + +def spec_overrides(): + return _load("reader.workbench.spec_overrides") + + +def validate_list_mode_flags( + *, + list_only: bool, + dry_run: bool, + inputs: list[str] | None, + sets: list[str] | None, +) -> None: + if not list_only: + return + if dry_run: + raise typer.BadParameter("--dry-run cannot be combined with --list") + if inputs: + raise typer.BadParameter("--input cannot be combined with --list") + if sets: + raise typer.BadParameter("--set cannot be combined with --list") + + +def apply_surface_overrides( + selected, + *, + inputs: list[str] | None, + sets: list[str] | None, + experiment_root: Path, + resources, +): + overrides = spec_overrides() + input_overrides = overrides.parse_input_overrides(inputs or [], root=experiment_root, resources=resources) + set_overrides = overrides.parse_set_overrides(sets or []) + selected = overrides.apply_step_overrides( + selected, + input_overrides=input_overrides, + set_overrides=set_overrides, + root=experiment_root, + resources=resources, + ) + return overrides, selected + + +def _raise_dependency_preflight_errors(*, selected, runtime, bound_protocol, exp_root: Path, label: str) -> None: + errors: list[str] = [] + for step in selected: + plugin_cls = runtime.plugins.resolve(step.plugin) + cfg = plugin_cls.ConfigModel.model_validate( + bound_protocol.effective_plugin_config(plugin_id=step.plugin, step_with=(step.with_ or {})) + ) + for issue in plugin_cls.preflight_readiness(exp_dir=exp_root, cfg=cfg, reads=(step.reads or {})): + if issue.kind == "dependency": + errors.append(f"{label}:{step.id} • {issue.message}") + if errors: + raise typer.BadParameter("\n".join(errors)) + + +def validate_plot_job_for_execution( + job_path: Path, + *, + only: list[str] | None, + exclude: list[str] | None, + dry_run: bool, + inputs: list[str] | 
None, + sets: list[str] | None, + ensure_active_lifecycle_fn: Callable, + require_dataframe_records_fn: Callable, +) -> None: + runtime = _load("reader.runtime").builtin_runtime() + _, decl = load_job_models(job_path, runtime=runtime) + if not dry_run: + ensure_active_lifecycle_fn(decl, job_path, command_name="plot") + workbench = _load("reader.workbench.graph").resolve_workbench(decl) + plot_specs = list(workbench.plots) + if not plot_specs: + raise typer.BadParameter("No plots configured in this experiment. Add plots to the config.") + selected = spec_overrides().select_surface_specs( + plot_specs, only=only or [], exclude=exclude or [], kind="plot spec" + ) + if not selected: + raise typer.BadParameter("No plots selected. Adjust --only/--exclude or use --list to inspect valid ids.") + _, selected = apply_surface_overrides( + selected, + inputs=inputs, + sets=sets, + experiment_root=decl.experiment.root, + resources=decl.experiment_semantics.resources, + ) + if dry_run: + _raise_dependency_preflight_errors( + selected=selected, + runtime=runtime, + bound_protocol=bind_decl_protocol(decl=decl, runtime=runtime), + exp_root=decl.experiment.root, + label="plot", + ) + if not dry_run: + require_dataframe_records_fn(decl, job_path, runtime=runtime) + + +def render_surface_specs_table( + *, + title_text: str, + selected, + runtime, + record_producers, + summaries: dict[str, str], +) -> None: + inspection_runtime = _load("reader.workbench.inspection.runtime") + listing = table(title_text) + listing.add_column("#", justify="right", style="muted") + listing.add_column("id", style="accent", overflow="fold") + listing.add_column("summary", overflow="fold") + listing.add_column("from", overflow="fold") + listing.add_column("plugin", overflow="fold") + for index, spec in enumerate(selected, 1): + spec_payload = inspection_runtime.spec_step_payload( + spec, summary=summaries.get(spec.id, "—"), runtime=runtime, record_producers=record_producers + ) + from_refs = ", 
".join(inspection_runtime.render_read_binding(item) for item in spec_payload["reads"]) or "—" + listing.add_row(str(index), spec.id, summaries.get(spec.id, "—"), from_refs, spec.plugin) + shared.console.print( + Panel(listing, border_style="accent", box=box.ROUNDED, subtitle=f"[muted]{len(selected)} total[/muted]") + ) + + +def surface_next_steps( + *, + job_hint: str | None, + output_dir: Path, + include_plot: bool, + include_export: bool, +) -> None: + def _cmd(base: str, tail: str = "") -> str: + return reader_command(base, job_hint, tail) + + lines = [f"Files saved in [path]{output_dir}[/path]", "", "Next steps:"] + if include_plot: + lines.append(f" {_cmd('plot')}") + if include_export: + lines.append(f" {_cmd('export')}") + lines.append(f" {_cmd('notebook')}") + shared.console.print(Panel.fit("\n".join(lines), border_style="green", box=box.ROUNDED)) + + +def run_plot_job( + job_path: Path, + *, + job_hint: str | None, + only: list[str] | None, + exclude: list[str] | None, + list_only: bool, + format: str, + dry_run: bool, + log_level: str, + inputs: list[str] | None, + sets: list[str] | None, + ensure_active_lifecycle_fn: Callable, + require_dataframe_records_fn: Callable, + append_journal_fn: Callable, +) -> None: + _, decl = load_job_models(job_path) + if not list_only and not dry_run: + ensure_active_lifecycle_fn(decl, job_path, command_name="plot") + runtime = _load("reader.runtime").builtin_runtime() + workbench = _load("reader.workbench.graph").resolve_workbench(decl) + bound_protocol = bind_decl_protocol(decl=decl, runtime=runtime) + inspection_catalogs = _load("reader.workbench.inspection.catalogs") + inspection_runtime = _load("reader.workbench.inspection.runtime") + fmt = normalize_output_format(format) + if not list_only: + if fmt == "json": + raise typer.BadParameter("--format json is only supported with --list") + if not dry_run: + require_dataframe_records_fn(decl, job_path, runtime=runtime) + plot_specs = list(workbench.plots) + 
record_producers = inspection_runtime.record_producer_map(workbench.plugin_steps(), runtime=runtime) + if not plot_specs: + if list_only: + if fmt == "json": + emit_json( + inspection_catalogs.workbench_surface_specs_payload( + job_path=job_path, + decl=decl, + runtime=runtime, + bound_protocol=bound_protocol, + selected=[], + kind="plot", + only=only or [], + exclude=exclude or [], + ) + ) + return + shared.console.print( + Panel.fit("No plots configured in this experiment.", border_style="warn", box=box.ROUNDED) + ) + return + raise typer.BadParameter("No plots configured in this experiment. Add plots to the config.") + selected = spec_overrides().select_surface_specs( + plot_specs, only=only or [], exclude=exclude or [], kind="plot spec" + ) + if list_only: + if fmt == "json": + emit_json( + inspection_catalogs.workbench_surface_specs_payload( + job_path=job_path, + decl=decl, + runtime=runtime, + bound_protocol=bound_protocol, + selected=selected, + kind="plot", + only=only or [], + exclude=exclude or [], + ) + ) + return + render_surface_specs_table( + title_text="Plots", + selected=selected, + runtime=runtime, + record_producers=record_producers, + summaries=inspection_runtime.plot_output_summaries(bound_protocol), + ) + return + if not selected: + raise typer.BadParameter("No plots selected. 
Adjust --only/--exclude or use --list to inspect valid ids.") + experiment_root = decl.experiment.root + resources = decl.experiment_semantics.resources + overrides, selected = apply_surface_overrides( + selected, + inputs=inputs, + sets=sets, + experiment_root=experiment_root, + resources=resources, + ) + if dry_run: + _raise_dependency_preflight_errors( + selected=selected, + runtime=runtime, + bound_protocol=bound_protocol, + exp_root=experiment_root, + label="plot", + ) + if not dry_run: + append_journal_fn( + job_path, + " ".join( + overrides.build_surface_command( + "reader plot", + job_path, + only=only, + exclude=exclude, + list_only=False, + dry_run=dry_run, + log_level=log_level, + inputs=inputs, + sets=sets, + ) + ), + ) + _load("reader.workbench.engine").run_spec( + decl, + dry_run=dry_run, + log_level=log_level, + console=shared.console, + include_pipeline=False, + include_plots=True, + include_exports=False, + plot_specs=selected, + runtime=runtime, + ) + if not dry_run: + outputs_dir = decl.experiment_semantics.layout.outputs_dir + plots_cfg = decl.experiment_semantics.layout.plots_subdir + plots_dir = outputs_dir if plots_cfg in ("", ".", "./") else outputs_dir / str(plots_cfg) + surface_next_steps( + job_hint=format_job_arg(job_hint), + output_dir=plots_dir, + include_plot=False, + include_export=bool(workbench.exports), + ) + + +def run_export_job( + job_path: Path, + *, + job_hint: str | None, + only: list[str] | None, + exclude: list[str] | None, + list_only: bool, + format: str, + dry_run: bool, + log_level: str, + inputs: list[str] | None, + sets: list[str] | None, + ensure_active_lifecycle_fn: Callable, + require_dataframe_records_fn: Callable, + append_journal_fn: Callable, +) -> None: + _, decl = load_job_models(job_path) + if not list_only and not dry_run: + ensure_active_lifecycle_fn(decl, job_path, command_name="export") + fmt = normalize_output_format(format) + workbench = _load("reader.workbench.graph").resolve_workbench(decl) + 
runtime = _load("reader.runtime").builtin_runtime() + inspection_catalogs = _load("reader.workbench.inspection.catalogs") + inspection_runtime = _load("reader.workbench.inspection.runtime") + record_producers = inspection_runtime.record_producer_map(workbench.plugin_steps(), runtime=runtime) + bound_protocol = bind_decl_protocol(decl=decl, runtime=runtime) + export_specs = list(workbench.exports) + if not export_specs: + if list_only: + if fmt == "json": + emit_json( + inspection_catalogs.workbench_surface_specs_payload( + job_path=job_path, + decl=decl, + runtime=runtime, + bound_protocol=bound_protocol, + selected=[], + kind="export", + only=only or [], + exclude=exclude or [], + ) + ) + return + shared.console.print( + Panel.fit("No exports configured in this experiment.", border_style="warn", box=box.ROUNDED) + ) + return + raise typer.BadParameter("No exports configured in this experiment. Add exports to the config.") + selected = spec_overrides().select_surface_specs( + export_specs, only=only or [], exclude=exclude or [], kind="export spec" + ) + if list_only: + if fmt == "json": + emit_json( + inspection_catalogs.workbench_surface_specs_payload( + job_path=job_path, + decl=decl, + runtime=runtime, + bound_protocol=bound_protocol, + selected=selected, + kind="export", + only=only or [], + exclude=exclude or [], + ) + ) + return + render_surface_specs_table( + title_text="Exports", + selected=selected, + runtime=runtime, + record_producers=record_producers, + summaries=inspection_runtime.export_output_summaries(bound_protocol), + ) + return + if not selected: + raise typer.BadParameter("No exports selected. 
Adjust --only/--exclude or use --list to inspect valid ids.") + if fmt == "json": + raise typer.BadParameter("--format json is only supported with --list") + if not dry_run: + require_dataframe_records_fn(decl, job_path, runtime=runtime) + experiment_root = decl.experiment.root + resources = decl.experiment_semantics.resources + overrides, selected = apply_surface_overrides( + selected, + inputs=inputs, + sets=sets, + experiment_root=experiment_root, + resources=resources, + ) + if not dry_run: + append_journal_fn( + job_path, + " ".join( + overrides.build_surface_command( + "reader export", + job_path, + only=only, + exclude=exclude, + list_only=False, + dry_run=dry_run, + log_level=log_level, + inputs=inputs, + sets=sets, + ) + ), + ) + _load("reader.workbench.engine").run_spec( + decl, + dry_run=dry_run, + log_level=log_level, + console=shared.console, + include_pipeline=False, + include_plots=False, + include_exports=True, + export_specs=selected, + runtime=runtime, + ) + if not dry_run: + outputs_dir = decl.experiment_semantics.layout.outputs_dir + exports_cfg = decl.experiment_semantics.layout.exports_subdir + exports_dir = outputs_dir if exports_cfg in ("", ".", "./") else outputs_dir / str(exports_cfg) + surface_next_steps( + job_hint=format_job_arg(job_hint), + output_dir=exports_dir, + include_plot=bool(workbench.plots), + include_export=False, + ) diff --git a/src/reader/workbench/cli/dop.py b/src/reader/workbench/cli/dop.py new file mode 100644 index 0000000..ee55218 --- /dev/null +++ b/src/reader/workbench/cli/dop.py @@ -0,0 +1,91 @@ +from __future__ import annotations + +import typer +from rich import box +from rich.panel import Panel + +from reader.errors import ConfigError + +from . 
import shared +from ._lazy import load as _load +from .shared import app, emit_json, normalize_output_format + +dop_app = typer.Typer( + add_completion=False, + help="Inspect the reader-local Data Operations Plan registry.", +) + + +def _registry(): + return _load("reader.workbench.dop").builtin_dop_registry() + + +def _validate_registry_protocol_refs(registry, runtime) -> None: + registry.validate_protocol_refs(descriptor.protocol for descriptor in runtime.protocols.all()) + + +@dop_app.command("classes", help="List DOP data classes and their reader protocol candidates.") +def data_classes( + name: str | None = typer.Argument(None, metavar="[DATA_CLASS]", help="Optional DOP data class id to describe."), + protocol: str | None = typer.Option( + None, + "--protocol", + metavar="ID", + help="Only show data classes that include the given reader protocol id.", + ), + format: str = typer.Option( + "table", + "--format", + metavar="FMT", + help="Output format: table | json (default: table).", + ), +): + registry = _registry() + runtime = _load("reader.runtime").builtin_runtime() + inspection_dop = _load("reader.workbench.inspection.dop") + fmt = normalize_output_format(format) + try: + _validate_registry_protocol_refs(registry, runtime) + selected = registry.data_classes() + if name is not None: + selected = (registry.data_class(name),) + if protocol is not None and protocol.strip(): + resolved_protocol = runtime.protocols.resolve(protocol.strip()).protocol + selected = tuple(item for item in selected if resolved_protocol in item.protocol_candidates) + except (ConfigError, ValueError) as err: + raise typer.BadParameter(str(err)) from err + if fmt == "json": + emit_json(inspection_dop.data_classes_payload(selected)) + return + shared.console.print(Panel(inspection_dop.data_classes_table(selected), border_style="accent", box=box.ROUNDED)) + + +@dop_app.command("ready-specs", help="List DOP readiness gates and their reader evidence requirements.") +def ready_specs( + name: 
str | None = typer.Argument(None, metavar="[READY_SPEC]", help="Optional DOP ready spec id to describe."), + format: str = typer.Option( + "table", + "--format", + metavar="FMT", + help="Output format: table | json (default: table).", + ), +): + registry = _registry() + inspection_dop = _load("reader.workbench.inspection.dop") + readiness = _load("reader.workbench.inspection.readiness") + fmt = normalize_output_format(format) + try: + registry.validate_ready_refs( + readiness_states=readiness.READINESS_STATES, + capability_keys=readiness.READINESS_CAPABILITY_KEYS, + ) + selected = registry.ready_specs() if name is None else (registry.ready_spec(name),) + except ValueError as err: + raise typer.BadParameter(str(err)) from err + if fmt == "json": + emit_json(inspection_dop.ready_specs_payload(selected)) + return + shared.console.print(Panel(inspection_dop.ready_specs_table(selected), border_style="accent", box=box.ROUNDED)) + + +app.add_typer(dop_app, name="dop") diff --git a/src/reader/workbench/cli/experiments.py b/src/reader/workbench/cli/experiments.py index cec28b6..1fe0b76 100644 --- a/src/reader/workbench/cli/experiments.py +++ b/src/reader/workbench/cli/experiments.py @@ -14,6 +14,7 @@ from ._lazy import load as _load from .helpers import ( append_journal, + ensure_active_lifecycle, find_nearest_experiments_dir, format_job_arg, indexed_jobs, @@ -45,7 +46,7 @@ def ls( include_scaffolds: bool = typer.Option( False, "--all", - help="Include scaffold/template directories alongside the default experiment inventory.", + help="Include scaffold and template directories too.", ), protocol: str | None = typer.Option( None, "--protocol", metavar="ID", help="Only show experiments bound to the given protocol id." 
@@ -65,12 +66,12 @@ def ls( details: bool = typer.Option( False, "--details", - help="Show protocol id, selected-plan summary, and generated output counts for each experiment.", + help="Show protocol id, selected steps, and generated output counts for each experiment.", ), readiness: bool = typer.Option( False, "--readiness", - help="With --details, run per-experiment preflight and record-state checks.", + help="With --details, run preflight and records checks for each experiment.", ), format: str = typer.Option( "table", @@ -312,15 +313,15 @@ def ls( readiness_bits = [f"{key}={value}" for key, value in dict(inventory_summary.get("by_readiness") or {}).items()] if readiness_bits: summary.add_row("Readiness", ", ".join(readiness_bits)) - shared.console.print(Panel(summary, border_style="accent", box=box.ROUNDED, title="Inventory summary")) + shared.console.print(Panel(summary, border_style="accent", box=box.ROUNDED, title="Experiment summary")) -@app.command(help="Inspect one experiment: readiness, inputs, pipeline chain, plots, artifacts, and generated outputs.") +@app.command(help="Inspect one experiment: readiness, inputs, pipeline steps, plots, exports, and generated outputs.") def inspect( job: str | None = typer.Argument( None, metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), format: str = typer.Option( "table", "--format", metavar="FMT", help="Output format: table | json (default: table)." 
@@ -339,8 +340,10 @@ def inspect( if fmt == "json": emit_json(payload) return + semantic_program = decl.experiment_semantics.protocol_program for renderable in _load("reader.workbench.inspection.reports").experiment_inspect_renderables( - payload=payload, semantic_program=decl.experiment_semantics.protocol_program + payload=payload, + semantic_program=semantic_program, ): shared.console.print(renderable) @@ -350,7 +353,7 @@ def explain( job: str | None = typer.Argument( None, metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), format: str = typer.Option( "table", "--format", metavar="FMT", help="Output format: table | json (default: table)." @@ -358,7 +361,6 @@ def explain( ): try: job_path = infer_job_path(job) - append_journal(job_path, reader_command("explain", job_path)) spec, decl = load_job_models(job_path) runtime = _load("reader.runtime").builtin_runtime() fmt = normalize_output_format(format) @@ -379,7 +381,7 @@ def validate( job: str | None = typer.Argument( None, metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), no_files: bool = typer.Option(False, "--no-files", help="Skip file existence checks (config-only validation)."), format: str = typer.Option( @@ -388,7 +390,6 @@ def validate( ): try: job_path = infer_job_path(job) - append_journal(job_path, reader_command("validate", job_path)) _, decl = load_job_models(job_path) runtime = _load("reader.runtime").builtin_runtime() fmt = normalize_output_format(format) @@ -421,12 +422,12 @@ def validate( handle_reader_error(err) -@app.command(help="Print the authoring config plus compiled runtime plan.") +@app.command(help="Print the config plus compiled runtime plan.") def config( job: str | None = typer.Argument( None, 
metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), format: str = typer.Option("yaml", "--format", metavar="FMT", help="Output format: yaml | json (default: yaml)."), ): @@ -463,7 +464,7 @@ def run( job: str | None = typer.Argument( None, metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), from_step: str | None = typer.Option( None, @@ -504,9 +505,11 @@ def run( fmt = normalize_output_format(format) if fmt == "json" and not dry_run: raise typer.BadParameter("--format json is only supported with --dry-run") + if not dry_run: + ensure_active_lifecycle(decl, job_path, command_name="run") if only: - resolve_pipeline_step_id(decl, only) + resolve_pipeline_step_id(decl, only, job_path=job_path) parts += ["--only", only] if dry_run: parts += ["--dry-run"] @@ -516,8 +519,9 @@ def run( parts += ["--log-level", log_level] if compact: parts += ["--compact"] - append_journal(job_path, " ".join(parts)) try: + if not dry_run: + append_journal(job_path, " ".join(parts)) runtime = _load("reader.runtime").builtin_runtime() if dry_run and fmt == "json": emit_json( @@ -543,6 +547,8 @@ def run( include_pipeline=True, include_plots=False, include_exports=False, + job_label=format_job_arg(job), + show_next_steps=True, runtime=runtime, ) except ReaderError as err: @@ -550,10 +556,10 @@ def run( return if from_step: - resolve_pipeline_step_id(decl, from_step) + resolve_pipeline_step_id(decl, from_step, job_path=job_path) parts += ["--from", from_step] if until: - resolve_pipeline_step_id(decl, until) + resolve_pipeline_step_id(decl, until, job_path=job_path) parts += ["--until", until] if dry_run: parts += ["--dry-run"] @@ -563,8 +569,9 @@ def run( parts += ["--log-level", log_level] if compact: parts += 
["--compact"] - append_journal(job_path, " ".join(parts)) try: + if not dry_run: + append_journal(job_path, " ".join(parts)) runtime = _load("reader.runtime").builtin_runtime() if dry_run and fmt == "json": emit_json( diff --git a/src/reader/workbench/cli/helpers.py b/src/reader/workbench/cli/helpers.py index 0b50612..5aaf3e2 100644 --- a/src/reader/workbench/cli/helpers.py +++ b/src/reader/workbench/cli/helpers.py @@ -6,7 +6,7 @@ import typer -from reader.errors import RecordError +from reader.errors import ReaderError, RecordError from reader.workbench.commands import reader_command from reader.workbench.experiments import discover_experiment_configs @@ -187,11 +187,16 @@ def infer_job_path(job: str | None) -> Path: return job_lookup[idx] jobs_with_scaffolds_lookup = dict(jobs_with_scaffolds) if idx in jobs_with_scaffolds_lookup: - return jobs_with_scaffolds_lookup[idx] + hidden_job = jobs_with_scaffolds_lookup[idx] + raise typer.BadParameter( + f"Experiment index {idx} points to hidden scaffold/template config {hidden_job.parent} under {root_path}. " + f"Numeric indexes only address the default 'uv run reader ls' inventory. " + "Pass the path explicitly for scaffold/template configs." + ) scaffold_hint = "" if jobs_with_scaffolds != jobs: scaffold_hint = ( - " Default inventory valid: " + " Default index range: " f"{_format_index_span(index for index, _ in jobs)}; with '--all': " f"{_format_index_span(index for index, _ in jobs_with_scaffolds)}." ) @@ -238,6 +243,17 @@ def format_job_arg(job: str | None) -> str | None: return value or None +def ensure_active_lifecycle(decl: WorkbenchDecl, job_path: Path, *, command_name: str) -> None: + lifecycle = decl.experiment.lifecycle + if lifecycle == "active": + return + raise typer.BadParameter( + f"Experiment lifecycle '{lifecycle}' is not runnable for '{command_name}'. " + f"Use '{reader_command('validate', job_path, '--no-files')}' to check the config or " + f"'{reader_command('inspect', job_path)}' for details." 
+ ) + + def require_dataframe_records(decl: WorkbenchDecl, job_path: Path, *, runtime: ReaderRuntime) -> None: layout = decl.experiment_semantics.layout outputs_dir = layout.outputs_dir @@ -254,9 +270,7 @@ def require_dataframe_records(decl: WorkbenchDecl, job_path: Path, *, runtime: R try: records = store.iter_latest_records(kind="dataframe_artifact") except RecordError as exc: - raise RecordError( - f"Could not read record catalog at {store.records_path}. Run '{reader_command('run', job_path)}' first." - ) from exc + raise RecordError(f"Could not read record catalog at {store.records_path}: {exc}") from exc if not records: raise RecordError( f"No dataframe records listed in outputs/manifests/records.json. Run '{reader_command('run', job_path)}' first." @@ -265,9 +279,7 @@ def require_dataframe_records(decl: WorkbenchDecl, job_path: Path, *, runtime: R def append_journal(job_path: Path, command_line: str) -> None: exp_dir = job_path.parent - journal = exp_dir / ( - "JOURNAL.md" if (exp_dir / "JOURNAL.md").exists() or not (exp_dir / "journal.md").exists() else "journal.md" - ) + journal = _canonical_journal_path(exp_dir) ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S") header = "" if journal.exists() else "# Experiment Journal\n\n" entry = f"### {ts}\n\n```\n{command_line}\n```\n\n" @@ -277,14 +289,32 @@ def append_journal(job_path: Path, command_line: str) -> None: ) -def resolve_pipeline_step_id(decl: WorkbenchDecl, which: str) -> str: +def _canonical_journal_path(exp_dir: Path) -> Path: + canonical = exp_dir / "JOURNAL.md" + legacy = exp_dir / "journal.md" + if canonical.exists() and legacy.exists(): + try: + if canonical.samefile(legacy): + return canonical + except OSError: + pass + raise ReaderError( + f"Both {canonical.name} and {legacy.name} exist in {exp_dir}. Consolidate to {canonical.name} first." 
+ ) + if legacy.exists(): + legacy.rename(canonical) + return canonical + + +def resolve_pipeline_step_id(decl: WorkbenchDecl, which: str, *, job_path: Path | None = None) -> str: which_str = str(which).strip() pipeline = list(_load("reader.workbench.graph").resolve_workbench(decl).pipeline) if any(step.id == which_str for step in pipeline): return which_str options = ", ".join(step.id for step in pipeline[:12]) + steps_command = reader_command("steps", job_path) if job_path is not None else reader_command("steps") raise typer.BadParameter( - f"Unknown pipeline step id '{which_str}'. Tip: use '{reader_command('steps')}' to list ids " + f"Unknown pipeline step id '{which_str}'. Tip: use '{steps_command}' to list ids " f"(first few: {options}{' …' if len(pipeline) > 12 else ''})." ) diff --git a/src/reader/workbench/cli/notebooks.py b/src/reader/workbench/cli/notebooks.py index 35f509e..9793fd7 100644 --- a/src/reader/workbench/cli/notebooks.py +++ b/src/reader/workbench/cli/notebooks.py @@ -263,7 +263,7 @@ def notebook( job: str | None = typer.Argument( None, metavar="CONFIG|DIR|INDEX", - help="Experiment config path, directory, or index from 'uv run reader ls'.", + help=shared.JOB_ARG_HELP_SHORT, ), name: str | None = typer.Option( None, diff --git a/src/reader/workbench/cli/protocols.py b/src/reader/workbench/cli/protocols.py index 463c014..d0f7304 100644 --- a/src/reader/workbench/cli/protocols.py +++ b/src/reader/workbench/cli/protocols.py @@ -27,8 +27,8 @@ def protocols( "or plate_reader/retron_sponge_screen)." 
), ), - domain: str | None = typer.Option(None, "--domain", metavar="NAME", help="Filter protocols by semantic domain."), - family: str | None = typer.Option(None, "--family", metavar="NAME", help="Filter protocols by semantic family."), + domain: str | None = typer.Option(None, "--domain", metavar="NAME", help="Filter protocols by domain."), + family: str | None = typer.Option(None, "--family", metavar="NAME", help="Filter protocols by family."), example_config: bool = typer.Option( False, "--example-config", @@ -55,7 +55,7 @@ def protocols( emit_json(inspection_protocols.protocol_descriptor_payload(descriptor, runtime=runtime)) return bound_protocol, compiled_plan = default_protocol_plan(descriptor=descriptor, runtime=runtime) - semantic_program = compiled_plan.semantic_program or descriptor.semantic_program() + semantic_program = compiled_plan.semantic_program summary = table(f"Protocol: {descriptor.protocol}") summary.add_column("Section", style="accent") summary.add_column("Details") @@ -74,15 +74,16 @@ def protocols( summary.add_row("Metrics", ", ".join(item.id for item in descriptor.metrics)) if descriptor.ranking is not None: summary.add_row("Primary ranking", descriptor.ranking.primary_metric) - summary.add_row( - "Semantic nodes", - str( - len(semantic_program.controls) - + len(semantic_program.windows) - + len(semantic_program.metrics) - + (1 if semantic_program.ranking is not None else 0) - ), - ) + if semantic_program.has_nodes: + summary.add_row( + "Semantic nodes", + str( + len(semantic_program.controls) + + len(semantic_program.windows) + + len(semantic_program.metrics) + + (1 if semantic_program.ranking is not None else 0) + ), + ) summary.add_row("Default notebook", descriptor.execution.notebook.default_template) summary.add_row("Allowed notebooks", ", ".join(descriptor.execution.notebook.allowed_templates)) if descriptor.default_plot_profile is not None: @@ -93,7 +94,7 @@ def protocols( if input_rows: shared.console.print( Panel( - 
inspection_protocols.protocol_surface_table("Inputs Surface", input_rows), + inspection_protocols.protocol_surface_table("Inputs", input_rows), border_style="accent", box=box.ROUNDED, ) @@ -102,20 +103,26 @@ def protocols( if analysis_rows: shared.console.print( Panel( - inspection_protocols.protocol_surface_table("Analysis Surface", analysis_rows), + inspection_protocols.protocol_surface_table("Analysis", analysis_rows), border_style="accent", box=box.ROUNDED, ) ) - if ( - semantic_program.controls - or semantic_program.windows - or semantic_program.metrics - or semantic_program.ranking - ): + if semantic_program.has_nodes: shared.console.print( Panel( - inspection_semantics.semantic_program_table(semantic_program), + inspection_semantics.semantic_program_table(semantic_program, include_execution=False), + border_style="accent", + box=box.ROUNDED, + ) + ) + if semantic_program.has_nodes: + shared.console.print( + Panel( + inspection_semantics.semantic_program_table( + semantic_program, + title="Compiled Semantic Execution", + ), border_style="accent", box=box.ROUNDED, ) diff --git a/src/reader/workbench/cli/shared.py b/src/reader/workbench/cli/shared.py index 97fa7cc..d63e452 100644 --- a/src/reader/workbench/cli/shared.py +++ b/src/reader/workbench/cli/shared.py @@ -32,14 +32,21 @@ invoke_without_command=True, help=( "reader — experimental workbench.\n\n" - "Discover assays and experiments, inspect compiled workflow plans, validate authoring YAML, " - "run pipelines, and materialize plots, exports, or notebooks. " + "Discover assays and experiments, inspect workflow plans, validate YAML, " + "run pipelines, and produce plots, exports, or notebooks. " "Start with 'uv run reader demo', 'uv run reader ls', or 'uv run reader protocols'." 
), ) console = Console(theme=THEME) rich_tracebacks(show_locals=False) +JOB_INDEX_SCOPE_NOTE = "from the default 'uv run reader ls' inventory, resolved against the nearest experiments/ root from the current working directory" +JOB_ARG_HELP = ( + f"Path to config.yaml • experiment directory • or numeric index {JOB_INDEX_SCOPE_NOTE}." +) +JOB_ARG_HELP_WITH_DEFAULT = f"{JOB_ARG_HELP[:-1]} (defaults to nearest ./config.yaml)" +JOB_ARG_HELP_SHORT = f"Experiment config path, directory, or index {JOB_INDEX_SCOPE_NOTE}." + PLOT_ONLY_OPTION = typer.Option(None, "--only", help="Run only the specified plot id (repeatable).") PLOT_EXCLUDE_OPTION = typer.Option(None, "--exclude", help="Exclude the specified plot id (repeatable).") PLOT_INPUT_OPTION = typer.Option( diff --git a/src/reader/workbench/cli/surfaces.py b/src/reader/workbench/cli/surfaces.py index 3cfc837..0f08705 100644 --- a/src/reader/workbench/cli/surfaces.py +++ b/src/reader/workbench/cli/surfaces.py @@ -7,16 +7,14 @@ from rich.panel import Panel from reader.errors import ReaderError -from reader.workbench.commands import reader_command -from . import shared +from .
import _records_view, _surface_execution, shared from ._lazy import load as _load from .helpers import ( append_journal, - bind_decl_protocol, + ensure_active_lifecycle, find_nearest_experiments_dir, find_year_jobs, - format_job_arg, infer_job_path, load_job_models, require_dataframe_records, @@ -40,41 +38,81 @@ def _spec_overrides(): - return _load("reader.workbench.spec_overrides") + return _surface_execution.spec_overrides() + + +def _validate_list_mode_flags( + *, + list_only: bool, + dry_run: bool, + inputs: list[str] | None, + sets: list[str] | None, +) -> None: + _surface_execution.validate_list_mode_flags( + list_only=list_only, + dry_run=dry_run, + inputs=inputs, + sets=sets, + ) + + +def _apply_surface_overrides( + selected, + *, + inputs: list[str] | None, + sets: list[str] | None, + experiment_root: Path, + resources, +): + return _surface_execution.apply_surface_overrides( + selected, + inputs=inputs, + sets=sets, + experiment_root=experiment_root, + resources=resources, + ) + + +def _validate_plot_job_for_execution( + job_path: Path, + *, + only: list[str] | None, + exclude: list[str] | None, + dry_run: bool, + inputs: list[str] | None, + sets: list[str] | None, +) -> None: + _surface_execution.validate_plot_job_for_execution( + job_path, + only=only, + exclude=exclude, + dry_run=dry_run, + inputs=inputs, + sets=sets, + ensure_active_lifecycle_fn=ensure_active_lifecycle, + require_dataframe_records_fn=require_dataframe_records, + ) def _render_surface_specs_table( *, title_text: str, selected, runtime, record_producers, summaries: dict[str, str] ) -> None: - inspection_runtime = _load("reader.workbench.inspection.runtime") - listing = table(title_text) - listing.add_column("#", justify="right", style="muted") - listing.add_column("id", style="accent", overflow="fold") - listing.add_column("summary", overflow="fold") - listing.add_column("from", overflow="fold") - listing.add_column("plugin", overflow="fold") - for index, spec in enumerate(selected, 
1): - spec_payload = inspection_runtime.spec_step_payload( - spec, summary=summaries.get(spec.id, "—"), runtime=runtime, record_producers=record_producers - ) - from_refs = ", ".join(inspection_runtime.render_read_binding(item) for item in spec_payload["reads"]) or "—" - listing.add_row(str(index), spec.id, summaries.get(spec.id, "—"), from_refs, spec.plugin) - shared.console.print( - Panel(listing, border_style="accent", box=box.ROUNDED, subtitle=f"[muted]{len(selected)} total[/muted]") + _surface_execution.render_surface_specs_table( + title_text=title_text, + selected=selected, + runtime=runtime, + record_producers=record_producers, + summaries=summaries, ) def _surface_next_steps(*, job_hint: str | None, output_dir: Path, include_plot: bool, include_export: bool) -> None: - def _cmd(base: str, tail: str = "") -> str: - return reader_command(base, job_hint, tail) - - lines = [f"Artifacts saved in [path]{output_dir}[/path]", "", "Next steps:"] - if include_plot: - lines.append(f" {_cmd('plot')}") - if include_export: - lines.append(f" {_cmd('export')}") - lines.append(f" {_cmd('notebook')}") - shared.console.print(Panel.fit("\n".join(lines), border_style="green", box=box.ROUNDED)) + _surface_execution.surface_next_steps( + job_hint=job_hint, + output_dir=output_dir, + include_plot=include_plot, + include_export=include_export, + ) def _run_plot_job( @@ -90,116 +128,21 @@ def _run_plot_job( inputs: list[str] | None, sets: list[str] | None, ) -> None: - _, decl = load_job_models(job_path) - runtime = _load("reader.runtime").builtin_runtime() - workbench = _load("reader.workbench.graph").resolve_workbench(decl) - bound_protocol = bind_decl_protocol(decl=decl, runtime=runtime) - inspection_catalogs = _load("reader.workbench.inspection.catalogs") - inspection_runtime = _load("reader.workbench.inspection.runtime") - fmt = normalize_output_format(format) - if not list_only: - if fmt == "json": - raise typer.BadParameter("--format json is only supported with --list") - 
if not dry_run: - require_dataframe_records(decl, job_path, runtime=runtime) - plot_specs = list(workbench.plots) - record_producers = inspection_runtime.record_producer_map(workbench.plugin_steps(), runtime=runtime) - if not plot_specs: - if list_only: - if fmt == "json": - emit_json( - inspection_catalogs.workbench_surface_specs_payload( - job_path=job_path, - decl=decl, - runtime=runtime, - bound_protocol=bound_protocol, - selected=[], - kind="plot", - only=only or [], - exclude=exclude or [], - ) - ) - return - shared.console.print( - Panel.fit("No plot specs configured in this experiment.", border_style="warn", box=box.ROUNDED) - ) - return - raise typer.BadParameter("No plot specs configured in this experiment. Add plots to the config.") - selected = _spec_overrides().select_surface_specs( - plot_specs, only=only or [], exclude=exclude or [], kind="plot spec" - ) - if list_only: - if fmt == "json": - emit_json( - inspection_catalogs.workbench_surface_specs_payload( - job_path=job_path, - decl=decl, - runtime=runtime, - bound_protocol=bound_protocol, - selected=selected, - kind="plot", - only=only or [], - exclude=exclude or [], - ) - ) - return - _render_surface_specs_table( - title_text="Plots", - selected=selected, - runtime=runtime, - record_producers=record_producers, - summaries=inspection_runtime.plot_output_summaries(bound_protocol), - ) - return - experiment_root = decl.experiment.root - resources = decl.experiment_semantics.resources - spec_overrides = _spec_overrides() - input_overrides = spec_overrides.parse_input_overrides(inputs or [], root=experiment_root, resources=resources) - set_overrides = spec_overrides.parse_set_overrides(sets or []) - selected = spec_overrides.apply_step_overrides( - selected, - input_overrides=input_overrides, - set_overrides=set_overrides, - root=experiment_root, - resources=resources, - ) - append_journal( + _surface_execution.run_plot_job( job_path, - " ".join( - spec_overrides.build_surface_command( - "reader plot", 
- job_path, - only=only, - exclude=exclude, - list_only=False, - dry_run=dry_run, - log_level=log_level, - inputs=inputs, - sets=sets, - ) - ), - ) - _load("reader.workbench.engine").run_spec( - decl, + job_hint=job_hint, + only=only, + exclude=exclude, + list_only=list_only, + format=format, dry_run=dry_run, log_level=log_level, - console=shared.console, - include_pipeline=False, - include_plots=True, - include_exports=False, - plot_specs=selected, - runtime=runtime, + inputs=inputs, + sets=sets, + ensure_active_lifecycle_fn=ensure_active_lifecycle, + require_dataframe_records_fn=require_dataframe_records, + append_journal_fn=append_journal, ) - if not dry_run: - outputs_dir = decl.experiment_semantics.layout.outputs_dir - plots_cfg = decl.experiment_semantics.layout.plots_subdir - plots_dir = outputs_dir if plots_cfg in ("", ".", "./") else outputs_dir / str(plots_cfg) - _surface_next_steps( - job_hint=format_job_arg(job_hint), - output_dir=plots_dir, - include_plot=False, - include_export=bool(workbench.exports), - ) def _run_export_job( @@ -215,123 +158,29 @@ def _run_export_job( inputs: list[str] | None, sets: list[str] | None, ) -> None: - _, decl = load_job_models(job_path) - fmt = normalize_output_format(format) - workbench = _load("reader.workbench.graph").resolve_workbench(decl) - runtime = _load("reader.runtime").builtin_runtime() - inspection_catalogs = _load("reader.workbench.inspection.catalogs") - inspection_runtime = _load("reader.workbench.inspection.runtime") - record_producers = inspection_runtime.record_producer_map(workbench.plugin_steps(), runtime=runtime) - bound_protocol = bind_decl_protocol(decl=decl, runtime=runtime) - export_specs = list(workbench.exports) - if not export_specs: - if list_only: - if fmt == "json": - emit_json( - inspection_catalogs.workbench_surface_specs_payload( - job_path=job_path, - decl=decl, - runtime=runtime, - bound_protocol=bound_protocol, - selected=[], - kind="export", - only=only or [], - exclude=exclude or 
[], - ) - ) - return - shared.console.print( - Panel.fit("No export specs configured in this experiment.", border_style="warn", box=box.ROUNDED) - ) - return - raise typer.BadParameter("No export specs configured in this experiment. Add exports to the config.") - selected = _spec_overrides().select_surface_specs( - export_specs, only=only or [], exclude=exclude or [], kind="export spec" - ) - if list_only: - if fmt == "json": - emit_json( - inspection_catalogs.workbench_surface_specs_payload( - job_path=job_path, - decl=decl, - runtime=runtime, - bound_protocol=bound_protocol, - selected=selected, - kind="export", - only=only or [], - exclude=exclude or [], - ) - ) - return - _render_surface_specs_table( - title_text="Exports", - selected=selected, - runtime=runtime, - record_producers=record_producers, - summaries=inspection_runtime.export_output_summaries(bound_protocol), - ) - return - if fmt == "json": - raise typer.BadParameter("--format json is only supported with --list") - if not dry_run: - require_dataframe_records(decl, job_path, runtime=runtime) - experiment_root = decl.experiment.root - resources = decl.experiment_semantics.resources - spec_overrides = _spec_overrides() - input_overrides = spec_overrides.parse_input_overrides(inputs or [], root=experiment_root, resources=resources) - set_overrides = spec_overrides.parse_set_overrides(sets or []) - selected = spec_overrides.apply_step_overrides( - selected, - input_overrides=input_overrides, - set_overrides=set_overrides, - root=experiment_root, - resources=resources, - ) - append_journal( + _surface_execution.run_export_job( job_path, - " ".join( - spec_overrides.build_surface_command( - "reader export", - job_path, - only=only, - exclude=exclude, - list_only=False, - dry_run=dry_run, - log_level=log_level, - inputs=inputs, - sets=sets, - ) - ), - ) - _load("reader.workbench.engine").run_spec( - decl, + job_hint=job_hint, + only=only, + exclude=exclude, + list_only=list_only, + format=format, 
dry_run=dry_run, log_level=log_level, - console=shared.console, - include_pipeline=False, - include_plots=False, - include_exports=True, - export_specs=selected, - runtime=runtime, + inputs=inputs, + sets=sets, + ensure_active_lifecycle_fn=ensure_active_lifecycle, + require_dataframe_records_fn=require_dataframe_records, + append_journal_fn=append_journal, ) - if not dry_run: - outputs_dir = decl.experiment_semantics.layout.outputs_dir - exports_cfg = decl.experiment_semantics.layout.exports_subdir - exports_dir = outputs_dir if exports_cfg in ("", ".", "./") else outputs_dir / str(exports_cfg) - _surface_next_steps( - job_hint=format_job_arg(job_hint), - output_dir=exports_dir, - include_plot=bool(workbench.plots), - include_export=False, - ) -@app.command(help="Save plot files from plot specs using existing dataframe records.") +@app.command(help="List plot specs or save plot files from existing dataframe records.") def plot( job: str | None = typer.Argument( None, metavar="CONFIG|DIR|INDEX", - help="Experiment config path, directory, or index from 'uv run reader ls'.", + help=shared.JOB_ARG_HELP_SHORT, ), year: str | None = typer.Option( None, "--year", metavar="YYYY", help="Run plots for all experiments under experiments/YYYY." 
@@ -362,6 +211,7 @@ def plot( ): if root and not year: raise typer.BadParameter("--root is only valid with --year") + _validate_list_mode_flags(list_only=list_only, dry_run=dry_run, inputs=inputs, sets=sets) fmt = normalize_output_format(format) if year: if fmt == "json": @@ -370,6 +220,24 @@ def plot( raise typer.BadParameter("--year cannot be combined with CONFIG|DIR|INDEX") root_path = find_nearest_experiments_dir(Path.cwd()) if root is None else Path(root).resolve() jobs = find_year_jobs(year, root_path) + if not list_only: + failures: list[tuple[Path, str]] = [] + for job_path in jobs: + try: + _validate_plot_job_for_execution( + job_path, + only=only, + exclude=exclude, + dry_run=dry_run, + inputs=inputs, + sets=sets, + ) + except (ReaderError, typer.BadParameter) as exc: + failures.append((job_path, str(exc))) + if failures: + lines = [f"{len(failures)} experiment(s) are not runnable for year {year}:"] + lines += [f"- {path.parent.name}: {msg}" for path, msg in failures] + abort("\n".join(lines)) shared.console.print( Panel.fit( f"Plotting {len(jobs)} experiment(s) for {year} under [path]{root_path}[/path].", @@ -438,12 +306,12 @@ def plot( handle_reader_error(err) -@app.command(help="Run export specs using existing dataframe records.") +@app.command(help="List export specs or write export files from existing dataframe records.") def export( job: str | None = typer.Argument( None, metavar="CONFIG|DIR|INDEX", - help="Experiment config path, directory, or index from 'uv run reader ls'.", + help=shared.JOB_ARG_HELP_SHORT, ), only: list[str] = EXPORT_ONLY_OPTION, exclude: list[str] = EXPORT_EXCLUDE_OPTION, @@ -464,6 +332,7 @@ def export( sets: list[str] = EXPORT_SET_OPTION, ): try: + _validate_list_mode_flags(list_only=list_only, dry_run=dry_run, inputs=inputs, sets=sets) job_path = infer_job_path(job) _run_export_job( job_path, @@ -481,12 +350,12 @@ def export( handle_reader_error(err) -@app.command(help="List emitted workbench records from 
outputs/manifests/records.json.") +@app.command(help="List records from outputs/manifests/records.json.") def records( job: str | None = typer.Argument( None, metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), all: bool = typer.Option(False, "--all", help="Show revision history counts instead of latest entries."), format: str = typer.Option( @@ -495,102 +364,17 @@ def records( ): try: job_path = infer_job_path(job) - _, decl = load_job_models(job_path) - outputs_dir = decl.experiment_semantics.layout.outputs_dir - store = ( - _load("reader.runtime") - .builtin_runtime() - .record_store( - outputs_dir, - plots_subdir=decl.experiment_semantics.layout.plots_subdir, - exports_subdir=decl.experiment_semantics.layout.exports_subdir, - create=False, - ) - ) - if not store.catalog_exists(): - abort(f"No outputs/manifests/records.json found. Run '{reader_command('run')}' first to produce records.") - except ReaderError as err: - handle_reader_error(err) - fmt = normalize_output_format(format) - if fmt == "json": - try: - emit_json( - _load("reader.workbench.inspection.results").record_catalog_payload( - experiment=_load("reader.workbench.inspection.experiments").experiment_identity_payload( - job_path=job_path, decl=decl - ), - store=store, - outputs_dir=outputs_dir, - base=decl.experiment.root, - include_history=all, - ) - ) - except ReaderError as err: - handle_reader_error(err) - return - - try: - latest_records = store.iter_latest_records() + _records_view.render_records(job_path=job_path, all_revisions=all, format=format) except ReaderError as err: handle_reader_error(err) - if all: - if not latest_records: - shared.console.print( - Panel.fit( - f"No record history listed in outputs/manifests/records.json. 
Run '{reader_command('run')}' first.", - border_style="warn", - box=box.ROUNDED, - ) - ) - return - try: - revision_counts = store.revision_counts(record.record_id for record in latest_records) - except ReaderError as err: - handle_reader_error(err) - listing = table("Records • history") - listing.add_column("Record") - listing.add_column("Kind", style="accent") - listing.add_column("Producer") - listing.add_column("Revisions", justify="right") - for record in latest_records: - listing.add_row( - record.record_id, - record.kind, - f"{record.producer.kind}:{record.producer.id}", - str(revision_counts[record.record_id]), - ) - else: - if not latest_records: - shared.console.print( - Panel.fit( - f"No records listed in outputs/manifests/records.json. Run '{reader_command('run')}' first.", - border_style="warn", - box=box.ROUNDED, - ) - ) - return - listing = table("Records • latest") - listing.add_column("Record") - listing.add_column("Kind", style="accent") - listing.add_column("Producer") - listing.add_column("Details", style="path") - for record in latest_records: - detail = ( - f"{record.contract_id} • {record.path}" - if record.kind == "dataframe_artifact" - else ", ".join(str(path) for path in record.files) - ) - listing.add_row(record.record_id, record.kind, f"{record.producer.kind}:{record.producer.id}", detail) - shared.console.print(Panel(listing, border_style="accent", box=box.ROUNDED)) - @app.command(help="List pipeline steps and bindings for a config.") def steps( job: str | None = typer.Argument( None, metavar="[CONFIG]", - help="Path to config.yaml • experiment directory • or numeric index from 'uv run reader ls' (defaults to nearest ./config.yaml)", + help=shared.JOB_ARG_HELP_WITH_DEFAULT, ), format: str = typer.Option( "table", "--format", metavar="FMT", help="Output format: table | json (default: table)." 
@@ -633,7 +417,7 @@ def steps( ) -@app.command(help="List plugins by workbench ontology: category, domain, and family.") +@app.command(help="List plugins by category, domain, and family.") def plugins( category: str | None = typer.Option( None, "--category", metavar="NAME", help="Filter by category: ingest | transform | plot | export | validator" @@ -642,16 +426,16 @@ def plugins( None, "--domain", metavar="NAME", - help="Filter by semantic domain, for example: plate_reader | cytometry | logic | generic", + help="Filter by domain, for example: plate_reader | cytometry | logic | generic", ), family: str | None = typer.Option( None, "--family", metavar="NAME", - help="Filter by semantic family, for example: time_series | metadata_merge | workbook_ingest", + help="Filter by family, for example: time_series | metadata_merge | workbook_ingest", ), protocol: str | None = typer.Option( - None, "--protocol", metavar="ID", help="Limit to plugins used by the named protocol's default compiled plan." + None, "--protocol", metavar="ID", help="Limit to plugins used by the named protocol's default plan." ), format: str = typer.Option( "table", "--format", metavar="FMT", help="Output format: table | json (default: table)." 
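The new `plot --year` path in `surfaces.py` validates every experiment before plotting any of them, collects each failure, and aborts once with a combined report. Below is a minimal standalone sketch of that "validate all, then fail once" pattern. `PreflightError`, `collect_preflight_failures`, `format_preflight_report`, and the lifecycle-based validator are hypothetical stand-ins for `ReaderError`/`typer.BadParameter` and `_validate_plot_job_for_execution`, not part of `reader` itself.

```python
from pathlib import Path
from typing import Callable, Iterable


class PreflightError(Exception):
    """Hypothetical stand-in for ReaderError / typer.BadParameter."""


def collect_preflight_failures(
    jobs: Iterable[Path],
    validate: Callable[[Path], None],
) -> list[tuple[Path, str]]:
    # Validate every job before executing any, so one bad experiment
    # cannot leave the batch half-plotted.
    failures: list[tuple[Path, str]] = []
    for job_path in jobs:
        try:
            validate(job_path)
        except PreflightError as exc:
            failures.append((job_path, str(exc)))
    return failures


def format_preflight_report(year: str, failures: list[tuple[Path, str]]) -> str:
    # Same message shape as the diff: a header line, then one bullet per
    # experiment directory name.
    lines = [f"{len(failures)} experiment(s) are not runnable for year {year}:"]
    lines += [f"- {path.parent.name}: {msg}" for path, msg in failures]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical lifecycle map; only "active" experiments are runnable.
    lifecycles = {"a": "active", "b": "draft", "c": "active"}

    def validate(job_path: Path) -> None:
        lifecycle = lifecycles[job_path.parent.name]
        if lifecycle != "active":
            raise PreflightError(f"Experiment lifecycle '{lifecycle}' is not runnable for 'plot'.")

    jobs = [Path("experiments/2024") / name / "config.yaml" for name in lifecycles]
    failures = collect_preflight_failures(jobs, validate)
    if failures:
        print(format_preflight_report("2024", failures))
```

Collecting failures instead of raising on the first one lets a single invocation report every non-runnable experiment for the year, which matches the intent of the added loop.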
diff --git a/src/reader/workbench/dop/__init__.py b/src/reader/workbench/dop/__init__.py new file mode 100644 index 0000000..6e6fa57 --- /dev/null +++ b/src/reader/workbench/dop/__init__.py @@ -0,0 +1,12 @@ +from .builtins import BUILTIN_DATA_CLASSES, BUILTIN_READY_SPECS, DOP_SCHEMA, builtin_dop_registry +from .model import DataClassSpec, DopRegistry, ReadySpec + +__all__ = [ + "BUILTIN_DATA_CLASSES", + "BUILTIN_READY_SPECS", + "DOP_SCHEMA", + "DataClassSpec", + "DopRegistry", + "ReadySpec", + "builtin_dop_registry", +] diff --git a/src/reader/workbench/dop/builtins.py b/src/reader/workbench/dop/builtins.py new file mode 100644 index 0000000..5d55c6e --- /dev/null +++ b/src/reader/workbench/dop/builtins.py @@ -0,0 +1,259 @@ +from __future__ import annotations + +from functools import cache + +from .model import DataClassSpec, DopRegistry, ReadySpec + +DOP_SCHEMA = "reader.dop/v1" + +BUILTIN_DATA_CLASSES: tuple[DataClassSpec, ...] = ( + DataClassSpec( + id="plate_reader_screen", + label="Plate-reader screen", + summary="Well-level plate-reader assay data with explicit channel, treatment, control, and plate/well semantics.", + decision_order=10, + protocol_candidates=( + "plate_reader/retron_sponge_screen", + "plate_reader/dual_reporter_screen", + "plate_reader/single_reporter_screen", + ), + minimum_capture=( + "raw plate-reader workbook or export", + "sample map with measured well coverage", + "channel labels and denominator/ratio meaning", + "treatment and control semantics", + "plate, well, replicate, and design identifiers", + ), + stop_conditions=( + "well coordinates or sample positions conflict", + "treatment or control meaning is incomplete", + "channel labels drift from the selected protocol", + "nearest protocol would silently change control semantics", + ), + transfer_rules=( + "stage the original workbook or export under inputs/", + "bind metadata files through resources", + "regenerate plots, exports, and records from source inputs", + ), + 
verification=( + "config schema and protocol binding validate", + "declared raw files and metadata resources exist", + "records catalog captures generated dataframe evidence", + ), + ), + DataClassSpec( + id="flow_cytometry_panel", + label="Flow-cytometry panel", + summary="Cytometry panel data with explicit FCS roots, channel naming, and sample metadata.", + decision_order=20, + protocol_candidates=("cytometry/flow_panel",), + minimum_capture=( + "raw FCS roots or files", + "channel naming field", + "sample metadata", + "required panel metadata columns", + ), + stop_conditions=( + "FCS root or file mapping is ambiguous", + "channel naming field is unknown", + "sample metadata cannot be joined to events", + ), + transfer_rules=( + "stage raw FCS material under inputs/", + "bind FCS roots and metadata through resources", + "keep generated review notebooks under outputs/notebooks/", + ), + verification=( + "FCS resources resolve", + "configured metadata columns are present", + "records catalog captures panel dataframe evidence", + ), + ), + DataClassSpec( + id="logic_sfxi_analysis", + label="Logic/SFXI analysis", + summary="Logic-response or SFXI-style assay data with response/intensity channels and logic-map corners.", + decision_order=30, + protocol_candidates=("logic/sfxi_screen",), + minimum_capture=( + "raw assay files", + "metadata map", + "response and intensity channel choices", + "reference design", + "logic-map corners", + ), + stop_conditions=( + "reference design cannot be reconstructed", + "logic-map corners are missing or contradictory", + "response or intensity channel choices are ambiguous", + ), + transfer_rules=( + "stage raw files under inputs/", + "encode logic maps in annotations or metadata resources", + "regenerate SFXI summaries from source inputs", + ), + verification=( + "logic reference config validates", + "logic-map annotations are present", + "records catalog captures vec8 summary evidence", + ), + ), + DataClassSpec( + 
id="aggregate_review_workspace", + label="Aggregate/review workspace", + summary="Review material assembled from prior reader records, plots, exports, or hand-authored notes.", + decision_order=40, + protocol_candidates=("workbench/generic",), + minimum_capture=( + "source experiment ids", + "record, plot, or export paths", + "review purpose", + "expected notebook template", + ), + stop_conditions=( + "source experiment ids are unknown", + "review material mixes generated outputs without source records", + "notebook purpose is unclear", + ), + transfer_rules=( + "reference source experiments instead of copying generated outputs", + "keep hand-authored notes under notebooks/", + "keep generated scaffolds under outputs/notebooks/", + ), + verification=( + "source records or exports are identifiable", + "review notebook template is explicit", + "unresolved assumptions are visible in the handoff", + ), + ), + DataClassSpec( + id="unsupported_long_tail_assay", + label="Unsupported long-tail assay", + summary="Assay data that does not yet fit an existing executable protocol contract.", + decision_order=50, + protocol_candidates=("workbench/generic",), + minimum_capture=( + "raw source path", + "intended analysis", + "required metadata", + "missing protocol decision", + "owner for follow-up", + ), + stop_conditions=( + "nearest protocol would change assay meaning", + "required metadata is unknown", + "execution contract is still being discovered", + ), + transfer_rules=( + "keep the experiment draft or template until semantics are clear", + "stage raw files without pretending they are runnable", + "add a protocol only after the metadata and execution contract stabilize", + ), + verification=( + "draft/template config shape validates with --no-files", + "missing protocol or metadata contract is documented", + "no generated outputs are treated as source material", + ), + ), +) + +BUILTIN_READY_SPECS: tuple[ReadySpec, ...] 
= ( + ReadySpec( + id="classified", + label="Classified", + summary="Dataset has a selected DOP data class and protocol candidate set.", + required_evidence=( + "DOP data class id", + "candidate reader protocol ids", + "reason the selected class fits the dataset", + ), + commands=("uv run reader dop classes",), + ), + ReadySpec( + id="metadata_ready", + label="Metadata ready", + summary="Required semantics for the selected data class have been captured before execution.", + required_evidence=( + "dataset identity", + "raw provenance", + "assay semantics", + "sample map", + "control semantics", + "canonical labels", + "requested outputs when they differ from protocol defaults", + ), + commands=("uv run reader protocols --example-config",), + ), + ReadySpec( + id="staged", + label="Staged", + summary="Raw files, metadata resources, and hand-authored notes live in the standard experiment layout.", + required_evidence=( + "raw files under inputs/", + "resources entries for consumed files or directories", + "hand-authored notes under notebooks/ when present", + "no copied generated outputs used as source material", + ), + commands=("uv run reader validate --no-files --format json",), + ), + ReadySpec( + id="preflight_ok", + label="Preflight OK", + summary="Schema, protocol binding, declared files, and dependencies pass reader validation.", + required_evidence=( + "config schema is reader/v7", + "protocol binding resolves", + "declared files and resources exist", + "runtime dependencies are available", + ), + accepted_readiness_states=("runnable", "legacy_outputs_present", "records_ready"), + commands=("uv run reader validate --format json",), + ), + ReadySpec( + id="runnable", + label="Runnable", + summary="The experiment is active and can run from authored source inputs.", + required_evidence=( + "reader readiness state allows run", + "run capability is true", + "next command is explicit", + ), + accepted_readiness_states=("runnable", "legacy_outputs_present", 
"records_ready"), + required_capabilities=("run",), + commands=("uv run reader run --dry-run --format json",), + ), + ReadySpec( + id="records_ready", + label="Records ready", + summary="Generated dataframe and file-bundle evidence is present in the records catalog.", + required_evidence=( + "records catalog exists", + "dataframe artifacts include contract ids", + "records include input and config digests", + ), + accepted_readiness_states=("records_ready",), + required_capabilities=("records",), + commands=("uv run reader records ",), + ), + ReadySpec( + id="review_ready", + label="Review ready", + summary="Records exist and review surfaces such as plots or notebooks can be inspected deliberately.", + required_evidence=( + "records catalog exists", + "selected plots or notebook scaffold are protocol-compatible", + "unresolved metadata assumptions remain visible in the handoff", + ), + accepted_readiness_states=("records_ready",), + required_capabilities=("records", "plot", "notebook_scaffold"), + commands=( + "uv run reader plot --list", + "uv run reader notebook --mode none", + ), + ), +) + + +@cache +def builtin_dop_registry() -> DopRegistry: + return DopRegistry(data_classes=BUILTIN_DATA_CLASSES, ready_specs=BUILTIN_READY_SPECS) diff --git a/src/reader/workbench/dop/model.py b/src/reader/workbench/dop/model.py new file mode 100644 index 0000000..8ac46b3 --- /dev/null +++ b/src/reader/workbench/dop/model.py @@ -0,0 +1,209 @@ +from __future__ import annotations + +from collections.abc import Iterable +from dataclasses import dataclass + + +def _clean_string(value: str, *, field_name: str) -> str: + if not isinstance(value, str) or not value.strip(): + raise ValueError(f"{field_name} must be a non-empty string.") + return value.strip() + + +def _clean_tuple(values: Iterable[str], *, field_name: str, allow_empty: bool = False) -> tuple[str, ...]: + cleaned = tuple(str(value).strip() for value in values if str(value).strip()) + if not allow_empty and not cleaned: + 
raise ValueError(f"{field_name} must include at least one value.") + if len(set(cleaned)) != len(cleaned): + raise ValueError(f"{field_name} must not include duplicate values.") + return cleaned + + +@dataclass(frozen=True) +class DataClassSpec: + id: str + label: str + summary: str + decision_order: int + protocol_candidates: tuple[str, ...] + minimum_capture: tuple[str, ...] + stop_conditions: tuple[str, ...] + transfer_rules: tuple[str, ...] + verification: tuple[str, ...] + + def __post_init__(self) -> None: + object.__setattr__(self, "id", _clean_string(self.id, field_name="DataClassSpec.id")) + object.__setattr__(self, "label", _clean_string(self.label, field_name="DataClassSpec.label")) + object.__setattr__(self, "summary", _clean_string(self.summary, field_name="DataClassSpec.summary")) + if not isinstance(self.decision_order, int) or self.decision_order < 0: + raise ValueError("DataClassSpec.decision_order must be a non-negative integer.") + object.__setattr__( + self, + "protocol_candidates", + _clean_tuple(self.protocol_candidates, field_name="DataClassSpec.protocol_candidates"), + ) + object.__setattr__( + self, + "minimum_capture", + _clean_tuple(self.minimum_capture, field_name="DataClassSpec.minimum_capture"), + ) + object.__setattr__( + self, + "stop_conditions", + _clean_tuple(self.stop_conditions, field_name="DataClassSpec.stop_conditions"), + ) + object.__setattr__( + self, + "transfer_rules", + _clean_tuple(self.transfer_rules, field_name="DataClassSpec.transfer_rules"), + ) + object.__setattr__( + self, + "verification", + _clean_tuple(self.verification, field_name="DataClassSpec.verification"), + ) + + def to_payload(self) -> dict[str, object]: + return { + "id": self.id, + "label": self.label, + "summary": self.summary, + "decision_order": self.decision_order, + "protocol_candidates": list(self.protocol_candidates), + "minimum_capture": list(self.minimum_capture), + "stop_conditions": list(self.stop_conditions), + "transfer_rules": 
list(self.transfer_rules), + "verification": list(self.verification), + } + + +@dataclass(frozen=True) +class ReadySpec: + id: str + label: str + summary: str + required_evidence: tuple[str, ...] + accepted_readiness_states: tuple[str, ...] = () + required_capabilities: tuple[str, ...] = () + commands: tuple[str, ...] = () + + def __post_init__(self) -> None: + object.__setattr__(self, "id", _clean_string(self.id, field_name="ReadySpec.id")) + object.__setattr__(self, "label", _clean_string(self.label, field_name="ReadySpec.label")) + object.__setattr__(self, "summary", _clean_string(self.summary, field_name="ReadySpec.summary")) + object.__setattr__( + self, + "required_evidence", + _clean_tuple(self.required_evidence, field_name="ReadySpec.required_evidence"), + ) + object.__setattr__( + self, + "accepted_readiness_states", + _clean_tuple( + self.accepted_readiness_states, + field_name="ReadySpec.accepted_readiness_states", + allow_empty=True, + ), + ) + object.__setattr__( + self, + "required_capabilities", + _clean_tuple( + self.required_capabilities, + field_name="ReadySpec.required_capabilities", + allow_empty=True, + ), + ) + object.__setattr__( + self, + "commands", + _clean_tuple(self.commands, field_name="ReadySpec.commands", allow_empty=True), + ) + + def to_payload(self) -> dict[str, object]: + return { + "id": self.id, + "label": self.label, + "summary": self.summary, + "required_evidence": list(self.required_evidence), + "accepted_readiness_states": list(self.accepted_readiness_states), + "required_capabilities": list(self.required_capabilities), + "commands": list(self.commands), + } + + +class DopRegistry: + def __init__(self, *, data_classes: Iterable[DataClassSpec], ready_specs: Iterable[ReadySpec]): + data_class_items = tuple(sorted(data_classes, key=lambda item: item.decision_order)) + ready_spec_items = tuple(ready_specs) + self._data_classes = data_class_items + self._ready_specs = ready_spec_items + self._data_classes_by_id = 
_index_by_id(data_class_items, kind="DOP data class") + self._ready_specs_by_id = _index_by_id(ready_spec_items, kind="DOP ready spec") + orders = [item.decision_order for item in data_class_items] + if len(set(orders)) != len(orders): + raise ValueError("DOP data class decision_order values must be unique.") + + def data_classes(self) -> tuple[DataClassSpec, ...]: + return self._data_classes + + def ready_specs(self) -> tuple[ReadySpec, ...]: + return self._ready_specs + + def data_class(self, data_class_id: str) -> DataClassSpec: + key = str(data_class_id).strip() + try: + return self._data_classes_by_id[key] + except KeyError: + options = ", ".join(sorted(self._data_classes_by_id)) or "—" + raise ValueError(f"Unknown DOP data class {data_class_id!r}. Available classes: {options}") from None + + def ready_spec(self, ready_spec_id: str) -> ReadySpec: + key = str(ready_spec_id).strip() + try: + return self._ready_specs_by_id[key] + except KeyError: + options = ", ".join(sorted(self._ready_specs_by_id)) or "—" + raise ValueError(f"Unknown DOP ready spec {ready_spec_id!r}. 
Available specs: {options}") from None + + def data_classes_for_protocol(self, protocol_id: str) -> tuple[DataClassSpec, ...]: + key = str(protocol_id).strip() + return tuple(item for item in self._data_classes if key in item.protocol_candidates) + + def validate_protocol_refs(self, protocol_ids: Iterable[str]) -> None: + known = {str(protocol_id).strip() for protocol_id in protocol_ids if str(protocol_id).strip()} + referenced = { + protocol_id for data_class in self._data_classes for protocol_id in data_class.protocol_candidates + } + missing = sorted(referenced - known) + if missing: + raise ValueError("DOP registry references unknown protocol ids: " + ", ".join(missing)) + + def validate_ready_refs(self, *, readiness_states: Iterable[str], capability_keys: Iterable[str]) -> None: + known_states = {str(state).strip() for state in readiness_states if str(state).strip()} + known_capabilities = {str(capability).strip() for capability in capability_keys if str(capability).strip()} + missing_states = sorted({ + state for spec in self._ready_specs for state in spec.accepted_readiness_states if state not in known_states + }) + missing_capabilities = sorted({ + capability + for spec in self._ready_specs + for capability in spec.required_capabilities + if capability not in known_capabilities + }) + errors = [] + if missing_states: + errors.append("unknown readiness states: " + ", ".join(missing_states)) + if missing_capabilities: + errors.append("unknown readiness capabilities: " + ", ".join(missing_capabilities)) + if errors: + raise ValueError("DOP ready specs reference " + "; ".join(errors)) + + +def _index_by_id(items: Iterable[DataClassSpec] | Iterable[ReadySpec], *, kind: str): + indexed = {} + for item in items: + if item.id in indexed: + raise ValueError(f"Duplicate {kind} id {item.id!r}.") + indexed[item.id] = item + return indexed diff --git a/src/reader/workbench/engine/planning.py b/src/reader/workbench/engine/planning.py index 122b24d..875f9c2 100644 ---
a/src/reader/workbench/engine/planning.py +++ b/src/reader/workbench/engine/planning.py @@ -35,7 +35,7 @@ def _cmd(base: str, tail: str = "") -> str: configured_template=(notebook_specs[0].template if notebook_specs else None) ) require_notebook_template_for_protocol(notebook_template, protocol=bound_protocol) - steps.append((_cmd("records"), "Review generated workbench records (QC)")) + steps.append((_cmd("records"), "Review generated records")) if plot_specs: steps.append((_cmd("plot"), "Save plot files to outputs/plots")) if export_specs: diff --git a/src/reader/workbench/experiment/model.py b/src/reader/workbench/experiment/model.py index 95d3bb6..c8472f4 100644 --- a/src/reader/workbench/experiment/model.py +++ b/src/reader/workbench/experiment/model.py @@ -278,4 +278,11 @@ class ExperimentSemantics: annotations: AnnotationSemantics resources: ResourceCatalog layout: OutputLayout - protocol_program: ProtocolSemanticProgram | None = None + protocol_program: ProtocolSemanticProgram + + def __post_init__(self) -> None: + if self.protocol_program.protocol != self.protocol.id: + raise ValueError( + "ExperimentSemantics.protocol_program must target the bound protocol " + f"{self.protocol.id!r}, got {self.protocol_program.protocol!r}." 
+ ) diff --git a/src/reader/workbench/inspection/__init__.py b/src/reader/workbench/inspection/__init__.py index 511f765..6b3ddac 100644 --- a/src/reader/workbench/inspection/__init__.py +++ b/src/reader/workbench/inspection/__init__.py @@ -1,3 +1,3 @@ -"""Inspection submodules for workbench discovery, reporting, and readiness.""" +"""Inspection helpers for experiment discovery, reporting, and readiness.""" __all__: list[str] = [] diff --git a/src/reader/workbench/inspection/dop.py b/src/reader/workbench/inspection/dop.py new file mode 100644 index 0000000..c4e0391 --- /dev/null +++ b/src/reader/workbench/inspection/dop.py @@ -0,0 +1,64 @@ +from __future__ import annotations + +from rich import box +from rich.table import Table + +from reader.workbench.dop import DOP_SCHEMA, DataClassSpec, ReadySpec + + +def data_classes_payload(data_classes: tuple[DataClassSpec, ...]) -> dict[str, object]: + return { + "schema": DOP_SCHEMA, + "data_classes": [item.to_payload() for item in data_classes], + } + + +def ready_specs_payload(ready_specs: tuple[ReadySpec, ...]) -> dict[str, object]: + return { + "schema": DOP_SCHEMA, + "ready_specs": [item.to_payload() for item in ready_specs], + } + + +def _table(title: str) -> Table: + return Table( + title=f"[title]{title}[/title]", + title_justify="left", + header_style="bold", + box=box.ROUNDED, + expand=True, + show_lines=False, + show_edge=True, + ) + + +def data_classes_table(data_classes: tuple[DataClassSpec, ...]) -> Table: + table = _table("DOP Data Classes") + table.add_column("ID", style="accent", overflow="fold") + table.add_column("Protocols", overflow="fold") + table.add_column("Minimum capture", overflow="fold") + table.add_column("Stop when", overflow="fold") + for item in data_classes: + table.add_row( + item.id, + ", ".join(item.protocol_candidates), + "; ".join(item.minimum_capture), + "; ".join(item.stop_conditions), + ) + return table + + +def ready_specs_table(ready_specs: tuple[ReadySpec, ...]) -> Table: + table 
= _table("DOP Ready Specs") + table.add_column("ID", style="accent", overflow="fold") + table.add_column("Evidence", overflow="fold") + table.add_column("Reader states", overflow="fold") + table.add_column("Capabilities", overflow="fold") + for item in ready_specs: + table.add_row( + item.id, + "; ".join(item.required_evidence), + ", ".join(item.accepted_readiness_states) or "—", + ", ".join(item.required_capabilities) or "—", + ) + return table diff --git a/src/reader/workbench/inspection/experiments.py b/src/reader/workbench/inspection/experiments.py index bb32e12..3ce5ae3 100644 --- a/src/reader/workbench/inspection/experiments.py +++ b/src/reader/workbench/inspection/experiments.py @@ -76,7 +76,7 @@ def experiment_surface_payload( "experiment": deepcopy(experiment), "authoring": deepcopy(authoring), "semantics": { - "program": semantic_program_payload(semantic_program) if semantic_program is not None else None, + "program": semantic_program_payload(semantic_program, include_execution=False), }, "implementation": deepcopy(implementation), } @@ -140,10 +140,21 @@ def experiment_explain_payload( export_steps = list(workbench.exports) notebook_steps = list(workbench.notebooks) record_producers = record_producer_map(workbench.plugin_steps(), runtime=runtime) + semantic_program = decl.experiment_semantics.protocol_program + compiled_payload = compiled_workbench_payload( + bound_protocol=bound_protocol, + pipeline_steps=pipeline_steps, + plot_steps=plot_steps, + export_steps=export_steps, + notebook_steps=notebook_steps, + runtime=runtime, + record_producers=record_producers, + ) + compiled_payload["semantic_program"] = semantic_program_payload(semantic_program) return experiment_surface_payload( experiment=experiment_payload, authoring=authoring_payload, - semantic_program=decl.experiment_semantics.protocol_program, + semantic_program=semantic_program, implementation=experiment_implementation_payload( plan=implementation_plan_payload( bound_protocol=bound_protocol, 
@@ -153,15 +164,7 @@ def experiment_explain_payload( export_steps=export_steps, notebook_steps=notebook_steps, ), - compiled=compiled_workbench_payload( - bound_protocol=bound_protocol, - pipeline_steps=pipeline_steps, - plot_steps=plot_steps, - export_steps=export_steps, - notebook_steps=notebook_steps, - runtime=runtime, - record_producers=record_producers, - ), + compiled=compiled_payload, ), ) @@ -226,21 +229,23 @@ def experiment_steps_payload( notebook_steps=[], ) plan_payload["pipeline_count"] = len(pipeline_steps) + semantic_program = decl.experiment_semantics.protocol_program + compiled_payload = { + "pipeline": [ + pipeline_step_payload(step, runtime=runtime, record_producers=record_producers) for step in pipeline_steps + ], + "plots": [], + "exports": [], + "notebooks": [], + } + compiled_payload["semantic_program"] = semantic_program_payload(semantic_program) return experiment_surface_payload( experiment=experiment_payload, authoring=authoring_payload, - semantic_program=decl.experiment_semantics.protocol_program, + semantic_program=semantic_program, implementation=experiment_implementation_payload( plan=plan_payload, - compiled={ - "pipeline": [ - pipeline_step_payload(step, runtime=runtime, record_producers=record_producers) - for step in pipeline_steps - ], - "plots": [], - "exports": [], - "notebooks": [], - }, + compiled=compiled_payload, ), ) @@ -264,10 +269,21 @@ def experiment_config_json_payload( export_steps = list(workbench.exports) notebook_steps = list(workbench.notebooks) record_producers = record_producer_map(workbench.plugin_steps(), runtime=runtime) + semantic_program = decl.experiment_semantics.protocol_program + compiled_payload = compiled_workbench_payload( + bound_protocol=bound_protocol, + pipeline_steps=pipeline_steps, + plot_steps=plot_steps, + export_steps=export_steps, + notebook_steps=notebook_steps, + runtime=runtime, + record_producers=record_producers, + ) + compiled_payload["semantic_program"] = 
semantic_program_payload(semantic_program) return experiment_surface_payload( experiment=experiment_payload, authoring=experiment_config_authoring_payload(document=spec.model_dump(by_alias=True)), - semantic_program=decl.experiment_semantics.protocol_program, + semantic_program=semantic_program, implementation=experiment_implementation_payload( plan=implementation_plan_payload( bound_protocol=bound_protocol, @@ -277,15 +293,7 @@ def experiment_config_json_payload( export_steps=export_steps, notebook_steps=notebook_steps, ), - compiled=compiled_workbench_payload( - bound_protocol=bound_protocol, - pipeline_steps=pipeline_steps, - plot_steps=plot_steps, - export_steps=export_steps, - notebook_steps=notebook_steps, - runtime=runtime, - record_producers=record_producers, - ), + compiled=compiled_payload, ), ) @@ -337,10 +345,21 @@ def experiment_inspect_payload( plot_steps = list(workbench.plots) export_steps = list(workbench.exports) notebook_steps = list(workbench.notebooks) + semantic_program = decl.experiment_semantics.protocol_program + compiled_payload = compiled_workbench_payload( + bound_protocol=bound_protocol, + pipeline_steps=pipeline_steps, + plot_steps=plot_steps, + export_steps=export_steps, + notebook_steps=notebook_steps, + runtime=runtime, + record_producers=record_producers, + ) + compiled_payload["semantic_program"] = semantic_program_payload(semantic_program) return experiment_surface_payload( experiment=experiment_payload, authoring=authoring_payload, - semantic_program=decl.experiment_semantics.protocol_program, + semantic_program=semantic_program, implementation=experiment_implementation_payload( plan=implementation_plan_payload( bound_protocol=bound_protocol, @@ -350,15 +369,7 @@ def experiment_inspect_payload( export_steps=export_steps, notebook_steps=notebook_steps, ), - compiled=compiled_workbench_payload( - bound_protocol=bound_protocol, - pipeline_steps=pipeline_steps, - plot_steps=plot_steps, - export_steps=export_steps, - 
notebook_steps=notebook_steps, - runtime=runtime, - record_producers=record_producers, - ), + compiled=compiled_payload, inputs={ "counts": { "files": input_file_count, diff --git a/src/reader/workbench/inspection/protocols.py b/src/reader/workbench/inspection/protocols.py index b72b9c1..aafc125 100644 --- a/src/reader/workbench/inspection/protocols.py +++ b/src/reader/workbench/inspection/protocols.py @@ -275,8 +275,18 @@ def protocol_runtime_defaults_payload(plugin_defaults) -> list[dict[str, object] def protocol_descriptor_payload(descriptor, *, runtime) -> dict[str, object]: bound_protocol = runtime.bind_protocol(ProtocolBinding(id=descriptor.protocol)) compiled_plan = bound_protocol.compile() - semantic_program = compiled_plan.semantic_program or descriptor.semantic_program() + semantic_program = compiled_plan.semantic_program record_producers = record_producer_map(compiled_plan.pipeline, runtime=runtime) + compiled_payload = compiled_workbench_payload( + bound_protocol=bound_protocol, + pipeline_steps=compiled_plan.pipeline, + plot_steps=compiled_plan.plots, + export_steps=compiled_plan.exports, + notebook_steps=compiled_plan.notebooks, + runtime=runtime, + record_producers=record_producers, + ) + compiled_payload["semantic_program"] = semantic_program_payload(semantic_program) return { "protocol": descriptor.protocol, "domain": descriptor.domain, @@ -308,18 +318,10 @@ def protocol_descriptor_payload(descriptor, *, runtime) -> dict[str, object]: } for item in descriptor.effect_signs ], - "program": semantic_program_payload(semantic_program), + "program": semantic_program_payload(semantic_program, include_execution=False), }, "implementation": { "defaults": protocol_runtime_defaults_payload(descriptor.execution.plugin_defaults), - "compiled": compiled_workbench_payload( - bound_protocol=bound_protocol, - pipeline_steps=compiled_plan.pipeline, - plot_steps=compiled_plan.plots, - export_steps=compiled_plan.exports, - notebook_steps=compiled_plan.notebooks, - 
runtime=runtime, - record_producers=record_producers, - ), + "compiled": compiled_payload, }, } diff --git a/src/reader/workbench/inspection/readiness.py b/src/reader/workbench/inspection/readiness.py index ca750b0..b6461bd 100644 --- a/src/reader/workbench/inspection/readiness.py +++ b/src/reader/workbench/inspection/readiness.py @@ -12,6 +12,29 @@ from .common import summarize_outputs_dir +READINESS_STATES = frozenset( + { + "config_error", + "draft", + "template", + "dependency_blocked", + "blocked", + "runnable", + "legacy_outputs_present", + "records_ready", + } +) +READINESS_CAPABILITY_KEYS = frozenset( + { + "run", + "records", + "plot", + "export", + "notebook_scaffold", + "notebook_scan_records", + } +) + def config_error_readiness_payload(error: str) -> dict[str, object]: return { @@ -43,7 +66,7 @@ def config_error_readiness_payload(error: str) -> dict[str, object]: def _non_active_lifecycle_payload(*, lifecycle: str, job_path: Path) -> dict[str, object]: next_step = { "command": reader_command("validate", job_path, "--no-files", "--format", "json"), - "description": "Inspect config shape without treating this non-active experiment as run-ready.", + "description": "Check the config shape without treating this non-active experiment as ready to run.", } summary_by_lifecycle = { "draft": "draft experiment; add inputs and switch lifecycle to active when ready", @@ -126,7 +149,7 @@ def experiment_readiness_payload( exports_subdir=layout.exports_subdir, notebooks_subdir=layout.notebooks_subdir, ) - legacy_outputs_present = any(generated.values()) and not records_catalog + legacy_outputs_present = any(generated[key] for key in ("records", "plots", "exports")) and not records_catalog workbench = resolve_workbench(decl) can_run = summary["status"] == "ok" file_issues = int(summary["files"].get("issues") or 0) @@ -137,7 +160,7 @@ def experiment_readiness_payload( next_steps = [ { "command": reader_command("validate", job_path, "--format", "json"), - "description": 
"Inspect missing runtime dependencies before running this assay.", + "description": "Check missing runtime dependencies before running this assay.", } ] elif not can_run: @@ -146,7 +169,7 @@ def experiment_readiness_payload( next_steps = [ { "command": reader_command("validate", job_path, "--format", "json"), - "description": "Inspect blocking file or dependency issues.", + "description": "Check blocking file or dependency issues.", } ] elif records_catalog: @@ -158,11 +181,11 @@ def experiment_readiness_payload( ] elif legacy_outputs_present: state = "legacy_outputs_present" - summary_text = "legacy outputs present without current record catalog" + summary_text = "old outputs present but no current records catalog" next_steps = [ { "command": reader_command("run", job_path), - "description": "Regenerate the current record catalog and selected outputs from source inputs.", + "description": "Rerun from source inputs to rebuild records and selected outputs.", } ] else: diff --git a/src/reader/workbench/inspection/reports.py b/src/reader/workbench/inspection/reports.py index 3734cc4..fa07c4e 100644 --- a/src/reader/workbench/inspection/reports.py +++ b/src/reader/workbench/inspection/reports.py @@ -13,7 +13,11 @@ from .semantics import semantic_program_table -def experiment_inspect_renderables(*, payload: dict[str, object], semantic_program) -> list[Panel]: +def experiment_inspect_renderables( + *, + payload: dict[str, object], + semantic_program, +) -> list[Panel]: experiment = dict(payload.get("experiment") or {}) authoring = dict(payload.get("authoring") or {}) implementation = dict(payload.get("implementation") or {}) @@ -90,7 +94,7 @@ def experiment_inspect_renderables(*, payload: dict[str, object], semantic_progr ) renderables.append(Panel(readiness_table, border_style="accent", box=box.ROUNDED)) - authoring_table = _table("Authoring bindings") + authoring_table = _table("Config values") authoring_table.add_column("section", style="accent", width=10) 
authoring_table.add_column("path", overflow="fold")
     authoring_table.add_column("value", overflow="fold")
@@ -101,8 +105,23 @@ def experiment_inspect_renderables(*, payload: dict[str, object], semantic_progr
         authoring_table.add_row("—", "—", "No explicit bindings; protocol defaults only.")
     renderables.append(Panel(authoring_table, border_style="accent", box=box.ROUNDED))
 
-    if semantic_program is not None:
-        renderables.append(Panel(semantic_program_table(semantic_program), border_style="accent", box=box.ROUNDED))
+    renderables.append(
+        Panel(
+            semantic_program_table(semantic_program, include_execution=False),
+            border_style="accent",
+            box=box.ROUNDED,
+        )
+    )
+    renderables.append(
+        Panel(
+            semantic_program_table(
+                semantic_program,
+                title="Compiled Semantic Execution",
+            ),
+            border_style="accent",
+            box=box.ROUNDED,
+        )
+    )
 
     filesystem = _table("Inputs + resources")
     filesystem.add_column("kind", style="accent", width=10)
@@ -141,7 +160,7 @@ def experiment_inspect_renderables(*, payload: dict[str, object], semantic_progr
             )
     renderables.append(Panel(generated_table, border_style="accent", box=box.ROUNDED))
 
-    records_table = _table("Record catalog")
+    records_table = _table("Records")
     records_table.add_column("record", style="accent", overflow="fold")
     records_table.add_column("kind", width=18)
     records_table.add_column("producer", overflow="fold")
@@ -155,7 +174,7 @@ def experiment_inspect_renderables(*, payload: dict[str, object], semantic_progr
                 str(record.get("detail") or "—"),
             )
     else:
-        records_table.add_row("—", "—", "—", "No records catalog found under outputs/manifests/records.json.")
+        records_table.add_row("—", "—", "—", "No records found under outputs/manifests/records.json.")
     renderables.append(Panel(records_table, border_style="accent", box=box.ROUNDED))
 
     pipeline_rows = [dict(item) for item in (compiled.get("pipeline") or []) if isinstance(item, dict)]
@@ -197,7 +216,7 @@ def experiment_inspect_renderables(*, payload: dict[str, object], semantic_progr
     )
 
     renderables.append(_surface_specs_panel(title="Plot outputs", rows=(compiled.get("plots") or [])))
-    renderables.append(_surface_specs_panel(title="Export artifacts", rows=(compiled.get("exports") or [])))
+    renderables.append(_surface_specs_panel(title="Exports", rows=(compiled.get("exports") or [])))
 
     notebooks = [dict(item) for item in (compiled.get("notebooks") or []) if isinstance(item, dict)]
     notebook_table = _table("Notebooks")
@@ -206,7 +225,7 @@ def experiment_inspect_renderables(*, payload: dict[str, object], semantic_progr
     notebook_table.add_column("status")
     if notebooks:
         for idx, notebook in enumerate(notebooks, 1):
-            notebook_table.add_row(str(idx), str(notebook.get("template") or "—"), "selected")
+            notebook_table.add_row(str(idx), str(notebook.get("template") or "—"), "configured")
     else:
         notebook_table.add_row("—", "—", "No notebook template selected.")
     renderables.append(Panel(notebook_table, border_style="accent", box=box.ROUNDED))
@@ -238,16 +257,23 @@ def workflow_explain_renderables(
     resources = tuple(decl.experiment_semantics.resources.by_id.keys())
     if resources:
         summary.add_row("Resources", ", ".join(resources))
-    renderables.append(Panel(summary, border_style="cyan", box=box.ROUNDED, title="Protocol plan"))
+    renderables.append(Panel(summary, border_style="cyan", box=box.ROUNDED, title="Plan summary"))
 
-    if decl.experiment_semantics.protocol_program is not None:
-        renderables.append(
-            Panel(
-                semantic_program_table(decl.experiment_semantics.protocol_program),
-                border_style="cyan",
-                box=box.ROUNDED,
-            )
+    semantic_program = decl.experiment_semantics.protocol_program
+    renderables.append(
+        Panel(
+            semantic_program_table(semantic_program, include_execution=False),
+            border_style="cyan",
+            box=box.ROUNDED,
         )
+    )
+    renderables.append(
+        Panel(
+            semantic_program_table(semantic_program, title="Compiled Semantic Execution"),
+            border_style="cyan",
+            box=box.ROUNDED,
+        )
+    )
 
     if pipeline_steps:
         renderables.append(
@@ -364,7 +390,7 @@ def _surface_specs_panel(*, title: str, rows) -> Panel:
             from_refs = ", ".join(render_read_binding(read) for read in (item.get("reads") or [])) or "—"
             table.add_row(str(idx), str(item.get("id") or "—"), str(item.get("summary") or "—"), from_refs)
     else:
-        empty = "No plot outputs selected." if title == "Plot outputs" else "No export artifacts selected."
+        empty = "No plot outputs selected." if title == "Plot outputs" else "No exports selected."
         table.add_row("—", "—", empty, "—")
     return Panel(table, border_style="accent", box=box.ROUNDED)
 
diff --git a/src/reader/workbench/inspection/semantics.py b/src/reader/workbench/inspection/semantics.py
index 09bc455..7ff8e89 100644
--- a/src/reader/workbench/inspection/semantics.py
+++ b/src/reader/workbench/inspection/semantics.py
@@ -16,21 +16,22 @@ def _table(title: str) -> Table:
     )
 
 
-def semantic_node_payload(node) -> dict[str, object]:
+def semantic_node_payload(node, *, include_execution: bool = True) -> dict[str, object]:
     payload = {
         "id": node.id,
         "kind": node.kind,
         "summary": node.summary,
         "profiles": list(node.profiles),
-        "execution": {
+    }
+    if include_execution:
+        payload["execution"] = {
             "status": node.execution.status,
             "step_ids": list(node.execution.step_ids),
             "plugin_ids": list(node.execution.plugin_ids),
             "record_ids": list(node.execution.record_ids),
             "config_paths": list(node.execution.config_paths),
             "note": node.execution.note,
-        },
-    }
+        }
     if node.kind == "control_rule":
         payload["match_on"] = list(node.match_on)
         payload["control_selector"] = node.control_selector
@@ -53,7 +54,33 @@ def semantic_node_payload(node) -> dict[str, object]:
     return payload
 
 
-def semantic_program_summary(program) -> dict[str, object]:
+def semantic_program_structure_summary(program) -> dict[str, object]:
+    summary = {
+        "total": 0,
+        "by_kind": {
+            "control_rule": 0,
+            "window": 0,
+            "metric": 0,
+            "ranking": 0,
+        },
+    }
+
+    def _record(node) -> None:
+        summary["total"] += 1
+        summary["by_kind"][str(node.kind)] += 1
+
+    for node in program.controls:
+        _record(node)
+    for node in program.windows:
+        _record(node)
+    for node in program.metrics:
+        _record(node)
+    if program.ranking is not None:
+        _record(program.ranking)
+    return summary
+
+
+def semantic_program_execution_summary(program) -> dict[str, object]:
     summary = {
         "total": 0,
         "compiled": 0,
@@ -87,7 +114,7 @@ def _record(node) -> None:
     return summary
 
 
-def semantic_program_payload(program) -> dict[str, object]:
+def semantic_program_payload(program, *, include_execution: bool = True) -> dict[str, object]:
     return {
         "protocol": program.protocol,
         "profiles": [
@@ -102,34 +129,57 @@ def semantic_program_payload(program) -> dict[str, object]:
             for profile in program.profiles
         ],
         "active_profile": program.active_profile,
-        "summary": semantic_program_summary(program),
-        "controls": [semantic_node_payload(node) for node in program.controls],
-        "windows": [semantic_node_payload(node) for node in program.windows],
-        "metrics": [semantic_node_payload(node) for node in program.metrics],
-        "ranking": semantic_node_payload(program.ranking) if program.ranking is not None else None,
+        "summary": (
+            semantic_program_execution_summary(program)
+            if include_execution
+            else semantic_program_structure_summary(program)
+        ),
+        "controls": [semantic_node_payload(node, include_execution=include_execution) for node in program.controls],
+        "windows": [semantic_node_payload(node, include_execution=include_execution) for node in program.windows],
+        "metrics": [semantic_node_payload(node, include_execution=include_execution) for node in program.metrics],
+        "ranking": (
+            semantic_node_payload(program.ranking, include_execution=include_execution)
+            if program.ranking is not None
+            else None
+        ),
     }
 
 
-def semantic_program_table(program) -> Table:
-    coverage = semantic_program_summary(program)
+def semantic_program_table(
+    program,
+    *,
+    title: str = "Semantic Program",
+    include_execution: bool = True,
+) -> Table:
     profile_text = f" • profile: {program.active_profile}" if program.active_profile else ""
-    table = _table(
-        "Semantic Program"
-        f"{profile_text}"
-        f" • {coverage['compiled']}/{coverage['total']} compiled"
-        f" • {coverage['descriptive_only']} descriptive"
-    )
-    table.add_column("kind", style="accent", width=13)
-    table.add_column("id", style="accent", overflow="fold")
-    table.add_column("status", width=18)
-    table.add_column("compiled via", overflow="fold")
-    table.add_column("summary", overflow="fold")
-
-    def _add_node(kind: str, node) -> None:
-        compiled_via = ", ".join(node.execution.step_ids) or "—"
-        note = node.execution.note
-        summary = node.summary if not note else f"{node.summary} ({note})"
-        table.add_row(kind, node.id, node.execution.status, compiled_via, summary)
+    if include_execution:
+        coverage = semantic_program_execution_summary(program)
+        table = _table(
+            f"{title}{profile_text} • {coverage['compiled']}/{coverage['total']} compiled"
+            f" • {coverage['descriptive_only']} descriptive"
+        )
+        table.add_column("kind", style="accent", width=13)
+        table.add_column("id", style="accent", overflow="fold")
+        table.add_column("status", width=18)
+        table.add_column("compiled via", overflow="fold")
+        table.add_column("summary", overflow="fold")
+
+        def _add_node(kind: str, node) -> None:
+            compiled_via = ", ".join(node.execution.step_ids) or "—"
+            note = node.execution.note
+            summary = node.summary if not note else f"{node.summary} ({note})"
+            table.add_row(kind, node.id, node.execution.status, compiled_via, summary)
+    else:
+        coverage = semantic_program_structure_summary(program)
+        table = _table(f"{title}{profile_text} • {coverage['total']} node(s)")
+        table.add_column("kind", style="accent", width=13)
+        table.add_column("id", style="accent", overflow="fold")
+        table.add_column("profiles", overflow="fold")
+        table.add_column("summary", overflow="fold")
+
+        def _add_node(kind: str, node) -> None:
+            profiles = ", ".join(node.profiles) or "all"
+            table.add_row(kind, node.id, profiles, node.summary)
 
     for node in program.controls:
         _add_node("control_rule", node)
diff --git a/src/reader/workbench/notebooks/_launch_registry.py b/src/reader/workbench/notebooks/_launch_registry.py
new file mode 100644
index 0000000..07a016c
--- /dev/null
+++ b/src/reader/workbench/notebooks/_launch_registry.py
@@ -0,0 +1,112 @@
+from __future__ import annotations
+
+import json
+from collections.abc import Callable
+from dataclasses import asdict, dataclass
+from pathlib import Path
+
+
+@dataclass(frozen=True)
+class MarimoSessionRecord:
+    pid: int
+    port: int
+    host: str
+    mode: str
+    notebook: str
+    experiment_root: str
+    repo_root: str
+    launched_at: float
+    notebook_mtime_ns: int | None = None
+    notebook_size_bytes: int | None = None
+    runtime_fingerprint: str | None = None
+
+    @classmethod
+    def from_dict(cls, payload: dict[str, object]) -> MarimoSessionRecord | None:
+        try:
+            return cls(
+                pid=int(payload["pid"]),
+                port=int(payload["port"]),
+                host=str(payload["host"]),
+                mode=str(payload["mode"]),
+                notebook=str(payload["notebook"]),
+                experiment_root=str(payload["experiment_root"]),
+                repo_root=str(payload["repo_root"]),
+                launched_at=float(payload["launched_at"]),
+                notebook_mtime_ns=(
+                    int(payload["notebook_mtime_ns"]) if payload.get("notebook_mtime_ns") is not None else None
+                ),
+                notebook_size_bytes=(
+                    int(payload["notebook_size_bytes"]) if payload.get("notebook_size_bytes") is not None else None
+                ),
+                runtime_fingerprint=(
+                    str(payload["runtime_fingerprint"]) if payload.get("runtime_fingerprint") is not None else None
+                ),
+            )
+        except (KeyError, TypeError, ValueError):
+            return None
+
+
+def load_registry(registry_path: Path) -> list[MarimoSessionRecord]:
+    if not registry_path.exists():
+        return []
+    try:
+        payload = json.loads(registry_path.read_text(encoding="utf-8"))
+    except (OSError, json.JSONDecodeError):
+        return []
+    if not isinstance(payload, list):
+        return []
+    records: list[MarimoSessionRecord] = []
+    for item in payload:
+        if isinstance(item, dict):
+            record = MarimoSessionRecord.from_dict(item)
+            if record is not None:
+                records.append(record)
+    return records
+
+
+def write_registry(registry_path: Path, records: list[MarimoSessionRecord]) -> None:
+    registry_path.parent.mkdir(parents=True, exist_ok=True)
+    payload = [asdict(record) for record in records]
+    registry_path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
+
+
+def prune_registry(
+    records: list[MarimoSessionRecord],
+    *,
+    pid_is_live: Callable[[int], bool],
+) -> list[MarimoSessionRecord]:
+    pruned: list[MarimoSessionRecord] = []
+    for record in records:
+        notebook_path = Path(record.notebook)
+        if not notebook_path.exists():
+            continue
+        if not pid_is_live(record.pid):
+            continue
+        pruned.append(record)
+    return pruned
+
+
+def session_matches_current_inputs(
+    record: MarimoSessionRecord,
+    *,
+    mode: str,
+    resolved_target: Path,
+    experiment_root: Path,
+    runtime_fingerprint: str,
+    notebook_mtime_ns: int,
+    notebook_size_bytes: int,
+    port_is_open: Callable[[str, int], bool],
+) -> bool:
+    if record.mode != mode:
+        return False
+    if record.notebook != str(resolved_target):
+        return False
+    if record.experiment_root != str(experiment_root):
+        return False
+    if record.notebook_mtime_ns != notebook_mtime_ns:
+        return False
+    if record.notebook_size_bytes != notebook_size_bytes:
+        return False
+    if record.runtime_fingerprint != runtime_fingerprint:
+        return False
+    return port_is_open(record.host, record.port)
diff --git a/src/reader/workbench/notebooks/_launch_runtime.py b/src/reader/workbench/notebooks/_launch_runtime.py
new file mode 100644
index 0000000..639abae
--- /dev/null
+++ b/src/reader/workbench/notebooks/_launch_runtime.py
@@ -0,0 +1,97 @@
+from __future__ import annotations
+
+import hashlib
+import os
+from dataclasses import dataclass
+from pathlib import Path
+
+from reader.errors import ConfigError
+
+RUNTIME_FINGERPRINT_SUFFIXES = (".py", ".txt")
+
+
+@dataclass(frozen=True)
+class MarimoRuntimePaths:
+    root: Path
+    registry_path: Path
+    xdg_config_home: Path
+    xdg_state_home: Path
+    xdg_cache_home: Path
+    mplconfigdir: Path
+
+
+def find_repo_root(start: Path) -> Path:
+    for base in [start.resolve()] + list(start.resolve().parents):
+        if (base / "pyproject.toml").exists():
+            return base
+    raise ConfigError(f"Could not find repository root from {start}")
+
+
+def find_experiment_root(start: Path) -> Path:
+    for base in [start.resolve()] + list(start.resolve().parents):
+        if (base / "config.yaml").exists():
+            return base
+    return start.resolve().parent
+
+
+def runtime_paths_for_target(target: Path) -> MarimoRuntimePaths:
+    repo_root = find_repo_root(target)
+    root = repo_root / ".cache" / "marimo"
+    xdg_config_home = root / "xdg-config"
+    xdg_state_home = root / "xdg-state"
+    xdg_cache_home = root / "xdg-cache"
+    mplconfigdir = root / "matplotlib"
+    for path in (root, xdg_config_home, xdg_state_home, xdg_cache_home, mplconfigdir):
+        path.mkdir(parents=True, exist_ok=True)
+    return MarimoRuntimePaths(
+        root=root,
+        registry_path=root / "sessions.json",
+        xdg_config_home=xdg_config_home,
+        xdg_state_home=xdg_state_home,
+        xdg_cache_home=xdg_cache_home,
+        mplconfigdir=mplconfigdir,
+    )
+
+
+def target_signature(target: Path) -> tuple[int, int]:
+    stat = target.resolve().stat()
+    return stat.st_mtime_ns, stat.st_size
+
+
+def runtime_fingerprint(repo_root: Path) -> str:
+    resolved_root = repo_root.resolve()
+    hasher = hashlib.sha256()
+    candidates: list[Path] = []
+    pyproject = resolved_root / "pyproject.toml"
+    if pyproject.exists():
+        candidates.append(pyproject)
+    source_root = resolved_root / "src" / "reader"
+    if source_root.exists():
+        for suffix in RUNTIME_FINGERPRINT_SUFFIXES:
+            candidates.extend(path for path in source_root.rglob(f"*{suffix}") if path.is_file())
+    for path in sorted({item.resolve() for item in candidates}):
+        stat = path.stat()
+        relative = path.relative_to(resolved_root)
+        hasher.update(f"{relative}:{stat.st_mtime_ns}:{stat.st_size}\n".encode())
+    return hasher.hexdigest()
+
+
+def build_env(
+    repo_root: Path,
+    *,
+    runtime_paths: MarimoRuntimePaths,
+    base_env: dict[str, str] | None = None,
+) -> dict[str, str]:
+    env = dict(base_env or os.environ)
+    pythonpath_parts = [str(repo_root)]
+    existing_pythonpath = env.get("PYTHONPATH", "").strip()
+    if existing_pythonpath:
+        pythonpath_parts.append(existing_pythonpath)
+    env["PYTHONPATH"] = os.pathsep.join(pythonpath_parts)
+    env["READER_MARIMO_RUNTIME_PATCH"] = "1"
+    env["XDG_CONFIG_HOME"] = str(runtime_paths.xdg_config_home)
+    env["XDG_STATE_HOME"] = str(runtime_paths.xdg_state_home)
+    env["XDG_CACHE_HOME"] = str(runtime_paths.xdg_cache_home)
+    env["READER_MPLCONFIGDIR"] = str(runtime_paths.mplconfigdir)
+    env["MPLCONFIGDIR"] = str(runtime_paths.mplconfigdir)
+    return env
diff --git a/src/reader/workbench/notebooks/dual_reporter_triptych.py b/src/reader/workbench/notebooks/dual_reporter_triptych.py
new file mode 100644
index 0000000..11982f4
--- /dev/null
+++ b/src/reader/workbench/notebooks/dual_reporter_triptych.py
@@ -0,0 +1,459 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+from typing import Any
+
+import pandas as pd
+
+DEFAULT_TRIPTYCH_PANEL_SIZE = 260
+DEFAULT_TRIPTYCH_SPACING = 16
+
+
+@dataclass(frozen=True)
+class DualReporterTriptychData:
+    od600_time: pd.DataFrame
+    ratio_time: pd.DataFrame
+    snapshot_stats: pd.DataFrame
+    snapshot_points: pd.DataFrame
+    treatment_order: tuple[str, ...]
+    missing_treatments: tuple[str, ...]
+    snapshot_time: float
+    growth_channel: str
+    ratio_channel: str
+    snapshot_channel: str
+
+
+def build_triptych_data(
+    df: pd.DataFrame,
+    *,
+    time_col: str,
+    treatment_col: str,
+    growth_channel: str = "OD600",
+    ratio_channel: str = "YFP/CFP",
+    snapshot_channel: str | None = None,
+    snapshot_time: float,
+    treatment_order: list[str] | tuple[str, ...] | None = None,
+    time_atol: float = 1e-9,
+) -> DualReporterTriptychData:
+    """Prepare the neutral dual-reporter triptych data contract.
+
+    This is intentionally independent of SFXI semantics. It only requires a
+    tidy dual-reporter dataframe with channels for growth, ratio kinetics, and
+    a snapshot channel.
+    """
+    snapshot_channel = snapshot_channel or ratio_channel
+    required_channels = [growth_channel, ratio_channel, snapshot_channel]
+    _require_columns(df, ["channel", "value", time_col, treatment_col], where="dual_reporter_triptych")
+
+    work = df.copy()
+    work[time_col] = pd.to_numeric(work[time_col], errors="coerce")
+    work["value"] = pd.to_numeric(work["value"], errors="coerce")
+    work = work.dropna(subset=[time_col, "value", treatment_col, "channel"])
+    if work.empty:
+        raise ValueError("dual_reporter_triptych: no usable rows after numeric/time cleanup")
+    work[treatment_col] = work[treatment_col].astype(str)
+    work["channel"] = work["channel"].astype(str)
+
+    order = _resolve_treatment_order(work, treatment_col=treatment_col, treatment_order=treatment_order)
+    if treatment_order is not None:
+        work = work[work[treatment_col].isin(order)].copy()
+        if work.empty:
+            raise ValueError("dual_reporter_triptych: no rows match the requested treatment order")
+    _require_channels(work, required_channels)
+    observed = set(work[treatment_col].dropna().astype(str).unique().tolist())
+    missing_treatments = tuple(value for value in order if value not in observed)
+
+    od600_time = _summarize_time(
+        work, channel=growth_channel, time_col=time_col, treatment_col=treatment_col, order=order
+    )
+    ratio_time = _summarize_time(
+        work, channel=ratio_channel, time_col=time_col, treatment_col=treatment_col, order=order
+    )
+    snapshot_stats, snapshot_points = _summarize_snapshot(
+        work,
+        channel=snapshot_channel,
+        time_col=time_col,
+        treatment_col=treatment_col,
+        snapshot_time=float(snapshot_time),
+        order=order,
+        time_atol=float(time_atol),
+    )
+    return DualReporterTriptychData(
+        od600_time=od600_time,
+        ratio_time=ratio_time,
+        snapshot_stats=snapshot_stats,
+        snapshot_points=snapshot_points,
+        treatment_order=tuple(order),
+        missing_treatments=missing_treatments,
+        snapshot_time=float(snapshot_time),
+        growth_channel=str(growth_channel),
+        ratio_channel=str(ratio_channel),
+        snapshot_channel=str(snapshot_channel),
+    )
+
+
+def build_dual_reporter_triptych_chart(
+    *,
+    alt: Any,
+    pd_module: Any,
+    data: DualReporterTriptychData,
+    time_col: str,
+    treatment_col: str,
+    induction_time_h: float | None = None,
+    width: int = DEFAULT_TRIPTYCH_PANEL_SIZE,
+    height: int | None = None,
+    spacing: int = DEFAULT_TRIPTYCH_SPACING,
+) -> Any:
+    if data.od600_time.empty:
+        raise ValueError("dual_reporter_triptych: no OD600 time-series data available")
+    if data.ratio_time.empty:
+        raise ValueError("dual_reporter_triptych: no ratio time-series data available")
+    if data.snapshot_stats.empty:
+        raise ValueError("dual_reporter_triptych: no snapshot data available")
+
+    order = list(data.treatment_order)
+    panel_height = int(height if height is not None else width)
+    color = alt.Color(
+        f"{treatment_col}:N",
+        sort=order,
+        scale=alt.Scale(domain=order),
+        legend=alt.Legend(orient="bottom", title="Treatment"),
+    )
+
+    od600_chart = _time_chart(
+        alt=alt,
+        pd_module=pd_module,
+        frame=data.od600_time,
+        time_col=time_col,
+        treatment_col=treatment_col,
+        y_title=data.growth_channel,
+        color=color,
+        order=order,
+        snapshot_time=data.snapshot_time,
+        induction_time_h=induction_time_h,
+        width=width,
+        height=panel_height,
+    )
+    ratio_chart = _time_chart(
+        alt=alt,
+        pd_module=pd_module,
+        frame=data.ratio_time,
+        time_col=time_col,
+        treatment_col=treatment_col,
+        y_title=data.ratio_channel,
+        color=color,
+        order=order,
+        snapshot_time=data.snapshot_time,
+        induction_time_h=induction_time_h,
+        width=width,
+        height=panel_height,
+    )
+    snapshot_chart = _snapshot_chart(
+        alt=alt,
+        frame=data.snapshot_stats,
+        points=data.snapshot_points,
+        treatment_col=treatment_col,
+        y_title=f"{data.snapshot_channel} snapshot",
+        order=order,
+        width=width,
+        height=panel_height,
+    )
+    return (
+        alt.hconcat(od600_chart, ratio_chart, snapshot_chart, spacing=spacing)
+        .resolve_scale(color="shared")
+        .configure(background="white")
+        .configure_view(fill="white")
+        .configure_axis(
+            domain=True,
+            domainColor="black",
+            domainWidth=1,
+            tickColor="black",
+            labelColor="black",
+            titleColor="black",
+            labelFontSize=12,
+            titleFontSize=13,
+        )
+        .configure_legend(labelColor="black", titleColor="black", labelFontSize=12, titleFontSize=12)
+        .configure_title(color="black", fontSize=14)
+        .configure_text(color="black", fontSize=12)
+    )
+
+
+def summarize_design_context(
+    df: pd.DataFrame,
+    *,
+    primary_col: str,
+    primary_value: object,
+    preferred_columns: tuple[str, ...] = ("design_id_alias", "design_id", "id", "sequence", "strain", "medium"),
+    max_value_chars: int = 84,
+) -> list[tuple[str, str]]:
+    """Return compact identity rows for a selected design/genotype."""
+
+    rows = [(str(primary_col), _compact_context_value(primary_value, max_chars=max_value_chars))]
+    seen = {str(primary_col)}
+    for column in preferred_columns:
+        column = str(column)
+        if column in seen or column not in df.columns:
+            continue
+        values = _unique_context_values(df[column])
+        if not values:
+            continue
+        if len(values) == 1:
+            display = _compact_context_value(values[0], max_chars=max_value_chars)
+        else:
+            display = f"{len(values)} values: {_compact_context_value(values[0], max_chars=max_value_chars)}"
+        rows.append((column, display))
+        seen.add(column)
+    return rows
+
+
+def choose_time(times: list[float] | tuple[float, ...], target: float | None, mode: str) -> float | None:
+    if not times:
+        return None
+    time_list = sorted(float(t) for t in times)
+    if target is None:
+        return time_list[-1]
+    target = float(target)
+    if mode == "exact":
+        for time_value in time_list:
+            if abs(time_value - target) <= 1e-12:
+                return time_value
+        return None
+    if mode == "nearest":
+        return min(time_list, key=lambda time_value: abs(time_value - target))
+    if mode == "last_before":
+        candidates = [time_value for time_value in time_list if time_value <= target]
+        return max(candidates) if candidates else None
+    if mode == "first_after":
+        candidates = [time_value for time_value in time_list if time_value >= target]
+        return min(candidates) if candidates else None
+    raise ValueError("dual_reporter_triptych: time mode must be nearest, last_before, first_after, or exact")
+
+
+def _unique_context_values(series: pd.Series) -> list[str]:
+    values: list[str] = []
+    seen: set[str] = set()
+    for value in series.dropna().tolist():
+        text = str(value).strip()
+        if not text or text.casefold() == "nan" or text in seen:
+            continue
+        values.append(text)
+        seen.add(text)
+    return values
+
+
+def _compact_context_value(value: object, *, max_chars: int) -> str:
+    text = str(value).strip()
+    if len(text) <= max_chars:
+        return text
+    side = max(8, (max_chars - 3) // 2)
+    return f"{text[:side]}...{text[-side:]}"
+
+
+def _require_columns(df: pd.DataFrame, columns: list[str], *, where: str) -> None:
+    missing = [column for column in columns if column not in df.columns]
+    if missing:
+        raise ValueError(f"{where}: missing column(s): {', '.join(missing)}")
+
+
+def _require_channels(df: pd.DataFrame, channels: list[str]) -> None:
+    available = {str(value) for value in df["channel"].dropna().unique().tolist()}
+    missing = [channel for channel in channels if str(channel) not in available]
+    if missing:
+        options = ", ".join(sorted(available))
+        raise ValueError(f"dual_reporter_triptych: requested channel(s) not found: {missing}. Available: {options}")
+
+
+def _resolve_treatment_order(
+    df: pd.DataFrame,
+    *,
+    treatment_col: str,
+    treatment_order: list[str] | tuple[str, ...] | None,
+) -> list[str]:
+    observed = [str(value) for value in df[treatment_col].dropna().unique().tolist()]
+    if treatment_order is not None:
+        order = []
+        seen = set()
+        for value in treatment_order:
+            item = str(value)
+            if item not in seen:
+                order.append(item)
+                seen.add(item)
+        if not order:
+            raise ValueError("dual_reporter_triptych: treatment_order must not be empty when provided")
+        return order
+    return sorted(observed)
+
+
+def _sort_by_treatment(
+    df: pd.DataFrame, *, treatment_col: str, order: list[str], extra_sort: list[str]
+) -> pd.DataFrame:
+    if df.empty:
+        return df
+    order_map = {value: idx for idx, value in enumerate(order)}
+    sorted_df = df.copy()
+    sorted_df["__treatment_order"] = sorted_df[treatment_col].astype(str).map(order_map).fillna(len(order_map))
+    sorted_df = sorted_df.sort_values(["__treatment_order", *extra_sort]).drop(columns=["__treatment_order"])
+    return sorted_df.reset_index(drop=True)
+
+
+def _summarize_time(
+    df: pd.DataFrame,
+    *,
+    channel: str,
+    time_col: str,
+    treatment_col: str,
+    order: list[str],
+) -> pd.DataFrame:
+    work = df[df["channel"].astype(str) == str(channel)].copy()
+    if work.empty:
+        return pd.DataFrame(columns=[time_col, treatment_col, "y_mean", "y_sd", "y_n", "y_lo", "y_hi"])
+    stats = work.groupby([time_col, treatment_col], dropna=False)["value"].agg(["mean", "std", "count"]).reset_index()
+    stats = stats.rename(columns={"mean": "y_mean", "std": "y_sd", "count": "y_n"})
+    stats["y_sd"] = stats["y_sd"].fillna(0.0)
+    stats["y_lo"] = stats["y_mean"] - stats["y_sd"]
+    stats["y_hi"] = stats["y_mean"] + stats["y_sd"]
+    return _sort_by_treatment(stats, treatment_col=treatment_col, order=order, extra_sort=[time_col])
+
+
+def _summarize_snapshot(
+    df: pd.DataFrame,
+    *,
+    channel: str,
+    time_col: str,
+    treatment_col: str,
+    snapshot_time: float,
+    order: list[str],
+    time_atol: float,
+) -> tuple[pd.DataFrame, pd.DataFrame]:
+    work = df[df["channel"].astype(str) == str(channel)].copy()
+    if work.empty:
+        empty_stats = pd.DataFrame(columns=[treatment_col, "y_mean", "y_sd", "y_n", "y_lo", "y_hi"])
+        empty_points = pd.DataFrame(columns=[treatment_col, "value"])
+        return empty_stats, empty_points
+    mask = (work[time_col] - float(snapshot_time)).abs() <= float(time_atol)
+    snapped = work[mask].copy()
+    if snapped.empty:
+        empty_stats = pd.DataFrame(columns=[treatment_col, "y_mean", "y_sd", "y_n", "y_lo", "y_hi"])
+        empty_points = pd.DataFrame(columns=[treatment_col, "value"])
+        return empty_stats, empty_points
+    stats = snapped.groupby(treatment_col, dropna=False)["value"].agg(["mean", "std", "count"]).reset_index()
+    stats = stats.rename(columns={"mean": "y_mean", "std": "y_sd", "count": "y_n"})
+    stats["y_sd"] = stats["y_sd"].fillna(0.0)
+    stats["y_lo"] = stats["y_mean"] - stats["y_sd"]
+    stats["y_hi"] = stats["y_mean"] + stats["y_sd"]
+    stats = _sort_by_treatment(stats, treatment_col=treatment_col, order=order, extra_sort=[])
+    points = _sort_by_treatment(
+        snapped[[treatment_col, "value"]].copy(),
+        treatment_col=treatment_col,
+        order=order,
+        extra_sort=["value"],
+    )
+    return stats, points
+
+
+def _time_chart(
+    *,
+    alt: Any,
+    pd_module: Any,
+    frame: pd.DataFrame,
+    time_col: str,
+    treatment_col: str,
+    y_title: str,
+    color: Any,
+    order: list[str],
+    snapshot_time: float,
+    induction_time_h: float | None,
+    width: int,
+    height: int,
+) -> Any:
+    base = alt.Chart(frame).encode(
+        x=alt.X(f"{time_col}:Q", title="Time (h)", axis=alt.Axis(labelOverlap=False)),
+        color=color,
+    )
+    band = base.mark_area(opacity=0.2).encode(
+        y=alt.Y("y_lo:Q", title=y_title),
+        y2=alt.Y2("y_hi:Q"),
+        tooltip=_time_tooltips(alt, time_col=time_col, treatment_col=treatment_col),
+    )
+    line = base.mark_line().encode(
+        y=alt.Y("y_mean:Q", title=y_title),
+        tooltip=_time_tooltips(alt, time_col=time_col, treatment_col=treatment_col),
+    )
+    layers = [band, line]
+    y_max = frame["y_hi"].max()
+    if pd_module.isna(y_max):
+        y_max = frame["y_mean"].max()
+    if pd_module.isna(y_max):
+        y_max = 0.0
+    rule_df = pd_module.DataFrame({time_col: [float(snapshot_time)], "y": [float(y_max)]})
+    layers.append(alt.Chart(rule_df).mark_rule(color="black").encode(x=alt.X(f"{time_col}:Q")))
+
+    if induction_time_h is not None:
+        try:
+            induction_time = float(induction_time_h)
+        except Exception:
+            induction_time = None
+        if induction_time is not None and not pd_module.isna(induction_time):
+            induction_df = pd_module.DataFrame({time_col: [induction_time]})
+            layers.append(
+                alt.Chart(induction_df).mark_rule(color="red", strokeDash=[6, 4]).encode(x=alt.X(f"{time_col}:Q"))
+            )
+
+    return alt.layer(*layers).properties(width=width, height=height, title=y_title)
+
+
+def _snapshot_chart(
+    *,
+    alt: Any,
+    frame: pd.DataFrame,
+    points: pd.DataFrame,
+    treatment_col: str,
+    y_title: str,
+    order: list[str],
+    width: int,
+    height: int,
+) -> Any:
+    axis = alt.Axis(labelLimit=0, labelOverlap=False, labelAngle=-45)
+    base = alt.Chart(frame).encode(
+        x=alt.X(f"{treatment_col}:N", sort=order, scale=alt.Scale(domain=order), axis=axis),
+        y=alt.Y("y_mean:Q", title=y_title),
+        tooltip=[
+            alt.Tooltip(f"{treatment_col}:N", title="Treatment"),
+            alt.Tooltip("y_mean:Q", title="Mean"),
+            alt.Tooltip("y_sd:Q", title="SD"),
+            alt.Tooltip("y_n:Q", title="N"),
+        ],
+    )
+    layers = [
+        base.mark_bar().encode(
+            color=alt.Color(f"{treatment_col}:N", sort=order, scale=alt.Scale(domain=order), legend=None)
+        ),
+        base.mark_rule(color="black").encode(y=alt.Y("y_lo:Q"), y2=alt.Y2("y_hi:Q")),
+        base.mark_tick(color="black", orient="horizontal", size=8, thickness=1.5).encode(y=alt.Y("y_lo:Q")),
+        base.mark_tick(color="black", orient="horizontal", size=8, thickness=1.5).encode(y=alt.Y("y_hi:Q")),
+    ]
+    if not points.empty:
+        layers.append(
+            alt.Chart(points)
+            .mark_point(filled=True, strokeWidth=0, size=50)
+            .encode(
+                x=alt.X(f"{treatment_col}:N", sort=order, scale=alt.Scale(domain=order), axis=axis),
+                y=alt.Y("value:Q"),
+                tooltip=[
+                    alt.Tooltip(f"{treatment_col}:N", title="Treatment"),
+                    alt.Tooltip("value:Q", title="Value"),
+                ],
+            )
+        )
+    return alt.layer(*layers).properties(width=width, height=height, title=y_title)
+
+
+def _time_tooltips(alt: Any, *, time_col: str, treatment_col: str) -> list[Any]:
+    return [
+        alt.Tooltip(f"{time_col}:Q", title="Time (h)"),
+        alt.Tooltip(f"{treatment_col}:N", title="Treatment"),
+        alt.Tooltip("y_mean:Q", title="Mean"),
+        alt.Tooltip("y_sd:Q", title="SD"),
+        alt.Tooltip("y_n:Q", title="N"),
+    ]
diff --git a/src/reader/workbench/notebooks/launch.py b/src/reader/workbench/notebooks/launch.py
index f297e8a..82388e0 100644
--- a/src/reader/workbench/notebooks/launch.py
+++ b/src/reader/workbench/notebooks/launch.py
@@ -1,72 +1,56 @@
 from __future__ import annotations
 
-import hashlib
-import json
 import os
 import signal
 import socket
 import sys
 import time
 import webbrowser
-from dataclasses import asdict, dataclass
+from dataclasses import dataclass
 from pathlib import Path
 
 from reader.errors import ConfigError
 
+from ._launch_registry import (
+    MarimoSessionRecord,
+)
+from ._launch_registry import (
+    load_registry as _load_registry,
+)
+from ._launch_registry import (
+    prune_registry as _prune_registry,
+)
+from ._launch_registry import (
+    session_matches_current_inputs as _session_matches_current_inputs,
+)
+from ._launch_registry import (
+    write_registry as _write_registry,
+)
+from ._launch_runtime import (
+    MarimoRuntimePaths,
+)
+from ._launch_runtime import (
+    build_env as _build_env,
+)
+from ._launch_runtime import (
+    find_experiment_root as _find_experiment_root,
+)
+from ._launch_runtime import (
+    find_repo_root as _find_repo_root,
+)
+from ._launch_runtime import (
+    runtime_fingerprint as _runtime_fingerprint,
+)
+from ._launch_runtime import (
+    runtime_paths_for_target as _runtime_paths_for_target,
+)
+from ._launch_runtime import (
+    target_signature as _target_signature,
+)
+
 DEFAULT_HOST = "127.0.0.1"
 DEFAULT_PORT = 2718
 DEFAULT_PORT_SCAN_LIMIT = 32
-RUNTIME_FINGERPRINT_SUFFIXES = (".py", ".txt")
-
-
-@dataclass(frozen=True)
-class MarimoRuntimePaths:
-    root: Path
-    registry_path: Path
-    xdg_config_home: Path
-    xdg_state_home: Path
-    xdg_cache_home: Path
-    mplconfigdir: Path
-
-
-@dataclass(frozen=True)
-class MarimoSessionRecord:
-    pid: int
-    port: int
-    host: str
-    mode: str
-    notebook: str
-    experiment_root: str
-    repo_root: str
-    launched_at: float
-    notebook_mtime_ns: int | None = None
-    notebook_size_bytes: int | None = None
-    runtime_fingerprint: str | None = None
-
-    @classmethod
-    def from_dict(cls, payload: dict[str, object]) -> MarimoSessionRecord | None:
-        try:
-            return cls(
-                pid=int(payload["pid"]),
-                port=int(payload["port"]),
-                host=str(payload["host"]),
-                mode=str(payload["mode"]),
-                notebook=str(payload["notebook"]),
-                experiment_root=str(payload["experiment_root"]),
-                repo_root=str(payload["repo_root"]),
-                launched_at=float(payload["launched_at"]),
-                notebook_mtime_ns=(
-                    int(payload["notebook_mtime_ns"]) if payload.get("notebook_mtime_ns") is not None else None
-                ),
-                notebook_size_bytes=(
-                    int(payload["notebook_size_bytes"]) if payload.get("notebook_size_bytes") is not None else None
-                ),
-                runtime_fingerprint=(
-                    str(payload["runtime_fingerprint"]) if payload.get("runtime_fingerprint") is not None else None
-                ),
-            )
-        except (KeyError, TypeError, ValueError):
-            return None
 
 
 @dataclass(frozen=True)
@@ -82,84 +66,14 @@ class MarimoLaunchPlan:
     terminated_sessions: tuple[MarimoSessionRecord, ...] = ()
 
 
-def _find_repo_root(start: Path) -> Path:
-    for base in [start.resolve()] + list(start.resolve().parents):
-        if (base / "pyproject.toml").exists():
-            return base
-    raise ConfigError(f"Could not find repository root from {start}")
-
-
-def _find_experiment_root(start: Path) -> Path:
-    for base in [start.resolve()] + list(start.resolve().parents):
-        if (base / "config.yaml").exists():
-            return base
-    return start.resolve().parent
-
-
-def _runtime_paths_for_target(target: Path) -> MarimoRuntimePaths:
-    repo_root = _find_repo_root(target)
-    root = repo_root / ".cache" / "marimo"
-    xdg_config_home = root / "xdg-config"
-    xdg_state_home = root / "xdg-state"
-    xdg_cache_home = root / "xdg-cache"
-    mplconfigdir = root / "matplotlib"
-    for path in (root, xdg_config_home, xdg_state_home, xdg_cache_home, mplconfigdir):
-        path.mkdir(parents=True, exist_ok=True)
-    return MarimoRuntimePaths(
-        root=root,
-        registry_path=root / "sessions.json",
-        xdg_config_home=xdg_config_home,
-        xdg_state_home=xdg_state_home,
-        xdg_cache_home=xdg_cache_home,
-        mplconfigdir=mplconfigdir,
-    )
-
-
-def _load_registry(registry_path: Path) -> list[MarimoSessionRecord]:
-    if not registry_path.exists():
-        return []
-    try:
-        payload = json.loads(registry_path.read_text(encoding="utf-8"))
-    except (OSError, json.JSONDecodeError):
-        return []
-    if not isinstance(payload, list):
-        return []
-    records: list[MarimoSessionRecord] = []
-    for item in payload:
-        if isinstance(item, dict):
-            record = MarimoSessionRecord.from_dict(item)
-            if record is not None:
-                records.append(record)
-    return records
-
-
-def _write_registry(registry_path: Path, records: list[MarimoSessionRecord]) -> None:
-    registry_path.parent.mkdir(parents=True, exist_ok=True)
-    payload = [asdict(record) for record in records]
-    registry_path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
-
-
-def _target_signature(target: Path) -> tuple[int, int]:
-    stat = target.resolve().stat()
-    return stat.st_mtime_ns, stat.st_size
-
-
-def _runtime_fingerprint(repo_root: Path) -> str:
-    resolved_root = repo_root.resolve()
-    hasher = hashlib.sha256()
-    candidates: list[Path] = []
-    pyproject = resolved_root / "pyproject.toml"
-    if pyproject.exists():
-        candidates.append(pyproject)
-    source_root = resolved_root / "src" / "reader"
-    if source_root.exists():
-        for suffix in RUNTIME_FINGERPRINT_SUFFIXES:
-            candidates.extend(path for path in source_root.rglob(f"*{suffix}") if path.is_file())
-    for path in sorted({item.resolve() for item in candidates}):
-        stat = path.stat()
-        relative = path.relative_to(resolved_root)
-        hasher.update(f"{relative}:{stat.st_mtime_ns}:{stat.st_size}\n".encode())
-    return hasher.hexdigest()
+@dataclass(frozen=True)
+class _LaunchContext:
+    resolved_target: Path
+    experiment_root: Path
+    repo_root: Path
+    notebook_mtime_ns: int
+    notebook_size_bytes: int
+    runtime_fingerprint: str
 
 
 def _pid_is_live(pid: int) -> bool:
@@ -201,59 +115,19 @@ def _terminate_pid(pid: int, *, grace_seconds: float = 1.0) -> bool:
     return not _pid_is_live(pid)
 
 
-def _prune_registry(records: list[MarimoSessionRecord]) -> list[MarimoSessionRecord]:
-    pruned: list[MarimoSessionRecord] = []
-    for record in records:
-        notebook_path = Path(record.notebook)
-        if not notebook_path.exists():
-            continue
-        if not _pid_is_live(record.pid):
-            continue
-        pruned.append(record)
-    return pruned
-
-
-def _session_matches_current_inputs(
-    record: MarimoSessionRecord,
-    *,
-    mode: str,
-    resolved_target: Path,
-    experiment_root: Path,
-    runtime_fingerprint: str,
-    notebook_mtime_ns: int,
-    notebook_size_bytes: int,
-) -> bool:
-    if record.mode != mode:
-        return False
-    if record.notebook != str(resolved_target):
-        return False
-    if record.experiment_root != str(experiment_root):
-        return False
-    if record.notebook_mtime_ns != notebook_mtime_ns:
-        return False
-    if record.notebook_size_bytes != notebook_size_bytes:
-        return False
-    if record.runtime_fingerprint != runtime_fingerprint:
- return False - return _port_is_open(record.host, record.port) - - -def _build_env(target: Path, *, base_env: dict[str, str] | None = None) -> tuple[dict[str, str], MarimoRuntimePaths]: - env = dict(base_env or os.environ) - runtime_paths = _runtime_paths_for_target(target) - repo_root = _find_repo_root(target) - pythonpath_parts = [str(repo_root)] - existing_pythonpath = env.get("PYTHONPATH", "").strip() - if existing_pythonpath: - pythonpath_parts.append(existing_pythonpath) - env["PYTHONPATH"] = os.pathsep.join(pythonpath_parts) - env["READER_MARIMO_RUNTIME_PATCH"] = "1" - env["XDG_CONFIG_HOME"] = str(runtime_paths.xdg_config_home) - env["XDG_STATE_HOME"] = str(runtime_paths.xdg_state_home) - env["XDG_CACHE_HOME"] = str(runtime_paths.xdg_cache_home) - env["READER_MPLCONFIGDIR"] = str(runtime_paths.mplconfigdir) - env["MPLCONFIGDIR"] = str(runtime_paths.mplconfigdir) - return env, runtime_paths +def _resolve_launch_context(target: Path) -> _LaunchContext: + resolved_target = target.resolve() + experiment_root = _find_experiment_root(resolved_target) + repo_root = _find_repo_root(resolved_target) + notebook_mtime_ns, notebook_size_bytes = _target_signature(resolved_target) + return _LaunchContext( + resolved_target=resolved_target, + experiment_root=experiment_root, + repo_root=repo_root, + notebook_mtime_ns=notebook_mtime_ns, + notebook_size_bytes=notebook_size_bytes, + runtime_fingerprint=_runtime_fingerprint(repo_root), + ) def _choose_port( @@ -289,23 +163,21 @@ def plan_marimo_launch( preferred_port: int | None = None, base_env: dict[str, str] | None = None, ) -> MarimoLaunchPlan: - resolved_target = target.resolve() - experiment_root = _find_experiment_root(resolved_target) - repo_root = _find_repo_root(resolved_target) - env, runtime_paths = _build_env(resolved_target, base_env=base_env) - records = _prune_registry(_load_registry(runtime_paths.registry_path)) - notebook_mtime_ns, notebook_size_bytes = _target_signature(resolved_target) - 
runtime_fingerprint = _runtime_fingerprint(repo_root) + context = _resolve_launch_context(target) + runtime_paths = _runtime_paths_for_target(context.resolved_target) + env = _build_env(context.repo_root, runtime_paths=runtime_paths, base_env=base_env) + records = _prune_registry(_load_registry(runtime_paths.registry_path), pid_is_live=_pid_is_live) for record in records: if _session_matches_current_inputs( record, mode=mode, - resolved_target=resolved_target, - experiment_root=experiment_root, - runtime_fingerprint=runtime_fingerprint, - notebook_mtime_ns=notebook_mtime_ns, - notebook_size_bytes=notebook_size_bytes, + resolved_target=context.resolved_target, + experiment_root=context.experiment_root, + runtime_fingerprint=context.runtime_fingerprint, + notebook_mtime_ns=context.notebook_mtime_ns, + notebook_size_bytes=context.notebook_size_bytes, + port_is_open=lambda host, port: _port_is_open(host, port), ): _write_registry(runtime_paths.registry_path, records) return MarimoLaunchPlan( @@ -314,7 +186,7 @@ def plan_marimo_launch( url=f"http://{record.host}:{record.port}", port=record.port, host=record.host, - target=resolved_target, + target=context.resolved_target, runtime_paths=runtime_paths, reused_session=record, ) @@ -322,7 +194,11 @@ def plan_marimo_launch( terminated_sessions: list[MarimoSessionRecord] = [] kept_records: list[MarimoSessionRecord] = [] for record in records: - if record.mode == mode and record.experiment_root == str(experiment_root) and _terminate_pid(record.pid): + if ( + record.mode == mode + and record.experiment_root == str(context.experiment_root) + and _terminate_pid(record.pid) + ): terminated_sessions.append(record) continue kept_records.append(record) @@ -335,7 +211,7 @@ def plan_marimo_launch( if headless: cmd.append("--headless") cmd.append("--no-token") - cmd.append(str(resolved_target)) + cmd.append(str(context.resolved_target)) _write_registry(runtime_paths.registry_path, kept_records) return MarimoLaunchPlan( @@ -344,7 +220,7 
@@ def plan_marimo_launch( url=f"http://{DEFAULT_HOST}:{port}", port=port, host=DEFAULT_HOST, - target=resolved_target, + target=context.resolved_target, runtime_paths=runtime_paths, terminated_sessions=tuple(terminated_sessions), ) @@ -359,7 +235,7 @@ def register_managed_session( mode: str, target: Path, ) -> None: - records = _prune_registry(_load_registry(registry_path)) + records = _prune_registry(_load_registry(registry_path), pid_is_live=_pid_is_live) target_mtime_ns, target_size_bytes = _target_signature(target) runtime_fingerprint = _runtime_fingerprint(_find_repo_root(target)) record = MarimoSessionRecord( @@ -381,7 +257,7 @@ def register_managed_session( def unregister_managed_session(*, registry_path: Path, pid: int) -> None: - records = _prune_registry(_load_registry(registry_path)) + records = _prune_registry(_load_registry(registry_path), pid_is_live=_pid_is_live) _write_registry(registry_path, [record for record in records if record.pid != pid]) diff --git a/src/reader/workbench/templates/builtins/__init__.py b/src/reader/workbench/templates/builtins/__init__.py index 9adeeda..a434622 100644 --- a/src/reader/workbench/templates/builtins/__init__.py +++ b/src/reader/workbench/templates/builtins/__init__.py @@ -1,5 +1,6 @@ from .basic import DESCRIPTOR as BASIC_TEMPLATE from .cytometry import DESCRIPTOR as CYTOMETRY_TEMPLATE +from .dual_reporter_triptych import DESCRIPTOR as DUAL_REPORTER_TRIPTYCH_TEMPLATE from .eda import DESCRIPTOR as EDA_TEMPLATE from .microplate import DESCRIPTOR as MICROPLATE_TEMPLATE from .retron_sponge import DESCRIPTOR as RETRON_SPONGE_TEMPLATE @@ -13,6 +14,7 @@ BASIC_TEMPLATE, MICROPLATE_TEMPLATE, CYTOMETRY_TEMPLATE, + DUAL_REPORTER_TRIPTYCH_TEMPLATE, SFXI_EDA_TEMPLATE, ) @@ -20,6 +22,7 @@ "BASIC_TEMPLATE", "BUILTIN_NOTEBOOK_TEMPLATES", "CYTOMETRY_TEMPLATE", + "DUAL_REPORTER_TRIPTYCH_TEMPLATE", "EDA_TEMPLATE", "MICROPLATE_TEMPLATE", "RETRON_SPONGE_TEMPLATE", diff --git 
a/src/reader/workbench/templates/builtins/dual_reporter_triptych.marimo.py.txt b/src/reader/workbench/templates/builtins/dual_reporter_triptych.marimo.py.txt new file mode 100644 index 0000000..e3d0ee8 --- /dev/null +++ b/src/reader/workbench/templates/builtins/dual_reporter_triptych.marimo.py.txt @@ -0,0 +1,372 @@ +import marimo + +__generated_with = "0.19.1" +app = marimo.App(width="medium") + + +@app.cell(hide_code=True) +def _(): + from pathlib import Path + + import marimo as mo + + try: + import polars as pl + except Exception: + pl = None + try: + import pandas as pd + except Exception: + pd = None + + altair_err = None + try: + import altair as alt + + alt.data_transformers.disable_max_rows() + except Exception as exc: + alt = None + altair_err = exc + + from reader.workbench.notebooks.context import load_notebook_workbench_context + from reader.workbench.notebooks.dual_reporter_triptych import ( + build_dual_reporter_triptych_chart, + build_triptych_data, + choose_time, + infer_induction_time_h, + summarize_design_context, + ) + from reader.workbench.records import discover_dataframe_records + + return ( + Path, + alt, + altair_err, + build_dual_reporter_triptych_chart, + build_triptych_data, + choose_time, + discover_dataframe_records, + infer_induction_time_h, + load_notebook_workbench_context, + mo, + pd, + pl, + summarize_design_context, + ) + + +@app.cell(hide_code=True) +def _(Path, load_notebook_workbench_context): + notebook_context = load_notebook_workbench_context(Path(__file__).resolve()) + decl = notebook_context.decl + exp_dir = notebook_context.experiment_root + outputs_dir = notebook_context.outputs_dir + exp_meta = { + "id": decl.experiment.id, + "title": decl.experiment.title or "", + } + pipeline_step_ids = [step.id for step in notebook_context.workbench.pipeline] + return decl, exp_dir, exp_meta, outputs_dir, pipeline_step_ids + + +@app.cell(hide_code=True) +def _(discover_dataframe_records, outputs_dir): + record_info, record_labels, 
record_note, record_warning = discover_dataframe_records( + outputs_dir, + allow_scan=__ALLOW_RECORD_SCAN__, + ) + return record_info, record_labels, record_note, record_warning + + +@app.cell(hide_code=True) +def _(mo, pipeline_step_ids, record_info, record_labels, record_note, record_warning): + if record_warning: + mo.md(record_warning) + if not record_labels: + note = record_note or "No datasets found. Run `uv run reader run` first." + mo.md(note) + record_dropdown = None + else: + _preferred_records = ( + "promote_to_tidy_plus_map/df", + "ratio_yfp_od600/df", + "ratio_yfp_cfp/df", + "overflow/df", + "labels/df", + ) + _default_label = None + for _record_id in _preferred_records: + _matches = [label for label, info in record_info.items() if info.get("record_id") == _record_id] + if _matches: + _default_label = sorted(_matches)[0] + break + if _default_label is None and pipeline_step_ids: + for _step_id in reversed(pipeline_step_ids): + _matches = [label for label, info in record_info.items() if info.get("step_id") == _step_id] + if _matches: + _default_label = sorted(_matches)[0] + break + _default_label = _default_label or record_labels[0] + mo.md(f"This run has {len(record_labels)} dataframe record(s). Select one to explore:") + record_dropdown = mo.ui.dropdown( + options=record_labels, + value=_default_label, + label="Dataset (dataframe record)", + full_width=True, + ) + return record_dropdown + + +@app.cell(hide_code=True) +def _(record_dropdown, record_info): + if record_dropdown is None: + selected_label = None + record_path = None + else: + selected_label = record_dropdown.value + record_path = record_info.get(selected_label, {}).get("path") + return record_path, selected_label + + +@app.cell(hide_code=True) +def _(pl, record_path): + df = None + df_error = None + _pl_error = None + if record_path is not None: + if pl is None: + df_error = "Polars is required to read parquet. Install the notebooks group." 
+ else: + try: + df = pl.read_parquet(record_path) + except Exception as exc: + _pl_error = str(exc) + if df is None and df_error is None: + _suffix = _pl_error or "unknown error" + df_error = f"Failed to read parquet with polars ({_suffix})." + return df, df_error + + +@app.cell(hide_code=True) +def _(df, df_error, mo, selected_label): + if df_error: + mo.stop(True, mo.md(f"Failed to load `{selected_label}`: {df_error}")) + if df is None: + mo.stop(True, mo.md("Select a dataset to explore.")) + data_ready = True + return data_ready + + +@app.cell(hide_code=True) +def _(df, data_ready, mo, pd, pl): + if pd is None: + mo.stop(True, mo.md("Pandas is required for the triptych notebook.")) + if pl is not None and df.__class__.__module__.startswith("polars"): + tidy_pd = df.to_pandas() + else: + tidy_pd = df + return tidy_pd + + +@app.cell(hide_code=True) +def _(exp_dir, exp_meta, mo): + _exp_id = exp_meta.get("id") or exp_dir.name + _exp_title = exp_meta.get("title") or _exp_id + mo.md(f"# {_exp_title}\n**Experiment id:** `{_exp_id}`\n\n## Dual-reporter triptych") + + +@app.cell(hide_code=True) +def _(mo, tidy_pd): + _columns = list(tidy_pd.columns) + missing = [column for column in ("channel", "time", "value") if column not in _columns] + if missing: + mo.stop(True, mo.md(f"Selected dataset is not dual-reporter compatible. 
Missing: {', '.join(missing)}.")) + + design_col_options = [column for column in ("design_id_alias", "design_id") if column in _columns] + treatment_col_options = [column for column in ("treatment_alias", "treatment") if column in _columns] + if not design_col_options: + mo.stop(True, mo.md("Selected dataset needs `design_id` or `design_id_alias`.")) + if not treatment_col_options: + mo.stop(True, mo.md("Selected dataset needs `treatment` or `treatment_alias`.")) + channels_available = sorted(str(value) for value in tidy_pd["channel"].dropna().unique().tolist()) + for _channel in ("OD600", "YFP/CFP"): + if _channel not in channels_available: + mo.stop(True, mo.md(f"Selected dataset lacks required channel `{_channel}`.")) + design_col = design_col_options[0] + treatment_col = treatment_col_options[0] + time_col = "time" + return channels_available, design_col, treatment_col, time_col + + +@app.cell(hide_code=True) +def _(decl, pd, tidy_pd, time_col): + time_values = sorted( + float(value) for value in pd.to_numeric(tidy_pd[time_col], errors="coerce").dropna().unique().tolist() + ) + time_min = float(time_values[0]) + time_max = float(time_values[-1]) + time_step = min([b - a for a, b in zip(time_values[:-1], time_values[1:]) if b > a] or [0.25]) + + default_time = time_max + protocol_inputs = getattr(decl.experiment_semantics.protocol, "inputs", {}) or {} + if isinstance(protocol_inputs, dict): + fold_change = protocol_inputs.get("fold_change", {}) or {} + report_times = fold_change.get("report_times") if isinstance(fold_change, dict) else None + if isinstance(report_times, list) and report_times: + try: + default_time = float(report_times[0]) + except Exception: + default_time = time_max + if default_time < time_min or default_time > time_max: + default_time = time_max + return default_time, time_max, time_min, time_step, time_values + + +@app.cell(hide_code=True) +def _(default_time, design_col, mo, tidy_pd, time_col, time_max, time_min, time_step): + 
design_values = sorted(str(value) for value in tidy_pd[design_col].dropna().unique().tolist()) + design_select = mo.ui.dropdown( + options=design_values, + value=design_values[0], + label=f"Design ({design_col})", + full_width=True, + ) + time_mode = mo.ui.dropdown( + options=["nearest", "last_before", "first_after", "exact"], + value="nearest", + label="Time mode", + full_width=True, + ) + time_slider = mo.ui.slider( + start=time_min, + stop=time_max, + value=default_time, + step=time_step, + debounce=True, + show_value=True, + label="Target time (h)", + full_width=True, + ) + mo.hstack([design_select, time_mode, time_slider]) + return design_select, time_mode, time_slider + + +@app.cell(hide_code=True) +def _(choose_time, design_col, design_select, mo, pd, tidy_pd, time_col, time_mode, time_slider): + design_value = design_select.value + selected_rows = tidy_pd[tidy_pd[design_col].astype(str) == str(design_value)].copy() + if selected_rows.empty: + mo.stop(True, mo.md("No rows for the selected design.")) + design_times = sorted( + float(value) for value in pd.to_numeric(selected_rows[time_col], errors="coerce").dropna().unique().tolist() + ) + time_target_h = float(time_slider.value) + time_selected_h = choose_time(design_times, time_target_h, str(time_mode.value)) + if time_selected_h is None: + mo.stop(True, mo.md(f"No time matches {time_target_h:.3f} h with mode `{time_mode.value}`.")) + return design_value, selected_rows, time_selected_h, time_target_h + + +@app.cell(hide_code=True) +def _(infer_induction_time_h, selected_rows, time_col): + induction_time_h = infer_induction_time_h(selected_rows, time_col=time_col) + return induction_time_h + + +@app.cell(hide_code=True) +def _(design_col, design_value, mo, selected_rows, summarize_design_context): + _rows = summarize_design_context( + selected_rows, + primary_col=design_col, + primary_value=design_value, + preferred_columns=("design_id_alias", "design_id", "id", "sequence", "strain", "medium"), + ) + 
_label_names = { + "design_id": "Design ID", + "design_id_alias": "Design alias", + "id": "Record ID", + "sequence": "Sequence", + "strain": "Strain", + "medium": "Medium", + } + _lines = [f"**{_label_names.get(_label, _label)}:** `{_value}`" for _label, _value in _rows] + triptych_context = mo.md("### Selected design\n" + " \n".join(_lines)) + return triptych_context + + +@app.cell(hide_code=True) +def _(build_triptych_data, mo, selected_rows, time_col, time_selected_h, treatment_col): + treatment_order = sorted(str(value) for value in selected_rows[treatment_col].dropna().unique().tolist()) + try: + triptych_data = build_triptych_data( + selected_rows, + time_col=time_col, + treatment_col=treatment_col, + growth_channel="OD600", + ratio_channel="YFP/CFP", + snapshot_channel="YFP/CFP", + snapshot_time=float(time_selected_h), + treatment_order=treatment_order, + ) + except Exception as exc: + mo.stop(True, mo.md(f"Triptych build failed: `{exc}`")) + return triptych_data, treatment_order + + +@app.cell(hide_code=True) +def _(mo, time_mode, time_selected_h, time_target_h): + _delta = abs(float(time_selected_h) - float(time_target_h)) + _lines = [ + f"**Target time:** {float(time_target_h):.3f} h", + f"**Rendered snapshot time:** {float(time_selected_h):.3f} h", + ] + if _delta > 0: + _lines.append(f"Delta from target (mode={time_mode.value}): {_delta:.3f} h") + mo.md("## Snapshot selection\n" + "\n".join(_lines)) + + +@app.cell(hide_code=True) +def _( + alt, + altair_err, + build_dual_reporter_triptych_chart, + induction_time_h, + mo, + pd, + time_col, + treatment_col, + triptych_context, + triptych_data, +): + if alt is None: + mo.stop(True, mo.md(f"Altair is required for plotting: `{altair_err}`")) + try: + _chart = build_dual_reporter_triptych_chart( + alt=alt, + pd_module=pd, + data=triptych_data, + time_col=time_col, + treatment_col=treatment_col, + induction_time_h=induction_time_h, + ) + except Exception as exc: + mo.stop(True, mo.md(f"Triptych render failed: 
`{exc}`")) + _chart_view = mo.ui.altair_chart(_chart, chart_selection=False, legend_selection=False) + _chart_panel = mo.vstack([triptych_context, _chart_view], gap=0.35).style( + {"min-height": "520px", "width": "100%", "max-width": "100%"} + ) + mo.output.replace(_chart_panel) + + +@app.cell(hide_code=True) +def _(mo, selected_rows): + mo.vstack( + [ + mo.md("## Selected rows"), + mo.ui.table(selected_rows, page_size=10), + ] + ) + + +if __name__ == "__main__": + app.run() diff --git a/src/reader/workbench/templates/builtins/dual_reporter_triptych.py b/src/reader/workbench/templates/builtins/dual_reporter_triptych.py new file mode 100644 index 0000000..82aadf3 --- /dev/null +++ b/src/reader/workbench/templates/builtins/dual_reporter_triptych.py @@ -0,0 +1,20 @@ +from __future__ import annotations + +from reader.workbench.assets.types import AssetCapabilities, AssetRequirement +from reader.workbench.templates.model import NotebookTemplateDescriptor + +DESCRIPTOR = NotebookTemplateDescriptor( + template="notebook/dual_reporter_triptych", + domain="plate_reader", + family="screen_review", + summary="Dual-reporter OD600 + ratio kinetics + snapshot triptych.", + tags=("dual_reporter", "triptych", "plate_reader"), + source_package=__package__, + source_name="dual_reporter_triptych.marimo.py.txt", + capabilities=AssetCapabilities( + requires_any=( + AssetRequirement(domain="plate_reader"), + AssetRequirement(record_contract="plate_reader.annotated.v1"), + ) + ), +) diff --git a/src/reader/workbench/templates/builtins/sfxi_eda.marimo.py.txt b/src/reader/workbench/templates/builtins/sfxi_eda.marimo.py.txt index cea0800..79756fa 100644 --- a/src/reader/workbench/templates/builtins/sfxi_eda.marimo.py.txt +++ b/src/reader/workbench/templates/builtins/sfxi_eda.marimo.py.txt @@ -3,12 +3,14 @@ import marimo __generated_with = "0.19.1" app = marimo.App(width="medium") + @app.cell(hide_code=True) def _(): from pathlib import Path import json import marimo as mo + try: import 
polars as pl except Exception: @@ -25,6 +27,7 @@ def _(): discover_dataframe_records, ) + @app.cell(hide_code=True) def _(Path, load_notebook_workbench_context): notebook_context = load_notebook_workbench_context(Path(__file__).resolve()) @@ -44,6 +47,7 @@ def _(Path, load_notebook_workbench_context): pipeline_step_ids, ) + @app.cell(hide_code=True) def _(discover_dataframe_records, outputs_dir): record_info, record_labels, record_note, record_warning = discover_dataframe_records( @@ -52,6 +56,7 @@ def _(discover_dataframe_records, outputs_dir): ) return record_info, record_labels, record_note, record_warning + @app.cell(hide_code=True) def _(mo, pipeline_step_ids, record_info, record_labels, record_note, record_warning): if record_warning: @@ -62,14 +67,23 @@ def _(mo, pipeline_step_ids, record_info, record_labels, record_note, record_war record_dropdown = None else: _default_label = None + _preferred_records = ( + "promote_to_tidy_plus_map/df", + "ratio_yfp_od600/df", + "ratio_yfp_cfp/df", + ) + for _record_id in _preferred_records: + _matches = [label for label, info in record_info.items() if info.get("record_id") == _record_id] + if _matches: + _default_label = sorted(_matches)[0] + break if pipeline_step_ids: - for _step_id in reversed(pipeline_step_ids): - _matches = [ - label for label, info in record_info.items() if info.get("step_id") == _step_id - ] - if _matches: - _default_label = sorted(_matches)[0] - break + if _default_label is None: + for _step_id in reversed(pipeline_step_ids): + _matches = [label for label, info in record_info.items() if info.get("step_id") == _step_id] + if _matches: + _default_label = sorted(_matches)[0] + break if _default_label is None: _latest_label = None _latest_mtime = None @@ -83,9 +97,7 @@ def _(mo, pipeline_step_ids, record_info, record_labels, record_note, record_war _latest_mtime = _mtime _latest_label = _label _default_label = _latest_label or record_labels[0] - mo.md( - f"This run has {len(record_labels)} dataframe 
record(s). Select one to explore:" - ) + mo.md(f"This run has {len(record_labels)} dataframe record(s). Select one to explore:") record_dropdown = mo.ui.dropdown( options=record_labels, value=_default_label, @@ -94,6 +106,7 @@ def _(mo, pipeline_step_ids, record_info, record_labels, record_note, record_war ) return record_dropdown + @app.cell(hide_code=True) def _(record_dropdown, record_info): if record_dropdown is None: @@ -104,6 +117,7 @@ def _(record_dropdown, record_info): record_path = record_info.get(selected_label, {}).get("path") return selected_label, record_path + @app.cell(hide_code=True) def _(pl, record_path): df = None @@ -123,6 +137,7 @@ def _(pl, record_path): df_error = f"Failed to read parquet with polars ({_suffix})." return df, df_error + @app.cell(hide_code=True) def _(df, pl): design_treatment_rows = [] @@ -142,6 +157,7 @@ def _(df, pl): _missing.append("treatment") design_treatment_note = f"Missing column(s): {', '.join(_missing)}." else: + def _unique_values(df, col): values = [] try: @@ -165,6 +181,7 @@ def _(df, pl): ) return design_treatment_rows, design_treatment_note + @app.cell(hide_code=True) def _(design_treatment_note, design_treatment_rows, exp_dir, exp_meta, mo): _exp_id = exp_meta.get("id") or exp_dir.name @@ -182,6 +199,7 @@ def _(design_treatment_note, design_treatment_rows, exp_dir, exp_meta, mo): ) return eda_overview_panel + @app.cell(hide_code=True) def _(df_error, mo, record_dropdown, record_note): _elements = [mo.md("## Dataset selection")] @@ -194,6 +212,7 @@ def _(df_error, mo, record_dropdown, record_note): eda_dataset_panel = mo.vstack(_elements) return eda_dataset_panel + @app.cell(hide_code=True) def _(df, df_error, mo, selected_label): if df_error: @@ -203,6 +222,7 @@ def _(df, df_error, mo, selected_label): data_ready = True return data_ready + @app.cell(hide_code=True) def _(df, data_ready, mo, pl): _columns = list(df.columns) if hasattr(df, "columns") else [] @@ -217,6 +237,7 @@ def _(df, data_ready, mo, pl): 
eda_table_panel = mo.vstack(_elements) return eda_table_panel + @app.cell(hide_code=True) def _(eda_dataset_panel, eda_overview_panel, eda_table_panel, mo): eda_base_panel = mo.vstack( @@ -233,6 +254,7 @@ def _(eda_dataset_panel, eda_overview_panel, eda_table_panel, mo): def _(eda_base_panel): eda_base_panel + @app.cell(hide_code=True) def _(): try: @@ -247,6 +269,7 @@ def _(): altair_err = None try: import altair as alt + alt.data_transformers.disable_max_rows() except Exception as exc: alt = None @@ -255,44 +278,57 @@ def _(): from reader.domains.logic.sfxi.api import load_sfxi_config from reader.domains.logic.sfxi.run import build_vec8_from_tidy from reader.domains.logic.sfxi.selection import cornerize_and_aggregate, REQUIRED_COLS + from reader.workbench.notebooks.dual_reporter_triptych import ( + build_dual_reporter_triptych_chart, + build_triptych_data, + summarize_design_context, + ) + from reader.workbench.graph import resolve_workbench return ( pd, np, alt, altair_err, + build_dual_reporter_triptych_chart, + build_triptych_data, + summarize_design_context, load_sfxi_config, build_vec8_from_tidy, cornerize_and_aggregate, + resolve_workbench, REQUIRED_COLS, ) + @app.cell(hide_code=True) def _(decl, outputs_dir): exports_cfg = decl.experiment_semantics.layout.exports_subdir exports_dir = outputs_dir if exports_cfg in ("", ".", "./") else outputs_dir / str(exports_cfg) return exports_dir + @app.cell(hide_code=True) def _(mo): - mo.md( - """## SFXI 8-vector Builder -This section mirrors the **setpoint‑fidelity × intensity** definition used in OPAL: each design is summarized by an **8‑vector** with four **logic** values (v00..v11 in [0,1]) and four **intensity** values (y*00..y*11 in log2, reference‑normalized). The logic half captures **shape** (which corners turn on/off), while the intensity half captures **effect size** after reference normalization to make runs comparable. 
+ mo.md(""" + ## SFXI 8-vector Builder + This section mirrors the **setpoint‑fidelity × intensity** definition used in OPAL: each design is summarized by an **8‑vector** with four **logic** values (v00..v11 in [0,1]) and four **intensity** values (y*00..y*11 in log2, reference‑normalized). The logic half captures **shape** (which corners turn on/off), while the intensity half captures **effect size** after reference normalization to make runs comparable. -Workflow: -- choose a **time slice** and map treatments to the **00/10/01/11** corners -- logic: `log2(YFP/CFP)` → per‑design min–max → **v00..v11** -- intensity: `log2((YFP/OD600)/(reference+α)+δ)` → **y*00..y*11** + Workflow: + - choose a **time slice** and map treatments to the **00/10/01/11** corners + - logic: `log2(YFP/CFP)` → per‑design min–max → **v00..v11** + - intensity: `log2((YFP/OD600)/(reference+α)+δ)` → **y*00..y*11** + + The 8-vector here uses the same `transform/sfxi` code and writes XLSX + JSON logs to the experiment's exports folder. 
+ """) -The 8-vector here uses the same `transform/sfxi` code and writes XLSX + JSON logs to the experiment's exports folder.""" - ) @app.cell(hide_code=True) def _(decl, mo, resolve_workbench): - from reader.runtime import builtin_runtime + from reader.runtime import builtin_runtime as _builtin_runtime sfxi_step = None - registry = builtin_runtime().plugins + registry = _builtin_runtime().plugins for step in resolve_workbench(decl).pipeline: plugin = str(getattr(step, "plugin", "")) if not plugin: @@ -310,9 +346,16 @@ def _(decl, mo, resolve_workbench): ) return sfxi_step + @app.cell(hide_code=True) def _(decl, mo, sfxi_step): - sfxi_step_cfg = dict(getattr(sfxi_step, "with_", {}) or {}) + from reader.runtime import builtin_runtime as _builtin_runtime + + bound_protocol = _builtin_runtime().bind_protocol(decl.experiment_semantics.protocol) + sfxi_step_cfg = bound_protocol.effective_plugin_config( + plugin_id=sfxi_step.plugin, + step_with=dict(getattr(sfxi_step, "with_", {}) or {}), + ) sfxi_step_id = getattr(sfxi_step, "id", "") if not sfxi_step_cfg: mo.stop(True, mo.md(f"Step `{sfxi_step_id}` has no SFXI config (`with`).")) @@ -327,6 +370,7 @@ def _(decl, mo, sfxi_step): sfxi_step_cfg["treatment_case_sensitive"] = bool(logic_spec.case_sensitive) return sfxi_step_cfg, sfxi_step_id + @app.cell(hide_code=True) def _(load_sfxi_config, mo, sfxi_step_cfg, sfxi_step_id): try: @@ -335,6 +379,7 @@ def _(load_sfxi_config, mo, sfxi_step_cfg, sfxi_step_id): mo.stop(True, mo.md(f"SFXI config error in `{sfxi_step_id}`: `{exc}`")) return sfxi_cfg + @app.cell(hide_code=True) def _(REQUIRED_COLS, df, mo, sfxi_cfg): _cols = list(df.columns) if hasattr(df, "columns") else [] @@ -356,6 +401,7 @@ def _(REQUIRED_COLS, df, mo, sfxi_cfg): ) return required + @app.cell(hide_code=True) def _(df, mo, pd, pl): if pd is None: @@ -366,6 +412,7 @@ def _(df, mo, pd, pl): tidy_pd = df return tidy_pd + @app.cell(hide_code=True) def _(mo, np, pd, sfxi_cfg, tidy_pd): label_col = 
sfxi_cfg.design_by[0] @@ -404,6 +451,7 @@ def _(mo, np, pd, sfxi_cfg, tidy_pd): return label_col, time_col, design_vals, time_min, time_max, time_step, default_time + @app.cell(hide_code=True) def _(pd, tidy_pd, time_col): induction_time_h = None @@ -431,6 +479,7 @@ def _(pd, tidy_pd, time_col): return induction_time_h + @app.cell(hide_code=True) def _(mo, np, pd, sfxi_cfg, tidy_pd, time_col): case_sensitive = bool(sfxi_cfg.treatment_case_sensitive) @@ -481,16 +530,12 @@ def _(mo, np, pd, sfxi_cfg, tidy_pd, time_col): if not logic_times: mo.stop( True, - mo.md( - "No time values found for the logic channel after filtering to the configured treatments." - ), + mo.md("No time values found for the logic channel after filtering to the configured treatments."), ) if not intensity_times: mo.stop( True, - mo.md( - "No time values found for the intensity channel after filtering to the configured treatments." - ), + mo.md("No time values found for the intensity channel after filtering to the configured treatments."), ) def _round_times(times): @@ -509,8 +554,13 @@ def _(mo, np, pd, sfxi_cfg, tidy_pd, time_col): ) treatment_col = logic_treatment_col - treatment_order = [sfxi_cfg.treatment_map[_k] for _k in ("00", "10", "01", "11")] - return common_times, treatment_col, treatment_order + sfxi_condition_order = [f"{_corner}: {sfxi_cfg.treatment_map[_corner]}" for _corner in ("00", "10", "01", "11")] + sfxi_condition_map = { + str(sfxi_cfg.treatment_map[_corner]): f"{_corner}: {sfxi_cfg.treatment_map[_corner]}" + for _corner in ("00", "10", "01", "11") + } + return common_times, sfxi_condition_map, sfxi_condition_order, treatment_col + @app.cell(hide_code=True) def _(default_time, design_vals, label_col, mo, sfxi_cfg, time_max, time_min, time_step): @@ -531,6 +581,8 @@ def _(default_time, design_vals, label_col, mo, sfxi_cfg, time_max, time_min, ti stop=time_max, value=default_time, step=time_step, + debounce=True, + show_value=True, label="Target time (h)", full_width=True, ) 
@@ -543,6 +595,7 @@ def _(default_time, design_vals, label_col, mo, sfxi_cfg, time_max, time_min, ti ) return design_select, time_mode, time_slider + @app.cell(hide_code=True) def _(common_times, mo, np, time_mode, time_slider): time_target_h = float(time_slider.value) @@ -586,6 +639,7 @@ def _(common_times, mo, np, time_mode, time_slider): ) return time_target_h, time_selected_h + @app.cell(hide_code=True) def _( cornerize_and_aggregate, @@ -635,6 +689,28 @@ def _( chosen_time = target_time return subset_pd, sel_logic, sel_int, chosen_time + +@app.cell(hide_code=True) +def _(design_select, label_col, mo, subset_pd, summarize_design_context): + _rows = summarize_design_context( + subset_pd, + primary_col=label_col, + primary_value=design_select.value, + preferred_columns=("design_id_alias", "design_id", "id", "sequence", "strain", "medium"), + ) + _label_names = { + "design_id": "Design ID", + "design_id_alias": "Design alias", + "id": "Record ID", + "sequence": "Sequence", + "strain": "Strain", + "medium": "Medium", + } + _lines = [f"**{_label_names.get(_label, _label)}:** `{_value}`" for _label, _value in _rows] + triptych_context = mo.md("### Selected design\n" + " \n".join(_lines)) + return triptych_context + + @app.cell(hide_code=True) def _(mo, time_mode, time_selected_h, time_target_h): delta = abs(float(time_selected_h) - float(time_target_h)) @@ -646,241 +722,87 @@ def _(mo, time_mode, time_selected_h, time_target_h): lines.append(f"Δ from target (mode={time_mode.value}): {delta:.3f} h") mo.md("## Snapshot selection\n" + "\n".join(lines)) + @app.cell(hide_code=True) -def _(np, pd, subset_pd, time_col, treatment_col): - _dfc = subset_pd[subset_pd["channel"] == "OD600"].copy() - if _dfc.empty: - ts_od600 = pd.DataFrame(columns=[time_col, treatment_col, "y_mean", "y_sd", "y_n", "y_lo", "y_hi"]) - else: - _dfc[time_col] = pd.to_numeric(_dfc[time_col], errors="coerce") - _dfc["value"] = pd.to_numeric(_dfc["value"], errors="coerce") - _dfc = 
_dfc.dropna(subset=[time_col, "value", treatment_col]) - ts_od600 = ( - _dfc.groupby([time_col, treatment_col], dropna=False)["value"] - .agg(["mean", "std", "count"]) - .reset_index() +def _(mo, sfxi_condition_map, sfxi_condition_order, subset_pd, treatment_col): + sfxi_triptych_treatment_col = "sfxi_condition" + sfxi_triptych_rows = subset_pd.copy() + _raw_treatment = sfxi_triptych_rows[treatment_col].astype(str) + sfxi_triptych_rows[sfxi_triptych_treatment_col] = _raw_treatment.map(sfxi_condition_map) + sfxi_triptych_rows = sfxi_triptych_rows[ + sfxi_triptych_rows[sfxi_triptych_treatment_col].isin(sfxi_condition_order) + ].copy() + if sfxi_triptych_rows.empty: + mo.stop( + True, + mo.md( + "No rows for the selected design match the configured SFXI 00/10/01/11 treatment map. " + "Check `annotations.logic_maps` and the selected dataset." + ), ) - ts_od600 = ts_od600.rename(columns={"mean": "y_mean", "std": "y_sd", "count": "y_n"}) - ts_od600["y_sd"] = ts_od600["y_sd"].fillna(0.0) - ts_od600["y_lo"] = ts_od600["y_mean"] - ts_od600["y_sd"] - ts_od600["y_hi"] = ts_od600["y_mean"] + ts_od600["y_sd"] - return ts_od600 - -@app.cell(hide_code=True) -def _(np, pd, subset_pd, time_col, treatment_col, sfxi_cfg, time_selected_h): - bar_stats = pd.DataFrame(columns=[treatment_col, "y_mean", "y_sd", "y_n", "y_lo", "y_hi"]) - bar_points = pd.DataFrame(columns=[treatment_col, "value"]) - time_snap = None - - _dfc = subset_pd[subset_pd["channel"] == sfxi_cfg.response.logic_channel].copy() - if not _dfc.empty: - _dfc[time_col] = pd.to_numeric(_dfc[time_col], errors="coerce") - _dfc["value"] = pd.to_numeric(_dfc["value"], errors="coerce") - _dfc = _dfc.dropna(subset=[time_col, "value", treatment_col]) - time_snap = float(time_selected_h) - if np is not None: - _mask = np.isclose(_dfc[time_col], time_snap, atol=1e-9) - else: - _mask = (_dfc[time_col] - time_snap).abs() <= 1e-9 - _df_snap = _dfc[_mask].copy() - if not _df_snap.empty: - bar_stats = ( - _df_snap.groupby(treatment_col, 
dropna=False)["value"] - .agg(["mean", "std", "count"]) - .reset_index() - ) - bar_stats = bar_stats.rename(columns={"mean": "y_mean", "std": "y_sd", "count": "y_n"}) - bar_stats["y_sd"] = bar_stats["y_sd"].fillna(0.0) - bar_stats["y_lo"] = bar_stats["y_mean"] - bar_stats["y_sd"] - bar_stats["y_hi"] = bar_stats["y_mean"] + bar_stats["y_sd"] - bar_points = _df_snap[[treatment_col, "value"]].copy() - - return bar_stats, bar_points, time_snap + return sfxi_triptych_rows, sfxi_triptych_treatment_col + + +@app.cell(hide_code=True) +def _( + build_triptych_data, + mo, + sfxi_cfg, + sfxi_condition_order, + sfxi_triptych_rows, + sfxi_triptych_treatment_col, + time_col, + time_selected_h, +): + try: + triptych_data = build_triptych_data( + sfxi_triptych_rows, + time_col=time_col, + treatment_col=sfxi_triptych_treatment_col, + growth_channel="OD600", + ratio_channel=sfxi_cfg.response.logic_channel, + snapshot_channel=sfxi_cfg.response.logic_channel, + snapshot_time=float(time_selected_h), + treatment_order=sfxi_condition_order, + ) + except Exception as exc: + mo.stop(True, mo.md(f"Dual-reporter triptych build failed: `{exc}`")) + return triptych_data + @app.cell(hide_code=True) def _( alt, altair_err, - bar_points, - bar_stats, + build_dual_reporter_triptych_chart, induction_time_h, mo, pd, - sfxi_cfg, time_col, - time_selected_h, - treatment_col, - treatment_order, - ts_od600, + sfxi_triptych_treatment_col, + triptych_context, + triptych_data, ): if alt is None: mo.stop(True, mo.md(f"Altair is required for plotting: `{altair_err}`")) - if ts_od600 is None or ts_od600.empty: - mo.stop(True, mo.md("No OD600 data available for this design.")) - - _snap_time = float(time_selected_h) - - _ts_tooltips = [ - alt.Tooltip(f"{time_col}:Q", title="Time (h)"), - alt.Tooltip(f"{treatment_col}:N", title="Treatment"), - alt.Tooltip("y_mean:Q", title="Mean"), - alt.Tooltip("y_sd:Q", title="SD"), - alt.Tooltip("y_n:Q", title="N"), - ] - _ts_width = 320 - _ts_height = 320 - _bar_width = 420 
- _bar_height = 320 - _chart_spacing = 28 - - _ts_base = alt.Chart(ts_od600).encode( - x=alt.X( - f"{time_col}:Q", - axis=alt.Axis(labelOverlap=False), - ), - color=alt.Color( - f"{treatment_col}:N", - sort=treatment_order, - scale=alt.Scale(domain=treatment_order), - legend=alt.Legend(orient="bottom", title="Treatment"), - ), - ) - - _ts_band = _ts_base.mark_area(opacity=0.2).encode( - y=alt.Y("y_lo:Q", title="OD600"), - y2=alt.Y2("y_hi:Q"), - tooltip=_ts_tooltips, - ) - _ts_line = _ts_base.mark_line().encode( - y=alt.Y("y_mean:Q", title="OD600"), - tooltip=_ts_tooltips, - ) - - _y_max = ts_od600["y_hi"].max() - if pd.isna(_y_max): - _y_max = ts_od600["y_mean"].max() - if pd.isna(_y_max): - _y_max = 0.0 - - _rule_df = pd.DataFrame( - { - time_col: [_snap_time], - "y": [float(_y_max)], - "label": [f"t = {_snap_time:.3f} h"], - } - ) - _ts_rule = alt.Chart(_rule_df).mark_rule(color="black").encode(x=alt.X(f"{time_col}:Q")) - _ts_text = alt.Chart(_rule_df).mark_text(color="black", align="left", dx=6, dy=-6).encode( - x=alt.X(f"{time_col}:Q"), - y=alt.Y("y:Q"), - text="label", - ) - - _induction_time = None - if induction_time_h is not None: - try: - _val = float(induction_time_h) - if not pd.isna(_val): - _induction_time = _val - except Exception: - _induction_time = None - - _ts_layers = [_ts_band, _ts_line] - if _induction_time is not None: - _ind_df = pd.DataFrame({time_col: [_induction_time]}) - _ts_induction = alt.Chart(_ind_df).mark_rule(color="red", strokeDash=[6, 4]).encode( - x=alt.X(f"{time_col}:Q") - ) - _ts_layers.append(_ts_induction) - _ts_layers.extend([_ts_rule, _ts_text]) - - ts_chart = alt.layer(*_ts_layers).properties( - width=_ts_width, - height=_ts_height, - ) - - if bar_stats is None or bar_stats.empty: - mo.stop(True, mo.md("No snapshot data available at this time.")) - - _bar_axis = alt.Axis(labelLimit=0, labelOverlap=False, labelAngle=-45) - _bar_title = f"{sfxi_cfg.response.logic_channel} snapshot (mean)" - _bar_base = 
alt.Chart(bar_stats).encode( - x=alt.X( - f"{treatment_col}:N", - sort=treatment_order, - axis=_bar_axis, - ), - y=alt.Y("y_mean:Q", title=_bar_title), - tooltip=[ - alt.Tooltip(f"{treatment_col}:N", title="Treatment"), - alt.Tooltip("y_mean:Q", title="Mean"), - alt.Tooltip("y_sd:Q", title="SD"), - alt.Tooltip("y_n:Q", title="N"), - ], - ) - - _bar_bars = _bar_base.mark_bar().encode( - color=alt.Color( - f"{treatment_col}:N", - sort=treatment_order, - scale=alt.Scale(domain=treatment_order), - legend=None, - ) - ) - _bar_err_rule = _bar_base.mark_rule(color="black").encode( - y=alt.Y("y_lo:Q"), - y2=alt.Y2("y_hi:Q"), - ) - _bar_err_low = _bar_base.mark_tick(color="black", orient="horizontal", size=8, thickness=1.5).encode( - y=alt.Y("y_lo:Q"), - ) - _bar_err_high = _bar_base.mark_tick(color="black", orient="horizontal", size=8, thickness=1.5).encode( - y=alt.Y("y_hi:Q"), - ) - - _bar_layers = [_bar_bars, _bar_err_rule, _bar_err_low, _bar_err_high] - if bar_points is not None and not bar_points.empty: - _bar_points = alt.Chart(bar_points).mark_point(filled=True, strokeWidth=0, size=50).encode( - x=alt.X(f"{treatment_col}:N", sort=treatment_order, axis=_bar_axis), - y=alt.Y("value:Q"), - tooltip=[ - alt.Tooltip(f"{treatment_col}:N", title="Treatment"), - alt.Tooltip("value:Q", title="Value"), - ], + try: + _chart = build_dual_reporter_triptych_chart( + alt=alt, + pd_module=pd, + data=triptych_data, + time_col=time_col, + treatment_col=sfxi_triptych_treatment_col, + induction_time_h=induction_time_h, ) - _bar_layers.append(_bar_points) - - bar_chart = alt.layer(*_bar_layers).properties( - width=_bar_width, - height=_bar_height, + except Exception as exc: + mo.stop(True, mo.md(f"Dual-reporter triptych render failed: `{exc}`")) + _chart_view = mo.ui.altair_chart(_chart, chart_selection=False, legend_selection=False) + _chart_panel = mo.vstack([triptych_context, _chart_view], gap=0.35).style( + {"min-height": "520px", "width": "100%", "max-width": "100%"} ) + 
mo.output.replace(_chart_panel) - chart = ( - alt.hconcat(ts_chart, bar_chart, spacing=_chart_spacing) - .resolve_scale(color="shared") - .configure(background="white") - .configure_view(fill="white") - .configure_axis( - domain=True, - domainColor="black", - domainWidth=1, - tickColor="black", - labelColor="black", - titleColor="black", - labelFontSize=13, - titleFontSize=14, - ) - .configure_legend( - labelColor="black", - titleColor="black", - labelFontSize=13, - titleFontSize=13, - ) - .configure_title(color="black", fontSize=15) - .configure_text(color="black", fontSize=13) - ) - mo.ui.altair_chart(chart) @app.cell(hide_code=True) def _(build_vec8_from_tidy, mo, sfxi_step_cfg, time_selected_h, tidy_pd): @@ -893,6 +815,7 @@ def _(build_vec8_from_tidy, mo, sfxi_step_cfg, time_selected_h, tidy_pd): mo.stop(True, mo.md(f"8-vector computation failed: `{exc}`")) return vec8_result + @app.cell(hide_code=True) def _(mo, vec8_result): mo.vstack( @@ -902,6 +825,7 @@ def _(mo, vec8_result): ] ) + @app.cell(hide_code=True) def _(mo, vec8_result): _ref = vec8_result.log.get("reference", {}) if hasattr(vec8_result, "log") else {} @@ -912,6 +836,7 @@ def _(mo, vec8_result): ] mo.md("## Reference anchor\n" + "\n".join(_lines)) + @app.cell(hide_code=True) def _(Path, exports_dir, mo, sfxi_cfg): export_dir = exports_dir / sfxi_cfg.output_subdir @@ -929,6 +854,7 @@ def _(Path, exports_dir, mo, sfxi_cfg): ) return export_button, export_path, log_name + @app.cell(hide_code=True) def _(export_button, export_path, json, log_name, mo, vec8_result): if not export_button.value: @@ -950,5 +876,6 @@ def _(export_button, export_path, json, log_name, mo, vec8_result): json.dump(vec8_result.log, fh, indent=2, sort_keys=True, default=str) mo.md(f"Exported 8-vector to `{export_path}` and log to `{log_path}`.") + if __name__ == "__main__": app.run() diff --git a/tools/audit_local_experiments.py b/tools/audit_local_experiments.py new file mode 100644 index 0000000..f51a6ed --- /dev/null +++ 
b/tools/audit_local_experiments.py @@ -0,0 +1,275 @@ +from __future__ import annotations + +import argparse +import io +import json +import shutil +import tempfile +import time +from dataclasses import asdict, dataclass +from pathlib import Path + +from rich.console import Console + +from reader.contracts import builtin_contract_catalog +from reader.runtime import builtin_runtime +from reader.workbench.decl import load_workbench_decl +from reader.workbench.engine import run_spec +from reader.workbench.engine.validation import validation_summary +from reader.workbench.experiments import discover_experiment_configs +from reader.workbench.graph import resolve_workbench +from reader.workbench.records import RecordStore + + +@dataclass(slots=True) +class AuditResult: + config: str + name: str + lifecycle: str + status: str + phase: str + seconds: float + detail: str | None + expected_plots: int + expected_exports: int + + +def parse_args() -> argparse.Namespace: + parser = argparse.ArgumentParser(description="Audit local reader experiments on staged temp copies.") + parser.add_argument("--root", default="experiments", help="Experiments root (default: experiments)") + parser.add_argument( + "--years", + nargs="+", + help="Specific year directories to audit. 
Defaults to every numeric year directory under --root.", + ) + parser.add_argument( + "--include-non-active", + action="store_true", + help="Include non-active lifecycles instead of reporting them as skipped", + ) + parser.add_argument("--format", choices=("text", "json"), default="text", help="Output format") + parser.add_argument("--report-path", help="Optional path to write a JSON report") + parser.add_argument("--fail-fast", action="store_true", help="Stop on the first failure") + return parser.parse_args() + + +def discover_year_dirs(root: Path) -> list[str]: + return sorted(path.name for path in root.iterdir() if path.is_dir() and path.name.isdigit() and len(path.name) == 4) + + +def discover_configs(root: Path, years: set[str]) -> list[Path]: + configs = discover_experiment_configs(root, include_scaffolds=False) + selected: list[Path] = [] + for config in configs: + try: + rel = config.relative_to(root) + except ValueError: + continue + if not rel.parts: + continue + if years and rel.parts[0] not in years: + continue + selected.append(config) + return sorted(selected) + + +def stage_experiment(source_dir: Path, target_dir: Path) -> Path: + shutil.copytree( + source_dir, + target_dir, + ignore=shutil.ignore_patterns("outputs", "__pycache__", ".DS_Store"), + ) + return target_dir / "config.yaml" + + +def verify_outputs(decl, runtime) -> tuple[int, int, str | None]: + workbench = resolve_workbench(decl) + layout = decl.experiment_semantics.layout + outputs = layout.outputs_dir + store = RecordStore( + outputs, + contracts=builtin_contract_catalog(), + plots_subdir=layout.plots_subdir, + exports_subdir=layout.exports_subdir, + create=False, + ) + latest_ids = {record.record_id for record in store.iter_latest_records()} + expected_plot_ids = {f"plot:{plot.id}" for plot in workbench.plots} + expected_export_ids = {f"export:{export.id}" for export in workbench.exports} + + if (outputs / "manifests" / "records.json").exists() is False: + return 
len(expected_plot_ids), len(expected_export_ids), "records.json missing" + if "ingest/df" not in latest_ids: + return len(expected_plot_ids), len(expected_export_ids), "ingest/df record missing" + + missing_plot_ids = sorted(expected_plot_ids - latest_ids) + if missing_plot_ids: + return len(expected_plot_ids), len(expected_export_ids), f"missing plot records: {missing_plot_ids}" + missing_export_ids = sorted(expected_export_ids - latest_ids) + if missing_export_ids: + return len(expected_plot_ids), len(expected_export_ids), f"missing export records: {missing_export_ids}" + + plots_dir = outputs / layout.plots_subdir + exports_dir = outputs / layout.exports_subdir + if expected_plot_ids and not any(path.is_file() for path in plots_dir.rglob("*")): + return len(expected_plot_ids), len(expected_export_ids), "expected plot files were not created" + if expected_export_ids and not any(path.is_file() for path in exports_dir.rglob("*")): + return len(expected_plot_ids), len(expected_export_ids), "expected export files were not created" + return len(expected_plot_ids), len(expected_export_ids), None + + +def audit_config(config_path: Path, *, include_non_active: bool, runtime) -> AuditResult: + start = time.perf_counter() + decl = load_workbench_decl(config_path, protocols=runtime.protocols) + lifecycle = decl.experiment.lifecycle + rel_config = str(config_path) + if lifecycle != "active" and not include_non_active: + return AuditResult( + config=rel_config, + name=config_path.parent.name, + lifecycle=lifecycle, + status="skipped", + phase="lifecycle", + seconds=time.perf_counter() - start, + detail=f"skipped non-active lifecycle: {lifecycle}", + expected_plots=0, + expected_exports=0, + ) + + with tempfile.TemporaryDirectory(prefix="reader-local-audit-") as tmpdir: + staged_config = stage_experiment(config_path.parent, Path(tmpdir) / config_path.parent.name) + staged_decl = load_workbench_decl(staged_config, protocols=runtime.protocols) + summary = validation_summary( + 
staged_decl, check_files=True, exp_root=staged_decl.experiment.root, runtime=runtime + ) + if summary["status"] != "ok": + return AuditResult( + config=rel_config, + name=config_path.parent.name, + lifecycle=lifecycle, + status="failed", + phase="validate", + seconds=time.perf_counter() - start, + detail="; ".join(summary["errors"]) or "validation failed", + expected_plots=0, + expected_exports=0, + ) + + try: + run_spec( + staged_decl, + include_pipeline=True, + include_plots=True, + include_exports=True, + runtime=runtime, + console=Console(file=io.StringIO(), force_terminal=False, color_system=None), + log_level="ERROR", + verbose=False, + ) + except Exception as exc: # pragma: no cover - exercised by live audit runs + return AuditResult( + config=rel_config, + name=config_path.parent.name, + lifecycle=lifecycle, + status="failed", + phase="run", + seconds=time.perf_counter() - start, + detail=str(exc), + expected_plots=0, + expected_exports=0, + ) + + expected_plots, expected_exports, verify_error = verify_outputs(staged_decl, runtime) + if verify_error is not None: + return AuditResult( + config=rel_config, + name=config_path.parent.name, + lifecycle=lifecycle, + status="failed", + phase="verify", + seconds=time.perf_counter() - start, + detail=verify_error, + expected_plots=expected_plots, + expected_exports=expected_exports, + ) + + return AuditResult( + config=rel_config, + name=config_path.parent.name, + lifecycle=lifecycle, + status="passed", + phase="complete", + seconds=time.perf_counter() - start, + detail=None, + expected_plots=expected_plots, + expected_exports=expected_exports, + ) + + +def render_text(results: list[AuditResult]) -> str: + lines = [] + for result in results: + detail = f" :: {result.detail}" if result.detail else "" + lines.append( + f"[{result.status.upper():7}] {result.name} " + f"(lifecycle={result.lifecycle}, phase={result.phase}, " + f"plots={result.expected_plots}, exports={result.expected_exports}, " + 
f"seconds={result.seconds:.2f}){detail}" + ) + counts: dict[str, int] = {} + for result in results: + counts[result.status] = counts.get(result.status, 0) + 1 + summary = ", ".join(f"{key}={counts[key]}" for key in sorted(counts)) + lines.append(f"Summary: {summary}") + return "\n".join(lines) + + +def main() -> int: + args = parse_args() + root = Path(args.root).resolve() + if not root.exists() or not root.is_dir(): + raise SystemExit(f"Experiments root not found: {root}") + years = discover_year_dirs(root) if args.years is None else [str(year) for year in args.years] + if not years: + raise SystemExit(f"No numeric year directories found under {root}. Use --years to select specific directories.") + runtime = builtin_runtime() + configs = discover_configs(root, set(years)) + + results: list[AuditResult] = [] + for config_path in configs: + result = audit_config(config_path, include_non_active=args.include_non_active, runtime=runtime) + results.append(result) + if args.format == "text": + detail = f" :: {result.detail}" if result.detail else "" + print( + f"[{result.status.upper():7}] {result.name} " + f"(phase={result.phase}, lifecycle={result.lifecycle}){detail}", + flush=True, + ) + if args.fail_fast and result.status == "failed": + break + + payload = { + "root": str(root), + "years": sorted(years), + "summary": { + "experiments": len(results), + "passed": sum(1 for item in results if item.status == "passed"), + "failed": sum(1 for item in results if item.status == "failed"), + "skipped": sum(1 for item in results if item.status == "skipped"), + }, + "results": [asdict(result) for result in results], + } + if args.report_path: + Path(args.report_path).write_text(json.dumps(payload, indent=2), encoding="utf-8") + + if args.format == "json": + print(json.dumps(payload, indent=2), flush=True) + else: + print(render_text(results), flush=True) + + return 1 if payload["summary"]["failed"] else 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git 
a/tools/audit_repo_skills.py b/tools/audit_repo_skills.py new file mode 100644 index 0000000..fab3a70 --- /dev/null +++ b/tools/audit_repo_skills.py @@ -0,0 +1,121 @@ +from __future__ import annotations + +import re +import sys +from pathlib import Path + +REPO_ROOT = Path(__file__).resolve().parents[1] +SKILLS_DIR = REPO_ROOT / "skills" +FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL) +SOURCE_ROW_RE = re.compile(r"^\| https?://[^|]+ \| \d{4}-\d{2}-\d{2} \| [^|]+ \|$", re.MULTILINE) + + +def iter_skill_dirs() -> list[Path]: + return sorted(path for path in SKILLS_DIR.iterdir() if path.is_dir() and not path.name.startswith(".")) + + +def read_text(path: Path) -> str: + return path.read_text(encoding="utf-8") + + +def frontmatter_block(text: str, skill_path: Path) -> str: + match = FRONTMATTER_RE.match(text) + if match is None: + raise ValueError(f"{skill_path.relative_to(REPO_ROOT)}: missing frontmatter") + return match.group(1) + + +def require_in_block(block: str, needle: str, skill_path: Path, label: str) -> list[str]: + if needle not in block: + return [f"{skill_path.relative_to(REPO_ROOT)}: missing {label}"] + return [] + + +def audit_skill_dir(skill_dir: Path) -> list[str]: + errors: list[str] = [] + skill_path = skill_dir / "SKILL.md" + if not skill_path.exists(): + return [f"{skill_dir.relative_to(REPO_ROOT)}: missing SKILL.md"] + + text = read_text(skill_path) + try: + frontmatter = frontmatter_block(text, skill_path) + except ValueError as exc: + return [str(exc)] + + errors.extend( + require_in_block( + frontmatter, + f"name: {skill_dir.name}", + skill_path, + "frontmatter name matching folder", + ) + ) + errors.extend(require_in_block(frontmatter, "description:", skill_path, "frontmatter description")) + errors.extend(require_in_block(frontmatter, "metadata:", skill_path, "metadata block")) + errors.extend(require_in_block(frontmatter, "version:", skill_path, "metadata.version")) + errors.extend(require_in_block(frontmatter, "category:", 
skill_path, "metadata.category")) + errors.extend(require_in_block(frontmatter, "tags:", skill_path, "metadata.tags")) + + if "Use when" not in frontmatter or "Do not use" not in frontmatter: + errors.append( + f"{skill_path.relative_to(REPO_ROOT)}: frontmatter description must include " + "'Use when' and 'Do not use' routing boundaries" + ) + + required_sections = [ + "## Purpose", + "## Scope", + "## Required Deliverables", + "## Output Contract", + "## Trigger Tests", + ] + for section in required_sections: + if section not in text: + errors.append(f"{skill_path.relative_to(REPO_ROOT)}: missing section {section}") + + external_sources_path = skill_dir / "references" / "external-sources.md" + if not external_sources_path.exists(): + errors.append(f"{skill_dir.relative_to(REPO_ROOT)}: missing references/external-sources.md") + elif "./references/external-sources.md" not in text: + errors.append( + f"{skill_path.relative_to(REPO_ROOT)}: top-level skill does not expose references/external-sources.md" + ) + else: + external_sources = read_text(external_sources_path) + if "| URL | Retrieved | Mapped update |" not in external_sources: + errors.append( + f"{external_sources_path.relative_to(REPO_ROOT)}: missing source table header " + "'| URL | Retrieved | Mapped update |'" + ) + if SOURCE_ROW_RE.search(external_sources) is None: + errors.append( + f"{external_sources_path.relative_to(REPO_ROOT)}: missing at least one source row " + "with URL, YYYY-MM-DD retrieved date, and mapped update" + ) + + return errors + + +def main() -> int: + if not SKILLS_DIR.exists(): + print("skills directory missing", file=sys.stderr) + return 1 + + errors: list[str] = [] + skill_dirs = iter_skill_dirs() + for skill_dir in skill_dirs: + errors.extend(audit_skill_dir(skill_dir)) + + if errors: + print("repo skill audit failed", file=sys.stderr) + for error in errors: + print(f"- {error}", file=sys.stderr) + return 1 + + print(f"repo skill audit ok: {len(skill_dirs)} skills") + return 0 + + 
+if __name__ == "__main__": + raise SystemExit(main()) diff --git a/tools/check_docs.py b/tools/check_docs.py index ddba5dc..44bce65 100644 --- a/tools/check_docs.py +++ b/tools/check_docs.py @@ -21,6 +21,7 @@ "docs/guides/getting_started.md", "docs/guides/preflight_run_verify.md", "docs/guides/automation.md", + "docs/guides/data_operations_plan.md", "docs/core/cli.md", "docs/core/pipeline.md", "docs/repo-maintenance.md", @@ -30,6 +31,8 @@ "guides/common_routes.md", "guides/preflight_run_verify.md", "guides/automation.md", + "guides/data_operations_plan.md", + "guides/experiment_bootstrap.md", "guides/demo.md", "core/cli.md", "core/pipeline.md", @@ -44,9 +47,53 @@ "guides/common_routes.md", "guides/preflight_run_verify.md", "guides/automation.md", + "guides/data_operations_plan.md", "core/cli.md", "core/pipeline.md", }, + "docs/guides/experiment_bootstrap.md": { + "./data_operations_plan.md", + "./data_operations_plan/data_classes.md", + }, + "docs/guides/data_operations_plan.md": { + "../../src/reader/workbench/dop/", + "../../skills/reader-data-operations-plan/SKILL.md", + "./data_operations_plan/operating_model.md", + "./data_operations_plan/data_classes.md", + "./data_operations_plan/metadata_minimums.md", + "./data_operations_plan/transfer_and_verification.md", + "./experiment_bootstrap.md", + "./preflight_run_verify.md", + }, + "skills/reader-data-operations-plan/SKILL.md": { + "../../docs/guides/data_operations_plan.md", + "../../docs/guides/data_operations_plan/operating_model.md", + "../../docs/guides/experiment_bootstrap.md", + "./references/endpoint-contracts.md", + "./references/external-sources.md", + "./references/test-matrix.md", + "./references/workflow.md", + }, + "docs/core/spec.md": { + "./pipeline.md", + "../../src/reader/protocols/", + "../../src/reader/workbench/dop/", + "../../src/reader/workbench/experiment/", + "../../src/reader/workbench/engine/", + "../../src/reader/plugins/", + "../../src/reader/contracts/", + "../repo-maintenance.md", + 
"../../QUALITY.md", + "../../RELIABILITY.md", + }, + "docs/core/plugins.md": { + "./pipeline.md", + "./spec.md", + "../../ARCHITECTURE.md", + "../../src/reader/plugins/", + "../../src/reader/workbench/assets/plugin_manifest.py", + "../../src/reader/protocols/compiler.py", + }, }
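The new entries appear to extend a map from each maintainer document to the relative link targets it is expected to contain. With this change, the docs integrity check can catch, for example, `docs/guides/data_operations_plan.md` no longer pointing at `../../src/reader/workbench/dop/`. This hunk does not show how `tools/check_docs.py` performs the comparison. Here is a minimal sketch of one way to do it, assuming required targets appear as inline markdown `[text](target)` links and that `#fragment` suffixes should be ignored. `missing_links` is a hypothetical helper, not the script's actual API.

```python
import re

# Inline markdown links: [text](target). Reference-style links are not handled here.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_links(markdown: str, required: set[str]) -> list[str]:
    """Return the required link targets that the document does not link to, sorted."""
    # Strip any "#section" fragment so "./x.md#scope" satisfies a requirement for "./x.md".
    found = {target.split("#", 1)[0] for target in LINK_RE.findall(markdown)}
    return sorted(set(required) - found)


if __name__ == "__main__":
    doc = "See [the plan](./data_operations_plan.md#scope) and [bootstrap](./experiment_bootstrap.md)."
    required = {
        "./data_operations_plan.md",
        "./data_operations_plan/data_classes.md",
        "./experiment_bootstrap.md",
    }
    print(missing_links(doc, required))
```

Exact string matching keeps the check strict. A target such as `../../src/reader/protocols/compiler.py` only counts if the document links to that exact relative path, which is what makes the rule useful against documentation drift.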