Decision Space Harness is a configuration-driven evaluation harness for measuring how LLM-agent pipelines preserve or compress disagreement, frame diversity, and option breadth during evidence synthesis.
The repo currently supports:
- 3 registered task families: `conflict_preservation`, `frame_diversity`, `option_space`
- 4 built-in agents: `baseline_direct`, `retrieve_then_synthesize`, `option_generation`, `structured_conflict_preserving`
- heuristic and Ollama-backed model providers
- named evidence providers including `top1`, `top2`, `top5`, `top10`, and `precomputed_top2`
- `benchmark_single_run/v1` and `perturbation_group/v1` study protocols
- structural metrics for conflict retention, frame preservation, option breadth, path dependence, and lexical overlap
- append-only `attempts.jsonl` telemetry with derived `runs.jsonl`, `metrics.jsonl`, `steps.jsonl`, and optional `message_records.jsonl`
- experiment report artifacts, repro bundles, and fidelity traces
- rerun support for failed cells without deleting prior attempt history
- 40 conflict-preservation benchmark tasks plus a frame-diversity smoke fixture
Run a sample experiment:

    cd /home/etalbert102/decision_space_harness
    source .venv/bin/activate
    PYTHONPATH=src python -m decision_space_harness.experiments.runner experiments/configs/conflict_smoothing_v1.yaml

Rerun only cells whose current selected run is failed:

    PYTHONPATH=src python -m decision_space_harness.experiments.runner \
      --rerun-failed-only \
      experiments/configs/conflict_smoothing_v1.yaml

Run the frame-diversity smoke experiment:

    PYTHONPATH=src python -m decision_space_harness.experiments.runner experiments/configs/frame_diversity_smoke_v1.yaml

Run tests:

    source .venv/bin/activate
    PYTHONPATH=src pytest -q

Run fidelity assessment:

    source .venv/bin/activate
    PYTHONPATH=src python -m decision_space_harness.fidelity.assessment experiments/configs/conflict_smoothing_v1.yaml

Each experiment writes into `outputs/experiments/<experiment_id>/`:
- `attempts.jsonl`: canonical append-only attempt history
- `runs.jsonl`: deterministic selected-run view
- `metrics.jsonl`: scored metric rows
- `steps.jsonl`: boundary and diagnostic step records
- `message_records.jsonl`: optional intra-attempt communication trace
- `summary.json`: experiment summary
- `report.md`: markdown report
- `figures/`: text and HTML metric summaries
- `repro_bundle/`: config, manifests, environment, and supporting reproducibility artifacts
- `fidelity_trace.json`: fidelity-framework event trace
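
The relationship between the append-only `attempts.jsonl` and the derived `runs.jsonl` view can be illustrated with a small sketch. The field names (`cell_id`, `attempt_index`, `status`) and the "latest attempt per cell wins" rule are assumptions made for illustration only; the harness's actual record schema and selection policy may differ.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def select_runs(attempts):
    """Pick one attempt per cell: the one with the highest attempt_index.

    ASSUMPTION: field names and the selection rule are hypothetical,
    not the harness's documented behavior.
    """
    selected = {}
    for attempt in attempts:
        cell = attempt["cell_id"]
        current = selected.get(cell)
        if current is None or attempt["attempt_index"] > current["attempt_index"]:
            selected[cell] = attempt
    # Sort by cell id so the derived view is the same on every run.
    return [selected[cell] for cell in sorted(selected)]


def failed_cells(selected):
    """Return the cell ids whose selected run has status 'failed'."""
    return [run["cell_id"] for run in selected if run["status"] == "failed"]


# Hypothetical attempt history: task_a failed once and was then rerun.
attempts = [
    {"cell_id": "task_a/baseline_direct", "attempt_index": 0, "status": "failed"},
    {"cell_id": "task_b/baseline_direct", "attempt_index": 0, "status": "failed"},
    {"cell_id": "task_a/baseline_direct", "attempt_index": 1, "status": "ok"},
]
runs = select_runs(attempts)
print(failed_cells(runs))  # ['task_b/baseline_direct']
```

Under these assumptions, the earlier failed attempt for `task_a` stays in the attempt history but no longer counts as the selected run, so only `task_b` would be a candidate for a failed-only rerun. A real file could be fed in with `select_runs(read_jsonl("outputs/experiments/<experiment_id>/attempts.jsonl"))`.
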

Included experiment configs:

- `experiments/configs/conflict_smoothing_v1.yaml`
- `experiments/configs/conflict_full_suite_v1.yaml`
- `experiments/configs/conflict_phase3_analysis_v1.yaml`
- `experiments/configs/conflict_confirmatory_demo_v1.yaml`
- `experiments/configs/conflict_ollama_demo_v1.yaml`
- `experiments/configs/frame_diversity_smoke_v1.yaml`
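
To run several of these configs in sequence, the runner invocation shown earlier can be wrapped in a short script. This is a minimal sketch, not part of the harness: the helper names `runner_command` and `run_config` are hypothetical, it assumes the script is started from the repo root with the virtual environment active, and it assumes the runner's exit code is meaningful, which the README does not state.

```python
import os
import subprocess
import sys

# Module path and flag are taken from the commands in this README.
RUNNER_MODULE = "decision_space_harness.experiments.runner"


def runner_command(config_path, rerun_failed_only=False):
    """Build the argument list for one runner invocation."""
    command = [sys.executable, "-m", RUNNER_MODULE]
    if rerun_failed_only:
        command.append("--rerun-failed-only")
    command.append(config_path)
    return command


def run_config(config_path, rerun_failed_only=False):
    """Run one config with PYTHONPATH=src and return the exit code."""
    env = dict(os.environ, PYTHONPATH="src")
    result = subprocess.run(
        runner_command(config_path, rerun_failed_only),
        env=env,
        check=False,  # report the exit code instead of raising
    )
    return result.returncode


CONFIGS = [
    "experiments/configs/conflict_smoothing_v1.yaml",
    "experiments/configs/frame_diversity_smoke_v1.yaml",
]

# Print the commands only; call run_config(config) to execute them.
for config in CONFIGS:
    print(" ".join(runner_command(config)))
```

Passing `rerun_failed_only=True` adds the `--rerun-failed-only` flag, which matches the failed-cell rerun command above.
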