Releases: MaverickHQ/executable-world-models

v0.8.5.1 — Evidence Policy Feedback Loop

14 Mar 21:26

This release completes the deterministic policy feedback loop for Executable World Models.

What changed:

  • Added evidence-policy feedback layer
  • Experiments can now influence future trading decisions
  • Added policy module, policy builder, and policy feedback demo
  • Added tests for evidence-policy and feedback loop
  • Fixed deployed health version alignment

Why it matters:
The architecture now closes the loop: environment interaction produces trajectories, which feed evaluation, experiments, and evidence datasets; these drive policy updates that shape future decisions.

This is deterministic policy feedback, not reinforcement learning.
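
To illustrate what "deterministic policy feedback" can mean in practice, here is a minimal sketch. It is not the project's policy builder: the function name `build_policy`, the evidence keys `action` and `reward`, and the non-negative-mean rule are all assumptions made for illustration. The point is only that the same evidence always yields the same policy, with no sampling or gradient updates involved.

```python
from collections import defaultdict


def build_policy(evidence):
    """Derive allow/deny flags per action from experiment evidence.

    `evidence` is a list of dicts with hypothetical keys "action" and
    "reward". No randomness is involved, so identical evidence always
    produces an identical policy.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in evidence:
        totals[row["action"]] += row["reward"]
        counts[row["action"]] += 1
    # Assumed rule: allow an action only if its mean reward is non-negative.
    return {a: totals[a] / counts[a] >= 0 for a in sorted(counts)}


evidence = [
    {"action": "buy", "reward": 1.0},
    {"action": "buy", "reward": 0.5},
    {"action": "sell", "reward": -2.0},
]
policy = build_policy(evidence)
```

Rebuilding the policy from the same evidence dataset reproduces it exactly, which is what distinguishes this style of feedback from a learned, stochastic update.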

v0.8.3 – Structural Evaluation Layer

23 Feb 10:23

0.8.3 introduces the first formal evaluation layer for Executable World Models.

This release adds deterministic, schema-aware structural evaluation for both single runs and experiments, built on top of the canonical v2 manifest schema.

Run-Level Structural Evaluation
Evaluate a single run’s artifacts:

ewm run evaluate --artifacts-dir <path>
ewm run evaluate --artifacts-dir <root> --run-id <id>

Produces:
• Deterministic evaluation.json
• Integrity validation (manifest v2 enforcement)
• Constraint checks (runtime_budgets_max_steps, policy_limits)
• Structural metrics (steps_executed, truncated_by_budget)

Key properties:
• Deterministic output (no timestamps)
• Stable error codes (manifest_missing, run_id_mismatch, etc.)
• Writes evaluation output even on failure
• No AWS or network dependencies
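
The properties above can be sketched as follows. This is an illustrative approximation, not the project's evaluator: the function names, the shape of the output dict, and the reading of truncated_by_budget as "the run reached the step budget" are assumptions. Only the field names, the manifest_missing code, and the use of sort_keys=True come from these notes.

```python
import json


def evaluate_run(manifest, max_steps):
    """Return a structural evaluation dict for one run (hypothetical shape)."""
    errors = []
    if manifest is None:
        # A result is still produced on failure, using a stable error code.
        errors.append("manifest_missing")
        steps = 0
    else:
        steps = manifest.get("steps_executed", 0)
    return {
        "integrity_errors": errors,
        "runtime_budgets_max_steps": max_steps,
        "steps_executed": steps,
        # Assumption: a run counts as truncated when it reached the budget.
        "truncated_by_budget": steps >= max_steps,
    }


def to_deterministic_json(evaluation):
    # sort_keys=True fixes key order, and no timestamp is added,
    # so identical inputs serialize to identical text.
    return json.dumps(evaluation, sort_keys=True, indent=2)


ok = to_deterministic_json(evaluate_run({"steps_executed": 3}, max_steps=5))
failed = evaluate_run(None, max_steps=5)
```

Because the output contains nothing time-dependent, two evaluations of the same artifacts can be compared with a plain diff.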

Experiment-Level Structural Aggregation
Aggregate metrics across multiple runs:

ewm experiment evaluate --experiment-dir <path>

Produces:
• evaluation_summary.json
• evaluation_runs.csv

Metrics include:
• total_runs
• avg_steps_executed
• pct_truncated_by_budget
• integrity summaries
• per-run structural results
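
As a rough sketch of how the aggregate metrics could be derived from per-run results: the function name `summarize` and the choice to express pct_truncated_by_budget on a 0 to 100 scale are assumptions, and the actual evaluation_summary.json may differ.

```python
def summarize(runs):
    """Aggregate per-run structural results into experiment-level metrics."""
    total = len(runs)
    if total == 0:
        # Avoid dividing by zero for an empty experiment directory.
        return {
            "total_runs": 0,
            "avg_steps_executed": 0.0,
            "pct_truncated_by_budget": 0.0,
        }
    truncated = sum(1 for r in runs if r["truncated_by_budget"])
    return {
        "total_runs": total,
        "avg_steps_executed": sum(r["steps_executed"] for r in runs) / total,
        # Assumption: percentage on a 0-100 scale rather than a 0-1 fraction.
        "pct_truncated_by_budget": 100.0 * truncated / total,
    }


runs = [
    {"steps_executed": 4, "truncated_by_budget": False},
    {"steps_executed": 10, "truncated_by_budget": True},
]
summary = summarize(runs)
```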

• Manifest v2 canonical schema enforced
• Deterministic JSON (sort_keys=True)
• No optional dependency coupling
• No runtime/AWS imports in evaluation layer
• 231 unit tests passing
• Full AWS integration suite passing
• Observability validation passing

🧪 CLI Examples

Single run:

ewm run evaluate --artifacts-dir tmp/artifacts --run-id abc-123

Experiment:

ewm experiment evaluate --experiment-dir tmp/experiment_001

📦 Technical Notes
• runtime_budgets_max_steps is canonical (runtime_budget_max_steps retained for backward compatibility)
• integrity_errors now use stable error codes
• Evaluation writes output even if the manifest is invalid
• correlation_id remains canonical; trace_id retained for backward compatibility
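
One way to read a canonical field while still accepting its legacy name is sketched below. The helper `read_with_fallback` is hypothetical and not part of the ewm codebase; only the field names come from the notes above.

```python
def read_with_fallback(record, canonical, legacy, default=None):
    """Prefer the canonical key; fall back to the legacy key if it is absent."""
    if canonical in record:
        return record[canonical]
    return record.get(legacy, default)


# Hypothetical manifests written before and after the rename.
old_manifest = {"runtime_budget_max_steps": 5, "trace_id": "t-1"}
new_manifest = {
    "runtime_budgets_max_steps": 8,
    "correlation_id": "c-1",
    "trace_id": "t-1",
}

old_budget = read_with_fallback(
    old_manifest, "runtime_budgets_max_steps", "runtime_budget_max_steps"
)
new_corr = read_with_fallback(new_manifest, "correlation_id", "trace_id")
```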

v0.8.2.3

22 Feb 18:35

Stability & Test Baseline Release
This patch release improves CLI robustness and establishes a clean-green test baseline across local and AWS environments.

Improvements

  • Lazy CLI import for experiment command: Optional dependencies are no longer required for non-experiment commands (mode, env, target, cost, runs, etc.).
  • Removed eager certifi import: The experiment module now loads optional HTTPS dependencies only when needed (AWS target), eliminating unnecessary startup coupling.
  • Subprocess test reliability: CLI subprocess tests now use sys.executable, ensuring consistent interpreter usage across environments.
  • Improved AWS integration test portability: Integration tests resolve the artifacts bucket via CloudFormation outputs when ARTIFACT_BUCKET is not set.
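
The sys.executable change can be illustrated with a short sketch. The helper name `run_python` is an assumption, and the child command here is a trivial stand-in rather than the real CLI entry point.

```python
import subprocess
import sys


def run_python(args):
    """Run a child Python process using the same interpreter as the caller.

    Using sys.executable avoids picking up a different "python" from PATH,
    which matters inside virtual environments and CI runners.
    """
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
        check=False,
    )


# Stand-in for a CLI invocation; a real test would pass the module or
# script that implements the command under test.
result = run_python(["-c", "print('ok')"])
```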

Test Baseline
• make lint → 0 errors
• pytest tests/ --ignore=infra → 224 passed, 5 skipped, 0 failed
• AWS deployment verified (/health returns 0.8.2.3)
• Observability verification script passes

Compatibility
• No breaking changes.
• No runtime behavior changes beyond improved CLI dependency handling.

v0.8.1-Agent-Runtime

21 Feb 19:24

This release introduces the first deployed Agent Runtime for executable-world-models.

Highlights
• Deployed /agentcore/loop execution endpoint
• Deterministic artifact upload to S3 (decision.json, trajectory.json, deltas.json)
• DynamoDB run persistence
• Budget semantics enforcement
• Correlation ID propagation across API, logs, and EMF metrics
• Structured observability and latency tracking

This release establishes the execution and deployment layers required for experimental evaluation of agent-based world models.

v0.8.1-fixes

20 Feb 22:32

What’s in this release
• S3 artifacts enabled: agentcore/loop now uploads decision.json, deltas.json, and trajectory.json to s3://beyondtokensstack-artifactsbucket2aac5544-vayurszcre4w/artifacts/<run_id>/
• Correlation ID propagation: x-correlation-id is preferred, with fallback to the X-Ray trace ID, then a generated UUID.
  • Returned in the API response as correlation_id
  • Logged in a grep-friendly line: correlation_id=
  • EMF metrics now also use correlation_id (no trace_id)
• Response rounding: cash_balance is now rounded to 2 decimal places in API responses.
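
The correlation ID fallback order described above might look like the following sketch. The function name and parameters are assumptions; in particular, how the handler obtains the X-Ray trace ID is not stated in these notes, so it is passed in as a plain argument here.

```python
import uuid


def resolve_correlation_id(headers, xray_trace_id=None):
    """Pick a correlation ID: request header, then X-Ray trace ID, then UUID."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return (
        lowered.get("x-correlation-id")
        or xray_trace_id
        or str(uuid.uuid4())
    )


from_header = resolve_correlation_id({"X-Correlation-Id": "abc"}, "trace-1")
from_trace = resolve_correlation_id({}, "trace-1")
generated = resolve_correlation_id({})
```

Whichever value is chosen would then be returned as correlation_id in the response and reused in the log line and EMF metrics, so a single identifier ties the three together.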

Verified
• Deployed and tested in us-east-1
• GET /health ✅
• POST /agentcore/loop ✅
• DynamoDB run persistence ✅
• S3 artifact presence for new runs ✅
• CloudWatch logs include correlation_id ✅

Notes
• Older artifacts remain in S3 under prior run IDs; new runs now consistently write the three key artifacts.

v0.7.12-cli — Deterministic run inspection + guardrail transparency

20 Feb 09:24

This release strengthens the inspection and observability layer of the executable world model loop.

Key improvements:

• Human-readable decision field (APPROVED, REJECTED, UNKNOWN)
• Rounded financial values for deterministic CLI output
• Rejection summaries with step index, action, and limiter reason
• Support for --raw and --json output modes
• Expanded unit test coverage for runs inspection
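
A small sketch of the kind of formatting these improvements describe. All three helpers are hypothetical, and the actual CLI output layout is not shown in these notes; only the decision labels and the summary fields (step index, action, limiter reason) come from the list above.

```python
def decision_label(approved):
    """Map a tri-state approval flag to a human-readable decision."""
    if approved is True:
        return "APPROVED"
    if approved is False:
        return "REJECTED"
    return "UNKNOWN"


def format_money(value):
    # Fixed two-decimal formatting keeps CLI output stable across runs.
    return f"{value:.2f}"


def rejection_summary(step_index, action, reason):
    # Assumed layout: key=value pairs that are easy to grep.
    return f"step={step_index} action={action} reason={reason}"


label = decision_label(False)
line = rejection_summary(2, "buy", "policy_limits")
amount = format_money(99.5)
```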

Why this matters:

The loop is now externally inspectable and deterministic.
Guardrail decisions are transparent, reproducible, and versioned via release tags.