Releases: Jasvina/AgentReliabilityKit
v0.1.0 - AgentEvalKit initial public toolkit release
v0.1.0
Why this release exists
v0.1.0 introduces AgentEvalKit as a public toolkit for the agent reliability loop: capture real runs, replay and diff them, turn traces into reusable eval artifacts, cluster recurring failures, and slice the same evidence into reproducible datasets.
This release is meant to make the repo understandable and usable as a coherent workflow, not just a collection of tools.
What is included
- AgentCI for replay-first regression testing of tool-using agents
- TracePack for packaging traces into reusable benchmark packs
- FailMap for clustering recurring failures and comparing releases
- PackSlice for balanced train/eval/test splits from the same pack
- Root-level automation, docs, and community health files so the full toolchain is easy to discover and run
- A public roadmap backlog with starter issues for the next improvements
First run
From the repo root:
    ./scripts/run_automation_demo.sh /tmp/agentevalkit-demo

This produces a machine-readable manifest.json plus per-tool artifacts that show the full pipeline working end to end.
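To check what the demo run wrote, the manifest can be inspected with a few lines of Python. This is a minimal sketch, not part of the toolkit: the helper name `summarize_manifest` is hypothetical, the manifest is assumed to sit directly inside the output directory passed to the script, and the only thing assumed about its schema is that the top level is a JSON object.

```python
import json
from pathlib import Path


def summarize_manifest(manifest_path):
    """Return the sorted top-level keys of a JSON manifest.

    Hypothetical helper: it assumes only that the file holds a JSON object,
    and nothing about which keys the toolkit actually writes.
    """
    data = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)


if __name__ == "__main__":
    # Assumed location: manifest.json directly inside the demo output directory.
    path = Path("/tmp/agentevalkit-demo") / "manifest.json"
    if path.exists():
        for key in summarize_manifest(path):
            print(key)
    else:
        print(f"no manifest found at {path}")
```

Listing the keys first is a cheap way to learn the manifest layout before wiring it into CI or a dashboard.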
Public backlog
The next public work stays focused on the reliability loop:
- expand adapters and regression diffing in AgentCI
- improve redaction, labeling, and export coverage in TracePack
- strengthen release-over-release comparisons and issue routing in FailMap
- add label-aware, temporal, and reproducibility improvements in PackSlice
- make root automation outputs easier to consume in CI and dashboards
Scope
This release covers the current monorepo toolchain and its public workflow surface.
It does not try to be:
- a general-purpose agent framework
- a broad orchestration platform
- an open-ended memory layer
- a demo-first UI product