Releases · Jasvina/AgentReliabilityKit

v0.1.0

Why this release exists

v0.1.0 introduces AgentEvalKit as a public toolkit for the agent reliability loop: capture real runs, replay and diff them, turn traces into reusable eval artifacts, cluster recurring failures, and slice the same evidence into reproducible datasets.

This release is meant to make the repo understandable and usable as a coherent workflow, not just a collection of tools.

What is included

- AgentCI for replay-first regression testing of tool-using agents
- TracePack for packaging traces into reusable benchmark packs
- FailMap for clustering recurring failures and comparing releases
- PackSlice for balanced train/eval/test splits from the same pack
- Root-level automation, docs, and community health files so the full toolchain is easy to discover and run
- A public roadmap backlog with starter issues for the next improvements
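A small sketch can show the kind of split PackSlice describes. This is not PackSlice's API. The helper name `balanced_split`, the `label_key` field, and the 80/10/10 default ratios are all hypothetical. The sketch assumes each pack record is a dict that carries a label. It groups records by label, shuffles each group with a fixed seed, and cuts each group into train/eval/test. As a result, every split gets roughly the same label mix, and the same input order plus the same seed always reproduces the same split.

```python
import random
from collections import defaultdict


def balanced_split(records, label_key, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/eval/test with the same label mix in each split.

    Hypothetical helper for illustration only; not the PackSlice API.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three fractions summing to 1")

    # Group records by label so each split is stratified per label.
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)

    rng = random.Random(seed)  # fixed seed -> same split on every run
    splits = {"train": [], "eval": [], "test": []}
    # Visit labels in a stable order so the RNG draws are reproducible.
    for label in sorted(by_label, key=str):
        group = list(by_label[label])
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_eval = int(len(group) * ratios[1])
        splits["train"].extend(group[:n_train])
        splits["eval"].extend(group[n_train:n_train + n_eval])
        # Leftovers from rounding down go to the test split.
        splits["test"].extend(group[n_train + n_eval:])
    return splits
```

Because `int()` rounds down, any leftover records in a label group land in the test split. A real tool might instead spread remainders across splits or enforce a minimum count per label.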

First run

From the repo root:

./scripts/run_automation_demo.sh /tmp/agentevalkit-demo

This produces a machine-readable manifest.json plus per-tool artifacts that show the full pipeline working end to end.
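The following could work for consuming the demo output from another script or a CI step. It rests on two assumptions that these notes do not confirm: that the argument to run_automation_demo.sh is the output directory, and that manifest.json is written at the top level of that directory. The manifest's schema is not described here, so the sketch only loads and returns the parsed JSON. `load_demo_manifest` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_demo_manifest(out_dir):
    """Return the parsed manifest.json from a demo output directory.

    Assumption: the demo script writes manifest.json directly inside out_dir.
    """
    path = Path(out_dir) / "manifest.json"
    if not path.is_file():
        # Fail loudly so a CI step does not silently pass on a missing manifest.
        raise FileNotFoundError(f"manifest.json not found in {out_dir}; did the demo script run?")
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `sorted(load_demo_manifest("/tmp/agentevalkit-demo"))` would list the manifest's top-level keys, assuming the manifest is a JSON object.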

Public backlog

The next public work stays focused on the reliability loop:

- expand adapters and regression diffing in AgentCI
- improve redaction, labeling, and export coverage in TracePack
- strengthen release-over-release comparisons and issue routing in FailMap
- add label-aware, temporal, and reproducibility improvements in PackSlice
- make root automation outputs easier to consume in CI and dashboards

Scope

This release covers the current monorepo toolchain and its public workflow surface.

It does not try to be:

- a general-purpose agent framework
- a broad orchestration platform
- an open-ended memory layer
- a demo-first UI product

v0.1.0 - AgentEvalKit initial public toolkit release

v0.1.0

Why this release exists

What is included

First run

Public backlog

Scope

Uh oh!