Your agents silently degrade in production. Kalibr keeps them on the optimal execution path — scoring every call, learning what works, and routing to the best model+tool combination automatically.
Open source SDK. Hosted optimization intelligence.
- Outcome-aware routing — Routes each goal to the model+tool path that is actually succeeding in production
- Continuous optimization — Learns from real outcomes using Thompson Sampling. Adapts as models degrade, tools fail, or costs shift
- Auto-instrumentation — Traces OpenAI, Anthropic, and Google AI calls with zero code changes
- TraceCapsule — Cross-agent context propagation for multi-agent systems
- Cost & token tracking — Real-time cost calculation and token monitoring across all providers
- Framework integrations — LangChain, CrewAI, OpenAI Agents SDK
```bash
pip install kalibr
kalibr auth
kalibr init
kalibr verify
```
Your agent runs kalibr auth, your terminal shows a code, you enter it at dashboard.kalibr.systems/link, and the agent handles the rest — credentials, codebase scanning, framework detection, and instrumentation. One approval, zero config.
Already have credentials? Skip signup:
```bash
pip install kalibr
export KALIBR_API_KEY=sk_...
export KALIBR_TENANT_ID=tenant_...
kalibr init
kalibr verify
```
Full autonomous onboarding protocol: kalibr.systems/llms.txt · kalibr.systems/setup.txt
```python
from kalibr import Router

router = Router(
    goal="extract_company",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: "company" in output.lower()
)

response = router.completion(
    messages=[{"role": "user", "content": "Extract the company: Hi from Stripe."}]
)
```
```bash
pip install kalibr

# For accurate token counting
pip install kalibr[tokens]
```

Get your credentials from dashboard.kalibr.systems/settings, then:
```bash
export KALIBR_API_KEY=your-api-key
export KALIBR_TENANT_ID=your-tenant-id
export OPENAI_API_KEY=sk-...   # or ANTHROPIC_API_KEY for Claude models
```

Or use autonomous provisioning:
```bash
export KALIBR_PROVISIONING_TOKEN=your-token   # create at dashboard.kalibr.systems/settings
kalibr init   # scans your project and provisions credentials automatically
```

Or link via device code (recommended):
```bash
kalibr auth
# Terminal shows a code. Enter it at dashboard.kalibr.systems/link.
# Agent receives credentials automatically. No email required.
kalibr init
```

The full CLI:

```bash
kalibr auth           # link agent to your Kalibr account (device code — recommended)
kalibr signup EMAIL   # DEPRECATED: use kalibr auth instead
kalibr init           # scan codebase, wrap bare LLM calls with Router, provision credentials
kalibr verify         # check credentials and Router connectivity
kalibr prompt         # copy Claude Code / Cursor integration prompt to clipboard
```

Every call your agent makes generates data. Kalibr uses that data to get better.
- You define paths — models, tools, and parameters that can handle your task
- Kalibr picks — uses Thompson Sampling to route to what's been working while exploring alternatives
- You report outcomes — tell Kalibr if it worked (or use `success_when` to automate it)
- Kalibr adapts — routes more traffic to what works, routes around what doesn't
No dashboards to watch. No alerts to triage. Your agent improves itself.
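The loop above hinges on Thompson Sampling. A minimal, SDK-independent sketch of the idea (the path names and outcome counts here are made up; Kalibr tracks real outcomes for you):

```python
import random

# Hypothetical per-path outcome counts; Kalibr's real state lives server-side
stats = {
    "gpt-4o": {"success": 8, "failure": 2},
    "claude-sonnet-4-20250514": {"success": 5, "failure": 5},
}

def pick_path(stats):
    # Thompson Sampling: draw a plausible success rate from each path's
    # Beta posterior and route to the highest draw. Paths with little
    # data produce wide draws, so alternatives still get explored.
    samples = {
        path: random.betavariate(s["success"] + 1, s["failure"] + 1)
        for path, s in stats.items()
    }
    return max(samples, key=samples.get)

chosen = pick_path(stats)      # usually "gpt-4o", sometimes the alternative
stats[chosen]["success"] += 1  # reporting an outcome sharpens the posterior
```

Because routing decisions are draws rather than a fixed argmax, the system keeps probing underused paths instead of locking onto an early winner.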
A path is any combination of model + tools + params. Kalibr tracks each combination separately and learns which one works best for each goal.
```python
# Just models
paths = ["gpt-4o", "claude-sonnet-4-20250514", "gpt-4o-mini"]

# With tools
paths = [
    {"model": "gpt-4o", "tools": ["web_search"]},
    {"model": "claude-sonnet-4-20250514", "tools": ["web_search", "browser"]},
]

# With params
paths = [
    {"model": "gpt-4o", "params": {"temperature": 0.7}},
    {"model": "gpt-4o", "params": {"temperature": 0.2}},
]

# Mix and match
paths = [
    {"model": "gpt-4o", "tools": ["web_search"], "params": {"temperature": 0.3}},
    {"model": "claude-sonnet-4-20250514", "params": {"temperature": 0.7}},
    "gpt-4o-mini",
]
```

This is what makes Kalibr different from model routers. OpenRouter picks a model. Kalibr picks the full execution path — and knows whether it actually worked.
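One way to picture why each combination is its own arm: a sketch of normalizing path specs into distinct identities (`path_key` is a hypothetical helper, not the SDK's internal representation):

```python
def path_key(path):
    # Normalize a path spec (bare string or dict) into a hashable identity,
    # so each model+tools+params combination is tracked separately.
    if isinstance(path, str):
        path = {"model": path}
    return (
        path["model"],
        tuple(path.get("tools", [])),
        tuple(sorted(path.get("params", {}).items())),
    )

paths = [
    {"model": "gpt-4o", "params": {"temperature": 0.7}},
    {"model": "gpt-4o", "params": {"temperature": 0.2}},
    "gpt-4o-mini",
]

# Same model, different params: two distinct arms for the bandit
assert path_key(paths[0]) != path_key(paths[1])
```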
With `success_when`, outcomes are reported automatically:

```python
router = Router(
    goal="summarize",
    paths=["gpt-4o", "claude-sonnet-4-20250514"],
    success_when=lambda output: len(output) > 100
)

response = router.completion(messages=[...])
# Outcome reported automatically based on success_when
```

Or report outcomes manually when success depends on an external system:

```python
router = Router(goal="book_meeting", paths=["gpt-4o", "claude-sonnet-4-20250514"])
response = router.completion(messages=[...])

meeting_created = check_calendar_api()
router.report(success=meeting_created)
```

Tell Kalibr why something failed so routing decisions are made against root cause, not just success rate:
```python
from kalibr import FAILURE_CATEGORIES
# ["timeout", "context_exceeded", "tool_error", "rate_limited",
#  "validation_failed", "hallucination_detected", "user_unsatisfied",
#  "empty_response", "malformed_output", "auth_error", "provider_error", "unknown"]

router.report(
    success=False,
    failure_category="rate_limited",
    reason="hit provider limit"
)
# Invalid categories raise ValueError immediately
```

For async validation, user feedback, or downstream system confirmation:
```python
from kalibr import update_outcome

update_outcome(
    trace_id="abc123",
    goal="resolve_ticket",
    success=False,
    failure_reason="customer_reopened",
    failure_category="user_unsatisfied",
    score=0.3,
    metadata={"ticket_id": "T-9182"}
)
```

Query what Kalibr has learned about your goals — health status, failure mode breakdowns, path comparisons, and actionable signals:
```python
from kalibr import get_insights

# All goals, last 7 days
insights = get_insights()

# Specific goal, custom window
insights = get_insights(goal="research_agent", window_hours=24)

for goal_data in insights["goals"]:
    print(goal_data["status"])              # healthy / degraded / insufficient_data
    print(goal_data["top_failure_modes"])
    print(goal_data["actionable_signals"])  # path_underperforming, drift_detected, etc.
```

LangChain:

```bash
pip install kalibr[langchain]
```

```python
from kalibr import Router

router = Router(goal="summarize", paths=["gpt-4o", "claude-sonnet-4-20250514"])
llm = router.as_langchain()
chain = prompt | llm | parser
```

Other frameworks:

```bash
pip install kalibr[crewai]          # CrewAI
pip install kalibr[openai-agents]   # OpenAI Agents SDK
pip install kalibr[langchain-all]   # LangChain with all providers
```

Kalibr auto-instruments OpenAI, Anthropic, and Google SDKs on import:
```python
import kalibr  # Must be first import

from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(model="gpt-4o", messages=[...])
# Traced automatically — cost, latency, tokens, success all captured
```

Disable with `KALIBR_AUTO_INSTRUMENT=false`.
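Under the hood, import-time instrumentation is a wrapping pattern. A toy sketch of that pattern (not Kalibr's actual internals; `instrument` and `spans` are illustrative names):

```python
import functools
import time

def instrument(fn, record):
    # Wrap a callable so every call is timed and recorded, which is
    # the same idea applied to provider SDK methods at import time.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record({"ok": True, "latency_s": time.perf_counter() - start})
            return result
        except Exception:
            record({"ok": False, "latency_s": time.perf_counter() - start})
            raise
    return wrapper

spans = []
fake_completion = instrument(lambda prompt: f"echo: {prompt}", spans.append)
fake_completion("hi")  # spans now holds one record: ok=True plus measured latency
```

Because the wrapper preserves the original signature and return value, instrumented code behaves identically; only the side-channel of recorded spans is new.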
Use get_policy() when you need fine-grained control — custom retry logic, framework integrations, or provider-specific features:
```python
from kalibr import get_policy, report_outcome
from openai import OpenAI

policy = get_policy(goal="summarize")
model = policy["recommended_model"]

# You call the provider yourself
if model.startswith("gpt"):
    client = OpenAI()
    response = client.chat.completions.create(model=model, messages=[...])

report_outcome(trace_id=trace_id, goal="summarize", success=True)
```

Or go even lower:
```python
from kalibr import register_path, decide, report_outcome

register_path(goal="book_meeting", model_id="gpt-4o")
register_path(goal="book_meeting", model_id="claude-sonnet-4-20250514")

decision = decide(goal="book_meeting")
model = decision["model_id"]

# Make your own LLM call, then report
report_outcome(trace_id="...", goal="book_meeting", success=True)
```

| Variable | Description | Default |
|---|---|---|
| `KALIBR_API_KEY` | API key from dashboard | Required |
| `KALIBR_TENANT_ID` | Tenant ID from dashboard | Required |
| `KALIBR_PROVISIONING_TOKEN` | Enables `kalibr init` credential auto-provisioning | — |
| `KALIBR_AUTO_INSTRUMENT` | Auto-instrument LLM SDKs on import | `true` |
| `KALIBR_INTELLIGENCE_URL` | Intelligence service URL | `https://kalibr-intelligence.fly.dev` |
| `KALIBR_COLLECTOR_URL` | Ingest endpoint | `https://api.kalibr.systems/api/ingest` |
| `KALIBR_CONSOLE_EXPORT` | Print spans to console | `false` |
```bash
git clone https://github.com/kalibr-ai/kalibr-sdk-python.git
cd kalibr-sdk-python
pip install -e ".[dev]"
pytest
```

See CONTRIBUTING.md.
Apache-2.0