A reference library showcasing agent-level optimization techniques for LLM-based applications. This is a research/prototype project designed to help developers understand and explore different optimization strategies they can apply to their own agents.
This library is grounded in academic research from 2024-2025. See docs/RESEARCH_BASIS.md for the research papers and industry resources that inform these techniques.
A collection of 15+ independent optimization modules covering:
- Caching strategies (simple, semantic, response-based, prefix, KV caching)
- Model routing (cost-aware, cascade, fallback patterns)
- Prompt optimization (compression, few-shot selection, templates)
- Context management (sliding windows, token counting, memory hierarchy)
- Request batching (sync/async request batching)
- Advanced retrieval (hybrid search, reranking, hypothetical documents)
- Advanced prompting (chain-of-thought, few-shot CoT, ReAct)
- Structured output handling (parsing, validation, function calls)
- Cost tracking (budget management, cost analysis)
- Evaluation & monitoring (feedback loops, metrics, agent monitoring)
- Framework integrations (DSPy, LangGraph)
- Specialized techniques (streaming, parallel execution, speculative execution)
Each module can be used independently or composed together.
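As a minimal sketch of what "composed together" means, the snippet below layers a cache in front of a cheap-to-expensive model cascade. Everything here is hypothetical (the `call_model` stub, the model names, and the 0.7 confidence threshold are not part of this package):

```python
# Illustrative composition: a cache in front of a cheap -> expensive model cascade.
def call_model(model: str, prompt: str) -> tuple[str, float]:
    """Stub LLM call returning (answer, confidence)."""
    if model == "cheap-model":
        return ("maybe", 0.4)        # low confidence -> escalate
    return ("definitely", 0.95)      # expensive model is confident

cache: dict[str, str] = {}

def answer(prompt: str) -> str:
    if prompt in cache:              # caching layer: skip the LLM entirely
        return cache[prompt]
    for model in ("cheap-model", "expensive-model"):   # cascade routing layer
        text, confidence = call_model(model, prompt)
        if confidence >= 0.7:        # accept the first confident answer
            break
    cache[prompt] = text
    return text

print(answer("Is composition useful?"))  # cascade escalates -> "definitely"
print(answer("Is composition useful?"))  # second call is served from the cache
```

Each layer is independent: you could drop the cascade and keep the cache, or vice versa, without touching the other.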
- Not production-ready code - This is prototype/research quality
- Not model-level optimization - Focuses only on agent/application layer
- Not hardware-level optimization - Does not address compute or infrastructure
- Not a turnkey framework - Requires understanding and adaptation for specific use cases
Testing: 282 tests passing (244 unit + 38 integration tests)
Documentation: Quickstart guide + module docs + working examples
Code Quality: Clean APIs with comprehensive test coverage; production-grade error handling is limited
pip install -e .

Or install specific feature sets:
pip install -e ".[tiktoken]" # For token counting
pip install -e ".[dspy]" # For DSPy integration
pip install -e ".[langgraph]" # For LangGraph integration
pip install -e ".[hnswlib]"   # For HNSW retrieval

from agent_opt.caching import SimpleCache
cache = SimpleCache(max_size=1000, ttl_seconds=3600)
# Check cache
result = cache.get(prompt)
if result is None:
    result = llm.call(prompt)
    cache.set(prompt, result)
print(cache.get_stats())

from agent_opt.routing import CostAwareRouter, CostModelConfig
router = CostAwareRouter(
    models=[
        CostModelConfig("gpt-4", cost_per_1k_input=0.03, cost_per_1k_output=0.06),
        CostModelConfig("gpt-3.5", cost_per_1k_input=0.0005, cost_per_1k_output=0.0015),
    ],
    optimization_goal="balanced",
)
selected = router.select_model(input_tokens=100, output_tokens=50)
print(f"Using {selected.name}")

from agent_opt.context import SlidingWindow
window = SlidingWindow(max_messages=10, max_tokens=4000)
window.add_message({"role": "user", "content": "Hello"})
window.add_message({"role": "assistant", "content": "Hi there"})
messages = window.get_messages()  # Bounded context

Run the examples to see techniques in action:
python examples/basic_caching_example.py
python examples/model_routing_example.py
python examples/comprehensive_optimization_example.py

src/agent_opt/
├── caching/ # Cache implementations
├── routing/ # Model routing strategies
├── prompts/ # Prompt optimization
├── context/ # Context window management
├── batching/ # Request batching
├── retrieval/ # Advanced retrieval techniques
├── advanced_prompts/ # Complex prompting strategies
├── structured/ # Structured output handling
├── cost/ # Cost tracking and budgeting
├── evaluation/ # Evaluation and monitoring
├── advanced_caching/ # Specialized caching techniques
├── streaming/ # Token streaming utilities
├── request_optimization/ # Request-level optimizations
├── dspy_integration/ # DSPy framework support
└── langgraph_integration/ # LangGraph support
examples/ # Usage examples
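To illustrate the idea behind the `batching/` package above, here is a self-contained micro-batching sketch. It is not this library's API: `batch_llm_call`, the `Batcher` class, and the batch size are stand-ins (the stub "responds" by uppercasing each prompt in place of a real provider batch endpoint).

```python
# Illustrative request batching: buffer prompts, then flush them as one batch call.
def batch_llm_call(prompts: list[str]) -> list[str]:
    """Stub batch endpoint: one response per prompt."""
    return [p.upper() for p in prompts]

class Batcher:
    def __init__(self, max_batch: int = 4):
        self.max_batch = max_batch
        self._pending: list[str] = []
        self._results: dict[str, str] = {}

    def submit(self, prompt: str) -> None:
        self._pending.append(prompt)
        if len(self._pending) >= self.max_batch:
            self.flush()                   # full batch: send it now

    def flush(self) -> None:
        if self._pending:
            for p, r in zip(self._pending, batch_llm_call(self._pending)):
                self._results[p] = r
            self._pending.clear()

    def result(self, prompt: str) -> str:
        self.flush()                       # make sure stragglers are sent
        return self._results[prompt]

b = Batcher(max_batch=2)
b.submit("hello")
b.submit("world")                          # second submit fills the batch
print(b.result("hello"))                   # -> "HELLO"
```

The trade-off the sketch shows: buffering amortizes per-request overhead across a batch, at the cost of added latency for the first request in the buffer.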
- Quick Start Guide - Getting started with key modules
- Module Docstrings - Each module has detailed API documentation
- Examples - Working examples in examples/
Run the test suite:
pytest test/unit/ -v

Current Status: 244 unit tests passing across all 15 modules
See CONTRIBUTING.md for guidelines on contributing to this project.
If you discover a security issue, please create an issue in our issue tracker. As a reminder, this is a research-grade project; if you build a system on top of this library, you are responsible for hardening the code for production use.
MIT