marcusjihansson/agent_opt

Agent Optimization Techniques Library

A reference library showcasing agent-level optimization techniques for LLM-based applications. This is a research/prototype project designed to help developers understand and explore different optimization strategies they can apply to their own agents.

This library is grounded in academic research from 2024-2025. See docs/RESEARCH_BASIS.md for the research papers and industry resources that inform these techniques.

What This Is

A collection of 15+ independent optimization modules covering:

  • Caching strategies (simple, semantic, response-based, prefix, KV caching)
  • Model routing (cost-aware, cascade, fallback patterns)
  • Prompt optimization (compression, few-shot selection, templates)
  • Context management (sliding windows, token counting, memory hierarchy)
  • Request batching (sync/async request batching)
  • Advanced retrieval (hybrid search, reranking, hypothetical documents)
  • Advanced prompting (chain-of-thought, few-shot CoT, ReAct)
  • Structured output handling (parsing, validation, function calls)
  • Cost tracking (budget management, cost analysis)
  • Evaluation & monitoring (feedback loops, metrics, agent monitoring)
  • Framework integrations (DSPy, LangGraph)
  • Specialized techniques (streaming, parallel execution, speculative execution)

Each module can be used independently or composed together.
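As a sketch of how two of these techniques compose, a cache lookup can short-circuit the call and a cost-aware choice can pick the model only on a cache miss. This is a generic stand-in, not the library's API: `cached_routed_call`, `estimate_cost`, and `call_llm` are hypothetical names for illustration.

```python
# Generic sketch: caching wrapped around cost-aware routing.
# All names here are illustrative, not agent_opt's actual API.

def estimate_cost(model, input_tokens, output_tokens):
    """Dollar cost of one request under linear per-1k-token pricing."""
    return (input_tokens / 1000) * model["in_price"] + \
           (output_tokens / 1000) * model["out_price"]

def cached_routed_call(prompt, cache, models, call_llm,
                       input_tokens, expected_output_tokens):
    if prompt in cache:                              # caching: reuse a prior answer
        return cache[prompt]
    model = min(models, key=lambda m: estimate_cost(
        m, input_tokens, expected_output_tokens))    # routing: cheapest model
    result = call_llm(model["name"], prompt)
    cache[prompt] = result
    return result

cache = {}
models = [
    {"name": "gpt-4", "in_price": 0.03, "out_price": 0.06},
    {"name": "gpt-3.5", "in_price": 0.0005, "out_price": 0.0015},
]
fake_llm = lambda name, prompt: f"{name}: ok"        # stand-in for a real client
first = cached_routed_call("hi", cache, models, fake_llm, 100, 50)
second = cached_routed_call("hi", cache, models, fake_llm, 100, 50)  # cache hit
```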

What This Is NOT

  • Not production-ready code - This is prototype/research quality
  • Not model-level optimization - Focuses only on agent/application layer
  • Not hardware-level optimization - Does not address compute or infrastructure
  • Not a turnkey framework - Requires understanding and adaptation for specific use cases

Current State (January 2025)

Testing: 282 tests passing (244 unit + 38 integration tests)

Documentation: Quickstart guide + module docs + working examples

Code Quality: Clean APIs with comprehensive test coverage, limited production error handling

Installation

pip install -e .

Or install specific feature sets:

pip install -e ".[tiktoken]"     # For token counting
pip install -e ".[dspy]"         # For DSPy integration
pip install -e ".[langgraph]"    # For LangGraph integration
pip install -e ".[hnswlib]"      # For HNSW retrieval

Quick Start

Basic Caching

from agent_opt.caching import SimpleCache

cache = SimpleCache(max_size=1000, ttl_seconds=3600)

# Check cache
result = cache.get(prompt)
if result is None:
    result = llm.call(prompt)
    cache.set(prompt, result)

print(cache.get_stats())
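The snippet above treats `SimpleCache` as a black box. Internally, a size- and TTL-bounded cache of this kind can be sketched as follows; this is a generic illustration of the technique, not agent_opt's actual implementation.

```python
import time
from collections import OrderedDict

class TTLCache:
    """Minimal size- and TTL-bounded LRU cache (illustrative only)."""
    def __init__(self, max_size=1000, ttl_seconds=3600):
        self.max_size = max_size
        self.ttl = ttl_seconds
        self._data = OrderedDict()        # key -> (value, stored_at)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or time.time() - entry[1] > self.ttl:
            self._data.pop(key, None)     # drop expired entries lazily
            self.misses += 1
            return None
        self._data.move_to_end(key)       # LRU: mark as recently used
        self.hits += 1
        return entry[0]

    def set(self, key, value):
        self._data[key] = (value, time.time())
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=2, ttl_seconds=60)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)   # exceeds max_size, so "a" is evicted
```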

Cost-Aware Routing

from agent_opt.routing import CostAwareRouter, CostModelConfig

router = CostAwareRouter(
    models=[
        CostModelConfig("gpt-4", cost_per_1k_input=0.03, cost_per_1k_output=0.06),
        CostModelConfig("gpt-3.5", cost_per_1k_input=0.0005, cost_per_1k_output=0.0015),
    ],
    optimization_goal="balanced",
)
selected = router.select_model(input_tokens=100, output_tokens=50)
print(f"Using {selected.name}")
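The arithmetic behind this kind of router is worth seeing once. Assuming the linear per-1k-token pricing implied by the `CostModelConfig` fields above (the `request_cost` helper is a stand-in, not a library function), the per-request comparison works out as:

```python
def request_cost(cost_per_1k_input, cost_per_1k_output, input_tokens, output_tokens):
    """Linear per-1k-token pricing, mirroring the config fields above."""
    return (input_tokens / 1000) * cost_per_1k_input \
         + (output_tokens / 1000) * cost_per_1k_output

# Same 100-input / 50-output request priced against both models:
gpt4 = request_cost(0.03, 0.06, input_tokens=100, output_tokens=50)
gpt35 = request_cost(0.0005, 0.0015, input_tokens=100, output_tokens=50)
print(f"gpt-4: ${gpt4:.6f}, gpt-3.5: ${gpt35:.6f}")
```

For this request, gpt-4 costs roughly $0.006 and gpt-3.5 roughly $0.000125, a ~48x spread, which is why a "balanced" goal can route most traffic to the cheaper model and reserve the expensive one for harder inputs.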

Context Management

from agent_opt.context import SlidingWindow

window = SlidingWindow(max_messages=10, max_tokens=4000)
window.add_message({"role": "user", "content": "Hello"})
window.add_message({"role": "assistant", "content": "Hi there"})
messages = window.get_messages()  # Bounded context
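A window bounded by both message count and token budget can be sketched like this; the class below is a generic illustration (with a crude characters-per-token heuristic), and agent_opt's `SlidingWindow` may differ in its details.

```python
from collections import deque

class MessageWindow:
    """Sliding window bounded by message count and a rough token budget
    (illustrative only, not agent_opt's implementation)."""
    def __init__(self, max_messages=10, max_tokens=4000):
        self.max_messages = max_messages
        self.max_tokens = max_tokens
        self._messages = deque()

    @staticmethod
    def _rough_tokens(message):
        # Crude heuristic: roughly 4 characters per token.
        return max(1, len(message["content"]) // 4)

    def add_message(self, message):
        self._messages.append(message)
        # Drop the oldest messages until both bounds are satisfied.
        while len(self._messages) > self.max_messages or \
              sum(self._rough_tokens(m) for m in self._messages) > self.max_tokens:
            self._messages.popleft()

    def get_messages(self):
        return list(self._messages)

window = MessageWindow(max_messages=2, max_tokens=100)
window.add_message({"role": "user", "content": "Hello"})
window.add_message({"role": "assistant", "content": "Hi there"})
window.add_message({"role": "user", "content": "How are you?"})  # evicts "Hello"
```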

Examples

Run the examples to see techniques in action:

python examples/basic_caching_example.py
python examples/model_routing_example.py
python examples/comprehensive_optimization_example.py

Project Structure

src/agent_opt/
├── caching/              # Cache implementations
├── routing/              # Model routing strategies
├── prompts/              # Prompt optimization
├── context/              # Context window management
├── batching/             # Request batching
├── retrieval/            # Advanced retrieval techniques
├── advanced_prompts/     # Complex prompting strategies
├── structured/           # Structured output handling
├── cost/                 # Cost tracking and budgeting
├── evaluation/           # Evaluation and monitoring
├── advanced_caching/     # Specialized caching techniques
├── streaming/            # Token streaming utilities
├── request_optimization/ # Request-level optimizations
├── dspy_integration/     # DSPy framework support
└── langgraph_integration/# LangGraph support

examples/                 # Usage examples

Documentation

  • Quick Start Guide - Getting started with key modules
  • Module Docstrings - Each module has detailed API documentation
  • Examples - Working examples in examples/

Testing

Run the test suite:

pytest test/unit/ -v

Current Status: 244 unit tests passing across all 15 modules (282 total including integration tests)

Contributing

See CONTRIBUTING.md for guidelines on contributing to this project.

Security

If you discover a security issue, please report it in the project's issue tracker. As a reminder, this project is research-grade code; if you build on these implementations, harden them yourself before any production use.

License

MIT

About

I started researching different agent optimization techniques for my own benefit and built this library along the way. Feel free to take these implementations and start experimenting with different strategies.
