CUCo supports multiple LLM providers for code generation, mutation, judging, meta-summarization, and embeddings. This document covers setup for each provider.
| Provider | Model Name Format | Client | Use Case |
|---|---|---|---|
| Anthropic (direct) | `claude-sonnet-4-6`, `claude-opus-4-6` | `anthropic.Anthropic()` | Direct Anthropic API |
| Anthropic (Bedrock) | `bedrock/us.anthropic.claude-opus-4-6-v1` | `anthropic.AnthropicBedrock()` | AWS-managed Anthropic |
| OpenAI | `gpt-4.1-mini`, `o3-mini` | `openai.OpenAI()` | OpenAI API |
| Azure OpenAI | `azure-gpt-4.1-mini` | `openai.AzureOpenAI()` | Azure-managed OpenAI |
| DeepSeek | `deepseek-chat`, `deepseek-reasoner` | `openai.OpenAI(base_url=...)` | DeepSeek API |
| Google Gemini | `gemini-2.0-flash`, `gemini-2.5-pro` | `openai.OpenAI(base_url=...)` | Google AI API |
| Claude CLI | `claude-cli/opus`, `claude-cli/sonnet` | subprocess | Claude Code CLI |
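The naming convention above means the backing client can be inferred from the model string alone. A minimal sketch of that dispatch (the function name `pick_client` is illustrative, not CUCo's actual API):

```python
def pick_client(model: str) -> str:
    """Map a CUCo model-name string to its backing client, following the
    prefixes in the table above. Illustrative only -- not CUCo's API."""
    if model.startswith("bedrock/"):
        return "anthropic.AnthropicBedrock"
    if model.startswith("azure-"):
        return "openai.AzureOpenAI"
    if model.startswith("claude-cli/"):
        return "subprocess"  # shells out to the Claude Code CLI
    if model.startswith("claude-"):
        return "anthropic.Anthropic"
    if model.startswith(("deepseek-", "gemini-")):
        return "openai.OpenAI"  # OpenAI-compatible endpoint via base_url
    return "openai.OpenAI"  # gpt-*, o3-*, o4-* use the OpenAI client

print(pick_client("bedrock/us.anthropic.claude-opus-4-6-v1"))
# prints "anthropic.AnthropicBedrock"
```

Note that the `claude-cli/` check must run before the plain `claude-` check, since both share a prefix.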
## Anthropic (direct)

Set your API key:

```bash
ANTHROPIC_API_KEY=sk-ant-...
```

```python
llm_models=["claude-sonnet-4-6"]
llm_models=["claude-opus-4-6"]
llm_models=["claude-haiku-4-5"]
```

This is the default provider in the included workloads.
## Anthropic (Bedrock)

Set your AWS credentials:

```bash
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION_NAME=us-east-1
```

Model names use the `bedrock/` prefix followed by the Bedrock model ID:

```python
# Sonnet 4.6
llm_models=["bedrock/us.anthropic.claude-sonnet-4-6"]
# Opus 4.6 (strongest, most expensive)
llm_models=["bedrock/us.anthropic.claude-opus-4-6-v1"]
# Sonnet 4.5
llm_models=["bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0"]
# Sonnet 4
llm_models=["bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0"]
# Haiku 4.5 (fastest, cheapest)
llm_models=["bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0"]
```

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
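Since the prices above are per million tokens, the cost of a single call follows directly. A small illustrative helper (not CUCo's pricing code):

```python
# Per-million-token (input, output) prices from the table above.
PRICES = {
    "claude-opus-4-6": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call. Illustrative only."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 10k input + 2k output tokens on Sonnet 4.6:
print(round(call_cost("claude-sonnet-4-6", 10_000, 2_000), 4))  # prints 0.06
```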
## OpenAI

Set your API key:

```bash
OPENAI_API_KEY=sk-...
```

```python
llm_models=["gpt-4.1-mini"]
llm_models=["gpt-4.1"]
llm_models=["o3-mini"]
llm_models=["o4-mini"]
```

## Azure OpenAI

Set the Azure credentials and endpoint:

```bash
AZURE_OPENAI_API_KEY=...
AZURE_API_VERSION=2024-02-15-preview
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/
```

Use the `azure-` prefix:

```python
llm_models=["azure-gpt-4.1-mini"]
llm_models=["azure-gpt-4.1"]
```

## DeepSeek

```bash
DEEPSEEK_API_KEY=...
```

```python
llm_models=["deepseek-chat"]
llm_models=["deepseek-reasoner"]
```

## Google Gemini

```bash
GEMINI_API_KEY=...
```

```python
llm_models=["gemini-2.0-flash"]
llm_models=["gemini-2.5-pro-preview-05-06"]
llm_models=["gemini-2.5-flash-preview-04-17"]
```

## Claude CLI

Uses the Claude Code CLI (`claude -p`) as a subprocess. No API key is needed if the Claude CLI is already authenticated.

```python
llm_models=["claude-cli/opus"]
llm_models=["claude-cli/sonnet"]
llm_models=["claude-cli/haiku"]
```

This is primarily used for the fast-path agent mode, where the LLM gets full file system autonomy.
## Reasoning Models

Some models support extended thinking / chain-of-thought reasoning. CUCo automatically enables this for known reasoning models:
| Provider | Reasoning Models |
|---|---|
| Anthropic | claude-3-7-sonnet-*, claude-sonnet-4-*, claude-opus-4-* |
| OpenAI | o3-mini, o4-mini |
| DeepSeek | deepseek-reasoner |
| Gemini | gemini-2.5-pro-*, gemini-2.5-flash-* |
For reasoning models, CUCo:
- Sets temperature to 1.0 (required by most reasoning APIs)
- Adds thinking/budget parameters (e.g., `thinking.budget_tokens` for Anthropic)
- Passes `reasoning_effort` if configured in `llm_kwargs`
## Embedding Models

Embedding models are used for novelty filtering and similarity-based retrieval.
| Provider | Model | API Key Variable |
|---|---|---|
| OpenAI | `text-embedding-3-small`, `text-embedding-3-large` | `OPENAI_API_KEY` |
| Azure | `azure-text-embedding-3-small` | `AZURE_OPENAI_API_KEY` |
| Gemini | `gemini-embedding-001` | `GEMINI_API_KEY` |
| Bedrock | `bedrock-amazon.titan-embed-text-v1` | `AWS_ACCESS_KEY_ID` |
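Novelty filtering typically thresholds on cosine similarity between embedding vectors. A pure-Python illustration of the idea (the function names and the 0.9 threshold are assumptions, not CUCo's internals):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_novel(candidate: list[float], archive: list[list[float]],
             threshold: float = 0.9) -> bool:
    """Reject a candidate whose embedding is too close to any archived one."""
    return all(cosine(candidate, v) < threshold for v in archive)
```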
Configure via:

```python
evo_config = EvolutionConfig(
    embedding_model="bedrock-amazon.titan-embed-text-v1",
    ...
)
```

## Dynamic Model Selection

CUCo can automatically select between multiple models using a bandit algorithm:
```python
evo_config = EvolutionConfig(
    llm_models=[
        "bedrock/us.anthropic.claude-opus-4-6-v1",
        "bedrock/us.anthropic.claude-sonnet-4-6",
    ],
    llm_dynamic_selection="ucb",  # Asymmetric Upper Confidence Bound
    ...
)
```

The UCB bandit tracks which models produce higher-scoring candidates and allocates more queries to better-performing models over time.
Alternatively, use `None` (the default) for round-robin selection across models.
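The selection policy can be pictured with a generic UCB1 sketch: pick the model maximizing mean score plus an exploration bonus. This is a textbook UCB1 implementation for illustration, not necessarily CUCo's exact asymmetric variant:

```python
import math

def ucb_pick(stats: dict[str, tuple[int, float]]) -> str:
    """Pick the model maximizing mean score + exploration bonus.
    stats maps model name -> (num_queries, total_score)."""
    total = sum(n for n, _ in stats.values())
    def score(item):
        _, (n, s) = item
        if n == 0:
            return float("inf")  # try every model at least once
        return s / n + math.sqrt(2 * math.log(total) / n)
    return max(stats.items(), key=score)[0]
```

As the query count grows, the exploration term shrinks and selection concentrates on the model with the higher empirical mean.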
## Cost Tracking

CUCo tracks API costs for all LLM calls. Each `QueryResult` includes `input_cost` and `output_cost` based on the pricing tables in `cuco/llm/models/pricing.py`. Cumulative costs are logged during evolution.
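The bookkeeping amounts to summing per-call costs. A simplified stand-in (the dataclass mirrors only the two cost fields named above; it is not CUCo's `QueryResult`):

```python
from dataclasses import dataclass

@dataclass
class QueryResult:  # simplified stand-in, cost fields only
    input_cost: float
    output_cost: float

def cumulative_cost(results: list[QueryResult]) -> float:
    """Total spend across all calls, as logged during evolution."""
    return sum(r.input_cost + r.output_cost for r in results)
```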
To add a new model, add its pricing entry to the appropriate dictionary in `pricing.py`:
```python
BEDROCK_MODELS = {
    "bedrock/your-new-model-id": {
        "input_price": X / M,
        "output_price": Y / M,
    },
    ...
}
```

## Model Recommendations

Recommendations for CUCo workloads:
| Role | Recommended | Reasoning |
|---|---|---|
| Mutation (slow-path) | Opus 4.6 or Sonnet 4.6 | Complex code reasoning, large context |
| Meta-summarization | Opus 4.6 | Cross-generation pattern analysis |
| Fast-path rewrite | Sonnet 4.6 | Good balance of quality and cost |
| Fast-path judge | Same as rewriter | Simpler task, lower token count |
| Evaluation feedback | Sonnet 4.6 | Quick factual analysis |
| Embeddings | Titan or text-embedding-3-small | Cheap, fast |
For budget-conscious runs, Sonnet 4.6 works well for all roles. For maximum quality, use Opus 4.6 for mutation and meta-summarization.