
LLM Backends

CUCo supports multiple LLM providers for code generation, mutation, judging, meta-summarization, and embeddings. This document covers setup for each provider.

Provider Overview

| Provider | Model Name Format | Client | Use Case |
| --- | --- | --- | --- |
| Anthropic (direct) | claude-sonnet-4-6, claude-opus-4-6 | anthropic.Anthropic() | Direct Anthropic API |
| Anthropic (Bedrock) | bedrock/us.anthropic.claude-opus-4-6-v1 | anthropic.AnthropicBedrock() | AWS-managed Anthropic |
| OpenAI | gpt-4.1-mini, o3-mini | openai.OpenAI() | OpenAI API |
| Azure OpenAI | azure-gpt-4.1-mini | openai.AzureOpenAI() | Azure-managed OpenAI |
| DeepSeek | deepseek-chat, deepseek-reasoner | openai.OpenAI(base_url=...) | DeepSeek API |
| Google Gemini | gemini-2.0-flash, gemini-2.5-pro | openai.OpenAI(base_url=...) | Google AI API |
| Claude CLI | claude-cli/opus, claude-cli/sonnet | subprocess | Claude Code CLI |
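
All providers are selected the same way: list model names in llm_models on the EvolutionConfig (shown in full later in this document). A minimal sketch:

evo_config = EvolutionConfig(
    # Any model name from the table above works here; the remaining
    # fields are workload-specific and elided.
    llm_models=["bedrock/us.anthropic.claude-sonnet-4-6"],
    ...
)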

Anthropic (Direct API)

Environment Variables

ANTHROPIC_API_KEY=sk-ant-...

Available Models

llm_models=["claude-sonnet-4-6"]
llm_models=["claude-opus-4-6"]
llm_models=["claude-haiku-4-5"]

Anthropic via AWS Bedrock (recommended)

This is the default provider in the included workloads.

Environment Variables

AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION_NAME=us-east-1

Available Models

Model names use the bedrock/ prefix followed by the Bedrock model ID:

# Sonnet 4.6
llm_models=["bedrock/us.anthropic.claude-sonnet-4-6"]

# Opus 4.6 (strongest, most expensive)
llm_models=["bedrock/us.anthropic.claude-opus-4-6-v1"]

# Sonnet 4.5
llm_models=["bedrock/us.anthropic.claude-sonnet-4-5-20250929-v1:0"]

# Sonnet 4
llm_models=["bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0"]

# Haiku 4.5 (fastest, cheapest)
llm_models=["bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0"]

Pricing (per million tokens)

| Model | Input | Output |
| --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
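
As a worked example of these rates, a single Sonnet 4.6 call with 20,000 input tokens and 2,000 output tokens costs nine cents:

# Sonnet 4.6: $3.00 per 1M input tokens, $15.00 per 1M output tokens.
input_tokens, output_tokens = 20_000, 2_000
cost = input_tokens * 3.00 / 1_000_000 + output_tokens * 15.00 / 1_000_000
print(f"${cost:.3f}")  # $0.090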

OpenAI

Environment Variables

OPENAI_API_KEY=sk-...

Available Models

llm_models=["gpt-4.1-mini"]
llm_models=["gpt-4.1"]
llm_models=["o3-mini"]
llm_models=["o4-mini"]

Azure OpenAI

Environment Variables

AZURE_OPENAI_API_KEY=...
AZURE_API_VERSION=2024-02-15-preview
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/

Model Names

Use the azure- prefix:

llm_models=["azure-gpt-4.1-mini"]
llm_models=["azure-gpt-4.1"]

DeepSeek

Environment Variables

DEEPSEEK_API_KEY=...

Available Models

llm_models=["deepseek-chat"]
llm_models=["deepseek-reasoner"]

Google Gemini

Environment Variables

GEMINI_API_KEY=...

Available Models

llm_models=["gemini-2.0-flash"]
llm_models=["gemini-2.5-pro-preview-05-06"]
llm_models=["gemini-2.5-flash-preview-04-17"]

Claude CLI

Uses the Claude Code CLI (claude -p) as a subprocess. No API key is needed if the Claude CLI is already authenticated.

llm_models=["claude-cli/opus"]
llm_models=["claude-cli/sonnet"]
llm_models=["claude-cli/haiku"]

This is primarily used for the fast-path agent mode, where the LLM gets full file system autonomy.
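
A rough sketch of the kind of subprocess call involved; the exact flags CUCo passes are not shown here, but claude -p runs a single non-interactive prompt and --model accepts aliases like sonnet or opus:

import subprocess

result = subprocess.run(
    ["claude", "-p", "Summarize this repo", "--model", "sonnet"],
    capture_output=True,
    text=True,
)
print(result.stdout)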

Reasoning Models

Some models support extended thinking / chain-of-thought reasoning. CUCo automatically enables this for known reasoning models:

| Provider | Reasoning Models |
| --- | --- |
| Anthropic | claude-3-7-sonnet-*, claude-sonnet-4-*, claude-opus-4-* |
| OpenAI | o3-mini, o4-mini |
| DeepSeek | deepseek-reasoner |
| Gemini | gemini-2.5-pro-*, gemini-2.5-flash-* |

For reasoning models, CUCo:

  • Sets temperature to 1.0 (required by most reasoning APIs)
  • Adds thinking/budget parameters (e.g., thinking.budget_tokens for Anthropic)
  • Passes reasoning_effort if configured in llm_kwargs
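
At the Anthropic API level, for example, extended thinking looks like this (a sketch reusing the direct client from earlier; budget_tokens must be below max_tokens):

reply = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16_000,
    temperature=1.0,  # required when thinking is enabled
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{"role": "user", "content": "..."}],
)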

Embedding Models

Used for novelty filtering and similarity-based retrieval.

| Provider | Model | Environment Variable |
| --- | --- | --- |
| OpenAI | text-embedding-3-small, text-embedding-3-large | OPENAI_API_KEY |
| Azure | azure-text-embedding-3-small | AZURE_OPENAI_API_KEY |
| Gemini | gemini-embedding-001 | GEMINI_API_KEY |
| Bedrock | bedrock-amazon.titan-embed-text-v1 | AWS_ACCESS_KEY_ID |

Configure via:

evo_config = EvolutionConfig(
    embedding_model="bedrock-amazon.titan-embed-text-v1",
    ...
)

Dynamic Model Selection

CUCo can automatically select between multiple models using a bandit algorithm:

evo_config = EvolutionConfig(
    llm_models=[
        "bedrock/us.anthropic.claude-opus-4-6-v1",
        "bedrock/us.anthropic.claude-sonnet-4-6",
    ],
    llm_dynamic_selection="ucb",  # Asymmetric Upper Confidence Bound
    ...
)

The UCB bandit tracks which models produce higher-scoring candidates and allocates more queries to better-performing models over time.

Alternatively, leave llm_dynamic_selection as None (the default) for round-robin selection across models.
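
For intuition, an illustrative UCB1-style selection rule (not CUCo's exact implementation) looks like this:

import math

def ucb_pick(models, pulls, mean_scores, total_pulls, c=1.0):
    # Pick the model with the best mean candidate score plus an
    # exploration bonus that shrinks the more a model has been queried.
    def ucb(m):
        if pulls[m] == 0:
            return float("inf")  # query every model at least once
        return mean_scores[m] + c * math.sqrt(math.log(total_pulls) / pulls[m])
    return max(models, key=ucb)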

Cost Tracking

CUCo tracks API costs for all LLM calls. Each QueryResult includes input_cost and output_cost based on the pricing tables in cuco/llm/models/pricing.py. Cumulative costs are logged during evolution.

To add a new model, add its pricing entry to the appropriate dictionary in pricing.py:

BEDROCK_MODELS = {
    "bedrock/your-new-model-id": {
        "input_price": X / M,   # X: USD per million input tokens
        "output_price": Y / M,  # Y: USD per million output tokens
    },
    ...
}

Choosing a Model

Recommendations for CUCo workloads:

| Role | Recommended | Rationale |
| --- | --- | --- |
| Mutation (slow-path) | Opus 4.6 or Sonnet 4.6 | Complex code reasoning, large context |
| Meta-summarization | Opus 4.6 | Cross-generation pattern analysis |
| Fast-path rewrite | Sonnet 4.6 | Good balance of quality and cost |
| Fast-path judge | Same as rewriter | Simpler task, lower token count |
| Evaluation feedback | Sonnet 4.6 | Quick factual analysis |
| Embeddings | Titan or text-embedding-3-small | Cheap, fast |

For budget-conscious runs, Sonnet 4.6 works well for all roles. For maximum quality, use Opus 4.6 for mutation and meta-summarization.