
feat: multi-provider LLM support (#545–549) #552

Merged
frankbria merged 6 commits into main from feat/multi-provider-llm-545-549
Apr 6, 2026
Conversation

@frankbria
Owner

@frankbria frankbria commented Apr 5, 2026

Summary

Completes subissues #545–549 of the multi-provider LLM initiative (#542). Issues #543 and #544 (OpenAI adapter and factory) were already merged.

How it works

# Use OpenAI instead of Anthropic
cf work start <id> --execute --llm-provider openai --llm-model gpt-4o

# Or an Ollama-served model through the OpenAI-compatible provider
# (pair with a base_url, e.g. via the config.yaml below)
cf work start <id> --execute --llm-provider openai --llm-model qwen2.5-coder:7b

# Route to local Ollama via config.yaml
# .codeframe/config.yaml:
# llm:
#   provider: openai
#   model: qwen2.5-coder:7b
#   base_url: http://localhost:11434/v1

# Default unchanged — still uses Anthropic
cf work start <id> --execute

Provider priority (lowest → highest): config file < CODEFRAME_LLM_PROVIDER env < --llm-provider CLI
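In sketch form, that precedence could be resolved like this (the helper name and signature are illustrative, not the merged runtime.py code):

```python
import os

def resolve_llm_provider(cli_flag=None, config_value=None, default="anthropic"):
    """Resolve the provider with priority: CLI flag > env var > config file > default."""
    # Highest priority: an explicit --llm-provider CLI flag
    if cli_flag:
        return cli_flag
    # Next: the CODEFRAME_LLM_PROVIDER environment variable
    env_value = os.getenv("CODEFRAME_LLM_PROVIDER")
    if env_value:
        return env_value
    # Then: the llm: block from .codeframe/config.yaml
    if config_value:
        return config_value
    # Finally: the historical default
    return default
```

For example, `resolve_llm_provider(cli_flag="openai", config_value="anthropic")` returns `"openai"` because the CLI flag outranks the config file.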

What was added to the LLMProvider ABC

  • LLMError, LLMAuthError, LLMRateLimitError, LLMConnectionError common exceptions
  • async_complete() default implementation (thread-executor wrapping complete())
  • AnthropicProvider.async_complete() — true async via AsyncAnthropic
  • OpenAIProvider.async_complete() — true async via openai.AsyncOpenAI
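The shape of those additions, reconstructed as an illustrative sketch (class and exception names are from the PR; the bodies are assumptions, not the merged code):

```python
import asyncio
from abc import ABC, abstractmethod

class LLMError(Exception):
    """Base class for provider-agnostic LLM failures."""

class LLMAuthError(LLMError):
    """Invalid or missing credentials."""

class LLMRateLimitError(LLMError):
    """Upstream rate limit hit; typically retryable."""

class LLMConnectionError(LLMError):
    """Network-level failure reaching the provider."""

class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages, **kwargs):
        """Synchronous completion; implemented by each concrete provider."""

    async def async_complete(self, messages, **kwargs):
        # Default: run the sync complete() on a thread executor so providers
        # without a native async SDK still work inside async agents.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, lambda: self.complete(messages, **kwargs)
        )
```

Providers with a native async SDK (AnthropicProvider, OpenAIProvider) override `async_complete()` rather than paying the thread-hop.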

Issue #548 (streaming_chat.py)

Not included in this PR. The issue notes that making streaming abstract is acceptable as a future step; streaming_chat.py still uses AsyncAnthropic internally but its lazy import is already isolated. Will be tracked separately.

Test plan

  • uv run pytest tests/adapters/test_llm_async.py — async provider methods (7 new tests)
  • uv run pytest tests/agents/test_worker_agent_provider.py — WorkerAgent + MockProvider (2 new)
  • uv run pytest tests/core/test_cli_llm_flags.py — CLI flag presence (4 new)
  • uv run pytest tests/core/test_config_llm.py — config.yaml llm: block (4 new)
  • uv run pytest tests/agents/test_worker_agent.py tests/agents/test_frontend_worker_agent.py — 57 existing tests pass
  • uv run ruff check . — no lint errors

Closes #545
Closes #546
Closes #547
Closes #549
Part of #542

Summary by CodeRabbit

  • New Features

    • Added --llm-provider and --llm-model CLI flags
    • Workspace LLM config block (provider, model, base_url)
    • Async LLM completion API supported across providers
  • Improvements

    • Distinct auth, rate‑limit, and connection error types
    • Refined retry behavior for transient LLM failures
    • Agents routed to a provider abstraction (removes vendor-specific runtime dependency)
  • Tests

    • New/updated tests for async providers, CLI flags, config parsing, provider-based agents
  • Documentation

    • CLI and config docs updated with LLM examples and precedence rules

Completes the remaining subissues of #542 (multi-provider LLM):

## #546/#547 — Worker agents use LLMProvider abstraction
- Add async_complete() + common exceptions (LLMAuthError, LLMRateLimitError,
  LLMConnectionError) to LLMProvider ABC in base.py
- Implement async_complete() in AnthropicProvider (via AsyncAnthropic)
  and OpenAIProvider (via openai.AsyncOpenAI) with error mapping
- Refactor worker_agent.py: remove AsyncAnthropic import, add llm_provider
  constructor param, use llm_provider.async_complete() in execute_task
- Refactor frontend_worker_agent.py and test_worker_agent.py: same pattern

## #545 — CLI flags --llm-provider and --llm-model
- Add --llm-provider / --llm-model to `cf work start` and `cf work batch run`
- Thread params through execute_agent() in runtime.py
- Add llm_provider/llm_model fields to BatchRun dataclass and start_batch()
- Pass flags through _execute_task_subprocess() subprocess invocations

## #549 — Workspace config llm: block
- Add LLMConfig dataclass to config.py with provider/model/base_url fields
- Add llm: Optional[LLMConfig] = None to EnvironmentConfig
- Update from_dict() to deserialise llm: block
- runtime.py resolves provider with priority: CLI flag > env var > config > default
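A minimal sketch of the config block described above (field names are from the PR; the `from_dict()` shape is an assumption):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LLMConfig:
    provider: str = "anthropic"
    model: Optional[str] = None
    base_url: Optional[str] = None  # e.g. an Ollama OpenAI-compatible endpoint

    @classmethod
    def from_dict(cls, data: dict) -> "LLMConfig":
        # Tolerate a partial llm: block: only override fields that are present.
        return cls(
            provider=data.get("provider", "anthropic"),
            model=data.get("model"),
            base_url=data.get("base_url"),
        )

# Parsing the example block from the PR description:
cfg = LLMConfig.from_dict({
    "provider": "openai",
    "model": "qwen2.5-coder:7b",
    "base_url": "http://localhost:11434/v1",
})
```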

Priority chain: config file < CODEFRAME_LLM_PROVIDER env < --llm-provider CLI

Closes #545, #546, #547, #549
Part of #542
@coderabbitai
Contributor

coderabbitai bot commented Apr 5, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Adds async completion APIs and unified LLM error types; implements provider-based async calls in Anthropic/OpenAI adapters; refactors agents to use an LLMProvider abstraction; exposes per-run CLI LLM flags and workspace LLM config; updates runtime/conductor wiring and tests.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| LLM Base Abstractions | codeframe/adapters/llm/base.py | Introduce LLMError and subclasses (LLMAuthError, LLMRateLimitError, LLMConnectionError) and add LLMProvider.async_complete(...) default implemented via asyncio.run_in_executor. |
| Provider Async Implementations | codeframe/adapters/llm/anthropic.py, codeframe/adapters/llm/openai.py | Add async_complete(...) coroutines that lazily init async SDK clients, mirror sync request construction, map SDK exceptions to the new LLMError types, and parse responses via existing parsers. |
| Agent refactors | codeframe/agents/worker_agent.py, codeframe/agents/frontend_worker_agent.py, codeframe/agents/test_worker_agent.py | Replace direct Anthropic SDK usage with self.llm_provider.async_complete(...); add llm_provider injection/resolution, adapt retry/error handling to unified LLM errors, and normalize response/usage fields. |
| CLI & Conductor wiring | codeframe/cli/app.py, codeframe/core/conductor.py | Add --llm-provider and --llm-model CLI options; extend BatchRun and subprocess invocation to propagate these flags. |
| Runtime provider selection | codeframe/core/runtime.py | Resolve provider/model with precedence CLI → env → workspace config → default; pass model/base_url overrides into the provider factory for builtin engines. |
| Workspace config | codeframe/core/config.py | Add LLMConfig dataclass and EnvironmentConfig.llm field; parse the llm: block (provider, model, base_url) in from_dict(). |
| Tests: async & provider changes | tests/adapters/test_llm_async.py, tests/agents/*, tests/core/* | Add async provider tests and LLM error-hierarchy tests; update agent tests to inject MockProvider instead of mocking Anthropic; add CLI flag and config parsing tests; minor lifecycle test cleanups. |
| CI / pytest config | .github/workflows/test.yml, pytest.ini | Remove step-level ANTHROPIC_API_KEY injection from the CI test step; add lifecycle pytest marker and usage hint. |
| Docs / Demos | CLAUDE.md, demo-pr552.md | Document new CLI flags, provider precedence, and an example .codeframe/config.yaml LLM block; add PR demo/walkthrough. |

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent / Worker
    participant Runtime as Runtime / Conductor
    participant Provider as LLMProvider
    participant SDK as Async SDK
    participant Parser as Response Parser

    Agent->>Runtime: start/execute_task(..., llm_provider/llm_model)
    Runtime->>Provider: async_complete(messages, purpose, max_tokens, system)
    Provider->>SDK: await messages.create(...) / chat.completions.create(...)
    SDK-->>Provider: raw SDK response
    Provider->>Parser: _parse_response(raw)
    Parser-->>Provider: LLMResponse (content, usage)
    Provider-->>Agent: return LLMResponse

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Poem

A rabbit hops through provider lanes, 🐇
Async pings and mapped error names,
Providers answer, parsers glean,
Flags and configs steer the scene,
I nibble tests and dance with gains. ✨

🚥 Pre-merge checks: ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'feat: multi-provider LLM support (#545–549)' clearly and concisely describes the main change: implementing support for multiple LLM providers across the codebase. |
| Linked Issues check | ✅ Passed | All four linked issues (#545, #546, #547, #549) have their core requirements addressed: CLI flags added [#545], worker_agent refactored to the LLMProvider abstraction [#546], frontend/test agents similarly refactored [#547], and workspace config support via the llm: block implemented [#549]. |
| Out of Scope Changes check | ✅ Passed | All code changes are scoped to the multi-provider LLM support objectives. The PR intentionally defers the streaming_chat.py refactor to issue #548, and test/documentation updates are directly related to the feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 90.00%, which is sufficient; the required threshold is 80.00%. |


@claude

claude bot commented Apr 5, 2026

Code Review — PR #552: Multi-provider LLM support (#545–549)

This is a well-structured refactor that successfully decouples agent code from the Anthropic SDK. The provider abstraction, exception hierarchy, config layering, and test updates are all correct in intent. A few issues need attention before merge.


Bug: asyncio.get_event_loop() is deprecated and wrong here

File: codeframe/adapters/llm/base.py, async_complete() default impl

loop = asyncio.get_event_loop()
return await loop.run_in_executor(...)

Since this method is async def, a loop is already running. get_event_loop() is deprecated since Python 3.10 and can raise DeprecationWarning (or worse, return the wrong loop in some runtime configurations). Replace with:

loop = asyncio.get_running_loop()
return await loop.run_in_executor(...)

Regression: Timeout removed from LLM calls

File: codeframe/agents/worker_agent.py, _call_llm_with_retry() / execute_task()

The original code intentionally calculated a dynamic timeout proportional to max_tokens:

# CRITICAL-1 FIX
timeout = base_timeout + (max_tokens / 1000.0) * timeout_per_1k_tokens

This has been removed with no replacement. LLM calls through any provider can now hang indefinitely if the upstream is slow/unresponsive. The TimeoutError is still listed in the @retry decorator but can no longer actually be raised from within async_complete. Consider wrapping the await self.llm_provider.async_complete(...) call with asyncio.wait_for(…, timeout=timeout) to restore this protection.
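One way to restore the guard, as a sketch (the wrapper name and timeout constants are illustrative, not prescriptive):

```python
import asyncio

async def call_with_timeout(provider, messages, max_tokens=4096,
                            base_timeout=60.0, timeout_per_1k_tokens=30.0):
    # Scale the deadline with the requested output size, as the old code did.
    timeout = base_timeout + (max_tokens / 1000.0) * timeout_per_1k_tokens
    try:
        return await asyncio.wait_for(
            provider.async_complete(messages, max_tokens=max_tokens),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # Re-raise as the builtin TimeoutError so the existing @retry
        # decorator's exception list still matches.
        raise TimeoutError(f"LLM call exceeded {timeout:.0f}s")
```

Cancellation semantics differ between providers: `wait_for` cancels the in-flight coroutine, which is safe for the native async SDK paths but only abandons (does not interrupt) a thread-executor-backed default `async_complete`.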


Lazy _async_client init pattern is fragile

Files: codeframe/adapters/llm/anthropic.py, codeframe/adapters/llm/openai.py

if not hasattr(self, "_async_client") or self._async_client is None:
    self._async_client = AsyncAnthropic(api_key=self.api_key)

hasattr guarding is unnecessary since __init__ should set _async_client = None. If _async_client isn't declared in __init__, add it there. The guarded pattern hides this omission and is confusing. Just use:

if self._async_client is None:
    self._async_client = AsyncAnthropic(api_key=self.api_key)

Fallback behavior silently dropped in frontend_worker_agent.py and test_worker_agent.py

Both agents previously fell back to template generation when no API client was available:

if not self.client:
    return self._generate_basic_component_template(spec)

These guards have been removed. The test_handle_missing_api_key test now says "via exception fallback" in its comment, but the actual exception path depends on what happens when get_provider("anthropic") is called without a valid API key — the lazy llm_provider property will still try to call get_provider(os.getenv("CODEFRAME_LLM_PROVIDER", "anthropic")) which may raise at a different layer. If the intent is to preserve graceful degradation, add a try/except around the async_complete call and fall back to the template on LLMError. If the intent is to surface failures loudly, update the test comment to say so explicitly and remove the misleading "fallback" language.


Minor: import asyncio inside async_complete in base.py

This is a deferred import inside an async method. Functionally fine, but inconsistent with the file's style. Since asyncio is part of the stdlib and has no side effects on import, consider moving it to the top-level imports for consistency.


Tests look solid overall

  • FailThenSucceedMockProvider, ErrorMockProvider, and RateLimitMockProvider cleanly replace the complex AsyncMock.side_effect chains from the old tests.
  • tests/core/test_config_llm.py covers the three priority tiers (default, env, config file) — well done.
  • tests/adapters/test_llm_async.py correctly exercises the MockProvider path without hitting real APIs.

One note: test_api_key_validation_rejects_invalid_format has been completely inverted — previously it tested that invalid keys were rejected; now it tests that they're ignored (which is the correct new behavior). The test name should be updated to reflect the new intent, e.g. test_injected_provider_bypasses_key_validation.


Summary

| Issue | Severity |
|---|---|
| get_event_loop() → get_running_loop() | High — deprecated, can fail in Python 3.10+ |
| Timeout removed from LLM calls | Medium — potential hang under slow/unresponsive providers |
| Fragile hasattr guard for _async_client | Low — cleanup |
| Missing fallback in frontend/test agents | Low — document the intent clearly |
| Stale test name | Low — cosmetic |

Fix the get_running_loop() issue and add back timeout protection before merging.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
tests/agents/test_frontend_worker_agent.py (1)

186-198: ⚠️ Potential issue | 🟡 Minor

Make the fallback deterministic.

_generate_react_component() now goes through self.llm_provider.async_complete(), so frontend_agent.client = None no longer triggers the fallback path. As written, this can pass without ever exercising the exception handling the test is named for.

🧪 Suggested change
     async def test_generate_component_api_fallback(self, frontend_agent):
         """Test component generation falls back on API failure."""
-        # Set client to None to trigger fallback
-        frontend_agent.client = None
+        failing_provider = Mock()
+        failing_provider.async_complete = AsyncMock(side_effect=RuntimeError("boom"))
+        frontend_agent._llm_provider = failing_provider
 
         spec = {"name": "FallbackComponent", "description": "Component with API failure"}
 
         code = await frontend_agent._generate_react_component(spec)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agents/test_frontend_worker_agent.py` around lines 186 - 198, The test
test_generate_component_api_fallback is no longer exercising the fallback
because setting frontend_agent.client = None doesn't make
_generate_react_component() raise; instead force the LLM call to fail by making
frontend_agent.llm_provider.async_complete raise an exception (e.g., set a
side_effect or patch it to raise) so _generate_react_component(spec) enters the
exception/fallback branch and returns the basic template; update the test to
patch/mock frontend_agent.llm_provider.async_complete (or the relevant async
method) to throw an error before calling _generate_react_component.
codeframe/cli/app.py (1)

2393-2404: ⚠️ Potential issue | 🔴 Critical

Provider-specific auth is still hardcoded to Anthropic.

Both preflight checks call require_anthropic_api_key() for every non-external engine before the LLM provider is resolved. That means cf work start --execute --llm-provider openai ... and the batch equivalent still fail unless an Anthropic key is set, which breaks the new multi-provider path. Validate credentials against the resolved provider (CLI > env > config), not just the engine type.

Also applies to: 3765-3775

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/cli/app.py` around lines 2393 - 2404, The preflight auth checks
hardcode Anthropic for any non-external engine; instead resolve the LLM provider
the same way the CLI does (check CLI flag, then env, then config) and call the
corresponding validator (require_openai_api_key, require_e2b_api_key,
require_anthropic_api_key) based on that resolved provider; update the execute
preflight block (the code around the engine check and is_external_engine(...)
usage) and the similar block at the other location (lines ~3765-3775) to compute
provider = resolve_llm_provider(...) and switch on provider to invoke the right
require_* validator rather than always calling require_anthropic_api_key for
non-external engines.
codeframe/adapters/llm/base.py (1)

45-50: ⚠️ Potential issue | 🟠 Major

Update the GENERATION default model.

DEFAULT_GENERATION_MODEL is still claude-3-5-haiku-20241022, so callers using the shared selector won't follow the repository's required GENERATION mapping.

♻️ Suggested change
-DEFAULT_GENERATION_MODEL = "claude-3-5-haiku-20241022"
+DEFAULT_GENERATION_MODEL = "claude-haiku-4-20250514"
As per coding guidelines, `codeframe/adapters/llm/**/*.py`: Model selection must follow the pattern: PLANNING → claude-sonnet-4-20250514, EXECUTION → claude-sonnet-4-20250514, GENERATION → claude-haiku-4-20250514.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/llm/base.py` around lines 45 - 50, The
DEFAULT_GENERATION_MODEL constant is out of date; update
DEFAULT_GENERATION_MODEL in the adapter constants (alongside
DEFAULT_PLANNING_MODEL, DEFAULT_EXECUTION_MODEL, DEFAULT_CORRECTION_MODEL,
DEFAULT_SUPERVISION_MODEL) to use the required generation mapping
"claude-haiku-4-20250514" so that callers using the shared selector follow the
repository's GENERATION model policy; locate and replace the current value of
DEFAULT_GENERATION_MODEL with "claude-haiku-4-20250514".
codeframe/agents/frontend_worker_agent.py (1)

33-63: ⚠️ Potential issue | 🟠 Major

FrontendWorkerAgent still has no public provider injection path.

This refactor switched generation to self.llm_provider, but the constructor still doesn't accept/forward llm_provider. Callers and tests now have to mutate _llm_provider privately to control the backend, and api_key is only retained as an attribute here. Please expose provider injection in __init__, and if api_key remains supported, consume it during provider creation instead of just storing it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/agents/frontend_worker_agent.py` around lines 33 - 63, The
FrontendWorkerAgent constructor lacks a public injection for the LLM provider
and currently only stores api_key; update __init__ to accept an llm_provider
parameter (or provider_config) and use it to set up self.llm_provider instead of
requiring callers to mutate _llm_provider privately, and if api_key is still
supported consume it when constructing the provider; specifically, add an
llm_provider argument to FrontendWorkerAgent.__init__, pass/forward that into
the provider creation logic used by the base class or set self.llm_provider
appropriately (using api_key to initialize the provider if provided), and remove
the expectation that callers must mutate _llm_provider directly.
codeframe/core/conductor.py (1)

2174-2194: ⚠️ Potential issue | 🟠 Major

Persist llm_provider / llm_model with the batch record.

BatchRun now carries these fields, but _save_batch() still writes only the old schema and _row_to_batch() still reconstructs only the old fields. After any DB round-trip—most notably resume_batch()—the override is lost and tasks rerun with env/default selection instead.

Also applies to: 2201-2216

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/conductor.py` around lines 2174 - 2194, The DB schema
persistence and reconstruction for BatchRun miss the new llm_provider and
llm_model fields: update _save_batch() to include llm_provider and llm_model in
the INSERT OR REPLACE INTO batch_runs columns and values (add corresponding
entries to the parameter tuple using batch.llm_provider and batch.llm_model) and
update _row_to_batch() to read these two columns from the cursor row and set
them on the reconstructed BatchRun (or pass them into the BatchRun constructor),
ensuring resumed batches retain their overridden provider/model; also apply the
same additions for the duplicate block referenced around lines 2201–2216.
codeframe/agents/test_worker_agent.py (1)

39-70: ⚠️ Potential issue | 🟠 Major

Expose llm_provider on this subclass too.

WorkerAgent now supports provider injection, but TestWorkerAgent.__init__() still only accepts provider/api_key and never forwards an injected provider. That forces callers to mutate _llm_provider or rely on env state, which defeats the new multi-provider path.

Suggested change
-from codeframe.adapters.llm.base import Purpose
+from codeframe.adapters.llm.base import Purpose, LLMProvider
...
     def __init__(
         self,
         agent_id: str,
         provider: str = "anthropic",
         maturity: AgentMaturity = AgentMaturity.D1,
         api_key: Optional[str] = None,
+        llm_provider: Optional[LLMProvider] = None,
         websocket_manager=None,
         max_correction_attempts: int = 3,
         db=None,
     ):
...
         super().__init__(
             agent_id=agent_id,
             agent_type="test",
             provider=provider,
             maturity=maturity,
             system_prompt=self._build_system_prompt(),
             db=db,
+            llm_provider=llm_provider,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/agents/test_worker_agent.py` around lines 39 - 70,
TestWorkerAgent.__init__ should accept and forward an injected llm_provider so
callers can use multi-provider injection: add a new parameter llm_provider:
Optional[...] = None to the TestWorkerAgent.__init__ signature (keep existing
provider and api_key for backwards compatibility), pass that through to
super().__init__ as llm_provider=llm_provider (instead of relying solely on
provider/env), and if necessary set self.llm_provider = llm_provider when
provided so the subclass exposes the injected provider; update any
docstring/parameter list accordingly.
🧹 Nitpick comments (1)
tests/agents/test_worker_agent.py (1)

424-428: Rename this test to match its actual expectation.

The test currently validates successful execution with injected MockProvider, but its name still says rejects_invalid_format, which is misleading.

Suggested rename
-async def test_api_key_validation_rejects_invalid_format(self, db):
+async def test_api_key_validation_ignored_with_injected_provider(self, db):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agents/test_worker_agent.py` around lines 424 - 428, The test function
currently named something like test_rejects_invalid_format in
tests/agents/test_worker_agent.py is misleading because it actually verifies
successful execution when a MockProvider is injected; rename the test to a clear
name such as test_executes_with_injected_MockProvider or
test_accepts_injected_MockProvider_without_key_validation, update the function
definition and any references/assert messages to the new name, and run the tests
to ensure no other code relies on the old name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@codeframe/adapters/llm/anthropic.py`:
- Around line 123-173: The sync method complete() currently fails to map
Anthropic SDK exceptions to the adapter's base exceptions (AuthenticationError →
LLMAuthError, RateLimitError → LLMRateLimitError, APIConnectionError →
LLMConnectionError) unlike async_complete(), causing inconsistent contracts;
update complete() to import the same SDK exceptions (AuthenticationError,
RateLimitError, APIConnectionError) and wrap its call to
self._client.messages.create(...) in a try/except that catches those three
exceptions and re-raises them as LLMAuthError, LLMRateLimitError, and
LLMConnectionError respectively (preserve original message and use "from exc"),
mirroring the async_complete() behavior and keeping function name complete() and
parsing logic (_parse_response) unchanged.

In `@codeframe/adapters/llm/openai.py`:
- Around line 149-197: The sync path (complete()) currently maps OpenAI SDK
exceptions to ValueError while async_complete() maps them to LLMAuthError /
LLMRateLimitError / LLMConnectionError, causing divergent contracts; update
complete() to import and raise the same adapter exceptions (LLMAuthError,
LLMRateLimitError, LLMConnectionError) when catching
_openai.AuthenticationError, _openai.RateLimitError, and
_openai.APIConnectionError respectively so both complete() and async_complete()
present the same exception types to callers (refer to the methods complete and
async_complete and the exception classes LLMAuthError, LLMRateLimitError,
LLMConnectionError).

In `@codeframe/agents/worker_agent.py`:
- Around line 279-284: The issue is that model_name overrides are not propagated
into the actual LLM call, causing cost/audit/metadata drift; update
_call_llm_with_retry (and the other call sites noted around execute_task) to
pass the model_name override through to self.llm_provider.async_complete (add
the model_name argument to the async_complete call) so the provider uses the
correct model, and ensure execute_task uses that same model_name when computing
cost limits, populating audit logs, and building the returned payload/metadata
(so reported model and cost guardrails match the real request).
- Around line 104-111: The llm_provider property currently ignores the
constructor-provided self.provider when lazily creating the adapter; update the
llm_provider getter so that when self._llm_provider is None it first uses
self.provider (if truthy) as the provider_type, otherwise falls back to
os.getenv("CODEFRAME_LLM_PROVIDER", "anthropic"), and then calls
get_provider(provider_type) to set self._llm_provider; reference the
llm_provider property, self.provider field, and get_provider to locate and
change the logic.
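The resolution order that prompt describes (constructor-supplied provider first, then the env var, then the default) can be sketched as (class and method names are illustrative):

```python
import os

class WorkerAgentSketch:
    def __init__(self, provider=None, llm_provider=None):
        self.provider = provider            # constructor-supplied provider name
        self._llm_provider = llm_provider   # injected provider instance, if any

    def _resolve_provider_type(self):
        # Honor the constructor arg before falling back to the env var,
        # so WorkerAgent(provider="openai") is not silently ignored.
        if self.provider:
            return self.provider
        return os.getenv("CODEFRAME_LLM_PROVIDER", "anthropic")
```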

In `@codeframe/core/runtime.py`:
- Around line 672-679: The Anthropic branch ignores llm_model because
provider_kwargs (with model/base_url) are only passed into get_provider for
non-external engines; update the Anthropic path so the constructed provider
receives those kwargs — either by calling get_provider(provider_type,
**provider_kwargs) for provider_type == "anthropic" or by passing
provider_kwargs into the AnthropicProvider constructor used in the else branch;
look for is_external_engine, provider_type, provider_kwargs, get_provider, and
AnthropicProvider and ensure model/base_url from
model_override/base_url_override are forwarded to the provider.
- Around line 643-661: The precedence for resolving base_url is inconsistent:
update base_url_override so environment variables win over workspace config
(match provider_type and model_override) by using the OPENAI_BASE_URL env var
before falling back to _cfg_base_url; locate base_url_override (and related
symbols _cfg_base_url, provider_type, model_override) in runtime.py and swap the
order so the env var is checked first, e.g., use the OPENAI_BASE_URL env value
or else _cfg_base_url.

In `@tests/agents/test_worker_agent_provider.py`:
- Around line 3-6: The test module imports unused symbols causing lint
failures—remove the unused imports MagicMock and patch from the unittest.mock
import, and remove AgentMaturity from the codeframe.core.models import so only
the actually used imports (MockProvider, WorkerAgent, and Task) remain; update
the import lines that contain MagicMock, patch, and AgentMaturity accordingly to
keep the file lint-clean.

In `@tests/agents/test_worker_agent.py`:
- Around line 665-677: The test currently only verifies that a warning was
logged for prompt sanitization but doesn't assert that execution continued;
update the test around agent.execute_task(task) to explicitly assert the
non-blocking outcome by checking the returned result (variable result) or the
expected success indicator (e.g., result.success or result.status ==
"completed") or that the downstream execution method was invoked, using the
existing agent.execute_task and task objects; add a concise assertion after the
logger check to ensure the task completed successfully.
- Around line 19-20: This test module is missing the required v2 pytest marker;
add the marker by either setting module-level pytestmark = pytest.mark.v2 at the
top of the module or decorating the test functions with `@pytest.mark.v2` so the
tests that import MockProvider, LLMRateLimitError, and LLMConnectionError are
correctly marked as v2; ensure you import pytest if adding pytestmark or using
the decorator.

In `@tests/core/test_cli_llm_flags.py`:
- Line 4: The test imports unused mocks: remove the dead imports "patch" and
"MagicMock" from the import statement so the file no longer imports unused
symbols; either delete the entire "from unittest.mock import patch, MagicMock"
line or replace it with only mocks actually used in the file (e.g., remove both
symbols if none are used) to satisfy ruff.

In `@tests/core/test_config_llm.py`:
- Around line 3-4: Remove the unused imports tempfile and Path from the import
block at the top of the test module; locate the import line that currently reads
"import tempfile" and "from pathlib import Path" and delete those import tokens
so the test file no longer imports unused symbols (tempfile, Path) and Ruff/CI
will stop failing.

_generate_react_component(spec) enters the exception/fallback branch and returns
the basic template; update the test to patch/mock
frontend_agent.llm_provider.async_complete (or the relevant async method) to
throw an error before calling _generate_react_component.

---

Nitpick comments:
In `@tests/agents/test_worker_agent.py`:
- Around line 424-428: The test function currently named something like
test_rejects_invalid_format in tests/agents/test_worker_agent.py is misleading
because it actually verifies successful execution when a MockProvider is
injected; rename the test to a clear name such as
test_executes_with_injected_MockProvider or
test_accepts_injected_MockProvider_without_key_validation, update the function
definition and any references/assert messages to the new name, and run the tests
to ensure no other code relies on the old name.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e023dfd8-9d47-4d37-810d-09eb12b30009

📥 Commits

Reviewing files that changed from the base of the PR and between bdb6686 and 8749493.

📒 Files selected for processing (16)
  • codeframe/adapters/llm/anthropic.py
  • codeframe/adapters/llm/base.py
  • codeframe/adapters/llm/openai.py
  • codeframe/agents/frontend_worker_agent.py
  • codeframe/agents/test_worker_agent.py
  • codeframe/agents/worker_agent.py
  • codeframe/cli/app.py
  • codeframe/core/conductor.py
  • codeframe/core/config.py
  • codeframe/core/runtime.py
  • tests/adapters/test_llm_async.py
  • tests/agents/test_frontend_worker_agent.py
  • tests/agents/test_worker_agent.py
  • tests/agents/test_worker_agent_provider.py
  • tests/core/test_cli_llm_flags.py
  • tests/core/test_config_llm.py

Comment on lines +123 to +173
```python
async def async_complete(
    self,
    messages: list[dict],
    purpose: Purpose = Purpose.EXECUTION,
    tools: Optional[list[Tool]] = None,
    max_tokens: int = 4096,
    temperature: float = 0.0,
    system: Optional[str] = None,
) -> LLMResponse:
    """True async completion via AsyncAnthropic.

    Raises LLMAuthError / LLMRateLimitError / LLMConnectionError on failure.
    """
    from anthropic import AsyncAnthropic
    from anthropic import (
        AuthenticationError,
        RateLimitError,
        APIConnectionError,
    )
    from codeframe.adapters.llm.base import (
        LLMAuthError,
        LLMRateLimitError,
        LLMConnectionError,
    )

    if not hasattr(self, "_async_client") or self._async_client is None:
        self._async_client = AsyncAnthropic(api_key=self.api_key)

    model = self.get_model(purpose)
    kwargs: dict = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": self._convert_messages(messages),
    }
    if temperature > 0:
        kwargs["temperature"] = temperature
    if system:
        kwargs["system"] = system
    if tools:
        kwargs["tools"] = self._convert_tools(tools)

    try:
        response = await self._async_client.messages.create(**kwargs)
        return self._parse_response(response)
    except AuthenticationError as exc:
        raise LLMAuthError(str(exc)) from exc
    except RateLimitError as exc:
        raise LLMRateLimitError(str(exc)) from exc
    except APIConnectionError as exc:
        raise LLMConnectionError(str(exc)) from exc
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```shell
# Check the current state of the complete() method in anthropic.py
cd codeframe/adapters/llm && wc -l anthropic.py

# Get the full anthropic.py to see both complete() and async_complete() methods
cat -n codeframe/adapters/llm/anthropic.py | head -150

# Search for the complete() method to see if it has exception mapping
rg "def complete" codeframe/adapters/llm/anthropic.py -A 30

# Check the base adapter interface to understand the contract
cat codeframe/adapters/llm/base.py

# Search for SupervisorResolver and _classify_with_supervision
fd -t f "*.py" | xargs rg "SupervisorResolver" --files-with-matches

# Look at the SupervisorResolver implementation
rg "class SupervisorResolver" -A 50

# Search for _classify_with_supervision in conductor.py
rg "_classify_with_supervision" codeframe/core/conductor.py -A 20

# Get more context around SupervisorResolver methods
rg "def " codeframe/core/conductor.py | grep -A 5 -B 5 "SupervisorResolver"

# Check what exceptions the base interface expects complete() to handle or raise
rg "def complete" codeframe/adapters/llm/base.py -A 15

# Get the full _classify_with_supervision method to see exception handling
rg "_classify_with_supervision" codeframe/core/conductor.py -A 40
```

Apply the same exception mapping to complete().

The sync complete() method (lines 77–121) does not normalize Anthropic SDK exceptions to the base adapter interface, while async_complete() implements full exception mapping (AuthenticationError → LLMAuthError, RateLimitError → LLMRateLimitError, APIConnectionError → LLMConnectionError).

This creates divergent error contracts between sync and async paths. SupervisorResolver._classify_with_supervision() calls complete() and catches all exceptions broadly, falling back to a heuristic—behavior that masks authentication and rate-limit failures, which should fail fast rather than silently degrade. Per coding guidelines, adapters must implement the base interface uniformly for integration consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/llm/anthropic.py` around lines 123 - 173, The sync method
complete() currently fails to map Anthropic SDK exceptions to the adapter's base
exceptions (AuthenticationError → LLMAuthError, RateLimitError →
LLMRateLimitError, APIConnectionError → LLMConnectionError) unlike
async_complete(), causing inconsistent contracts; update complete() to import
the same SDK exceptions (AuthenticationError, RateLimitError,
APIConnectionError) and wrap its call to self._client.messages.create(...) in a
try/except that catches those three exceptions and re-raises them as
LLMAuthError, LLMRateLimitError, and LLMConnectionError respectively (preserve
original message and use "from exc"), mirroring the async_complete() behavior
and keeping function name complete() and parsing logic (_parse_response)
unchanged.
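The requested fix is a mechanical try/except wrapper. A minimal, self-contained sketch of the mapping pattern (the four `LLM*` exceptions mirror the PR's base adapter; the three SDK error classes below are local stand-ins for anthropic's `AuthenticationError` / `RateLimitError` / `APIConnectionError` so the snippet runs in isolation):

```python
# Adapter exception hierarchy, as added to the LLMProvider ABC in this PR.
class LLMError(Exception): ...
class LLMAuthError(LLMError): ...
class LLMRateLimitError(LLMError): ...
class LLMConnectionError(LLMError): ...

# Stand-ins for the Anthropic SDK exception classes.
class AuthenticationError(Exception): ...
class RateLimitError(Exception): ...
class APIConnectionError(Exception): ...

def complete_with_mapping(create_fn):
    """Run the SDK call and normalise its errors to the adapter exceptions,
    mirroring async_complete()'s behavior on the sync path."""
    try:
        return create_fn()
    except AuthenticationError as exc:
        raise LLMAuthError(str(exc)) from exc
    except RateLimitError as exc:
        raise LLMRateLimitError(str(exc)) from exc
    except APIConnectionError as exc:
        raise LLMConnectionError(str(exc)) from exc
```

With this in place, both sync and async callers can catch the same adapter exception types regardless of the backing SDK.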

Comment on lines +149 to +197
```python
async def async_complete(
    self,
    messages: list[dict],
    purpose: Purpose = Purpose.EXECUTION,
    tools: Optional[list[Tool]] = None,
    max_tokens: int = 4096,
    temperature: float = 0.0,
    system: Optional[str] = None,
) -> LLMResponse:
    """True async completion via openai.AsyncOpenAI.

    Raises LLMAuthError / LLMRateLimitError / LLMConnectionError on failure.
    """
    import openai as _openai
    from codeframe.adapters.llm.base import (
        LLMAuthError,
        LLMRateLimitError,
        LLMConnectionError,
    )

    if not hasattr(self, "_async_client") or self._async_client is None:
        self._async_client = _openai.AsyncOpenAI(
            api_key=self.api_key, base_url=self.base_url
        )

    converted = self._convert_messages(messages)
    if system:
        converted = [{"role": "system", "content": system}] + converted

    kwargs: dict = {
        "model": self.get_model(purpose),
        "max_tokens": max_tokens,
        "messages": converted,
        "temperature": temperature,
    }
    if tools:
        kwargs["tools"] = self._convert_tools(tools)
        kwargs["tool_choice"] = "auto"

    try:
        response = await self._async_client.chat.completions.create(**kwargs)
        return self._parse_response(response)
    except _openai.AuthenticationError as exc:
        raise LLMAuthError(str(exc)) from exc
    except _openai.RateLimitError as exc:
        raise LLMRateLimitError(str(exc)) from exc
    except _openai.APIConnectionError as exc:
        raise LLMConnectionError(str(exc)) from exc
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed:

```shell
# Examine the complete() method in openai.py
rg -A 50 "def complete\(" codeframe/adapters/llm/openai.py

# Check the base adapter interface to understand the contract
cat -n codeframe/adapters/llm/base.py | head -100

# Get the full structure of the openai.py file
wc -l codeframe/adapters/llm/openai.py

# Check the base class for abstract method definitions
rg -B 5 -A 20 "def complete|def async_complete" codeframe/adapters/llm/base.py

# Verify the OpenAIProvider class implements the interface
rg -B 2 "class OpenAIProvider" codeframe/adapters/llm/openai.py

# Confirm the LLMProvider base class contract
rg -B 5 -A 30 "class LLMProvider" codeframe/adapters/llm/base.py

# Find other adapter implementations to compare exception handling
fd "\.py$" codeframe/adapters/llm/ | grep -v base.py | grep -v openai.py

# Check the Anthropic adapter's exception handling for sync and async
rg -B 5 -A 30 "def complete\(|def async_complete\(" codeframe/adapters/llm/anthropic.py | head -150

# Check the mock adapter to understand the pattern
rg -B 5 -A 30 "def complete\(|def async_complete\(" codeframe/adapters/llm/mock.py

# Find callers of the sync complete() method to gauge impact
rg "\.complete\(" codeframe/ --type py | grep -v "async_complete" | head -20
```


Sync and async paths have divergent exception contracts.

async_complete() raises LLMAuthError, LLMRateLimitError, and LLMConnectionError, but complete() catches the same OpenAI SDK exceptions and converts them to ValueError. Callers using the sync path will handle different exception types than those using the async path, violating the uniform adapter interface requirement stated in the coding guidelines: "Adapters (LLM providers in codeframe/adapters/llm/) must implement the base adapter interface for uniform integration."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/adapters/llm/openai.py` around lines 149 - 197, The sync path
(complete()) currently maps OpenAI SDK exceptions to ValueError while
async_complete() maps them to LLMAuthError / LLMRateLimitError /
LLMConnectionError, causing divergent contracts; update complete() to import and
raise the same adapter exceptions (LLMAuthError, LLMRateLimitError,
LLMConnectionError) when catching _openai.AuthenticationError,
_openai.RateLimitError, and _openai.APIConnectionError respectively so both
complete() and async_complete() present the same exception types to callers
(refer to the methods complete and async_complete and the exception classes
LLMAuthError, LLMRateLimitError, LLMConnectionError).

Comment on lines +104 to +111
```python
@property
def llm_provider(self) -> LLMProvider:
    """Return the LLM provider, initialising from env vars if not supplied."""
    if self._llm_provider is None:
        import os
        provider_type = os.getenv("CODEFRAME_LLM_PROVIDER", "anthropic")
        self._llm_provider = get_provider(provider_type)
    return self._llm_provider
```

⚠️ Potential issue | 🟠 Major

Honor the constructor provider when lazily creating the adapter.

When _llm_provider is absent, this property ignores self.provider and resolves only from CODEFRAME_LLM_PROVIDER or "anthropic". That makes WorkerAgent(provider="openai") instantiate the wrong backend unless the env var happens to match.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/agents/worker_agent.py` around lines 104 - 111, The llm_provider
property currently ignores the constructor-provided self.provider when lazily
creating the adapter; update the llm_provider getter so that when
self._llm_provider is None it first uses self.provider (if truthy) as the
provider_type, otherwise falls back to os.getenv("CODEFRAME_LLM_PROVIDER",
"anthropic"), and then calls get_provider(provider_type) to set
self._llm_provider; reference the llm_provider property, self.provider field,
and get_provider to locate and change the logic.
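A sketch of the corrected lookup, with a stub factory standing in for the real `get_provider` so it runs standalone (the real property would keep the module's existing `get_provider` import):

```python
import os

class _ProviderStub:
    """Stand-in for a real provider instance; only records its type."""
    def __init__(self, name):
        self.name = name

def get_provider(provider_type):
    # Stand-in for codeframe.adapters.llm.get_provider
    return _ProviderStub(provider_type)

class WorkerAgent:
    def __init__(self, provider=None):
        self.provider = provider
        self._llm_provider = None

    @property
    def llm_provider(self):
        if self._llm_provider is None:
            # Constructor arg first, then the env var, then the default.
            provider_type = (
                self.provider
                or os.getenv("CODEFRAME_LLM_PROVIDER")
                or "anthropic"
            )
            self._llm_provider = get_provider(provider_type)
        return self._llm_provider
```

With this ordering, `WorkerAgent(provider="openai")` resolves to the OpenAI backend even when the env var is unset or points elsewhere.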

Comment on lines 279 to 284
```python
return await self.llm_provider.async_complete(
    messages=messages,
    purpose=Purpose.EXECUTION,
    max_tokens=max_tokens,
    system=system,
    timeout=timeout,
)
```

⚠️ Potential issue | 🟠 Major

model_name is now bookkeeping only.

execute_task() still uses model_name for cost limits, audit logs, and the returned payload, but _call_llm_with_retry() never passes that override into async_complete(). The request uses whatever model the provider was constructed with, so cost guardrails and reported metadata can drift from the actual call.

Also applies to: 419-478

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/agents/worker_agent.py` around lines 279 - 284, The issue is that
model_name overrides are not propagated into the actual LLM call, causing
cost/audit/metadata drift; update _call_llm_with_retry (and the other call sites
noted around execute_task) to pass the model_name override through to
self.llm_provider.async_complete (add the model_name argument to the
async_complete call) so the provider uses the correct model, and ensure
execute_task uses that same model_name when computing cost limits, populating
audit logs, and building the returned payload/metadata (so reported model and
cost guardrails match the real request).

Comment on lines +643 to +661
```python
# Resolve LLM provider: CLI flag → env var → workspace config → default "anthropic"
from codeframe.core.config import load_environment_config as _load_cfg
_env_cfg = _load_cfg(workspace.repo_path)
_cfg_provider = _env_cfg.llm.provider if (_env_cfg and _env_cfg.llm) else None
_cfg_model = _env_cfg.llm.model if (_env_cfg and _env_cfg.llm) else None
_cfg_base_url = _env_cfg.llm.base_url if (_env_cfg and _env_cfg.llm) else None

provider_type = (
    llm_provider
    or os.getenv("CODEFRAME_LLM_PROVIDER")
    or _cfg_provider
    or "anthropic"
)
model_override = (
    llm_model
    or os.getenv("CODEFRAME_LLM_MODEL")
    or _cfg_model
)
base_url_override = _cfg_base_url or os.getenv("OPENAI_BASE_URL")
```

⚠️ Potential issue | 🟠 Major

Keep base_url precedence consistent with the rest of LLM resolution.

base_url_override = _cfg_base_url or os.getenv("OPENAI_BASE_URL") makes workspace config win over env, while provider_type and model_override use env over config. That blocks env-level overrides for local endpoints.

Suggested change

```diff
-    base_url_override = _cfg_base_url or os.getenv("OPENAI_BASE_URL")
+    base_url_override = os.getenv("OPENAI_BASE_URL") or _cfg_base_url
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@codeframe/core/runtime.py` around lines 643 - 661, The precedence for
resolving base_url is inconsistent: update base_url_override so environment
variables win over workspace config (match provider_type and model_override) by
using the OPENAI_BASE_URL env var before falling back to _cfg_base_url; locate
base_url_override (and related symbols _cfg_base_url, provider_type,
model_override) in runtime.py and swap the order so the env var is checked
first, e.g., use the OPENAI_BASE_URL env value or else _cfg_base_url.

Comment on lines +19 to +20
```python
from codeframe.adapters.llm import MockProvider
from codeframe.adapters.llm.base import LLMRateLimitError, LLMConnectionError
```

⚠️ Potential issue | 🟠 Major

Add the required v2 marker to this test module.

This updated test module under tests/** is missing @pytest.mark.v2 (or module-level pytestmark = pytest.mark.v2), which violates the repo test-marking rule.

Suggested patch

```diff
 import pytest
 import os
 from unittest.mock import Mock, AsyncMock, patch
 from codeframe.agents.worker_agent import WorkerAgent
 from codeframe.core.models import Task, AgentMaturity, CallType, TaskStatus
 from codeframe.persistence.database import Database
 from codeframe.adapters.llm import MockProvider
 from codeframe.adapters.llm.base import LLMRateLimitError, LLMConnectionError

+pytestmark = pytest.mark.v2
+
```

As per coding guidelines: tests/**/*.py: New v2 Python tests must be marked with @pytest.mark.v2 decorator or pytestmark = pytest.mark.v2.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agents/test_worker_agent.py` around lines 19 - 20, This test module is
missing the required v2 pytest marker; add the marker by either setting
module-level pytestmark = pytest.mark.v2 at the top of the module or decorating
the test functions with `@pytest.mark.v2` so the tests that import MockProvider,
LLMRateLimitError, and LLMConnectionError are correctly marked as v2; ensure you
import pytest if adding pytestmark or using the decorator.

Comment on lines +665 to +677
```python
# Should log warning but still execute (sanitization is defensive, not blocking)
with patch("codeframe.agents.worker_agent.logger") as mock_logger:
    result = await agent.execute_task(task)

    # Check that warning was logged
    mock_logger.warning.assert_any_call(
        "Potential prompt injection detected",
        extra={
            "event": "prompt_injection_attempt",
            "phrase": "ignore all previous instructions",
            "agent_id": "test-001"
        }
    )
```

⚠️ Potential issue | 🟡 Minor

Assert the non-blocking outcome explicitly.

Line 665 says sanitization should warn and still execute, but the test only checks logging. Add a completion assertion so the behavior is actually enforced.

Suggested patch

```diff
         with patch("codeframe.agents.worker_agent.logger") as mock_logger:
             result = await agent.execute_task(task)
+            assert result["status"] == "completed"

             # Check that warning was logged
             mock_logger.warning.assert_any_call(
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agents/test_worker_agent.py` around lines 665 - 677, The test currently
only verifies that a warning was logged for prompt sanitization but doesn't
assert that execution continued; update the test around agent.execute_task(task)
to explicitly assert the non-blocking outcome by checking the returned result
(variable result) or the expected success indicator (e.g., result.success or
result.status == "completed") or that the downstream execution method was
invoked, using the existing agent.execute_task and task objects; add a concise
assertion after the logger check to ensure the task completed successfully.

@claude

claude bot commented Apr 5, 2026

PR Review: feat: multi-provider LLM support (#545–549)

Solid implementation overall. The abstraction is clean, tests are well-refactored, and the priority chain (CLI > env > config > default) is exactly right. A few issues worth addressing before merge:


Bug: Empty content after response extraction

In codeframe/agents/worker_agent.py, after the provider refactor:

```python
content = response.content
if not content:
    logger.warning(f"Empty response from LLM for task {task_id}")
```

The original code had content = "" as a fallback when empty. Now if response.content is falsy, content is set to None/empty string but execution continues to use content below without a fallback assignment. This could cause a NameError or unexpected behavior in the output dict. Suggest:

```python
content = response.content or ""
if not content:
    logger.warning(...)
```

asyncio.get_event_loop() deprecation in base.py

In the default async_complete implementation:

```python
loop = asyncio.get_event_loop()
return await loop.run_in_executor(...)
```

asyncio.get_event_loop() is deprecated in Python 3.10+ and raises a DeprecationWarning (and can raise RuntimeError in Python 3.12+ in some contexts). Since this method is always called from within a running event loop (it's async def), use:

```python
loop = asyncio.get_running_loop()
```
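For context, the base class's default thread-executor fallback with the non-deprecated call would look roughly like this (a self-contained sketch, assuming concrete adapters supply a blocking `complete()`; the stub body here stands in for a real provider):

```python
import asyncio
from functools import partial

class LLMProviderBase:
    """Sketch of the ABC's default async path: offload sync complete() to a thread."""

    def complete(self, messages, **kwargs):
        # Blocking implementation supplied by the concrete adapter;
        # stubbed here so the sketch runs standalone.
        return {"content": "ok", "message_count": len(messages)}

    async def async_complete(self, messages, **kwargs):
        # Safe inside an async method: a loop is guaranteed to be running.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, partial(self.complete, messages, **kwargs)
        )
```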

Timeout removed (regression from prior CRITICAL-1 fix)

The original _call_llm_with_retry calculated and passed an adaptive timeout to the Anthropic client:

```python
# CRITICAL-1 FIX: Calculate timeout based on max_tokens
base_timeout = 30.0
timeout_per_1k_tokens = 15.0
timeout = base_timeout + (max_tokens / 1000.0) * timeout_per_1k_tokens
```

The new async_complete API has no timeout parameter, so this protection is silently gone. For the WorkerAgent path specifically (which can request up to 8k tokens), this means LLM calls can hang indefinitely. The stall detector covers the ReAct engine, but WorkerAgent.execute_task calls don't go through that path.

A follow-up issue (#548-adjacent) to add timeout_s: Optional[float] to async_complete would preserve this. For now, at minimum update the CRITICAL-1 comment so it's clear this was intentionally deferred, not lost.
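One way to restore the guardrail without touching the provider signature is to bound the call at the agent layer. A sketch, using the old CRITICAL-1 constants; `async_complete` here is any awaitable-returning callable, and the wrapper name is illustrative:

```python
import asyncio

BASE_TIMEOUT_S = 30.0
TIMEOUT_PER_1K_TOKENS_S = 15.0

def adaptive_timeout(max_tokens):
    """Same formula as the original CRITICAL-1 fix."""
    return BASE_TIMEOUT_S + (max_tokens / 1000.0) * TIMEOUT_PER_1K_TOKENS_S

async def call_with_timeout(async_complete, messages, max_tokens=4096, **kwargs):
    """Bound the provider call so a hung request raises asyncio.TimeoutError."""
    return await asyncio.wait_for(
        async_complete(messages=messages, max_tokens=max_tokens, **kwargs),
        timeout=adaptive_timeout(max_tokens),
    )
```

The retry decorator's existing TimeoutError handling then stays meaningful for the WorkerAgent path.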


_async_client lazy-init via hasattr pattern

Both anthropic.py and openai.py use:

```python
if not hasattr(self, "_async_client") or self._async_client is None:
    self._async_client = AsyncAnthropic(api_key=self.api_key)
```

The hasattr check is unnecessary since _async_client should always exist if initialized in __init__. Consider declaring self._async_client: Optional[AsyncAnthropic] = None in __init__ and simplifying to if self._async_client is None:. This also avoids the subtle issue where the attribute doesn't exist yet if __init__ is ever bypassed (e.g., object.__new__).


Minor: test mock signature divergence

ErrorMockProvider and RateLimitMockProvider use **kwargs in async_complete:

```python
async def async_complete(self, **kwargs):
```

The base class uses explicit positional parameters. While functional for testing, this won't catch callers accidentally passing messages positionally instead of as a keyword argument. Matching the full signature (or at least *args, **kwargs) would be more robust.


CI change looks correct

Removing ANTHROPIC_API_KEY from the test env is appropriate — tests now use MockProvider throughout. Just confirm no tests in the tests/ tree still require a real key without being marked @pytest.mark.lifecycle.


Overall: The provider abstraction is well-executed and the test refactoring is much cleaner than the previous patch("anthropic.AsyncAnthropic") approach. The empty-content bug and get_event_loop deprecation are the only changes I'd consider blockers; the timeout regression is worth tracking but acceptable as a known deferral if documented.

@claude

claude bot commented Apr 5, 2026

Follow-up Review — open items from previous reviews

The latest commits update test files, CI config, and pytest.ini. The test refactoring looks clean. However, the three issues flagged in my prior two reviews are still unresolved in the diff:

Still open: get_event_loop() deprecation

codeframe/adapters/llm/base.py still uses asyncio.get_event_loop() inside an async def method. This is deprecated in Python 3.10+ and can raise RuntimeError in Python 3.12+. Since a running loop is always present inside an async method, change to asyncio.get_running_loop().

Still open: empty content fallback

In codeframe/agents/worker_agent.py the refactored block sets content = response.content and logs a warning if falsy, but no longer assigns content = "" as the original did. Downstream code receives None where it expects a string. Fix: content = response.content or ""

Still open: timeout regression needs a comment

The CRITICAL-1 adaptive timeout has been removed with no replacement or tracking note. The TimeoutError in the retry decorator is now dead weight for the WorkerAgent path. If intentionally deferred, add a comment referencing the tracking issue so it does not look accidental.

The lifecycle test updates and pytestmark additions in the latest push look correct. The two blockers above (get_running_loop and content or "") are one-liner fixes worth landing before merge.

Test User added 2 commits April 5, 2026 14:30
- base.py: use asyncio.get_running_loop() (not deprecated get_event_loop()),
  move asyncio import to top-level
- worker_agent.py: content = response.content or "" (prevent None downstream),
  restore CRITICAL-1 adaptive timeout via asyncio.wait_for()
- anthropic.py + openai.py: declare _async_client=None in __init__,
  simplify lazy-init guard (hasattr not needed)
- test_worker_agent.py: match async_complete() signatures in error mock
  subclasses; rename stale test to test_injected_provider_bypasses_key_validation
@claude

claude bot commented Apr 5, 2026

Review — Multi-Provider LLM Support

Good architectural work overall. CodeRabbit's review already covers the main structural issues (hardcoded Anthropic preflight auth, missing DB persistence of llm_provider/llm_model in BatchRun, sync complete() not mapping to the new exception types, etc.). I won't repeat those — they should all be addressed.

Here are a few additional issues I found that weren't flagged:


1. asyncio.get_event_loop() is deprecated in Python 3.10+

The async_complete() default impl in codeframe/adapters/llm/base.py uses asyncio.get_event_loop(). Since this method is only called from within an already-running event loop, use asyncio.get_running_loop() instead — it's explicit, non-deprecated, and raises RuntimeError rather than silently creating a new loop if called from a non-async context.


2. Timeout regression — tasks can now hang indefinitely

The original execute_task computed timeout = base_timeout + (max_tokens / 1000.0) * timeout_per_1k_tokens and passed it to _call_llm_with_retry. Both the calculation and the parameter are gone in this PR, and there's no provider-level timeout in their place. A stalled or slow provider will now block the worker indefinitely.

The stall-detector operates at a coarser granularity and won't catch a hung HTTP request mid-flight. At minimum the timeout should be reinstated — either as an optional timeout_s parameter on async_complete() in the base class, or by wrapping the provider call in asyncio.wait_for() inside _call_llm_with_retry.


3. _async_client is never closed — connection leak

Both AnthropicProvider.async_complete() and OpenAIProvider.async_complete() lazily create an _async_client as an instance attribute but never close it. The underlying httpx.AsyncClient holds an open connection pool that won't be released until garbage collection. Consider adding async def aclose(self) to the LLMProvider ABC and having the agent lifecycle call it, or at least implementing __del__ as a best-effort fallback.


4. llm_provider lazy property ignores CODEFRAME_LLM_MODEL and OPENAI_BASE_URL

worker_agent.py's llm_provider property only reads CODEFRAME_LLM_PROVIDER when initializing the fallback provider. But runtime.py also reads CODEFRAME_LLM_MODEL and OPENAI_BASE_URL and forwards them to get_provider(). The gap means these two env vars are silently ignored for any code path that doesn't go through runtime.execute_agent — including direct agent construction in tests. The lazy init should mirror runtime's resolution logic (or delegate to a shared resolve_llm_provider() helper).
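A possible shape for that shared helper — the env var names come from the PR, but the helper itself and its return convention are hypothetical:

```python
import os

def resolve_llm_provider_kwargs() -> dict:
    """Shared env resolution so runtime.py and the lazy property agree (sketch)."""
    kwargs = {"provider_type": os.getenv("CODEFRAME_LLM_PROVIDER", "anthropic")}
    if model := os.getenv("CODEFRAME_LLM_MODEL"):
        kwargs["model"] = model
    if base_url := os.getenv("OPENAI_BASE_URL"):
        kwargs["base_url"] = base_url
    return kwargs
```

Both call sites would then do get_provider(**resolve_llm_provider_kwargs()), and a future env var only needs to be added in one place.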


Minor: ErrorMockProvider in exhaustion test has no call counter

test_retry_exhaustion_returns_failure now uses ErrorMockProvider but never asserts call count. The original test checked call_count == 3 to verify tenacity actually ran 3 attempts. Adding a call_count attribute to ErrorMockProvider and asserting it equals 3 would restore that coverage.


The LLMProvider ABC, exception hierarchy, and CLI priority chain are all clean. The timeout regression (#2) is the most important item here — it's a genuine reliability regression, and the adaptive timeout existed prior to this PR for a reason.

@claude

claude bot commented Apr 6, 2026

Follow-up Review — final open items

Good progress on this PR. The main blockers from my previous reviews have all been addressed:

  • asyncio.get_running_loop() — fixed ✅
  • content = response.content or "" — fixed ✅
  • CRITICAL-1 adaptive timeout via asyncio.wait_for — restored ✅
  • _async_client = None declared in __init__ for both providers — fixed ✅
  • Test name test_injected_provider_bypasses_key_validation — renamed correctly ✅
  • FailThenSucceedMockProvider._call_count + assertion — added ✅

Two items from my last review are still open:


1. llm_provider lazy property ignores CODEFRAME_LLM_MODEL and OPENAI_BASE_URL

codeframe/agents/worker_agent.py, llm_provider property:

@property
def llm_provider(self) -> LLMProvider:
    if self._llm_provider is None:
        provider_type = os.getenv("CODEFRAME_LLM_PROVIDER", "anthropic")
        self._llm_provider = get_provider(provider_type)   # ← model/base_url never applied
    return self._llm_provider

runtime.py now correctly reads CODEFRAME_LLM_MODEL and OPENAI_BASE_URL and forwards them to get_provider(). Any code path that constructs a WorkerAgent directly (tests, future callers) won't pick up those two env vars. If a user sets CODEFRAME_LLM_MODEL=qwen2.5-coder:7b and OPENAI_BASE_URL=http://localhost:11434/v1 but doesn't go through runtime, the lazy init silently ignores them. The fix is small:

model = os.getenv("CODEFRAME_LLM_MODEL")
base_url = os.getenv("OPENAI_BASE_URL")
self._llm_provider = get_provider(
    provider_type,
    **({"model": model} if model else {}),
    **({"base_url": base_url} if base_url else {}),
)

2. test_retry_exhaustion_returns_failure no longer asserts retry count

tests/agents/test_worker_agent.py — the exhaustion test now uses ErrorMockProvider (which always raises), but the assert call_count == 3 was removed without a replacement:

agent._llm_provider = ErrorMockProvider(default_response="Task completed")
result = await agent.execute_task(task)
assert result["status"] == "failed"
assert "Failed after 3 retry attempts" in result["output"]
# ← no call_count assertion here

ErrorMockProvider, unlike FailThenSucceedMockProvider, has no _call_count. Adding one and asserting agent._llm_provider._call_count == 3 restores the coverage that was there before.


3. _async_client connection leak (tracked from prior review, still unaddressed)

Both AnthropicProvider and OpenAIProvider lazily create an _async_client (httpx-backed) but never close it. This was flagged two reviews ago. If this is intentionally deferred to a follow-up issue, a comment referencing the issue number in the provider code would close the loop.


Overall: The PR is very close to merge-ready. Item 1 is a genuine behavioral gap; items 2 and 3 are lower-priority but should not be forgotten.

@frankbria frankbria merged commit 21785ae into main Apr 6, 2026
10 of 11 checks passed
@frankbria frankbria deleted the feat/multi-provider-llm-545-549 branch April 6, 2026 02:23
@claude

claude bot commented Apr 6, 2026

Review: feat: multi-provider LLM support (#545–549)

The abstraction is solid — clean exception hierarchy, proper async implementations for both providers, and MockProvider making tests much simpler. The threading of llm_provider/llm_model through CLI → conductor → runtime subprocess is consistent. Here are the issues worth addressing before merge:


Bug: base_url priority is inverted

In codeframe/core/runtime.py:

base_url_override = _cfg_base_url or os.getenv("OPENAI_BASE_URL")

This makes the config file win over the env var, contradicting the documented priority (CLI > env > config). Fix:

base_url_override = os.getenv("OPENAI_BASE_URL") or _cfg_base_url

Dev artifact committed to the repo

demo-pr552.md is 150 lines of acceptance-criteria demo output. It shouldn't be checked in — it'll accumulate with every PR and there's no obvious home for it. Remove it from the tree before merge.


Missing catch-all for base LLMError

worker_agent.execute_task() catches LLMAuthError and (LLMRateLimitError, LLMConnectionError, TimeoutError) separately, but not the base LLMError. If a provider raises the base class directly (or a future subclass not in this list), it propagates unhandled. An except LLMError as e catch-all after the specific handlers would turn such errors into a failed task result instead of an unhandled exception.


Fallback behavior change is subtle but works

The old if not self.client: return self._generate_basic_component_template(spec) guard was explicit. The new path relies on the try block catching the ValueError raised by the lazy llm_provider property when no API key is available. Since the property access happens inside try, this works — but the test comment "exception fallback" is the only signal. Consider a brief inline comment in _generate_react_component and _generate_pytest_tests noting that the provider access can raise and that the existing except is intentionally broad.


Minor: lifecycle test comment still references ANTHROPIC_API_KEY

pytest.ini was updated with a comment # ANTHROPIC_API_KEY=sk-ant-... pytest -m lifecycle. Since ANTHROPIC_API_KEY was deliberately removed from the CI env, it would be slightly cleaner to generalize it to CODEFRAME_LLM_PROVIDER=anthropic ANTHROPIC_API_KEY=sk-ant-... to show it's provider-agnostic.


What's good

  • Exception hierarchy (LLMError → LLMAuthError / LLMRateLimitError / LLMConnectionError) is the right shape — provider-agnostic and tenacity-compatible.
  • async_complete defaults in the ABC (thread-executor wrapping complete()) let existing providers opt in without a rewrite.
  • MockProvider-based tests are much cleaner than the patched AsyncAnthropic mocks they replace.
  • Removing the ANTHROPIC_API_KEY env requirement from CI is the right move now that MockProvider handles unit tests.
  • The _async_client lazy init pattern is consistent between AnthropicProvider and OpenAIProvider.

The base_url priority inversion and the demo file are the two things that need to land before this ships. Everything else is informational.
