This module implements the AI engine abstraction layer — the core of the multi-AI system. Here's a thorough review.
Architecture
AIEngine (ABC)
├── GeminiEngine — Google genai SDK
├── OpenAIEngine — OpenAI-compatible (GPT, Grok, Local/Ollama)
└── ClaudeEngine — Anthropic SDK
All three share a common pattern:
call() → trim history → build messages → loop with auto-continue → update history → return
Key Design Pattern: Auto-Continue
Every engine implements a truncation recovery loop — if the model hits its output token limit, the engine automatically re-prompts to continue. This is a genuinely useful feature, but the implementation has issues.
Critical Issues
1. Duplicate initialize_engines() — config.py vs engines.py
There are two initialize_engines() functions: one in config.py and one in engines.py. They do the same thing differently:
| Aspect | config.py version | engines.py version |
|---|---|---|
| Gemini SDK | google.generativeai (old) | google.genai (new) |
| GeminiEngine args | 2 args (no client) | 3 args (with client) |
| max_tokens_key | Not passed for grok/local | Passed per-engine |
This is a conflict. Whichever runs last wins, but the config.py version would crash because GeminiEngine.__init__ requires 3 arguments (name, model_name, client), yet config.py only passes 2.
2. Gemini system prompt sent as "role": "model"
```python
if self.system_prompt:
    contents.append({"role": "model", "parts": [{"text": self.system_prompt}]})
```
This pretends the system prompt is a model response, which is semantically wrong. The Gemini API supports a dedicated system_instruction parameter in the generation config. This hack can cause the model to behave unpredictably.
Fix:
```python
response = self.client.models.generate_content(
    model=self.model_name,
    contents=contents,  # without system prompt
    config={
        "max_output_tokens": self.max_output_tokens,
        "system_instruction": self.system_prompt or None,
    },
)
```
3. GeminiEngine error handling doesn't wrap in AIError
```python
# GeminiEngine.call()
except Exception as e:
    logger.error(f"Gemini Error: {e}")
    raise  # ← raw exception

# OpenAIEngine.call()
except Exception as e:
    logger.error(f"{self.name} API Error: {e}")
    raise AIError(f"{self.name} error: {e}")  # ← wrapped
```
This inconsistency means callers can't uniformly catch AIError for all engines.
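A minimal fix is to mirror the OpenAI handler in the Gemini path. Sketched here with stand-in definitions for `AIError` and `logger` (the wrapper function is illustrative; in the module this would be the `except` block inside `GeminiEngine.call()`):

```python
import logging

logger = logging.getLogger(__name__)

class AIError(Exception):
    """Uniform error type raised by every engine."""

def gemini_call(raw_call):
    """Run a Gemini API call and normalize any failure into AIError."""
    try:
        return raw_call()
    except Exception as e:
        logger.error(f"Gemini API Error: {e}")
        # Chain the original exception so the traceback is preserved.
        raise AIError(f"Gemini error: {e}") from e
```

With this in place, callers can catch a single `AIError` regardless of which engine failed.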
4. Gemini auto-continue heuristic is fragile
```python
if answer_chunk.count("```") % 2 == 1:
    return True
if answer_chunk.rstrip().endswith((",", ":", "(", "[", "{")):
    return True
```
These heuristics check answer_chunk (the latest chunk), not full_answer. On round 2+, this only examines the continuation fragment, not the accumulated text. An odd number of triple-backticks in a chunk doesn't necessarily mean truncation — the opening backtick could be in a previous chunk.
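A more robust check would evaluate the accumulated text instead of the latest fragment. A minimal sketch (the function name and signature are illustrative, not from the module):

```python
def looks_truncated(full_answer: str) -> bool:
    """Heuristic sketch: inspect the accumulated answer, not the last chunk.

    An odd fence count across the whole answer means a code block is still
    open; a trailing opener character suggests the model stopped mid-structure.
    """
    if full_answer.count("```") % 2 == 1:  # unclosed fence overall
        return True
    return full_answer.rstrip().endswith((",", ":", "(", "[", "{"))
```

Because the count is taken over `full_answer`, a fence opened in round 1 and closed in round 2 no longer registers as a false truncation.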
5. _trim_history is called twice per call()
```python
def call(self, prompt: str) -> str:
    self._trim_history()  # ← before building messages
    # ... API call ...
    self.history.append(...)
    self.history.append(...)
    self._trim_history()  # ← after appending
```
The first trim is redundant: history is always trimmed at the end of the previous call, and if scrub() or load_persona() run between calls, they reset history entirely, so it is already within bounds when call() starts.
Moderate Issues
6. Decaying tail_chars only in Gemini
```python
# GeminiEngine only:
eff_tail_chars = max(300, int(tail_chars * (0.8 ** (round_idx - 1))))
```
OpenAI and Claude use a fixed tail_chars across rounds. If the decay is deliberate, document why only Gemini gets it; otherwise unify the behavior across all three engines.
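If the decay is kept, it could live in the base class so every engine shares one schedule. A hedged sketch (helper name and defaults are illustrative, taken from the Gemini-only expression above):

```python
def effective_tail_chars(tail_chars: int, round_idx: int,
                         floor: int = 300, decay: float = 0.8) -> int:
    """Shared decay schedule: each auto-continue round shrinks the context
    tail by `decay`, but never below `floor` characters."""
    return max(floor, int(tail_chars * decay ** (round_idx - 1)))
```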
7. No timeout configuration
None of the API calls specify timeouts. A hung API call will block indefinitely. All three SDKs support timeout parameters.
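The SDKs' own timeout parameters are the right long-term fix; as an SDK-agnostic stopgap, a thread-based bound can be wrapped around any blocking call. A sketch (helper name is illustrative):

```python
import concurrent.futures

def call_with_timeout(fn, timeout_s: float = 60.0, *args, **kwargs):
    """Bound a blocking API call at timeout_s seconds.

    Note: this only stops waiting; it cannot cancel the underlying HTTP
    request, whose worker thread may linger until the call returns on its
    own. Prefer the native client timeout options where supported.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, *args, **kwargs).result(timeout=timeout_s)
    finally:
        pool.shutdown(wait=False)  # don't block on a hung worker
```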
8. Claude passes empty string for system when no persona is set
```python
system=self.system_prompt if self.system_prompt else "",
```
The Anthropic API accepts system as optional. Passing an empty string is wasteful at best and could subtly influence behavior. Better:
```python
kwargs = {"model": self.model_name, "max_tokens": self.max_tokens, "messages": messages}
if self.system_prompt:
    kwargs["system"] = self.system_prompt
response = self.client.messages.create(**kwargs)
```
Suggested Refactoring: Extract the Auto-Continue Loop
All three engines duplicate the same loop structure. This can be extracted:
```python
class AIEngine(ABC):
    def call(self, prompt: str) -> str:
        self._trim_history()
        messages = self._build_messages(prompt)
        full_answer = self._auto_continue_loop(messages)
        self.history.append({"role": "user", "content": prompt})
        self.history.append({"role": "assistant", "content": full_answer})
        self._trim_history()
        return full_answer

    def _auto_continue_loop(self, messages) -> str:
        full_answer = ""
        for round_idx in range(1, self._max_rounds + 1):
            chunk, truncated = self._single_call(messages)
            full_answer += chunk
            if not truncated:
                break
            messages = self._append_continue(messages, chunk, full_answer, round_idx)
        else:
            # for/else: runs only if the round limit was exhausted without a break
            full_answer += "\n\n[TRUNCATED: auto-continue limit reached]\n"
        return full_answer

    @abstractmethod
    def _build_messages(self, prompt: str) -> list: ...

    @abstractmethod
    def _single_call(self, messages) -> tuple[str, bool]: ...

    @abstractmethod
    def _append_continue(self, messages, chunk, full, round_idx) -> list: ...
```
This would eliminate ~60% of the duplicated code across the three engine classes.
Summary
| Severity | Issue | Location |
|---|---|---|
| 🔴 Critical | Duplicate & conflicting initialize_engines() | config.py vs engines.py |
| 🔴 Critical | System prompt as model role in Gemini | GeminiEngine.call() |
| 🟠 High | Inconsistent error wrapping (AIError) | GeminiEngine.call() |
| 🟠 High | Fragile truncation heuristics on chunk, not full | _hit_output_limit() |
| 🟡 Medium | No API timeouts | All engines |
| 🟡 Medium | Duplicated auto-continue loop | All three call() methods |
| 🟢 Low | Double _trim_history() calls | All engines |
| 🟢 Low | Empty string system prompt to Claude | ClaudeEngine.call() |