feat: orchestration quota, streaming, tools, classify, and vision APIs #37
stackbilt-admin left a comment
CodeBeast Review — PR #37 (feat: orchestration quota, streaming, tools, classify, and vision APIs)
Verdict: APPROVE with 4 findings (1 HIGH, 3 MEDIUM)
This is a big PR — 1330 additions across 12 files, closing 7 issues. The surface area is wide (streaming, tool loops, quota hooks, classify, vision, AI Gateway, balance reporting) but each piece is reasonably isolated. Reviewing against the edge-auth #102 context: the QuotaHook interface is exactly what foodfiles will wire up to consumeAnonymousQuota for demo traffic.
HIGH
H-1: Tool loop cost check is post-hoc — can overshoot maxCostUSD by one full response.
```typescript
// factory.ts — generateResponseWithTools
const response = await this.generateResponse({ ...request, messages });
cumulativeCost += response.usage.cost;
if (opts.maxCostUSD !== undefined && cumulativeCost > opts.maxCostUSD) {
  throw new ToolLoopLimitError(...);
}
```

The cost check happens after the response is received and billed. If a tool loop is at $0.09 with a $0.10 cap and the next iteration costs $0.05, the loop will execute that iteration (total $0.14) and only then throw. For expensive models (Opus at $75/1M output tokens), a single overshoot could be significant.
Fix: Add a pre-flight check: if (opts.maxCostUSD !== undefined && cumulativeCost + estimatedCost > opts.maxCostUSD) before calling generateResponse. The estimate won't be exact but it'll prevent obvious overshoots. Document that maxCostUSD is a soft cap with ±1 iteration tolerance.
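A minimal sketch of the suggested pre-flight guard, assuming the previous iteration's cost is used as the estimate for the next call. The shapes below (ToolCall, LoopResponse, LoopOptions, runToolLoop and its generate/executeTool callbacks) are hypothetical simplifications, not the factory's real signatures. The original post-hoc check is kept as a backstop because the estimate is not exact.

```typescript
// Hypothetical, simplified shapes; the PR's real request/response types differ.
interface ToolCall {
  name: string;
  args: unknown;
}

interface LoopResponse {
  message: string;
  toolCalls: ToolCall[];
  costUSD: number;
}

interface LoopOptions {
  maxIterations: number;
  maxCostUSD?: number;
  signal?: AbortSignal;
}

class ToolLoopLimitError extends Error {}
class ToolLoopAbortedError extends Error {}

async function runToolLoop(
  generate: (history: string[]) => Promise<LoopResponse>,
  executeTool: (call: ToolCall) => Promise<unknown>,
  opts: LoopOptions,
): Promise<{ response: LoopResponse; cumulativeCost: number; iterations: number }> {
  const history: string[] = [];
  let cumulativeCost = 0;
  // Cost of the previous call, used as the estimate for the next one (0 before the first call).
  let lastCost = 0;

  for (let i = 0; i < opts.maxIterations; i++) {
    if (opts.signal?.aborted) throw new ToolLoopAbortedError('tool loop aborted');

    // Pre-flight: don't start a call that would likely push the total over the cap.
    if (opts.maxCostUSD !== undefined && cumulativeCost + lastCost > opts.maxCostUSD) {
      throw new ToolLoopLimitError(`next call estimated to exceed maxCostUSD (${opts.maxCostUSD})`);
    }

    const response = await generate(history);
    cumulativeCost += response.costUSD;
    lastCost = response.costUSD;

    // Post-hoc backstop: the estimate is not exact, so keep the original check too.
    if (opts.maxCostUSD !== undefined && cumulativeCost > opts.maxCostUSD) {
      throw new ToolLoopLimitError(`cumulative cost ${cumulativeCost} exceeded maxCostUSD (${opts.maxCostUSD})`);
    }

    // No tool calls means the model produced a final answer.
    if (response.toolCalls.length === 0) {
      return { response, cumulativeCost, iterations: i + 1 };
    }

    history.push(response.message);
    for (const call of response.toolCalls) {
      try {
        history.push(JSON.stringify(await executeTool(call)));
      } catch (err) {
        // Tool errors are fed back to the model instead of aborting the loop.
        history.push(JSON.stringify({ error: String(err) }));
      }
    }
  }
  throw new ToolLoopLimitError(`maxIterations (${opts.maxIterations}) reached`);
}
```

With $0.04 per call and a $0.10 cap, this stops after two calls ($0.08 spent) because the estimated third call would reach $0.12; the post-hoc check alone would have let the third call run before throwing.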
MEDIUM
M-1: classify() JSON extraction fallback is fragile.
```typescript
private parseJsonResponse(message: string): unknown {
  try {
    return JSON.parse(message);
  } catch {
    const start = message.indexOf('{');
    const end = message.lastIndexOf('}');
    if (start >= 0 && end > start) {
      return JSON.parse(message.slice(start, end + 1));
    }
    throw new ConfigurationError('factory', 'Classification response was not valid JSON');
  }
}
```

The indexOf('{') / lastIndexOf('}') heuristic will break on markdown-fenced JSON (```json\n{...}\n```) where the braces inside the fence are valid but the trailing text contains a } of its own. It will also extract the wrong span if the model returns commentary with { characters before the actual JSON. The response_format: { type: 'json_object' } setting on the request should prevent this, but not all providers honor it equally (Cloudflare Workers AI notably doesn't support json_object mode for all models).
Suggestion: Strip markdown fences before the brace search. Or better: if JSON.parse fails on the full message, try stripping ```json...``` first, then fall back to brace extraction.
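A sketch of the suggested parse order (full message, then fenced body, then brace extraction). ClassificationParseError is a hypothetical stand-in for the PR's ConfigurationError, and the fence regex is an assumption about what models typically emit. Unlike the original, the brace fallback here is also wrapped so a failed parse ends in the classification error rather than a raw SyntaxError.

```typescript
// Hypothetical stand-in for the PR's ConfigurationError.
class ClassificationParseError extends Error {}

// Matches a ```json ... ``` (or bare ```) fenced block and captures its body.
const FENCE_RE = /```(?:json)?\s*([\s\S]*?)\s*```/i;

function parseJsonResponse(message: string): unknown {
  const trimmed = message.trim();

  // 1. Happy path: the whole message is JSON.
  try {
    return JSON.parse(trimmed);
  } catch {
    // fall through
  }

  // 2. Markdown-fenced JSON: parse only the fenced body.
  const fenced = FENCE_RE.exec(trimmed);
  if (fenced) {
    try {
      return JSON.parse(fenced[1]);
    } catch {
      // fall through to brace extraction
    }
  }

  // 3. Last resort: the original outermost-brace heuristic.
  const start = trimmed.indexOf('{');
  const end = trimmed.lastIndexOf('}');
  if (start >= 0 && end > start) {
    try {
      return JSON.parse(trimmed.slice(start, end + 1));
    } catch {
      // fall through to the error below
    }
  }
  throw new ClassificationParseError('Classification response was not valid JSON');
}
```

A response such as "Sure {see below}: ```json {...} ``` Anything else {?}" defeats step 3 on its own, because the outermost braces span the commentary, but step 2 recovers the fenced object.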
M-2: getDefaultVisionModel() hardcodes model IDs.
```typescript
private getDefaultVisionModel(): string | undefined {
  if (this.providers.has('anthropic')) return 'claude-haiku-4-5-20251001';
  if (this.providers.has('openai')) return 'gpt-4o-mini';
  return undefined;
}
```

These are hardcoded model IDs that will go stale. claude-haiku-4-5-20251001 is already a dated snapshot. When Anthropic ships 4.6 Haiku, this still routes to 4.5. Not a bug today, but a maintenance trap.
Suggestion: Pull from the provider's getModelCapabilities() and filter for supportsVision: true, picking the cheapest. Or at minimum, make these configurable on ProviderFactoryConfig.
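One way to combine both suggestions, as a sketch: an explicit defaultVisionModel override wins, otherwise the cheapest vision-capable model across registered providers is chosen. The ModelCapability shape (especially inputCostPer1M) and pickDefaultVisionModel are assumptions; the real getModelCapabilities() return type may differ.

```typescript
// Hypothetical capability shape; the real getModelCapabilities() result may differ.
interface ModelCapability {
  id: string;
  supportsVision: boolean;
  inputCostPer1M: number; // assumed field: USD per 1M input tokens
}

interface VisionConfig {
  defaultVisionModel?: string; // optional override, as proposed for ProviderFactoryConfig
}

function pickDefaultVisionModel(
  config: VisionConfig,
  capabilitiesByProvider: Map<string, ModelCapability[]>,
): string | undefined {
  // An explicit override always wins.
  if (config.defaultVisionModel) return config.defaultVisionModel;

  // Otherwise pick the cheapest vision-capable model (input cost only, a simplification).
  let best: ModelCapability | undefined;
  for (const models of capabilitiesByProvider.values()) {
    for (const m of models) {
      if (m.supportsVision && (best === undefined || m.inputCostPer1M < best.inputCostPer1M)) {
        best = m;
      }
    }
  }
  return best?.id;
}
```

Returning undefined when nothing qualifies preserves the current contract, so analyzeImage() can keep reporting "no vision-capable provider" the same way.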
M-3: Groq healthCheck indentation drift.
```diff
   async healthCheck(): Promise<boolean> {
     try {
-      const response = await this.makeGroqRequest('/models', null, 'GET');
+    const response = await this.makeGroqRequest('/models', null, 'GET');
```

The indentation of the const response line changed from 6 spaces to 4. Cosmetic, but it'll trigger lint noise and make git blame noisy. Likely a whitespace artifact from the stacked PR rebase.
What's good
- QuotaHook design: check before dispatch, record after response, and quotaFailPolicy: 'closed' | 'open' form a clean contract. The fail-open path logs a warning and continues; fail-closed re-throws as QuotaExceededError. This is exactly the pattern foodfiles needs to wire EDGE_AUTH.consumeAnonymousQuota into the factory for demo traffic.
- Streaming with pre-first-chunk fallback: openStreamWithFirstChunk reads the first chunk before returning the stream to the caller. If that first read throws, the factory catches it and falls through to the next provider. This is the right boundary: once the first chunk is delivered, the stream is committed.
- AI Gateway header forwarding: getAIGatewayHeaders() in BaseProvider only activates when baseUrl contains gateway.ai.cloudflare.com, so providers hitting their native APIs don't get polluted with cf-aig-* headers. The metadata composition (requestId, tenantId, custom) gives good observability through the gateway dashboard.
- analyzeImage() routing: vision-only requests filter the provider chain via providerSupportsVision(). Cloudflare Workers AI (no vision) gets excluded. The supportsVision flag on the LLMProvider interface makes this extensible.
- Tool loop: maxIterations, maxCostUSD, AbortSignal, onIteration callback, and cumulative cost in response metadata. The tool result serialization handles both success (JSON.stringify(output)) and error cases, and the message accumulation correctly preserves the conversation context across iterations.
- Test coverage: 7 new factory test cases covering streaming fallback, quota hooks (allow/deny), tool loops, classify with confidence, and vision routing. The Groq AI Gateway header test is thorough (verifies exact header values).
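The pre-first-chunk boundary praised in the streaming item can be sketched with async iterators (used here instead of ReadableStream for brevity). openStreamWithFirstChunk mirrors the PR's name, but this signature, ChunkSource, streamWithFallback, and AllProvidersFailedError are assumptions for illustration.

```typescript
// A provider attempt: returns an async iterator of text chunks (hypothetical shape).
type ChunkSource = () => AsyncIterator<string>;

class AllProvidersFailedError extends Error {}

// Read the first chunk eagerly. Any failure up to this point is "pre-commit":
// nothing has reached the caller yet, so another provider can still be tried.
async function openStreamWithFirstChunk(open: ChunkSource): Promise<AsyncIterable<string>> {
  const it = open();
  const first = await it.next();
  return {
    // Single-use: replays the buffered first chunk, then drains the rest.
    async *[Symbol.asyncIterator]() {
      if (first.done) return;
      yield first.value;
      // Post-commit: errors from here on propagate to the consumer; no fallback.
      while (true) {
        const next = await it.next();
        if (next.done) return;
        yield next.value;
      }
    },
  };
}

async function streamWithFallback(providers: ChunkSource[]): Promise<AsyncIterable<string>> {
  const errors: unknown[] = [];
  for (const open of providers) {
    try {
      return await openStreamWithFirstChunk(open);
    } catch (err) {
      errors.push(err); // failed before the first chunk: try the next provider
    }
  }
  throw new AllProvidersFailedError(`all ${errors.length} providers failed before the first chunk`);
}
```

The design choice is that fallback is only safe while the caller has seen nothing; switching providers mid-stream would splice two different completions together.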
Edge-auth #102 integration note
The QuotaHook is the bridge. Foodfiles will implement:
```typescript
quotaHook: {
  check: async (input) => {
    const result = await env.EDGE_AUTH.checkAnonymousQuota({
      projectSlug: 'foodfiles',
      feature: input.model.includes('vision') ? 'images' : 'generations',
      clientIp: request.headers.get('CF-Connecting-IP')!,
    });
    return { allowed: result.allowed, remainingBudget: result.remaining };
  },
  record: async (input) => {
    await env.EDGE_AUTH.consumeAnonymousQuota({
      projectSlug: 'foodfiles',
      feature: input.model.includes('vision') ? 'images' : 'generations',
      clientIp: request.headers.get('CF-Connecting-IP')!,
    });
  },
}
```

This wires the anonymous quota tier directly into the LLM provider factory for demo traffic. The quotaFailPolicy: 'open' setting would be appropriate for demos: if edge-auth is unreachable, let the demo work rather than breaking the landing page. One caveat: input.model.includes('vision') won't match the current default vision models (claude-haiku-4-5-20251001, gpt-4o-mini), so image requests would be metered as 'generations' unless the feature is derived some other way.
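To make the fail-open/fail-closed contract concrete, here is a sketch of how a factory might wrap dispatch with a quota hook. withQuota, the generic QuotaHook&lt;I&gt; shape, and swallowing record() failures with a warning are assumptions; only the check-before/record-after order and the QuotaExceededError behavior come from the review.

```typescript
interface QuotaCheckResult {
  allowed: boolean;
  remainingBudget?: number;
}

// Hypothetical generic shape; the PR's QuotaHook types its inputs concretely.
interface QuotaHook<I> {
  check: (input: I) => Promise<QuotaCheckResult>;
  record: (input: I) => Promise<void>;
}

type QuotaFailPolicy = 'closed' | 'open';

class QuotaExceededError extends Error {}

async function withQuota<I, R>(
  hook: QuotaHook<I>,
  policy: QuotaFailPolicy,
  input: I,
  dispatch: () => Promise<R>,
): Promise<R> {
  let result: QuotaCheckResult;
  try {
    result = await hook.check(input);
  } catch (err) {
    // The hook itself failed (e.g. edge-auth unreachable): the policy decides.
    if (policy === 'closed') throw new QuotaExceededError(`quota check failed: ${String(err)}`);
    console.warn('quota check failed; continuing (fail-open)', err);
    result = { allowed: true };
  }
  // A definitive denial is enforced regardless of policy.
  if (!result.allowed) throw new QuotaExceededError('quota exhausted');

  const response = await dispatch();
  try {
    await hook.record(input);
  } catch (err) {
    // Assumption: a failed record should not discard an already-billed response.
    console.warn('quota record failed', err);
  }
  return response;
}
```

Note that fail-open only covers hook errors: a hook that answers allowed: false still blocks the request, which keeps the demo tier honest when edge-auth is healthy.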
Ship it after addressing H-1 (tool loop cost overshoot).
🤖 Reviewed by CodeBeast (0xDEBT666F)
H-1: Add pre-flight cost guard to tool loop — uses previous iteration's cost as estimate to prevent obvious maxCostUSD overshoots.
M-1: Strip markdown fences before brace extraction in classify() JSON parser so ```json...``` responses parse correctly.
M-2: Make default vision model configurable via defaultVisionModel on ProviderFactoryConfig, falling back to hardcoded defaults.
M-3: Fix Groq healthCheck indentation drift (6→4 space artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Stacked on #36. Adds the remaining factory-level orchestration APIs and provider capabilities:
- getProviderBalance(provider?): ledger-backed balance reporting with an observability event; attempts the Anthropic API, and Groq explicitly reports balance as unavailable.
- generateResponseStream(): factory-level streaming with pre-first-chunk fallback across the provider chain.
- QuotaHook with quotaFailPolicy: 'closed' | 'open' for consumer-side quota enforcement; quota check before the request, record after the response.
- generateResponseWithTools(): tool-loop helper with maxIterations, maxCostUSD, AbortSignal support, an onIteration callback, and cumulative cost tracking.
- AI Gateway header forwarding when baseUrl points at a gateway endpoint.
- classify(): JSON response parsing, optional schema parser support, confidence extraction, and seed passthrough.
- analyzeImage() plus image input support and vision-capable model routing for OpenAI/Anthropic-style providers.

New types:
ToolExecutor, ToolLoopOptions, ToolLoopState, QuotaHook, QuotaCheckInput, QuotaRecordInput, ClassifyOptions, ClassifyResult, AnalyzeImageInput, ProviderBalance.

New errors:
ToolLoopAbortedError, ToolLoopLimitError.

Closes #25, closes #26, closes #27, closes #28, closes #29, closes #30, closes #35
Test plan
- npm test: 184 tests pass across 11 suites
- npm run typecheck: clean
- QuotaHook fail-closed rejects and fail-open allows on hook error
- Tool loop: maxIterations, maxCostUSD, and abortSignal
- classify() returns parsed JSON with confidence extraction
- analyzeImage() routes to vision-capable models

🤖 Generated with Claude Code