feat: orchestration quota, streaming, tools, classify, and vision APIs#37

Merged

stackbilt-admin merged 2 commits into fix-factory-contract-bugs from knock-out-remaining-issues on Apr 12, 2026
Conversation

@stackbilt-admin
Member

Summary

Stacked on #36. Adds the remaining factory-level orchestration APIs and provider capabilities:

New types: ToolExecutor, ToolLoopOptions, ToolLoopState, QuotaHook, QuotaCheckInput, QuotaRecordInput, ClassifyOptions, ClassifyResult, AnalyzeImageInput, ProviderBalance.
New errors: ToolLoopAbortedError, ToolLoopLimitError.

Closes #25, closes #26, closes #27, closes #28, closes #29, closes #30, closes #35

Test plan

  • npm test — 184 tests pass across 11 suites
  • npm run typecheck — clean
  • Verify streaming fallback triggers on pre-first-chunk failure
  • Verify QuotaHook fail-closed rejects and fail-open allows on hook error
  • Verify tool loop respects maxIterations, maxCostUSD, and abortSignal
  • Verify classify() returns parsed JSON with confidence extraction
  • Verify analyzeImage() routes to vision-capable models

🤖 Generated with Claude Code

Member Author

@stackbilt-admin left a comment


CodeBeast Review — PR #37 (feat: orchestration quota, streaming, tools, classify, and vision APIs)

Verdict: APPROVE with 4 findings (1 HIGH, 3 MEDIUM)

This is a big PR — 1330 additions across 12 files, closing 7 issues. The surface area is wide (streaming, tool loops, quota hooks, classify, vision, AI Gateway, balance reporting) but each piece is reasonably isolated. Reviewing against the edge-auth #102 context: the QuotaHook interface is exactly what foodfiles will wire up to consumeAnonymousQuota for demo traffic.


HIGH

H-1: Tool loop cost check is post-hoc — can overshoot maxCostUSD by one full response.

```ts
// factory.ts — generateResponseWithTools
const response = await this.generateResponse({ ...request, messages });
cumulativeCost += response.usage.cost;
if (opts.maxCostUSD !== undefined && cumulativeCost > opts.maxCostUSD) {
  throw new ToolLoopLimitError(...);
}
```

The cost check happens after the response is received and billed. If a tool loop is at $0.09 with a $0.10 cap and the next iteration costs $0.05, the loop will execute (total $0.14) and then throw. For expensive models (Opus at $75/1M output tokens), a single overshoot could be significant.

Fix: Add a pre-flight check: if (opts.maxCostUSD !== undefined && cumulativeCost + estimatedCost > opts.maxCostUSD) before calling generateResponse. The estimate won't be exact but it'll prevent obvious overshoots. Document that maxCostUSD is a soft cap with ±1 iteration tolerance.
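A minimal sketch of that pre-flight guard, pulled out as a standalone function. The name `assertWithinBudget` and the strategy of reusing the previous iteration's cost as the estimate are assumptions for illustration, not the PR's actual code:

```ts
// Sketch of a pre-flight budget guard. Assumption: the previous
// iteration's cost is a reasonable estimate for the next call.
class ToolLoopLimitError extends Error {}

interface LoopOptions {
  maxCostUSD?: number;
}

function assertWithinBudget(
  cumulativeCost: number,
  lastIterationCost: number,
  opts: LoopOptions,
): void {
  // Estimate the next call at the previous iteration's cost. This keeps
  // maxCostUSD a soft cap with roughly ±1 iteration tolerance, but
  // prevents the obvious overshoot of dispatching when already near it.
  const estimated = cumulativeCost + lastIterationCost;
  if (opts.maxCostUSD !== undefined && estimated > opts.maxCostUSD) {
    throw new ToolLoopLimitError(
      `Estimated cost $${estimated.toFixed(4)} would exceed cap $${opts.maxCostUSD}`,
    );
  }
}
```

Called before `generateResponse` in each iteration, the guard throws before the expensive call rather than after it is billed.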


MEDIUM

M-1: classify() JSON extraction fallback is fragile.

```ts
private parseJsonResponse(message: string): unknown {
  try {
    return JSON.parse(message);
  } catch {
    const start = message.indexOf('{');
    const end = message.lastIndexOf('}');
    if (start >= 0 && end > start) {
      return JSON.parse(message.slice(start, end + 1));
    }
    throw new ConfigurationError('factory', 'Classification response was not valid JSON');
  }
}
```

The `indexOf('{')` / `lastIndexOf('}')` heuristic will break on markdown-fenced JSON (```json ... ```) where the braces are valid but there's trailing text, and it can match incorrectly if the model returns commentary containing `{` characters before the actual JSON. The `response_format: { type: 'json_object' }` on the request should prevent this, but not all providers honor it equally (Cloudflare Workers AI notably doesn't support json_object mode for all models).

Suggestion: Strip markdown fences before the brace search. Or better: if `JSON.parse` fails on the full message, try stripping a ```json ... ``` fence first, then fall back to brace extraction.
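A hypothetical helper illustrating that parse order — raw message first, then the fence-stripped body, then brace extraction as a last resort (the function name and regex are illustrative, not from the PR):

```ts
// Parse order sketch: raw → fence-stripped → widest brace span.
function parseModelJson(message: string): unknown {
  const attempts: string[] = [message.trim()];

  // Strip a markdown fence such as ```json ... ``` if one is present.
  const fenced = message.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fenced) attempts.push(fenced[1].trim());

  // Last resort: the widest {...} span in the message.
  const start = message.indexOf('{');
  const end = message.lastIndexOf('}');
  if (start >= 0 && end > start) attempts.push(message.slice(start, end + 1));

  for (const candidate of attempts) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Fall through to the next, more aggressive candidate.
    }
  }
  throw new Error('Classification response was not valid JSON');
}
```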

M-2: getDefaultVisionModel() hardcodes model IDs.

```ts
private getDefaultVisionModel(): string | undefined {
  if (this.providers.has('anthropic')) return 'claude-haiku-4-5-20251001';
  if (this.providers.has('openai')) return 'gpt-4o-mini';
  return undefined;
}
```

These are hardcoded model IDs that will go stale. claude-haiku-4-5-20251001 is already a dated snapshot. When Anthropic ships 4.6 Haiku, this still routes to 4.5. Not a bug today, but a maintenance trap.

Suggestion: Pull from the provider's getModelCapabilities() and filter for supportsVision: true, picking the cheapest. Or at minimum, make these configurable on ProviderFactoryConfig.
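A sketch of the capability-driven alternative. `ModelCapabilities` and the `costPerMTokUSD` field are assumed shapes here, not the PR's actual interface:

```ts
// Assumed capability shape — the real getModelCapabilities() return
// type in the PR may differ.
interface ModelCapabilities {
  id: string;
  supportsVision: boolean;
  costPerMTokUSD: number;
}

interface CapabilityProvider {
  getModelCapabilities(): ModelCapabilities[];
}

// Pick the cheapest vision-capable model across all registered
// providers instead of hardcoding a dated snapshot ID.
function pickDefaultVisionModel(
  providers: Map<string, CapabilityProvider>,
): string | undefined {
  let best: ModelCapabilities | undefined;
  for (const provider of providers.values()) {
    for (const model of provider.getModelCapabilities()) {
      if (!model.supportsVision) continue;
      if (!best || model.costPerMTokUSD < best.costPerMTokUSD) best = model;
    }
  }
  return best?.id;
}
```

New model snapshots then flow in through the capability tables rather than requiring an edit to the factory.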

M-3: Groq healthCheck indentation drift.

```diff
 async healthCheck(): Promise<boolean> {
     try {
-      const response = await this.makeGroqRequest('/models', null, 'GET');
+    const response = await this.makeGroqRequest('/models', null, 'GET');
```

The indentation changed from 6 spaces to 4 spaces on the const response line. Cosmetic, but it'll trigger lint noise and makes git blame noisy. Likely a whitespace artifact from the stacked PR rebase.


What's good

  1. QuotaHook design — check before dispatch, record after response; quotaFailPolicy: 'closed' | 'open' is a clean contract. The fail-open path logs a warning and continues, fail-closed re-throws as QuotaExceededError. This is exactly the pattern foodfiles needs to wire EDGE_AUTH.consumeAnonymousQuota into the factory for demo traffic.

  2. Streaming with pre-first-chunk fallback — openStreamWithFirstChunk reads the first chunk before returning the stream to the caller. If that first read throws, the factory catches it and falls through to the next provider. This is the right boundary — once the first chunk is delivered, the stream is committed.

  3. AI Gateway header forwarding — getAIGatewayHeaders() in BaseProvider only activates when baseUrl contains gateway.ai.cloudflare.com, so providers hitting their native APIs don't get polluted with cf-aig-* headers. The metadata composition (requestId, tenantId, custom) gives good observability through the gateway dashboard.

  4. analyzeImage() routing — Vision-only requests filter the provider chain via providerSupportsVision(). Cloudflare Workers AI (no vision) gets excluded. The supportsVision flag on the LLMProvider interface makes this extensible.

  5. Tool loop — maxIterations, maxCostUSD, AbortSignal, onIteration callback, cumulative cost in response metadata. The tool result serialization handles both success (JSON.stringify(output)) and error cases. The message accumulation correctly preserves the conversation context across iterations.

  6. Test coverage — 7 new factory test cases covering streaming fallback, quota hooks (allow/deny), tool loops, classify with confidence, and vision routing. Groq AI Gateway header test is thorough (verifies exact header values).
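The pre-first-chunk fallback boundary from point 2 can be sketched over plain async iterators. The real factory operates on provider streams; `withFirstChunkFallback` and `prepend` are illustrative names, not the PR's API:

```ts
// Fallback is only legal before the first chunk: read one chunk from
// each candidate stream; on failure, try the next provider's opener.
type Chunk = string;

async function withFirstChunkFallback(
  openers: Array<() => AsyncGenerator<Chunk>>,
): Promise<AsyncGenerator<Chunk>> {
  let lastError: unknown;
  for (const open of openers) {
    const stream = open();
    try {
      // Read the first chunk before handing the stream to the caller.
      const first = await stream.next();
      // Once a chunk has arrived, the stream is committed — no fallback.
      return prepend(first, stream);
    } catch (err) {
      lastError = err; // pre-first-chunk failure: try the next provider
    }
  }
  throw lastError;
}

// Re-attach the already-consumed first chunk ahead of the rest.
async function* prepend(
  first: IteratorResult<Chunk>,
  rest: AsyncGenerator<Chunk>,
): AsyncGenerator<Chunk> {
  if (!first.done) yield first.value;
  yield* rest;
}
```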

Edge-auth #102 integration note

The QuotaHook is the bridge. Foodfiles will implement:

```ts
quotaHook: {
  check: async (input) => {
    const result = await env.EDGE_AUTH.checkAnonymousQuota({
      projectSlug: 'foodfiles',
      feature: input.model.includes('vision') ? 'images' : 'generations',
      clientIp: request.headers.get('CF-Connecting-IP')!,
    });
    return { allowed: result.allowed, remainingBudget: result.remaining };
  },
  record: async (input) => {
    await env.EDGE_AUTH.consumeAnonymousQuota({
      projectSlug: 'foodfiles',
      feature: input.model.includes('vision') ? 'images' : 'generations',
      clientIp: request.headers.get('CF-Connecting-IP')!,
    });
  }
}
```

This wires the anonymous quota tier directly into the LLM provider factory for demo traffic. The quotaFailPolicy: 'open' setting would be appropriate for demos — if edge-auth is unreachable, let the demo work rather than breaking the landing page.
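The fail-policy semantics described above can be sketched as follows. `QuotaHook` and `QuotaExceededError` come from the PR summary; `enforceQuota` and the simplified shapes are illustrative:

```ts
// Simplified shapes — the PR's QuotaHook carries more fields.
class QuotaExceededError extends Error {}

interface QuotaHook {
  check(input: { model: string }): Promise<{ allowed: boolean }>;
}

async function enforceQuota(
  hook: QuotaHook,
  input: { model: string },
  quotaFailPolicy: 'closed' | 'open',
): Promise<void> {
  let result: { allowed: boolean };
  try {
    result = await hook.check(input);
  } catch (err) {
    if (quotaFailPolicy === 'closed') {
      // Fail-closed: a broken hook blocks the request.
      throw new QuotaExceededError(`quota check failed: ${String(err)}`);
    }
    // Fail-open: log and allow — appropriate for demo traffic.
    console.warn('quota hook error, failing open', err);
    return;
  }
  if (!result.allowed) throw new QuotaExceededError('quota exhausted');
}
```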

Ship it after addressing H-1 (tool loop cost overshoot).

🤖 Reviewed by CodeBeast (0xDEBT666F)

H-1: Add pre-flight cost guard to tool loop — uses previous iteration's
cost as estimate to prevent obvious maxCostUSD overshoots.

M-1: Strip markdown fences before brace extraction in classify() JSON
parser so ```json...``` responses parse correctly.

M-2: Make default vision model configurable via defaultVisionModel on
ProviderFactoryConfig, falling back to hardcoded defaults.

M-3: Fix Groq healthCheck indentation drift (6→4 space artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
