feat: orchestration quota, streaming, tools, classify, and vision APIs by stackbilt-admin · Pull Request #37 · Stackbilt-dev/llm-providers

stackbilt-admin · 2026-04-12T14:36:45Z

Summary

Stacked on #36. Adds the remaining factory-level orchestration APIs and provider capabilities:

feat: per-provider credit balance reporting in observability hooks #25 — getProviderBalance(provider?): ledger-backed balance reporting with observability event; Anthropic API attempt + Groq explicit unavailability.
feat: factory-level streaming with fallback chain support #26 — generateResponseStream(): factory-level streaming with pre-first-chunk fallback across the provider chain.
feat: pluggable QuotaHook for consumer-side quota enforcement #27 — QuotaHook with quotaFailPolicy: 'closed' | 'open' for consumer-side quota enforcement; quota check before request, record after response.
feat: optional tool-use loop helper (generateResponseWithTools) #28 — generateResponseWithTools(): tool-loop helper with maxIterations, maxCostUSD, AbortSignal support, onIteration callback, and cumulative cost tracking.
feat: Cloudflare AI Gateway metadata forwarding #29 — Cloudflare AI Gateway metadata/header forwarding when baseUrl points at a gateway endpoint.
Consolidate eco LLM calls through @stackbilt/llm-providers + add classify() capability #30 — classify(): JSON response parsing, optional schema parser support, confidence extraction, and seed passthrough.
feat: vision model support — image-to-text analysis capability #35 — analyzeImage() plus image input support and vision-capable model routing for OpenAI/Anthropic-style providers.

New types: ToolExecutor, ToolLoopOptions, ToolLoopState, QuotaHook, QuotaCheckInput, QuotaRecordInput, ClassifyOptions, ClassifyResult, AnalyzeImageInput, ProviderBalance.
New errors: ToolLoopAbortedError, ToolLoopLimitError.

Closes #25, closes #26, closes #27, closes #28, closes #29, closes #30, closes #35

Test plan

npm test — 184 tests pass across 11 suites
npm run typecheck — clean
Verify streaming fallback triggers on pre-first-chunk failure
Verify QuotaHook fail-closed rejects and fail-open allows on hook error
Verify tool loop respects maxIterations, maxCostUSD, and abortSignal
Verify classify() returns parsed JSON with confidence extraction
Verify analyzeImage() routes to vision-capable models

🤖 Generated with Claude Code

stackbilt-admin

CodeBeast Review — PR #37 (feat: orchestration quota, streaming, tools, classify, and vision APIs)

Verdict: APPROVE with 4 findings (1 HIGH, 3 MEDIUM)

This is a big PR — 1330 additions across 12 files, closing 7 issues. The surface area is wide (streaming, tool loops, quota hooks, classify, vision, AI Gateway, balance reporting) but each piece is reasonably isolated. Reviewing against the edge-auth #102 context: the QuotaHook interface is exactly what foodfiles will wire up to consumeAnonymousQuota for demo traffic.

HIGH

H-1: Tool loop cost check is post-hoc — can overshoot maxCostUSD by one full response.

// factory.ts — generateResponseWithTools
const response = await this.generateResponse({ ...request, messages });
cumulativeCost += response.usage.cost;
if (opts.maxCostUSD !== undefined && cumulativeCost > opts.maxCostUSD) {
  throw new ToolLoopLimitError(...)

The cost check happens after the response is received and billed. If a tool loop is at $0.09 with a $0.10 cap and the next iteration costs $0.05, the loop will execute (total $0.14) and then throw. For expensive models (Opus at $75/1M output tokens), a single overshoot could be significant.

Fix: Add a pre-flight check: if (opts.maxCostUSD !== undefined && cumulativeCost + estimatedCost > opts.maxCostUSD) before calling generateResponse. The estimate won't be exact but it'll prevent obvious overshoots. Document that maxCostUSD is a soft cap with ±1 iteration tolerance.
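
A minimal sketch of that pre-flight guard, assuming the previous iteration's cost is a usable estimate for the next one. BudgetState and shouldStopBeforeNextCall are hypothetical names for illustration, not identifiers from factory.ts.

```typescript
// Hypothetical helper, not the factory's actual implementation.
interface BudgetState {
  cumulativeCost: number;    // USD spent so far in this tool loop
  lastIterationCost: number; // USD cost of the most recent response (0 before the first call)
}

// Returns true when the next call is projected to push spend past the cap.
// The projection reuses the previous iteration's cost, so the cap stays soft:
// an iteration that costs more than the last one can still overshoot.
function shouldStopBeforeNextCall(state: BudgetState, maxCostUSD?: number): boolean {
  if (maxCostUSD === undefined) return false; // no cap configured
  return state.cumulativeCost + state.lastIterationCost > maxCostUSD;
}
```

With the numbers from the example above ($0.09 spent, $0.05 last iteration, $0.10 cap) the guard stops the loop before the call is made. The existing post-response check would still be worth keeping as a backstop for iterations that cost more than the estimate.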

MEDIUM

M-1: classify() JSON extraction fallback is fragile.

private parseJsonResponse(message: string): unknown {
  try {
    return JSON.parse(message);
  } catch {
    const start = message.indexOf('{');
    const end = message.lastIndexOf('}');
    if (start >= 0 && end > start) {
      return JSON.parse(message.slice(start, end + 1));
    }
    throw new ConfigurationError('factory', 'Classification response was not valid JSON');
  }
}

The indexOf('{') / lastIndexOf('}') heuristic is fragile on markdown-fenced JSON (```json\n{...}\n```) where the braces are valid but there's trailing text: any } after the fence extends the slice past the real object. It will also slice from the wrong position if the model returns commentary containing { characters before the actual JSON. The response_format: { type: 'json_object' } on the request should prevent this, but not all providers honor it equally (Cloudflare Workers AI notably doesn't support json_object mode for all models).

Suggestion: Strip markdown fences before the brace search. Or better: if JSON.parse fails on the full message, try stripping ```json...``` first, then fall back to brace extraction.
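
One way the suggested order (full parse, then fence contents, then brace extraction) could look. This is a standalone sketch: parseJsonLenient and FENCE are hypothetical names, and it throws a plain Error rather than the library's ConfigurationError so the block stays self-contained.

```typescript
// Three backticks, built at runtime so the marker never appears literally in this snippet.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

// Hypothetical standalone variant of parseJsonResponse.
function parseJsonLenient(message: string): unknown {
  // 1. Try the whole message first.
  try {
    return JSON.parse(message);
  } catch {
    // not bare JSON; fall through
  }
  // 2. If the message contains a fenced block, try its contents.
  const inner = message.match(FENCED_BLOCK)?.[1];
  if (inner !== undefined) {
    try {
      return JSON.parse(inner);
    } catch {
      // fence did not hold valid JSON; fall through
    }
  }
  // 3. Last resort: the original first-brace / last-brace heuristic.
  const start = message.indexOf('{');
  const end = message.lastIndexOf('}');
  if (start >= 0 && end > start) {
    return JSON.parse(message.slice(start, end + 1));
  }
  throw new Error('Classification response was not valid JSON');
}
```

Step 3 can still throw a raw SyntaxError when the slice is not valid JSON, which matches the behaviour of the original fallback.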

M-2: getDefaultVisionModel() hardcodes model IDs.

private getDefaultVisionModel(): string | undefined {
  if (this.providers.has('anthropic')) return 'claude-haiku-4-5-20251001';
  if (this.providers.has('openai')) return 'gpt-4o-mini';
  return undefined;
}

These are hardcoded model IDs that will go stale. claude-haiku-4-5-20251001 is already a dated snapshot. When Anthropic ships 4.6 Haiku, this still routes to 4.5. Not a bug today, but a maintenance trap.

Suggestion: Pull from the provider's getModelCapabilities() and filter for supportsVision: true, picking the cheapest. Or at minimum, make these configurable on ProviderFactoryConfig.
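
A rough sketch of the capability-driven selection, with a configured default taking precedence. The ModelCapabilities shape here is an assumption (in particular the inputCostPer1M field), and pickVisionModel is a hypothetical name; the real getModelCapabilities() return type may differ.

```typescript
// Assumed shape for illustration only.
interface ModelCapabilities {
  supportsVision: boolean;
  inputCostPer1M: number; // USD per 1M input tokens (hypothetical field)
}

// Picks a vision model without hardcoding IDs in the factory.
function pickVisionModel(
  capabilities: Record<string, ModelCapabilities>,
  configuredDefault?: string,
): string | undefined {
  // An explicit config value wins, provided it names a known vision-capable model.
  if (configuredDefault && capabilities[configuredDefault]?.supportsVision) {
    return configuredDefault;
  }
  // Otherwise choose the cheapest vision-capable model.
  let best: string | undefined;
  let bestCost = Infinity;
  for (const [model, caps] of Object.entries(capabilities)) {
    if (caps.supportsVision && caps.inputCostPer1M < bestCost) {
      best = model;
      bestCost = caps.inputCostPer1M;
    }
  }
  return best;
}
```

Because the candidates come from the capability table, a newly registered model is picked up without touching the selection code.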

M-3: Groq healthCheck indentation drift.

async healthCheck(): Promise<boolean> {
    try {
-      const response = await this.makeGroqRequest('/models', null, 'GET');
+    const response = await this.makeGroqRequest('/models', null, 'GET');

The indentation changed from 6 spaces to 4 spaces on the const response line. Cosmetic, but it will trigger lint warnings and add noise to git blame. Likely a whitespace artifact from the stacked PR rebase.

What's good

QuotaHook design — check before dispatch, record after response, quotaFailPolicy: 'closed' | 'open' is a clean contract. The fail-open path logs a warning and continues, fail-closed re-throws as QuotaExceededError. This is exactly the pattern foodfiles needs to wire EDGE_AUTH.consumeAnonymousQuota into the factory for demo traffic.
Streaming with pre-first-chunk fallback — openStreamWithFirstChunk reads the first chunk before returning the stream to the caller. If that first read throws, the factory catches it and falls through to the next provider. This is the right boundary — once the first chunk is delivered, the stream is committed.
AI Gateway header forwarding — getAIGatewayHeaders() in BaseProvider only activates when baseUrl contains gateway.ai.cloudflare.com, so providers hitting their native APIs don't get polluted with cf-aig-* headers. The metadata composition (requestId, tenantId, custom) gives good observability through the gateway dashboard.
analyzeImage() routing — Vision-only requests filter the provider chain via providerSupportsVision(). Cloudflare Workers AI (no vision) gets excluded. The supportsVision flag on the LLMProvider interface makes this extensible.
Tool loop — maxIterations, maxCostUSD, AbortSignal, onIteration callback, cumulative cost in response metadata. The tool result serialization handles both success (JSON.stringify(output)) and error cases. The message accumulation correctly preserves the conversation context across iterations.
Test coverage — 7 new factory test cases covering streaming fallback, quota hooks (allow/deny), tool loops, classify with confidence, and vision routing. Groq AI Gateway header test is thorough (verifies exact header values).
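
The pre-first-chunk boundary described above can be sketched roughly as follows. This is not the code from the PR: StreamOpener, openWithFirstChunk and streamWithFallback are hypothetical stand-ins, and a provider is reduced to a function returning an async iterable of text chunks.

```typescript
// Hypothetical stand-in for a provider's streaming entry point.
type StreamOpener = () => AsyncIterable<string>;

// Pulls the first chunk eagerly so that up-front failures surface here,
// before the caller has received anything.
async function openWithFirstChunk(open: StreamOpener): Promise<AsyncIterable<string>> {
  const iterator = open()[Symbol.asyncIterator]();
  const first = await iterator.next(); // rejects here if the provider fails up front
  return (async function* () {
    if (first.done) return;
    yield first.value;
    // From here on the stream is committed: later errors propagate to the caller.
    while (true) {
      const next = await iterator.next();
      if (next.done) return;
      yield next.value;
    }
  })();
}

// Walks the provider chain and returns the first stream that yields a chunk.
async function streamWithFallback(chain: StreamOpener[]): Promise<AsyncIterable<string>> {
  let lastError: unknown;
  for (const open of chain) {
    try {
      return await openWithFirstChunk(open);
    } catch (err) {
      lastError = err; // pre-first-chunk failure: try the next provider
    }
  }
  throw lastError ?? new Error('No providers configured');
}
```

A failure after the first chunk is deliberately not retried, since the caller may already have rendered partial output from the committed provider.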

Edge-auth #102 integration note

The QuotaHook is the bridge. Foodfiles will implement:

quotaHook: {
  check: async (input) => {
    const result = await env.EDGE_AUTH.checkAnonymousQuota({
      projectSlug: 'foodfiles',
      feature: input.model.includes('vision') ? 'images' : 'generations',
      clientIp: request.headers.get('CF-Connecting-IP')!,
    });
    return { allowed: result.allowed, remainingBudget: result.remaining };
  },
  record: async (input) => {
    await env.EDGE_AUTH.consumeAnonymousQuota({
      projectSlug: 'foodfiles',
      feature: input.model.includes('vision') ? 'images' : 'generations',
      clientIp: request.headers.get('CF-Connecting-IP')!,
    });
  }
}

This wires the anonymous quota tier directly into the LLM provider factory for demo traffic. The quotaFailPolicy: 'open' setting would be appropriate for demos — if edge-auth is unreachable, let the demo work rather than breaking the landing page.
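
For reference, a small sketch of how the two quotaFailPolicy values might behave around a failing check. The types are modeled on the PR description rather than copied from the library, enforceQuota is a hypothetical name, and rejecting an explicit denial regardless of policy is an assumption.

```typescript
// Hypothetical shapes; the library's QuotaHook types may differ.
interface QuotaDecision { allowed: boolean; remainingBudget?: number }
type QuotaFailPolicy = 'closed' | 'open';

class QuotaExceededError extends Error {}

// Resolves when the request may proceed; rejects with QuotaExceededError otherwise.
async function enforceQuota(
  check: () => Promise<QuotaDecision>,
  policy: QuotaFailPolicy,
): Promise<void> {
  let decision: QuotaDecision;
  try {
    decision = await check();
  } catch (err) {
    if (policy === 'open') {
      // Fail-open: the hook itself broke, so warn and let the request through.
      console.warn('quota check failed, continuing (fail-open):', err);
      return;
    }
    // Fail-closed: treat a broken hook as a denial.
    throw new QuotaExceededError('quota check failed (fail-closed)');
  }
  // Assumption: an explicit denial is rejected under either policy.
  if (!decision.allowed) throw new QuotaExceededError('quota exhausted');
}
```

Under 'open', an unreachable edge-auth only affects enforcement: demo requests proceed, while explicit denials from a healthy hook are still honored.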

Ship it after addressing H-1 (tool loop cost overshoot).

🤖 Reviewed by CodeBeast (0xDEBT666F)

H-1: Add pre-flight cost guard to tool loop — uses previous iteration's cost as estimate to prevent obvious maxCostUSD overshoots.
M-1: Strip markdown fences before brace extraction in classify() JSON parser so ```json...``` responses parse correctly.
M-2: Make default vision model configurable via defaultVisionModel on ProviderFactoryConfig, falling back to hardcoded defaults.
M-3: Fix Groq healthCheck indentation drift (6→4 space artifact).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat add orchestration quota and vision APIs

9772e01

stackbilt-admin commented Apr 12, 2026


stackbilt-admin marked this pull request as ready for review April 12, 2026 14:53

stackbilt-admin merged commit 5bcd12e into fix-factory-contract-bugs Apr 12, 2026

This was referenced Apr 12, 2026

feat: vision model support — image-to-text analysis capability #35

Closed

feat: orchestration quota, streaming, tools, classify, and vision APIs #38

Merged

stackbilt-admin merged 2 commits into fix-factory-contract-bugs from knock-out-remaining-issues
