All notable changes to `@stackbilt/llm-providers` are documented here.
Format follows Keep a Changelog. Versions use Semantic Versioning.
- SSE streaming schema validation (#41) — all four providers (`AnthropicProvider`, `OpenAIProvider`, `GroqProvider`, `CerebrasProvider`) now surface malformed SSE frames as `SchemaDriftError` and fire `onSchemaDrift` instead of swallowing them silently. Anthropic additionally validates `content_block_delta` event shape and `delta.text` type; future tool-streaming delta types are skipped via a forward-compat discriminator. 26 new streaming tests via `describe.each`.
- Schema drift canary (#39 Part 2) — `src/utils/schema-canary.ts` ships three exported utilities (sketch below):
  - `extractShape(obj)` — walks a raw response and returns a flat path → JSON-type map
  - `compareShapes(golden, live)` — diffs two shape maps into `{ added, removed, changed }`
  - `runCanaryCheck(provider, golden, liveResponse)` — one-shot canary returning a `CanaryReport`
  - Golden fixtures committed under `src/__tests__/fixtures/response-shapes/` for all five providers
  - All exports available at the package root
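A minimal sketch of the canary utilities in a drift check. The payloads are illustrative, and the diff buckets are assumed to be arrays of paths:

```ts
import { extractShape, compareShapes } from '@stackbilt/llm-providers';

// Golden shape captured from a known-good response (illustrative payload).
const golden = extractShape({ id: 'msg_1', usage: { input_tokens: 12 } });

// Live response whose usage.input_tokens has drifted from number to string.
const live = extractShape({ id: 'msg_2', usage: { input_tokens: '12' } });

// Diff the flat path → JSON-type maps; any entry signals drift.
const { added, removed, changed } = compareShapes(golden, live);
if (added.length || removed.length || changed.length) {
  console.warn('schema drift detected', { added, removed, changed });
}
```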
- Cache-aware routing (#52) — provider-agnostic cache hints on `LLMRequest` (see the sketch after this list):
  - New `CacheHints` type with `strategy`, `key`, `ttl`, `sessionId`, `cacheablePrefix` fields
  - `LLMRequest.cache?: CacheHints` — a no-op for callers that don't set it
  - Anthropic: `strategy: 'provider-prefix' | 'both'` wraps the system prompt as a content-block array with `cache_control: { type: 'ephemeral' }` and marks the last tool definition as a breakpoint when `cacheablePrefix` is `'auto'` or `'tools'`
  - OpenAI, Groq, Cerebras: automatic caching with no request-side translation needed
  - Cloudflare: platform-level prefix caching via the Workers AI binding
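A hedged sketch of opting into prompt caching via `CacheHints`. The `strategy` and `cacheablePrefix` values come from the list above; the surrounding request fields and the `ttl` unit are assumptions:

```ts
import type { LLMRequest } from '@stackbilt/llm-providers';

const request: LLMRequest = {
  // Request fields other than `cache` are illustrative.
  model: 'claude-sonnet-4-6-20250618',
  messages: [{ role: 'user', content: 'Summarize the attached spec.' }],
  cache: {
    strategy: 'provider-prefix', // Anthropic: wraps system prompt with cache_control
    cacheablePrefix: 'auto',     // also marks the last tool definition as a breakpoint
    ttl: 300,                    // assumed to be seconds
  },
};
```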
- Cached token usage reporting (#52) — all providers now extract provider-specific cached token counts into normalized `TokenUsage` fields:
  - `cachedInputTokens` — Groq, Cerebras, OpenAI automatic cache hits (`prompt_tokens_details.cached_tokens`)
  - `cacheReadInputTokens` — Anthropic `cache_read_input_tokens`
  - `cacheCreationInputTokens` — Anthropic `cache_creation_input_tokens`
  - `supportsPromptCache` flag added to `ModelCapabilities` and populated for all Anthropic models
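A small helper showing how the normalized fields might be read. It assumes the cached-token fields are optional numbers on `TokenUsage`:

```ts
import type { TokenUsage } from '@stackbilt/llm-providers';

function cacheSummary(usage: TokenUsage): string {
  // OpenAI / Groq / Cerebras automatic cache hits.
  const automatic = usage.cachedInputTokens ?? 0;
  // Anthropic explicit prompt caching: reads vs. cache writes.
  const reads = usage.cacheReadInputTokens ?? 0;
  const writes = usage.cacheCreationInputTokens ?? 0;
  return `cached reads: ${automatic + reads}, cache writes: ${writes}`;
}
```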
- Cloudflare LoRA / fine-tune forwarding (#51) — `LLMRequest.lora?: string` is forwarded to the Workers AI binding, enabling adapter-based fine-tunes without wrapper code.
- Factory-level streaming with fallback (#26) — `LLMProviderFactory.generateResponseStream()` and `LLMProviders.generateResponseStream()` use the same provider-selection, circuit-breaker, and exhaustion-registry path as `generateResponse()`. Pre-stream HTTP errors fail over to the next provider before the first chunk is emitted.
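Illustrative consumption only — the async-iterable contract and the chunk shape (a `text` field) are assumptions, not the documented API:

```ts
import type { LLMProviderFactory, LLMRequest } from '@stackbilt/llm-providers';

// Streams from the first provider whose pre-stream HTTP checks succeed.
async function printStream(factory: LLMProviderFactory, request: LLMRequest) {
  for await (const chunk of factory.generateResponseStream(request)) {
    // Chunk shape is assumed for illustration.
    process.stdout.write((chunk as { text?: string }).text ?? '');
  }
}
```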
- Tool-use loop helper (#28) — `generateResponseWithTools(request, executor, opts?)` owns the `generateResponse → parse → execute → append → repeat` loop. Ships with `ToolLoopLimitError` (max iterations / cost cap) and `ToolLoopAbortedError` (`AbortSignal`). `ToolLoopOptions` supports `maxIterations`, `maxCostUSD`, `onIteration`, `abortSignal`. Sketch below.
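A sketch of the loop helper. The executor's call shape and the `onIteration` signature are assumptions; the option names are from the bullet above:

```ts
import {
  generateResponseWithTools,
  ToolLoopLimitError,
  type LLMRequest,
} from '@stackbilt/llm-providers';

declare const request: LLMRequest; // built elsewhere; shape omitted here

// Executor maps a parsed tool call to its result; the call shape is an assumption.
const executor = async (call: { name: string; arguments: unknown }) => {
  if (call.name === 'get_time') return new Date().toISOString();
  throw new Error(`unknown tool: ${call.name}`);
};

try {
  const response = await generateResponseWithTools(request, executor, {
    maxIterations: 5,  // ToolLoopLimitError beyond this
    maxCostUSD: 0.25,  // or beyond this accumulated cost
    onIteration: (i: number) => console.log(`tool loop iteration ${i}`),
  });
  console.log(response);
} catch (err) {
  if (err instanceof ToolLoopLimitError) {
    console.warn('tool loop hit maxIterations or the cost cap');
  } else {
    throw err;
  }
}
```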
- Cloudflare AI Gateway metadata forwarding (#29) — `GatewayMetadata` on `LLMRequest` (via `BaseProvider.getAIGatewayHeaders`) forwards the `cf-aig-cache-key`, `cf-aig-cache-ttl`, and `cf-aig-metadata` headers only when `baseUrl` matches the Cloudflare AI Gateway pattern. Non-Gateway base URLs are unaffected.
- `stop_sequence` schema drift false positive — the Anthropic response schema declared `stop_sequence` as `type: 'string'`, but the real API always returns `null` when no stop sequence triggers, causing a `SchemaDriftError` on every normal response. Changed to `type: 'string-or-null'`; the `AnthropicResponse` internal interface and the `formatResponse` null guard were updated to match.
- `AnthropicProvider.getProviderBalance()` invalid endpoint — was calling `GET /v1/organizations/cost_report`, which is not a valid Anthropic API endpoint, so it returned HTTP errors in production. Now returns `status: 'unavailable', source: 'not_supported'` with a message directing users to the Admin API (as `GroqProvider` already did). (#25)
- Inline `import('../types').TokenUsage` annotations — cleaned up in `groq.ts`, `cerebras.ts`, `anthropic.ts`, and `openai.ts`; `TokenUsage` now lives in the existing `import type` block in each file.
- `analyzeImage()` silent empty response on Cloudflare — `@cf/meta/llama-3.2-11b-vision-instruct` via the Workers AI binding requires a raw `{ image: number[], prompt, max_tokens }` input shape, not the OpenAI-compatible `messages`/`image_url` format. The chat path returns `choices[0].message.content === null` via the binding, causing `extractText()` to silently return `""`. The provider now detects this model and dispatches the raw binding shape, mapping the result's `{ response: string }` back through the existing normalisation path. Other vision models (`@cf/google/gemma-4-26b-a4b-it`, `@cf/meta/llama-4-scout-17b-16e-instruct`) continue using the chat format unchanged. Fixes #53.
Bundles the unreleased 1.4.0 scope (model retirements, drift test) with envelope validation, env auto-discovery, and the declarative catalog into a single minor release. 1.4.0 was tagged in `package.json` but never published to npm; consumers upgrading from 1.3.0 receive all of the following.
- Declarative model catalog — new `src/model-catalog.ts` introduces a semantic catalog for provider/model metadata, recommendation use cases, lifecycle status, and runtime scoring.
- Catalog tests — coverage for retired-model exclusion, provider-health-aware ranking, and request-shape use-case inference.
- Runtime recommendation API — `LLMProviders#getRecommendedModel(request, useCase?)` exposes the same routing logic the factory uses internally (sketch below).
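Illustrative call; the use-case string mirrors the recommendation categories listed under 1.0.0, and the return shape is an assumption:

```ts
import type { LLMProviders, LLMRequest } from '@stackbilt/llm-providers';

declare const providers: LLMProviders;
declare const request: LLMRequest;

// 'tool-calling' is one of the documented use cases (cost-effective,
// high-performance, balanced, tool-calling, long-context).
const recommendation = providers.getRecommendedModel(request, 'tool-calling');
console.log(recommendation); // assumed to describe a provider/model pair
```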
- Schema drift envelope validation — `OpenAIProvider`, `GroqProvider`, and `CerebrasProvider` now validate `/chat/completions` response envelopes at the provider boundary, throwing `SchemaDriftError` on mismatch to route through the factory fallback chain and fire `onSchemaDrift` instead of silently corrupting downstream consumers. Anthropic envelope validation was added in the same scope. Schema constants are per-provider (not shared) — correlated drift across providers is a signal worth detecting.
- `LLMProviders.fromEnv()` static factory — auto-discovers providers from Cloudflare Workers `env` bindings (`AI`, `GROQ_API_KEY`, `CEREBRAS_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) without manual wiring. A setup sketch follows below.
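A minimal Workers-style setup sketch. The `Env` interface and the request/response shapes in the handler are illustrative assumptions:

```ts
import { LLMProviders } from '@stackbilt/llm-providers';

// Illustrative Workers Env; only bindings that exist are discovered.
interface Env {
  AI: unknown;
  ANTHROPIC_API_KEY?: string;
  GROQ_API_KEY?: string;
  CEREBRAS_API_KEY?: string;
  OPENAI_API_KEY?: string;
}

export default {
  async fetch(_req: Request, env: Env): Promise<Response> {
    const providers = LLMProviders.fromEnv(env); // no manual wiring
    // Request shape below is an assumption for illustration.
    const res = await providers.generateResponse({ prompt: 'ping' });
    return new Response(JSON.stringify(res));
  },
};
```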
- Model drift test (`src/__tests__/model-drift.test.ts`) — asserts that every provider's `models[]` array is symmetrically covered by its capabilities map, preventing future retirement drift where a model is removed from one list but not the other. Runs across all 5 providers.
- Factory routing — `LLMProviderFactory` now selects provider/model pairs from the model catalog instead of relying only on hardcoded provider ordering.
- Health-aware dispatch — recommendation and auto-routing now consider circuit-breaker state, including degraded and recovering providers, not just fully open breakers.
- Budget-aware dispatch — when a `CreditLedger` is attached, selection can demote providers under high utilization or near projected depletion.
- Provider defaults — OpenAI, Anthropic, Cloudflare, Cerebras, and Groq now resolve default models through the shared catalog instead of separate hardcoded fallbacks.
- Cloudflare model recommendation — `CloudflareProvider.getRecommendedModel()` now prefers modern active baselines such as Gemma 4 and GPT-OSS instead of legacy TinyLlama/Qwen heuristics.
- Public recommendation exports — `MODEL_RECOMMENDATIONS` and `getRecommendedModel()` now exclude retired recommendation targets such as `gpt-4o`, while preserving deprecated constants for compatibility.
- `MODELS.CLAUDE_3_HAIKU` (`claude-3-haiku-20240307`) — retired by Anthropic on 2026-04-19. Migrate to `MODELS.CLAUDE_HAIKU_4_5` or `MODELS.CLAUDE_3_5_HAIKU`. Export retained; callers get a compile-time `@deprecated` warning.
- `MODELS.GPT_4O` (`gpt-4o`) — retired by OpenAI on 2026-04-03. Migrate to `MODELS.GPT_4O_MINI` or a current GPT-4 successor. Export retained; callers get a compile-time `@deprecated` warning.
- `claude-3-haiku-20240307` — dropped from `AnthropicProvider.models[]` and its capabilities/pricing table. Calls to this ID will fail at Anthropic's cutoff; keeping it advertised would mislead consumers. Arbitrary-string passthrough on request inputs is unchanged.
- `gpt-4o` — dropped from `OpenAIProvider.models[]` and its capabilities/pricing table.
- `gpt-4-turbo-preview` — dead alias dropped from `OpenAIProvider.models[]` (no corresponding capabilities entry; caught by the new drift test).
- Cloudflare Workers AI vision support — `CloudflareProvider` now accepts `request.images` and routes to vision-capable models. Previously image data was silently dropped on the CF path.
- Three new CF vision models:
  - `@cf/google/gemma-4-26b-a4b-it` — 256K context, vision + function calling + reasoning
  - `@cf/meta/llama-4-scout-17b-16e-instruct` — natively multimodal, tool calling
  - `@cf/meta/llama-3.2-11b-vision-instruct` — image understanding
- `CloudflareProvider.supportsVision = true` — the factory's `analyzeImage` now dispatches to CF when configured.
- Factory default vision fallback — `getDefaultVisionModel()` falls back to `@cf/google/gemma-4-26b-a4b-it` when neither Anthropic nor OpenAI is configured, enabling CF-only deployments to use `analyzeImage()`.
- Images are passed to CF using the OpenAI-compatible `image_url` content-part shape (base64 data URIs). HTTP image URLs throw a helpful `ConfigurationError` — fetch the image and pass the bytes in `image.data`.
- Attempting `request.images` on a non-vision CF model throws a `ConfigurationError` naming the vision-capable alternatives.
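A hedged sketch of a CF-only `analyzeImage()` call. Beyond the documented `image.data` bytes guidance, the request field names are assumptions:

```ts
import type { LLMProviderFactory } from '@stackbilt/llm-providers';

async function describeImage(factory: LLMProviderFactory, bytes: Uint8Array) {
  // HTTP image URLs throw ConfigurationError — fetch first, pass bytes instead.
  return factory.analyzeImage({
    model: '@cf/google/gemma-4-26b-a4b-it', // vision-capable CF model
    prompt: 'Describe this diagram.',       // field name is an assumption
    images: [{ data: bytes }],
  });
}
```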
- Structured Logger — `Logger` interface with `noopLogger` (silent default) and `consoleLogger` (opt-in). All components accept an optional `logger` via config (example below). Zero `console.*` calls in production code.
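Opting into logging. The exact config key is assumed to be `logger` (per the "accept an optional logger via config" wording), and the factory constructor shape is an assumption:

```ts
import { LLMProviderFactory, consoleLogger } from '@stackbilt/llm-providers';

// Default is noopLogger (silent); consoleLogger opts into console output.
const factory = new LLMProviderFactory({ logger: consoleLogger });
```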
- Rate limit enforcement — `LLMProviderFactory` now checks `CreditLedger.checkRateLimit` (rpm/rpd) before dispatching to each provider, skipping providers that have exceeded their limits.
- Claude 4.6 models — `claude-opus-4-6-20250618`, `claude-sonnet-4-6-20250618` added to the Anthropic provider.
- Claude Haiku 4.5 — `claude-haiku-4-5-20251001` added.
- Claude 3.7 Sonnet — `claude-3-7-sonnet-20250219` added (replaces the incorrect `claude-sonnet-3.7` ID).
- `CostAnalytics` and `ProviderHealthEntry` — typed return values for `getCostAnalytics()` and `getProviderHealth()`.
- 30+ `any` types eliminated — all provider interfaces, tool-call types, Workers AI response shapes, error bodies, cost-analytics returns, and decorator signatures are now fully typed. Three boundary casts for the Cloudflare `Ai.run()` are retained with explicit eslint-disable comments.
- Data leak removed — a `console.log` at `anthropic.ts:492` dumped full tool-call payloads into worker logs.
- Anthropic JSON mode — only prepends `{` if the response doesn't already start with one, preventing `{{...}` corruption.
- OpenAI `supportsBatching` — set to `false` (was `true`, but `processBatch()` is a sequential loop).
- Default model — OpenAI default changed from deprecated `gpt-3.5-turbo` to `gpt-4o-mini`.
- Default fallback chain — now includes all 5 configured providers (was hardcoded to cloudflare/anthropic/openai, excluding Cerebras and Groq).
- Anthropic healthCheck — switched from a real API call (which burned tokens) to a lightweight OPTIONS reachability check.
- `TokenUsage.cost` — made required (was optional, causing NaN accumulation in the cost tracker).
- Circuit breaker test isolation — `defaultCircuitBreakerManager.resetAll()` now runs in all test `beforeEach` blocks to prevent cross-test state leaks (sketch below).
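The pattern the fix applies, as it might look in a consumer's own suite. The test-runner import is an assumption (the `describe.each` mention elsewhere in this log suggests Jest or Vitest), as is the root export of `defaultCircuitBreakerManager`:

```ts
import { beforeEach } from 'vitest';
import { defaultCircuitBreakerManager } from '@stackbilt/llm-providers';

beforeEach(() => {
  // Breaker state is shared module-level state; reset it between tests
  // so one test's simulated failures don't trip another test's circuit.
  defaultCircuitBreakerManager.resetAll();
});
```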
- Logging default — the library is now silent by default (`noopLogger`). Pass `consoleLogger` or a custom `Logger` to enable output.
- Model catalog — updated to current-gen models; removed stale/incorrect model IDs and TBD pricing.
- ImageProvider — multi-provider image generation (Cloudflare Workers AI + Google Gemini). Extracted from the img-forge production codebase.
- 5 built-in image models: `sdxl-lightning` (fast/draft), `flux-klein` (balanced), `flux-dev` (high quality), `gemini-flash-image` (text-capable), `gemini-flash-image-preview` (latest).
- `IMAGE_MODELS` registry with full config: dimensions, steps, guidance, negative-prompt support, seed support.
- `normalizeAiResponse()` — handles all Workers AI return types (ArrayBuffer, ReadableStream, objects with `.image`, base64 strings).
- `getImageModel()` — lookup helper for model configs (sketch below).
- Custom model configs via `ImageProviderConfig.models` — extend or override the built-in registry.
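Looking up a built-in config. The fields on the returned config are assumed from the registry description above:

```ts
import { getImageModel, IMAGE_MODELS } from '@stackbilt/llm-providers';

// Built-in registry lookup; 'flux-dev' is the documented high-quality model.
const fluxDev = getImageModel('flux-dev');
console.log(fluxDev);                   // dimensions, steps, guidance, etc. (assumed fields)
console.log(Object.keys(IMAGE_MODELS)); // sdxl-lightning, flux-klein, flux-dev, ...
```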
First stable release. Production-tested in the AEGIS cognitive kernel since v1.72.0.
- `LLMProviders.fromEnv()` — auto-discovers available providers from environment variables. One-line setup for multi-provider configurations.
- `response_format` — unified structured output support (`{ type: 'json_object' }`) across all providers that support it (sketch below).
- CreditLedger — per-provider monthly budget tracking with threshold alerts (80%/90%/95%), burn rate calculation, and depletion projection.
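A structured-output sketch. `response_format` and `MODELS.GPT_4O_MINI` are documented above; the other request fields are assumptions:

```ts
import { MODELS, type LLMProviders } from '@stackbilt/llm-providers';

declare const providers: LLMProviders;

// Fields other than response_format are illustrative assumptions.
const res = await providers.generateResponse({
  model: MODELS.GPT_4O_MINI,
  prompt: 'Return a JSON object with the first three primes.',
  response_format: { type: 'json_object' },
});
```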
- Burn rate analytics — `burnRate()` returns current spend velocity and projected depletion date per provider.
- Cerebras provider — ZAI-GLM 4.7 (355B reasoning) and Qwen 3 235B (MoE) via OpenAI-compatible API with tool calling support.
- Groq provider — fast inference via OpenAI-compatible API.
- Cloudflare provider — Workers AI integration with GPT-OSS 120B tool calling support.
- OpenAI provider — GPT-4o and compatible models.
- Anthropic provider — Claude models via Messages API.
- Graduated circuit breaker — half-open probe state, configurable failure thresholds, automatic recovery.
- CostTracker — per-provider cost aggregation with `breakdown()`, `total()`, and `drain()` for periodic reporting.
- RetryManager — exponential backoff with jitter, configurable `shouldRetry` callback, max attempts.
- Rich error model — 12 typed error classes (RateLimitError, QuotaExceededError, AuthenticationError, etc.) with a `retryable` flag (sketch below).
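Using the `retryable` flag. The error class names are from the list above; the surrounding call is illustrative:

```ts
import {
  RateLimitError,
  type LLMProviders,
  type LLMRequest,
} from '@stackbilt/llm-providers';

declare const providers: LLMProviders;
declare const request: LLMRequest;

try {
  await providers.generateResponse(request);
} catch (err) {
  // Every typed error carries `retryable`; rate limits are typically retryable.
  if (err instanceof RateLimitError && err.retryable) {
    // back off and retry — or let RetryManager's jittered backoff own this
  } else {
    throw err;
  }
}
```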
- Model constants — `MODELS` object with all supported model identifiers.
- Model recommendations — `getRecommendedModel()` for use-case-based model selection (cost-effective, high-performance, balanced, tool-calling, long-context).
- npm provenance — all published versions include cryptographic provenance attestation linking to the exact GitHub commit.
- CI workflows — typecheck + test suite on Node 18/20/22 for every PR.
- SECURITY.md — vulnerability reporting policy and supply chain security documentation.
- Zero runtime dependencies. Published tarball contains only compiled code and license.
- CI-only publishing with OIDC-based npm authentication and provenance signing.
- Automated `npm audit` on every CI run.