feat(schema-drift): envelope validation + factory fallback (slice 1 of #39) by stackbilt-admin · Pull Request #40 · Stackbilt-dev/llm-providers

stackbilt-admin · 2026-04-12T19:03:16Z

Summary

First slice of #39 — defense-in-depth against silent upstream API deprecations. Adds response envelope schema validation at the provider boundary, wires the Anthropic provider as the template, and routes SchemaDriftError through the factory's fallback path with a new observability hook.

The 2am scenario this shields you from: Anthropic silently renames usage.input_tokens at 2am. Today, the parser throws TypeError: Cannot read property 'input_tokens' of undefined, the error is non-routable, and the caller gets a hard failure. With this PR, the parser throws a structured SchemaDriftError instead, the factory treats it as fallback-eligible, traffic routes to OpenAI/Groq/Cerebras/Cloudflare, and onSchemaDrift fires with { provider, path, expected, actual } so oncall sees the drift before it cascades.

What's in this slice

SchemaDriftError (src/errors.ts) — non-retryable, code SCHEMA_DRIFT, statusCode 502. Carries path, expected, actual for observability.
validateSchema (src/utils/schema-validator.ts) — zero-dep path-based validator. No new runtime dependencies added. Hand-rolled because package.json declares zero runtime deps and I'm not breaking that invariant for one feature. Supports string | number | boolean | array | object | string-or-null, optional fields, fail-fast semantics, and distinguishes null/undefined/array from generic object in error messages.
Anthropic wire-up (src/providers/anthropic.ts) — validates the minimum envelope fields the parser actually touches (id, content, model, stop_reason, usage, usage.input_tokens, usage.output_tokens) before field access. stop_sequence left unspecified because it's genuinely optional.
Factory routing (src/factory.ts) — getFallbackDecision treats SchemaDriftError (and SCHEMA_DRIFT error code on generic LLMProviderError) as fallback-eligible. On catch, fires onSchemaDrift with path/expected/actual/requestId before evaluating fallback.
Observability hook (src/utils/hooks.ts) — SchemaDriftEvent + onSchemaDrift? on the ObservabilityHooks interface. Wired into composeHooks for multi-sink observability.
Re-exports (src/index.ts) — SchemaDriftError, validateSchema, SchemaField, SchemaFieldType, SchemaDriftEvent all exported from the package root.

Tests (17 new, 201 total passing)

Validator primitives — valid envelopes, missing required paths, wrong types, null root, optional fields, NaN rejection, array-vs-object, string-or-null, fail-fast on first drift
Anthropic drift detection (real provider + mocked fetch) — passes valid responses, detects usage.input_tokens rename, detects content removal, detects content type change from array to string, asserts non-retryable (fetch called exactly once even with maxRetries: 3)
Factory fallback (real providers + mocked fetch) — drifted Anthropic response falls over to OpenAI; onSchemaDrift fires with correct path/expected/actual/requestId

npm test → 201 passed (201). tsc --noEmit → clean.

Deferred to follow-up PRs

Keeping this PR focused so review stays tractable. The follow-ups are:

Wire up remaining providers — openai, groq, cerebras, cloudflare. Each gets its own envelope schema + validateSchema call in generateResponse. Same test pattern as Anthropic. Small, parallelizable PRs.
Schema canary module (src/utils/schema-canary.ts) — Part 2 of Response envelope schema validation + schema drift canary #39. Opt-in runSchemaCanary(providers) that sends known-good requests, captures responses, diffs against committed golden fixtures, returns a report. Scheduling stays with the consumer (cron Worker / GitHub Action).
Streaming path validation — streamResponse in each provider uses a different parser that doesn't go through generateResponse. Needs its own drift shield.

What this PR does NOT do

No changelog/RSS polling (out of scope, different cadence)
No auto-patching or auto-healing parsers (humans review drift diffs, period)
No request-payload schema validation (already handled in validateRequest)
No new runtime deps — zero-dep validator by design

Design notes

Why fail-fast instead of collecting all errors? The first drift is enough to trigger fallback. Walking the whole schema when we're already broken wastes budget and muddies the hook event (which error do you report?).
Why not zod/valibot? The package currently declares zero runtime dependencies. Adding a schema library for one feature would be a meaningful ecosystem decision and should not be snuck in via a bugfix-adjacent PR. The hand-rolled validator is ~100 lines and covers the bounded set of types providers actually return.
Why 502 status code? Semantically it's an upstream gateway problem — the upstream returned a shape we can't parse. Callers that map statusCode onto HTTP responses will surface this correctly.
Why non-retryable? Retrying the same provider hits the same broken shape. Only a different provider can rescue the request. Circuit breaker still counts it as a failure so repeated drifts open the breaker.

Closes (partially): #39

Test plan

npx -y tsc --noEmit clean
npm test → 201/201 passing (17 new schema-drift tests + 184 existing)
New drift tests exercise: validator primitives, Anthropic parser, factory fallback, hook firing
Follow-up PRs for remaining 4 providers + canary module

…factory fallback Defense-in-depth against silent upstream API deprecations. Adds SchemaDriftError, a zero-dep path-based validator, and wires the Anthropic provider as the template. Drift is non-retryable but fallback-eligible - factory routes to the next healthy provider and fires onSchemaDrift hook for observability. Slice 1 of #39. Remaining providers and schema canary land in follow-ups. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

stackbilt-admin

CodeBeast review

Verdict: Good slice, but the headline claim ("shields against silent deprecations") has a hole big enough to drive a llama through. Ship it after H-2 gets scoped or address the gap here.

🔴 HIGH — drift shield has a blind spot

H-2: Nested content blocks are not validated, and that's where Anthropic actually changes shape. → filed as #42

The schema validates content is an array. It does NOT validate element shapes. Then formatResponse reads:

data.content.filter(block => block.type === 'text').map(block => block.text)
toolUses.map(tool => ({ id: tool.id!, name: tool.name!, ... }))

Those non-null assertions are doing heavy lifting. If Anthropic renames tool_use to function_call, or moves id inside a metadata sub-object — neither is impossible — the validator passes, filter returns empty, and the response looks successful with silent data loss. Or tool.id! becomes undefined and we get TypeError on the happy path, bypassing the SchemaDriftError → fallback → hook machinery entirely.

The PR claims drift = fallback. This case = crash. That's the exact scenario the PR body describes as the problem to solve.

H-1: Streaming path is unguarded. → filed as #41

Acknowledged in PR body as "deferred," but bluntly: streamResponse parses parsed.delta.text from SSE chunks with zero shape validation and an inline try/catch that swallows errors. If Anthropic reshapes deltas, streaming consumers get partial/empty output with no error, no hook, no fallback. Worse than the generateResponse path, which at least crashes loudly.

🟡 MEDIUM

M-1: No test for the circuit breaker opening after N drifts. The resilience story implies repeated drift trips the breaker — it probably does via executeWithResiliency → CircuitBreaker.execute, but "probably" isn't a test. Add one: hammer a mock-drifted provider 5 times, assert breaker state = OPEN.

M-2: No test for end-of-chain drift. Happy-path test exercises drift-on-first-then-success. Missing: every provider drifts, verify the terminal error is SchemaDriftError (not generic ALL_PROVIDERS_FAILED) and onSchemaDrift fires for each. ~20 lines, catches real regressions.

M-3: statusCode: 502 on SchemaDriftError is semantically a lie. Upstream returned HTTP 200. 502 means "bad gateway," which callers that surface statusCode in HTTP responses will incorrectly attribute to a network/gateway failure. 422 (unprocessable entity) is more accurate. One-line fix.

M-4: optional: true flag isn't exercised in any wired-up schema. Validator-tested but no provider uses it. Anthropic's stop_sequence is a real optional field — natural template case. Dead code smell until the next wire-up PR pulls it in.

🟢 LOW

L-1: Fail-fast validation hurts debuggability. Correct for routing (first drift = trigger fallback, done). But if the shape shifted in three places at once, ops fixes one field and ships, then gets paged again tomorrow. Consider a validateSchemaAll variant for logging paths. Not blocking.

L-2: Hook ordering — onRequestError fires before onSchemaDrift for the same incident. Consumers composing both hooks see duplicate-looking events. Document or dedupe.

L-3: Module-level ANTHROPIC_RESPONSE_SCHEMA won't scale cleanly to 5 providers. Fine for slice 1. When the next three wire-ups land, pull into src/providers/schemas.ts or colocate per provider.

L-4: The non-null assertions in formatResponse (tool.id!, tool.name!) were a code smell before this PR. Now that the validator exists, they could be replaced with schema-checked access after #42 lands.

Test gaps

ID	What's missing	Severity	Tracked
T-1	Streaming path drift	HIGH	#41
T-2	Nested content block reshape	HIGH	#42
T-3	Circuit breaker opens after N drifts	MED	—
T-4	End-of-chain drift path	MED	—
T-5	`optional: true` in a real provider schema	LOW	—

Security / PII

Clean. The validator only emits type names ('string', 'number', 'array') in error messages — never actual values. No prompt content, API keys, or user data leak into SchemaDriftError.message or the onSchemaDrift event payload.

Don't break this. If someone "improves" the error message to include the offending value later, they'll be leaking request content into logs and telemetry. Suggest adding an assertion test that the error message contains only path + type names, never values.

Merge recommendation

#42 (H-2) now or scoped: either land a fix in this PR, or merge as-is with the gap explicitly scoped to #42. The README/PR body claim needs tightening if you pick the latter.
#41 (H-1) tracked: filed, good to merge.
M-3 now: one-line change to statusCode: 422.
T-4 before merge: add the end-of-chain test.
Merge.

Overall: 7/10. Honest scope, clean tests for what's covered, but "shields against silent API deprecations" while leaving content-block paths unvalidated is a load-bearing overstatement. Tighten the claim or tighten the shield.

…ests Responds to CodeBeast review of #40. Closes H-2 (#42) in-PR, fixes M-3, adds missing tests for M-1 and M-2 and security message surface. - Extend validator with items/discriminator for discriminated-union array element validation. Unknown variants are forward-compatible (skipped, not rejected) so additive upstream changes don't break consumers on the next deploy. - Rewire ANTHROPIC_RESPONSE_SCHEMA to validate content[] elements: text blocks need .text (string), tool_use blocks need .id, .name, .input. stop_sequence now marked optional. - Change SchemaDriftError statusCode 502 -> 422. Upstream returned HTTP 200 with a wrong body; Unprocessable Entity is accurate, Bad Gateway was misleading (M-3). - Replace tool.id! / tool.name! non-null assertions with explicit casts and a comment documenting that the schema validator guarantees the invariant (L-4). - Add tests: circuit breaker opens after repeated drift (M-1), SchemaDriftError propagates correctly at end of single-provider chain with hook fired (M-2), error message/serialized form never contains actual response values only type names (security), and six new H-2 tests covering tool_use missing/wrong fields, text block missing .text, bare-string element, missing discriminator, forward-compat unknown block type, and optional stop_sequence. 212 tests passing (11 new). H-1 (streaming) tracked separately in #41. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

stackbilt-admin · 2026-04-12T19:45:59Z

Review fixes pushed (`b6ef2af`)

Addressed the CodeBeast review findings in-PR:

Finding	Status	Notes
H-2 (#42) nested content-block validation	✅ Fixed	Validator extended with `items` + `discriminator` supporting discriminated unions. `ANTHROPIC_RESPONSE_SCHEMA` now validates `tool_use` blocks (`id`, `name`, `input`) and `text` blocks (`.text`). Unknown variants pass as forward-compat so Anthropic adding a new block type doesn't break every deploy.
H-1 (#41) streaming path validation	⏭ Tracked	Follow-up PR. Separate parser, separate wiring.
M-1 circuit breaker after N drifts	✅ Tested	New test: repeated drift across 10 calls, assert breaker state = OPEN.
M-2 end-of-chain drift	✅ Tested	Single-provider factory, drift propagates as `SchemaDriftError` (not generic `ALL_PROVIDERS_FAILED`), `onSchemaDrift` still fires.
M-3 statusCode 502 → 422	✅ Fixed	One-line change in `SchemaDriftError` constructor. 422 Unprocessable Entity is semantically correct — upstream returned HTTP 200 with a wrong body.
M-4 `optional: true` in real schema	✅ Fixed	`stop_sequence` marked optional in Anthropic schema; new test verifies a response without it passes validation.
L-4 non-null assertions	✅ Cleaned	`tool.id! / tool.name!` replaced with explicit `as string` casts and a comment documenting the schema-enforced invariant.
Security no value leakage	✅ Tested	New test puts a synthetic `sk-verysecret-api-key-shouldnotleak` into a drifting response, asserts the string never appears in `err.message` or the serialized error form.

Test count: 201 → 212 (11 new)

6 new H-2 tests: tool_use missing id, tool_use wrong-typed input, text block missing .text, bare-string element, missing discriminator, forward-compat unknown block type
1 optional-field test (stop_sequence)
1 circuit breaker test
1 end-of-chain test
1 security assertion test
1 updated M-2 test (from the original PR)

All 212 passing, tsc --noEmit clean.

What's still in follow-ups

H-1 / Validate streaming response path against schema drift #41 — streaming path validation (different parser surface)
Remaining 4 providers — openai, groq, cerebras, cloudflare — each needs its own schema + wire-up. Small, parallelizable PRs.
L-1, L-2, L-3 — low-priority cleanups, not blocking this PR.

Ready for re-review.

The default circuit breaker uses a graduated degradation curve with a probabilistic rejection gate at DEGRADED levels. With 10 attempts and the default [1.0, 0.9, 0.7, 0.4, 0.1] curve, reaching OPEN requires getting lucky at each step - expected value ~16 attempts, not 10. Locally passed by luck, CI failed on all three Node versions when the random sequence bounced attempts off the gate without advancing. Stub Math.random to 0 inside the test so the gate always admits the request through to fn() where the drift advances a real failure, and restore the stub in finally{}. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This was referenced Apr 12, 2026

Validate streaming response path against schema drift #41

Open

Validate nested response shapes (content blocks, tool_use, choices) #42

Closed

stackbilt-admin commented Apr 12, 2026

View reviewed changes

stackbilt-admin merged commit 1a57f1f into main Apr 12, 2026
3 checks passed

stackbilt-admin deleted the feat/schema-drift-detection branch April 12, 2026 19:54

stackbilt-admin mentioned this pull request Apr 12, 2026

Response envelope schema validation + schema drift canary #39

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(schema-drift): envelope validation + factory fallback (slice 1 of #39)#40

feat(schema-drift): envelope validation + factory fallback (slice 1 of #39)#40
stackbilt-admin merged 3 commits intomainfrom
feat/schema-drift-detection

stackbilt-admin commented Apr 12, 2026

Uh oh!

stackbilt-admin left a comment

Uh oh!

stackbilt-admin commented Apr 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stackbilt-admin commented Apr 12, 2026

Summary

What's in this slice

Tests (17 new, 201 total passing)

Deferred to follow-up PRs

What this PR does NOT do

Design notes

Test plan

Uh oh!

stackbilt-admin left a comment

Choose a reason for hiding this comment

CodeBeast review

🔴 HIGH — drift shield has a blind spot

🟡 MEDIUM

🟢 LOW

Test gaps

Security / PII

Merge recommendation

Uh oh!

stackbilt-admin commented Apr 12, 2026

Review fixes pushed (b6ef2af)

Test count: 201 → 212 (11 new)

What's still in follow-ups

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Review fixes pushed (`b6ef2af`)