Skip to content

feat(schema-drift): envelope validation + factory fallback (slice 1 of #39)#40

Merged
stackbilt-admin merged 3 commits intomainfrom
feat/schema-drift-detection
Apr 12, 2026
Merged

feat(schema-drift): envelope validation + factory fallback (slice 1 of #39)#40
stackbilt-admin merged 3 commits intomainfrom
feat/schema-drift-detection

Conversation

@stackbilt-admin
Copy link
Copy Markdown
Member

Summary

First slice of #39 — defense-in-depth against silent upstream API deprecations. Adds response envelope schema validation at the provider boundary, wires the Anthropic provider as the template, and routes SchemaDriftError through the factory's fallback path with a new observability hook.

The 2am scenario this shields you from: Anthropic silently renames usage.input_tokens at 2am. Today, the parser throws TypeError: Cannot read property 'input_tokens' of undefined, the error is non-routable, and the caller gets a hard failure. With this PR, the parser throws a structured SchemaDriftError instead, the factory treats it as fallback-eligible, traffic routes to OpenAI/Groq/Cerebras/Cloudflare, and onSchemaDrift fires with { provider, path, expected, actual } so oncall sees the drift before it cascades.

What's in this slice

  • SchemaDriftError (src/errors.ts) — non-retryable, code SCHEMA_DRIFT, statusCode 502. Carries path, expected, actual for observability.
  • validateSchema (src/utils/schema-validator.ts) — zero-dep path-based validator. No new runtime dependencies added. Hand-rolled because package.json declares zero runtime deps and I'm not breaking that invariant for one feature. Supports string | number | boolean | array | object | string-or-null, optional fields, fail-fast semantics, and distinguishes null/undefined/array from generic object in error messages.
  • Anthropic wire-up (src/providers/anthropic.ts) — validates the minimum envelope fields the parser actually touches (id, content, model, stop_reason, usage, usage.input_tokens, usage.output_tokens) before field access. stop_sequence left unspecified because it's genuinely optional.
  • Factory routing (src/factory.ts) — getFallbackDecision treats SchemaDriftError (and SCHEMA_DRIFT error code on generic LLMProviderError) as fallback-eligible. On catch, fires onSchemaDrift with path/expected/actual/requestId before evaluating fallback.
  • Observability hook (src/utils/hooks.ts) — SchemaDriftEvent + onSchemaDrift? on the ObservabilityHooks interface. Wired into composeHooks for multi-sink observability.
  • Re-exports (src/index.ts) — SchemaDriftError, validateSchema, SchemaField, SchemaFieldType, SchemaDriftEvent all exported from the package root.

Tests (17 new, 201 total passing)

  • Validator primitives — valid envelopes, missing required paths, wrong types, null root, optional fields, NaN rejection, array-vs-object, string-or-null, fail-fast on first drift
  • Anthropic drift detection (real provider + mocked fetch) — passes valid responses, detects usage.input_tokens rename, detects content removal, detects content type change from array to string, asserts non-retryable (fetch called exactly once even with maxRetries: 3)
  • Factory fallback (real providers + mocked fetch) — drifted Anthropic response falls over to OpenAI; onSchemaDrift fires with correct path/expected/actual/requestId

npm test201 passed (201). tsc --noEmit → clean.

Deferred to follow-up PRs

Keeping this PR focused so review stays tractable. The follow-ups are:

  1. Wire up remaining providers — openai, groq, cerebras, cloudflare. Each gets its own envelope schema + validateSchema call in generateResponse. Same test pattern as Anthropic. Small, parallelizable PRs.
  2. Schema canary module (src/utils/schema-canary.ts) — Part 2 of Response envelope schema validation + schema drift canary #39. Opt-in runSchemaCanary(providers) that sends known-good requests, captures responses, diffs against committed golden fixtures, returns a report. Scheduling stays with the consumer (cron Worker / GitHub Action).
  3. Streaming path validationstreamResponse in each provider uses a different parser that doesn't go through generateResponse. Needs its own drift shield.

What this PR does NOT do

  • No changelog/RSS polling (out of scope, different cadence)
  • No auto-patching or auto-healing parsers (humans review drift diffs, period)
  • No request-payload schema validation (already handled in validateRequest)
  • No new runtime deps — zero-dep validator by design

Design notes

  • Why fail-fast instead of collecting all errors? The first drift is enough to trigger fallback. Walking the whole schema when we're already broken wastes budget and muddies the hook event (which error do you report?).
  • Why not zod/valibot? The package currently declares zero runtime dependencies. Adding a schema library for one feature would be a meaningful ecosystem decision and should not be snuck in via a bugfix-adjacent PR. The hand-rolled validator is ~100 lines and covers the bounded set of types providers actually return.
  • Why 502 status code? Semantically it's an upstream gateway problem — the upstream returned a shape we can't parse. Callers that map statusCode onto HTTP responses will surface this correctly.
  • Why non-retryable? Retrying the same provider hits the same broken shape. Only a different provider can rescue the request. Circuit breaker still counts it as a failure so repeated drifts open the breaker.

Closes (partially): #39

Test plan

  • npx -y tsc --noEmit clean
  • npm test → 201/201 passing (17 new schema-drift tests + 184 existing)
  • New drift tests exercise: validator primitives, Anthropic parser, factory fallback, hook firing
  • Follow-up PRs for remaining 4 providers + canary module

…factory fallback

Defense-in-depth against silent upstream API deprecations. Adds
SchemaDriftError, a zero-dep path-based validator, and wires the Anthropic
provider as the template. Drift is non-retryable but fallback-eligible -
factory routes to the next healthy provider and fires onSchemaDrift hook
for observability.

Slice 1 of #39. Remaining providers and schema canary land in follow-ups.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member Author

@stackbilt-admin stackbilt-admin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeBeast review

Verdict: Good slice, but the headline claim ("shields against silent deprecations") has a hole big enough to drive a llama through. Ship it after H-2 gets scoped or address the gap here.


🔴 HIGH — drift shield has a blind spot

H-2: Nested content blocks are not validated, and that's where Anthropic actually changes shape. → filed as #42

The schema validates content is an array. It does NOT validate element shapes. Then formatResponse reads:

data.content.filter(block => block.type === 'text').map(block => block.text)
toolUses.map(tool => ({ id: tool.id!, name: tool.name!, ... }))

Those non-null assertions are doing heavy lifting. If Anthropic renames tool_use to function_call, or moves id inside a metadata sub-object — neither is impossible — the validator passes, filter returns empty, and the response looks successful with silent data loss. Or tool.id! becomes undefined and we get TypeError on the happy path, bypassing the SchemaDriftError → fallback → hook machinery entirely.

The PR claims drift = fallback. This case = crash. That's the exact scenario the PR body describes as the problem to solve.

H-1: Streaming path is unguarded. → filed as #41

Acknowledged in PR body as "deferred," but bluntly: streamResponse parses parsed.delta.text from SSE chunks with zero shape validation and an inline try/catch that swallows errors. If Anthropic reshapes deltas, streaming consumers get partial/empty output with no error, no hook, no fallback. Worse than the generateResponse path, which at least crashes loudly.


🟡 MEDIUM

M-1: No test for the circuit breaker opening after N drifts. The resilience story implies repeated drift trips the breaker — it probably does via executeWithResiliencyCircuitBreaker.execute, but "probably" isn't a test. Add one: hammer a mock-drifted provider 5 times, assert breaker state = OPEN.

M-2: No test for end-of-chain drift. Happy-path test exercises drift-on-first-then-success. Missing: every provider drifts, verify the terminal error is SchemaDriftError (not generic ALL_PROVIDERS_FAILED) and onSchemaDrift fires for each. ~20 lines, catches real regressions.

M-3: statusCode: 502 on SchemaDriftError is semantically a lie. Upstream returned HTTP 200. 502 means "bad gateway," which callers that surface statusCode in HTTP responses will incorrectly attribute to a network/gateway failure. 422 (unprocessable entity) is more accurate. One-line fix.

M-4: optional: true flag isn't exercised in any wired-up schema. Validator-tested but no provider uses it. Anthropic's stop_sequence is a real optional field — natural template case. Dead code smell until the next wire-up PR pulls it in.


🟢 LOW

L-1: Fail-fast validation hurts debuggability. Correct for routing (first drift = trigger fallback, done). But if the shape shifted in three places at once, ops fixes one field and ships, then gets paged again tomorrow. Consider a validateSchemaAll variant for logging paths. Not blocking.

L-2: Hook ordering — onRequestError fires before onSchemaDrift for the same incident. Consumers composing both hooks see duplicate-looking events. Document or dedupe.

L-3: Module-level ANTHROPIC_RESPONSE_SCHEMA won't scale cleanly to 5 providers. Fine for slice 1. When the next three wire-ups land, pull into src/providers/schemas.ts or colocate per provider.

L-4: The non-null assertions in formatResponse (tool.id!, tool.name!) were a code smell before this PR. Now that the validator exists, they could be replaced with schema-checked access after #42 lands.


Test gaps

ID What's missing Severity Tracked
T-1 Streaming path drift HIGH #41
T-2 Nested content block reshape HIGH #42
T-3 Circuit breaker opens after N drifts MED
T-4 End-of-chain drift path MED
T-5 optional: true in a real provider schema LOW

Security / PII

Clean. The validator only emits type names ('string', 'number', 'array') in error messages — never actual values. No prompt content, API keys, or user data leak into SchemaDriftError.message or the onSchemaDrift event payload.

Don't break this. If someone "improves" the error message to include the offending value later, they'll be leaking request content into logs and telemetry. Suggest adding an assertion test that the error message contains only path + type names, never values.


Merge recommendation

  1. #42 (H-2) now or scoped: either land a fix in this PR, or merge as-is with the gap explicitly scoped to #42. The README/PR body claim needs tightening if you pick the latter.
  2. #41 (H-1) tracked: filed, good to merge.
  3. M-3 now: one-line change to statusCode: 422.
  4. T-4 before merge: add the end-of-chain test.
  5. Merge.

Overall: 7/10. Honest scope, clean tests for what's covered, but "shields against silent API deprecations" while leaving content-block paths unvalidated is a load-bearing overstatement. Tighten the claim or tighten the shield.

…ests

Responds to CodeBeast review of #40. Closes H-2 (#42) in-PR, fixes M-3,
adds missing tests for M-1 and M-2 and security message surface.

- Extend validator with items/discriminator for discriminated-union
  array element validation. Unknown variants are forward-compatible
  (skipped, not rejected) so additive upstream changes don't break
  consumers on the next deploy.
- Rewire ANTHROPIC_RESPONSE_SCHEMA to validate content[] elements:
  text blocks need .text (string), tool_use blocks need .id, .name,
  .input. stop_sequence now marked optional.
- Change SchemaDriftError statusCode 502 -> 422. Upstream returned
  HTTP 200 with a wrong body; Unprocessable Entity is accurate,
  Bad Gateway was misleading (M-3).
- Replace tool.id! / tool.name! non-null assertions with explicit
  casts and a comment documenting that the schema validator
  guarantees the invariant (L-4).
- Add tests: circuit breaker opens after repeated drift (M-1),
  SchemaDriftError propagates correctly at end of single-provider
  chain with hook fired (M-2), error message/serialized form never
  contains actual response values only type names (security), and
  six new H-2 tests covering tool_use missing/wrong fields, text
  block missing .text, bare-string element, missing discriminator,
  forward-compat unknown block type, and optional stop_sequence.

212 tests passing (11 new). H-1 (streaming) tracked separately in #41.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stackbilt-admin
Copy link
Copy Markdown
Member Author

Review fixes pushed (b6ef2af)

Addressed the CodeBeast review findings in-PR:

Finding Status Notes
H-2 (#42) nested content-block validation ✅ Fixed Validator extended with items + discriminator supporting discriminated unions. ANTHROPIC_RESPONSE_SCHEMA now validates tool_use blocks (id, name, input) and text blocks (.text). Unknown variants pass as forward-compat so Anthropic adding a new block type doesn't break every deploy.
H-1 (#41) streaming path validation ⏭ Tracked Follow-up PR. Separate parser, separate wiring.
M-1 circuit breaker after N drifts ✅ Tested New test: repeated drift across 10 calls, assert breaker state = OPEN.
M-2 end-of-chain drift ✅ Tested Single-provider factory, drift propagates as SchemaDriftError (not generic ALL_PROVIDERS_FAILED), onSchemaDrift still fires.
M-3 statusCode 502 → 422 ✅ Fixed One-line change in SchemaDriftError constructor. 422 Unprocessable Entity is semantically correct — upstream returned HTTP 200 with a wrong body.
M-4 optional: true in real schema ✅ Fixed stop_sequence marked optional in Anthropic schema; new test verifies a response without it passes validation.
L-4 non-null assertions ✅ Cleaned tool.id! / tool.name! replaced with explicit as string casts and a comment documenting the schema-enforced invariant.
Security no value leakage ✅ Tested New test puts a synthetic sk-verysecret-api-key-shouldnotleak into a drifting response, asserts the string never appears in err.message or the serialized error form.

Test count: 201 → 212 (11 new)

  • 6 new H-2 tests: tool_use missing id, tool_use wrong-typed input, text block missing .text, bare-string element, missing discriminator, forward-compat unknown block type
  • 1 optional-field test (stop_sequence)
  • 1 circuit breaker test
  • 1 end-of-chain test
  • 1 security assertion test
  • 1 updated M-2 test (from the original PR)

All 212 passing, tsc --noEmit clean.

What's still in follow-ups

  • H-1 / Validate streaming response path against schema drift #41 — streaming path validation (different parser surface)
  • Remaining 4 providers — openai, groq, cerebras, cloudflare — each needs its own schema + wire-up. Small, parallelizable PRs.
  • L-1, L-2, L-3 — low-priority cleanups, not blocking this PR.

Ready for re-review.

The default circuit breaker uses a graduated degradation curve with a
probabilistic rejection gate at DEGRADED levels. With 10 attempts and
the default [1.0, 0.9, 0.7, 0.4, 0.1] curve, reaching OPEN requires
getting lucky at each step - expected value ~16 attempts, not 10.
Locally passed by luck, CI failed on all three Node versions when the
random sequence bounced attempts off the gate without advancing.

Stub Math.random to 0 inside the test so the gate always admits the
request through to fn() where the drift advances a real failure,
and restore the stub in finally{}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@stackbilt-admin stackbilt-admin merged commit 1a57f1f into main Apr 12, 2026
3 checks passed
@stackbilt-admin stackbilt-admin deleted the feat/schema-drift-detection branch April 12, 2026 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant