Context
While reviewing PR #208 (add-ooxml-doc-subset-and-inv-field-001-proof), a peer review (Codex + Gemini) and a deep-dive into ECMA-376 Part 4 surfaced two schema-conformance gaps in validateFieldStructure at packages/docx-core/src/baselines/atomizer/pipeline.ts:352-402. The Lean Tier 2 model in PR #208 enforces (1); ECMA-376 mandates (2). The TS engine currently checks neither.
Gap 1 — w:delInstrText placement is unvalidated
pipeline.ts:357 collects only findAllByTagName(root, 'w:instrText'), and the placement check at pipeline.ts:391-393 only branches on el.tagName === 'w:instrText'. The engine itself emits w:delInstrText (inPlaceModifier.ts:319-320; reject converts it back at trackChangesAcceptorAst.ts:613-615), so this is a real tag we are producing and not validating.
Per ECMA-376 (DeletedFieldCode): `w:delInstrText` MUST reside exclusively within a `<w:del>` revision element. Word's renderer discards the field state machine and falls back to literal text when this is violated. docx4j and LibreOffice both enforce or split to maintain this invariant.
Lean implication: Tier2/FieldStructure.lean:64-71 already validates both instrText and delInstrText, so the Lean model is correct relative to the spec — TS is the under-validator. PR #208's residual axiom compareDocumentXml_output_recursivelyWellformed asserts a property stronger than what TS validates today, so the bridge falsifiability layer is the only check of the stricter property and the single NUMPAGES fixture produces zero w:delInstrText atoms.
Gap 2 — w:fldChar inside <w:del> is not rejected
ECMA-376 explicitly forbids w:fldChar from entering a <w:del> wrapper: "complex field characters are strictly barred from entering deletion blocks." Microsoft Word treats violations as a fatal schema error (discards the field state machine). The current validateFieldStructure happily walks through <w:del> wrappers, never checking parentage of <w:fldChar>.
Acceptance criteria
Cross-references
Notes
The PR review also surfaced a deeper concern that fieldContextNeutral's ∀ ctx predicate may be incompatible with ECMA-376-conformant field fragmentation (a <w:ins> containing only <w:instrText> is conformant but not context-neutral under empty stack). That is a Lean-model issue tracked separately in PR #208, not in scope here.
Context
While reviewing PR #208 (
add-ooxml-doc-subset-and-inv-field-001-proof), a peer review (Codex + Gemini) and a deep-dive into ECMA-376 Part 4 surfaced two schema-conformance gaps invalidateFieldStructureatpackages/docx-core/src/baselines/atomizer/pipeline.ts:352-402. The Lean Tier 2 model in PR #208 enforces (1); ECMA-376 mandates (2). The TS engine currently checks neither.Gap 1 —
w:delInstrTextplacement is unvalidatedpipeline.ts:357collects onlyfindAllByTagName(root, 'w:instrText'), and the placement check atpipeline.ts:391-393only branches onel.tagName === 'w:instrText'. The engine itself emitsw:delInstrText(inPlaceModifier.ts:319-320; reject converts it back attrackChangesAcceptorAst.ts:613-615), so this is a real tag we are producing and not validating.Per ECMA-376 (DeletedFieldCode): `w:delInstrText` MUST reside exclusively within a `<w:del>` revision element. Word's renderer discards the field state machine and falls back to literal text when this is violated. docx4j and LibreOffice both enforce or split to maintain this invariant.
Lean implication:
Tier2/FieldStructure.lean:64-71already validates bothinstrTextanddelInstrText, so the Lean model is correct relative to the spec — TS is the under-validator. PR #208's residual axiomcompareDocumentXml_output_recursivelyWellformedasserts a property stronger than what TS validates today, so the bridge falsifiability layer is the only check of the stricter property and the single NUMPAGES fixture produces zerow:delInstrTextatoms.Gap 2 —
w:fldCharinside<w:del>is not rejectedECMA-376 explicitly forbids
w:fldCharfrom entering a<w:del>wrapper: "complex field characters are strictly barred from entering deletion blocks." Microsoft Word treats violations as a fatal schema error (discards the field state machine). The currentvalidateFieldStructurehappily walks through<w:del>wrappers, never checking parentage of<w:fldChar>.Acceptance criteria
validateFieldStructurecollects and validates placement of bothw:instrTextANDw:delInstrText(must be inside an open, pre-separatefield body).validateFieldStructurerejects anyw:fldCharwhose ancestor chain (up to the document root) includes a<w:del>.w:delInstrTextcorrectly placed, (b) a malformedw:delInstrTextoutside<w:del>correctly rejected, (c) aw:fldCharnested inside<w:del>correctly rejected.lean-spec-bridge.test.tsextends its falsifiability layer with a deletion-of-existing-field fixture exercisingw:delInstrText.Tier2/FieldStructure.lean:7-9remains accurate (it currently claims both checks are performed by TS).Cross-references
inv_field_001proof) — this issue gates the merge.Notes
The PR review also surfaced a deeper concern that
fieldContextNeutral's∀ ctxpredicate may be incompatible with ECMA-376-conformant field fragmentation (a<w:ins>containing only<w:instrText>is conformant but not context-neutral under empty stack). That is a Lean-model issue tracked separately in PR #208, not in scope here.