Skip to content

validateFieldStructure: add w:delInstrText placement check + reject w:fldChar inside <w:del> #209

@stevenobiajulu

Description

@stevenobiajulu

Context

While reviewing PR #208 (add-ooxml-doc-subset-and-inv-field-001-proof), a peer review (Codex + Gemini) and a deep-dive into ECMA-376 Part 4 surfaced two schema-conformance gaps in validateFieldStructure at packages/docx-core/src/baselines/atomizer/pipeline.ts:352-402. The Lean Tier 2 model in PR #208 enforces (1); ECMA-376 mandates (2). The TS engine currently checks neither.

Gap 1 — w:delInstrText placement is unvalidated

pipeline.ts:357 collects only findAllByTagName(root, 'w:instrText'), and the placement check at pipeline.ts:391-393 only branches on el.tagName === 'w:instrText'. The engine itself emits w:delInstrText (inPlaceModifier.ts:319-320; reject converts it back at trackChangesAcceptorAst.ts:613-615), so this is a real tag we are producing and not validating.

Per ECMA-376 (DeletedFieldCode): `w:delInstrText` MUST reside exclusively within a `<w:del>` revision element. Word's renderer discards the field state machine and falls back to literal text when this is violated. docx4j and LibreOffice both enforce or split to maintain this invariant.

Lean implication: Tier2/FieldStructure.lean:64-71 already validates both instrText and delInstrText, so the Lean model is correct relative to the spec — TS is the under-validator. PR #208's residual axiom compareDocumentXml_output_recursivelyWellformed asserts a property stronger than what TS validates today, so the bridge falsifiability layer is the only check of the stricter property and the single NUMPAGES fixture produces zero w:delInstrText atoms.

Gap 2 — w:fldChar inside <w:del> is not rejected

ECMA-376 explicitly forbids w:fldChar from entering a <w:del> wrapper: "complex field characters are strictly barred from entering deletion blocks." Microsoft Word treats violations as a fatal schema error (discards the field state machine). The current validateFieldStructure happily walks through <w:del> wrappers, never checking parentage of <w:fldChar>.

Acceptance criteria

  • validateFieldStructure collects and validates placement of both w:instrText AND w:delInstrText (must be inside an open, pre-separate field body).
  • validateFieldStructure rejects any w:fldChar whose ancestor chain (up to the document root) includes a <w:del>.
  • New TS unit tests cover: (a) a deleted complete field producing w:delInstrText correctly placed, (b) a malformed w:delInstrText outside <w:del> correctly rejected, (c) a w:fldChar nested inside <w:del> correctly rejected.
  • PR feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset #208's lean-spec-bridge.test.ts extends its falsifiability layer with a deletion-of-existing-field fixture exercising w:delInstrText.
  • Lean docstring at Tier2/FieldStructure.lean:7-9 remains accurate (it currently claims both checks are performed by TS).

Cross-references

Notes

The PR review also surfaced a deeper concern that fieldContextNeutral's ∀ ctx predicate may be incompatible with ECMA-376-conformant field fragmentation (a <w:ins> containing only <w:instrText> is conformant but not context-neutral under empty stack). That is a Lean-model issue tracked separately in PR #208, not in scope here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions