validateFieldStructure: per-story field-closure check (footnotes, headers, endnotes) #212

@stevenobiajulu

Description

Context

Surfaced during PR #208 peer review and ECMA-376 deep research. Tracked separately so it doesn't block #208 or #209.

Per ECMA-376 Part 4 (fldChar topic): a complex field that is not closed before the end of a document story invalidates the field state machine — the engine ignores the field characters and parses the remaining run contents as standard literal text. A "story" is an isolated structural stream: main body, each footnote, each endnote, each header, each footer.

The current validateFieldStructure at packages/docx-core/src/baselines/atomizer/pipeline.ts:352-402 runs a single global balance check across the entire document.xml and does not partition by story. If a future expansion (e.g., the footnote work in add-footnote-support) emits field-bearing footnote XML and a field straddles a story boundary (for example, it opens in the main body and closes in a footnote), the global balance check would still pass, but Word would treat the field as unclosed within its story: it would ignore the field characters and render the remaining run contents as literal text.
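To make the failure mode concrete, here is a minimal sketch of a global begin/end count. It is a simplified stand-in, not the code in pipeline.ts; `countFldChars` and `globallyBalanced` are hypothetical names, and the regex assumes double-quoted attributes and the `w:` prefix.

```typescript
// Simplified stand-in for a global balance check. NOT the actual
// validateFieldStructure implementation; names here are hypothetical.
function countFldChars(xml: string, type: "begin" | "end"): number {
  // Assumes double-quoted attributes and the conventional "w:" prefix.
  const re = new RegExp(`<w:fldChar\\b[^>]*\\bw:fldCharType="${type}"`, "g");
  return (xml.match(re) || []).length;
}

// Counts begin/end markers across all parts at once, ignoring story boundaries.
function globallyBalanced(parts: string[]): boolean {
  const all = parts.join("");
  return countFldChars(all, "begin") === countFldChars(all, "end");
}

// A field that opens in the main body...
const body =
  '<w:r><w:fldChar w:fldCharType="begin"/></w:r>' +
  "<w:r><w:instrText>PAGE</w:instrText></w:r>";
// ...and "closes" in a footnote.
const footnote = '<w:r><w:fldChar w:fldCharType="end"/></w:r>';

// The global count is balanced even though neither story closes its own field.
const crossStoryPasses = globallyBalanced([body, footnote]);
```

Here `crossStoryPasses` is `true`, which is exactly the false negative described above.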

What needs to change

  • Partition the validation by story when validating multi-part XML (main body + footnotes + endnotes + headers + footers).
  • Reject any field whose w:fldChar begin and end markers are not in the same story.
  • Add fixtures covering: a balanced field within a footnote (valid); a field that opens in the main body and "ends" in a footnote (rejected); a footnote with an unclosed field (rejected).
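The changes above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed final API: `Story`, `FieldIssue`, `checkStory`, and `validateFieldStructurePerStory` are hypothetical names, the caller is assumed to have already split the package into one XML string per story, and the regex scan (double-quoted attributes, `w:` prefix) stands in for whatever parsing the pipeline actually uses.

```typescript
// Hypothetical types; not part of the existing docx-core API.
type StoryKind = "body" | "footnote" | "endnote" | "header" | "footer";

interface Story {
  kind: StoryKind;
  id: string;  // e.g. a footnote's w:id, or a part name for headers/footers
  xml: string; // XML of this story only (assumed pre-split by the caller)
}

interface FieldIssue {
  story: string;
  message: string;
}

// Checks that every complex field opened in a story is closed in that story.
function checkStory(story: Story): FieldIssue[] {
  const label = `${story.kind}:${story.id}`;
  const issues: FieldIssue[] = [];
  // Fresh regex per call so no lastIndex state leaks between stories.
  const re = /<w:fldChar\b[^>]*\bw:fldCharType="(begin|separate|end)"/g;
  let depth = 0; // nesting depth of currently open fields
  let m: RegExpExecArray | null;
  while ((m = re.exec(story.xml)) !== null) {
    const type = m[1];
    if (type === "begin") {
      depth++;
    } else if (depth === 0) {
      // "separate" or "end" with no open field in this story.
      issues.push({ story: label, message: `${type} without begin` });
    } else if (type === "end") {
      depth--;
    }
  }
  if (depth > 0) {
    issues.push({ story: label, message: `${depth} unclosed field(s)` });
  }
  return issues;
}

// Validates each story in isolation; depth never carries across stories.
function validateFieldStructurePerStory(stories: Story[]): FieldIssue[] {
  const issues: FieldIssue[] = [];
  for (const story of stories) {
    issues.push(...checkStory(story));
  }
  return issues;
}
```

Because the depth counter resets for every story, a field that opens in the main body and ends in a footnote is reported twice: once as unclosed in the body and once as an end without a begin in the footnote. The three fixtures listed above map directly onto calls to this function.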

Notes

validateFieldStructure currently takes a single documentXml: string. The signature may need to accept multiple parts (or the caller may need to invoke it per-story). The choice depends on how multi-part comparison output is currently surfaced in pipeline.ts:439-440; investigate before changing the signature.
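If the per-story route is taken, note that a footnotes part holds many stories, so it would need to be split into one unit per `w:footnote` before validation. A rough sketch, assuming a hypothetical helper name `splitFootnoteStories`, double-quoted attributes, and no self-closing `w:footnote` elements (a real implementation would more likely use the XML parser the pipeline already depends on):

```typescript
// Hypothetical shape for one story's XML; not an existing docx-core type.
interface StoryXml {
  id: string;  // value of w:id on the w:footnote element ("" if absent)
  xml: string; // inner XML of that footnote only
}

// Splits a footnotes part into one entry per w:footnote element.
// Assumes no self-closing <w:footnote/> and no nested w:footnote elements.
function splitFootnoteStories(footnotesXml: string): StoryXml[] {
  const stories: StoryXml[] = [];
  // \b keeps this from matching <w:footnotes> or <w:footnoteReference>.
  const re = /<w:footnote\b([^>]*)>([\s\S]*?)<\/w:footnote>/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(footnotesXml)) !== null) {
    const idMatch = /\bw:id="([^"]*)"/.exec(m[1]);
    stories.push({ id: idMatch ? idMatch[1] : "", xml: m[2] });
  }
  return stories;
}
```

The same pattern would apply to endnotes, while each header and footer part is already a single story.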

This complements PR #209 (which fixes within-document-story placement) and is independent of PR #208 (Lean Tier 2 proof).

Sources

  • ECMA-376 Part 4, fldChar topic (c-rex.net mirror).
  • Microsoft Learn DeletedFieldCode class reference.
