Establish structured OOXML/ECMA-376 spec traceability and compliance coverage

## Problem

SafeDocX increasingly makes implementation and test claims that depend on OOXML / ECMA-376 requirements, but those claims are not currently tied to a structured, queryable, authoritative source of truth.

Today the repo has comments, issue bodies, docs, fixtures, runtime checks, and tests that say things like "ECMA-376 requires X" or "this mirrors canonical WordprocessingML behavior." Those claims may be correct, but they are mostly free-text. A reviewer, contributor, or AI agent cannot reliably answer:

- Which exact ECMA-376 edition, part, section, and topic is being relied on?
- Where is the authoritative text or canonical example for that claim?
- Which production code paths and tests cover that fragment of the standard?
- Which fragments are intentionally out of scope for SafeDocX?
- Are we claiming full OOXML compliance, scoped tracked-change compliance, or only conformance for a specific supported editing surface?

As a result, internal GitHub issues and PR review threads become the de facto authority. That is useful engineering context, but it should not be the primary public-facing authority for normative OOXML behavior.

## Concrete grounding example

Issue #217 is a good example of the problem, not because it is bad, but because it shows the current limitation clearly:

- #217 asserts requirements from ECMA-376 Part 4 around `w:fldChar`, `w:delInstrText`, and fragmented field markup.
- `packages/docx-core/src/baselines/atomizer/inPlaceModifier.ts` contains field-handling logic such as `getAtomRuns(...)` treating collapsed field atoms as a single logical unit, plus pre-split logic that skips collapsed field atoms and field-character elements.
- The intended implementation behavior is highly specific: field-character runs may need to stay at sibling level while only payload runs are wrapped in `w:ins` / `w:del`.

That is exactly the sort of claim that should be traceable to a stable spec reference, a canonical fixture, and coverage status, rather than only to an issue narrative.

## Why this matters

1. **Trust:** Users and contributors should be able to see what SafeDocX means when it says it emits valid or conformant OOXML.
2. **Review quality:** A reviewer should not have to reverse-engineer a spec claim from an issue thread or web search.
3. **Agent usability:** AI agents working in the repo should be able to resolve spec citations locally and structurally, without relying on ad-hoc internet lookup.
4. **Scope control:** We should be explicit about what we do not attempt to implement. This issue is not about implementing all of ECMA-376.
5. **Regression safety:** Tests should be able to declare the normative spec fragment they exercise, making coverage and drift visible over time.
6. **Professionalism:** Public-facing code comments should prefer authoritative standards references as primary support. Internal GitHub issues can remain useful secondary context.

## Important scope distinction

This should not become a vague claim of "SafeDocX supports all OOXML."

The goal is to create a structured way to say, for example:

- This source/test/fixture is intended to satisfy ECMA-376 5th edition, Part 4, section/topic X.
- This spec fragment is covered by tests A and B and runtime validator C.
- This adjacent spec fragment is intentionally out of scope because SafeDocX does not support that editing surface.
- This behavior is implementation-informed rather than directly normative, and therefore should be marked as such.
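
As a rough illustration of what "a structured way to say" these things could look like, the sketch below models one coverage entry per spec fragment. Every type and function name here is hypothetical and not part of SafeDocX today, the status values are borrowed from the coverage-matrix idea later in this issue, and the second spec ID is a placeholder rather than a real section reference.

```typescript
// Hypothetical types for illustration; nothing here exists in SafeDocX today.
type CoverageStatus =
  | "covered"
  | "partial"
  | "out-of-scope"
  | "not-yet-covered"
  | "implementation-note";

interface SpecCoverageEntry {
  specId: string; // stable internal ID for the spec fragment
  status: CoverageStatus;
  normative: boolean; // false means implementation-informed, not directly normative
  coveredBy: string[]; // tests or runtime validators that exercise the fragment
  rationale?: string; // expected for out-of-scope entries
}

// Render one entry as a single reviewable sentence.
function describeEntry(entry: SpecCoverageEntry): string {
  const basis = entry.normative ? "normative" : "implementation-informed";
  switch (entry.status) {
    case "covered":
    case "partial":
      return `${entry.specId} is ${entry.status} (${basis}) by ${entry.coveredBy.join(", ")}`;
    case "out-of-scope":
      return `${entry.specId} is out of scope: ${entry.rationale ?? "no rationale recorded"}`;
    default:
      return `${entry.specId} is ${entry.status} (${basis})`;
  }
}

const coveredExample: SpecCoverageEntry = {
  specId: "ooxml.ecma376.5ed.part4.17.16.5.fldChar", // example ID format from this issue
  status: "covered",
  normative: true,
  coveredBy: ["test A", "test B", "validator C"],
};

const outOfScopeExample: SpecCoverageEntry = {
  specId: "ooxml.ecma376.5ed.part4.0.0.placeholderTopic", // placeholder, not a real section
  status: "out-of-scope",
  normative: true,
  coveredBy: [],
  rationale: "SafeDocX does not support that editing surface",
};

console.log(describeEntry(coveredExample));
console.log(describeEntry(outOfScopeExample));
```

Keeping `normative` separate from `status` is one possible way to stop implementation-informed behavior from being reported as spec conformance; the follow-up design may choose a different split.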

## Non-goals for this issue

- Do not implement the whole ECMA-376 standard.
- Do not decide the final design in this issue.
- Do not replace Word / LibreOffice / docx4j / pandoc interoperability testing.
- Do not treat GitHub issues as normative sources; they should remain context and project history.
- Do not silently copy or modify standards text without preserving required notices and verifying the applicable Ecma terms.

## Categories of solution to evaluate later

This issue should first capture the problem. Follow-up design can choose between these categories or combine them:

1. **Pinned standards source / corpus**
   - Vendored ECMA-376 artifacts, or a script that fetches official artifacts by pinned URL and checksum.
   - If vendored, preserve copyright notices and keep the material unchanged/up-to-date as required by Ecma's text copyright policy.
   - Consider whether to store original ZIP/PDF files, extracted section text, structured indexes, or only a manifest plus fetch script.

2. **Normative reference IDs**
   - Create stable internal IDs such as `ooxml.ecma376.5ed.part4.17.16.5.fldChar`.
   - Each ID should record edition, part, section/topic, title, normative/informative status, source artifact, checksum/page/anchor, and any known errata or related implementation notes.

3. **Structured annotations in code and tests**
   - Add JSDoc or test metadata such as `@ooxmlSpec <id>` / `@ooxmlCoverage <id>`.
   - Allow source comments, XML constants, fixtures, runtime validators, and tests to reference the same spec IDs.

4. **Coverage matrix**
   - Track per-spec-fragment status: `covered`, `partial`, `out-of-scope`, `not-yet-covered`, `implementation-note`, etc.
   - Include rationale for out-of-scope decisions.
   - Generate a report so maintainers can see coverage by part/section and avoid accidental overclaims.

5. **Spec-backed fixtures**
   - Keep canonical OOXML examples or minimized fixtures associated with spec IDs.
   - Where examples are copied from the standard, preserve precise source attribution and required notices.
   - Where examples are derived/minimized, mark them as derived and explain the transformation.

6. **Lint / CI enforcement**
   - Fail CI when a code/test annotation references a missing spec ID.
   - Optionally warn when a public-facing comment cites only an internal issue for an OOXML rule that has a known spec ID.
   - Optionally verify checksums of vendored or fetched standards artifacts.

7. **Public conformance/support documentation**
   - Extend existing support/conformance docs to state the supported OOXML surface precisely.
   - Separate normative ECMA-376 conformance claims from product-interoperability findings and project-specific design choices.
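
To make categories 2, 3, and 6 more concrete, here is a minimal sketch of how a spec ID could be parsed and how a CI check could flag annotations that point at IDs missing from the manifest. It assumes the ID layout of the example above (`ooxml.ecma376.<edition>ed.part<part>.<section>.<topic>`) and the `@ooxmlSpec` / `@ooxmlCoverage` tag names; the function names, the regular expressions, and the placeholder ID are assumptions made for illustration, not an agreed design.

```typescript
// Hypothetical helpers; the ID grammar below is inferred from the single example ID in this issue.
interface ParsedSpecId {
  id: string;
  edition: number;
  part: number;
  section: string;
  topic: string;
}

const SPEC_ID_PATTERN =
  /^ooxml\.ecma376\.(\d+)ed\.part(\d+)\.((?:\d+\.)*\d+)\.([A-Za-z][A-Za-z0-9]*)$/;

// Split an ID into its components, or return null if it does not follow the assumed layout.
function parseSpecId(id: string): ParsedSpecId | null {
  const match = SPEC_ID_PATTERN.exec(id);
  if (match === null) return null;
  const [, edition = "", part = "", section = "", topic = ""] = match;
  return { id, edition: Number(edition), part: Number(part), section, topic };
}

// Collect annotation IDs in a source file that the manifest does not define.
function findUnknownSpecIds(source: string, knownIds: ReadonlySet<string>): string[] {
  const annotation = /@ooxml(?:Spec|Coverage)\s+([A-Za-z0-9.]+)/g;
  const unknown: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = annotation.exec(source)) !== null) {
    const id = match[1];
    if (id !== undefined && !knownIds.has(id) && unknown.indexOf(id) === -1) {
      unknown.push(id);
    }
  }
  return unknown;
}

const knownIds = new Set<string>(["ooxml.ecma376.5ed.part4.17.16.5.fldChar"]);

// Sample input: one annotation resolves, the other uses a placeholder ID absent from the manifest.
const sampleSource = `
/** @ooxmlSpec ooxml.ecma376.5ed.part4.17.16.5.fldChar */
function keepFieldCharsAtSiblingLevel() {}

/** @ooxmlCoverage ooxml.ecma376.5ed.part4.0.0.placeholderTopic */
function somethingElse() {}
`;

const missing = findUnknownSpecIds(sampleSource, knownIds);
if (missing.length > 0) {
  console.error(`Unknown spec IDs: ${missing.join(", ")}`);
}
```

A real check would presumably read the manifest from disk and walk the repository; plain text scanning is used here only to keep the sketch self-contained.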

## Acceptance criteria for a first pass

- [ ] A short ADR/proposal exists that records the problem, candidate approaches, and licensing/copyright constraints for storing or extracting ECMA text.
- [ ] The repo has an initial structured spec-reference manifest with at least the ECMA-376 Part 4 field-fragmentation references needed by #217.
- [ ] At least one production source comment or test references the manifest ID instead of only free-text section prose.
- [ ] At least one `out-of-scope` or `partial` entry exists, to make clear this is scoped coverage rather than a full-standard compliance claim.
- [ ] A simple report or script can summarize referenced spec IDs and coverage status.
- [ ] Contributor guidance states that internal issues may be cited as project history, but normative OOXML claims should cite the spec reference ID when one exists.
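
The "simple report or script" criterion could be met by something as small as the sketch below, which counts manifest entries per status and checks that at least one `out-of-scope` or `partial` entry exists. The record shape, function names, and sample records are hypothetical; only the first spec ID comes from this issue, and the others are placeholders.

```typescript
// Hypothetical report helpers; the manifest format is still to be decided.
const STATUSES = [
  "covered",
  "partial",
  "out-of-scope",
  "not-yet-covered",
  "implementation-note",
] as const;

type Status = (typeof STATUSES)[number];

interface CoverageRecord {
  specId: string;
  status: Status;
}

// Count how many spec fragments fall under each status.
function summarize(records: readonly CoverageRecord[]): Record<Status, number> {
  const counts: Record<Status, number> = {
    covered: 0,
    partial: 0,
    "out-of-scope": 0,
    "not-yet-covered": 0,
    "implementation-note": 0,
  };
  for (const record of records) {
    counts[record.status] += 1;
  }
  return counts;
}

// The manifest reads as scoped coverage only if something is explicitly partial or out of scope.
function hasScopedEntry(counts: Record<Status, number>): boolean {
  return counts.partial + counts["out-of-scope"] > 0;
}

// One line per status, in a fixed order, so report diffs stay stable over time.
function formatReport(counts: Record<Status, number>): string {
  return STATUSES.map((status) => `${status}: ${counts[status]}`).join("\n");
}

const sampleRecords: CoverageRecord[] = [
  { specId: "ooxml.ecma376.5ed.part4.17.16.5.fldChar", status: "covered" },
  { specId: "ooxml.ecma376.5ed.part4.0.0.placeholderA", status: "partial" }, // placeholder ID
  { specId: "ooxml.ecma376.5ed.part4.0.0.placeholderB", status: "out-of-scope" }, // placeholder ID
];

const sampleCounts = summarize(sampleRecords);
console.log(formatReport(sampleCounts));
console.log(`scoped entry present: ${hasScopedEntry(sampleCounts)}`);
```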

## External references to consider

- ECMA-376 official publication page: https://ecma-international.org/publications-and-standards/standards/ecma-376/
- Ecma text copyright policy: https://ecma-international.org/policies/by-ipr/ecma-text-copyright-policy/
- WHATWG DOM Standard as a pattern to follow: https://dom.spec.whatwg.org/ links directly to its GitHub repository and to web-platform-tests.

## Related repo context

- #217 — field-fragmentation / ECMA-376 Part 4 example that motivated this issue.
- `packages/docx-core/src/baselines/atomizer/inPlaceModifier.ts` — concrete implementation surface containing collapsed-field and field-character behavior.
- `packages/docx-core/SUPPORT.md` — existing scoped support surface that could become the public-facing place to state which OOXML claims are covered, partial, or out of scope.
