Problem
SafeDocX increasingly makes implementation and test claims that depend on OOXML / ECMA-376 requirements, but those claims are not currently tied to a structured, queryable, authoritative source of truth.
Today the repo has comments, issue bodies, docs, fixtures, runtime checks, and tests that say things like "ECMA-376 requires X" or "this mirrors canonical WordprocessingML behavior." Those claims may be correct, but they are mostly free-text. A reviewer, contributor, or AI agent cannot reliably answer:
Which exact ECMA-376 edition, part, section, and topic is being relied on?
Where is the authoritative text or canonical example for that claim?
Which production code paths and tests cover that fragment of the standard?
Which fragments are intentionally out of scope for SafeDocX?
Are we claiming full OOXML compliance, scoped tracked-change compliance, or only conformance for a specific supported editing surface?
As a result, internal GitHub issues and PR review threads become the de facto authority. They are useful engineering context, but they should not be the primary public-facing authority for normative OOXML behavior.
Concrete grounding example
Issue #217 is a good example of the problem, not because it is bad, but because it shows the current limitation clearly:
The issue concerns w:fldChar, w:delInstrText, and fragmented field markup.
packages/docx-core/src/baselines/atomizer/inPlaceModifier.ts contains field-handling logic such as getAtomRuns(...) treating collapsed field atoms as a single logical unit, plus pre-split logic that skips collapsed field atoms and field-character elements.
The intended implementation behavior is highly specific: field-character runs may need to stay at sibling level while only payload runs are wrapped in w:ins / w:del.
That is exactly the sort of claim that should be traceable to a stable spec reference, a canonical fixture, and coverage status, rather than only to an issue narrative.
Why this matters
Trust: Users and contributors should be able to see what SafeDocX means when it says it emits valid or conformant OOXML.
Review quality: A reviewer should not have to reverse-engineer a spec claim from an issue thread or web search.
Agent usability: AI agents working in the repo should be able to resolve spec citations locally and structurally, without relying on ad-hoc internet lookup.
Scope control: We should be explicit about what we do not attempt to implement. This issue is not about implementing all of ECMA-376.
Regression safety: Tests should be able to declare the normative spec fragment they exercise, making coverage and drift visible over time.
Professionalism: Public-facing code comments should prefer authoritative standards references as primary support. Internal GitHub issues can remain useful secondary context.
Important scope distinction
This should not become a vague claim of "SafeDocX supports all OOXML."
The goal is to create a structured way to say, for example:
This source/test/fixture is intended to satisfy ECMA-376 5th edition, Part 4, section/topic X.
This spec fragment is covered by tests A and B and runtime validator C.
This adjacent spec fragment is intentionally out of scope because SafeDocX does not support that editing surface.
This behavior is implementation-informed rather than directly normative, and therefore should be marked as such.
Non-goals for this issue
Do not implement the whole ECMA-376 standard.
Do not decide the final design in this issue.
Do not replace Word / LibreOffice / docx4j / pandoc interoperability testing.
Do not treat GitHub issues as normative sources; they should remain context and project history.
Do not silently copy or modify standards text without preserving required notices and verifying the applicable Ecma terms.
Categories of solution to evaluate later
This issue should first capture the problem. Follow-up design can choose between these categories or combine them:
Pinned standards source / corpus
Vendored ECMA-376 artifacts, or a script that fetches official artifacts by pinned URL and checksum.
If vendored, preserve copyright notices and keep the material unchanged/up-to-date as required by Ecma's text copyright policy.
Consider whether to store original ZIP/PDF files, extracted section text, structured indexes, or only a manifest plus fetch script.
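To make the manifest-plus-fetch-script option more concrete, one possible shape for a pinned artifact entry is sketched below. The names PinnedArtifact and checkPinnedDigest, the field names, the placeholder URL, and the placeholder digest are all hypothetical. Nothing here reflects an existing SafeDocX file or a real Ecma download location, and hashing the downloaded bytes is left to whatever fetch script is eventually chosen.

```typescript
// Hypothetical shape for one pinned standards artifact. All field names are assumptions.
interface PinnedArtifact {
  id: string;        // local handle for the artifact
  url: string;       // pinned download location (placeholder below)
  sha256: string;    // expected hex digest of the downloaded bytes
  vendored: boolean; // true if the file is committed to the repo unchanged
}

interface ChecksumResult {
  ok: boolean;
  reason: string; // empty when ok is true
}

// Compares a digest computed elsewhere (for example by a fetch script)
// against the pinned value. Computing the hash is outside this sketch.
function checkPinnedDigest(artifact: PinnedArtifact, actualSha256: string): ChecksumResult {
  const expected = artifact.sha256.toLowerCase();
  const actual = actualSha256.trim().toLowerCase();
  if (!/^[0-9a-f]{64}$/.test(actual)) {
    return { ok: false, reason: `digest for ${artifact.id} is not a 64-character hex string` };
  }
  if (actual !== expected) {
    return { ok: false, reason: `digest mismatch for ${artifact.id}` };
  }
  return { ok: true, reason: "" };
}

const exampleArtifact: PinnedArtifact = {
  id: "ecma376-5ed-part4",                           // hypothetical handle
  url: "https://example.invalid/ecma-376-part4.zip", // placeholder, not a real location
  sha256: new Array(65).join("0"),                   // placeholder digest: 64 zeros
  vendored: false,
};
```

A CI step could call checkPinnedDigest after fetching and fail the build when ok is false. Whether artifacts are vendored or fetched remains an open design question for the follow-up.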
Normative reference IDs
Create stable internal IDs such as ooxml.ecma376.5ed.part4.17.16.5.fldChar.
Each ID should record edition, part, section/topic, title, normative/informative status, source artifact, checksum/page/anchor, and any known errata or related implementation notes.
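As a minimal sketch of what such a record and ID format could look like, the TypeScript below defines a hypothetical SpecReference type and a parser for IDs shaped like the example above. The type name, field names, and regular expression are assumptions for illustration only. The example ID is copied from this issue as a format sample, not as a verified citation.

```typescript
// Hypothetical record for one normative reference. Field names are assumptions
// that mirror the list above; they are not an existing SafeDocX schema.
interface SpecReference {
  id: string;
  edition: number;
  part: number;
  section: string;        // dotted section number, e.g. "17.16.5"
  topic: string;          // short topic slug, e.g. "fldChar"
  title: string;
  status: "normative" | "informative";
  sourceArtifact: string; // handle of the pinned artifact the text comes from
  anchor?: string;        // page or anchor within the artifact, if known
  notes?: string[];       // errata or implementation notes
}

interface ParsedSpecId {
  edition: number;
  part: number;
  section: string;
  topic: string;
}

// Assumed ID grammar: ooxml.ecma376.<edition>ed.part<part>.<dotted section>.<topic>
const SPEC_ID_PATTERN =
  /^ooxml\.ecma376\.(\d+)ed\.part(\d+)\.((?:\d+\.)*\d+)\.([A-Za-z][A-Za-z0-9]*)$/;

// Returns the structured pieces of an ID, or null if it does not fit the assumed grammar.
function parseSpecId(id: string): ParsedSpecId | null {
  const match = SPEC_ID_PATTERN.exec(id);
  if (match === null) {
    return null;
  }
  return {
    edition: Number(match[1]),
    part: Number(match[2]),
    section: match[3],
    topic: match[4],
  };
}
```

Encoding edition, part, and section in the ID keeps it readable in comments, at the cost of a rename if a citation is later corrected. An opaque ID with the same fields stored only in the manifest would be an equally valid choice.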
Structured annotations in code and tests
Add JSDoc or test metadata such as @ooxmlSpec <id> / @ooxmlCoverage <id>.
Allow source comments, XML constants, fixtures, runtime validators, and tests to reference the same spec IDs.
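The annotation idea could look roughly like the sketch below: a JSDoc tag on a function, plus a small extractor that collects tags from source text. The tag names come from this issue. The function isFieldCharElement, the SpecAnnotation type, and extractSpecAnnotations are hypothetical stand-ins and not SafeDocX code.

```typescript
/**
 * Reports whether an element name is the field-character element.
 * Hypothetical stand-in used only to show where a tag would sit.
 * @ooxmlSpec ooxml.ecma376.5ed.part4.17.16.5.fldChar
 */
function isFieldCharElement(elementName: string): boolean {
  return elementName === "w:fldChar";
}

interface SpecAnnotation {
  tag: "ooxmlSpec" | "ooxmlCoverage";
  specId: string;
  line: number; // 1-based line number within the scanned text
}

// Scans source text line by line and collects every @ooxmlSpec / @ooxmlCoverage tag.
// This is a plain regex scan, not a real JSDoc parser, so it also matches tags
// inside ordinary line comments.
function extractSpecAnnotations(sourceText: string): SpecAnnotation[] {
  const annotations: SpecAnnotation[] = [];
  const lines = sourceText.split("\n");
  lines.forEach((text, index) => {
    const pattern = /@(ooxmlSpec|ooxmlCoverage)\s+([A-Za-z0-9][A-Za-z0-9_.-]*[A-Za-z0-9])/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      annotations.push({
        tag: match[1] as SpecAnnotation["tag"],
        specId: match[2],
        line: index + 1,
      });
    }
  });
  return annotations;
}
```

A text-level scan like this would also work on XML fixtures and Markdown docs, which is one way to let every artifact type reference the same IDs.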
Coverage matrix
Track per-spec-fragment status: covered, partial, out-of-scope, not-yet-covered, implementation-note, etc.
Include rationale for out-of-scope decisions.
Generate a report so maintainers can see coverage by part/section and avoid accidental overclaims.
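A coverage matrix of this kind could be modelled as in the sketch below. The status values are the ones listed above. The CoverageEntry type and both helper functions are hypothetical, and any spec ID used with them other than the example from this issue would be a placeholder.

```typescript
type CoverageStatus =
  | "covered"
  | "partial"
  | "out-of-scope"
  | "not-yet-covered"
  | "implementation-note";

// Hypothetical row of the coverage matrix. Field names are assumptions.
interface CoverageEntry {
  specId: string;
  status: CoverageStatus;
  tests: string[];    // test names or paths that exercise this fragment
  rationale?: string; // expected for out-of-scope decisions
}

// Counts entries per status so a report can show coverage at a glance.
function summarizeCoverage(entries: CoverageEntry[]): Record<CoverageStatus, number> {
  const counts: Record<CoverageStatus, number> = {
    "covered": 0,
    "partial": 0,
    "out-of-scope": 0,
    "not-yet-covered": 0,
    "implementation-note": 0,
  };
  for (const entry of entries) {
    counts[entry.status] += 1;
  }
  return counts;
}

// Lists out-of-scope entries that lack a non-empty rationale.
function findMissingRationales(entries: CoverageEntry[]): string[] {
  return entries
    .filter((entry) => entry.status === "out-of-scope")
    .filter((entry) => entry.rationale === undefined || entry.rationale.trim() === "")
    .map((entry) => entry.specId);
}
```

Grouping the counts by part or section would be a natural extension once the ID format is settled.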
Spec-backed fixtures
Keep canonical OOXML examples or minimized fixtures associated with spec IDs.
Where examples are copied from the standard, preserve precise source attribution and required notices.
Where examples are derived/minimized, mark them as derived and explain the transformation.
Lint / CI enforcement
Fail CI when a code/test annotation references a missing spec ID.
Optionally warn when a public-facing comment cites only an internal issue for an OOXML rule that has a known spec ID.
Optionally verify checksums of vendored or fetched standards artifacts.
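The first check above, failing CI on a missing spec ID, might reduce to something like the following sketch. AnnotationRef, LintFinding, findUnknownSpecIds, and exitCodeFor are hypothetical names. How annotations are collected and where the list of known IDs comes from are left open.

```typescript
// Hypothetical input: one annotation found somewhere in the repo.
interface AnnotationRef {
  file: string;
  line: number;
  specId: string;
}

interface LintFinding {
  file: string;
  line: number;
  message: string;
}

// Returns one finding per annotation whose spec ID is absent from the manifest.
function findUnknownSpecIds(
  refs: AnnotationRef[],
  knownIds: ReadonlyArray<string>,
): LintFinding[] {
  return refs
    .filter((ref) => knownIds.indexOf(ref.specId) === -1)
    .map((ref) => ({
      file: ref.file,
      line: ref.line,
      message: `unknown spec ID "${ref.specId}"`,
    }));
}

// Maps findings to a process exit code: 0 when clean, 1 when anything is unknown.
function exitCodeFor(findings: LintFinding[]): number {
  return findings.length === 0 ? 0 : 1;
}
```

The optional checks (comments that cite only an internal issue, artifact checksums) could report through the same LintFinding shape at warning level instead of affecting the exit code.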
Public conformance/support documentation
Extend existing support/conformance docs to state the supported OOXML surface precisely.
Separate normative ECMA-376 conformance claims from product-interoperability findings and project-specific design choices.
Acceptance criteria for a first pass
A short ADR/proposal exists that records the problem, candidate approaches, and licensing/copyright constraints for storing or extracting ECMA text.
At least one production source comment or test references the manifest ID instead of only free-text section prose.
At least one out-of-scope or partial entry exists, to make clear this is scoped coverage rather than a full-standard compliance claim.
A simple report or script can summarize referenced spec IDs and coverage status.
Contributor guidance states that internal issues may be cited as project history, but normative OOXML claims should cite the spec reference ID when one exists.
Related repo context
packages/docx-core/src/baselines/atomizer/inPlaceModifier.ts: concrete implementation surface containing collapsed-field and field-character behavior.
packages/docx-core/SUPPORT.md: existing scoped support surface that could become the public-facing place to state which OOXML claims are covered, partial, or out of scope.