Context
Child issue of #223.
We want SafeDocX's OOXML constants, semantic tag groups, implementation comments, and tests to be traceable to the official ECMA-376 standard artifacts rather than to hand-written strings or internal issue history alone.
Decision from project discussion: vendor the official ECMA-376 ZIP files unchanged rather than relying only on a fetch script. The reason is durability: the ECMA website URL structure may change, while the repository should continue to preserve the exact standard edition used for review, agent lookup, and reproducible code generation.
Source artifacts
Use the ECMA-376 official publication page:
https://ecma-international.org/publications-and-standards/standards/ecma-376/
Vendor the official ZIP downloads for the four parts listed there:
- Part 1: Fundamentals And Markup Language Reference, 5th edition, December 2016
- Part 2: Open Packaging Conventions, 5th edition, December 2021
- Part 3: Markup Compatibility and Extensibility, 5th edition, December 2015
- Part 4: Transitional Migration Features, 5th edition, December 2016
Required repository treatment
- Store the downloaded ZIPs unchanged.
- Add a SHA256SUMS file for the exact vendored ZIPs.
- Add a README.md next to the artifacts explaining source URL, download date, edition/part metadata, and why the artifacts are vendored.
- Add the required Ecma copyright notice/license/disclaimer text or a pointer file sufficient to satisfy the Ecma text copyright policy.
- Do not edit the standard artifacts in place.
- Any extracted/generated artifacts must clearly identify themselves as derived from the unchanged vendored source and must record the input ZIP checksum.
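A minimal sketch of how the vendored ZIPs could be checked against SHA256SUMS, for example in CI. It assumes the file uses the line format written by the `sha256sum` tool; the function names and the `readFile` callback are hypothetical and are not part of SafeDocX today.

```typescript
import { createHash } from "node:crypto";

/** One parsed line of a SHA256SUMS file. */
export interface ChecksumEntry {
  sha256: string;
  fileName: string;
}

/** Parse SHA256SUMS text, assuming the "<digest>  <name>" format of sha256sum. */
export function parseSha256Sums(text: string): ChecksumEntry[] {
  const entries: ChecksumEntry[] = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    // 64 hex characters, a space, then a space (text mode) or '*' (binary mode).
    const digest = line.slice(0, 64);
    const separator = line.slice(64, 66);
    const fileName = line.slice(66);
    const valid =
      /^[0-9a-f]{64}$/i.test(digest) &&
      (separator === "  " || separator === " *") &&
      fileName.length > 0;
    if (!valid) throw new Error(`Malformed SHA256SUMS line: ${line}`);
    entries.push({ sha256: digest.toLowerCase(), fileName });
  }
  return entries;
}

/** Hex-encoded SHA-256 digest of the given bytes. */
export function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

/** Return the names of files that are missing or whose digest differs. */
export function findChecksumMismatches(
  entries: readonly ChecksumEntry[],
  readFile: (fileName: string) => Uint8Array | undefined,
): string[] {
  return entries
    .filter((entry) => {
      const bytes = readFile(entry.fileName);
      return bytes === undefined || sha256Hex(bytes) !== entry.sha256;
    })
    .map((entry) => entry.fileName);
}
```

Passing file access in as a callback keeps the check independent of where the artifacts are stored, so the same function can also record the input ZIP checksum for derived artifacts.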
Implementation direction
Create a small spec-ingestion layer that can read the vendored ECMA artifacts and produce structured metadata usable by TypeScript and docs.
At minimum, produce:
- Artifact manifest
  - Edition
  - Part
  - Title
  - Publication date
  - Vendored path
  - SHA-256
  - Source URL
  - Notes/copyright status
- Spec-reference manifest
  - Stable internal ID, e.g. ooxml.ecma376.5ed.part4.<topic-or-section>
  - Edition / part / section or topic
  - Normative vs informative where determinable
  - Source artifact and locator
  - Coverage status: covered, partial, out-of-scope, not-yet-covered, implementation-note
  - Related tests/source files
- Generated OOXML vocabulary registry
  - Namespace URI
  - Preferred prefix
  - Local name
  - QName form, e.g. w:fldChar
  - Clark-notation form
  - Element vs attribute where determinable
  - Source schema/artifact locator
- Generated TypeScript constants
  - Generated constants should represent raw vocabulary entries.
  - Existing handwritten constants such as W_FLDCHAR, W_INSTRTEXT, W_DEL, W_INS, etc. should gradually migrate to generated constants or be validated against the generated registry.
- Hand-authored semantic groups over generated vocabulary
  - Groups such as FIELD_CHAR_TAG_NAMES / FIELD_CODE_BOUNDARY_TAGS should not be treated as purely schema-generated.
  - They should be hand-authored semantic subsets that import generated vocabulary constants and cite spec-reference IDs.
  - Example target shape:
/**
 * Field-code marker/payload elements that require field-context-aware splitting.
 *
 * @ooxmlSpec ooxml.ecma376.5ed.part4.fields.fragmented-track-changes
 */
export const FIELD_CODE_BOUNDARY_TAGS = new Set([
  W.FLD_CHAR.qname,
  W.INSTR_TEXT.qname,
  W.DEL_INSTR_TEXT.qname,
]);
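The vocabulary registry and the validation of handwritten constants could be sketched as below. This is an illustration under assumptions, not the planned implementation: the `VocabularyEntry` shape, the helper names, and the sample entries are hypothetical, the `sourceLocator` values are placeholders, and it assumes the existing handwritten constants hold QName strings such as `w:fldChar`. The namespace URI is the Transitional WordprocessingML main namespace.

```typescript
/** One raw vocabulary entry, as a generated registry might record it. */
export interface VocabularyEntry {
  namespaceUri: string;
  preferredPrefix: string;
  localName: string;
  kind: "element" | "attribute" | "unknown";
  /** Locator into the vendored artifact; the format is left open here. */
  sourceLocator: string;
}

/** QName form, e.g. "w:fldChar". */
export function toQName(entry: VocabularyEntry): string {
  return `${entry.preferredPrefix}:${entry.localName}`;
}

/** Clark-notation form, e.g. "{namespace-uri}fldChar". */
export function toClarkName(entry: VocabularyEntry): string {
  return `{${entry.namespaceUri}}${entry.localName}`;
}

const WML_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

/** Hand-written sample standing in for generated output. */
export const SAMPLE_REGISTRY: VocabularyEntry[] = [
  { namespaceUri: WML_NS, preferredPrefix: "w", localName: "fldChar", kind: "element", sourceLocator: "TBD" },
  { namespaceUri: WML_NS, preferredPrefix: "w", localName: "instrText", kind: "element", sourceLocator: "TBD" },
  { namespaceUri: WML_NS, preferredPrefix: "w", localName: "delInstrText", kind: "element", sourceLocator: "TBD" },
  { namespaceUri: WML_NS, preferredPrefix: "w", localName: "fldCharType", kind: "attribute", sourceLocator: "TBD" },
];

/**
 * Return the names of handwritten constants whose values do not appear
 * in the registry as QNames.
 */
export function findUnregisteredNames(
  handwritten: Record<string, string>,
  registry: readonly VocabularyEntry[],
): string[] {
  const known = new Set(registry.map(toQName));
  return Object.entries(handwritten)
    .filter(([, value]) => !known.has(value))
    .map(([name]) => name);
}
```

A check like `findUnregisteredNames` would let existing constants stay in place during the migration while still failing a test if one of them drifts from the generated vocabulary.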
Initial migration target
Start with the field-fragmentation area from #217:
- w:fldChar
- w:instrText
- w:delInstrText
- w:ins
- w:del
- any attributes needed for these elements, such as w:fldChar/@w:fldCharType, w:id, w:author, and w:date
Then add a report showing whether packages/docx-core/src/baselines/atomizer/inPlaceModifier.ts references generated/validated OOXML names for those elements.
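One possible starting point for that report is a plain-text scan that counts quoted string literals matching the target names. The sketch below is an assumption about how the report might work, not a settled design: `countRawLiterals` and `NameUsage` are hypothetical names, and a count of zero only shows that no raw literal was found. It does not prove that the file references a generated constant instead.

```typescript
/** How often one OOXML QName appears as a raw string literal. */
export interface NameUsage {
  qname: string;
  rawLiteralCount: number;
}

/**
 * Count occurrences of each QName written as a complete quoted literal
 * ("w:ins", 'w:ins', or `w:ins`) in the given source text.
 */
export function countRawLiterals(
  sourceText: string,
  qnames: readonly string[],
): NameUsage[] {
  return qnames.map((qname) => {
    // Escape regex metacharacters so the name is matched literally.
    const escaped = qname.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Require the same quote character on both sides of the name.
    const pattern = new RegExp(`(["'\`])${escaped}\\1`, "g");
    const matches = sourceText.match(pattern);
    return { qname, rawLiteralCount: matches ? matches.length : 0 };
  });
}

/** Names that still appear as raw literals and are candidates for migration. */
export function namesNeedingMigration(usages: readonly NameUsage[]): string[] {
  return usages
    .filter((usage) => usage.rawLiteralCount > 0)
    .map((usage) => usage.qname);
}
```

A text scan is deliberately crude. An AST-based check using the TypeScript compiler would be more precise, at the cost of more setup.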
Acceptance criteria
Non-goals
- No claim of full OOXML implementation coverage.
- No modification of the ECMA artifacts themselves.
- No requirement to migrate every existing OOXML string constant in the first PR.
- No replacement for interoperability tests against Microsoft Word, LibreOffice, docx4j, or other consumers/producers.