diff --git a/CHANGELOG.md b/CHANGELOG.md index fb6b3c3..6bfbd00 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,9 @@ All notable changes to this project will be documented in this file. ## [Unreleased] - .NET 8 / Open XML SDK 3.x Migration +### Documentation +- **Clarified sectPr handling in table cells (Issue #51)** - Investigated and documented that `GetDocumentMetadata()` correctly ignores `w:sectPr` inside table cells. Sections are body-level constructs; Word ignores section breaks in table cells (MS-OI29500 §17.7.6.1 confirms Word disallows sectPr in table style pPr). Added OOXML corner case documentation and a test (`DM070`) verifying correct behavior. The issue's proposed fix (`body.Descendants(W.sectPr)`) would have been incorrect. + ### Fixed (npm) - **TypeScript subpath exports not resolving under `moduleResolution: "node"` (Issue #113)** - Added `typesVersions` fallback to npm package.json so `docxodus/react` and `docxodus/worker` subpath imports resolve types correctly under all TypeScript module resolution modes. Also reordered export conditions to put `types` before `import` per TypeScript requirements. diff --git a/Docxodus.Tests/DocumentMetadataTests.cs b/Docxodus.Tests/DocumentMetadataTests.cs index 43b9e2a..8d590a0 100644 --- a/Docxodus.Tests/DocumentMetadataTests.cs +++ b/Docxodus.Tests/DocumentMetadataTests.cs @@ -657,6 +657,64 @@ public void DM060_GetDocumentMetadata_CorrectlyConvertsTwipsToPoints() #endregion + #region OOXML Spec Compliance Tests + + [Fact] + public void DM070_GetDocumentMetadata_IgnoresSectPrInsideTableCells() + { + // Sections are body-level constructs in OOXML. Word ignores sectPr inside + // table cells (MS-OI29500 §17.7.6.1 notes Word disallows sectPr in table + // style pPr). This test verifies we correctly produce only 1 section (the + // body-level one), not 2, when a sectPr appears inside a table cell paragraph. + using (var ms = new MemoryStream()) + { + using (var wDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document)) + { + var mainPart = wDoc.AddMainDocumentPart(); + mainPart.Document = new Document( + new Body( + new Paragraph(new Run(new Text("Before table"))), + new DocumentFormat.OpenXml.Wordprocessing.Table( + new DocumentFormat.OpenXml.Wordprocessing.TableRow( + new DocumentFormat.OpenXml.Wordprocessing.TableCell( + new Paragraph( + new ParagraphProperties( + // This sectPr should be ignored per spec + new SectionProperties( + new PageSize() { Width = 15840, Height = 12240 } + ) + ), + new Run(new Text("Cell with sectPr")) + ) + ) + ) + ), + new Paragraph(new Run(new Text("After table"))), + new SectionProperties( + new PageSize() { Width = 12240, Height = 15840 }, + new PageMargin() { Top = 1440, Right = 1440, Bottom = 1440, Left = 1440 } + ) + ) + ); + mainPart.Document.Save(); + } + + ms.Position = 0; + var wmlDoc = new WmlDocument("test.docx", ms); + + // Act + var metadata = WmlToHtmlConverter.GetDocumentMetadata(wmlDoc); + + // Assert - should be exactly 1 section (the body-level one), not 2 + Assert.Equal(1, metadata.Sections.Count); + + // The section should use the body-level dimensions (US Letter), not the table-cell one + Assert.Equal(612, metadata.Sections[0].PageWidthPt); + } + } + + #endregion + } } diff --git a/Docxodus/WmlToHtmlConverter.cs b/Docxodus/WmlToHtmlConverter.cs index 634bd70..6606415 100644 --- a/Docxodus/WmlToHtmlConverter.cs +++ b/Docxodus/WmlToHtmlConverter.cs @@ -1100,9 +1100,13 @@ public static DocumentMetadata GetDocumentMetadata(WordprocessingDocument wordDo /// /// /// - /// LIMITATION: sectPr elements inside tables or text boxes are NOT detected. - /// This is an edge case - most documents don't have section breaks inside tables. - /// See GitHub issue #51 for tracking this enhancement. + /// NOTE: sectPr elements inside tables or text boxes are intentionally NOT detected. + /// Sections are body-level constructs in OOXML — a section spans top-level body content + /// and is delimited by sectPr in the last paragraph's pPr or the body's trailing sectPr. + /// Word does not support section breaks inside table cells (it either splits the table + /// or ignores the break). MS-OI29500 §17.7.6.1 further notes that Word does not allow + /// sectPr as a child of pPr in table style definitions. See GitHub issue #51 and + /// docs/ooxml_corner_cases.md for details. /// /// private static List<(XElement sectPr, List paragraphs, List tables)> CollectSectionData(XElement body) diff --git a/docs/ooxml_corner_cases.md b/docs/ooxml_corner_cases.md index 8473446..d460ed9 100644 --- a/docs/ooxml_corner_cases.md +++ b/docs/ooxml_corner_cases.md @@ -8,7 +8,9 @@ This document tracks edge cases and quirks in Open XML document processing where - [Legal Numbering with Multi-Level Format Strings](#legal-numbering-with-multi-level-format-strings) 2. [Footnotes](#footnotes) - [Footnote Count Discrepancy in Legal Templates](#footnote-count-discrepancy-in-legal-templates) -3. [Contributing](#contributing) +3. [Section Properties](#section-properties) + - [sectPr Inside Table Cells Must Be Ignored](#sectpr-inside-table-cells-must-be-ignored) +4. [Contributing](#contributing) --- @@ -224,6 +226,90 @@ The ECMA-376 specification clarifies how footnote numbering works: --- +## Section Properties + +### sectPr Inside Table Cells Must Be Ignored + +**Status:** Not a bug — current behavior is correct (March 2026) +**Discovered:** 2026-03-21 +**Related Issue:** [#51](https://github.com/JSv4/Docxodus/issues/51) + +#### The Problem + +Issue #51 reported that `GetDocumentMetadata()` does not detect `w:sectPr` elements nested inside table cells or text boxes, and proposed using `body.Descendants(W.sectPr)` to find them. + +#### Why sectPr in Table Cells Should Be Ignored + +Sections are a **body-level construct** in OOXML. A section spans top-level body content and is delimited by `sectPr` in either: +- The last paragraph's `pPr` (for mid-document sections) +- The `body` element's trailing `sectPr` (for the final section) + +Several pieces of evidence confirm that `sectPr` inside table cells should be ignored: + +1. **MS-OI29500 §17.7.6.1** explicitly states: "The standard states that the cnfStyle, divId, pStyle, rPr, and **sectPr** elements are valid child elements of the pPr element. **Word does not allow these elements** to be child elements of the pPr element" (in table style contexts). + +2. **Word's behavior**: Word does not support section breaks inside table cells. Attempting to insert one either splits the table or the break is silently ignored. + +3. **Structural argument**: The `w:tc` content model shares its schema with `w:body`, which is why the XML schema technically allows `sectPr` in `pPr` inside a table cell. But sections delineate page-level layout (page size, margins, columns) which cannot meaningfully apply within a table cell. + +**Note**: The full ISO/IEC 29500 PDF (not freely searchable online) may contain additional normative language in §17.6.17–19 about this constraint. The evidence above is from publicly accessible Microsoft implementation notes and observed Word behavior. + +#### Minimal XML Reproducer + +```xml + + Before table + + + + + + + + + + + Cell with sectPr + + + + + After table + + + + +``` + +#### Renderer Comparison + +| Renderer | Sections Detected | Behavior | +|----------|------------------|----------| +| Microsoft Word | 1 (body-level only) | Ignores table-cell sectPr | +| LibreOffice Writer | 1 (body-level only) | Ignores table-cell sectPr | +| Docxodus | 1 (body-level only) | Correct — only processes top-level elements | + +#### Analysis + +The `CollectSectionData` method in `WmlToHtmlConverter.cs` iterates over `body.Elements()` (top-level block elements only). This is the correct approach because: + +1. `w:sectPr` in `w:pPr` of top-level paragraphs → valid section breaks (handled) +2. `w:sectPr` as direct child of `w:body` → final section (handled) +3. `w:sectPr` inside table cells → must be ignored per spec (correctly not detected) +4. `w:sectPr` inside text boxes (`w:txbxContent`) → separate content flow, not a document section + +Using `body.Descendants(W.sectPr)` as proposed in #51 would be **incorrect** — it would pick up table-cell sectPr elements that the spec says to ignore. + +#### Relevant Code + +- `Docxodus/WmlToHtmlConverter.cs`: `CollectSectionData()` method (line ~1108) + +#### Test Coverage + +- `DM070_GetDocumentMetadata_IgnoresSectPrInsideTableCells` — verifies only 1 section is detected when a sectPr exists inside a table cell + +--- + ## Contributing When adding new corner cases to this document: