feat: diffing extension for comparing documents (SD-1324 and SD-89) by luccas-harbour · Pull Request #2306 · superdoc-dev/superdoc

luccas-harbour · 2026-03-05T14:46:53Z

Summary

This PR delivers an end-to-end document compare + replay workflow, including comment
diff/replay and tracked-changes integration.

What’s Included

Adds a full diffing extension in @superdoc/super-editor with compareDocuments and
replayDifferences commands.
Implements document diff computation across block, paragraph, inline text, inline
nodes, attrs, marks, comments, styles and numbering properties.
Implements replay engine for paragraph/non-paragraph/inline/comment/styles/numbering diffs.
Adds tracked-changes-aware replay behavior:
- applyTrackedChanges support in replay.
- Single-transaction replay path (avoids lost replay steps).
Improves diff/replay correctness:
- Preserves duplicate same-type marks (e.g. overlapping comment marks).
- Applies inline run-attribute diffs for modified text ranges.
- Matches inline node types by name across different editor/schema instances.
- Fixes insertion anchor computation for depth transitions in tree diffs.
- Preserves multi-block comment body edits in comment diffing.
Improves multi-document comment safety in superdoc:
- Scopes replay update/delete handling to active document context.
- Uses imported-id-aware identity matching where needed.
- Deletes full reply subtree for thread removals.
- Avoids replay-driven active-thread flicker by syncing active state only when
  explicitly requested.
Improves tracked-change comment resync/pruning:
- Rebuilds after replay completion.
- Prunes stale tracked-change threads only for active document.
- Uses both commentId and importedId to avoid false prune/duplication.
Improves DOCX comment fidelity on replay/export:
- Carries document identity (documentId/fileId) in replay comment payloads.
- Preserves structured comment bodies (elements ↔ docxCommentJSON) on add/update.
- Ensures updated docxCommentJSON is reflected by getValues() for export.
- Applies isDone fallback to resolved fields when replay payload omits explicit
  resolved metadata.
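The "deletes full reply subtree for thread removals" item above can be illustrated with a small sketch. The `CommentNode` shape, the `parentCommentId` field, and both function names are assumptions made for illustration; they are not taken from the store's actual model.

```typescript
// Hypothetical comment shape; field names are assumptions for illustration.
interface CommentNode {
  commentId: string;
  parentCommentId?: string;
}

// Collect the id of a thread root plus every direct and nested reply.
function collectThreadSubtree(comments: CommentNode[], rootId: string): Set<string> {
  const toDelete = new Set<string>([rootId]);
  let grew = true;
  // Repeat until a full pass finds nothing new, so nested replies are
  // included regardless of the order in which comments are stored.
  while (grew) {
    grew = false;
    for (const c of comments) {
      const parent = c.parentCommentId;
      if (parent !== undefined && toDelete.has(parent) && !toDelete.has(c.commentId)) {
        toDelete.add(c.commentId);
        grew = true;
      }
    }
  }
  return toDelete;
}

// Remove a thread and all of its replies, leaving unrelated comments intact.
function removeThread(comments: CommentNode[], rootId: string): CommentNode[] {
  const doomed = collectThreadSubtree(comments, rootId);
  return comments.filter((c) => !doomed.has(c.commentId));
}
```

Removing only the root would leave replies pointing at a parent that no longer exists, which is presumably the situation this item avoids.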

Tests

Adds/updates extensive test coverage for:
- diff algorithms (attributes, inline, paragraph, generic, comment, sequence,
  computeDiff).
- replay modules (replay-inline, replay-paragraph, replay-non-paragraph,
  replay-comments, replay-attrs, marks-from-diff).
- integration fixture tests in replayDiffs.test.js.
- superdoc comment/store behavior (SuperDoc.test.js, comments-store.test.js,
  use-comment.test.js).
Adds fixture corpus for replay scenarios (diff_before*.docx / diff_after*.docx,
including additional cases up to 11).
Validated locally: pnpm --filter super-editor exec vitest run
src/extensions/diffing/replayDiffs.test.js (passing).

Notes

One known replay limitation remains intentionally accepted for now: preserving
non-mark run attrs on some inline text additions. This can be addressed once we
stop using marks for formatting and use run properties directly.


…rtedId
- seed tracked-change existingIds with both runtime commentId and stable importedId
- prevent duplicate tracked-change thread creation when grouped mark id matches an existing imported id
- add created sync IDs (id, params.changeId, params.importedId) back into dedupe set during sync pass
- add regression test for mixed-ID replay/sync scenario where commentId diverges but importedId remains live

- make useComment store docxCommentJSON as reactive state instead of a construction-time constant
- update getValues() to return the current docxCommentJSON value
- ensure replay-updated imported comment structure is reflected in translateCommentsForExport output
- add unit test verifying getValues() returns updated docxCommentJSON after mutation
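The difference between a construction-time constant and current state can be shown in a few lines. This is a dependency-free sketch: a plain closure variable stands in for the reactive state the commit describes, and `createCommentModel` and `setDocxCommentJSON` are hypothetical names, not the `useComment` API.

```typescript
type DocxCommentJSON = unknown[];

interface CommentModel {
  setDocxCommentJSON(next: DocxCommentJSON): void;
  getValues(): { commentId: string; docxCommentJSON: DocxCommentJSON | null };
}

// Hypothetical factory; a closure variable replaces framework reactivity here.
function createCommentModel(params: { commentId: string; docxCommentJSON?: DocxCommentJSON }): CommentModel {
  // Mutable state: later updates (for example from replay) overwrite it.
  let docxCommentJSON: DocxCommentJSON | null = params.docxCommentJSON ?? null;

  return {
    setDocxCommentJSON(next) {
      docxCommentJSON = next;
    },
    getValues() {
      // Read the current value on every call instead of returning the
      // value captured from `params` at construction time.
      return { commentId: params.commentId, docxCommentJSON };
    },
  };
}
```

Had `getValues()` returned `params.docxCommentJSON` directly, an export performed after a replay update would still see the original body.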

- add replay payload normalization for comment model creation (text -> commentText, elements -> docxCommentJSON)
- apply normalization in replay ADD path before useComment(...)
- reuse the same normalization in replay UPDATE fallback when creating missing comments
- ensure replay-added imported comments keep DOCX-native body structure for export/round-trip
- add regression test verifying replay ADD maps elements into docxCommentJSON
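A minimal sketch of the field mapping named above (text -> commentText, elements -> docxCommentJSON). The interfaces and the function name are hypothetical, and letting an already-present `commentText` or `docxCommentJSON` take precedence is an assumption, not something the commit message states.

```typescript
// Hypothetical payload and model shapes, reduced to the fields being mapped.
interface ReplayCommentPayload {
  commentId: string;
  text?: string;
  elements?: unknown[];
  commentText?: string;
  docxCommentJSON?: unknown[];
}

interface CommentModelParams {
  commentId: string;
  commentText?: string;
  docxCommentJSON?: unknown[];
}

function normalizeReplayCommentPayload(payload: ReplayCommentPayload): CommentModelParams {
  const params: CommentModelParams = { commentId: payload.commentId };

  // Assumption: model-native fields win when both spellings are present.
  const commentText = payload.commentText ?? payload.text;
  if (commentText !== undefined) params.commentText = commentText;

  const docxCommentJSON = payload.docxCommentJSON ?? payload.elements;
  if (docxCommentJSON !== undefined) params.docxCommentJSON = docxCommentJSON;

  return params;
}
```

Running the same function in both the ADD path and the UPDATE fallback keeps the two paths from drifting apart, which matches the reuse described in the list above.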

- map replay isDone updates to resolvedTime/resolvedBy* when payload resolved fields are null/missing
- apply the same fallback during replay payload normalization and model updates
- refactor shared isDone resolution fallback logic to avoid duplicated code
- add regression test covering replay update payloads with isDone: true and null resolved fields, ensuring resolved state is persisted and can be exported
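The fallback could look roughly like the sketch below. The exact resolved field names (`resolvedByEmail`, `resolvedByName`), the `Actor` type, and the choice of fallback values (a supplied actor and timestamp) are assumptions for illustration; the commit message only says that isDone maps to resolvedTime/resolvedBy* when those fields are null or missing.

```typescript
// Hypothetical field names; the source only names resolvedTime/resolvedBy*.
interface ResolvedFields {
  isDone?: boolean;
  resolvedTime?: number | null;
  resolvedByEmail?: string | null;
  resolvedByName?: string | null;
}

interface Actor {
  name: string;
  email: string;
}

// Fill in resolved metadata only when the payload says "done" but omits it.
function applyIsDoneFallback(payload: ResolvedFields, actor: Actor, now: number): ResolvedFields {
  if (payload.isDone !== true) return payload;
  return {
    ...payload,
    // `??` keeps explicit values and replaces only null/undefined.
    resolvedTime: payload.resolvedTime ?? now,
    resolvedByEmail: payload.resolvedByEmail ?? actor.email,
    resolvedByName: payload.resolvedByName ?? actor.name,
  };
}
```

Keeping this in one shared function is the point of the refactor item above: normalization and model updates then resolve `isDone` identically.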

…ent thread
- return the matched comment’s concrete id (prefer commentId) from replay update matching
- avoid cross-document active-thread misselection when importedId overlaps across open documents
- update replay regression coverage to assert setActiveComment receives the active document’s thread id

… reselection
- only sync active comment when activeCommentId is explicitly present in the event payload
- avoid inferring active selection from replay add/update events to prevent repeated focus/unfocus churn
- preserve explicit active clear behavior on replay deletions
- update replay update test expectation to reflect non-selecting replay events
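The "explicitly present" rule above hinges on telling an absent key apart from a key set to null. A minimal sketch, assuming a hypothetical `ReplayEvent` shape and a `setActive` callback (neither is the real event contract):

```typescript
// Hypothetical event shape; only the activeCommentId key matters here.
interface ReplayEvent {
  type: "add" | "update" | "deleted";
  activeCommentId?: string | null;
}

// Returns true when the active comment was synced, false when left untouched.
function syncActiveComment(event: ReplayEvent, setActive: (id: string | null) => void): boolean {
  // `in` is true for a present key even when its value is null, so an
  // explicit clear still goes through while silent events are ignored.
  if (!("activeCommentId" in event)) return false;
  setActive(event.activeCommentId ?? null);
  return true;
}
```

With this check, replay add/update events that carry no `activeCommentId` leave the current selection alone, while a deletion that sends `activeCommentId: null` still clears it.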

linear · 2026-03-05T14:46:57Z

SD-1324 Diffing method to convert the differences into tracked changes

SD-89 #1 Feature: Document diffing

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b926bfd1d


packages/superdoc/src/SuperDoc.vue


caio-pizzol

@luccas-harbour — solid approach overall, and the test coverage for the new diffing extension is impressive (co-located tests for every algorithm and replay module, plus end-to-end coverage).

Correctness: two things to look at — heading text changes can be missed by the diff, and tracked-change comment IDs can collide across documents. one question about the run properties skip when marks change.

DX: small cleanup opportunity with duplicated helpers.

Tests: coverage is strong. only minor gaps around the belongsToDocument legacy fallback in the comments store.

left a few inline comments with details.

caio-pizzol · 2026-03-05T18:40:36Z

packages/super-editor/src/extensions/diffing/algorithm/generic-diffing.ts

+  }
+  return JSON.stringify(oldNodeInfo.node.attrs) !== JSON.stringify(newNodeInfo.node.attrs);
+}
+


if a heading's text changes but nothing else does, the diff misses it entirely. probably rare since most headings come in as styled paragraphs, but it can happen. worth a TODO?

luccas-harbour · Mar 6, 2026: We don't actually use heading nodes anymore; we only use styled paragraphs, so this doesn't really happen.

caio-pizzol · 2026-03-05T18:40:36Z

packages/superdoc/src/stores/comments-store.js

+    // Build a Set of existing tracked-change IDs for O(1) lookup.
+    // Include both runtime and imported IDs to avoid duplicate threads when
+    // replay/import flows remap commentId but marks still reference importedId.
+    const existingIds = new Set();


this collects IDs from all documents, but Word numbers tracked changes starting from 1 in each file. two files can share the same ID, and the second file's comments get skipped. filtering by document here would fix it.

luccas-harbour · Mar 6, 2026: sounds good, I implemented this
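A sketch of the document-scoped variant discussed in this thread. The `TrackedChangeThread` shape and the `documentId` field are assumptions for illustration; only `existingIds`, `commentId`, and `importedId` come from the diff and commit messages.

```typescript
// Hypothetical thread shape; documentId is an assumed field name.
interface TrackedChangeThread {
  commentId: string;
  importedId?: string;
  documentId: string;
}

// Build the dedupe set from the active document only, so ids that Word
// restarts at 1 in every file cannot collide across open documents.
function buildExistingIds(threads: TrackedChangeThread[], activeDocumentId: string): Set<string> {
  const existingIds = new Set<string>();
  for (const thread of threads) {
    if (thread.documentId !== activeDocumentId) continue;
    // Include both runtime and imported ids, as in the original snippet.
    existingIds.add(thread.commentId);
    if (thread.importedId !== undefined) existingIds.add(thread.importedId);
  }
  return existingIds;
}
```

Without the `documentId` filter, a tracked change with imported id "1" in one file would mask the tracked change numbered "1" in a second file.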

caio-pizzol · 2026-03-05T18:40:36Z

packages/super-editor/src/extensions/diffing/replay/replay-inline.ts

+
+        // runProperties can overlap with mark-derived formatting. Apply these paths
+        // only when marks are unchanged to avoid double-applying style deltas.
+        if (!diff.marksDiff) {


when marks change, all run properties are skipped to avoid applying things twice. but properties like styleId aren't marks — they get lost too. is that an accepted tradeoff for now?

luccas-harbour · Mar 6, 2026: yes, that's an accepted tradeoff for now. we'll need to move away from using marks for formatting (and use runProperties directly) in order to implement this.

caio-pizzol · 2026-03-05T18:40:36Z

packages/super-editor/src/extensions/diffing/replay/marks-from-diff.ts

+ * @param right Second attrs object.
+ * @returns True when both attrs payloads serialize identically.
+ */
+const deepEquals = (left: unknown, right: unknown): boolean => {


there are three deepEquals implementations across the diffing code (here, replay-inline.ts:361, and attributes-diffing.ts:306) — two use JSON.stringify, one does a proper comparison. lodash.isEqual is already installed. worth picking one and reusing it?

luccas-harbour · Mar 6, 2026: good call. I unified those
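For context on why the two styles of `deepEquals` differ: a `JSON.stringify` comparison depends on key insertion order, while a structural comparison does not. The hand-written function below is a dependency-free sketch for plain JSON-like values only; it is not the helper the PR settled on (the review suggests `lodash.isEqual`, which covers many more cases).

```typescript
// Structural equality for plain JSON-like values (objects, arrays, primitives).
function deepEquals(left: unknown, right: unknown): boolean {
  if (left === right) return true;
  if (typeof left !== "object" || typeof right !== "object" || left === null || right === null) {
    return false;
  }
  if (Array.isArray(left) !== Array.isArray(right)) return false;
  if (Array.isArray(left) && Array.isArray(right)) {
    return left.length === right.length && left.every((item, i) => deepEquals(item, right[i]));
  }
  const l = left as Record<string, unknown>;
  const r = right as Record<string, unknown>;
  const leftKeys = Object.keys(l);
  if (leftKeys.length !== Object.keys(r).length) return false;
  // Compare by key, so insertion order does not matter.
  return leftKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(r, key) && deepEquals(l[key], r[key]),
  );
}

// Same attrs, different key order: the serialized strings differ,
// but the structural comparison treats them as equal.
const attrsA = { bold: true, size: 12 };
const attrsB = { size: 12, bold: true };
const sameByStringify = JSON.stringify(attrsA) === JSON.stringify(attrsB); // false
const sameByStructure = deepEquals(attrsA, attrsB); // true
```

Attrs built by different code paths can easily differ in key order, so a stringify-based check may report a change where none exists.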


Luccas Correa added 30 commits March 5, 2026 11:23

feat: add function for mapping doc paragraphs by id

0838c1a

feat: add function for flattening paragraph text but keep track of positions

17dda09

feat: add function for calculating text diffs using LCS

ca37833

feat: add function for calculating paragraph-level diffing

ed342f0

feat: add diffing extension

fcb59e1

test: add diffing tests

758c7a8

refactor: switch LCS algorithm to Myers for performance

35b5caa

refactor: code structure

1a05b50

feat: compute text similarity using Levenshtein distance

380ba42

feat: identify contiguous text changes as single operation

acffed0

feat: implement logic for diffing paragraph attributes

cc9f41e

refactor: extract generic sequence diffing helper

fcc497f

refactor: modify paragraph diffing to reuse generic helper

c5751f1

refactor: extract operation reordering function

f06e10b

This function can then be reused when diffing paragraphs and runs. It helps identify modifications instead of delete/insert pairs.

fix: standardize positions for text diffing

da13f09

Always maps starting/ending positions to the old document instead of the new one.

refactor: change text diffing logic to account for formatting

71d9d36

refactor: change from "type" to "action"

2f8fcd3

feat: support diffing non-textual inline nodes

f6231d6

feat: include previous element when building add diff

7bd4594

docs: add JSDoc to inline diffing

0cb5ec1

fix: handle arrays in attributes diff

05b8099

feat: implement generic diffing for all nodes

d1e0195

refactor: move paragraph utility functions

f8f71b6

feat: use generic diffing for diff computation command

cad3f8c

refactor: simplify grouping logic during inline diffing

91f23dc

refactor: convert modules to typescript and improve documentation

9a23375

fix: diffing of inline node attributes

e7ad05a

fix: diff positions for modified non-paragraph nodes

8ff1328

feat: improve diff comparison for table rows

c663198

fix: emit single diff when container node is deleted

b17870b
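Several of the commits above concern text diffing: an LCS-based diff, grouping contiguous changes into a single operation, and reporting positions relative to the old document. The sketch below illustrates those three ideas together on plain strings. It uses a simple quadratic LCS table for readability rather than the Myers algorithm the branch later switched to, and the `TextOp` shape and `diffText` name are assumptions (only the `action` field name is borrowed from the commit titles).

```typescript
// Hypothetical operation shape; `pos` is always an offset into the OLD text.
interface TextOp {
  action: "add" | "delete";
  pos: number;
  text: string;
}

function diffText(oldText: string, newText: string): TextOp[] {
  const n = oldText.length;
  const m = newText.length;
  const width = m + 1;
  // table[i * width + j] = LCS length of oldText[i..] and newText[j..].
  const table = new Array<number>((n + 1) * width).fill(0);
  const at = (i: number, j: number): number => table[i * width + j] ?? 0;

  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      table[i * width + j] =
        oldText.charAt(i) === newText.charAt(j)
          ? at(i + 1, j + 1) + 1
          : Math.max(at(i + 1, j), at(i, j + 1));
    }
  }

  const ops: TextOp[] = [];
  const record = (action: TextOp["action"], pos: number, ch: string): void => {
    const last = ops[ops.length - 1];
    // Contiguous adds share one insertion point; contiguous deletes
    // continue where the previous delete ended.
    const contiguous =
      last !== undefined &&
      last.action === action &&
      (action === "add" ? last.pos === pos : last.pos + last.text.length === pos);
    if (last !== undefined && contiguous) {
      last.text += ch;
    } else {
      ops.push({ action, pos, text: ch });
    }
  };

  let i = 0;
  let j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && oldText.charAt(i) === newText.charAt(j)) {
      i++;
      j++;
    } else if (i < n && (j === m || at(i + 1, j) >= at(i, j + 1))) {
      record("delete", i, oldText.charAt(i));
      i++;
    } else {
      // Inserted text is anchored at the current offset in the old text.
      record("add", i, newText.charAt(j));
      j++;
    }
  }
  return ops;
}
```

For example, `diffText("abcdef", "abef")` yields a single delete of `"cd"` at position 2 rather than two one-character deletes. The sketch compares UTF-16 code units and ignores formatting, which the real inline diffing accounts for.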

luccas-harbour added 6 commits March 5, 2026 11:25

superdoc-bot bot added the risk: sensitive label Mar 5, 2026

luccas-harbour added 2 commits March 5, 2026 12:05

test: add missing test documents

860354d

test: adjust diff replay test

9b926bf

luccas-harbour marked this pull request as ready for review March 5, 2026 16:39

luccas-harbour changed the title ~~feat: diffing method to convert the differences into tracked (SD-1324)~~ feat: diffing extension for comparing documents (SD-1324) Mar 5, 2026

chatgpt-codex-connector bot reviewed Mar 5, 2026

packages/superdoc/src/SuperDoc.vue (resolved)

packages/superdoc/src/SuperDoc.vue (resolved)

luccas-harbour added 2 commits March 5, 2026 13:52

fix(superdoc): update replay comment parent linkage fields on thread remap

de6fa04

fix(superdoc): scope replay add deduplication to active document context

868415b

luccas-harbour self-assigned this Mar 5, 2026

luccas-harbour changed the title ~~feat: diffing extension for comparing documents (SD-1324)~~ feat: diffing extension for comparing documents (SD-1324 and SD-89) Mar 5, 2026

luccas-harbour requested review from VladaHarbour, caio-pizzol, harbournick and tupizz March 5, 2026 18:02

caio-pizzol reviewed Mar 5, 2026


luccas-harbour added 7 commits March 5, 2026 17:15

fix(track-changes): preserve property attrs for ReplaceAroundStep updates

fa350e1

feat: implement diffing for styles

b125bae

feat: implement replay for style differences

300bc13

feat: implement diffing for numbering

7f0bf69

feat: implement replay for numbering differences

62d6d43

fix(comments): scope tracked-change dedupe to active document

e1bd511

refactor(diffing): share deepEquals helper across replay modules

7557b8b

luccas-harbour requested a review from caio-pizzol March 6, 2026 19:38

luccas-harbour wants to merge 122 commits into main from luccas/sd-1324-diffing-method-to-convert-the-differences-into-tracked


2 participants
