Skip to content

Handle morph-type tokens: filtering / persistence#2202

Open
myieye wants to merge 13 commits intofeat/sync-morph-typesfrom
claude/add-lexeme-headwords-TowRX
Open

Handle morph-type tokens: filtering / persistence#2202
myieye wants to merge 13 commits intofeat/sync-morph-typesfrom
claude/add-lexeme-headwords-TowRX

Conversation

@myieye
Copy link
Copy Markdown
Collaborator

@myieye myieye commented Mar 10, 2026

Resolves #2198

  • Update FTS headword column to include all WS and appropriate morph-tokens
  • Handle queries that include morph-tokens (FwData, FTS and CRDT non-FTS)
  • Respect MorphType.SecondaryOrder when sorting

@github-actions github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 10, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 10, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7e177e16-054c-45c8-9df8-4b397721b873

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR implements morph-type-aware headword handling with secondary-order sorting defaults. Changes include updating headword computation to apply morph prefix/postfix tokens conditionally, adding secondary-order tie-breaking (defaulting to Stem) across sorting and search logic, modifying search predicates to use SearchHeadwords with tokens, updating the FTS table to include headwords with tokens for all writing systems, and adding comprehensive tests for morph-token and secondary-order behavior.

Changes

Cohort / File(s) Summary
Headword Computation & Morph-Token Handling
FwDataMiniLcmBridge/Api/LcmHelpers.cs, LcmCrdt/Data/EntryQueryHelpers.cs
Updated LexEntryHeadword and LexEntryHeadwordOrUnknown to accept applyMorphTokens parameter (defaults true); added SearchHeadWord extension to check headword containment across vernacular writing systems. Introduced HeadwordWithTokens, SearchHeadwords, and ComputeHeadwords helpers for token-aware headword calculation with morph-type prefix/postfix wrapping.
Sorting with Secondary-Order Defaults
FwDataMiniLcmBridge/Api/Sorting.cs, LcmCrdt/Data/Sorting.cs
Added ApplyHeadwordOrder extension methods in both bridges; updated ApplyRoughBestMatchOrder signature to accept stemSecondaryOrder parameter. Extended tie-breaking logic to use PrimaryMorphType?.SecondaryOrder (falling back to stem), then HomographNumber, then Id. Reordered diacritic match relevance tiers (starts-with before contains) in rough matching.
Entry Search & Filtering Integration
FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs, LcmCrdt/Data/Filtering.cs, LcmCrdt/FullTextSearch/EntrySearchService.cs, LcmCrdt/Data/MiniLcmRepository.cs
Modified Filtering.SearchFilter to accept entries and morphTypes queryables, returning IQueryable<Entry> with SearchHeadwords predicate. Updated EntrySearchService.Filter and FilterAndRank to accept MorphType[] parameter. Changed search-table indexing to compute headwords via ComputeHeadwords concatenating all vernacular writing systems. Added StripMorphTokens helper for morph-aware token stripping.
Model & Support Infrastructure
MiniLcm/Models/MorphType.cs, FwDataMiniLcmBridge/Api/UpdateProxy/UpdateMorphTypeProxy.cs, LcmCrdt/Json.cs, LcmCrdt/SqlHelpers.cs, LcmCrdt/QueryHelpers.cs
Made MorphType.Kind a required property; updated UpdateMorphTypeProxy constructor to initialize Kind with [SetsRequiredMembers] attribute. Added QueryEntries overload in Json.cs for MultiString queryability. Added ConcatTokens SQL helper for server-side token concatenation. Minor formatting in QueryHelpers.Finalize signature.
Integration & Boundary Updates
FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs, LcmCrdt/Data/MiniLcmRepository.cs
Updated sorting paths to retrieve morphTypes and pass into ApplyHeadwordOrder and ApplyRoughBestMatchOrder. Modified EntrySearchPredicate to use entry.SearchHeadWord(query) instead of CitationForm.SearchValue(query). Refactored FromLexEntry to construct Entry in local variable before return.
Test Suite: Secondary-Order & Morph-Token Coverage
FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs, LcmCrdt.Tests/MiniLcmTests/SortingTests.cs, MiniLcm.Tests/SortingTestsBase.cs, MiniLcm.Tests/QueryEntryTestsBase.cs
Added SecondaryOrder_DefaultsToStem theory tests covering FTS/non-FTS queries across Headword and SearchRelevance sort fields. Added MorphTokens_DoNotAffectSortOrder and four SecondaryOrder_Relevance/Headword_* tests validating secondary-order tiebreaking with morph types. Updated SuccessfulMatches with identical default parameter; added tests for lexeme matching, citation form override, and prefix/suffix morph-token searching.
Test Data & Verification Fixtures
LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests*.verified.txt, LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.cs, LcmCrdt.Tests/Data/FilteringTests.cs, FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt
Updated search-record Headword fields to reflect multi-WS concatenated format (e.g., citation1citation1 fr_citation1). Modified RanksResultsByColumn test setup to use "en" writing system and updated expected ranking order. Updated snapshot data with new entry IDs and morphological field adjustments. Added _morphTypes queryable field to FilteringTests and updated SearchFilter calls to include morph types.
Frontend Demo Data
frontend/viewer/src/project/demo/demo-entry-data.ts, frontend/viewer/src/stories/editor/entity-primitives/entry-editor-primitive.stories.svelte
Added empty headword object properties to demo _entries records. Updated story initialization to include headword with 'seh' value set to "Citation form" alongside existing lexemeForm and citationForm.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • rmunn
  • hahn-kev
  • imnasnainaec

Poem

🐰 Hops through headwords, tokens in tow,
Morph-types dance as sorting takes the bow,
Secondary orders tie-break true,
Stems default when unknown breaks through,
Search and filter now understand the way—
A rabbit's delight in a well-ordered day! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 10.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Handle morph-type tokens: filtering / persistence' accurately describes the primary changes: implementing morph-type token handling in filtering and persistence systems across FTS and non-FTS code paths.
Description check ✅ Passed The description clearly relates to the changeset by referencing issue #2198 and outlining three key objectives: FTS headword updates, query token handling, and MorphType.SecondaryOrder respect—all reflected in the changes.
Linked Issues check ✅ Passed The changes comprehensively implement issue #2198 requirements: FTS headwords now include all WS with morph-tokens, filtering respects tokens in queries, sorting respects MorphType.SecondaryOrder, and tokens don't affect sort order (only secondary order does).
Out of Scope Changes check ✅ Passed All changes are scoped to the linked issue #2198: token-aware filtering/persistence, headword computation, sorting logic, and FTS updates. Frontend demo data and story updates appear incidental to API payload changes but remain within scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/add-lexeme-headwords-TowRX

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 13a4e0e to 7723717 Compare March 10, 2026 14:37
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 10, 2026

UI unit Tests

  1 files  ±  0   54 suites  +51   27s ⏱️ +27s
140 tests +130  140 ✅ +130  0 💤 ±0  0 ❌ ±0 
207 runs  +197  207 ✅ +197  0 💤 ±0  0 ❌ ±0 

Results for commit 7eddc67. ± Comparison against base commit 26054ac.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 10, 2026

C# Unit Tests

0 tests   - 165   0 ✅  - 165   0s ⏱️ -19s
0 suites  -  23   0 💤 ±  0 
0 files    -   1   0 ❌ ±  0 

Results for commit a193991. ± Comparison against base commit 469a825.

♻️ This comment has been updated with latest results.

@argos-ci
Copy link
Copy Markdown

argos-ci bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ✅ No changes detected - Mar 27, 2026, 4:28 PM

@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from a5ea68d to 4e2d25d Compare March 13, 2026 15:58
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 4e2d25d to a193991 Compare March 16, 2026 16:54
@github-actions github-actions bot added the 📦 Lexbox issues related to any server side code, fw-headless included label Mar 16, 2026
@myieye myieye changed the base branch from develop to feat/sync-morph-types March 16, 2026 16:55
Copy link
Copy Markdown
Contributor

@rmunn rmunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search for "Missed a rename here" to find places that need updating post-rename.

Also, be aware that I ran out of time to finish the review properly, so not everything has been carefully checked.

@myieye myieye closed this Mar 17, 2026
@myieye myieye reopened this Mar 17, 2026
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 8ac397d to dc45a12 Compare March 17, 2026 09:42
@myieye myieye changed the title Add pre-computed headwords with morph tokens to Entry model Handle morph-type tokens: filtering / persistence Mar 17, 2026
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from dc45a12 to 823531a Compare March 24, 2026 14:27
@myieye myieye marked this pull request as ready for review March 24, 2026 15:39
@myieye
Copy link
Copy Markdown
Collaborator Author

myieye commented Mar 24, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt (1)

19518-19662: Consider adding targeted semantic assertions alongside this snapshot.

These large fixture updates are valid but high-churn. A few focused assertions (e.g., token-aware headword/filter/sort outcomes for representative entries) would make regressions easier to detect than ID-heavy snapshot diffs alone.

Also applies to: 27606-27722, 36860-36917, 56276-56350, 62143-62207

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt`
around lines 19518 - 19662, The snapshot update is too ID-heavy and lacks
focused assertions—add a few targeted semantic assertions in the test suite that
uses sena-3-live_snapshot.verified.txt: for example, assert token-aware
headword/filter/sort behavior for the LexemeForm "dza" entries (check senses
with Gloss "CUL" and "PURP"), verify that the sense definitions (e.g.,
"culmination" and "in order to, distal/purpose infinitive") appear in
search/filter results, and add one or two deterministic checks (headword
normalization, sort order, and a sample published-in "Main Dictionary"
membership) so regressions are caught without relying only on huge ID-filled
snapshots.
backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs (1)

8-8: Add a tokenized headword case to this parity test.

With _morphTypes always empty and both fixtures matching only through raw LexemeForm, this still exercises the pre-token path. Adding a prefix/suffix morph type plus an entry with no CitationForm would make this test catch the -word / word- behavior this PR is introducing, and it will also flush out any drift between SearchFilter(...) and CompiledFilter(...).

Also applies to: 17-17, 40-40, 56-57

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs` at line 8, The parity
test currently only exercises the pre-token path because _morphTypes is empty;
add a tokenized headword case by populating _morphTypes with a MorphType that
has a prefix or suffix (e.g., Prefix or Suffix set) and add a test entry/lexeme
with no CitationForm but a LexemeForm that requires tokenization (e.g., "-word"
or "word-"); then assert that SearchFilter(...) and CompiledFilter(...) produce
identical results for that scenario to catch the new -word/word- behavior and
any drift between the two filter paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/FwLite/LcmCrdt/Data/EntryQueryHelpers.cs`:
- Around line 24-29: The code currently checks citation presence using the raw
e.CitationForm[ws], causing whitespace-only citation forms to block falling back
to lexeme; change the logic to trim/normalize the citation string first (e.g.,
var citation = (e.CitationForm[ws] ?? "").Trim(); apply any additional
normalization used by LcmHelpers.LexEntryHeadword if needed), then treat an
empty trimmed citation as absent and fall back to lexeme and token application
(keep using leading/trailing when building the fallback headword). Update the
same pattern in the other occurrences mentioned so all places use the
trimmed/normalized citation before deciding override behavior, ensuring
consistency with FwDataMiniLcmBridge.Api.LcmHelpers.LexEntryHeadword().

In `@backend/FwLite/LcmCrdt/Data/Filtering.cs`:
- Around line 18-26: In SearchFilter, the left-join can produce mt == null via
DefaultIfEmpty(), so the where clause currently reads mt.Prefix/mt.Postfix and
will throw for LINQ-to-Objects; update the predicate in SearchFilter to guard mt
before accessing its tokens (e.g., use null-safe values for mt.Prefix/mt.Postfix
or check mt != null before calling entry.SearchHeadwords) so the expression
works when mt is null while preserving the other conditions
(entry.LexemeForm.SearchValue and Senses.Any(...)).

In `@backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs`:
- Around line 308-317: The code in ToEntrySearchRecord narrows headwords to only
keys present in writingSystems and drops any headword entries for WS not yet in
writingSystems; instead, build the headword string from the computed headwords
dictionary itself while using writingSystems only to impose a stable ordering
where definitions exist: iterate writingSystems and append headwords[ws.WsId]
when present, then append any remaining headwords keys (from the headwords
dictionary) that weren't in writingSystems in a deterministic order (e.g.,
sorted by WsId) so no computed headword is lost from FTS.
- Around line 214-217: The MorphType write paths (CreateMorphType,
UpdateMorphType, DeleteMorphType in CrdtMiniLcmApi.cs) do not trigger
reindexing, so update one or more of those methods to call the existing indexing
hooks: call UpdateEntrySearchTable() for affected entries (or
RegenerateEntrySearchTable() if simpler) after a MorphType change, ensuring you
pass the proper EntrySearchRecordsTable/context; alternatively extend
UpdateEntrySearchTableInterceptor to also detect MorphType changes and propagate
reindexing. Locate the MorphType handlers
(CreateMorphType/UpdateMorphType/DeleteMorphType) and add a call to
UpdateEntrySearchTable() or RegenerateEntrySearchTable() (or invoke the same
code path used by Entry/Sense updates) so headword tokenization is refreshed
when morph prefixes/postfixes change.

---

Nitpick comments:
In `@backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt`:
- Around line 19518-19662: The snapshot update is too ID-heavy and lacks focused
assertions—add a few targeted semantic assertions in the test suite that uses
sena-3-live_snapshot.verified.txt: for example, assert token-aware
headword/filter/sort behavior for the LexemeForm "dza" entries (check senses
with Gloss "CUL" and "PURP"), verify that the sense definitions (e.g.,
"culmination" and "in order to, distal/purpose infinitive") appear in
search/filter results, and add one or two deterministic checks (headword
normalization, sort order, and a sample published-in "Main Dictionary"
membership) so regressions are caught without relying only on huge ID-filled
snapshots.

In `@backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs`:
- Line 8: The parity test currently only exercises the pre-token path because
_morphTypes is empty; add a tokenized headword case by populating _morphTypes
with a MorphType that has a prefix or suffix (e.g., Prefix or Suffix set) and
add a test entry/lexeme with no CitationForm but a LexemeForm that requires
tokenization (e.g., "-word" or "word-"); then assert that SearchFilter(...) and
CompiledFilter(...) produce identical results for that scenario to catch the new
-word/word- behavior and any drift between the two filter paths.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 78c21460-3da7-4a9e-85ac-947ed35937a0

📥 Commits

Reviewing files that changed from the base of the PR and between c62728c and 4246a62.

📒 Files selected for processing (25)
  • backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
  • backend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.cs
  • backend/FwLite/FwDataMiniLcmBridge/Api/LcmHelpers.cs
  • backend/FwLite/FwDataMiniLcmBridge/Api/Sorting.cs
  • backend/FwLite/FwDataMiniLcmBridge/Api/UpdateProxy/UpdateMorphTypeProxy.cs
  • backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt
  • backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs
  • backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.SearchTableIsUpdatedAutomaticallyOnInsert.verified.txt
  • backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.SearchTableIsUpdatedAutomaticallyOnUpdate.verified.txt
  • backend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.cs
  • backend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.cs
  • backend/FwLite/LcmCrdt/Data/EntryQueryHelpers.cs
  • backend/FwLite/LcmCrdt/Data/Filtering.cs
  • backend/FwLite/LcmCrdt/Data/MiniLcmRepository.cs
  • backend/FwLite/LcmCrdt/Data/Sorting.cs
  • backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs
  • backend/FwLite/LcmCrdt/Json.cs
  • backend/FwLite/LcmCrdt/QueryHelpers.cs
  • backend/FwLite/LcmCrdt/SqlHelpers.cs
  • backend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.cs
  • backend/FwLite/MiniLcm.Tests/SortingTestsBase.cs
  • backend/FwLite/MiniLcm/Models/Entry.cs
  • backend/FwLite/MiniLcm/Models/MorphType.cs
  • frontend/viewer/src/project/demo/demo-entry-data.ts
  • frontend/viewer/src/stories/editor/entity-primitives/entry-editor-primitive.stories.svelte

Copy link
Copy Markdown
Contributor

@rmunn rmunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. Not yet ready to approve, but it's looking pretty close. Also not yet ready to hit the "request changes" button since a few of my changes I'm uncertain about. I'll discuss this with you at our meeting, then I should have a better grasp on it.

Copy link
Copy Markdown
Contributor

@rmunn rmunn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ready to go ahead and approve this. There are a couple tweaks we may want to do before we merge, but after our discussion yesterday, I think this is in a good state to merge once we get the tests passing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related 📦 Lexbox issues related to any server side code, fw-headless included 📙 Platform.Bible

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants