Handle morph-type tokens: filtering / persistence#2202
Handle morph-type tokens: filtering / persistence#2202myieye wants to merge 13 commits intofeat/sync-morph-typesfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe PR implements morph-type-aware headword handling with secondary-order sorting defaults. Changes include updating headword computation to apply morph prefix/postfix tokens conditionally, adding secondary-order tie-breaking (defaulting to Stem) across sorting and search logic, modifying search predicates to use SearchHeadwords with tokens, updating the FTS table to include headwords with tokens for all writing systems, and adding comprehensive tests for morph-token and secondary-order behavior. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
13a4e0e to
7723717
Compare
|
The latest updates on your projects. Learn more about Argos notifications ↗︎
|
a5ea68d to
4e2d25d
Compare
4e2d25d to
a193991
Compare
rmunn
left a comment
There was a problem hiding this comment.
Search for "Missed a rename here" to find places that need updating post-rename.
Also, be aware that I ran out of time to finish the review properly, so not everything has been carefully checked.
backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
Outdated
Show resolved
Hide resolved
backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
Outdated
Show resolved
Hide resolved
backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
Outdated
Show resolved
Hide resolved
frontend/viewer/src/lib/dotnet-types/generated-types/MiniLcm/Models/IMorphTypeData.ts
Outdated
Show resolved
Hide resolved
8ac397d to
dc45a12
Compare
dc45a12 to
823531a
Compare
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (2)
backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt (1)
19518-19662: Consider adding targeted semantic assertions alongside this snapshot.These large fixture updates are valid but high-churn. A few focused assertions (e.g., token-aware headword/filter/sort outcomes for representative entries) would make regressions easier to detect than ID-heavy snapshot diffs alone.
Also applies to: 27606-27722, 36860-36917, 56276-56350, 62143-62207
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt` around lines 19518 - 19662, The snapshot update is too ID-heavy and lacks focused assertions—add a few targeted semantic assertions in the test suite that uses sena-3-live_snapshot.verified.txt: for example, assert token-aware headword/filter/sort behavior for the LexemeForm "dza" entries (check senses with Gloss "CUL" and "PURP"), verify that the sense definitions (e.g., "culmination" and "in order to, distal/purpose infinitive") appear in search/filter results, and add one or two deterministic checks (headword normalization, sort order, and a sample published-in "Main Dictionary" membership) so regressions are caught without relying only on huge ID-filled snapshots.backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs (1)
8-8: Add a tokenized headword case to this parity test.With
_morphTypesalways empty and both fixtures matching only through rawLexemeForm, this still exercises the pre-token path. Adding a prefix/suffix morph type plus an entry with noCitationFormwould make this test catch the-word/word-behavior this PR is introducing, and it will also flush out any drift betweenSearchFilter(...)andCompiledFilter(...).Also applies to: 17-17, 40-40, 56-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs` at line 8, The parity test currently only exercises the pre-token path because _morphTypes is empty; add a tokenized headword case by populating _morphTypes with a MorphType that has a prefix or suffix (e.g., Prefix or Suffix set) and add a test entry/lexeme with no CitationForm but a LexemeForm that requires tokenization (e.g., "-word" or "word-"); then assert that SearchFilter(...) and CompiledFilter(...) produce identical results for that scenario to catch the new -word/word- behavior and any drift between the two filter paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/FwLite/LcmCrdt/Data/EntryQueryHelpers.cs`:
- Around line 24-29: The code currently checks citation presence using the raw
e.CitationForm[ws], causing whitespace-only citation forms to block falling back
to lexeme; change the logic to trim/normalize the citation string first (e.g.,
var citation = (e.CitationForm[ws] ?? "").Trim(); apply any additional
normalization used by LcmHelpers.LexEntryHeadword if needed), then treat an
empty trimmed citation as absent and fall back to lexeme and token application
(keep using leading/trailing when building the fallback headword). Update the
same pattern in the other occurrences mentioned so all places use the
trimmed/normalized citation before deciding override behavior, ensuring
consistency with FwDataMiniLcmBridge.Api.LcmHelpers.LexEntryHeadword().
In `@backend/FwLite/LcmCrdt/Data/Filtering.cs`:
- Around line 18-26: In SearchFilter, the left-join can produce mt == null via
DefaultIfEmpty(), so the where clause currently reads mt.Prefix/mt.Postfix and
will throw for LINQ-to-Objects; update the predicate in SearchFilter to guard mt
before accessing its tokens (e.g., use null-safe values for mt.Prefix/mt.Postfix
or check mt != null before calling entry.SearchHeadwords) so the expression
works when mt is null while preserving the other conditions
(entry.LexemeForm.SearchValue and Senses.Any(...)).
In `@backend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.cs`:
- Around line 308-317: The code in ToEntrySearchRecord narrows headwords to only
keys present in writingSystems and drops any headword entries for WS not yet in
writingSystems; instead, build the headword string from the computed headwords
dictionary itself while using writingSystems only to impose a stable ordering
where definitions exist: iterate writingSystems and append headwords[ws.WsId]
when present, then append any remaining headwords keys (from the headwords
dictionary) that weren't in writingSystems in a deterministic order (e.g.,
sorted by WsId) so no computed headword is lost from FTS.
- Around line 214-217: The MorphType write paths (CreateMorphType,
UpdateMorphType, DeleteMorphType in CrdtMiniLcmApi.cs) do not trigger
reindexing, so update one or more of those methods to call the existing indexing
hooks: call UpdateEntrySearchTable() for affected entries (or
RegenerateEntrySearchTable() if simpler) after a MorphType change, ensuring you
pass the proper EntrySearchRecordsTable/context; alternatively extend
UpdateEntrySearchTableInterceptor to also detect MorphType changes and propagate
reindexing. Locate the MorphType handlers
(CreateMorphType/UpdateMorphType/DeleteMorphType) and add a call to
UpdateEntrySearchTable() or RegenerateEntrySearchTable() (or invoke the same
code path used by Entry/Sense updates) so headword tokenization is refreshed
when morph prefixes/postfixes change.
---
Nitpick comments:
In `@backend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txt`:
- Around line 19518-19662: The snapshot update is too ID-heavy and lacks focused
assertions—add a few targeted semantic assertions in the test suite that uses
sena-3-live_snapshot.verified.txt: for example, assert token-aware
headword/filter/sort behavior for the LexemeForm "dza" entries (check senses
with Gloss "CUL" and "PURP"), verify that the sense definitions (e.g.,
"culmination" and "in order to, distal/purpose infinitive") appear in
search/filter results, and add one or two deterministic checks (headword
normalization, sort order, and a sample published-in "Main Dictionary"
membership) so regressions are caught without relying only on huge ID-filled
snapshots.
In `@backend/FwLite/LcmCrdt.Tests/Data/FilteringTests.cs`:
- Line 8: The parity test currently only exercises the pre-token path because
_morphTypes is empty; add a tokenized headword case by populating _morphTypes
with a MorphType that has a prefix or suffix (e.g., Prefix or Suffix set) and
add a test entry/lexeme with no CitationForm but a LexemeForm that requires
tokenization (e.g., "-word" or "word-"); then assert that SearchFilter(...) and
CompiledFilter(...) produce identical results for that scenario to catch the new
-word/word- behavior and any drift between the two filter paths.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 78c21460-3da7-4a9e-85ac-947ed35937a0
📒 Files selected for processing (25)
backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.csbackend/FwLite/FwDataMiniLcmBridge/Api/FwDataMiniLcmApi.csbackend/FwLite/FwDataMiniLcmBridge/Api/LcmHelpers.csbackend/FwLite/FwDataMiniLcmBridge/Api/Sorting.csbackend/FwLite/FwDataMiniLcmBridge/Api/UpdateProxy/UpdateMorphTypeProxy.csbackend/FwLite/FwLiteProjectSync.Tests/sena-3-live_snapshot.verified.txtbackend/FwLite/LcmCrdt.Tests/Data/FilteringTests.csbackend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.SearchTableIsUpdatedAutomaticallyOnInsert.verified.txtbackend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.SearchTableIsUpdatedAutomaticallyOnUpdate.verified.txtbackend/FwLite/LcmCrdt.Tests/FullTextSearch/EntrySearchServiceTests.csbackend/FwLite/LcmCrdt.Tests/MiniLcmTests/SortingTests.csbackend/FwLite/LcmCrdt/Data/EntryQueryHelpers.csbackend/FwLite/LcmCrdt/Data/Filtering.csbackend/FwLite/LcmCrdt/Data/MiniLcmRepository.csbackend/FwLite/LcmCrdt/Data/Sorting.csbackend/FwLite/LcmCrdt/FullTextSearch/EntrySearchService.csbackend/FwLite/LcmCrdt/Json.csbackend/FwLite/LcmCrdt/QueryHelpers.csbackend/FwLite/LcmCrdt/SqlHelpers.csbackend/FwLite/MiniLcm.Tests/QueryEntryTestsBase.csbackend/FwLite/MiniLcm.Tests/SortingTestsBase.csbackend/FwLite/MiniLcm/Models/Entry.csbackend/FwLite/MiniLcm/Models/MorphType.csfrontend/viewer/src/project/demo/demo-entry-data.tsfrontend/viewer/src/stories/editor/entity-primitives/entry-editor-primitive.stories.svelte
rmunn
left a comment
There was a problem hiding this comment.
Looking good so far. Not yet ready to approve, but it's looking pretty close. Also not yet ready to hit the "request changes" button since a few of my changes I'm uncertain about. I'll discuss this with you at our meeting, then I should have a better grasp on it.
backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
Outdated
Show resolved
Hide resolved
backend/FwLite/FwDataMiniLcmBridge.Tests/MiniLcmTests/SortingTests.cs
Outdated
Show resolved
Hide resolved
backend/FwLite/FwDataMiniLcmBridge/Api/UpdateProxy/UpdateMorphTypeProxy.cs
Show resolved
Hide resolved
rmunn
left a comment
There was a problem hiding this comment.
I'm ready to go ahead and approve this. There are a couple tweaks we may want to do before we merge, but after our discussion yesterday, I think this is in a good state to merge once we get the tests passing.
based on YAGNI principle
4246a62 to
2e82fa6
Compare
Resolves #2198