
Implement homograph numbers and improve entry search/sort behavior #2220

Draft
myieye wants to merge 7 commits into claude/add-lexeme-headwords-TowRX from claude/add-homograph-numbers-V1a05


@myieye
Collaborator

@myieye myieye commented Mar 24, 2026

Summary

This PR auto-assigns homograph numbers to entries with duplicate headwords and reworks entry search and sorting so that morph tokens and MorphType.SecondaryOrder are handled correctly.

Key Changes

Homograph Number Management

  • Added HomographNumber property to Entry model to distinguish entries with identical headwords
  • Implemented auto-assignment logic in CrdtMiniLcmApi.AssignHomographNumber() that:
    • Groups entries by headword and MorphType.SecondaryOrder
    • Auto-assigns sequential numbers (1, 2, 3...) when creating new entries
    • Promotes existing lone entries from 0 to 1 when a duplicate is created
    • Respects explicitly set homograph numbers
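
The rules above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the code in CrdtMiniLcmApi.AssignHomographNumber(); the `Entry` dataclass, its field names, and the in-memory `existing` list are assumptions made for the example (the PR performs the lookup as a DB query).

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical stand-in for the real Entry model.
    headword: str
    secondary_order: int        # stands in for MorphType.SecondaryOrder
    homograph_number: int = 0   # 0 = unset


def assign_homograph_number(new_entry: Entry, existing: list) -> None:
    """Assign a number to new_entry; may promote one existing entry."""
    # Explicitly set numbers (e.g. from sync or import) are left untouched.
    if new_entry.homograph_number != 0:
        return

    # Homograph scope: same headword and same secondary order.
    scope = [
        e for e in existing
        if e.headword == new_entry.headword
        and e.secondary_order == new_entry.secondary_order
    ]
    if not scope:
        return  # a lone entry keeps 0

    # A lone existing entry at 0 is promoted to 1 once a duplicate appears.
    if len(scope) == 1 and scope[0].homograph_number == 0:
        scope[0].homograph_number = 1

    # Sequential numbering: next number is max + 1 within the scope.
    new_entry.homograph_number = max(e.homograph_number for e in scope) + 1
```

Under these assumptions, creating three "bank" entries in turn yields homograph numbers 1, 2 and 3, while an entry with the same headword but a different secondary order stays at 0.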

Search and Filtering Improvements

  • Morph Token Handling: Search now properly strips morph tokens (prefixes/suffixes) from lexeme forms before matching, while preserving them in headword display
  • Citation Form Override: Citation forms now correctly override lexeme forms with morph tokens in search results
  • Lexeme Form Matching: Fixed search to match against lexeme forms (not just headwords), enabling discovery of entries where only the lexeme matches the query
  • Secondary Order Integration: Sorting now incorporates MorphType.SecondaryOrder to group related morphological variants together
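
One possible reading of the morph token and citation form bullets, sketched in Python. The `Entry` fields, the `leading_token`/`trailing_token` names, and the `"-"` token are hypothetical; the real helpers (HeadwordWithTokens(), SearchHeadwords()) may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical model: the lexeme form as displayed, plus its morph tokens.
    lexeme_with_tokens: str
    leading_token: str = ""
    trailing_token: str = ""
    citation_form: str = ""


def strip_morph_tokens(form: str, leading: str = "", trailing: str = "") -> str:
    """Remove a leading and/or trailing morph token from a lexeme form."""
    if leading and form.startswith(leading):
        form = form[len(leading):]
    if trailing and form.endswith(trailing):
        form = form[:-len(trailing)]
    return form


def headword(entry: Entry) -> str:
    # Display keeps the tokens; a citation form overrides the lexeme form.
    return entry.citation_form or entry.lexeme_with_tokens


def search_forms(entry: Entry) -> list:
    # Only the lexeme form is stripped; the citation form is searched as-is.
    forms = [strip_morph_tokens(entry.lexeme_with_tokens,
                                entry.leading_token, entry.trailing_token)]
    if entry.citation_form:
        forms.insert(0, entry.citation_form)
    return forms


def matches(entry: Entry, query: str) -> bool:
    q = query.casefold()
    return any(q in form.casefold() for form in search_forms(entry))
```

With a suffix entry displayed as "-ing", the searchable lexeme form becomes "ing", so a query for "ing" is a prefix match rather than a contains match, and an entry whose citation form differs from its lexeme form is still found by its lexeme.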

Sorting Enhancements

  • Added ApplyHeadwordOrder() method to apply consistent headword-based sorting with secondary order consideration
  • Updated ApplyRoughBestMatchOrder() to include secondary order and homograph number in sort criteria
  • Search relevance ranking now properly considers:
    • Headword matches vs. other field matches
    • Prefix matches vs. contains matches
    • Match length (shorter is better)
    • Secondary order (for grouping morphological variants)
    • Homograph number (for stable ordering of duplicates)
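
The ranking criteria can be pictured as a tuple sort key. The Python sketch below assumes the criteria apply in the order listed above; the `Candidate` type and the value `0` for Stem's secondary order are hypothetical, and the actual ApplyRoughBestMatchOrder() query may weigh things differently.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value; used when an entry's MorphType is not found.
STEM_SECONDARY_ORDER = 0


@dataclass
class Candidate:
    headword: str
    other_text: str = ""                   # e.g. glosses; matched elsewhere
    secondary_order: Optional[int] = None  # None = MorphType not found
    homograph_number: int = 0


def rank_key(c: Candidate, query: str) -> tuple:
    q = query.casefold()
    hw = c.headword.casefold()
    order = (STEM_SECONDARY_ORDER if c.secondary_order is None
             else c.secondary_order)
    return (
        0 if q in hw else 1,           # headword match before other fields
        0 if hw.startswith(q) else 1,  # prefix match before contains match
        len(hw),                       # shorter match is better
        order,                         # group morphological variants
        c.homograph_number,            # stable ordering of duplicates
    )


def rank(candidates: list, query: str) -> list:
    return sorted(candidates, key=lambda c: rank_key(c, query))
```

Because Python compares tuples element by element, each later criterion only breaks ties left by the earlier ones, which is the behavior the list describes.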

Data Model Changes

  • Added Headword computed property to Entry (pre-computed by backend, populated during finalization)
  • Added helper methods: HeadwordWithTokens(), SearchHeadwords(), ComputeHeadwords() in EntryQueryHelpers
  • Updated EntrySearchService.FilterAndRank() to use new ranking logic with secondary order support
  • Modified Filtering.cs to accept IQueryable<MorphType> for proper secondary order filtering

Test Coverage

  • Added comprehensive tests for homograph number auto-assignment
  • Added tests verifying morph tokens don't affect sort order
  • Added tests for search relevance with secondary order (both FTS and non-FTS)
  • Added tests for citation form override behavior
  • Added tests for morph token search functionality

API Updates

  • Updated EntrySearchService.Filter() to accept WritingSystemId parameter
  • Renamed Headword() method to HeadwordText() in various test helpers to distinguish from the new Headword property
  • Updated FwData bridge to compute headwords during entry conversion

Notable Implementation Details

  • Homograph number assignment uses a single DB query pattern consistent with existing sorting logic
  • Morph token stripping is applied only to lexeme form searches, not citation form searches
  • Secondary order defaults to Stem's order when an entry's MorphType is not found
  • The Headword property is computed by the backend and excluded from strict equality checks in sync tests

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2

myieye and others added 7 commits March 17, 2026 10:42
Add HomographNumber (int, 0 = unset) to the Entry model with full
round-trip support through CRDT, FwData bridge, and sync.

Key changes:
- Entry model: add HomographNumber property with Copy() support
- CreateEntryChange: persist HomographNumber in CRDT changes
- CrdtMiniLcmApi: auto-assign homograph numbers on entry creation
  when HomographNumber is 0, respecting SecondaryOrder scoping.
  Updates existing lone entries from 0→1 when a second homograph appears.
- FwDataMiniLcmApi: read HomographNumber from ILexEntry, set on create
- UpdateEntryProxy: bidirectional HomographNumber sync to LibLCM
- EntrySync: include HomographNumber in diff/patch operations
- Sorting: uncomment HomographNumber in CRDT sort and search queries
- Tests: uncomment sorting tests with HomographNumber, add auto-
  assignment tests, add sync test verifying LibLCM corrects numbers
  after entry deletion via two sync cycles

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
…ering

Covers two gaps in homograph number test coverage:
- CitationForm-based grouping (different LexemeForms, same CitationForm)
- Sequential numbering with 3+ entries (verifies max+1 assignment)

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
…aphNumber

Previously, CreateEntry skipped AssignHomographNumber when an explicit
HomographNumber was provided. This left existing lone entries at 0 and
accepted out-of-range values. Now we always query the homograph scope
(matching headword + SecondaryOrder) to enforce invariants:
- Lone entries always get HomographNumber 0
- Explicit values in [1, max+1] are preserved (supports sync reordering)
- Out-of-range values are clamped to max+1

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
…y promotion

Remove clamping logic from AssignHomographNumber. The only case we need
to handle is auto-assigning for new FW-Lite entries (HomographNumber == 0).
Explicit values from sync/import are trusted as-is — FLEx handles getting
homograph numbers right. The key fix remains: always query the homograph
scope so a lone existing entry at 0 gets promoted to 1.

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
Replace the two-stage approach (DB query for headword match, then
in-memory filter for SecondaryOrder) with a single DB query using
correlated subqueries on MorphTypes, matching the pattern in Sorting.cs.

Restore the original if(HomographNumber == 0) guard — explicit values
get no special handling at all; only auto-assign for new FW-Lite entries.

https://claude.ai/code/session_01FJj2v135u6KdgVxoK4tRp2
@coderabbitai

coderabbitai bot commented Mar 24, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@github-actions github-actions bot added the "💻 FW Lite" label (issues related to the fw lite application, not miniLcm or crdt related) Mar 24, 2026
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 4246a62 to 2e82fa6 Compare March 27, 2026 16:00