[WIP] Limited version of spelling correction #1007
Closed
veloman-yunkan wants to merge 4 commits into main
This is a prototype version of spelling correction attempting to mirror the client's implementation at https://github.com/gremid/xapian-spelling-suggestions/

For an unknown reason the new unit test fails as follows:

[ RUN ] Suggestion.spellingSuggestions
Resolve redirect
set index
test/suggestion.cpp:835: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Tsunge", 1)
Which is: {}
std::vector<std::string> ({"Zunge"})
Which is: { "Zunge" }
test/suggestion.cpp:841: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Lax", 1)
Which is: {}
std::vector<std::string> ({"Lachs"})
Which is: { "Lachs" }
test/suggestion.cpp:842: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Mont", 1)
Which is: {}
std::vector<std::string> ({"Mond"})
Which is: { "Mond" }
test/suggestion.cpp:845: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Trok", 1)
Which is: {}
std::vector<std::string> ({"Trog"})
Which is: { "Trog" }
test/suggestion.cpp:850: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Son", 1)
Which is: {}
std::vector<std::string> ({"Sohn"})
Which is: { "Sohn" }
test/suggestion.cpp:852: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Grahl", 1)
Which is: { "Stuhl" }
std::vector<std::string> ({"Gral"})
Which is: { "Gral" }
test/suggestion.cpp:861: Failure
Expected equality of these values:
getSpellingSuggestions(a, "aba", 1)
Which is: {}
std::vector<std::string> ({"aber"})
Which is: { "aber" }
test/suggestion.cpp:880: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Füreschein", 1)
Which is: {}
std::vector<std::string> ({"Führerschein"})
Which is: { "F\xC3\xBChrerschein"
As Text: "Führerschein" }
[ FAILED ] Suggestion.spellingSuggestions (280 ms)
Force-pushed from 4c7c178 to 04d82ad.
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1007      +/-   ##
==========================================
+ Coverage   58.13%   58.14%   +0.01%
==========================================
  Files         101      102       +1
  Lines        5384     5462      +78
  Branches     2197     2234      +37
==========================================
+ Hits         3130     3176      +46
- Misses        795      798       +3
- Partials     1459     1488      +29
... and fixed a bug in the test data.
This reduced the count of failures in the Suggestion.spellingSuggestions
unit test from 8 to 1:
[ RUN ] Suggestion.spellingSuggestions
Resolve redirect
set index
test/suggestion.cpp:841: Failure
Expected equality of these values:
getSpellingSuggestions(a, "Lax", 1)
Which is: {}
std::vector<std::string> ({"Lachs"})
Which is: { "Lachs" }
[ FAILED ] Suggestion.spellingSuggestions (260 ms)
The spelling correction "Lax -> Lachs" is not returned because the max
edit distance is capped at (length(query_word) - 1) which reduces our
passed value of the max edit distance argument from 3 to 2.
This problem disappears if the version of libxapian found on Ubuntu
22.04 (libxapian.so.30.11.0) is used instead of the one that we build
ourselves as a base dependency (libxapian.so.30.12.4).
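
The arithmetic behind the remaining "Lax -> Lachs" failure can be checked with a small sketch. It is illustrative only and is not the code of this PR or of libxapian: `levenshtein`, `cappedMaxEditDistance` and `isReachable` are hypothetical helpers, plain Levenshtein distance over ASCII bytes stands in for whatever metric the library really uses, and the cap is modelled exactly as described above (length(query_word) - 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Plain Levenshtein distance over bytes (ASCII input assumed).
// Assumption: this only approximates the library's own edit-distance metric.
std::size_t levenshtein(const std::string& a, const std::string& b)
{
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical model of the cap: the requested max edit distance is
// limited to (length of the query word - 1).
std::size_t cappedMaxEditDistance(std::size_t requested, const std::string& word)
{
  assert(!word.empty());
  return std::min(requested, word.size() - 1);
}

// True if the candidate is within the capped edit distance of the query.
bool isReachable(const std::string& query, const std::string& candidate,
                 std::size_t requested)
{
  return levenshtein(query, candidate) <= cappedMaxEditDistance(requested, query);
}
```

Under these assumptions, "Lax" and "Lachs" are 3 edits apart (one substitution and two insertions), while the cap for the 3-letter query "Lax" is 2, so `isReachable("Lax", "Lachs", 3)` is false. For a 4-letter query such as "Mont" the cap stays at the requested 3, so "Mond" (1 edit away) remains reachable.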
Force-pushed from b530190 to b325bff.
Contributor
@veloman-yunkan Can we close this, since kiwix/libkiwix#1230 supersedes it? Is there anything interesting left here that has not been carried over to kiwix/libkiwix#1230?
Collaborator
Author
Superseded by kiwix/libkiwix#1230
This PR is a less ambitious version of #994 intended to deliver a new feature in a more limited form as soon as possible.
Fixes #731 (will open other issues for future improvements)