- Removed extra whitespace from the dictionary input file
- Removed an irrelevant line from the input file
- Ran the new code and generated output
  - the output is in data/dict.txt
  - entries not parsed by the grammar are in error.log
- Changes in the grammar
  - an entry is valid even if there is a period at the end of the line
  - a part of speech (POS) can be terminated with either a full stop or a comma
    - the comma is a typo
  - glosses can be terminated with a colon
    - the colons are typos
  - attempted to support phrase entries (multiple words in the headword)
    - this failed, so the code is commented out
- bailey now generates SFM output (in MDF) from the input text
- Better handling of the input file
  - in case of a malformed line in the input text, bailey
    - copies the line to error.log
    - continues to parse the next line
  - the parsed output is stored in the dict.txt file
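The per-line error handling described above can be sketched as follows. The grammar's `~"..."` regex syntax looks like Python's Parsimonious library, so the sketch is in Python, but it uses only the standard `re` module. It is a minimal sketch under stated assumptions, not bailey's code: `ENTRY`, `parse_line`, `convert` and `write_errors` are hypothetical names, and the regular expression only approximates the entry shape (optional `#`, a Malayalam headword, a comma, a POS ended by `.` or `,`, the senses, and an optional final period).

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical, simplified stand-in for the real PEG grammar: optional '#'s,
# a Malayalam headword, a comma, a POS tag ended by '.' or ',', whitespace,
# the sense text, and an optional final period.
ENTRY = re.compile(
    r"#*(?P<headword>[\u0d00-\u0d7f]+),\s*"
    r"(?P<pos>[^\s.,]+)[.,]\s+"
    r"(?P<senses>[^.]+?)\.?"
)


def parse_line(line: str) -> dict:
    """Return the fields of one entry, or raise ValueError if it is malformed."""
    match = ENTRY.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unparsed entry: {line!r}")
    return match.groupdict()


def convert(lines: Iterable[str]) -> Tuple[List[dict], List[str]]:
    """Parse every line; collect malformed ones instead of stopping."""
    parsed: List[dict] = []
    errors: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # blank lines only separate entries
        try:
            parsed.append(parse_line(line))
        except ValueError:
            # Keep the raw line for error.log, then carry on with the next line.
            errors.append(line.rstrip("\n"))
    return parsed, errors


def write_errors(errors: List[str], path: str = "error.log") -> None:
    """Copy every rejected line to the error log, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in errors)
```

Collecting the rejected lines rather than raising means one bad line never stops the whole conversion, which matches the "copy to error.log and continue" behaviour in the list.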
This has been recreated from the MDF documentation.
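A minimal sketch of rendering one parsed entry as an MDF record. The marker choice is an assumption for illustration, not taken from bailey: `\lx` (lexeme), `\ps` (part of speech), `\sn` (sense number) and `\de` (definition) are standard MDF markers, but which gloss or definition marker bailey actually writes is not stated here. `to_mdf` is a hypothetical name.

```python
from typing import List


def to_mdf(headword: str, pos: str, senses: List[str]) -> str:
    """Render one entry as MDF lines (hypothetical marker choice)."""
    lines = [f"\\lx {headword}", f"\\ps {pos}"]
    for number, sense in enumerate(senses, start=1):
        if len(senses) > 1:
            lines.append(f"\\sn {number}")  # number senses only when there are several
        lines.append(f"\\de {sense}")
    return "\n".join(lines) + "\n"
```

In MDF each record begins with `\lx`, so rendered entries can be concatenated into one file; a blank line between records keeps it readable.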
beniza commented on Jan 7, 2020
```
entry = hash headword comma pos ws senses subentry period emptyline
# entry = hash headphrase comma pos ws senses subentry period emptyline
hash = (~"#")*
# headphrase = headword (ws headword)*
```
I added this to capture headwords made up of multiple words; most of the exceptions come from such entries.
However, I couldn't get it to work.
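For comparison, the intent of the commented-out `headphrase` rule (`headword (ws headword)*`) can be expressed with Python's standard `re` module. This is a hypothetical regex analogue, not a fix for the PEG rule: `HEADWORD`, `HEADPHRASE` and `head_of` are illustrative names, and a headword is assumed to be one or more characters from the Malayalam block U+0D00 to U+0D7F.

```python
import re
from typing import Optional

# Assumed headword: one or more Malayalam-block characters.
HEADWORD = r"[\u0d00-\u0d7f]+"
# Analogue of  headphrase = headword (ws headword)*  followed by the comma.
HEADPHRASE = re.compile(rf"#*(?P<head>{HEADWORD}(?:[ \t]+{HEADWORD})*),")


def head_of(line: str) -> Optional[str]:
    """Return the (possibly multi-word) headword before the comma, or None."""
    match = HEADPHRASE.match(line)
    return match.group("head") if match else None
```

The sketch uses `+` for a headword, so every word in the phrase must be non-empty; the `(?:[ \t]+...)*` group then consumes any further space-separated words up to the comma.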
beniza commented on Jan 7, 2020
```diff
 sense = (ml ws ml)* ml
 ml = ~"[\u0d00-\u0d7f]*"
-semicolon = ~";"
+semicolon = ~"[;:]"
```
In many places the typists entered a colon (:) where a semicolon (;) was intended. Since we are not preserving the data, I decided to accept both as separators.
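The effect of the `semicolon = ~"[;:]"` change on sense splitting can be sketched with the standard `re` module. `SEPARATOR` and `split_senses` are hypothetical helpers for illustration, and whitespace around a separator is assumed to be insignificant.

```python
import re
from typing import List

# Mirrors  semicolon = ~"[;:]" : a colon typed in place of a semicolon is
# accepted as a sense separator instead of failing the parse.
SEPARATOR = re.compile(r"\s*[;:]\s*")


def split_senses(text: str) -> List[str]:
    """Split sense text on ';' or ':' and drop empty pieces."""
    return [sense for sense in SEPARATOR.split(text.strip()) if sense]
```

The trade-off is that a colon that genuinely belongs inside a sense can no longer be told apart from a separator, which is acceptable here only because the original punctuation is not being preserved.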
I've made the changes listed above.
Please review the code before merging; the rest of the files and data can be merged directly.