Save the errors in the state, not just the error counts. Makes it ea…#138
Save the errors in the state, not just the error counts. Makes it ea…#138
Conversation
|
Should we worry about memory in cases where a treebank has over 260k incidents? |
|
Good point, we also haven't considered this when saving errors to dump into json. I'm thinking about two possible strategies (but it's just a random idea, I have to give it a little more thought):
|
|
I like @ellepannitto's first idea (using the list as an array of length n, unbounded if the user passes, say, -1). Should we add this to the to-do list in #132 too? (For context: our rewrite addresses the exact same problem, although it does not use the state to do so. If I recall correctly, our solution is described in the text of our draft pull request). |
There is already the option called Now either this option could also regulate the number of errors saved and returned. Or there could be a similar option so that errors printed and errors returned are regulated separately. |
Sure, it'd be easy enough to redo it so the counts are still kept separately, but there's a field that keeps the errors up to --max_err and then stops |
…ier for a calling program to use the results Only keep track of --max_err errors, but still count all the errors
32e6649 to
9ad3a82
Compare
|
(updated the PR) |
Actually, after a bit more thinking, I would control the two requirements separately. In the current on-line validation, I print all errors to the log (i.e., |
|
Added a flag for that, as well |
Save the errors in the state, not just the error counts. Makes it easier for a calling program to use the results