Conversation
zdenop
commented
Dec 18, 2024
- use std::filesystem::path instead of std::string for datadir
- add warning if datadir is not directory or does not exists
…g because it points at a temporary instance that was destroyed.`)
egorpugin
left a comment
There was a problem hiding this comment.
Probably all is_directory() checks are unnecessary. User will just get any other random error if he points to a file instead of to a dir.
|
@stweil : Thank you for review. Unittests needs be fixed yet. |
|
Old tprintfs should be removed slowly in favor of std::format. |
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Isn't the tesserr stream even better? |
|
Streams are better than tprintf. compared to Having even more insertion points the original string can be hardly readable compared to a string with |
I mean with |
|
I've slightly updated |
Now I get numerous compiler warnings: And the code size is larger, too. |
|
|
Can you try changing it to ? |
The suggested change still produces the compiler warnings, but I think it improves the code because like that calling |
|
It will be an error too if we call |
|
Added format in d95e9f7 I'll check ci for other warnings. |
|
I don't see warnings on gcc-14 (even without format). Possible solution is |
|
@egorpugin : I cherry-picked your commits regarding tprintf to my local branch it does not work for 'tessdata_path.string()'. |
|
How do you use it? |
|
I am sorry, it looks like I messed something up... Today I started from clean setup and it worked for me: |
|
I looks like unittest fails if If I made simplified test based on |
|
Ignore unit test errors, they can be fixed after. |
src/api/baseapi.cpp
Outdated
| */ | ||
| static void addAvailableLanguages(const std::string &datadir, | ||
| std::vector<std::string> *langs) { | ||
| if (!std::filesystem::is_directory(datadir)) |
There was a problem hiding this comment.
Do we still want this check? My impression is that we agreed to prefer reporting the exception which is thrown without this conditional code.
There was a problem hiding this comment.
If throwing an exception is the agreed-upon approach, then this check is no longer necessary.
stweil
left a comment
There was a problem hiding this comment.
We cannot merge this PR as long as several CI tests fail. Some of them seem to be related to layout recognition (for example layout_test) and could indicate a severe regression.
| */ | ||
| void CCUtil::main_setup(const std::string &argv0, const std::string &basename) { | ||
| imagebasename = basename; /**< name of image */ | ||
| datadir = find_data_path(argv0); |
There was a problem hiding this comment.
The old datadir was always terminated with a '/' which is missing for the new datadir. This is an API change, but it might also be it is not the reason for the failing CI tests.
| tprintf("Error opening data file %s\n", tessdata_path.c_str()); | ||
| tprintf( | ||
| if (!mgr->is_loaded() && !mgr->Init(tessdata_path.string().c_str())) { | ||
| tesserr << "Error opening data file " << tessdata_path.string() << '\n' << |
There was a problem hiding this comment.
| tesserr << "Error opening data file " << tessdata_path.string() << '\n' << | |
| tesserr << "Error opening data file " << tessdata_path << '\n' << |
There was a problem hiding this comment.
Regarding the difference in path formatting between using tesseract.exe a b -l c with tessdata_path.string() versus tessdata_path:
tessdata_path.string()produces a native OS path (e.g., on Windows):
Error opening data file X:\Projects\tesseract\c.traineddata
tessdata_pathproduces an escaped path:
Error opening data file "X:\\Projects\\tesseract\\c.traineddata"
I believe it’s more user-friendly and intuitive to display the native OS path. Is there a specific reason we should use the escaped version instead? If not, I recommend sticking with the native path for error messages.
| "not present in %s!!\n", | ||
| tessdata_path.c_str()); | ||
| tesserr << "Error: Tesseract (legacy) engine requested, but components are " | ||
| "not present in " << tessdata_path.string() << "!!\n"; |
There was a problem hiding this comment.
| "not present in " << tessdata_path.string() << "!!\n"; | |
| "not present in " << tessdata_path << "!!\n"; |
There was a problem hiding this comment.
The same as above: I recommend sticking with the native path for error messages
| */ | ||
| void CCUtil::main_setup(const std::string &argv0, const std::string &basename) { | ||
| imagebasename = basename; /**< name of image */ | ||
| datadir = find_data_path(argv0); |
There was a problem hiding this comment.
| datadir = find_data_path(argv0); | |
| datadir = find_data_path(argv0) / ""; |
There was a problem hiding this comment.
Previously, datadir was a string, and appending a / was used as a workaround to handle cases where users omitted the directory separator.
However, with datadir now being a std::filesystem::path, this manual string manipulation is no longer necessary. The path class automatically normalizes path separators and handles path operations (like concatenation) correctly for the underlying OS.
As a result, we can safely remove the workaround and rely on the standard library to manage path separators and concatenation.
Or do I miss something?
| if (!mgr->is_loaded() && !mgr->Init(tessdata_path.c_str())) { | ||
| tprintf("Error opening data file %s\n", tessdata_path.c_str()); | ||
| tprintf( | ||
| if (!mgr->is_loaded() && !mgr->Init(tessdata_path.string().c_str())) { |
There was a problem hiding this comment.
| if (!mgr->is_loaded() && !mgr->Init(tessdata_path.string().c_str())) { | |
| if (!mgr->is_loaded() && !mgr->Init(tessdata_path.c_str())) { |
There was a problem hiding this comment.
This change would broke msvc build:
error C2664: 'bool tesseract::TessdataManager::Init(const char *)': cannot convert argument 1 from 'const std: :filesystem::path::value_type *' to 'const char *'
src/ccmain/tessedit.cpp
Outdated
| bool set_only_non_debug_params, TessdataManager *mgr) { | ||
| // Set the language data path prefix | ||
| lang = !language.empty() ? language : "eng"; | ||
| language_data_path_prefix = datadir; |
There was a problem hiding this comment.
language_data_path_prefix is a class variable which is no longer set in the new code. This causes the CI failures.
Co-authored-by: Stefan Weil <sw@weilnetz.de>
