[fix](paimon) infer manifest format from split file format in cpp reader#60795
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
There was a problem hiding this comment.
Pull request overview
Adjusts Doris BE’s Paimon C++ reader option construction to avoid incorrect/default manifest format selection when FE scan ranges omit table options, by inferring formats from split metadata.
Changes:
- Infer
paimon::Options::FILE_FORMATfrom split-levelpaimon_params.file_formatwhen the option is missing/empty. - Infer
paimon::Options::MANIFEST_FORMATfrom split-levelpaimon_params.file_formatwhen the option is missing/empty.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // FE currently may not pass paimon table options in scan ranges. | ||
| // Avoid paimon-cpp defaulting manifest.format to avro when split file format is known. | ||
| if (_range.__isset.table_format_params && _range.table_format_params.__isset.paimon_params && | ||
| _range.table_format_params.paimon_params.__isset.file_format && | ||
| !_range.table_format_params.paimon_params.file_format.empty()) { | ||
| const auto& split_file_format = _range.table_format_params.paimon_params.file_format; | ||
| auto file_format_it = options.find(paimon::Options::FILE_FORMAT); | ||
| if (file_format_it == options.end() || file_format_it->second.empty()) { | ||
| options[paimon::Options::FILE_FORMAT] = split_file_format; | ||
| } | ||
| auto manifest_format_it = options.find(paimon::Options::MANIFEST_FORMAT); | ||
| if (manifest_format_it == options.end() || manifest_format_it->second.empty()) { | ||
| options[paimon::Options::MANIFEST_FORMAT] = split_file_format; | ||
| } | ||
| } |
There was a problem hiding this comment.
New option inference logic isn’t covered by existing PaimonCppReader unit tests. Consider adding a test that asserts when split-level file_format is set and options lacks/has empty paimon::Options::FILE_FORMAT / MANIFEST_FORMAT, _build_options() (or an observable init path) populates them, and does not override non-empty values.
| // FE currently may not pass paimon table options in scan ranges. | ||
| // Avoid paimon-cpp defaulting manifest.format to avro when split file format is known. | ||
| if (_range.__isset.table_format_params && _range.table_format_params.__isset.paimon_params && | ||
| _range.table_format_params.paimon_params.__isset.file_format && | ||
| !_range.table_format_params.paimon_params.file_format.empty()) { | ||
| const auto& split_file_format = _range.table_format_params.paimon_params.file_format; | ||
| auto file_format_it = options.find(paimon::Options::FILE_FORMAT); | ||
| if (file_format_it == options.end() || file_format_it->second.empty()) { | ||
| options[paimon::Options::FILE_FORMAT] = split_file_format; | ||
| } | ||
| auto manifest_format_it = options.find(paimon::Options::MANIFEST_FORMAT); | ||
| if (manifest_format_it == options.end() || manifest_format_it->second.empty()) { | ||
| options[paimon::Options::MANIFEST_FORMAT] = split_file_format; | ||
| } | ||
| } |
There was a problem hiding this comment.
This behavior is slightly broader than the PR description (“only when table options are missing/empty”): it will also set FILE_FORMAT/MANIFEST_FORMAT when other table options are present but just these keys are absent/empty. If that’s intended, the PR description should be updated; if not, consider tightening the condition to only apply when the table-level paimon options map is missing/empty.
|
run buildall |
TPC-H: Total hot run time: 28640 ms |
TPC-DS: Total hot run time: 183793 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This reverts commit 72b52c1.
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
TPC-H: Total hot run time: 28877 ms |
TPC-DS: Total hot run time: 184287 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
…1-rc01 (#61125) ## Summary Backport paimon-cpp reader integration chain into tmp-branch-2.1.11-rc01-paimon-cpp as a single squashed change. Included upstream PRs: - #60296 - #60676 - #60711 - #60730 - #60795 - #60883 - #60946 ## Notes - This PR is intentionally squashed into one commit for release-branch delivery. - Local uncommitted changes in the original workspace were not touched; work was done in an isolated worktree.
…der (apache#60795) ## Problem Followup apache#60676 When FE does not pass full table options in scan ranges, paimon-cpp may default manifest.format to avro. For non-avro environments, this can fail in PaimonCppReader initialization with: Could not find a FileFormatFactory implementation class for format avro. ## Solution In PaimonCppReader::_build_options, if split-level file_format exists and table options are missing/empty: - set file.format from split file_format - set manifest.format from split file_format This keeps paimon-cpp format resolution consistent with the actual split format and avoids unintended avro fallback. ## Verification - Incremental BE build succeeded for doris_be target. - Change scope is limited to be/src/vec/exec/format/table/paimon_cpp_reader.cpp.
…der (apache#60795) ## Problem Followup apache#60676 When FE does not pass full table options in scan ranges, paimon-cpp may default manifest.format to avro. For non-avro environments, this can fail in PaimonCppReader initialization with: Could not find a FileFormatFactory implementation class for format avro. ## Solution In PaimonCppReader::_build_options, if split-level file_format exists and table options are missing/empty: - set file.format from split file_format - set manifest.format from split file_format This keeps paimon-cpp format resolution consistent with the actual split format and avoids unintended avro fallback. ## Verification - Incremental BE build succeeded for doris_be target. - Change scope is limited to be/src/vec/exec/format/table/paimon_cpp_reader.cpp.
…61379) ## Summary - Cherry-pick - #60676: [feat](paimon) integrate paimon-cpp reader - #60795: [fix](paimon) infer manifest format from split file format in cpp reader - #60883 ## Conflict Resolution - `gensrc/thrift/PaloInternalService.thrift`: kept both new fields from branch-4.1 and the PR (200: `enable_adjust_conjunct_order_by_cost`, 201: `enable_paimon_cpp_reader`, 202: `single_backend_query`) --------- Co-authored-by: morningman <yunyou@selectdb.com>
Problem
Followup #60676
When FE does not pass full table options in scan ranges, paimon-cpp may default manifest.format to avro.
For non-avro environments, this can fail in PaimonCppReader initialization with:
Could not find a FileFormatFactory implementation class for format avro.
Solution
In PaimonCppReader::_build_options, if split-level file_format exists and table options are missing/empty:
This keeps paimon-cpp format resolution consistent with the actual split format and avoids unintended avro fallback.
Verification