Skip to content

fix: helper function to strip mime params#1050

Merged
kelson42 merged 1 commit intoopenzim:mainfrom
Pranavjeet-Naidu:fix/counterhandler_MIMEstripping
Apr 3, 2026
Merged

fix: helper function to strip mime params#1050
kelson42 merged 1 commit intoopenzim:mainfrom
Pranavjeet-Naidu:fix/counterhandler_MIMEstripping

Conversation

@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor

@Pranavjeet-Naidu Pranavjeet-Naidu commented Mar 20, 2026

Fixes #1000

Basically a helper function that strips MIME-types params.
should fix the article count issue with the regex validation in zim-tools, this in libzim.

@kelson42 kelson42 requested a review from veloman-yunkan March 21, 2026 08:55
Comment thread src/writer/counterHandler.cpp Outdated
Comment thread src/writer/counterHandler.cpp Outdated
@Pranavjeet-Naidu Pranavjeet-Naidu force-pushed the fix/counterhandler_MIMEstripping branch from 2200ab0 to b9ee28c Compare March 23, 2026 22:08
@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor Author

@kelson42 is the RTD fail because of the rebase or something else that I have to fix?

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.29%. Comparing base (3977fd1) to head (21d7594).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/writer/counterHandler.cpp 0.00% 1 Missing and 3 partials ⚠️

❌ Your patch status has failed because the patch coverage (33.33%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1050      +/-   ##
==========================================
- Coverage   56.31%   56.29%   -0.02%     
==========================================
  Files         101      101              
  Lines        5031     5036       +5     
  Branches     2194     2196       +2     
==========================================
+ Hits         2833     2835       +2     
- Misses        737      738       +1     
- Partials     1461     1463       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a small internal helper to normalize MIME types by removing parameters (everything after ;), and uses it when generating the writer “Counter” metadata so downstream tooling (zim-tools) can reliably validate/count articles by base MIME type.

Changes:

  • Add zim::stripMimeParameters() to strip MIME parameters and trim whitespace before the first ;.
  • Apply MIME-parameter stripping in CounterHandler before incrementing the MIME counter.
  • Add unit tests covering common and edge-case MIME strings for the new helper.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
test/tooltesting.cpp Adds tests for stripMimeParameters() behavior across typical and edge inputs.
src/tools.h Declares the new private helper API.
src/tools.cpp Implements stripMimeParameters().
src/writer/counterHandler.h Adds <string> include for the std::string counter key type.
src/writer/counterHandler.cpp Normalizes MIME types (strip params) before counting, preventing parameter variants from inflating counts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/tools.cpp Outdated
Comment thread src/writer/counterHandler.cpp Outdated
@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor Author

Pranavjeet-Naidu commented Mar 24, 2026

@kelson42 the RTD fail is fixed, and about the codecov fail, ill need to add an integration test for the uncovered lines:

auto cleanMimetype = zim::stripMimeParameters(mimetype);
if (cleanMimetype.empty()) {
  return;
}
m_mimetypeCounter[cleanMimetype] += 1;

It'll probably be added in src/writer/creator.cpp, but doesn't seem necessary and the partials also seem unnecessary, please confirm. If no more changes are needed, I'll squash again.

@kelson42
Copy link
Copy Markdown
Contributor

@Pranavjeet-Naidu Thank younfor your PR, but I guess you have linked the wrong issue in your PR description!?

@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor Author

@Pranavjeet-Naidu Thank younfor your PR, but I guess you have linked the wrong issue in your PR description!?

Actually nope. I linked this issue because of your reply in the end over there. You concluded saying we have to strip MIME type paras. Please let me know if there's a better issue to link instead.

@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor Author

@kelson42 the RTD fail is fixed, and about the codecov fail, ill need to add an integration test for the uncovered lines:

auto cleanMimetype = zim::stripMimeParameters(mimetype);
if (cleanMimetype.empty()) {
  return;
}
m_mimetypeCounter[cleanMimetype] += 1;

It'll probably be added in src/writer/creator.cpp, but doesn't seem necessary and the partials also seem unnecessary, please confirm. If no more changes are needed, I'll squash again.

@kelson42 @veloman-yunkan any update on this? I have the test for these lines ready if needed, but please let me know before I proceed. The test just seems like redundant code considering all it does is check for empty. Wouldn't the unit tests written in tooltesting.cpp suffice?

Copy link
Copy Markdown
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please simplify the zim::stripMimeParameters() function and squash all your changes into a single commit.

Comment thread src/tools.cpp
@veloman-yunkan
Copy link
Copy Markdown
Collaborator

Also this branch must be rebased on main

@veloman-yunkan
Copy link
Copy Markdown
Collaborator

about the codecov fail...

Don't mind about codecov failures if they seem dubious.

@Pranavjeet-Naidu Pranavjeet-Naidu force-pushed the fix/counterhandler_MIMEstripping branch from 0dfc3ff to c2cb719 Compare March 31, 2026 18:02
@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor Author

@veloman-yunkan I would squash the empty commit as well and force push but I'm worried RTD will break again :(

Copy link
Copy Markdown
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get rid of the empty commit and don't worry about RTD.

Comment thread src/tools.cpp Outdated
@Pranavjeet-Naidu Pranavjeet-Naidu force-pushed the fix/counterhandler_MIMEstripping branch from 7694255 to 2e481a2 Compare April 1, 2026 15:21
Copy link
Copy Markdown
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have approved this PR but in the meantime another PR has been merged in this repository so this PR has to be rebased. While doing it please note the new comments. If you choose to address them, please don't create new commits (just amend the only commit of this PR).

Comment thread src/writer/counterHandler.h Outdated
Comment thread src/writer/counterHandler.cpp Outdated
Comment thread src/tools.cpp Outdated
@Pranavjeet-Naidu Pranavjeet-Naidu force-pushed the fix/counterhandler_MIMEstripping branch from 2e481a2 to 63f885b Compare April 1, 2026 20:51
@kelson42
Copy link
Copy Markdown
Contributor

kelson42 commented Apr 2, 2026

@veloman-yunkan Last review pass... hopefully :)

Copy link
Copy Markdown
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The #include of "../tools.h" should have stayed in src/writer/counterHandler.cpp. It is now included indirectly in a secretive way which explains why compilation doesn't fail. But in general, you shouldn't depend on such luck and must include explicitly the headers on which your code relies.

@kelson42
Copy link
Copy Markdown
Contributor

kelson42 commented Apr 3, 2026

The #include of "../tools.h" should have stayed in src/writer/counterHandler.cpp. It is now included indirectly in a secretive way which explains why compilation doesn't fail. But in general, you shouldn't depend on such luck and must include explicitly the headers on which your code relies.

@Pranavjeet-Naidu Please fix this and we will merge.

@Pranavjeet-Naidu Pranavjeet-Naidu force-pushed the fix/counterhandler_MIMEstripping branch from 63f885b to 1fc37a3 Compare April 3, 2026 04:15
@Pranavjeet-Naidu
Copy link
Copy Markdown
Contributor Author

The #include of "../tools.h" should have stayed in src/writer/counterHandler.cpp. It is now included indirectly in a secretive way which explains why compilation doesn't fail. But in general, you shouldn't depend on such luck and must include explicitly the headers on which your code relies.

@Pranavjeet-Naidu Please fix this and we will merge.

It's up right now, thank you for being patient :)

@kelson42
Copy link
Copy Markdown
Contributor

kelson42 commented Apr 3, 2026

@veloman-yunkan please confirm all good

Copy link
Copy Markdown
Collaborator

@veloman-yunkan veloman-yunkan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and I bet that it should look good to at least 99% percent of those people who are qualified enough to tell if C++ code can look good at all.

Signed-off-by: Pranavjeet-Naidu <pranavjeetnaidu@gmail.com>
@kelson42 kelson42 force-pushed the fix/counterhandler_MIMEstripping branch from 1fc37a3 to 21d7594 Compare April 3, 2026 08:59
@kelson42 kelson42 merged commit d0e1535 into openzim:main Apr 3, 2026
26 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wikipedia_en_top_all has 829k entries instead of 50k

4 participants