
Pre-encode non-ASCII bytes in sanitize_url before parsing #23053

Draft
thijsoo wants to merge 2 commits into trunk from
22903-wpseo_utilssanitize_url-fails-on-path-with-unencoded-non-latin-characters


thijsoo commented Mar 6, 2026

Resolves #22903

Context

Summary

This PR can be summarized in the following changelog entry:

  • Updates URL sanitization to work better with UTF-8.

Relevant technical choices:

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

QA can test this PR by following these steps:

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.
  • This PR also affects Yoast SEO for Google Docs. I have added a changelog entry starting with [yoast-doc-extension], added test instructions for Yoast SEO for Google Docs and attached the Google Docs Add-on label to this PR.

Documentation

  • I have written documentation for this change. For example, comments in the Relevant technical choices, comments in the code, documentation on Confluence / shared Google Drive / Yoast developer portal, or other.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.
  • I have run grunt build:images and committed the results, if my PR introduces new images or SVGs.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes #

wp_parse_url() (a wrapper around PHP's parse_url()) can corrupt multibyte
UTF-8 sequences in URL paths. Pre-encoding non-ASCII bytes (\x80-\xff)
with rawurlencode() converts them to percent-encoded ASCII before parsing,
which fixes URLs containing unencoded non-Latin characters, such as those
in Farsi, Chinese, and Cyrillic scripts.
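
The pre-encoding step described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the function name `pre_encode_non_ascii()` and the example URL are hypothetical, and it uses plain `parse_url()` so that it runs outside WordPress.

```php
<?php
/**
 * Hypothetical helper (not taken from the PR): percent-encodes every run
 * of non-ASCII bytes so that the URL contains only ASCII before parsing.
 */
function pre_encode_non_ascii( string $url ): string {
	// No /u modifier: the pattern matches raw bytes 0x80-0xFF, i.e. every
	// byte that belongs to a multibyte UTF-8 sequence.
	$encoded = preg_replace_callback(
		'/[\x80-\xff]+/',
		static function ( array $matches ): string {
			return rawurlencode( $matches[0] );
		},
		$url
	);

	// preg_replace_callback() returns null on a PCRE error; fall back to
	// the unmodified input in that case.
	return $encoded ?? $url;
}

// Assumed example input with an unencoded Cyrillic path segment.
$url   = 'https://example.com/тест';
$parts = parse_url( pre_encode_non_ascii( $url ) );

echo $parts['path'], PHP_EOL; // prints "/%D1%82%D0%B5%D1%81%D1%82"
```

ASCII characters, including delimiters such as `/`, `?`, and `#`, fall outside the matched byte range, so the URL structure is left untouched and only the multibyte sequences are rewritten.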

Resolves #22903

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
thijsoo added the "changelog: non-user-facing" label (Needs to be included in the 'Non-userfacing' category in the changelog) on Mar 6, 2026

coveralls commented Mar 6, 2026

Pull Request Test Coverage Report for Build b3742e6331c09bf9716e3e3bb4d38d737fb31c3f

Details

  • 7 of 7 (100.0%) changed or added relevant lines in 1 file are covered.
  • 47 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.6%) to 53.323%

Files with Coverage Reduction | New Missed Lines | %
inc/class-wpseo-utils.php | 47 | 39.01%

Totals
Change from base Build 655ef32d56e1796ff53d6012616cc430da9ffa1d: -0.6%
Covered Lines: 33513
Relevant Lines: 62824

💛 - Coveralls


Development

Successfully merging this pull request may close these issues.

WPSEO_Utils::sanitize_url fails on path with unencoded non-latin characters

2 participants