Skip to content

Fix GH-19088: Make sscanf %c read whitespace#22041

Open
prateekbhujel wants to merge 6 commits into
php:masterfrom
prateekbhujel:prateekbhujel/fix-gh-19088-sscanf-c-whitespace
Open

Fix GH-19088: Make sscanf %c read whitespace#22041
prateekbhujel wants to merge 6 commits into
php:masterfrom
prateekbhujel:prateekbhujel/fix-gh-19088-sscanf-c-whitespace

Conversation

@prateekbhujel
Copy link
Copy Markdown
Contributor

Fixes GH-19088.

%c already disables the usual leading whitespace skip, but after that it still reused the %s scan path, which stops as soon as it sees whitespace. That made sscanf(' ', '%c%n', ...) assign an empty string and leave the offset at 0.

Keep %c separate from %s in the scan loop so it consumes the requested number of characters, including spaces. The same scanner is used by fscanf(), so the existing %c expectations there are updated as well.

Tests run:

sapi/cli/php run-tests.php -q ext/standard/tests/strings/gh19088.phpt
sapi/cli/php run-tests.php -q ext/standard/tests/strings/gh19088.phpt ext/standard/tests/strings/sscanf_basic4.phpt ext/standard/tests/strings/bug21730.phpt ext/standard/tests/file/fscanf_variation20.phpt ext/standard/tests/file/fscanf_variation21.phpt ext/standard/tests/file/fscanf_variation22.phpt ext/standard/tests/file/fscanf_variation23.phpt ext/standard/tests/file/fscanf_variation24.phpt ext/standard/tests/file/fscanf_variation25.phpt ext/standard/tests/file/fscanf_variation26.phpt
git diff --check

Keep the %c conversion separate from %s so it reads whitespace instead of stopping at it. The %c conversion already suppresses leading whitespace skipping; it also must not treat whitespace as the end of the field.
Comment thread ext/standard/tests/strings/gh19088.phpt
Comment thread ext/standard/scanf.c Outdated
@prateekbhujel prateekbhujel force-pushed the prateekbhujel/fix-gh-19088-sscanf-c-whitespace branch from acdb609 to 08ea8fb Compare May 14, 2026 21:16
@prateekbhujel
Copy link
Copy Markdown
Contributor Author

@devnexen quick follow-up on this one: the extra %c cases and the UPGRADING note are in now, and the checks are green. Is there anything else you would want changed here before merge?

Copy link
Copy Markdown
Contributor

@hakre hakre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you summarize if/how %c differs now from the original code? (see file comment)

Comment thread ext/standard/scanf.c Outdated
Comment thread UPGRADING Outdated
@prateekbhujel
Copy link
Copy Markdown
Contributor Author

can you summarize if/how %c differs now from the original code? (see file comment)

Sure. Previously %c already did not skip leading whitespace, but once the conversion started it still stopped when the next input character was whitespace, because it shared the %s stop condition.

Now %c consumes the next character(s) exactly, including whitespace. So sscanf(" hi", "%c") reads " ", and a width like %3c can include spaces. %s is unchanged and still stops at whitespace.

@hakre
Copy link
Copy Markdown
Contributor

hakre commented May 26, 2026

can you summarize if/how %c differs now from the original code? (see file comment)

Sure. Previously %c already did not skip leading whitespace, but once the conversion started it still stopped when the next input character was whitespace, because it shared the %s stop condition.

Now %c consumes the next character(s) exactly, including whitespace. So sscanf(" hi", "%c") reads " ", and a width like %3c can include spaces. %s is unchanged and still stops at whitespace.

Thanks for the detailed clarification – that matches exactly what I expected. The key point is that %c was already not skipping leading whitespace (correct), but the op = 's' alias made it stop at whitespace, which was the actual bug. So this PR restores the ANSI C / Tcl behavior that was lost in the original PHP port.

Given that this is a straightforward bug fix (and not a new feature or intentional behavioral change), it would be great to treat it as such – i.e., target the oldest supported branch where the bug exists, and remove the UPGRADING note (or replace it with a bug‑fix entry in the changelog). That way users on stable versions get the correct behavior too.

I mentioned the file comment in scanf.c because it explicitly references the Tcl 8.3.0 original code. If you haven’t already, it might be interesting to cross‑check that Tcl implementation – it already had op = 'c' and SCAN_NOSKIP for %c, so the PHP bug was indeed a porting oversight.

Happy to help with any further testing or wording if needed. 😊

@prateekbhujel
Copy link
Copy Markdown
Contributor Author

Ah yes, that makes sense.

I added the UPGRADING note because the visible result changes, but I agree with your framing here: this is fixing the %c handling from the old port, not adding a new conversion behavior. %c already had SCAN_NOSKIP; the bug was that it was still sharing the %s whitespace stop path.

I’ll remove the UPGRADING note and keep this as a bug fix. For the target branch, I’ll follow whichever oldest supported branch the maintainers want this based on, rather than treating it as master-only. I’ll also cross-check the Tcl code before the next update so the PR text is clear about why this is a porting oversight.

Thanks, this helped make the direction much clearer.

@hakre
Copy link
Copy Markdown
Contributor

hakre commented May 26, 2026

That’s a great direction – thanks for taking the time to dig into the Tcl history. One small nuance: @devnexen has a point that any change in visible behavior, even a bug fix, can be a BC break for code that relied on the old (broken) %c behavior. A short UPGRADING note that says something like “%c now correctly reads whitespace characters (restoring ANSI C behavior); previously it stopped at whitespace” might still be helpful, even if we frame it as a correction rather than a new feature. That way users are warned, and the fix stays clearly classified as a bug fix.

Happy to help with the exact wording if needed. 😊

@prateekbhujel
Copy link
Copy Markdown
Contributor Author

Yes, agreed. That is the better balance.

I put the UPGRADING note back, but worded it as a correction rather than a new feature:

%c conversions now correctly read whitespace characters, restoring ANSI C behavior. Previously, they stopped at whitespace.

Pushed that in d88f22eab35. Thanks for pointing out the BC angle here — it makes sense to warn users while still keeping the PR framed as a bug fix.

@hakre
Copy link
Copy Markdown
Contributor

hakre commented May 26, 2026

Given that this is a bug fix that isn't a security fix,
the contributing guide suggests targeting the lowest actively supported branch where the bug exists — if I'm reading it right, that would be 8.4 at the moment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sscanf c conversion specifier

3 participants