[mypyc] Add `str.isalnum()` primitive by VaggelisD · Pull Request #20852 · python/mypy

VaggelisD · 2026-02-20T15:23:16Z

Added a `str.isalnum()` primitive, implemented similarly to `str.isspace()`.

One interesting thing to point out is that the speedup shrinks as the string gets longer, and for the length-100 `é` string mypyc is actually slower than CPython:

| All-alphanumeric | mypyc (s) | Python (s) | Speedup (Python / mypyc) |
|---|---|---|---|
| length 1 (`'a'`) | 0.645 | 2.036 | 3.16x |
| length 10 (`'abcde12345'`) | 1.026 | 2.607 | 2.54x |
| length 100 (`'a' * 100`) | 3.599 | 7.848 | 2.18x |
| length 1 (non-ASCII, 1-byte kind: U+00E9 `'é'`) | 0.816 | 1.976 | 2.42x |
| length 10 (non-ASCII, 1-byte kind: `'é' * 10`) | 2.091 | 2.587 | 1.24x |
| length 100 (non-ASCII, 1-byte kind: `'é' * 100`) | 14.298 | 7.814 | 0.55x |

| Non-alphanumeric (early exit) | mypyc (s) | Python (s) | Speedup (Python / mypyc) |
|---|---|---|---|
| length 1 (`' '`) | 0.622 | 2.006 | 3.22x |
| length 100 (`'!' * 100`) | 0.617 | 2.024 | 3.28x |
| length 100 (`'a' * 99 + '!'`) | 3.453 | 10.246 | 2.97x |

I'm not entirely sure how to interpret this. Could it be because `Py_UNICODE_ISALNUM` calls four functions internally, and those calls are better optimized inside CPython thanks to PGO and LTO?
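To make the "four functions" point concrete: in CPython, `Py_UNICODE_ISALNUM(ch)` is `Py_UNICODE_ISALPHA(ch) || Py_UNICODE_ISDECIMAL(ch) || Py_UNICODE_ISDIGIT(ch) || Py_UNICODE_ISNUMERIC(ch)`, and each of those maps to an exported libpython function. The standalone sketch below only mimics that short-circuit shape; the `stub_*` functions are hypothetical ASCII-only stand-ins (not CPython functions) that count how often they are called. It shows that a letter costs one call, a digit two, and a non-alphanumeric character all four. It does not model the Unicode database lookups, and the PGO/LTO explanation remains a guess.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration only: counts how many classification calls were made. */
static size_t classify_calls = 0;

/* Hypothetical ASCII-only stubs standing in for _PyUnicode_IsAlpha,
   _PyUnicode_IsDecimalDigit, _PyUnicode_IsDigit and _PyUnicode_IsNumeric.
   The real functions look characters up in CPython's Unicode database. */
static int stub_is_alpha(uint32_t ch) {
    classify_calls++;
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

static int stub_is_decimal(uint32_t ch) {
    classify_calls++;
    return ch >= '0' && ch <= '9';
}

static int stub_is_digit(uint32_t ch) {
    classify_calls++;
    return ch >= '0' && ch <= '9';
}

static int stub_is_numeric(uint32_t ch) {
    classify_calls++;
    return ch >= '0' && ch <= '9';
}

/* Same short-circuit shape as Py_UNICODE_ISALNUM: up to four calls per
   character, and a non-alphanumeric character always pays for all four. */
static bool sketch_isalnum_char(uint32_t ch) {
    return stub_is_alpha(ch) || stub_is_decimal(ch) ||
           stub_is_digit(ch) || stub_is_numeric(ch);
}

/* str.isalnum()-style scan: false for "", stops at the first failure. */
static bool sketch_isalnum(const uint32_t *data, size_t len) {
    assert(data != NULL || len == 0);
    if (len == 0) return false;
    for (size_t i = 0; i < len; i++) {
        if (!sketch_isalnum_char(data[i])) return false;
    }
    return true;
}
```

Inside an extension module such as mypyc's runtime, these are ordinary out-of-line calls into the shared library, so long all-alphanumeric strings pay at least one call per character.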

VaggelisD · 2026-02-20T15:23:38Z

mypyc/doc/str_operations.rst

 * ``s1.find(s2: str)``
 * ``s1.find(s2: str, start: int)``
 * ``s1.find(s2: str, start: int, end: int)``
+* ``s.isspace()``


This wasn't documented in the `str.isspace()` PR, so I added it now.

JukkaL · 2026-02-20T16:44:56Z

mypyc/lib-rt/str_ops.c

+
+    int kind = PyUnicode_KIND(str);
+    const void *data = PyUnicode_DATA(str);
+    for (Py_ssize_t i = 0; i < len; i++) {


Performance might improve if there were separate loops for the 2-byte and 4-byte kinds. That way the read operation wouldn't need to branch on the kind, which might result in better code. Can you try this out?
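A minimal standalone sketch of this suggestion, assuming plain `uint8_t`/`uint16_t`/`uint32_t` buffers in place of CPython's string data and a hypothetical ASCII-only `alnum_char` in place of `Py_UNICODE_ISALNUM`. Here `read_char` plays the role of `PyUnicode_READ`, which switches on the kind for every character, while `isalnum_split` branches on the kind once and then runs a dedicated loop per width. All names are illustrative, not mypyc's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Storage widths mirroring the PEP 393 1/2/4-byte kinds (names are ours). */
enum kind { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Hypothetical ASCII-only stand-in for Py_UNICODE_ISALNUM. */
static bool alnum_char(uint32_t ch) {
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') ||
           (ch >= '0' && ch <= '9');
}

/* Like PyUnicode_READ: the kind is re-checked on every character. */
static uint32_t read_char(enum kind k, const void *data, size_t i) {
    switch (k) {
    case KIND_1BYTE: return ((const uint8_t *)data)[i];
    case KIND_2BYTE: return ((const uint16_t *)data)[i];
    default:         return ((const uint32_t *)data)[i];
    }
}

/* Generic loop: one branch on the kind per iteration. */
static bool isalnum_generic(enum kind k, const void *data, size_t len) {
    if (len == 0) return false;
    for (size_t i = 0; i < len; i++) {
        if (!alnum_char(read_char(k, data, i))) return false;
    }
    return true;
}

/* Suggested variant: branch on the kind once, then run a dedicated loop. */
static bool isalnum_split(enum kind k, const void *data, size_t len) {
    if (len == 0) return false;
    switch (k) {
    case KIND_1BYTE: {
        const uint8_t *p = data;
        for (size_t i = 0; i < len; i++) {
            if (!alnum_char(p[i])) return false;
        }
        return true;
    }
    case KIND_2BYTE: {
        const uint16_t *p = data;
        for (size_t i = 0; i < len; i++) {
            if (!alnum_char(p[i])) return false;
        }
        return true;
    }
    case KIND_4BYTE: {
        const uint32_t *p = data;
        for (size_t i = 0; i < len; i++) {
            if (!alnum_char(p[i])) return false;
        }
        return true;
    }
    }
    assert(!"invalid kind");
    return false;
}
```

Whether the split version is faster depends on the compiler, which may already hoist the switch out of the generic loop (loop unswitching), so only a measurement can tell.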

VaggelisD · 2026-02-23

I tried it locally. It only slightly reduced the tail end (still 13+ seconds for the length-100 `é` case), so there would still be a significant regression for longer strings.

I also tried falling back to the interpreter's implementation via `PyObject_CallMethodNoArgs` for longer strings, but it made no difference. If LTO/PGO inlining is what makes CPython faster here, we don't seem to be able to benefit from it at this point.

What is the preferred action here? Do we keep the primitive in the hope that most strings are short, or should mypyc always be at least on par with CPython?

JukkaL · 2026-02-23

This still looks better than CPython on average, since ASCII strings and short strings are common. To match CPython's performance we might need a custom implementation of `Py_UNICODE_ISALNUM`, which doesn't seem worth it. I'll experiment with this a little, but this might be close to the best we can easily achieve.

VaggelisD · 2026-02-23

Sounds good! I also wondered what it would take to mirror `Py_UNICODE_ISALNUM`. Claude advised against it, since each of the internal functions calls `gettyperecord()`, which looks characters up in CPython's internal Unicode database (supposedly hard to replicate).

JukkaL · 2026-02-20T16:48:39Z

mypyc/lib-rt/str_ops.c

+    Py_ssize_t len = PyUnicode_GET_LENGTH(str);
+    if (len == 0) return false;
+
+    if (PyUnicode_IS_ASCII(str)) {


Could this be `PyUnicode_KIND(str) == PyUnicode_1BYTE_KIND` instead? This would be needed if the loop below were split into dedicated 2-byte and 4-byte loops.

VaggelisD · 2026-02-23

Switching from the ASCII path to the 1-byte kind path puts a big dent in performance. I assume that's because `Py_ISALNUM` uses a lookup table, whereas `Py_UNICODE_ISALNUM` makes four separate function calls:

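| All-alphanumeric | ASCII fast path (s) | 1-byte kind path (s) | Speedup of ASCII path |
|---|---|---|---|
| length 1 (`'a'`) | 0.623 | 0.873 | 1.40x |
| length 10 (`'abcde12345'`) | 1.003 | 2.708 | 2.70x |
| length 100 (`'a' * 100`) | 3.139 | 14.147 | 4.51x |

| Non-alphanumeric (early exit) | ASCII fast path (s) | 1-byte kind path (s) | Speedup of ASCII path |
|---|---|---|---|
| length 1 (`' '`) | 0.609 | 1.118 | 1.84x |
| length 100 (`'!' * 100`) | 0.617 | 1.126 | 1.82x |
| length 100 (`'a' * 99 + '!'`) | 3.322 | 14.802 | 4.46x |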

However, I can combine all four cases (ASCII plus the three storage kinds: 1, 2 and 4 bytes) in one function and hide their loops behind a macro.
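A sketch of how the four cases could sit behind one macro, under stated assumptions: `ISALNUM_LOOP`, `isalnum_dispatch`, the `storage` enum and the 256-entry `ascii_alnum_table` are hypothetical names, not the PR's code. The table only imitates the idea behind CPython's `_Py_ctype_table` (which backs `Py_ISALNUM`), and `unicode_alnum` is an ASCII-only stand-in for `Py_UNICODE_ISALNUM`. The ASCII case costs one table load per character, which is why it stays fast, while the three kinds would pay for the full Unicode classifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table, similar in spirit to CPython's
   _Py_ctype_table behind Py_ISALNUM: nonzero means ASCII alphanumeric,
   and every byte >= 128 maps to 0. Must be filled before use. */
static uint8_t ascii_alnum_table[256];

static void init_ascii_alnum_table(void) {
    for (int c = 0; c < 256; c++) {
        ascii_alnum_table[c] = (c >= '0' && c <= '9') ||
                               (c >= 'a' && c <= 'z') ||
                               (c >= 'A' && c <= 'Z');
    }
}

/* ASCII path: one table load per character, no function call. */
static bool ascii_alnum(uint32_t ch) {
    return ascii_alnum_table[ch & 0xFF] != 0;
}

/* Hypothetical stand-in for Py_UNICODE_ISALNUM. It is ASCII-only, so unlike
   the real macro it rejects letters such as U+00E9. */
static bool unicode_alnum(uint32_t ch) {
    return ch < 128 && ascii_alnum_table[ch] != 0;
}

/* Expands to a loop over `len` items of TYPE read from `data`; it returns
   from the enclosing function, so use it as the last statement of a case. */
#define ISALNUM_LOOP(TYPE, CHECK)                       \
    do {                                                \
        const TYPE *p_ = (const TYPE *)data;            \
        for (size_t i_ = 0; i_ < len; i_++) {           \
            if (!CHECK(p_[i_])) return false;           \
        }                                               \
        return true;                                    \
    } while (0)

/* ASCII plus the three PEP 393 storage kinds (names are hypothetical). */
enum storage { STORAGE_ASCII, STORAGE_1BYTE, STORAGE_2BYTE, STORAGE_4BYTE };

static bool isalnum_dispatch(enum storage s, const void *data, size_t len) {
    if (len == 0) return false;
    switch (s) {
    case STORAGE_ASCII: ISALNUM_LOOP(uint8_t, ascii_alnum);
    case STORAGE_1BYTE: ISALNUM_LOOP(uint8_t, unicode_alnum);
    case STORAGE_2BYTE: ISALNUM_LOOP(uint16_t, unicode_alnum);
    case STORAGE_4BYTE: ISALNUM_LOOP(uint32_t, unicode_alnum);
    }
    assert(!"invalid storage kind");
    return false;
}
```

The macro keeps the four loops textually identical while letting each one read its own element width, so none of them has to branch on the kind inside the loop.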

VaggelisD · 2026-02-23

Judging from these numbers, checking any non-trivial string with `Py_UNICODE_ISALNUM` already seems to be on par with or slower than CPython.

VaggelisD · 2026-02-23T18:42:19Z

This might be of interest: tobymao/sqlglot#7120

[mypyc] Add str.isalnum() primitive

ec85e7d


VaggelisD mentioned this pull request on Feb 24, 2026 in [mypyc] Add str.isdigit() primitive #20893 (Open)

VaggelisD wants to merge 1 commit into python:master from VaggelisD:str_isalnum

