Pull requests: huggingface/tokenizers


Pull requests list

Add scaling_bench: encode_batch vs worker-pool comparison (#1900)
#2048 opened May 1, 2026 by stargazerZJ
5 of 6 tasks
WIP: testing only
#2047 opened Apr 30, 2026 by assafvayner Draft
4 tasks
Batch encode: simple lock-free scheduler
#2044 opened Apr 28, 2026 by sebpop Contributor
feat(ByteLevel): skip per-byte transform for printable-ASCII tokens
#2038 opened Apr 26, 2026 by KimYannn
2 of 3 tasks
feat(NFC): skip Unicode pass for all-ASCII inputs
#2037 opened Apr 26, 2026 by KimYannn
2 of 3 tasks
feat: SIMD ASCII fast path for Lowercase normalizer (~30-49x)
#2036 opened Apr 26, 2026 by KimYannn
6 of 7 tasks
V0.23 release
#2032 opened Apr 24, 2026 by ArthurZucker Collaborator
perf: skip alignment tracking in encode_fast normalization
#2022 opened Apr 10, 2026 by ArthurZucker Collaborator
Reduce crate size
#2015 opened Apr 9, 2026 by ArthurZucker Collaborator
node: bump version to 0.22.2 for release
#2009 opened Apr 4, 2026 by MayCXC Contributor
feat(pattern): parallel regex find_matches for large inputs
#2003 opened Mar 31, 2026 by McPatate Member
fix: skip serializing ByteLevel fields at their default value
#2001 opened Mar 30, 2026 by ArthurZucker Collaborator
Regex split parity
#1991 opened Mar 27, 2026 by ArthurZucker Collaborator
feat: add new faster whitespace split pretok
#1985 opened Mar 26, 2026 by McPatate Member
Implementing Parity-aware BPE
#1974 opened Mar 21, 2026 by cimeister