
Support non-ASCII Unicode in grammar rule names #2196

Merged
ehuss merged 1 commit into master from TC/support-nonascii-in-grammar-rule-names on Mar 4, 2026


@traviscross
Contributor

The grammar currently supports only ASCII rule names. We want to support non-ASCII Unicode symbols such as `⊥` (bottom) since we plan to add that rule.

In this commit, we add `is_name_start` and `is_name_continue` predicates that centralize the decision of what can appear in a rule name. `is_name_start` accepts alphabetic characters, underscores, and non-ASCII characters; `is_name_continue` accepts alphanumeric characters, underscores, and non-ASCII characters.
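For readers without the diff at hand, the two predicates can be sketched roughly as follows. This is a minimal illustration written from the description above, not the code in the commit: the function bodies and the `is_rule_name` helper are assumptions. Because every non-ASCII character is accepted, it makes no difference here whether "alphabetic" is read as ASCII-only or Unicode-wide.

```rust
/// Sketch (assumed body): may `ch` begin a rule name?
/// Accepts ASCII letters, underscore, and any non-ASCII character.
fn is_name_start(ch: char) -> bool {
    ch.is_ascii_alphabetic() || ch == '_' || !ch.is_ascii()
}

/// Sketch (assumed body): may `ch` appear after the first character?
/// Same as `is_name_start`, plus ASCII digits.
fn is_name_continue(ch: char) -> bool {
    ch.is_ascii_alphanumeric() || ch == '_' || !ch.is_ascii()
}

/// Hypothetical helper (not part of the PR): checks a whole string
/// against the two predicates. An empty string is not a name.
fn is_rule_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if is_name_start(first) => chars.all(is_name_continue),
        _ => false,
    }
}
```

Under these assumptions, `is_rule_name("⊥")` and `is_rule_name("_Foo1")` hold, while `is_rule_name("1abc")` does not, since a digit cannot start a name.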

We use `is_name_start` in the `parse_expr1` condition that routes to `parse_nonterminal`. The previous condition (`is_alphanumeric`) was slightly misaligned with what `parse_name` actually accepts -- it included digits (which `parse_name` rejects) and excluded underscores (which `parse_name` accepts). Using `is_name_start` makes the dispatch condition match `parse_name` exactly.
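The mismatch described above can be made concrete by comparing the two conditions on a few first characters. This is only a sketch: `old_dispatch` stands in for the previous check and is assumed here to be `char::is_alphanumeric`, `is_name_start` is reconstructed from the description, and `compare` is a hypothetical helper; the real parser code may differ.

```rust
/// Assumed from the PR description; the real definition may differ.
fn is_name_start(ch: char) -> bool {
    ch.is_ascii_alphabetic() || ch == '_' || !ch.is_ascii()
}

/// Stand-in for the previous dispatch condition.
/// Assumption: it behaved like `char::is_alphanumeric`.
fn old_dispatch(ch: char) -> bool {
    ch.is_alphanumeric()
}

/// Hypothetical helper: returns `(old, new)` verdicts for a
/// candidate first character of a nonterminal.
fn compare(ch: char) -> (bool, bool) {
    (old_dispatch(ch), is_name_start(ch))
}
```

With these definitions, `compare('7')` gives `(true, false)` and `compare('_')` gives `(false, true)`, which are the two disagreements the description mentions; `compare('a')` gives `(true, true)`.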

The `NAMES_RE` regex in `mdbook-spec` encodes the same name-matching logic as a regex pattern, so let's add a comment tying it to the predicates.

cc @ehuss

@rustbot rustbot added the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Mar 4, 2026
Contributor

@ehuss ehuss left a comment


@ehuss ehuss added this pull request to the merge queue Mar 4, 2026
Merged via the queue into master with commit caa4205 Mar 4, 2026
6 checks passed
@rustbot rustbot removed the S-waiting-on-review Status: The marked PR is awaiting review from a maintainer label Mar 4, 2026
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Mar 9, 2026
Update books

## rust-embedded/book

4 commits in 99d0341ff4e06757490af8fceee790c4ede50bc0..e88aa4403b4bf2071c8df9509160477e40179099
2026-02-28 20:13:44 UTC to 2026-02-28 20:07:25 UTC

- Clarify that a mini usb cable is used on the STM32F3DISCOVERY (rust-embedded/book#381)
- Update outdated qemu documentation (rust-embedded/book#403)
- Add TRACE32 to Debuggers section (rust-embedded/book#406)
- Add a link to Rust for Zephyr (rust-embedded/book#407)

## rust-lang/nomicon

4 commits in b8f254a991b8b7e8f704527f0d4f343a4697dfa9..cc6a6bae8c3bfa389974e533c54694662c1a9de6
2026-02-27 23:27:18 UTC to 2026-02-26 22:57:03 UTC

- Fix `Vec::push_all` ptr code in exception-safety (rust-lang/nomicon#418)
- Clarify parameter and argument compatibility (rust-lang/nomicon#516)
- Improve grammar in Variance section (rust-lang/nomicon#515)
- Explicit `extern "C"` ABI for FFI (rust-lang/nomicon#520)

## rust-lang/reference

7 commits in 50a1075e879be75aeec436252c84eef0fad489f4..c49e89cc8c7c2c43ca625a8d5b7ad9a53a9ce978
2026-03-04 15:39:00 UTC to 2026-03-01 06:34:18 UTC

- Resolve grammar rules in link reference definitions (rust-lang/reference#2198)
- Support non-ASCII Unicode in grammar rule names (rust-lang/reference#2196)
- Fix grammar for block comments (rust-lang/reference#2191)
- Fix an EN grammar error & add an item to place expr context list (rust-lang/reference#2189)
- Align attribute template with applied conventions (rust-lang/reference#2194)
- Update shebang (rust-lang/reference#2192)
- Remove RESERVED_NUMBER (rust-lang/reference#2193)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 9, 2026
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 9, 2026
rust-timer added a commit to rust-lang/rust that referenced this pull request Mar 10, 2026
Rollup merge of #153619 - rustbot:docs-update, r=ehuss
