Add APIs for dealing with titlecase #122668

Open
Jules-Bertholet wants to merge 10 commits into rust-lang:main from Jules-Bertholet:titlecase

Conversation

@Jules-Bertholet
Contributor

@Jules-Bertholet Jules-Bertholet commented Mar 17, 2024

ACP: rust-lang/libs-team#354
Tracking issue: #153892

r? libs-api

@rustbot label T-libs -T-libs-api A-unicode

The last commit has some insta-stable PartialEq impls, therefore: @rustbot label -needs-fcp
Alternatively, I could split those out into a follow-up PR.
(Edit: will do in follow-up)

@rustbot added labels on Mar 17, 2024: S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties), T-libs (Relevant to the library team, which will review and decide on the PR/issue), A-Unicode (Area: Unicode), S-waiting-on-ACP (Status: PR has an ACP and is waiting for the ACP to complete), T-libs-api (Relevant to the library API team, which will review and decide on the PR/issue).
@bors
Collaborator

bors commented Mar 29, 2024

☔ The latest upstream changes (presumably #122616) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Collaborator

bors commented Apr 20, 2024

☔ The latest upstream changes (presumably #122013) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC removed the S-waiting-on-review label on May 3, 2024.
@rustbot
Collaborator

rustbot commented Nov 24, 2025

library/core/src/unicode/unicode_data.rs is generated by the src/tools/unicode-table-generator tool.

If you want to modify unicode_data.rs, please modify the tool then regenerate the library source file via ./x run src/tools/unicode-table-generator instead of editing unicode_data.rs manually.

@Jules-Bertholet
Contributor Author

Jules-Bertholet commented Nov 24, 2025

The libs team gave a positive response to the ACP. @rustbot label -S-waiting-on-ACP S-waiting-on-review

@rustbot added the S-waiting-on-review and needs-fcp (This change is insta-stable, or significant enough to need a team FCP to proceed) labels and removed the S-waiting-on-ACP label on Nov 24, 2025.
@rustbot
Collaborator

rustbot commented Mar 15, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@rustbot removed the needs-fcp and T-libs-api labels on Mar 15, 2026.
@Jules-Bertholet
Contributor Author

@rustbot reroll

@rustbot assigned Mark-Simulacrum and unassigned m-ou-se on Mar 15, 2026.

- c => c > '\x7f' && unicode::Alphabetic(c),
+ 'A'..='Z' | 'a'..='z' => true,
+ '\0'..='\u{A9}' => false,
+ _ => unicode::Alphabetic(self),
Member

I think this is from "Extend ASCII fast paths of char methods beyond ASCII"...

Can you confirm we continue to have exhaustive coverage in tests? I'm not sure how much the generated tests by the test generator call the public methods vs check that unicode::Alphabetic (for example) is accurate.

Also, I imagine that flipping the order of capital A-Z vs. lowercase a-z might influence codegen, and lowercase seems more likely to be common. Maybe worth doing something different there?

How did you decide on the particular threshold here (and in other modified functions)? Maybe we can split this out to a separate PR?

Contributor Author

@Jules-Bertholet Jules-Bertholet Mar 16, 2026

> Also, I imagine that flipping the order of capital A-Z vs. lowercase a-z might influence codegen, and lowercase seems more likely to be common. Maybe worth doing something different there?

Good point, I fixed it.

> How did you decide on the particular threshold here (and in other modified functions)?

I chose the highest value that would work, using https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp to verify. E.g., for this function, https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AAlphabetic%3A%5D-%5B%3AASCII%3A%5D&abb=on says that the first non-ASCII alphabetic character is U+AA.
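
As a hedged cross-check of that threshold, the sketch below scans the non-ASCII range with the stable `char::is_alphabetic` method and returns the first match. The helper name `first_non_ascii_alphabetic` is hypothetical and not part of the PR, and the answer depends on the Unicode version shipped with the toolchain's standard library.

```rust
/// Hypothetical helper (not from the PR): returns the first non-ASCII
/// scalar value that the stable `char::is_alphabetic` accepts.
fn first_non_ascii_alphabetic() -> Option<char> {
    // `RangeInclusive<char>` iterates over Unicode scalar values and
    // skips the surrogate gap automatically.
    ('\u{80}'..=char::MAX).find(|c| c.is_alphabetic())
}
```

If the result matches the character reported by the UnicodeSet tool, every scalar value below it that is not an ASCII letter can be rejected without a table lookup.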

Contributor Author

> Can you confirm we continue to have exhaustive coverage in tests? I'm not sure how much the generated tests by the test generator call the public methods vs check that unicode::Alphabetic (for example) is accurate.

They didn't call the public methods, no. But they should now.
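
A minimal sketch of what an exhaustive check of the fast-path arms could look like, under stated assumptions: `is_alphabetic_fast` is a hypothetical stand-alone copy of the match under review (with the lowercase range listed first, as suggested in the review), and the stable `char::is_alphabetic` stands in for the internal `unicode::Alphabetic` lookup. It therefore exercises only the hand-written arms, not the generated table, and it is not the PR's actual test code.

```rust
/// Hypothetical copy of the fast path discussed above. The fallback arm
/// uses the stable `char::is_alphabetic` as a stand-in for the internal
/// `unicode::Alphabetic` table lookup.
fn is_alphabetic_fast(c: char) -> bool {
    match c {
        'a'..='z' | 'A'..='Z' => true,
        // Everything else up to U+A9 is assumed to be non-alphabetic.
        '\0'..='\u{A9}' => false,
        _ => c.is_alphabetic(),
    }
}

/// Returns the first scalar value on which the fast path and the
/// reference method disagree, or `None` if they agree everywhere.
fn first_mismatch() -> Option<char> {
    ('\0'..=char::MAX).find(|&c| is_alphabetic_fast(c) != c.is_alphabetic())
}
```

Walking all scalar values is cheap enough to run as an ordinary unit test, which is what makes this kind of threshold straightforward to guard against future Unicode updates.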

/// as returned by [`char::case`].
#[unstable(feature = "titlecase", issue = "none")]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum CharCase {
Member

nit: also #[must_use], probably.

Contributor Author

Why? I don't think we usually label types like this #[must_use]?

/// Titlecase. Corresponds to the `Titlecase_Letter` Unicode general category.
Title = 0b10,
/// Uppercase. Corresponds to the `Uppercase` Unicode property.
Upper = 0b11,
Member

Can you add a comment about why these particular discriminants are chosen?

Contributor Author

I added a doc comment explaining this.
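
For readers unfamiliar with what explicit discriminants affect, here is a small sketch. It is not the PR's code: `CaseSketch` and `discriminant_bits` are hypothetical names, only the two variants quoted in the excerpt are reproduced, and the PR's own rationale (in its doc comment) is not shown here. The sketch illustrates two general Rust facts: a fieldless enum casts to its discriminant with `as`, and derived `PartialOrd`/`Ord` compare variants by discriminant.

```rust
/// Sketch only: the two variants visible in the excerpt, with the same
/// discriminant values. Any other variants of the real type are omitted.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
enum CaseSketch {
    Title = 0b10,
    Upper = 0b11,
}

/// Exposes the discriminant chosen in the enum definition.
fn discriminant_bits(case: CaseSketch) -> u8 {
    // Casting a fieldless enum with `as` yields its discriminant.
    case as u8
}
```

With these values, the derived ordering places `Title` before `Upper`.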

@rustbot added the S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author) label and removed the S-waiting-on-review label on Mar 16, 2026.
@rustbot
Collaborator

rustbot commented Mar 16, 2026

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@Jules-Bertholet
Contributor Author

@rustbot ready

@rustbot added the S-waiting-on-review label and removed the S-waiting-on-author label on Mar 17, 2026.
Labels

A-Unicode (Area: Unicode), S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties), T-libs (Relevant to the library team, which will review and decide on the PR/issue)
