Add <title sort>, <title short sort>, <first series sort> template tags (#1620) by CryptoJones · Pull Request #1810 · rmcrackan/Libation

CryptoJones · 2026-05-14T03:34:10Z

Summary

Adds three new template tags that strip a leading English article (A / An / The, case-insensitive) from the resolved value:

| Tag | Example input | Output |
|---|---|---|
| `<title sort>` | The Hobbit: There and Back Again | Hobbit: There and Back Again |
| `<title short sort>` | The Hobbit: There and Back Again | Hobbit |
| `<first series sort>` | The Lord of the Rings | Lord of the Rings |

Words that merely start with "The", "A", or "An" but aren't whole-word articles are untouched (e.g. "Theatre of War" → "Theatre of War").
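To make the whole-word rule concrete, here is a minimal sketch of English-only article stripping. It is not the PR's `StripLeadingArticle()` (that helper's body isn't shown in this conversation). The class name `ArticleSketch` and the regex are assumptions chosen for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch; not the PR's actual StripLeadingArticle() implementation.
public static class ArticleSketch
{
    // "the", "an" or "a" at the very start, followed by at least one whitespace
    // character. Requiring whitespace is what keeps "Theatre" or "Anthem" intact.
    private static readonly Regex LeadingArticle =
        new(@"^(?:the|an|a)\s+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string StripLeadingArticle(string value) =>
        LeadingArticle.Replace(value.TrimStart(), "");
}
```

For example, `ArticleSketch.StripLeadingArticle("The Hobbit: There and Back Again")` gives `Hobbit: There and Back Again`, while `Theatre of War` passes through unchanged.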

Usage examples

Folder template whose folder names sort by the short title without its leading article:

<title short sort> [<id>]

Mixed template that puts the article-stripped first series name in the directory (for books in a series) and the article-stripped title in the filename:

<if series-><first series sort>\<-if series><title sort> [<id>]

Changes

TemplateTags.cs — three new TemplateTags static properties
Templates.cs — StripLeadingArticle() private helper; registered in filePropertyTags (used by Folder, File, and ChapterFile templates) and the chapterPropertyTags inner collection
TemplatesTests.cs — new SortTags test class: 14 cases covering all three tags, no-article pass-through, case-insensitivity, and availability in the chapter template

Closes #1620

🤖 Generated with Claude Code

…gs (rmcrackan#1620)

Adds three new file/folder naming template tags that strip a leading article (A/An/The, case-insensitive) from the resolved value:

<title sort>: full title, article removed
<title short sort>: title up to the first colon, article removed
<first series sort>: first series name, article removed

Useful for organizing libraries so "The Hobbit" files into "H/" instead of "T/". Article stripping is additive-only; existing templates are unchanged. Covered by unit tests in TemplatesTests.SortTags.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Jo-Be-Co · 2026-05-14T13:14:16Z

I’ve considered similar trimming before.

Here are a few general comments:

This trimming should be available for every language so that the new tags also work for non-English books.
Would we always remove the leading articles of all languages, or should we use the language of the respective book?
Currently, only a few selected tags get the new logic. While it doesn’t make sense for every tag, there are at least the other title tags and the series list tag.
How about a formatting option for text output? In addition to uppercase, lowercase, title case, and length restrictions, we could add an extra S flag to a format such as <title[10l]>.
Since you’ve already named the new tags with “sort”, could that also become a sorting option (at least for series)?

A completely different approach would be to give the user the ability to replace text, for example with a replace tag wrapped around another tag. The regular expression ^(A|An|The) could, for example, cover the current list.
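One detail worth noting for a replace tag: `^(A|An|The)` on its own also matches the start of words such as "Theatre". The sketch below assumes a hypothetical replace tag that applies a user-supplied .NET regex to the resolved value of the wrapped tag; `ReplaceTagSketch` and `Apply` are illustrative names, not existing Libation APIs.

```csharp
using System.Text.RegularExpressions;

// Hypothetical model of a replace tag wrapped around another tag's output.
public static class ReplaceTagSketch
{
    // Applies the user's pattern case-insensitively to the already-resolved tag value.
    public static string Apply(string resolvedValue, string pattern, string replacement) =>
        Regex.Replace(resolvedValue, pattern, replacement, RegexOptions.IgnoreCase);
}
```

With the bare pattern, `Apply("Theatre of War", "^(A|An|The)", "")` yields `atre of War`. Adding a word boundary, as in `^(A|An|The)\b\s*`, leaves "Theatre of War" untouched and still turns "An Echo in the Bone" into "Echo in the Bone".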

rmcrackan · 2026-05-14T13:50:12Z

This PR does not compile. Please test AI-generated code locally before submitting.

The new SortTags class lives in namespace Templates_ChapterFile_Tests but referenced Shared.GetLibraryBook(). The Shared class is in namespace TemplatesTests; the file's top-level `using static TemplatesTests.Shared;` brings the methods in unqualified, so drop the `Shared.` prefix to match the surrounding test conventions. Verified locally: SortTags tests pass (14/14), full project builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CryptoJones · 2026-05-15T06:21:22Z

Sincere apologies — you're absolutely right, and I'm sorry for the wasted CI runs. The PR went up without a local build (the new test class moved to a different namespace and lost access to the unqualified Shared symbol). Just pushed 002ebbb6 with the fix and verified dotnet test locally first this time — 14/14 new test cases pass.

Also: obligatory xkcd 2347 — you're the Nebraska person here. Thank you for the time you sink into reviewing AI-generated noise like this.

CryptoJones · 2026-05-15T06:21:28Z

Thanks for the careful review, @Jo-Be-Co. Genuinely good points:

Multi-language articles: Agreed, hard-coding English A/An/The is a real limitation. Cleanest fix is probably keying off the book's Language field. The replace-tag idea you raise below would subsume this entirely.
Tag selectivity: Fair — narrow scoping was deliberate but extending to author / series list is straightforward; happy to do that here or as a follow-up.
Format-option vs. dedicated tag: I prefer your <title[…s]>-style modifier over a separate <title sort> tag — it composes better. Open to reworking the PR that way if maintainers prefer.
Series sort as a sort order: good catch, worth a separate issue.
Replace-tag with regex: that's the most flexible design and probably the right long-term answer. I think it deserves its own issue rather than expanding this PR.

Will hold off on rework until there's direction from maintainers.

rmcrackan · 2026-05-15T14:04:54Z

I'm really torn on this. I like this idea. It very much feels like the kind of thing I might have included myself in Libation's early days when it was just me and my English-speaking self. And let's be honest -- everything about Libation is anglocentric. BUT the books it liberates are not -- and this feature is about those books and their potentially non-English metadata. I'll think about this.

I'm not crazy about the proposed syntax, but you and @Jo-Be-Co are working through that; I'll chime in after you two come to a consensus.

rmcrackan · 2026-05-15T14:28:29Z

I couldn't find an off-the-shelf solution and it looks like Humanizer removed this in v3 (boo!). But a good-enough version seems straightforward (I know: famous last words).

private static Dictionary<string, string[]> LeadingArticles { get; } = new(StringComparer.OrdinalIgnoreCase)
{
    ["en"] = new[] { "the", "a", "an" },
    ["fr"] = new[] { "le", "la", "les", "l'", "un", "une", "des" },
    ["es"] = new[] { "el", "la", "los", "las", "un", "una", "unos", "unas" },
    ["it"] = new[] { "il", "lo", "la", "i", "gli", "le", "l'", "un", "uno", "una" },
    ["de"] = new[] { "der", "die", "das", "ein", "eine", "einen", "einem", "einer" },
    ["pt"] = new[] { "o", "a", "os", "as", "um", "uma", "uns", "umas" },
    ["nl"] = new[] { "de", "het", "een" },
    ["sv"] = new[] { "en", "ett" },
};

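// Strips one leading article. Articles ending in an apostrophe (fr/it "l'") attach
// directly to the next word; all others must be followed by a space.
public static string ToSorted(string title, string? languageHint = null)
{
    var trimmed = title.TrimStart();
    // DetectLanguage: language-detection helper, not shown in this comment. Falls back to English.
    var lang = languageHint ?? DetectLanguage(trimmed) ?? "en";
    if (LeadingArticles.TryGetValue(lang, out var articles))
    {
        foreach (var art in articles)
        {
            var prefix = art.EndsWith('\'') ? art : art + " ";
            if (trimmed.StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                return trimmed[prefix.Length..].TrimStart();
        }
    }
    return trimmed;
}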

While I was playing with this, I also found some notes on string-sorting algorithms that ignore diacritics, which relates to the other PR discussion. Normalization Form D (`NormalizationForm.FormD`), used in the code below, is "canonical decomposition": it decomposes a single accented character into a plain Latin letter plus a combining character. The result prints the same but is now 2 characters, which lets a later step strip the combining mark. (Normalize method, NormalizationForm enum)

example:

before:
U+00E9 (LATIN SMALL LETTER E WITH ACUTE)
after:
U+0065 (plain e)
U+0301 (COMBINING ACUTE ACCENT)
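The before/after example above can be checked in a few lines using only the .NET base library (`string.Normalize` and `NormalizationForm.FormD`); the class name `NfdDemo` is illustrative.

```csharp
using System.Linq;
using System.Text;

public static class NfdDemo
{
    // Returns the UTF-16 code units of s after canonical decomposition (NFD),
    // formatted as "U+XXXX". For BMP characters like these, code unit == code point.
    public static string[] DecomposedCodeUnits(string s) =>
        s.Normalize(NormalizationForm.FormD)
         .Select(ch => $"U+{(int)ch:X4}")
         .ToArray();
}
```

`NfdDemo.DecomposedCodeUnits("\u00E9")` returns `U+0065` and `U+0301`, matching the example.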

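// FormD splits e.g. "é" into "e" + U+0301 COMBINING ACUTE ACCENT.
var normalized = trimmed.Normalize(NormalizationForm.FormD);
var sb = new StringBuilder(normalized.Length);
// Keep every char except combining marks (Unicode category Mn), i.e. drop the accents.
foreach (var ch in normalized)
    if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
        sb.Append(ch);
return sb.ToString().ToLowerInvariant();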

Jo-Be-Co · 2026-05-16T10:08:09Z

Two good approaches that point in the right direction.

I'm not sure whether I like the special treatment of `'`, or whether I would rather add the trailing spaces to the other terms. In the end, it will come down to a developer-maintained solution anyway.

I would probably have stored one regular expression per language here again. 🤓
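A one-regex-per-language table could look roughly like the sketch below. The article lists are taken from rmcrackan's `LeadingArticles` dictionary above (abridged to en, fr, de, nl); `SortKeySketch` and `StripArticle` are hypothetical names, and falling back to English for an unknown or missing language is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SortKeySketch
{
    // One pattern per language. The apostrophe rule lives in the pattern itself:
    // "l'" attaches directly to the next word, other articles need trailing whitespace.
    private static readonly Dictionary<string, Regex> LeadingArticles =
        new(StringComparer.OrdinalIgnoreCase)
        {
            ["en"] = Build(@"(?:the|an|a)\s+"),
            ["fr"] = Build(@"(?:l'|(?:les|le|la|une|un|des)\s+)"),
            ["de"] = Build(@"(?:der|die|das|einem|einen|einer|eine|ein)\s+"),
            ["nl"] = Build(@"(?:het|een|de)\s+"),
        };

    private static Regex Build(string alternation) =>
        new("^" + alternation, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public static string StripArticle(string title, string? language)
    {
        var trimmed = title.TrimStart();
        var regex = language is not null && LeadingArticles.TryGetValue(language, out var r)
            ? r
            : LeadingArticles["en"]; // assumption: unknown language falls back to English
        return regex.Replace(trimmed, "");
    }
}
```

Keeping the apostrophe case inside the pattern removes the special-case branch in code, at the cost of patterns that are a little harder to read than plain word lists.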

Unicode accent removal is exactly the way to handle a large share of cases without the need for an explicit mapping. But we shouldn't delve into that here.

Jo-Be-Co · 2026-05-16T10:38:08Z

So what solutions do I see here:

The existing implementation should be extended to other languages. In most cases, the language of the book should be the right choice.
If new tags are used, they should be feature complete right away. A <series sort> would then be a bit more complex.
The series tags already have a format option {N} for the output of the value. Another letter could be added there that performs the trimming. It could then also be used directly for sorting: <series[format({X}) sort(X)]>. Unfortunately, the simple text fields don't have such a format option yet, but it would be worth considering (analogous to <tag[{S:3}]>): <title[5U]> <title[{S:5U}]> <title[{X:5U}]> ...
Alternatively, the extension would only need to be installed in one place, by hooking into the text formatting. It would then work everywhere, even though it won't be needed everywhere: <title[X]>, <series[format({N:X})]>
A <replace> tag, or a replace option for the text outputs, would certainly be very powerful and universally usable. But it feels like driving a truck to go grocery shopping.

Personally, I would prefer one of the variants with X, whereby the X should of course be a wisely chosen letter here.

Jo-Be-Co · 2026-05-16T11:09:11Z

> var normalized = trimmed.Normalize(NormalizationForm.FormD);
> var sb = new StringBuilder(normalized.Length);
> foreach (var ch in normalized)
>     if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
>         sb.Append(ch);
> return sb.ToString().ToLowerInvariant();

My compact variant:

var cleaned = Regex.Replace(input.Normalize(NormalizationForm.FormD), @"\p{Mn}+", "");

I think these replacements are likely to be rare. In that case, character-by-character processing with a StringBuilder would mostly just produce a copy of the string. A regex without a match might be more efficient here.

CryptoJones mentioned this pull request May 15, 2026

Add German umlaut/eszett to default character replacement presets (#1667) #1809 (closed)

CryptoJones wants to merge 2 commits into rmcrackan:master from CryptoJones:feat/1620-sort-tags
