Add <title sort>, <title short sort>, <first series sort> template tags (#1620)#1810
Add <title sort>, <title short sort>, <first series sort> template tags (#1620)#1810CryptoJones wants to merge 2 commits into
Conversation
…gs (rmcrackan#1620) Adds three new file/folder naming template tags that strip a leading article (A/An/The, case-insensitive) from the resolved value: <title sort> — full title, article removed <title short sort> — title up to first colon, article removed <first series sort>— first series name, article removed Useful for organizing libraries so "The Hobbit" files into "H/" instead of "T/". Article stripping is additive-only; existing templates are unchanged. Covered by unit tests in TemplatesTests.SortTags. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
I’ve considered similar cuts before. Here are a few general comments:
A completely different approach would be to give the user the possibility of text replacement. For example, with a replace-tag that is set around another tag. The regular expression |
|
This PR does not compile. Please test AI-generated code locally before submitting. |
The new SortTags class lives in namespace Templates_ChapterFile_Tests but referenced Shared.GetLibraryBook(). The Shared class is in namespace TemplatesTests; the file's top-level `using static TemplatesTests.Shared;` brings the methods in unqualified, so drop the `Shared.` prefix to match the surrounding test conventions. Verified locally: SortTags tests pass (14/14), full project builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Sincere apologies — you're absolutely right, and I'm sorry for the wasted CI runs. The PR went up without a local build (the new test class moved to a different namespace and lost access to the unqualified Also: obligatory xkcd 2347 — you're the Nebraska person here. Thank you for the time you sink into reviewing AI-generated noise like this. |
|
Thanks for the careful review, @Jo-Be-Co. Genuinely good points:
Will hold off on rework until there's direction from maintainers. |
|
I'm really torn on this. I like this idea. It very much feels like the kind of thing I might have included myself in Libation's early days when it was just me and my English-speaking self. And let's be honest -- everything about Libation is anglocentric. BUT the books it liberates are not -- and this feature is about those books and their potentially non-English metadata. I'll think about this. I'm not crazy about the proposed syntax but you and @Jo-Be-Co are working through that; I'll chime in after you 2 come to a consensus. |
|
I couldn't find an off-the-shelf solution and it looks like Humanizer removed this in v3 (boo!). But a good-enough version seems straight-forward (I know: famous last words) private static Dictionary<string, string[]> LeadingArticles { get; } = new(StringComparer.OrdinalIgnoreCase)
{
["en"] = new[] { "the", "a", "an" },
["fr"] = new[] { "le", "la", "les", "l'", "un", "une", "des" },
["es"] = new[] { "el", "la", "los", "las", "un", "una", "unos", "unas" },
["it"] = new[] { "il", "lo", "la", "i", "gli", "le", "l'", "un", "uno", "una" },
["de"] = new[] { "der", "die", "das", "ein", "eine", "einen", "einem", "einer" },
["pt"] = new[] { "o", "a", "os", "as", "um", "uma", "uns", "umas" },
["nl"] = new[] { "de", "het", "een" },
["sv"] = new[] { "en", "ett" },
};
public static string ToSorted(string title, string? languageHint = null)
{
var trimmed = title.TrimStart();
var lang = languageHint ?? DetectLanguage(trimmed) ?? "en";
if (LeadingArticles.TryGetValue(lang, out var articles))
{
foreach (var art in articles)
{
var prefix = art.EndsWith("'") ? art : art + " ";
if (trimmed.StartsWithInsensitive(prefix))
return trimmed[prefix.Length..].TrimStart();
}
}
return trimmed;
}While I was playing with this, I also found some string sorting algo notes about sorting without diacritics, which relates to the other PR discussion. The Normalization Form D on line 1 is "canonical decomposition". It decomposes single-character with accent into latin letter plus a combining character. Prints the same but is now 2 characters. (Which allows us to strip the non-latin character in a later step.) (Normalize method, NormalizationForm enum) example:
var normalized = trimmed.Normalize(NormalizationForm.FormD);
var sb = new StringBuilder(normalized.Length);
foreach (var ch in normalized)
if (CharUnicodeInfo.GetUnicodeCategory(ch) != UnicodeCategory.NonSpacingMark)
sb.Append(ch);
return sb.ToString().ToLowerInvariant(); |
|
Two good approaches that point in the right direction. Whereby I would not be sure whether I like the special treatment of `'`` or if I would add the spaces to the other terms. In the end, it will come down to a developer maintained solution anyway. I would probably have stored one regular expression per language here again. 🤓 The topic of Unicode accent removal is exactly how to customise a large part without the need for explicit mapping. But we shouldn't delve into this here. |
|
So what solutions do I see here:
Personally, I would prefer one of the variants with |
My compact variant: var cleaned = Regex.Replace(input.Normalize(NormalizationForm.FormD), @"\p{Mn}+", "");I think that it is quite likely that these replacements are rather rare. Then the character-by-character processing by a StringBuilder would only generate a string copy. A regexp without a match might be more efficent here. |
Summary
Adds three new template tags that strip a leading English article (A / An / The, case-insensitive) from the resolved value:
<title sort><title short sort><first series sort>Words that merely start with "The", "A", or "An" but aren't whole-word articles are untouched (e.g. "Theatre of War" → "Theatre of War").
Usage examples
Folder template that files under the sort letter:
Mixed template that puts the full title in the filename but sorts by the stripped form in the directory:
Changes
TemplateTags.cs— three newTemplateTagsstatic propertiesTemplates.cs—StripLeadingArticle()private helper; registered infilePropertyTags(used by Folder, File, and ChapterFile templates) and thechapterPropertyTagsinner collectionTemplatesTests.cs— newSortTagstest class: 14 cases covering all three tags, no-article pass-through, case-insensitivity, and availability in the chapter templateCloses #1620
🤖 Generated with Claude Code