
fix(chunking): keep Dr./Mr./a.m./p.m. inline instead of splitting sen…#115

Open
biraj21 wants to merge 1 commit intoKittenML:mainfrom
biraj21:biraj/fix-am-pm-chunking


@biraj21 biraj21 commented Mar 21, 2026

…tences

Before:

  • chunk_text() split on every . / ! / ? with no abbreviation handling.
  • Dr. Sharma... was treated like Dr + new sentence.
  • 9:30 a.m. and 5:15 p.m. were split at the periods inside the abbreviation.
  • That made TTS pause or slow down after Dr, a, and m, as if they were full sentence boundaries.
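
A minimal sketch of the old behavior, for illustration only. `naive_split` is a hypothetical stand-in and not the actual `chunk_text()` from this repo; it assumes the splitter simply breaks on every `.`, `!`, or `?`, as described above.

```python
import re


def naive_split(text: str) -> list[str]:
    """Hypothetical stand-in: split on every '.', '!' or '?' with no abbreviation handling."""
    parts = re.split(r"[.!?]+", text)
    # Drop empty pieces and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


pieces = naive_split("Dr. Sharma arrived at 9:30 a.m. sharp.")
print(pieces)
```

Under that assumption, `Dr`, `a`, and `m` each end a piece, so each one would be followed by a sentence-break pause in TTS.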

After:

  • Protect common inline abbreviations before sentence splitting and restore them afterward.
  • Dr., Mr., Mrs., Ms., Prof., a.m., and p.m. now stay inside the same sentence/chunk.
  • TTS no longer inserts sentence-break pauses around those abbreviations.
  • Real sentence-ending punctuation still splits normally.
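
The protect-and-restore idea can be sketched roughly as below. This is not the code in this PR: `ABBREVIATIONS`, `PLACEHOLDER`, `protect`, and `split_sentences` are hypothetical names, and the placeholder character is assumed never to appear in real input.

```python
import re

# Abbreviations listed in this PR description.
ABBREVIATIONS = ("Dr.", "Mr.", "Mrs.", "Ms.", "Prof.", "a.m.", "p.m.")

# Assumption: this character never occurs in the text being chunked.
PLACEHOLDER = "\x00"

# Match an abbreviation only when it does not continue a longer word.
_ABBR_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")"
)


def protect(text: str) -> str:
    """Swap the periods inside known abbreviations for a placeholder."""
    return _ABBR_RE.sub(lambda m: m.group(0).replace(".", PLACEHOLDER), text)


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' followed by whitespace, then restore periods."""
    parts = re.split(r"(?<=[.!?])\s+", protect(text))
    return [p.replace(PLACEHOLDER, ".").strip() for p in parts if p.strip()]


print(split_sentences("Mr. Lee left at 5:15 p.m. today. Did Prof. Rao stay? Yes!"))
```

One limitation of this sketch: a sentence that ends in `a.m.` or `p.m.` is not split from the sentence that follows it, because the final period is protected too. Whether that trade-off is acceptable depends on how the chunks are consumed downstream.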

To see the difference, run inference with the sentence "Dr. Sharma arrived at 9:30 a.m., slightly out of breath, but the meeting had already started without him."

