TeensyTalk V2

Text-to-speech for the Teensy 4.1 + Audio Shield. Converts arbitrary English text to speech by concatenating MBROLA phoneme WAV files, using a two-tier pronunciation lookup system and eSpeak-ng-derived letter-to-sound rules.

How It Works

1. Text → Words

say("some text") splits the input on whitespace and punctuation. Commas insert a short pause (80 ms), periods/question marks/exclamation marks insert a longer pause (200 ms), and spaces insert a 50 ms inter-word gap.

Each word is normalized: lowercased, leading/trailing punctuation stripped, contractions truncated at the apostrophe.

2. Word → Phonemes (two-tier lookup)

Tier 1 — Flash dictionary (tts_dict.h) A small hand-curated table compiled into firmware. Checked first. Use this to override or correct any word that the SD dictionary or rules get wrong.

Tier 2 — SD card dictionary (DICT.BIN on SD card) ~125,000 words from the CMU Pronouncing Dictionary, converted from ARPABET to MBROLA us2 phoneme names. Binary-searched in ~17 seeks (~34 ms worst case). Falls back to rules if the word is not found.

Tier 3 — Letter-to-sound rules (tts_rules.h) eSpeak-ng-derived context-sensitive rules applied letter by letter when neither dictionary has the word.

3. Phonemes → Audio

All phoneme WAV files are 16 kHz, 16-bit mono, stored in the SD card root directory. For each word:

Each phoneme's WAV is read from SD and appended into a RAM buffer.
Adjacent phonemes are joined with a 128-sample bidirectional crossfade to eliminate click artifacts at boundaries.
Short word-final phonemes (< 55 ms) are extended by looping their aspiration tail with a linear fade-out.
The entire word is played as one continuous PCM stream through the Teensy Audio library — no gaps between phonemes.

The I²S sample rate is set to 16000 Hz in setup() to match the phoneme WAV files.

SD Card Contents

Copy all of the following to the root directory of the SD card (no subdirectory):

Phoneme WAV files (~45 files)

All files from TeensyTalkV2/wavs/ — listed below with their MBROLA phoneme mapping:

WAV filename	MBROLA phoneme	Example
`A.wav`	`A`	father
`AE.wav`	`{`	cat
`AI.wav`	`AI`	bite
`SCH.wav`	`@`	about (schwa)
`OW.wav`	`@U`	boat
`E.wav`	`E`	bet
`EI.wav`	`EI`	bait
`ER.wav`	`r=`	bird
`IH.wav`	`I`	bit
`i.wav`	`i`	beat
`O.wav`	`O`	thought
`OI.wav`	`OI`	boy
`OR.wav`	`OR`	for
`UH.wav`	`V`	but
`UU.wav`	`U`	book
`aU.wav`	`aU`	how
`b.wav`	`b`	bat
`d.wav`	`d`	dog
`dZ.wav`	`dZ`	jump
`DH.wav`	`D`	the (voiced)
`f.wav`	`f`	fat
`FL.wav`	`4`	butter (flap T)
`g.wav`	`g`	go
`h.wav`	`h`	hat
`j.wav`	`j`	yes
`k.wav`	`k`	(mapped to KH — see below)
`KH.wav`	`k`, `k_h`	kite (aspirated, used for all K)
`l.wav`	`l`	let
`LS.wav`	`l=`	bottle (syllabic L)
`m.wav`	`m`	mat
`n.wav`	`n`	net
`NG.wav`	`N`	sing
`p.wav`	`p`	pat
`PH.wav`	`p_h`	aspirated P
`r.wav`	`r`	rat
`s.wav`	`s`	sat
`SH.wav`	`S`	she
`t.wav`	`t`	tap
`TH.wav`	`T`	think (voiceless)
`TH2.wav`	`t_h`	aspirated T
`tS.wav`	`tS`	chip
`u.wav`	`u`	food
`v.wav`	`v`	vat
`w.wav`	`w`	wet
`z.wav`	`z`	zoo
`ZH.wav`	`Z`	vision

Dictionary file

File	Description
`DICT.BIN`	CMU Pronouncing Dictionary, ~125,000 words, binary format

Download DICT.BIN from the latest release and copy it to the SD card root. It is not stored in the repo due to its size (~15 MB).

Important: All files go directly in the SD card root — no subdirectories.

Building DICT.BIN (optional)

DICT.BIN is available as a pre-built download on the releases page. If you want to rebuild it yourself (e.g. after modifying the ARPABET conversion), use build_dict.py in the repo root:

python3 build_dict.py          # writes DICT.BIN in current directory
python3 build_dict.py out.bin  # custom output path

Copy the resulting DICT.BIN to the SD card root. The file is ~15 MB (~125,000 entries × 128 bytes each).

On boot, the firmware prints: SD dict loaded: 125247 entries

How the Phoneme WAV Files Were Made

The WAV files are individual phoneme recordings extracted from the MBROLA us2 voice — a diphone synthesis voice for American English developed as part of the MBROLA project. Each file contains a single phoneme sound at 16 kHz, 16-bit mono.

The MBROLA project: https://github.com/numediart/MBROLA
The us2 voice database is available from the MBROLA voices repository.

To regenerate or add phonemes, install MBROLA and the us2 voice database, then synthesize individual phoneme sequences using a .pho input file (phoneme name + duration in ms + pitch points).

Fixing a Mispronounced Word

Step 1 — Identify the phonemes

Run the sketch and type the word into the Serial Monitor. The debug output shows what phonemes were used and whether they came from the dictionary or rules:

rocket: r A k I t (dict)
travel: t r { v @ l (rules)

Determine the correct MBROLA us2 phoneme sequence for the word. Reference the WAV filename table above for phoneme names — note that some phonemes use special characters ({, @, r=, etc.) that map to renamed WAV files.

Step 2 — Add to the flash dictionary

Open TeensyTalkV2/tts_dict.h and add an entry in alphabetical order:

{ "travel",  "t r { v @ l"    },
{ "rocket",  "r A k I t"      },   // ← new entry, in alphabetical position

The list must stay in strict alphabetical order — it is searched with strcmp comparisons. Adding an entry out of order will cause incorrect lookups for nearby words.

Step 3 — Rebuild and flash

Recompile and upload. The flash dictionary takes priority over the SD card dictionary, so your correction will always win.

MBROLA Phoneme Reference (us2)

MBROLA	WAV file	Sound
`A`	A.wav	"father" vowel
`{`	AE.wav	"cat" vowel
`@`	SCH.wav	schwa — "about", "-er" unstressed
`@U`	OW.wav	"boat" diphthong
`AI`	AI.wav	"bite" diphthong
`aU`	aU.wav	"how" diphthong
`E`	E.wav	"bet" vowel
`EI`	EI.wav	"bait" diphthong
`I`	IH.wav	"bit" short vowel
`i`	i.wav	"beat" long vowel
`O`	O.wav	"thought" vowel
`OI`	OI.wav	"boy" diphthong
`r=`	ER.wav	"bird" r-colored vowel
`U`	UU.wav	"book" vowel
`u`	u.wav	"food" vowel
`V`	UH.wav	"but" strut vowel
`b`	b.wav	voiced bilabial stop
`d`	d.wav	voiced alveolar stop
`D`	DH.wav	voiced "th" (the, this)
`dZ`	dZ.wav	"jump" affricate
`f`	f.wav	voiceless labiodental
`4`	FL.wav	flapped T (butter, later)
`g`	g.wav	voiced velar stop
`h`	h.wav	glottal fricative
`j`	j.wav	palatal approximant (yes)
`k`	KH.wav	voiceless velar stop (aspirated)
`l`	l.wav	lateral approximant
`l=`	LS.wav	syllabic L (bottle)
`m`	m.wav	bilabial nasal
`n`	n.wav	alveolar nasal
`N`	NG.wav	velar nasal (sing)
`p`	p.wav	voiceless bilabial stop
`r`	r.wav	approximant
`s`	s.wav	voiceless alveolar fricative
`S`	SH.wav	"she" fricative
`t`	t.wav	voiceless alveolar stop
`T`	TH.wav	voiceless "th" (think)
`tS`	tS.wav	"chip" affricate
`u`	u.wav	high back rounded vowel
`v`	v.wav	voiced labiodental
`w`	w.wav	labio-velar approximant
`z`	z.wav	voiced alveolar fricative
`Z`	ZH.wav	"vision" fricative

Hardware

Teensy 4.1
Teensy Audio Shield (SGTL5000)
Micro SD card formatted FAT32

Audio output is on the headphone jack of the Audio Shield.

Source Files

File	Purpose
`TeensyTalkV2.ino`	Setup, audio graph, serial input loop
`tts_buffer.h`	PCM RAM buffer, WAV loading, crossfade, normalization
`tts_phonemes.h`	MBROLA→WAV filename mapping, `loadPhoneme()`
`tts_dict.h`	Flash dictionary (hand-curated, highest priority)
`tts_dict_sd.h`	SD card binary dictionary, `sdDictInit()`, `sdDictLookup()`
`tts_rules.h`	eSpeak-ng-derived letter-to-sound rules
`tts_say.h`	`say()`, `sayNumber()`, word splitting and normalization
`build_dict.py`	Builds `DICT.BIN` from CMU Pronouncing Dictionary

Name		Name	Last commit message	Last commit date
Latest commit History 61 Commits
TeensyTalkV2		TeensyTalkV2
tools		tools
wavs		wavs
.gitignore		.gitignore
MAC_SETUP.md		MAC_SETUP.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TeensyTalk V2

How It Works

1. Text → Words

2. Word → Phonemes (two-tier lookup)

3. Phonemes → Audio

SD Card Contents

Phoneme WAV files (~45 files)

Dictionary file

Building DICT.BIN (optional)

How the Phoneme WAV Files Were Made

Fixing a Mispronounced Word

Step 1 — Identify the phonemes

Step 2 — Add to the flash dictionary

Step 3 — Rebuild and flash

MBROLA Phoneme Reference (us2)

Hardware

Source Files

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TeensyTalk V2

How It Works

1. Text → Words

2. Word → Phonemes (two-tier lookup)

3. Phonemes → Audio

SD Card Contents

Phoneme WAV files (~45 files)

Dictionary file

Building DICT.BIN (optional)

How the Phoneme WAV Files Were Made

Fixing a Mispronounced Word

Step 1 — Identify the phonemes

Step 2 — Add to the flash dictionary

Step 3 — Rebuild and flash

MBROLA Phoneme Reference (us2)

Hardware

Source Files

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages