feat(videos): cycle highlight uses ElevenLabs alignment timestamps by jjackson · Pull Request #456 · jjackson/ace-web

jjackson · 2026-05-19T19:50:13Z

Summary

@jjackson: "we want the highlight to specifically transition when we say, learn, deliver, verify, pay onto each highlighted area when we say the word"

The narration-synced highlight from #451/#455 picked a reasonable proportional position (word_index / total_words), but that's an estimate: it transitions roughly when the word is said, not precisely. This PR wires up actual per-character TTS timestamps.
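For comparison, the proportional estimate from #451/#455 can be sketched roughly as follows. This is a hypothetical reconstruction from the one-line description above. The function name, the whitespace tokenization, and the scaling of word_index / total_words by the clip duration are all assumptions.

```typescript
// Hypothetical sketch of the old proportional estimate (not the project's code).
// Assumes the narration text and the clip's total duration are known.
function proportionalStartSeconds(
  narration: string,
  prefix: string,
  durationSeconds: number,
): number | undefined {
  // Split on whitespace, then strip punctuation so "Learn," matches "learn".
  const words = narration
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 0)
    .map((w) => w.replace(/[^a-z0-9']/g, ""));
  const wordIndex = words.findIndex((w) => w.startsWith(prefix));
  if (wordIndex < 0) return undefined;
  // word_index / total_words, scaled to seconds. This is only an estimate:
  // real speech rate varies from word to word.
  return (wordIndex / words.length) * durationSeconds;
}
```

Because every word is assumed to take the same share of the clip, a long word like "deliver" or a pause after a comma shifts the real onset away from the estimate, which is the imprecision this PR removes.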

Pipeline

voiceover.ts::synthesize: switched to ElevenLabs' /v1/text-to-speech/{voice_id}/with-timestamps endpoint. Same audio quality and voice settings; the response is JSON with audio_base64 plus alignment.{characters, character_start_times_seconds, character_end_times_seconds}. The alignment is persisted in the sidecar JSON next to the mp3.
scripts/render.ts: reads the cycle beat's sidecar, calls wordStartSeconds(alignment, "learn"|"deliver"|"verif"|"paid"|"pay"), and threads the 4 resulting step start times through Remotion props as cycleStepStartSeconds.
Root.tsx + Intro.tsx: Cycle checks frame / fps against the 4 timestamps and switches activeIndex on each crossing. It falls back to the word-index proportional path when alignment isn't available (Studio preview, or cached audio from before this PR).
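A sketch of the request and response handling that the synthesize step describes. The endpoint path and the audio_base64 / alignment field names come from this PR; the xi-api-key header, the model_id / voice_settings body fields, and both helper names are assumptions to check against ElevenLabs' API reference, not the project's actual code.

```typescript
// Alignment object as described in this PR.
interface Alignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

// Only the response fields this PR relies on.
interface TimestampsResponse {
  audio_base64: string;
  alignment: Alignment;
}

// Hypothetical helper: build the POST request for the with-timestamps endpoint.
// Header and body field names are assumptions based on ElevenLabs' public API.
function buildTimestampsRequest(
  voiceId: string,
  apiKey: string,
  text: string,
  modelId: string,
  voiceSettings?: Record<string, unknown>,
) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/with-timestamps`,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      // JSON.stringify omits voice_settings when it is undefined.
      body: JSON.stringify({ text, model_id: modelId, voice_settings: voiceSettings }),
    },
  };
}

// Hypothetical helper: validate parsed JSON before the alignment is written to the sidecar.
function parseTimestampsResponse(json: unknown): TimestampsResponse {
  if (json === null || typeof json !== "object") {
    throw new Error("response is not a JSON object");
  }
  const r = json as Partial<TimestampsResponse>;
  const audio = r.audio_base64;
  const alignment = r.alignment;
  if (typeof audio !== "string" || !alignment) {
    throw new Error("response missing audio_base64 or alignment");
  }
  const chars = alignment.characters;
  const starts = alignment.character_start_times_seconds;
  const ends = alignment.character_end_times_seconds;
  if (!Array.isArray(chars) || !Array.isArray(starts) || !Array.isArray(ends)) {
    throw new Error("alignment arrays missing");
  }
  // Expect exactly one start time and one end time per character.
  if (starts.length !== chars.length || ends.length !== chars.length) {
    throw new Error("alignment arrays have mismatched lengths");
  }
  return { audio_base64: audio, alignment };
}
```

A caller might then run `const { url, init } = buildTimestampsRequest(...)`, `await fetch(url, init)`, pass `await res.json()` to `parseTimestampsResponse`, decode `audio_base64` into the mp3, and write `alignment` into the sidecar. Validating the array lengths up front keeps a malformed response from silently producing wrong highlight times later.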

Cache compatibility

Sidecar JSON written before this PR has no alignment field. synthesize() now treats such sidecars as cache misses, so the audio is re-synthesized once per program (roughly $0.01 in ElevenLabs cost) to backfill the alignment. Subsequent re-renders hit the cache as before.
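The cache rule can be written as a small predicate. This is a sketch only: the name isCacheHit, the string-or-null input, and the text field on the sidecar are hypothetical. The one behavior taken from the PR is that a sidecar lacking alignment counts as a miss.

```typescript
// Assumed sidecar shape: only `alignment` is named in the PR; `text` is illustrative.
interface Sidecar {
  text?: string;
  alignment?: {
    characters: string[];
    character_start_times_seconds: number[];
    character_end_times_seconds: number[];
  };
}

// Hypothetical predicate: returns true only when a sidecar exists, parses,
// and already carries alignment data.
function isCacheHit(sidecarJson: string | null): boolean {
  // No sidecar on disk: nothing cached yet.
  if (sidecarJson === null) return false;
  let parsed: unknown;
  try {
    parsed = JSON.parse(sidecarJson);
  } catch {
    return false; // Unreadable sidecar: re-synthesize rather than guess.
  }
  if (parsed === null || typeof parsed !== "object") return false;
  // Sidecars written before the with-timestamps switch have no `alignment`,
  // so they count as misses and get re-synthesized once to backfill it.
  return (parsed as Sidecar).alignment !== undefined;
}
```

Once the backfilled sidecar is written, the same check returns true on every later render, which is why the extra ElevenLabs cost is paid only once per program.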

Verified

render.log excerpt:

Cycle step timings (seconds into cycle audio):
  { learn: 0.975, deliver: 1.509, verify: 2.229, pay: 3.599 }
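The PR doesn't show wordStartSeconds itself. A minimal sketch, assuming it returns the start time of the first aligned word that begins with the given prefix (which would explain passing "verif" rather than "verify"); firstWordStart is a hypothetical helper for trying alternatives such as "paid" and "pay" in order.

```typescript
interface Alignment {
  characters: string[];
  character_start_times_seconds: number[];
  character_end_times_seconds: number[];
}

const isWordChar = (c: string): boolean => /[a-z0-9]/i.test(c);

// Hypothetical re-implementation: start time of the first word beginning with
// `prefix` (case-insensitive), or undefined if no word matches.
function wordStartSeconds(alignment: Alignment, prefix: string): number | undefined {
  const chars = alignment.characters;
  const starts = alignment.character_start_times_seconds;
  const target = prefix.toLowerCase();
  for (let i = 0; i < chars.length; i++) {
    // A word starts at an alphanumeric character whose predecessor is not alphanumeric.
    const isWordStart = isWordChar(chars[i]) && (i === 0 || !isWordChar(chars[i - 1]));
    if (!isWordStart) continue;
    // Compare the characters from i onward against the prefix.
    const candidate = chars.slice(i, i + target.length).join("").toLowerCase();
    if (candidate === target) return starts[i];
  }
  return undefined;
}

// Hypothetical helper: first prefix (in the order given) that matches any word.
function firstWordStart(alignment: Alignment, ...prefixes: string[]): number | undefined {
  for (const p of prefixes) {
    const t = wordStartSeconds(alignment, p);
    if (t !== undefined) return t;
  }
  return undefined;
}
```

Matching only at word starts keeps a prefix from hitting the middle of another word (for example "liver" inside "deliver").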

Frame extracts from a fresh chc/run-001 render (cycle beat starts at video t≈4s):

| Video t | Cycle-relative t | Highlighted | Expected (per spoken word) |
|---|---|---|---|
| 5s | 1s | Learn | Learn (0.975s ≤ t < 1.509s) ✓ |
| 6s | 2s | Deliver | Deliver (1.509s ≤ t < 2.229s) ✓ |
| 7s | 3s | Verify | Verify (2.229s ≤ t < 3.599s) ✓ |
| 8s | 4s | Pay | Pay (3.599s ≤ t) ✓ |
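The table can be reproduced with a small frame-to-step lookup fed the logged timings. This sketch assumes 30 fps (the PR doesn't state the frame rate) and a frame counted from the start of the cycle beat; activeIndexAt and returning -1 before the first word are hypothetical choices, not the Cycle component's actual code.

```typescript
const STEPS = ["Learn", "Deliver", "Verify", "Pay"] as const;

// Hypothetical lookup: index of the step whose start time was most recently
// crossed, or -1 before the first spoken word. Uses aligned timings when
// present, otherwise the proportional estimates.
function activeIndexAt(
  frame: number,
  fps: number,
  alignedStarts: readonly number[] | undefined,
  proportionalStarts: readonly number[],
): number {
  const starts = alignedStarts ?? proportionalStarts;
  const t = frame / fps; // seconds into the cycle beat
  let active = -1;
  for (let i = 0; i < starts.length; i++) {
    if (t >= starts[i]) active = i;
  }
  return active;
}

// Usage: the logged timings give Learn, Deliver, Verify, Pay at 1, 2, 3, 4 s.
const logged = [0.975, 1.509, 2.229, 3.599];
const fps = 30;
const highlighted = [1, 2, 3, 4].map((s) => STEPS[activeIndexAt(s * fps, fps, logged, [])]);
console.log(highlighted.join(", "));
```

Inside a Remotion `<Sequence>`, `useCurrentFrame()` returns the frame relative to the sequence start, so a cycle-relative frame like this could come for free if the cycle beat is wrapped in one.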

Each transition coincides with the spoken word's onset.

🤖 Generated with Claude Code

@jjackson

The narration-synced highlight from #451/#455 picked a reasonable proportional position (word_index / total_words), but that's an estimate: the highlight transitioned roughly when the word was said, not precisely. @jjackson asked for "specifically transition when we say learn, deliver, verify, pay onto each highlighted area when we say the word", which requires actual per-character timestamps from the TTS engine.

ElevenLabs has a /v1/text-to-speech/{voice_id}/with-timestamps endpoint that returns the same mp3 audio plus a per-character alignment array with start/end seconds. Switched synthesize() over to it and persisted the alignment in the existing sidecar JSON (adds an `alignment: {characters, character_start_times_seconds, character_end_times_seconds}` field).

Pipeline:
scripts/render.ts → reads alignment from the 'cycle' beat's per-beat audio sidecar → calls wordStartSeconds(alignment, "learn"|"deliver"|"verif"|"paid"|"pay") → passes the 4 numbers as `cycleStepStartSeconds` in Remotion props
src/Root.tsx → threads cycleStepStartSeconds through to <Intro>
src/compositions/Intro.tsx → Cycle component checks frame/fps against the 4 timestamps and switches activeIndex on each crossing. No estimation.
→ Falls back to the word-index proportional path when alignment isn't available (Studio preview, or cached audio from before the alignment switch).

Cache compatibility: sidecar JSON written before this change has no `alignment` field. `synthesize()` now treats those sidecars as cache misses, so the audio re-synthesizes once (per program, $0.01-ish) to backfill alignment. Re-renders after that hit the cache as before.

Verified on chc/run-001:
- render.log: "Cycle step timings (seconds into cycle audio): { learn: 0.975, deliver: 1.509, verify: 2.229, pay: 3.599 }"
- Frame extracts at t=5/6/7/8s: Learn / Deliver / Verify / Pay highlighted respectively.

Each transition coincides with the spoken word's onset.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

jjackson merged commit bda0ed7 into main May 19, 2026
3 checks passed

jjackson deleted the feat/cycle-tts-alignment branch May 19, 2026 19:55

jjackson merged 1 commit into main from feat/cycle-tts-alignment

jjackson commented May 19, 2026
