[RD-567] Add TTS evals #7

Open
dbrkn wants to merge 10 commits into main from berkin/tts-evals

dbrkn (Owner) commented Feb 13, 2026

  1. Adds a speech_generation pipeline that generates audio from text prompts via whisperkit-cli tts, transcribes the output with the WhisperKitPro engine, and computes WER against the original prompt. Includes a text-only dataset, configurable TTS/transcription params, and a registered alias.
  2. Adds a generic --pipeline-config key=value CLI flag for alias-mode overrides (mainly to set speakers and language for TTS generation).
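
For readers unfamiliar with the metric in item 1, here is a minimal sketch of word error rate: the word-level edit distance between the original prompt and the transcript, divided by the number of prompt words. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions for illustration; the pipeline's actual WER implementation and text normalization may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: normalization here is just lowercasing and
    whitespace splitting, which is an assumption of this sketch.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


prompt = "please reset my password"
transcript = "please reset the password"
print(word_error_rate(prompt, transcript))  # one substitution over four words
```

In the TTS setting the prompt plays the role of the reference, so a low WER suggests the generated audio is intelligible to the transcriber rather than measuring the transcriber itself.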

Sample command:

export WHISPERKIT_CLI_PATH="/path/to/whisperkit-cli"
export WHISPERKITPRO_CLI_PATH="/path/to/whisperkitpro-cli"
uv run openbench-cli evaluate \
  --pipeline whisperkit-speech-generation \
  --dataset customer-service-tts-prompts-vocalized \
  --metrics wer \
  --verbose
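
The sample command above does not use the new --pipeline-config flag. As a rough sketch of how repeated key=value overrides could be collected into a config dictionary, assuming an argparse-style parser (the real openbench-cli may be built differently), consider the following. The helper `parse_overrides` and the keys `speaker`, `language`, and `speed` are hypothetical names, not taken from the PR.

```python
import argparse
import json


def parse_overrides(pairs):
    """Turn ["key=value", ...] into a dict (hypothetical helper).

    Values are decoded as JSON when possible, so numbers and booleans
    keep their types; anything else stays a plain string.
    """
    overrides = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        key = key.strip()
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        try:
            value = json.loads(raw)
        except json.JSONDecodeError:
            value = raw
        overrides[key] = value
    return overrides


parser = argparse.ArgumentParser()
# action="append" lets the flag be repeated, one override per use.
parser.add_argument("--pipeline-config", action="append", default=[],
                    metavar="KEY=VALUE")

args = parser.parse_args(["--pipeline-config", "speaker=alice",
                          "--pipeline-config", "language=en"])
print(parse_overrides(args.pipeline_config))
```

Decoding values as JSON is one possible design choice; treating every value as a string and letting the pipeline config validate types would be equally reasonable.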

Sample Result:

Screenshot 2026-02-13 at 9 23 02 PM

dberkin1 and others added 10 commits February 13, 2026 20:52
Adds a new TTS evaluation pipeline using OpenAI's API to generate
audio from text prompts, then transcribes with WhisperKitPro for WER.

Made-with: Cursor
Adds a new TTS evaluation pipeline using ElevenLabs' API to generate
audio from text prompts, then transcribes with WhisperKitPro for WER.

Made-with: Cursor

Co-authored-by: dberkin1 <berkin@argmax.com>
…erkin/tts-evals

Made-with: Cursor

# Conflicts:
#	src/openbench/pipeline/pipeline_aliases.py
#	src/openbench/pipeline/speech_generation/__init__.py
Adds a new TTS evaluation pipeline using Cartesia's API to generate
audio from text prompts, then transcribes with WhisperKitPro for WER.

Adds a new TTS evaluation pipeline using Google's Gemini API to generate
audio from text prompts, then transcribes with WhisperKitPro for WER.
)

Adds multi-speaker dialogue TTS evaluation using ElevenLabs'
text_to_dialogue API with chunking for long dialogues.
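
The commit does not describe how the chunking works. One plausible approach, sketched below, is to group consecutive speaker turns so that each request stays under a character budget. The function `chunk_dialogue`, the `(speaker, text)` turn shape, and the `max_chars` limit are all assumptions for illustration; they are not taken from the PR or from the ElevenLabs API.

```python
def chunk_dialogue(turns, max_chars):
    """Group consecutive (speaker, text) turns into chunks.

    Each chunk's combined text length stays at or below max_chars.
    Turns are never split, so a single turn longer than the budget
    is rejected rather than truncated.
    """
    chunks = []
    current = []
    size = 0
    for speaker, text in turns:
        if len(text) > max_chars:
            raise ValueError(f"turn by {speaker!r} exceeds max_chars")
        # Start a new chunk when adding this turn would exceed the budget.
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append((speaker, text))
        size += len(text)
    if current:
        chunks.append(current)
    return chunks


dialogue = [("A", "hello"), ("B", "world"), ("A", "again")]
for chunk in chunk_dialogue(dialogue, max_chars=10):
    print(chunk)
```

Keeping turns whole preserves speaker boundaries, so the audio generated for each chunk can be concatenated in order.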

Also includes:
- Speech generation support for local datasets
- Dialogue field in speech generation dataset schema
- Empty dictionary guard in keyword boosting metrics