
docs: Add model performance comparison and selection guide#121

Open
hnshah wants to merge 1 commit into KittenML:main from hnshah:docs/model-comparison

Conversation

@hnshah hnshah commented Mar 24, 2026

Summary

Adds performance benchmarks and selection guidance to help users choose the right model for their use case.

Changes

  • Added Performance Comparison table with RTF (Real-Time Factor), memory usage, and recommended use cases
  • Added Which Model Should I Use? section with clear recommendations for different scenarios
  • Included Performance Notes with testing methodology and hardware details

Testing

Comprehensively tested all 3 models (nano, micro, mini) on Apple M2 Ultra:

  • ✅ Measured load time for each model
  • ✅ Measured generation speed (RTF) with identical text
  • ✅ Measured memory usage during generation
  • ✅ Generated audio samples for quality comparison
  • ✅ Tested both short and long-form text

Hardware: Mac Studio M2 Ultra (24 cores), macOS

Rationale

The README shows model sizes and parameters but doesn't help users understand the performance trade-offs or which model to choose. This is the #1 question users have when getting started.

This addition provides:

  1. Clear performance data - Real measurements, not estimates
  2. Actionable guidance - "Which model should I use?" answered directly
  3. Transparency - Testing methodology documented
  4. Community-friendly - Invites contributions from other hardware

Results

| Model | RTF   | Speed                     | Best For                  |
|-------|-------|---------------------------|---------------------------|
| nano  | 0.03x | 34x faster than real-time | Quick responses           |
| micro | 0.10x | 10x faster than real-time | General use (recommended) |
| mini  | 0.19x | 5x faster than real-time  | High quality              |

All measurements are reproducible and based on real-world testing.
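For anyone reproducing these numbers, here is a minimal sketch of how RTF can be computed. The sample rate and the `synthesize()` stub are assumptions (the stub just returns silence in place of a real KittenTTS generation call); only the timing-and-ratio logic is the point.

```python
import time

SAMPLE_RATE = 24_000  # assumed output sample rate; the real models may differ


def synthesize(text: str) -> list[float]:
    # Stand-in for a real TTS call; returns silence whose
    # length grows with the input text.
    return [0.0] * (len(text) * 1_200)


def measure_rtf(text: str) -> float:
    # RTF = wall-clock generation time / duration of the audio produced.
    # An RTF of 0.10x means audio is generated 10x faster than real-time.
    start = time.perf_counter()
    audio = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / SAMPLE_RATE
    return elapsed / audio_seconds


rtf = measure_rtf("Hello from Ren! This is a test.")
print(f"RTF: {rtf:.4f} ({1 / rtf:.0f}x faster than real-time)")
```

Swapping the stub for an actual model call (and averaging over several runs) gives the numbers in the table above.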

Adds performance benchmarks and selection guidance to help users choose
the right model for their use case.

- Added performance comparison table with RTF, memory usage, and use cases
- Added 'Which Model Should I Use?' section with clear recommendations
- Included performance notes with testing methodology

Tested all 3 models (nano, micro, mini) on Apple M2 Ultra with
comprehensive benchmarks measuring load time, generation speed (RTF),
and memory usage.

Hardware: Mac Studio M2 Ultra, 24 cores, macOS
@therealron
Collaborator

yo @hnshah, thanks for making the PR. Can you share what text samples you tested this on? Also, we just shipped an example for streaming, so that should change things; let me include this example with streaming as well.

@hnshah
Author

hnshah commented Mar 27, 2026

@therealron Thanks for the quick response!

Test Samples

Tested with two text types:

Short/simple (pronunciation test):

Hello from Ren! This is a test of Kitten T T S. The model runs entirely on C P U without a G P U.

Long-form (realistic use case):

Research complete. Found fifteen discussions about artificial intelligence tools on Reddit, ten threads on Hacker News, and twenty-five tweets. Key insight: developers prefer local models for privacy and cost reasons.

Both samples tested across all 3 models (nano, micro, mini) on Mac Studio M2 Ultra to measure RTF and memory usage.
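For the memory side of the measurement, a sketch like the one below can capture peak resident memory around a model load. This is not from the PR: it uses Python's Unix-only `resource` module, and the `bytearray` allocation is a stand-in for loading real model weights.

```python
import resource
import sys


def peak_rss_mb() -> float:
    # Peak resident set size of this process, in MiB. Note ru_maxrss is
    # reported in bytes on macOS but in kilobytes on Linux.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 ** 2 if sys.platform == "darwin" else 1024)


baseline = peak_rss_mb()
weights = bytearray(50 * 1024 * 1024)  # stand-in for loading model weights
loaded = peak_rss_mb()
print(f"peak RSS grew by about {loaded - baseline:.0f} MiB")
```

Sampling peak RSS before the load and after generation brackets the model's working-set cost.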

Streaming Example

Great! Looking forward to seeing the streaming example. Should I wait for that before updating the PR, or would you like me to add any additional benchmarks in the meantime?

Happy to help test or document the streaming approach once it's ready!

