On-device text-to-speech Swift SDK for iOS and macOS, powered by Kitten TTS.
let tts = try await KittenTTS()
let result = try await tts.generate("Hello from Kitten TTS!")
try await tts.speak("Good morning!")- Fully on-device — no network calls after the one-time data & model download
- 4 model sizes — nano (fp32/int8), micro, and mini
- 8 voices — Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
- Simple async API — one line to generate or play speech
- WAV export — save audio to disk with
result.writeWAV(to:) - iOS 16+ / macOS 14+
| Platform | Minimum |
|---|---|
| iOS | 16.0 |
| macOS | 14.0 |
| Swift | 5.9 |
| Xcode | 15+ |
Add the following to your Package.swift:
dependencies: [
.package(url: "https://github.com/KittenML/KittenTTS-swift", from: "0.1.0"),
],
targets: [
.target(name: "YourApp", dependencies: [
.product(name: "KittenTTS", package: "KittenTTS-swift"),
]),
]Or in Xcode: File → Add Package Dependencies… and enter the repository URL.
The first call downloads phonemizer data files and the model, then caches them in Application Support. Subsequent calls are instant.
import KittenTTS
// Default config (nano fp32 model, Bella voice, 1× speed)
let tts = try await KittenTTS()
// With download progress
let tts = try await KittenTTS { progress in
print("Download: \(Int(progress * 100))%")
}let result = try await tts.generate("Hello, world!")
print("Duration: \(result.duration)s")
print("Voice: \(result.voice.displayName)")
print("Samples: \(result.samples.count)")// Generate + play (returns when playback finishes)
try await tts.speak("Hello, world!")
// Or generate first, then play later
let result = try await tts.generate("Hello, world!")
try await tts.speak(result.inputText, voice: result.voice)let result = try await tts.generate("Hello, world!")
// Get raw Data
let data = result.wavData()
// Write directly to a URL
let url = URL(fileURLWithPath: "/tmp/hello.wav")
try result.writeWAV(to: url)let config = KittenTTSConfig(
model: .nano, // .nano (fp32), .nanoInt8, .micro, .mini
defaultVoice: .luna, // default voice
speed: 1.1, // global speed multiplier (0.5–2.0)
storageDirectory: nil, // nil = Application Support/KittenTTS/
ortNumThreads: 4, // ONNX intra-op thread count
maxTokensPerChunk: 400 // max tokens per inference chunk
)
let tts = try await KittenTTS(config)| Case | Name | Size | Parameters |
|---|---|---|---|
.nano (default) |
Nano (fp32) | ~56 MB | 15M |
.nanoInt8 |
Nano (int8) | ~25 MB | 15M |
.micro |
Micro | ~41 MB | 40M |
.mini |
Mini | ~80 MB | 80M |
KittenSDK ships three options for English G2P (grapheme-to-phoneme) conversion, all
selected through KittenTTSConfig.phonemizer:
| Option | Quality | Dependencies | Platform |
|---|---|---|---|
.builtin (default) |
High — accurate IPA with stress marks | None (data files downloaded on first use) | iOS + macOS |
.espeak |
High — invokes the system espeak-ng binary |
brew install espeak-ng |
macOS only |
.custom(…) |
Varies | Your own implementation | Any |
The default .builtin phonemizer uses EPhonemizer, a C++ engine that reads
rule and dictionary data files to produce high-quality IPA output. Data files
(en_rules, en_list) are automatically downloaded on first use and cached
alongside the model.
You can override the download URLs to point to your own hosted copies:
let phonemizer = EPhonemizer(
rulesURL: URL(string: "https://example.com/en_rules")!,
listURL: URL(string: "https://example.com/en_list")!
)
let config = KittenTTSConfig(phonemizer: .custom(phonemizer))Implement KittenPhonemizerProtocol to plug in any G2P engine:
struct MyPhonemizer: KittenPhonemizerProtocol {
func phonemize(_ text: String) -> String {
// your G2P logic here
}
}
let config = KittenTTSConfig(phonemizer: .custom(MyPhonemizer()))| Case | Name | Gender |
|---|---|---|
.bella (default) |
Bella | Female |
.jasper |
Jasper | Male |
.luna |
Luna | Female |
.bruno |
Bruno | Male |
.rosie |
Rosie | Female |
.hugo |
Hugo | Male |
.kiki |
Kiki | Female |
.leo |
Leo | Male |
// Init — downloads phonemizer data & model if needed
public init(_ config: KittenTTSConfig = .init(),
downloadProgressHandler: ((Double) -> Void)? = nil) async throws
// Generate audio
public func generate(_ text: String,
voice: KittenVoice? = nil,
speed: Float? = nil) async throws -> KittenTTSResult
// Generate and play
@discardableResult
public func speak(_ text: String,
voice: KittenVoice? = nil,
speed: Float? = nil) async throws -> KittenTTSResult
// Stop playback
public func stopSpeaking()
// Check if model is cached (no download required)
public static func isModelCached(for config: KittenTTSConfig = .init()) -> Bool
// Pre-download without creating a full instance
public static func prewarm(config: KittenTTSConfig = .init()) async throwspublic struct KittenTTSResult {
public let samples: [Float] // Raw Float32 PCM at 24 kHz
public let sampleRate: Int // Always 24_000
public var duration: TimeInterval // Audio length in seconds
public let voice: KittenVoice // Voice used
public let effectiveSpeed: Float // Speed × model speed prior
public let inputText: String // Original input text
public func wavData() -> Data // Encode as WAV
public func writeWAV(to url: URL) throws // Write WAV to disk
}The Examples/ directory contains two complete SwiftUI apps:
KittenTTSiOSExample/— iOS 17+ app (iPhone / iPad)KittenTTSmacOSExample/— macOS 14+ app (native Mac window)
To open an example, regenerate the Xcode project with XcodeGen:
cd Examples/KittenTTSiOSExample
xcodegen generate
open KittenTTSiOSExample.xcodeprojKittenSDK/
├── Sources/
│ ├── Czlib/ # System library shim for zlib (DEFLATE in .npz)
│ ├── CEPhonemizer/ # C++ phonemizer engine + C bridge for Swift interop
│ └── KittenTTS/
│ ├── Public/ # Public API (KittenTTS, KittenVoice, KittenPhonemizerProtocol, …)
│ ├── Engine/ # TextPreprocessor, TextCleaner, TTSEngine
│ ├── Phonemizer/ # EPhonemizer, BuiltinPhonemizer, ESpeakBinaryPhonemizer, RuleBasedG2P
│ ├── Loader/ # NPZLoader (ZIP64 + float16/32 .npy)
│ ├── Audio/ # WAVEncoder, AudioOutput
│ └── Internal/ # Extensions, ModelDownloader
├── Tests/KittenTTSTests/ # XCTest suite
└── Examples/
├── KittenTTSiOSExample/
└── KittenTTSmacOSExample/
text
└─▶ TextPreprocessor (numbers, currency, ordinals → words)
└─▶ KittenPhonemizerProtocol (pluggable G2P: builtin / espeak binary / custom)
└─▶ TextCleaner (IPA → Int64 token IDs)
└─▶ TTSEngine (ONNX Runtime inference, chunked)
└─▶ Float32 PCM @ 24 kHz
└─▶ WAVEncoder / AudioOutput
Apache License 2.0. See LICENSE for details.
The KittenTTS model weights are distributed separately under their own license. See KittenML/KittenTTS for details.
The SDK supports multiple pluggable phonemizers via KittenPhonemizerProtocol.
When using EPhonemizer it downloads GPL v3-licensed data files (en_rules,
en_list) at runtime - these files are only fetched when EPhonemizer is
selected and are not bundled in the SDK/app.