Skip to content

KittenML/KittenTTS-swift

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kitten TTS Swift SDK

On-device text-to-speech Swift SDK for iOS and macOS, powered by Kitten TTS.

let tts = try await KittenTTS()
let result = try await tts.generate("Hello from Kitten TTS!")
try await tts.speak("Good morning!")

Features

  • Fully on-device — no network calls after the one-time data & model download
  • 4 model sizes — nano (fp32/int8), micro, and mini
  • 8 voices — Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo
  • Simple async API — one line to generate or play speech
  • WAV export — save audio to disk with result.writeWAV(to:)
  • iOS 16+ / macOS 14+

Requirements

Platform Minimum
iOS 16.0
macOS 14.0
Swift 5.9
Xcode 15+

Installation

Swift Package Manager

Add the following to your Package.swift:

dependencies: [
    .package(url: "https://github.com/KittenML/KittenTTS-swift", from: "0.1.0"),
],
targets: [
    .target(name: "YourApp", dependencies: [
        .product(name: "KittenTTS", package: "KittenTTS-swift"),
    ]),
]

Or in Xcode: File → Add Package Dependencies… and enter the repository URL.

Quick Start

1. Create an instance

The first call downloads phonemizer data files and the model, then caches them in Application Support. Subsequent calls are instant.

import KittenTTS

// Default config (nano fp32 model, Bella voice, 1× speed)
let tts = try await KittenTTS()

// With download progress
let tts = try await KittenTTS { progress in
    print("Download: \(Int(progress * 100))%")
}

2. Generate audio

let result = try await tts.generate("Hello, world!")

print("Duration: \(result.duration)s")
print("Voice: \(result.voice.displayName)")
print("Samples: \(result.samples.count)")

3. Play through speakers

// Generate + play (returns when playback finishes)
try await tts.speak("Hello, world!")

// Or generate first, then play later
let result = try await tts.generate("Hello, world!")
try await tts.speak(result.inputText, voice: result.voice)

4. Save as WAV

let result = try await tts.generate("Hello, world!")

// Get raw Data
let data = result.wavData()

// Write directly to a URL
let url = URL(fileURLWithPath: "/tmp/hello.wav")
try result.writeWAV(to: url)

Configuration

let config = KittenTTSConfig(
    model: .nano,           // .nano (fp32), .nanoInt8, .micro, .mini
    defaultVoice: .luna,    // default voice
    speed: 1.1,             // global speed multiplier (0.5–2.0)
    storageDirectory: nil,  // nil = Application Support/KittenTTS/
    ortNumThreads: 4,       // ONNX intra-op thread count
    maxTokensPerChunk: 400  // max tokens per inference chunk
)
let tts = try await KittenTTS(config)

Models

Case Name Size Parameters
.nano (default) Nano (fp32) ~56 MB 15M
.nanoInt8 Nano (int8) ~25 MB 15M
.micro Micro ~41 MB 40M
.mini Mini ~80 MB 80M

Phonemizers

KittenSDK ships three options for English G2P (grapheme-to-phoneme) conversion, all selected through KittenTTSConfig.phonemizer:

Option Quality Dependencies Platform
.builtin (default) High — accurate IPA with stress marks None (data files downloaded on first use) iOS + macOS
.espeak High — invokes the system espeak-ng binary brew install espeak-ng macOS only
.custom(…) Varies Your own implementation Any

EPhonemizer (default)

The default .builtin phonemizer uses EPhonemizer, a C++ engine that reads rule and dictionary data files to produce high-quality IPA output. Data files (en_rules, en_list) are automatically downloaded on first use and cached alongside the model.

You can override the download URLs to point to your own hosted copies:

let phonemizer = EPhonemizer(
    rulesURL: URL(string: "https://example.com/en_rules")!,
    listURL:  URL(string: "https://example.com/en_list")!
)
let config = KittenTTSConfig(phonemizer: .custom(phonemizer))

Custom phonemizer

Implement KittenPhonemizerProtocol to plug in any G2P engine:

struct MyPhonemizer: KittenPhonemizerProtocol {
    func phonemize(_ text: String) -> String {
        // your G2P logic here
    }
}
let config = KittenTTSConfig(phonemizer: .custom(MyPhonemizer()))

Voices

Case Name Gender
.bella (default) Bella Female
.jasper Jasper Male
.luna Luna Female
.bruno Bruno Male
.rosie Rosie Female
.hugo Hugo Male
.kiki Kiki Female
.leo Leo Male

API Reference

KittenTTS

// Init — downloads phonemizer data & model if needed
public init(_ config: KittenTTSConfig = .init(),
            downloadProgressHandler: ((Double) -> Void)? = nil) async throws

// Generate audio
public func generate(_ text: String,
                     voice: KittenVoice? = nil,
                     speed: Float? = nil) async throws -> KittenTTSResult

// Generate and play
@discardableResult
public func speak(_ text: String,
                  voice: KittenVoice? = nil,
                  speed: Float? = nil) async throws -> KittenTTSResult

// Stop playback
public func stopSpeaking()

// Check if model is cached (no download required)
public static func isModelCached(for config: KittenTTSConfig = .init()) -> Bool

// Pre-download without creating a full instance
public static func prewarm(config: KittenTTSConfig = .init()) async throws

KittenTTSResult

public struct KittenTTSResult {
    public let samples: [Float]          // Raw Float32 PCM at 24 kHz
    public let sampleRate: Int           // Always 24_000
    public var duration: TimeInterval    // Audio length in seconds
    public let voice: KittenVoice        // Voice used
    public let effectiveSpeed: Float     // Speed × model speed prior
    public let inputText: String         // Original input text

    public func wavData() -> Data                    // Encode as WAV
    public func writeWAV(to url: URL) throws         // Write WAV to disk
}

Examples

The Examples/ directory contains two complete SwiftUI apps:

  • KittenTTSiOSExample/ — iOS 17+ app (iPhone / iPad)
  • KittenTTSmacOSExample/ — macOS 14+ app (native Mac window)

To open an example, regenerate the Xcode project with XcodeGen:

cd Examples/KittenTTSiOSExample
xcodegen generate
open KittenTTSiOSExample.xcodeproj

Architecture

KittenSDK/
├── Sources/
│   ├── Czlib/               # System library shim for zlib (DEFLATE in .npz)
│   ├── CEPhonemizer/        # C++ phonemizer engine + C bridge for Swift interop
│   └── KittenTTS/
│       ├── Public/          # Public API (KittenTTS, KittenVoice, KittenPhonemizerProtocol, …)
│       ├── Engine/          # TextPreprocessor, TextCleaner, TTSEngine
│       ├── Phonemizer/      # EPhonemizer, BuiltinPhonemizer, ESpeakBinaryPhonemizer, RuleBasedG2P
│       ├── Loader/          # NPZLoader (ZIP64 + float16/32 .npy)
│       ├── Audio/           # WAVEncoder, AudioOutput
│       └── Internal/        # Extensions, ModelDownloader
├── Tests/KittenTTSTests/    # XCTest suite
└── Examples/
    ├── KittenTTSiOSExample/
    └── KittenTTSmacOSExample/

Pipeline

text
 └─▶ TextPreprocessor      (numbers, currency, ordinals → words)
      └─▶ KittenPhonemizerProtocol  (pluggable G2P: builtin / espeak binary / custom)
           └─▶ TextCleaner (IPA → Int64 token IDs)
                └─▶ TTSEngine (ONNX Runtime inference, chunked)
                     └─▶ Float32 PCM @ 24 kHz
                          └─▶ WAVEncoder / AudioOutput

License

Apache License 2.0. See LICENSE for details.

The KittenTTS model weights are distributed separately under their own license. See KittenML/KittenTTS for details.

The SDK supports multiple pluggable phonemizers via KittenPhonemizerProtocol. When using EPhonemizer it downloads GPL v3-licensed data files (en_rules, en_list) at runtime - these files are only fetched when EPhonemizer is selected and are not bundled in the SDK/app.

About

On-device text-to-speech Swift SDK for iOS and macOS, powered by Kitten TTS

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors