Merged
1 change: 1 addition & 0 deletions docs.json
@@ -207,6 +207,7 @@
"server/services/tts/rime",
"server/services/tts/sarvam",
"server/services/tts/speechmatics",
"server/services/tts/xai",
"server/services/tts/xtts"
]
},
1 change: 1 addition & 0 deletions server/services/supported-services.mdx
@@ -115,6 +115,7 @@ Text-to-Speech services receive text input and output audio streams or chunks.
| [Rime](/server/services/tts/rime) | `pip install "pipecat-ai[rime]"` |
| [Sarvam](/server/services/tts/sarvam) | No dependencies required |
| [Speechmatics](/server/services/tts/speechmatics) | `pip install "pipecat-ai[speechmatics]"` |
| [xAI](/server/services/tts/xai) | `pip install "pipecat-ai[xai]"` |
| [XTTS](/server/services/tts/xtts) | `pip install "pipecat-ai[xtts]"` |

## Speech-to-Speech
192 changes: 192 additions & 0 deletions server/services/tts/xai.mdx
@@ -0,0 +1,192 @@
---
title: "xAI"
description: "Text-to-speech service using xAI's HTTP API with support for 20 languages"
---

## Overview

`XAIHttpTTSService` synthesizes speech through xAI's HTTP text-to-speech API, with support for 20 languages and several output audio encodings.

<CardGroup cols={2}>
<Card
title="xAI TTS API Reference"
icon="code"
href="https://reference-server.pipecat.ai/en/latest/api/pipecat.services.xai.tts.html"
>
Complete API reference for all parameters and methods
</Card>
<Card
title="Example Implementation"
icon="play"
href="https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/07e-interruptible-xai.py"
>
Complete example with interruption handling
</Card>
<Card
title="xAI Documentation"
icon="book"
href="https://docs.x.ai/developers/model-capabilities/audio/text-to-speech"
>
Official xAI TTS API documentation
</Card>
</CardGroup>

## Installation

```bash
pip install "pipecat-ai[xai]"
```

## Prerequisites

1. **xAI Account**: Sign up at [xAI](https://x.ai/)
2. **API Key**: Generate an API key from your xAI account dashboard. Grok API keys also work.

Set the following environment variable:

```bash
export GROK_API_KEY=your_api_key
```
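To fail fast when the key is missing, you can check it at startup before constructing the service. A minimal sketch using only the standard library; `require_api_key` is a hypothetical helper for illustration, not part of Pipecat:

```python
import os


def require_api_key(var_name: str = "GROK_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper; Pipecat does not provide it.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before starting the bot.")
    return key
```

You would then pass `api_key=require_api_key()` to `XAIHttpTTSService` instead of `os.getenv(...)`, so a missing key surfaces as an explicit error rather than a failed request later.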

## Configuration

### XAIHttpTTSService

<ParamField path="api_key" type="str" required>
xAI API key for authentication.
</ParamField>

<ParamField path="base_url" type="str" default="https://api.x.ai/v1/tts">
xAI TTS endpoint URL. Override for custom or proxied deployments.
</ParamField>

<ParamField path="sample_rate" type="int" default="None">
Output audio sample rate in Hz. When `None`, uses the pipeline's configured
sample rate.
</ParamField>
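When `sample_rate` is `None`, the rate comes from the pipeline. The sketch below illustrates that fallback and how a raw PCM byte count maps to playback time. It assumes 16-bit (2-byte) mono samples; the sample width, channel count, and helper names are assumptions for illustration, not Pipecat APIs.

```python
from typing import Optional

BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM
CHANNELS = 1          # assumption: mono output


def effective_sample_rate(service_rate: Optional[int], pipeline_rate: int) -> int:
    """Use the service's explicit rate if set, otherwise the pipeline's rate."""
    return service_rate if service_rate is not None else pipeline_rate


def pcm_duration_seconds(num_bytes: int, sample_rate: int) -> float:
    """Playback duration of a raw PCM buffer at the given sample rate."""
    bytes_per_second = sample_rate * BYTES_PER_SAMPLE * CHANNELS
    return num_bytes / bytes_per_second
```

For example, at 24 kHz this layout works out to 48,000 bytes per second, so a 4,800-byte chunk holds 0.1 s of audio.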

<ParamField path="encoding" type="str" default="pcm">
Output audio encoding format. Supported formats: `"pcm"`, `"mp3"`, `"wav"`,
`"mulaw"`, `"alaw"`.
</ParamField>
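With `"wav"`, the samples arrive wrapped in a RIFF/WAV header rather than as bare PCM. A sketch, assuming you hold a complete WAV file in memory as bytes, that uses Python's standard `wave` module to recover the raw frames and their format; `wav_to_pcm` is a hypothetical helper, not part of Pipecat:

```python
import io
import wave


def wav_to_pcm(wav_bytes: bytes) -> tuple[bytes, int, int, int]:
    """Strip the WAV container; return (pcm, sample_rate, channels, sample_width)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getnchannels(), wav.getsampwidth()
```

Checking the returned rate, channel count, and sample width against what your pipeline expects is a cheap way to catch format mismatches early.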

<ParamField path="aiohttp_session" type="aiohttp.ClientSession" default="None">
Optional shared aiohttp session for HTTP requests. If `None`, the service
creates and manages its own session.
</ParamField>
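The shared-or-owned behavior described above follows a common rule: close a session only if you created it. A minimal, dependency-free sketch of that ownership rule, using a stand-in `FakeSession` in place of `aiohttp.ClientSession` so it runs without aiohttp; `SessionHolder` is hypothetical and is not Pipecat's implementation:

```python
import asyncio
from typing import Optional


class FakeSession:
    """Stand-in for aiohttp.ClientSession so the sketch runs without aiohttp."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class SessionHolder:
    """Uses a caller-provided session, or creates one and owns its lifecycle."""

    def __init__(self, session: Optional[FakeSession] = None) -> None:
        self._owns_session = session is None
        self.session = session if session is not None else FakeSession()

    async def close(self) -> None:
        # Only close sessions this object created; shared ones belong to the caller.
        if self._owns_session:
            await self.session.close()


async def demo() -> tuple[bool, bool]:
    shared = FakeSession()
    borrower = SessionHolder(shared)  # caller keeps ownership
    owner = SessionHolder()           # holder creates and owns its session
    await borrower.close()
    await owner.close()
    return shared.closed, owner.session.closed
```

Running `asyncio.run(demo())` returns `(False, True)`: the shared session stays open for its owner, while the internally created one is closed.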

<ParamField path="settings" type="XAIHttpTTSService.Settings" default="None">
Runtime-configurable settings. See [Settings](#settings) below.
</ParamField>

### Settings

Runtime-configurable settings passed via the `settings` constructor argument using `XAIHttpTTSService.Settings(...)`. These can be updated mid-conversation with `TTSUpdateSettingsFrame`. See [Service Settings](/guides/fundamentals/service-settings) for details.

| Parameter | Type | Default | Description |
| ---------- | ----------------- | ------------- | ---------------------------------------------------- |
| `model` | `str` | `None` | Model identifier. _(Inherited from base settings.)_ |
| `voice` | `str` | `"eve"` | Voice identifier. _(Inherited from base settings.)_ |
| `language` | `Language \| str` | `Language.EN` | Language code. _(Inherited from base settings.)_ |
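The runtime-update frame carries a `delta`, which suggests that only the fields you set are changed and the rest keep their current values. A sketch of those merge semantics under that assumption, using a hypothetical `TTSSettingsSketch` dataclass that mirrors the table's defaults (plain strings stand in for `Language`); it is not the actual `XAIHttpTTSService.Settings` class:

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class TTSSettingsSketch:
    """Hypothetical mirror of the settings table; not Pipecat's class."""

    model: Optional[str] = None
    voice: Optional[str] = "eve"
    language: Optional[str] = "en"  # stands in for Language.EN


def apply_delta(current: TTSSettingsSketch, **delta: Optional[str]) -> TTSSettingsSketch:
    """Return new settings in which only the non-None fields of delta change."""
    known = {f.name for f in fields(current)}
    unknown = set(delta) - known
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    changes = {name: value for name, value in delta.items() if value is not None}
    return replace(current, **changes)
```

For example, `apply_delta(TTSSettingsSketch(), language="fr")` switches the language while keeping `voice="eve"`.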

## Supported Languages

xAI TTS supports 20 languages. Use the `Language` enum from `pipecat.transcriptions.language`:

- Arabic (Egyptian, Saudi, UAE): `Language.AR`, `Language.AR_EG`, `Language.AR_SA`, `Language.AR_AE`
- Bengali: `Language.BN`
- Chinese: `Language.ZH`
- English: `Language.EN`
- French: `Language.FR`
- German: `Language.DE`
- Hindi: `Language.HI`
- Indonesian: `Language.ID`
- Italian: `Language.IT`
- Japanese: `Language.JA`
- Korean: `Language.KO`
- Portuguese (Brazil, Portugal): `Language.PT`, `Language.PT_BR`, `Language.PT_PT`
- Russian: `Language.RU`
- Spanish (Spain, Mexico): `Language.ES`, `Language.ES_ES`, `Language.ES_MX`
- Turkish: `Language.TR`
- Vietnamese: `Language.VI`
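To reject an unsupported language before starting a session, you can check it against the list above. A sketch that works on BCP-47-style strings; the code set is transcribed from the list and assumes Pipecat's `Language` values take that form (for example, `Language.ES_MX` as `"es-MX"`). `is_supported` is a hypothetical helper, not part of Pipecat:

```python
# Transcribed from the supported-languages list; assumes BCP-47-style values.
SUPPORTED_XAI_TTS_CODES = frozenset({
    "ar", "ar-EG", "ar-SA", "ar-AE",
    "bn", "zh", "en", "fr", "de", "hi", "id", "it", "ja", "ko",
    "pt", "pt-BR", "pt-PT",
    "ru",
    "es", "es-ES", "es-MX",
    "tr", "vi",
})

# Lowercased copy so the check is case-insensitive.
_SUPPORTED_LOWER = frozenset(code.lower() for code in SUPPORTED_XAI_TTS_CODES)


def is_supported(code: str) -> bool:
    """Return True if the language code appears in the supported list."""
    return code.strip().lower() in _SUPPORTED_LOWER
```

Whether the service maps unlisted regional variants (such as `en-US`) to a base language is not covered by this check; consult the API reference for that behavior.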

## Usage

### Basic Setup

```python
import os

from pipecat.services.xai.tts import XAIHttpTTSService

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)
```

### With Custom Language

```python
import os

from pipecat.services.xai.tts import XAIHttpTTSService
from pipecat.transcriptions.language import Language

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    settings=XAIHttpTTSService.Settings(
        voice="eve",
        language=Language.ES,
    ),
)
```

### With Custom Encoding

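```python
import os

from pipecat.services.xai.tts import XAIHttpTTSService

tts = XAIHttpTTSService(
    api_key=os.getenv("GROK_API_KEY"),
    encoding="mp3",  # downstream processors must be able to handle MP3
    settings=XAIHttpTTSService.Settings(
        voice="eve",
    ),
)
```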

### With Shared HTTP Session

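```python
import os

import aiohttp

from pipecat.services.xai.tts import XAIHttpTTSService


async def main():
    async with aiohttp.ClientSession() as session:
        tts = XAIHttpTTSService(
            api_key=os.getenv("GROK_API_KEY"),
            aiohttp_session=session,
            settings=XAIHttpTTSService.Settings(
                voice="eve",
            ),
        )
        # Build and run your pipeline inside this block; the shared
        # session is closed as soon as the block exits.
```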

### Updating Settings at Runtime

Settings such as voice and language can be changed mid-conversation using `TTSUpdateSettingsFrame`:

```python
from pipecat.frames.frames import TTSUpdateSettingsFrame
from pipecat.services.xai.tts import XAITTSSettings
from pipecat.transcriptions.language import Language

# `task` is the PipelineTask running your pipeline.
await task.queue_frame(
    TTSUpdateSettingsFrame(
        delta=XAITTSSettings(
            language=Language.FR,
        )
    )
)
```

## Notes

- **HTTP-only**: This service uses xAI's HTTP API. The service requests raw PCM audio by default, which matches Pipecat's downstream expectations without extra decoding.
- **Encoding options**: When using non-PCM encodings (`mp3`, `wav`, `mulaw`, `alaw`), ensure your audio pipeline can handle the selected format.
- **Automatic session management**: If you don't provide an `aiohttp_session`, the service creates and manages its own session lifecycle automatically.
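With `"mulaw"`, each byte is a G.711 μ-law code rather than a linear sample, so something downstream has to expand it back to 16-bit PCM. A minimal sketch of the standard G.711 μ-law expansion (the classic reference-decoder arithmetic), assuming headerless 8-bit μ-law input and producing little-endian 16-bit PCM; `mulaw_to_pcm16` is a hypothetical helper, and it avoids `audioop`, which was removed in Python 3.13:

```python
import struct

_BIAS = 0x84  # G.711 mu-law bias (132)


def mulaw_byte_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    exponent = (code >> 4) & 0x07    # 3-bit segment number
    mantissa = code & 0x0F           # 4-bit step within the segment
    magnitude = ((mantissa << 3) + _BIAS) << exponent
    # After inversion, a set top bit marks a negative sample.
    return (_BIAS - magnitude) if code & 0x80 else (magnitude - _BIAS)


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Decode a buffer of mu-law bytes to little-endian 16-bit PCM."""
    samples = [mulaw_byte_to_linear(b) for b in data]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Each μ-law byte becomes two PCM bytes, so the decoded buffer is twice the input length; the sample rate is unchanged.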