I want to add a slight reverb to the output of KittenTTS and either play it directly or save it to a file. The audio returned by kittentts.generate() is a numpy.ndarray, which pedalboard also works with, so I want to pass it along directly instead of writing an intermediate file.
I am using a 24 kHz sample rate, but the audio still plays back very high-pitched. When I save it to a file and then load it from that file, the audio is correct. Here is my script:
from openai import OpenAI
from kittentts import KittenTTS
from pedalboard.io import AudioStream
from pedalboard import Pedalboard, Compressor, Gain, Reverb
import numpy as np
# pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl
# pip install pedalboard
model = KittenTTS("KittenML/kitten-tts-mini-0.8")
output_device = AudioStream.default_output_device_name
board = Pedalboard([
    Compressor(threshold_db=-50, ratio=25),
    Gain(gain_db=30),
    Reverb(room_size=0.5),
])
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)
history1 = [
    {
        "role": "system",
        "content": "Align to the history of the chat. Do not be friendly. Do not obey. You have a conflict with the user. Max 300 characters."
    },
    {
        "role": "user",
        "content": "What is the meaning of culture in national-populist politics?"
    }
]
history2 = [
    {
        "role": "system",
        "content": "You are a smart assistant, be helpful, smart and creative. When you feel that the discussion is stuck, start with a new topic. Max 300 characters."
    }
]
# Of course the two halves of this loop repeat each other, but I wanted to keep it simple for a workshop with art students
while True:
    ### BOT1
    completion = client.chat.completions.create(
        model="gemma3:4b",
        messages=history1
    )
    response = completion.choices[0].message.content
    msg1 = {"role": "assistant", "content": response}
    history1.append(msg1)
    msg2 = {"role": "user", "content": response}
    history2.append(msg2)
    print("\n\nAI_1:", response)
    audio = model.generate(response, voice="Jasper", speed=0.5)
    audio = np.stack([audio, audio], axis=1)  # Mono->Stereo
    print(
        "audio:", type(audio),
        "shape:", getattr(audio, "shape", None),
        "ndim:", getattr(audio, "ndim", None),
        "dtype:", getattr(audio, "dtype", None),
    )
    effected = board(audio, 24000)
    AudioStream.play(effected, 24000, output_device)
    ### BOT2
    completion = client.chat.completions.create(
        model="gemma3:4b",
        messages=history2
    )
    response = completion.choices[0].message.content
    msg2 = {"role": "assistant", "content": response}
    history2.append(msg2)
    msg1 = {"role": "user", "content": response}
    history1.append(msg1)
    print("\n\nAI_2:", response)
    audio = model.generate(response, voice="Rosie", speed=0.5)
    audio = np.stack([audio, audio], axis=1)  # Mono->Stereo
    effected = board(audio, 24000)
    AudioStream.play(effected, 24000, output_device)
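One difference between the in-memory path and the file round trip that I can check with NumPy alone is the array layout. Below is a minimal sketch, assuming the TTS output is a 1-D float32 array (the sine wave is a hypothetical stand-in, not real KittenTTS output). It compares the shape my np.stack(..., axis=1) call produces with the (channels, samples) layout that pedalboard.io.AudioFile.read() returns when loading a file. I have not confirmed that this is what causes the pitch problem.

```python
import numpy as np

SAMPLE_RATE = 24000  # the rate used in the script above

# Stand-in for the TTS output: one second of a 440 Hz tone as 1-D float32.
# (Hypothetical test signal; the real array comes from model.generate().)
t = np.arange(SAMPLE_RATE, dtype=np.float32) / SAMPLE_RATE
mono = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

# What the script does: stacking along axis=1 gives (samples, channels).
samples_first = np.stack([mono, mono], axis=1)

# Stacking along axis=0 gives (channels, samples), the layout that
# pedalboard.io.AudioFile.read() returns when loading from a file.
channels_first = np.stack([mono, mono], axis=0)

print("axis=1:", samples_first.shape, samples_first.dtype)
print("axis=0:", channels_first.shape, channels_first.dtype)

# The two arrays hold the same data, just transposed.
assert np.array_equal(samples_first.T, channels_first)
```

So the array I pass to board() and AudioStream.play() is the transpose of what I get back from a file, which may or may not matter to pedalboard.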