stt-server

Minimal Whisper-based speech-to-text HTTP server.

Usage

Pre-built Docker image: rctl/stt-server:latest

docker run -p 8081:8000 rctl/stt-server:latest

What It Does

  • Loads an OpenAI Whisper model on startup (turbo by default).
  • Exposes a single transcription endpoint: POST /transcribe.
  • Accepts raw PCM s16le mono audio in the request body.
  • Returns Whisper's JSON output (including the transcribed text and segment timings).

API

  • GET / returns ok
  • POST /transcribe
    • Headers:
      • X-Sample-Rate: sample rate (default 16000)
      • X-Lang: language code (default en)
    • Body:
      • raw int16 PCM bytes
    • Response:
      • Whisper transcription JSON

Run

pip install -r requirements.txt
python main.py --host 0.0.0.0 --port 8000 --model turbo

Optional flags:

  • --device cuda|cpu
  • --no-fp16
  • --debug (writes incoming audio samples as *.wav)

Quick Test

curl -X POST "http://localhost:8000/transcribe" \
  -H "X-Sample-Rate: 16000" \
  -H "X-Lang: en" \
  --data-binary @sample.pcm
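If no sample.pcm is at hand, a short Python sketch can synthesize one (a pure 440 Hz tone, so Whisper will likely return little or no text, but it exercises the endpoint end to end):

```python
import math
import struct

# Synthesize one second of a 440 Hz tone as raw s16le mono PCM at 16 kHz,
# matching the X-Sample-Rate header in the curl example above.
sample_rate = 16000
n = sample_rate  # one second of audio
samples = [
    int(10000 * math.sin(2 * math.pi * 440 * i / sample_rate))
    for i in range(n)
]
data = struct.pack("<%dh" % n, *samples)

with open("sample.pcm", "wb") as f:
    f.write(data)
```

Any 16 kHz mono WAV can serve the same purpose once its header is stripped (e.g. with ffmpeg's s16le output format).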

About

stt-server is a basic standalone HTTP API server for the OpenAI Whisper model. It allows simple sharing of GPU resources across local applications that require STT integration.
