Sonic-Forage/stable-audio-api

Stable Audio API

FastAPI server for Stability AI Stable Audio 3, using uv for Python environment and dependency management.

The Hugging Face models are gated. Before starting the server, accept the terms on Hugging Face for each model you want to use, and provide a token with access to those models through the HF_TOKEN environment variable.

Setup

uv sync
export HF_TOKEN=hf_your_token_here
export STABLE_AUDIO_DEFAULT_MODEL=small-sfx
export STABLE_AUDIO_DEVICE=cpu
uv run stable-audio-api --host 0.0.0.0 --port 8000

The server automatically loads a .env file from this project directory. You can also ask uv to load it explicitly:

uv run --env-file .env stable-audio-api --host 0.0.0.0 --port 8000
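This README does not say how the server parses .env. As a rough illustration only, the sketch below shows simple KEY=VALUE loading with the Python standard library. It assumes values already present in the environment take precedence, which is an assumption and not documented behavior. The names parse_env_text and load_env_file are hypothetical. Real loaders such as python-dotenv accept more syntax than this.

```python
import os
from pathlib import Path
from typing import Dict, MutableMapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate shell-style "export KEY=VALUE" lines like those in Setup.
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: Path, environ: MutableMapping[str, str] = os.environ) -> None:
    """Copy values from a .env file into environ without overriding existing keys (assumed precedence)."""
    if not path.is_file():
        return
    for key, value in parse_env_text(path.read_text()).items():
        environ.setdefault(key, value)
```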

Set STABLE_AUDIO_DEVICE=cuda on a CUDA machine, or leave it unset to let the stable-audio-3 package auto-detect the device, trying cuda, then mps, then cpu. In the upstream Stable Audio 3 package, the medium model requires CUDA with Flash Attention support, so it will not run on CPU or MPS.
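The device selection rule above, including the medium model's CUDA requirement, is illustrated by this small sketch. It is not the upstream code. The functions resolve_device and model_supports_device are hypothetical, and hardware availability is passed in as booleans so the rule can be shown without PyTorch installed.

```python
from typing import Optional

VALID_DEVICES = ("cuda", "mps", "cpu")


def resolve_device(requested: Optional[str], cuda_available: bool, mps_available: bool) -> str:
    """Use an explicit STABLE_AUDIO_DEVICE if given, else try cuda, then mps, then cpu."""
    if requested:
        device = requested.strip().lower()
        if device not in VALID_DEVICES:
            raise ValueError(f"STABLE_AUDIO_DEVICE must be one of {VALID_DEVICES}, got {requested!r}")
        return device
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def model_supports_device(model: str, device: str) -> bool:
    """The medium model needs CUDA; the small models are not restricted here."""
    return model != "medium" or device == "cuda"
```

With PyTorch installed, the availability flags could come from torch.cuda.is_available() and torch.backends.mps.is_available(), for example resolve_device(os.environ.get("STABLE_AUDIO_DEVICE"), torch.cuda.is_available(), torch.backends.mps.is_available()).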

Generate Audio

Choose a model per request with the model field. Valid values are small-sfx, small-music, and medium. If a request omits model, the server uses STABLE_AUDIO_DEFAULT_MODEL.

For local development, the synchronous endpoint returns WAV bytes directly:

curl -X POST http://localhost:8000/v1/audio/generations \
  -H "Content-Type: application/json" \
  --output train.wav \
  -d '{
    "model": "small-sfx",
    "prompt": "chugging train coming into station with horn",
    "duration": 7,
    "steps": 8,
    "cfg_scale": 1.0,
    "seed": -1
  }'
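The same synchronous request can be sent from Python with only the standard library. This is a client-side sketch that mirrors the curl call. The function names are hypothetical, and the 600-second timeout is an arbitrary assumption for long generations. Note that urlopen raises urllib.error.HTTPError on non-2xx responses.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_generation_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the synchronous WAV endpoint."""
    return urllib.request.Request(
        f"{base_url}/v1/audio/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_wav(payload: dict, out_path: str, base_url: str = BASE_URL, timeout: float = 600.0) -> int:
    """Send the request, write the returned WAV bytes to out_path, and return the byte count."""
    request = build_generation_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


# Example (requires a running server):
# generate_wav({"model": "small-sfx", "prompt": "chugging train coming into station with horn",
#               "duration": 7, "steps": 8, "cfg_scale": 1.0, "seed": -1}, "train.wav")
```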

The API also accepts full Hugging Face repo IDs as aliases, for example "model": "stabilityai/stable-audio-3-medium".
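Model resolution can be sketched as a lookup from short names or repo IDs to a single internal name, with a fallback to the default model. This is a hypothetical sketch and not the server's code. Only the medium repo ID appears in this README, so the alias table below is deliberately incomplete. The names resolve_model and REPO_ID_ALIASES are illustrative.

```python
from typing import Optional

VALID_MODELS = ("small-sfx", "small-music", "medium")

# Only the medium repo ID is quoted in this README; the small models' repo IDs
# are not listed here, so they are intentionally left out of this sketch.
REPO_ID_ALIASES = {"stabilityai/stable-audio-3-medium": "medium"}


def resolve_model(requested: Optional[str], default: str = "small-sfx") -> str:
    """Map a request's model field (short name or repo ID) to a short model name."""
    name = requested or default  # an omitted model falls back to the default
    name = REPO_ID_ALIASES.get(name, name)
    if name not in VALID_MODELS:
        raise ValueError(f"unknown model {requested!r}; expected one of {VALID_MODELS}")
    return name
```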

Generate With Jobs

For cloud deployments, use the async job endpoints. They return quickly, generate audio in the background, write the WAV to local storage or S3/R2, and expose a download URL when complete.

curl -X POST http://localhost:8000/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "model": "small-sfx",
    "prompt": "short metallic impact with room reverb",
    "duration": 5,
    "steps": 8
  }'

Response:

{
  "id": "68e48e7af36c4d829e3797a0b3e7687c",
  "status": "queued",
  "status_url": "http://localhost:8000/jobs/68e48e7af36c4d829e3797a0b3e7687c"
}

Poll status:

curl http://localhost:8000/jobs/68e48e7af36c4d829e3797a0b3e7687c
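The submit-then-poll flow above can be sketched as a small client. The status values queued and succeeded come from this README. The failure status name failed is an assumption, so check what the server actually returns. The fetch and sleep callables are injectable so the polling loop can be exercised without a server. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"

# "queued" and "succeeded" appear in this README; "failed" is an assumed name
# for the error state and may differ on the real server.
FAILURE_STATUSES = {"failed"}


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def submit_job(payload: dict, base_url: str = BASE_URL) -> dict:
    """POST /jobs and return the response containing id, status, and status_url."""
    request = urllib.request.Request(
        f"{base_url}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_job(
    status_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 2.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll status_url until the job succeeds, fails, or the timeout elapses."""
    waited = 0.0
    while True:
        job = fetch(status_url)
        status = job.get("status")
        if status == "succeeded":
            return job
        if status in FAILURE_STATUSES:
            raise RuntimeError(f"job {job.get('id')} failed: {job}")
        if waited >= timeout:
            raise TimeoutError(f"job still {status!r} after {timeout:.0f}s")
        sleep(interval)
        waited += interval


# Example (requires a running server):
# job = submit_job({"model": "small-sfx", "prompt": "short metallic impact with room reverb",
#                   "duration": 5, "steps": 8})
# print(wait_for_job(job["status_url"])["download_url"])
```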

When status is succeeded, the job's download_url field points to the generated WAV. Without object storage configured, job outputs are written under outputs/ (see STABLE_AUDIO_OUTPUT_DIR) and served by the local API.
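The configuration options suggest three ways a download URL could be produced: a public/CDN base URL, a presigned S3/R2 URL, or a URL served by the local API. This sketch shows that decision order. It is an illustration, not the server's implementation. The object key layout <prefix>/<job_id>.wav is an assumption. The local route is left to a caller-supplied function because this README does not name it. The presign callable is a stand-in.

```python
from typing import Callable, Optional


def object_key(prefix: str, job_id: str) -> str:
    """Hypothetical key layout: <prefix>/<job_id>.wav."""
    return f"{prefix.strip('/')}/{job_id}.wav"


def download_url(
    job_id: str,
    bucket: Optional[str],
    prefix: str = "stable-audio/jobs",
    public_base_url: Optional[str] = None,
    presign: Optional[Callable[[str, str], str]] = None,
    local_url: Optional[Callable[[str], str]] = None,
) -> str:
    """Prefer a public base URL, then a presigned URL, then a local API URL."""
    if not bucket:
        # No object storage: the local API serves files from the output directory.
        if local_url is None:
            raise ValueError("local_url is required when no bucket is configured")
        return local_url(job_id)
    key = object_key(prefix, job_id)
    if public_base_url:
        return f"{public_base_url.rstrip('/')}/{key}"
    if presign is None:
        raise ValueError("presign is required when no public base URL is set")
    return presign(bucket, key)


# One possible presign function using boto3 (assumed, not confirmed by this README):
# import boto3
# s3 = boto3.client("s3", endpoint_url=endpoint_url, region_name=region)
# presign = lambda b, k: s3.generate_presigned_url(
#     "get_object", Params={"Bucket": b, "Key": k}, ExpiresIn=3600)
```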

Job state is kept in memory, so it is lost when the server restarts and is not shared between worker processes. For multiple workers, restarts, or production serverless deployments, use Redis, Postgres, or another shared job store.
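A shared store is easier to add if the job store is defined behind a small interface. The sketch below shows one possible interface and a thread-safe in-memory implementation with the limitations described above. A Redis- or Postgres-backed class could implement the same three methods. All class and method names are hypothetical. The 32-character hex IDs from uuid4 match the shape of the example response, but that does not mean the server generates them this way.

```python
import threading
import uuid
from dataclasses import dataclass, replace
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Job:
    id: str
    status: str = "queued"
    download_url: Optional[str] = None
    error: Optional[str] = None


class JobStore(Protocol):
    """Interface a shared (e.g. Redis or Postgres) store could also implement."""

    def create(self) -> Job: ...
    def get(self, job_id: str) -> Optional[Job]: ...
    def update(self, job_id: str, **changes: object) -> Job: ...


class InMemoryJobStore:
    """Process-local store: lost on restart and not visible to other workers."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()

    def create(self) -> Job:
        job = Job(id=uuid.uuid4().hex)
        with self._lock:
            self._jobs[job.id] = job
        return job

    def get(self, job_id: str) -> Optional[Job]:
        with self._lock:
            return self._jobs.get(job_id)

    def update(self, job_id: str, **changes: object) -> Job:
        with self._lock:
            job = replace(self._jobs[job_id], **changes)
            self._jobs[job_id] = job
            return job
```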

Endpoints

  • GET /health returns available models, preloaded models, loaded models, and duration limits.
  • POST /jobs starts a background generation job and returns a job ID.
  • GET /jobs/{id} returns job status and a download URL when complete.
  • POST /v1/audio/generations returns the generated audio as an audio/wav response.
  • POST /generate is an alias for POST /v1/audio/generations.

Configuration

| Environment variable | Default | Description |
|---|---|---|
| HF_TOKEN | unset | Hugging Face token for gated model access. |
| STABLE_AUDIO_DEFAULT_MODEL | small-sfx | Default model when a request omits model. |
| STABLE_AUDIO_MODEL | unset | Backward-compatible alias for STABLE_AUDIO_DEFAULT_MODEL. |
| STABLE_AUDIO_PRELOAD_MODELS | the default model | Comma-separated models to load at startup. Set to an empty string to lazy-load only. |
| STABLE_AUDIO_DEVICE | unset (auto-detect) | Optional device override: cuda, mps, or cpu. |
| STABLE_AUDIO_MODEL_HALF | true | Use fp16 on CUDA. The model disables it automatically on CPU/MPS. |
| STABLE_AUDIO_MAX_DURATION | 380 | API-wide duration cap in seconds. Small models are still capped at 120 s and medium at 380 s. |
| STABLE_AUDIO_MAX_STEPS | 50 | Maximum sampling steps per request. |
| STABLE_AUDIO_OUTPUT_DIR | outputs | Local output directory for job WAVs when S3/R2 is not configured. |
| STABLE_AUDIO_STORAGE_BUCKET | unset | S3/R2 bucket for job WAV output. Setting it enables S3-compatible storage. |
| STABLE_AUDIO_STORAGE_PREFIX | stable-audio/jobs | Object key prefix for uploaded WAV files. |
| STABLE_AUDIO_STORAGE_ENDPOINT_URL | unset | S3-compatible endpoint URL, such as Cloudflare R2's. |
| STABLE_AUDIO_STORAGE_REGION | us-east-1 | S3 region. For Cloudflare R2, auto can be used instead. |
| STABLE_AUDIO_STORAGE_PUBLIC_BASE_URL | unset | Optional public/CDN base URL. If unset, the API generates presigned URLs. |
| STABLE_AUDIO_PRESIGNED_URL_EXPIRES | 3600 | Presigned download URL lifetime in seconds. |
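The table's defaults and duration rules can be captured in a typed settings object. The sketch below reads the STABLE_AUDIO_* variables with the documented defaults and computes the effective duration cap as the lower of the API-wide cap and the model's own cap. It is a hypothetical illustration, not the project's settings code. The precedence of STABLE_AUDIO_DEFAULT_MODEL over STABLE_AUDIO_MODEL and the accepted truthy strings for STABLE_AUDIO_MODEL_HALF are assumptions. HF_TOKEN is omitted because it is consumed for Hugging Face access rather than as server settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Per-model caps from the table: small models 120 s, medium 380 s.
MODEL_DURATION_CAPS = {"small-sfx": 120.0, "small-music": 120.0, "medium": 380.0}


def _as_bool(value: str) -> bool:
    # Assumed truthy spellings; the real server may accept a different set.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    default_model: str
    preload_models: Tuple[str, ...]
    device: Optional[str]
    model_half: bool
    max_duration: float
    max_steps: int
    output_dir: str
    storage_bucket: Optional[str]
    storage_prefix: str
    storage_endpoint_url: Optional[str]
    storage_region: str
    storage_public_base_url: Optional[str]
    presigned_url_expires: int

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        # Assumed precedence: the new name wins over the backward-compatible alias.
        default_model = env.get("STABLE_AUDIO_DEFAULT_MODEL") or env.get("STABLE_AUDIO_MODEL") or "small-sfx"
        preload_raw = env.get("STABLE_AUDIO_PRELOAD_MODELS")
        if preload_raw is None:
            preload: Tuple[str, ...] = (default_model,)  # default: preload the default model
        else:
            # An empty string yields no preloaded models (lazy-load only).
            preload = tuple(m.strip() for m in preload_raw.split(",") if m.strip())
        return cls(
            default_model=default_model,
            preload_models=preload,
            device=env.get("STABLE_AUDIO_DEVICE") or None,
            model_half=_as_bool(env.get("STABLE_AUDIO_MODEL_HALF", "true")),
            max_duration=float(env.get("STABLE_AUDIO_MAX_DURATION", "380")),
            max_steps=int(env.get("STABLE_AUDIO_MAX_STEPS", "50")),
            output_dir=env.get("STABLE_AUDIO_OUTPUT_DIR", "outputs"),
            storage_bucket=env.get("STABLE_AUDIO_STORAGE_BUCKET") or None,
            storage_prefix=env.get("STABLE_AUDIO_STORAGE_PREFIX", "stable-audio/jobs"),
            storage_endpoint_url=env.get("STABLE_AUDIO_STORAGE_ENDPOINT_URL") or None,
            storage_region=env.get("STABLE_AUDIO_STORAGE_REGION", "us-east-1"),
            storage_public_base_url=env.get("STABLE_AUDIO_STORAGE_PUBLIC_BASE_URL") or None,
            presigned_url_expires=int(env.get("STABLE_AUDIO_PRESIGNED_URL_EXPIRES", "3600")),
        )

    def duration_limit(self, model: str) -> float:
        """Effective cap in seconds: the lower of the API-wide cap and the model's cap."""
        return min(self.max_duration, MODEL_DURATION_CAPS[model])
```

For example, with STABLE_AUDIO_MAX_DURATION=60 every model is capped at 60 s, while with the default of 380 the small models remain capped at 120 s.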

The upstream Stable Audio 3 package pins PyTorch and torchaudio. This project mirrors its CUDA 12.6 uv source configuration for Linux x86_64; macOS uses the standard PyPI wheels.
