Add cli #31

Merged: mhrice merged 5 commits into main from add-cli on May 20, 2026
mhrice (Collaborator) commented May 20, 2026

No description provided.

Copilot AI left a comment

Pull request overview

This PR adds a stable-audio command-line interface entrypoint for running Stable Audio 3 generation (text-to-audio, audio-to-audio, inpainting/continuation) without writing Python, along with unit tests and documentation.

Changes:

  • Introduces stable_audio_3/cli.py implementing argparse-based generation and WAV output saving.
  • Adds tests/test_cli.py covering CLI flag wiring via mocked model and torchaudio I/O.
  • Updates README.md with CLI usage examples and adds the stable-audio console script in pyproject.toml.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| stable_audio_3/cli.py | New CLI implementation: argument parsing, model loading, generation invocation, and output saving. |
| tests/test_cli.py | New unit tests validating CLI flag wiring with mocked dependencies. |
| README.md | Documents CLI usage and adds a note about uv sync behavior. |
| pyproject.toml | Adds a stable-audio console script entrypoint. |

Comments suppressed due to low confidence (2)

stable_audio_3/cli.py:225

  • torchaudio.load() returns (waveform, sample_rate), but StableAudioModel.generate() expects audio inputs as (sample_rate, tensor) tuples (see StableAudioModel._encode_audio_input unpacking). Passing the raw torchaudio.load() result here will swap sample rate and waveform and break audio-to-audio/inpainting. Load as (waveform, sr) and then pass (sr, waveform) for both init_audio and inpaint_audio.
    init_audio = None
    if args.init_audio:
        init_audio = torchaudio.load(args.init_audio)

    inpaint_audio = None
    if args.inpaint_audio:
        inpaint_audio = torchaudio.load(args.inpaint_audio)
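
A minimal sketch of the swap described in this comment, assuming only what the comment states: the loader returns `(waveform, sample_rate)` and `generate()` expects `(sample_rate, waveform)`. The helper name `load_for_generate` and its injected `load` parameter are hypothetical, chosen so the snippet runs without torchaudio installed; in `cli.py` the loader would be `torchaudio.load`.

```python
from typing import Any, Callable, Optional, Tuple


def load_for_generate(
    path: Optional[str],
    load: Callable[[str], Tuple[Any, int]],
) -> Optional[Tuple[int, Any]]:
    """Load audio and reorder it into the (sample_rate, waveform) tuple.

    `load` is any callable with the torchaudio.load return convention,
    i.e. it returns (waveform, sample_rate).
    """
    if not path:
        # Flag was not given on the command line.
        return None
    waveform, sample_rate = load(path)
    # Swap the order so the sample rate comes first.
    return sample_rate, waveform
```

In the CLI this could read `init_audio = load_for_generate(args.init_audio, torchaudio.load)`, with the same call for `inpaint_audio`.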

stable_audio_3/cli.py:195

  • When --prompt/--duration/--negative-prompt are passed with multiple values, the CLI will pass lists into model.generate(). The model validates list lengths with assert len(value) == batch_size (in StableAudioModel._build_conditioning_dicts), which will raise an AssertionError (and can become a silent logic bug under python -O). Add explicit CLI-side validation (or auto-infer batch_size from the list lengths when the user didn’t set it) and fail with parser.error(...) on mismatches.
    # --- Build scalar / list args ---
    prompt = args.prompt[0] if len(args.prompt) == 1 else args.prompt
    negative_prompt = None
    if args.negative_prompt:
        negative_prompt = (
            args.negative_prompt[0]
            if len(args.negative_prompt) == 1
            else args.negative_prompt
        )
    duration = args.duration[0] if len(args.duration) == 1 else args.duration
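
One possible shape for the CLI-side validation suggested here, as a sketch rather than the merged implementation. The flag definitions in `build_parser` (including `nargs="+"` and the default duration) and the helper `resolve_batch_size` are assumptions for illustration; the real `cli.py` may declare these flags differently. The batch size is inferred from the longest multi-value flag when `--batch-size` is not set, and any mismatch is reported through `parser.error`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flag definitions, not copied from cli.py.
    parser = argparse.ArgumentParser(prog="stable-audio")
    parser.add_argument("--prompt", nargs="+", required=True)
    parser.add_argument("--negative-prompt", nargs="+", default=None)
    parser.add_argument("--duration", nargs="+", type=float, default=[10.0])
    parser.add_argument("--batch-size", type=int, default=None)
    return parser


def resolve_batch_size(
    parser: argparse.ArgumentParser, args: argparse.Namespace
) -> int:
    """Infer or check the batch size against every multi-value flag."""
    lists = {
        "--prompt": args.prompt,
        "--negative-prompt": args.negative_prompt or [],
        "--duration": args.duration,
    }
    # Single values are broadcast, so only multi-value flags are checked.
    multi = {flag: len(values) for flag, values in lists.items() if len(values) > 1}

    batch_size = args.batch_size
    if batch_size is None:
        batch_size = max(multi.values(), default=1)

    for flag, count in multi.items():
        if count != batch_size:
            # parser.error prints a usage message and exits with status 2.
            parser.error(f"Got {count} {flag} values but batch size is {batch_size}")
    return batch_size
```

Because the check runs before `model.generate()`, the user sees a usage error rather than an `AssertionError`, and the behavior does not change under `python -O`.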

Comment thread tests/test_cli.py
Comment on lines +20 to +24
SAMPLE_RATE = 44100
CHANNELS = 2
FAKE_AUDIO_PATH = "some/audio.wav"
_FAKE_LOAD_RESULT = (SAMPLE_RATE, torch.zeros(CHANNELS, SAMPLE_RATE * 5))

Comment thread stable_audio_3/cli.py
    if args.inpaint_starts and len(args.inpaint_starts) != len(args.inpaint_ends):
        parser.error(
            "--inpaint-start and --inpaint-end must be specified the same number of times"
        )

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

stable_audio_3/cli.py:199

  • Same issue as prompts: when --duration is provided multiple times, the CLI passes a list through to StableAudioModel.generate, which asserts that list length == batch_size. Please validate the length against --batch-size (or require batch_size to match) and fail via parser.error so users get a clear CLI message instead of an AssertionError.
    duration = args.duration[0] if len(args.duration) == 1 else args.duration

Comment thread stable_audio_3/cli.py
Comment on lines +191 to +195
    prompt = args.prompt[0] if len(args.prompt) == 1 else args.prompt
    negative_prompt = None
    if args.negative_prompt:
        negative_prompt = (
            args.negative_prompt[0]

Comment thread stable_audio_3/cli.py
Comment on lines +215 to +219
    if args.loras:
        print(f"Loading LoRA(s): {args.loras}")
        model.load_lora(args.loras)
        if args.lora_strength is not None:
            model.set_lora_strength(args.lora_strength, lora_index=args.lora_index)

Collaborator, Author:

This is fine

Comment thread stable_audio_3/cli.py Outdated
Comment on lines +32 to +33
warnings.filterwarnings("ignore")

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

stable_audio_3/cli.py:188

  • Inpaint region times are not validated for non-negativity or ordering. Negative seconds can produce negative sample indices (masking the end of the clip due to Python negative indexing), and end <= start becomes a silent no-op region. Validate each start/end pair after parsing (>=0 and end > start) and parser.error on invalid regions.
    # --- Validate inpaint args ---
    if (args.inpaint_starts is None) != (args.inpaint_ends is None):
        parser.error("--inpaint-start and --inpaint-end must both be provided together")
    if args.inpaint_starts and len(args.inpaint_starts) != len(args.inpaint_ends):
        parser.error(
            "--inpaint-start and --inpaint-end must be specified the same number of times"
        )
    if args.inpaint_starts and not args.inpaint_audio:
        parser.error("--inpaint-start/--inpaint-end require --inpaint-audio")
    if args.inpaint_audio and not args.inpaint_starts:
        parser.error("--inpaint-audio requires --inpaint-start and --inpaint-end")
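
The region check this comment asks for could look roughly like the following. `validate_inpaint_regions` is a hypothetical helper, not part of the PR; it assumes the start and end times have already been parsed into equal-length lists of floats in seconds, as the existing length check guarantees.

```python
import argparse
from typing import Optional, Sequence


def validate_inpaint_regions(
    parser: argparse.ArgumentParser,
    starts: Optional[Sequence[float]],
    ends: Optional[Sequence[float]],
) -> None:
    """Reject negative times and regions whose end is not after the start."""
    for start, end in zip(starts or [], ends or []):
        if start < 0 or end < 0:
            # Negative seconds would turn into negative sample indices.
            parser.error(
                f"inpaint region ({start}, {end}) must use non-negative seconds"
            )
        if end <= start:
            # An empty or reversed region would otherwise be a silent no-op.
            parser.error(
                f"inpaint region end ({end}) must be greater than start ({start})"
            )
```

It would be called once after the existing pairing checks, for example `validate_inpaint_regions(parser, args.inpaint_starts, args.inpaint_ends)`.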

stable_audio_3/cli.py:214

  • --duration accepts zero/negative values. In StableAudioModel.generate, non-positive durations cause sample-size adaptation to fall back to the default sample_size, which can generate unexpectedly long audio instead of failing fast. Add a check that all provided durations are > 0 and raise parser.error otherwise.
    # --- Validate list-flag lengths against batch size ---
    if (
        args.negative_prompt
        and len(args.negative_prompt) > 1
        and len(args.negative_prompt) != batch_size
    ):
        parser.error(
            f"Got {len(args.negative_prompt)} --negative-prompt values but batch size is {batch_size}"
        )
    if len(args.duration) > 1 and len(args.duration) != batch_size:
        parser.error(
            f"Got {len(args.duration)} --duration values but batch size is {batch_size}"
        )
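
One way to make non-positive durations fail fast is an argparse `type` callable, so the check happens during parsing. This is a sketch under assumptions: `positive_float` is a hypothetical name, and the `--duration` definition below (with `nargs="+"`) is illustrative rather than copied from `cli.py`.

```python
import argparse
import math


def positive_float(text: str) -> float:
    """argparse type: accept only finite numbers strictly greater than zero."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not a number")
    if not (math.isfinite(value) and value > 0):
        raise argparse.ArgumentTypeError(f"duration must be > 0, got {text}")
    return value


# Illustrative parser; argparse turns ArgumentTypeError into a usage error.
parser = argparse.ArgumentParser(prog="stable-audio")
parser.add_argument("--duration", nargs="+", type=positive_float)
```

An explicit loop over `args.duration` followed by `parser.error(...)` would work equally well; the `type` approach simply keeps the rule next to the flag definition.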

Comment thread stable_audio_3/cli.py
Comment on lines +108 to +113
    parser.add_argument(
        "--init-noise-level",
        type=float,
        default=0.9,
        help="Noise level for audio-to-audio (0.0–1.0, default: 0.9)",
    )

mhrice merged commit bbcf738 into main on May 20, 2026
5 checks passed
mhrice deleted the add-cli branch on May 20, 2026 at 06:59