fix(transcribe): auto-fallback to CPU + int8 when CUDA is unavailable #19

Closed
fadenb wants to merge 1 commit into pretyflaco:main from fadenb:main

fadenb (Contributor) commented May 7, 2026

Summary

  • TranscriptionConfig.__post_init__ no longer raises ValueError when device='cuda' (or torch_device='cuda') is requested but CUDA is not present. Instead it automatically falls back to 'cpu' and downgrades compute_type from 'float16' to 'int8' (float16 is unsupported on CPU).
  • Model-loading log now indicates whether CPU was explicitly requested (forced), automatically chosen because no GPU was found (fallback - no GPU), or torch is missing (no torch).
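
The fallback described in the first bullet could look roughly like the sketch below. It is not the code from this PR: `resolve_device`, `_cuda_available`, and the `cuda_ok` parameter are hypothetical names used for illustration, and the warning text is invented. Only `TranscriptionConfig`, `device`, `compute_type`, and the float16-to-int8 downgrade come from the description above.

```python
import warnings
from dataclasses import dataclass


def _cuda_available() -> bool:
    """Hypothetical probe: False when torch is missing or reports no CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def resolve_device(device: str, compute_type: str, cuda_ok: bool) -> tuple[str, str]:
    """Return the (device, compute_type) pair to use instead of raising ValueError."""
    if device == "cuda" and not cuda_ok:
        warnings.warn("CUDA requested but not available; falling back to CPU.")
        device = "cpu"
        if compute_type == "float16":
            # Per the PR description, float16 is unsupported on CPU.
            compute_type = "int8"
    return device, compute_type


@dataclass
class TranscriptionConfig:
    device: str = "cuda"
    compute_type: str = "float16"

    def __post_init__(self) -> None:
        # Normalise the requested settings once, at construction time.
        self.device, self.compute_type = resolve_device(
            self.device, self.compute_type, _cuda_available()
        )
```

Keeping the decision in a pure function that takes the probe result as an argument is a design choice of this sketch; it makes the three scenarios in the test plan checkable without GPU hardware.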

Motivation
Running meet run on a machine without a GPU (laptop, container without passthrough, CI runner) currently crashes with an unhelpful ValueError. The user must know to pass --device cpu --compute-type int8 manually. This change makes it "just work" - the common case shouldn't require flags.

Test plan
I can currently run only one of these scenarios, as I lack a device with a CUDA-capable GPU.

  • Run meet run on a machine without a GPU - should see warning, then transcribe successfully with int8 on CPU
  • Run meet run --device cpu on a machine with a GPU - should see (forced) in the log and use CPU as requested
  • Run meet run on a machine with a GPU - should use CUDA with float16 as before (no behavioural change)

Instead of raising ValueError when the requested CUDA device is not
present, automatically fall back to CPU and downgrade compute_type from
float16 to int8 (float16 is unsupported on CPU).  Also indicate whether
CPU is forced or a fallback in the model-loading print message.

pretyflaco (Owner) left a comment

Direction is right — the current ValueError is genuinely user-hostile on no-GPU machines. The PR is well-scoped and the motivation is clear. Approving.

A few small follow-ups I'd want before this hits a release:

  1. Unit test for the __post_init__ fallback. Mocking _torch_device_available lets us cover all three of your test scenarios in CI without the hardware mix you flagged. ~15 lines in tests/test_transcribe.py.
  2. Warning log should mention the compute_type change. Right now a user who passed --compute-type float16 explicitly sees only the device fallback warning, while compute_type silently flips to int8. One-line append.
  3. (Out of scope, just noting:) meet check on a CUDA-less machine should keep working — your change preserves the if available is None: continue branch so this should be fine, I'll smoke-test it on my side.
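
A minimal sketch of the unit test suggested in item 1, assuming a module-level `_torch_device_available(device)` probe that returns a truthy value when the device is usable. The `TranscriptionConfig` here is a stand-in that mirrors the behaviour in the PR summary, not the project's class, and the `_probe` helper and test names are hypothetical. In the real suite the patch target would be the module path of the probe, which this page does not show.

```python
import unittest
import warnings
from dataclasses import dataclass
from unittest import mock


def _torch_device_available(device: str):
    """Stand-in probe; every test below replaces it with a mock."""
    raise RuntimeError("probe must be mocked in tests")


@dataclass
class TranscriptionConfig:
    """Stand-in with the fallback behaviour described in the PR summary."""
    device: str = "cuda"
    compute_type: str = "float16"

    def __post_init__(self) -> None:
        if self.device == "cuda" and not _torch_device_available("cuda"):
            warnings.warn("CUDA not available; falling back to CPU.")
            self.device = "cpu"
            if self.compute_type == "float16":
                self.compute_type = "int8"


def _probe(result: bool):
    # Swap the module-level probe for the duration of a `with` block.
    # A real suite would use mock.patch("<module>._torch_device_available").
    return mock.patch.dict(
        globals(), {"_torch_device_available": lambda device: result}
    )


class FallbackTests(unittest.TestCase):
    def test_no_gpu_falls_back_to_cpu_int8(self):
        with _probe(False), self.assertWarns(UserWarning):
            cfg = TranscriptionConfig()
        self.assertEqual((cfg.device, cfg.compute_type), ("cpu", "int8"))

    def test_explicit_cpu_is_untouched(self):
        with _probe(True), warnings.catch_warnings():
            warnings.simplefilter("error")  # any warning fails the test
            cfg = TranscriptionConfig(device="cpu", compute_type="int8")
        self.assertEqual((cfg.device, cfg.compute_type), ("cpu", "int8"))

    def test_gpu_present_keeps_cuda_float16(self):
        with _probe(True), warnings.catch_warnings():
            warnings.simplefilter("error")
            cfg = TranscriptionConfig()
        self.assertEqual((cfg.device, cfg.compute_type), ("cuda", "float16"))
```

The three test methods correspond to the three scenarios in the test plan, so none of them needs a GPU or an installed torch.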

You're blocked on hardware for two of your three test scenarios anyway, so happy to land this as-is and push a follow-up commit with the unit test + warning tweak — or if you'd rather do it yourself for the learning, take a few days and add them here. Either works for me, just let me know which you prefer.

Either way, thanks for the clean PR — the diagnosis in the description was excellent.

pretyflaco (Owner) commented
Superseded by #21 — picked up the follow-ups from review (compute_type warning, accurate fallback log via internal flag, unit tests, CHANGELOG). Your commit is preserved with full attribution via cherry-pick. Thanks again @fadenb for the clean diagnosis and patch!

pretyflaco closed this on May 14, 2026
pasogott pushed a commit to calumba-holding/meetscribe that referenced this pull request May 14, 2026
Follow-up to pretyflaco#19 (cherry-picked) addressing review feedback:

- __post_init__ now emits a second warning when compute_type is flipped
  from float16 to int8 because the device fell back to CPU.  Previously
  the user only saw the device fallback message; the compute_type change
  was silent.
- TranscriptionConfig gains an internal _device_auto_fallback flag set
  when device is auto-flipped to cpu.  _load_whisperx_asr_model reads
  the flag instead of re-sniffing torch at print time, so the
  "(forced)" vs "(fallback — no GPU)" annotation is accurate even when
  the user explicitly passes --device cpu on a no-GPU machine.
- Removed dead conditional `fallback = "cpu" if value == "cuda" else "cpu"`.
- tests/test_transcribe.py: rewrote the two raise-expecting tests
  (test_invalid_torch_device_{cuda,mps}_raises) to assert the new
  fallback behavior, and added three tests covering the compute_type
  warning, the no-spurious-warning case when compute_type is already
  int8, and that explicit --device cpu does not set _device_auto_fallback.
- CHANGELOG: v0.7.1 entry crediting @fadenb.
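
The flag-based annotation from the follow-up commit can be sketched as below. This is an illustration under assumptions, not the code in #21: the `cuda_available` field stands in for the real torch probe, `device_label` is a hypothetical stand-in for the relevant part of `_load_whisperx_asr_model`, and the warning and label strings are invented. The `(no torch)` case from the original PR is left out.

```python
import warnings
from dataclasses import dataclass, field


@dataclass
class TranscriptionConfig:
    device: str = "cuda"
    compute_type: str = "float16"
    # Hypothetical: injected here so the sketch does not need torch.
    cuda_available: bool = False
    # Internal flag, set only when the device is auto-flipped to cpu.
    _device_auto_fallback: bool = field(default=False, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.device == "cuda" and not self.cuda_available:
            warnings.warn("CUDA not available; falling back to device='cpu'.")
            self.device = "cpu"
            self._device_auto_fallback = True
            if self.compute_type == "float16":
                # Second warning, so the compute_type change is not silent.
                warnings.warn("compute_type changed from 'float16' to 'int8'.")
                self.compute_type = "int8"


def device_label(cfg: TranscriptionConfig) -> str:
    """Annotate the device for the model-loading log using the stored flag."""
    if cfg.device != "cpu":
        return cfg.device
    if cfg._device_auto_fallback:
        return "cpu (fallback, no GPU)"
    return "cpu (forced)"
```

Because the label reads the flag recorded at construction time rather than probing the hardware again, an explicit `--device cpu` on a no-GPU machine is still reported as forced.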