First off, thank you for building this awesome tool with cross-platform support!
Currently, when importing a video file (MP4/MKV) with multiple discrete audio tracks (e.g., separate tracks for different speakers in a podcast or gameplay recording), Buzz appears to transcribe only the first audio stream.
It would be great to have a way to select which audio track(s) to transcribe from a multi-track file, with options for:
- Track Selection: Upon importing a file with multiple audio streams, provide a dropdown or checklist to select specific tracks.
- Batch Processing: The ability to transcribe multiple selected tracks from the same file simultaneously (producing separate SRT/TXT files for each).
- Automatic Labeling: If multiple tracks are selected, automatically use the track name or index as a prefix for the exported filenames (e.g., video_track1.srt, video_track2.srt).
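To make the labeling idea concrete, here is a rough sketch of how the export names could be derived. It is not based on Buzz's code: the function name `track_output_name` and its parameters are hypothetical, and I'm assuming 1-based track numbers to match the example filenames above.

```python
import re
from pathlib import Path
from typing import Optional


def track_output_name(video_path: str, track_number: int,
                      title: Optional[str] = None, ext: str = "srt") -> str:
    """Build an export filename for one audio track (hypothetical helper).

    Uses the track title when one is available, otherwise falls back to
    "track<N>", where N is the 1-based track number (an assumption).
    """
    stem = Path(video_path).stem
    label = f"track{track_number}"
    if title:
        # Replace anything that is not a word character or hyphen so the
        # title is safe to use in a filename.
        safe = re.sub(r"[^\w-]+", "_", title).strip("_")
        if safe:
            label = safe
    return f"{stem}_{label}.{ext}"


print(track_output_name("video.mkv", 1))
print(track_output_name("podcast.mp4", 2, title="Guest Mic", ext="txt"))
```

Two tracks with the same title would collide under this scheme, so a real implementation would probably want to append the index in that case.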
The current workaround is to extract each track manually with ffmpeg (using -map 0:a:X, where X is the zero-based audio stream index) and then import the extracted files into Buzz one by one.
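For reference, this is roughly what the workaround looks like when scripted. It is only a sketch of my own setup, not anything Buzz does: the helper names are made up, it assumes `ffmpeg` and `ffprobe` are on the PATH, and it re-encodes each track to WAV rather than copying the original codec.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def probe_audio_command(path: str) -> List[str]:
    """ffprobe invocation that lists only the audio streams as JSON."""
    return ["ffprobe", "-v", "error", "-select_streams", "a",
            "-show_entries", "stream=index:stream_tags=title",
            "-of", "json", path]


def count_audio_streams(ffprobe_json: str) -> int:
    """Count the audio streams in ffprobe's JSON output."""
    return len(json.loads(ffprobe_json).get("streams", []))


def extract_command(path: str, audio_index: int, out_path: str) -> List[str]:
    """ffmpeg invocation that writes one audio stream to its own file.

    audio_index is zero-based, matching ffmpeg's 0:a:X stream specifier.
    """
    return ["ffmpeg", "-i", path, "-map", f"0:a:{audio_index}", out_path]


def extract_all_tracks(path: str) -> List[str]:
    """Extract every audio track to <stem>_track<N>.wav (N is 1-based)."""
    probe = subprocess.run(probe_audio_command(path),
                           capture_output=True, text=True, check=True)
    outputs = []
    for i in range(count_audio_streams(probe.stdout)):
        out_path = f"{Path(path).stem}_track{i + 1}.wav"
        subprocess.run(extract_command(path, i, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

Calling `extract_all_tracks("video.mkv")` on a two-track file should leave `video_track1.wav` and `video_track2.wav` next to the script, which then have to be imported into Buzz separately.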
This would be a huge win for editors using DaVinci Resolve or Premiere Pro. Having separate SRTs allows for 100% accurate speaker separation without the extra compute time or potential errors of AI diarization. It also makes it much easier to style speakers differently (like using different colors or screen positions) in the final edit.