Summary
Add NVIDIA NVENC hardware video encoding support for faster H.264 encoding. Includes ring buffer for smoothing I/O jitter.
Priority: MEDIUM ⏳
Performance Optimization - Can be done in parallel with deployment work.
Dependencies
Design
Encoder Abstraction
trait VideoEncoder {
fn encode_frame(&mut self, frame: &[u8]) -> Option<EncodedChunk>;
fn flush(&mut self) -> Vec<EncodedChunk>;
}
Backend Priority
- h264_nvenc (NVIDIA GPU)
- h264_qsv (Intel QuickSync)
- h264_videotoolbox (macOS)
- libx264 (software fallback)
Ring Buffer
- GPU memory ring buffer
- Producer: Pipeline pushes frames
- Consumer: Encoder pulls frames
- Smooths network I/O jitter
Tasks
1. Define Encoder Abstraction
- Create
src/pipeline/gpu/encoder.rs
- Define
VideoEncoder trait
- Define
EncoderConfig:
- codec, preset, crf, gop_size
- pixel_format, resolution
2. Implement NVENC Backend
- Create
src/pipeline/gpu/nvenc.rs
- Options:
- Direct NVENC API (complex, best performance)
- ffmpeg subprocess with -c:v h264_nvenc (simpler)
- Initial: Use ffmpeg subprocess
- Future: Direct API for lower latency
3. Implement Ring Buffer
- Create
src/pipeline/gpu/ring_buffer.rs
- Fixed-size circular buffer
- CUDA pinned memory (optional)
- Block producer when full
- Block consumer when empty
4. Implement Auto-Detection
detect_available_encoders() -> Vec<EncoderType>
- Check for:
- nvidia-smi presence
- ffmpeg encoder availability
- CUDA runtime
- Select best available
5. Implement Fallback Chain
- Try encoders in priority order
- On failure: try next
- Log which encoder selected
- Always have software fallback
6. Configuration
struct GpuEncoderConfig {
preferred_encoder: Option<EncoderType>,
enable_ring_buffer: bool,
ring_buffer_size: usize, // frames
nvenc_preset: String, // p1-p7
nvenc_tune: String, // hq, ll, ull
}
7. Integrate with Pipeline
- Replace current ffmpeg encoder stage
- Use encoder abstraction
- Ring buffer between network and encoder
- Async frame feeding
8. Metrics
encoder_type (label on all metrics)
encoder_frames_total
encoder_fps
encoder_queue_depth (ring buffer)
encoder_gpu_utilization (if available)
Performance Target
| Metric |
Software Encoding |
NVENC |
Improvement |
| Encoding Time |
30-60 sec/job |
5-10 sec/job |
5-10x |
| Throughput/Worker |
~67 MB/s |
~400 MB/s |
6x |
Files to Create
src/pipeline/gpu/encoder.rs
src/pipeline/gpu/nvenc.rs
src/pipeline/gpu/ring_buffer.rs
Files to Modify
src/pipeline/gpu/mod.rs
src/dataset/kps/video_encoder.rs (optional: share abstraction)
Hardware Requirements
- NVIDIA GPU (Turing or newer recommended)
- NVIDIA drivers 450+
- CUDA runtime (optional for direct API)
- ffmpeg compiled with --enable-nvenc
Acceptance Criteria
Summary
Add NVIDIA NVENC hardware video encoding support for faster H.264 encoding. Includes ring buffer for smoothing I/O jitter.
Priority: MEDIUM ⏳
Performance Optimization - Can be done in parallel with deployment work.
Dependencies
Design
Encoder Abstraction
Backend Priority
Ring Buffer
Tasks
1. Define Encoder Abstraction
src/pipeline/gpu/encoder.rsVideoEncodertraitEncoderConfig:2. Implement NVENC Backend
src/pipeline/gpu/nvenc.rs3. Implement Ring Buffer
src/pipeline/gpu/ring_buffer.rs4. Implement Auto-Detection
detect_available_encoders() -> Vec<EncoderType>5. Implement Fallback Chain
6. Configuration
7. Integrate with Pipeline
8. Metrics
encoder_type(label on all metrics)encoder_frames_totalencoder_fpsencoder_queue_depth(ring buffer)encoder_gpu_utilization(if available)Performance Target
Files to Create
src/pipeline/gpu/encoder.rssrc/pipeline/gpu/nvenc.rssrc/pipeline/gpu/ring_buffer.rsFiles to Modify
src/pipeline/gpu/mod.rssrc/dataset/kps/video_encoder.rs(optional: share abstraction)Hardware Requirements
Acceptance Criteria