
fix(trt): explicit TRT context/engine teardown + VRAM pre-check to prevent CUDA OOM on reload#2

Open
livepeer-tessa wants to merge 1 commit into main from fix/cuda-oom-tensorrt-memory-leak

Conversation

@livepeer-tessa

Summary

Fixes #723 — streamdiffusion-sdxl workers enter ERROR state after pipeline param updates due to CUDA OOM.

Root Causes

Two bugs conspired to leave ~22 GB of GPU memory pinned after cleanup_gpu_memory():

**1. Incorrect teardown order in Engine.__del__ (utilities.py)**

TensorRT requires the execution context to be destroyed before the engine. The old code did del self.engine / del self.context (Python reference deletion, no ordering guarantee). Setting self.context = None first forces the C++ IExecutionContext destructor to run before ICudaEngine is released.
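The ordering guarantee can be seen with plain Python stand-ins (no TensorRT needed; `_Native` below is a mock recording destructor calls, not a real TRT type):

```python
# Sketch of the fix, using mock objects in place of TRT's native types.
# In CPython, assigning None drops the last reference, so the __del__ of
# the old value runs immediately -- giving a deterministic teardown order.
destroyed = []

class _Native:
    """Stand-in for a native TRT object; records when it is destroyed."""
    def __init__(self, name):
        self.name = name
    def __del__(self):
        destroyed.append(self.name)

class Engine:
    def __init__(self):
        self.engine = _Native("engine")
        self.context = _Native("context")
    def __del__(self):
        # Null the context FIRST so the IExecutionContext analog is
        # released before the ICudaEngine analog, as TRT requires.
        self.context = None
        self.engine = None

e = Engine()
del e
print(destroyed)  # "context" recorded before "engine"
```

A bare `del self.engine` only removes the attribute binding; if anything else still references the object, the native destructor does not run at that point, which is why the explicit `= None` ordering matters.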

**2. Manual __del__() call is unreliable (wrapper.py)**

cleanup_gpu_memory() called unet_engine.engine.__del__() explicitly. Python's destructor protocol doesn't guarantee immediate native teardown when called this way — the object can still be alive in the GC graph and TRT CUDA memory stays pinned.
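That unreliability is easy to demonstrate in isolation (a toy class, not the real Engine):

```python
# Calling __del__() explicitly invokes the method body but does NOT
# deallocate the object -- it stays alive, and the destructor runs
# again when the object is actually collected.
calls = []

class Holder:
    def __del__(self):
        calls.append("destroyed")

h = Holder()
h.__del__()                   # explicit call: just an ordinary method call
assert len(calls) == 1        # the body ran once...
assert isinstance(h, Holder)  # ...but the object is still fully alive
del h                         # real deallocation (refcount hits zero)
assert len(calls) == 2        # destructor fired a second time
```

For the real Engine this means the TRT handles (and their pinned CUDA memory) survive the explicit `__del__()` call until the garbage collector gets around to the object.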

Changes

src/streamdiffusion/acceleration/tensorrt/utilities.py

  • Engine.__del__: set self.context = None then self.engine = None before the del statements to ensure the C++ destructors fire in the correct order.

src/streamdiffusion/wrapper.py

  • New static helper _destroy_trt_engine: explicitly nullifies context → engine → frees buffers on any Engine wrapper. Replaces the fragile manual __del__() call.
  • cleanup_gpu_memory rewrite: uses _destroy_trt_engine on UNet, VAE encoder/decoder, and ControlNet engines; calls gc.collect() between stages; logs non-PyTorch residual VRAM so operators can spot incomplete teardown.
  • VRAM pre-flight check in _load_model: after cleanup, checks torch.cuda.mem_get_info(). If free VRAM < 2 GB, raises RuntimeError with an actionable message instead of letting the process OOM mid-load and exhaust the 3-restart budget.

Testing

Cannot be exercised end-to-end locally (requires a 24 GB GPU and a TRT engine build). Verified instead by code review and by checking the teardown logic against the crash trace in #723.

Checklist:

  • Both changed files parse cleanly (ast.parse)
  • Teardown order matches TRT documentation (context before engine)
  • No silent except: pass that would hide new failures
  • VRAM check raises with an actionable message rather than logging and continuing
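The first checklist item amounts to a few lines; shown here on inline strings rather than the changed files:

```python
import ast

def parses_cleanly(source: str) -> bool:
    """Return True iff `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(parses_cleanly("self.context = None"))  # True
print(parses_cleanly("def broken(:"))         # False
```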

fix(trt): explicit TRT context/engine teardown + VRAM pre-check to prevent CUDA OOM on reload

Fixes #723 — streamdiffusion-sdxl CUDA OOM on pipeline reload.

Root causes:
1. cleanup_gpu_memory() called Engine.__del__() manually, which is unreliable.
   CPython may defer destructor invocation, leaving TensorRT execution contexts
   and ICudaEngine objects alive (and their GPU memory pinned) past the call.
2. Engine.__del__() did 'del self.engine' without first setting self.context = None,
   violating TRT's required teardown order (context must be destroyed before engine).
3. No VRAM guard before reload — OOM occurred mid-load with no early diagnostic.

Changes:
- utilities.py / Engine.__del__: set self.context = None then self.engine = None
  before the 'del' statements so the C++ destructors fire in the correct order.
- wrapper.py / _destroy_trt_engine (new static helper): explicit per-attribute
  nullification of Engine.context, Engine.engine, and buffer freeing; replaces
  the fragile manual __del__() call.
- wrapper.py / cleanup_gpu_memory (rewrite): uses _destroy_trt_engine on every
  TRT wrapper (UNet, VAE encoder/decoder, ControlNet pool); calls gc.collect()
  between context and engine deletion; reports non-PyTorch residual VRAM so
  operators can detect incomplete TRT teardown.
- wrapper.py / _load_model: adds VRAM pre-flight check after cleanup — raises
  RuntimeError with actionable message if free VRAM < 2 GB, preventing the
  process from entering a slow OOM crash loop (and hitting the 3-restart limit).

Signed-off-by: Tessa (livepeer-tessa) <tessa@livepeer.org>
Signed-off-by: livepeer-tessa <livepeer-tessa@users.noreply.github.com>