feat: IP-Adapter auto-resolution, VRAM offloading & dependency updates #5
Open
forkni wants to merge 6 commits into pr1/inference-performance from
Conversation
Add resolve_ipadapter_paths() to ipadapter_module.py with a mapping of known h94/IP-Adapter model/encoder paths keyed by (model_type, IPAdapterType). Wired into wrapper.py _load_model() after model detection so both the pre-TRT and post-TRT installation paths see the resolved config.

- SD-Turbo (SD2.1, dim=1024) + sd15 adapter → auto-resolves to sd21
- SDXL-Turbo + sd15 adapter → auto-resolves to sdxl + sdxl encoder
- SD2.1 + plus/faceid → falls back to regular with a warning
- Custom/local paths are never overridden
- Updated the hardcoded "SD-Turbo is SD2.1-based" warning to a generic message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
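The resolution logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual ipadapter_module.py code: the map keys, filenames, and config fields shown here are assumptions made for the example (a later commit changes the SD2.1 entry to None, which this sketch already reflects).

```python
# Hypothetical sketch of the (model_type, IPAdapterType) -> path mapping and
# resolver described in the commit; real keys and filenames may differ.
REPO = "h94/IP-Adapter"

IPADAPTER_MODEL_MAP = {
    ("sd15", "regular"): {"repo": REPO, "weight": "models/ip-adapter_sd15.bin",
                          "encoder": "models/image_encoder"},
    ("sdxl", "regular"): {"repo": REPO, "weight": "sdxl_models/ip-adapter_sdxl.bin",
                          "encoder": "sdxl_models/image_encoder"},
    ("sd21", "regular"): None,  # ip-adapter_sd21.bin was never released
}

def resolve_ipadapter_paths(model_type: str, adapter_type: str, cfg: dict) -> dict:
    """Auto-correct a mismatched IP-Adapter config for the detected architecture."""
    # Custom/local paths supplied by the user are never overridden.
    if cfg.get("custom_path"):
        return cfg
    entry = IPADAPTER_MODEL_MAP.get((model_type, adapter_type))
    if entry is None:
        # Unsupported variant (e.g. plus/faceid): fall back to the regular adapter.
        entry = IPADAPTER_MODEL_MAP.get((model_type, "regular"))
    if entry is None:
        # No adapter exists at all for this architecture: disable gracefully.
        cfg["enabled"] = False
        return cfg
    cfg.update(entry)
    return cfg
```

For example, requesting a "plus" adapter on an SDXL model falls back to the SDXL regular adapter instead of crashing on a cross_attention_dim mismatch.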
…p-adapter_sd21.bin

The h94/IP-Adapter repo never released an SD2.1 adapter. The auto-resolution logic was mapping SD2.1 to a non-existent HuggingFace path, causing a 404 that crashed the entire pipeline. IP-Adapter is now gracefully disabled for unsupported architectures and the pipeline continues without it.

Changes:
- ipadapter_module.py: set the SD2.1 REGULAR map entry to None (the file never existed)
- ipadapter_module.py: resolve_ipadapter_paths() sets cfg["enabled"] = False when no adapter exists for the detected architecture
- wrapper.py: early guard skips installation if auto-resolution disabled IP-Adapter
- wrapper.py: the generic except handler now gracefully skips instead of re-raising

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
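The wrapper-side guard and the softened except handler could look roughly like this. It is a sketch under stated assumptions: the function name, config shape, and the injected installer callable are all hypothetical, not the actual wrapper.py API.

```python
# Hypothetical sketch of the wrapper.py guard described above. The installer
# is passed in as a callable so the skip-on-failure behavior is testable.
import logging

logger = logging.getLogger(__name__)

def maybe_install_ipadapter(cfg: dict, install_fn) -> bool:
    """Install IP-Adapter unless auto-resolution disabled it; never crash the pipeline."""
    if not cfg.get("enabled", True):
        # Early guard: auto-resolution found no adapter for this architecture.
        logger.warning("IP-Adapter disabled for this architecture; continuing without it.")
        return False
    try:
        install_fn(cfg)
    except Exception as exc:
        # Gracefully skip instead of re-raising, so inference keeps running.
        logger.warning("IP-Adapter install failed (%s); continuing without it.", exc)
        return False
    return True
```

The key design point is that a missing or broken adapter degrades the feature, not the whole pipeline: both the early guard and the except handler return False rather than propagate.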
…n SDXL)

Move CLIP-L + OpenCLIP-G to CPU after the initial prepare() in TRT mode. They are reloaded on demand before prompt re-encoding in prepare(), update_prompt(), and update_stream_params(), with try/finally to ensure they are always offloaded back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
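The reload/offload cycle above follows a standard try/finally pattern. The sketch below uses a stand-in encoder class instead of real torch modules (in the actual wrapper, .to("cpu"/"cuda") moves the weights); all class and method names here are illustrative assumptions.

```python
# Device-agnostic sketch of the offload pattern: encoders live on CPU between
# calls and are pulled back to the GPU only for the duration of re-encoding.
class FakeEncoder:
    """Stand-in for a torch text encoder; tracks which device it is 'on'."""
    def __init__(self):
        self.device = "cuda"

    def to(self, device):
        self.device = device
        return self

class Pipeline:
    def __init__(self):
        # Two encoders, mirroring CLIP-L + OpenCLIP-G on SDXL.
        self.text_encoders = [FakeEncoder(), FakeEncoder()]

    def _offload_text_encoders(self):
        for enc in self.text_encoders:
            enc.to("cpu")

    def _reload_text_encoders(self):
        for enc in self.text_encoders:
            enc.to("cuda")

    def update_prompt(self, prompt: str):
        self._reload_text_encoders()
        try:
            # Stand-in for re-encoding the prompt on the GPU.
            return f"embeds({prompt})"
        finally:
            # Always offload back, even if encoding raised.
            self._offload_text_encoders()
```

The try/finally guard is what makes the ~1.6 GB saving safe: a failure during re-encoding cannot strand the encoders on the GPU.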
Summary
Adds architecture-aware auto-resolution of IP-Adapter model paths, text encoder CPU offloading for VRAM savings, and dependency updates for TensorRT/FP8 prerequisites.
Key Changes
IP-Adapter Reliability:
- Added IPADAPTER_MODEL_MAP mapping (model_type, IPAdapterType) → correct h94/IP-Adapter HuggingFace paths
- resolve_ipadapter_paths() auto-corrects mismatched adapters (e.g., SD1.5 adapter on an SDXL model), preventing a cross_attention_dim crash
- SD2.1 REGULAR entry set to None (ip-adapter_sd21.bin was never released by h94), which gracefully disables IP-Adapter instead of crashing with a 404

VRAM Management:
- Text encoders moved to CPU after the initial prepare() in TRT mode (~1.6 GB VRAM saved on SDXL)
- _reload_text_encoders() / _offload_text_encoders() with try/finally guards
- Applied in prepare(), update_prompt(), update_stream_params()

Dependency Updates (FP8 prerequisites):
- Added StreamDiffusionTD/install_tensorrt.py (158 lines)
- Added nvidia-modelopt[onnx] and cupy-cuda12x for future FP8 quantization
- Pinned opencv-contrib-python==4.9.0.80 (away from the 4.10.x incompatibility)
- Pinned onnx==1.18.0 and onnxruntime-gpu==1.24.3 (IR 11 for FLOAT4E2M1 support)
- Blocked the onnxruntime co-install (its shared files conflict with the GPU variant)

Files Modified
- ipadapter_module.py: IPADAPTER_MODEL_MAP, resolve_ipadapter_paths(), SD2.1 = None
- wrapper.py
- install_tensorrt.py (StreamDiffusionTD)
- tools/install-tensorrt.py
- setup.py

Impact
Test plan
- Call update_prompt() during live inference: confirm the reload/offload cycle
- Verify max_batch_size=4 is unchanged in wrapper.py
- Run pip install with the updated setup.py: verify onnxruntime-gpu only (no CPU variant)

🤖 Generated with Claude Code
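The last test-plan check (GPU-only onnxruntime) can be sketched as a small helper. The function name is hypothetical and the installed package names are passed in as a list; in practice they could be gathered with importlib.metadata.distributions().

```python
# Hypothetical checker for the onnxruntime-variant test-plan item: the CPU and
# GPU wheels share files, so having both installed is a conflict.
def check_onnxruntime_variant(installed: list[str]) -> str:
    """Classify the installed onnxruntime situation from a list of package names."""
    names = {n.lower() for n in installed}
    if "onnxruntime" in names and "onnxruntime-gpu" in names:
        return "conflict: CPU and GPU variants share files"
    if "onnxruntime-gpu" in names:
        return "ok: GPU variant only"
    return "missing: onnxruntime-gpu not installed"
```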