feat: IP-Adapter auto-resolution, VRAM offloading & dependency updates #5

Open
forkni wants to merge 6 commits into pr1/inference-performance from pr3/ipadapter-vram-deps
@forkni forkni commented Apr 3, 2026

Summary

Stacked on #4 — merge that PR first.

Adds architecture-aware auto-resolution of IP-Adapter model paths, text encoder CPU offloading for VRAM savings, and dependency updates for TensorRT/FP8 prerequisites.

Key Changes

IP-Adapter Reliability:

  • IPADAPTER_MODEL_MAP mapping (model_type, IPAdapterType) → correct h94/IP-Adapter HuggingFace paths
  • resolve_ipadapter_paths() auto-corrects mismatched adapters (e.g., SD1.5 adapter on SDXL model) — prevents cross_attention_dim crash
  • SD2.1 REGULAR mapped to None (ip-adapter_sd21.bin was never released by h94) — gracefully disables instead of 404 crash
  • Generic exception handler restores UNet attention processors on IP-Adapter load failure
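
The resolution behavior described above can be illustrated with a minimal sketch. This is not the PR's actual code: the config keys `type`, `ipadapter_model_path`, and `image_encoder_path`, the `HF_PREFIX` constant, the string form of the paths, and the string model-type labels are all assumptions made for illustration. Only `IPADAPTER_MODEL_MAP`, `resolve_ipadapter_paths()`, `IPAdapterType`, and `cfg["enabled"]` are names taken from the description. The sketch also simplifies by disabling the adapter for any combination missing from the map.

```python
from enum import Enum


class IPAdapterType(Enum):
    REGULAR = "regular"
    PLUS = "plus"
    FACEID = "faceid"


# Hypothetical prefix used to recognize known h94/IP-Adapter paths.
HF_PREFIX = "h94/IP-Adapter/"

# Illustrative subset of the map. None marks a combination for which
# no weights were released (SD2.1 REGULAR, per the PR description).
IPADAPTER_MODEL_MAP = {
    ("SD1.5", IPAdapterType.REGULAR): {
        "model": "h94/IP-Adapter/models/ip-adapter_sd15.bin",
        "encoder": "h94/IP-Adapter/models/image_encoder",
    },
    ("SDXL", IPAdapterType.REGULAR): {
        "model": "h94/IP-Adapter/sdxl_models/ip-adapter_sdxl.bin",
        "encoder": "h94/IP-Adapter/sdxl_models/image_encoder",
    },
    ("SD2.1", IPAdapterType.REGULAR): None,
}


def resolve_ipadapter_paths(cfg: dict, model_type: str) -> dict:
    """Return a copy of cfg whose adapter paths match the detected model."""
    cfg = dict(cfg)
    path = cfg.get("ipadapter_model_path", "")

    # Custom or local paths are never overridden.
    if path and not path.startswith(HF_PREFIX):
        return cfg

    adapter_type = cfg.get("type", IPAdapterType.REGULAR)
    entry = IPADAPTER_MODEL_MAP.get((model_type, adapter_type))

    # No adapter exists for this architecture: disable instead of
    # attempting a download that would fail.
    if entry is None:
        cfg["enabled"] = False
        return cfg

    cfg["ipadapter_model_path"] = entry["model"]
    cfg["image_encoder_path"] = entry["encoder"]
    return cfg
```

With this sketch, an SD1.5 adapter path combined with a detected `"SDXL"` model is rewritten to the SDXL adapter and encoder, while a detected `"SD2.1"` model yields `enabled = False`.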

VRAM Management:

  • Text encoder CPU offloading: moves CLIP-L + OpenCLIP-G to CPU after initial prepare() in TRT mode (~1.6 GB VRAM saved on SDXL)
  • Reload on-demand before prompt re-encoding via _reload_text_encoders() / _offload_text_encoders() with try/finally guards
  • Applied consistently to prepare(), update_prompt(), update_stream_params()
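
The reload/offload cycle with a try/finally guard might look roughly like the following. The PR implements it as `_reload_text_encoders()` / `_offload_text_encoders()` methods on the wrapper; the context manager `text_encoders_on_gpu` and the `FakeEncoder` stand-in here are hypothetical and exist only so the sketch runs without a GPU or PyTorch.

```python
from contextlib import contextmanager


class FakeEncoder:
    """Stand-in for a torch.nn.Module; it only records its device."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


@contextmanager
def text_encoders_on_gpu(encoders, device="cuda"):
    """Move encoders to the GPU for the duration of the block.

    The finally clause guarantees the encoders return to the CPU
    even if the reload or the prompt encoding raises.
    """
    try:
        for enc in encoders:
            enc.to(device)
        yield encoders
    finally:
        for enc in encoders:
            enc.to("cpu")


# Usage: re-encode a prompt while the encoders are temporarily resident.
encoders = [FakeEncoder(), FakeEncoder()]
with text_encoders_on_gpu(encoders):
    pass  # prompt encoding would happen here
```

With real PyTorch modules, calling `torch.cuda.empty_cache()` after the offload is a common way to return the freed blocks to the driver; whether the PR does so is not stated in the description.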

Dependency Updates (FP8 prerequisites):

  • New standalone StreamDiffusionTD/install_tensorrt.py (158 lines)
  • Added nvidia-modelopt[onnx], cupy-cuda12x for future FP8 quantization
  • Pinned opencv-contrib-python==4.9.0.80 (avoids an incompatibility with 4.10.x)
  • onnx==1.18.0, onnxruntime-gpu==1.24.3 (IR 11 for FLOAT4E2M1 support)
  • Removed CPU onnxruntime co-install (shared files conflict with GPU variant)
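
One way to check an environment for the CPU/GPU onnxruntime co-install described above is sketched here. Both helper names are hypothetical and are not part of the PR; the sketch relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata


def find_onnxruntime_conflict(installed):
    """Return True if both the CPU and GPU onnxruntime wheels are present."""
    names = {name.lower().replace("_", "-") for name in installed}
    return "onnxruntime" in names and "onnxruntime-gpu" in names


def installed_distribution_names():
    """List the names of all distributions visible to this interpreter."""
    return [
        dist.metadata["Name"]
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    ]


if find_onnxruntime_conflict(installed_distribution_names()):
    print("Both onnxruntime and onnxruntime-gpu are installed; remove the CPU variant.")
```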

Files Modified

| File | Changes |
|---|---|
| ipadapter_module.py | IPADAPTER_MODEL_MAP, resolve_ipadapter_paths(), SD2.1 = None |
| wrapper.py | Auto-resolve hook, text encoder offload/reload methods |
| install_tensorrt.py (StreamDiffusionTD) | New (standalone TRT installer) |
| tools/install-tensorrt.py | FP8 deps, formatting cleanup |
| setup.py | onnx/ort version pins, removed CPU ort |

Impact

| Metric | Before | After |
|---|---|---|
| SD-Turbo + wrong IP-Adapter | cross_attention_dim crash | Auto-corrected or disabled |
| SD2.1 IP-Adapter | 404 download crash | Gracefully disabled |
| IP-Adapter load failure | Pipeline crash | Restores processors, continues |
| Text encoder VRAM (SDXL TRT) | Always on GPU | CPU-offloaded (~1.6 GB saved) |
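
The "restores processors, continues" behavior for an IP-Adapter load failure can be sketched as a snapshot-and-restore around the install step. This is an assumption-laden illustration rather than the code in wrapper.py: `install_ipadapter_safely` and `StubUNet` are hypothetical, and the stub only mimics the `attn_processors` property and `set_attn_processor()` method that diffusers exposes on its UNet models.

```python
class StubUNet:
    """Minimal stand-in exposing the two attention-processor hooks used below."""

    def __init__(self):
        self.attn_processors = {"down.attn1": "default", "mid.attn1": "default"}

    def set_attn_processor(self, processors):
        self.attn_processors = dict(processors)


def install_ipadapter_safely(unet, install_fn):
    """Run install_fn(unet); on any failure, restore the original processors.

    Returns True if the adapter was installed, False if it was skipped.
    """
    saved = dict(unet.attn_processors)  # snapshot before any mutation
    try:
        install_fn(unet)
        return True
    except Exception as exc:
        # Undo partial changes so inference can continue without the adapter.
        unet.set_attn_processor(saved)
        print(f"IP-Adapter disabled after load failure: {exc}")
        return False
```

Returning a boolean instead of re-raising lets the caller mark the adapter as disabled and keep the pipeline running.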

Test plan

  • Load SDXL-Turbo with SD1.5 IP-Adapter path: confirm auto-correction
  • Load SD-Turbo with IP-Adapter enabled: confirm graceful disable
  • Monitor VRAM during SDXL TRT inference: confirm text encoders absent from GPU
  • Call update_prompt() during live inference: confirm reload/offload cycle
  • Verify max_batch_size=4 unchanged in wrapper.py
  • pip install with updated setup.py: verify onnxruntime-gpu only (no CPU variant)

🤖 Generated with Claude Code

INTER-NYC and others added 6 commits April 2, 2026 21:59
Add resolve_ipadapter_paths() to ipadapter_module.py with a mapping
of known h94/IP-Adapter model/encoder paths keyed by (model_type,
IPAdapterType). Wire into wrapper.py:_load_model() after model
detection so both pre-TRT and post-TRT installation paths see the
resolved config.

- SD-Turbo (SD2.1, dim=1024) + sd15 adapter → auto-resolves to sd21
- SDXL-Turbo + sd15 adapter → auto-resolves to sdxl + sdxl encoder
- SD2.1 + plus/faceid → falls back to regular with warning
- Custom/local paths are never overridden
- Updated hardcoded "SD-Turbo is SD2.1-based" warning to generic msg

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…p-adapter_sd21.bin

The h94/IP-Adapter repo never released an SD2.1 adapter. The auto-resolution
logic was mapping SD2.1 to a non-existent HuggingFace path, causing a 404
that crashed the entire pipeline. Now gracefully disables IP-Adapter for
unsupported architectures and continues without it.

Changes:
- ipadapter_module.py: Set SD2.1 REGULAR map entry to None (file never existed)
- ipadapter_module.py: resolve_ipadapter_paths() sets cfg["enabled"]=False when
  no adapter exists for the detected architecture
- wrapper.py: Early guard skips install if auto-resolution disabled IP-Adapter
- wrapper.py: Generic except handler now gracefully skips instead of re-raising

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n SDXL)

Move CLIP-L + OpenCLIP-G to CPU after initial prepare() in TRT mode.
Reload on-demand before prompt re-encoding in prepare(), update_prompt(),
and update_stream_params() with try/finally to ensure always offloaded back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>