feat: IP-Adapter auto-resolution, VRAM offloading & dependency updates #5
Open
forkni wants to merge 6 commits into pr1/inference-performance from
Conversation
Add resolve_ipadapter_paths() to ipadapter_module.py with a mapping of known h94/IP-Adapter model/encoder paths keyed by (model_type, IPAdapterType). Wired into wrapper.py _load_model() after model detection so both the pre-TRT and post-TRT installation paths see the resolved config.

- SD-Turbo (SD2.1, dim=1024) + sd15 adapter → auto-resolves to sd21
- SDXL-Turbo + sd15 adapter → auto-resolves to sdxl + sdxl encoder
- SD2.1 + plus/faceid → falls back to regular with a warning
- Custom/local paths are never overridden
- Updated the hardcoded "SD-Turbo is SD2.1-based" warning to a generic message

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
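The resolution logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual ipadapter_module.py code: the map keys, filenames, and config fields shown here are assumptions made for the example (a later commit changes the SD2.1 entry to None, which this sketch already reflects).

```python
# Hypothetical sketch of the (model_type, IPAdapterType) -> path mapping and
# resolver described in the commit; real keys and filenames may differ.
REPO = "h94/IP-Adapter"

IPADAPTER_MODEL_MAP = {
    ("sd15", "regular"): {"repo": REPO, "weight": "models/ip-adapter_sd15.bin",
                          "encoder": "models/image_encoder"},
    ("sdxl", "regular"): {"repo": REPO, "weight": "sdxl_models/ip-adapter_sdxl.bin",
                          "encoder": "sdxl_models/image_encoder"},
    ("sd21", "regular"): None,  # ip-adapter_sd21.bin was never released
}

def resolve_ipadapter_paths(model_type: str, adapter_type: str, cfg: dict) -> dict:
    """Auto-correct a mismatched IP-Adapter config for the detected architecture."""
    # Custom/local paths supplied by the user are never overridden.
    if cfg.get("custom_path"):
        return cfg
    entry = IPADAPTER_MODEL_MAP.get((model_type, adapter_type))
    if entry is None:
        # Unsupported variant (e.g. plus/faceid): fall back to the regular adapter.
        entry = IPADAPTER_MODEL_MAP.get((model_type, "regular"))
    if entry is None:
        # No adapter exists at all for this architecture: disable gracefully.
        cfg["enabled"] = False
        return cfg
    cfg.update(entry)
    return cfg
```

For example, requesting a "plus" adapter on an SDXL model falls back to the SDXL regular adapter instead of crashing on a cross_attention_dim mismatch.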
…p-adapter_sd21.bin

The h94/IP-Adapter repo never released an SD2.1 adapter. The auto-resolution logic was mapping SD2.1 to a non-existent HuggingFace path, causing a 404 that crashed the entire pipeline. IP-Adapter is now gracefully disabled for unsupported architectures and the pipeline continues without it.

Changes:
- ipadapter_module.py: set the SD2.1 REGULAR map entry to None (the file never existed)
- ipadapter_module.py: resolve_ipadapter_paths() sets cfg["enabled"] = False when no adapter exists for the detected architecture
- wrapper.py: early guard skips installation if auto-resolution disabled IP-Adapter
- wrapper.py: the generic except handler now gracefully skips instead of re-raising

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
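The wrapper-side guard and the softened except handler could look roughly like this. It is a sketch under stated assumptions: the function name, config shape, and the injected installer callable are all hypothetical, not the actual wrapper.py API.

```python
# Hypothetical sketch of the wrapper.py guard described above. The installer
# is passed in as a callable so the skip-on-failure behavior is testable.
import logging

logger = logging.getLogger(__name__)

def maybe_install_ipadapter(cfg: dict, install_fn) -> bool:
    """Install IP-Adapter unless auto-resolution disabled it; never crash the pipeline."""
    if not cfg.get("enabled", True):
        # Early guard: auto-resolution found no adapter for this architecture.
        logger.warning("IP-Adapter disabled for this architecture; continuing without it.")
        return False
    try:
        install_fn(cfg)
    except Exception as exc:
        # Gracefully skip instead of re-raising, so inference keeps running.
        logger.warning("IP-Adapter install failed (%s); continuing without it.", exc)
        return False
    return True
```

The key design point is that a missing or broken adapter degrades the feature, not the whole pipeline: both the early guard and the except handler return False rather than propagate.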
…n SDXL)

Move CLIP-L + OpenCLIP-G to CPU after the initial prepare() in TRT mode. They are reloaded on demand before prompt re-encoding in prepare(), update_prompt(), and update_stream_params(), with try/finally to ensure they are always offloaded back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
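The reload/offload cycle above follows a standard try/finally pattern. The sketch below uses a stand-in encoder class instead of real torch modules (in the actual wrapper, .to("cpu"/"cuda") moves the weights); all class and method names here are illustrative assumptions.

```python
# Device-agnostic sketch of the offload pattern: encoders live on CPU between
# calls and are pulled back to the GPU only for the duration of re-encoding.
class FakeEncoder:
    """Stand-in for a torch text encoder; tracks which device it is 'on'."""
    def __init__(self):
        self.device = "cuda"

    def to(self, device):
        self.device = device
        return self

class Pipeline:
    def __init__(self):
        # Two encoders, mirroring CLIP-L + OpenCLIP-G on SDXL.
        self.text_encoders = [FakeEncoder(), FakeEncoder()]

    def _offload_text_encoders(self):
        for enc in self.text_encoders:
            enc.to("cpu")

    def _reload_text_encoders(self):
        for enc in self.text_encoders:
            enc.to("cuda")

    def update_prompt(self, prompt: str):
        self._reload_text_encoders()
        try:
            # Stand-in for re-encoding the prompt on the GPU.
            return f"embeds({prompt})"
        finally:
            # Always offload back, even if encoding raised.
            self._offload_text_encoders()
```

The try/finally guard is what makes the ~1.6 GB saving safe: a failure during re-encoding cannot strand the encoders on the GPU.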
Summary
Adds architecture-aware auto-resolution of IP-Adapter model paths, text encoder CPU offloading for VRAM savings, and dependency updates for TensorRT/FP8 prerequisites.
Key Changes
IP-Adapter Reliability:
- Added IPADAPTER_MODEL_MAP mapping (model_type, IPAdapterType) → correct h94/IP-Adapter HuggingFace paths
- resolve_ipadapter_paths() auto-corrects mismatched adapters (e.g., SD1.5 adapter on an SDXL model), preventing a cross_attention_dim crash
- SD2.1 REGULAR entry set to None (ip-adapter_sd21.bin was never released by h94), which gracefully disables IP-Adapter instead of crashing with a 404

VRAM Management:
- Text encoders moved to CPU after the initial prepare() in TRT mode (~1.6 GB VRAM saved on SDXL)
- _reload_text_encoders() / _offload_text_encoders() with try/finally guards
- Applied in prepare(), update_prompt(), update_stream_params()

Dependency Updates (FP8 prerequisites):
- Added StreamDiffusionTD/install_tensorrt.py (158 lines)
- Added nvidia-modelopt[onnx] and cupy-cuda12x for future FP8 quantization
- Pinned opencv-contrib-python==4.9.0.80 (away from the 4.10.x incompatibility)
- Pinned onnx==1.18.0 and onnxruntime-gpu==1.24.3 (IR 11 for FLOAT4E2M1 support)
- Blocked the onnxruntime co-install (its shared files conflict with the GPU variant)

Files Modified
- ipadapter_module.py: IPADAPTER_MODEL_MAP, resolve_ipadapter_paths(), SD2.1 = None
- wrapper.py
- install_tensorrt.py (StreamDiffusionTD)
- tools/install-tensorrt.py
- setup.py

Impact
Test plan
- Call update_prompt() during live inference: confirm the reload/offload cycle
- Verify max_batch_size=4 is unchanged in wrapper.py
- Run pip install with the updated setup.py: verify onnxruntime-gpu only (no CPU variant)

🤖 Generated with Claude Code
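The last test-plan check (GPU-only onnxruntime) can be sketched as a small helper. The function name is hypothetical and the installed package names are passed in as a list; in practice they could be gathered with importlib.metadata.distributions().

```python
# Hypothetical checker for the onnxruntime-variant test-plan item: the CPU and
# GPU wheels share files, so having both installed is a conflict.
def check_onnxruntime_variant(installed: list[str]) -> str:
    """Classify the installed onnxruntime situation from a list of package names."""
    names = {n.lower() for n in installed}
    if "onnxruntime" in names and "onnxruntime-gpu" in names:
        return "conflict: CPU and GPU variants share files"
    if "onnxruntime-gpu" in names:
        return "ok: GPU variant only"
    return "missing: onnxruntime-gpu not installed"
```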