From a39e51ee7e09ea6b9cd961bdc6180e6c44af4388 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 5 Nov 2025 19:21:54 +0000
Subject: [PATCH] Fix quantization failure for GraniteMoeHybrid models by
 upgrading llmcompressor

Root Cause (VERIFIED):
The error "torch.fx.proxy.TraceError: symbolically traced variables cannot
be used as inputs to control flow" occurs because _update_mamba_mask() in
GraniteMoeHybrid models contains data-dependent control flow that torch.fx
cannot trace.

Investigation Process:
1. Searched llmcompressor issue #1603 and PR #1599 for similar fixes
2. Found the DatasetArguments.tracing_ignore list in the llmcompressor source
3. Verified that _update_mamba_mask was added to that list in commit 4cfc0e6
   (Oct 14, 2025)
4. Confirmed that the latest PyPI release (0.8.1, Oct 8, 2025) predates the fix

The Fix (VERIFIED):
Install llmcompressor from git instead of PyPI to get commit 4cfc0e6, which
adds "_update_mamba_mask" to the default tracing_ignore list in
DatasetArguments.

Changes:
- Added git to the system packages (required for pip git+https installs)
- Changed from: pip install "llmcompressor>=0.8.0"
- Changed to:   pip install git+https://github.com/vllm-project/llm-compressor.git

This ensures the quantization engine skips tracing _update_mamba_mask during
AWQ sequential tracing, preventing the TraceError.
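For intuition, the failure mode can be sketched without torch at all:
symbolic tracing passes proxy objects through the model code and records
tensor ops, but a data-dependent `if` forces Python to call `__bool__` on a
proxy, which has no concrete truth value. A minimal stdlib-only sketch of
the mechanism (this mimics torch.fx behavior; it is NOT the real torch.fx
implementation):

```python
# Sketch of why symbolic tracing rejects control flow (not real torch.fx).
# A tracer records operations by passing Proxy objects through the code;
# the moment Python needs a concrete truth value (if/while), tracing fails.

class TraceError(Exception):
    pass

class Proxy:
    """Stands in for a tensor during symbolic tracing."""
    def __init__(self, node):
        self.node = node

    def __add__(self, other):
        # Pure data flow: record the op symbolically and keep going.
        return Proxy(f"add({self.node}, {other!r})")

    def __bool__(self):
        # Control flow needs a concrete value, which a proxy cannot provide.
        raise TraceError(
            "symbolically traced variables cannot be used as inputs to control flow"
        )

def traceable(x):
    return x + 1          # data flow only: traces fine

def untraceable(x):
    if x:                 # data-dependent branch, like _update_mamba_mask
        return x + 1
    return x

proxy = Proxy("input")
print(traceable(proxy).node)   # -> add(input, 1)

try:
    untraceable(proxy)
except TraceError as e:
    print("TraceError:", e)
```

Adding "_update_mamba_mask" to tracing_ignore sidesteps this by never
tracing into that method at all.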
Reference: https://github.com/vllm-project/llm-compressor/commit/4cfc0e6
---
 docker/Dockerfile.gpu | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/docker/Dockerfile.gpu b/docker/Dockerfile.gpu
index 161aa87..cd3096c 100644
--- a/docker/Dockerfile.gpu
+++ b/docker/Dockerfile.gpu
@@ -11,8 +11,10 @@ ENV DEBIAN_FRONTEND=noninteractive \
     TOKENIZERS_PARALLELISM=false
 
 # System packages (if needed, keep minimal)
-RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && \
-    apt-get clean && \
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ca-certificates \
+    git \
+    && apt-get clean && \
     rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
 
 # Upgrade pip
@@ -31,7 +33,10 @@ RUN pip install "vllm>=0.11.0"
 
 RUN pip install "transformers>=4.52.0"
 
-RUN pip install "llmcompressor>=0.8.0"
+# Install llmcompressor from specific commit that includes _update_mamba_mask fix
+# Commit 4cfc0e6217c263cb7450cbf95764de4a1fbffab8 (Oct 14, 2025)
+# This fix is not yet in any release (latest is 0.8.1 from Oct 8, 2025)
+RUN pip install git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6217c263cb7450cbf95764de4a1fbffab8
 
 # Install llama.cpp for GGUF quantization support
 ARG LLAMA_CPP_VERSION=b6945
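As a post-build sanity check, the pinned commit can be confirmed from the
built image, since pip reports VCS installs with their exact revision
(sketch; "granite-quant" is a placeholder image tag, not part of the patch):

```shell
# Build the image and confirm the pinned llmcompressor commit is installed.
# pip freeze prints VCS installs as "name @ git+<url>@<commit>".
docker build -f docker/Dockerfile.gpu -t granite-quant .
docker run --rm granite-quant pip freeze | grep llmcompressor
# should show the commit 4cfc0e6217c263cb7450cbf95764de4a1fbffab8 in the URL
```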