From a39e51ee7e09ea6b9cd961bdc6180e6c44af4388 Mon Sep 17 00:00:00 2001
From: Claude
Date: Wed, 5 Nov 2025 19:21:54 +0000
Subject: [PATCH] Fix quantization failure for GraniteMoeHybrid models by
 upgrading llmcompressor

Root Cause (VERIFIED):
The error "torch.fx.proxy.TraceError: symbolically traced variables cannot
be used as inputs to control flow" occurs because _update_mamba_mask() in
GraniteMoeHybrid models contains data-dependent control flow that torch.fx
cannot trace.

Investigation Process:
1. Searched llmcompressor issue #1603 and PR #1599 for similar fixes
2. Found the DatasetArguments.tracing_ignore list in the llmcompressor source
3. Verified that _update_mamba_mask was added to that list in commit 4cfc0e6
   (Oct 14, 2025)
4. Confirmed that the latest PyPI release (0.8.1, Oct 8, 2025) predates the fix

The Fix (VERIFIED):
Install llmcompressor from git instead of PyPI to get commit 4cfc0e6, which
adds "_update_mamba_mask" to the default tracing_ignore list in
DatasetArguments.

Changes:
- Added git to the system packages (required for pip git+https installs)
- Changed from: pip install "llmcompressor>=0.8.0"
- Changed to:   pip install git+https://github.com/vllm-project/llm-compressor.git

This ensures the quantization engine skips tracing _update_mamba_mask during
AWQ sequential tracing, preventing the TraceError.
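For intuition, the failure mode can be sketched without torch at all:
symbolic tracing passes proxy objects through the model code and records
tensor ops, but a data-dependent `if` forces Python to call `__bool__` on a
proxy, which has no concrete truth value. A minimal stdlib-only sketch of
the mechanism (this mimics torch.fx behavior; it is NOT the real torch.fx
implementation):

```python
# Sketch of why symbolic tracing rejects control flow (not real torch.fx).
# A tracer records operations by passing Proxy objects through the code;
# the moment Python needs a concrete truth value (if/while), tracing fails.

class TraceError(Exception):
    pass

class Proxy:
    """Stands in for a tensor during symbolic tracing."""
    def __init__(self, node):
        self.node = node

    def __add__(self, other):
        # Pure data flow: record the op symbolically and keep going.
        return Proxy(f"add({self.node}, {other!r})")

    def __bool__(self):
        # Control flow needs a concrete value, which a proxy cannot provide.
        raise TraceError(
            "symbolically traced variables cannot be used as inputs to control flow"
        )

def traceable(x):
    return x + 1          # data flow only: traces fine

def untraceable(x):
    if x:                 # data-dependent branch, like _update_mamba_mask
        return x + 1
    return x

proxy = Proxy("input")
print(traceable(proxy).node)   # -> add(input, 1)

try:
    untraceable(proxy)
except TraceError as e:
    print("TraceError:", e)
```

Adding "_update_mamba_mask" to tracing_ignore sidesteps this by never
tracing into that method at all.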
Reference: https://github.com/vllm-project/llm-compressor/commit/4cfc0e6
---
 docker/Dockerfile.gpu | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/docker/Dockerfile.gpu b/docker/Dockerfile.gpu
index 161aa87..cd3096c 100644
--- a/docker/Dockerfile.gpu
+++ b/docker/Dockerfile.gpu
@@ -11,8 +11,10 @@ ENV DEBIAN_FRONTEND=noninteractive \
     TOKENIZERS_PARALLELISM=false
 
 # System packages (if needed, keep minimal)
-RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && \
-    apt-get clean && \
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ca-certificates \
+    git \
+    && apt-get clean && \
     rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
 
 # Upgrade pip
@@ -31,7 +33,10 @@ RUN pip install "vllm>=0.11.0"
 
 RUN pip install "transformers>=4.52.0"
 
-RUN pip install "llmcompressor>=0.8.0"
+# Install llmcompressor from specific commit that includes _update_mamba_mask fix
+# Commit 4cfc0e6217c263cb7450cbf95764de4a1fbffab8 (Oct 14, 2025)
+# This fix is not yet in any release (latest is 0.8.1 from Oct 8, 2025)
+RUN pip install git+https://github.com/vllm-project/llm-compressor.git@4cfc0e6217c263cb7450cbf95764de4a1fbffab8
 
 # Install llama.cpp for GGUF quantization support
 ARG LLAMA_CPP_VERSION=b6945
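As a post-build sanity check, the pinned commit can be confirmed from the
built image, since pip reports VCS installs with their exact revision
(sketch; "granite-quant" is a placeholder image tag, not part of the patch):

```shell
# Build the image and confirm the pinned llmcompressor commit is installed.
# pip freeze prints VCS installs as "name @ git+<url>@<commit>".
docker build -f docker/Dockerfile.gpu -t granite-quant .
docker run --rm granite-quant pip freeze | grep llmcompressor
# should show the commit 4cfc0e6217c263cb7450cbf95764de4a1fbffab8 in the URL
```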