Pull the Llama4 inference benchmark from lightning-thunder#5578
Pull the Llama4 inference benchmark from lightning-thunder#5578github-actions[bot] merged 6 commits intomainfrom
Conversation
Greptile OverviewGreptile SummaryPulls the latest Llama4 inference benchmark script from lightning-thunder repository into the fuser repo. This is part of migrating the script from lightning-thunder (where it will be deleted) to fuser for ongoing maintenance. Key Changes:
Dependency Concern:
Confidence Score: 3/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant User
participant main
participant InferenceBenchmark
participant Model
participant Thunder
participant nvFuser
User->>main: Run benchmark script
main->>main: parse_args()
main->>main: _register_nvfp4_ops()
Note over main: Register nvFP4 custom ops<br/>with Thunder/nvFuser
main->>InferenceBenchmark: __init__(config)
InferenceBenchmark->>Model: _load_model()
Note over Model: Load on meta device
InferenceBenchmark->>Model: _replace_llama4_moe()
Note over Model: Replace HF MoE with custom<br/>Llama4MoE using GroupedSwiGLU
InferenceBenchmark->>Model: parallelize_module()
Note over Model: Apply tensor parallelism
InferenceBenchmark->>Model: to_empty(device)
Note over Model: Materialize on GPU
InferenceBenchmark->>Model: _quantize_llama4()
Note over Model: Replace GroupedSwiGLU with<br/>NVFP4InferenceGroupedSwiGLU
InferenceBenchmark->>Thunder: _compile_model()
Thunder->>nvFuser: Apply transforms
Note over Thunder,nvFuser: thunderfx/thunder.jit compilation
InferenceBenchmark->>InferenceBenchmark: run_benchmark()
loop warmup_iterations
InferenceBenchmark->>Model: generate()
Model->>Thunder: forward()
Thunder->>nvFuser: Execute fused kernels
end
loop num_iterations
InferenceBenchmark->>Model: measure_inference_step()
Model->>Model: prefill()
Model->>Model: decode_one_token() x N
InferenceBenchmark->>InferenceBenchmark: Track metrics
end
InferenceBenchmark->>User: print_results()
|
|
Review updated until commit f5d05ae Auto-merge Status✅ PR is approved Description
|
| Relevant files | |||||
|---|---|---|---|---|---|
| Enhancement |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| 🔒 No security concerns identified |
| ⚡ Recommended focus areas for review |
Missing Tests
|
wujingyue
left a comment
There was a problem hiding this comment.
I didn't review the nvfp4 stuff. Other changes LGTM!
| group_outs = [] | ||
| for group_a, group_b in zip(a.split(group_sizes), b.unbind()): | ||
| group_outs.append(group_a @ group_b) | ||
| for idx, group_a in enumerate(a.split(group_sizes)): |
There was a problem hiding this comment.
I don't think this fallback implementation is necessary any more. Lightning-AI/lightning-thunder#2721
But this can come as a different PR.
There was a problem hiding this comment.
we only build against the latest stable and nightly
790d7d7 to
f5d05ae
Compare
|
!test |
There was a problem hiding this comment.
Additional Comments (1)
-
benchmarks/python/layers_for_inference_benchmark.py, line 550 (link)style: imports from test module
thunder.tests.llama4_moe- verify this is intended to be part of thunder's public API when installed as a dependency, or if it should use a different public module
2 files reviewed, 1 comment
|
The latest internal pipeline run actually had two failures jit_binary_distributed_tests_20_GB200 and jit_python_distributed_tests_20_GB200 . Those were not detected by the auto-merge workflow because I missed the pagination of CI status check and only checked the latest 30 statuses, which were all successful. I'm working on a fix for that.
|
This fixes a severe bug where the auto-merge workflow only checked the first 30 commit statuses, causing it to miss failures and incorrectly merge PRs with failing checks. Root cause analysis: - PR #5578 had 2 failed GB200 tests at 23:23-23:25 UTC - By 03:29 UTC, 27+ new successful statuses pushed failures past position 30 - Workflow only fetched first page (30 items), saw 0 failures, and merged Fixed 4 critical pagination issues: 1. listCommitStatusesForRef (line 140) - CRITICAL: Only saw 30 of 57 statuses 2. checks.listForRef (line 173) - Could miss failed checks if >30 exist 3. issues.listComments (line 349) - Wouldn't find status comment if >30 comments 4. pulls.list (line 64) - Could miss PR if >30 open PRs on branch All API calls now use github.paginate() to retrieve complete results. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
## Summary Fixes a critical bug where the auto-merge workflow only fetched the first 30 results from GitHub API list operations, causing it to miss failed checks and incorrectly merge PRs. ## Root Cause PR #5578 had 2 failed GB200 tests that occurred early in the CI run. By the time the auto-merge action ran 4+ hours later, 27 newer successful statuses had been created. Since the workflow used unpaginated API calls (default limit: 30 items), the failed statuses were pushed beyond the first page and never detected. ## Changes Fixed 4 GitHub API calls to use `github.paginate()`: 1. `listCommitStatusesForRef` - Was only checking 30 of 57+ statuses 2. `checks.listForRef` - Could miss failed checks if >30 exist 3. `issues.listComments` - Could miss status comment if >30 comments 4. `pulls.list` - Could miss PR if >30 open PRs on branch Also simplified the `pr_approved` check logic which was deriving approval status from `mergeable_state` in a confusing way. The workflow now shows the actual `mergeable_state` value in status comments for transparency. ## Impact The auto-merge workflow will now correctly detect ALL failures regardless of how many statuses exist, preventing incorrect merges like #5578. --------- Co-authored-by: Claude <noreply@anthropic.com>
Pull the latest version of this script after substantial changes in the lightning repo. The script is being "moved" into the fuser repo - it will be deleted from lightning-thunder, and subsequent changes will be merged into the fuser repo.
The script does not have any nvfuser changes at this moment. No configs or model code are pulled inside yet.