[Fix][CI] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner#774
[Fix][CI] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner#774superAngGao wants to merge 3 commits intotile-ai:mainfrom
Conversation
…ction Op instances store autotune config on the Kernel attribute (op.kernel.config), not directly on the Op. Add fallback so BenchmarkReport correctly captures config in nightly reports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the benchmark recording logic in benchmarks/benchmark.py to attempt to retrieve configuration from an operator's kernel if it is not directly available on the operator itself. A review comment correctly identifies a potential AttributeError in the implementation, as the nested getattr call fails to handle cases where the kernel attribute is missing or null. A safer implementation was suggested to ensure the kernel exists before accessing its configuration.
flash_attn_interface.flash_attn_with_kvcache() (FA3) uses `page_table` instead of `block_table` (FA2). This caused all paged decode benchmarks with page_size=256 to fail in nightly. Fixes 6 FAILED tests in nightly run #23935448516. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Updated with fix for #776: FA3 Verified locally: 11 passed, 3 skipped, 0 failed. |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ases in nightly (#817) ## Summary Restores **21 nightly benchmark cases** that were failing in [run 24046226885](https://github.com/tile-ai/TileOPs/actions/runs/24046226885) due to **two independent root causes**: | Bucket | Cases | Symptom | Root cause | |---|---|---|---| | `softmax` / `log_softmax` / `logsumexp` bench | 15 | `AttributeError: Cannot autotune SoftmaxKernel: 'self.kernel' is not set` | `SoftmaxKernel.__init__` calls `init_config(config, tune)` *before* setting `self.kernel`; when `tune=True`, `init_config()` delegates to `autotune()` which requires `self.kernel`. Introduced by #801. | | GQA / MHA paged decode bench (FA3 baseline) | 6 | `TypeError: flash_attn_with_kvcache() got an unexpected keyword argument 'block_table'` | `flash_attn_interface` renamed its paged-attention kwarg from `block_table` to `page_table`. | The two issues are independent and split into two commits. This is a follow-up to #774 (which partially addressed the FA3 rename); the softmax bug landed afterwards via #801 and needed a separate fix. ## Commits ### `[Fix][Benchmark] Build softmax-family kernels before init_config` Reorders `SoftmaxKernel` and `LogSumExpKernel` `__init__` to construct `self.kernel` *before* calling `init_config()`. `tile_n` is baked into the kernel at build time, so we pre-build with the heuristic `block_m` from `default_config`. `autotune_configs` is already filtered to a single `tile_n`, so the pre-built kernel matches the search space. If a user-provided `config` selects a `block_m` mapping to a different `tile_n`, the kernel is rebuilt (cheap due to the `lru_cache` on `_softmax_kernel` / `_logsumexp_kernel`). Adds `tune=True` regression cases to `SoftmaxFixture` / `LogSoftmaxFixture` / `LogSumExpFixture` — existing `tune=False` cases never exercised the broken path, which is how #801 slipped through. ### `[Fix][Benchmark] Rename FA3 paged-decode baseline kwarg block_table to page_table` Two-line change in `bench_gqa_decode_paged.py` and `bench_mha_decode_paged.py`. Verified the FA3 signature in `tileops-runner:latest` lists `page_table` and no `block_table`. ## Test plan All verification was done on a local GPU with the upstream `tileops-runner:latest` image and the worktree overlaid on `/home/ci-runner/tileops` so both the bench files **and** the kernel fix are exercised: - [x] **21 / 21 previously-failing nightly bench cases pass** in `tileops-runner:latest` (3 GQA paged + 3 MHA paged + 5 softmax + 5 log_softmax + 5 logsumexp), 9m19s end-to-end - [x] **104 / 104** `tests/ops/test_softmax.py -m full` cases pass on host (includes the 3 new `tune=True` regression cases + all pre-existing `tune=False` cases — no regression) - [x] **15 / 15** `benchmarks/ops/bench_softmax.py` cases pass on host - [x] FA3 signature inspection in `tileops-runner:latest` confirms `page_table` is the new kwarg - [x] All pre-commit hooks pass (incl. gitleaks, ruff, codespell) ## Notes - The 3 new `tune=True` regression params are marked `pytest.mark.full` so the nightly `op_test` job catches the same bug class before it reaches the bench job. - This PR does **not** modify the manifest or change op interfaces, in line with the trust model.
🪿 goose review —
|
Summary
op.config→op.kernel.config(subsequently refactored out)block_table→page_table(renamed in flash-attn-interface)mamba-ssm==2.3.1to runner Dockerfile for Mamba-2 benchmarksTest plan
🤖 Generated with Claude Code