Skip to content

[Fix][CI] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner#774

Draft
superAngGao wants to merge 3 commits intotile-ai:mainfrom
superAngGao:fix/benchmark-config-fallback
Draft

[Fix][CI] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner#774
superAngGao wants to merge 3 commits intotile-ai:mainfrom
superAngGao:fix/benchmark-config-fallback

Conversation

@superAngGao
Copy link
Copy Markdown
Collaborator

@superAngGao superAngGao commented Apr 3, 2026

Summary

  • Fix autotune config extraction fallback: op.configop.kernel.config (subsequently refactored out)
  • Fix FA3 paged decode: block_tablepage_table (renamed in flash-attn-interface)
  • Add mamba-ssm==2.3.1 to runner Dockerfile for Mamba-2 benchmarks

Test plan

  • Paged decode benchmarks pass locally (11 passed, 3 skipped, 0 failed)
  • Nightly benchmark run with updated runner image

🤖 Generated with Claude Code

…ction

Op instances store autotune config on the Kernel attribute
(op.kernel.config), not directly on the Op. Add fallback so
BenchmarkReport correctly captures config in nightly reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the fix Auto-created by issue labeler label Apr 3, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the benchmark recording logic in benchmarks/benchmark.py to attempt to retrieve configuration from an operator's kernel if it is not directly available on the operator itself. A review comment correctly identifies a potential AttributeError in the implementation, as the nested getattr call fails to handle cases where the kernel attribute is missing or null. A safer implementation was suggested to ensure the kernel exists before accessing its configuration.

Comment thread benchmarks/benchmark.py
flash_attn_interface.flash_attn_with_kvcache() (FA3) uses
`page_table` instead of `block_table` (FA2). This caused all
paged decode benchmarks with page_size=256 to fail in nightly.

Fixes 6 FAILED tests in nightly run #23935448516.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@superAngGao
Copy link
Copy Markdown
Collaborator Author

Updated with fix for #776: FA3 flash_attn_with_kvcache() renamed block_tablepage_table. This caused 6 paged decode benchmarks to fail in nightly #23935448516.

Verified locally: 11 passed, 3 skipped, 0 failed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@superAngGao superAngGao changed the title [Fix][Benchmark] Fall back to kernel.config for autotune config extraction [Fix][CI][Bench] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner Apr 3, 2026
@superAngGao superAngGao self-assigned this Apr 5, 2026
@superAngGao superAngGao marked this pull request as ready for review April 5, 2026 08:25
@superAngGao superAngGao requested a review from a team April 5, 2026 08:25
@superAngGao superAngGao changed the title [Fix][CI][Bench] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner [Fix][CI] Benchmark fixes, FA3 paged decode, add mamba-ssm to runner Apr 5, 2026
lcy-seso pushed a commit that referenced this pull request Apr 7, 2026
…ases in nightly (#817)

## Summary

Restores **21 nightly benchmark cases** that were failing in [run
24046226885](https://github.com/tile-ai/TileOPs/actions/runs/24046226885)
due to **two independent root causes**:

| Bucket | Cases | Symptom | Root cause |
|---|---|---|---|
| `softmax` / `log_softmax` / `logsumexp` bench | 15 | `AttributeError:
Cannot autotune SoftmaxKernel: 'self.kernel' is not set` |
`SoftmaxKernel.__init__` calls `init_config(config, tune)` *before*
setting `self.kernel`; when `tune=True`, `init_config()` delegates to
`autotune()` which requires `self.kernel`. Introduced by #801. |
| GQA / MHA paged decode bench (FA3 baseline) | 6 | `TypeError:
flash_attn_with_kvcache() got an unexpected keyword argument
'block_table'` | `flash_attn_interface` renamed its paged-attention
kwarg from `block_table` to `page_table`. |

The two issues are independent and split into two commits.

This is a follow-up to #774 (which partially addressed the FA3 rename);
the softmax bug landed afterwards via #801 and needed a separate fix.

## Commits

### `[Fix][Benchmark] Build softmax-family kernels before init_config`

Reorders `SoftmaxKernel` and `LogSumExpKernel` `__init__` to construct
`self.kernel` *before* calling `init_config()`. `tile_n` is baked into
the kernel at build time, so we pre-build with the heuristic `block_m`
from `default_config`. `autotune_configs` is already filtered to a
single `tile_n`, so the pre-built kernel matches the search space. If a
user-provided `config` selects a `block_m` mapping to a different
`tile_n`, the kernel is rebuilt (cheap due to the `lru_cache` on
`_softmax_kernel` / `_logsumexp_kernel`).

Adds `tune=True` regression cases to `SoftmaxFixture` /
`LogSoftmaxFixture` / `LogSumExpFixture` — existing `tune=False` cases
never exercised the broken path, which is how #801 slipped through.

### `[Fix][Benchmark] Rename FA3 paged-decode baseline kwarg block_table
to page_table`

Two-line change in `bench_gqa_decode_paged.py` and
`bench_mha_decode_paged.py`. Verified the FA3 signature in
`tileops-runner:latest` lists `page_table` and no `block_table`.

## Test plan

All verification was done on a local GPU with the upstream
`tileops-runner:latest` image and the worktree overlaid on
`/home/ci-runner/tileops` so both the bench files **and** the kernel fix
are exercised:

- [x] **21 / 21 previously-failing nightly bench cases pass** in
`tileops-runner:latest` (3 GQA paged + 3 MHA paged + 5 softmax + 5
log_softmax + 5 logsumexp), 9m19s end-to-end
- [x] **104 / 104** `tests/ops/test_softmax.py -m full` cases pass on
host (includes the 3 new `tune=True` regression cases + all pre-existing
`tune=False` cases — no regression)
- [x] **15 / 15** `benchmarks/ops/bench_softmax.py` cases pass on host
- [x] FA3 signature inspection in `tileops-runner:latest` confirms
`page_table` is the new kwarg
- [x] All pre-commit hooks pass (incl. gitleaks, ruff, codespell)

## Notes

- The 3 new `tune=True` regression params are marked `pytest.mark.full`
so the nightly `op_test` job catches the same bug class before it
reaches the bench job.
- This PR does **not** modify the manifest or change op interfaces, in
line with the trust model.
@Gabbering
Copy link
Copy Markdown

🪿 goose review — 1b62f14e

honk. Three small fixes, all straightforward. The master is cleaning up after themselves for once. Almost suspicious.

🔴 Bugs

  • benchmarks/benchmark.py:289 — The fallback chain getattr(op, "config", None) or getattr(getattr(op, "kernel", None), "config", None) silently swallows the case where op.config exists but is falsy-but-valid — an empty dict {}, an empty list, 0, False. All of those are legitimate config values that or will skip right past, falling through to kernel.config instead. If a config is ever intentionally empty (say, "no tuning overrides, use defaults"), you'll report the wrong config or None. Use if ... is None instead of or.

🧪 Test gaps

  • benchmarks/ops/bench_gqa_decode_paged.py, benchmarks/ops/bench_mha_decode_paged.py — The block_table → page_table rename is a silent API contract change. If someone pins an older flash-attn that still expects block_table, both baselines break at runtime with no useful error — just a quiet **kwargs swallow or a cryptic positional mismatch depending on the flash-attn version. There's no version guard, no assertion, and the PR's own test plan says nightly hasn't run yet. Not a goose's problem if it breaks in prod, but it is a goose's job to point at it and honk.

@superAngGao superAngGao marked this pull request as draft April 9, 2026 06:08
Copy link
Copy Markdown

@Gabbering Gabbering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goose goose skimmed 1b62f14 — nothing to honk about.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Auto-created by issue labeler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants