Development updates 20260507 #22
Merged
The CuTe C++ oracle currently auto-skips when CUTLASS headers aren't on its short hard-coded candidate list. Add CUTLASS_INCLUDE_DIR and CUTLASS_PATH env-var overrides so callers can point at an out-of-tree install (e.g. a vendored fbsource tree) without editing the file. Also add /usr/local/cuda-12.8/include to the default candidates so x86 boxes with stock CUDA 12.8 pick up cuda/std/utility automatically.
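A minimal sketch of how such an override lookup could work. The function name, the precedence (CUTLASS_INCLUDE_DIR before CUTLASS_PATH before the defaults), the assumption that CUTLASS_PATH points at a checkout root containing an include/ directory, and the cute/layout.hpp probe are all illustrative assumptions, not the oracle's actual code.

```python
import os
from pathlib import Path

# Hypothetical default list; only the CUDA 12.8 entry is taken from the description.
DEFAULT_CANDIDATES = ["/usr/local/cuda-12.8/include"]


def find_cutlass_include(env=None, candidates=DEFAULT_CANDIDATES):
    """Return the first directory that contains cute/layout.hpp, or None."""
    env = os.environ if env is None else env
    search = []
    # Assumed precedence: an explicit include dir wins over a checkout root.
    if env.get("CUTLASS_INCLUDE_DIR"):
        search.append(Path(env["CUTLASS_INCLUDE_DIR"]))
    if env.get("CUTLASS_PATH"):
        search.append(Path(env["CUTLASS_PATH"]) / "include")
    search.extend(Path(c) for c in candidates)
    for directory in search:
        if (directory / "cute" / "layout.hpp").is_file():
            return directory
    return None  # caller can auto-skip the oracle in this case
```

Passing the environment as a plain dict keeps the lookup testable without mutating os.environ.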
complement() previously raised AttributeError on ComposedLayout because its inner code path reaches for .stride, which ComposedLayout intentionally omits. CuTe C++ (layout_composed.hpp:395-409) defines complement(ComposedLayout) as complement of the inner only -- the outer is an involution / permutation and doesn't change the codomain image. Mirror that here, plus a Python regression test and a CuTe C++ oracle case so the parity stays anchored.
When slicing a ComposedLayout(Swizzle, affine, preoffset), CuTe C++ checks whether the surviving inner's image hits both the swizzle's Y and Z bits. If it doesn't, the swizzle is affine on the surviving subspace and CuTe collapses the wrapper into a plain Layout with per-bit XOR'd strides plus a constant base offset (cute/swizzle_layout.hpp:263-294). tensor-layouts always re-wrapped as another ComposedLayout, so equivalent slice expressions printed as larger composed trees than CuTe's. Add the same decay path: build an explicit affine bit-decomposition of the swizzle for the (now constant) preoffset, compose with the inner, and verify the result reproduces the swizzled output before returning. On any reducibility failure the helper bails and the existing re-wrap runs unchanged. The Layout-with-embedded-swizzle slice path is intentionally left alone because tensor-layouts' Tensor model applies the tensor's base offset INSIDE the embedded swizzle (see tensor.py::_tensor_address). Decaying there would silently break Tensor offset semantics; decay only runs for genuine ComposedLayout where the preoffset is fully internal.
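The decay idea can be illustrated with a self-contained sketch. It covers only a simplified case: size-2 modes with distinct power-of-two strides, a positive swizzle shift, and an inner image that touches none of the Y bits. The names swizzle and try_decay are hypothetical and do not mirror the helper added in this change; the final brute-force check plays the role of the "verify before returning" step.

```python
from itertools import product


def swizzle(x, bits, base, shift):
    """CuTe-style swizzle for shift > 0: XOR the Y bits down onto the Z bits."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    return x ^ ((x & y_mask) >> shift)


def try_decay(strides, offset, bits, base, shift):
    """Return (base_offset, signed_strides) for size-2 modes, or None to bail."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    touched = 0
    for s in strides:
        if s <= 0 or s & (s - 1):  # this sketch only handles power-of-two strides
            return None
        touched |= s
    # Bail if the image reaches the Y bits, or overlaps the offset (carries).
    if touched & y_mask or touched & offset:
        return None
    base_offset = swizzle(offset, bits, base, shift)
    # XOR with a bit that is already set subtracts it, hence the negative stride.
    new_strides = [-s if base_offset & s else s for s in strides]
    # Verify the affine form reproduces the swizzled output on every coordinate.
    for coord in product((0, 1), repeat=len(strides)):
        inner = sum(c * s for c, s in zip(coord, strides))
        want = swizzle(offset + inner, bits, base, shift)
        got = base_offset + sum(c * s for c, s in zip(coord, new_strides))
        if want != got:
            return None
    return base_offset, new_strides


print(try_decay([1, 2], 8, bits=2, base=0, shift=3))  # (9, [-1, 2])
print(try_decay([1, 8], 0, bits=2, base=0, shift=3))  # None: stride 8 hits a Y bit
```

In the first call the constant offset 8 sets a Y bit, so the swizzle flips bit 0 of every output; the mode with stride 1 therefore walks downward from the base offset 9.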
logical_product(Layout, ComposedLayout(Swizzle, Layout)) used to silently drop the embedded swizzle: the inner compose produced a Layout-with-embedded-swizzle, but the final tuple-Layout constructor discarded the swizzle attribute, returning a semantically wrong plain layout. Add the explicit transfer-swizzle path matching CuTe C++ (cute/swizzle_layout.hpp:549-587): do the affine product on the inner tile, then derive new active Y/Z masks under the new product layout and wrap the result in the freshly-constructed swizzle. Also preserve any embedded swizzle that the generic compose path produces in the fallback constructor, so swizzled-tile products no longer lose their swizzle silently. The new path falls back to ComposedLayout when the new active masks aren't a representable Swizzle.
Add a second oracle case exercising the "apparent violation" truncation path from paper section 3.3.3, where B's image fits entirely inside the current LHS mode. The new case uses a simpler (8,8):(3,97) layout that matches the paper example more directly.
CuTe's right_inverse/left_inverse of a swizzle-fronted ComposedLayout with nonzero preoffset produces ComposedLayout<Layout, Offset, Swizzle>. Our right_inverse() and left_inverse() already build exactly that form in their fallback path, but the constructor invariant rejected Swizzle in the inner slot, so any inversion of a swizzle-fronted ComposedLayout with nonzero offset raised TypeError before the user could use the result. Loosen the invariant and update the small set of consumers that recurse into .inner so they handle the Swizzle leaf:
- shape returns outer.shape (1-D integer domain)
- cosize recurses into outer (Swizzle is bijective)
- flatten/mode/_forward_layout_domain/complement guard the Swizzle leaf
- _slice_for_composition rejects sub-domain slicing on the Swizzle
The new ComposedLayout(Layout, Swizzle, offset) form (introduced when right_inverse / left_inverse handle a swizzle-fronted ComposedLayout with nonzero preoffset) has a 1-D integer domain and no affine stride to tile against. logical_product was unprepared and crashed in the affine fallback with AttributeError on .stride.
A differential survey of ComposedLayout behavior flagged that
several common operations -- complement, coalesce, compose, and the
degenerate right_inverse case -- have no CuTe C++ oracle entry, so any
divergence between our impl and CuTe ToT goes unnoticed.
Add 20 new entries covering complement, coalesce, compose(_, Layout(4,1)),
and right_inverse on the canonical form variants F2-F8. All pass against
local CuTe headers, pinning the current behavior:
- complement(ComposedLayout) returns Layout(1, 0) for every form.
- coalesce is evaluation-preserving (compared pointwise).
- compose with a Layout RHS truncates the inner.
- right_inverse on a Layout-outer ComposedLayout collapses to the
contiguous prefix when the outer has a stride gap (size 1 result).
Also fix print_offsets_n to static_cast<int>() the layout's output --
without it, CuTe's compile-time integer types print as "_-2" / "_0" and
trip the byte-for-byte pointwise comparison.
The inverse-form ComposedLayout (outer=Layout, inner=Swizzle, possibly negative offset) arises from right_inverse / left_inverse on a swizzle-fronted ComposedLayout with a nonzero offset. CuTe C++ refuses every non-trivial op on this form. We previously either silently returned a degenerate result (complement -> Layout(1,0), coalesce -> input unchanged) or recursed into a callee that raised with a misleading op name (logical_divide via coalesce).

Now: structurally allowed, semantically narrow, errors loud. complement, coalesce, logical_divide, and logical_product all raise NotImplementedError. The form remains usable for the ops the inverse-and-cancel round trip needs: __call__, size, cosize, shape, rank, depth, flatten, right_inverse, left_inverse, compose.

Tensor._validate_storage already enumerated the addressed range, but it folded the negative-address and oversized-storage cases into one message. Split them so the negative-address case names both legitimate sources. The check is triggered at construction and on the .data setter, so there is no bypass.
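As an illustration of the split error reporting, here is a hedged sketch of a storage check that enumerates the addressed range and raises a distinct message for each failure. The function name, signature, and message wording are hypothetical and are not taken from Tensor._validate_storage.

```python
def validate_storage(layout_fn, size, storage_len):
    """Raise ValueError if layout_fn addresses anything outside [0, storage_len)."""
    addresses = [layout_fn(i) for i in range(size)]
    if not addresses:
        return  # an empty domain addresses nothing
    lowest, highest = min(addresses), max(addresses)
    if lowest < 0:
        # Reported separately so the caller can tell this apart from a short buffer.
        raise ValueError(f"layout reaches negative address {lowest}")
    if highest >= storage_len:
        raise ValueError(
            f"layout reaches address {highest} but storage holds only "
            f"{storage_len} elements"
        )


validate_storage(lambda i: 2 * i, size=4, storage_len=7)  # addresses 0..6: passes
```

Checking the negative case first means a layout that fails both ways reports the negative address, which is usually the more surprising of the two.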
Technically a non-functional change, but API breaking!
The inverse-form ComposedLayout has a rank-1 integer domain with no multi-mode structure to merge and no size-1 modes to filter, so the only correct answer for coalesce is the input itself. CuTe C++ refuses this form because its template delegates to the inner Swizzle which doesn't satisfy the Layout interface; we can answer it directly because we know the answer is structural and trivial.
Previously cosize() for ComposedLayout delegated to cosize(inner) (or
cosize(outer) for the inverse-form). This is wrong: it ignores the outer
layout and the offset, so for forms where those modify the codomain the
reported cosize doesn't match the actual image extent. There are five forms
where cosize was either smaller than max(L(c))+1 (under-reports
the buffer needed) or wrong in a milder direction.
The only definition that matches the actual codomain extent for every
non-affine form is:
cosize(L) := max(L(i) for i in range(size(L))) + 1
ComposedLayout has no closed form because the outer can be a Swizzle, a
non-bijective Layout, or another ComposedLayout that permutes or
rescales the inner's image. cosize on a ComposedLayout is now
O(size(L)) instead of O(1). The affine path is unchanged.
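The enumeration-based definition can be sketched in a few lines. The layouts below are plain Python functions standing in for Layout and ComposedLayout objects, and applying the offset before the outer is an assumption made for this example only.

```python
def cosize_by_enumeration(layout_fn, size):
    """cosize(L) := max(L(i) for i in range(size(L))) + 1, costing O(size)."""
    return max(layout_fn(i) for i in range(size)) + 1


def inner(i):
    return i  # stand-in for the affine layout 4:1


def outer(j):
    return 2 * j  # stand-in for an outer layout with stride 2


def composed(i, offset=0):
    # Assumed evaluation order for this sketch: outer(offset + inner(i)).
    return outer(offset + inner(i))


print(cosize_by_enumeration(inner, 4))                     # 4
print(cosize_by_enumeration(composed, 4))                  # 7
print(cosize_by_enumeration(lambda i: composed(i, 3), 4))  # 13
```

Delegating to the inner alone would report 4 in all three cases, which under-reports the buffer needed once the outer or the offset moves the image.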
to_F2_matrix changes:
- Now accepts ComposedLayout when the form is F2-linear (zero offset, no Swizzle in the inner slot). Internally computes the matrix as the product M_outer @ M_inner over GF(2) and trims trailing zero rows so the output's row count matches the actual codomain bit-width.
- Rejects with ValueError (not TypeError) the non-F2-linear cases: nonzero offset (affine translation) and the inverse-form ComposedLayout(Layout, offset, Swizzle) (the forward IS F2-linear -- invert the matrix in GF(2) instead).

from_F2_matrix(M, shape) -> Layout:
- Requires shape because the matrix loses the partition of input bits into modes.
- Tries plain affine reconstruction first; if the column values match a clean power-of-2 progression per mode, returns Layout(shape, strides). Otherwise brute-force searches over Swizzle(bits, base, shift) candidates: for each, apply S to M (Swizzle is involutive over F2) and re-attempt affine reconstruction. If found, returns Layout with embedded Swizzle. If no single-Swizzle factorization works, raises NotImplementedError (Triton's LinearLayout handles multi-swizzle via Smith-normal-form-style decomposition; we don't replicate that).
- Round-trip identity: from_F2_matrix(to_F2_matrix(L), L.shape) == L
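The F2-matrix view can be sketched without the library. Here a matrix is stored as a list of column integers, where column k is the image of input bit k. That representation and the helper names are assumptions for illustration; they say nothing about the actual return type of to_F2_matrix.

```python
def swizzle(x, bits, base, shift):
    """CuTe-style swizzle for shift > 0: XOR the Y bits down onto the Z bits."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    return x ^ ((x & y_mask) >> shift)


def f2_columns(fn, n_bits):
    """Columns of an F2-linear map: column k is fn(1 << k)."""
    return [fn(1 << k) for k in range(n_bits)]


def f2_apply(cols, x):
    """Multiply the matrix by the bit-vector x over GF(2): XOR the selected columns."""
    out = 0
    for k, col in enumerate(cols):
        if (x >> k) & 1:
            out ^= col
    return out


def f2_matmul(outer_cols, inner_cols):
    """M_outer @ M_inner over GF(2): push each inner column through the outer."""
    return [f2_apply(outer_cols, c) for c in inner_cols]


S = f2_columns(lambda x: swizzle(x, 2, 0, 3), 5)  # 5-bit swizzle matrix
inner = f2_columns(lambda i: 8 * i, 2)            # stand-in for layout 4:8
M = f2_matmul(S, inner)

print(S)                # [1, 2, 4, 9, 18]
print(M)                # [9, 18]
print(f2_matmul(S, S))  # [1, 2, 4, 8, 16], the identity: the swizzle is involutive
```

Because applying the swizzle matrix twice yields the identity, a reconstruction routine can undo a candidate swizzle by multiplying by it once more, which is the property the brute-force search above relies on.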
[New] Forward complement() through ComposedLayout
[New] Decay swizzled ComposedLayout slices to plain Layouts when the swizzle's Y/Z bits aren't both touched on the surviving subspace
[New] Allow Swizzle in ComposedLayout's inner slot — the inverse-form arising from right_inverse / left_inverse on offset-bearing swizzle-fronted ComposedLayouts
[New] Support coalesce on ComposedLayout(Layout, offset, Swizzle) as a no-op (rank-1 inverse-form has no structure to merge)
[New] Strengthen to_F2_matrix() to accept any F2-linear ComposedLayout; add from_F2_matrix() inverse constructor with affine + brute-force-Swizzle-extraction reconstruction; round-trip identity holds
[New] CuTe C++ oracle: env-var override (CUTLASS_PATH / CUTLASS_INCLUDE_DIR) for out-of-tree CUTLASS
[New] CuTe C++ oracle coverage: 20 new entries pinning complement, coalesce, compose, right_inverse on ComposedLayout form variants F2-F8; compose_truncation_paper case for paper section 3.3.3
[Fix] cosize(ComposedLayout) now uses max(L(i)) + 1 enumeration over the full domain, O(n) instead of O(1); the previous delegation to inner-or-outer mis-reported the codomain extent for five common forms and could cause buffer under-allocation
[Fix] Transfer swizzle through logical_product against swizzled tiles (was silently dropping the embedded swizzle and returning a semantically wrong plain layout)
[Fix] Tensor address-boundary sanitizer distinguishes negative-address vs out-of-upper-bound failures with separate messages naming both legitimate sources
[Robustness] Reject complement / logical_product / logical_divide on the inverse-form ComposedLayout(Layout, offset, Swizzle) with NotImplementedError
[Robustness] Reject logical_product on ComposedLayout(Layout, Swizzle, offset) (was crashing in the affine fallback with AttributeError on .stride)
[API] Rename ComposedLayout.preoffset → offset
[Docs] Document supported / unsupported ops for the inverse-form ComposedLayout in ComposedLayout docstring and docs/layout_api.md
[Docs] Document strengthened to_F2_matrix and new from_F2_matrix in docs/analysis_api.md