
Development updates 20260507 #22

Merged: jduprat merged 13 commits into facebookresearch:main from jduprat:dev on May 7, 2026

jduprat commented May 7, 2026

[New] Forward complement() through ComposedLayout
[New] Decay swizzled ComposedLayout slices to plain Layouts when the swizzle's Y/Z bits aren't both touched on the surviving subspace
[New] Allow Swizzle in ComposedLayout's inner slot: this is the inverse form that arises from right_inverse / left_inverse on offset-bearing, swizzle-fronted ComposedLayouts
[New] Support coalesce on ComposedLayout(Layout, offset, Swizzle) as a no-op (rank-1 inverse-form has no structure to merge)
[New] Strengthen to_F2_matrix() to accept any F2-linear ComposedLayout; add a from_F2_matrix() inverse constructor that tries affine reconstruction first and falls back to brute-force Swizzle extraction; the round-trip identity holds
[New] CuTe C++ oracle: env-var override (CUTLASS_PATH / CUTLASS_INCLUDE_DIR) for out-of-tree CUTLASS
[New] CuTe C++ oracle coverage: 20 new entries pinning complement, coalesce, compose, right_inverse on ComposedLayout form variants F2-F8; compose_truncation_paper case for paper section 3.3.3
[Fix] cosize(ComposedLayout) now enumerates the full domain and returns max(L(i)) + 1, which is O(n) instead of O(1); the previous delegation to the inner (or outer) layout mis-reported the codomain extent for five common forms and could cause buffer under-allocation
[Fix] Transfer swizzle through logical_product against swizzled tiles (was silently dropping the embedded swizzle and returning a semantically wrong plain layout)
[Fix] Tensor address-boundary sanitizer distinguishes negative-address from out-of-upper-bound failures with a separate message for each; the negative-address message names both legitimate sources
[Robustness] Reject complement / logical_product / logical_divide on the inverse-form ComposedLayout(Layout, offset, Swizzle) with NotImplementedError
[Robustness] Reject logical_product on ComposedLayout(Layout, Swizzle, offset) (was crashing in the affine fallback with AttributeError on .stride)
[API] Rename ComposedLayout.preoffset → offset
[Docs] Document supported / unsupported ops for the inverse-form ComposedLayout in ComposedLayout docstring and docs/layout_api.md
[Docs] Document strengthened to_F2_matrix and new from_F2_matrix in docs/analysis_api.md

jduprat and others added 13 commits May 7, 2026 16:11
The CuTe C++ oracle currently auto-skips when CUTLASS headers aren't on
its short hard-coded candidate list. Add CUTLASS_INCLUDE_DIR and
CUTLASS_PATH env-var overrides so callers can point at an out-of-tree
install (e.g. a vendored fbsource tree) without editing the file. Also
add /usr/local/cuda-12.8/include to the default candidates so x86 boxes
with stock CUDA 12.8 pick up cuda/std/utility automatically.
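The lookup order can be sketched as follows. This is a minimal illustration, not the oracle's code: the function names, the precedence of CUTLASS_INCLUDE_DIR over CUTLASS_PATH, and probing for `cute/layout.hpp` to decide whether to skip are all assumptions.

```python
import os
from pathlib import Path

# Illustrative default search list; the commit adds this CUDA 12.8 path.
DEFAULT_CANDIDATES = ("/usr/local/cuda-12.8/include",)


def find_include_dirs(env=None, candidates=DEFAULT_CANDIDATES):
    """Return existing include directories, env-var overrides first."""
    env = os.environ if env is None else env
    dirs = []
    # Assumed precedence: an explicit include dir beats a CUTLASS root.
    if env.get("CUTLASS_INCLUDE_DIR"):
        dirs.append(Path(env["CUTLASS_INCLUDE_DIR"]))
    if env.get("CUTLASS_PATH"):
        dirs.append(Path(env["CUTLASS_PATH"]) / "include")
    dirs.extend(Path(c) for c in candidates)
    # Keep only directories that exist, preserving search order.
    return [d for d in dirs if d.is_dir()]


def cutlass_available(env=None, candidates=DEFAULT_CANDIDATES):
    """Hypothetical skip check: does any directory provide cute/layout.hpp?"""
    return any((d / "cute" / "layout.hpp").is_file()
               for d in find_include_dirs(env, candidates))
```

Passing `env` explicitly keeps the helper testable without mutating `os.environ`.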
complement() previously raised AttributeError on ComposedLayout because
its inner code path reaches for .stride, which ComposedLayout
intentionally omits. CuTe C++ (layout_composed.hpp:395-409) defines
complement(ComposedLayout) as the complement of the inner only; the outer
is an involution / permutation and doesn't change the codomain image.
Mirror that here, plus a Python regression test and a CuTe C++ oracle
case so the parity stays anchored.
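Here is a minimal sketch of the forwarding rule on top of a simplified CuTe-style complement. It assumes flat layouts with non-negative strides, and it filters size-1 modes instead of running a full coalesce. The `Layout` and `ComposedLayout` classes here are toy stand-ins, not the tensor-layouts classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Flat layout (no nested modes) with non-negative strides."""
    shape: tuple
    stride: tuple

    def cosize(self):
        return sum((s - 1) * d for s, d in zip(self.shape, self.stride)) + 1


@dataclass(frozen=True)
class ComposedLayout:
    outer: object      # e.g. a swizzle; it does not affect the complement
    inner: Layout
    offset: int = 0


def complement(layout, cotarget=None):
    """Sorted-stride complement; a ComposedLayout forwards to its inner."""
    if isinstance(layout, ComposedLayout):
        return complement(layout.inner, cotarget)
    if cotarget is None:
        cotarget = layout.cosize()
    # Drop trivial modes and walk the rest in increasing stride order.
    modes = sorted((d, s) for s, d in zip(layout.shape, layout.stride)
                   if s != 1 and d != 0)
    shape, stride, current = [], [], 1
    for d, s in modes:
        if d % current:
            raise ValueError("layout is not complementable")
        shape.append(d // current)      # fill the gap below this mode
        stride.append(current)
        current = s * d
    shape.append(-(-cotarget // current))  # ceil_div up to the cotarget
    stride.append(current)
    # Filtering size-1 modes stands in for a full coalesce in this sketch.
    kept = [(s, d) for s, d in zip(shape, stride) if s != 1]
    if not kept:
        return Layout((1,), (0,))
    return Layout(tuple(s for s, _ in kept), tuple(d for _, d in kept))
```

For a compact inner such as (4,8):(8,1), the default complement is trivial, which gives `Layout((1,), (0,))`.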
When slicing a ComposedLayout(Swizzle, affine, preoffset), CuTe C++
checks whether the surviving inner's image hits both the swizzle's Y
and Z bits. If it doesn't, the swizzle is affine on the surviving
subspace and CuTe collapses the wrapper into a plain Layout with
per-bit XOR'd strides plus a constant base offset
(cute/swizzle_layout.hpp:263-294). tensor-layouts always re-wrapped as
another ComposedLayout, so equivalent slice expressions printed as
larger composed trees than CuTe's.

Add the same decay path: build an explicit affine bit-decomposition of
the swizzle for the (now constant) preoffset, compose with the inner,
and verify the result reproduces the swizzled output before returning.
On any reducibility failure the helper bails and the existing re-wrap
runs unchanged.

The Layout-with-embedded-swizzle slice path is intentionally left alone
because tensor-layouts' Tensor model applies the tensor's base offset
INSIDE the embedded swizzle (see tensor.py::_tensor_address). Decaying
there would silently break Tensor offset semantics; decay only runs for
genuine ComposedLayout where the preoffset is fully internal.
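The "build an explicit affine decomposition, then verify" step can be illustrated by brute force. The sketch below fits one stride per input bit and checks it against every point, so it is coarser than the real helper. It uses a CuTe-style Swizzle with S > 0 on a hypothetical row-major 4x4 tile, and all names are made up. XOR with a constant is affine per bit: each bit gets a stride of +2^k or -2^k.

```python
def swizzle(bits, base, shift):
    """CuTe-style Swizzle<B, M, S> with S > 0: XOR the Y bits down into Z."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    return lambda offset: offset ^ ((offset & y_mask) >> shift)


def try_decay(f, in_bits):
    """Fit f(x) == base + sum of per-bit strides, then verify on every x.

    Returns (base, strides) if f is affine on range(2**in_bits), else None.
    """
    base = f(0)
    strides = [f(1 << k) - base for k in range(in_bits)]
    for x in range(1 << in_bits):
        predicted = base + sum(s for k, s in enumerate(strides) if (x >> k) & 1)
        if predicted != f(x):
            return None  # not reducible: keep the ComposedLayout wrapper
    return base, strides
```

Fixing row 1 of the tile leaves only Z bits varying. The slice then decays to base 5 with per-bit strides (-1, 2), i.e. a shape-4 mode becomes (2,2):(-1,2). The full tile touches both Y and Z bits and does not decay.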
logical_product(Layout, ComposedLayout(Swizzle, Layout)) used to
silently drop the embedded swizzle: the inner compose produced a
Layout-with-embedded-swizzle, but the final tuple-Layout constructor
discarded the swizzle attribute, returning a semantically wrong plain
layout.

Add the explicit transfer-swizzle path matching CuTe C++
(cute/swizzle_layout.hpp:549-587): do the affine product on the inner
tile, then derive new active Y/Z masks under the new product layout
and wrap the result in the freshly-constructed swizzle. Also preserve
any embedded swizzle that the generic compose path produces in the
fallback constructor, so swizzled-tile products no longer lose their
swizzle silently. The new path falls back to ComposedLayout when the
new active masks aren't a representable Swizzle.
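In the simplest case, the new masks are the old ones moved up. That case is a compact power-of-two block `a:1`, a swizzled tile with zero offset, and an identity tile layout. The product evaluates to `i + a * S(j)`, which equals `S'(i + a*j)` when S' is S with its Y/Z masks shifted up by log2(a). The helpers below are hypothetical and illustrate only this case, not the general mask derivation.

```python
def swizzle(bits, base, shift):
    """CuTe-style Swizzle<B, M, S> with S > 0 (Y bits above Z bits)."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    return lambda offset: offset ^ ((offset & y_mask) >> shift)


def product_with_swizzled_tile(i, j, block_size, tile_swizzle):
    """logical_product(block_size:1, Swizzle o identity tile) at (i, j).

    With a compact block, the tile's swizzled offsets are rescaled by
    block_size (the stride of the block's complement).
    """
    return i + block_size * tile_swizzle(j)


def transferred_swizzle(bits, base, shift, block_size):
    """Same swizzle with its Y/Z masks moved up by log2(block_size)."""
    k = block_size.bit_length() - 1
    assert block_size == 1 << k, "sketch assumes a power-of-two block"
    return swizzle(bits, base + k, shift)
```

The old behavior corresponds to returning `i + a * j`, which differs at most points.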
Add a second oracle case exercising the "apparent violation" truncation
path from paper section 3.3.3, where B's image fits entirely inside the
current LHS mode. The new case uses a simpler (8,8):(3,97) layout that
matches the paper example more directly.
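A pointwise check makes the truncation visible. Take the hypothetical right-hand side B = 4:2, which is not necessarily the oracle's case. Its image {0, 2, 4, 6} lies inside the first mode (size 8) of A = (8,8):(3,97), so A o B never reaches A's second mode and reduces to the rank-1 layout 4:6.

```python
def evaluate(shape, stride, x):
    """Evaluate a flat layout at an in-range integer index (colexicographic)."""
    offset = 0
    for s, d in zip(shape, stride):
        offset += (x % s) * d
        x //= s
    return offset


def compose_by_evaluation(a, b):
    """Brute-force A o B over B's domain, for checking only."""
    b_shape, b_stride = b
    size = 1
    for s in b_shape:
        size *= s
    return [evaluate(*a, evaluate(b_shape, b_stride, i)) for i in range(size)]
```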
CuTe's right_inverse/left_inverse of a swizzle-fronted ComposedLayout
with nonzero preoffset produces ComposedLayout<Layout, Offset, Swizzle>.
Our right_inverse() and left_inverse() already build exactly that form
in their fallback path, but the constructor invariant rejected Swizzle
in the inner slot, so any inversion of a swizzle-fronted ComposedLayout with
nonzero offset raised TypeError before the user could use the result.

Loosen the invariant and update the small set of consumers that
recurse into .inner so they handle the Swizzle leaf:
  - shape returns outer.shape (1-D integer domain)
  - cosize recurses into outer (Swizzle is bijective)
  - flatten/mode/_forward_layout_domain/complement guard the Swizzle leaf
  - _slice_for_composition rejects sub-domain slicing on the Swizzle
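A small round trip shows why this form appears. A Swizzle is its own inverse, so inverting `y = S(offset + L(x))` gives `L^-1(-offset + S(y))`: outer Layout, inner Swizzle, negative offset. The sketch uses (4,4):(4,1), which is its own inverse, and a hypothetical offset of 16. Plain functions stand in for the library types.

```python
def swizzle(bits, base, shift):
    """CuTe-style Swizzle<B, M, S> with S > 0; it is its own inverse."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    return lambda offset: offset ^ ((offset & y_mask) >> shift)


def transpose_4x4(x):
    """The layout (4,4):(4,1) on a linear index; it is its own inverse."""
    return 4 * (x % 4) + x // 4


swz = swizzle(2, 0, 2)
OFFSET = 16  # hypothetical nonzero offset


def forward(x):
    # Swizzle-fronted form: swz(offset + L(x))
    return swz(OFFSET + transpose_4x4(x))


def inverse(y):
    # Inverse form (outer=Layout, inner=Swizzle, negative offset):
    # L^-1(-offset + swz(y)), valid on the forward image [16, 32)
    return transpose_4x4(-OFFSET + swz(y))
```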
The new ComposedLayout(Layout, Swizzle, offset) form (introduced when
right_inverse / left_inverse handle a swizzle-fronted ComposedLayout with
nonzero preoffset) has a 1-D integer domain and no affine stride to tile
against. logical_product was unprepared for it and crashed in the affine
fallback with AttributeError on .stride; it now rejects this form explicitly.
A differential survey of ComposedLayout behavior flagged that
several common operations (complement, coalesce, compose, and the
degenerate right_inverse case) have no CuTe C++ oracle entry, so any
divergence between our implementation and CuTe top-of-tree (ToT) goes
unnoticed.

Add 20 new entries covering complement, coalesce, compose(_, Layout(4,1)),
and right_inverse on the canonical form variants F2-F8. All pass against
local CuTe headers, pinning the current behavior:
  - complement(ComposedLayout) returns Layout(1, 0) for every form.
  - coalesce is evaluation-preserving (compared pointwise).
  - compose with a Layout RHS truncates the inner.
  - right_inverse on a Layout-outer ComposedLayout collapses to the
    contiguous prefix when the outer has a stride gap (size 1 result).

Also fix print_offsets_n to static_cast<int>() the layout's output;
without it, CuTe's compile-time integer types print as "_-2" / "_0" and
trip the byte-for-byte pointwise comparison.
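The size-1 right_inverse result follows from a brute-force view. A right inverse can only cover the longest prefix 0, 1, 2, ... of offsets that all appear in the image. An outer with a stride gap (here a hypothetical 4:2 after an identity inner) misses offset 1, so only offset 0 is covered. This table-based helper is illustrative; the library returns a Layout, not a list.

```python
def right_inverse_table(f, size):
    """Brute-force right inverse of f over range(size).

    Returns R with f(R[y]) == y for every y in [0, k), where k is the
    longest prefix 0, 1, 2, ... lying entirely in f's image.
    """
    first_index = {}
    for x in range(size):
        first_index.setdefault(f(x), x)
    table = []
    while len(table) in first_index:
        table.append(first_index[len(table)])
    return table
```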
The inverse-form ComposedLayout (outer=Layout, inner=Swizzle, possibly
negative offset) arises from right_inverse / left_inverse on a
swizzle-fronted ComposedLayout with a nonzero offset. CuTe C++ refuses
every non-trivial op on this form. We previously either silently returned
a degenerate result (complement -> Layout(1,0), coalesce -> input
unchanged) or recursed into a callee that raised with a misleading op
name (logical_divide via coalesce).

Now: structurally allowed, semantically narrow, errors loud.
complement, coalesce, logical_divide, and logical_product all raise
NotImplementedError. The form remains usable for the ops the
inverse-and-cancel round trip needs: __call__, size, cosize,
shape, rank, depth, flatten, right_inverse, left_inverse, compose.

Tensor._validate_storage already enumerated the addressed range, but it
folded the negative-address and out-of-upper-bound cases into one
message. Split them so the negative-address message names both legitimate
sources of negative addresses. The check runs at construction and in the
.data setter, so there is no bypass.
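The split check and the "no bypass" wiring can be sketched with a toy class. Routing `__init__` through the `.data` property setter means construction and reassignment run the same validation. `ToyTensor`, `validate_storage`, and the message texts are hypothetical. They do not reproduce tensor-layouts' messages, which also name the legitimate sources of negative addresses.

```python
def validate_storage(addresses, storage_len):
    """Raise a distinct error for each way an address can leave storage."""
    addresses = list(addresses)
    if not addresses:
        return
    low, high = min(addresses), max(addresses)
    if low < 0:
        raise ValueError(f"negative address {low}: the layout reaches below "
                         "the start of storage")
    if high >= storage_len:
        raise ValueError(f"address {high} is out of bounds: storage holds "
                         f"only {storage_len} elements")


class ToyTensor:
    """Hypothetical tensor: validation runs on construction and on .data = ..."""

    def __init__(self, addresses, data):
        self._addresses = list(addresses)
        self.data = data  # goes through the setter, so construction is checked

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        validate_storage(self._addresses, len(value))
        self._data = value
```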
(ComposedLayout.preoffset → offset rename) Technically a non-functional change, but API-breaking!
The inverse-form ComposedLayout has a rank-1 integer domain with no
multi-mode structure to merge and no size-1 modes to filter, so the
only correct answer for coalesce is the input itself. CuTe C++ refuses
this form because its template delegates to the inner Swizzle, which
doesn't satisfy the Layout interface; we can answer it directly because
we know the answer is structural and trivial.
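The resulting policy for the inverse form (coalesce is the identity, and complement / logical_divide / logical_product fail loudly under their own names) can be sketched as a dispatch guard. `InverseForm`, `apply_op`, and the message text are hypothetical stand-ins, not tensor-layouts API.

```python
class InverseForm:
    """Toy stand-in: y -> outer(offset + inner(y)), inner being a swizzle."""

    def __init__(self, outer, offset, inner):
        self.outer, self.offset, self.inner = outer, offset, inner

    def __call__(self, y):
        return self.outer(self.offset + self.inner(y))


REJECTED = frozenset({"complement", "logical_divide", "logical_product"})


def apply_op(op, layout, generic_impl=None):
    """Route op on the inverse form: trivial coalesce, loud rejections."""
    if isinstance(layout, InverseForm):
        if op == "coalesce":
            return layout  # rank-1 integer domain: nothing to merge or filter
        if op in REJECTED:
            # Name the op the caller asked for, not an internal callee.
            raise NotImplementedError(
                f"{op}() is not supported on the inverse-form ComposedLayout")
    if generic_impl is None:
        raise NotImplementedError(f"no generic {op}() in this sketch")
    return generic_impl(layout)
```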
Previously cosize() for ComposedLayout delegated to cosize(inner) (or
cosize(outer) for the inverse-form). This is wrong: it ignores the other
half of the composition and the offset, so for forms where those modify
the codomain, the reported cosize doesn't match the actual image extent.
There are five forms where cosize was either smaller than max(L(i)) + 1
(under-reporting the buffer needed) or wrong in the milder direction
(over-reporting it).

The only definition that matches the actual codomain extent for every
non-affine form is:
    cosize(L) := max(L(i) for i in range(size(L))) + 1

ComposedLayout cosize has no closed form because the outer can be a
Swizzle, a non-bijective Layout, or another ComposedLayout that permutes
or rescales the inner's image. cosize on a ComposedLayout is now
O(size(L)) instead of O(1). The affine path is unchanged.
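Two hypothetical compositions show how delegating to cosize(inner) under-reports. The inner in both is 4:1, so delegation would answer 4. An offset of 4 into an identity outer shifts the image to 4..7, so the true cosize is 8. A strided outer 8:2 rescales it to 0..6, so the true cosize is 7. Plain functions stand in for layouts. The closed form covers flat layouts with non-negative strides only.

```python
def affine(shape, stride):
    """Flat layout as a function of an in-range linear index (colexicographic)."""
    def f(x):
        offset = 0
        for s, d in zip(shape, stride):
            offset += (x % s) * d
            x //= s
        return offset
    return f


def cosize_by_enumeration(f, size):
    """cosize(L) := max(L(i) for i in range(size(L))) + 1."""
    return max(f(i) for i in range(size)) + 1


def affine_cosize(shape, stride):
    """O(1) closed form for a flat layout with non-negative strides."""
    return sum((s - 1) * d for s, d in zip(shape, stride)) + 1
```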
to_F2_matrix changes:
- Now accepts ComposedLayout when the form is F2-linear (zero offset, no
  Swizzle in the inner slot). Internally computes the matrix as the
  product M_outer @ M_inner over GF(2) and trims trailing zero rows so
  the output's row count matches the actual codomain bit-width.
- Rejects with ValueError (not TypeError) the non-F2-linear cases:
  nonzero offset (an affine translation) and the inverse-form
  ComposedLayout(Layout, offset, Swizzle). When the forward layout IS
  F2-linear, take its matrix and invert it over GF(2) instead.

from_F2_matrix(M, shape) -> Layout
- Requires shape because the matrix loses the partition of input bits
  into modes.
- Try plain affine reconstruction first; if the column values
  match a clean power-of-2 progression per mode, return Layout(shape,
  strides). Otherwise brute-force search over Swizzle(bits, base, shift)
  candidates: for each, apply S to M (Swizzle is involutive over F2) and
  re-attempt affine reconstruction. If found, return Layout with embedded
  Swizzle. If no single-Swizzle factorization works, raise
  NotImplementedError (Triton's LinearLayout handles multi-swizzle via
  Smith-normal-form-style decomposition; we don't replicate that).
- Round-trip identity: from_F2_matrix(to_F2_matrix(L), L.shape) == L
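The matrix view and the single-Swizzle search can be sketched with columns stored as integer bit-vectors. Column k is f(1 << k), and composition is the GF(2) product, which amounts to applying the outer map to each inner column. The sketch assumes power-of-two shapes and a CuTe-style Swizzle with S > 0. Every helper name is hypothetical, and the search bounds are arbitrary.

```python
def swizzle(bits, base, shift):
    """CuTe-style Swizzle<B, M, S> with S > 0; F2-linear and an involution."""
    y_mask = ((1 << bits) - 1) << (base + shift)
    return lambda v: v ^ ((v & y_mask) >> shift)


def to_f2_columns(f, in_bits):
    """Column k of the GF(2) matrix is f(1 << k), stored as an int."""
    return [f(1 << k) for k in range(in_bits)]


def f2_apply(columns, x):
    """Matrix-vector product over GF(2): XOR the columns selected by x."""
    out = 0
    for k, col in enumerate(columns):
        if (x >> k) & 1:
            out ^= col
    return out


def affine_strides(columns, shape):
    """Per mode, columns must be stride, 2*stride, 4*stride, ...

    The stride must be 0 or a power of two, and nonzero columns must be
    distinct, so no two share a bit and integer sums equal XORs.
    """
    strides, k = [], 0
    for s in shape:
        n = s.bit_length() - 1            # s is assumed to be a power of two
        mode = columns[k:k + n]
        stride = mode[0] if mode else 0
        if stride & (stride - 1) or any(c != stride << i for i, c in enumerate(mode)):
            return None
        strides.append(stride)
        k += n
    nonzero = [c for c in columns if c]
    if k != len(columns) or len(set(nonzero)) != len(nonzero):
        return None
    return tuple(strides)


def from_f2_columns(columns, shape, max_bits=8):
    """Return (strides, swizzle params or None); raise if no single Swizzle fits."""
    strides = affine_strides(columns, shape)
    if strides is not None:
        return strides, None
    for bits in range(1, max_bits):
        for base in range(max_bits):
            for shift in range(bits, max_bits):
                swz = swizzle(bits, base, shift)
                # S is an involution, so M == S @ A  <=>  S @ M == A.
                strides = affine_strides([swz(c) for c in columns], shape)
                if strides is not None:
                    return strides, (bits, base, shift)
    raise NotImplementedError("no single-Swizzle factorization found")
```

For (8,8):(8,1) under Swizzle(3,0,3), the columns are [9, 18, 36, 1, 2, 4]. The affine attempt fails, and the search recovers strides (8, 1) together with the swizzle (3, 0, 3).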
meta-cla bot added the CLA Signed label May 7, 2026
jduprat merged commit 9fcdb2a into facebookresearch:main May 7, 2026
3 checks passed