Development updates 20260515 by jduprat · Pull Request #24 · facebookresearch/tensor-layouts

jduprat · 2026-05-15T17:39:55Z

[New] Layout becomes purely affine; ComposedLayout carries every swizzled / non-affine form
[New] Promote split_outer_swizzle() to public API
[New] Revisited Exception hierarchy
[New] _address_bounds fast path for the canonical Sw o L form
[Fix] Tensor over a Layout-with-embedded-swizzle now adds the external offset AFTER the swizzle
[Fix] cosize() on embedded-swizzle Layout for non-power-of-2 shapes
[Fix] Tensor[(slice(None), 0), 1] now raises TypeError
[Fix] Drop typing.Self fallback that imported the undeclared typing_extensions
[Perf] Cache cosize() on each ComposedLayout and swizzled Layout instance
[Perf] _address_bounds: drop the embedded-vs-composed gate on the fast path
[Robustness] Make ComposedLayout's offset keyword-only
[Robustness] as_affine_layout() does an explicit is_affine() post-check
[Robustness] Validate warp_size > 0 in coalescing_efficiency / segment_analysis
[Robustness] viz raises an actionable ImportError when matplotlib/numpy are missing
[API] Align four pre-existing exception-class inconsistencies
[Refactor] Split layouts.py (4.4k LOC) into a layouts/ package with three layered modules. No public API change
[Refactor] Dedup bank_conflicts / per_group_bank_conflicts via shared helper
[Refactor] Dedup coalescing_efficiency / per_group_coalescing via shared helper
[Refactor] Move Layout._calculate_max_offset to module-level
[Refactor] Rename internal _affine_inner → _strip_swizzle
[Refactor] Hoist from math import gcd to module top in analysis.py
[Tests] Parametrize 32 AMD oracle C-layout per-atom tests into a single parametrized test
[Tests] Relax isinstance(R, Layout); R.swizzle == sw assertions to representation-tolerant structural / pointwise checks
[Tests] Add examples/composed.py to make examples
[Docs] Update layout_api.md / tensor_api.md / analysis_api.md / examples after Layout becomes purely affine

The staticmethod was always called as obj._calculate_max_offset(obj.shape, obj.stride) -- it did not use 'self' at all and was already private. As a free function it is easier to reuse from cosize() and from any future helper that needs the affine span without going through a Layout instance. Tests: full suite (881 passed, 2 skipped).

The local import inside order() saved nothing -- math.gcd is in the standard library and importing at module top is the convention. Tests: tests/analysis.py (173 passed).

Replace the docstring-only alias for as_layout() with an explicit affinity post-check. The is_affine() assertion is belt-and-suspenders today (as_layout already rejects ComposedLayout) but documents the contract at one point of truth and protects against future loosening of as_layout(). Error message points callers at as_layout_expr() for the non-affine path, and the docs table now reflects the same. Tests: full suite (881 passed, 2 skipped).

Internal helper that returns the underlying affine Layout with any embedded Swizzle removed. The new name says what the helper does at the call site: 'strip the swizzle' is exactly the operation, used by flatten, right_inverse / left_inverse, compose against the affine layer, and the domain-only transforms. Also expand the docstring to explain why the operation is recurring. Mechanical rename across 7 call sites within layouts.py; no external callers. No behavior change. Tests: full suite (881 passed, 2 skipped).

The composed.py example was the only one in examples/ not exercised by the smoke target. Adding it ensures the ComposedLayout demo stays in sync with the API (same coverage gate as layouts.py / tensor.py / viz.py). Tests: make examples (4 scripts, 167 SVG + 2 PNG + 2 PDF generated).

The Self import existed only because typing.Self is Python >=3.11 while requires-python = '>=3.10'. The typing_extensions fallback was unsound: typing_extensions was never declared in the dependencies, so a fresh 3.10 install would ImportError on import. Single use site (Layout.squeeze) is a return-type annotation; with 'from __future__ import annotations' already in effect, annotations are strings at runtime, so 'Layout' as a forward-ref string is equivalent for the runtime and accepted by static type checkers (mypy/pyright resolve it in scope). Net effect: drops an undeclared optional dependency, removes a try/except import block, no support-version change, no behavior change. Tests: full suite (881 passed, 2 skipped).
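A minimal sketch of the annotation pattern described above. The `Layout` class here is a stand-in and the `squeeze` body is hypothetical; only the string return annotation reflects what the commit message describes.

```python
class Layout:
    """Stand-in for the real class; only the annotation pattern matters here."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    # The return type is the forward-reference string 'Layout' rather than
    # typing.Self, so nothing is imported from typing (3.11+) or from
    # typing_extensions. With 'from __future__ import annotations' in effect,
    # a bare Layout annotation would be stored as the same string.
    def squeeze(self) -> "Layout":
        # Hypothetical body: drop size-1 modes.
        return Layout(s for s in self.shape if s != 1)


squeezed = Layout((4, 1, 8)).squeeze()
print(squeezed.shape)                         # (4, 8)
print(Layout.squeeze.__annotations__["return"])  # Layout
```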

Both functions executed the same per-thread-range bank-mapping kernel. Extract _bank_conflicts_for_thread_range() and have both call it; the single-group version passes start=0/end=min(thread_count,group_size), the per-group version calls it once per group. Single source of truth for the bank-conflict math; future bug-fixes or behavior tweaks (e.g. handling 8-byte words) land in one place. Public API and per-range result dict shape (conflict_free, max_ways, bank_to_threads) are unchanged. Tests: full suite (881 passed, 2 skipped).
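The deduplication can be sketched as follows. This is a deliberately simplified illustration, not the library's code: the functions take a precomputed list of per-thread word offsets (an assumption), the bank is taken as `offset % num_banks`, and same-address broadcast is ignored. Only the helper name, the start/end convention, and the result keys come from the commit message.

```python
from collections import defaultdict


def _bank_conflicts_for_thread_range(offsets, start, end, num_banks=32):
    """Shared kernel: map threads [start, end) to banks and summarise conflicts."""
    bank_to_threads = defaultdict(list)
    for thread in range(start, end):
        bank_to_threads[offsets[thread] % num_banks].append(thread)
    max_ways = max((len(t) for t in bank_to_threads.values()), default=0)
    return {
        "conflict_free": max_ways <= 1,
        "max_ways": max_ways,
        "bank_to_threads": dict(bank_to_threads),
    }


def bank_conflicts(offsets, group_size=32):
    """Single-group version: analyse only the first group of threads."""
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")
    end = min(len(offsets), group_size)
    return _bank_conflicts_for_thread_range(offsets, 0, end)


def per_group_bank_conflicts(offsets, group_size=32):
    """Per-group version: call the shared kernel once per group."""
    if group_size <= 0:
        raise ValueError(f"group_size must be positive, got {group_size}")
    return [
        _bank_conflicts_for_thread_range(
            offsets, start, min(start + group_size, len(offsets))
        )
        for start in range(0, len(offsets), group_size)
    ]


stride_two = [2 * t for t in range(64)]  # threads 0 and 16 both hit bank 0
single = bank_conflicts(stride_two)
groups = per_group_bank_conflicts(stride_two)
```

Because both entry points delegate to the same kernel, the first per-group result is identical to the single-group result by construction.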

Both functions ran the same per-thread-range cache-line / efficiency computation on top of _group_access_offsets(). Extract _coalescing_for_thread_range() and have both call it. Single source of truth for the cache-line/efficiency math; future tweaks (e.g. 32-byte segment counting, alignment-aware accounting) land in one place. Public API and per-range result dict shape (transactions, efficiency, cache_lines) are unchanged. Tests: full suite (881 passed, 2 skipped).

Both functions take a warp_size kwarg and use min(thread_count, warp_size) without validating warp_size. Negative or zero values silently produced nonsense results (negative range or empty range, with 0/0 efficiency etc.). Now matches the validation pattern already used by bank_conflicts and the per_group_* helpers (group_size > 0). Tests: tests/analysis.py (173 passed); manually exercised with warp_size in {0, -4} to confirm ValueError.
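A minimal sketch of the guard, assuming a hypothetical helper `effective_group_size` (the real functions are `coalescing_efficiency` and `segment_analysis`, whose full signatures are not shown here):

```python
def effective_group_size(thread_count: int, warp_size: int = 32) -> int:
    """Hypothetical helper: validate warp_size before using it as a range bound."""
    if warp_size <= 0:
        # Without this check, min(thread_count, warp_size) would yield an
        # empty or negative range and a meaningless efficiency figure.
        raise ValueError(f"warp_size must be positive, got {warp_size}")
    return min(thread_count, warp_size)


print(effective_group_size(64))      # 32
print(effective_group_size(8, 32))   # 8
```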

The two key-walking helpers (_has_nested_none, _contains_free_coordinates) were near-duplicates. _has_nested_none silently misclassified one case: slice objects nested in a hierarchical coordinate tuple (e.g. `T[(slice(None), 0), 1]`) returned False, so the code treated the key as fully fixed and passed the slice through to slice_and_offset, which is not contractually defined for slice objects. Replace with _contains_free_coordinates at the call site and add a focused _tuple_contains_slice helper that surfaces the misuse with an explicit TypeError. Net helper count is unchanged; the diverging case is now an explicit error rather than an undefined silent-pass. Tests: full suite (882 passed, 2 skipped); new regression test_tensor_slice_rejects_slice_inside_coordinate_tuple covers the explicit reject.
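The rejected case can be sketched with a small recursive walk. The function names below are illustrative rather than the library's private helpers, and the rule that a top-level slice remains a legal free mode is an assumption based on the example key in the message.

```python
def tuple_contains_slice(key) -> bool:
    """Return True if a slice object appears anywhere inside a (nested) tuple."""
    if isinstance(key, slice):
        return True
    if isinstance(key, tuple):
        return any(tuple_contains_slice(k) for k in key)
    return False


def check_key(key):
    """Reject slices nested inside a hierarchical coordinate tuple."""
    modes = key if isinstance(key, tuple) else (key,)
    for mode in modes:
        # Assumption: a top-level slice is an ordinary free mode; only a slice
        # buried inside a per-mode coordinate tuple is treated as misuse.
        if isinstance(mode, tuple) and tuple_contains_slice(mode):
            raise TypeError(
                f"slice objects are not allowed inside a coordinate tuple: {mode!r}"
            )
    return key


check_key((slice(None), 1))            # accepted: top-level slice
try:
    check_key(((slice(None), 0), 1))   # the shape of key the fix rejects
except TypeError as exc:
    print(exc)
```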

cosize(ComposedLayout) is O(size(L)) -- it must enumerate the full domain because the outer can be a Swizzle, a non-bijective Layout, or another ComposedLayout that permutes/rescales the inner's image. The result is invariant for a given (outer, inner, offset) triple, so cache it once on the instance. Implementation uses a declarative dataclass field with init/repr/eq/hash all set to False, which keeps the cache slot out of __init__, __repr__, __eq__ and __hash__ (so two structurally equal layouts with different cache states still compare equal and remain dict-key compatible). Frozen dataclass blocks normal __setattr__, so the lazy write in cosize() goes through object.__setattr__. Tests: full suite (884 passed, 2 skipped). New tests verify the cache is read on the hot path (poison + observe) and that the cache is invisible to equality/hashing.
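The caching pattern can be sketched on a stand-in frozen dataclass. `ComposedSketch`, its fields, and the `swizzle_2_0_2` function are illustrative assumptions; only the field options and the `object.__setattr__` write mirror what the message describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def swizzle_2_0_2(x: int) -> int:
    """XOR bits [2, 4) into bits [0, 2); used here as an example outer function."""
    return x ^ ((x & 0b1100) >> 2)


@dataclass(frozen=True)
class ComposedSketch:
    """Stand-in for a frozen layout expression; not the real ComposedLayout."""

    outer: Callable[[int], int]
    size: int
    offset: int = 0
    # Cache slot kept out of __init__, __repr__, __eq__ and __hash__.
    _cached_cosize: Optional[int] = field(
        default=None, init=False, repr=False, compare=False, hash=False
    )

    def cosize(self) -> int:
        if self._cached_cosize is None:
            # O(size) enumeration, paid once per instance.
            value = max(self.outer(self.offset + i) for i in range(self.size)) + 1
            # frozen=True blocks normal attribute assignment, so bypass it.
            object.__setattr__(self, "_cached_cosize", value)
        return self._cached_cosize


a = ComposedSketch(swizzle_2_0_2, 5)
b = ComposedSketch(swizzle_2_0_2, 5)
print(a.cosize())              # 6
print(a == b, hash(a) == hash(b))  # True True, although only a has a warm cache
```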

The affine path in cosize() returned _affine_max_offset(shape, stride) + 1 even when the layout carried an embedded swizzle, ignoring the swizzle's effect on the image. For non-power-of-2 affine images the swizzle's XOR can flip a bit ABOVE the affine max:
    L = Layout(5, 1, swizzle=Swizzle(2, 0, 2))
    affine image: [0, 5)      -> declared cosize 5
    actual image: {0,1,2,3,5} -> true cosize 6
For power-of-2 affine images the swizzle is a bijection on [0, 2^N) and the two formulas agree, which is why the bug was invisible to the existing test suite. Mirrors the ComposedLayout fix in 5fbd19f for the embedded-swizzle form. Add a swizzle-enumeration branch (same shape as the ComposedLayout one but uncached for now -- a Layout cache slot will be added in a follow-up to amortize the O(size) cost).
Tests: full suite (884 passed, 2 skipped); new regression test_cosize_swizzled_layout_enumerates_image covers both the bug case (L(5,1)+Sw -> 6) and the agreement case (L(16,1)+Sw -> 16).
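The two cases named above can be reproduced without the library. The sketch below assumes the usual CuTe reading of Swizzle(bits, base, shift) for a non-negative shift; the function names are illustrative.

```python
def swizzle(bits: int, base: int, shift: int, x: int) -> int:
    """CuTe-style XOR swizzle for shift >= 0: XOR the high bit group into the low one."""
    mask = ((1 << bits) - 1) << (base + shift)
    return x ^ ((x & mask) >> shift)


def affine_cosize(size: int, stride: int) -> int:
    """Closed-form cosize of a 1-D affine layout with a positive stride."""
    return (size - 1) * stride + 1


def enumerated_cosize(size: int, stride: int) -> int:
    """Swizzle-aware cosize: enumerate the image under Swizzle(2, 0, 2)."""
    return max(swizzle(2, 0, 2, i * stride) for i in range(size)) + 1


image = sorted(swizzle(2, 0, 2, i) for i in range(5))
print(image)                                             # [0, 1, 2, 3, 5]
print(affine_cosize(5, 1), enumerated_cosize(5, 1))      # 5 6
print(affine_cosize(16, 1), enumerated_cosize(16, 1))    # 16 16
```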

Layout(shape, stride, swizzle=Sw) had its swizzle-aware cosize fixed in 79fa734 by enumerating max(L(i)) + 1, but each call paid O(size). Add a _cached_cosize slot on Layout (mutable; the class is hand-written, not a frozen dataclass) and have the swizzled branch read/write it the same way the ComposedLayout cache works. Unswizzled layouts go through the O(1) affine path and never touch the cache slot.
Coverage expansion:
- test_cosize_swizzled_layout_caches_on_instance: poison + observe to verify the hot path reads the cache.
- test_cosize_unswizzled_layout_does_not_populate_cache: unswizzled layouts skip the cache entirely (sanity check).
- test_cosize_swizzled_layout_matches_composed_form: cross-form parity check across 4 shape/swizzle combinations -- embedded Layout and ComposedLayout(Sw, L, 0) must agree on cosize.
- test_complement_consumes_corrected_swizzled_cosize: confirms the fix propagates downstream. complement uses cosize as its codomain bound; with the corrected cosize on Layout(5, 1, swizzle=Sw), complement returns a layout of size 2 / stride 5 instead of the degenerate result the buggy cosize=5 would have produced.
Tests: full suite (889 passed, 2 skipped).

Skip the O(size) per-coordinate walk in _validate_storage when the layout is the canonical zero-offset Sw o L form (both ComposedLayout(Sw, L, 0) and embedded-swizzle Layout). The image lives in [0, cosize(layout)) because Swizzle is a bit-permutation; with cosize cached on the instance (65fd5c4 / de0269f) repeat calls are O(1).
Two preconditions:
1. Inner affine layout has non-negative strides (image starts at 0, so Sw(0) = 0 anchors the lower bound).
2. Tensor offset is added AFTER the swizzle. ComposedLayout always adds it after; embedded-swizzle Layout adds it BEFORE the swizzle (per tensor.py:38-41), which rotates the swizzle's input domain rather than translating the image. So the embedded form only takes the fast path at offset == 0; ComposedLayout takes it for any offset.
Five focused tests:
- matches_walk: table-driven correctness check across both representations, both fast and slow paths, agreeing with explicit per-coordinate enumeration.
- fast_path_taken_for_canonical_swizzle: whitebox poison check on both preconditioned forms.
- slow_walk_for_negative_stride_inner: confirms the existing affine fast path still handles negative strides.
- slow_walk_for_embedded_swizzle_with_nonzero_offset: pins the precondition I initially got wrong; embedded@offset>0 must walk.
- slow_walk_for_inverse_form_composed: F6 inverse-form (Layout outer, Swizzle inner) walks correctly.
Tests: full suite (894 passed, 2 skipped).

matplotlib (and the numpy it pulls in) are an optional dependency of the package, exposed via the [viz] extra. Direct submodule imports like 'from tensor_layouts.viz import draw_layout' (the form the README documents) used to surface a deep ModuleNotFoundError pointing at matplotlib internals when the extra wasn't installed -- not actionable. Wrap the matplotlib/numpy imports at module top in a single try/except that re-raises ImportError with the install hint (chained from the original error). Behavior with the extra installed is unchanged. Tests: full suite (895 passed, 2 skipped). New regression test_viz_module_raises_actionable_importerror_when_matplotlib_missing simulates the missing-dependency case via a meta_path finder that blocks matplotlib + numpy and asserts the wrapped message contains the 'pip install tensor-layouts[viz]' hint.
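The wrapping technique can be sketched as a small helper. This is an illustration under assumptions: the commit describes a try/except at module top in viz.py, whereas `require_optional` below is a hypothetical function form that is easier to exercise in isolation.

```python
import importlib


def require_optional(module_name: str, extra: str, package: str = "tensor-layouts"):
    """Import an optional dependency or raise an ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # 'from exc' chains the original error so the deep traceback is kept
        # as __cause__ while the top-level message stays actionable.
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install {package}[{extra}]"
        ) from exc


json_module = require_optional("json", "viz")  # present: returned unchanged

try:
    require_optional("definitely_missing_module_xyz", "viz")
except ImportError as exc:
    message = str(exc)
    cause = exc.__cause__
    print(message)
```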

Same kind of failure was reported with different exception classes in different places. Fix four small misalignments before introducing the LayoutError / UnsupportedComposedLayoutError / TensorStorageError hierarchy so the migration lands cleanly:
1. to_F2_matrix's F6 inverse-form rejection: was ValueError, now NotImplementedError -- matches the canonical _reject_swizzle_inner_composed family in layouts.py.
2. slice_modes / dice_modes 'tuple coord vs scalar shape' rejection: was TypeError, now ValueError -- matches the structure-mismatch precedent already used by crd2flat / crd2offset and other rank mismatches.
3. prefix_product / suffix_product 'tuple init applied to scalar' rejection: was TypeError, now ValueError -- matches the Length mismatch sibling raise in the same function.
4. _validate_order_permutation: 'order argument not iterable' was ValueError (wrapping the underlying TypeError), now TypeError -- the precondition is a type problem, not a value one. The 'not a permutation' case at the next branch correctly stays ValueError.
Updated tests/analysis.py::test_to_F2_matrix_rejects_inverse_form_composed_layout to expect NotImplementedError.
Tests: full suite (895 passed, 2 skipped).

…eError
Same family of failure was being reported with different exception classes across the codebase (ValueError vs TypeError vs NotImplementedError) and there was no way for a caller to catch 'layout-algebra error' or 'tensor-storage error' specifically without matching message text. Define three small marker classes at the top of layouts.py and migrate the corresponding raise sites:
- LayoutError(ValueError) for layout-algebra preconditions (shape/stride congruence, rank mismatch, mode out of range, tiler incompatibility, swizzle mask overlap, etc.). 42 raise sites in the layout algebra now use this.
- UnsupportedComposedLayoutError(NotImplementedError) for the F6 inverse-form / Swizzle-in-inner-slot ComposedLayout rejections (complement, coalesce, logical_product / logical_divide, slice decomposition, to_F2_matrix). Three raise sites canonicalised on this name. The unrelated 'this decomposition is not implemented' raise in from_F2_matrix stays as plain NotImplementedError.
- TensorStorageError(ValueError) for Tensor storage-state errors (no backing storage on view/assign, layout addresses negative indices, storage too small). Four raise sites; the two 'no storage' cases were previously TypeError -- the type is right, the *state* is wrong, so the new class is more accurate.
All three subclass standard Python base classes, so existing 'except ValueError' / 'except NotImplementedError' handlers continue to catch them. The new names are added to the package's __all__ so they propagate via the existing star-import surface; user code can 'from tensor_layouts import LayoutError' and catch the specific kind. Also updates two tests in tests/tensor.py that asserted on the previous TypeError class for the 'no storage' cases.
Tests: full suite (895 passed, 2 skipped). All previously-passing tests still pass; the two tests asserting on the prior TypeError class were updated to expect TensorStorageError.
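The backward-compatibility claim can be illustrated with local stand-ins for the three marker classes. The class names and base classes come from the message; the docstrings and the `check_rank` function are hypothetical.

```python
class LayoutError(ValueError):
    """Layout-algebra precondition failed (stand-in definition)."""


class UnsupportedComposedLayoutError(NotImplementedError):
    """Operation not supported for this ComposedLayout form (stand-in definition)."""


class TensorStorageError(ValueError):
    """Tensor storage is missing or too small (stand-in definition)."""


def check_rank(shape: tuple, stride: tuple) -> None:
    """Hypothetical precondition check that raises the specific class."""
    if len(shape) != len(stride):
        raise LayoutError(f"rank mismatch: shape {shape} vs stride {stride}")


# A pre-existing handler written against the standard base class still works.
try:
    check_rank((4, 8), (1,))
except ValueError as exc:
    caught_by_legacy_handler = type(exc).__name__

# New code can catch the specific kind without matching message text.
try:
    check_rank((4, 8), (1,))
except LayoutError:
    caught_by_specific_handler = True

print(caught_by_legacy_handler)  # LayoutError
```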

The oracle_amd.py file had 32 hand-written test functions, one per atom, each calling validate_c_layout(<ATOM>, <arch>) with no other variation. Adding a new atom required appending a new function with the same shape, and the section banners doubled the boilerplate. Replace the 32 functions with a single parametrized test driven by an ORACLE_C_LAYOUT_CASES list of (atom, arch) pairs. The id= callback reuses the atom name so per-case test IDs stay readable (test_oracle_validate_c_layout[CDNA_32x32x8_F32F16F16_MFMA-cdna1] etc.). Coverage parity verified: pytest --collect-only reports the same 32 parametrized cases, one per (atom, arch) pair as before. No change to the arch tagging, no change to validate_c_layout. The structural self-consistency tests below this section (TestMFMAStructural and TestLayoutAlgebra, already parametrized over ALL_ATOMS) are unchanged. Adding a new atom is now a one-line append to ORACLE_C_LAYOUT_CASES instead of a copy-paste of the @requires_calculator + def + body.
Tests: full suite (1955 passed, 34 skipped). 32 of the 34 skips come from @requires_calculator -- this env doesn't have the AMD calculator package installed; the gating semantics are unchanged.

…TER swizzle)
Previously a Tensor over a Layout-with-embedded-swizzle computed the storage address as 'Sw(offset + L(coord))' -- the external offset was folded into the swizzle's input domain. A Tensor over a ComposedLayout already computed 'offset + ComposedLayout(coord)', adding the offset linearly AFTER the layout call. The two forms thus disagreed on addresses for nonzero Tensor offset, even though they denoted the same function as layout expressions.
The asymmetry was inherited from the very first commit (6cde897, the import from the Meta-internal predecessor). It pre-dated ComposedLayout support entirely; ComposedLayout was added later (da0ea0e) with deliberately CuTe-aligned semantics, and the embedded-swizzle path was not re-examined against it. This commit aligns the embedded-swizzle path with CuTe and with the existing ComposedLayout path.
After: tensor(coord) == tensor.offset + tensor.layout(coord) for ALL Tensor forms. The Tensor's external offset is a pointer-style shift; it never enters the swizzle's input domain.
Cross-referenced against:
- CuTe C++ ComposedLayout::operator()
  cutlass/include/cute/layout_composed.hpp:114-120
      return layout_a()(offset() + layout_b()(coord)); // (A o O o B)(c) = A(O + B(c))
- CuTe C++ Tensor::operator[]
  cutlass/include/cute/tensor_impl.hpp:222-225
      return data()[layout()(coord)];
  The base offset lives in the data() iterator; the layout call is NOT given access to it. Slicing folds the slice contribution into the engine pointer:
      return make_tensor(data() + offset, sliced_layout);
- CuTe canonical Tensor documentation
  cutlass/media/docs/cpp/cute/03_tensor.md:9
  'uses the result of the Layout computation to offset and dereference a random-access iterator held by the Engine.'
- pycute (Python reference port -- layout side only; no Tensor concept)
  cutlass/python/pycute/swizzle.py:108-109
      return self.layoutB(self.offset + self.layoutA(*args))
  Same formula as CuTe C++ (just the opposite naming convention: pycute's layoutB == CuTe's layout_a == outer).
- CuTe paper formal definition
  Tensor T = (Engine E, Layout L); T(coord) = E[L(coord)].
To make slicing chains preserve correctness under the new addressing rule, slice_and_offset's bare-Layout-with-embedded-swizzle path now mirrors what _slice_for_composition already did for the ComposedLayout case: the slice's contribution is folded into a Form-B ComposedLayout(Sw, sub_L, offset=delta) so the swizzle is applied to '(delta + sub_L(coord))' inside the ComposedLayout, then the existing affine-decay attempt is given a chance to reduce it back to a plain Layout when the swizzle is affine on the surviving inner image. The linear residue handed back to the Tensor is zero (or the post-decay base offset). This matches CuTe slice_and_offset on a swizzled ComposedLayout (cutlass/include/cute/swizzle_layout.hpp:230-262), where the swizzle-interacting part of the slice is XORed into the ComposedLayout's own offset and only the linear-residue part is returned for the engine pointer.
Behavior change from a caller's point of view: only manifests when constructing 'Tensor(swizzled_Layout, offset=k_nonzero, data=...)' DIRECTLY. The pre-fix repo never exercised that pattern -- zero tests, zero examples, zero notebooks construct it. Slicing chains on swizzled Tensors continue to produce the same memory addresses as direct indexing (verified pointwise across the existing test suite). A UserWarning is raised at Tensor.__init__ for the affected pattern (Layout-with-swizzle plus nonzero external offset) to flag the change. The suggested replacement -- Tensor(ComposedLayout(swizzle, layout, offset=k), offset=0) -- recovers the old fold-into-domain semantic exactly, since ComposedLayout's own offset slot DOES enter the swizzle's input domain via outer(comp.offset + inner(coord)).
Documentation: docs/tensor_api.md updated to state the unified rule and remove the 'pre-swizzle linear address' wording that described the old embedded-swizzle path.
Tests: 897 passed, 2 skipped. Eight tests that pinned the pre-fix structural representation (e.g. asserting row.offset == 24 after slicing, asserting the sub-Layout still carried .swizzle) were rewritten to the functional contract (sub(coord) == orig(c)). One test (test_swizzled_tensor_full_slice_matches_explicit_full_slice) wraps its body in warnings.catch_warnings() because it deliberately exercises the affected pattern. Two new positive regression tests pin the new semantic explicitly:
- test_tensor_embedded_swizzle_offset_added_after_swizzle: asserts Tensor(EmbSwL, offset=k)[coord] == k + Sw(L(coord)).
- test_tensor_embedded_swizzle_and_composed_form_agree_under_offset: asserts the embedded and explicit ComposedLayout(Sw, L, 0) forms produce the same addresses for the same Tensor offset.
The full suite was also re-run with -W error::UserWarning to confirm nothing in the test/example surface inadvertently trips the new warning.
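The difference between the two addressing rules is easy to see on a small case. The sketch below is library-independent: it assumes Swizzle(2, 0, 2) in the usual CuTe reading and an affine stand-in L = 8:1; the function names are illustrative.

```python
def swizzle_2_0_2(x: int) -> int:
    """Swizzle(2, 0, 2): XOR bits [2, 4) into bits [0, 2)."""
    return x ^ ((x & 0b1100) >> 2)


def layout(coord: int) -> int:
    """Affine stand-in for L = 8:1 (shape 8, stride 1)."""
    return coord * 1


def address_old(offset: int, coord: int) -> int:
    # Pre-fix embedded-swizzle rule: the offset enters the swizzle's input domain.
    return swizzle_2_0_2(offset + layout(coord))


def address_new(offset: int, coord: int) -> int:
    # Unified rule: tensor(coord) == tensor.offset + tensor.layout(coord).
    return offset + swizzle_2_0_2(layout(coord))


old = [address_old(4, c) for c in range(8)]
new = [address_new(4, c) for c in range(8)]
print(old)  # [5, 4, 7, 6, 10, 11, 8, 9]
print(new)  # [4, 5, 6, 7, 9, 8, 11, 10]

# With a zero offset the two rules coincide, which is why only the
# nonzero-offset construction changes behavior.
same_at_zero = all(address_old(0, c) == address_new(0, c) for c in range(8))
```

`address_old` is also the formula a ComposedLayout with its own `offset=k` evaluates, outer(k + inner(coord)), which is why moving the offset into the ComposedLayout recovers the previous addresses.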

After the CuTe-aligned addressing fix (commit c19e378), the Tensor's external offset is added linearly AFTER the layout call for both the embedded-swizzle Layout and the explicit ComposedLayout(Sw, L, 0) forms. The fast-path bound bounds = (offset, offset + cosize(layout) - 1) is therefore correct for ANY offset on either form -- the previous gate that restricted the embedded form to offset == 0 is no longer needed. Simplifies the precondition from two to one (inner image starts at 0) and shortens the fast-path comment accordingly. Behavior unchanged for the cases the gate previously allowed; behavior improved (fast path instead of slow O(size) walk) for embedded-swizzle Layout with nonzero external offset. Tests: full suite (897 passed, 2 skipped, 0 warnings under -W error::UserWarning). Two tests updated: - test_address_bounds_slow_walk_for_embedded_swizzle_with_nonzero_offset is renamed to ..._fast_path_for_... and pins the fast-path bound explicitly (lo == offset, hi == offset + cosize - 1). - test_address_bounds_fast_path_taken_for_canonical_swizzle (the whitebox poison-the-cache test) now exercises the embedded form with a non-zero offset too.

…le form
Pure no-op refactor of the test surface in preparation for the Path X representation collapse: Layout becomes purely affine, and ComposedLayout is the single representation for swizzled forms. This commit only touches tests and examples; source behavior is unchanged and the suite still passes. The next commit flips producers and the one after it removes the embedded-form carrier. This same test suite must continue to pass through both without any further test-side changes, so all assertions that pinned the embedded form ("isinstance(R, Layout) and R.swizzle == sw") have been relaxed to representation-tolerant equivalents.
Two kinds of edits:
* Construction-site swap: Layout(.., swizzle=Sw) literals replaced with compose(Sw, Layout(...)) or ComposedLayout(Sw, Layout(...)). Both forms have identical address semantics post-c19e378, so the swap is purely surface.
* Assertion-site relaxation: "isinstance(R, Layout); R.swizzle ==" pins replaced with "isinstance(R, (Layout, ComposedLayout))" plus a structural / pointwise equality check that holds for either representation.

…ded swizzle in-tree
After this commit no in-tree code constructs an embedded-swizzle Layout via the algebra; ComposedLayout is the canonical representation for every swizzled form. The `Layout(..., swizzle=...)` constructor kwarg is still accepted (removed in the next commit) for backward compatibility while the slot lives on the class.
Producer flips (src/tensor_layouts/layouts.py):
* _compose_swizzle_lhs: always returns ComposedLayout(swizzle, layout_b) instead of decaying to Layout(.., swizzle=swizzle) for affine layout_b. This is the single most cascading change -- every `compose(Sw, L)` call site now produces a ComposedLayout.
* _compose_layout_with_layout: drops the `if layout_b.swizzle is not None: return ComposedLayout(layout_a, layout_b)` short-circuit that previously kept compose(L, embedded_L) intact.
* _compose_with_composed_rhs: now keeps the swizzled wrapper intact instead of associating compose(L, ComposedLayout(Sw, L', 0)) into compose(compose(L, Sw), L'). Pre-Path-X this path was hidden by the short-circuit above; the underlying swizzle-transfer in _compose_with_swizzle_rhs is not pointwise-correct for hierarchical affine outers. Pure-affine ComposedLayout (Layout outer) still associates safely.
* compose() dispatcher: drops the LHS-Layout-with-swizzle arm at line 3434-3435 (now unreachable from in-tree producers). The helper _compose_swizzled_layout_lhs is left in place; deleted in C3.
* _forward_layout_domain: the legacy embedded-swizzle branch now always promotes to ComposedLayout (no more `Layout(.., swizzle=...)` rewrap fast path).
* Layout.__call__ slice path, Layout.squeeze, Layout.filter, Layout.flatten, mode(): drop the `swizzle=self._swizzle` kwarg. Path X Layout is purely affine; in-tree callers never reach these paths with a swizzled self.
* right_inverse, left_inverse: drop the `Layout.swizzle is not None` fast paths; only the ComposedLayout(Swizzle, Layout, 0) arm remains.
* slice_and_offset, _slice_for_composition: collapse to the affine slice; the legacy Form-B promotion (slice contribution folded into a ComposedLayout(Sw, sub_L, offset=delta)) is now unreachable for bare Layout because no Layout is ever swizzled in-tree.
* _try_decay_swizzle_composed: drop the redundant `inner.swizzle is not None` rejection.
* logical_product: drop the `inner.swizzle is None` extra check on the swizzled-tile fast path; drop the `Layout(.., swizzle=embedded_swizzle)` reattachment in the generic fallback (no longer needed; _logical_product_with_swizzled_tile already returns the right ComposedLayout form).
Consumer updates:
* src/tensor_layouts/tensor.py:
  - _tensor_address: legacy Layout.swizzle arm preserved as a compatibility branch while the kwarg is accepted; removed in C3.
  - _address_bounds: precondition on the affine fast path uses getattr(layout, 'swizzle', None) is None so the canonical path is taken even when split helpers handle the swizzled form.
  - Tensor.__getitem__: drop `swizzle=sub.swizzle` in the slice-result reconstruction.
  - Tensor.__init__ embedded-form warning: left in place for C2 (warning is unreachable from in-tree code); deleted in C3.
* src/tensor_layouts/analysis.py:
  - to_F2_matrix: legacy Layout.swizzle post-composition preserved with getattr; in-tree callers never hit it.
  - from_F2_matrix: now constructs ComposedLayout(sw, Layout, 0) instead of Layout(.., swizzle=sw).
* src/tensor_layouts/viz.py:
  - _normalize_display_layout: drops `swizzle=layout.swizzle` kwarg.
  - _eval_layout_with_offset: always takes the affine branch on a bare Layout (no embedded-swizzle apply step).
  - _layout_expr_with_offset: always uses the identity-outer ComposedLayout to internalise the external offset.

After the previous commit, nothing in-tree produced or read embedded-swizzle Layout; this commit removes the carrier itself and all the dispatch arms that fed it.
src/tensor_layouts/layouts.py:
* Layout.__init__: drop the `swizzle=` kwarg and the `self._swizzle` slot. Drop the `self._cached_cosize` slot (cosize is closed-form O(1) for affine Layout; the cache lived only to amortise the swizzle-aware enumeration that no longer applies to Layout).
* Layout.swizzle property: deleted.
* Layout.__eq__: drop the `_swizzle` term.
* Layout.__hash__: drop the `_swizzle` hash term; reduce to hash((shape, stride)).
* Layout.__repr__: collapse to the single-form `Layout(shape, stride)`; the eval-roundtrip is now exact.
* Layout.__str__: drop the `(Sw) o (...)` wrapper; simple `shape : stride` notation.
* Layout.__call__: drop the swizzle post-application; coordinate evaluation is just `crd2offset(coords, shape, stride)`.
* cosize(): drop the embedded-swizzle Layout cache branch.
* _strip_swizzle(): deleted (no callers).
* _split_zero_offset_swizzle(): keep only the `ComposedLayout(Sw, L, 0)` arm; the Layout arm is gone.
* _compose_swizzled_layout_lhs(): deleted (was unreachable since C2).
* compose() dispatcher comment cleaned up.
src/tensor_layouts/tensor.py:
* _tensor_address: collapsed to `return offset + crd2offset(coords, layout.shape, layout.stride)` on the Layout branch (no more swizzle post-application).
* _address_bounds: simplified affine fast-path; the `getattr(layout, 'swizzle', None) is None` guard is gone since Layout has no swizzle attribute. ComposedLayout split-handling below is unchanged.
* Tensor.__init__: deleted the `Tensor(swizzled Layout, offset!=0)` UserWarning block and the back-compat folding documentation. The warned-about case is no longer constructible -- Layout has no swizzle attribute, so `isinstance(self._layout, Layout) and self._layout.swizzle is not None` is structurally impossible.
* import warnings kept as `# noqa: F401` for re-export stability.
src/tensor_layouts/analysis.py:
* to_F2_matrix: dropped the `if layout.swizzle is not None` post-composition arm; bare Layout always goes through the affine column-build path.

Tracking the representation collapse in user-facing prose. All behaviour changes landed in the previous two commits; this commit only updates documentation, example narration, and one example assertion that pinned the legacy embedded-swizzle Layout shape.
docs/layout_api.md:
* 'Layout Expressions and ComposedLayout' section: drop the 'Layout may also carry one canonical final swizzle' framing; Layout is now purely affine and ComposedLayout is the home for every non-affine form.
* 'When compose() returns Layout vs ComposedLayout' section: the canonical Sw o L now uniformly returns ComposedLayout(Sw, L, 0) instead of Layout(.., swizzle=Sw). Updated the example output to match the new repr; bare Layout returns are documented as 'both operands affine' only.
* 'Example: canonical fast path vs exact fallback' section: updated the type assertions to match the new ComposedLayout return.
* compose() reference paragraph: drop the 'returns a Layout with an embedded swizzle' shortcut wording.
docs/tensor_api.md:
* 'Composed Layouts' section: drop the dual-form `Tensor(Layout(.., swizzle=Sw))` mention; only `Tensor(ComposedLayout(Sw, L, k))` survives. The address rule `tensor(coord) == tensor.offset + tensor.layout(coord)` is unchanged.
docs/analysis_api.md:
* to_F2_matrix example: drop the embedded-form input; show only the canonical ComposedLayout form.
* from_F2_matrix description: now returns a LayoutExpr (Layout or ComposedLayout(Sw, Layout, 0)) instead of optionally embedded.
* Round-trip example uses the canonical ComposedLayout form.
examples/composed.py:
* example_fast_path renamed in spirit ('canonical swizzled form ... returns a ComposedLayout'); pinned to the new structure (assert isinstance(swizzled, ComposedLayout); swizzled.outer == Swizzle(...); swizzled.inner == base).
examples/layouts.py:
* example_swizzle docstring: 'embeds the swizzle inside the Layout' -> 'produces a ComposedLayout(Swizzle, Layout, 0) -- the canonical Path X representation'.

…te order difference
tensor-layouts uses `ComposedLayout(outer, inner, offset=k)` while CuTe C++ and pycute place the offset positionally between the outer and inner slots: `ComposedLayout<A, Offset, B>` / `ComposedLayout(layoutB, offset, layoutA)`. The Python ordering is the more ergonomic choice for the common zero-offset canonical `Sw o L` case (the parameter can drop entirely), but it creates a porting trap: someone copying a CuTe positional literal into Python could accidentally write `ComposedLayout(Sw, k, L)`. Today that already raises a clear TypeError on the `inner` type-check (`int is not Layout/ComposedLayout/Swizzle`), but the failure mode wasn't obvious from the constructor signature. Promote `offset` to a keyword-only field so both shapes of the trap fail at the call-site (positional argument count) rather than later in `__post_init__`:
    ComposedLayout(Sw, L, 4)         # tensor-layouts positional offset -> rejected
    ComposedLayout(Sw, 4, L)         # CuTe-style positional order -> rejected
    ComposedLayout(Sw, L)            # default offset=0 (canonical) -> works
    ComposedLayout(Sw, L, offset=4)  # explicit non-zero offset -> works
All in-tree call sites already use `offset=` keyword form (audited across src/, tests/, examples/), so this is a zero-breakage hardening.
Implementation: `offset: int = field(default=0, kw_only=True)` on the frozen dataclass (Python 3.10+ `kw_only` field option).
Docs: new 'Constructor signature vs CuTe / pycute' subsection in docs/layout_api.md. Includes a 3-row comparison table and explicit examples of the rejected positional shapes. The semantics section above it now uses `offset=k` notation everywhere to match the constructor.
Tests: 2240 passed, 142 skipped (unchanged).

Summary: ``_split_zero_offset_swizzle`` was imported privately by ``tensor.py`` to gate an O(1) fast path in ``_address_bounds``, an abstraction leak. It also matches the structural query that ``max_common_layout`` / ``max_common_vector`` already use internally. Promote it to public:

- Renamed to ``split_outer_swizzle`` -- the prior name overpromised; it only recognises the canonical ``ComposedLayout(Sw, L, offset=0)`` form, not the inverse-form ``ComposedLayout(L, Sw, offset)`` produced by ``right_inverse`` / ``left_inverse``. The "outer" qualifier names the slot the Swizzle occupies; pairs with the existing private predicate ``_is_swizzle_inner_composed``.
- Expanded docstring: states what is and isn't recognised, explains why the inverse-form is intentionally excluded (different semantics, can emit negative addresses), and points at where to grow a sibling ``split_inner_swizzle`` if a public consumer ever appears.
- Added to ``__all__`` next to the other swizzle exports.
- Dropped the private back-channel ``from .layouts import _split_zero_offset_swizzle`` from ``tensor.py``; the public name arrives via the existing ``from .layouts import *``.

Test Plan: ``make test`` -- 2240 passed, 142 skipped (unchanged).

Promote ``src/tensor_layouts/layouts.py`` (4.4k LOC) to a package with three layered modules and an aggregating ``__init__.py``:

* ``core.py`` -- exceptions, type predicates, tuple operations, the affine ``Layout`` class, ``Tile``, and the ``Swizzle`` primitive. No dependency on ``ComposedLayout``; the import graph now enforces what used to be a convention.
* ``expr.py`` -- the ``LayoutExpr`` layer: ``ComposedLayout`` plus every predicate / coercer that operates on ``LayoutExpr = Layout | ComposedLayout`` (``is_layout``, ``is_affine``, ``as_layout``, ``as_layout_expr``, ``as_affine_layout``, ``split_outer_swizzle``, ``_forward_layout_domain``).
* ``algebra.py`` -- the CuTe layout algebra (compose, complement, divide, product, inverses, coalesce, idx2crd, upcast/downcast, ...).

Dependency direction is strictly ``core <- expr <- algebra``.

The package's ``__all__`` is the union of the three submodules' own ``__all__`` lists (rather than a hand-curated 89-name copy that drifts in practice -- this fix surfaces ``coords_all_none``, which the curated list had silently dropped). Private symbols still consumed by other in-package modules (``_NO_FORWARD``, ``_forward_layout_domain``, ...) are explicitly re-exported from ``__init__.py``.

No public API change: every name previously importable from ``tensor_layouts.layouts`` remains importable from the same path. All 2350 tests still pass.

jduprat and others added 27 commits May 12, 2026 12:36

Hoist 'from math import gcd' in analysis.py

e838aae

The local import inside order() saved nothing -- math.gcd is in the standard library and importing at module top is the convention. Tests: tests/analysis.py (173 passed).

meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label May 15, 2026

jduprat merged commit 75ec7ad into facebookresearch:main May 15, 2026
9 checks passed

Development updates 20260515#24

jduprat merged 27 commits into facebookresearch:main from jduprat:dev

jduprat commented May 15, 2026
