Proposal
I propose adding GeometricSparseAttention, a new modular attention layer that enables data-dependent, mathematically safe sparsity for long-context inference.
Unlike static sparse patterns (e.g., sliding window, BigBird), which are fixed before any data is seen, this layer uses AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering) logic to prune computation blocks at runtime based on the geometry of the keys, summarized per block by a centroid and a radius.
Motivation
Current attention mechanisms in Penzai (pz.nn.Attention) scale quadratically, $O(N^2)$, in the sequence length $N$. While Penzai excels at model surgery and interpretability, analyzing long-context behavior (100k+ tokens) is currently computationally prohibitive.
By introducing a geometric decision gate, we can enable Penzai users to:
- Scale: Run inference on massive contexts with sub-quadratic compute whenever most key blocks fall below the threshold.
- Inspect: Visualize the "Manifold Mask" in Treescope to see which key blocks the gate keeps for each query, i.e. which blocks the model deems important.
Technical Approach
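The core logic relies on a Cauchy-Schwarz upper bound to guarantee safety. For a query $q$ and a key block $B$ with centroid $\mu$ and radius $r = \max_{k \in B} \lVert k - \mu \rVert$:
$$\max_{k \in B} (q \cdot k) \le q \cdot \mu + \lVert q \rVert \, r$$
This follows from writing $q \cdot k = q \cdot \mu + q \cdot (k - \mu)$ and bounding the second term by $\lVert q \rVert \, \lVert k - \mu \rVert \le \lVert q \rVert \, r$.
If this upper bound is below the threshold $\tau$, every key in the block scores below $\tau$, so the block is skipped.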
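To make the bound concrete, here is a minimal NumPy sketch (NumPy is used only for readability; the same calls exist in jax.numpy). The helper name `block_upper_bounds`, the assumption that the number of keys is divisible by the block size, and the choice of the mean as centroid are illustrative assumptions, not part of the proposal.

```python
import numpy as np

def block_upper_bounds(q, keys, block_size):
    """Upper-bound q . k over each key block using centroid + radius.

    q: (d,) query vector. keys: (n, d), with n divisible by block_size
    (an assumption of this sketch). Returns an (n // block_size,) array.
    """
    n, d = keys.shape
    blocks = keys.reshape(n // block_size, block_size, d)
    mu = blocks.mean(axis=1)  # centroid of each block, shape (nb, d)
    # Radius: largest distance from the centroid to any key in the block.
    r = np.linalg.norm(blocks - mu[:, None, :], axis=-1).max(axis=1)
    return mu @ q + np.linalg.norm(q) * r

rng = np.random.default_rng(0)
keys = rng.normal(size=(256, 16))
q = rng.normal(size=16)

bounds = block_upper_bounds(q, keys, block_size=64)
exact = (keys @ q).reshape(4, 64).max(axis=1)  # true per-block maximum
tau = 0.15
skip = bounds < tau  # blocks whose every key provably scores below tau
```

Because `bounds` never falls below `exact`, a block flagged in `skip` cannot contain a key with score at or above `tau`. How many blocks are actually skipped depends on how tightly the keys cluster: a large radius makes the bound loose.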
Proposed API
The layer would follow the pz.nn.Layer interface and fully support NamedArray for axis safety.
    from penzai import pz

    @pz.pytree_dataclass
    class GeometricSparseAttention(pz.nn.Layer):
        """A geometric sparse attention layer compatible with pz.select()."""
        block_size: int = 64
        threshold: float = 0.15
        # ... implementation details ...

        def __call__(self, query, key, value, mask=None):
            # 1. Compute block centroids/radii (JAX-friendly reshape)
            # 2. Compute upper-bound scores
            # 3. Create boolean mask
            # 4. Apply masked attention
            pass
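The four steps in `__call__` could look roughly like the following dense reference sketch. It is written in plain NumPy rather than against Penzai's NamedArray API, and it fixes only the semantics: it still computes every score before masking, so it saves no compute. The function name, the unscaled dot-product scores (chosen to match the bound above), and the rule of always keeping each query's best-bounded block so the softmax is never empty are all assumptions of this sketch.

```python
import numpy as np

def geometric_sparse_attention(query, key, value, block_size=64, threshold=0.15):
    """Dense reference for block-pruned attention (hypothetical helper).

    query: (m, d), key: (n, d), value: (n, dv); n divisible by block_size.
    Returns (output of shape (m, dv), boolean block mask of shape (m, nb)).
    """
    n, d = key.shape
    nb = n // block_size

    # 1. Block centroids and radii.
    blocks = key.reshape(nb, block_size, d)
    mu = blocks.mean(axis=1)
    r = np.linalg.norm(blocks - mu[:, None, :], axis=-1).max(axis=1)

    # 2. Upper-bound score of every query against every block, shape (m, nb).
    q_norm = np.linalg.norm(query, axis=-1, keepdims=True)
    bounds = query @ mu.T + q_norm * r

    # 3. Boolean block mask. Assumption: always keep each query's
    #    best-bounded block so that no softmax row is fully masked.
    keep = bounds >= threshold
    keep[np.arange(len(query)), bounds.argmax(axis=1)] = True
    token_keep = np.repeat(keep, block_size, axis=1)  # (m, n)

    # 4. Masked softmax attention over the surviving keys.
    scores = np.where(token_keep, query @ key.T, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ value, keep

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(128, 8))
v = rng.normal(size=(128, 4))
out, block_mask = geometric_sparse_attention(q, k, v, block_size=32)
```

The returned `block_mask` is the array one would hand to a visualizer as the "Manifold Mask". A real implementation would gather only the kept blocks instead of masking a dense score matrix.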