[RFC] Geometric Sparse Attention Layer (AETHER) #133

## Proposal
I propose adding `GeometricSparseAttention`, a new modular attention layer that enables **data-dependent, mathematically safe sparsity** for long-context inference.

Unlike static sparse patterns (e.g., Sliding Window, BigBird), which fix the attended positions in advance, this layer uses **AETHER (Adaptive Event-driven Threshold Hybrid Entangled Rendering)** logic to prune key blocks at runtime based on the geometry of the keys (per-block centroids and radii).

## Motivation
Current attention mechanisms in Penzai (`pz.nn.Attention`) scale quadratically, $O(N^2)$, in the sequence length $N$. While Penzai excels at model surgery and interpretability, analyzing long-context behavior (100k+ tokens) is currently computationally prohibitive.

By introducing a geometric decision gate, we can enable Penzai users to:
1.  **Scale:** Run inference on long contexts while computing full attention scores only for the key blocks that the bound cannot rule out.
2.  **Inspect:** Visualize the "Manifold Mask" in Treescope to understand *semantically* which blocks the model deems important.

## Technical Approach
The core logic relies on an upper bound derived from the **Cauchy-Schwarz inequality** to guarantee safety. For a query $q$ and a key block $B$ with centroid $\mu$ and radius $r$ (the largest distance from $\mu$ to any key in $B$):

$$\max_{k \in B} (q \cdot k) \le q \cdot \mu + \|q\| \cdot r$$
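
For completeness, the bound follows in two steps for any $k \in B$: split $k$ around the centroid, then apply Cauchy-Schwarz to the residual. The only assumption is that $\|k - \mu\| \le r$ holds for every key in the block.

$$q \cdot k = q \cdot \mu + q \cdot (k - \mu) \le q \cdot \mu + \|q\| \, \|k - \mu\| \le q \cdot \mu + \|q\| \cdot r$$

Since the right-hand side does not depend on $k$, it also bounds the maximum over the block.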

If this upper bound is below the threshold $\tau$, no key in the block can score $\tau$ or higher against $q$, so the block is skipped.
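
A minimal NumPy sketch of the block test may help make this concrete. The function name `block_upper_bounds`, the random data, and the choice to apply `tau` to raw (unscaled) dot products are all assumptions for illustration, not part of the proposal; the array operations used here also have `jax.numpy` counterparts.

```python
import numpy as np


def block_upper_bounds(q, keys, block_size):
    """Upper-bound max_{k in block} (q . k) for every key block.

    q: (d,) query vector; keys: (n, d) array with n divisible by block_size.
    Returns an array of shape (n // block_size,).
    """
    n, d = keys.shape
    blocks = keys.reshape(n // block_size, block_size, d)
    mu = blocks.mean(axis=1)  # centroids: (num_blocks, d)
    # Radius: largest distance from the centroid to any key in the block.
    r = np.linalg.norm(blocks - mu[:, None, :], axis=-1).max(axis=1)
    return mu @ q + np.linalg.norm(q) * r


rng = np.random.default_rng(0)
keys = rng.normal(size=(256, 16))
q = rng.normal(size=16)
tau = 0.15  # illustrative threshold on raw dot products (assumption)

bounds = block_upper_bounds(q, keys, block_size=64)
exact_max = (keys @ q).reshape(4, 64).max(axis=1)  # true per-block maximum
skip = bounds < tau  # blocks the bound allows us to prune
```

By construction `bounds >= exact_max` for every block, so a block flagged in `skip` never contains a key whose score reaches `tau`.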

### Proposed API
The layer would follow the `pz.nn.Layer` interface and fully support `NamedArray` for axis safety.

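```python
import dataclasses

from penzai import pz


@pz.pytree_dataclass
class GeometricSparseAttention(pz.nn.Layer):
    """A geometric sparse attention layer compatible with pz.select()."""

    # Static field: kept out of the pytree leaves so that reshapes that
    # depend on block_size stay concrete under jax.jit.
    block_size: int = dataclasses.field(default=64, metadata={"pytree_node": False})
    threshold: float = 0.15
    # ... implementation details ...

    def __call__(self, query, key, value, mask=None):
        # 1. Compute block centroids/radii (JAX-friendly reshape)
        # 2. Compute upper-bound scores
        # 3. Create boolean mask
        # 4. Apply masked attention
        raise NotImplementedError("RFC sketch: steps 1-4 are not implemented yet")
```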
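
The four steps in `__call__` could be prototyped as a plain single-head function before wiring them into a Penzai layer. The sketch below is a dense NumPy reference under several assumptions that the RFC does not state: the function name and signature are hypothetical, scores are unscaled dot products, the key length is divisible by `block_size`, and the highest-bound block is always kept so that no query ends up with an empty softmax. It masks pruned blocks rather than skipping them, so it checks semantics only and saves no compute.

```python
import numpy as np


def geometric_sparse_attention(q, k, v, block_size=64, threshold=0.15):
    """Dense reference. q: (m, d); k, v: (n, d) with n divisible by block_size."""
    n, d = k.shape
    num_blocks = n // block_size

    # 1. Block centroids and radii.
    blocks = k.reshape(num_blocks, block_size, d)
    mu = blocks.mean(axis=1)
    r = np.linalg.norm(blocks - mu[:, None, :], axis=-1).max(axis=1)

    # 2. Upper-bound scores, one per (query, block) pair: (m, num_blocks).
    q_norm = np.linalg.norm(q, axis=-1, keepdims=True)
    bounds = q @ mu.T + q_norm * r[None, :]

    # 3. Boolean block mask; always keep each query's highest-bound block
    #    (an added safeguard, not part of the RFC).
    keep_block = bounds >= threshold
    keep_block[np.arange(q.shape[0]), bounds.argmax(axis=-1)] = True
    keep = np.repeat(keep_block, block_size, axis=-1)  # per-key mask: (m, n)

    # 4. Masked softmax attention.
    scores = np.where(keep, q @ k.T, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, keep_block


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(128, 8))
v = rng.normal(size=(128, 8))
out, keep_block = geometric_sparse_attention(q, k, v, block_size=32)
```

Returning `keep_block` alongside the output keeps the block mask available for inspection, which is the quantity the Motivation section proposes visualizing.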
