Releases: MapleSilicon/SparseFlow
SparseFlow v0.3-alpha — GPU Lowering Pipeline (Kernel Stub)
🚀 SparseFlow v0.3-alpha
This is an architectural alpha release that completes SparseFlow’s end-to-end GPU lowering pipeline.
It validates the compiler flow from structured sparsity analysis to GPU IR generation using MLIR.
✅ What’s Included
Compiler Passes
- `sparseflow-spa`: static N:M sparsity propagation at compile time
- `sparseflow-rewrite-matmul`: rewrites sparse `linalg.matmul` into backend calls
- `sparseflow-gpu-rewrite`: lowers sparse calls into `gpu.module`, `gpu.func` (kernel stub), and `gpu.launch`
Verified Pipeline
linalg.matmul
→ sparseflow-spa
→ func.call @sparse_matmul_N_M
→ gpu.launch + gpu.func
The full pipeline executes successfully with verifier-clean MLIR.
⚠️ Limitations (By Design)
- GPU kernel contains no computation (`gpu.return` only)
- No bufferization or memory lowering
- No correctness or performance claims on GPU
- Output tensor is currently a placeholder
This release is about compiler correctness and architecture, not performance.
🧱 Why This Matters
SparseFlow now has:
- Stable pass registration
- Stable plugin loading
- Verified CPU → GPU lowering boundary
- A clean foundation for real GPU kernel development
This is the last structural milestone before implementing GPU kernels.
🔜 What’s Next
v0.3-beta
- Define GPU kernel ABI
- Memory layout and bufferization
- Replace stub kernel with real N:M computation
📄 License
MIT
Full Changelog: v0.7.1...v0.3-alpha
SparseFlow v0.7.1 – SPA + CPU Runtime + Python CLI (Dev Preview)
This release packages the current SparseFlow stack into a usable, reproducible demo
for researchers, compiler engineers, and hardware partners.
Highlights
- SPA v0.6 (2D Sparsity)
  - Propagates row + column sparsity through a small MLIR pipeline
  - Exports JSON metadata with row/col masks and sparsity stats
  - Proven on structured sparse matmuls with 50% row + 50% col sparsity (≈75% FLOP reduction)
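The ≈75% figure follows directly from the two sparsity fractions; a quick arithmetic sketch:

```python
# Back-of-the-envelope FLOP model for 2D structured sparsity.
# With a fraction r of rows and c of columns zeroed out, only the
# surviving (1 - r) * (1 - c) fraction of the matmul's work remains.

def flop_reduction(row_sparsity: float, col_sparsity: float) -> float:
    """Fraction of dense matmul FLOPs eliminated by skipping zero rows/cols."""
    remaining = (1.0 - row_sparsity) * (1.0 - col_sparsity)
    return 1.0 - remaining

# 50% row + 50% col sparsity leaves 25% of the work: a 75% FLOP reduction.
print(flop_reduction(0.5, 0.5))  # → 0.75
```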
- C++ CPU Runtime (OpenMP)
  - Consumes SPA JSON masks and skips zero rows/cols
  - Blocked matmul kernel with OpenMP parallelism
  - Achieves ≈3–5× speedup on larger matmuls (512–1024) at ~75% sparsity
  - Honest behavior: small sizes (<512) may see only a 1–3× speedup due to OpenMP and cache overhead
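The skipping logic above can be sketched in plain Python (a minimal illustration, not the actual C++/OpenMP runtime; the real kernel adds blocking and parallelism):

```python
# Mask-driven row/col skipping in a matmul: rows of A and columns of B
# that the SPA masks mark as zero are never visited.

def sparse_matmul(A, B, row_mask, col_mask):
    M, K = len(A), len(A[0])
    N = len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        if not row_mask[i]:          # skip zero row of A
            continue
        for j in range(N):
            if not col_mask[j]:      # skip zero column of B
                continue
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C
```

With all-true masks this degenerates to a dense matmul; masked-off rows and columns of the output stay zero and cost nothing.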
- End-to-End Demo Script (`./spa-runner.sh`)
  - Rebuilds passes/runtime if needed
  - Runs SPA on test MLIR
  - Exports `spa_sparsity.json`
  - Benchmarks dense vs sparse CPU matmuls
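The exact schema of `spa_sparsity.json` is not shown in these notes; a hypothetical reader sketch, where the field names `row_mask` and `col_mask` are assumptions for illustration:

```python
import json

# Hypothetical metadata shape: 1 = kept, 0 = skipped. The real file's
# field names and layout may differ.
meta = json.loads('{"row_mask": [1, 0, 1, 0], "col_mask": [1, 1, 0, 0]}')

row_sparsity = 1.0 - sum(meta["row_mask"]) / len(meta["row_mask"])
col_sparsity = 1.0 - sum(meta["col_mask"]) / len(meta["col_mask"])
print(row_sparsity, col_sparsity)  # → 0.5 0.5
```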
- Python CLI (Developer Preview)
  - Install: `cd sparseflow_package && pip install -e .`
  - Tools: `sparseflow-demo`, `sparseflow-analyze tests/test_spa_v6_full_2d.mlir`, `sparseflow-benchmark`
Quick Start
```
git clone https://github.com/MapleSilicon/SparseFlow.git
cd SparseFlow
./spa-runner.sh
```
SparseFlow v0.1.1 — JSON-Driven N:M Runtime + Dimension Support
This release includes:
- Full MLIR pass pipeline (AnnotateNmPass, ExportMetadataPass, etc.)
- JSON-driven compiler → runtime integration
- Dimension-aware runtime (M, N, K correctly read from metadata)
- Sparse matmul simulator with N:M skipping logic
- End-to-end validation scripts
- Matrix + sparsity sweep tool
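The N:M skipping idea can be sketched as follows (an illustrative Python sketch, not the project's simulator): in every group of M consecutive values, at most N are nonzero, so a dot product only needs to visit the nonzero entries.

```python
# Illustrative N:M sparsity check and skipping dot product.

def satisfies_nm(values, n, m):
    """True if every group of m consecutive values has at most n nonzeros."""
    for g in range(0, len(values), m):
        group = values[g:g + m]
        if sum(1 for v in group if v != 0) > n:
            return False
    return True

def nm_dot(a, b, n, m):
    """Dot product that visits only the nonzero entries of the N:M-sparse vector a."""
    assert satisfies_nm(a, n, m)
    return sum(a[k] * b[k] for k in range(len(a)) if a[k] != 0)
```

For 2:4 sparsity this skips half of the multiply-accumulates per group, which is the work reduction the simulator models.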