TileOPs

Spec-driven GPU operator library for LLMs — designed for AI agents to build, evaluate, and optimize

Built on TileLang

Installation | Quick Start | Docs

Status: TileOPs is under active development. APIs may change.

Overview

TileOPs is a GPU operator library for LLM training and inference, built on TileLang. Beyond providing a growing collection of production-quality operators, TileOPs explores a spec-driven development model where AI agents can read declarative operator specifications, generate kernel implementations, and evaluate them against hardware-theoretical performance bounds — with minimal human scaffolding.

Architecture

Every operator is split into two layers with a strict boundary:

  • Op (L2) — stateless Python entry point. Handles validation, dtype casting, and memory layout. Compatible with CUDA-Graph and torch.compile.
  • Kernel (L1) — TileLang GPU implementation with hardware-specific optimizations (Ampere, Hopper).

This separation keeps user-facing behavior independent of GPU strategy, allowing agents and developers to modify either layer without side effects on the other.
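The boundary can be illustrated with a minimal pure-Python sketch. Everything here is hypothetical: `ToyGemmOp`, `naive_gemm_kernel`, and `blocked_gemm_kernel` are stand-ins invented for illustration and are not part of the TileOPs API, and nested lists take the place of GPU tensors. The point is only that the op layer owns validation while the kernel layer owns the compute strategy, so either can be swapped independently.

```python
from typing import Callable, List

Matrix = List[List[float]]


def naive_gemm_kernel(a: Matrix, b: Matrix) -> Matrix:
    """Hypothetical kernel layer: computes A @ B and assumes valid inputs."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def blocked_gemm_kernel(a: Matrix, b: Matrix, block: int = 2) -> Matrix:
    """Alternative kernel: same result, accumulated over blocks of K."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0 for _ in range(cols)] for _ in range(rows)]
    for k0 in range(0, inner, block):
        for i in range(rows):
            for j in range(cols):
                for k in range(k0, min(k0 + block, inner)):
                    out[i][j] += a[i][k] * b[k][j]
    return out


class ToyGemmOp:
    """Hypothetical op layer: validates shapes, then delegates to a kernel."""

    def __init__(self, m: int, n: int, k: int,
                 kernel: Callable[[Matrix, Matrix], Matrix] = naive_gemm_kernel):
        self.m, self.n, self.k = m, n, k
        self.kernel = kernel

    def __call__(self, a: Matrix, b: Matrix) -> Matrix:
        if len(a) != self.m or any(len(row) != self.k for row in a):
            raise ValueError(f"A must be {self.m}x{self.k}")
        if len(b) != self.k or any(len(row) != self.n for row in b):
            raise ValueError(f"B must be {self.k}x{self.n}")
        return self.kernel(a, b)
```

Constructing `ToyGemmOp(2, 2, 3, kernel=blocked_gemm_kernel)` changes the compute strategy without touching the validation logic, and callers see the same behavior either way.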

Key Properties

  • Spec-driven — each operator is declared in a machine-readable manifest (ops_manifest.yaml) that specifies signatures, workloads, and roofline formulas, serving as the entry point for both agent code generation and automated validation
  • Roofline-evaluated — kernel performance is measured against Speed-of-Light hardware bounds, not relative baselines
  • Auto-tuning — built-in search over tile sizes, pipelines, and scheduling parameters
  • Lightweight — depends only on TileLang, PyTorch, and einops
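To make the roofline idea concrete, here is a minimal sketch of a Speed-of-Light bound for a GEMM. It assumes the usual textbook cost model (2·M·N·K floating-point operations, with each of A, B, and C moved through memory once); the formulas actually stored in ops_manifest.yaml are not shown here and may differ. The helper names and the `HardwarePeaks` values are hypothetical placeholders, not measured figures for any particular GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwarePeaks:
    """Hypothetical container for theoretical hardware limits."""
    peak_flops: float      # floating-point operations per second
    peak_bandwidth: float  # bytes per second


def gemm_flops(m: int, n: int, k: int) -> int:
    """One multiply and one add per (i, j, k) triple."""
    return 2 * m * n * k


def gemm_bytes(m: int, n: int, k: int, bytes_per_element: int) -> int:
    """Read A (m x k) and B (k x n), write C (m x n), each exactly once."""
    return (m * k + k * n + m * n) * bytes_per_element


def gemm_roofline_seconds(m: int, n: int, k: int,
                          bytes_per_element: int,
                          hw: HardwarePeaks) -> float:
    """Lower bound on runtime: the slower of the compute and memory limits."""
    compute_time = gemm_flops(m, n, k) / hw.peak_flops
    memory_time = gemm_bytes(m, n, k, bytes_per_element) / hw.peak_bandwidth
    return max(compute_time, memory_time)


def roofline_efficiency(measured_seconds: float, bound_seconds: float) -> float:
    """Fraction of the theoretical bound achieved (1.0 means at the roofline)."""
    return bound_seconds / measured_seconds


# Placeholder peaks for illustration only; substitute the target GPU's numbers.
example_hw = HardwarePeaks(peak_flops=1e15, peak_bandwidth=3e12)
bound = gemm_roofline_seconds(1024, 1024, 512, bytes_per_element=2, hw=example_hw)
```

Reporting `roofline_efficiency(measured, bound)` gives a score that depends only on the hardware limits, so it stays comparable across operators and does not shift when a reference implementation changes.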

Installation

TileOPs can be installed from PyPI or built from source. A CUDA-capable GPU is required.

Prerequisites

  • Python >= 3.10
  • PyTorch >= 2.1
  • CUDA Toolkit
  • NVIDIA GPU: Hopper (SM_90)
  • TileLang == 0.1.8

From PyPI

pip install tileops

From source

git clone https://github.com/tile-ai/TileOPs
cd TileOPs
make install    # dev dependencies + pre-commit hooks

Note

If CUDA and TileLang are already installed system-wide and you encounter build issues, try:

PIP_NO_BUILD_ISOLATION=1 pip install -e '.[dev]' -v && pre-commit install

Verify:

python -m pytest tests/ -q    # requires a CUDA GPU

Quick Start

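import torch
from tileops.ops import GemmOp

# Problem size: A is (M, K), B is (K, N), so C is (M, N).
M, N, K = 1024, 1024, 512
dtype = torch.float16

# Build the operator for this shape and dtype.
gemm = GemmOp(M, N, K, dtype=dtype)

# Inputs must live on the GPU and match the operator's dtype.
A = torch.randn(M, K, device="cuda", dtype=dtype)
B = torch.randn(K, N, device="cuda", dtype=dtype)

C = gemm(A, B)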

Documentation

Design docs and development guides are in docs/. The full API reference and performance tables are published at TileOPs.github.io.

Contributing

See docs/ for design docs. Branch and commit conventions are in .claude/conventions/types.sh.

License

TileOPs is released under the MIT License.
