Open Tasks

Does RoPE mess with the semantics of the vectors, and what would you do differently?
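One way to frame the question: RoPE is a pure rotation, so it preserves vector norms, and the dot product between a rotated query and key depends only on their *relative* position. A minimal NumPy sketch (the `rope` helper here is a hypothetical illustration, not any particular library's implementation) checks both properties:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    # Rotate consecutive dimension pairs of x by position-dependent
    # angles theta_i = pos * base**(-2i/d), as in rotary embeddings.
    d = x.shape[0]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Norms are preserved: RoPE rotates, it never rescales,
# so magnitude-based "semantics" are untouched.
assert np.isclose(np.linalg.norm(rope(q, 5)), np.linalg.norm(q))

# Dot products depend only on relative position: offsets 5-2 and 9-6
# are equal, so the attention scores match.
assert np.isclose(rope(q, 5) @ rope(k, 2), rope(q, 9) @ rope(k, 6))
```

What RoPE does change is the *direction* of each vector as a function of position, so any downstream use that reads absolute directions (rather than query-key inner products) will see position mixed into the representation.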

Articles

Claim: For any n-gram language model, there exists a state space language model that can simulate it with arbitrarily small error.
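The bigram case makes the intuition concrete: if the state is the one-hot vector of the previous token, a linear readout reproduces the n-gram distribution with zero error. A minimal sketch, assuming a toy vocabulary and a random transition matrix (all names here are illustrative):

```python
import numpy as np

V = 3  # toy vocabulary size
rng = np.random.default_rng(1)
P = rng.random((V, V))
P /= P.sum(axis=1, keepdims=True)  # row i: P(next token | prev = i)

def bigram_next(prev_token):
    # The n-gram (here bigram) model's next-token distribution.
    return P[prev_token]

def ssm_next(prev_token):
    # State h = one-hot(prev token); readout C @ h with C = P.T
    # yields exactly the same distribution, i.e. zero simulation error.
    h = np.eye(V)[prev_token]
    return P.T @ h

for t in range(V):
    assert np.allclose(bigram_next(t), ssm_next(t))
```

For a general n-gram the exact construction needs a state of dimension V**(n-1) (one-hot over the last n-1 tokens); the "arbitrarily small error" in the claim enters when that state is compressed into fewer dimensions.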

A look at DeepSeek's sparse attention mechanisms for efficient long-context processing.

How a 7M parameter model beats 100x bigger models at Sudoku, Mazes, and ARC-AGI using recursive reasoning.

NVIDIA's breakthrough 4-bit training methodology achieving 2-3x speedup and 50% memory reduction.

Diffusion Transformers with Representation Autoencoders achieve state-of-the-art FID 1.13 on ImageNet.

Quantization-enhanced Reinforcement Learning for LLMs enables RL training of 32B models on a single GPU.