About

Welcome! This repository is a collection of my experiments and examples as I learn about GPU programming and parallel computing with CUDA. Here you'll find code exploring everything from simple vector addition to more advanced parallel algorithms.

I am following this book:

Programming Massively Parallel Processors: A Hands-on Approach by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj

to build a grasp of CUDA fundamentals and more advanced topics.

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit
  • NVCC (NVIDIA CUDA Compiler)
  • C++ compiler
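
With those requirements in place, a program from this repository can typically be built and run with `nvcc`. A minimal sketch (the file name `vector_add.cu` is illustrative, not necessarily the repository's actual file name):

```shell
# Compile a CUDA source file; nvcc drives both the host C++ compiler
# and the device-code compiler.
nvcc -o vector_add vector_add.cu

# Optionally target a specific GPU architecture, e.g. compute capability 7.0:
# nvcc -arch=sm_70 -o vector_add vector_add.cu

./vector_add
```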

Programs

| Program | Description | Page Link |
| --- | --- | --- |
| Vector Addition | Basic CUDA program demonstrating parallel vector addition. Each thread computes one element of the result vector. | README |
| Matrix Multiplication | Matrix multiplication implementation with two versions: a naive kernel and a tiled kernel using shared memory. Demonstrates key CUDA concepts like shared memory, tiling, and memory coalescing. | README |
| One-Head Attention | Implementation of the scaled dot-product attention mechanism in CUDA. Computes Attention(Q, K, V) = softmax(QK^T / √d) × V using multiple optimized CUDA kernels. | README |
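
As a minimal sketch of the one-thread-per-element pattern the Vector Addition entry describes (the kernel and variable names here are illustrative, not the repository's actual code; it uses unified memory for brevity, whereas the book's examples use explicit `cudaMalloc`/`cudaMemcpy`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of C = A + B.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {          // guard: the last block may have extra threads
        C[i] = A[i] + B[i];
    }
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // ceil(n / threads)
    vecAdd<<<blocks, threads>>>(A, B, C, n);
    cudaDeviceSynchronize();

    printf("C[0] = %f\n", C[0]);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The launch configuration rounds the block count up so that every element gets a thread; the bounds check inside the kernel then discards the surplus threads in the final block.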