About

Welcome! This repository is a collection of my experiments and examples as I learn about GPU programming and parallel computing with CUDA. Here you'll find code exploring everything from simple vector addition to more advanced parallel algorithms.

I am following this book:

Programming Massively Parallel Processors: A Hands-on Approach by Wen-mei W. Hwu, David B. Kirk, and Izzat El Hajj

to build a grasp of CUDA fundamentals and more advanced topics.

Requirements

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit
  • NVCC (NVIDIA CUDA Compiler)
  • C++ compiler
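
With those requirements in place, a program from this repository can typically be built and run with `nvcc`. A minimal sketch (the file name `vector_add.cu` is illustrative, not necessarily the repository's actual file name):

```shell
# Compile a CUDA source file; nvcc drives both the host C++ compiler
# and the device-code compiler.
nvcc -o vector_add vector_add.cu

# Optionally target a specific GPU architecture, e.g. compute capability 7.0:
# nvcc -arch=sm_70 -o vector_add vector_add.cu

./vector_add
```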

Programs

| Program | Description | Page Link |
| --- | --- | --- |
| Vector Addition | Basic CUDA program demonstrating parallel vector addition. Each thread computes one element of the result vector. | README |
| Matrix Multiplication | Matrix multiplication implementation with two versions: a naive kernel and a tiled kernel using shared memory. Demonstrates key CUDA concepts like shared memory, tiling, and memory coalescing. | README |
| One-Head Attention | Implementation of the scaled dot-product attention mechanism in CUDA. Computes Attention(Q, K, V) = softmax(QK^T / √d) × V using multiple optimized CUDA kernels. | README |
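
As a minimal sketch of the one-thread-per-element pattern the Vector Addition entry describes (the kernel and variable names here are illustrative, not the repository's actual code; it uses unified memory for brevity, whereas the book's examples use explicit `cudaMalloc`/`cudaMemcpy`):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one element of C = A + B.
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {          // guard: the last block may have extra threads
        C[i] = A[i] + B[i];
    }
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // ceil(n / threads)
    vecAdd<<<blocks, threads>>>(A, B, C, n);
    cudaDeviceSynchronize();

    printf("C[0] = %f\n", C[0]);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

The launch configuration rounds the block count up so that every element gets a thread; the bounds check inside the kernel then discards the surplus threads in the final block.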