TinyDNN-CUDA

Personal CUDA learning repo for neural-network primitives.

The layout now follows the same broad idea as cudaJourney: keep each operation area self-contained instead of splitting the repo by artifact type.

Repository Layout

| Path | Focus |
| --- | --- |
| linear_algebra/gemm/ | batched GEMM kernel, PyTorch wrapper, and runnable Python driver |
| activations/elementwise/ | elementwise activation kernels and saved result snapshots |
| activations/with_bias/ | activation kernels that include a bias input |

Each operation folder keeps related files together (see the sketch after this list):

  • kernel.cu for the CUDA implementation
  • wrapper.cpp for the PyTorch extension binding
  • test.py for the local runner / validation path
  • optional build/ and results/ folders for checked-in artifacts
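
As a rough illustration of this per-folder pattern, a test.py driver could compile the sibling wrapper.cpp and kernel.cu on the fly and validate the result against PyTorch. This is only a minimal sketch: the extension name and the relu_forward binding are assumptions for illustration, not the exact names used in this repo.

```python
# Minimal sketch of a per-folder test.py driver (names are illustrative).
import os
import torch
from torch.utils.cpp_extension import load

here = os.path.dirname(os.path.abspath(__file__))

# Compile wrapper.cpp + kernel.cu from this folder on first run;
# later runs reuse the cached build directory.
ext = load(
    name="elementwise_activation_ext",  # assumed extension name
    sources=[os.path.join(here, "wrapper.cpp"),
             os.path.join(here, "kernel.cu")],
    verbose=True,
)

x = torch.randn(1 << 20, device="cuda")
out = ext.relu_forward(x)  # assumed binding exposed by wrapper.cpp
torch.testing.assert_close(out, torch.relu(x))
print("max abs diff:", (out - torch.relu(x)).abs().max().item())
```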

Requirements

  • NVIDIA GPU with a working CUDA runtime
  • CUDA Toolkit with nvcc
  • Python with a CUDA-enabled PyTorch install

Build And Run

There is still no single build system. Each operation is run from its local test.py, which compiles wrapper.cpp and kernel.cu on demand through torch.utils.cpp_extension.load().

Examples:

python linear_algebra/gemm/test.py 1 512 512 512
python activations/elementwise/test.py
python activations/with_bias/test.py
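
For the GEMM example, the four positional arguments presumably correspond to the batch size and the M, N, K matrix dimensions, with the kernel output checked against a PyTorch reference. The sketch below shows one plausible shape of that flow; the batched_gemm binding name and the exact argument handling are assumptions, not taken from the repo.

```python
# Hypothetical sketch of the GEMM driver's argument handling and validation.
import os
import sys
import torch
from torch.utils.cpp_extension import load

batch, M, N, K = (int(a) for a in sys.argv[1:5])

here = os.path.dirname(os.path.abspath(__file__))
ext = load(
    name="batched_gemm_ext",  # assumed extension name
    sources=[os.path.join(here, "wrapper.cpp"),
             os.path.join(here, "kernel.cu")],
)

A = torch.randn(batch, M, K, device="cuda")
B = torch.randn(batch, K, N, device="cuda")

C = ext.batched_gemm(A, B)  # assumed binding name
torch.testing.assert_close(C, torch.bmm(A, B), rtol=1e-3, atol=1e-3)
print(f"batched GEMM OK for {batch} x ({M}x{K}) @ ({K}x{N})")
```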

Why This Layout

The old structure separated kernels/, wrappers/, and py_tests/, which made each primitive span multiple folders. Grouping files by operation makes it easier to inspect, run, and extend one kernel family at a time while still keeping the repo lightweight.
