nsight-compute

Here are 13 public repositories matching this topic...

psmarter / CUDA-Practice

CUDA programming practice project: hands-on CUDA kernels and performance optimization, covering GEMM, FlashAttention, Tensor Cores, CUTLASS, quantization, KV cache, NCCL, and profiling.

parallel-computing cuda high-performance-computing cuda-kernels quantization cutlass gemm performance-optimization nccl gpu-programming roofline-model tensor-core llm-inference flash-attention nsight-compute

Updated Mar 20, 2026
Cuda

bikrammajhi / 100-days-of-GPU

This is my 🔥 100 Days of GPU — a wild, hands-on journey through CUDA/CUTLASS kernels, Triton spells, and PTX sorcery.

mojo cuda triton cutlass ptx nsight-compute thunderkittens

Updated Apr 6, 2026
HTML

openhackathons-org / HPC_Profiler

Profiling with NVIDIA Nsight Tools Bootcamp

hpc cuda openacc nsight-systems nsight-compute

Updated Feb 4, 2026
C++

SeungjaeLim / CUDA.tutorial

References content from the OLCF CUDA Training Series (https://github.com/olcf/cuda-training-series).

cuda gpu-programming nsight-systems nsight-compute

Updated Nov 21, 2024
Cuda

j3soon / hpc-samples

CUDA Samples and Nsight Guided Profiling Samples

cuda profiling nsight nsight-compute

Updated Nov 14, 2025
Cuda

abhiMishra98 / Holoscan-Add-Matrices

This project demonstrates the integration of a CUDA kernel within an NVIDIA Holoscan application. It consists of two custom operators: one for memory allocation and data initialization, and another for executing the CUDA kernel. The application was profiled with Nsight Systems and the kernel with Nsight Compute.

cuda gpu-programming holoscan nsight-systems nsight-compute gpu-profiling

Updated Nov 11, 2025
C++

Artemarius / cuda-zkp-ntt

GPU-accelerated Number-Theoretic Transform for ZK-Proof generation. Targets the NTT bottleneck (91% of Groth16 prover time) via two CUDA optimizations: async double-buffered pipeline eliminating CPU-GPU transfer overhead, and IADD3-path Montgomery multiplication reducing finite-field instruction latency. BLS12-381, Ampere sm_86, Nsight-profiled.

gpu cuda cuda-kernels number-theoretic-transform ntt zero-knowledge-proofs finite-field-arithmetic bls12-381 montgomery-multiplication nsight-compute

Updated Mar 16, 2026
Cuda

itm-unipi / Parallelized-Nearest-Neighbor-Upscaler

University project for the "Computer Architecture" course (MSc Computer Engineering, University of Pisa). Implementation of a parallelized nearest-neighbor upscaler using CUDA.

gpu nvidia nvidia-cuda nvidia-gpu nsight image-upscaling parallelized nearest-neighbor-algorithm nsight-compute

Updated Dec 29, 2023
C

Olajide-Badejo / GPU-Physics-Simulation

Real-time CUDA physics engine for N-body gravity, SPH fluids, and rigid-body collisions. Uses shared-memory tiling, kernel fusion, and spatial hashing on RTX 4080/4090.

cpp cuda physics-simulation sph fluid-simulation n-body nsight-systems nsight-compute

Updated Apr 15, 2026
Cuda

davesohamm / GPU-Benchmark

A comprehensive, hardware-agnostic GPU benchmarking suite that compares CUDA, OpenCL, and DirectCompute performance using identical workloads. Built from scratch with a professional architecture, extensive documentation, and a production-ready GUI.

benchmarking cmake opengl kernel cpp opencl imgui cuda hlsl gpu-computing low-level-programming nvcc directcompute cool-stuff cuda-programming fxc nsight-compute dgxi

Updated Jan 16, 2026
C++

singularitti / NsightCompute.jl

Julia tools for NVIDIA Nsight Compute

benchmarking tools julia profiling julia-package nvidia-gpu nsight-compute

Updated Apr 20, 2026
Julia

Liupeter01 / libHPC

libHPC is a high-performance computing library focused on Linux and Windows environments. It provides SIMD-optimized kernels, concurrent data structures, GPU utilities, and HPC-oriented memory management components.

c cpp hpc cuda intrinsics lock-free sparse-matrix memorypool ghost-cell-exchange nsight-systems nsight-compute ghost-cell

Updated Apr 22, 2026
C++

CyberKnight1803 / roofline-profiler

Roofline profiling for Deep Learning models

cuda pytorch nvidia-gpu nsight-compute roofline-profiler

Updated May 4, 2024
Python
