Tugbars Heptaskin (Tugbars)

Embedded Systems Engineer

SMC-square-with-CPMMH-Rejuvenation (Public)

A GPU-accelerated SMC² framework with Rao–Blackwellized inner filters and correlated PMMH rejuvenation, designed to amortize likelihood-based trial-and-error through massive parallelism.

Cuda
Flash-Attention-PTX-CUDA (Public)

Hand-written PTX flash attention kernel that reaches 136 TFLOPS FP16 (58% tensor core utilization) on an RTX 5080, matching Flash Attention 2 on an A100 without WGMMA, TMA, or datacenter hardware.

Cuda
Bootstrap-Particle-Filter-in-PTX (Public)

Bootstrap particle filter (BPF) in hand-written PTX, for educational purposes.

Cuda 1
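
For readers new to the technique, the bootstrap particle filter loop (propagate, weight, resample) can be sketched in a few lines of plain Python. This is a minimal sketch and not the repository's PTX code: the toy 1D linear-Gaussian model, the function name `bootstrap_particle_filter`, and all parameter names and defaults are assumptions chosen for illustration.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, a=0.9,
                              process_std=0.3, obs_std=0.5, seed=0):
    """Return posterior-mean state estimates for a toy 1D model.

    Assumed model (illustrative only, not taken from the repository):
        x_t = a * x_{t-1} + N(0, process_std^2)
        y_t = x_t + N(0, obs_std^2)
    """
    rng = random.Random(seed)
    # Initial particle cloud drawn from an assumed N(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate: sample each particle from the transition model.
        #    Using the transition as the proposal is what makes it "bootstrap".
        particles = [a * x + rng.gauss(0.0, process_std) for x in particles]

        # 2. Weight: Gaussian observation likelihood, computed in log space
        #    and shifted by the maximum to avoid underflow.
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        max_lw = max(log_w)
        weights = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]

        # 3. Estimate: weighted mean of the particle cloud.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))

        # 4. Resample: multinomial resampling proportional to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Each of the four steps is independent across particles except for the weight normalization and the resampling draw, which is why the algorithm maps naturally onto GPU threads.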
ICEEMDAN-MKL (Public)

High-performance ICEEMDAN implementation using Intel MKL. Header-only C++17, OpenMP-parallelized, ~11 ms at 2048 samples. Cubic/Akima splines, multiple processing modes (Standard/Finance/Scientific).

C++ · 2 stars · 1 fork
Savitzky-Golay-Filter (Public)

High-performance Savitzky-Golay filter in C: batch, streaming, and 2D image processing. Embedded-friendly with coefficient export for MCUs. MATLAB-validated.

C · 24 stars · 3 forks
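
To illustrate what a Savitzky-Golay filter computes, here is a minimal sketch of the classic 5-point quadratic smoother, which uses the standard integer coefficients (-3, 12, 17, 12, -3) with normalizer 35. It is not the repository's API: the names `SG5_COEFFS`, `SG5_NORM`, and `savgol5`, and the choice to leave the two samples at each edge unfiltered, are assumptions made for illustration.

```python
# Standard 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
# Keeping them as integers plus one shared normalizer is the kind of form
# that is convenient to hard-code on a microcontroller.
SG5_COEFFS = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol5(signal):
    """Smooth a sequence with the 5-point quadratic Savitzky-Golay kernel.

    Assumption: the first and last two samples are copied unchanged,
    since the centered window does not fit there.
    """
    out = [float(v) for v in signal]
    for i in range(2, len(signal) - 2):
        # Dot product of the window centered on sample i with the kernel.
        acc = sum(c * signal[i + k - 2] for k, c in enumerate(SG5_COEFFS))
        out[i] = acc / SG5_NORM
    return out
```

Because the kernel comes from a least-squares polynomial fit, it reproduces any quadratic signal exactly at interior points, for example `savgol5([i * i for i in range(10)])`, while attenuating isolated spikes.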
VectorFFT (Public)

VectorFFT is a vectorized, pure C FFT library optimized for x86 processors (AVX-512, AVX2, SSE2) with zero external dependencies. It implements mixed-radix algorithms for common sizes and Bluestein…

C · 4 stars · 1 fork