A collection of low-level hardware optimizations, custom C++ ABI linking, and SIMD vectorization written in x86-64 assembly.
This repository contains a series of projects focused on low-level software optimization. Rather than relying entirely on high-level compilers, these modules drop down to raw assembly to work directly with hardware registers, avoid branch mispredictions, and manually interface with the C++ object model.
- Tech: `SSE/AVX`, `SIMD`, `64-bit Assembly`
- Description: Applies graphical filters directly to raw `.bmp` byte arrays by loading multiple pixels into 128-bit `XMM` registers. Includes an ultra-fast byte-array diffing algorithm and a 2D edge-detection gradient filter that calculates distance vectors across adjacent pixel rows simultaneously.
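To give a feel for the byte-array diffing idea, here is a minimal C++ sketch using SSE2 intrinsics instead of hand-written assembly. The function name `abs_diff_bytes` and its signature are hypothetical and not taken from this repository; the sketch only assumes that a diff means the per-byte absolute difference of two equally sized buffers.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#include <cstddef>
#include <cstdint>

// Hypothetical helper: writes |a[i] - b[i]| into out[i] for n bytes.
void abs_diff_bytes(const std::uint8_t* a, const std::uint8_t* b,
                    std::uint8_t* out, std::size_t n) {
    std::size_t i = 0;
    // Main loop: 16 bytes per iteration in one 128-bit XMM register.
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Unsigned saturating subtraction clamps negative results to 0, so
        // one of the two differences is always 0 and OR-ing them gives |a - b|.
        __m128i d = _mm_or_si128(_mm_subs_epu8(va, vb), _mm_subs_epu8(vb, va));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), d);
    }
    // Scalar tail for the remaining n % 16 bytes.
    for (; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i]
                                                       : b[i] - a[i]);
    }
}
```

Unaligned loads are used so the sketch works on arbitrary buffers; an assembly version operating on aligned pixel rows could use aligned moves instead.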
- Tech: `Branchless Programming`, `Horizontal Reduction`, `Cache Optimization`
- Description: Pure mathematical optimizations demonstrating advanced CPU instruction sets.
  - Features a branchless AVX algorithm that uses variable blends (`vblendvps`) to select values per lane, avoiding branch-misprediction stalls.
  - Includes an SSE horizontal reduction algorithm for rapid array min/max discovery.
  - Contains a loop-unrolled matrix multiplication algorithm that uses an `i-k-j` loop order for better L1/L2 cache locality.
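The effect of the `i-k-j` loop order is easiest to see in a plain C++ sketch. The function `matmul_ikj` below is a hypothetical illustration, not the repository's assembly routine: it assumes square, row-major matrices and leaves out the loop unrolling.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns C = A * B for n x n row-major matrices.
std::vector<float> matmul_ikj(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t n) {
    std::vector<float> C(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            // A[i][k] is loaded once and reused across the whole inner loop.
            const float a = A[i * n + k];
            // The inner loop walks row k of B and row i of C contiguously,
            // so every cache line that is fetched gets fully used.
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
    return C;
}
```

With the more common `i-j-k` order, the inner loop would stride down a column of `B`, touching a new cache line on almost every iteration for large `n`.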
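The branchless selection and horizontal reduction ideas can be sketched in C++ with SSE intrinsics. Two caveats: the per-lane select below uses an SSE compare mask with AND/ANDNOT/OR as a stand-in for `vblendvps`, so that it compiles without AVX flags, and all function names (`select_greater`, `lane_max4`, `horizontal_min`, `array_min`) are hypothetical rather than taken from this repository.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cstddef>

// Per lane: yields a[i] where a[i] > b[i], otherwise b[i], with no branch.
// The compare produces an all-ones or all-zeros mask per lane; vblendvps
// performs the same selection in a single instruction.
__m128 select_greater(__m128 a, __m128 b) {
    __m128 mask = _mm_cmpgt_ps(a, b);
    return _mm_or_ps(_mm_and_ps(mask, a), _mm_andnot_ps(mask, b));
}

// Convenience wrapper over four floats, mainly for experimenting.
void lane_max4(const float* a, const float* b, float* out) {
    _mm_storeu_ps(out, select_greater(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

// Horizontal reduction: collapses four lanes to their minimum.
float horizontal_min(__m128 v) {
    // Swap neighbours: (v1, v0, v3, v2).
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    // Lanes 0-1 now hold min(v0, v1); lanes 2-3 hold min(v2, v3).
    __m128 m = _mm_min_ps(v, shuf);
    // Move the upper pair of m down to the lower lanes.
    shuf = _mm_movehl_ps(shuf, m);
    // Lane 0 = min(min(v0, v1), min(v2, v3)).
    m = _mm_min_ss(m, shuf);
    return _mm_cvtss_f32(m);
}

// Minimum of an array with n > 0 elements.
float array_min(const float* data, std::size_t n) {
    std::size_t i = 0;
    float result = data[0];
    if (n >= 4) {
        // Vertical pass: keep four running minima in one register.
        __m128 acc = _mm_loadu_ps(data);
        for (i = 4; i + 4 <= n; i += 4) {
            acc = _mm_min_ps(acc, _mm_loadu_ps(data + i));
        }
        result = horizontal_min(acc);
    }
    // Scalar tail for the leftover elements.
    for (; i < n; ++i) {
        if (data[i] < result) result = data[i];
    }
    return result;
}
```

The reduction runs the cheap vertical `min` across the whole array first and performs the shuffle-based horizontal step only once at the end.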
- Tech: `System V AMD64 ABI`, `Hardware String Ops`, `Carry-Flag Math`
- Description: A fully functional object-oriented `BigInt` class where the backend logic is written entirely in assembly. Demonstrates manual management of the C++ `this` pointer, hidden return object allocation, and hardware-level arithmetic (`ADC`, `SBB`, `MUL`, `DIV`) across dynamically allocated heap memory.
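The carry-flag arithmetic can be illustrated in portable C++. The sketch below assumes a little-endian vector of 64-bit limbs, which may differ from the actual `BigInt` layout, and the function name `add_limbs` is hypothetical. In assembly, the two comparisons that recover the carry would be replaced by the CPU carry flag and an `ADC` chain.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: adds two non-negative big integers stored as
// little-endian 64-bit limbs (limb 0 is the least significant).
std::vector<std::uint64_t> add_limbs(const std::vector<std::uint64_t>& a,
                                     const std::vector<std::uint64_t>& b) {
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    std::vector<std::uint64_t> sum;
    sum.reserve(n + 1);
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint64_t x = i < a.size() ? a[i] : 0;
        const std::uint64_t y = i < b.size() ? b[i] : 0;
        const std::uint64_t s = x + y;       // wraps modulo 2^64 on overflow
        const std::uint64_t c1 = s < x;      // 1 if x + y overflowed
        const std::uint64_t t = s + carry;   // add the incoming carry
        const std::uint64_t c2 = t < s;      // 1 if adding the carry overflowed
        sum.push_back(t);
        carry = c1 | c2;                     // at most one of the two is set
    }
    if (carry) sum.push_back(1);             // final carry becomes a new limb
    return sum;
}
```

For example, adding 1 to a two-limb value with all bits set ripples the carry through both limbs and appends a third limb equal to 1.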
To compile and run these projects, your environment must meet the following hardware and software requirements:
- OS: Linux (Tested on Arch Linux)
- Architecture: x86-64 Processor with SSE/AVX instruction set support
- Assembler: `nasm` (Netwide Assembler)
- Compiler: `g++` (GCC C++ Compiler)
- Dependencies: `gcc-multilib` (Required to compile the 32-bit module on a 64-bit host)
- Build Tool: `make`
All assembly is written in Intel syntax for NASM and linked with `g++`. Each project directory contains its own isolated Makefile.
To build and execute the test suites, navigate to the specific module and run `make`:

```bash
# Example: Running the SIMD Image Processing module
cd simd-image-processing
make
./compute_gradient-test
./diff-test
```