emiliamacek/high-performance-x86

High-Performance x86-64 Systems

A collection of low-level hardware optimizations, custom C++ ABI linking, and SIMD vectorization written in x86-64 assembly.


This repository contains a series of projects focused on low-level software optimization. Rather than relying entirely on compiler-generated code, these modules drop down to hand-written assembly to work with hardware registers directly, avoid branch mispredictions, and manually interface with the C++ object model.


Repository Projects

  • Tech: SSE/AVX, SIMD, 64-bit Assembly
  • Description: Applies graphical filters directly to raw .bmp byte arrays by loading multiple pixels into 128-bit XMM registers. Includes a fast byte-array diffing algorithm and a 2D edge-detection gradient filter that processes adjacent pixel rows in the same pass.
  • Tech: Branchless Programming, Horizontal Reduction, Cache Optimization
  • Description: Numerical kernels that demonstrate SSE and AVX instructions.
    • Features a branchless AVX algorithm that uses a variable blend (vblendvps) in place of conditional jumps, so there are no branches to mispredict.
    • Includes an SSE horizontal reduction algorithm for finding the minimum and maximum of an array.
    • Contains a loop-unrolled matrix multiplication that uses an i-k-j loop order, so the inner loop walks memory sequentially and makes better use of the L1/L2 caches.
  • Tech: System V AMD64 ABI, Hardware String Ops, Carry-Flag Math
  • Description: A fully functional object-oriented BigInt class whose backend logic is written entirely in assembly. Demonstrates manual handling of the C++ this pointer, the hidden pointer used to return objects by value, and hardware-level arithmetic (ADC, SBB, MUL, DIV) across dynamically allocated heap memory.
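
To illustrate the carry-flag arithmetic the BigInt module relies on, here is a minimal portable C++ sketch of multi-limb addition. It is not the repository's code: the function name `add_limbs`, the little-endian 64-bit limb layout, and the use of `std::vector` are all assumptions made for illustration. The `carry` variable stands in for the CPU carry flag that an ADC chain would propagate from one limb to the next.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the repository): adds two numbers stored as
// little-endian arrays of 64-bit limbs, propagating the carry between limbs
// the way a chain of ADC instructions would.
std::vector<std::uint64_t> add_limbs(const std::vector<std::uint64_t>& a,
                                     const std::vector<std::uint64_t>& b) {
    const std::size_t n = a.size() > b.size() ? a.size() : b.size();
    std::vector<std::uint64_t> sum;
    sum.reserve(n + 1);

    std::uint64_t carry = 0;  // plays the role of the CPU carry flag
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint64_t x = i < a.size() ? a[i] : 0;
        const std::uint64_t y = i < b.size() ? b[i] : 0;

        const std::uint64_t partial = x + y;         // may wrap around
        const std::uint64_t c1 = partial < x;        // carry out of x + y
        const std::uint64_t limb = partial + carry;  // add the incoming carry
        const std::uint64_t c2 = limb < partial;     // carry out of that add

        sum.push_back(limb);
        carry = c1 | c2;  // both cannot be set for the same limb
    }
    if (carry != 0) {
        sum.push_back(1);  // the final carry becomes a new top limb
    }
    return sum;
}
```

For example, adding 1 to a single limb holding 0xFFFFFFFFFFFFFFFF yields the two limbs {0, 1}. In assembly the two comparisons disappear, because ADC consumes and produces the carry flag directly.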

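The i-k-j loop order mentioned for the matrix multiplication module can be sketched in plain C++ as follows. This is an illustrative sketch under stated assumptions, not the repository's assembly: the name `matmul_ikj`, the square row-major layout, and the `float` element type are hypothetical, and the loop unrolling is left out. With k in the middle, the inner loop reads one row of B and updates one row of C at consecutive addresses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): computes C = A * B for
// square n x n matrices stored in row-major order, using the i-k-j order.
std::vector<float> matmul_ikj(const std::vector<float>& a,
                              const std::vector<float>& b,
                              std::size_t n) {
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < n; ++k) {
            const float aik = a[i * n + k];  // reused across the whole j loop
            for (std::size_t j = 0; j < n; ++j) {
                // Row k of B and row i of C are both walked sequentially.
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
    return c;
}
```

In the more familiar i-j-k order, the inner loop instead strides down a column of B, touching a different cache line on almost every iteration once n is large.
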
System Requirements

To compile and run these projects, your environment must meet the following hardware and software requirements:

  • OS: Linux (Tested on Arch Linux)
  • Architecture: x86-64 Processor with SSE/AVX instruction set support
  • Assembler: nasm (Netwide Assembler)
  • Compiler: g++ (GCC C++ Compiler)
  • Dependencies: gcc-multilib (Required to compile the 32-bit module on a 64-bit host)
  • Build Tool: make

Environment & Build Instructions

All assembly is written in Intel syntax for NASM and linked with g++. Each project directory contains its own isolated Makefile.

To build and execute the test suites, navigate to the specific module and run make:

# Example: Running the SIMD Image Processing module
cd simd-image-processing
make
./compute_gradient-test
./diff-test
