💭 Open to work

MansfieldPlumbing/README.md

Focus

Windows-native inference infrastructure. The work focuses on removing the overhead between ML models and the hardware they run on: zero-copy memory transports, KMDF drivers, TensorRT engine construction, and PCIe fabric management.

Most of the stack that makes this possible is invisible by design.


Projects

DirectPort-SDK NT kernel object-based GPU IPC. Processes share VRAM through NT named handles and D3D12 fences. The architecture is a hardware-synchronized push: the producer signals a shared fence that the consumer's GPU queue waits on, rather than a CPU semaphore, so the consumer unblocks at PCIe crossbar latency (~170 ns) instead of going through the OS scheduler (1–15 ms). D3D11 lacks an API to resolve NT handle names directly, so the SDK keeps a single D3D12 device alive purely as a name resolver and lets D3D11 own the actual resource. No polling, no copies.
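
The fence protocol can be modeled on the CPU to show the value rules it relies on. This is a minimal sketch, not the SDK's code: `MonotonicFence` is a hypothetical name, and the condition variable stands in for what the SDK does on the GPU (a consumer queue wait on a shared `ID3D12Fence`/`ID3D11Fence`), so it deliberately does not reproduce the latency figures above.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical CPU-side model of a shared monotonic fence (not the SDK's API).
// The producer signals N once frame N is fully written; a consumer waiting for N
// unblocks as soon as the fence reaches N. In the SDK the wait happens on a GPU
// queue; the condition variable here only stands in for that.
class MonotonicFence {
public:
    // Producer side: publish that all work up to `value` is complete.
    void Signal(std::uint64_t value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (value > completed_) completed_ = value;  // fence values only move forward
        }
        cv_.notify_all();
    }

    // Consumer side: block until the fence has reached at least `value`.
    void WaitFor(std::uint64_t value) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return completed_ >= value; });
    }

    std::uint64_t Completed() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return completed_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::uint64_t completed_ = 0;
};
```

The property that matters is monotonicity: a consumer waiting for value N can never be released by an older signal, so no per-frame reset or handshake is needed.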

DirectPort-Legacy Adapter layer that converts DirectPort's push model into a pull interface for applications that can't be modified. The transport primitive underneath is unchanged — the adapter absorbs the impedance mismatch at the boundary, not inside the pipeline.
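
One way to picture the push-to-pull boundary is a single-slot "latest frame" mailbox. The sketch below is an assumption about the shape of such an adapter, not DirectPort-Legacy's implementation; `FrameRef`, `LatestFrameMailbox`, and the ring-slot field are hypothetical. The push side overwrites the slot, and the unmodified application polls it at its own pace.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>

// Hypothetical reference to a published frame.
struct FrameRef {
    std::uint64_t fenceValue;  // fence value the frame was published at (assumed to start at 1)
    std::uint32_t slotIndex;   // which shared texture in a ring holds the frame
};

// Hypothetical push-to-pull adapter for a single pull-based consumer:
// the push path overwrites one slot, the consumer polls for anything newer.
class LatestFrameMailbox {
public:
    // Called from the push path each time a new frame is published.
    void Push(const FrameRef& frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = frame;  // older, unread frames are dropped, never queued
    }

    // Called by the legacy consumer; returns a frame only if it is newer
    // than the last one this consumer received.
    std::optional<FrameRef> TryPull() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!latest_ || latest_->fenceValue <= lastPulled_) return std::nullopt;
        lastPulled_ = latest_->fenceValue;
        return latest_;
    }

private:
    std::mutex mutex_;
    std::optional<FrameRef> latest_;
    std::uint64_t lastPulled_ = 0;
};
```

Dropping stale frames instead of queuing them means the producer never waits on the slowest reader, which is the usual trade-off for live video.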

VirtuaCam Multi-process zero-copy GPU video broker with a Media Foundation COM source. Producer applications share D3D11 textures and fences via NT handles; a central broker multiplexes feeds from multiple producers into a composited output (single source or PIP grid) and delivers frames into the Media Foundation pipeline as a system-registered virtual camera. All inter-process frame transfers stay on the GPU. WASAPI loopback capture included.
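
The README does not say how the PIP grid is arranged. Here is a minimal sketch of one common layout, assuming a near-square grid filled row by row; `Rect` and `GridLayout` are hypothetical names, and a single feed maps to the full frame (the single-source case).

```cpp
#include <cmath>
#include <vector>

struct Rect { int x, y, width, height; };

// Hypothetical helper: split a W x H output into a near-square grid with one
// tile per producer feed, filled row by row. Integer division may leave a few
// pixels unused at the right and bottom edges.
std::vector<Rect> GridLayout(int feeds, int outWidth, int outHeight) {
    std::vector<Rect> tiles;
    if (feeds <= 0) return tiles;
    int cols = static_cast<int>(std::ceil(std::sqrt(static_cast<double>(feeds))));
    int rows = (feeds + cols - 1) / cols;  // just enough rows for all feeds
    int tileW = outWidth / cols;
    int tileH = outHeight / rows;
    for (int i = 0; i < feeds; ++i) {
        tiles.push_back({(i % cols) * tileW, (i / cols) * tileH, tileW, tileH});
    }
    return tiles;
}
```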

RIFE_TRT RIFE 4.9 frame interpolation on TensorRT. 2x/4x/8x frame-rate multiplication at ~28 ms per frame pair on an RTX 3090. Zero-copy in-memory pipeline: a C# unsafe Parallel.For performs real-time CHW transposition from packed RGB, a C++ DLL drives the asynchronous CUDA execution context, and audio is stream-copied in a single FFmpeg mux pass at the end. No Python at runtime.
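
For an N-times multiplier, N - 1 frames are synthesized between each pair of source frames. Here is a minimal sketch of the timestep schedule, assuming the model takes an explicit timestep input (RIFE 4.x models support arbitrary timesteps). Whether RIFE_TRT does this or bisects recursively is not stated, and `InterpolationTimesteps` is a hypothetical name.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: timesteps for an N-times frame-rate multiplier,
// t = k / N for k = 1 .. N-1. For powers of two this is the same set that
// recursive bisection (0.5, then 0.25 and 0.75, ...) would produce.
std::vector<float> InterpolationTimesteps(int multiplier) {
    if (multiplier < 2) throw std::invalid_argument("multiplier must be >= 2");
    std::vector<float> t;
    for (int k = 1; k < multiplier; ++k) {
        t.push_back(static_cast<float>(k) / static_cast<float>(multiplier));
    }
    return t;
}
```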

Depth_TRT Depth Anything V2 on TensorRT. C# NativeAOT orchestrator with unsafe Parallel.For + LockBits for real-time CHW tensor transposition. ImageNet normalization baked into the unmanaged C++ inference bridge. No Python at runtime.
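
The transposition and normalization described for Depth_TRT amount to one pass from interleaved pixels to planar, normalized floats. The sketch below does both in C++ for brevity, whereas the README splits them between C# and the C++ bridge; `PackedToChwImageNet` is a hypothetical name. The mean and standard deviation are the standard ImageNet constants. The `bgr` flag reflects that GDI+ `Format24bppRgb` buffers store bytes in B, G, R order; which order Depth_TRT actually reads is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: interleaved 8-bit, 3-channel image -> planar CHW float tensor
// with ImageNet normalization, (x / 255 - mean[c]) / std[c], output in R, G, B plane order.
// `strideBytes` is the row pitch, which may include padding (as with LockBits buffers).
std::vector<float> PackedToChwImageNet(const std::uint8_t* src, int width, int height,
                                       int strideBytes, bool bgr) {
    static const float kMean[3] = {0.485f, 0.456f, 0.406f};
    static const float kStd[3]  = {0.229f, 0.224f, 0.225f};
    const std::size_t plane = static_cast<std::size_t>(width) * height;
    std::vector<float> out(3 * plane);
    for (int y = 0; y < height; ++y) {  // rows write disjoint outputs: the unit to parallelize
        const std::uint8_t* row = src + static_cast<std::size_t>(y) * strideBytes;
        for (int x = 0; x < width; ++x) {
            const std::uint8_t* px = row + 3 * x;
            for (int c = 0; c < 3; ++c) {      // c = output channel (R, G, B)
                int srcC = bgr ? 2 - c : c;    // map to the byte order in memory
                float v = px[srcC] / 255.0f;
                out[c * plane + static_cast<std::size_t>(y) * width + x] = (v - kMean[c]) / kStd[c];
            }
        }
    }
    return out;
}
```

The outer row loop is the natural unit to hand to `Parallel.For`, since each row writes a disjoint slice of every output plane.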

Demucs_v4_TRT HTDemucs v4 on TensorRT. STFT/ISTFT are internalized in the traced graph to preserve the dual-path time/frequency architecture and achieve full kernel fusion across both branches. ~5 seconds end-to-end on an RTX 3090 for a 3-minute track. No Python at runtime. Published on HuggingFace.
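
The README does not describe how STFT/ISTFT were brought into the graph. One graph-friendly formulation, offered here as an assumption rather than the project's method, treats the STFT as framing followed by dot products with fixed cosine and sine basis vectors, which is exactly what a strided 1-D convolution with DFT kernels computes. `StftMagnitude` is a hypothetical name. It returns magnitudes for brevity; a model that needs phase for the ISTFT path would keep the real and imaginary parts as separate channels.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: magnitude STFT as framing + dot products with fixed DFT bases
// (periodic Hann window, no padding). frames = 1 + (len - nFft) / hop, bins = nFft/2 + 1.
std::vector<std::vector<double>> StftMagnitude(const std::vector<double>& x,
                                               std::size_t nFft, std::size_t hop) {
    const double kPi = 3.14159265358979323846;
    std::vector<std::vector<double>> frames;
    if (nFft == 0 || hop == 0 || x.size() < nFft) return frames;
    const std::size_t numFrames = 1 + (x.size() - nFft) / hop;
    const std::size_t bins = nFft / 2 + 1;
    for (std::size_t f = 0; f < numFrames; ++f) {
        std::vector<double> mag(bins);
        for (std::size_t k = 0; k < bins; ++k) {
            double re = 0.0, im = 0.0;
            for (std::size_t n = 0; n < nFft; ++n) {
                double w = 0.5 - 0.5 * std::cos(2.0 * kPi * n / nFft);  // periodic Hann window
                double s = x[f * hop + n] * w;
                double phase = 2.0 * kPi * k * n / nFft;                // fixed basis: a conv kernel tap
                re += s * std::cos(phase);
                im -= s * std::sin(phase);
            }
            mag[k] = std::sqrt(re * re + im * im);
        }
        frames.push_back(mag);
    }
    return frames;
}
```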

v340l-windows-enablement Custom KMDF driver and userspace daemon to activate the dual-die AMD Radeon Pro V340L on Windows. The card requires Microsemi Switchtec PCIe fabric initialization and a software SR-IOV mailbox implementation before the GPU silicon responds. No prior Windows activation of this card exists. Hardware validation in progress.
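
The driver internals are not described here. To illustrate the kind of low-level work involved, the sketch below locates a capability such as SR-IOV (extended capability ID 0x0010) by walking the PCIe extended capability list in a raw configuration-space dump. It is not the project's KMDF code, `FindExtendedCapability` is a hypothetical name, and reading config space on Windows is out of scope.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: walk the PCIe extended capability list (starting at 0x100) in a
// 4 KiB config-space dump and return the offset of `capId`, or 0 if absent.
// Header dword (little-endian): bits 15:0 = capability ID, 19:16 = version,
// 31:20 = offset of the next capability (0 terminates the list).
std::uint16_t FindExtendedCapability(const std::vector<std::uint8_t>& cfg, std::uint16_t capId) {
    std::size_t offset = 0x100;
    for (int guard = 0; guard < 1024 && offset >= 0x100 && offset + 4 <= cfg.size(); ++guard) {
        std::uint32_t header = static_cast<std::uint32_t>(cfg[offset]) |
                               (static_cast<std::uint32_t>(cfg[offset + 1]) << 8) |
                               (static_cast<std::uint32_t>(cfg[offset + 2]) << 16) |
                               (static_cast<std::uint32_t>(cfg[offset + 3]) << 24);
        if (header == 0 || header == 0xFFFFFFFFu) return 0;  // empty list or no device
        if ((header & 0xFFFFu) == capId) return static_cast<std::uint16_t>(offset);
        offset = (header >> 20) & 0xFFCu;                    // next pointer, dword aligned
    }
    return 0;  // list ended, ran off the buffer, or looped
}
```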


HuggingFace · YouTube

Pinned

  1. DirectPort-Legacy (Public)

    An OS-native plumbing layer for establishing a direct, real-time VRAM conduit between previously siloed GPU applications.

    C++ 1

  2. Demucs_v4_TRT (Public)

    A native, high-performance Demucs v4 implementation with internalized STFT/ISTFT operations for 5-second end-to-end TensorRT inference.

    PowerShell

  3. DirectPort-SDK (Public)

    A low-latency, Windows NT-based IPC memory transport primitive enabling non-blocking, GPU-to-GPU data sharing. It utilizes a hardware-synchronized push architecture—bridged to legacy pull systems v…

    C++

  4. VirtuaCam (Public)

    Lightweight, modern virtual camera software for Windows 10/11 (build 22000 and higher)

    C++ 1 1

  5. Depth_TRT (Public)

    High-performance Depth Anything V2 implementation using TensorRT. Features an unsafe C# memory pipeline for zero-copy tensor transposition and native Windows inference—bypassing Python entirely fo…

    PowerShell

  6. v340l-windows-enablement (Public)

    Research and custom driver implementation to enable the dual-die AMD Radeon Pro V340L MxGPU on Windows 10/11 and Server.

    C