
Add NVLink P2P support for mixed NVLink/PCIe GPU topologies#18

Open
valdemardi wants to merge 1 commit into aikitoria:595.45.04-p2p from valdemardi:aikitoria-595.45.04-p2p

Conversation

@valdemardi

Hi @aikitoria

I created an NVLink-enabled version based on your 595.45.04 updated tinygrad driver. In my repository, I forked the Nvidia upstream repository from the 595.45.04 tag, applied most of the changes from your repository (excluding the README and install.sh), and then made the NVLink enabling changes and updated the README with some test results, which confirm that the driver works as expected.
Today I also created a commit against your repository with the changes, in case you or others might find this useful, given your repository's visibility. The version in this PR should work as a drop-in replacement for your version. If the system running this version has NVLink(s), the driver will prefer them where possible, and otherwise it will fall back to the BAR1 PCIe P2P approach.
I have tested this PR version only on a quad RTX 3090 system with two NVLink bridges (two NVLinked GPU pairs), and on that system it works as expected. I'd expect it to behave the same as your version on systems with no NVLinks, but I have not done any testing there.
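For anyone wanting to check what their own system looks like before trying this: the stock NVIDIA tooling (nothing specific to this PR) can show which GPU pairs are NVLinked and which only share a PCIe path.

```
# Show the GPU interconnect matrix. NV1/NV2/... entries indicate NVLink
# connections; PHB/PXB/PIX/SYS indicate PCIe or cross-socket paths.
nvidia-smi topo -m
```

On a system like the one described above you would expect NV-marked entries for the two bridged pairs and PCIe-type entries elsewhere.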

Cheers

@aikitoria
Owner

Cool! Sadly I don't have any 3090s anymore to test this change.

Including my cudaHostRegister change in your repo is pretty brave. It solves a particular edge case in my other project, where I wanted to register an enormous amount of memory for async copies that lives in 1G reserved pages, and is otherwise not tested much, although I haven't heard of it causing crashes for anyone else.

@naveline67

I have 2x 5090 and 2x 3090 NVLinked on Sapphire Rapids, let me try

@magikRUKKOLA

magikRUKKOLA commented Apr 1, 2026

I have tested this PR version only on a quad RTX 3090 system with two NVLink bridges (two NVLinked GPU pairs), and on that system it works as expected. I'd expect it to behave the same as your version on systems with no NVLinks, but I have not done any testing there.

You said you did what??

Are you saying the P2P and nvlink are working together?

Can you please publish your p2pBandwidthLatencyTest results? Does it really show ~100 GB/s bidirectional with P2P enabled? Are the latencies good as well?

I've got a lot of RTX 3090s and the two-slot NVLink bridges. I have to build the water-cooling loop first, so I had been delaying it, because there is no benefit in NVLink without P2P. So you're saying you have solved the issue and everything just works?

[EDIT]: Aha, the data is in your repo. Well, I have to test it then. :)

@valdemardi
Author

@magikRUKKOLA

Yup, as far as I can see, everything is working perfectly on my system. I've also run several stress tests with multiple instances of nccl-tests and p2pBandwidthLatencyTest running simultaneously, and I haven't seen any problems. The changes mainly revert changes made in the tinygrad and aikitoria versions to bring back the NVLink features, rather than adding much new code.
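In case anyone wants to reproduce the stress testing, a typical nccl-tests invocation looks roughly like this (paths assume the default nccl-tests build layout, a working NCCL/CUDA install, and a 4-GPU box; adjust the `-g` count and sizes for your system):

```
# Build and run the NCCL all-reduce benchmark across 4 GPUs,
# sweeping message sizes from 8 bytes to 256 MB (doubling each step)
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make -j
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 4
```

p2pBandwidthLatencyTest comes from the NVIDIA cuda-samples repository and is built the same way from its sample directory; running several of these loops concurrently is what I mean by stress testing here.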

I also have a minimized version in the mini-p2p branch, where the diff to the NVIDIA version is very small.

This stripped-down version also runs perfectly on my system, utilizes both NVLink and PCIe P2P, and is what I currently use as my daily driver. The only small drawback is that it requires setting some extra kernel options (NVIDIA NVreg dwords) to work, which the tinygrad/aikitoria versions set behind the scenes. On the other hand, it will be easier to keep up to date with the NVIDIA version. The required dword options for the mini-p2p version are: nvidia.NVreg_RegistryDwords="ForceP2P=0x11;RMPcieP2PType=0x1".
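If you'd rather apply those registry dwords persistently instead of on the kernel command line, a modprobe options file is one way to do it (the file name below is arbitrary; any `.conf` under `/etc/modprobe.d/` works):

```
# /etc/modprobe.d/nvidia-p2p.conf
options nvidia NVreg_RegistryDwords="ForceP2P=0x11;RMPcieP2PType=0x1"
```

Note that if the nvidia module is loaded from your initramfs, you may need to regenerate it (e.g. `update-initramfs -u` on Debian/Ubuntu, `dracut -f` on Fedora) for the option to take effect at boot.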

One thing I would also be interested in getting feedback on is whether this version still works properly as PCIe-only P2P with, for example, RTX 4090 or RTX 5090 cards. I would assume it does, and if so, I think the small diff against upstream plus the added NVLink capability makes it quite an attractive fork to maintain overall.

Please let me know how things go with your 3090 system.

@valdemardi
Author

I have 2x 5090 and 2x 3090 NVLinked on Sapphire Rapids, let me try

👍 Very interested to hear how this works out.
