
Add NVLink P2P support for mixed NVLink/PCIe GPU topologies#18

Open
valdemardi wants to merge 1 commit into aikitoria:595.45.04-p2p from valdemardi:aikitoria-595.45.04-p2p

Conversation

@valdemardi

Hi @aikitoria

I created an NVLink-enabled version based on your 595.45.04 updated tinygrad driver. In my repository, I forked the Nvidia upstream repository from the 595.45.04 tag, applied most of the changes from your repository (excluding the README and install.sh), and then made the NVLink enabling changes and updated the README with some test results, which confirm that the driver works as expected.
Today I also created a commit against your repository with the changes, in case you or others might find this useful, given your repository's visibility. The version in this PR should work as a drop-in replacement for your version. If the system running this version has NVLink(s), the driver will prefer them where possible, and otherwise it will fall back to the BAR1 PCIe P2P approach.
I have tested this PR version only on a quad RTX 3090 system with two NVLink bridges (two NVLinked GPU pairs), and on that system it works as expected. I'd expect it to behave the same as your version on systems with no NVLinks, but I have not done any testing there.
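For anyone wanting to check what their own system looks like before trying this: the stock NVIDIA tooling (nothing specific to this PR) can show which GPU pairs are NVLinked and which only share a PCIe path.

```
# Show the GPU interconnect matrix. NV1/NV2/... entries indicate NVLink
# connections; PHB/PXB/PIX/SYS indicate PCIe or cross-socket paths.
nvidia-smi topo -m
```

On a system like the one described above you would expect NV-marked entries for the two bridged pairs and PCIe-type entries elsewhere.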

Cheers

@aikitoria
Owner

Cool! Sadly I don't have any 3090s anymore to test this change.

Including my cudaHostRegister change in your repo is pretty brave. It solves a particular edge case in my other project, where I wanted to register an enormous amount of memory for async copies that lives in 1G reserved pages, and is otherwise not tested much, although I haven't heard of it causing crashes for anyone else.

@naveline67

I have 2x 5090 and 2x 3090 NVLinked on Sapphire Rapids, let me try

@magikRUKKOLA

magikRUKKOLA commented Apr 1, 2026

I have tested this PR version only on a quad RTX 3090 system with two NVLink bridges (two NVLinked GPU pairs), and on that system it works as expected. I'd expect it to behave the same as your version on systems with no NVLinks, but I have not done any testing there.

You said you did what??

Are you saying the P2P and nvlink are working together?

Can you please publish your p2pBandwidthLatencyTest results? Does it really show ~100 GB/s bidirectional with P2P enabled? Are the latencies good as well?

I've got a lot of RTX 3090s and the two-slot NVLink bridges. I have to build the water-cooling loop first, so I had been delaying it, because there is no benefit in NVLink without P2P. So you're saying you have solved the issue and everything just works?

[EDIT]: Aha, the data is in your repo. Well, I have to test it then. :)

@valdemardi
Author

@magikRUKKOLA

Yup, as far as I can see, everything is working perfectly on my system. I've also run several stress tests with multiple instances of nccl-tests and p2pBandwidthLatencyTest running simultaneously, and I haven't seen any problems. The changes mainly revert changes made in the tinygrad and aikitoria versions to bring back the NVLink features, rather than adding much new code.
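In case anyone wants to reproduce the stress testing, a typical nccl-tests invocation looks roughly like this (paths assume the default nccl-tests build layout, a working NCCL/CUDA install, and a 4-GPU box; adjust the `-g` count and sizes for your system):

```
# Build and run the NCCL all-reduce benchmark across 4 GPUs,
# sweeping message sizes from 8 bytes to 256 MB (doubling each step)
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make -j
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 4
```

p2pBandwidthLatencyTest comes from the NVIDIA cuda-samples repository and is built the same way from its sample directory; running several of these loops concurrently is what I mean by stress testing here.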

I also have a minimized version in the mini-p2p branch, where the diff to the NVIDIA version is very small.

This stripped-down version also runs perfectly on my system, utilizes both NVLink and PCIe P2P, and is what I currently use as my daily driver. The only small drawback is that it requires setting some extra kernel options (NVIDIA NVreg dwords) to work, which the tinygrad/aikitoria versions set behind the scenes. On the other hand, it will be easier to keep up to date with the NVIDIA version. The required dword options for the mini-p2p version are: nvidia.NVreg_RegistryDwords="ForceP2P=0x11;RMPcieP2PType=0x1".
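If you'd rather apply those registry dwords persistently instead of on the kernel command line, a modprobe options file is one way to do it (the file name below is arbitrary; any `.conf` under `/etc/modprobe.d/` works):

```
# /etc/modprobe.d/nvidia-p2p.conf
options nvidia NVreg_RegistryDwords="ForceP2P=0x11;RMPcieP2PType=0x1"
```

Note that if the nvidia module is loaded from your initramfs, you may need to regenerate it (e.g. `update-initramfs -u` on Debian/Ubuntu, `dracut -f` on Fedora) for the option to take effect at boot.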

One thing I would also be interested in getting feedback on is whether this version still works properly as PCIe-only P2P with, for example, RTX 4090 or RTX 5090 cards. I would assume it does, and if so, I think the small diff against upstream plus the added NVLink capability makes it quite an attractive fork to maintain overall.

Please let me know how things go with your 3090 system.

@valdemardi
Author

I have 2x 5090 and 2x 3090 NVLinked on Sapphire Rapids, let me try

👍 Very interested to hear how this works out.
