drop support for compute capability <= 7.0 for newer cuDNN versions #170
bedroge wants to merge 1 commit into EESSI:main
Ultimately we could make the same kind of lookup table as for CUDA. I initially started working on it, but it's a lot of work, and as mentioned, it's not really clear what is supported and what is not. We could also consider a simpler lookup table with just the minimum and maximum supported CCs per X.YZ version? But then again, https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html says that CC 12.1 is not supported, while the binaries do seem to indicate that it is, so it's all very confusing and unclear...
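For comparison, the simpler min+max lookup table floated in this comment could look roughly like the sketch below. It is only a sketch under assumptions: the table `CUDNN_CC_RANGE` and the helpers `cc_to_tuple` and `unsupported_ccs` are hypothetical, and the version ranges are placeholders, not values taken from NVIDIA's support matrix. Like the diff, it strips suffix letters such as the `f` in `12.0f` before comparing.

```python
import re

# HYPOTHETICAL lookup table: cuDNN "X.YZ" version -> (lowest CC, highest CC).
# These ranges are placeholders for illustration, NOT taken from NVIDIA's support matrix.
CUDNN_CC_RANGE = {
    "9.5": ("5.0", "9.0"),
    "9.10": ("7.5", "12.0"),
}


def cc_to_tuple(cc):
    """Turn a CC string like '7.0', '9.0a' or '12.0f' into a comparable (major, minor) tuple."""
    major, minor = re.sub(r'[a-zA-Z]', '', cc).split('.')
    return int(major), int(minor)


def unsupported_ccs(cudnn_version, cuda_ccs, table=CUDNN_CC_RANGE):
    """Return the requested CCs that fall outside the (min, max) range for this cuDNN version."""
    # Reduce a full version such as '9.10.2.21' to its 'X.YZ' key
    key = '.'.join(cudnn_version.split('.')[:2])
    if key not in table:
        # Unknown version: don't block anything, rely on the sanity check instead
        return []
    lowest, highest = (cc_to_tuple(cc) for cc in table[key])
    return [cc for cc in cuda_ccs if not lowest <= cc_to_tuple(cc) <= highest]


print(unsupported_ccs("9.10.2.21", ["7.0", "8.0", "12.0f"]))  # ['7.0']
```

Note that a plain min/max range cannot express gaps such as the CC 12.1 case mentioned above, which is part of why this remains tricky.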
```python
cuda_ccs_string = re.sub(r'[a-zA-Z]', '', cuda_ccs_string).replace(',', '_')
# Also replace periods, those are not officially supported in environment variable names
var=f"EESSI_IGNORE_CUDNN_{cudnn_ver}_CC_{cuda_ccs_string}".replace('.', '_')
errmsg = f"EasyConfigs using cuDNN {cudnn_ver} or older are not supported for (all) requested Compute "
```
I think the "or older" in this error message is wrong: in your case the cuDNN is too new, not too old, right?
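The variable-name construction from the diff excerpt can be tried out in isolation. The sketch below wraps those lines in a hypothetical helper `cudnn_ignore_var_name`; the second helper, `ignore_requested`, assumes that setting the variable to any non-empty value means "skip the check", which is a guess about how the hook consumes it rather than something stated in this PR. The version string is illustrative.

```python
import os
import re


def cudnn_ignore_var_name(cudnn_ver, cuda_ccs_string):
    """Build the override variable name the same way as the diff excerpt (hypothetical helper)."""
    # Drop suffix letters (e.g. the 'f' in '12.0f') and separate CCs with '_' instead of ','
    ccs = re.sub(r'[a-zA-Z]', '', cuda_ccs_string).replace(',', '_')
    # Periods are not officially supported in environment variable names, so replace them too
    return f"EESSI_IGNORE_CUDNN_{cudnn_ver}_CC_{ccs}".replace('.', '_')


def ignore_requested(cudnn_ver, cuda_ccs_string):
    """ASSUMPTION: any non-empty value of the variable means the check should be skipped."""
    return bool(os.environ.get(cudnn_ignore_var_name(cudnn_ver, cuda_ccs_string)))


print(cudnn_ignore_var_name("9.10.2.21", "7.0,12.0f"))
# EESSI_IGNORE_CUDNN_9_10_2_21_CC_7_0_12_0
```

Because the final `.replace('.', '_')` runs over the whole name, the periods in the cuDNN version are converted as well, not just those in the CC list.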
My 2 cents:
I just feel like a lookup table is a lot of work to set up and maintain, while (according to the docs) the supported CCs don't change that often. Also, wouldn't the sanity check still catch unsupported CCs, as it did for CC 7.0 in EESSI/software-layer#1410? So whenever we run into this, we can mark those combinations as unsupported in the hooks (and, if there end up being too many combinations, replace the if statement with something else)?
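If the reactive approach suggested here outgrows a single if statement, one option is a small hand-maintained mapping that only records combinations the sanity check has actually caught. A minimal sketch, assuming a hypothetical `KNOWN_UNSUPPORTED` mapping and helper `known_unsupported_ccs`; the cuDNN version key is a placeholder, and only the CC 7.0 entry reflects the failure mentioned for EESSI/software-layer#1410.

```python
import re

# HYPOTHETICAL hand-maintained mapping: cuDNN 'X.YZ' version -> CCs found to be unsupported.
# Add an entry whenever the sanity check catches a new combination; the key is a placeholder.
KNOWN_UNSUPPORTED = {
    "9.10": {"7.0"},
}


def known_unsupported_ccs(cudnn_version, cuda_ccs):
    """Return requested CCs listed as unsupported for this cuDNN version (suffix letters ignored)."""
    # Reduce a full version such as '9.10.2.21' to its 'X.YZ' key
    key = '.'.join(cudnn_version.split('.')[:2])
    blocked = KNOWN_UNSUPPORTED.get(key, set())
    return [cc for cc in cuda_ccs if re.sub(r'[a-zA-Z]', '', cc) in blocked]


print(known_unsupported_ccs("9.10.2.21", ["7.0", "8.0"]))  # ['7.0']
```

Unlike a full lookup table, this never claims more than what has been observed to fail, which keeps the maintenance burden low.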
This one is a bit trickier than CUDA itself, as the list of supported compute capabilities in the docs (https://docs.nvidia.com/deeplearning/cudnn/backend/v9.19.0/reference/support-matrix.html) doesn't really match what running `cuobjdump` on the binaries shows. There also seem to be some gaps in the matrix, and I wonder whether that's really correct.
So for now I've chosen a simpler approach: check whether we're building with a newer cuDNN and compute capability <= 7.0, and in that case do the same thing as what @casparvl implemented for CUDA. To check whether cuDNN is used as a dependency, I've generalized Caspar's `get_cuda_version` into a `get_dependency_software_version` function.
Tested this locally with EESSI-extend and the cuDNN from EESSI/software-layer#1410 on a V100 (CC 7.0) and an RTX PRO 6000 (CC 12.0f), and got the expected result: on the RTX PRO 6000 I get a full cuDNN installation, while for the V100 I get the following output during the build:
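The approach described above boils down to two steps: look up the cuDNN dependency version, then check for a newer cuDNN combined with a CC <= 7.0. The sketch below illustrates this under assumptions and is not the PR's actual implementation: `ec.dependencies()` is assumed to return dicts with `name` and `version` keys (as we understand EasyBuild's `EasyConfig.dependencies()` to do), `needs_cc70_restriction`, `StubEC` and the other helpers are hypothetical, and the cut-off `NEWER_CUDNN` is a placeholder because the description does not say which cuDNN version counts as "newer".

```python
import re

# PLACEHOLDER cut-off: the PR only says "newer cuDNN versions", so (9, 10) is an assumption.
NEWER_CUDNN = (9, 10)


def get_dependency_software_version(ec, software_name):
    """Return the version of dependency `software_name` of easyconfig `ec`, or None if absent.

    Assumes ec.dependencies() yields dicts with 'name' and 'version' keys.
    """
    for dep in ec.dependencies():
        if dep['name'] == software_name:
            return dep['version']
    return None


def version_tuple(version):
    """'9.10.2.21' -> (9, 10, 2, 21); assumes a purely numeric dotted version."""
    return tuple(int(part) for part in version.split('.'))


def cc_tuple(cc):
    """'7.0' -> (7, 0), '12.0f' -> (12, 0): suffix letters are stripped, as in the diff."""
    major, minor = re.sub(r'[a-zA-Z]', '', cc).split('.')
    return int(major), int(minor)


def needs_cc70_restriction(ec, cuda_ccs):
    """True if `ec` depends on a 'newer' cuDNN and any requested CC is <= 7.0."""
    cudnn_version = get_dependency_software_version(ec, 'cuDNN')
    if cudnn_version is None or version_tuple(cudnn_version) < NEWER_CUDNN:
        return False
    return any(cc_tuple(cc) <= (7, 0) for cc in cuda_ccs)


class StubEC:
    """Minimal stand-in for an EasyConfig, for demonstration only."""

    def __init__(self, deps):
        self._deps = deps

    def dependencies(self):
        return self._deps


ec = StubEC([{'name': 'CUDA', 'version': '12.6.0'}, {'name': 'cuDNN', 'version': '9.10.2.21'}])
print(needs_cc70_restriction(ec, ['7.0']))    # True  (V100, CC 7.0)
print(needs_cc70_restriction(ec, ['12.0f']))  # False (RTX PRO 6000, CC 12.0f)
```

With a helper like this, the original `get_cuda_version(ec)` would correspond to `get_dependency_software_version(ec, 'CUDA')`. Comparing versions as integer tuples avoids the string-comparison trap where `'9.10'` sorts before `'9.5'`.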
and a module file that has: