Very simple question. I have access to a multi-node machine and I have to do some NCCL tests. In the readme it says
If CUDA is not installed in /usr/local/cuda, you may specify CUDA_HOME. Similarly, if NCCL is not installed in /usr, you may specify NCCL_HOME.
I can see that CUDA is installed, but (and here comes my question)
how can I know whether NCCL is installed, and where?
Other info
I have done
find /usr -name "libnccl.so*" 2>/dev/null
and I found this file. However, when I did
find /usr -name "nccl.h" 2>/dev/null
it was not found. Obviously, I could not build even the simplest program:
#include <stdio.h>
#include <nccl.h>

int main() {
    printf("NCCL version: %d\n", NCCL_VERSION_CODE);
    return 0;
}
(Btw, I think the OS is CentOS)
asked Nov 17, 2024 at 13:17 by KansaiRobot
1 Answer
It is likely you have the runtime:
sudo yum install -y libnccl
But not the development environment:
sudo yum install -y libnccl-devel
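To check which of the two is actually present before installing anything, one option is to query the package database. The sketch below assumes an RPM-based system (CentOS was only a guess in the question) and that the packages follow the libnccl / libnccl-devel naming used above; the helper name classify_nccl_packages is made up for illustration.

```shell
# Hypothetical helper: reads package names on stdin (one per line) and
# reports whether an NCCL runtime and an NCCL development package appear.
classify_nccl_packages() {
    local runtime=no devel=no pkg
    while IFS= read -r pkg; do
        case "$pkg" in
            libnccl-devel*)  devel=yes ;;
            libnccl-static*) ;;              # static library, not counted here
            libnccl*)        runtime=yes ;;
        esac
    done
    echo "runtime=$runtime devel=$devel"
}

# Usage on an RPM-based system (assumption: rpm is available):
#   rpm -qa | classify_nccl_packages
```

If this prints runtime=yes devel=no, that would match the symptom in the question: libnccl.so is found but nccl.h is not.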
As an alternative, since you have the HPC tag: most HPC clusters tend to keep their software under modules (Environment Modules or Lmod), and those are usually installed outside /usr. You can look with
module avail nccl
If it is there, you can load the module and should then have access to the development environment.
As for actually finding it: if it is in a module, the previous command will tell you, and you can check the modulefile to see whether a variable like NCCL_HOME is set, which might make things easier.
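A rough way to do that check is to filter the module description for lines that set variables or paths mentioning NCCL. This is only a sketch: the module name nccl and the helper name nccl_module_vars are assumptions, and the exact output format depends on whether the site uses Environment Modules or Lmod.

```shell
# Hypothetical helper: reads modulefile text (or `module show` output) on
# stdin and keeps only setenv / prepend-path lines that mention NCCL.
nccl_module_vars() {
    grep -iE '(setenv|prepend[-_]path)' | grep -i 'nccl'
}

# Usage (assumption: a module named "nccl" exists; module tools commonly
# write to stderr, hence the redirection):
#   module show nccl 2>&1 | nccl_module_vars
```

Any path printed next to a variable such as NCCL_HOME is a candidate for the directory that should contain include/nccl.h.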
You can also use ldconfig, which prints all the shared libraries cached by the system. This might work, but if the library doesn't show up it could be a false negative, since there are reasons other than not being installed for it to be missing from the cache.
ldconfig -p | grep libnccl
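If ldconfig does list the library, the installation prefix can be guessed from the path it prints. The sketch below assumes the usual prefix/lib64 (or prefix/lib) layout and simply strips two path components; the helper name nccl_home_candidates is made up, and on layouts with deeper library directories the result would need adjusting by hand.

```shell
# Hypothetical helper: reads `ldconfig -p` output on stdin and prints, for
# each libnccl entry, the parent of the directory that holds the library.
nccl_home_candidates() {
    awk '/libnccl\.so/ && / => / { print $NF }' | while IFS= read -r lib; do
        dirname "$(dirname "$lib")"
    done | sort -u
}

# Usage:
#   ldconfig -p | nccl_home_candidates
```

A printed prefix is only useful as NCCL_HOME for building if nccl.h also exists under its include directory, which is exactly what was missing in the question.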
Finally, specific to this case, you can run nvidia-smi if it is installed (and in your PATH) to confirm the driver version and the CUDA version it supports. Note that it does not report NCCL, so it will not tell you the NCCL version or location.
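One more rough check that needs no header: the NCCL shared library is often installed as a chain of symlinks ending in a file whose name carries the full version. The sketch below assumes that naming (for example libnccl.so.2.18.3, an illustrative value); the helper name nccl_version_from_path and the example path are assumptions.

```shell
# Hypothetical helper: extracts "major.minor.patch" from a path such as
# /usr/lib64/libnccl.so.2.18.3; prints nothing if the file name does not
# end in a full three-part version.
nccl_version_from_path() {
    basename "$1" |
        sed -n 's/^libnccl\.so\.\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)$/\1/p'
}

# Usage (assumption: the path is the one reported by find or ldconfig):
#   nccl_version_from_path "$(readlink -f /usr/lib64/libnccl.so.2)"
```

An empty result only means the file name did not encode a version, not that NCCL is absent.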