Install the NVIDIA-Fabric Manager package
Last updated: 2024.05.06 16:48:44  First published: 2021.09.08 15:34:49

Scenario

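The NVIDIA-Fabric Manager service allows multiple A100/A800 GPUs to communicate with each other through NVSwitch. For more information about NVSwitch, see the NVIDIA official website.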

Note

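  • For the instance types equipped with A100/A800 GPUs, see Instance types. If the NVIDIA-Fabric Manager service that matches the GPU driver version is not installed, you cannot use these GPU instances properly.
  • When you upgrade the GPU driver on an instance equipped with A100/A800 GPUs, you must also upgrade Fabric Manager to the matching version. Otherwise, the instance cannot work properly. See How do I upgrade the NVIDIA Tesla driver?
  • The public images provided by Volcano Engine have the NVIDIA-Fabric Manager and devel packages preinstalled. You only need to start NVIDIA-Fabric Manager to enable NVSwitch interconnection.
  • If you use a custom image that does not include NVIDIA-Fabric Manager and purchase a GPU instance equipped with multiple A100/A800 GPUs, you must install the NVIDIA-Fabric Manager package that matches the GPU driver version.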

Step 1: Install NVIDIA-Fabric Manager

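You can install the NVIDIA-Fabric Manager service in either of two ways: from an installation package or from the NVIDIA package repository. The following example uses GPU driver version 470.57.02 to show how to install and start the NVIDIA-Fabric Manager service. To install another version, replace the version number in the commands with your GPU driver version. To check the GPU driver version, run the nvidia-smi command.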
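A minimal sketch for reading the driver version from a script, assuming `nvidia-smi` is on the PATH and all GPUs run the same driver. The helper name `driver_branch` is hypothetical; it extracts the major branch (for example `470`) that appears in the Ubuntu/Debian package names below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major driver branch from a full version string,
# e.g. "470.57.02" -> "470".
driver_branch() {
  local version="$1"
  printf '%s\n' "${version%%.*}"
}

# Query the driver version only when nvidia-smi is available.
# --query-gpu=driver_version prints one line per GPU; keep the first one.
if command -v nvidia-smi >/dev/null 2>&1; then
  DRIVER_VERSION="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  echo "Driver version: ${DRIVER_VERSION}"
  echo "Driver branch:  $(driver_branch "${DRIVER_VERSION}")"
fi
```

Substitute the printed version for 470.57.02 in the commands that follow.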

Method 1: Install from an installation package

  • CentOS 8.x

    wget https://developer.download.nvidia.cn/compute/cuda/repos/rhel8/x86_64/nvidia-fabric-manager-470.57.02-1.x86_64.rpm
    rpm -ivh nvidia-fabric-manager-470.57.02-1.x86_64.rpm
    
  • CentOS 7.x

    wget https://developer.download.nvidia.cn/compute/cuda/repos/rhel7/x86_64/nvidia-fabric-manager-470.57.02-1.x86_64.rpm
    rpm -ivh nvidia-fabric-manager-470.57.02-1.x86_64.rpm
    
  • Ubuntu 20.04

    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    dpkg -i nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    
  • Ubuntu 18.04

    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    dpkg -i nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    
  • Debian 10, veLinux 1.0

    wget https://developer.download.nvidia.cn/compute/cuda/repos/debian10/x86_64/nvidia-fabricmanager-470_470.57.02-1_amd64.deb
    dpkg -i nvidia-fabricmanager-470_470.57.02-1_amd64.deb
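The package URLs above follow one pattern per distribution. The sketch below assembles the download URL from a repository directory name and a driver version, so a version other than 470.57.02 only needs to be changed in one place. It assumes the package release suffix is `-1`, as in the examples; a different driver version may use a different release number, so check the repository listing. The function name `fm_package_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Base URL of the NVIDIA CUDA repository mirror used in the commands above.
BASE_URL="https://developer.download.nvidia.cn/compute/cuda/repos"

# Hypothetical helper: print the Fabric Manager package URL.
#   $1: repository directory (rhel8, rhel7, ubuntu2004, ubuntu1804, debian10)
#   $2: full driver version, e.g. 470.57.02
# Assumes the package release number is "-1", as in the examples above.
fm_package_url() {
  local repo="$1" version="$2"
  local branch="${version%%.*}"
  case "$repo" in
    rhel7|rhel8)
      # RPM naming: nvidia-fabric-manager-<version>-1.x86_64.rpm
      printf '%s/%s/x86_64/nvidia-fabric-manager-%s-1.x86_64.rpm\n' \
        "$BASE_URL" "$repo" "$version" ;;
    ubuntu1804|ubuntu2004|debian10)
      # DEB naming: nvidia-fabricmanager-<branch>_<version>-1_amd64.deb
      printf '%s/%s/x86_64/nvidia-fabricmanager-%s_%s-1_amd64.deb\n' \
        "$BASE_URL" "$repo" "$branch" "$version" ;;
    *)
      echo "unsupported repository: $repo" >&2
      return 1 ;;
  esac
}
```

For example, `fm_package_url debian10 470.57.02` prints the same URL as the Debian 10/veLinux 1.0 command above. Pass the result to `wget`, then install the downloaded file with `rpm -ivh` or `dpkg -i`.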
    

Method 2: Install from the package repository

  • CentOS 8.x

    dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
    dnf module enable -y nvidia-driver:470
    dnf install -y nvidia-fabric-manager-0:470.57.02-1
    
  • CentOS 7.x

    yum -y install yum-utils 
    yum-config-manager --add-repo https://developer.download.nvidia.cn/compute/cuda/repos/rhel7/x86_64/cuda-rhel7.repo
    yum install -y nvidia-fabric-manager-470.57.02-1
    
  • Ubuntu 20.04

    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
    mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
    apt-key add 7fa2af80.pub
    rm 7fa2af80.pub
    echo "deb http://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
    apt-get update
    apt-get -y install nvidia-fabricmanager-470=470.57.02-1
    
  • Ubuntu 18.04

    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
    mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    apt-key add 7fa2af80.pub
    rm 7fa2af80.pub
    echo "deb http://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 /" | tee /etc/apt/sources.list.d/cuda.list
    apt-get update
    apt-get -y install nvidia-fabricmanager-470=470.57.02-1
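Because Fabric Manager must match the GPU driver version, it can help to compare the two after installing by either method. A minimal sketch, assuming the package names used above (`nvidia-fabric-manager` on CentOS, `nvidia-fabricmanager-<branch>` on Ubuntu/Debian). The helpers `fm_matches_driver` and `installed_fm_version` are hypothetical; the comparison ignores the package release suffix (for example `-1`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when a package version matches the driver version.
# The package release suffix (for example "-1") is ignored.
#   $1: driver version, e.g. 470.57.02
#   $2: package version, e.g. 470.57.02-1 (dpkg) or 470.57.02 (rpm %{VERSION})
fm_matches_driver() {
  [ -n "$2" ] && [ "${2%%-*}" = "$1" ]
}

# Hypothetical helper: print the installed Fabric Manager version.
#   $1: driver branch, e.g. 470 (part of the Ubuntu/Debian package name)
installed_fm_version() {
  if command -v rpm >/dev/null 2>&1 && rpm -q nvidia-fabric-manager >/dev/null 2>&1; then
    rpm -q --queryformat '%{VERSION}\n' nvidia-fabric-manager
  elif command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Version}\n' "nvidia-fabricmanager-$1" 2>/dev/null
  fi
}

# Compare the versions only on a machine where the GPU driver is installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  fm="$(installed_fm_version "${driver%%.*}")"
  if fm_matches_driver "$driver" "$fm"; then
    echo "OK: Fabric Manager ${fm} matches driver ${driver}"
  else
    echo "Mismatch: driver ${driver}, Fabric Manager '${fm}'" >&2
  fi
fi
```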
    

Step 2: Install NVIDIA-Fabric Manager-devel

  • CentOS 7.x/8.x
    yum install nvidia-fabric-manager-devel-470.57.02-1 -y

  • Ubuntu 20.04/18.04, Debian 10, veLinux 1.0 (download the devel .deb package to the current directory first)
    dpkg -i nvidia-fabric-manager-devel-470.57.02-1_amd64.deb

Step 3: Start NVIDIA-Fabric Manager

  1. Run the following command to start the Fabric Manager service.
    sudo systemctl start nvidia-fabricmanager

  2. Run the following command to check whether the Fabric Manager service has started. If the output shows active (running), the service started successfully.
    sudo systemctl status nvidia-fabricmanager

  3. Run the following command to configure the Fabric Manager service to start automatically when the instance boots.
    sudo systemctl enable nvidia-fabricmanager
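The three commands above can be combined in a script. `systemctl enable --now` enables the unit and starts it in one call, and `systemctl is-active --quiet` exits with success only when the unit is active, which is easier to test in a script than the `status` output. A sketch under these assumptions: the retry helper `wait_until`, the wrapper `start_fabricmanager`, and the 10 attempts at 2-second intervals are hypothetical choices, not values from this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it succeeds.
#   $1: maximum number of attempts
#   $2: seconds to sleep between attempts
#   $3...: the command to run
wait_until() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  return 1
}

# Hypothetical wrapper: start Fabric Manager, enable it at boot,
# and wait until systemd reports the unit as active.
start_fabricmanager() {
  sudo systemctl enable --now nvidia-fabricmanager || return 1
  if wait_until 10 2 systemctl is-active --quiet nvidia-fabricmanager; then
    echo "nvidia-fabricmanager is active"
  else
    # Show the unit status to help diagnose why it did not start.
    sudo systemctl status nvidia-fabricmanager --no-pager
    return 1
  fi
}
```

Call `start_fabricmanager` on the instance after Steps 1 and 2. A non-zero exit status means the service did not become active within the retry window.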