
Verify whether an image supports RDMA

Last updated: 2023.06.19 21:42:12

First published: 2022.06.24 19:31:20

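This article describes how to verify whether the current image supports RDMA. Following the steps below, you can check whether an image meets the requirements for using RDMA on each of the two instance types: V100 RDMA (ml.hpcg1v.21xlarge or ml.hpcg1ve.21xlarge) and A100 RDMA (ml.hpcpni2.28xlarge).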

Background

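V100 and A100 instances use different RDMA NIC hardware, and the cloud servers virtualize the V100 and A100 RDMA NICs in different ways. As a result, the required versions of the related libraries and packages inside the image differ slightly between the two instance types.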

Confirm the OS distribution

Note

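Installation commands may differ slightly between distributions. Most mainstream training container images are currently built on Ubuntu (the Ubuntu version referred to below is 20.04). This document will be updated as images based on other distributions become available.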

Run cat /etc/os-release inside the container. Example output:

root@iv-ybqs2pif757grbqpwubx:/workspace# cat /etc/os-release 
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
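The distribution check above decides whether the apt-based (Ubuntu) or yum-based (CentOS) steps below apply. A minimal sketch of automating that decision, assuming the `ID` and `ID_LIKE` fields of `/etc/os-release` are enough to classify the image; `detect_pkg_family` is a hypothetical helper name, not part of any tool:

```bash
# Hypothetical helper: print "apt" or "yum" depending on the distribution family.
detect_pkg_family() {
  local file="${1:-/etc/os-release}" ids
  # Source the file in a subshell so ID/ID_LIKE do not leak into the caller
  ids=$( . "$file" && echo "${ID:-} ${ID_LIKE:-}" )
  case " $ids " in
    *" ubuntu "*|*" debian "*)            echo apt ;;
    *" centos "*|*" rhel "*|*" fedora "*) echo yum ;;
    *)                                    echo unknown; return 1 ;;
  esac
}
```

With the Ubuntu 18.04 sample above, `detect_pkg_family` prints `apt`; with the CentOS 7 image used later it would print `yum`.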
Configure the test environment by instance type

V100 RDMA

Ubuntu

  1. Run the following command to install the test package.
apt update && apt install -y infiniband-diags
  2. Use the ibstatus command to check the NIC rate. In this example the NIC mlx5_1 has a rate of 100 Gb/sec, which is expected for the V100 RDMA instance type.
# ibstatus
Infiniband device 'mlx5_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0216:3eff:fe5a:2a70
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            25 Gb/sec (1X EDR)
        link_layer:      Ethernet

Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:0216:3fff:fe0e:db1b
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet
  3. Run the following command to check whether the RDMA-related libraries are installed.
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1

Example output:

# dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
// Output below
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                               Version                        Architecture                   Description
+++-==================================================-==============================-==============================-==========================================================================================================
ii  ibverbs-providers:amd64                            17.1-1ubuntu0.2                amd64                          User space provider drivers for libibverbs
ii  libibverbs1:amd64                                  17.1-1ubuntu0.2                amd64                          Library for direct userspace use of RDMA (InfiniBand/iWARP)
ii  libnl-3-200:amd64                                  3.2.29-0ubuntu3                amd64                          library for dealing with netlink sockets
ii  libnl-route-3-200:amd64                            3.2.29-0ubuntu3                amd64                          library for dealing with netlink sockets - route interface
dpkg-query: no packages found matching perftest
dpkg-query: no packages found matching libibumad3
dpkg-query: no packages found matching librdmacm1

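The output above lists both installed packages (such as ibverbs-providers:amd64 and libibverbs1:amd64) and packages that are not installed (such as perftest and libibumad3).
If any package is missing, continue with the following steps. Otherwise, the installed tools can already be used to verify whether the current image supports RDMA.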
  4. Run the following command:

apt update && apt install -y perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
  5. Run the following command to check the package status again.
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1

Example output:

# dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                               Version                        Architecture                   Description
+++-==================================================-==============================-==============================-==========================================================================================================
ii  ibverbs-providers:amd64                            28.0-1ubuntu1                  amd64                          User space provider drivers for libibverbs
ii  libibumad3:amd64                                   28.0-1ubuntu1                  amd64                          InfiniBand Userspace Management Datagram (uMAD) library
ii  libibverbs1:amd64                                  28.0-1ubuntu1                  amd64                          Library for direct userspace use of RDMA (InfiniBand/iWARP)
ii  libnl-3-200:amd64                                  3.4.0-1                        amd64                          library for dealing with netlink sockets
ii  libnl-route-3-200:amd64                            3.4.0-1                        amd64                          library for dealing with netlink sockets - route interface
ii  librdmacm1:amd64                                   28.0-1ubuntu1                  amd64                          Library for managing RDMA connections
ii  perftest                                           4.4+0.5-1                      amd64                          Infiniband verbs performance tests

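If no "dpkg-query: no packages found matching" error appears, the packages are ready to use; the version numbers do not need to match this example.
  6. If the NCCL version is lower than 2.12, you can try installing the nccl-rdma-sharp-plugins plugin to enable GDR (GPUDirect RDMA). Without GDR, performance drops by roughly 10%: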

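# Build dependencies; libibverbs-dev and libibverbs1 are pinned to matching versions (28.0-1ubuntu1 in this example)
apt update && apt install -y automake autoconf libtool libibverbs-dev=28.0-1ubuntu1 libibverbs1=28.0-1ubuntu1
# Build the plugin, install it under /usr/local/nccl-rdma-sharp-plugins, then remove the sources
cd /tmp \
 && git clone https://github.com/Mellanox/nccl-rdma-sharp-plugins.git \
 && cd nccl-rdma-sharp-plugins \
 && ./autogen.sh \
 && ./configure --prefix=/usr/local/nccl-rdma-sharp-plugins --with-cuda=/usr/local/cuda \
 && make && make install \
 && cd /tmp && rm -rf /tmp/nccl-rdma-sharp-plugins

# Make the plugin library visible to NCCL at runtime
export LD_LIBRARY_PATH="/usr/local/nccl-rdma-sharp-plugins/lib:${LD_LIBRARY_PATH}"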
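The V100 RDMA Ubuntu checks above can be scripted. The sketch below is one possible way to do it, assuming the `ibstatus` and `dpkg -l` output formats shown in the samples and GNU `sort -V` for version comparison. All function names are hypothetical, and the `nccl.h` location used for step 6 varies between images:

```bash
# Hypothetical helpers for the V100 RDMA / Ubuntu checks (names are illustrative).

# Summarize `ibstatus` output (stdin) as "<device> <rate Gb/sec> <state>" per port.
ibstatus_rates() {
  awk '
    /^Infiniband device/  { dev = $3; gsub(/[^A-Za-z0-9_]/, "", dev) }  # keep mlx5_1, drop the quotes
    /^[[:space:]]*state:/ { state = $3 }                                # "state: 4: ACTIVE"
    /^[[:space:]]*rate:/  { print dev, $2, state }                      # "rate: 100 Gb/sec (4X EDR)"
  '
}

# Succeed only if <device> is ACTIVE and reports at least <min_gbps>.
require_rate() {
  local dev="$1" min="$2"
  ibstatus_rates | awk -v d="$dev" -v m="$min" '
    $1 == d && $3 == "ACTIVE" && $2 + 0 >= m + 0 { ok = 1 }
    END { exit (ok ? 0 : 1) }
  '
}

# Print packages that `dpkg -l ... 2>&1` reports as missing or not fully installed ("ii").
missing_from_dpkg_list() {
  awk '
    /^dpkg-query: no packages found matching/ { print $NF; next }
    /^[a-z][a-zA-Z][a-zA-Z ] / && $1 != "ii"  { sub(/:.*/, "", $2); print $2 }
  '
}

# Succeed if version $1 is strictly lower than version $2 (relies on GNU sort -V).
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Read MAJOR.MINOR.PATCH from an nccl.h header file.
nccl_header_version() {
  awk '
    $1 == "#define" && $2 == "NCCL_MAJOR" { ma = $3 }
    $1 == "#define" && $2 == "NCCL_MINOR" { mi = $3 }
    $1 == "#define" && $2 == "NCCL_PATCH" { pa = $3 }
    END { print ma "." mi "." pa }
  ' "$1"
}
```

For example, `ibstatus | require_rate mlx5_1 100 && echo 'rate OK'` covers step 2; `dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1 2>&1 | missing_from_dpkg_list` lists what step 4 still has to install (the `2>&1` is needed because `dpkg-query` writes the "no packages found" lines to stderr); and `version_lt "$(nccl_header_version /usr/include/nccl.h)" 2.12` tells you whether step 6 applies, assuming NCCL's header is at that path.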

CentOS

  1. This example uses the CentOS 7.9.2009 image from Docker Hub (Image Layer Details - centos:centos7.9.2009 | Docker Hub). Run cat /etc/os-release in the container; sample output:
[root@ncggrd8mrsfegjm28qvqg /]# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Then run the following command to install the test package:

yum install -y infiniband-diags

Note: Because CentOS 8 has moved to CentOS Stream 8, running the command above in a CentOS 8 image may fail with the following error:

# yum install -y infiniband-diags
Failed to set locale, defaulting to C.UTF-8
CentOS Linux 8 - AppStream                                                                                                                                                                           89  B/s |  38  B     00:00    
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist

In that case, first run the following two commands, which point the repositories at the vault mirror, and then run yum install -y infiniband-diags again.

sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-*
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-*
  2. Use the ibstatus command to check the NIC rate. In this example the NIC mlx5_1 has a rate of 100 Gb/sec, which is expected for the V100 RDMA instance type.
# ibstatus
Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:d069:89ff:fe00:e864
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet
  3. Run the following command to check whether the RDMA-related libraries are installed:
rpm -q perftest libibumad libibverbs libnl3 librdmacm

Example output:

# rpm -q perftest libibumad libibverbs libnl3 librdmacm
// Output below
package perftest is not installed
libibumad-22.4-6.el7_9.x86_64
package libibverbs is not installed
package libnl3 is not installed
package librdmacm is not installed

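The output above lists both installed packages (such as libibumad) and packages that are not installed (such as perftest and libibverbs). If any package is missing, continue with the following steps. Otherwise, the installed tools can already be used to verify whether the current image supports RDMA.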

  4. Run the following command:
yum install -y perftest libibumad libibverbs libnl3 librdmacm
  5. Run the following command to check the package status again.
rpm -q perftest libibumad libibverbs libnl3 librdmacm

Example output:

# rpm -q perftest libibumad libibverbs libnl3 librdmacm
perftest-4.2-2.el7.x86_64
libibumad-22.4-6.el7_9.x86_64
libibverbs-22.4-6.el7_9.x86_64
libnl3-3.2.28-4.el7.x86_64
librdmacm-22.4-6.el7_9.x86_64

If no "package ... is not installed" message appears, the packages are ready to use; the version numbers do not need to match this example.
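The CentOS steps above can be wrapped in a similar way. This sketch assumes GNU `sed` and the `rpm -q` message format shown in the samples, and applies the vault workaround only when `VERSION_ID` is `8`, since the note above concerns CentOS 8 images. Function names are hypothetical:

```bash
# Hypothetical helpers for the CentOS steps (names are illustrative).

# Apply the CentOS 8 vault workaround only when os-release reports VERSION_ID 8,
# so the sed commands are never run against a CentOS 7 image.
fix_centos8_repos() {
  local osrel="${1:-/etc/os-release}" repodir="${2:-/etc/yum.repos.d}" ver
  ver=$( . "$osrel" && echo "${VERSION_ID:-}" )
  [ "$ver" = "8" ] || return 0
  sed -i 's/mirrorlist/#mirrorlist/g' "$repodir"/CentOS-Linux-*
  sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' "$repodir"/CentOS-Linux-*
}

# Print package names from `rpm -q` lines such as "package perftest is not installed".
missing_from_rpm_q() {
  awk '/^package .* is not installed$/ { print $2 }'
}

# Query the given packages and install only the missing ones.
install_missing_rpms() {
  local missing
  missing=$(rpm -q "$@" | missing_from_rpm_q)
  if [ -n "$missing" ]; then
    yum install -y $missing   # unquoted on purpose: one package name per word
  else
    echo "all packages are installed"
  fi
}
```

For example, `fix_centos8_repos && yum install -y infiniband-diags`, then `install_missing_rpms perftest libibumad libibverbs libnl3 librdmacm` in place of steps 3 to 5.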

A100 RDMA

Ubuntu

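  1. To check the NIC rate, follow steps 1 and 2 of the V100 RDMA section. On the A100 RDMA instance type, the NIC rate should be 200 Gb/sec.
  2. Run the following command to confirm the Ubuntu version: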
lsb_release -c

Example output:

# lsb_release -c
// Output below
Codename:       bionic

The Codename values map to Ubuntu versions as follows:

Codename    Version
bionic      18.04
focal       20.04
impish      21.10
jammy       22.04
  3. Run the following command to check whether the RDMA-related libraries are installed.
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1

Example output:

# dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
// Output below
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                               Version                        Architecture                   Description
+++-==================================================-==============================-==============================-==========================================================================================================
ii  ibverbs-providers:amd64                            17.1-1ubuntu0.2                amd64                          User space provider drivers for libibverbs
ii  libibverbs1:amd64                                  17.1-1ubuntu0.2                amd64                          Library for direct userspace use of RDMA (InfiniBand/iWARP)
ii  libnl-3-200:amd64                                  3.2.29-0ubuntu3                amd64                          library for dealing with netlink sockets
ii  libnl-route-3-200:amd64                            3.2.29-0ubuntu3                amd64                          library for dealing with netlink sockets - route interface
dpkg-query: no packages found matching perftest
dpkg-query: no packages found matching libibumad3
dpkg-query: no packages found matching librdmacm1

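The output above lists both installed packages (such as ibverbs-providers:amd64 and libibverbs1:amd64) and packages that are not installed (such as perftest and libibumad3).
If any package is missing, or the major version (the number before the first dot) of ibverbs-providers:amd64 or libibverbs1:amd64 is lower than 23, continue with the following steps. Otherwise, skip to step 8 and compare the output; if it looks correct, the image is ready to use.
  4. If the Ubuntu version obtained in step 2 is lower than 20.04, start from step 5. If it is 20.04 or later, run the following command directly and then skip to step 8.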

apt update && apt install -y perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
  5. Add deb http://mirrors.ivolces.com/ubuntu/ focal main universe to /etc/apt/sources.list, or simply run the following command (this only needs to be done once):
echo "deb http://mirrors.ivolces.com/ubuntu/ focal main universe" >> /etc/apt/sources.list
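  6. Add APT::Default-Release "Codename"; to /etc/apt/apt.conf.d/01-vendor-ubuntu, replacing Codename with the codename obtained in step 2. For example, on 18.04 (bionic), run the following command (this only needs to be done once):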
echo "APT::Default-Release \"bionic\";" >> /etc/apt/apt.conf.d/01-vendor-ubuntu
  7. Run the following command:
apt update && apt install -t focal perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
  8. Run the following command to check the package status again.
dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1

Example output:

# dpkg -l perftest ibverbs-providers libibumad3 libibverbs1 libnl-3-200 libnl-route-3-200 librdmacm1
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                               Version                        Architecture                   Description
+++-==================================================-==============================-==============================-==========================================================================================================
ii  ibverbs-providers:amd64                            28.0-1ubuntu1                  amd64                          User space provider drivers for libibverbs
ii  libibumad3:amd64                                   28.0-1ubuntu1                  amd64                          InfiniBand Userspace Management Datagram (uMAD) library
ii  libibverbs1:amd64                                  28.0-1ubuntu1                  amd64                          Library for direct userspace use of RDMA (InfiniBand/iWARP)
ii  libnl-3-200:amd64                                  3.4.0-1                        amd64                          library for dealing with netlink sockets
ii  libnl-route-3-200:amd64                            3.4.0-1                        amd64                          library for dealing with netlink sockets - route interface
ii  librdmacm1:amd64                                   28.0-1ubuntu1                  amd64                          Library for managing RDMA connections
ii  perftest                                           4.4+0.5-1                      amd64                          Infiniband verbs performance tests

Check the versions of ibverbs-providers:amd64, libibumad3:amd64, libibverbs1:amd64 and librdmacm1:amd64. In this example they are 28.0-1ubuntu1; any major version of 23 or higher is sufficient.
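The decisions in steps 2, 4 and 8 and the file edits in steps 5 and 6 can be sketched as shell functions. This assumes GNU `sort -V`, the `dpkg -l` column layout shown above, and that the version threshold refers to the major version (the number before the first dot). Function names are hypothetical:

```bash
# Hypothetical helpers for the A100 RDMA / Ubuntu steps (names are illustrative).

# Step 2: map an Ubuntu codename to its version (same table as above).
codename_to_version() {
  case "$1" in
    bionic) echo 18.04 ;;
    focal)  echo 20.04 ;;
    impish) echo 21.10 ;;
    jammy)  echo 22.04 ;;
    *)      echo unknown; return 1 ;;
  esac
}

# Step 4: "pin-focal" for versions below 20.04 (steps 5-7), "direct" otherwise.
install_path_for() {
  local ver="$1"
  if [ "$ver" != "20.04" ] && [ "$(printf '%s\n20.04\n' "$ver" | sort -V | head -n 1)" = "$ver" ]; then
    echo pin-focal
  else
    echo direct
  fi
}

# Steps 5-6: append a line only if it is not already present, so re-runs stay idempotent.
append_once() {
  grep -qxF -- "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

configure_focal_pin() {
  local codename="$1"
  local sources="${2:-/etc/apt/sources.list}"
  local aptconf="${3:-/etc/apt/apt.conf.d/01-vendor-ubuntu}"
  append_once "deb http://mirrors.ivolces.com/ubuntu/ focal main universe" "$sources"
  append_once "APT::Default-Release \"$codename\";" "$aptconf"
}

# Step 8: check the major version of the rdma-core packages in `dpkg -l` output.
check_rdma_core_major() {
  awk -v min="${1:-23}" '
    $1 == "ii" && $2 ~ /^(ibverbs-providers|libibumad3|libibverbs1|librdmacm1)(:|$)/ {
      split($3, v, ".")   # "28.0-1ubuntu1" -> v[1] = "28"
      status = (v[1] + 0 >= min + 0) ? "OK" : "TOO_OLD"
      print $2, $3, status
    }
  '
}
```

A typical sequence: `ver=$(codename_to_version "$(lsb_release -cs)")`; if `install_path_for "$ver"` prints `pin-focal`, run `configure_focal_pin "$(lsb_release -cs)"` before the step 7 command; finally `dpkg -l ibverbs-providers libibumad3 libibverbs1 librdmacm1 | check_rdma_core_major 23`. Appending only missing lines keeps the configuration files clean if the steps are re-run.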

CentOS

  1. This example uses the CentOS 7.9.2009 image from Docker Hub (Image Layer Details - centos:centos7.9.2009 | Docker Hub). Run cat /etc/os-release in the container; sample output:
[root@ncggrd8mrsfegjm28qvqg /]# cat /etc/os-release 
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Then run the following command to install the test package:

yum install -y infiniband-diags

Note: Because CentOS 8 has moved to CentOS Stream 8, running the command above in a CentOS 8 image may fail with the following error:

# yum install -y infiniband-diags
Failed to set locale, defaulting to C.UTF-8
CentOS Linux 8 - AppStream                                                                                                                                                                           89  B/s |  38  B     00:00    
Error: Failed to download metadata for repo 'appstream': Cannot prepare internal mirrorlist: No URLs in mirrorlist

In that case, first run the following two commands, which point the repositories at the vault mirror, and then run yum install -y infiniband-diags again.

sed -i 's/mirrorlist/#mirrorlist/g' /etc/yum.repos.d/CentOS-Linux-*
sed -i 's|#baseurl=http://mirror.centos.org|baseurl=http://vault.epel.cloud|g' /etc/yum.repos.d/CentOS-Linux-*
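  2. Use the ibstatus command to check the NIC rate. On the A100 RDMA instance type the rate should be 200 Gb/sec; the sample output below, which shows 100 Gb/sec on mlx5_1, was captured on a V100 RDMA instance.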
# ibstatus
Infiniband device 'mlx5_1' port 1 status:
        default gid:     fe80:0000:0000:0000:d069:89ff:fe00:e864
        base lid:        0x0
        sm lid:          0x0
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            100 Gb/sec (4X EDR)
        link_layer:      Ethernet
  3. Run the following command to check whether the RDMA-related libraries are installed:
rpm -q perftest libibumad libibverbs libnl3 librdmacm

Example output:

# rpm -q perftest libibumad libibverbs libnl3 librdmacm
// Output below
package perftest is not installed
libibumad-22.4-6.el7_9.x86_64
package libibverbs is not installed
package libnl3 is not installed
package librdmacm is not installed

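The output above lists both installed packages (such as libibumad) and packages that are not installed (such as perftest and libibverbs). If any package is missing, continue with the following steps. Otherwise, the installed tools can already be used to verify whether the current image supports RDMA.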

  4. Run the following command:
yum install -y perftest libibumad libibverbs libnl3 librdmacm
  5. Run the following command to check the package status again:
rpm -q perftest libibumad libibverbs libnl3 librdmacm

Example output:

# rpm -q perftest libibumad libibverbs libnl3 librdmacm
perftest-4.2-2.el7.x86_64
libibumad-22.4-6.el7_9.x86_64
libibverbs-22.4-6.el7_9.x86_64
libnl3-3.2.28-4.el7.x86_64
librdmacm-22.4-6.el7_9.x86_64

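If no "package ... is not installed" message appears, the packages are ready to use. The version numbers do not need to match this example. Taking libibverbs-22.4-6.el7_9.x86_64 as an example, the version is 22.4-6.el7_9 (x86_64 is the architecture), and its major version only needs to be at least 22.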
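For the CentOS version check, asking `rpm` for the `VERSION` tag avoids splitting `name-version-release.arch` strings by hand. A minimal sketch, assuming input produced by `rpm -q --qf '%{NAME} %{VERSION}\n' ...`; `check_rpm_major` is a hypothetical name and 22 is the threshold stated above:

```bash
# Hypothetical helper: read "<name> <version>" lines and compare the major version with a minimum.
check_rpm_major() {
  awk -v min="${1:-22}" '
    NF == 2 {                  # skips "package X is not installed" lines (5 fields)
      split($2, v, ".")        # "22.4" -> v[1] = "22"
      status = (v[1] + 0 >= min + 0) ? "OK" : "TOO_OLD"
      print $1, $2, status
    }
  '
}
```

For example: `rpm -q --qf '%{NAME} %{VERSION}\n' libibumad libibverbs librdmacm | check_rpm_major 22`. Lines for missing packages are skipped, so still check for "is not installed" messages as described above.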

Verify RDMA support

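After configuring the environment as described above, verify the image with the following steps. The steps are the same for the V100 RDMA and A100 RDMA instance types.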

Verification on a single machine

  1. Run the following command:
ib_write_bw -d mlx5_1 &

Example output:

# ib_write_bw -d mlx5_1 &
[1] 104777
root@iv-ybrf933mwd8rx7gs2na5:/workspace# 
************************************
* Waiting for client to connect... *
************************************
  2. On the same machine, run the following command:
ib_write_bw -d mlx5_1 127.0.0.1 --report_gbits

Example output:

# ib_write_bw -d mlx5_1 127.0.0.1 --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 2
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x090a PSN 0x723cbf RKey 0x082200 VAddr 0x007f67e0c4c000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : Ethernet
 GID index       : 2
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x090b PSN 0xe78073 RKey 0x082300 VAddr 0x007fd7b287f000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
 remote address: LID 0000 QPN 0x090b PSN 0xe78073 RKey 0x082300 VAddr 0x007fd7b287f000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
 remote address: LID 0000 QPN 0x090a PSN 0x723cbf RKey 0x082200 VAddr 0x007f67e0c4c000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:198:18:06:59
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      5000             93.90              93.67              0.178661
---------------------------------------------------------------------------------------
 65536      5000             93.90              93.67              0.178661
---------------------------------------------------------------------------------------

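For the V100 RDMA instance type, the bandwidth values (BW peak and BW average) should be close to 100 Gb/sec; for A100 RDMA, they should be close to 200 Gb/sec. If they are, the configuration is correct. If there is no output or an error is reported, go back to the section on configuring the test environment by instance type and check whether any step was missed.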
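The two single-machine steps and the bandwidth comparison can be sketched as follows. This assumes the `--report_gbits` output layout shown above and treats "close to" as at least 90% of the nominal rate; that 90% figure is an assumption for illustration, not a documented acceptance limit. Function names are hypothetical:

```bash
# Hypothetical helpers for the single-machine check (names are illustrative).

# Start the ib_write_bw server in the background, then run the client against 127.0.0.1.
run_loopback() {
  local dev="${1:-mlx5_1}" server
  ib_write_bw -d "$dev" >/dev/null 2>&1 &
  server=$!
  sleep 1                      # give the server a moment to start listening
  ib_write_bw -d "$dev" 127.0.0.1 --report_gbits
  wait "$server"
}

# Read ib_write_bw output (Gb/sec) and compare "BW average" with ratio * nominal rate.
# Exit codes: 0 = pass, 1 = below threshold, 2 = no result rows found.
check_bw() {
  awk -v nominal="$1" -v ratio="${2:-0.9}" '
    # Result rows have five numeric columns: bytes, iterations, BW peak, BW average, MsgRate
    NF == 5 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { avg = $4; found = 1 }
    END {
      if (!found) { print "no result rows found"; exit 2 }
      limit = nominal * ratio
      if (avg + 0 >= limit) { print "PASS", avg, limit; exit 0 }
      print "FAIL", avg, limit
      exit 1
    }
  '
}
```

For example, `run_loopback mlx5_1 | check_bw 100` on a V100 RDMA instance, or `check_bw 200` on A100 RDMA. The fixed `sleep 1` is a simple but crude way to wait for the server; increase it if the client starts too early.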

Verification across multiple machines

  1. On machine A, run the following command:
ib_write_bw -d mlx5_1 -x 3

Example output:

# ib_write_bw -d mlx5_1 -x 3
************************************
* Waiting for client to connect... *
************************************
  2. On machine B, run the following command, replacing <MACHINE_A_HOST> with the IP address of machine A's RDMA interface.
ib_write_bw -d mlx5_1 -x 3 <MACHINE_A_HOST> --report_gbits
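A small wrapper can keep the server and client commands consistent across the two machines. In this sketch the device and GID index default to the values used above (`mlx5_1`, `-x 3`) and can be overridden through the hypothetical environment variables `RDMA_DEV` and `RDMA_GID`; `rdma_pair_test` is likewise a hypothetical name:

```bash
# Hypothetical wrapper for the two-machine test: "server" on machine A, "client <host>" on machine B.
rdma_pair_test() {
  local role="$1" peer="${2:-}" dev="${RDMA_DEV:-mlx5_1}" gid="${RDMA_GID:-3}"
  case "$role" in
    server)
      ib_write_bw -d "$dev" -x "$gid" ;;
    client)
      if [ -z "$peer" ]; then
        echo "usage: rdma_pair_test client <MACHINE_A_HOST>" >&2
        return 2
      fi
      ib_write_bw -d "$dev" -x "$gid" "$peer" --report_gbits ;;
    *)
      echo "usage: rdma_pair_test server | client <MACHINE_A_HOST>" >&2
      return 2 ;;
  esac
}
```

Run `rdma_pair_test server` on machine A first, then `rdma_pair_test client <MACHINE_A_HOST>` on machine B, and compare the reported bandwidth with the expected rate for the instance type.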