RoCE SR-IOV Setup and Performance Study on vSphere 7.x
Description:
This is the second document in a series of technical guides. Here, we walk through the steps to enable RoCE SR-IOV on a dual-port Mellanox ConnectX-5 VPI adapter card in VMware vSphere 7.x. The steps span the BIOS, ESXi, and the vSphere Client, and end with a functionality test in the VM guest operating system. We also show how to use the vHPC toolkit, an open-source tool developed by VMware, to speed up the deployment of an HPC cluster in vSphere. Some of the steps draw on VMware documentation on configuring a VM to use SR-IOV devices and on NVIDIA documentation on setting up and configuring the firmware and driver of Mellanox ConnectX adapter cards in a vSphere environment. Finally, we present a performance study that uses five HPC applications across multiple vertical domains. We conclude that a virtual HPC cluster can perform nearly as well as a bare-metal HPC cluster while offering the advantages of virtualization with vSphere, such as increased IT agility, flexibility, scalability, and significant hardware cost savings. --Authored by Yuankun Fu