VMware ESX Server 2.0

ESX Server 2.0 includes multipathing support to maintain a constant connection between the server machine and the storage device in case of the failure of a host bus adapter (HBA), switch, storage controller (or storage processor; abbreviated as SP in the following diagram), or a Fibre Channel cable. Unlike previous versions of ESX Server, this version of multipathing support does not require specific failover drivers.

In the preceding diagram, there are multiple, redundant paths from each server to the storage device. For example, if HBA1, or the link between HBA1 and the Fibre Channel (FC) switch, breaks, HBA2 takes over and provides the connection between the server and the switch. This process is called HBA failover.

Similarly, if SP1, or the link between SP1 and the switch, breaks, SP2 takes over and provides the connection between the switch and the storage device. This process is called SP failover.

VMware ESX Server 2.0 provides both HBA and SP failover with its multipathing feature. (SP failover may not be supported by all disk arrays.) For information on supported SAN hardware, download the VMware ESX Server SAN Compatibility List from the VMware Web site at www.vmware.com/support/esx2.

You can view the current multipathing state by examining the proc entries for each of your LUNs. Each LUN has one proc entry, identified by the LUN's canonical name. The canonical name for a LUN is the first path ESX Server finds to the LUN. Since ESX Server begins its scans at the first HBA and the lowest device number, the first path (and also the LUN's canonical name) is the path with the lowest-numbered HBA and the lowest device number. For example, if the paths to a LUN are vmhba0:0:2, vmhba1:0:2, vmhba0:1:2 and vmhba1:1:2, then the LUN's canonical name is vmhba0:0:2.
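The canonical-name rule can be illustrated with a small shell sketch. This is not an ESX Server tool: canonical_name and proc_entry are hypothetical helper names, and the sort order (HBA number first, then the two remaining numbers) is an assumption inferred from the scan order described above. The proc path layout follows the /proc/vmware/scsi/<adapter>/<entry> pattern used by the commands later in this section.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (not part of ESX Server).

# canonical_name PATH...
# Prints the path with the lowest HBA number, breaking ties on the
# second and then the third number. ASSUMPTION: this ordering mirrors
# the scan order (first HBA, lowest device number) described in the text.
canonical_name() {
    printf '%s\n' "$@" |
        sed 's/^vmhba//' |
        sort -t: -k1,1n -k2,2n -k3,3n |
        head -n 1 |
        sed 's/^/vmhba/'
}

# proc_entry CANONICAL_NAME
# Maps a canonical name such as vmhba0:0:2 to its proc entry,
# following the /proc/vmware/scsi/<adapter>/<entry> pattern.
proc_entry() {
    adapter=${1%%:*}   # text before the first colon, e.g. vmhba0
    entry=${1#*:}      # text after the first colon, e.g. 0:2
    printf '/proc/vmware/scsi/%s/%s\n' "$adapter" "$entry"
}
```

With the four paths from the example, canonical_name vmhba0:0:2 vmhba1:0:2 vmhba0:1:2 vmhba1:1:2 prints vmhba0:0:2, and proc_entry vmhba0:0:2 prints /proc/vmware/scsi/vmhba0/0:2.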
You can specify the default policy for the multipathing feature. There are two policies:
Note: You can select a different policy for each LUN.

You can use the proc command to disable and enable paths, set the active path, and set the preferred path, as illustrated in the following examples.

Type the following command to disallow the specified path to the LUN.

    echo "pathoff <path>" > /proc/vmware/scsi/<adapter>/<entry>

In this example, you are changing the status of path vmhba1:0:1 to off in the proc entry for LUN vmhba0:0:1.

    echo "pathoff vmhba1:0:1" > /proc/vmware/scsi/vmhba0/0:1

Type the following command to enable the specified path to the LUN.

    echo "pathon <path>" > /proc/vmware/scsi/<adapter>/<entry>

In this example, you are changing the status of path vmhba1:0:1 to on in the proc entry for LUN vmhba0:0:1.

    echo "pathon vmhba1:0:1" > /proc/vmware/scsi/vmhba0/0:1

Type the following command to set the specified path as the preferred path to the LUN.

    echo "preferred <path>" > /proc/vmware/scsi/<adapter>/<entry>

In this example, you are making path vmhba1:0:1 the preferred path (indicated by #) in the proc entry for LUN vmhba0:0:1.

    echo "preferred vmhba1:0:1" > /proc/vmware/scsi/vmhba0/0:1

Your multipathing settings are saved when shutting down ESX Server normally. However, we suggest you run the following command, as root, to ensure your settings are saved, in case of an abnormal shutdown.
    su

Running this command ensures that your multipathing settings are restored automatically on bootup.

When a cable is pulled, I/O freezes for approximately 30-60 seconds, until the SAN driver determines that the link is down and failover occurs. During that time, the virtual machines (with their virtual disks installed on a SAN) may appear unresponsive, and any operations on the /vmfs directory may appear to hang. After the failover occurs, I/O should resume normally.

Even though ESX Server's failover feature ensures high availability and prevents connection loss to SAN devices, all connections to SAN devices may still be lost due to disastrous events that include multiple breakages. If no connection to the storage device is working, the virtual machines begin to encounter I/O errors on their virtual SCSI disks. Also, operations in the /vmfs directory may eventually fail after reporting an "I/O error".

For QLogic cards, you may want to adjust the PortDownRetryCount value in the QLogic BIOS. This value determines how quickly a failover occurs when a link goes down. If the PortDownRetryCount value is <n>, then a failover typically takes a little longer than <n> multiplied by 2 seconds. A typical recommended value for <n> is 15, so in this case, failover takes a little longer than 30 seconds. For more information on changing the PortDownRetryCount value, refer to your QLogic documentation.

For the Windows 2000 and Windows Server 2003 guest operating systems, you may want to increase the standard disk TimeOutValue so that Windows will not be extensively disrupted during failover.
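The PortDownRetryCount arithmetic above can be sketched in shell as a planning aid. Both function names are hypothetical, and the comparison against a guest disk timeout is an assumption made for illustration: the text says only that the timeout should be increased, not what it must exceed. The computed figure is a lower bound, since failover takes "a little longer" than <n> multiplied by 2 seconds.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# failover_floor_seconds N
# Prints N * 2, the lower bound (in seconds) on failover time for a
# PortDownRetryCount of N. Actual failover takes a little longer.
failover_floor_seconds() {
    echo $(( $1 * 2 ))
}

# timeout_covers_failover TIMEOUT N
# Succeeds if a guest disk timeout of TIMEOUT seconds is strictly greater
# than the failover lower bound for PortDownRetryCount N.
# ASSUMPTION: "strictly greater than the lower bound" is an illustrative
# criterion, not a documented requirement.
timeout_covers_failover() {
    [ "$1" -gt "$(failover_floor_seconds "$2")" ]
}
```

For the recommended value of 15, failover_floor_seconds 15 prints 30. A check such as timeout_covers_failover 60 15 succeeds, while timeout_covers_failover 30 15 fails because 30 is not greater than the 30-second lower bound.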
