VMware ESX Server 2.0

Using Multipathing in ESX Server

ESX Server 2.0 includes multipathing support to maintain a constant connection between the server machine and the storage device in case of the failure of a host bus adapter (HBA), switch, storage controller (or storage processor; abbreviated as SP in the following diagram), or a Fibre Channel cable. Unlike previous versions of ESX Server, this version of multipathing support does not require specific failover drivers.

In the preceding diagram, there are multiple, redundant paths from each server to the storage device. For example, if HBA1, or the link between HBA1 and the Fibre Channel (FC) switch, breaks, HBA2 takes over and provides the connection between the server and the switch. This process is called HBA failover.

Similarly, if SP1, or the link between SP1 and the switch, breaks, SP2 takes over and provides the connection between the switch and the storage device. This process is called SP failover. VMware ESX Server 2.0 provides both HBA and SP failover with its multipathing feature. (SP failover may not be supported by all disk arrays.)

For information on supported SAN hardware, download the VMware ESX Server SAN Compatibility List from the VMware Web site at www.vmware.com/support/esx2.

Viewing the Current Multipathing State

You can view the current multipathing state by examining the proc entries for each of your LUNs. Each LUN has one proc entry, identified by the LUN's canonical name.

The canonical name for a LUN is the first path ESX Server finds to the LUN. Because ESX Server begins its scans at the first HBA and the lowest device number, the first path (and therefore the LUN's canonical name) is the path with the lowest-numbered HBA and the lowest device number. For example, if the paths to a LUN are vmhba0:0:2, vmhba1:0:2, vmhba0:1:2 and vmhba1:1:2, then the LUN's canonical name is vmhba0:0:2.
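The selection rule above can be sketched as a small shell function. This is only an illustration, not how ESX Server itself scans: the function name canonical_name is hypothetical, and the sort order (HBA number first, then device number, then LUN number) is an assumption drawn from the description of the scan order.

```shell
# Hypothetical helper: given every path to one LUN, print the path that
# sorts first by HBA number, then device number, then LUN number.
# Assumes each argument has the form vmhba<H>:<T>:<L>.
canonical_name() {
    printf '%s\n' "$@" |
        sed 's/^vmhba//' |                # keep only H:T:L so sort sees numbers
        sort -t: -k1,1n -k2,2n -k3,3n |   # numeric sort on each field
        head -n 1 |
        sed 's/^/vmhba/'                  # restore the prefix
}

canonical_name vmhba1:1:2 vmhba0:1:2 vmhba1:0:2 vmhba0:0:2
```

For the four paths in the example, the function prints vmhba0:0:2, which matches the canonical name given above.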

  1. Change to the directory for the SCSI adapter, /proc/vmware/scsi/<scsi_adapter>, and view the directory listing.

    cd /proc/vmware/scsi/vmhba0
    ls

    The output resembles the following:

    0:0 0:2 stats

  2. View the proc entry for a LUN.

    Each entry includes the partition table, statistics, the vendor ID, the size of the disk, and so on. The list of path(s) to the LUN is included at the end of the entry. For example, the entry for /proc/vmware/scsi/vmhba0/0:2 includes:

    Vendor: IBM Model: 2105E20 Rev: .100
    Type: Direct-Access ANSI SCSI revision: 03
    Size: 17166 Mbytes
    Queue Depth: 16

    Partition Info:
    Block size: 512
    Num Blocks: 35156288

    cmds  reads  KBread  writes  KBwritten  cmdsAbrt  busRst
    18    11     7       0       0          0         0

    paeCmds  paeCopies  splitCmds  splitCopies  issueAvg  totalAvg
    0        0          0          0            14557     572198

    .
    .
    .

    Paths:fixed
    vmhba0:0:2 on*#
    vmhba1:0:2 on
    vmhba0:1:2 on
    vmhba1:1:2 on

    Active: 0 Queued: 0

    LUN vmhba0:0:2 has a "fixed" policy. There are four paths to this LUN; the first path listed is always the canonical name for the LUN. The list of paths indicates the different ways that the LUN can be accessed. For example, the presence of path vmhba1:1:2 indicates that one of the ways to access the LUN is at device 1 via HBA 1.

    The asterisk (*) indicates that the first path, vmhba0:0:2, is the current, active path, and the pound sign (#) indicates that it is also the preferred path from the server to the LUN. (By default, the preferred path for a LUN is its canonical name.)

    The status of each path to the LUN is indicated by on, off, or dead. The on status indicates that the path is OK, and data is being transferred successfully. The off status indicates that this path has been deliberately turned off, while dead indicates that the path should be active, but the software cannot connect to the LUN through this path.
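To read the path list at a glance, the markers described above can be decoded with a short awk filter. The following is a minimal sketch, assuming the entry ends with a "Paths:" line followed by one "<path> <status>" line per path, as in the sample output. The function name summarize_paths is hypothetical and is not part of ESX Server.

```shell
# Hypothetical helper: decode the path list of a LUN's proc entry.
# $1 is a file containing the entry text, for example
# /proc/vmware/scsi/vmhba0/0:2 or a saved copy of it.
summarize_paths() {
    awk '
        # "Paths:fixed" or "Paths: mru" -> report the policy name
        /^[ \t]*Paths:/ { sub(/^[ \t]*Paths:/, ""); print "policy=" $1; next }
        # "<path> <status>" lines; * marks the active path, # the preferred path
        $1 ~ /^vmhba[0-9]+:[0-9]+:[0-9]+$/ {
            state = $2
            flags = ""
            if (state ~ /\*/) flags = flags " active"
            if (state ~ /#/)  flags = flags " preferred"
            gsub(/[*#]/, "", state)
            print $1 " " state flags
        }
    ' "$1"
}

# Example use on the service console:
#   summarize_paths /proc/vmware/scsi/vmhba0/0:2
```

For the sample entry, the first two lines of output would be "policy=fixed" and "vmhba0:0:2 on active preferred"; the remaining paths are listed with their status only.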

Setting Your Multipathing Policy for a LUN

You can specify the multipathing policy for a LUN. There are two policies:

  • fixed: ESX Server always uses the preferred path to the LUN; if it cannot access the LUN through the preferred path, it tries the alternate paths. Fixed is the default policy in ESX Server.

    Type the following command to select the fixed policy for a LUN (in this example, vmhba0:0:0).

    echo "policy fixed" > /proc/vmware/scsi/vmhba0/0:0

  • mru: ESX Server uses the most recently used path to the LUN until this path becomes unavailable. That is, ESX Server does not automatically revert to the preferred path.

    Type the following command to select the mru policy for a LUN (in this example, vmhba0:0:0).

    echo "policy mru" > /proc/vmware/scsi/vmhba0/0:0

Note: You can select a different policy for each LUN.
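Because the policy is set per LUN, a small wrapper can reduce typing mistakes when you change several LUNs. The following is a sketch only. The function name set_policy and the SCSI_PROC_ROOT variable are hypothetical; the variable exists solely so that the function can be tried against a scratch directory instead of the real /proc tree.

```shell
# Hypothetical wrapper around the "policy" commands shown above.
# Usage: set_policy <canonical-name> <fixed|mru>
# SCSI_PROC_ROOT is an assumed override for testing; it defaults to
# /proc/vmware/scsi.
set_policy() {
    lun=$1
    policy=$2
    case $policy in
        fixed|mru) ;;
        *) echo "unknown policy: $policy" >&2; return 1 ;;
    esac
    adapter=${lun%%:*}   # vmhba0:0:0 -> vmhba0
    entry=${lun#*:}      # vmhba0:0:0 -> 0:0
    echo "policy $policy" > "${SCSI_PROC_ROOT:-/proc/vmware/scsi}/$adapter/$entry"
}

# Example use on the service console:
#   set_policy vmhba0:0:0 mru
```

The function rejects any policy name other than fixed or mru before it writes anything, and otherwise issues the same echo command as in the examples.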

Specifying Paths

You can write commands to a LUN's proc entry to disable and enable paths, set the active path, and set the preferred path, as illustrated in the following examples.

Disabling a Path

Type the following command to disable the specified path to the LUN.

echo "pathoff <path>" > /proc/vmware/scsi/<adapter>/<entry>

In this example, you are changing the status of path vmhba1:0:1 to off in the proc entry for LUN vmhba0:0:1.

echo "pathoff vmhba1:0:1" > /proc/vmware/scsi/vmhba0/0:1

Enabling a Path

Type the following command to enable the specified path to the LUN.

echo "pathon <path>" > /proc/vmware/scsi/<adapter>/<entry>

In this example, you are changing the status of path vmhba1:0:1 to on in the proc entry for LUN vmhba0:0:1.

echo "pathon vmhba1:0:1" > /proc/vmware/scsi/vmhba0/0:1

Setting the Preferred Path

Type the following command to set the specified path as the preferred path to the LUN.

echo "preferred <path>" > /proc/vmware/scsi/<adapter>/<entry>

In this example, you are making path vmhba1:0:1 the preferred path (indicated by #) in the proc entry for LUN vmhba0:0:1.

echo "preferred vmhba1:0:1" > /proc/vmware/scsi/vmhba0/0:1
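The pathoff, pathon, and preferred commands all share the same form, so they can be wrapped in one function. This is a minimal sketch. The function name path_command and the SCSI_PROC_ROOT variable are hypothetical; the variable is only there so that the function can be exercised against a scratch directory.

```shell
# Hypothetical wrapper for the three path commands shown above.
# Usage: path_command <pathoff|pathon|preferred> <path> <canonical-name>
# SCSI_PROC_ROOT is an assumed override for testing; it defaults to
# /proc/vmware/scsi.
path_command() {
    verb=$1
    path=$2
    lun=$3
    case $verb in
        pathoff|pathon|preferred) ;;
        *) echo "unknown command: $verb" >&2; return 1 ;;
    esac
    # The proc entry belongs to the LUN's canonical name, not to the path
    # being changed: vmhba0:0:1 -> <root>/vmhba0/0:1
    echo "$verb $path" > "${SCSI_PROC_ROOT:-/proc/vmware/scsi}/${lun%%:*}/${lun#*:}"
}

# Example use on the service console:
#   path_command pathoff vmhba1:0:1 vmhba0:0:1
```

Note that the second argument is the path being changed and the third is the canonical name of the LUN, mirroring the examples, in which path vmhba1:0:1 is changed through the proc entry for LUN vmhba0:0:1.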

Saving Your Multipathing Settings

Your multipathing settings are saved when ESX Server shuts down normally. However, we suggest that you also run the following command, as root, so that your settings are saved even if an abnormal shutdown occurs.

su
# /usr/sbin/vmkmultipath -S

After you run this command, your multipathing settings are restored automatically at boot time.

In Case of Failover

When a cable is pulled, I/O freezes for approximately 30-60 seconds, until the SAN driver determines that the link is down, and failover occurs. During that time, the virtual machines (with their virtual disks installed on a SAN) may appear unresponsive, and any operations on the /vmfs directory may appear to hang. After the failover occurs, I/O should resume normally.

Although ESX Server's failover feature provides high availability and protects against the loss of a single connection, all connections to SAN devices can still be lost in a disastrous event, such as multiple breakages.

If none of the connections to the storage device is working, the virtual machines begin to encounter I/O errors on their virtual SCSI disks. Also, operations in the /vmfs directory may eventually fail after reporting an "I/O error".
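One way to notice that a LUN is running out of working paths before every connection is lost is to count the paths whose status is on. The following is a sketch under the assumption that the path list in the proc entry has the "<path> <status>" form described earlier in this section. The function name count_usable_paths is hypothetical.

```shell
# Hypothetical helper: count the paths to a LUN whose status is "on".
# $1 is a file containing the proc entry text, for example
# /proc/vmware/scsi/vmhba0/0:2 or a saved copy of it.
count_usable_paths() {
    awk '
        # Path lines look like "vmhba0:0:2 on*#"; "off" and "dead" are skipped
        $1 ~ /^vmhba[0-9]+:[0-9]+:[0-9]+$/ && $2 ~ /^on/ { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example use on the service console:
#   count_usable_paths /proc/vmware/scsi/vmhba0/0:2
```

A result of 1 means the LUN has no remaining redundancy, and a result of 0 corresponds to the situation described above, in which all connections are down.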

Settings for QLogic Adapters

For QLogic cards, you may want to adjust the PortDownRetryCount value in the QLogic BIOS. This value determines how quickly a failover occurs when a link goes down.

If the PortDownRetryCount value is <n>, then a failover typically takes a little longer than <n> multiplied by 2 seconds. A typical recommended value for <n> is 15, so in this case, failover takes a little longer than 30 seconds.
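The rule of thumb above is simple arithmetic, shown here as a one-line shell function. The function name min_failover_seconds is hypothetical, and the result is only the lower bound stated above; actual failover takes a little longer.

```shell
# Hypothetical helper: lower bound, in seconds, on the failover time for a
# given PortDownRetryCount value <n>, using the "<n> multiplied by 2" rule.
min_failover_seconds() {
    echo $(( $1 * 2 ))
}

min_failover_seconds 15
```

With the typical recommended value of 15, the function prints 30.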

For more information on changing the PortDownRetryCount value, refer to your QLogic documentation.

Failover in Windows 2000 and Windows Server 2003 Guest Operating Systems

For the Windows 2000 and Windows Server 2003 guest operating systems, you may want to increase the standard disk TimeOutValue so that Windows will not be extensively disrupted during failover.

  1. Select Start > Run, type regedit.exe, and click OK.
  2. In the left panel hierarchy view, double-click HKEY_LOCAL_MACHINE, System, CurrentControlSet, Services, then Disk.
  3. Select the TimeOutValue and set the Data value to 0x3c (hexadecimal) or 60 (decimal). With this change, Windows waits at least 60 seconds for delayed disk operations to complete before generating errors.
  4. Click OK and exit the Registry Editor program.
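If you are unsure whether the hexadecimal and decimal values in step 3 match, you can check the conversion from any shell, for example on the service console. This sketch uses only the standard printf utility; the function names hex_to_dec and dec_to_hex are hypothetical.

```shell
# Hypothetical helpers: convert between the hexadecimal and decimal forms
# of a registry DWORD value such as TimeOutValue.
hex_to_dec() {
    printf '%d\n' "$1"      # accepts C-style constants such as 0x3c
}

dec_to_hex() {
    printf '0x%x\n' "$1"
}

hex_to_dec 0x3c
dec_to_hex 60
```

The two calls print 60 and 0x3c, confirming that both forms in step 3 describe the same 60-second timeout.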
