Storage I/O Control
Storage I/O Control (SIOC) is a feature introduced to provide I/O prioritization of virtual machines running on a group of vSphere hosts that access a shared storage pool. It extends the familiar constructs of shares and limits, which already existed for CPU and memory, to address storage utilization through a dynamic allocation of I/O queue slots across a group of servers.
Traditional CPU and memory shares address resources on a single ESXi host. Essentially, virtual machines are competing for limited memory and CPU resources contained within a single host. Shared storage resources in a vSphere infrastructure are different because, as the name implies, storage is shared. This means vSphere must treat access to storage at a multi-host level, rather than on a per-host basis. The operating boundary for SIOC is a group of hosts sharing a common datastore and managed by a single vCenter Server instance. The following diagrams show the differences between managing access to storage at the host level and across a group of hosts sharing common datastores.
The diagram below shows three virtual machines. VM001 and VM002 are hosted on ESX01, and VM003 is hosted on ESX02. Each virtual machine has disk shares set to a value of 1000. Storage I/O Control is disabled in this example, so there is no mechanism to regulate I/O at the datastore level. At the bottom of the diagram is the Storage Array Queue. This represents the actual reads and writes being sent to the storage array for processing. As you can see, VM003 ends up getting more resources than VM001 and VM002, despite each virtual machine having equal share values for storage resources. This is because VM003 has access to all of the storage resources on host ESX02, while VM001 and VM002 are sharing the resources of host ESX01.
In the next diagram, Storage I/O Control has been enabled. In this case, during times of contention, every virtual machine accessing a specific datastore has its I/O regulated according to its share value across the entire datastore, rather than per host. As you can see, VM003 is now limited by its 1000-share value.
Once Storage I/O Control has been enabled for a specific datastore, it will start monitoring that datastore. If the specified latency threshold has been reached for the datastore (default: average I/O latency of 30ms), SIOC is triggered to take action and resolve the imbalance. SIOC then limits the number of I/O operations a host can issue for that datastore. It does this by throttling the host device queue (shown in the diagram and labeled as “Device Queue Depth”). In this example, the device queue depth of ESX02 is decreased to 16.
Storage I/O Control operates as a “datastore-wide disk scheduler”. When enabled for a datastore, SIOC will sum up the disk shares for each of the VMDK files on the datastore. SIOC will then calculate the I/O slot entitlement per ESXi host based on the percentage of shares held by the virtual machines running on that host relative to the total shares for all hosts accessing that datastore.
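The share-based entitlement described above can be sketched in a few lines. This is an illustrative model only, not VMware's actual implementation; the 48-slot array queue and the `host_queue_depths` function are assumptions chosen so the numbers line up with the diagrams.

```python
# Hypothetical sketch of SIOC's share-based queue allocation.
# The 48-slot array queue size is an illustrative assumption.

def host_queue_depths(vm_shares_by_host, max_queue_depth):
    """Split a datastore's device queue slots across hosts in
    proportion to the disk shares of the VMs each host runs."""
    total_shares = sum(sum(shares) for shares in vm_shares_by_host.values())
    depths = {}
    for host, shares in vm_shares_by_host.items():
        entitlement = sum(shares) / total_shares  # host's share percentage
        depths[host] = max(1, round(entitlement * max_queue_depth))
    return depths

# Mirrors the diagram: VM001 and VM002 on ESX01, VM003 on ESX02,
# each VM with 1000 disk shares.
print(host_queue_depths({"ESX01": [1000, 1000], "ESX02": [1000]}, 48))
# → {'ESX01': 32, 'ESX02': 16}
```

With equal shares per VM, ESX01 holds two thirds of the total shares and is entitled to two thirds of the queue slots, while ESX02's device queue depth is throttled to 16, matching the example above.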
For Storage I/O Control to engage in optimizing I/O to a given datastore, two conditions must be met:
- The datastore must have the feature enabled (configured in the datastore's property settings).
- There must be a sustained average latency detected across the vSphere hosts that share that datastore. The default threshold is 30ms and can be modified through the advanced setting options for the datastore properties.
Once both of those conditions are met, SIOC engages in proactively managing the I/O queues across all ESXi servers that share the datastore.
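The two engagement conditions can be modeled as a simple trigger. This is a hedged sketch under stated assumptions: the sliding-window length and the `SiocTrigger` class are illustrative, not part of SIOC's actual implementation; only the enabled flag and the 30ms default threshold come from the text.

```python
# Illustrative model of SIOC's two engagement conditions.
# Window length of 4 samples is an assumption for "sustained" latency.
from collections import deque

class SiocTrigger:
    def __init__(self, enabled, threshold_ms=30.0, window=4):
        self.enabled = enabled            # condition 1: feature enabled
        self.threshold_ms = threshold_ms  # default congestion threshold
        self.samples = deque(maxlen=window)

    def observe(self, latency_ms):
        """Record a latency sample; return True once SIOC should engage."""
        self.samples.append(latency_ms)
        if not self.enabled or len(self.samples) < self.samples.maxlen:
            return False
        avg = sum(self.samples) / len(self.samples)
        return avg > self.threshold_ms    # condition 2: sustained average latency

trigger = SiocTrigger(enabled=True)
print([trigger.observe(ms) for ms in [35, 40, 38, 36]])
# → [False, False, False, True]
```

A single latency spike does not engage SIOC in this model; only a sustained average above the threshold does, and a datastore with the feature disabled never engages regardless of latency.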
vSphere 5.0 enhancements
vSphere 5.0 Storage I/O Control provides support for shares and limits for block-based and NFS datastores. This means that no single virtual machine should be able to create a bottleneck in any environment, regardless of the type of shared storage used.
vSphere 5.1 enhancements
The default latency threshold for SIOC is 30ms. Not all storage devices are equal, so this default represents a middle-of-the-range value. Some devices hit their natural contention point earlier than others; for SSDs, for example, the threshold should be decreased by the user.
However, manually determining the correct latency threshold can be difficult, which creates a need for it to be determined automatically for each device. Rather than relying on a default or user-selected latency threshold, vSphere 5.1 SIOC can automatically determine the best threshold for a datastore.
The latency threshold is set to the value determined by the I/O injector (a part of SIOC). When the I/O injector calculates the peak throughput, it then finds the 90 percent throughput value and measures the latency at that point to determine the threshold.
vSphere administrators can change this 90 percent to another percent value or they can continue to input a millisecond value if they choose.
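The injector's threshold derivation could be sketched as follows. This is a hedged illustration of the description above, not VMware's actual algorithm: the `auto_threshold` function and the throughput/latency data points are invented for the example.

```python
# Hypothetical sketch of the I/O injector's automatic threshold:
# find peak throughput, then report the latency measured at the point
# where throughput reaches 90 percent of that peak.

def auto_threshold(samples, pct=0.90):
    """samples: list of (throughput_iops, latency_ms) pairs measured at
    increasing load. Returns the latency at pct of peak throughput."""
    peak = max(throughput for throughput, _ in samples)
    target = pct * peak
    # first measured point at or above the target throughput
    for throughput, latency in sorted(samples):
        if throughput >= target:
            return latency
    return samples[-1][1]

# Invented device curve: latency climbs steeply as IOPS approach the peak.
curve = [(2000, 2.0), (4000, 4.0), (5500, 8.0), (6000, 15.0)]
print(auto_threshold(curve))
# → 8.0
```

Here the peak is 6000 IOPS, so the 90 percent point is 5400 IOPS; the first sample at or above that delivers 8ms, which becomes the threshold. Passing a different `pct` models an administrator changing the 90 percent value, as the text notes.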
With vSphere 5.1, SIOC is automatically enabled on all datastores in “stats only” mode. This means that storage statistics that are typically presented only when SIOC is enabled will now be available immediately. It also means that Storage DRS now has statistics in advance for new datastores being added to the datastore cluster.
VmObservedLatency is a new SIOC metric introduced in vSphere 5.1. It replaces the datastore latency metric used in previous versions. This new metric measures the time from when the VMkernel receives the I/O from the virtual machine until it receives the response from the datastore. Previously, latency was measured only after the I/O had left the ESXi host; the new metric also captures latency within the host. This new metric is visible in the vSphere Client™.