Besides capturing the key performance characteristics of virtual systems, an appropriate virtual platform benchmark must employ realistic, diverse workloads running on multiple hosts. Further, there is a need to define a single, easy-to-understand metric while ensuring that the benchmark is representative of various end-user environments. The benchmark specification must provide a methodical way to measure scalability so that the same benchmark can be used for small platforms as well as larger platforms from different hardware vendors.
VMware realized the need for a next-generation virtualization benchmark for comparing different virtualization and hardware platforms, one that incorporates multiple hosts, diverse multi-tier workloads, and infrastructure operations. VMmark 3 was created as a standardized way to compare these platforms.
A VMmark tile is a group of nineteen virtual machines concurrently executing a collection of diverse workloads. Each of these workloads represents a common multi-tier application workload found in today's data centers. Included in each tile are a scalable web simulation, an e-commerce simulation, and an idle machine.
Each virtual machine in a tile is tuned to use only a fraction of the system's total resources. Taken together, the workloads in a single tile use less than the full capacity of a modern server. Saturating a system's resources and accurately measuring server performance with VMmark 3 therefore requires running multiple tiles simultaneously.
Each workload within a VMmark 3 tile is constrained to execute at less than full utilization of its virtual machine. The performance of each workload can vary to a degree with the speed and capabilities of the underlying system. For example, disk-centric workloads might respond to the addition of a fast disk array with a more favorable score. These variations can capture system improvements that don't warrant the addition of another tile. The workload throttling will force the use of additional tiles for large jumps in system performance. When the number of tiles is increased, workloads in existing tiles might have lower performance. If the system has not been overcommitted, the aggregate score, including the new tile, should increase. The result is a flexible benchmark metric that provides a relative measure of the number of workloads that can be supported by a particular system as well as the cumulative performance level of all the virtual machines.
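The relationship between tile count and aggregate score can be illustrated with a toy model. This is only a sketch: it assumes the aggregate is a plain sum of per-tile scores, which is a simplification and not the published VMmark 3 scoring formula, and every number below is hypothetical.

```python
def aggregate_score(tile_scores):
    """Toy aggregate: the sum of per-tile scores.

    Assumption for illustration only; this is not the official
    VMmark 3 scoring method.
    """
    return sum(tile_scores)


# Hypothetical per-tile scores, scaled so an uncontended tile scores about 1.0.
three_tiles = [0.98, 0.97, 0.97]
# Adding a fourth tile slows the existing tiles slightly.
four_tiles = [0.93, 0.92, 0.92, 0.91]
# A fifth tile overcommits the system and every tile suffers.
five_tiles_overcommitted = [0.70, 0.69, 0.68, 0.68, 0.60]

for label, tiles in [
    ("3 tiles", three_tiles),
    ("4 tiles", four_tiles),
    ("5 tiles (overcommitted)", five_tiles_overcommitted),
]:
    print(f"{label}: aggregate = {aggregate_score(tiles):.2f}")
```

In this made-up data, the fourth tile lowers each tile's individual score but raises the aggregate, while the fifth tile pushes the system past its capacity and the aggregate falls.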
VMmark 3 was developed as a tool for hardware vendors, system integrators, and customers to evaluate the performance of their systems. Many customers will not run the benchmark themselves, but rather rely on published VMmark 3 scores from their hardware vendors to make purchasing and configuration decisions for their virtualization infrastructure.
The main use-case for VMmark 3 is to compare the performance of different hardware platforms and configurations. Organizations implementing or evaluating virtualization platforms use VMmark 3 for comparing performance and scalability of different virtualization configurations, making appropriate hardware choices, and for measuring platform performance on an ongoing basis.
It is also important to note that VMmark 3 is neither a capacity planning tool nor a sizing tool. It does not provide deployment guidelines for specific applications. Rather, VMmark 3 is meant to be representative of a general-purpose virtualization environment. The virtual machine configurations and the software stacks inside the virtual machines are fixed as part of the benchmark specification. Recommendations derived from VMmark 3 results will capture many common cases; however, specialized scenarios will likely require individual measurement.
With VMmark 3, organizations have a robust and reliable benchmark that captures the key performance characteristics of virtual platforms, is representative of real-world environments running multiple workloads, is hardware platform neutral, and provides a methodical way to measure scalability so that the same benchmark can be used across different vendor platforms.
A higher VMmark 3 score implies that a virtualization platform is capable of sustaining greater throughput in a mixed-workload cloud environment while data center operations run in the background. A larger number of VMmark 3 tiles in a benchmark run means that the platform supported more virtual machines across the multiple hosts during that run. Typically, a higher benchmark score requires a higher number of tiles.
If two different virtualization platforms achieve similar VMmark 3 scores with a different number of tiles, the score with the lower tile count is generally preferred. The higher tile count could be an indication that the underlying hardware resources were not properly balanced. Studying the individual workload metrics is suggested in these cases.
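The rule of thumb above can be written as a small comparison helper. This is a minimal sketch, not part of VMmark: the `Result` type, the `preferred` function, and the 2% threshold for "similar" scores are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str     # label for the platform under test (hypothetical)
    score: float  # published VMmark 3 score
    tiles: int    # number of tiles used to achieve the score


def preferred(a: Result, b: Result, tolerance: float = 0.02) -> Result:
    """Pick the generally preferred result.

    Scores within `tolerance` (relative) are treated as similar, and the
    lower tile count wins; otherwise the higher score wins. The 2% default
    is an assumption, not a VMmark rule.
    """
    if abs(a.score - b.score) <= tolerance * max(a.score, b.score):
        return a if a.tiles <= b.tiles else b
    return a if a.score > b.score else b


# Similar scores, different tile counts: the lower tile count is preferred.
print(preferred(Result("A", 10.0, 10), Result("B", 10.1, 12)).name)
```

A helper like this only flags which result to look at first; the individual workload metrics still need to be examined to understand why the tile counts differ.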
The workload applications in VMmark 3 have been updated to reflect modern multi-tier application design standards and technologies. VMmark 3 features a highly automated setup and tile-cloning process, and the VMmark .ova includes all the needed software in one downloadable template.
No. The workloads and load levels of VMmark 3 have changed significantly from VMmark 2.x in order to take better advantage of today's larger and more powerful server hardware. As a result, VMmark 3 benchmark scores are not comparable to VMmark 2.x benchmark scores.
No. VMmark 3 provides three test types:
Benchmarkers may choose to optimize a test configuration for a particular aspect of measurement. For instance, if running with a power measurement, the benchmarker may choose to optimize for power over server performance. Representing all server performance results (both with and without power measurements) on the same results page could be misleading. In order to ensure consistent comparability of results, separate results pages are used.
There are a number of sources for VMmark support: