VMware VMmark FAQs  

What is VMmark v2?

VMmark® 2.x is the first benchmark that was designed specifically to quantify and measure the performance of virtualized data centers. It features a tile-based scheme for measuring the scalability of consolidated workloads and provides a consistent methodology that captures both the overall scalability and individual application performance.

The VMmark 2.x benchmark is built on our expertise in virtualization performance and incorporates popular workloads from application categories most commonly represented in customer data centers. Unlike other virtualization benchmarks, which measure only a single host, VMmark 2.x is the first benchmark to measure the performance of multi-host virtual environments.

Why is there a need for a new benchmark?

Today’s virtual data centers contain several diverse workloads running on top of a virtualization platform - a collection of physical servers accessing shared storage and network resources. In these virtualization platforms, advanced virtualization capabilities such as dynamic load balancing ensure that all system resources such as CPU, network and disk are more efficiently utilized. Previous single-host virtualization benchmarks have not fully captured the complexities of today’s virtualized data centers. VMmark 2.x, the industry’s first multi-server data center virtualization benchmark, addresses this gap by including as part of the benchmark a variety of common platform-level workloads such as live migration of virtual machines, cloning and deploying of virtual machines and automatic virtual machine load balancing across the data center.

What are some specific requirements for developing such a benchmark?

Besides capturing the key performance characteristics of virtual systems, an appropriate virtual platform benchmark must employ realistic, diverse workloads running on multiple operating systems and on multiple hosts. It must also define a single, easy-to-understand metric while remaining representative of various end-user environments. The benchmark specification needs to be platform neutral and should provide a methodical way to measure scalability, so that the same benchmark can be used for small platforms as well as larger platforms from different hardware vendors.

Why did VMware develop VMmark v2?

VMware realized the need for a next-generation virtualization benchmark to compare different virtualization platforms, which consist of multiple hosts, diverse multi-tier workloads and infrastructure operations.

VMmark 2.x provides a standardized way to compare these virtualization platforms.

What is a tile?

A tile is a collection of eight virtual machines running diverse workloads concurrently. Each workload represents a common application found in today’s data centers. A single tile includes a mail server, a Web 2.0 database and web system, an e-commerce back-end database and front-end web layer, and an idle machine.

Each virtual machine in a tile is tuned to use only a fraction of the system's total resources. Taken together, the workloads in one tile normally use less than the full capacity of a modern server. Saturating a system's resources, and therefore measuring server performance accurately with VMmark 2.x, requires running multiple tiles simultaneously.

How does VMmark v2 work?

VMmark 2.x is designed as a tile-based benchmark consisting of a diverse set of workloads commonly found in the data center. The workloads comprising each tile are run simultaneously in separate virtual machines at load levels that are typical of virtualized environments. The performance of each workload is measured and then combined with the other workloads to form the score for the individual tile. Multiple tiles can be run simultaneously to increase the overall score.

This approach allows smaller increases in system performance to be reflected by increased scores in a single tile, and larger gains in system capacity to be captured by adding tiles.

Each workload within a VMmark 2.x tile is constrained to execute at less than full utilization of its virtual machine. The performance of each workload can vary to a degree with the speed and capabilities of the underlying system. For instance, disk-centric workloads might respond to the addition of a fast disk array with a more favorable score. These variations can capture system improvements that do not warrant the addition of another tile. Because the workloads are throttled, large jumps in system performance can be captured only by adding tiles. When the number of tiles is increased, workloads in existing tiles might measure lower performance. If the system has not been overcommitted, the aggregate score, including the new tile, should increase. The result is a flexible benchmark metric that provides a relative measure of the number of workloads that can be supported by a particular system as well as the overall performance level within the virtual machines.

Who will use VMmark v2?

VMmark 2.x was developed as a useful tool for hardware vendors, system integrators, and customers to evaluate the performance of their systems. Many customers will not run the benchmark themselves, but rather rely on published VMmark 2.x scores from their hardware vendors to make purchasing and configuration decisions for their virtualization infrastructure.

What are the use cases for VMmark v2?

The main use case for VMmark 2.x is to compare the performance of different hardware platforms and configurations. Organizations implementing or evaluating virtualization platforms today will use VMmark 2.x for comparing the performance and scalability of different virtualization platforms, making appropriate hardware choices, and measuring platform performance on an ongoing basis.

It is also important to note that VMmark 2.x is neither a capacity planning tool nor a sizing tool. It does not provide deployment guidelines for specific applications. Rather VMmark 2.x is meant to be representative of a general-purpose virtualization environment. The virtual machine configurations and the software stacks inside the virtual machines are fixed as part of the benchmark specification. Recommendations derived from VMmark 2.x results will capture many common cases; however, specialized scenarios will likely require individual measurement.

What are the benefits of VMmark v2?

With VMmark 2.x, organizations now have a robust and reliable benchmark that captures the key performance characteristics of virtual platforms, is representative of real-world environments running multiple workloads, is hardware platform neutral, and provides a methodical way to measure scalability so that the same benchmark can be used across different vendor platforms.

How can one obtain / get started with VMmark v2?

To get started with VMmark 2.x, one should follow the steps below:

  1. Download the latest VMmark 2.x Kit
    The VMmark 2.x kit contains the VMmark 2.x User's Guide, the configuration files, and much of the software needed to run VMmark 2.x.
  2. Download the pre-built Virtual Machines
    The same download page contains links to pre-built virtual machines for the three Linux virtual machines used in the benchmark.
  3. Extract the VMmark 2.x kit
    Extract the contents of the VMmark 2.x kit to C:/ on your initial Windows Server 2003 client system.
  4. Refer to the VMmark 2.x Benchmark Guide
    Follow the instructions in the VMmark 2.x Benchmark Guide (in the 'docs' directory of the VMmark 2.x kit) for directions on how to set up and run the benchmark.
  5. Read the run and reporting rules carefully
    The VMmark 2.x Run and Reporting Rules document (in the 'docs' directory of the VMmark kit) outlines the requirements for producing a publishable VMmark 2.x result. Any VMmark result that the benchmarker plans to publish must adhere to the Run and Reporting rules included in the benchmark kit.

How do I interpret a VMmark v2 score?

VMmark 2.x aggregates the throughput metrics of all application and infrastructure workloads to create a single overall benchmark score that can be used to quickly compare different platform configurations. Every workload must also pass its minimum quality-of-service requirements for the benchmark result to be considered compliant.

After a valid run, the metrics of the application workloads within each tile are computed and aggregated into a score for that tile. This aggregation is performed by first normalizing the different performance metrics (such as actions/minute and operations/minute) with respect to a reference platform. The geometric mean of the normalized scores is then computed as the final score for the tile. The resulting per-tile scores are summed to create the application-workload portion of the final metric.

The metrics for the infrastructure workloads are aggregated separately using the same mathematical technique: normalization with respect to a reference platform followed by computation of the geometric mean. Unlike the application workloads, the infrastructure workloads are not scaled explicitly by the user. Consequently, the infrastructure workloads are compiled as a single group and no multi-tile sums are required.

The final benchmark score is then computed as a weighted average of the application-workload component and the infrastructure-workload component. VMmark 2.x gives a weight of 80 percent to the application-workload component and a weight of 20 percent to the infrastructure-workload component. These weights were chosen to reflect the relative contribution of infrastructure and application workloads to overall resource demands.
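
The aggregation described above can be illustrated with a minimal Python sketch. It is not the benchmark's actual scoring code: the function names, the workload metric names, and all numeric values are hypothetical and chosen only for illustration. The only parts taken from the description are the normalization against a reference platform, the geometric mean, the per-tile sum, and the 80/20 weighting.

```python
from math import prod

# Weights stated in the FAQ; everything else in this sketch is illustrative.
APP_WEIGHT = 0.8
INFRA_WEIGHT = 0.2


def geometric_mean(values):
    """Geometric mean of a non-empty collection of positive numbers."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


def normalize(raw, reference):
    """Divide each raw metric by the reference platform's value for it."""
    return {name: raw[name] / reference[name] for name in raw}


def tile_score(raw, reference):
    """Score for one tile: geometric mean of its normalized metrics."""
    return geometric_mean(normalize(raw, reference).values())


def overall_score(tiles, app_reference, infra_raw, infra_reference):
    """Combine per-tile scores and the infrastructure group into one number."""
    # Application component: sum of the per-tile scores.
    app_component = sum(tile_score(t, app_reference) for t in tiles)
    # Infrastructure component: one group, so no multi-tile sum.
    infra_component = geometric_mean(
        normalize(infra_raw, infra_reference).values()
    )
    return APP_WEIGHT * app_component + INFRA_WEIGHT * infra_component


# Hypothetical metric names and reference values.
app_reference = {"mail": 300.0, "web": 4000.0, "ecommerce": 2000.0}
infra_reference = {"migration": 10.0, "deploy": 5.0}

# Two tiles that match the reference platform exactly.
tiles = [dict(app_reference), dict(app_reference)]
score = overall_score(tiles, app_reference, dict(infra_reference), infra_reference)
print(round(score, 2))
```

In this sketch, a system that matches the reference platform on every metric scores 1.0 per tile and 1.0 for the infrastructure group, so the tile count dominates the application component.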

A VMmark 2.x full disclosure report also includes the raw and normalized results for each underlying workload as well as complete details of the virtualization platform configuration. In some cases, studying the workload metrics along with the platform configuration can provide insight into system performance and scaling.

How do I compare VMmark 2.x scores across different virtualization platforms?

A higher VMmark 2.x score implies that a virtualization platform is capable of sustaining greater throughput in a mixed workload consolidation environment, while experiencing data center operations in the background. A larger number of VMmark 2.x tiles used to generate the benchmark means that the platform supported more virtual machines, across the multiple hosts, during the benchmark run. Typically, a higher benchmark score requires a higher number of tiles.

If two different virtualization platforms achieve similar VMmark 2.x scores with different numbers of tiles, the result with the lower tile count is generally preferred. The higher tile count could be a sign that the underlying hardware resources were not properly balanced. In these cases, studying the individual workload metrics is suggested.
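
The comparison rule above can be sketched as a small Python helper. The function name and the 2 percent tolerance used to decide that two scores are "similar" are assumptions made for this example; the benchmark documentation quoted here does not define a numeric threshold.

```python
def preferred_result(a, b, rel_tolerance=0.02):
    """Pick the preferred of two results, each given as (score, tile_count).

    rel_tolerance is a hypothetical threshold for treating scores as similar.
    """
    score_a, tiles_a = a
    score_b, tiles_b = b
    if abs(score_a - score_b) <= rel_tolerance * max(score_a, score_b):
        # Similar scores: the lower tile count is generally preferred.
        return a if tiles_a <= tiles_b else b
    # Otherwise the higher score indicates greater sustained throughput.
    return a if score_a > score_b else b


# Hypothetical results: nearly equal scores, different tile counts.
print(preferred_result((10.0, 10), (10.1, 12)))
```

A helper like this only ranks two published numbers. It does not replace studying the individual workload metrics in the full disclosure reports.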

How is VMmark version 2.0 different from VMmark version 1.x?

VMmark 1.x was designed as a single-system consolidation benchmark consisting of six isolated single-tier workloads. VMmark 2.x was designed as a multi-host benchmark reflecting typical, modern-day usage of virtualized infrastructure. VMmark 2.x consists of two single-tier application workloads, two multi-tier application workloads, and four infrastructure-level workloads.

Are VMmark 1.x results comparable to VMmark 2.x results?

No. The workloads and load levels of VMmark 2.x have changed significantly from VMmark 1.x in order to take better advantage of today’s larger and more powerful server hardware. As a result, VMmark 2.x benchmark scores are not comparable to VMmark 1.x benchmark scores.

Are VMmark 2.0 results comparable to VMmark 2.1 results?

Yes. VMmark 2.1 adds support for client systems running certain versions of Windows Server 2008 (in addition to Windows Server 2003, which was supported in VMmark 2.0) as well as support for virtualized clients (subject to certain conditions).

How is VMmark 2.5 different from previous 2.x versions?

In addition to the optional power measurement functionality in VMmark 2.5, the benchmark has been enhanced in a variety of ways, including increased parallelism, additional client support, added support for vSphere 5.1, support for short runs, and enhanced messaging.

Will all VMmark 2.5 benchmarks include power measurements?

Not necessarily. VMmark 2.5 provides three test types: performance-only (no power measurement), server-only power-performance, and server-and-storage power-performance.

Why aren’t all VMmark 2.5 results shown on the same web page?

Benchmarkers may choose to optimize a test configuration for a particular aspect of measurement. For instance, if running with a power measurement, the benchmarker may choose to optimize for power over server performance. Representing all server performance results (both with and without power measurements) on the same results page could be misleading. In order to ensure consistent comparability of results, separate results pages are used.