The following are the features of VMware VMmark® version 2.x.

Application-Centric Benchmarking of Real-World Workloads

VMmark uses workloads representative of the applications most often found in the data center, such as email servers and databases. VMware worked closely with its partners to design and implement the benchmark across various software and hardware platforms, and also gathered extensive customer feedback to understand how these applications are typically used in virtualized environments.

To measure performance, VMmark uses well-understood, existing workloads that customers already know.

Unique Tile-Based Implementation

The unit of work for a benchmark of virtualized consolidation environments can be naturally defined as a collection of virtual machines executing a set of diverse workloads. The VMmark benchmark refers to this unit of work as a tile. In VMmark 2.x, the total number of tiles that multiple systems in the data center can accommodate, while administrative operations also run in the background, provides a coarse-grained measure of that data center's consolidation capacity. The number of tiles and the performance of each individual workload within the tiles together determine the overall benchmark score.

Multi-Server Virtualized Data Center Benchmarking

Rapid innovation has transformed typical server usage by making it easier to virtualize bursty and heavy workloads, and by enabling dynamic virtual machine relocation (vMotion), dynamic datastore relocation (Storage vMotion), and automation of many provisioning and administrative tasks across large-scale multi-host environments. In this paradigm, the underlying infrastructure operations generate a large fraction of the load on the CPU, network, disk, and memory subsystems. Load balancing across multiple hosts can also greatly affect application performance. Any relevant benchmarking methodology must still focus on user-centric application performance while accounting for the effects of this infrastructure activity on overall platform performance. VMmark 2.x produces a realistic measure of platform performance by incorporating a variety of platform-level workloads, such as virtual machine migration, clone-and-deploy, and storage migration operations, in addition to traditional application-level workloads.

High-Precision Scoring Methodology

VMmark combines the different component metrics into a single overall score. During a VMmark test, each individual workload, whether an application workload or an infrastructure workload, reports its relevant performance metric at frequent intervals. A VMmark test is designed to run for at least three hours, with workload metrics reported every 60 seconds.
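As a rough illustration of the data a run produces, the Python sketch below stores per-interval samples for each workload and checks that every workload has reported for the minimum run length: three hours at one sample per 60 seconds is at least 180 samples. The `MetricLog` class, its methods, and the workload names are hypothetical; this text does not describe how VMmark stores or reduces these samples, so the sketch deliberately stops at collection.

```python
from collections import defaultdict
from dataclasses import dataclass, field

INTERVAL_SECONDS = 60          # workload metrics are reported every 60 seconds
MIN_RUN_SECONDS = 3 * 60 * 60  # a test is designed to run for at least three hours


@dataclass
class MetricLog:
    """Hypothetical per-run store of periodic workload metrics."""
    samples: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, workload: str, value: float) -> None:
        # Intended to be called once per workload per 60-second reporting interval.
        self.samples[workload].append(value)

    def covers_minimum_run(self) -> bool:
        # Every workload must have reported for the full minimum duration.
        needed = MIN_RUN_SECONDS // INTERVAL_SECONDS  # 180 intervals
        return bool(self.samples) and all(
            len(values) >= needed for values in self.samples.values()
        )


log = MetricLog()
for _ in range(180):
    log.record("mail", 100.0)     # hypothetical application workload
    log.record("vmotion", 5.0)    # hypothetical infrastructure workload
print(log.covers_minimum_run())
```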

After a benchmark run, the application workload metrics for each tile are computed and aggregated into a score for that tile. The different performance metrics, such as MB/second and database commits/second, are first normalized with respect to a reference system, which turns metrics with different units into comparable ratios. The geometric mean of these normalized scores is then computed as the score for the tile. Finally, the per-tile scores are summed to create the application-workload portion of the final metric.
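The per-tile aggregation can be sketched in Python as follows. This is a minimal sketch assuming each tile's results are a mapping from workload metric name to measured value and that the reference system supplies one value per metric; the function names, metric names, and numbers are hypothetical and not taken from VMmark itself.

```python
import math
from typing import Iterable, Mapping, Sequence


def normalize(measured: Mapping[str, float], reference: Mapping[str, float]) -> dict:
    """Express each workload metric as a ratio to the reference system's value."""
    return {name: measured[name] / reference[name] for name in measured}


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow in the product."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


def tile_score(measured: Mapping[str, float], reference: Mapping[str, float]) -> float:
    """Score for one tile: geometric mean of its normalized workload metrics."""
    return geometric_mean(normalize(measured, reference).values())


def application_component(tiles: Sequence[Mapping[str, float]],
                          reference: Mapping[str, float]) -> float:
    """Application-workload portion: the sum of the per-tile scores."""
    return sum(tile_score(tile, reference) for tile in tiles)


# Hypothetical reference values and measurements (units differ per metric).
reference = {"mb_per_s": 100.0, "db_commits_per_s": 50.0}
tiles = [
    {"mb_per_s": 100.0, "db_commits_per_s": 50.0},  # matches the reference
    {"mb_per_s": 200.0, "db_commits_per_s": 25.0},  # 2x on one metric, 0.5x on the other
]
print(application_component(tiles, reference))
```

Both tiles score about 1.0, so the application component is about 2.0. The second tile shows a property of the geometric mean: doubling one normalized metric while halving another leaves the tile score unchanged, so no single workload can dominate a tile's score.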

A similar calculation produces the infrastructure-workload portion of the final metric. Unlike the application workloads, however, the infrastructure workloads are not scaled explicitly by the user. Consequently, the infrastructure workloads are compiled as a single group, and no multi-tile sum is required.

The final benchmark score is then computed as a weighted average of the application-workload component and the infrastructure-workload component. VMmark 2.x gives a weight of 80% to the application-workload component and a weight of 20% to the infrastructure-workload component. These weights were chosen to reflect the relative contributions of application and infrastructure workloads to overall resource demands.
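The Python sketch below shows the infrastructure component and the final 80/20 weighting. It assumes, as a simplification, that the "similar calculation" for infrastructure workloads is a single geometric mean of metrics normalized against a reference system; the metric names and values are hypothetical.

```python
import math
from typing import Iterable, Mapping

APP_WEIGHT = 0.8    # weight of the application-workload component in VMmark 2.x
INFRA_WEIGHT = 0.2  # weight of the infrastructure-workload component in VMmark 2.x


def geometric_mean(values: Iterable[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow in the product."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


def infrastructure_component(measured: Mapping[str, float],
                             reference: Mapping[str, float]) -> float:
    """Infrastructure workloads form one group: normalize, then take a single
    geometric mean. There is no per-tile sum."""
    return geometric_mean(measured[name] / reference[name] for name in measured)


def overall_score(app_component: float, infra_component: float) -> float:
    """Weighted average of the two components (the weights sum to 1)."""
    return APP_WEIGHT * app_component + INFRA_WEIGHT * infra_component


# Hypothetical infrastructure measurements that exactly match the reference.
infra = infrastructure_component(
    {"vmotion_ops": 12.0, "storage_vmotion_ops": 4.0, "clone_deploy_ops": 3.0},
    {"vmotion_ops": 12.0, "storage_vmotion_ops": 4.0, "clone_deploy_ops": 3.0},
)
print(overall_score(app_component=10.0, infra_component=infra))
```

Because the application component is a sum over tiles while the infrastructure component is not, adding tiles raises only the application side of the weighted average.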

This approach helps users measure the virtualization overhead of the individual application workloads, as well as the scalability of the entire system.