What is the VMmark Benchmark?

VMmark is a free tool used by hardware vendors and others to measure the performance and scalability of virtualization platforms.
The VMmark benchmark:

  • Allows accurate and reliable benchmarking of virtual data center performance.
  • Allows comparison of the performance of different virtualization platforms.
  • Can be used to determine the performance effects of changes in hardware, software, or configuration within the virtualization environment.

 

Why a Virtualization Platform Benchmark?

A cloud environment typically collects several diverse workloads onto a virtualization platform — a collection of physical servers accessing shared storage and network resources. Traditional single-workload performance and scalability benchmarks for non-virtualized environments were developed with neither virtual machines nor cloud environments in mind. Even single-server virtualization benchmarks did not fully capture the complexities of today's virtualized data centers.

VMmark 4: A Web-Scale Multi-Server Virtualization Platform Benchmark

VMmark 1.x pioneered single-server virtualization benchmarking with its unique tile-based multi-application design. VMmark 2.x expanded this to multiple servers and platform-level workloads. VMmark 3.x addressed the move towards highly scalable ‘third platform’ applications and increasingly complex OLTP workloads. VMmark 4 builds on these earlier VMmark versions by increasing the resources consumed by traditional Java and database workloads and adding Kubernetes, Docker containers, NoSQL, and social network workloads common in modern enterprise data centers.

Automated Benchmark Installation

VMmark 4 features a highly automated setup and tile-cloning process that makes benchmark deployment fast and easy, with very little need for manual intervention. VMmark 4 uses free or open-source software throughout, eliminating the need to purchase software licenses, and the VMmark .ova includes all the needed software in one downloadable template.

How Does VMmark Work?

The VMmark benchmark combines commonly virtualized applications into predefined bundles called "tiles." The VMmark 4 score is determined by the number of tiles a virtualization platform can run, the cumulative performance of those tiles, and the performance of a variety of platform-level workloads.

A Peer-Reviewed Benchmark with More than 400 Published Results

Before being published on the VMmark results page, every VMmark result must be submitted to a review panel composed of a variety of companies that have published VMmark benchmark results, ensuring the fairness and integrity of the benchmark.

Since VMmark's inception in 2007, more than 400 results have been published on the VMmark website, and VMmark has become the standard by which the performance of virtualization platforms is evaluated.

Application-Centric Benchmarking of Real-World Workloads

VMmark uses workloads representative of the highly scalable and complex applications commonly found in the data center. VMware has worked closely with its partners to design and implement the benchmark and has gathered extensive customer feedback to understand how these applications are typically used in virtualized environments.

Self-Scaling Applications

Elasticity is critical in measuring the real-world workloads found in today’s data centers. Adding and removing resources to meet demand is now more common than ever for self-scaling applications. VMmark 4 incorporates this elasticity alongside a cyclical application profile to more accurately represent today’s bursty environments.

Unique Tile-Based Implementation


The unit of work for a virtualized data center can be usefully defined as a collection of virtual machines executing a set of diverse workloads. The VMmark benchmark refers to this unit of work as a "tile." Each VMmark tile is paired with a client system that drives the tile’s virtual machines to perform a variety of tasks, some internal to each virtual machine, some involving other virtual machines in the tile, and some involving the client system.

 

The total number of tiles that multiple systems in the data center can accommodate while infrastructure operations run in the background provides a coarse measure of that data center's consolidation capacity. The performance of the workloads within those tiles provides a fine measure of the data center's overall performance and, combined with the performance of the administrative operations, is used to calculate a VMmark benchmark score.

Multi-Server Virtualized Data Center Benchmarking

The rapid pace of innovation has quickly transformed typical server usage by enabling easier virtualization of bursty and heavy workloads, dynamic virtual machine relocation (vMotion), dynamic datastore relocation (Storage vMotion), and automation of many provisioning and administrative tasks across large-scale multi-host environments. In this paradigm, a significant proportion of the stresses on the CPU, network, disk, and memory subsystems can be generated by the underlying infrastructure operations. Load balancing across multiple hosts can also greatly affect application performance. Any relevant benchmarking methodology must still focus on user-centric application performance while accounting for the effects of this infrastructure activity on overall platform performance. VMmark 4 generates a realistic measure of platform performance by incorporating a variety of platform-level workloads such as virtual machine migration, storage migration, shared-nothing migration, clone and deploy, and snapshotting operations, in addition to traditional application-level workloads.

High-Precision Scoring Methodology

During a VMmark benchmark run, which lasts at least three hours, individual performance metrics are collected every 60 seconds. Each of these metrics represents the performance of an individual application or infrastructure workload.


The application workload metrics for each tile are computed and aggregated into a score for that tile by normalizing the different performance metrics, such as operations/second or transactions/second, with respect to a reference system. A geometric mean of the normalized scores is then computed as the final score for the tile. Finally, the resulting per-tile scores are summed to create the application workload portion of the final metric.


A similar calculation is used to create the infrastructure workload portion of the final metric except that, unlike the application workloads, the infrastructure workloads are not scaled explicitly by the user. Consequently, the infrastructure workloads are compiled as a single group and no multi-tile sums are required.


The final benchmark score is computed as a weighted average, with 80% of the weight given to the application workload component and 20% to the infrastructure workload component. These weights were chosen to reflect the relative contribution of infrastructure and application workloads to overall resource demands.
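
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the VMmark harness: the metric names, reference values, and function names are hypothetical, and the sketch assumes the infrastructure portion is a geometric mean of normalized metrics, which is one reading of "a similar calculation." The Run and Reporting Rules remain the authority on the exact method.

```python
from math import prod

# Weights stated in the text: 80% application, 20% infrastructure.
APP_WEIGHT = 0.8
INFRA_WEIGHT = 0.2


def geometric_mean(values):
    """Return the geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    return prod(values) ** (1.0 / len(values))


def group_score(measured, reference):
    """Normalize each metric against the reference system, then take the geometric mean."""
    return geometric_mean(measured[name] / reference[name] for name in reference)


def overall_score(tiles, app_reference, infra, infra_reference):
    """Combine per-tile application scores with a single infrastructure score."""
    # Application portion: per-tile scores are summed across tiles.
    app_portion = sum(group_score(tile, app_reference) for tile in tiles)
    # Infrastructure portion: one group, so no multi-tile sum (assumed reading).
    infra_portion = group_score(infra, infra_reference)
    return APP_WEIGHT * app_portion + INFRA_WEIGHT * infra_portion


# Hypothetical metric names and values, for illustration only.
app_reference = {"web_ops_per_sec": 1000.0, "db_tx_per_sec": 200.0}
infra_reference = {"vmotions_per_hour": 20.0, "deploys_per_hour": 10.0}
tiles = [
    {"web_ops_per_sec": 1100.0, "db_tx_per_sec": 210.0},
    {"web_ops_per_sec": 1050.0, "db_tx_per_sec": 190.0},
]
infra = {"vmotions_per_hour": 22.0, "deploys_per_hour": 9.0}

print(round(overall_score(tiles, app_reference, infra, infra_reference), 3))
```

Because the application portion is a sum over tiles, adding a tile raises the score only if the existing tiles do not slow down by more than the new tile contributes. The geometric mean within a tile keeps any single workload from dominating the tile's score.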


In order for the resultant benchmark score to be considered compliant, the benchmark run must also meet a number of conditions, including minimum quality-of-service requirements.


In addition to the overall benchmark score, a VMmark full disclosure report also includes the raw and normalized results for each underlying workload and complete details of the virtualization platform configuration. In some cases, studying the workload metrics along with the platform configuration can provide insight into system performance and scaling.

VMmark Product FAQs

Besides capturing the key performance characteristics of virtual systems, an appropriate virtual platform benchmark must employ realistic, diverse workloads running on multiple hosts. It must also define a single, easy-to-understand metric while remaining representative of various end-user environments. The benchmark specification must provide a methodical way to measure scalability so that the same benchmark can be used for small platforms as well as larger platforms from different hardware vendors.

In today’s dynamic environment, users need meaningful and precise metrics to effectively compare the suitability and performance of different platforms for virtual environments. VMmark 4 builds on the legacy of earlier VMmark versions and includes new workloads to better represent the complexity of today’s enterprise data centers.

A VMmark tile is a group of 23 virtual machines concurrently executing a collection of diverse workloads. Each of these workloads represents a common multi-tier application workload found in today's data centers. Included in each tile are a scalable web simulation, an e-commerce simulation, a social network simulation, a distributed database workload, and an idle machine.

Each virtual machine in a tile is tuned to use only a fraction of the system's total resources. As a tile, the aggregate of all workloads utilizes less than the full capacity of modern servers. The saturation of a system's resources and accurate measurement of server performance with VMmark 4 might therefore require the simultaneous execution of multiple tiles.

Each workload within a VMmark 4 tile is constrained to execute at less than full utilization of its virtual machine. The performance of each workload can vary to a degree with the speed and capabilities of the underlying system. For example, disk-centric workloads might respond to the addition of a fast disk array with a more favorable score. These variations can capture system improvements that don't warrant the addition of another tile. The workload throttling will force the use of additional tiles for large jumps in system performance. When the number of tiles is increased, workloads in existing tiles might have lower performance. However, if the system has not been overcommitted, the aggregate score, including the new tile, should increase. In some cases, an environment will have additional resources, but not enough to run a full tile. In this scenario, VMmark 4 provides for the use of fractional tiles. This allows for additional work to be done but with only a subset of the application workloads, enabling users to evaluate the full potential of their systems. The result is a flexible benchmark metric that provides a relative measure of the number of workloads that can be supported by a particular system as well as the cumulative performance level of all the virtual machines.

VMmark was developed as a tool for hardware vendors, system integrators, and customers to evaluate the performance of their systems. Many customers will not run the benchmark themselves, but rather rely on published VMmark scores from their hardware vendors to make purchasing and configuration decisions for their virtualization infrastructure.

The main use case for VMmark is to compare the performance of different hardware platforms and configurations. Organizations implementing or evaluating virtualization platforms use VMmark to compare the performance and scalability of different virtualization configurations, to make appropriate hardware choices, and to measure platform performance on an ongoing basis.

Note that VMmark is neither a capacity planning tool nor a sizing tool. It does not provide deployment guidelines for specific applications. Rather, VMmark is meant to be representative of a general-purpose virtualization environment. The virtual machine configurations and the software stacks inside the virtual machines are fixed as part of the benchmark specification. Recommendations derived from VMmark results will capture many common cases; however, specialized scenarios will likely require individual measurement.

With VMmark, organizations have a robust and reliable benchmark that captures the key performance characteristics of virtual platforms, is representative of real-world environments running multiple workloads, is neutral with respect to hardware platform, and provides a methodical way to measure scalability so that the same benchmark can be used across different vendor platforms.

VMmark Scoring FAQ

A higher VMmark score implies that a virtualization platform can sustain greater throughput in a mixed-workload cloud environment while data center operations run in the background. A larger number of VMmark tiles used to generate the benchmark means that the platform supported more virtual machines across multiple hosts during the benchmark run. Typically, a higher benchmark score requires a larger number of tiles.

If two virtualization platforms achieve similar VMmark scores with different numbers of tiles, the result with the lower tile count is generally preferred. The higher tile count could indicate that the underlying hardware resources were not properly balanced. Studying the individual workload metrics is suggested in these cases.

VMmark Version FAQ

The workload applications in VMmark 4 have been updated to reflect modern multi-tier application design standards and technologies, including the addition of Kubernetes, NoSQL, microservices, and Docker containers. VMmark 4 adds support for fractional tiles. Although it can measure power, it does not provide a power-efficiency benchmark. The VMmark 4 .ova includes all the needed software in one downloadable template and features a highly automated setup and tile-cloning process that simplifies the most common workflow and reduces the time required to generate results.

VMmark 4 benchmark scores are not comparable to VMmark 3.x benchmark scores. The workloads and load levels of VMmark 4 have changed significantly from VMmark 3.x in order to take better advantage of today's larger and more powerful server, storage, and networking hardware.

To qualify as a vSAN storage result, a VMmark result must run all application workload virtual machines on VMware vSAN storage. However, these results can use non-vSAN storage hardware for infrastructure target storage and for the deploy template. For more details, see the VMmark Run and Reporting Rules.

To view vSAN storage results, see Recently Published vSAN Storage on the VMmark Results Page.

VMmark Support

There are a number of sources for VMmark support:

  • Refer to the VMmark User’s Guide, particularly the "Troubleshooting" section.
  • Visit the VMmark community forum.
  • Contact the VMmark team (an email address is provided in the VMmark User’s Guide in the "Submit the Benchmark Results for Review" section).

VMmark 4 System Requirements

This section provides an abbreviated outline of the minimum hardware and software required to run VMmark 4. For a detailed list, see the most recent version of the VMmark User’s Guide (available on the VMmark download page).

A cluster with at least two hosts (homogeneous systems not required), with the following:

  • VMware ESXi 8.0 Update 2 or later
  • Sufficient memory; 324GB of memory is allocated for each tile, plus 16GB for the first tile, but physical memory requirements can be reduced with memory overcommitment
  • 2003GB of shared storage per tile, plus 96GB for the first tile, less if thin provisioned
  • vMotion compatible servers
  • VMware vCenter Server installed on a separate, dedicated server

One prime client virtual machine with:

  • 4 vCPUs; CPU overcommit is not allowed
  • 32GB of memory; memory overcommit is not allowed
  • 182GB of available virtual disk space, less if thin provisioned

One client virtual machine per tile, each with:

  • 64 vCPUs; CPU overcommit is not allowed
  • 96GB of memory; memory overcommit is not allowed
  • 146GB of available virtual disk space, less if thin provisioned
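
The per-tile figures listed above lend themselves to a quick back-of-the-envelope estimate. The Python sketch below simply adds them up for a given number of whole tiles. The function names are hypothetical, and it assumes "plus 16GB for the first tile" and "plus 96GB for the first tile" mean a one-time addition on top of the per-tile amounts. It ignores fractional tiles, and also the reductions possible with memory overcommitment on the cluster and with thin provisioning. The VMmark User’s Guide has the authoritative requirements.

```python
def cluster_allocation(tiles: int) -> dict:
    """Memory and shared storage allocated on the cluster under test."""
    if tiles < 1:
        raise ValueError("at least one tile is required")
    return {
        # 324GB per tile, plus a one-time 16GB for the first tile (assumed reading).
        "memory_gb": 324 * tiles + 16,
        # 2003GB per tile, plus a one-time 96GB for the first tile (assumed reading).
        "shared_storage_gb": 2003 * tiles + 96,
    }


def client_allocation(tiles: int) -> dict:
    """Resources for the prime client plus one client virtual machine per tile."""
    if tiles < 1:
        raise ValueError("at least one tile is required")
    return {
        "vcpus": 4 + 64 * tiles,        # prime client + per-tile clients
        "memory_gb": 32 + 96 * tiles,   # no overcommit allowed on clients
        "disk_gb": 182 + 146 * tiles,   # upper bound; less if thin provisioned
    }


for n in (1, 2, 4):
    print(n, cluster_allocation(n), client_allocation(n))
```

Because CPU and memory overcommit are not allowed for the client virtual machines, their vCPU and memory totals act as hard minimums for the client host or hosts. The cluster figures are allocation totals, which can be reduced as noted above.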

At least one separate host for the virtual clients with the following:

  • VMware ESXi 8.0 Update 2 or later
  • Physical processors equivalent to or faster than AMD EPYC 9004 Series (“Genoa”) are recommended
  • vMotion networking (25 Gbps network recommended)

Getting Started With VMmark 4

To get started with VMmark, follow these steps:

 

  1. Download the VMmark template virtual machine
    VMmark 4 uses a single template as the source for all virtual machines used in the benchmark. The template contains the VMmark 4 harness, the configuration files, and all the software needed to run VMmark.

  2. Download the VMmark User’s Guide

  3. Deploy the template virtual machine
    Deploy the template into the virtualized infrastructure that you wish to test.

  4. Refer to the VMmark User’s Guide
    Follow the instructions in the User’s Guide for directions on how to set up and run the benchmark.

  5. Carefully read the VMmark Run and Reporting Rules
    The VMmark Run and Reporting Rules document outlines the requirements for producing a publishable VMmark result. To be published, or otherwise publicly disclosed, a VMmark result must adhere to the latest version of the Run and Reporting Rules.
