VMware

Virtually There: Steve Herrod's Blog

Mon, 19 Jun 2006

ESX Server 3.0 and Performance

I discussed our obsession with quality in the previous entry. I thought people might be interested in our other obsession as well: performance. We have a fantastic performance team that keeps us constantly focused on this critical trait. ESX Server performance improvements translate directly into higher overall efficiency, which can in turn lead to a variety of benefits such as faster individual applications, higher consolidation ratios, and lower power usage.

For ESX 3.0, we've worked hard to improve performance across almost every component. Here I'll touch upon how some of the major improvements will affect your workloads, describing improvements to individual VM performance, I/O performance, and overall scalability. We're also hard at work on a number of more detailed whitepapers on these topics, so substantially more detail will follow.

First, we've sped up memory management unit (MMU) operations inside virtual machines. In particular, we've decreased the latencies of key operations such as page faults and context switches. This benefits almost every workload, and in particular process-heavy ones such as Terminal Services, databases, and many enterprise applications. Such applications often require large amounts of memory, and virtual machines can now use up to 16 GB of memory by enabling Physical Address Extension (PAE) within the guest operating system. In ESX 3.0 we improved PAE performance so that there is negligible overhead when running with PAE enabled.

We've also added a number of optimizations to improve the performance of applications on Linux guests. In particular, we've optimized our handling of the Linux Native POSIX Thread Library (NPTL).

In addition to these targeted optimizations, we've made many other changes that we expect to result in better performance overall.

While the single-VM performance improvements focused on CPU and memory, we have also made a number of improvements to I/O performance. We've optimized our guest virtual Ethernet adapter (vmxnet), improved VM-to-VM networking, and re-architected our networking layer for ESX 3.0. This helps workloads such as multi-tiered applications and web servers. On the storage side, we've introduced VMFS3: a new, more scalable, distributed file system that includes enhanced file locking and improved caching to support large numbers of VMs. For the new storage options (NFS and iSCSI), we worked to ensure that performance is up to the standard that our customers have come to expect.

In light of advances in multi-core technology, we also focused explicitly on improved scalability on a single machine. We introduced "userworlds", which allow us to run our user-level virtualization components on any processor and no longer only on the Service Console. This improves scalability by reducing the demands on the Service Console, and increases the limit on the number of simultaneous VMs you can run by 50% over ESX 2.5. Below is a graph illustrating the type of benefit available due to this optimization. In this example (run on an 8-way server with 16 GB of physical RAM), we can see that the time required to boot a Windows 2000 virtual machine does not dramatically increase even as you move beyond 100 instances. Different workloads will of course have different profiles, but the core scalability improvements (coupled with our continued focus on advanced resource management) should benefit most users.

Speaking of resource management, we also made a number of improvements to the CPU scheduler to reduce the amount of ready time, that is, the time a VM spends ready to run but not scheduled. In ESX 3.0 we support 4-way virtual SMP and we've improved the scheduling policies so that these VMs can run effectively alongside a mix of other VMs.
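To make the definition of ready time concrete, here is a minimal sketch that adds up the time a VM spent ready but not scheduled and compares it with the time it actually ran. Everything in it is an assumption made for illustration: the `Sample` structure, the state names, and the ratio itself are hypothetical, and this is not the metric or the interface that ESX Server exposes.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed interval for a single VM (hypothetical structure)."""
    state: str   # "running", "ready", or "idle" (assumed state names)
    millis: int  # length of the interval in milliseconds


def ready_fraction(samples):
    """Share of the time the VM wanted a CPU that it spent waiting for one.

    Illustrative ratio only: ready / (ready + running). Idle time is
    ignored because an idle VM is not asking to be scheduled.
    """
    ready = sum(s.millis for s in samples if s.state == "ready")
    running = sum(s.millis for s in samples if s.state == "running")
    wanted = ready + running
    return ready / wanted if wanted else 0.0


timeline = [
    Sample("running", 500),
    Sample("ready", 200),   # runnable, but no physical CPU was free
    Sample("idle", 300),
    Sample("running", 100),
]
print(f"ready fraction: {ready_fraction(timeline):.1%}")
```

In this made-up timeline the VM wanted a CPU for 800 ms and waited for 200 ms of that, so a scheduler change that shortens the "ready" intervals lowers the ratio directly.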

To further enhance scalability, we implemented features to reduce CPU consumption for idling VMs. These include support for power saving mode (ACPI S1) in VMs, better handling of idling Windows guests, and allowing the vmkernel to halt idling CPUs.

In the end, we improved both overall performance and the performance of specific workloads by means of targeted optimizations. As a result, ESX 3.0 is our highest performance and most scalable release to date!

posted at: 17:37

Tue, 06 Jun 2006

VMware Infrastructure 3 and Quality

Hopefully you've seen the news about the launch of VMware Infrastructure 3. We're quite excited about the new features and improved performance and scalability. In future posts, I'll cover these areas as well as the next-generation virtualization services available via VMware DRS, VMware HA, and VMware Consolidated Backup.

However, one of the things we're most proud of (and something not typically covered in product launches) is the heavy focus on quality. Our customers have come to expect solutions that are incredibly stable and "just work", and our development and QA teams obsess over maintaining this high level of quality. I thought I'd use this column to call out some interesting aspects of our quality focus.

Traditional testing

  • We of course perform the "traditional" forms of software testing: unit testing, white-box testing, black-box testing, and system/integration testing. We have several QA teams located around the world focusing on these traditional testing techniques ("the sun never sets on VMware QA").
  • Additionally, we have a very heavy focus on the automation of this testing to allow it to easily run on every build and in every machine configuration. As a developer, there's nothing like waking up in the morning to find a report indicating just how well your changes from the previous day have performed.
  • And it should go without saying that we heavily use our own products throughout the development and test process. For example, the VMware ESX Server installer was developed within a VMware Workstation-based virtual machine, and most of our VirtualCenter UI cross-OS testing took place within a VM as well. There's really no better way to ensure a good experience across a plethora of operating system and browser versions!

By a jury of your peers...

  • Code reviews are consistently ranked as one of the most effective means of avoiding bugs in software. They catch potential bugs extremely early when they're least expensive to fix. They can catch subtle issues that would often take significant test cycles to uncover. And along the way, code reviews are a great way to give developers a broader view of the product!
  • We've developed a rich set of tools and processes for electronic code reviews. Almost every non-trivial code check-in for this release has gone through this process.

Dogfood duty

  • Technology companies often refer to "flying your own airplane" or the more colorful "eating your own dogfood". This practice refers to using your own in-development product within the company, and we've embraced it at VMware.
  • One of the more interesting things we've done is actually run the ESX and VirtualCenter build system within VMware Infrastructure. Our system cranks out a variety of builds: light builds after each check-in, pre-checkin builds by developers for testing, full builds for QA system testing, and many other flavors as well. We have been running 18 build VMs in a 4-host VMware DRS cluster for several months now. These VMs must be up 24/7, and this early production-level usage helped us uncover and fix a variety of issues. There's something interesting about a product used to build itself. Makes you ponder what compiled the first compiler as well.

The Biggest Beta Program Yet

  • Prior to this release, our enterprise products have had rather limited beta programs: typically no more than 50 customers have participated. Fortunately we've been able to ramp this beta program up tremendously in VMware Infrastructure 3. Thanks, Larry!
  • Starting with an early beta release in October of last year, we've released 4 different builds to a very broad audience. As of this writing, we're at more than 6000 beta participants representing 50 different countries. More than 50% of the beta testers are outside of the United States. Here's a particular thank you to those of you testing in Mauritius, Namibia, and Papua New Guinea!
  • Along with the broad distribution of beta code, we've had a very active set of discussions via the VMTN Forums and in open conference calls. This has been a very useful way for us to receive feedback directly from the participants. Hopefully our online tech talks covering the new product features have enabled more effective beta testing along the way.

I hope you've found this quick view into some of our approaches to quality interesting. We hope this obsession continues to inspire fans such as this one. Now go ahead... download the free evaluation and give it a try!

posted at: 10:09

Steve Herrod,
Vice President of Technology Development,
VMware, Inc.

Disclaimer

The postings on this site are the individual poster's and do not represent VMware's positions, strategies or opinions.