VMware

Enabling Highly Available and Scalable Hadoop

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework for the distributed processing of large data sets across clusters of computers. It is designed to scale from a single server to thousands of machines, each offering local computation and storage. Enterprises across industries use Hadoop for Big Data analytics to make better business decisions based on large data sets.

VMware enables you to deploy and use Hadoop easily and efficiently on your existing virtual infrastructure. Leverage vSphere to deploy an Apache Hadoop cluster in minutes while improving the cluster's availability. VMware is making two open source contributions: Hadoop Virtualization Extensions, which make Hadoop virtualization-aware, and Serengeti, which enables deployment of a highly available Hadoop cluster in minutes.

Deploy a Hadoop cluster on vSphere in 10 minutes
  • Achieve better Apache Hadoop performance in virtual environments through Hadoop Virtualization Extensions
  • Simplify operations with Serengeti, which rapidly deploys Apache Hadoop clusters, including distributions from multiple vendors
  • Remove single points of failure through one-click High Availability for Apache Hadoop NameNode and JobTracker, as well as Hadoop tools like Pig and Hive

Improve Hadoop performance with Hadoop Virtualization Extensions (HVE)

VMware is working with the Apache Hadoop community to enhance the support for failure and locality topologies and make Hadoop virtualization-aware. The changes are not specific to VMware hypervisors. These extensions will allow Hadoop to evolve new deployment models on virtualized infrastructure and should enable enterprises to deploy more elastic and secure Hadoop clusters.
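To make "virtualization-aware" concrete: Hadoop resolves each node to a network path such as /rack so it can reason about locality and failure domains. One way to picture the extension described above is an extra path layer identifying the physical host a virtual machine runs on, so that two VMs on the same host are treated as one failure domain. The sketch below is illustrative only. The addresses, rack and host names, and the helper functions are all hypothetical, and it does not reproduce the actual HVE implementation.

```python
# Illustrative sketch only: every address, rack name, and host name below is
# hypothetical, and this is not the actual HVE code.

# Maps a VM's address to (rack, physical host it runs on).
VM_PLACEMENT = {
    "10.0.0.11": ("rack1", "hostA"),
    "10.0.0.12": ("rack1", "hostA"),  # shares a physical host with 10.0.0.11
    "10.0.0.21": ("rack1", "hostB"),
    "10.0.1.11": ("rack2", "hostC"),
}

# Fallback path for addresses that are not in the placement table.
DEFAULT_PATH = "/default-rack/default-nodegroup"


def topology_path(address):
    """Return a two-layer path: /<rack>/<physical host>."""
    placement = VM_PLACEMENT.get(address)
    if placement is None:
        return DEFAULT_PATH
    rack, host = placement
    return "/{}/{}".format(rack, host)


def same_failure_domain(a, b):
    """True when both VMs are known and run on the same physical host."""
    path_a = topology_path(a)
    return path_a != DEFAULT_PATH and path_a == topology_path(b)


def resolve(addresses):
    """Resolve a list of addresses to topology paths, in order."""
    return [topology_path(a) for a in addresses]
```

With a mapping like this, a placement policy could avoid putting two replicas of the same block on VMs for which same_failure_domain returns True, since a single host failure would take out both.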

Deploy Hadoop in minutes with Serengeti

Deploy an Apache Hadoop cluster (HDFS, MapReduce, Pig, Hive) in minutes on your existing vSphere cluster using Serengeti, an open source project led by VMware.

Provide One-Click High Availability

Leverage vSphere capabilities to improve availability of Hadoop clusters.

  • One-click High Availability (HA) and Fault Tolerance (FT) for Hadoop NameNode and JobTracker
  • One-click High Availability (HA) for Hadoop tools such as Apache Pig and Apache Hive
  • VMware vMotion to reduce planned downtime

Try Serengeti


Download Serengeti Virtual Appliance for free
