Enabling Highly Available and Scalable Hadoop
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Enterprises across verticals use Hadoop for Big Data analytics to make better business decisions based on large data sets.
VMware enables you to easily and efficiently deploy and use Hadoop on your existing virtual infrastructure. Leverage vSphere to deploy an Apache Hadoop cluster in minutes while improving the availability of that cluster. VMware is making two open source contributions: Hadoop Virtualization Extensions to make Hadoop virtualization-aware, and Serengeti to enable deployment of a highly available Hadoop cluster in minutes.
- Achieve better Apache Hadoop performance in virtual environments through Hadoop Virtualization Extensions
- Simplify operations with rapid deployment of Apache Hadoop clusters, including distributions from multiple vendors, with Serengeti
- Remove single points of failure through one-click High Availability for Apache Hadoop NameNode and JobTracker, as well as Hadoop tools like Pig and Hive
Improve Hadoop performance with Hadoop Virtualization Extensions (HVE)
VMware is working with the Apache Hadoop community to enhance Hadoop's support for failure and locality topologies and to make Hadoop virtualization-aware. The changes are not specific to VMware hypervisors. These extensions will allow Hadoop to support new deployment models on virtualized infrastructure and should enable enterprises to deploy more elastic and secure Hadoop clusters.
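To make the idea of a virtualization-aware locality topology concrete, here is a minimal sketch in Python. It is not the HVE implementation. It only illustrates how a topology path could carry the physical host as an extra layer below the rack, so that two virtual machines on the same physical server can be recognized as sharing a failure domain. The inventory table, the host and rack names, the default path, and the two-level path layout are all assumptions made for illustration. Hadoop can call an external topology script with host names as arguments and read one path per line from its output, which is what the final lines imitate.

```python
import sys

# Hypothetical inventory: VM hostname -> (rack, physical host).
# These names are illustrative only and do not come from the source.
VM_PLACEMENT = {
    "hadoop-vm-01": ("rack1", "esx-host-a"),
    "hadoop-vm-02": ("rack1", "esx-host-a"),
    "hadoop-vm-03": ("rack2", "esx-host-b"),
}

# Assumed fallback path for VMs missing from the inventory.
DEFAULT_PATH = "/default-rack/default-nodegroup"


def topology_path(vm_name):
    """Return an assumed '/rack/physical-host' path for a VM."""
    placement = VM_PLACEMENT.get(vm_name)
    if placement is None:
        return DEFAULT_PATH
    rack, host = placement
    return f"/{rack}/{host}"


def same_physical_host(vm_a, vm_b):
    """True when both VMs are known and share a rack and physical host."""
    a = VM_PLACEMENT.get(vm_a)
    b = VM_PLACEMENT.get(vm_b)
    return a is not None and a == b


if __name__ == "__main__":
    # Print one topology path per host name given on the command line.
    for name in sys.argv[1:]:
        print(topology_path(name))
```

With this kind of information, a scheduler or block placement policy could avoid putting every replica of a block on virtual machines that share one physical server, and could prefer same-host virtual machines when looking for data-local work.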
Deploy Hadoop in minutes with Serengeti
Deploy an Apache Hadoop cluster (HDFS, MapReduce, Pig, Hive) in minutes on your existing vSphere cluster using Serengeti, an open source project led by VMware.
- Get up and running with Hadoop in minutes
- Customize your cluster with flexible configuration profiles
- Run leading Hadoop distributions: Apache Hadoop, Cloudera Distribution Including Apache Hadoop (CDH), Greenplum HD, Hortonworks Data Platform, etc.
Provide One-Click High Availability
Leverage vSphere capabilities to improve availability of Hadoop clusters.
- One-click High Availability (HA) and Fault Tolerance (FT) for Hadoop NameNode and JobTracker
- One-click High Availability (HA) for Hadoop tools such as Apache Pig and Apache Hive
- VMware vMotion to reduce planned downtime
