VMware

Enabling Highly Available and Scalable Hadoop

The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers, designed to scale from single servers to thousands of machines, each offering local computation and storage. Enterprises across verticals use Hadoop for Big Data analytics, helping them make better business decisions based on large data sets.

VMware enables you to deploy and use Hadoop easily and efficiently on your existing virtual infrastructure. Leverage vSphere to deploy an Apache Hadoop cluster in minutes while improving the availability of the cluster. VMware is making two open-source contributions: Hadoop Virtualization Extensions, which make Hadoop virtualization-aware, and Serengeti, which enables deployment of a highly available Hadoop cluster in minutes.

Highlights
  • Achieve better Apache Hadoop performance in virtual environments through Hadoop Virtualization Extensions
  • Simplify operations with rapid deployment of Apache Hadoop clusters, including distributions from multiple vendors, with Serengeti
  • Remove single points of failure through one-click High Availability for the Apache Hadoop NameNode and JobTracker, as well as Hadoop tools such as Pig and Hive

Serengeti is an open-source project initiated by VMware to automate the deployment and management of Apache Hadoop clusters in virtualized environments such as vSphere. Serengeti can deploy Hadoop distributions from multiple vendors.

Deploy a Hadoop cluster on vSphere in 10 minutes
  • Deploy clusters with HDFS, MapReduce, Pig, Hive, and Hive server
  • Deploy and use a Hadoop cluster with a single command (sketched below)
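
A sketch of that single-command flow from the Serengeti CLI, assuming a management server at the placeholder host serengeti.example.com; exact options may vary by Serengeti release:

  # Connect to the Serengeti management server
  connect --host serengeti.example.com:8080

  # Provision a full cluster (HDFS, MapReduce, and a client VM
  # with Pig and Hive) using the default template
  cluster create --name myHadoop

  # List the provisioned nodes and their status
  cluster list --name myHadoop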

Fully customizable configuration profiles to meet your needs
  • Dedicated machines, or shared with other workloads
  • Shared or local storage
  • Static IP or DHCP networking
  • Full control over the placement of Hadoop nodes (see the sample spec file below)
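
A minimal sketch of a cluster specification file, assuming the node-group attributes (roles, instanceNum, storage) used by early Serengeti releases; the field names and values here are illustrative:

  {
    "nodeGroups": [
      {
        "name": "master",
        "roles": ["hadoop_namenode", "hadoop_jobtracker"],
        "instanceNum": 1,
        "storage": { "type": "SHARED", "sizeGB": 50 }
      },
      {
        "name": "worker",
        "roles": ["hadoop_datanode", "hadoop_tasktracker"],
        "instanceNum": 5,
        "storage": { "type": "LOCAL", "sizeGB": 100 }
      }
    ]
  }

Passing the file at creation time applies the profile:

  cluster create --name myHadoop --specFile /path/to/spec.json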

Manage and use Hadoop easily
  • Scale out a Hadoop cluster
  • Tune Hadoop configuration (both sketched below)
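
Both operations map to CLI commands; a sketch, reusing the cluster and node-group names from the spec above:

  # Grow the worker node group from 5 to 10 nodes
  cluster resize --name myHadoop --nodeGroup worker --instanceNum 10

  # Apply updated Hadoop attributes (for example, mapred-site.xml
  # settings carried in the spec file) to the running cluster
  cluster config --name myHadoop --specFile /path/to/spec.json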

Speed up time to insight
  • Upload and download data and run MapReduce jobs, Pig scripts, and Hive scripts from your PC
  • Consume data in HDFS through a Hive server SQL connection using your existing tools (example below)
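
A sketch using the standard Hadoop, Pig, and Hive clients from a desktop; the data files, JAR, and main class are placeholders. BI tools can instead connect over JDBC to the Hive server, whose Hadoop 1.0-era (HiveServer1) URL takes the form jdbc:hive://<host>:10000/default:

  # Move data in and out of the cluster's HDFS
  hadoop fs -put sales.csv /user/analyst/sales.csv
  hadoop fs -get /user/analyst/report ./report

  # Run a MapReduce job, a Pig script, and a Hive script
  hadoop jar my-analytics.jar com.example.SalesReport /user/analyst/sales.csv
  pig -f clean_sales.pig
  hive -f sales_report.hql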

Grow and shrink compute capacity on demand
  • Separate compute nodes from data nodes without losing data locality
  • Scale out and shut down compute nodes on demand
  • Spin up a compute-only cluster to analyze data in an existing HDFS (sketched below)
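
A compute-only cluster attaches to an existing HDFS instead of deploying its own; a sketch assuming the externalHDFS spec field and role names from early Serengeti releases (the NameNode URL is a placeholder):

  {
    "externalHDFS": "hdfs://namenode.example.com:8020",
    "nodeGroups": [
      {
        "name": "compute",
        "roles": ["hadoop_tasktracker"],
        "instanceNum": 8
      }
    ]
  }

  # Shrink the compute tier when demand drops
  cluster resize --name myCompute --nodeGroup compute --instanceNum 2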

Improved availability for Hadoop clusters
  • One-click High Availability for the NameNode and JobTracker to avoid single points of failure (see the sketch below)
  • VMware Fault Tolerance (FT) for the NameNode and JobTracker
  • One-click HA for the Hadoop tools Pig, Hive, and HBase
  • VMware vMotion to reduce planned downtime
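
In the cluster spec, protection is a per-node-group switch; a sketch assuming the haFlag attribute from early Serengeti releases, where "on" requests vSphere HA and "ft" requests Fault Tolerance for the group's VMs:

  {
    "name": "master",
    "roles": ["hadoop_namenode", "hadoop_jobtracker"],
    "instanceNum": 1,
    "haFlag": "ft"
  }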

Support for multiple Hadoop 1.0-based distributions

Try Serengeti


Download the Serengeti Virtual Appliance for free
