VMware is the best platform for big data as well as traditional applications. Virtualizing big data applications simplifies the management of your big data infrastructure, delivers faster time to results and is more cost effective.

      


VMware and Cloudera: Working together to virtualize Hadoop

See how VMware empowers Big Data

Busting the Myths about Hadoop Virtualization

Learn the facts about virtualizing Hadoop on vSphere


Technical Case Study: Adobe Systems

Adobe Deploys Hadoop as a Service on VMware vSphere


What Is Big Data


The volume of big data is growing rapidly, with new data arriving from everywhere, every second of the day. Systems and devices, including computers, smartphones, appliances and equipment, generate new data and add to the existing massive data sets.
 

But what is big data? Big data is a broad term for both structured and unstructured data sets so large and complex that traditional data processing applications and systems cannot adequately handle them. Big data often powers predictive analytics. Analysis of these data sets is used to find new correlations that help identify business trends, prevent diseases, combat crime and much more.
 

Industry analyst Doug Laney defined big data in terms of the three Vs:

  • Volume: Terabytes, records, transactions, tables and files
  • Velocity: Batch, near real time, real time and streams
  • Variety: Structured, unstructured and semi-structured

Apache Hadoop (aka Hadoop) is open source software used for distributed storage and processing of big data. Companies such as Cloudera, Hortonworks, MapR and Pivotal package and integrate Hadoop into large distributions (aka distros) for running big data workloads.

Roadblocks to Successful Big Data Projects

 

Companies often encounter roadblocks when implementing big data projects. These can include budget constraints, lack of IT expertise and risk of platform lock-in.

Budget Constraints

Budget constraints and cost are the top reasons why many companies are shying away from deploying big data, according to a study performed by Deloitte. It can be hard to justify investing in new IT infrastructure to process large amounts of data, especially if the business does not yet have an immediate business case.

 

IT Expertise

Processing big data workloads is different from processing typical enterprise application workloads. Big data workloads are processed in parallel instead of sequentially. IT typically prioritizes business critical workloads and schedules lower priority jobs in batches at night or when there is excess capacity. With big data analytics, many use cases must run in real time for live analysis and reaction. This forces IT to change data center policies and learn new tools to create, manage and monitor these new workloads.
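To make the difference between sequential and parallel processing concrete, here is a minimal sketch in Python. It is not Hadoop code and does not use any Hadoop API. It only illustrates the general pattern of splitting a data set into chunks, processing the chunks concurrently, and merging the partial results. The function names (count_words, split_into_chunks, parallel_count) and the sample data are hypothetical, chosen for illustration. A thread pool stands in for the cluster nodes that a real distribution would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count word occurrences in one chunk (a list of text lines)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_chunks):
    """Split the input into at most n_chunks pieces of roughly equal size."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def sequential_count(lines):
    """Baseline: one worker walks through every line in order."""
    return count_words(lines)


def parallel_count(lines, n_workers=4):
    """Process chunks concurrently, then merge the partial counts.

    Assumption: threads stand in for cluster nodes here. A real big data
    platform would distribute the chunks across many machines.
    """
    chunks = split_into_chunks(lines, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    sample = ["big data big", "data lake", "big"]
    print(parallel_count(sample) == sequential_count(sample))
```

Both functions produce the same counts. The parallel version simply divides the work so that each worker handles only part of the data, which is the property that lets this style of workload scale out across many servers.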

 

Platform Lock-In

Companies need to choose the right type of infrastructure to run their applications and data. Procuring hardware takes time. Going into the cloud may be great for a proof of concept, but it carries the risk of platform lock-in, comes with security concerns and incurs tremendous cost at scale. Companies must also decide which Hadoop distribution to select, with Cloudera, Hortonworks, MapR and Pivotal all offering competing (and incompatible) architectures. Many of these decisions, once made, make it difficult for a company to pivot later, so many companies simply delay having the big data conversation.

VMware’s Role in Big Data

 

The role of infrastructure, whether it’s physical or virtual, is to support applications. This includes traditional business critical applications as well as modern cloud, mobile and big data applications. 
 

Virtualizing big data applications like Hadoop offers many benefits that cannot be obtained on physical infrastructure or in the cloud. Simplifying the management of your big data infrastructure delivers faster time to results and makes it more cost effective. VMware is the best platform for big data, just as it is for traditional applications.
 

VMware Big Data Is

Simple

Simplify operations and maintenance of your big data infrastructure.

Cost Effective

Decrease CapEx through cluster consolidation. Decrease OpEx through automation and simple workflows.

Agile

Get your infrastructure on demand so you can rapidly deliver business value.

Flexible

Experiment early and often with the major big data technologies. Multi-tenancy allows you to run multiple Hadoop distributions on the same virtual machine.

Efficient

Pool your resources and increase server utilization. Automation of workload mobility adds to process efficiencies.

Secure

Ensure control and compliance of your sensitive data.

vSphere Big Data Extensions


VMware vSphere Big Data Extensions (BDE) simplifies running Big Data workloads on the vSphere platform to deliver a new level of efficiency and agility.


What vSphere Big Data Extensions Does

Achieve Operational Simplicity with Performance

Use vCenter to install and configure Hadoop clusters that deliver performance comparable to physical deployment configurations.

Maximize Resource Utilization on New or Existing Hardware

Gain resource efficiency through a solution that offers elastic scaling and true multi-tenancy.

Architect a Scalable and Flexible Big Data Platform for the Enterprise

Build a flexible platform that can scale seamlessly and position Big Data for long-term growth in the enterprise.