What’s the Big Deal About Big Data?

Companies have never been in a better position to turn the mountains of data available today into quick insights that drive real business results. The technology that delivers this power, Big Data, is at work in virtually every sector of the economy:

  • Retailers are using predictive analytics to offer individual customers, in real time, the up-sell or cross-sell items they are most likely to buy, thereby increasing both sales and profits.
  • Financial institutions are looking for anomalies in transaction patterns that can indicate fraud, again in real time.
  • Healthcare providers are linking treatments to outcomes to determine which courses of action have the best results.

These are just a few examples of the many operational and bottom-line contributions Big Data is making to business. And those contributions multiply and expand when the data is virtualized.

So, what’s the big deal about Big Data? The ability to comb through huge amounts of data lets data scientists discover patterns and unexpected correlations, which in turn support better, data-driven business decisions at every level of the organization. This is what Big Data is all about.

The Importance of Infrastructure

Infrastructure figures heavily in the conversation around Big Data. Relatively recent innovations in distributed computing and analytics are what make Big Data possible. The most important of these is a platform called Hadoop. (The quirky name comes from a toy elephant belonging to the son of one of its developers.) Essentially, Hadoop has three key capabilities:

  1. It can handle both structured and unstructured data collected from many different (and perhaps incompatible) systems.
  2. It can run Big Data computing tasks in parallel, with many tasks running at once, rather than sequentially, where Task B cannot start until Task A finishes (see the word-count sketch after this list).
  3. It can enable distributed processing with commodity hardware, so the tasks running in parallel can be parceled out to many different servers.

These three capabilities combine to give businesses higher-quality information (because it comes from more sources) at a faster pace.
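
To make the parallel-processing capability concrete, here is the classic word-count example from the Apache Hadoop MapReduce tutorial, lightly annotated. The framework runs one copy of the mapper per input split, all at once and potentially on many different servers, then routes each word to a reducer that totals its count. The input and output paths are placeholders supplied on the command line; everything else uses the standard Hadoop Java API.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Mapper: many instances run in parallel, one per input split.
      // Each emits (word, 1) for every word it sees.
      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      // Reducer: receives all the counts for one word and sums them.
      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          result.set(sum);
          context.write(key, result);
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The same jar runs unchanged whether the cluster has three nodes or three hundred; Hadoop handles parceling the input splits out to whatever servers are available.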

Virtualizing Big Data for Big Benefits

All too often, line-of-business executives who want quick insights from Big Data opt to implement Hadoop on physical machines: collections of servers known as Hadoop clusters. While this is certainly a viable option, VMware recommends a better one that delivers faster results: virtualized Big Data.

In this approach, the physical machines are replaced by virtual machines that can reside on any physical server that happens to be available. Virtualization technology has improved significantly over the last 10 years. In fact, benchmark studies indicate that a Hadoop deployment virtualized according to VMware’s best practices can run up to 12 percent faster than the same workload on a physical server. This approach has both business and technical advantages:

  • Faster time to results. A physical implementation involves ordering the appropriate hardware, waiting for it to arrive, completing the physical installation (racking and cabling), installing the OS, and then configuring the network and the specific version of Hadoop itself so that everything works together, a process that can take weeks. In contrast, a virtual implementation can be completed in a few minutes.
  • Lower cost. Virtualized Big Data can make use of the virtualization technology already deployed in the vast majority of data centers worldwide. Managing Big Data infrastructure is normally time-consuming, but with virtualized Big Data, the same virtualization management tools that administrators use to set up ordinary nodes and clusters also set up the Big Data clusters, so the learning curve is significantly reduced. Furthermore, virtualized Big Data can be deployed on a data center’s existing servers, with no need to purchase more hardware. In other words, companies can gain all the benefits of Big Data using the resources they already have.
  • Agility and flexibility. The Hadoop platform is evolving rapidly, with new capabilities appearing every few months. Adjusting to the changes that are bound to occur, or simply scaling up or down, is much simpler with virtual machines than with physical servers, which may have to be physically replaced, a time-consuming and costly process (see the command-line sketch after this list). As an added bonus, virtualized Big Data can better address multitenancy, allowing multiple teams or workloads to share the same physical resources safely.
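
To give a feel for that agility, the following session sketches how an administrator might create and later scale out a virtualized Hadoop cluster with the Serengeti command-line interface that underlies VMware Big Data Extensions. The cluster name and node count here are made up for illustration, and the exact commands and flags should be confirmed against the Big Data Extensions documentation:

    # Provision a new Hadoop cluster from the default template
    # ("demo_cluster" is a hypothetical name)
    cluster create --name demo_cluster

    # Scale the worker node group out to ten nodes; no hardware
    # is ordered, racked, or cabled to make this happen
    cluster resize --name demo_cluster --nodeGroup worker --instanceNum 10

Compare this with the weeks-long procurement and installation cycle described above for a physical cluster.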

In today’s competitive environment, Big Data has moved from the “nice to have” to the “must have” category. The fastest, most efficient, and most cost-effective way to achieve the benefits and insights Big Data can provide is through virtualization.

For more information, visit VMware’s Big Data Extensions page.