What is Big Data?
Roadblocks to Successful Big Data Projects
Budget constraints and cost are the top reasons many companies are reluctant to deploy big data solutions, according to a study by Deloitte. It can be hard to justify investing in new IT infrastructure to process large amounts of data, especially if the business does not yet have an immediate business case.
Processing big data workloads is different from processing typical enterprise application workloads. Big data workloads are processed in parallel rather than sequentially. IT typically prioritizes business-critical workloads and schedules lower-priority jobs in batches at night or whenever there is excess capacity. With big data analytics, many use cases must run in real time for live analysis and reaction. This forces IT to change data center policies and learn new tools to create, manage and monitor these workloads.
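The difference between sequential and parallel processing can be sketched in a few lines of Python. This is an illustrative example, not VMware or Hadoop code: `partition`, `process_partition`, `run_sequential` and `run_parallel` are hypothetical names, and a thread pool on a single machine stands in for a cluster of nodes. Because of CPython's global interpreter lock, threads will not actually speed up CPU-bound work like this; the point is the shape of the computation: split the data, process each piece independently, then combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts chunks using round-robin assignment."""
    return [records[i::n_parts] for i in range(n_parts)]


def process_partition(chunk):
    """Per-partition work: a sum of squares, standing in for real analytics."""
    return sum(x * x for x in chunk)


def run_sequential(records):
    """Sequential style: one worker walks through every record in order."""
    return process_partition(records)


def run_parallel(records, n_workers=4):
    """Parallel style: each chunk is handled by its own worker, then partial
    results are combined. Threads here only model separate cluster nodes."""
    chunks = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_partition, chunks))
    return sum(partials)


if __name__ == "__main__":
    data = list(range(1_000))
    # Both strategies must agree on the final answer.
    print(run_sequential(data) == run_parallel(data))
```

The parallel version only works because the per-partition step does not depend on any other partition; workloads with that property are the ones that benefit from being spread across many machines.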
Companies need to choose the right type of infrastructure to run their applications and data. Procuring hardware takes time. Moving to the cloud may be great for a proof of concept, but it carries the risk of platform lock-in, raises security concerns and can become very expensive at scale. Companies must also decide which Hadoop distribution to select, with Cloudera, Hortonworks, MapR and Pivotal all offering competing (and incompatible) architectures. Many of these decisions are difficult to reverse once made, so many companies simply delay having the big data conversation.
Benefits of VMware Big Data
Spotlight on Hadoop
The role of infrastructure, whether it’s physical or virtual, is to support applications. This includes traditional business-critical applications as well as modern cloud, mobile and big data applications. Industry analyst Doug Laney defined Big Data in terms of the three Vs:
- Volume: Terabytes, records, transactions, tables and files
- Velocity: Batch, near time, real time and streams
- Variety: Structured, unstructured and semi-structured
Apache Hadoop (often just called Hadoop) is open source software used for distributed storage and processing of big data. Companies such as Cloudera, Hortonworks, MapR and Pivotal have packaged and integrated Hadoop into larger distributions (distros) for running big data workloads. Virtualizing big data applications like Hadoop offers many benefits that cannot be obtained on physical infrastructure or in the cloud. Simplifying the management of big data infrastructure shortens time to results and makes it more cost-effective. VMware is the best platform for big data, just as it is for traditional applications.
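Hadoop's processing layer follows the MapReduce model: mappers emit key/value pairs, the framework groups those pairs by key, and reducers combine each group. The following is a minimal in-memory sketch of that model for counting words, assuming plain-text input lines. It does not use any Hadoop API, and the function names are hypothetical. On a real cluster the input would live in Hadoop's distributed file system (HDFS), and the map and reduce tasks would run on many nodes at once.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper (hypothetical): emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all values by key, as the framework does between
    the map and reduce phases. Keys are returned in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer (hypothetical): combine all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data on VMware", "Hadoop processes big data"]
    print(word_count(docs))
```

Because each mapper sees only its own line and each reducer sees only one key's values, both phases can be split across as many workers as the infrastructure provides, which is what makes the model a good fit for the parallel workloads described earlier.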