EMC Looks to Be Pivotal for Big Data

March 6, 2013 in Big Data, Business Analytics, Business Intelligence (BI), Cloud Computing, Information Applications (IA), Information Management (IM), Location Intelligence | Tags: Cirro, Cloudera, EMC, Hadoop, HAWQ, HDFS, Hive, HortonWorks, MapR, Pivotal HD, Tableau Software

The big-data landscape just got a little more interesting with the release of EMC’s Pivotal HD distribution of Hadoop. Pivotal HD takes Apache Hadoop and extends it with a data loader and command center capabilities to configure, deploy, monitor and manage Hadoop. Pivotal HD, from EMC’s Pivotal Labs division, integrates with Greenplum Database, a massively parallel processing (MPP) database from EMC’s Greenplum division, and uses HDFS as the storage technology. The combination should help organizations realize a key part of the value of big data, which lies in information optimization.

Greenplum and EMC have been working with Hadoop technology to provide robust database and analytic technology offerings. EMC is using Hadoop and HDFS as a foundation to support a new generation of information architectures, on top of which the company provides a value-added layer of data and analytic processing to support a range of big data needs. The aim is to deliver one of the main benefits of big data technology, faster analysis; our big data benchmark research found that to be a key benefit for 70 percent of organizations.

EMC is placing a bet by building its distribution on top of Apache Hadoop 2.02, which has yet to be officially released. The company is testing its software on a thousand-node cluster to ensure it will be ready. While EMC calls Pivotal HD the most powerful Hadoop distribution, the company is one of many new providers that are building on Hadoop technologies and commercializing them for organizations looking for direct support and services or for value-added technology on top of Hadoop. Oddly, however, EMC’s new offering appears to compete with its own licensing of MapR for a product it calls Greenplum MR.

EMC has given the advanced database processing technology in Pivotal HD a new name: HAWQ. It provides the ability to use ANSI SQL in an optimized manner against big data through a query parser and optimizer, with its own HAWQ nodes processing query execution against HDFS data nodes. HAWQ also has its own Xtension Framework for adaptability to other technologies. HAWQ improves upon the performance of regular SQL because it is a specialized technology for managing distributed, optimized queries against data in Hadoop.

By supporting SQL as the language to get to Hadoop, HAWQ simplifies standardized access to big data, and its query planning and pipelining methods optimize those queries. Providing a SQL interface and an ODBC connection is not new; many Hadoop distributions now provide ODBC connectivity, including Cloudera, Hortonworks and MapR. EMC, however, uses its optimized query and SQL connection in HAWQ as an accelerator, which lets it stack its software technology up against any data and analytic technology, not just Hadoop. The question for organizations thinking about making an investment in this approach is whether they are limiting their access to future Hadoop advancements by investing in HAWQ technology that operates with only the Pivotal HD distribution, or whether the gains provide enough immediate value to outweigh the challenges of optimizing a Hadoop infrastructure. It is my belief that if an organization adopts this path of HAWQ, it will need to ensure it invests in an information architecture that includes integration technology at the HDFS level, as businesses will inevitably be operating against varying flavors of Hadoop.

Another area of differentiation EMC promises for HAWQ is performance. EMC claims exponential performance improvement using its query optimizer and SQL versus using Hive to access HDFS or Cloudera Impala and native Hadoop. In fact, it claims performance 19 to 648 times faster using its own benchmark. Since these benchmarks were not run independently, it is hard to place significant value in them for now. I made inquiries to many Hadoop software providers, including Cloudera, and they said these metrics are probably not that accurate and invited performance comparisons against their technologies. Clearly these benchmarks should have been released to the Hadoop community so its members could design optimized queries using Hive for more accurate comparisons, but EMC is hoping that its results will entice IT professionals to try it for themselves.
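For readers who want to run their own comparison rather than rely on a vendor's numbers, the arithmetic behind a claim such as "N times faster" can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names (`time_query`, `speedup`, `speedup_range`) are hypothetical, the query callables stand in for whatever client an organization uses to submit the same query to each engine, and nothing here reflects how EMC ran its benchmark.

```python
import time
from typing import Callable, Dict, Tuple


def time_query(run_query: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs.

    run_query is a hypothetical callable that submits one query to one
    engine and blocks until the result is back.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


def speedup_range(
    baseline: Dict[str, float], candidate: Dict[str, float]
) -> Tuple[float, float]:
    """Smallest and largest speedup across queries timed on both engines."""
    shared = [name for name in baseline if name in candidate]
    if not shared:
        raise ValueError("no queries were timed on both engines")
    factors = [speedup(baseline[name], candidate[name]) for name in shared]
    return min(factors), max(factors)
```

A range like the one EMC quotes is simply the smallest and largest of these per-query ratios, which is why the choice of queries and the tuning of the baseline engine matter so much to the result.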

EMC’s stature in the market and its work with a broad range of technology partners make it an important player in the big data market. Tableau Software is one of those partners, providing discovery on data from HAWQ and Pivotal HD for analytics. Cirro also announced support for Pivotal HD, enabling a new generation of what I call big data integration. These partners are good examples and give EMC a more complete stack of technologies for an enterprise approach to big data, from analyst tools to connectivity with other data sources.

EMC can deploy its big data technology across a variety of deployment methods, including public cloud with OpenStack and Amazon Web Services (AWS), private cloud using VMware, and on-premises. Our big data research shows faster growth planned for hosted (59%) and software as a service (65%) than for future on-premises deployments. While EMC is not allowed to publicly mention its customer references, and I have yet to validate them, the company says they include some of the largest banks and manufacturers.

Meanwhile, the Hadoop community’s new project Tez provides an alternative that bypasses MapReduce to improve performance. It uses Hadoop YARN for a more efficient run time and better performance for queries. Also, the Stinger Initiative is a project to improve interactive query support for Hive.

EMC acknowledges open source efforts that focus on improving the performance of accessing HDFS and looks forward to those advancements and to incorporating them into its Pivotal HD product where it can, but it points to its query optimizer and ANSI SQL as a better approach. It also did not deny that the queries in its performance comparisons could have been better optimized. But EMC is betting that its HAWQ efforts and its reliance on the next release of Apache Hadoop 2 will place it in a good market position, leveraging open source technology that is expected to be released in 2013.

This move to introduce Pivotal HD Enterprise and HAWQ is clearly an opportunity to accelerate EMC’s efforts. Greenplum’s technology needed assistance to grow its adoption as it competes with approaches that encompass not only Hadoop but also in-memory, appliance and RDBMS technology. Only time will tell how EMC’s focus on big data with Pivotal HD and HAWQ will play out. The battle among big data providers continues to be very competitive, with dozens of approaches. As each company moves from experimentation to development to production, it must carefully determine what technology will best meet its unique needs. Organizations should evaluate HAWQ and Pivotal HD not just on the merits of performance or SQL access but also on the architectural and management needs of IT, which span adaptability, manageability, reliability and usability, and on the business value this technology delivers compared to other Hadoop and big-data technology approaches.

Regards,

Mark Smith

CEO & Chief Research Officer
