Wednesday, October 28, 2015

Big Open Source Week for EMC - Part 2: Greenplum is now an open MPP project

Earlier in the year, we announced that we were committing to release all the elements of the Pivotal Big Data Suite as open-source projects. That includes:
  • Greenplum (a Massively Parallel Processing (MPP) data warehouse stack, ideal for massive data sets)
  • HAWQ (now Pivotal HDB, a Hadoop-native SQL engine, ideal for transactional SQL that is fully … and for machine-learning algorithms on SQL). Expect a lot more on this shortly. It's incubating, and it still needs things like Ambari integration, but it is AMAZING: Hadoop-native SQL with incredible capability and incredible performance. People often use Hive or Impala for this; consider Pivotal HDB instead.
  • MADlib (a library of machine-learning algorithms that run in SQL). It works great on Pivotal HDB.
  • GemFire (an in-memory distributed data grid with unbelievable, stratospheric transactional performance; this is different from the ideal Apache Spark use cases, which tend to be iterative/ML-type workloads). The Apache Geode project is based on this contribution.
… These are all Apache Software Foundation projects. For our Apache Hadoop core, Pivotal is committed to the Open Data Platform initiative (ODPi) together with Hortonworks. The ODPi effort aims to keep as much of the ecosystem as possible interoperable.
A little sidebar: just like in my OpenStack post, this is an open ecosystem, so of COURSE EMC partners with (and in fact resells!) not only the Pivotal Big Data Suite but also Hortonworks (also ODPi) and Cloudera.
The Pivotal Big Data Suite is a huge part of Pivotal's business, and it's growing. So why open-source it? Answer: this is a market that needs disruption, and innovating through closed projects is simply too slow. Many customers wanted features, functions, and fixes faster than Pivotal could deliver alone, and saw open-source projects (even if they were not quite as good) as a way forward.
Why is this specific news about Greenplum important?  
  • How about disrupting a $16B data warehouse market dominated by legacy players (Oracle, Teradata) who are unwilling or unable to disrupt their existing business models and technology stacks?
  • How about 10+ years of R&D now available as an open project?
  • How about the only material open-source project that is a SQL-compliant data warehouse scaling to many, many petabytes?
If you're a customer who believes in open source and is tired of massive bills from legacy players, it's a good time to take a look at Greenplum :-)
Pivotal Cloud Foundry (a Linux Foundation project) has been a shining light for the rest of the EMC Federation, showing that for many efforts, open is the way to go. It's arguable that we should have done this earlier with Greenplum, HAWQ, and GemFire, but it's not arguable that they are open now.
