Thursday, September 29, 2016

Microsoft HDInsight gets Spark 2.0, faster Hive, and better security

When Microsoft started out dipping its toes into the Hadoop waters, it worked with Hortonworks to port Hadoop to Windows and run it in the Azure cloud. But running Hortonworks Data Platform (HDP) for Windows meant HDInsight (as Hadoop on Azure was eventually branded) was always a step behind the more mainstream Linux distributions, constantly playing catch-up. When Microsoft decided to offer HDInsight clusters running on Linux, everything changed. Support from across the industry materialized, and the newest Hadoop features arrived in the service much more quickly.
Still, HDInsight has been due for some polish, and today Microsoft is announcing just that. A new version of HDInsight, based on HDP 2.5, is launching today, along with some Microsoft-specific security and application integrations that make HDInsight a contender for the leading cloud Hadoop offering.

Spark in its eye
So what's inside? Apache Spark 2.0, to start. This version of Spark includes technology from Project Tungsten, giving Spark the power of vectorized computations. Along with the new version of Spark itself, HDInsight will now include support for Apache Zeppelin notebooks, which let developers build scrapbook-like compositions of code and data visualizations that run on Spark.


HDInsight had already offered similar capabilities using Jupyter, another open source notebook technology. But it's nice to see HDInsight include both notebook technologies, on par with most other Hadoop offerings. Another nice Spark-related addition is a Spark-HBase connector, which allows Spark SQL to be used, from notebooks or elsewhere, to query data in Apache HBase.

Hive moves into express lane
Using HDP 2.5 under the hood also means that Microsoft can ship Apache Hive's new LLAP ("Live Long And Process") mode, which stems from the "Stinger.Next" initiative around Hive. As I reported a year and a half ago, the technology combines Hive running on Apache Tez with caching, vectorization, and other optimizations to deliver what both Microsoft and Hortonworks claim are sub-second response times.
http://www.zdnet.com/article/microsoft-hdinsight-gets-spark-2-0-faster-hive-and-better-security/
