TechNewSources: Unraveling Hadoop and Spark Performance Mysteries

Wednesday, September 14, 2016

What do you do when your Spark or Hive job runs like molasses? If you’re like most end-users who lack in-depth technical skills, the answer is “not much.” Now a startup named Unravel Data is working to show you what’s actually going on in the cluster, and provide some configuration recommendations and automatic fixes as well.

“Big data operations is usually considered a black art,” says Kunal Agarwal, the co-founder and CEO of Unravel Data, which came out of stealth mode today three years after its founding. “People don’t usually understand what’s happening in the stack, and this impedes its performance.”

Agarwal studied parallel programming and distributed systems while enrolled in Duke University’s computer science program, and got a taste for how complex Hadoop and Spark clusters can be. When he left North Carolina to enter the big data space, he looked for ways he could make the biggest impact. His focus quickly turned to the operations side of things, which he terms “Data Ops.”

“End users are spending more than half of their day trying to solve these issues and getting productive on the big data stack,” he tells Datanami. “People don’t necessarily understand distributed computing and parallel processing, and getting performance and reliability out of these applications is super hard.”

https://www.datanami.com/2016/09/13/unraveling-hadoop-spark-performance-mysteries/