TechNewSources: Look for advances in serving the Hadoop data scientist

Thursday, June 22, 2017

Hadoop and its proponents have been stymied in bringing the distributed data framework to wider use, because highly specialized, end-to-end skills are required to get this style of distributed big data application up and running.

However, changes are afoot. The original style of Hadoop, based on MapReduce and the Hadoop Distributed File System (HDFS), has given way to interchangeable configurations that may use neither MapReduce nor HDFS. Cloud-based Hadoop is on the upswing. And vendors are trying to bring self-service capabilities to the Hadoop data scientist.

To some extent, Spark was a reaction to Hadoop's initial complexity. Spark set out to improve upon the original Hadoop MapReduce data processing model, and it also raised the level of abstraction for programmers. Building systems still required Java programmers to do much of the work, but they no longer had to deal with the down-and-dirty camshafts and flywheels of fairly raw Java. A similar motivation drove the tools created to open up Hadoop to SQL and a wider programming audience.

Yet the problem remained: Hadoop and Spark in production called for near-superhero users whose skills spanned a range of jobs, including system admin, system programmer, Java developer and data engineer. Why not throw in domain expert, statistician and Hadoop data scientist, too?

http://searchdatamanagement.techtarget.com/opinion/Look-for-advances-in-serving-the-Hadoop-data-scientist