Wednesday, May 24, 2017

The siren song of Hadoop

#Hadoop seems incredibly well-suited to shouldering machine-learning workloads. With #HDFS you can store both structured and unstructured data across a cluster of machines, and #SQLonHadoop technologies like #Hive make that structured data look like database tables. Execution frameworks like #Spark let you distribute compute across the cluster as well. On paper, Hadoop is the perfect environment for running compute-intensive distributed machine learning algorithms across vast amounts of data.

Unfortunately, #Hadoop seems incredibly well-suited for a lot of other things too. Streaming data? #Storm and #Flink! Security? #Kerberos, #Sentry, #Ranger, and #Knox! Data movement and message queues? #Flume, #Sqoop, and #Kafka! #SQL? #Hive, #Impala, and #HAWQ! The Hadoop ecosystem has become a bag of often overlapping and competing technologies. The rivalry among #Cloudera, #Hortonworks, and #MapR is responsible for some of this, as is the dynamism of the open source community.

http://www.computerworld.com/article/3196509/data-analytics/the-siren-song-of-hadoop.html
