TechNewSources: APACHE HADOOP 3.1.0 RELEASED. AND A LOOK BACK!

Monday, April 9, 2018

The Apache Hadoop community has just announced the release of Apache Hadoop 3.1.0! This is the next big milestone in the Apache Hadoop 3.x line, the first being the 3.0.0 release in December 2017. It’s a significant release for the many readers of this blog who are on the Apache Hadoop 2.x series.

This release is *not* yet ready for production use. Critical issues are still being ironed out through testing and downstream adoption, so production users should wait for a 3.1.1 or 3.1.2 release.

The Hadoop community fixed 768 JIRAs (https://s.apache.org/apache-hadoop-3.1.0-all-tickets) in total as part of the 3.1.0 release. Of these fixes:

– 141 in Hadoop Common
– 266 in HDFS
– 329 in YARN
– 32 in MapReduce

Apache Hadoop 3.1.0 contains a number of significant features and enhancements. A few of them are noted below.

Hadoop Common

– HADOOP-14831 / HADOOP-14531 / HADOOP-14825 / HADOOP-14325: S3/S3A/S3Guard-related improvements

Hadoop HDFS

– HDFS-9806: HDFS block replicas can be provided by an external storage system

Hadoop YARN

– YARN-6223: First-class GPU support on YARN
– YARN-5983: First-class FPGA support on YARN
– YARN-5079 / YARN-4793 / YARN-4757 / YARN-6419: YARN native service support
– YARN-6592: Rich placement constraints in YARN
– YARN-5881: Queue capacity for the Capacity Scheduler can be configured in terms of absolute resources
– YARN-7117: The Capacity Scheduler supports auto-creation of leaf queues during queue mapping

A QUICK LOOK BACK

We also wanted to use this opportunity to look back and analyze how the Hadoop code base has changed over time and across recent releases. The following charts show some of the statistics. Note that in some cases we picked 2.8.3 as the baseline 2.x release, since it is likely the 2.x release closest to 3.x that also has the largest installation footprint around the world.
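In the same spirit of looking at the numbers, the per-component JIRA counts from the 3.1.0 release summary can be cross-checked with a few lines of Python. This is a minimal sketch: the dictionary only restates the figures quoted in this post (141, 266, 329 and 32 against a total of 768), and `summarize` is a hypothetical helper written for this illustration, not part of any Hadoop tooling.

```python
# JIRA fix counts per component for Apache Hadoop 3.1.0, as quoted in the announcement.
JIRA_FIXES = {
    "Hadoop Common": 141,
    "HDFS": 266,
    "YARN": 329,
    "MapReduce": 32,
}

# Total number of fixed JIRAs stated in the announcement.
ANNOUNCED_TOTAL = 768


def summarize(fixes: dict) -> list:
    """Hypothetical helper: return (component, count, percent of total) rows, largest first."""
    total = sum(fixes.values())
    rows = [(name, count, 100.0 * count / total) for name, count in fixes.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    total = sum(JIRA_FIXES.values())
    # The per-component counts should add up to the announced total.
    assert total == ANNOUNCED_TOTAL, total
    for name, count, share in summarize(JIRA_FIXES):
        print(f"{name:<14} {count:>4}  {share:5.1f}%")
    print(f"{'Total':<14} {total:>4}")
```

Run as a script, it confirms that the four components add up to 768 and prints each component's share: YARN accounts for roughly 43% of the fixes, HDFS about 35%, Hadoop Common about 18% and MapReduce about 4%.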