
Thursday, June 2, 2016

Big Data Benchmark Gauges Hadoop Platforms

In another indication of a maturing technology and growing demand, an industry group has released a big data analytics benchmark designed to gauge the performance of Hadoop-based systems.

The Transaction Processing Performance Council said this week that its TPCx-BB benchmark for big data analytics systems covers frameworks such as MapReduce, Apache Hive, Apache Spark and Spark's Machine Learning Library, or MLlib.

According to the TPC website, the “express” benchmark measures the performance of Hadoop-based systems, including hardware and software components. The benchmark executes 30 frequently performed analytical queries in the context of “retailers with physical and online store presence.”

The queries are expressed in SQL for structured data and in machine learning algorithms for semi-structured and unstructured data. SQL queries can use Hive or Spark, while machine learning algorithms use MLlib along with “user defined functions and a procedural program,” the benchmark group added.

Along with representing the three data types, the new benchmark simulates big data processing, analytics and reporting for the 30 use cases. Runtimes for the big data simulations range from seconds to hours.

The benchmark workload also addresses data set scaling and can run concurrent threads supporting multiple jobs with different characteristics running on a single cluster or via node scaling. The metric supports Hive on MapReduce as well as Hive running on both Spark and Apache Tez, the framework for building high-performance batch and interactive data processing applications.

The benchmark characteristics ultimately provide performance and price metrics for determining the tradeoffs between data analytics performance and cost, the council said.
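To make the performance-versus-cost tradeoff concrete, here is a minimal Python sketch of a price/performance comparison between two clusters. It is not the council's official metric definition: the BenchmarkResult record, the field names and the numbers are all hypothetical, and the ratio shown is simply total cost divided by a throughput score.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Hypothetical summary of one benchmarked cluster (not a TPC-defined record)."""
    name: str
    queries_per_minute: float  # illustrative throughput score, higher is better
    total_system_cost: float   # illustrative total cost in dollars


def price_performance(result: BenchmarkResult) -> float:
    """Dollars paid per unit of throughput; lower is better."""
    if result.queries_per_minute <= 0:
        raise ValueError("throughput must be positive")
    return result.total_system_cost / result.queries_per_minute


def rank_by_price_performance(results):
    """Return results sorted from cheapest to most expensive per unit of throughput."""
    return sorted(results, key=price_performance)


# Made-up numbers purely for illustration.
clusters = [
    BenchmarkResult("cluster_a", queries_per_minute=400.0, total_system_cost=200_000.0),
    BenchmarkResult("cluster_b", queries_per_minute=1000.0, total_system_cost=600_000.0),
]
ranked = rank_by_price_performance(clusters)
```

In this made-up case the faster cluster_b costs more per unit of throughput than cluster_a, which is the kind of tradeoff a paired performance and price metric is meant to expose.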

http://www.datanami.com/2016/06/01/big-data-benchmark-gauges-hadoop-platforms/
