
Monday, June 6, 2016

Spark Makes Inroads into NoSQL Ecosystem

#Apache #Spark may have gained fame for being a better and faster processing engine than #MapReduce running in #Hadoop clusters. But the in-memory software is increasingly finding use outside of Hadoop, including integration with operational #NoSQL databases.

Spark is currently supported in one way or another by all the major NoSQL databases, including #Couchbase, Datastax, and #MongoDB. #Datastax may have been the first to announce support for Spark nearly two years ago with Apache Cassandra, but today all of the “big three” open source NoSQL databases offer Spark connectors. And Spark is supported in some manner by a range of other NoSQL databases, including those from #Aerospike, Apache #Accumulo, #Basho’s Riak, #Neo4J, #Redis, and #MarkLogic.

The primary use case for deploying Spark and NoSQL databases together involves bridging the transactional and analytic divide. Practical examples typically fall into several buckets, including powering product recommendations in an ecommerce site, doing deep analyses of IoT data, driving customer-360 initiatives, and detecting fraud.

Spark and NoSQL make a good combination, as they complement each other’s strengths. Organizations today are often picking NoSQL databases over relational databases to power large-scale Web, mobile and IoT applications that need schema flexibility, support for semi-structured data types like JSON, and horizontal scalability on commodity hardware.

And just as relational data warehouses from Teradata (NYSE: TDC), IBM (NYSE: IBM), and Oracle (NYSE: ORCL) have traditionally been the source of business insights that are put into action with relational databases, we’re now seeing Apache Spark take on that role. Analysts and data scientists use Spark to crunch all sorts of data in search of business insights, via machine learning algorithms, graph analytics, or straight SQL analyses. Those insights are then fed back into the operational system, which increasingly resides atop a NoSQL database.

Demand for Spark capabilities is growing, according to Couchbase’s director of big data product management Will Gardella. “Definitely there are a lot of people who are asking for it,” he says. “Spark is really popular and it’s getting a lot of mindshare. A lot of the stuff people used to do in Hadoop, they like to look at Spark for.”

Couchbase has offered a Spark Connector for its NoSQL database for over a year. As with other NoSQL vendors’ connectors, Couchbase’s Spark connector enables Couchbase data to be materialized as Spark DataFrames and Datasets, which makes that data available to Spark’s SQL, machine learning, and graph APIs.

Today at the Spark Summit, Couchbase announced Spark Connector version 1.2. Gardella says the new connector focuses mainly on speed and performance.

The first way the new connector improves speed is through better data locality. “In Couchbase, we know where every document on the Couchbase cluster is,” Gardella says. “We can use that information to find exactly where to go for certain kinds of queries so they can be ultra-efficient. We get exactly the piece of data we need to exactly the nodes that need to use them.”
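
To make the data-locality idea concrete, here is a minimal Python sketch of how a client that knows where every document lives can group lookups by the node that owns them. It is an illustration only, not Couchbase's or the connector's actual code: the partition count, the CRC32-modulo hash, the round-robin partition layout, and the node and key names are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

# Assumed partition count for this sketch; not taken from the article.
NUM_PARTITIONS = 1024


def partition_for(key: str) -> int:
    """Map a document key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def build_partition_map(nodes):
    """Assign partitions to nodes round-robin (hypothetical cluster layout)."""
    return {p: nodes[p % len(nodes)] for p in range(NUM_PARTITIONS)}


def node_for(key: str, partition_map) -> str:
    """Look up which node owns the partition that holds this key."""
    return partition_map[partition_for(key)]


def group_keys_by_node(keys, partition_map):
    """Group keys by owning node so each node is asked only for its own data."""
    groups = defaultdict(list)
    for key in keys:
        groups[node_for(key, partition_map)].append(key)
    return dict(groups)


# Hypothetical node names and document keys, for illustration only.
nodes = ["node-a", "node-b", "node-c"]
partition_map = build_partition_map(nodes)
keys = ["user::1", "user::2", "order::42", "order::43"]
plan = group_keys_by_node(keys, partition_map)

for node, node_keys in sorted(plan.items()):
    print(node, node_keys)
```

Because the hash is deterministic, any client holding the same partition map computes the same owner for a key, so a processing task can be scheduled on, or next to, the node that already holds the documents it needs instead of pulling them across the cluster.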

http://www.datanami.com/2016/06/06/spark-makes-inroads-nosql-ecosystem/
