Dell, EMC, Dell Technologies, Cisco,

Thursday, June 8, 2017

Cask Shortens Time-to-Value for Data-Driven Applications on Apache Spark and Hadoop

SAN FRANCISCO, CA--(Marketwired - June 07, 2017) - Spark Summit -- Cask (cask.co), the company that makes building and deploying big data solutions easy, today at Spark Summit San Francisco announced the newest release of Cask Data Application Platform (CDAP), version 4.2. CDAP 4.2 delivers expanded support for Spark and enhanced, user-centric self-service data ingestion and preparation capabilities. It also expands the pre-built and easy-to-deploy solutions in Cask Market to include Change Data Capture (CDC) support for SQL Server and Oracle. These new capabilities will help accelerate user productivity in Spark and Hadoop projects, significantly reducing initial time to value and time to production for big data solutions.

"Simplified code development, workload flexibility and faster data processing have generated huge interest in Spark," said Jonathan Gray, Cask founder and CEO. "But as is the case with many other big data technologies, operationalizing Spark and scaling workloads from prototype to production present their own set of challenges for IT teams, greatly extending timelines and often putting the success of projects at risk. With broad support for Spark and increasingly code-free, interactive data integration capabilities, this latest release of CDAP dramatically shortens the time to prepare and ingest data and to test, run and deploy Spark data pipelines on that data. This means simplified onboarding, better productivity and faster time to production for data lakes and data-driven applications on Spark and Hadoop."

CDAP 4.2 adds support for Spark 2.x, which includes the new DataFrame/Dataset/SQL APIs, as well as the new Spark2 runtime. As a result, CDAP users will be able to easily upgrade their Spark programs from Spark 1.x to Spark 2.x.
Furthermore, when building a data pipeline, or exporting a data pipeline from their CDAP SDK environment to a cluster, users do not need to worry about the version of Spark running on the target cluster. CDAP 4.2 also adds a new, interactive experience for Spark developers, enabling them to add custom Spark transformation logic to a data pipeline, run the code and get results from it quickly, all directly from the user interface.

CDAP 4.2 introduces a new user-centric, self-service data preparation workflow that allows users to easily connect to existing data sources, offers them simple point-and-click interactive data preparation to transform data, and provides push-button operationalization of ingestion and transformation work as production pipelines.

Additional enhancements in CDAP 4.2 include advanced scheduling capabilities designed to boost scalability and flexibility in production environments. The new, more scalable CDAP scheduler allows for data-driven schedules, event-based triggers, and the definition of constraints that can be used to triage multiple jobs running on the same cluster.

The Cask Market update for CDAP 4.2 offers new, pre-built assets, expanding the list of reusable, ready-to-use big data solutions and components available for push-button deployment. After introducing EDW Offload as a pre-built, packaged solution with CDAP 4.1, Cask Market now offers real-time Change Data Capture (CDC) for SQL Server and Oracle with Spark Streaming, keeping data in sync between the source databases and Hadoop. This allows CDAP users to use Change Data Capture instead of traditional ETL for their EDW Offload workloads, improving efficiency while reducing the latency of the data extracted from their source systems. In addition to CDC, Cask Market now also features XSD-based, complex XML readers as well as connectors for Apache Kafka, Apache Kudu, HP Vertica and others.

"Enterprises derive the most value from Hadoop and Spark with configurable data applications. Yet these applications can be hard to create, and even harder to manage in production settings," said John L. Myers, Managing Research Director, Enterprise Management Associates, a Boulder, CO-based analysis firm. "CDAP 4.2 encapsulates the complexity and difficulties of the do-it-yourself approach from organizations. This approach empowers companies to tackle big data applications from data prep to production implementation quickly and speed time to implementation."

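For readers unfamiliar with Change Data Capture, the idea mentioned in the release (keeping a target in sync by applying individual changes rather than re-extracting whole tables) can be illustrated with a small sketch. This is not CDAP code and does not use any Cask or Spark API; the event shape `(operation, key, row)` and the function name `apply_changes` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical change-event shape (an assumption, not a CDAP format):
# (operation, primary_key, changed_columns)
ChangeEvent = Tuple[str, int, dict]


def apply_changes(target: Dict[int, dict], events: Iterable[ChangeEvent]) -> Dict[int, dict]:
    """Apply ordered insert/update/delete events to an in-memory replica."""
    for op, key, row in events:
        if op in ("insert", "update"):
            # Merge the changed columns into the existing row, if any.
            target[key] = {**target.get(key, {}), **row}
        elif op == "delete":
            # Deleting a missing key is treated as a no-op.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# The replica starts with one row; only the changes are shipped, not the table.
replica = {1: {"name": "Ada", "city": "London"}}
events = [
    ("update", 1, {"city": "Paris"}),
    ("insert", 2, {"name": "Lin", "city": "Oslo"}),
    ("delete", 1, {}),
]
apply_changes(replica, events)
```

Because only changed rows travel from source to target, the replica can be refreshed continuously, which is the latency advantage over batch ETL that the release describes. A production system would also need ordering guarantees and failure handling, which this sketch leaves out.
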
http://m.marketwired.com/press-release/cask-shortens-time-to-value-for-data-driven-applications-on-apache-spark-and-hadoop-2220578.htm
