Getting Started with Apache Spark: The Definitive Guide

If you work in Data Science or IT, you’re probably already familiar with Apache Spark. Spark adoption grew rapidly in 2015, and in some use cases it has matched or even surpassed Hadoop as the open source Big Data framework of choice. Vendors are getting on board as well: Talend, Altiscale and Pentaho have all enhanced their integration platforms with Spark in recent months.

With all of the highly technical chatter out there, it can be hard to understand what Spark can help your organization do. Thankfully there’s LinkedIn’s SlideShare, a resource where users and companies can host webinars and presentations for public access. We combed through thousands of presentations on the site using the Spark keyword and found a series of eight created by Databricks, a company that builds its data processing offering on the Spark platform.

The slideshows, all presented by Databricks at Spark Summit EU 2015 in late October, cover the following Spark topics:

The evolution of Spark: where is it being used, for what purpose, and by whom?

A technical overview of Spark’s DataFrame API, including its implementation:

An inside look at Spark’s development, both frontend and backend:

Databricks outlines emerging trends, common issues, and solutions:

How do users integrate common data science tools, like Python, with Spark?

What have users learned in migrating from Data Warehouses to Spark?

Databricks’ CEO discusses the impact Spark has had in the enterprise:

How do Spark clusters and R facilitate analysis of Big Data?

There you have it! A nice selection of Spark presentations to help you cut through all of the other information out there on the web. For more on Spark, stay tuned to Solutions Review.


Timothy King

Editor, Data and Analytics at Solutions Review
Timothy leads Solutions Review's Business Intelligence, Data Integration, and Data Management areas of focus. He is recognized as one of the top authorities in Big Data, and the number-one authority in enterprise middleware. Timothy has also been named one of the world's top-75 most influential business journalists by Richtopia.