The 12 Best Apache Spark Courses and Online Training for 2020

The editors at Solutions Review have compiled this list of the best Apache Spark courses and online training to consider for 2020.

Apache Spark is a unified analytics engine for large-scale data processing. It is noted for its high performance on both batch and streaming data, achieved with a DAG scheduler, a query optimizer, and a physical execution engine. Spark offers more than 80 high-level operators that can be used interactively from the Scala, Python, R, and SQL shells. The engine powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the cloud.

With this in mind, we’ve compiled this list of the best Apache Spark courses and online training to consider if you’re looking to grow your data analytics skills for work or play. This is not an exhaustive list, but one that features the best Apache Spark courses and online training from trusted online platforms. We made sure to mention and link to related courses on each platform that may be worth exploring as well. Click Go to training to learn more and register.

Introduction to Spark with sparklyr in R

Platform: DataCamp

Description: R is mostly optimized to help you write data analysis code quickly and readably. Apache Spark is designed to analyze huge datasets quickly. The sparklyr package lets you write dplyr R code that runs on a Spark cluster, giving you the best of both worlds. This course teaches you how to manipulate Spark DataFrames using both the dplyr interface and Spark’s native interface, as well as how to apply machine learning techniques. Throughout the course, you’ll explore the Million Song Dataset.

Related paths/tracks: Machine Learning with PySpark, Introduction to Spark SQL in Python, Cleaning Data with PySpark

Go to training

Big Data Analysis with Scala and Spark

Platform: Coursera

Description: In this course, you’ll see how the data parallel paradigm can be extended to the distributed case, using Spark throughout. It covers Spark’s programming model in detail, with careful attention to how and when it differs from familiar programming models like shared-memory parallel collections or sequential Scala collections. Through hands-on examples in Spark and Scala, you’ll learn when important distribution-related issues like latency and network communication should be considered, and how they can be addressed effectively for improved performance.

Related paths/tracks: Big Data Essentials: HDFS, MapReduce and Spark RDD, Scalable Machine Learning on Big Data using Apache Spark, Distributed Computing with Spark SQL

Go to training

Apache Spark and Scala Certification Training

Platform: Edureka

Description: Apache Spark and Scala Certification Training is designed to prepare you for the Cloudera Hadoop and Spark Developer Certification Exam (CCA175). You will gain in-depth knowledge of Apache Spark and the Spark ecosystem, which includes Spark RDD, Spark SQL, Spark MLlib, and Spark Streaming. You will also get comprehensive knowledge of the Scala programming language, HDFS, Sqoop, Flume, Spark GraphX, and messaging systems like Kafka.

Related path/track: Python Spark Certification Training using PySpark

Go to training

Apache Spark with Scala – Hands On with Big Data!

Platform: Udemy

Description: This course is very hands-on; you’ll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon’s Elastic MapReduce service. 7.5 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Related paths/tracks: Taming Big Data with Apache Spark and Python – Hands On!, Streaming Big Data with Spark Streaming & Scala – Hands On!

Go to training

Apache Spark, Scala and Storm Training

Platform: IntelliPaat

Description: This combo course in Spark, Storm, and Scala is designed with industry requirements for high-speed data processing in mind. The training fully equips you with the skill sets to take on real-world challenges in the big data Hadoop ecosystem, regardless of industry vertical. It covers the Apache Spark processing engine and programming in the general-purpose language Scala, and provides in-depth knowledge of the Apache Storm computation system.

Related paths/tracks: Apache Spark and Scala Certification Training, Big Data Hadoop, Spark, Storm and Scala Training

Go to training

Big Data Hadoop Certification Training Course

Platform: Simplilearn

Description: The Big Data Hadoop certification training is designed to give you in-depth knowledge of the big data framework using Hadoop and Spark. In this hands-on Hadoop course, you will execute real-life, industry-based projects using an integrated lab. Professionals entering Big Data Hadoop certification training should have a basic understanding of Core Java and SQL. If you wish to brush up on your Core Java skills, Simplilearn offers a complimentary self-paced course, Java Essentials for Hadoop, when you enroll in this course.

Related path/track: Apache Spark and Scala Certification Training

Go to training

Apache Spark 3 with Scala: Hands On with Big Data!

Platform: Skillshare

Description: This course is very hands-on; you’ll spend most of your time following along with the instructor as we write, analyze, and run real code together – both on your own system, and in the cloud using Amazon’s Elastic MapReduce service. 7.5 hours of video content is included, with over 20 real examples of increasing complexity you can build, run and study yourself. Move through them at your own pace, on your own schedule. The course wraps up with an overview of other Spark-based technologies, including Spark SQL, Spark Streaming, and GraphX.

Go to training

Apache Spark SQL

Platform: Experfy

Description: In this course, you’ll learn about the physical components of a Spark cluster and the Spark computing framework. You’ll build your own local standalone cluster, write Spark code, learn how to run Spark jobs in a variety of ways, and create Spark tables and query them using SQL. You will also learn a process for creating successful Spark applications. Students should be familiar with Python and comfortable using the Unix command line.

Go to training

Apache Spark Fundamentals

Platform: Pluralsight

Description: In this course, you’ll learn Spark from the ground up, starting with its history before creating a Wikipedia analysis application as one of the means for learning a wide scope of its core API. That core knowledge will make it easier to look into Spark’s other libraries, such as the streaming and SQL APIs. Finally, you’ll learn how to avoid a few commonly encountered rough edges of Spark. You will leave this course with a tool belt capable of creating your own performance-maximized Spark application.

Related paths/tracks: Beginning Data Exploration and Analysis with Apache Spark, Structured Streaming in Apache Spark 2

Go to training

Apache Spark Essential Training

Platform: LinkedIn Learning

Description: In this course, get up to speed with Spark, and discover how to leverage this popular processing engine to deliver effective and comprehensive insights into your data. Instructor Ben Sullins provides an overview of the platform, going into the different components that make up Apache Spark. He shows how to analyze data in Spark using PySpark and Spark SQL, explores running machine learning algorithms using MLlib, demonstrates how to create a streaming analytics application using Spark Streaming, and more.

Related paths/tracks: Big Data Analytics with Hadoop and Apache Spark, Apache PySpark by Example

Go to training

Introduction to Apache Spark

Platform: edX

Description: This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), but previous experience with Spark or distributed computing is NOT required. Students should take this Python mini-quiz before the course and take this Python mini-course if they need to learn Python or refresh their Python knowledge.

Related paths/tracks: Big Data Analysis with Apache Spark, Distributed Machine Learning with Apache Spark

Go to training

Data Streaming

Platform: Udacity

Description: Learn how to process data in real-time by building fluency in modern data engineering tools such as Apache Spark, Kafka, Spark Streaming, and Kafka Streams. You’ll start by understanding the components of data streaming systems. You’ll then build a real-time analytics application. Students will also compile data and run analytics, as well as draw insights from reports generated by the streaming console.

Go to training

Timothy King

Senior Editor at Solutions Review
Timothy is Solutions Review's Senior Editor. He is a recognized thought leader and influencer in enterprise BI and data analytics. Timothy has been named a top global business journalist by Richtopia. Scoop? First initial, last name at solutionsreview dot com.