{"id":1175,"date":"2015-11-19T11:51:38","date_gmt":"2015-11-19T16:51:38","guid":{"rendered":"https:\/\/solutionsreview.com\/data-integration\/?p=1175"},"modified":"2018-03-22T12:08:10","modified_gmt":"2018-03-22T16:08:10","slug":"getting-started-with-apache-spark-the-definitive-guide-2","status":"publish","type":"post","link":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/","title":{"rendered":"Getting Started with Apache Spark: the Definitive Guide"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-2699\" src=\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg\" alt=\"Getting Started with Apache Spark: the Definitive Guide\" width=\"800\" height=\"400\" srcset=\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg 800w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF-300x150.jpg 300w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF-768x384.jpg 768w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF-540x270.jpg 540w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF-162x81.jpg 162w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF-360x180.jpg 360w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF-630x315.jpg 630w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p style=\"text-align: justify\">If you work in Data Science or IT, you&#8217;re probably already familiar with Apache Spark. 
In practice, <a href=\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\" target=\"_blank\" rel=\"noopener noreferrer\">Spark<\/a> has grown exponentially in 2015, and in some use cases it has <a href=\"https:\/\/solutionsreview.com\/data-integration\/hadoop-vs-spark-which-big-data-framework-is-better\/\" target=\"_blank\" rel=\"noopener noreferrer\">matched or even surpassed Hadoop<\/a> as the open source Big Data framework of choice. Vendors are beginning to hop on board as well, as <a href=\"https:\/\/solutionsreview.com\/data-integration\/talend-6-the-first-spark-powered-data-integration-platform\/\" target=\"_blank\" rel=\"noopener noreferrer\">Talend<\/a>, <a href=\"https:\/\/solutionsreview.com\/data-integration\/altiscale-data-cloud-4-0-makes-hadoop-spark-more-reliable\/\" target=\"_blank\" rel=\"noopener noreferrer\">Altiscale<\/a> and <a href=\"https:\/\/solutionsreview.com\/data-integration\/pentaho-takes-big-data-lead-with-apache-spark-integration\/\" target=\"_blank\" rel=\"noopener noreferrer\">Pentaho<\/a> have all enhanced their integration platforms with <a href=\"https:\/\/solutionsreview.com\/data-integration\/video-five-ways-to-get-more-from-hadoop-with-apache-spark\/\" target=\"_blank\" rel=\"noopener noreferrer\">Spark<\/a> in recent months.<\/p>\n<div class=\"widget\"><div class=\"aside-card\">\t\t\t<div class=\"textwidget\"><p><a class=\"bgs-speedbump\" title=\"Download link to Data Integration Buyer's Guide\" href=\"https:\/\/solutionsreview.com\/data-integration\/data-integration-buyers-guide\/\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-1682\" src=\"https:\/\/solutionsreview.com\/data-integration\/files\/2019\/02\/di-bg-speedbump.jpg\" alt=\"Download Link to Data Integration Buyer's Guide\" width=\"800\" height=\"225\" \/><\/a><\/p>\n<\/div>\n\t\t<\/div><\/div>\n<p style=\"text-align: justify\">With all of the 
highly technical chatter out there, it can be hard to understand what Spark can help your organization do. Thankfully, there&#8217;s LinkedIn&#8217;s <a href=\"https:\/\/www.slideshare.net\/\" target=\"_blank\" rel=\"noopener noreferrer\">Slideshare<\/a>, a resource where users and companies can host webinars and presentations for public access. We combed through thousands of presentations on the site using the Spark keyword to find a series of eight created by <a href=\"https:\/\/databricks.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Databricks<\/a>, a company that revolutionizes data processing through the Spark platform.<\/p>\n<p style=\"text-align: justify\">The slideshows, which were all presented by Databricks at Spark Summit EU 2015 in late October, outline various topics on Spark, as you&#8217;ll see below:<\/p>\n<p><strong>The evolution of Spark: where is it being used, for what purpose, and by whom?<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Spark Summit EU 2015: Matei Zaharia keynote\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/IsfjTZagcfs0L7\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/spark-summit-eu-2015-matei-zaharia-keynote\" title=\"Spark Summit EU 2015: Matei Zaharia keynote\" target=\"_blank\">Spark Summit EU 2015: Matei Zaharia keynote<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>A technical overview of Spark&#8217;s DataFrame API, its implementation, and more:<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structured Data\" 
src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/9ZxcAcZCgz83yP\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/spark-summit-eu-2015-spark-dataframes-simple-and-fast-analysis-of-structured-data\" title=\"Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structured Data\" target=\"_blank\">Spark Summit EU 2015: Spark DataFrames: Simple and Fast Analysis of Structured Data<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>An inside look at Spark&#8217;s development, both frontend and backend:<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Spark Summit EU 2015: Reynold Xin Keynote\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/4JTPCYkZhWZu8I\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/spark-summit-eu-2015-reynold-xin-keynote\" title=\"Spark Summit EU 2015: Reynold Xin Keynote\" target=\"_blank\">Spark Summit EU 2015: Reynold Xin Keynote<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>Databricks outlines emerging trends, common issues, and solutions:<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Spark Summit EU 2015: Lessons from 300+ production users\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/frIHwyNgU1KzAE\" width=\"427\" height=\"356\" frameborder=\"0\" 
marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/spark-summit-eu-2015-lessons-from-300-production-users\" title=\"Spark Summit EU 2015: Lessons from 300+ production users\" target=\"_blank\">Spark Summit EU 2015: Lessons from 300+ production users<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>How do users integrate common data science tools like Python and R with Spark?<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/l5dMAkgccQXNQt\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/spark-summit-europe-2015-combining-the-strengths-of-mllib-scikitlearn-and-r\" title=\"Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R\" target=\"_blank\">Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>What have users learned in migrating from data warehouses to Spark?<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Transitioning from Traditional DW to Apache\u00ae Spark\u2122 in Operating Room Predictive Modeling\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/k72vLmqpoUJ4YX\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" 
marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/transitioning-from-traditional-dw-to-spark-in-operating-room-predictive-modeling\" title=\"Transitioning from Traditional DW to Apache\u00ae Spark\u2122 in Operating Room Predictive Modeling\" target=\"_blank\">Transitioning from Traditional DW to Apache\u00ae Spark\u2122 in Operating Room Predictive Modeling<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>Databricks&#8217; CEO discusses the impact Spark has had in the enterprise:<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/mTmJ9bTT2BxTpy\" width=\"427\" height=\"356\" frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/spark-summit-eu-2015-revolutionizing-big-data-in-the-enterprise-with-spark\" title=\"Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark\" target=\"_blank\">Spark Summit EU 2015: Revolutionizing Big Data in the Enterprise with Spark<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p><strong>How do Spark clusters and R facilitate analysis of Big Data?<\/strong><\/p>\n<p><iframe loading=\"lazy\" title=\"Enabling exploratory data science with Spark and R\" src=\"https:\/\/www.slideshare.net\/slideshow\/embed_code\/key\/592qdEooJTkNFX\" width=\"427\" height=\"356\" 
frameborder=\"0\" marginwidth=\"0\" marginheight=\"0\" scrolling=\"no\" style=\"border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;\" allowfullscreen> <\/iframe> <\/p>\n<div style=\"margin-bottom:5px\"> <strong> <a href=\"https:\/\/www.slideshare.net\/databricks\/enabling-exploratory-data-science-with-spark-and-r\" title=\"Enabling exploratory data science with Spark and R\" target=\"_blank\">Enabling exploratory data science with Spark and R<\/a> <\/strong> from <strong><a href=\"https:\/\/www.slideshare.net\/databricks\" target=\"_blank\">Databricks<\/a><\/strong> <\/div>\n<p style=\"text-align: justify\">There you have it: a selection of Spark presentations to help you cut through all of the other information out there on the web. For more on Spark, stay tuned to <a href=\"https:\/\/solutionsreview.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Solutions Review<\/a>.<\/p>\n<div class=\"hr hr\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>If you work in Data Science or IT, you&#8217;re probably already familiar with Apache Spark. In practice, Spark has grown exponentially in 2015, and in some use cases it has matched or even surpassed Hadoop as the open source Big Data framework of choice. 
Vendors are beginning to hop on board as well, as Talend, [&hellip;]<\/p>\n","protected":false},"author":23,"featured_media":2699,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[200,89,19,209,247,55,57,155,248],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Getting Started with Apache Spark: the Definitive Guide<\/title>\n<meta name=\"description\" content=\"Apache Spark has grown exponentially, and in some use cases it has matched or even surpassed Hadoop as the open source big data framework of choice.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tim King\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/\"},\"author\":{\"name\":\"Tim King\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c\"},\"headline\":\"Getting Started with Apache Spark: the Definitive Guide\",\"datePublished\":\"2015-11-19T16:51:38+00:00\",\"dateModified\":\"2018-03-22T16:08:10+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/\"},\"wordCount\":524,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg\",\"keywords\":[\"Apache\",\"Apache Spark\",\"Big Data\",\"Data Science\",\"Databricks\",\"Hadoop\",\"MapReduce\",\"Spark\",\"Spark Summit EU 2015\"],\"articleSection\":[\"Best 
Practices\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/\",\"name\":\"Getting Started with Apache Spark: the Definitive Guide\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg\",\"datePublished\":\"2015-11-19T16:51:38+00:00\",\"dateModified\":\"2018-03-22T16:08:10+00:00\",\"description\":\"Apache Spark has grown exponentially, and in some use cases it has matched or even surpassed Hadoop as the open source big data framework of 
choice.\",\"breadcrumb\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg\",\"width\":800,\"height\":400,\"caption\":\"Getting Started with Apache Spark: the Definitive Guide\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/solutionsreview.com\/data-integration\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Getting Started with Apache Spark: the Definitive Guide\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#website\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/\",\"name\":\"Data Integration Tools &amp; ETL Software | Solutions Review\",\"description\":\"Evaluating Enterprise ETL Tools, Cloud iPaaS &amp; Real-Time Data Streaming 
Platforms.\",\"publisher\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/solutionsreview.com\/data-integration\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\",\"name\":\"Solutions Review\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png\",\"width\":225,\"height\":90,\"caption\":\"Solutions Review\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c\",\"name\":\"Tim King\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg\",\"caption\":\"Tim King\"},\"description\":\"Tim is Solutions Review's Executive Editor covering the human impact of AI on the future of work and learning. He is also the Media Strategist behind Insight Jam (1M+ on YouTube) events and programming. 
A 2017 and 2018 Most Influential Business Journalist and 2021 \\\"Who's Who\\\" in multiple categories, Tim is a recognized thought leader in enterprise tech and AI.\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/author\/timking\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Getting Started with Apache Spark: the Definitive Guide","description":"Apache Spark has grown exponentially, and in some use cases it has matched or even surpassed Hadoop as the open source big data framework of choice.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/","twitter_misc":{"Written by":"Tim King","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#article","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/"},"author":{"name":"Tim King","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c"},"headline":"Getting Started with Apache Spark: the Definitive 
Guide","datePublished":"2015-11-19T16:51:38+00:00","dateModified":"2018-03-22T16:08:10+00:00","mainEntityOfPage":{"@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/"},"wordCount":524,"commentCount":0,"publisher":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#organization"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg","keywords":["Apache","Apache Spark","Big Data","Data Science","Databricks","Hadoop","MapReduce","Spark","Spark Summit EU 2015"],"articleSection":["Best Practices"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/","url":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/","name":"Getting Started with Apache Spark: the Definitive Guide","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#website"},"primaryImageOfPage":{"@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg","datePublished":"2015-11-19T16:51:38+00:00","dateModified":"2018-03-22T16:08:10+00:00","description":"Apache Spark has grown exponentially, and in some use cases it has matched or even surpassed Hadoop as the open source big data 
framework of choice.","breadcrumb":{"@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#primaryimage","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/11\/oie_2217410vD09ZmzF.jpg","width":800,"height":400,"caption":"Getting Started with Apache Spark: the Definitive Guide"},{"@type":"BreadcrumbList","@id":"https:\/\/solutionsreview.com\/data-integration\/getting-started-with-apache-spark-the-definitive-guide-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/solutionsreview.com\/data-integration\/"},{"@type":"ListItem","position":2,"name":"Getting Started with Apache Spark: the Definitive Guide"}]},{"@type":"WebSite","@id":"https:\/\/solutionsreview.com\/data-integration\/#website","url":"https:\/\/solutionsreview.com\/data-integration\/","name":"Data Integration Tools &amp; ETL Software | Solutions Review","description":"Evaluating Enterprise ETL Tools, Cloud iPaaS &amp; Real-Time Data Streaming 
Platforms.","publisher":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/solutionsreview.com\/data-integration\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/solutionsreview.com\/data-integration\/#organization","name":"Solutions Review","url":"https:\/\/solutionsreview.com\/data-integration\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png","width":225,"height":90,"caption":"Solutions Review"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c","name":"Tim King","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/image\/","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg","caption":"Tim King"},"description":"Tim is Solutions Review's Executive Editor covering the human impact of AI on the future of work and learning. He is also the Media Strategist behind Insight Jam (1M+ on YouTube) events and programming. 
A 2017 and 2018 Most Influential Business Journalist and 2021 \"Who's Who\" in multiple categories, Tim is a recognized thought leader in enterprise tech and AI.","url":"https:\/\/solutionsreview.com\/data-integration\/author\/timking\/"}]}},"_links":{"self":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts\/1175"}],"collection":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/comments?post=1175"}],"version-history":[{"count":0,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts\/1175\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/media\/2699"}],"wp:attachment":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/media?parent=1175"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/categories?post=1175"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/tags?post=1175"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}