{"id":1056,"date":"2015-08-14T12:15:15","date_gmt":"2015-08-14T16:15:15","guid":{"rendered":"https:\/\/solutionsreview.com\/data-integration\/?p=1056"},"modified":"2016-10-05T09:40:07","modified_gmt":"2016-10-05T13:40:07","slug":"combining-spark-batch-processing-for-real-time-analytics","status":"publish","type":"post","link":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/","title":{"rendered":"Combining Spark &amp; Batch Processing for Real-Time Analytics"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1061\" src=\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg\" alt=\"Spark and Batch\" width=\"600\" height=\"300\" srcset=\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg 600w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO-300x150.jpg 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/p>\n<p><strong>By Yann Delacourt<\/strong><\/p>\n<p>Companies that use Hadoop\u2019s big data processing platforms typically rely on one of two integration modes, depending on their use case. The two modes \u2013 asynchronous and synchronous \u2013 each come with benefits and limitations. As the pace of business increases, more and more organizations are looking to use these integration modes interchangeably to extract as much value and insight from their data as possible.<\/p>\n<p>Asynchronous mode, often referred to as \u201cbatch,\u201d is typically used for methodical, overnight processing. Organizations process huge data sets this way to meet the needs of most traditional corporate analytics initiatives. 
For instance, when a bank branch integrates the deposits from the day into its books, batch processing is often used.<\/p>\n<p>However, demand for quicker insights is driving corporate analytics teams to look for technology that supports real-time integration and, ultimately, predictive analytics. The latency of batch processing makes this impossible. If a financial institution needs to detect and stop fraud as it happens, or an e-retailer wants to recommend a related add-on purchase, batch processing won\u2019t cut it.<\/p>\n<p>Spark, an Apache Software Foundation project for the Hadoop ecosystem, provides an option for real-time integration. This multipurpose analytics engine supports a synchronous integration mode, commonly referred to as \u201cstreaming.\u201d Spark quickly processes large data sets and offers the same functions as MapReduce, but with far better performance: both data acquisition and processing can run as much as 50 to 100 times faster than with MapReduce, depending on the workload.<\/p>\n<p>Streaming works by processing a collection of events over a period of time, but it records only the group, and so doesn\u2019t provide a timestamp for each individual record. Data quality can also suffer when streams arrive out of order or with missing records, so batch-processed records may still be necessary in certain areas of the business or in regulated industries.<\/p>\n<p>When companies combine these two modes of processing, however, they get the best of both worlds. The newest wave of data integration technology supports both integration modes while making it possible to switch between them transparently. Previous generations allowed switching, but only with a complete overhaul of the data integration layer. 
Transparent switching simplifies processing development and the management of the overall life cycle, including updates, changes, and re-use.<\/p>\n<p>The e-retailer that was looking for a way to provide recommendations can now combine browsing history data with the very latest information available \u2013 even from social networks. Banks can now do more than synchronize daily activity: they can create data lakes to store all internal and external market data, then compile the data with no volume restrictions and integrate it with other types of data for a predictive program. Combining Spark and batch processing also enables huge volumes of data to be extracted for predictive maintenance, or to predict the outcomes of various scenarios.<\/p>\n<p>Retail and banking are just the tip of the iceberg. Combining Spark and batch processing offers unprecedented analytical potential to reflect the current reality of the business with greater accuracy. Data-driven companies that take advantage of this technology \u2013 across all industries \u2013 will find that they are able to maximize the value derived from their data and stay ahead of market needs and customer demands.<\/p>\n<p><span style=\"color: #222222;font-family: 'Noto Sans', Helvetica, Helvetica, Arial, sans-serif\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-1057 alignleft\" src=\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/Yann-Delacourt.jpg\" alt=\"Yann Delacourt\" width=\"137\" height=\"137\" srcset=\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/Yann-Delacourt.jpg 314w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/Yann-Delacourt-300x300.jpg 300w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/Yann-Delacourt-70x70.jpg 70w, https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/Yann-Delacourt-60x60.jpg 60w\" sizes=\"(max-width: 137px) 100vw, 137px\" \/>Yann Delacourt is director of product management at <a 
href=\"https:\/\/www.talend.com\/\" target=\"_blank\">Talend<\/a>. His field of expertise covers data integration, big data and analytics.\u00a0<\/span><span style=\"color: #222222;font-family: 'Noto Sans', Helvetica, Helvetica, Arial, sans-serif\">Yann has more than 15 years of experience in the software industry, having held various leadership positions in product management and engineering at SAP &amp; Business Objects. <a href=\"https:\/\/www.linkedin.com\/profile\/view?id=8654937&amp;authType=NAME_SEARCH&amp;authToken=5Ulp&amp;locale=en_US&amp;trk=tyah&amp;trkInfo=clickedVertical%3Amynetwork%2CclickedEntityId%3A8654937%2CauthType%3ANAME_SEARCH%2Cidx%3A1-1-1%2CtarId%3A1439562817447%2Ctas%3AYann%20Delacourt\" target=\"_blank\">Connect with him on LinkedIn<\/a>.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Yann Delacourt Companies that use Hadoop\u2019s big data processing platforms typically look to one of two integration modes depending on their usage. The two integration modes \u2013 asynchronous and synchronous \u2013 both come with their benefits and limitations. 
It follows that as the pace of business increases, more and more organizations are looking to use [&hellip;]<\/p>\n","protected":false},"author":23,"featured_media":1061,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1],"tags":[89,176,175,53,177],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Combining Spark &amp; Batch Processing for Real-Time Analytics<\/title>\n<meta name=\"description\" content=\"Demand for quicker insights are driving corporate analytics teams to look for technology that supports real-time integration.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tim King\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\"},\"author\":{\"name\":\"Tim King\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c\"},\"headline\":\"Combining Spark &amp; Batch Processing for Real-Time Analytics\",\"datePublished\":\"2015-08-14T16:15:15+00:00\",\"dateModified\":\"2016-10-05T13:40:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\"},\"wordCount\":610,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg\",\"keywords\":[\"Apache Spark\",\"Batch Processing\",\"Spark Processing\",\"Talend\",\"Yann Delacourt\"],\"articleSection\":[\"Best Practices\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\",\"name\":\"Combining Spark & 
Batch Processing for Real-Time Analytics\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg\",\"datePublished\":\"2015-08-14T16:15:15+00:00\",\"dateModified\":\"2016-10-05T13:40:07+00:00\",\"description\":\"Demand for quicker insights are driving corporate analytics teams to look for technology that supports real-time integration.\",\"breadcrumb\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg\",\"width\":600,\"height\":300},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/solutionsreview.com\/data-integration\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Combining Spark &amp; Batch Processing for Real-Time 
Analytics\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#website\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/\",\"name\":\"Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop\",\"description\":\"Data Integration Buyers Guide and Best Practices\",\"publisher\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/solutionsreview.com\/data-integration\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\",\"name\":\"Solutions Review\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png\",\"width\":225,\"height\":90,\"caption\":\"Solutions Review\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c\",\"name\":\"Tim 
King\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg\",\"caption\":\"Tim King\"},\"description\":\"Tim is Solutions Review's Executive Editor covering the human impact of AI on the future of work and learning. He is also the Media Strategist behind Insight Jam (1M+ on YouTube) events and programming. A 2017 and 2018 Most Influential Business Journalist and 2021 \\\"Who's Who\\\" in multiple categories, Tim is a recognized thought leader in enterprise tech and AI.\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/author\/timking\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Combining Spark & Batch Processing for Real-Time Analytics","description":"Demand for quicker insights are driving corporate analytics teams to look for technology that supports real-time integration.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/","twitter_misc":{"Written by":"Tim King","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#article","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/"},"author":{"name":"Tim King","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c"},"headline":"Combining Spark &amp; Batch Processing for Real-Time Analytics","datePublished":"2015-08-14T16:15:15+00:00","dateModified":"2016-10-05T13:40:07+00:00","mainEntityOfPage":{"@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/"},"wordCount":610,"commentCount":0,"publisher":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#organization"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg","keywords":["Apache Spark","Batch Processing","Spark Processing","Talend","Yann Delacourt"],"articleSection":["Best Practices"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/","url":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/","name":"Combining Spark & Batch Processing for Real-Time 
Analytics","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#website"},"primaryImageOfPage":{"@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg","datePublished":"2015-08-14T16:15:15+00:00","dateModified":"2016-10-05T13:40:07+00:00","description":"Demand for quicker insights are driving corporate analytics teams to look for technology that supports real-time integration.","breadcrumb":{"@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#primaryimage","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2015\/08\/oie_sOrJhOCsWhXO.jpg","width":600,"height":300},{"@type":"BreadcrumbList","@id":"https:\/\/solutionsreview.com\/data-integration\/combining-spark-batch-processing-for-real-time-analytics\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/solutionsreview.com\/data-integration\/"},{"@type":"ListItem","position":2,"name":"Combining Spark &amp; Batch Processing for Real-Time Analytics"}]},{"@type":"WebSite","@id":"https:\/\/solutionsreview.com\/data-integration\/#website","url":"https:\/\/solutionsreview.com\/data-integration\/","name":"Best Data Integration 
Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop","description":"Data Integration Buyers Guide and Best Practices","publisher":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/solutionsreview.com\/data-integration\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/solutionsreview.com\/data-integration\/#organization","name":"Solutions Review","url":"https:\/\/solutionsreview.com\/data-integration\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png","width":225,"height":90,"caption":"Solutions Review"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c","name":"Tim King","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/image\/","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg","caption":"Tim King"},"description":"Tim is Solutions Review's Executive Editor covering the human impact of AI on the future of work and learning. He is also the Media Strategist behind Insight Jam (1M+ on YouTube) events and programming. 
A 2017 and 2018 Most Influential Business Journalist and 2021 \"Who's Who\" in multiple categories, Tim is a recognized thought leader in enterprise tech and AI.","url":"https:\/\/solutionsreview.com\/data-integration\/author\/timking\/"}]}},"_links":{"self":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts\/1056"}],"collection":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/comments?post=1056"}],"version-history":[{"count":0,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts\/1056\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/media\/1061"}],"wp:attachment":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/media?parent=1056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/categories?post=1056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/tags?post=1056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}