{"id":627,"date":"2014-08-22T10:15:13","date_gmt":"2014-08-22T14:15:13","guid":{"rendered":"https:\/\/solutionsreview.com\/data-integration\/?p=627"},"modified":"2014-09-02T16:15:53","modified_gmt":"2014-09-02T20:15:53","slug":"8-data-integration-tips","status":"publish","type":"post","link":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/","title":{"rendered":"8 Great Data Integration Development Tips"},"content":{"rendered":"<p><a href=\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-medium wp-image-631\" src=\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips-300x135.jpg\" alt=\"8 Great Data Integration Development Tips\" width=\"300\" height=\"135\" srcset=\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips-300x135.jpg 300w, https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg 600w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a>As with anything in IT, things never go as smoothly as the data integration process flow looked on the PowerPoint slide your manager presented at the last departmental meeting. Reports may take longer to run than you projected, getting the process to work can feel like writing endless lines of code, and you find yourself running twice as many debugging sessions as you expected.<\/p>\n<p>Many of us can probably relate to this nightmare, because it\u2019s not always easy to get a dataflow right, especially in batch processes over big data volumes. 
I\u2019m sure this horror show is more common than most of us would like to admit, so I thought it would be helpful to review an article by Saggi Neumann, Co-Founder and CTO of xplenty, titled \u201c<a href=\"https:\/\/www.xplenty.com\/blog\/2014\/06\/eight-best-practices-for-data-integration-development\/\">Eight Best Practices for Data Integration Development<\/a>.\u201d<\/p>\n<p>Below are short excerpts from the article.<\/p>\n<p><strong>1. Start Small<\/strong><\/p>\n<p>\u201cStart with a small sample of the dataset for development and debugging purposes.\u201d<\/p>\n<p>\u201cUsing too much data at this point only lengthens development time.\u201d<\/p>\n<p>\u201cProcess the entire dataset further down the line after you have confirmed that your dataflow works correctly.\u201d<\/p>\n<p><strong>2. Develop Gradually<\/strong><\/p>\n<p>\u201cDeveloping a long and complicated dataflow only to see it fail can waste plenty of time, not to mention that it is rather hard to debug.\u201d<\/p>\n<p>\u201c\u2026develop it gradually, part by part.\u201d<\/p>\n<p>\u201c\u2026check the output after each intersection and make sure that the results are correct.\u201d<\/p>\n<p><strong>3. Filter Out Useless Data<\/strong><\/p>\n<p>\u201cSelect only relevant fields via projection and use filters to keep irrelevant data out of the flow.\u201d<\/p>\n<p><strong>4. Join Carefully<\/strong><\/p>\n<p>\u201c\u2026take a look at data from the join sources and manually check whether they are joined correctly by checking row counts and value histograms after the join.\u201d<\/p>\n<p>Join types covered in the article: Replicated Join, Skewed Join, Merge Join, Merge-Sparse Join, and Default Join.<\/p>\n<p>\u201cMake sure, of course, that you put the relevant data source on the correct side depending on the join type.\u201d<\/p>\n<p><strong>5. 
Store Results as Files<\/strong><\/p>\n<p>\u201cUsing the database as immediate output during development is not such a good idea &#8211; you will find out about errors, like an invalid schema, only when inserting the crunched data into the DB.\u201d<\/p>\n<p><strong>6. Split Parallel Dataflows<\/strong><\/p>\n<p>\u201cDataflows where one data source is split into several parallel flows may work better when split into entirely separate dataflows.\u201d<\/p>\n<p><strong>7. Split Complex Dataflows<\/strong><\/p>\n<p>\u201cDataflows that are too big and complex should also be split into several dataflows.\u201d<\/p>\n<p>\u201cThis helps to debug each one more easily and make sure everything works correctly.\u201d<\/p>\n<p><strong>8. Use GZIP<\/strong><\/p>\n<p>\u201cCompressing input and output files saves plenty of time.\u201d<\/p>\n<p>\u201cYes, it takes more CPU power to compress and decompress data, but that\u2019s nothing compared [to] the time saved transferring bytes over the network.\u201d<\/p>\n<p><a href=\"https:\/\/www.xplenty.com\/blog\/2014\/06\/eight-best-practices-for-data-integration-development\/\">Read the entire article<\/a> on the xplenty blog.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As with anything in IT, things never go as smoothly as the data integration process flow looked on the PowerPoint slide your manager presented at the last departmental meeting. 
The time that you projected to run a report may be longer than expected, the effort to get the process working seems [&hellip;]<\/p>\n","protected":false},"author":23,"featured_media":631,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[3],"tags":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>8 Great Data Integration Development Tips &#187; Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Tim King\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/\"},\"author\":{\"name\":\"Tim King\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c\"},\"headline\":\"8 Great Data Integration Development Tips\",\"datePublished\":\"2014-08-22T14:15:13+00:00\",\"dateModified\":\"2014-09-02T20:15:53+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/\"},\"wordCount\":466,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg\",\"articleSection\":[\"Data Integration News\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/\",\"name\":\"8 Great Data Integration Development Tips &#187; Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and 
Hadoop\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg\",\"datePublished\":\"2014-08-22T14:15:13+00:00\",\"dateModified\":\"2014-09-02T20:15:53+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg\",\"width\":600,\"height\":270,\"caption\":\"8 Great Data Integration Development Tips\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/solutionsreview.com\/data-integration\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"8 Great Data Integration Development Tips\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#website\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/\",\"name\":\"Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop\",\"description\":\"Data Integration 
Buyers Guide and Best Practices\",\"publisher\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/solutionsreview.com\/data-integration\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#organization\",\"name\":\"Solutions Review\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png\",\"width\":225,\"height\":90,\"caption\":\"Solutions Review\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c\",\"name\":\"Tim King\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg\",\"caption\":\"Tim King\"},\"description\":\"Tim is Solutions Review's Executive Editor covering the human impact of AI on the future of work and learning. He is also the Media Strategist behind Insight Jam (1M+ on YouTube) events and programming. 
A 2017 and 2018 Most Influential Business Journalist and 2021 \\\"Who's Who\\\" in multiple categories, Tim is a recognized thought leader in enterprise tech and AI.\",\"url\":\"https:\/\/solutionsreview.com\/data-integration\/author\/timking\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"8 Great Data Integration Development Tips &#187; Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/","twitter_misc":{"Written by":"Tim King","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#article","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/"},"author":{"name":"Tim King","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c"},"headline":"8 Great Data Integration Development Tips","datePublished":"2014-08-22T14:15:13+00:00","dateModified":"2014-09-02T20:15:53+00:00","mainEntityOfPage":{"@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/"},"wordCount":466,"commentCount":0,"publisher":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#organization"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg","articleSection":["Data Integration 
News"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/","url":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/","name":"8 Great Data Integration Development Tips &#187; Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#website"},"primaryImageOfPage":{"@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg","datePublished":"2014-08-22T14:15:13+00:00","dateModified":"2014-09-02T20:15:53+00:00","breadcrumb":{"@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#primaryimage","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2014\/08\/8-Great-Data-Integration-Development-Tips.jpg","width":600,"height":270,"caption":"8 Great Data Integration Development 
Tips"},{"@type":"BreadcrumbList","@id":"https:\/\/solutionsreview.com\/data-integration\/8-data-integration-tips\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/solutionsreview.com\/data-integration\/"},{"@type":"ListItem","position":2,"name":"8 Great Data Integration Development Tips"}]},{"@type":"WebSite","@id":"https:\/\/solutionsreview.com\/data-integration\/#website","url":"https:\/\/solutionsreview.com\/data-integration\/","name":"Best Data Integration Vendors, News &amp; Reviews for Big Data, Applications, ETL and Hadoop","description":"Data Integration Buyers Guide and Best Practices","publisher":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/solutionsreview.com\/data-integration\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/solutionsreview.com\/data-integration\/#organization","name":"Solutions Review","url":"https:\/\/solutionsreview.com\/data-integration\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2016\/02\/Solutions_Review_Header_Data_Integration_225.png","width":225,"height":90,"caption":"Solutions Review"},"image":{"@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/154e152a275103e373e24ada7f2feb5c","name":"Tim 
King","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-integration\/#\/schema\/person\/image\/","url":"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-integration\/files\/2023\/12\/tk.jpg","caption":"Tim King"},"description":"Tim is Solutions Review's Executive Editor covering the human impact of AI on the future of work and learning. He is also the Media Strategist behind Insight Jam (1M+ on YouTube) events and programming. A 2017 and 2018 Most Influential Business Journalist and 2021 \"Who's Who\" in multiple categories, Tim is a recognized thought leader in enterprise tech and AI.","url":"https:\/\/solutionsreview.com\/data-integration\/author\/timking\/"}]}},"_links":{"self":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts\/627"}],"collection":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/users\/23"}],"replies":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/comments?post=627"}],"version-history":[{"count":0,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/posts\/627\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/media\/631"}],"wp:attachment":[{"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/media?parent=627"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/categories?post=627"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-integration\/wp-json\/wp\/v2\/tags?post=627"}],"curies":[{"na
me":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}