{"id":3932,"date":"2022-05-03T13:23:28","date_gmt":"2022-05-03T17:23:28","guid":{"rendered":"https:\/\/solutionsreview.com\/data-management\/?p=3932"},"modified":"2022-05-06T10:54:37","modified_gmt":"2022-05-06T14:54:37","slug":"three-incorrect-assumptions-about-open-source-data-warehousing-software","status":"publish","type":"post","link":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/","title":{"rendered":"3 Incorrect Assumptions About Open-Source Data Warehousing Software"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-3939 size-full\" src=\"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg\" alt=\"3 Incorrect Assumptions About Open-Source Data Warehousing Software\" width=\"800\" height=\"400\" srcset=\"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg 800w, https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2-300x150.jpg 300w, https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2-768x384.jpg 768w, https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2-600x300.jpg 600w, https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2-162x81.jpg 162w, https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2-360x180.jpg 360w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p style=\"text-align: justify;\"><strong><em>This is part of Solutions Review\u2019s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. 
In this submission, <a href=\"https:\/\/www.datafold.com\/\" target=\"_blank\" rel=\"noopener\">Datafold<\/a> CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing software.<\/em><\/strong><\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-3560 alignleft\" src=\"https:\/\/solutionsreview.com\/data-management\/files\/2021\/12\/SR-Premium-Content.gif\" alt=\"SR Premium Content\" width=\"105\" height=\"110\" srcset=\"https:\/\/solutionsreview.com\/data-management\/files\/2021\/12\/SR-Premium-Content.gif 105w, https:\/\/solutionsreview.com\/data-management\/files\/2021\/12\/SR-Premium-Content-77x81.gif 77w\" sizes=\"(max-width: 105px) 100vw, 105px\" \/>There are many reasons to use <span style=\"text-decoration: underline;\"><strong><a href=\"https:\/\/solutionsreview.com\/data-management\/the-ultimate-open-source-database-list-profiling-software-tools\/\" target=\"_blank\" rel=\"noopener\">open-source software<\/a><\/strong><\/span> (OSS) in your data stack, specifically, data warehousing and processing. Today, we have at our disposal dozens of mature OSS technologies with active communities, including:<\/p>\n<ul>\n<li>Apache Spark is a Swiss Army knife data processing engine that is also developer-friendly<\/li>\n<li>Trino provides great performance for SQL ETL and analytical queries while abstracting the user from underlying complexity<\/li>\n<li>Druid and ClickHouse offer subsecond query performance for interactive analytics<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">However, due to the modularity of the OSS ecosystem (the stack is assembled from multiple OSS projects and standards) and distributed ownership of codebase (and of bugs), adopting an open-source technology is often a very different experience than buying a similar proprietary product from a vendor. 
It seems that many teams default to open-source without properly considering trade-offs and carefully evaluating their assumptions. In this article, I challenge what I consider the top three erroneous assumptions about OSS <span style=\"text-decoration: underline;\"><strong><a href=\"https:\/\/solutionsreview.com\/data-management\/the-best-cloud-data-warehouse-solutions-2\/\" target=\"_blank\" rel=\"noopener\">data warehousing technologies<\/a><\/strong><\/span>.<\/p>\n<div class=\"widget\"><div class=\"aside-card\">\t\t\t<div class=\"textwidget\"><a class=\"speedbump\" href=\"https:\/\/solutionsreview.com\/data-management\/data-management-data-warehouse-buyers-guide\/\" title=\"Download link to Data Management Buyers Guide\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-1682\" src=\"https:\/\/solutionsreview.com\/data-management\/files\/2019\/01\/data-management-speedbump-cta.jpg\" alt=\"Download Link to Data Management Buyers Guide\" width=\"800\" height=\"225\" \/><\/a><\/div>\n\t\t<\/div><\/div>\n<p style=\"text-align: justify;\">Before we proceed, let\u2019s define the scope of \u201cdata warehousing tech\u201d as covering the following use cases:<\/p>\n<ul>\n<li>Ingest and store all analytical data<\/li>\n<li>Execute <span style=\"text-decoration: underline;\"><strong><a href=\"https:\/\/solutionsreview.com\/data-integration\/the-best-data-transformation-tools-and-software\/\" target=\"_blank\" rel=\"noopener\">data transformation<\/a><\/strong><\/span>s (the \u201cT\u201d of \u201cELT\u201d)<\/li>\n<li>Serve data to consumers: dashboards, ad-hoc analysis, and consuming applications (ML, microservices, etc.)<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">I often hear from enterprise data teams, \u201cWe are choosing open-source data warehousing technology because it\u2019s cheaper than proprietary options, to avoid vendor lock-in, and to be able to extend the system if we need to.\u201d There 
are actually three statements here, so let\u2019s break down the assumptions.<\/p>\n<h3><strong>OSS is Cheaper<\/strong><\/h3>\n<p style=\"text-align: justify;\">It\u2019s difficult to <span style=\"text-decoration: underline;\"><strong><a href=\"https:\/\/solutionsreview.com\/data-management\/data-management-data-warehouse-buyers-guide\/\" target=\"_blank\" rel=\"noopener\">compare pricing for data warehousing tech<\/a><\/strong><\/span>, especially with structurally different pricing models: for example, pay-per-use (BigQuery, Athena) vs. pay-per-infra time (Snowflake, Databricks). Therefore, I suggest a counterexample to this assumption.<\/p>\n<p style=\"text-align: justify;\">To compare pricing, we first need to establish a common denominator in terms of the amount of work and performance that we are buying for a given price. Let\u2019s consider an often-cited Fivetran <a href=\"https:\/\/www.fivetran.com\/blog\/warehouse-benchmark\" target=\"_blank\" rel=\"noopener\">TPC benchmark<\/a> that compared Snowflake, a proprietary DWH product, with Trino (formerly known as PrestoSQL), a popular open-source data processing engine, among others. In that benchmark, Trino shows an average query runtime of 14.78 sec vs. 
10.74 sec (38 percent difference) for Snowflake, but the medians are roughly the same, so let us assume the performance is comparable.<\/p>\n<h4><strong>Here\u2019s a back-of-the-envelope cost calculation: Base Costs<\/strong><\/h4>\n<table width=\"624\">\n<tbody>\n<tr>\n<td width=\"103\"><\/td>\n<td width=\"176\">Mean TPC query time<\/td>\n<td width=\"84\">Price\/hr<\/td>\n<td width=\"151\">Size<\/td>\n<td width=\"110\">Mean cost per query<\/td>\n<\/tr>\n<tr>\n<td width=\"103\">Snowflake<\/td>\n<td width=\"176\">10.74 sec<\/td>\n<td width=\"84\">$16.00<\/td>\n<td width=\"151\">Large<\/td>\n<td width=\"110\">$0.047<\/td>\n<\/tr>\n<tr>\n<td width=\"103\">Trino<\/td>\n<td width=\"176\">14.78 sec<\/td>\n<td width=\"84\">$8.02<\/td>\n<td width=\"151\">4x n2-highmem-32<\/td>\n<td width=\"110\">$0.033<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: justify;\">At first glance, Trino seems to be ~30 percent cheaper per query. However, in a real production scenario, we need to consider total cost of ownership and, in the case of OSS, factor in expenses such as DevOps; products like Trino are sophisticated distributed systems that require proper deployment, monitoring, tuning and maintenance. 
And while it may seem easy to spin up a cluster with Kubernetes in a couple of clicks, matching Snowflake\u2019s <a href=\"https:\/\/community.snowflake.com\/s\/question\/0D50Z000098T39RSAS\/what-sla-availability-does-snowflake-guarantee\" target=\"_blank\" rel=\"noopener\">availability SLA of 99.9 percent<\/a> is a completely different game.<\/p>\n<p style=\"text-align: justify;\">Let\u2019s consider the DevOps cost factor in two primary scenarios: vendor-managed and in-house deployments.<\/p>\n<h4><strong>Vendor-Managed Trino Costs<\/strong><\/h4>\n<p style=\"text-align: justify;\">For example, the leading Trino vendor, Starburst, <a href=\"https:\/\/console.cloud.google.com\/marketplace\/product\/starburst-public\/starburst-enterprise\" target=\"_blank\" rel=\"noopener\">charges<\/a> a ~60 percent markup on top of the underlying cloud infrastructure cost, which makes vendor-hosted Trino more expensive than Snowflake.<\/p>\n<table width=\"624\">\n<tbody>\n<tr>\n<td width=\"103\"><\/td>\n<td width=\"176\">Mean TPC query time<\/td>\n<td width=\"84\">Price\/hr<\/td>\n<td width=\"151\">Size<\/td>\n<td width=\"110\">Mean cost per query<\/td>\n<\/tr>\n<tr>\n<td width=\"103\">Snowflake<\/td>\n<td width=\"176\">10.74 sec<\/td>\n<td width=\"84\">$16.00<\/td>\n<td width=\"151\">Large<\/td>\n<td width=\"110\">$0.047<\/td>\n<\/tr>\n<tr>\n<td width=\"103\">Trino via Starburst Enterprise<\/td>\n<td width=\"176\">14.78 sec*<\/td>\n<td width=\"84\">$12.82<\/td>\n<td width=\"151\">4x n2-highmem-32<\/td>\n<td width=\"110\">$0.053<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>*Assuming similar performance for plain Trino and the Starburst Enterprise distribution.<\/p>\n<p>At these rates, Trino via Starburst Enterprise comes out about 13 percent more expensive per query than Snowflake.<\/p>\n<h4><strong>In-House Trino Costs<\/strong><\/h4>\n<p style=\"text-align: justify;\">If you are up for running Trino in-house, consider the cost of two senior distributed systems engineers (most companies can\u2019t do 
with just one because they need to be on call to maintain the SLA). That\u2019s $220K * 2 * 2.7 <a href=\"https:\/\/web.mit.edu\/e-club\/hadzima\/how-much-does-an-employee-cost.html\" target=\"_blank\" rel=\"noopener\">overhead factor<\/a> = $111 per calendar hour (Silicon Valley pricing).<\/p>\n<p style=\"text-align: justify;\">That would be tremendous overhead for just a 4-node cluster as in the benchmark, so let\u2019s assume a more realistic scenario for a large company: a ~110-node Trino cluster. Even with such a large cluster size, you would be paying a ~100 percent markup on top of infrastructure costs for the DevOps labor, making Trino 40 percent more expensive than Snowflake per query.<\/p>\n<table width=\"624\">\n<tbody>\n<tr>\n<td width=\"103\"><\/td>\n<td width=\"176\">Mean TPC query time<\/td>\n<td width=\"84\">Price\/hr<\/td>\n<td width=\"151\">Size<\/td>\n<td width=\"110\">Mean cost per query<\/td>\n<\/tr>\n<tr>\n<td width=\"103\">Snowflake<\/td>\n<td width=\"176\">10.74 sec<\/td>\n<td width=\"84\">$16.00<\/td>\n<td width=\"151\">Large<\/td>\n<td width=\"110\">$0.047<\/td>\n<\/tr>\n<tr>\n<td width=\"103\">Trino including DevOps<\/td>\n<td width=\"176\">14.78 sec<\/td>\n<td width=\"84\">$16.04<\/td>\n<td width=\"151\">4x n2-highmem-32<\/td>\n<td width=\"110\">$0.066<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: justify;\">This should not be surprising because a vendor like Snowflake, with thousands of customers, is able to perfect its technology and spread its DevOps costs thinly across them.<\/p>\n<p style=\"text-align: justify;\">We\u2019ve been making quite a few assumptions along the way, so the numbers may differ from case to case, but the key takeaway is that there is no magic 1.5x\/2x\/3x savings we can expect from going the OSS route for a typical team, unless you have your own hardware infrastructure and are FAANG-big. 
Note that DoorDash and Block (aka Square), two public and very data-driven companies with hundreds of internal data users, decided against hosting their own data infrastructure and chose Snowflake instead.<\/p>\n<h3><strong>OSS Eliminates Vendor Lock-in<\/strong><\/h3>\n<p>Unless your company is so small that you don\u2019t care about SLAs or so big (Facebook-big) that you can actually pull it off while benefiting from economies of scale, you will probably contract a vendor to host the OSS data warehouse for you instead of running it in-house.<\/p>\n<p style=\"text-align: justify;\">Many vendors run their custom distributions of OSS, which can be significantly behind the master branch of the respective OSS projects. That, in turn, limits your ability to tinker with the OSS and to take advantage of its latest features and optimizations. And now your infrastructure is also integrated with the vendor\u2019s APIs, making it harder to move out. All in all, you end up with much the same vendor lock-in that you tried to avoid by going the OSS route. With one exception: a theoretical ability to integrate some of the community-contributed components. But unless you have very specific requirements here, it\u2019s hardly an advantage, as the best proprietary database products have excellent interoperability as well.<\/p>\n<h3><strong>OSS Gives More Flexibility<\/strong><\/h3>\n<p style=\"text-align: justify;\">First, consider all of the challenges with vendor-flavored OSS described above. Second, ask yourself (and your engineering partners) what exactly you would want to add to a database solution that isn\u2019t readily available from serverless offerings such as Snowflake or BigQuery, and why? 
Is that custom feature so critical to your business that you are going to hire expensive engineering talent to develop database tech instead of building your core product?<\/p>\n<h3><strong>End-User Experience<\/strong><\/h3>\n<p style=\"text-align: justify;\">It didn\u2019t come up in the \u201cthree reasons,\u201d but I am going to mention end-user experience anyway. Even the most mature OSS products in the data warehousing space are still rough around the edges when it comes to user experience in comparison to top proprietary products such as Snowflake and BigQuery. If you are unsure what I am referring to, try optimizing the performance of a couple of Spark jobs that take forever to complete \u2014 for no apparent reason. Or try to see inside your Kafka data queue. In a world where analytics powers the business and where talent is the scarcest resource, a data platform\u2019s user experience matters a lot \u2014 so don\u2019t compromise on it, especially without a good reason.<\/p>\n<h3><strong>Where Does OSS Make Sense in the Data Stack?<\/strong><\/h3>\n<p style=\"text-align: justify;\">While core data processing infrastructure, as we argue, is a tricky one, other parts of the stack \u2014 for example, data orchestration and transformation layers \u2014 are perfect examples where open-source solutions are most effective. Note the success stories of Airflow, dbt, and Dagster.<\/p>\n<p style=\"text-align: justify;\">If we were to draw a line to distinguish \u201cgood for open-source\u201d and \u201cchallenging for open-source\u201d places in the stack, it would mostly come down to customization and extensibility. It is unlikely for an average company to tinker with core infra, such as Linux, Docker or their data warehousing solution. But building dbt packages, utils or connectors is very common.<\/p>\n<p style=\"text-align: justify;\">Lastly, even with core data infrastructure, the open-source world is rapidly converging with fully managed. 
Databricks started its business as a hosted Spark platform and currently offers a serverless SQL data engine that directly challenges Snowflake in both price and performance while supporting open-source file formats.<\/p>\n<p style=\"text-align: justify;\">Open-source software can be profoundly effective if adopted for the right reasons and can turn into a terrible money sink when the fit isn\u2019t right.<\/p>\n<div class=\"hr hr\"><\/div>\n<div class=\"widget\"><div class=\"aside-card\">\t\t\t<div class=\"textwidget\"><p><a class=\"speedbump\" href=\"https:\/\/solutionsreview.com\/data-management\/data-management-vendor-map-a-guide-to-the-best-data-management-tools\/\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-1682\" src=\"https:\/\/solutionsreview.com\/data-management\/files\/2019\/01\/data-management-vendor-map-sb-cta.jpg\" alt=\"Download Link to Data Management Vendor Map\" width=\"800\" height=\"225\" \/><\/a><\/p>\n<\/div>\n\t\t<\/div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>This is part of Solutions Review\u2019s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing software. 
There are many reasons to use open-source software (OSS) in your data stack, specifically, data [&hellip;]<\/p>\n","protected":false},"author":165,"featured_media":3939,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[3],"tags":[1267],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>3 Incorrect Assumptions About Open-Source Data Warehousing Software<\/title>\n<meta name=\"description\" content=\"Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing software.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"3 Incorrect Assumptions About Open-Source Data Warehousing Software\" \/>\n<meta property=\"og:description\" content=\"Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing software.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/\" \/>\n<meta property=\"og:site_name\" content=\"Best Data Management Software, Vendors and Data Science Platforms\" \/>\n<meta property=\"article:published_time\" content=\"2022-05-03T17:23:28+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-05-06T14:54:37+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg\" \/>\n\t<meta 
property=\"og:image:width\" content=\"800\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Alex Morozov\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Alex Morozov\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/\",\"url\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/\",\"name\":\"3 Incorrect Assumptions About Open-Source Data Warehousing Software\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/data-management\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg\",\"datePublished\":\"2022-05-03T17:23:28+00:00\",\"dateModified\":\"2022-05-06T14:54:37+00:00\",\"author\":{\"@id\":\"https:\/\/solutionsreview.com\/data-management\/#\/schema\/person\/c4b50779a65a89aab77e9050ea587623\"},\"description\":\"Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing 
software.\",\"breadcrumb\":{\"@id\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#primaryimage\",\"url\":\"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg\",\"contentUrl\":\"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg\",\"width\":800,\"height\":400,\"caption\":\"3 Incorrect Assumptions About Open-Source Data Warehousing Software\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/solutionsreview.com\/data-management\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"3 Incorrect Assumptions About Open-Source Data Warehousing Software\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/solutionsreview.com\/data-management\/#website\",\"url\":\"https:\/\/solutionsreview.com\/data-management\/\",\"name\":\"Best Data Management Software, Vendors and Data Science Platforms\",\"description\":\"Enterprise Information 
Management\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/solutionsreview.com\/data-management\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/solutionsreview.com\/data-management\/#\/schema\/person\/c4b50779a65a89aab77e9050ea587623\",\"name\":\"Alex Morozov\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/data-management\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/20f08675ab17f6048996d2a6e2b58e27?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/20f08675ab17f6048996d2a6e2b58e27?s=96&d=mm&r=g\",\"caption\":\"Alex Morozov\"},\"description\":\"Alex Morozov is the chief technology officer and co-founder at Datafold. A nuclear physicist by training, Alex has served as CTO at numerous startups and brings over 15 years of experience in delivering sophisticated software systems across data, IoT, embedded and communications.\",\"url\":\"https:\/\/solutionsreview.com\/data-management\/author\/almorozov\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"3 Incorrect Assumptions About Open-Source Data Warehousing Software","description":"Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing software.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/","og_locale":"en_US","og_type":"article","og_title":"3 Incorrect Assumptions About Open-Source Data Warehousing Software","og_description":"Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing software.","og_url":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/","og_site_name":"Best Data Management Software, Vendors and Data Science Platforms","article_published_time":"2022-05-03T17:23:28+00:00","article_modified_time":"2022-05-06T14:54:37+00:00","og_image":[{"width":800,"height":400,"url":"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg","type":"image\/jpeg"}],"author":"Alex Morozov","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Alex Morozov","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/","url":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/","name":"3 Incorrect Assumptions About Open-Source Data Warehousing Software","isPartOf":{"@id":"https:\/\/solutionsreview.com\/data-management\/#website"},"primaryImageOfPage":{"@id":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#primaryimage"},"image":{"@id":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#primaryimage"},"thumbnailUrl":"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg","datePublished":"2022-05-03T17:23:28+00:00","dateModified":"2022-05-06T14:54:37+00:00","author":{"@id":"https:\/\/solutionsreview.com\/data-management\/#\/schema\/person\/c4b50779a65a89aab77e9050ea587623"},"description":"Datafold CTO and Co-Founder Alex Morozov offers three incorrect assumptions about open-source data warehousing 
software.","breadcrumb":{"@id":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#primaryimage","url":"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg","contentUrl":"https:\/\/solutionsreview.com\/data-management\/files\/2022\/05\/MicrosoftTeams-image-2.jpg","width":800,"height":400,"caption":"3 Incorrect Assumptions About Open-Source Data Warehousing Software"},{"@type":"BreadcrumbList","@id":"https:\/\/solutionsreview.com\/data-management\/three-incorrect-assumptions-about-open-source-data-warehousing-software\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/solutionsreview.com\/data-management\/"},{"@type":"ListItem","position":2,"name":"3 Incorrect Assumptions About Open-Source Data Warehousing Software"}]},{"@type":"WebSite","@id":"https:\/\/solutionsreview.com\/data-management\/#website","url":"https:\/\/solutionsreview.com\/data-management\/","name":"Best Data Management Software, Vendors and Data Science Platforms","description":"Enterprise Information Management","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/solutionsreview.com\/data-management\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/solutionsreview.com\/data-management\/#\/schema\/person\/c4b50779a65a89aab77e9050ea587623","name":"Alex 
Morozov","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/data-management\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/20f08675ab17f6048996d2a6e2b58e27?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/20f08675ab17f6048996d2a6e2b58e27?s=96&d=mm&r=g","caption":"Alex Morozov"},"description":"Alex Morozov is the chief technology officer and co-founder at Datafold. A nuclear physicist by training, Alex has served as CTO at numerous startups and brings over 15 years of experience in delivering sophisticated software systems across data, IoT, embedded and communications.","url":"https:\/\/solutionsreview.com\/data-management\/author\/almorozov\/"}]}},"_links":{"self":[{"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/posts\/3932"}],"collection":[{"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/users\/165"}],"replies":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/comments?post=3932"}],"version-history":[{"count":0,"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/posts\/3932\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/media\/3939"}],"wp:attachment":[{"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/media?parent=3932"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/categories?post=3932"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solutionsreview.com\/data-management\/wp-json\/wp\/v2\/tags?post=3932"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templat
ed":true}]}}