{"id":548,"date":"2024-01-01T14:51:38","date_gmt":"2024-01-01T14:51:38","guid":{"rendered":"https:\/\/solutionsreview.com\/expert\/?p=548"},"modified":"2024-02-02T14:32:02","modified_gmt":"2024-02-02T14:32:02","slug":"distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity","status":"publish","type":"post","link":"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/","title":{"rendered":"Distributed Data Sources are Everywhere &#8211; Can DataOps Save Us from Cloud Data Complexity?"},"content":{"rendered":"<p style=\"text-align: justify;\"><em>Cloud data was supposed to enable AI at scale and democratize data. But how do we cope with the new complexities of distributed data? The emerging discipline of DataOps may help us here &#8211; along with concepts like &#8220;Data in mind, data in hand.&#8221;<\/em><\/p>\n<p style=\"text-align: justify;\">What we\u2019re all striving for with data is timely and virtually frictionless access to it. It is a critical requirement for the expanding need for data science and machine learning. (I leave out the qualifier \u201cAI\u201d because there is a large and expanding set of capabilities in AI that are not ML.)<\/p>\n<p style=\"text-align: justify;\">The precious time of skilled practitioners is often spent managing data instead of building models. Ingenious technology that allows them to think about data and get it without delays, requests, or errors is here now. \u201cData in mind, data in hand\u201d is a concept that shrinks the effort and latency between conceiving a model and having the data to run it.<\/p>\n<p style=\"text-align: justify;\">The era of Data Science and Machine Learning created a fundamental shift in data provisioning for analytics. Until recently, data was moved and transformed from sources generically. 
In other words, decisions were made about which data elements were needed, and in what form, to satisfy a range of analytical requirements. This typically used persistent data warehouses and carefully constructed data transformation and integration.<\/p>\n<p style=\"text-align: justify;\">It was primarily a \u201cmanaging from scarcity\u201d approach to minimize hardware and storage costs. The source systems were relatively stable, but the major problem was their semantic dissonance. Merging and unifying various attributes with differing semantics took a great deal of time. Adding a new source required reexamining those relationships and a funded effort. Big Data coupled with cloud resources changed all of that by encouraging organizations to discard the scarcity concept and to gather data in data lakes, not by a perceived need for particular data elements, but as whole data sources, in a myriad of formats, in bulk.<\/p>\n<p style=\"text-align: justify;\">The problem with this approach was that the data had no unifying element and little of it produced value due to the difficulty of navigating the data lake. Most organizations felt compelled to move some or all of these collections to cloud services, and in a short period, data sources became distributed. The physical location of the data was of lesser concern.<\/p>\n<p style=\"text-align: justify;\">As a result, technologies for access and control had to deal with multiple locations, and accommodate the movement of data with abstractions that no longer require requesters to understand where the data is located at any point in time. This new reality posed a severe problem for those who relied on a steady source of integrated, conformed data. With the number of source systems identified and captured, it was no longer possible for an analyst to identify, much less qualify, a data source for their investigations.<\/p>\n<p style=\"text-align: justify;\">The situation became so complicated that a new approach emerged: DataOps. 
While DataOps promises to streamline analytics, it comes at a cost. The architecture to materialize this has many components and is complex. For the data scientist, the \u201cData in Mind, Data in Hand\u201d concept demands that all of this complexity is not hidden but rather exposed in such a way that all of the capabilities of the DataOps architecture are there for them to exploit.<\/p>\n<p style=\"text-align: justify;\">However, that is a lot of complexity: the components of operations, governance, and agile data pipelines. When you consider that every one of these elements represents multiple, if not hundreds of, instances and that there is often more than one location in today\u2019s hybrid cloud world, DataOps masks the complexity.<\/p>\n<p style=\"text-align: justify;\">Still, the whole point of DataOps is to provide an \u201cintent-driven design.\u201d The reality is that data movement in a world of unfathomable data volumes is highly complex. However, simplifying the abstraction layer is still valuable, especially in democratizing the data experience. Nevertheless, the burgeoning world of analytics is not shying away from scale, so these are fundamental needs for the data team, engineers, IT, and decision scientists.<\/p>\n<p style=\"text-align: justify;\">Two essential components of the DataOps architecture are connectors and pipelines. A connector is merely a template that describes how to access the data in a particular source. A given connector may be used in a dozen, or even hundreds, of different pipelines, each designed for a specific point-to-point transfer, SLAs, and just about anything else that may be needed to stage complex, vast, distributed data.<\/p>\n<p style=\"text-align: justify;\">A pipeline accesses a data source (or more than one). It can move data from place to place and perform transformations and operations on the data, such as profiling, transforming, cleaning, aggregating, and providing operational metrics. 
Pipelines are not singular operators. They can work across parallel processors and interoperate with each other.<\/p>\n<p style=\"text-align: justify;\">Once the number of active pipelines expands, a central function of DataOps is the overall management and orchestration of the entire environment. There is always tension between complexity and simplicity. Something that appears to be operating seamlessly relies on a great deal of structure, function, and complexity. The old concept of \u201cease of use\u201d is, essentially, use-less. It tended to dumb things down to make them understandable, resulting in \u201cmasked complexity.\u201d That is not a useful approach to DataOps.<\/p>\n<p style=\"text-align: justify;\">There is a term that works. \u201cRevealed complexity\u201d means, in the case of a user interface, a design that exposes the underlying complexity through a metaphor that facilitates actions and disburdens the user while remaining approachable. If you had to drive your car with a GUI, you might not get out of the driveway because all of the functions are hidden behind drop-downs and buttons. Instead, imagine a voice-based control, or even a \u201cstick\u201d whose subtle movements invoke a cascade of logical functions that are too numerous and too fast for you to control individually, but that give you access to all the underlying complexity.<\/p>\n<p style=\"text-align: justify;\">Each nuance of the stick controls a series of events you are not directly aware of but control at the detail level. Many software products have user (masked complexity) interfaces and much richer, functional interfaces for administrators, for example. However, why limit this to administrators? Getting the job done takes a lot of structure, features, and complexity. 
Revealed complexity, though it sounds like an oxymoron, reminds me of the old Dolly Parton quote, \u201cIt takes a lot of money to look this cheap.\u201d (Spoiler alert: this is not a misogynist comment; I admire Dolly Parton for concocting this clever phrasing. Dolly Parton is a musical genius, a national treasure, and a philanthropist.) She captured this tension perfectly.<\/p>\n<p style=\"text-align: justify;\">Revealed complexity should be a design goal for today\u2019s software systems. Self-service data access and analytics development stress the data supply chain by expanding and complicating the related systems. Analytic systems are different from operational systems because they are dynamic. Even the data sources that are stable and persistent are subject to data drift, changes in the semantics (the meaning of the data), and changes in upstream and downstream systems.<\/p>\n<p style=\"text-align: justify;\">DataOps deals with this by applying the fundamental concepts of DevOps to the infinitely more complex world of data, providing the capabilities for data practitioners to become more effective. Because all of this data movement is complicated, the glue that holds DataOps together is monitoring and observability (I\u2019ll dig into that latter issue next month). Analytics can now outrun the speed of data delivery, or run faster than the quality of the data can be assured.<\/p>\n<p style=\"text-align: justify;\">In the final analysis, is the data ready for consumption? The urgency with which real-time data is consumed and the impact data drift has on data health make continuous monitoring at every point in a pipeline critical to the performance of the application or process relying on the data.<\/p>\n<h2 style=\"text-align: justify;\">My take<\/h2>\n<p style=\"text-align: justify;\">About fifteen years ago, our data management and integration methods were primitive compared to today. 
While we can think of those days like an Andy Griffith show, today this industry is more like \u201cFear the Walking Dead.\u201d It just keeps coming back for you. I should point out that data pipelines can get pretty complicated, so tools for their orchestration, like Apache Airflow, are gaining traction. But consider this: 80% of in-house machine learning projects fail, and data is still the prevailing problem.<\/p>\n<p style=\"text-align: justify;\">Next time we\u2019ll take a look at that statistic and drill down to ground truth. Maybe massive amounts of data aren\u2019t needed. Maybe parsimonious solutions will emerge. Maybe synthetic data is the answer &#8211; clean, labeled and ready to go.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cloud data was supposed to enable AI at scale and democratize data. But how do we cope with the new complexities of distributed data? The emerging discipline of DataOps may help us here &#8211; along with concepts like &#8220;Data in mind, data in hand.&#8221; What we\u2019re all striving for with data is timely and virtually [&hellip;]<\/p>\n","protected":false},"author":433,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[11],"tags":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Distributed Data Sources are Everywhere - Can DataOps Save Us from Cloud Data Complexity?<\/title>\n<meta name=\"robots\" content=\"noindex, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Distributed Data Sources are Everywhere - Can DataOps Save Us from Cloud Data Complexity?\" \/>\n<meta property=\"og:description\" content=\"Cloud data was 
supposed to enable AI at scale and democratize data. But how do we cope with the new complexities of distributed data? The emerging discipline of DataOps may help us here &#8211; along with concepts like &#8220;Data in mind, data in hand.&#8221; What we\u2019re all striving for with data is timely and virtually [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/\" \/>\n<meta property=\"og:site_name\" content=\"Solutions Review Thought Leaders\" \/>\n<meta property=\"article:published_time\" content=\"2024-01-01T14:51:38+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-02-02T14:32:02+00:00\" \/>\n<meta name=\"author\" content=\"Neil Raden\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Neil Raden\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/\",\"url\":\"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/\",\"name\":\"Distributed Data Sources are Everywhere - Can DataOps Save Us from Cloud Data Complexity?\",\"isPartOf\":{\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/#website\"},\"datePublished\":\"2024-01-01T14:51:38+00:00\",\"dateModified\":\"2024-02-02T14:32:02+00:00\",\"author\":{\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/#\/schema\/person\/fe941647826b18f7a50b492466b043d9\"},\"breadcrumb\":{\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/solutionsreview.com\/thought-leaders\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Distributed Data Sources are Everywhere &#8211; Can DataOps Save Us from Cloud Data Complexity?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/#website\",\"url\":\"https:\/\/solutionsreview.com\/thought-leaders\/\",\"name\":\"Solutions Review 
Thought Leaders\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/solutionsreview.com\/thought-leaders\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/#\/schema\/person\/fe941647826b18f7a50b492466b043d9\",\"name\":\"Neil Raden\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/solutionsreview.com\/thought-leaders\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/0813278cb05cca09748dcebe9e2cc499?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/0813278cb05cca09748dcebe9e2cc499?s=96&d=mm&r=g\",\"caption\":\"Neil Raden\"},\"description\":\"Neil Raden is a mathematician, former P&amp;C actuary, consultant and industry analyst and has for more than a quarter-century devised and implemented analytical decision-making systems for industry and government He delivers context and advisory services in the application of analytics, decision management, AI and AI Ethics as an author and popular speaker.\",\"sameAs\":[\"https:\/\/www.hiredbrains.com\",\"www.linkedin.com\/in\/neilraden\/\"],\"url\":\"https:\/\/solutionsreview.com\/thought-leaders\/author\/neil-raden\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Distributed Data Sources are Everywhere - Can DataOps Save Us from Cloud Data Complexity?","robots":{"index":"noindex","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"og_locale":"en_US","og_type":"article","og_title":"Distributed Data Sources are Everywhere - Can DataOps Save Us from Cloud Data Complexity?","og_description":"Cloud data was supposed to enable AI at scale and democratize data. But how do we cope with the new complexities of distributed data? The emerging discipline of DataOps may help us here &#8211; along with concepts like &#8220;Data in mind, data in hand.&#8221; What we\u2019re all striving for with data is timely and virtually [&hellip;]","og_url":"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/","og_site_name":"Solutions Review Thought Leaders","article_published_time":"2024-01-01T14:51:38+00:00","article_modified_time":"2024-02-02T14:32:02+00:00","author":"Neil Raden","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Neil Raden","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/","url":"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/","name":"Distributed Data Sources are Everywhere - Can DataOps Save Us from Cloud Data Complexity?","isPartOf":{"@id":"https:\/\/solutionsreview.com\/thought-leaders\/#website"},"datePublished":"2024-01-01T14:51:38+00:00","dateModified":"2024-02-02T14:32:02+00:00","author":{"@id":"https:\/\/solutionsreview.com\/thought-leaders\/#\/schema\/person\/fe941647826b18f7a50b492466b043d9"},"breadcrumb":{"@id":"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/solutionsreview.com\/thought-leaders\/distributed-data-sources-are-everywhere-can-dataops-save-us-from-cloud-data-complexity\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/solutionsreview.com\/thought-leaders\/"},{"@type":"ListItem","position":2,"name":"Distributed Data Sources are Everywhere &#8211; Can DataOps Save Us from Cloud Data Complexity?"}]},{"@type":"WebSite","@id":"https:\/\/solutionsreview.com\/thought-leaders\/#website","url":"https:\/\/solutionsreview.com\/thought-leaders\/","name":"Solutions Review Thought 
Leaders","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/solutionsreview.com\/thought-leaders\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/solutionsreview.com\/thought-leaders\/#\/schema\/person\/fe941647826b18f7a50b492466b043d9","name":"Neil Raden","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/solutionsreview.com\/thought-leaders\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/0813278cb05cca09748dcebe9e2cc499?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0813278cb05cca09748dcebe9e2cc499?s=96&d=mm&r=g","caption":"Neil Raden"},"description":"Neil Raden is a mathematician, former P&amp;C actuary, consultant and industry analyst and has for more than a quarter-century devised and implemented analytical decision-making systems for industry and government He delivers context and advisory services in the application of analytics, decision management, AI and AI Ethics as an author and popular 
speaker.","sameAs":["https:\/\/www.hiredbrains.com","www.linkedin.com\/in\/neilraden\/"],"url":"https:\/\/solutionsreview.com\/thought-leaders\/author\/neil-raden\/"}]}},"_links":{"self":[{"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/posts\/548"}],"collection":[{"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/users\/433"}],"replies":[{"embeddable":true,"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/comments?post=548"}],"version-history":[{"count":0,"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/posts\/548\/revisions"}],"wp:attachment":[{"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/media?parent=548"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/categories?post=548"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/solutionsreview.com\/thought-leaders\/wp-json\/wp\/v2\/tags?post=548"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}