42 Data Management Predictions from 30 Experts for 2022
We polled 30 experts and received 42 data management predictions for 2022, in an attempt to help you make the best business decisions.
As part of Solutions Review’s third-annual #BIInsightJam, we called for the industry’s best and brightest to share their data management predictions for 2022. The experts featured here represent the top data management solution providers with experience in this niche. Data management predictions have been vetted for relevance and ability to add business value as well. These are the best predictions from the dozens we received. We believe these are actionable and may impact a number of verticals, regions, and organization sizes.
Note: Data management predictions are listed in the order we received them.
Raj Verma, CEO at SingleStore
Databases 3.0: The Great Database Consolidation
“The first generation of databases were the Oracles and Informix and DB2. The second was this database sprawl where you saw the influx of DB2, Couchbase going public, and the other 300. The next generation of databases is the consolidation of these data platforms and types into a database that can handle modern data, and do it in a hybrid, multi-cloud manner with extremely low latency.”
Oliver Schabenberger, Chief Innovation Officer at SingleStore
Data Intensity Will be the New KPI
“The concepts of data intensity and complexity will be widely adopted in the coming years to measure digital dexterity, as organizations need to drive data intensity without adding complexity. Data intensity increases naturally as more constraints are connected to the data: variety, volume or velocity, geographic distribution, diverse types and structure, diverse use cases, automation, privacy, security, and the number of producers and consumers.”
From Storing Data to Supporting Decisions
“By 2024, data technology will have evolved to allow organizations to optimize for frictionless digital transformation rather than optimize for read/write of transactions or efficient scans of large datasets. Databases will be an active participant and orchestrator of decision support. Analytic assets such as model pipelines, networks, business rules will be a common form of metadata just as structural or descriptive metadata is today.”
Haoyuan Li, Founder and CEO at Alluxio
Data Sharing Across the Cloud
“With SaaS and managed services in the cloud creating data silos, improved governance and catalog with a data fabric spanning multiple services will come to the rescue in 2022. Sharing data across tenants and multiple service providers efficiently and securely will make data exchange easier than ever before.”
Rise of Table Formats for Data Lakes
“The stack keeps innovating in both the storage and the compute layers. Data lakes are rising to prominence and structured data is transitioning to new formats. In 2022, open-source projects like Apache Iceberg or Apache Hudi will replace more traditional Hive warehouses in cloud-native environments, enabling Presto and Spark workloads to run more efficiently at scale.”
David Richards, CEO at WANdisco
Enterprises Will Finally Start to Benefit from Activated Data at Scale
“Up to this point, companies have struggled to capture the full value of their data. Despite recognizing the importance of using data to inform decision-making, most organizations still haven’t fully realized their data’s potential. I believe next year will be the year that more companies start putting all of their data to use, at scale. We’ll start to see more enterprises make strides towards activating large volumes of unstructured data, getting it into the cloud, and using it to inform product roadmaps and identify new revenue opportunities. Beyond simply automating data management, or optimizing it for analytics, forward-thinking companies will start to turn real-time data into fuel for business growth.”
Steven Mih, Co-Founder and CEO at Ahana
Investment and Adoption of Managed Services for Open-Source Will Soar
“More companies will invest in and adopt managed services for open source in 2022 (not to be confused with Managed Service Providers or MSPs). As more rich yet complex, cloud-native open source technologies become mainstream (think Spark, Kafka, Presto, Hudi, Superset), enterprises want to leverage those distributed systems but don’t want the operational heavy-lifting to fall on their platform engineers. Open source companies offering easier-to-use managed service versions of installed software solve this, enabling companies to take advantage of these powerful systems without the resource overhead so they can focus on faster time to market and business-driven innovation.”
Dipti Borkar, Co-Founder and Chief Product Officer at Ahana
OpenFlake – the Open Data Lake for Warehouse Workloads
“Data warehouses like Snowflake are the new Teradata – they’re locking people into proprietary formats. As users start feeling the burden of higher costs as the size of their cloud data warehouse grows, they’ll start looking for cheaper AND open options that don’t lock them into a proprietary format or technology. In 2022 it’ll be all about the Open Data Lake Analytics stack, the stack that allows for open formats, open source, open cloud – and absolutely no lock-in.”
Database Engineering is Cool Again
“The debate on the Data Warehouse or Data Lake seemed to headline the past year, with a pronounced movement to the Data Lake. Now in 2022, it’s time to make database engineering cool again – on the Data Lake. That means the database benchmarking wars will be back in action. As the performance of disaggregated query engines on data lakes meets and exceeds the performance of tightly coupled data warehouses, workloads will migrate to the data lake for lower cost and more flexibility. The database engineers who can build a data lake stack with data warehousing capabilities (transactions, security) but without the compromises (lock-in, cost) will win.”
Christal Bemont, CEO at Talend
Business Leaders Around the World Will Finally See Data as More Than Just a “Nice-to-Have” and Instead as More of a Business Enabler
“Prior to the pandemic, organizations viewed data as a nice-to-have option for increasing operational efficiency and remaining competitive in the market. Now, it’s an imperative and the only way to survive in today’s landscape. Data can improve customer service, uncover opportunities, improve agility and enable data-driven decision making. Making data more accessible, cleaning up data health and putting it into the right context will unlock these business drivers and deliver revenue.”
Krishna Tammana, CTO at Talend
Data Management Will Shift Organizations’ Focus From the Mechanics of Moving and Storing Information to Focusing on Business Outcomes
“As intelligent automation continues to transform the way businesses operate, organizations are starting to realize that AI/ML is only as good as the data they feed into it. If businesses can ensure healthy data at scale and at the speed of business, they will be able to truly unlock the power of data analytics and deliver successful business outcomes.”
New Data Privacy Laws Will Continue to Emerge, Increasing Compliance Complexity and Driving the Need for a Global Consumer Data Privacy Standard
“Using the EU’s General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) as a model, governments globally and state-by-state in the US will create new regulations to give consumers rights over the data companies collect about them and how they use it. These disparate regulations will increase complexity for organizations trying to comply across regions. However, this could be the push we need to develop a truly global standard for consumer data privacy.”
Bruce Kornfeld, Chief Product Officer at StorMagic
New Data Management Approaches Needed at the Edge
“There is just too much data now being generated outside the datacenter and cloud (Gartner says by 2025 it will be 75 percent). Today’s edge computing platforms weren’t designed to handle this. A new approach is needed to store the data effectively, ‘thin’ the data by keeping only the useful parts, and then make it easy for analytics, machine learning, and AI to extract value for organizations.”
Krishna Subramanian, COO at Komprise
Data Fabric Will Become a Strategic Enterprise IT Trend
“Data Fabric is still a vision. It recognizes that your data is living in a lot of places and a fabric can bridge the silos and deliver greater portability, visibility, and governance. Data Fabric research has typically focused on semi-structured and structured data. But 90 percent of the world’s data now is unstructured (think videos, X-rays, genomics files, log files, sensor data) and this data has no defined schema. Data lakes and data analytics applications cannot readily access this dark data locked in files. So data fabric technologies need to bridge the unstructured data storage (file storage and object storage) and data analytics platforms (data lakes, ML and natural language processors, image analytics, etc.).
Analyzing unstructured data is becoming more important as machine learning relies on unstructured data. Data Fabric technologies need to be open, standards-based and look across environments. In 2022, the data fabric should move from being a vision to a set of architectural principles of data management. Technology vendors need to incorporate unstructured data into their data fabric architectures given its rising importance and sheer magnitude.”
Data Silos Are Not Going Away; It’s Time to Embrace Them
“Data silos are not going away, and nobody wants to commit to vendor lock-in to avoid the silos. The answer is not to worry about the silos but look for solutions that can look across the data – search, classify, secure, visualize it in place – without forcing you to put all your data into one location or technology. Another area we think is going to gain visibility is cross-platform, portable tag management. This would enable data managers and data scientists to move files into new clouds or applications yet retain the tags which are critical for rapidly searching and segmenting data to feed data analytics pipelines. The role of storage IT is also evolving to data management and enabling business outcomes rather than managing infrastructure.”
Ed Macosky, SVP and Head of Product at Boomi
Hyperautomation Will Replace Automation as the Next Business Imperative for Organizations Undergoing Digital Transformation
“Advances in automation have created operational efficiencies, but these automations are typically static. If processes, workflows, apps or data change, developers must update their automations—essentially turning an automated process into a manual one. Hyperautomation, on the other hand, uses AI/ML to identify patterns to create smarter automations that can evolve and adapt to change at the speed and scale businesses need now.”
Ben Slater, Chief Product Officer at Instaclustr
2022 Will See the Maturity of Database Management Solutions That Leverage Machine Learning-Fueled Predictive Analytics
“Traditionally, query performance has been a function of database administrators’ expertise and manual efforts to analyze traffic patterns and storage growth. The realities of data design flexibility, the challenge of predicting data usage patterns, and relatively poor control of storage management represent limiting complexities for DBAs. New ML-fueled strategies and tools will mitigate this complexity through predictive models that can guess where data resides. These solutions will create data indexes, handle reindexing, and manage storage to automatically deliver superior query performance, transforming database operations going forward, and 2022 will be a big momentum push in this direction.”
Edo Liberty, CEO at Pinecone
The Rapid Rise of Complex Data Will Lead to Broader Adoption of Vector Databases
“Complex data (unstructured forms of data such as documents, images, videos, and plain text on the web) contains hidden insights that are very hard for traditional databases to organize and interpret. By using machine learning models, organizations increasingly turn these objects into vector embeddings that describe complex data objects numerically across hundreds of different dimensions.
A vector database indexes and stores these vectors to power similarity search, recommendation engines, and anomaly detection by analyzing the nearest match to a question or search query. These systems have been notoriously difficult to develop and implement, and until now, have been reserved for tech giants like Google, Facebook, Amazon, and Spotify. Organizations of all sizes will be able to implement similarity search for complex data by partnering with new SaaS and open-source vector database solutions.”
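To make the “nearest match” idea concrete, here is a minimal sketch of similarity search over embeddings, assuming cosine similarity as the distance measure. The class `ToyVectorIndex`, its `add`/`query` methods, and the three-dimensional embeddings are all hypothetical and invented for illustration; the sketch does a brute-force linear scan, whereas real vector databases rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class ToyVectorIndex:
    """Hypothetical in-memory vector store. It scans every vector on each query;
    production vector databases use approximate nearest-neighbor indexes instead."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, item_id, embedding):
        # Store the unit-length embedding alongside its identifier.
        self._ids.append(item_id)
        self._vectors.append(_normalize(embedding))

    def query(self, embedding, top_k=1):
        """Return the top_k (item_id, cosine_similarity) pairs, most similar first."""
        q = _normalize(embedding)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self._ids, self._vectors)
        ]
        scored.sort(reverse=True)
        return [(item_id, score) for score, item_id in scored[:top_k]]


# Made-up 3-dimensional embeddings; real models emit hundreds of dimensions.
index = ToyVectorIndex()
index.add("cat_photo", [0.9, 0.1, 0.0])
index.add("dog_photo", [0.8, 0.3, 0.1])
index.add("invoice_pdf", [0.0, 0.2, 0.9])

results = index.query([1.0, 0.0, 0.0], top_k=2)
print([item_id for item_id, _ in results])  # ['cat_photo', 'dog_photo']
```

Normalizing at insert time means each query only needs dot products; the linear scan is what dedicated vector databases replace with index structures.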
Craig Stewart, CTO at SnapLogic
The Importance of Self-Service Data Access
“This has been bubbling to the surface for a bit, but next year will be the year that companies truly understand the importance of opening up data access to all parts of their organizations. They will move forward by giving non-IT, business-level employees not just access to the company’s data, but the tools and skills to utilize said data themselves. The future is enabling individual departments to procure, develop, analyze, and use what they need – with IT visibility and governance as required – and the last couple of years have finally convinced companies that this is the way forward. The technologies are maturing to enable this to be realized.”
The Convergence of AI, Integration, and Automation
“The increased adoption of AI and ML is bringing the industry closer to self-powered integration. By utilizing automated, smart tools, powered by AI, companies will be able to integrate applications and technologies together quickly and easily, without consuming expensive and error-prone manpower to make things work. AI will help make the connections and guide the path forward to ensure employees quickly have access to the data and tools they need to drive value for the organization.”
Dale Renner, CEO at Redpoint Global
With the Shift Away From Third-Party Data, We Know That First-Party Data Will be the Way Forward for Brands to Gather Insight on Customers
“To differentiate in a cookie-less world, though, it will be the companies that prioritize building brand equity that will be most effective at fostering consumer trust. These businesses know that poor usage and accountability of data will cause their brand equity and trust among consumers to decrease. With this in mind, we can expect these brands to pay closer attention to the use, protection, and stewardship of customer data. Specifically, they’ll set the standard by ensuring all customer data never leaves the control of the organization, becoming increasingly reluctant to push it out to managed services.”
Ravi Shankar, SVP at Denodo
Data Fabric Becomes the Foundation for the Distributed Enterprise
“As digital business and online sales channels proliferate and remote work becomes the norm, it creates a complex and diverse ecosystem of devices, applications, and data infrastructure. In particular, data infrastructure can span on-premises, single cloud, multi-cloud, hybrid-cloud, or a combination of these, spread across regional boundaries with no single solution to knit these data together. Even the latest trends of developing data lakehouses in the cloud are falling short.
In 2022, organizations will create a data fabric to drive enterprise-wide data and analytics and to automate many of the data exploration, ingestion, integration, and preparation tasks. By enabling organizations to choose their preferred tools, these data fabrics will reduce time-to-delivery and make data fabric a preferred data management approach in the coming year.”
Data Mesh Architectures Become More Enticing
“As organizations grow in size and complexity, central data teams are forced to deal with a wide array of functional units and associated data consumers. This makes it difficult to understand the data requirements for all cross-functional teams and offer the right set of data products to their consumers. Data mesh is a new decentralized data architecture approach for data analytics that aims to remove bottlenecks and move data decisions closer to those who understand the data.
In 2022 and beyond, larger organizations with distributed data environments will implement a data mesh architecture to minimize data silos, avoid duplication of effort, and ensure consistency. Data mesh will create a unified infrastructure enabling domains to create and share data products while enforcing standards for interoperability, quality, governance, and security.”
Kendall Clark, Founder and CEO at Stardog
The Era of Big Data Centralization and Consolidation is Over
“The importance of centralized or consolidated data storage will also come to the forefront in 2022. To be clear, this trend isn’t the end of storage, but it is the end of centrally consolidated approaches to data storage, particularly for analytics and app dev. In 2022, we will see the continuation of the big fight that’s brewing in the data analytics space as old ways of managing enterprise data, focusing on patterns of consolidation and centralization, reach a peak and then start to trend downward. Part of what we’re about to see unfold in the big fight between Snowflake and Databricks in 2022 and beyond is a function of their differing approaches to centralized consolidation.
But it’s not just technical pressures. The economics of unavoidable data movement in a hybrid multicloud world are not good and don’t look to be improving. Customers and investors are pushing back against the kind of lock-in that accompanies centralization approaches, so anticipate the pendulum swinging in the direction of decentralization and disintermediation of the data analytics stack in the coming year.”
Data Fabric Goes Mainstream
“Data fabric is the future of data management, according to analysts, but in 2022 the maturity of enterprise data fabric as the key to data integration in the hybrid multicloud world will become more commercially evident. 2022 will see high-profile enterprise adoption around use cases like analytics modernization, acceleration of insights from data lakes, digital twin in manufacturing and supply chain, as well as drug discovery and supply chain control tower in pharma and life sciences.
Just as race cars without high-octane fuel sources are no more than beautiful, static sculptures, analytics platforms including AI/ML without total data mastery, accessibility, and innovative data integration solutions will fail to live up to their potential. Market signals also suggest that next year the enterprise itself will get serious about finding new ways to integrate and connect data in the new hybrid multicloud world we all live in.”
Raj Gossain, Chief Product Officer at Alation
Data Catalogs Will Evolve as a Necessity for Competitive Advantage
“Data intelligence has drastically improved in recent years and organizations are recognizing the value of their data. As a result, companies are taking steps to reinforce data-driven decision-making. Many of today’s leading enterprises understand the value of data catalogs and how they help organizations remain competitive by enabling confident data-driven decision-making. Next, data catalogs will become increasingly popular among mainstream enterprises who are just starting to realize the benefits. In 2022, leadership executives, such as CFOs, will understand that data catalogs are a non-negotiable – and a ‘must have.’ It is then that even more enterprises will reap the benefits of streamlined analytics and productivity to generate more sales or achieve superior margins compared to market rivals.”
Companies Will Unlock Essential Business Value by Utilizing Public and Private Data Marketplaces
“Today, companies are already buying data sets to innovate or get insights where data is lacking. In 2022, we will see an increase in organizations turning to public data marketplaces, using two approaches. First, companies that use data catalogs to access, use, and understand the rich data within their organization will recognize that joining enterprise data with third-party data sets unlocks even more value and productivity than ever before. Second, traditional companies will realize that proprietary internal data sets can be monetized and packaged for consumption by other companies, thus creating new revenue streams and making that data easier for other enterprises to discover and use.”
Susan Cook, CEO at Zaloni
We’re Only at the Tip of the Iceberg With Solutions That Increase Faith and Trust in Data
“We’ve only seen the tip of the iceberg of technology solutions that are truly able to handle data accuracy and relevancy. In 2022, we will leverage machine learning and automation more fully to manage, govern and improve data. Once we do that, enterprises will have more trust and faith that they have good quality data, which will result in much faster and better decisions.”
Ashwin Nayak, VP of Engineering at Zaloni
The Defining Year for Quantifying Data Governance ROI to the C-Suite
“Historically, the C-suite, especially the CDO role, hasn’t perceived data governance investments as a strategic value-add, largely due to a lack of well-defined, tangible measures of success. That will change in 2022.
Establishing KPIs to link data quality to ROI, measuring usage metrics of data assets, and implementing policies to protect data are the missing elements in communicating the value of governance investments. The data protection initiatives require partnership with the CIO to define policies, but the implementation will fall under the CDO organization. Identifying data usage, history, access levels, sources, and endpoints across different applications and databases with search-based knowledge graphs will be a strategic priority for end-to-end observability. When organizations begin connecting governance KPIs to business value, it will inevitably lead to opportunities to activate and leverage their data to drive competitive advantage.”
Matthew Monahan, Director of Product Management at Zaloni
Data Governance Will Rely on MLOps
“The best ML technologies have well-defined training sets and MLOps techniques to identify data at the right time, from the development process through training and testing. This MLOps transition parallels what we see in DataOps and what we saw with DevOps: you need to have good metadata to accomplish those processes. In the coming year, we will begin to see more crossover between data governance and MLOps because you need not just high-quality source data but also metadata to describe the data to feed into the MLOps process for development, training, and testing of those algorithms.”
Fraser Harris, VP of Product at Fivetran
“As we’re seeing more and more large organizations fully embrace the modern data stack, many are now grappling with how to govern what’s there. In 2022, there will be a huge amount of work within the industry to help organizations solve data governance. Expect to see advances in interoperability between tools and APIs that expose metadata. There will be an early push towards standardizing our understanding of what data is and what compliance policies apply to it. Software projects and vendors that don’t collaborate on governance are going to be left behind. Doing your own isolated thing won’t solve problems at scale for enterprises.”
Dhruba Borthakur, Co-Founder and CTO at Rockset
Democratization of Real-Time Data
“The democratization of real-time data follows upon a more general democratization of data that has been happening for a while. Companies have been bringing data-driven decision making out of the hands of a select few and enabling more employees to access and analyze data for themselves. As the access to data becomes commodified, data itself becomes differentiated. The fresher the data, the more valuable it is.
Every other business is now feeling the pressure to take advantage of real-time data to provide instant, personalized customer service, automate operational decision making, or feed ML models with the freshest data. Businesses that provide their developers unfettered access to real-time data in 2022, without requiring them to be data engineering heroes, will leap ahead of laggards and reap the benefits.”
Tomer Shiran, Founder and CPO at Dremio
Data Warehouses are Dead! Hello Open Data Architectures
“Newer technologies like data lakehouses will gain even more traction in 2022 because they have more to offer the enterprise than older data warehouse models that lock them in and drive up costs. Companies are more budget-conscious than ever and will be reevaluating their data management systems. With a lakehouse architecture, there’s no need to ETL data from the lake into the warehouse.”
Data Lakes Become the Preferred Platform for All Companies
“Data lakes will get even easier to use, making them as easy to get started with as any data warehouse. Even non-technical workers will be able to easily get up and running on a data lake, thus eliminating the complexity and high costs of older data warehouse models. As a result of these significant cost savings and lower barriers to entry, we can expect to see smaller companies and start-ups embrace this model, in addition to larger companies.”
David Mariani, Co-Founder and CTO at AtScale
Data Sharing Between Organizations Becomes More Standardized and Commonplace
“Vendors like Snowflake, Amazon and Databricks continue to make major strides in making it easier to share data across individuals, workgroups, applications and organizations. Data sharing lets different groups or organizations consume data through shared views. These new capabilities eliminate the need to copy data or create custom data extracts, allowing more organizations to share their data with their partners without expensive data engineering.
Data sharing allows teams to generate more value from data assets, supporting collaboration, better alignment, and more real-time views of data. We’ll see an increase in data sharing across different business units (e.g. between Data Science and BI teams), across applications (e.g. BI dashboards and embedded analytics), and across trading partners (e.g. inventory data).”
Karthik Ranganathan, Co-Founder and CTO at Yugabyte
Modern Data Reference Architectures
“Monolithic RDBMSs were not designed to meet the needs of cloud-native applications. The rise of microservices, cloud infrastructure, and DevOps puts pressure on traditional systems of record. Companies are increasingly seeking databases that can run anywhere that cloud native applications are deployed; across private, public, hybrid, and multi-cloud environments. To satisfy demand, databases need to combine powerful RDBMS capabilities with cloud native resilience, scale, and geo-distribution. They also need to quickly, easily, and non-disruptively scale to handle peak demand.”
Luke Han, CEO at Kyligence
Life After Hadoop
“In 2022, we can expect the continued decline of the Hadoop platform, even though like some tough weeds in your garden the roots and trailers of Hadoop will be hard to completely eradicate. Expect CIOs and data teams to continue to de-emphasize Hadoop and to continue the process of removing it from their production data stack.
Also look for IT departments to continue to make their on-prem implementations look and function like the public cloud. In the near term, organizations may continue to use the Hadoop Distributed File System (HDFS) as a storage platform until a better private cloud storage solution can be devised.
In reality, to protect existing investments, and to comply with local government regulation, organizations can’t simply move all existing workloads and applications built on top of on-premise Hadoop to the public cloud. The on-premise data stack will continue to exist. A hybrid solution across the public cloud and private cloud will be a more practical approach.”
“As more technology organizations look to drive greater automation of analytics processes and greater developer productivity, the monetization of APIs and the pursuit of an API economy will naturally affect data engineering, data management and analytics. In 2022, an increasing number of businesses will push data-driven decisioning and predictive analytics into the mainstream with a factory-like Data-as-a-Service approach.
The systematic implementation of APIs that deliver data, metadata, and essential intelligence will not only be used for public, customer facing processes, but also for internal usage. This will put DaaS APIs at the center of a hub for all enterprise processes, workflows and management.”
Matt Carroll, CEO at Immuta
The Data Governance/Access Control Market Will Accelerate
“In 2022, demand will skyrocket for scalable, automated ways to author and evolve complex data access control policies, driven by the need to simplify data policy management and the desire to efficiently scale cloud data and analytics initiatives to an ever-growing number of internal and external data consumers.
Data teams need and want the ability to deploy row access and column masking policies, as well as leverage object tagging while benefiting from universal cloud policy authoring and highly scalable and evolvable attribute-based access controls, and 2022 is the year to do it.”
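As a rough illustration of how row access and column masking policies combine with attribute-based access control, the minimal sketch below filters rows by a user attribute and redacts PII columns unless another attribute grants clearance. The `User` attributes (`regions`, `can_see_pii`), the `PII_COLUMNS` set, and the helper functions are hypothetical and do not reflect any vendor’s policy engine; real platforms typically enforce such policies inside the query engine rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    regions: set = field(default_factory=set)  # attribute checked by the row policy
    can_see_pii: bool = False                  # attribute checked by the column mask


# Hypothetical policy configuration: columns redacted unless the user has PII clearance.
PII_COLUMNS = {"email", "phone"}


def row_allowed(user: User, row: dict) -> bool:
    """Row access policy: a user sees only rows whose region is in their attributes."""
    return row.get("region") in user.regions


def mask_row(user: User, row: dict) -> dict:
    """Column masking policy: replace PII values unless the user is cleared for them."""
    if user.can_see_pii:
        return dict(row)
    return {col: ("****" if col in PII_COLUMNS else val) for col, val in row.items()}


def apply_policies(user: User, rows: list) -> list:
    """Apply the row filter first, then mask columns on the surviving rows."""
    return [mask_row(user, row) for row in rows if row_allowed(user, row)]


rows = [
    {"id": 1, "region": "EU", "email": "a@example.com", "total": 120},
    {"id": 2, "region": "US", "email": "b@example.com", "total": 75},
]
analyst = User("analyst", regions={"EU"})
print(apply_policies(analyst, rows))
# [{'id': 1, 'region': 'EU', 'email': '****', 'total': 120}]
```

Because both policies read user attributes rather than hard-coded user names, granting a new region or PII clearance is an attribute change, not a policy rewrite, which is the scalability argument behind attribute-based access control.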
Saket Saurabh, CEO at Nexla
Data Connectors’ Limitations in Retail and eCommerce Will Rear Their Ugly Heads
“Data connection matters big time because it allows retailers to combine system metadata (e.g. from the order fulfillment, processing, delivery stages, etc.) to generate a deep understanding of that data (not just knee jerk or superficial response to data) and what it means with regards to dynamically changing consumer shopping habits and debilitating obstacles in the retailers’ entire supply chain. Retailers that don’t have strong data connectors are at risk of public failures and revenue losses.”
Yael Ben Arie, CEO at Octopai
“With great power comes great responsibility. There has been an explosion of growth in companies using IoT devices and the implementation of digital technology has caused an enormous increase in the amount of data that companies possess. Because of this massive influx, companies are tasked with managing large quantities of data coming in regularly and from multiple sources, which has created a need to consolidate data and create a central system for BI and data management. Today data is the overarching framework of any business and enterprises will need to evaluate how they manage data in order to keep up with the digital transformations of 2022 and beyond.”
“Businesses that manage data well will leverage data to provide a high level of agility and respond to business changes intelligently and quickly. Shifts in business are inevitably happening at a faster rate and being able to respond to those changes efficiently creates enormous value for any company. Data is at the center of that: by accessing data, businesses have the power to react to change informatively. And because of the dependency on data to make these decisions, ensuring the accuracy of data is more important than ever. With so many devices and streams of data coming in from multiple sources, creating a single platform to manage all data will be imperative to ensure accuracy.”