Data Curation and Governance are the Top-Two Data Engineering Challenges for 2022
Data curation and governance are the top-two data engineering challenges for 2022, according to a new report commissioned by Gradient Flow and Immuta. The 2022 State of Data Engineering survey examined the changing landscape of data engineering and operations challenges, tools, and opportunities. The data engineering challenges that data professionals worry about most come after data has been extracted, loaded, and transformed. Data for the report was gathered over 61 days from a global audience of 372 respondents, more than half of whom were data engineers or data architects.
The main data engineering challenges cited by those polled include validation, data monitoring and auditing for compliance, data masking and anonymization, and data discovery. Nearly two-thirds of respondents (65 percent) said their company is either 100 percent cloud-based or will be within the next 12 to 24 months. Similarly, 62 percent of respondents signaled plans to adopt one of the top-five cloud databases and platforms (Amazon Redshift, Amazon Athena, Google BigQuery, Databricks, and Snowflake) in the months ahead.
While 64 percent of those polled come from organizations already collecting and storing sensitive data, the vast majority (88 percent) indicated that their firms are subject to one or more data use rules or regulations like GDPR, HIPAA, CCPA, and SOC 2. Additionally, 30 percent of respondents reported a need to comply with internal, company-specific rules around data. Somewhat concerning is that more than a quarter of all those polled were unsure of what (if any) data quality solution their organization is currently using.
The data engineering landscape is changing and maturing. Whereas years ago there were few, if any, tools to solve data challenges, a plethora of technologies – both commercial and open-source – is now available. These technologies are helping organizations leverage their sensitive data for real-time access and analytics, all while protecting it in accordance with a growing body of regulatory requirements. There is also an entirely new crop of data engineering training courses and online certification options available to enable technical and non-technical users alike to develop skills on the fly.