Syncsort recently released the results from its fourth annual Big Data Survey. The data integration solution provider found that nearly 6 in 10 organizations were seeing major benefits from Hadoop and Spark. Syncsort polled 200 respondents including data architects, IT managers, developers, business intelligence/data analysts, and data scientists at organizations with interest in Hadoop and Spark. Participants represent a broad range of vertical industries such as financial services, insurance, healthcare, government, telecommunications, and retail.
More than half of those polled were seeing higher revenue and accelerated growth compared to last year, according to Syncsort. In addition, Hadoop and Spark are now in test or production at 70 percent of responding organizations (40 percent in production, 30 percent in proof of concept or pilot program). However, respondents are still facing many of the same challenges with their data, including keeping pace with evolving technologies and software products.
Syncsort notes that major data lake trends emerged last year, and the company advises data and analytics professionals to monitor them in 2018. At Solutions Review, we’ve read the report in its entirety (available here) and pulled out the three trends we think are most important to watch this year.
1. Investment in big data is still on the rise
And probably will be indefinitely. Roughly 9 in 10 were convinced that utilizing Hadoop and Spark and moving away from legacy systems added value in creating insights from data, as well as cutting costs. Syncsort believes (and we agree) that as organizations optimize their legacy frameworks, additional resources will be poured into the funding of big data projects.
2. Keeping data “fresh” is a major pain point
According to the study, more than 3 in 4 responding organizations have difficulty keeping their data lake in sync with changing data sources, and even more so when the source is disparate or hard to access. The majority of data lake use cases are coming via ETL and data analytics (advanced, predictive, real-time), which means that data quality is of the utmost importance. As a result, data and analytics professionals should take a longer look at making fresh, up-to-date data a key tenet of their data lake strategy.
3. Regulatory compliance is the top priority
This one doesn’t come as much of a surprise given that GDPR is knocking on the door. 40 percent of those polled regard data quality as their most significant struggle. This is likely a direct result of the increase in data lake use. The scope of data governance is expected to increase both vertically and horizontally in the years ahead as organizations are required to place a high priority on compliance. The end goal is for organizations to have a wide-angle view of where their data lives and where it has been.
For more big data trends, challenges, benefits, and a spotlight on how data quality impacts the data lake, be sure to read the full report.