Survey: Data Management Bogged Down by Data Pollution

In a recent release, StreamSets, a vendor that delivers performance management for data flows, announced the results of a global survey of more than 300 data management professionals. The survey was conducted by Dimensional Data, an independent research firm. According to the results, enterprises face a wide range of data performance management challenges because they fail to stop bad data from disrupting their data flows. Ninety percent of respondents reported that bad data has flowed into their data stores, while only 12 percent believe they are doing a good job at data flow performance management.

As a result of all this bad data, data pollution has become a major threat to enterprise analytics. If enterprises run analytics on data that they fail to recognize as polluted, the resulting insights will lead them down the wrong paths. With real-time and streaming analytics via self-service methods quickly becoming the new normal in the enterprise, data professionals no longer have the time or resources to spend on data quality that they once did. The results can be catastrophic, because clean data is the first step in generating worthwhile insights that create true business value.

Additional report highlights include:

  • 68 percent noted that data quality was the most common challenge they faced when managing Big Data flows
  • 74 percent of respondents reported currently having bad data in their data stores, despite cleansing it throughout its lifecycle
  • Nearly two-thirds still use legacy ETL integration tools and 77 percent use hand-coding to design their data pipelines
  • The only area where a majority felt positive about their capabilities was detecting a down pipeline, with 66 percent saying they were comfortable identifying this issue

Girish Pancha, CEO at StreamSets, concludes: “In today’s world of real-time analytics, data flows are the lifeblood of an enterprise. The industry has long been fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data. It is imperative that we shift our mindset towards building continuous data operations capabilities that are in tune with the time-sensitive, dynamic nature of today’s data.”

Timothy King