
The Dangers of Dirty Data
We all think we know what dirty data is, but it can mean very different things to different people, and depending on who you speak to you could end up with many different definitions. But, at its most basic level, dirty data is anything that’s incorrect.
Within procurement, that could be misspelt vendors, incorrect invoice descriptions, missing product codes, a lack of standard units of measure (e.g. ltr, l, litres), currency issues, duplicate invoices or incorrect/partially classified data.
Dirty data can affect the whole organisation, and we all have an impact on, and responsibility for, the data we work with. Accurate data should be everyone’s responsibility, but currently across many organisations data is the sole responsibility of one person or department, and everyone trusts them to make sure the data is accurate.
How many times have you been working with a data set and noticed a small error but not said anything, or just manually corrected something from an automated report to get it out the door on time? These small errors can filter all the way up to the top of an organisation through reports and dashboards where critical decisions are being made.
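As a small illustration of the units-of-measure problem, here is a minimal sketch of how different spellings could be mapped to one standard code. The mapping table and the function name standardise_unit are hypothetical, made up for this example, and a real table would need to cover the units in your own data.

```python
# Hypothetical mapping from unit spellings to one standard code.
# Extend it to cover the units that appear in your own data.
UNIT_ALIASES = {
    "l": "L",
    "ltr": "L",
    "ltrs": "L",
    "litre": "L",
    "litres": "L",
}


def standardise_unit(raw: str) -> str:
    """Return the standard code for a unit, or the trimmed input if unknown."""
    key = raw.strip().lower().rstrip(".")
    # Unknown units come back unchanged so a person can review them.
    return UNIT_ALIASES.get(key, raw.strip())


units = [standardise_unit(u) for u in ["ltr", "l", "litres"]]
print(units)
```

Returning unknown units unchanged, rather than guessing, keeps the person reviewing the data in control of anything the table does not recognise.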
How does this affect my organisation?
One of the most widespread and noticeable impacts is around reporting and analytics. If you’re in senior management, you will most likely receive a dashboard from your team that you could be using to review cost savings, supplier negotiations, rationalisation, forecasting or budgets.
What if within that dashboard was £25k of cleaning spend under IBM? I can already hear you saying “that’s ridiculous”. Well, it is obvious when pointed out, but I have seen it with my own eyes. It can happen easily and occurs more frequently than you might think.
When there are tens or hundreds of thousands of rows of data, errors will occur multiple times across many suppliers. And for the wider organisation, this could affect demand, planning, sales, marketing and financial decisions.
Think back to the IBM example. Each quarter the data is refreshed automatically with the cleaning classification, so that £25k becomes £50k, then £75k the following quarter. It’s only when the value becomes significant that someone notices the issue. By this stage, how many decisions have been based on this incorrect information?
So, how do I fix it?
There’s no magic bullet or miracle solution out there to improve the accuracy of your data. You have to use your team or an experienced professional to get the job done. Get your team to familiarise themselves with the data. If they are reviewing and maintaining it regularly, they will soon be able to spot errors quickly and efficiently.
Your data should always have its COAT on and should always be:
Consistent – everyone working to the same standards
Organised – categorised properly
Accurate – correct
Trustworthy – you wouldn’t drive around in a car without a regular inspection, would you?
How do I get a data COAT?
With a spreadsheet of spend transactions over a period of time, such as 12 to 24 months, the first step should be Supplier Normalisation. This is where a new column is added to consolidate several versions of the same company, to get a true picture of spend with that supplier. For example, I.B.M, IBM Ltd and I.B.M. would all be normalised to IBM.
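Supplier Normalisation is often done by hand, but a first pass can be automated. The sketch below is one possible approach, not a definitive method: it upper-cases the name, removes full stops and drops common legal suffixes. The suffix list and the function name normalise_supplier are assumptions made for this example, and the results should still be reviewed by someone who knows the suppliers.

```python
import re

# Hypothetical list of legal suffixes to drop; extend it for your own data.
LEGAL_SUFFIXES = {"LTD", "LIMITED", "PLC", "INC", "LLC", "CORP"}


def normalise_supplier(name: str) -> str:
    """Return a simplified supplier name for the new normalised column."""
    # Upper-case and remove full stops so "I.B.M." and "IBM" match.
    cleaned = name.upper().replace(".", "")
    # Replace any other punctuation with spaces, then split into words.
    words = re.sub(r"[^A-Z0-9& ]", " ", cleaned).split()
    # Drop trailing legal suffixes, but never remove the only word left.
    while len(words) > 1 and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


raw_names = ["I.B.M", "IBM Ltd", "I.B.M."]
normalised = [normalise_supplier(n) for n in raw_names]
print(normalised)
```

Simple rules like these only catch the easy variants. Misspellings and trading names still need a person, or a maintained lookup table, to resolve.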
Data can be classified using minimum information, such as Supplier Name, Invoice/PO line description and value. To get more from the data, other factors can then be added in, such as unit price. Where unit price information is not available, the overall value can be divided by the quantity.
A suitable taxonomy will then need to be found to classify the data. It can be an off-the-shelf product such as ProClass, UNSPSC or PROC-HE, or a taxonomy can be customised so that it is specific to your organisation or industry.
This initial stage may take months if you are working with large volumes of data. It might be worth considering outsourcing this initial task to professionals experienced in this area, who will be able to complete the project in a shorter time and with greater accuracy.
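To make the unit price step concrete, here is a minimal sketch that takes unit price as the overall line value divided by the quantity. The function name unit_price and the choice to return None for a missing or zero quantity are assumptions for this example.

```python
from typing import Optional


def unit_price(line_value: float, quantity: float) -> Optional[float]:
    """Derive a unit price from the line value and quantity, if possible."""
    # A missing or zero quantity cannot give a meaningful unit price,
    # so flag the row for review instead of dividing by zero.
    if not quantity:
        return None
    return round(line_value / quantity, 2)


price = unit_price(250.0, 100)
print(price)
```

Rows that come back as None are worth a second look, since a missing quantity is often a sign of dirty data in its own right.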
It’ll save you money in the long run
Data accuracy is an investment, not a cost. Address the issues at the beginning. While it might seem like a costly exercise, you will almost certainly spend less than if you have to resolve an issue further down the line with a time-consuming and costly data clean-up operation. And by involving the whole team or organisation, it will be much easier to manage and maintain the most accurate data possible.
Spend data classification shows you the whole picture, as long as it’s accurate. You can get a true view of your spend, allowing improved cost savings, better contract compliance and possibly the most important – preventing costly mistakes before they happen.