TCG vs. the Machines: Accepting AI and ChatGPT in Spend Data Classification

- by Susan Walsh, Expert in Data Management

What’s that coming over the hill – is it a monster? No, it’s just the 21st century. And whether Sarah Connor likes it or not, Artificial Intelligence (AI) and Generative AI (GenAI) tools such as ChatGPT are here to stay. But despite Skynet’s best efforts, they won’t take over, we are certain of that. We can live with them, in fact, even cohabit with them (almost) harmoniously in the spend data classification world, but not before the data has been classified and checked by a human.

You may have been told by a current or potential supplier that they’re using AI, machine learning or, more recently, ChatGPT/GenAI, and that this is what they’ll be using to classify your spend data. You may even have heard phrases like “neural networks”, “algorithms” and “LLMs”, with lots of fancy words being thrown around. It all sounds very technical and a bit scary, especially if they’ve not given you any more information or explained what it actually does, and there can be an implied trust that it works, because it sounds like it does. But it doesn’t always.

In reality, AI and GenAI in spend data classification are not there yet. It may sound like a wonderful tool where you can just press a button and your data is magically classified, but the reality is very different. AI has to learn from an immense number of training data sets, and these data sets need to be clean and accurate. How do we do this? We use people, preferably subject matter experts (SMEs), to check and verify those training data sets. And to add more complexity into the mix, context is hugely important in spend data classification, and something that is still an issue within AI, GenAI and Machine Learning. So, humans still need to be heavily involved.

AI and GenAI will be here to help us, not to take over. Think of it more as a flower-chasing Wall-E type of situation than a gun-wielding Terminator, and it becomes much less intimidating. The truth is, IT needs US. And to some extent, we need it too, but only when it’s correct. And for those jobs we really don’t want to do…

Very simply, the AI, in whichever form, learns from training data sets that have been previously classified, and it keeps learning as new data is added. This is particularly successful in areas such as MRO (Maintenance, Repair and Operations), where the items are hardware such as nuts, bolts and screws. Their classification is unlikely to change, and so large volumes of data can be classified accurately.
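As a rough illustration of learning from previously classified data, here is a minimal sketch. It is not how any particular vendor's tool works: the training lines, the category names and the simple word-vote approach are all made up for the example, and real systems are far more sophisticated.

```python
from collections import Counter, defaultdict

# Hypothetical training set: (item description, category) pairs that a
# person has already classified and verified.
TRAINING = [
    ("M8 hex bolt zinc plated", "MRO > Fasteners"),
    ("M6 hex nut stainless", "MRO > Fasteners"),
    ("wood screw 4x40", "MRO > Fasteners"),
    ("safety gloves size L", "MRO > PPE"),
]

def train(examples):
    """Count how often each word appears under each category."""
    word_votes = defaultdict(Counter)
    for description, category in examples:
        for word in description.lower().split():
            word_votes[word][category] += 1
    return word_votes

def classify(description, word_votes):
    """Return the category with the most word votes, or None if no word is known."""
    totals = Counter()
    for word in description.lower().split():
        totals.update(word_votes.get(word, {}))
    if not totals:
        return None  # nothing learned yet: a human needs to classify this line
    return totals.most_common(1)[0][0]

model = train(TRAINING)
print(classify("M10 hex bolt", model))           # matches words seen under fasteners
print(classify("catering for away day", model))  # no known words, so no answer
```

The point to notice is that the sketch can only be as good as the verified examples it was given, and that it has nothing to say about lines containing no words it has seen before.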

There are, however, other instances where it is not so clear cut. For example, DHL could be classified as a courier in the training sets, so the AI will learn from this and classify DHL as a courier whenever it shows up. But DHL is not always used as a courier. DHL can also be used for warehousing, distribution and logistics depending on who the data set belongs to, and so it’s important to make this distinction, particularly for spend management.
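To make the DHL point concrete, here is a small sketch of a supplier-only default being overridden by context. The category names, keywords and rule tables are hypothetical and would need to be agreed with whoever owns the data set.

```python
# Hypothetical default learned from training data: supplier name only.
SUPPLIER_DEFAULTS = {"DHL": "Logistics > Courier"}

# Hypothetical context rules for this particular data set:
# (supplier, keyword in the description) -> category.
CONTEXT_RULES = {
    ("DHL", "warehouse"): "Logistics > Warehousing",
    ("DHL", "distribution"): "Logistics > Distribution",
}

def classify_line(supplier, description):
    """Prefer a context rule; fall back to the supplier default, else None."""
    text = description.lower()
    for (rule_supplier, keyword), category in CONTEXT_RULES.items():
        if supplier == rule_supplier and keyword in text:
            return category
    return SUPPLIER_DEFAULTS.get(supplier)

print(classify_line("DHL", "Parcel delivery to customer"))
print(classify_line("DHL", "Monthly warehouse storage fee"))
```

Without the context rules, every DHL line would land in the courier category, which is exactly the kind of mistake that distorts spend management reporting.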

And then there are free-text descriptions, plus many suppliers provide multiple products or services, so it’s harder for the AI to learn. If you have a description “taxi from hotel to restaurant”, how does the AI know which category to choose: taxi, hotel or restaurant? With the prevalence of GenAI, the tech is getting better, but you can’t blindly trust it.
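The taxi example can be sketched in a few lines. This assumes a simple keyword map with made-up category names, which is only one of many possible approaches; the idea is that a line matching more than one category gets flagged for a person rather than guessed at.

```python
# Hypothetical keyword-to-category map.
KEYWORDS = {
    "taxi": "Travel > Ground Transport",
    "hotel": "Travel > Accommodation",
    "restaurant": "Travel > Meals",
}

def candidate_categories(description):
    """List every category whose keyword appears in the description."""
    words = description.lower().split()
    return [KEYWORDS[word] for word in words if word in KEYWORDS]

def classify_or_flag(description):
    """Return (category, needs_review). Ambiguous or unknown lines are flagged."""
    candidates = set(candidate_categories(description))
    if len(candidates) == 1:
        return candidates.pop(), False
    return None, True

print(classify_or_flag("taxi to airport"))
print(classify_or_flag("taxi from hotel to restaurant"))
```

The second line matches three categories, so the sketch hands it to a human instead of picking one.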

That’s why it is absolutely crucial that the AI learns accurate data from a human from day one and is checked regularly for accuracy. What if there’s a glitch in the Matrix and the AI learns something it shouldn’t? (I know I’m mixing my movie metaphors here, but can you imagine what a film that’d be with Keanu Reeves AND Arnold Schwarzenegger?!). Imagine the implications of one tiny mistake in the early stages of the AI’s learning – let alone hundreds, if not thousands? That’s a lot of spend misclassified right there.
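A regular accuracy check can be as simple as comparing a sample of the AI’s output against the same lines classified by a person. The sketch below assumes both sets of labels are keyed by a line id; the ids, categories and function name are invented for illustration.

```python
def review_sample(ai_labels, human_labels):
    """Compare AI categories with human-verified ones for the same line ids.

    Returns (accuracy, mismatches), where mismatches maps each line id to
    the pair (AI category, human category).
    """
    mismatches = {
        line_id: (ai_labels.get(line_id), human)
        for line_id, human in human_labels.items()
        if ai_labels.get(line_id) != human
    }
    checked = len(human_labels)
    accuracy = (checked - len(mismatches)) / checked if checked else 0.0
    return accuracy, mismatches

# Hypothetical sample of four lines checked by a person.
ai = {1: "Courier", 2: "Courier", 3: "Fasteners", 4: "PPE"}
human = {1: "Courier", 2: "Warehousing", 3: "Fasteners", 4: "PPE"}

accuracy, mismatches = review_sample(ai, human)
print(accuracy)    # share of sampled lines where AI and human agree
print(mismatches)  # the lines to correct and feed back into training
```

Running a check like this on every batch means a mistake is caught after a handful of lines, not after thousands.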

Picture being several processes down the line and realising a whole section of data has been wrongly classified over and over and… over again. And it wasn’t spotted because the AI had been left to do its thang. Someone will then need to manually classify your data while the AI is retrained. Not to mention having to go back through, find and correct the earlier mistakes.

To put it bluntly, without accurate human input the expensive software or service will be about as much use as a cardboard box on four wheels during an episode of Robot Wars.

When it comes to spend analytics tools, there is no perfect solution yet. There are a lot of different methods used in machine learning/AI for data classification because there’s no single system that’s better than any other (at the moment). There’s no one method that’s leading the way, or has been perfected, and with GenAI entering the picture this adds another layer of complexity while we learn to understand its capabilities and limitations. That means you still need capable and adaptable humans, who have the know-how to manually classify and check data. (I know a good book for that…)

If you want to know any more about preparing your data for AI classification, checking it or classifying your spend data, then don’t hesitate to get in touch – it’s susan@theclassificationguru.com.

Susan Walsh

Expert in Data Management

Susan is a specialist in data classification, taxonomy customization, and data cleansing, as well as the founder of The Classification Guru. She is an industry thought leader, TEDx speaker, and author of ‘Between the Spreadsheets: Classifying and Fixing Dirty Data’. Susan has also developed a methodology to accurately and efficiently classify, cleanse and check data for errors, which will help prevent costly mistakes.
