This is part of Solutions Review’s Premium Content Series, a collection of contributed columns written by industry experts in maturing software categories. In this submission, Tamr Chief Product Officer Anthony Deighton asks whether the supervised machine learning approach is optimal and offers three key factors to consider.
Machine learning (ML) is everywhere, powering many aspects of modern technology so that things move faster and operate at scale. From the apps on our phones to the searches we conduct through popular web engines like Google and Yahoo!, ML is a driver behind many of our day-to-day activities, whether we’re aware of it or not.
ML and human expertise: you can’t have one without the other
ML is not a magic box that replaces human understanding, foresight, expertise, intellect, and decision-making. When data goes in and clean data comes out, human validation is essential to examine the results thoroughly and catch false positives. The advantage of ML is that it can assess data at scale. The disadvantage? Machines cannot analyze their own results the way humans do. As of 2022, machines are robust but still far from perfect or wholly reliable. They certainly cannot be left to their own devices. When clean data comes out, there is little to no transparency about how the machine produced it; that’s where human feedback proves both effective and essential. Humans improve data modeling.
On the other hand, we find processes that are 100 percent human-driven and lack the ML that would yield faster results. Companies hire tens, even hundreds, of people, often in low-cost areas, and ask them to resolve the data and the entities in it. Human-driven processes work, but they are labor-intensive. Mere mortals cannot operate at machine scale. For this reason, data science and machine learning tools greatly complement those human-driven processes, just as human expertise is the perfect complement to ML.
So, in a nutshell, you can’t have one without the other.
Is the Supervised Machine Learning Approach Optimal?
While a purely human process and a purely machine process are both options for master data management, there is a middle ground: the machine takes the lead, and humans provide guidance and feedback to improve its results. This is called “supervised ML” (in the sense of human-supervised, or human-in-the-loop, ML), a data-mastering approach that delivers the best outcomes.
Supervised ML combines the best of the machine’s capabilities with the best a human expert has to offer. Machines are very good at resolving data and entities at scale and with speed. They don’t get tired, which is a benefit, especially as data volumes continually increase. On the other hand, humans are skilled at providing feedback and ensuring that the machine’s lightning-speed results are, in fact, accurate. And the more feedback human analysts offer, the better the machine becomes. Another benefit of human involvement is trust. When humans participate in the process and are able to validate the results, they are more likely to trust the data.
Furthermore, when humans trust the data, they will likely use it in analytics and drive future decisions.
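To make the feedback loop concrete, here is a minimal sketch of how a machine might decide the clear cases itself and hand the uncertain ones to a human expert. Everything in it is an assumption for illustration: the `ReviewLoop` class, the two thresholds, and the use of simple string similarity as a stand-in for a trained model are hypothetical, not a description of any particular product.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class ReviewLoop:
    """Hypothetical human-in-the-loop matcher (illustration only)."""
    auto_accept: float = 0.90  # assumed: at or above this score, the machine says "match"
    auto_reject: float = 0.40  # assumed: at or below this score, the machine says "no match"
    labels: list = field(default_factory=list)  # human answers, kept for later retraining

    def score(self, a: str, b: str) -> float:
        # Stand-in for a trained model: case-insensitive string similarity in [0, 1].
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def decide(self, a: str, b: str, ask_human):
        """Return (is_match, who_decided) for one candidate pair of records."""
        s = self.score(a, b)
        if s >= self.auto_accept:
            return True, "machine"
        if s <= self.auto_reject:
            return False, "machine"
        # The machine is unsure, so a person decides and the answer is recorded.
        answer = ask_human(a, b)
        self.labels.append((a, b, answer))
        return answer, "human"


loop = ReviewLoop()
# A reviewer callback; here it simply confirms every pair it is shown.
print(loop.decide("Acme Corp", "Acme Corporation", lambda a, b: True))
```

The recorded `labels` are the part that matters for the argument above: each human answer becomes a training example, which is how more feedback can make the machine better over time.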
The Tesla and the Fire Truck: An Analogy
Let’s look at an analogy to illustrate my point. Companies like Tesla are touting the benefits of their self-driving cars. And they believe that the black box model delivers a better outcome – a mistake! Why? Because self-driving cars work well – until they don’t.
They don’t know what to do when they encounter a situation they’ve never seen before. And these cars don’t know how to anticipate the outcome. This scenario actually did happen with a Tesla. The car was driving itself and a stopped fire truck was ahead. The Tesla didn’t stop and ended up crashing into the fire truck. So why didn’t it stop? Because the situation was unknown to the machine, which did not anticipate a crash as the outcome. Human-supervised ML for driving is the best of both worlds: it combines the power of the machine with human oversight that guides the machine and establishes rules and algorithms to anticipate and plan for future scenarios.
For instance, if a human had been guiding the Tesla, that person could have applied the brake and stopped the car before it crashed into the fire truck. In the future, with human guidance, the machine would recognize the situation and know to apply the brake before hitting the fire truck.
Human-in-the-Loop ML Data Mastering
Just as human-supervised driving delivers the best outcomes, so does human-in-the-loop ML data mastering. Organizations benefit from the power of the machine to clean and curate data from a myriad of sources across multiple data silos, while also reaping the value of human feedback. This process ensures that the machine stays on track and delivers the best result.
Here’s an equation I like to use to illustrate how the best modern data mastering solutions work:
Modern Data Mastering = 80 percent machine + 10 percent humans + 10 percent rules.
With modern data mastering, you benefit from the power that machine learning provides and the valuable feedback only humans can contribute. The result? Better, more accurate data and higher levels of trust.
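One way to read the equation is as a split of who makes each decision. The sketch below tallies that split for a batch of candidate record pairs. It is a toy under stated assumptions: the `route` and `shares` functions, the tax-ID rule, the confidence cutoff, and the order in which rules, machine, and human are consulted are all hypothetical, and the sample data is made up to reproduce the 80/10/10 split rather than measured from any real system.

```python
from collections import Counter


def route(pair, confident=0.9):
    """Say who decides a candidate pair: 'rules', 'machine', or 'human'."""
    # Hypothetical rule: identical, non-empty tax IDs always count as a match.
    if pair.get("tax_id_a") and pair.get("tax_id_a") == pair.get("tax_id_b"):
        return "rules"
    # Hypothetical model score in [0, 1]; a score near 0 or 1 means the model is sure.
    score = pair["score"]
    if score >= confident or score <= 1 - confident:
        return "machine"
    # Everything the model is unsure about goes to a human reviewer.
    return "human"


def shares(pairs):
    """Percentage of pairs decided by each party (integer percent)."""
    counts = Counter(route(p) for p in pairs)
    total = len(pairs) or 1  # avoid dividing by zero on an empty batch
    return {who: 100 * counts[who] // total for who in ("machine", "human", "rules")}


# Made-up batch of ten pairs, chosen only to illustrate the equation.
toy = (
    [{"score": 0.97}] * 5 + [{"score": 0.03}] * 3  # model is sure either way
    + [{"score": 0.55}]                             # model is unsure
    + [{"score": 0.55, "tax_id_a": "12-345", "tax_id_b": "12-345"}]  # rule fires
)
print(shares(toy))  # {'machine': 80, 'human': 10, 'rules': 10}
```

In a real deployment the shares would be an outcome to monitor rather than a target to hit; a rising human share, for example, would suggest the machine is seeing data it has not learned to handle.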
Machine learning is a great innovation, but we still need humans!
- Is the Supervised Machine Learning Approach Optimal? 3 Factors - October 14, 2022