As part of Solutions Review’s Premium Content Series—a collection of contributed columns written by industry experts in maturing software categories— Mike Marks of Riverbed lays out why Unified Observability is the next phase of performance monitoring evolution, and how IT teams can adapt to this change.
Much of what gets written today about observability is in the context of the challenges that DevOps teams face in complicated, highly distributed cloud-native environments. In this context, observability has evolved from Application Performance Monitoring to help teams deal with the challenges of application monitoring, testing, and management in these environments. But observability applies to more than just cloud-native environments. IT teams face similar challenges managing other complex, highly distributed environments, and there's a need for a broader definition that covers this expanded set of challenges: Unified Observability.
Real Unified Observability is hard. Big companies can have as many as 10,000 employees who are globally distributed in swiftly changing hybrid work environments. Those employees can have 10,000 unique laptop configurations and access the Internet through nearly as many Wi-Fi set-ups. Yet all of these employees expect the same digital experience as if they were working from company HQ. Adding to the observability conundrum, a company can have 300,000 demanding customers and a technology infrastructure that mixes legacy on-premises and cloud applications, with all sorts of shadow IT lurking in the background.
To address the challenge of managing and optimizing today’s complex IT infrastructures, IT teams need an approach to Unified Observability that cuts across siloes and locales to collect information from all data sources with full fidelity.
Unified Observability: Performance Monitoring Evolved
Too Many Tools, Not the Right Data
According to a recent IDC poll, 90 percent of all IT teams use observability tools to monitor the current enterprise mix of geographies, applications, and networking requirements and to manage that collection of resources. Roughly half of those teams use six different observability tools, which together generate tens of thousands of alerts per day, far too many for any team to monitor manually. With all that data coming from a variety of siloed tools, each raising its own alerts, collecting the right data and ensuring important information does not slip through the cracks is a challenge. The challenge is made more difficult by limited or outdated tools. Roughly 60 percent of survey respondents said their organizations used tools that were too narrowly focused to manage the company's complex web of on-premises hardware configurations, cloud-based services, and legacy on-premises applications. And according to the same IDC survey, 61 percent of IT teams feel these limited tools and siloed data views hamper productivity and collaboration.
To address this challenge, IT teams are moving toward Unified Observability tools that unify the sources of telemetry across all domains and devices with full fidelity. These teams realize that sampling, capturing only some of the data, can lead to substantial blind spots. Imagine a retail company that captures only one in ten customer complaints on Black Friday. The company wouldn't know the full scope of the problem on a critical shopping day and would leave money in abandoned checkout carts. It's not just that low sample rates are bad; they tell an incomplete story, and an incomplete story can lead to wrong conclusions.
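The retail example above can be made concrete with a small simulation. This is an invented illustration, not data from the article: 10,000 checkout sessions with a 2 percent failure rate, observed once at full fidelity and once with 1-in-10 sampling.

```python
import random

random.seed(42)

# Hypothetical Black Friday traffic: 10,000 checkout sessions,
# of which roughly 2% fail with a payment error.
sessions = ["payment_error" if random.random() < 0.02 else "ok"
            for _ in range(10_000)]

# Full-fidelity capture sees every failure.
full_errors = sessions.count("payment_error")

# 1-in-10 sampling keeps only every tenth session.
sampled = sessions[::10]
sampled_errors = sampled.count("payment_error")

print(f"full-fidelity failures: {full_errors}")
print(f"failures seen by sampling: {sampled_errors}")
print(f"naive 10x extrapolation: {sampled_errors * 10}")
```

The sampled view sees only a fraction of the failures, and extrapolating from it introduces error that full-fidelity capture avoids; on a rarer failure mode, sampling could miss the problem entirely.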
The Human Element
To make sense of the data and uncover nuggets that lead to actionable insights, IT teams use Unified Observability tools with artificial intelligence and machine learning to detect anomalies more quickly and provide context for them. For IT teams suffering from alert fatigue, sifting the signal from the noise to pinpoint the problem behind, say, a delay hitting a customer service rep can be incredibly difficult, especially with full-fidelity telemetry delivering a flood of data. Historically, companies would turn to resource-intensive war rooms, which often produced a lot of finger-pointing but not necessarily quick resolutions. Or they would rely on a senior-level person, a guru, who, by the depth of their experience, could identify the source of the problem. But that same big-picture understanding is what makes gurus valuable for setting the strategic IT vision that drives the business. Having gurus troubleshoot problems across IT silos wastes that resource, and if the guru leaves, the company loses both its troubleshooter and a senior IT leader.
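To show the kind of anomaly detection described above in miniature, here is a deliberately simple sketch: a rolling-baseline z-score check over a latency series. The data, threshold, and method are illustrative assumptions, a toy stand-in for the ML-driven detection a Unified Observability platform would provide.

```python
from statistics import mean, stdev

def detect_anomalies(latencies_ms, window=10, threshold=3.0):
    """Flag points sitting more than `threshold` standard deviations
    above a rolling baseline of the previous `window` samples."""
    anomalies = []
    for i in range(window, len(latencies_ms)):
        baseline = latencies_ms[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (latencies_ms[i] - mu) / sigma > threshold:
            anomalies.append((i, latencies_ms[i]))
    return anomalies

# Steady ~50 ms latency with one obvious spike at index 15.
series = [50, 52, 49, 51, 50, 48, 53, 50, 49, 51,
          50, 52, 49, 50, 51, 400, 50, 49]
print(detect_anomalies(series))  # -> [(15, 400)]
```

The point of the rolling baseline is context: the spike is judged against recent normal behavior rather than a fixed limit, which is one way automated detection cuts the manual sifting that causes alert fatigue.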
Unified Observability leads to reduced ticketing and alert fatigue, improving the job satisfaction of junior and senior staff. Just as importantly, because Unified Observability cuts across silos, IT teams work collaboratively to remediate problems. At a time when IT staff are hard to find, Unified Observability plays a critical role in lessening the burden IT teams face.
Unified Observability and the Shift Left
Using automation tools powered by AI and ML, more junior IT staff can automate the runbooks they follow manually today. Many organizations have documented runbooks: when a specific problem appears, staff resolve it by following manual steps. Unified Observability tools let IT teams build workflow engines that automate these processes and remediate problems much faster. The workflow engines are also customizable, so teams can tweak them and feel confident they know exactly what the engine will do. With preconfigured libraries that can be customized to deliver automated actions for frequently encountered problems, IT staff can focus on higher-level tasks.
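The shape of such a workflow engine can be sketched in a few lines. Everything here is hypothetical: the alert names, the remediation steps, and the dispatch logic are invented to show how a documented runbook's manual steps become an ordered, automated sequence.

```python
# Hypothetical remediation steps, one function per manual runbook step.
def restart_service(ctx):
    return f"restarted {ctx['service']}"

def clear_cache(ctx):
    return f"cleared cache on {ctx['host']}"

def open_ticket(ctx):
    return f"escalated: {ctx['alert']}"

# Each runbook is an ordered list of steps, mirroring the manual
# procedure a junior engineer would otherwise follow by hand.
RUNBOOKS = {
    "service_down": [restart_service, open_ticket],
    "cache_bloat": [clear_cache],
}

def run_workflow(alert, ctx):
    """Dispatch an alert to its runbook; unknown alerts escalate."""
    steps = RUNBOOKS.get(alert)
    if steps is None:
        return [open_ticket({**ctx, "alert": alert})]
    return [step(ctx) for step in steps]

print(run_workflow("cache_bloat", {"host": "web-01"}))
```

Because the runbooks are plain data, teams can tweak a sequence, add a step, or require escalation, and know exactly what the engine will do, which is the customizability the paragraph above describes.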
Optimizing Productivity and Service
In the survey, 75 percent of the teams said they struggle to gain actual insight from their current multiplicity of observability tools. A Unified Observability approach enables IT teams to take all of the data in their organization, connect it, and glean actionable insights. Those insights help deliver a quality digital experience for end-users, keeping things running smoothly and securely and keeping employees happy and productive. At the same time, automating remediation improves agility, optimizes costs, and boosts service.
Given the high percentage of organizations already using observability tools, it is clear that most businesses understand the imperative of monitoring their infrastructure to deliver digital experiences that don't disappoint customers or employees. However, most companies are stuck using multiple legacy tools that have limited capability and do not provide a comprehensive view of network performance and end-user satisfaction. To understand operating conditions across their digital infrastructure, companies are moving toward Unified Observability, minimizing the number of tools overwhelmed IT teams must manage while maximizing their ability to find insights that improve organizational productivity.