Timing is Everything: Why Microsecond Visibility is Essential for Troubleshooting


Solutions Review’s Expert Insights Series is a collection of contributed articles written by industry experts in enterprise software categories. Michael Bacon of Accedian lays out why every (micro)second counts, and what a lack of microsecond visibility can do to your network.

By the time you finish reading this sentence, how many things have gone wrong in your network? And how would you know? The fact is, you probably wouldn’t, not if you’re using a traditional network monitoring system (NMS). Such systems typically analyze data only every minute, every five minutes, or even less frequently, which is a lifetime in many enterprise verticals.

Take financial services, for example. The European Union and the U.S. Securities and Exchange Commission (SEC) have rules requiring traders to tightly synchronize their clocks. Under Financial Industry Regulatory Authority (FINRA) 7340, for instance, U.S. firms must keep their clocks within one second of the official time from the National Institute of Standards and Technology (NIST). Another example is the manufacturing sector, in which time-sensitive networking (TSN) is the foundation for highly automated factories and industrial robotics. Many TSN applications have latency limits ranging from two milliseconds (ms) to as little as 125 microseconds (μs).

Microsecond Visibility and Modern Troubleshooting

When Everything Looks Fine, But It’s Not

It’s tempting to think that microsecond visibility is overkill for most other enterprises, but that’s not true. In fact, it’s become a must-have simply because of how complex today’s enterprise IT environments are. The cloud is a significant driver behind this complexity. In 2020, 20 percent of enterprise workloads were in the cloud. By 2023, the amount will double to 40 percent, according to Gartner. Organizations are grappling with multiple cloud providers and hybrid cloud approaches, which makes debugging more difficult. The network is vastly more complex. Because of this, more than ever before, IT departments need microsecond visibility into the network links connecting employees, IoT devices, customers, and business partners to multiple public and private clouds.

A traditional NMS that analyzes data every few minutes struggles in this highly complex new world. If there’s a micro outage (consecutive packets that are lost because the network is unresponsive) and it occurs between polling intervals, the NMS won’t catch it. Instead, it will report that everything is fine, even though it’s not. That micro outage might be the first sign of a link that’s on its way to total failure, cutting off employees from their cloud CRM or EPS for an hour or an afternoon. This could lead to countless hours of lost productivity. That’s just one example of the business impact of flying blind.

Even if the link isn’t headed for complete failure, those micro outages can still cause numerous performance issues. Suppose the network is unresponsive for two or three seconds every 10 minutes. That might sound brief, but it’s a long time for bursty applications such as Microsoft Teams, Zoom, and other video collaboration platforms. Additionally, all of the applications using that link now have to retransmit their lost packets, which unnecessarily increases bandwidth usage. That clogs up the link and can eventually degrade performance to the point that users complain. Then the IT team has to scramble to track down the root cause – and wonder why their NMS was consistently green when it should have been flashing red.
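To make the blind spot concrete, here is a minimal sketch of how the same outage looks at two analysis intervals. It is an illustration under stated assumptions, not how any particular NMS or monitoring product works: the 20 packets-per-second probe rate, the position of the three-second outage, and the `loss_by_window` helper are all hypothetical.

```python
def loss_by_window(received, pps, window_s):
    """Return the loss ratio for each analysis window of `window_s` seconds."""
    per_window = pps * window_s
    ratios = []
    for start in range(0, len(received), per_window):
        chunk = received[start:start + per_window]
        lost = sum(1 for ok in chunk if not ok)
        ratios.append(lost / len(chunk))
    return ratios


PPS = 20                 # assumed probe rate (packets per second)
DURATION_S = 600         # ten minutes of test traffic
OUTAGE_START_S = 299     # hypothetical start of the micro outage
OUTAGE_LEN_S = 3         # network unresponsive for three seconds

# True = packet arrived, False = packet lost.
received = [True] * (PPS * DURATION_S)
for i in range(OUTAGE_START_S * PPS, (OUTAGE_START_S + OUTAGE_LEN_S) * PPS):
    received[i] = False

coarse = loss_by_window(received, PPS, window_s=600)  # one ten-minute window
fine = loss_by_window(received, PPS, window_s=1)      # one-second windows

full_loss_seconds = sum(1 for r in fine if r == 1.0)

print(coarse)             # loss ratio averaged over the whole ten minutes
print(full_loss_seconds)  # number of one-second windows with total loss
```

Averaged over ten minutes, the outage amounts to 0.5 percent packet loss, a figure that could plausibly sit under an alert threshold. At one-second granularity, the same data shows three windows in which every packet was lost.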

Microsecond Visibility: How Active Monitoring Eliminates Blind Spots

Enterprises can expose these blind spots with active monitoring solutions that use the Two-Way Active Measurement Protocol (TWAMP) and track key performance indicators (KPIs) such as the loss burst maximum and the loss burst minimum. The system generates test packets at a fixed rate, for example 1,200 packets in a 60-second analysis interval, and then calculates how many consecutive packets were lost within that interval.

To assist the IT team during a troubleshooting session, active monitoring software can increase the number of generated packets per second (PPS) and reduce the analysis interval window all the way down to one second. This additional granularity is critical for ferreting out problems that remain hidden in a traditional NMS’s blind spots. A lot can happen in just one second. For example, at 20 PPS the test packets are spaced 50 ms apart, so losing eight consecutive packets means that the network was unresponsive for 400 ms. During that 400 ms, applications such as video collaboration and point-of-sale terminals must retransmit all of their packets. Without active monitoring granularity revealing why they’re retransmitting, the IT department would be left to wonder why network performance is lagging as it struggles to keep up with the additional traffic.
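The loss burst arithmetic can be sketched in a few lines. This is an illustration only, not the TWAMP protocol itself or any vendor's implementation: it assumes the monitoring system already has a per-packet record of which test packets came back, and the function names `loss_bursts` and `burst_duration_ms` are hypothetical.

```python
def loss_bursts(received):
    """Return the length of every run of consecutive lost packets."""
    bursts = []
    run = 0
    for ok in received:
        if ok:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:  # a burst that is still open at the end of the interval
        bursts.append(run)
    return bursts


def burst_duration_ms(burst_len, pps):
    """Convert a burst length in packets to time, given the probe rate."""
    return burst_len * 1000 / pps


PPS = 20  # probe rate used in the example above

# One second of test traffic: 5 received, 8 lost in a row, 7 received.
one_second = [True] * 5 + [False] * 8 + [True] * 7

bursts = loss_bursts(one_second)
burst_max = max(bursts, default=0)
burst_min = min(bursts, default=0)

print(burst_max, burst_min, burst_duration_ms(burst_max, PPS))  # 8 8 400.0
```

With a single burst in the interval, the maximum and minimum are the same. In a longer interval with several bursts, the two values would bracket the shortest and longest periods of unresponsiveness.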

Here’s a real-world example: Goya Foods, a large U.S.-based food company, sought to increase the performance of its networks and services to support current initiatives and encourage future growth. The company manages critical information flow across disparate locations, data centers, cloud environments, and real-time manufacturing plants. It needed a solution that could provide microsecond-level visibility into the performance of each of these, while also enabling the company to continually scrutinize and optimize user and application performance and to quickly identify the root cause as soon as issues are detected. And if a degradation turns out to be due to a cybersecurity breach, Goya can catch it before it spreads across a plant or the entire company.

Using Next-Gen Performance Monitoring for Success in 2023

Mobile operators and other service providers understand the importance of microsecond visibility for nipping problems in the bud. That’s why they rely on next-gen performance monitoring solutions to detect micro outages and other emerging performance issues so they can be fixed before applications and users start to notice. More enterprises are now following suit. As we settle into the new year, it’s the perfect time to start considering how this approach can improve your enterprise operations, too.

Michael Bacon