Building Resilient Systems in a World Without Predictability

Krishna Sai, Chief Technology Officer at SolarWinds, walks us through some best practices for building resilient systems in an unpredictable world. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
Today’s uncertain economic climate may compel your organization to look for cost-cutting measures, and that isn’t necessarily a bad thing. When done strategically, and not just to slash budgets for the short term, eliminating inefficiencies and optimizing processes can strengthen the business. Because today’s IT environments are so unpredictable, however, comprehensive observability, a vital function for improving efficiency, can be difficult to achieve. With the right approach, tools, and perspective, IT leaders can still build resilient IT environments and prepare for today’s unpredictability.
The Unpredictable IT Landscape
By definition, a proper observability framework allows an organization to maintain constant monitoring over its entire IT infrastructure. Understanding the relationships among the components of that infrastructure is what lets teams determine whether it is operating correctly. Unfortunately, multiple factors contribute to the unpredictability of today’s IT landscape, making observability much more difficult.
The scale and nature of IT environments have changed drastically in the last few years. Goldman Sachs projects that cloud computing sales will reach US$2 trillion by the end of 2030, suggesting IT environments will reach unprecedented sizes by the end of the decade. Despite this larger investment in cloud resources, today’s companies are not leaning solely on the cloud; many enterprises are looking to strike a balance between cloud and on-premises infrastructure.
As a result, IT leaders are responsible for managing complex, hybrid IT environments. In fact, according to data from a recent public sector SolarWinds AI and Observability report, three-quarters of respondents indicated hybrid environments were difficult to manage, with data protection and data privacy appearing as top concerns. IT managers said their issues with complexity spring from the need to secure and integrate multiple infrastructures.
A number of cybersecurity factors also contribute to unpredictability. More than half of respondents (58 percent) said cybersecurity mistakes by untrained insiders, or by people authorized to be in their networks, were among the most significant security threats.
Simultaneously, 59 percent said “general hacking communities” also contributed. It’s important to note that hacking has become far more sophisticated in recent years. AI, for example, has made hacking more ubiquitous, allowing both trained and untrained attackers to amplify and improve their attacks on IT environments. The best way to handle such unforeseen circumstances is for our internal observability functions to operate like the human brain.
An Intelligent Observability Function
If we think about it, the human brain is the most powerful observability system. It can analyze and assess constant noise in and around the body. It can also subconsciously suppress activity that doesn’t need immediate attention while allowing us to consciously trigger a response to the issues that do.
What IT leaders need is an observability function that operates in the same way. However, the usual architecture of most observability frameworks makes this “human brain” approach difficult.
Many enterprises have hybrid IT architectures that leverage different observability tools for their on-premises and cloud environments. Further, detection (the subconscious recognition of activity) and remediation (the conscious triggering of a fix) are often handled by two separate solutions in the environment. This creates a gap between when a problem begins and when you can solve it.
The problem for IT leaders is that, in many cases, it doesn’t matter how many separate solutions you have—you are often responsible for both detection and remediation in all of them. In a world of IT unpredictability, there is little time for gaps in how quickly your systems find something wrong and address it. Any delay in remediation will only increase the time and difficulty of fixing the problem.
Comprehensive observability solutions close this gap and reduce mean time to remediate (MTTR). They integrate with your on-premises data center, your cloud solutions, and the remediation services needed to resolve IT issues, removing silos and enabling precise incident detection. Such a solution can also gauge the severity of any unusual activity: is a ransomware attack happening, or did Joe from accounting try to access a work document with his email again?
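To make the idea concrete, here is a minimal sketch of what a single detection-to-triage-to-remediation loop might look like. It is illustrative only, not SolarWinds code or any vendor’s API; the Alert type, the classify_severity and remediate functions, and the thresholds they use are all hypothetical assumptions.

```python
# Minimal sketch of a unified detection-to-remediation loop.
# All names and thresholds are hypothetical and illustrate the concept only.

from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1        # routine activity, safe to suppress
    WARNING = 2     # unusual but not yet harmful
    CRITICAL = 3    # e.g., suspected ransomware


@dataclass
class Alert:
    source: str                    # "on-prem", "cloud", etc.
    signal: str                    # what was observed
    failed_logins: int
    files_encrypted_per_min: int


def classify_severity(alert: Alert) -> Severity:
    """Triage the alert in the same pipeline that detected it,
    so detection and assessment share the same context."""
    if alert.files_encrypted_per_min > 100:
        return Severity.CRITICAL
    if alert.failed_logins > 5:
        return Severity.WARNING
    return Severity.INFO


def remediate(alert: Alert, severity: Severity) -> str:
    """Trigger a response immediately instead of handing off to a
    separate tool; avoiding that handoff is what shrinks MTTR."""
    if severity is Severity.CRITICAL:
        return f"isolate host and page on-call ({alert.source})"
    if severity is Severity.WARNING:
        return f"lock account and notify user ({alert.source})"
    return "log and suppress"  # the 'subconscious' path: no action needed


if __name__ == "__main__":
    # A benign event (Joe from accounting) vs. a likely ransomware attack.
    benign = Alert("cloud", "failed document access",
                   failed_logins=2, files_encrypted_per_min=0)
    attack = Alert("on-prem", "mass file encryption",
                   failed_logins=0, files_encrypted_per_min=500)
    for a in (benign, attack):
        sev = classify_severity(a)
        print(sev.name, "->", remediate(a, sev))
```

The point of the sketch is the single pipeline: the same system that spots the anomaly decides how serious it is and triggers the fix, rather than passing the problem between disconnected tools.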
For today’s IT leaders, resilience in a world of unpredictability is measured by your team’s ability to recognize something you didn’t see coming, figure out why it’s happening, and address it quickly. The right observability solution is the backbone of this resilience.
Prepared for Anything
As AI automates more of our systems and enterprises continue to work out the right balance between the cloud and on-premises, unpredictability will only grow. Hackers and foreign adversaries will keep targeting your IT environment even as you work to optimize your IT assets. This is why observability and resilience are so critical, even during a push to maximize resources. It’s not just about protecting what you already have: the right observability tools and best practices allow you to preserve your current environment while you work to improve your business for the future.