Proactive IT Resilience: Preparing for the Next Big Disruption
David Chen, the Director of IT Services at Laserfiche, explains why proactive IT resilience will help companies prepare for whatever the next disruption is. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
In July 2024, a faulty update to CrowdStrike’s Falcon Sensor software triggered one of the largest IT outages in history. Roughly 8.5 million Microsoft Windows systems crashed and could not restart properly. The incident became a wake-up call for the entire IT industry, demonstrating that even the most advanced cybersecurity solutions are vulnerable to errors that cause widespread disruption. As we navigate an era increasingly dependent on technology, the risk of such disruptions isn’t just a possibility; it’s an inevitability.
For organizations worldwide, the CrowdStrike incident serves as a potent reminder that IT resilience isn’t built in a day or in the aftermath of a crisis. Instead, it requires a forward-thinking approach where potential threats are anticipated, and strategic measures are implemented well in advance. Preparing for these disruptions involves more than just patching vulnerabilities as they appear; it demands a fundamental shift toward a culture of resilience, adaptability, and ongoing risk management. Here’s how IT departments can develop a resilient strategy to face future disruptions.
1) Vendor Risk Assessment: A Foundation for Resilience
When an organization relies on third-party vendors, those vendors’ vulnerabilities become the organization’s own risks. It’s essential to conduct thorough vendor risk assessments and to monitor each vendor’s risk profile continuously. This involves establishing rigorous vetting processes that include legal, executive, and security checks, so that compliance requirements are met and potential vulnerabilities are identified and addressed early.
A “trust but verify” mindset is crucial: regularly audit vendors’ security practices and performance while also assuming that any data a vendor holds could be compromised or lost at any time. This approach means backing up all critical data independently and maintaining control over information, even when working with trusted vendors. Negotiating comprehensive service level agreements (SLAs) that define expected recovery time objectives (RTOs) in the event of a disruption also provides a clear understanding of the vendor’s capabilities and limitations. Such safeguards help the organization remain operational even if a key vendor experiences an outage.
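As a rough illustration of how the SLA and backup checks above might be tracked, the following Python sketch flags vendors whose contracted RTO is longer than the downtime the business can tolerate, or whose data has no independent backup. The `VendorSla` fields, the `find_resilience_gaps` function, and the sample figures are all hypothetical and are not drawn from any real vendor or tool.

```python
from dataclasses import dataclass


@dataclass
class VendorSla:
    """Hypothetical record of what a vendor contract promises."""
    name: str
    contracted_rto_hours: float   # recovery time the vendor commits to in its SLA
    has_independent_backup: bool  # True if we keep our own copy of the critical data


def find_resilience_gaps(vendors, max_tolerable_downtime_hours):
    """Return {vendor name: [reasons]} for vendors that need follow-up."""
    gaps = {}
    for vendor in vendors:
        reasons = []
        if vendor.contracted_rto_hours > max_tolerable_downtime_hours:
            reasons.append("SLA RTO exceeds tolerable downtime")
        if not vendor.has_independent_backup:
            reasons.append("no independent backup of critical data")
        if reasons:
            gaps[vendor.name] = reasons
    return gaps


if __name__ == "__main__":
    # Illustrative values only.
    vendors = [
        VendorSla("crm", contracted_rto_hours=4, has_independent_backup=True),
        VendorSla("payroll", contracted_rto_hours=24, has_independent_backup=True),
        VendorSla("docs", contracted_rto_hours=2, has_independent_backup=False),
    ]
    for name, reasons in find_resilience_gaps(vendors, 8).items():
        print(name, "->", "; ".join(reasons))
```

A check like this is only as good as the inventory behind it, so it complements rather than replaces the audits and contract reviews described above.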
2) Out-of-Band Communication: Ensuring Connectivity During Outages
A sometimes overlooked aspect of crisis management is the importance of maintaining reliable communication channels. During significant IT disruptions, primary communication platforms may become compromised or entirely inaccessible. For this reason, establishing out-of-band communication methods ahead of time is a critical aspect of any robust response plan.
Organizations should implement multiple communication platforms, ensuring redundancy in case one fails. For example, utilizing both Teams and Zoom can provide a fallback option when a primary channel is disrupted. Additionally, maintaining an offline record of crucial personnel contact information, stored securely but accessible to those who may need it, ensures that communication lines remain intact even if digital systems are down.
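For teams that send automated alerts, the same redundancy idea can be sketched in a few lines of Python. This is a minimal sketch, not a real integration: the `send_with_fallback` function is hypothetical, and each channel is a placeholder callable standing in for whatever platform an organization actually uses.

```python
def send_with_fallback(message, channels):
    """Try each (name, send) pair in order and stop at the first success.

    `channels` is an ordered list of (name, callable) pairs; each callable
    is a placeholder that should raise an exception when delivery fails.
    Returns the name of the channel that delivered, plus any errors seen.
    """
    errors = {}
    for name, send in channels:
        try:
            send(message)
            return name, errors
        except Exception as exc:  # any failure means: move to the next channel
            errors[name] = str(exc)
    raise RuntimeError(f"all channels failed: {errors}")


if __name__ == "__main__":
    delivered = []

    def primary(msg):
        # Simulates the primary platform being down during an outage.
        raise ConnectionError("primary platform unreachable")

    def backup(msg):
        delivered.append(msg)

    used, errors = send_with_fallback(
        "Incident declared", [("primary", primary), ("backup", backup)]
    )
    print("delivered via", used, "after errors:", errors)
```

If every channel fails, the function raises, which is the point at which the offline contact list becomes the fallback.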
Frequent drills and tabletop exercises are essential to ensuring the effectiveness of these strategies. This level of preparedness ensures that employees are not only aware of the protocols but can execute them without hesitation, minimizing downtime and confusion when disruptions occur.
3) Lessons from Recent Disruptions: Adapting and Improving
The CrowdStrike incident is a stark reminder that even well-established security protocols can fail unexpectedly. Beyond having response plans, organizations must constantly refine and adapt these strategies based on emerging threats and lessons learned from past disruptions. In the wake of the CrowdStrike disruption, organizations that responded most effectively had pre-established protocols and experienced engineers who could act swiftly. This level of preparedness, coupled with a shared understanding of the response plan, allowed for rapid system recovery and reduced overall impact.
Another primary lesson is the value of phased rollouts for critical software updates. By implementing changes incrementally, IT teams can identify potential issues early, significantly reducing the risk of widespread system failures. This approach requires a level of patience and diligence, but the payoff in risk mitigation is invaluable.
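The phased approach can be sketched as a simple loop over deployment rings. The Python below is an illustrative sketch under stated assumptions, not a description of how any vendor ships updates: `phased_rollout`, the ring percentages, and the `apply_update` and `is_healthy` callables are all hypothetical placeholders for an organization’s own deployment and monitoring tooling.

```python
def phased_rollout(hosts, apply_update, is_healthy,
                   ring_percentages=(1, 10, 100), max_failure_rate=0.0):
    """Update hosts ring by ring, halting if a ring looks unhealthy.

    ring_percentages are cumulative: (1, 10, 100) means the first 1% of
    hosts, then up to 10%, then everyone. Returns (completed, updated, failed).
    """
    total = len(hosts)
    updated = []
    start = 0
    for pct in ring_percentages:
        # Integer ceiling of total * pct / 100; each ring gets at least one host.
        target = (total * pct + 99) // 100
        end = min(total, max(start + 1, target))
        ring = hosts[start:end]
        if not ring:
            break
        for host in ring:
            apply_update(host)
        updated.extend(ring)
        failed = [host for host in ring if not is_healthy(host)]
        if len(failed) / len(ring) > max_failure_rate:
            # Stop here so later rings never receive the faulty update.
            return False, updated, failed
        start = end
    return True, updated, []


if __name__ == "__main__":
    hosts = [f"host-{i}" for i in range(100)]
    broken = {"host-5"}  # pretend the update breaks this one machine
    completed, updated, failed = phased_rollout(
        hosts,
        apply_update=lambda host: None,
        is_healthy=lambda host: host not in broken,
    )
    print("completed:", completed)
    print("hosts touched:", len(updated), "failed:", failed)
```

In this example the problem surfaces in the second ring, so only 10 of the 100 hosts ever receive the update. The remaining 90 are never touched, which is the risk reduction the incremental approach is meant to buy.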
4) Cross-Functional Coordination: Strengthening Business Continuity
No IT disruption occurs in isolation, and the impact often extends beyond the IT department. Effective crisis management requires seamless coordination across all departments, from operations to customer service to executive leadership. Establishing clear lines of communication and responsibility ensures that each department understands its role and can contribute to a unified response.
As data sovereignty and compliance laws become increasingly complex, it is crucial to involve legal teams early in the planning process. Legal input ensures that an organization’s response aligns with all regulatory requirements, safeguarding against potential legal repercussions.
A structured incident response plan that integrates cross-departmental collaboration can make all the difference during a crisis. For example, while the IT department focuses on technical recovery, other teams—such as customer service and communications—must be prepared to address customer concerns and manage external communications. This level of coordination ensures that all aspects of the business continue to function, even if some systems are temporarily offline.
5) Building a Culture of Preparedness
A culture of preparedness and resilience is at the heart of any effective disruption strategy. This means going beyond mere policies and protocols to cultivate an organizational mindset that prioritizes risk management and continuity planning. Such a culture encourages proactive engagement with potential threats rather than reactive scrambling when disruptions occur.
This culture must start from the top, with executive leadership demonstrating commitment to preparedness by actively participating in drills and ensuring that all departments understand the critical nature of their roles. Regular testing and updates to response plans are vital, as they reveal gaps and allow teams to refine their strategies. It’s this ongoing commitment to testing, learning, and improving that transforms a theoretical plan into a living, breathing process capable of withstanding real-world challenges.
The CrowdStrike incident has made one thing abundantly clear: the question isn’t whether another IT disruption will occur but when. The organizations that weather such storms most effectively are those that take proactive steps to prepare, test, and refine their response strategies continually. This involves not only having a robust plan in place but also building a culture that values resilience, redundancy, and adaptability at every level. When the next disruption arrives—and it will—these organizations won’t just survive; they’ll emerge stronger and more capable of navigating the unpredictable landscape of modern IT challenges.