CrowdStrike Cost the Global Economy Billions in One Day. Can They Recover? Lessons from Someone Who Did

Crowdstrike Lessons


Mehdi Daoudi, the CEO and co-founder of Catchpoint, gives his commentary on whether CrowdStrike can recover from its recent outage and shares some lessons he learned from a similar situation he faced. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.

The recent CrowdStrike outage has had a staggering impact on the global economy, with losses running into billions. As someone who has faced a similar crisis, I understand the immense challenge of recovering from such an event. But recovery is possible, and there are valuable lessons to be learned.

In 1999, while working at DoubleClick, I inadvertently caused a major outage by deleting a seemingly insignificant file. The fallout was immediate and severe, involving operational chaos, breached service level agreements (SLAs), and millions in financial losses. The incident taught me crucial lessons about the importance of resilience and preparedness, lessons that are highly relevant for CrowdStrike today.

The first step towards recovery is acknowledging the severity of the incident. CrowdStrike must openly address the scale of the outage and its implications. This involves detailed communication with its clients, stakeholders, and the public, explaining what went wrong and what steps are being taken to rectify the situation.

CrowdStrike needs to scrutinize its change management processes. The incident was triggered by a routine software update, highlighting a critical failure in its testing and validation protocols. By implementing more rigorous testing procedures, CrowdStrike can ensure that updates are thoroughly vetted before deployment. This includes testing on a wide range of systems and configurations to catch potential issues.

Building a culture of resilience is essential. This means regular training and drills for employees to ensure they are prepared for crises. It also involves establishing clear incident response plans that can be quickly activated when problems arise. Resilience is about being ready to respond effectively, minimizing downtime, and mitigating financial and reputational damage.

Transparency and communication are crucial. CrowdStrike must keep its clients and stakeholders informed throughout the recovery process to rebuild trust and demonstrate its commitment to resolving the issue. Apologies and compensation for affected customers are also necessary steps to repair relationships and restore confidence. While I commend CrowdStrike’s CEO for eventually going on national television to apologize and explain the company’s corrective actions, his initial failure to apologize on Twitter was a significant oversight. Correcting that mistake later showed a commendable commitment to transparency and accountability.

In a head-scratching moment amidst the chaos, CrowdStrike’s newly appointed CTO chose the day of the outage to announce his position on LinkedIn. His attempt at humor with the line, “It’s my first day, what did I miss??,” was met with hundreds of responses, many of which questioned whether it was an ill-timed joke.

Investment in advanced monitoring and alert systems is vital. These tools can provide early warnings of potential issues, allowing for rapid response and mitigation. Integrating these tools into a robust incident management framework ensures that the organization can quickly identify and address problems before they escalate.

Finally, CrowdStrike should use this incident as a learning opportunity. Conducting a thorough post-mortem to understand what went wrong and implementing changes based on these insights can prevent future occurrences. Continuous improvement is key to building a resilient organization.

The recent CrowdStrike outage affected critical sectors like healthcare, banking, and travel. An estimated 3,400 flights were canceled, making it the worst day of the year for flight cancellations. 911 systems were impacted, transit was disrupted, and people may have died as a result. The fallout from this event will likely be measured not just in the disruption of services but in steep financial losses worldwide, potentially amounting to millions or even billions in lost revenue.

According to a Forrester study, 39 percent of companies lost between $500,000 and $999,999 due to Internet disruptions in a single month. The CrowdStrike outage was felt on a far larger scale, rippling across global economies, industries, and almost every sector. The whole world suddenly cares about IT outages, a field that is often overlooked and underappreciated, where even the simplest of issues can shut down the world in seconds.

Despite these steps, one must ask: Can CrowdStrike truly recover from this? The damage to its reputation and the trust it has lost could take years to rebuild, if they can ever be fully rebuilt. In 1999, I faced a similar crisis and learned the importance of preparation, transparency, and continuous improvement. By embracing these principles, CrowdStrike can recover from this setback and emerge stronger and more resilient.

