Taking a Measured Response to the CrowdStrike Outage
Nick Carroll, an experienced cybersecurity professional currently serving as a Cyber Incident Response Manager at Nightwing, shares a “measured response” to the recent CrowdStrike outage. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
The outage that many CrowdStrike customers experienced on July 19, 2024, was widespread because of the breadth of the company’s customer base and the ubiquity of the Windows operating system, which the defective update affected. It spread quickly for two reasons: security software has privileged access to the computers it protects from malware, and many customers apply updates automatically to ensure they have the latest security advancements.
While there was not a lot that organizations could have done to fully prevent this, outside of disabling patches and definition updates, which bring their own risks, there are steps organizations can take now to help minimize disruptions from these types of incidents in the future. There is also an opportunity to analyze how this incident will change the way the security industry operates moving forward.
Reevaluate Internal Testing and Orchestration
Now is the time for impacted organizations to review their own incident response plans and identify any weak spots. Testing is a much safer way to confirm disaster recovery preparedness, but an actual incident often highlights areas for improvement that a simulated test would miss. The lessons learned from this CrowdStrike event can therefore be critical to reducing recovery time in any future outage.
Organizations must also consider whether they can automate some of the testing and orchestration of patch and antivirus definition updates. Some organizations in more critical infrastructure environments have designated test systems that receive these updates first. Then, through automated patch management, the updates are distributed to other systems across the organization once a defined testing period has confirmed that they won’t break mission-critical software. Though these measures don’t always come to mind, events like the CrowdStrike outage remind us of the importance of taking all necessary precautions, both internally and at an industry level.
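As a rough illustration of the staged approach described above, the following Python sketch decides which group of systems may receive an update next. It is a minimal sketch under stated assumptions, not a description of any particular patch management product: the ring names, the soak periods, and the names Ring, RingStatus, and next_ring are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class Ring:
    """A group of systems that receives an update together (hypothetical model)."""
    name: str
    soak: timedelta  # how long the previous ring must run the update first


@dataclass
class RingStatus:
    """What is known about a ring that has already received the update."""
    deployed_at: datetime
    failures: int  # count of systems reporting problems after the update


# Hypothetical rollout order: test systems first, the wider fleet last.
RINGS: List[Ring] = [
    Ring("test", timedelta(0)),
    Ring("pilot", timedelta(hours=24)),
    Ring("fleet", timedelta(hours=72)),
]


def next_ring(rings: List[Ring], status: Dict[str, RingStatus],
              now: datetime) -> Optional[str]:
    """Return the next ring allowed to receive the update, or None to hold."""
    previous: Optional[Ring] = None
    for ring in rings:
        if ring.name in status:
            previous = ring  # already deployed, look at the next ring
            continue
        if previous is None:
            return ring.name  # nothing deployed yet, start with the first ring
        prev = status[previous.name]
        healthy = prev.failures == 0
        soaked = now - prev.deployed_at >= ring.soak
        return ring.name if healthy and soaked else None
    return None  # every ring already has the update
```

With this shape, a failure reported by the test systems holds the update back from every later ring until someone investigates, which is the property the defined testing period is meant to provide.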
Organizations should also confirm that all critical systems have recent, functioning backups. For example, a few organizations affected by this outage had fairly quick recovery times because they were able to simply roll back to the last backup of their servers. This doesn’t mean their help desks didn’t still have some legwork to do for laptops and remote workers, but the organizational impact was greatly reduced.
With that said, aside from having a vetted incident response plan and excellent backups, organizations should check their cyber insurance policies to make sure that these types of business interruptions are properly covered. Comprehensive cyber insurance can help mitigate some of the business costs and damages associated with a major outage, but not all policies are the same.
Calls for Larger Vendor Changes
Organizations must weigh the types of access they provide to vendors and software on their networks to ensure they’re striking the best balance between security and the risk to operational outcomes. They should be considering network segmentation, zero trust strategies, and similar defense mechanisms as part of their overall security posture. Realistically, though, this incident probably won’t have a major impact on how organizations allow or provide access to security vendors.
Because of how security tools operate, they often require privileged access to operating systems, network devices, and more to provide functional security. And it’s not just security tools. Networking and sysadmin tools often have similar levels of trusted access. We’ve seen major incidents, like the SolarWinds breach of 2020-2021, that resulted in short-term changes but didn’t stop all organizations from using SolarWinds products.
Looking at the industry as a whole, though, vendors are already being encouraged to adopt more transparency about the software they create through the software bill of materials (SBOM), which lets end-users know what open-source and third-party components are used to create the software products they use. This can help organizations identify vulnerabilities and risks they might not otherwise be privy to because often the software we use is a black box to us.
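To make the idea concrete, here is a minimal Python sketch of how an organization might check an SBOM against a list of components it already knows to be risky. It assumes the SBOM is a CycloneDX-style JSON document with a top-level "components" array whose entries carry "name" and "version" fields. The component names, the advisory list, and the flag_components helper are hypothetical and exist only for illustration.

```python
import json
from typing import List, Set, Tuple

# Hypothetical SBOM fragment in CycloneDX-style JSON; the components are made up.
EXAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "example-parser", "version": "1.2.0"},
    {"type": "library", "name": "example-logger", "version": "3.4.1"}
  ]
}
"""


def flag_components(sbom_json: str,
                    advisories: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (name, version) pairs in the SBOM that appear in advisories."""
    sbom = json.loads(sbom_json)
    flagged = []
    for component in sbom.get("components", []):
        key = (component.get("name"), component.get("version"))
        if key in advisories:
            flagged.append(key)
    return flagged


# Hypothetical advisory list: (name, version) pairs the organization wants flagged.
KNOWN_RISKY = {("example-parser", "1.2.0")}

if __name__ == "__main__":
    for name, version in flag_components(EXAMPLE_SBOM, KNOWN_RISKY):
        print(f"Review needed: {name} {version}")
```

A real workflow would likely match on richer identifiers and version ranges rather than exact name and version pairs, but even this simple lookup is only possible when the vendor discloses what is inside the product.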
SBOM adoption will likely continue to grow across software supply chains to help organizations better understand the products they are purchasing. It is likely that many organizations will look at the single points of failure identified in this outage, CrowdStrike included, and try to find ways to create redundant checks and balances to prevent such an impact from occurring to their organizations again. Going forward, these industry-wide changes could have a major effect on protecting organizations from the harmful impacts of future outages.
Ultimately, organizations need to take a measured approach in their response to this event. Continuing to follow good cyber-hygiene and cybersecurity best practices will create those constant, invisible benefits that keep organizations from falling victim to ransomware or other compromises and from making the news with a data breach. It is also important to note that this is not the first time an incident like this has occurred, albeit to different degrees. We must now analyze how this incident happened and learn from it to minimize the risk of it happening again.