Lessons from the CrowdStrike Incident
CrowdStrike is a cybersecurity company that specializes in services such as endpoint security, threat intelligence, and cyberattack response. On July 19, 2024, a faulty update to its Falcon sensor security software caused a global outage of IT services.
Affected Windows machines crashed to the infamous Blue Screen of Death (BSOD), shown below.
The outage disrupted organizations across many industries, including hospitals, airlines, manufacturers, and major global banks.
Root Cause of the Issue
On August 6, 2024, CrowdStrike published a Root Cause Analysis (RCA) report explaining what caused the July 19, 2024, system crashes that led to the global outage. In the RCA, CrowdStrike identified multiple factors that contributed to the Falcon EDR sensor crash.
Key among these were a mismatch between the 21 inputs validated by the Content Validator and the 20 inputs provided to the Content Interpreter, a latent out-of-bounds read issue in the Content Interpreter, and the lack of a specific test for non-wildcard matching criteria in the 21st input field.
CrowdStrike explained that sensors receiving the new version of Channel File 291 were exposed to this latent out-of-bounds read issue. When the new IPC Template Instances were evaluated, they specified a comparison against a 21st input value, but the Content Interpreter expected only 20 input values. The attempt to read the 21st value went past the end of the input data array and crashed the system.
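To make the mismatch concrete, here is a minimal Python sketch. It is not CrowdStrike's code, which is not public: `TemplateInstance`, `WILDCARD`, and both matching functions are hypothetical names, and Python raises `IndexError` where a C or C++ kernel driver would instead read memory past the end of the array. The unchecked matcher succeeds when the 21st criterion is a wildcard, because the comparison short-circuits and the missing input is never read. That is why tests using only wildcard criteria in that field would not surface the bug. The checked version adds a runtime bounds check that rejects the rule instead.

```python
from dataclasses import dataclass

WILDCARD = "*"  # in this sketch, a criterion that matches any input value


@dataclass
class TemplateInstance:
    """Hypothetical rule: one matching criterion per declared input field."""
    criteria: list[str]


def matches_unchecked(rule: TemplateInstance, inputs: list[str]) -> bool:
    # Indexes inputs by field position with no bounds check. A wildcard
    # short-circuits the `or`, so inputs[i] is never read for that field.
    return all(c == WILDCARD or inputs[i] == c
               for i, c in enumerate(rule.criteria))


def matches_checked(rule: TemplateInstance, inputs: list[str]) -> bool:
    # Runtime bounds check: reject a rule that declares more fields than
    # the caller supplied instead of reading past the end of the inputs.
    if len(rule.criteria) > len(inputs):
        raise ValueError(f"rule declares {len(rule.criteria)} fields, "
                         f"but only {len(inputs)} inputs were supplied")
    return matches_unchecked(rule, inputs)


inputs = [f"v{i}" for i in range(20)]                        # 20 values supplied
wildcard_21st = TemplateInstance([WILDCARD] * 21)             # wildcard in field 21
specific_21st = TemplateInstance([WILDCARD] * 20 + ["v20"])   # concrete value in field 21

print(matches_unchecked(wildcard_21st, inputs))  # True: 21st input never read
try:
    matches_unchecked(specific_21st, inputs)
except IndexError:
    print("IndexError: read past the 20th input")
try:
    matches_checked(specific_21st, inputs)
except ValueError as err:
    print("rejected:", err)
```

The design point is where the check lives: validating the field count before evaluation, rather than trusting that the validator and the interpreter agree, turns a crash into a rejected update.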
Another major issue was that the update was released globally rather than through a staggered, phased rollout, so when it went wrong, it went wrong for everyone at once. A staggered release could, for example, defer organizations that are part of critical infrastructure to a later wave so they do not encounter bugs in an early version.
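A staggered release can be sketched as a loop over deployment rings that only advances while the previous ring stays healthy. This is a minimal illustration under stated assumptions, not how CrowdStrike or any other vendor actually deploys: the ring names, the `crash_rate` telemetry callback, and the 1% threshold are hypothetical, and pushing or reverting an update is represented by simply recording host names. Critical-infrastructure hosts sit in the last ring.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Ring:
    name: str
    hosts: list[str]


@dataclass
class RolloutResult:
    completed_rings: list[str] = field(default_factory=list)
    halted_at: Optional[str] = None
    rolled_back: list[str] = field(default_factory=list)


def staged_rollout(rings: list[Ring],
                   crash_rate: Callable[[list[str]], float],
                   max_crash_rate: float = 0.01) -> RolloutResult:
    """Deploy ring by ring (rings are assumed ordered from a small canary
    group to critical infrastructure last). If a ring's crash rate exceeds
    the threshold, stop and roll back every host updated so far."""
    result = RolloutResult()
    updated: list[str] = []
    for ring in rings:
        updated.extend(ring.hosts)              # stand-in for pushing the update
        if crash_rate(ring.hosts) > max_crash_rate:
            result.halted_at = ring.name
            result.rolled_back = list(updated)  # stand-in for reverting it
            return result
        result.completed_rings.append(ring.name)
    return result


rings = [
    Ring("canary", ["lab-1", "lab-2"]),
    Ring("early", ["corp-1", "corp-2", "corp-3"]),
    Ring("critical-infra", ["hospital-1", "airline-1"]),
]

# Simulated telemetry for a bad update: every host that receives it crashes.
result = staged_rollout(rings, crash_rate=lambda hosts: 1.0)
print(result.halted_at)    # canary
print(result.rolled_back)  # ['lab-1', 'lab-2']
```

With a bad update, the blast radius stops at the two canary hosts instead of every customer. In a real deployment, each ring would also need a soak period long enough for crashes to appear in telemetry before the next ring is updated.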
Vulnerability of Monocultures
The CrowdStrike outage highlighted the dangers of overreliance on a single platform or vendor. Although the outage was caused by a faulty update rather than a cyberattack, the underlying risk is the same. Windows’ dominance in the corporate sector has created a vast, uniform attack surface for cybercriminals. Similarly, concentrating IT management and security software among a few major vendors increases the potential impact of a breach, as the SolarWinds attack demonstrated. Such centralization makes these vendors tempting targets for adversaries: the SolarWinds breach reached critical U.S. government departments, including Homeland Security, State, Commerce, and Treasury.
To mitigate these risks, organizations should consider diversifying their technology stacks and security providers, and adopt a multi-layered defense strategy so that no single vendor or system becomes a single point of failure.
The Way Forward
In light of this incident and the possibility of similar events in the future, organizations need to secure their operations and increase their resilience by reducing their exposure to single points of failure.
Organizations need to consider the following areas in preparation for future unforeseen events:
- Testing: This incident highlights the need for comprehensive testing, careful deployment, accountability, and phased rollouts. Regular automated updates to endpoint detection and response (EDR) tools, such as CrowdStrike’s Falcon, should be rolled out in phases so that problems are detected before they become widespread. A phased rollout involves initial deployment to a small, diverse subset of systems, monitoring for unexpected behavior or conflicts, gradual expansion to larger groups, and the ability to roll back quickly if problems arise. Organizations should also maintain test environments that are logically separated from production environments. Together, these practices make IT infrastructure more resilient and secure.
- Encryption Management: IT administrators should establish a process for inventorying and managing assets and for maintaining encryption keys throughout each asset’s lifecycle. During the CrowdStrike incident, fixing an affected machine typically meant booting it into Safe Mode or the Windows Recovery Environment and deleting the faulty channel file, which on BitLocker-encrypted drives first required the recovery key. Many IT administrators lacked the necessary keys, so some organizations had to deploy replacement devices, reducing productivity and adding expense.
- Maintaining System Backups: Organizations should routinely test their backups, ideally at least four times a year, with increased frequency if there are significant changes in the data environment. To protect against ransomware attacks, it is also important to maintain offline backups that are not connected to the network.
- Developing and Updating Incident Response and Contingency Plans: Organizations should develop and consistently update incident response and contingency plans based on system criticality and potential impact. These plans must include detailed procedures for recovering critical systems and should be supported by a comprehensive Backup and Recovery Plan that outlines specific recovery processes.
- Third-Party Vendor Risk Management: Where feasible, organizations should use multiple service providers so that if one experiences an outage, business processes and services continue unaffected. It’s important to “trust but verify” third-party IT security processes: organizations should regularly assess their third-party providers to ensure they meet both internal and regulatory standards.
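The Encryption Management item implies a check that can run on a schedule: for every encrypted asset in the inventory, confirm that its recovery key is actually held in escrow. Below is a minimal sketch, assuming a hypothetical inventory export with the field names shown. In practice the data would come from an asset-management system and a key-escrow store (for BitLocker, recovery keys can be backed up to Active Directory or Microsoft Entra ID), whose APIs are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    """Hypothetical inventory record; field names are illustrative."""
    hostname: str
    disk_encrypted: bool
    recovery_key_id: Optional[str]  # None if no key was ever recorded


def assets_missing_keys(assets: list[Asset], escrowed_key_ids: set[str]) -> list[str]:
    """Return, sorted, the hostnames of encrypted assets whose recovery key
    is not present in the escrow store. Unencrypted assets are skipped."""
    return sorted(
        a.hostname
        for a in assets
        if a.disk_encrypted and a.recovery_key_id not in escrowed_key_ids
    )


inventory = [
    Asset("ws-01", True, "key-01"),   # encrypted, key escrowed
    Asset("ws-02", True, "key-02"),   # encrypted, key recorded but not escrowed
    Asset("ws-03", True, None),       # encrypted, no key on record
    Asset("kiosk-01", False, None),   # not encrypted, nothing to recover
]
escrow = {"key-01"}

print(assets_missing_keys(inventory, escrow))  # ['ws-02', 'ws-03']
```

Any hostname this check reports is a machine that could not be unlocked for manual repair during an incident like July 19, so the list is worth reviewing before an outage rather than during one.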
How Can CKSS Help?
CK Security Solutions (CKSS) assists organizations of all sizes and from many industries, including DoD contractors, in implementing a comprehensive security program that includes the following key elements:
- Business Recovery and Continuity Services: Develop a strategy for restoring critical infrastructure in the event of a disruption.
- Managed Security Services: Incorporate security operations, vulnerability management, and continuous monitoring.
- Cybersecurity Audit and Assessment Services: Regularly evaluate your security posture and establish a roadmap for future improvement.
- IT and Cybersecurity Design and Implementation Services: Integrate security measures throughout the system development lifecycle.
We’ve covered a lot of ground in this post, and if you’re feeling a bit overwhelmed, don’t be intimidated. Our team specializes in building systems that require less intervention and reduce the burden on your IT team.
The first step is scheduling a No-Obligation Consultation to assess your current systems and evaluate your existing Third-Party Action Plan.
Contacting CKSS
You probably have questions, and our team is here to provide you with the answers you need.
Call us anytime at 443.464.1589 or contact our team online today.