Businesses across the world have been hit by widespread disruptions to their Windows workstations stemming from a faulty update pushed out by cybersecurity company CrowdStrike.
“CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts,” the company’s CEO George Kurtz said in a statement. “Mac and Linux hosts are not impacted. This is not a security incident or cyberattack.”
The company, which acknowledged “reports of [Blue Screens of Death] on Windows hosts,” further said it has identified the issue and a fix has been deployed for its Falcon Sensor product, urging customers to refer to the support portal for the latest updates.
For systems that have already been affected by the problem, the mitigation steps are as follows:
- Boot Windows in Safe Mode or Windows Recovery Environment
- Navigate to the C:\Windows\System32\drivers\CrowdStrike directory
- Find any file matching “C-00000291*.sys” and delete it
- Restart the computer or server normally
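The file-removal step above can be sketched in a few lines of Python. This is only an illustration, not an official CrowdStrike tool: it assumes a Python interpreter is available on the affected host (often not the case in Safe Mode or the Windows Recovery Environment, where the file is usually deleted by hand), and the function names and the `dry_run` flag are hypothetical. Only the directory and the file pattern come from the instructions above.

```python
from pathlib import Path

# Directory and pattern taken from the mitigation steps above.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"


def find_matching_files(driver_dir, pattern=FAULTY_PATTERN):
    """Return the regular files in driver_dir whose names match the pattern."""
    return sorted(p for p in Path(driver_dir).glob(pattern) if p.is_file())


def remove_matching_files(driver_dir, pattern=FAULTY_PATTERN, dry_run=True):
    """Delete matching files and return the list of matches.

    With dry_run=True (the default) nothing is deleted, so the matches
    can be reviewed before committing to the removal.
    """
    matches = find_matching_files(driver_dir, pattern)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling `remove_matching_files(DRIVER_DIR)` first lists what would be removed; passing `dry_run=False` performs the deletion, after which the machine can be restarted normally.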
It’s worth noting that the outage has also impacted Google Cloud Compute Engine, causing Windows virtual machines using CrowdStrike’s csagent.sys to crash and go into an unexpected reboot state.
“After having automatically received a defective patch from CrowdStrike, Windows VMs crash and will not be able to reboot,” it said. “Windows VMs that are currently up and running should no longer be impacted.”
Microsoft Azure has also posted a similar update, stating it “received reports of successful recovery from some customers attempting multiple Virtual Machine restart operations on affected Virtual Machines” and that “several reboots (as many as 15 have been reported) may be required.”
Amazon Web Services (AWS), for its part, said it has taken steps to mitigate the issue for as many Windows instances, Windows Workspaces, and Appstream Applications as possible, and recommended that customers still affected by the issue “take action to restore connectivity.”
Security researcher Kevin Beaumont said, “I have obtained the CrowdStrike driver they pushed via auto update. I don’t know how it happened, but the file isn’t a validly formatted driver and causes Windows to crash every time.”
“CrowdStrike is the top tier EDR product, and is on everything from point of sale to ATMs etc – this will be the biggest ‘cyber’ incident worldwide ever in terms of impact, most likely.”
Airlines, financial institutions, food and retail chains, hospitals, hotels, news organizations, railway networks, and telecom firms are among the many businesses affected. Shares of CrowdStrike have tanked 15% in U.S. premarket trading.
“The current event appears – even in July – that it will be one of the most significant cyber issues of 2024,” Omer Grossman, Chief Information Officer (CIO) at CyberArk, said in a statement shared with The Hacker News. “The damage to business processes at the global level is dramatic. The glitch is due to a software update of CrowdStrike’s EDR product.”
“This is a product that runs with high privileges that protects endpoints. A malfunction in this can, as we are seeing in the current incident, cause the operating system to crash.”
The recovery is expected to take days because the problem has to be solved manually, endpoint by endpoint, by starting each machine in Safe Mode and removing the buggy driver, Grossman pointed out, adding that the root cause behind the malfunction will be of the “utmost interest.”
Jake Moore, global security advisor at Slovakian cybersecurity company ESET, told The Hacker News that the incident highlights the need to put multiple “fail safes” in place and to diversify IT infrastructure.
“Upgrades and maintenance to systems and networks can unintentionally include small errors, which can have wide-reaching consequences as experienced today by CrowdStrike’s customers,” Moore said.
“Another aspect of this incident relates to ‘diversity’ in the use of large-scale IT infrastructure. This applies to critical systems like operating systems (OSes), cybersecurity products, and other globally deployed (scaled) applications. Where diversity is low, a single technical incident, not to mention a security issue, can lead to global-scale outages with subsequent knock-on effects.”
The development comes as Microsoft is recovering from a separate outage of its own that caused issues with Microsoft 365 apps and services, including Defender, Intune, OneNote, OneDrive for Business, SharePoint Online, Windows 365, Viva Engage, and Purview.
“A configuration change in a portion of our Azure backend workloads, caused interruption between storage and compute resources which resulted in connectivity failures that affected downstream Microsoft 365 services dependent on these connections,” the tech giant said.
Omkhar Arasaratnam, general manager of OpenSSF, said the Microsoft-CrowdStrike outages underscore the fragility of monocultural supply chains and emphasized the importance of diversity in technology stacks for greater resilience and security.
“Monocultural supply chains (single operating system, single EDR) are inherently fragile and susceptible to systemic faults – as we’ve seen,” Arasaratnam pointed out. “Good system engineering tells us that changes in these systems should be rolled out gradually, observing the impact in small tranches vs. all at once. More diverse ecosystems can tolerate rapid change as they’re resilient to systemic issues.”
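The gradual rollout Arasaratnam describes can be illustrated with a minimal Python sketch. It is not a description of how CrowdStrike or any other vendor ships updates: the function name, the tranche percentages, and the `apply_update` and `is_healthy` callbacks are all hypothetical placeholders for whatever deployment and health-check mechanisms an organization actually uses.

```python
def staged_rollout(hosts, apply_update, is_healthy, percents=(1, 10, 50, 100)):
    """Push an update in growing tranches, halting on the first unhealthy one.

    hosts        -- ordered list of host identifiers
    apply_update -- callable that updates a single host (hypothetical)
    is_healthy   -- callable returning True if a host is fine after the update
    percents     -- cumulative share of the fleet covered after each tranche

    Returns (updated_hosts, halted).
    """
    updated = []
    for pct in percents:
        # Cumulative number of hosts that should be updated after this
        # tranche, rounded up using integer arithmetic.
        target = min(len(hosts), (len(hosts) * pct + 99) // 100)
        tranche = hosts[len(updated):target]
        for host in tranche:
            apply_update(host)
            updated.append(host)
        # Observe the tranche before widening the blast radius.
        if not all(is_healthy(host) for host in tranche):
            return updated, True
    return updated, False
```

With a fleet of 100 hosts and the default percentages, a defect that crashes every updated machine would be caught after the first host rather than after all 100, which is the point of observing the impact “in small tranches vs. all at once.”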
https://thehackernews.com/2024/07/faulty-crowdstrike-update-crashes.html