CrowdStrike BSOD
How Versio.io customers solve the IT outage caused by CrowdStrike more efficiently and promptly
In a nutshell
- An overview of the global IT outage caused by CrowdStrike on 19 July 2024
- What is needed to detect CrowdStrike with Versio.io
- How Versio.io customers identify the affected CrowdStrike servers
- Frequently asked questions about the CrowdStrike BSOD problem from Versio.io customers
Global IT outage due to CrowdStrike
What is needed to recognise CrowdStrike with Versio.io?
How Versio.io customers identify the affected CrowdStrike servers
Automatically record CrowdStrike usage in the IT landscape
The OneImporter agent of the Versio.io platform is able to record all executed processes on a server. This includes the CrowdStrike Falcon agent named ‘CSFalconService.exe’, which caused the IT outage.
The fully automated inventory ensures that Versio.io customers have accurate data on the use of the CrowdStrike Falcon Agent. In addition to the process characteristics, Versio.io automatically recognises the product and its version numbers.
In addition, the OneImporter can use the ‘File Importer’ module to inventory the problem-causing file ‘C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys’.
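The two detection criteria above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the OneImporter implementation: the function names are hypothetical, and only the process name and the file pattern are taken from the text.

```python
import fnmatch

# Values taken from the article.
FALCON_PROCESS = "CSFalconService.exe"
PROBLEM_FILE_PATTERN = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"


def is_falcon_process(process_name: str) -> bool:
    """Compare process names case-insensitively, as Windows file names are."""
    return process_name.lower() == FALCON_PROCESS.lower()


def is_problem_file(path: str) -> bool:
    """Return True if a Windows path matches the problem-causing file pattern."""
    # Lower-case both sides so the match does not depend on the local platform.
    return fnmatch.fnmatchcase(path.lower(), PROBLEM_FILE_PATTERN.lower())


# Hypothetical file name, used only to exercise the pattern.
sample = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291-example.sys"
print(is_falcon_process("csfalconservice.exe"), is_problem_file(sample))
```

The wildcard is matched with `fnmatch` rather than a regular expression so that the pattern can be used exactly as it is written in the text.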
Determine servers on which the CrowdStrike Falcon Agent is running via topology context
The Versio.io platform is able to automatically recognise the relationships between recorded configuration items. The topology illustration shows that the CrowdStrike Falcon Agent was started by a process called ‘Services.exe’, which in turn was started by ‘Wininit.exe’. ‘Wininit.exe’ is one of the core start-up processes of a Windows operating system and therefore has a direct relationship to the server instance “evp-node-1”.
Based on this topology, it is clear which server each CrowdStrike Falcon agent is running on.
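As a minimal sketch of this idea, the parent chain described above can be walked until a process with a direct relationship to a server is reached. The `ProcessItem` structure and the `resolve_host` function are hypothetical and only model the example from the text. They do not reflect the Versio.io data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProcessItem:
    """Hypothetical configuration item for one recorded process."""
    name: str
    parent: Optional[str]  # name of the process that started this one
    host: Optional[str]    # set only if the process is linked directly to a server


def resolve_host(items: Dict[str, ProcessItem], start: str) -> Optional[str]:
    """Follow parent links from `start` until a process with a host is found."""
    seen = set()
    current: Optional[str] = start
    while current is not None and current not in seen:
        seen.add(current)  # guard against cyclic parent links
        item = items.get(current)
        if item is None:
            return None
        if item.host is not None:
            return item.host
        current = item.parent
    return None


# The chain from the topology illustration.
items = {
    "CSFalconService.exe": ProcessItem("CSFalconService.exe", "Services.exe", None),
    "Services.exe": ProcessItem("Services.exe", "Wininit.exe", None),
    "Wininit.exe": ProcessItem("Wininit.exe", None, "evp-node-1"),
}
print(resolve_host(items, "CSFalconService.exe"))  # evp-node-1
```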
Identification of all servers that use the CrowdStrike Falcon Agent
Based on the recorded process data and the topology, all processes can now be filtered by ‘CSFalconService.exe’ in Versio.io Reporting and the executing host can be displayed.
This means that Versio.io customers now have access to basic information about the scope of the problem and the servers that use the CrowdStrike Falcon Agent. In the same way, it would be possible to report on which servers the file ‘C-00000291*.sys’ is present.
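The filter described here boils down to selecting process records by name and listing the distinct hosts. The following sketch assumes a hypothetical record layout with `host` and `process` keys. The actual Versio.io Reporting works on its own data model and query interface, and the host names other than “evp-node-1” are invented for the example.

```python
from typing import Dict, Iterable, List


def hosts_running(records: Iterable[Dict[str, str]], process_name: str) -> List[str]:
    """Return the sorted, de-duplicated hosts that run the given process."""
    wanted = process_name.lower()
    return sorted({r["host"] for r in records if r["process"].lower() == wanted})


# Hypothetical inventory records.
records = [
    {"host": "evp-node-1", "process": "CSFalconService.exe"},
    {"host": "evp-node-1", "process": "Services.exe"},
    {"host": "evp-node-2", "process": "csfalconservice.exe"},
    {"host": "evp-node-3", "process": "Wininit.exe"},
]
print(hosts_running(records, "CSFalconService.exe"))  # ['evp-node-1', 'evp-node-2']
```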
Determine the runtime behaviour of the servers affected by CrowdStrike using OneImporter Heartbeat
Each of our customers' servers is provisioned with a Versio.io OneImporter, which sends heartbeats to the Versio.io server at regular intervals. A heartbeat is a message that tells the Versio.io server that the OneImporter is functional. In the OneImporter Dashboard, the heartbeat status shows which OneImporters running on Windows operating systems are no longer working correctly. Because the OneImporter is highly stable, it can be assumed that Windows systems whose heartbeats stopped during the global IT outage are part of the CrowdStrike problem.
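The heartbeat check can be illustrated as follows. The record layout, the host names and the five-minute threshold are assumptions made for this sketch. The actual heartbeat interval and dashboard logic of the OneImporter are not described in the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List


def stale_windows_importers(
    importers: Iterable[Dict],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[str]:
    """Return Windows hosts whose last heartbeat is older than `max_age`."""
    return sorted(
        i["host"]
        for i in importers
        if i["os"].lower().startswith("windows")
        and now - i["last_heartbeat"] > max_age
    )


# Hypothetical heartbeat records at a fixed point in time.
now = datetime(2024, 7, 19, 6, 0, tzinfo=timezone.utc)
importers = [
    {"host": "win-app-1", "os": "Windows Server 2019", "last_heartbeat": now - timedelta(hours=2)},
    {"host": "win-app-2", "os": "Windows Server 2022", "last_heartbeat": now - timedelta(minutes=1)},
    {"host": "linux-db-1", "os": "Linux", "last_heartbeat": now - timedelta(hours=3)},
]
print(stale_windows_importers(importers, now))  # ['win-app-1']
```

A missing heartbeat alone does not prove that a host is in a boot loop, so the result is best read as a candidate list to be cross-checked against the hosts known to run the CrowdStrike Falcon Agent.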
Questions & answers
Were the Versio.io services affected by the outage?
The services of the Versio.io platform were not affected by the CrowdStrike issue, as the platform runs only on Linux-based computers. OneImporter and OneGate agents may be affected if they are running on Windows systems. In the OneImporter and OneGate Dashboard, however, the heartbeat makes it easy to recognise when agents are no longer functional.
What was the cause of the failure?
CrowdStrike released an update for Windows PCs that contained a defect. Affected servers were forced into a boot loop that prevented them from starting up. The boot sequence is the phase after a server is switched on, during which the operating system, applications and services running on the server are brought online.
Why was the outage so severe?
If an affected server is stuck in a boot loop, it cannot establish communication or services, i.e. it does not respond to requests or commands. It is as if the server were switched off. To restore the services, the fix must be applied individually and manually. The remediation process can also be complex and time-consuming for each server and may involve a ‘rollback’ to an earlier point in time from backups. In total, an estimated 8.5 million Windows devices were affected.
Is there a schedule for restoring the services?
As remediation is manual and time-consuming, the order of service recovery depends on which servers support the most critical applications, and these are prioritised over servers for less critical services. For many organisations, recovery can take hours or days. Versio.io customers can speed up this process by quickly finding affected hosts and prioritising the most critical first based on protection needs.
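Prioritisation by protection need can be sketched as a simple sort. The protection levels, their order and the host names below are hypothetical and only illustrate the principle. How protection needs are modelled in Versio.io is not described here.

```python
from typing import Dict, Iterable, List

# Assumed protection levels, most critical first.
PROTECTION_ORDER = {"very high": 0, "high": 1, "normal": 2}


def remediation_order(affected: Iterable[Dict[str, str]]) -> List[str]:
    """Sort affected hosts so that the highest protection need comes first."""
    def sort_key(entry: Dict[str, str]):
        # Unknown levels are placed last; ties are broken by host name.
        rank = PROTECTION_ORDER.get(entry["protection_need"], len(PROTECTION_ORDER))
        return (rank, entry["host"])

    return [entry["host"] for entry in sorted(affected, key=sort_key)]


# Hypothetical list of affected hosts.
affected = [
    {"host": "win-app-2", "protection_need": "normal"},
    {"host": "win-db-1", "protection_need": "very high"},
    {"host": "win-app-1", "protection_need": "high"},
]
print(remediation_order(affected))  # ['win-db-1', 'win-app-1', 'win-app-2']
```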
How does Versio.io help our customers who are affected by the outage?
This problem needs to be fixed manually, but Versio.io recognises which servers and which services are affected. With this information, we simplify the process for our customers to create plans and restore servers and services associated with their most critical (high protection) applications.
Are many Versio.io customers affected by the outage?
Yes, because this outage was unavoidable after CrowdStrike released the buggy update. Many of the world's largest and most important companies use CrowdStrike for endpoint protection. Fortunately, Versio.io helps our customers quickly identify and prioritise affected servers so they can quickly restore services to their most critical business functions. By knowing exactly which offline servers are connected to specific critical business services and the exact dependency relationships, IT teams can quickly create manual remediation plans to efficiently restore business-critical functions. Versio.io customers are very familiar with this process, as they use it when zero-day runtime vulnerabilities such as Log4j are discovered that pose an immediate threat to large parts of their environment. In such cases, Versio.io helps customers to immediately identify and prioritise the affected code.
Authors | July 19, 2024