It’s been two weeks since July 19, when a faulty CrowdStrike update triggered a massive global IT outage. The incident impacted countless businesses and organizations around the world, causing chaos as critical systems went offline.
Hospitals were brought to a standstill, thousands of flights were grounded, and millions of devices across countless companies were affected. The outage underscored the fragility of modern digital infrastructure and raised serious questions about the consequences of large-scale cybersecurity failures.
Techopedia reached out to affected organizations and leading experts for critical insights that can help businesses take immediate action and develop long-term plans.
Key Takeaways
- Ghazenfer Mansoor emphasizes the need to “reinvent” digital infrastructure after the CrowdStrike outage, advocating for a detailed system examination and a disaster recovery plan.
- Jake Williams highlights the risks of SaaS-based services taking update cycles out of administrators’ hands, noting that the outage shows “the model of pushing updates without IT intervention is unsustainable.”
- Yakir Golan warns of the dangers of dependency on single third-party providers, calling for a “blame-free” conversation to assess and mitigate digital infrastructure risks.
- Alina Timofeeva discusses the systemic risks of relying on large providers, noting that failures can “damage the global economic system.”
- Erik Severinghaus advocates for upgrading digital systems after the outage, calling it an opportunity to “rebuild smarter and stronger.”
A World Running on a Single Point of Failure
In an official communication, CrowdStrike explained that the incident was caused by a content configuration update for its Windows sensor. The update was designed to help organizations better fight threats.
Reports would later show that the update had not been thoroughly tested. More importantly, it was pushed automatically to CrowdStrike’s Windows customers, who were never given the option to defer it. As a result, the dreaded Windows ‘Blue Screen of Death’ shut down systems across the globe.
Ghazenfer Mansoor, CEO of Technology Rivers, a software development agency that was affected by the outage, spoke to Techopedia about the event.
“When the digital ground shakes, the wise don’t just rebuild, they reinvent.”
“After something like this happens, it’s very important to understand your situation and where you are now,” Mansoor explained.
Technology Rivers began with a detailed examination of its systems to grasp the full impact.
“I suggest you do the same for your operations,” Mansoor said. “Find any problems or mistakes with data and be sure your IT setup is very secure to stop more issues from happening.”
When questioned about long-term actions, Mansoor said the key is to have a disaster recovery plan.
“If you have a disaster recovery plan ready, it’s the moment to use it. After the ship is steady again, it is good to think about long-term changes.”
Mansoor said that an important long-term strategy is to diversify tech investments and not “place every digital egg in the same basket”. Using systems from several different providers can help avoid many problems if one of them fails.
When Administrators Are Not in Control of Updates
Microsoft quickly attributed the widespread disruption to a faulty CrowdStrike update, and CrowdStrike CEO George Kurtz issued apologies in the hope of calming the waters, but many security leaders were left dissatisfied.
The most important question remained: how could a simple content update bring down systems around the world?
Jake Williams, a former NSA hacker, faculty member at IANS Research, a Boston-based cybersecurity research and advisory firm, and VP of R&D at Hunter Strategy, told Techopedia that when administrators are not in control of updating their own systems, there is a serious, fundamental problem.
“This (global IT outage) highlights the risks of SaaS-based services taking update cycles out of the hands of systems administrators.”
“Many security teams don’t realize that their endpoint protection platforms’ signature updates often themselves contain code, further exacerbating the issue.”
Williams said that we should expect changes to this operating model, for better or worse, because CrowdStrike showed why the model of pushing updates without IT intervention is unsustainable.
An Urgent Call for an Honest, Blame-Free Conversation
Yakir Golan, who started his career in the Israeli Intelligence Forces and is today CEO of Kovrr, a risk quantification company, spoke to Techopedia about the dangers of dependency on third-party providers.
“This catastrophic cyber event was a jolting reminder for some, and harsh wake-up call for others, of the pervasiveness of reliance that huge portions of the market have on single third-party service providers.”
Golan explained that everyone, from CISOs and CFOs to CEOs and board members, needs to carefully review their organization’s digital infrastructure and assess its specific exposure to the various cyber risk scenarios it faces, especially those connected to third-party services.
“There needs to be a frank conversation, free of blame, about why the incident had such a severe impact on the company and what can be done in the future to mitigate these impact levels.”
The challenge now for CISOs and security leaders is to communicate clearly and effectively to stakeholders, boards, and C-suite executives what went wrong and what needs to be done to prevent it from happening again.
Too Big To Fail? It Could Happen Again
Unfortunately, while the CrowdStrike event and its technical details have been widely reported, there has been little to no guidance on how organizations should move forward. It may seem tempting to believe that the global IT outage was a once-in-a-lifetime incident caused by human error, that systems were quickly restored, and that business as usual should resume. But should it?
Alina Timofeeva, an eight-time award-winning tech expert, TED speaker, board member of BCS, The Chartered Institute for IT, and strategic advisor in data and technology to the C-suite of major financial services organizations, spoke to Techopedia about the issue.
“The global IT outages that occurred last week will have a lasting and far-reaching impact, way beyond the initial chaos that the CrowdStrike update caused.”
Timofeeva explained that customers may not necessarily be fully aware of the risks they are exposed to. “The global ripple effect of the outage illustrates the interconnectivity across the supply chain and risk concentration in this market,” Timofeeva said.
“Software vendors like CrowdStrike have become so large and so interconnected that their failures can damage the global economic system and tens of millions of customers globally.”
Timofeeva said that it is vital for companies, governments, and the regulatory ecosystem to be more mindful and concerned about the systemic risk of being dependent on a single major provider.
“Last week, it was CrowdStrike and Microsoft. In the future, it could be cloud giants like Amazon, Microsoft, or Google that fail, and this would impact tens of millions of customers.”
Rebuilding More Secure and Resilient Digital Architectures
A key lesson from the CrowdStrike outage is that the security industry has grown accustomed to reacting to incidents rather than proactively preventing them. As Erik Severinghaus, CEO of Bloomfilter, a company working in process mining for the software development lifecycle, told Techopedia:
“When the digital storm hits and knocks out the power, it’s not just about finding the flashlight but about upgrading your whole electrical system. It is a chance to rebuild smarter and stronger.”
The Bottom Line
Whether it means switching to decentralized cloud systems, making sure administrators are in full control of updates, diversifying providers, or avoiding over-dependence on big tech (which, we remind you, can also fail), organizations need to take an honest look at their infrastructure and rebuild.
Supply chains riddled with single points of failure are a massive risk to healthcare, emergency services, logistics, critical infrastructure, transportation, and more. In our interconnected world, a single faulty update can disrupt entire countries, as the CrowdStrike incident proved. As long as we continue to rely on such centralized systems, we remain vulnerable to catastrophic failures.
References
- Falcon Content Update Remediation and Guidance Hub (CrowdStrike)
- Ghazenfer Mansoor – Technology Rivers (LinkedIn)
- Revolutionizing Healthcare and AI with Software that Solves Core Industry Problems (Technology Rivers)
- Helping our Partners Get IT Right (Hunter Strategy)
- Yakir Golan – Kovrr (LinkedIn)
- Cyber Risk Quantification and Risk Management (Kovrr)
- BCS, The Chartered Institute for IT (BCS)
- Alina Timofeeva – BCS, The Chartered Institute for IT (LinkedIn)
- Erik Severinghaus – Bloomfilter, Inc. (LinkedIn)
- AI-Driven Process Mining For Software Development (Bloomfilter)