A global IT failure wreaked havoc on Friday, grounding flights and disrupting everything from hospitals to government agencies. Over all the chaos hung a question: how did a flawed software update for Microsoft Windows machines bring large swaths of society to a screeching halt?
The problem originated with CrowdStrike, an Austin, Texas-based cybersecurity firm whose Falcon program, which blocks the execution of malware and cyber-attacks, is relied upon across the global technology industry, including by Microsoft. Falcon protects devices by operating with privileged access to a wide range of internal systems and by updating its defenses automatically. That level of integration means that if Falcon falters, the computer is close behind. After CrowdStrike pushed an update to Falcon on Thursday night, Windows PCs and servers running it crashed to a “blue screen of death” and were rendered unusable, trapped in a recovery boot loop.
Microsoft is a juggernaut with significant market power, dominating cloud-computing infrastructure across Europe and the United States. So it wasn’t just computers that were affected, but servers and a host of other systems as well. Overwhelming requests from users, devices, services and businesses ushered in a cascading series of failures across Microsoft products, namely Azure and Microsoft 365. Failures plaguing Azure led to additional, separate disruptions to 365 services. A giant clusterfuck ensued.
That’s how CrowdStrike’s faulty update evolved into the largest IT outage in history, but that account tells us nothing about why global computing infrastructure seems to have a single point of failure. At least one CrowdStrike executive had the same question.
“Their IT stack may include just a single provider for operating system, cloud, productivity, email, chat, collaboration, video conferencing, browser, identity, generative AI and increasingly security as well,” a CrowdStrike vice-president, Drew Bagley, said. “This means that the building materials, the supply chain and even the building inspector are all the same.”
This is at the heart of what happened these past two days. It’s not just that so many firms rely on CrowdStrike, but that cloud infrastructure relies on hugely powerful companies such as Microsoft, which then subject firms to exclusionary and anticompetitive practices that concentrate services and offerings into an increasingly narrow range of options.
In June 2023, responding to a Federal Trade Commission request for public comment on cloud-computing business practices, Microsoft and Amazon, two companies that dominate this space, insisted that competition was “thriving” and “highly dynamic and competitive”. Google, less of a player than the other two, was less demure, submitting an 11-page document that accused Microsoft of stifling competition.
“Microsoft’s complex web of licensing restrictions prevents customers, particularly its existing on-premises enterprise clients, from choosing any other cloud provider at the time of migration into the cloud and ultimately locks those customers into its Azure ecosystem,” Google said in its complaint.
Google was right, though it is guilty of the same sins. These three firms control two-thirds of the global cloud infrastructure services market, and they make it nearly impossible for customers to switch among providers by imposing impenetrable technical barriers, effectively locking them in.
As the blue screen of death appeared at airports the world over, the US Federal Trade Commission chair, Lina Khan, tweeted: “All too often these days, a single glitch results in a system-wide outage, affecting industries from healthcare and airlines to banks and auto-dealers. Millions of people and businesses pay the price. These incidents reveal how concentration can create fragile systems.”
Why do concentration, consolidation and monopolization leave us at risk? It’s not simply that a homogenized market leaves everyone exposed to what should be an isolated service disruption. Concentration yields the power to restructure markets. Monopolists force firms out of a market and redesign the terms of engagement for competitors so that they cannot threaten incumbent juggernauts. A vendor ecosystem’s dependency on Microsoft might be rationalized as cost-cutting, just as Microsoft’s own dependency on a company like CrowdStrike will be.
The real cost is externalized: when these services shut down, who truly suffers? CrowdStrike’s chief executive, George Kurtz, has lost hundreds of millions of dollars of his fortune, but that money will return. Microsoft and CrowdStrike have lost some clients and some business, but will undoubtedly win back more than they lost within a year or two. That’s true not just of this outage but of any outage.
Is the same true for those who needed emergency services, hospitals, airports or government agencies that were suddenly unavailable? Is it true for the rest of society, which has grown dependent on digital mediation and computation with little say in the matter, because these processes are driven by absolutist economic firms rather than democratic political actors?
We’ve had these sorts of outages before and nothing has changed, partly because the tech industry has been so adept at shifting blame. If that continues, then the monopolists will do what they please and everyone will suffer what they must.