Tech Meltdowns: 8 Epic Outages and What Went Wrong

According to the Uptime Institute’s 2024 Outage Analysis, between 10 and 20 “high-profile IT outages or data centre events” occur every year.

The study revealed that while power is the main cause of data centre outages, network issues are the leading cause of outages across all IT services. These outages make headlines, disrupt business for customers, and damage company reputations.

More than half of the respondents said their most recent major outage cost them over $100,000, and 16% reported that it cost them over $1 million. The report also lists the leading causes of network outages: design and configuration, hardware, capacity, software, and environmental threats.

Here are eight major tech outages and what went wrong in each.

Microsoft-CrowdStrike

On July 19, 2024, CrowdStrike, a security technology provider, caused a massive global IT outage, potentially the biggest in history, affecting airlines, banks, businesses, schools, and government services worldwide.

The outage stemmed from a faulty update to CrowdStrike’s Falcon sensor software, which caused widespread disruption to Windows systems globally. Millions of users were left facing the infamous “Blue Screen of Death” and machines stuck in reboot loops.

Excluding Microsoft, US Fortune 500 companies are said to face $5.4 billion in financial losses due to the Windows outage.

Meta

On October 4, 2021, Meta platforms, including Facebook, Instagram and WhatsApp, experienced an outage lasting nearly six hours. Users faced difficulties accessing the apps, leading to a surge in traffic on competing platforms like Twitter and TikTok.

During this period, Facebook reportedly lost about $545,000 in US ad revenue per hour.
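
As a rough back-of-the-envelope check, the hourly figure can be multiplied by the length of the outage. The sketch below assumes a flat hourly loss and rounds "nearly six hours" up to six, so it gives an upper-end estimate rather than a reported total; the function name is purely illustrative.

```python
# Back-of-the-envelope estimate, not a reported figure.
# Assumptions: the loss rate is constant, and "nearly six hours" is rounded up to 6.
HOURLY_US_AD_REVENUE_LOSS = 545_000  # USD per hour, as quoted in the article
OUTAGE_HOURS = 6                     # upper bound on the outage duration


def estimated_loss(hourly_loss: float, hours: float) -> float:
    """Return the total loss for an outage at a flat hourly rate."""
    return hourly_loss * hours


total = estimated_loss(HOURLY_US_AD_REVENUE_LOSS, OUTAGE_HOURS)
print(f"Estimated US ad revenue lost: ${total:,.0f}")
```

On these assumptions the total comes to roughly $3.3 million in US ad revenue.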

Google Services

On December 14, 2020, popular Google services such as YouTube, Gmail, Google Drive, and Google Docs were down for about an hour, affecting millions of users worldwide.

The outage was attributed to a failure in Google’s authentication system, which manages user logins across its services. The issue specifically stemmed from an internal storage quota problem.

Users attempting to access these platforms encountered errors, with many reporting that they were unable to log in or retrieve their data. Google acknowledged the issue and confirmed that the services were restored for the vast majority of affected users shortly after the outage.

Fastly

On June 8, 2021, Fastly, a major content delivery network (CDN) provider, experienced a significant global outage that disrupted numerous high-profile websites, including Amazon, Reddit, and The New York Times.

The outage was triggered by a software bug introduced in a deployment on May 12. The bug remained dormant until a valid configuration change made by a customer activated it.

This led to 85% of Fastly’s network returning errors, resulting in widespread accessibility issues for many internet users around the world.

Twitter (X Corp)

Twitter suffered a major outage on December 28, 2022, leaving tens of thousands of users unable to access the platform or its features for several hours. It primarily impacted users attempting to access the platform via desktop computers.

Many reported being unexpectedly logged out, encountering error messages, and facing difficulties in viewing replies or using features like notifications and TweetDeck. The hashtag #TwitterDown trended on the platform as users shared their experiences during the outage.

AWS

On December 7, 2021, Amazon Web Services (AWS) experienced a significant outage that disrupted numerous services and affected a wide range of businesses and applications. It primarily impacted the US-East-1 region, located in Northern Virginia, which is crucial for many of AWS’s services.

The outage was caused by an automated scaling activity designed to increase capacity for a service within AWS’s main network. This action unintentionally triggered a surge in connection attempts within AWS’s internal network, overwhelming the devices that manage communication between the internal and main networks.
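
A common general defence against this kind of connection surge is for clients to retry with exponential backoff and random jitter, so that failed requests do not all come back at the same moment. The sketch below illustrates that technique only; it is not AWS's implementation, and the function names, parameters, and default values are hypothetical.

```python
import random
import time


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Pick a random wait (seconds) up to a capped, exponentially growing ceiling."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0.0, ceiling)


def call_with_backoff(operation, max_attempts=5, base=0.1, cap=10.0,
                      sleep=time.sleep, rng=random):
    """Call operation(), retrying on ConnectionError with jittered backoff.

    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            # Wait a randomised interval so many clients do not retry in lockstep.
            sleep(backoff_delay(attempt, base, cap, rng))


if __name__ == "__main__":
    failures_left = [2]  # hypothetical service that fails twice, then recovers

    def flaky_connect():
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise ConnectionError("network device overloaded")
        return "connected"

    print(call_with_backoff(flaky_connect))
```

Spreading retries over a growing, randomised window reduces the chance that recovering devices are hit by a second wave of simultaneous connection attempts.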

Akamai

On June 17, 2021, a significant disruption occurred at Akamai, affecting the websites of numerous financial institutions and airlines in Australia and the United States. This outage was traced back to server-related glitches at Akamai, a major content delivery network (CDN) provider.

The incident marked the second major internet blackout in under two weeks, following the June 8 outage at rival CDN Fastly.

Akamai attributed the outage to a bug in its software, which was promptly addressed. The company confirmed that the issue was not related to any cyber-attack or security vulnerability.

Cloudflare

In November 2023, a power failure at one data centre disrupted Cloudflare services for around two days. The platform relies on three data centres, and one of them lost power after the facility’s generators failed and its circuit breakers proved faulty.

As the generators failed, Cloudflare’s network routers lost power, which disrupted services reliant on the PDX-04 data centre.

The outage primarily affected Cloudflare’s dashboard, APIs, and related services, while traffic through its global network continued to function without interruption.
