Downdetector Helped the Early Detection of Major Outages During the First Half of 2024 | Ookla® (2024)

Since the beginning of 2024, there have been several high-profile outages causing service disruptions. Ookla’s Downdetector™ captured these events based on consumer reports, providing a unique perspective that can help us understand how they affect end-users. In this article, we analyze this data to assess the scale of selected outages, explore how they spread, and uncover interesting consumer behavior patterns.

Key takeaways

  • On February 22, 2024, AT&T experienced the largest operator outage in the world since 2020. Downdetector captured over 1.8 million reports related to AT&T’s nationwide outage reflecting its large scale. Ookla’s platform also helped alert Verizon and T-Mobile customers about the outage caused by AT&T to reduce unnecessary reports to their respective Downdetector pages.
  • In March 2024, Meta experienced one of the largest outages to date, affecting several core services. Over 6.5 million reports related to Facebook were submitted in just over 2 hours on March 5. A second outage in April 2024 highlighted the value of Downdetector for rapid outage detection and swift response: it identified the issue quickly through user reports, unlike traditional network and application testing solutions.
  • Telkom in South Africa has experienced a few network outages since March 2024. On May 13, Telkom’s South African network suffered a temporary nationwide outage, causing customers to lose signal and access to their services. That outage followed a series of disruptions affecting subsea cables in Africa, highlighting the vulnerability of this critical communications infrastructure and the need for diversification and backup solutions such as satellite internet.

Service outages have been in the news more often in recent months and have an ever-growing impact on consumers and businesses alike

In the past 18 months, several high-profile outages have disrupted services across several industries – from telecommunications (AT&T in the U.S.) and social media (Meta) to cloud services (Microsoft and AWS). These disruptions extend far beyond momentary inconvenience. As consumers increasingly depend on constant connectivity for communication, entertainment, and essential services like emergency response, the stakes are high. Businesses with a strong online presence and those reliant on cloud services are particularly vulnerable, risking productivity and revenue loss, as well as potential reputational damage.

Our digital infrastructure’s highly interconnected nature means a single outage can trigger a cascade of disruptions across various sectors. In today’s era of round-the-clock media and social platforms, even small disruptions can quickly escalate into significant crises, amplifying their visibility and impact.

It is therefore crucial to have systems in place to detect such events, manage outages, and develop a comprehensive contingency plan. By spotting anomalies early, service providers can isolate problems, minimize downtime, prevent escalations, and keep users informed throughout the outage. In this context, crowdsourced data can complement internal fault detection systems by assessing the outage’s scale and providing real-time information to affected users. Identifying priority areas allows for a more coordinated response, minimizing impact and protecting the company’s reputation.

Ookla’s Downdetector™ is the leading source for real-time status and outage information for thousands of services and websites around the world. Powered by unbiased, transparent user reports and problem indicators from around the web, it helps consumers understand disruptions to vital services and informs businesses when their customers are experiencing issues. The platform tracks over 14,000 services worldwide and receives reports from more than 200 million unique users. Users submit problem reports on Downdetector’s localized websites, which also collect indicators from social media and other web sources. These reports are then validated and analyzed in real time to flag potential service disruptions and other problems. An incident is confirmed when the volume of reports significantly exceeds the typical baseline for a service.
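The baseline rule described above can be illustrated with a small sketch. The article does not describe Downdetector's actual algorithm, so everything here is an assumption made for illustration: the function name `is_incident`, the use of a mean-plus-standard-deviations threshold, and the default values of `z_threshold` and `min_reports`.

```python
from statistics import mean, stdev


def is_incident(history, current, z_threshold=3.0, min_reports=10):
    """Return True when `current` is far above the baseline in `history`.

    history: report counts from comparable past intervals (hypothetical data).
    current: report count in the interval being checked.
    The thresholds are illustrative assumptions, not Downdetector's values.
    """
    if len(history) < 2:
        # Not enough data to estimate a baseline and its spread.
        return False
    baseline = mean(history)
    # Use a floor of 1.0 so very quiet, constant services do not get a
    # zero-width band that any single report would exceed.
    spread = max(stdev(history), 1.0)
    threshold = baseline + z_threshold * spread
    # Require a minimum absolute volume as well as a relative jump.
    return current >= min_reports and current > threshold


# Hypothetical report counts for one service over recent intervals.
usual = [12, 9, 15, 11, 10, 13, 8, 14]
quiet_interval = is_incident(usual, 16)
spike_interval = is_incident(usual, 500)
```

With these made-up counts, a reading of 16 stays inside the normal band while 500 is flagged. A production system would likely also account for time of day and day of week when choosing the history it compares against.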

In the sections that follow, we leverage Downdetector data to analyze three outages that occurred during the first half of 2024 by tracing back their evolution, providing insightful analysis, and uncovering interesting consumer behavior trends.

AT&T experienced the largest operator outage in the world since 2020 according to Downdetector

AT&T, the largest mobile operator in the U.S.A. with over 240 million subscribers, experienced a nationwide network outage on February 22. This affected its mobile network, leaving thousands of users without voice, messaging, and data services for several hours. Based on the number of reports on Downdetector.com, this was the largest outage of any telecom operator in the world since November 2020.

At 2:45 AM CST on February 22, 2024, Downdetector started receiving thousands of self-reported incidents related to AT&T services, far exceeding the baseline. Reports peaked at 73,502 at 8:15 AM CST as people started their day. In total, AT&T received nearly 1.8 million issue reports on Downdetector between 2:45 AM and 5:45 PM CST. The number of reports started to dwindle rapidly after 10:45 AM, returning to normal by day’s end. This outage also affected AT&T’s sub-brand, Cricket Wireless, with reports tailing off in the late afternoon.

Rivals Verizon and T-Mobile also had higher-than-normal report volumes but on a much smaller scale. Self-reported incidents peaked at 7:00 AM CST at 4,358 and 1,990 for Verizon and T-Mobile, respectively. Most of these reports came from customers trying to reach AT&T customers, since both operators confirmed their networks were operating normally. These consumers are not wrong, though: they were unable to use their service as intended. To reduce the number of reports, Verizon and T-Mobile customers visiting Downdetector.com were informed that the issues being reported were likely related to AT&T. Such proactive measures help alert customers who are unaware of third-party issues affecting their experience and avoid unnecessary support calls.
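To put the difference in scale into numbers, the short sketch below expresses each rival's peak as a percentage of AT&T's peak, using only the figures quoted above. The names `PEAK_REPORTS` and `share_of_reference` are illustrative, and the peaks occurred at different times (7:00 AM versus 8:15 AM CST), so this is a rough comparison rather than a like-for-like measurement.

```python
# Peak report counts quoted in the article for February 22, 2024.
PEAK_REPORTS = {"AT&T": 73_502, "Verizon": 4_358, "T-Mobile": 1_990}


def share_of_reference(peaks, reference="AT&T"):
    """Return every other service's peak as a percentage of the reference peak."""
    ref = peaks[reference]
    return {
        name: round(100 * count / ref, 1)
        for name, count in peaks.items()
        if name != reference
    }


shares = share_of_reference(PEAK_REPORTS)
print(shares)  # {'Verizon': 5.9, 'T-Mobile': 2.7}
```

In other words, the Verizon and T-Mobile peaks were roughly 6% and 3% of AT&T's peak.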

Downdetector data points to an internal root cause within the AT&T network, since no major cloud services provider or popular online service received a large number of reports at the same time. The issues were reported from across the U.S., with users from Houston, Chicago, and Dallas generating the most reports. AT&T attributed the issue to the application and execution of an incorrect process during network expansion. There were initial concerns about a potential cyberattack, but no evidence supported that. The timing of the outage onset is also consistent with AT&T’s explanation, as it occurred during typical maintenance hours in the very early morning. According to AT&T, three-quarters of the network was restored by the afternoon.

The outage also meant that customers were unable to call emergency services. Some public services, such as the New York Police Department, could not use their phones connected to the AT&T network. However, AT&T’s FirstNet network for first responders such as the police and fire departments remained operational.

As a result of this incident, AT&T’s share price fell by 2% and it could face fines due to the inaccessibility of emergency services during the outage. AT&T also offered customers a $5 credit in compensation for the incident. This highlights the potential financial cost of service disruption if not managed efficiently and if the network is not quickly restored.

Two months later to the day, AT&T experienced another, albeit more limited, outage that affected residents of Virginia and North Carolina due to equipment failure. The number of self-reported issues peaked at nearly 1,300 on the morning of April 22 before subsiding one hour later.

Meta experienced one of the largest outages to date based on the number of services affected and the duration

On March 5, 2024, Meta experienced a widespread global outage impacting several of its core services including WhatsApp, Facebook, Facebook Messenger, Instagram, and Threads. The outage was first reported by mid-afternoon and began to clear at about 5:00 PM UTC, lasting about 2 hours. During this time, people could not log in to their Facebook accounts, with the site erroneously indicating that their passwords were no longer correct, sparking concerns about potential hacking.

Reports of issues with Meta’s services followed a similar trend:

  • Facebook reports started pouring into Downdetector around 3:15 PM UTC, peaking just 15 minutes later at over 2.35 million reports. Between 3:15 PM and 5:15 PM, the total number of submitted reports exceeded 6.5 million.
  • For Instagram, the number of reports peaked at 529,140 at 3:30 PM UTC, with users reporting problems with the app.
  • People started reporting issues with Facebook Messenger’s chat services and problems logging in mid-afternoon, with a peak of 158,419 reports at 3:30 PM UTC.
  • Users on WhatsApp were comparatively much less affected by the outage, with only 25,312 reports between 3:00 PM and 5:30 PM, compared to 6.5 million for Facebook, over 1.8 million for Instagram, and 410,281 for Facebook Messenger.

Meta attributed the outage to an unspecified technical issue, with most users regaining access to its services by late afternoon. This episode highlights the risk of not communicating quickly about ongoing outages: the lack of information raised customers’ concerns and prompted them to change their passwords multiple times to regain access to their accounts, compounding incoming traffic to Meta platforms.

At the beginning of April, WhatsApp services went briefly offline again. Users could open the app and view their chats and history, but could not send or receive any messages. The same applied to Facebook Messenger; Instagram and Threads were less impacted. Starting at 6:10 PM UTC on April 3, 2024, Meta services, particularly WhatsApp, received many user reports on Downdetector. Between 6:00 PM and 8:30 PM UTC, WhatsApp reports amounted to over 1.7 million, far more than Instagram’s (over 200,000) and Facebook’s (35,721). Meta did not make an official statement explaining the cause of this outage.

Traditional network and application testing solutions did not detect this Meta outage because the network paths looked normal and did not exhibit any errors from the outside (using ping-type tests). However, Downdetector identified the issue early thanks to user reports, showing how quicker outage detection enables faster response times.

Telkom in South Africa has witnessed a few network outages since March 2024

On May 13, 2024, a significant outage affected Telkom’s nationwide operations, leaving customers unable to use internet services, place calls, or send text messages. The service interruption began around 1:00 PM UTC, with user reports peaking between 1:00 PM and 3:00 PM UTC. During this window, 48,433 outage reports were logged on Downdetector. Users also went to social media platforms such as X (formerly Twitter) to voice their frustration.

Although service was largely restored by 4:00 PM UTC, lingering issues persisted in some areas. Downdetector received reports well into the evening, indicating that certain users continued to face connectivity problems. In response to the inconvenience, Telkom offered all affected customers compensation of 1GB of data, valid for two days.

Analysis of the outage reports submitted by Telkom subscribers revealed that 40% pertained to internet connectivity, while 35.2% related to mobile phone services. Almost a quarter of the reports described the situation as a “total blackout” of the mobile network, suggesting an extensive disruption. The outage inevitably impacted access to popular online platforms, including TikTok, YouTube, and Netflix.

The root cause of Telkom’s outage on May 13, 2024, remains unspecified, but it followed another disruption that occurred the day before and affected online services in South Africa and several countries on the east coast of the continent, including Kenya, Tanzania, Rwanda, and Uganda. This broader disruption was attributed to damage to the undersea cable system that connects the region to the rest of the world. Customers reported slow internet speeds and intermittent service throughout the day.

A more extensive outage had previously impacted the Western and Southern parts of Africa, including South Africa, on March 14, 2024. This disruption was due to multiple failures of the undersea cables and resulted in significant economic repercussions. For example, banks were forced to close in several countries including Nigeria, and mobile users across the region faced sluggish speeds and interference with financial transactions.

Regardless of whether Telkom’s recent service disruption was directly related to the recent subsea cable damage, these events underscore the critical importance of this infrastructure in sustaining Africa’s connectivity with the rest of the world and expose the vulnerabilities inherent in a communications network reliant on limited pathways. They also show the important role South Africa plays in serving parts of the continent, since big regional companies have data centers located in the country.

The situation highlights the urgency for diversifying subsea cables and exploring alternative technologies, such as satellite internet from providers like Starlink, to serve as a contingency measure. However, even these technologies are not immune to challenges, as evidenced by the disruptions to Starlink in May 2024 due to a geomagnetic storm.

The network outages experienced by major service providers like Meta and AT&T emphasize that even the most extensive and relied-upon networks are susceptible to major service interruptions. Such disruptions can have a profound impact and disrupt critical services given consumers’ and businesses’ dependence on such infrastructure. The network disruptions faced by countries in Africa since March 2024 also highlight the economic risks linked to the limited number of subsea cables.

While infrastructure resilience improves over time, the complexity of modern systems means that organizations must proactively identify and mitigate network failures. Tools like Downdetector enable early detection, informed contingency planning, and transparent communication with concerned users – all essential for navigating outages and preserving user trust in an increasingly interconnected world.

If you would like to know more about Downdetector, please contact us.