On 18 November, Cloudflare suffered a major outage that knocked millions of websites offline, leaving users across the internet with error pages and sluggish responses. The company has now explained what went wrong. The cause wasn’t a hacker or an attack: it started with a small internal change that cascaded unexpectedly into a network-wide failure. Here’s a breakdown of the five main reasons the outage happened.
A routine update clashed with Cloudflare’s own setup
Cloudflare pushed out a software update to machines responsible for routing internet traffic. These machines hold the maps that show where data should go. The update introduced changes that did not match Cloudflare’s existing configuration. Once the mismatch appeared, the machines started making wrong routing decisions inside Cloudflare’s network.
The problem grew quickly because the update went out to many locations at once. Cloudflare’s network is large and tightly linked, so any mistake spreads fast. The company expected a smooth, routine rollout; instead, the system reacted in ways it hadn’t anticipated, triggering a chain of routing failures.
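For a concrete picture of why pushing a change everywhere at once is risky, here is a minimal, purely illustrative sketch of a staged (“canary”) rollout, the common safeguard against this kind of blast radius. The region names, checks and functions below are hypothetical and are not Cloudflare’s actual tooling; the idea is simply that each wave must prove healthy before the next one begins.

```python
# Illustrative sketch only: a staged ("canary") rollout loop.
# All names (Region, push_config, routing_is_healthy) are hypothetical.
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool = True


def push_config(region: Region, config: dict) -> None:
    # Stand-in for applying the new routing configuration to one location.
    print(f"applying config v{config['version']} to {region.name}")


def routing_is_healthy(region: Region) -> bool:
    # Stand-in for real post-deploy checks (route counts, error rates, probes).
    return region.healthy


def staged_rollout(regions: list[Region], config: dict, wave_size: int = 2) -> bool:
    """Deploy in small waves; stop at the first wave that fails its checks."""
    for start in range(0, len(regions), wave_size):
        wave = regions[start:start + wave_size]
        for region in wave:
            push_config(region, config)
        if not all(routing_is_healthy(r) for r in wave):
            print("health check failed; halting rollout before the next wave")
            return False
    return True


if __name__ == "__main__":
    fleet = [Region("lagos"), Region("london"),
             Region("ashburn", healthy=False), Region("tokyo")]
    staged_rollout(fleet, {"version": 2})
```

In this toy example the rollout stops at the second wave, so the bad configuration never reaches the remaining locations; a simultaneous push has no such stopping point.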
A single configuration tweak broke internal communication
Within Cloudflare, many services communicate via private routes. The update changed how these internal routes were built and maintained. This broke connections that the company’s own systems needed to stay stable. When those routes failed, Cloudflare lost visibility into parts of its network.
This was not an outside breach or a security attack. It was a configuration change that worked on paper but behaved differently once deployed at scale. The moment internal paths dropped, key components could no longer reach the tools that support them. That loss of communication made the outage harder to fix.
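To make that concrete, here is a small, purely illustrative sketch of a pre-deployment “dry run” that compares a proposed change against the configuration a node is actually running. The route names and addresses are invented; the point is that a mismatch like the one described above is far cheaper to catch before rollout than after.

```python
# Illustrative sketch only: compare proposed internal routes against running ones.
# Route names and addresses are hypothetical.
def find_mismatches(running: dict[str, str], proposed: dict[str, str]) -> list[str]:
    """Return human-readable differences between running and proposed routes."""
    problems = []
    for service, next_hop in proposed.items():
        if service not in running:
            problems.append(f"{service}: new route to {next_hop} has no current counterpart")
        elif running[service] != next_hop:
            problems.append(f"{service}: {running[service]} -> {next_hop}")
    for service in running:
        if service not in proposed:
            problems.append(f"{service}: route would be removed entirely")
    return problems


running_routes = {"logging": "10.0.0.5", "monitoring": "10.0.0.6", "coordination": "10.0.0.7"}
proposed_routes = {"logging": "10.0.0.5", "monitoring": "10.0.9.6"}  # coordination silently dropped

for issue in find_mismatches(running_routes, proposed_routes):
    print("would change:", issue)
```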
Critical systems were cut off from the tools that keep them running
Modern internet services rely on shared internal tools for logging, monitoring, coordination and system checks. When Cloudflare’s internal routes failed, these tools became unreachable. Without them, Cloudflare could not see what was happening in real time. Engineers were effectively working with limited visibility.
The outage also made it harder for Cloudflare to run its normal processes for diagnosing and repairing issues. Some of the automated systems that manage recovery were unavailable. This left engineers with fewer options, so fixes had to be applied more carefully and slowly.
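As a rough illustration of that dependency, here is a short, hypothetical sketch of a service probing the internal tools it relies on. The hostnames and ports are made up; the point is that once the routes to these tools disappear, the service is operating exactly as blind as described above.

```python
# Illustrative sketch only: best-effort reachability probe for internal tools.
# Hostnames and ports are hypothetical.
import socket

INTERNAL_TOOLS = {
    "logging": ("logs.internal.example", 514),
    "monitoring": ("metrics.internal.example", 9090),
    "coordination": ("config.internal.example", 2379),
}


def reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Try to open a TCP connection to one internal dependency."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


unreachable = [name for name, (host, port) in INTERNAL_TOOLS.items()
               if not reachable(host, port)]
if unreachable:
    print("operating blind; cannot reach:", ", ".join(unreachable))
```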
Cloudflare’s massive network made a small error grow into a global problem
Cloudflare operates thousands of servers across many cities. On a normal day, this scale keeps things reliable. On Tuesday, the scale amplified the failure. Once the bad configuration spread across multiple regions, the impact multiplied. Even though some cities remained stable, enough key locations broke to affect millions of users.
The size of Cloudflare’s network also meant the outage did not behave in a clean or predictable way. Some services struggled. Some half-worked. Some were unreachable. This inconsistent behaviour confused users and put pressure on Cloudflare to explain what was happening quickly. And because so many unrelated websites sit behind Cloudflare, the disruption reached far beyond any single service.
Recovery took time because engineers had to undo the change safely
Cloudflare engineers identified the root cause early, but they could not simply roll everything back at once. A rushed rollback could have caused new failures. They began restoring regions in a controlled order, rebuilding internal routes and checking stability before moving on.
The process took time because each region needed to confirm that routing was healthy again. Once the core systems regained stable connections, Cloudflare’s public services started to recover.
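For readers curious what that kind of controlled recovery looks like in practice, here is a simplified, purely illustrative sketch: restore one region, confirm it is stable, and only then move on. The functions and region names are hypothetical stand-ins for far more involved real-world checks; this is not Cloudflare’s actual runbook.

```python
# Illustrative sketch only: region-by-region recovery with health gates.
# restore_routes and region_is_stable are hypothetical stand-ins.
import time


def restore_routes(region: str) -> None:
    # Stand-in for rebuilding a region's internal routes from a known-good config.
    print(f"rebuilding internal routes in {region}")


def region_is_stable(region: str, checks: int = 3, interval: float = 0.1) -> bool:
    # Stand-in for repeated health checks; real checks would watch route tables,
    # error rates and probe traffic over several minutes.
    for _ in range(checks):
        time.sleep(interval)
    return True


def controlled_recovery(regions: list[str]) -> None:
    for region in regions:
        restore_routes(region)
        if not region_is_stable(region):
            print(f"{region} still unstable; pausing recovery here")
            return
        print(f"{region} confirmed healthy; moving to the next region")


controlled_recovery(["frankfurt", "lagos", "singapore", "sao paulo"])
```

The key design choice is the gate between regions: a rushed, all-at-once rollback skips that gate, which is exactly the risk the engineers were trying to avoid.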
CEO Matthew Prince called it Cloudflare’s worst outage since 2019 and “unacceptable.” The company has said it will strengthen its testing process and add more safeguards to prevent this type of chain reaction in the future.
You can read Cloudflare’s full technical postmortem here if you want the detailed version: https://blog.cloudflare.com
