
On October 20, 2025, a problem in a single AWS region, us-east-1 in Northern Virginia, caused a massive chain reaction that took down parts of the internet. Apps and services people use every day, like Snapchat, Reddit, Venmo, Coinbase, Zoom, Signal, Roblox, Fortnite, and even Amazon’s own services like Prime Video and Alexa, were disrupted.
Amazon Web Services (AWS) explained that the disruption started with a DNS-related failure connected to DynamoDB, one of AWS’s core database services, in us-east-1. This DNS failure then cascaded into other parts of AWS and affected many other services that depend on DynamoDB behind the scenes.
According to AWS, customers began seeing high error rates when calling DynamoDB APIs in the us-east-1 region.
Long story short: software that expected DynamoDB to respond started getting “something is wrong” responses instead. Because many AWS services rely on DynamoDB internally, and many customer apps also rely on it directly, those errors spread far beyond the database itself.
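To make that concrete, here is a minimal sketch (not AWS’s or any particular customer’s code) of a routine DynamoDB call in Python with boto3, and the kind of failure clients started seeing when the endpoint became unreachable. The table name and key are made up for illustration.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

try:
    response = dynamodb.get_item(
        TableName="example-table",        # hypothetical table name
        Key={"pk": {"S": "user#123"}},    # hypothetical key
    )
    print(response.get("Item"))
except EndpointConnectionError as err:
    # The SDK could not reach dynamodb.us-east-1.amazonaws.com at all,
    # which is what a DNS resolution failure looks like from the client side.
    print(f"Could not reach the DynamoDB endpoint: {err}")
except ClientError as err:
    # The service answered, but with an error response.
    print(f"DynamoDB returned an error: {err.response['Error']['Code']}")
```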
AWS said early in the incident that “the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1,” and they advised customers to try flushing DNS caches if they were still having trouble reaching the service.
By later in the day, AWS reported the outage was mitigated and services were recovering. But by then the impact had already hit thousands of businesses worldwide.
So yes: one regional problem in Virginia rippled out to the rest of the world.
Why? Two words: DNS and us-east-1.
DNS (Domain Name System) is basically the internet’s address book.
When an app wants to talk to a service — say dynamodb.us-east-1.amazonaws.com — it first asks DNS: “What IP address should I talk to?” DNS answers with an IP, and then the app connects.
If DNS can’t answer, or answers with the wrong thing, the app can’t reach the service — even if the service itself is technically healthy.
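You can see that first step for yourself with nothing more than Python’s standard library. This is just an illustration of the lookup an application performs before connecting, not anything specific to AWS’s internals.

```python
import socket

endpoint = "dynamodb.us-east-1.amazonaws.com"

try:
    # Ask DNS: what IP addresses should I talk to for this name?
    addresses = {info[4][0] for info in socket.getaddrinfo(endpoint, 443)}
    print(f"{endpoint} resolves to: {addresses}")
except socket.gaierror as err:
    # This is roughly what applications experienced during the incident:
    # the name could not be resolved, so the service was unreachable
    # even though the servers behind it were healthy.
    print(f"DNS resolution failed for {endpoint}: {err}")
```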
During this incident, the problem wasn’t “DynamoDB crashed and disappeared.” According to AWS’s post-incident explanation, the issue was that the DNS system responsible for telling clients where DynamoDB lives returned bad or empty data for the DynamoDB endpoint in us-east-1. In effect, software couldn’t find DynamoDB anymore.
You can think of it like this: DynamoDB was still standing, but its entry had effectively vanished from the internet’s address book. The building was fine; nobody could look up its address. That is how a DNS problem becomes an internet-scale problem fast.

From AWS’s public statements and reporting, here are the takeaways that are directly supported by what happened and by how AWS described it.
When DNS is wrong, everything that depends on it is effectively offline even if the underlying servers are fine. AWS itself explicitly tied the outage to DNS resolution of the DynamoDB API endpoint in us-east-1.
If you run critical services, you should treat DNS like a first-class reliability component, not an afterthought.
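What does “first-class” look like in practice? At minimum, actively checking that your critical endpoints still resolve, instead of assuming they do. Below is a hedged sketch of such a check in Python; the endpoint list and the alert hook are placeholders you would replace with your own dependencies and monitoring system.

```python
import socket
import time

# Hypothetical list of names your stack cannot live without.
CRITICAL_ENDPOINTS = [
    "dynamodb.us-east-1.amazonaws.com",
    "api.example.internal",
]

def resolves(hostname: str) -> bool:
    """Return True if the name currently resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, 443)) > 0
    except socket.gaierror:
        return False

def alert(message: str) -> None:
    # Placeholder: wire this to Slack, PagerDuty, CloudWatch alarms, etc.
    print(f"ALERT: {message}")

while True:
    for endpoint in CRITICAL_ENDPOINTS:
        if not resolves(endpoint):
            alert(f"DNS resolution failing for {endpoint}")
    time.sleep(60)  # re-check every minute
```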
Many teams believe, “We’re safe because we’re running in multiple regions.” But if those regions still depend (directly or indirectly) on a single “core” region for identity updates, database metadata, configuration, or control-plane actions, then you still have a hidden single point of failure. Multiple analysts have pointed out, and long-time AWS customers have long complained, that global AWS features can still rely on us-east-1.
Architects need to ask: “If us-east-1 vanished, what exactly stops working in my stack?”
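One way to start answering that question, at least for the data path, is to actually test a fallback. The sketch below assumes, hypothetically, a DynamoDB global table replicated to us-west-2 and falls back to the replica when the primary region’s endpoint is unreachable. It says nothing about control-plane or IAM dependencies, which can still pin you to us-east-1.

```python
import boto3
from botocore.exceptions import ClientError, EndpointConnectionError

PRIMARY_REGION = "us-east-1"
FALLBACK_REGION = "us-west-2"   # assumes a global-table replica exists here
TABLE_NAME = "example-table"    # hypothetical

def get_item_with_fallback(key):
    """Try the primary region first, then the replica if the primary fails."""
    for region in (PRIMARY_REGION, FALLBACK_REGION):
        client = boto3.client("dynamodb", region_name=region)
        try:
            response = client.get_item(TableName=TABLE_NAME, Key=key)
            return response.get("Item")
        except (EndpointConnectionError, ClientError):
            # Region unreachable or erroring; try the next one.
            continue
    return None

item = get_item_with_fallback({"pk": {"S": "user#123"}})
print(item)
```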
AWS’s own explanation says a race condition in automated DNS management caused the bad update, and the self-healing mechanism didn’t immediately reverse it.
Automation is powerful: it lets you run huge systems at scale. But it also means a subtle logic bug can impact millions of users in seconds. The lesson for any team (even a small startup): review what your “safety scripts” are allowed to touch. Can they delete production records? Under what conditions?
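To make that concrete, here is a hypothetical guardrail for an automated DNS updater: refuse to publish a change that would leave a critical endpoint with no records at all, and fail closed instead. The data structures and function names are invented for illustration; this is not AWS’s actual tooling.

```python
CRITICAL_ENDPOINTS = {"dynamodb.us-east-1.amazonaws.com"}

def apply_dns_plan(plan):
    """plan maps hostname -> list of IP addresses the record should contain."""
    for hostname, addresses in plan.items():
        if hostname in CRITICAL_ENDPOINTS and not addresses:
            # A bug or race condition produced an empty record set.
            # Fail closed: keep the old records and page a human.
            raise RuntimeError(
                f"Refusing to publish an empty record set for {hostname}"
            )
    for hostname, addresses in plan.items():
        publish_records(hostname, addresses)

def publish_records(hostname, addresses):
    # Placeholder for the real DNS update (Route 53 API call, etc.).
    print(f"Publishing {hostname} -> {addresses}")

# A plan like this gets rejected instead of silently wiping the endpoint.
try:
    apply_dns_plan({"dynamodb.us-east-1.amazonaws.com": []})
except RuntimeError as err:
    print(err)
```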
AWS has publicly committed to adding safeguards so that an automated DNS race condition can’t wipe out a critical endpoint’s DNS mapping again. They’ve also disabled the behavior in the affected automation and are putting in additional protections to prevent recurrence.
For the rest of us (engineers, team leads, founders, students preparing for certifications), this incident is more than outage gossip. It’s a free architecture lesson, delivered at internet scale.
If your product (or future product) depends on the cloud, here’s the homework: map what breaks in your stack if a single region disappears, treat DNS as a first-class reliability component, and audit what your automation is allowed to touch.
Because when something as fundamental as “where is the database?” becomes unanswerable, the whole internet feels it.

Founder of CertiPass.io
Professional Cloud & DevOps Architect with over 10 years of experience in migration projects, landing zone service offerings, and cloud advisory for technical teams