BenefitPlastic5609 May 8, 2026 +1092

Half the internet runs on AWS and one overheating building can take it all down, wild how fragile the infrastructure really is.

1092

schnurble May 8, 2026 +738

if people would stop being so dependent on us-east-1, it would work a lot better.

738

2_Spicy_2_Impeach May 8, 2026 +378

As someone that worked for AWS... we told folks almost a decade ago to stop f****** building in us-east-1. It's the oldest and jankiest. As a customer I degraded us-east-1 doing a load test for a super bowl event.

378

ZAlternates May 8, 2026 +131

Part of the issue is that some of their global services like dns are still based out of us-east-1 for critical components. I believe IAM (the login service) is as well, but I can’t say I’ve researched it much recently. You’d like to hope they learn from past mistakes and improve…

131

kbn_ May 8, 2026 +53

That stuff has been spread out more in recent years for exactly this reason. Still plenty of control planes in IAD but the major ones (Kinesis, Dynamo, Route53, etc) have a peer to peer failover mechanism to other regions.

53

2_Spicy_2_Impeach May 8, 2026 +27

Hopefully but probably not. I left because previously to get hired in you had to be more knowledgeable than half your team. Also, a lot of core product teams had so much turnover they were worthless due to culture. You had to befriend product teams to get info and prioritize PFRs. Still shocked stuff doesn't fall over more. So many stories of idiots from partner and event side that I should write a book of what not to do. Also, stopped doing business with institutions after working with a lot of customers. Then they hired thousands of idiots. For reference when I was there for ~1.5 years(FART), I was more senior than ~70% of the workforce. Stayed a while longer to cash out but what the f***.

27

footie_ruler May 8, 2026 +28

I was there for 7 years. Wanted to cash out all my initial RSUs because they spiked a lot. Worked in MSK, one of their largest money makers. The enshittification of AWS truly began in 23. Random cuts and layoffs that were not cuts, but merely “restructuring to provide more HC to other services”. It’s all bullshit all the way down. They think they can get away with anything, but the recent migrations to GCP and Azure should just be the start of the floodgates. AWS is so shit now.

28

2_Spicy_2_Impeach May 8, 2026 +11

7 is impressive. Had a guy on my team that was ~15+ and came from the dot com side. Had some stories. I dislike Azure more as a consumer. While there’s no love lost with AWS, I’ve never had more issues with a provider than Azure. Their gov offering, which I was forced to architect with, was interesting to say the least. We had weekly standup with Microsoft to go over the issues we were experiencing. Migrated away from AWS to Azure before my time there and regretted it.

11

speculatrix May 9, 2026 +1

I worked for a significant customer of aws, like $millions a month, and Google were so keen to win our business that they flew in several high level training experts across the Atlantic to give us four days of intensive training.

1

adx931 May 9, 2026 +5

Why do I get the feeling there are load-bearing perl scripts underneath it all?

5

Aar0ns May 9, 2026 +2

We do not speak of the foundations on which all things are lain

2

schnurble May 8, 2026 +37

at my previous job I kept telling people to build primary sites in other regions - use2 if it HAS to be east, or usw1/2, but noooooo. 🤦‍♂️

37

rexspook May 8, 2026 +10

I work there now. Can confirm that a lot of the critical stuff has not really moved off of it. I mean my org is in every region by the nature of what we do. It’s the dependencies that haven’t been migrated yet that take you down.

10

2_Spicy_2_Impeach May 8, 2026 +2

Unfortunate to hear but not shocking. Genuinely curious how the morale is there now. My old manager is still there and randomly emails me at 4AM asking how things are. Is the CloudFormation team finally fully staffed? It was an inside joke at the time.

2

rexspook May 8, 2026 +7

Honestly I stay as far away from anything CloudFormation related as possible. The running joke in our org is we keep making scripts and cli tools to fix the pipelines page because they don’t have the bandwidth to do it. Morale is weird. This kind of LSE doesn’t really matter to most of us. Most of the morale problems right now are due to the numerous layoffs and rapidly advancing AI adoption with no clear guidance. It seems like half the engineers in my org are spending most of their time playing around with AI just because it’s available

7

2_Spicy_2_Impeach May 8, 2026 +3

Appreciate the insight. Yeah, the AI messaging was bizarre when I was there. Leadership said push, push, push (even though my team thankfully wasn’t goaled that way). But most of the time there’d be some random internal workshop from someone who was bored and built something fun with no real world application.

3

rexspook May 8, 2026 +3

The first 2-3 months of 2026 was just tech demos of AI agents/skills/tooling that people built with nothing really being delivered related to our actual product lol

3

2_Spicy_2_Impeach May 8, 2026 +1

Oof. Gotta get those demos/PoCs into one of the probably now hundreds of Salesforce tenants for goaling.

1

rexspook May 8, 2026 +2

Don’t know what that means. I’m not in sales. These were people showing off internal tooling to the technical teams

2

anengineerandacat May 8, 2026 +6

It's their legit first env for testing, and even if you're not using us-east-1, you're using it. There are core services that rely on it regardless of the region of your services. They have a page that goes into this in detail, I just don't have it on my current device.

6

2_Spicy_2_Impeach May 9, 2026 +1

Yes. I learned that as a customer with the DynamoDB outage the day after we launched our platform on AWS. More so that it has the most issues and expect shit to get weird. Control plane backed by DynamoDB having an issue and how many things had DynamoDB dependencies behind the scenes.

1

0neMinute May 8, 2026 +7

Are you sure about this? Most new services launch in us-east-1 and can initially only be used via us-east-1. If anything they have been encouraging it, even if not saying it.

7

2_Spicy_2_Impeach May 8, 2026 +1

Yes. I’m sure.

1

0neMinute May 8, 2026 +1

Can you send me the docs?

1

2_Spicy_2_Impeach May 8, 2026 -5

No. Just go work for them and you’ll see.

-5

0neMinute May 8, 2026 +3

I do work for them, which is why I asked. All new services are launched in us-east-1. A lot of services are dependent on us-east-1. Does AWS want to fix this? Hard to actually say, as they haven’t yet, even after proving they can with the euro and China regions.

3

2_Spicy_2_Impeach May 8, 2026

Hilarious. Admitting to what I stated while asking for documentation. If you do actually work there, part of the reason I left and glad I did. It’s the biggest region and has the most issues. I don’t give a f*** if new services launch there. Again, as someone who worked with some of the largest consumers of AWS services we told folks to stop building there. Don’t get me started on China and KMS implementation. Have a good one.

0

0neMinute May 8, 2026 +4

Wth are you talking about? I said the opposite of what you’re saying. They can absolutely fix us-east-1 worldwide; they only did the euro zone and China due to regulation changes. Those zones, like GovCloud, are heavily restricted on services. You should know this is obvious if you even use AWS? Edit: also, as someone who worked for AWS, you should know one of the tenets is that if it’s not documented then it’s not an official stance or practice.

4

dave0352x May 8, 2026 +2

Warm and fuzzy! So glad I stopped working there

2

kbn_ May 8, 2026 +4

Half the problem is the fact that new managed services usually come up in IAD first. The other half is IAD remains the interface default for a lot of things if you’re just clicking around in the console. Amazon could do a much better job getting people to spread out to other regions.

4

CrayonUpMyNose May 8, 2026 +1

Raise the price and put the fact as a question into every associate exam, so that every last engineer mentions it as a basic fact in meetings.

1

Ani-3 May 10, 2026 +1

Doesn’t help that AWS seems to default to that region even if all of your infrastructure is in a different region. I know there’s a setting to change it.

1

CircumspectCapybara May 8, 2026 +32

It's because service providers are c**** and too lazy to properly design highly available multi-region distributed systems. If you want a five nine availability SLO, you have to be multi-regional, there's just no getting around it. A flood or hurricane or Iranian missile strike can take out an entire region and you can't do anything about that, you have to be in multiple regions. Service providers gotta stop being c**** and do proper engineering.
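
To put rough numbers on the five-nines point, here is a minimal Python sketch of the arithmetic. The 99.9% per-region figure is an illustrative assumption, not a published AWS number, and the model assumes regions fail independently with perfect failover, which real systems only approximate.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def downtime_minutes_per_year(availability: float) -> float:
    """Downtime budget per year implied by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(per_region: float, regions: int) -> float:
    """Availability when the service is up if at least one region is up.

    Assumes independent regional failures and instant, flawless failover
    (both are simplifying assumptions).
    """
    return 1.0 - (1.0 - per_region) ** regions


if __name__ == "__main__":
    print(f"five nines budget: {downtime_minutes_per_year(0.99999):.2f} min/year")
    for n in (1, 2, 3):
        a = combined_availability(0.999, n)  # 0.999 is a made-up per-region figure
        print(f"{n} region(s): availability {a:.9f}, "
              f"{downtime_minutes_per_year(a):.3f} min/year of downtime")
```

Under these assumptions a single region at three nines allows hours of downtime a year, while two such regions already clear the five-nines budget on paper. Correlated failures and shared dependencies eat into that margin in practice.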

32

eXecute_bit May 8, 2026 +12

Marketing wants to advertise all the nines, but investors and therefore the executives won't actually pay for it because (looking at all the nines they see from AWS) "it probably won't happen". Then it's somehow my fault when it does.

12

adx931 May 9, 2026 +1

Every day brings news that makes me glad I retired when I did. I think the future is going to be a move back to slow, paper-based processes.

1

essjay24 May 8, 2026 +4

> Iranian missile strike

Don’t get me started. I was getting heat because EMEA couldn’t log into my app. I asked them to send me an email about it because I knew that it wasn’t me but login services hosted out of a blown up data center. Imagine their surprise when they couldn’t log in to email either.

4

Loud_Ninja2362 May 8, 2026 +1

That also requires execs who are willing to pay for the hard engineering required to do that proper design work.

1

reasonman May 9, 2026 +4

it's not even necessarily use1 that was the issue, it was a specific building in a specific az. a lot of customers deploy in single az and refuse to do multiaz or multiregion arch and get bit for it, even the biggest names do this c**** shit

4

schnurble May 9, 2026 +1

In this case yes it's a single AZ, sure. But if you look back the bigger outages have always been us-east-1. And yes customers need to have more regional redundancy. But still use1 is the common thread.

1

reasonman May 9, 2026 +1

well yes, it's the largest most populous region with every supported service and feature living there. there's a much larger surface area for things to go wrong. of course they shouldn't, but things happen and if one region accounts for the vast majority of your load then naturally most of the impacts will be felt there.

1

drkspace2 May 9, 2026 +1

Even if you try your best to avoid us-east-1, you still need it for at least IAM and Route 53.

1

Acceptable_Bat379 May 8, 2026 +43

I work in the field, the entire internet is held together by duct tape and dreams

43

kiss_my_what May 8, 2026 +15

BGP, DNS and a whole lot of whisky.

15

seriousnotshirley May 8, 2026 +8

Can confirm, worked on DNS and BGP teams at a company that had an awesome whiskey cabinet.

8

oldfogey12345 May 8, 2026 +41

It's people not paying for redundancy. Oversimplifying a bit, but... People who are affected by a heat issue for one building only paid Amazon to house their digital stuff in that one building. If your service being down for any amount of time will cost your business enough money, then it's worth the extra money to pay Amazon to have a "copy" of your website in more than one building.

41

Bovronius May 8, 2026 +22

If it was all a matter of personal responsibility that'd be great... but even though in my decades I've never put anything on us-east, when it's down our company is pretty much paralyzed. Both sales tax calculation companies we can use for our software go dark when us-east goes down, half our vendor portals go down... Our HR software... our bank's site... The move to cloud has put us all in a shared risk pool, and unfortunately everything is becoming so interconnected and dependent that when any of the big 3 have problems everyone is going to feel it.

22

Think_Positively May 8, 2026 +11

It's also only early May. I live in New England so I'm no VA weather expert, but I also don't need to be to understand that it's going to get a LOT hotter in the coming weeks and months. AWS should be counted as a utility at this point, and they have regulations in place to account for stuff like this (unless you're in TX).

11

Weaver270 May 8, 2026 +7

Redundancies are for government and private companies who don't have to meet quarterly numbers and... insert other excuses here...

7

Dabaer77 May 8, 2026 +4

"Efficiency" as understood by an MBA is eliminating any kind of back up or redundancy and hoping things just never break. Then when they end up breaking a different group of MBAs get to say no one could have foreseen anything ever breaking.

4

sylbug May 8, 2026 +6

It wasn’t supposed to be like that. The strength in a distributed network is that you can route around damage. We took a robust, distributed system and did a capitalism on it.

6

no_dice May 8, 2026 +1

This was literally one AZ in a region with 6 of them. Anyone who experienced an outage as a result implemented something that goes against best practices.

1

oneseason2000 May 8, 2026 +7

Maybe more like how fragile it is when a few people can make unilateral decisions impacting tens of millions. The wild part is how the unilateral bit comes about, imo.

7

shinjikun10 May 8, 2026 +19

Back when the internet first started there was a man who helped design TCP/IP. He said in a meeting in congress that he could take down the entire internet himself if he wanted. I can't remember his name.

19

Single_9_uptime May 8, 2026 +27

Probably in reference to BGP. Anyone with access to core internet routers from tier one providers could cause havoc. But there are a lot more controls on that today than there were in the early days of the internet, and the network is far more disparate to the extent it isn’t possible for one person to take down the entire internet. Limited things still get broken occasionally from bunk advertisements leaking out, but it’s very limited who’s capable of doing so. If it were that easy to take down the entire internet in remotely modern times it would have happened already.

27

hitbythebus May 8, 2026 +2

Some folk would take your comment as a challenge.

2

shinjikun10 May 8, 2026 -2

It could have been Yakov Rekhter or maybe Vint Cerf. I can't remember.

-2

ThoughtsOfALayman May 8, 2026 +8

Are you referring to L0pht, maybe? It was a group, rather than one man, but they made that claim before congress.

8

diogenes-shadow May 8, 2026 +9

Each AWS zone has at least three data centers working together. Any one of the three buildings can go down and the others should be able to keep things running most of the time. The internet and modern services are very fault tolerant. They have outages but you mostly never hear about it for this reason.

9

PNW_ModTraveler May 8, 2026 +3

Both statements are false. I don’t support data centers but if you want to cry wolf… It’s closer to 32% and “taking it all down” is just a sensationalist take.

3

[deleted] May 8, 2026 -3

[removed]

-3

PaidUSA May 8, 2026 +3

This is worse than any slop post by far. "Regulate my free expression" is so much more detrimental than slop.

3

Justin__D May 8, 2026 +3

Right? The internet used to be a much more open forum. Now it's censored to hell and back, as proven by shit like use of words like "grape" and "unalive." That's a core part of the *problem* with the modern internet, yet we have people wanting more of that?

3

[deleted] May 8, 2026 +1

[removed]

1

PaidUSA May 8, 2026 +1

Block most of the c*** people post. Don’t f****** backtrack now. That’s literally censorship you’re calling for.

1

PNW_ModTraveler May 8, 2026 -3

So become a police state like China but worse!? 😂

-3

mineyCrafta25 May 8, 2026 +1

The "cloud" infrastructure at that

1

Every-Development398 May 8, 2026 +1

AWS is not one region but many this will impact some but not all by any means.

1

weasel5134 May 8, 2026 +1

Infrastructure is so much worse than you know. Just in general

1

LittleKitty235 May 10, 2026 +1

About 6-8% of US bridges are officially rated as poor condition or structurally deficient. The power grid in both Texas and California is held together with hopes and prayers. If you've been paying attention you know exactly how bad it is now, and after Trump DOGE efforts expect it to get worse

1

weasel5134 May 10, 2026 +1

I have first hand horror stories I worked underneath (not on just physically below) a bridge so bad I was scared to drive my truck back over again

1

Hedhunta May 8, 2026 +1

Even the nice-looking data centers are filled with kluge fixes. In my experience, it's amazing anything works at all.

1

JuicedRacingTwitch May 8, 2026 +1

If your ops are that critical you should plan for multi cloud failover and redundancy. This is a budget and scope issue.
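
A minimal Python sketch of the failover idea, assuming a caller-supplied health check. The endpoint names and the `is_healthy` callback are hypothetical, and a real setup would more likely lean on DNS or load-balancer health checks than on client-side logic like this.

```python
from typing import Callable, Iterable, Optional


def pick_healthy(endpoints: Iterable[str],
                 is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint, in preference order, that passes the check.

    A health check that raises is treated the same as an unhealthy endpoint.
    Returns None when nothing is healthy.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            continue  # unreachable or erroring endpoint: try the next one
    return None


if __name__ == "__main__":
    # Hypothetical endpoints on two providers, listed in preference order.
    preferred = ["primary.cloud-a.example", "standby.cloud-b.example"]
    down = {"primary.cloud-a.example"}  # simulate the primary being out
    print(pick_healthy(preferred, lambda e: e not in down))
```

The hard part is not this selection logic but keeping data and configuration in the standby current enough that switching to it actually works.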

1

Bornee35 May 9, 2026 +1

Those who came up with the original principle of a resilient, decentralized network for sharing information are probably rolling in their graves right now

1

DisillusionedPatriot May 9, 2026 +1

Even more wild is the lack of urgency to update said infrastructure.

1

Gzngahr May 10, 2026 +1

This is a weak point in the supposed AI job stealing apocalypse and why they would love to put data centers on the moon or in orbit. Lay off too many people with little prospect of finding alternative work to continue affording their lives, someone is bound to attack the infrastructure. You don’t even have to destroy it, just sabotage the cooling systems or power supply.

1

haklor May 10, 2026 +1

The major cloud providers give architecture guidance to companies to ensure that services are not impacted if a single data center or region is impacted. It is on the various companies to do the cost/benefit analysis on whether they want to pay for the availability. Some companies refuse to pay until they get impacted by a small outage or degradation.

1

DukeandKate May 8, 2026 +83

Coinbase impacted - good.

83

livenn May 8, 2026 +57

Didn’t know those housed toilet paper

57

im-ba May 8, 2026 +13

I understood that reference 🔥🧻🔥

13

broke_boi1 May 8, 2026 +16

Took an AWS Cloud Architecting course a few months ago. One of the things they hammer is to deploy and have backups in different availability zones so shit like this doesn’t happen
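
As a rough illustration of why spreading across availability zones matters, here is a small Python sketch. The zone names are placeholders and the round-robin placement is an assumption for the example, not how any AWS service actually schedules capacity.

```python
from itertools import cycle


def spread_replicas(replicas: int, zones: list[str]) -> dict[str, int]:
    """Place replicas round-robin across zones; returns replica count per zone."""
    placement = {zone: 0 for zone in zones}
    for _, zone in zip(range(replicas), cycle(zones)):
        placement[zone] += 1
    return placement


def surviving_replicas(placement: dict[str, int], failed_zone: str) -> int:
    """Replicas still running after one whole zone is lost."""
    return sum(count for zone, count in placement.items() if zone != failed_zone)


if __name__ == "__main__":
    multi_az = spread_replicas(6, ["az-a", "az-b", "az-c"])  # placeholder names
    single_az = spread_replicas(6, ["az-a"])
    print("multi-AZ after losing az-a: ", surviving_replicas(multi_az, "az-a"))
    print("single-AZ after losing az-a:", surviving_replicas(single_az, "az-a"))
```

With six replicas over three zones, losing one zone leaves four running; with everything in one zone, losing it leaves none.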

16

adx931 May 9, 2026 +3

Unfortunately, Amazon won't pay for the courses for their own people so they don't know that on the infrastructure side.

3

waidee70 May 9, 2026 +2

More like Coinbase didn’t know basic redundancy if one AZ caused actual issues for them

2

Sirwired May 9, 2026 +1

Any Amazon employee is eligible for the internal training.

1

fountain20 May 8, 2026 +27

Can we start doing this in real time. A year and a half has passed. Little late to fix the problem.

27

Magic_Neil May 8, 2026 +33

Well that explains why a half dozen of my servers went down a couple hours ago!

33

karateninjazombie May 9, 2026 +4

Meh. Non emergency. Massive f*** up in planning though. Someone will lose their job as a result.

4

couchjitsu May 9, 2026 +3

Guess coinbase will have to axe another 14% of their workforce

3

RiversSecondWife May 8, 2026 +10

We have a bunch of fire here in Florida. You want some for that data center?

10

czs5056 May 8, 2026 +3

Better suck ALL the water in the entire state then to cool it off. /s

3

thepianoman456 May 8, 2026 +4

Let’s just fuckin scrap AI. It has a couple legitimate uses, but for all the AI slop the people generate for memes, or “creating art / music” (by stealing other people’s legitimately created art and music) we need more data centers. If we all just refuse to use garbage ass generative AI, there won’t be a need for more data centers… at least, the absurd amount that the tech billionaires want to build.

4

Software_Quiet May 8, 2026 +3

as the kids say, let them cook!

3

Iconic254 May 8, 2026 +3

The disruption was caused by overheating at a data center, which subsequently triggered a power loss that affected specific hardware.

3

Mrjlawrence May 8, 2026 +3

I’m sure this won’t result in tech bros clamoring for more data centers /s

3

Pardot42 May 8, 2026 +2

I'll bet there will be many data centers catastrophically overheating in the next few years

2

karer3is May 10, 2026 +1

We can only hope... although I can imagine the crypto and AI bro meltdowns that ensue will be even more intense than those of the data centers

1

olearyboy May 8, 2026 +1

Wasn’t even that hot yesterday

1

ReedForman May 8, 2026 +1

Come into one of the warehouses.. Amazon been going c**** on their AC bills lately

1

MrBahhum May 8, 2026 -2

They are overheating because they are poorly managed. All data centers are resource sinks. They need to disclose how much resources they use.

-2

i_am_voldemort May 8, 2026 -3

A communication disruption can mean only one thing: invasion. For those who did not catch the reference: https://youtu.be/eF4Hcr7XX3c

-3

secretqwerty10 May 8, 2026 -3

or, hear me out: the cooling failed, like it says in the article

-3

i_am_voldemort May 8, 2026 +4

It's a quote from The Phantom Menace you twit.

4

secretqwerty10 May 8, 2026 -2

ooo i'm sorry i don't remember a forgettable quote from a movie that's 3 years older than i am

-2

geekgirl114 May 9, 2026

Is it one data center or the whole us-east-1 region? There are about 10 zones in the region

0

AWS says data center overheating in North Virginia disrupts services; Coinbase impacted

104 Comments