
Tell HN: I DDoSed myself using CloudFront and Lambda Edge and got a $4.5k bill

 1 year ago
source link: https://news.ycombinator.com/item?id=31907374

175 points by huksley 5 hours ago | 214 comments
I am using awesome NextJS and serverless-nextjs and deploy my app to CloudFront and Lambda@Edge.

I made a mistake and accidentally created a serverless function that called itself. In a recursive loop, with a 30s timeout. I thought I fixed it and deployed the code to the dev environment.
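One defensive pattern against exactly this failure mode (not something serverless-nextjs provides; the header name below is an invented convention) is to thread a depth counter through any request the function makes back to itself and fail fast past a limit:

```python
# Hypothetical guard against accidental self-invocation loops.
# "x-recursion-depth" is an invented header convention, not an AWS feature.

MAX_DEPTH = 2

def handler(event, context):
    headers = event.get("headers", {}) or {}
    depth = int(headers.get("x-recursion-depth", "0"))
    if depth >= MAX_DEPTH:
        # Fail fast instead of looping until the 30s timeout.
        return {"statusCode": 508, "body": "recursion detected"}
    # ... normal work; any call this function makes to its own URL
    # should forward {"x-recursion-depth": str(depth + 1)}
    return {"statusCode": 200, "body": "ok"}
```

A loop that would otherwise burn 30 seconds per hop dies on the second hop instead.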

I have had an AWS Billing alert (Budgets) set up to prompt me when my monthly budget goes over $300 (my usual bill is $200/month).

Imagine the terror when I woke up the next day to an AWS Billing alert email saying I already owed $1,484! I removed the function and redeployed within 30 minutes, but it was too late. It had already run for 24 hours, consuming over 70 million GB-seconds!
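For scale, a back-of-envelope check on those 70 million GB-seconds, assuming the published Lambda@Edge compute rate of $0.00005001 per GB-second (verify against current AWS pricing; request and CloudFront charges come on top):

```python
# Rough cost of 70M GB-seconds at the Lambda@Edge compute rate
# (about 3x the regular Lambda rate; check current pricing).
EDGE_RATE_PER_GB_SECOND = 0.00005001

gb_seconds = 70_000_000
compute_cost = gb_seconds * EDGE_RATE_PER_GB_SECOND
print(round(compute_cost, 2))  # about $3500 before request and transfer charges
```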

Only after that did I learn that AWS Billing alerts do not work this way for CloudFront. Charge information is delayed because it is collected from all regions.

On the following day, the bill settled at a shocking $4,600. This is more than we had ever spent on AWS in total.

CloudFront includes the AWS Shield Standard feature, but somehow, it was not activated for this case (Lambda@Edge calling itself via CloudFront).

Now I understand that I should have created CloudWatch alarms to alert me when the number of requests exceeds a limit. The problem is that they need to be set up per region, and I got CloudFront charges from all points of presence.
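Worth noting: CloudFront actually reports its metrics only in us-east-1 (with a `Region: Global` dimension), so a single alarm there covers all points of presence. A sketch of such an alarm via boto3; the distribution id, threshold, and SNS topic ARN are placeholders:

```python
# Sketch of a request-spike alarm for CloudFront. CloudFront metrics
# live in us-east-1 regardless of edge location, so one alarm suffices.
ALARM_PARAMS = {
    "AlarmName": "cloudfront-request-spike",
    "Namespace": "AWS/CloudFront",
    "MetricName": "Requests",
    "Dimensions": [
        {"Name": "DistributionId", "Value": "EDFDVBD6EXAMPLE"},  # placeholder
        {"Name": "Region", "Value": "Global"},
    ],
    "Statistic": "Sum",
    "Period": 300,                # 5-minute windows
    "EvaluationPeriods": 1,
    "Threshold": 100_000,         # requests per window; tune to your traffic
    "ComparisonOperator": "GreaterThanThreshold",
    "AlarmActions": ["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder
}

def create_alarm():
    import boto3  # deferred so the sketch can be read without AWS access
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**ALARM_PARAMS)
```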

I am a big proponent of the serverless approach. It makes it easy to scale and develop things (e.g., you get PR review version branches for free, both frontend and backend code like Vercel does). But now, I am unsure because such unexpected charges can ruin a side-project or emerging startup.

Now I am waiting on a response from AWS Support on these charges; maybe they can help me waive part of that.

What is your experience with it? Would you recommend it for building a new product as a bootstrapped, 3-person startup?

I'm very much on the boring technology side of things with respect to hosting.

€40/month gets you a very powerful dedicated server that can easily handle millions of requests per day, performs incredibly well, and can be managed easily.

If you also use containers you even get quite a bit of flexibility and agility.

To be honest, I don't really understand the sentiment that developers can get away with not knowing basic sysadmin stuff while at the same time having to spend significant time, energy and money getting up to speed with cloud solutions, k8s and so on.

But then again, I'm not one of the cool kids...

Servers can also be a liability. You need to document, implement and maintain hardening; have a process for regularly patching the OS and apps; monitor logs; have backup and disaster recovery procedures and regularly test them; figure out how to implement data encryption at rest; implement high availability; and so on.

Good platform-as-service can solve many things for you and let you focus on the core thing you are providing.

Obviously not everybody needs to worry so much about the stuff mentioned above. But if you are providing a SaaS solution, there's a good chance some customer will start asking these questions as part of their procurement process.

I'd rather worry about (and fix) those technical problems than have to deal with a possible billing-pocalypse.
An occasional $4K charge is negligible for a business (almost everyone is better off dealing with this sort of overrun than troubleshooting servers, all else equal), and anyway this is a particularity of AWS CloudFront's billing protections rather than a fundamental flaw with serverless.
Take, for example, OS and app upgrades. Quite often you have a few requirements, for example: 1) all updates should first be tested in a test environment; 2) updates need to be installed in a timely manner; 3) critical updates need to be installed quickly (30 days is too slow).

When you start thinking these through, they are not so easy. (1) means you can't just run "apt upgrade" on every server; you need to manage the updates so that they get tested first. (2) is kind of OK, but requires some work on a regular basis (at least checking things). (3) means you need to monitor the updates for your stack and classify them. You can get feeds for Ubuntu, for example, but does that cover the whole stack? And checking these weekly (or daily) actually gets boring.

All this stuff can be done, but IMHO it is time consuming and takes time from more important things. The weekly/monthly JIRA tickets for checking x, y and z get quite annoying when you also have n other things to finish. Then you start slacking on them and feel the pain when trying to collect evidence for the next procurement process check.

If you have tens or hundreds of servers and can have a separate infra team with professional sysadmins then this is all fine. My rant is mainly for small teams, where same guys are supposed to develop and run things.

Small teams should just configure apt unattended upgrades and call it a day. I do this with my personal server and haven't had any issues for years.
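A rough sketch of that setup on Debian/Ubuntu (package and file paths are the stock ones; which origins you allow is a policy choice, and security-only is the default):

```shell
# Install unattended-upgrades (Debian/Ubuntu).
sudo apt-get install -y unattended-upgrades

# Enable the daily update/upgrade runs.
printf '%s\n' \
  'APT::Periodic::Update-Package-Lists "1";' \
  'APT::Periodic::Unattended-Upgrade "1";' \
  | sudo tee /etc/apt/apt.conf.d/20auto-upgrades

# Dry run to see which updates would be applied.
sudo unattended-upgrade --dry-run --debug
```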
Also the cost of those services vs. a few well-configured basic VPS hosts is nuts. I was shocked to learn that to get MySQL/Postgres in GCP it's often >$40/month - even for the insanely tiny instance sizes. We're talking for like 20G storage...

In contrast, I've run some pretty hot-and-heavy MySQL/Postgres databases on basic DO/Linode VPSs for half that cost with much success. I get that these cloud tools give a ton of "out of the box" features, but you are paying for it at the end of the day.

Anecdotally I've noticed a huge shift away from general devs/engineers having general CLI/Linux/sysadmin knowledge.

If you had no knowledge of standing up a DB server (installing, configuring, monitoring, logging, access control, backups, restricting network access), how much time would that take you to learn? And your VPS itself requires setup for access control, patches, updates, monitoring and logging.
Point taken, upvote given, but for a lot of people standing up a standard redundant DB config for MySQL and Postgres isn't that big of a deal.

I say this as an engineer who has a ton of relational DB experience. Before cloud providers w/DB options were so plentiful devs/engineers/sysadmins would have to set this up by hand.

> If you had no knowledge on standing up a DB server

My point is, "but yea what if you do?" There's actual value in that I've found, especially when I'm looking to save pennies at the early-stage of a startup while in the customer aq phase. Many folks have expertise in those areas without needing a cloud provider.

I do get the "but all of these things!" argument (access control, patches, updates, monitoring, logging) but often those are easily solved/solvable problems even without leveraging cloud offerings. I can get very far along with an ELK/TIG stack + uptime robot and basic VPS networking features... all for a very affordable monthly bill on the infra.

you can get servers with managed hosting, still at a fixed rate each month
I ran a website with 37 million users and 3.6 Gbps peak bandwidth (JavaScript + thumbnails, no video) from my own two racks of Linux servers that I had not systematically updated for years. The OSes were beyond their LTS support windows. I manually compiled my own updates, but very rarely and only those I deemed critical. Granted, the site stack was completely custom, so the standard automated hacks didn't work. In 15 years there were no incidents.
I don't think this should be standard. It sounds like you're saying you ran a service with 37 million people's information on a software stack so old that not even the vendor supports it anymore, which could be riddled with security issues you wouldn't know about, much less be able to detect. It may work, but it's certainly not going to get any security certifications this way.
Not to mention that a successful compromise could be used to serve malware to the users.
I hardly stored any PII. Also, this was a high-profile site that you know: if it was hacked, they would probably have tried to deny access and extort us; it would have made a lot of monetary sense. We would also have lost our merchant accounts very quickly (although CC numbers were not stored, only MD5s; I suppose they could have been captured from the application's memory after TLS decryption but before MD5 hashing, although this would have been a difficult task even on a rooted server). From observing and recording loads, traffic, logs and other parameters of all servers, I am 99.999% confident there were no intrusions. I understand that there's a market and an appeal for bureaucracy and certifications, but I have my own data.
Thankfully, the servers were not in Europe! I said "hardly". It's genuinely amusing how people downvote facts, simply because they don't fit the zeitgeist.
With unattended-upgrades you might get away with such behavior. Also, some web servers have a great track record. Then it's a question of whether your other services are secure, and your JS/CSS.
Nginx and apache (upgraded from 2.2 to 2.4 once)
I'm a fan of boring technology too, but I would like to suggest to you that Serverless _is_ kind of boring.

Essentially you just upload a ZIP of your application, and register a handler function that takes a JSON payload.
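For illustration, a complete handler in Python is about this much code (the event shape here is a plain invocation payload, not any particular trigger's format):

```python
# A complete AWS Lambda handler: the entire "server" is one function
# that takes a JSON-like dict and returns one.
import json

def handler(event, context):
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),
    }
```

You zip this up, point Lambda at `handler`, and there is nothing else to configure at the application layer.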

Obviously this is quite a bit more boring than a K8s cluster, with a bunch of nodes, networking, Helm charts, etc.

I would posit that even compared to something like a DO Droplet, Serverless is still kind of boring. Everything is going to fit the model of registering a handler function to accept a JSON payload. There's no debate about whether we're going to have Nginx, or which WSGI runtime we're using. It's just a function.

And with Serverless, your cost for doing a couple million, 2-second-long requests is about four cents.

The challenge with serverless is building systems that rely on more complex backend processes and existing code, and doing things like testing against most of an existing codebase. Serverless is great for nodejs/javascript stacks that are database- and front-end-heavy but don't need queueing, event streaming, or more complex architectures. Beyond that, serverless normally becomes a huge mess, and the developer experience becomes a giant catastrophe as well.

Here the OP kind of got caught by the terrible DX that is almost natural to serverless IMO.

> your cost for doing a couple million, 2-second-long requests is about four cents.

This seems wrong to me? Can you explain a bit more? Are these just API requests, or?

It's admittedly a simplification and a best case. For AWS Lambda, the price is in GB-seconds, and the amount of CPU available to your function is itself a function of the memory allocated.

The price is $0.0000166667 for every GB-second on x86, so it looks like I've also misplaced a decimal. (Requests themselves are billed separately, at $0.20 per 1M, regardless of memory.)

Lambdas can be sized from 128MB to 10GB, and pricing depends on the resources allotted.

Ultimately, this pricing just represents the compute time for a function to complete. That function can do whatever it wants.

There are additional costs to put an API Gateway in front of it, for instance, to make it a publicly accessible API.

Also S3 costs for storing the Lambda function code, its dependencies, and that function's past versions.
I'm not day to day on cloud stuff, haven't been in a while, that's why I'm asking this, not intended to be passive-aggressive:

So if I had a "hello world" function that basically just returned a constant JSON payload... I'm seriously looking at a pittance, pennies for millions of requests? I.e., $0.20?

Reading some replies here, it’s no wonder some startups go bankrupt while investing so much in infrastructure — all while not having enough users to justify more than a single dedicated server.

Scalability should be the last thing in mind if your product sucks.

Amen.

I've consulted for companies that went all in on AWS Lambda + AWS SQS, only to transition them to an EC2 instance that performs the same computations at a fraction of the cost.

No, you do not need K8s on day one.

> EC2 instance that performs the same computations at a fraction of the cost

And typically with much lower latency/request times.

GCP functions, AWS Lambdas, etc... my anecdotal experience is that they are way slower for request times vs a $20 VPS running the exact same workload.

> To be honest I don't really understand the sentiment that developers can get away with not knowing basic sysadmin stuff and at the same time have to spend relevant amounts of time, energy and money to get up to speed with cloud solutions, k8s and so on.

There was a thread here the other day on how DevOps has failed. And this should hopefully show everyone why DevOps is needed. Cloud infrastructure is complex and needs specialists, otherwise simple mistakes can be very costly. If I make a new lambda, you better believe I'm watching invocations.

Or does it? The whole point of GP is that you don't need the complexity that would require devops staff to manage if you are not serving > 10s of millions of reqs in huge bursts with long stretches of dead time between.

I don't know, I'm curious about the actual requirements vs the marketing buy-in and resume building that may happen when designing the system.

This. The company I work for uses AWS, and that makes sense for the level they scale at; I'm sure they work out a deal with Amazon.

I have a few dedicated servers at a monthly cost of ~$100 and have never run into any problems. For a brief period I needed to pay Cloudflare for some DNS management that was getting a bit heavy, but even that was because I didn't know enough about how to optimize.

For any personal projects/POCs, and even startups launching to fewer than 100k daily users (traffic and load type dependent, of course), hold off on AWS.

I also don't like the Lambda architecture in AWS, as it seems to lock you in; you're almost tied to that infrastructure for good. Yes, you can rework it, but that's tech debt that may not be feasible to pay down.

Love AWS but only as an enterprise solution.

If you're putting your service behind CloudFront, you're probably aiming for DDOS protection and low latency or something that you can't easily get from a VPS. Of course, you can put your VPS behind CloudFront, but you would run into the same issue (albeit the fixed capacity of your server would effectively cap your bill).

A single VPS is great if you don't need reliability or scale, but if you care about those things you probably need at least a few of them running behind load balancers, and you probably don't want humans manually poking at them (but rather you want some reproducibility). Moreover, there's a bunch of host management stuff you're taking upon yourself--configuring metrics collection and log collection and SSH and certs and stateful backups/replication/failover and application deployment and process management and a whole bunch of other things that come for free with serverless.

Implying that serverless/Kubernetes/whatever is just hype ("cool kids") while ignoring the pretty significant discrepancies in requirements is pretty silly.

CloudFront is terrible for DDoS protection though, given that there is per-request pricing (and no, AWS WAF doesn't help, as it just imposes additional pricing on top; the only thing it does is prevent CloudFront from processing those requests).

However, if your statement is generally about CDNs such as Cloudflare or Bunny.net that don't have per request pricing, it makes sense.

Standing up a service isn't only about compute. Where is your logging, metrics, alerting? Even with a server, you need methods for access control, hardening, log rotation, updates/patches (what happens when the OS goes out of support?).
Yeah, but how will you scale up your multi-dozen-users-at-the-same-time app easily then?
Scaling is a hard problem. I've worked at places where we used a whole network of auto-scaleable services and guess what - you will still have problems. Each managed service has tradeoffs, often ones that you only encounter once you'd made a substantial commitment to that service. There is no free lunch and you're fooling yourself if you think there is one.

Often you get lucky and the cluster of managed services you select happen to scale along the metrics of your resource use. Many people's resource use patterns are similar and the managed service people take advantage of that. This is nice! But it's a trade for downside risk: you may find that your resource use patterns differ from the 90% case and your spend goes up very fast or your scaling hits a wall.

In my experience a lot of designing a service architecture is picking where you want your complexity. Services (in or out of containers) running on VMs have a simple billing and architectural model. In my experience they form a good basis to organize your other resources around and are a good foundation to grow from.

I think this is a joke people missed. Surely "scaling to multi-dozen users at the same time" was obviously sarcastic, yet people started talking about scaling seriously anyway.

If your app has multi-dozen users and needs to "scale", something is wrong.

How do you scale your cloud app?

Usually, the hard part of scaling isn't raw compute power, it's scaling your datastore after you've already exhausted the option of throwing more compute power at it, and this problem remains whether you're on the cloud or not.

Until you hit that problem however, throwing more hardware at it is the right solution (you may be surprised just how much load a single Postgres server on bare-metal can handle).

I agree with this in general, but would caveat that AWS etc. have made throwing more hardware at the datastore solution a lot easier. You're right that a bare-metal Postgres monster can serve a lot--but Aurora Serverless V2, if you can live with its (pretty mild IMO) quirks and if you can pay for it, is a profoundly hard-to-argue-with offering.
Buy more servers, loadbalance, and don't take the short painful route architecture-wise?

It isn't really that hard.

Can't wait to end my next system design with "just buy more servers. scaling isn't really that hard."
I've done exactly this in many meetings. It's a balance between infra costs and dev time optimizing spend. Typically ping pongs back and forth.
There is a point where that becomes prohibitively expensive.

At a former startup where I worked as senior engineer, that was our original approach. Then one weekend we tripled our userbase and horizontally scaling required massive changes in database architecture, sharding solutions, etc.

"Just buy more servers and load balance" is the short painful route. Carefully planning out and taking advantage of scalable architecture, which can be provided to you less expensively because it, too, runs at scale, is the hard method. The fact that it's easy to shoot yourself in the foot with it doesn't make it the easy route.

You're going to understand where the cool kids come from once your single dedicated server goes down or can't handle the load any more. As soon as you try to scale horizontally or become highly available and start to think about how to do it you end up falling into the same rabbit hole.

> If you also use containers you even get quite a bit of flexibility and agility.

Yeah... and then the only difference is between a single host and multiple hosts. Guess which one Kubernetes is for?

With all due respect and no offense intended, your perspective sounds a lot like "I've never attempted to scale so I can't understand the problems"

> can get away with not knowing basic sysadmin stuff

I promise you that the knowledge you need for your single dedicated server is also needed for k8s clusters, and I definitely can't imagine anyone who can maintain a k8s cluster but can't maintain a single Linux host. It's more that scaling horizontally comes with exponentially more difficult problems than what you're used to or have heard of.

Obviously the billing is an issue, but that doesn't negate the whole concept. They should definitely implement better expense controls. A simple opt-in hard cap would fix this for good.

> With all due respect and no offense intended, your perspective sounds a lot like "I've never attempted to scale so I can't understand the problems"

With all due respect, I don't think you've ever actually put together a local cluster.

A simple 4 machine k8s cluster sitting literally on dirt in my basement can scale out to the equivalent of thousands of dollars of AWS spend a month. I broke even on the initial purchase outlay for my workloads in less than a year.

The problems are almost never scaling the web servers. The problem is scaling the infrastructure that those servers need to be useful.

Generally - your DB is the first pain point, your network is the next.

If you can run it in a container, that service probably isn't the bottleneck for scaling, it's going to be whatever is providing the persistent disk for that service, and the network between the two.

Both of those things happen to also be fairly expensive to scale in the cloud as well.

> With all due respect, I don't think you've ever actually put together a local cluster.

I have, and still do. It's neither redundant nor highly available. The power source isn't, the internet connection isn't and it's also located in my basement and not multiple regions.

> The problems are almost never scaling the web servers. The problem is scaling the infrastructure that those servers need to be useful.

There you go..

> Both of those things happen to also be fairly expensive to scale in the cloud as well.

No. [0]

[0]: https://i.ibb.co/TKmB9HX/image.png

> It's neither redundant nor highly available. The power source isn't, the internet connection isn't and it's also located in my basement and not multiple regions.

For a lot of projects this doesn’t matter. The cost of downtime might be lower than the cost of high availability and occasional downtime is OK.

With regards to your image... so what? Virtual networking equipment is virtual - I too can spin up hundreds of subnets for nothing.

My internal transfer costs are also.... drumroll... zero. (ok - technically, at some point I bought a 10gbs switch and 500ft of ethernet cable)

Let's talk about what it costs you to put data back out onto the net, or into a different region. Then let's compare long-term storage costs.

Because that shit is only cheap in the "hobbyist" range. Outside of that, outbound data and storage are fucking expensive in the cloud.

> I too can spin up hundreds of subnets for nothing.

Really? In completely different physical locations, with inter-region routing and NAT gateways for each region routing into a redundant load balancer? Then you're right and don't need AWS any more. I suspect your costs would not be very affordable either, however.

> Lets talk about what it costs you put data back out onto the net, or into a different region.

If you're seriously trying to compare your residential internet connection with something like this I'm not going to go there..

> Then lets compare long term storage costs.

Same logic applies.

For 99% of projects you're never going to hit the point where a single server (or group of servers, if you truly need redundancy for some level of uptime) can't handle the load. For the other 1% that end up needing that scale, I have a hard time accepting it's actually better to start building for massive scale on day 1 instead of day 1000.
This is such a narrow perspective. It's not always about load; most applications can benefit from some level of redundancy, and using AWS etc. doesn't mean a massive scaling operation is forced upon you. In fact, the other options would. I'll give you some common examples.

You have a script that needs to run, without fail, at a certain time every day.

You have an endpoint that is mission-critical and even a second of downtime would cause insane manual workload.

You have an endpoint that needs to have as little end-user latency as physically possible.

These are all real examples from my work in fintech. The solution to all of them is a few dollars in cloud functions on Lambda and Lambda@Edge. Now imagine having to provision, orchestrate and maintain dozens of dedicated hosts all by yourself, simply for this handful of scripts. The expense of employee time alone would make that route absolutely idiotic, and if anything it would amount to massive scale on day 1, but even worse, because you didn't even plan for it beforehand.

I work in fintech (banking and payments), and most of our clients aren't even allowed to host on AWS; they have to go with a local provider because AWS doesn't have a hosting hub in their country. Not sure what part you work in, but this has never been a problem in the past 20 years with just servers, switches, load balancers, etc.

I prefer AWS over metal for those kinds of setups, but for many other cases I definitely do not; just a dedi or a VPS with Docker or k8s and/or something like OpenFaaS is enough for almost all startups and beyond, making it literally impossible to make mistakes like OP's. And when needed, maybe add failover or load balancing.

I don’t know about that, the hedge fund I work for is US-based and AWS can be fully SEC and FINRA compliant.

We also have a few dedicated servers, but mostly only for infrequently accessed data and logging that doesn’t need to be highly available.

I really can’t understand why this argument keeps coming up. Different solutions for different usecases. Yet anytime Kubernetes or cloud functions are discussed people come in and go like “hurr durr my single Hetzner dedicated server can do all of that and doesn’t have these problems”

Because some people here do exactly the same thing with AWS/cloud. Not you, but many here treat AWS like it's the thing you should use "because scaling and failover and omg 1 sec downtime". And then there are stories like OP's showing that it is dangerous, and my experience is that most who do this are overpaying and could make do with a Hetzner server or, better yet, a $5/mo VPS, even if they do it right (which they generally are not, judging from all the setups I have seen).

I agree with you though on the case by case.

Obviously the billing mechanism (or rather, the lack of control) is an issue, but that doesn't negate the whole concept. They could (and should) easily fix this by implementing opt-in hard caps.
Pretty easy: OP sounds exactly like a guy who would have been perfectly fine with a single (Hetzner) server. Then these arguments pop up, and they are right most of the time. I agree with you that it always depends on the use case. However, the "hurr durr Hetzner server" seems to be the more reasonable choice here (once again).
Why? If you had the choice between complete redundancy and infinite scaling by default while having almost zero work, or using a dedicated server that you need to configure and constantly maintain, what would you choose?
If with one option a single mistake can result in a $5k bill, and with the other I have a guaranteed fixed bill of 50 bucks a month, then unless I have money to burn, the choice is crystal clear. Maybe you wipe your butt with $100 bills. Others aren't that lucky.
The choice is only crystal clear until AWS etc. implement opt-in hard caps. Then having some Lambda functions is not only exponentially cheaper than a whole dedicated server, and highly available and redundant by default, but also just as safe financially.
Exactly; if it were "the same" but with a guaranteed max of $50/mo, then sure, I would pick that, but that is not the case, not even with caps. So it is an entirely different proposition, and the $50 option usually gets you very far without any financial risk beyond that $50/mo.
Nobody says there isn't any use case for AWS. The point is that "the cool kids" like to start their side projects on AWS. Nothing about it is mission critical. I guarantee you op is not working on some Fintech stuff.
How's that bad? Isn't this how we all learned? By playing with cool modern technology?
If the solution you're building is business-critical then you already have a problem day one: you need high-availability. That means you need at least two servers. As soon as you add that additional server your troubles have begun. Doesn't really much matter whether you add 1 server to your setup or 100.

Your problems compound if you have to consider disaster recovery (DR) and need an offsite location you can failover to. Now you have to contend with data replication and all its concerns.

So even if your application only has a few dozen users things can get quite complex quite fast. AWS makes it simple, practically push-button. Deploying in multiple Availability Zones (AZ) is a no-brainer, and it isn't horribly difficult to deploy in multiple regions, complete with data replication. All while having no servers to configure and maintain. No containers to contend with, no Kubernetes to fight with. To me it's the most stupid-simple solution available today and it's ridiculously cheap! At least so long as you test before you deploy for world exposure!

I completely agree with you. I really suspect some people have never seen themselves how insanely, mind-bogglingly powerful AWS is.

Try doing this [0] with your dedicated servers in under 10 minutes.

And then try attaching highly-available scripts/cloud functions and dozens of different integrated functionality in seconds.

And then try setting up good Network ACL, firewalls, route tables, NAT gateways, load balancers with a few clicks.

... and people here are seriously suggesting that using AWS instead would mean a massive forced scaling operation. Lol.

[0]: https://i.ibb.co/TKmB9HX/image.png

I'm not trying to dismiss the value there is in having highly scalable cloud architectures available to the projects that need them, but a huge proportion of projects will never need them.

Most projects don't even actually suffer from modest downtime, although you can achieve AWS-comparable downtime even without AWS/Google/Azure-style "cloud" architecture.

It can make a huge difference in operating and engineering costs to know which bucket your project fits in. Good architectural foresight can even let you smoothly move from one to other if unexpected growth or a pivot indicate that it's warranted.

Engineering is about understanding the scope/scale of problems, not about being dogmatic or getting caught up in problems that don't apply.

I was thinking more of when the unpatched Ubuntu 12.04 is exploited via an SSH 0day and the server is used to host a Citibank phishing website.
Most people don't need to scale and even running 5x redundant servers is cheaper than a comparable cloud solution.
Running Lambda, I get a million calls per month for free. Then it's 20 cents per million calls.

Just curious - have you really researched cloud solutions or did you just compare the price of hosting EC2 instances in AWS vs having your own server? Because that's not what cloud is about.

> Running Lambda, I get a million calls per month for free. Then it's 20 cents per million calls.

And you are overpaying by a significant chunk. My raspberry pi - the old cheap one - can handle 10 million requests a day without breaking a sweat. If I push it, I can get up to around 90 million requests a day without too much effort (that's only about 1 request/ms).

I really don't think most devs understand how fucking cheap hardware that's comparable to these services is.

Now - you might be using a host of other valuable features that your cloud provider gives you (things like edge servers near your customers, or truly significant outbound network traffic flows, or a very robust multi-region setup, or disk backing of some sort, etc).

But generally speaking - you ARE overpaying for cpu cycles in the cloud. It's not really up for debate.

Of course you are paying more for AWS than for a damn raspberry pi.

You say this like it's obvious that AWS is better.

There are certainly cases where AWS can be better (ease of edge networks and multi-region availability come to mind)

But outside of a very small set of cases (most of which are when companies are victims of their own success - which is actually a wonderful problem to have) what's the compelling reason to actually use AWS if "Of course I am paying more" for it?

In my opinion, if you can still run your db on a single machine - you don't fucking need the cloud yet. That covers a pretty large chunk of businesses.

Most of the "cloud" is convincing business that could self-host their entire stack on a raspberry pi in a basement that they should be spending thousands on cloud compute costs a year.

Fuck - the most damning evidence is simply how much fucking money these companies are making from upselling you cpu cycles that they're getting mostly for free. For the 4th quarter of 2021, amazon reported a profit of 5.2 billion on ~18 billion in revenue from AWS.

> Just curious - have you really researched cloud solutions

(heads-up, the tone of that comment is a bit off.)

Even a very small $5/mo VPS can easily deliver 100 requests per second on any language, even languages that are not known for their performance.

Over the course of a month, assuming an even distribution, that's around 263 million requests, which would be $53 on Lambda per your pricing, or more than 10x as expensive.

This cost differential actually increases, both in relative and absolute terms as the number of requests increases.

The cost per request actually decreases dramatically on a more powerful hardware server, whereas Lambda's pricing stays the same.

This means that your price per request isn't just 10x higher than it should be, but it might be 100x or 1000x. That's a very substantial cost increase -- so substantial that it can be an existential threat to the survival of your app.

Lambda's great for highly dynamic, low-request rate applications, like an email form processor on a website. It's really a poor choice for anything that needs significant horsepower or lots of requests.
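The comparison above is easy to sanity-check with back-of-envelope arithmetic. Here is a sketch using the $0.20-per-million request charge quoted upthread (request fees only; it ignores Lambda's GB-second duration billing, which would make Lambda look even more expensive, and the prices and free tier may have changed since):

```python
# Request-charge-only comparison: a $5/mo VPS sustaining 100 req/s
# vs AWS Lambda at $0.20 per million requests after a 1M free tier.
# Figures come from the comment above; duration (GB-second) billing
# is ignored, which flatters Lambda here.

SECONDS_PER_MONTH = 30 * 24 * 3600     # 30-day month
REQ_PER_SEC = 100
VPS_MONTHLY_COST = 5.00                # USD

FREE_REQUESTS = 1_000_000
PRICE_PER_MILLION = 0.20               # USD per 1M requests

def lambda_request_cost(requests: int) -> float:
    """Monthly Lambda request charge after the free tier."""
    billable = max(0, requests - FREE_REQUESTS)
    return billable / 1_000_000 * PRICE_PER_MILLION

monthly_requests = REQ_PER_SEC * SECONDS_PER_MONTH
cost = lambda_request_cost(monthly_requests)
print(f"{monthly_requests:,} requests/month")              # ~259 million
print(f"Lambda request charge: ${cost:.2f}")               # ~$52
print(f"vs the VPS: {cost / VPS_MONTHLY_COST:.1f}x more")  # ~10x
```

The gap only widens as traffic grows, since the VPS price is flat while Lambda's request charge is strictly linear.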

So I'm actually in the process of doing Lambda versus persistent costing right now for a new project, where it's heavily load-based and very spiky, but the work on each packet of information is actually very lightweight. The tricky part here in AWS is not Lambda, which is pretty reasonable in general--the pitfalls I'm seeing are around data storage. DynamoDB is stealthily very expensive, either provisioned or on-demand. Right now I'm converging on lambdas for compute and RDS/S3 for storage, but what really makes AWS shine for this use case more than anything else is SQS.

SQS is so good, and the fact that there isn't a great, easily deployed alternative (sorry, RabbitMQ) for exactly what it does is a real push towards AWS. (GCP Cloud PubSub is close enough, but I know AWS way better than I do GCP.) If I didn't need this, and I felt confident managing RabbitMQ as a queueing solution, I don't think AWS would be a compelling solution, because $60 in Hetzner nodes, in an HA configuration, could do a lot of work.

(If this thing works I'll need an on-prem solution anyway, so I'll probably have to build that too--but that's "good problems to have".)

I think NATS is excellent, and very easy to run and deploy. It pretty much has the "just works" attribute I like to assign to boring technology.

NATS and NSQ are better than SQS if you can deploy them yourself. ZeroMQ is a great option as well, but it probably will require some major brain changes, since it doesn't fit most people's mental model of what a queuing server looks like.

DynamoDB is not required to use cloud functions. You can use regular RDS like you said, Aurora, or even just your own EC2-based cluster (all of which you can attach to the same VPC the functions are attached to), and there are a lot of nice developments going on, like Cloudflare D1.

Totally agree on SQS, knowing PubSub as well I'd say they're pretty much on the same level. All the interconnectedness is where cloud platforms shine

And? I don't think you are aware of how cheap metal is these days.

Plus the OP has already decided that free tiers don't work.

With almost any service, like hosting, where there could potentially be unknown costs, I always select services with fixed fees and pay annually via PayPal so no card info is stored.

If you have to pay by card and you have to subscribe, always use a disposable virtual card. Most of the time these companies won't pursue you for payment because it's simply not worth the effort. The most likely outcome is that they'll just suspend your account. What you don't want happening is them taking money from you before you're even aware of what is happening, because then you won't get it back and there's no room for negotiation.

the cloud means you get to write sexy code and call yourself an engineer

not the cloud means you write boring configuration files and call yourself a sysadmin

Powerful, yes, but what about getting good peering and routing?

Hetzner, OVH. Hell, if you go with Kimsufi (OVH's budget brand) you can get a dedicated server for a dozen bucks.

I think every developer has an AWS billing horror story.

My horror story is that my site allows users to upload videos and share them to a limited number of colleagues. When a user requests a video, a CloudFront URL is created that lasts a few hours.

I had not thought much about hotlinking because the link only lasts a few hours - what would be the point? Well, those few hours make a big difference when it’s linked on a high traffic website.

Turns out someone paid for the cheapest plan ($7) and uploaded two multi-GB files. They hotlinked them on a Vietnamese porn site and ran up charges of almost $10k.

I was alerted by Cost Anomaly Detector but it had already run up most of those charges (and the totals CAD listed were much smaller and made it seem like less of a problem, thus delaying my reaction). AWS, to their credit, waived the charges.

I had WAF already set up but it wasn’t very helpful for this type of thing. I could only block sites that I already knew about. I ended up going with a Lambda@Edge solution that validates the source site before allowing access.

Lessons learned:
1. Customers may abuse things in ways you didn’t predict.
2. Cost Anomaly Detector has a delay and only kicks in once charges have accrued. It can save you from an insane bill but won’t save you completely from large bills.
3. AWS can be reasonable about this but the ball is entirely in their court.
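A minimal sketch of what such a Lambda@Edge viewer-request validator might look like (Python runtime; the allowlisted domains are hypothetical, and Referer checking is only a speed bump since the header can be spoofed - CloudFront signed URLs/cookies remain the stronger control):

```python
# Hedged sketch of the mitigation described above: a Lambda@Edge
# viewer-request handler that rejects requests whose Referer is not
# on an allowlist. Domains here are invented for illustration.

ALLOWED_REFERER_PREFIXES = (
    "https://example.com/",
    "https://app.example.com/",
)

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})
    # CloudFront lowercases header names; each value is a list of
    # {"key": ..., "value": ...} dicts.
    referer = headers.get("referer", [{}])[0].get("value", "")

    if not referer.startswith(ALLOWED_REFERER_PREFIXES):
        # Returning a response object short-circuits the request
        # before it ever reaches the cache or the origin.
        return {
            "status": "403",
            "statusDescription": "Forbidden",
            "body": "Hotlinking is not allowed.",
        }
    return request  # pass the request through unchanged
```

Because it runs on viewer-request, the 403 is served from the edge and the hotlinked bytes never leave S3, which is exactly where the cost was accruing.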

How can two multi-GB files = $10k of data transfer in a few hours?

How popular is that vietnamese porn site!!

I once committed my private AWS keys to a public github repo. A bot scooped it up nearly instantly and spun up many, many ec2 instances that were (probably) mining bitcoins.

I received an automated email from Github telling me that I had committed a private key, but it came in the middle of the night.

In the morning, when I learned what had happened, my bill was over $3k.

I fixed the issue and emailed AWS asking for some relief, and they called me and let me know they were waiving all the charges.

So, perhaps you too can beg for mercy?

The difference between his situation and yours is that you didn't create the charges. Legally you're not liable for something someone does while impersonating you, even if you walked around with your private key on a t-shirt. They may or may not be nice to him but for you they didn't have a choice.

I don't think that's true? I mean sure, you might not legally be liable when someone impersonates you in the real world. But I'm absolutely certain the AWS terms say somewhere that you agree to take care of your creds and are liable for whatever is done with them, etc?

Both could be true. A contract can say anything, but it's going to be bound by the legal framework it operates in, and in this case I don't think there's much of a distinction between the digital and real world, except for physical resources not changing hands.

Hypothetically, the contract could say Jeff Bezos will come to your house and personally kill you, but there's no consensual murder in most places.

It doesn't matter what the terms say. The charges would be the result of a violation of 18 U.S.C. § 1030 - it's the digital equivalent of someone stealing your car and writing the title over to someone else. You're entitled to keep your car (or your money spent on AWS) regardless of the receiving party's expectation of claim to it, even if they incurred loss in the process.

Now, Amazon would be entirely within their rights to cancel your account and refuse to do business with you after this, but they would not have the right to collect that money from you, or to keep that money had it already been charged to you.

I mean, for all we know it could have been him mining the bitcoins, with committing the private key by accident being the cover up story.

The legal burden of proof for that lies on Amazon, not him, at least in the US.

> Legally you're not liable for something someone does while impersonating you

This unfortunately isn't true. It also sounds like he created an app key from his root account that enabled anyone to literally impersonate him.

A typical use case is to create a user that has only the specific rights that are needed and generate an app key for that user. For example, I have a user that can only read S3 buckets. If it were to leak, the worst that would happen is I would leak some encrypted backup data.

I want to offer two counterpoints to common sentiments here regarding AWS billing.

1. Don't be afraid of playing around with AWS (and even spending some money). AWS is really good at refunding you if you accidentally rack up a couple grand in surprise bills. Also even if you legitimately spin up big servers to try a kubernetes cluster for a couple of days, that $20 you spent is almost certainly great bang-for-buck for the benefit of learning that experience and getting your hands dirty with AWS.

2. AWS billing is actually really good for what it is. If you've ever run any non-trivial operational system (in the real world), you would know how hard it is to collate all expenses and get them tallied up. AWS collates all billing data with ~24h lag and you can slice and dice it to your heart's content. After all, it's a complicated distributed system that they've managed to build that doesn't slow down your services or otherwise get in the way!

With regard to (1), I feel like there are two different worlds (and given how non-transparent Amazon is, that's very believable).

I've never run up a huge AWS bill accidentally, but I personally know 2 people who have, and neither was refunded, even after asking. In both cases we are talking $400-$800, enough to really hurt someone, but not bankrupt them.

Interestingly, AWS won't refund service credits provided by an accelerator if you've accidentally blown through those instead.

Possession is 9/10ths of the law. If they have already been paid, they are not likely to refund you.

How the fuck do you blow through $150k at an early stage startup???

As someone who worked at an early stage video intelligence startup, surprisingly easy. Redshift + Elastic Transcode + CloudFront make it extremely easy to spend thousands and thousands of dollars per month.

> AWS is really good at refunding you if you accidentally rack up a couple grand in surprise bills.

I wouldn't bet my bank account on that always holding true. If it's not in the terms and conditions that they'll refund you for accidental mistakes that lead to high billing, then you're gambling if you assume they will.

Azure supports hard stops on services with billing maximums. It does mean that stuff gets turned off if you enable that. Then again, as an individual, that's a superb way to control costs.

And since Scamazon doesn't do that and INSTEAD "gives" you a 1 month unlimited credit, there's no telling just how stratospheric your bill can be.

> AWS is really good at refunding you if you accidentally rack up a couple grand in surprise bills.

If there were hard limits, there'd be no need to beg AWS support for leniency, which they can capriciously decide you don't deserve.

Completely true. However, if what you're doing is highly price sensitive and you're willing to accept downtime over ScaryBill, then this is the option you need. And it's completely (and I bet intentionally) not available on AWS. AWS's message is "bend over and we'll tell you how far and how long".

Larger companies see $4k as a nothing; pay and move on. Household budgets, not so much.

If there were hard limits, that would also mean that the billing system is on the critical path for all systems, and not just an after-the-fact ETL.

Not necessarily. A cloud provider could retroactively cap charges at the hard limits, but only cut access to resources asynchronously. That's effectively what happens when you complain now with AWS.

If AWS wanted to, they could absolutely implement a hard cap. It's not like letting some services run for a few hours until billing catches up costs them a lot of money.

What is true (not necessarily in order) is:

- I suspect AWS in aggregate makes a fair bit of money on overages that users end up eating but would have circuit-broken if they could have, and

- Even reasonably designed hard circuit breakers (e.g. we cut off access to your stateful data unless you pay your bill but we won't delete it for 30 days) are still giving developers a potentially well-hidden foot-gun for a production environment that management might not actually want.
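The "retroactive cap, asynchronous cutoff" idea floated in this subthread can be expressed as a toy model: charges keep arriving with a lag, an asynchronous sweep suspends (not deletes) resources once the cap is crossed, and anything billed past the cap is forgiven by the provider. All names here are invented for illustration; real billing pipelines are far messier.

```python
# Toy model of a hard billing cap enforced asynchronously.

from dataclasses import dataclass, field

@dataclass
class Account:
    hard_cap: float
    accrued: float = 0.0
    suspended: bool = False
    charges: list = field(default_factory=list)

def ingest_charge(acct: Account, amount: float) -> None:
    """Charges keep arriving even past the cap (billing lag)."""
    acct.charges.append(amount)
    acct.accrued += amount

def enforcement_sweep(acct: Account) -> float:
    """Async check: suspend the account and forgive any overage."""
    if acct.accrued > acct.hard_cap:
        acct.suspended = True
        overage = acct.accrued - acct.hard_cap
        acct.accrued = acct.hard_cap   # bill is retroactively capped
        return overage                 # amount the provider eats
    return 0.0

acct = Account(hard_cap=300.0)
for _ in range(5):
    ingest_charge(acct, 100.0)         # lagged charges pile up
forgiven = enforcement_sweep(acct)
print(acct.suspended, acct.accrued, forgiven)
```

The point of the sketch is that the billing system stays off the critical path: services run uninterrupted until the sweep fires, and the customer's worst case is bounded downtime instead of an unbounded bill.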

Just like in real life, if you run out of money you have to stop doing things.

> Just like in real life, if you run out of money you have to stop doing things

In real life when you hit your card's limit, your transactions get declined. Straight away.

I had this last month in a supermarket after my (personal) checking account didn't have enough money to cover my purchase, I'd completely forgotten to transfer money from my business account.

My bank wasn't prepared to let my account go overdrawn, not even by the equivalent of $20, which is absolutely their right.

Amazon, OTOH, benefits in lots of ways by not implementing this mechanism.

> In real life when you hit your card's limit, your transactions get declined. Straight away.

Except for when it doesn't. I've got two primary current accounts, one with a "legacy" bank in the UK and one with a modern bank. The legacy bank is happy to let me go into an unplanned overdraft, and charge me for the privilege of doing so.

> a "legacy" bank

"Been there, done that". Not there any more!

A better analog to the store and Scamazon's implementation would be:

You put stuff in your cart.

You go through the register. You agree to "prevailing price".

No prices show up because they are "calculating".

You leave the store.

A day later, you're hit with a surprise bill that's 10-100x more than you thought. A $100 transaction ends up being $1,000-$10,000.

There's no refunds.

The dispute procedure is to beg and hope they "let you ignore it".

For the companies I've worked for, having HARD limits on devs would put the C-levels' minds at ease.

At this moment, any dev with AWS keys has an unlimited month-to-month credit line that the company is on the hook for paying. And at best there's the hope and prayer that the billing notifications aren't utter shit.

I had an ECS cron job hang one time, so instead of a 30-second compute charge it ended up being almost a full month of continuous runtime. My usual $30 bill was $800, and the estimated charges for the next month based on 2 days of use were $1300. I didn't have a billing alert set up (definitely set up a billing alert!).

It could not have been easier to get AWS to remove the charge. A quick email to support with a brief explanation and it was immediately accepted. The hardest part was that they wanted a very specific request for how much I was asking to be refunded. So I had to go back and calculate my average costs per service and compare that to the charged costs. After that it was immediately refunded.

They aren't just handing these things out though. They made me read and acknowledge I had read their service agreements and basically swear that I know what happened and it won't happen again. Really painless process overall, all things considered.

Off-topic maybe, but I'm very curious as it's hard to find hard numbers:

> my usual bill is $200/month

How many req/second are you serving? What kind of things are happening?

It seems like the bills are outrageously expensive when it comes to various cloud services. I'm personally hosting a service that does between 10-100 req/second on average during a month, and my monthly bill ends up being closer to $40/month, including traffic and everything. I'm running a database on the same server, and 20% of the requests write stuff both to disk and to the database, while the rest just read from the DB or disk.

The whole setup took around 5 hours on one day, and it has been running flawlessly from day one; we haven't had to migrate servers yet after ~6 months of production usage. Probably one day we're gonna have to upgrade the server to an $80/month one, but that's a one-time thing and our revenue easily covers that.

In my consulting days I helped people with issues like this quite often, and what I found tended to come down to severe inefficiency caused by people not fully understanding how services need to perform at scale and what causes them to perform poorly. You hear a lot of “oh, it seemed to work really well on my machine”, which doesn’t necessarily translate well to performing or scaling smoothly in the wild.

With a bit of refactoring and simplifying it tended to cut out a lot of issues. I suppose the issue is that a lot of people don’t really know what to look for or how to anticipate issues in complex infrastructure (which is totally fair, I only learned via trial by fire and have done some really stupid stuff).

I don’t think I ever encountered a case where the code was correct and bills were too high. Some AWS/GCP configurations would be pretty bad, but the code would also tend to be incredibly inefficient.

I always encourage people to respect people who are great at dev ops and to either hire them if they’re big enough or just consult them if they’re uncertain about things. Throwing a day of consulting rate at someone smarter than you is a great way to learn and it could easily save you money in the not-so-long term.

I’d say that because they’re crazy to rely on me for infrastructure, but I’d make things better if I was around already and they asked me to help out. But I’m no substitute for someone who actually knows what they’re doing.

It depends on how efficient the dev is.

For comparison, I've been running a paid Slack app (a few $K per year) that manages to run entirely within the free tier on Google Cloud Platform.

It depends on user activity, but typically it is not that many requests: around 20 req/second with a reply in 100-400 ms.

These recursive requests were going at a rate of 17k/second, with a 30s timeout each.

The benefit of the current approach is that I don't need to manage any servers, and I get different environments for free. Also, sizing is not an issue; I just tune AWS Lambda limits to be able to serve a single request.

I will need to invest some time to understand how big the instance should be (how much memory), because I struggle to size it without running out of memory or CPU.
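For scale: the runaway loop in the original post burned roughly 70 million GB-seconds in about a day. Using the published Lambda@Edge duration rate at the time ($0.00005001 per GB-second; current pricing may differ, and this ignores the per-request and CloudFront charges that account for much of the rest of the $4,600), the duration charge alone reconstructs most of the bill:

```python
# Reconstructing the duration portion of the bill from the numbers
# in the post. Rates are the published Lambda@Edge prices at the
# time of the thread and may have changed since.

GB_SECONDS = 70_000_000                 # compute consumed by the loop
PRICE_PER_GB_SECOND = 0.00005001        # Lambda@Edge duration rate

duration_cost = GB_SECONDS * PRICE_PER_GB_SECOND
print(f"Duration charge alone: ${duration_cost:,.0f}")   # ~$3,500
```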

I run 1.2 million uptime checks per week; my total AWS bill was $150/mo before I migrated to permanently running VMs. It's definitely doable without trying too hard.

Yes, moving to a VM is definitely doable, but being a one-person dev team, it is challenging to maintain.

My fear is being on vacation when this VM suddenly dies. It might take too much time to bring it back online, and I might be out of good network coverage.

Self-healing VMs: look into fly.io, they just restart when out of memory, etc.

It took about a day to rewrite, and a week of 30 minutes per day to figure out how to optimise my code for self-healing.

Knowing my bill is capped by (number of VMs × memory) saves a lot of stress.

> my total AWS bill was $150/mo before I migrated to permanently running VMs

I know I keep harping on about this, but [if you have a vague idea what you're doing] you can squeeze an awful lot out of a cheap VM.

I'm currently working on a project that we're deliberately prototyping using ultra-cheap VMs. If it takes off we know how to scale it up, if it doesn't the costs stay very very low.

Definitely - with startups offering 3x free firecracker VMs (of 256MB RAM) these days, there's almost no reason to start ideas with serverless.
AWS may promote its technologies as prototype-friendly, but at the end of the day it's built to be an enterprise-grade production tool. A company will not even bother with a $4,000 mistake; it's just the price of doing business, so there is little incentive to address these types of problems. Playing around with AWS for side projects is like using a chainsaw: it can really accelerate your work, but if you make a mistake you may lose an arm and a leg :).

That's not really fair, because while AWS does have a lot of issues, their refund policy isn't one of them. It's usually really easy to present a case for refunding accidental charges.

I've heard they have a generous billing-forgiveness policy for footguns, and thankfully I've never had to find out, but we shouldn't be that grateful, because ultimately the cloud providers do this in their own rather dishonorable self-interest.

It would be fairly simple for them to allow users to set up hard billing limits. Yes, it wouldn't be accurate to the second. And yes, it would mean that deployments would fail with data loss or in unpredictable ways, but in most cases that would be preferable for these users as opposed to a couple orders of magnitude increase in billing costs.

But the cloud providers don't support hard billing limits because they like people fucking up and accidentally running up their bill. After all, it's probably only a small fraction of users that go through all the humiliating rigamarole of unwinding a provisioning mistake.

So yeah, good on Amazon for being so generous with the band aids, but maybe they should try a little harder at helping their users not shoot off their toes...

> It would be fairly simple for them to allow users to set up hard billing limits.

former AWS SDE here

I don't believe it would be "fairly simple" to build a completely new off switch into 150+ services, likely with multiple integration points in each service. In addition, the mere existence of an off switch introduces new failure points, where failure directly turns into downtime.

The effort to implement this is far from trivial, removes resources from implementing other features that the really large accounts are asking for, and adds complexity with direct availability risks. It's not at all surprising they don't implement this.

IMO Google Cloud has the solution for this: access to APIs is off by default and you must enable API access before anything will work. Their portal is pretty good at estimating costs in the first place, so resources created there aren't much of an issue, but having to use the portal to enable programmatic access is a great way to avoid mistakes.

Can confirm. When I was learning the basics of EB, I accidentally spun up a bunch of EC2 instances in a region I didn't mean to that ran for ~3 weeks and racked up a $2.4k bill on the company's account.

They wrote it off ~6hrs after we filed a support ticket about it.

Chainsaw is the perfect metaphor for this, well done.

Using AWS for side projects is a great way to make sure you have AWS skills the next time you interview, assuming you aren't using it at work.

While reading through these threads I always get a feeling that everyone's working at YouTube or stuff like that, and they need to serve millions of users per hour. Meanwhile I'm in my corner here with an old-school $20 VPS that does just fine for my 10k users.
For small projects, why do you need the scale? And once you do need the scale, serverless is way more expensive than even managed Kubernetes. I still think serverless is the hosting providers' way to make far more money by selling the illusion that it's easier, when it really isn't. Logging is normally a huge pain. Local dev is usually a huge pain. Managing versions is a pain compared to just git branches, especially across multiple environments. It is a pain to set up different environments and full CI/CD. In the end they might be OK for prototypes, but for real, big systems they are a huge pain. That is just my real-life experience.

To expand on this OP, I've done the AWS-full-stack approach in a mid-sized startup. Modern Serverless problems require modern serverless solutions. That ecosystem is simply not as developed as "traditional" web-server CI/CD. Here are some things that you will eventually need to optimize for.

- After crossing a certain threshold in scaling needs, Lambda costs more than regular EC2 behind an ELB

- Lambda cold-start times can be a deal-breaker when users first visit your website. If you contact AWS they will tell you to set up a simple cron job that keeps lambdas "warm". But AWS provides no visibility into what's warm or cold, or which endpoints link to which lambdas.

- Dealing with Cloudwatch logs of various lambda runs (IMHO) is objectively a bad dev experience. Query insights is getting better, but is still a pain to work with.

- To reduce deployment and development times, you'll eventually want to deep-dive into lambda layers. Modern problems modern solutions.

- One lambda calling and awaiting another lambda is not a supported first-class use-case. There's no API that allows you to get the status of a lambda run. There's a hack around this where you use AWS Step-Functions. Modern problems modern solutions.

We're still on AWS full-stack "serverless" for our webserver and realtime stream processor. At the time I didn't know what I was getting my company into. I wish I just made a Flask webserver instead.

Serverless isn't just about scale, it's about deploying code without having to touch any infrastructure. The lambda free tier is also very generous (1M free requests per month).
I used to work for Firebase; this is a common problem. For my own developer-focused startup I have prevented functions from calling each other to an unbounded depth, exactly so this footgun is removed.

The technical detail is that outbound requests are given a role encoded in the user-agent, and then I can easily filter out incoming requests by user-agent [1].

[1] https://observablehq.com/@endpointservices/webcode-docs#opti... (see loop prevention flags)
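A stripped-down illustration of the loop-prevention idea (the linked docs encode a role in the user-agent; here a hypothetical depth-counting header plays the same part):

```python
# Toy version of serverless loop prevention: every outbound request a
# function makes carries a hop-count marker, and inbound requests that
# already carry too high a count are rejected. The header name and
# depth limit are made up for illustration.

MARKER = "x-function-depth"
MAX_DEPTH = 3

def outbound_headers(inbound_headers: dict) -> dict:
    """Marker to attach when this function calls another function."""
    depth = int(inbound_headers.get(MARKER, 0))
    return {MARKER: str(depth + 1)}

def should_reject(inbound_headers: dict) -> bool:
    """True once a request has bounced through too many hops."""
    return int(inbound_headers.get(MARKER, 0)) >= MAX_DEPTH

# An external request starts at depth 0; each function-to-function
# hop increments the counter until the chain is cut off.
headers = {}
for _ in range(MAX_DEPTH):
    assert not should_reject(headers)
    headers = outbound_headers(headers)
assert should_reject(headers)  # a would-be infinite loop stops here
```

With a scheme like this, the OP's self-calling function would have been killed after a handful of hops instead of 24 hours.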

I rarely use AWS for smaller projects, and prefer to either use Digital Ocean or bare metal from a local data center (well local when I lived in NY).

After a surprise bill like this, I would re-evaluate what serverless is actually giving me.

Cloud vs. bare metal costs should always be thoroughly calculated. A fragment from an article in the current "FreeBSD Journal" [1]:

> We compared the three-year total cost of ownership of a VPS, such as a DigitalOcean Droplet, against two equivalent leased or purchased bare metal servers. We estimated that the leased option costs about half as much compared to equal resources in the cloud, and owning the servers would cost less than a quarter of the pure cloud options.

[1] https://freebsdfoundation.org/our-work/journal/ (issue PDF: https://freebsdfoundation.org/wp-content/uploads/2022/06/Jou...)

I'm only using AWS for my domain (too lazy to move), and even then I use an external DNS manager because AWS charges something like 50 cents per hosted zone per month.

Everything on aws is a clusterfuck designed to suck money out of enterprise businesses

DigitalOcean is amazingly simple and I have been a big fan since they launched.

It's really difficult for AWS, or any other serverless provider for that matter, to achieve a kind of bulletproof and safe user experience across different offerings that encompasses everything to do with billing/monitoring/alerting, and then also covers all kinds of potential customer scenarios (like the function calling itself, as one example).

For example, it's totally understandable that the alarms can be specified per region; why shouldn't it be like this?

Also, the global AWS billing $300 alert seems to have worked, but you were asleep, as far as I understand. If it had been a call-out style alert, then you would've noticed in the middle of the night and could've stopped it.

The only thing I agree is frustrating is this:

> CloudFront includes the AWS Shield Standard feature, but somehow, it was not activated for this case (Lambda@Edge calling itself via CloudFront).

Maybe you can argue that you weren't made aware of this but idk... keep us updated

What’s the case for not implementing an optional “shut down all my services at $spend and stay shut down until I intervene”?

Honestly? Many people would enable it, forget about it, and footgun themselves on the other side.

Perhaps AWS should have "personal/developer" accounts that have this enabled by default and continually warn you about it, whereas "company/enterprise" don't have them.

> Many people would enable it, forget about it, and footgun themselves on the other side.

Yeah, but I figure as long as Amazon doesn't immediately remove stored data, the damage of the footgun would be minimal. Speaking for myself, of course, I'd rather have a short outage than an unexpected thousand dollar overnight expense. It seems so trivial that it's unclear why AWS would not implement this feature. The only explanation that makes sense is that they want these surprise bills to occur.

Because the downside for a company isn't "oh it was off overnight" it's "we finally hit it big and made zero sales because AWS shut us off".

Given how easily they reverse the bills, I suspect that they have a policy of doing it (perhaps a few times per account, something to prevent abuse) because they really don't want to trigger the above scenario.

Alternatively it's "we would have made a profit this month but a bug in this one service chewed through our budget in one hour". Sure, you might be able to get a refund, but that's no way to plan a business.
I’d think if your rate of spending is >$50/hour then that’s nearly always a bug. The only reason this conversation is taking place is because serverless “infinitely scales”. Autoscaling physical instances has a max limit for similar reasons.
I've experienced plenty of scenarios where costs have quite legitimately spiked.

Ultimately whatever solution you put in place, someone is going to complain about it. At least with the system they currently have in place they can reimburse customers. Whereas it is a lot harder to fix their reputation after they've automatically stopped production services.

Given how easily they reimburse customers, I suspect it's intentional - one can be "fixed after the fact" and the other can't - if your site goes down during a slashdotting and you lose sales, etc, there's no getting those back, but if you inadvertently run costs high, they can just refund/cancel those costs.
They 'might' reverse those fees, they might not. You are at their mercy and mercy is finicky.
Azure HAS this hard limit feature already.

I've seen nobody on HN, Twitter, or Reddit complain that "my site was down during heavy business since I turned on the hard billing setting". Not a single person.

However, I see frantic post after frantic post of "I was testing something on AWS and it caused me a $X000 or $X0000 bill."

But as the posts in here are apt to suggest - you can always beg AWS support for a reversal. Great plan there.

The first is obviously customer error and unless you're posting to get laughed at, you're likely not to gain traction.

(Also one could make the "nobody uses Azure" joke here.)

Personally I think that much of AWS is "way overpowered" for the normal person/business, and you shouldn't be playing with it if a $X0k bill would be impactful (as likely other solutions are much better tuned to your needs and money).

Sorry, I might not have been very clear, but the AWS billing alert for $300 was triggered only after charges had already reached $1,484.

If that alert had triggered earlier (the $300 threshold was crossed within an hour), my total accumulated charges would have been only $400, not $4,500.

So the hard lesson here is that CloudFront charges take time to appear on your bill, up to 24 hours.

It's not really difficult. They just need a way to set hard spending limits. Probably on by default.

Unless you're a big company, "we stopped your function in the middle of the night" is a whole lot better than "we ran your function all night and you owe us $4k".

In my 25 years of running production services, I honestly cannot think of one company I've worked for that would have accepted their function being stopped in the middle of the night.

AWS already has a recourse for incidents like these: refund the spend. That is far more reliable than trusting an organisation can tolerate an outage.

The difference is none of the little sideprojects I work on are worth $4k in a month, let alone in a day. Obviously most companies want to spend the cash, but they should want as many programmers using AWS for sideprojects as possible.
I started off using Lambda as well and made the same sort of mistake. I can't remember exactly how much my bill was, but it was enough that it would have drained all my savings and effectively killed my startup. AWS was kind enough to write off most of it.

We now use Lambda only for simple cron/background tasks, or for consuming from Kinesis. We use ECS for everything else. ECS is nice because it's relatively simple compared to K8s, but it still gives the full benefit of running multiple containers on one box.

I wrote a blog post last year about our migration over to ECS and experimenting with various ways to cut costs: https://blog.bigpicture.io/how-we-cut-our-aws-bill-by-70/

On a side note, I don't know if you've already acquired any AWS credits. If not, Product Hunt Founders Club is a decent deal that will give you $5k in AWS credits. Between that and the no Stripe fees for 1 year, it paid for itself in no time.

https://www.producthunt.com/founder-club

Thank you!

I recently applied for AWS Activate credits because our startup was a part of YCombinator Startup School recently.

Thank you for the ECS suggestion.

I am definitely considering it, but struggling to choose between ECS, Elastic Beanstalk, and EC2.

My past experience with ECS was a bit frustrating because I was forced to use CodeCommit to deploy, and I didn't like that. I would prefer to deploy directly from CI, for example from GitHub Actions.

ECS runs on EC2. You basically register available servers and ECS automatically puts containers onto the instances where space is available.

We have it set up with GitHub Actions to automatically deploy to ECS as well.

I think I should definitely consider it. I was using ECS Fargate on another project, and it was not using EC2.

Why did you choose ECS with EC2 instances instead of Fargate?

Good question. Tbh I haven't looked into Fargate too much. For our use case, we're processing millions of requests every day. So we needed more control for performance and cost. We're also fairly comfortable with lower level stuff and use Terraform to manage the infrastructure.
AWS support has historically been pretty good about removing these charges. Just be careful next time.

I racked up a $8k AWS bill for my university when I was leading a club. A few emails to AWS support and it was all resolved. Although there might've been more leniency since I was a student.

What’s even more worrying is the numerous accidents hiding in a $700k-a-month bill, which is our problem.
That is a whole new class of problem. Better start parsing those detailed CSV billing logs for some gems!
I wish cloud providers had a nuclear option. Like “if my monthly spend hits $X, then just stop everything immediately”. Often these billing issues happen on little hobby projects and things that the owner would clearly be fine taking offline to avoid thousands in fees.
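A sketch of what that nuclear option could look like on the provider side: a latching kill switch that trips once cumulative spend crosses a limit and stays tripped until a human resets it. All names here are hypothetical, for illustration only; no cloud provider exposes this exact API.

```typescript
// Hypothetical "halt at $X and stay halted" switch. The key property is the
// latch: once tripped, it stays tripped even if a later spend reading is
// lower, until someone explicitly intervenes via reset().
export class SpendKillSwitch {
  private tripped = false;

  constructor(private readonly limitUsd: number) {}

  // Feed in the latest cumulative monthly spend reading; returns true if
  // services should be (or remain) shut down.
  record(spendUsd: number): boolean {
    if (spendUsd >= this.limitUsd) {
      this.tripped = true;
    }
    return this.tripped; // latches until reset()
  }

  // Only an explicit human action re-enables services.
  reset(): void {
    this.tripped = false;
  }
}
```

The latch matters because billing data arrives late and noisily, as the thread above describes; a non-latching check could flap services back on while charges are still accruing.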
> then just stop everything immediately

What does stop everything immediately mean for things that aren't compute? Backups and storage, for example.

I was only thinking about this for compute related services, which is where most of these surprise charges come from. I suppose you could have some fine grained rules for other services.
This is partly the reason why I would never use anything besides VMs or baremetal I provision myself. I'd rather have scaling problems I can solve by provisioning more hardware than billing problems because I fudged the setup. Yes, AWS might be good refunding "oopsies" but when trying to bootstrap a business I have better things to do than recover from heart palpitations.
This is a well known but poorly publicized issue with Lambda -- that they can get stuck in infinite loops and run up your bill.

I advise anyone I work with that if you are calling one lambda from another anywhere in your system, you should generate a request ID with every inbound request and then pass it along with each call as part of the context, and then error out if you see the same request again.
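A minimal sketch of that guard, with a hop counter added as a cheap backstop. The header names (`x-request-id`, `x-hop-count`) and `MAX_HOPS` are made up for illustration, not an AWS convention:

```typescript
import { randomUUID } from "crypto";

// Assumed depth limit for legitimate lambda-to-lambda call chains.
const MAX_HOPS = 3;

interface Headers {
  [key: string]: string | undefined;
}

// Call this at the top of each handler, and forward the returned headers
// with every outbound call the handler makes.
export function guardHeaders(incoming: Headers): Headers {
  const requestId = incoming["x-request-id"] ?? randomUUID();
  const hops = Number(incoming["x-hop-count"] ?? "0");
  if (hops >= MAX_HOPS) {
    // Fail fast instead of letting the loop run up the bill.
    throw new Error(`Possible invocation loop detected for request ${requestId}`);
  }
  return { "x-request-id": requestId, "x-hop-count": String(hops + 1) };
}
```

The request ID lets you correlate the whole chain in logs; the hop count catches loops even when a dedupe store isn't available (e.g. stateless Lambda@Edge).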

The good news is that AWS is aware of this and that their alarms are delayed, and will almost always waive the fees for you if you ask.

What is this kind of sh1t!? Why do people put their software on a platform where you don't know what it's going to cost upfront?

Maybe I don't understand. Maybe there are legitimate use cases for AWS and other 'vague clouds'. And it's not only these kinds of bills you get; I also heard you even have to pay to get your own stuff off of these platforms.

What's wrong with good old webhosting (in whatever shape, size or form)? I understand there are use cases - probably for big big apps - that will benefit from cloud hosting, but that can't be for every project or business. Right?

Please enlighten me.

Really wish the major clouds had a “I’m experimenting. Kill it if it goes over 1k usd” mode. Not some crappy alerts - actual halt (and delete if necessary)

I bet that would increase profits too by encouraging people to experiment more.

According to other threads here Azure has this mode.
Reminds me of when I set up an S3-triggered Lambda function that also wrote into the same directory. What ensued was millions of files and folders generated recursively.

Fortunately, AWS was kind enough to reverse the billing. Had this been Google Cloud, I would've gotten the cold shoulder and a low key threat that if I reverse the transaction I would lose access to my other paid Google products outside GCP :/

With S3 you don't even need lambda to do something like this

S3 has some setting where you can log activity on a bucket into another bucket

But that setting allows you to set the destination bucket to be the same bucket that you're monitoring. So ~30s after something happens on the monitored bucket, S3 writes a log into the same bucket. And then that activity triggers the logging again. So every ~30-60s, forever, there's a little log written into the bucket.

It takes a while to add up to something noticeable if your monthly AWS bill is already a few digits long. It's super fun to sift through the bucket a few months later when you're trying to figure out if there's any real data in the bucket or just endless logs.
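That loop is just a cycle in the "which bucket logs to which bucket" graph, so a deploy-time sanity check can catch it before it costs anything (including the indirect case where A logs to B and B logs back to A). A hypothetical helper, not an AWS API:

```typescript
interface LoggingConfig {
  sourceBucket: string; // bucket being monitored
  targetBucket: string; // bucket that receives the access logs
}

// Returns the buckets forming a logging cycle, or null if the configuration
// is loop-free. Follows each bucket's log target until it either leaves the
// configured set or revisits a bucket already on the path.
export function findLoggingCycle(configs: LoggingConfig[]): string[] | null {
  const target = new Map<string, string>(
    configs.map(c => [c.sourceBucket, c.targetBucket] as [string, string])
  );
  for (const start of target.keys()) {
    const seen: string[] = [];
    let cur: string | undefined = start;
    while (cur !== undefined && target.has(cur)) {
      const i = seen.indexOf(cur);
      if (i !== -1) return seen.slice(i); // closed the loop
      seen.push(cur);
      cur = target.get(cur);
    }
  }
  return null;
}
```

Running it over your intended logging setup before `apply` would flag the self-logging bucket described above immediately.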

I've had a horror story with Google Cloud, and they were very helpful.

After I explained my situation (a $5k bill for an inactive side/toy project is extremely painful), and I provided great details about what I think happened to cause it, they wrote off the charges.

At work, my development team is contracted with a company that uses AWS, and for better or worse, we have also become the devops team. We have been burned by AWS before, and we have a rule of thumb: if you are deploying new functionality/service communication, then after the deploy, monitor for 10-15 minutes, with a wide enough window to see if there is a noticeable/unexpected change from before the deploy. It always feels like wasted/burned time, but better to waste time than money. AWS is good about reversing accidental charges, though, but life is always easier if you don't even have to contact support.
> if you are deploying new functionality/service communication, after deploy, monitor for 10-15 minutes, with a wide enough window to see if there is a noticeable/unexpected change from before the deploy. It always feels like wasted/burned time, but better to waste time than money.

... have you considered automating this? Alarms are pretty straightforward across all cloud platforms. Since you're using AWS: CloudWatch has anomaly detection. I haven't used it personally but perhaps it's worthwhile to look into: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...
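For intuition, the simplest form of this kind of detection is a static-deviation band: flag a datapoint that falls outside mean ± k standard deviations of recent history. CloudWatch's actual anomaly model is more sophisticated (it learns seasonal patterns), but the core idea is roughly:

```typescript
// Toy anomaly check: is `value` more than k standard deviations away from
// the mean of the recent history? Assumes history is non-empty.
export function isAnomalous(history: number[], value: number, k = 3): boolean {
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length;
  return Math.abs(value - mean) > k * Math.sqrt(variance);
}
```

Applied to a per-minute invocation-count metric, a recursive Lambda loop would blow past any reasonable band within the first minute, long before the billing data catches up.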

First, based on my experience, they usually are able to waive this, if it's an honest mistake and from a small company.

Second, you can apply for credits, you can get at least $1,000 founders credits for AWS Activate, and if you work with any VC, you can get up to $100,000.

Lastly, note that Lambda@Edge is (much) more expensive than regular Lambda. The tech stack I personally pick for any new product is plain old Lambda, serverless-express, and good old React/Vue for the frontend with static assets on S3 (with CloudFront and Route 53 for the custom domain, ACM for TLS). I'm not that familiar with NextJS, but I assume it's mostly used to support SSR, right? I would look into the tradeoffs of using it with non-edge Lambda versus ditching SSR and using Lambda for API calls only.
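To see why the Edge premium matters in an incident like this one, a back-of-envelope calculation. The per-GB-second prices below are assumptions from memory (Lambda@Edge has historically been roughly 3x regular Lambda's duration price); always check the current AWS pricing pages before relying on them:

```typescript
// Assumed duration prices, USD per GB-second (verify against AWS pricing):
const LAMBDA_PRICE = 0.0000166667; // regular Lambda
const LAMBDA_EDGE_PRICE = 0.00005001; // Lambda@Edge, roughly 3x

export function computeCostUsd(gbSeconds: number, pricePerGbSecond: number): number {
  return gbSeconds * pricePerGbSecond;
}

// The ~70 million GB-seconds from the incident in this thread, at the
// assumed Edge rate, comes to roughly $3,500 of compute duration alone
// (before request and CloudFront charges).
const edgeCost = computeCostUsd(70_000_000, LAMBDA_EDGE_PRICE);
const regularCost = computeCostUsd(70_000_000, LAMBDA_PRICE);
```

Under these assumed prices, the same runaway loop on regular Lambda would have cost roughly a third as much, which is one concrete way the Edge choice amplified the damage.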

I keep my fingers crossed!

We also applied for AWS Activate sometime ago, as a member of YCombinator Startup School.

I would really like to deploy to regular Lambdas; they are per region and easier to monitor, but unfortunately there is no support for this in serverless-nextjs right now.

The good thing about it is that it's almost automatic: static frontend, SSR, and API are deployed appropriately and work perfectly, without too much fuss.

I try to balance bringing customer value against doing infrastructure work, and now it is clear I should have spent more time on making a better architecture.

Open a support ticket with them and they will likely forgive this charge as a one-time gratuity.
Good. Now you have learnt a great lesson, and you also realize that production mistakes can cost quite a lot in a short amount of time.
Try filing a support ticket with them with this information. I had something vaguely similar happen on GCP and they refunded the full amount
Hi, author here!

Thanks, already done that, and waiting for it to be reviewed by AWS. Support was very responsive, though.

Make sure to set up budgets too! Support will gladly help you with that since it helps prevent this situation in the future :)
For the future, I’m not sure if it suits you but I find you can get a far easier, cheaper, and more predictable dev and deploy experience without using something like CloudFront.

I once moved a project from AWS to Digitalocean for a small team (AWS was just all they knew, so it was what they used) and I was able to cut the monthly bill down quite a bit.

It isn’t that DO is inherently cheaper or better. It’s just dead simple so it’s easy to deploy with only what you actually need, with easy limits and visibility on what gets spun up. In some cases it’s arguably less cost efficient, but it’s really hard to mess up.

For the team I was supporting, simply having the visibility and a simple tool was worth a lot in saved time. They previously spent way too much time on AWS, and couldn’t even get the right infrastructure with the time they invested.

So, maybe something worth considering at least. Good luck with the bill! You’re certainly not alone (I got an $800 charge for a db I forgot to kill a few years ago).

If you contact support via your Amazon account and explain your error they will often remove some (but usually not all) of the bill.

Sorry that happened, always one of the scarier parts of using AWS. This sounds like an especially tricky one with the standard billing alerts not even catching it.

Thank you!

I already contacted them; my past experience with AWS has been pleasant. It's just that this delayed CloudFront billing should be better clarified in the docs, I suppose.

You could just as easily have a "standard" setup recursively calling itself, and scaling up multiple instances as a result. This isn't an issue due to serverless.

AWS will hopefully issue a refund.

This is the exact reason I fear AWS. How should I go about learning it when mistakes like this can basically ruin me for the next few months? Not sure if there are safe resources or free sandboxes somewhere.
AWS gives you free resources to start with. They just aren't capped, so you could bankrupt yourself. They also seem to be good about fixing honest mistakes, but it's scary and long.
Given that you thought you were doing your due diligence and had set up billing alerts, but their billing alerts are incomplete -- they should be on the hook to give you a one-time pass on that particular failure mode.
With all these stories about unexpected bills from cloud providers, are there cloud providers offering services with pre-paid credit instead of ex-post invoicing?
Maybe doing recursive calls on Lambda was not the best course of action?

I've been on teams making dozens of Lambda-based apps without issues; Lambda itself is the smallest item on the bill.

API Gateway is multiple times more expensive than the compute (Lambda) fees.

It is not easy to catch self-calling functions with static analysis :(
Yeah, lesson learned.

Don't ever use AWS unless you have a limited card (e.g. Privacy.com) and are willing to burn that bridge.

Consider it an expensive lesson on not giving your credit card to those scammers.

AWS's model is incredibly predatory. All the companies I've seen move to AWS ended up spending way more than before for much less gained. And they still needed an ops team, just a more expensive one than the sysadmins they had before to manage actual servers.

I'll stick to VPS

I feel like once a day there is a cautionary tale from some absolute dumbass getting reamed by AWS.
One of my biggest fears. What's to prevent trolls and competitors from just spamming your endpoints in a loop? How do people using pay-per-use infra deal with these problems?

I really want to use Lambda for public endpoints but it just scares me.

There are a lot of tools in AWS to deal with this: API Gateway rate limiting, WAF, etc.

That said, it does take a level of awareness to set these things up. Some of what we try to do at SST is turn these on for you automatically, so you're not being punished for not knowing something.
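API Gateway's throttling follows the classic token-bucket model: a steady refill rate (requests per second) plus a burst capacity. A toy sketch of the mechanics, to show why a spam loop gets cut off while normal traffic passes:

```typescript
// Token bucket: `rate` tokens are refilled per second up to `burst` capacity;
// each allowed request spends one token. Time is passed in explicitly
// (seconds) so the behavior is deterministic.
export class TokenBucket {
  private tokens: number;
  private last: number;

  constructor(private rate: number, private burst: number, now = 0) {
    this.tokens = burst;
    this.last = now;
  }

  allow(now: number): boolean {
    // Refill proportionally to elapsed time, capped at burst capacity.
    this.tokens = Math.min(this.burst, this.tokens + (now - this.last) * this.rate);
    this.last = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // API Gateway would answer 429 Too Many Requests here
  }
}
```

A tight loop hammering an endpoint drains the burst allowance almost instantly and then gets throttled to `rate` requests per second, which caps the Lambda invocations (and the bill) behind it.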

We're looking at going to a NextJS frontend setup soon and maybe Lambda for SSR... Would Netlify be a good use case to avoid situations like this and keep the hosting bill down? Netlify CMS really has our eye after chatting with the Plausible.io folks.
Sounds like AWS might refund you based on other responses, but always be prepared in case they don't. Make sure your DNS provider is not the same entity as your hosting provider, so an unresolved bill doesn't result in your domain being held hostage.
I once made a lot of web requests with Lambda in a VPC through a NAT. Cost me a similar amount...
Wow, cloud. The future!!

I bet you're glad you gave AWS that money and didn't just invest it in your own hardware.

Serverless is just somebody else's servers.

The cloud is way over-used and over-relied upon. Most companies could run on a cluster of raspberry pis or equivalent single computer before they need to worry about the cloud.
So the whole point of AWS is that you can scale up on demand. There are no capital expenditures, so you can scale up crazy fast. More scale, more bills. Bills on demand.

Now, there's two sides to the second part of this. The following is a tiny bite of the fruit of the Tree of the Knowledge of Good and Evil.

The good is that Amazon has a policy of leniency. So writing them personally, explaining, and asking will probably work; for publicly viewable problems, I've seen them come through, and for me personally too. Now, I've also formed a bad impression of them because they sold me some counterfeits. But one of the items I was sure was a counterfeit, and which they ended up forfeiting after this whole crazy back-and-forth by mail and phone, turned out to be genuine when compared against the genuine item from the maker itself. I could never distinguish the items in any way other than context.

But the ill part (and I considered putting it the other way, ill before good, but in this case it's good before ill) is that they actually have to make a profit at some point. They make a lot of profit, but that doesn't change the basic thing in business: it's not just granting, it's charging for what you grant. Both. It's not only about survival, it's also about integrity, because if you never charge you starve, meaning in practice you debase your values until you can eat.

Which brings me back to the main point: they used real energy they had to really pay for, and real hardware that depreciated, and you got the code wrong. Amazon, I've heard, doesn't have training wheels, and you want training wheels. It's like playing with assembly: you can mess up your computer, and that's $2000, and since you don't know what you're doing, they'll tell you to replace the logic board and do this whole thing. Computer repairmen for sure screw people.

Just like riding a bike, you ride with training wheels until you go through the pain of learning to ride for real, and you fall, and you scrape your knees, again and again until you finally ride for real. And you keep falling off your bike indefinitely, just less.

Totally agree. I think about it compared to the cost of a semester of useless college classes and the guaranteed loss of thousands of dollars.

I opened a separate bank account with $2k in it; that is the cost of learning cloud computing, and it gives me peace of mind if I do something crazy.

I am also never leaving anything running when I log out, though. No way am I ready to run a Lambda function in production.

Self-host and stop using the cloud, especially when you clearly don't know what you're doing.

> Now I am waiting on a response from AWS Support on these charges; maybe they can help me waive part of that.

Honestly, why should they? They're very clear about their billing policy and pricing; the only reason they might is the good PR/karma from the YC post...
