
Ask HN: How does HN manage to be always online?

 2 years ago
source link: https://news.ycombinator.com/item?id=31821269

According to @dang (https://news.ycombinator.com/item?id=28479595) via @sctb (https://news.ycombinator.com/item?id=16076041)
  We’re recently running two machines (master and standby) at M5 Hosting. All of HN runs on a single box, nothing exotic:
  CPU: Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz (3500.07-MHz K8-class CPU)
  FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 hardware threads
  Mirrored SSDs for data, mirrored magnetic for logs (UFS)
We managed to run a successful bootcamp-school LMS on a single cheapest-tier 1 GB RAM Hetzner VPS hosting the Rails app, Redis cache, Postgres, backups, and a staging environment, with near-zero production issues.

Recently had to upgrade to the next tier because of growth.

Modern servers are super fast and reliable as long as you know what you’re doing and don’t waste it on unnecessary overheads like k8s etc.

Even k8s only costs about 10% overhead.

It's kind of incredible that "a few hetzner dedis running k8s" still has better reliability than the Cloud™ does.

Remove Kubernetes, and "a few Hetzner dedicated servers running just your application" is even more reliable, since you've removed more points of failure than "a few Hetzner dedicated servers running Kubernetes".
So much this. System failure is a statistics game, which is why I like the Erlang approach: isolated state that you assume will become corrupted after a while, so you can restart that part of your application with fresh state.

K8s does this, kind of, on a higher level, where the downtime is often more noticeable because there's more to restart. But if your application is already highly fault-tolerant, this is just another point of failure.
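A rough sketch of that restart-with-fresh-state idea (in Python rather than Erlang; the `Counter` worker and its "poison" failure mode are invented purely for illustration):

```python
class Counter:
    """A worker with isolated, disposable state."""
    def __init__(self):
        self.count = 0  # fresh state on every (re)start

    def handle(self, item):
        if item == "poison":  # stand-in for state corruption / a crash
            raise RuntimeError("corrupted state")
        self.count += 1
        return self.count

def supervise(factory, items):
    """Minimal supervisor: on a crash, restart the worker with fresh state
    and drop the in-flight request, instead of taking the system down."""
    worker = factory()
    results = []
    for item in items:
        try:
            results.append(worker.handle(item))
        except RuntimeError:
            worker = factory()    # restart: state is assumed corrupt, discard it
            results.append(None)  # the failed request is lost; the system lives on
    return results
```

With `supervise(Counter, ["a", "b", "poison", "c"])`, the worker crashes on the poison item, is restarted, and resumes counting from fresh state.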

> It's kind of incredible

It's not incredible, it's normal; it's always been like this.

Adding to this: if you use dedicated servers rather than a VPS, you can also squeeze a lot more performance out of the boxes, since you have dedicated CPUs just for you rather than shared vCPUs that others use too.
A single bare metal server is more reliable than most people think it is. Complexity adds a lot of overhead and layer after layer that could possibly fail.
A single server is much faster than most people think, too!

In the microservice or serverless arrangements I've seen, data is scattered across the cloud.

It's common for the dominant factor in performance to be data locality. Most talk about data locality is about avoiding trips to RAM, or worse, disk; but in our "modern" distributed cloud setups, finding a bit of data frequently involves a trip over the network. What was once invoking a method on an account object in the monolith world has become making an HTTP POST to the accounts microservice.

What might have been a microsecond operation in the single-server world can become hundreds of milliseconds in the distributed cloud world. While you can't horizontally scale a single server, a 1000x head start in performance might delay scaling issues for a very long time.

A most excellent paper on this topic, which I think should be mandatory reading before anyone is allowed an AWS account, is http://www.frankmcsherry.org/assets/COST.pdf :)
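The arithmetic above can be made concrete with a back-of-envelope sketch (the latency figures and the 10-lookups-per-request workload are illustrative assumptions, not measurements):

```python
# Illustrative assumption: each request performs 10 sequential data lookups.
IN_PROCESS_CALL_S = 1e-6  # ~1 microsecond: method call on an in-memory object
NETWORK_HOP_S = 1e-3      # ~1 millisecond: HTTP round trip to a microservice
LOOKUPS_PER_REQUEST = 10

def max_sequential_rps(per_lookup_s, lookups=LOOKUPS_PER_REQUEST):
    """Requests/second one sequential handler can sustain if each request
    performs `lookups` data accesses costing `per_lookup_s` seconds each."""
    return 1.0 / (per_lookup_s * lookups)

monolith_rps = max_sequential_rps(IN_PROCESS_CALL_S)       # ~100,000 req/s
microservices_rps = max_sequential_rps(NETWORK_HOP_S)      # ~100 req/s
```

The ratio between the two is the ~1000x head start mentioned above: the same workload that saturates the chatty distributed version at ~100 req/s would need three more orders of magnitude of traffic to trouble the in-process version.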

When you put in even half the effort to set things up properly, a single server can handle a lot of load and traffic, and get a lot of things done.

If you know some details of the services you're going to host on that hardware, the things you can do while saving a lot of resources look like black magic to many people who only deploy microservices to K8s.

...and you don't need VMs, containers, K8s, or any of it.

I think the main risk of the HN architecture is that (I believe) it's hosted in a single datacentre. Hopefully they have offsite backups.

The other risk, I guess, is that all the NS records point to AWS/Route 53 servers, so if that went down they'd only stay resolvable for about two minutes (the DNS TTL looks to be 2 minutes).

You could host your own NS servers in two different locations on two different providers, and keep a third hot-spare server ready to go. That would let the service survive an earthquake flattening San Diego (off-site server ready to go) and cope with the loss of AWS DNS. Whether the cost/benefit ratio is there is another matter. I think the serving side of Route 53 is fairly reliable (even when the interface for updating records fails on a frequent basis), and the cost of being down isn't particularly terrible.

Obligatory Lamport quote: "A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
Related question: How is HN funded? I presume through YC.
There are micro-outages somewhat frequently; I don’t mean that as a criticism but merely as an observation.
I sometimes vote and then click "Reply" with only a very small delay. That prompts a message saying they can't serve requests that fast.

Other than that I have seen no downtime.

That is just prudent overload protection (enforcing a minimum time between actions), I think, not a sign that they're actually unable to handle it.
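A minimal sketch of that kind of protection (hypothetical; HN's actual logic isn't public), with the clock injected so it can be tested deterministically:

```python
class MinIntervalLimiter:
    """Reject a user's action if it arrives sooner than `interval_s` seconds
    after their last accepted action."""
    def __init__(self, interval_s, clock):
        self.interval_s = interval_s
        self.clock = clock         # e.g. time.monotonic in production
        self.last_accepted = {}    # user -> timestamp of last accepted action

    def allow(self, user):
        now = self.clock()
        last = self.last_accepted.get(user)
        if last is not None and now - last < self.interval_s:
            return False           # reject: action arrived too fast
        self.last_accepted[user] = now
        return True
```

With a fake clock returning 0, 1, then 5 seconds and a 3-second minimum interval, the second action is rejected and the third accepted; the user sees an error, but the server never does the rejected work.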
I'm reading through hckrnews.com, as I'm sure others have their favorite readers.

Those function as proxies and lower the perception of downtime as well.

I've never experienced the website being inaccessible even for short periods.

An interesting note: multiple nines of uptime isn't actually necessary to keep users happy on a site like HN. I used to use sites in the early 2000s that were down far more regularly than HN, and it didn't put me off using them.
It IS down occasionally: https://twitter.com/hnstatus

Not all "downs" are reflected there. The last time I remember it having really bad performance (or being unusable, I don't recall which), opening it in incognito would still get you cached results fast.

Most outages happen when there are changes, and HN has no changes.
They read a file and save data to a file. That's it.
I would love to know HN's tech stack and architecture, how it evolved (if it did), and what resources (money, man-hours, etc.) they spend maintaining it.
It's written in Arc, a custom dialect of Lisp that Paul Graham made.
Maybe they're intentionally using boring tech: no Cloudflare, no AWS, just self-hosting somewhere, hopefully.
> Maybe they are intentionally using boring tech

This might be true at the infrastructure layer, but HN definitely uses "fancy" technology, since Paul developed his own Lisp, Arc, that powers HN :) http://arclanguage.org/

> Arc is designed for exploratory programming: the kind where you decide what to write by writing it. A good medium for exploratory programming is one that makes programs brief and malleable, so that's what we've aimed for. This is a medium for sketching software.

news.yc might not be, but ycombinator.com and startupschool.org are on Cloudflare.
Powered by Illuminati technology.
> Powered by Illuminati technology.

PG operating the switchboard.

