
Real world SSD wearout

by Pavel Trukhanov


A year ago we added SMART metrics collection to our monitoring agent, which collects disk drive attributes on our clients' servers.

So here are a couple of interesting cases from the real world.

Because we needed it to work without installing any additional software like smartmontools, we implemented collection of not all the attributes, but only the basic, non-vendor-specific ones, so we could provide a consistent experience. That way we also skipped the burdensome task of maintaining a knowledge base of vendor-specific quirks, and I like that a lot :)

This time we'll discuss only the SMART attribute named "media wearout indicator". Normalized, it shows the percentage of "write resource" left in the device. Under the hood, the device keeps track of the number of erase cycles the NAND media has undergone, and the percentage is calculated against the maximum rated number of cycles for that device. The normalized value declines linearly from 100 to 1 as the average erase cycle count grows from 0 to that maximum.
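In code, that normalization could look something like this minimal Python sketch; the function name and the 3000-cycle rating in the example are hypothetical, just to make the linear decline concrete:

    def media_wearout_indicator(avg_erase_cycles: float, rated_cycles: float) -> int:
        # Hypothetical reconstruction of the formula described above:
        # declines linearly from 100 (new) towards the 1% floor as the
        # average NAND erase cycle count approaches the rated maximum.
        fraction_used = min(avg_erase_cycles / rated_cycles, 1.0)
        return max(1, round(100 * (1 - fraction_used)))

    # e.g. a drive rated for 3000 P/E cycles that has averaged 1500 so far:
    print(media_wearout_indicator(1500, 3000))  # -> 50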

Are there any actually dead SSDs?

Though SSDs are pretty common nowadays, just a couple of years ago you could hear a lot of fearful talk about SSD wearout. We wanted to see if any of it was true, so we searched for the maximum wearout across all the devices of all our clients.

It was just 1%

The docs say the value just won't go below 1%. So this SSD is worn out.
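Since the attribute floors at 1%, a monitoring check can simply treat "at or below a threshold" as "replace me". A minimal sketch; the sample data here is made up:

    def find_worn_out(wearout_by_device, threshold=5):
        # Devices at or below the threshold are (nearly) out of write resource.
        return [dev for dev, value in wearout_by_device.items() if value <= threshold]

    print(find_worn_out({"/dev/sda": 97, "/dev/sdb": 1}))  # -> ['/dev/sdb']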

We notified this client. It turned out to be a dedicated server at Hetzner, and their support replaced the device:

[Screenshot: Hetzner support ticket confirming the drive replacement]

Do SSDs die fast?

Since we introduced SMART monitoring for some of our clients quite some time ago, we have accumulated history, and now we can see it on a timeline.

Unfortunately, the server with the highest wearout rate across our clients' servers was added to okmeter.io monitoring only two months ago:

[Chart: SMART media wearout indicator for this server over the two months of monitoring]

This chart shows that in just these two months, it burned through 8% of its "write resource".

So under that load, 100% of this SSD's lifetime will be gone in 100 / (8% / 2 months) = 25 months, or roughly 2 years.
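The same naive linear projection in Python, using the numbers from the chart above:

    def projected_lifetime_months(percent_used, months_observed):
        # Linear extrapolation: assumes the write load stays the same.
        rate = percent_used / months_observed  # percent per month
        return 100 / rate

    print(projected_lifetime_months(8, 2))  # -> 25.0, i.e. ~2 years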

Is that a lot or a little? I don't know. But let's check what kind of load it's serving:

[Chart: disk writes by process; ceph dominates]

As you can see, it's Ceph doing all the disk writes, but it's not doing these writes for itself: it's a storage system for some application. This particular environment was running under Kubernetes, so let's sneak a peek at what's running inside:

[Chart: disk writes by Kubernetes pod; Redis dominates]

It's Redis! You might've noticed that the values diverge from the previous chart: here they are two times lower, probably due to Ceph's data replication. But the load profile is the same, so we conclude it's Redis after all.

Let's see what Redis is doing:

[Chart: Redis write commands per second, under 100 on average]

So it's less than 100 write commands per second on average. As you might know, there are two ways Redis makes actual writes to disk (a quick way to check which one is enabled is sketched right after this list):

  • RDB, which periodically snapshots the whole dataset to disk, and
  • AOF, which writes a log of all the changes.
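If you're not sure which persistence mode an instance is running, you can ask Redis itself. A sketch using the redis-py package; the host and port are placeholders:

    import redis  # the redis-py package; assumes a reachable Redis instance

    r = redis.Redis(host="localhost", port=6379)
    # RDB: "save" holds the snapshot schedule, e.g. "900 1 300 10 60 10000";
    # an empty value means RDB snapshots are disabled.
    print("RDB schedule:", r.config_get("save"))
    # AOF: "appendonly" is "yes" when the change log is enabled.
    print("AOF enabled:", r.config_get("appendonly"))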

It's obvious that what we see here is RDB with one-minute dumps:

[Chart: disk write spikes every minute, matching RDB snapshots]
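Why does that wear the disk so much? RDB rewrites the whole dataset on every snapshot, so the bytes written scale with DB size times dump frequency, no matter how little actually changed. A back-of-the-envelope sketch with made-up illustrative numbers:

    def rdb_write_amplification(db_size_bytes, dumps_per_hour, changed_bytes_per_hour):
        # Every dump rewrites the full dataset, hence size * frequency.
        written_per_hour = db_size_bytes * dumps_per_hour
        return written_per_hour / changed_bytes_per_hour

    # A 1 GiB dataset dumped every minute while only ~10 MiB/hour changes:
    print(rdb_write_amplification(1 << 30, 60, 10 * (1 << 20)))  # -> 6144.0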

Case: SSD + RAID

We see three common patterns of server storage setup with SSDs:

  • Two SSDs in a RAID-1 that holds everything there is.
  • Some HDDs plus SSDs in a RAID-10. We see that setup a lot on traditional RDBMS servers: the OS, WAL and some "cold" data live on the HDDs, while the SSD array holds the hottest data.
  • Just a bunch of SSDs (JBOD) for some NoSQL database like Apache Cassandra.

In the first case, RAID-1, writes go to both disks symmetrically, so wearout proceeds at the same rate:

[Chart: wearout indicators of both RAID-1 SSDs declining in lockstep]

Looking for anomalies, we found one server where it was completely different:

[Chart: wearout indicators of two RAID-1 SSDs diverging]

Checking the mount options to understand this didn't produce much insight: all the partitions were RAID-1 mdraid devices:

[Screenshot: mount options showing RAID-1 mdraid partitions]

But looking at per-device IO metrics, we again see a difference between the two disks: /dev/sda gets more bytes written:

[Chart: per-device bytes written; /dev/sda above /dev/sdb]

It turns out swap is configured on one of the /dev/sda partitions, and there's pretty decent swap IO on this server:

[Chart: swap IO on the server]
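Both facts are visible straight from procfs, with no extra tooling. A minimal sketch; the device names are examples:

    def sectors_written(devices=("sda", "sdb")):
        # Field 10 of /proc/diskstats is sectors written per device.
        stats = {}
        with open("/proc/diskstats") as f:
            for line in f:
                fields = line.split()
                if fields[2] in devices:
                    stats[fields[2]] = int(fields[9])
        return stats

    def swap_devices():
        # /proc/swaps lists active swap areas; skip the header line.
        with open("/proc/swaps") as f:
            return [line.split()[0] for line in f.readlines()[1:]]

    print(sectors_written())  # e.g. {'sda': 812345678, 'sdb': 501234567}
    print(swap_devices())     # e.g. ['/dev/sda3']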

SSD wearout and PostgreSQL

This journey began with me wanting to check SSD wearout under different Postgres write load profiles. But no luck: all of our clients' Postgres databases with at least somewhat high write load are configured pretty carefully, so writes go mostly to HDDs.

But I found one pretty interesting case nevertheless:

[Chart: two RAID-1 SSDs losing 4% of their wearout indicator over three months]

These two SSDs in a RAID-1 wore out by 4% over 3 months. But the hypothesis that this was due to a high volume of WAL writes turned out to be wrong: WAL traffic is less than 100 KB/s:

[Chart: WAL write throughput, below 100 KB/s]

I figured that Postgres probably generates writes in some other way, and indeed it does: constant temp file writes, all the time:

[Chart: constant temp file write throughput]
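Even without charts like these, a quick way to confirm temp-file churn is to query Postgres's own counters in pg_stat_database; psycopg2 and the connection string below are assumptions:

    import psycopg2  # assumes the psycopg2 package and access to the cluster

    conn = psycopg2.connect("dbname=postgres")
    with conn.cursor() as cur:
        # temp_files / temp_bytes accumulate since the last stats reset.
        cur.execute("""
            SELECT datname, temp_files, temp_bytes
              FROM pg_stat_database
             ORDER BY temp_bytes DESC
        """)
        for datname, temp_files, temp_bytes in cur.fetchall():
            print(datname, temp_files, temp_bytes)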

Thanks to Postgres's elaborate internal statistics and okmeter.io's rich support for them, we easily spotted the root cause:

[Chart: per-query statistics attributing the disk writes to a single query]

It was a SELECT query generating all that load and wearout! SELECTs in Postgres can sometimes generate not just temp file writes but even real datafile writes. Read about it here.
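If you don't have per-query charts at hand, the pg_stat_statements extension can point at the same culprit, since it attributes temp-block writes to individual statements. A sketch assuming the extension is installed and enabled; the connection string is a placeholder:

    import psycopg2

    conn = psycopg2.connect("dbname=postgres")
    with conn.cursor() as cur:
        # The biggest temp-block writers are the most likely wearout sources.
        cur.execute("""
            SELECT query, calls, temp_blks_written
              FROM pg_stat_statements
             ORDER BY temp_blks_written DESC
             LIMIT 5
        """)
        for query, calls, temp_blks in cur.fetchall():
            print(temp_blks, calls, query[:60])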

Summary

  • Redis+RDB generates a ton of disk writes, and the volume depends not on the amount of changes in the Redis DB but on the DB size and the dump frequency. RDB seems to produce the highest write amplification of all the storage systems known to me.
  • Actively used swap on an SSD is probably a bad idea, unless you want to add some jitter to the wearout of your RAID-1 SSDs.
  • In DBMSes like PostgreSQL, it might not be only WAL and datafile writes that dominate disk writes: bad database design or access patterns can produce a lot of temp file writes. Read how to monitor Postgres queries.

That's all for today. Be aware of your SSDs' wearout!

Follow us on our blog or Twitter to read more cases.

We at okmeter.io believe that to dig up the root cause of a problem, an engineer needs decent tooling and a lot of metrics on every layer and part of the infrastructure. That's where we're trying to help.

