
Notes on NetApp E-Series Performance Analyzer

source link: https://scaleoutsean.github.io/2022/10/26/eseries-performance-analyzer-e-series.html

26 Oct 2022


E-Series Performance Analyzer or SANtricity Web UI

The other day I ran a simple performance test with MinIO backed by an E-Series storage array.

GET throughput exceeded 4 GB/s, but SANtricity (the E-Series Web UI) showed a lower average because the benchmark exits after a short time once IO levels are stable, which they were.

GET Benchmark in SANtricity Web UI

For this and other reasons, this week I decided to try E-Series Performance Analyzer release v3.0.0.

I knew that with the 30s collection interval (I did not change the default) I wouldn’t get a better result in Grafana. The first bump is a GET test (preceded by PUT, to create files), while the second (annotated) is the mixed workload test:

GET Benchmark in Grafana


The log of the second (mixed benchmark) run shows a “cluster total” of 3,794.75 MiB/s, but Grafana shows 2,200 MiB/s (aggregate from the four MinIO volumes).


Throughput 2827.9MiB/s within 7.500000% for 10.003s. Assuming stability. Terminating benchmark.
warp: Benchmark data written to "warp-mixed-2022-10-26[034738]-lI2M.csv.zst"
Mixed operations.
Operation: DELETE, 10%, Concurrency: 20, Ran 35s.
 * Throughput: 63.51 obj/s

Operation: GET, 45%, Concurrency: 20, Ran 35s.
 * Throughput: 2849.26 MiB/s, 284.93 obj/s

Operation: PUT, 15%, Concurrency: 20, Ran 35s.
 * Throughput: 949.28 MiB/s, 94.93 obj/s

Operation: STAT, 30%, Concurrency: 20, Ran 35s.
 * Throughput: 190.05 obj/s

Cluster Total: 3794.75 MiB/s, 632.87 obj/s over 36s.

From this we can conclude that Grafana won’t give you “better” results than SANtricity Web UI for bursty moments unless the collector takes samples every second or every few seconds (which would have helped with this particular workload). With the default setting in EPA (30s metrics-gathering interval), SANtricity Web UI can give more precise results.
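To make the averaging effect concrete, here is a minimal sketch. It assumes the collector records the mean throughput over each interval, and it uses a made-up timeline (a 36 s burst at the warp "cluster total" rate with idle time on both sides). The timeline and its alignment with the sampling windows are assumptions for illustration, not a reconstruction of the actual run.

```python
def window_averages(per_second, interval):
    """Average a per-second series over consecutive windows of `interval` seconds."""
    averages = []
    for start in range(0, len(per_second), interval):
        window = per_second[start:start + interval]
        averages.append(sum(window) / len(window))
    return averages


# Hypothetical timeline: 12 s idle, a 36 s burst, then 12 s idle.
# The burst rate is the "Cluster Total" from the warp log; everything
# else (idle periods, alignment with the windows) is assumed.
BURST_MIBPS = 3794.75
timeline = [0.0] * 12 + [BURST_MIBPS] * 36 + [0.0] * 12

per_30s = window_averages(timeline, 30)  # what a 30 s interval would record
per_1s = window_averages(timeline, 1)    # what per-second sampling would record

print("peak with 30 s windows:", max(per_30s))
print("peak with 1 s windows: ", max(per_1s))
```

Because the burst straddles two 30 s windows, each window mixes burst seconds with idle seconds, and the recorded peak ends up well below the rate the benchmark itself reports.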

E-Series Performance Analyzer allows you to set an arbitrary interval for gathering performance metrics, but I wouldn’t lower it below 30s. In fact I prefer to set it to 60 seconds as I do with SolidFire Collector. There’s no point in overloading the array API endpoint and especially so if you already get the same information from the application or even the filesystem as is the case with BeeGFS and other filesystems.

The performance monitor in SANtricity Web UI isn’t as flexible and customizable (layout, averaging, retention, indicators, etc.), but it runs in the controllers. These are the usual pros and cons for all storage systems.

E-Series Performance Analyzer notes

What follows are a few additional notes on E-Series Performance Analyzer.

Those new to E-Series:

  • These days SANtricity Web UI/API runs on E-Series controllers
  • API can be accessed directly (SANtricity on array controller(s)) or - this is more secure and very common - through the SANtricity Web Services Proxy (WSP)
  • E-Series Performance Analyzer is essentially a sophisticated Docker Compose file which has all services required, including WSP, and it works out of the box

The README file is good and accurate, although too long for my liking. Still, I won’t rehash it here, since it is correct and complete.

In the root directory you’ll find .env and .auth.env. The latter has an easy-to-guess WSP password. You can change it before you build the containers, but if you do, you also need to change the password in plugins/eseries_monitoring/collector/config.json (this is documented in README.md, so there is no need to memorize it, but do read the file).

Core E-Series-related code is in plugins/eseries_monitoring:

  • Python collector

You could use docker-compose.yaml from that directory, or even just the Python collector container, and run it against an existing WSP (which may be managed by someone else), as long as you have the credentials. You could also use it with another Grafana and InfluxDB v1.

The collector script sends data to the older InfluxDB v1. As I described in the BeeGFS performance monitoring post (see that BeeGFS link above), BeeGFS monitor in 7.3.0 also sends data to InfluxDB v1, so maybe you want to use that DB or (see that post) you can massage your pipeline and send data elsewhere.

Interestingly, it seems there is a ready-made version of collector script that can send data to Graphite or any Graphite sink. I haven’t tested it so I don’t know if it works but even if it doesn’t, it shouldn’t be too hard to make it work.

Personally I would prefer to run just the Collector container, with the rest (metrics database, Grafana, WSP) being shared services used by the rest of the infrastructure. This is also how I modified SolidFire Collector to work - there’s no need to build application islands.

Related to authentication, E-Series has a read-only monitor role, but it seems it cannot change its own password (i.e. only an administrator account can change a monitor account’s password). As of now that’s a bit tricky: if SANtricity and/or WSP passwords change outside of your control, you will have to update EPA’s configuration (Collector, maybe WSP as well) and use make run to have Docker Compose (and possibly WSP, if you run it) pick up the up-to-date authentication settings.

One way to automate password rotation for WSP (if you run it) is to store the password in Kubernetes secrets (or a vault) and restart containers when it changes. The same approach can work for Collector.

The entire process can be automated through the SANtricity API and if you choose this approach it is probably advisable to change SANtricity’s administrator password in the same run, so that you can use that new password to immediately change the monitor account’s password. The reason is you need an admin password to change the monitor account password in any case, so you may as well change all passwords that need to be rotated and do it in one script:

  • Change SANtricity admin password
  • Change SANtricity monitor password
  • Change WSP password in Kubernetes secrets
  • Change/update Kubernetes secrets for Collector
  • Restart EPA and/or WSP (or set up a watcher to restart these automatically)
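The ordering of the steps above can be sketched as a small runner that executes them in sequence and stops at the first failure, so that later steps never run against credentials that were not actually changed. Everything here is hypothetical: the step names are made up, and the placeholder actions only record that they ran. In practice each action would wrap a SANtricity API call, a Kubernetes secret update, or a container restart, none of which are shown.

```python
def run_rotation(steps):
    """Run (name, action) pairs in order and stop at the first failure."""
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"rotation stopped at {name!r}; completed so far: {completed}"
            ) from exc
        completed.append(name)
    return completed


# Placeholder actions: each one just appends a marker to a log.
# Replace them with real calls (assumed, not shown here).
log = []
steps = [
    ("santricity-admin-password", lambda: log.append("admin")),
    ("santricity-monitor-password", lambda: log.append("monitor")),
    ("wsp-secret", lambda: log.append("wsp")),
    ("collector-secret", lambda: log.append("collector")),
    ("restart-epa-wsp", lambda: log.append("restart")),
]

completed = run_rotation(steps)
print(completed)
```

Stopping early is a deliberate choice: a half-finished rotation is easier to repair when the error says exactly which step failed and which ones already succeeded.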

Another interesting detail for non-root users may be that docker-compose up can’t be used without sudo because the InfluxDB data directory is root-owned, so use make run rather than docker-compose. As I mentioned above, I think this is over-engineered (so to speak) - I’d prefer no Makefile and just a simple docker-compose YAML with 2 containers (WSP and Collector). To be fair, I wouldn’t blame the developers for this approach because not everyone is a DevOps fanatic - in fact many users don’t want to do much more than edit config files, type make run and hit ENTER.

Summary

E-Series Performance Analyzer is a bit complex, but it does work as advertised.

If you have the skills to do it, I suggest taking just the Collector script and pointing it at your existing Web Services Proxy, or even letting it talk directly to the array without using a proxy.

If you prefer Graphite metrics over using InfluxDB v1 or converting on the fly, try the Graphite version.

You will notice that both Grafana and the WSP Web interface are exposed on the Docker host’s external network. You can restrict this (see README.md).

