source link: https://scaleoutsean.github.io/2022/09/04/brute-force-sizing-netapp-eseries.html

Brute force sizing NetApp E-Series

04 Sep 2022 - 12 minute read

Introduction

Some time ago I wrote a demo of a “Brute Force Sizer for SolidFire” which I really liked because it saved me a lot of time compared to the official company tool.

This week I decided to do something similar for E-Series when configured for BeeGFS.

It took me hours, but sizing with E-Series tools also takes hours and it happens every week.

For now I’m still deciding whether to share this right away or later, if I make it into a Web app… But this is simple to replicate on your own: you can find supported RAID configurations, disk sizes and enclosure counts in E-Series data sheets - there’s no secret sauce of any kind.

Considerations

NetApp E-Series has more configuration options than SolidFire, so a brute force sizer must eliminate some combinations. On the other hand, BeeGFS shouldn’t be attached to E-Series in just any configuration, which helps.

Here’s what I focused on:

  • Proper RAID 6 (8D2P) for Data, with several other options for the EF600, where up to 24 disks can be used in the controller shelf
  • Some DDP configurations that result in 5, 4, 2 and 1 DDP per 60-drive enclosure
  • Small RAID 1 / RAID10 groups on EF600 useful for Metadata

Five DDPs (each with 12 disks and 2 disks’ worth of spare capacity) per enclosure is weird, but DDP starts at 11 disks, which is even weirder.

For NL-SAS disk sizes, I only used 10TB and larger. I didn’t bother with smaller disks because they use a lot of power and rack units, which is acceptable when you need under 200-300 TB of capacity, but with BeeGFS almost no one does.

Create configuration options

With these configuration “principles” in place, I wrote a bunch of if-then loops to generate various permutations.

ComboID,Model,Enclosure,Protection,DiskCount,DiskSize,RawTB,UsableTB,UsableTiB,BeeGFSTiB
0,E5760,1,R6,10,10,100,80,72.8,71.3
1,E5760,1,R6,10,12,120,96,87.3,85.6
2,E5760,1,R6,10,18,180,144,131,128.3
3,E5760,1,R6,20,10,200,160,145.5,142.6
4,E5760,1,R6,20,12,240,192,174.6,171.1
5,E5760,1,R6,20,18,360,288,261.9,256.7
6,E5760,1,R6,30,10,300,240,218.3,213.9
7,E5760,1,R6,30,12,360,288,261.9,256.7
...
483,EF600H,7,DDP,60,10,4200,4060,3692.5,3618.7
484,EF600H,7,DDP,60,12,5040,4872,4431.1,4342.4
485,EF600H,7,DDP,60,18,7560,7308,6646.6,6513.7

This resulted in 486 configuration combinations and a 21kB CSV file.
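For illustration, here is a minimal sketch of that kind of generator, covering only the RAID 6 (8D2P) case; the disk-size list, enclosure limit and the ~2% BeeGFS formatting overhead are my assumptions, not the exact logic of the original script:

```python
import csv

# Assumptions (not the original script's exact lists): NL-SAS sizes in TB,
# up to 7 enclosures of 60 drives, RAID 6 groups of 8 data + 2 parity drives.
DISK_SIZES_TB = [10, 12, 18]
MAX_ENCLOSURES = 7
R6_GROUP = 10           # 8D2P
BEEGFS_OVERHEAD = 0.02  # assumed ~2% lost to BeeGFS formatting

def tb_to_tib(tb):
    return tb * 10**12 / 2**40

rows, combo_id = [], 0
for model in ("E5760", "EF600H"):
    for enclosures in range(1, MAX_ENCLOSURES + 1):
        for groups in range(1, (enclosures * 60) // R6_GROUP + 1):
            disks = groups * R6_GROUP
            for size in DISK_SIZES_TB:
                raw_tb = disks * size
                usable_tb = groups * 8 * size            # 8 data drives per group
                usable_tib = tb_to_tib(usable_tb)
                beegfs_tib = usable_tib * (1 - BEEGFS_OVERHEAD)
                rows.append([combo_id, model, enclosures, "R6", disks, size,
                             raw_tb, usable_tb,
                             round(usable_tib, 1), round(beegfs_tib, 1)])
                combo_id += 1

with open("eseries-combos.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["ComboID", "Model", "Enclosure", "Protection", "DiskCount",
                "DiskSize", "RawTB", "UsableTB", "UsableTiB", "BeeGFSTiB"])
    w.writerows(rows)
```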

Some important points:

  • ComboID is a unique ID for a particular combination, created by my config generator script. It helps when the other person isn’t using Jupyter and just needs some inputs to create a price quote: I can paste that row or give them the CSV file and say “please create a quote for Combo IDs 5 and 6”
  • BeeGFSTiB is the total capacity, in TiB, of BeeGFS-formatted E-Series data disks
  • Enclosure (count) helps me keep an eye on rack space

Choice of query tool

My ideal query tool would be a single-page app with JavaScript tables that can sort, filter and enable/disable certain options.

In practice that turned out to be more complicated than necessary and I also realized I’d need friendly ways to export table data as text, and more.

I did something like that for SolidFire Capacity and Efficiency Report Generator, but remembering the hassle, I went back to Jupyter.

Show available options

This scatter chart isn’t actionable and doesn’t have a legend, but I wanted to see what all the capacity options look like:

Scatter chart of available options

Key points:

  • Blue and green are E5760 and EF600 Hybrid arrays, respectively. These can go to several PiB. I knew that before I saw the chart, but it’s nice to see that configuration options for larger capacities become scarce once you get into the 3-4 PiB range. This also means that when querying for sub-3 PiB ranges we should use a narrow range, while for larger capacities we can use a wider range, in order to get some, but not too many, solutions that satisfy the criteria
  • Red dots are EF600-based combinations (single all-flash NVMe controller shelf) - usually used for metadata, so still relevant. Personally I usually size for BeeGFS Data capacity first, and for Metadata after that (and possibly using a dedicated EF600 unit for that)

What’s the difference between EF600 and EF600 Hybrid? The latter has SAS expansion enclosures which can accommodate NL-SAS HDD and SAS SSD drives. There are other, subtler differences, as well.

Below is a bar chart of BeeGFSTiB for different Combo IDs. It could be sorted by BeeGFSTiB ascending, with ComboIDs displayed on the side, but that would take a vertically large image.

Combinations for BeeGFSTiB

Query capacity range

Currently I have just one query type, which is for a capacity range of BeeGFS data (in TiB). As an example, I entered 2400 and 2600 (TiB).

Results of that query can be plotted as a bar chart, with the length of the blue bars representing usable BeeGFS Data capacity in TiB (again, ComboIDs aren’t shown - labels are hard to read for IDs that are next to each other, so I suppose I should create another data set from the results and plot that).
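The query itself is trivial with pandas; a minimal sketch (the column names come from the CSV above, the variable names and the per-bar ComboID labels are mine):

```python
import pandas as pd
import matplotlib.pyplot as plt

min_tib, max_tib = 2400, 2600   # BeeGFS Data capacity range in TiB

df = pd.read_csv("eseries-combos.csv")
hits = df[(df["BeeGFSTiB"] >= min_tib) & (df["BeeGFSTiB"] <= max_tib)]

# One horizontal bar per matching combination, length = BeeGFS Data TiB
hits.plot.barh(x="ComboID", y="BeeGFSTiB", legend=False)
plt.xlabel("BeeGFS Data capacity (TiB)")
plt.tight_layout()
plt.show()
```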

Brute force sizing for capacity range of Data

That aside, this chart gives us some useful hints.

  • At the very bottom, at least two combinations barely make it. We definitely need to check those out
  • Many combinations overshoot minimum usable BeeGFS Data capacity by a lot, so I could have used a smaller range (e.g. 2400-2500) to get fewer choices

It’s time to inspect these configurations.

Manual selection of promising candidates

After drawing that chart I also print out the result in text format, to be able to reason about it.

Resulting combinations are sorted by Enclosure count and Data capacity, because these are usually the most important properties when evaluating a combination.
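A sketch of that sorting and text output, reusing the `hits` DataFrame from the earlier query and assuming the tabulate package:

```python
from tabulate import tabulate

cols = ["ComboID", "Model", "Enclosure", "Protection",
        "DiskCount", "DiskSize", "BeeGFSTiB"]
# Sort by enclosure count first, then by usable BeeGFS Data capacity
shortlist = hits.sort_values(["Enclosure", "BeeGFSTiB"])[cols]
print(tabulate(shortlist, headers="keys", showindex=False))
```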

   ComboID  Model      Enclosure  Protection      DiskCount    DiskSize    BeeGFSTiB
 ---------  -------  -----------  ------------  -----------  ----------  -----------
        86  E5760              3  DDP                    12          18       2406.5
       341  EF600H             3  DDP                    12          18       2406.5
        89  E5760              3  DDP                    15          18       2502.8
       344  EF600H             3  DDP                    15          18       2502.8
        92  E5760              3  DDP                    20          18       2599
       347  EF600H             3  DDP                    20          18       2599
       130  E5760              4  DDP                    60          12       2481.4
       385  EF600H             4  DDP                    60          12       2481.4
       113  E5760              4  R6                     50          18       2567
       368  EF600H             4  R6                     50          18       2567

Conclusion:

  • All top configurations, including the two that barely make it, are based on DDP. For sequential IO DDP is slower than RAID 6 (8D2P), but if I needed a cluster for non-sequential or mixed IO, maybe I’d go with Combo ID 86. Notice that 3 enclosures x 60 disks = 180 disks.
  • For highest performance, I’d go down the list until I find a RAID 6-based combination. Those are 113 and 368, which are essentially the same, just with different controllers. I’d need more RU (space) - 4 enclosures, plus 2U for the EF600 Hybrid, whose 2U EF600 controller shelf can only hold NVMe SSDs (this can be confusing, but I didn’t want to complicate the tool to account for it). So I’d probably go with Combo ID 113 for this one.
    • One of the reasons I didn’t want to account for the extra 2U for EF600 Hybrid’s controller shelf in these combinations is what I said at the top - usually I use that EF600 controller shelf for Metadata so I need those 2Us for SSDs either here, or in a separate EF600 or E2824 or EF300 2U unit dedicated to BeeGFS Metadata.

Discuss and configure

People usually like to check top 2-3 options and discuss advantages and disadvantages, so we can first copy top candidates and share them via email.

   ComboID  Model      Enclosure  Protection      DiskCount    DiskSize    BeeGFSTiB
 ---------  -------  -----------  ------------  -----------  ----------  -----------
        86  E5760              3  DDP                    12          18       2406.5
       341  EF600H             3  DDP                    12          18       2406.5
       113  E5760              4  R6                     50          18       2567
       368  EF600H             4  R6                     50          18       2567

I know I just said “2-3” and then came up with four, but it’s really just two configurations, each with a choice of array model - whether we want the E5760 or the EF600 Hybrid.

If you have more conviction, or if the requirements are very specific, you can even choose the winning combination right away.

Where’s the metadata

That’s easy to size because Metadata is just a fraction of Data.

In fact, writing this post reminded me that I can just take the min value from Data and multiply it by 0.02 and 0.04 to get min_md and max_md, respectively; and because I have several RAID 1 “combos”, I may be able to do this right away. It took me several minutes to figure out why I couldn’t get any RAID 1-like results.

I assumed I needed at least 48 TiB for Metadata (2% of Data). UsableTiB, rather than BeeGFSTiB, is used in this Jupyter query.
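That 2-4% rule of thumb is a couple of lines of code; a sketch, reusing `df` and `min_tib` from the earlier query (the R1 filter and variable names are mine):

```python
# Metadata sized as 2-4% of the minimum Data capacity requirement
min_md = min_tib * 0.02   # 2400 TiB * 0.02 = 48 TiB
max_md = min_tib * 0.04   # 96 TiB

# Note: the query filters on UsableTiB, not BeeGFSTiB; with the original
# combinations this came back empty (see below)
md_hits = df[(df["Protection"] == "R1")
             & (df["UsableTiB"].between(min_md, max_md))]
```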

To illustrate the problem with the RAID 1 configurations I had: in all cases UsableTiB is less than my minimum Metadata capacity requirement (at first I couldn’t even see this graph):

Brute force sizing for RAID 1-based Metadata capacity

I had to temporarily allow all Protection levels (RAID 6, DDP, R1) to get configuration suggestions matching my criteria. But to really fix this, I had to generate a few more RAID 1/RAID 10 combinations, which added 10 more combinations to the CSV file. Updated result:

Brute force sizing for capacity range of Metadata with R1

  ComboID  Model    Protection      DiskCount    DiskSize    UsableTiB
---------  -------  ------------  -----------  ----------  -----------
      488  EF600    R1                     14           8         49.1
      473  EF600    R1                      8          15         55.5
      494  EF600    R1                     16           8         55.5
      500  EF600    R1                     18           8         62.8

But even with this problem, it’s easy to size: because all my Data is in NL-SAS shelves, I can either populate up to 24 slots of the EF600 Hybrid controller shelf I use for Data, or get a new EF300 or EF600 for Metadata. So the answer is simple:

  • just double the number of drives from Combo ID 253 and use two groups with 4 x 15.3TB disks each (2 * 4 * 15 * 0.5) to get 55.2 TiB for Metadata, or
  • use two RAID 10 groups with 8 x 7.68TB each for the same amount of Metadata capacity (2 * 8 * 7.68 * 0.5)
  • if we need more performance, use more, smaller disks; if not, use fewer large disks. We can’t use 3.84TB disks in RAID 10 here because there are only 24 disk slots available. We could use DDP (Combo ID 240), but there’s no need for that - RAID 1 is much more suitable for Metadata, and 16 x 7.68TB disks give us the IOPS we need to power this 2.4 PiB system

What about the performance

Similar to the disk size point I mentioned above, there are usually enough disks involved to hit the array maximum. If, while doing the above, I knew the requirement was 100 GB/s sequential read, then knowing how fast each array is I’d know I need (for example) four, so I’d pick a combo with four enclosures and make it four arrays with one (controller) enclosure each. That’s usually easier than finding a suitable number of disks and enclosures.

The other thing is that, given four alternatives, it’s easy to tell which one is faster (for NL-SAS, it’s usually the one with more disks and RAID 6), so if I need to meet a specific performance target I’d pick one or two and verify performance sizing with other available tools.
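The array-count arithmetic is trivial; a tiny example with a purely hypothetical per-array throughput figure (check the real E-Series specifications before using a number like this):

```python
import math

target_gbps = 100      # required sequential read, GB/s
per_array_gbps = 25    # hypothetical per-array maximum, for illustration only
arrays_needed = math.ceil(target_gbps / per_array_gbps)   # -> 4
```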

Why not use MS Excel

It’s not an either-or question - the CSV file with configuration combinations can be imported to Excel and it’s only a matter of querying it from within Excel. So that option is there.

The reason I chose Jupyter over Excel is that it’s easy to use from a browser and locally, without waiting for the file to open and having to click more times than necessary. With Jupyter I just need to input two values (Min BeeGFS TiB, Max BeeGFS TiB) and click Run. The same Python code can run from the CLI, which is another benefit for me or folks who can’t/won’t run Jupyter.

I also created a CLI version - it can work with CSV files on S3 (using S3 Select, the same way I did it for SolidFire), and because CLI output isn’t easily sortable, I sort the output several ways so I can focus on my sizing priority (whether it’s capacity, cost, performance…):

S3 Select version with pre-sorted pretty tables
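The S3 Select part of such a CLI can be as simple as this sketch, assuming boto3 and the same CSV header as above; the bucket name, key and the hard-coded capacity range are placeholders:

```python
import boto3

s3 = boto3.client("s3")
resp = s3.select_object_content(
    Bucket="my-sizing-bucket",     # hypothetical bucket and key
    Key="eseries-combos.csv",
    ExpressionType="SQL",
    Expression=("SELECT s.ComboID, s.Model, s.Enclosure, s.Protection, "
                "s.DiskCount, s.DiskSize, s.BeeGFSTiB FROM S3Object s "
                "WHERE CAST(s.BeeGFSTiB AS FLOAT) BETWEEN 2400 AND 2600"),
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; print the matching CSV records
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```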

MS Excel has more features and its charts could be amazing, but the speed and the flexibility of sending plain-text output via instant messaging or email made me go with Python. (If you prefer PowerShell, that would work just as well - Jupyter can use PowerShell too.)

Conclusion

This approach doesn’t give a single solution, but it helps me narrow thousands of possible configurations down to just a handful, which I can then review to pick the 2-3 most promising ones.

It takes 10 seconds to start Jupyter, five to key in the Min/Max values and hit Run, and probably around 30 to find the best candidates.

For comparison, it can take up to a minute just to open a new browser tab, authenticate with 2FA and wait for a heavy sizing application to load. And from there we still have to run the tool half a dozen times to find what takes just 30 seconds in Jupyter.

I may create an HTML version (with CSS or JS) as well. Not that I need it, but it would be easier on me if I could just put this online as a single-page Web app.

