
MongoDB 101: How to Tune Your MongoDB Configuration After Upgrading to More Memory

source link: https://www.percona.com/blog/2021/01/08/mongodb-101-how-to-tune-your-mongodb-configuration-after-upgrading-to-more-memory/

In this post, we will discuss what to do when you add more memory to your MongoDB deployment, a common practice when you are scaling resources.

Why Might You Need to Add More Memory?

Scaling is how you add more resources to your environment. There are two main ways to accomplish this: vertical scaling and horizontal scaling.

  • Vertical scaling is increasing the hardware capacity of a given instance, thus having a more powerful server.
  • Horizontal scaling is adding more servers to your architecture. A standard approach for horizontal scaling, especially for databases, is load balancing and sharding.

As your application grows, working sets grow with it, and we start to see bottlenecks as data that doesn’t fit into memory has to be retrieved from disk. Reading from disk is a costly operation, even with modern NVMe drives, so we will need one of the scaling solutions mentioned above.

In this case, we will discuss adding more RAM, which is usually the fastest and easiest way to scale hardware vertically, and how having more memory can be a major help for MongoDB performance.

How to Calculate Memory Utilization in MongoDB

Before we add memory to our MongoDB deployment, we need to understand our current memory utilization. This is best done by querying serverStatus and requesting data on the WiredTiger cache.

Since MongoDB 3.2, WiredTiger has been the default storage engine. By default, MongoDB sizes the WiredTiger cache at 50% of (available memory minus 1 GB), or 256 MB, whichever is greater.

For example, a system with 16 GB of RAM would have a WiredTiger cache size of 7.5 GB:

Shell
0.5 * (16 - 1) = 7.5

The size of this cache is important to ensure WiredTiger is performant. It’s worth taking a look to see if you should alter it from the default. A good rule is that the size of the cache should be large enough to hold the entire application working set.
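As a quick sketch of that sizing rule, here is a hypothetical helper (our own illustration, not part of MongoDB) that reproduces the default calculation:

```javascript
// Default WiredTiger cache size: the larger of 50% of (RAM - 1 GB) or 256 MB.
// Hypothetical helper for illustration; MongoDB applies this rule internally.
function defaultWiredTigerCacheGB(systemRamGB) {
  const half = 0.5 * (systemRamGB - 1);
  return Math.max(half, 0.25); // 256 MB = 0.25 GB floor
}

console.log(defaultWiredTigerCacheGB(16)); // 7.5, matching the example above
console.log(defaultWiredTigerCacheGB(1));  // 0.25: the 256 MB floor applies
```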

How do we know whether to alter it? Let’s look at the cache usage statistics:

Shell
db.serverStatus().wiredTiger.cache
"application threads page read from disk to cache count" : 9,
"application threads page read from disk to cache time (usecs)" : 17555,
"application threads page write from cache to disk count" : 1820,
"application threads page write from cache to disk time (usecs)" : 1052322,
"bytes allocated for updates" : 20043,
"bytes belonging to page images in the cache" : 46742,
"bytes belonging to the history store table in the cache" : 173,
"bytes currently in the cache" : 73044,
"bytes dirty in the cache cumulative" : 38638327,
"bytes not belonging to page images in the cache" : 26302,
"bytes read into cache" : 43280,
"bytes written from cache" : 20517382,
"cache overflow score" : 0,
"checkpoint blocked page eviction" : 0,
"eviction calls to get a page" : 5973,
"eviction calls to get a page found queue empty" : 4973,
"eviction calls to get a page found queue empty after locking" : 20,
"eviction currently operating in aggressive mode" : 0,
"eviction empty score" : 0,
"eviction passes of a file" : 0,
"eviction server candidate queue empty when topping up" : 0,
"eviction server candidate queue not empty when topping up" : 0,
"eviction server evicting pages" : 0,
"eviction server slept, because we did not make progress with eviction" : 735,
"eviction server unable to reach eviction goal" : 0,
"eviction server waiting for a leaf page" : 2,
"eviction state" : 64,
"eviction walk target pages histogram - 0-9" : 0,
"eviction walk target pages histogram - 10-31" : 0,
"eviction walk target pages histogram - 128 and higher" : 0,
"eviction walk target pages histogram - 32-63" : 0,
"eviction walk target pages histogram - 64-128" : 0,
"eviction walk target strategy both clean and dirty pages" : 0,
"eviction walk target strategy only clean pages" : 0,
"eviction walk target strategy only dirty pages" : 0,
"eviction walks abandoned" : 0,
"eviction walks gave up because they restarted their walk twice" : 0,
"eviction walks gave up because they saw too many pages and found no candidates" : 0,
"eviction walks gave up because they saw too many pages and found too few candidates" : 0,
"eviction walks reached end of tree" : 0,
"eviction walks started from root of tree" : 0,
"eviction walks started from saved location in tree" : 0,
"eviction worker thread active" : 4,
"eviction worker thread created" : 0,
"eviction worker thread evicting pages" : 902,
"eviction worker thread removed" : 0,
"eviction worker thread stable number" : 0,
"files with active eviction walks" : 0,
"files with new eviction walks started" : 0,
"force re-tuning of eviction workers once in a while" : 0,
"forced eviction - history store pages failed to evict while session has history store cursor open" : 0,
"forced eviction - history store pages selected while session has history store cursor open" : 0,
"forced eviction - history store pages successfully evicted while session has history store cursor open" : 0,
"forced eviction - pages evicted that were clean count" : 0,
"forced eviction - pages evicted that were clean time (usecs)" : 0,
"forced eviction - pages evicted that were dirty count" : 0,
"forced eviction - pages evicted that were dirty time (usecs)" : 0,
"forced eviction - pages selected because of too many deleted items count" : 0,
"forced eviction - pages selected count" : 0,
"forced eviction - pages selected unable to be evicted count" : 0,
"forced eviction - pages selected unable to be evicted time" : 0,
"forced eviction - session returned rollback error while force evicting due to being oldest" : 0,
"hazard pointer blocked page eviction" : 0,
"hazard pointer check calls" : 902,
"hazard pointer check entries walked" : 25,
"hazard pointer maximum array length" : 1,
"history store key truncation calls that returned restart" : 0,
"history store key truncation due to mixed timestamps" : 0,
"history store key truncation due to the key being removed from the data page" : 0,
"history store score" : 0,
"history store table insert calls" : 0,
"history store table insert calls that returned restart" : 0,
"history store table max on-disk size" : 0,
"history store table on-disk size" : 0,
"history store table out-of-order resolved updates that lose their durable timestamp" : 0,
"history store table out-of-order updates that were fixed up by moving existing records" : 0,
"history store table out-of-order updates that were fixed up during insertion" : 0,
"history store table reads" : 0,
"history store table reads missed" : 0,
"history store table reads requiring squashed modifies" : 0,
"history store table remove calls due to key truncation" : 0,
"history store table writes requiring squashed modifies" : 0,
"in-memory page passed criteria to be split" : 0,
"in-memory page splits" : 0,
"internal pages evicted" : 0,
"internal pages queued for eviction" : 0,
"internal pages seen by eviction walk" : 0,
"internal pages seen by eviction walk that are already queued" : 0,
"internal pages split during eviction" : 0,
"leaf pages split during eviction" : 0,
"maximum bytes configured" : 8053063680,
"maximum page size at eviction" : 376,
"modified pages evicted" : 902,
"modified pages evicted by application threads" : 0,
"operations timed out waiting for space in cache" : 0,
"overflow pages read into cache" : 0,
"page split during eviction deepened the tree" : 0,
"page written requiring history store records" : 0,
"pages currently held in the cache" : 24,
"pages evicted by application threads" : 0,
"pages queued for eviction" : 0,
"pages queued for eviction post lru sorting" : 0,
"pages queued for urgent eviction" : 902,
"pages queued for urgent eviction during walk" : 0,
"pages read into cache" : 20,
"pages read into cache after truncate" : 902,
"pages read into cache after truncate in prepare state" : 0,
"pages requested from the cache" : 33134,
"pages seen by eviction walk" : 0,
"pages seen by eviction walk that are already queued" : 0,
"pages selected for eviction unable to be evicted" : 0,
"pages selected for eviction unable to be evicted as the parent page has overflow items" : 0,
"pages selected for eviction unable to be evicted because of active children on an internal page" : 0,
"pages selected for eviction unable to be evicted because of failure in reconciliation" : 0,
"pages walked for eviction" : 0,
"pages written from cache" : 1822,
"pages written requiring in-memory restoration" : 0,
"percentage overhead" : 8,
"tracked bytes belonging to internal pages in the cache" : 5136,
"tracked bytes belonging to leaf pages in the cache" : 67908,
"tracked dirty bytes in the cache" : 493,
"tracked dirty pages in the cache" : 1,
"unmodified pages evicted" : 0

There’s a lot of data here about WiredTiger’s cache, but we can focus on the following fields:

  • wiredTiger.cache.maximum bytes configured: This is the current maximum cache size.
  • wiredTiger.cache.bytes currently in the cache: This is the size of the data currently in the cache. It is typically around 80% of your cache size plus the amount of “dirty” cache that has not yet been written to disk, and it should not exceed the maximum bytes configured. A value at or above the maximum bytes configured is a strong indicator that you should already have scaled.
  • wiredTiger.cache.tracked dirty bytes in the cache: This is the size of the dirty data in the cache. It should be less than five percent of your cache size and can be another indicator that you need to scale. Once it exceeds five percent of your cache size, WiredTiger gets more aggressive about removing data from the cache and, in some cases, may force your application threads to evict data from the cache before they can successfully write to it.
  • wiredTiger.cache.pages read into cache: This is the number of pages read into the cache. Its per-second average tells you how much data is coming into your cache.
  • wiredTiger.cache.pages written from cache: This is the number of pages written from the cache to disk. This activity is especially heavy just before checkpoints. If this value keeps increasing, your checkpoints will take longer and longer.
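Pulling the first three thresholds together, here is a hypothetical helper (the function name is our own, not a MongoDB API; the field names match the serverStatus output above) that evaluates the rules of thumb:

```javascript
// Rules of thumb from above: cache fill should stay below the configured
// maximum, and dirty bytes should stay under 5% of the configured maximum.
// Hypothetical helper; values come from db.serverStatus().wiredTiger.cache.
function cachePressure(stats) {
  const max = stats["maximum bytes configured"];
  const fillRatio = stats["bytes currently in the cache"] / max;
  const dirtyRatio = stats["tracked dirty bytes in the cache"] / max;
  return {
    fillRatio,
    dirtyRatio,
    underPressure: fillRatio >= 1.0 || dirtyRatio > 0.05,
  };
}

// Using the numbers from the sample output above:
const sample = {
  "maximum bytes configured": 8053063680,
  "bytes currently in the cache": 73044,
  "tracked dirty bytes in the cache": 493,
};
console.log(cachePressure(sample).underPressure); // false: this cache is nearly idle
```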

Looking at the above values, we can determine whether we need to increase the size of the WiredTiger cache for our instance. We might also look at WiredTiger concurrency read and write ticket usage. It’s fine for some tickets to be in use, but if the number keeps growing toward the number of CPU cores, you are approaching CPU saturation. You can check ticket usage in Percona Monitoring and Management (PMM) or run the following query:

Shell
db.serverStatus().wiredTiger.concurrentTransactions
{
    "write" : {
        "out" : 0,
        "available" : 128,
        "totalTickets" : 128
    },
    "read" : {
        "out" : 1,
        "available" : 127,
        "totalTickets" : 128
    }
}

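As a sketch of that saturation check (a hypothetical helper; the input shape mirrors the concurrentTransactions output above, and the core count is an assumption):

```javascript
// Rule of thumb from above: tickets in use ("out") growing toward the number
// of CPU cores signals CPU saturation. Hypothetical helper for illustration.
function ticketCheck(tickets, cpuCores) {
  return {
    inUse: tickets.out,
    cpuSaturationRatio: tickets.out / cpuCores,
  };
}

// Sample read tickets from the output above, on an assumed 8-core host:
const readTickets = { out: 1, available: 127, totalTickets: 128 };
console.log(ticketCheck(readTickets, 8).cpuSaturationRatio); // 0.125: far from saturation
```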
The wiredTiger.cache.pages read into cache value may also be indicative of an issue for read-heavy applications. If this value is consistently a large part of your cache size, increasing your memory may improve overall read performance.

Example

For our example starting point, suppose the cache usage statistics show that the cache is small and there is definite memory pressure on it.

We are also using the default WiredTiger cache size, and we know we have 16 GB of memory on the system, so the cache is 0.5 * (16 - 1) = 7.5 GB. Based on our knowledge of our (imaginary) application, we know the working set is 16 GB, so we want the cache to be larger than that. To give us room for additional growth, since our working set will only continue to grow, we could increase our server’s RAM from 16 GB to 48 GB. If we stick with the default settings, this would increase our WiredTiger cache to 0.5 * (48 - 1) = 23.5 GB, leaving 24.5 GB of RAM for the OS and its filesystem cache. If we wanted to change the size given to the WiredTiger cache, we would set storage.wiredTiger.engineConfig.cacheSizeGB to the value we wanted. For example, say we want to allocate 30 GB to the WiredTiger cache to really avoid any reads from disk in the near term, leaving 18 GB for the OS and its filesystem cache. We would add the following to our mongod.conf file:

Shell
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 30

For either the default sizing or an explicit cacheSizeGB setting to recognize the added memory and take effect, we will need to restart the mongod process.
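After the restart, one way to confirm the new size took effect is to re-check "maximum bytes configured" and convert it to GB. A minimal sketch of the conversion, using the byte value from the sample output earlier:

```javascript
// Convert WiredTiger's "maximum bytes configured" to GB for a quick sanity
// check after a restart. In mongosh, you would read the raw value with:
//   db.serverStatus().wiredTiger.cache["maximum bytes configured"]
function bytesToGB(bytes) {
  return bytes / 1024 ** 3;
}

console.log(bytesToGB(8053063680)); // 7.5: the default cache on a 16 GB host
```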

Also note that unlike many other database systems, where the database cache is typically sized at 80-90% of system memory, MongoDB’s sweet spot is in the 50-70% range. This is because MongoDB uses the WiredTiger cache only for uncompressed pages, while the operating system caches the compressed pages of the database files. By leaving free memory to the operating system, we increase the likelihood of serving a page from the OS cache instead of needing a disk read.

Summary

In this article, we’ve gone over how to update your MongoDB configuration after you’ve upgraded to more memory. We hope this helps you tune your MongoDB configuration so that you can get the most out of your increased RAM. Thanks for reading!

Additional Resources:

MongoDB Best Practices 2020 Edition

Tuning MongoDB for Bulk Loads


Author

Mike Grayson

Mike is a database engineer who focuses on MongoDB for the Percona Managed Services Team, helping keep Managed Services customers’ MongoDB databases available and performant. He is AWS and Azure certified.

