Merging Empty Chunks in MongoDB
source link: https://www.percona.com/blog/merging-empty-chunks-in-mongodb/
I recently wrote about one of the problems we can encounter while working with sharded clusters, which is Finding Undetected Jumbo Chunks in MongoDB. Another issue we might run into is managing empty chunks.
As we know, there is an autoSplitter process that splits chunks when they become too big, and a balancer process that moves chunks to keep them evenly distributed across all shards. So as data grows, chunks are split and perhaps moved to other shards, and all is well.
But what happens when we delete data? It can be the case that some chunks are now empty. If we delete a lot of data, perhaps a significant number of the chunks will be empty. This can be a significant issue for sharded collections with a TTL index.
One of the potential problems when dealing with a high percentage of empty chunks is uneven data distribution. The balancer will make sure the number of chunks on each shard is roughly the same, but it does not take into account whether the chunks are empty or not. So you might end up with a cluster that looks balanced, but in reality, a few shards have way more data than the rest.
To deal with this problem, the first step is to identify empty chunks.
Identifying Empty Chunks
To illustrate this, let’s consider a client’s collection that is sharded by the “org_id” field. Let’s assume the collection currently has the following chunk ranges:
minKey → 1
1 → 5
5 → 10
10 → 15
15 → 20
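We can use the dataSize command to determine the size of a chunk. The command takes the chunk range among its arguments. For example, to check the size of the third chunk, we would run dataSize with keyPattern { org_id: 1 }, min { org_id: 5 }, and max { org_id: 10 }.
This returns a document that includes the size in bytes (size), the number of documents (numObjects), and the time taken in milliseconds (millis).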
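As a minimal sketch of that call, the command document for the third chunk could be built as below. The namespace `mydb.clients` and the helper name `buildDataSizeCommand` are assumptions for illustration; only the shard key `org_id` and the range 5 → 10 come from the example above.

```javascript
// Build a dataSize command document for one chunk range.
// `ns` is "<database>.<collection>"; "mydb.clients" below is a hypothetical name.
function buildDataSizeCommand(ns, keyPattern, min, max) {
  return {
    dataSize: ns,           // namespace of the sharded collection
    keyPattern: keyPattern, // the shard key pattern
    min: min,               // inclusive lower bound of the chunk
    max: max                // exclusive upper bound of the chunk
  };
}

const dataSizeCmd = buildDataSizeCommand(
  "mydb.clients",
  { org_id: 1 },
  { org_id: 5 },
  { org_id: 10 }
);

// In the shell, connected to the collection's database, this would be run as:
//   db.runCommand(dataSizeCmd)
```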
If the size is 0 we know we have an empty chunk, and we can consider merging it with either the chunk that comes right after it (with the range 10 → 15) or the one just before it (with the range 1 → 5).
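The emptiness check itself is small. The sketch below assumes the dataSize result is a plain document with numeric `ok` and `size` fields; the helper name `isEmptyChunk` and the sample values are illustrative only.

```javascript
// Return true when a dataSize result describes an empty chunk.
// Assumes `result.ok` and `result.size` are plain numbers.
function isEmptyChunk(result) {
  return result.ok === 1 && result.size === 0;
}

// Illustrative result documents (values are made up for the example).
const emptyResult = { size: 0, numObjects: 0, millis: 0, ok: 1 };
const nonEmptyResult = { size: 4096, numObjects: 12, millis: 1, ok: 1 };

const emptyCheck = isEmptyChunk(emptyResult);       // true
const nonEmptyCheck = isEmptyChunk(nonEmptyResult); // false
```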
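Assuming we take the first option, we run the mergeChunks command against the admin database with bounds [ { org_id: 5 }, { org_id: 15 } ], that is, the lower bound of the first chunk and the upper bound of the second.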
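A sketch of that merge, again assuming the hypothetical namespace `mydb.clients` (the helper name `buildMergeChunksCommand` is also just for illustration):

```javascript
// Build a mergeChunks command document for two or more contiguous chunks.
// `lower` is the min of the first chunk, `upper` is the max of the last chunk.
function buildMergeChunksCommand(ns, lower, upper) {
  return {
    mergeChunks: ns,
    bounds: [lower, upper]
  };
}

// Merge the empty chunk 5 → 10 with its right-hand neighbour 10 → 15.
const mergeCmd = buildMergeChunksCommand(
  "mydb.clients", // hypothetical namespace
  { org_id: 5 },
  { org_id: 15 }
);

// In the shell, connected to a mongos, this would be run as:
//   db.adminCommand(mergeCmd)
```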
The new chunk ranges now would be as follows:
minKey → 1
1 → 5
5 → 15
15 → 20
One caveat is that the chunks we want to merge might not be on the same shard. If that is the case we need to move them together first, using the moveChunk command.
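For instance, if the empty chunk 5 → 10 lived on a different shard than its neighbour, it could be moved first. The sketch below assumes a hypothetical namespace `mydb.clients` and a hypothetical destination shard named `shard0001`; moving the empty chunk rather than the populated one keeps the migration cheap.

```javascript
// Build a moveChunk command document that identifies the chunk by its bounds.
function buildMoveChunkCommand(ns, min, max, toShard) {
  return {
    moveChunk: ns,
    bounds: [min, max], // exact bounds of the chunk to move
    to: toShard         // name of the destination shard
  };
}

const moveCmd = buildMoveChunkCommand(
  "mydb.clients", // hypothetical namespace
  { org_id: 5 },
  { org_id: 10 },
  "shard0001"     // hypothetical shard name
);

// In the shell, connected to a mongos, this would be run as:
//   db.adminCommand(moveCmd)
```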
Putting it All Together
Following the above logic, we can iterate through all the chunks in shard key order and check their size. If we find an empty chunk, we merge it with the chunk just before it. If the chunks are not on the same shard, we move them together. The following script can be used to print all the commands required:
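The planning logic can be sketched as a pure function. This is an illustration of the approach described above, not necessarily the article's script. It assumes the chunk metadata and sizes have already been collected into an array sorted by shard key, with the hypothetical fields `min`, `max`, `shard`, and `size`.

```javascript
// Plan the commands for one pass over the chunks.
// Each empty chunk is merged with the chunk just before it; if the two
// chunks live on different shards, the empty one is moved first.
// A chunk takes part in at most one merge per pass, and an empty first
// chunk (which has no predecessor) is left untouched by this sketch.
function planEmptyChunkMerges(ns, chunks) {
  const commands = [];
  let i = 1;
  while (i < chunks.length) {
    const prev = chunks[i - 1];
    const cur = chunks[i];
    if (cur.size !== 0) {
      i += 1; // current chunk has data, look at the next pair
      continue;
    }
    if (cur.shard !== prev.shard) {
      commands.push({ moveChunk: ns, bounds: [cur.min, cur.max], to: prev.shard });
    }
    commands.push({ mergeChunks: ns, bounds: [prev.min, cur.max] });
    i += 2; // skip ahead so neither merged chunk is reused in this pass
  }
  return commands;
}

// Made-up sample data: two adjacent empty chunks on a second shard.
const sampleChunks = [
  { min: { org_id: 1 },  max: { org_id: 5 },  shard: "shardA", size: 50 },
  { min: { org_id: 5 },  max: { org_id: 10 }, shard: "shardB", size: 0 },
  { min: { org_id: 10 }, max: { org_id: 15 }, shard: "shardB", size: 0 },
  { min: { org_id: 15 }, max: { org_id: 20 }, shard: "shardA", size: 30 }
];

const plan = planEmptyChunkMerges("mydb.clients", sampleChunks);
```

With this sample, the plan moves the chunk 5 → 10 to shardA and merges it into 1 → 10. The empty chunk 10 → 15 is left for a later pass, which is why the procedure may need to be repeated.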
We can invoke it from the Mongo shell as follows:
The script will generate all the commands needed to merge pairs of chunks where at least one is empty. After running the generated commands, this should cut the number of empty chunks in half. Running the script multiple times will eventually get rid of all the empty chunks.
Most people are aware of the problems with jumbo chunks; now we have seen how empty chunks can also be problematic in certain scenarios.
It is a good idea to stop the balancer before attempting any operation that modifies chunks (like merging the empty chunks). This ensures that no conflicting operations happen at the same time. Don’t forget to re-enable the balancer afterward.
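One way to avoid forgetting that last step is to wrap the chunk operations in a small helper. The sketch below takes the shell's `sh` helper object as a parameter and relies on its `stopBalancer()` and `startBalancer()` methods; the wrapper name `withBalancerStopped` is an assumption for illustration.

```javascript
// Run `work` with the balancer stopped, and start it again afterward
// even if `work` throws.
function withBalancerStopped(sh, work) {
  sh.stopBalancer();
  try {
    return work();
  } finally {
    sh.startBalancer();
  }
}

// In the shell, usage might look like:
//   withBalancerStopped(sh, function () {
//     // run the generated moveChunk / mergeChunks commands here
//   });
```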