Percona Backup for MongoDB: Restore a Single Collection From Backup

source link: https://www.percona.com/blog/percona-backup-for-mongodb-restore-a-single-collection-from-backup/

April 4, 2024

Michael Villegas

When you design a backup strategy, you need to think about the business requirements, because your backups must be shaped to meet them. Let’s briefly review the basics: you need to define the RPO and the RTO. RPO stands for “Recovery Point Objective”: the maximum amount of data, measured in time, that the business can afford to lose, which determines how close to the moment of failure you must be able to recover. RTO stands for “Recovery Time Objective”: the time within which the business expects the data to be recovered. This article focuses on one scenario that can help you meet the RTO.

The scenario

Imagine the company has a replica set running Percona Server for MongoDB (PSMDB) 6.0 with a footprint of one terabyte. The operations team has also configured Percona Backup for MongoDB (PBM) to take physical and logical backups. One terrible day, a chain of unfortunate events occurs. A developer gets a call from his manager about a critical bug found on the PRODUCTION system. He quickly goes through the code that was released yesterday and finds that the issue can be fixed easily by removing a set of documents that were inserted into a collection. As he has read-write access to the PRODUCTION database, he decides to run the delete command directly on PRODUCTION to mitigate the issue as quickly as possible. As you can imagine, rushing makes mistakes more likely. That is what happens here: 90% of the documents in the collection are removed, and now the problem is bigger than a critical bug, because the system is completely down.

The solution

Since the database is rather large, it can take a long time to restore the whole thing, and given that a single collection is the culprit, it will be faster to execute a restore for that single collection.

The first thing to do is to list the backups available:

$ pbm list
Backup snapshots:
  2024-03-22T20:42:50Z <logical> [restore_to_time: 2024-03-22T20:43:12Z]
  2024-03-22T21:45:35Z <physical> [restore_to_time: 2024-03-22T21:45:36Z]
PITR <off>:
  2024-03-22T20:43:13Z - 2024-03-22T20:52:58Z

Next, we need to find the most recent logical backup, as the restore of selected collections requires a logical backup. In this case, the backup that we need is “2024-03-22T20:42:50Z”.
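
When many backups are listed, picking the newest logical one by eye is error-prone. The following is a minimal sketch, assuming the `pbm list` line layout shown above (name, then `<logical>` or `<physical>`, then the restore time). The function name `latest_logical_backup` is illustrative and not part of PBM.

```shell
# Print the name of the newest logical backup found in `pbm list` output (read from stdin).
# Assumes each snapshot line looks like: "  <name> <logical> [restore_to_time: ...]".
# Backup names are ISO 8601 timestamps, so the lexicographically largest is the newest.
latest_logical_backup() {
  awk '$2 == "<logical>" && $1 > name { name = $1 }
       END { if (name != "") print name }'
}

# Usage (requires a configured pbm CLI):
#   backup=$(pbm list | latest_logical_backup)
```

If no logical backup exists, the function prints nothing, so the caller can test for an empty result before attempting a selective restore.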

Now, we have two options: 

  1. Restore the collection on the live database: This will overwrite the existing data in the collection. If you are sure that no additional data was added to the collection after the backup was taken, this is the fastest and simplest path.
  2. Restore the collection on a temporary instance: This will allow you to export the data and import it into the live database without overwriting the new data generated since the backup. This alternative adds more steps to the process, but it preserves the existing data.

Option one

Restore the single collection into the live database:

$ pbm restore 2024-03-22T20:42:50Z --ns "sample_training.zips"
Starting restore 2024-03-22T22:23:56.715785074Z from '2024-03-22T20:42:50Z'...Restore of the snapshot from '2024-03-22T20:42:50Z' has started

You can see what PBM is doing by running the following command:

$ pbm status -s running
Currently running:
==================
(none)

The “(none)” output means that no operation is running, so in this case the restore has already finished. You can list the restore operations with the following command:

$ pbm list --restore
Restores history:
  2024-03-22T22:23:56.715785074Z [backup: snapshot, selective] done

You can see the restore details with this command:

$ pbm describe-restore 2024-03-22T22:23:56.715785074Z
name: "2024-03-22T22:23:56.715785074Z"
opid: 65fe04fccc46cf421780bab5
backup: "2024-03-22T20:42:50Z"
type: logical
status: done
namespaces:
- sample_training.zips
last_transition_time: "2024-03-22T22:24:05Z"
replsets:
- name: rs0
  status: done
  last_transition_time: "2024-03-22T22:24:04Z"
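
For larger collections the restore will not finish instantly, so it can help to poll for completion instead of re-running the command by hand. The sketch below is one possible way to do that, assuming the `pbm describe-restore` output format shown above. The function names, the 5-second interval, and the 60-attempt limit are illustrative choices. Treating `error` as the only failing status is also an assumption.

```shell
# Extract the top-level "status:" value from `pbm describe-restore` output (stdin).
# Per-replica-set status lines are indented, so matching at column 0 skips them.
restore_status() {
  awk '/^status:/ { print $2; exit }'
}

# Poll until the named restore reports "done".
# Returns non-zero on "error" or after roughly 5 minutes without completion.
wait_for_restore() {
  name=$1
  attempts=0
  while [ "$attempts" -lt 60 ]; do
    status=$(pbm describe-restore "$name" 2>/dev/null | restore_status)
    case $status in
      done)  return 0 ;;
      error) return 1 ;;
    esac
    attempts=$((attempts + 1))
    sleep 5
  done
  return 1
}

# Usage:
#   wait_for_restore 2024-03-22T22:23:56.715785074Z && echo "restore finished"
```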

Finally, confirm that the data was restored as expected:

rs0 [direct: primary] sample_training> db.zips.find().count()
29470
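
The same check can be run non-interactively, which is handy in a runbook or script. This is a small sketch, assuming `mongosh` is installed and that the connection string passed in has read access. The function name `collection_count` is illustrative.

```shell
# Print the number of documents in <db>.<collection> using mongosh.
# $1 = connection string, $2 = database name, $3 = collection name.
collection_count() {
  uri=$1 db=$2 coll=$3
  mongosh "$uri" --quiet \
    --eval "db.getSiblingDB('$db').getCollection('$coll').countDocuments({})"
}

# Usage:
#   collection_count "$MONGODB_URI" sample_training zips
```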

Option two

Create a mongod config file for the temporary instance:

$ cat /etc/mongod_tmp.conf|grep -v "^$"|grep -v "^#"
storage:
  dbPath: /var/lib/mongodb_tmp ### Different dbPath
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod_tmp.log ### Different log file
processManagement:
  fork: true
  pidFilePath: /var/run/mongod_tmp.pid ### Different pidFilePath
net:
  port: 27018 ### Different port
  bindIp: 127.0.0.1
replication:
  replSetName: rs0
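
If you need to repeat this procedure, generating the file from a template avoids copy-and-paste mistakes. The sketch below writes a config equivalent to the one above, assuming the same paths and replica set name. The function name `write_tmp_mongod_conf` is illustrative. Note that `port` and `bindIp` are options of the `net` section in the MongoDB configuration file format.

```shell
# Write a mongod config for the temporary instance.
# $1 = destination file, $2 = port (defaults to 27018).
# Paths and the replica set name mirror the example above; adjust them as needed.
write_tmp_mongod_conf() {
  conf_path=$1
  port=${2:-27018}
  cat > "$conf_path" <<EOF
storage:
  dbPath: /var/lib/mongodb_tmp
  journal:
    enabled: true
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod_tmp.log
processManagement:
  fork: true
  pidFilePath: /var/run/mongod_tmp.pid
net:
  port: $port
  bindIp: 127.0.0.1
replication:
  replSetName: rs0
EOF
}

# Usage (run as a user allowed to write to /etc):
#   write_tmp_mongod_conf /etc/mongod_tmp.conf 27018
```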

Create the dbPath:

$ sudo mkdir /var/lib/mongodb_tmp
$ sudo chown mongod:mongod /var/lib/mongodb_tmp/

Start the temporary instance:

$ sudo -u mongod /usr/bin/mongod -f /etc/mongod_tmp.conf
about to fork child process, waiting until server is ready for connections.
forked process: 16270
child process started successfully, parent exiting
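
The article does not show this step, but a mongod started with `replSetName` on an empty dbPath normally has to be initiated before it becomes primary, and the PBM status output further down shows the temporary node as primary (`[P]`). The following is a hedged sketch of how that might be done with `mongosh`, assuming the temporary instance listens on localhost without authentication. The function names are illustrative.

```shell
# Initiate a single-node replica set on the temporary instance.
# $1 = port (defaults to 27018). Assumes no authentication is configured.
tmp_init_replset() {
  port=${1:-27018}
  mongosh --quiet --port "$port" --eval 'rs.initiate()'
}

# Succeed if the instance on the given port reports itself as writable primary.
tmp_is_primary() {
  port=${1:-27018}
  [ "$(mongosh --quiet --port "$port" --eval 'db.hello().isWritablePrimary')" = "true" ]
}

# Usage:
#   tmp_init_replset 27018
#   until tmp_is_primary 27018; do sleep 1; done
```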

Configure PBM (both pbm-agent and the pbm CLI) to connect to the new instance, which in this case is running on port 27018, and make sure point-in-time recovery (PITR) is disabled on it. Then check the status:

$ pbm status
Cluster:
========
  - rs0/192.168.56.3:27018 [P]: pbm-agent v2.4.0 OK
PITR incremental backup:
========================
Status [ON]
Currently running:
==================
(none)
Backups:
========
S3 us-east-1 s3://bucket-s3/mongodb_backup/test1
  (none)
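
The sample output above reports `Status [ON]` in the PITR section, so it is worth checking this value before going further. The sketch below reads the PITR state from `pbm status` output and turns PITR off through `pbm config --set pitr.enabled=false`. The function names are illustrative, and the parsing assumes the `Status [ON]` / `Status [OFF]` line format shown above.

```shell
# Print the PITR state (for example ON or OFF) from `pbm status` output on stdin.
pitr_state() {
  sed -n 's/^Status \[\(.*\)\]$/\1/p' | head -n 1
}

# Disable PITR in the PBM configuration of the instance the pbm CLI is connected to.
disable_pitr() {
  pbm config --set pitr.enabled=false
}

# Usage (the pbm CLI must point at the temporary instance, e.g. via PBM_MONGODB_URI):
#   [ "$(pbm status | pitr_state)" = "ON" ] && disable_pitr
```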

Force a sync to pull the list of backups stored on the S3 bucket:

$ pbm config --force-resync
Storage resync started

List the backups and make sure the logical backup you require is present:

$ pbm list
Backup snapshots:
  2024-03-22T20:42:50Z <logical> [restore_to_time: 2024-03-22T20:43:12Z]
  2024-03-22T21:45:35Z <physical> [restore_to_time: 2024-03-22T21:45:36Z]
PITR <off>:
  2024-03-22T20:43:13Z - 2024-03-22T20:52:58Z

Restore the collection you need to recover:

$ pbm restore 2024-03-22T20:42:50Z --ns "sample_training.zips"
Starting restore 2024-03-22T22:47:52.180513787Z from '2024-03-22T20:42:50Z'...Restore of the snapshot from '2024-03-22T20:42:50Z' has started

Export all the documents from the recovered collection, with $MONGODB_URI pointing at the temporary instance:

$ mongodump --uri=$MONGODB_URI --archive=/tmp/sample_training.zips.archive.gzip --gzip --db=sample_training --collection=zips
2024-03-22T23:05:46.047+0000 WARNING: ignoring unsupported URI parameter 'replsetname'
2024-03-22T23:05:46.099+0000 writing sample_training.zips to archive '/tmp/sample_training.zips.archive.gzip'
2024-03-22T23:05:46.368+0000 done dumping sample_training.zips (29470 documents)

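Import the archive file into the live database, with $MONGODB_URI now pointing at the live replica set. This appends the data: mongorestore only inserts documents, so any document whose _id already exists in the collection is left untouched.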

$ mongorestore --uri=$MONGODB_URI --nsInclude="sample_training.zips" --gzip --archive=/tmp/sample_training.zips.archive.gzip
2024-03-22T23:13:28.180+0000 WARNING: ignoring unsupported URI parameter 'replsetname'
2024-03-22T23:13:28.221+0000 preparing collections to restore from
2024-03-22T23:13:28.246+0000 reading metadata for sample_training.zips from archive '/tmp/sample_training.zips.archive.gzip'
2024-03-22T23:13:28.250+0000 restoring to existing collection sample_training.zips without dropping
2024-03-22T23:13:28.251+0000 restoring sample_training.zips from archive '/tmp/sample_training.zips.archive.gzip'
2024-03-22T23:13:29.934+0000 finished restoring sample_training.zips (29470 documents, 0 failures)
2024-03-22T23:13:29.935+0000 no indexes to restore for collection sample_training.zips
2024-03-22T23:13:29.935+0000 29470 document(s) restored successfully. 0 document(s) failed to restore.
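
Because the export and the import use different connection strings, it is easy to point one of them at the wrong instance. The sketch below wraps both steps in one function with two explicit URIs. The function name and its parameters are illustrative stand-ins for the single `$MONGODB_URI` variable used above, and the flags are the same ones shown in the two commands.

```shell
# Dump one collection from the temporary instance and append it to the live one.
# $1 = URI of the temporary instance, $2 = URI of the live replica set,
# $3 = database, $4 = collection, $5 = archive path (optional).
copy_collection() {
  tmp_uri=$1 live_uri=$2 db=$3 coll=$4
  archive=${5:-/tmp/$db.$coll.archive.gzip}
  # Only run the import if the export succeeded.
  mongodump --uri="$tmp_uri" --archive="$archive" --gzip \
            --db="$db" --collection="$coll" &&
  mongorestore --uri="$live_uri" --nsInclude="$db.$coll" --gzip \
               --archive="$archive"
}

# Usage (both URIs are placeholders):
#   copy_collection "$TMP_URI" "$LIVE_URI" sample_training zips
```

Keeping the two URIs as separate arguments makes the direction of the copy visible at the call site.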

Alternate solution

If the requirement is to recover the data up to the second before it was deleted, then we should do a point-in-time recovery (PITR). For this option to be viable, PITR must already be enabled in PBM. Due to the size of the database, we will need a separate server to execute the restore process. The steps are detailed on the PBM documentation page “Make a point-in-time restore”. Once you have the database restored up to the time you need, you can export the collection documents and import them as we did in the second option.
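
A point-in-time restore can only target a moment covered by the PITR ranges that `pbm list` reports. The sketch below checks a candidate timestamp against such a range before calling `pbm restore --time`. The function name and the example timestamp are illustrative. The comparison relies on ISO 8601 timestamps in the same time zone sorting lexicographically.

```shell
# Succeed if timestamp $1 lies within the PITR range [$2, $3], inclusive.
# A trailing "Z" is stripped so values with and without it compare consistently.
in_pitr_range() {
  ts=${1%Z} start=${2%Z} end=${3%Z}
  # sort -c exits 0 only when its input is already in sorted order.
  printf '%s\n' "$start" "$ts" "$end" | LC_ALL=C sort -c 2>/dev/null
}

# Usage (the target time is a made-up example inside the range listed earlier):
#   target=2024-03-22T20:50:00
#   in_pitr_range "$target" 2024-03-22T20:43:13Z 2024-03-22T20:52:58Z &&
#     pbm restore --time="$target"
```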

Percona Backup for MongoDB flexibility

PBM offers a lot of flexibility for managing backup and restore operations, and this is just one simple scenario where it can help. It is important to understand how PBM works so that you can build strategies that meet the business needs. If you need help managing your databases, don’t hesitate to contact us. We have an excellent team of experts ready to help.

Percona Distribution for MongoDB is a source-available alternative for enterprise MongoDB. A bundling of Percona Server for MongoDB and Percona Backup for MongoDB, Percona Distribution for MongoDB combines the best and most critical enterprise components from the open source community into a single feature-rich and freely available solution.

Download Percona Distribution for MongoDB Today!
