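PostgreSQL Backup and Incremental Recovery

In the earlier section "PostgreSQL Primary-Standby Asynchronous Streaming Replication", we deployed an asynchronous streaming replication environment for PostgreSQL. Replication exists to keep a copy of the data and to provide high availability and fault tolerance. This section briefly introduces the backup and recovery approach we commonly use when operating PostgreSQL databases.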

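Incremental Backup

When PostgreSQL performs a write, every change to a data file is first recorded in the WAL (write-ahead log) and only then applied to the data file itself. If the database server crashes or loses power and restarts, PostgreSQL reads the WAL at startup and uses it to repair the data files. In principle, then, a base backup (also called a full backup) combined with the WAL generated since that backup is enough to restore the database to any point in time after the backup was taken.

This point matters, because the common form of incremental backup is simply base backup + incremental WAL, replayed (redo) during recovery.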

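Configuring Incremental Backup

To demonstrate, we work on the pghost1 server from the environment built in "PostgreSQL Primary-Standby Asynchronous Streaming Replication" and create the directories we need.

Switch to the postgres user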

mkdir -p /data/pg10/backups
mkdir -p /data/pg10/archive_wals

The backups directory holds the base backups

The archive_wals directory holds the archived WAL files

Next, edit the following settings in postgresql.conf

wal_level = replica

archive_mode = on

archive_command = '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4'

The default value of archive_command is an empty string. Its value can be a single shell command or a call to a more complex shell script.

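Inside the archive_command shell command or script, %p stands for the path name of the WAL file to be archived (PostgreSQL passes it relative to the data directory, which is the server's working directory), and %f stands for the WAL file name without any path.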
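To make the placeholders concrete, here is a small sketch that mimics the substitution for the archive_command used above. The function name expand_archive_command is made up for this illustration, and the segment name is borrowed from the recovery log later in this article. It is a simplification: the real server also turns %% into a literal %.

```shell
#!/bin/sh
# Illustration only: mimic how %p and %f are substituted into archive_command.
# Simplifications: %% is not handled, and the sed-based replacement assumes
# the path contains no '|' or '&' characters.
expand_archive_command() {
    template=$1                 # the archive_command string
    wal_path=$2                 # value for %p
    wal_name=${wal_path##*/}    # value for %f: file name without the path
    printf '%s\n' "$template" |
        sed -e "s|%p|$wal_path|g" -e "s|%f|$wal_name|g"
}

expand_archive_command \
    '/usr/bin/lz4 -q -z %p /data/pg10/archive_wals/%f.lz4' \
    'pg_wal/000000020000000000000016'
```

The printed line is the command the shell would actually run for that segment, which is a convenient way to check a new archive_command by eye before reloading the configuration.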

Changing wal_level or archive_mode requires a database restart to take effect. Changing archive_command does not; a reload is enough, for example:

postgres=# SELECT pg_reload_conf();

postgres=# show archive_command ;
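Since archive_command may also call a script, a slightly more defensive variant can be sketched as below. The PostgreSQL documentation advises that an archive command should return zero only on success and should refuse to overwrite a file that already exists in the archive. The function name archive_wal and the script path in the comment are assumptions for illustration, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical wrapper for archive_command, used for example as
#   archive_command = '/path/to/archive_wal.sh %p %f'   (path is an assumption)
archive_wal() {
    src=$1                      # %p: WAL file to archive
    name=$2                     # %f: file name only
    dest_dir=$3                 # archive directory
    dest="$dest_dir/$name.lz4"

    # Never overwrite an existing archive file; report failure instead.
    if [ -e "$dest" ]; then
        echo "archive_wal: $dest already exists" >&2
        return 1
    fi

    # Compress to a temporary name first, then rename, so that an
    # interrupted run cannot leave a partial file under the final name.
    tmp="$dest.tmp.$$"
    if lz4 -q -z "$src" "$tmp" && mv "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Entry point when saved as a script and called with %p %f.
if [ "$#" -eq 2 ]; then
    archive_wal "$1" "$2" /data/pg10/archive_wals
fi
```

A non-zero return makes PostgreSQL keep the segment and retry later, so a failed or refused archive attempt does not lose WAL.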

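Creating a Base Backup

We create the base backup with the pg_basebackup command introduced earlier. The base backup is essential, because recovery is impossible without it. Generate base backups periodically, on a schedule that fits your business requirements.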

$ pg_basebackup -Ft -Pv -Xf -z -Z5 -p 25432 -D /data/pg10/backups/

With that, the base backup has been created.
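For periodic base backups, one option is to give every run its own date-stamped directory so that backups stay separate. The sketch below only prints the command it would run; the function name basebackup_command and the directory layout are assumptions, while the pg_basebackup options are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: print the pg_basebackup command for a date-stamped
# target directory. It only prints the command; nothing is executed.
basebackup_command() {
    backup_root=$1   # e.g. /data/pg10/backups
    stamp=$2         # e.g. 20181014, as produced by `date +%Y%m%d`
    port=$3          # e.g. 25432
    printf 'pg_basebackup -Ft -Pv -Xf -z -Z5 -p %s -D %s/%s\n' \
        "$port" "$backup_root" "$stamp"
}

basebackup_command /data/pg10/backups "$(date +%Y%m%d)" 25432
```

The printed line could then be run from a scheduler such as cron under the postgres user, with old directories pruned according to the retention policy.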

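Setting a Restore Point

Typically we create a restore point just before an important event, so that the base backup plus the archived WAL can take us back to the state before that event.

Restore points are created with the system function pg_create_restore_point, which is called as follows: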

postgres=#  SELECT pg_create_restore_point('domac-201810141800');

Recovering to a Named Restore Point

Next, we walk through an example that restores the data to a restore point we have set.

First, create a test table:

CREATE TABLE test_restore(
    id SERIAL PRIMARY KEY,
    ival INT NOT NULL DEFAULT 0,
    description TEXT,
    created_time TIMESTAMPTZ NOT NULL DEFAULT now()
);

Insert a few rows as baseline data:

postgres=# INSERT INTO test_restore (ival) VALUES (1);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (2);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (3);
INSERT 0 1
postgres=# INSERT INTO test_restore (ival) VALUES (4);
INSERT 0 1

postgres=# select * from test_restore;
 id | ival | description |         created_time
----+------+-------------+-------------------------------
  1 |    1 |             | 2018-10-14 11:13:41.57154+00
  2 |    2 |             | 2018-10-14 11:13:44.250221+00
  3 |    3 |             | 2018-10-14 11:13:46.311291+00
  4 |    4 |             | 2018-10-14 11:13:48.820479+00
(4 rows)

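Then create a base backup as described above. One caveat when testing: a WAL file is only archived once its 16MB segment is full, and a test workload may write very little, so switch WAL manually after the base backup finishes. For example: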

postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/1D01B858
(1 row)

Alternatively, set the archive_timeout parameter to force a switch to a new WAL segment once the timeout is reached.
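To check whether the segment you just switched away from has reached the archive directory, it helps to know which file name an LSN maps to. The sketch below does that arithmetic for the default 16MB segment size. The function name wal_file_name is made up here, and timeline 2 is taken from the file names in the recovery log later in this article. Inside the database, the built-in pg_walfile_name() function gives the same answer.

```shell
#!/bin/sh
# Sketch: map an LSN such as 0/1D01B858 to its WAL segment file name,
# assuming the default 16MB segment size. The timeline ID is passed in.
wal_file_name() {
    timeline=$1          # e.g. 2
    lsn=$2               # e.g. 0/1D01B858
    hi=${lsn%%/*}        # hex digits before the slash
    lo=${lsn#*/}         # hex digits after the slash
    # 16MB = 2^24 bytes, so the segment index is the low word shifted by 24.
    seg=$(( 0x$lo >> 24 ))
    printf '%08X%08X%08X\n' "$timeline" "$(( 0x$hi ))" "$seg"
}

wal_file_name 2 0/1D01B858
```

With the archive_command from this article, the corresponding archive file would be that name plus the .lz4 suffix under /data/pg10/archive_wals.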

Next, create a restore point:

postgres=# select pg_create_restore_point('domac-1014');
 pg_create_restore_point
-------------------------
 0/1E0001A8
(1 row)

Now change the data by deleting every row in test_restore:

postgres=# delete from test_restore;
DELETE 4

Now we recover to the restore point named "domac-1014", as follows:

Stop the database

$ pg_ctl stop -D /data/pg10/db

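Remove the old data directory, then recreate it with the permissions PostgreSQL requires

$ rm -rf /data/pg10/db

$ mkdir /data/pg10/db && chmod 0700 /data/pg10/db

Unpack the base backup into the new data directory

$ tar -xvf /data/pg10/backups/base.tar.gz -C /data/pg10/db

Create recovery.conf in the data directory from the sample file shipped with PostgreSQL

$ cp $PGHOME/share/recovery.conf.sample /data/pg10/db/recovery.conf

$ chmod 0600 /data/pg10/db/recovery.conf

Edit recovery.conf and set the following parameters: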

restore_command = '/usr/bin/lz4 -d /data/pg10/archive_wals/%f.lz4 %p'
recovery_target_name = 'domac-1014'

Then start the database so that it enters recovery, and watch the log:

bash-4.2$ pg_ctl start -D /data/pg10/db
waiting for server to start....2018-10-14 11:26:56.949 UTC [8397] LOG:  listening on IPv4 address "0.0.0.0", port 25432
2018-10-14 11:26:56.949 UTC [8397] LOG:  listening on IPv6 address "::", port 25432
2018-10-14 11:26:56.952 UTC [8397] LOG:  listening on Unix socket "/tmp/.s.PGSQL.25432"
2018-10-14 11:26:56.968 UTC [8398] LOG:  database system was interrupted; last known up at 2018-10-14 09:26:59 UTC
2018-10-14 11:26:57.049 UTC [8398] LOG:  starting point-in-time recovery to "domac-1014"
/data/pg10/archive_wals/00000002.history.lz4: No such file or directory
2018-10-14 11:26:57.052 UTC [8398] LOG:  restored log file "00000002.history" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.077 UTC [8398] LOG:  restored log file "000000020000000000000016" from archive
2018-10-14 11:26:57.191 UTC [8398] LOG:  redo starts at 0/16000060
2018-10-14 11:26:57.193 UTC [8398] LOG:  consistent recovery state reached at 0/16000130
2018-10-14 11:26:57.193 UTC [8397] LOG:  database system is ready to accept read only connections
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.217 UTC [8398] LOG:  restored log file "000000020000000000000017" from archive
 done
server started
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.384 UTC [8398] LOG:  restored log file "000000020000000000000018" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.513 UTC [8398] LOG:  restored log file "000000020000000000000019" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.699 UTC [8398] LOG:  restored log file "00000002000000000000001A" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.805 UTC [8398] LOG:  restored log file "00000002000000000000001B" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:57.982 UTC [8398] LOG:  restored log file "00000002000000000000001C" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.116 UTC [8398] LOG:  restored log file "00000002000000000000001D" from archive
/data/pg10/archive_w : decoded 16777216 bytes
2018-10-14 11:26:58.310 UTC [8398] LOG:  restored log file "00000002000000000000001E" from archive
2018-10-14 11:26:58.379 UTC [8398] LOG:  recovery stopping at restore point "domac-1014", time 2018-10-14 11:17:20.680941+00
2018-10-14 11:26:58.379 UTC [8398] LOG:  recovery has paused
2018-10-14 11:26:58.379 UTC [8398] HINT:  Execute pg_wal_replay_resume() to continue.
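In the log above, a "No such file or directory" message for 00000002.history.lz4 is immediately followed by a "restored log file" line for the same file, which suggests the command's exit status did not tell PostgreSQL that the file was missing. The PostgreSQL documentation requires restore_command to return non-zero when asked for a file that is not in the archive. A wrapper can make that explicit; the following is a minimal sketch, with the function name restore_wal and the script path as assumptions.

```shell
#!/bin/sh
# Hypothetical wrapper for restore_command, used for example as
#   restore_command = '/path/to/restore_wal.sh %f %p'   (path is an assumption)
restore_wal() {
    name=$1                     # %f: file name requested by the server
    target=$2                   # %p: where the server wants the file
    archive_dir=$3              # archive directory
    src="$archive_dir/$name.lz4"

    # Being asked for a file that is not in the archive is normal
    # (history files, the segment after the last archived one).
    # Answer with a non-zero status and create nothing.
    if [ ! -f "$src" ]; then
        return 1
    fi

    # Decompress; the exit status of lz4 becomes the function's status.
    lz4 -q -d "$src" "$target"
}

# Entry point when saved as a script and called with %f %p.
if [ "$#" -eq 2 ]; then
    restore_wal "$1" "$2" /data/pg10/archive_wals
fi
```

With an explicit non-zero status for missing files, the server can tell "end of archive" apart from a successful restore.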

Once the server is up, query the test_restore table to check whether the data was restored correctly:

postgres=# select * from test_restore;
 id | ival | description |         created_time
----+------+-------------+-------------------------------
  1 |    1 |             | 2018-10-14 11:13:41.57154+00
  2 |    2 |             | 2018-10-14 11:13:44.250221+00
  3 |    3 |             | 2018-10-14 11:13:46.311291+00
  4 |    4 |             | 2018-10-14 11:13:48.820479+00
(4 rows)

As you can see, the data is back at the specified restore point: domac-1014.

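At this point recovery is only paused, as the HINT in the log says, and the server accepts read-only connections. Execute pg_wal_replay_resume() to end recovery; PostgreSQL then renames recovery.conf to recovery.done. If a recovery.conf is still present afterwards, remove it so that the next restart does not recover to this restore point again.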

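Summary

Backup and recovery are among the most important tasks in database administration. In day-to-day operations, back up according to a policy that fits your needs, and run recovery tests periodically to make sure the data is actually safe.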
