PostgreSQL “Point-in-time Recovery” (PITR) also called as incremental database backup , online backup or may be archive backup. At all times, PostgreSQL maintains a write ahead log (WAL) in the pg_wal/ subdirectory of the cluster’s data directory. The log records every change made to the database’s data files. This log exists primarily for crash-safety purposes: if the system crashes, the database can be restored to consistency by “replaying” the log entries made since the last checkpoint. However, the existence of the log makes it possible to use a third strategy for backing up databases: we can combine a file-system-level backup with backup of the WAL files. If recovery is needed, we restore the file system backup and then replay from the backed-up WAL files to bring the system to a current state.
This assumes that hot-backups are enabled in PostgreSQL. Ensure the following are setup in postgresql.conf on the box where the backups are located
# create directory for archive logs mkdir /pg_log_archive chown -R postgres:postgres /pg_log_archive/
We’ll create a simple directory and make sure it’s owned by ‘postgres’ user. Once this is done, we can start editing the PostgreSQL configuration.
Setting up WAL archiving
PostgreSQL stores all modifications in form of Write-Ahead Logs (WAL). These files can be used for replication, but they can also be used for PITR.
wal_level = replica archive_mode = on archive_command = 'test ! -f /pg_log_archive/%f && cp %p /pg_log_archive/%f' max_wal_senders = 5
The settings above have to be configured – you want to enable archiving (archive_mode = on), you want also to set wal_level to replica. The wal_level determines how much information is written to the WAL. The default value is replica, which writes enough data to support WAL archiving and replication, including running read-only queries on a standby server.In releases prior to 9.6, this parameter also allowed the values archive and hot_standby. These are still accepted but mapped to replica.
Once we are done with those changes, we need to restart PostgreSQL.
At this point, you should see WAL files appearing in your target directory. Please keep in mind that the process is triggered after a WAL file rotation – if you have very low traffic, it may take a while before WAL archiving will be started.
Create a database with some data.
For force archiving the logs use below command
psql -c "select pg_switch_wal(); # pg_switch_xlog(); for versions < 10
pg_wal is a PostgreSQL log file folder that use to store all data history records.Every log file can handle around 16M data, when it excess this limit, it will automatically create a new log file, filename is follow 0-9 and A-Z
Summary of PostgreSQL Backup Steps
- Modify postgresql.conf to support archive log
- Make a base backup (full database backup)
- Backup base backup to remote storage.
- Backup WAL (archive log files) to remote storage (continuous process)
Summary of PostgreSQL Point-in-time Recovery Steps
- Extract files from base backup
- Copy files from pg_wal folder
- Create recovery.conf file
- Start Recover
Create dummy tables and records
Table testPITR2 created at 19:17:16
Create a full Database backup – base backup
The next step will be to take a base backup that we’ll use as a base for our point-in-time restore. You can take it using pg_basebackup:
# pg_basebackup -l "Backup created at $(hostname) on $(date)" --progress -Ft -D /postgresbackup/
For more information on pg_basebackup click here
Refer the backup_label to get information on base backup
-bash-4.2$ pwd /var/lib/pgsql/10/data/pg_wal -bash-4.2$ ls 000000010000000000000018 00000001000000000000001D 000000010000000000000023 000000010000000000000029 00000001000000000000002F 000000010000000000000019 00000001000000000000001E 000000010000000000000024 00000001000000000000002A 000000010000000000000030 00000001000000000000001A 00000001000000000000001F 000000010000000000000025 00000001000000000000002B 000000010000000000000031 00000001000000000000001B 000000010000000000000020 000000010000000000000026 00000001000000000000002C 000000010000000000000032 00000001000000000000001C 000000010000000000000021 000000010000000000000027 00000001000000000000002D archive_status 00000001000000000000001C.00000028.backup 000000010000000000000022 000000010000000000000028 00000001000000000000002E -bash-4.2$ cat 00000001000000000000001C.00000028.backup START WAL LOCATION: 0/1C000028 (file 00000001000000000000001C) STOP WAL LOCATION: 0/1C000130 (file 00000001000000000000001C) CHECKPOINT LOCATION: 0/1C000060 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2018-11-26 19:29:08 UTC LABEL: Backup created at ip-172-31-32-96.ec2.internal on Mon Nov 26 19:29:07 UTC 2018 STOP TIME: 2018-11-26 19:29:13 UTC
Table testPITR3 & testPITR4 created at 19:43:51 & 19:44:49
Following are the backups and Wal file data we require.
a) Base backup file – pgdatabk20081125.tar
b) Log files hanv’t archive yet – all files under Pg_wal folder
c) WALs – all archive files under wals folder (may be a remote storage in real environment)
- Rename /var/lib/pgsql/10/data to /var/lib/pgsql/10/data.old .Assume database file in data folder was damaged due to disaster we created just now, we need to create a fresh database later.
- Unzip / extract files from base.tar and pg_wal.tar, create a new data directory under /var/lib/pgsql/10 . Move all extracted files from from base.tar and pg_wal.tar /var/lib/pgsql/10/data and /var/lib/pgsql/10/data/pg_wal respectively
# pwd /var/lib/pgsql/10 # mkdir data # chown -R postgres:postgres data # chmod 700 data/ # tar xvf /postgresbackup/base.tar -C /var/lib/pgsql/10/data/ # tar xvf /postgresbackup/pg_wal.tar -C /var/lib/pgsql/10/data/pg_wal/
Start the database Instance
Table testPITR2 created at 19:17:16 is restored
3. Copy log files from pg_wal folder. Some log files still located in /var/lib/pgsql/10/data.old pg_wal folder (those log files haven’t archive yet) during disaster happening, we need to copy the log file back and recover it as much as possible.
# cp /var/lib/pgsql/10/data.old/pg_wal/0* /var/lib/pgsql/10/data/pg_wal/
First of all, you need to figure out the point at which you should restore your data. Ideally, you’ll be able to give an exact time. Other options are: a named restore point (pg_create_restore_point()), transaction ID or WAL location LSN. You can try to use the pg_waldump utility to parse WAL files and print them in human-readable format, but it’s actually far from being human-readable so it can still be tricky to pinpoint the exact location.
4. Create a recovery.conf file and put it under /var/lib/pgsql/10/data
Now create a recovery command file recovery.conf in the data directory. You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery was successful.
Sample recovery.conf file can be found at /usr/pgsql-10/share/recovery.conf.sample
# systemctl stop postgresql-10.service # vi /var/lib/pgsql/10/data/recovery.conf
Create following content in recovery.conf
restore_command = 'cp /pg_log_archive/%f %p' recovery_target_time = '2018-11-26 19:44:49'
recovery_target_time is the time we need to recover to. Omit this setting will make PostgreSQL recover as much as it can, it may recover all changes.
Make sure that this config file permissions are properly set
# chown postgres:postgres /var/lib/pgsql/10/data/recovery.conf
Also Make sure that all files under /var/lib/pgsql/10/data/pg_wal/ have correct permissions/ownership of postgres user and group.
In the example above, we defined a restore command (simple cp from our /pg_log_archive directory into PostgreSQL pg_wal directory). We also should decide where to stop – we decided to use recovery_target_time. We can also use recovery_target_lsn , recovery_target_name & recovery_target_xid .
Above recovery.conf file will make PostgreSQL take the archive log files from /pg_log_archive/ folder and recover the data changes until 2018-11-26 19:44:49 (table testPITR4created).
5. Start database and verify the log file and data
If server is in recovery mode. You can switch it to normal operations by running:
# SELECT pg_wal_replay_resume();
You can also define in recovery.conf how PostgreSQL should behave after the recovery using:
There are three options:
- ‘pause’, the default, keeps the server in recovery mode. It is useful if you want to verify that everything is indeed in place before you allow normal traffic to hit the server and modify data.
- ‘promote’ will end the recovery mode as soon as the recovery itself completes.
- ‘shutdown’ will stop the PostgreSQL server after the recovery.
This is one time process, after recovery process started and finished, we cant make any recovery changes (like roll forward to another time).If we want to roll forward to another restore time, we need to start whole recovery process again, like extract files from base backup and copy log files.
After recovery process finished, recovery.conf will rename to recovery.done by PostgreSQL to avoid start the recovery process again.
After recovery is complete, recovery.conf file will be renamed to recovery.done. Please keep in mind that this won’t happen with:
recovery_target_action = ‘shutdown’
In this case you need to remove or rename the file manually, otherwise the server will keep restarting over and over again.
If restore is successful and everything looks good, allow your users to connect by restoring pg_hba.conf to normal.