PostgreSQL Backups: What is pg_basebackup?
- pg_basebackup is a simple yet powerful utility provided by PostgreSQL for taking online and consistent file system-level backups.
- The backups created using this command include the actual database files, allowing for straightforward restoration: simply uncompress the backup files and copy them to the database data directory.
- This utility can create base backups of your database cluster, which can be used for point-in-time recovery. Additionally, pg_basebackup supports setting up a standby replica database server using its streaming option.
- In this guide, we will explore the process of creating backups using the pg_basebackup utility and delve into the various options available for this task.
Backup Process for a PostgreSQL Cluster
You can run the following simple command to take the base backup of your database.
$ pg_basebackup -D /tmp/backups -x
- pg_basebackup: This is the command to initiate the base backup process.
- -D /tmp/backups: This option specifies the directory where the backup will be stored. In this case, the backup is being directed to the /tmp/backups directory.
- -x: This option includes WAL (Write-Ahead Logging) files in the backup. Including WAL files ensures that the backup is consistent and can be used to restore the database to a consistent state.
Prerequisites
- Before we start taking the backup and run the above command, we need to make some changes in the postgresql.conf file so our database can run properly during the backup process. Change the following values in the postgresql.conf file.
wal_level = archive max_wal_senders = 4
- Here, WAL stands for write-ahead logs (transaction logs) for PostgreSQL. By changing the value of wal_level to archive, the backup command will generate the logs in the format that are compatible with pg_basebackup and other replication-based backup tools.
- The value of max_wal_senders will determine how many clients can connect with the database and request the data files.
- Apart from this change, we need to make one more change in the pg_hba.conf file adding the following line:
local replication postgres peer
Local replication postgres peer
- Labeling backups is a good practice for organizing and identifying different backup sets. It helps in distinguishing between backups taken at different points in time, serving as a reference when you need to restore data.
- In PostgreSQL, labeling backups typically involves adding a timestamp or some other descriptive identifier to the backup filenames or directory names.
$ pg_basebackup -D /tmp/backups -x -l "This backup was created at $(hostname) on $(date)"
Viewing the Backup Progress
$ pg_basebackup -D /tmp/backups -x -P
- when you run this command, you’ll see progress information indicating how much of the backup process has been completed, such as the amount of data transferred and the transfer rate.
- This is useful for tracking the progress of the backup operation and estimating how long it might take to complete.
Creating Backups from a Remote Server
- This command can also be used to take the backups of remote PostgreSQL clusters. You can provide the hostname and port number by specifying the following options in your command.
$ pg_basebackup -D /tmp/backups -x -h 10.0.2.15 -p 5432 -U postgres
Creating gzipped tar Backup
- If you create the backup in plain text format, the backup folder will occupy a lot of space on your server. Instead of using plain text format, you can generate backup files in tar format, which is more space-efficient.
- Additionally, you can use the
-z
flag to compress the backup, further reducing the required storage space. Here is the command for creating a compressed backup in tar format:
 $ pg_basebackup -D /tmp/backups -x -Ft -z
Creating a Backup by Streaming the Transaction Logs
- The
pg_basebackup
command can stream the transaction logs (WAL) in parallel while taking the backup. This is particularly useful when new data is being added to the database during the backup process. Once the backup is complete, you can simply extract the backup and start using it as a fully functional database. - When you specify this option, the
pg_basebackup
command will open two connections to the server: one to request the data files and another to stream the WAL logs in parallel. Therefore, ensure that themax_wal_senders
value is greater than 2 in your PostgreSQL configuration file.
$ pg_basebackup -D /tmp/backups -X stream -Ft -z
Recovery process of PostgreSQL cluster
- First of all, stop your database server to start the recovery process.
$ pg_ctlcluster 10.6 main stop
- Now, move the current database files to some other folder to keep this as a backup.
$ mv main main.backup
- Copy the archived base backup files to the data directory of the database.
$ cp -r /tmp/backups main
- Start the database server now to perform the recovery.
$ pg_ctlcluster 10.6 main start
- If you want, you can monitor the recovery process by checking the logs file.
$ tail -f /var/log/postgresql/postgresql-10.6-main.log
Recovery process is fairly simple as we copied actual data files during the backup process. So, you just have to extract the backup files and move it to the data directory of the database.