Most people admit that backups are essential, yet they devote very little time to thinking about the topic.
The first recipe is about understanding and controlling crash recovery. We need to understand what happens if the database server crashes, so we can understand when we might need to recover.
The next recipe is all about planning. That’s really the best place to start before you go charging ahead to do backups.
Understanding and controlling crash recovery
Crash recovery is the PostgreSQL subsystem that saves us if the database server should crash, or fail as part of a wider system failure.
It’s good to understand a little about it and to do what we can to control it in our favor.
How to do it…
If PostgreSQL crashes there will be a message in the server log with severity-level PANIC. PostgreSQL will immediately restart and attempt to recover using the transaction log or Write Ahead Log (WAL).
The WAL consists of a series of files written to the pg_xlog subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in WAL, hence the name “write-ahead” log. When a transaction commits, the default and safe behavior is to force the WAL records to disk. If PostgreSQL should crash, the WAL will be replayed, which returns the database to the point of the last committed transaction, and thus ensures the durability of any database changes.
Note that the database changes themselves aren’t written to disk at transaction commit. Those changes are written to disk sometime later by the “background writer” on a well-tuned server.
Crash recovery replays the WAL, but from what point does it start to recover? Recovery starts from points in the WAL known as "checkpoints". The duration of crash recovery depends upon the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since at that time we write all currently outstanding database changes to disk.

A checkpoint can become a performance bottleneck on busy database servers because of the number of writes required. There are a number of ways of tuning that, though please also understand the effect those tuning options may have on crash recovery.

Two parameters control the amount of WAL that can be written before the next checkpoint. The first is checkpoint_segments, which controls the number of 16 MB files that will be written before a checkpoint is triggered. The second is time-based, known as checkpoint_timeout, and is the number of seconds until the next checkpoint. A checkpoint is triggered whenever either of those two limits is reached.
It’s tempting to banish checkpoints as much as possible by setting the following parameters:
checkpoint_segments = 1000
checkpoint_timeout = 3600
If you do, give some thought to how long crash recovery would then take, and whether you want that.
Also, you should make sure that the pg_xlog directory is mounted on disks with enough disk space for at least 3 x 16 MB x checkpoint_segments. Put another way, you need at least 48 GB of disk space for checkpoint_segments = 1000. If wal_keep_segments > 0, then the server can also use up to 16 MB x (wal_keep_segments + checkpoint_segments).
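As a sanity check, that 3 x 16 MB x checkpoint_segments formula can be scripted. This is a sketch only: the parameter values below are the example settings from the text, not values read from a live server.

```shell
# Estimate worst-case pg_xlog disk usage from checkpoint settings.
# These are the example values from the text, not read from a server.
SEGMENT_MB=16
CHECKPOINT_SEGMENTS=1000
WAL_KEEP_SEGMENTS=0

# Worst case: 3 x 16 MB x checkpoint_segments, plus any kept segments.
NEEDED_MB=$(( SEGMENT_MB * (3 * CHECKPOINT_SEGMENTS + WAL_KEEP_SEGMENTS) ))
echo "pg_xlog needs roughly $(( NEEDED_MB / 1000 )) GB"
# prints: pg_xlog needs roughly 48 GB
```

Substitute your own settings from postgresql.conf before relying on the result.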
How it works…
Recovery continues until the end of the transaction log. We are writing this continually, so there is no defined end point; it is literally the last correct record. Each WAL record is individually CRC checked, so we know whether a record is complete and valid before trying to process it. Each record contains a pointer to the previous record, so we can tell that the record forms a valid link in the chain of actions recorded in WAL. As a result of that, recovery always ends with some kind of error reading the next WAL record. That is normal.
Recovery performance can be very fast, though it does depend upon the actions being recovered. The best way to test recovery performance is to set up a standby replication server.
It’s possible for replaying the transaction log to hit a problem, causing the database server to fail to start.
Some people’s response to this is to use a utility named pg_resetxlog, which removes the current transaction log files and tidies up after that surgery has taken place.
pg_resetxlog destroys data changes and that means data loss. If you do decide to run that utility, make sure you take a backup of the pg_xlog directory first. My advice is to seek immediate assistance rather than do this. You don’t know for certain that doing this will fix a problem, though once you’ve done it, you will have difficulty going backwards.
Planning backups
This section is all about thinking ahead and planning. If you’re reading this section before you take a backup, well done.
The key thing to understand is that you should plan your recovery, not your backup. The type of backup you take influences the type of recovery that is possible, so you must give some thought to what you are trying to achieve beforehand.
If you want to plan your recovery, then you need to consider the different types of failures that can occur. What type of recovery do you wish to perform?
You need to consider the following main aspects:
- Full/Partial database?
- Everything or object definitions only?
- Point In Time Recovery
- Restore performance
We need to look at the characteristics of the utilities to understand what our backup and recovery options are. It’s often beneficial to have multiple types of backup to cover the different types of failure possible.
Your main backup options are as follows:
- logical backup—using pg_dump
- physical backup—file system backup
Logical backup comes in two main flavors: pg_dump and pg_dumpall. pg_dump has a -F option to produce backups in various file formats. The file format is very important when it comes to restoring from backup, so you need to pay close attention to it.
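For example, the main -F formats look like this. The database name mydb and the output file names are placeholders, not anything defined by the tools themselves:

```shell
# Custom format: compressed archive, supports selective and parallel restore.
pg_dump -F c mydb -f mydb.dump

# Tar format: archive file, restorable with pg_restore.
pg_dump -F t mydb -f mydb.tar

# Plain format: an SQL script file, the same style of output as pg_dumpall.
pg_dump -F p mydb -f mydb.sql
```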
The following table shows the features available, depending upon the backup technique selected.
Table of Backup/Recovery options:
| Feature | SQL dump to an archive file (pg_dump -F c) | SQL dump to a script file (pg_dump -F p or pg_dumpall) | Filesystem backup using pg_start_backup |
|---|---|---|---|
| Backup type | Logical | Logical | Physical |
| Recover to point in time? | No | No | Yes |
| Backup all databases? | One at a time | Yes (pg_dumpall) | Yes |
| All databases backed up at same time? | No | No | Yes |
| Selective backup? | Yes | Yes | No (Note 3) |
| Incremental backup? | No | No | Possible (Note 4) |
| Selective restore? | Yes | Possible (Note 1) | No (Note 5) |
| DROP TABLE recovery? | Yes | Possible (Note 1) | Possible (Note 6) |
| DROP TABLESPACE recovery? | Possible (Note 2) | Possible (Note 6) | Possible (Note 6) |
| Compressed backup files? | Yes | Yes | Yes |
| Backup is multiple files? | No | No | Yes |
| Parallel backup possible? | No | No | Yes |
| Parallel restore possible? | Yes | No | Yes |
| Restore to later release? | Yes | Yes | No |
| Standalone backup? | Yes | Yes | Yes (Note 7) |
| Allows DDL during backup? | No | No | Yes |
How to do it…
- Note 1: If you’ve generated a script with pg_dump or pg_dumpall and need to restore just a single object, then you’re going to need to go deep. You will need to write a Perl script (or similar) to read the file and extract the parts you want. It’s messy and time-consuming, but probably faster than restoring the whole thing to a second server and then extracting just the parts you need with another pg_dump.
- Note 2: See the recipe Recovery of a dropped/damaged tablespace.
- Note 3: Selective backup with a physical backup is possible, though it will cause problems later when you try to restore.
- Note 5: Selective restore with a physical backup isn’t possible with the currently supplied utilities.
- Note 7: See the recipe Standalone hot physical backup.
How it works…
To back up all databases, you may be told you need to use the pg_dumpall utility. I have four reasons why you shouldn’t do that, which are as follows:

- If you use pg_dumpall, then the only output produced is a script file. Script files can’t use the parallel restore feature of pg_restore, so by taking your backup in this way you will be forcing the restore to be slower than it needs to be.
- pg_dumpall produces dumps of each database, one after another, which is slower than running multiple pg_dump tasks in parallel, one against each database.
- The dumps of the individual databases are not consistent to a particular point in time. If you start the dump at 04:00 and it ends at 07:00, then you can’t be sure exactly which time each dump relates to; it is sometime between 04:00 and 07:00.
- Options for pg_dumpall are similar in many ways to those of pg_dump, though not all of them exist, so some things aren’t possible.

In summary, pg_dumpall is slower to back up, slower to restore, and gives you less control over the dump, so I suggest you don’t use it. If you have multiple databases, then I suggest you take your backup by doing either of the following:

- Dump the global information for the database server using pg_dumpall -g, then dump all databases in parallel using a separate pg_dump for each database, taking care to check for errors if they occur.
- Use the physical database backup technique instead.
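The first alternative can be sketched as follows. This is a minimal outline, assuming a hypothetical backup directory of /var/backups/pg and that psql and pg_dump connect with your default credentials; real scripts would need stronger error handling.

```shell
# Sketch: dump globals once, then one pg_dump per database in parallel.
# /var/backups/pg is an assumed location; adjust for your system.
BACKUP_DIR=/var/backups/pg

# Roles and tablespaces are global objects, dumped only by pg_dumpall.
pg_dumpall -g > "$BACKUP_DIR/globals.sql"

# List all real (non-template) databases.
DBS=$(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate")

# Launch one custom-format dump per database, in parallel.
for db in $DBS; do
    pg_dump -F c "$db" -f "$BACKUP_DIR/$db.dump" &
done
wait

# Minimal error check: each dump file must exist and be non-empty.
for db in $DBS; do
    [ -s "$BACKUP_DIR/$db.dump" ] || echo "dump of $db failed" >&2
done
```

Note that each database's dump is still only consistent within itself, not across databases.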
Hot logical backup of one database
Logical backup makes a copy of the data in the database by dumping out the contents of each table.
How to do it…
The command to do this is simple:
pg_dump -F c > dumpfile

or

pg_dump -F c -f dumpfile
You can also do this through pgAdmin3 as shown in the following screenshot:
How it works…
pg_dump produces a single output file. If required, you can use the split(1) command to divide that file into multiple pieces.
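For example, a dump file can be split into fixed-size pieces and later reassembled with cat. The scratch file below stands in for a real pg_dump output, and the sizes are arbitrary:

```shell
# Demonstrate splitting and reassembling a dump file.
# A generated scratch file stands in for real pg_dump output.
WORKDIR=$(mktemp -d)
head -c 100000 /dev/zero > "$WORKDIR/dumpfile"

# Split into 32 kB pieces named dumpfile.part.aa, dumpfile.part.ab, ...
split -b 32k "$WORKDIR/dumpfile" "$WORKDIR/dumpfile.part."

# Reassemble and verify the pieces match the original byte for byte.
cat "$WORKDIR"/dumpfile.part.* > "$WORKDIR/dumpfile.rejoined"
cmp "$WORKDIR/dumpfile" "$WORKDIR/dumpfile.rejoined" && echo "pieces match"
# prints: pieces match
```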
pg_dump into the custom format is lightly compressed by default. Compression can be removed or made more aggressive.
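For instance, the -Z option sets the compression level for the custom format (mydb and the file names here are placeholders):

```shell
# No compression at all.
pg_dump -F c -Z 0 mydb -f mydb_uncompressed.dump

# Most aggressive compression, at the cost of CPU time during the dump.
pg_dump -F c -Z 9 mydb -f mydb_smallest.dump
```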
pg_dump runs by executing SQL statements against the database to unload data. When PostgreSQL runs an SQL statement we take a “snapshot” of currently running transactions, which freezes our viewpoint of the database. We can’t (yet) share that snapshot across multiple sessions, so we cannot run an exactly consistent pg_dump in parallel in one database, nor across many databases.
The time of the snapshot is the only time we can recover to—we can’t recover to a time either before or after that time. Note that the snapshot time is the start of the backup, not the end.
When pg_dump runs, it holds the very lowest kind of lock on the tables being dumped. Those locks are designed to prevent DDL from running against the tables while the dump takes place. If a dump is started while other DDL statements are already running, then the dump will sit and wait. If you want to limit the waiting time, you can do that by setting the --lock-wait-timeout option.
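For example, to stop the dump from waiting more than a minute for its locks (mydb is a placeholder, and the timeout value is in milliseconds):

```shell
# Give up rather than wait longer than 60 seconds for table locks.
pg_dump -F c --lock-wait-timeout=60000 mydb -f mydb.dump
```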
pg_dump allows you to make a selective backup of tables. The -t option also allows you to specify views and sequences. There’s no way to dump other object types individually using pg_dump. You can use some supplied functions to extract individual snippets of information available at the following website:
pg_dump works against earlier releases of PostgreSQL, so it can be used to migrate data between releases.
pg_dump doesn’t generally handle included modules very well. pg_dump isn’t aware of additional tables that have been installed as part of an additional package, such as PostGIS or Slony, so it will dump those objects as well. That can cause difficulties if you then try to restore from the backup, as the additional tables may have been created as part of the software installation process in an empty server.
What time was the pg_dump taken? The snapshot for a pg_dump is taken at the beginning of a run, while the file modification time will tell you when the dump finished. The dump is consistent at the time of the snapshot, so you may want to know that time. If you are making a script dump, you can get a verbose dump as follows:

pg_dump -v

which then adds the time to the top of the script. Custom dumps store the start time as well, and it can be accessed using the following:
pg_restore --schema-only -v dumpfile | head | grep Started
-- Started on 2010-06-03 09:05:46 BST
Note that pg_dump does not dump the roles (such as users/groups) and tablespaces. Those two things are only dumped by pg_dumpall; see the next recipes for more detailed descriptions.