Most people admit that backups are essential, yet they devote very little time to thinking about the topic.
The first recipe is about understanding and controlling crash recovery. We need to understand what happens if the database server crashes, so we can understand when we might need to recover.
The next recipe is all about planning. That’s really the best place to start before you go charging ahead to do backups.
Crash recovery is the PostgreSQL subsystem that saves us if the server crashes, or fails as part of a wider system failure.
It’s good to understand a little about it and to do what we can to control it in our favor.
If PostgreSQL crashes, there will be a message in the server log with severity level PANIC. PostgreSQL will immediately restart and attempt to recover using the transaction log, or Write Ahead Log (WAL).
The WAL consists of a series of files written to the pg_xlog subdirectory of the PostgreSQL data directory. Each change made to the database is recorded first in WAL, hence the name “write-ahead” log. When a transaction commits, the default and safe behavior is to force the WAL records to disk. If PostgreSQL should crash, the WAL will be replayed, which returns the database to the point of the last committed transaction, and thus ensures the durability of any database changes.
Note that the database changes themselves aren’t written to disk at transaction commit. Those changes are written to disk sometime later by the “background writer” on a well-tuned server.
Crash recovery replays the WAL, but from what point does it start to recover? Recovery starts from points in the WAL known as “checkpoints”. The duration of crash recovery depends upon the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since at that time we write all currently outstanding database changes to disk.

A checkpoint can become a performance bottleneck on busy database servers because of the number of writes required. There are a number of ways of tuning that, though please also understand the effect those tuning options may have on crash recovery.

Two parameters control the amount of WAL that can be written before the next checkpoint. The first, checkpoint_segments, controls the number of 16 MB files that will be written before a checkpoint is triggered. The second, checkpoint_timeout, is time-based: the number of seconds until the next checkpoint. A checkpoint is triggered whenever either of those two limits is reached.
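As a concrete illustration, the figures below assume the 9.0-era defaults of checkpoint_segments = 3 and checkpoint_timeout = 300 seconds; with those values you can work out how much WAL accumulates before a size-triggered checkpoint:

```shell
# Assumed defaults: 3 segments of 16 MB each, or 300 seconds,
# whichever limit is reached first.
checkpoint_segments=3
checkpoint_timeout=300

# WAL written before a size-triggered checkpoint:
echo "$(( checkpoint_segments * 16 )) MB"    # → 48 MB
```

In other words, on a default configuration a quiet server checkpoints every five minutes, and a busy one after roughly 48 MB of WAL.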
It’s tempting to banish checkpoints as much as possible by setting the following parameters:
checkpoint_segments = 1000
checkpoint_timeout = 3600
If you do, give some thought to how long crash recovery will take as a result, and whether you can tolerate that.
Also, you should make sure that the pg_xlog directory is mounted on disks with enough disk space for at least 3 x 16 MB x checkpoint_segments. Put another way, you need at least 48 GB of disk space for checkpoint_segments = 1000. If wal_keep_segments > 0, then the server can also use up to 16 MB x (wal_keep_segments + checkpoint_segments).
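To put numbers on that rule of thumb, here is the arithmetic as a quick sketch (the 16 MB segment size is the compile-time default; a custom build may differ):

```shell
# Disk space needed for pg_xlog: at least 3 x 16 MB x checkpoint_segments
checkpoint_segments=1000
segment_size_mb=16
echo "$(( 3 * segment_size_mb * checkpoint_segments )) MB"   # → 48000 MB, i.e. about 48 GB
```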
Recovery continues until the end of the transaction log. We are writing this continually, so there is no defined end point; it is literally the last correct record. Each WAL record is individually CRC checked, so we know whether a record is complete and valid before trying to process it. Each record contains a pointer to the previous record, so we can tell that the record forms a valid link in the chain of actions recorded in WAL. As a result of that, recovery always ends with some kind of error reading the next WAL record. That is normal.
Recovery performance can be very fast, though it does depend upon the actions being recovered. The best way to test recovery performance is to set up a standby server using replication.
It’s possible for a problem to occur while replaying the transaction log, leaving the database server unable to start.
Some people’s response to this is to use a utility named pg_resetxlog, which removes the current transaction log files and tidies up after that surgery has taken place.
pg_resetxlog destroys data changes and that means data loss. If you do decide to run that utility, make sure you take a backup of the pg_xlog directory first. My advice is to seek immediate assistance rather than do this. You don’t know for certain that doing this will fix a problem, though once you’ve done it, you will have difficulty going backwards.
This section is all about thinking ahead and planning. If you’re reading this section before you take a backup, well done.
The key thing to understand is that you should plan your recovery, not your backup. The type of backup you take influences the type of recovery that is possible, so you must give some thought to what you are trying to achieve beforehand.
If you want to plan your recovery, then you need to consider the different types of failures that can occur. What type of recovery do you wish to perform?
You need to consider the following main aspects:
We need to look at the characteristics of the utilities to understand what our backup and recovery options are. It’s often beneficial to have multiple types of backup to cover the different types of failure possible.
Your main backup options are logical backup, using pg_dump, and physical backup, using a filesystem copy taken while the server is in backup mode (pg_start_backup).
pg_dump comes in two main flavors: pg_dump and pg_dumpall. pg_dump has a -F option to produce backups in various file formats. The file format is very important when it comes to restoring from backup, so you need to pay close attention to that.
The following table shows the features available, depending upon the backup technique selected.
Table of Backup/Recovery options:
| Feature | SQL dump to an archive file: pg_dump -F c | SQL dump to a script file: pg_dump -F p or pg_dumpall | Filesystem backup using pg_start_backup |
|---|---|---|---|
| Backup type | Logical | Logical | Physical |
| Recover to point in time? | No | No | Yes |
| Backup all databases? | One at a time | Yes (pg_dumpall) | Yes |
| All databases backed up at same time? | No | No | Yes |
| Selective backup? | Yes | Yes | No (Note 3) |
| Incremental backup? | No | No | Possible (Note 4) |
| Selective restore? | Yes | Possible (Note 1) | No (Note 5) |
| DROP TABLE recovery? | Yes | Yes | Possible (Note 6) |
| DROP TABLESPACE recovery? | Possible (Note 2) | Possible (Note 6) | Possible (Note 6) |
| Compressed backup files? | Yes | Yes | Yes |
| Backup is multiple files? | No | No | Yes |
| Parallel backup possible? | No | No | Yes |
| Parallel restore possible? | Yes | No | Yes |
| Restore to later release? | Yes | Yes | No |
| Standalone backup? | Yes | Yes | Yes (Note 7) |
| Allows DDL during backup? | No | No | Yes |
To back up all databases, you may be told you need to use the pg_dumpall utility. I have four reasons why you shouldn’t do that, which are as follows:
Logical backup makes a copy of the data in the database by dumping out the contents of each table.
The command to do this is simple and as follows:
pg_dump -F c > dumpfile

or

pg_dump -F c -f dumpfile
You can also do this through pgAdmin3 as shown in the following screenshot:
pg_dump produces a single output file. You can use the split(1) command on the output file to separate it into multiple pieces, if required.
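For example, the following sketch shows how split(1) breaks a dump into fixed-size pieces that can later be reassembled with cat; a synthetic file stands in for a real dump here, and the file names are hypothetical:

```shell
# Create a 3 MB stand-in for a real dump file
dd if=/dev/zero of=dumpfile bs=1024 count=3072 2>/dev/null

# Split into 1 MB pieces: dumpfile.part.aa, dumpfile.part.ab, ...
split -b 1M dumpfile dumpfile.part.

# To restore, reassemble the pieces in order and verify the result
cat dumpfile.part.* > dumpfile.rejoined
cmp dumpfile dumpfile.rejoined && echo "files match"
```

Because the pieces sort alphabetically, `cat dumpfile.part.*` reassembles them in the correct order.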
pg_dump into the custom format is lightly compressed by default. Compression can be removed or made more aggressive.
pg_dump runs by executing SQL statements against the database to unload data. When PostgreSQL runs an SQL statement we take a “snapshot” of currently running transactions, which freezes our viewpoint of the database. We can’t (yet) share that snapshot across multiple sessions, so we cannot run an exactly consistent pg_dump in parallel in one database, nor across many databases.
The time of the snapshot is the only time we can recover to—we can’t recover to a time either before or after that time. Note that the snapshot time is the start of the backup, not the end.
When pg_dump runs, it holds the very lowest kind of lock on the tables being dumped. Those locks are designed to prevent DDL from running against the tables while the dump takes place. If a dump is run while other DDL statements are already running, then the dump will sit and wait. If you want to limit the waiting time, you can do so by setting the --lock-wait-timeout option.
pg_dump allows you to make a selective backup of tables. The -t option also allows you to specify views and sequences. There’s no way to dump other object types individually using pg_dump. You can use some supplied functions to extract individual snippets of information available at the following website:
https://www.postgresql.org/docs/9.0/static/functions-info.html#FUNCTIONS-INFO-CATALOG-TABLE
pg_dump works against earlier releases of PostgreSQL, so it can be used to migrate data between releases.
pg_dump doesn’t generally handle included modules very well. pg_dump isn’t aware that certain tables were installed as part of an additional package, such as PostGIS or Slony, so it will dump those objects as well. That can cause difficulties if you then try to restore from the backup, as those tables may already have been created on an otherwise empty server as part of the software installation process.
What time was the pg_dump taken? The snapshot for a pg_dump is taken at the beginning of the run. The file modification time will tell you when the dump finished. The dump is consistent at the time of the snapshot, so you may want to know that time. If you are making a script dump, you can ask for a verbose dump as follows:
pg_dump -v
which then adds the time to the top of the script. Custom dumps store the start time as well and that can be accessed using the following:
pg_restore --schema-only -v dumpfile | head | grep Started

-- Started on 2010-06-03 09:05:46 BST
Note that pg_dump does not dump the roles (such as users/groups) and tablespaces. Those two things are only dumped by pg_dumpall; see the next recipes for more detailed descriptions.