Securing the WAL Stream

July 15, 2014 - 12:00 am

1541

6 min read

(For more resources related to this topic, see here.)

The primary mechanism that PostgreSQL uses to provide a data durability guarantee is through its Write Ahead Log (WAL). All transactional data is written to this location before ever being committed to database files. Once WAL files are no longer necessary for crash recovery, PostgreSQL will either delete or archive them. For the purposes of a highly available server, we recommend that you keep these important files as long as possible. There are several reasons for this; they are as follows:

Archived WAL files can be used for Point In Time Recovery (PITR)
If you are using streaming replication, interrupted streams can be re-established by applying WAL files until the replica has caught up
WAL files can be reused to service multiple server copies

Learn Programming & Development with a Packt Subscription

In order to gain these benefits, we need to enable PostgreSQL WAL archiving and save these files until we no longer need them. This section will address our recommendations for long term storage of WAL files.

Getting ready

In order to properly archive WAL files, we recommend that you provision a server dedicated to backups or file storage. Depending on the transaction volume, an active PostgreSQL database might produce thousands of these on a daily basis. At 16 MB apiece, this is not an idle concern. For instance, for a 1 TB database, we recommend at least 3 TB of storage space.

In addition, we will be using rsync as a daemon on this archive server. To install this on a Debian-based server, execute the following command as a root-level user:

sudo apt-get install rsync

Red-Hat-based systems will need this command instead:

sudo yum install rsync xinetd

How to do it…

Our archive server has a 3 TB mount at the /db directory and is named arc_server on our network. The PostgreSQL source server resides at 192.168.56.10. Follow these steps for long-term storage of important WAL files on an archive server

Enable rsync to run as a daemon on the archive server.
On Debian based systems, edit the /etc/default/rsync file and change the RSYNC_ENABLE variable to true.
On Red-Hat-based systems, edit the /etc/xinet.d/rsync file and change the disable parameter to no.
Create a directory to store archived WAL files as the postgres user with these commands:
```
sudo mkdir /db/pg_archived
sudo chown postgres:postgres /db/pg_archived
```

Create a file named /etc/rsyncd.conf and fill it with the following contents:

[wal_store]
    path = /db/pg_archived
    comment = DB WAL Archives
    uid = postgres
    gid = postgres
    read only = false
    hosts allow = 192.168.56.10
    hosts deny = *

Start the rsync daemon.
Debian-based systems should execute the following command: sudo service rsync start
Red-Hat-based systems can start rsync with this command instead: sudo service xinetd start
Change the archive_mode and archive_command parameters in postgresql.conf to read the following:
```
archive_mode = on
archive_command = 'rsync -aq %p arc_server::wal_store/%f'
```
Restart the PostgreSQL server with a command similar to this:
```
pg_ctl -D $PGDATA restart
```

How it works

The rsync utility is normally used to transfer files between two servers. However, we can take advantage of using it as a daemon to avoid connection overhead imposed by using SSH as an rsync protocol. Our first step is to ensure that the service is not disabled in some manner, which would make the rest of this guide moot.

Next, we need a place to store archived WAL files on the archive server. Assuming that we have 3 TB of space in the /db directory, we simply claim /db/pg_archived as the desired storage location. There should be enough space to use /db for backups as well, but we won’t discuss that here.

Next, we create a file named /etc/rsyncd.conf, which will configure how rsync operates as a daemon. Here, we name the /db/pg_archived directory wal_store so that we can address the path by its name when sending files. We give it a human-readable name and ensure that files are owned by the postgres user, as this user also controls most of the PostgreSQL-related services.

The next, and possibly the most important step, is to block all hosts but the primary PostgreSQL server from writing to this location. We set hosts deny to *, which blocks every server. Then, we set hosts allow to the primary database server’s IP address so that only it has access. If everything goes well, we can start the rsync (or xinetd on Red Hat systems) service and we can see that in the following screenshot:

PostgreSQL 9 High Availability Cookbook

Next, we enable archive_mode by setting it to on. With archive mode enabled, we can specify a command that will execute when PostgreSQL no longer needs a WAL file for crash recovery. In this case, we invoke the rsync command with the -a parameter to preserve elements such as file ownership, timestamps, and so on.

In addition, we specify the -q setting to suppress output, as PostgreSQL only checks the command exit status to determine its success. In the archive_command setting, the %p value represents the full path to the WAL file, and %f resolves to the filename. In this context, we’re sending the WAL file to the archive server at the wal_store module we defined in rsyncd.conf.

Once we restart PostgreSQL, it will start storing all the old WAL files by sending them to the archive server.

In case any rsync command fails because the archive server is unreachable, PostgreSQL will keep trying to send it until it is successful. If the archive server is unreachable for too long, we suggest that you change the archive_command setting to store files elsewhere. This prevents accidentally overfilling the PostgreSQL server storage.

There’s more…

As we will likely want to use the WAL files on other servers, we suggest that you make a list of all the servers that could need WAL files. Then, modify the rsyncd.conf file on the archive server and add this section:

[wal_fetch]
    path = /db/pg_archived
    comment = DB WAL Archive Retrieval
    uid = postgres
    gid = postgres
    read only = true
    hosts allow = host1, host2, host3
    hosts deny = *

Now, we can fetch WAL files from any of the hosts in hosts allow. As these are dedicated PostgreSQL replicas, recovery servers, or other defined roles, this makes the archive server a central location for all our WAL needs.

Summary

In this article, we’ve successfully learned how to secure the WAL stream by following the given steps.

Resources for Article:

Further resources on this subject:

PostgreSQL 9: Reliable Controller and Disk Setup [article]
Backup in PostgreSQL 9 [article]
Recovery in PostgreSQL 9 [article]