|Read more about this book|
The main tunable settings for PostgreSQL are in a plain text file named postgresql.conf that’s located at the base of the database directory structure. This will often be where $PGDATA is set to on UNIX-like systems, making the file $PGDATA/postgresql.conf on those platforms.
This article by Gregory Smith, author of PostgreSQL 9.0 High Performance, mirrors the general format of the official documentation’s look at these parameters at http://www.postgresql.org/docs/current/static/runtime-config.html. However, it is more focused on guidelines for setting the most important values, from the perspective of someone interested in performance tuning, rather than describing the meaning of every parameter. This should be considered a supplement to rather than a complete replacement for the extensive material in the manual.
General logging setup is important but it is somewhat outside the scope of this article. You may need to set parameters such as log_destination, log_directory, and log_filename to save your log files in a way compatible with the system administrations requirements of your environment. These will all be set to reasonable defaults to get started with on most systems.
On UNIX-like systems, it’s common for some of the database logging to be set in the script that starts and stops the server, rather than directly in the postgresql.conf file. If you instead use the pg_ctl command to manually start the server, you may discover that logging ends up on your screen instead. You’ll need to look at the script that starts the server normally (commonly /etc/init.d/postgresql) to determine what it does, if you want to duplicate that behavior. In most cases, you just need to add –l logfilename to the pg_ctl command line to redirect its output to the standard location.
The default log_line_prefix is empty, which is not what you want. A good starting value here is the following:
log_line_prefix=’%t:%r:%[email protected]%d:[%p]: ‘
This will put the following into every log line:
- %t: Timestamp
- %u: Database user name
- %r: Remote host connection is from
- %d: Database connection is to
- %p: Process ID of connection
It may not be obvious what you’d want all of these values for initially, particularly, the process ID. Once you’ve tried to chase down a few performance issues, the need for saving these values will be more obvious, and you’ll be glad to already have this data logged.
Another approach worth considering is setting log_line_prefix such that the resulting logs will be compatible with the pgFouine program. That is a reasonable, general purpose logging prefix, and many sites end up needing to do some sort of query analysis eventually.
The options for this setting are as follows:
- none: Do not log any statement-level information.
- ddl: Log only Data Definition Language (DDL) statements such as CREATE and DROP. This can normally be left on even in production, and is handy to catch major changes introduced accidentally or intentionally by administrators.
- mod: Log any statement that modifies a value, which is essentially everything except for simple SELECT statements. If your workload is mostly SELECT based with relatively few data changes, this may be practical to leave enabled all the time.
- all: Log every statement. This is generally impractical to leave on in production due to the overhead of the logging. However, if your server is powerful enough relative to its workload, it may be practical to keep it on all the time.
Statement logging is a powerful technique for finding performance issues. Analyzing the information saved by log_statement and related sources for statement-level detail can reveal the true source for many types of performance issues. You will need to combine this with appropriate analysis tools.
Once you have some idea of how long a typical query statement should take to execute, this setting allows you to log only the ones that exceed some threshold you set. The value is in milliseconds, so you might set:
And then you’ll only see statements that take longer than one second to run. This can be extremely handy for finding out the source of “outlier” statements that take much longer than most to execute.
If you are running 8.4 or later, you might instead prefer to use the auto_explain module: http://www.postgresql.org/docs/8.4/static/auto-explain.html instead of this feature. This will allow you to actually see why the queries that are running slowly are doing so by viewing their associated EXPLAIN plans.
Vacuuming and statistics
PostgreSQL databases require two primary forms of regular maintenance as data is added, updated, and deleted.
VACUUM cleans up after old transactions, including removing information that is no longer visible and returning freed space to where it can be re-used. The more often you UPDATE and DELETE information from the database, the more likely you’ll need a regular vacuum cleaning regime. However, even static tables with data that never changes once inserted still need occasional care here.
ANALYZE looks at tables in the database and collects statistics about them— information like estimates of how many rows they have and how many distinct values are in there. Many aspects of query planning depend on this statistics data being accurate.
As both these tasks are critical to database performance over the long-term, starting in PostgreSQL 8.1 there is an autovacuum daemon available that will run in the background to handle these tasks for you. Its action is triggered by the number of changes to the database exceeding a threshold it calculates based on the existing table size.
The parameter for autovacuum is turned on by default in PostgreSQL 8.3, and the default settings are generally aggressive enough to work out of the box for smaller database with little manual tuning. Generally you just need to be careful that the amount of data in the free space map doesn’t exceed max_fsm_pages, and even that requirement is automated away from being a concern as of 8.4.
Enabling autovacuum on older versions
If you have autovacuum available but it’s not turned on by default, which will be the case with PostgreSQL 8.1 and 8.2, there are a few related parameters that must also be enabled for it to work, as covered in http://www.postgresql.org/docs/8.1/interactive/maintenance.html or http://www.postgresql.org/docs/8.2/interactive/routine-vacuuming.html.
The normal trio to enable in the postgresql.conf file in these versions are:
Note that as warned in the documentation, it’s also wise to consider adjusting superuser_reserved_connections to allow for the autovacuum processes in these earlier versions.
The autovacuum you’ll get in 8.1 and 8.2 is not going to be as efficient as what comes in 8.3 and later. You can expect it to take some fine tuning to get the right balance of enough maintenance without too much overhead, and because there’s only a single worker it’s easier for it to fall behind on a busy server. This topic isn’t covered at length here. It’s generally a better idea to put time into planning an upgrade to a PostgreSQL version with a newer autovacuum than to try and tweak an old one extensively, particularly if there are so many other performance issues that cannot be resolved easily in the older versions, too.
A few operations in the database server need working memory for larger operations than just regular sorting. VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY all can allocate up to maintainance_work_mem worth of memory instead. As it’s unlikely that many sessions will be doing one of these operations at once, it’s possible to set this value much higher than the standard per-client work_mem setting. Note that at least autovacuum_max_workers (defaulting to 3 starting in version 8.3) will allocate this much memory, so consider those sessions (perhaps along with a session or two doing a CREATE INDEX) when setting this value.
Assuming you haven’t increased the number of autovacuum workers, a typical high setting for this value on a modern server would be at five percent of the total RAM, so that even five such processes wouldn’t exceed a quarter of available memory. This works out to approximately 50 MB of maintainance_work_mem per GB of server RAM.
PostgreSQL makes its decisions about how queries execute based on statistics collected about each table in your database. This information is collected by analyzing the tables, either with the ANALYZE statement or via autovacuum doing that step. In either case, the amount of information collected during the analyze step is set by default_statistics_target. Increasing this value makes analysis take longer, and as analysis of autovacuum happens regularly this turns into increased background overhead for database maintenance. But if there aren’t enough statistics about a table, you can get bad plans for queries against it.
The default value for this setting used to be the very low (that is,10), but was increased to 100 in PostgreSQL 8.4. Using that larger value was popular in earlier versions, too, for general improved query behavior. Indexes using the LIKE operator tended to work much better with values greater than 100 rather than below it, due to a hard-coded change at that threshold.
Note that increasing this value does result in a net slowdown on your system if you’re not ever running queries where the additional statistics result in a change to a better query plan. This is one reason why some simple benchmarks show PostgreSQL 8.4 as slightly slower than 8.3 at default parameters for each, and in some cases you might return an 8.4 install to a smaller setting. Extremely large settings for default_statistics_target are discouraged due to the large overhead they incur.
If there is just a particular column in a table you know that needs better statistics, you can use ALTER TABLE SET STATISTICS on that column to adjust this setting just for it. This works better than increasing the system-wide default and making every table pay for that requirement. Typically, the columns that really require a lot more statistics to work properly will require a setting near the maximum of 1000 (increased to 10,000 in later versions) to get a serious behavior change, which is far higher than you’d want to collect data for on every table in the database.