Thomas Munro from EnterpriseDB on parallelism in PostgreSQL

PostgreSQL is a powerful, open-source object-relational database system. Since its introduction, it has been well-received by developers for its reliability, feature robustness, data-integrity, better licensing, and much more. However, one of its limitations has been the lack of support for parallelism, which changed in the subsequent releases.

At PostgresOpen 2018, Thomas Munro, a programmer at EnterpriseDB and PostgreSQL contributor talked about how parallelism has evolved in PostgreSQL over the years. In this article, we will see some of the key parallelism-specific features that Munro discussed in his talk.

[box type="shadow" align="" class="" width=""]

Further Learning

This article gives you a glimpse of query parallelism in PostgreSQL. If you want to explore it further along with other concepts like data replication, and database performance, check out our book Mastering PostgreSQL 11 - Second Edition by Hans-Jürgen Schönig.

This second edition of Mastering PostgreSQL 11 helps you build dynamic database solutions for enterprise applications using PostgreSQL, which enables database analysts to design both the physical and technical aspects of the system architecture with ease.

[/box]

Evolution of parallelism in PostgreSQL

PostgreSQL uses a process-based architecture instead of a thread-based one. On startup, it launches a “postmaster” process and after that creates a new process for every database session. Previously, it did not support parallelism in a single connection and each query used to run serially. The absence of “intra-query parallelism” in PostgreSQL was a huge limitation for answering the queries faster.

Parallelism here means allowing a single process to have multiple threads to query the system and utilize the increasing CPU core counts. The foundation for parallelism in PostgreSQL was laid out in the 9.4 and 9.5 releases. These came with infrastructure updates like dynamic shared memory segments, shared memory queues, and background workers. PostgreSQL 9.6 was actually the first release that came with user-visible features for parallel query execution. It supported executor nodes: gather, parallel sequential scan, partial aggregate, and finalize aggregate. However, this was not enabled by default.

Then in 2017, PostgreSQL 10 was released, which had parallelism enabled by default. It had a few more executor nodes including gather merge, parallel index scan, and parallel bitmap heap scan. Last year, PostgreSQL 11 came out with a couple of more executor nodes including parallel append and parallel hash join. It also introduced partition-wise joins and parallel CREATE INDEX.

Key parallelism-specific features in PostgreSQL

Parallel sequential scans

Parallel sequential scans was the very first feature for parallel query execution. Introduced in PostgreSQL 9.6, this scan distributes blocks of a table among different processes. This assignment is done one after the other to ensure that the access to the table remains sequential.

The processes that run in parallel and scan the tuples of a table are called parallel workers. There is one special worker called leader, which is responsible for coordinating and collecting the output of the scan from each of the worker. The leader may or may not participate in scanning the database depending on its load in dividing and combining processes.

Parallel index scan

Parallel index scan is based on the same concept as parallel sequential scan, but it involves more communication and waiting. Currently, the parallel index scans are supported only for B-Tree indexes. In a parallel index scan, index pages are scanned in parallel. Each process will scan a single index block and return all tuples referenced by that block. Meanwhile, other processes will also scan different index blocks and return the tuples. The results of a parallel B-Tree scan are then returned in sorted order.

Parallel bitmap heap scan

Again, this also has the same concept as the parallel sequential scan. Explaining the difference, Munro said, “You’ve got a big bitmap and you are skipping ahead to the pages that contain interesting tuples.” In parallel bitmap heap scan, one process is chosen as the leader, who performs a scan of one or more indexes and creates bitmap indicating which table blocks need to be visited. These table blocks are then divided among the worker processes as in a parallel sequential scan. Here the heap scan is done in parallel, but the underlying index scan is not.

Parallel joins

PostgreSQL supports all three join strategies in parallel query plans: nested loop join, hash join, or merge join. However, there is no parallelism supported in the inner loop. The entire loop is scanned as a whole, and the parallelism comes into play when each worker executes the inner loop as a whole. The results of each join are sent to gather node to produce the final results.

Nested loop join: The nested loop is the most basic way for PostgreSQL to perform a join. Though it is considered to be slow, it can be efficient if the inner side is an index scan. This is because the outer tuples and hence the loops that loop up values in the index will be divided among worker processes.

Merge join: The inner side is executed in full. It can be inefficient when sort needs to be performed because the work and resulting data are duplicated in every cooperating process.

Hash join: In this join as well, the inner side is executed in full by every worker process to build identical copies of the hash table. It is inefficient in cases when the hash table is large or the plan is expensive. However, in parallel hash join, the inner side is a parallel hash that divides the work of building a shared hash table over the cooperating processes. This is the only join in which we can have parallelism on both sides.

Partition-wise join

Partition-wise join is a new feature introduced in PostgreSQL 11. In partition-wise join, the planner knows that both sides of the join have matching partition schemes. Here a join between two similarly partitioned tables are broken down into joins between their matching partitions if there is an equi-join condition between the partition key of joining tables.

Munro explains, “It becomes parallelizable with the advent of parallel append, which can then run different branches of that query plan in different processes. But if you do that then granularity of parallelism is partitioned, which is in some ways good and in some ways bad compared to block-based granularity.”

He further adds, “It means when the last worker runs out of work to do everyone else has to wait for that before the query is finished. Whereas, if you use block-based parallelism you don’t have the problem but there are some advantages as a result of that as well.”

Parallel aggregation in PostgreSQL

Calculating aggregates can be very expensive and when evaluated in a single process it could take a considerable amount of time. This problem was solved in PostgreSQL 9.6 with the introduction of parallel aggregation. This is essentially a divide and conquer strategy where multiple workers calculate a part of aggregate before the final value based on these calculations is calculated by the leader.

This article walked you through some of the parallelism-specific features in PostgreSQL presented by Munro in his PostgresOpen 2018 talk.

If you want to get to grips with other advanced PostgreSQL features and SQL functions, do have a look at our Mastering PostgreSQL 11 - Second Edition book by Hans-Jürgen Schönig. By the end of this book, you will be able to use your database to its utmost capacity by implementing advanced administrative tasks with ease.

PostgreSQL committer Stephen Frost shares his vision for PostgreSQL version 12 and beyond

Introducing PostgREST, a REST API for any PostgreSQL database written in Haskell

Percona announces Percona Distribution for PostgreSQL to support open source databases