[box type=”note” align=”” class=”” width=””]This article is an excerpt from the book Mastering PostgreSQL 9.6 written by Hans-Jürgen Schönig. PostgreSQL is an open source database used not only for handling large datasets (Big Data) but also as a JSON document database. It finds applications in the software as well as the web domains. This book will enable you to build better PostgreSQL applications and administer databases more efficiently. [/box]
In this article, we explain the concept of Stored Procedures, and how to write them effectively in PostgreSQL 9.6.
When it comes to stored procedures, PostgreSQL differs quite significantly from other database systems. Most database engines force you to use a certain programming language to write server-side code. Microsoft SQL Server offers Transact-SQL while Oracle encourages you to use PL/SQL. PostgreSQL does not force you to use a certain language but allows you to decide on what you know best and what you like best.
The reason PostgreSQL is so flexible is actually quite interesting too in a historical sense. Many years ago, one of the most well-known PostgreSQL developers (Jan Wieck), who had written countless patches back in its early days, came up with the idea of using TCL as the server-side programming language. The trouble was simple—nobody wanted to use TCL and nobody wanted to have this stuff in the database engine. The solution to the problem was to make the language interface so flexible that basically any language can be integrated with PostgreSQL easily. Then, the CREATE LANGUAGE clause was born:
test=# h CREATE LANGUAGE Command: CREATE LANGUAGE Description: define a new procedural language Syntax: CREATE [ OR REPLACE ] [ PROCEDURAL ] LANGUAGE name CREATE [ OR REPLACE ] [ TRUSTED ] [ PROCEDURAL ] LANGUAGE name HANDLER call_handler [ INLINE inline_handler ] [ VALIDATOR valfunction ]
Nowadays, many different languages can be used to write stored procedures. The flexibility added to PostgreSQL back in the early days has really paid off, and so you can choose from a rich set of programming languages.
How exactly does PostgreSQL handle languages? If you take a look at the syntax of the CREATE LANGUAGE clause, you will see a couple of keywords:
- HANDLER: This function is actually the glue between PostgreSQL and any external language you want to use. It is in charge of mapping PostgreSQL data structures to whatever is needed by the language and helps to pass the code around.
- VALIDATOR: This is the policeman of the infrastructure. If it is available, it will be in charge of delivering tasty syntax errors to the end user. Many languages are able to parse the code before actually executing it. PostgreSQL can use that and tell you whether a function is correct or not when you create it. Unfortunately, not all languages can do this, so in some cases, you will still be left with problems showing up at runtime.
- INLINE: If it is present, PostgreSQL will be able to run anonymous code blocks in this function.
The anatomy of a stored procedure
Before actually digging into a specific language, I want to talk a bit about the anatomy of a typical stored procedure. For demo purposes, I have written a function that just adds up two numbers:
test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS ' SELECT $1 + $2; ' LANGUAGE 'sql'; CREATE FUNCTION
The first thing you can see is that the procedure is written in SQL. PostgreSQL has to know which language we are using, so we have to specify that in the definition. Note that the code of the function is passed to PostgreSQL as a string (‘). That is somewhat noteworthy because it allows a function to become a black box to the execution machinery. In other database engines, the code of the function is not a string but is directly attached to the statement. This simple abstraction layer is what gives the PostgreSQL function manager all its power.
Inside the string, you can basically use all that the programming language of your choice has to offer. In my example, I am simply adding up two numbers passed to the function. For this example, two integer variables are in use. The important part here is that PostgreSQL provides you with function overloading. In other words, the
mysum(int, int) function is not the same as the
mysum(int8, int8) function. PostgreSQL sees these things as two distinct functions. Function overloading is a nice feature; however, you have to be very careful not to accidentally deploy too many functions if your parameter list happens to change from time to time. Always make sure that functions that are not needed anymore are really deleted.
|The CREATE OR REPLACE FUNCTION clause will not change the parameter list. You can, therefore, use it only if the signature does not change. It will either error out or simply deploy a new function.|
Let’s run the function:
test=# SELECT mysum(10, 20); Mysum ------- 30 (1 row)
The result is not really surprising.
Introducing dollar quoting
Passing code to PostgreSQL as a string is very flexible. However, using single quotes can be an issue. In many programming languages, single quotes show up frequently. To be able to use quotes, people have to escape them when passing the string to PostgreSQL. For many years this has been the standard procedure. Fortunately, those old times have passed by and new means to pass the code to PostgreSQL are available:
test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS $$ SELECT $1 + $2; $$ LANGUAGE 'sql'; CREATE FUNCTION
The solution to the problem of quoting strings is called dollar quoting. Instead of using quotes to start and end strings, you can simple use $$. Currently, I am only aware of two languages that have assigned a meaning to $$. In Perl as well as in bash scripts, $$ represents the process ID. To overcome even this little obstacle, you can use $ almost anything $ to start and end the string. The following example shows how that works:
test=# CREATE OR REPLACE FUNCTION mysum(int, int) RETURNS int AS $ $ SELECT $1 + $2; $ $ LANGUAGE 'sql'; CREATE FUNCTION
All this flexibility allows you to really overcome the problem of quoting once and for all. As long as the start string and the end string match, there won’t be any problems left.
Making use of anonymous code blocks
So far, you have learned to write the most simplistic stored procedures possible, and you have learned to execute code. However, there is more to code execution than just full-blown stored procedures. In addition to full-blown procedures, PostgreSQL allows the use of anonymous code blocks. The idea is to run code that is needed only once. This kind of code execution is especially useful to deal with administrative tasks. Anonymous code blocks don’t take parameters and are not permanently stored in the database as they don’t have a name anyway.
Here is a simple example:
test=# DO $$ BEGIN RAISE NOTICE 'current time: %', now(); END; $$ LANGUAGE 'plpgsql'; NOTICE: current time: 2016-12-12 15:25:50.678922+01 CONTEXT: PL/pgSQL function inline_code_block line 3 at RAISE DO
In this example, the code only issues a message and quits. Again, the code block has to know which language it uses. The string is again passed to PostgreSQL using simple dollar quoting.
Using functions and transactions
As you know, everything that PostgreSQL exposes in user land is a transaction. The same, of course, applies if you are writing stored procedures. The procedure is always part of the transaction you are in. It is not autonomous—it is just like an operator or any other operation.
Here is an example:
All three function calls happen in the same transaction. This is important to understand because it implies that you cannot do too much transactional flow control inside a function. Suppose the second function call commits. What happens in such a case anyway? It cannot work.
However, Oracle has a mechanism that allows for autonomous transactions. The idea is that even if a transaction rolls back, some parts might still be needed and should be kept. The classical example is as follows:
- Start a function to look up secret data
- Add a log line to the document that somebody has modified this important secret data
- Commit the log line but roll back the change
- You still want to know that somebody attempted to change data
To solve problems like this one, autonomous transactions can be used. The idea is to be able to commit a transaction inside the main transaction independently. In this case, the entry in the log table will prevail while the change will be rolled back.
As of PostgreSQL 9.6, autonomous transactions are not happening. However, I have already seen patches floating around that implement this feature. We will see when these features make it to the core.
To give you an impression of how things will most likely work, here is a code snippet based on the first patches:
AS $$ DECLARE PRAGMA AUTONOMOUS_TRANSACTION; BEGIN FOR i IN 0..9 LOOP START TRANSACTION; INSERT INTO test1 VALUES (i); IF i % 2 = 0 THEN COMMIT; ELSE ROLLBACK; END IF; END LOOP; RETURN 42; END; $$;
The point in this example is that we can decide on the fly whether to commit or to roll back the autonomous transaction.
More about stored procedures – on how to write them in different languages and how you can improve their performance – can be found in the book Mastering PostgreSQL 9.6. Make sure you order your copy now!