Categories: DatabasesNews

rob sullivan: Using pg_repack in AWS RDS from Planet PostgreSQL

3 min read

As your database keeps growing, there is a good chance you’re going to have to address database bloat.
While Postgres 13 has launched with some exciting features with built-in methods to rebuild indexes concurrently, many people still end up having to use pg_repack to do an online rebuild of the tables to remove the bloat. Customers on AWS RDS struggle figuring out how to do this. Ready to learn how?

Since you have no server to access the local binaries, and because AWS RDS provides no binaries for the versions they are using, you’re going to have to build your own. This isn’t as hard as one might think because the official pg repos have an installer (ie: sudo apt install postgresql-10-pg_repack). If you don’t use the repos, the project itself, is an open source project with directions: http://reorg.github.io/pg_repack/

While you were getting up to speed above, I was spinning up a postgres 10.9 db on RDS. I started it yesterday so that it would be ready by the time you got to this part of the post. Lets create some data:

-- let's create the table
CREATE TABLE burritos (
id SERIAL UNIQUE NOT NULL primary key,
title VARCHAR(10) NOT NULL,
toppings TEXT NOT NULL,
thoughts TEXT,
code VARCHAR(4) NOT NULL,
UNIQUE (title, toppings)
);

--disable auto vacuum
ALTER TABLE burritos SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);


-- orders up
INSERT INTO burritos (title, toppings, thoughts, code)
SELECT
left(md5(i::text), 10),
md5(random()::text),
md5(random()::text),
left(md5(random()::text), 4)
FROM GENERATE_SERIES(1, 1000000) s(i);


UPDATE burritos SET toppings = md5(random()::text) WHERE id < 250;
UPDATE burritos SET toppings = md5(random()::text) WHERE id between 250 and 500;
UPDATE burritos SET code = left(md5(random()::text), 4) WHERE id between 2050 and 5000;
UPDATE burritos SET thoughts = md5(random()::text) WHERE id between 10000 and 20000;
UPDATE burritos SET thoughts = md5(random()::text) WHERE id between 800000 and 900000;
UPDATE burritos SET toppings = md5(random()::text) WHERE id between 600000 and 700000;

(If you are curious how Magistrate presents bloat, here is a clip of the screen:)

Much like a human that has had that much interaction with burritos… our database has quite a bit of bloat. Assuming we already have the pg_repack binaries in place, either though compilation or installing the package on the OS, we now need to enable the extension. We’ve put together a handy reference for installing extensions to get you going.

pg_repack has a lot of options. Feel free to check them out, but I’m going to start packing:

/usr/local/bin/pg_repack -U greataccounthere -h bloatsy.csbv99zxhbsh.us-east-2.rds.amazonaws.com -d important -t burritos -j 4
NOTICE: Setting up workers.conns
ERROR: pg_repack failed with error: You must be a superuser to use pg_repack

This might feel like game over because of the implementation of superuser on RDS, but the trick is to take a leap of faith and add another flag (-k) that skips the superuser check:

/usr/local/bin/pg_repack-1.4.3/pg_repack -U greataccounthere -h bloatsy.csbv99zxhbsh.us-east-2.rds.amazonaws.com -k -d important -t burritos -j 4
NOTICE: Setting up workers.conns
INFO: repacking table "public.burritos"
LOG: Initial worker 0 to build index: CREATE UNIQUE INDEX index_16449 ON repack.table_16442 USING btree (id) TABLESPACE pg_default
LOG: Initial worker 1 to build index: CREATE UNIQUE INDEX index_16451 ON repack.table_16442 USING btree (title, toppings) TABLESPACE pg_default
LOG: Command finished in worker 0: CREATE UNIQUE INDEX index_16449 ON repack.table_16442 USING btree (id) TABLESPACE pg_default
LOG: Command finished in worker 1: CREATE UNIQUE INDEX index_16451 ON repack.table_16442 USING btree (title, toppings) TABLESPACE pg_default

It works! The table is feeling fresh and tidy and your application has a little more pep in its step. When using Magistrate our platform matrix also knows when you have pg_repack installed and gives you the commands to run for tables it detects with high bloat percentage.

Matthew Emerick

Share
Published by
Matthew Emerick
Tags: PostgreSQL

Recent Posts

Top life hacks for prepping for your IT certification exam

I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…

3 years ago

Learn Transformers for Natural Language Processing with Denis Rothman

Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…

3 years ago

Learning Essential Linux Commands for Navigating the Shell Effectively

Once we learn how to deploy an Ubuntu server, how to manage users, and how…

3 years ago

Clean Coding in Python with Mariano Anaya

Key-takeaways:   Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…

3 years ago

Exploring Forms in Angular – types, benefits and differences   

While developing a web application, or setting dynamic pages and meta tags we need to deal with…

3 years ago

Gain Practical Expertise with the Latest Edition of Software Architecture with C# 9 and .NET 5

Software architecture is one of the most discussed topics in the software industry today, and…

3 years ago