Understanding pg_repack: What Can Go Wrong – and How to Avoid It

pg_repack is one of the oldest and most widely used extension projects for PostgreSQL. It is so popular that even DBaaS providers couldn’t avoid it. It is a “power tool” in the hands of a DBA for dealing with bloated/fragmented tables, and I can’t imagine a serious production deployment without it these days. It magically replaces a bloated, fragmented table with a fresh, fully packed one without holding an exclusive lock on the table during its processing**. This extension made PostgreSQL’s built-in commands like VACUUM FULL and CLUSTER almost useless.

**Momentary AccessExclusive locks are required. Please see the discussion below.

But unfortunately, regular and repeated usage of pg_repack has increased everyone’s comfort level with it (myself included). As a consequence, it gets recommended as a solution for everything, as a panacea. I recently started coming across cases like a scheduled job running pg_repack on every table. This rang an alarm bell, and it started to feel like the overuse of antibiotics. So I thought of writing about how pg_repack works and how it can affect your system while it is running, for the regular user who doesn’t want to do a detailed study, because a better understanding might help in making an informed decision on where it should be used.

pg_repack has a lot of functionality (options). In this blog post, my intention is to discuss only how it does the basic “repack”ing of a fragmented/bloated table.

Pre-Checks and Ensuring Sanity

This stage ensures that pg_repack is installed and the extension is available in the database, that we are running pg_repack as a superuser, and so on. It creates a provision for cleaning up temporary objects created by pg_repack, and it collects metadata about the table and associated objects like indexes, the TOAST table, and triggers, because tracking the table and its associated objects is very important. Objects like invalid indexes and conflicting triggers are also checked for. We are not going to discuss this stage in detail.

The actual processing starts with obtaining an advisory lock on the table’s OID, to make sure no other pg_repack is working on the table. This is not so much a concern with full-table repacks; it is mainly so that index-only repacks don’t interfere with each other or with a full-table repack. If another pg_repack is already in progress, the new attempt may get a message as follows and exit.

Shell
ERROR: Another pg_repack command may be running on the table. Please try again later.
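
Under the hood, this check uses PostgreSQL advisory locks keyed on the table’s OID. A minimal sketch of the idea (the key values below are illustrative, not necessarily pg_repack’s exact internal constants):

Shell
-- Returns false if another session already holds the advisory lock,
-- i.e., another pg_repack is working on this table (illustrative keys)
SELECT pg_try_advisory_lock(16185446, CAST(-2147483648 + 16423 AS integer));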

Create temporary objects while holding AccessExclusive lock on the table

Yes, correct: pg_repack needs the heavyweight AccessExclusive lock, but only temporarily. pg_repack attempts to gain an ACCESS EXCLUSIVE lock on the table by executing a statement as follows:

Shell
LOCK TABLE <tablename> IN ACCESS EXCLUSIVE MODE

An AccessExclusive lock requires that no other session is accessing the table, not even with a SELECT query. pg_repack waits for a “wait time” (60 seconds by default), which can be changed using the optional parameter --wait-timeout. Once this wait time is over, pg_repack starts trying to cancel the conflicting statements. Users may see messages as follows for each of the attempts.
Shell
WARNING: canceling conflicted backends
WARNING: canceling conflicted backends

So the point to be noted is:

Please avoid running pg_repack when there is a lot of concurrent activity on the table, because new sessions are allowed to acquire conflicting locks like ACCESS SHARE concurrently, and a session waiting for an ACCESS EXCLUSIVE lock may need to wait indefinitely. We should select a proper low-activity time window for the pg_repack work.
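
If some concurrent activity is unavoidable, a longer wait window reduces the chance of pg_repack canceling other sessions. For example (database and table names here are illustrative):

Shell
pg_repack --table=public.t1 --wait-timeout=300 mydb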

These cancel attempts will continue for another round of “wait-timeout” seconds. If the AccessExclusive lock is still not obtained after this second round, pg_repack escalates to terminating every conflicting session. In other words, pg_repack will terminate sessions once the total wait time exceeds double the “wait time”. This is problematic for application connections that get abruptly terminated, and it can lead to outages if the application layer does not handle it gracefully.

pg_repack may emit a message as follows:

Shell
WARNING: terminating conflicted backends

And there will be PostgreSQL log entries as follows for each of the terminated sessions:
Shell
2021-06-17 06:28:46.317 UTC [2761] FATAL: terminating connection due to administrator command
2021-06-17 06:28:46.317 UTC [2758] FATAL: terminating connection due to administrator command

So the point to be noted is: pg_repack can terminate sessions if they stand in the way of it obtaining an AccessExclusive lock, which can lead to unexpected outages or misbehavior of the application.

Once double the wait time is crossed, pg_repack may simply proceed to terminate the sessions. But, again, this behavior can be controlled using the parameter --no-kill-backend. If this parameter is specified, pg_repack will respect all concurrent sessions and cancel itself instead of attempting to cancel or terminate other sessions.

pg_repack may emit a message as follows:

Shell
WARNING: timed out, do not cancel conflicting backends
INFO: Skipping repack public.t1 due to timeout

I believe this is more desirable in production systems. So the point to be noted is:

Always remember to specify --no-kill-backend whenever you deal with critical systems.
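
For example, a gentler invocation for a critical system might look like this (names again illustrative):

Shell
pg_repack --table=public.t1 --no-kill-backend --wait-timeout=300 mydb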

While the AccessExclusive lock is held, pg_repack creates all the temporary objects, including the substitute table with the same structure as the original.

It creates a primary key TYPE matching the original table. For example, if we are repacking a table with a single-field primary key “id”, the primary key type definition would look like:
Shell
CREATE TYPE repack.pk_16423 AS (id integer)

Defining a TYPE is useful because primary keys can be composite. The following is an example of it dealing with a table that has a composite key:
Shell
CREATE TYPE repack.pk_16824 AS (id integer, id1 integer, id2 timestamp without time zone);

Then it proceeds to create a “log” table to capture all the data changes (CDC, Change Data Capture) during the pg_repack operation. This table has a primary key of the BIGINT data type, a column of the primary key type created in the step above, and a “row” column of the composite type of the table being repacked. This “row” column can hold an entire tuple of the table we are repacking.
The log table definition is simple thanks to the TYPE definition and the “row” type. Here is an example:
Shell
CREATE TABLE repack.log_16423 (id bigserial PRIMARY KEY, pk repack.pk_16423, row public.t1)

Now pg_repack creates a trigger on the table being repacked so that whenever there is DML on the table, the corresponding information is captured in the log table created above. This is done using an AFTER INSERT OR DELETE OR UPDATE trigger. For example:

Shell
CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON public.t1 FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('INSERT INTO repack.log_16423(pk, row) VALUES( CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.id)::repack.pk_16423) END, $2)')

Since pg_repack is holding the AccessExclusive lock at this point, there won’t be any concurrent DML, but that is going to change in the following stages.

pg_repack plays an important trick at this stage, before releasing the AccessExclusive lock, as this comment in the source code says:

Shell
/* While we are still holding an AccessExclusive lock on the table, submit
* the request for an AccessShare lock asynchronously from conn2.
* We want to submit this query in conn2 while connection's
* transaction still holds its lock, so that no DDL may sneak in
* between the time that connection commits and conn2 gets its lock.
*/

Yes, pg_repack uses another connection to the database and sends an AccessShare lock request through it. This prevents any DDL from sneaking in during the switch to the AccessShare lock. The moment the main connection commits, the AccessShare lock will be granted to the second connection. So pg_repack uses two database connections to carry out the work.
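
You can observe this yourself while pg_repack is running, with a simple query against the standard pg_locks view (a sketch; the table name follows the running example):

Shell
-- Lists sessions holding or waiting for locks on the table,
-- including the AccessShare lock held by pg_repack's second connection
SELECT pid, mode, granted
FROM pg_locks
WHERE relation = 'public.t1'::regclass;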

But still, there is a chance that some DDL could interfere, so by default pg_repack kills any session attempting concurrent DDL against the table.

Once this stage is complete, pg_repack can release the AccessExclusive lock on the first connection by COMMITting the transaction, so that the AccessShare lock request from the second connection is granted. This COMMIT is special: the log table and the trigger on the table are committed along with it, so they are available to the entire system from this point onwards.

Copying rows/tuples to a temporary table

Copying the tuples from the original table to the new table is performed at the SERIALIZABLE isolation level, because there shouldn’t be any inconsistency between the data getting into the log table and the temporary, substitute table that pg_repack is going to create.

Shell
BEGIN ISOLATION LEVEL SERIALIZABLE

Since the AccessExclusive lock has been released, concurrent sessions can proceed with their DML and SELECT queries; only DDL will be blocked. So we can say that the table is available for transactions and queries.
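
For example, during the copy phase a concurrent session would see something like this (illustrative statements against the running example table):

Shell
-- Reads and DML proceed normally during the copy phase
SELECT * FROM public.t1 WHERE id = 100;
-- But DDL blocks behind the AccessShare lock held by the second
-- connection (and, as discussed, the waiting session may get killed)
ALTER TABLE public.t1 ADD COLUMN note text;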

Prior to the data copy, the log table is emptied, because pg_repack needs only the log data captured from the start of the data copy.

Shell
DELETE FROM repack.log_16423

Again, pg_repack will attempt to kill any session that might be waiting to run DDL, and it acquires an AccessShare lock on the table. Since the other connection is already holding an AccessShare lock, this can be gained without much problem.

At this stage, pg_repack creates the substitute table that is going to replace the original table, with the exact same structure as the original but without any data. It is a CTAS statement, something like:

Shell
CREATE TABLE repack.table_16423... AS SELECT col1,col2,col3... FROM ONLY public.t1 WITH NO DATA
followed by the data copy:
Shell
INSERT INTO repack.table_16423 SELECT col1,col2,col3... FROM ONLY public.t1

Once the data copy is over, the transaction is COMMITted. This completes the heaviest stage of repacking in terms of load and WAL generation.

Indexes and keys are created on the temporary, substitute table at this stage.
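
For the running example, the index build on the substitute table would look something like this (the OID-based names are illustrative):

Shell
CREATE UNIQUE INDEX index_16426 ON repack.table_16423 USING btree (id)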

Apply the CDC log to a temporary table

Please remember that pg_repack’s main connection is not holding any lock on the table at this stage (other than the second connection’s AccessShare lock), so there is nothing blocking transactions (DML) now. Depending on how long the data copy took, and on the concurrent transactions during the copy, there could be a lot of CDC (Change Data Capture) entries in the log table. These need to be copied to the new temporary/substitute table.

This logic is implemented as a C function in pg_repack; you may refer to the source code of repack_apply. It reads all the data from the log table and processes the INSERTs, UPDATEs, and DELETEs. To speed up the repeated operations, prepared statements are used. Finally, all the processed entries are deleted from the log table.
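
Conceptually, the statements repack_apply prepares and executes for each log entry look roughly like the following simplified sketch ($1 stands for the logged primary key, $2 for the logged row image; this is an illustration, not the literal statements):

Shell
-- Captured INSERT: the new tuple is in the log's "row" column
INSERT INTO repack.table_16423 VALUES (($2).*);
-- Captured DELETE: the old primary key is in the log's "pk" column
DELETE FROM repack.table_16423 WHERE (id) = (($1).id);
-- Captured UPDATE: old key plus the full new row image
UPDATE repack.table_16423 SET id = ($2).id WHERE (id) = (($1).id);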

Swapping the original table with a temporary table

This is performed by the second connection because it already holds an AccessShare lock, but it escalates that lock to an AccessExclusive lock.

Shell
LOCK TABLE public.t1 IN ACCESS EXCLUSIVE MODE

The CDC apply is performed once again (the same repack_apply) while holding the AccessExclusive lock on the table, so any new entries that appeared in the log table are also processed.

The original table and the temporary table are then swapped quickly by executing the repack_swap function, like:

Shell
SELECT repack.repack_swap('16423');

This is the most beautiful and powerful part of pg_repack, implemented in the C function repack_swap. Not just the tables are swapped: ownership, the associated TOAST table and its index, indexes, and dependencies are swapped as well. OIDs are also swapped, so that the OID of the table remains the same even after pg_repack. The swapping work completes with a COMMIT.

Final cleanup

pg_repack uses its built-in C function repack_drop for cleaning up all the temporary objects. To prevent any concurrent session from acquiring a lock on the table that could block the cleanup, an AccessExclusive lock is obtained before the cleanup. This is the third time an AccessExclusive lock is placed on the table.
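
Conceptually, the cleanup removes everything pg_repack created along the way; for the running example it is roughly equivalent to the following (a sketch, not the literal statements repack_drop executes):

Shell
DROP TRIGGER IF EXISTS repack_trigger ON public.t1;
DROP TABLE IF EXISTS repack.log_16423;
DROP TABLE IF EXISTS repack.table_16423;
DROP TYPE IF EXISTS repack.pk_16423;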

Summary

pg_repack is one of the most powerful, popular, and useful extensions, and we encourage its use wherever applicable, with proper supervision. But please avoid over-usage. As I tried to explain, we should expect a good amount of data movement: from the original table to the temporary table, trigger writes into the log table, the data copy from the log table to the temporary table, and so on. So we should also expect higher WAL generation. Considering all the implications, pg_repack should be run in a low-activity time window to avoid undesirable consequences.
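
One way to avoid over-usage is to confirm that a table is actually bloated before repacking it, for example with the pgstattuple extension (a separate extension, shown here only as one possible check):

Shell
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- dead_tuple_percent and free_percent give a quick picture of bloat
SELECT table_len, dead_tuple_percent, free_percent
FROM pgstattuple('public.t1');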

Some of the important points to reiterate for end-users are:

  1. pg_repack needs to acquire the heavyweight AccessExclusive lock multiple times, but only temporarily.
  2. In a high-concurrency situation, it can be almost impossible to get an AccessExclusive lock.
  3. By default, pg_repack will attempt to cancel the conflicting statements if it is not able to gain the AccessExclusive lock within the wait time.
  4. It may proceed to terminate sessions if the total wait exceeds double the wait time. This can lead to undesirable outcomes and outages.
  5. The defaults of pg_repack may not be good for critical systems; use the --no-kill-backend option to make it gentler.
  6. No DDL is allowed against a table that is undergoing pg_repack, and any session that attempts it might get killed.


Author

Jobin Augustine

Jobin Augustine is a PostgreSQL expert and Open Source advocate and has more than 19 years of working experience as consultant, architect, administrator, writer, and trainer in PostgreSQL, Oracle and other database technologies. He has always been an active participant in the Open Source communities and his main focus area is database performance and optimization. He is a contributor to various Open Source Projects and is an active blogger and loves to code in C++ and Python. Jobin holds a Masters in Computer Applications and joined Percona in 2018 as a Senior Support Engineer. Prior to joining Percona, he worked at OpenSCG for 2 years as Architect and was part of the BigSQL core team, a complete PostgreSQL distribution offering. Previous to his work at OpenSCG, Jobin worked at Dell as Database Senior Advisor for 10 years and 5 years with TCS/CMC.

