3

PostgreSQL Partitioning using pg_partman

 5 months ago
source link: https://www.percona.com/blog/partitioning-in-postgresql-with-pg_partman-serial-based-trigger-based/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Partitioning in PostgreSQL With pg_partman (Serial-Based & Trigger-Based)

December 1, 2023

Neha Korukula

The PostgreSQL partition manager pg_partman is an open source extension widely supported and actively maintained by the PostgreSQL community. 

  • pg_partman is an extension that streamlines the creation and management of table partition sets, supporting both time-based and serial-based partitioning approaches.
  • You can use pg_partman to automate and create partitions by breaking large tables into smaller partitions, thereby enhancing performance.

In my previous blog post, PostgreSQL Partitioning Made Easy Using pg_partman (TimeBased), we discussed:

  • Installation of pg_partman
  • Native partitioning implementation using pg_partman (time-based)
  • Automatic maintenance of partitions

In this blog, we will discuss the partitioning of newly created and already existing tables via serial-based. 

Please read the previous blog post for the installation of pg_partman before continuing with this blog.

In setting up partitioning without a template table using pg_partman, you actively configure the extension to create partitions based on specific criteria.

Using pg_partman, only the range partitioning (more about types of partition) can be implemented, either time-based or serial-based, but list partitioning cannot be implemented, as it can be only predicted when a new partition has to be created with range partitioning on dates or timestamps or IDs.

Partitioning by range without template

You can create partitions based on a serial-based column using pg_partman. For example:

Step 1: Create a table with native partitioning type by range using serial-typed column.

partman=# CREATE TABLE students (id INT PRIMARY KEY , fname VARCHAR(20) , lname VARCHAR (20) , dob DATE NOT NULL ,joined DATE )
PARTITION BY RANGE (id);
CREATE TABLE

Step 2: Create parent to create initial child partitioned tables without template table.

partman=# SELECT partman.create_parent('public.students', p_control := 'id',p_type := 'native',p_interval := '100000',p_premake := 3, p_start_partition := '0');
create_parent
---------------
(1 row)

The structure of the parent table with its partitions created is as follows.

partman=# d+ students
Partitioned table "public.students"
Column | Type                  | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+-----------------------+-----------+----------+---------+----------+-------------+---
id     | integer               |           | not null |         | plain |
fname  | character varying(20) |           |          |         | extended |
lname  | character varying(20) |           |          |         | extended |
dob    | date                  |           | not null |         | plain |
joined | date                  |           |          |         | plain |
Partition key: RANGE (id)
Indexes:
         "students_pkey" PRIMARY KEY, btree (id)
Partitions: students_p100000 FOR VALUES FROM (100000) TO (200000),
            students_p200000 FOR VALUES FROM (200000) TO (300000),
            students_p300000 FOR VALUES FROM (300000) TO (400000),
            students_default DEFAULT

Here, we insert 1,000,000 rows to view data moving into partitioned tables.

partman=# INSERT INTO students (id ,fname,lname,dob ,joined) VALUES ( generate_series(1, 1000000) ,(array['Oswald', 'Henry', 'Bob', 'Vennie'])[floor(random() * 4 + 1)],
(array['Leo', 'Jack', 'Den', 'Daisy' ,'Woody'])[floor(random() * 5 + 1)], '1995-01-01'::date + trunc(random() * 366 * 3)::int,
generate_series('1/1/2022'::date, '08/31/2023'::date, '1 day'));
INSERT 0 1000000
partman=# SELECT students_p100000 , students_p200000 ,students_p300000,
students_default,students_totalcnt from ( SELECT COUNT(*) FROM students_p100000 ) AS students_p100000,
( SELECT COUNT(*) FROM students_p200000 ) AS students_p200000,( SELECT COUNT(*) FROM students_p300000) AS students_p300000,
( SELECT COUNT(*) FROM students_default) AS students_default , ( SELECT COUNT(*) FROM students ) AS students_totalcnt ;
students_p100000 | students_p200000 | students_p300000 | students_default | students_totalcnt
------------------+------------------+------------------+------------------+-------------------
(100000)         | (100000)         | (100000)         | (700000)         | (1000000)
(1 row)

Using functions, we can create child partitions and move the data from default to child tables.

partman.run_maintenance_proc → Create child partitions 

partman=# CALL partman.run_maintenance_proc();
ERROR: updated partition constraint for default partition "students_default" would be violated by some row
CONTEXT: SQL statement "ALTER TABLE public.students ATTACH PARTITION public.students_p400000 FOR VALUES FROM ('400000') TO ('500000')"

Here, we see that, while performing run_maintenance_proc to create needed partitions for existing data in default tables, it’s a violation of rows as it helps to create new partitions needed as per the data but not to insert data to new partitions. So, we can use the function partition_data_proc to move the data.

Using partman.partition_data_proc we can move data batch-wise to partitioned tables.

partman.partition_data_proc → Move data to partitioned tables

partman=# CALL partman.partition_data_proc ('public.students');'
NOTICE: Batch: 1, Rows moved: 99999 NOTICE: Batch: 2, Rows moved: 100000
NOTICE: Batch: 3, Rows moved: 100000 NOTICE: Batch: 4, Rows moved: 100000
NOTICE: Batch: 5, Rows moved: 100000 NOTICE: Batch: 6, Rows moved: 100000
NOTICE: Batch: 7, Rows moved: 100000 NOTICE: Batch: 8, Rows moved: 1
<strong>NOTICE: Total rows moved: 700000</strong>
NOTICE: Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
partman=# VACUUM ANALYZE public.students;
VACUUM

Here, we actively move the data from the default to the newly created partitioned table.

partman=# SELECT students_p100000 , students_p200000 ,students_p300000,
students_p400000 , students_p500000 ,students_p600000, students_p700000 , students_p800000 ,students_p900000,students_default,
students_totalcnt from ( SELECT COUNT(*) FROM students_p100000 ) AS students_p100000, ( SELECT COUNT(*) FROM students_p200000 )
AS students_p200000, ( SELECT COUNT(*) FROM students_p300000) AS students_p300000, ( SELECT COUNT(*) FROM students_p400000 ) AS
students_p400000, ( SELECT COUNT(*) FROM students_p500000 ) AS students_p500000, ( SELECT COUNT(*) FROM students_p600000)
AS students_p600000, ( SELECT COUNT(*) FROM students_p700000 ) AS students_p700000, ( SELECT COUNT(*) FROM students_p800000 )
AS students_p800000, ( SELECT COUNT(*) FROM students_p900000) AS students_p900000, ( SELECT COUNT(*) FROM students_default)
AS students_default , (SELECT COUNT(*) FROM students ) AS students_totalcnt ;
students_p100000 | students_p200000 | students_p300000 | students_p400000 | students_p500000 | students_p600000 | students_p700000 |
students_p800000 | students_p900000 | <strong>students_default</strong> | students_totalcnt
------------------+------------------+------------------+------------------+------------------
(100000)         | (100000)         | (100000)         | (100000)         | (100000)         | (100000)         | (100000)      |
(100000)         | (100000)         | <strong>(0)</strong>              | (1000000)
(1 row)

Partman type partitioning

Creating the parent table with the partman option (more about parent table creation options) allows us to create trigger-based partitions using pg_partman’s method of partitioning, which has less read-write performance compared to native/declarative partitioning.

Below, let’s see the implementation steps for trigger-based partitioning.

Partition creations using pg_partman for a non-declarative table: (trigger-based)

Step 1: Create a table with partman-type range partitioning using serial-typed columns and insert random data of 5,000 rows.

partman=# CREATE TABLE donors (id INT PRIMARY KEY , name VARCHAR(20) , bloodgroup VARCHAR (15) , last_donated DATE NOT NULL, contact_num VARCHAR(10));
CREATE TABLE
partman=# INSERT INTO donors (id , name , bloodgroup , last_donated , contact_num) VALUES (generate_series(1, 5000) ,'user_' || trunc(random()*100) ,
(array['A+ group', 'A- group', 'O- group', 'O+ group','AB+ group','AB- group','B+ group','B- group'])[floor(random() * 8 + 1)] , '2022-01-01'::date + trunc(random() * 366 * 1)::int,
CAST(1000000000 + floor(random() * 9000000000) AS bigint));
INSERT 0 5000

Step 2: Create parent to create initial child partitioned tables without template table.

partman=# SELECT partman.create_parent('public.donors', p_control := 'id',p_type := 'partman',p_interval := '1000');
create_parent
---------------
(1 row)

The table structure has been created with child tables based on the default premake value of four on partman.part_config and a trigger named donors_part_trig.

partman=# d+ donors
Table "public.donors"
Column        | Type                  | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------------+-----------------------+-----------+----------+---------+----------+-----------
id            | integer               |           | not null |         | plain |
name          | character varying(20) |           |          |         | extended |
bloodgroup    | character varying(15) |           |          |         | extended |
last_donated  | date                  |           | not null |         | plain |
contact_num   | character varying(10) |           |          |         | extended |
Indexes:
       "donors_pkey" PRIMARY KEY, btree (id)
Triggers:
donors_part_trig BEFORE INSERT ON donors FOR EACH ROW EXECUTE FUNCTION donors_part_trig_func()
Child tables: donors_p1000,donors_p2000,donors_p3000,donors_p4000, donors_p5000,
              donors_p6000,donors_p7000, donors_p8000,donors_p9000
Access method: heap

Using partman.partition_data_proc, we can move data batch-wise to partitioned tables.

partman=# CALL partman.partition_data_proc ('public.donors');
NOTICE: Batch: 1, Rows moved: 999 NOTICE: Batch: 2, Rows moved: 1000
NOTICE: Batch: 3, Rows moved: 1000 NOTICE: Batch: 4, Rows moved: 1000
NOTICE: Batch: 5, Rows moved: 1000 NOTICE: Batch: 6, Rows moved: 1
NOTICE: Total rows moved: 5000
NOTICE: Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data

Consequently, upon inserting new data into the table, we notice that if the partition exists, the data seamlessly moves into the respective partitions; however, if the partition doesn’t exist, the data remains within the main table.

partman=# INSERT INTO donors (id , name , bloodgroup , last_donated , contact_num) VALUES (generate_series(5001, 7000) ,'user_' || trunc(random()*100) ,
(array['A+ group', 'A- group', 'O- group', 'O+ group','AB+ group','AB- group','B+ group','B- group'])[floor(random() * 8 + 1)] , '2022-01-01'::date + trunc(random() * 366 * 1)::int,
CAST(1000000000 + floor(random() * 9000000000) AS bigint));
INSERT 0 0

In this scenario, inserting data where the partition doesn’t exist results in directing the data to the main table. Subsequently, manual movement of this data occurs in batches to the newly created partitions.

SELECT donors_p1000,donors_p2000,donors_p3000,donors_p4000,donors_p5000,
donors_p6000,donors_p7000,donors_p8000,donors_p9000,donors_totalcnt from
( SELECT COUNT(*) FROM donors_p1000 ) AS donors_p1000, ( SELECT COUNT(*) FROM donors_p2000) AS donors_p2000, ( SELECT COUNT(*) FROM donors_p3000 ) AS donors_p3000,
( SELECT COUNT(*) FROM donors_p4000 ) AS donors_p4000,( SELECT COUNT(*) FROM donors_p5000) AS donors_p5000,( SELECT COUNT(*) FROM donors_p6000 ) AS donors_p6000,
( SELECT COUNT(*) FROM donors_p7000 ) AS donors_p7000,( SELECT COUNT(*) FROM donors_p8000) AS donors_p8000, ( SELECT COUNT(*) FROM donors_p9000) AS donors_p9000,
( SELECT COUNT(*) FROM donors ) AS donors_totalcnt ;
donors_p1000 | donors_p2000 | donors_p3000 | donors_p4000 | donors_p5000 | donors_p6000 | donors_p7000 | donors_p8000 |
donors_p9000 | donors_totalcnt
--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+-----------------
(1000)       | (1000)       | (1000)       | (1000)       | (1000)       | (1000)       | (0)          | (0)          |
(0)          | (6000)
(1 row)
partman=# INSERT INTO donors (id , name , bloodgroup , last_donated , contact_num) VALUES (generate_series(14001, 15000) ,'user_' || trunc(random()*100) ,
(array['A+ group', 'A- group', 'O- group', 'O+ group','AB+ group','AB- group','B+ group','B- group'])[floor(random() * 8 + 1)] , '2022-01-01'::date + trunc(random() * 366 * 1)::int,
CAST(1000000000 + floor(random() * 9000000000) AS bigint));
INSERT 0 1000
partman=# CALL partman.partition_data_proc('public.donors');
NOTICE: Batch: 1, Rows moved: 999
NOTICE: Batch: 2, Rows moved: 1
NOTICE: Total rows moved: 1000
NOTICE: Ensure to VACUUM ANALYZE the parent (and source table if used) after partitioning data
partman=# VACUUM ANALYZE donors;
VACUUM

Employing functions to proactively create partitioned tables and migrate data enables the proactive planning of partitions ahead of inserts, preventing data accumulation in default partitions or the main table.

The blog I mentioned earlier contains details about the steps for automatic partitioning.

For more details on pg_partman and its latest updates, check out https://www.postgresql.org/about/news/pg_partman-470-released-2495/.

References:

Performing ETL Using Inheritance in PostgreSQL

PostgreSQL Partitioning Using Traditional Methods

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

Share This Post!

Subscribe

Connect with
guest

Label

0 Comments

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK