0

Revolutionizing PostgreSQL Schema Changes with pg-osc

 3 months ago
source link: https://www.mydbops.com/blog/postgresql-schema-changes-with-pg_osc/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Revolutionizing PostgreSQL Schema Changes with pg_osc

Revolutionizing PostgreSQL Schema Changes with pg_osc

Share this article

Email address

In the world of PostgreSQL database management, the significance of schema modifications cannot be overstated. Enter pg_osc, or PostgreSQL Online Schema Change – an open-source tool designed to revolutionize the execution of schema modifications without imposing substantial downtime or locking the entire database. This tool proves invaluable for making crucial adjustments to the database structure, ranging from adding or removing columns to changing data types and creating or dropping indexes.

Lets delve into the intricate workings of pg_osc, exploring its benefits, installation via Docker, and real-world examples showcasing its prowess.

Key Features of pg_osc

At its core, pg_osc operates by creating a shadow table to facilitate alterations to the main table. This involves the meticulous creation of a shadow table mirroring the production table with the desired structural modifications. Subsequently, the tool seamlessly copies content to the shadow table, ensuring no impact or lock on the production table. The final step involves a swift swap between the two tables once all data has been successfully transferred.

The necessity of pg_osc in PostgreSQL environments becomes apparent when considering its ability to minimize downtime, enable continuous operations during schema changes without blocking concurrent transactions, and optimize overall database performance.

Benefits of pg_osc

Here are some reasons why we need pg_osc in PostgreSQL:

  1. Minimize downtime: Traditional schema changes in PostgreSQL, like using ALTER TABLE, can require exclusive locks on the table being modified, which can lead to database downtime for the duration of the operation. pg_osc allows you to make these changes with minimal interruption to database operations.

  1. Avoid blocking: While a schema change is in progress, concurrent transactions can continue to read and write data to the affected table without being blocked, improving overall database performance and responsiveness.

  1. Performance improvements: pg_osc is designed to minimize the impact on database performance during schema changes. It can use various strategies to optimize the change process, which can be more efficient than standard ALTER TABLE operations.

  2. Safety and reliability: pg_osc is engineered to ensure the safety and consistency of schema changes, reducing the risk of data corruption or loss during the operation.

Flow Diagram of pg_osc and How It Works

The flow diagram elucidates the seamless operation of pg_osc. The primary table, shadow table, and audit table each play a crucial role in executing schema changes.

  • Primary table: The primary table is the target table for executing a possible schema change in a database.

  • Shadow table: The shadow table is a duplicated version of an existing primary table.

  • Audit table: An audit table is a table designed to store updates, inserts, or deletes made to a primary table.

PostgreSQL pg_osc

Fig: Flow Diagram of pg_osc

Key Steps in Executing Schema Changes with pg-osc

  • Create an audit table to log changes, using triggers for inserts, updates, and deletes on the parent table.

  • Acquire an ACCESS EXCLUSIVE lock on the parent table to ensure exclusive access for subsequent schema modifications.

  • Develop a new shadow table with the desired schema changes for the transition.

  • Migrate data by copying all rows from the old parent table to the new shadow table.

  • Enhance query performance by building indexes on the new shadow table.

  • Ensure historical consistency by replaying changes from the audit table to the shadow table.

  • Clean up the audit table by deleting rows that have been replayed and are no longer needed.

  • Finalize the transition by acquiring an ACCESS EXCLUSIVE Lock when delta changes are minimal.

  • Activate the new structure by swapping table names between the shadow table and the parent table.

  • Safeguard data integrity during schema transition by updating references in other tables (foreign keys) with NOT VALID constraints.

  • Optimize query performance by analyzing the new table with the ANALYZEoperation.

  • Validate NOT VALID foreign keys to ensure data integrity after the schema transition.

  • Optionally drop the old parent table if it is no longer needed in the system.

Installation via Docker

Installing pg_osc is simplified through Docker. The command docker pull shayonj/pg-osc:latest fetches the latest version, streamlining the installation process.

docker pull shayonj/pg-osc:latest

Examples with Demo:

  1. Adding a Column: Adding a column, such as member, to the team_member table is demonstrated with the following command:

    docker run --network host -it --rm shayonj/pg-osc:latest     pg-online-schema-change perform     --alter-statement 'ALTER TABLE team_member ADD COLUMN "member" BOOLEAN DEFAULT TRUE;'     --dbname "postgres" --password "password"     --host "localhost"     --username "postgres"

    Before

    pg_osc
    Before adding a column

    After (adding column member by using pg-osc)

    pg_osc
    After adding a column
  2. Renaming a Column: Renaming a column, for instance, changing birthdate to new_birthdate in the team_member table is showcased as follows:

    docker run --network host -it --rm shayonj/pg-osc:latest     pg-online-schema-change perform     --alter-statement 'ALTER TABLE team_member RENAME COLUMN birthdate TO new_birthdate;'     --dbname "postgres" --password "password"     --host "localhost"     --username "postgres"      --drop

    Note:- Here we have used –drop, that drops the duplicate table.

    Before

    pg_osc
    Before renaming a column

    After (renaming a column by using pg-osc)

    pg_osc
    After renaming a column
  3. Multiple Alter Statements: Demonstrating multiple alter statements involves adding the column permanent_member, renaming username to Name, and dropping the purchased column in the member table:

    docker run --network host -it --rm shayonj/pg-osc:latest     pg-online-schema-change perform     --alter-statement 'ALTER TABLE member ADD COLUMN Permanent_member BOOLEAN DEFAULT TRUE; ALTER TABLE member RENAME COLUMN username TO Name; ALTER TABLE member DROP column purchased;'     --dbname "postgres" --password "password"     --host "localhost"     --username "postgres"      --wait-time-for-lock 4     --kill-backends

    Before executing multiple Alter

    pg_osc
    Before executing multiple ALTER statements

    After executing multiple Alter

    pg_osc
    After executing multiple ALTER statements
  4. Kill Other Processes After 4s: When working on a busy database table, pg_osc offers the --kill-backends feature to deal with competition for locks.

    When pg-osc attempts to acquire an ACCESS EXCLUSIVE lock, it briefly holds and then releases it. If other processes are attempting to acquire the lock, you can configure pg-osc to wait for a specific duration before forcefully stopping those competing processes. This helps manage conflicts and ensures a smoother operation on a table with high activity.

    --wait-time-for-lock 4 (wait for 4 seconds)

    --kill-backends (terminate the backend process after 4 seconds)

    docker run --network host -it --rm shayonj/pg-osc:latest     pg-online-schema-change perform     --alter-statement 'ALTER TABLE member ADD COLUMN Permanent_member BOOLEAN DEFAULT TRUE; ALTER TABLE member RENAME COLUMN username TO Name; ALTER TABLE member DROP column purchased;'     --dbname "postgres" --password "password"     --host "localhost"     --username "postgres"      --wait-time-for-lock 4     --kill-backends 

    Before

    pg_osc
    Before --kill-backends

    After (using  --wait-time-for-lock 4,  --kill-backends)

    pg_osc
    After --kill-backends

Limitations

  • pg-osc does not currently support partitioned tables.

  • For schema changes with pg-osc, a primary key is necessary on the table to uniquely identify rows during the replay of changes.

  • pg-osc obtains an ACCESS EXCLUSIVE lock on the parent table twice during the schema change: first when setting up triggers and creating a shadow table, and next when performing the table swap and updating foreign key references.

  • Adequate disk space is required for the schema change operation due to the nature of duplicating a table.

  • Indexes, constraints, and sequence names are altered and lose their original names during the schema change operation.

Advanced Features of pg_osc

  • pg-osc ensures data consistency during column changes, supporting additions, drops, or renames without data loss, and maintaining data integrity through schema modifications.

  • It minimizes locks during schema changes, reducing the potential for blocking concurrent transactions and facilitating smoother database modifications.

  • The tool adeptly handles indexes and foreign keys, crucial for data integrity and optimal query performance.

  • Users have the option to either drop or retain old tables after schema changes, providing flexibility in managing historical data according to specific requirements.

  • pg-osc offers backend management features, including the termination of competing backends, similar to pg_repack, facilitating the quick setup of shadow tables during schema changes.

  • For accurate data migration, the tool backfills old and new columns as data is copied from the primary to the shadow table, ensuring correctness in the new schema structure.

  • Users can specify a custom copy statement with pg-osc, offering control and customization options during the data migration process.

In conclusion, pg_osc is a vital tool for efficient schema modifications in PostgreSQL, minimizing downtime, avoiding blocking, and ensuring data consistency. Its seamless process, advanced features, and practical examples make it an indispensable solution.

Consider harnessing the power of pg_osc for optimized PostgreSQL database management. Reach out to explore our Managed PostgreSQL Services and Consulting expertise, ensuring a smooth and efficient database evolution. Contact Us.




About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK