4

Quick Data Archival in MySQL Using Partitions

 1 year ago
source link: https://www.percona.com/blog/quick-data-archival-in-mysql-using-partitions/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Quick Data Archival in MySQL Using Partitions

Quick Data Archival in MySQL Using Partitions

Space constraint has been an endless and painstaking process for many of us, especially in systems that have a high number of transactions and data growth exceeding hundreds of GBs in a matter of days. In this blog, I will share a solution to remove this space and remove rows from tables in a few seconds regardless of the size of a table without causing any additional load on the database using table partitions.

The first approach that comes to anyone’s mind for deleting the row is using a DELETE query in SQL. Suppose, one wants to delete rows from a table that are older than one year—the query for such operations would be like this:

MySQL
DELETE FROM salaries WHERE from_date <DATE_SUB(NOW(),INTERVAL 1 YEAR);

The above query is pretty straightforward but there are a few caveats:

  • Server business will grow exponentially and could impact the usual traffic on the server.
  • To speed up the above query we need to create an appropriate index so that the query can be executed in minimal time and have less impact on server performance.
  • If we are using binlog_format as ROW, a huge number of bin logs would be created which could choke I/O of servers and require extra cleanup.
  • Even after deleting all the rows, space won’t be freed. MySQL won’t shrink tablespace and storage won’t be released to the file system. To release it to the file system, we need to recreate the table by running ANALYZE or an ALTER.

One way to get around this is using the Percona pt-archiver. There is an excellent blog you may want to follow for mysql data archival with minimal disruption. But the archiver process will take time as it also considers system load, replica lag, and specified parameters to throttle the archiving process without affecting ongoing traffic.

What I propose here is using MySQL partitioning, which is a much faster approach.

What is partitioning?

In MySQL, the InnoDB storage engine has long supported the notion that a tablespace and the MySQL Server, even prior to the introduction of partitioning, could be configured to employ different physical directories for storing different databases. Partitioning takes this notion a step further, allowing users to save portions of the table according to a user-defined rule. The user-selected rule by which data can be divided is known as a partitioning function, which could be a simple rule against a set of ranges or value lists, an internal hashing function, or a linear hashing function.

When we partition the table data-file is split across multiple partitions of smaller data-files. The operation we do against that specific range of data, will not affect the whole table as only one data file is touched.

Table without partition:

Shell
centos: employees # ls -lh salaries*#*
-rw-r-----. 1 mysql mysql 104G Oct 11 05:47 salaries.ibd

Table with partition:

Shell
centos: employees # ls -lh salaries*#*
-rw-r-----. 1 mysql mysql 9.0G Oct 11 05:47 salaries#P#p01.ibd
-rw-r-----. 1 mysql mysql 9.0G Oct 11 05:47 salaries#P#p02.ibd
-rw-r-----. 1 mysql mysql  10G Oct 11 05:47 salaries#P#p03.ibd
-rw-r-----. 1 mysql mysql  11G Oct 11 05:47 salaries#P#p04.ibd
-rw-r-----. 1 mysql mysql  11G Oct 11 05:47 salaries#P#p05.ibd
-rw-r-----. 1 mysql mysql  12G Oct 11 05:47 salaries#P#p06.ibd
-rw-r-----. 1 mysql mysql  12G Oct 11 05:47 salaries#P#p07.ibd
-rw-r-----. 1 mysql mysql  13G Oct 11 05:47 salaries#P#p08.ibd
-rw-r-----. 1 mysql mysql  14G Oct 11 05:47 salaries#P#p09.ibd
-rw-r-----. 1 mysql mysql  14G Oct 11 05:47 salaries#P#p10.ibd
-rw-r-----. 1 mysql mysql  15G Oct 11 05:47 salaries#P#p11.ibd
-rw-r-----. 1 mysql mysql  15G Oct 11 05:47 salaries#P#p12.ibd
-rw-r-----. 1 mysql mysql  16G Oct 11 05:47 salaries#P#p13.ibd
-rw-r-----. 1 mysql mysql  16G Oct 11 05:47 salaries#P#p14.ibd
-rw-r-----. 1 mysql mysql  17G Oct 11 05:47 salaries#P#p15.ibd
-rw-r-----. 1 mysql mysql  17G Oct 11 05:47 salaries#P#p16.ibd
-rw-r-----. 1 mysql mysql  16G Oct 11 05:47 salaries#P#p17.ibd
-rw-r-----. 1 mysql mysql  13G Oct 11 05:47 salaries#P#p18.ibd
-rw-r-----. 1 mysql mysql  96M Oct 11 05:45 salaries#P#p19.ibd

How will partitioning help in quickly deleting rows and releasing space?

To archive old data in a partitioned table, we will create an empty table that is identical to the original table in structure but does not have multiple partitions like the original table. Once this table is created we will swap the newly created empty table with one of the partitions of the original table in a matter of seconds.

In this example, we are using a table partitioned on the basis of a date range.

MySQL
mysql>show create table salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  `response_code` blob NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE  COLUMNS(from_date)
(PARTITION p01 VALUES LESS THAN ('1985-12-31') ENGINE = InnoDB,
 PARTITION p02 VALUES LESS THAN ('1986-12-31') ENGINE = InnoDB,
 PARTITION p03 VALUES LESS THAN ('1987-12-31') ENGINE = InnoDB,
 PARTITION p04 VALUES LESS THAN ('1988-12-31') ENGINE = InnoDB,
 PARTITION p05 VALUES LESS THAN ('1989-12-31') ENGINE = InnoDB,
 PARTITION p06 VALUES LESS THAN ('1990-12-31') ENGINE = InnoDB,
 PARTITION p07 VALUES LESS THAN ('1991-12-31') ENGINE = InnoDB,
 PARTITION p08 VALUES LESS THAN ('1992-12-31') ENGINE = InnoDB,
 PARTITION p09 VALUES LESS THAN ('1993-12-31') ENGINE = InnoDB,
 PARTITION p10 VALUES LESS THAN ('1994-12-31') ENGINE = InnoDB,
 PARTITION p11 VALUES LESS THAN ('1995-12-31') ENGINE = InnoDB,
 PARTITION p12 VALUES LESS THAN ('1996-12-31') ENGINE = InnoDB,
 PARTITION p13 VALUES LESS THAN ('1997-12-31') ENGINE = InnoDB,
 PARTITION p14 VALUES LESS THAN ('1998-12-31') ENGINE = InnoDB,
 PARTITION p15 VALUES LESS THAN ('1999-12-31') ENGINE = InnoDB,
 PARTITION p16 VALUES LESS THAN ('2000-12-31') ENGINE = InnoDB,
 PARTITION p17 VALUES LESS THAN ('2001-12-31') ENGINE = InnoDB,
 PARTITION p18 VALUES LESS THAN ('2002-12-31') ENGINE = InnoDB,
 PARTITION p19 VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB) */
1 row in set (0.00 sec)

The size of this table is around 105 GB.

MySQL
mysql>select table_schema, table_name, table_rows, round(data_length / 1024 / 1024 / 1024 ) DATA_MB, round(index_length / 1024 / 1024 / 1024 ) INDEX_GB, round(data_free / 1024 / 1024 / 1024) FREE_MB, round(data_length / 1024 / 1024 / 1024 )+round(index_length / 1024 / 1024 )+round(data_free / 1024 / 1024 /1024 ) TOTAL_MB from information_schema.tables where table_schema='employees' and table_name='salaries';
+--------------+------------+------------+---------+----------+---------+---------+
| table_schema | table_name | table_rows | DATA_GB | INDEX_GB | FREE_GB | TOTAL_GB|
+--------------+------------+------------+---------+----------+---------+---------+
| employees    | salaries   |    2845404 |     105 |        0 |       0 |     105 |
+--------------+------------+------------+---------+----------+---------+---------+
1 row in set (0.00 sec)

We will create a table that is identical to the partitioned table but we will remove partitioning from this newly created table.

MySQL
mysql> CREATE TABLE salaries_swap_p3 LIKE salaries;
Query OK, 0 rows affected (0.23 sec)
mysql> ALTER TABLE salaries_swap_p3 REMOVE PARTITIONING;
Query OK, 0 rows affected (0.32 sec)
Records: 0  Duplicates: 0  Warnings: 0

At this moment, this partition of the table has all the data and is occupying the space in the file system.

MySQL
mysql> SELECT count(*) FROM salaries PARTITION (p03);
+----------+
| count(*) |
+----------+
|    57395 |
+----------+
1 row in set (0.00 sec)
mysql> SELECT count(*) FROM salaries_swap_p3;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.02 sec)

Space reserved by the file system:

Shell
[root@ip-172-31-83-227 employees]# ls -lrth *salaries*3*
-rw-r-----. 1 mysql mysql  96K Nov 21 03:54 salaries_swap_p3.ibd
-rw-r-----. 1 mysql mysql 8.5K Nov 21 03:56 salaries_swap_p3.frm
-rw-r-----. 1 mysql mysql  10G Nov 21 03:56 salaries#P#p03.ibd

To exchange partitions we need to execute a query that requires metadata lock on the table and is done almost instantaneously.

MySQL
mysql> ALTER TABLE salaries EXCHANGE PARTITION p03 WITH TABLE salaries_swap_p3;

Since this operation requires metadata lock and is very fast, it is advised to run this operation in low-traffic periods.

MySQL
mysql> show processlist;
+----+------+-----------+-----------+---------+------+---------------------------------+------------------------------------------------------------------------+-----------+---------------+
| Id | User | Host      | db        | Command | Time | State                           | Info                                                                   | Rows_sent | Rows_examined |
+----+------+-----------+-----------+---------+------+---------------------------------+------------------------------------------------------------------------+-----------+---------------+
|  4 | root | localhost | employees | Query   |   0  | Waiting for table metadata lock | ALTER TABLE salaries EXCHANGE PARTITION p03 WITH TABLE salaries_swap_p3 |         0 |             0 |
+----+------+-----------+-----------+---------+------+---------------------------------+------------------------------------------------------------------------+-----------+---------------+
1 row in set (0.00 sec)

After the swapping, the new table has all the data of the partition and the partition of the table is empty.

MySQL
mysql>  SELECT count(*) FROM salaries PARTITION (p03);
+----------+
| count(*) |
+----------+
+----------+
1 row in set (0.00 sec)
mysql>  SELECT count(*) FROM salaries_swap_p3;
+----------+
| count(*) |
+----------+
|    57395 |
+----------+
1 row in set (0.02 sec)

Space freed by filesystem:

Shell
[root@ip-172-31-83-227 employees]# ls -lrth *salaries*3*
-rw-r-----. 1 mysql mysql  10G Nov 21 03:54 salaries_swap_p3.ibd
-rw-r-----. 1 mysql mysql 8.5K Nov 21 03:56 salaries_swap_p3.frm
-rw-r-----. 1 mysql mysql  96K Nov 21 03:56 salaries#P#p03.ibd

Now, you can proceed to drop the swapped table without locking the main table or can use this technique as well to speed up the drop: Speed Up Your Large Table Drops in MySQL.

Shell
mysql> drop table if exists salaries_swap_p3; 
Query OK, 0 rows affected (5.23 sec)

Conclusion

What we did here is use the MySQL partitioning-related functions to answer a customer challenge of purging data older than x days, sooner. But, partitioning is not a one size fits all solution, partitioning comes with its own caveats. If there are a lot of secondary indexes on a table and the search queries are not only limited to the partition key, the query performance could deteriorate exponentially. If partitioning the table is not an option, pt-archiver is a great tool that would automate the DELETE statement on your databases with minimal monitoring.

Further Reference:

https://dev.mysql.com/doc/refman/5.7/en/partitioning.html

https://dev.mysql.com/doc/refman/5.7/en/partitioning-management-exchange.html

Want to archive tables? Use Percona Toolkit’s pt-archiver


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK