MySQL Compressed Binary Logs
source link: https://mysql.wisborg.dk/2020/05/07/mysql-compressed-binary-logs/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
MySQL Compressed Binary Logs –Skip to content
On a busy server, the binary logs can end up being one of the largest contributors to amount of disk space used. That means higher I/O, larger backups (you are backing up your binary logs, right?), potentially more network traffic when replicas fetch the logs, and so on. In general, binary logs compress well, so it has been a long time wish for a feature that allowed you to compress the logs while MySQL are still using them. Starting from MySQL 8.0.20 that is now possible. I will take a look at the new feature in this post.
Configuration
The binary log compression feature is controlled by two variables, one for enabling the feature and another to specify the compression level. These are summarized in the below table.
The two option names are fairly self explanatory. binlog_transaction_compression
specifies whether compression is enabled or not, and binlog_transaction_compression_level_zstd
specifies the compression level. The high level, the (in principle – see also the tests later in the blog) better compression at the cost of increased CPU.
Both options can be set dynamically both at the global and session scopes. However, it is not allowed to change the session values in the middle of a transaction. If you do that, you get an error like this:
mysql> SET SESSION binlog_transaction_compression = ON;
ERROR: 1766 (HY000): The system variable binlog_transaction_compression cannot be set when there is an ongoing transaction.
While it is easy to configure binary log compression, there are some limitations, you should be aware of.
Limitations
The short version of the limitations boils down to only transactional row events are compressed. Statement based events, GTID information, rotation events, row-based events for non-transactional tables, etc. are not compressed. Also if you have a transaction and add non-transactional changes, then neither will be compressed.
If you use all default values for your binary logs and use the InnoDB storage engine (the default), then compression will work. For the complete list of limitations, see Binary Log Transaction Compression in the manual.
Information
This also means that each transaction is compressed on its own. See the examples later in the blog for more on what that means.
As I usually preach, monitoring is key to understanding your system. What does the binary log compression feature support in terms of monitoring?
Monitoring
There are two ways to monitor the performance of the binary log compression feature. A Performance Schema table with compression statistics and new stage events.
The binary_log_transaction_compression_stats
table in the Performance Schema includes statistics since the last restart of MySQL (or last time the table was truncated) for the compression. The table has two rows for the binary log, one for compressed events and one for events that are not compressed. Replicas will similarly have two rows for the relay log. An example of the content as well as the binary logs at the same time is:
mysql> SELECT * FROM binary_log_transaction_compression_stats\G
*************************** 1. row ***************************
LOG_TYPE: BINARY
COMPRESSION_TYPE: ZSTD
TRANSACTION_COUNTER: 15321
COMPRESSED_BYTES_COUNTER: 102796461
UNCOMPRESSED_BYTES_COUNTER: 252705572
COMPRESSION_PERCENTAGE: 59
FIRST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:30730
FIRST_TRANSACTION_COMPRESSED_BYTES: 313
FIRST_TRANSACTION_UNCOMPRESSED_BYTES: 363
FIRST_TRANSACTION_TIMESTAMP: 2020-05-07 19:26:37.744437
LAST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:46058
LAST_TRANSACTION_COMPRESSED_BYTES: 712
LAST_TRANSACTION_UNCOMPRESSED_BYTES: 1627
LAST_TRANSACTION_TIMESTAMP: 2020-05-07 19:38:14.149782
*************************** 2. row ***************************
LOG_TYPE: BINARY
COMPRESSION_TYPE: NONE
TRANSACTION_COUNTER: 20
COMPRESSED_BYTES_COUNTER: 5351
UNCOMPRESSED_BYTES_COUNTER: 5351
COMPRESSION_PERCENTAGE: 0
FIRST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:30718
FIRST_TRANSACTION_COMPRESSED_BYTES: 116
FIRST_TRANSACTION_UNCOMPRESSED_BYTES: 116
FIRST_TRANSACTION_TIMESTAMP: 2020-05-07 19:26:37.508155
LAST_TRANSACTION_ID: 74470a0c-8ea4-11ea-966e-080027effed8:31058
LAST_TRANSACTION_COMPRESSED_BYTES: 116
LAST_TRANSACTION_UNCOMPRESSED_BYTES: 116
LAST_TRANSACTION_TIMESTAMP: 2020-05-07 19:30:30.840767
2 rows in set (0.0026 sec)
mysql> SHOW BINARY LOGS;
+---------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+---------------+-----------+-----------+
| binlog.000142 | 240 | No |
| binlog.000143 | 4933 | No |
| binlog.000144 | 28238118 | No |
| binlog.000145 | 24667167 | No |
| binlog.000146 | 39221771 | No |
| binlog.000147 | 11944631 | No |
| binlog.000148 | 196 | No |
+---------------+-----------+-----------+
7 rows in set (0.0005 sec)
The example is generated with GTIDs enabled, and all of the binary logs except binlog.000142
have been created after the last restart.
This shows that there has been 15341 transactions (74470a0c-8ea4-11ea-966e-080027effed8:30718-46058
) of which 15321 have been compressed with an average of 59% compression (compressing 252,705,572 bytes to 102,796,461 bytes). There are also statistics for the first and most recent transaction that was compressed. Similarly there have been 20 “transactions” that could not be compressed.
One thing is to know the compression rate, you also need to know the price you pay in terms of the overhead of performing the compression and decompression. That can be monitored using two stage events in the Performance Schema:
stage/sql/Compressing transaction changes.
stage/sql/Decompressing transaction changes.
(Yes, the period is part of the name.)
Neither are enabled by default and for the instruments to be collection, you will also need to enable the events_stages_current
consumer. For example to enable these in the configuration file:
[mysqld]
performance-schema-instrument = "stage/sql/%Compressing transaction changes.=ON"
performance-schema-consumer-events-stages-current = ON
Warning
There is some overhead by enabling instrumentation of stages. Make sure you test the impact first, if you consider enabling these on a production system.
You can compare that with the wait/io/file/sql/binlog
event (enabled by default) which is the time spent doing I/O. For example:
mysql> SELECT EVENT_NAME, COUNT_STAR,
FORMAT_PICO_TIME(SUM_TIMER_WAIT) AS total_latency,
FORMAT_PICO_TIME(MIN_TIMER_WAIT) AS min_latency,
FORMAT_PICO_TIME(AVG_TIMER_WAIT) AS avg_latency,
FORMAT_PICO_TIME(MAX_TIMER_WAIT) AS max_latency
FROM performance_schema.events_stages_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'stage/sql/%transaction changes.'\G
*************************** 1. row ***************************
EVENT_NAME: stage/sql/Compressing transaction changes.
COUNT_STAR: 15321
total_latency: 6.10 s
min_latency: 22.00 ns
avg_latency: 397.96 us
max_latency: 982.12 ms
*************************** 2. row ***************************
EVENT_NAME: stage/sql/Decompressing transaction changes.
COUNT_STAR: 0
total_latency: 0 ps
min_latency: 0 ps
avg_latency: 0 ps
max_latency: 0 ps
2 rows in set (0.0008 sec)
mysql> SELECT *
FROM sys.io_global_by_wait_by_latency
WHERE event_name = 'sql/binlog'\G
*************************** 1. row ***************************
event_name: sql/binlog
total: 27537
total_latency: 4.83 min
avg_latency: 10.51 ms
max_latency: 723.92 ms
read_latency: 138.25 us
write_latency: 290.69 ms
misc_latency: 4.82 min
count_read: 37
total_read: 1002 bytes
avg_read: 27 bytes
count_write: 16489
total_written: 99.26 MiB
avg_written: 6.16 KiB
1 row in set (0.0015 sec)
In this case, I have used the sys.io_global_by_wait_by_latency
view as a short cut as it automatically makes the latencies human readable. For the latencies of the compression/decompression stages, the FORMAT_PICO_TIME()
function convert them.
In this example, MySQL has spent 6.21 seconds compressing binary logs averaging just under 400 microseconds per transaction. For comparison, 4.8 minutes have been spent doing I/O on the binary log files.
You should check the time spent on writing and reading the binary log files before enabling compression, so you can determine the change in performance. You should also check the change in CPU usage.
Book
If you want to learn more about MySQL performance tuning, then I have written MySQL 8 Query Performance Tuning published by Apress.
The book is available from Apress (print and DRM free ePub+PDF), Amazon (print and Kindle), Barnes & Noble (print), and others.
In the above outputs, it showed the compression was 59%, but how is it for different kinds of workloads?
Examples – Workloads
In order to determine how well the compression works, I have performed a series of tasks and compared the size of the binary logs with and without compression. For comparison, I have also tried to manually compress the binary logs from the series of tests without compression to see the optimal compression (as opposed to the per-transaction compression used by MySQL). The tests have been performed with the default configuration except the settings necessary for the given test.
The following workloads have been performed:
- Bulk load: Loading the
employees
sample database. - Bulk update: Updating the
salary
column of all rows in theemployees.salaries
table:UPDATE employees.salaries SET salary = salary + 1
. - Bulk load: Populating four tables with 100,000 rows using
sysbench
as preparation for theoltp_read_write
benchmark. - OLTP workload: Running the
sysbench
oltp_read_write
benchmark with--events=15000
using the four tables from the previous test. - Single row deletes: Deleting all 100,000 rows in one of the tables from the
sysbench
test. The rows are deleted one by one which represents a worst case scenario for the compression as the transactions are very small and there is only the before image in the binary log for each deleted row.
The single row deletes can be performed easily using MySQL Shell, for example in the Python language mode:
from
datetime
import
datetime
for
i
in
range
(
100000
):
if
i
%
5000
=
=
0
:
print
(
'{0}: i = {1}'
.
format
(datetime.now(), i))
session.run_sql(
'DELETE FROM sbtest.sbtest1 WHERE id = {0}'
.
format
(i
+
1
))
print
(
'{0}: i = 1000000'
.
format
(datetime.now()))
The size of each task generates uncompressed binary logs in the range of 25 MiB to 82 MiB.
The tests were performed with the following settings:
- No compression
- Compression enabled
- Encrypted but no compressed
- Encryption and compression enabled
- No compression in MySQL + compressing with
zstd
The compression with zstd
was chosen as MySQL uses the Zstandard compression algorithm. If you want to try yourself, you can download the Zstandard source code from Facebook's GitHub repository which also includes the compilation instructions.
The resulting size of the binary log in bytes for each combination can be found in the following table.
TestNormalCompressEncryptEncrypt +
Compresszstd
Load employees
6637863928237933663791512823863426892387Bulk Update8569832024667051856988322466767724779953sysbench
prepare7636774039221052763682523922180639775067sysbench
run261902001193371326190712119365618468625Single Row Deletes4730015639492400473006683963768416525219
That the encrypted binary log compresses as well as the unencrypted suggest that the compression is done before the encryption. (Encrypted data does not compress well.) Because there is no difference whether encryption is enabled, only the normal (uncompressed), compressed, and zstd
results will be discussed further. The binary log sizes can also be seen in this figure:
Unsurprising the bulk loads and update perform the best with compressed binary log being 29% to 51% of the uncompressed size. The sysbench
OLTP workload also compresses well ending up being 46% of the size. Also unsurprising, the single row deletes do not compress nearly as well with the compressed binary log being 83% of the size of the uncompressed binary log.
When comparing the binary logs compressed by MySQL with those compressed manually with zstd
, the file size is around the same for the bulk loads reflecting that for large transactions, compressing on a per-transaction basis works as well as compressing the whole file. As the transaction size becomes smaller, the relative efficiency of the per-transaction compression reduces which is particularly visible for the single row deletes.
The other factor to consider is the compression level.
Examples – Compression Level
There are some oddities when it comes to the compression level, and the short story is that there is no need to change the setting.
The first oddity is that the allowed values are 1-22 but zstd
only supports levels 1-19. There is nothing in the MySQL documentation explaining the difference.
The second oddity is that there is effectively no change in the size of the compressed binary logs by changing the value of binlog_transaction_compression_level_zstd
as can be seen from the table:
Level/TestMySQLzstd
LoadOLTPLoadOLTP1282381421193545034483207853154532823793311933713268923878468625112823812811902669247371946639524192823824611937664188671875724300222823812511910022
For comparison, the compression at levels 1, 3, 11, and 19 are included compressing the binary log manually with zstd
. “Load” is for loading the employees
database and OLTP is the sysbench
oltp_read_write
benchmark. The data can also be seen in this figure:
As it can be seen, there is essentially no difference in the file size irrespective of the compression level used in MySQL whereas for zstd
the file size reduces as expected with increased compression level. For level 1 with the load test MySQL even compresses significantly better than zstd
. It is like the compression level is never being set in MySQL.
Information
One possible explanation is that Zstandard supports training the algorithm for a given type of data (creating a dictionary). This particularly helps improving compression of small data. I do not know whether MySQL uses a dictionary and if so whether that makes all compression levels work roughly equal.
Conclusion
The new binary log transaction compression works great and can be a great way to reduce the amount of I/O, disk usage, and network usage. Should you enable it?
Most likely you will benefit from enabling binary log compression unless you are very strapped for CPU. From these tests, it is likely the disk space occupied by binary logs can be roughly cut in half, but as always it is workload dependent and you should test with your workload.
Advice
You can also enable compression at the protocol level for the transmission of the binary logs. There is no reason to both enable binary log transaction compression and the protocol compression.
On the other hand, there is not reason at present to change the compression level.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK