4

Percona Database Performance Blog - How InnoDB Handles TEXT/BLOB Columns

 2 years ago
source link: https://www.percona.com/blog/2021/05/27/how-innodb-handles-text-blob-columns/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
How InnoDB Handles TEXT/BLOB Columns

InnoDB Handles TEXT BLOB ColumnsRecently we had a debate in the consulting team about how InnoDB handles TEXT/BLOB columns. More specifically, the argument was around the Barracuda file format with dynamic rows.

In the InnoDB official documentation, you can find this extract:

When a table is created with ROW_FORMAT=DYNAMIC, InnoDB can store long variable-length column values (for VARCHAR, VARBINARY, and BLOB and TEXT types) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page.

Whether columns are stored off-page depends on the page size and the total size of the row. When a row is too long, the longest columns are chosen for off-page storage until the clustered index record fits on the B-tree page. TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line.

The first paragraph indicates InnoDB can store long columns fully off-page, in an overflow page. A little further down in the documentation, the behavior of short values, the ones with lengths less than 40 bytes, is described. These short values are stored in the row.

Our argument was about what happened for lengths above 40 bytes. Are these values stored in the rows or in overflow pages. I don’t really need a reason to start digging on a topic, imagine when I have one. With simple common tools, let’s experiment and find out.

Experimentation with TEXT/BLOB

For our experiment, we need a MySQL database server, a table with a TEXT column, and the hexdump utility. I spun up a simple LXC instance in my lab with Percona-server 8.0 and created the following table:

Shell
CREATE TABLE `testText` (
`id` int(11) unsigned NOT NULL,
`before` char(6) NOT NULL DEFAULT 'before',
`data` text NOT NULL,
`after` char(5) NOT NULL DEFAULT 'after',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The before and after columns are there to help locate the TEXT column in the datafile page. We can insert a few rows in the table, a TEXT value with size being a multiplier of 16 is convenient with hexdump, successive identical lines are replaced by ‘*’.

Shell
mysql> insert into testText (id,data) values (1,repeat('0123456789abcdef',1));
Query OK, 1 row affected (0.01 sec)
mysql> insert into testText (id,data) values (2,repeat('0123456789abcdef',2));
Query OK, 1 row affected (0.00 sec)
mysql> insert into testText (id,data) values (3,repeat('0123456789abcdef',3));
Query OK, 1 row affected (0.01 sec)

Then we use the hexdump utility at the shell level, the first leaf page starts at offset 0xc000:

Shell
# hexdump -C /var/lib/mysql/test/testText.ibd
0000c060  02 00 1b 69 6e 66 69 6d  75 6d 00 04 00 0b 00 00  |...infimum......|
0000c070  73 75 70 72 65 6d 75 6d  10 00 00 10 00 32 00 00  |supremum.....2..|
0000c080  00 01 00 00 00 05 74 3f  d9 00 00 01 5d 01 10 62  |......t?....]..b|
0000c090  65 66 6f 72 65 30 31 32  33 34 35 36 37 38 39 61  |efore0123456789a|
0000c0a0  62 63 64 65 66 61 66 74  65 72 20 00 00 18 00 42  |bcdefafter ....B|
0000c0b0  00 00 00 02 00 00 00 05  74 40 da 00 00 01 5e 01  |........t@....^.|
0000c0c0  10 62 65 66 6f 72 65 30  31 32 33 34 35 36 37 38  |.before012345678|
0000c0d0  39 61 62 63 64 65 66 30  31 32 33 34 35 36 37 38  |9abcdef012345678|
0000c0e0  39 61 62 63 64 65 66 61  66 74 65 72 30 00 00 20  |9abcdefafter0.. |
0000c0f0  ff 7e 00 00 00 03 00 00  00 05 74 45 dd 00 00 01  |.~........tE....|
0000c100  62 01 10 62 65 66 6f 72  65 30 31 32 33 34 35 36  |b..before0123456|
0000c110  37 38 39 61 62 63 64 65  66 30 31 32 33 34 35 36  |789abcdef0123456|
0000c130  37 38 39 61 62 63 64 65  66 61 66 74 65 72 00 00  |789abcdefafter..|

Clearly, the values are stored inside the row, even the third one with a length of 48.

Cutoff to an Overflow Page

If we continue to increase the length, the behavior stays the same up to a length of 8080 bytes (505 repetitions). If we add 16 bytes to the length, the row becomes larger than half of the available space on the page. At this point, the TEXT value is moved to an overflow page and replaced by a 20 bytes pointer in the row itself. The pointer in the rows looks like this:

Shell
0000c060 02 00 1c 69 6e 66 69 6d 75 6d 00 02 00 0b 00 00 |...infimum......|
0000c070 73 75 70 72 65 6d 75 6d 14 c0 00 00 10 ff f1 00 |supremum........|
0000c080 00 00 01 00 00 00 05 74 80 a4 00 00 01 fe 01 10 |.......t........|
0000c090 62 65 66 6f 72 65 00 00 01 02 00 00 00 04 00 00 |before..........|
0000c0a0 00 26 00 00 00 00 00 00 1f a0 61 66 74 65 72 00 |.&........after.|
0000c0b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The 20 bytes pointer is: 00 00 01 02 00 00 00 04 00 00 00 26 00 00 00 00 00 00 1f a0
Space ID: 00 00 01 02
First overflow page: 00 00 00 04 (4 x 0x4000 = 0x10000)
Offset in overflow page: 00 00 00 26
Version number: 00 00 00 00
Total length of the TEXT value: 00 00 1f a0 (0x1fa0 = 8096 = 16*506)

The overflow page:

Shell
00010000 c1 06 3d 24 00 00 00 04 00 00 00 00 00 00 00 00 |..=$............|
00010010 00 00 00 00 74 dd 8e 3f 00 0a 00 00 00 00 00 00 |....t..?........|
00010020 00 00 00 00 01 02 00 00 1f a0 ff ff ff ff 30 31 |..............01|
00010030 32 33 34 35 36 37 38 39 61 62 63 64 65 66 30 31 |23456789abcdef01|
00011fc0 32 33 34 35 36 37 38 39 61 62 63 64 65 66 00 00 |23456789abcdef..|
00011fd0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The value header starting @ 0x10026: 00 00 1f a0 ff ff ff ff
Length in that overflow page: 00 00 1f a0 (0x1fa0 = 8096 = 16*506)
Next overflow page: ff ff ff ff (this means it is the last)

From the above, one can make a few interesting observations:

  • The 20 bytes pointer includes a space_id value so in theory, the overflow page could be in another tablespace (ibd file).
  • The total length is using 4 bytes even though this is a TEXT and not a LONGTEXT column. 2 bytes would have been sufficient.
  • The length of an overflow page chunk is also using 4 bytes for length even though the largest possible InnoDB page size is 64KB.

Performance Impacts

TEXT/BLOB columns are the source of a number of performance-related impacts. Let’s review the most obvious ones.

Storage

The first performance impact related to storage inefficiency. TEXT/BLOB values are stored in chunks of 16KB if the default InnoDB page size is used. That means, on average, about 8KB per value is lost when overflow pages are used. This leads to larger data files and less efficient caching.

Reads

The presence of TEXT/BLOB columns can significantly increase the number of read IOPs required for queries. For example, let’s consider the following simple query:

Shell
SELECT * from myTable limit 10;

If we ignore caching, without TEXT/BLOB columns, the above query would require only one read IOP per level in the primary key btree of myTable. For a small table, this could be one or two read IOPs. Now, if each row has a 1MB TEXT/BLOB column, the same simple query would require in excess of 640 read IOPs since each TEXT/BLOB value is a chain of 64 pages of 16KB.

Writes

For this section, let’s assume a worst-case scenario with row-based replication enabled and a full row image. Now, if we insert a row with a 1MB value as a longText column in a Percona server 8.0 instance, we have:

Shell
+----------------------------------+---------------------------+
| FILE_NAME                        | SUM_NUMBER_OF_BYTES_WRITE |
+----------------------------------+---------------------------+
| /var/lib/mysql/#ib_16384_0.dblwr |                   1572864 |
| /var/lib/mysql/ib_logfile0       |                   1174528 |
| /var/lib/mysql/test/testText.ibd |                   1130496 |
| /var/lib/mysql/binlog.000003     |                   1048879 |
| /tmp/MLfd=41                     |                   1048783 |
| /var/lib/mysql/undo_001          |                    278528 |
| /var/lib/mysql/undo_002          |                    131072 |
| /var/lib/mysql/ibdata1           |                     16384 |
| /var/lib/mysql/mysql.ibd         |                     16384 |
+----------------------------------+---------------------------+

For a total of around 6.4 MB. This is not surprising, there are two log files and the data is also written twice because of the doublewrite buffer. The temporary file is used for the disk binlog cache and unless the transaction is very long, it won’t actually be written to storage.

Anyway, this is just to set the stage for what happens after an update. If I just change the last letter of the longText value, the amount of data written raises to approximately 10 MB.

Shell
+----------------------------------+---------------------------+
| FILE_NAME                        | SUM_NUMBER_OF_BYTES_WRITE |
+----------------------------------+---------------------------+
| /var/lib/mysql/#ib_16384_0.dblwr | 2260992                   |
| /var/lib/mysql/test/testText.ibd | 2211840                   |
| /var/lib/mysql/binlog.000003     | 2097475                   |
| /tmp/MLfd=41                     | 2097379                   |
| /var/lib/mysql/ib_logfile0       | 1129472                   |
| /var/lib/mysql/undo_001          | 32768                     |
| /var/lib/mysql/ibdata1           | 16384                     |
+----------------------------------+---------------------------+

The longText value is not modified in place, it is copied to a new set of overflow pages. The new and old overflow pages then need to be flushed to storage. Also, since we use the worst-case scenario of the full row image, the binary log entry has the old and new value stored but the InnoDB log files only have the new version.

I hope this illustrates why storing mutable data in a TEXT/BLOB column is a bad idea.

Although columns stored using the MySQL JSON data type are stored as a TEXT/BLOB column, MySQL 8.0 added some logic to allow in-place updates. The impact of an update to a large JSON column in 8.0 is not as severe as in 5.7.

How to Best Deal with TEXT/BLOB Columns?

Data Compression

Data compression is a very compelling option for TEXT/BLOB values By definition, those values are normally large and thus, usually compress well. This is nothing new and has been reported in previous previously. PostgreSQL, for one, compresses by default its TEXT/BLOB columns (called TOAST).

The best option for compression is always the application when it is possible of course. As we have just seen, this reduces the write load and spares the database from the CPU burden of compression.

Another option with MySQL is the InnoDB Barracuda table compression. When used, TEXT/BLOB values are compressed before being written to the overflow pages. This is much more efficient than compressing the pages one at a time.

Finally, if you are using Percona Server or MariaDB, you have access to transparent column compression. This is the second-best option, performance-wise, behind compression by the application is not possible.

Avoid Returning TEXT/BLOB Columns

When there are large TEXT/BLOB columns in a table, the cost of accessing those columns is high. Because of this, it is important to select only the columns that are needed and avoid the default use of “select * from”. Unfortunately, many ORM frameworks instantiate objects by grabbing all the columns. When using such frameworks, you should consider storing the TEXT/BLOB in a separate table with a loose 1:1 relationship with the original table. That way, the ORM is able to instantiate the object without necessarily forcing a read of the TEXT/BLOB columns.

Conclusion

I hope this article improved your understanding of TEXT/BLOB values in InnoDB. When used correctly, TEXT/BLOB columns can be useful and have a limited impact on the performance of the database.

Related

Blob Storage in InnodbFebruary 9, 2010In "Insight for DBAs"

Data Compression in InnoDB for Text and Blob FieldsMay 30, 2012In "Insight for DBAs"

Innodb row size limitationApril 7, 2011In "Insight for Developers"


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK