

Percona Database Performance Blog - How InnoDB Handles TEXT/BLOB Columns
source link: https://www.percona.com/blog/2021/05/27/how-innodb-handles-text-blob-columns/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Recently we had a debate in the consulting team about how InnoDB handles TEXT/BLOB columns. More specifically, the argument was around the Barracuda file format with dynamic rows.
In the InnoDB official documentation, you can find this extract:
When a table is created with ROW_FORMAT=DYNAMIC, InnoDB can store long variable-length column values (for VARCHAR, VARBINARY, and BLOB and TEXT types) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page.
…
Whether columns are stored off-page depends on the page size and the total size of the row. When a row is too long, the longest columns are chosen for off-page storage until the clustered index record fits on the B-tree page. TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line.
The first paragraph indicates InnoDB can store long columns fully off-page, in an overflow page. A little further down in the documentation, the behavior of short values, the ones with lengths less than 40 bytes, is described. These short values are stored in the row.
Our argument was about what happened for lengths above 40 bytes. Are these values stored in the rows or in overflow pages. I don’t really need a reason to start digging on a topic, imagine when I have one. With simple common tools, let’s experiment and find out.
Experimentation with TEXT/BLOB
For our experiment, we need a MySQL database server, a table with a TEXT column, and the hexdump utility. I spun up a simple LXC instance in my lab with Percona-server 8.0 and created the following table:
The before and after columns are there to help locate the TEXT column in the datafile page. We can insert a few rows in the table, a TEXT value with size being a multiplier of 16 is convenient with hexdump, successive identical lines are replaced by ‘*’.
Then we use the hexdump utility at the shell level, the first leaf page starts at offset 0xc000:
Clearly, the values are stored inside the row, even the third one with a length of 48.
Cutoff to an Overflow Page
If we continue to increase the length, the behavior stays the same up to a length of 8080 bytes (505 repetitions). If we add 16 bytes to the length, the row becomes larger than half of the available space on the page. At this point, the TEXT value is moved to an overflow page and replaced by a 20 bytes pointer in the row itself. The pointer in the rows looks like this:
The overflow page:
From the above, one can make a few interesting observations:
- The 20 bytes pointer includes a space_id value so in theory, the overflow page could be in another tablespace (ibd file).
- The total length is using 4 bytes even though this is a TEXT and not a LONGTEXT column. 2 bytes would have been sufficient.
- The length of an overflow page chunk is also using 4 bytes for length even though the largest possible InnoDB page size is 64KB.
Performance Impacts
TEXT/BLOB columns are the source of a number of performance-related impacts. Let’s review the most obvious ones.
Storage
The first performance impact related to storage inefficiency. TEXT/BLOB values are stored in chunks of 16KB if the default InnoDB page size is used. That means, on average, about 8KB per value is lost when overflow pages are used. This leads to larger data files and less efficient caching.
Reads
The presence of TEXT/BLOB columns can significantly increase the number of read IOPs required for queries. For example, let’s consider the following simple query:
If we ignore caching, without TEXT/BLOB columns, the above query would require only one read IOP per level in the primary key btree of myTable. For a small table, this could be one or two read IOPs. Now, if each row has a 1MB TEXT/BLOB column, the same simple query would require in excess of 640 read IOPs since each TEXT/BLOB value is a chain of 64 pages of 16KB.
Writes
For this section, let’s assume a worst-case scenario with row-based replication enabled and a full row image. Now, if we insert a row with a 1MB value as a longText column in a Percona server 8.0 instance, we have:
For a total of around 6.4 MB. This is not surprising, there are two log files and the data is also written twice because of the doublewrite buffer. The temporary file is used for the disk binlog cache and unless the transaction is very long, it won’t actually be written to storage.
Anyway, this is just to set the stage for what happens after an update. If I just change the last letter of the longText value, the amount of data written raises to approximately 10 MB.
The longText value is not modified in place, it is copied to a new set of overflow pages. The new and old overflow pages then need to be flushed to storage. Also, since we use the worst-case scenario of the full row image, the binary log entry has the old and new value stored but the InnoDB log files only have the new version.
I hope this illustrates why storing mutable data in a TEXT/BLOB column is a bad idea.
Although columns stored using the MySQL JSON data type are stored as a TEXT/BLOB column, MySQL 8.0 added some logic to allow in-place updates. The impact of an update to a large JSON column in 8.0 is not as severe as in 5.7.
How to Best Deal with TEXT/BLOB Columns?
Data Compression
Data compression is a very compelling option for TEXT/BLOB values By definition, those values are normally large and thus, usually compress well. This is nothing new and has been reported in previous previously. PostgreSQL, for one, compresses by default its TEXT/BLOB columns (called TOAST).
The best option for compression is always the application when it is possible of course. As we have just seen, this reduces the write load and spares the database from the CPU burden of compression.
Another option with MySQL is the InnoDB Barracuda table compression. When used, TEXT/BLOB values are compressed before being written to the overflow pages. This is much more efficient than compressing the pages one at a time.
Finally, if you are using Percona Server or MariaDB, you have access to transparent column compression. This is the second-best option, performance-wise, behind compression by the application is not possible.
Avoid Returning TEXT/BLOB Columns
When there are large TEXT/BLOB columns in a table, the cost of accessing those columns is high. Because of this, it is important to select only the columns that are needed and avoid the default use of “select * from”. Unfortunately, many ORM frameworks instantiate objects by grabbing all the columns. When using such frameworks, you should consider storing the TEXT/BLOB in a separate table with a loose 1:1 relationship with the original table. That way, the ORM is able to instantiate the object without necessarily forcing a read of the TEXT/BLOB columns.
Conclusion
I hope this article improved your understanding of TEXT/BLOB values in InnoDB. When used correctly, TEXT/BLOB columns can be useful and have a limited impact on the performance of the database.
Related
Blob Storage in InnodbFebruary 9, 2010In "Insight for DBAs"
Data Compression in InnoDB for Text and Blob FieldsMay 30, 2012In "Insight for DBAs"
Innodb row size limitationApril 7, 2011In "Insight for Developers"
Recommend
-
8
Percona Live ONLINE: Keynotes Now Live! Back to the Blog
-
9
Percona XtraBackup 1.6 for Windows “try me” edition Back to the Blog Previously we had XtraBack...
-
6
One of the new features introduced in MySQL 8.0.24 was the ability to log all SQL statements that a...
-
10
We’ve recently received some questions regarding PL/Java and I found it hard to get clear instructions searching on the internet. It’s not that there is no good information out there, but most of it is either incomplete, outdated, or confusin...
-
13
Percona Database Performance Blog Back to the Blog Security is everybody’s concern when talking about data and information, and therefore it becomes the main foundation of...
-
9
Increase Database Performance By Using Multiple Columns Last updated 07/11/2021 By thinking outside of the box, you can often get much faster database performance by splitting values across multiple database col...
-
5
Percona Database Performance Blog Back to the Blog
-
8
Percona Database Performance Blog Back to the Blog
-
9
Happy National Video Game Day! Is Your Database Ready?
-
6
Percona Database Performance Blog Back to the Blog In this blog post, I’ll look at MyRocks performance through some benchmark testing. As...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK