17

MySQL: Disk Space Exhaustion for Implicit Temporary Tables

 4 years ago
source link: https://www.tuicool.com/articles/aaUV3yb
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

q22iI3a.jpg!web I was recently faced with a real issue about completely exhausting the disk space on MySQL. This was a serious issue because of the continuous outages of the service, as the customer had to constantly restart the server and wait for the next outage.

What was happening? In this article, I’m going to explain it and propose solutions.

Implicit temporary tables

MySQL needs to create implicit temporary tables for solving some kinds of queries. The queries that require a sorting stage most of the time need to rely on a temporary table. For example, when you use GROUP BY , ORDER BY or DISTINCT. Such queries are executed in two stages: the first is to gather the data and put them into a temporary table, the second is to execute the sorting on the temporary table.

A temporary table is also needed in case of some UNION statements evaluation, for VIEW that cannot use merge, for derived tables when using subqueries, for multiple-table UPDATE , and some other cases.

If the temporary table is small it can be created into the memory, otherwise, it’s created on the disk. Needless to say that an in-memory temporary table is faster. MySQL creates an in-memory table, and if it becomes too large it is converted to an on-disk table. The maximum size for in-memory temporary tables is defined by the tmp_table_size or max_heap_table_size value, whichever is smaller. The default size in MySQL 5.7 is 16MB. If you run queries on a large amount of data, or if you have not optimized queries, you can increase the variables. When setting the threshold, take into consideration the available RAM and the number of concurrent connections you have during your peaks. You cannot indefinitely increase the variables, as at some point you’ll need to let MySQL use on-disk temporary tables.

Note: the temporary table is created on-disk if the tables involved have TEXT or BLOB columns, even if the size is less than the configured threshold.

Temporary tables storage engine

Until MySQL 5.6, all the on-disk temporary tables are created as MyISAM. The temporary table is created in-memory or on-disk, depending on the configuration, and it’s dropped immediately at the end of the query. From MySQL 5.7, they are created as InnoDB by default. Then you can rely on the advanced features.

The new default is the best option for the overall performance and should be used in the majority of the cases.

A new configuration variable is available to set the storage engine for the temporary tables: internal_tmp_disk_storage_engine . The variable can be set to  innodb (default if not set) or myisam .

The potential problem with InnoDB temporary tables

Although using InnoDB is the best for performance, a new potential issue could arise. In some particular cases, you can have disk exhaustion and server outage.

As any other InnoDB table in the database, the temporary tables have their own tablespace file. The new file is in the data directory together with the general tablespace, with the name ibtmp1 . It stores all the tmp tables. A tablespace file cannot be shrunk, and it grows constantly as long as you don’t run a manual OPTIMIZE TABLE. The ibtmp1 makes no difference, as you cannot use OPTIMIZE. The only way to shrink ibtmp1 size to zero is to restart the server.

Fortunately, even if the file cannot shrink, after the execution of a query the temporary table is automatically dropped and the space in the tablespace can be reused for another incoming query.

Let’s think now about the following case:

  • you have non-optimized queries that require the creation of very large on-disk tmp tables
  • you have optimized queries, but they are creating very large on-disk tmp tables because you are doing in purpose computation on a very large dataset (statistics, analytics)
  • you have a lot of concurrent connections running the same queries with tmp table creation
  • you don’t have a lot of free space in your volume

In such a situation it’s easy to understand that the file ibtmp1 size can increase considerably and the file can easily exhaust the free space. This was happening several times a day, and the server had to be restarted in order to completely shrink the ibtmp1 tablespace.

It’s not mandatory that the concurrent queries are launched exactly at the same time. Since a query with a large temporary table will take several seconds or minutes to execute, it is sufficient to have queries launched at different times while the preceding ones are still running. Also, you have to consider that any connection creates its own temporary table, so the same exact query will create another exact copy of the same temporary table into the tablespace. Exhausting the disk space is very easy with non-shrinkable files!

So, what to do to avoid disk exhaustion and outages?

The trivial solution: use a larger disk

This is really trivial and can solve the problem, but it is not the optimal solution. In fact, it’s not so easy to figure out what your new disk size should be. You can guess by increasing the disk size step by step, which is quite easy to do if your environment is in the cloud or you have virtual appliances on a very large platform. But it’s not easy to do in on-premise environments.

But with this solution, you can risk having unneeded expenses, so keep that in mind.

You can also move the ibtmp1 file on a dedicated large disk, by setting the following configuration variable:

[mysqld]
innodb_temp_data_file_path = ../../tmp/ibtmp1:12M:autoextend

A MySQL restart is required.

Please note that the path has to be specified as relative to the data directory.

Set an upper limit for ibtmp1 size

For example:

[mysqld]
innodb_temp_data_file_path = ibtmp1:12M:autoextend:max:10G

In this case, the file cannot grow more than 10GB. You can easily eliminate the outages, but this is a dangerous solution. When the data file reaches the maximum size, queries fail with an error indicating that the table is full. This is probably bad for your applications.

Step back to MyISAM for on-disk temporary tables

This solution seems to be counterintuitive but it could be the best way to avoid the outages in a matter of seconds and is guaranteed to use all needed temporary tables.

You can set the following variable into my.cnf :

internal_tmp_disk_storage_engine = MYISAM

Since the variable is dynamic you can set it also at runtime:

SET GLOBAL internal_tmp_disk_storage_engine = MYISAM;

Stepping back to MyISAM, you will considerably decrease the possibility of completely filling your disk space. In fact, the temporary tables will be created into different files and immediately dropped at the end of the query. No more issues about a forever increasing file.

And while there is always the possibility to see the same issue, just in case you can run the queries at the exact same time or really very close. In my real-world case, this was the solution to avoid all the outages.

Optimize your queries

This is the most important thing to do. After stepping back the storage engine to MyISAM to mitigate the outage occurrences, you have to absolutely take the time to analyze the queries.

The goal is to decrease the size of the on-disk temporary tables. It’s not the aim of this article to explain how to investigate the queries, but you can rely on the slow log, on a tool likept-query-digest and on EXPLAIN.

Some tips:

  • create missing indexes on the tables
  • add more filters in the queries to gather less data, if you don’t really need it
  • rewrite queries to optimize the execution plan
  • if you have very large queries on purpose, you can use a queue manager in your applications to serialize their executions or to decrease the concurrency

This will be the longest activity, but hopefully, after all the optimizations, you can return to set the temporary storage engine to InnoDB for better performance.

Conclusion

Sometimes the improvements can have unexpected side effects. The InnoDB storage engine for on-disk temporary tables is a good improvement, but in some particular cases, for example, if you have non-optimized queries and little free space, you can have outages because of “disk full” error. Stepping back the tmp storage engine to MyISAM is the fastest way to avoid outages, but the optimization of the queries is the more important thing to do as soon as possible in order to return to InnoDB. And yes, even a larger or dedicated disk may help. It’s a trivial suggestion, I know, but it can definitely help a lot.

By the way, there’s a feature request about the issue: https://bugs.mysql.com/bug.php?id=82556


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK