MyRocks Disk Full Edge Case

VRnM7zU.jpg!web RocksDB engine—and it’s MySQL implementation MyRocks—is a very good alternative engine for MySQL. It has proven to be very efficient and stable for many workloads, including those of large scale. However, it is still a relative newborn in the MySQL ecosystem, and has only a small fraction of the adoption rate of InnoDB. That means it is not so well tested at all possible edge cases, and may have many unreported bugs. One known bug is discussed here: if you are a MyRocks user, it’s important that you are aware of the possibility of lost data in the specific circumstances described below.

In writing this article, I want to broadcast a wider warning, as the problem I found is pretty serious and could lead to a very unpleasant situation.

The problem is related to not a very edge case after all – disk full. The result could be extremely bad, though. After printing some errors, the RocksDB engine continues to message clients that consecutive writes are accepted, while they are, in fact, lost! In addition, depending on the workload and at which moment the disk ran out of space, in a worst case scenario we may lose access to tables that use the RocksDB engine completely.

Let me show how bad the situation could be, using a basic sandbox and sysbench test example. I have a sandbox with MyRocks installed:

mysql [localhost] {msandbox} ((none)) > show engines\G
*************************** 1. row ***************************
Engine: ROCKSDB
Support: YES
Comment: RocksDB storage engine
Transactions: YES
XA: YES
Savepoints: YES
 
mysql [localhost] {msandbox} ((none)) > select @@version,@@version_comment;
+-----------+----------------------------------------------------+
| @@version | @@version_comment |
+-----------+----------------------------------------------------+
| 5.7.22-22 | Percona Server (GPL), Release 22, Revision f62d93c |
+-----------+----------------------------------------------------+
1 row in set (0.00 sec)

Once my test system had not so much free disk space remaining, I ran this simple sysbench prepare command:

$ sysbench /usr/share/sysbench/oltp_insert.lua --mysql_storage_engine=rocksdb --table-size=1000000 --tables=4 --mysql-db=db2 --mysql-user=root --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox5723.sock --threads=4 --time=200 --report-interval=1 --events=0 --db-driver=mysql prepare
sysbench 1.0.15 (using bundled LuaJIT 2.1.0-beta2)
 
Initializing worker threads...
 
Creating table 'sbtest2'...
Creating table 'sbtest3'...
Creating table 'sbtest1'...
Creating table 'sbtest4'...
Inserting 1000000 records into 'sbtest2'
Inserting 1000000 records into 'sbtest1'
Inserting 1000000 records into 'sbtest3'
Inserting 1000000 records into 'sbtest4'
Creating a secondary index on 'sbtest4'...
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest4_k_4_4_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_4 ON sbtest4(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest4_k_4_4_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
Creating a secondary index on 'sbtest2'...
Creating a secondary index on 'sbtest1'...
Creating a secondary index on 'sbtest3'...
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest2_k_2_5_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_2 ON sbtest2(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest2_k_2_5_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest3_k_3_6_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_3 ON sbtest3(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest3_k_3_6_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
FATAL: mysql_drv_query() returned error 1105 ([./.rocksdb/db2.sbtest1_k_1_7_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device) for query 'CREATE INDEX k_1 ON sbtest1(k)'
FATAL: `sysbench.cmdline.call_command' function failed: /usr/share/sysbench/oltp_common.lua:238: SQL error, errno = 1105, state = 'HY000': [./.rocksdb/db2.sbtest1_k_1_7_0.bulk_load.tmp] bulk load error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device

This resulted in the following errors being printed to the error log:

2018-08-10T11:01:35.379611Z 5 [Note] RocksDB: Begin index creation (0,260)
2018-08-10T11:01:40.105596Z 3 [Note] RocksDB: Begin index creation (0,261)
2018-08-10T11:01:40.838902Z 4 [Note] RocksDB: Begin index creation (0,262)
2018-08-10T11:01:40.924093Z 6 [Note] RocksDB: Begin index creation (0,263)
2018-08-10T11:03:14.868958Z 0 [ERROR] RocksDB: Error detected in background, Status Code: 5, Status: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on device
2018-08-10T11:03:14.868986Z 0 [ERROR] RocksDB: BackgroundErrorReason: 0
2018-08-10T11:03:14.869003Z 0 [ERROR] LibRocksDB:[/mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/min-stretch-x64/percona-server-5.7.22-22/storage/rocksdb/rocksdb/db/db_impl_compaction_flush.cc:1293] Waiting after background flush error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000101.sst: No space left on deviceAccumulated background error counts: 1
2018-08-10T11:03:15.430064Z 0 [ERROR] LibRocksDB:[/mnt/workspace/percona-server-5.7-binaries-release-rocks/label_exp/min-stretch-x64/percona-server-5.7.22-22/storage/rocksdb/rocksdb/db/db_impl_compaction_flush.cc:1373] Waiting after background compaction error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000102.sst: No space left on device, Accumulated background error counts: 2
2018-08-10T11:03:18.538866Z 9 [Note] RocksDB: Begin index creation (0,268)
2018-08-10T11:03:19.897797Z 9 [ERROR] Error finishing bulk load.
2018-08-10T11:03:23.306957Z 10 [Note] RocksDB: Begin index creation (0,269)
2018-08-10T11:03:23.509448Z 8 [Note] RocksDB: Begin index creation (0,270)
2018-08-10T11:03:23.539705Z 7 [Note] RocksDB: Begin index creation (0,271)
2018-08-10T11:03:24.536092Z 10 [ERROR] Error finishing bulk load.
2018-08-10T11:03:24.659233Z 7 [ERROR] Error finishing bulk load.
2018-08-10T11:03:24.736380Z 8 [ERROR] Error finishing bulk load.

OK, so I resized my data partition online (no MySQL service restart) and gave it more free disk space. And here is what happened next. I tried to see if MySQL/MyRocks works now:

mysql [localhost] {msandbox} (db1) > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.33 sec)
 
mysql [localhost] {msandbox} (db1) > insert into sbtest1 set k=1;
Query OK, 1 row affected (0.01 sec)
 
mysql [localhost] {msandbox} (db1) > select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.32 sec)
 
mysql [localhost] {msandbox} (db1) > delete from db1.sbtest2 where id<100;
Query OK, 99 rows affected (0.01 sec)
 
mysql [localhost] {msandbox} (db1) > select count(*) from db1.sbtest2;
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
1 row in set (0.32 sec)

That’s the worst part – there’s no error returned to the client, and yet my new writes are basically gone!

The good thing is we can still read from the tables, so mysqldump or mydumper is the way to go now, BEFORE you attempt to restart the service :

mysql [localhost] {msandbox} (db1) > select id from db1.sbtest2 where id<10;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+----+
9 rows in set (0.00 sec)

When I later try to restart the service, it crashes on shutdown. In some cases, the database is able to recover and offer access to the old RocksDB data (though the new data, written above, are still lost).

In other cases, though, your RocksDB won’t load after restart:

mysql [localhost] {msandbox} (db1) > show create table sbtest1\G
ERROR 1286 (42000): Unknown storage engine 'ROCKSDB'

and using this option:

rocksdb_wal_recovery_mode = 3

…helps to recover the engine to usable state.

Ultimately, the worst case scenario is that you end up with the MyRocks engine dead and unable to recover, no matter what options you try. We can see this in the error log when I try to start the service:

2018-08-10T09:33:18.756868Z 0 [Note] RocksDB: 2 column families found
2018-08-10T09:33:18.756993Z 0 [Note] RocksDB: Column Families at start:
2018-08-10T09:33:18.757016Z 0 [Note] cf=default
2018-08-10T09:33:18.757022Z 0 [Note] write_buffer_size=67108864
2018-08-10T09:33:18.757027Z 0 [Note] target_file_size_base=67108864
2018-08-10T09:33:18.757043Z 0 [Note] cf=__system__
2018-08-10T09:33:18.757048Z 0 [Note] write_buffer_size=67108864
2018-08-10T09:33:18.757052Z 0 [Note] target_file_size_base=67108864
2018-08-10T09:33:18.811671Z 0 [ERROR] RocksDB: Could not get index information for Index Number (0,282), table db3.sbtest4
2018-08-10T09:33:18.811741Z 0 [ERROR] RocksDB: Failed to initialize DDL manager.
2018-08-10T09:33:18.811795Z 0 [ERROR] Plugin 'ROCKSDB' init function returned error.
2018-08-10T09:33:18.811812Z 0 [ERROR] Plugin 'ROCKSDB' registration as a STORAGE ENGINE failed.

Unfortunately, we were not able to find any workaround for this ultimate worst case situation, and so recovering data— including all other RocksDB tables not touched during the incident —may be very hard.

The problem, which is obviously a bug, is reported here: https://jira.percona.com/browse/PS-4706 .

Fortunately, the engineers at Facebook—the original RocksDB founder—are aware of the issue and have already released a simple patch. This allows the engine simply to crash instead of allowing lost writes. While this is not a perfect solution, it at least protects you from losing data. Related commit details can be found here: https://github.com/facebook/mysql-5.6/commit/4a7d3fb8f96c96c2be10d61a7796ccac7610a5d6 .

At Percona, we accepted that fix, and the latest Percona Server for MySQL 5.7.23 version , already has it incorporated: https://github.com/percona/percona-server/commit/d0d5635573ebc4f157e36a989d14f7ee6f87c410

So, with the upgraded version, we instead will see this kind of error log event:

2018-09-19T08:12:14.496412Z 0 [ERROR] RocksDB: Error detected in background, Status Code: 5, Status: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000104.sst: No space left on device
2018-09-19T08:12:14.496435Z 0 [ERROR] RocksDB: BackgroundErrorReason: 1
2018-09-19T08:12:14.496451Z 0 [ERROR] LibRocksDB:[/mnt/workspace/percona-server-5.7-binaries-release-rocks-<span class="code-keyword">new</span>/label_exp/min-stretch-x64/test/percona-server-5.7.23-23/storage/rocksdb/rocksdb/db/db_impl_compactio
n_flush.cc:1472] Waiting after background compaction error: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000104.sst: No space left on device, Accumulated background error counts: 1
2018-09-19T08:12:14.511910Z 12 [ERROR] RocksDB: failed to write to WAL, Status Code: 5, Status: IO error: No space left on deviceWhile appending to file: ./.rocksdb/000104.sst: No space left on device
2018-09-19T08:12:14.511935Z 12 [ERROR] MyRocks: aborting on WAL write error.
08:12:14 UTC - mysqld got signal 6 ;

Therefore, all MyRocks users are advised to upgrade ASAP and if that’s not possible, you should at least double check the disk space monitoring and alerting.

Recommend

命令行通配符教程

Next.js & Apollo GraphQL Performance Tuning: Lists & pagination

如何通过编程解决华容道问题？

Digging deeper in SSA

原荐 MicroPython实例之TPYBoard开发板控制OLED显示中文

Golang创建最简单的HTTP和HTTPS服务

ProxySQL 1.4.10 and Updated proxysql-admin Tool Now in the Percona Repository

[译] 一些好用的开源监控工具汇总

论SLAM技术发展趋势

Basement 后端云服务正式亮相云栖大会

About Joyk