
[Warning] InnoDB: Difficult to Find Free Blocks in the Buffer Pool

source link: https://www.percona.com/blog/2020/04/24/warning-innodb-difficult-to-find-free-blocks-in-the-buffer-pool/

A couple of weeks ago, one of our customers reached out to us asking about WARNING messages in their MySQL error log. Soon after, a few more customers asked whether or not to worry about these messages. In this post, I am going to describe the condition under which this WARNING message is written to the log and explain some of the fundamentals behind the scenes.

Look at the warning message below, which appears in the MySQL error log. It says it is difficult to find a free block in the buffer pool, and that it searched through the pool in a loop 336 times. Why would it have to loop so many times? Let's try to understand this.

[Warning] InnoDB: Difficult to find free blocks in the buffer pool (336 search iterations)! 0 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 0; buffer pool: 0. 1621989850 OS file reads, 1914021664 OS file writes, 110701569 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.

2019-10-26T15:02:03.962059Z 4520929 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (337 search iterations)! 0 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 0; buffer pool: 0. 1621989850 OS file reads, 1914021664 OS file writes, 110701569 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.

In simple terms, these messages appear when a thread inside InnoDB keeps looking for a free block but cannot grab one from the buffer pool. In reality it is not that simple; a few things happen before that point, and knowing the building blocks of the buffer pool makes the story easier to follow. The buffer pool is mainly comprised of three lists:

  1. Free List  ====> List of free page blocks into which a thread can read a page from disk
  2. LRU List  ====> List of least recently used data/index pages
  3. Flush List  ====> List of dirty or modified pages (see the query after this list for how each is exposed)
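As a side note, each of these lists is visible per buffer pool instance through INFORMATION_SCHEMA. Below is a minimal query, assuming MySQL 5.6 or later, where the INNODB_BUFFER_POOL_STATS table is available; the comments note how its columns map to the three lists:

  SELECT POOL_ID,
         POOL_SIZE,                 -- total pages in this instance
         FREE_BUFFERS,              -- pages on the free list
         DATABASE_PAGES,            -- pages on the LRU list
         MODIFIED_DATABASE_PAGES    -- pages on the flush list (dirty pages)
    FROM information_schema.INNODB_BUFFER_POOL_STATS;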

As you know, every query is handled by a thread inside InnoDB. If the required page is not available in the buffer pool, it has to be fetched from disk and placed into the pool, which needs a free block allocated from the free list. Remember that a free block can only be allocated from the free list. So a thread that needs a free block goes through the iterations below:

  • Look for a free block in the free list and use it if one is available. If not, proceed.
  • Scan the LRU list for a clean block, move it to the free list, and use it if available. If not, proceed.
  • Flush a dirty page from the tail of the LRU list, move the freed block to the free list, and use it.

I have simplified the above steps to make them easy to understand; that is not exactly how it happens, but it is close. At any point, once a block is moved to the free list, it can be consumed by any other thread, which can make the current thread fail to grab it. The thread then continues to go through the loop of actions, with a 10 ms sleep every time.
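You do not have to read the source code to see how often this happens on a running server. The Innodb_buffer_pool_wait_free status counter increments when a thread has to wait for a free block, so a steadily growing value is a sign that the free list is frequently empty:

  -- A steadily increasing counter here means threads are waiting on free blocks.
  SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_wait_free';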

When Does it Write the Message to the Log?

When a thread goes beyond 20 iterations, it writes this WARNING message to the log, provided no buffer pool resize is in progress (that is what the srv_buf_pool_old_size == srv_buf_pool_size condition checks). Below is the code block from the latest release that points to the exact situation.

  if (n_iterations > 20 && srv_buf_pool_old_size == srv_buf_pool_size) { /* <======= Here it is */
    ib::warn(ER_IB_MSG_134)
        << "Difficult to find free blocks in the buffer pool"
           " ("
        << n_iterations << " search iterations)! " << flush_failures
        << " failed attempts to"
           " flush a page! Consider increasing the buffer pool"
           " size. It is also possible that in your Unix version"
           " fsync is very slow, or completely frozen inside"
           " the OS kernel. Then upgrading to a newer version"
           " of your operating system may help. Look at the"
           " number of fsyncs in diagnostic info below."
           " Pending flushes (fsync) log: "
        /* ... the stream continues with the pending fsync counts and the
           OS read/write/fsync counters shown in the log message above. */

What Do You Do Now?

This situation can be a consequence of several causes, and the list below can be checked to find the exact reason (a query to review the related settings follows the list). While a saturated IO subsystem is the most common culprit in our experience, the cause will not be the same for every instance and workload.

  • A slow or saturated IO subsystem that fails to flush pages to disk as fast as required.
  • An inadequate number of page cleaner threads to do enough flushing/checkpoint activity to create free page blocks inside the buffer pool.
  • A buffer pool that is too small, which eventually results in a smaller free list as well as a smaller LRU list.
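Each of those causes maps to a handful of server settings. Here is one quick way to review them together (variable names as of MySQL 5.7; adjust for your version):

  SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('innodb_buffer_pool_size',
     'innodb_buffer_pool_instances',
     'innodb_page_cleaners',
     'innodb_io_capacity',
     'innodb_io_capacity_max');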

Monitoring tools like Percona Monitoring and Management (PMM) are very useful for inspecting the above aspects. In the example screenshot below, captured from my local PMM GUI, you can see that the free pages inside the buffer pool drop to almost zero at different points in time, which is one of the causes behind this situation, as explained above. This data is available in the InnoDB Buffer Pool Pages panel of the MySQL InnoDB Metrics dashboard.

[Screenshot: PMM MySQL InnoDB Metrics dashboard, InnoDB Buffer Pool Pages panel, showing free pages dropping to near zero]
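If PMM is not set up, the same signals can be read from the server itself. Below is a minimal check of the raw counters behind those panels; all are standard InnoDB status variables:

  SHOW GLOBAL STATUS WHERE Variable_name IN
    ('Innodb_buffer_pool_pages_free',      -- pages on the free list
     'Innodb_buffer_pool_pages_total',     -- total pages in the pool
     'Innodb_buffer_pool_read_requests',   -- logical read requests
     'Innodb_buffer_pool_reads');          -- reads that had to go to disk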

Similarly, there are other panels, like InnoDB Buffer Pool Requests, that show the disk reads, another metric that tells you whether the current buffer pool is sufficient or not. One can also see the IO wait pattern in the CPU Usage panel of the OS/System Overview dashboard. Below are some of the most common action plans to avoid this situation on an instance.

  1. Increasing the InnoDB buffer pool size when the free buffers in the buffer pool are consistently <= 5% of the total capacity (see the example after this list).
  2. Tuning/upgrading the IO subsystem to do the flush-to-disk activity faster than at present.
  3. Optimizing queries to avoid full table/index scans so that disk reads into the buffer pool are reduced.
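For the first action plan, note that the buffer pool can be resized online since MySQL 5.7, so no restart is needed. A sketch follows, with the 8 GB target purely as a placeholder value; the server rounds the new size up to a multiple of innodb_buffer_pool_chunk_size * innodb_buffer_pool_instances:

  -- Placeholder target size; pick one based on your working set and available RAM.
  SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
  -- Track the resize progress:
  SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';

Interestingly, the warning itself is suppressed while a resize is running, as the srv_buf_pool_old_size check in the code block above shows.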

I hope this post helps you review these cases faster and produce more accurate recommendations according to the cause of the problem.

