57

READ ONLY transactions in MySQL

 4 years ago
source link: https://www.tuicool.com/articles/zM7vmi3
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
NBrAfu6.jpg!web Duke Humfrey’s Library – Bodleian Library, Oxford, Europe

Transactions are the least known RDBMS features. Everyone knows they exist, but few know how they work. For this reason, it is not obvious that they can be read only. This article explains what read only transactions are and why we should use them when appropriate.

What is a read only transaction?

An objection I’ve heard more than once is: transactions are for writes, SELECT s don’t even happen inside a transaction! But this is wrong. Transactions are not only about ROLLBACK and COMMIT . They are also about isolation .

As a general rule, this is what happens. When you start a transaction we acquire a view , or snapshot , on data . Until the end of the transaction we will see that snapshot, not modifications made by other connections .

Reads are generally non-locking. After you’ve read a row, other users can still modify it, even if your transaction is still in progress. Internally, MySQL will keep the information that allow to “rebuild” data as they were when you acquired the snapshot, so you can still see them. This information is preserved in the transaction logs .

To be clear, we actually always use transactions. By default, we run queries in autocommit mode, and we don’t start or end transactions explicitly. But even in this case, every SQL statement we run is a transaction.

There is no such thing as reading objective data. We always read from (or write to) snapshots acquired at a certain point of time , that include all the changes made by committed transactions.

Why should we use read only transactions?

The answer should be obvious now. When we want to run more than one SELECT , and we want the results to be consistent, we should do it in a read only transaction. For example, we could do something like this:

SELECT COUNT(DISTINCT user_id)
    FROM product WHERE date = DATE();
SELECT COUNT(DISTINCT product_id)
    FROM product WHERE date = DATE();

If we want the two results to reflect the reality at a certain point in time, we need to do this in a transaction:

START TRANSACTION READ ONLY;
SELECT COUNT(DISTINCT user_id)
    FROM product WHERE date = DATE();
SELECT COUNT(DISTINCT product_id)
    FROM product WHERE date = DATE();
COMMIT;

The question implies something: why should we use a read only transaction, when we can use a regular read write transaction? If for some reason we try to write something, the statement will fail. So it’s simpler to just always use read write transactions.

The reason for using read only transactions is mentioned in the MySQL manual:

InnoDB can avoid the overhead associated with setting up the transaction ID ( TRX_ID field) for transactions that are known to be read-only. A transaction ID is only needed for a transaction that might perform write operations or locking reads such as SELECT ... FOR UPDATE . Eliminating unnecessary transaction IDs reduces the size of internal data structures that are consulted each time a query or data change statement constructs a read view .

8.5.3 Optimizing InnoDB Read-Only Transactions – 25 August 2019

Isolation levels

Isolation level is a setting that defines how much the current transaction is isolated from concurrent transactions. It can be changed at session level, before starting the next transaction. There is also a global setting, which takes effect for sessions that don’t change their isolation level. By default it is REPEATABLE READ .

Isolation levels are:

READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ

(We’re skipping SERIALIZABLE here, as it’s just a variation of REPEATABLE READ and it does not make sense with read only transactions)

This list is only valid for MySQL. Different DBMSs support different isolation levels.

The behavior described above refers to REPEATABLE READ . To recap, it acquires a snapshot of the data, which is used for the whole duration of the transaction.

Another isolation level is READ COMMITTED . It acquires a new snapshot for each query in the transaction. But this does not make sense for a read only transaction.

A usually underestimated isolation level is READ UNCOMMITTED . The reason why it is underestimated is that it reads data that are being written by other transactions, but are not yet committed – and possibly, will never be. For this reason, highly inconsistent data may be returned.

However, this can be an important optimisation. In my experience, it can make some queries much faster and greatly reduce CPU usage, especially when you are running expensive analytics SELECT s on write-intensive servers.

There are cases when results inconsistencies don’t matter. A typical example is running aggregations on many rows: an average on one million values will not change sensibly. Another example is when reading session-level data, that no one else is modifying; or old data, that no one is supposed to modify anymore.

If you want to use READ UNCOMMITTED , there is no reason to wrap multiple queries in a single transaction. For each query, you can run something like this:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM product;
COMMIT;

MetaData Lock and the rollback trick

When accessing a table using a non-transactional engine (like MyISAM) inside a transaction, MySQL will return a warning. They seem to assume that there is no reason to access such tables from a transaction, which is false. MariaDB doesn’t return such warning.

In reality, transactions acquire metadata locks (MDL) . This means that other connections will not be able to run an ALTER TABLE on those tables until the transaction finishes. Preventing such problems is very important and should be done every time it’s useful.

mysqldump rolls back

mysqldump --single-transaction uses default read write transactions, not read only ones. My guess is that they don’t consider it important for a single long transaction, but just for many concurrent transactions.

However, the MariaDB version does something interesting with rollbacks:

ROLLBACK TO SAVEPOINT in --single-transaction mode to release metadata lock on table which was already dumped. This allows to avoid blocking concurrent DDL on this table without sacrificing correctness, as we won't access table second time and dumps created by --single-transaction mode have validity point at the start of transaction anyway.

Note that this doesn't make --single-transaction mode with concurrent DDL safe in general case. It just improves situation for people for whom it might be working.

mysqldump in MariaDB repo , taken from commit 624dd71b9419555eca8baadc695e3376de72286f

In other words:

  • An MDL blocks some operations, so it can be a problem just like any other lock, especially in a long transaction;
  • An MDL is only useful for data that we are going to access again, just like any other lock;
  • Transactions support savepoints in the middle of a transaction, and we can rollback to a savepoint instead of rolling the whole transaction back;
  • Doing so releases the MDLs on objects that were only accessed after the savepoint.

Conclusions

Here I wanted to advertise a feature that, in my opinion, should be used much more often. Actually, if you know from the beginning that a transaction is read only, there is no reason not to inform MySQL about that.

We discussed which isolation levels make sense for read only transactions, and the nice ROLLBACK TO SAVEPOINT trick used by mysqldump for MariaDB.

Did you notice any mistake? Do you have ideas to contribute?
Please comment!

As usual, I’ll be happy to fix errors and discuss your ideas.

Toodle pip,
Federico

Photo source: Wikipedia


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK