Primary Key in InnoDB: how they work and best practices

A vintage primary key

InnoDB is MySQL default storage engine – the one you use if you don’t know what a storage engine is, or you don’t have special needs. This foreword will surprise my readers from the MySQL community, but recently I’ve talked (about databases) with a lot of people who don’t know what InnoDB is.

In this article I will show what a proper primary key looks like for InnoDB, and why.

TL;DR: copy+paste this

It is a good thing to always use a primary key. For most tables, just use this:

id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY

Why is this important?

If you’re still reading, you are interested in why good practices are good, and how things work internally. Let’s see the reasons.

How the primary key and secondary indexes are stored

In InnoDB, under the hood, the primary key is actually the table . The first columns are the ones that the we define as part of the primary key ( id if you follow the above suggestion). Rows are physically ordered by the primary key columns ( id ).

Secondary indexes are stored separately. Each index contains the columns that we define as part of the index.

Suppose for example that we have the following table:

CREATE TABLE t (
    a INTEGER UNSIGNED,
    b INTEGER UNSIGNED,
    c INTEGER UNSIGNED,
    d INTEGER UNSIGNED,
    PRIMARY KEY (a, b),
    INDEX idx_c (c)
) ENGINE InnoDB;

The physical table will look like this:

a b c d 1 0 1 3 1 2 5 6 2 1 1 0 3 1 5 4 4 5 2 0 4 6 1 7

idx_c will look like this:

c a b 1 1 0 1 2 1 1 4 6 2 4 5 5 1 2 5 3 1

Tables that follow these rules are called organised by primary key . Most databases are not organised by primary key.

This information leads us to a series of performance considerations.

Performance of searches by primary key

A search by primary key implies a search in a data structure – in this context we don’t really care about the used data structure. What really matters is that, when InnoDB finds the values it is looking for, it finds all the columns that it may have to return.

A secondary index is different. When InnoDB finds the entries it needs, it finds a reference to the corresponding primary key entries ( a and b , in this case). Usually this is not enough, because more columns need to be read to answer the query. In this case, for each row found in the secondary index, InnoDB will have to read the corresponding rows in the table (primary key) . Each of this search takes approximately the same time as the search by secondary index.

One big reason why each table should have a primary key is that queries by primary key are very fast .

There are, however, queries by secondary index that are as fast as a query by primary key. Consider the following queries:

SELECT c FROM t WHERE c = 1;
SELECT a, b FROM t WHERE c = 1;

Those queries can be satisfied by checking the index, without any additional operation. For these queries, idx_c is a covering index .

Replication

Another performance concern is related to MySQL replication and Galera. It is not limited to InnoDB: in a replication chain (but not in Galera) we can use any storage engine, but this will not mitigate the problem.

Replication works by sending the contents of the binary log from a master to the slaves. The recommended replication format is ROW ( binlog_format = ROW ), which means that a binary representation of the row and the corresponding primary key value is sent . For DELETE s and UPDATE s, this makes the search for the row to erase/modify fast.

What happens if there is no primary key? A NOT NULL UNIQUE index will be used instead. But what if there is no such index? In that case, the whole rows to modify are written in the binary log and sent to the slave. To replicate the change, the slave will have to find the rows with a full table scan , which is obviously much slower.

Avoid the mutex

If we create an InnoDB table without a primary key, it will have a so-called clustered index . It is an internal 6 bytes index, invisible to the user.

The problem with this index is that its write access is governed by a global mutex: dict_sys->mutex . The same mutex governs several, apparently unrelated operations in InnoDB. As a consequence, t he use of clustered indexes sensibly reduces our databases scalability .

Jeremy Cole analyses this problem in his article: How does InnoDB behave without a primary key? . It is from 2013, but occasionally I still find myself pointing people to it to explain why a primary key is highly desirable.

Primary key size

Primary keys are contained by all secondary indexes, so a big primary key leads to big secondary indexes . Also a smaller key is faster , and the primary key is involved in most of the queries.

These are good reasons to have small primary keys. To achieve this, we want to use an integer, as strings take more space. INTEGER UNSIGNED is 4 bytes, and the maximum value is more than 4 billions. This is almost always enough – when it isn’t, we can use BIGINT UNSIGNED .

I don’t see a reason to use less than 4 bytes, even when this can be done safely. Often we have a column pointing to another table’s primary key – call it a referencing column. Primary keys and their referencing columns should be of the same type, otherwise JOIN s will cause a lot of type conversions that will lower performance. It sounds like an easy task, but in practice I see this error quite often in production databases. Always using INTEGER UNSIGNED – except for the rare cases where BIGINT UNSIGNED is necessary – makes the task easier.

Values insert order

A table rows are physically ordered by primary key. Therefore, for performance reasons, new values should be appended to the end of the key. If values are inserted in the middle, InnoDB will have to do additional operations to make room for them.

To guarantee that new values are inserted to the end of the key, we should always use the AUTO_INCREMENT attribute.

Using a TIMESTAMP column will also help to insert the values in the correct order. But nothing guarantees that the generated values are unique, which could results in error during insertions. It’s better to avoid TIMESTAMP primary keys .

Common exceptions

Let’s see a couple of common practical cases where the above general hint doesn’t apply, and why that’s the case.

If you have more exceptions, please explain them in a comment.

Many to many relationships

A many to many relationship between C and t2 is represented as a separate table, with a pointer to those tables:

t3 {t1_id, t2_id}

We could add an id column which is also the primary key. Then we should also ensure that the relationships are unique, by adding a UNIQUE index on t1_id, t2_id . And if we want to optimise searches by t2_id , we should also add an index on that column. But then the question is: will there be a real gain? Probably not. Writes could even be slower, because they will write much more stuff .

So it’s better to build the primary key on both the columns, but in which order? In most practical cases, putting them in the right order guarantees that the rows will usually be written in the primary key order. For example, suppose we need to link books with authors. It is an M2M relationship, no doubt about it. The correct order is (book_id, author_id) . Why?

Because rows will typically be written when a new book appears. And most of the books are written by existing authors. We could even ask the developers to add rows for a new book ordered by the authors id’s – that shouldn’t be hard. But probably it will be useless, as most books are written by a single author.

These considerations ‘are driven by logic. In your organisation things could be different, so make your hypothesis and run some queries to check if they are correct. For new features, you may want to talk to developers to find out how they expect data to look like. If they don’t know, you can reach a product manager.

I gave similar advise in other posts, and I will do it again: I strongly believe that the communication between different parts of an organisation is essential for building high quality products and services .

Read-only key/value tables

In this context, by read only we mean that a table won’t change as a part of normal operations. For example, it may:

Never change (eg: chemical elements);
Rarely change (eg: our organisation’s offices);
Change in batch (eg: nightly, once a week).

This rules out the concern that we may have performance problems during insertions.

In a key/value table, normally we have these access patterns:

Get a single value by key – in a simple query or in a JOIN ;
Check if a query exists ;
Get a list of keys – optionally ordered and paginated.

An id column will not help any of these queries, and will slightly slow them down for reasons we already discussed. Thus, a read only K/V table should look like this:

CREATE TABLE constant (
    constant_name VARCHAR(100) NOT NULL,
    constant_value VARCHAR(1000) NOT NULL,
    PRIMARY KEY (constant_name)
) ENGINE InnoDB;

The reality is usually a bit more complex and dirty than clean examples designed for technical writings. For example, a similar table could have a timestamp or it could be some sort of multivalue. Yet, if it’s read only, similar considerations should apply.

Conclusions

Now we know why primary keys are so important in InnoDB and how they are implemented. We know what a key should look like in the general case, and in some special cases, namely: many to many relationships, and read-only K/V tables.

TL;DR: copy+paste this

Why is this important?

How the primary key and secondary indexes are stored

Performance of searches by primary key

Replication

Avoid the mutex

Primary key size

Values insert order

Common exceptions

Many to many relationships

Read-only key/value tables

Conclusions

Please let me know!

Recommend

【链得得独家】一文读懂日本金融厅“区块链圆桌闭门会议”九大核心内容

肺癌预测的人机对决：谷歌 AI 挫败 6 位放射专家

ConstraintLayout 介绍与实战 - 知乎

Issue 184 – May 21st 2019

GitHub - abraithwaite/jeff: ?Jeff provides the simplest way manage web sessions...

黑苹果完美攒机配置分享

你的前端知识框架，该如何搭建？

除了追番，你还可以在 B 站爱上学习

GitHub - sunpowerio/android-wallet: 光元区块链安卓钱包

技术渣，谁给写个爬虫，感谢

About Joyk