

Federico Razzoli: Why MySQL tables need a primary key
source link: https://www.tuicool.com/articles/nqA7zuU
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.


Traitors gate, in the Tower of London.
Finding the keys is very important, if you’re trapped in a place like this.
In MySQL and MariaDB, all tables should have a primary key – especially InnoDB tables . I mentioned this in several articles, see Primary Key in InnoDB: how they work and best practices . But occasionally people tell me that they are not convinced about that, and I have to explain the reasons. I am going to write them hear, to keep this article as a future reference. I will include links to other resources for more in depth info about single aspects.
Implicit primary keys
First of all, what happens if there is no primary key? There are two possibilities.
If you do not define a PRIMARY KEY
for your table, MySQL locates the first UNIQUE
index where all the key columns are NOT NULL
and InnoDB
uses it as the clustered index.
So, it’s possible that the table will still have a clustered index based on an existing secondary index . This will not be obvious from information_schema
tables or statements like SHOW CREATE TABLE
. The effects of removing that particular UNIQUE
index will not be obvious. Basically, such an important behavior depends on the presence of a particular UNIQUE
index that was created with the table (not later), and the order of such indexes in the CREATE TABLE
statement. This is counter-intuitive and confusing . If we build a view to list tables without a primary key, we need additional logic to check if there is an implicit primary key.
Now let’s see what happens if no index is eligible to be a clustered index.
If the table has no PRIMARY KEY
or suitable UNIQUE
index, InnoDB
internally generates a hidden clustered index named GEN_CLUST_INDEX
on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB
assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
This clarifies what a clustered index is (if there is no NOT NULL UNIQUE
column). I’ve met people who thought that this is good, because saves us from having a table without a primary key. But actually, all the problems explained in the rest of this article only apply to tables with the 6 bytes clustered index .
Fast queries
This is the weakest reason. For some tables, it doesn’t even apply. Yet, it is something that we should keep in mind.
As I explained in the article Primary Key in InnoDB: how they work and best practices , in InnoDB a table is actually a primary key . The values of all columns are physically stored in the primary key, ordered by primary key values. All other indexes are stored in separated data structures.
Think what it means for very common queries like this:
SELECT first_name, last_name FROM user WHERE id = 24;
InnoDB will look for id=24
in the primary key, and it finds all values it needs. A very fast search. But think about this query:
SELECT first_name, last_name FROM user WHERE email = '[email protected]';
Suppose there is an index on email
. InnoDB will look for that beautiful email address in the index; if it finds it, it will find id=24
, and then it will needs to search the corresponding entry in the primary key. Two operations instead of one . That’s why searches by primary key are much faster with InnoDB.
Clustered index and dict_sys->mutex
The mutex problem is well described in the wonderful article How does InnoDB behave without a Primary Key? , by Jeremy Cole. The core of the problem is that:
This global sequence counter is protected by dict_sys->mutex, even for incrementing (as opposed to using atomic increment). (…)
Given how much other code within InnoDB is protected by dict_sys->mutex
. I think it’s fair to say any tables with an implicit clustered key ( ROW_ID
) could expect to experience random insert stalls during operations like dropping (unrelated) tables. Parallel insertion into multiple tables with implicit keys could be performance-constrained, as it will be serialized on both the shared mutex and cache contention for the shared counter variable.
How does InnoDB behave without a Primary Key?
Jeremy Cole
I am pretty sure that the article used to list an incomplete series of operations governed by dict_sys->mutex
. Unfortunately the list is not there anymore, maybe because it was too incomplete, or it turned out to be incorrect, or simply because it became obsolete with time. What I can add here is that the list included many operations whose correlation with the generation of a clustered index value was quite hard to understand.
Primary key, the binary log and replication
The recommended binary log format is ROW
. With this format, the changes are logged in a binary form, which means:
INSERT DELETE UPDATE
Actually old values are also logged if binlog_row_image=FULL
, but this doesn’t matter now.
What happens if there is no primary key? In that case, all column values are logged to identify the row . This makes the binary log bigger. But there is a bigger problem.
In MySQL replication, the binary log entries ( events ) are sent from the master to the slaves so they can be replicated. A deletion or an update by primary key are supposed to be very fast. But a deletion or an update by all columns is slow, because it implies a full table scan . The bigger the table without a primary key, the slower the slaves.
This problem is demonstrated by Jean-François Gagné in his article The danger of no Primary Key when replicating in RBR .
Primary key and clusters
The same problem appears with clusters (both Galera and group replication), but the effect is different.
Why different? Galera and group replication are synchronous replication mechanisms. A node cannot lag behind others.
Instead, whenever we run a COMMIT
against one node, the transaction needs to be certified by other nodes . The nodes that received the COMMIT
will send the writeset to its fellows, which will try to apply it and communicate the result to the first node. Only after this certification process, the first node will return success to the client.
If the certification takes a long time, the COMMIT
is slow from a client point of view.
Making primary keys mandatory
MariaDB 10.1 introduced the innodb_force_primary_key
variable, which forces InnoDB tables to have a primary key . Unfortunately, this doesn’t solve the replication problem if we use other storage engines.
MariaDB also has a wsrep_certify_nonPK
variable, which automatically generates a primary key if it is missing. This variable is ON
by default. Instead, I recommend to set it to OFF
, and set to ON
instead. In this way no primary key will be automatically generated, but the creation of tables without a primary key will fail.
MySQL 8.0 introduced a better variable: sql_require_primary_key
. It simply disallows to create tables without a primary key , or drop a primary key from an existing table.
Conclusions
I wrote this article as a future reference . When someone asks why I insist on always having primary keys, I will give a quick answer, and will send this link. You may want to do the same.
For this reason, if you have more reasons to make primary keys mandatory, or have something to object, or you have methods to suggest to make primary keys mandatory, please comment! As usual, I’m very interested in your opinion. It helps to turn my articles (writings from a single person, after all) into a more collective knowledge base.
Toodle pip,
Federico
Recommend
-
48
Last year I published my 2018 Database Wishlis...
-
35
Having a primary key defined for each user table is best practice for performance in InnoDB. And when using Group Replication or InnoDB Cluster for automatic high availability it is even mandatory. So it is wise to check...
-
5
Federico Mena-Quintero Main :: Activity Log Boring news about Federico You may want to look at a
-
14
Mar 19, 2021 — 21:06 CUT MacStories Unwind: Federico Test Drives a High-End iPad Dock, We Recap Reviews of Panic’s Nova and Sofa, and the Sad, Perplexing HomePod News ...
-
10
Creator & Artist Stories How Federico Vigevani hit 10M subs in six year...
-
6
Episode 219 – May 19, 2021 Federico’s 2021 iPad Pro Review: Step into the Capture Zone 0:00 01:04:42 ...
-
8
Episode 228 – Jun 21, 2021 Federico’s Research and Note-Taking Setup 0:00 41:11
-
6
Mar 18, 2022 — 18:04 CUT MacStories Unwind: Federico’s Surprise
-
1
Why MySQL Could Be Slow With Large Tables Back to the Blog
-
7
ProblemAt times we might need uniqueness constraint for a common column across tables. Let’s say we’ve a bunch of products a retailer sells e.g. books, toys and dresses. Each item-kind needs its own table as they’ll have diff...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK