MySQL/MariaDB: run less queries!
source link: https://www.tuicool.com/articles/hit/nqAVzuA
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Nowadays probably Sisyphus runs SQL queries that return infinite rows,
and loops over them.
This sculpture is in Highgate Cemetary, in London.
Do you want to improve your application’s and database server’s performance? Of course one of the most important things to do is to optimise the most impacting queries .
You have various ways to find them, depending on what exactly you mean by “most impacting”. You can have an approach oriented to your systems workload , or an approach oriented to your users experience . On the former a lot has been written in the MySQL community. On the latter, I’ll write something in the future. They are not mutually exclusive and they both are important.
But there is a third important approach: get rid of the queries that are serving no purpose . Because they add work to the server, they probably make requests slower, they could lock rows, and they could consume enough CPU and memory to increase the cost of our servers.
MySQL/MariaDB performance_schema
will help to find those queries. It needs to be enabled.
Performance Schema and useless queries
I know, I know: you think you don’t have such queries. Well, chances are, you’re wrong. In complex environments, normally there are useless queries. Maybe they were useful when they were written, but at some point they lost their purpose and no one removed them. Maybe they used to return data that don’t exist anymore, because a feature was removed or rows changed. Maybe they stopped working correctly after a table was modified. Maybe they never worked but no one ever noticed that because testing is insufficient.
Whatever the reason… let’s find them.
Queries that always return an error
Many queries return an error, but… only in certain situations. For example, some queries use the IN (1, 2, 3...)
syntax, which is fine. But sometimes the list of values is empty, and the IN ()
syntax is not valid. Other examples include unquoted strings: if the string is a number, the query will work. But if it’s alphanumeric, it will fail. Of course those queries should be found and fixed.
But some queries always return an error. This happens if their syntax is not valid, and the syntax error does not depend on parameters. Or maybe parameters always cause errors (in the example above, maybe strings are always alphanumeric).
To find such queries, run:
SELECT DIGEST_TEXT, COUNT_STAR, FIRST_SEEN, LAST_SEEN FROM performance_schema.events_statements_summary_by_digest WHERE DIGEST_TEXT IS NOT NULL AND SUM_ERRORS = COUNT_STAR AND COUNT_STAR > 20 AND LAST_SEEN > (NOW() - INTERVAL 1 MONTH) ORDER BY COUNT_STAR DESC LIMIT 10 \G
Some notes:
- We are looking for completely useless queries. But after that, you may want to see queries whose ratio between
COUNT_STAR
andSUM_ERRORS
is bad. Or even queries that returned at least one error. - We put a lower limit to
COUNT_STAR
to avoid false positives – queries that were executed by mistake, perhaps manually, and will never run again. - The limit on
LAST_SEEN
is also important. We want to exclude queries that don’t run anymore, or don’t fail anymore because the conditions that caused the error have changed. - Queries executed more times are obviously more important to erase.
Queries that never return rows
Believe it or not, I often see queries that always return an empty resultset. If you find such queries, you should at least suspect that they are useless and can be removed.
All queries consume some resources. But queries that don’t return rows are much more important to remove than queries that fail. The reason is that these queries are executed. They return nothing, but they could examine a big amount of rows, materialise them in memory, or keep them locked.
Find them in this way:
SELECT * FROM performance_schema.events_statements_summary_by_digest WHERE ( TRIM(DIGEST_TEXT) LIKE 'SELECT%' OR TRIM(DIGEST_TEXT) LIKE 'CREATE%TABLE%SELECT%' OR TRIM(DIGEST_TEXT) LIKE 'DELETE%' OR TRIM(DIGEST_TEXT) LIKE 'UPDATE%' OR TRIM(DIGEST_TEXT) LIKE 'REPLACE%' ) AND SUM_ROWS_SENT = 0 AND SUM_ROWS_AFFECTED = 0 AND COUNT_STAR > 20 AND LAST_SEEN > (NOW() - INTERVAL 1 MONTH) ORDER BY SUM_ROWS_EXAMINED DESC LIMIT 10 \G
Notes:
- See notes about the previous query.
- We need to filter by
DIGEST_TEXT
because we are not interested inCREATE
,DROP
,SET
, commands, and many others. - Writes (
UPDATE
, etc) do not return rows, but they do affect some rows (usually). We exclude them withSUM_ROWS_AFFECTED = 0
. - Here I ordered the results by the number of rows examined because I consider it very significant about the amount of work cause by the query. Depending on which problems you have, you may want to filter by something else, for example
SUM_CREATED_TMP_TABLES
if you have memory consumption problems.
What to do next
Talk to developers. There could be reasons why a query must not be eliminated. If it’s wrong, it can be fixed. If it returns no rows, that could be expected for some reason.
High level developers hate database problems. But this particular type of problems has more chances to attract their attention. Their code does something potentially useless or wrong, and they usually will want to know why and fix the problem.
Do you have better or alternative ways to find such queries? Please comment!
Toodle pip,
Federico
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK