25

Monitor and Optimize Slow Queries with PMM and EverSQL – Part 2

 5 years ago
source link: https://www.tuicool.com/articles/hit/rQzeymB
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

EverSQL is a platform that intelligently tunes your SQL queries by providing query optimization recommendations, and feedback on missing indexes. This is the second post of our EverSQL series, if you missed our introductory post  take a look there first and then come back to this article.

We’ll use the Stackoverflow data set again as we did in our first post.

Diving into query optimization

We’ll grab the worst performing query in the list from PMM and optimize it. This query builds a list of the top 50 most recent posts which have a score greater than two, and involves joining two large tables – posts and comments. The original runtime of that query is above 20 minutes and causes high load on the server while running.

NvIRzaz.png!web

Assuming you have EverSQL’s chrome extension installed, you’ll see a new button in the PMM Query Analytics page, allowing you to send the query and schema structure directly to EverSQL, to retrieve indexing and query optimization recommendations.

aUZjamM.png!web

fmi2yea.png!web

After implementing EverSQL’s recommendations, the query’s execution duration significantly improved:

nEzyYrf.png!web

Optimization Internals

So what was the actual optimization in this specific case? And why did it work so well? Let’s look at the original query:

SELECT
   p.title
FROM
   so.posts p 
       INNER JOIN
   so.comments c ON p.id = c.postid
WHERE
c.score > 2
GROUP BY p.id
ORDER BY p.creationdate DESC
LIMIT 100;

The tables’ structure:

CREATE TABLE `posts` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  ...
  PRIMARY KEY (`Id`),
  KEY `posts_idx_creationdate` (`CreationDate`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1
 
CREATE TABLE `comments` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  `PostId` int(11) NOT NULL,
  `Score` int(11) DEFAULT NULL,
  ....
  PRIMARY KEY (`Id`),
  KEY `comments_idx_postid` (`PostId`),
  KEY `comments_idx_postid_score` (`PostId`,`Score`),
  KEY `comments_idx_score` (`Score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

This query will return the post title of the latest 100 stackoverflow posts, which had at least one popular comment (with a score higher than two). The posts table contains 39,646,923 records, while the comments table contains 64,510,258 records.

This is the execution plan MySQL (v5.7.20) chose:

VjMziyM.png!web

One of the challenges with this query is that the GROUP BY and ORDER BY clauses contain different fields, which prevent MySQL from using an index for the ORDER BY. As MySQL’s documentation states:

“In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it may still use indexes to find the rows that match the WHERE clause. Examples:  … The query has different ORDER BY and GROUP BY expressions.”.

Now let’s look into the optimized query:

SELECT 
   p.title
FROM
   so.posts p
WHERE
   EXISTS( SELECT 
           1
       FROM
           so.comments c
       WHERE
           p.id = c.postid AND c.score > 2)
ORDER BY p.creationdate DESC
LIMIT 100;

Since the comments table is joined in this query only to check for existence of matching records in the posts table, we can use an EXISTS subquery instead. This will allow us to avoid inflating the results (by using JOIN) and then deflating them (by using GROUP BY), which are costly operations.

Now that the GROUP BY is redundant and removed, the database can optionally choose to use an index for the ORDER BY clause.

The new execution plan MySQL chooses is:

fAJ3MvA.png!web

As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms.

We hope you enjoyed this post, please let us know your experiences using the integration between PMM Query Analytics and EverSQL !

Co-Author: Tomer Shay

zeEr22a.jpg!web

Tomer Shayis the Founder of  EverSQL . He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK