

SQL That Meant Well: Applying Conditions in the Subquery vs the Outer Query
source link: https://www.tuicool.com/articles/hit/f2UvU3y
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

This is a new series that has been percolating in my head for a while. When I help clients with SQL, we’re often adding (or removing) indexes to help SQL performance. It has been said (though not by me) that SQL re-write is obsolete because the Db2 Optimizer will take care of that. But there are still some ways you can write a query that will have it using a much more expensive access plan than is needed. This series covers SQL that returned the required results, but does so in a way that could have been written much more efficiently.
Much of what is reviewed here would be true for any RDBMS, but I know it to be true for these queries on Db2 for Linux, Unix, and Windows. Table or column names may be changed to protect the innocent.
This particular SQL was found in the an analysis of the package cache using my favorite SQL for finding problem SQL in the package cache . In several different time frames analyzed, it was found to be consuming more than 10% of the Rows Read by all the queries in the package cache.
SQL Statement or Fragment Found in the Wild
SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id, environment_start_dt DESC ) seq , person_id , sample1 , sample2 , sample3 , environment_no , environment_start_dt FROM schema.ENVIRNOMENT ) AS recentenv WHERE seq = 1 and person_id=? ;
Access Plan
Explaining this query showed most of the expense was in the table scan of schema.ENVIRONMENT.
Rows RETURN ( 1) Cost I/O | 1.91471e-06 FILTER ( 2) 29995.8 17868 | 573015 TBSCAN ( 3) 29880.2 17868 | 573015 SORT ( 4) 29859.9 17868 | 573015 TBSCAN ( 5) 29505.3 17868 | 573015 TABLE: SCHEMA ENVIRONMENT Q1
A Better Way to Get the Same Results
We tried this query instead to reduce the amount of data that actually needed to be processed:
SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY person_id, environment_start_dt DESC ) seq , person_id , sample1 , sample2 , sample3 , environment_no , environment_start_dt FROM schema.ENVIRNOMENT WHERE person_id=? ) AS recentenv WHERE seq = 1 ;
Modified Access Plan
Looking at the access plan, this syntax made a world of difference:
Total Cost: 32.5022 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 1 FILTER ( 2) 32.5022 3 | 1.09716 FETCH ( 3) 32.5014 3 /---+----\ 1.09716 573015 IXSCAN TABLE: SCHEMA ( 4) ENVIRONMENT 21.6697 Q1 2 | 573015 INDEX: DB2INST1 IDX_ENVIRONMENT_01 Q1
The Difference
That’s a 99.999% improvement in the timeron cost – not only do we need to evaluate fewer rows, but there is also an index that we can use to get those rows.
If we look at what we’re asking Db2 to do with each query, the difference is easy to understand:

Original Query:First, figure out a table that returns 6 columns from the table that represents every row from environment along with a column ranking the rows for each person_id from most recent environment start date (1) to least recent. After you have that, find the rows that match a specific person_id, and out of those, find the one row with a rank of 1.
Modified Query: First, figure out a table that returns 6 columns from the table, along with a column ranking the rows from most recent environment start date (1) to least recent for a specific person_id . Then return the row with a rank of 1.
Because the condition is applied on the inner query, and the cardinality of this column is pretty low (it is fairly unique), the rows that even need to be evaluated and sorted is much lower. The first query had to evaluate and work with over 500,000 rows to return one. The second query on average has to deal with one or two. This requires far fewer I/Os. Even if the table is already entirely in the buffer pool, this requires fewer CPU cycles to execute the query. With the modified query, we’re also getting the data in order from an index, which luckily allows us to also eliminate the sort.
Recommend
-
6
How to Improve ActiveRecord Query Performance with Subquery Caching Updated Aug 4, 2020 2 comments 10 minute read ...
-
6
SQL - Subquery returned more than one value. How to fix? advertisements -- Using orders, customers, employees, and order details tables, selec...
-
9
SQL SERVER – Applying Query Hints to ViewsI totally agree that it is not possible to apply query hints to views inside the view. It will give you an error. However, there is a workaround.
-
5
万向区块链蜂巢研习社 | 深度对话波卡生态基础设施:SubQuery、Subscan、zCloak NetworkWeb3.0 训练营二期的培训一直在稳步进行中,万向区块链蜂巢研习社第 61 期举行了 Web3.0 训练营二期基础设施专场直播分享,邀请了 Parity 亚洲负责人 Yaoqi...
-
7
2021-09-08 10:46 融资新闻 | 去中心化数据聚合器SubQuery完成900万美元融资 SubQuery 900万美元 轮次 ...
-
5
Subquery - Number and Quantity advertisements Looking to find all part numbers (d046d) with at least one record having vaule greater than zero...
-
8
What is the difference between correlated subqueries and uncorrelated subqueries? Let’s start out with an example of what an uncorrelated subquery looks like, and then we can compare that with a correlated subquery. Here is an exam...
-
22
What is the difference between a derived table and a subquery? Explain it with an example. Both derived tables and subqueries can look the same and many people may even think they are the same thing, but there are definitely diffe...
-
6
SQL query - count with subquery advertisements I want to get the count of all the columns that are retrieved in...
-
3
October 4, 2022 /
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK