

PostgreSQL: LIMIT vs FETCH FIRST ROWS ... WITH TIES - Cybertec
source link: https://www.cybertec-postgresql.com/en/postgresql-limit-vs-fetch-first-rows-with-ties/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Most people in the SQL and in the PostgreSQL community have used the LIMIT clause provided by many database engines. However, what many do not know is that LIMIT / OFFSET
are off standard and are thus not portable. The proper way to handle LIMIT
is basically to use SELECT … FETCH FIRST ROWS
. However, there is more than meets the eye.
LIMIT vs. FETCH FIRST ROWS
Before we dig into some of the more advanced features we need to see how LIMIT
and FETCH FIRST ROWS
can be used. To demonstrate this feature, I have compiled a simple data set:
test=#
CREATE
TABLE
t_test (id
int
);
CREATE
TABLE
test=#
INSERT
INTO
t_test
VALUES
(1), (2), (3), (3),
(4), (4), (5);
INSERT
0 7
test=#
TABLE
t_test;
id
----
1
2
3
3
4
4
5
(7
rows
)
Our data set has 7 simple rows. Let’s see what happens if we use LIMIT
:
test=#
SELECT
*
FROM
t_test LIMIT 3;
id
----
1
2
3
(3
rows
)
In this case, the first three rows are returned. Note that we are talking about ANY rows here. Whatever can be found first is returned. There is no special order.
The ANSI SQL compatible way of doing things is as follows:
test=#
SELECT
*
FROM
t_test
FETCH
FIRST
3
ROWS
ONLY
;
id
----
1
2
3
(3
rows
)
Many of you may never have used or seen this kind of syntax before, but this is actually the “correct” way to handle LIMIT
.
However, there is more: What happens if NULL
is used inside your LIMIT
clause? The result might surprise you::
test=#
SELECT
*
FROM
t_test LIMIT
NULL
;
id
----
1
2
3
3
4
4
5
(7
rows
)
The database engine does not know when to stop returning rows. Remember, NULL
is undefined, so it does not mean zero. Therefore, all rows are returned. You have to keep that in mind in order to avoid unpleasant surprises…
FETCH FIRST … ROWS WITH TIES
WITH TIES
has been introduced in PostgreSQL 13 and fixes a common problem: handling duplicates. If you fetch the first couple of rows, PostgreSQL stops at a fixed number of rows. However, what happens if the same data comes again and again? Here is an example:
test=#
SELECT
*
FROM
t_test
ORDER
BY
id
FETCH
FIRST
3
ROWS
WITH
TIES;
id
----
1
2
3
3
(4
rows
)
In this case, we’ve actually got 4 rows, not just 3. The reason is that the last value shows up again after 3 rows, so PostgreSQL decided to include it as well. What is important to mention here is that an ORDER BY
clause is needed, because otherwise, the result would be quite random. WITH TIES
is therefore important if you want to include all rows of a certain kind – without stopping at a fixed number of rows.
Suppose one more row is added:
test=#
INSERT
INTO
t_test
VALUES
(2);
INSERT
0 1
test=#
SELECT
*
FROM
t_test
ORDER
BY
id
FETCH
FIRST
3
ROWS
WITH
TIES;
id
----
1
2
2
(3
rows
)
In this case, we indeed get 3 rows, because it is not about 3 types of values, but really about additional, identical data at the end of the data set.
WITH TIES: Managing additional columns
So far we have learned something about the simplest case using just one column. However, that’s far from practical. In a real work application, you will certainly have more than a single column. So let us add one:
test=#
ALTER
TABLE
t_test
ADD
COLUMN
x
numeric
DEFAULT
random();
ALTER
TABLE
test=#
TABLE
t_test;
id | x
----+--------------------
1 | 0.258814135879447
2 | 0.561647200043165
3 | 0.340481941960185
3 | 0.999635345010109
4 | 0.467043266494571
4 | 0.742426363498449
5 | 0.0611112678267247
2 | 0.496917052156565
(8
rows
)
In the case of LIMIT
nothing changes. However, WITH TIES
is a bit special here:
test=#
SELECT
*
FROM
t_test
ORDER
BY
id
FETCH
FIRST
4
ROWS
WITH
TIES;
id | x
----+-------------------
1 | 0.258814135879447
2 | 0.561647200043165
2 | 0.496917052156565
3 | 0.999635345010109
3 | 0.340481941960185
(5
rows
)
What you can see here is that 5 rows are returned. The fifth row is added because id = 3
appears more than once. Mind the ORDER BY
clause: We are ordering by id
. For that reason, the id
column is relevant to WITH TIES
.
Let’s take a look at what happens when the ORDER BY
clause is extended:
test=#
SELECT
*
FROM
t_test
ORDER
BY
id, x
FETCH
FIRST
4
ROWS
WITH
TIES;
id | x
----+-------------------
1 | 0.258814135879447
2 | 0.496917052156565
2 | 0.561647200043165
3 | 0.340481941960185
(4
rows
)
We are ordering by two columns. Therefore WITH TIES
is only going to add rows if both columns are identical, which is not the case in my example.
LIMIT… Or finally…
WITH TIES
is a wonderful new feature provided by PostgreSQL. However, it is not only there to limit data. If you are a fan of windowing functions you can also make use of WITH TIES as shown in one of my other blog posts covering advanced SQL features provided by PostgreSQL.
Recommend
-
32
PostgreSQL 13 Beta 1 was released yesterday, you can read the release notes here https://www.postgresql.org/about/news/2040/ One thing that cau...
-
5
Overcome 10k rows database limit on Heroku by upgrading the database plan Over time I have collected quite a number of small Rails heroku apps. They usually start small, but overtime they hit the limit of 10k rows an...
-
11
Locking rows in PostgreSQL Someone asked a question today in #postgresql about locking rows for a long running task, where multiple tasks could be updating ro...
-
8
How to interpret PostgreSQL EXPLAIN ANALYZE output © Laurenz Albe 2021...
-
7
PostgreSQL、Redis与Memcached的性能比较 - CYBERTEC测试设置的一些特征:AMD Ryzen 3600 CPU 设置为性能模式。我的测试脚本本质上是同步的,即基本上使用单个 CPU 内核,所以速度可能很重要。 Ubuntu 20.04 桌面作为操作系统,禁用交换。...
-
6
Nic Lin's Blog喜歡在地上滾的工程師最近研究一個在 Postgres 奇怪的效能問題。情況是這樣的,我有一個通知系統 (Notifications) 的 table,也有對常用的搜尋打 index...
-
10
Vinita Kasliwal June 16, 2022 2 minute read...
-
6
Ranked #4 for todayRows CommunityThe first publishing platform for spreadsheetsRows Community got a major revamp making it easier...
-
4
How to Partition Consecutive Rows with PostgreSQL After a recent hot streak in my golf league, I decided I’d like to add a new feature to the league’s website. T...
-
6
Rumor There’s rumor around the internet that we should avoid having > 20M rows in a single MySQL table. Otherwise, the table’s performance will be downgraded, you will find SQL query much slower than usual when it’s above the soft l...
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK