1

PostgreSQL 13支持增量排序(Incremental Sorting) - 月图灵

 1 year ago
source link: https://www.cnblogs.com/zhangfx01/p/16348068.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

PostgreSQL 13支持增量排序(Incremental Sorting)

PostgreSQL 13一个重要的功能是支持增量排序,使用order by 时可以加速排序,SQL如下

select * from test order by a,b limit 10;

如果在字段a上面建立了索引,需要对字段a、b进行排序,如果一个结果已经按几个前导键排序,这就允许对附加的b进行批量排序。

enable_incremental_sort

PostgreSQL新增了配置enable_incremental_sort用于控制是否开启增量排序,此参数默认开启

在PostgreSQL 13中创建测试表进行测试

postgres=# create table test(id int,c1 int ,c2 int,info varchar(300),crt_time timestamp);
CREATE TABLE
postgres=# insert into test select t,t,2,'test',clock_timestamp()  from generate_series(1,1000000)t;
INSERT 0 1000000
postgres=# create index i_test_id on test(id);
CREATE INDEX
--查看数据如下
postgres=# select * from test order by id,c1 limit 10;
 id | c1 | c2 | info |          crt_time          
----+----+----+------+----------------------------
  1 |  1 |  2 | test | 2022-06-02 14:23:38.253289
  2 |  2 |  2 | test | 2022-06-02 14:23:38.253777
  3 |  3 |  2 | test | 2022-06-02 14:23:38.253785
  4 |  4 |  2 | test | 2022-06-02 14:23:38.253787
  5 |  5 |  2 | test | 2022-06-02 14:23:38.25379
  6 |  6 |  2 | test | 2022-06-02 14:23:38.253791
  7 |  7 |  2 | test | 2022-06-02 14:23:38.253793
  8 |  8 |  2 | test | 2022-06-02 14:23:38.253795
  9 |  9 |  2 | test | 2022-06-02 14:23:38.253809
 10 | 10 |  2 | test | 2022-06-02 14:23:38.25381
(10 rows)

PostgreSQL 13 测试

  • 这里我是在pg14中做的测试,pg13这个参数名叫enable_incrementalsort
postgres=# show enable_incremental_sort;
 enable_incremental_sort 
-------------------------
 on
(1 row)

postgres=# explain analyze select * from test order by id,c1 limit 10;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.46..1.16 rows=10 width=25) (actual time=0.159..0.163 rows=10 loops=1)
   ->  Incremental Sort  (cost=0.46..70373.03 rows=1000000 width=25) (actual time=0.157..0.159 rows=10 loops=1)
         Sort Key: id, c1
         Presorted Key: id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
         ->  Index Scan using i_test_id on test  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.103..0.106 rows=11 loops=1)
 Planning Time: 0.427 ms
 Execution Time: 0.265 ms
(8 rows)

  • 可以看到Incremental SortPresorted Key: id并且走了i_test_id索引,SQL耗时0.265ms

关闭enable_incremental_sort

postgres=# set enable_incremental_sort=off;
SET
postgres=# explain analyze select * from test order by id,c1 limit 10;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=38962.64..38962.67 rows=10 width=25) (actual time=272.945..272.953 rows=10 loops=1)
   ->  Sort  (cost=38962.64..41462.64 rows=1000000 width=25) (actual time=272.933..272.937 rows=10 loops=1)
         Sort Key: id, c1
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test  (cost=0.00..17353.00 rows=1000000 width=25) (actual time=0.028..118.098 rows=1000000 loops=1)
 Planning Time: 0.305 ms
 Execution Time: 273.023 ms
(7 rows)
  • 关闭增量排序后SQL耗时273.023 ms,性能差了几个数量级

PostgreSQL 12 测试

  • Abase 7.0基于PostgreSQL 12.3

同样使用上面的建表语句,执行SQL如下

postgres=#  explain analyze select * from test order by id,c1 limit 10;
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=38962.64..38962.67 rows=10 width=536) (actual time=288.847..288.851 rows=10 loops=1)
   ->  Sort  (cost=38962.64..41462.64 rows=1000000 width=536) (actual time=288.839..288.840 rows=10 loops=1)
         Sort Key: id, c1
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on test  (cost=0.00..17353.00 rows=1000000 width=536) (actual time=0.078..173.460 rows=1000000 loops=1)
 Planning Time: 24.726 ms
 Execution Time: 289.135 ms
(7 rows)

PG 12中执行计划和PG 14关闭enable_incremental_sort参数一样,性能较低

当然这只是一个简单的查询,如果包含where,以及连表等情况是否也可以使用 Incremental Sort

加上c1 > 100000,c1没有创建索引

postgres=# explain analyze select * from test where c1 > 100000 order by id,c1 limit 10;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.47..1.23 rows=10 width=25) (actual time=49.470..49.476 rows=10 loops=1)
   ->  Incremental Sort  (cost=0.47..68345.40 rows=899386 width=25) (actual time=49.467..49.469 rows=10 loops=1)
         Sort Key: id, c1
         Presorted Key: id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
         ->  Index Scan using i_test_id on test  (cost=0.42..27873.02 rows=899386 width=25) (actual time=49.383..49.387 rows=11 loops=1)
               Filter: (c1 > 100000)
               Rows Removed by Filter: 100000
 Planning Time: 0.879 ms
 Execution Time: 49.594 ms
(10 rows)

加上 id > 100000,id有索引

postgres=# explain analyze select * from test where id > 100000 order by id,c1 limit 10;
                                                              QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.46..1.19 rows=10 width=25) (actual time=0.160..0.164 rows=10 loops=1)
   ->  Incremental Sort  (cost=0.46..65542.05 rows=899386 width=25) (actual time=0.148..0.150 rows=10 loops=1)
         Sort Key: id, c1
         Presorted Key: id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
         ->  Index Scan using i_test_id on test  (cost=0.42..25069.68 rows=899386 width=25) (actual time=0.115..0.119 rows=11 loops=1)
               Index Cond: (id > 100000)
 Planning Time: 0.408 ms
 Execution Time: 0.258 ms
(9 rows)

可以看到即使where条件没有索引,排序字段有索引也可以使用增量排序功能,而且效果也还不错。做了一个过滤操作 Filter: (c1 > 100000)

PG 13 多字段排序

  • 根据id,c1,c2进行排序,一样可以走增量排序
postgres=# explain analyze select * from test  order by id,c1,c2 limit 10;
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.46..1.16 rows=10 width=25) (actual time=0.175..0.179 rows=10 loops=1)
   ->  Incremental Sort  (cost=0.46..70373.03 rows=1000000 width=25) (actual time=0.172..0.174 rows=10 loops=1)
         Sort Key: id, c1, c2
         Presorted Key: id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
         ->  Index Scan using i_test_id on test  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.126..0.130 rows=11 loops=1)
 Planning Time: 0.485 ms
 Execution Time: 0.237 ms
(8 rows)

PG 13 join

  • 复制一张test2
postgres=# create table test2 as select * from test;
SELECT 1000000

postgres=# create index i_test2_id on test2(id);
CREATE INDEX
  • join连表查询,并且排序字段test.id,test.c1
postgres=# explain analyze select *from test join test2 on test.id = test2.id order by test.id,test.c1 limit 10;
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.93..3.04 rows=10 width=50) (actual time=0.089..0.092 rows=10 loops=1)
   ->  Incremental Sort  (cost=1.93..110738.33 rows=1000000 width=50) (actual time=0.087..0.089 rows=10 loops=1)
         Sort Key: test.id, test.c1
         Presorted Key: test.id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
         ->  Merge Join  (cost=1.85..65738.33 rows=1000000 width=50) (actual time=0.044..0.068 rows=11 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.022..0.036 rows=11 loops=1)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.014..0.018 rows=11 loops=1)
 Planning Time: 0.599 ms
 Execution Time: 0.174 ms
(11 rows)

postgres=# set enable_incremental_sort=off ;
SET
postgres=# explain analyze select *from test join test2 on test.id = test2.id order by test.id,test.c1 limit 10;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87347.97..87347.99 rows=10 width=50) (actual time=1964.394..1964.407 rows=10 loops=1)
   ->  Sort  (cost=87347.97..89847.97 rows=1000000 width=50) (actual time=1964.391..1964.402 rows=10 loops=1)
         Sort Key: test.id, test.c1
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Merge Join  (cost=1.85..65738.33 rows=1000000 width=50) (actual time=0.070..1690.949 rows=1000000 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.042..571.732 rows=1000000 loops=1)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.017..585.722 rows=1000000 loops=1)
 Planning Time: 1.292 ms
 Execution Time: 1964.517 ms
(10 rows)

join后排序也可以走增量排序,使用增量排序耗时:0.174 ms,而关闭增量后耗时1964.517 ms

  • 如果join后排序的字段来自不同的表test.id,test2.c1
postgres=# explain analyze select *from test join test2 on test.id = test2.id order by test.id,test2.c1 limit 10;
                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.93..3.04 rows=10 width=50) (actual time=0.151..0.155 rows=10 loops=1)
   ->  Incremental Sort  (cost=1.93..110738.33 rows=1000000 width=50) (actual time=0.149..0.151 rows=10 loops=1)
         Sort Key: test.id, test2.c1
         Presorted Key: test.id
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 26kB  Peak Memory: 26kB
         ->  Merge Join  (cost=1.85..65738.33 rows=1000000 width=50) (actual time=0.075..0.088 rows=11 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.040..0.044 rows=11 loops=1)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.025..0.028 rows=11 loops=1)
 Planning Time: 0.778 ms
 Execution Time: 0.230 ms
(11 rows)


postgres=# set enable_incremental_sort=off ;
SET
postgres=# explain analyze select *from test join test2 on test.id = test2.id order by test.id,test2.c1 limit 10;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87347.97..87347.99 rows=10 width=50) (actual time=1493.513..1493.519 rows=10 loops=1)
   ->  Sort  (cost=87347.97..89847.97 rows=1000000 width=50) (actual time=1493.510..1493.513 rows=10 loops=1)
         Sort Key: test.id, test2.c1
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Merge Join  (cost=1.85..65738.33 rows=1000000 width=50) (actual time=0.065..1228.403 rows=1000000 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.027..318.044 rows=1000000 loops=1)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.027..390.231 rows=1000000 loops=1)
 Planning Time: 0.761 ms
 Execution Time: 1493.685 ms
(10 rows)

join后排序的字段来自不同的表test.id,test2.c1,也可以走增量排序,开启增量耗时:0.230,关闭后耗时:1493.685 ms

来看看一个比较慢的SQL:

  • 这个SQL两表关联,而且使用了c2=2这一列全部为2,并且使用offset 100000
postgres=# explain analyze select *from test join test2 on test.id = test2.id where test.c2 = 2 order by test.id,test2.c1 limit 10 offset 100000;
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11325.58..11326.72 rows=10 width=50) (actual time=198.125..198.131 rows=10 loops=1)
   ->  Incremental Sort  (cost=2.02..113237.64 rows=1000000 width=50) (actual time=0.127..193.661 rows=100010 loops=1)
         Sort Key: test.id, test2.c1
         Presorted Key: test.id
         Full-sort Groups: 3126  Sort Method: quicksort  Average Memory: 29kB  Peak Memory: 29kB
         ->  Merge Join  (cost=1.94..68237.64 rows=1000000 width=50) (actual time=0.052..152.908 rows=100011 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..27873.02 rows=1000000 width=25) (actual time=0.026..46.138 rows=100011 loops=1)
                     Filter: (c2 = 2)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.020..51.088 rows=100011 loops=1)
 Planning Time: 0.707 ms
 Execution Time: 198.252 ms
(12 rows)

因为增量排序的缘故,查询还是很快

  • 如果我们关闭增量排序功能
postgres=# explain analyze select *from test join test2 on test.id = test2.id where test.c2 = 2 order by test.id,test2.c1 limit 10 offset 100000;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=156536.56..156536.59 rows=10 width=50) (actual time=2496.085..2496.093 rows=10 loops=1)
   ->  Sort  (cost=156286.56..158786.56 rows=1000000 width=50) (actual time=2469.643..2491.429 rows=100010 loops=1)
         Sort Key: test.id, test2.c1
         Sort Method: external merge  Disk: 72432kB
         ->  Merge Join  (cost=1.94..68237.64 rows=1000000 width=50) (actual time=0.082..1371.433 rows=1000000 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..27873.02 rows=1000000 width=25) (actual time=0.040..433.114 rows=1000000 loops=1)
                     Filter: (c2 = 2)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.033..401.784 rows=1000000 loops=1)
 Planning Time: 0.807 ms
 Execution Time: 2530.205 ms
(11 rows)

这个SQL耗时 2530.205 ms,和198.252 ms比增量排序提升还是很明显

但是我们观察到上面的SQL中使用id进行关联,且用id排序的时候查询效率较高,如果排序的字段换成crt_time效果如何?

postgres=# explain analyze select *from test join test2 on test.id = test2.id where test.c2 = 2 order by test.crt_time,test2.c1 limit 10 offset 100000;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=156536.56..156536.59 rows=10 width=50) (actual time=2702.107..2702.133 rows=10 loops=1)
   ->  Sort  (cost=156286.56..158786.56 rows=1000000 width=50) (actual time=2667.324..2697.033 rows=100010 loops=1)
         Sort Key: test.crt_time, test2.c1
         Sort Method: external merge  Disk: 72432kB
         ->  Merge Join  (cost=1.94..68237.64 rows=1000000 width=50) (actual time=0.161..1524.794 rows=1000000 loops=1)
               Merge Cond: (test.id = test2.id)
               ->  Index Scan using i_test_id on test  (cost=0.42..27873.02 rows=1000000 width=25) (actual time=0.074..488.803 rows=1000000 loops=1)
                     Filter: (c2 = 2)
               ->  Index Scan using i_test2_id on test2  (cost=0.42..25373.02 rows=1000000 width=25) (actual time=0.073..487.688 rows=1000000 loops=1)
 Planning Time: 1.835 ms
 Execution Time: 2746.486 ms
(11 rows)

当join关联的字段和order by的字段不一样时,虽然order by的字段有索引但也不能走,如果字段一致那么也能利用增量排序。

使用test.crt_time排序和上面关闭增量排序执行计划一样

  • 增量排序对于单表多字段排序来说效率还是提升明显

  • join连表查询如果关联的键和排序键一样也能走增量排序,如果不一样则不能走增量排序

参考资料:

https://postgres.fun/20200721193000.html

新版本调研 · 13 Beta 1 初体验

https://mp.weixin.qq.com/s/mBIL2uzIHB7qVByBIVRmhg


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK