6

【PostgreSQL 】PostgreSQL 15对distinct的优化 - abce

 1 year ago
source link: https://www.cnblogs.com/abclife/p/16484264.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

【PostgreSQL 】PostgreSQL 15对distinct的优化

2022-07-17 11:34  abce  阅读(80)  评论(0)  编辑  收藏  举报

table t_ex;
c1 | c2
----+----
2 | B
4 | C
6 | A
2 | C
4 | B
6 | B
2 | A
4 | B
6 | C
2 | C

以下SQL语句有序地返回"c1"列中唯一值:

select distinct on(c1) * from abce;

对于c2列,会根据c1的唯一性,从表中找到的第一个值。

postgres=# select distinct on(c1) * from abce;
c1 | c2
----+----
2  | B
4  | B
6  | B
(3 rows)

以下SQL语句有序地返回"c2"列中唯一值:

# select distinct on(c2) * from abce;
c1 | c2
----+----
6  | A
2  | B
4  | C
(3 rows)

最后从表中返回唯一性的记录

postgres=# select distinct * from abce;
c1 | c2
----+----
6  | C
4  | C
4  | B
2  | C
2  | A
6  | B
6  | A
2  | B
(8 rows)

那么你可能会问,在postgresql15中,distinct的增强体现在哪些方面呢?答案是:并发

在此之前,只有一个cpu或进程来计算不同的值。在postgresql15中,可以使用并发,使用多个cpu进程。
这一特性涉及好几个参数,但是,我们只聚焦在参数max_parallel_workers_per_gather。

为了演示这个改进,我们创建三个表,没有索引,填充大约5000000条记录。注意,表的列数分别为1,5,10。

Table "public.t1"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
c1     | integer |           |          |
Table "public.t5"
Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
c1     | integer               |           |          |
c2     | integer               |           |          |
c3     | integer               |           |          |
c4     | integer               |           |          |
c5     | character varying(40) |           |          |
Table "public.t10"
Column |         Type          | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
c1     | integer               |           |          |
c2     | integer               |           |          |
c3     | integer               |           |          |
c4     | integer               |           |          |
c5     | character varying(40) |           |          |
c6     | integer               |           |          |
c7     | integer               |           |          |
c8     | integer               |           |          |
c9     | integer               |           |          |
c10    | integer               |           |          |
insert into t1 select generate_series(1,500);
insert into t5
select   generate_series(1,500)
,generate_series(500,1000)
,generate_series(1000,1500)
,(random()*100)::int
,'aofjaofjwaoeev$#^&ETHE#@#Fasrhk!!@%Q@';
insert into t10
select   generate_series(1,500)
,generate_series(500,1000)
,generate_series(1000,1500)
,(random()*100)::int
,'aofjaofjwaoeev$#^&ETHE#@#Fasrhk!!@%Q@'
,generate_series(1500,2000)
,generate_series(2500,3000)
,generate_series(3000,3500)
,generate_series(3500,4000)
,generate_series(4000,4500);
List of relations
Schema | Name | Type  |  Owner   | Persistence | Access method |  Size  |
--------+------+-------+----------+-------------+---------------+--------+
public | t1   | table | postgres | permanent   | heap          | 173 MB |
public | t10  | table | postgres | permanent   | heap          | 522 MB |
public | t5   | table | postgres | permanent   | heap          | 404 MB |

下一步是将生成的数据dump到以下的版本中:

PG VERSION
pg96
pg10
pg11
pg12
pg13
pg14
pg15

数据导入后,使用下面的脚本生成结果:

#!/bin/bash
for v in 96 10 11 12 13 14 15
do
# run the explain analzye 5X in order to derive consistent numbers
for u in $(seq 1 5)
do
echo "--- explain analyze: pg${v}, ${u}X ---"
psql -p 100$v db01 -c "explain analyze select distinct on (c1) * from t1" > t1.pg$v.explain.txt
psql -p 100$v db01 -c "explain analyze select distinct * from t5" > t5.pg$v.explain.txt
psql -p 100$v db01 -c "explain analyze select distinct * from t10" > t10.pg$v.explain.txt
done
done

以下是结果比较,可以看到表越大,性能收获越大。

PG VERSION

1 column (t1), ms

5 column (t5), ms

10 column (t10), ms

pg96

3,382

9,743

20,026

pg10

2,004

5,746

13,241

pg11

1,932

6,062

14,295

pg12

1,876

5,832

13,214

pg13

1,973

2,358

3,135

pg14

1,948

2,316

2,909

pg15

1,439

1,025

1,245

764761-20220716143747003-642151134.png

来看看不同版本之间的执行计划:

PG96 QUERY PLAN, TABLE T1
-------------------------------------------------------------------------------
Unique  (cost=765185.42..790185.42 rows=500 width=4) (actual time=2456.805..3381.230 rows=500 loops=1)
->  Sort  (cost=765185.42..777685.42 rows=5000000 width=4) (actual time=2456.804..3163.600 rows=5000000 loops=1)
Sort Key: c1
Sort Method: external merge  Disk: 68432kB
->  Seq Scan on t1  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.055..291.523 rows=5000000 loops=1)
Planning time: 0.161 ms
Execution time: 3381.662 ms
PG15 QUERY PLAN, TABLE T1
---------------------------------------------------------------------------
Unique  (cost=557992.61..582992.61 rows=500 width=4) (actual time=946.556..1411.421 rows=500 loops=1)
->  Sort  (cost=557992.61..570492.61 rows=5000000 width=4) (actual time=946.554..1223.289 rows=5000000 loops=1)
Sort Key: c1
Sort Method: external merge  Disk: 58720kB
->  Seq Scan on t1  (cost=0.00..72124.00 rows=5000000 width=4) (actual time=0.038..259.329 rows=5000000 loops=1)
Planning Time: 0.229 ms
JIT:
Functions: 1
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.150 ms, Inlining 31.332 ms, Optimization 6.746 ms, Emission 6.847 ms, Total 45.074 ms
Execution Time: 1438.683 ms

当DISTINCT列的数量增加时,真正的差异出现了,如查询表 t10 所示。 可以看到并行化在起作用!

PG96 QUERY PLAN, TABLE T10
-------------------------------------------------------------------------------------------
Unique  (cost=1119650.30..1257425.30 rows=501000 width=73) (actual time=14257.801..20024.271 rows=50601 loops=1)
->  Sort  (cost=1119650.30..1132175.30 rows=5010000 width=73) (actual time=14257.800..19118.145 rows=5010000 loops=1)
Sort Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
Sort Method: external merge  Disk: 421232kB
->  Seq Scan on t10  (cost=0.00..116900.00 rows=5010000 width=73) (actual time=0.073..419.701 rows=5010000 loops=1)
Planning time: 0.352 ms
Execution time: 20025.956 ms
PG15 QUERY PLAN, TABLE T10
------------------------------------------------------------------------------------------- HashAggregate  (cost=699692.77..730144.18 rows=501000 width=73) (actual time=1212.779..1232.667 rows=50601 loops=1)
Group Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
Planned Partitions: 16  Batches: 17  Memory Usage: 8373kB  Disk Usage: 2976kB
->  Gather  (cost=394624.22..552837.15 rows=1002000 width=73) (actual time=1071.280..1141.814 rows=151803 loops=1)
Workers Planned: 2
Workers Launched: 2
->  HashAggregate  (cost=393624.22..451637.15 rows=501000 width=73) (actual time=1064.261..1122.628 rows=50601 loops=3)
Group Key: c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
Planned Partitions: 16  Batches: 17  Memory Usage: 8373kB  Disk Usage: 15176kB
Worker 0:  Batches: 17  Memory Usage: 8373kB  Disk Usage: 18464kB
Worker 1:  Batches: 17  Memory Usage: 8373kB  Disk Usage: 19464kB
->  Parallel Seq Scan on t10  (cost=0.00..87675.00 rows=2087500 width=73) (actual time=0.072..159.083 rows=1670000 loops=3)
Planning Time: 0.286 ms
JIT:
Functions: 31
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 3.510 ms, Inlining 123.698 ms, Optimization 200.805 ms, Emission 149.608 ms, Total 477.621 ms
Execution Time: 1244.556 ms

提高性能:
postgres运行时参数max_parallel_workers_per_gather来提高性能。新初始化的集群中的默认值为2。
如下表所示,由于测试硬件本身的能力有限,它很快成为收益递减的原因。

​​在postgresql 15中:

max_parallel_workers_per_gather

1 column (t1)

5 column (t5)

10 column (t10)

2

1,439

1,025

1,245

1,464

1,013

1,391

1,401

1,045

1,428

764761-20220716145246810-509721399.png

关于索引:如本查询计划中所示,应用索引时未实现性能改进。

PG15,表T10,max_parallel_workers_per_gather=4:

QUERY PLAN                                                                                
-----------------------------------------------------------------------------------
Unique  (cost=0.43..251344.40 rows=501000 width=73) (actual time=0.060..1240.729 rows=50601 loops=1)
->  Index Only Scan using t10_c1_c2_c3_c4_c5_c6_c7_c8_c9_c10_idx on t10  (cost=0.43..126094.40 rows=5010000 width=73) (actual time=0.058..710.780 rows=5010000 loops=1)
Heap Fetches: 582675
Planning Time: 0.596 ms
JIT:
Functions: 1
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 0.262 ms, Inlining 0.000 ms, Optimization 0.122 ms, Emission 2.295 ms, Total 2.679 ms
Execution Time: <strong>1249.391 ms</strong>

跨多个CPU运行DISTINCT是性能能力的一大进步。
但是请记住,当增加max_parallel_workers_per_gather的数量并接近硬件的限制时,性能下降的风险。
在正常情况下,查询计划器可能会决定使用索引而不是运行并行工作程序。
解决此问题的一种方法是考虑禁用运行时参数,例如enable_indexonlyscan和enable_indexscan。
最后,不要忘记运行EXPLAIN ANALYZE以了解发生了什么。


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK