0

记一次SQL优化 - 下半夜的风

 1 year ago
source link: https://www.cnblogs.com/fanchengmeng/p/16511121.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

昨天(2022-7-22)上线了我的一个功能,测试环境数据量较小,问题不大,但是上生产之后,直接卡死了,然后就开始了这么一次SQL优化,这里记录一下。

不太方便透露公司的表结构,这里我自己建了几张表,模拟一下就可以了。

肯定有杠精要说表可以不这样设计了,但是事实现在系统就是这样设计的,如果想改动表设计,影响面就太大了(我们急着上线哦)。当然,本文的后面也会给出修改设计的方案,以达到更优解。

1. 创建表

进货单表:

CREATE TABLE `purchase_order` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键自增id', `purchase_time` varchar(255) DEFAULT NULL COMMENT '进货时间', `purchase_pre_unit_price` decimal(10,2) unsigned zerofill NOT NULL COMMENT '进货预订单价(元/kg)', `purchase_weight` decimal(10,2) unsigned zerofill NOT NULL COMMENT '进货重量(kg)', `purchase_bill_no` varchar(255) NOT NULL COMMENT '进货单号', PRIMARY KEY (`id`)) ENGINE=InnoDB AUTO_INCREMENT=62181 DEFAULT CHARSET=utf8 COMMENT='进货单';

进货结算单表:

CREATE TABLE `settlement_voucher` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增ID', `purchase_bill_no` varchar(512) DEFAULT NULL COMMENT '进货单号', `settlement_bill_no` varchar(64) NOT NULL COMMENT '结算单号', `unit_price` decimal(10,2) unsigned zerofill NOT NULL COMMENT '实际结算单价(元/kg)', `settlement_weight` decimal(10,2) unsigned zerofill NOT NULL COMMENT '实际结算重量(kg)', `cut_off_time` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT '结算时间', PRIMARY KEY (`id`), KEY `idx_settlement_bill_no` (`settlement_bill_no`)) ENGINE=InnoDB AUTO_INCREMENT=63288 DEFAULT CHARSET=utf8 COMMENT='进货结算单';
CREATE TABLE `invoice` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键ID', `invoice_code` varchar(255) NOT NULL COMMENT '发票代码', `invoice_number` varchar(255) NOT NULL COMMENT '发票号码', `pay_amount` decimal(10,2) DEFAULT NULL COMMENT '发票金额', PRIMARY KEY (`id`), KEY `idx_invoice_number` (`invoice_number`)) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='发票表';

发票-结算单关联表:

CREATE TABLE `settlement_invoice_relation` ( `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键ID', `invoice_code` varchar(255) DEFAULT NULL COMMENT '发票代码', `invoice_number` varchar(255) DEFAULT NULL COMMENT '发票号码', `settlement_bill_no` varchar(64) DEFAULT NULL COMMENT '结算单号', PRIMARY KEY (`id`), KEY `idx_settlement_bill_no` (`settlement_bill_no`), KEY `idx_invoice_number` (`invoice_number`)) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='发票-结算单关联表';

以上是我自己创建的几张表,先介绍一下这几张表的关系:

  • 进货单表(purchase_order)和进货结算单表(settlement_voucher)通过进货单号(purchase_bill_no)关联。这里值得注意的是:
    • 一个进货单可以对应多个进货结算单,通过purchase_bill_no关联,如以下数据:
    • 一个进货结算单可以对应多个进货单,通过purchase_bill_no关联,settlement_voucher表中的purchase_bill_no字段存放多个进货单号,使用英文逗号隔开。如以下数据:
  • 发票表(invoice)和结算单表(settlement_voucher)有一个关联关系表(settlement_invoice_relation)
    • 发票表和关联关系表使用invoice_codeinvoice_number关联
    • 结算单表和关联关系表使用settlement_bill_no关联
    • 发票和结算单是多对多的关系

现在需要以进货结算单表(settlement_voucher)查询出一个列表:

  • 列表字段有【进货时间(多个使用英文逗号隔开)、进货平均单价、进货预定总金额,结算单号,结算平均单价,结算金额,结算时间,发票号码(多个使用英文逗号隔开),发票代码(多个使用英文逗号隔开)】
  • 查询条件有:进货时间(一个进货结算单对应多个进货单时,只要有一个进货单的时间在范围内,就查询到),结算时间,发票号码(一个结算单对应多个发票时,只要有一个发票能关联上,就查询到)
  • 根据结算时间排序

当然,实际当时的那个需求,列表字段比这多,查询条件也比这多......

3. 给表插入数据

先给货单表(purchase_order)和进货结算单表(settlement_voucher)各自插入10万条数据,我这里使用了存储过程:

begindeclare i int; declare purchase_weight decimal(10,2);declare unit_price decimal(10,2);declare purchase_bill_no varchar(255);declare settlement_bill_no varchar(255);set i=0;while i<100000 do select ROUND(RAND()*100,2) into purchase_weight from dual; select ROUND(RAND()*10,2) into unit_price from dual; select CONCAT('purchase-',LPAD(i,8,'0')) into purchase_bill_no from dual; select CONCAT('settlement-',LPAD(i,8,'0')) into settlement_bill_no from dual; -- 插入进货单表,进货时间随机生成 insert into purchase_order(purchase_time,purchase_pre_unit_price,purchase_weight,purchase_bill_no) select (DATE_ADD(NOW(), INTERVAL FLOOR(1 - (RAND() * 864000)) SECOND )), unit_price,purchase_weight,purchase_bill_no from dual; -- 插入结算单表,结算时间随机生成 insert into settlement_voucher(purchase_bill_no,settlement_bill_no,unit_price,settlement_weight,cut_off_time) select purchase_bill_no,settlement_bill_no,unit_price,purchase_weight, (DATE_ADD(NOW(), INTERVAL FLOOR(1 - (RAND() * 864000)) SECOND )) from dual;set i=i+1; end while;end

调用存储过程生成数据:

call pre();

生成之后需要随机改几条数据,模拟一个进货单可以对应多个进货结算单,以及一个进货结算单可以对应多个进货单两种情况(这样数据更真实一点)。

一个进货单可以对应多个进货结算单的情况就不模拟了,这种情况其实对这次查询的影响并不大。

一个进货结算单可以对应多个进货单的情况:

2513991-20220723104128228-970222987.png

再创建一些发票数据和结算单-发票关联数据,需要体现多对多的关系:

insert into invoice(invoice_code,invoice_number,pay_amount)VALUES('111111','1111100','1000'),('111112','1111101','1001'),('111113','1111102','1002'),('111114','1111103','1003'),('111115','1111104','1004'),('111116','1111105','1005'),('111117','1111106','1006'),('111118','1111107','1007'),('111119','1111108','1008'),('111110','1111109','1009'); INSERT into settlement_invoice_relation(invoice_code,invoice_number,settlement_bill_no)VALUES('111111','1111100','settlement-00000000'),('111112','1111101','settlement-00000000'),('111113','1111102','settlement-00000000'),('111114','1111103','settlement-00000004'),('111114','1111103','settlement-00000006'),('111114','1111103','settlement-00000030'),('111116','1111105','settlement-00000041'),('111117','1111106','settlement-00000041'),('111118','1111107','settlement-00000043');

4. 开始根据需求写SQL

优化第一步,当然是想让产品经理去掉一些查询条件,避免进货单表和进货结算表关联了,但是你懂的。。。。。。

这里就以进货时间为条件查询为例(因为主要就是进货单和进货结算单关联导致慢查询),记得需求哦,就是一个进货结算单可能对应多个进货单,只要有其中一个进货单在时间范围内,就需要查询出这条进货结算单

还有:我上面创建的表中索引也模拟了当时优化之前的索引......

4.1 第一版

select GROUP_CONCAT(po.purchase_time) as 进货时间, AVG(IFNULL(po.purchase_pre_unit_price,0)) as 进货均价, t.settlement_bill_no as 结算单号, AVG(IFNULL(t.unit_price,0)) as 结算均价, any_value(t.cut_off_time) as 结算时间, any_value(invoice_tmp.invoice_code) as 发票代码, any_value(invoice_tmp.invoice_number) as 发票号码from settlement_voucher tleft join purchase_order po on FIND_IN_SET(po.purchase_bill_no,t.purchase_bill_no)>0left join ( select sir.settlement_bill_no, GROUP_CONCAT(i.invoice_number) invoice_number, GROUP_CONCAT(i.invoice_code) invoice_code from settlement_invoice_relation sir, invoice i where sir.invoice_code = i.invoice_code and sir.invoice_number = i.invoice_number group by sir.settlement_bill_no) invoice_tmp on invoice_tmp.settlement_bill_no = t.settlement_bill_nowhere 1=1 -- and t.settlement_bill_no='settlement-00000000'and EXISTS(select 1 from purchase_order po1 where FIND_IN_SET(po1.purchase_bill_no,t.purchase_bill_no)>0 and po1.purchase_time >='2022-07-01 00:00:00' )and EXISTS(select 1 from purchase_order po1 where FIND_IN_SET(po1.purchase_bill_no,t.purchase_bill_no)>0 and po1.purchase_time <='2022-07-23 23:59:59' )group by t.settlement_bill_no;

第一版SQL当时在本地环境执行是用了5秒左右,此时就已经意识到问题了,这别说上生产了,就是在测试环境都得挂掉。

但是看看我在自己的垃圾服务器(双核4G)上跑这条SQL吧,是根本执行不出来的(虽然公司服务器好一些,但是生产环境确实卡死了):

2513991-20220723104227441-1410502133.png

当时就还没没看执行计划,一眼看去,这个SQL中用到了FIND_IN_SET,肯定是不会走索引的,建了索引也没用,也就是主要是进货单表(purchase_order)和进货结算单表settlement_voucher关联会很慢,毕竟他们是多对多的关系,再加上这恶心的需求。所以现在想想该怎么才能不用 FIND_IN_SET

对,吃饭期间,突发奇想:我应该可以把进货结算单表拆成一个临时表,如果进货结算单表对应了5个进货单,我就把进货结算单拆成5条数据,这五条数据除了进货单号不一样,其他字段都 一样,这样就可以不用FIND_IN_SET了。

说干就干,于是有了下面第二版SQL。

4.2 第二版

向把进货结算单表拆分成上面说的临时表,需要添加一个表:

CREATE TABLE `incre_table` ( `id` int(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='用于分割进货结算单表';-- 注意:这里一个进货结算单对应多少个进货单,这里就要依次插入多少条数据,我这里10条 就够用了insert into incre_table(id) VALUES(1);insert into incre_table(id) VALUES(2);insert into incre_table(id) VALUES(3);insert into incre_table(id) VALUES(4);insert into incre_table(id) VALUES(5);insert into incre_table(id) VALUES(6);insert into incre_table(id) VALUES(7);insert into incre_table(id) VALUES(8);insert into incre_table(id) VALUES(9);insert into incre_table(id) VALUES(10);

先来看看怎么把一条进货结算单数据拆分成多条:

select sv.cut_off_time, sv.settlement_bill_no, sv.unit_price, sv.settlement_weight, SUBSTRING_INDEX(SUBSTRING_INDEX(sv.purchase_bill_no,',',it.id),',',-1) purchase_bill_nofrom settlement_voucher svRIGHT JOIN incre_table it on it.id<=(LENGTH(sv.purchase_bill_no) - LENGTH(REPLACE(sv.purchase_bill_no,',','')) + 1)where sv.settlement_bill_no='settlement-00000000';

来解释一下这个骚操作:

  • 首先我创建了一个只有id的表incre_table,插入了十条数据,并且这十条数据必须是1-10。
  • 然后我使用settlement_voucher 右连接了 incre_table,并且只取incre_table中id小于或等于进货单数量的数据。这样就控制了这条SQL应该查询多少条数据(就刚好是一个进货结算单对应的进货单条数)。
  • 然后使用SUBSTRING_INDEX去一个一个拆分settlement_voucher表中的进货单号

这套SQL执行的结果就是:

2513991-20220723104309865-343357940.png

综合起来,就写好了第二版SQL:

select GROUP_CONCAT(po.purchase_time) as 进货时间, AVG(IFNULL(po.purchase_pre_unit_price,0)) as 进货均价, t.settlement_bill_no as 结算单号, AVG(IFNULL(t.unit_price,0)) as 结算均价, any_value(t.cut_off_time) as 结算时间, any_value(invoice_tmp.invoice_code) as 发票代码, any_value(invoice_tmp.invoice_number) as 发票号码from ( select sv.cut_off_time, sv.settlement_bill_no, sv.unit_price, sv.settlement_weight, SUBSTRING_INDEX(SUBSTRING_INDEX(sv.purchase_bill_no,',',it.id),',',-1) purchase_bill_no from settlement_voucher sv RIGHT JOIN incre_table it on it.id<=(LENGTH(sv.purchase_bill_no) - LENGTH(REPLACE(sv.purchase_bill_no,',','')) + 1)) tleft join purchase_order po on po.purchase_bill_no = t.purchase_bill_noleft join ( select sir.settlement_bill_no, GROUP_CONCAT(i.invoice_number) invoice_number, GROUP_CONCAT(i.invoice_code) invoice_code from settlement_invoice_relation sir, invoice i where sir.invoice_code = i.invoice_code and sir.invoice_number = i.invoice_number group by sir.settlement_bill_no) invoice_tmp on invoice_tmp.settlement_bill_no = t.settlement_bill_nowhere 1=1 -- and t.settlement_bill_no='settlement-00000000'and po.purchase_time >='2022-07-01 00:00:00' and po.purchase_time <='2022-07-23 23:59:59' group by t.settlement_bill_no;

测试查询数据结果肯定是没有问题的哦!!!

好的,到这里终于把所有用到FIND_IN_SET的地方去掉了,这时看索引就有意义了!

看看执行计划吧:

2513991-20220723104352253-1730836226.png

阿西巴,一堆的全表扫描,看看上面第二版SQL,发现进货表(purchase_order)的purchase_bill_no字段是应该走索引的,按道理这个字段一般设计表的时候就应该已经加索引了,但是我以为只是我以为,它确实没加索引,好的,那就给它加上索引吧:

create index idx_purchase_bill_no on purchase_order(purchase_bill_no);

加完是这个索引后,再看看执行计划:

purchase_order表的purchase_bill_no已经走了索引,但是settlement_invoice_relation咋不走索引,它是有两个索引的。。。。。。

2513991-20220723104420345-179626094.png

再看看在我的垃圾服务器上执行,看能不能执行出来:

2513991-20220723104445751-724064939.png

好了,为了让settlement_invoice_relation表的查询也走索引,开始下一轮的SQL优化

4.3 第三版

就不在下面去聚合获取invoice_codeinvoice_number了,在上面来聚合,至于要以这两个字段作为查询条件,那可以把下面这条SQL再包一层,作为一个临时表再查询一遍,这里就不演示了

select GROUP_CONCAT(po.purchase_time) as 进货时间, AVG(IFNULL(po.purchase_pre_unit_price,0)) as 进货均价, t.settlement_bill_no as 结算单号, AVG(IFNULL(t.unit_price,0)) as 结算均价, any_value(t.cut_off_time) as 结算时间, GROUP_CONCAT(DISTINCT invoice_tmp.invoice_code) as 发票代码, GROUP_CONCAT(DISTINCT invoice_tmp.invoice_number) as 发票号码from ( select sv.cut_off_time, sv.settlement_bill_no, sv.unit_price, sv.settlement_weight, SUBSTRING_INDEX(SUBSTRING_INDEX(sv.purchase_bill_no,',',it.id),',',-1) purchase_bill_no from settlement_voucher sv RIGHT JOIN incre_table it on it.id<=(LENGTH(sv.purchase_bill_no) - LENGTH(REPLACE(sv.purchase_bill_no,',','')) + 1)) tleft join purchase_order po on po.purchase_bill_no = t.purchase_bill_noleft join ( select sir.settlement_bill_no, i.invoice_number, i.invoice_code from settlement_invoice_relation sir, invoice i where sir.invoice_code = i.invoice_code and sir.invoice_number = i.invoice_number ) invoice_tmp on invoice_tmp.settlement_bill_no = t.settlement_bill_nowhere 1=1 -- and t.settlement_bill_no='settlement-00000000'and po.purchase_time >='2022-07-01 00:00:00' and po.purchase_time <='2022-07-23 23:59:59' group by t.settlement_bill_no;

再看看执行计划:

2513991-20220723104514624-1930604491.png

这时,基本优化结束,再看看在我的垃圾服务器上跑出的结果:

2513991-20220723104533231-1447338649.png

到这里,基本上生产上是可以在三秒以内查询出来了,本次SQL优化就到此结束了!!!

但是,其实还是可以继续优化的,但是设计到系统改的地方比较多了,影响面比较大,这里就说一下思路,暂时不能实践:

可以把进货单表purchase_order和进货结算单表settlement_voucher之间,建立一个中间表,实现多对多的关系,再加以索引,应该会更快,而且可以一劳永逸,以后这种关联都会比较方便了!


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK