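A Brief Look at MySQL Subqueries and Their Optimization
DBAs and developers who have worked with Oracle or other relational databases tend to assume that the database already optimizes subqueries well and picks a sensible driving table, and they carry that assumption over to MySQL. Unfortunately, MySQL's handling of subqueries may disappoint you. We have run into several such cases on our production systems, for example: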
SELECT i_id,
sum(i_sell) AS i_sell
FROM table_data
WHERE i_id IN
(SELECT i_id
FROM table_data
WHERE Gmt_create >= '2011-10-07 00:00:00')
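GROUP BY i_id;
(Note: as an analogy for the business logic, first find the 100 books newly sold on 10-07, then look up the full-year sales of those 100 books.)
This SQL performs badly because of a weakness in how the MySQL optimizer handles subqueries: it rewrites them. Normally we expect evaluation from the inside out, that is, compute the subquery result first and then use it to drive the outer table. MySQL instead scans every row of the outer table first and passes each row into the subquery for correlation, so performance suffers when the outer table is large.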
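To make that rewrite concrete: older MySQL versions (before the semi-join and materialization optimizations introduced in 5.6) typically evaluate an IN subquery like the one above roughly as if it were the correlated EXISTS form below. This is an illustrative sketch of the effective shape, not the optimizer's literal output; the aliases o and s are added here for illustration only.

```sql
-- Roughly how the IN subquery above is treated: for every row of the
-- outer table_data (alias o), the inner table_data (alias s) is probed.
SELECT o.i_id,
       SUM(o.i_sell) AS i_sell
FROM table_data o
WHERE EXISTS
    (SELECT 1
     FROM table_data s
     WHERE s.gmt_create >= '2011-10-07 00:00:00'
       AND s.i_id = o.i_id)   -- correlation pushed into the subquery
GROUP BY o.i_id;
```

Because the inner query now depends on the current outer row, it is re-evaluated per outer row, which is what EXPLAIN labels DEPENDENT SUBQUERY.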
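For the query above, table_data holds about 700,000 rows, and the subquery returns many rows, a large share of them duplicates, so nearly 700,000 correlated lookups are needed. With that many lookups the SQL ran for several hours without finishing, so we need to rewrite it: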
SELECT t2.i_id,
SUM(t2.i_sell) AS sold
FROM
(SELECT DISTINCT i_id
FROM table_data
WHERE gmt_create >= '2011-10-07 00:00:00') t1,
table_data t2
WHERE t1.i_id = t2.i_id
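GROUP BY t2.i_id;
We turned the subquery into a join and added DISTINCT inside it to reduce the number of times t1 is joined to t2. After the rewrite, the execution time dropped to under 100 ms.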
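One way to check which form the optimizer picks is to compare the two plans with EXPLAIN. The sketch below uses explicit JOIN syntax for the rewritten query; the two indexes at the end and their names (idx_gmt_create, idx_i_id) are assumptions for illustration, since the original does not say which indexes table_data has.

```sql
-- Original form: on older MySQL versions the inner SELECT is expected to
-- appear with select_type = DEPENDENT SUBQUERY.
EXPLAIN
SELECT i_id, SUM(i_sell) AS i_sell
FROM table_data
WHERE i_id IN (SELECT i_id
               FROM table_data
               WHERE gmt_create >= '2011-10-07 00:00:00')
GROUP BY i_id;

-- Rewritten form: the inner SELECT becomes a derived table (DERIVED)
-- that is computed once and then joined to t2.
EXPLAIN
SELECT t2.i_id, SUM(t2.i_sell) AS sold
FROM (SELECT DISTINCT i_id
      FROM table_data
      WHERE gmt_create >= '2011-10-07 00:00:00') t1
JOIN table_data t2 ON t1.i_id = t2.i_id
GROUP BY t2.i_id;

-- Hypothetical supporting indexes (names and existence are assumptions):
-- one for the date filter, one for the join column.
ALTER TABLE table_data
  ADD INDEX idx_gmt_create (gmt_create),
  ADD INDEX idx_i_id (i_id);
```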
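MySQL's subquery optimization has never been very friendly and has long drawn criticism in the industry; it is also one of the problems I run into most often in SQL tuning. When MySQL processes a subquery it rewrites it. Normally we expect evaluation from the inside out, that is, the subquery result is computed first and then drives the outer table, but the opposite happens: the subquery is not executed first. Today I hope a real case will deepen the understanding of MySQL subqueries. Below is one complete case, with its analysis and the tuning process and reasoning.
1. Case:
Users reported that the database was responding slowly and that many business updates were stuck. Logging in to the database showed a long-running SQL statement:
| 10437 | usr0321t9m9 | 10.242.232.50:51201 | oms | Execute | 1179 | Sending
The SQL was: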
SELECT tradedto0.* FROM a1 tradedto0 WHERE tradedto0.tradestatus='1' AND (tradedto0.tradeoid IN (SELECT orderdto1.tradeoid FROM a2 orderdto1 WHERE orderdto1.proname LIKE '%??%' OR orderdto1.procode LIKE '%??%')) AND tradedto0.undefine4='1' AND tradedto0.invoicetype='1' AND tradedto0.tradestep='0' AND (tradedto0.orderCompany LIKE '0002%') ORDER BY tradedto0.tradesign ASC, tradedto0.makertime DESC LIMIT 15;
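2. Symptom: other updates were blocked
UPDATE a1 SET tradesign='DAB67634-795C-4EAC-B4A0-78F0D531D62F', markColor=' #CD5555', memotime='2012-09-22', markPerson='??' WHERE tradeoid IN ('gy2012092204495100032') ;
To restore the application as quickly as possible, we killed the long-running SQL, after which the application returned to normal.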
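The find-and-kill step can be sketched with standard MySQL statements. The thread id 10437 is taken from the process list row shown earlier; whether the original incident used KILL or KILL QUERY is not stated, so both are shown.

```sql
-- List running statements with their full text; the Time column shows
-- how many seconds each thread has been in its current state.
SHOW FULL PROCESSLIST;

-- Terminate only the statement running in thread 10437,
-- keeping the client connection open.
KILL QUERY 10437;

-- Or terminate the whole connection.
KILL 10437;
```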
3. Analyze the execution plan:
db@3306 :explain SELECT tradedto0.* FROM a1 tradedto0 WHERE tradedto0.tradestatus='1' AND (tradedto0.tradeoid IN (SELECT orderdto1.tradeoid FROM a2 orderdto1 WHERE orderdto1.proname LIKE '%??%' OR orderdto1.procode LIKE '%??%')) AND tradedto0.undefine4='1' AND tradedto0.invoicetype='1' AND tradedto0.tradestep='0' AND (tradedto0.orderCompany LIKE '0002%') ORDER BY tradedto0.tradesign ASC, tradedto0.makertime DESC LIMIT 15;
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
| id | select_type        | table      | type | possible_keys | key  | key_len | ref  | rows  | Extra                       |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
| 1  | PRIMARY            | tradedto0  | ALL  | NULL          | NULL | NULL    | NULL | 27454 | Using where; Using filesort |
| 2  | DEPENDENT SUBQUERY | orderdto1_ | ALL  | NULL          | NULL | NULL    | NULL | 40998 | Using where                 |
+----+--------------------+------------+------+---------------+------+---------+------+-------+-----------------------------+
Starting from this execution plan, we optimize step by step:
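First, look at the second row of the plan, the subquery part: orderdto1_ is scanned in full. Let us see whether a suitable index can be added.
A. Use a covering index:
db@3306:alter table a2 add index ind_a2(proname,procode,tradeoid);
ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes
Adding the composite index fails because it exceeds the maximum key length.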
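A rough calculation shows why the limit is hit. The sketch assumes a 3-byte-per-character charset such as utf8, which the original does not state, and uses the column types that DESC a2 reports in the next step (PRONAME VARCHAR(1000), PROCODE VARCHAR(50), TRADEOID VARCHAR(50)). Per-column overhead is ignored in the first two statements.

```sql
-- Assumption: 3 bytes per character (utf8).
-- Attempted index on (proname, procode, tradeoid):
SELECT 1000 * 3 + 50 * 3 + 50 * 3 AS attempted_key_bytes;   -- 3300, far above 1000

-- Largest proname length N that would still fit:
-- N * 3 + 50 * 3 + 50 * 3 <= 1000
SELECT FLOOR((1000 - 50 * 3 - 50 * 3) / 3) AS max_proname_chars;   -- 233

-- Cross-check of the assumption: with proname at VARCHAR(156), and
-- 2 length bytes plus 1 NULL-flag byte per nullable VARCHAR column,
-- the total matches the key_len of 777 that EXPLAIN reports later.
SELECT (156 * 3 + 2 + 1) + (50 * 3 + 2 + 1) + (50 * 3 + 2 + 1) AS key_len;   -- 777
```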
B. Check the table's column definitions:
db@3306 :DESC a2 ;
+---------------------+---------------+------+-----+---------+-------+
| FIELD               | TYPE          | NULL | KEY | DEFAULT | Extra |
+---------------------+---------------+------+-----+---------+-------+
| OID                 | VARCHAR(50)   | NO   | PRI | NULL    |       |
| TRADEOID            | VARCHAR(50)   | YES  |     | NULL    |       |
| PROCODE             | VARCHAR(50)   | YES  |     | NULL    |       |
| PRONAME             | VARCHAR(1000) | YES  |     | NULL    |       |
| SPCTNCODE           | VARCHAR(200)  | YES  |     | NULL    |       |
C. Check the maximum and average length of the column's data:
db@3306 :SELECT MAX(LENGTH(PRONAME)),avg(LENGTH(PRONAME)) FROM a2;
+----------------------+----------------------+
| MAX(LENGTH(PRONAME)) | avg(LENGTH(PRONAME)) |
+----------------------+----------------------+
| 95                   | 24.5588              |
+----------------------+----------------------+
D. Shrink the column length:
ALTER TABLE a2 MODIFY COLUMN PRONAME VARCHAR(156);
With the column shortened, the composite index from step A can be created (the plan below shows ind_a2 in use). Run the execution plan analysis again:
db@3306 :explain SELECT tradedto0.* FROM a1 tradedto0 WHERE tradedto0.tradestatus='1' AND (tradedto0.tradeoid IN (SELECT orderdto1.tradeoid FROM a2 orderdto1 WHERE orderdto1.proname LIKE '%??%' OR orderdto1.procode LIKE '%??%')) AND tradedto0.undefine4='1' AND tradedto0.invoicetype='1' AND tradedto0.tradestep='0' AND (tradedto0.orderCompany LIKE '0002%') ORDER BY tradedto0.tradesign ASC, tradedto0.makertime DESC LIMIT 15;
+----+--------------------+-----------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
| id | select_type        | table     | type  | possible_keys   | key             | key_len | ref                     | rows  | Extra                       |
+----+--------------------+-----------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
| 1  | PRIMARY            | tradedto0 | ref   | ind_tradestatus | ind_tradestatus | 345     | const,const,const,const | 8962  | Using where; Using filesort |
| 2  | DEPENDENT SUBQUERY | orderdto1 | index | NULL            | ind_a2          | 777     | NULL                    | 41005 | Using where; Using index    |
+----+--------------------+-----------+-------+-----------------+-----------------+---------+-------------------------+-------+-----------------------------+
Performance still does not improve. The key point is that the number of rows to scan in the two tables is still large (8962 * 41005), and the subquery is still a DEPENDENT SUBQUERY, so the index added above has little effect. Now look at the result of the subquery on its own:
db@3306 : SELECT orderdto1.tradeoid FROM a2 orderdto1 WHERE orderdto1.proname LIKE '%??%' OR orderdto1.procode LIKE '%??%';
Empty set (0.05 sec)
The result set is empty, so the subquery's result set should be used as the driving table.
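4. Rewrite the subquery:
The tests above confirm that the plain subquery form performs very poorly in MySQL. This is an inherent weakness of MySQL subqueries, and the SQL needs to be rewritten as a join: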
SELECT tradedto0_.* FROM a1 tradedto0_ , (SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2 WHERE tradedto0_.tradestatus='1' AND (tradedto0_.tradeoid=t2.tradeoid) AND tradedto0_.undefine4='1' AND tradedto0_.invoicetype='1' AND tradedto0_.tradestep='0' AND (tradedto0_.orderCompany LIKE '0002%') ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
5. Check the execution plan:
db@3306 :explain SELECT tradedto0.* FROM a1 tradedto0 , (SELECT orderdto1.tradeoid FROM a2 orderdto1 WHERE orderdto1.proname LIKE '%??%' OR orderdto1.procode LIKE '%??%')t2 WHERE tradedto0.tradestatus='1' AND (tradedto0.tradeoid=t2.tradeoid) AND tradedto0.undefine4='1' AND tradedto0.invoicetype='1' AND tradedto0.tradestep='0' AND (tradedto0.orderCompany LIKE '0002%') ORDER BY tradedto0.tradesign ASC, tradedto0.makertime DESC LIMIT 15;
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
| id | select_type | table     | type  | possible_keys | key    | key_len | ref  | rows  | Extra                                               |
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
| 1  | PRIMARY     | NULL      | NULL  | NULL          | NULL   | NULL    | NULL | NULL  | Impossible WHERE noticed after reading const tables |
| 2  | DERIVED     | orderdto1 | index | NULL          | ind_a2 | 777     | NULL | 41005 | Using where; Using index                            |
+----+-------------+-----------+-------+---------------+--------+---------+------+-------+-----------------------------------------------------+
6. Execution time:
db@3306 : SELECT tradedto0.* FROM a1 tradedto0 , (SELECT orderdto1.tradeoid FROM a2 orderdto1 WHERE orderdto1.proname LIKE '%??%' OR orderdto1.procode LIKE '%??%')t2 WHERE tradedto0.tradestatus='1' AND (tradedto0.tradeoid=t2.tradeoid) AND tradedto0.undefine4='1' AND tradedto0.invoicetype='1' AND tradedto0.tradestep='0' AND (tradedto0.orderCompany LIKE '0002%') ORDER BY tradedto0.tradesign ASC, tradedto0.makertime DESC LIMIT 15;
Empty set (0.03 sec)
The execution time dropped to the millisecond range.
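One caveat about the join rewrite: IN never duplicates outer rows, whereas a join against a derived table returns one outer row per matching tradeoid. If a2 can hold several matching rows with the same tradeoid, adding DISTINCT inside the derived table, as in the first example of this article, keeps the result equivalent to the IN form. A sketch under that assumption:

```sql
SELECT tradedto0.*
FROM a1 tradedto0
JOIN (SELECT DISTINCT orderdto1.tradeoid      -- one row per tradeoid
      FROM a2 orderdto1
      WHERE orderdto1.proname LIKE '%??%'
         OR orderdto1.procode LIKE '%??%') t2
  ON tradedto0.tradeoid = t2.tradeoid
WHERE tradedto0.tradestatus = '1'
  AND tradedto0.undefine4 = '1'
  AND tradedto0.invoicetype = '1'
  AND tradedto0.tradestep = '0'
  AND tradedto0.orderCompany LIKE '0002%'
ORDER BY tradedto0.tradesign ASC, tradedto0.makertime DESC
LIMIT 15;
```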
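7. Summary:
1. MySQL subqueries have an obvious weakness in their execution plans, so subqueries need to be rewritten.
See also:
a. MySQL subqueries encountered in a production database: http://hidba.org/?p=412
b. The builtin InnoDB, subqueries blocking updates: http://hidba.org/?p=456
2. In table design, do not casually declare large varchar(N) columns; they can make it impossible to create the index you need.
See also:
a. JDBC memory management, the impact of varchar2(4000): http://hidba.org/?p=31
b. Limits on large columns in InnoDB: http://hidba.org/?p=144
c. Some optimization suggestions for large text and blob columns in InnoDB: http://hidba.org/?p=551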
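When a wide column cannot be shortened, a prefix index is one possible alternative to the column change made in step D. The sketch below is hypothetical: the index name ind_a2_prefix and the prefix length 100 are illustrative choices, picked only because the measured maximum length of PRONAME was 95.

```sql
-- Hypothetical alternative: index only the first 100 characters of proname,
-- which keeps the key within the 1000-byte limit without altering the column.
ALTER TABLE a2 ADD INDEX ind_a2_prefix (proname(100), procode, tradeoid);
```

A trade-off to keep in mind: an index on a column prefix cannot serve as a covering index for that column, so the plan would not show Using index for a query that reads proname, and a LIKE pattern with a leading % cannot use an index range scan in either case.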
8. References:
[1] MySQL subqueries encountered in a production database http://hidba.org/?p=412
[2] A brief look at MySQL subqueries http://hidba.org/?p=624
[3] The weakness of MySQL subqueries http://hidba.org/?p=260
Source: http://hidba.org/?p=624