问题背景:下午收到慢查询导致超时告警,用pt-query-digest 获取到相应慢sql,如下。
select count(1) from (
select mdos.sn_code,count(1) from mps_device_order_static mdos
inner join business b on b.sid=mdos.sid
inner join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' and mdos.sn_code<>'null' and 1=1 AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30'
group by mdos.sn_code) tab;
分析过程:
查看执行计划 | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
1 | PRIMARY | NULL | ALL | NULL | NULL | NULL | NULL | 58943 | 100.00 | NULL | |
2 | DERIVED | mdos | NULL | ALL | idx_static_date,idx_sn_code,idx_sid,idx_static_date_sn_code_total_money_total_count | NULL | NULL | NULL | 2064111 | 28.56 | Using where; Using temporary; Using filesort |
2 | DERIVED | qd | NULL | eq_ref | ux_sn_code,ix_business_guid | ux_sn_code | 202 | management.mdos.sn_code | 1 | 100.00 | NULL |
2 | DERIVED | b | NULL | eq_ref | PRIMARY,ix_sid_int | PRIMARY | 144 | management.qd.business_guid | 1 | 10.00 | Using where |
4 rows in set, 1 warning (0.00 sec)
发现驱动表mdos 即mps_device_order_static 走全表扫描
各表索引情况如下:
mps_device_order_static 表索引
PRIMARY KEY (id
),
KEY idx_static_date
(static_date
),
KEY idx_agent_id
(agent_id
),
KEY idx_sn_code
(sn_code
),
KEY idx_sid
(sid
),
KEY idx_static_date_sn_code_total_money_total_count
(static_date
,sn_code
,total_money
,total_count
)
)
business 表索引
PRIMARY KEY (guid
),
UNIQUE KEY ix_account
(account
),
UNIQUE KEY ix_sid_int
(sid
),
KEY ix_agent_guid
(agent_guid
),
KEY ix_open_id
(open_id
),
KEY ix_vip_guid
(vip_guid
),
KEY ix_vip_account
(vip_account
),
KEY ix_vpi_cloud_open_id
(vip_cloud_open_id
),
KEY ix_sub_channel_number
(sub_channel_number
),
KEY ix_register_time
(register_time
) USING BTREE,
KEY ix_wechat_appid
(wechat_appid
),
KEY idx_tong_cheng_hui_guid
(tong_cheng_hui_guid
)
)
qdsp_device 表索引
PRIMARY KEY (id
),
UNIQUE KEY ux_sn_code
(sn_code
),
KEY ix_agent_guid
(agent_guid
),
KEY ix_business_guid
(business_guid
) USING BTREE,
KEY idx_chain_store_guid_sn_code
(chain_store_guid
,sn_code
),
KEY idx_terminal_number
(terminal_number
),
KEY idx_storehouse_guid
(storehouse_guid
),
KEY idx_device_status
(device_status
)
)
查看了sql各连接条件,和where语句中条件,确认所有条件字段都是有索引的。所以应该是什么原因导致了索引失效
尝试将语句进行拆解,一步一步分析
首先试试驱动表mdos单表执行计划
explain select mdos.sn_code,count(*) from mps_device_order_static mdos
#inner join business b on b.sid=mdos.sid
#inner join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' and mdos.sn_code<>'null' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30'
group by mdos.sn_code;
+----+-------------+-------+------------+-------+-----------------------------------------------------------------------------+-------------------------------------------------+---------+------+---------+----------+-----------------------------------------------------------+
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | mdos | NULL | range | idx_static_date,idx_sn_code,idx_static_date_sn_code_total_money_total_count | idx_static_date_sn_code_total_money_total_count | 3 | NULL | 1032055 | 50.00 | Using where; Using index; Using temporary; Using filesort |
1 row in set, 1 warning (0.00 sec)
发现正常走索引
接着试
explain select mdos.sn_code,count(*) from mps_device_order_static mdos
#inner join business b on b.sid=mdos.sid
inner join qdsp_device qd on qd.sn_code=mdos.sn_code #and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' and mdos.sn_code<>'null' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30'
group by mdos.sn_code;
+----+-------------+-------+------------+--------+-----------------------------------------------------------------------------+-------------------------------------------------+---------+-------------------------+---------+----------+-----------------------------------------------------------+
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | mdos | NULL | range | idx_static_date,idx_sn_code,idx_static_date_sn_code_total_money_total_count | idx_static_date_sn_code_total_money_total_count | 3 | NULL | 1032055 | 50.00 | Using where; Using index; Using temporary; Using filesort |
1 | SIMPLE | qd | NULL | eq_ref | ux_sn_code | ux_sn_code | 202 | management.mdos.sn_code | 1 | 100.00 | Using index |
也能正常走索引
接着试
explain select mdos.sn_code,count(*) from mps_device_order_static mdos
inner join business b on b.sid=mdos.sid
#inner join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' and mdos.sn_code<>'null' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30'
group by mdos.sn_code;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | mdos | NULL | ALL | idx_static_date,idx_sn_code,idx_sid,idx_static_date_sn_code_total_money_total_count | NULL | NULL | NULL | 2064111 | 14.28 | Using where; Using temporary; Using filesort |
1 | SIMPLE | b | NULL | eq_ref | ix_sid_int | ix_sid_int | 4 | management.mdos.sid | 1 | 100.00 | Using where; Using index |
2 rows in set, 1 warning (0.01 sec)
发现走全表索引失效,走全表扫描。这时感觉应该是和business 表关联这里有问题,但还是没找到头绪
改写下sql试试
explain select count(distinct mdos.sn_code) from mps_device_order_static mdos
join business b on b.sid=mdos.sid
join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30';
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | mdos | NULL | ALL | idx_static_date,idx_sn_code,idx_sid,idx_static_date_sn_code_total_money_total_count | NULL | NULL | NULL | 2064111 | 14.28 | Using where |
1 | SIMPLE | qd | NULL | eq_ref | ux_sn_code | ux_sn_code | 202 | management.mdos.sn_code | 1 | 100.00 | Using index |
1 | SIMPLE | b | NULL | eq_ref | ix_sid_int | ix_sid_int | 4 | management.mdos.sid | 1 | 100.00 | Using where; Using index |
3 rows in set, 1 warning (0.00 sec)
还是不行
接着试试走强制索引
explain select count(distinct mdos.sn_code) from mps_device_order_static mdos force index(idx_static_date)
join business b on b.sid=mdos.sid
join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30';
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | mdos | NULL | range | idx_static_date | idx_static_date | 3 | NULL | 589440 | 90.00 | Using index condition; Using where |
1 | SIMPLE | qd | NULL | eq_ref | ux_sn_code,ix_business_guid | ux_sn_code | 202 | management.mdos.sn_code | 1 | 100.00 | NULL |
1 | SIMPLE | b | NULL | eq_ref | PRIMARY,ix_sid_int | PRIMARY | 144 | management.qd.business_guid | 1 | 10.00 | Using where |
3 rows in set, 1 warning (0.00 sec)
这时发现能走索引,速度快了很多。但是没找到原因
后来咨询了一位大佬,根据他的猜测可能是表数据量比较大,同时数据分布比较分散,所以导致索引失效走全表扫描。根据他的猜测,缩短了过滤时间,将mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30' 缩短到一周试试。
explain select count(distinct mdos.sn_code) from mps_device_order_static mdos force index(idx_static_date)
join business b on b.sid=mdos.sid
join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-07';
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | mdos | NULL | range | idx_static_date | idx_static_date | 3 | NULL | 133714 | 90.00 | Using index condition; Using where |
1 | SIMPLE | qd | NULL | eq_ref | ux_sn_code,ix_business_guid | ux_sn_code | 202 | management.mdos.sn_code | 1 | 100.00 | NULL |
1 | SIMPLE | b | NULL | eq_ref | PRIMARY,ix_sid_int | PRIMARY | 144 | management.qd.business_guid | 1 | 10.00 | Using where |
3 rows in set, 1 warning (0.01 sec)
可以正常走索引了。
至此,得出两种解决办法
1、利用force index 强制走索引
select count(distinct mdos.sn_code) from mps_device_order_static mdos force index(idx_static_date)
join business b on b.sid=mdos.sid
join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-30';
2、缩短过滤时间,查询多次
select count(distinct mdos.sn_code) from mps_device_order_static mdos force index(idx_static_date)
join business b on b.sid=mdos.sid
join qdsp_device qd on qd.sn_code=mdos.sn_code and b.guid=qd.business_guid
where 1=1 and mdos.sn_code<>'' AND mdos.static_date>='2020-05-01' AND mdos.static_date<='2020-05-07';
第2种方法更好,更快