1. 查询优化器概述
1.1 优化器在查询处理中的位置
查询优化器是MySQL的"大脑",负责将解析后的SQL语句转换为最高效的执行计划。
1.2 优化器的核心价值
// 优化器的核心作用:选择最优执行路径 public class OptimizerCoreFunction { public void demonstrateOptimization() { String sql = "SELECT u.name, o.amount " + "FROM users u JOIN orders o ON u.id = o.user_id " + "WHERE u.country = 'US' AND o.amount > 100"; // 未经优化的可能执行方式(概念) ExecutionPlan naivePlan = new ExecutionPlan( "全表扫描users → 全表扫描orders → 嵌套循环连接 → 过滤条件" ); // 优化后的执行计划 ExecutionPlan optimizedPlan = new ExecutionPlan( "索引扫描users(country) → 索引扫描orders(user_id, amount) → 索引嵌套循环连接" ); System.out.println("性能提升: 10x faster"); } }
2. 基于成本的优化模型
2.1 成本模型基础
MySQL优化器使用基于成本的决策模型,估算不同执行计划的相对开销。
-- 查看优化器成本相关配置 SHOW VARIABLES LIKE 'optimizer_switch'; SHOW VARIABLES LIKE 'optimizer_cost_model'; -- 关键成本常数 SHOW VARIABLES LIKE '%cost%';
成本计算要素:
- IO成本:数据读取开销
- CPU成本:数据处理开销
- 内存成本:临时表、排序开销
- 网络成本:分布式查询开销
2.2 成本估算原理
// 成本估算模型(概念代码) public class CostEstimator { public double estimateTableScanCost(TableMetadata table) { // 全表扫描成本 = 页面数 * 单页IO成本 double pageCount = table.getPageCount(); double ioCostPerPage = getIOCostPerPage(); return pageCount * ioCostPerPage; } public double estimateIndexScanCost(IndexMetadata index, Selectivity selectivity) { // 索引扫描成本 = 索引遍历成本 + 数据查找成本 double indexTraversalCost = estimateIndexTraversal(index); double dataLookupCost = estimateDataLookup(index, selectivity); return indexTraversalCost + dataLookupCost; } public double estimateJoinCost(JoinType joinType, double leftCost, double rightCost, long leftRowCount, long rightRowCount) { switch (joinType) { case NESTED_LOOP: return leftCost + (leftRowCount * rightCost); case HASH_JOIN: return leftCost + rightCost + (leftRowCount + rightRowCount) * getHashCost(); case MERGE_JOIN: return leftCost + rightCost + getSortCost(leftRowCount) + getSortCost(rightRowCount); default: throw new IllegalArgumentException("Unknown join type: " + joinType); } } }
3. 查询重写与转换
3.1 子查询优化
优化器会将许多子查询转换为更高效的JOIN操作。
-- 原始子查询 SELECT name FROM users WHERE id IN (SELECT user_id FROM orders WHERE amount > 1000); -- 优化器可能重写为 SELECT u.name FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.amount > 1000); -- 或者进一步优化为半连接 SELECT u.name FROM users u SEMI JOIN orders o ON u.id = o.user_id AND o.amount > 1000;
3.2 条件化简
-- 复杂条件化简 SELECT * FROM products WHERE (price > 100 AND price < 200) OR (price > 150 AND price < 250); -- 优化器化简为 SELECT * FROM products WHERE price > 100 AND price < 250; -- 其他化简示例 SELECT * FROM users WHERE age >= 18 AND age <= 65 AND (country = 'US' OR country = 'UK' OR country = 'CA'); -- 可能重写为 SELECT * FROM users WHERE age BETWEEN 18 AND 65 AND country IN ('US', 'UK', 'CA');
3.3 视图合并
-- 创建视图 CREATE VIEW active_users AS SELECT id, name, email FROM users WHERE status = 'active'; -- 查询视图 SELECT au.name, o.amount FROM active_users au JOIN orders o ON au.id = o.user_id WHERE o.created_date > '2023-01-01'; -- 优化器进行视图合并 SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id WHERE u.status = 'active' AND o.created_date > '2023-01-01';
4. 执行计划选择策略
4.1 访问路径选择
优化器为每个表选择最佳的数据访问方法。
-- 示例表和索引 CREATE TABLE users ( id INT PRIMARY KEY, name VARCHAR(100), email VARCHAR(100), country VARCHAR(50), age INT, INDEX idx_country (country), INDEX idx_age (age), INDEX idx_country_age (country, age) ); -- 不同查询的访问路径选择 EXPLAIN SELECT * FROM users WHERE id = 100; -- 使用主键索引:const访问 EXPLAIN SELECT * FROM users WHERE country = 'US' AND age > 25; -- 可能使用复合索引 idx_country_age:ref/range访问 EXPLAIN SELECT * FROM users WHERE name LIKE 'John%'; -- 可能使用全表扫描或索引扫描
4.2 连接顺序优化
-- 多表连接查询 EXPLAIN SELECT u.name, o.amount, p.product_name FROM users u JOIN orders o ON u.id = o.user_id JOIN products p ON o.product_id = p.id WHERE u.country = 'US' AND o.amount > 100 AND p.category = 'Electronics'; -- 优化器会评估不同的连接顺序: -- 1. users → orders → products -- 2. users → products → orders -- 3. orders → users → products -- 4. orders → products → users -- 5. products → orders → users -- 6. products → users → orders
4.3 连接算法选择
MySQL优化器根据表大小、索引情况选择最佳连接算法。
// 连接算法选择逻辑(概念) public class JoinAlgorithmSelector { public JoinAlgorithm selectAlgorithm(JoinQuery query, Statistics stats) { Table left = query.getLeftTable(); Table right = query.getRightTable(); // 嵌套循环连接:小表驱动大表,有高效索引时 if (isSmallTable(left) && hasEfficientIndex(right, query.getJoinCondition())) { return JoinAlgorithm.NESTED_LOOP; } // 哈希连接:无合适索引,内存充足时(MySQL 8.0+) if (supportsHashJoin() && estimatedSize(left) + estimatedSize(right) < getAvailableMemory()) { return JoinAlgorithm.HASH_JOIN; } // 其他情况使用嵌套循环 return JoinAlgorithm.NESTED_LOOP; } }
5. 优化器特性与开关
5.1 优化器开关配置
MySQL提供细粒度的优化器控制开关。
-- 查看当前优化器开关配置 SHOW VARIABLES LIKE 'optimizer_switch'; -- 典型输出: -- index_merge=on,index_merge_union=on,index_merge_sort_union=on, -- index_merge_intersection=on,engine_condition_pushdown=on, -- index_condition_pushdown=on,mrr=on,mrr_cost_based=on, -- block_nested_loop=on,batched_key_access=off, -- materialization=on,semijoin=on,loosescan=on, -- firstmatch=on,duplicateweedout=on, -- subquery_materialization_cost_based=on, -- use_index_extensions=on,condition_fanout_filter=on, -- derived_merge=on,use_invisible_indexes=off, -- skip_scan=on,hash_join=on -- 临时修改优化器开关 SET SESSION optimizer_switch = 'index_merge=off,hash_join=on'; -- 永久修改(在my.cnf中) [mysqld] optimizer_switch = "index_merge=off,hash_join=on"
5.2 关键优化特性详解
5.2.1 索引合并
-- 索引合并示例 EXPLAIN SELECT * FROM users WHERE country = 'US' OR age > 30; -- 可能使用索引合并: -- Using union(idx_country,idx_age); Using where
5.2.2 索引条件下推
-- 索引条件下推(ICP) EXPLAIN SELECT * FROM users WHERE country = 'US' AND name LIKE 'A%'; -- 没有ICP:从索引读取所有country='US'的记录,然后在server层过滤name -- 有ICP:在存储引擎层直接过滤country='US' AND name LIKE 'A%'
5.2.3 派生表合并
-- 派生表合并 EXPLAIN SELECT * FROM ( SELECT id, name FROM users WHERE country = 'US' ) AS us_users WHERE name LIKE 'J%'; -- 优化器可能将派生表合并到外层查询: -- SELECT id, name FROM users WHERE country = 'US' AND name LIKE 'J%'
6. 统计信息与成本估算
6.1 统计信息收集
优化器依赖统计信息进行准确的成本估算。
-- 手动更新表统计信息 ANALYZE TABLE users; -- 查看表统计信息 SHOW TABLE STATUS LIKE 'users'; -- 查看索引统计信息 SHOW INDEX FROM users; -- 使用Information Schema查看详细统计 SELECT * FROM information_schema.STATISTICS WHERE TABLE_NAME = 'users'; -- 查看列的基数(不同值数量) SELECT TABLE_NAME, COLUMN_NAME, CARDINALITY FROM information_schema.STATISTICS WHERE TABLE_NAME = 'users' AND COLUMN_NAME = 'country';
6.2 直方图统计
MySQL 8.0+支持直方图统计,提供更准确的选择性估算。
-- 创建直方图统计 ANALYZE TABLE users UPDATE HISTOGRAM ON age, country; -- 查看直方图信息 SELECT * FROM information_schema.COLUMN_STATISTICS WHERE TABLE_NAME = 'users'; -- 删除直方图 ANALYZE TABLE users DROP HISTOGRAM ON age;
6.3 统计信息配置
-- 配置统计信息采样 SHOW VARIABLES LIKE 'innodb_stats%'; -- 重要配置参数: -- innodb_stats_persistent:是否持久化统计信息 -- innodb_stats_auto_recalc:自动重新计算统计信息 -- innodb_stats_persistent_sample_pages:采样页数 -- innodb_stats_transient_sample_pages:临时采样页数 -- 修改配置 SET GLOBAL innodb_stats_persistent_sample_pages = 100;
7. 执行计划分析与解读
7.1 EXPLAIN命令详解
-- 基本EXPLAIN EXPLAIN SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id WHERE u.country = 'US' AND o.amount > 100; -- JSON格式的详细执行计划 EXPLAIN FORMAT=JSON SELECT u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id WHERE u.country = 'US' AND o.amount > 100; -- 传统格式输出字段说明: -- id: 执行顺序 -- select_type: 查询类型 -- table: 访问的表 -- partitions: 匹配的分区 -- type: 访问类型 -- possible_keys: 可能使用的索引 -- key: 实际使用的索引 -- key_len: 使用的索引长度 -- ref: 与索引比较的列 -- rows: 估算的扫描行数 -- filtered: 条件过滤的百分比 -- Extra: 额外信息
7.2 访问类型分析
访问类型(type字段)从最优到最差排序:
-- 最优:system/const EXPLAIN SELECT * FROM users WHERE id = 1; -- 优秀:eq_ref EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id WHERE o.id = 100; -- 良好:ref/range EXPLAIN SELECT * FROM users WHERE country = 'US'; EXPLAIN SELECT * FROM users WHERE age BETWEEN 20 AND 30; -- 一般:index EXPLAIN SELECT country FROM users; -- 覆盖索引扫描 -- 最差:ALL EXPLAIN SELECT * FROM users WHERE name LIKE '%John%'; -- 全表扫描
7.3 Extra字段关键信息
-- 使用索引覆盖 EXPLAIN SELECT id, country FROM users WHERE country = 'US'; -- Extra: Using index -- 使用文件排序 EXPLAIN SELECT * FROM users ORDER BY name; -- Extra: Using filesort -- 使用临时表 EXPLAIN SELECT country, COUNT(*) FROM users GROUP BY country; -- Extra: Using temporary -- 索引条件下推 EXPLAIN SELECT * FROM users WHERE country = 'US' AND name LIKE 'A%'; -- Extra: Using index condition
8. 优化器提示与强制优化
8.1 优化器提示语法
-- 强制使用特定索引 SELECT * FROM users USE INDEX (idx_country) WHERE country = 'US'; -- 忽略特定索引 SELECT * FROM users IGNORE INDEX (idx_country) WHERE country = 'US'; -- 强制索引(即使优化器认为不高效) SELECT * FROM users FORCE INDEX (idx_country) WHERE country = 'US'; -- 查询块命名和提示 SELECT /*+ QB_NAME(qb1) */ u.name FROM users u WHERE u.id IN ( SELECT /*+ QB_NAME(qb2) */ user_id FROM orders WHERE amount > 100 );
8.2 高级优化器提示
-- 设置连接顺序 SELECT /*+ JOIN_ORDER(u, o, p) */ u.name, o.amount, p.product_name FROM users u JOIN orders o ON u.id = o.user_id JOIN products p ON o.product_id = p.id; -- 强制使用哈希连接(MySQL 8.0+) SELECT /*+ HASH_JOIN(u o) */ u.name, o.amount FROM users u JOIN orders o ON u.id = o.user_id; -- 设置资源组 SELECT /*+ RESOURCE_GROUP(background) */ COUNT(*) FROM historical_data; -- 优化器时间限制 SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM large_table;
8.3 索引提示实践
-- 复杂查询的索引提示 EXPLAIN SELECT /*+ INDEX_MERGE(u idx_country idx_age) */ u.name, o.amount FROM users u USE INDEX (idx_country, idx_age) JOIN orders o FORCE INDEX FOR JOIN (idx_user_id) ON u.id = o.user_id WHERE (u.country = 'US' OR u.age > 30) AND o.amount > 100; -- 查看提示效果 SHOW WARNINGS;
9. 常见优化场景与解决方案
9.1 索引选择问题
-- 问题:优化器选择了错误的索引 EXPLAIN SELECT * FROM orders WHERE user_id = 100 AND status = 'completed' AND created_date > '2023-01-01'; -- 解决方案1:使用索引提示 SELECT * FROM orders USE INDEX (idx_user_status_date) WHERE user_id = 100 AND status = 'completed' AND created_date > '2023-01-01'; -- 解决方案2:更新统计信息 ANALYZE TABLE orders; -- 解决方案3:优化索引设计 ALTER TABLE orders ADD INDEX idx_covering (user_id, status, created_date, amount);
9.2 连接顺序问题
-- 问题:非最优的连接顺序 EXPLAIN SELECT u.name, o.amount, p.name FROM users u JOIN orders o ON u.id = o.user_id JOIN products p ON o.product_id = p.id WHERE u.country = 'US' AND p.category = 'Electronics' AND o.amount > 1000; -- 解决方案:使用STRAIGHT_JOIN强制连接顺序 SELECT STRAIGHT_JOIN u.name, o.amount, p.name FROM users u JOIN orders o ON u.id = o.user_id JOIN products p ON o.product_id = p.id WHERE u.country = 'US' AND p.category = 'Electronics' AND o.amount > 1000;
9.3 子查询优化
-- 问题:相关子查询性能差 SELECT u.name, (SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count FROM users u WHERE u.country = 'US'; -- 解决方案:重写为LEFT JOIN SELECT u.name, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON u.id = o.user_id WHERE u.country = 'US' GROUP BY u.id, u.name;
10. 优化器监控与诊断
10.1 性能Schema监控
-- 监控优化器相关事件 SELECT * FROM performance_schema.events_statements_summary_by_digest ORDER BY SUM_TIMER_WAIT DESC LIMIT 10; -- 查看优化器追踪(需要启用) SHOW VARIABLES LIKE 'optimizer_trace'; SET SESSION optimizer_trace = 'enabled=on'; -- 执行查询 SELECT * FROM users WHERE country = 'US' AND age > 25; -- 查看优化器追踪结果 SELECT * FROM information_schema.OPTIMIZER_TRACE;
10.2 慢查询日志分析
-- 启用慢查询日志 SET GLOBAL slow_query_log = 1; SET GLOBAL long_query_time = 1; -- 1秒 SET GLOBAL log_queries_not_using_indexes = 1; -- 分析慢查询日志 SELECT * FROM mysql.slow_log WHERE query_time > 5 ORDER BY query_time DESC; -- 使用pt-query-digest工具分析(外部工具) -- pt-query-digest /var/lib/mysql/slow.log
10.3 实时诊断工具
-- 查看当前执行查询 SHOW PROCESSLIST; -- 查看锁信息 SELECT * FROM information_schema.INNODB_LOCKS; SELECT * FROM information_schema.INNODB_LOCK_WAITS; -- 查看线程状态 SELECT * FROM performance_schema.threads WHERE TYPE = 'FOREGROUND';
11. 优化器最佳实践
11.1 索引设计原则
-- 复合索引设计 -- 好的设计:等值查询列在前,范围查询列在后 CREATE INDEX idx_optimal ON orders (user_id, status, created_date); -- 使用示例: SELECT * FROM orders WHERE user_id = 100 AND status = 'completed' AND created_date > '2023-01-01'; -- 可以使用索引的前两列进行等值查找,第三列进行范围查找 -- 覆盖索引设计 CREATE INDEX idx_covering ON users (country, age) INCLUDE (name, email); -- 查询可以完全使用索引,避免回表 SELECT country, name FROM users WHERE country = 'US' AND age > 25;
11.2 查询编写最佳实践
-- 避免SELECT * SELECT id, name, email FROM users WHERE country = 'US'; -- 好的实践 SELECT * FROM users WHERE country = 'US'; -- 避免 -- 使用EXISTS代替IN(对于相关子查询) SELECT name FROM users u WHERE EXISTS (SELECT 1 FROM orders o WHERE o.user_id = u.id); -- 好的实践 SELECT name FROM users WHERE id IN (SELECT user_id FROM orders); -- 可能性能较差 -- 避免在WHERE子句中对列进行函数操作 SELECT * FROM users WHERE DATE(created_time) = '2023-01-01'; -- 避免 SELECT * FROM users WHERE created_time >= '2023-01-01' AND created_time < '2023-01-02'; -- 好的实践
11.3 配置优化建议
-- 优化器相关配置建议 -- 在my.cnf中配置 [mysqld] # 启用现代优化特性 optimizer_switch = index_condition_pushdown=on,mrr=on,mrr_cost_based=on, batched_key_access=on,materialization=on,semijoin=on, loosescan=on,firstmatch=on,duplicateweedout=on, subquery_materialization_cost_based=on, use_index_extensions=on,condition_fanout_filter=on, derived_merge=on,hash_join=on # 统计信息配置 innodb_stats_persistent = ON innodb_stats_auto_recalc = ON innodb_stats_persistent_sample_pages = 100 # 缓存配置 query_cache_type = 0 # 在MySQL 8.0中已移除,5.7中根据场景选择
12. 总结
MySQL查询优化器是数据库性能的核心组件,通过智能的成本估算和计划选择,将声明式的SQL转换为高效的执行策略。
优化器核心工作机制:
- 查询重写:简化、标准化查询结构
- 成本估算:基于统计信息估算不同计划的执行成本
- 计划枚举:生成多种可能的执行计划
- 计划选择:基于成本模型选择最优计划
关键优化技术:
- 索引选择与索引条件下推
- 连接顺序与连接算法优化
- 子查询转换为连接操作
- 条件化简与表达式优化
性能调优要点:
- 理解EXPLAIN输出,识别性能瓶颈
- 合理使用优化器提示进行微调
- 维护准确的统计信息
- 设计高效的索引策略
掌握优化器的工作原理,能够帮助开发者编写更高效的SQL语句,设计更合理的数据库结构,并在性能问题出现时快速定位和解决。