2.5 常见配置
配置项主要在 config.xml 或 users.xml 中, 基本上都在 users.xml 里
config.xml 的配置项
https://clickhouse.com/docs/en/operations/server-configuration-parameters/settings/
users.xml 的配置项
https://clickhouse.tech/docs/en/operations/settings/settings/
2.5.1 CPU 资源
2.5.2 内存资源
2.5.3 存储
ClickHouse 不支持设置多数据目录,为了提升数据 io 性能,可以挂载虚拟券组,一个券组绑定多块物理磁盘提升读写性能,多数据查询场景 SSD 会比普通机械硬盘快 2-3 倍。
3 ClickHouse语法优化规则
ClickHouse的 SQL 优化规则是基于 RBO(Rule Based Optimization),下面是一些优化规则
3.1 准备测试用表
上传官方的数据集
将 visits_v1.tar 和 hits_v1.tar 上传到虚拟机,解压到 clickhouse 数据路径下
// 解压到 clickhouse 数据路径 sudo tar -xvf hits_v1.tar -C /var/lib/clickhouse sudo tar -xvf visits_v1.tar -C /var/lib/clickhouse //修改所属用户 sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/data/datasets sudo chown -R clickhouse:clickhouse /var/lib/clickhouse/metadata/datasets
- 重启 clickhouse-server
sudo clickhouse restart
- 执行查询
clickhouse-client --query "SELECT COUNT(*) FROM datasets.hits_v1" clickhouse-client --query "SELECT COUNT(*) FROM datasets.visits_v1"
注意:官方的 tar 包,包含了建库、建表语句、数据内容,这种方式不需要手动建库、建表,最方便。
hits_v1 表有 130 多个字段,880 多万条数据
visits_v1 表有 180 多个字段,160 多万条数据
3.2 COUNT 优化
在调用 count 函数时,如果使用的是 count() 或者 count(*),且没有 where 条件,则会直接使用 system.tables 的 total_rows,例如:
EXPLAIN SELECT count()FROM datasets.hits_v1; Union Expression (Projection) Expression (Before ORDER BY and SELECT) MergingAggregated ReadNothing (Optimized trivial count)
注意 Optimized trivial count ,这是对 count 的优化, 只要括号中不写具体的字段就会进行优化
。
如果 count 具体的列字段,则不会使用此项优化:
EXPLAIN SELECT count(CounterID) FROM datasets.hits_v1; Union Expression (Projection) Expression (Before ORDER BY and SELECT) Aggregating Expression (Before GROUP BY) ReadFromStorage (Read from MergeTree)
3.3 消除子查询重复字段
- 下面语句子查询中有两个重复的 id 字段,会被去重:
EXPLAIN SYNTAX SELECT a.UserID, b.VisitID, a.URL, b.UserID FROM hits_v1 AS a LEFT JOIN ( SELECT UserID, UserID as HaHa, VisitID FROM visits_v1) AS b USING (UserID) limit 3;
- 返回优化语句:
SELECT UserID, VisitID, URL, b.UserID FROM hits_v1 AS a ALL LEFT JOIN ( SELECT UserID, VisitID FROM visits_v1 ) AS b USING (UserID) LIMIT 3
3.4 谓词下推
当 group by 有 having 子句,但是没有 with cube、with rollup 或者 with totals 修饰的时候,having 过滤会下推到 where 提前过滤。例如下面的查询,HAVING name 变成了 WHERE name,在 group by 之前过滤:
EXPLAIN SYNTAX SELECT UserID FROM hits_v1 GROUP BY UserID HAVING UserID = '8585742290196126178'; //返回优化语句 SELECT UserID FROM hits_v1 WHERE UserID = \'8585742290196126178\' GROUP BY UserID
子查询也支持谓词下推:
EXPLAIN SYNTAX SELECT * FROM ( SELECT UserID FROM visits_v1 ) WHERE UserID = '8585742290196126178' //返回优化后的语句 SELECT UserID FROM ( SELECT UserID FROM visits_v1 WHERE UserID = \'8585742290196126178\' ) WHERE UserID = \'8585742290196126178\'
再来一个复杂例子:
EXPLAIN SYNTAX SELECT * FROM ( SELECT * FROM ( SELECT UserID FROM visits_v1 ) UNION ALL SELECT * FROM ( SELECT UserID FROM visits_v1 ) ) WHERE UserID = '8585742290196126178' //返回优化后的语句 SELECT UserID FROM ( SELECT UserID FROM ( SELECT UserID FROM visits_v1 WHERE UserID = \'8585742290196126178\' ) WHERE UserID = \'8585742290196126178\' UNION ALL SELECT UserID FROM ( SELECT UserID FROM visits_v1 WHERE UserID = \'8585742290196126178\' ) WHERE UserID = \'8585742290196126178\' ) WHERE UserID = \'8585742290196126178\'
3.5 聚合计算外推
聚合函数内的计算,会外推,例如:
EXPLAIN SYNTAX SELECT sum(UserID * 2) FROM visits_v1 //返回优化后的语句 SELECT sum(UserID) * 2 FROM visits_v1
3.6 聚合函数消除
如果对聚合键,也就是 group by key 使用 min、max、any 聚合函数,则将函数消除,
例如:
EXPLAIN SYNTAX SELECT sum(UserID * 2), max(VisitID), max(UserID) FROM visits_v1 GROUP BY UserID //返回优化后的语句 SELECT sum(UserID) * 2, max(VisitID), UserID FROM visits_v1 GROUP BY UserID
3.7 删除重复的 order by key
例如下面的语句,重复的聚合键 id 字段会被去重:
EXPLAIN SYNTAX SELECT * FROM visits_v1 ORDER BY UserID ASC, UserID ASC, VisitID ASC, VisitID ASC //返回优化后的语句: select …… FROM visits_v1 ORDER BY UserID ASC, VisitID ASC
3.8 删除重复的 limit by key
例如下面的语句,重复声明的 name 字段会被去重:
EXPLAIN SYNTAX SELECT * FROM visits_v1 LIMIT 3 BY VisitID, VisitID LIMIT 10 //返回优化后的语句: select …… FROM visits_v1 LIMIT 3 BY VisitID LIMIT 10
3.9 删除重复的 USING Key
例如下面的语句,重复的关联键 id 字段会被去重:
EXPLAIN SYNTAX SELECT a.UserID, a.UserID, b.VisitID, a.URL, b.UserID FROM hits_v1 AS a LEFT JOIN visits_v1 AS b USING (UserID, UserID) //返回优化后的语句: SELECT UserID, UserID, VisitID, URL, b.UserID FROM hits_v1 AS a ALL LEFT JOIN visits_v1 AS b USING (UserID)
3.10 标量替换
如果子查询只返回一行数据,在被引用的时候用标量替换,例如下面语句中的 total_disk_usage 字段:
EXPLAIN SYNTAX WITH ( SELECT sum(bytes) FROM system.parts WHERE active ) AS total_disk_usage SELECT (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage, table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10; //返回优化后的语句: WITH CAST(0, \'UInt64\') AS total_disk_usage SELECT (sum(bytes) / total_disk_usage) * 100 AS table_disk_usage, table FROM system.parts GROUP BY table ORDER BY table_disk_usage DESC LIMIT 10
3.11 三元运算优化
如果开启了 optimize_if_chain_to_multiif 参数,三元运算符会被替换成 multiIf 函数,
例如:
EXPLAIN SYNTAX SELECT number = 1 ? 'hello' : (number = 2 ? 'world' : 'oldlu') FROM numbers(10) settings optimize_if_chain_to_multiif = 1; //返回优化后的语句: SELECT multiIf(number = 1, \'hello\', number = 2, \'world\', \'oldlu\') FROM numbers(10) SETTINGS optimize_if_chain_to_multiif = 1