数值函数
使用频率较高的数值函数如下:
- abs(x):参数x的绝对值。
- ceil(x):参数x向上取整。
- floor(x):参数x向下取整。
- rand([n]):生成 [0,1)的浮点数。
- round(x,d):将参数x四舍五入到d小数位。
- truncate(x,d):返回数字x,截断为d小数位。
使用示例:
select ceil(-1.23) , floor(1.23) , rand() , rand(1) , round(0.123, 2) , truncate(1.223, 1); +-----------+-----------+------------------+------------------+---------------+-------------------+ |ceil(-1.23)|floor(1.23)| rand() | rand(1) |round(0.123, 2)|truncate(1.223, 1) | +-----------+-----------+------------------+------------------+---------------+-------------------+ | -1.0 | 1 |0.4990247756570755|0.7133693869548766| 0.12 | 1.2 | +-----------+-----------+------------------+------------------+---------------+-------------------+
字符串函数
- length(str):以字节为单位返回字符串的长度。
- char_length(str):返回字符串str的长度,以字符为单位。多字节字符算作单个字符。这意味着对于包含五个2字节字符的字符串,length()返回10,而char_length()返回5。
- lower(str):将所有字符更改为小写。
- upper(str):将所有字符更改为大写。
- left(str,len):返回字符串 str 中最左边的 len 字符,如果任何参数是 null,则返回 null。
- right(str,len):返回字符串 str 中最右边的 len 字符,如果任何参数是 null,则返回 null。
select length('databend') as len , char_length('databend') as char_len , lower('Databend') as lower_char , upper('Databend') as upper_char , left('Databend', 4) as left_char , right('Databend', 4) as right_char; +-------+----------+------------+------------+-----------+------------+ | len | char_len | lower_char | upper_char | left_char | right_char | +-------+----------+------------+------------+-----------+------------+ | 8 | 8 | databend | DATABEND | Data | bend | +-------+----------+------------+------------+-----------+------------+
- trim([{both | leading | trailing} [remstr] from ] str):指定删除字符串的前导或尾随。如果省略删除字符串,则删除空格。
- repeat(str, count):重复字符串 str count 次数。
- lpad(str,len,padstr):左填充字符串 padstr 为长度为 len 个字符。如果str长于len,则返回值将缩短为len字符。
- rpad(str,len,padstr):右填充字符串padstr为len字符的长度。如果str长于len,则返回值将缩短为len字符。
select trim(leading '?' from '???Databend???') as leading_trim , trim(trailing '?' from '???Databend???') as trailing_trim , trim(both '?' from '???Databend???') as both_trim , trim(' Databend ') as spaces_trim , repeat('Databend', 2) as repeat_char , lpad('36363', 10, '0') as lpad_char , lpad('36363', 10, '?') as rpad_char; +-------------+-------------+-----------+-------------+------------------+------------+------------+ |leading_trim |trailing_trim| both_trim | spaces_trim | repeat_char | lpad_char | rpad_char | +-------------+-------------+-----------+-------------+------------------+------------+------------+ | Databend??? | ???Databend | Databend | Databend | DatabendDatabend | 0000036363 | ?????36363 | +-------------+-------------+-----------+-------------+------------------+------------+------------+
- locate(substr, str, pos):返回 substr 子字符串在字符串 str 中从 pos 开始的第几个位置,pos 不写为0首次。
- substring(str,pos,len):从 pos 位置开始,返回字符串 str 中 len 个长度字符,默认为最长长度。
- inster(str,pos,len,new_str):子字符串从位置pos开始,len字符长被字符串newstr取代。
- replace(str,from_str,to_str):将字符串from_str的所有出现都替换为字符串to_str。
selecselect locate(' ', 'Databend Clound') as pos_char , substring('Databend Clound', locate(' ', 'Databend Clound')) as sub_char , insert('Databend Clound', length('Databend Clound'), 7, ' Server') as insert_char , replace('Databend Clound', 'Databend', 'Mysql') as replace_char1 , replace('Databend Clound', 'Databend', '') as replace_char2 , replace('www.mysql.com', 'mysql', '') as replace_char3; +----------+----------+-----------------------+---------------+---------------+---------------+ | pos_char | sub_char | insert_char | replace_char1 | replace_char2 | replace_char3 | +----------+----------+-----------------------+---------------+---------------+---------------+ | 9 | Clound | Databend Cloun Server | Mysqld Clound | d Clound | www.l.com | +----------+----------+-----------------------+---------------+---------------+---------------+
从最后几列可以看出,Databend 和 Mysql 替换过程中始终不能完整替换,因此在使用过程中,多去测试看结果再实际应用。
另外,Databend 不支持 Mysql 中的 substring_index()函数,但是可以发散思维,比如上面的 sub_char 实现,还有以下方法实现:
select substring('Databend Clound',1,locate(' ','Databend Clound')-1) as sub_char; +------------+ | sub_char | +------------+ | Databend | +------------+
- concat(str1,str2,…):合并拼接字符串,返回varchar 数据类型值或 null 数据类型。
- concat(separator,str1,str2,…):根据分隔符 separator 合并字符串,返回varchar 数据类型值或 null 数据类型。
select concat('data', 'bend') as concat_char1 , concat('data', 'bend', null) as concat_char2 , concat('data', 'bend', 1::varchar) as concat_char3 , concat_ws('、', 'data', 'bend') as concat_char4; +--------------+--------------+---------------+--------------+ | concat_char1 | concat_char2 | concat_char3 | concat_char4 | +--------------+--------------+---------------+--------------+ | databend | NULL | databend1 | data、bend | +--------------+--------------+---------------+--------------+
逻辑函数
- ifnull(expr1,expr2):如果不是NULL,则返回expr1。否则返回expr2。它们必须具有相同的数据类型。
- greatest(values1,values2):从一组值中返回最大值。
- least((values1,values2):从一组值中返回最小值。
if(cond1, expr1, [cond2, expr2, …], expr_else):如果cond1为TRUE,则返回expr1。否则,如果cond2为TRUE,则返回expr2,
coalesce(x,…):从左到右检查是否传递了 null 参数,并返回第一个非 null 参数。
select ifnull(null, 'a') as t1 , greatest(2, 3) as t2 , least(2, 3, 4) as t3 , if(1 > 2, false, true) as t4 , coalesce(null, 'Databend', 'Mysql', null) as t5 ;
JSON 函数
对于这类函数,主要针对一些特殊的数据类型 variant ,可以前往【Databend】数据类型查看说明。
- parse_json(str):将输入字符串解释为 json 文档,生成一个 variant 值。如果解析过程中发生错误,则返回NULL值。
- object_keys(variant):返回一个数组,其中包含输入变体对象中的键列表。
- get_path(variant, path_name):通过 path_name 从 variant 中提取值。如果其中一个参数是null则该值作为 variant 或 null 返回,path_name由字段名之前有句号(.)、冒号(:)或索引运算符([index])的串联组成。第一个字段名称不需要指定前导标识符。
- get(variant,index|field_name):根据 index 或 field_name 获取具体值,未获得返回 null。
select parse_json('[-1, 12, 289, 2188, false]') as list_json , parse_json('{ "x" : "abc", "y" : false, "z": 10} ') as key_json , object_keys(parse_json('{"a": 1, "b": [1,2,3]}')) as get_keys , get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k1[0]') as k1 , get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2:k3') as k3 , get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k4') as k4 , get_path(parse_json('{"k1":[0,1,2], "k2":{"k3":3,"k4":4}}'), 'k2.k5') as k5 , get(parse_json('[2.71, 3.14]'), 0) as get_list_value , get(parse_json('{"aa":1, "aa":2, "aa":3}'), 'aa') as get_key_values , get(parse_json('{"aa":1, "aa":2, "aa":3}'), 'aa') as get_null ; +------------------------+------------------------------+-----------+---+---+---+------+--------------+--------------+--------+ | list_json | key_json | get_keys | k1| k3| k4| k5 |get_list_value|get_key_values|get_null| +------------------------+------------------------------+-----------+---+---+---+------+--------------+--------------+--------+ | [-1,12,289,2188,false] | {"x":"abc","y":false,"z":10} | ["a","b"] | 0 | 3 | 4 | NULL | 2.71 | 1 | NULL | +------------------------+------------------------------+-----------+---+---+---+------+--------------+--------------+--------+
- json_path_query_array(variant, path_name):获取指定 variant 值的 path_name 路径返回的所有 json 项,并将结果包装成数组。
- json_path_query(variant, path_name):获取指定 variant 值的 path_name 路径返回的所有 json 项。
with t1 as (select 'laptop' as name, '{"brand": "dell", "colors": ["black", "silver"], "price": 1200, "features": {"ram": "16gb", "storage": "512gb"}}' as details union all select 'smartphone' as name, '{"brand": "apple", "colors": ["white", "black"], "price": 999, "features": {"ram": "4gb", "storage": "128gb"}}' as details union all select 'headphones' as name, '{"brand": "sony", "colors": ["black", "blue", "red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}' as details) select name, json_path_query_array(parse_json(details), '$.features.*') as all_features from t1; +-----------+---------------------+ | name | all_features | +-----------+---------------------+ | Laptop | ["16GB", "512GB"] | +-----------+---------------------+ | Smartphone| ["4GB", "128GB"] | +-----------+---------------------+ | Headphones| ["20h", "5.0"] | +-----------+---------------------+ with t1 as (select 'laptop' as name, '{"brand": "dell", "colors": ["black", "silver"], "price": 1200, "features": {"ram": "16gb", "storage": "512gb"}}' as details union all select 'smartphone' as name, '{"brand": "apple", "colors": ["white", "black"], "price": 999, "features": {"ram": "4gb", "storage": "128gb"}}' as details union all select 'headphones' as name, '{"brand": "sony", "colors": ["black", "blue", "red"], "price": 150, "features": {"battery": "20h", "bluetooth": "5.0"}}' as details) select name, json_path_query(parse_json(details), '$.features.*') as all_features from t1; +------------+--------------+ | name | all_features | +------------+--------------+ | Laptop | "16GB" | | Laptop | "512GB" | | Smartphone | "4GB" | | Smartphone | "128GB" | | Headphones | "20h" | | Headphones | "5.0" | +------------+--------------+
聚合函数
主要介绍常用的几种,如下:
- avg(expr):返回expr的平均值。
- max(expr):返回expr的最大值。
- min(expr):返回expr的最小值。
- sum(expr):返回expr的汇总值。
- count([distinct] expr):返回expr的记录数。
- median(expr):计算数字数据序列的中位数。
- string_agg(expr [, delimiter]):将列的所有非null 值转换为 string,由分隔符分隔。
-- 数据准备 create table if not exists program_languages ( id int, language_name varchar, score int ); insert into program_languages (id, language_name,score) values (1, 'python',80), (2, 'javascript',90), (3, 'java',75), (4, 'c#',95), (5, 'ruby',85); -- 指标计算 select avg(score) as avg_score , max(score) as max_score , min(score) as min_score , median(score) as median_score , sum(score) as total_score , count(distinct language_name) as language_cnt , string_agg(language_name, '、') as language_concat from program_languages; +-----------+-----------+-----------+--------------+-------------+--------------+-----------------------------------+ | avg_score | max_score | min_score | median_score | total_score | language_cnt | language_concat | +-----------+-----------+-----------+--------------+-------------+--------------+-----------------------------------+ | 85 | 95 | 75 | 85 | 425 | 5 | python、javascript、java、c#、ruby | +-----------+-----------+-----------+--------------+-------------+--------------+-----------------------------------+
以上这些聚合函数 null 值都不计算在内。
聚合函数 Databend 比 Mysql 多了一个 median() 可用于更好地计算中位数,其实还有一些其他聚合函数,有兴趣的可以自己扩展。
总结
Databend 作为新一代云原生数据仓库,提供了许多多样化的函数,函数与 Mysql 相比,并无较大差异,我们只要掌握基础常用哪些函数,基本上可以解决工作中大部分问题。
参考资料:
Databend Functions:https://docs.databend.com/sql/sql-functions/
Mysql 常用函数和基础查询:https://blog.csdn.net/weixin_50357986/article/details/134378858