随着流式处理的发展,出现了越来越多的工具和语言,使得数据处理变得更加高效、灵活和易用。在此背景下,SLS 推出了 SPL(SLS Processing Language) 语法,以此统一查询、端上处理、数据加工等的语法,保证了数据处理的灵活性。iLogtail 作为日志、时序数据采集器,在 2.0 版本中,全面支持了 SPL 。本文对处理插件进行了梳理,介绍了如何编写 SPL 语句,从插件处理模式迁移到 2.0 版本的 SPL 处理模式,帮助用户实现更加灵活的端上数据处理。
1. SPL
iLogtail 一共支持 3 种处理模式。
- 原生插件模式:由 C++ 实现的原生插件,性能最强。
- 拓展插件模式:由 Go 实现的拓展插件,提供了丰富的生态,足够灵活。
- SPL 模式:随着 iLogtail 2.0 在 C++ 处理插件中支持了 SPL 的处理能力,对数据处理能力带来了很大的提升,兼顾性能与灵活性。用户只需要编写 SPL 语句,即可以利用 SPL 的计算能力,完成对数据的处理。SPL 语法可以参考链接
特点 | 开发门槛 | |
原生插件 |
|
|
扩展插件 |
|
|
SPL |
|
|
总的来说,iLogtail 2.0 + SPL 主要有以下的优势:
- 统一数据处理语言:对于同样一种格式的数据,用户可以在不同场景中使用同一种语言进行处理,提高了数据处理的效率
- 查询处理更高效:SPL 对弱结构化数据友好,同时 SPL 主要算子由 C++ 实现,接近 iLogtail 1.X 版本的原生性能
- 丰富的工具和函数:SPL 提供了丰富的内置函数和算子,用户可以更加灵活地进行组合
- 简单易学:SPL 属于一种低代码语言,用户可以快速上手,日志搜索、处理一气呵成
接下来,本文将介绍如何用灵活的 SPL 语句,实现其他两种处理模式相同的处理能力。
2. 原生插件对比
2.1 正则解析
根据正则提取提取字段。输入 Nginx 格式:
127.0.0.1 - - [07/Jul/2022:10:43:30 +0800] "POST /PutData?Category=YunOsAccountOpLog" 0.024 18204 200 37 "-" "aliyun-sdk-java"
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtal/debug/simple.log processors: - Type: processor_parse_regex_native SourceKey: content Regex: ([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\\"]*)\" \"([^\\"]*)\" Keys: - ip - time - method - url - request_time - request_length - status - length - ref_url - browser flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-regexp content, '([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\\"]*)\" \"([^\\"]*)\"' as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | project-away content flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "ip": "127.0.0.1", "time": "07/Jul/2022:10:43:30", "method": "POST", "url": "/PutData?Category=YunOsAccountOpLog", "request_time": "0.024", "request_length": "18204", "status": "200", "length": "37", "ref_url": "-", "browser": "aliyun-sdk-java", "__time__": "1713184059" }
2.2 分隔符解析
根据分隔符分隔提取字段,输入:
127.0.0.1,07/Jul/2022:10:43:30 +0800,POST,PutData Category=YunOsAccountOpLog,0.024,18204,200,37,-,aliyun-sdk-java
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_delimiter_native SourceKey: content Separator: "," Quote: '"' Keys: - ip - time - method - url - request_time - request_length - status - length - ref_url - browser flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-csv content as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | project-away content flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "ip": "127.0.0.1", "time": "07/Jul/2022:10:43:30 +0800", "method": "POST", "url": "PutData?Category=YunOsAccountOpLog", "request_time": "0.024", "request_length": "18204", "status": "200", "length": "37", "ref_url": "-", "browser": "aliyun-sdk-java", "__time__": "1713231487" }
2.3 Json 解析
解析 json 格式日志,输入:
{"url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1","ip": "10.200.98.220", "user-agent": "aliyun-sdk-java", "request": "{\"status\":\"200\",\"latency\":\"18204\"}", "time": "07/Jul/2022:10:30:28", "__time__": "1713237315" }
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_json_native SourceKey: content flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-json content | project-away content flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "url": "POST /PutData?Category=YunOsAccountOpLog HTTP/1.1", "ip": "10.200.98.220", "user-agent": "aliyun-sdk-java", "request": "{\"status\":\"200\",\"latency\":\"18204\"}", "time": "07/Jul/2022:10:30:28", "__time__": "1713237315" }
2.4 正则解析+时间解析
根据正则表达式解析字段,并将其中的一个字段解析成日志时间,输入:
127.0.0.1,07/Jul/2022:10:43:30 +0800,POST,PutData Category=YunOsAccountOpLog,0.024,18204,200,37,-,aliyun-sdk-java
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_regex_native SourceKey: content Regex: ([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\\"]*)\" \"([^\\"]*)\" Keys: - ip - time - method - url - request_time - request_length - status - length - ref_url - browser - Type: processor_parse_timestamp_native SourceKey: time SourceFormat: '%Y-%m-%dT%H:%M:%S' flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-regexp content, '([\d\.]+) \S+ \S+ \[(\S+)\] \"(\w+) ([^\\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\\"]*)\" \"([^\\"]*)\"' as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | extend ts=date_parse(time, '%Y-%m-%d %H:%i:%S') | extend __time__=cast(to_unixtime(ts) as INTEGER) | project-away ts | project-away content flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "ip": "127.0.0.1", "time": "07/Jul/2022:10:43:30 +0800", "method": "POST", "url": "PutData?Category=YunOsAccountOpLog", "request_time": "0.024", "request_length": "18204", "status": "200", "length": "37", "ref_url": "-", "browser": "aliyun-sdk-java", "__time__": "1713231487" }
2.5 正则解析+过滤
根据正则表达式解析字段,并根据解析后的字段值过滤日志。输入:
127.0.0.1 - - [07/Jul/2022:10:43:30 +0800] "POST /PutData?Category=YunOsAccountOpLog" 0.024 18204 200 37 "-" "aliyun-sdk-java" 127.0.0.1 - - [07/Jul/2022:10:44:30 +0800] "Get /PutData?Category=YunOsAccountOpLog" 0.024 18204 200 37 "-" "aliyun-sdk-java"
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_regex_native SourceKey: content Regex: ([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\\"]*)\" \"([^\\"]*)\" Keys: - ip - time - method - url - request_time - request_length - status - length - ref_url - browser - Type: processor_filter_regex_native FilterKey: - method - status FilterRegex: - ^(POST|PUT)$ - ^200$ flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-regexp content, '([\d\.]+) \S+ \S+ \[(\S+) \S+\] \"(\w+) ([^\\"]*)\" ([\d\.]+) (\d+) (\d+) (\d+|-) \"([^\\"]*)\" \"([^\\"]*)\"' as ip, time, method, url, request_time, request_length, status, length, ref_url, browser | project-away content | where regexp_like(method, '^(POST|PUT)$') and regexp_like(status, '^200$') flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "ip": "127.0.0.1", "time": "07/Jul/2022:10:43:30", "method": "POST", "url": "/PutData?Category=YunOsAccountOpLog", "request_time": "0.024", "request_length": "18204", "status": "200", "length": "37", "ref_url": "-", "browser": "aliyun-sdk-java", "__time__": "1713238839" }
2.6 脱敏
将日志中的敏感信息脱敏。输入:
{"account":"1812213231432969","password":"04a23f38"}
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_desensitize_native SourceKey: content Method: const ReplacingString: "******" ContentPatternBeforeReplacedString: 'password":"' ReplacedContentPattern: '[^"]+' flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-regexp content, 'password":"(\S+)"' as password | extend content=replace(content, password, '******') flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "{\"account\":\"1812213231432969\",\"password\":\"******\"}", "__time__": "1713239305" }
3. 拓展插件对比
3.1 添加字段
在输出结果中添加字段,输入:
this is a test log
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_add_fields Fields: service: A IgnoreIfExist: false flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend service='A' flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "this is a test log", "service": "A", "__time__": "1713240293" }
3.2 Json 解析+丢弃字段
解析 json 并删除解析后的指定字段。输入:
{"key1": 123456, "key2": "abcd"}
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_json_native SourceKey: content - Type: processor_drop DropKeys: - key1 flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-json content | project-away content | project-away key1 flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "key2": "abcd", "__time__": "1713245944" }
3.3 Json 解析+重命名字段
解析 json 并重命名解析后的字段。输入:
{"key1": 123456, "key2": "abcd"}
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_json_native SourceKey: content - Type: processor_rename SourceKeys: - key1 DestKeys: - new_key1 flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-json content | project-away content | project-rename new_key1=key1 flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "new_key1": "123456", "key2": "abcd", "__time__": "1713249130" }
3.4 Json 解析+过滤日志
解析 json 并根据字段条件过滤日志。输入:
{"ip": "10.**.**.**", "method": "POST", "browser": "aliyun-sdk-java"} {"ip": "10.**.**.**", "method": "POST", "browser": "chrome"} {"ip": "192.168.**.**", "method": "POST", "browser": "aliyun-sls-ilogtail"}
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_json_native SourceKey: content - Type: processor_filter_regex Include: ip: "10\\..*" method: POST Exclude: browser: "aliyun.*" flushers: - Type: flusher_stdout OnlyStdout: tru
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-json content | project-away content | where regexp_like(ip, '10\..*') and regexp_like(method, 'POST') and not regexp_like(browser, 'aliyun.*') flushers: - Type: flusher_stdout OnlyStdout: tru
输出:
{ "ip": "10.**.**.**", "method": "POST", "browser": "chrome", "__time__": "1713246645" }
3.5 Json 解析+字段值映射处理
解析 json 并根据字段值的不同,映射为不同的值。输入:
{"_ip_":"192.168.0.1","Index":"900000003"} {"_ip_":"255.255.255.255","Index":"3"}
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_parse_json_native SourceKey: content - Type: processor_dict_map MapDict: "127.0.0.1": "LocalHost-LocalHost" "192.168.0.1": "default login" SourceKey: "_ip_" DestKey: "_processed_ip_" Mode: "overwrite" HandleMissing": true Missing: "Not Detected" flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-json content | project-away content | extend _processed_ip_= CASE WHEN _ip_ = '127.0.0.1' THEN 'LocalHost-LocalHost' WHEN _ip_ = '192.168.0.1' THEN 'default login' ELSE 'Not Detected' END flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "_ip_": "192.168.0.1", "Index": "900000003", "_processed_ip_": "default login", "__time__": "1713259557" }
3.6 字符串替换
替换日志中的指定字符串。输入:
hello,how old are you? nice to meet you
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_string_replace SourceKey: content Method: const Match: "how old are you?" ReplaceString: "" flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend content=replace(content, 'how old are you?', '') flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "hello, nice to meet you", "__time__": "1713260499" }
3.7 数据编码与解码
Base64
对日志进行 Base64 加密。输入:
this is a test log
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_base64_encoding SourceKey: content NewKey: content1 flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend content1=to_base64(cast(content as varbinary)) flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "this is a test log", "content1": "dGhpcyBpcyBhIHRlc3QgbG9n", "__time__": "1713318724" }
MD5
对日志进行 MD5 加密。输入:
hello,how old are you? nice to meet you
原有插件:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_string_replace SourceKey: content Method: const Match: "how old are you?" ReplaceString: "" flushers: - Type: flusher_stdout OnlyStdout: true
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend content1=lower(to_hex(md5(cast(content as varbinary)))) flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "this is a test log", "content1": "4f3c93e010f366eca78e00dc1ed08984", "__time__": "1713319673" }
4. 新增能力项
4.1 数学计算
输入:4。
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend val = cast(content as double) | extend power_test = power(val, 2) | extend round_test = round(val) | extend sqrt_test = sqrt(val) flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "4", "power_test": 16.0, "round_test": 4.0, "sqrt_test": 2.0, "val": 4.0, "__time__": "1713319673" }
4.2 URL 计算
URL 编码解码
输入:
https://homenew.console.aliyun.com/home/dashboard/ProductAndService
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend encoded = url_encode(content) | extend decoded = url_decode(encoded) flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "https://homenew.console.aliyun.com/home/dashboard/ProductAndService", "decoded": "https://homenew.console.aliyun.com/home/dashboard/ProductAndService", "encoded": "https%3A%2F%2Fhomenew.console.aliyun.com%2Fhome%2Fdashboard%2FProductAndService", "__time__": "1713319673" }
URL 提取
输入:
https://sls.console.aliyun.com:443/lognext/project/dashboard-all/logsearch/nginx-demo?accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | extend host = url_extract_host(content) | extend query = url_extract_query(content) | extend path = url_extract_path(content) | extend protocol = url_extract_protocol(content) | extend port = url_extract_port(content) | extend param = url_extract_parameter(content, 'accounttraceid') flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "content": "https://sls.console.aliyun.com:443/lognext/project/dashboard-all/logsearch/nginx-demo?accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw", "host": "sls.console.aliyun.com", "param": "d6241a173f88471c91d3405cda010ff5ghdw", "path": "/lognext/project/dashboard-all/logsearch/nginx-demo", "port": "443", "protocol": "https", "query": "accounttraceid=d6241a173f88471c91d3405cda010ff5ghdw", "__time__": "1713319673" }
4.3 比较&逻辑运算符
输入:
{"num1": 199, "num2": 10, "num3": 9}
SPL:
enable: true inputs: - Type: input_file FilePaths: - /workspaces/ilogtail/debug/simple.log processors: - Type: processor_spl Script: | * | parse-json content | extend compare_result = cast(num1 as double) > cast(num2 as double) AND cast(num2 as double) > cast(num3 as double) flushers: - Type: flusher_stdout OnlyStdout: true
输出:
{ "compare_result": "true", "content": "{\"num1\": 199, \"num2\": 10, \"num3\": 9}", "num1": "199", "num2": "10", "num3": "9", "__time__": "1713319673" }
5. 其他
更多能力请参考链接
欢迎大家补充更多 iLogtail SPL 实践案例!
作者:阿柄