日志服务数据加工最佳实践: 解析CSV格式的日志-阿里云开发者社区

开发者社区> 阿里云存储服务> 正文

日志服务数据加工最佳实践: 解析CSV格式的日志

简介: 本篇介绍日志服务数据加工最佳实践: 解析CSV格式的日志


本案例是根据客户实际应用需求中产生,以下将详细从原始数据到需求再到解决方案等几个方面向读者解答如何使用LOG DSL加工语法,解决实际生产中的问题。

原始日志

_program_:error
_severity_:6
_priority_:14
_facility_:1
topic:syslog-forwarder
content:10.64.10.20|10/Jun/2019:11:32:16 +0800|m.zf.cn|GET /zf/11874.html HTTP/1.1|200|0.077|6404|10.11.186.82:8001|200|0.060|https://yz.m.sm.cn/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei|-|Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-A00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36|-|-

需求

  1. _program_等于access时,对字段content做一次PSV(pipe分隔的解析),然后丢弃content字段。
  2. request: GET /css/mip-base.css HTTP/1.1这个字段需要拆分为request_method,http_version,以及request。
  3. http_referer做url解码
  4. time做格式化

解决方案

1、如果_program_是access,则执行e_psv(解析content内容,详细用法见全局操作函数部分)函数,并删除原始字段content内容

e_if(e_search("_program_==access"), e_compose(e_psv("content", "remote_addr, time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid", restrict=True), e_drop_fields("content")))

返回的日志为:

__source__:  1.2.3.4
__tag__:__client_ip__:  2.3.4.5
__tag__:__receive_time__:  1562845168
__topic__:  
_facility_:  1
_priority_:  14
_program_:  access
_severity_:  6
body_bytes_sent:  6404
guid:  -
host:  m.zf.cn
http_referer:  https://yz.m.sm.cn/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
http_user_agent:  Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
http_x_forwarded_for:  -
remote_addr:  10.64.10.20
request:  GET /zf/11874.html HTTP/1.1
request_time:  0.077
session_id:  -
status:  200
time_local:  10/Jun/2019:11:32:16 +0800
topic:  syslog-forwarder
upstream_addr:  10.11.186.82:8001
upstream_response_time:  0.060
upstream_status:  200

2、使用e_regex函数拆分request,解析成request_method,request,http_version

e_regex("request",r"^(?P<request_method>\w+) (?P<request>.+) (?P<http_version>\w+/[\d\.]+)$")

返回的日志为:

request:  /zf/11874.html
request_method:  GET
http_version:  HTTP/1.1

3、对http_referer做url解码

e_set("http",url_decoding("http_referer"))

返回的日志为:

http: https://yz.m.sm.cn/s?q=蛋花龙须面的做法&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei

4、对时间做格式化处理

e_set("time_local",dt_strptime(v("time"),"%d/%b/%Y:%H:%M:%S +0800"))

返回的日志为:

time_local:  2019-06-13 13:45:11

5、综上解决方案具体如下:

e_if(e_search("_program_==access"), e_compose(e_psv("content", "remote_addr, time_local,host,request,status,request_time,body_bytes_sent,upstream_addr,upstream_status, upstream_response_time,http_referer,http_x_forwarded_for,http_user_agent,session_id,guid", restrict=True), e_drop_fields("content")))
e_regex("request",r"^(?P<request_method>\w+) (?P<request>.+) (?P<http_version>\w+/[\d\.]+)$")
e_set("http",url_decoding("http_referer"))
e_set("time_local",dt_strptime(v("time"),"%d/%b/%Y:%H:%M:%S +0800"))

输出的日志

__source__:  1.2.3.4
__tag__:__client_ip__:  2.3.4.5
__tag__:__receive_time__:  1562840879
__topic__:  
_facility_:  1
_priority_:  14
_program_:  access
_severity_:  6
body_bytes_sent:  6404
guid:  -
host:  m.zf.cn
http_referer:  https://yz.m.sm.cn/s?q=%E8%9B%8B%E8%8A%B1%E9%BE%99%E9%A1%BB%E9%9D%A2%E7%9A%84%E5%81%9A%E6%B3%95&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei
http_user_agent:  Mozilla/5.0 (Linux; Android 9; HWI-AL00 Build/HUAWEIHWI-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36
http_x_forwarded_for:  -
remote_addr:  10.64.10.20
request:  GET /zf/11874.html HTTP/1.1
request_time:  0.077
session_id:  -
status:  200
time_local:  10/Jun/2019:11:32:16 +0800
topic:  syslog-forwarder
upstream_addr:  10.11.186.82:8001
upstream_response_time:  0.060
upstream_status:  200
http: https://yz.m.sm.cn/s?q=蛋花龙须面的做法&from=wy878378&uc_param_str=dnntnwvepffrgibijbprsvdsei

进一步参考

欢迎扫码加入官方钉钉群获得实时更新与阿里云工程师的及时直接的支持:
image

版权声明:本文内容由阿里云实名注册用户自发贡献,版权归原作者所有,阿里云开发者社区不拥有其著作权,亦不承担相应法律责任。具体规则请查看《阿里云开发者社区用户服务协议》和《阿里云开发者社区知识产权保护指引》。如果您发现本社区中有涉嫌抄袭的内容,填写侵权投诉表单进行举报,一经查实,本社区将立刻删除涉嫌侵权内容。

分享:

阿里云存储基于飞天盘古2.0分布式存储系统,产品多种多样,充分满足用户数据存储和迁移上云需求。

官方博客
链接