For an introduction to SkyWalking, see the Chinese documentation.
[Figure: simple SkyWalking environment diagram]
Installation
Environment:
- OS: Ubuntu 18.04 LTS (arm64)
- Elasticsearch: 7.11.0
- SkyWalking: 8.4.0
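1. Install Elasticsearch
See "ELK Best Practices".
2. Install SkyWalking
2.1 Download the package
Go to the download page and pick the latest release. The version downloaded for these notes is:
https://www.apache.org/dyn/closer.cgi/skywalking/8.4.0/apache-skywalking-apm-es7-8.4.0.tar.gz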
2.2 Extract
tar -xf apache-skywalking-apm-es7-8.4.0.tar.gz
# Move it to /opt/server/ and rename it to skywalking
mkdir -p /opt/server
mv apache-skywalking-apm-bin-es7 /opt/server/skywalking
2.3 Edit the configuration
vim /opt/server/skywalking/config/application.yml
storage:
  selector: ${SW_STORAGE:elasticsearch7}
  elasticsearch7:
    nameSpace: ${SW_NAMESPACE:"elasticsearch7"}
    clusterNodes: ${SW_STORAGE_ES_CLUSTER_NODES:192.168.1.13:9200,192.168.1.14:9200,192.168.1.15:9200}
    protocol: ${SW_STORAGE_ES_HTTP_PROTOCOL:"http"}
    #trustStorePath: ${SW_STORAGE_ES_SSL_JKS_PATH:""}
    #trustStorePass: ${SW_STORAGE_ES_SSL_JKS_PASS:""}
    dayStep: ${SW_STORAGE_DAY_STEP:1} # Represent the number of days in the one minute/hour/day index.
    indexShardsNumber: ${SW_STORAGE_ES_INDEX_SHARDS_NUMBER:1} # Shard number of new indexes
    indexReplicasNumber: ${SW_STORAGE_ES_INDEX_REPLICAS_NUMBER:1} # Replicas number of new indexes
    # Super data set has been defined in the codes, such as trace segments.The following 3 config would be improve es performance when storage super size data in es.
    superDatasetDayStep: ${SW_SUPERDATASET_STORAGE_DAY_STEP:-1} # Represent the number of days in the super size dataset record index, the default value is the same as dayStep when the value is less than 0
    superDatasetIndexShardsFactor: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_SHARDS_FACTOR:5} # This factor provides more shards for the super data set, shards number = indexShardsNumber * superDatasetIndexShardsFactor. Also, this factor effects Zipkin and Jaeger traces.
    superDatasetIndexReplicasNumber: ${SW_STORAGE_ES_SUPER_DATASET_INDEX_REPLICAS_NUMBER:0} # Represent the replicas number in the super size dataset record index, the default value is 0.
    user: ${SW_ES_USER:"elastic"}
    password: ${SW_ES_PASSWORD:"elastic"}
    secretsManagementFile: ${SW_ES_SECRETS_MANAGEMENT_FILE:""} # Secrets management file in the properties format includes the username, password, which are managed by 3rd party tool.
    bulkActions: ${SW_STORAGE_ES_BULK_ACTIONS:1000} # Execute the async bulk record data every ${SW_STORAGE_ES_BULK_ACTIONS} requests
    syncBulkActions: ${SW_STORAGE_ES_SYNC_BULK_ACTIONS:50000} # Execute the sync bulk metrics data every ${SW_STORAGE_ES_SYNC_BULK_ACTIONS} requests
    flushInterval: ${SW_STORAGE_ES_FLUSH_INTERVAL:10} # flush the bulk every 10 seconds whatever the number of requests
    concurrentRequests: ${SW_STORAGE_ES_CONCURRENT_REQUESTS:2} # the number of concurrent requests
    resultWindowMaxSize: ${SW_STORAGE_ES_QUERY_MAX_WINDOW_SIZE:10000}
    metadataQueryMaxSize: ${SW_STORAGE_ES_QUERY_MAX_SIZE:5000}
    segmentQueryMaxSize: ${SW_STORAGE_ES_QUERY_SEGMENT_SIZE:200}
    profileTaskQueryMaxSize: ${SW_STORAGE_ES_QUERY_PROFILE_TASK_SIZE:200}
    oapAnalyzer: ${SW_STORAGE_ES_OAP_ANALYZER:"{\"analyzer\":{\"oap_analyzer\":{\"type\":\"stop\"}}}"} # the oap analyzer.
    oapLogAnalyzer: ${SW_STORAGE_ES_OAP_LOG_ANALYZER:"{\"analyzer\":{\"oap_log_analyzer\":{\"type\":\"standard\"}}}"} # the oap log analyzer. It could be customized by the ES analyzer configuration to support more language log formats, such as Chinese log, Japanese log and etc.
    advanced: ${SW_STORAGE_ES_ADVANCED:""}
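The shard and day-step comments in the config above boil down to simple arithmetic. The following is a minimal sketch using the defaults shown (indexShardsNumber=1, superDatasetIndexShardsFactor=5, dayStep=1, superDatasetDayStep=-1); the class and method names are illustrative, not SkyWalking source code.

```java
// Illustrative arithmetic for the storage settings above; not SkyWalking source code.
final class EsIndexSettings {
    // Super datasets (e.g. trace segments) get more shards:
    // shards = indexShardsNumber * superDatasetIndexShardsFactor
    static int superDatasetShards(int indexShardsNumber, int superDatasetIndexShardsFactor) {
        return indexShardsNumber * superDatasetIndexShardsFactor;
    }

    // A negative superDatasetDayStep means "use the same value as dayStep".
    static int superDatasetDayStep(int superDatasetDayStep, int dayStep) {
        return superDatasetDayStep < 0 ? dayStep : superDatasetDayStep;
    }

    public static void main(String[] args) {
        System.out.println(superDatasetShards(1, 5));   // 5
        System.out.println(superDatasetDayStep(-1, 1)); // 1
    }
}
```

So with these defaults, super-dataset indexes such as the trace segment index are created with 5 shards and 0 replicas, while the other indexes get 1 shard and 1 replica.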
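Only the storage section needs to be changed:
- storage.selector: which storage backend to use; here we choose elasticsearch7.
- Under elasticsearch7, change the following:
  - nameSpace: namespace (used as the prefix of the index names)
  - clusterNodes: the ES cluster nodes (comma-separated host:port)
  - user: ES username
  - password: ES password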
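Every value above has the form ${NAME:default}: if the environment variable NAME is set it wins, otherwise the default after the first colon is used. The following is a minimal sketch of that lookup rule; the Placeholders class is a hypothetical helper, not SkyWalking's own resolver, and it is deliberately simplified (it does not handle defaults that themselves contain "}", such as the oapAnalyzer JSON, and does not strip the surrounding quotes).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mimicking how "${NAME:default}" in application.yml is resolved:
// use environment variable NAME if it is set, otherwise the default.
final class Placeholders {
    // NAME runs up to the first ':'; the default is everything after it up to '}'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^:}]+):([^}]*)\\}");

    static String resolve(String value, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fromEnv = env.get(m.group(1));
            String replacement = fromEnv != null ? fromEnv : m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String nodes = "${SW_STORAGE_ES_CLUSTER_NODES:192.168.1.13:9200,192.168.1.14:9200}";
        System.out.println(resolve(nodes, System.getenv()));
    }
}
```

In practice this means clusterNodes, user and password can also be overridden by exporting, for example, SW_STORAGE_ES_CLUSTER_NODES before running oapService.sh, instead of editing the file.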
2.4 Start the OAP service
/opt/server/skywalking/bin/oapService.sh
# Follow the log
tail -100f /opt/server/skywalking/logs/skywalking-oap-server.log
The first start takes a while because the environment has to be initialized.
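Because the first start is slow, scripts that depend on the OAP (or a quick manual check) may want to wait until it actually listens. The following is a minimal sketch that polls a TCP port until it accepts connections; the port 11800 in main is an assumption (SkyWalking's default gRPC port), so substitute whatever your OAP is configured to use. An open port only shows that the server is listening, so the log above remains the place to confirm a clean start.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Polls a TCP port until it accepts connections or the timeout expires.
final class PortWaiter {
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1 s connect timeout
                return true; // something is listening on host:port
            } catch (IOException notYet) {
                try {
                    Thread.sleep(1000); // back off before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed: OAP gRPC port 11800 (the SkyWalking default); wait up to 5 minutes.
        System.out.println(waitForPort("127.0.0.1", 11800, 300_000) ? "OAP is up" : "timed out");
    }
}
```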
2.5 Start the UI service
/opt/server/skywalking/bin/webappService.sh
To change the UI settings, edit webapp/webapp.yml.
2.6 Open the console
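3. Integrate services
SkyWalking is now up and running, so let's integrate it into our services.
3.1 Preparation
We'll assume you already know that services report data through the SkyWalking agent (if not, see the documentation linked at the top). The agent files are in the /opt/server/skywalking/agent directory.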
3.2 Modify the service startup scripts
- Java startup script
export SW_AGENT_NAME=demo
export SW_AGENT_SPAN_LIMIT=2000
export SW_AGENT_COLLECTOR_BACKEND_SERVICES=122.9.35.11:21800
JAVA_AGENT="-javaagent:/opt/server/skywalking/agent/skywalking-agent.jar"
# -javaagent is a JVM option, so it must come before -jar
java ${JAVA_AGENT} -jar demo.jar
- Docker
FROM openjdk:8-jdk-alpine3.8
ENV SW_AGENT_NAME=demo \
    SW_AGENT_SPAN_LIMIT=2000 \
    SW_AGENT_COLLECTOR_BACKEND_SERVICES=122.9.35.11:21800 \
    JAVA_AGENT=-javaagent:/app/agent/skywalking-agent.jar
# /app/agent is provided by the volume mount below; /app/app.jar must be added to the image by your build
ENTRYPOINT ["sh","-c","java ${JAVA_AGENT} -jar /app/app.jar"]
- In docker-compose, mount the agent directory as a volume:
volumes:
  - /opt/server/skywalking/agent:/app/agent
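SW_AGENT_NAME: service name
SW_AGENT_SPAN_LIMIT: maximum number of spans recorded per trace segment
SW_AGENT_COLLECTOR_BACKEND_SERVICES: address (host:port) of skywalking-oap
These variables all correspond to settings in agent/config/agent.config.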
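A common mistake is putting -javaagent after -jar, where it is no longer a JVM option. The following is a minimal sketch for checking, from inside the service, that the agent really was passed to the JVM; it relies on RuntimeMXBean.getInputArguments(), which lists only JVM options (not program arguments). The AgentCheck class is hypothetical, and matching on the file name skywalking-agent.jar is an assumption based on the paths used above.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Checks whether this JVM was started with the SkyWalking agent.
final class AgentCheck {
    static boolean hasSkywalkingAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-javaagent:") && arg.endsWith("skywalking-agent.jar")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // getInputArguments() only contains JVM options, i.e. what precedes -jar.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("SW_AGENT_NAME=" + System.getenv("SW_AGENT_NAME"));
        System.out.println("SkyWalking agent attached: " + hasSkywalkingAgent(jvmArgs));
    }
}
```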
4. Test
I have already written an endpoint: /oauth/login
A request to it passes through apiserver (gateway) -> auth (authentication center) -> user (user service) -> mysql | redis
Send a request:
curl http://localhost:9001/oauth/login
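The trace stays connected across apiserver, auth and user because the agent adds a propagation header (sw8 in SkyWalking's cross-process propagation protocol v3) to each outgoing call. You never write this yourself; the following is only a minimal decoding sketch, to show which context travels between services. The Sw8Header class is hypothetical; the field order (sample flag, trace id, parent segment id, parent span id, parent service, parent instance, parent endpoint, target address, all Base64 except the flag and span id) follows the protocol description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Decodes an "sw8" header: 8 '-'-separated fields; string fields are Base64-encoded.
final class Sw8Header {
    final boolean sampled;
    final String traceId, parentSegmentId, parentService, parentInstance, parentEndpoint, peer;
    final int parentSpanId;

    private Sw8Header(String[] f) {
        sampled = "1".equals(f[0]);
        traceId = decode(f[1]);
        parentSegmentId = decode(f[2]);
        parentSpanId = Integer.parseInt(f[3]);
        parentService = decode(f[4]);
        parentInstance = decode(f[5]);
        parentEndpoint = decode(f[6]);
        peer = decode(f[7]); // address the caller used to reach this service
    }

    static Sw8Header parse(String header) {
        // '-' is not in the standard Base64 alphabet, so splitting on it is safe.
        String[] fields = header.split("-");
        if (fields.length != 8) {
            throw new IllegalArgumentException("expected 8 fields, got " + fields.length);
        }
        return new Sw8Header(fields);
    }

    private static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }
}
```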
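View the UI
View the topology map
View the trace
5. Performance profiling
There is a Profile tab; how do we use it?
The endpoint name can be found in the trace view.
Click Analyze and the thread stacks appear, together with the time spent in each method.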
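The idea behind this view is stack sampling: dump a thread's stack at a fixed interval while the request runs, and methods that appear in many dumps are where the time goes. The following is a toy sketch of that technique using Thread.getStackTrace(); it is not SkyWalking's profiler, which additionally limits sampling to the chosen endpoint and its configured duration and thresholds.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stack sampler: grabs a thread's stack every intervalMillis and counts
// how many times each "class.method" frame appears across all samples.
final class StackSampler {
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            for (StackTraceElement frame : target.getStackTrace()) {
                String key = frame.getClassName() + "." + frame.getMethodName();
                counts.merge(key, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // return what has been collected so far
            }
        }
        return counts;
    }
}
```

Dividing a frame's count by the number of samples and multiplying by the interval gives a rough estimate of the time spent in that method, which is the kind of per-method duration the Profile tab reports.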
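6. Alerting
Default alarm rules
For convenience, the SkyWalking distribution ships a default alarm-settings.yml file containing the following rules:
1. Average service response time over 1 s in the last 3 minutes.
2. Service success rate below 80% in the last 2 minutes.
3. Percentile of service response time over 1000 ms in the last 3 minutes.
4. Average service instance response time over 1 s in the last 2 minutes.
5. Average endpoint response time over 1 s in the last 2 minutes.
6. Average database access response time over 1 s in the last 2 minutes.
7. Average endpoint relation (endpoint-to-endpoint) response time over 1 s in the last 2 minutes.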
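The descriptions above are simplified. In alarm-settings.yml a rule is expressed with fields such as metrics-name, op, threshold, period and count: within the last period minutes, if the metric crosses the threshold at least count times, the alarm fires. The following is a minimal model of that evaluation, assuming one value per minute and the ">" operator only; the AlarmRule class is hypothetical and ignores silence-period and the other rule options.

```java
// Simplified model of one alarm rule: within the last `period` per-minute values,
// fire when at least `count` of them exceed `threshold`.
final class AlarmRule {
    final double threshold;
    final int period;
    final int count;

    AlarmRule(double threshold, int period, int count) {
        this.threshold = threshold;
        this.period = period;
        this.count = count;
    }

    // perMinuteValues: oldest first, newest last (e.g. average response time in ms).
    boolean shouldFire(double[] perMinuteValues) {
        int start = Math.max(0, perMinuteValues.length - period);
        int breaches = 0;
        for (int i = start; i < perMinuteValues.length; i++) {
            if (perMinuteValues[i] > threshold) {
                breaches++;
            }
        }
        return breaches >= count;
    }
}
```

For example, with threshold 1000, period 10 and count 3, three slow minutes anywhere in the last ten trigger the alarm; they do not have to be consecutive.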
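Customized alarms (your own rules and notifications) have to be implemented yourself; see the official documentation for details.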
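One common way to act on alarms is a webhook: URLs listed under webhooks in alarm-settings.yml receive an HTTP POST with a JSON array of alarm messages (fields such as ruleName, alarmMessage and startTime). The following is a minimal receiver sketch using the JDK's built-in com.sun.net.httpserver; the /notify path and the port you choose are assumptions, and it only stores and prints the raw JSON rather than parsing it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal alarm webhook receiver: records each POSTed JSON body sent to /notify.
final class AlarmWebhook {
    final List<String> received = new CopyOnWriteArrayList<>();
    final HttpServer server;

    AlarmWebhook(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/notify", exchange -> {
            String body = readAll(exchange.getRequestBody());
            received.add(body);
            System.out.println("alarm: " + body); // forward to mail/IM here
            exchange.sendResponseHeaders(200, -1); // 200 with an empty body
            exchange.close();
        });
        server.start();
    }

    int port() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    private static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Started with, say, new AlarmWebhook(8090), it would be registered in alarm-settings.yml as a webhooks entry like http://<receiver-host>:8090/notify/ (host and port being your own choice).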
7. Integrate with ELK
Notice that every trace has a trace id. The trace id covers the whole call chain, so with it we can find the entire chain of calls. If we write this trace id into the logs and then ship them to ELK... hehe~
- Add the dependency
<!-- skywalking -->
<dependency>
    <groupId>org.apache.skywalking</groupId>
    <artifactId>apm-toolkit-logback-1.x</artifactId>
    <version>8.4.0</version>
</dependency>
- Modify logback-spring.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration scan="true" scanPeriod="60 seconds" debug="false">
    <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
    <springProperty name="applicationName" scope="context" source="spring.application.name" />
    <property name="LOG_FILE_NAME_PATTERN" value="logs/${applicationName}/log.out"/>
    <!-- Log format -->
    <property name="CONSOLE_LOG_PATTERN" value="%clr(%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}}){faint} %clr(${LOG_LEVEL_PATTERN:-%5p}) %clr(${PID:- }){magenta} %clr(---){faint} %clr([%15.15t]){faint} %clr(%c){cyan} %clr(:){faint} %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}"/>
    <property name="FILE_LOG_PATTERN" value="%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}} ${applicationName} [%tid] ${LOG_LEVEL_PATTERN:-%5p} ${PID:- } --- [%t] %c : %m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}"/>
    <!-- Console output -->
    <appender name="console" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>${CONSOLE_LOG_PATTERN}</pattern>
        </encoder>
    </appender>
    <!-- File output -->
    <appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${LOG_FILE_NAME_PATTERN}</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>${LOG_FILE_NAME_PATTERN}.%d{yyyy-MM-dd}.%i.gz</fileNamePattern>
            <!-- Number of days to keep logs -->
            <maxHistory>7</maxHistory>
            <!-- Maximum size of each log file -->
            <maxFileSize>10MB</maxFileSize>
        </rollingPolicy>
        <encoder class="ch.qos.logback.core.encoder.LayoutWrappingEncoder">
            <layout class="org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout">
                <pattern>${FILE_LOG_PATTERN}</pattern>
            </layout>
        </encoder>
    </appender>
    <!-- Per-environment log levels and appenders -->
    <springProfile name="local">
        <root level="info">
            <appender-ref ref="console"/>
        </root>
    </springProfile>
    <springProfile name="dev">
        <root level="info">
            <appender-ref ref="file"/>
        </root>
    </springProfile>
    <springProfile name="staging">
        <root level="info">
            <appender-ref ref="file"/>
        </root>
    </springProfile>
    <springProfile name="online">
        <root level="info">
            <appender-ref ref="console"/>
            <appender-ref ref="file"/>
        </root>
    </springProfile>
</configuration>
Main changes:
- Add [%tid] to FILE_LOG_PATTERN
- Set the class of the layout inside the encoder to org.apache.skywalking.apm.toolkit.log.logback.v1.x.TraceIdPatternLogbackLayout (required!)
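With [%tid] in the pattern, each file log line carries the trace id, which is what lets you jump between a SkyWalking trace and its log lines in ELK. The following is a minimal sketch of pulling the id back out of a line, for example in a small tool that links log lines to the trace view. It assumes %tid renders as "TID:<id>", or "TID: N/A" when no agent is attached; check this against a real line from your own logs. The TraceIdExtractor class is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the trace id from a line written with FILE_LOG_PATTERN, where [%tid]
// is assumed to render as "[TID:<id>]" or "[TID: N/A]" (no agent / no active trace).
final class TraceIdExtractor {
    private static final Pattern TID = Pattern.compile("\\[TID:\\s*([^\\]]+)\\]");

    static Optional<String> traceId(String logLine) {
        Matcher m = TID.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        String id = m.group(1).trim();
        return "N/A".equals(id) ? Optional.empty() : Optional.of(id);
    }
}
```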
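For how to view the logs in ELK, see "ELK Best Practices: Java project summary".
Finally, for more, refer to the official documentation; it is the best and fastest way to learn.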