Input data
{"problem_id": "001", "time_range": "2025-08-28 15:08:03 ~ 2025-08-28 15:13:03", "candidate_root_causes": ["ad.Failure", "ad.LargeGc", "ad.memory", "ad.cpu","ad.networkLatency", "cart.Failure", "cart.cpu", "checkout.cpu", "checkout.Failure", "image-provider.cpu", "image-provider.memory", "image-provider.networkLatency", "inventory.Failure", "inventory.cpu", "inventory.memory", "inventory.networkLatency", "load-generator.cpu", "load-generator.FloodHomepage", "payment.Failure", "payment.Unreachable", "payment.cpu", "payment.memory", "payment.networkLatency", "product-catalog.Failure", "product-catalog.cpu", "product-catalog.memory", "product-catalog.networkLatency", "recommendation.CacheFailure", "recommendation.Failure", "recommendation.cpu", "recommendation.memory", "recommendation.networkLatency", "system.NodeKiller"], "alarm_rules": ["overall_error_count"]}
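The input record fixes both the fault window and the set of admissible answers. The minimal Python sketch below shows one way to read it. It assumes only the field names shown above. It parses `time_range` into a start/end pair and groups `candidate_root_causes` by service, splitting each entry as `<service>.<fault>`. The helper `parse_problem` is hypothetical, and the candidate list is abbreviated.

```python
import json
from collections import defaultdict
from datetime import datetime

# Abbreviated copy of the problem record above (only a few candidates kept).
RAW = '''{"problem_id": "001",
 "time_range": "2025-08-28 15:08:03 ~ 2025-08-28 15:13:03",
 "candidate_root_causes": ["ad.Failure", "ad.cpu", "payment.Failure",
                           "payment.Unreachable", "system.NodeKiller"],
 "alarm_rules": ["overall_error_count"]}'''

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_problem(raw: str):
    """Hypothetical helper: return the record, its (start, end) window,
    and the candidate fault types grouped by service."""
    problem = json.loads(raw)
    start_s, end_s = (part.strip() for part in problem["time_range"].split("~"))
    window = (datetime.strptime(start_s, TS_FORMAT),
              datetime.strptime(end_s, TS_FORMAT))
    by_service = defaultdict(list)
    for cause in problem["candidate_root_causes"]:
        # Each candidate looks like "<service>.<fault>"; split on the first dot only.
        service, fault = cause.split(".", 1)
        by_service[service].append(fault)
    return problem, window, dict(by_service)

if __name__ == "__main__":
    problem, (start, end), by_service = parse_problem(RAW)
    print(end - start)            # 0:05:00
    print(by_service["payment"])  # ['Failure', 'Unreachable']
```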
Diagnosis in the console
View the error data
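- Open the CloudMonitor 2.0 trace analysis page from one of the links below:
- Leaderboard A uses the CloudMonitor 2.0 workspace at: https://sls.aliyun.com/doc/playground/tianchi2025.html
- Leaderboard B uses the CloudMonitor 2.0 workspace at: https://sls.aliyun.com/doc/playground/tianchi2025b.html
- In the left sidebar, click Application Monitoring, then click Trace Analysis in the top bar;
- Copy the fault window 2025-08-28 15:08:03 ~ 2025-08-28 15:13:03 verbatim, paste it into the time input box at the top right of the page, and press Enter to confirm;
- In the Quick Filter panel on the left side of the page, select the Error status to list every failed call within the fault window;
- Open the Trace List and click the Details button in the Actions column to view the trace information.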
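Inspect the topology view
In the top bar, select Topology View to see the call relationships between services. You can zoom with the mouse wheel and drag each node freely. Examining several call chains shows one common factor: the payment service is failing.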
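The reasoning in this step is to look for the service that every error trace has in common. It can be sketched in Python under one simplifying assumption: errors propagate upstream, so the most downstream failing service in each trace is that trace's suspect. The trace records and the helpers `deepest_failing_services` and `common_suspects` are hypothetical illustrations. They are not an export format or API of the console.

```python
from functools import reduce

# Hypothetical, simplified call records: each trace is a list of
# (caller, callee, status) edges. Illustrative only, not real console data.
ERROR_TRACES = {
    "trace-1": [("frontend", "checkout", "ERROR"), ("checkout", "payment", "ERROR")],
    "trace-2": [("frontend", "checkout", "ERROR"), ("checkout", "cart", "OK"),
                ("checkout", "payment", "ERROR")],
    "trace-3": [("load-generator", "frontend", "ERROR"),
                ("frontend", "checkout", "ERROR"), ("checkout", "payment", "ERROR")],
}

def deepest_failing_services(edges):
    """Services that receive a failing call but make no failing call themselves."""
    failing_callees = {callee for _, callee, status in edges if status == "ERROR"}
    failing_callers = {caller for caller, _, status in edges if status == "ERROR"}
    return failing_callees - failing_callers

def common_suspects(traces):
    """Intersect the per-trace suspects to get the service shared by every error trace."""
    per_trace = [deepest_failing_services(edges) for edges in traces.values()]
    return reduce(set.intersection, per_trace) if per_trace else set()

print(common_suspects(ERROR_TRACES))  # {'payment'}
```

This heuristic only narrows down the suspect service. It cannot tell apart fault types such as `payment.Failure` and `payment.Unreachable`, so the fault type still has to be settled in the later steps.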
Intelligent analysis
On the Trace details page, click the magic-wand button to the right of "Anomaly detected" to open Copilot and ask it questions.
Cross-check
Running the Copilot analysis on several Trace details pages detects the same anomaly in each one.
Conclusion
Combining the topology view, the Copilot analysis, and log verification pinpoints the root cause as a failure of the payment service:
{"problem_id": "001", "root_causes": ["payment.Failure"]}
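Before submitting, it may help to confirm that every reported cause appears in `candidate_root_causes`. The minimal sketch below does that check. It assumes the answer must use exactly the candidate strings from the input record. `build_answer` is a hypothetical helper, and the candidate set is abbreviated.

```python
import json

def build_answer(problem_id, root_causes, candidates):
    """Hypothetical helper: serialize an answer after checking that
    every cause is one of the listed candidates."""
    unknown = [c for c in root_causes if c not in candidates]
    if unknown:
        raise ValueError(f"not in candidate_root_causes: {unknown}")
    return json.dumps({"problem_id": problem_id, "root_causes": list(root_causes)})

# Abbreviated candidate set taken from the input record above.
CANDIDATES = {"payment.Failure", "payment.Unreachable", "payment.cpu", "checkout.Failure"}

print(build_answer("001", ["payment.Failure"], CANDIDATES))
# {"problem_id": "001", "root_causes": ["payment.Failure"]}
```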