# SLS Machine Learning Best Practices: Time Series Similarity Analysis

### 1. Use Cases

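• You have monitoring metrics for N machines and want to quickly see which broad shapes the machines' CPU curves take over a given time window, so that you can better understand the current state of the system;
• You pick the curve of one metric on one machine and want to know which other machines have curves of the same metric that are most similar to it;
• You supply a time series curve by hand (for example, the site-wide access latency curve) and want to know which service's access latency follows a closely matching pattern, which helps narrow down troubleshooting.

All of these scenarios reduce to two problems: time series clustering (by shape or by value) and time series similarity detection.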

### 2. Function Overview

The SLS platform provides two functions for this purpose. For details, see the documentation: https://help.aliyun.com/document_detail/93235.html

• ts_density_cluster
• ts_hierarchical_cluster

### 3. Hands-on Examples

#### 3.1 Data Exploration

• query-01

    * | select DISTINCT index_name, machine, region from log

• query-02

    * | select count(1) as num from (select DISTINCT index_name, machine, region from log)

• query-03

    * and index_name : load |
    select
      __time__,
      value,
      concat(region, '#', machine, '#', index_name) as ins
    from log
    order by __time__
    limit 10000

• query-04

    * and index_name : load |
    select
      date_trunc('minute', __time__) as time,
      region,
      avg(value) as value
    from log
    group by time, region
    order by time
    limit 1000
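
To make the two transformations in query-03 and query-04 concrete, here is a minimal Python sketch of the same logic on a few made-up rows. The sample data and the helper names `instance_key` and `minute_average_by_region` are assumptions for illustration only; they are not part of SLS.

```python
from collections import defaultdict

# Hypothetical sample rows: (unix_time, region, machine, index_name, value)
rows = [
    (1546300800, "r1", "m1", "load", 1.0),
    (1546300830, "r1", "m2", "load", 3.0),
    (1546300860, "r1", "m1", "load", 5.0),
    (1546300805, "r2", "m3", "load", 2.0),
]


def instance_key(region, machine, index_name):
    """Build the same key as concat(region, '#', machine, '#', index_name)."""
    return "#".join((region, machine, index_name))


def minute_average_by_region(rows):
    """Truncate each timestamp to the minute, then average value per (minute, region)."""
    buckets = defaultdict(list)
    for ts, region, _machine, _index_name, value in rows:
        minute = ts - ts % 60  # start of the minute, in seconds
        buckets[(minute, region)].append(value)
    # Sort by (minute, region) so the output is ordered by time
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


keys = sorted({instance_key(r, m, i) for _ts, r, m, i, _v in rows})
averages = minute_average_by_region(rows)
```

The instance key is what lets the later queries treat each (region, machine, metric) combination as one named curve.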

#### 3.2 Clustering in Practice

    * and index_name : load |
    select
      ts_hierarchical_cluster(time, value, ins)
    from
      (
        select
          __time__ as time,
          value,
          concat(region, '#', machine, '#', index_name) as ins
        from
          log
      )
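
For intuition about what clustering curves "by shape" means, the following Python sketch groups a few made-up series with a simple agglomerative (bottom-up) procedure. It is only an illustration under stated assumptions: z-normalization, average linkage, and a fixed target cluster count are choices made here, and the internal algorithm of `ts_hierarchical_cluster` is not described in this article and may differ.

```python
import math


def z_normalize(series):
    """Remove level and scale so that only the shape of the curve remains."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hierarchical_cluster(series_by_name, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    shapes = {name: z_normalize(s) for name, s in series_by_name.items()}
    clusters = [[name] for name in sorted(shapes)]

    def linkage(c1, c2):
        dists = [euclidean(shapes[a], shapes[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical curves keyed by region#machine#index_name
series = {
    "r1#m1#load": [1, 2, 3, 4, 5],
    "r1#m2#load": [10, 20, 30, 40, 50],  # same rising shape, larger values
    "r2#m3#load": [5, 4, 3, 2, 1],
    "r2#m4#load": [9, 7, 5, 3, 1],       # same falling shape, larger values
}
groups = hierarchical_cluster(series, n_clusters=2)
```

Because the series are z-normalized first, the two rising curves end up together even though their values differ by a factor of ten. Skipping the normalization step would cluster by value instead.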

#### 3.3 Similarity Search

    * and index_name : load |
    select
      cast(cast(ts_value as double) as bigint) as ts_value,
      cast(ds_value as double) as ds_value,
      name
    from
      (
        select
          tt[1][1] as name,
          tt[2] as ts,
          tt[3] as ds
        from
          (
            select
              ts_similar_instance(
                time, value, ins, 'aysls-pub-cn-beijing-k8s#192.168.7.254:9100#load',
                10,
                'euclidean'
              ) as res
            from
              (
                select
                  __time__ as time,
                  value,
                  concat(region, '#', machine, '#', index_name) as ins
                from
                  log
              )
          ),
          unnest(res) as t(tt)
      ),
      unnest(ts) as t(ts_value),
      unnest(ds) as t(ds_value)
    order by
      ts_value
    limit
      10000
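
The query above asks for the 10 curves closest to a named instance under the `'euclidean'` metric. As a rough illustration of that idea, here is a minimal Python sketch that ranks made-up curves by Euclidean distance to a target. The helper `similar_instances`, the sample data, and the decision to exclude the target from its own results are assumptions for this sketch; they do not describe how `ts_similar_instance` is implemented or what exactly it returns.

```python
import math


def euclidean(a, b):
    """Point-by-point Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_instances(series_by_name, target_name, top_k):
    """Return up to top_k (name, distance) pairs, closest curve first."""
    target = series_by_name[target_name]
    scored = [
        (euclidean(target, values), name)
        for name, values in series_by_name.items()
        if name != target_name  # assumption: the target is not its own match
    ]
    scored.sort()
    return [(name, dist) for dist, name in scored[:top_k]]


# Hypothetical curves keyed by region#machine#index_name
series = {
    "a#1#load": [1, 2, 3],
    "a#2#load": [1, 2, 4],
    "b#1#load": [10, 10, 10],
    "b#2#load": [1, 2, 3.5],
}
matches = similar_instances(series, "a#1#load", top_k=2)
```

Euclidean distance compares raw values, so two curves with the same shape but different levels can still rank as far apart. Normalizing the series first would turn this into a shape-based comparison.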
