A Brief Introduction to Local Outlier Factor and KL-Divergence Anomaly Detection Algorithms


Local Outlier Factor

Using local outlier factors, we can detect outliers that lie far away from most of the samples. Before outlining the algorithm, a few concepts must come first:
Reachability Distance

RD_k(x, x') = \max(\|x - x^{(k)}\|, \|x - x'\|)

where x^{(k)} stands for the k-th nearest point to x in the training set {x_i}_{i=1}^n. Note that k is selected manually.
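
As a tiny sketch (with hypothetical variable names, separate from the full example further below), the whole RD matrix can be computed from sorted pairwise distances:

X=randn(5,2); k=2; N=size(X,1);                   % hypothetical toy data
X2=sum(X.^2,2);
D=sqrt(repmat(X2,1,N)+repmat(X2',N,1)-2*(X*X'));  % pairwise Euclidean distances
Ds=sort(D,2);                  % each row ascending; column 1 is the self-distance 0
kdist=Ds(:,k+1);               % ||x - x^(k)||: distance to the k-th nearest neighbor
RD=max(repmat(kdist,1,N),D);   % RD(i,j) = RD_k(x_i, x_j) = max(kdist(i), d(i,j))
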
Local Reachability Density
LRD_k(x) = \left( \frac{1}{k} \sum_{i=1}^{k} RD_k(x^{(i)}, x) \right)^{-1}

Local Outlier Factor
LOF_k(x) = \frac{\frac{1}{k} \sum_{i=1}^{k} LRD_k(x^{(i)})}{LRD_k(x)}

Evidently, as the LOF of x grows, so does the probability that x is an outlier. In theory it is a simple algorithm with an intuitive principle; however, when n is very large, it requires a tremendous amount of computation.
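
Continuing the toy sketch from the Reachability Distance section (same hypothetical variables), LRD and LOF follow directly from the RD matrix:

[~,idx]=sort(D,2);            % neighbor indices; idx(:,2:k+1) are the k nearest
nn=idx(:,2:k+1);
LRD=zeros(N,1);
for i=1:N
    LRD(i)=1/mean(RD(nn(i,:),i));   % inverse mean RD_k(x^(i), x) over the neighbors
end
LOF=mean(LRD(nn),2)./LRD;           % neighbors' mean LRD over the point's own LRD
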

Here is a simple example

% Data: 50 points uniform on [-10,10]^2, 50 standard Gaussian points,
% and the last point moved to x=14 so that it becomes an outlier.
n=100; x=[(rand(n/2,2)-0.5)*20; randn(n/2,2)]; x(n,1)=14;
k=3; x2=sum(x.^2,2);
% Pairwise distances: each row of s is sorted ascending, and t holds the
% corresponding neighbor indices (column 1 is the point itself).
[s, t]=sort(sqrt(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'), 2);

% Column i of LRD holds the LRD of each point's (i-1)-th neighbor (i=1: the point itself).
for i=1:k+1
    for j=1:k
        % Reachability distances RD_k involving each point's (i-1)-th neighbor
        % and that neighbor's j-th nearest neighbor.
        RD(:,j)=max(s(t(t(:,i),j+1),k), s(t(:,i),j+1));
    end
    LRD(:,i)=1./mean(RD,2);
end
% LOF: mean LRD of the k nearest neighbors divided by the point's own LRD.
LOF=mean(LRD(:,2:k+1),2)./LRD(:,1);

figure(1); clf; hold on
plot(x(:,1),x(:,2),'rx');
% Circle size is proportional to the LOF score.
for i=1:n
    plot(x(i,1),x(i,2),'bo', 'MarkerSize', LOF(i)*10);
end

A larger circle indicates a higher LOF score, i.e. a more anomalous point.

KL Divergence

In unsupervised learning problems, there is usually little information about the outliers. However, when a set of known normal samples {x_i}_{i=1}^n is given, we can identify the outliers in the test set {x'_i}_{i=1}^{n'} with some degree of confidence.
Kullback-Leibler (KL) divergence, also known as relative entropy, is a powerful tool for estimating the probability density ratio of normal samples to test samples,

w(x) = \frac{p(x)}{p'(x)}

where p(x) is the probability density of the normal samples and p'(x) is that of the test samples; the ratio is estimated directly, without computing the two densities separately. The ratio for a normal sample approaches 1, while that of an outlier is far from 1.
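
For intuition, here is a small hypothetical illustration (not from the original text): if the normal samples follow a standard Gaussian and the test distribution is the same Gaussian with 1% contamination at 5, the ratio stays near 1 for inliers but collapses toward 0 at the outlier location.

g  = @(z,mu,s) exp(-(z-mu).^2./(2*s.^2))./sqrt(2*pi*s.^2);  % Gaussian pdf
p  = @(z) g(z,0,1);                       % density of the normal samples (assumed)
pp = @(z) 0.99*g(z,0,1)+0.01*g(z,5,0.1);  % test density with 1% contamination at 5
w  = @(z) p(z)./pp(z);                    % density ratio w(x) = p(x)/p'(x)
disp([w(0) w(5)])                         % approximately 1.01 and 3.7e-5
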
To begin with, we model the density ratio with a parameterized linear model:
w_\alpha(x) = \sum_{j=1}^{b} \alpha_j \psi_j(x) = \alpha^T \psi(x)

where \alpha = (\alpha_1, \ldots, \alpha_b)^T is the parameter vector and \psi(x) = (\psi_1, \ldots, \psi_b)^T is a vector of non-negative basis functions. Then w_\alpha(x) p'(x) can be seen as an estimate of p(x). Define the similarity between w_\alpha(x) p'(x) and p(x) as the KL distance, i.e.
KL(p \,\|\, w_\alpha p') = \int p(x) \log \frac{p(x)}{w_\alpha(x) p'(x)} \, dx

In general, the KL distance is non-negative and equals zero only if w_\alpha p' = p. When the KL distance is sufficiently small, w_\alpha p' can be regarded as close to p. To guarantee that w_\alpha p' is a well-defined density, we impose the following constraints:
\int w_\alpha(x) p'(x) \, dx = 1, \quad \forall x, \ w_\alpha(x) p'(x) \ge 0

Then, approximating the expectations by sample averages, we can transform the estimation above into the following optimization problem:
\max_\alpha \frac{1}{n} \sum_{i=1}^{n} \log w_\alpha(x_i) \quad \mathrm{s.t.} \quad \frac{1}{n'} \sum_{i=1}^{n'} w_\alpha(x'_i) = 1, \quad \alpha_1, \ldots, \alpha_b \ge 0

We briefly summarize the estimation process:
  1. Initialize \alpha.
  2. Repeat the following until \alpha reaches a suitable precision:
    1. \alpha \leftarrow \alpha + \epsilon A^T (1 ./ (A\alpha))
    2. \alpha \leftarrow \alpha + (1 - b^T \alpha) b / (b^T b)
    3. \alpha \leftarrow \max(0, \alpha)
    4. \alpha \leftarrow \alpha / (b^T \alpha)

where A is the matrix whose (i,j)-th element is \psi_j(x_i), b is the vector whose j-th element is \frac{1}{n'} \sum_{i=1}^{n'} \psi_j(x'_i), \epsilon is a small step size (0.01 in the code below), and 1 ./ v denotes the element-wise reciprocal.

Here is an example (Gaussian kernel model):

function [ a ] = KLIEP( k, r )
% KLIEP: estimate the density-ratio parameters alpha.
%   k: basis functions evaluated at the normal samples (the matrix A)
%   r: basis functions evaluated at the test samples (defines b)
a0=rand(size(k,2),1); b=mean(r)'; c=sum(b.^2);
for o=1:1000
    a=a0+0.01*k'*(1./(k*a0));        % gradient step: alpha + eps*A'*(1./(A*alpha))
    a=a+b*(1-sum(b.*a))/c;           % project onto the constraint b'*alpha = 1
    a=max(0,a);                      % enforce non-negativity
    a=a/sum(b.*a);                   % renormalize so that b'*alpha = 1
    if norm(a-a0)<0.001, break, end  % stop at a suitable precision
    a0=a;
end

end
% Normal samples x ~ N(0,1); test samples y contain one outlier at 5.
n=100; x=randn(n,1); y=randn(n,1); y(n)=5;
hhs=2*[1,5,10].^2; m=5;              % candidate Gaussian widths; m-fold cross-validation
x2=x.^2; xx=repmat(x2,1,n)+repmat(x2',n,1)-2*(x*x');
y2=y.^2; yx=repmat(y2,1,n)+repmat(x2',n,1)-2*y*x';
u=floor(m*(0:n-1)/n)+1;              % fold labels 1..m
u=u(randperm(n));                    % shuffled fold assignment

for hk=1:length(hhs)
    hh=hhs(hk); k=exp(-xx/hh); r=exp(-yx/hh);
    for i=1:m
        % Train on all folds except i, score on the held-out fold i.
        g(hk,i)=mean(k(u==i,:)*KLIEP(k(u~=i,:),r));
    end
end
[gh,ggh]=max(mean(g,2));
HH=hhs(ggh);                         % kernel width with the best average score
k=exp(-xx/HH); r=exp(-yx/HH); s=r*KLIEP(k,r);   % s(i) = w_alpha(y_i): anomaly scores

figure(1); clf; hold on; plot(y,s,'rx');

Anomaly score of each test sample (a score far from 1 indicates an abnormal point).

SVM

Furthermore, outlier detection can also be done with support vector machine techniques. For brevity, we only outline the main structure of the algorithm.
A typical SVM outlier detector finds a hypersphere that contains nearly all the sample points; a point lying outside the hypersphere can then be regarded as an outlier. Concretely, we obtain the center c and radius R by solving the following optimization problem:

\min_{c, R, \xi} \left( R^2 + C \sum_{i=1}^{n} \xi_i \right) \quad \mathrm{s.t.} \quad \|x_i - c\|^2 \le R^2 + \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, n

It can be solved using Lagrange multipliers:
L(c, R, \xi, \alpha, \beta) = R^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left( R^2 + \xi_i - \|x_i - c\|^2 \right) - \sum_{i=1}^{n} \beta_i \xi_i

Then its dual problem can be formulated as:
\max_{\alpha, \beta} \inf_{c, R, \xi} L(c, R, \xi, \alpha, \beta) \quad \mathrm{s.t.} \quad \alpha \ge 0, \ \beta \ge 0

The KKT conditions give:
\frac{\partial L}{\partial c} = 0 \;\Rightarrow\; c = \frac{\sum_{i=1}^{n} \alpha_i x_i}{\sum_{i=1}^{n} \alpha_i}
\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i = 1
\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \alpha_i + \beta_i = C, \quad i = 1, 2, \ldots, n

Substituting these conditions back into L eliminates c, R, and \xi, so the dual problem reduces to
\hat{\alpha} = \arg\max_\alpha \left( \sum_{i=1}^{n} \alpha_i x_i^T x_i - \sum_{i,j=1}^{n} \alpha_i \alpha_j x_i^T x_j \right) \quad \mathrm{s.t.} \quad \sum_{i=1}^{n} \alpha_i = 1, \ 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, n

This is a typical quadratic programming problem. After solving it, we can further recover c and R:
\hat{R}^2 = \left\| x_i - \sum_{j=1}^{n} \hat{\alpha}_j x_j \right\|^2, \quad \hat{c} = \sum_{i=1}^{n} \hat{\alpha}_i x_i

where x_i is any support vector with 0 < \hat{\alpha}_i < C; such points lie exactly on the boundary, \|x_i - \hat{c}\|^2 = \hat{R}^2.
Hence, when a sample point x x satisfies
\|x - \hat{c}\|^2 > \hat{R}^2

it can be viewed as an outlier.
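
Since this section has no accompanying code, here is a minimal MATLAB sketch of the dual, assuming a linear kernel, the Optimization Toolbox's quadprog, hypothetical data with one planted outlier, and a hand-picked C:

n=100; x=[randn(n-1,2); 6 6];          % hypothetical data; the last point is a planted outlier
C=0.1;                                 % hand-picked trade-off parameter (assumption)
K=x*x';                                % linear-kernel Gram matrix, K(i,j) = x_i'*x_j
H=2*K; f=-diag(K);                     % quadprog minimizes 1/2*a'*H*a + f'*a,
                                       % i.e. the negated dual objective above
a=quadprog(H,f,[],[],ones(1,n),1,zeros(n,1),C*ones(n,1));  % sum(a)=1, 0<=a_i<=C
c=(a'*x)';                             % center c = sum_i alpha_i x_i
sv=find(a>1e-6 & a<C-1e-6);            % support vectors with 0 < alpha_i < C
R2=mean(sum((x(sv,:)-repmat(c',numel(sv),1)).^2,2));       % squared radius from the SVs
out=find(sum((x-repmat(c',n,1)).^2,2)>R2)                  % indices with ||x-c||^2 > R^2

With these values, the planted point at (6, 6) should be flagged, since it lies far outside the bulk of the Gaussian cloud.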