A Brief Overview of Common Unsupervised Dimension Reduction Methods


Unsupervised Dimension Reduction

Data with high dimension is always difficult to handle. On the one hand, it requires tremendous computational resources; on the other hand, it is not as intuitive to work with as low-dimensional data. Therefore, dimension reduction is one of the key tricks for tackling it.

Linear Dimension Reduction

In order to reduce the dimension of the samples, i.e. to transform $\{x_i\}_{i=1}^{n}$ into $\{z_i\}_{i=1}^{n}$ with as little loss of information as possible, we can use a linear transformation:

$$z_i = T x_i$$

Before doing that, it is necessary to make sure that the training set $\{x_i\}_{i=1}^{n}$ has zero mean, i.e. to centralize it. If this is not the case, we shift the origin of the frame:
$$x_i \leftarrow x_i - \frac{1}{n} \sum_{i'=1}^{n} x_{i'}$$

Principal Component Analysis (PCA)

PCA, as you will see below, is the simplest linear dimension reduction method. Suppose that $z_i$ is the orthogonal projection of $x_i$. Then we require that $T T^{\top} = I_m$. In the same way as in least squares methods, we try to keep the loss of information as small as possible, i.e. we try to minimize:

$$\sum_{i=1}^{n} \left\| T^{\top} T x_i - x_i \right\|^2 = -\mathrm{tr}\left( T C T^{\top} \right) + \mathrm{tr}(C)$$

where $C$ is the covariance matrix of the training set:
$$C = \sum_{i=1}^{n} x_i x_i^{\top}$$
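To see why this identity holds, expand the squared norm and use the constraint $T T^{\top} = I_m$:

$$
\sum_{i=1}^{n} \left\| T^{\top} T x_i - x_i \right\|^2
= \sum_{i=1}^{n} \left( x_i^{\top} T^{\top} T T^{\top} T x_i - 2\, x_i^{\top} T^{\top} T x_i + x_i^{\top} x_i \right)
= -\sum_{i=1}^{n} x_i^{\top} T^{\top} T x_i + \mathrm{tr}(C),
$$

and $\sum_{i=1}^{n} x_i^{\top} T^{\top} T x_i = \mathrm{tr}\left( T \sum_{i=1}^{n} x_i x_i^{\top} T^{\top} \right) = \mathrm{tr}\left( T C T^{\top} \right)$.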

In summary, PCA is defined as
$$\max_{T \in \mathbb{R}^{m \times d}} \mathrm{tr}\left( T C T^{\top} \right) \quad \text{s.t.} \quad T T^{\top} = I_m$$

Consider the eigenvalues of $C$:
$$C \xi = \lambda \xi$$

Define the eigenvalues and the corresponding eigenvectors as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ and $\xi_1, \ldots, \xi_d$, respectively. Then we get:
$$T = \left( \xi_1, \ldots, \xi_m \right)^{\top}$$

Here is a simple example:

n=100;
%x=[2*randn(n,1) randn(n,1)];
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];  % generate two clusters
x=x-repmat(mean(x),[n,1]);                           % centralize the samples
[t,v]=eigs(x'*x,1);                                  % leading eigenvector of C = sum_i x_i*x_i'

figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                            % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                % principal direction found by PCA

[Figure: the samples (red crosses) and the principal direction found by PCA]
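Given the direction t obtained above, here is a minimal sketch (not part of the original example) of computing the one-dimensional representation $z_i = T x_i$ and its reconstruction $T^{\top} T x_i$ in the original space:

z=x*t;      % n-by-1 vector of projections z_i = T*x_i (here m = 1)
xr=z*t';    % n-by-2 matrix of reconstructions T'*T*x_i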

Locality Preserving Projections

In PCA, the cluster structure of the original training set may be changed, which is not the case in locality preserving projections (LPP), another linear dimension reduction method.
Define the similarity between $x_i$ and $x_{i'}$ as $W_{i,i'} \geq 0$. When they are very similar, $W_{i,i'}$ takes a large value, and vice versa. Since similarity is symmetric, we require $W_{i,i'} = W_{i',i}$. There are several common forms of similarity, such as the Gaussian similarity:

$$W_{i,i'} = \exp\left( -\frac{\left\| x_i - x_{i'} \right\|^2}{2 t^2} \right)$$

where $t > 0$ is a tunable parameter.
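As a concrete sketch, the Gaussian similarity matrix can be computed in MATLAB with an explicit bandwidth t (the value below is an assumption; the example code further down omits the $1/(2t^2)$ factor, which amounts to one particular choice of t):

t=1;                                            % assumed bandwidth
x2=sum(x.^2,2);                                 % squared norms of the samples
sqd=repmat(x2,1,n)+repmat(x2',n,1)-2*x*x';      % pairwise squared distances ||x_i - x_i'||^2
W=exp(-sqd/(2*t^2));                            % Gaussian similarity matrix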
In order to preserve the cluster structure, it is natural to require that similar $x_i$ are transformed into similar $z_i$. That is to say, we ought to minimize:
$$\frac{1}{2} \sum_{i,i'=1}^{n} W_{i,i'} \left\| T x_i - T x_{i'} \right\|^2$$

However, to avoid the trivial solution $T = 0$, we require
$$T X D X^{\top} T^{\top} = I_m$$

where $X = (x_1, \ldots, x_n) \in \mathbb{R}^{d \times n}$ and $D$ is a diagonal matrix:
$$D_{i,i'} = \begin{cases} \sum_{i''=1}^{n} W_{i,i''} & (i = i') \\ 0 & (i \neq i') \end{cases}$$

If we set $L = D - W$ (the graph Laplacian), then we can represent our optimization goal as
$$\min_{T \in \mathbb{R}^{m \times d}} \mathrm{tr}\left( T X L X^{\top} T^{\top} \right) \quad \text{s.t.} \quad T X D X^{\top} T^{\top} = I_m$$
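This reformulation relies on the following identity, which uses the symmetry of $W$ and the definitions of $D$ and $L$:

$$
\frac{1}{2} \sum_{i,i'=1}^{n} W_{i,i'} \left\| T x_i - T x_{i'} \right\|^2
= \sum_{i=1}^{n} D_{i,i}\, x_i^{\top} T^{\top} T x_i - \sum_{i,i'=1}^{n} W_{i,i'}\, x_i^{\top} T^{\top} T x_{i'}
= \mathrm{tr}\left( T X (D - W) X^{\top} T^{\top} \right) = \mathrm{tr}\left( T X L X^{\top} T^{\top} \right).
$$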

So how do we solve it? As in PCA, consider the (generalized) eigenvalue problem:
$$X L X^{\top} \xi = \lambda X D X^{\top} \xi$$

Then define the generalized eigenvalues and eigenvectors as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ and $\xi_1, \ldots, \xi_d$, respectively. Therefore
$$T = \left( \xi_d, \xi_{d-1}, \ldots, \xi_{d-m+1} \right)^{\top},$$
i.e. the generalized eigenvectors associated with the $m$ smallest eigenvalues.
n=100;
%x=[2*randn(n,1) randn(n,1)];
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];   % generate two clusters
x=x-repmat(mean(x),[n,1]);                            % centralize the samples
x2=sum(x.^2,2);
W=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'));      % Gaussian similarity matrix
D=diag(sum(W,2)); L=D-W;                              % degree matrix D and graph Laplacian L
z=x'*D*x;                                             % the matrix X*D*X' from the constraint
z=(z+z')/2;                                           % symmetrize to avoid numerical asymmetry
[t,v]=eigs(x'*L*x,z,1,'sm');                          % smallest generalized eigenvector of (X*L*X', X*D*X')

figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                             % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                 % projection direction found by LPP

[Figure: the samples (red crosses) and the projection direction found by LPP]

Kernelized PCA

Let us turn to methods of nonlinear dimension reduction. Due to limited space, we will not analyze them as deeply as the linear ones.
When it comes to nonlinearity, kernel functions are sure to be highlighted. Take the Gaussian kernel function for example:

$$K(x, x') = \exp\left( -\frac{\left\| x - x' \right\|^2}{2 h^2} \right)$$

Here we do not consider the eigenvalues of $C$ as we did in PCA, but instead the eigenvalue problem of the kernel matrix, $K\alpha = \lambda\alpha$, where the $(i, i')$-th element of $K$ is $K(x_i, x_{i'})$; hence $K \in \mathbb{R}^{n \times n}$. Note that the dimension of the kernel matrix $K$ depends only on the number of samples.
However, centralization is necessary:
$$K \leftarrow H K H$$

where
$$H = I_n - \mathbf{1}_{n \times n} / n$$

$\mathbf{1}_{n \times n}$ is the matrix whose elements are all one. The final outcome of kernelized PCA is:
$$\left( z_1, \ldots, z_n \right) = \left( \frac{1}{\sqrt{\lambda_1}} \alpha_1, \ldots, \frac{1}{\sqrt{\lambda_m}} \alpha_m \right)^{\top} H K H$$

where $\alpha_1, \ldots, \alpha_m$ are the eigenvectors corresponding to the $m$ largest eigenvalues of $HKH$.
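The text gives no code for this part, so here is a minimal MATLAB sketch of kernelized PCA following the formulas above; the bandwidth h, the target dimension m, and the reuse of n and x from the earlier examples are assumptions:

h=1; m=1;                                                 % assumed bandwidth and target dimension
x2=sum(x.^2,2);
K=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x')/(2*h^2));  % Gaussian kernel matrix, n-by-n
H=eye(n)-ones(n)/n;                                       % centering matrix H = I_n - 1_{nxn}/n
Kc=H*K*H; Kc=(Kc+Kc')/2;                                  % centered kernel matrix HKH, symmetrized
[A,V]=eigs(Kc,m);                                         % m largest eigenvalues/eigenvectors of HKH
z=(A*diag(1./sqrt(diag(V))))'*Kc;                         % m-by-n matrix of embedded samples (z_1,...,z_n)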