A Brief Overview of Common Unsupervised Dimension Reduction Methods


Unsupervised Dimension Reduction

Data with high dimension is always difficult to tackle. On the one hand, it requires tremendous computational resources; on the other hand, it is not as intuitive to interpret as low-dimensional data. Therefore, dimension reduction is one of the key tricks for tackling it.

Linear Dimension Reduction

In order to reduce the dimension of the samples, i.e. to transform $\{x_i\}_{i=1}^{n}$ into $\{z_i\}_{i=1}^{n}$ with as little loss of information as possible, we can use a linear transformation:

$z_i = T x_i$

Before doing that, it is necessary to make sure that the average of the training set $\{x_i\}_{i=1}^{n}$ is zero, i.e. to centralize the data. If this does not hold, we shift the frame:
$x_i \leftarrow x_i - \frac{1}{n} \sum_{i'=1}^{n} x_{i'}$

Principal Component Analysis, PCA

PCA, as you will see below, is the simplest linear dimension reduction method. Suppose that $z_i$ is an orthogonal projection of $x_i$; we then require that $T T^T = I_m$. In the same way as in least-squares methods, we try to lose as little information as possible, i.e. we minimize:

$\sum_{i=1}^{n} \left\| T^T T x_i - x_i \right\|^2 = -\operatorname{tr}(T C T^T) + \operatorname{tr}(C)$

where $C$ is the covariance matrix of the training set:
$C = \sum_{i=1}^{n} x_i x_i^T$
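
To see why this identity holds, expand the squared norm and use the constraint $T T^T = I_m$ (which gives $T^T T\,T^T T = T^T T$) together with $x_i^T T^T T x_i = \operatorname{tr}(T x_i x_i^T T^T)$:

$$\begin{aligned}
\sum_{i=1}^{n} \left\| T^T T x_i - x_i \right\|^2
&= \sum_{i=1}^{n} \left( x_i^T T^T T\,T^T T x_i - 2\, x_i^T T^T T x_i + x_i^T x_i \right) \\
&= \sum_{i=1}^{n} \left( x_i^T x_i - x_i^T T^T T x_i \right)
 = -\operatorname{tr}(T C T^T) + \operatorname{tr}(C).
\end{aligned}$$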

In summary, PCA is defined as
$\max_{T \in \mathbb{R}^{m \times d}} \operatorname{tr}(T C T^T) \quad \text{s.t.} \quad T T^T = I_m$

Consider the eigenvalue problem of $C$:
$C \xi = \lambda \xi$

Denote the eigenvalues and the corresponding eigenvectors by $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ and $\xi_1, \ldots, \xi_d$ respectively. Then we get:
$T = (\xi_1, \ldots, \xi_m)^T$

Here is a simple example:

n=100;                                                      % number of samples
%x=[2*randn(n,1) randn(n,1)];                               % alternative: a single Gaussian cloud
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];         % two horizontal clusters
x=x-repmat(mean(x),[n,1]);                                  % centralization: subtract the mean
[t,v]=eigs(x'*x,1);                                         % leading eigenvector of C = x'*x (rows of x are the samples)

figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                                   % samples as red crosses
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                       % principal direction found by PCA

[Figure: the 2-D samples (red crosses) with the projection direction found by PCA overlaid]
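
For completeness, the low-dimensional representation itself is just the projection of each sample onto the recovered direction. A minimal follow-up to the snippet above (the variable names z and xr are introduced here for illustration and are not part of the original example):

z=x*t;       % z_i = T x_i : 1-D coordinates along the principal direction
xr=z*t';     % T^T T x_i : reconstruction of each sample in the original 2-D space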

Locality Preserving Projections

In PCA, the cluster structure of the original training set may be destroyed, which is not the case with locality preserving projection, another linear dimension reduction method.
Define the similarity between $x_i$ and $x_{i'}$ as $W_{i,i'} \geq 0$: when the two samples are very similar, $W_{i,i'}$ takes a large value, and vice versa. Since similarity is symmetric, we require $W_{i,i'} = W_{i',i}$. There are several common choices of similarity, such as the Gaussian similarity:

$W_{i,i'} = \exp\left( -\frac{\|x_i - x_{i'}\|^2}{2 t^2} \right)$

where $t > 0$ is a tunable parameter.
To preserve the cluster structure, we assume that similar $x_i$ should be mapped to similar $z_i$. That is to say, we minimize:
$\frac{1}{2} \sum_{i,i'=1}^{n} W_{i,i'} \left\| T x_i - T x_{i'} \right\|^2$

However, to rule out the trivial solution $T = 0$, we impose the constraint
$T X D X^T T^T = I_m$

where $X = (x_1, \ldots, x_n) \in \mathbb{R}^{d \times n}$ and $D$ is the diagonal matrix
$D_{i,i'} = \begin{cases} \sum_{i''=1}^{n} W_{i,i''} & (i = i') \\ 0 & (i \neq i') \end{cases}$

If we set $L = D - W$ (the graph Laplacian), then we can write the optimization problem as
$\min_{T \in \mathbb{R}^{m \times d}} \operatorname{tr}(T X L X^T T^T) \quad \text{s.t.} \quad T X D X^T T^T = I_m$
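
The trace form of the objective follows from expanding the weighted sum and using the symmetry of $W$ together with the definition of $D$:

$$\begin{aligned}
\frac{1}{2} \sum_{i,i'=1}^{n} W_{i,i'} \left\| T x_i - T x_{i'} \right\|^2
&= \sum_{i=1}^{n} D_{i,i}\, x_i^T T^T T x_i - \sum_{i,i'=1}^{n} W_{i,i'}\, x_i^T T^T T x_{i'} \\
&= \operatorname{tr}(T X D X^T T^T) - \operatorname{tr}(T X W X^T T^T)
 = \operatorname{tr}(T X L X^T T^T).
\end{aligned}$$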

How can we solve it? As in PCA, consider the corresponding generalized eigenvalue problem:
$X L X^T \xi = \lambda X D X^T \xi$

Denote the generalized eigenvalues and eigenvectors by $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ and $\xi_1, \ldots, \xi_d$ respectively. Since we are minimizing, the solution takes the eigenvectors of the $m$ smallest eigenvalues:
$T = (\xi_d, \xi_{d-1}, \ldots, \xi_{d-m+1})^T$
n=100;                                                      % number of samples
%x=[2*randn(n,1) randn(n,1)];                               % alternative: a single Gaussian cloud
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];         % two horizontal clusters
x=x-repmat(mean(x),[n,1]);                                  % centralization: subtract the mean
x2=sum(x.^2,2);                                             % squared norms of the samples
W=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'));            % Gaussian similarity matrix (here 2*t^2 = 1)
D=diag(sum(W,2)); L=D-W;                                    % degree matrix D and graph Laplacian L
z=x'*D*x;                                                   % X*D*X' (rows of x are the samples)
z=(z+z')/2;                                                 % enforce symmetry against round-off
[t,v]=eigs(x'*L*x,z,1,'sm');                                % generalized eigenvector with the smallest eigenvalue

figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                                   % samples as red crosses
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                       % projection direction found by LPP

[Figure: the 2-D samples (red crosses) with the projection direction found by locality preserving projection overlaid]

Kernelized PCA

Let us now turn to nonlinear dimension reduction. For brevity, we will not analyze it as deeply as the linear case.
When it comes to nonlinearity, kernel functions are sure to be highlighted. Take the Gaussian kernel function as an example:

$K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2 h^2} \right)$

Here we do not consider the eigenvalues of $C$ as we did in PCA, but instead the eigenvalues of the kernel matrix, $K \alpha = \lambda \alpha$, where the $(i, i')$-th element of $K$ is $K(x_i, x_{i'})$; hence $K \in \mathbb{R}^{n \times n}$. Note that the size of the kernel matrix depends only on the number of samples.
However, centralization is still necessary:
$K \leftarrow H K H$

where
$H = I_n - \mathbf{1}_{n \times n} / n$

and $\mathbf{1}_{n \times n}$ is the matrix whose elements are all one. The final output of kernelized PCA is:
$(z_1, \ldots, z_n) = \left( \frac{1}{\sqrt{\lambda_1}} \alpha_1, \ldots, \frac{1}{\sqrt{\lambda_m}} \alpha_m \right)^T H K H$

where $\alpha_1, \ldots, \alpha_m$ are the eigenvectors corresponding to the $m$ largest eigenvalues of $H K H$.
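
To make the procedure concrete, here is a minimal MATLAB sketch following the formulas above. It reuses the toy data of the earlier examples; the target dimension m=2 and the bandwidth h=1 are arbitrary choices for illustration:

n=100; m=2; h=1;                                            % sample size; m and h are arbitrary choices here
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];         % same toy data as in the earlier examples

x2=sum(x.^2,2);
K=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x')/(2*h^2));    % Gaussian kernel matrix K
H=eye(n)-ones(n)/n;                                         % centering matrix H = I_n - 1_{n x n}/n
Kc=H*K*H;                                                   % centered kernel matrix H*K*H
Kc=(Kc+Kc')/2;                                              % enforce symmetry against round-off
[A,V]=eigs(Kc,m);                                           % m leading eigenvectors/eigenvalues of H*K*H
z=(A*diag(1./sqrt(diag(V))))'*Kc;                           % (z_1,...,z_n): m-by-n matrix of embedded samples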