A Brief Overview of Common Unsupervised Dimension Reduction Methods


Unsupervised Dimension Reduction

Data with high dimension is always difficult to handle. On the one hand, it requires tremendous computational resources; on the other hand, it is not as intuitive to work with as low-dimensional data. Therefore, dimension reduction is one of the key tricks for tackling it.

Linear Dimension Reduction

In order to reduce the dimension of the samples, i.e. to transform $\{x_i\}_{i=1}^{n}$ into $\{z_i\}_{i=1}^{n}$ with as little loss of information as possible, we can use a linear transformation:

$$z_i = T x_i$$

Before doing that, it is necessary to make sure that the training set $\{x_i\}_{i=1}^{n}$ has zero mean, i.e. to centralize it. If this is not the case, we shift the origin of the frame:
$$x_i \leftarrow x_i - \frac{1}{n} \sum_{i'=1}^{n} x_{i'}$$

Principal Component Analysis (PCA)

PCA, as you will see below, is the simplest linear dimension reduction method. Suppose that $z_i$ is the orthogonal projection of $x_i$. Then we require that $T T^{\top} = I_m$. In the same way as in least squares methods, we try to keep the loss of information as small as possible, i.e. we try to minimize:

$$\sum_{i=1}^{n} \left\| T^{\top} T x_i - x_i \right\|^2 = -\mathrm{tr}\left( T C T^{\top} \right) + \mathrm{tr}(C)$$

where $C$ is the covariance matrix of the training set:
$$C = \sum_{i=1}^{n} x_i x_i^{\top}$$
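To see why this identity holds, expand the squared norm and use the constraint $T T^{\top} = I_m$:

$$
\sum_{i=1}^{n} \left\| T^{\top} T x_i - x_i \right\|^2
= \sum_{i=1}^{n} \left( x_i^{\top} T^{\top} T T^{\top} T x_i - 2\, x_i^{\top} T^{\top} T x_i + x_i^{\top} x_i \right)
= -\sum_{i=1}^{n} x_i^{\top} T^{\top} T x_i + \mathrm{tr}(C),
$$

and $\sum_{i=1}^{n} x_i^{\top} T^{\top} T x_i = \mathrm{tr}\left( T \sum_{i=1}^{n} x_i x_i^{\top} T^{\top} \right) = \mathrm{tr}\left( T C T^{\top} \right)$.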

In summary, PCA is defined as
$$\max_{T \in \mathbb{R}^{m \times d}} \mathrm{tr}\left( T C T^{\top} \right) \quad \text{s.t.} \quad T T^{\top} = I_m$$

Consider the eigenvalues of $C$:
$$C \xi = \lambda \xi$$

Define the eigenvalues and the corresponding eigenvectors as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ and $\xi_1, \ldots, \xi_d$, respectively. Then we get:
$$T = \left( \xi_1, \ldots, \xi_m \right)^{\top}$$

Here is a simple example:

n=100;
%x=[2*randn(n,1) randn(n,1)];
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];  % generate two clusters
x=x-repmat(mean(x),[n,1]);                           % centralize the samples
[t,v]=eigs(x'*x,1);                                  % leading eigenvector of C = sum_i x_i*x_i'

figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                            % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                % principal direction found by PCA

[Figure: the samples (red crosses) and the principal direction found by PCA]
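Given the direction t obtained above, here is a minimal sketch (not part of the original example) of computing the one-dimensional representation $z_i = T x_i$ and its reconstruction $T^{\top} T x_i$ in the original space:

z=x*t;      % n-by-1 vector of projections z_i = T*x_i (here m = 1)
xr=z*t';    % n-by-2 matrix of reconstructions T'*T*x_i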

Locality Preserving Projections

In PCA, the cluster structure of the original training set may be changed, which is not the case in locality preserving projections (LPP), another linear dimension reduction method.
Define the similarity between $x_i$ and $x_{i'}$ as $W_{i,i'} \geq 0$. When they are very similar, $W_{i,i'}$ takes a large value, and vice versa. Since similarity is symmetric, we require $W_{i,i'} = W_{i',i}$. There are several common forms of similarity, such as the Gaussian similarity:

$$W_{i,i'} = \exp\left( -\frac{\left\| x_i - x_{i'} \right\|^2}{2 t^2} \right)$$

where $t > 0$ is a tunable parameter.
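As a concrete sketch, the Gaussian similarity matrix can be computed in MATLAB with an explicit bandwidth t (the value below is an assumption; the example code further down omits the $1/(2t^2)$ factor, which amounts to one particular choice of t):

t=1;                                            % assumed bandwidth
x2=sum(x.^2,2);                                 % squared norms of the samples
sqd=repmat(x2,1,n)+repmat(x2',n,1)-2*x*x';      % pairwise squared distances ||x_i - x_i'||^2
W=exp(-sqd/(2*t^2));                            % Gaussian similarity matrix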
In order to preserve the cluster structure, it is natural to require that similar $x_i$ are transformed into similar $z_i$. That is to say, we ought to minimize:
$$\frac{1}{2} \sum_{i,i'=1}^{n} W_{i,i'} \left\| T x_i - T x_{i'} \right\|^2$$

However, to avoid the trivial solution $T = 0$, we require
$$T X D X^{\top} T^{\top} = I_m$$

where $X = (x_1, \ldots, x_n) \in \mathbb{R}^{d \times n}$ and $D$ is a diagonal matrix:
$$D_{i,i'} = \begin{cases} \sum_{i''=1}^{n} W_{i,i''} & (i = i') \\ 0 & (i \neq i') \end{cases}$$

If we set $L = D - W$ (the graph Laplacian), then we can represent our optimization goal as
$$\min_{T \in \mathbb{R}^{m \times d}} \mathrm{tr}\left( T X L X^{\top} T^{\top} \right) \quad \text{s.t.} \quad T X D X^{\top} T^{\top} = I_m$$
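This reformulation relies on the following identity, which uses the symmetry of $W$ and the definitions of $D$ and $L$:

$$
\frac{1}{2} \sum_{i,i'=1}^{n} W_{i,i'} \left\| T x_i - T x_{i'} \right\|^2
= \sum_{i=1}^{n} D_{i,i}\, x_i^{\top} T^{\top} T x_i - \sum_{i,i'=1}^{n} W_{i,i'}\, x_i^{\top} T^{\top} T x_{i'}
= \mathrm{tr}\left( T X (D - W) X^{\top} T^{\top} \right) = \mathrm{tr}\left( T X L X^{\top} T^{\top} \right).
$$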

So how do we solve it? As in PCA, consider the (generalized) eigenvalue problem:
$$X L X^{\top} \xi = \lambda X D X^{\top} \xi$$

Then define the generalized eigenvalues and eigenvectors as $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$ and $\xi_1, \ldots, \xi_d$, respectively. Therefore
$$T = \left( \xi_d, \xi_{d-1}, \ldots, \xi_{d-m+1} \right)^{\top},$$
i.e. the generalized eigenvectors associated with the $m$ smallest eigenvalues.
n=100;
%x=[2*randn(n,1) randn(n,1)];
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];   % generate two clusters
x=x-repmat(mean(x),[n,1]);                            % centralize the samples
x2=sum(x.^2,2);
W=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'));      % Gaussian similarity matrix
D=diag(sum(W,2)); L=D-W;                              % degree matrix D and graph Laplacian L
z=x'*D*x;                                             % the matrix X*D*X' from the constraint
z=(z+z')/2;                                           % symmetrize to avoid numerical asymmetry
[t,v]=eigs(x'*L*x,z,1,'sm');                          % smallest generalized eigenvector of (X*L*X', X*D*X')

figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                             % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                 % projection direction found by LPP

[Figure: the samples (red crosses) and the projection direction found by LPP]

Kernelized PCA

Let us turn to methods of nonlinear dimension reduction. Due to limited space, we will not analyze them as deeply as the linear ones.
When it comes to nonlinearity, kernel functions are sure to be highlighted. Take the Gaussian kernel function for example:

$$K(x, x') = \exp\left( -\frac{\left\| x - x' \right\|^2}{2 h^2} \right)$$

Here we do not consider the eigenvalues of $C$ as we did in PCA, but instead the eigenvalue problem of the kernel matrix, $K\alpha = \lambda\alpha$, where the $(i, i')$-th element of $K$ is $K(x_i, x_{i'})$; hence $K \in \mathbb{R}^{n \times n}$. Note that the dimension of the kernel matrix $K$ depends only on the number of samples.
However, centralization is necessary:
$$K \leftarrow H K H$$

where
$$H = I_n - \mathbf{1}_{n \times n} / n$$

$\mathbf{1}_{n \times n}$ is the matrix whose elements are all one. The final outcome of kernelized PCA is:
$$\left( z_1, \ldots, z_n \right) = \left( \frac{1}{\sqrt{\lambda_1}} \alpha_1, \ldots, \frac{1}{\sqrt{\lambda_m}} \alpha_m \right)^{\top} H K H$$

where $\alpha_1, \ldots, \alpha_m$ are the eigenvectors corresponding to the $m$ largest eigenvalues of $HKH$.
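The text gives no code for this part, so here is a minimal MATLAB sketch of kernelized PCA following the formulas above; the bandwidth h, the target dimension m, and the reuse of n and x from the earlier examples are assumptions:

h=1; m=1;                                                 % assumed bandwidth and target dimension
x2=sum(x.^2,2);
K=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x')/(2*h^2));  % Gaussian kernel matrix, n-by-n
H=eye(n)-ones(n)/n;                                       % centering matrix H = I_n - 1_{nxn}/n
Kc=H*K*H; Kc=(Kc+Kc')/2;                                  % centered kernel matrix HKH, symmetrized
[A,V]=eigs(Kc,m);                                         % m largest eigenvalues/eigenvectors of HKH
z=(A*diag(1./sqrt(diag(V))))'*Kc;                         % m-by-n matrix of embedded samples (z_1,...,z_n)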