Unsupervised Dimension Reduction
High-dimensional data is always difficult to work with. On one hand, it demands tremendous computational resources; on the other hand, it is much harder to grasp intuitively than low-dimensional data. Dimension reduction is therefore one of the key techniques for dealing with it.
Linear Dimension Reduction
In order to reduce the dimension of the samples $x_1, \dots, x_n \in \mathbb{R}^d$, we look for a linear transform
$$z_i = T x_i, \qquad T \in \mathbb{R}^{m \times d}, \quad m < d.$$
Before doing that, it is necessary to make sure the training set has zero mean, i.e. to center it:
$$x_i \leftarrow x_i - \frac{1}{n}\sum_{j=1}^{n} x_j.$$
Principal Component Analysis (PCA)
PCA, as you will see below, is the simplest linear dimension reduction method. Suppose that the rows of the projection matrix are orthonormal, i.e.
$$T T^\top = I_m,$$
where $I_m$ is the $m \times m$ identity matrix. PCA looks for the projection that retains as much of the data's variance as possible. In summary, PCA is defined as
$$\max_{T \in \mathbb{R}^{m \times d}} \operatorname{tr}\!\left(T C T^\top\right) \quad \text{subject to } T T^\top = I_m, \qquad C = \sum_{i=1}^{n} x_i x_i^\top.$$
Consider the eigenvalue problem of the scatter matrix $C$:
$$C \xi = \lambda \xi.$$
Define the eigenvalues and corresponding eigenvectors as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ and $\xi_1, \dots, \xi_d$. The PCA solution is the projection onto the $m$ leading eigenvectors:
$$T = (\xi_1, \dots, \xi_m)^\top.$$
Here is a simple example:
n=100;                                                % number of samples
%x=[2*randn(n,1) randn(n,1)];                         % alternative: single Gaussian cluster
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];  % two clusters along the second axis
x=x-repmat(mean(x),[n,1]);                            % centering: subtract the mean
[t,v]=eigs(x'*x,1);                                   % leading eigenvector of the scatter matrix x'*x
figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                             % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                 % projection direction found by PCA
Locality Preserving Projections
PCA may change the cluster structure of the original training set. Locality preserving projections (LPP), another linear dimension reduction method, is designed to keep that structure intact.
Define the similarity between samples $x_i$ and $x_j$ as $W_{i,j} \ge 0$: it is large when the two samples are close to each other and small when they are far apart. A common choice is the Gaussian similarity
$$W_{i,j} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\gamma^2}\right),$$
where $\gamma > 0$ controls how quickly the similarity decays with distance (the example below simply uses $2\gamma^2 = 1$). For the purpose of holding the structure of clusters, we require that similar samples remain close after projection, i.e. we minimize
$$\frac{1}{2}\sum_{i,j=1}^{n} W_{i,j}\,\|T x_i - T x_j\|^2.$$
However, to avoid the trivial solution $T = 0$, the constraint
$$T X D X^\top T^\top = I_m$$
is imposed, where $X = (x_1, \dots, x_n) \in \mathbb{R}^{d \times n}$ and $D$ is the diagonal matrix with $D_{i,i} = \sum_{j=1}^{n} W_{i,j}$. If we set $L = D - W$ (the graph Laplacian), the objective can be rewritten as
$$\operatorname{tr}\!\left(T X L X^\top T^\top\right).$$
So how do we solve it? Following the same reasoning as in PCA, the solution is obtained from an eigenvalue problem, this time a generalized one:
$$X L X^\top \xi = \lambda\, X D X^\top \xi.$$
Then define the generalized eigenvalues and eigenvectors as $\lambda_1 \ge \cdots \ge \lambda_d$ and $\xi_1, \dots, \xi_d$. Since the objective is minimized, LPP keeps the eigenvectors with the smallest eigenvalues:
$$T = (\xi_d, \xi_{d-1}, \dots, \xi_{d-m+1})^\top.$$
n=100;                                                % number of samples
%x=[2*randn(n,1) randn(n,1)];                         % alternative: single Gaussian cluster
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];  % two clusters along the second axis
x=x-repmat(mean(x),[n,1]);                            % centering
x2=sum(x.^2,2);
W=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'));      % Gaussian similarity matrix
D=diag(sum(W,2)); L=D-W;                              % degree matrix and graph Laplacian
z=x'*D*x;                                             % X*D*X' in the notation above
z=(z+z')/2;                                           % symmetrize against numerical asymmetry
[t,v]=eigs(x'*L*x,z,1,'sm');                          % generalized eigenvector with smallest eigenvalue
figure(1); clf; hold on; axis([-6 6 -6 6]);
plot(x(:,1),x(:,2),'rx');                             % samples
plot(9*[-t(1) t(1)], 9*[-t(2) t(2)]);                 % projection direction found by LPP
Kernelized PCA
Let us now turn to nonlinear dimension reduction. Due to limited time, we will not analyze it as deeply as the linear methods.
When it comes to nonlinearity, kernel functions are sure to be highlighted. Take the Gaussian kernel function for example:
$$K(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2h^2}\right),$$
where $h > 0$ is the kernel width. Here we do not take the eigenvalues of the scatter matrix $C$, but of the $n \times n$ kernel matrix $K$ with entries $K_{i,j} = K(x_i, x_j)$. However, centralization of the kernel matrix is necessary:
$$\tilde{K} = H K H, \qquad H = I_n - \frac{1}{n}\mathbf{1}_n \mathbf{1}_n^\top,$$
where $I_n$ is the $n \times n$ identity matrix and $\mathbf{1}_n$ is the $n$-dimensional vector of ones. The embedded samples are then obtained from the $m$ leading eigenvalues $\lambda_1 \ge \cdots \ge \lambda_m$ and unit-norm eigenvectors $\alpha_1, \dots, \alpha_m$ of $\tilde{K}$ as
$$z_i = \left(\sqrt{\lambda_1}\,(\alpha_1)_i, \dots, \sqrt{\lambda_m}\,(\alpha_m)_i\right)^\top,$$
where $(\alpha_k)_i$ denotes the $i$-th entry of $\alpha_k$.
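As a rough illustration, here is a minimal MATLAB sketch in the same style as the examples above; the toy data, the kernel width h=1, and the choice of keeping two components are assumptions made for this demo rather than values given in the text.
n=100;                                                    % number of samples (assumed)
x=[2*randn(n,1) 2*round(rand(n,1))-1+randn(n,1)/3];      % same toy data as before
h=1;                                                      % Gaussian kernel width (assumed)
x2=sum(x.^2,2);
K=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x')/(2*h^2)); % kernel matrix K(i,j)=K(x_i,x_j)
H=eye(n)-ones(n)/n;                                       % centering matrix
K=H*K*H; K=(K+K')/2;                                      % centralized (and symmetrized) kernel matrix
[a,v]=eigs(K,2);                                          % two leading eigenvectors/eigenvalues
z=a*sqrt(v);                                              % embedded samples, one row per sample
figure(1); clf;
plot(z(:,1),z(:,2),'rx');                                 % samples in the kernel PCA space
Note that, unlike PCA and LPP, kernelized PCA returns an embedding of the training samples themselves rather than an explicit projection matrix.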