A Brief Look at Semi-Supervised Classification with the Gaussian Kernel Function

Laplacian Regularization

In ordinary least-squares learning, closeness between sample points is measured by Euclidean distance when fitting the decision boundary. Laplacian regularization instead measures the shortest distance along the manifold on which the points lie, and fits the decision boundary on that basis.

In semi-supervised learning we assume that the inputs x lie on some manifold and that the outputs y vary smoothly along it. For classification, inputs on the same manifold are expected to share the same label. For regression, the mapping from inputs to outputs is expected to vary smoothly along the manifold.

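Take the Gaussian kernel model as an example:

$$f_\theta(x)=\sum_{j=1}^{n}\theta_j K(x,x_j),\qquad K(x,c)=\exp\left(-\frac{\|x-c\|^2}{2h^2}\right)$$

The unlabeled samples $\{x_i\}_{i=n+1}^{n+n'}$ are also used as kernel centers:

$$f_\theta(x)=\sum_{j=1}^{n+n'}\theta_j K(x,x_j)$$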
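
As a rough illustration of the model above, the following MATLAB sketch builds the Gaussian kernel matrix over a handful of points and evaluates $f_\theta$ at a query point. The data, the bandwidth `h = 1`, the query point `xq`, and the all-ones `theta` are made-up placeholders, not values from this post or a fitted model.

```matlab
% Toy samples (hypothetical): rows are points in 2-D; think of the first
% three as labeled and the last two as unlabeled.
X = [0 0; 1 0; 0 1; 2 2; 3 3];
n = size(X, 1);
h = 1;                                   % kernel bandwidth (assumed value)

% Pairwise squared distances via ||a||^2 + ||b||^2 - 2*a'*b
sq = sum(X.^2, 2);
D2 = repmat(sq, 1, n) + repmat(sq', n, 1) - 2*(X*X');

% Gaussian kernel matrix: K(i,j) = exp(-||x_i - x_j||^2 / (2*h^2))
K = exp(-D2 / (2*h^2));

% Placeholder coefficients; a real theta comes from solving the learning problem
theta = ones(n, 1);

% Evaluate f_theta at a new query point xq
xq = [1 1];
kq = exp(-sum((X - repmat(xq, n, 1)).^2, 2) / (2*h^2));   % K(xq, x_j) for every j
fq = theta' * kq;
```

Every labeled or unlabeled sample contributes one column of `K`, which is how the unlabeled points enter the model even though they carry no target value.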

To encourage nearby samples (labeled and unlabeled) to receive similar outputs, a penalty term is added to the objective:

$$\min_\theta\left[\frac{1}{2}\sum_{i=1}^{n}\bigl(f_\theta(x_i)-y_i\bigr)^2+\frac{\lambda}{2}\|\theta\|^2+\frac{\nu}{4}\sum_{i,i'=1}^{n+n'}W_{i,i'}\bigl(f_\theta(x_i)-f_\theta(x_{i'})\bigr)^2\right]$$

The first two terms form the $\ell_2$-regularized least-squares objective, and the last term is the regularizer for semi-supervised learning (Laplacian regularization). $\nu\ge 0$ is a parameter that tunes the smoothness along the manifold, and $W_{i,i'}\ge 0$ is the similarity between $x_i$ and $x_{i'}$. Not familiar with similarity? Refer to:

http://blog.csdn.net/philthinker/article/details/70212147
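
The pairwise penalty can be checked numerically on a tiny example. The sketch below uses a made-up symmetric similarity matrix `W` and made-up function values `f`, and compares the double sum with a quadratic form in the matrix `diag(sum(W,2)) - W`. The numbers are purely illustrative.

```matlab
% Hypothetical symmetric similarity matrix and function values f(x_i)
W = [0 1 0.5; 1 0 0.2; 0.5 0.2 0];
f = [1; -1; 0.5];

D = diag(sum(W, 2));                     % diagonal matrix of the row sums of W
L = D - W;                               % graph Laplacian built from W

% Left-hand side: sum over all ordered pairs (i, j)
lhs = 0;
for i = 1:3
  for j = 1:3
    lhs = lhs + W(i,j) * (f(i) - f(j))^2;
  end
end

% Right-hand side: quadratic form in L
rhs = 2 * (f' * L * f);

fprintf('double sum = %.4f, 2*f''*L*f = %.4f\n', lhs, rhs);
```

The two quantities agree for any symmetric `W`, which is what lets the pairwise penalty be written as a single matrix expression.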

How is this optimization problem solved? Using the diagonal matrix $D$, whose diagonal elements are the row sums of $W$, and the Laplacian matrix $L = D - W$, the problem above can be transformed into a general $\ell_2$-constrained least-squares problem. For simplicity, the details are omitted here.
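
For readers who want the omitted step, here is one way to sketch it. The notation $\tilde{K}$ (kernel matrix over all $n+n'$ samples) and $K$ (its first $n$ rows, belonging to the labeled samples) is introduced here for convenience and does not appear in the original post. The derivation assumes that $W$ is symmetric.

```latex
% Assumed notation (not from the original post):
%   \tilde{K} : (n+n') x (n+n') kernel matrix over all samples (symmetric)
%   K         : first n rows of \tilde{K};  y = (y_1, ..., y_n)^\top
\begin{align*}
\sum_{i,i'=1}^{n+n'} W_{i,i'}\bigl(f_\theta(x_i)-f_\theta(x_{i'})\bigr)^2
  &= 2\,\theta^\top \tilde{K} L \tilde{K}\,\theta \\
J(\theta)
  &= \tfrac{1}{2}\|K\theta-y\|^2 + \tfrac{\lambda}{2}\|\theta\|^2
   + \tfrac{\nu}{2}\,\theta^\top \tilde{K} L \tilde{K}\,\theta \\
\nabla_\theta J(\theta) = 0 \;\Longrightarrow\;
\hat{\theta}
  &= \bigl(K^\top K + \lambda I + \nu\,\tilde{K} L \tilde{K}\bigr)^{-1} K^\top y
\end{align*}
```

Since $L$ is positive semidefinite when $W$ is symmetric with nonnegative entries, the matrix being inverted is positive definite for any $\lambda > 0$, so the minimizer is unique. The linear solve in the listing below has this shape, with weight 1 on the $\ell_2$ term and weight 10 on the Laplacian term.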

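% Two-moons data: n points on two interleaved arcs, with Gaussian noise
n=200; a=linspace(0,pi,n/2);
u=-10*[cos(a)+0.5 cos(a)-0.5]'+randn(n,1);
v=10*[sin(a) -sin(a)]'+randn(n,1);
% Only the first and last samples are labeled (+1 and -1); y=0 marks unlabeled
x=[u v]; y=zeros(n,1); y(1)=1; y(n)=-1;
% Gaussian kernel matrix over all samples, with hh = 2*h^2 and h = 1
x2=sum(x.^2,2); hh=2*1^2;
k=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*(x'))/hh);
% The kernel values double as the similarity matrix W
w=k;
% Solve the regularized linear system: weight 1 on the l2 term and 10 on the
% Laplacian term, where diag(sum(w))-w is the Laplacian matrix. Note that k^2
% uses every row of k, so unlabeled samples enter the squared loss with target 0.
t=(k^2+1*eye(n)+10*k*(diag(sum(w))-w)*k)\(k*y);

% Evaluate the classifier on an m-by-m grid. The 2-D Gaussian kernel factorizes
% into 1-D kernels along each axis (U for the first coordinate, V for the second).
m=100; X=linspace(-20,20,m)';X2=X.^2;
U=exp(-(repmat(u.^2,1,m)+repmat(X2',n,1)-2*u*(X'))/hh);
V=exp(-(repmat(v.^2,1,m)+repmat(X2',n,1)-2*v*(X'))/hh);
% Plot the decision regions, then the labeled (o, x) and unlabeled (.) samples
figure(1); clf; hold on; axis([-20 20 -20 20]);
colormap([1 0.7 1; 0.7 1 1]);
contourf(X,X,sign(V'*(U.*repmat(t,1,m))));
plot(x(y==1,1),x(y==1,2),'bo');
plot(x(y==-1,1),x(y==-1,2),'rx');
plot(x(y==0,1),x(y==0,2),'k.');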
