Logistic Regression and Least Squares Probability Classification: A Brief Overview with Examples


Logistic Regression & Least Squares Probability Classification

1. Logistic Regression

The likelihood function, as explained by Wikipedia:

https://en.wikipedia.org/wiki/Likelihood_function

plays one of the key roles in statistical inference, especially in methods for estimating a parameter from a set of statistics. In this article, we will make full use of it.
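For reference, for independent and identically distributed samples x_1, …, x_n drawn from a density p(x; θ), the likelihood of θ and its logarithm are

L(\theta) = \prod_{i=1}^{n} p(x_i;\theta), \qquad \log L(\theta) = \sum_{i=1}^{n} \log p(x_i;\theta)

and maximum likelihood estimation chooses the θ that maximizes them.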
Pattern recognition works by learning the posterior probability p(y|x) that a pattern x belongs to class y. For a given pattern x, when the posterior probability of some class y attains the maximum, we assign x to class y, i.e.

\hat{y} = \arg\max_{y=1,\dots,c} p(y|x)

The posterior probability can be interpreted as the credibility that pattern x belongs to class y.
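As a quick illustration of this decision rule (a minimal sketch with made-up numbers, not part of the article's code), in MATLAB:

q = [0.2 0.7 0.1];      % estimated posteriors q(y|x) for classes 1..3
[~, yhat] = max(q);     % argmax over classes: yhat = 2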
In the logistic regression algorithm, a log-linear model is used to parameterize the posterior probability:
q(y|x,\theta) = \frac{\exp\left(\sum_{j=1}^{b} \theta_j^{(y)} \phi_j(x)\right)}{\sum_{y'=1}^{c} \exp\left(\sum_{j=1}^{b} \theta_j^{(y')} \phi_j(x)\right)}

Note that the denominator is a normalization term that makes the probabilities sum to one over the c classes. Logistic regression is then defined by the following optimization problem (maximum log-likelihood):
\max_{\theta} \sum_{i=1}^{m} \log q(y_i|x_i,\theta)

We can solve it with a stochastic gradient ascent method:
  1. Initialize θ.
  2. Pick a training sample (x_i, y_i) at random.
  3. Update θ = (θ^(1)T, …, θ^(c)T)^T along the direction of gradient ascent:
    \theta^{(y)} \leftarrow \theta^{(y)} + \epsilon \nabla_y J_i(\theta), \quad y = 1,\dots,c
    where
    \nabla_y J_i(\theta) = -\frac{\exp\left(\theta^{(y)T}\phi(x_i)\right)\phi(x_i)}{\sum_{y'=1}^{c} \exp\left(\theta^{(y')T}\phi(x_i)\right)} + \begin{cases} \phi(x_i) & (y = y_i) \\ 0 & (y \neq y_i) \end{cases}
  4. Repeat steps 2-3 until θ reaches the desired precision (a sketch of a single update appears after this list).
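A minimal MATLAB sketch of one such update for a generic design matrix follows; the variable names (Phi, Theta, eps0) and the random data are illustrative, not taken from the article's code:

n_s = 5; b = 4; c = 3;                 % samples, basis functions, classes
Phi   = randn(n_s, b);                 % row i is phi(x_i)'
y     = randi(c, n_s, 1);              % labels in 1..c
Theta = zeros(b, c);                   % column y holds theta^(y)
eps0  = 0.1;                           % learning rate epsilon

i     = ceil(rand*n_s);                % step 2: pick a sample at random
phi_i = Phi(i,:)';
s     = exp(Theta'*phi_i);             % c x 1 scores exp(theta^(y)'*phi(x_i))
G     = -phi_i*(s'/sum(s));            % step 3: shared negative term for every class
G(:,y(i)) = G(:,y(i)) + phi_i;         % add phi(x_i) to the true-class column
Theta = Theta + eps0*G;                % gradient-ascent update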

Take the Gaussian kernel model as an example:

q(y|x,\theta) \propto \exp\left(\sum_{j=1}^{n} \theta_j^{(y)} K(x, x_j)\right)

If you are not familiar with the Gaussian kernel model, refer to this article:

http://blog.csdn.net/philthinker/article/details/65628280
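For convenience (the linked article is in Chinese), recall the standard Gaussian kernel with bandwidth h:

K(x, x_j) = \exp\left(-\frac{\|x - x_j\|^2}{2h^2}\right)

This is exactly what the code below computes, with hh = 2*1^2 playing the role of 2h^2 (i.e. h = 1).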

Here is the corresponding MATLAB code:

n=90; c=3; y=ones(n/c,1)*(1:c); y=y(:);                 % 90 samples, 3 classes; labels 1,...,1,2,...,2,3,...,3
x=randn(n/c,c)+repmat(linspace(-3,3,c),n/c,1);x=x(:);   % 1-D inputs drawn around class centers -3, 0, 3

hh=2*1^2; t0=randn(n,c);                                % hh = 2*h^2 (bandwidth h = 1); random initial theta (n x c)
for o=1:n*1000
    i=ceil(rand*n); yi=y(i); ki=exp(-(x-x(i)).^2/hh);   % pick a random sample; ki = kernel basis vector phi(x_i)
    ci=exp(ki'*t0); t=t0-0.1*(ki*ci)/(1+sum(ci));       % unnormalized scores; subtract shared gradient term (step 0.1)
    t(:,yi)=t(:,yi)+0.1*ki;                             % add 0.1*phi(x_i) to the true-class column
    if norm(t-t0)<0.000001                              % stop once the update is negligibly small
        break;
    end
    t0=t;
end

N=100; X=linspace(-5,5,N)';                             % test points on [-5,5]
K=exp(-(repmat(X.^2,1,n)+repmat(x.^2',N,1)-2*X*x')/hh); % N x n Gaussian kernel matrix between test and training points

figure(1); clf; hold on; axis([-5,5,-0.3,1.8]);
C=exp(K*t); C=C./repmat(sum(C,2),1,c);                  % softmax estimates q(y|x) at the test points X
plot(X,C(:,1),'b-');
plot(X,C(:,2),'r--');
plot(X,C(:,3),'g:');
plot(x(y==1),-0.1*ones(n/c,1),'bo');
plot(x(y==2),-0.2*ones(n/c,1),'rx');
plot(x(y==3),-0.1*ones(n/c,1),'gv');
legend('q(y=1|x)','q(y=2|x)','q(y=3|x)');

[Figure: estimated posteriors q(y=1|x), q(y=2|x), q(y=3|x) over [-5,5], with the training samples of each class marked below the curves.]

2. Least Squares Probability Classification

In least squares (LS) probability classifiers, a linearly parameterized model is used to express the posterior probability:

q(y|x,\theta^{(y)}) = \sum_{j=1}^{b} \theta_j^{(y)} \phi_j(x) = \theta^{(y)T}\phi(x), \qquad y = 1,\dots,c

These models depend on class-wise parameter vectors θ^(y) = (θ_1^(y), …, θ_b^(y))^T, one for each class y, which differs from the single parameter vector used by the logistic classifier. Learning these models means minimizing the following quadratic error:
J_y(\theta^{(y)}) = \frac{1}{2}\int \left(q(y|x,\theta^{(y)}) - p(y|x)\right)^2 p(x)\,dx
                  = \frac{1}{2}\int q(y|x,\theta^{(y)})^2 p(x)\,dx - \int q(y|x,\theta^{(y)})\,p(y|x)\,p(x)\,dx + \frac{1}{2}\int p(y|x)^2 p(x)\,dx
where p(x) denotes the probability density from which the training samples {x_i}_{i=1}^{n} are drawn.
By Bayes' formula,
p(y|x)\,p(x) = p(x,y) = p(x|y)\,p(y)

Hence J_y(θ^(y)) can be reformulated as
J_y(\theta^{(y)}) = \frac{1}{2}\int q(y|x,\theta^{(y)})^2 p(x)\,dx - \int q(y|x,\theta^{(y)})\,p(x|y)\,p(y)\,dx + \frac{1}{2}\int p(y|x)^2 p(x)\,dx

Note that the first and second terms above are expectations over p(x) and p(x|y) respectively, which are usually impossible to compute directly. The last term is independent of θ^(y) and can therefore be omitted.
Since p(x|y) is the probability density of samples x belonging to class y, we can estimate the first and second terms by the following sample averages:
\frac{1}{n}\sum_{i=1}^{n} q(y|x_i,\theta^{(y)})^2, \qquad \frac{1}{n_y}\sum_{i:\,y_i=y} q(y|x_i,\theta^{(y)})\,p(y)
where n_y is the number of training samples in class y.

Next, approximating p(y) by n_y/n and introducing a regularization term, we obtain the following learning criterion:
\hat{J}_y(\theta^{(y)}) = \frac{1}{2n}\sum_{i=1}^{n} q(y|x_i,\theta^{(y)})^2 - \frac{1}{n}\sum_{i:\,y_i=y} q(y|x_i,\theta^{(y)}) + \frac{\lambda}{2n}\|\theta^{(y)}\|^2

Let π^(y) = (π_1^(y), …, π_n^(y))^T with π_i^(y) = 1 if y_i = y and π_i^(y) = 0 otherwise, and let Φ be the n × b design matrix with Φ_ij = φ_j(x_i). Then
\hat{J}_y(\theta^{(y)}) = \frac{1}{2n}\theta^{(y)T}\Phi^T\Phi\,\theta^{(y)} - \frac{1}{n}\theta^{(y)T}\Phi^T\pi^{(y)} + \frac{\lambda}{2n}\|\theta^{(y)}\|^2 .
This is a convex optimization problem, and we obtain the analytic solution by setting its first-order derivative with respect to θ^(y) to zero:
\hat{\theta}^{(y)} = (\Phi^T\Phi + \lambda I)^{-1}\Phi^T\pi^{(y)} .
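As a minimal sketch of this closed-form solution (with an illustrative random design matrix, not the article's kernel setup):

n = 6; b = 4; lambda = 0.1; yclass = 1;
Phi  = randn(n, b);                                    % design matrix, Phi(i,j) = phi_j(x_i)
ylab = randi(3, n, 1);                                 % labels in 1..3
piy  = double(ylab == yclass);                         % indicator vector pi^(y) for class y
theta_hat = (Phi'*Phi + lambda*eye(b)) \ (Phi'*piy);   % theta_hat^(y) = (Phi'Phi + lambda*I)^(-1) Phi' pi^(y)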
To avoid negative estimates of the posterior probability, we clip negative outputs at zero and renormalize:
\hat{p}(y|x) = \frac{\max\left(0,\ \hat{\theta}^{(y)T}\phi(x)\right)}{\sum_{y'=1}^{c}\max\left(0,\ \hat{\theta}^{(y')T}\phi(x)\right)}

We again take the Gaussian kernel model as an example:

n=90; c=3; y=ones(n/c,1)*(1:c); y=y(:);                 % 90 samples, 3 classes; labels as before
x=randn(n/c,c)+repmat(linspace(-3,3,c),n/c,1);x=x(:);   % 1-D inputs drawn around class centers -3, 0, 3

hh=2*1^2; x2=x.^2; l=0.1; N=100; X=linspace(-5,5,N)';   % hh = 2*h^2; l = regularization lambda; X = test points
k=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*(x'))/hh);   % n x n kernel matrix among training points
K=exp(-(repmat(X.^2,1,n)+repmat(x2',N,1)-2*X*(x'))/hh); % N x n kernel matrix between test and training points
Kt=zeros(N,c);                                          % clipped outputs max(0, K*theta) for each class
for yy=1:c
    yk=(y==yy); ky=k(:,yk);                             % kernel design matrix with class-yy samples as centers
    ty=(ky'*ky +l*eye(sum(yk)))\(ky'*yk);               % theta_hat^(yy) = (Phi'Phi + lambda*I)^(-1) Phi' pi^(yy)
    Kt(:,yy)=max(0,K(:,yk)*ty);
end
ph=Kt./repmat(sum(Kt,2),1,c);                           % normalized posterior estimates p_hat(y|x) at the test points

figure(1); clf; hold on; axis([-5,5,-0.3,1.8]);
plot(X,ph(:,1),'b-');                                   % LSPC posterior estimate p_hat(y=1|x)
plot(X,ph(:,2),'r--');                                  % p_hat(y=2|x)
plot(X,ph(:,3),'g:');                                   % p_hat(y=3|x)
plot(x(y==1),-0.1*ones(n/c,1),'bo');                    % training samples of each class
plot(x(y==2),-0.2*ones(n/c,1),'rx');
plot(x(y==3),-0.1*ones(n/c,1),'gv');
legend('p(y=1|x)','p(y=2|x)','p(y=3|x)');

[Figure: LSPC posterior estimates p(y|x) for the three classes over [-5,5], with the training samples marked below the curves.]

3. Summary

Logistic regression is well suited to small sample sets since it works in a simple way. However, once the number of samples grows large, it is better to turn to the least squares probability classifier, whose parameters can be computed analytically.
