ℓ2constrained least squares
In the simple least squares, noisy samples may lead to overfitting learning output. Therefore, it is rational to constrain the space of parameters.
We will focus on the simplest case - ℓ2 constrained least squares in this note, i.e.
minθJLS(θ),s.t.∥θ∥2≤R
In order to the solve the aforesaid optimal problem, using Lagrangian dual problem, we can utilize the following optimal problem:
maxλminθ[JLS(θ)+λ2(∥θ∥2−R)]
A brief review of Lagrangian dual problem can be found in the bottom of the note.
However, it is not necessary to define a R to constrained λ, then we can solve the estimated θ as
θ^=argminθ[JLS(θ)+λ2∥θ∥2]
where the first term JLS(θ) represents the fitting level which is combined by λ2∥θ∥2 to prevent overfitting to some degree.
Taking the partial difference of [JLS(θ)+λ2∥θ∥2 and seting it to be zero, we get the solution
θ^=(ΦTΦ+λI)−1ΦTy
A more general method involves a regularizer G:
minθJLS(θ)s.t.θTGθ≤R
and
θ^=(ΦTΦ+λG)−1ΦTy
Ex: The Guassian Kernal Model
fθ(x)=∑j=1nθjK(x,xj),K(x,c)=exp(−∥x−c∥22h2)
The MATLAB codes go as follows:
n=50; N=1000;
x=linspace(-3,3,n)'; X=linspace(-3,3,n)';
pix=pi*x; y=sin(pix)./(pix)+0.1*x+0.2*randn(n,1);
x2=x.^2; X2=X.^2; hh=2*0.3^2; l=0.1;
k=exp(-(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x')/hh);
K=exp(-(repmat(X2,1,n)+repmat(x2',N,1)-2*X*x')/hh);
t1=k\y; F1=K*t1; t2=(k^2+l*eye(n))\(k*y); F2=K*t2;
figure(1); clf; hold on; axis([-2.8,2.8,-1,1.5]);
plot(X,F1,'g-');plot(X,F2,'r--');plot(x,y,'bo');
legend('LS','L2-Constrained LS');
Appendix Lagrangian Dual Problem
Given differentiable convex function f:Rd→R and g:Rd→Rp, a optimal problem can be formulated as
mintf(t),s.t.g(t)≤0
Let
λ=(λ1,…,λp)T
be the Lagrangian multiplier.
Let
L(t,λ)=f(t)+λTg(t)
be the Lagrangian function.
Then the aforementioned optimal problem can be defined as
maxλinftL(t,λ),s.t.λ≥0
That’s the Lagrangian dual problem. We can get the same t by solving it.