3 votes

The objective function is min_{\hat{a}} ||y - \hat{a}*x||_2^2 + \lambda*||\hat{a}||_1, where a is the vector of coefficients, y denotes the noisy measurements, and x is the unobserved input signal. I am aware of the lasso() function, but I would rather not use the built-in function since that will not help me understand the steps. Can somebody help me implement the l1-norm optimization?

Mathematically, my model is expressed as a moving average (MA) system: y[k] = a_1*x[k] + a_2*x[k-1] + \ldots + a_{10}*x[k-9] + n[k], where n ~ N(0,\sigma^2) is additive white Gaussian noise, x is a zero-mean Gaussian white process of unit variance, and a_1, a_2, \ldots, a_{10} are the coefficients of the MA model, which are known to be sparse. However, I don't know the positions of the non-zero coefficients.

In this model only 3 coefficients are non-zero, whereas the rest are all zero or close to zero. One approach to parameter estimation is to construct an inverse filter, also known as minimizing the prediction error.

Using the inverse-filtering approach, I can create an inverse filter for the MA model, expressed as: u[k] = x[k] - (\hat{a}_2*x[k-1] + \hat{a}_3*x[k-2] + \ldots + \hat{a}_{10}*x[k-9]).

Therefore, the objective function becomes: J = min_{\hat{a}} ||y - \hat{a}*x||_2^2 + \lambda*||\hat{a}||_1, where y is the observed noisy measurements and \hat{a}*x is the clean (noise-free) output. Let \mathbf{\hat{a}} = {[\hat{a}_1, \ldots, \hat{a}_{10}]}^T denote the estimated coefficient vector.

My approach is to split the objective function J into two parts: the first part is the OLS estimate, which is then fed into an l1-minimization routine. The output of the l1 minimization gives the sparse coefficients. Is this approach legitimate? If so, I need help with which l1 optimizer to use in MATLAB.

The following is the code snippet where I have created the model. But I don't know how to solve the objective function. Please help.

%Generate input
N = 500;
x = randn(N,1)*100;            % input as a column vector (lagmatrix expects observations in rows)
L = 10;
Num_lags = 1:L-1;
a = 1 + randn(L,1);            % MA coefficients
a(rand(L,1) < .9) = 0;         % roughly 90% of the coefficients are set to zero (sparse)
%Data preparation into regressors
X1 = lagmatrix(x, [0 Num_lags]);   % N-by-L matrix of lagged inputs
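
For completeness, here is a rough sketch of how I generate the noisy measurements y and write out the objective J from this snippet (sigma and lambda below are just placeholder values I picked; lagmatrix pads the first rows with NaN, so I drop them):

%Generate the noisy measurements and the objective (illustrative values)
sigma = 0.1;                            % assumed noise standard deviation
lambda = 1;                             % assumed regularization weight
valid = all(~isnan(X1), 2);             % lagmatrix pads the first L-1 rows with NaN
X1 = X1(valid, :);
y = X1*a + sigma*randn(size(X1,1), 1);  % y[k] = a_1*x[k] + ... + a_10*x[k-9] + n[k]
%Objective: J(a_hat) = ||y - X1*a_hat||_2^2 + lambda*||a_hat||_1
J = @(a_hat) norm(y - X1*a_hat)^2 + lambda*norm(a_hat, 1);
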
Hmm, it has been about 2 months since I read the lasso paper, but it seems that solving the lasso is a quadratic program with linear constraints. I can dig up my code if you want; does that help? – Hunter Jiang

2 Answers

2 votes

According to the lasso paper, we have:

[Figure: the lasso formulation from the paper, i.e. argmin{ sum_i (y_i - alpha - sum_j beta_j*x_{ij})^2 } subject to sum_j abs(beta_j) <= t.]

It is a quadratic program with linear constraints, and there is a lasso parameter t >= 0 which controls the amount of shrinkage applied to the estimates. In your case, we could take t = 0.1*sum(abs(beta0)), where beta0 is the full least squares estimate (according to the conclusion on page 3 of the paper).


Edit: assume that we have a lasso problem, Eq. (1), with 2 parameters.

  1. Eq. (1): argmin{ (y(1)-alpha-beta_1*x(1,1)-beta_2*x(2,1))^2 + (y(2)-alpha-beta_1*x(1,2)-beta_2*x(2,2))^2 } s.t. sum(abs(beta)) <= t
  2. Since hat(alpha) = mean(y), Eq. (1) becomes Eq. (1'): argmin{ (hat(y)(1)-beta_1*x(1,1)-beta_2*x(2,1))^2 + (hat(y)(2)-beta_1*x(1,2)-beta_2*x(2,2))^2 } s.t. sum(abs(beta)) <= t, where hat(y) = y - mean(y).
  3. Eq. (1') can be rewritten as argmin{ beta'*H*beta + f'*beta + C }, whose solution is the same as that of argmin{ beta'*H*beta + f'*beta }.
  4. The constraint sum(abs(beta)) <= t is equivalent to 2^(length(beta)) linear constraints A*beta <= b, where the rows of A are the p-tuples of +-1 (e.g. (-1,-1), (-1,1), (1,-1), (1,1) in the 2-parameter case) and b = t*ones(2^p, 1).

Then we can solve it with quadprog in MATLAB by:

  • calculating the matrices H and f by expanding the lasso objective into a quadratic form.

  • letting delta_i, i = 1, 2, ..., 2^p, be the p-tuples of the form (+-1, +-1, ..., +-1), so that the constraints can be rewritten as A*beta <= b (i.e. delta_i*beta <= t for every i).

  • solving for the lasso estimate by calling quadprog(H,f,A,b).

Here is my code:

clc; clear;
%Generate input
N=500;
Bnum=10;
x=(randn(N,Bnum)*1000);
true_beta = 1+randn(Bnum,1);
y=x*true_beta;

%find the least squares solution and the lasso parameter t
%least squares solution beta0
ls_beta=(x'*x)\(x'*y);
%define t
t=0.1*sum(abs(ls_beta));

%%solving the quadratic programming
%calc H
H=zeros(Bnum);
for ii=1:Bnum
    for ij=1:Bnum
      H(ii,ij)=sum(2*x(:,ii).*x(:,ij));
    end
end
H;
%calc f
f=zeros(Bnum,1);
yhat=y-mean(y);
for ii=1:Bnum
  f(ii)=sum(-2*yhat.*x(:,ii));
end
f;
%calc A: each row is one of the 2^Bnum sign patterns (+-1,...,+-1)
A=zeros(power(2,Bnum),Bnum);
for ii=1:power(2,Bnum)
  v=dec2bin(ii-1,Bnum);
  for ij=1:Bnum
    A(ii,ij)=(str2num(v(ij))-0.5)*2;
  end
end
%calc b
b=ones(power(2,Bnum),1)*t;
%calc quadprog
lasso_beta=quadprog(H,f,A,b);
[true_beta ls_beta lasso_beta]

we will have an answer ([true_beta ls_beta lasso_beta]) like this:

ans =

    0.5723    0.5723    0.0000
    0.4206    0.4206    0.0000
    1.9260    1.9260    0.1109
    1.0055    1.0055    0.0000
    0.3655    0.3655    0.0000
    1.8583    1.8583    0.1582
    0.5192    0.5192    0.0000
    2.4897    2.4897    0.7242
    0.3706    0.3706   -0.0000
    0.4060    0.4060    0.0000

P.S. All of the emphasis comes from the lasso paper.

Hope it helps!

2 votes

The code below solves the l1-regularized problem argmin{ f(x) + lambda*||x||_1 }, i.e. the penalized form of argmin{ f(x) } s.t. ||x||_1 <= t, directly with fminunc.

Edit: fixed a typo in the function handle V (see the comment from SKM).

clc; clear;
%Generate input data
N=500;
Bnum=10;
X=(randn(N,Bnum)*1000);
true_beta = rand(Bnum,1);
Y=X*true_beta+rand(N,1);

%solve lasso (penalized form) using fminunc
lambda=1;
V = @(x) norm(Y-X*x)^2+lambda*norm(x,1); % l1-regularized least squares objective
options=optimoptions('fminunc','Algorithm','quasi-newton','Display','iter');
xopt = fminunc(V,zeros(Bnum,1),options)

However, I still recommend the code in the other answer, which uses the quadprog function; it will be much faster and more accurate.
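
For instance, here is a rough way to check this on the data above (a sketch only: it rebuilds H, f, A, b for this X and Y as in the quadprog answer, and the two solvers use different shrinkage parameterizations, t versus lambda, so the estimates will not match exactly):

%Rough comparison of the two solvers on the same data (reuses X, Y, Bnum, V, options from above)
t = 0.1*sum(abs((X'*X)\(X'*Y)));   % lasso parameter chosen as in the other answer
H = 2*(X'*X);                      % quadratic term of the expanded objective
f = -2*X'*(Y - mean(Y));           % linear term
A = zeros(2^Bnum, Bnum);           % all 2^Bnum sign patterns (+-1,...,+-1)
for ii = 1:2^Bnum
    A(ii,:) = (dec2bin(ii-1, Bnum) - '0' - 0.5)*2;
end
b = t*ones(2^Bnum, 1);
tic; beta_qp = quadprog(H, f, A, b);               t_qp = toc;
tic; beta_l1 = fminunc(V, zeros(Bnum,1), options); t_l1 = toc;
fprintf('quadprog: %.3f s, fminunc: %.3f s\n', t_qp, t_l1);
[true_beta beta_qp beta_l1]        % compare the estimates side by side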

Ref: R. Tibshirani, "Regression Shrinkage and Selection via the Lasso," Journal of the Royal Statistical Society, Series B, 58(1):267-288, 1996 (the lasso paper).