Dictionary Learning with CVX Toolbox

May 22, 2011

In this post we do an exercise that help you implement sparse coding and dictionary learning without using algorithms dedicated to optimizing a \ell_1-regularized least squares problem. We will use a generic convex optimization CVX toolbox for sparse encoding and adopt simple updating rule for dictionary \mathbf{D}.

For simplicity, a single image “lena” is used to learn the dictionary. Firstly, patches are extracted and normalized:


clear all;
I=double(imread('lena.png'))/255;
% extract 8 x 8 patches
X=im2col(I,[8 8],'sliding');
X=X-repmat(mean(X),[size(X,1) 1]);
X=X ./ repmat(sqrt(sum(X.^2)),[size(X,1) 1]);
X=X(:,1:100:end);

Since the number of patches is so huge, we just sub-sample 1%. This is enough for us to check the correctness of our algorithm.

Next, the number of iterations, dictionary size, and regularization coefficient are set:


nIter = 25;
gamma = .15;
p = 200;

% init dictionary
sel = randperm(size(X,2));
sel = sel(1:p);
D = X(:,sel);
n = size(X,2);
%D = D ./ repmat( sqrt(sum(D.^2)), [w^2, 1] );
S = zeros(p,n);

We individually encode each of input sample by optimizing the following objective function:

\underset{\mathbf{s}}{\text{argmin }} \frac{1}{2}\left\Vert \mathbf{y} - \mathbf{D}\mathbf{s}\right\Vert_2^2 + \lambda*\left\Vert\mathbf{s}\right\Vert_1

cvx_begin quiet
variable s(p);
minimize (.5*norm(X(:,j)-D*s) + gamma*norm(s,1));
cvx_end

In order to update \mathbf{D}, we just need to solve a least squares problem:

\underset{\mathbf{D}}{\text{argmin }} \frac{1}{2}\left\Vert \mathbf{X} - \mathbf{D}\mathbf{S}\right\Vert_F^2

This objective function can be solved easily by taking the first-order derivative respect to \mathbf{D}:

\begin{array}{cl}  & \nabla_{\mathbf{D}}\text{Tr}\left(\frac{1}{2}\left(\mathbf{X}-\mathbf{D}\mathbf{S}\right)^{T}\left(\mathbf{X}-\mathbf{D}\mathbf{S}\right)\right)\\  = & \frac{1}{2}\nabla_{\mathbf{D}}\text{Tr}\left(\mathbf{X}^{T}\mathbf{X}-\mathbf{X}^{T}\mathbf{D}\mathbf{S}-\mathbf{S}^{T}\mathbf{D}^{T}\mathbf{X}+\mathbf{S}^{T}\mathbf{D}^{T}\mathbf{D}\mathbf{S}\right)\\  = & 0-\frac{1}{2}\mathbf{X}\mathbf{S}^{T}-\frac{1}{2}\mathbf{X}\mathbf{S}^{T}+\frac{1}{2}\mathbf{D}\left(\mathbf{S}\mathbf{S}^{T}+\mathbf{S}\mathbf{S}^{T}\right)\end{array}

If the Hessian \mathbf{D}^T\mathbf{D} is positive semidefinite, the above quadratic form reachs its global minimum value with the minimizer:

\mathbf{D}=\mathbf{X}\mathbf{S}^T\left(\mathbf{S}\mathbf{S}^T\right)^{-1}

There is the case that \mathbf{D} will grow big and \mathbf{S} shrinks small to fit \mathbf{X}. In order to prevent it from happenning, a \ell_2-norm applied for dictionary atoms is neccessary:

\Vert\mathbf{d}_{i}\Vert_{2}^{2}=1,i=1\ldots P

The Matlab code is very simple:

D = X * pinv(S);
D=D ./ repmat(sqrt(sum(D.^2)),[size(D,1) 1]);

After N iterations, we can visualize the content of dictionary atoms by calling from SPAMS Toolbox:

ImD=displayPatches(D);
figure, imagesc(ImD); colormap('gray');

The result after 6 hours training (!) is displayed below:

Learned dictionary from Lena image with 100 atoms


Follow

Get every new post delivered to your Inbox.