Recent literatures in computer vision and machine learning have observed the emergence of sparse coding as an effcient approach for feature selection. The basic idea of sparse coding [Olshausen 1997] is to represent a feature vector as linear combination of few bases from a predefined dictionary, hence induce the concept of sparsity. In other word, sparse coding provides low-dimensional approximation of a given signal in a given basis set. Apart from principal component analysis, sparse coding does not constraint the orthogonality of bases and it turns out that more flexibility is given to adapt the nonlinear representation of data.
From model fitting point of view, sparse coding provides means to avoid overfitting by eliminating insignificant variables in estimating the fitting function. Equivalently speaking, sparse coding finds a equilibrium solution between minimizing loss function of data-fitting versus sparse regularization of model coefficients, i.e.
in which is the column matrix of input signals with their corresponding sparse coefficients
encoded by dictionary of bases
. The most popular choice for loss function is loss-square and the regularization term is
norm so that (1) is a
-regularized least square optimization problem.
Since the first term in (1) is a quadratic form given the bases fixed, it is convex with global minimum. Although the second term -norm is convex also, it is not differentiable at the origin; hence, it requires some specific techniques to deal with. Instead of keeping bases fixed, ones can learn the dictionary
for better adaptation to data domain. The formulation (1) is not convex in both
and
simultaneously. The solution, nevertheless, can be obtained by iteratively optimize the above objective by alternatingly optimizing with respect to
and
while holding the other fixed.
Posted by vodinhphong