vodinhphong

Review on Codebook methods

In Thoughts on July 2, 2009 at 6:09 pm

As I mention in the previous post, I am doing on clustering and the task is how to build a efficient codebook for event recognition. So far, there are pretty much works on applying codebook to visual categorization, object recognition, and action recognition. Codebook based method is a shortcut to attack the recognition problem directly without concern in preprocessing. But everything is not so perfect, codebook based methods have tons of open questions to demystify. For instances, how many visual words a codebook should have; which is better, a tailored codebook with explicit annotated data, or a data-driven codebook using clustering algorithms; how to compromise among different object classes; which information need to be included, high frequency occurrence visual words, or rare ones; is there any method to preserve geometry information exerpt LDA, pLSA, and pyramid matching; a unified codebook for all classes is better than specific codebooks; and so on.

Since applied on texture classification [citation], codebook based methods are studied extensively on the object recognition problem. Two popular classification approaches are: i) Bayes Networks, ii) Support Vector Machines. In other word, we can use a generative model or discriminative model for learning. With the first choice, Naive Bayes is the most straightforward and simplest[ citation].More complicated models include LDA and pLSA with hidden topic variables [review]. Because the difficulty of generative model is to compute the full likelihood space. On the other hand, discriminative model offers direct methods to classify classes. Another approach is not to concern about incoporating geometry information but codebook discriminative power itself, i.e how to obtain a compact but high ‘quality’. In particular, clustering algorithm is analyzed under various circumstances and assumptions. Following characteristics are being got attention:

  1. Sampling techniques: interest point based, dense sampling
  2. Propose a efficient clustering algorithm
  3. The behavior of data points high very high dimensionality
  4. To present an class instance by the codebook

[citation] had a thorough analysis and found that dense sampling outperform interest point based sampling. Furthermore, when properly clustered, it can perform better. The conclusion is consistent with the observation that interest points do not actually capture the most informative regions. For example, I want to recognize cars from street context, but I always get high density visual words from building corners and trees. The problem is even more critical in the dense sampling case. However, dense sampling treat all points in images with no previllege. So we have below cause-results:

  • Sparse sampling /rightarrow lack of object relevant features /rightarrow combined codebook, i.e universal + specific codebooks (for each class), multi-scale codebooks
  • Dense sampling /rightarrow redundant & trivial features /rightarrow modified clustering algorithm, i.e kernel codebook, subsampling technique + meanshift

Although reliable conclusions have been made, the fact is we cannot apply dense sampling in every problem. For example, is it feasible to sample densely a video sequence? It’s huge, right? Therefore, in my opinion, sparse sampling techniques still have room for improvement. The keypoint here is we have to compensate by a good codebook formation.

Return to my problem, it is defined as follows:

  • Input: video sequences, defined human action concepts with different articulations as well as duration
  • Output: recognize as much as possible happened actions in the movie

After a bunch of experiments, it is likely that different actions type prefer different codebook size (the number of visual words). This is very natural and undoubtely. The handy solution is to build separate codebook for each action type. Consequently, one video sequence is tested on every specific action classifier. Why don’t use a multi-class classifier? On my perspective, the occurrence rate of event specific interest points is quite low (or very low) so its discriminative power is low as well. Using a multi-class classifier is not as efficient as a 2-class classifier with one-vs-all training strategy. However, it’s just a guess and it is possible to concatenate all the representation forms of one instance in each codebook into one long vector and learn it. We can easily infer that there are duplicated values in either case. Universal-specific codebook takes action at this point. But let’s ask the next question, “how damage the duplication cause?” There are two posibilities: i) the recognition is slow down, ii) the curse of dimensionality. Says we have 10 action classes and codebooks with size 1000 for each class. So the early fusion gives us a vector with 1000×10=10000 units! In the practical perspective, the curse of dimensionality is not well observed but through the overall performance of classifier. An elegant solution was proposed by [citation], in which they merge all the samples to generate a most common visual words. This amount has the high probability of occurrence in varous class instances. Then every class-specific codebook is generated indepently. An adaption process can optionaly applied between two kinds of codebook for better constraint.

Recently, Yang and Jurie (2008) proposed a way to unify codebook formation stage and classification stage. They claimed that these two processes are disconnected from each other. A iteration procedure is run to create codebook and

Battiato, S., Farinella, G., Gallo, G. & Ravi, D.
Scene categorization using bag of Textons on spatial hierarchy
#ICIP08#
2008, pp. 2536-2539

Dance, C., Willamowski, J., Fan, L., Bray, C. & Csurka, G.
Visual categorization with bags of keypoints
ECCV International Workshop on Statistical Learning in Computer Vision
2004

Gemert, J.C. van., Geusebroek, J.M., Veenman, C.J. & Smeulders, A.W.M.
Kernel Codebooks for Scene Categorization
European Conference on Computer Vision
2008

Jurie, F. & Triggs, B.
Creating Efficient Codebooks for Visual Recognition
ICCV
2005, pp. 604-610

Metzler, D.
Beyond bags of words: effectively modeling dependence and features in information retrieval
SIGIR Forum, 2008, Vol. 42(1), pp. 77

Monay, F., Quelhas, P., Odobez, J.-M. & Gatica-Perez, D.
Integrating Co-Occurrence and Spatial Contexts on PatchBased Scene Segmentation
CVPRW ‘06: Proceedings of the 2006 Conference on Computer Vision and Pattern Recognition Workshop
IEEE Computer Society, 2006, pp. 14

Nowak, E., Jurie, F. & Triggs, B.
Sampling strategies for bag-of-features image classification
In Proc. ECCV
Springer, 2006, pp. 490-503

Winn, J.M., Criminisi, A. & Minka, T.P.
Object Categorization by Learned Universal Visual Dictionary
ICCV
2005, pp. 1800-1807

Zhang, W., Surve, A., Fern, X. & Dietterich, T.G.
Learning non-redundant codebooks for classifying complex objects
ICML
2009, pp. 156

How to evaluate a clustering result?

In Research on June 28, 2009 at 8:13 am

Preface

According to Wikipedia,

Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics.

In this post I do not discuss about clustering algorithms but they way to evaluate a clustering result. My current problem is relevant to forming a codebook for visual categorization,  i.e to cluster a huge dataset (~ 6.525 million feature vectors) into clusters (visual words). After that, this codebook is used as a reference to vote into samples. In other word, this is exact the BoW method. The problem here is, how to know a clustering result is discriminative enough or not. Here I noted some idea from  Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.

K-means clustering

It is natural to talk about clustering by review the K-means algorithm:

  • A special case of a general procedure known as EM (Expectation Maximization)
  • Termination conditions:
    • A fixed number of iterations
    • Sample partition unchanged
    • Centroid positions unchanged (does the 2nd condition hold?)
  • Time complexity \Theta(lKmn)
    • l: iteration
    • K: number of clusters
    • n: number of samples
    • n: sample dimension

Evaluation of clustering

Internal evaluation

  • High intra-cluster similarity
  • Low inter-cluster similarity
  • Measured quality of a clustering depends on
    • sample representation (i.e how to represent descriptor efficiently from raw data)
    • similarity measure (i.e this post)

Comment: It seems that this kind of evaluation is not very meaningful. Instead of using it, I take the clustering result to use for another application and measure the application’s performance to decide the clustering is good or not.

External evaluation

In spite of unsupervised learning, clustering can benefits from some kinds of benchmark data/labeled data (if available).  Assumed that I have this benchmark data and I want to know whether clustering method and accompanied parameters is good. Following measures can be used:

Purity

purity(\Omega,C)=\frac{1}{N}\sum{max_j\left|\omega_k\cap c_j\right|}

in which, \Omega={\omega_1,\omega_2,...,\omega_K} is the set of clusters, C={c_1,c_2,...,c_J} is the set of classes.

Purity demonstrates how much a cluster contains different classes. The more classes a cluster has in itself, the less purity is. However, purity can be easily obtained in the case K=N.

Normalized Mutual Information

NMI(\Omega,C)=\frac{I(\Omega,C)}{(H(\Omega)+H(C))/2}

in which, I – mutual information, expressed as follows:

I(\Omega,C)=\sum_k{\sum_j{P(\omega_k\cap c_j)log(\frac{P(\omega_k\cap c_j)}{P(\omega_k)P(c_j)})}}=\sum_k{\sum_j{\frac{|\omega_k\cap c_j|}{N}log(\frac{N|\omega_k\cap c_j|}{|\omega_k||c_j|})}}

and H – entropy, expressed as follows:

H(\Omega)=-\sum_kP(\omega_k)logP(\omega_k)=-\sum_k\frac{|\omega_k|}{N}log\frac{\omega_k}{N}

Comment:  Mutual information allows us to gain information about the classes when given what the clusters are ( in the ideal case, clusters are exact classes). However, MI has the similar case as Purity when each cluster contains just one sample. Avoiding it, MI is divided by the denominator (H(\Omega)+H(C))/2. Entropy increases with the number of clusters. In case K=N, H(\Omega)=logN and therefore NMI is low. Interesting?

Accuracy  criterion (or Rand Index)

RI=\frac{TP+TN}{TP+FP+FN+TN}

Ones can use accuracy concept to apply for a clustering result: A true positive TP decision assigns two similar samples to the same cluster, a false positive FP decision assigns two similar samples to different clusters. The formula is quite simple:

For a more comformtable C/C++

In Programming on June 25, 2009 at 2:36 pm

C/C++ coding is often a nightmare with me because of its allocation/deallocation mechanism is delegated to programmer. But it is exact the way I like.  Flexibility is a two-blade knife, it can help you much and it can kill  you at once! Since I was a IT student, I rarely used C/C++ to write programs, and used Java instead. At that time, I adored Java and hated C/C++ because of Visual Studio. I had not known that there are so much cool IDEs that I can use free.

By now, I am on the way to become a PhD student and I like C/C++. However, using C/C++ in a right way is the problem.

The first thing I think, it is what kind of OS I should use? Linux!

The second thing is which IDE do I use? Eclipse. With Eclipse, I can manipulate makefile with supports from Eclipse, or write and maintain by myself. This point is quite important because when you deploy an application/your code on another computer/system, the most portable way is to use Makefile.

The next thing is how to use C/C++ in the most comformtable way? Use good external libraries to reduce time to develop or fix bugs.

The most important library is STL (sure!)

If you want to write programs with rich command line options (and you should write that), use the Boost library with Program_options package. Writing a GUI application is a tedious task and I personally think it is the bad way. Within several minutes, you can add numerous of functionalities into your program without doubt about where to place these buttons, these textboxes, etc. If your OS is Windows, please install Boost from this site. Do not waste time to install from Boost homepage, it is just good for Linux users and tutorials.

In fact, the more you depend on external libraries, the more problems you get when deploying on other computers. So, be thoughtful before decide to use them. On the next post, I might introduce about Blitz++.