Recently, I have come to think that the difference between how the human brain processes images and how current recognition systems do is a difference in philosophy. Starting from the remarkable work of Field and Olshausen on receptive fields in cortical cells, the vision research community has come to believe (or at least accept) that early vision functions in the visual pathway play an important role in recognizing objects and events. This exploration produced now well-known concepts such as filter banks, textons, shapelets, movemes, low-frequency shapes, and weak features. These features are then learnt by a classifier built on a state-of-the-art model (e.g., Boosting, Support Vector Machines, Conditional Random Fields, Bayesian hierarchical models). Depending on whether the learner is parametric or non-parametric, training yields either a parameter set or an exemplar set. Despite its remarkable successes, this paradigm of recognition still gets stuck on recognition from different viewpoints, intra-class variation, object occlusion (including self-occlusion), varied appearance, and multiple poses (in human action). So, is this the path future vision systems should follow?
Despite the dominance of recognition systems based on low-level features, some promising paradigms take a different view. The one I find most interesting is a system built on a huge database of examples and an extremely robust image-matching engine. How can such a system be built? The most notable research group pursuing this paradigm is at MIT CSAIL, and its most driven member is Antonio Torralba. His interest lies in exploiting the advantages of huge databases for recognition. His recent work, such as the spatial envelope, 80 Million Tiny Images, and SIFT flow, sketches a bright picture of what this second paradigm could become.

Consider what he did with SIFT flow. It is an image-matching technique that finds images that are semantically similar. Given a huge database covering many topics (e.g., streets, buildings, cars, people), SIFT flow can match a query image against the database to find the correct labels for it. Two requirements matter for such systems. First, the database must be prepared carefully and sensibly, and it should contain as many instances as possible. Second, the image-matching engine must generalize well: the more images the database contains, the more diverse their appearance, and if the engine cannot cope with that diversity, the system fails. The engine must also be fast; otherwise, comparing a query against many thousands of images will take hours. The obvious disadvantages of this second paradigm are that it is expensive and not portable. However, machine vision is still far from everyday applications in any case.
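The exemplar-based idea can be illustrated with a minimal sketch. It is not SIFT flow, which computes dense correspondences between images; instead it assumes each image has already been reduced to a fixed-length global descriptor (a hypothetical stand-in, for example a GIST-style vector) and labels a query by majority vote over its k nearest database exemplars. The names `transfer_label` and `euclidean` and the toy descriptor values are made up for illustration. The brute-force scan also makes the speed concern concrete: every query is compared against every database entry.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_label(query, database, k=3):
    """Label a query by majority vote among its k nearest exemplars.

    database: list of (descriptor, label) pairs. The descriptors here are
    hypothetical fixed-length vectors, not real SIFT flow features.
    """
    # Brute-force linear scan: cost grows linearly with database size,
    # which is why matching speed matters for very large databases.
    neighbors = sorted(database, key=lambda item: euclidean(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy database of 2-D "descriptors" (made-up values, for illustration only).
database = [
    ((0.9, 0.1), "street"),
    ((0.8, 0.2), "street"),
    ((0.1, 0.9), "building"),
    ((0.2, 0.8), "building"),
    ((0.15, 0.85), "building"),
]

print(transfer_label((0.85, 0.15), database))  # street
```

Replacing the linear scan with an approximate nearest-neighbor index is one common way to attack the speed problem; the sketch keeps the scan so the cost of naive matching stays visible.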
To the best of my knowledge, the learning paradigm has produced a vast and interesting body of literature of its own. After the artificial neural network craze, the artificial intelligence community cooled down. However, the marriage between traditional statistics and AI has inspired a new field: statistical machine learning. With Vapnik’s Support Vector Machine, built on the theoretical core idea of the VC dimension, learning theory came into its own. At the same time, probabilistic inference techniques have taken an important place in the current machine learning literature. Graphical models have inspired researchers to design elaborate models that express dependencies between objects in an image or video. The more tools we have, the more we can build. But the Great Wall of computer vision still stands, unmoved. How can a computer recognize objects from different points of view? How can a computer understand that an oak and a pine both belong to the tree class? Once again, people have begun to study inductive transfer learning. So far, however, multitask learning still lacks coherence.
On the other hand, people tend to forget about pattern recognition. At today’s major computer vision conferences, the share of submitted papers on pattern recognition is quite low. Admittedly, it is a hard topic to tackle: the difficulty lies in the fact that there is no specific pattern to deal with. The human brain thinks about objects in terms of somewhat coarse concepts. Whether these concepts are maintained by a set of informative features or by a huge set of exemplars is still controversial. But it is worthwhile for us to try every possibility. Pattern matching also requires seminal work on visual features. Whatever paradigm computer vision researchers choose to work with, vision and psychology researchers will continue their own work diligently.
Qui Nhon, Feb 17, 2009
Phong Vo