<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>All4Sci</title>
	<atom:link href="http://vodinhphong.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://vodinhphong.wordpress.com</link>
	<description>Computer Vision ideas....</description>
	<lastBuildDate>Tue, 25 Aug 2009 03:32:46 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='vodinhphong.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/ce86898c359d280bc36484bfc86c1198?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>All4Sci</title>
		<link>http://vodinhphong.wordpress.com</link>
	</image>
			<item>
		<title>Survive from Codecs hell</title>
		<link>http://vodinhphong.wordpress.com/2009/08/25/survive-from-codecs-hell/</link>
		<comments>http://vodinhphong.wordpress.com/2009/08/25/survive-from-codecs-hell/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 03:28:06 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Techniques]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=164</guid>
		<description><![CDATA[1. Introduction
Anyone who tackles video processing needs a minimal working knowledge of video formats, video encoding, and video decoding. As I moved from 2D object recognition to action/event recognition, it became clear that I would have to process a new kind of data: videos instead of images. So far I have used OpenCV for [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><h2>1. Introduction</h2>
<p>Anyone who tackles video processing needs a minimal working knowledge of video formats, video encoding, and video decoding. As I moved from 2D object recognition to action/event recognition, it became clear that I would have to process a new kind of data: videos instead of images. So far I have used OpenCV for almost every task and found it quite cool. Its auxiliary part, HighGUI, is not cool at all. HighGUI supports many image formats by relying on third-party libraries. In the Windows redistribution package, the image libraries are pre-compiled and shipped with the binaries, so this is transparent to the user. On Linux, libpng, libjpeg, libtiff, etc. have to be installed on the system; today these basic libraries are packaged with popular distributions such as Ubuntu and Fedora. The problem is not images but video. HighGUI handles video the same way it handles image codecs: it uses third-party libraries for video reading and writing. On Windows, HighGUI uses the Windows multimedia API. On Linux, HighGUI uses the popular FFMPEG encoding/decoding API. In recent versions (1.0, 1.1pre1, and the nightly builds), OpenCV has problems with FFMPEG, especially on Ubuntu and Red Hat; it seems to work properly on Fedora. Note that OpenCV is distributed in two ways: you can download and install it manually, or install it from the OS&#8217;s repository. If you choose the latter, it MIGHT work. The underlying problem is that FFMPEG was upgraded to version 0.5, in which some functions (e.g. img_convert) were deprecated and the directory structure of the header files was changed. Unfortunately, OpenCV HighGUI has not been updated accordingly. There are several tips for fixing the problem, but personally I could not get them to work. That&#8217;s why I wrote this tutorial.</p>
<p>Let us return to our problem: how do you read (and sometimes write) a video file? I assume that you want to complete the task at no cost (i.e. without buying commercial video software or hiring an expert to do it for you), so I am talking about the realm of open source. As mentioned above, this article is aimed at computer vision tasks. Therefore you only need to extract frames (no audio) and nothing more (no fancy effects, no post-processing, no subtitles, no anti-aliasing, etc.).</p>
<p>The first concept you have to know is &#8220;codecs&#8221;. In the following parts, I talk about how to convert between codecs and introduce some converter tools. For those who want to program with a video I/O API, the last part introduces a few shortcuts.</p>
<h2>2. Background</h2>
<h3>Codec definition</h3>
<p>A <strong>codec</strong> is a device or <a title="Computer program" href="http://en.wikipedia.org/wiki/Computer_program">computer program</a> capable of <a title="Encoder" href="http://en.wikipedia.org/wiki/Encoder">encoding</a> and/or <a title="Decoding methods" href="http://en.wikipedia.org/wiki/Decoding_methods">decoding</a> a <a title="Digital" href="http://en.wikipedia.org/wiki/Digital">digital</a> <a title="Data" href="http://en.wikipedia.org/wiki/Data">data</a> stream or <a title="Signal (information theory)" href="http://en.wikipedia.org/wiki/Signal_%28information_theory%29">signal</a>. The word <em>codec</em> is a <a title="Portmanteau" href="http://en.wikipedia.org/wiki/Portmanteau">portmanteau</a> of &#8216;<strong>co</strong>mpressor-<strong>dec</strong>ompressor&#8217; or, most accurately, &#8216;<strong>co</strong>der-<strong>dec</strong>oder&#8217;. A <strong>video <a title="Codec" href="http://en.wikipedia.org/wiki/Codec">codec</a></strong> is a device or <a title="Software" href="http://en.wikipedia.org/wiki/Software">software</a> that enables <a title="Video compression" href="http://en.wikipedia.org/wiki/Video_compression">video compression</a> and/or decompression for digital video. The compression usually employs <a title="Lossy data compression" href="http://en.wikipedia.org/wiki/Lossy_data_compression">lossy data compression</a>.</p>
<h3>Commonly used standards and codecs</h3>
<p><strong><a title="H.261" href="http://en.wikipedia.org/wiki/H.261">H.261</a></strong>: Used primarily in older videoconferencing and videotelephony products. It included such well-established concepts as YCbCr color representation, the 4:2:0 sampling format, 8-bit sample precision, 16&#215;16 macroblocks, block-wise <a title="Motion compensation" href="http://en.wikipedia.org/wiki/Motion_compensation">motion compensation</a>, 8&#215;8 block-wise <a title="Discrete cosine transform" href="http://en.wikipedia.org/wiki/Discrete_cosine_transform">discrete cosine transformation</a>, zig-zag coefficient scanning, <a title="Quantization" href="http://en.wikipedia.org/wiki/Quantization">scalar quantization</a>, run+value symbol mapping, and <a title="Huffman coding" href="http://en.wikipedia.org/wiki/Huffman_coding">variable-length coding</a>. H.261 supported only <a title="Progressive scan" href="http://en.wikipedia.org/wiki/Progressive_scan">progressive scan</a> video.</p>
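<p>To make two of the terms above concrete, here is a minimal sketch in plain C++ (no codec library involved) that computes the size of one raw 4:2:0 frame with 8-bit samples and the number of 16&#215;16 macroblocks needed to cover it. The struct and function names are mine, chosen for illustration only; the CIF frame size (352&#215;288) used in the closing comment is one of the picture formats of H.261.</p>
<pre class="brush: css;">
#include &lt;cstddef&gt;

// Hypothetical helper type for illustration: a frame size in pixels.
struct FrameSize
{
    int width;
    int height;
};

// Bytes in one raw 4:2:0 frame with 8-bit samples: a full-resolution
// luma plane (Y) plus two chroma planes (Cb, Cr), each subsampled by
// two horizontally and vertically. Odd dimensions are rounded up.
std::size_t Yuv420FrameBytes(FrameSize s)
{
    std::size_t luma = (std::size_t)s.width * s.height;
    std::size_t chroma = (std::size_t)((s.width + 1) / 2) * ((s.height + 1) / 2);
    return luma + 2 * chroma;
}

// Number of 16x16 macroblocks needed to cover the frame.
int MacroblockCount(FrameSize s)
{
    return ((s.width + 15) / 16) * ((s.height + 15) / 16);
}

// Example: for CIF (352x288), Yuv420FrameBytes gives 152064 bytes
// and MacroblockCount gives 396 macroblocks (22 columns by 18 rows).
</pre>
<p>At, say, 25 frames per second, raw CIF video would need about 3.8 MB per second, which is why video is almost always compressed.</p>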
<p><strong><a title="MPEG-1" href="http://en.wikipedia.org/wiki/MPEG-1">MPEG-1</a> Part 2</strong>: Used for <a title="Video CD" href="http://en.wikipedia.org/wiki/Video_CD">Video CDs</a>, and also sometimes for online video. In terms of technical design, the most significant enhancements in MPEG-1 relative to H.261 were half-pel and bi-predictive <a title="Motion compensation" href="http://en.wikipedia.org/wiki/Motion_compensation">motion compensation</a> support. MPEG-1 supports only <a title="Progressive scan" href="http://en.wikipedia.org/wiki/Progressive_scan">progressive scan</a> video.</p>
<p><strong><a title="MPEG-2" href="http://en.wikipedia.org/wiki/MPEG-2">MPEG-2</a> Part 2</strong>: Used on <a title="DVD" href="http://en.wikipedia.org/wiki/DVD">DVD</a>, <a title="SVCD" href="http://en.wikipedia.org/wiki/SVCD">SVCD</a>, and in most digital video broadcasting and cable distribution systems. When used on a standard DVD, it offers good picture quality and supports widescreen. In terms of technical design, the most significant enhancement in MPEG-2 relative to MPEG-1 was the addition of support for <a title="Interlace" href="http://en.wikipedia.org/wiki/Interlace">interlaced</a> video.</p>
<p><strong><a title="H.263" href="http://en.wikipedia.org/wiki/H.263">H.263</a></strong>: Used primarily for videoconferencing, videotelephony, and internet video. H.263 represented a significant step forward in standardized compression capability for <a title="Progressive scan" href="http://en.wikipedia.org/wiki/Progressive_scan">progressive scan</a> video.</p>
<p><strong><a title="MPEG-4" href="http://en.wikipedia.org/wiki/MPEG-4">MPEG-4</a> Part 2</strong>: An <a title="MPEG" href="http://en.wikipedia.org/wiki/MPEG">MPEG</a> standard that can be used for internet, broadcast, and on storage media. It offers improved quality relative to MPEG-2 and the first version of H.263. It also included some enhancements of compression capability, both by embracing capabilities developed in H.263 and by adding new ones such as quarter-pel <a title="Motion compensation" href="http://en.wikipedia.org/wiki/Motion_compensation">motion compensation</a>. Like MPEG-2, it supports both <a title="Progressive scan" href="http://en.wikipedia.org/wiki/Progressive_scan">progressive scan</a> and <a title="Interlace" href="http://en.wikipedia.org/wiki/Interlace">interlaced</a> video.</p>
<p><strong><a title="DivX" href="http://en.wikipedia.org/wiki/DivX">DivX</a></strong>, <strong><a title="Xvid" href="http://en.wikipedia.org/wiki/Xvid">Xvid</a></strong>, <strong><a title="FFmpeg" href="http://en.wikipedia.org/wiki/FFmpeg">FFmpeg</a> MPEG-4</strong> and <strong><a title="3ivx" href="http://en.wikipedia.org/wiki/3ivx">3ivx</a></strong>: Different implementations of MPEG-4 Part 2.</p>
<p><strong><a title="MPEG-4" href="http://en.wikipedia.org/wiki/MPEG-4">MPEG-4</a> Part 10</strong> (also known as H.264 or AVC): This emerging new standard is the current state of the art of <a title="ITU-T" href="http://en.wikipedia.org/wiki/ITU-T">ITU-T</a> and <a title="MPEG" href="http://en.wikipedia.org/wiki/MPEG">MPEG</a> standardized compression technology, and is rapidly gaining adoption into a wide variety of applications. It contains a number of significant advances in compression capability, and it has recently been adopted into a number of company products, including for example the <a title="XBOX 360" href="http://en.wikipedia.org/wiki/XBOX_360">XBOX 360</a>, <a title="PlayStation Portable" href="http://en.wikipedia.org/wiki/PlayStation_Portable">PlayStation Portable</a>, <a title="IPod" href="http://en.wikipedia.org/wiki/IPod">iPod</a>, <a title="IPhone" href="http://en.wikipedia.org/wiki/IPhone">iPhone</a>, the <a title="Nero Digital" href="http://en.wikipedia.org/wiki/Nero_Digital">Nero Digital</a> product suite, <a title="Mac OS X v10.4" href="http://en.wikipedia.org/wiki/Mac_OS_X_v10.4">Mac OS X v10.4</a>, as well as <a title="HD DVD" href="http://en.wikipedia.org/wiki/HD_DVD">HD DVD</a>/<a title="Blu-ray Disc" href="http://en.wikipedia.org/wiki/Blu-ray_Disc">Blu-ray Disc</a>.</p>
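<p><strong><a title="X264" href="http://en.wikipedia.org/wiki/X264">x264</a></strong>: A GPL-licensed implementation of the H.264 encoding standard.</p>
<p><strong><a title="VP6" href="http://en.wikipedia.org/wiki/VP6">VP6</a></strong>: A proprietary video codec developed by On2 Technologies.</p>
<p><strong><a title="Sorenson codec" href="http://en.wikipedia.org/wiki/Sorenson_codec">Sorenson codec</a></strong>: A codec used by Apple&#8217;s QuickTime and by Adobe Flash.</p>
<p><strong><a title="Theora" href="http://en.wikipedia.org/wiki/Theora">Theora</a></strong>: Developed by the Xiph.Org Foundation, based on On2&#8217;s VP3.</p>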
<p><strong><a title="WMV" href="http://en.wikipedia.org/wiki/WMV">WMV</a> (Windows Media Video)</strong>: <a title="Microsoft" href="http://en.wikipedia.org/wiki/Microsoft">Microsoft</a>&#8217;s family of video codec designs including WMV 7, WMV 8, and WMV 9. It can do anything from low resolution video for dial up internet users to <a title="High-definition television" href="http://en.wikipedia.org/wiki/High-definition_television">HDTV</a>.</p>
<p><strong><a title="VC-1" href="http://en.wikipedia.org/wiki/VC-1">VC-1</a></strong>: A SMPTE-standardized video codec based on Microsoft&#8217;s WMV 9.</p>
<p><strong><a title="RealVideo" href="http://en.wikipedia.org/wiki/RealVideo">RealVideo</a></strong>: Developed by <a title="RealNetworks" href="http://en.wikipedia.org/wiki/RealNetworks">RealNetworks</a>.</p>
<p><strong><a title="Cinepak" href="http://en.wikipedia.org/wiki/Cinepak">Cinepak</a></strong>: A very early codec used by Apple&#8217;s QuickTime.</p>
<p><strong><a title="Huffyuv" href="http://en.wikipedia.org/wiki/Huffyuv">Huffyuv</a></strong>: Huffyuv (or HuffYUV) is a very fast, lossless Win32 video codec written by Ben Rudiak-Gould and published under the terms of the GPL as free software, meant to replace uncompressed YCbCr as a video capture format.</p>
<h3>Container</h3>
<p>A <strong>container</strong> or <strong>wrapper format</strong> is a <a title="File format" href="http://en.wikipedia.org/wiki/File_format">file format</a>, or often a <a title="Stream (computing)" href="http://en.wikipedia.org/wiki/Stream_%28computing%29">stream</a> format (the stream need not be stored as a file), whose specification describes only how data and metadata are stored within the file (but <em>not</em> how they are coded). The container itself neither implies nor specifies any particular encoding of the data it holds.</p>
<h3>Video container</h3>
<p>Simple container formats can contain different types of <a title="Audio codec" href="http://en.wikipedia.org/wiki/Audio_codec">audio codecs</a>, while more advanced container formats can support multiple audio and video streams, <a title="Subtitle (captioning)" href="http://en.wikipedia.org/wiki/Subtitle_%28captioning%29">subtitles</a>, chapter-information, and meta-data (<a title="Tag (metadata)" href="http://en.wikipedia.org/wiki/Tag_%28metadata%29">tags</a>) — along with the synchronization information needed to play back the various streams together.</p>
<p>Some containers are exclusive to audio:</p>
<ul>
<li><a title="Audio Interchange File Format" href="http://en.wikipedia.org/wiki/Audio_Interchange_File_Format">AIFF</a> (IFF file format, widely used on <a title="Mac OS" href="http://en.wikipedia.org/wiki/Mac_OS">Mac OS</a> platform)</li>
<li><a title="WAV" href="http://en.wikipedia.org/wiki/WAV">WAV</a> (<a title="Resource Interchange File Format" href="http://en.wikipedia.org/wiki/Resource_Interchange_File_Format">RIFF</a> file format, widely used on <a title="Microsoft Windows" href="http://en.wikipedia.org/wiki/Microsoft_Windows">Windows</a> platform)</li>
</ul>
<p>Other containers are exclusive to still images:</p>
<ul>
<li><a title="Tagged Image File Format" href="http://en.wikipedia.org/wiki/Tagged_Image_File_Format">TIFF</a> (Tagged Image File Format) is a wrapper file format for still images and associated metadata.</li>
</ul>
<p>Other flexible containers can hold many types of audio and video, as well as other media. The most popular multi-media containers are:</p>
<ul>
<li><a title="3GP" href="http://en.wikipedia.org/wiki/3GP">3GP</a> (used by many mobile phones, based on the ISO base media file format defined in MPEG-4 Part 12)</li>
<li><a title="Advanced Systems Format" href="http://en.wikipedia.org/wiki/Advanced_Systems_Format">ASF</a> (standard container for Microsoft <a title="Windows Media Audio" href="http://en.wikipedia.org/wiki/Windows_Media_Audio">WMA</a> and <a title="Windows Media Video" href="http://en.wikipedia.org/wiki/Windows_Media_Video">WMV</a>)</li>
<li><a title="Audio Video Interleave" href="http://en.wikipedia.org/wiki/Audio_Video_Interleave">AVI</a> (the standard <a title="Microsoft Windows" href="http://en.wikipedia.org/wiki/Microsoft_Windows">Microsoft Windows</a> container, also based on <a title="Resource Interchange File Format" href="http://en.wikipedia.org/wiki/Resource_Interchange_File_Format">RIFF</a>)</li>
<li><a title="QuickTime" href="http://en.wikipedia.org/wiki/QuickTime#QuickTime_file_format">MOV</a> (standard <a title="QuickTime" href="http://en.wikipedia.org/wiki/QuickTime">QuickTime</a> video container from <a title="Apple Inc." href="http://en.wikipedia.org/wiki/Apple_Inc.">Apple Inc.</a>)</li>
<li><a title="MPEG program stream" href="http://en.wikipedia.org/wiki/MPEG_program_stream">MPEG program stream</a> (standard container for MPEG-1 and MPEG-2 <a title="Elementary stream" href="http://en.wikipedia.org/wiki/Elementary_stream">elementary streams</a>)</li>
<li><a title="MPEG transport stream" href="http://en.wikipedia.org/wiki/MPEG_transport_stream">MPEG-2 transport stream</a> (TS) (a.k.a. MPEG-TS) (standard container for digital broadcasting; typically contains multiple video and audio streams)</li>
<li><a title="MPEG-4 Part 14" href="http://en.wikipedia.org/wiki/MPEG-4_Part_14">MP4</a> (standard audio and video container for the <a title="MPEG-4" href="http://en.wikipedia.org/wiki/MPEG-4">MPEG-4</a> multimedia portfolio, based on the ISO base media file format defined in MPEG-4 Part 12)</li>
<li><a title="RealMedia" href="http://en.wikipedia.org/wiki/RealMedia">RealMedia</a> (standard container for <a title="RealVideo" href="http://en.wikipedia.org/wiki/RealVideo">RealVideo</a> and <a title="RealAudio" href="http://en.wikipedia.org/wiki/RealAudio">RealAudio</a>)</li>
</ul>
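<p>Since a container only wraps the encoded streams, the file extension tells you little, and many containers can be recognised from the first few bytes of the file. The sketch below checks a handful of well-known signatures for some of the formats listed above. It is an illustration only, not how FFMPEG or HighGUI probe files: the enum and function names are hypothetical, only a few formats are covered, and knowing the container still says nothing about which codec is inside.</p>
<pre class="brush: css;">
#include &lt;cstddef&gt;
#include &lt;cstring&gt;

// Hypothetical names for illustration; this is not an FFMPEG or OpenCV API.
enum ContainerKind
{
    CONTAINER_UNKNOWN,
    CONTAINER_AVI,        // RIFF file with form type &quot;AVI &quot;
    CONTAINER_WAV,        // RIFF file with form type &quot;WAVE&quot;
    CONTAINER_ISO_MEDIA,  // MP4, 3GP and MOV files that start with an &quot;ftyp&quot; box
    CONTAINER_MPEG_PS,    // MPEG program stream (pack start code 00 00 01 BA)
    CONTAINER_REALMEDIA   // RealMedia (&quot;.RMF&quot;)
};

// Guess the container from the first bytes of a file.
// 'data' points to the start of the file, 'size' is the number of bytes available.
ContainerKind GuessContainer(const unsigned char* data, std::size_t size)
{
    if (size &gt;= 12 &amp;&amp; std::memcmp(data, &quot;RIFF&quot;, 4) == 0)
    {
        if (std::memcmp(data + 8, &quot;AVI &quot;, 4) == 0) return CONTAINER_AVI;
        if (std::memcmp(data + 8, &quot;WAVE&quot;, 4) == 0) return CONTAINER_WAV;
    }
    if (size &gt;= 8 &amp;&amp; std::memcmp(data + 4, &quot;ftyp&quot;, 4) == 0)
        return CONTAINER_ISO_MEDIA;
    if (size &gt;= 4 &amp;&amp; data[0] == 0x00 &amp;&amp; data[1] == 0x00 &amp;&amp;
        data[2] == 0x01 &amp;&amp; data[3] == 0xBA)
        return CONTAINER_MPEG_PS;
    if (size &gt;= 4 &amp;&amp; std::memcmp(data, &quot;.RMF&quot;, 4) == 0)
        return CONTAINER_REALMEDIA;
    return CONTAINER_UNKNOWN;
}
</pre>
<p>In practice you would read the first dozen bytes of the file with fread and pass them to GuessContainer; a result of CONTAINER_AVI still leaves open whether the video inside is XviD, huffyuv, or something else.</p>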
<h2>3. Codec software and libraries</h2>
<p>How can we watch high-definition movies on a PC or watch YouTube on a cellphone? Quite often we have to install a media player, e.g. Windows Media Player, RealPlayer, QuickTime, SMPlayer, KMPlayer, Media Player Classic, or VLC. Sometimes the video codecs are already supported by the OS, and we do not need to do anything. Commercial multimedia players have their own codec components, which can be developed independently or built on popular multimedia APIs. Some typical codec packages and frameworks are FFMPEG, VFW, DirectShow, and MEncoder.</p>
<p>On Linux, almost all players and multimedia editors are based on FFMPEG. It is a universal codec library that supports nearly every codec standard. Furthermore, FFMPEG is open source (LGPL, with some optional GPL components) and well documented, so it is quite easy to program against. FFMPEG ships with three major libraries: libavcodec, responsible for decoding/encoding video and audio; libavformat, responsible for detecting and reading video/audio streams from containers; and libswscale, for software scaling and pixel format conversion. FFMPEG is also a powerful command-line converter, so one can use it either as a tool or as an external library. It is a good idea to write your own player using FFMPEG.</p>
<p>On Windows, players can be developed in many ways. The first way is to use Microsoft&#8217;s DirectX SDK. Historically, VFW (Video for Windows) was the first video API, introduced for Windows 3.1. It was later integrated into DirectX 5. Over the years, video encoding/decoding was separated into DirectShow, a component of DirectX. DirectShow provides a comfortable environment in which third parties can develop their own multimedia applications and video filters. Clearly, DirectShow is a comprehensive but complex API. The second way to develop video players/converters on Windows is to write them from scratch (e.g. SMPlayer with MEncoder, VLC) or to utilize other sources (e.g. FFMPEG) without depending on DirectX.</p>
<p>Other OSs have other codec standards and libraries! It is easy to see that the number of codec libraries and tools is so large that people call it &#8216;codec hell&#8217;. The key point to remember is that there is no consistent way to deal with video codecs across operating systems. Consequently, we have to use various strategies to satisfy our needs. For those who just want to use a tool, the task is quite straightforward. In this section I introduce some useful tools.</p>
<h3>In Windows</h3>
<p><a href="http://www.virtualdub.org/"><strong>VirtualDub</strong></a> [open source, GUI] is a video capture/processing utility for 32-bit Windows platforms (95/98/ME/NT4/2000/XP), licensed under the <a href="http://www.virtualdub.org/gpl.html">GNU General Public License (GPL)</a>. It has batch-processing capabilities for processing large numbers of files, supports frame editing, and can be extended with <a href="http://www.virtualdub.org/virtualdub_filters.html">third-party video filters</a>. VirtualDub is mainly geared toward processing AVI files, although it can read (not write) MPEG-1 and also handle sets of BMP images. In order to read MPEG-2 video, install this <a href="http://home.comcast.net/~fcchandler/Plugins/MPEG2/index.html" target="_blank">plugin</a>. VirtualDub does not provide rich command-line options.</p>
<p><strong>SUPER</strong> [freeware, GUI] (Simplified Universal Player Encoder &amp; Renderer) is a GUI front end to FFmpeg, MEncoder, MPlayer, x264, musepack, monkey&#8217;s audio, true audio, wavpack, ffmpeg2theora and the theora/vorbis RealProducer plugin. It does not provide command-line options.</p>
<p><a href="http://www.mplayerhq.hu/design7/dload.html"><strong>MEncoder</strong> </a>[open source, cmd] is a free command line video decoding, encoding and filtering tool released under the GNU General Public License. It is a close sibling to <a title="MPlayer" href="http://en.wikipedia.org/wiki/MPlayer">MPlayer</a> and can convert all the formats that MPlayer understands into a variety of compressed and uncompressed formats using different codecs.</p>
<h3>In Linux</h3>
<p><a href="http://ffmpeg.org/"><strong>FFmpeg</strong> </a>[open source, cmd] is a complete, cross-platform solution to record, convert and stream audio and video. It includes<strong> libavcodec</strong> &#8211; the leading audio/video codec library. FFmpeg provides very powerful command line options.</p>
<h2>4. Video I/O programming</h2>
<p>If you need to work with frames on the fly or in a real-time application, you have to work with an API. In particular, I discuss the OpenCV and FFMPEG functionality. Why these two? The answer is straightforward: OpenCV is an outstanding computer vision library, and FFMPEG is a mature, well-documented, and stable codec library. OpenCV can call FFMPEG itself, but in some circumstances (as described above) we need to know what really happens behind the scenes. Where a good library such as FFMPEG is not available (e.g. on Windows), the HighGUI interface needs to be well understood so that we can eliminate the difficulties caused by codec hell.</p>
<h3>4.1 In Windows</h3>
<p>There are generally two ways to achieve the goal. The first one is to use the DirectShow API from the DirectX SDK. This is useful if you want to do complex tasks, and there are tons of documentation and tutorials on using DirectShow alone or together with OpenCV. Here I take the route of minimal effort that still lets an application decode a video effectively.</p>
<p>On Windows, HighGUI provides an interface function, cvCaptureFromFile, to open a video file. At the lower level, it communicates with the VFW (Video for Windows) API to decode the video. VFW is an old video decoding/encoding API that dates back to Windows 3.1; it was later integrated into DirectX 5 and DirectShow. Consequently, VFW is an outdated API, and few codecs support it. Fortunately, some external codec packages provide VFW-compliant video codecs. The most reliable one I know is the K-Lite Codec Pack. Note that some packages can decode almost every codec, but their codecs might not be available for other programs to rely on. For instance, installing KMPlayer or SMPlayer does not solve the codec problem of OpenCV. Therefore, in order to read an MPEG-1 video file, download the latest version of the K-Lite Codec Pack Mega package from its website. Remember that only the Mega package provides VFW codecs. One more note: during installation, check the VFW codecs option and select the video codecs that you want OpenCV to support.</p>
<p>Apparently, VFW does not support very many codecs. The supported codecs are:</p>
<ul>
<li>XviD [version 1.2.2] &#8211; an implementation of MPEG-4</li>
<li>DivX [version 6.8.5] &#8211; an implementation of MPEG-4</li>
<li>x264 [revision 1145]</li>
<li>On2 VP6 [version 6.4.2.0]</li>
<li>On2 VP7 [version 7.0.10.0]</li>
<li>Intel Indeo 4 [version 4.51.16.2]</li>
<li>Intel Indeo 5 [version 5.2562.15.54]</li>
<li>Intel I.263 [version 2.55.1.16]</li>
<li>huffyuv [version 2.1.1 CCE Patch 0.2.5] &#8211; free lossless codec</li>
<li>DivX [version 3.11] &#8211; an implementation of MPEG-4</li>
<li>YV12 (Helix) [version 1.2]</li>
</ul>
<p>If you want to read an MPEG-2 video file, the quickest way is to convert it to one of the above formats. The recommended tool on Windows is SUPER, and on Linux it is FFMPEG, e.g.: $ ffmpeg -i video.mpeg -vcodec mpeg4 video.avi</p>
<p>As a final word, once the VFW codecs are installed, feel free to use cvCaptureFromFile to open a video file. However, if you want to write a video file, there is only one way: write it as uncompressed video (raw format), i.e. with CV_FOURCC_DEFAULT.</p>
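<p>HighGUI (like AVI files themselves) identifies a codec by a four-character code (FOURCC), which is simply four ASCII characters packed into a 32-bit integer, first character in the lowest byte; this is the layout that OpenCV&#8217;s CV_FOURCC macro produces. The sketch below reproduces that packing in plain C++ so that it can be tried without OpenCV. The function names MakeFourcc and FourccToString are mine and are not part of OpenCV.</p>
<pre class="brush: css;">
#include &lt;string&gt;

// Pack four characters into a 32-bit code, first character in the
// lowest byte (the same layout as OpenCV's CV_FOURCC macro).
// MakeFourcc is a hypothetical name used for illustration.
unsigned int MakeFourcc(char c1, char c2, char c3, char c4)
{
    return  ((unsigned int)(unsigned char)c1)
         | (((unsigned int)(unsigned char)c2) &lt;&lt; 8)
         | (((unsigned int)(unsigned char)c3) &lt;&lt; 16)
         | (((unsigned int)(unsigned char)c4) &lt;&lt; 24);
}

// Unpack a code into a readable string, e.g. for log messages.
std::string FourccToString(unsigned int fourcc)
{
    std::string s(4, ' ');
    for (int i = 0; i &lt; 4; ++i)
        s[i] = (char)((fourcc &gt;&gt; (8 * i)) &amp; 0xFF);
    return s;
}
</pre>
<p>For example, MakeFourcc('X','V','I','D') yields 0x44495658, the code used by the XviD codec from the list above.</p>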
<h3>4.2 In Linux</h3>
<p>There are a couple of tutorials on programming with FFMPEG. As FFMPEG has released newer versions, some of these tutorials have become out of date. After some searching, I found an up-to-date example that opens a video file and reads it frame by frame in a while-loop. I turned this simple example into a C++ class in which Open() and NextFrame() are called to open a video file and read the next frame. The class is called Video, but you can change it to any name you prefer. The code compiles under GCC 4.xx on Ubuntu Linux with FFMPEG 0.5.</p>
<pre class="brush: css;">
/*
 * video.h
 *
 *  Created on: Jul 4, 2009
 *      Author: phong
 */

#ifndef VIDEO_H_
#define VIDEO_H_
#ifdef HAVE_LINUX_FFMPEG
extern &quot;C&quot;
{
	#include &lt;libavcodec/avcodec.h&gt;
	#include &lt;libavformat/avformat.h&gt;
	#include &lt;libswscale/swscale.h&gt;
}

#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdbool.h&gt;

/**
 * @brief Video class used to decode all type
 * of codecs. Dedicated to Linux OS.
 * This class is an interface between high level
 * function call and FFMPEG API.
 *
 * @note Not available in Windows
 */
struct Video
{
	AVFormatContext *pFormatCtx;
	int             i, videoStream;
	AVCodecContext  *pCodecCtx;
	AVCodec         *pCodec;
	AVFrame         *pFrame;
	AVFrame         *pFrameRGB;
	AVPacket        packet;
	int             frameFinished;
	int             numBytes;
	uint8_t         *buffer;

	int width;
	int height;
	int step;

	Video ():
		pFormatCtx(0), pCodecCtx(0), pCodec(0),
		pFrame(0), pFrameRGB(0), buffer(0)
		{};

	int SaveFrame(AVFrame *pFrame, int width, int height, int iFrame);

	int Open (const char* filename);

	int NextFrame ();

	int IsEnd ();

	int Close ();
};

#endif
#endif /* VIDEO_H_ */
</pre>
<pre class="brush: css;">
/*
 * video.cpp
 *
 *  Created on: Jul 5, 2009
 *      Author: phong
 */
#include &quot;video.h&quot;

// avcodec_sample.0.5.0.c

// A small sample program that shows how to use libavformat and libavcodec to
// read video from a file.
//
// This version is for the 0.4.9+ release of ffmpeg. This release adds the
// av_read_frame() API call, which simplifies the reading of video frames
// considerably.
//
// Use
//
// gcc -o avcodec_sample.0.5.0 avcodec_sample.0.5.0.c -lavformat -lavcodec -lavutil -lswscale -lz -lbz2
//
// to build (assuming libavformat, libavcodec, libavutil, and swscale are correctly installed on
// your system).
//
// Run using
//
// avcodec_sample.0.5.0 myvideofile.mpg
//
// to write the first five frames from &quot;myvideofile.mpg&quot; to disk in PPM
// format.
#ifdef HAVE_LINUX_FFMPEG
int Video::Open (const char* filename)
{

    // Register all formats and codecs
    av_register_all();

    // Open video file
    if(av_open_input_file(&amp;pFormatCtx, filename, NULL, 0, NULL)!=0)
    {
    	fprintf (stderr, &quot;Could not open file\n&quot;);
    	return 0;
    }
    // Retrieve stream information
    if(av_find_stream_info(pFormatCtx)&lt;0)
    {
    	fprintf (stderr, &quot;Could not find stream information\n&quot;);
    	return 0;
    }

    // Dump information about file onto standard error
    dump_format(pFormatCtx, 0, filename, false);

    // Find the first video stream
    videoStream=-1;
    for(i=0; i&lt;pFormatCtx-&gt;nb_streams; i++)
        if(pFormatCtx-&gt;streams[i]-&gt;codec-&gt;codec_type==CODEC_TYPE_VIDEO)
        {
            videoStream=i;
            break;
        }
    if(videoStream==-1)
    {
    	fprintf (stderr, &quot;Did not find a video stream\n&quot;);
    	return 0;
    }

    // Get a pointer to the codec context for the video stream
    pCodecCtx=pFormatCtx-&gt;streams[videoStream]-&gt;codec;

    // Find the decoder for the video stream
    pCodec=avcodec_find_decoder(pCodecCtx-&gt;codec_id);
    if(pCodec==NULL)
    {
    	fprintf (stderr, &quot;Codec not found\n&quot;);
    	return 0;
    }

    // Open codec
    if(avcodec_open(pCodecCtx, pCodec)&lt;0)
    {
    	fprintf (stderr, &quot;Could not open codec\n&quot;);
    	return 0;
    }

    // Hack to correct wrong frame rates that seem to be generated by some codecs
    if(pCodecCtx-&gt;time_base.num&gt;1000 &amp;&amp; pCodecCtx-&gt;time_base.den==1)
		pCodecCtx-&gt;time_base.den=1000;

    // Allocate video frame
    pFrame=avcodec_alloc_frame();

    // Allocate an AVFrame structure
    pFrameRGB=avcodec_alloc_frame();
    if(pFrame==NULL || pFrameRGB==NULL)
    {
    	fprintf(stderr, &quot;Could not allocate memory\n&quot;);
    	return 0;
    }

    // Determine required buffer size and allocate buffer
    numBytes=avpicture_get_size(PIX_FMT_RGB24, pCodecCtx-&gt;width,
        pCodecCtx-&gt;height);

    buffer = (uint8_t*)malloc(numBytes);

    // Assign appropriate parts of buffer to image planes in pFrameRGB
    avpicture_fill((AVPicture *)pFrameRGB, buffer, PIX_FMT_RGB24,
        pCodecCtx-&gt;width, pCodecCtx-&gt;height);

    width = pCodecCtx-&gt;width;
    height = pCodecCtx-&gt;height;
    step = pFrameRGB-&gt;linesize[0];

    return 1;
}

int Video::NextFrame()
{
    // Read packets until one complete video frame has been decoded

    while(av_read_frame(pFormatCtx, &amp;packet)&gt;=0)
    {
        frameFinished = 0;

        // Is this a packet from the video stream?
        if(packet.stream_index==videoStream)
        {
            // Decode video frame
            avcodec_decode_video(pCodecCtx, pFrame, &amp;frameFinished,
                packet.data, packet.size);
        }

        // No complete frame yet (or not a video packet): release the
        // packet allocated by av_read_frame and read the next one
        if(!frameFinished)
        {
            av_free_packet(&amp;packet);
            continue;
        }

        // Static, so this context is shared by all Video instances and
        // assumes they have the same frame size and pixel format
        static struct SwsContext *img_convert_ctx;

        // Create the context that converts decoded frames to RGB24
        if(img_convert_ctx == NULL) {
            int w = pCodecCtx-&gt;width;
            int h = pCodecCtx-&gt;height;

            img_convert_ctx = sws_getContext(w, h,
                            pCodecCtx-&gt;pix_fmt,
                            w, h, PIX_FMT_RGB24, SWS_BICUBIC,
                            NULL, NULL, NULL);
            if(img_convert_ctx == NULL) {
                fprintf(stderr, &quot;Cannot initialize the conversion context!\n&quot;);
                av_free_packet(&amp;packet);
                return 0;
            }
        }
        int ret = sws_scale(img_convert_ctx, pFrame-&gt;data, pFrame-&gt;linesize, 0,
                  pCodecCtx-&gt;height, pFrameRGB-&gt;data, pFrameRGB-&gt;linesize);

        // The frame has been converted, so the packet can be released
        av_free_packet(&amp;packet);

        if (ret &gt; 0)
            return 1;
        else
        {
            fprintf (stderr, &quot;Sws_Scale failed\n&quot;);
            return 0;
        }
    }

    return 0;
}

int Video::Close()
{
	if (buffer == 0 || pFrameRGB == 0 || pFrame == 0)
		return 0;

    // Free the RGB image
    free(buffer);
    av_free(pFrameRGB);

    // Free the YUV frame
    av_free(pFrame);

    // Close the codec
    avcodec_close(pCodecCtx);

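    // Close the video file
    av_close_input_file(pFormatCtx);

    // Reset the pointers so that a second Close() returns early
    buffer = 0; pFrameRGB = 0; pFrame = 0; pCodecCtx = 0; pFormatCtx = 0;

	return 1;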
}

int Video::SaveFrame(AVFrame *pFrame, int width, int height, int iFrame)
{
    FILE *pFile;
    char szFilename[32];
    int  y;

    // Open file
    sprintf(szFilename, &quot;frame%d.ppm&quot;, iFrame);
    pFile=fopen(szFilename, &quot;wb&quot;);
    if(pFile==NULL)
        return 1;

    // Write header
    fprintf(pFile, &quot;P6\n%d %d\n255\n&quot;, width, height);

    // Write pixel data
    for(y=0; y&lt;height; y++)
        fwrite(pFrame-&gt;data[0]+y*pFrame-&gt;linesize[0], 1, width*3, pFile);

    // Close file
    return fclose(pFile);
}
#endif
</pre>
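<p>The row-by-row loop in SaveFrame matters because the rows of a decoded frame can be padded: linesize[0] may be larger than width*3. Here is a minimal Python sketch of the same idea. The name ppm_bytes and its arguments are hypothetical and are not part of the Video class above.</p>

```python
def ppm_bytes(data, width, height, stride):
    """Pack a row-padded RGB24 buffer into a binary PPM (P6) image."""
    header = "P6\n{} {}\n255\n".format(width, height).encode("ascii")
    rows = []
    for y in range(height):
        start = y * stride                          # rows begin every `stride` bytes
        rows.append(data[start:start + width * 3])  # keep pixels, drop padding
    return header + b"".join(rows)


# A 2x2 image whose rows are padded from 6 to 8 bytes (padding value 9)
padded = bytes([255, 0, 0, 0, 255, 0, 9, 9,
                0, 0, 255, 255, 255, 255, 9, 9])
image = ppm_bytes(padded, width=2, height=2, stride=8)
```

<p>Writing width*height*3 bytes in one call would only be correct when stride equals width*3.</p>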
<p>To open a video, read it frame by frame, and optionally wrap each raw frame in an IplImage structure (OpenCV), I use the code snippet below:</p>
<pre class="brush: css;">
/*
 * videotest.cpp
 *
 *  Created on: Jul 13, 2009
 *      Author: phong
 */
#include &quot;video.h&quot;
#include &lt;iostream&gt;
#include &lt;opencv/cxcore.h&gt;
#include &lt;opencv/cv.h&gt;
#include &lt;opencv/highgui.h&gt;

using namespace std;

int videotest (int argc, char* argv[])
{
 if (argc &lt; 2)
 {
 cout &lt;&lt; &quot;Usage: &lt;videotest&gt; &lt;videofilename&gt;&quot; &lt;&lt; endl;
 return 0;
 }

 Video video;
 if (!video.Open(argv[1]))
 {
 cout &lt;&lt; &quot;Could not open video file&quot; &lt;&lt; endl;
 return 1;
 }

 cvNamedWindow (&quot;videotest&quot;);

 IplImage frame;
 cvInitImageHeader (&amp;frame, cvSize(video.width,video.height), IPL_DEPTH_8U, 3);
 IplImage* grayscale = cvCreateImage (cvGetSize(&amp;frame), IPL_DEPTH_8U, 1);

 while (video.NextFrame())
 {
 cvSetData (&amp;frame, video.pFrameRGB-&gt;data[0], video.pFrameRGB-&gt;linesize[0]);
 cvCvtColor (&amp;frame, grayscale, CV_RGB2GRAY);

 cvShowImage (&quot;videotest&quot;, grayscale);
 cvWaitKey (10);
 }

 video.Close();
 cvReleaseImage (&amp;grayscale);

 return 1;
}
</pre>
<h2>5. Last word</h2>
<p>I will update this tutorial to cover both encoding and decoding of video formats with ffmpeg. For now, only decoding is available. Personally, I think it is not a good idea to depend on a proxy library, such as OpenCV, in order to use another library. This tutorial can help you avoid that. Anyway, use it if you like.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2009/08/25/survive-from-codecs-hell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>
	</item>
		<item>
		<title>HTK Training</title>
		<link>http://vodinhphong.wordpress.com/2009/07/19/htk-training/</link>
		<comments>http://vodinhphong.wordpress.com/2009/07/19/htk-training/#comments</comments>
		<pubDate>Sun, 19 Jul 2009 11:24:55 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=168</guid>
		<description><![CDATA[Motivation
Those of us taking the ASR course have struggled with HTK training for more than a month. The problem is not hard, but it takes care, patience, and guidance from our teacher, Mr. Hạ. After finishing the training at school, I thought I should build a tool, first of all to serve [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vodinhphong.wordpress.com&blog=5194527&post=168&subd=vodinhphong&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><h2>Motivation</h2>
<p>Those of us taking the ASR course have struggled with HTK training for more than a month. The problem is not hard, but it takes care, patience, and guidance from our teacher, Mr. Hạ. After finishing the training at school, I thought I should build a tool, first of all for myself, and for anyone who has not finished training HTK yet. With only a day and a half to "publish" this so that all of us can still submit our assignments next Saturday, I went back through the pile of commands and looked further into what they mean. What follows comes from Mr. Hạ's instructions and from the mistakes my friends and I made. I also explain some points based on my reading of the HTK Book. There are many things I do not fully understand yet, and my clumsy translations of the terminology will surely leave you with questions. Still, I hope you will help improve this tutorial, so that we can grind a little less and next year's students suffer less. Thanks for your support.</p>
<h2>General information</h2>
<h3>1. Directory structure</h3>
<p>Create a top-level directory with any name you like (this article assumes everything lives directly under the root C:\, although that is not recommended).</p>
<p>The subdirectories are:</p>
<ul>
<li>hmm0 to hmm15: directories that hold the MMF files.</li>
<li>cfg (config): configuration files for some of the commands.</li>
<li>ins (instruction): the .hed and .led files.</li>
<li>mlf (master label file): the .mlf files.</li>
<li>ph (phones): the phone list files: mono, tri.</li>
<li>pl (Perl script): the scripts written in Perl.</li>
<li>txt (other files): miscellaneous files such as the dictionary, file lists, wdnet, gram, train…</li>
<li>wave: the Wave and mfcc files.</li>
</ul>
<h3>2. A few notes</h3>
<ul>
<li>Remember to set the environment variables for HTKTools and Perl.</li>
<li>All instructions in this article assume the audio files have already been recorded.</li>
<li>The process is semi-automatic: you have to type the commands that run the Perl files.</li>
<li>The yourproject directory contains the minimal starting files you need to begin following the instructions.</li>
<li>The sample directory contains the files after all the steps in this article have been completed.</li>
<li>You can use allin1.bat (all in one) to run the commands automatically.</li>
<li>If you find an error, please let everyone else know and, if possible, send mail to <a href="mailto:phongkhtn@yahoo.com">phongkhtn@yahoo.com</a>. Many thanks.</li>
</ul>
<h2>Data preparation</h2>
<h3>1. Creating the grammar</h3>
<p>The grammar is a general directed graph. It contains the sentence structures that can occur in the context of the application where we want to use ASR. For example, if we want to apply ASR to stock trading, the grammar must contain the words, and the sentences built from those words, designed so that it can represent every sentence pattern a person in that context might say.</p>
<p>The grammar file:</p>
<blockquote><p>//gram.txt</p>
<p>$digit = moojt | hai | ba | boosn | nawm | sasu | bary | tasm | chisn | khoong;</p>
<p>(&lt;$digit&gt;)</p></blockquote>
<p align="center">
<p>HParse compiles the grammar into a word network:</p>
<p>C:\&gt;HParse txt/gram.txt txt/wdnet.txt</p>
<p>The resulting word network uses the same format as a lattice file:</p>
<blockquote><p>//wdnet.txt</p>
<p>VERSION=1.0</p>
<p>N=13   L=31</p>
<p>I=0    W=khoong</p>
<p>I=1    W=!NULL</p>
<p>I=2    W=chisn</p>
<p>I=3    W=tasm</p>
<p>I=4    W=bary</p>
<p>I=5    W=sasu</p>
<p>I=6    W=nawm</p>
<p>I=7    W=boosn</p>
<p>I=8    W=ba</p>
<p>I=9    W=hai</p>
<p>I=10   W=moojt</p>
<p>I=11   W=!NULL</p>
<p>I=12   W=!NULL</p>
<p>J=0     S=1    E=0</p>
<p>J=1     S=12   E=0</p>
<p>J=2     S=0    E=1</p>
<p>J=3     S=2    E=1</p>
<p>J=4     S=3    E=1</p>
<p>J=5     S=4    E=1</p>
<p>J=6     S=5    E=1</p>
<p>J=7     S=6    E=1</p>
<p>J=8     S=7    E=1</p>
<p>J=9     S=8    E=1</p>
<p>J=10    S=9    E=1</p>
<p>J=11    S=10   E=1</p>
<p>J=12    S=1    E=2</p>
<p>J=13    S=12   E=2</p>
<p>J=14    S=1    E=3</p>
<p>J=15    S=12   E=3</p>
<p>J=16    S=1    E=4</p>
<p>J=17    S=12   E=4</p>
<p>J=18    S=1    E=5</p>
<p>J=19    S=12   E=5</p>
<p>J=20    S=1    E=6</p>
<p>J=21    S=12   E=6</p>
<p>J=22    S=1    E=7</p>
<p>J=23    S=12   E=7</p>
<p>J=24    S=1    E=8</p>
<p>J=25    S=12   E=8</p>
<p>J=26    S=1    E=9</p>
<p>J=27    S=12   E=9</p>
<p>J=28    S=1    E=10</p>
<p>J=29    S=12   E=10</p>
<p>J=30    S=1    E=11</p>
<p align="left">For details, see the structure of SLF lattice files in chapter 20 of the HTK Book (in the appendix part).</p>
</blockquote>
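<p>To check a generated word network, it helps to read the node (I=) and link (J=) records back and compare their counts with the N= and L= header. A minimal Python sketch, assuming only the fields that appear in the listing above; parse_slf is a hypothetical helper, not an HTK tool:</p>

```python
def parse_slf(text):
    """Read node (I=) and link (J=) records from an SLF word network."""
    declared, nodes, links = {}, {}, []
    for line in text.splitlines():
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if "I" in fields:
            nodes[int(fields["I"])] = fields["W"]
        elif "J" in fields:
            links.append((int(fields["S"]), int(fields["E"])))
        elif "N" in fields:
            declared = {"N": int(fields["N"]), "L": int(fields["L"])}
    return declared, nodes, links


# A tiny hand-written network, not the one generated above
sample = """VERSION=1.0
N=3 L=2
I=0 W=!NULL
I=1 W=hai
I=2 W=!NULL
J=0 S=0 E=1
J=1 S=1 E=2
"""
declared, nodes, links = parse_slf(sample)
```

<p>If len(nodes) differs from declared["N"], or len(links) from declared["L"], the file was probably truncated or edited by hand.</p>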
<h3>2. Creating the dictionary</h3>
<p align="left">To build the dictionary, the first step is to collect all the words used in the context. The words are listed in alphabetical order in the file, each with its phonetic transcription. The transcription convention is very important: a careless one can lower recognition quality.</p>
<blockquote>
<p align="left">
<p align="left">
<p>#dict.dct</p>
<p>ba           b     a     sp<br />
bary   b     ar    y     sp<br />
boosn  b     oos   n     sp<br />
chisn  ch    is    n     sp<br />
hai          h     a     i     sp<br />
khoong       kh    oo    ng    sp<br />
moojt  m     ooj   t     sp<br />
nawm   n     aw    m     sp<br />
sasu   s     as    u     sp<br />
tasm   t     as    m     sp<br />
silence      sil</p></blockquote>
<p align="left"><em>Note the sp (short pause) phone in the dictionary.</em></p>
<p>For a small vocabulary, it is easiest to type the dictionary by hand or copy entries from an existing one.<br />
Once we have the dictionary, we can generate the phone list, called monophones0</p>
<blockquote>
<p align="left">HDMan -m -w txt/wlist -n ph/monophones -l dlog txt/dict txt/dict.dct</p>
</blockquote>
<p align="left">Explanation</p>
<p><strong>wlist</strong>: input, simply the list of words used in the word network, one word per line. wlist does not exist yet when we reach this step. To create it, we need prompts. What are prompts? A file containing the text that the speaker is asked to read. The speaker reads each sentence in prompts and saves it to a .wav file with the corresponding name.</p>
<p align="left"><strong>Creating prompts</strong></p>
<blockquote><p>C:\&gt;HSGen.exe -l -n 10 txt/wdnet.txt txt/dict.dct &gt;&gt; txt/prompts</p></blockquote>
<p align="left">Here we create a prompts file with 10 sentences in total.</p>
<blockquote>
<p align="left">#prompts</p>
<p align="left">001 bary chisn chisn hai boosn moojt khoong<br />
002 khoong boosn tasm bary bary nawm khoong<br />
003 ba nawm moojt chisn sasu bary khoong<br />
004 bary boosn bary tasm nawm sasu nawm<br />
005 bary nawm ba chisn boosn chisn nawm<br />
006 bary boosn boosn khoong moojt tasm nawm<br />
007 moojt hai bary moojt bary tasm khoong<br />
008 khoong boosn hai bary sasu bary nawm<br />
009 moojt ba ba hai chisn nawm nawm<br />
010 tasm nawm sasu tasm bary bary nawm</p></blockquote>
<p align="left">Once the prompts exist, we can create wlist with a Perl script as follows:</p>
<blockquote>
<p align="left">C:\&gt; perl pl/prompts2wlist.pl txt/prompts txt/wlist</p>
</blockquote>
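<p>The source of prompts2wlist.pl is not shown here, so the following is only a Python sketch of what such a script plausibly does: drop the utterance id at the start of each line, collect the words, and write them out unique and sorted. The function name prompts_to_wlist is hypothetical.</p>

```python
def prompts_to_wlist(prompt_lines):
    """Collect the unique words of a prompts file, sorted alphabetically."""
    words = set()
    for line in prompt_lines:
        fields = line.split()
        words.update(fields[1:])    # fields[0] is the utterance id
    return sorted(words)


prompts = [
    "001 bary chisn chisn hai boosn moojt khoong",
    "002 khoong boosn tasm bary bary nawm khoong",
]
wlist = prompts_to_wlist(prompts)
```

<p>Using a set means that a word which appears many times in the prompts still appears only once in wlist.</p>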
<p align="left">We now have wlist, which looks like this:</p>
<blockquote>
<p align="left">#wlist</p>
<p>ba<br />
bary<br />
boosn<br />
chisn<br />
hai<br />
khoong<br />
khoong<br />
moojt<br />
nawm<br />
sasu<br />
tasm</p>
<p align="left"><strong>monophones1</strong>: output, the list of phones used in the dictionary's transcriptions.</p>
<p>#monophones1</p>
<p>sil<br />
b<br />
a<br />
sp<br />
ar<br />
y<br />
oos<br />
n<br />
ch<br />
is<br />
h<br />
i<br />
kh<br />
oo<br />
ng<br />
m<br />
ooj<br />
t<br />
aw<br />
s<br />
as<br />
u</p></blockquote>
<p><strong>beep</strong>: Input, the pronunciation dictionary.</p>
<p align="left"><strong>names</strong>: Input, a dictionary of proper names (optional).<br />
<strong>dict</strong>: Output, the newly created dictionary, assembled from beep, names and wlist.</p>
<p><strong><em>Note</em></strong>: <em>If beep contains silence, HDMan drops it, so the new dict lacks silence. Why? Because HDMan keeps only the words of beep that also appear in wlist. wlist is built from prompts, and prompts contains no silence. So wlist, and consequently the new dict, contains no silence either. The simplest approach is to create dict just for form's sake and not use it. We only use the monophones0 list it produces, and in the later steps we use beep itself. But if beep is really large while wlist is small, we patch dict and use that instead. The patch is simple: just add the entry silence sil. A Perl script automates this.</em></p>
<p>In addition, whether you name the list monophones0 or monophones1 on the command line does not matter. Either way, the freshly created monophones# always contains sp and lacks sil. Adding sil and keeping sp gives monophones1.</p>
<p>Adding sil and removing sp gives monophones0.</p>
<blockquote>
<p align="left">C:\&gt;perl pl/mkMonophones.pl ph/monophones ph/monophones0 ph/monophones1</p>
</blockquote>
<p><strong> </strong></p>
<p><strong>Explanation</strong></p>
<blockquote><p><strong>mkMonophones.pl</strong>: Input, the Perl script.</p>
<p><strong>monophones</strong>: Input, the monophones file created by HDMan.</p>
<p><strong>monophones0</strong>: Output, the monophones0 file after adjustment.</p>
<p><strong>monophones1</strong>: Output, the monophones1 file after adjustment.</p></blockquote>
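<p>The rule described above (add sil; keep sp for monophones1, drop it for monophones0) is small enough to sketch in Python. This is an illustration of the rule only; it does not reproduce mkMonophones.pl, whose source is not shown, and make_monophones is a hypothetical name.</p>

```python
def make_monophones(hdman_phones):
    """Derive the monophones0 (no sp) and monophones1 (with sp) lists."""
    base = [p for p in hdman_phones if p not in ("sil", "sp")]
    monophones0 = base + ["sil"]            # sil added, sp removed
    monophones1 = base + ["sp", "sil"]      # sil added, sp kept
    return monophones0, monophones1


mono0, mono1 = make_monophones(["b", "a", "sp", "ar", "y"])
```

<p>The position of sil and sp inside the list is my own choice; only their presence or absence follows from the text.</p>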
<h3>3. Recording the data</h3>
<p>This step consists of the following tasks:</p>
<ol>
<li>Record the speech and store it as .wav files.</li>
<li>Write the transcript that corresponds to each file.</li>
</ol>
<p align="left">or:</p>
<ol>
<li>Create the transcript file (already generated automatically by HSGen in the previous step: it is the prompts file).</li>
<li>Record the speech and store it as .wav files by reading the transcriptions in the prompts file.</li>
</ol>
<p align="left">So there are two ways to collect the data:</p>
<ul>
<li>Generate the prompts automatically, then record them.</li>
<li>Record first, then write the prompts.</li>
</ul>
<h3>4. Creating the transcription files</h3>
<p>HTK does not use the prompts file in later processing. We need to create some other files, namely MLF files (Master Label File, see the HTK Book for details).</p>
<p>Two kinds of MLF file are needed:</p>
<p><strong> </strong></p>
<p align="left"><strong>A word-level MLF, that is, words.mlf, which is the prompts file rewritten in the MLF format:</strong></p>
<p>//words.mlf</p>
<p>#!MLF!#<br />
&#8220;*/001.lab&#8221;<br />
bary<br />
chisn<br />
chisn<br />
hai<br />
boosn<br />
moojt<br />
khoong<br />
.<br />
&#8220;*/002.lab&#8221;<br />
khoong<br />
boosn<br />
tasm<br />
bary<br />
bary<br />
nawm<br />
khoong<br />
&#8230;&#8230;&#8230;&#8230;&#8230;.</p>
<p align="left">How do we create it?</p>
<p align="left">C:\&gt; Perl pl/prompts2mlf.pl mlf/words.mlf txt/prompts</p>
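<p>As a rough picture of what prompts2mlf.pl has to do (its source is not shown, so this Python version is an assumption based on the words.mlf listing above): write the #!MLF!# header, then for each prompt a quoted "*/ID.lab" line, one word per line, and a closing period. The name prompts_to_mlf is hypothetical.</p>

```python
def prompts_to_mlf(prompt_lines):
    """Format prompts as a word-level Master Label File."""
    out = ["#!MLF!#"]
    for line in prompt_lines:
        fields = line.split()
        if not fields:
            continue                    # ignore blank lines
        out.append('"*/{}.lab"'.format(fields[0]))
        out.extend(fields[1:])          # one word per line
        out.append(".")                 # a lone period closes each entry
    return "\n".join(out) + "\n"


mlf = prompts_to_mlf(["001 bary chisn", "002 khoong"])
```

<p>The quotes around the label name must be plain ASCII double quotes, not the curly quotes that the blog rendering shows.</p>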
<p align="left"><strong> </strong></p>
<p align="left"><strong>A phone-level MLF, that is, the files phones0.mlf and phones1.mlf in the MLF format:</strong></p>
<p><strong> </strong></p>
<p>//phones1.mlf</p>
<p>&#8220;*/002.lab&#8221;<br />
sil<br />
kh<br />
oo<br />
ng<br />
b<br />
oos<br />
n<br />
t<br />
as<br />
m<br />
b<br />
ar<br />
y<br />
b<br />
ar<br />
y<br />
n<br />
aw<br />
m<br />
kh<br />
oo<br />
ng<br />
sil<br />
.</p>
<p align="left"><em>In fact, phones#.mlf is simply words.mlf expanded to the phone level.<br />
</em><br />
How do we create it?</p>
<p align="left">C:\&gt; HLEd -l * -d txt/dict.dct -i mlf/phones0.mlf ins/mkphones0.led mlf/words.mlf</p>
<p align="left"><strong><em>Explanation</em></strong></p>
<p align="left"><strong>-d dict</strong>: input, the dictionary we already have.<br />
<strong>words.mlf</strong>: input, just created above.<br />
<strong>-i phones0.mlf</strong>: output.<br />
<strong>mkphones0.led</strong>: the edit script that turns words.mlf into phones0.mlf</p>
<p align="left">#mkphones0.led<br />
EX<br />
IS sil sil<br />
DE sp</p>
<p align="left"><strong><em>Explanation</em></strong></p>
<p align="left"><strong>EX</strong>: Replaces each word in words.mlf with its pronunciation from the dictionary dict.<br />
<strong>IS</strong>: Inserts the silence model (sil) at the start and end of each utterance.<br />
<strong>DE</strong>: Deletes all the short pause (sp) labels introduced by the EX command.</p>
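<p>The combined effect of the three edit commands can be imitated on a plain word list. The Python sketch below is not HLEd; it only mimics EX, IS sil sil and DE sp under the assumption that every pronunciation in the dictionary ends with sp, as in dict.dct above. The name expand_to_phones is hypothetical.</p>

```python
def expand_to_phones(words, pron_dict, keep_sp):
    """Mimic the EX, IS sil sil and optional DE sp edit commands."""
    phones = []
    for word in words:
        phones.extend(pron_dict[word])              # EX: word to phones
    if not keep_sp:
        phones = [p for p in phones if p != "sp"]   # DE sp
    return ["sil"] + phones + ["sil"]               # IS sil sil


pron_dict = {"hai": ["h", "a", "i", "sp"], "ba": ["b", "a", "sp"]}
phones0 = expand_to_phones(["hai", "ba"], pron_dict, keep_sp=False)
phones1 = expand_to_phones(["hai", "ba"], pron_dict, keep_sp=True)
```

<p>With keep_sp=False the result corresponds to a phones0.mlf entry; with keep_sp=True the sp labels stay in place.</p>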
<p align="left"><strong>Creating phones1.mlf</strong></p>
<p>Why do we need both phones0.mlf and phones1.mlf? Who knows what HTK wants! Roughly, phones0.mlf contains no sp labels while phones1.mlf does. The sp labels are presumably added to make recognition work better later on.</p>
<p>How do we create phones1.mlf?</p>
<p align="left">C:\&gt; HLEd -l * -d txt/dict.dct -i mlf/phones1.mlf ins/mkphones1.led mlf/words.mlf</p>
<p align="left"><strong><em>Explanation</em></strong></p>
<p align="left"><strong>-d dict</strong>: input, the dictionary we already have.<br />
<strong>words.mlf</strong>: input, just created above.<br />
<strong>-i phones1.mlf</strong>: output.<br />
<strong>mkphones1.led</strong>: the edit script that turns words.mlf into phones1.mlf</p>
<p align="left">//mkphones1.led<br />
EX<br />
IS sil sil</p>
<p align="left"><strong><em>Remark</em></strong>: <em>phones1.mlf is produced simply by leaving the sp deletion command (DE sp) out of mkphones1.led</em>.</p>
<h3>5. Coding the data</h3>
<p>In this step, features are extracted from the audio files recorded in step 3. The feature types HTK supports include MFCC and LPC. MFCC is the one to use because it works better (try LPC if you do not believe it). The remaining settings are stored in the configuration file config_HCopy.txt:</p>
<p align="left">#config_HCopy.txt<br />
#coding parameters &#8211; HCopy<br />
SOURCEKIND = WAVEFORM<br />
SOURCEFORMAT = WAV<br />
TARGETKIND = MFCC_0_D_A<br />
TARGETRATE = 100000.0<br />
SAVECOMPRESSED = T<br />
SAVEWITHCRC = T<br />
WINDOWSIZE = 250000.0<br />
USEHAMMING = T<br />
PREEMCOEF = 0.97<br />
NUMCHANS = 26<br />
CEPLIFTER = 22<br />
NUMCEPS = 12<br />
ENORMALISE = F</p>
<p align="left">
<p align="left">C:\&gt;HCopy -T 1 -C cfg/HCopy.cfg -S txt/listwavmfc</p>
<p align="left"><strong><em>Explanation</em></strong></p>
<p><strong>-S listwavmfc</strong>: Input, the list of wave files with their corresponding mfc files.</p>
<p><strong>How do we create it?</strong></p>
<p align="left">C:\&gt;perl pl/listwavmfc.pl wave txt/listwavmfc</p>
<p align="left"><strong><em>Explanation</em></strong></p>
<p align="left"><strong>listwavmfc.pl</strong>: The script file.<br />
<strong>Wave</strong>: Input, the directory that contains the .wav files, assumed here to be directly under C:\.<br />
<strong>listwavmfc</strong>: Output, the text file that lists the wave files.</p>
<h2>Creating monophone HMMs</h2>
<p align="left">At this stage, we create and train monophone HMMs, each modeled with a single Gaussian. At first, every phone model has the same Gaussian (same mean and variance). After some training, the sp and silence models are added in turn. Finally, the models are trained again.</p>
<h3>1. Creating flat start monophones</h3>
<p>In this step, we define a prototype for the HMMs. The values put into the prototype do not matter; the point is to build a skeleton. A good topology suggested by the HTK Book is a 3-state left-to-right model.</p>
<p align="center">
<p>#proto</p>
<p>~o &lt;VecSize&gt; 39 &lt;MFCC_0_D_A&gt;</p>
<p>~h &quot;proto&quot;</p>
<p>&lt;BeginHMM&gt;</p>
<p>&lt;NumStates&gt; 5</p>
<p>&lt;State&gt; 2</p>
<p>&lt;Mean&gt; 39</p>
<p>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</p>
<p>&lt;Variance&gt; 39</p>
<p>1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1</p>
<p>&lt;State&gt; 3</p>
<p>&lt;Mean&gt; 39</p>
<p>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</p>
<p>&lt;Variance&gt; 39</p>
<p>1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1</p>
<p>&lt;State&gt; 4</p>
<p>&lt;Mean&gt; 39</p>
<p>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0</p>
<p>&lt;Variance&gt; 39</p>
<p>1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1</p>
<p>&lt;TransP&gt; 5</p>
<p>0.0          1.0         0.0         0.0         0.0</p>
<p>0.0          0.6         0.4         0.0         0.0</p>
<p>0.0          0.0         0.6         0.4         0.0</p>
<p>0.0          0.0         0.0         0.7         0.3</p>
<p>0.0          0.0         0.0         0.0         0.0</p>
<p>&lt;EndHMM&gt;</p>
<p><strong><em>Further explanation</em></strong></p>
<p>The MFCC_0_D_A feature vector has 39 dimensions = 13 static coefficients (MFCC_0) + 13 delta coefficients + 13 acceleration coefficients.</p>
<p>Although the model has 5 states, the first and the last are non-emitting entry and exit states, so only the 3 middle states count.</p>
<p>The mean and variance vectors are identical in every state.</p>
<p>The transition probability matrix is initialized by rule of thumb.</p>
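<p>The count of 39 can be derived from the configuration values. The Python sketch below assumes the usual reading of the target kind qualifiers (0 or E adds one static coefficient, D and A each add one more copy of the static vector) and ignores other qualifiers; feature_dim is a hypothetical helper, not an HTK function.</p>

```python
def feature_dim(num_ceps, target_kind):
    """Vector size implied by a target kind such as MFCC_0_D_A."""
    qualifiers = target_kind.split("_")[1:]
    # C0 (0) or log energy (E) each add one static coefficient
    static = num_ceps + sum(1 for q in qualifiers if q in ("0", "E"))
    # delta (D), acceleration (A) and third differential (T) add copies
    streams = 1 + sum(1 for q in qualifiers if q in ("D", "A", "T"))
    return static * streams


dim = feature_dim(12, "MFCC_0_D_A")     # NUMCEPS = 12 in the config files
same = feature_dim(12, "MFCC_D_A_0")    # qualifier order does not change the count
```

<p>Both calls give (12 + 1) * 3 = 39, the vector size used in the proto file.</p>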
<p><strong>1.1. How do we create proto?</strong></p>
<p>C:\&gt;HCompV -C cfg/HCompV.cfg -f 0.01 -m -S txt/train.scp -M hmm0 hmm0/proto</p>
<p><strong><em>Explanation</em></strong></p>
<p><strong>-C HCompV.cfg</strong>: Input, the configuration file that HCompV reads, with the following content:</p>
<p>#HCompV.cfg</p>
<p>TARGETKIND = MFCC_0_D_A</p>
<p>TARGETRATE = 100000.0</p>
<p>SAVECOMPRESSED = T</p>
<p>SAVEWITHCRC = T</p>
<p>WINDOWSIZE = 250000.0</p>
<p>USEHAMMING = T</p>
<p>PREEMCOEF = 0.97</p>
<p>NUMCHANS = 26</p>
<p>CEPLIFTER = 22</p>
<p>NUMCEPS = 12</p>
<p>ENORMALISE = F</p>
<p><strong>-f 0.01</strong>: Input parameter; asks HCompV to write a vFloors file whose floor vector equals 0.01 times the variance vector.</p>
<p><strong>-S train.scp</strong>: Input, the list of .mfc feature files. A simple way to create train.scp is the system's dir command (Perl works too).</p>
<p><strong>How?</strong></p>
<p>C:\Wave&gt;dir /B *.mfc &gt; ../train.scp</p>
<p>Or</p>
<p>C:\&gt;dir /B Wave\*.mfc &gt; train.scp</p>
<p>However, we then have to edit train.scp by hand to add the absolute path of every .mfc file listed in it. We can avoid this by using a Perl script:</p>
<p>C:\&gt;perl pl/mkTrainFile.pl wave txt/train.scp</p>
<p><strong><em>Explanation</em></strong></p>
<p><strong>mkTrainFile.pl</strong>: Input, the Perl script file.</p>
<p><strong>Wave</strong>: Input, the directory that contains the .mfc files.</p>
<p><strong>train.scp</strong>: Output, the name of the file that lists the .mfc files.</p>
<p><strong>-M hmm0</strong>: The directory in which HCompV stores the resulting proto (it must be created beforehand).</p>
<p><strong>hmm0/proto</strong>: Input, the file containing the proto structure shown above (remember to save it in the hmm0 directory).</p>
<h3>1.2.</h3>
<p>After HCompV runs, the two files proto and vFloors are in the hmm0 directory. Normally the next act is to "cut and sew" the hmmdefs file by hand from proto and monophones0. However, we have scripts that automate this.</p>
<p><strong>Creating macros automatically</strong></p>
<p>C:\&gt;perl pl/mkMacrosFile.pl hmm0/vFloors hmm0/macros</p>
<p><strong><em>Explanation</em></strong></p>
<p><strong>mkMacrosFile.pl</strong>: Input, the Perl script.</p>
<p><strong>hmm0/vFloors</strong>: Input, the vFloors file produced by the HCompV command above.</p>
<p><strong>hmm0/macros</strong>: Output, the macros file to create.</p>
<p><strong>Creating hmmdefs automatically</strong></p>
<p align="left">C:\&gt;perl pl/mkHmmdefsFile.pl hmm0/proto ph/monophones0 hmm0/hmmdefs</p>
<p><strong><em>Explanation</em></strong></p>
<p><strong>mkHmmdefsFile.pl</strong>: Input, the Perl script.</p>
<p><strong>hmm0/proto</strong>: Input, the proto file obtained in the previous step.</p>
<p><strong>monophones0</strong>: Input, the monophones0 file from an earlier step.</p>
<p><strong>hmm0/hmmdefs</strong>: Output, the name of the HMM definitions file.</p>
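<p>The "cut and sew" job is mostly copying: hmmdefs contains one copy of the prototype body per phone, each renamed with its own ~h line. A minimal Python sketch of that idea, assuming the proto file has a single ~h line; it does not reproduce mkHmmdefsFile.pl, whose source is not shown, and make_hmmdefs is a hypothetical name.</p>

```python
def make_hmmdefs(proto_text, phones):
    """Copy the prototype body once per phone, renaming each copy."""
    lines = proto_text.splitlines()
    # Everything before the ~h line (the ~o options header) is skipped.
    start = next(i for i, line in enumerate(lines) if line.startswith("~h"))
    body = lines[start + 1:]
    out = []
    for phone in phones:
        out.append('~h "{}"'.format(phone))
        out.extend(body)
    return "\n".join(out) + "\n"


# Placeholder text stands in for the real prototype contents
proto = '~o options\n~h "proto"\nbody line 1\nbody line 2'
hmmdefs = make_hmmdefs(proto, ["b", "a", "sil"])
```

<p>The ~o header left out here belongs in the macros file, together with the contents of vFloors.</p>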
<p>Once we have hmmdefs and macros, we use HERest to re-estimate the parameters in hmmdefs. Why re-estimate? Remember that in the hmmdefs file created above, the Gaussian is the same for every phone (the means and variances are all identical, copied from the proto file). HERest uses the information in the .mfc feature files to re-estimate the parameters of each model.</p>
<p align="left">C:\&gt; HERest -C cfg/HERest.cfg -I mlf/phones0.mlf -t 250.0 150.0 1000.0  -S txt/train.scp -H hmm0/macros -H hmm0/hmmdefs -M hmm1 ph/monophones0</p>
<p><strong><em>Explanation</em></strong></p>
<p><strong>-C HERest.cfg</strong>: Input, the configuration file</p>
<p>#HERest.cfg</p>
<p># Coding parameters</p>
<p>TARGETKIND = MFCC_D_A_0</p>
<p>TARGETRATE = 100000.0</p>
<p>SAVECOMPRESSED = T</p>
<p>SAVEWITHCRC = T</p>
<p>WINDOWSIZE = 250000.0</p>
<p>USEHAMMING = T</p>
<p>PREEMCOEF = 0.97</p>
<p>NUMCHANS = 26</p>
<p>CEPLIFTER = 22</p>
<p>NUMCEPS = 12</p>
<p>ENORMALISE = F</p>
<p><strong>-I mlf/phones0.mlf</strong>: Input, the MLF file created earlier.</p>
<p><strong>-t 250.0 150.0 1000.0</strong>: Input, the pruning thresholds.</p>
<p><strong>-S txt/train.scp</strong>: Input, the list of .mfc files.</p>
<p><strong>-H hmm0/macros</strong>: Input, just created.</p>
<p><strong>-H hmm0/hmmdefs</strong>: Input, just created.</p>
<p><strong>-M hmm1</strong>: Output, the directory that receives the new hmmdefs and macros files.</p>
<p><strong>ph/monophones0</strong>: Input, the list of phones (without sp).</p>
<p><strong>Once we have hmm1, we train hmm2 and hmm3 with HERest in the same way</strong>. Note that training hmm2 reads the hmmdefs and macros of hmm1, and likewise training hmm3 reads the files of hmm2.</p>
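<p>Because each pass reads the previous directory and writes the next one, the three commands differ only in the directory numbers. The Python sketch below just builds the argument lists with the same flags as the HERest command above; it does not run anything, and herest_command is a hypothetical helper.</p>

```python
def herest_command(src, dst, mlf="mlf/phones0.mlf",
                   phones="ph/monophones0"):
    """Build one HERest re-estimation command line (not executed here)."""
    return ["HERest", "-C", "cfg/HERest.cfg", "-I", mlf,
            "-t", "250.0", "150.0", "1000.0", "-S", "txt/train.scp",
            "-H", src + "/macros", "-H", src + "/hmmdefs",
            "-M", dst, phones]


# hmm0 to hmm1, hmm1 to hmm2, hmm2 to hmm3
commands = [herest_command("hmm{}".format(i), "hmm{}".format(i + 1))
            for i in range(3)]
```

<p>Each list could be passed to subprocess.run(cmd, check=True), assuming the HTK tools are on the PATH and the output directories already exist.</p>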
<h3>2. Fixing the silence models</h3>
<p align="center">
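<p>This step adds two transitions to the silence model, from state 2 to state 4 and back (as in the figure). A short pause (sp) model is also added, sharing the center state of the sil model. The HTK Book's stated rationale is to make the model more robust by letting individual states absorb impulsive noise in the training data; the backward skip lets this happen without forcing a transition into the following word.</p>
<p>There are two small steps to carry out:</p>
<p><strong>Adding the sp model to hmm4/hmmdefs</strong></p>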
<p>C:\&gt;perl pl/makesp.pl hmm3/hmmdefs hmm4/hmmdefs hmm3/macros hmm4/macros</p>
<p>Run HHEd to tie the sil and sp models together and, at the same time, add the new transition probabilities to the sil model.</p>
<p align="left">C:\&gt; HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 ins/sil.hed ph/monophones1</p>
<p><strong>Explanation</strong></p>
<p><strong>-H hmm4/macros</strong>: Input, the macros file in hmm4.</p>
<p><strong>-H hmm4/hmmdefs</strong>: Input, the hmmdefs file in hmm4.</p>
<p><strong>-M hmm5</strong>: Output, the hmm5 directory.</p>
<p><strong>ph/monophones1</strong>: Input, the phone list file, which contains both sil and sp.</p>
<p><strong>ins/sil.hed</strong>: Input, the file with the edit commands.</p>
<p>#sil.hed</p>
<p>AT 2 4 0.2 {sil.transP}</p>
<p>AT 4 2 0.2 {sil.transP}</p>
<p>AT 1 3 0.3 {sp.transP}</p>
<p>TI silst {sil.state[3],sp.state[2]}</p>
<p><strong><em>Explanation of the commands</em></strong><em> </em></p>
<p><strong>AT</strong>: adds transition probabilities for the transitions 2 to 4 and 4 to 2 in the transition matrix of the sil model (open hmm5/hmmdefs to check).</p>
<p>#hmm5/hmmdefs</p>
<p>~h &quot;sil&quot;</p>
<p>&lt;BEGINHMM&gt;</p>
<p>&lt;NUMSTATES&gt; 5</p>
<p>&lt;STATE&gt; 2</p>
<p>&lt;MEAN&gt; 39</p>
<p>…</p>
<p>&lt;VARIANCE&gt; 39</p>
<p>…</p>
<p>&lt;GCONST&gt; 5.238852e+001</p>
<p>&lt;STATE&gt; 3</p>
<p>~s &quot;silst&quot;</p>
<p>&lt;STATE&gt; 4</p>
<p>&lt;MEAN&gt; 39</p>
<p>…</p>
<p>&lt;VARIANCE&gt; 39</p>
<p>…</p>
<p>&lt;GCONST&gt; 1.008743e+002</p>
<p>&lt;TRANSP&gt; 5</p>
<p>0.000000e+000 1.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000</p>
<p>0.000000e+000 7.391776e-001 6.082239e-002 2.000000e-001 0.000000e+000</p>
<p>0.000000e+000 0.000000e+000 3.656323e-001 6.343677e-001 0.000000e+000</p>
<p>0.000000e+000 2.000000e-001 0.000000e+000 4.123393e-001 3.876607e-001</p>
<p>0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000 0.000000e+000</p>
<p>&lt;ENDHMM&gt;</p>
<p><strong>AT 1 3 0.3 {sp.transP}</strong>: adds the transition 1 to 3, with probability 0.3, to the sp model.</p>
<p>#hmm5/hmmdefs</p>
<p>~h &quot;sp&quot;</p>
<p>&lt;BEGINHMM&gt;</p>
<p>&lt;NUMSTATES&gt; 3</p>
<p>&lt;STATE&gt; 2</p>
<p>~s &quot;silst&quot;</p>
<p>&lt;TRANSP&gt; 3</p>
<p>0.000000e+000 7.000000e-001 3.000000e-001</p>
<p>0.000000e+000 5.000000e-001 5.000000e-001</p>
<p>0.000000e+000 0.000000e+000 0.000000e+000</p>
<p>&lt;ENDHMM&gt;</p>
<p><em>It is easy to see that state 2 of sp is also tied to the newly created silst.</em></p>
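<p>The numbers in the listings suggest how AT keeps each row a valid distribution: the new transition gets the requested probability and the existing entries of that row are scaled down so the row still sums to 1. The Python sketch below implements that reading; it is my assumption about the rule, checked only against the sp matrix shown above, and add_transition is a hypothetical name.</p>

```python
def add_transition(trans, i, j, prob):
    """Add transition i to j (1-based states) and rescale row i to sum to 1.

    Assumes the transition did not exist before (its old value was 0).
    """
    row = trans[i - 1]
    scaled = [a * (1.0 - prob) for a in row]    # shrink the existing entries
    scaled[j - 1] = prob                        # insert the new transition
    trans[i - 1] = scaled
    return trans


# The sp model before the edit: entry state, one emitting state, exit state
sp = [[0.0, 1.0, 0.0],
      [0.0, 0.5, 0.5],
      [0.0, 0.0, 0.0]]
add_transition(sp, 1, 3, 0.3)
```

<p>The first row becomes 0.0, 0.7, 0.3, which matches the sp transition matrix in hmm5/hmmdefs.</p>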
<p><strong>TI</strong>: ties sp and sil together through silst, which is stored at the top of the hmmdefs file. Looking at the sil model, you can see that its state 3 refers to this new silst macro.</p>
<p>#hmm5/hmmdefs</p>
<p>~s &quot;silst&quot;</p>
<p>&lt;MEAN&gt; 39</p>
<p>-1.233204e+001 9.116629e-001 3.020478e-001 2.393242e+000 3.001888e+000 3.456290e+000 3.706096e+000 2.861026e+000 6.466328e-001 9.644089e-001 7.715054e-001 6.577221e-001 7.104364e+001 -5.122911e-002 -2.084613e-002 2.598116e-001 -2.869415e-001 1.950579e-001 -3.822781e-002 -8.443902e-002 -1.072719e-001 -4.904599e-002 8.692242e-002 8.164209e-003 1.979581e-002 2.325216e-001 1.303717e-001 -4.080342e-002 4.128299e-002 -2.011553e-001 -7.077198e-002 -2.321420e-001 -9.913503e-002 -9.830537e-002 -1.752981e-001 -1.535826e-001 -1.363883e-001 -1.232475e-001 2.405691e-001</p>
<p>&lt;VARIANCE&gt; 39</p>
<p>1.300436e+001 1.042869e+001 1.641352e+001 1.574066e+001 1.265830e+001 1.167596e+001 1.477261e+001 1.271613e+001 1.887716e+001 1.765516e+001 1.965426e+001 1.868261e+001 6.421104e+000 9.842667e-001 1.060887e+000 1.250746e+000 1.715912e+000 1.131693e+000 1.550215e+000 1.346661e+000 1.496958e+000 1.858089e+000 2.148503e+000 1.762000e+000 1.668207e+000 8.345646e-001 1.546411e-001 1.906256e-001 2.265411e-001 2.813516e-001 2.009885e-001 2.763181e-001 2.499908e-001 2.784255e-001 2.954797e-001 3.488080e-001 3.192161e-001 2.714097e-001 1.329405e-001</p>
<p>&lt;GCONST&gt; 9.173016e+001</p>
<p><em>We can see that silst is simply a Gaussian with the mean and variance above.</em></p>
<p><strong>Run HERest two more times to create hmm6 and hmm7.</strong></p>
<h3>3. Realigning the training data</h3>
<p>The pronunciation dictionary may contain words with several pronunciations. In the previous steps, HLEd picked one of them arbitrarily. In this step, we realign the word-level transcription words.mlf: the aligner chooses the pronunciation that best matches the acoustic data. It also adds the silence model at the start and end of each utterance. Remember that the dictionary must contain the entry silence sil (this was handled in an earlier step; if you do not remember, go back to the dictionary step).</p>
<p align="left">C:\&gt; HVite -l * -o SWT -b silence -a -H hmm7/macros -H hmm7/hmmdefs -i mlf/aligned.mlf -m -t 250.0 -y lab  -I mlf/words.mlf -S txt/train.scp txt/dict.dct ph/monophones1</p>
<p><strong><em>Explanation</em></strong></p>
<p><strong>-b silence</strong>: Input, inserts sil at the start and end of each utterance.</p>
<p><strong> </strong></p>
<p><strong>HERest is then run two more times to create hmm8 and hmm9.</strong></p>
<p><em> Note that we no longer use phones1.mlf; we switch to aligned.mlf.</em></p>
<p align="left">C:\&gt; HERest -B -C cfg/HERest.cfg -I mlf/aligned.mlf -t 250.0 150.0 1000.0 -s stats  -S txt/train.scp -H hmm7/macros -H hmm7/hmmdefs -M hmm8 ph/monophones1</p>
<p align="left">C:\&gt; HERest -B -C cfg/HERest.cfg -I mlf/aligned.mlf -t 250.0 150.0 1000.0 -s stats  -S txt/train.scp -H hmm8/macros -H hmm8/hmmdefs -M hmm9 ph/monophones1</p>
<h2>Creating triphones</h2>
<p>The final stage in building the HMMs is to create context-dependent triphones. There are two small steps. First, the monophone transcription is converted into a triphone transcription, and the triphone models are re-estimated starting from the monophone models. Second, the triphone states are tied so that the estimates become more robust. What does tied mean? Recall the earlier step where we added the sp model to hmmdefs: tying means that two or more HMMs share one set of parameters (a mean or a variance, for example).</p>
<h3>1. Creating triphones from monophones</h3>
<h3>1.1</h3>
<p align="left">C:\&gt; HLEd -n ph/triphones1 -l * -i mlf/wintri.mlf ins/mktri.led mlf/aligned.mlf</p>
<p><strong>Explanation</strong></p>
<p><strong>-n ph/triphones1</strong>: Output, the list of triphones.</p>
<p><strong>-i mlf/wintri.mlf</strong>: Output, the triphone transcription.</p>
<p><strong>mlf/aligned.mlf</strong>: Input, the realigned monophone transcription.</p>
<p><strong>ins/mktri.led</strong>: Input, contains the commands that create triphones from monophones.</p>
<p>#mktri.led</p>
<p>WB sp</p>
<p>WB sil</p>
<p>TC</p>
<p><strong><em>Command explanation</em></strong></p>
<p><strong>WB</strong>: treats sp and sil as word boundary symbols, so they are not converted into triphones.</p>
<p><strong>TC</strong>: converts all monophones into triphones, except the WB symbols. Note that this process also produces biphones, because some phones have a word boundary on one side. Consider the following example:</p>
<p>sil th ih s sp m ae n sp</p>
<p>sil th+ih th-ih+s ih-s sp m+ae m-ae+n ae-n sp</p>
<p>biphones: th+ih, ih-s, m+ae, ae-n</p>
<p>triphones: th-ih+s, m-ae+n</p>
<p>word boundary symbols: sil, sp</p>
<p><em>The scheme above is called word-internal. There are two other schemes, which we will cover another time.</em></p>
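<p>To make the word-internal expansion concrete, here is a small Python sketch that reproduces the example above. It is only an illustration of what the WB and TC commands achieve, not HLEd itself; the function name word_internal_triphones is hypothetical.</p>

```python
# Sketch of word-internal triphone expansion (what WB + TC achieve).
# Boundary symbols are copied unchanged and never used as context.

def word_internal_triphones(phones, boundaries=("sp", "sil")):
    out = []
    for i, p in enumerate(phones):
        if p in boundaries:
            out.append(p)          # word boundary symbols stay as they are
            continue
        left = phones[i - 1] if i != 0 else None
        right = phones[i + 1] if i + 1 != len(phones) else None
        name = p
        if left is not None and left not in boundaries:
            name = left + "-" + name       # add left context
        if right is not None and right not in boundaries:
            name = name + "+" + right      # add right context
        out.append(name)
    return out


if __name__ == "__main__":
    mono = "sil th ih s sp m ae n sp".split()
    print(" ".join(word_internal_triphones(mono)))
```

<p>Phones next to a boundary get context on one side only, which is where the biphones come from.</p>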
<h3>1.2.</h3>
<p>Next, we “clone” the monophone models in hmm9 into triphone models in hmm10.</p>
<p align="left">C:\&gt; HHEd -B -H hmm9/macros -H hmm9/hmmdefs -M hmm10 ins/mktri.hed ph/monophones1</p>
<p><strong>Explanation</strong></p>
<p><strong>-H hmm9/macros -H hmm9/hmmdefs</strong>: Input, the monophone HMMs.</p>
<p><strong>-M hmm10</strong>: Output, the directory hmm10 that receives the cloned triphone models.</p>
<p><strong>ins/mktri.hed</strong>: Input, the file with the commands that tie the transition matrices of the triphones listed in triphones1.</p>
<p><strong>-B</strong>: Option, stores hmmdefs in binary instead of text (saves disk space).</p>
<p><strong>How do we create mktri.hed?</strong></p>
<p align="left">C:\&gt;perl pl/mkTriHed.pl ph/monophones1 ph/triphones1 ins/mktri.hed</p>
<p>Explanation of how mkTriHed.pl works (not written yet).</p>
<p><em>We will get a few WARNINGs about T_sil and T_sp. That is fine; it is perfectly normal.</em></p>
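<p>Since the script is not explained here, the following Python sketch shows what such a generator plausibly writes. It assumes that mkTriHed.pl follows the usual HTK tutorial layout: one CL (clone) command naming the triphone list, then one TI (tie) command per monophone that shares the transition matrix among all triphones with the same center phone. The function name and the exact layout are assumptions, not taken from the author's script.</p>

```python
# Sketch of an mktri.hed generator (assumed layout, see the text above).

def mktri_hed_lines(monophones, triphone_list="ph/triphones1"):
    # CL clones the monophone models into the models named in triphone_list.
    lines = ["CL " + triphone_list]
    for p in monophones:
        # TI ties the transition matrices of every model whose center phone is p.
        lines.append("TI T_%s {(*-%s+*,%s+*,*-%s).transP}" % (p, p, p, p))
    return lines


if __name__ == "__main__":
    for line in mktri_hed_lines(["ae", "ih", "sil", "sp"]):
        print(line)
```

<p>Under this assumption, the T_sil and T_sp warnings are expected: sil and sp are never expanded into triphones, so their tie commands match nothing.</p>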
<h3>1.3.</h3>
<p>Once the cloning is done, the next task is to re-estimate the triphone models. We again use HERest.</p>
<p align="left">C:\&gt; HERest -B -C cfg/HERest.cfg -I mlf/wintri.mlf -t 250.0 150.0 1000.0 -s stats  -S txt/train.scp -H hmm10/macros -H hmm10/hmmdefs -M hmm11 ph/triphones1</p>
<p><strong>Run it once more to get hmm12.</strong></p>
<p align="left">C:\&gt; HERest -B -C cfg/HERest.cfg -I mlf/wintri.mlf -t 250.0 150.0 1000.0 -s stats  -S txt/train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 ph/triphones1</p>
<h3>2. Creating tied-state triphones</h3>
<p>The last task in building the models is to tie states within sets of triphones, so that data is shared within each set and recognition results improve. Although the previous step already did some tying, this one requires more care because it strongly affects recognition performance. In this step we use HHEd to cluster the states and then tie the states within each cluster. HHEd offers two options: i) cluster using a similarity measure between states (the data-driven approach); ii) use a decision tree, built by asking questions about the left and right context of each triphone. The decision tree tries to find the contexts that make the largest acoustic difference and uses them for clustering.</p>
<blockquote>
<p align="left">C:\&gt; HHEd -B -H hmm12/macros -H hmm12/hmmdefs -M hmm13 ins/tree.hed ph/triphones1 &gt; log</p>
</blockquote>
<p><em>Do not run this command yet; run it after reading the whole section.</em></p>
<p><strong>Explanation</strong></p>
<p><strong>tree.hed</strong>: the set of directives that search for suitable contexts for clustering.</p>
<p><strong>Understanding &amp; using tree.hed</strong></p>
<p>tree.hed contains commands such as RO, TR, QS, TB, AU, CO, ST.</p>
<p><strong>RO 100.0</strong>: Sets the outlier threshold to 100 and loads the statistics file stats created in the previous step. (I do not fully understand the threshold; the description reads: “The outlier threshold determines the minimum occupancy of any cluster and prevents a single outlier state forming a singleton cluster just because it is acoustically very different to all the other states.”)</p>
<p><strong>TR 0</strong>: Sets the trace level to 0.</p>
<p><strong>QS *</strong>: Loads a question. QS (question) commands are defined by the user, and defining them effectively is a big problem in itself. Our Perl script only does the essentials for QS: questions about the left and right context. An example makes it easier to see what a QS is:</p>
<p>QS &#8220;L_Class-Stop&#8221; {p-*,b-*,t-*,d-*,k-*,g-*}</p>
<p>Each QS (question) is defined by the set of contexts that follows it, enclosed in curly braces. The question “L_Class-Stop” is TRUE if the left context is p, b, t, d, k or g.</p>
<p>More specific questions about consonants, vowels, nasals, diphthongs, … are optional, but having them helps.</p>
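<p>How a question is answered for a given triphone can be sketched in a few lines of Python. This is an illustration only: it uses the standard fnmatch module as a stand-in for HTK's own pattern matching, and the names question_is_true and L_CLASS_STOP are hypothetical.</p>

```python
# Sketch: a QS question is TRUE for a model if any of its patterns matches
# the model name. fnmatchcase is used here as a stand-in for HTK's matcher.
from fnmatch import fnmatchcase


def question_is_true(patterns, model_name):
    return any(fnmatchcase(model_name, pat) for pat in patterns)


# The patterns of the "L_Class-Stop" question from the example above.
L_CLASS_STOP = ["p-*", "b-*", "t-*", "d-*", "k-*", "g-*"]

if __name__ == "__main__":
    for name in ["t-ih+s", "th-ih+s", "ih-s", "m+ae"]:
        print(name, question_is_true(L_CLASS_STOP, name))
```

<p>Note that th-ih+s does not match t-*, because the pattern requires the left context to be exactly t.</p>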
<p><strong>TR: .</strong></p>
<p><strong>TB</strong>: works as follows: all triphones are put into a single pool. Every QS is loaded in turn and used to split the pool into two sub-pools. The QS that maximizes the log likelihood of the training data is chosen as the first branch of the decision tree. The process is repeated until, for every QS, the increase in log likelihood falls below the threshold we specify. The threshold is the number that follows TB</p>
<p>(e.g. TB 40 &#8220;st_y_4_&#8221; {(&#8220;y&#8221;,&#8221;*-y+*&#8221;,&#8221;y+*&#8221;,&#8221;*-y&#8221;).state[4]}).</p>
<p><strong>AU “fulllist”</strong>: the file containing the full list of phones: mono-, bi- and triphones.</p>
<p><strong>CO “tiedlist”</strong>: some models end up identical in all three states and in the transition matrix. This command finds the identical models and compacts them by tying them together (tying was introduced earlier), producing a new list of models stored in tiedlist.</p>
<p>So we have to create fulllist.</p>
<blockquote><p>C:\&gt;perl pl/mkFullList.pl ph/monophones0</p></blockquote>
<p>Create the tree.hed file</p>
<blockquote><p>C:\&gt;perl pl/mkTree.pl 40 ph/monophones0 ins/tree.hed</p></blockquote>
<p><strong>Explanation</strong></p>
<p><strong>40</strong>: the threshold for TB; it can be changed as needed.</p>
<p><strong>ins/tree.hed</strong>: Output, the tree.hed file to create.</p>
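<p>The TB part of tree.hed is repetitive, which is why it is generated by a script. The Python sketch below produces TB lines in the same shape as the example shown earlier (TB 40 "st_y_4_" ...). It is not the author's mkTree.pl: the function name tb_lines is hypothetical, and the choice of states 2, 3 and 4 assumes HMMs with three emitting states.</p>

```python
# Sketch: generate the TB commands of tree.hed, one per phone and state.
# Assumption: three emitting states, numbered 2, 3 and 4.

def tb_lines(phones, threshold, states=(2, 3, 4)):
    lines = []
    for s in states:
        for p in phones:
            lines.append(
                'TB %s "st_%s_%d_" {("%s","*-%s+*","%s+*","*-%s").state[%d]}'
                % (threshold, p, s, p, p, p, p, s)
            )
    return lines


if __name__ == "__main__":
    for line in tb_lines(["y", "ae"], 40):
        print(line)
```

<p>A full tree.hed would place these lines after the RO, TR and QS commands and before AU and CO, as described above.</p>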
<p>Now we can safely run the HHEd command.</p>
<p><strong>We then run HERest two more times to get hmm14 and hmm15.</strong></p>
<blockquote>
<p align="left">C:\&gt;HERest -B -C cfg/HERest.cfg -I mlf/wintri.mlf -t 250.0 150.0 1000.0 -s stats  -S txt/train.scp -H hmm13/macros -H hmm13/hmmdefs -M hmm14 ph/triphones1</p>
<p align="left">C:\&gt;HERest -B -C cfg/HERest.cfg -I mlf/wintri.mlf -t 250.0 150.0 1000.0 -s stats  -S txt/train.scp -H hmm14/macros -H hmm14/hmmdefs -M hmm15 ph/triphones1</p>
</blockquote>
<h2>Evaluating the recognizer</h2>
<p>The recognizer is now complete and ready to use. After all that effort, it is time to evaluate it [^.^].</p>
<h3>1. Recognizing the test data</h3>
<p>Suppose we want to recognize a test set of 10 utterances.</p>
<blockquote>
<p align="left">C:\&gt; HVite -C cfg/Hvite.cfg -H hmm15/macros -H hmm15/hmmdefs -S test/test.scp -i test/recout.mlf -w txt/wdnet.txt txt/dict.dct tiedlist</p>
</blockquote>
<p><strong>Explanation</strong></p>
<p><strong>-C cfg/Hvite.cfg</strong>: Input, the configuration file.</p>
<p><strong>-H hmm15/macros -H hmm15/hmmdefs</strong>: Input, the trained HMMs.</p>
<p><strong>-S test/test.scp</strong>: Input, the file listing the .mfc files to recognize.</p>
<p><strong>-i test/recout.mlf</strong>: Output, the recognized transcription.</p>
<p><strong>-w txt/wdnet.txt</strong>: Input, the word network created in the early steps.</p>
<p><strong>txt/dict.dct</strong>: Input, the pronunciation dictionary.</p>
<p><strong>tiedlist</strong>: Input, the list of phones produced by the CO “tiedlist” command in tree.hed.</p>
<p><strong>Note</strong></p>
<p><em>Because the triphones were built word-internally, as described earlier, the configuration file Hvite.cfg needs two extra parameters: FORCECXTEXP = T and ALLOWXWRDEXP = F. To understand why, see chapter 12 of the HTK Book.</em></p>
<p><em>HVite has a few more parameters, such as p and s, that the user can tune.</em></p>
<h3>2. Reporting the results</h3>
<blockquote><p>C:\&gt;HResults -f -t -I mlf/words.mlf tiedlist test/recout.mlf &gt; test/result</p></blockquote>
<p>Recognition results for the first 20 utterances in train.scp</p>
<blockquote><p>------------------------ Overall Results --------------------<br />
SENT: %Correct=40.00 [H=8, S=12, N=20]</p>
<p>WORD: %Corr=100.00, Acc=89.29 [H=140, D=0, S=0, I=15, N=140]</p>
<p>=============================================================</p></blockquote>
<p>The results are not great, but at least it runs. [^_^].</p>
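<p>The numbers in the report can be checked by hand. In HResults, H is the number of correct labels, D deletions, S substitutions, I insertions and N the total number of reference labels; %Corr is H/N and Acc is (H-I)/N. A minimal Python sketch (the function names are hypothetical):</p>

```python
# Sketch: recompute the HResults figures from the counts in the report.

def percent_correct(h, n):
    return 100.0 * h / n


def accuracy(h, i, n):
    # Insertions are subtracted, so Acc can be lower than %Corr.
    return 100.0 * (h - i) / n


if __name__ == "__main__":
    print("SENT %%Correct = %.2f" % percent_correct(8, 20))
    print("WORD %%Corr    = %.2f" % percent_correct(140, 140))
    print("WORD Acc      = %.2f" % accuracy(140, 15, 140))
```

<p>This shows why %Corr is 100 while Acc is only 89.29: every reference word was found, but the recognizer also inserted 15 extra words.</p>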
<h2>Next steps</h2>
<p>The next step is to increase the number of Gaussians used to approximate the HMM models from 1 to 3, 5, 7, ... See Mixture Incrementing in the course slides.</p>
<p>Beyond that comes Adapting HMMs.</p>
<p>Hopefully these two topics will be covered soon.</p>
<p><a rel="attachment wp-att-175" href="http://vodinhphong.wordpress.com/2009/07/19/htk-training/htk_training_final/">Download the PDF version</a></p>
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2009/07/19/htk-training/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>
	</item>
		<item>
		<title>How to evaluate a clustering result?</title>
		<link>http://vodinhphong.wordpress.com/2009/06/28/how-to-evaluate-a-clustering-result/</link>
		<comments>http://vodinhphong.wordpress.com/2009/06/28/how-to-evaluate-a-clustering-result/#comments</comments>
		<pubDate>Sun, 28 Jun 2009 08:13:27 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=142</guid>
		<description><![CDATA[Preface
According to Wikipedia,
Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vodinhphong.wordpress.com&blog=5194527&post=142&subd=vodinhphong&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><h2>Preface</h2>
<p>According to Wikipedia,</p>
<blockquote><p><strong>Cluster analysis</strong> or <strong>clustering</strong> is the assignment of a set of observations into subsets (called <em>clusters</em>) so that observations in the same cluster are similar in some sense. Clustering is a method of <a title="Unsupervised learning" href="http://en.wikipedia.org/wiki/Unsupervised_learning">unsupervised learning</a>, and a common technique for <a title="Statistics" href="http://en.wikipedia.org/wiki/Statistics">statistical</a> <a title="Data analysis" href="http://en.wikipedia.org/wiki/Data_analysis">data analysis</a> used in many fields, including <a title="Machine learning" href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a>, <a title="Data mining" href="http://en.wikipedia.org/wiki/Data_mining">data mining</a>, <a title="Pattern recognition" href="http://en.wikipedia.org/wiki/Pattern_recognition">pattern recognition</a>, <a title="Image analysis" href="http://en.wikipedia.org/wiki/Image_analysis">image analysis</a> and <a title="Bioinformatics" href="http://en.wikipedia.org/wiki/Bioinformatics">bioinformatics</a>.</p></blockquote>
<p>In this post I do not discuss clustering algorithms but the way to evaluate a clustering result. My current problem concerns forming a codebook for visual categorization, i.e. clustering a huge dataset (~ 6.525 million feature vectors) into clusters (visual words). After that, the codebook is used as a reference for voting on samples. In other words, this is exactly <a href="http://en.wikipedia.org/wiki/Bag_of_words_model_in_computer_vision">the BoW method</a>. The problem here is how to know whether a clustering result is discriminative enough. Here I note some ideas from  <a href="http://nlp.stanford.edu/%7Emanning/">Christopher D. Manning</a>, <a href="http://theory.stanford.edu/people/raghavan/">Prabhakar Raghavan</a> and  <a href="http://www-csli.stanford.edu/%7Ehinrich">Hinrich Schütze</a>, <em>Introduction to Information Retrieval</em>, Cambridge University Press. 2008.</p>
<h2>K-means clustering</h2>
<p>It is natural to start talking about clustering by reviewing the K-means algorithm:</p>
<ul>
<li>A special case of a general procedure known as EM (Expectation Maximization)</li>
<li>Termination conditions:
<ul>
<li>A fixed number of iterations</li>
<li>Sample partition unchanged</li>
<li>Centroid positions unchanged (does the 2nd condition hold?)</li>
</ul>
</li>
<li>Time complexity <img src='http://l.wordpress.com/latex.php?latex=%5CTheta%28lKmn%29&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\Theta(lKmn)' title='\Theta(lKmn)' class='latex' />
<ul>
<li><img src='http://l.wordpress.com/latex.php?latex=l&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='l' title='l' class='latex' />: number of iterations</li>
<li><img src='http://l.wordpress.com/latex.php?latex=K&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K' title='K' class='latex' />: number of clusters</li>
<li><img src='http://l.wordpress.com/latex.php?latex=n&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='n' title='n' class='latex' />: number of samples</li>
<li><img src='http://l.wordpress.com/latex.php?latex=m&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='m' title='m' class='latex' />: sample dimension</li>
</ul>
</li>
</ul>
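<p>The three termination conditions can be seen together in a small pure-Python sketch of K-means. This is a minimal illustration, not an implementation suited to millions of feature vectors; the initial centroids are passed in by the caller, and the function names are hypothetical. It also answers the question in the list above: if the partition is unchanged, the centroids (the means of the partition) are unchanged too.</p>

```python
# Minimal K-means sketch showing the three termination conditions.
# Each iteration compares n samples with K centroids in m dimensions,
# which gives the Theta(l K m n) cost for l iterations.

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(samples, centroids, max_iter=100):
    assignment = None
    for _ in range(max_iter):                      # 1) fixed number of iterations
        new_assignment = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(s, centroids[k]))
            for s in samples
        ]
        if new_assignment == assignment:           # 2) sample partition unchanged
            break
        assignment = new_assignment
        new_centroids = []
        for k in range(len(centroids)):
            members = [s for s, a in zip(samples, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:             # 3) centroid positions unchanged
            break
        centroids = new_centroids
    return assignment, centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```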
<h2>Evaluation of clustering</h2>
<h3>Internal evaluation</h3>
<ul>
<li>High intra-cluster similarity</li>
<li>Low inter-cluster similarity</li>
<li>Measured quality of a clustering depends on
<ul>
<li>sample representation (i.e., how to represent descriptors efficiently from raw data)</li>
<li>similarity measure (i.e., this post)</li>
</ul>
</li>
</ul>
<p><em><strong>Comment</strong></em>: It seems that this kind of evaluation is not very meaningful. Instead, I use the clustering result in another application and measure the application&#8217;s performance to decide whether the clustering is good.</p>
<h3>External evaluation</h3>
<p>Although clustering is unsupervised learning, it can benefit from benchmark or labeled data (if available). Assume that I have such benchmark data and want to know whether a clustering method and its accompanying parameters are good. The following measures can be used:</p>
<h4>Purity</h4>
<p style="text-align:left;"><img src='http://l.wordpress.com/latex.php?latex=purity%28%5COmega%2CC%29%3D%5Cfrac%7B1%7D%7BN%7D%5Csum_k%7Bmax_j%5Cleft%7C%5Comega_k%5Ccap+c_j%5Cright%7C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='purity(\Omega,C)=\frac{1}{N}\sum_k{max_j\left|\omega_k\cap c_j\right|}' title='purity(\Omega,C)=\frac{1}{N}\sum_k{max_j\left|\omega_k\cap c_j\right|}' class='latex' /></p>
<p style="text-align:left;">in which <img src='http://l.wordpress.com/latex.php?latex=%5COmega%3D%5C%7B%5Comega_1%2C%5Comega_2%2C...%2C%5Comega_K%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\Omega=\{\omega_1,\omega_2,...,\omega_K\}' title='\Omega=\{\omega_1,\omega_2,...,\omega_K\}' class='latex' /> is the set of clusters, <img src='http://l.wordpress.com/latex.php?latex=C%3D%5C%7Bc_1%2Cc_2%2C...%2Cc_J%5C%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='C=\{c_1,c_2,...,c_J\}' title='C=\{c_1,c_2,...,c_J\}' class='latex' /> is the set of classes, and N is the total number of samples.</p>
<p style="text-align:left;">Purity measures how strongly each cluster is dominated by a single class. The more classes are mixed within a cluster, the lower the purity. However, a perfect purity of 1 is trivially obtained in the case <img src='http://l.wordpress.com/latex.php?latex=K%3DN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K=N' title='K=N' class='latex' />, when every sample is its own cluster.</p>
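<p>As a quick check of the definition, here is a minimal Python sketch of purity. The 17-sample labeling is a hypothetical toy example (three clusters, three classes x, o and d), and the function name is my own.</p>

```python
# Sketch: purity = (1/N) * sum over clusters of the majority-class count.
from collections import Counter


def purity(clusters, classes):
    by_cluster = {}
    for w, c in zip(clusters, classes):
        by_cluster.setdefault(w, Counter())[c] += 1
    majority = sum(max(cnt.values()) for cnt in by_cluster.values())
    return majority / len(clusters)


if __name__ == "__main__":
    # Toy data: cluster 1 = 5 x + 1 o, cluster 2 = 1 x + 4 o + 1 d,
    # cluster 3 = 2 x + 3 d.
    clusters = [1] * 6 + [2] * 6 + [3] * 5
    classes = list("xxxxxo" + "xooood" + "xxddd")
    print(purity(clusters, classes))            # (5 + 4 + 3) / 17
    print(purity(list(range(17)), classes))     # K = N gives purity 1.0
```

<p>The second call shows the degenerate case: with one cluster per sample, every cluster is trivially pure.</p>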
<h4 style="text-align:left;">Normalized Mutual Information</h4>
<p style="text-align:left;"><img src='http://l.wordpress.com/latex.php?latex=NMI%28%5COmega%2CC%29%3D%5Cfrac%7BI%28%5COmega%2CC%29%7D%7B%28H%28%5COmega%29%2BH%28C%29%29%2F2%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='NMI(\Omega,C)=\frac{I(\Omega,C)}{(H(\Omega)+H(C))/2}' title='NMI(\Omega,C)=\frac{I(\Omega,C)}{(H(\Omega)+H(C))/2}' class='latex' /></p>
<p style="text-align:left;">in which <img src='http://l.wordpress.com/latex.php?latex=I&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='I' title='I' class='latex' /> is the mutual information, expressed as follows:</p>
<p style="text-align:left;"><img src='http://l.wordpress.com/latex.php?latex=I%28%5COmega%2CC%29%3D%5Csum_k%7B%5Csum_j%7BP%28%5Comega_k%5Ccap+c_j%29log%28%5Cfrac%7BP%28%5Comega_k%5Ccap+c_j%29%7D%7BP%28%5Comega_k%29P%28c_j%29%7D%29%7D%7D%3D%5Csum_k%7B%5Csum_j%7B%5Cfrac%7B%7C%5Comega_k%5Ccap+c_j%7C%7D%7BN%7Dlog%28%5Cfrac%7BN%7C%5Comega_k%5Ccap+c_j%7C%7D%7B%7C%5Comega_k%7C%7Cc_j%7C%7D%29%7D%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='I(\Omega,C)=\sum_k{\sum_j{P(\omega_k\cap c_j)log(\frac{P(\omega_k\cap c_j)}{P(\omega_k)P(c_j)})}}=\sum_k{\sum_j{\frac{|\omega_k\cap c_j|}{N}log(\frac{N|\omega_k\cap c_j|}{|\omega_k||c_j|})}}' title='I(\Omega,C)=\sum_k{\sum_j{P(\omega_k\cap c_j)log(\frac{P(\omega_k\cap c_j)}{P(\omega_k)P(c_j)})}}=\sum_k{\sum_j{\frac{|\omega_k\cap c_j|}{N}log(\frac{N|\omega_k\cap c_j|}{|\omega_k||c_j|})}}' class='latex' /></p>
<p style="text-align:left;">and <img src='http://l.wordpress.com/latex.php?latex=H&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H' title='H' class='latex' /> is the entropy, expressed as follows:</p>
<p style="text-align:left;"><img src='http://l.wordpress.com/latex.php?latex=H%28%5COmega%29%3D-%5Csum_kP%28%5Comega_k%29logP%28%5Comega_k%29%3D-%5Csum_k%5Cfrac%7B%7C%5Comega_k%7C%7D%7BN%7Dlog%5Cfrac%7B%7C%5Comega_k%7C%7D%7BN%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H(\Omega)=-\sum_kP(\omega_k)logP(\omega_k)=-\sum_k\frac{|\omega_k|}{N}log\frac{|\omega_k|}{N}' title='H(\Omega)=-\sum_kP(\omega_k)logP(\omega_k)=-\sum_k\frac{|\omega_k|}{N}log\frac{|\omega_k|}{N}' class='latex' /></p>
<p style="text-align:left;"><em><strong>Comment</strong></em>:  Mutual information measures how much we learn about the classes once we know the clusters (in the ideal case, the clusters are exactly the classes). However, MI has the same weakness as <img src='http://l.wordpress.com/latex.php?latex=Purity&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='Purity' title='Purity' class='latex' />: it is maximal when each cluster contains just one sample. To avoid this, MI is divided by the denominator <img src='http://l.wordpress.com/latex.php?latex=%28H%28%5COmega%29%2BH%28C%29%29%2F2&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='(H(\Omega)+H(C))/2' title='(H(\Omega)+H(C))/2' class='latex' />. Entropy increases with the number of clusters. In the case <img src='http://l.wordpress.com/latex.php?latex=K%3DN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='K=N' title='K=N' class='latex' />, <img src='http://l.wordpress.com/latex.php?latex=H%28%5COmega%29%3DlogN&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='H(\Omega)=logN' title='H(\Omega)=logN' class='latex' /> reaches its maximum, and therefore <img src='http://l.wordpress.com/latex.php?latex=NMI&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='NMI' title='NMI' class='latex' /> is pulled down. Interesting?</p>
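<p>The formulas above translate almost line by line into Python. The sketch below uses natural logarithms and a hypothetical 17-sample toy labeling; the function names are my own, and the code assumes that at least one of the two labelings has more than one label (otherwise the denominator is zero).</p>

```python
# Sketch: mutual information, entropy and NMI computed from two label lists.
from collections import Counter
from math import log


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(clusters, classes):
    n = len(clusters)
    joint = Counter(zip(clusters, classes))   # sizes of cluster/class intersections
    size_w = Counter(clusters)
    size_c = Counter(classes)
    return sum(
        cnt / n * log(n * cnt / (size_w[w] * size_c[c]))
        for (w, c), cnt in joint.items()
    )


def nmi(clusters, classes):
    denominator = (entropy(clusters) + entropy(classes)) / 2
    return mutual_information(clusters, classes) / denominator


if __name__ == "__main__":
    clusters = [1] * 6 + [2] * 6 + [3] * 5
    classes = list("xxxxxo" + "xooood" + "xxddd")
    print("NMI of the toy clustering:", nmi(clusters, classes))
    print("NMI with K = N:", nmi(list(range(17)), classes))
    print("NMI when clusters equal classes:", nmi(classes, classes))
```

<p>With one cluster per sample, MI equals the class entropy (its maximum), but the large cluster entropy in the denominator keeps NMI below 1; only a clustering that reproduces the classes scores 1.</p>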
<h4 style="text-align:left;">Accuracy criterion (or Rand Index)</h4>
<p style="text-align:left;">One can apply the concept of accuracy to a clustering result by treating it as a series of decisions, one for each pair of samples. A true positive <img src='http://l.wordpress.com/latex.php?latex=TP&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='TP' title='TP' class='latex' /> decision assigns two similar samples to the same cluster, and a true negative (TN) decision assigns two dissimilar samples to different clusters. A false positive <img src='http://l.wordpress.com/latex.php?latex=FP&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='FP' title='FP' class='latex' /> decision assigns two dissimilar samples to the same cluster, and a false negative (FN) decision assigns two similar samples to different clusters. The formula is quite simple:</p>
<p><img src='http://l.wordpress.com/latex.php?latex=RI%3D%5Cfrac%7BTP%2BTN%7D%7BTP%2BFP%2BFN%2BTN%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='RI=\frac{TP+TN}{TP+FP+FN+TN}' title='RI=\frac{TP+TN}{TP+FP+FN+TN}' class='latex' /></p>
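<p>The Rand Index can be computed by brute force over all pairs of samples, which is fine for small benchmark sets. In this minimal Python sketch, two samples count as similar when they share a class label; the toy labeling and the function name are hypothetical.</p>

```python
# Sketch: Rand Index by counting pair decisions.
# TP: same class, same cluster.        FP: different class, same cluster.
# FN: same class, different cluster.   TN: different class, different cluster.
from itertools import combinations


def rand_index(clusters, classes):
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return (tp + tn) / (tp + fp + fn + tn), (tp, fp, fn, tn)


if __name__ == "__main__":
    clusters = [1] * 6 + [2] * 6 + [3] * 5
    classes = list("xxxxxo" + "xooood" + "xxddd")
    ri, counts = rand_index(clusters, classes)
    print("TP, FP, FN, TN =", counts)
    print("RI =", ri)
```

<p>The pairwise loop costs N(N-1)/2 comparisons, so for millions of samples one would count pairs from the cluster/class contingency table instead.</p>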
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2009/06/28/how-to-evaluate-a-clustering-result/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>
	</item>
		<item>
		<title>For a more comfortable C/C++</title>
		<link>http://vodinhphong.wordpress.com/2009/06/25/for-a-more-comformtable-cc/</link>
		<comments>http://vodinhphong.wordpress.com/2009/06/25/for-a-more-comformtable-cc/#comments</comments>
		<pubDate>Thu, 25 Jun 2009 14:36:09 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=136</guid>
		<description><![CDATA[C/C++ coding is often a nightmare for me because memory allocation/deallocation is delegated to the programmer. But that is exactly the way I like it.  Flexibility is a double-edged sword: it can help you a lot and it can kill you at once! When I was an IT student, I rarely used C/C++ to write programs, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vodinhphong.wordpress.com&blog=5194527&post=136&subd=vodinhphong&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>C/C++ coding is often a nightmare for me because memory allocation/deallocation is delegated to the programmer. But that is exactly the way I like it.  Flexibility is a double-edged sword: it can help you a lot and it can kill you at once! When I was an IT student, I rarely used C/C++ to write programs, and used Java instead. At that time, I adored Java and hated C/C++ because of Visual Studio. I did not know that there were so many cool IDEs that I could use for free.</p>
<p>By now, I am on the way to becoming a PhD student and I like C/C++. However, using C/C++ in the right way is the problem.</p>
<p>The first question is which OS I should use. Linux!</p>
<p>The second question is which IDE to use. <a href="http://www.eclipse.org/downloads/">Eclipse</a>. With Eclipse, I can let the IDE manage the makefile, or write and maintain it myself. This point is quite important because when you deploy your application or code on another computer or system, the most portable way is to use a Makefile.</p>
<p>The next question is how to use C/C++ in the most comfortable way. Use good external libraries to reduce the time spent developing and fixing bugs.</p>
<p>The most important library is STL (sure!)</p>
<p>If you want to write programs with rich command-line options (and you should), use the Program_options package of the Boost library. Writing a GUI application is a tedious task, and I personally think it is the wrong way to go. With command-line options, you can add numerous functionalities to your program within several minutes, without worrying about where to place buttons, textboxes, etc. If your OS is Windows, please install Boost from <a title="Boost" href="http://www.boostpro.com/download" target="_blank">this site</a>. Do not waste time installing from the <a title="Boost" href="http://www.boost.org/">Boost homepage</a>; it is only good for Linux users and for tutorials.</p>
<p>In fact, the more you depend on external libraries, the more problems you get when deploying on other computers. So think carefully before deciding to use them. In the next post, I might introduce Blitz++.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2009/06/25/for-a-more-comformtable-cc/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>
	</item>
		<item>
		<title>What person do you want to become in the next 20 years?</title>
		<link>http://vodinhphong.wordpress.com/2009/05/05/what-person-do-you-want-to-become-in-the-next-20-years/</link>
		<comments>http://vodinhphong.wordpress.com/2009/05/05/what-person-do-you-want-to-become-in-the-next-20-years/#comments</comments>
		<pubDate>Tue, 05 May 2009 04:41:26 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Thoughts]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=120</guid>
		<description><![CDATA[I have been depressed these days, so I would like to start a revolution in productivity. I made my first revolution in the first &#38; second years of my undergraduate studies. It was an exciting period of perfectionism and optimism. Those golden days are over. As a consequence of that revolution, my study achievements improved, i.e. I could [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=vodinhphong.wordpress.com&blog=5194527&post=120&subd=vodinhphong&ref=&feed=1" />]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I have been depressed these days, so I would like to start a revolution in productivity. I made my first revolution in the first &amp; second years of my undergraduate studies. It was an exciting period of perfectionism and optimism. Those golden days are over. As a consequence of that revolution, my study achievements improved, i.e. I could set my own direction and work without supervision. But unavoidably, all of that is fading day by day. My weakness, obviously, is nothing other than procrastination. Now more than ever, I am aware of this weakness and want to start a second revolution to get rid of my procrastination habit. At the same time, I have to re-target my goals in a long-run plan instead of bullshit weak plans. Has anybody read the book &#8220;Eat That Frog!&#8221;? There are SEVEN steps to spark a good life:</p>
<p><strong>Step ONE</strong>. Decide exactly what you want<br />
<strong>Step TWO</strong>. Write it down<br />
<strong>Step THREE</strong>. Set a deadline on your goal<br />
<strong>Step FOUR</strong>. Make a list of everything that you can think of that you are going to do to achieve your goal.<br />
<strong>Step FIVE</strong>. Organize the list into a plan ( by priority and sequence)<br />
<strong>Step SIX</strong>. Take action on your plan immediately<br />
<strong>Step SEVEN</strong>. Do something every single day (keep moving, do not stop)</p>
<p>These steps are quite obvious to everyone, but only some people follow them and few carry out their plan. If you are curious about the book, you can read the most interesting part on <a href="http://books.google.com/books?id=R3iBRVOX1tIC&amp;dq=eat+that+frogs+21+ways+to&amp;printsec=frontcover&amp;source=bl&amp;ots=DeIGbbGB6d&amp;sig=CFTzkgaVTSjt2T8fvyrVBikDUGc&amp;hl=en&amp;ei=xwf9SdTJA4mVkAWHwI3qBA&amp;sa=X&amp;oi=book_result&amp;ct=result&amp;resnum=3#PPA1,M1">Google Book</a>, or download illegally from <a href="http://rs209tl3.rapidshare.com/files/167743839/3907743/EatFroebook.rar">RAPIDSHARE</a>.</p>
<p>Let&#8217;s return to the list and take the first step, <strong>Decide exactly what you want.</strong> I got stuck at the first step!!! And I think everyone has the same problem as I do. We get frustrated and do not know what kind of person we want to become, why, and how. Hence, my problem goes beyond procrastination; it&#8217;s about my career, the most basic question that all students in Vietnam have to face. How to solve it?</p>
<p>Anyone who cares about Computer Science should read the article &#8220;You and Your Research&#8221; sometime in his/her life. I finally read it, after first hearing about it last year. It&#8217;s good news, indeed. It is 20 pages long and you can read it online <a href="http://www.cs.virginia.edu/%7Erobins/YouAndYourResearch.html">HERE</a>.</p>
<p>The article is interesting from various perspectives, but in order to answer the career question, I quote here the most relevant part:</p>
<p><em>Question:</em> Would you compare research and management?</p>
<p><em>Hamming:</em> If you want to be a great researcher, you won&#8217;t make it being president of the company. If you want to be president of the company, that&#8217;s another thing. I&#8217;m not against being president of the company. I just don&#8217;t want to be. I think Ian Ross does a good job as President of Bell Labs. I&#8217;m not against it; but you have to be clear on what you want. Furthermore, when you&#8217;re young, you may have picked wanting to be a great scientist, but as you live longer, you may change your mind. For instance, <strong>I went to my boss, Bode, one day and said, &#8220;Why did you ever become department head? Why didn&#8217;t you just be a good scientist?&#8221;</strong> He said, &#8220;Hamming, I had a vision of what mathematics should be in Bell Laboratories. And I saw if that vision was going to be realized, <em>I</em> had to make it happen; <em>I</em> had to be department head.&#8221; When your vision of what you want to do is what you can do single-handedly, then you should pursue it. <strong>The day your vision, what you think needs to be done, is bigger than what you can do single-handedly, then you have to move toward management.</strong> And the bigger the vision is, the farther in management you have to go. If you have a vision of what the whole laboratory should be, or the whole Bell System, you have to get there to make it happen. You can&#8217;t make it happen from the bottom very easily. It depends upon what goals and what desires you have. And as they change in life, you have to be prepared to change. I chose to avoid management because I preferred to do what I could do single-handedly. But that&#8217;s the choice that I made, and it is biased. Each person is entitled to their choice. Keep an open mind. But when you do choose a path, for heaven&#8217;s sake be aware of what you have done and the choice you have made. Don&#8217;t try to do both sides.</p>
<p>That seems a very natural answer for me (though it may not apply to you). Just stop constantly asking &#8220;research or management, which one is better?&#8221;. We ask because of money and position. If you are an egoistic kind of person, meaning that you feel happy when you can do whatever you want, this answer is for you. Cheers.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2009/05/05/what-person-do-you-want-to-become-in-the-next-20-years/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>
	</item>
		<item>
		<title>Computer Vision &#8211; two paradigms, one mission</title>
		<link>http://vodinhphong.wordpress.com/2009/02/17/computer-vision-two-paradigms-one-mission/</link>
		<comments>http://vodinhphong.wordpress.com/2009/02/17/computer-vision-two-paradigms-one-mission/#comments</comments>
		<pubDate>Tue, 17 Feb 2009 06:00:17 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Thoughts]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=100</guid>
		<description><![CDATA[Recently, I have come to think that the difference between how the human brain and current recognition systems process visual input is a difference in philosophy. Starting from the remarkable work of Field and Olshausen on receptive fields in cortical cells, the vision research community believes (or at least admits) that early vision functions in the visual pathway play an important role in recognizing [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Recently, I have come to think that the difference between how the human brain and current recognition systems process visual input is a difference in philosophy. Starting from the remarkable work of Field and Olshausen on receptive fields in cortical cells, the vision research community believes (or at least admits) that early vision functions in the visual pathway play an important role in recognizing objects and events. This line of exploration has produced well-known concepts such as filter banks, textons, shapelets, movemes, low-frequency shapes, weak features and so on. These features are then learned by a classifier using state-of-the-art models (e.g. Boosting, Support Vector Machines, Conditional Random Fields, Bayesian hierarchical models). Depending on whether the learner is parametric or non-parametric, training yields a parameter set or an exemplar set. Despite having harvested remarkable successes, this paradigm of recognition still gets stuck on recognizing objects from different viewpoints, intra-class variation, occlusion (including self-occlusion), varied appearance, and multiple poses (human action). So, is it the right rule for future vision systems?</p>
<p>In spite of the domination of recognition systems based on low-level features, there are still some promising paradigms that think differently. The one that I see is a kind of system with a huge database of examples and an extremely robust image matching engine. How can such a system be built? The most notable research group pursuing this paradigm is at MIT CSAIL. Its most motivated member is Antonio Torralba, whose interest is exploiting the advantage of huge databases for recognition. His recent works, such as the spatial envelope, 80 million tiny images, and SIFT flow, have sketched a bright picture of what the second paradigm should be.</p>
<p align="center"><img src="http://people.csail.mit.edu/celiu/ECCV2008/pictures/framework.jpg" alt="" width="371" height="205" /></p>
<p>Let&#8217;s see what he did with SIFT flow. This is an image matching technique that finds semantically similar images. Given a huge database covering many topics (e.g. streets, buildings, cars, people), SIFT flow matches a test image against the database to find the correct label for it. Two points matter for such systems. First, the database must be prepared carefully and sensibly, and it should contain as many instances as possible. Second, the image matching engine has to generalize well: the more images the database contains, the more diverse their appearance, and if the matching engine does not generalize well enough, we fail the mission. A further point is that the engine should be fast; otherwise it will take hours to compare many thousands of images. The obvious disadvantage of the second paradigm is that it is expensive and not portable. However, machine vision is still far from daily-life applications.</p>
<p>To the best of my knowledge, the learning paradigm has produced a vast body of interesting literature of its own. After the artificial neural network phenomenon, the artificial intelligence community cooled down. However, the marriage between traditional statistics and AI has inspired a new field: statistical machine learning. After Vapnik introduced the Support Vector Machine, with the theoretical VC dimension as its core idea, learning theory was born. Simultaneously, inference techniques from probability also hold an important position in the current machine learning literature. Graphical models have inspired researchers to design fancy models that can express dependencies between objects in an image or video. The more tools we have, the more products we can create. But the Great Wall of computer vision still stands there, unmoved. How can a computer recognize objects from different points of view? How can a computer understand that an oak tree and a pine tree both belong to the tree class? Again, people have begun to study deductive transfer learning. So far, multitask learning still lacks coherence.</p>
<p>On the other side, people tend to forget about pattern recognition. At today&#8217;s famous computer vision conferences, the share of submissions on pattern recognition is quite low. Undoubtedly, it is a hard topic to cope with. The difficulty lies in the fact that there is no specific pattern to deal with. The human brain thinks about objects using somewhat coarse concepts. Whether these concepts are maintained by a set of informative features or by a huge set of exemplars is still controversial. But it is worthwhile for us to try all the possibilities. Pattern matching also requires seminal work on visual features. Consequently, whichever paradigm computer vision researchers choose to work with, vision and psychology researchers continue their own work diligently.</p>
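<p>To make the second paradigm concrete, here is a minimal sketch of label transfer from a database of labeled examples. It is not SIFT flow: the matching engine is swapped for a plain Euclidean nearest-neighbor search, and the two-dimensional feature vectors, the labels, and the function names are all made up for illustration.</p>

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_label(query, database, k=3):
    """Label a query by majority vote among its k nearest database entries.

    database is a list of (feature_vector, label) pairs. A real system
    would replace l2_distance with a far more robust matching engine.
    """
    ranked = sorted(database, key=lambda item: l2_distance(query, item[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Toy database: hypothetical features, not real image descriptors.
database = [
    ([0.9, 0.1], "street"),
    ([0.8, 0.2], "street"),
    ([0.1, 0.9], "building"),
    ([0.2, 0.8], "building"),
    ([0.5, 0.5], "people"),
]

print(transfer_label([0.85, 0.15], database))
```

<p>The sketch also shows why speed matters: every query is compared against every database entry, so the cost grows linearly with the size of the database.</p>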
<p style="text-align:right;">Qui Nhon, Feb 17, 2009</p>
<p style="text-align:right;">Phong Vo</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2009/02/17/computer-vision-two-paradigms-one-mission/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>

		<media:content url="http://people.csail.mit.edu/celiu/ECCV2008/pictures/framework.jpg" medium="image" />
	</item>
		<item>
		<title>How to conduct a research problem</title>
		<link>http://vodinhphong.wordpress.com/2008/12/21/how-to-conduct-a-research-problem/</link>
		<comments>http://vodinhphong.wordpress.com/2008/12/21/how-to-conduct-a-research-problem/#comments</comments>
		<pubDate>Sun, 21 Dec 2008 13:56:41 +0000</pubDate>
		<dc:creator>vodinhphong</dc:creator>
				<category><![CDATA[Research]]></category>

		<guid isPermaLink="false">http://vodinhphong.wordpress.com/?p=96</guid>
		<description><![CDATA[I&#8217;ve read an extremely good guide about how to start out in research. I strongly recommend it for rookies:
Literature search for computer science
Todd Veldhuizen
Chalmers Technical University
June 17, 2005

So, it&#8217;s time to talk about the guide.  There are some good points in the guide that I want to re-present below:
i. Literature, should or shouldn&#8217;t?
ii. Kinds [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>I&#8217;ve read an extremely good guide about how to start out in research. I strongly recommend it for rookies:</p>
<blockquote><address>Literature search for computer science<br />
Todd Veldhuizen<br />
Chalmers Technical University<br />
June 17, 2005</address>
</blockquote>
<p>So, it&#8217;s time to talk about the guide.  There are some good points in the guide that I want to re-present below:</p>
<p style="padding-left:30px;">i. Literature, should or shouldn&#8217;t?<br />
ii. Kinds of research<br />
iii. A unification process<br />
iv. Online search tools for CS<br />
v. Organizing your bibliography</p>
<p>Why do I care about such problems? Ask yourself first, please! If you are a newbie in your field but already know a little about how to conduct research, you will wonder about the same questions as I do. If not, there are two cases for you: either you are not interested in research or you are doing research the wrong way. Let&#8217;s focus on the main points. (Note: read the guide before reading the discussion below.)</p>
<h2><strong>i. Literature, should or shouldn&#8217;t?</strong></h2>
<p>Literature review helps us build a wide knowledge of the field. You will know how far the experts have gone, avoid duplicating already-known results, and draw more creative ideas from them. Believe me, you cannot create anything new without reviewing some state-of-the-art papers! Here I want to emphasize a key idea: your brain alone cannot think up anything worthwhile. (:D) Right?</p>
<p>However, literature makes us run on given tracks. In other words, we lose imagination/intuition. Tell me if you have experienced a similar feeling.</p>
<p>So, how much is enough? Read neither too much nor too little. I am getting stuck in tons of papers because of my ambition. So be careful!</p>
<p>The second point is that our brain cannot remember exactly what it has absorbed, especially in the case of research papers. I cannot manage the many papers, theorems, formulas, hypotheses, etc. that have accumulated over many years from various conferences, journals, tutorials, talks, etc. How could anyone! After reading a paper, I try to summarize it in several aspects:</p>
<ol>
<li> &#8211; Authors&#8217; claim: know what the authors are trying to do.</li>
<li> &#8211; Results: analyze the results as deeply as possible. The results reflect whether the authors achieved their claims or not. They also reveal useful information about the problem. For example, suppose I am working on a classification problem. The confusion matrix will show me which class is likely to be confused with another. Does this defect happen only in one particular method, or does it appear across methods? Fixing it is also a good objective to achieve yourself (in your own paper). Result analysis is a new technique for me. Do not just look at the percentage a method scores. We must analyze data the way a meteorologist analyzes data to forecast the weather.</li>
<li> &#8211; Paradigm awareness: you should be aware of which paradigm the paper belongs to. Otherwise, you will get frustrated as the number of papers you accumulate grows day by day. For instance, say you want to propose a new method in the action recognition field. There are several approaches and you have to choose one of them. You might like template-based methods, NLP-inspired methods, dynamical modeling methods, etc. You should be able to categorize a new paper.</li>
<li> &#8211; Detailed steps: you can investigate the pros and cons of each step. Which method is used? What are its weak points? What is its advantage? Why don&#8217;t we use another one?</li>
<li> &#8211; Core idea: this is the most important part. Is this paper a gold mine or bullshit? What is its hidden core idea? Often, the core idea can be fully expressed in one or two long sentences. The core idea reveals whether the paper is a novel/interesting/new approach or just reuses prevailing wisdom and adds more spice.</li>
<li> &#8211; Anything else you want to note down.</li>
</ol>
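<p>The confusion-matrix reading described under &#8220;Results&#8221; can be sketched in a few lines. This is only an illustration under my own assumptions: the action class names and the predictions are invented, and the helper names are hypothetical.</p>

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def most_confused_pairs(counts, top=3):
    """Return the off-diagonal (actual, predicted) pairs with the most errors."""
    errors = {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
    # Sort by error count (largest first), then by label names for stable output.
    return sorted(errors.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Invented ground truth and predictions for a toy action classifier.
true_labels = ["walk", "walk", "run", "run", "run", "jump", "jump", "wave"]
predicted = ["walk", "run", "run", "walk", "walk", "jump", "run", "wave"]

counts = confusion_counts(true_labels, predicted)
for (actual, guessed), n in most_confused_pairs(counts):
    print(f"{actual} mistaken for {guessed}: {n} time(s)")
```

<p>Running the same analysis on the outputs of several methods shows whether a confusion (here, &#8220;run&#8221; taken for &#8220;walk&#8221;) belongs to one method or to the problem itself.</p>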
<p>In conclusion:</p>
<ul>
<li> &#8211; Extensive reviewing but not too much</li>
<li> &#8211; No need to comprehend every method; catch the core idea</li>
<li> &#8211; Read some representative papers deeply</li>
<li> &#8211; Take notes (Claim, Results, Approach, Idea, etc.)</li>
</ul>
<h2>ii. Kinds of research</h2>
<p>There are two kinds of research (just a personal view, but I find it useful)</p>
<h3>*. Problem-oriented</h3>
<ul>
<li>- Seeks to solve a problem or understand a phenomenon.</li>
<li> &#8211; Usually an objective standard of what constitutes success: proof, experimental results that yield insight on the    phenomenon, performance on benchmark problems, usability studies, etc.</li>
<li> &#8211; Generally open to a wide array of approaches – anything is valuable, if it works or yields insight</li>
<li> &#8211; More in tune with what we think of as “science” (experimental verification, replicability, falsification, etc.)</li>
</ul>
<h3>*. Paradigm-oriented</h3>
<ul>
<li> &#8211; Starts from a presumption that a given idea or paradigm is true/valuable, and tries to explore, further develop, justify, find applications, etc.</li>
<li> &#8211; Often have a community of researchers who band together based on similar interests; conferences with titles</li>
<li> &#8211; Often not receptive to studies that question, attack, or propose alternatives to the paradigm.</li>
<li> &#8211; Often lack an objective criterion of success; papers are judged based on whether they are “novel” or “interesting,” rather than how well they perform on some benchmark problems.</li>
<li> &#8211; Nonetheless a lot of interesting ideas come out of such communities.</li>
<li> &#8211; When doing problem-oriented research, you frequently need ideas from several such communities, and the fact that these areas have been thoroughly explored by paradigm-oriented researchers is very helpful.</li>
</ul>
<p>Let&#8217;s chitchat. At first glance, I thought we (my fellows and I) were members of the flock named problem-oriented. But&#8230; wait, what problem do we have? Are we really problem-oriented? Not at all. We have no problem; we are free men with no problem to worry about. No grants, no funding, no ad hoc problems, no career. We are paradigm-oriented indeed. Let&#8217;s look at the way you (and I) conduct research. You go to Google, type some keywords, and begin to look for state-of-the-art papers. After a while (some days or weeks), you say &#8220;Eureka!&#8221; and begin to grab the best publications. What a marvelous method! How can it be! Your mouth utters such trivial words. And then? You figure out a schedule for implementation. Finally, you modify minor parts, or alter some insignificant modules, to produce your own model. If you do exactly what I describe, you are following a fad. For instance, some of my friends are faddists. Please take that in the positive sense. When a new technique/approach appears, it’s unclear what it’s good for and how it compares to other approaches. A lot of “fad” papers help answer useful questions of the form: “Is X good for Y?”.</p>
<pre>The most important characteristics of paradigm-oriented research:</pre>
<blockquote><p>Paradigm-oriented communities seem particularly susceptible to fads, fashions, memes, bandwagons, etc.<br />
You frequently need helpful ideas from several such communities.<br />
A paradigm stream is often well-studied.</p></blockquote>
<p>However, the idea I want to convey here is that problem-oriented and paradigm-oriented research are two phases of a circular feedback process. Typically, a rookie grabs some papers first, and then selects the one he finds most interesting. He has completed the first phase, the paradigm-oriented one. Now he should write a research statement, and clearly he enters the problem-oriented phase. Whether he stays in that fad or decides to propose another fad to compete against it, a rookie really needs a clear plan to follow. So, we move to the next section.</p>
<h2>iii. A unification process</h2>
<p>Why is it called that? Let me tell you a very short story. A famous poet from my hometown wrote a special poem. Whichever direction you read it, from top to bottom or the reverse, the poem is totally readable and sounds great. The same phenomenon occurs in the process below:</p>
<pre>1. Identify the problem.</pre>
<p>a. Write a problem statement<br />
b. Avoid a preemptive selection of approaches or solutions<br />
c. Understand the context in which the problem must be solved.</p>
<pre>2. Identify:</pre>
<p>a. Constraints that the solution must satisfy<br />
b. Criteria for a successful publication in your particular field.<br />
(Depending on which conference/journal you want to submit to, the constraints and criteria should be looser or tighter)</p>
<pre>3. Enumerate possible alternative solutions, and judge them according to the criterion and constraints.</pre>
<pre>4. Select the best solution.</pre>
<pre>5. Implement and evaluate it.</pre>
<pre>(6. Writing manuscript)</pre>
<h2>For paradigm-oriented research: reverse the above process.</h2>
<pre>1. Select the best solution.</pre>
<pre>2. Implement and evaluate it.</pre>
<pre>3.  Enumerate possible alternative solutions, and judge them according to the criterion and constraints.</pre>
<pre>4. Identify:</pre>
<p>a. Constraints that the solution must satisfy<br />
b. Criteria for a successful publication in your particular field.<br />
(Depending on which conference/journal you want to submit to, the constraints and criteria should be looser or tighter)</p>
<pre>5. Identify the problem.</pre>
<p>a. Write a problem statement<br />
b. Avoid a preemptive selection of approaches or solutions<br />
c. Understand the context in which the problem must be solved.</p>
<pre>(6. Writing manuscript)</pre>
<h2>iv. Search tools for CS</h2>
<p>I just write down must-see sites:</p>
<ol>
<li> 1. Google Scholar [http://scholar.google.com/]</li>
<li> 2. CiteSeer [http://citeseer.ist.psu.edu/ http://citeseer.csail.mit.edu/]</li>
<li> 3. Collection of Computer Science Bibliographies [http://liinwww.ira.uka.de/bibliography/index.html]</li>
<li> 4. ACM Guide [http://portal.acm.org/guide.cfm]</li>
<li> 5. UMI Dissertations [http://wwwlib.umi.com/dissertations/search]</li>
<li> 6. DBLP [http://www.informatik.uni-trier.de/~ley/db/]</li>
</ol>
<h2>v. Organizing your bibliography</h2>
<p>You should think about indexing and storing your papers by hand if you don&#8217;t want to be buried in chaos. In particular, if you plan to write papers, an organized bibliography will be helpful. Keeping track of experiment outcomes over a long time is also a good topic to discuss, but not now. I recommend that you get acquainted with LaTeX first. You will be persuaded by its beauty and well-structured writing style. But it is not a toy to play with. Consider spending time learning it, and LaTeX will be an elegant weapon. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>There are two ways to organize your bibliography:</p>
<ol>
<li> 1. Use a well-known tool, such as EndNote. Go through a short tutorial on the software and you are done.</li>
<li> 2. Maintain a BibTeX file along with a hierarchical folder structure for storing electronic copies and a search tool. For details, please see the guide.</li>
</ol>
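<p>For the second option, the &#8220;search tool&#8221; can start out very small. Below is a rough sketch, not the guide&#8217;s tool: it splits a BibTeX file into entries with a naive regular expression and returns the citation keys whose text contains a keyword. The citation keys and the sample entries (built only from works mentioned on this blog) are my own placeholders, and the parser assumes each entry starts with <code>@type{key,</code>.</p>

```python
import re

# Matches the start of an entry such as "@article{some-key,".
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def split_entries(bibtex_text):
    """Return a dict mapping each citation key to its raw entry text.

    Naive on purpose: it does not handle @string/@comment blocks or an
    "@" sign inside a field value that happens to look like an entry start.
    """
    matches = list(ENTRY_START.finditer(bibtex_text))
    starts = [m.start() for m in matches] + [len(bibtex_text)]
    entries = {}
    for m, end in zip(matches, starts[1:]):
        entries[m.group(2)] = bibtex_text[m.start():end].strip()
    return entries


def search(entries, keyword):
    """Return the sorted citation keys whose entry text contains keyword."""
    keyword = keyword.lower()
    return sorted(key for key, text in entries.items() if keyword in text.lower())


# Placeholder bibliography; the keys are hypothetical.
sample = """
@misc{veldhuizen2005literature,
  author = {Todd Veldhuizen},
  title  = {Literature search for computer science},
  year   = {2005}
}
@misc{hamming-research,
  author = {Richard Hamming},
  title  = {You and Your Research}
}
"""

entries = split_entries(sample)
print(search(entries, "hamming"))
```

<p>If the citation key is also used as the file name of the stored PDF inside the folder hierarchy, a hit in the BibTeX file leads straight to the paper.</p>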
<h2>* Conclusion</h2>
<p>I hope the discussion above helps you build good research skills of your own.<br />
Contact me at vdphong@fit.hcmuns.edu.vn or actionrecognition@gmail.com</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://vodinhphong.wordpress.com/2008/12/21/how-to-conduct-a-research-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/51f255d405a0b3dddbbb3bd29282512c?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">vodinhphong</media:title>
		</media:content>
	</item>
	</channel>
</rss>