vodinhphong

Surviving codec hell

In Techniques on August 25, 2009 at 3:28 am

1. Introduction

Anyone tackling video processing needs at least a minimal knowledge of video formats, video encoding and video decoding. As I moved from 2D object recognition to action/event recognition, it became clear that I had to process a new kind of data: videos instead of images. So far I have used OpenCV for almost all my tasks and found it quite good. Its auxiliary part, HighGUI, is a different story. HighGUI supports many image formats by relying on third-party libraries. In the Windows distribution, these libraries are pre-compiled as DLL files (libguide.dll), so this is transparent to the user. On Linux, libpng, libjpeg, libtiff, etc. have to be installed on the system; nowadays these basic libraries ship with popular distributions such as Ubuntu and Fedora.

The problem is not images but video. HighGUI's video functionality works the same way as its image codecs: it relies on third-party libraries for reading and writing video. On Windows, HighGUI uses the native multimedia API (Video for Windows). On Linux, it uses the popular FFMPEG encoding/decoding API. In recent versions (1.0, 1.1pre1, and nightly builds), OpenCV has problems with FFMPEG, especially on Ubuntu and Red Hat, while it seems to work properly on Fedora. Note that OpenCV can be obtained in two ways: downloaded and installed manually, or installed from the OS's repository. If you choose the latter, it MIGHT work. The root of the problem is that FFMPEG was upgraded to version 0.5, in which some functions were deprecated (e.g. img_convert) and the header directory structure was changed, while OpenCV HighGUI has not been updated accordingly. There are several tips for fixing the problem, but personally I could not get any of them to work. That's why I wrote this tutorial.

Back to our problem: how do we read (and sometimes write) a video file? Assume that you want to do it at no cost, i.e. without buying commercial video software or hiring an expert to do it for you; I am talking about the realm of open source. As mentioned above, this article targets computer vision tasks, so you only need to extract frames (no audio) and nothing more: no fancy effects, no post-processing, no subtitles, no anti-aliasing, etc.

The first concept you need to know is the codec, which the next part introduces together with containers. Part 3 presents some tools for converting between codecs, and for those who want to program with a video I/O API, Part 4 introduces a few shortcuts.

2. Background

Codec definition

A codec is a device or computer program capable of encoding and/or decoding a digital data stream or signal. The word codec is a portmanteau of ‘compressor-decompressor’ or, most accurately, ‘coder-decoder’. A video codec is a device or software that enables video compression and/or decompression for digital video. The compression usually employs lossy data compression.

Commonly used standards and codecs

H.261: Used primarily in older videoconferencing and videotelephony products. It included such well-established concepts as YCbCr color representation, the 4:2:0 sampling format, 8-bit sample precision, 16×16 macroblocks, block-wise motion compensation, 8×8 block-wise discrete cosine transformation, zig-zag coefficient scanning, scalar quantization, run+value symbol mapping, and variable-length coding. H.261 supported only progressive scan video.
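Zig-zag coefficient scanning is easy to picture in code. The sketch below (zigzag_order is my own helper, not part of any codec library) lists the positions of an N×N block of DCT coefficients in the classic zig-zag order (as in JPEG and H.261), so that low-frequency coefficients come first and the zeros left after quantization end up in long runs that run+value coding handles well.

```cpp
#include <array>
#include <cstddef>

// Returns the zig-zag scan order of an N x N block as row-major indices
// (row * N + col): order[k] is the k-th coefficient visited.
template <std::size_t N>
std::array<int, N * N> zigzag_order() {
    std::array<int, N * N> order{};
    std::size_t k = 0;
    // Walk the anti-diagonals s = row + col, starting at the DC coefficient.
    for (std::size_t s = 0; s <= 2 * (N - 1); ++s) {
        // Rows of this anti-diagonal that lie inside the block.
        std::size_t row_lo = (s < N) ? 0 : s - (N - 1);
        std::size_t row_hi = (s < N) ? s : N - 1;
        if (s % 2 == 0) {
            // Even diagonals go from bottom-left to top-right (row decreasing).
            for (std::size_t r = row_hi + 1; r-- > row_lo;)
                order[k++] = static_cast<int>(r * N + (s - r));
        } else {
            // Odd diagonals go from top-right to bottom-left (row increasing).
            for (std::size_t r = row_lo; r <= row_hi; ++r)
                order[k++] = static_cast<int>(r * N + (s - r));
        }
    }
    return order;
}
```

For an 8×8 block, zigzag_order<8>() starts 0, 1, 8, 16, 9, 2, 3, 10, 17, 24 and ends at 63.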

MPEG-1 Part 2: Used for Video CDs, and also sometimes for online video. In terms of technical design, the most significant enhancements in MPEG-1 relative to H.261 were half-pel and bi-predictive motion compensation support. MPEG-1 supports only progressive scan video.
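Half-pel motion compensation means a motion vector may point between two pixels, so the predictor has to interpolate. A minimal sketch, assuming a single 8-bit grayscale plane and non-negative coordinates given in half-pixel units; the Plane struct and the function names are mine, and the rounded averaging follows my understanding of the MPEG-1/MPEG-2 rules. Bi-prediction is then the rounded average of a forward and a backward prediction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One 8-bit grayscale reference picture, stored row-major without padding.
struct Plane {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
    int at(int x, int y) const { return pixels[static_cast<std::size_t>(y) * width + x]; }
};

// Prediction sample at half-pel position (x2 / 2, y2 / 2); x2 and y2 are
// non-negative coordinates in half-pixel units. Odd coordinates fall between
// pixels and are interpolated by rounded averaging. The caller must keep the
// right/lower neighbours inside the picture when they are needed.
std::uint8_t half_pel_sample(const Plane& ref, int x2, int y2) {
    const int x = x2 / 2, y = y2 / 2;
    const bool hx = (x2 % 2) != 0, hy = (y2 % 2) != 0;
    int value;
    if (!hx && !hy)
        value = ref.at(x, y);                                  // full-pel position
    else if (hx && !hy)
        value = (ref.at(x, y) + ref.at(x + 1, y) + 1) / 2;     // horizontal half-pel
    else if (!hx && hy)
        value = (ref.at(x, y) + ref.at(x, y + 1) + 1) / 2;     // vertical half-pel
    else
        value = (ref.at(x, y) + ref.at(x + 1, y) +
                 ref.at(x, y + 1) + ref.at(x + 1, y + 1) + 2) / 4;  // diagonal
    return static_cast<std::uint8_t>(value);
}

// Bi-predictive (B-picture) sample: rounded average of the forward and
// backward predictions.
std::uint8_t bi_predict(std::uint8_t forward, std::uint8_t backward) {
    return static_cast<std::uint8_t>((forward + backward + 1) / 2);
}
```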

MPEG-2 Part 2: Used on DVD, SVCD, and in most digital video broadcasting and cable distribution systems. When used on a standard DVD, it offers good picture quality and supports widescreen. In terms of technical design, the most significant enhancement in MPEG-2 relative to MPEG-1 was the addition of support for interlaced video.
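For vision work interlacing matters, because an interlaced frame interleaves two fields captured at different instants, which shows up as "combing" on moving objects. A minimal sketch of separating the two fields of an 8-bit frame; split_fields is a hypothetical helper, assuming a row-major buffer without row padding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// The top field holds the even lines (0, 2, 4, ...) of the frame and the
// bottom field the odd lines (1, 3, 5, ...). Each field keeps the full width
// and roughly half the height.
void split_fields(const std::vector<std::uint8_t>& frame, int width, int height,
                  std::vector<std::uint8_t>& top, std::vector<std::uint8_t>& bottom) {
    top.clear();
    bottom.clear();
    for (int y = 0; y < height; ++y) {
        auto row = frame.begin() + static_cast<std::ptrdiff_t>(y) * width;
        std::vector<std::uint8_t>& field = (y % 2 == 0) ? top : bottom;
        field.insert(field.end(), row, row + width);  // append one full line
    }
}
```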

H.263: Used primarily for videoconferencing, videotelephony, and internet video. H.263 represented a significant step forward in standardized compression capability for progressive scan video.

MPEG-4 Part 2: An MPEG standard that can be used for internet, broadcast, and on storage media. It offers improved quality relative to MPEG-2 and the first version of H.263. It also included some enhancements of compression capability, both by embracing capabilities developed in H.263 and by adding new ones such as quarter-pel motion compensation. Like MPEG-2, it supports both progressive scan and interlaced video.

DivX, Xvid, FFmpeg MPEG-4 and 3ivx: Different implementations of MPEG-4 Part 2.

MPEG-4 Part 10 (H.264/AVC): This emerging new standard is the current state of the art of ITU-T and MPEG standardized compression technology, and is rapidly gaining adoption into a wide variety of applications. It contains a number of significant advances in compression capability, and it has recently been adopted into a number of company products, including for example the XBOX 360, PlayStation Portable, iPod, iPhone, the Nero Digital product suite, Mac OS X v10.4, as well as HD DVD/Blu-ray Disc.

WMV (Windows Media Video): Microsoft’s family of video codec designs including WMV 7, WMV 8, and WMV 9. It can do anything from low resolution video for dial up internet users to HDTV.

RealVideo: Developed by RealNetworks.

Cinepak: A very early codec used by Apple’s QuickTime.

Huffyuv: Huffyuv (or HuffYUV) is a very fast, lossless Win32 video codec written by Ben Rudiak-Gould and published under the terms of the GPL as free software, meant to replace uncompressed YCbCr as a video capture format.
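Huffyuv stays lossless because it only predicts each pixel and Huffman-codes the prediction errors, and prediction is exactly reversible. A minimal sketch of the simplest ("left") predictor on one row of 8-bit samples, with the Huffman stage omitted; Huffyuv also offers other predictors (gradient, median), and predicting the first sample from 0 is my own assumption. The function names are mine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Left prediction: each sample is predicted from its left neighbour and only
// the difference (mod 256) is kept. Smooth images give residuals near 0,
// which is what makes the subsequent Huffman coding effective.
std::vector<std::uint8_t> left_predict(const std::vector<std::uint8_t>& row) {
    std::vector<std::uint8_t> residual(row.size());
    std::uint8_t prev = 0;  // assumption: first sample predicted from 0
    for (std::size_t i = 0; i < row.size(); ++i) {
        residual[i] = static_cast<std::uint8_t>(row[i] - prev);  // wraps mod 256
        prev = row[i];
    }
    return residual;
}

// Exact inverse: accumulate the residuals (mod 256) to recover the samples.
std::vector<std::uint8_t> left_reconstruct(const std::vector<std::uint8_t>& residual) {
    std::vector<std::uint8_t> row(residual.size());
    std::uint8_t prev = 0;
    for (std::size_t i = 0; i < residual.size(); ++i) {
        row[i] = static_cast<std::uint8_t>(prev + residual[i]);
        prev = row[i];
    }
    return row;
}
```

For the row {100, 102, 101, 255, 0} the residuals are {100, 2, 255, 154, 1}, and left_reconstruct gives the original row back bit for bit.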

Container

A container or wrapper format is a file format, or often a stream format (the stream need not be stored as a file), whose specification describes only how data are stored (not coded) within the file and which metadata can be stored; it does not imply or specify any particular coding of the data themselves. For example, an AVI file may hold video coded with XviD, DivX or huffyuv.

Video container

Simple container formats can contain different types of audio codecs, while more advanced container formats can support multiple audio and video streams, subtitles, chapter-information, and meta-data (tags) — along with the synchronization information needed to play back the various streams together.

Some containers are exclusive to audio.

Other containers are exclusive to still images:

  • TIFF (Tagged Image File Format) is a wrapper file format for still images and associated metadata.

Other, more flexible containers can hold many types of audio and video, as well as other media.
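Since a container only describes how streams are stored, it can usually be recognized from a few "magic" bytes at the start of the file, independently of the codec inside. A minimal sketch; sniff_container is my own helper and checks only a handful of well-known signatures, whereas libavformat's format probing is far more thorough.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Guess the container from the first bytes of a file. Only the container is
// identified; the codec inside must still be read from the stream headers.
std::string sniff_container(const std::uint8_t* d, std::size_t n) {
    // True if `len` bytes at offset `off` equal `magic`.
    auto has = [&](std::size_t off, const char* magic, std::size_t len) {
        return n >= off + len && std::memcmp(d + off, magic, len) == 0;
    };
    if (has(0, "RIFF", 4) && has(8, "AVI ", 4)) return "AVI";
    if (has(4, "ftyp", 4)) return "MP4/QuickTime (ISO base media)";
    if (has(0, "\x1A\x45\xDF\xA3", 4)) return "Matroska/WebM (EBML)";
    if (has(0, "\x00\x00\x01\xBA", 4)) return "MPEG program stream";
    if (has(0, "OggS", 4)) return "Ogg";
    if (has(0, "II*\0", 4) || has(0, "MM\0*", 4)) return "TIFF";
    return "unknown";
}
```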

3. Codec software/libraries

How can we watch high-definition movies on a PC or YouTube on a cellphone? Quite often we have to install a media player, e.g. Windows Media Player, RealPlayer, QuickTime, SMPlayer, KMPlayer, Media Player Classic or VLC. Sometimes the video codecs are already supported by the OS, so we do not need to do anything. Commercial multimedia players have their own codec components, which can be developed independently or built on popular multimedia APIs. Some typical codec libraries and frameworks are FFMPEG, VFW, DirectShow and MEncoder.

On Linux, most players and multimedia editors are based on FFMPEG, a universal codec library supporting nearly every codec standard. Furthermore, as FFMPEG is open source (LGPL, with some optional GPL parts) and well documented, it is quite easy to program with. FFMPEG comes with three major components: libavcodec, responsible for decoding/encoding video; libavformat, responsible for detecting and reading video/audio streams in containers; and libswscale, for software scaling and pixel-format conversion. FFMPEG is also a powerful command-line converter, so you can use it either as a tool or as an external library. It is a good idea to write your player using FFMPEG.

On Windows, players can be developed in several ways. The first is to use Microsoft's DirectX SDK. Historically, VFW (Video for Windows) was the first such API, introduced with Windows 3.1; it was later integrated into DirectX 5. Over the years, video encoding/decoding moved into DirectShow, a component of DirectX. DirectShow provides a comfortable environment in which third-party developers build their own multimedia applications and video filters, but it is a comprehensive and complex API. The second way to develop video players/converters on Windows is to write them from scratch (e.g. SMPlayer with MEncoder, VLC) or to reuse other sources (e.g. FFMPEG) without depending on DirectX.

Other OSs have yet other codec standards and libraries! It is easy to see that the number of codec libraries/programs is so large that people call it "codec hell". The key point to remember is that there is no consistent way to deal with video codecs across operating systems, so we have to use various strategies to meet our needs. For those who only need a tool, this is quite straightforward. In this section I introduce some useful tools.

In Windows

VirtualDub [open source, GUI] is a video capture/processing utility for 32-bit Windows platforms (95/98/ME/NT4/2000/XP), licensed under the GNU General Public License (GPL). It can batch-process large numbers of files, edit frames, and be extended with third-party video filters. VirtualDub is mainly geared toward processing AVI files, although it can read (not write) MPEG-1 and also handle sets of BMP images. To read MPEG-2 video, install an MPEG-2 input plugin. VirtualDub does not provide rich command-line options.

SUPER [freeware, GUI] Simplified Universal Player Encoder & Renderer. A GUI to FFmpeg, MEncoder, MPlayer, x264, musepack, monkey's audio, true audio, wavpack, ffmpeg2theora and the theora/vorbis RealProducer plugin. It does not provide command-line options.

MEncoder [open source, cmd] is a free command line video decoding, encoding and filtering tool released under the GNU General Public License. It is a close sibling to MPlayer and can convert all the formats that MPlayer understands into a variety of compressed and uncompressed formats using different codecs.

In Linux

FFmpeg [open source, cmd] is a complete, cross-platform solution to record, convert and stream audio and video. It includes libavcodec – the leading audio/video codec library. FFmpeg provides very powerful command line options.

4. Video I/O programming

If you need to work with frames on the fly, or in a real-time application, you will have to use an API. In particular, I discuss OpenCV and FFMPEG. Why these two? The answer is straightforward: OpenCV is an outstanding computer vision library, and FFMPEG is a mature, well-documented and stable codec library. OpenCV can call FFMPEG itself, but in some circumstances (as described above) we need to know what really happens behind the scenes. Where a good library such as FFMPEG is not used (e.g. on Windows), the HighGUI interface needs to be well understood so that we can avoid the difficulties caused by codec hell.

4.1 In Windows

There are generally two ways to achieve the goal. The first is to use the DirectShow API from the DirectX SDK, which is useful if you want to do complex tasks; there is plenty of documentation and many tutorials on using DirectShow alone or with OpenCV. The second, which I use here, takes minimal effort: let HighGUI decode the video through VFW.

On Windows, HighGUI provides an interface function, cvCaptureFromFile, to open a video file. At the lower level, it calls the VFW (Video for Windows) API to decode the video. VFW is an old video encoding/decoding API released with Windows 3.1, later integrated into DirectX 5 and then DirectShow. Consequently, VFW is outdated and supports few codecs on its own. Fortunately, some external codec packages provide VFW-compliant video codecs. The most reliable one I know is the K-Lite Codec Pack. Note that some codec packages can decode almost every codec, yet their codecs may not be available to other programs. For instance, installing KMPlayer or SMPlayer does not solve OpenCV's codec problem. Therefore, in order to read an MPEG-1 video file, download the latest version of the K-Lite Codec Pack Mega package. Remember that only the Mega package provides VFW codecs. One more note: during installation, select the VFW codecs and tick the video codecs you want OpenCV to support.

Admittedly, VFW does not cover many codec standards. The supported codecs are:

  • XviD [version 1.2.2] – an implementation of MPEG-4
  • DivX [version 6.8.5] – an implementation of MPEG-4
  • x264 [revision 1145] – an H.264 (MPEG-4 Part 10) encoder
  • On2 VP6 [version 6.4.2.0]
  • On2 VP7 [version 7.0.10.0]
  • Intel Indeo 4 [version 4.51.16.2]
  • Intel Indeo 5 [version 5.2562.15.54]
  • Intel I.263 [version 2.55.1.16]
  • huffyuv [version 2.1.1 CCE Patch 0.2.5] – free lossless codec
  • DivX [version 3.11] – an implementation of MPEG-4
  • YV12 (Helix) [version 1.2]

If you want to read an MPEG-2 video file, the quickest way is to convert it to one of the above formats. The recommended tool on Windows is SUPER, and on Linux FFMPEG, e.g. $ ffmpeg -i video.mpeg -vcodec mpeg4 video.avi (this re-encodes the video as MPEG-4 Part 2).
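When a whole dataset needs converting, it helps to generate that command for each file. A small sketch, assuming ffmpeg is on the PATH and file names contain no double quotes; make_convert_command is a hypothetical helper that only reproduces the options used in the command above (-i and -vcodec mpeg4).

```cpp
#include <string>

// Builds "ffmpeg -i "<input>" -vcodec mpeg4 "<stem>.avi"" for one input file,
// replacing the file's extension (if any) with ".avi".
std::string make_convert_command(const std::string& input) {
    std::string output = input;
    std::string::size_type dot = output.find_last_of('.');
    std::string::size_type slash = output.find_last_of("/\\");
    // Only strip an extension that belongs to the file name, not a directory.
    if (dot != std::string::npos && (slash == std::string::npos || dot > slash))
        output.erase(dot);
    output += ".avi";
    return "ffmpeg -i \"" + input + "\" -vcodec mpeg4 \"" + output + "\"";
}
```

The result can be passed to std::system() from <cstdlib>, e.g. std::system(make_convert_command("walk.mpeg").c_str()). Inputs that already end in .avi would map onto themselves, so give those a different output name.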

Finally, once the VFW codecs are installed, feel free to use cvCaptureFromFile to open a video file. However, if you want to write a video file, there is only one way that works: write it as uncompressed (raw) video, i.e. with CV_FOURCC_DEFAULT.
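The codec identifiers that VFW and cvCreateVideoWriter work with are FOURCC codes: four characters packed into a 32-bit integer, first character in the lowest byte, which is how OpenCV's CV_FOURCC macro builds them. A small stand-alone sketch (fourcc and fourcc_to_string are my own helpers) for building a code and turning one back into readable text:

```cpp
#include <cstdint>
#include <string>

// Packs four characters into a FOURCC code, first character in the lowest
// byte (same layout as OpenCV's CV_FOURCC macro).
constexpr std::uint32_t fourcc(char c1, char c2, char c3, char c4) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) |
           static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 8 |
           static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 16 |
           static_cast<std::uint32_t>(static_cast<unsigned char>(c4)) << 24;
}

// Inverse: turns a FOURCC code back into its four characters.
std::string fourcc_to_string(std::uint32_t code) {
    std::string s(4, ' ');
    for (int i = 0; i < 4; ++i)
        s[i] = static_cast<char>((code >> (8 * i)) & 0xFF);
    return s;
}
```

For example, fourcc_to_string(static_cast<std::uint32_t>(cvGetCaptureProperty(capture, CV_CAP_PROP_FOURCC))) shows which codec a file asks for, assuming the capture backend reports that property.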

4.2 In Linux

There are a couple of tutorials on programming with FFMPEG, but as FFMPEG has released new versions, some of them have become out of date. After some searching I found an up-to-date example that opens a video file and reads it frame by frame in a while-loop. I turned this simple example into a C++ class, Video (rename it as you like), whose Open() and NextFrame() methods open a video file and read the next frame. The code compiles cleanly with GCC 4.x on Ubuntu Linux against FFMPEG 0.5. Note that everything is wrapped in #ifdef HAVE_LINUX_FFMPEG, so that macro must be defined when compiling (e.g. -DHAVE_LINUX_FFMPEG). Also note that the code targets the FFMPEG 0.5 API: calls such as av_open_input_file and avcodec_decode_video were deprecated and later removed in newer FFmpeg releases.

/*
 * video.h
 *
 *  Created on: Jul 4, 2009
 *      Author: phong
 */

#ifndef VIDEO_H_
#define VIDEO_H_
#ifdef HAVE_LINUX_FFMPEG
extern "C"
{
	#include <libavcodec/avcodec.h>
	#include <libavformat/avformat.h>
	#include <libswscale/swscale.h>
}

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

/**
 * @brief Video class that decodes any codec supported
 * by FFMPEG. Dedicated to Linux OS.
 * This class is an interface between high level
 * function calls and the FFMPEG API.
 *
 * @note Not available in Windows. IsEnd() is declared
 * but not implemented in video.cpp.
 */
struct Video
{
	AVFormatContext *pFormatCtx;
	int             i, videoStream;
	AVCodecContext  *pCodecCtx;
	AVCodec         *pCodec;
	AVFrame         *pFrame;
	AVFrame         *pFrameRGB;
	AVPacket        packet;
	int             frameFinished;
	int             numBytes;
	uint8_t         *buffer;

	int width;
	int height;
	int step;

	Video ():
		pFormatCtx(0), pCodecCtx(0), pCodec(0),
		pFrame(0), pFrameRGB(0), buffer(0)
		{};

	int SaveFrame(AVFrame *pFrame, int width, int height, int iFrame);

	int Open (const char* filename);

	int NextFrame ();

	int IsEnd ();

	int Close ();
};

#endif
#endif /* VIDEO_H_ */
/*
 * video.cpp
 *
 *  Created on: Jul 5, 2009
 *      Author: phong
 */
#include "video.h"

// avcodec_sample.0.5.0.c

// A small sample program that shows how to use libavformat and libavcodec to
// read video from a file.
//
// This version is for the 0.4.9+ release of ffmpeg. This release adds the
// av_read_frame() API call, which simplifies the reading of video frames
// considerably.
//
// Use
//
// gcc -o avcodec_sample.0.5.0 avcodec_sample.0.5.0.c -lavformat -lavcodec -lavutil -lswscale -lz -lbz2
//
// to build (assuming libavformat, libavcodec, libavutil, and swscale are correctly installed on
// your system).
//
// Run using
//
// avcodec_sample.0.5.0 myvideofile.mpg
//
// to write the first five frames from "myvideofile.mpg" to disk in PPM
// format.
#ifdef HAVE_LINUX_FFMPEG
int Video::Open (const char* filename)
{

    // Register all formats and codecs
    av_register_all();

    // Open video file
    if(av_open_input_file(&pFormatCtx, filename, NULL, 0, NULL)!=0)
    {
    	fprintf (stderr, "Could not open file\n");
    	return 0;
    }
    // Retrieve stream information
    if(av_find_stream_info(pFormatCtx)<0)
    {
    	fprintf (stderr, "Could not find stream information\n");
    	return 0;
    }

    // Dump information about file onto standard error
    dump_format(pFormatCtx, 0, filename, false);

    // Find the first video stream
    videoStream=-1;
    for(i=0; i<pFormatCtx->nb_streams; i++)
        if(pFormatCtx->streams[i]->codec->codec_type==CODEC_TYPE_VIDEO)
        {
            videoStream=i;
            break;
        }
    if(videoStream==-1)
    {
    	fprintf (stderr, "Did not find a video stream\n");
    	return 0;
    }

    // Get a pointer to the codec context for the video stream
    pCodecCtx=pFormatCtx->streams[videoStream]->codec;

    // Find the decoder for the video stream
    pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
    if(pCodec==NULL)
    {
    	fprintf (stderr, "Codec not found\n");
    	return 0;
    }

    // Open codec
    if(avcodec_open(pCodecCtx, pCodec)<0)
    {
    	fprintf (stderr, "Could not open codec\n");
    	return 0;
    }

    // Hack to correct wrong frame rates that seem to be generated by some codecs
    if(pCodecCtx->time_base.num>1000 && pCodecCtx->time_base.den==1)
		pCodecCtx->time_base.den=1000;

    // Allocate the frame that receives the decoded picture (native pixel format)
    pFrame=avcodec_alloc_frame();

    // Allocate an AVFrame structure for the RGB24 copy
    pFrameRGB=avcodec_alloc_frame();
    if(pFrame==NULL || pFrameRGB==NULL)
    {
    	fprintf(stderr, "Could not allocate memory\n");
    	return 0;
    }

    // Determine required buffer size and allocate buffer
    numBytes=avpicture_get_size(PIX_FMT_RGB24, pCodecCtx->width,
        pCodecCtx->height);

    buffer = (uint8_t*)malloc(numBytes);
    if(buffer==NULL)
    {
    	fprintf(stderr, "Could not allocate memory\n");
    	return 0;
    }

    // Assign appropriate parts of buffer to image planes in pFrameRGB
    avpicture_fill((AVPicture *)pFrameRGB, buffer, PIX_FMT_RGB24,
        pCodecCtx->width, pCodecCtx->height);

    width = pCodecCtx->width;
    height = pCodecCtx->height;
    step = pFrameRGB->linesize[0];

    return 1;
}

int Video::NextFrame()
{
    // Read packets until one complete video frame has been decoded and
    // converted to RGB24 in pFrameRGB. Returns 1 on success, 0 at the end
    // of the file or on error.
    while(av_read_frame(pFormatCtx, &packet)>=0)
    {
        // Is this a packet from the video stream?
        if(packet.stream_index==videoStream)
        {
            // Decode video frame
            avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
                packet.data, packet.size);

            // Did we get a video frame?
            if(frameFinished)
            {
				// Being static, this context is shared by all Video objects
				// and never freed: fine when decoding one video at a time.
				static struct SwsContext *img_convert_ctx = NULL;

				// Convert the image from its native pixel format to RGB24
				if(img_convert_ctx == NULL) {
					int w = pCodecCtx->width;
					int h = pCodecCtx->height;

					img_convert_ctx = sws_getContext(w, h,
									pCodecCtx->pix_fmt,
									w, h, PIX_FMT_RGB24, SWS_BICUBIC,
									NULL, NULL, NULL);
					if(img_convert_ctx == NULL) {
						fprintf(stderr, "Cannot initialize the conversion context!\n");
						av_free_packet(&packet);
						return 0;
					}
				}
				int ret = sws_scale(img_convert_ctx, pFrame->data, pFrame->linesize, 0,
						  pCodecCtx->height, pFrameRGB->data, pFrameRGB->linesize);

				av_free_packet(&packet);

				if (ret > 0)
					return 1;
				else
				{
					fprintf (stderr, "Sws_Scale failed\n");
					return 0;
				}
            }
        }
        // Free every packet that did not yield a frame (incl. audio packets)
        av_free_packet(&packet);
    }

    return 0;
}

int Video::Close()
{
	if (buffer == 0 || pFrameRGB == 0 || pFrame == 0)
		return 0;

    // Free the RGB image
    free(buffer);
    av_free(pFrameRGB);

    // Free the YUV frame
    av_free(pFrame);

    // Close the codec
    avcodec_close(pCodecCtx);

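    // Close the video file
    av_close_input_file(pFormatCtx);

    // Reset pointers so that a second Close() call is a harmless no-op
    buffer = 0;
    pFrameRGB = 0;
    pFrame = 0;
    pCodecCtx = 0;
    pFormatCtx = 0;

	return 1;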
}

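// Writes an RGB24 frame (e.g. pFrameRGB) to frame<iFrame>.ppm.
// Returns 0 on success, non-zero on failure.
int Video::SaveFrame(AVFrame *pFrame, int width, int height, int iFrame)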
{
    FILE *pFile;
    char szFilename[32];
    int  y;

    // Open file
    sprintf(szFilename, "frame%d.ppm", iFrame);
    pFile=fopen(szFilename, "wb");
    if(pFile==NULL)
        return 1;

    // Write header
    fprintf(pFile, "P6\n%d %d\n255\n", width, height);

    // Write pixel data
    for(y=0; y<height; y++)
        fwrite(pFrame->data[0]+y*pFrame->linesize[0], 1, width*3, pFile);

    // Close file
    return fclose(pFile);
}
#endif

To open a video, read it frame by frame and (optionally) wrap each raw frame in an OpenCV IplImage, I use the following snippet:

/*
 * videotest.cpp
 *
 *  Created on: Jul 13, 2009
 *      Author: phong
 *
 *  Build (roughly; adjust to your installation):
 *  g++ -DHAVE_LINUX_FFMPEG -o videotest video.cpp videotest.cpp \
 *      `pkg-config --cflags --libs opencv` -lavformat -lavcodec -lavutil -lswscale
 */
#include "video.h"
#include <iostream>
#include <opencv/cxcore.h>
#include <opencv/cv.h>
#include <opencv/highgui.h>

using namespace std;

int main (int argc, char* argv[])
{
	// argv[0] is the program name, so the file name is argv[1]
	if (argc < 2)
	{
		cout << "Usage: videotest <videofilename>" << endl;
		return 1;
	}

	Video video;
	if (!video.Open(argv[1]))
	{
		cout << "Could not open video file" << endl;
		return 1;
	}

	cvNamedWindow ("videotest");

	// Header-only IplImage: it points at the RGB24 buffer owned by video
	IplImage frame;
	cvInitImageHeader (&frame, cvSize(video.width,video.height), IPL_DEPTH_8U, 3);
	IplImage* grayscale = cvCreateImage (cvGetSize(&frame), IPL_DEPTH_8U, 1);

	while (video.NextFrame())
	{
		// Attach the current RGB24 frame (with FFMPEG's row stride) to the header
		cvSetData (&frame, video.pFrameRGB->data[0], video.pFrameRGB->linesize[0]);
		cvCvtColor (&frame, grayscale, CV_RGB2GRAY);

		cvShowImage ("videotest", grayscale);
		cvWaitKey (10);
	}

	video.Close();
	cvReleaseImage (&grayscale);
	cvDestroyWindow ("videotest");

	return 0;
}
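The cvCvtColor(..., CV_RGB2GRAY) call above applies the BT.601 luma weights to every pixel; writing the loop out shows why the row stride (linesize[0]) passed to cvSetData matters, since rows may be padded beyond width * 3 bytes. A plain C++ sketch without OpenCV; rgb24_to_gray is my own helper, and its fixed-point rounding may differ from OpenCV's by one gray level.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a packed RGB24 image to 8-bit grayscale with
// Y = 0.299 R + 0.587 G + 0.114 B. `stride` is the number of bytes per source
// row (like FFMPEG's linesize[0] or IplImage::widthStep) and may exceed
// width * 3 because of padding. The output is tightly packed.
std::vector<std::uint8_t> rgb24_to_gray(const std::uint8_t* rgb, int width,
                                        int height, int stride) {
    std::vector<std::uint8_t> gray(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* row = rgb + static_cast<std::ptrdiff_t>(y) * stride;
        for (int x = 0; x < width; ++x) {
            int r = row[3 * x], g = row[3 * x + 1], b = row[3 * x + 2];
            // Weights scaled by 1000; +500 rounds to the nearest integer.
            gray[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b + 500) / 1000);
        }
    }
    return gray;
}
```

For a 2×2 image stored with 2 padding bytes per row (stride 8), pure red, green, blue and white map to 76, 150, 29 and 255.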

5. Last word

I will update this tutorial so that one can both encode and decode video formats using FFMPEG. For now, only decoding is covered. Personally, I think it is not a good idea to depend on a proxy library such as OpenCV to drive another library. This tutorial can help you avoid that. Anyway, use it if you like.