Interactive Machine Learning

December 20, 2011

Here are some short notes and links on interactive machine learning that I have read.

  • A brief intro from the blog Machine Learning Thoughts: http://hunch.net/?p=49
  • From the same blog, the article Interactive Machine Learning: http://hunch.net/?p=322. It defines what is and isn't interactive ML. Active Learning (AL) is, of course, part of Interactive ML. The most important fact about AL is that “In active learning, the interaction is choosing which examples to label, and the learning is choosing from amongst a large set of hypotheses.” The key point is “which examples to label”. There are plenty of false positives and false negatives, but we should not pick one of them at random: it turns out that the algorithm does not learn any better in that case.
  • A useful resource page about Interactive Machine Learning: http://hunch.net/~jl/projects/interactive/
  • ICML and KDD 2010 Tutorial on Learning through Exploration: http://hunch.net/~exploration_learning/
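
To make the "which examples to label" point concrete, here is a minimal Matlab sketch of one common selection heuristic, uncertainty sampling. The linked posts do not prescribe this rule; it is used here only as an illustration, and the probabilities and variable names are made up.

```matlab
% Hypothetical predicted probabilities P(y = 1 | x) for five unlabeled
% examples, as produced by some current classifier (values are made up).
p_pos = [0.95 0.52 0.10 0.70 0.35];

% Uncertainty sampling: ask for the label of the example the model is
% least sure about, i.e. the one whose probability is closest to 0.5.
[~, query_idx] = min(abs(p_pos - 0.5));

% For comparison, a random pick ignores what the model already knows.
random_idx = randi(numel(p_pos));

fprintf('Query example %d (a random pick would be %d).\n', query_idx, random_idx);
```

Other criteria (query by committee, expected error reduction, and so on) fit the same pattern: score every unlabeled example, then query the best-scoring one.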

Bug when compiling Mex-files

December 12, 2011

You may run into compiler problems when building MEX files. Assume you are on Linux with GCC/G++ 4.x and MATLAB R2009x or R2010a installed. The build may complain that some libraries are missing, e.g. libstdcXXX, even though everything is already installed. The cause is a small issue in the configuration file mexopts.sh, the shell script that describes the OS environment to MATLAB. By default, MathWorks ties each MATLAB release to a specific compiler version, and that version soon falls behind the regular updates of the compiler, e.g. GCC. The result is a version conflict: MATLAB looks for a compiler with the exact version XYZ, while your machine only has the more recent version XYZ.01.

The solution is quite simple, and you do not need to create any symbolic links. Go to your home directory with cd ~; then cd .matlab; then cd R20XXX; then open the file mexopts.sh and edit it.

In the section for your architecture (glnx86 or glnxa64), change the CC line from gcc-x.xx.x to gcc and the CXX line from g++-x.xx.x to g++. That's it! Save the file and enjoy your MEX files.

Remark: every time you call mex -setup, this file is overwritten, so you have to repeat the edit.
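
If you prefer to locate the file from inside MATLAB, the following sketch prints the current CC and CXX lines so you can see what needs changing. It assumes that mexopts.sh sits in the folder returned by prefdir, which is where mex -setup places it on the releases mentioned above; the variable names are made up.

```matlab
% prefdir returns the per-user preferences folder (e.g. ~/.matlab/R20XXX),
% which is where "mex -setup" puts its copy of mexopts.sh.
optsFile = fullfile(prefdir, 'mexopts.sh');

if exist(optsFile, 'file')
    txt = fileread(optsFile);
    % List every CC=... and CXX=... assignment (one per architecture block).
    hits = regexp(txt, '^[ \t]*(CC|CXX)=[^\n]*', 'match', 'lineanchors');
    fprintf('%s\n', hits{:});
else
    fprintf('%s not found; run "mex -setup" first.\n', optsFile);
end
```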


Compute Chisquared Kernel using “Matlab matrix style”

December 12, 2011

This is for those who want to write matrix-style code in Matlab rather than loops. The rule of thumb is simple: you get simpler code, but you pay for it with more memory. Sometimes elegant code and speed are worth that price.

The \chi^2 kernel is defined as follows:

k(x,y) =\sum_{i=1}^p\frac{(x_i-y_i)^2}{x_i+y_i}

Suppose we are given two matrices A\in\mathbb{R}^{m\times p} and B\in\mathbb{R}^{n\times p}, where p is the dimensionality of the data. Say we want to compute the “Gram matrix” between A and B (strictly speaking, it is a Gram matrix only if A\equiv B). In other words, we would like to compute the \chi^2 kernel value for every pair (a_i,b_j), i=1,\ldots,m, j=1,\ldots,n, where a_i and b_j are rows of A and B.

The Matlab code is as follows:


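[m, p] = size(A);   % A is m-by-p, one data point per row
n = size(B,1);      % B is n-by-p

% Column (j-1)*p+k of aa holds A(:,k); the same column of bb holds B(j,k).
aa = repmat(A,[1 n]);
bb = repmat(reshape(B',1,[]),[m 1]);   % B' keeps each row of B contiguous
% Split the columns into p-by-n blocks, then sum over the p features.
% Note: a term is NaN (0/0) wherever A(i,k)+B(j,k) == 0.
D = reshape(((aa-bb).^2)./(aa+bb),[m p n]);
D = reshape(sum(D,2),[m n]);   % D(i,j) = k(a_i,b_j)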
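
Index bookkeeping with repmat and reshape is easy to get wrong, so it is worth checking any vectorized version against a plain double loop on small random data. The sketch below does that for a bsxfun-based variant. The test sizes, the variable names and the bsxfun formulation are assumptions made for illustration, not part of the original recipe.

```matlab
% Small made-up test data: strictly positive so that x_i + y_i is never 0.
A = rand(4, 3) + 0.1;   % m = 4 points, p = 3 dimensions
B = rand(5, 3) + 0.1;   % n = 5 points
[m, p] = size(A);
n = size(B, 1);

% Reference: explicit loops over all pairs (slow but easy to verify).
D_loop = zeros(m, n);
for i = 1:m
    for j = 1:n
        d = A(i, :) - B(j, :);
        s = A(i, :) + B(j, :);
        D_loop(i, j) = sum((d .^ 2) ./ s);
    end
end

% Vectorized: put the feature index in the third dimension and let
% bsxfun expand A along dimension 2 and B along dimension 1.
A3 = reshape(A, [m 1 p]);
B3 = reshape(B, [1 n p]);
num = bsxfun(@minus, A3, B3) .^ 2;
den = bsxfun(@plus, A3, B3);
D_vec = sum(num ./ den, 3);

% The two results should agree up to round-off.
max_diff = max(abs(D_loop(:) - D_vec(:)));
fprintf('largest difference: %g\n', max_diff);
```

Both versions build an m-by-n-by-p intermediate array, which is the memory cost the rule of thumb above refers to.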


Constrained Quadratic Programming Solver – qpOASES

December 12, 2011

For anyone who wants to do optimization with constrained quadratic programming, let me introduce the free toolbox qpOASES. There really are not many good toolboxes for optimization. For example, if you want to solve some problem, a cracked copy of Matlab may be OK, but speeding things up is rather painful. The solution is to find a library that runs in C/C++. According to the list on the wiki, toolboxes that can be used for free are very rare.

Luckily, I came across this one,

http://www.kuleuven.be/optec/software/qpOASES

which solves problems very nicely. Compiling it is also extremely simple because it needs no third-party libraries at all. It has an interface for Matlab as well, and its options offer many ways to customize the solver. In my experience it is quite robust!

One thing to note is that it is not as tolerant as the Matlab Optimization Toolbox! It easily runs into numerical problems if your data is not carefully scaled. This is problem dependent, so there is no single general trick. However, you should not let the magnitudes of the Hessian matrix H and f differ too much:

\frac{1}{2}x^THx+f^Tx

As a rule, remember to check that H is positive semidefinite. A quick check is to use Matlab's eigs function to find the eigenvalues of H. If there is a negative eigenvalue, you are out of luck. If H is degenerate (singular/ill-conditioned), add a small bias along the main diagonal of H:
H = H + 1e-4*eye(size(H,1));
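
Here is a minimal sketch of these checks on a tiny made-up problem. It uses eig rather than eigs because the example matrix is small and dense; the tolerance, the ridge size and the variable names are assumptions that you should tune for your own data.

```matlab
% Made-up 2x2 example: H is symmetric but singular (eigenvalues 0 and 2.5).
H = [2 1; 1 0.5];
f = [-1; 0.3];

H = (H + H') / 2;            % remove round-off asymmetry before calling eig
lambda_min = min(eig(H));    % eig is fine for a small dense H

tol = 1e-10;                 % assumed tolerance for "numerically zero"
if lambda_min < -tol
    error('H has a negative eigenvalue (%g): the QP is not convex.', lambda_min);
elseif lambda_min < tol
    H = H + 1e-4 * eye(size(H, 1));   % small ridge on the main diagonal
end

% Rough check on the relative scale of H and f.
scale_ratio = norm(H, 'fro') / norm(f);
fprintf('min eig after fix: %g, ||H||/||f||: %g\n', min(eig(H)), scale_ratio);
```

If the scale ratio is many orders of magnitude away from 1, rescaling the variables or the objective before calling the solver is worth trying.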


Regularization Paths and Coordinate Descent – videolectures.net

September 20, 2011

Regularization Paths and Coordinate Descent – videolectures.net.


A Unified View of Matrix Factorization Models – videolectures.net

June 15, 2011

A Unified View of Matrix Factorization Models – videolectures.net.


On the varieties of machine learning

June 7, 2011
  1. Supervised Learning (Inductive Learning): train on large, labeled data sets drawn from the same distribution as the test data.
  2. Semi-supervised Learning (Inductive Learning): same as above, but add large unlabeled or weakly labeled data sets from the same domain.
  3. Transfer Learning (Inductive Learning): transfer learning is what happens when someone finds it much easier to learn to play chess having already learned to play checkers; or to recognize tables having already learned to recognize chairs; or to learn Spanish having already learned Italian.
  4. Transductive Learning: given unlabeled test data during training, transfer the information from the labeled examples to the unlabeled ones.
  5. Self-taught Learning:…

Machine learning formalisms for classifying images of elephants and rhinos. Images on orange background are labeled; others are unlabeled. Top to bottom: Supervised classification uses labeled examples of elephants and rhinos; semi-supervised learning uses additional unlabeled examples of elephants and rhinos; transfer learning uses additional labeled datasets; self-taught learning just requires additional unlabeled images, such as ones randomly downloaded from the Internet. [Rajat Raina ICML'07].


Optimization for Machine Learning

June 5, 2011


Scene Discovery by Matrix Factorization

June 5, 2011


Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning

May 25, 2011
