
## Posts Tagged ‘computer vision’

Recently there was some excitement that a new Earth-like exoplanet was found. I was searching for explanations of how these new planets are detected, and this is an accessible one.

Impressive idea. The way the objects are detected is pretty clever.

Oh the irony 🙂

A very clever idea to capture an athlete's motion and use it for virtual training against opponents. Neat!

Haven’t had a chance to play with it. But Tom Mitchell’s involvement lends it a sense of significance.

It is pretty cool that Wolfram Alpha now accepts formulas in TeX too. I never got the hang of the Mathematica form, and this will make it easier to enter formulas.

1. Yi Ma and the Blessing of Dimensionality
I posted a link on last week’s Biweekly links dissing an MSR article. The linked article gives a better discussion of Yi Ma’s research.

2. SpaceX Illustrates Privatization Risk
One of last week’s big news items was the successful launch of SpaceX’s rocket. Surprisingly, I did not find a comprehensive discussion of their achievement. The linked article discusses some of the important themes.

3. HTML 5 is hot
A good slide deck detailing all the capabilities of HTML 5. Better viewed in one of the "modern" browsers. In a related post, intellectual honesty and html5 talks about Apple’s recent attempt at bolstering their HTML 5 credentials.

4. Google Chrome’s Developer Tools Improve Productivity
The developer tools in Chrome are becoming powerful, and this post talks about some of the important features. There is also a long video discussing the features in detail.

5. NASA JPL, robots and the AWS cloud
A creative application by NASA using multiple AWS components. Pretty impressive.

6. FASTer search from Intel and Oracle
A very interesting approach to improving database search. I skimmed through the paper and it looks interesting, even though the idea of storing the B-tree index in memory seems a bit impractical, especially for very large databases. But a very creative idea nevertheless.

7. How the brain recognizes objects
Very interesting. New cognitive (and computational) models like this give hope that we will ultimately know all about our brain.

1. How I teach machine learning
An interesting post by Hal on what to teach in the initial days of Machine Learning. I think not starting with linear regression or linear classifiers, and instead focusing on the more intuitive material, makes sense. Of course, I will also try to teach some Decision Theory before plunging into other topics.

In a related topic in computer vision – here is a post which reverse engineers face recognition software to make it fail.

2. Intel guru says 3-D Internet will arrive within five years
This is something I have been thinking about. Bringing 3D to TV is the easy part. Bringing it to the Internet is the hard part. As people who have used VRML will remember, there are a lot of hurdles to cross.

3. Using site speed in web search ranking
Site speed has been added as yet another factor in ranking. There seems to be a lot of FUD around it, even though it is just another factor. Matt Cutts’ post discusses some of the misconceptions.

4. Can We Trust Cloud Computing?
Yet another insightful article from Prof. Lipton. I think businesses can use private cloud facilities like Amazon Virtual Private Cloud, but individuals might have to wait for efficient homomorphic encryption. I really liked his LP example to explain the core issue.

5. Enter the matrix: the deep law that shapes our reality
A nice article about the power of random matrices. Very interesting! I tried to follow Terry Tao’s random matrices course notes, but after some time things went over my head 😦

6. Apple News
The weekend arguably was all about Apple and iPhone 4.0. Here is a detailed list of changes – The Full App-by-App Breakdown.

The (in)famous post from Lee Brimelow – Apple Slaps Developers In The Face. John Gruber’s insightful post (supposedly – because Jobs said so), with which of course I don’t agree. Another very interesting post – Rethinking a Gospel of the Web.

Interesting!

8. Bringing a Smarter Search to Twitter, With Fees
An interesting idea – but will it work? I don’t know.

If you liked this post, please subscribe to the RSS feed.

## Introduction To Mean Shift Algorithm

It’s been quite some time since I wrote a Data Mining post. Today, I intend to post on Mean Shift – a really cool but not very well known algorithm. The basic idea is quite simple but the results are amazing. It was invented long back, in 1975, but was not widely used until two papers applied the algorithm to Computer Vision.

I learned this algorithm in my Advanced Data Mining course, where I wrote the lecture notes on it. So here I am trying to convert my lecture notes into a post. I have tried to simplify it – but this post is more involved than the other posts.

It is quite sad that there exists no good post on such a good algorithm. While writing my lecture notes, I struggled a lot to find good resources 🙂 . The 3 “classic” papers on Mean Shift are quite hard to understand. Most of the other resources are usually from Computer Vision courses, where Mean Shift is taught lightly as yet another technique for vision tasks (like segmentation) and covered only through the main intuition and the formulas.

As a disclaimer, there might be errors in my exposition – so if you find anything wrong, please let me know and I will fix it. You can always check the references for more details. I have not included any graphics, but you can check the ppt given in the references for an animation of Mean Shift.

### Introduction

Mean Shift is a powerful and versatile non-parametric iterative algorithm that can be used for a lot of purposes like finding modes, clustering, etc. Mean Shift was introduced in Fukunaga and Hostetler [1] and has been extended to be applicable in other fields like Computer Vision. This document will provide a discussion of Mean Shift, prove its convergence, and briefly discuss its important applications.

### Intuitive Idea of Mean Shift

This section provides an intuitive idea of Mean Shift, and the later sections will expand on it. Mean Shift considers the feature space as an empirical probability density function. If the input is a set of points, then Mean Shift considers them as sampled from the underlying probability density function. If dense regions (or clusters) are present in the feature space, then they correspond to the modes (or local maxima) of the probability density function. We can also identify the cluster associated with a given mode using Mean Shift.

Mean Shift associates each data point with the nearby peak of the dataset’s probability density function. For each data point, Mean Shift defines a window around it and computes the mean of the data points within the window. Then it shifts the center of the window to that mean and repeats until convergence. After each iteration, we can consider that the window shifts to a denser region of the dataset.

At a high level, we can specify Mean Shift as follows:
1. Fix a window around each data point.
2. Compute the mean of data within the window.
3. Shift the window to the mean and repeat till convergence.
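These three steps can be sketched in a few lines of Python (a minimal illustration using a flat, rectangular window; the function name and tolerances are my own choices, not from the referenced papers):

```python
import numpy as np

def mean_shift_point(x, points, h, max_iter=100, tol=1e-6):
    """Shift a single query point x toward the nearest density peak.

    Uses a flat window: the mean of all points within distance h
    of the current window center.
    """
    center = x.astype(float)
    for _ in range(max_iter):
        # 1. Fix a window around the current center.
        in_window = points[np.linalg.norm(points - center, axis=1) <= h]
        # 2. Compute the mean of the data within the window.
        new_center = in_window.mean(axis=0)
        # 3. Shift the window to the mean; stop when it no longer moves.
        if np.linalg.norm(new_center - center) < tol:
            break
        center = new_center
    return center
```

Running this from a point near a dense group of points walks the window uphill until it settles on the local mean of that group.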

### Preliminaries

#### Kernels

A kernel is a function that satisfies the following requirements:

1. $\int_{R^{d}}\phi(x)\,dx=1$

2. $\phi(x)\geq0$

Some examples of kernels include:

1. Rectangular $\phi(x)=\begin{cases} 1 & a\leq x\leq b\\ 0 & else\end{cases}$

2. Gaussian $\phi(x)=e^{-\frac{x^{2}}{2\sigma^{2}}}$

3. Epanechnikov $\phi(x)=\begin{cases} \frac{3}{4}(1-x^{2}) & if\;|x|\leq1\\ 0 & else\end{cases}$

#### Kernel Density Estimation

Kernel density estimation is a non-parametric way to estimate the density function of a random variable. This is usually called the Parzen window technique. Given a kernel $K$ and a bandwidth parameter $h$, the kernel density estimator for a given set of $n$ d-dimensional points is

${\displaystyle \hat{f}(x)=\frac{1}{nh^{d}}\sum_{i=1}^{n}K\left(\frac{x-x_{i}}{h}\right)}$
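As an illustration, the estimator above can be coded directly. This sketch uses a Gaussian kernel; the function name and interface are hypothetical:

```python
import numpy as np

def kde(x, samples, h):
    """Kernel density estimate f_hat(x) with a Gaussian kernel.

    x       : query point, shape (d,)
    samples : data points x_i, shape (n, d)
    h       : bandwidth
    """
    n, d = samples.shape
    u = (x - samples) / h                      # (x - x_i) / h for every i
    k = np.exp(-0.5 * np.sum(u ** 2, axis=1))  # unnormalized Gaussian kernel
    k /= (2 * np.pi) ** (d / 2)                # normalize to integrate to 1
    return k.sum() / (n * h ** d)
```

For a single 1-d sample at the origin with h = 1, this reduces to the standard normal density at x, which is a quick sanity check.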

### Gradient Ascent Nature of Mean Shift

Mean Shift can be considered as being based on gradient ascent on the density contour. The generic formula for gradient ascent is

$x_{1}=x_{0}+\eta f'(x_{0})$

Applying it to kernel density estimator,

${\displaystyle \hat{f}(x)=\frac{1}{nh^{d}}\sum_{i=1}^{n}K\left(\frac{x-x_{i}}{h}\right)}$

$\bigtriangledown{\displaystyle \hat{f}(x)=\frac{1}{nh^{d}}\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)}$

Setting it to 0, we get

${\displaystyle \sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)\overrightarrow{x}=\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)\overrightarrow{x_{i}}}$

Finally, we get

${\displaystyle \overrightarrow{x}=\frac{\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)\overrightarrow{x_{i}}}{\sum_{i=1}^{n}K'\left(\frac{x-x_{i}}{h}\right)}}$

### Mean Shift

As explained above, Mean Shift treats the points in the feature space as samples from a probability density function. Dense regions in feature space correspond to local maxima, or modes. So for each data point, we perform gradient ascent on the locally estimated density until convergence. The stationary points obtained via gradient ascent represent the modes of the density function. All points associated with the same stationary point belong to the same cluster.

Assuming $g(x)=-K'(x)$, we have

${\displaystyle m(x)=\frac{\sum_{i=1}^{n}g\left(\frac{x-x_{i}}{h}\right)x_{i}}{\sum_{i=1}^{n}g\left(\frac{x-x_{i}}{h}\right)}-x}$

The quantity $m(x)$ is called the mean shift vector. So the mean shift procedure can be summarized as: for each point $x_{i}$,

1. Compute mean shift vector $m(x_{i}^{t})$

2. Move the density estimation window by $m(x_{i}^{t})$

3. Repeat till convergence

Using a Gaussian kernel as an example,

1. $y_{i}^{0}=x_{i}$
2. ${\displaystyle y_{i}^{t+1}=\frac{\sum_{j=1}^{n}x_{j}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}{\sum_{j=1}^{n}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}}$
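This Gaussian-kernel update translates directly to code (again a sketch; the stopping tolerance and iteration cap are arbitrary choices of mine):

```python
import numpy as np

def gaussian_mean_shift(x_i, X, h, max_iter=200, tol=1e-8):
    """Iterate the Gaussian-kernel mean shift update for one point:

    y^{t+1} = sum_j x_j * w_j / sum_j w_j,
    where w_j = exp(-|y^t - x_j|^2 / h^2).
    """
    y = x_i.astype(float)
    for _ in range(max_iter):
        w = np.exp(-np.sum((y - X) ** 2, axis=1) / h ** 2)  # kernel weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()      # weighted mean
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y
```

Points far from the current position get exponentially small weights, so each update is effectively a weighted mean over the local neighborhood.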

### Proof Of Convergence

Using the kernel profile,

${\displaystyle y^{t+1}=\frac{\sum_{i=1}^{n}x_{i}k(||\frac{y^{t}-x_{i}}{h}||^{2})}{\sum_{i=1}^{n}k(||\frac{y^{t}-x_{i}}{h}||^{2})}}$

To prove convergence, we have to prove that $f(y^{t+1})\geq f(y^{t})$:

$f(y^{t+1})-f(y^{t})={\displaystyle \sum_{i=1}^{n}}k(||\frac{y^{t+1}-x_{i}}{h}||^{2})-{\displaystyle \sum_{i=1}^{n}}k(||\frac{y^{t}-x_{i}}{h}||^{2})$

But since the kernel profile is a convex function, we have

$k(y^{t+1})-k(y^{t})\geq k'(y^{t})(y^{t+1}-y^{t})$

Using this,

$f(y^{t+1})-f(y^{t})\geq{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(||\frac{y^{t+1}-x_{i}}{h}||^{2}-||\frac{y^{t}-x_{i}}{h}||^{2})$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-2y^{t+1}x_{i}+x_{i}^{2}-(y^{t^{2}}-2y^{t}x_{i}+x_{i}^{2}))$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2(y^{t+1}-y^{t})^{T}x_{i})$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2(y^{t+1}-y^{t})^{T}y^{t+1})$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2(y^{(t+1)^{2}}-y^{t}y^{t+1}))$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(y^{(t+1)^{2}}-y^{t^{2}}-2y^{(t+1)^{2}}+2y^{t}y^{t+1})$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(-y^{(t+1)^{2}}-y^{t^{2}}+2y^{t}y^{t+1})$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(-1)(y^{(t+1)^{2}}+y^{t^{2}}-2y^{t}y^{t+1})$

$=\frac{1}{h^{2}}{\displaystyle \sum_{i=1}^{n}-}k'(||\frac{y^{t}-x_{i}}{h}||^{2})(||y^{t+1}-y^{t}||^{2})$

$\geq0$

Thus we have proven that the sequence $\{f(j)\}_{j=1,2,\ldots}$ is convergent. The second part of the proof in [2], which tries to prove that the sequence $\{y_{j}\}_{j=1,2,\ldots}$ is convergent, is wrong.

### Improvements to Classic Mean Shift Algorithm

The classic mean shift algorithm is time intensive, with time complexity $O(Tn^{2})$, where $T$ is the number of iterations and $n$ is the number of data points in the data set. Many improvements have been made to the mean shift algorithm to make it converge faster.

One of them is adaptive Mean Shift, where the bandwidth parameter is allowed to vary for each data point. Here, the $h$ parameter is calculated using the kNN algorithm. If $x_{i,k}$ is the k-nearest neighbor of $x_{i}$, then the bandwidth is calculated as

$h_{i}=||x_{i}-x_{i,k}||$

Here we use the $L_{1}$ or $L_{2}$ norm to find the bandwidth.
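For illustration, the per-point bandwidths can be computed with a brute-force kNN pass using the $L_{2}$ norm (a sketch; a real implementation would use a KD-tree, and the function name is hypothetical):

```python
import numpy as np

def knn_bandwidths(X, k):
    """h_i = distance from x_i to its k-th nearest neighbor (L2 norm)."""
    # pairwise distance matrix (brute force, O(n^2) space and time)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # sort each row; column 0 is the distance of a point to itself (0),
    # so the k-th nearest neighbor sits at column k
    return np.sort(D, axis=1)[:, k]
```

Each point in a dense region gets a small bandwidth, while isolated points get a large one, which is exactly the adaptive behavior described above.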

An alternate way to speed up convergence is to alter the data points during the course of Mean Shift. Again using a Gaussian kernel as an example,

1. $y_{i}^{0}=x_{i}$
2. ${\displaystyle y_{i}^{t+1}=\frac{\sum_{j=1}^{n}x_{j}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}{\sum_{j=1}^{n}e^{\frac{-|y_{i}^{t}-x_{j}|^{2}}{h^{2}}}}}$
3. $x_{i}=y_{i}^{t+1}$
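This "blurring" variant can be sketched by updating all the points at once and overwriting the data set after every sweep (the sweep count here is my own arbitrary choice):

```python
import numpy as np

def blurring_mean_shift(X, h, n_sweeps=10):
    """Blurring mean shift: after each sweep, the data set is replaced
    with the shifted points, which sharpens the density and speeds up
    convergence.
    """
    X = X.astype(float).copy()
    for _ in range(n_sweeps):
        # Gaussian-kernel mean shift update for every point at once
        sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
        W = np.exp(-sq / h ** 2)                    # (n, n) kernel weights
        X = W @ X / W.sum(axis=1, keepdims=True)    # shifted points
    return X
```

After a few sweeps, points belonging to the same cluster collapse onto (nearly) the same location.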

### Other Issues

1. Even though mean shift is a non-parametric algorithm, it does require the bandwidth parameter h to be tuned. We can use kNN to find the bandwidth. The choice of bandwidth influences the convergence rate and the number of clusters.
2. The choice of the bandwidth parameter h is critical. A large h might result in incorrect clustering and might merge distinct clusters. A very small h might result in too many clusters.
3. When using kNN to determine h, the choice of k influences the value of h. For good results, k has to increase as the dimension of the data increases.
4. Mean shift might not work well in higher dimensions. In higher dimensions, the number of local maxima is pretty high and it might converge to a local optimum soon.
5. The Epanechnikov kernel has a clear cutoff and is optimal in the bias-variance tradeoff.

### Applications of Mean Shift

Mean shift is a versatile algorithm that has found a lot of practical applications – especially in the computer vision field. In computer vision, the dimensionality is usually low (e.g. the color profile of the image). Hence mean shift is used to perform a lot of common tasks in vision.

#### Clustering

The most important application is using Mean Shift for clustering. The fact that Mean Shift does not make assumptions about the number of clusters or the shape of the cluster makes it ideal for handling clusters of arbitrary shape and number.

Although Mean Shift is primarily a mode-finding algorithm, we can find clusters using it: the stationary points obtained via gradient ascent represent the modes of the density function, and all points associated with the same stationary point belong to the same cluster.

An alternate way is to use the concept of Basin of Attraction. Informally, the set of points that converge to the same mode forms the basin of attraction for that mode. All the points in the same basin of attraction are associated with the same cluster. The number of clusters is obtained by the number of modes.
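For illustration, once the procedure has been run from every point, the basin-of-attraction grouping amounts to merging converged positions that lie within a small radius of each other (the merge radius is an arbitrary choice, and the function name is hypothetical):

```python
import numpy as np

def cluster_by_modes(converged, merge_radius=1e-3):
    """Assign a cluster label to each converged point: points whose
    final positions coincide (up to merge_radius) share a mode, hence
    a basin of attraction, hence a cluster.
    """
    modes, labels = [], []
    for y in converged:
        for j, m in enumerate(modes):
            if np.linalg.norm(y - m) < merge_radius:
                labels.append(j)
                break
        else:
            modes.append(y)                 # a previously unseen mode
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)
```

The number of distinct modes found this way is the number of clusters, matching the statement above.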

#### Computer Vision Applications

Mean Shift is used in multiple Computer Vision tasks like segmentation, tracking, discontinuity preserving smoothing, etc. For more details see [2], [8].

### Comparison with K-Means

Note: I have discussed K-Means at K-Means Clustering Algorithm. You can use it to brush up if you want.

K-Means is one of the most popular clustering algorithms. It is simple, fast, and efficient. We can compare Mean Shift with K-Means on a number of criteria.

One of the most important differences is that K-means makes two broad assumptions: the number of clusters is already known, and the clusters are shaped spherically (or elliptically). Mean shift, being a non-parametric algorithm, does not assume anything about the number of clusters; the number of modes gives the number of clusters. Also, since it is based on density estimation, it can handle arbitrarily shaped clusters.

K-means is very sensitive to initialization. A wrong initialization can delay convergence or sometimes even result in wrong clusters. Mean shift is fairly robust to initialization. Typically, mean shift is run for each point, or sometimes points are selected uniformly from the feature space [2]. Similarly, K-means is sensitive to outliers, but Mean Shift is not very sensitive.

K-means is fast, with a time complexity of $O(knT)$, where k is the number of clusters, n is the number of points, and T is the number of iterations. Classic mean shift is computationally expensive, with a time complexity of $O(Tn^{2})$.

Mean shift is sensitive to the selection of the bandwidth $h$. A small $h$ can slow down convergence. A large $h$ can speed up convergence but might merge two modes. Still, there are many techniques to determine $h$ reasonably well.

Update [30 Apr 2010]: I did not expect this reasonably technical post to become very popular, yet it did! Some of the people who read it asked for sample source code. I wrote one in Matlab which randomly generates some points according to several Gaussian distributions and then clusters them using Mean Shift. It implements both the basic algorithm and the adaptive algorithm. You can download my Mean Shift code here. Comments are, as always, welcome!

### References

1. Fukunaga and Hostetler, "The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition", IEEE Transactions on Information Theory, vol 21, pp 32-40, 1975.
2. Dorin Comaniciu and Peter Meer, "Mean Shift: A Robust Approach Toward Feature Space Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 24, no 5, May 2002.
3. Yizong Cheng, "Mean Shift, Mode Seeking, and Clustering", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 17, no 8, Aug 1995.
4. Mean Shift Clustering by Konstantinos G. Derpanis
5. Chris Ding Lectures CSE 6339 Spring 2010.
6. Dijun Luo’s presentation slides.
7. cs.nyu.edu/~fergus/teaching/vision/12_segmentation.ppt

8. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, "Kernel-Based Object Tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol 25, no 5, May 2003.
9. Dorin Comaniciu, Visvanathan Ramesh and Peter Meer, "The Variable Bandwidth Mean Shift and Data-Driven Scale Selection", ICCV 2001.

1. Simpler "Hello World" Demonstrated In C
Two nice links from Slashdot. One compiles a hello world without libc – Hello from a libc-free world! – and the other is an old one – A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. I hope to try out all the commands in these posts soon.

2. Fun Stuff
Time Machine by Tom Toles and Memorizing Pi in Spiked Math. The Tom Toles one was really good, even by Toles’ standard. As usual, the punch line was at the bottom of the cartoon in tiny print 🙂

3. Fingertip Bacteria: A Promising Forensic Tool
A pretty exciting post where they use the genomic profile of the bacteria on our hands (fingertips, actually) to uniquely identify us. A microbial signature, if you will! Looks cool to me. It is interesting for a lot of reasons, which are mentioned in the article – you can scrub off your fingerprints at a crime scene but not your "microbial" fingerprint. Also, using this microdata, the researchers can figure out what a person eats and where they live and work! Real, real cool stuff if it comes to fruition. (Privacy is definitely dead 😉 )

4. Software: Running Commentary for Smarter Surveillance?
Personally, I am not a big fan of Computer Vision. But this project looks promising. They try to convert surveillance data into a running commentary, and they use this text to analyze unusual events. I am not sure how good this system will ever be, but I really like the approach of focusing on the higher level analysis instead of getting lost in the minutiae of tracking, segmentation, etc.

5. How Privacy Vanishes Online
One more scary article on privacy. But the only new information is the involvement of the FTC in Netflix 2 and Google Buzz.

6. G.M. Tinkers With Augmented Reality System for Cars
Nice article on the futuristic augmented reality system for GM cars. Augmented reality is a big area nowadays, with Bing using it in Bing Maps. Using it in cars is even cooler. I checked YouTube for other demos but was not able to find any. Share if you find one!

7. IE Test Drive
The last big news was the platform preview of IE 9. If you are interested, check the link.

1. Building a Better Teacher
A good article from the NYT about a quest to make high school teachers better and more effective. The article had some interesting points. For example, it implied that improving existing teachers is better than hiring better teachers (partly because there are 1 million+ teachers in the US alone and finding so many good ones is hard). One of its ideas was that, to be effective, a teacher has to grab the students’ attention. Overall an interesting read, although it made me wonder whether any similar studies have been done on university level education (especially graduate level). Andrew Gelman’s thoughts about the article are here.

2. Top five gadgets from Microsoft’s research labs
I saw many interesting prototypes from TechFest. My favorite was Microsoft’s Translating Phone. Most of them were really cool.

Google had some nice posts this week and my favorite was Hopping on a Face Manifold via People Hopper. The application is very interesting. The problem is a really hard one (even if you forget that they are handling millions of photos) and they seem to have done a decent job! I noted that all people in Orkut are automatically opted in. Yikes... looks like another Buzz-like privacy controversy is in the making.

A polyglot Google Chrome beta, with new privacy features – a new feature in Chrome automatically translates pages in foreign languages. Cool!

Some cool applications for Android smartphones – Android app can recognize a person, pull up his status updates and Microsoft’s Tag Links the Physical World To the Internet. The concept of Microsoft Tag looks pretty interesting!

Talking about smartphones, the NYT had a nice graph of who sued whom over mobile patents – see An Explosion of Mobile Patent Lawsuits.

4. How To Improve Chat Roulette
A nice article on improving chat roulette.

5. At Last — The Full Story Of How Facebook Was Founded
A controversial and conspiracy-theory-ish article on how Facebook was founded. Not sure how much to trust it or what to make of it.

6. Spring 2010 jQuery Talks
jQuery has recently become one of my favorite libraries. I find using it pretty delightful! This page has some slides that give a nice overview of jQuery.

7. Don’t Look Back
An interesting commentary on a book by the Taliban’s Abdul Salam Zaeef. It had some interesting aphorisms. Another interesting but disturbing article from Joseph – The Karachi project.