Posts Tagged ‘latex’

Biweekly Links – 06-18-2010

1. Detexify2 – LaTeX symbol classifier
A neat webpage where you draw a symbol in it and it will tell you the latex symbol. Some thing I really need !

2. Predictalot for World Cup: Millions of predictions, stock market action
Prediction Markets are a very fascinating topic. It is fun to see them applied to World Cup. With the tournament producing lot of upsets, it is fun to watch other’s predictions and risks.

3. MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets
MMDS is a fascinating workshop with lot of neat papers and tutorials. I followed the last two workshops and this one too promises to be fun. The papers are not linked correctly as of now which I hope will be corrected soon.

4. SeaMicro drops an atom bomb on the server industry
An interesting venture in server market. Let’s see if they can crack the market.

5. What Is I.B.M.’s Watson?
A neat article on IBM’s Watson which is a computer that can answer questions and is going to participate in Jeopardy. It is always fascinating to see applications that popularize AI to the masses.

6. The $600 billion challenge
Looks promising. Warren Buffet and Bill Gates have joined hands to revolutionize Charity business.

7. The Git Parable
A neat post which discusses the features in Git and its rationale.

8. Map: Where Americans Are Moving
A neat info graphic showing migration among US counties. A interesting visualization of data.

Read Full Post »

This series of posts discusses some basic ideas in data mining . For all posts in the series see here.

I got lot of useful feedback on my post on K-means. Most common  were to discuss the basic intuition in more detail and to include more figures to clarify the ideas . Thanks people for those comments !

Topic and Organization

In this post , I will discuss about Principal Component Analysis aka PCA  , a very popular dimensionality reduction technique. It has an amazing variety of applications in statistics, data mining, machine learning, image processing etc

PCA is a huge topic to cover in a single post. So I plan to write it as a trilogy – First post will talk about PCA in lower dimensions with emphasis on geometric intuition and application to compression. Second post will extend it to arbitrary dimensions and discuss about basic dimensionality reduction , and the two common algorithms for calculating PCA. Third post will discuss  Eigenfaces , which is probably one of the coolest application of PCA where we use it for face recognition.

Motivation for PCA

Before discussing PCA per se, let us discuss the reasons why PCA is one of the most important technique in data mining / machine learning. In this post I will primarily discuss with respect to compression. In the next post, we will discuss the more general idea of dimensionality reduction.

Let us assume that we have some arbitrary process that generates (x,y) points in 2 dimensional space . We do not know the behavior/internals of the process but we suspect that there is some inherent relation between x and y. There can be many reasons for trying to know the internal behavior. For eg if we know the relation between x and y, then we need to store only x and the transformation function. For eg if we had 1000 points in 2 dimensions , then we need 2000 words to store them (assuming each number takes word to store ,which is 4 bytes) . But if we knew the relation between x and y, we only need to store 1000 numbers (ie only x) which will occupy 1000 words as we can recalculate y from x using the transformation function. We get a 50% compression !

Let us take an example. Assume that our unknown process, generated some points which when plotted , we get the following graph.

The red lines are the x-y axes and the green crosses are the points generated by the process. The relation between x and y is obvious to us now. The internal logic is y=x . Now if we just store x, then we know how to get y. Some other examples of possible functions are  y=1.8x + 32 (Celsius to Fahrenheits) or y=1.6x ( miles to kilometers) . These two examples are slightly not very obvious but still we can see that x and y are related by some transformation function. PCA works best when this transformation function is linear. y=x, y=1.6x, y=1.8c+32 , y=ax+b are all linear but y=x^2 is not.

Let us discuss the y=x example in slightly more detail. In this simple example, we were able to easily identify the transformation function as the data was in 2 dimensions. In 2 dimensions, the only valid linear transformation is that of a line. But in higher dimensions it will be a (hyper)plane and hence finding the exact linear transformation might be hard to do it just by eyeballing the data. In fact, even visualizing the data in higher dimensions is a big problem. Hence we need a more algorithmic way to find the relation.

Short Detour To Linear Algebra : Vector Spaces and Basis

Basis is one of the fundamental ideas in Linear Algebra. I will give a short discussion of basis here – for more details consult some good book on Linear Algebra or some video lectures on it. For more details on learning it see my post on Linear Algebra .

Before discussing basis, let us talk about Vector Spaces. Intuitively, a vector space is a collection of vectors . It has the additional property that if you take any two vectors in it and add them, the result is also in the original collection. Similarly, if you multiply any vector by a scalar , the result is also in the original collection. So we can say that vector space is a collection of vectors that are "closed" under vector addition and scalar multiplication. A subspace is a similar idea. A subspace is subset of vectors in the original vector space but also have the behavior of a vector space. ie being closed under vector operations.

If the above paragraph sounded very abstract , it isn’t. You can consider the set of all real values vectors representing points in 2 dimensions as vector space. It is represented as \mathbb R^2 . If you take any two vectors and add them the result is still in \mathbb R^2 . Same goes for scaling.

Now that we have discussed vector spaces, let us talk about Basis. Basis is again a set of vectors. We say that B is a Basis for a Vector Space V , if all the vectors in vector space V can be expressed as a linear combination of the vectors in the basis,B  . So the vectors in the basis can express every vector in the vector space as a linear combination of themselves but no vector in a basis can be expressed as linear combination of other vectors in a basis. Wikipedia concisely states this fact – Basis is a "linearly independent spanning set."

The idea of Basis might sound confusing, but an example might clear it. Let us consider all the real valued 2-dimensional points represented by \mathbb R^2 . The simplest basis (aka standard basis ) for \mathbb R^2 that corresponds to the x-y axes are

e_1=\left[ \begin{array}{c} 1 \\ 0 \end{array} \right] , e_2=\left[ \begin{array}{c} 0 \\ 1 \end{array} \right]

We can notice several things in this basis.
a) All the vectors in \mathbb R^2 can be represented as a linear combination of e_1 and e_2 .

\left[ \begin{array}{c} a \\ b \end{array} \right] = a \times \left[ \begin{array}{c} 1 \\ 0 \end{array} \right] + b \times \left[ \begin{array}{c} 0 \\ 1 \end{array} \right]

b) The vectors e_1 and e_2 are the columns of the identity matrix. Also the two vectors are orthogonal. (ie the vectors are perpendicular to each other)
c) More interestingly, the two vectors e_1 and e_2 are also orthonormal (orthogonal vectors with unit length)
d) We can also notice that this basis corresponds to the standard x-y axes.

An important thing to note is that a vector space might have more than one basis. There are common techniques like Gram-Schmidt to generate Orthogonal sets of vectors. For eg , other valid basis for \mathbb R^2   are :

\left( \left[ \begin{array}{c} 1 \\ 1 \end{array} \right], \left[ \begin{array}{c} 1 \\ -1 \end{array} \right] \right) , \left( \left[ \begin{array}{c} 1 \\ 1 \end{array} \right], \left[ \begin{array}{c} -1 \\ 2 \end{array} \right] \right)

Change Of Basis

So why did we discuss basis so much ? The reason is that PCA can be considered as a process of finding a new basis  which is a linear combination of the original basis such that the inherent structure of points becomes more clear. Most of the time the standard basis (which is the x-y axes) makes a lousy basis. Again, let us see a picture to clarify this idea.


This image is very similar to the previous image. As before, the red lines are the x-y axes and the green crosses are the points. But what are the blue lines ? They are two axes which are perpendicular to each other but have the nice property that they bring out the internal structure more clearly. In other words, they are the new basis. Let us refer them as u_1 and u_2 .

How is this new basis related to the original basis ? We get the new basis by rotating the original basis by 45 degrees. Remember, rotation, stretching and shrinking of the constituent vectors are all valid linear transformations.

Why is this new basis better than the old one ? Because , in this new basis the original point become (1,0), (2,0) and so on. So we only need one number to represent a point instead of the two in the original basis. We can also think that we have projected points in 2-dimensions to 1-dimension and have got the benefits of compression ! But we know that we can still get all the points back if needed.

A Dose of Real World

So far we have been living in the ideal world where the sensors gave us perfect data. But in real world , the sensors usually give the data with some noise. So let us take a look at the plot where a gaussian noise  is added to the points.


So whatever we discussed above still holds but with slight modifications. Since the data was in 2 dimensions, to get a compression we need to project the points to 1 dimension. So we have to choose to project the points onto one of the axes (basis). How do we select the which axis to project on ?  Since the points have noise in them , we need to project the points on to the axis which minimizes the projection error. Let us consider both the cases :

[Image Credit : The following 3 images are obtained from Stanford Prof Andrew Ng‘s Machine Learning course : Lecture notes on PCA  ]

Case 1: Here we project on the first vector(axis) u_1. The image with the projection error is here. 

Case 2: Here we project on the first vector(axis) u_2. The image with the projection error is here.

Which projection did you prefer ? I am sure you would have preferred the first projection intuitively. Algorithmically too that is the correct answer. The projection onto vector u_1 had lesser projection error than the projection onto vector u_2 . In short, PCA selects a basis which is a linear combination of the original basis which also minimizes the projection error. How it does that , we will discuss in the next post.

Alternate View Of PCA

So far we have discussed PCA in terms of change of basis. There is an alternate way of looking at it . This way is slightly less intuitive but is very useful when you want to prove the optimality of PCA or derive formulas for PCA. The new idea is to look at the variance.

Let us take a look at the points again.

We have to select one of the axis onto which we want to project the points. Which axis should you select ? From the image , vector u_1 is in the direction of higher variance than vector u_2. Hence we need to select projection over u_1 because that projection tries to maintain more information about the original points than the projection onto u_2. Please look at the images of the actual projection onto u_1 and u_2 given above. Projection on u_2  is bad as some points in the original data might be projected onto a same point in the new axis.

An alternate reasoning is that the original points will be along the direction of highest variance and the noise will be along the direction of the lesser variance. This is because, noise by definition is usually much smaller compared to original data points. Since u_1 lies along the direction of highest variance it must lie on the best fit line . u_2 being smaller gives the direction of the noise. ( What happens if the noise is too high ? PCA will still work but with very high projection error )

There is a formal proof that shows that minimizing the projection error and maximizing the variance are equivalent. We will skip the proof here.


The important points in this post are the following :
1. The standard basis is usually a bad one to store data.
2. PCA tries to find a new basis such that the projection error is minimized.
3. Alternate way to look at PCA is that it finds a vector to project which is in the direction of the highest variance.

Read Full Post »

1. Stanford iPhone Application Programming course
I am not a big fan of iPhone but its a nice gesture by Stanford to make the videos of iPhone application programming available for all. I sampled certain lectures and they were pretty interesting.

2. Who Writes Linux

An interesting article on the Linux Kernel development , developers and development process. It also includes a breakup of developer profiles of Linux Kernel. The most surprising aspect of the report is that 75% of the developers are officially paid by some company (RedHat, IBM etc ) to develop Linux code.

3. Kasparov and Good Interaction Design
A commentary on a NYTimes article on Chess, Humans and Computers. There are lot of good nuggets of information : for eg good interaction design trumps smart algorithms , Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process, the age of youngest grandmasters keep falling even though they dont match grand masters of yore etc

4. Are The Reals Really Uncountable?

Richard J. Lipton’s  aptly named blog Godel’s Lost Letter and P=NP is my favorite theoretical computer science blog. He writes 2-3 posts every week but each of them has exceptional quality and incredibly thoughtful. In this post, he iscusses , Georg Cantor’s  very popular diagonal argument . I learnt diagonal argument during discrete mathematics course – The idea was very simple but somehow I never fully accepted or trusted it. It is nice to see that many more people share my feelings 🙂 The comments section has some nice discussion on why most students feel that way.

5. LaTeXSearch: 1M snippets in a searchable database 
 LaTeXSearch.com  is a free online service from Springer  to search a huge database of Latex snippets. I am still figuring out how best to use this site .But I guess , I will use their LaTeX Sandbox Site more to test my LaTeX snippets. I had been using a draft file in my WordPress blog to test my LaTeX expressions though this site seems to more convenient.

6. How to disrupt Wall Street
There is a huge , populist anger against Wall Street right now. From Obama to Congress to main street people, all anger is directed at Wall Street. This blog offers some interesting ways in which technological innovation can disrupt Wall Street’s profits. Good Read !!

Read Full Post »

Latex Editor in Linux – LyX

I have been talking about LaTeX in my previous post.  LaTeX is a tool to create and typeset  mathematically inclined documents and most academic papers are written in it. In this post, I want to talk about editing LaTeX documents.LyX is available for all the major operating systems.

For people who have not used LaTex, this page gives an excellent tutorial. I like this page from Stack Overflow which gives some best practices. If you (for some reason) decided to buy some LaTeX book , you can get some info in this page. I have read superficially the first two books and they are excellent.

Before I talk about LyX, there are other means of writing LaTex code. If you are a fan of Vim , then Vim LaTeX suite is excellent. (Ubuntu Package – vim-latexsuite) . If you are a gedit fan , then you can get the latex plugin for gedit. (Ubuntu package – gedit-latex-plugin) . In case you want to use LaTeX to prepare presentations then use Beamer (Ubuntu package latex-beamer) . Regarding Beamer, I kind of feel its unwieldy, but the other plugins are excellent.


I found Lyx from some LaTeX mailing list and have not looked back. It is very easy to install in Linux (Ubuntu package – lyx) . It comes with a huge list of packages and formats that you can be productive immediately. The best feature of Lyx is that its a WYSIWIG editor for LaTeX. It doesn’t sound like much, but it can be extremely useful and productive. It is always nice to see how your LaTeX expression looks like immediately. It also has a lot of mechanisms by which it avoids you from making errors which was helpful when I was a LaTeX newbie.

More details about using LyX can be found in its wiki page . LyX’s FAQ is here, even though for me its unsorted FAQ was more useful . A very useful set of tips are in the tips page.

Some random tips I find useful are :

1. Learn command sequences for common LyX operations. It will make you really really productive. For eg, to go into math mode, you press Ctrl-m. LyX allows custom command sequences, use them once you are comfortable.

2. To create numbered (ordered list) press Alt-P-E (Mnemonic is Enumerate). For unordered (bullet) it is Alt-P-B. (Mnemonic is , of course, Bullet).

3. I usually make both the Math (auto) and Math Panels (auto) toolbars always visible. (via View->Toolbars) . I also usually enable the Command Buffer toolbar as I use it extensively , partly because I am listening to MIT ‘s Linear Algebra lectures.

4. If you chose to enable Command Buffer, it can be accessed via Alt-x. I primarily use it to enter matrices. To enter a matrix , enter “math-matrix  numcols numrows” in the buffer. Also make sure that you use \left[ and \right] to denote the brackets as they scale with the matrix size. Yes, the notation is counter-intuitive.

5. Another common beginner pitfall happens when you use summation or the product symbol in LaTeX but the limits appear awkwardly. In that case you will have to use the \displaystyle markup.

For example, $ latex \sum_{i=1}^{\infty} i^2 $ gets you this

\sum_{i=1}^{\infty} i^2

But, $ latex \displaystyle \sum_{i=1}^{\infty} i^2 $ gets you this

\displaystyle \sum_{i=1}^{\infty} i^2

6. The default layout automatically displays the date also in the output. In case you want to avoid that add the tag “\date{}” at the end of the title.

There are lot of other useful tips which can be found in Lyx’s tip wiki.


Happy LyXing !!

Read Full Post »

Using LaTeX in WordPress

LaTeX ,for people, who have not used it , is an awesome and a must learn tool . Quoting from its home page , “LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents.” Most of the papers in CS, Math and Physics communities use LaTeX in one form or the other.

I have been using LaTeX for almost an year, primarily as a way to make notes when I listen to video lectures or read some technical book. I intend to write (hopefully) many articles , mathematical in nature which will need LaTeX. So I have been exploring ways to add mathematical content in WordPress blog.

Luckily, WordPress supports LaTex and some of its most popular ams packages. You can get more information here. In summary , you create latex text by putting your content within a special markup called latex .

$latex all the LaTeX code here $

For more information look here.

I guess this will be sufficient for most of my needs. There are some inconveniences using LaTeX in WordPress . For a discussion see here. For more power users, Luca has written a Python tool to convert a LaTeX file to a form ready to be copy pasted to WordPress. It can be downloaded here. I have not used it myself per se, but I intend to use it a lot in the near future. Given the fact that it is used both by Luca and Terry Tao, I am sure that it will satisfy all my needs and more.

So get your LaTeX rolling !!

Read Full Post »