
Posts Tagged ‘wordpress’

1. Google News
Google was in the news quite frequently this week. The most buzzworthy item was the integration of Google Voice with Gmail. My favorite news, though, was Google Acquires Angstro. This company had been doing some neat work and had some good code on GitHub. Hopefully, these tools will be integrated into Google services. Another interesting item was Google’s instant search. Most likely they are doing some A/B testing, but I think it is a cool idea. I also found a presentation from a Googler called The Real Life Social Network v2, which was very insightful (but looong!).

2. Facebook Looks to Develop More Social Startups
An interesting partnership, although I do not know how much Y Combinator benefits from it!

3. Want Instant Delivery of New Blog Posts & Comments?
This is a post from WordPress about cleverly using Jabber to get real-time notifications about posts and comments on the blogs you are interested in! The idea is pretty geeky, and I am not sure how useful it is in practice.

4. A Search Service that Can Peer into the Future
This is probably the most innovative application built on NYTimes articles. The interface is slick and it presents information in a very interesting manner. I guess the articles were heavily tagged with microformats; I cannot see how this tool could be built otherwise. The coolest idea is that it can peer into the future (sort of). It has become one of my favorite pastimes to look up the references in NYTimes for 2010 and 2011 (and beyond). Check this tool out!

5. Running On Empty
This post discusses issues faced by documentation translators and the stigma attached to working on Ubuntu-related projects (that is an exaggeration, but kind of true).

6. Bad News, Good News
This post makes a very valid point that I have noticed many times. Newspapers mostly cover bad news and usually ignore good news; most of the time, the only positive news comes from sports. What effect does this have on us? I remember Dr. Abdul Kalam made a similar observation some years back. Maybe it is time for introspection.



Analyzing my blog traffic is one of my favorite pastimes. Seeing traffic surge and strangers using my posts gives me gratification. Initially, my analysis was fairly low tech: checking the WordPress stats page periodically. After doing it a few times, I realized I could do it more methodically and use the statistics for some rudimentary data mining.

As you know, I have a WordPress.com blog. WordPress exposes the statistics in two ways: as a Flash chart in the dashboard and as an API service.

Viewing Blog Stats – Low Tech Way

If you want to take a look at your blog’s stats today, you can do it in the dashboard. Assuming you are logged into WordPress, go to your blog. You will see the WordPress toolbar at the top of the page. Click "My Dashboard". It will show a chart of total daily visits for the past 15 days. If you want more details, hover your mouse over "My Account" and select "Stats".

This page shows a wealth of information about your blog, but most of its panels cover only 2 days: today and yesterday. There are at least 6 important statistics on that page.

WordPress Stats Sections

a) Views Per Day
This is probably the statistic you are most interested in. It plots the total number of visits each day for the last 30 days. It also has multiple tabs that let you aggregate the statistics; for example, you can view the total weekly and monthly visits. This gives a basic view of how your blog is faring.

b) Referrers
Referrers are the web pages from which visitors reached your blog. There are different types of referrers: pingbacks, related posts, explicit links, etc. If a certain website is driving a lot of visitors to your blog, it is time to take notice! The Referrers panel gives you that insight. It shows the top 10 referrers; click the "Referrers" link to get the full listing.

c) Top Posts and Pages
This shows the top 10 posts for the given day. One interesting detail is that the home page is tracked separately. Also, if you have separate pages (e.g. About), visits to those pages are tracked too. Again, if you want to see the number of visits to each page, click the title link.

Each of the posts/pages in this section also has an additional chart icon next to it. Clicking it gives the stats for that particular post alone. The new page shows a chart of how many visits that page received in the last few weeks. Most interestingly, it also shows the average visits per day for the last few weeks and months. This gives a glimpse into the lasting popularity of your post.

d) Search Engine Terms
I am not entirely clear on what this means. The description says "These are terms people used to find your blog." I am not sure whether this corresponds to terms typed into search engines or into WordPress search boxes. Anyway, this panel shows the search terms that bring visits to your blog.

e) Clicks
The Clicks panel lists the links in your blog that your visitors clicked. In a way, you can consider it the inverse of referrers: here, you act as a referrer for some other site. This panel gives some hints about the type of visitors to your blog.

f) Aggregate Statistics
There is also another panel that shows some aggregate stats: the total number of views to your blog so far, the number of blogs and posts, email subscribers, etc.

A Better Way

Using the WordPress Stats page to get your data is fairly low tech. It gives you a rough feel for how things are going, but not any deeper insight. For real data mining, you need data, lots of data. Fortunately, WordPress provides a stats API that you can query regularly. In the rest of the post, we will talk about the API and how to use the data you get out of it.

Using WordPress Stats API

The primary URL that provides the stats is http://stats.wordpress.com/csv.php. You can open it to see the required parameters. There are 4 important parameters to this API.

a) api_key: api_key is a mandatory parameter that ensures only the blog's owner can query its stats. There are three ways to get it. The key is emailed to you when you create your blog. Or you can find it at My Dashboard -> Users -> Personal Settings, which shows your API key. For the truly lazy, click this url.

b) blog_id: This is a number that uniquely identifies your blog. Either blog_id or blog_uri is mandatory; I generally prefer blog_id. Finding blog_id is a bit tricky. Go to the blog stats page (My Account -> Stats). Click the title link of "Top Posts and Pages". This opens a new page that shows the statistics for the last 7 days. The page’s URL has a parameter called blog; its value is the blog_id. At least for my blog, it is an 8-digit number.

c) blog_uri: If you do not want to go to the trouble of finding blog_id, use blog_uri. This is simply the URL of your blog (http://blah.wordpress.com).

d) table: This field identifies the exact statistic you want: one of views, postviews, referrers, searchterms, clicks. Each of these corresponds to a section of the WordPress stats page discussed above. If table is not specified, views is the default.

You can get more details from the stats API page given above.
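To make the parameter list concrete, here is a minimal sketch that only builds the query URL from the four parameters described above. It uses nothing beyond the standard library's urlencode; the function name build_stats_url is hypothetical, and whether the csv.php endpoint still answers today is an assumption this sketch cannot vouch for.

```python
from urllib.parse import urlencode

STATS_ENDPOINT = "http://stats.wordpress.com/csv.php"
TABLES = ("views", "postviews", "referrers", "searchterms", "clicks")


def build_stats_url(api_key, table="views", blog_id=None, blog_uri=None):
    """Build a stats query URL (hypothetical helper, not part of any WordPress library).

    api_key is mandatory; either blog_id or blog_uri must be given.
    If both are given, blog_id wins, matching the preference stated above.
    """
    if not api_key:
        raise ValueError("api_key is mandatory")
    if blog_id is None and blog_uri is None:
        raise ValueError("either blog_id or blog_uri is required")
    if table not in TABLES:
        raise ValueError("unknown table: %s" % table)
    params = {"api_key": api_key, "table": table}
    if blog_id is not None:
        params["blog_id"] = blog_id
    else:
        params["blog_uri"] = blog_uri
    # urlencode percent-encodes values, e.g. the ':' and '/' inside blog_uri.
    return STATS_ENDPOINT + "?" + urlencode(params)
```

For example, build_stats_url("abc123", table="referrers", blog_id=12345678) yields a URL you could pass to urllib.request.urlopen and decode as CSV text.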

Sample Python Scripts to fetch WordPress Stats

I have written a few scripts that fetch each of the WordPress stats. One of them runs every hour and gets the total number of views so far for the whole blog. The other scripts run once a day and fetch the clicks, search terms, referrers and top posts for that day. All of them store the data as CSV files, which lend themselves to analysis.

If you are interested in the scripts, the links to them are:

1. getBlogDaysStats.py  : Fetches the total views for the whole blog at the current time. For best results run every hour.
2. getBlogReferrers.py : Fetches all the referrers to your blog.
3. getBlogPostViews.py : Fetches the number of views for individual blog posts and pages.
4. getBlogSearchTerms.py : Fetches all the search terms used to find your blog today.
5. getBlogClicks.py : Fetches the urls that people who visited your blog clicked.
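The scripts are only linked, not shown, so the following is a hypothetical sketch of the storage step such a script might perform. It makes no assumption about the column layout of the API's CSV response: every returned row is copied into a local CSV file with a fetch timestamp in front, which is what lets hourly snapshots be compared later. The name append_snapshot is illustrative and not taken from the original scripts.

```python
import csv
import io
from datetime import datetime


def append_snapshot(csv_text, out_file, fetched_at=None):
    """Copy each row of a stats CSV response into out_file, prefixed by a timestamp.

    csv_text: the decoded body returned by the stats API.
    out_file: a text file object opened for appending (use newline="").
    Returns the number of rows written (a header row, if present, is copied too).
    """
    if fetched_at is None:
        fetched_at = datetime.now()
    stamp = fetched_at.strftime("%Y-%m-%d %H:%M")
    writer = csv.writer(out_file)
    count = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines in the response
            continue
        writer.writerow([stamp] + row)
        count += 1
    return count
```

In a real script you would obtain csv_text with urllib.request.urlopen(url).read().decode("utf-8") and pass a file opened with open("stats.csv", "a", newline=""), as the csv module documentation recommends.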

How to Collect WordPress Statistics

The first step is of course to collect data periodically. I use cron to run the scripts. My crontab file looks like this :

11 * * * * /usr/bin/python scriptpath/getBlogDaysStats.py
12 0 * * * /usr/bin/python scriptpath/getBlogClicks.py
14 0 * * * /usr/bin/python scriptpath/getBlogPostViews.py
15 0 * * * /usr/bin/python scriptpath/getBlogReferrers.py
16 0 * * * /usr/bin/python scriptpath/getBlogSearchTerms.py

Basically, I run getBlogDaysStats.py every hour and the other scripts once a day. The daily scripts run just after midnight so that they capture the previous day’s data.

How to Use WordPress Statistics

If you run the scripts for a few days, you will have a lot of data. The analysis you can do is limited only by your creativity. In this section, I will describe some of the ways I use the stats instead of giving an explicit how-to.

1. Views per day: These are collected by getBlogDaysStats.py. The most basic thing to do is to chart them. This gives a glimpse of your trend: if it is steady or climbing, that is good news; if it is falling, it is something to worry about. I should also mention that plateaus in the chart are common. For example, in my blog the chart follows a pattern: traffic increases for quite some time, then stays at the same level for a long time, and then increases again. Also, worrying about an individual day’s statistics is not a good idea. Aggregate them into weekly and monthly values instead, as they give a less noisy view of your blog traffic.
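Aggregating daily counts into weekly or monthly totals takes only a few lines. This is a minimal sketch, assuming the daily views have already been parsed into (date, views) pairs; weeks are ISO weeks (Monday to Sunday), which is one reasonable choice among several.

```python
from datetime import date
from typing import Iterable, Tuple


def aggregate_views(daily_views: Iterable[Tuple[date, int]], period="week"):
    """Sum (date, views) pairs into weekly or monthly totals.

    period="week" groups by ISO (year, week number); period="month" by (year, month).
    Returns a dict whose keys are inserted in chronological order.
    """
    totals = {}
    for day, views in sorted(daily_views):
        if period == "week":
            iso_year, iso_week, _ = day.isocalendar()
            key = (iso_year, iso_week)
        elif period == "month":
            key = (day.year, day.month)
        else:
            raise ValueError("period must be 'week' or 'month'")
        totals[key] = totals.get(key, 0) + views
    return totals
```

Charting these totals instead of the raw daily values smooths out day-of-week effects.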

Another common analysis is per-hour traffic, which can be derived easily from the output of the script. If m is the number of views at time a and n is the number of views at time b, then you received n-m views in b-a hours. I usually calculate it for every hour. This gives a *basic* idea of the peak time for your blog. You can also infer your primary audience, although the interpretation is ambiguous. For example, I get most of my traffic at night, especially between 1 AM and 9 AM. Morning traffic is pretty weak and it picks up again in the evening. Interpreting this is hard as my blog covers a lot of topics, but if your blog is more focused you can learn a lot about your visitors.
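The n-m over b-a calculation above can be written directly. This sketch assumes each hourly run produced a (timestamp, cumulative views) pair; because a counter such as "views today" may reset at midnight, intervals where the count goes down are skipped rather than reported as negative traffic.

```python
from datetime import datetime
from typing import List, Tuple


def views_per_hour(snapshots: List[Tuple[datetime, int]]):
    """Convert cumulative view counts into views-per-hour for each interval.

    For consecutive snapshots (a, m) and (b, n) this returns (a, b, (n - m) / hours),
    where hours is the length of b - a. Intervals with a non-positive duration or
    a decreasing count (e.g. a daily counter resetting) are skipped.
    """
    ordered = sorted(snapshots)
    rates = []
    for (a, m), (b, n) in zip(ordered, ordered[1:]):
        hours = (b - a).total_seconds() / 3600.0
        if hours <= 0 or n < m:
            continue
        rates.append((a, b, (n - m) / hours))
    return rates
```

Grouping the resulting rates by the hour of a then shows your peak times.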

2. Referrers: This is a very useful statistic if you do some marketing for your blog. For best results, use just the domain of each URL instead of the whole URL for analysis. With it, you can figure out which sites drive traffic to your blog. If it is another blog, it is a good idea to cultivate a friendship with that blog’s owner. For example, I found that Digg drives more traffic to my blog than Reddit. Facebook also drives some traffic, so I use WordPress’s Facebook Publicize feature. I also get some traffic from WordPress’s related posts feature, which means I must make good use of categories and tags. Your mileage may vary, but I hope the basic idea is clear.
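Reducing referrer URLs to their domains, as suggested above, can be sketched with urllib.parse.urlparse. The sketch assumes the referrers have been parsed into (url, views) pairs with full URLs including the scheme, and it treats "www.digg.com" and "digg.com" as the same source, which is a design choice you may not want for every site.

```python
from collections import Counter
from urllib.parse import urlparse


def views_by_domain(referrer_rows):
    """Collapse (referrer_url, views) rows into per-domain totals.

    A leading 'www.' is dropped so that www.digg.com and digg.com count together.
    Returns a Counter; .most_common() lists the biggest traffic sources first.
    """
    totals = Counter()
    for url, views in referrer_rows:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        totals[host] += views
    return totals
```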

3. Individual Post Views: This is probably the most useful set of statistics. It allows you to analyze the traffic of individual posts over a period of time. I keep a file that associates each blog post with extra information, for example its categories, tags, original creation date and all modification dates. (If you are curious, I store this information in JSON format.) Once you have this information, a lot of analysis is possible.

a. You can figure out your audience type. If a post gets a large audience in its first week and almost none from then on, its audience is most likely driven by subscriptions. If it keeps getting regular traffic, it probably has some useful content and search engines constantly drive traffic to it. For example, my Biweekly links posts belong to the first scenario: when I publish one, a lot of people visit it, and after a few days it gets practically no visits. In the other case, my post on Mean Shift gets a steady stream of views every week. If you want to sustain good viewership, you may want to write more posts that attract long-term views.

b. If you use categories and tags wisely, you can tally the number of views per category. This gives you an idea of which kinds of posts users prefer. I noticed that my audience seems to like my Linux / Data Mining posts more than other categories, so it is a good idea to write more of those.

c. You can see a kind of Pareto effect in your blog posts. For example, my top 10 posts account for at least 70% of my blog traffic. So if I could identify such posts in advance, I could write fewer posts and still maintain my blog traffic 😉

You can do a lot more than these simple analyses, but this is a start.
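The three analyses above can each be sketched in a few lines. The JSON layout used here ({"post-slug": {"categories": [...]}, ...}) is a hypothetical stand-in for the metadata file described above, and the first-week cutoff of 7 days is likewise an assumption; a share near 1.0 suggests subscription-driven traffic, a low share suggests a steady, search-driven stream.

```python
import json
from collections import Counter
from datetime import date, timedelta


def first_week_share(daily_views, published: date, days=7):
    """Fraction of a post's total views that arrived in its first `days` days.

    daily_views: (date, views) pairs for one post; published: its creation date.
    """
    total = sum(v for _, v in daily_views)
    if total == 0:
        return 0.0
    cutoff = published + timedelta(days=days)
    early = sum(v for d, v in daily_views if d < cutoff)
    return early / total


def views_per_category(post_views, metadata_json):
    """Tally views per category using a JSON metadata file (hypothetical layout).

    post_views: {slug: views}. A post in several categories counts toward each one;
    posts missing from the metadata are ignored.
    """
    meta = json.loads(metadata_json)
    totals = Counter()
    for slug, views in post_views.items():
        for category in meta.get(slug, {}).get("categories", []):
            totals[category] += views
    return totals


def top_n_share(post_views, n=10):
    """Share of all post views that comes from the n most-viewed posts."""
    counts = sorted(post_views.values(), reverse=True)
    total = sum(counts)
    return sum(counts[:n]) / total if total else 0.0
```

Checking top_n_share(post_views, 10) against the 70% figure above is a quick way to see whether the Pareto pattern holds for your own blog.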

4. Search Terms: This is another neat statistic. You can use it to figure out the primary way in which users reach your blog. For example, the ratio of the day’s views that came through search terms to the total views for that day is quite interesting. If the ratio is high, most people find your blog through search engines. In a way, this is a potentially transient audience that you can convert into regular readers. If the ratio is low, your blog gets its views from referrers and regular readers. This assures you a steady audience, but it is somewhat harder for new people to "find" your blog.

This statistic also tells you which keywords viewers use to find your blog, and you can glean a lot of interesting things from it. For example, almost all of my search terms are 3-5 words long and usually very specific. This either means that the user is an expert and crafted a specific query, or that the user rewrote the query because my blog was not found for the more general one. I also use the terms to judge whether the user would have been satisfied with my blog. For example, I know that a user searching for "install matlab to 64-bit" will be satisfied, while someone who searches for "k means determine k" will not be. You can do one of two things: augment your blog post with the information users are searching for, or point users to resources that satisfy their query. For example, I found that a lot of people reached my blog searching for how to find k. I found that geomblog had a couple of good posts on it and updated my blog to link to them. Sometimes I add a FAQ if the same search query comes up multiple times and my page contains the information but in an obscure place. For example, a lot of people reached my blog searching for Empathy’s chat log location. My post on Empathy had it, but not prominently, so I added a FAQ that points to the answer immediately.
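A sketch of the search-term analyses above, assuming the searchterms data has been parsed into (term, views) pairs, possibly spanning several days. The share is defined here as search-term views divided by total views, so a high value means search-driven traffic; the min_views threshold for spotting FAQ candidates is an arbitrary, illustrative choice.

```python
from collections import Counter


def search_share(search_rows, total_views):
    """Fraction of total views that arrived through search terms."""
    search_views = sum(views for _, views in search_rows)
    return search_views / total_views if total_views else 0.0


def query_length_histogram(search_rows):
    """Count search-term rows by number of words, e.g. {3: 12, 4: 9}."""
    return Counter(len(term.split()) for term, _ in search_rows)


def frequent_queries(search_rows, min_views=2):
    """Queries (case-insensitive) that brought at least min_views views: FAQ candidates."""
    totals = Counter()
    for term, views in search_rows:
        totals[term.strip().lower()] += views
    return sorted(q for q, v in totals.items() if v >= min_views)
```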

5. Clicks: This statistic is tangentially useful for finding out which links users click. One way I use it is to gauge the "tech" level of readers who arrive via search engines. I usually link to Wikipedia articles for common terms. If users click on these basic terms often, it might mean that I write at a level higher than the typical reader and need to explain things better. For example, in my post on K-Means, this is the reason I explain supervised and unsupervised learning at the start, even though most people learning k-means already know it.
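The "tech level" check above boils down to one ratio: how many outbound clicks went to Wikipedia. This sketch assumes the clicks data has been parsed into (url, clicks) pairs; what counts as a worryingly high share is left to your judgment.

```python
from urllib.parse import urlparse


def wikipedia_click_share(click_rows):
    """Fraction of outbound clicks that went to any *.wikipedia.org page.

    click_rows: (url, clicks) pairs from the clicks table.
    """
    total = 0
    wiki = 0
    for url, clicks in click_rows:
        total += clicks
        host = urlparse(url).netloc.lower()
        if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
            wiki += clicks
    return wiki / total if total else 0.0
```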

Other Resources for Blog Traffic

There are other locations that give some useful information about your traffic. Some of them are :

a. Google Webmaster Tools: This is arguably one of the most comprehensive sources of information about your blog posts’ performance in Google. You can find it at Google Webmaster Page -> Your site on the web -> Search queries. It has information like impressions, click-throughs, average position, etc. You can download all of it as a CSV file too! Literally a gold mine for data lovers.
b. FeedBurner: Even though WordPress has a feed, I switched to FeedBurner. One of the main reasons was that it gives me a CSV file detailing the number of views by my subscribers.
c. Quantcast: Useful for aggregate information. It has multiple charts that detail the number of views, unique visitors, etc. The Quantcast data might not be accurate as it is usually estimated, but it gives a broad gauge of your blog. It also has a statistic showing how many of your visitors are addicts, how many are passers-by, etc. Quite useful!
d. Alexa: Similar to Quantcast. I primarily use my Alexa Rank as motivation to improve my blog.

Pivot Tables

I primarily use the Python prompt to play with the data. If you are not comfortable with programmatic tweaking, use spreadsheets for these analyses. If you have Windows, you can do a lot of powerful analysis by importing the statistics collected by the scripts into Microsoft Excel. Excel has a neat feature called pivot tables, which lets you view the same data from multiple perspectives. It is an advanced topic that I will not discuss here, but you can do some fantastic analysis with pivots.
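For those who prefer the Python prompt, a basic pivot (sum one column, grouped by two others) needs only the standard library. This is a minimal sketch, assuming rows come in as dicts, for example from csv.DictReader, with numeric values stored as strings; the column names in the usage note are illustrative.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key):
    """Minimal pivot table: sum value_key for every (row_key, col_key) pair.

    rows: iterable of dicts (e.g. from csv.DictReader); values are converted with int().
    Returns (table, col_labels) where table[row][col] is the summed value and
    col_labels is the sorted list of column values seen.
    """
    table = defaultdict(lambda: defaultdict(int))
    cols = set()
    for r in rows:
        table[r[row_key]][r[col_key]] += int(r[value_key])
        cols.add(r[col_key])
    return {k: dict(v) for k, v in table.items()}, sorted(cols)
```

For example, pivot(rows, "post", "week", "views") gives per-post weekly totals. If pandas is available, pandas.pivot_table(df, values="views", index="post", columns="week", aggfunc="sum") does the same and more.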

In this post, I have barely scratched the surface; you can do a lot of amazing analysis using the WordPress Stats API. I will talk about more complex analysis in a later post. Have fun with the data!

 

 



1. Choosing the number of clusters I: The Elbow Method
A post with some ideas on choosing number of clusters in a principled way.

2. 4 Chatboxes for wordpress.com blogs
An interesting way to embed chats in blogs. I have added a chat window to my blog so you can chat with me, just for the fun of it. Let's see how it goes.

3. This seat is reserved and More women, more power?
Two interesting posts on the Women's Reservation Bill in India. It's a shame that the bill has not been passed.

4. Finding Your Roots
Another nice post on complex numbers by Steven Strogatz. It was a nice refresher on how to take roots (square, cube, et al.) of complex numbers.
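As a quick companion to that refresher: writing z in polar form as r·e^{iθ}, its n distinct n-th roots are r^{1/n}·e^{i(θ + 2πk)/n} for k = 0, …, n-1 (de Moivre's formula). The sketch below applies this with the standard cmath module; the function name nth_roots is just for illustration.

```python
import cmath
import math


def nth_roots(z, n):
    """Return the n distinct n-th roots of the complex number z (de Moivre's formula)."""
    r, theta = cmath.polar(z)  # z = r * e^{i*theta}
    # Each root has modulus r**(1/n); the angles are spaced 2*pi/n apart.
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * k) / n) for k in range(n)]
```

For example, nth_roots(1, 3) gives the three cube roots of unity, and nth_roots(-1, 2) gives i and -i (up to floating-point rounding).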

5. reddit.com Interviews Peter Norvig
An interesting interview with Peter Norvig, who answered some interesting questions. For example, I liked his answer that linear classifiers have progressed beyond everyone’s expectations. Another good question was why Lisp is not used much at Google. For the list of questions, see the reddit site. I found this interview through this site.

6. A collection of code competition sites
Good collection of code competition sites.


I guess this Biweekly links edition is going to be a bumper one, like this week’s Microsoft updates 😉

1. Google News
Google made a big splash this week.

Google Buzz
Google set the blogosphere on fire by announcing Google Buzz. The official link is here. For those using Google Reader, more information is here. Since GMail is the primary means of delivering Buzz, GMail-specific news is here. At last, Google has become more serious about social search. They have also acquired the excellent social search engine Aardvark. I had used Aardvark and found it to be very good; the underlying AI algorithms seem to be doing a fine job.

Not many people were impressed (including me!). It's cool, but it looks so limited to me. I felt they could have customized the far more powerful Google Wave for this purpose rather than developing a new tool. Even then, somehow all of this looks very cramped. Its ties to GMail may be both its biggest strength and its biggest weakness. Let's see how it goes. Here is John Battelle’s take on Google Buzz. Microsoft and Yahoo have slammed Buzz, though not without justification.

Another big criticism was that Buzz exposed your social circle without providing any control to customize it. A very passionate (angry?) CNET article is here. LifeHacker had a post on how to prevent it. To Google’s credit, they have made changes to Google Buzz so that such customizations are very easy and quite intuitive. The post on it is here.

Google’s Experimental Fiber Network
In the other big news, Google announced that it will operate a new experimental fiber network with speeds of around 1 Gbps. Looks like they were not content with a 2X speedup from SPDY ;) Awesome! By the way, the US national broadband plan’s target? 5 Mbps 😉 Way to go, Google! It will be very expensive to expand the network to everyone, but it should allow Google to run a lot of experiments.

The other important news, which did not get much attention, is YouTube introducing a Safety Mode. YouTube already has some options to filter results, but this move represents a more comprehensive change. It is not just about pornography (which would be removed anyway) and violence; judging from the many posts that discuss it, the changes seem very comprehensive (for example, replacing objectionable words in comments with asterisks). I am not sure how they will decide whether a video is objectionable. Using computer vision algorithms, or collaborative filtering from comments? We shall know in some time.

2. Sitemap pings for instant search updates
This is one of the coolest changes made in WordPress in recent times. Whenever a post is published, WordPress sends a ping to the major search engines so they can index the new post immediately. I experienced this awesome feature after writing my post on Matlab: within 2 hours, my page was in Google’s search results and I got a couple of hits from it. I think that is kind of incredible, as the underlying system is really complicated. From receiving a ping to crawling, creating inverted indices and updating top-k lists, there is a lot of work involved. It was amazing that Google had done it within 2 hours. Yahoo and Bing? You guess the result 🙂
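For the curious, the sitemaps.org protocol describes such a ping as a plain HTTP GET of the form <searchengine_URL>/ping?sitemap=<URL-encoded sitemap URL>. WordPress sends these automatically, so the sketch below only illustrates the shape of the request; the search engine base URL and the sitemap path are placeholders, and some search engines have since retired this endpoint.

```python
from urllib.parse import quote


def sitemap_ping_url(search_engine_base, sitemap_url):
    """Build a sitemaps.org-style ping URL.

    search_engine_base: the engine's base URL (placeholder in the example below).
    sitemap_url: full URL of the sitemap; it is percent-encoded, including ':' and '/'.
    """
    return search_engine_base.rstrip("/") + "/ping?sitemap=" + quote(sitemap_url, safe="")
```

For example, sitemap_ping_url("http://search.example.com/", "http://blah.wordpress.com/sitemap.xml") produces a URL that could be requested with urllib.request.urlopen.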

3. Why Do Some Brands Hide Their Prices on Amazon?
An interesting discussion about why some brands hide their prices and what they hope to gain from it. I don’t agree with the post’s conclusion that the symmetric equilibrium for search with positive costs is to set the monopoly price, but I have to admit it makes some sense too.

There was another interesting post titled “Put All Your Eggs In One Basket”, which starts by saying "Job market interviewing entails a massive duplication of effort." and discusses alternative solutions.

4. STOC 2010 Accepted Papers (with pdf files)
In academia, the biggest news of last week is the list of accepted papers at STOC. Shiva Kintali’s post lists the STOC papers with links to the PDFs. There are some interesting ones in AGT and machine learning that I should check out. The list of AGT papers is given in Nisan’s blog post.

5. Will ARPA-E Receive Funding?
ARPA-E is a highly acclaimed initiative to promote revolutionary results in energy. But it looks like it may not receive as much money as it needs. That is worse news than it sounds.

6. Amazon S3 now supports Object Versioning
Another cool new feature: you can now have versioning in S3. This affects the behavior of GET and DELETE. Note, though, that pricing applies to each individual version.
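To see how versioning changes GET and DELETE, here is a toy in-memory model of the semantics. It is not boto3 or the S3 API, and the class and method names are hypothetical. It mirrors the documented behavior: a PUT adds a version instead of overwriting, a DELETE without a version id only adds a delete marker (so a plain GET then fails, like S3's 404, while old versions stay retrievable), and deleting a specific version removes it for good. Every stored version counts toward storage, which is why pricing applies per version.

```python
class VersionedBucket:
    """Toy model of S3 object-versioning semantics (illustration only, not the S3 API)."""

    _DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, data), oldest first
        self._next_id = 0

    def _new_id(self):
        self._next_id += 1
        return str(self._next_id)

    def put(self, key, data):
        """Store a new version; earlier versions are kept. Returns the version id."""
        vid = self._new_id()
        self._versions.setdefault(key, []).append((vid, data))
        return vid

    def delete(self, key, version_id=None):
        """Without version_id, add a delete marker; with it, remove that version permanently."""
        if version_id is None:
            return self.put(key, self._DELETE_MARKER)
        self._versions[key] = [(v, d) for v, d in self._versions.get(key, []) if v != version_id]
        return version_id

    def get(self, key, version_id=None):
        """Return the latest data (or a specific version); raise KeyError if unavailable."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is self._DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for v, d in versions:
            if v == version_id and d is not self._DELETE_MARKER:
                return d
        raise KeyError((key, version_id))

    def stored_versions(self, key):
        """Number of stored data versions for key (delete markers excluded)."""
        return sum(1 for _, d in self._versions.get(key, []) if d is not self._DELETE_MARKER)
```

Removing the delete marker itself (delete(key, marker_id)) makes the previous version visible again, which is how an accidental delete is undone in a versioned bucket.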

7. Network Coding
A very interesting two-part article from MIT Research on network coding: part 1 and part 2. I don’t fully understand how it works when applied at Internet scale, but it looks like a fascinating idea.
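The simplest illustration of the idea is the classic butterfly-network example: instead of forwarding packets A and B separately over a shared bottleneck link, a node sends A XOR B once, and each receiver that already holds one of the packets recovers the other. The sketch below shows only this XOR case; the more general coding schemes the articles may discuss (for example random linear combinations) are not implemented here.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packet_a: bytes, packet_b: bytes) -> bytes:
    """The bottleneck node sends a single coded packet instead of two."""
    return xor_bytes(packet_a, packet_b)


def decode(coded: bytes, known: bytes) -> bytes:
    """A receiver that already has one packet recovers the other: (A ^ B) ^ A == B."""
    return xor_bytes(coded, known)
```

For example, with A = b"HELLO" and B = b"WORLD", decode(encode(A, B), A) returns B, so one transmission on the shared link serves both receivers.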

8. Feds push for tracking cell phones
A CNET post on an ongoing court case about allowing cell phone tracking.

9. Do-It-Yourself Genetic Engineering
A fascinating magazine article on synthetic biology. I felt the article was really long and without much meat, but still a good read. I really liked the part where UC Berkeley researchers programmed a robot to handle DNA sequences. Talk about the impact of bringing fast-moving CS people into the slow-moving field of biology 😉


1. Social Search And Personalized Ads.
    Perfect personalized search is something of a holy grail for IR, and Google seems to have taken some more steps towards it this week. One is integrating your social group into your search. More details are in Google’s blog post Search is getting more social. Currently, the way Google identifies your social group is rather Google-centric: it uses Google Profile, Orkut, Twitter, YouTube, etc. Given that Google never really had a good social networking product like Facebook / MySpace, I wonder how different or improved the results are going to be. But it is still an excellent step forward.
    The other major step is bringing personalized ads to GMail. Read more at GMail’s blog post. I have to say, Google is being really aggressive about getting more ads! There used to be a common joke that if you wanted to avoid ads in GMail, you could add enough words with negative connotations (e.g. death) to the email. I remember testing it, and it was true. Not sure if it is true anymore; definitely not with this new change!

2. EFF Reveals How Your Digital Fingerprint Makes You Easy to Track 
    This is a scary (and interesting) article about how your browser leaks information that makes it fairly easy for data mining programs to track you. I checked their website, Panopticlick, but I am not sure it is really as scary as it claims to be.
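Panopticlick reports how identifying a browser attribute is in bits: if only one in N browsers shares your value, that value carries log2(N) bits of identifying information. The sketch below computes this; adding bits across attributes assumes the attributes are independent, which real browser attributes often are not, so the combined figure is an optimistic upper-bound-style estimate. The function names are illustrative.

```python
import math


def identifying_bits(matching, total):
    """Bits of identifying information when `matching` of `total` browsers share a value."""
    if not 0 < matching <= total:
        raise ValueError("need 0 < matching <= total")
    return math.log2(total / matching)


def combined_bits(attributes):
    """Sum of bits over (matching, total) pairs, assuming the attributes are independent."""
    return sum(identifying_bits(m, t) for m, t in attributes)
```

For example, a value shared by 1 in 1024 browsers reveals 10 bits; about 33 bits are enough to single out one person among roughly 8 billion, since 2**33 is about 8.6 billion.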

3. Post by Email Wrapup 
    Post by Email is a cool WordPress feature that allows you to create a new blog post by sending an appropriately formatted email. I haven’t used it so far, but maybe I will in the future. They have an excellent set of shortcodes to handle more advanced use cases like specifying tags and categories, publishing to Twitter, etc. See more at their shortcodes page.

4. Interviewing Insights and Test Frameworks

This Google Testing blog post talks about the various approaches a person can use to test code. Which way is yours?

5. Re-engineering cells
    An interesting article about taking ordinary cells and reverse engineering them into stem cells. Stem cells are, informally, cells that can turn into any other kind of cell (think heart, liver, etc.). Currently, the most important source of stem cells is embryos, but there is a huge controversy over that technique. Hence this alternative technique is all the more exciting!!

6. Microsoft Research’s CS Student Portal
The portal is still in the making and the information currently available is not vast. There are some good links though, like the list of all the important blogs in Theoretical CS. The TCS feed is here.

7. All or nothing    

An article from MIT News that discusses the importance of US health care reform. Just two weeks ago it looked unstoppable, but now its future is in question. Such are the ways of politics.

8. Female Bankers in India Earn Chances to Rule

A nice article on Female executives in banking. A decent read.

9. iPad
    Probably the biggest news of this week. I tried to find some good posts on it, but none of them made the cut, maybe because it is not available until March. Personally, I was underwhelmed. No multitasking or GPS? You've got to be kidding me. But maybe I will reserve judgment until I play with one 🙂


Using LaTeX in WordPress

LaTeX, for people who have not used it, is an awesome, must-learn tool. Quoting from its home page, “LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents.” Most papers in the CS, Math and Physics communities use LaTeX in one form or another.

I have been using LaTeX for almost a year, primarily to take notes when I watch video lectures or read technical books. I intend to write (hopefully) many articles that are mathematical in nature and will need LaTeX, so I have been exploring ways to add mathematical content to my WordPress blog.

Luckily, WordPress supports LaTeX and some of the most popular AMS packages. You can get more information here. In summary, you write LaTeX by putting your content inside a special markup called latex:

$latex all the LaTeX code here $

For more information look here.

I guess this will be sufficient for most of my needs. There are some inconveniences to using LaTeX in WordPress; for a discussion, see here. For power users, Luca has written a Python tool that converts a LaTeX file into a form ready to be copy-pasted into WordPress. It can be downloaded here. I have not used it myself yet, but I intend to use it a lot in the near future. Given that it is used by both Luca and Terry Tao, I am sure it will satisfy all my needs and more.
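To show what the $latex markup conversion involves in its very simplest form, here is a small regex-based sketch. It is not Luca's tool and handles far less: it assumes the input contains only inline $...$ math, with no display math ($$...$$), no literal dollar signs, and no existing $latex markup. A replacement function is used instead of a replacement string so that backslashes in the LaTeX are copied through untouched.

```python
import re

# Inline math written as $...$ (no nested or escaped dollar signs).
_INLINE_MATH = re.compile(r"\$([^$]+)\$")


def to_wordpress_latex(text):
    """Rewrite inline $...$ math into WordPress.com's $latex ...$ markup."""
    return _INLINE_MATH.sub(lambda m: "$latex " + m.group(1).strip() + "$", text)
```

For example, to_wordpress_latex(r"Euler: $e^{i\pi} + 1 = 0$.") returns r"Euler: $latex e^{i\pi} + 1 = 0$.", ready to paste into a post.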

So get your LaTeX rolling!!
