Analyzing my blog traffic is one of my favorite past times. Seeing traffic surge and strangers using my posts gives me gratification. Initially, my analysis was fairly low tech – checking WordPress stats page periodically. After doing it a few times, I realized I could do more methodically – and I can use the statistics to do some rudimentary data mining.
As you know, I have a WordPress.com blog. WordPress exposes the statistics in two ways : As a flash chart in the dashboard and as API service.
Viewing Blog Stats – Low Tech Way
If you want to take a look at your blog’s stats today , then you can do it in the dashboard. Assuming you are logged into your WordPress login, go to your blog. You will see WordPress toolbar at the top of the page. Click "My Dashboard". It will show a chart of total daily visits for the past 15 days. If you want more details, hover your mouse over "My Account" and select "Stats".
This page shows a wealth of information about your blog but it is limited to 2 days – today and yesterday. There are atleast 6 important statistics in that page.
WordPress Stats Sections
a) Views Per Day
This is probably the statistic you are most interested in. This plots the total number of visits each day for the last 30 days. This also has multiple tabs which allows you to aggregate the statistics information. For example you can view the total weekly and monthly visits . This gives a basic view about how your blog is faring.
Referrers are basically web pages from which visitors reached your blog. There are different types of referrers : pingbacks, related posts, explicit linking etc. If a certain website is driving lot of visitors to your blog , it is time to notice ! Referrers allows you to get that insight. This panel shows the top 10 referrers. You can click on the "Referrers" link to get the full listing.
c) Top Posts and Pages
This shows the top 10 posts for the given day. One interesting thing is that they track home page separately. Also if you have different pages (eg About), then visits to these pages are also tracked. Again, if you want to see the number of visits to each page, you can click on the title link.
One interesting thing is that each of the posts/pages in this section also has an addition chart icon near them. Clicking on them gives the stats for that particular post alone. The new page shows a chart detailing how many visits occurred to that page in the last few weeks. Most interestingly , they also show the average visits per day for the last few weeks and months. This gives a glimpse into the lasting popularity of your post.
d) Search Engine Terms
I am not very clear what this exactly means – The instruction says "These are terms people used to find your blog." . I am not sure if this corresponds to search terms typed in search engines or in WordPress search boxes etc . Anyway, this panel gives information about the search terms that fetch visits to your blog.
Clicks panel shows the list of links in your blog that your visitors clicked. In a way , you can consider it as an inverse of referrers. In this case, you act as a referrer for some other blog. This post gives some hints about the type of visitors to your blog.
f) Aggregate Statistics
There is also another panel that shows some aggregate stats. This shows the total number of views to your blog so far , number of blogs and posts , email subscribers etc.
A Better Way
Using WordPress Stats page to get your data is fairly low tech. This gives some data which can only give you an instinct of how things go. But it will not give you any deeper insights. For doing more data mining, you need data – lots of data. Fortunately, WordPress makes available a stats API which you can query to get regular data. In the rest of the post , we will talk about the API and how to use the data that you get out of it.
Using WordPress Stats API
The primary url which provides the stats is http://stats.wordpress.com/csv.php . You can click on it to see the required parameters. There are 4 important parameters to this API.
a) api_key : api_key is a mandatory parameter and ensures that only owner queries the website. There are three ways to get this information. This key is emailed to you at the time you created your blog. Or you can access it at My Dashboard -> Users -> Personal Settings. This will show your api key. For the truly lazy click this url .
b) blog_id : This is a number which uniquely identifies your blog. Either blog_id or blog_uri is mandatory. I generally prefer blog_id. Finding blog_id is a bit tricky. Go to the blog stats page (My Account -> Stats). Click on the title link of "Top Posts and Pages". This will open a new page which shows the statistics for last 7 days. If you look at the page’s url, it will have a parameter called blog. The value of this parameter is the blog_id . Atleast for my blog , it is a 8 digit number.
c) blog_uri : If you do not want to take all the trouble of getting blog_id , use blog_uri. This is nothing but the url of your blog (http://blah.wordpress.com).
d) table : This field identifies the exact statistic you want. One of views, postviews, referrers, searchterms, clicks. Each of these correspond to the sections of WordPress stats discussed above. If table is not specified , views is selected as the default table.
You can get more details from the stats API page given above.
Sample Python Scripts to fetch WordPress Stats
I have written a few scripts which fetch each of the WordPress stats. One of them run every hour and gets the total number of views so far for the whole blog. The other scripts runs once a day and fetch the total clicks, search terms, referrer and top posts for that day. All of these store the data as a csv file which lends itself to analysis.
If you are interested in the scripts , the links to them are :
1. getBlogDaysStats.py : Fetches the total views for the whole blog at the current time. For best results run every hour.
2. getBlogReferrers.py : Fetches all the referrers to your blog.
3. getBlogPostViews.py : Fetches the number of views for individual blog posts and pages.
4. getBlogSearchTerms.py : Fetches all the search terms used to find your blog today.
5. getBlogClicks.py : Fetches the urls that people who visited your blog clicked.
How to Collect WordPress Statistics
The first step is of course to collect data periodically. I use cron to run the scripts. My crontab file looks like this :
11 * * * * /usr/bin/python scriptpath/getBlogDaysStats.py
12 0 * * * /usr/bin/python scriptpath/getBlogClicks.py
14 0 * * * /usr/bin/python scriptpath/getBlogPostViews.py
15 0 * * * /usr/bin/python scriptpath/getBlogReferrers.py
16 0 * * * /usr/bin/python scriptpath/getBlogSearchTerms.py
Basically, I run the getBlogDaysStats every hour and other scripts every day. I also run the rest of scripts at early morning so that it fetches the previous day’s data.
How to Use WordPress Statistics
If you run the scripts for few days, you will have lot of data. The amount of analysis you can make is limited only by your creativity. In this section, I will tell some of the ways I use the stats instead of giving an explicit how-to.
1. Views per day : It is collected by getBlogDaysStats.py. The most basic stuff is to chart them. This will give a glimpse of your trend – If it is static or climbing, then good news. If it is falling down it is something to worry about. I must also mention that have a more or less a plateau in your chart happens often. For eg in my blog, the charts follow a pattern – It increases for quite some time , then stays at the same level for a long time and then increases again. Also , worrying about individual day’s statistics is not a good idea. Try to aggregate them into weekly and monthly values as they give a less noisy view of your blog traffic.
Another common thing to do is to analyze per hour traffic. This can be easily derived from the output of the script. Basically, if m is the number of views at time a and n is the number of views at time b , then you received n-m views in b-a hours. I usually calculate it for every hour. This gives a *basic* idea of peak time for your blog – You can also infer your primary audience , although the interpretation is ambiguous. As an example , I get most of my traffic at night – especially between 1 AM – 9 AM. Morning time traffic is pretty weak and it picks up again in the evening. Interpreting this is hard as my blog covers a lot of topics – but if your blog is more focused you learn a lot about your visitors.
2. Referrers : This is a very useful statistic if you do some marketing for your blog. For best results, you may want to use just the domain of the url instead of the whole url for analysis. Using it you can figure out which sites drive traffic to your blog. If it is another blog, then it is a good idea to cultivate some friendship with that blog’s owner. For eg, for my blog , I found that digg drives more traffic that reddit. Also facebook drives some traffic to my blog – so I use WordPress’s facebook publicize feature. I also find that I get some traffic due to WordPress’s related posts feature which means that I must use good use of categories and tags. Your mileage may vary but I hope the basic idea is clear.
3. Individual Post Views : This is probably the most useful set of statistics. Basically , it allows you to analyze the traffic of individual posts over a period of time. I have a file which associates a blog post with extra information : For eg it stores the categories, tags, original creation date, all modification dates etc. (If you are curious , I store the information in JSON format). Once you have this information lot of analysis is possible.
a. You can figure out your audience type. If for a post, you have lot of audience in the first week and almost no audience from then on – then most likely your audience is driven by subscription. If it keeps having a regular traffic, then probably it has some useful stuff and traffic is constantly driven to it by search engines. For eg, my Biweekly links belong to the first scenario : When I publish one, lot of people visit it and then after a few days it gets practically no visits. In the other case, my post of Mean Shift gets a steady stream of views every week. If you want to sustain a good viewership, you may want to write more posts which can attract long term views.
b. If you use categories and tags wisely, you can tally the number of views per each category. This will give you an idea of the blog posts which users prefer. I noticed that my audience seems to like my Linux / Data Mining posts than other categories. So it is a good idea to write more of those posts.
c. You can kind of see a pareto effect in your blog posts. For eg, my top 10 blogs account for atleast 70% of my blog traffic. So if I could identify them correctly, I can write lesser posts but still maintain my blog traffic 😉
You can do lot more than these simple analysis but this is just a start.
4. Search Terms : This is another neat statistic. You can use it to figure out the primary way in which users access your blog. For eg, the ratio of total blog post view for a day and number of search terms for the day is quite interesting. If the ratio is high, then most of the people find your blog using search engines. In a way , this a potential transient audience whom you can convert to regular audience. If the ratio is small , then your blog gets views by referrers and regular viewers. This will assure you a steady audience , but it is slightly hard to get new people "find" your blog.
This statistic also tells you which keywords the viewers use to find my blog. You can gleam lot of interesting things from this. For eg, almost all of my search terms are 3-5 words long and usually very specific. This either means that the user is an expert and has crafted specific query. It may also mean that user rewrote the query and my blog was not found in the general query. I also use the terms to figure out if the user would have been satisfied with my blog. For eg, I know that a user searching "install matlab to 64-bit" will be satisfied while some one who searches "k means determine k" will not be. You can do either of two things : augment your blog post to add information that users are searching , or point users to resources that satisfies their query. For eg, I found lot of people reached my blog searching for how to find k. I found geomblog had couple of good posts on it and updated my blog to link to these posts. Some times, I may add a FAQ if same search query comes multiple times and if my page contains the information but is obscure. Eg : lot of people reached my blog searching for Empathy’s chat log location. My post of Empathy had it but not in a prominent fashion. So I added a FAQ which points the answer immediately.
5. Clicks : This statistic is tangentially useful in finding out which links the user clicks. One way I use it is to gauge the "tech" level of my reader who visits my blog using search engines. I usually link to Wikipedia articles for common terms. If the user clicks on these basic terms often ,then it might mean that I write articles at a level higher that the typical user and I have to explain it better. For eg , in my post of K-Means , this was the reason I explain supervised and unsupervised learning at the start even though most people learning k-means already know it.
Other Resources for Blog Traffic
There are other locations that give some useful information about your traffic. Some of them are :
a. Google Web Master Site : This is arguably one of the most comprehensive information about your blog post’s performance in Google. You can see it a Google Webmaster Page -> Your site on web -> Search queries. It has information like impressions, click throughs, average position etc. You can download all of them in a csv file too ! Literally a gold mine for data lovers.
b. Feedburner : Even though, WordPress has a feed, I switched to FeedBurner. One of the main reason was that it gave me a csv file detailing the number of views by my subscribers.
c. Quantcast : Useful for aggregate information. It has multiple charts that detail the number of views, unique visitors etc. The Quantcast data might not be accurate as it is usually estimated – but it gives a broad gauge of your blog. It also has some statistic which says how many of your visitors are addicts , how many are pass throughs etc. Quite useful !
d. Alexa : Similar to Quantcast . I primarily use my Alexa Rank for motivation to improve my blog ranking.
I primarily use Python prompt to play with data. If you are not comfortable with programmatic tweaking, use spreadsheets to do these analysis. If you have Windows, you can do lot of powerful analysis by importing the statistics obtained by the scripts into Microsoft Excel. Excel has a neat feature called Pivot tables. It is also an advanced topic that I will not discuss here. You can do some fantastic analysis using pivots. They also give you the ability to view the same data from multiple perspectives.
In this post, I have barely scratched the surface – You can do lot of amazing analysis using WordPress Stats API . I will talk about more complex analysis in a later post. Have fun with the data !