Fine Cuppa Joe: 96 Days and 162 Cups of Coffee

Introduction

Let’s get one thing straight: I love me some coffee.
Some people would disagree with me on this, but coffee is really important. Really, really important, and not just to me. Not just because companies like Starbucks and Second Cup and Caribou and Timothy’s and Tim Hortons make it their business, but for another reason.
As far as I know, there are only three legal, socially acceptable drugs: alcohol, nicotine, and caffeine (and some would argue that the first two are not always socially acceptable). Coffee is really important because it is the most common, effective and ubiquitous delivery vehicle for that third drug, and one which is acceptable not only socially but also in the world of business.
I remember a long time ago there was a big blackout. Afterward, people pointed out how such a widespread outage was caused by such a small point of failure, and said things like ‘This just goes to show how fragile our infrastructure is! If the terrorists want to win, all they have to do is take out one circuit breaker here or there and all of North America will collapse!’

Ha ha ha, yeah.
But I’d argue that if you really wanted to shut down all of North American society, you could hit us where it really hurts by taking away something without which we are completely and totally hopeless: cut off our supply of coffee. Think about it! The widespread effects of everyone across all walks of life and all industries suddenly going cold turkey on coffee would be far more damaging in the long run than any little blackout. Run for the hills, the great Tim Hortons riots of 2013 have erupted, and apparently the Mayans only missed the date of the Apocalypse by a small margin!
Or at least I think so. Or at least I think the idea is entertaining, though I probably largely got the idea from this Dilbert comic (which I find funnier and more spot-on than most).
But I digress.

Background

Like I said, I love me some coffee (it says so in my Twitter profile), and I’m no stranger to the quantified self either, so I thought it would be interesting to apply it here and answer the question “Exactly how much coffee am I drinking?”, amongst others.
I kept track of my coffee consumption over the period spanning November 30, 2012 to March 5, 2013. I recorded when I had coffee, where it was from, what size, and how I took it. It wasn’t until almost the end of January that I realized I could also be keeping track of how long it took me to consume each cup of coffee, so I started doing that as well. Every time I do something like this I forget, and then remember, how important it is to think about data collection before you set off on your merry way (as, for example, with my post on my commute).
As well as keeping track of the amount of coffee I drank in terms of cups, I converted the cups to volume by multiplying by the following approximate values:
  • Mug / Small / Short – 240 ml
  • Medium / Tall – 350 ml
  • Large / Grande – 470 ml
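If you wanted to do this conversion in code rather than by hand, a minimal Python sketch might look like the following. It assumes each cup was logged with one of the size labels above; the names `SIZE_TO_ML`, `cup_volume_ml` and `total_volume_ml` are hypothetical, not part of how the data was actually recorded.

```python
# Approximate volume in ml for each cup size label (values from the list above).
# SIZE_TO_ML and the helpers below are hypothetical names for illustration.
SIZE_TO_ML = {
    "mug": 240, "small": 240, "short": 240,
    "medium": 350, "tall": 350,
    "large": 470, "grande": 470,
}


def cup_volume_ml(size):
    """Return the approximate volume in ml for one recorded cup size.

    Raises KeyError for a size label not in SIZE_TO_ML.
    """
    return SIZE_TO_ML[size.strip().lower()]


def total_volume_ml(sizes):
    """Sum the approximate volumes of a sequence of recorded cup sizes."""
    return sum(cup_volume_ml(s) for s in sizes)


print(total_volume_ml(["Tall", "Grande", "Mug"]))  # 350 + 470 + 240 = 1060
```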

Analysis

First and foremost, let’s examine where the majority of my coffee came from over the three-month period. Starbucks is the clear winner, and apparently I almost never go to Second Cup.
bar chart of coffee consumption by location
Second was at work (which is not really a fair comparison, as it’s actually Starbucks coffee anyway). Third was Dad’s place, almost all of which is due to my being home over the holidays.
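Tallying cups by location is just counting. Here is a rough Python sketch using `collections.Counter`; the log entries below are made-up placeholders, not my actual data, and `cups_by_location` is a hypothetical helper.

```python
from collections import Counter

# Made-up log entries of (location, size); placeholders, not the real data.
log = [
    ("Starbucks", "Tall"),
    ("Work", "Mug"),
    ("Starbucks", "Grande"),
    ("Dad's place", "Mug"),
    ("Starbucks", "Tall"),
]


def cups_by_location(entries):
    """Count cups per location, most frequent first."""
    return Counter(location for location, _size in entries).most_common()


for location, cups in cups_by_location(log):
    print(f"{location}: {cups}")
```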
Next we look at the time of day when the coffees were consumed. I am going to use this as an illustrative example of why it is important to visualize data.
First, consider a histogram of when all the java was imbibed over the entire time period:
histogram of coffee consumption by hour of day
You can see there are peaks at 10 AM and 2 PM. However, is this telling the whole story? Let’s look at all the data plotted out by date and time of day:
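The histogram above boils down to counting cups per hour of the day while ignoring the date entirely, which is exactly what can hide a change over time. A minimal sketch, with hypothetical timestamps and a hypothetical `hourly_counts` helper:

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps for illustration only.
timestamps = [
    datetime(2012, 12, 3, 9, 45),
    datetime(2012, 12, 3, 14, 10),
    datetime(2013, 1, 15, 7, 50),
    datetime(2013, 1, 15, 10, 5),
    datetime(2013, 1, 16, 14, 30),
]


def hourly_counts(times):
    """Count cups per hour of day (0-23), ignoring the date."""
    counts = Counter(t.hour for t in times)
    return [counts.get(hour, 0) for hour in range(24)]


# Print a crude text histogram, skipping empty hours.
for hour, n in enumerate(hourly_counts(timestamps)):
    if n:
        print(f"{hour:02d}:00 {'#' * n}")
```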
scatterplot of coffee consumption by date and time of day
With the data plotted out, you can see there is a distinct shift in the hours of the day when I was drinking coffee around the beginning of January. The earliest cup of the day moves from around 9 AM to around 8 AM, and the latest moves from around 8 PM in the evening to the late afternoon (3-4 PM). So what caused this shift in the timing of my daily java consumption? Simple: I got a new job.
You can see this shift if we overplot histograms for the hour of day before and after the change:
combined histogram of coffee consumption by hour of day
You can see that the distribution of my coffee consumption is different after I started the new gig: my first morning coffees occur earlier (in the hours of 7-8 AM instead of 9 or later). You wouldn’t have known that just from looking at the first histogram, so you can see why it’s important to look at all the data before jumping ahead into any analysis!
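Splitting the data at a cutoff date before counting is what produces the before-and-after comparison. Here is a sketch with made-up timestamps; the cutoff of January 2, 2013 is a placeholder standing in for “around the beginning of January”, and `before_after_histograms` is a hypothetical helper.

```python
from collections import Counter
from datetime import date, datetime


def hourly_counts(times):
    """Count cups per hour of day (0-23), ignoring the date."""
    counts = Counter(t.hour for t in times)
    return [counts.get(hour, 0) for hour in range(24)]


def before_after_histograms(times, cutoff):
    """Split timestamps at a cutoff date and return (before, after) hourly counts."""
    before = [t for t in times if t.date() < cutoff]
    after = [t for t in times if t.date() >= cutoff]
    return hourly_counts(before), hourly_counts(after)


# Made-up timestamps; the cutoff is a placeholder date.
timestamps = [
    datetime(2012, 12, 3, 9, 45),
    datetime(2012, 12, 4, 20, 15),
    datetime(2013, 1, 15, 7, 50),
    datetime(2013, 1, 16, 15, 30),
]
before, after = before_after_histograms(timestamps, cutoff=date(2013, 1, 2))

# Side-by-side counts for every hour that has at least one cup.
for hour in range(24):
    if before[hour] or after[hour]:
        print(f"{hour:02d}:00 before={before[hour]} after={after[hour]}")
```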
Using the ml values for the different sizes listed above, we can calculate the amount consumed per day in ml and visualize my total coffee consumption over time by volume:
cumulative coffee consumption by date
You can see that my coffee consumption is fairly consistent over time. Over the whole period of 96 days I drank approximately 50 L of java, which comes out to about 520 ml a day (or about 1.5 Talls from Starbucks).
Adding a trend line makes this even clearer; it fits amazingly well, with a slope of ~0.52 (litres per day) and an R-squared of ~0.998:
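The daily totals and the cumulative curve can be computed roughly like this, assuming each cup has already been converted to ml; the records and function names are hypothetical. Note that this sketch only produces a point for days with at least one cup.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Hypothetical (date, ml) records, sizes already converted to ml.
records = [
    (date(2012, 11, 30), 350),
    (date(2012, 11, 30), 240),
    (date(2012, 12, 1), 470),
    (date(2012, 12, 2), 350),
]


def cumulative_litres(records):
    """Return (days, cumulative litres), one point per day that has records."""
    per_day = defaultdict(int)
    for day, ml in records:
        per_day[day] += ml
    days = sorted(per_day)
    totals = list(accumulate(per_day[d] / 1000 for d in days))
    return days, totals


def average_ml_per_day(total_litres, n_days):
    """Average daily volume in ml over a period."""
    return total_litres * 1000 / n_days


days, totals = cumulative_litres(records)
print(list(zip(days, totals)))

# The post's headline figures: about 50 L over 96 days.
print(round(average_ml_per_day(50, 96)))  # prints 521, i.e. about 520 ml a day
```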
cumulative coffee consumption by date (with trend line)
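A trend line like this is an ordinary least-squares fit of cumulative litres against day number, so the slope reads as litres per day. A self-contained sketch; `fit_line` is a hypothetical helper and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Needs at least two distinct x values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Invented data: x = day number, y = cumulative litres up to that day.
day_numbers = [0, 1, 2, 3, 4, 5]
cumulative = [0.6, 1.0, 1.6, 2.1, 2.6, 3.1]
slope, intercept, r2 = fit_line(day_numbers, cumulative)
print(f"slope = {slope:.3f} L/day, R-squared = {r2:.3f}")
```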
So the answer to the question from the beginning (“Exactly how much coffee am I drinking?”) is: not as much as I thought – only about 1-2 cups a day. 
When am I drinking it? The peak times of day changed a little, but mostly early in the morning and in the mid-afternoon (which I imagine is fairly typical).
How does my daily consumption look over the time period in question? Remarkably consistent.
And just in case you were wondering, out of the 162 cups of coffee I drank over the 3 months, 160 were black.

Conclusions

  • Majority of coffee bought from Starbucks
  • Marked shift in time of day when coffees were consumed due to change in employment
  • Regular / daily rate of consumption about 520 ml and consistent over period of examination
  • I’ll take mine black, thanks

How to Think Like an Analyst

So I was talking to my aunt a couple of weekends ago. She explained that though she was happy for me and the work that I do, she didn’t understand any of it. I tried my best to explain in general terms what web analytics, and analytics as a whole, is all about.

Our conversation continued, and I further offered that though she may not understand exactly what it is I do, she could understand the spirit in which it is done – the way to think about analysis.

Not everyone is cut out to be an analyst. There are those who have always been good with numbers, and there are those who describe themselves as ‘the one who was always bad at math in high school’.

And that’s fine. Like I said, not everyone is cut out to be an analyst, not everyone wants to be, and not everyone can be. However, it is a firm belief of mine that everyone, everyone can think like an analyst.

And I’ll show you how.

The Questions You Need to Ask

True, you may not have the skill set necessary to be an analyst. You may, in fact, be one of those who was bad at math in high school, and when people mention spreadsheets you think about bedding, not computer software.

But that doesn’t mean you can’t think like an analyst.

Part of being a good analyst is not just being able to do analysis, but being able to ask the right questions which lead to it.

All good analysis starts with a question. So all you have to do is ask the right questions.

And, in this humble author’s opinion, the two questions to ask are how and what.

Question 1 – How (many)?

This is the simpler of the two questions, and is one of measurement and descriptive statistics.

Thinking quantitatively is a key part of thinking like an analyst.

If you learn to think in this way you will find that ordinary, everyday situations can become part of ordinary, everyday analytics.

For instance, any time you are at some sort of gathering of people or social function you can think like an analyst by asking yourself the question – how many?

How many men are there in the room? How many women? How many are there proportionally?
How many people at the party are wearing glasses? How many are not?
How many people at the networking mixer are eating and drinking? How many are just eating? Just drinking?
How many people at the dinner party decided to have the chicken? How many did not? How many finished all their food and how many left food behind? How many plates did each person have?

And so on.
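Once you have counted, the proportions are one division away. A small Python sketch, with a made-up tally of dinner-party choices and a hypothetical `proportions` helper:

```python
from collections import Counter


def proportions(observations):
    """Return each category's count and its share of the total."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Made-up tally of who at a dinner party chose which dish.
choices = ["chicken", "fish", "chicken", "vegetarian", "chicken"]
for dish, (n, share) in proportions(choices).items():
    print(f"{dish}: {n} ({share:.0%})")
```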

But as I said, the question of how many is simply one of describing the state of affairs. To really think like an analyst you also need to ask the second question.

Question 2 – What (is the relationship between…)?

The second question helps you go beyond simply describing things quantitatively and start thinking about possible relationships.

Here, to illustrate how thinking like an analyst is independent of subject matter, we can pick a topic, any topic. So let’s go with… peanut butter. I like peanut butter.

The second question is one I find myself asking lately about almost everything (whether I like it or not). And that very important question is: what is the relationship between…?

Pick properties of, or related to, your subject of analysis – some of which you may compare across or between, and others which may be measured. In technical terms these are known as dimensions and measures, respectively.

For example, using our randomly chosen topic of peanut butter, first we brainstorm all the things we could possibly think of related to peanut butter.

Type (chunky or smooth), brand, container (size, type, colour), price, sales, consumption, nutritional content, location, time…

And so on. Let’s stop there.

Then we ask the question: what is the relationship between a and b? Here a is one of the things we brainstormed as a category (a dimension), and b is one of the things we brainstormed as a measurement (a measure).
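For the technically inclined, this question maps onto a simple group-by: split the records by a dimension (a) and summarize a measure (b) within each group. A minimal Python sketch; the peanut butter records and the `relationship` helper are hypothetical, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean


def relationship(rows, dimension, measure, aggregate=mean):
    """Group rows by one dimension and summarize one measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: aggregate(values) for key, values in groups.items()}


# Hypothetical peanut butter records; all values are made up.
jars = [
    {"type": "smooth", "brand": "A", "calories": 190, "sales": 120},
    {"type": "chunky", "brand": "A", "calories": 188, "sales": 80},
    {"type": "smooth", "brand": "B", "calories": 192, "sales": 150},
    {"type": "chunky", "brand": "B", "calories": 186, "sales": 60},
]

print(relationship(jars, "type", "calories"))               # mean calories by type
print(relationship(jars, "brand", "sales", aggregate=sum))  # total sales by brand
```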

What is the relationship between the type of peanut butter and its nutritional content? (That is, how is chunky peanut butter different from smooth peanut butter in terms of calories and fat?)

What is the relationship between the brand of peanut butter and its sales? (That is, how do the total sales of different peanut butter brands compare? You could also add time and location dimensions – how do sales between brands compare this year? Last year? Worldwide? In Canada vs. the US? Per store in Ontario?)

What is the relationship between container size and location? (That is, do different countries have different sized containers for peanut butter? What is the average container size per country? In each region? Or look at location in the store: are all the containers in the same aisle, or are the different sizes in different places (e.g. the bulk food aisle)? How is the distribution of container sizes broken down across different stores across the country?)

And so forth. As you can see, there are so many questions you can ask by combining properties of a topic of interest in this way. And these are only questions involving two properties; many more questions of greater complexity could be generated by combining multiple properties (e.g. what is the relationship between the brand and type of peanut butter and its sales and consumption?)
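Combining several dimensions and measures is the same idea with a compound key, which is essentially what a pivot table does. Another sketch, with hypothetical records and a made-up `pivot` helper:

```python
from collections import defaultdict


def pivot(rows, dimensions, measures):
    """Sum several measures for every combination of several dimensions."""
    table = defaultdict(lambda: {m: 0 for m in measures})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        for m in measures:
            table[key][m] += row[m]
    return dict(table)


# Hypothetical records: brand and type are dimensions; sales and consumption are measures.
rows = [
    {"brand": "A", "type": "smooth", "sales": 120, "consumption": 100},
    {"brand": "A", "type": "chunky", "sales": 80, "consumption": 70},
    {"brand": "A", "type": "smooth", "sales": 30, "consumption": 25},
    {"brand": "B", "type": "chunky", "sales": 60, "consumption": 55},
]

for key, totals in pivot(rows, ["brand", "type"], ["sales", "consumption"]).items():
    print(key, totals)
```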

The Hard Question

There is one final question I have not mentioned which, if you really want to think like an analyst, is the most important question of all. In fact, I would go further and say that even if you are not thinking like an analyst, it is the most important question of all. And that ultimate question is why.

The question of why is the most important question, the hardest question, the question which drives all of the analysis that analysts the world over do.

Why.

Why has our new marketing initiative not resulted in increased sales in the third quarter? Why is the sky blue? Why does Amazon send me so many emails related to Home and Garden products? Why can’t I sleep at night? Why are there three million kinds of laundry detergent but only two kinds of baking powder? Why? Why? Why.

This is the question which drives all investigation, which drives all measurement, which drives all analysis.

And whether you want to think like an analyst or not, this is the question you should always be asking yourself.