How much do I weigh? – Quantified Self Toronto #12

Recently I spoke at the Quantified Self Toronto group (you can find the article on the other talk here).

It was in late November of last year that I decided I wanted to lose a few pounds. I read most of The Hacker’s Diet, then began tracking my weight using the excellent Libra Android application. Though I am no longer drastically cutting my caloric intake (and so my weight is now fairly steady), I continue to track my weight day-to-day and build the dataset. Perhaps later I can analyze the patterns in my weight’s fluctuations, separate from the goal of weight loss.

What follows is a rough transcription of the talk I gave, illustrated by the accompanying slides.

Hello everyone, I’m Myles Harrison and today I’d like to present my first experiment in quantified self and self-tracking. And the name of that experiment is “How Much Do I Weigh?”

So I want to say two things. First of all, at this point you are probably saying to yourself, “How much do I weigh? Well, geez, that’s kind of a stupid question… why don’t you just step on a scale and find out?” And one of the things I discovered as a result of doing this is that sometimes it’s not that simple. But I’ll get to that later in the presentation.

The second thing I want to say is that I am not fat.

However, there are not many people I know who, if you asked them, “Hey, would you like to lose 5 or 10 pounds?”, would say no. The same is true for me. So late last November I decided that I wanted to lose some weight and perhaps get into slightly better shape. Being the sort of person I am, I didn’t go to the gym, I didn’t go to a personal trainer, and I didn’t meet with my doctor to discuss my diet. I just Googled stuff. And that’s what led me to this:

The Hacker’s Diet, by John Walker. Walker was one of the co-founders of Autodesk, the company which created the popular AutoCAD software and later grew into a giant multinational. Mr. Walker woke up one day and had a realization: he was very successful, very wealthy, and had a very attractive wife, but he was fat. Really fat. And so John Walker thought, “I’ve used my intelligence and analytical thinking to get all these other great things in my life, why can’t I apply my intelligence to the problem of weight, and solve it the same way?” So that’s exactly what he did. And he lost 70 pounds.

Walker’s method was this. He said, let’s forget all about making this too complicated. Let’s look at the problem of health and weight loss as an engineering problem. So there’s just you, and your body is the entire system. The only things we’re going to think about are that system’s inputs and outputs. I don’t care if you’re eating McDonald’s, or Subway, or spaghetti 3 times a day. We’re just talking about the amount of input: how much? Therefore, from this incredibly simplified model of the human body, the way to lose weight is simply to ensure that the inputs are less than the outputs.

IN < OUT

Walker realized that this ‘advice’ is so simple and obvious that it is nearly useless in itself. He compared it to a wise financial guru who, asked by an apprentice how to make money on the stock market, replies: “It’s simple, buy low and sell high.” Still, this is the framework we have as a starting point, so we proceed from here.
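To make the IN < OUT model concrete, here is a minimal R sketch. It assumes the commonly quoted rule of thumb that roughly 3,500 kcal corresponds to one pound of body weight; that constant, the function name projected_change_lb and the example numbers are illustrative assumptions, not measurements from this experiment.

```r
# Rule-of-thumb assumption: about 3,500 kcal per pound of body weight.
KCAL_PER_POUND <- 3500

# Projected weight change (pounds) after `days` days, given average daily
# intake (IN) and output (OUT) in kcal. Negative values mean weight lost.
projected_change_lb <- function(intake_kcal, output_kcal, days) {
  (intake_kcal - output_kcal) * days / KCAL_PER_POUND
}

# Hypothetical example: taking in 500 kcal/day less than you burn, for 4 weeks
projected_change_lb(intake_kcal = 2000, output_kcal = 2500, days = 28)
```

For those made-up numbers the result is a loss of 4 pounds. By construction the model knows nothing beyond the two totals, which is why it works as a starting framework rather than a plan.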

So now this raises the question, “Okay well how do we do that?” Well, this is a Quantified Self meet up, so as you’ve probably guessed, we do it by measuring.

We can measure our inputs by counting calories and keeping track of how much we eat. Measuring output is a little more difficult. It is possible to approximate the number of calories burned when exercising, but actually measuring how much energy you use on a day-to-day basis (just walking around, sitting, going to work, sleeping, etc.) is more complicated, and likely not practical. So instead, we measure weight as a proxy for output, since this is what we are really concerned with in the first place anyhow: are we losing weight or not?

Okay, so we know now what we’ve got to do. How are we going to keep track of all this? Walker, being a technical guy, suggests entering all the information into a piece of computer software, oh, say, I don’t know, like a certain spreadsheet application. This way we can make all kinds of graphs, find the weighted moving average, and do all kinds of other analysis. But I didn’t do that. Now don’t get me wrong, I love data and I love analyzing it, and so I would love doing all those different types of things. However, why would I use a piece of software that I hate (and am forced to use on a regular basis) any more than I already have to? Especially when this is the 21st century, I have a perfectly good smartphone, and somebody already wrote the software to do it for me!

So, I’m good! Starting in late November of last year I followed the Hacker’s Diet directions and weighed myself every day (or nearly every day, as often as I could) at approximately the same time of day. And along the way, I discovered some things.

One day I was at work and I got a text from my roommate, and it said “Myles, did you draw a square on the bathroom floor in black permanent marker?” To which I responded, “Why yes I did.” To which the response was “Okay, good.” And the reason that I drew a square on the tiles of the bathroom floor in black permanent marker was observational error. More specifically, measurement error.

If you know anything about your typical drugstore bathroom scale, you probably know that they are not really that accurate. If you put the same scale on an uneven surface (say, like tiles on a bathroom floor), you can make the same measurement back-to-back and get wildly different values. That is to say, the scales have a lot of random error in their measurement. And that’s why I drew that square on the bathroom floor. That was my attempt to control measurement error, by placing the scale in as close to the same position as I could every morning when I weighed myself. Otherwise you get into this sort of bizarre situation where you start thinking, “Okay, so is the scale measuring me or am I measuring the scale?” And if we are attempting to collect some meaningful data and do a quantified self experiment, that is not the sort of situation we want to be in.

So I continued to collect data from last November up until today. And this is what it looks like.

As you can see, like most dieters, I was very ambitious at the start and lost approximately 5 pounds between late November and the tail end of December. That data gap, followed by a large upswing, corresponds to the Christmas holidays, when I went off my diet. After that I continued to lose weight, albeit more gradually, up until about mid-March, and since then I have ever-so-slowly been gaining it back, mostly because I have not been watching my input as closely as I was before.

So, what can we take away from this graph? Well, from my simple ‘1-D’ analysis, we can see a couple of things. The first thing, which should be a surprise to no one, is that it is a lot easier to gain weight than it is to lose it. I think most everyone here (and all past dieters) already knew that. 

Secondly, my diet aside, it is remarkable to see how much variability there is in the daily measurements. True, some of this may be due to the aforementioned measurement error; however, in my reading online I also found that a person’s weight can vary by as much as 1 to 3 pounds from day to day, due to various biological factors and processes.

Walker comments on this variability in the Hacker’s Diet. It is one of the reasons he gives for weighing oneself every day and looking at the moving average, if you want to be able to really track whether or not a diet is working. And that’s why doing things like Quantified Self is important, and also what I was alluding to earlier when I said that the question of “How much do I weigh?” is not so simple. It’s not simply a matter of stepping on the scale and looking at a number to see how much you weigh, because that number varies on a daily basis and isn’t a truly accurate measurement of how much you ‘really’ weigh.
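Walker’s trend line is an exponentially smoothed moving average. The R sketch below is one way to compute such a trend from daily weigh-ins. The smoothing factor of 0.1 is an assumption here (it is the value usually associated with the Hacker’s Diet, but check the book before relying on it), and the function name weight_trend and the readings are hypothetical.

```r
# Exponentially smoothed moving average ("trend") of daily weigh-ins.
# smoothing = 0.1 is an assumed default; assumes the first reading is present.
weight_trend <- function(weights, smoothing = 0.1) {
  trend <- numeric(length(weights))
  trend[1] <- weights[1]  # seed the trend with the first reading
  for (i in seq_along(weights)[-1]) {
    prev <- trend[i - 1]
    # A missed day (NA) carries the previous trend forward unchanged;
    # otherwise the trend moves a fraction `smoothing` toward today's reading.
    trend[i] <- if (is.na(weights[i])) prev else prev + smoothing * (weights[i] - prev)
  }
  trend
}

readings <- c(170.2, 171.4, 169.8, 170.6, NA, 169.9)  # hypothetical weigh-ins (lb)
round(weight_trend(readings), 2)
```

Each new reading moves the trend only a tenth of the way toward itself, so a 2-pound day-to-day swing shifts the trend by just 0.2 pounds, which is what lets the underlying direction show through the noise.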


This ties into the third point that I wanted to draw from the data: the human body is not like a light switch, it’s more like a thermostat. I remember reading about a study which psychologists did to measure people’s understanding of delayed feedback. They gave people a room with a thermostat, but the thermostat had a delay built in, and the delay was set to something very long, on the order of several hours. The participants were tasked with getting the room to stay at a set temperature; however, none of them could, because people (or most people, anyhow) do not intuitively understand things like delayed feedback. The participants kept fiddling with the thermostat, setting it higher and lower because they thought it wasn’t working, and so the temperature in the room always ended up fluctuating wildly. They were responding to what they saw the temperature to be when they should have been responding to what the temperature was going to be.

And I think this is a good analogy for the problem with dieting and why it can be so hard. This is why it can be easy to become frustrated and difficult to tell whether a diet is working. If you just step on the scale every day and look at that one number, you don’t see the overall picture, and it can be hard to tell whether you’re losing weight or not. And if you only see that one number, you’d never realize that though I can eat a pizza today and weigh the same tomorrow, it’s not until 3 days later that I have gained 2 pounds. It’s a problem of delayed feedback. And that’s one of the really interesting conclusions I came to as a result of performing this experiment.
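The thermostat study can be mimicked with a toy simulation in R. Everything here is an assumption made for illustration: the function names, the gain (how hard the participant reacts to the error they see), the starting and target temperatures, and the 3-step delay. It is not a model of the actual study or of body weight, only a demonstration that reacting to the current reading of a delayed system produces overshoot.

```r
# Toy model of delayed feedback (all numbers made up).
# Each step the "participant" nudges the setting toward the target based on
# the temperature they see now, but the room only reaches a setting `delay`
# steps after it was chosen.
simulate_thermostat <- function(delay, gain = 0.5, target = 20, start = 15,
                                steps = 40) {
  setting <- rep(start, steps)
  temp <- rep(start, steps)
  for (t in 1:(steps - 1)) {
    previous <- if (t > 1) setting[t - 1] else start
    # React to the error that is visible now, ignoring what is "in the pipe"
    setting[t] <- previous + gain * (target - temp[t])
    # The room catches up with the setting chosen `delay` steps ago
    chosen_at <- t - delay
    temp[t + 1] <- if (chosen_at >= 1) setting[chosen_at] else start
  }
  temp
}

# Largest distance from the target over the last `last` steps
tail_deviation <- function(temp, target = 20, last = 10) {
  max(abs(tail(temp, last) - target))
}

tail_deviation(simulate_thermostat(delay = 0))  # settles on the target
tail_deviation(simulate_thermostat(delay = 3))  # keeps overshooting
```

With no delay the simulated room quickly settles on the target; with a 3-step delay and the same gain the setting keeps overshooting and the swings actually grow. Reacting more gently (a sufficiently smaller gain) lets the delayed room settle too, which is the “respond to where it is going, not where it is” lesson in code form.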

So where does this leave us for the future?

Well, I think I did a pretty good job of measuring my weight almost every day and was able to draw some interesting conclusions from my simple ‘1-D’ analysis. However, though I did very well tracking the output, I did not track any of my inputs whatsoever. In the future, if I kept track of these as well (for instance by counting calories), I would have more data and be able to draw more meaningful conclusions about how my diet is affecting my weight.
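If calorie intake were logged alongside the weigh-ins, the two could be combined to estimate output, which I could not measure directly. A minimal sketch, assuming the common rule of thumb that 3,500 kcal is about one pound and a smoothed weight trend as input: over a window, average daily output is roughly average daily intake minus 3,500 times the slope of the trend in pounds per day. The function name estimate_daily_output and the example data are hypothetical.

```r
# Rule-of-thumb assumption: about 3,500 kcal per pound of body weight.
KCAL_PER_POUND <- 3500

# Estimate average daily energy output (kcal) over a window of days, from
# logged daily intake and a smoothed weight trend (pounds).
estimate_daily_output <- function(day, trend_lb, intake_kcal) {
  # Slope of a straight-line fit to the trend, in pounds per day
  slope_lb_per_day <- unname(coef(lm(trend_lb ~ day))[2])
  # IN - OUT = 3500 * (weight change per day), so OUT = IN - 3500 * slope
  mean(intake_kcal) - KCAL_PER_POUND * slope_lb_per_day
}

# Hypothetical four weeks: trend falling 0.1 lb/day on 2,100 kcal/day
days <- 1:28
estimate_daily_output(days, 170 - 0.1 * days, rep(2100, 28))
```

For that made-up input the estimate comes out at 2,450 kcal a day: the 0.1 lb/day loss implies a 350 kcal daily deficit on top of the 2,100 kcal eaten.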

Secondly, there was one other thing I did not do at all: I didn’t exercise. This is something Walker gets to later in his book too (like most diet/health books); however, I did not implement any kind of exercise routine or measurement thereof.

In the future, I think if I implement these two things, as well as continuing my consistent measurement of my weight, then perhaps I could ‘get all the way there’.

 

That was my presentation, thank you for listening. If you have any questions I will be happy to answer them.

References / Resources

Libra Weight Manager for Android
https://play.google.com/store/apps/details?id=net.cachapa.libra 

The Hacker’s Diet
http://www.fourmilab.ch/hackdiet/www/hackdiet.html 

Quantified Self Toronto
http://quantifiedself.ca/ 

My bookshelf

I’d like to start with something small and simple. The thing about analyzing the data of your own life is that you are the only one doing the research, so you also have to collect all of the data yourself. This takes effort and, if you’d like to build a data set large enough for some really interesting (and valid) analysis, time.

So I thought I’d start small. And simple. I asked myself: what is an easily available source of data in my life for some preliminary work? The answer was right next to me as I sat at my desk.

I am not a bibliophile by any stretch of the imagination, as I try to make good use of the public library when I can. I’d prefer to avoid spending copiously on books which will be read once and then collect dust. I have, over time however, amassed a small collection which is currently surpassing the capacity of my tiny IKEA bookcase.

I catalogued all the books in my collection and kept track of a few simple characteristics: number of pages, list price, publication year, binding, type (fiction, non-fiction or reference), subject, and whether or not I had read the book from cover-to-cover (“Completed”).

At the time of cataloguing I had a total of 60 books on my bookshelf. Summary of data:

> source("books.R")
[1] "Reading books.csv"
> summary(books)
     Pages             Binding        Year               Type              Subject
 Min.   :  63.0   Hardcover:21   Min.   :1921   Fiction    :15   Math          :12
 1st Qu.: 209.5   Softcover:39   1st Qu.:1995   Non-fiction:34   Communications: 7
 Median : 260.0                  Median :2002   Reference  :11   Humour        : 6
 Mean   : 386.1                  Mean   :1994                    Coffee Table  : 5
 3rd Qu.: 434.0                  3rd Qu.:2006                    Classics      : 4
 Max.   :1694.0                  Max.   :2011                    Sci-Fi        : 4
                                                                 (Other)       :22
     Price        Completed
 Min.   :  1.00   -:16
 1st Qu.: 16.45   N:13
 Median : 20.49   Y:31
 Mean   : 35.41
 3rd Qu.: 30.37
 Max.   :155.90

Some of this information is a bit easier to interpret in visual form:


Looking at the charts we can see that I’m not really into novels, and that almost one fifth of my library (11 of 60 books) is reference books – due mainly to textbooks from university I still have kicking around. Of the 44 books which are intended to be read cover-to-cover, I have not finished 13, or roughly 30% (“Not Applicable” refers to books like coffee-table and reference books which are not intended to be read in their entirety).
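The percentages behind those charts can be recomputed from the counts in the summary output. Since books.csv itself isn’t reproduced here, this sketch rebuilds the Type and Completed columns from those counts (with “-” marking books where completion doesn’t apply); the variable names are illustrative.

```r
# Rebuild two categorical columns from the counts shown by summary(books)
type <- factor(rep(c("Fiction", "Non-fiction", "Reference"), times = c(15, 34, 11)))
completed <- factor(rep(c("-", "N", "Y"), times = c(16, 13, 31)))

# Share of the whole library by type, in percent
round(100 * prop.table(table(type)), 1)

# Completion rate among books meant to be read cover-to-cover
# ("-" marks not-applicable books such as reference and coffee-table books)
intended <- droplevels(completed[completed != "-"])
round(100 * prop.table(table(intended)), 1)
```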

Breaking it down further, we look at the division by subject/topic:

Interestingly enough, the topics in my book collection are varied (apparently I am well-read?), with the largest chunks being made up by math (both pop-science and textbooks) and communications (professional development reading in the last year).

Let’s take a look at the relationship between the list price of books and other factors.

As expected, there does not appear to be any particular relationship between a book’s publication year and its list price. The outliers near the top of the price range are the textbooks, and those at the far left (the earliest publication dates) are Kafka.
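One way to put a number on “no particular relationship” is a rank correlation between year and price. This sketch uses Spearman’s rho, a choice made here (not in the original analysis) because the textbook prices are extreme outliers and ranks are less sensitive to them. The helper year_price_association is hypothetical; it expects a data frame with numeric Year and Price columns like books, and the example data are made up.

```r
# Rank correlation between publication year and list price
year_price_association <- function(df) {
  # exact = FALSE uses the asymptotic p-value, which also copes with ties
  test <- cor.test(df$Year, df$Price, method = "spearman", exact = FALSE)
  c(rho = unname(test$estimate), p_value = test$p.value)
}

# Tiny made-up example (not the real bookshelf data)
example_books <- data.frame(Year  = c(1925, 1990, 1998, 2003, 2006, 2010),
                            Price = c(12.00, 150.00, 18.50, 22.00, 9.99, 30.00))
year_price_association(example_books)
```

A rho near zero with a large p-value would back up the visual impression; on a 60-book shelf, though, only a fairly strong association would show up clearly.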

A more likely relationship would be that between a book’s length and its price, as larger books are typically more costly. Having a look at the data for all the books it appears this could be the case:

We can coarsely fit a trendline to the data:
> price <- books$Price
> pages <- books$Pages
> page_price_line <- lm(price ~ pages)
> summary(page_price_line)

Call:
lm(formula = price ~ pages)

Residuals:
    Min      1Q  Median      3Q     Max
-56.620 -13.948  -6.641  -1.508 109.802

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  9.95801    6.49793   1.532    0.131   
pages        0.06592    0.01294   5.096 3.97e-06 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 32.19 on 58 degrees of freedom
Multiple R-squared: 0.3092,    Adjusted R-squared: 0.2973
F-statistic: 25.96 on 1 and 58 DF,  p-value: 3.971e-06
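To read the fitted line in practical terms, the reported coefficients can be plugged back in: price is roughly 9.958 plus 0.0659 per page. A minimal sketch using the numbers copied from the output above (predicted_price is a hypothetical helper):

```r
# Coefficients copied from summary(page_price_line) above
intercept <- 9.95801
per_page  <- 0.06592

# Fitted list price for a given page count
predicted_price <- function(pages) intercept + per_page * pages

predicted_price(c(200, 300, 1000))
```

With the fitted model still in the session, predict(page_price_line, newdata = data.frame(pages = c(200, 300, 1000))) should give the same values, since the model was fitted on a variable named pages. Given the residual standard error of about 32, these are rough central values rather than predictions for any particular book.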
  
 

Our p-value is very small; however, our goodness of fit is not great (an R-squared of about 0.31, so page count accounts for just under a third of the variation in price). There appears to be some sort of clustering going on here, as the larger values (both in price and pages) are more dispersed. We re-examine the plot and divide by binding type:

The softcovers make up the majority of the tightly clustered values, and the values for the hardcovers seem to be more spread out. The dashed line is the linear fit for the hardcovers and the solid line for the softcovers. However, the small number of hardcovers (n=21) and the dispersion of their points make even doing this questionable. That point aside, we can see on the whole that hardcovers appear to be more expensive, as one would expect. This is illustrated in the box plot below:
 

However, there are a lot of outlying points on the plot. Looking at the scatterplot again, we divide by book type and the picture becomes clearer:

It is clear the reference books make up the majority of the extreme values away from those clustered in the lower regions of the plot and thus could be treated separately.
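Fitting a separate line within each group (each binding type, or reference versus everything else) can be done with split() and lm(). A minimal sketch; fit_by_group is a hypothetical helper that expects Price and Pages columns as in books, and the demo data are made up so that the expected lines are known exactly.

```r
# Fit price ~ pages separately within each level of a grouping column.
# Returns one row per group with the intercept and per-page slope.
fit_by_group <- function(df, group) {
  pieces <- split(df, df[[group]])
  t(sapply(pieces, function(d) coef(lm(Price ~ Pages, data = d))))
}

# Made-up demo: hardcovers follow 20 + 0.1 * pages, softcovers 5 + 0.03 * pages
demo <- data.frame(Pages   = c(100, 200, 300, 100, 200, 300),
                   Price   = c(30, 40, 50, 8, 11, 14),
                   Binding = rep(c("Hardcover", "Softcover"), each = 3))
fit_by_group(demo, "Binding")

# On the real data (not run here):
# fit_by_group(books, "Binding")
# fit_by_group(transform(books, Reference = Type == "Reference"), "Reference")
```

With only a handful of points in some groups (21 hardcovers, 11 reference books), the per-group slopes will be noisy, so they are better read as a description of this shelf than as general estimates.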

Closing notes:

  • I did not realize how many non-fiction / general interest / popular reading books have subtitles (e.g. Zero – The Biography of A Dangerous Idea) until cataloguing the ones I own. I suppose this is to make them seem more interesting, in the hope that people browsing in bookstores will read the blurb on the back and be enticed to purchase the book.
  • Page numbering appears to be completely arbitrary. When I could I used the last page of each book which had a page number listed. Some books have the last page in the book numbered, others have the last full page of text numbered, and still others the last written page before supplementary material at the back (index, appendix, etc.) numbered. The first numbered page also varies, accounting for things like the table of contents, introduction, prologue, copyright notices and the like.
  • Textbooks are expensive. Unreasonably so.
  • Amazon has metadata for each book which you can see under “Details” when you view it (I had to look up some things like price when it was not listed on the book. In these cases, I used Amazon’s “list price”, the crossed out value at the top of the page for a book). I imagine there is an enormous trove of data which would lend itself to much more interesting and detailed analysis than I could perform here.