I’ve been researching less common data visualization techniques recently, and was reading up on slopegraphs.
Andy Kirk wrote a piece praising slopegraphs last December, which walks through the construction of a slopegraph with example data very nicely. However, I’ve seen some poor examples of data visualization on the web that use them, and thought I’d put in my two cents.
Pros and Cons
It should also be noted that in this case there is more than one value of the independent variable. As long as the vertical scale is still consistent, the changes in quantity can still be compared by the slopes of the lines, even if the exact values cannot, since vertical position no longer corresponds directly to quantity.
Either way, this type of slopegraph is closer to a group of sparklines (as Tufte originally noted): it allows comparison of the changes in the dependent variable across values of the independent variable for each value of the categorical variable, but not comparison of the exact quantities.
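To make the point about consistent vertical scales concrete, here is a minimal Python sketch. The function name `slope` and the numbers are hypothetical, chosen only for illustration. Shifting both endpoints of a line by the same offset (for instance, moving lines apart vertically so they do not overlap) leaves its slope unchanged, whereas stretching the vertical scale changes it.

```python
def slope(y_start: float, y_end: float, dx: float = 1.0) -> float:
    """Slope of a line segment drawn between two columns dx apart."""
    return (y_end - y_start) / dx

# Hypothetical values for one category in two adjacent columns.
first, second = 10.0, 15.0

# Shifting both endpoints by the same amount does not change the slope,
# so changes remain comparable even though position no longer equals quantity.
offset = 100.0
assert slope(first + offset, second + offset) == slope(first, second)

# Rescaling the vertical axis does change the slope, which is why the
# scale must be the same for every line.
print(slope(first, second))          # 5.0
print(slope(first * 2, second * 2))  # 10.0
```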
Was the disproportionately large payroll of the Yankees as obvious in the previous visualization? Maybe, but not as saliently. The relative size of the payroll was encoded in the thickness of the line, but quantities encoded as area or thickness are not read as quickly or accurately as quantities encoded as position. Also, because the previous data were ranked (vertical position did not portray quantity), the much smaller number of wins by Kansas relative to the other teams was not as apparent as it is here.
Fry notes that he chose not to use a scatterplot because he wanted rankings for both quantities, which I suppose is the advantage of the original treatment, and something the alternative I’ve presented does not show. Park also correctly notes, in the examples in his post, that different visualizations draw the eye to different features of the data, and that some people find a visualization like a bubble chart harder to interpret than a slopegraph. Still, I remain a skeptical functionalist as far as visualization is concerned, and prefer the treatment above to the original.
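The cost of ranking can be shown with a small Python sketch. The team names and win totals below are hypothetical, and `to_ranks` is an illustrative helper that ignores ties. Once values are converted to ranks, every neighbouring gap becomes one position, so a team that lags far behind looks no different from one that trails by a hair.

```python
def to_ranks(values: dict[str, float]) -> dict[str, int]:
    """Map each category to its rank (1 = largest); ties are not handled specially."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

# Hypothetical win totals: one team lags far behind the others.
wins = {"Team A": 95, "Team B": 93, "Team C": 60}
ranks = to_ranks(wins)
print(ranks)  # {'Team A': 1, 'Team B': 2, 'Team C': 3}

# On a ranked axis every neighbouring gap is one position...
rank_gaps = [ranks["Team B"] - ranks["Team A"], ranks["Team C"] - ranks["Team B"]]
# ...while on a quantitative axis the gaps reflect the actual differences.
value_gaps = [wins["Team A"] - wins["Team B"], wins["Team B"] - wins["Team C"]]
print(rank_gaps, value_gaps)  # [1, 1] [2, 33]
```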
What we are really interested in is the change in quantity between the two values of the independent variable (year). So we can instead look at that quantity (the change between the two years) directly, and visualize it as a bar graph with a baseline of zero. Here the bars are again coloured by whether the change is positive or negative.
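As a rough sketch of the data behind such a bar graph, the Python below computes one zero-baseline bar per category and colours it by the sign of the change. The `ChangeBar` class, the `change_bars` function and the points totals are hypothetical; treating a change of exactly zero as blue is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class ChangeBar:
    label: str
    change: float  # second-year value minus first-year value (bar drawn from 0)
    colour: str    # "blue" for an increase, "orange" for a decrease

def change_bars(year1: dict[str, float], year2: dict[str, float]) -> list[ChangeBar]:
    """One zero-baseline bar per category, coloured by the sign of the change."""
    bars = []
    for label, first in year1.items():
        change = year2[label] - first
        # Assumption: a change of exactly zero is grouped with the increases.
        colour = "blue" if change >= 0 else "orange"
        bars.append(ChangeBar(label, change, colour))
    return bars

# Hypothetical points totals for two years.
for bar in change_bars({"Team A": 40, "Team B": 20}, {"Team A": 30, "Team B": 30}):
    print(bar)
```

Bars keep the input order here; sorting them by `change` is an optional design choice that often makes the largest gains and losses easier to spot.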
This bar graph is fine; however, we’ve lost the information encoded in the thickness of the lines. We can encode it using the lightness (intensity) of the bars: dark for changes greater than 25%, light for the others.
Hmm, not bad. However, we’ve still lost the absolute number of points in each year, so let’s use those values along the horizontal axis instead.
Okay, now the length of each bar corresponds to the magnitude of the change in points between the two years, with positive changes coloured blue and negative changes orange, and the shading indicating whether the change was greater or less than 25%.
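The shading rule can be sketched in a few lines of Python. The function name `shade` is hypothetical, and the post does not say how the 25% is measured, so this sketch assumes it is the absolute change relative to the first-year value.

```python
def shade(first: float, second: float, threshold: float = 0.25) -> str:
    """Return "dark" if the change exceeds `threshold` (25% by default), else "light".

    Assumption: the percentage change is measured relative to the first-year
    value and compared in absolute terms, so `first` must be non-zero.
    """
    if first == 0:
        raise ValueError("percentage change is undefined when the first value is 0")
    relative_change = abs(second - first) / abs(first)
    return "dark" if relative_change > threshold else "light"

print(shade(40, 30))  # light (a 25% drop is not above the threshold)
print(shade(20, 30))  # dark (a 50% rise)
```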
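A minimal Python sketch of these floating bars follows. The `FloatingBar` class, the `floating_bar` function and the example values are hypothetical, and the 25% threshold is again assumed to be relative to the first-year value. Each bar starts at the smaller of the two yearly values and is as long as the change.

```python
from typing import NamedTuple

class FloatingBar(NamedTuple):
    label: str
    left: float   # where the bar starts: the smaller of the two yearly values
    width: float  # bar length: the magnitude of the change in points
    colour: str   # "blue" if points rose, "orange" if they fell
    shade: str    # "dark" if the change exceeds the threshold, else "light"

def floating_bar(label: str, first: float, second: float,
                 threshold: float = 0.25) -> FloatingBar:
    """Bar spanning the two yearly values instead of starting at zero."""
    change = second - first
    colour = "blue" if change >= 0 else "orange"
    # Assumption: the 25% threshold is relative to the first-year value.
    shade = "dark" if abs(change) / abs(first) > threshold else "light"
    return FloatingBar(label, min(first, second), abs(change), colour, shade)

print(floating_bar("Team A", 40, 30))  # orange: the first-year value is the right end
print(floating_bar("Team B", 20, 30))  # blue: the first-year value is the left end
```

If the bars were drawn with matplotlib, `left` and `width` map directly onto the `left` and `width` arguments of `Axes.barh`. Note that for a decrease the first-year value sits at the right-hand end of the bar, which is the interpretation problem discussed next.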
However, even if I added a legend explaining what the colours correspond to, people commonly read things as progressing from left to right (at least in Western cultures). The graph is difficult to interpret because for the orange bars the first year’s score is on the right, whereas for the blue bars it’s on the left. That is to say, we have the absolute values, but the direction of the change is not depicted well. Changing the bars to arrows solves this, as below:
Now we have the absolute number of points in each year for each team, and the direction of the change is conveyed better than by colour alone. Adding gridlines lets the viewer read off the individual values more easily. Lastly, we encode the other categorical variable of interest (change greater or less than 25%) as the thickness of the line.
Like so. After creating the above independently, I discovered that visualization consultant Naomi Robbins had already written about this type of chart on Forbes, as an alternative to using multiple pie charts. Jon Peltier also has an excellent in-depth description of how to make these types of charts in Excel, and shows another alternative to slopegraphs: the dot plot.
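The arrow chart described above can be sketched in Python as one arrow per team running from the first-year value to the second-year value, with line thickness encoding the 25% category. The `Arrow` class, the `arrows` function, the thickness values and the example data are hypothetical, and the threshold is assumed to be relative to the first-year value.

```python
from typing import NamedTuple

class Arrow(NamedTuple):
    label: str
    tail: float       # first-year value: where the arrow starts
    head: float       # second-year value: where the arrow points
    linewidth: float  # thicker when the change exceeds the threshold

def arrows(rows, threshold=0.25, thin=1.0, thick=3.0):
    """Build one arrow per (label, first_year, second_year) row.

    Direction is carried by tail -> head, so a left-pointing arrow is a
    decrease regardless of colour. Assumption: the threshold is relative
    to the first-year value.
    """
    result = []
    for label, first, second in rows:
        big_change = abs(second - first) / abs(first) > threshold
        result.append(Arrow(label, first, second, thick if big_change else thin))
    return result

for arrow in arrows([("Team A", 40, 30), ("Team B", 20, 30)]):
    direction = "right (increase)" if arrow.head > arrow.tail else "left (decrease or none)"
    print(arrow.label, arrow.tail, "->", arrow.head, direction, arrow.linewidth)
```

With matplotlib, each arrow could be drawn with `ax.annotate("", xy=(arrow.head, y), xytext=(arrow.tail, y), arrowprops=dict(arrowstyle="->", lw=arrow.linewidth))`, where `y` is the team's row position, and vertical gridlines added with `ax.grid(axis="x")`.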
Of course, adding the usual fixings for a graph, such as a legend, title and proper axis labels, would complete the above, which brings me to my last point. Though I think it’s a good alternative to slopegraphs, it cannot compete in simplicity with Dr. Tufte’s example of a slopegraph, which had zero non-data ink. And, of course, this type of graph will not work when there are more than two values of the independent variable to compare across.
That being said, it is (as always) very important when making choices regarding data visualization to consider the pros and cons of different visualization types, the properties of the data you are trying to communicate, and, of course, the target audience.
References & Resources
Salary vs. Performance of MLB Teams by Ben Fry
Salary vs. Performance scatterplot (Tableau Public)