The human mind is limited.
We can only process so much information at one time. Numerals are text that communicates quantity. However, unlike other text, it’s a lot harder to read a whole bunch of numbers and get a high-level understanding of what is being communicated. There are sentences of numbers and quantities (these are called equations, but not everyone is as literate in them); however, simply looking at a pile of data and understanding the ‘big picture’ is not something most people can do. This is especially true once the amount of information grows beyond a table with a few categories and values.
If you’re a market research, business, data, financial, or (insert other prefix here) analyst, part of your job is taking a lot of information and making sense of that information, so that other people don’t have to. Let’s face it – your Senior Manager or The VP doesn’t have time to wade through all the data – that’s why they hired you.
Ever since Descartes’ epiphany (and even before that) people have been realizing that there are other, more effective ways to communicate information than having to look at all the details. You can communicate the shape of the data without knowing exactly how many Twitter followers were gained each day. You can see what the data look like without having to know the exact dollar value for sales each and every day. You can feel what the data are like, and get an intuitive understanding of what’s going on, without having to look at all the raw information.
Enter data visualization.
Like any practice, data visualization (depicting quantitative relationships visually) can be done poorly or can be done well. I’m sure you’ve seen examples of the former, whether in a presentation or other report, or perhaps floating around the Internet. And the latter, like so many good things, is not always so plentiful, nor appreciated. Here I present some finer points of choosing between data visualizations, in the hope that you will always find yourself depicting data well.
Pie (and Doughnut) Chart
Though again, as the number of quantities being compared increases, the readability and visual utility generally decrease, and you are better served by a bar chart in these cases. There is also the issue that the area of each annular segment will be different for the same angle, depending upon which ring it is in.
With circular charts, it is best to avoid legends, as they cause the eye to flit back and forth between the segments and the legend. However, when abiding by this practice for doughnut charts, labeling becomes a problem, as you can see above.
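To put a number on the ring-area issue, here is a minimal sketch in Python. The function name and the radii are made up for illustration; the only thing it relies on is the standard formula for the area of a ring segment, (θ/2)(R² − r²).

```python
import math

def annular_sector_area(inner_r, outer_r, angle_deg):
    """Area of a ring segment spanning angle_deg between two radii."""
    theta = math.radians(angle_deg)
    return 0.5 * theta * (outer_r ** 2 - inner_r ** 2)

# Two segments with the same 90-degree angle, in adjacent rings of equal width.
# The radii (1, 2, 3) are arbitrary illustrative values.
inner_ring = annular_sector_area(1.0, 2.0, 90)
outer_ring = annular_sector_area(2.0, 3.0, 90)

# How much more ink the outer segment gets for the same share of its ring.
ratio = outer_ring / inner_ring
print(f"inner: {inner_ring:.3f}, outer: {outer_ring:.3f}, ratio: {ratio:.3f}")
```

With these radii the outer segment covers 5/3 the area of the inner one, even though both represent the same proportion of their ring.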
Isn’t that much better?
Q. When is it a good idea to use a 3-D pie chart?
A. Never. Only as an example of bad data visualization!
Stephen Few contends that this still makes it difficult to compare proportions, similar to the problem with pie charts, and has other suggestions [PDF], though I think it is fine on some occasions, depending on the nature of the data being depicted.
Scatterplot (and Bubble Graphs)
When used to depict relationships occurring over time, we instead use a special type of scatterplot known as a line graph (next section).
A bubble chart is a type of scatterplot used to compare relationships between three variables, where the points are sized by area according to a third value. Care should be taken to ensure that the points are sized correctly in this type of chart, so as not to misrepresent the relative proportions of the quantities.
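The usual way to get this wrong is to scale the radius (or diameter) of each bubble by the value, which inflates area by the square. A minimal sketch of sizing by area instead is below; the function name, the `max_radius` default, and the sample values are all assumptions for illustration, not part of any particular charting tool.

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius that makes bubble AREA proportional to value.

    max_radius is the radius given to the largest value (hypothetical units).
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)

# Illustrative third-variable values for three points.
values = [10, 40, 90]
radii = [bubble_radius(v, max(values)) for v in values]
areas = [math.pi * r ** 2 for r in radii]

for v, r, a in zip(values, radii, areas):
    print(f"value={v:>3}  radius={r:6.2f}  area={a:8.2f}")
```

Here a value four times larger gets a bubble with twice the radius and four times the area. Some plotting libraries already take a size in area units (matplotlib's `scatter` documents its `s` argument as marker size in points squared), so it is worth checking which convention your tool uses before applying a square root yourself.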
Relationships between four variables may also be visualized by colouring each point according to the value of a fourth variable, though this may be a lot of information to depict all at once, depending upon the nature of the data. When animated to include a fifth variable (usually time), it is known as a motion chart, perhaps most famously demonstrated in Hans Rosling’s landmark TED Talk, which has become something of a legend.
For example, it makes sense to compare sales over time with a line graph, as time is a numerical quantity that varies continuously:
However, it would not make sense to use a line graph to compare sales across departments, as that is categorical / nominal. Note that there is one exception to this rule, and that is the aforementioned Pareto chart.
Omitting the points on the line graph and using a smooth curve instead of line segments creates an impression of more data being plotted, and hence of greater continuity. Compare the plot above with the one below:
So, practically speaking, save the smooth line graphs for when you have a lot of data and the points would just be visual clutter; otherwise, it’s best to overplot the points to be clear about what quantities are being communicated.
Also note that unlike a bar chart, it is acceptable to have a non-zero starting point for the y-axis of a line graph as the change in values is being depicted, not their absolute values.