I have a startling confession to make: sometimes I just want things to be simple. Sometimes I just want things to be easier. All this talk about "Big Data", predictive modeling, machine learning, and all those associated bits and pieces that go along with data science can be a bit mentally exhausting. I want to take a step back and work with a smaller dataset, something simple, something easy, something everyone can relate to - after all, that's what this blog started out being about.
A while back, someone posted on Slashdot that the folks over at Finder.com had put together data sets of the install size of every PS4 and Xbox One game released to date. Being a console owner myself (I'm a PS4 guy, but no fanboy or hardcore gamer by any means), I thought this would be a fun and rather unique data set to play around with, one that fits well within the category of 'everyday analytics'. So let's take a look, shall we?
A small aside here on data visualization: from a functional perspective, the chart above is a good way to build a bar chart. Since there are data labels and the y-axis metric is in the title, we can ditch the axis and maximize the data-ink ratio (well, data-pixel ratio anyhow). I've also avoided a stacked bar chart, because absolute values are harder to read when the bars don't share a common baseline. I'm okay with stacking for relative proportions though, as in the chart below, which further illustrates the difference in release-type proportions between the two consoles:
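The proportions in a 100% stacked bar like this are just per-console shares of each release type. Here is a minimal Python sketch of that calculation, assuming each game is a simple `(console, release_type)` pair; the rows below are made-up placeholders, not values from the Finder.com data set, and `release_type_shares` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical (console, release_type) rows: made-up placeholders,
# not values from the Finder.com data set.
games = [
    ("PS4", "Major"), ("PS4", "Indie"), ("PS4", "Indie"), ("PS4", "Indie"),
    ("Xbox", "Major"), ("Xbox", "Major"), ("Xbox", "Indie"),
]

def release_type_shares(rows):
    """Return {console: {release_type: share}}; shares sum to 1 per console."""
    per_console = {}
    for console, rtype in rows:
        per_console.setdefault(console, Counter())[rtype] += 1
    shares = {}
    for console, counts in per_console.items():
        total = sum(counts.values())
        shares[console] = {rtype: n / total for rtype, n in counts.items()}
    return shares

shares = release_type_shares(games)
print(shares["PS4"])  # {'Major': 0.25, 'Indie': 0.75}
```

Because each console's shares sum to 1, every stacked bar has the same total height, which is what makes the relative comparison fair.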
Okay, that's interesting. But if you're like me, you'll be thinking about how 99% of the phenomena in the universe follow a power law or some other non-Gaussian distribution, and so averages are not always a great way to summarize data. Is this the case for our install size data set?
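To see why the average can mislead for heavy-tailed data, here is a small Python sketch that draws simulated "install sizes" from a Pareto (power-law) distribution using the standard library's `random.paretovariate`. The shape parameter of 1.5 and the sample size are arbitrary assumptions chosen for illustration; these are simulated numbers, not the real data.

```python
import random
import statistics

# Illustration only: simulated "install sizes" (GiB) from a heavy-tailed
# Pareto distribution. The shape 1.5 is an arbitrary assumption.
random.seed(42)
sizes = [random.paretovariate(1.5) for _ in range(10_000)]

mean_size = statistics.mean(sizes)
median_size = statistics.median(sizes)

# In a right-skewed sample, a few very large values pull the mean well
# above the median, so the mean overstates the size of a "typical" game.
print(f"mean={mean_size:.2f} GiB, median={median_size:.2f} GiB")
```

With a distribution like this, the median is usually the better one-number summary of a "typical" value.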
But is this entirely due to the indie games having small sizes? Might the major releases be centered around some average or median size?
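One way to check is to summarize each release type separately. The sketch below groups hypothetical `(title, release_type, size_gib)` rows and reports the mean and median per group; the titles and sizes are made up, and `summarize_by_type` is a hypothetical helper, not part of the original analysis.

```python
import statistics
from collections import defaultdict

# Hypothetical (title, release_type, size_gib) rows: made-up numbers
# standing in for the real data set.
rows = [
    ("Game A", "Major", 45.0), ("Game B", "Major", 38.5), ("Game C", "Major", 52.0),
    ("Game D", "Indie", 1.2), ("Game E", "Indie", 0.8), ("Game F", "Indie", 3.5),
]

def summarize_by_type(records):
    """Return {release_type: (mean, median)} of install size per group."""
    groups = defaultdict(list)
    for _title, rtype, size in records:
        groups[rtype].append(size)
    return {
        rtype: (statistics.mean(values), statistics.median(values))
        for rtype, values in groups.items()
    }

for rtype, (mean, median) in summarize_by_type(rows).items():
    print(f"{rtype}: mean={mean:.1f} GiB, median={median:.1f} GiB")
```

If the mean and median of the major releases sit close together while the overall pooled figures diverge, the skew is coming mostly from mixing the two groups.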
Finally, we can look at the distribution of the install sizes with another type of visualization suited to the task, the boxplot. While it is at least possible to jury-rig a boxplot in Excel (see this excellent how-to over at Peltier Tech), Google Sheets doesn't give us as much to work with, but I did my best (the data label is at the maximum point, and its value is the difference between the maximum and Q3):
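A spreadsheet boxplot is built from the five-number summary, with the upper whisker drawn as the segment from Q3 to the maximum. The Python sketch below computes those values, including the max minus Q3 length used as the data label. It assumes Python's `statistics.quantiles` with `method="inclusive"` (minimum and maximum treated as the 0th and 100th percentiles); spreadsheet quartile functions may use a different convention, so results can differ slightly. The sample values are made up.

```python
import statistics

def box_summary(values):
    """Five-number summary plus the upper-whisker length (max - Q3)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    lo, hi = min(values), max(values)
    return {
        "min": lo, "q1": q1, "median": median, "q3": q3, "max": hi,
        "max_minus_q3": hi - q3,  # the value shown as the data label
    }

# Made-up sizes in GiB, purely for illustration.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(box_summary(sample))
```

For this sample, Q1, the median and Q3 come out at 3.0, 5.0 and 7.0, so the labelled whisker length is 2.0.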
Okay, that's all very interesting, but what about games that are available for both consoles? Are the install sizes generally the same or do they differ?
Difference in Install Size by Console Type
Because we've seen that Xbox install sizes are generally larger than PlayStation ones, here I take the PS4 size as the baseline for games that appear on both (that is, differences are of the form Xbox size - PS4 size).
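Computing these differences amounts to joining the two platform lists on title and subtracting. Here is a minimal Python sketch, assuming each platform's data is a `{title: size_gib}` dictionary; the titles and sizes are made up, and `size_differences` is a hypothetical helper.

```python
# Hypothetical per-platform lookups {title: size_gib}; made-up values.
ps4 = {"Game A": 40.0, "Game B": 2.0, "Game C": 10.0}
xbox = {"Game A": 44.0, "Game C": 9.0, "Game D": 5.0}

def size_differences(ps4_sizes, xbox_sizes):
    """Xbox size minus PS4 size for every title present on both platforms."""
    shared = ps4_sizes.keys() & xbox_sizes.keys()  # inner join on title
    return {title: xbox_sizes[title] - ps4_sizes[title] for title in sorted(shared)}

diffs = size_differences(ps4, xbox)
print(diffs)  # {'Game A': 4.0, 'Game C': -1.0}
```

A positive value means the Xbox version is bigger; a negative one means the PS4 version is. In practice, title spellings may need normalizing before the join.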
Of the 618 unique titles in the whole data set (798 if you double count titles released on both platforms), 179 (~29%) were available on both, so fewer than a third of games are released for both major consoles.
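The relationship between the unique and double-counted totals comes down to set union and intersection. The sketch below illustrates it with made-up title sets, then reproduces the ~29% share from the counts quoted above.

```python
# Hypothetical title sets: made-up placeholders, not the real catalogue.
ps4_titles = {"Game A", "Game B", "Game C", "Game D"}
xbox_titles = {"Game C", "Game D", "Game E"}

double_counted = len(ps4_titles) + len(xbox_titles)  # a shared title counts twice
unique = len(ps4_titles | xbox_titles)               # set union counts it once
on_both = len(ps4_titles & xbox_titles)              # set intersection

# Each shared title is counted twice in double_counted but once in unique.
assert on_both == double_counted - unique

# Share of unique titles released on both platforms, using the post's counts.
share = 179 / 618
print(f"{share:.1%}")  # 29.0%
```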
Let's take a look at the difference in install sizes - do those games which appear for both reflect what we saw earlier?
Okay, but how much larger? Are we talking twice as large? Five times larger? Because game sizes vary widely (especially between release types), I opted for percentages here:
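With the PS4 size as the baseline, the percentage difference is (Xbox - PS4) / PS4 × 100. So a game twice as large on Xbox shows up as +100%, and one half the size as -50%. A small Python sketch follows; `pct_difference` is a hypothetical helper name.

```python
def pct_difference(xbox_gib, ps4_gib):
    """Percentage difference relative to the PS4 baseline: (Xbox - PS4) / PS4 * 100."""
    if ps4_gib <= 0:
        raise ValueError("PS4 size must be positive to serve as the baseline")
    return (xbox_gib - ps4_gib) / ps4_gib * 100

print(pct_difference(30.0, 10.0))  # 200.0 (three times as large)
print(pct_difference(5.0, 10.0))   # -50.0 (half the size)
```

Because the measure is relative, a 1 GiB indie and a 50 GiB major release can sit on the same scale. The trade-off is that small baselines can produce very large percentages.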
To ground this a bit more, I thought I'd look at the top 10 games in each release type with the largest absolute differences. As before, the difference is Xbox size minus PS4 size:
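Selecting the top N per group by absolute difference, while keeping the sign so you can still tell which platform is bigger, could look like the following Python sketch. It uses `heapq.nlargest` with an `abs` key. The rows are made up, and `top_abs_differences` is a hypothetical helper.

```python
import heapq
from collections import defaultdict

# Hypothetical (title, release_type, xbox_gib, ps4_gib) rows; made-up values.
rows = [
    ("Game A", "Major", 50.0, 42.0),
    ("Game B", "Major", 30.0, 31.0),
    ("Game C", "Major", 20.0, 35.0),
    ("Game D", "Indie", 2.0, 1.5),
    ("Game E", "Indie", 1.0, 3.0),
]

def top_abs_differences(records, n=10):
    """Per release type, the n titles with the largest |Xbox - PS4| (sign kept)."""
    groups = defaultdict(list)
    for title, rtype, xbox, ps4 in records:
        groups[rtype].append((title, xbox - ps4))
    return {
        rtype: heapq.nlargest(n, items, key=lambda item: abs(item[1]))
        for rtype, items in groups.items()
    }

for rtype, items in top_abs_differences(rows, n=2).items():
    print(rtype, items)
```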
For indies, the absolute differences are a lot smaller for games that are bigger on PS4, with Octodad having the largest at ~1.4 GiB (56% of its PS4 size). In the other direction, Warframe is 19.6 GiB bigger on Xbox than on PS4, or 503% larger (!!)
Finally, I've visualized all the data together so you can explore it yourself. Below is a bubble chart of the Xbox install size plotted against the PS4 install size, coloured by release type, where the size of each point represents the absolute value of the percentage difference between the platforms (with the PS4 size taken as the baseline). Points above the diagonal are larger on Xbox than PS4, and points below it are larger on PS4 than Xbox. Note also that both axes are on a log scale. You can see that most of the major releases are pretty close in size on both platforms, as they lie near the y = x line.
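The data behind each bubble can be derived from the two sizes alone. The Python sketch below computes log10 coordinates (PS4 on x, Xbox on y), the bubble size as |% difference|, and which side of the diagonal the point falls on. Taking logs doesn't change which side of y = x a point lands on, since log is monotonic. The field names and the `bubble_point` helper are hypothetical, and the actual plotting step is left to whatever charting tool you use.

```python
import math

def bubble_point(title, xbox_gib, ps4_gib):
    """Coordinates and styling for one point of a log-log Xbox-vs-PS4 bubble chart."""
    pct = (xbox_gib - ps4_gib) / ps4_gib * 100  # PS4 is the baseline
    if xbox_gib > ps4_gib:
        side = "Xbox larger"  # above the y = x diagonal
    elif xbox_gib < ps4_gib:
        side = "PS4 larger"   # below the diagonal
    else:
        side = "equal"        # exactly on the diagonal
    return {
        "title": title,
        "x": math.log10(ps4_gib),   # PS4 size on the horizontal axis
        "y": math.log10(xbox_gib),  # Xbox size on the vertical axis
        "size": abs(pct),           # bubble size tracks |% difference|
        "side": side,
    }

print(bubble_point("Game A", 100.0, 10.0))  # made-up sizes
```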
- Xbox games generally tend to have larger install sizes than PS4 ones, even for the same title
- Game install sizes follow a power law, just like almost everything else in the universe (or maybe just 80% of it)
- What the heck a GiB is
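For the record, a GiB (gibibyte) uses a binary prefix and is 2^30 = 1,073,741,824 bytes. A GB (gigabyte) uses the decimal SI prefix and is 10^9 bytes, so a size quoted in GiB is about 7.4% larger when expressed in GB. A one-line conversion in Python:

```python
GIB = 2 ** 30  # gibibyte (binary prefix): 1,073,741,824 bytes
GB = 10 ** 9   # gigabyte (decimal SI prefix): 1,000,000,000 bytes

def gib_to_gb(gib):
    """Convert gibibytes to decimal gigabytes."""
    return gib * GIB / GB

print(gib_to_gb(1))  # 1.073741824
```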