Training an RNN on the Archer Scripts

Introduction

So all the hype these days is around “AI”, as opposed to “machine learning” (though I’ve yet to hear an exact distinction between the two), and one of the tools that seems to get talked about most is Google’s TensorFlow.
I wanted to play around a little with TensorFlow and RNNs, since they’re not the type of machine learning I’m most familiar with, and see what kind of output I could come up with for a low investment of time.

Background

A little digging turned up this tutorial, which is a pretty good brief intro to RNNs; it uses Keras and works character-wise.
This in turn led me to word-rnn-tensorflow, which, expanding on the work of others, uses a word-based model (instead of a character-based one).
I wasn’t about to spend my whole weekend rebuilding RNNs from scratch (no sense reinventing the wheel), so I just thought it’d be interesting to play around a little with this one, and perhaps give it a more interesting dataset. Shakespeare is ok, but why not something a little more culturally relevant… like I dunno, say the scripts from a certain cartoon featuring a dysfunctional foul-mouthed spy agency?
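
To make the character-wise versus word-based distinction concrete, here is a minimal Python sketch of the two ways of cutting up the same text before it is fed to a model. It is not taken from the tutorial or from word-rnn-tensorflow; the function names and the sample sentence are made up for illustration.

```python
import re


def char_tokens(text):
    """Character-level tokens: every character, spaces included."""
    return list(text)


def word_tokens(text):
    """Word-level tokens: runs of letters/apostrophes, plus single punctuation marks."""
    return re.findall(r"[A-Za-z']+|[^\sA-Za-z']", text)


# Hypothetical sample text, not a line from any script.
sample = "The spy agency is a mess. The spy is late."

chars = char_tokens(sample)
words = word_tokens(sample)

# A character model sees a long sequence over a tiny vocabulary;
# a word model sees a short sequence over a much larger vocabulary.
print(len(chars), "character tokens,", len(set(chars)), "distinct")
print(len(words), "word tokens,", len(set(words)), "distinct")
```

On a full set of scripts the trade-off is the same, only bigger: the character vocabulary stays at a few dozen symbols, while the word vocabulary grows with every new name and bit of slang.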


Top 10 Super Bowl XLVII Commercials in Social TV (Respin)

So the Super Bowl is kind of a big deal.

Not just because there’s a lot of football. And not just because it’s a great excuse to get together with friends and drink a whole lot of beer and eat unhealthy foods. And not because it’s a good excuse to shout at your new 72″ flatscreen with home theater surround that you bought at Best Buy just for your Super Bowl party and are going to try to return the next day even though you’re pretty sure now that they don’t let you do that any more.

The Super Bowl is a big deal for marketers. For creatives. For ‘social media gurus’. Because there’s a lot of eyeballs watching those commercials. In fact, I’m pretty sure there’s people going to Super Bowl parties who don’t even like football and are just there for the commercials, that is if they’ve not decided to catch all the best ones after the fact on YouTube.

And also, you know, because if you’re putting down $6 million for a minute of commercial airtime, you want to make sure that those dollars are well spent.

So Bluefin Labs is generating a lot of buzz lately as they were acquired by Twitter. TV is big, social media is big, so Social TV analytics must be even bigger, right? Right?

Anyhow, Bluefin showed up recently in my Twitter feed for a different reason: their report for AdAge on the Top 10 Super Bowl XLVII commercials in Social TV.

The report’s pretty and all, but a little too pretty for my liking, so I thought I’d respin some of it.

Breakdown by Gender:

Super Bowl XLVII Commercial Social Mentions by Gender

You can see that the male / female split is fairly even overall, with the exception of the NFL Network’s ad and, to a lesser extent, the ad for Fast & Furious 6, both of which were mentioned proportionally more by males. The Budweiser, Calvin Klein and Taco Bell spots had greater percentages of women commenting.

Sentiment

The Taco Bell, Dodge and Budweiser ads had the most mentions with positive sentiment. The NFL ad had a very large share of neutral comments (74%), proportionally more than any other ad. The Go Daddy ad had the most negative mentions, and for good reason: it’s gross and just kind of weird. It wouldn’t be the Super Bowl if Go Daddy didn’t air a commercial of questionable taste though, right?

Super Bowl XLVII Commercial Sentiment Breakdown by Gender

Super Bowl XLVII Commercial Sentiment Breakdown by Gender (Proportional)

Lastly, I am going to go against the grain here and say that the next big thing in football is most definitely going to be Leon Sandcastle.

What The Smeg? Some Text Analysis of the Red Dwarf Scripts

Introduction

Just as Pocket fundamentally changed my reading behaviour, I am finding that now having Netflix (and even before that, other downloadable or streaming digital content) is really changing my behaviour as far as television is concerned.

Where watching TV used to be an affair of browsing through 500 channels and complaining there was nothing on, now with the advent of on-demand digital services there is choice. Instead of flipping through hundreds of channels (is that a linear search or a random walk?), most of which have nothing whatsoever that interests you, now you can search for exactly the show you are looking for and watch it when you want. Without commercials.

Wait, what? That’s amazing! No wonder people are ‘cutting the cord’ and media corporations are concerned about the future of their business model.

True, you can still browse. People complain that the Netflix selection in Canada is bad, but for 8 dollars a month, what you’re getting is really pretty good. And given the… eclectic nature of the selection, I sometimes find myself watching something I would never think to look for directly, or would never give a second chance to if I had just caught 5 minutes of the middle of it on cable.

Such is the case with Red Dwarf. Red Dwarf is one of those shows that gained a cult following, and, despite its many flaws, for me has a certain charm and some great moments. This despite my not being able to understand all of the jokes (or dialogue!) as it is a show from the BBC.

The point is that before Netflix, I probably wouldn’t have come across something like this, and I definitely wouldn’t have watched all of it if that option weren’t so easily laid out.

So I watched a lot of this show and got to thinking, why not take this as an opportunity to do some more everyday analytics?

Background

If you’re not familiar with the show or a fan, I’ll briefly summarize here so you’re not totally lost.

The series centers around Dave Lister, an underachieving chicken-soup vending machine repairman aboard the intergalactic mining ship Red Dwarf. Lister inadvertently becomes the last human being alive after being kept in stasis for 3 million years by the ship’s computer, Holly, following a radiation leak aboard the ship. The remainder of the ship’s crew are Arnold J. Rimmer, a hologram of Lister’s now-deceased bunkmate and superior officer; The Cat, a humanoid evolved from Lister’s pet cat; Kryten, a neurotic sanitation droid; and later Kristine Kochanski, a love interest who gets brought back to life from another dimension.

Conveniently, the Red Dwarf scripts are available online, transcribed by dedicated fans of the program. This just goes to show that the series truly does have a cult following, when there are fans who love the show so much as to sit and transcribe episodes just for its own sake! But then again, I am doing data analysis and visualization on that same show….

Analysis

Of the ten seasons and 61 episodes of the series, the data set covers Seasons 1-8 and comprises 51 of those 52 episodes (S08E03, Back In The Red (Part III), is missing).
I did some text analysis of the data with the tm package for R. 
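
For readers who don’t use R, the basic idea behind this kind of analysis can be sketched in a few lines of Python: count how often each term of interest shows up in each episode’s script, then plot those counts episode by episode. This is only a rough illustration and not the tm-based code from the repository; the function names, the episode ids and the two tiny “scripts” below are invented placeholders.

```python
import re
from collections import Counter


def term_counts(text):
    """Lower-case the text and count every word in it."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def mentions_by_episode(episodes, terms):
    """Build {episode id: {term: count}} for the terms of interest."""
    table = {}
    for episode_id, script in episodes.items():
        counts = term_counts(script)
        table[episode_id] = {term: counts[term.lower()] for term in terms}
    return table


# Placeholder snippets standing in for full transcripts (not real script lines).
episodes = {
    "EP1": "LISTER: Rimmer, where is the cat? RIMMER: Ask Holly, Lister.",
    "EP2": "KRYTEN: Mr Lister, sir! LISTER: Not now, Kryten.",
}

table = mentions_by_episode(episodes, ["Lister", "Rimmer", "Kryten"])
for episode_id, row in table.items():
    print(episode_id, row)
```

Note that a count like this lumps together speaker labels and mentions of a name in dialogue, so it measures overall prevalence rather than lines spoken.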

First we can see the prevalence of different characters within the show over the course of the series. I’ve omitted the x-axis labels as they made the chart appear cluttered; you can see them by interacting with the chart.

Lister and Rimmer, the two main characters, have the highest number of mentions overall. Kryten appears in the eponymous S02E01 and is then later introduced as one of the core characters at the beginning of Season 3. The Cat remains fairly constant throughout the whole series as he appears or speaks mainly for comedic value. In S01E06, Rimmer makes a duplicate of himself, which explains the high number of lines by his character and mentions of his name in the script. You can see he disappears after Episode 2 of Season 7, in which his character is written out, until re-appearing in Season 8 (he appears in S07E05 as there is an episode dedicated to the rest of the crew reminiscing about him).

Holly, the ship’s computer, appears consistently at the beginning of the program until disappearing with the Red Dwarf towards the beginning of Season 6. He is later reintroduced when it returns at the beginning of Season 8.

Lister wants to bring back Kochanski as a hologram in S01E03, and she also appears in S02E04, as it is a time travel episode. She is introduced as one of the core cast members in Episode 3 of Season 7 and continues to be so until the end of the series.

Ace is Rimmer’s macho alter-ego from another dimension. He appears a couple of times in the series before S07E02, in which he is used as a plot device to write Rimmer out of the show for that season.

Appearance and mentions of other crew members of the Dwarf correspond to the beginning of the series and the end (Season 8) when they are reintroduced. The Captain, Hollister, appears much more frequently towards the end of the show.

Robots appear mainly as one-offs who are the focus of a single episode. The exceptions are the Scutters (Red Dwarf’s utility droids), whose appearances coincide with the parts of the show where the Dwarf exists, and simulants, which are mentioned occasionally as villains / plot devices. The toaster and snarky dispensing machine also appear towards the beginning and end, with the former also having speaking parts in S04E04.

As mentioned before, the Dwarf gets destroyed towards the end of Season 5 and is not reintroduced until the beginning of Season 8. During this time, the crew live in one of the ship’s shuttlecraft, the Starbug. You can also see that the Starbug is mentioned more frequently in episodes when the crew go on excursions (e.g. Season 3, Episodes 1 and 2).

One of the recurring themes of the show is how much Lister really enjoys Indian food, particularly chicken vindaloo. That and how he’d much rather just drink beer at the pub than do anything. S04E02 (spike 1) features a monster, a Chicken Vindaloo man (don’t ask), and the whole premise of S07E01 (spike 2) is Lister wanting to go back in time to get poppadoms.

Thought this would be fun. Space is a consistent theme of the show, obviously. S07E01 is a time travel episode, and the episodes with Pete at the end (Season 8, Episodes 6-7) feature a time-altering device.

Conclusions

I recall talking to an associate of mine who recounted his experiences in a data analysis and programming workshop where the data set used was the Enron emails. As he quite rightly pointed out, he knew nothing about the Enron emails, so doing the analysis was difficult: he wasn’t quite sure what he was looking at, or what he should be expecting. He said he later used the Seinfeld scripts as a starting point, as this was at least something he was familiar with.

And that’s an excellent point. You don’t necessarily need to be a subject matter expert to be an analyst, but it sure helps to have some idea what exactly you are analyzing. I would also think you are more likely to care about what you are analyzing if you know something about it.

On that note, it was enjoyable to analyze the scripts in this manner, and see something so familiar as a television show visualized as data like any other. I think the major themes and changes in the plotlines of the show were well represented in this way.

In terms of future directions, I tried looking at the correlation between terms using the findAssocs() function but got strange results, which I believe is due to the small number of documents. At a later point I’d like to do that properly, with a larger number of documents (perhaps tweets). Also this would work better if synonym replacement for the characters was handled in the original corpus, instead of ad-hoc and after the fact (see code).
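
To illustrate both points, here is a rough Python sketch of a term-association lookup. As I understand it, findAssocs() reports the terms whose counts across documents correlate with a given term above some lower limit; the code below mimics that idea with a plain Pearson correlation and is not the tm implementation. The alias table (built from the character names above), the function names and the three tiny documents are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Assumed alias table: fold alternative names into one term *before* counting.
ALIASES = {"dave": "lister", "arnold": "rimmer"}


def tokenize(text, aliases=ALIASES):
    """Lower-case, split into words and apply synonym replacement."""
    words = re.findall(r"[a-z']+", text.lower())
    return [aliases.get(word, word) for word in words]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_assocs(docs, term, corlimit):
    """Terms whose per-document counts correlate with `term` at or above corlimit."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = set().union(*counts)
    target = [c[term] for c in counts]
    found = {}
    for other in vocab - {term}:
        r = pearson(target, [c[other] for c in counts])
        if r >= corlimit:
            found[other] = round(r, 2)
    return dict(sorted(found.items(), key=lambda kv: (-kv[1], kv[0])))


# Three made-up "documents": far too few for the correlations to mean much.
docs = [
    "Dave wants curry and lager",
    "Lister wants curry",
    "Rimmer wants a salute",
]
print(find_assocs(docs, "lister", 0.9))
```

With only three documents, “curry” comes out perfectly correlated with “lister” simply because the two words happen to share the same two documents, which is the kind of strange result a small corpus produces. Because the aliases are applied inside tokenize(), “Dave” and “Lister” are already one term by the time anything is counted.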

Lastly, another thing I took away from all this is that cult TV shows have very, very devoted fan-bases. There is an awful lot about Red Dwarf on Wikipedia (probably due to Wikipedia’s systemic bias), and elsewhere on the internet.

Resources

Code and data on GitHub
https://github.com/mylesmharrison/reddwarf

Red Dwarf Scripts (Lady of the Cake)