Introduction
Background
Googling the Archer scripts turns up the bunch of them at Springfield! Springfield!.
Unfortunately since it looks like the scripts have been laboriously transcribed by ardent fans, there isn’t any dialogue tagging like you’d see in a true script, but this is a limitation of the data set we’ll just have to live with. Hopefully the style of the dialogue and writers will still come through when we train the RNN on it (especially since sometimes there is not too much difference between the difference characters’ dialogue, given how terrible they all are, and the amount of non sequitur in the show).
from bs4 import BeautifulSoup import urllib2 # CREATE SOUP def soupify(url): # Open the request and create the soup req = urllib2.Request(url) response = urllib2.urlopen(req, timeout = 10.0) soup = BeautifulSoup(response.read(), "lxml") return soup # GET SCRIPT AND CLEAN def get_script(url): soup = soupify(url) script = soup.findAll("div", {"class":"episode_script"})[0] # Clean for br in script.find_all("br"): br.replace_with("n") scripttext = script.text scripttext = scripttext.replace('-',' ').replace('n',' ') scripttext = scripttext.strip() return scripttext # GET SCRIPT URLS def get_episode_urls(showurl): soup = soupify(showurl) # Get the urls and add the base URL to each in the list urls = soup.findAll("a", {"class":"season-episode-title"}) baseurl = 'http://www.springfieldspringfield.co.uk/' urls = map(lambda x: baseurl + '/' + x['href'], list(urls)) return urls ### MAIN def do_scrape(): # Scrape the script from each URL and add to a list episodes = list() # Get the episode list from the main page urls = get_episode_urls('http://www.springfieldspringfield.co.uk/episode_scripts.php?tv-show=archer') for url in urls: print url episodes.append(get_script(url)) # Write the output to a file f = open('archer_scripts.txt','w') for episode in episodes: f.write(episode.encode('ascii','ignore')) f.close()
Basically the script gets the list of episode URLs from the show page, then scrapes each script in turn and exports to a text file. And I didn’t even have to do any error handling, it just worked on the first shot! Wow. (Isn’t it nice when things just work?)
After a little manual data cleansing, we are ready to feed the data to the RNN model.
Training the model
Since this is the easy part that we are relying on the already built model for, there’s not much to say here. Just rename the file and plunk into a data directory like the demo file, then run
python train.py --data_dir data/archer
And let the model do its thing. My poor little laptop doesn’t even have a GPU so the model was grinding away overnight and then some but eventually finished.
![]() |
The end of the grind and testing the model output. |
word-rnn-tensorflow also conveniently pickles your model, so you can conveniently use it again at a later time, or continue training a previously trained model. I’d have made the model files available, however unfortunately they are rather large (~200 MB).
Anyhow, once the training is done you can get the model to spit out outputs by running:
python sample.py
Here are some sample outputs from the model which I split up and tidied a bit:
Oh, what do you mean “Lana hasn’t called”?
I mean, you don’t know how to tell you how to discipline my servant!
I think I was gonna say “flap hammered. ”
Oh, what are you talking about?
Hitting on the communist insurgency.
I don’t do dossiers.
Besides, this is a booty call, I’m flattered, but Oh my God, BPA!
I Transgendered frogs! [frog bribbit]
Shh, shh, shh.
Coyly. relieve in the cerebrospinal fluid at the dimmer switch in the bullpen, maybe spilled a miracle. And so sorry if you don’t clamp that artery!
One! Brett? What the hell is wrong with you?! And what are you doing?
Esatto! Che cosa stiamo facendo?!
Aww man, we go to Plan Manfred. And then Ruby Foo’s.
Yeah, I don’t know what grits are, or hush puppies!
Are you sure?
I don’t know.
Push it out of that shitheap of a weak, like the rest of our business! Oh, and speaking of an exquisitely prepared entre…
No, I don’t even know what grits are, or hush puppies!
Are you sure?
That’s what I was gonna say “flap hammered. ”
Oh, how are you bleaching mucho.
But I don’t know what grits are, or hush puppies!
Are you sure?
That’s what I was gonna say “flap hammered. ”
Oh, how are you bleaching mucho.
But I don’t know what grits are, or hush puppies!
Are you finished?
Yes.
No, no, no, no, no! [crashing] [crashing stops] [coughing]
Oh, shit, uh whatcha madoo HVAC.
Ooh!
Well?
God damn it. Off!
peasy zombie squeezy.
Yeah, of the sake of argument, let’s leave him Fister Roboto.
But it looks like bubble wrap.
What is your proposal?
I know the burns. And if you were “confused verging on angry” before…
Aaagh! Son of a fanboy request.
And you don’t know how to share, beyotch.
Easy, tough guy.
When do you think it was squash, sir.
I don’t know. I don’t know. Warranty’s in raspy Street, you know.
Woowoowoowoowoowoowoof!
What.
No, coordinate with Rome, then let me go. (wheezy coughing) (gasping)
Well, I am just a DJ?
Learning experience? Well, joke’s on sale, will you not?
She’s battled ivory poachers, raging rivers, even tuberculosis, and now Oscar winner Rona Thorne takes on the planet.
Look: CIA, Ml6, Interpol.
We can’t believe you don’t clamp that artery!
One! Brett?
What the hell was that?
Microfilm.
It was all the shocks damaged my frontal lobe.
In the brain’s language center?
About the output you would expect. Nothing that’s going to pass the Turing test, but if you’ve watched the show you can picture Archer, Lana, and Cyril having an argument that might contain some of the above (with maybe a couple other cast members thrown in… like that Italian line from The Papal Chase). And it seems to stitch together whole phrases or following lines since many are unique.
Some of the output is not that bad – there’s what could be some comedic gems in there if you look hard enough, that aren’t verbatim from the original scripts (e.g. “son of a fanboy request!”)