Mapping the TTC Lines with R and Leaflet

It's been quite a while since I've written a post, but as of late I've become really interested in mapping and so have been checking out different tools for doing this, one of which is Leaflet. This is a case where, thanks to a well-written R package, it's easy to create interactive web maps directly from R, without even knowing any JavaScript!

I had three requirements for myself:

  1. Write code that created an interactive web map using Leaflet
  2. Use Shapefile data about the City of Toronto
  3. Allow anyone to run it on their machine, without having to download or extract data

I decided to use shapefile data on the TTC, available from Toronto’s Open Data portal. Point #3 required a little research, as the shapefile itself was buried within a zip, but it’s fairly straightforward to write R code to download and unpack zip files into a temporary directory.

The code is below, followed by the result. Not bad for only 10 or 15 lines of code!


# MAPPING THE TORONTO SUBWAY LINES USING R & Leaflet
# --------------------------------------------------
#
# Myles M. Harrison
# https://www.everydayanalytics.ca

#install.packages('leaflet')
#install.packages('htmlwidgets')
#install.packages('maptools')
library(leaflet)
library(htmlwidgets)
library(maptools)

# Data from Toronto's Open Data portal: http://www.toronto.ca/open

# Download the zip to a temporary directory, unpack it, and read in the shapefile
data_url <- "http://opendata.toronto.ca/gcc/TTC_subway%20lines_wgs84.zip"
cur_dir <- getwd()
temp_dir <- tempdir()
setwd(temp_dir)
download.file(data_url, "subway_wgs84.zip")
unzip("subway_wgs84.zip")
sh <- readShapeLines("subway_wgs84.shp")
unlink(dir(temp_dir))
setwd(cur_dir)

# Create a categorical coloring function
linecolor <- colorFactor(rainbow(16), sh@data$SBWAY_NAME)

# Plot using leaflet
m <- leaflet(sh) %>%
addTiles() %>%
addPolylines(popup = paste0(as.character(sh@data$SBWAY_NAME)), color=linecolor(sh@data$SBWAY_NAME)) %>%
addLegend(colors=linecolor(sh@data$SBWAY_NAME), labels=sh@data$SBWAY_NAME)

m

# Save the output
saveWidget(m, file="TTC_leaflet_map.html")

Plotting Choropleths from Shapefiles in R with ggmap – Toronto Neighbourhoods by Population

Introduction

So, I'm not really a geographer. But any analyst worth their salt will eventually have to do some kind of mapping or spatial visualization. Mapping is not really a forte of mine, though I have played around with it some in the past.
I was working with some shapefile data a while ago and thought about how it's funny that so much spatial data is dominated by a format that is basically proprietary. I looked around for some good tutorials on using shapefile data in R, but even so it took me longer to figure out than I would have thought.
So I thought I’d put together a simple example of making nice choropleths using R and ggmap. Let’s do it using some nice shapefile data of my favourite city in the world courtesy of the good folks at Toronto’s Open Data initiative.

Background

We’re going to plot the shapefile data of Toronto’s neighbourhoods boundaries in R and mash it up with demographic data per neighbourhood from Wellbeing Toronto.
We’ll need a few spatial plotting packages in R (ggmap, rgeos, maptools).
Also, the shapefile threw some kind of weird error when I originally tried to load it into R, but it was nothing that loading it into QGIS once and re-saving it couldn't fix. The working version is available on the GitHub page for this post.
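As a rough sketch, the setup might look something like this (the package names are the ones mentioned above, plus RColorBrewer for the Brewer palettes used further down; adjust as needed for your environment):

# Install once if needed, then load the spatial plotting packages
# install.packages(c('ggmap', 'rgeos', 'maptools', 'RColorBrewer'))
library(maptools)      # readShapePoly() for reading the shapefile
library(rgeos)         # used by fortify() when a region is specified
library(ggmap)         # qmap() and the ggplot2 plotting functions
library(RColorBrewer)  # brewer.pal() for the Spectral palette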

Analysis

First let’s just load in the shapefile and plot the raw boundary data using maptools. What do we get?
# Read the neighborhood shapefile data and plot
shpfile <- "NEIGHBORHOODS_WGS84_2.shp"
sh <- readShapePoly(shpfile)
plot(sh)

This just yields the raw polygons themselves. Any good Torontonian would recognize these shapes. There are some maps like these, with words squished into the polygons, hanging in lots of print shops on Queen Street. Also, as someone pointed out to me, most T-dotters think of the grid of downtown streets as running directly north-south and east-west, but it actually sits on an angle.

Okay, that's a good start. Now we're going to include the neighbourhood population from the demographic data file by attaching it to the dataframe within the shapefile object. We do this using the merge function. Basically this is like an SQL join. Also, I need to convert the neighbourhood number to a numeric value first so things work, because R is treating it as a string.

# Add demographic data
# The neighbourhood ID is read in as a string - convert it to a number
sh@data$AREA_S_CD <- as.numeric(sh@data$AREA_S_CD)

# Read in the demographic data and merge on Neighbourhood Id
demo <- read.csv(file="WB-Demographics.csv", header=T)
sh2 <- merge(sh, demo, by.x='AREA_S_CD', by.y='Neighbourhood.Id')
Next we’ll create a nice white to red colour palette using the colorRampPalette function, and then we have to scale the population data so it ranges from 1 to the max palette value and store that in a variable. Here I’ve arbitrarily chosen 128. Finally we call plot and pass that vector of colours into the col parameter:
# Set the palette
p <- colorRampPalette(c("white", "red"))(128)
palette(p)

# Scale the total population to the palette
pop <- sh2@data$Total.Population
cols <- (pop - min(pop))/diff(range(pop))*127+1
plot(sh, col=cols)
And here’s the glorious result!

Cool. You can see that the population is greater for some of the larger neighbourhoods, notably on the east end and in The Waterfront Communities (i.e. condoland).

I'm not crazy about this white-red palette, so let's use RColorBrewer's 'Spectral' palette, which is one of my faves:

# RColorBrewer, Spectral
library(RColorBrewer)
p <- colorRampPalette(brewer.pal(11, 'Spectral'))(128)
palette(rev(p))
plot(sh2, col=cols)

There, that's better. The dark red neighbourhood is Woburn. But we still don't have a legend, so this choropleth isn't really telling us anything particularly helpful. And it'd be nice to have the polygons overplotted onto map tiles. So let's use ggmap!
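As a quick hedged aside, a rough legend could be bolted onto the base-R plot with something along these lines (just labelling the minimum and maximum population with the endpoints of the reversed palette); ggplot will handle the scaling and legend for us properly in the next section:

# Rough base-R legend sketch: label only the min and max population
legend('topleft', legend=c(min(pop), max(pop)),
       fill=rev(p)[c(1, 128)], title='Total Population', bty='n')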


ggmap

In order to use ggmap we have to decompose the shapefile of polygons into something ggmap can understand (a dataframe). We do this using the fortify command. Then we use ggmap's very handy qmap function, to which we can just pass a search term like we would in Google Maps; it fetches the map tiles for us automatically. Then we overplot the data using standard calls to geom_polygon, just like you would in any other ggplot visualization.

The first polygon call is for the filled shapes and the second is to plot the black borders.

#GGPLOT 
points <- fortify(sh, region = 'AREA_S_CD')

# Plot the neighborhoods
toronto <- qmap("Toronto, Ontario", zoom=10)
toronto + geom_polygon(aes(x=long, y=lat, group=group), data=points, fill='white', alpha=0.25) +
  geom_polygon(aes(x=long, y=lat, group=group), data=points, color='black', fill=NA)

Voila!

Now we merge the demographic data just like we did before, and ggplot takes care of the scaling and legends for us. It’s also super easy to use different palettes by using scale_fill_gradient and scale_fill_distiller for ramp palettes and RColorBrewer palettes respectively.

# Merge the fortified shapefile data with the demographic data, using the neighborhood ID
points2 <- merge(points, demo, by.x='id', by.y='Neighbourhood.Id', all.x=TRUE)

# Plot
toronto + geom_polygon(aes(x=long,y=lat, group=group, fill=Total.Population), data=points2, color='black') +
scale_fill_gradient(low='white', high='red')

# Spectral plot
toronto + geom_polygon(aes(x=long,y=lat, group=group, fill=Total.Population), data=points2, color='black') +
scale_fill_distiller(palette='Spectral') + scale_alpha(range=c(0.5,0.5))

So there you have it! Hopefully this will be useful for other R users wishing to make nice maps in R using shapefiles, or those who would like to explore using ggmap.

References & Resources

Neighbourhood boundaries at Toronto Open Data
Demographic data from Wellbeing Toronto

Toronto Data Science Group – One or the Other: An Overview of Binary Classification Methods

So Chris was kind enough to invite me to speak at the Toronto Data Science Group again this past Thursday. I spoke on binary classification, and made an effort to cover a fair bit of ground and some technical detail, while still making it accessible. I wanted to give an overview for an audience that was more interested in the ‘how’, and the practical realities of using classification to solve problems within an organization.


As before, I'll keep my observations more about presenting and less about the content.

The meetup is a lot different now, with presentations to large audiences at venues like MaRS or the conference room at the Thompson Hotel, as opposed to the early days when it was much smaller.

Speaking to a larger group is challenging: it's more nerve-racking, and I also noticed it was harder to make eye contact and include the whole audience than I am used to with smaller groups. The temptation is to just look straight out ahead of you. Speaking from behind a podium has its disadvantages this way, but it does keep you anchored and gives you something on which to rest your hands and remain centered. Looking back toward the screen is usually a bad idea when presenting, regardless of audience size, unless you are pointing something out, and it is doubly so when that screen is very large and above you.

Some folks were kind enough to take some photos of me during the talk for social media and the like. In retrospect, while I do try to have a very visual style (and inject some humour with it), I think it can come across as overly simplistic and flippant in certain contexts, such as with this larger group. There's a balance to be struck there, I'm sure. Also, as always, you need to be mindful of how large you are making things on your slides (especially text), given the size of the screen with respect to the venue.


The point I made about the explainability of different classification methods to the non-technical audience or end consumer (i.e. client) receiving the results of their application was less controversial than I would have thought. Chris commented on this as well.

As always, I was overly ambitious and got through a lot less material in the time allotted than I originally thought I would.

I was asked some very insightful and detailed questions, some of which I wasn't totally prepared to answer. Talking about something is fairly easy, I think, because you can put together exactly what you want to say and rehearse; it's in answering the questions that people decide whether you really know the subject, or are just putting pretty pictures up on the screen and painting in broad verbal strokes. Many people in the audience seemed to have assumed that because I was speaking on the topic of binary classification, I was a complete expert on it; there's a danger here too, I think, when you see anyone give a presentation.

All in all, I think the talk was very well received. As always, I learned a lot putting it together, and even more afterward, discussing it with Toronto's data scientists and knowledgeable analysts and hearing their insightful points of view.

Looking forward to the next one.