Introduction
Background
Analysis
# Read the neighborhood shapefile data and plot
shpfile <- "NEIGHBORHOODS_WGS84_2.shp"
sh <- readShapePoly(shpfile)
plot(sh)
Okay, that’s a good start. Now we’re going to include the neighbourhood population from the demographic data file by attaching it to the dataframe within the shapefile object. We do this using the merge function. Basically this is like an SQL join. Also I need to convert the neighbourhood number to a integer first so things work, because R is treating it as an string.
# Add demographic data
# The neighbourhood ID is a string - change it to a integer
sh@data$AREA_S_CD <- as.numeric(sh@data$AREA_S_CD)
# Read in the demographic data and merge on Neighbourhood Id
demo <- read.csv(file="WB-Demographics.csv", header=T)
sh2 <- merge(sh, demo, by.x='AREA_S_CD', by.y='Neighbourhood.Id')
# Set the palette
p <- colorRampPalette(c("white", "red"))(128)
palette(p)
# Scale the total population to the palette
pop <- sh2@data$Total.Population
cols <- (pop - min(pop))/diff(range(pop))*127+1
plot(sh, col=cols)
Cool. You can see that the population is greater for some of the larger neighbourhoods, notably on the east end and The Waterfront Communities (i.e. condoland)
I’m not crazy about this white-red palette so let’s use RColorBrewer’s spectral which is one of my faves:
#RColorBrewer, spectral
p <- colorRampPalette(brewer.pal(11, 'Spectral'))(128)
palette(rev(p))
plot(sh2, col=cols)
There, that’s better. The dark red neighborhood is Woburn. But we still don’t have a legend so this choropleth isn’t really telling us anything particularly helpful. And it’d be nice to have the polygons overplotted onto map tiles. So let’s use ggmap!
ggmap
The first polygon call is for the filled shapes and the second is to plot the black borders.
#GGPLOT
points <- fortify(sh, region = 'AREA_S_CD')
# Plot the neighborhoods
toronto <- qmap("Toronto, Ontario", zoom=10)
toronto +geom_polygon(aes(x=long,y=lat, group=group, alpha=0.25), data=points, fill='white') +
geom_polygon(aes(x=long,y=lat, group=group), data=points, color='black', fill=NA)
Now we merge the demographic data just like we did before, and ggplot takes care of the scaling and legends for us. It’s also super easy to use different palettes by using scale_fill_gradient and scale_fill_distiller for ramp palettes and RColorBrewer palettes respectively.
# merge the shapefile data with the social housing data, using the neighborhood ID
points2 <- merge(points, demo, by.x='id', by.y='Neighbourhood.Id', all.x=TRUE)
# Plot
toronto + geom_polygon(aes(x=long,y=lat, group=group, fill=Total.Population), data=points2, color='black') +
scale_fill_gradient(low='white', high='red')
# Spectral plot
toronto + geom_polygon(aes(x=long,y=lat, group=group, fill=Total.Population), data=points2, color='black') +
scale_fill_distiller(palette='Spectral') + scale_alpha(range=c(0.5,0.5))
References & Resources
https://github.com/mylesmharrison/toronto_neighbourhoods
How we're you able to change the shapefile to read on R? I've never used QGIS and can't seem to get it to work
I've included the processed shapefile data with the code in github: https://github.com/mylesmharrison/toronto_neighbourhoods
It's not clear how you coordinate the demographic data with the specific shapes in the map. It appears that Sh is not just an image of the map, but it also contains neighborhood ID names which allow for the later merge. Can you further clarify this essential step? thanks
sh is a SpatialPolygonsDataFrame (from the sp package). It contains data on the different polys in a dataframe inside the object (here identifying polygons by neighbourhood ID) which we merge with the demo data with this line of code:
sh2 <- merge(sh, demo, by.x='AREA_S_CD', by.y='Neighbourhood.Id')
You can see the how the dataframes are being merged by looking at the one contained in the sh object, the demographic dataframe, and then the resulting columns in the new SpatialPolygonsDataFrame object when they are merged:
> head(sh@data,10)
AREA_S_CD AREA_NAME
0 097 Yonge-St.Clair (97)
1 027 York University Heights (27)
2 038 Lansing-Westgate (38)
3 031 Yorkdale-Glen Park (31)
4 016 Stonegate-Queensway (16)
5 118 Tam O'Shanter-Sullivan (118)
6 063 The Beaches (63)
7 003 Thistletown-Beaumond Heights (3)
8 055 Thorncliffe Park (55)
9 059 Danforth East York (59)
> head(demo[,1:4],10)
Neighbourhood Neighbourhood.Id Total.Area Total.Population
1 West Humber-Clairville 1 30.09 34100
2 Mount Olive-Silverstone-Jamestown 2 4.60 32790
3 Thistletown-Beaumond Heights 3 3.40 10140
4 Rexdale-Kipling 4 2.50 10485
5 Elms-Old Rexdale 5 2.90 9550
6 Kingsview Village-The Westway 6 5.10 21725
7 Willowridge-Martingrove-Richview 7 5.50 21345
8 Humber Heights-Westmount 8 2.80 10580
9 Edenbridge-Humber Valley 9 5.50 14945
10 Princess-Rosethorn 10 5.20 11200
> names(sh@data)
[1] "AREA_S_CD" "AREA_NAME"
> names(sh2@data)
[1] "AREA_S_CD" "AREA_NAME" "Neighbourhood" "Total.Area"
[5] "Total.Population" "Pop…Males" "Pop…Females" "Pop.0…4.years"
[9] "Pop.5…9.years" "Pop.10…14.years" "Pop.15..19.years" "Pop.20…24.years"
[13] "Pop..25…29.years" "Pop.30…34.years" "Pop.35…39.years" "Pop.40…44.years"
[17] "Pop.45…49.years" "Pop.50…54.years" "Pop.55…59.years" "Pop.60…64.years"
[21] "Pop.65…69.years" "Pop.70…74.years" "Pop.75…79.years" "Pop.80…84.years"
[25] "Pop.85.years.and.over" "Seniors.55.and.over" "Seniors.65.and.over" "Child.0.14"
[29] "Youth.15.24" "Home.Language.Category" "Language…Chinese" "Language…Italian"
[33] "Language…Korean" "Language…Persian..Farsi." "Language…Portuguese" "Language…Russian"
[37] "Language…Spanish" "Language…Tagalog" "Language…Tamil" "Language…Urdu"
Hello Myles, I’ve had similar issues with the same Toronto Geo shape files, so I had to adjust my own dataset in order to correctly map it with in Excel PowerMap visualization. And I just wanted to share those findings with you as well: http://datanrg.blogspot.ca/2015/11/powermap-with-custom-region-shapefiles.html
Interesting stuff. Thanks for sharing!
Hi Myles, thanks for sharing your work. is it possible to display Toronto visible minorities percentage on same map, if you do you have sample of syntax?
thanks
Hi Qamar. The demo data set does not include visible minority data, but does include language counts, so you could do something like:
toronto + geom_polygon(aes(x=long,y=lat, group=group, fill=Language.Chinese/Total.Population), data=points2, color='black') + scale_fill_distiller(palette='Spectral') + scale_alpha(range=c(0.5,0.5))
to find the percentage of Chinese speakers per neighbourhood. Otherwise you'd have to load other data and merge it by neighbourhood ID as from the beginning.
really cool.. however when I try to perform this on my dataset I get the error
Error: isTRUE(gpclibPermitStatus()) is not TRUE
when trying to fortify the sapefile
I've tried installing gcplib and rgeos but still hasn't worked, do you know if it's a common problem/there is any solution?
This post on Stackoverflow offers two possible solutions. Hopefully one of them works (I can't recall which did for me).
Hi there,
Great post! I'm trying to replicate the analysis using Australian data. Everything goes fine until I try to fill the polygons with my population data. This part of the code
qmap("Queensland", zoom = 5, source ="osm") + geom_polygon(aes(x = long, y = lat, group = group, fill=Population), data= qld_shapefile3, colour = "black") + scale_fill_gradient(low = "white", high = "red")
I get an error message saying "Error in eval(exp, envir, enclose) : object "Population" not found
When I use the colname() function it states that my dataframe (qld_shapefile3) doesnt have any column names. I've followed your steps, but I am not sure how you've ensured that the aes() function can read your column names in your data frame (points2)
Any advice?
Thanks
Adam