Introduction
In case you’ve been living under a rock, there’s been a lot of chatter in the financial world of late about the price of oil going down.
Way, way, down. So much so that the Bank of Canada cut
interest rates. What crazy times are these that we live in? I thought gas was only going to get more and more expensive until the end of time, or until everyone resorted to driving solar-powered cars.
I’m no economist (or commodities guy) and frankly, a lot of it seems like black magic and voodoo to me, but I thought I’d take a look at the data to see just how things have changed. Plus it’ll be an opportunity to muck about with some time series analysis in R.

Your average economists. 
Background
Not much background is really needed other than what I mentioned in the introduction, but, well, we do need some data to work with.
Ontario Gas Prices for a number of cities (as well as the province averages) can be found at the
Ontario Ministry of Energy. Unfortunately, they’re year-by-year CSVs with a lot of different information row-wise (Weekly, Month, YTD).
No biggie; a simple bash script will take care of downloading all the data with
wget, then parsing and concatenating it with other *nix tools:
# Download data
for i in $(seq 1990 2014)
do wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv
done
# Retain the header
head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv
# Loop over the files and use sed to extract the relevant lines
for i in $(seq 1990 2014)
do
tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv
done
Great! Now we have all the monthly Ontario gas price data from 1990-2014 inclusive in one big file.
The WTI data I got from The Federal Reserve Bank of St. Louis, and the forecasts from the US Energy Information Administration.
Analysis
First, a pretty plot of the data:
Wow, that really is a cliff, isn’t it? The average Ontario gas price hasn’t been as low as it was in Dec 2014 since the fall of 2010 (Sep 2010, $1.01), and the WTI not since about May of that year.
Now to the fun stuff. Let’s read the data into R and do some hacky time series analysis.
# ts() lives in base R's stats package, so only forecast needs loading
library(forecast)
# Read in the Ontario Gas Price Data
data <- read.csv(file="ONTREG_merged.csv", header=T, sep=",")
# Read in the WTI oil price data
WTI_data <- read.csv(file='DCOILWTICO.csv', header=F, col.names=c("Date", "Value"))
# Create a time series object for the WTI and Ontario Avg
WTI <- ts(data=WTI_data$Value, frequency=12, start=c(1990,1), end=c(2014,12))
ON <- ts(data=data$ON.Avg, frequency=12, start=c(1990,1), end=c(2014,12))
# Plot and compare
combined <- cbind(WTI, ON)
plot(combined)
We get a comparison plot of the data:
And we can look at the Ontario gas price as a function of the WTI. Linear is on the left, log-log on the right.
Next we build lm model objects and look at the diagnostics. I’ll spare you the details, but I feel better about the log-log model, so we’ll go with that.
# Create linear models (normal and loglog)
l1 <- lm(ON ~ WTI, data=combined)
l2 <- lm(log(ON) ~ log(WTI), data=combined)
# Compare relative performance
summary(l1)
summary(l2)
plot(l1)
plot(l2)
# Plot
plot(ON ~ WTI, data=combined, pch=16, cex=0.3)
abline(l1)
plot(log(ON) ~ log(WTI), data=combined, pch=16, cex=0.3)
abline(l2)
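One diagnostic the summaries above don’t surface is autocorrelation in the residuals, which matters a lot for time series regressions like these. A minimal sketch of the check, run here on a toy AR(1) series (the gas-price data isn’t bundled with this post; the same two calls would apply to l2):

```r
# Toy example: fit a trend line to an autocorrelated series,
# then inspect the residuals for leftover serial correlation
set.seed(7)
toy <- arima.sim(model = list(ar = 0.8), n = 300)
trend <- seq_along(toy)
m <- lm(toy ~ trend)

# Spikes beyond lag 0 in the ACF indicate autocorrelated residuals
acf(residuals(m))

# A small Ljung-Box p-value says the same thing formally
Box.test(residuals(m), lag = 12, type = "Ljung-Box")
```

If the residuals of a fit like this are strongly correlated, the usual lm standard errors and p-values can’t be taken at face value.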
Lastly, we read in the forecast WTI data and use it to forecast the Ontario Gas price using our second model:
# Read in WTI forecast data
WTI_forecast <- read.csv(file="WTI_forecast.csv", header=F, sep=",", col.names=c("Date", "Value"))
# Forecast Ontario Gas Price
fit <- forecast(l2, newdata=data.frame(WTI=WTI_forecast$Value))
# Unlog
fit$mean <- exp(fit$mean)
fit$lower <- exp(fit$lower)
fit$upper <- exp(fit$upper)
fit$x <- exp(fit$x)
# Plot
plot(fit, ylab='Ontario Average Gas Price (cents/L)')
And there you have it! Putting the forecast data (blue line) and the WTI forecast back into our original graph, we can compare the two:
It’s that easy, right? That’s all there is to it?
Conclusion
Sadly, no. I’ve done some very quick work here and demonstrated some of the types of tools that are available in R, but “real” time-series analysis is a practice that requires much more care and nuance.
For example, linear modeling assumes that the variables are stationary (i.e. have constant mean and variance) and not autocorrelated, properties which almost never hold in the real world. Using methods like those above on non-stationary time series can result in what is known as “spurious regression”: finding relationships between variables which don’t really exist, even though the results show a high R-squared and apparently significant p-values.
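A quick simulation makes the point (entirely made-up data, nothing to do with the gas-price series): regress one random walk on another, completely independent one, and the levels fit can look impressive anyway.

```r
# Two independent random walks: non-stationary by construction
set.seed(42)
x <- cumsum(rnorm(300))
y <- cumsum(rnorm(300))

# The regression in levels "finds" a relationship where none exists
l_spurious <- lm(y ~ x)
summary(l_spurious)$r.squared  # often surprisingly large for independent walks

# Regressing the differenced (stationary) series tells the true story
l_diff <- lm(diff(y) ~ diff(x))
summary(l_diff)$r.squared      # typically near zero, as it should be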
In these cases, testing the stationarity assumption and then massaging the data (differencing & de-seasonalizing) is usually required beforehand to handle the problem, or other models with less strict assumptions than linear regression are more appropriate. For more on this, see the excellent book “Forecasting: Principles and Practice” by Rob J. Hyndman and George Athanasopoulos.
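As a sketch of what that massaging looks like (simulated stand-in data again, using the forecast package already loaded above), ndiffs() estimates how many times a series must be differenced before it looks stationary:

```r
library(forecast)

# A simulated random walk with drift stands in for a price series here
set.seed(1)
prices <- ts(cumsum(rnorm(300, mean = 0.1)), frequency = 12)

# Estimate the number of differences needed for stationarity
ndiffs(prices)   # a random walk typically needs one difference

# First differences (month-over-month changes) remove the stochastic trend
changes <- diff(prices)
```

One would then model the differenced (or otherwise transformed) series rather than the raw levels.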
The takeaway, as always, is that real life is complicated, and so analysis requires care. Predicting the future has never been, and never will be, easy; either way, I just hope gas stays cheap.
code & data on github:
Fuel Price Data: Regular Unleaded Gasoline 2014 (Ontario Ministry of Energy):
Short-Term Energy Outlook (U.S. Energy Information Administration)
Forecasting: Principles and Practice (OTexts)