In case you’ve been living under a rock: there’s been a lot of chatter in the financial world lately about the price of oil going down. Way, way down. So much so that the Bank of Canada cut interest rates. What crazy times are these we live in? I thought gas was only going to get more and more expensive until the end of time, until everyone resorted to driving solar-powered cars.
I’m no economist (or commodities guy) and frankly, a lot of it seems like black magic and voodoo to me, but I thought I’d take a look at the data to see just how things have changed. Plus, it’ll be an opportunity to muck about with some time series analysis in R.
[Image: Your average economists.]
Not much background is really needed other than what I mentioned in the introduction, but, well, we do need some data to work with.
Ontario Gas Prices for a number of cities (as well as the province averages) can be found at the Ontario Ministry of Energy. Unfortunately, they’re year-by-year CSVs with a lot of different information row-wise (Weekly, Monthly, YTD). No biggie: a simple bash script will take care of downloading all the data with wget, and of parsing and concatenating it with other *nix tools:
# Download data
for i in $(seq 1990 2014)
do
    wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv
done

# Retain the header
head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv

# Loop over the files and use sed to extract the relevant lines
for i in $(seq 1990 2014)
do
    tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv
done
Great! Now we have all monthly Ontario Gas Price data from 1990-2014 inclusive in one big file.
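As a sanity check, here is what that sed substitution does to a single monthly row (the sample line is hypothetical, assuming the month abbreviation occupies the first three characters of each row):

```shell
# 's/./-01-1990,/4' replaces the 4th character of the line (the comma after
# the month abbreviation) with "-01-1990,", turning the bare month name into
# a full date while keeping the price columns intact.
echo 'Jan,54.3,55.1' | sed 's/./-01-1990,/4'
# → Jan-01-1990,54.3,55.1
```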
The WTI data I got from The Federal Reserve Bank of St. Louis, and the forecasts from the US Energy Information Administration.
First, a pretty plot of the data:
Wow, that really is a cliff, isn’t it? The average Ontario Gas Price hasn’t been as low as it was in Dec 2014 since the fall of 2010 (Sep 2010, $1.01) and the WTI not since about May of that year.
Now to the fun stuff. Let’s read the data into R and do some hacky time series analysis.
# Read in the Ontario Gas Price Data
data <- read.csv(file="ONTREG_merged.csv", header=T, sep=",")
# Read in the WTI oil price data
WTI_data <- read.csv(file='DCOILWTICO.csv',header=F, col.names=c("Date", "Value"))
# Create a time series object for the WTI and Ontario Avg
WTI <- ts(data=WTI_data$Value, frequency=12, start=c(1990,1), end=c(2014,12))
ON <- ts(data=data$ON.Avg, frequency=12, start=c(1990,1), end=c(2014,12))
# Plot and compare
combined <- cbind(WTI, ON)
plot(combined)
We get a comparison plot of the data:
And we can look at the Ontario Gas Price as a function of the WTI. Linear is on the left, log-log on the right.
Next we build lm model objects and look at the diagnostics. I’ll spare you the details, but I feel better about the log-log fit, so we’ll go with that.
# Create linear models (normal and log-log)
l1 <- lm(ON ~ WTI, data=combined)
l2 <- lm(log(ON) ~ log(WTI), data=combined)
# Compare the two fits visually (linear on the left, log-log on the right)
par(mfrow=c(1,2))
plot(ON ~ WTI, data=combined, pch=16, cex=0.3)
plot(log(ON) ~ log(WTI), data=combined, pch=16, cex=0.3)
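For the diagnostics I’m glossing over, a minimal sketch using only base R might look like the following (note that R-squared values aren’t directly comparable between the two models, since they have different response variables, ON versus log(ON); the residual plots are the fairer comparison):

```r
# Residual diagnostics for the two candidate models
par(mfrow=c(2,2))
plot(l1, which=1:2)   # residuals vs. fitted, and normal Q-Q, for the linear fit
plot(l2, which=1:2)   # same again for the log-log fit

# Numeric summaries (not directly comparable across transformed responses)
summary(l1)$r.squared
summary(l2)$r.squared
```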
Lastly, we read in the forecast WTI data and use it to forecast the Ontario Gas price using our second model:
# Read in WTI forecast data
WTI_forecast <- read.csv(file="WTI_forecast.csv", header=F, sep=",", col.names=c("Date", "Value"))
# Forecast the Ontario Gas Price using the log-log model
# (requires the forecast package)
library(forecast)
fit <- forecast(l2, newdata=data.frame(WTI=WTI_forecast$Value))
# Back-transform the point forecasts, intervals, and history from the log scale
fit$mean <- exp(fit$mean)
fit$lower <- exp(fit$lower)
fit$upper <- exp(fit$upper)
fit$x <- exp(fit$x)
plot(fit, ylab='Ontario Average Gas Price (cents/L)')
And there you have it! Putting the forecast data (blue line) and the WTI forecast back into our original graph, we can compare the two:
It’s that easy, right? That’s all there is to it?
Sadly, no. I’ve done some very quick work here and demonstrated some of the types of tools that are available in R, but “real” time-series analysis is a practice which requires much more care and nuance.
For example, linear modeling assumes that variables are stationary (i.e. have constant mean and variance) and not auto-correlated, properties which are almost never true in the real world. Using methods like those above on non-stationary time series can result in what is known as “spurious regression”: finding relationships between variables which don’t really exist, even though the results show high R-squared values and low p-values.
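You can see spurious regression in action by regressing one simulated random walk on another, completely independent one. A quick sketch:

```r
set.seed(1)
n <- 300
x <- cumsum(rnorm(n))   # random walk #1 (non-stationary)
y <- cumsum(rnorm(n))   # random walk #2, generated independently of x
summary(lm(y ~ x))      # frequently reports a "significant" slope anyway
```

Despite there being no relationship at all between x and y, the regression will often report a sizable R-squared and a tiny p-value, which is exactly the trap.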
In these cases, testing the stationarity assumption and then massaging the data (differencing and deseasonalization) is usually required beforehand, or else other models with less strict assumptions than linear regression are more appropriate. For more on this, see the excellent book “Forecasting: Principles and Practice” by Rob J. Hyndman and George Athanasopoulos.
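A typical first step along those lines is a unit-root test followed by differencing. With the tseries package (an assumption here; it isn’t used elsewhere in this post), a sketch might be:

```r
library(tseries)
# Augmented Dickey-Fuller test: the null hypothesis is non-stationarity,
# so a small p-value is evidence the series is stationary
adf.test(ON)
adf.test(diff(log(ON)))   # log-differences (roughly monthly returns) are
                          # usually much closer to stationary
```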
The take-away, as always, is that real life is complicated, and so analysis requires care. Predicting the future has never been, and never will be, easy; either way, I just hope gas stays cheap.
References & Resources
Code &amp; data on GitHub
Fuel Price Data: Regular Unleaded Gasoline 2014 (Ontario Ministry of Energy)
Short-Term Energy Outlook (U.S. Energy Information Administration)
Forecasting: Principles and Practice (OTexts)