How do you build a model from first principles? Here is a step by step guide.
Following on from last week’s post on Principled Bayesian Workflow I want to reflect on how to motivate a model.
The purpose of most models is to understand change, and yet, considering what doesn’t change and should be kept constant can be equally important.
I will go through a couple of models in this post to illustrate this idea.
Two weeks ago I discussed various linear and generalised linear models in R using ice cream sales statistics. The data showed not surprisingly that more ice cream was sold at higher temperatures.
icecream <- data.frame( temp=c(11.9, 14.2, 15.2, 16.4, 17.2, 18.1, 18.5, 19.4, 22.1, 22.6, 23.4, 25.1), units=c(185L, 215L, 332L, 325L, 408L, 421L, 406L, 412L, 522L, 445L, 544L, 614L) ) I used a linear model, a log-transformed linear model, a Poisson and Binomial generalised linear model to predict sales within and outside the range of data available.
Linear models are the bread and butter of statistics, but there is a lot more to it than taking a ruler and drawing a line through a couple of points.
Some time ago Rasmus Bååth published an insightful blog article about how such models could be described from a distribution centric point of view, instead of the classic error terms convention.
I think the distribution centric view makes generalised linear models (GLM) much easier to understand as well.
The 100m mean’s sprint finals of the 2012 London Olympics are over and Usain Bolt won the gold medal again with a winning time of 9.63s. Time to compare the result with my forecast of 9.68s, posted on 22 July.
My simple log-linear model predicted a winning time of 9.68s with a prediction interval from 9.39s to 9.97s. Well, that is of course a big interval of more than half a second, or ±3%.
It is less than a week before the 2012 Olympic games will start in London. No surprise therefore that the papers are all over it, including a lot of data and statistis around the games. The Economist investigated the potential financial impact on sponsors (some benefits), tax payers (no benefits) and the athletes (if they are lucky) in its recent issue and video. The Guardian has awhole series around the Olympics, including the data of all Summer Olympic Medallists since 1896.