Introduction It’s been three years since the Casualty Actuarial Society published our research paper on Hierarchical Compartmental Reserving Models (Gesmann and Morris (2020)). Time to revisit it, as developments of the Stan language, and its interfaces such as cmdstanr and brms have progressed and simplified the treatment of differential equations.
We have updated the bookdown version version of the paper to take advantage of these newer versions.
This post will give another example of how to use hierarchical compartmental reserving models, but rather than working with historical claims data, we use the model to generate future data, as may be required for a business plan of a new product, where no historical data exists.
This post is about the Black-Litterman (BL) model for asset allocation and the basis of my talk at the Dublin Data Science Meet-up. The original BL paper (Black and Litterman (1991)) is over 30 years old and builds on the ideas of modern portfolio theory by Harry Markowitz (Markowitz (1952)). A good introduction to the BL model is (Idzorek (2005)) or (Maggiar (2009)).
I am not sure how much the model is used by investment professionals, as many of the underlying assumptions may not hold true in the real world.
This article illustrates how ordinary differential equations and multivariate observations can be modelled and fitted with the brms package (Bürkner (2017)) in R1.
As an example I will use the well known Lotka-Volterra model (Lotka (1925), Volterra (1926)) that describes the predator-prey behaviour of lynxes and hares. Bob Carpenter published a detailed tutorial to implement and analyse this model in Stan and so did Richard McElreath in Statistical Rethinking 2nd Edition (McElreath (2020)).
At the Insurance Data Science conference, both Eric Novik and Paul-Christian Bürkner emphasised in their talks the value of thinking about the data generating process when building Bayesian statistical models. It is also a key step in Michael Betancourt’s Principled Bayesian Workflow.
In this post, I will discuss in more detail how to set priors, and review the prior and posterior parameter distributions, but also the prior predictive distributions with brms (Bürkner (2017)).
Ahead of the Stan Workshop on Tuesday, here is another example of using brms (Bürkner (2017)) for claims reserving. This time I will use a model inspired by the 2012 paper A Bayesian Nonlinear Model for Forecasting Insurance Loss Payments (Zhang, Dukic, and Guszcza (2012)), which can be seen as a follow-up to Jim Guszcza’s Hierarchical Growth Curve Model (Guszcza (2008)).
I discussed Jim’s model in an earlier post using Stan.
How do you build a model from first principles? Here is a step by step guide.
Following on from last week’s post on Principled Bayesian Workflow I want to reflect on how to motivate a model.
The purpose of most models is to understand change, and yet, considering what doesn’t change and should be kept constant can be equally important.
I will go through a couple of models in this post to illustrate this idea.
This is a follow-up post on hierarchical compartmental reserving models using PK/PD models. It will show how differential equations can be used with Stan/ brms and how correlation for the same group level terms can be modelled.
PK/ PD is usually short for pharmacokinetic/ pharmacodynamic models, but as Eric Novik of Generable pointed out to me, it could also be short for Payment Kinetics/ Payment Dynamics Models in the insurance context.
Today, I will sketch out ideas from the Hierarchical Compartmental Models for Loss Reserving paper by Jake Morris, which was published in the summer of 2016 (Morris (2016)). Jake’s model is inspired by PK/PD models (pharmacokinetic/pharmacodynamic models) used in the pharmaceutical industry to describe the time course of effect intensity in response to administration of a drug dose.
The hierarchical compartmental model fits outstanding and paid claims simultaneously, combining ideas of Clark (2003), Quarg and Mack (2004), Miranda, Nielsen, and Verrall (2012), Guszcza (2008) and Zhang, Dukic, and Guszcza (2012).
Last week I wrote about Glenn Meyers’ correlated log-normal chain-ladder model (CCL), which he presented at the 10th Bayesian Mixer Meetup. Today, I will continue with a variant Glenn also discussed: The changing settlement log-normal chain-ladder model (CSR).
Glenn used the correlated log-normal chain-ladder model on reported incurred claims data to predict future developments.
However, when looking at paid claims data, Glenn suggested to change the model slightly. Instead allowing for correlation across accident years, he allows for a gradual shift in the payout pattern to account for a change in the claim settlement rate across accident years.
On 23 November Glenn Meyers gave a fascinating talk about The Bayesian Revolution in Stochastic Loss Reserving at the 10th Bayesian Mixer Meetup in London. Glenn worked for many years as a research actuary at Verisk/ ISO, he helped to set up the CAS Loss Reserve Database and published a monograph on Stochastic loss reserving using Bayesian MCMC models.
In this blog post I will go through the Correlated Log-normal Chain-Ladder Model from his presentation.
Last Tuesday we got together for the 4th Bayesian Mixer Meetup. Product Madness kindly hosted us at their offices in Euston Square. About 50 Bayesians came along; the biggest turn up thus far, including developers of PyMC3 (Peadar Coyle) and Stan (Michael Betancourt).
The agenda had two feature talks by Dominic Steinitz and Volodymyr Kazantsev and a lightning talk by Jon Sedar.
Dominic Steinitz: Hamiltonian and Sequential MC samplers to model ecosystems Dominic shared with us his experience of using Hamiltonian and Sequential Monte Carlo samplers to model ecosystems.
Last week the French National Institute of Health and Medical Research (Inserm) organised with the Stan Group a training programme on Bayesian Inference with Stan for Pharmacometrics in Paris.
Daniel Lee and Michael Betancourt, who run the course over three days, are not only members of Stan’s development team, but also excellent teachers. Both were supported by Eric Novik, who gave an Introduction to Stan at the Paris Dataiku User Group last week as well.
Two Bayesian Mixer meet-ups in a row. Can it get any better?
Our third ‘regular’ meeting took place at Cass Business School on 24 June. Big thanks to Pietro and Andreas, who supported us from Cass. The next day, Jon Sedar of Applied AI, managed to arrange a special summer PyMC3 event. 3rd Bayesian Mixer meet-up First up was Luis Usier, who talked about cross validation. Luis is a former student of Andrew Gelman, so, of course, his talk touched on Stan and the ‘loo’ (leave one out) package in R.
Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event. Bayesian Mixer at Cass First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and emerging relations among components.
The very little prior knowledge about complex molecules bindings left a fertile field for a probabilistic graphical model.
We had our first successful Bayesian Mixer Meetup last Friday night at the Artillery Arms!
We expected about 15 - 20 people to turn up, when we booked the function room overlooking Bunhill Cemetery and Bayes’ grave. Now, looking at the photos taken during the evening, it seems that our prior believe was pretty good.
The event started with a talk from my side about some very basic Bayesian models, which I used a while back to get my head around the concepts in an insurance context.
There is a nice pub between Bunhill Fields and the Royal Statistical Society in London: The Artillery Arms. Clearly, the perfect place to bring people together to talk about Bayesian Statistics. Well, that’s what Jon Sedar (@jonsedar, applied.ai) and I thought.
Source: http://www.artillery-arms.co.uk/ Hence, we’d like to organise a Bayesian Mixer Meetup on Friday, 12 February, 19:00. We booked the upstairs function room at the Artillery Arms and if you look outside the window, you can see Thomas Bayes’ grave.
Last week I posted a biological example of fitting a non-linear growth curve with Stan/RStan. Today, I want to apply a similar approach to insurance data using ideas by David Clark [1] and James Guszcza [2].
Instead of predicting the growth of dugongs (sea cows), I would like to predict the growth of cumulative insurance loss payments over time, originated from different origin years. Loss payments of younger accident years are just like a new generation of dugongs, they will be small in size initially, grow as they get older, until the losses are fully settled.
It seems the summer is coming to end in London, so I shall take a final look at my ice cream data that I have been playing around with to predict sales statistics based on temperature for the last couple of weeks [1], [2], [3].
Here I will use the new brms (GitHub, CRAN) package by Paul-Christian Bürkner to derive the 95% prediction credible interval for the four models I introduced in my first post about generalised linear models.
I continue my Stan experiments with another insurance example. Here I am particular interested in the posterior predictive distribution from only three data points. Or, to put it differently I have a customer of three years and I’d like to predict the expected claims cost for the next year to set or adjust the premium.
The example is taken from section 16.17 in Loss Models: From Data to Decisions [1].
In my previous post I discussed how Longley-Cook, an actuary at an insurance company in the 1950’s, used Bayesian reasoning to estimate the probability for a mid-air collision of two planes.
Here I will use the same model to get started with Stan/RStan, a probabilistic programming language for Bayesian inference.
Last week my prior was given as a Beta distribution with parameters \(\alpha=1, \beta=1\) and the likelihood was assumed to be a Bernoulli distribution with parameter \(\theta\): \[\begin{aligned} \theta & \sim \mbox{Beta}(1, 1)\\ y_i & \sim \mbox{Bernoulli}(\theta), \;\forall i \in N \end{aligned}\]For the previous five years no mid-air collision were observed, \(x=\{0, 0, 0, 0, 0\}\).
Suppose you have to predict the probabilities of events which haven’t happened yet. How do you do this?
Here is an example from the 1950s when Longley-Cook, an actuary at an insurance company, was asked to price the risk for a mid-air collision of two planes, an event which as far as he knew hadn’t happened before. The civilian airline industry was still very young, but rapidly growing and all Longely-Cook knew was that there were no collisions in the previous 5 years [1].
Last week’s post about the Kalman filter focused on the derivation of the algorithm. Today I will continue with the extended Kalman filter (EKF) that can deal also with nonlinearities. According to Wikipedia the EKF has been considered the de facto standard in the theory of nonlinear state estimation, navigation systems and GPS. Kalman filter I had the following dynamic linear model for the Kalman filter last week:
At the last Cologne R user meeting Holger Zien gave a great introduction to dynamic linear models (dlm). One special case of a dlm is the Kalman filter, which I will discuss in this post in more detail. I kind of used it earlier when I measured the temperature with my Arduino at home.
Over the last week I came across the wonderful quantitative economic modelling site quant-econ.net, designed and written by Thomas J.
Last week’s Cologne R user group meeting was the best attended so far, and it was a remarkable event - I believe not a single line of R code was shown. Still, it was an R user group meeting with two excellent talks, and you will understand shortly why not much R code needed to be displayed. Introduction to Julia for R Users Download slides Hans Werner Borchers joined us from Mannheim to give an introduction to Julia for R users.
It is really getting colder in London - it is now about 5°C outside. The heating is on and I have got better at measuring the temperature at home as well. Or, so I believe.
Last week’s approach of me guessing/feeling the temperature combined with an old thermometer was perhaps too simplistic and too unreliable. This week’s attempt to measure the temperature with my Arduino might be a little OTT (over the top), but at least I am using the micro-controller again.
It is getting colder in London, yet it is still quite mild considering that it is late November. Well, indoors it still feels like 20°C (68°F) to me, but I have been told last week that I should switch on the heating.
Luckily I found an old thermometer to check. The thermometer showed 18°C. Is it really below 20°C?
The thermometer is quite old and I’m not sure that is works properly anymore.
At the R in Insurance conference Arthur Charpentier gave a great keynote talk on Bayesian modelling in R. Bayes’ theorem on conditional probabilities is strikingly simple, yet incredibly thought provoking. Here is an example from Daniel Kahneman to test your intuition. But first I have to start with Bayes’ theorem. Bayes’ theorem Bayes’ theorem states that given two events \(D\) and \(H\), the probability of \(D\) and \(H\) happening at the same time is the same as the probability of \(D\) occurring, given \(H\), weighted by the probability that \(H\) occurs; or the other way round.
Rasmus’ post of last week on binomial testing made me think about p-values and testing again. In my head I was tossing coins, thinking about gender diversity and toast. The toast and tossing a buttered toast in particular was the most helpful thought experiment, as I didn’t have a fixed opinion on the probabilities for a toast to land on either side. I have yet to carry out some real experiments.
Following on from last week, where I presented a simple example of a Bayesian network with discrete probabilities to predict the number of claims for a motor insurance customer, I will look at continuous probability distributions today. Here I follow example 16.17 in Loss Models: From Data to Decisions [1].
Suppose there is a class of risks that incurs random losses following an exponential distribution (density \(f(x) = \Theta {e}^{- \Theta x}\)) with mean \(1/\Theta\).
Here is a little Bayesian Network to predict the claims for two different types of drivers over the next year, see also example 16.15 in [1].
Let’s assume there are good and bad drivers. The probabilities that a good driver will have 0, 1 or 2 claims in any given year are set to 70%, 20% and 10%, while for bad drivers the probabilities are 50%, 30% and 20% respectively.