# Stan

## Use domain knowledge to review prior distributions

At the Insurance Data Science conference, both Eric Novik and Paul-Christian Bürkner emphasised in their talks the value of thinking about the data generating process when building Bayesian statistical models. It is also a key step in Michael Betancourt’s Principled Bayesian Workflow. In this post, I will discuss in more detail how to set priors, and review the prior and posterior parameter distributions, but also the prior predictive distributions with brms (Bürkner (2017)).

## Hierarchical loss reserving with growth curves using brms

Ahead of the Stan Workshop on Tuesday, here is another example of using brms (Bürkner (2017)) for claims reserving. This time I will use a model inspired by the 2012 paper A Bayesian Nonlinear Model for Forecasting Insurance Loss Payments (Zhang, Dukic, and Guszcza (2012)), which can be seen as a follow-up to Jim Guszcza’s Hierarchical Growth Curve Model (Guszcza (2008)). I discussed Jim’s model in an earlier post using Stan.

## Models are about what changes, and what doesn't

How do you build a model from first principles? Here is a step-by-step guide. Following on from last week’s post on Principled Bayesian Workflow, I want to reflect on how to motivate a model. The purpose of most models is to understand change, and yet considering what doesn’t change and should be kept constant can be equally important. I will go through a couple of models in this post to illustrate this idea.

## PK/PD reserving models

This is a follow-up post on hierarchical compartmental reserving models using PK/PD models. It will show how differential equations can be used with Stan/brms and how correlation for the same group-level terms can be modelled. PK/PD is usually short for pharmacokinetic/pharmacodynamic models, but as Eric Novik of Generable pointed out to me, it could also be short for Payment Kinetics/Payment Dynamics Models in the insurance context.

## Hierarchical compartmental reserving models

Today, I will sketch out ideas from the Hierarchical Compartmental Models for Loss Reserving paper by Jake Morris, which was published in the summer of 2016 (Morris (2016)). Jake’s model is inspired by PK/PD models (pharmacokinetic/pharmacodynamic models) used in the pharmaceutical industry to describe the time course of effect intensity in response to administration of a drug dose. The hierarchical compartmental model fits outstanding and paid claims simultaneously, combining ideas of Clark (2003), Quarg and Mack (2004), Miranda, Nielsen, and Verrall (2012), Guszcza (2008) and Zhang, Dukic, and Guszcza (2012).

## Changing settlement rate model for paid losses

Last week I wrote about Glenn Meyers’ correlated log-normal chain-ladder model (CCL), which he presented at the 10th Bayesian Mixer Meetup. Today, I will continue with a variant Glenn also discussed: the changing settlement rate log-normal chain-ladder model (CSR). Glenn used the correlated log-normal chain-ladder model on reported incurred claims data to predict future developments. However, when looking at paid claims data, Glenn suggested changing the model slightly. Instead of allowing for correlation across accident years, he allows for a gradual shift in the payout pattern to account for a change in the claim settlement rate across accident years.

On 23 November Glenn Meyers gave a fascinating talk about The Bayesian Revolution in Stochastic Loss Reserving at the 10th Bayesian Mixer Meetup in London. Glenn worked for many years as a research actuary at Verisk/ISO, helped to set up the CAS Loss Reserve Database, and published a monograph on Stochastic loss reserving using Bayesian MCMC models. In this blog post I will go through the Correlated Log-normal Chain-Ladder Model from his presentation.

## Notes from 4th Bayesian Mixer Meetup

Last Tuesday we got together for the 4th Bayesian Mixer Meetup. Product Madness kindly hosted us at their offices in Euston Square. About 50 Bayesians came along, the biggest turnout thus far, including developers of PyMC3 (Peadar Coyle) and Stan (Michael Betancourt). The agenda had two feature talks by Dominic Steinitz and Volodymyr Kazantsev and a lightning talk by Jon Sedar.

### Dominic Steinitz: Hamiltonian and Sequential MC samplers to model ecosystems

Dominic shared with us his experience of using Hamiltonian and Sequential Monte Carlo samplers to model ecosystems.

## Fitting a distribution in Stan from scratch

Last week the French National Institute of Health and Medical Research (Inserm) organised with the Stan Group a training programme on Bayesian Inference with Stan for Pharmacometrics in Paris. Daniel Lee and Michael Betancourt, who ran the course over three days, are not only members of Stan's development team, but also excellent teachers. Both were supported by Eric Novik, who gave an Introduction to Stan at the Paris Dataiku User Group last week as well.

Photo: Eric Kramer (Dataiku), Daniel Lee, Eric Novik & Michael Betancourt (Stan Group)

I have been playing around with Stan on and off for some time, but as Eric pointed out to me, Stan is not that kind of girl (boy?

## Notes from 3rd and 3.5th Bayesian Mixer Meetup

Two Bayesian Mixer meet-ups in a row. Can it get any better? Our third ‘regular’ meeting took place at Cass Business School on 24 June. Big thanks to Pietro and Andreas, who supported us from Cass. The next day, Jon Sedar of Applied AI managed to arrange a special summer PyMC3 event.

### 3rd Bayesian Mixer meet-up

First up was Luis Usier, who talked about cross validation. Luis is a former student of Andrew Gelman, so, of course, his talk touched on Stan and the ‘loo’ (leave one out) package in R.

## Notes from 2nd Bayesian Mixer Meetup

Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event.

Photo: Bayesian Mixer at Cass

First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and emerging relations among components. The scarcity of prior knowledge about complex molecule bindings left a fertile field for a probabilistic graphical model. In particular, Bayesian networks can help the investigator define a conditional dependence/independence structure from which a joint multivariate probability distribution is determined.

## Bayesian Mixer on Meetup

We had our first successful Bayesian Mixer Meetup last Friday night at the Artillery Arms! We expected about 15 to 20 people to turn up when we booked the function room overlooking Bunhill Cemetery and Bayes’ grave. Now, looking at the photos taken during the evening, it seems that our prior belief was pretty good. The event started with a talk from my side about some very basic Bayesian models, which I used a while back to get my head around the concepts in an insurance context.

## Hierarchical Loss Reserving with Stan

I continue with the growth curve model for loss reserving from last week's post. Today, following the ideas of James Guszcza [2], I will add a hierarchical component to the model by treating the ultimate loss cost of an accident year as a random effect. Initially, I will use the nlme R package, just as James did in his paper, and then move on to Stan/RStan [6], which will allow me to estimate the full distribution of future claims payments.

## Loss Developments via Growth Curves and Stan

Last week I posted a biological example of fitting a non-linear growth curve with Stan/RStan. Today, I want to apply a similar approach to insurance data using ideas by David Clark [1] and James Guszcza [2]. Instead of predicting the growth of dugongs (sea cows), I would like to predict the growth of cumulative insurance loss payments over time, originating from different origin years. Loss payments of younger accident years are just like a new generation of dugongs: they will be small in size initially and grow as they get older, until the losses are fully settled.

## Non-linear growth curves with Stan

I suppose the go-to tool for fitting non-linear models in R is nls of the stats package. In this post I will show an alternative approach with Stan/RStan, as illustrated in the example, Dugongs: “nonlinear growth curve”, that is part of Stan's documentation. The original example itself is taken from OpenBUGS. The data describes the length and age measurements for 27 captured dugongs (sea cows). Carlin and Gelfand (1991) model the data using a nonlinear growth curve with no inflection point and an asymptote as $x_i$ tends to infinity:

$$
\begin{aligned}
Y_i & \sim \mathcal{N}(\mu_i, \sigma^2), \quad i = 1,\dots,27\\
\mu_i & = \alpha - \beta \lambda^{x_i}, \quad \alpha, \beta > 0, \; 0 < \lambda < 1
\end{aligned}
$$

Fitting the curve with nls gives the following results:

Writing the model in Stan requires a few more lines, but also gives me the opportunity to generate output from the posterior distributions.
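To make the shape of the growth curve above concrete, here is a minimal Python sketch that evaluates the mean function $\mu = \alpha - \beta \lambda^{x}$ directly. It is not the nls or Stan code from the post; the function name `dugong_growth` and the parameter values are hypothetical and chosen only for illustration, not fitted estimates.

```python
def dugong_growth(x, alpha, beta, lam):
    """Mean length mu = alpha - beta * lam**x at age x.

    Requires alpha, beta > 0 and 0 < lam < 1, as in the model statement.
    """
    if not (alpha > 0 and beta > 0 and 0 < lam < 1):
        raise ValueError("need alpha, beta > 0 and 0 < lam < 1")
    return alpha - beta * lam ** x


# Hypothetical parameter values for illustration only (not fitted estimates).
alpha, beta, lam = 2.5, 1.0, 0.9

ages = [1, 5, 10, 30]
lengths = [dugong_growth(a, alpha, beta, lam) for a in ages]

for a, m in zip(ages, lengths):
    print(f"age {a:>2}: mean length {m:.3f}")
```

Because $0 < \lambda < 1$, the term $\beta \lambda^{x}$ shrinks towards zero as age increases, so the curve rises monotonically from $\alpha - \beta$ at age zero towards the asymptote $\alpha$.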

## Bayesian regression models using Stan in R

It seems the summer is coming to an end in London, so I shall take a final look at my ice cream data that I have been playing around with to predict sales statistics based on temperature for the last couple of weeks [1], [2], [3]. Here I will use the new brms (GitHub, CRAN) package by Paul-Christian Bürkner to derive the 95% prediction credible interval for the four models I introduced in my first post about generalised linear models.

## Visualising the predictive distribution of a log-transformed linear model

Last week I presented visualisations of theoretical distributions that predict ice cream sales statistics based on linear and generalised linear models, which I introduced in an earlier post.

Figure: Theoretical distributions

Today I will take a closer look at the log-transformed linear model and use Stan/rstan, not only to model the sales statistics, but also to generate samples from the posterior predictive distribution. The posterior predictive distribution is what I am most interested in. From the simulations I can get the 95% prediction interval, which will be slightly wider than the theoretical 95% interval, as it takes into account the parameter uncertainty as well.
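As a rough illustration of what a "theoretical" interval means for a log-transformed linear model, the following Python sketch computes a plug-in 95% interval on the original scale, assuming the log of the response is normal with known mean `mu` and standard deviation `sigma`. The function name and the numbers are hypothetical, and this is not the post's Stan code. It treats the parameters as fixed, which is exactly why an interval based on posterior predictive simulations would be expected to come out somewhat wider.

```python
import math
from statistics import NormalDist


def lognormal_interval(mu, sigma, level=0.95):
    """Central interval for Y where log(Y) ~ Normal(mu, sigma).

    Parameters are treated as known, so parameter uncertainty is ignored.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)


# Hypothetical values on the log scale, for illustration only.
mu, sigma = 6.0, 0.2

lower, upper = lognormal_interval(mu, sigma)
median = math.exp(mu)
print(f"median {median:.1f}, 95% interval ({lower:.1f}, {upper:.1f})")
```

The interval is symmetric on the log scale but skewed on the original scale: the median `exp(mu)` is the geometric mean of the two bounds rather than their midpoint.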

## Posterior predictive output with Stan

I continue my Stan experiments with another insurance example. Here I am particularly interested in the posterior predictive distribution from only three data points. Or, to put it differently, I have a customer of three years and I'd like to predict the expected claims cost for the next year to set or adjust the premium. The example is taken from section 16.17 in Loss Models: From Data to Decisions [1]. Some time ago I used the same example to get my head around a Bayesian credibility model.

## Hello Stan!

In my previous post I discussed how Longley-Cook, an actuary at an insurance company in the 1950s, used Bayesian reasoning to estimate the probability of a mid-air collision of two planes. Here I will use the same model to get started with Stan/RStan, a probabilistic programming language for Bayesian inference. Last week my prior was given as a Beta distribution with parameters $\alpha=1, \beta=1$ and the likelihood was assumed to be a Bernoulli distribution with parameter $\theta$:

$$
\begin{aligned}
\theta & \sim \mbox{Beta}(1, 1)\\
y_i & \sim \mbox{Bernoulli}(\theta), \; \forall i \in N
\end{aligned}
$$

For the previous five years no mid-air collisions were observed, $x=\{0, 0, 0, 0, 0\}$.
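Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior for this model is available in closed form, which gives a handy check on any sampler output. The Python sketch below is not the post's Stan code; the helper name `beta_bernoulli_update` is hypothetical. It applies the standard update: a Beta(a, b) prior with s ones and f zeros in the data gives a Beta(a + s, b + f) posterior.

```python
from fractions import Fraction


def beta_bernoulli_update(alpha, beta, observations):
    """Posterior Beta parameters after observing a list of 0/1 outcomes."""
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


# Beta(1, 1) prior and five collision-free years, as stated in the text.
x = [0, 0, 0, 0, 0]
post_alpha, post_beta = beta_bernoulli_update(1, 1, x)

# The mean of a Beta(a, b) distribution is a / (a + b).
posterior_mean = Fraction(post_alpha, post_alpha + post_beta)
print(post_alpha, post_beta, posterior_mean)  # prints: 1 6 1/7
```

With five zeros the posterior is Beta(1, 6), so the posterior mean of $\theta$ falls from the prior mean of 1/2 to 1/7.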