Hierarchical loss reserving with growth curves using brms

Ahead of the Stan Workshop on Tuesday, here is another example of using brms (Bürkner (2017)) for claims reserving. This time I will use a model inspired by the 2012 paper A Bayesian Nonlinear Model for Forecasting Insurance Loss Payments (Zhang, Dukic, and Guszcza (2012)), which can be seen as a follow-up to Jim Guszcza’s Hierarchical Growth Curve Model (Guszcza (2008)).

I discussed Jim’s model in an earlier post using Stan. Paul-Christian Bürkner showed then a little later how to implement this model using his brms package as part of the vignette Estimating Non-Linear Models with brms.

The Bayesian model proposed in (Zhang, Dukic, and Guszcza (2012)) predicts future claim payments across several insurance companies using growth curves.

The abstract of the paper motivates the model well:

“This approach enables us to carry out inference at the level of industry, company, and/or accident year, based on the full posterior distribution of all quantities of interest. In addition, prior experience and expert opinion can be incorporated into the analyses through judgmentally selected prior probability distributions.”


Here is the data of the paper, which shows premiums and claim payments for 10 US insurance companies, for accidents years 1988 to 1997. In some cases the claims development is available for 10 years. The data itself is sourced from old regulatory filings.

url <- "https://raw.githubusercontent.com/mages/diesunddas/master/Data/wc_data.csv"
WorkersComp <- fread(url, key=c("entity_name", "origin_year", "dev_year"))
WorkersComp <- WorkersComp[, loss_ratio := cumulative_paid/premium]
xyplot(loss_ratio*100 ~ dev_year | entity_name, groups = origin_year,
       data=WorkersComp, t="l", layout=c(5,2), as.table=TRUE, 
       ylab="Loss ratio (%)", xlab="Dev year",
       main="Paid loss ratio developments by company and accident year",
       auto.key = list(space="top", columns=5),
       par.settings = theEconomist.theme(),
       scales = list(alternating=1))

As usual, it is the aim to predict the future claims payments for the various accident years.

I will take a subset of the data to fit my model, and use the hold out data to review its performance.

For my training data set I will use data up to calendar year 1996, i.e. I exclude the 1997 accident year and future developments, ending up with triangles at the end of 1996.

WorkersComp[, loss_ratio_train := ifelse(dev_year + origin_year - 1 < 1997,
                                         loss_ratio, NA)]


Growth curve

Growth curves are used to model the claims development process over time, see for example (Clark (2003)).

We can model the claims amount over time as: \[ \mbox{Paid claims}(t) = \mbox{Premiums} \cdot \gamma \cdot G(t) \] Here \(\gamma = \mbox{Ultimate paid claims} / \mbox{Premiums}\) represents the ultimate loss ratio (ULR) and \(G(t)\) a growth curve of cumulative paid claims to ultimate.

Alternatively, we can look at the loss ratio \(\ell(t) = \mbox{Paid claims}(t)/\mbox{Premiums}\) development:

\[ \ell(t) = \gamma \cdot G(t) \]

In this post I will use the loss ratio version, as it keeps the metrics around 1, regardless of the size of the insurance company.

There are many different options for growth curves (Panik (2014)). Here, following the paper (Zhang, Dukic, and Guszcza (2012)) I will also use the log-logistic curve:

\[ G(t; \theta, \omega, ) = \frac{t^{\omega}}{t^{\omega} + \theta^{\omega}} \]

growthcurve <- function(t, omega, theta) t^omega/(t^omega + theta^omega)
curve(growthcurve(x, 2, 4)*100, from = 0, to = 10, bty="n", ylim=c(0,100),
      main="Log-logistic gowth curve", ylab="% developed", xlab="Development period")

The growth curve parameter \(\theta\) corresponds to the time at which half of the growth has occurred, and the parameter \(\omega\) describes the slope of the curve around \(\theta\).

Hierarchical model

I shall assume that loss ratios follow a log-normal distribution, with the median described by growth curves to ultimate loss ratios. In addition, I believe that the ULRs and shapes of the growth curves are company \((k)\) specific and correlated, e.g. companies with lower ULRs have faster payment process; a lower value of \(\theta\), and a higher value of \(\omega\), but the ULR is expected to vary across accident years \((i)\) as well.

\[ \begin{aligned} \ell_{ik}(t) & \sim \log \mathcal{N}(\log{\mu_{ik}(t)}, \sigma^2) \\ \mu_{ik}(t) & = \gamma_{ik} \; G_k(t; \omega_k, \theta_k) \\ G_k(t; \omega_k, \theta_k) & = \frac{t^{\omega_k}}{t^{\omega_k} + \theta_k^{\omega_k}} \\ \gamma_{ik} & = \gamma + \gamma_k^0 + \gamma_{ik}^0\\ \omega_k & = \omega + \omega_k^0 \\ \theta_k & = \theta + \theta_k^0 \\ \mbox{Priors} & :\\ \gamma & \sim \log \mathcal{N}(\log(0.6), \log(2))\\ \theta & \sim \mathcal{N}(4, 1)^+ \\ \omega & \sim \mathcal{N}(2, 1)^+ \\ \sigma & \sim \mbox{Student-t}(3,0,1)^+\\ (\gamma_k^0, \omega_k^0, \theta_k^0) & \sim \mathcal{MN}\left(\mathbf{0}, \mathbf{\Sigma}\right) \\ \mathbf{\Sigma} & = \mathbf{D}(\sigma^0_{\gamma}, \sigma^0_{\omega}, \sigma^0_{\theta})\, \mathbf{\Omega}\,\mathbf{D}(\sigma^0_{\gamma}, \sigma^0_{\omega}, \sigma^0_{\theta})\\ \sigma^0_{\gamma}, \sigma^0_{\omega}, \sigma^0_{\theta}& \sim \mbox{Student-t}(3, 0, 1)^+\\ \mathbf{\Omega} & \sim \mbox{LKJ}(2) \\ \gamma_{ik}^0 & \sim \mathcal{N}(0, \sigma^2_{[ik]})\\ \sigma_{[ik]} & \sim \mbox{Student-t}(3,0,1)^+ \\ \end{aligned} \]

This model is similar to the one proposed in (Zhang, Dukic, and Guszcza (2012)), but there are some differences:

  • loss ratios are modelled instead of loss amounts,
  • \(\sigma\) is kept constant across all companies \(k\),
  • \(\gamma_k^0, \omega_k^0, \theta_k^0\) are modelled via a multivariate normal, instead of a log-normal distribution,
  • the prior for \(\Omega\) is assumed to be a \(LKJ\) instead of an inverse-Wishart distribution,
  • \(\gamma_{ik}^0\) are modelled additive and drawn from a normal, instead of log-normal distribution,
  • there is no auto-regressive process,
  • development years are used instead of development months.

Hopefully, all of the above changes have little impact on the model performance, but will make the use of brm easier/ possible.

Fitting the model

First I load the relevant R packages.

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

Next I specify the non-linear hierarchical model and set my prior distributions.

To allow for interaction between companies and accident years I use the “:” operator, additionally I add ID as a place holder to model the group effect of entity_name on ulr, omega and theta and to allow for correlation at company level. For more details see (Bürkner (2017)).

frml <- bf(loss_ratio_train ~ log(ulr * dev_year^omega/(dev_year^omega + theta^omega)),
           ulr ~ 1 +  (1|ID|entity_name) + (1|origin_year:entity_name) ,
           omega ~ 1 + (1|ID|entity_name),
           theta ~ 1 + (1|ID|entity_name),
           nl = TRUE)

my_priors <- c(
  prior(lognormal(log(0.6), log(2)), nlpar = "ulr", lb=0),
  prior(normal(2, 1), nlpar = "omega", lb=0),
  prior(normal(4, 1), nlpar = "theta", lb=0),
  prior(student_t(3, 0, 1), class = "sigma"),
  prior(student_t(3, 0, 1), class = "sd", nlpar = "omega"),
  prior(student_t(3, 0, 1), class = "sd", nlpar = "theta"),
  prior(student_t(3, 0, 1), class = "sd", nlpar = "ulr"),
  prior(lkj(2), class="cor"))

mdl <- brm(frml, data = WorkersComp, prior = my_priors, seed = 1234,
           family = lognormal(link = "identity", link_sigma = "identity"),
           control = list(adapt_delta = 0.999, max_treedepth=15))

Get yourself a cup of tea, as the sampling takes about 15 - 30 minutes, depending on your computer

Model review

Let’s look at the output of brm.

##  Family: lognormal 
##   Links: mu = identity; sigma = identity 
## Formula: loss_ratio_train ~ log(ulr * dev_year^omega/(dev_year^omega + theta^omega)) 
##          ulr ~ 1 + (1 | ID | entity_name) + (1 | origin_year:entity_name)
##          omega ~ 1 + (1 | ID | entity_name)
##          theta ~ 1 + (1 | ID | entity_name)
##    Data: WorkersComp (Number of observations: 450) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## Group-Level Effects: 
## ~entity_name (Number of levels: 10) 
##                                      Estimate Est.Error l-95% CI u-95% CI
## sd(ulr_Intercept)                        0.05      0.03     0.01     0.11
## sd(omega_Intercept)                      0.12      0.04     0.06     0.23
## sd(theta_Intercept)                      0.11      0.04     0.04     0.21
## cor(ulr_Intercept,omega_Intercept)      -0.14      0.33    -0.73     0.52
## cor(ulr_Intercept,theta_Intercept)       0.23      0.34    -0.50     0.80
## cor(omega_Intercept,theta_Intercept)     0.09      0.32    -0.52     0.69
##                                      Eff.Sample Rhat
## sd(ulr_Intercept)                           735 1.00
## sd(omega_Intercept)                        2719 1.00
## sd(theta_Intercept)                        2394 1.00
## cor(ulr_Intercept,omega_Intercept)         1676 1.00
## cor(ulr_Intercept,theta_Intercept)         2547 1.00
## cor(omega_Intercept,theta_Intercept)       3303 1.00
## ~origin_year:entity_name (Number of levels: 90) 
##                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sd(ulr_Intercept)     0.11      0.01     0.10     0.13       1294 1.00
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## ulr_Intercept       0.71      0.02     0.66     0.75       2707 1.00
## omega_Intercept     1.82      0.04     1.73     1.90       2678 1.00
## theta_Intercept     2.12      0.04     2.03     2.21       4000 1.00
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     0.03      0.00     0.03     0.04       4000 1.00
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

The output looks promising, particularly for ulr, omega and theta. The estimators are very similar to the ones published in (Zhang, Dukic, and Guszcza (2012)).

Perhaps, the correlation term is not relevant, given the wide credible intervals. However, the positive correlation between ULR and \(\theta\) (higher ULR correlates with longer claims settlement process) was expected, equally a negative correlation between ULR and \(\omega\) looks sensible.


The plots don’t show anything that raise concerns on my side.

Thus, let’s move on to something more interesting. I’d like to compare the estimated parameters \(\gamma_k\) for the ULRs across companies.

theme_update(text = element_text(family = "sans"))
posterior <- as.array(mdl)
   pars = names(mdl$fit)[c(1, 12:21)],
   prob = 0.8, # 80% intervals
   prob_outer = 0.99, # 99%
   point_est = "mean"
) + ggplot2::labs(
  title = "ULR posterior distributions",
  subtitle = "with medians and 80% intervals"

Now this is interesting! For the industry the expected URL is around 70% across all years. However, the chart shows as well how the companies vary from the industry in an additive way. Companies with a distribution left of \(0\) are likely to out-perform the industry, while companies to the right of \(0\) are likely under-performers.

Model prediction

Let’s use our model to predict future claims up to 1997 and shows also the 2.5%ile and 97.5%ile of the posterior predictive distribution.

What I really like about this model is that I can also make a prediction for the 1997 accident year, although I used only data up to 1996. The model estimated for each company the “average” development pattern and ULR, which can be used to give us an initial view for the 1997 accident year - basically a sensible planning assumption, based on past experience.

n_dev_years <- max(WorkersComp$dev_year)
n_entities <- length(unique(WorkersComp$entity_name))
n_origin_years <- length(unique(WorkersComp$origin_year))
newdat <- data.table(
  entity_name = rep(unique(WorkersComp$entity_name), each=n_origin_years * n_dev_years),
  origin_year = rep(unique(WorkersComp$origin_year), n_entities * n_dev_years),
  dev_year = rep(rep(c(1:n_dev_years), n_entities), each=n_origin_years), 
  key = c("entity_name", "origin_year", "dev_year"))

pp <- posterior_predict(mdl, newdata = newdat, allow_new_levels=TRUE)
plotData <- WorkersComp[newdat]
plotData$PredLR <- apply(pp, 2, mean)
plotData$PredLR25 <- apply(pp, 2, quantile, probs=0.025)
plotData$PredLR975 <- apply(pp, 2, quantile, probs=0.975)
plotData$loss_ratio_test <- WorkersComp[plotData]$loss_ratio

To visualise the output I use a function from an earlier post.

plotDevBananas(PredLR25*100 + PredLR975*100 + PredLR*100 + 
                 loss_ratio_train*100 + loss_ratio_test*100 ~ 
                 dev_year |  factor(origin_year) * entity_name,
               data=plotData[order(origin_year, dev_year)],  main="")


Overall, I am very happy with the predictions and the plots. The more recent accident years have a wider development funnel, as we have less data available.

The actual performance of the 1997 accident year looks a little peculiar. The recorded loss ratios for Great Amer is much worse than predicted, the loss ratios for Hartford Fire, Ohio Cas and most significantly for State Farm performed better than expected. Indeed, all 3 of them seem to have improved their performance from 1995 onwards.


Modelling several companies at the same time can be a powerful approach to incorporate more data into the reserving process. The model shown here not only predicted future claims, but as a by-product ULRs by company and for the industry. Further, it allowed me to make an initial prediction for future accident years, which can be helpful for business planning and pricing.

Using parametric growth curves can provide a robust approach to model the claims process and avoids the selection of tail factors in traditional reserving methods, such as chain-ladder.

Motivating growth curves via differential equations might be an appropriate step to take. I really like Jake Morris’ approach of hierarchical compartmental models for loss reserving (Morris (2016)), which I outlined in an earlier post. It highlights the value of developing an understanding of the data generating process.

Finally, one can take the generated Stan code of brm, use stancode(mdl) to access it, to create even more sophisticated models than I have shown here.

You can find further articles on claims reserving on my blog.

Session Info

## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.14
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## other attached packages:
##  [1] bindrcpp_0.2.2      bayesplot_1.6.0     brms_2.5.0         
##  [4] Rcpp_0.12.18        rstan_2.17.3        StanHeaders_2.17.2 
##  [7] ggplot2_3.0.0       latticeExtra_0.6-28 RColorBrewer_1.1-2 
## [10] lattice_0.20-35     data.table_1.11.4  
## loaded via a namespace (and not attached):
##  [1] Brobdingnag_1.2-6    gtools_3.8.1         threejs_0.3.1       
##  [4] shiny_1.1.0          assertthat_0.2.0     stats4_3.5.1        
##  [7] yaml_2.2.0           pillar_1.3.0         backports_1.1.2     
## [10] glue_1.3.0           digest_0.6.17        promises_1.0.1      
## [13] colorspace_1.3-2     htmltools_0.3.6      httpuv_1.4.5        
## [16] Matrix_1.2-14        plyr_1.8.4           dygraphs_1.1.1.6    
## [19] pkgconfig_2.0.2      bookdown_0.7         purrr_0.2.5         
## [22] xtable_1.8-3         mvtnorm_1.0-8        scales_1.0.0        
## [25] later_0.7.4          tibble_1.4.2         DT_0.4              
## [28] withr_2.1.2          shinyjs_1.0          lazyeval_0.2.1      
## [31] magrittr_1.5         crayon_1.3.4         mime_0.5            
## [34] evaluate_0.11        nlme_3.1-137         xts_0.11-1          
## [37] blogdown_0.8         colourpicker_1.0     rsconnect_0.8.8     
## [40] tools_3.5.1          loo_2.0.0            matrixStats_0.54.0  
## [43] stringr_1.3.1        munsell_0.5.0        compiler_3.5.1      
## [46] rlang_0.2.2          grid_3.5.1           ggridges_0.5.0      
## [49] htmlwidgets_1.2      crosstalk_1.0.0      igraph_1.2.2        
## [52] miniUI_0.1.1.1       base64enc_0.1-3      rmarkdown_1.10      
## [55] codetools_0.2-15     gtable_0.2.0         inline_0.3.15       
## [58] abind_1.4-5          curl_3.2             markdown_0.8        
## [61] reshape2_1.4.3       R6_2.2.2             gridExtra_2.3       
## [64] rstantools_1.5.1     zoo_1.8-3            knitr_1.20          
## [67] bridgesampling_0.5-2 dplyr_0.7.6          shinythemes_1.1.1   
## [70] bindr_0.1.1          shinystan_2.5.0      rprojroot_1.3-2     
## [73] stringi_1.2.4        parallel_3.5.1       tidyselect_0.2.4    
## [76] xfun_0.3             coda_0.19-1


Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

Clark, David R. 2003. LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society; http://www.casact.org/pubs/forum/03fforum/03ff041.pdf.

Guszcza, James. 2008. “Hierarchical Growth Curve Models for Loss Reserving.” In, 146–73. CAS Fall Forum; https://www.casact.org/pubs/forum/08fforum/7Guszcza.pdf.

Morris, Jake. 2016. “Hierarchical Compartmental Models for Loss Reserving.” In. Casualty Actuarial Society Summer E-Forum; https://www.casact.org/pubs/forum/16sforum/Morris.pdf.

Panik, Michael J. 2014. Growth Curve Modeling: Theory and Applications. Wiley. https://www.wiley.com/en-us/Growth+Curve+Modeling%3A+Theory+and+Applications-p-9781118764046.

Zhang, Yanwei, Vanja Dukic, and James Guszcza. 2012. “A Bayesian Non-Linear Model for Forecasting Insurance Loss Payments.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 175 (2). https://doi.org/10.1111/j.1467-985X.2011.01002.x: 637–56.


comments powered by Disqus