Correlated log-normal chain-ladder model

On 23 November Glenn Meyers gave a fascinating talk about The Bayesian Revolution in Stochastic Loss Reserving at the 10th Bayesian Mixer Meetup in London.

Glenn Meyers speaking at the Bayesian Mixer

Glenn Meyers speaking at the Bayesian Mixer

Glenn worked for many years as a research actuary at Verisk/ ISO, he helped to set up the CAS Loss Reserve Database and published a monograph on Stochastic loss reserving using Bayesian MCMC models.

Slides

In this blog post I will go through the Correlated Log-normal Chain-Ladder Model from his presentation. It is discussed in more detailed in his monograph.

Glenn kindly shared his code as well, which I have used as a basis for this post.

Getting the data

The CAS Loss Reserve Database is an excellent data source to test reserving models. It is hosted on the CAS website and contains historical regulatory filings of US insurance companies.

The following code allows me to download the data and extract the information for one company, here company 353, which was the example company Glenn used as well.

library(data.table)
CASdata <- fread("http://www.casact.org/research/reserve_data/comauto_pos.csv")

createLossData <- function(CASdata, company_code, loss_type="incurred"){
  library(data.table)

  compData <- CASdata[GRCODE==company_code,
                      c("EarnedPremNet_C", "AccidentYear", "DevelopmentLag", 
                        "IncurLoss_C", "CumPaidLoss_C", "BulkLoss_C")]
  # rename column names
  setnames(compData, names(compData),
           c("premium", "accident_year", "dev",
             "incurred_loss", "paid_loss", "bulk_loss"))
  compData <- compData[, `:=`(
    origin = accident_year - min(accident_year) + 1,
    # origin period starting from 1, id to identify first origin period
    origin1id = ifelse(accident_year == min(accident_year), 0,  1))]
  # set losses < 0 to 1, as we will take the log later
  if(loss_type %in% "incurred"){
    compData <- compData[, loss := pmax(incurred_loss - bulk_loss, 1)]
  }
  else{
    compData <- compData[, loss := pmax(paid_loss, 1)]
  }
  # add calendar period and sort data by dev and then origin
  compData <- compData[, cal := origin + dev - 1][order(dev, origin)]
  compData <- compData[, `:=`(
    loss_train = ifelse(cal <= max(origin), loss, NA),
    loss_test = ifelse(cal > max(origin), loss, NA))]
  traintest <- rbindlist(list(compData[cal <= max(origin)],
                              compData[cal > max(origin)]))
  return(traintest)
}

lossData <- createLossData(CASdata, company_code = 353, loss_type="incurred")

The data shows the historical annual developments of incurred claims for accident years 1988 to 1997. I separated the data into a training and test data set. The training data will have incurred claims reported up to calendar year 1997, while data from 1998 to 2006 will be used to test the model.

Let’s take a look at the data for Commercial Auto of company 353.

devPlot <- function(x, data, company_code,
                    main="Reported incurred loss development by accident year",
                    ylab = "Reported incurred loss ($k)",
                    xlab="Development year",...){
  library(lattice)
  key <- list(rep=FALSE, columns=2,
              lines=list(col=c("#00526D", "#00526D"), 
                         type=c("p", "p"), 
                         pch=c(19, 1)),
              text=list(lab=c("Observation", "Hold out observation")))
  xyplot(x, data, t=c("b", "b"), pch=c(19, 1), 
         col=c("#00526D", "#00526D"),
         layout=c(5,2),
         par.settings = list(strip.background=list(col="#CBDDE6")),
         par.strip.text = list(font = 2),
         key = key, 
         as.table=TRUE,
         scales=list(alternating=1),
         xlab=xlab, ylab=ylab, main=main,
         sub=paste("Data source: CAS Loss Reserving Database, Comm. Auto, Comp.",
                   company_code), ...)
}

devPlot(loss_train + loss_test 
        ~ dev | factor(accident_year), 
        data=lossData, company_code = 353)

The data appears to be reasonably well behaved. Although the years converge to different levels of ultimate incurred claims, the shape of the curves look similar for most years. The accident years 1988 and 1993 show a distinctly different pattern.

The correlated log-normal chain-ladder model

The correlated log-normal chain-ladder model combines ideas of the classical chain-ladder method, the growth curve model and Bayesian inference.

\[ \begin{align} C_{i,j} & \sim \log\mathcal{N}(\mu_{i,j}, \sigma^2_j)\\ L_i & = P_i \cdot \ell\\ \mu_{1,j} & = \log(L_1) + \beta_j \\ \mu_{i,j} & = \log(L_i) + \alpha_i + \beta_j + \rho \cdot \left(\log(C_{i-1,j}) - \mu_{i-1,j}\right) \\ & \mbox{for } i = 2, \dots, M \mbox{ and } j = 1, \dots, N+1-i \\ & \mbox{and } \alpha_1 := 0,\, \beta_N := 0\\ \sigma^2_j & = \sum_{k=j}^N a_k \\ & \mbox{with priors}\\ \alpha_i & \sim \mathcal{N}(0, 10) \\ \beta_i & \sim \mathcal{N}(0, 10) \\ \log(\ell) & \sim \mathcal{N}(0, 1) \\ \rho & \sim \mbox{Beta}(2, 2)\\ a_k & \sim \mbox{Uniform}(0,1)\\ &\mbox{and}\\ C_{i,j} &:=\mbox{cumulative incurred claims of origin } i\mbox{, development }j \\ P_{i} & := \mbox{premium of origin } i\\ \ell & := \mbox{'population' loss ratio across all origin periods}\\ L_i & := \mbox{initial expected ultimate loss for origin } i\\ M & := \mbox{number of origin periods} \\ N & := \mbox{number of development periods} \end{align} \]

Given that we assume the losses follow a log-normal distribution we have:

\[ \begin{align} \mbox{Median}(C_{i,j}) & = L_i \cdot \exp(\mu_{i,j})\\ \mathbb{E}(C_{i,j}) & = L_i \cdot \exp\left(\mu_{i,j} + \frac{1}{2} \sigma_j^2 \right) \end{align} \] Similar to the chain-ladder method we have one parameter for each development period \(\beta_j\) and one parameter for each origin period \(\alpha_i\).

But unlike the Mack-chain-ladder model, Glenn allows for correlation between origin periods \(i\) and \(i-1\) in his model: \(\rho \cdot \left(\log(C_{i-1,j}) - \mu_{i-1,j}\right)\).

Interestingly as well, Glenn assumes that the volatility shrinks as claims mature over development periods, modelling \(\sigma_j^2\) as the sum of uniform variances.

This model is certainly not short on parameters. I wonder, if it is over-parametrised? Does it provide good in sample prediction, but perform poorly out of sample?

Stan model

So let’s put this model into Stan. Again, I follow Glenn’s implementation, but I have changed some of the variable names and added some lines in the generated quantities block to predict incurred claims for the test data period.

library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
data{
  int <lower=1> len_data; // number of rows with data
  int<lower=1> len_pred; // number of rows to predict
  // indicator of first origin period
  int<lower=0, upper=1> origin1id[len_data + len_pred]; 
  real logprem[len_data + len_pred];
  real logloss[len_data];
  int<lower=1> origin[len_data + len_pred]; // origin period
  int<lower=1> dev[len_data + len_pred]; // development period
}
transformed data{
  int n_origin = max(origin);
  int n_dev = max(dev);
  int len_total = len_data + len_pred;
}
parameters{
  real r_alpha[n_origin - 1];
  real r_beta[n_dev - 1];
  real log_elr;
  real<lower=0, upper=100000> a_ig[n_dev];
  real<lower=0, upper=1> r_rho;
  real logloss_pred[len_pred];
}
transformed parameters{
  real alpha[n_origin];
  real beta[n_dev];
  real sig2[n_dev];
  real sig[n_dev];
  real mu[len_data];
  real mu_pred[len_pred];
  real <lower=-1, upper=1> rho;
  
  rho = -2*r_rho + 1;

    alpha[1] = 0;
  for (i in 2:n_origin){
    alpha[i] = r_alpha[i-1];
  }
  
  for (i in 1:(n_dev - 1)){
    beta[i] = r_beta[i];
  }
  beta[n_dev] = 0;
  
  // Create ascending set of sig2
  sig2[n_dev] = gamma_cdf(1/a_ig[n_dev],1,1); // map into [0,1]
  for (i in 1:(n_dev-1)){ 
    sig2[n_dev - i] = sig2[n_dev + 1 - i] + gamma_cdf(1/a_ig[i],1,1);
  }
  for (i in 1:n_dev){ 
    sig[i] = sqrt(sig2[i]);
  }
  // first origin and dev period (top left corner of triangle)
  mu[1] = logprem[1] + log_elr + beta[dev[1]];
  
  for (i in 2:len_data){
    mu[i] = logprem[i] + log_elr + alpha[origin[i]] +  beta[dev[i]] +
    rho*(logloss[i-1] - mu[i-1]) * origin1id[i];
  }
  
  mu_pred[1] =  logprem[(len_data) + 1] + alpha[origin[len_data + 1]] + 
  log_elr + beta[dev[len_data + 1]] +
  rho*(logloss[len_data] - mu[len_data]) * origin1id[len_data + 1];
  
  for (i in 2:len_pred){
    mu_pred[i] = logprem[len_data + i] + alpha[origin[len_data + i]] + 
    log_elr + beta[dev[len_data + i]] +
    rho*(logloss_pred[i-1] - mu_pred[i-1]) * origin1id[len_data + i];
  }
}
model {
  log_elr ~ normal(0, 1);
  r_alpha ~ normal(0, sqrt(10/1.0));
  r_beta ~ normal(0, sqrt(10/1.0));
  a_ig ~ inv_gamma(1,1); // inverse gamma for numerical resaons
  r_rho ~ beta(2,2);
  // model where we have data
  for (i in 1:(len_data)){
    logloss[i] ~ normal(mu[i], sig[dev[i]]);
  }
  // model where data is missing, the prediction period
  for (i in 1:(len_pred)){
    logloss_pred[i] ~ normal(mu_pred[i], sig[dev[len_data + i]]);
  }
}
generated quantities{
  vector[len_data] log_lik;
  vector[len_total] ppc_loss;
  
  for (i in 1:len_data){ 
  log_lik[i] = normal_lpdf(logloss[i] | mu[i], sig[dev[i]]);
  }
  // simulate posterior predicted losses
  for (i in 1:len_data){ 
    ppc_loss[i] = exp(normal_rng(mu[i], sig[dev[i]]));
  }
  for (i in 1:len_pred){ 
    ppc_loss[len_data + i] = exp(normal_rng(mu_pred[i], sig[dev[len_data + i]]));
  }
}

The next code block prepares the data as an input for Stan.

createStanDataList <- function(lossData){
  with(lossData, {
    train_idx <- !is.na(loss_train)
    list(
      len_data = sum(train_idx),
      len_pred = sum(!train_idx),
      logprem = log(premium),
      logloss = log(loss_train[train_idx]),
      origin = origin,
      dev = dev,
      origin1id = origin1id)
  })
}
stan_data <- createStanDataList(lossData)

Model run for company 353

Finally, we can run our Stan model for company 353.

fitCCL353 <- sampling(CCLmodel,
                      data=stan_data,
                      seed = 1234, iter = 4000,
                      control=list(adapt_delta = 0.99,
                                   max_treedepth = 10))
## Warning: There were 4824 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems

Stan comes back with a couple of warnings and messages to investigate the model in more detail.

I get pretty much the same output as Glenn (see slides 19, 20):

print(fitCCL353, 
      pars=c("alpha", "beta", "rho", "log_elr", "sig"), 
      probs=c(0.025, 0.975))
## Inference for Stan model: 478cdd2fd5b96bedcdb4ea70c25e0f67.
## 4 chains, each with iter=4000; warmup=2000; thin=1; 
## post-warmup draws per chain=2000, total post-warmup draws=8000.
## 
##            mean se_mean   sd  2.5% 97.5% n_eff Rhat
## alpha[1]   0.00    0.00 0.00  0.00  0.00  8000  NaN
## alpha[2]  -0.26    0.00 0.02 -0.29 -0.23  2770 1.00
## alpha[3]   0.11    0.00 0.02  0.06  0.15  1279 1.00
## alpha[4]   0.21    0.00 0.03  0.16  0.26  1314 1.00
## alpha[5]   0.01    0.00 0.03 -0.06  0.07   699 1.00
## alpha[6]  -0.06    0.00 0.04 -0.13  0.03  1694 1.00
## alpha[7]   0.45    0.00 0.06  0.33  0.56  1432 1.00
## alpha[8]   0.03    0.00 0.08 -0.13  0.20  1790 1.00
## alpha[9]   0.15    0.00 0.14 -0.14  0.43  1341 1.00
## alpha[10]  0.18    0.01 0.29 -0.36  0.77  1300 1.01
## beta[1]   -0.59    0.00 0.11 -0.83 -0.38  4196 1.00
## beta[2]   -0.19    0.00 0.07 -0.32 -0.05  2987 1.00
## beta[3]   -0.10    0.00 0.05 -0.20  0.00  1821 1.00
## beta[4]   -0.03    0.00 0.04 -0.10  0.05  1352 1.00
## beta[5]   -0.01    0.00 0.03 -0.07  0.05  1174 1.00
## beta[6]    0.00    0.00 0.03 -0.06  0.06  1169 1.00
## beta[7]    0.00    0.00 0.03 -0.05  0.06  1010 1.00
## beta[8]    0.01    0.00 0.02 -0.05  0.06  1257 1.00
## beta[9]    0.00    0.00 0.02 -0.05  0.05  1662 1.00
## beta[10]   0.00    0.00 0.00  0.00  0.00  8000  NaN
## rho        0.17    0.00 0.21 -0.27  0.55  2523 1.00
## log_elr   -0.39    0.00 0.01 -0.43 -0.37  1207 1.00
## sig[1]     0.26    0.00 0.09  0.14  0.51  3428 1.00
## sig[2]     0.15    0.00 0.04  0.09  0.26  3884 1.00
## sig[3]     0.09    0.00 0.03  0.05  0.16  1856 1.00
## sig[4]     0.06    0.00 0.02  0.03  0.11   467 1.01
## sig[5]     0.04    0.00 0.02  0.02  0.09   328 1.01
## sig[6]     0.04    0.00 0.02  0.02  0.07   290 1.01
## sig[7]     0.03    0.00 0.01  0.01  0.06   291 1.01
## sig[8]     0.02    0.00 0.01  0.01  0.05   289 1.01
## sig[9]     0.02    0.00 0.01  0.01  0.04   294 1.01
## sig[10]    0.01    0.00 0.01  0.00  0.03   330 1.01
## 
## Samples were drawn using NUTS(diag_e) at Thu Dec  7 00:59:20 2017.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

Some of the parameters don’t appear significant, such as \(\beta_5\) to \(\beta_9\), but then again they shouldn’t have done any harm here either, as they will have had little impact.

However, I am most interested in comparing the posterior predictive distribution with the training and test data. How does the model perform out of sample?

The next code chunk extracts the 95% credible interval from the posterior predictive simulated losses and plots the output.

createPlotData <- function(stanfit, data, probs=c(0.25, 0.975)){
  
  ppc_loss <- as.matrix(extract(stanfit, "ppc_loss")$ppc_loss)
  ppc_loss_summary <- cbind(
    mean=apply(ppc_loss, 2, mean),
    t(apply(ppc_loss, 2, quantile, probs=probs))
  )
  colnames(ppc_loss_summary) = c(
    "Y_pred_mean",
    paste0("Y_pred_cred", gsub("\\.", "", probs[1])),
    paste0("Y_pred_cred", gsub("\\.", "", probs[2])))
  
  return(cbind(ppc_loss_summary, data))
}
plotDevBananas <- function(x, data, company_code,
                           xlab="Development year", 
                           ylab="Reported incurred loss ($k)", 
                           main="Correlated Log-normal Chain Ladder Model",
                           ...){
  key <- list(
    rep=FALSE, 
    lines=list(col=c("#00526D", "#00526D", "purple"), 
               type=c("p", "p", "l"), 
               pch=c(19, 1, NA)),
    text=list(lab=c("Observation", "Hold out observation", "Mean estimate")),
    rectangles = list(col=adjustcolor("yellow", alpha.f=0.5), border="grey"),
    text=list(lab="95% Prediction credible interval"))
  
  xyplot(x, data=data,as.table=TRUE,xlab=xlab, ylab=ylab, main=main,
         sub=paste("Data source: CAS Loss Reserving Database,",
                   "Comm. Auto, Comp.",
                   company_code),
         scales=list(alternating=1), layout=c(5,2), key=key,
         par.settings = list(strip.background=list(col="#CBDDE6")),
         par.strip.text = list(font = 2),
         panel=function(x, y){
           n <- length(x)
           divisor <- 5
           cn <- c(1:(n/divisor))
           upper <- y[cn+n/divisor*0]
           lower <- y[cn+n/divisor*1]
           x <- x[cn]
           panel.polygon(c(x, rev(x)), c(upper, rev(lower)),
                         col = adjustcolor("yellow", alpha.f = 0.5), 
                         border = "grey")
           panel.lines(x, y[cn+n/divisor*2], col="purple")
           panel.points(x, y[cn+n/divisor*4], lwd=1, col="#00526D")
           panel.points(x, y[cn+n/divisor*3], lwd=1, pch=19, col="#00526D")
         }, ...)
}
plotData <- createPlotData(fitCCL353, data=lossData) 

plotDevBananas(Y_pred_cred025 + Y_pred_cred0975 + Y_pred_mean + 
                 loss_train + loss_test ~ dev | factor(accident_year),
               data=plotData, company_code = 353)

This looks pretty good!

We can see for the older years that the credible interval shrinks as the development years progress. The skewness of the log-normal distribution is clearly visible for the most recent accident years. In all cases the hold out observations are within the 95% prediction credible interval, often very close to the mean, apart from the years 1994 and 1995, where we observe a more unusual pattern in the data.

Finally, here is the distribution of the ‘population’ loss ratio \(\ell\) across all accident years and correlation parameter \(\rho\).

summary(elr <- exp(extract(fitCCL353, "log_elr")$log_elr))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.6033  0.6695  0.6740  0.6739  0.6786  0.7380
summary(rho <- extract(fitCCL353, "rho")$rho)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.86328  0.03551  0.18278  0.17077  0.31764  0.87749

The distribution of \(\rho\) is very interesting. Many of the traditional reserving methods, including the Mack-chain-ladder model assume that the accident years are independent. However, for this data set it is unlikely to be valid. Indeed, the probability of a positive correlation is 80%.

library(MASS); library(latex2exp)
truehist(rho, col="skyblue", xlab=TeX("$\\\\rho$")); 
abline(v=0, col=2)

Company 833

Let’s look at another company. Here is the data for company 833.

lossData <- createLossData(CASdata, company_code = 833, loss_type="incurred")
devPlot(loss_train + loss_test 
        ~ dev | factor(accident_year), 
        data=lossData, company_code = 833)

This data is not as well behaved as for company 353. The shape of the curves do vary quite a bit from one accident year to the next. Let’s find out how well our model predicts the hold out sample in this case.

stan_data <- createStanDataList(lossData)
fitCCL833 <- sampling(CCLmodel,
                      data=stan_data,
                      seed = 1234, iter = 4000,
                      control=list(adapt_delta = 0.99,
                                   max_treedepth = 10))
## Warning: There were 28 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: There were 1443 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems

Stan gives messages back, which I shall ignore for the time being and instead look at my ‘banana’ plot again.

plotData <- createPlotData(fitCCL833, lossData) 
plotDevBananas(Y_pred_cred025 + Y_pred_cred0975 + Y_pred_mean + 
                 loss_train + loss_test ~ dev | factor(accident_year),
               data=plotData, company_code = 833)

Interesting! Not too bad either.

Company 25275

Here is another company, whose data show some unusual patterns.

## Other companies
lossData <- createLossData(CASdata, company_code = 25275, loss_type="incurred")
devPlot(loss_train + loss_test 
        ~ dev | factor(accident_year), 
        data=lossData, company_code = 25275)

Let’s fit this data with the same model as well.

stan_data <- createStanDataList(lossData)
fitCCL25275 <-  sampling(CCLmodel,
                         data=stan_data,
                         seed = 1234, iter = 4000,
                         control=list(adapt_delta = 0.99,
                                      max_treedepth = 10))
## Warning: There were 132 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: There were 1738 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
## http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
## Warning: There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
## http://mc-stan.org/misc/warnings.html#bfmi-low
## Warning: Examine the pairs() plot to diagnose sampling problems
plotData <- createPlotData(fitCCL25275, lossData) 

plotDevBananas(Y_pred_cred025 + Y_pred_cred0975 + Y_pred_mean + 
                 loss_train + loss_test ~ dev | factor(accident_year),
               data=plotData, company_code = 25275,
               ylim=c(0, 500))

Well, what do you think about that?

References

Stochastic Loss Reserving Using Bayesian McMc Models. Glenn Meyers. CAS monograph series. Number 1. Casualty Actuarial Society. 2015

Session Info

sessionInfo()
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.1
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
## 
## other attached packages:
## [1] latex2exp_0.4.0      MASS_7.3-47          rstan_2.16.2        
## [4] StanHeaders_2.16.0-1 ggplot2_2.2.1        lattice_0.20-35     
## [7] data.table_1.10.4-3 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14     knitr_1.17       magrittr_1.5     munsell_0.4.3   
##  [5] colorspace_1.3-2 rlang_0.1.4      stringr_1.2.0    plyr_1.8.4      
##  [9] tools_3.4.2      parallel_3.4.2   grid_3.4.2       gtable_0.2.0    
## [13] htmltools_0.3.6  yaml_2.1.14      lazyeval_0.2.1   rprojroot_1.2   
## [17] digest_0.6.12    tibble_1.3.4     bookdown_0.5     gridExtra_2.3   
## [21] codetools_0.2-15 inline_0.3.14    evaluate_0.10.1  rmarkdown_1.8   
## [25] blogdown_0.3     stringi_1.1.6    compiler_3.4.2   scales_0.5.0    
## [29] backports_1.1.1  stats4_3.4.2

Related

comments powered by Disqus