Reserving based on log-incremental payments in R, part III

This is the third post about Christofides’ paper on Regression models based on log-incremental payments [1]. The first post covered the fundamentals of Christofides’ reserving model in sections A - F, and the second focused on a more realistic example and model reduction in sections G - K. Today’s post will wrap up the paper with sections L - M and discuss data normalisation and claims inflation.

I will use the same triangle of incremental claims data as introduced in my previous post. The final model had three parameters for the origin periods and two for the development periods. It is possible to reduce the model further, as Christofides illustrates from section L onwards, by using an inflation index to bring all claims payments to current value and a claims volume adjustment, or weight, for each origin period to normalise the triangle.

In his example Christofides uses claims volume adjustments for the origin years and an earnings or inflation index for the payment calendar years. The claims volume adjustments aim to normalise the triangle to similar exposures across origin periods, while the earnings index, which largely measures wages and other forms of compensation, is used as a first proxy for claims inflation. Note that the earnings index shows significant year-on-year changes of 5% to 9%. Barnett and Zehnwirth [2] would probably recommend adding further parameters for the calendar year effects to the model.

# Page D5.36
ClaimsVolume <- data.frame(origin=0:6, 
                           volume.index=c(1.43, 1.45, 1.52, 1.35, 1.29, 1.47, 1.91))
# Page D5.36
EarningIndex <- data.frame(cal=0:6, 
                           earning.index=c(1.55, 1.41, 1.3, 1.23, 1.13, 1.05, 1))
# Year on year changes
round((1-EarningIndex$earning.index[-1]/EarningIndex$earning.index[-7]),2)
## [1] 0.09 0.08 0.05 0.08 0.07 0.05

dat <- merge(merge(dat, ClaimsVolume), EarningIndex)

# Normalise data for volume and earnings
dat$logvalue.ind.inf <- with(dat, log(value/volume.index*earning.index))
with(dat, interaction.plot(dev, origin, logvalue.ind.inf))
points(1+dat$dev, dat$logvalue.ind.inf, pch=16, cex=0.8)

Indeed, the interaction plot shows that the various origin years are now much more closely grouped; only the single point of the last origin period still stands out. Christofides tests several models with different numbers of origin levels, but I am happy with the minimal model, using only one parameter for the origin periods, namely the intercept:

# Page D5.39
summary(Fit4 <- lm(logvalue.ind.inf ~  d + s, data=na.omit(dat)))
## 
## Call:
## lm(formula = logvalue.ind.inf ~ d + s, data = na.omit(dat))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.24591 -0.05066  0.01044  0.05202  0.26070 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.50073    0.05271 161.278  < 2e-16 ***
## d           -0.28598    0.06901  -4.144 0.000342 ***
## s           -0.48889    0.01725 -28.337  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1179 on 25 degrees of freedom
## Multiple R-squared:  0.9795, Adjusted R-squared:  0.9779 
## F-statistic: 597.2 on 2 and 25 DF,  p-value: < 2.2e-16

All coefficients are significant and I am left with a model of only three parameters. The residual plots suggest that my model is reasonable; only the Q-Q plot shows that the distribution of the residuals is slightly skewed.

op <- par(mfrow=c(2,2), mar=c(4,4,2,2))
attach(model.frame(Fit4))
with(na.omit(dat), 
     plot.default(rstandard(Fit4) ~ origin,
                  main="Residuals vs. origin years"))
abline(h=0, lty=2)

with(na.omit(dat), 
     plot.default(rstandard(Fit4) ~ dev,
                  main="Residuals vs. dev. years"))
abline(h=0, lty=2)
with(na.omit(dat), 
     plot.default(rstandard(Fit4) ~ cal,
                  main="Residuals vs. payments years"))
abline(h=0, lty=2)
plot.default(rstandard(Fit4) ~ fitted(Fit4),
             main="Residuals vs. fitted")
abline(h=0, lty=2)

detach(model.frame(Fit4))
par(op)
op <- par(mfrow=c(2,2),oma = c(0, 0, 3, 0))
plot(Fit4)

par(op)

I am happy with the model. To forecast the future claims payments I prepare a data frame with the predictors for those years.

# Tail of 6 more years beyond the observed period
tail.years <- 6
# Create a data frame of all origin/development combinations, including
# the future periods (m and n are the numbers of origin and development
# periods from the previous post)
fdat <- data.frame(
  origin = rep(0:(m-1), n+tail.years),
  dev = rep(0:(n+tail.years-1), each=m)
)

fdat <- within(fdat,{
  d <- ifelse(dev < 1, 1, 0)
  s <- ifelse(dev < 1, 0, dev)
  cal <- origin + dev
  a6 <- ifelse(origin == 6, 1, 0)
}) 

# New data
ND <- subset(fdat, cal>6)
ND <- merge(ND, ClaimsVolume)

Next I update my prediction function from last week with new parameters for claims inflation and indexation. The volume index and claims inflation parameters are used to scale the output back to the original units and to inflate the future payments at a constant rate. Of course the indexation is a model in itself with its own uncertainty, which can be considered part of the model error. Note that I scale the forecasts back by the volume index, but not by the earnings/inflation index.
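
The back-transformation from the log scale relies on the usual log-normal identities: if log(X) is normally distributed with mean mu and variance sigma^2, then E[X] = exp(mu + sigma^2/2) and Var[X] = E[X]^2*(exp(sigma^2)-1). A quick simulation, with purely illustrative parameters rather than anything taken from the paper, confirms the formulas used inside the function below:

# Sanity check of the log-normal back-transformation (illustrative mu, sigma)
set.seed(1)
mu <- 8; sigma <- 0.12
x <- exp(rnorm(1e6, mu, sigma))
round(c(simulated.mean = mean(x), formula.mean = exp(mu + sigma^2/2)))
round(c(simulated.sd = sd(x),
        formula.sd = sqrt(exp(mu + sigma^2/2)^2 * (exp(sigma^2) - 1))))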

log.incr.predict <- function(
    model, # lm output
    newdata, # same argument as in predict
    claims.inflation=0, # Assumed inflation (scalar)
    volume.index=NULL, # name of the v.i. column in newdata  
    origin.var="origin", # name of the origin column in newdata
    dev.var="dev", # name of the dev column in newdata
    cal.var="cal" # name of the cal. period col. in newdata
){
  origin <- newdata[[origin.var]] 
  dev <- newdata[[dev.var]]
  cal <- newdata[[cal.var]]
  if(is.null(volume.index)){
    index <- 1
  }else{
    index <- newdata[[volume.index]]
  }
  Pred <- predict(model, newdata=newdata, se.fit=TRUE)
  
  Y <- Pred$fit
  VarY <- Pred$se.fit^2 + Pred$residual.scale^2 
  # Expected value on the original scale (log-normal mean)
  P <- exp(Y + VarY/2)
  # Scale back by the volume index and apply constant future claims inflation
  P <- P*index*(1 + claims.inflation)^(cal - min(cal) + 1)
  # Log-normal variance
  VarP <-  P^2*(exp(VarY)-1)
  seP <- sqrt(VarP)
  
  ## Recreate formula to derive the model.frame and future design matrix   
  model.formula <- as.formula(paste("~", as.character(formula(model)[3])))
  ## See also the formula.tools package
  
  mframe <- model.frame(model.formula, data=newdata)
  fdm <- model.matrix(model.formula, data=newdata)
  varcovar <- fdm %*% vcov(model) %*% t(fdm)
  
  sigma <- summary(model)$sigma
  dsigma2 = diag(sigma^2, nrow = dim(varcovar)[1])
  
  Total.SE <- sqrt( t(P) %*% (exp(dsigma2 + varcovar) - 1) %*% P )
  Total.Reserve <- sum(P)
  
  # Prepare output
  Incr=data.frame(origin, dev, Y, VarY, P, seP, CV=seP/P)
  out <- list(Forecast=Incr[order(newdata[[origin.var]]),],              
              Totals=data.frame(Total.Reserve, 
                                Total.SE=Total.SE, 
                                CV=Total.SE/Total.Reserve))
  return(out)
}

With my new prediction function it is easy to test different scenarios of claims inflation and their potential impact on the overall reserve requirements. Following the paper I will assume claims inflation of 7.5%. This gives me the following future payments triangle and reserves.

FM4 <- log.incr.predict(Fit4, ND,
                        claims.inflation=0.075, 
                        volume.index="volume.index")

# Page D5.41
round(xtabs(P ~ origin + dev, data=FM4$Forecast),0)
##       dev
## origin    1    2    3    4    5    6    7    8    9   10   11   12
##      0    0    0    0    0    0    0  249  165  109   72   47   31
##      1    0    0    0    0    0  412  272  179  118   78   52   34
##      2    0    0    0    0  703  464  306  202  134   88   58   39
##      3    0    0    0 1018  671  443  292  193  127   84   56   37
##      4    0    0 1585 1045  690  455  300  198  131   87   57   38
##      5    0 2945 1942 1280  845  557  368  243  160  106   70   46
##      6 6241 4114 2712 1788 1180  778  514  339  224  148   98   65
round(xtabs(seP ~ origin + dev, data=FM4$Forecast),0)
##       dev
## origin   1   2   3   4   5   6   7   8   9  10  11  12
##      0   0   0   0   0   0   0  36  25  18  13   9   6
##      1   0   0   0   0   0  55  39  27  19  14  10   7
##      2   0   0   0   0  90  62  44  31  22  16  11   8
##      3   0   0   0 125  86  59  42  29  21  15  11   7
##      4   0   0 192 129  88  61  43  30  21  15  11   8
##      5   0 358 235 158 108  75  52  37  26  19  13   9
##      6 777 500 329 220 151 105  73  52  37  26  19  13
FM4$Totals
##   Total.Reserve Total.SE         CV
## 1      38083.25 1724.987 0.04529515
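
Because the assumed claims inflation is simply an argument to the prediction function, a quick sensitivity check across a few rates is cheap. The sketch below is not from the paper; the alternative rates are purely illustrative:

# Total reserve under a few illustrative claims inflation assumptions
inflation.scenarios <- c(0, 0.05, 0.075, 0.1)
scenario.reserves <- sapply(inflation.scenarios, function(infl)
  log.incr.predict(Fit4, ND, claims.inflation=infl,
                   volume.index="volume.index")$Totals$Total.Reserve)
names(scenario.reserves) <- paste0(100*inflation.scenarios, "%")
round(scenario.reserves)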

Compared to the results of the previous week, the overall reserves at 7.5% claims inflation increased by £5,000, while the overall standard error has been reduced thanks to the smaller number of parameters. Christofides explains that the big increase in the overall reserves is largely driven by the most recent origin year, for which I have only one data point. From the residual plot I can see that the standardised residual for this point is about -1. This is not statistically significant, but I noticed that the highest value of the original triangle in development period 0 has become the lowest after the volume adjustments; see also the interaction plot at the top.

By putting the last origin period back into the model I get an output which is more in line with the result of last week.

# Page D5.42
log.incr.predict(lm(logvalue.ind.inf ~ a6 + d + s, dat), ND,
                 claims.inflation=0.075, 
                 volume.index="volume.index")$Totals
##   Total.Reserve Total.SE         CV
## 1      35901.59  2609.29 0.07267895

Conclusions

Reserving is always a mixture of art and science, a combination of sound data analysis and expert judgement. A statistical data analysis can help to understand how much expert judgement is required. As Christofides points out in his closing remarks of section L, it is desirable to embed reserving into a Bayesian framework. Wayne Zhang has done some great research in this area. Yet, simple linear models are powerful tools for investigating the data. The model presented here can particularly help to investigate trends in the calendar/payment year direction. Such trend changes can highlight movements in claims inflation, or indeed changes in the claims settlement process. Neither of those factors should be ignored.

The assumption that claims follow a log-normal distribution feels intuitively reasonable to me. Yet, the occasional negative incremental claim needs to be considered carefully. It is certainly a prompt to check whether the assumption of log-normally distributed incremental claims is reasonable. Packages like car (Companion to Applied Regression) offer lots of diagnostic tools. Any pointers on how I could use those tools effectively would be much appreciated.
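
As a starting point, and purely as a sketch rather than a recommendation, a few of the car diagnostics applied to Fit4 might look as follows (assuming the package is installed):

# Illustrative diagnostics from the car package
library(car)
qqPlot(Fit4)          # QQ plot with a pointwise confidence envelope
influencePlot(Fit4)   # leverage, studentised residuals and Cook's distance
ncvTest(Fit4)         # score test for non-constant error variance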

Barnett and Zehnwirth present a further idea to test these models by reducing the data for the fitting exercise and testing how stable the coefficients and predictions of the model are, see section 3.3 and table 3.3 in [2].
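
A minimal sketch of that idea, simply refitting the model without the most recent calendar year (the cut-off is my choice for illustration) and comparing the coefficients:

# Stability check: refit without the latest calendar year and compare
Fit4.sub <- update(Fit4, data = subset(na.omit(dat), cal < max(cal)))
round(cbind(full = coef(Fit4), reduced = coef(Fit4.sub)), 3)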

As usual the R code of this post is available as a gist on GitHub. The code on GitHub contains more details than presented here; it also includes examples from the Barnett and Zehnwirth paper mentioned above. Is there demand to include the functions presented here in the ChainLadder package? Please get in touch.

References

[1] Stavros Christofides. Regression models based on log-incremental payments. Claims Reserving Manual, Volume 2, D5. September 1997.

[2] Glen Barnett and Ben Zehnwirth. Best estimates for reserves. Proceedings of the CAS, LXXXVII(167), November 2000.

Session Info

R version 2.15.2 Patched (2013-01-01 r61512)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_2.15.2    

Citation

For attribution, please cite this work as:

Markus Gesmann (Jan 22, 2013) Reserving based on log-incremental payments in R, part III. Retrieved from https://magesblog.com/post/2013-01-22-reserving-based-on-log-incremental_22/

BibTeX citation:

@misc{ 2013-reserving-based-on-log-incremental-payments-in-r-part-iii,
 author = { Markus Gesmann },
 title = { Reserving based on log-incremental payments in R, part III },
 url = { https://magesblog.com/post/2013-01-22-reserving-based-on-log-incremental_22/ },
 year = { 2013 },
 updated = { Jan 22, 2013 }
}
