APPLICATION OF CHAIN LADDER MODELS IN INSURANCE CLAIMS RESERVING

SUBMITTED BY:

SIBUSISO SHABANGU

149745

A RESEARCH PROJECT PAPER SUBMITTED TO THE DEPARTMENT OF STATISTICS AND DEMOGRAPHY IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE BACHELOR OF ARTS IN SOCIAL SCIENCE

APRIL 2018

DECLARATION

I, Sibusiso Shabangu, declare that this thesis entitled Application of chain ladder models in insurance claims reserving is my own, unaided work. It is being submitted for the Bachelor of Arts in Social Science at the University of Swaziland. It complies with the requirements of the University and meets the acceptable standards with respect to originality and quality. It has not been submitted before for any degree or any other examination in any university.

………………………….

Signature

………………………….

Date

ABSTRACT

With our country, Swaziland, set to attain first world status by 2022, the number of insurance companies in the country has risen over the past decade. It is therefore important to project the future payment of losses, which are referred to as the claims reserves.

The chain ladder method is the most commonly used technique in actuarial science for estimating these reserves. Several chain ladder models are considered in this study to estimate future claims. These methods use run-off triangles, which are a summarized version of the variables used in insurance.

These run-off triangles depend on historical data, which is used to predict future liabilities. Historical data include past and current claim amounts, the number of settled claims, the costs of past and current losses and the lag time, which can be described as the time between when a claim is made and its full settlement.

This study uses the number of claims as the benchmark for predicting the number of outstanding claims in the future. Stochastic chain ladder models are used and the resulting reserves are compared. These are the basic chain ladder algorithm, chain ladder in linear regression, the Mack chain ladder, the bootstrap chain ladder, the Poisson regression model and the over-dispersed Poisson model.

ACKNOWLEDGMENTS

I would like to express my deepest appreciation to my supervisor, Prof E. Zwane, who played a major role in the writing and completion of this paper; without his guidance and continuous help this project would not have been possible.

I would like to thank Mr S. Ginindza, head of pricing at Metropolitan Life Insurance, for his persistent assistance, going the extra mile for me and fitting me into his busy schedule to make sure that this project was accomplished.

Special thanks go to Miss F. Madlala, who always had her office door open throughout the course of my project and provided me with professional guidance in my data quest.

In addition, I thank my colleagues who assisted me, and my friends in the Computer Science department, who helped me with my project, had a great impact on the coding and analysis of my data and always believed I could do it.

Dedication

This project is dedicated to my mother, Miss Q. Ginindza, as an appreciation for her unending support throughout my life.

Chapter 1

1.1 Introduction

By definition, insurance is a means of indemnity against the future occurrence of an unexpected event, and in that sense it is the modern-day investment. These unforeseen events have major negative impacts on insurance companies as claims start piling up, and it is the duty of the insurer to see that these claims are settled. This motivates the study, whose intent is to calculate reserve estimates using chain ladder models. It is an attempt to quantify this uncertainty and to estimate the distribution underlying the reserves of a specific portfolio.

This is an important aspect of an insurance company's operations, and the claims reserves must be carefully calculated. If they are overestimated, the company holds unneeded excess capital in readiness for claims, while those resources could have been used to improve services or for other purposes entirely, for example an investment with higher risk and correspondingly higher expected returns. Since claims reserves normally form a reasonably large proportion of a company's total holdings, a mere miscalculation can involve a significant amount of money.

The stages of an insurance claim move from Incurred But Not Reported (IBNR) to Reported But Not Settled (RBNS) and finally to settlement. An insurance contract is an agreement between the insurance company and the policyholder. If an unfavourable event specified in the contract occurs, the client reports it to the insurer, who verifies that the claim satisfies the requirements and conditions stated in the agreement. The settlement process then begins and the client receives financial compensation as per the contract. The settlement process lasts until the client has been completely compensated, that is, until no further claims remain under the cover.

[Figure 1: timeline of a claim, with points 1 to 7 marked along the time axis]

Figure 1 A timeline representation of how claims occur.

Figure 1 is interpreted as follows:

The time between points 1 and 2 is termed the reporting delay. The claim is then processed before payments are made (from 2 to 3). It can take several years for a claim to be settled: payments are not all made at once but are in some instances made in instalments over a period of time before full settlement, after which the claim is closed. This is common in liability insurance. A closed case is sometimes reopened on the basis of new and unforeseen developments or if an error had occurred.

In Swaziland, insurance is still at a development stage (Musango and Sibandze, 2011). To ensure their financial stability and balance their profit and loss, insurance companies should have a reliable estimate of their claims reserves, both to avoid losses and to provide good service to the Swazi people, thereby gaining their trust, as insurance is a relatively new service to which people are gradually adjusting.

There are several stochastic techniques that are useful in claims reserving. These are the bootstrap, the Mack model, Markov chains, the Poisson process, the chain ladder, Bayesian methods and the diffusion approximation (Brownian motion). The Cramér-Lundberg and Lévy models are stochastic processes useful for modelling Incurred But Not Reported (IBNR) claims reserves. In property and casualty insurance, the loss payment delay related to claim reporting and handling can be of great importance.

The chain-ladder technique, a deterministic approach to calculating claims amounts, has been used to calculate claims reserves for a long time. It is widely used because it is distribution free, and it still serves as the backbone of IBNR calculation. It uses historical data (the claims available at hand) to predict incurred losses, based on the insurer's information about paid or reported amounts respectively.

Average development ratios, weighted by claim size, help project future claims through run-off triangles, which present claims over a period of time. This method of estimation was later extended by the introduction of the bootstrap method.
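The weighted average ratios can be sketched in Python. This is a minimal illustration, not the study's R code: the helper name `development_factors` and the four-year cumulative triangle are hypothetical. Each factor divides the sum of the next column by the sum of the current column over the accident years where both cells are observed, which is the same as averaging the individual link ratios C(i, j+1)/C(i, j) with weights C(i, j).

```python
def development_factors(cum):
    """Volume-weighted age-to-age factors f_j = sum_i C[i][j+1] / sum_i C[i][j].

    cum: cumulative run-off triangle as a list of rows; row i holds the
    observed development years of accident year i. Only accident years where
    both C[i][j] and C[i][j+1] are observed enter the sums for f_j.
    """
    n = len(cum[0])  # number of development periods (first row is complete)
    factors = []
    for j in range(n - 1):
        rows = [r for r in cum if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


# Hypothetical cumulative triangle (not the study's data)
triangle = [
    [100, 150, 170, 180],
    [110, 168, 190],
    [120, 180],
    [130],
]
f = development_factors(triangle)
print([round(x, 4) for x in f])  # [1.5091, 1.1321, 1.0588]
```

The last factor rests on a single accident year, which is one reason the tail of the triangle is the least reliable part of a chain ladder projection.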

The bootstrap was meant to provide measures of precision, such as prediction errors, that are difficult to derive analytically from the chain ladder method. Bootstrap techniques adopt the assumptions of the chain ladder method, which are discussed later in this paper, and use them to measure the precision of the best estimate without changing the estimate itself.

Another successor to the traditional chain ladder method is the Bayesian technique. The chain ladder results give no insight into the significantly high fluctuations caused by large claims, so the Bayesian framework accommodates such extreme and fluctuating values using probability distributions.

This method uses Markov chains to analyse the chain ladder outputs. It transforms the multiplicative chain ladder model into a linear model by taking logarithms, which relates it to two-way analysis of variance, and this is made possible by calculating the likelihood function for the parameters given the observed data.

The chain ladder has no provision for checking its assumptions and cannot estimate confidence intervals for the estimated claims reserves. The Mack model is therefore used to measure the variability of the chain ladder reserve estimates and to produce a confidence interval using the information obtained from statistical reserving methods.
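A minimal Python sketch of Mack's (1993) standard error of the chain ladder reserve for each accident year is shown below, assuming a cumulative triangle stored as a list of rows and at least four accident years. The function name and the toy triangle are hypothetical. The variance parameter for the last development period is extrapolated with Mack's original rule; the study used R's ChainLadder package, whose `MackChainLadder` function may use a different extrapolation by default, so results need not agree exactly.

```python
import math


def mack_chain_ladder(cum):
    """Chain ladder reserves with Mack (1993) standard errors.

    cum: cumulative run-off triangle as a list of rows, where row i holds the
    n - i observed values C[i][0], ..., C[i][n-1-i]. Needs n >= 4 rows.
    """
    n = len(cum)
    f, sigma2 = [], []
    for j in range(n - 1):
        rows = [r for r in cum if len(r) > j + 1]
        fj = sum(r[j + 1] for r in rows) / sum(r[j] for r in rows)
        f.append(fj)
        if len(rows) > 1:
            # weighted spread of the individual link ratios around f_j
            sigma2.append(sum(r[j] * (r[j + 1] / r[j] - fj) ** 2 for r in rows)
                          / (len(rows) - 1))
    # the last sigma^2 rests on a single ratio, so extrapolate it (Mack's rule)
    s_prev, s_last = sigma2[-2], sigma2[-1]
    extra = min(s_prev, s_last)
    if s_prev > 0:
        extra = min(extra, s_last ** 2 / s_prev)
    sigma2.append(extra)

    reserves, std_errors = [], []
    for row in cum:
        full = list(row)
        for k in range(len(row) - 1, n - 1):
            full.append(full[k] * f[k])        # projected C[i][k+1]
        ultimate = full[-1]
        reserves.append(ultimate - row[-1])
        mse_ratio = 0.0                        # mse(R_i) / ultimate^2
        for k in range(len(row) - 1, n - 1):
            col_sum = sum(r[k] for r in cum if len(r) > k + 1)
            mse_ratio += sigma2[k] / f[k] ** 2 * (1 / full[k] + 1 / col_sum)
        std_errors.append(ultimate * math.sqrt(mse_ratio))
    return reserves, std_errors


# Hypothetical cumulative triangle (not the study's data)
triangle = [[100, 150, 170, 180], [110, 168, 190], [120, 180], [130]]
reserves, std_errors = mack_chain_ladder(triangle)
for i, (r, s) in enumerate(zip(reserves, std_errors)):
    print(f"accident year {i}: reserve {r:.2f}, standard error {s:.2f}")
```

Under an added assumption of approximate normality, a rough interval for each reserve is the reserve plus or minus two standard errors.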

The over-dispersed Poisson model uses the number of incurred claims in a specific period (accident year) and development year as inputs to find the mean and standard deviation of payments with respect to accident and reporting time. It assumes that the increments follow a Poisson distribution whose variance is scaled by a dispersion parameter. The claim frequency can then be estimated by accident and reporting year, and a test of the equality of frequencies has to be done to avoid underestimating future liabilities, using the generalized linear model approach to quantifying uncertainty.
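The over-dispersed Poisson model has the convenient property that its fitted values for the observed cells coincide with the chain ladder values obtained by working backwards from the latest diagonal. The Python sketch below relies on that property to compute unscaled Pearson residuals (x - m)/√m and the dispersion estimate φ = Σr²/(N - p), where N is the number of observed cells and p = 2n - 1 the number of parameters. It assumes all fitted increments are positive; the function name and data are hypothetical. These residuals are the quantities usually resampled in the bootstrap chain ladder.

```python
import math


def odp_pearson(incr):
    """Unscaled Pearson residuals and dispersion for the over-dispersed Poisson model.

    incr: incremental triangle as a list of rows (row i has n - i observed cells).
    Fitted values are the chain ladder values back-fitted from the latest diagonal.
    """
    n = len(incr)
    cum = [[sum(row[:k + 1]) for k in range(len(row))] for row in incr]
    f = []
    for j in range(n - 1):
        rows = [r for r in cum if len(r) > j + 1]
        f.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    residuals = []
    for x_row, c_row in zip(incr, cum):
        fitted_cum = [0.0] * len(c_row)
        fitted_cum[-1] = c_row[-1]                    # latest diagonal kept as observed
        for k in range(len(c_row) - 2, -1, -1):
            fitted_cum[k] = fitted_cum[k + 1] / f[k]  # step back one development year
        fitted = [fitted_cum[0]] + [fitted_cum[k] - fitted_cum[k - 1]
                                    for k in range(1, len(c_row))]
        residuals.append([(x - m) / math.sqrt(m) for x, m in zip(x_row, fitted)])
    n_obs = n * (n + 1) // 2      # observed cells
    n_par = 2 * n - 1             # row effects + column effects - 1 constraint
    phi = sum(r * r for row in residuals for r in row) / (n_obs - n_par)
    return residuals, phi


# Hypothetical incremental triangle (not the study's data)
incr = [[100, 50, 20, 10], [110, 58, 22], [120, 60], [130]]
residuals, phi = odp_pearson(incr)
print([[round(r, 3) for r in row] for row in residuals])
print("dispersion:", round(phi, 4))
```

Cells that are the only observation in their row or column (the corner cells) are fitted exactly, so their residuals are zero.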

Log-normal models estimate the claims reserves by assuming that claim increments follow a log-normal distribution; the normality of the errors on the log scale can be checked with the Shapiro-Wilk test. The mean and standard deviation of the total claim amount are estimated as sums of payments over development years, using the reporting year as the starting year.

The Poisson model is mainly used for claim counts, and it leads to the same reserve estimates as the chain ladder model, because the chain ladder reserves are the maximum likelihood reserves for the Poisson model.
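The equivalence can be checked numerically. The Python sketch below fits the cross-classified Poisson model E[X(i, j)] = a(i)·b(j) by maximum likelihood on a hypothetical incremental triangle, solving the likelihood equations (fitted row and column totals equal the observed ones) by alternating between them, and compares the predicted lower-triangle total with the chain ladder reserve. Both function names are illustrative, not taken from any package.

```python
def chain_ladder_reserve(incr):
    """Total chain ladder reserve from an incremental triangle (list of rows)."""
    n = len(incr)
    cum = [[sum(row[:k + 1]) for k in range(len(row))] for row in incr]
    total = 0.0
    for row in cum:
        ultimate = row[-1]
        for k in range(len(row) - 1, n - 1):
            rows = [r for r in cum if len(r) > k + 1]
            ultimate *= sum(r[k + 1] for r in rows) / sum(r[k] for r in rows)
        total += ultimate - row[-1]
    return total


def poisson_reserve(incr, tol=1e-12, max_iter=100_000):
    """Poisson maximum likelihood for E[X_ij] = a_i * b_j on the observed cells.

    The likelihood equations say fitted row and column totals equal observed ones;
    they are solved by alternating between the two sets of equations.
    Returns the fitted total of the unobserved (lower) triangle.
    """
    n = len(incr)
    a = [float(sum(row)) for row in incr]
    b = [1.0] * n
    for _ in range(max_iter):
        # column j is observed for accident years 0 .. n-1-j
        b = [sum(incr[i][j] for i in range(n - j)) / sum(a[i] for i in range(n - j))
             for j in range(n)]
        # row i is observed for development years 0 .. n-1-i
        new_a = [sum(incr[i]) / sum(b[j] for j in range(n - i)) for i in range(n)]
        change = max(abs(x - y) / y for x, y in zip(new_a, a))
        a = new_a
        if change < tol:
            break
    return sum(a[i] * b[j] for i in range(n) for j in range(n - i, n))


# Hypothetical incremental triangle (not the study's data)
incr = [[100, 50, 20, 10], [110, 58, 22], [120, 60], [130]]
print(round(chain_ladder_reserve(incr), 4), round(poisson_reserve(incr), 4))
```

The two totals agree up to the convergence tolerance; this is the reason Poisson regression on claim counts reproduces the chain ladder reserves.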

To assess the credibility of the estimates, the mean square error of prediction (MSEP) is used. It measures the quality of the claims reserve estimates in terms of variance.
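Written out, with D_I denoting the observed upper triangle of the run-off data and R̂ a reserve estimate computed from it, the MSEP of the true reserve R splits into a process variance and an estimation error. The cross term vanishes because R̂ is fixed once D_I is known; this is the standard conditional decomposition, shown here for illustration:

```latex
\mathrm{MSEP}(\hat R) = E\left[(R - \hat R)^2 \mid D_I\right]
  = \underbrace{\mathrm{Var}(R \mid D_I)}_{\text{process variance}}
  + \underbrace{\left(E[R \mid D_I] - \hat R\right)^2}_{\text{estimation error}}
```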

1.2 Problem Statement

Inaccurate calculation of the claims reserves affects both the insurer and the client. It leads to misleading management decisions: the company will be trying to fix problems that do not exist or ignoring critical ones, so capital allocations are misdirected and profits are reduced.

Two cases can be considered (Sydnor, 2010). If the IBNR reserves are overestimated, the company reserves funds for incurred claims, yet the overall claims do not come close to what it expected. The company then holds excess capital, termed overpricing, which restricts company growth so that optimal production is not reached.

In the case of underestimation, where the estimated IBNR reserves do not suffice for the claims made, the inability to cover the liability for the losses could greatly affect the financial condition of the company; that is, the company experiences a shortfall. This leads to reduced profits and capital, and operating losses restrict the company's development. In addition, the company loses credibility, as the insurer is not able to deliver what it promised in the cover.

1.3 Objectives

The main objective was to estimate the claims reserves from the available claims settlement data using chain ladder models.

The specific objectives were to:

Identify the range of the predicted claims reserves and determine those that deviate greatly from the mean of the reserves.

Analyse and identify which chain ladder model was most reliable based on the mean square error of prediction.

1.4 Significance of Study

The calculation and analysis of the claims reserves help the insurer make good managerial decisions, as accurate IBNR reserves give a true picture of reality, minimizing overlooked opportunities and unnoticed problems; otherwise the company could be misguided by miscalculated claims reserves. Inaccurate reserves can also contribute to wider economic harm, such as rising unemployment, which may in turn lead to social unrest and poor access to medical care.

1.5 Research Questions

1. What were the estimates of the claims reserves?

2. Which approach had a better prediction estimate of the claims reserves?

3. Were chain ladder methods reliable for estimating claims reserves?

1.6 Limitations of the Study

Availability of data could be the main limitation of the study, as data are not easily accessible owing to client privacy in policies.

Data analysis for most insurance companies is not done in the country, and there are few experts in the field, as most claims reserve estimation is done in South Africa.

Stochastic reserving requires reliable data to construct a stochastic model. Poor-quality historical information on available claims, or missing data, results in inaccurate estimates of the claims reserves and therefore in misleading analysis of the estimated reserves.

Chapter 2

2.1 Literature Review

The chain ladder can be traced to the mid-1960s, and its name refers to the chaining of a sequence of age-to-age development factors into a ladder of factors by which one can climb from the observations to date to the predicted ultimate claim cost. The chain ladder was originally deterministic, but it has been developed into a stochastic method in order to assess the variability of the estimate.

Taylor (2000) presents different derivations of the chain-ladder procedure; one of them is deterministic while another one is based on the assumption that the incremental observations are Poisson distributed. Verrall (2000) provides several models which under maximum likelihood estimation reproduce the chain-ladder reserve estimate.

In the recent past a large variety of methods of loss reserving based on run-off triangles have been proposed. In each of these methods, it is assumed that all claims are settled within a fixed number of development years and that the development of incremental or cumulative losses from the same number of accident years is known up to the present calendar year such that the losses can be represented in a run off triangle. The most venerable and most famous of these methods are certainly the chain-ladder method and the Bornhuetter-Ferguson method.

The basic idea of the chain-ladder method was already known to Tarbell (1934), while the Bornhuetter-Ferguson method was first described almost four decades later by Bornhuetter and Ferguson (1972), who incorporated outside information into the formulas to better predict the ultimate losses. Meyers and Verrall (2007) combined the chain ladder and Bornhuetter-Ferguson approaches with a Bayesian methodology. This was a significant accomplishment in that it predicted the distribution of future losses and successfully validated these predictions on subsequently reported losses.

In 2007 and 2008, the General Insurance Reserving Oversight Committee of the Institute of Actuaries in the U.K. commissioned a study to test the model proposed by England and Verrall (2002). The study found that even under ideal conditions the probabilities of extreme results could be understated by the Mack and over-dispersed Poisson bootstrap models.

The detailed model proposed by Meyer (2007) was also tested with UK motor data: the model was fitted to the data excluding the most recent diagonal, and distributions of the next diagonal were then simulated and compared with the actual diagonal. The model allowed for error in parameter selection, which helped overcome some of the underestimation of risk seen in the Mack (2000 and 2007) and over-dispersed Poisson bootstrap models.

However, this did not guarantee correct prediction of the underlying distribution, and variability in the reserves was still evident. Meyer et al. (2011) performed retrospective testing of stochastic loss reserves for the over-dispersed Poisson bootstrap model as well as a hierarchical Bayesian model, using commercial auto liability data from U.S. annual statements for reserves as of December 2007.

The first test examined the modelled distribution of each projected incremental loss for a single insurer; the second examined the modelled distribution of the total reserve for many insurers. The findings were that if there are environmental changes that the model under study cannot identify, one cannot rely solely on stochastic loss reserve models to manage reserve risk, and it would be desirable to develop other risk management strategies to deal with unforeseen environmental changes.

According to Stephen P. D’Arcy (2008), traditional loss reserving techniques measure variability based on a single factor, using historical loss development factors and loss reserve ranges. This limits the calculated variability to what occurred during the experience period. However, multiple factors affect the variability of loss development, and they are not always stationary. Inflation is a key element in loss development.

The traditional approach for determining loss reserve variability is reasonable as long as inflation is relatively constant. If inflation and its volatility were to change, actual loss reserve variability would turn out to be higher or lower than expected under the traditional approaches. Mack and bootstrap methods use only information from the historical loss development patterns and assume that future development will follow those patterns.

Simulation methods allow customized inputs for simulating link ratios, but an increase or decrease in the mean or standard deviation relative to that obtained from the historical data is difficult to justify, or properly quantify, on a one-factor basis. The objective was to estimate accurately the inflation variability and the residual reserve variability using a two-factor model.

The model accommodated shifts in inflation as well as in residual standard errors. The greater predictive power of using multiple uncorrelated factors to calculate loss reserve variability has been recognized through the increasingly popular use of statistical modelling techniques in loss reserving. The statistical nature of the modelling framework also allows parameter uncertainty and process variability to be separated (Barnett and Zehnwirth, 1999).

These parameters are not easy to extract from the data, and their introduction into the statistical model sometimes distorts the framework, producing an incorrect distribution of the reserves. Statistical modelling techniques therefore still limit, to a certain degree, the calculated variability to what occurred during the experience period.

To overcome such limitations one would need to extract information from the data trends accurately, to have some flexibility in introducing variability that differs from what occurred during the experience period, and to apply sound actuarial judgment so as to produce a reasonable distribution of the reserves.

One of the major advantages of the classical IBNR claims reserving methods, such as the chain-ladder, Cape Cod and Bornhuetter-Ferguson methods or the credibility-like methods of Mack (2000), is their distribution-free validity. However, the insurance industry is slowly changing, its increasing complexity is becoming evident, and there is growing interest in the standard deviation and higher percentile values of the reserves.

Therefore, attempts to model adequately not only the mean of the IBNR claims reserves but also their full distribution have the potential to attract more attention from both theoretical and practical viewpoints. Early developments in this area include work by Bühlmann et al. (1980) and Hertig (1985). The present approach is inspired by Mack (1997), who proposes distribution-dependent IBNR claims reserving methods, in particular a cross-classified parametric method of multiplicative type.

Studies on the statistical basis of the chain-ladder method, focusing on the distributional assumptions for the aggregate data and the use of generalized linear models, have advanced, and recent works have focused on the over-dispersed Poisson (ODP) model (Renshaw and Verrall, 1998), the negative binomial model (Verrall, 2000), Mack's model (Mack, 1993) and the log-normal model (Kremer, 1982).

In recent years, understanding of the chain-ladder technique has developed further. Kuang et al. (2008) extend the chain-ladder model with a calendar effect and use time-series analysis to forecast this effect. Verrall et al. (2010) and Martinez Miranda et al. (2011, 2012) propose a double chain-ladder method that simultaneously uses a triangle of paid losses and a triangle of incurred claim counts. Martinez-Miranda et al. (2013) reformulate the triangular data as a histogram and propose a continuous chain-ladder model using a kernel smoother.

Some studies have also addressed the limitations of the chain ladder method, notably over-parameterization (Wright, 1990), unstable predictions for recent accident years (Bornhuetter and Ferguson, 1972), problems with zero or negative cells in run-off triangles (Kunkler, 2004), difficulties in separating the assessment of RBNS and IBNR claims (Schnieper, 1991; Liu and Verrall, 2009) and difficulties in the simultaneous use of incurred and paid claims (Quarg and Mack, 2008).

At the heart of the limitations of such models are the small sample size and the inability to use any information about individual claims. These issues derive from the inherent use of aggregate data and thus generally cannot be addressed by adjustments within the framework of chain ladder models. The observed data in a run-off triangle are typically few, leading to a prediction error that could be very large (England and Verrall, 2002).

A run-off triangle is essentially a summary of the underlying individual, ideally homogeneous, claims data. If claims are believed to be heterogeneous, they are often segmented by certain, usually discrete, characteristics and compiled into multiple triangles. In this respect, individual claim-level information is used to segment the data before the modelling phase.

Nevertheless, when the heterogeneity of claims is due to many characteristics, including continuous ones, or when the characteristics contributing to the heterogeneity are not clear, or the number of claims in the portfolio is big enough, segmentation may not be practical, and incorporating individual claim-level information into the reserving model would be desirable.

England and Verrall (2002) questioned the continuing use of aggregate data when the underlying extensive micro-level information is available and the computation is feasible. Parodi (2012) points out the misalignment of rate-making and reserving: both value the same risk, but the former is based on individual data whereas the latter is based on aggregate data.

Run-off triangles are used in general insurance to forecast future claim numbers and amounts. Usually run-off triangles arise in non-life insurance where it may take some time to establish the full extent of the claim before the final payment can be made. Run-off triangles attribute the claims to the year in which the accident occurred. The idea is to estimate how much of each class of business an insurance company is liable to pay in claims so that it can make adequate provisions.

It is clear that although the exact figure for total claims is unknown because of delays in the claim settlements, provisions can be made for future claims settlements with as much confidence and accuracy as possible. In any claim event there may be delays in between the occurrence of the claim event and the date on which the claim is reported to the insurer (reporting delay) and another delay between the reporting date and the date on which the claim loss is finally settled (settlement delay).

The first step in creating the claims loss settlement run-off triangle is to group the claims loss settlement amounts by the year in which the associated claims events occurred, called the claims occurrence year. Typically, claims losses for each claims occurrence year are not paid on one date but over a number of years (or time periods). This leads to development periods, or delay lags, measured from the accident date.

Chapter 3

3.1 Introduction

This chapter presents the methods and instruments used to estimate the claims reserves with chain ladder techniques. These include deterministic, stochastic and multivariate chain ladder models, namely the traditional chain ladder algorithm, the over-dispersed Poisson model, the bootstrap model, the Mack model and the log-normal model.

3.2 Research design and Analysis

Since the underlying chain ladder algorithm is deterministic by definition, a deterministic research design with time-homogeneous data was used in this study. Data analysis was carried out in the statistical software R, using the ChainLadder package to model run-off triangles, predict the claims reserves and compute their mean square errors.

3.3 Data source and description

Secondary data on claims for a period of 10 years, from 2008 to 2017, were obtained from a well-known insurance company located in Mbabane. The data consisted of claim settlements over the 10-year period. The data were obtained from this company since it is one of the best insurers in the country according to African Advice (2017), so data quality and reliability should be of a professional standard.

The assumption behind the majority of existing methods for claims reserve estimation is that the data are presented in the form of a run-off triangle. This presentation arranges the data by period of origin and development period. The period of origin relates to the year in which the claims were reported or, in particular, when the policy relating to the claim was underwritten. The development period, on the other hand, indicates the time elapsed since the period of origin within which the claims were paid, incurred or reported.

These run-off triangles can also be constructed to contain incremental numbers of incurred (settled) claims. For each claims occurrence year, the incremental claims loss settled in a particular development year is the amount settled in that development year. The next step is to develop cumulative incurred or paid claims triangles by accident year, that is, the total amount settled up to each development year. Cells are summed forward to obtain cumulative incurred and paid claims. The result is a progression of payments toward the ultimate pay-out for a given accident year; this is the development stage. Bornhuetter and Ferguson (1972) assumed that, under full development, the run-off triangle is part of an n × n matrix.
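Summing cells forward, and reversing the operation, takes only a few lines. The Python sketch below assumes a triangle stored as a list of rows, row i holding the observed development years of accident year i; the function names and figures are hypothetical (the R ChainLadder package offers `incr2cum` and `cum2incr` for the same purpose).

```python
from itertools import accumulate


def to_cumulative(incr):
    """C_ij = X_i0 + X_i1 + ... + X_ij for every accident year i."""
    return [list(accumulate(row)) for row in incr]


def to_incremental(cum):
    """Inverse step: X_i0 = C_i0 and X_ij = C_ij - C_i,j-1 for j >= 1."""
    return [[row[0]] + [row[j] - row[j - 1] for j in range(1, len(row))] for row in cum]


# Hypothetical incremental triangle (not the study's data)
incr = [[100, 50, 20, 10], [110, 58, 22], [120, 60], [130]]
cum = to_cumulative(incr)
print(cum)  # [[100, 150, 170, 180], [110, 168, 190], [120, 180], [130]]
```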

3.4 Procedure

3.4.1 The Chain Ladder algorithm preview

The chain ladder method is the most popular method for estimating outstanding claims reserves, mainly because of its simplicity and the fact that it is distribution free. In most cases the results from the basic chain ladder method are used as benchmarks. We make explicit the assumptions implicit in this method, because the chain ladder algorithm has far-reaching implications.

These implications also allow it to measure the variability of chain ladder reserve estimates and with the help of this measure it is possible to construct a confidence interval for the estimated ultimate claims amount and for the estimated reserves.

Traditional methodologies such as the chain ladder, though not necessarily stochastic, are robust and, when used as intended, tend to be a holistic approach to estimating reserves. Such approaches may give the actuary a gut feel for the uncertainty in the estimates, but not necessarily the ability to quantify it.

Conversely, more modern stochastic methods quantify the volatility of the forecasts, but usually conditional on a specific set of assumptions. The chain-ladder method assumes that factors such as inflation and changes in portfolio mix remain stable, so that past development patterns carry over into the future.

Step 1

Using the available claims, run-off triangles were created from the claim loss settlements. This is done by grouping the claim loss settlement amounts by the year in which the corresponding accident occurred, which is referred to as the accident year.

Table 1 Grouped claim loss settlements with respect to accident year

Accident year i    Claim loss settlement
0                  C0j
1                  C1j
.                  .
.                  .
i                  Cij
.                  .
.                  .
I                  CIj

Table 1 shows the total number of settlements the insurance company made in the I-year period, where j = 0, 1, …, n.

In this study J = 10.

0: denotes the accident year itself (development year 0)

J: denotes the last year of final claim loss settlements

Cij: denotes the number of settled claims for accident year i in development year j

CIj: denotes the number of settled claims for the last accident year I

Step 2

Since the claim settlement process is at times rather long, and because of reporting delays and reopened claims, claims may take several years to reach final settlement, so the claim loss settlements increase over time. The timeframe between the reported time and final settlement (the length of the pay-off period) is called the development period.

This is the expansion of Table 1, showing the years in which the claims were settled. The number of claims settled each year reduces over time, as claims are settled with respect to their accident year, such that $X_{i0} \ge X_{ik}$ for $k = 1, 2, \dots, j$; the $X_{ik}$ are called the incremental values.

The incremental claims loss settlements can be represented in run-off triangle form. To use run-off triangles to estimate future claims settlements, it must be assumed that the development of settled claims has a similar distribution for every accident year, and that this pattern will hold for future payments (Schmidt, 2006).
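The grouping in Steps 1 and 2 can be sketched in Python for claim counts, the measure used in this study. The record layout (one (accident year, settlement year) pair per settled claim) and the function name are assumptions made for illustration. Claims settled after the last calendar year of the study fall in the unobserved lower triangle and are dropped.

```python
from collections import defaultdict


def incremental_count_triangle(records, first_year, n_years):
    """Incremental run-off triangle of settled-claim counts.

    records: iterable of (accident_year, settlement_year) pairs, one per settled
    claim (a hypothetical record layout). Development year j is
    settlement_year - accident_year; row i (accident year first_year + i) keeps
    the n_years - i cells observable by the end of the study period.
    """
    last_year = first_year + n_years - 1
    counts = defaultdict(int)
    for acc, settle in records:
        if first_year <= acc <= last_year and acc <= settle <= last_year:
            counts[(acc - first_year, settle - acc)] += 1
    return [[counts[(i, j)] for j in range(n_years - i)] for i in range(n_years)]


# Hypothetical records, not the study's data
records = [(2008, 2008), (2008, 2009), (2008, 2009),
           (2009, 2009), (2009, 2010), (2010, 2010)]
print(incremental_count_triangle(records, 2008, 3))  # [[1, 2, 0], [1, 1], [1]]
```

For the 2008 to 2017 data described above, the call would use first_year=2008 and n_years=10.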

Table 2 Run-off triangle of incremental claim loss settlements

                  Development year j (12-month periods)
Accident year i   0        1        2        3        …    n-1        n
0                 X00      X01      X02      X03      …    X0,n-1     X0,n
1                 X10      X11      X12      X13      …    X1,n-1
2                 X20      X21      X22      X23      …
3                 X30      X31      X32      –
.                 .        .        .        .
.                 .        .        .        .
.                 .        .        .        .
I-1               XI-1,0   XI-1,1   –
I                 XI0      –

Table 2 shows the number of claims settled from the accident year until final settlement of all claims. For accident year i, Xij claims were settled in development year j; for example, X01 claims were settled one year after the accident year. Summing the incremental values across the development years of an accident year gives the total claims settled by the final period of the claim cycle being modelled. We have that

$C_{ij} = \sum_{k=0}^{j} X_{ik}$    (3.1)

Cij is the cumulative claims loss settled for accident year i up to development year j; for j = J it is the ultimate claims loss for accident year i.

The cells marked (–) are the predictions of future incremental claims loss settlements. These are used in the cumulative claims losses to estimate what the total settled claims will be, given the past settled claims.

Step 3

Using Tables 1 and 2, the cumulative run-off triangle can be deduced, which gives the total settled claims and the number of claims settled within a certain period of time. This is done by adding to the claims settled in the accident year the incremental value of each subsequent development period. According to equation (3.1), summing the incremental values gives the ultimate claims losses over the claims settlement cycle.

Table 3 Cumulative claims loss settled for each accident year

Accident year i / Development year j 0 1 2 3 … n-1 n

0 C_{0,0} C_{0,1} C_{0,2} C_{0,3} … C_{0,n-1} C_{0,n}

1 C_{1,0} C_{1,1} C_{1,2} C_{1,3} … C_{1,n-1} *

2 C_{2,0} C_{2,1} C_{2,2} C_{2,3} … *

3 C_{3,0} C_{3,1} C_{3,2} *

⋮ ⋮ ⋮ ⋮

n-1 C_{n-1,0} C_{n-1,1} *

n C_{n,0} *

The values denoted by * are the predicted future cumulative settled claims of accident year i in development year j.

$$C_{i0} = X_{i0}$$

The values of the cumulative claims loss settlements are given by

$$C_{ij} = \sum_{k=0}^{j} X_{ik} \qquad (3.2)$$

Take as an example the cumulative claims loss settlement $C_{32}$:

$$C_{32} = \sum_{k=0}^{2} X_{3k} = X_{30} + X_{31} + X_{32}$$

This is the data structure of the study. Using the above steps, reported claims can also be modelled according to reporting delay with respect to the accident year, because they can be set up in the same format. This structure simplifies the prediction process.
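The conversion from incremental to cumulative values in equations (3.1) and (3.2) can be sketched in a few lines of Python. This is a minimal illustration, not the R code used in the study. The ragged list-of-lists layout is an assumption made for the example: it has one list per accident year and holds only the observed cells. The figures are the incremental payments reported in Table 4 of Chapter 4, and the helper names are hypothetical.

```python
from itertools import accumulate

# Incremental payments from Table 4: one row per origin year (2008 first),
# holding only the observed development years, so the rows form a triangle.
INCREMENTAL = [
    [4000, 3200, 2600, 1804, 1641, 1223, 1041, 950, 720, 540],
    [3865, 3659, 1924, 1250, 920, 841, 700, 640, 560],
    [4500, 3618, 2000, 1763, 1372, 1000, 815, 720],
    [3980, 2001, 1500, 1432, 1032, 915, 810],
    [4605, 4000, 2336, 1964, 1500, 1200],
    [5498, 4864, 3046, 2007, 1801],
    [7200, 6781, 4921, 2108],
    [5477, 3205, 2600],
    [6008, 5860],
    [8763],
]


def to_cumulative(triangle):
    """C_ij = X_i0 + ... + X_ij for every observed cell (equation 3.1)."""
    return [list(accumulate(row)) for row in triangle]


def latest_diagonal(cumulative):
    """Last observed cumulative value C_{i,I-i} of each accident year."""
    return [row[-1] for row in cumulative]


cumulative = to_cumulative(INCREMENTAL)
print(cumulative[3])                      # accident year 2011
print(sum(latest_diagonal(cumulative)))   # total paid to date: 145280
```

Keeping only the observed cells avoids placeholder values such as NA. The "latest paid" diagonal is then simply the last element of each row.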

Using Table 3, the claims reserves (future claim loss settlement amounts) were estimated with R statistical packages. The assumptions of this method are flexible, and most of the models and methods that succeed it build on them.

Assumptions

There exist development factors $f_0, \dots, f_{J-1} > 0$ such that, for all $0 \leq i \leq I$ and all $1 \leq j \leq J$,

$$E[C_{ij} \mid C_{i0}, \dots, C_{i,j-1}] = E[C_{ij} \mid C_{i,j-1}] = f_{j-1}\, C_{i,j-1} \qquad (3.3)$$

and different accident years are independent.

Assuming independence of the accident years means that any accounting (calendar) year effects on the data are assumed to be absent.

It can also be assumed that $C_{i0}, C_{i1}, \dots, C_{i,j-1}, \dots$ form Markov chains and that

$$C_{ij} \prod_{l=0}^{j-1} f_l^{-1} \qquad (3.4)$$

forms a martingale for $j \geq 0$.

A martingale is a sequence of random variables in a stochastic process for which, at any time in the observed sequence, the conditional expectation of the next value, given the values observed so far, equals the present observed value.

Let $D_I = \{C_{ij} : i + j \leq I,\ 0 \leq j \leq J\}$ be the set of observations in the upper triangle, that is, the known values in the run-off triangle. By the assumption made in (3.3), for all $I - J + 1 \leq i \leq I$ we have

$$E[C_{iJ} \mid D_I] = E[C_{iJ} \mid C_{i,I-i}] = C_{i,I-i}\, f_{I-i} \cdots f_{J-1} \qquad (3.5)$$

$$E[C_{iJ} \mid D_I] = f_{J-1}\, E[C_{i,J-1} \mid D_I] \qquad (3.6)$$

Applying (3.6) iteratively until the diagonal $i + j = I$ is reached gives (3.5). This yields a recursive algorithm for estimating the ultimate claim given the observations $D_I$. For known chain ladder factors $f_j$, the outstanding claim loss settlements of accident year i can then be estimated from $D_I$ by

$$E[C_{iJ} \mid D_I] - C_{i,I-i} = C_{i,I-i}\left(f_{I-i} \cdots f_{J-1} - 1\right) \qquad (3.7)$$

We define

$$j^*(i) = \min\{J, I - i\}, \qquad i^*(j) = I - j$$

These index the last observations on the diagonal of the upper triangle. With reference to Table 3, these are $C_{I,0}, C_{I-1,1}, \dots, C_{1,J-1}$ and $C_{0,J}$.

The factors are estimated by

$$\hat f_{j-1} = \frac{\sum_{k=0}^{i^*(j)} C_{kj}}{\sum_{k=0}^{i^*(j)} C_{k,j-1}} \qquad (3.8)$$

For $i + j > I$, the chain ladder estimator then becomes

$$\hat C_{ij} = \hat E[C_{ij} \mid D_I] = C_{i,I-i}\, \hat f_{I-i} \cdots \hat f_{j-1} \qquad (3.9)$$

The claims reserve estimate for accident year i then follows from the chain ladder estimate:

$$\hat R_i = \hat C_{iJ} - C_{i,I-i}$$

3.4.2 Over-Dispersed Poisson Model preview

The over-dispersed Poisson distribution differs from the Poisson distribution in that the variance is not equal to the mean but proportional to it. In claims reserving, the over-dispersed Poisson model assumes that the incremental claims $X_{w,d}$ are independent over-dispersed Poisson random variables with mean $m_{w,d}$ and variance $\phi\, m_{w,d}$, where $\phi$ is the dispersion parameter.

The assumptions of this model are similar to those of the Bayesian claims reserving models presented in England & Verrall (2002). The parameters are modelled through prior distributions and, conditional on these parameters, the incremental claims $X_{w,d}$ are independent over-dispersed Poisson variables for accident years w and development years d. The final development year is I, and the observations at time I are given in the upper run-off triangle:

$$D_I = \{X_{w,d} : w + d \leq I\} \qquad (3.10)$$

This model is used mainly because it can give the same outstanding claim estimates as the chain ladder technique. For this to hold, the mean is modelled as

$$\log(m_{i,j}) = c + \alpha_i + \beta_j \qquad (3.11)$$

With this parametrisation, the over-dispersed Poisson model gives the same reserve estimates as the chain ladder technique. Because there is one parameter for each development column, no particular shape is imposed on the run-off pattern, which is consistent with the chain ladder model.
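The equivalence can be checked numerically without a GLM library. Consider the chain ladder run backwards from the latest diagonal: each latest paid value is divided by the development factors and the result is differenced. These fitted increments have the multiplicative form of (3.11). They also reproduce the observed row and column totals of the incremental triangle, which are exactly the Poisson likelihood equations for a model with row and column effects. This result underlies the equivalence discussed by Renshaw & Verrall (1998). The sketch below uses the Table 4 increments; its function names are hypothetical, and it only verifies the margins rather than fitting (3.11) directly.

```python
from itertools import accumulate

# Incremental payments from Table 4 (origin 2008 first, observed cells only)
INCREMENTAL = [
    [4000, 3200, 2600, 1804, 1641, 1223, 1041, 950, 720, 540],
    [3865, 3659, 1924, 1250, 920, 841, 700, 640, 560],
    [4500, 3618, 2000, 1763, 1372, 1000, 815, 720],
    [3980, 2001, 1500, 1432, 1032, 915, 810],
    [4605, 4000, 2336, 1964, 1500, 1200],
    [5498, 4864, 3046, 2007, 1801],
    [7200, 6781, 4921, 2108],
    [5477, 3205, 2600],
    [6008, 5860],
    [8763],
]


def chain_ladder_factors(cumulative):
    """Volume-weighted development factors, as in equation (3.8)."""
    factors = []
    for j in range(1, len(cumulative[0])):
        rows = [r for r in cumulative if len(r) > j]
        factors.append(sum(r[j] for r in rows) / sum(r[j - 1] for r in rows))
    return factors


def backfitted_increments(cumulative, factors):
    """Run the chain ladder backwards from the latest diagonal, then difference."""
    fitted = []
    for row in cumulative:
        cum_fit = [0.0] * len(row)
        cum_fit[-1] = float(row[-1])
        for j in range(len(row) - 1, 0, -1):
            cum_fit[j - 1] = cum_fit[j] / factors[j - 1]
        increments = [cum_fit[0]]
        increments += [cum_fit[j] - cum_fit[j - 1] for j in range(1, len(row))]
        fitted.append(increments)
    return fitted


def column_totals(triangle):
    """Sum of each development column over the accident years observed in it."""
    return [sum(r[j] for r in triangle if len(r) > j) for j in range(len(triangle[0]))]


cumulative = [list(accumulate(r)) for r in INCREMENTAL]
fitted = backfitted_increments(cumulative, chain_ladder_factors(cumulative))

row_gap = max(abs(sum(f) - sum(x)) for f, x in zip(fitted, INCREMENTAL))
col_gap = max(abs(a - b) for a, b in
              zip(column_totals(fitted), column_totals(INCREMENTAL)))
print(row_gap, col_gap)        # both zero up to floating-point rounding
print(round(fitted[0][0]))     # fitted first payment of origin 2008
```

The first fitted payment should come out at about 4458, the first entry of the over-dispersed Poisson fit reported in Table 10.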

3.4.3 Bootstrap Model preview

The bootstrap technique is a resampling method used to estimate the variability of a parameter consistently. It replaces theoretical derivations in statistical analysis by repeatedly resampling the original or simulated data and drawing inferences from the resamples. Following England & Verrall (2002), it can be performed in two stages.

Stage 1: A quasi-Poisson model is applied to the claims triangle to forecast future payments. From this model we calculate the scaled Pearson residuals, assuming that they are approximately independent and identically distributed. These residuals are resampled with replacement many times to generate bootstrapped (pseudo) triangles. Future claims payments are forecast from each pseudo triangle to estimate the parameter error. The predictions of the quasi-Poisson model are the same as those of the chain ladder method, so the faster chain ladder algorithm is used.

Stage 2: The process error is simulated, with the bootstrap value as the mean and an assumed process distribution, here a quasi-Poisson. The set of reserves obtained in this way forms the predictive distribution, from which summary statistics such as the mean, prediction error or quantiles can be derived. In a Poisson regression, the Pearson residuals are:

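$$\varepsilon_{i,j} = \frac{Y_{i,j} - \hat Y_{i,j}}{\sqrt{\widehat{\operatorname{Var}}(Y_{i,j})}} \qquad (3.12)$$

To obtain an appropriate estimator of the variance, the residuals have to be adjusted for the number of regression parameters k and observations n (Fisher, 1973):

$$\varepsilon_{i,j} = \sqrt{\frac{n}{n-k}}\; \frac{Y_{i,j} - \hat Y_{i,j}}{\sqrt{\hat Y_{i,j}}} \qquad (3.13)$$

The strategy is to bootstrap among these residuals to obtain a sample $\varepsilon^{b}_{i,j}$ and to generate a pseudo triangle

$$Y^{b}_{i,j} = \hat Y_{i,j} + \sqrt{\hat Y_{i,j}}\; \varepsilon^{b}_{i,j} \qquad (3.14)$$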

Standard techniques can then be used to complete the triangle and extrapolate the lower part. There are two kinds of uncertainty: uncertainty in the estimation of the model, and uncertainty in the process of future payments (Mack, 2000). Applying the predictions of a quasi-Poisson model to this new triangle predicts the expected value of future payments. To quantify uncertainty, it is necessary to generate scenarios of payments.
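Stage 1 of this procedure can be sketched in Python using only the standard library. The sketch makes several assumptions. The quasi-Poisson fitted values are taken to be the chain ladder run backwards from the latest diagonal (the same fitted values, per the equivalence above), and k = 2n - 1 parameters (intercept, origin and development effects) are used in (3.13). The number of resamples and the seed are arbitrary. Stage 2 (process error) is not simulated, so the spread shown reflects parameter error only and is expected to be narrower than the IBNR standard errors in Table 11. All names are hypothetical.

```python
import random
import statistics
from itertools import accumulate
from math import sqrt

# Incremental payments from Table 4 (origin 2008 first, observed cells only)
INCREMENTAL = [
    [4000, 3200, 2600, 1804, 1641, 1223, 1041, 950, 720, 540],
    [3865, 3659, 1924, 1250, 920, 841, 700, 640, 560],
    [4500, 3618, 2000, 1763, 1372, 1000, 815, 720],
    [3980, 2001, 1500, 1432, 1032, 915, 810],
    [4605, 4000, 2336, 1964, 1500, 1200],
    [5498, 4864, 3046, 2007, 1801],
    [7200, 6781, 4921, 2108],
    [5477, 3205, 2600],
    [6008, 5860],
    [8763],
]


def chain_ladder_factors(cumulative):
    factors = []
    for j in range(1, len(cumulative[0])):
        rows = [r for r in cumulative if len(r) > j]
        factors.append(sum(r[j] for r in rows) / sum(r[j - 1] for r in rows))
    return factors


def total_reserve(incremental):
    """Chain ladder reserve (sum of ultimate minus latest) of an incremental triangle."""
    cumulative = [list(accumulate(r)) for r in incremental]
    factors = chain_ladder_factors(cumulative)
    total = 0.0
    for row in cumulative:
        ultimate = row[-1]
        for f in factors[len(row) - 1:]:
            ultimate *= f
        total += ultimate - row[-1]
    return total


def fitted_increments(incremental):
    """Fitted values Y_hat: the chain ladder run backwards from the latest diagonal."""
    cumulative = [list(accumulate(r)) for r in incremental]
    factors = chain_ladder_factors(cumulative)
    fitted = []
    for row in cumulative:
        c = [0.0] * len(row)
        c[-1] = float(row[-1])
        for j in range(len(row) - 1, 0, -1):
            c[j - 1] = c[j] / factors[j - 1]
        fitted.append([c[0]] + [c[j] - c[j - 1] for j in range(1, len(c))])
    return fitted


def bootstrap_reserves(incremental, n_boot=500, seed=2018):
    """Stage 1 only: resample adjusted Pearson residuals, refit, collect reserves."""
    fitted = fitted_increments(incremental)
    n_obs = sum(len(r) for r in incremental)
    n_par = 2 * len(incremental) - 1            # intercept + origin + development
    adjust = sqrt(n_obs / (n_obs - n_par))      # factor in equation (3.13)
    residuals = [adjust * (y - m) / sqrt(m)
                 for ys, ms in zip(incremental, fitted)
                 for y, m in zip(ys, ms)]
    rng = random.Random(seed)
    reserves = []
    for _ in range(n_boot):
        # pseudo triangle, equation (3.14): Y* = Y_hat + sqrt(Y_hat) * r*
        pseudo = [[m + sqrt(m) * rng.choice(residuals) for m in ms] for ms in fitted]
        reserves.append(total_reserve(pseudo))
    return reserves


sims = bootstrap_reserves(INCREMENTAL)
cuts = statistics.quantiles(sims, n=20)        # 5%, 10%, ..., 95% cut points
print(round(statistics.mean(sims)), round(statistics.stdev(sims)))
print(round(cuts[14]), round(cuts[18]))         # 75th and 95th percentiles
```

Some implementations drop the zero residuals that arise in the corner cells before resampling. This sketch keeps them for simplicity.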

3.4.4 Log-Normal model preview

If the claims appear to follow a log-normal distribution, the incremental claims can be modelled following Zehnwirth (1998), who assumed that the incremental claims $X_{i,j}$ follow the structure

$$\ln X_{i,j} = Y_{i,j} = \alpha_i + \sum_{k=1}^{j} \gamma_k + \sum_{t=1}^{i+j} \iota_t + \varepsilon_{i,j} \qquad (3.15)$$

The errors are assumed to be normal, $\varepsilon_{i,j} \sim N(0, \sigma^2)$. The parameters $\alpha_i$, $\gamma_j$ and $\iota_t$ model trends in three time directions: origin year, development year and calendar (or payment) year, respectively. To simplify the notation, the model

$$\ln X_{i,j} = Y_{i,j} = a_i + d_j + \varepsilon_{i,j} \qquad (3.16)$$

is used, where $a_i$ and $d_j$ represent the parameters in the origin and development period directions.

3.4.5 The Bayesian model preview

Predictive distributions can be used to estimate the claims reserves. An advantage of the Bayesian approach is that it allows the run-off triangle of incremental claim loss settlements (Table 2) to contain negative values, which is not possible under the traditional chain ladder method. Such negative values could be caused by errors on the insurer's side, claim rejections by the insurer or over-estimation of the loss, so this flexibility helps when the available data contain negative incremental values. Furthermore, the proportion of ultimate claims may be unstable, for instance in the early development years, resulting in unsatisfactory chain ladder estimates. In that case, the Bornhuetter-Ferguson (BF) Bayesian technique can be used to obtain an initial estimate of the ultimate claims.

Bayesian forecasting methods assume that the time taken for a claim to be settled is known and constant, and that the development of partial pay-offs follows a stable pattern throughout (de Alba, 2007).

Assumption

The incremental claims in the run-off triangle are assumed to have an over-dispersed Poisson distribution. Let the random variables $X_{it}$ ($i = 1, \dots, k$; $t = 1, \dots, k$) denote the claim figures. The observed figures carry the available information on claim frequency, from which the unobserved claims can be estimated. Let $f(x_{it} \mid \theta)$ be the corresponding density function, so that

$$L(\theta \mid x) = \prod_{i + t \leq k + 1} f(x_{it} \mid \theta) \qquad (3.17)$$

is the likelihood function of the parameters given the data in the upper triangle of Table 3. Prior information on the parameters is incorporated through the prior density $\pi(\theta)$. By Bayes' theorem, the posterior distribution of the parameters is obtained by combining the prior density with the likelihood function:

$$f(\theta \mid x) \propto L(\theta \mid x)\, \pi(\theta) \qquad (3.18)$$

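The observed values $x = \{x_{it} : i + t \leq k + 1\}$ are then used to estimate the unobserved values in the lower triangle, denoted by $z_{ij}$, through the posterior predictive distribution

$$f(z_{ij} \mid x) = \int f(z_{ij} \mid \theta)\, f(\theta \mid x)\, d\theta \qquad (3.19)$$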

where $i = 1, \dots, k$ and $j = 1, \dots, k$ with $i + j > k + 1$. The Bayesian approach extends the chain ladder (CL) method. If $D_k = \{C_{ij} : i + j \leq I + k\}$ denotes the set of values observed at time k in Table 3, the lower triangle still to be estimated is

$$D_k^{c} = \{C_{ij} : i + j > I + k,\ i \leq I\} \qquad (3.20)$$

The accident years are independent by assumption. Conditional on the factors $F = (F_0, \dots, F_{J-1})$, $(C_{ij})_{j \geq 0}$ is a Markov process, so the conditional expectation and variance are

$$E[C_{ij} \mid C_{i,j-1}, F] = F_{j-1}\, C_{i,j-1}, \qquad \operatorname{Var}(C_{ij} \mid C_{i,j-1}, F) = \sigma_{j-1}^2(F_{j-1})\, C_{i,j-1} \qquad (3.21)$$

The components of F are independent, so their posteriors given $D_k$ are also independent. The minimum variance predictor of $C_{iJ}$ is then

$$\hat C_{iJ}^{(k)} = E[C_{iJ} \mid D_k] = C_{i,I-i+k} \prod_{j=I-i+k}^{J-1} E[F_j \mid D_k] \qquad (3.22)$$

$$\hat f_j^{(k)} = E[F_j \mid D_k] \qquad (3.23)$$

where $\hat f_j^{(k)}$ is the Bayesian estimator (posterior mean) of $F_j$ at time k, computed here under a non-informative prior distribution for F.

The estimator of $C_{iJ}$ at time k is therefore

$$\hat C_{iJ}^{(k)} = C_{i,I-i+k} \prod_{j=I-i+k}^{J-1} \hat f_j^{(k)} \qquad (3.24)$$

The Bayesian approach can serve as a powerful alternative to deterministic statistical methods when prior information is available, but it can also be used when there is none. Making inferences under these circumstances is known as objective Bayesian inference.
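To make (3.18) and (3.19) concrete, the sketch below uses a conjugate Gamma-Poisson model, chosen because its posterior and predictive distributions have closed forms. It is a swapped-in textbook example, not the over-dispersed Poisson Bayesian model of the study. Suppose a cell's claim count is Poisson with mean λ, and λ has a Gamma(shape a, rate b) prior. After observing counts $x_1, \dots, x_m$, the posterior is Gamma(a + Σx, b + m). The posterior predictive of a new count is negative binomial with mean a'/b' and variance a'/b' + a'/b'². The prior values and counts are hypothetical.

```python
def gamma_poisson_update(shape, rate, counts):
    """Posterior Gamma(shape', rate') for a Poisson mean after observing counts."""
    return shape + sum(counts), rate + len(counts)


def predictive_moments(shape, rate):
    """Mean and variance of the next count under the negative binomial predictive."""
    mean = shape / rate
    variance = mean + shape / rate ** 2
    return mean, variance


# Hypothetical prior for one cell's claim count, and three observed counts
post_shape, post_rate = gamma_poisson_update(2.0, 1.0, [3, 5, 4])
mean, variance = predictive_moments(post_shape, post_rate)
print(post_shape, post_rate, mean, variance)   # 14.0 4.0 3.5 4.375
```

The predictive variance exceeds the predictive mean. Integrating over the parameter uncertainty in (3.19) adds spread on top of the Poisson process variance.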

3.4.6 The mean square error of prediction (msep) preview

The msep will be used to measure the quality of the estimates in terms of the second moment. After an estimate of the ultimate claim has been produced, it is vital to know how good that estimator is. For a random variable X, a set of observations D and a D-measurable estimator $\hat X$ of $E[X \mid D]$, the msep is

$$\operatorname{msep}_{X \mid D}(\hat X) = E\left[(X - \hat X)^2 \mid D\right] \qquad (3.25)$$

Since $\hat X$ is D-measurable,

$$\operatorname{msep}_{X \mid D}(\hat X) = \operatorname{Var}(X \mid D) + \left(\hat X - E[X \mid D]\right)^2 \qquad (3.26)$$

$\operatorname{Var}(X \mid D)$ is the process variance within the stochastic model. $(\hat X - E[X \mid D])^2$ is the parameter (estimation) error, which reflects the uncertainty in estimating the expectation; it decreases as the number of observations increases.
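A toy check of decomposition (3.26), with hypothetical numbers, is shown below. Take X uniform on {1, ..., 6}, with no data (so conditioning on D is trivial) and a fixed prediction $\hat X = 4$. Exact fractions show that the msep splits into the variance and the squared difference between the prediction and the mean.

```python
from fractions import Fraction

# Toy outcome: X uniform on {1, ..., 6}, predicted by a fixed X_hat = 4
outcomes = [Fraction(k) for k in range(1, 7)]
prob = Fraction(1, 6)
x_hat = Fraction(4)

mean = sum(prob * x for x in outcomes)
variance = sum(prob * (x - mean) ** 2 for x in outcomes)
msep = sum(prob * (x - x_hat) ** 2 for x in outcomes)

print(msep, variance, (x_hat - mean) ** 2)   # 19/6 35/12 1/4
assert msep == variance + (x_hat - mean) ** 2
```

In reserving, X̂ is estimated from the triangle, so the second term cannot be computed exactly. Mack's formulas in the next section estimate it.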

3.4.7 Mack Model preview

Mack (1993) proposed a distribution-free model, based on three conditions, to estimate the first two moments (mean and standard error) of the chain ladder forecast. To forecast the amounts $C_{ik}$ for $k > n + 1 - i$, the Mack chain ladder model assumes

$$F_{ik} = \frac{C_{i,k+1}}{C_{ik}}, \qquad E[F_{ik} \mid C_{i1}, \dots, C_{ik}] = f_k \qquad (3.26)$$

$$\operatorname{Var}(F_{ik} \mid C_{i1}, \dots, C_{ik}) = \frac{\sigma_k^2}{w_{ik}\, C_{ik}^{\alpha}} \qquad (3.27)$$

and that $\{C_{i1}, \dots, C_{in}\}$ and $\{C_{j1}, \dots, C_{jn}\}$ are independent for origin years $i \neq j$.

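Mack uses the notation $\delta = 2 - \alpha$ for the weighting parameter. Under these assumptions, with volatility decreasing as claims are settled, Mack's model gives unbiased estimators of the development factors and hence of the future claims:

$$\hat f_k = \frac{\sum_{i=1}^{n-k} w_{ik}\, C_{ik}^{\alpha}\, F_{ik}}{\sum_{i=1}^{n-k} w_{ik}\, C_{ik}^{\alpha}} \qquad (3.28)$$

which is an unbiased estimator of $f_k$, and

$$\hat\sigma_k^2 = \frac{1}{n-k-1} \sum_{i=1}^{n-k} w_{ik}\, C_{ik}^{\alpha} \left(F_{ik} - \hat f_k\right)^2 \qquad (3.28)$$

is an unbiased estimator of $\sigma_k^2$.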

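For $\alpha = 1$ and unit weights, the mean squared error of the reserve estimate $\hat R_i$ of origin year i is estimated by

$$\widehat{\operatorname{mse}}(\hat R_i) = \hat C_{in}^2 \sum_{k=n+1-i}^{n-1} \frac{\hat\sigma_k^2}{\hat f_k^2} \left( \frac{1}{\hat C_{ik}} + \frac{1}{\sum_{j=1}^{n-k} C_{jk}} \right) \qquad (3.29)$$

The mean squared error of the total reserve over all origin years is then

$$\operatorname{mse}(\hat R) = \sum_{i=1}^{n} \operatorname{mse}(\hat R_i) + 2 \sum_{i<j} \operatorname{mse}(\hat R_i, \hat R_j) \qquad (3.30)$$

where the second sum collects the cross terms between origin years.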

These formulas are implemented in R in the ChainLadder package (function MackChainLadder).
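Formulas (3.28) to (3.30) can be sketched in plain Python for the Chapter 4 data. The sketch assumes α = 1 and unit weights. For the last development year only one ratio is available, so its $\hat\sigma^2$ is extrapolated with Mack's (1993) rule $\min(\hat\sigma_{n-2}^4/\hat\sigma_{n-3}^2,\ \hat\sigma_{n-3}^2,\ \hat\sigma_{n-2}^2)$. R's ChainLadder package offers other extrapolation choices, so these figures need not match Table 8 exactly. The function name is hypothetical.

```python
from itertools import accumulate
from math import sqrt

# Incremental payments from Table 4 (origin 2008 first, observed cells only)
INCREMENTAL = [
    [4000, 3200, 2600, 1804, 1641, 1223, 1041, 950, 720, 540],
    [3865, 3659, 1924, 1250, 920, 841, 700, 640, 560],
    [4500, 3618, 2000, 1763, 1372, 1000, 815, 720],
    [3980, 2001, 1500, 1432, 1032, 915, 810],
    [4605, 4000, 2336, 1964, 1500, 1200],
    [5498, 4864, 3046, 2007, 1801],
    [7200, 6781, 4921, 2108],
    [5477, 3205, 2600],
    [6008, 5860],
    [8763],
]


def mack_standard_errors(incremental):
    """Mack (1993) standard errors per origin year and for the total reserve.

    Assumes alpha = 1 and unit weights, so sigma_k^2 is the C_ik-weighted
    variance of the individual ratios F_ik around f_k.
    """
    cum = [list(accumulate(r)) for r in incremental]
    last = len(cum) - 1                        # index of the final development year
    f, sigma2, col_sum = [], [], []
    for k in range(last):
        rows = [r for r in cum if len(r) > k + 1]   # origins with F_ik observed
        s_k = sum(r[k] for r in rows)
        f_k = sum(r[k + 1] for r in rows) / s_k
        f.append(f_k)
        col_sum.append(s_k)
        if len(rows) > 1:
            sigma2.append(sum(r[k] * (r[k + 1] / r[k] - f_k) ** 2 for r in rows)
                          / (len(rows) - 1))
    # only one ratio is left for the last factor: Mack's extrapolation rule
    sigma2.append(min(sigma2[-1] ** 2 / sigma2[-2], sigma2[-2], sigma2[-1]))

    full = []                                  # completed cumulative triangle
    for r in cum:
        p = list(r)
        for k in range(len(r) - 1, last):
            p.append(p[-1] * f[k])
        full.append(p)
    ult = [p[-1] for p in full]

    msep = []                                  # equation (3.29) per origin year
    for i, r in enumerate(cum):
        terms = sum(sigma2[k] / f[k] ** 2 * (1 / full[i][k] + 1 / col_sum[k])
                    for k in range(len(r) - 1, last))
        msep.append(ult[i] ** 2 * terms)

    total = sum(msep)                          # add the cross terms of (3.30)
    for i, r in enumerate(cum):
        cross = sum(2 * sigma2[k] / f[k] ** 2 / col_sum[k]
                    for k in range(len(r) - 1, last))
        total += ult[i] * sum(ult[i + 1:]) * cross
    return [sqrt(m) for m in msep], sqrt(total)


per_origin, total_se = mack_standard_errors(INCREMENTAL)
for year, se in zip(range(2008, 2018), per_origin):
    print(year, round(se, 1))
print("total", round(total_se, 1))
```

The 2008 origin is fully developed and gets a standard error of zero. The total standard error is at least as large as any single origin's, because the cross terms in (3.30) are non-negative.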

Chapter 4

4.1 Introduction

This chapter presents the statistical analysis of claims data acquired from an insurance company in Swaziland. The historical claims data were presented in a triangular structure, which the chain ladder method requires. The claims were structured with respect to the accident year, which is defined as the origin. This data arrangement shows the development of claims over time. The analysis used yearly origins because of the nature of the available data (aggregated from a single homogeneous line of business), but it could also be carried out on a monthly, quarterly or semi-annual basis. In this study, one period denotes 12 months and is referred to as a development year, lag time or age.

The methodology described in the previous chapter was used to analyse the incremental claims data from 2008 to 2017, a 10-year period, in order to estimate yearly claims reserves based on the settled claims. The analysis and the prediction of future claims reserves were carried out in the R statistical software, using the R language and the ChainLadder package.

Table 4 Incremental Payments

Origin / development year 1 2 3 4 5 6 7 8 9 10

2008 4000 3200 2600 1804 1641 1223 1041 950 720 540

2009 3865 3659 1924 1250 920 841 700 640 560 NA

2010 4500 3618 2000 1763 1372 1000 815 720 NA NA

2011 3980 2001 1500 1432 1032 915 810 NA NA NA

2012 4605 4000 2336 1964 1500 1200 NA NA NA NA

2013 5498 4864 3046 2007 1801 NA NA NA NA NA

2014 7200 6781 4921 2108 NA NA NA NA NA NA

2015 5477 3205 2600 NA NA NA NA NA NA NA

2016 6008 5860 NA NA NA NA NA NA NA NA

2017 8763 NA NA NA NA NA NA NA NA NA

Table 4 shows the incremental claims, that is, the original data set. For origin year 2008, the insurer made a total of 4000 payments in the first development year, a further 3200 in the following year and 540 in the current year, 2017. The values on the last diagonal represent the latest incremental payments.

Table 5 Cumulative Run-Off Triangle

Origin / development year 1 2 3 4 5 6 7 8 9 10

2008 4000 7200 9800 11604 13245 14468 15509 16459 17179 17719

2009 3865 7524 9448 10698 11618 12459 13159 13799 14359 NA

2010 4500 8118 10118 11881 13253 14253 15068 15788 NA NA

2011 3980 5981 7481 8913 9945 10860 11670 NA NA NA

2012 4605 8605 10941 12905 14405 15605 NA NA NA NA

2013 5498 10362 13408 15415 17216 NA NA NA NA NA

2014 7200 13981 18902 21010 NA NA NA NA NA NA

2015 5477 8682 11282 NA NA NA NA NA NA NA

2016 6008 11868 NA NA NA NA NA NA NA NA

2017 8763 NA NA NA NA NA NA NA NA NA

Table 5 shows the known cumulative payments for every accident year at the end of each 12-month development year. The last diagonal (from upper right to lower left) represents the most recent paid losses, which are called the latest paid:

Latest paid (origins 2017 back to 2008): 8763 11868 11282 21010 17216 15605 11670 15788 14359 17719

Figure 2 Plot of Incremental Cumulative claims

Figure 2 shows that the claims are grouped closely together in the early development years, so they can be said to be well behaved. As the development years increase, the differences in the settled claims grow considerably, and the progression appears roughly exponential.

Figure 3 Plot of Cumulative Claims Development

Figure 3 confirms the development pattern of claims settlements seen in Figure 2. The cumulative claims develop approximately linearly from one development year to the next. This brings us to the purpose of the study.

4.2 The Traditional Chain Ladder

Using the chain ladder algorithm to estimate the claims reserves, we first obtain the fully projected table of claims settlements; this is a deterministic approach to reserve modelling. The first step was to find the age-to-age link ratios (loss development factors). The output was found to be

1.824 1.297 1.154 1.116 1.083 1.065 1.053 1.042 1.031 1.000

To project the next claim settlement in each development year, the corresponding factor is multiplied by the value in the latest paid diagonal; the final factor of 1.000 is the tail factor. Applying these factors successively, the full triangle was developed.

Table 6 The Predicted Future claims settlements

Origin / development year 1 2 3 4 5 6 7 8 9 10

2008 4000 7200 9800 11604 13245 14468 15509 16459 17179 17719

2009 3865 7524 9448 10698 11618 12459 13159 13799 14359 14810

2010 4500 8118 10118 11881 13253 14253 15068 15788 16456 16973

2011 3980 5981 7481 8913 9945 10860 11670 12286 12806 13209

2012 4605 8605 10941 12905 14405 15605 16614 17492 18232 18805

2013 5498 10362 13408 15415 17216 18643 19849 20898 21782 22466

2014 7200 13981 18902 21010 23442 25385 27027 28455 29658 30591

2015 5477 8682 11282 13018 14525 15730 16747 17631 18377 18955

2016 6008 11868 15393 17762 19818 21461 22850 24056 25074 25862

2017 8763 15983 20731 23922 26691 28903 30733 32398 33769 34830

Table 6 shows the estimated claims settlements an insurer should expect once the triangle has fully developed. The main focus is the last column of the full triangle, which gives the predicted ultimate settled claims:

2008 2009 2010 2011 2012 2013 2014 2015 2016 2017

17719 14810 16973 13209 18805 22466 30591 18955 25862 34830

Multiplying each value on the latest paid diagonal by its corresponding cumulative loss development factor gives the ultimate loss cost directly. The cumulative loss development factors are

3.975 2.179 1.680 1.456 1.305 1.205 1.132 1.075 1.031 1.000

The growth curve, which is the inverse of the cumulative loss development factor, is

0.2516 0.4589 0.5952 0.6868 0.7663 0.8298 0.8835 0.9302 0.9695 1.000
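The cumulative loss development factors and the growth curve follow mechanically from the age-to-age factors. Each cumulative factor is the product of the age-to-age factors from that age to the end, and the growth curve is its reciprocal. The Python sketch below uses the rounded factors listed earlier in this section; AGE_TO_AGE and cumulative_ldfs are names introduced here for illustration.

```python
from itertools import accumulate
from operator import mul

# Age-to-age factors as printed in Section 4.2 (the final 1.000 is the tail factor)
AGE_TO_AGE = [1.824, 1.297, 1.154, 1.116, 1.083, 1.065, 1.053, 1.042, 1.031, 1.000]


def cumulative_ldfs(factors):
    """Product of all factors from each age to the end (multiply from the right)."""
    return list(accumulate(reversed(factors), mul))[::-1]


ldf = cumulative_ldfs(AGE_TO_AGE)
growth = [1 / x for x in ldf]
print([round(x, 3) for x in ldf])
print([round(g, 4) for g in growth])
```

The inputs are rounded to three decimals. As a result, the products for ages 7 and 8 come out at 1.131 and 1.074 rather than the 1.132 and 1.075 computed from the unrounded factors.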

The outstanding claims reserve (IBNR) obtained by the deterministic chain ladder model is

68941
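The deterministic chain ladder reserve above can be reproduced, as a sketch, in Python from the Table 4 increments. The steps are to build the cumulative triangle, estimate the volume-weighted factors of (3.8), roll each latest paid value forward, and subtract the latest paid value, as in $\hat R_i = \hat C_{iJ} - C_{i,I-i}$. This is an illustration rather than the R/ChainLadder code used in the study, and the function name chain_ladder is hypothetical. No tail factor is applied.

```python
from itertools import accumulate

# Incremental payments from Table 4 (origin 2008 first, observed cells only)
INCREMENTAL = [
    [4000, 3200, 2600, 1804, 1641, 1223, 1041, 950, 720, 540],
    [3865, 3659, 1924, 1250, 920, 841, 700, 640, 560],
    [4500, 3618, 2000, 1763, 1372, 1000, 815, 720],
    [3980, 2001, 1500, 1432, 1032, 915, 810],
    [4605, 4000, 2336, 1964, 1500, 1200],
    [5498, 4864, 3046, 2007, 1801],
    [7200, 6781, 4921, 2108],
    [5477, 3205, 2600],
    [6008, 5860],
    [8763],
]


def chain_ladder(incremental):
    """Deterministic chain ladder: factors, ultimates and reserves per origin."""
    cumulative = [list(accumulate(row)) for row in incremental]
    factors = []
    for j in range(1, len(cumulative[0])):
        rows = [r for r in cumulative if len(r) > j]
        factors.append(sum(r[j] for r in rows) / sum(r[j - 1] for r in rows))

    ultimates, reserves = [], []
    for row in cumulative:
        ultimate = row[-1]
        # roll the latest paid value forward through the remaining factors
        for f in factors[len(row) - 1:]:
            ultimate *= f
        ultimates.append(ultimate)
        reserves.append(ultimate - row[-1])    # R_i = C_iJ - C_{i,I-i}
    return factors, ultimates, reserves


factors, ultimates, reserves = chain_ladder(INCREMENTAL)
print(round(ultimates[-1]))      # ultimate for origin 2017
print(round(sum(reserves), 1))   # total IBNR
```

The total should come out close to the 68,941 quoted above and the 68,940.7 reported by the Mack model in Table 8. The 2017 ultimate should be close to 34,830.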

Considering a tail factor with a calculated value of 1.089, the reserve should be increased by 8.9% to allow for development beyond the 10 development years.

The claims reserve then becomes 75,077.

4.3 The Mack Model

This is a stochastic chain ladder model which can also be used as a prediction tool to obtain the full triangle.

Table 7 Mack model full triangle

Origin / development year 1 2 3 4 5 6 7 8 9 10

2008 4000 7200 9800 11604 13245 14468 15509 16459 17179 17719

2009 3865 7524 9448 10698 11618 12459 13159 13799 14359 14810

2010 4500 8118 10118 11881 13253 14253 15068 15788 16456 16973

2011 3980 5981 7481 8913 9945 10860 11670 12286 12806 13209

2012 4605 8605 10941 12905 14405 15605 16614 17492 18232 18805

2013 5498 10362 13408 15415 17216 18643 19849 20898 21782 22466

2014 7200 13981 18902 21010 23442 25385 27027 28455 29658 30591

2015 5477 8682 11282 13018 14525 15730 16747 17631 18377 18955

2016 6008 11868 15393 17762 19818 21461 22850 24056 25074 25862

2017 8763 15983 20731 23922 26691 28903 30733 32398 33769 34830

Table 7 shows predictions for future claims settlements using the cumulative run-off triangle.

Table 8 Summary of the Mack model

Origin Latest Dev. to date Ultimate IBNR Mack S.E CV(IBNR)

2008 17719 1 17719 0 0 NaN

2009 14359 0.97 14810 451 13.2 0.0292

2010 15788 0.93 16973 1185 46.2 0.0389

2011 11670 0.884 13209 1539 127.2 0.0827

2012 15605 0.83 18805 3200 236.2 0.0738

2013 17216 0.766 22466 5250 319.4 0.0608

2014 21010 0.687 30591 9581 557.7 0.0582

2015 11282 0.595 18955 7673 698.6 0.091

2016 11868 0.459 25862 13994 1181.2 0.0844

2017 8763 0.252 34830 26067 2919.8 0.112

Totals

Latest 145280

Dev 0.68

Ultimate 214220.7

IBNR 68940.7

Mack S.E 3606.69

CV(IBNR) 0.05

Table 8 shows the outstanding claims (the IBNR column) and their standard errors for each origin year across the cumulative development periods. The ultimate and latest paid amounts are also shown, since they are used to calculate the reserves.

The claims reserves are 68,940.70 with a standard error of 3,606.69. This reserve is the same as that obtained from the chain ladder algorithm.

Figure 4 Expected cumulative claims development and the estimated standard error.

In Figure 4 the cumulative claims development follows an approximately linear pattern, which supports the forecasting model. It is still necessary to check whether the assumptions hold. This is done by plotting the residuals: if there is no trend in the residuals, the model holds.

Figure 5 The Residual plot

From the residual plot it can be concluded that the Mack model holds, since the residuals largely fall within the interval [-2, 2]. It is also worth noting that the data are under-fitted in the early development years and over-fitted in the later development years.

This is seen by inspecting the residual plots (the bottom four panels). The first plot shows the latest claims by origin year together with the predicted values, with their standard errors shown by the whiskers.

4.4 Over-Dispersed Poisson Regression model

This model treats the incremental settlements as over-dispersed Poisson distributed random variables.

Table 9 Summary of Poisson Model

Coefficients Estimate Std.Error z-value Pr(>|z|)

Intercept 8.40244 0.00891 942.55 < 2e-16

originf2009 -0.17931 0.01131 -15.86 < 2e-16

originf2010 -0.043 0.01109 -3.88 1.00E-04

originf2011 -0.29376 0.0121 -24.28 < 2e-16

originf2012 0.05948 0.01122 5.3 1.20E-07

originf2013 0.23738 0.01101 21.56 < 2e-16

originf2014 0.54606 0.01059 51.57 < 2e-16

originf2015 0.06743 0.01244 5.42 6.00E-08

originf2016 0.37815 0.0124 30.5 < 2e-16

originf2017 0.67585 0.01391 48.58 < 2e-16

devf2 -0.19363 0.007 -27.65 < 2e-16

devf3 -0.61289 0.00848 -72.24 < 2e-16

devf4 -1.01028 0.01034 -97.73 < 2e-16

devf5 -1.15212 0.01224 -94.1 < 2e-16

devf6 -1.37624 0.01502 -91.65 < 2e-16

devf7 -1.54486 0.01827 -84.54 < 2e-16

devf8 -1.68483 0.02178 -77.34 < 2e-16

devf9 -1.85533 0.02888 -64.24 < 2e-16

devf10 -2.11087 0.04395 -48.03 < 2e-16

In Table 9 the intercept estimates the log of the first payment (origin 2008, development year 1). The origin and development coefficients are then added according to the origin and development year of the cell being projected.

Using these coefficients it is then possible to estimate the incremental claims payments which can be used to predict the reserves.
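The recipe above can be sketched in Python: add the origin and development coefficients to the intercept and exponentiate, as in (3.11). Only a few Table 9 estimates are copied in, and the names ORIGIN, DEV and expected_payment are introduced here for illustration. Origin 2008 and development year 1 are the base levels absorbed by the intercept, which is why they carry zero.

```python
from math import exp

# Subset of the Table 9 estimates (log scale)
INTERCEPT = 8.40244
ORIGIN = {2008: 0.0, 2009: -0.17931, 2017: 0.67585}
DEV = {1: 0.0, 2: -0.19363, 4: -1.01028}


def expected_payment(origin, dev):
    """m_ij = exp(c + alpha_i + beta_j), as in equation (3.11)."""
    return exp(INTERCEPT + ORIGIN[origin] + DEV[dev])


for origin, dev in [(2008, 1), (2008, 2), (2017, 1), (2009, 4)]:
    print(origin, dev, round(expected_payment(origin, dev), 1))
```

The four cells printed should land within about one unit of the corresponding entries of Table 10 (4458, 3673, 8763 and 1357).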

Table 10 Fitted and predicted incremental payments (over-dispersed Poisson model)

Origin / development year 1 2 3 4 5 6 7 8 9 10

1 4458 3673 2415 1623 1409 1125.7 951.1 826.8 697.2 540

2 3726 3070 2019 1357 1177 941 794.9 691.1 582.8 451.4

3 4270 3519 2314 1555 1349 1078.4 911 792 667.9 517.3

4 3323 2738 1800 1210 1050 839.2 709 616.4 519.7 402.5

5 4731 3898 2563 1723 1495 1194.7 1009.3 877.5 740 573.1

6 5652 4657 3062 2058 1786 1427.4 1205.9 1048.4 884 684.7

7 7696 6342 4170 2802 2432 1943.5 1641.9 1427.5 1203.7 932.3

8 4769 3929 2584 1736 1507 1204.3 1017.4 884.5 745.9 577.7

9 6507 5361 3524 2369 2056 1643.1 1388.1 1206.8 1017.7 788.2

10 8763 7220 4748 3191 2769 2212.9 1869.5 1625.3 1370.5 1061.5

Summing the predicted incremental payments in the lower triangle of Table 10 gives the claims loss reserve. The IBNR was found to be

68,940.7

This is the same value as that obtained by the chain ladder algorithm, which means that the development factors implied by the two models are equal.

The mean square error was found to be 4,513.977

Figure 6 Residual plot for the Over- dispersed Poisson.

Figure 6 shows that the model is well behaved except in the last plot, for which R issued a warning that calls for further investigation.

4.5 Bootstrap Chain ladder model

This method is a resampling technique: it repeatedly resamples the residuals of the model to generate pseudo run-off triangles and then simulates the process error. Using the cumulative triangle as input, the following results were obtained.

Table 11 Bootstrap summary

Origin Latest Mean Ultimate Mean IBNR IBNR S.E IBNR 75% IBNR 95%

2008 17719 17719 0 0 0 0

2009 14359 14807 448 202 578 814

2010 15788 16963 1175 323 1394 1742

2011 11670 13216 1546 362 1750 2183

2012 15605 18793 3188 519 3514 4090

2013 17216 22461 5245 693 5715 6385

2014 21010 30577 9567 1075 10214 11454

2015 11282 18980 7698 914 8301 9250

2016 11868 25929 14061 1328 14940 16196

2017 8763 34905 26142 2554 27881 30523

Totals

Latest 145280

Mean Ultimate 214350

Mean IBNR 69070

IBNR S.E 4664

Total IBNR 75% 72185

Total IBNR 95% 76880

The output gives a mean claims reserve of 69,070 with a standard error of 4,664. This technique also allows quantiles of the claims reserves to be estimated for each origin year from the simulated data.

Table 12 Quantile estimates of the simulated IBNR

Origin IBNR 75% IBNR 95% IBNR 99% IBNR 99.5%

2008 0 0 0 0

2009 578.5 814.05 949.07 983.04

2010 1394 1742.5 1956.03 2069.095

2011 1749.5 2183.15 2453.02 2567.095

2012 3514 4090.2 4494.51 4679.125

2013 5715 6385.4 6975.46 7254.02

2014 10213.5 11453.8 12260.97 12786.135

2015 8301 9250.25 9869.09 10104.06

2016 14940.5 16196.5 17216.03 17430.31

2017 27881 30523.25 32093.88 32661.415

Totals

IBNR 75% 72184.75

IBNR 95% 76880.4

IBNR 99% 79954.33

IBNR 99.5% 81640.66

This is a useful tool for short-term decisions, especially if the data are analysed quarterly.

Figure 7 Distribution of simulated future claim settlements

In Figure 7 the model is tested by comparing the latest claims settlements against the simulated data, as shown by the graphs in the lower row. The histogram presents the frequencies of the simulated future claim settlements, and the graph on the top right shows the empirical distribution of the simulated claims reserves.

The model can be tested by fitting a log-normal distribution to the forecast claims settlements.

The standard error is significant, which suggests that the model is adequate; this is also confirmed by Figure 8.

Figure 8 Fitted lognormal on simulated data

Figure 8 supports the analysis: the model passes the test, as the simulated future settlements follow a log-normal distribution.

The claims to be settled in the next calendar year, by development year, are given by

8374 4260 2196 2780 1398 1001 462 945 770

The claims reserve for the next year is then

82027.53

4.6 Log-Normal Model

Using the cumulative data, it was found that the outstanding claims followed a log-normal distribution. The data were plotted on a log scale to check whether this holds for the current claims.

Figure 9 Incremental log claims development

The log claims are compressed at the early development stage and then decline linearly, as seen from development year 4 to 10. Modelling this requires the use of dummy variables:

d1 for the first development period only

d210 from development year 2 to 10

Table 13 Summary of lognormal prediction

Coefficients Estimates Std.Error t-value Pr(>|t|)

Intercept 8.3493 0.0797 104.78 < 2e-16

originf2009 -0.2944 0.076 -3.87 0.00036

originf2010 -0.1458 0.0791 -1.84 0.07216

originf2011 -0.379 0.0831 -4.56 4.20E-05

originf2012 -0.0311 0.0881 -0.35 0.72564

originf2013 0.145 0.0945 1.53 0.13246

originf2014 0.4391 0.1029 4.27 0.00011

originf2015 0.0257 0.1148 0.22 0.82356

originf2016 0.3916 0.1345 2.91 0.00568

originf2017 0.5959 0.1829 3.26 0.0022

d1 0.1331 0.0736 1.81 0.07743

d210 -0.2381 0.0124 -19.14 < 2e-16

Table 13 shows the estimated coefficients with their standard errors. The p-values indicate whether each origin year differs significantly from the base year. Origin years 2010, 2012, 2013 and 2015 do not differ significantly, since their p-values are greater than 0.05. The model must therefore be readjusted to account for these years, which is done by keeping separate dummy variables only for the years whose effect is significant.

Table 14 The readjusted log-normal prediction summary

Coefficients Estimate t-value Pr(>|t|)

a11 -0.36266 -4.967 9.42E-06

a14 0.44875 4.755 1.91E-05

a16 0.3977 3.044 0.00382

a17 0.6025 3.241 0.00219

d1 0.12725 1.642 0.10719

d210 -0.24299 -19.874 < 2e-16

Table 14 shows that the number of parameters has been reduced from 12 to 7. However, the residual standard error increased in the process from 0.165 to 0.175. The d210 coefficient of -0.24299 means that claims settlements are predicted to fall by a factor of $e^{-0.243} \approx 0.78$, that is by about 22%, in each development year after the first.
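The percentage reading of d210 and the usual log-normal back-transformation can be checked with a few lines of Python. The coefficient and the residual standard error are copied from the text, and lognormal_mean is a helper introduced here. The $e^{\sigma^2/2}$ factor is the standard mean correction for a log-normal variable; the study does not state whether it was applied.

```python
from math import exp

D210 = -0.24299   # development-trend coefficient from Table 14
SIGMA = 0.175     # residual standard error quoted for the readjusted model


def lognormal_mean(mu, sigma):
    """E[X] when ln X ~ N(mu, sigma^2)."""
    return exp(mu + sigma ** 2 / 2)


ratio = exp(D210)                 # next year's payment relative to this year's
print(round(ratio, 4), round(1 - ratio, 4))
print(round(lognormal_mean(0.0, SIGMA), 4))   # bias factor exp(sigma^2 / 2)
```

Exponentiating a fitted log value alone gives the median of the log-normal distribution. Multiplying by the printed factor (about 1.015 here) gives the mean.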

Figure 10 Residual plot

The top-right plot suggests that the errors follow a normal distribution. To test this, a Shapiro-Wilk normality test was performed in R, which gave the following results.

Since the p-value is greater than 0.5, the residuals were analysed further.

Figure 11 Residual plot 2

The residuals are now normal, and the claims reserve was predicted to be

Latest 145280

Dev 0.65

Ultimate 224931.74

IBNR 79651.74

S.E 5721.52

CV(IBNR) 0.07

4.7 Summary

The analysis in this chapter shows that all the models used to predict future claims settlements are justified by their assumptions and tests. The estimated IBNR ranges from 68,940.7 to 70,982.76, a small difference. The mean square error, by contrast, ranges from 3,606.69 to 6,717.89, a much larger spread.

The traditional chain ladder algorithm does not provide a mean square error. It is the oldest technique in loss reserving, and the mean square error of its estimates is a later concept developed by Thomas Mack. When quantifying uncertainty in the over-dispersed Poisson model, the residual plot showed a warning, which should be investigated further.

Over-pricing and under-pricing can be tackled using these models, as they showed stability in the middle stages of the development years. Other techniques may be needed to address the under-pricing that is revealed before and at full development of the reserve triangles.

Chapter 5

5.1 Introduction

The underlying objective of this research was to model the claims run-off triangle using chain ladder models, namely the Mack model, the traditional chain ladder algorithm, the over-dispersed Poisson model, the bootstrap model and the log-normal model. The models were fitted in R. The over-dispersed Poisson model and the traditional chain ladder algorithm gave the same claims reserves, regardless of their different assumptions. The traditional chain ladder algorithm assumes that development factors are independent of the origin periods, i.e. that they follow a constant payment pattern, as suggested by England & Verrall (1999).

On the other hand, the over-dispersed Poisson model assumes that incremental payments are Poisson distributed (Renshaw & Verrall, 1998). The unexpected finding was that the Mack model projected the same results as the other two models. On further research, Verrall (1999) stated that, because the chain ladder emerged as an algorithm for forecasting reserve estimates, it is possible to define various models that give the same reserve estimates. Mack (1993) explored models that give the same outstanding claim settlements as the chain ladder algorithm.

5.2 Discussions

The bootstrap model also showed similarities with the over-dispersed Poisson model in the first two moments. This was foreseen by England & Verrall (2001), who first applied a quasi-Poisson model to the claims triangle to forecast future payments. The first moment of the bootstrap predictions is therefore the same as that of the traditional chain ladder algorithm.

The residual plots of cumulative claims settlements show that the data are usually over-fitted at the early stages of the development years and under-fitted at the later stages, as the run-off triangle approaches full development. This is consistent with Sydnor (2010), who stated that in insurance under-pricing is most common in the earlier stages of the development years, and that the constant nature of the incremental claims triangle results in this effect.

This may be because no prior information is available at the origin, and the insurer tries to avoid any shortfall, which results in over-pricing. Ritter (2002) discussed over-pricing at later development stages as a result of a long-run stability syndrome. As shown by the residuals, the middle stage of the development years is constant (stable) and then declines before the triangle reaches full development.

5.3 Conclusion

Based on the mean square error, the findings of this analysis lead me to recommend the Mack model, as it has a low mean square error. However, the choice of model depends on many factors, such as the nature of the data, time constraints, the purpose of the study or even the line of business. This is mainly because all of the models used above satisfied their assumptions, so ruling one out would require more statistical evidence.
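Because the recommendation rests on the mean square error, it may help to see how Mack (1993) estimates it. The following base R sketch works on a hypothetical 4 x 4 cumulative triangle (invented numbers). It estimates the variance parameters sigma_k^2 from the spread of the individual link ratios, extrapolates the last one with Mack's rule (which appears to be what est.sigma="Mack" selects in the appendix's MackChainLadder() call), and then evaluates Mack's formula for the mean square error of each origin year's reserve. The covariance terms needed for the total reserve are omitted.

```r
# Hypothetical cumulative 4 x 4 triangle (illustrative numbers only).
cum <- matrix(c(100, 160, 190, 200,
                110, 180, 215,  NA,
                120, 195,  NA,  NA,
                130,  NA,  NA,  NA), nrow = 4, byrow = TRUE)
n <- nrow(cum)

# Volume-weighted development factors f_k (same as the chain ladder).
f <- sapply(1:(n - 1), function(k) sum(cum[1:(n - k), k + 1]) / sum(cum[1:(n - k), k]))

# Mack's sigma_k^2 for k = 1..n-2: weighted spread of individual link ratios around f_k.
sigma2 <- sapply(1:(n - 2), function(k) {
  rows <- 1:(n - k)
  sum(cum[rows, k] * (cum[rows, k + 1] / cum[rows, k] - f[k])^2) / (n - k - 1)
})
# sigma_{n-1}^2 rests on a single ratio, so it is extrapolated (Mack, 1993); needs n >= 4.
sigma2 <- c(sigma2, min(sigma2[n - 2]^2 / sigma2[n - 3], sigma2[n - 3], sigma2[n - 2]))

# Complete the triangle with the chain ladder.
full <- cum
for (k in 1:(n - 1)) {
  rows <- (n - k + 1):n
  full[rows, k + 1] <- full[rows, k] * f[k]
}

# Column sums of the observed cells used to estimate each f_k.
col_sums <- sapply(1:(n - 1), function(k) sum(cum[1:(n - k), k]))

# Mack's mean squared error of each origin year's reserve (process + parameter error).
mse <- sapply(1:n, function(i) {
  if (i == 1) return(0)                       # oldest origin is fully developed
  ks <- (n + 1 - i):(n - 1)                   # development steps still to come
  full[i, n]^2 * sum(sigma2[ks] / f[ks]^2 * (1 / full[i, ks] + 1 / col_sums[ks]))
})
reserve <- full[, n] - cum[cbind(1:n, n:1)]
se <- sqrt(mse)
cbind(reserve, se)
```

Dividing se by reserve gives a coefficient of variation per origin year, which is one way to compare the precision of competing models.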

Verrall (1991) argued that the traditional chain ladder estimates outstanding claims while ignoring any tail factor, that is, any development beyond the last observed development year. By applying the concept developed by Clark (2003), this study was able to include a tail factor when calculating the claims reserve with the chain ladder algorithm.
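The appendix obtains its tail factor by regressing log(f - 1) on the development year and extrapolating. A self-contained sketch of that step in base R follows; the six development factors are hypothetical, chosen only to decline smoothly towards 1, and the extrapolation horizon of 100 periods mirrors the appendix.

```r
# Hypothetical age-to-age factors f_1..f_6 (illustrative only), all above 1.
f <- c(1.80, 1.35, 1.18, 1.10, 1.06, 1.035)

# Regress log(f_k - 1) on k, leaving out f_1 as the appendix does.
dat <- data.frame(lf1 = log(f[-1] - 1), dev = 2:length(f))
m <- lm(lf1 ~ dev, data = dat)
sigma <- summary(m)$sigma

# Extrapolate the excess over 1 beyond the last observed factor, up to period 100.
extrapolation <- predict(m, data.frame(dev = (length(f) + 1):100))

# exp(mu + sigma^2 / 2) is the log-normal mean of (f_k - 1);
# the tail factor is the product of (1 + that) over all future periods.
tail_factor <- prod(exp(extrapolation + 0.5 * sigma^2) + 1)
tail_factor
```

The 0.5 * sigma^2 term adjusts for the back-transformation from the log scale; with factors that decay this quickly the product settles well before period 100.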

5.4 Recommendations

As a more direct way of establishing which model performs better, I would recommend an analysis of the variance of the claims reserve with respect to the run-off triangles: measuring the variability of the estimates would give a clearer picture. According to Merz & Wüthrich (2008), it is also possible to estimate reserve risk over a one-year time horizon, which looks specifically at the change in the estimate of the ultimate loss cost caused by the claims development over the following year. I would also recommend further investigation into quantifying uncertainty in generalized linear models.

References

Barnett, G. & Zehnwirth, B. (2000): Best Estimates for Reserves. CAS Forum.

Bornhuetter, R. L. & Ferguson, R. E. (1972): The Actuary and IBNR. Proceedings of the Casualty Actuarial Society, 59, 181-195.

Buhlmann, H., Schnieper, R. & Straub, E. (1980): Claims Reserves in Casualty Insurance.

Clark, D. R. (2003): LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society Forum.

England, P. (2002): Addendum to “Analytic and Bootstrap Estimates of Prediction Error in Claims Reserving”. Insurance: Mathematics and Economics, 31: 461-466.

England, P. & Verrall, R. (2002): Stochastic Claims Reserving in General Insurance. British Actuarial Journal, 8 (3), 443-518.

England, P. & Verrall, R. (2001): A Flexible Framework for Stochastic Claims Reserving. Proceedings of the Casualty Actuarial Society, LXXXVIII, 1-38.

England, P. & Verrall, R. (1999): Analytic and Bootstrap Estimates of Prediction Errors in Claims Reserving. Insurance: Mathematics and Economics, 25, 281-293.

England, P. & Verrall, R. (2006): Predictive Distributions of Outstanding Liabilities in General Insurance.

Meyers, G. G. & Shi, P. (2011): The Retrospective Testing of Stochastic Loss

Meyers, G. G. (2007): Estimating Predictive Distributions for Loss Reserve Models.

Meyers, G. G. (2008): Stochastic Loss Reserving with the Collective Risk

Meyers, G. (2012): The Levelled Chain Ladder Model for Stochastic Loss Reserving. Presented at the ASTIN Colloquium, October 2.

Hachemeister, C. A. & Stanard, J. N. (1975): IBNR Claims Count Estimation with Kaas, R.

Kremer, E. (1982): IBNR Claims and the Two-Way Model of ANOVA. Scandinavian Actuarial Journal, 1.

Mack, T. (1993): Distribution Free Calculation of the Standard Error of Chain Ladder Reserve Estimates. ASTIN Bulletin, 23, 213-225.

Mack, T. & Venter, G. (2000): A Comparison of Stochastic Models that Reproduce Chain Ladder Reserve Estimates. Insurance: Mathematics and Economics, 26, 101-107.

Miranda, M. D. M., Nielsen, J. P. & Verrall, R. (2012): Double Chain Ladder. ASTIN

Murphy, D. (1994): Unbiased Loss Development Factors. Proceedings of the Casualty Actuarial Society, 81, 154-222.

Renshaw, A. (1994): Modelling the Claims Process in the Presence of Covariates. ASTIN Bulletin, 24, 265-286.

Renshaw, A. & Verrall, R. (1998): A Stochastic Model Underlying the Chain-Ladder Technique. British Actuarial Journal, 4, IV, 903-923.

Schmidt, K. D. (2006): Methods and Models of Loss Reserving Based on Run-Off Triangles. Casualty Actuarial Society Forum, Fall.

Taylor, G. (1986): Claim Reserving in Non-Life Insurance. North-Holland, Amsterdam.

Taylor, G. (2000): Loss Reserving: An Actuarial Perspective. Boston: Kluwer Academic Publishers.

Verrall, R. J. (1989): A State Space Representation of the Chain Ladder Linear Model. Journal of the Institute of Actuaries, 116.

Verrall, R. (1991): On the Estimation of Reserves from Loglinear Models. Insurance: Mathematics and Economics, 10, 75-80.

Verrall, R. (2000): An Investigation into Stochastic Claims Reserving Models and the Chain-ladder Technique. Insurance: Mathematics and Economics, 26, 91-99.

Verrall, R. & England, P. (2000): Comments on: “A Comparison of Stochastic Models that reproduce Chain Ladder Reserve Estimates, by Mack and Venter.” Insurance: Mathematics and Economics, 26, 109-111.

Wüthrich, M. & Merz, M. (2008): Stochastic Claims Reserving Methods in Insurance. John Wiley & Sons Ltd.

Zehnwirth, B. (1989): Regression Methods: Applications. Conference paper, 1989 Casualty Loss Reserve Seminar.

APPENDIX

Data input

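n <- 10  # number of origin years (2008-2017) and development years
Claims <- data.frame(originf = factor(rep(2008:2017, n:1)),
                     dev = sequence(n:1),
                     inc.paid = c(4000, 3200, 2600, 1804, 1641, 1223, 1041, 950, 720, 540,  # 2008
                                  3865, 3659, 1924, 1250, 920, 841, 700, 640, 560,          # 2009
                                  4500, 3618, 2000, 1763, 1372, 1000, 815, 720,             # 2010
                                  3980, 2001, 1500, 1432, 1032, 915, 810,                   # 2011
                                  4605, 4000, 2336, 1964, 1500, 1200,                       # 2012
                                  5498, 4864, 3046, 2007, 1801,                             # 2013
                                  7200, 6781, 4921, 2108,                                   # 2014
                                  5477, 3205, 2600,                                         # 2015
                                  6008, 5860,                                               # 2016
                                  8763))                                                    # 2017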

Transformation of data to triangular form

(inc.triangle <- with(Claims, {
  M <- matrix(nrow=n, ncol=n,
              dimnames=list(origin=levels(originf), dev=1:n))
  # place each incremental amount in its (origin, development) cell
  M[cbind(originf, dev)] <- inc.paid
  M
}))

Obtaining the cumulative development of claims triangle

(cum.triangle <- t(apply(inc.triangle, 1, cumsum)))

Getting the latest paid (diagonals)

(latest.paid <- cum.triangle[row(cum.triangle) == n - col(cum.triangle) + 1])

To get figure 4.1

# cumulative amounts in long format, needed for the right-hand panel and xyplot()
Claims$cum.paid <- cum.triangle[with(Claims, cbind(originf, dev))]
op <- par(fig=c(0,0.5,0,1), cex=0.8, oma=c(0,0,0,0))
with(Claims, {
  interaction.plot(x.factor=dev, trace.factor=originf, response=inc.paid,
                   fun=sum, type="b", bty='n', legend=FALSE); axis(1, at=1:n)
  par(fig=c(0.45,1,0,1), new=TRUE, cex=0.8, oma=c(0,0,0,0))
  interaction.plot(x.factor=dev, trace.factor=originf, response=cum.paid,
                   fun=sum, type="b", bty='n'); axis(1,at=1:n)
})
mtext("Incremental and cumulative claims development",
      side=3, outer=TRUE, line=-3, cex = 1.1, font=2)
par(op)
library(lattice)  # provides xyplot()
xyplot(cum.paid ~ dev | originf, data=Claims, t="b", layout=c(5,2),
       as.table=TRUE, main="Cumulative claims development")

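To obtain the development factors (with a tail factor of 1)

# volume-weighted age-to-age factors: f[1] is dev 1 -> 2, ..., f[n-1] is dev n-1 -> n
f <- sapply((n-1):1, function(i) {
  sum( cum.triangle[1:i, n-i+1] ) / sum( cum.triangle[1:i, n-i] )
})
tail <- 1
(f <- c(f, tail))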

To get the fully predicted run-off triangle

full.triangle <- cum.triangle

for(k in 1:(n-1)){
  # project origins (n-k+1):n from development year k to k+1
  full.triangle[(n-k+1):n, k+1] <- full.triangle[(n-k+1):n, k]*f[k]
}

full.triangle

The column for ultimate claim settlements

(ultimate.paid <- full.triangle[, n])

The loss development factor estimates

(ldf <- rev(cumprod(rev(f))))

The growth curve

(dev.pattern <- 1/ldf)

Calculation of the claims reserve

sum(ultimate.paid - latest.paid)

Tail factors after full development

# log(f - 1) for factors 2..n-1 (drops the first factor and the tail of 1)
dat <- data.frame(lf1=log(f[-c(1,n)]-1), dev=2:(n-1))

(m <- lm(lf1 ~ dev , data=dat))

sigma <- summary(m)$sigma

extrapolation <- predict(m, data.frame(dev=n:100))

(tail <- prod(exp(extrapolation + 0.5*sigma^2) + 1))

Mack model summary results

library(ChainLadder)  # provides MackChainLadder() and BootChainLadder()
(mack <- MackChainLadder(cum.triangle, weights=1, alpha=1,
                         est.sigma="Mack"))

Fig 4.2.1 plot

plot(mack, lattice=TRUE, layout=c(5,2))

Fig 4.2.2 plot

plot(mack)

Over-Dispersed Poisson summary

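Claims <- within(Claims, devf <- factor(dev))  # development year as a factor
# family=quasipoisson() would also estimate the over-dispersion parameter;
# the coefficient estimates, and hence the reserve, are the same
preg <- glm(inc.paid ~ originf + devf,
            data=Claims, family=poisson(link = "log"))
summary(preg)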

Incremental predicted claim settlements full triangle

allClaims <- data.frame(origin = sort(rep(2008:2017, n)),
                        dev = rep(1:n,n))
allClaims <- within(allClaims, {
  devf <- factor(dev)
  cal <- origin + dev - 1
  originf <- factor(origin)
})
(pred.inc.tri <- t(matrix(predict(preg,type="response",
                                  newdata=allClaims), n, n)))

The predicted claims reserve

sum(predict(preg,type="response", newdata=subset(allClaims, cal > 2017)))

Bootstrap summary

set.seed(1) # set seed to have a reproducible example
(B <- BootChainLadder(cum.triangle, R=1000, process.distr="od.pois"))

Fig 4.4.1

plot(B)

Quantile predictions

quantile(B, c(0.75,0.95,0.99, 0.995))

Getting standard errors

library(fitdistrplus)  # provides fitdist()
(fit <- fitdist(B$IBNR.Totals[B$IBNR.Totals>0], "lnorm"))

Fig 4.4.2

plot(fit)

99.5% claims reserve prediction

qlnorm(0.995, fit$estimate['meanlog'], fit$estimate['sdlog'])

Predicted latest paid for the next year

# cells paid in the next calendar year (the diagonal just below the latest one)
ny <- (col(inc.triangle) == (nrow(inc.triangle) - row(inc.triangle) + 2))
paid.ny <- apply(B$IBNR.Triangles, 3,
                 function(x){
                   next.year.paid <- x[col(x) == (nrow(x) - row(x) + 2)]
                   sum(next.year.paid)
                 })
# simulated triangle whose next-year payment sits at the 99.5th percentile
paid.ny.995 <- B$IBNR.Triangles[,,order(paid.ny)[round(B$R*0.995)]]
inc.triangle.ny <- inc.triangle
(inc.triangle.ny[ny] <- paid.ny.995[ny])

Fig 4.5.1

Claims <- within(Claims, {
  log.inc <- log(inc.paid)
  cal <- as.numeric(levels(originf))[originf] + dev - 1
})
with(Claims,{
  interaction.plot(x.factor=dev, trace.factor=originf, response=log.inc,
                   fun=sum, type="b", bty='n'); axis(1, at=1:n)
  title("Incremental log claims development")
})

Inputting dummy variables

Claims <- within(Claims, {
  d1 <- ifelse(dev < 2, 1, 0)          # indicator for development year 1
  d210 <- ifelse(dev < 2, 0, dev - 1)  # linear trend over development years 2 to 10
})

Log normal summary

summary(fit1 <- lm(log.inc ~ originf + d1 + d210, data=Claims))

Adjusted values summary

Claims <- within(Claims, {

a9 <- ifelse(originf == 2009, 1, 0)

a11 <- ifelse(originf == 2011, 1, 0)

a14 <- ifelse(originf == 2014, 1, 0)

a16 <- ifelse(originf == 2016, 1, 0)

a17 <- ifelse(originf == 2017, 1, 0)

})

summary(fit2 <- lm(log.inc ~ a9 + a11 +a14 +a16 +a17 + d1 + d210, data=Claims))

First plot of residuals

op <- par(mfrow=c(2,2), oma = c(0, 0, 3, 0))

plot(fit2)

par(op)

Normality test

shapiro.test(fit2$residuals)

Second plot of residuals

resPlot <- function(model, data){
  # standardized residuals against fitted values, origin, calendar and development year
  xvals <- list(
    fitted = model[['fitted.values']],
    origin = as.numeric(levels(data$originf))[data$originf],
    cal=data$cal, dev=data$dev
  )
  op <- par(mfrow=c(2,2), oma = c(0, 0, 3, 0))
  for(i in 1:4){
    plot.default(rstandard(model) ~ xvals[[i]] ,
                 main=paste("Residuals vs", names(xvals)[i] ),
                 xlab=names(xvals)[i], ylab="Standardized residuals")
    panel.smooth(y=rstandard(model), x=xvals[[i]])
    abline(h=0, lty=2)
  }
  mtext(as.character(model$call)[2], outer = TRUE, cex = 1.2)
  par(op)
}

resPlot(fit2, Claims)

log.incr.predict <- function(model, newdata){
  Pred <- predict(model, newdata=newdata, se.fit=TRUE)
  Y <- Pred$fit                                       # log-scale prediction
  VarY <- Pred$se.fit^2 + Pred$residual.scale^2       # parameter + process variance
  P <- exp(Y + VarY/2)                                # log-normal mean
  VarP <- P^2*(exp(VarY)-1)                           # log-normal variance
  seP <- sqrt(VarP)
  model.formula <- as.formula(paste("~", formula(model)[3]))
  mframe <- model.frame(model.formula, data=newdata)
  X <- model.matrix(model.formula, data=newdata)
  varcovar <- X %*% vcov(model) %*% t(X)
  # covariances between predicted cells; the diagonal is already in VarP
  CoVar <- sweep(sweep((exp(varcovar)-1), 1, P, "*"), 2, P, "*")
  CoVar[col(CoVar)==row(CoVar)] <- 0
  Total.SE <- sqrt(sum(CoVar) + sum(VarP))
  Total.Reserve <- sum(P)
  Incr=data.frame(newdata, Y, VarY, P, seP, CV=seP/P)
  out <- list(Forecast=Incr,
              Totals=data.frame(Total.Reserve,
                                Total.SE=Total.SE,
                                CV=Total.SE/Total.Reserve))
  return(out)
}
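log.incr.predict() back-transforms predictions from the log scale with the log-normal moments P = exp(Y + VarY/2) and VarP = P^2 (exp(VarY) - 1). A quick Monte Carlo check of these two formulas, with an arbitrary assumed log-scale mean and variance (mu and s2 are illustrative values, not estimates from the study), is sketched below.

```r
set.seed(42)
mu <- 5.0     # assumed log-scale mean (illustrative)
s2 <- 0.3     # assumed log-scale variance (illustrative)

# Closed-form log-normal moments, as used in log.incr.predict()
P    <- exp(mu + s2 / 2)
VarP <- P^2 * (exp(s2) - 1)

# Monte Carlo check: exponentiate one million normal draws
x <- exp(rnorm(1e6, mean = mu, sd = sqrt(s2)))
rel_err_mean <- abs(mean(x) - P) / P
rel_err_var  <- abs(var(x) - VarP) / VarP
c(rel_err_mean = rel_err_mean, rel_err_var = rel_err_var)
```

Both relative errors should be small; simply exponentiating Y without the VarY/2 term would underestimate the mean.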

tail.years <- 9
fdat <- data.frame(
  origin=rep(2008:2017, n+tail.years),
  dev=rep(1:(n+tail.years), each=n)
)
fdat <- within(fdat, {
  cal <- origin + dev - 1
  # use the numeric origin here: originf is only created further down
  a9 <- ifelse(origin == 2009, 1, 0)
  a11 <- ifelse(origin == 2011, 1, 0)
  a14 <- ifelse(origin == 2014, 1, 0)
  a16 <- ifelse(origin == 2016, 1, 0)
  a17 <- ifelse(origin == 2017, 1, 0)
  originf <- factor(origin)
  # note: this condition is never TRUE, so p697 is always 0 (fit2 does not use it)
  p697 <- ifelse(cal < 2011 & cal > 20016, cal-2009, 0)
  d1 <- ifelse(dev < 2, 1, 0)
  d210 <- ifelse(dev < 2, 0, dev - 1)
})

Getting the Claims reserve

round(summary(MackChainLadder(cum.triangle, est.sigma="Mack",
                              tail=1.05, tail.se=0.02))$Totals,2)