Lecture notes, lectures 1-11, lecturer David Pitt PDF

Title	Lecture notes, lectures 1-11, lecturer David Pitt
Course	General Insurance Pricing and Reserving
Institution	Macquarie University
ACST 357 / 862 GENERAL INSURANCE PRICING AND RESERVING

Section 1

Introduction to R Complete Notes

OBJECTIVES 

To understand some of the basic features of R including using R as a calculator, using simple built-in R functions, storing objects in R and performing basic operations on scalars, vectors and matrices in R.



To be able to import data into an R data frame.



To perform simple simulations in R.



To understand how to implement some simple loops in R and to incorporate these into your own functions developed in R.

OPTIONAL READING 

There are many examples of R tutorials freely available on the web that you may like to refer to. The Help files in R also contain information that you may like to read.

OVERVIEW OF SECTION 1

During this unit we will study a number of statistical models. The parameters associated with these models can be estimated by hand, that is, using a calculator and pen and paper. However, it is far more efficient, once we have understood the workings behind these models, to use modern statistical software to perform the calculations for us. Use of statistical software allows us to concentrate more of our efforts on the results of the modelling rather than the, often very tedious, calculations required to estimate parameters for these models. Increasingly, actuaries are making use of modern statistical software in their day-to-day work. In this course, I have chosen to use the statistical package R. I have made this choice for a number of reasons:

R is freely available on the web to download;

R is very similar to the commercially available and widely used software S-Plus;

many routines have been written for R that can automate the fitting process for many complicated statistical models; and

many of the very best universities around the world use R as their chosen piece of statistical software for both teaching and research.

COPYRIGHT MACQUARIE UNIVERSITY

ACST 357 / 862 Section 1 Complete Notes, Page 1 of 28

BASIC FEATURES OF R

INSTALLATION OF R AND WELCOME TO R

In this unit, we will make use of the Windows version of the statistical software package called R. To download R, go to the R Project for Statistical Computing webpage at http://www.r-project.org/. Click on Download R in the Getting Started box and follow the prompts. Use the University of Melbourne mirror, Windows installation and base package to download R 2.15.1. Once you have downloaded and installed R, open it and you will see the R console window.

The console contains some basic information about the package; it is not a problem if you do not read it. The greater than sign (>) is the command prompt, next to which you can begin to write commands to perform mathematical and statistical calculations. We will begin by learning how to use R to perform operations that you could perform equally well on a hand calculator.

R AS A SIMPLE CALCULATOR

One of the simplest things you can do with the R software is to perform calculations that you could do with your calculator. At the command prompt, enter 2+3. R will display the following:

> 2+3
[1] 5
>

Note here that when I copy R commands into a Word Document, I change the font of the R commands to “Courier New”. This tends to give a better aligned presentation of the R code in Word than more commonly used fonts like “Times New Roman”.

Lecture Exercise 1

R can also perform subtraction, multiplication, division and exponentiation (raising to powers). Experiment with -, *, / and ^ yourself in R and check that R returns the results you expect.

STORING OBJECTS IN R

In the previous calculation the result, 5, was printed on screen. It was not, however, stored in memory in R. As with using our calculator or an Excel spreadsheet, we often want to store the results of intermediate calculations so that these results can be used subsequently. In a compound interest calculation, you may want to store the result of $1.0025^{30}$ for future use. In R, we can assign this value to a scalar object with name interest.30 using the following command:

> interest.30=1.0025^30
>

We can see the result of assigning this value to interest.30 by typing the name of our scalar object at the command prompt in R.

> interest.30
[1] 1.077783
>

Lecture Exercise 2 (ACST101 in R!)

A loan of $1500 is to be repaid by ten equal annual instalments in arrears. The interest rate is 1% per annum effective. Use R to find the repayment amount. First store the variables given in the question in scalar objects.

In addition to storing scalars, R can also store vectors and matrices in objects. Consider the following code, with comments after the # sign.

> y=c(1,2,3,4,5) #storing a vector of length 5 containing the first 5 counting numbers in an R vector object called y
> y
[1] 1 2 3 4 5
> mat1=matrix(c(1,2,3,4,5,6),3,2) #storing a matrix with 3 rows and 2 columns with the first six counting numbers in mat1
> mat1
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
>
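As a minimal sketch of how arithmetic behaves on vectors and matrices, the lines below use small hypothetical objects (a, b and m are illustrative names, not objects from the notes). The point to note is that * works element by element, whereas %*% performs matrix multiplication.

```r
# Hypothetical objects used only for illustration.
a <- c(1, 2, 3)
b <- c(4, 5, 6)
m <- matrix(c(1, 2, 3, 4), 2, 2)   # filled column by column: rows are (1 3) and (2 4)

a + 10        # the scalar is added to every element of a
a * b         # element-wise product: 4 10 18
t(a) %*% b    # a transpose times b, returned as a 1 x 1 matrix containing 32
m * m         # element-by-element product of two matrices
m %*% m       # matrix product
```

Because matrix() fills column by column, it is worth printing a matrix after creating it to confirm that the entries landed where you intended.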

Lecture Exercise 3

Store the scalars 2, 5 and 4.6 in scalar objects with names s1, s2 and s3.

Store the following vectors in vector objects with names v1, v2 and v3:

$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}, \quad \begin{pmatrix} 3 \\ 7 \\ 8 \\ 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2 \\ 3 \\ 1 \\ 4 \end{pmatrix}.$$

Store the following matrices in matrix objects with names mat1 and mat2. If you coded up mat1 in the notes immediately before this exercise it will be overwritten here.

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2 & 4 & 0 \\ 1 & 1 & 3 \\ 3 & 0 & 2 \end{pmatrix}.$$

Use R to find

(a) the sum of s1 and s2
(b) how much greater is s3 than s2
(c) the product of s1 and s3
(d) the quotient when s2 is divided by s1
(e) the product of v1 and s1
(f) the result when v2 is divided by s2
(g) the result when s1 is added to each element of v1
(h) the element-wise product of v1 and v2
(i) the product of v1 transpose and v2
(j) the element by element product of mat1 and mat2
(k) the matrix product when mat1 is multiplied by mat2

It is useful to list the objects in the workspace so that you can keep track of all your objects, and so avoid overwriting matrices that you have stored previously, like mat1! To get a list of the objects, type

> ls()

and R will return all the objects currently stored in the workspace.

 [1] "interest.30" "intRate"     "mat1"        "mat2"        "n"
 [6] "payment"     "principal"   "s1"          "s2"          "s3"
[11] "v1"          "v2"          "v3"          "y"

BUILT-IN R FUNCTIONS

R has a number of built-in functions that perform commonly needed operations. To get help on how to use a particular function, type ? followed by the name of the function at the command prompt. For example, if you type ?mean at the command prompt, a window will open with help on how to use the mean function in R. Not surprisingly, the mean function in R takes a vector as input and returns the arithmetic mean of the elements in the vector. A useful alternative to the help or ? function is the example function.

> example(mean)

mean> x <- c(0:10, 50)

mean> xm <- mean(x)

mean> c(xm, mean(x, trim = 0.10))
[1] 8.75 5.50

mean> mean(USArrests, trim = 0.2)
  Murder  Assault UrbanPop     Rape
    7.42   167.60    66.20    20.16

By typing example(mean) at the command prompt, we see some examples of how the mean function is used in R.
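The trim argument that appears in the example(mean) output can be explored with a small sketch. The claims vector below is hypothetical and was chosen only so that one extreme value is present; the name manual.trimmed is likewise illustrative.

```r
# Hypothetical claim sizes; the last value is an outlier.
claims <- c(120, 135, 150, 160, 175, 180, 190, 205, 220, 5000)

mean(claims)               # ordinary arithmetic mean, pulled up by the outlier
mean(claims, trim = 0.1)   # drops the lowest 10% and highest 10% of values first

# The same trimmed mean computed by hand: sort, then drop one value from each end.
sorted <- sort(claims)
manual.trimmed <- mean(sorted[2:9])
manual.trimmed
```

With ten observations, trim = 0.1 removes exactly one observation from each end before averaging.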

Lecture Exercise 4

Explain the output from the example(mean) command in R.

Lecture Exercise 5

Suppose we have the following count of the number of typos per page of a set of notes:

2 3 0 3 1 0 0 1

To enter this data into R we type

> typos=c(2,3,0,3,1,0,0,1)
> typos
[1] 2 3 0 3 1 0 0 1

Write out what you think each of the following commands would return:

(a) typos[2]
(b) mean(typos)
(c) max(typos)
(d) min(typos)
(e) length(typos[typos>1])
(f) typos[typos>1]
(g) sd(typos)
(h) var(typos)
(i) Check that results from (g) and (h) are consistent.

THE R DATA FRAME

Most data sets are stored in R as data frames. We saw the USArrests data frame in Lecture Exercise 4. Data frames are like matrices, but with the columns having their own names. We can create a data frame in R from vectors using the data.frame() built-in R function.

> colours=c("red","yellow","blue")
> numbers=c(1,2,3)
> colours.and.numbers=data.frame(colours, numbers, more.numbers=c(4,5,6))
> colours.and.numbers
  colours numbers more.numbers
1     red       1            4
2  yellow       2            5
3    blue       3            6
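Once a data frame exists, its columns and rows can be extracted in several ways. The sketch below uses a hypothetical data frame called policies; the column names and values are illustrative only and do not come from the unit's data files.

```r
# Hypothetical policy data, used only for illustration.
policies <- data.frame(region = c("north", "south", "east"),
                       claims = c(12, 7, 9),
                       exposure = c(100, 80, 90))

names(policies)                   # column names
nrow(policies)                    # number of records (rows)
policies$claims                   # one column, extracted by name as a vector
policies[2, ]                     # the second record
policies[policies$claims > 8, ]   # records with more than 8 claims

# A new column can be computed from existing ones.
policies$frequency <- policies$claims / policies$exposure
policies
```

Unlike a matrix, a data frame can hold columns of different types, here a character column alongside numeric ones.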

Often we have data available in an Excel spreadsheet and want to import this into an R data frame. In this unit, I will provide you with Excel files containing the data that you need to work with in R. We therefore concentrate here on a simple procedure for getting data from an Excel file into an R data frame. Consider the data in the file “mortality.xls” on Blackboard. This Excel file contains 72 records (rows) of data. The first few records, along with the field (column) headers, are shown below.

deaths  exposure  int  age  gender
13      48222.5   1    30   0
13      52743.5   1    31   0
18      58043     1    32   0

A very simple way to import this data into R is to first convert the Excel file to a comma separated value (.csv) file. To do this, use the Save As option in the File menu and save the Excel spreadsheet with filename mortality. In the box labelled “Save as Type” choose CSV. After saving the file “mortality.csv”, close it. Open up R. At the command prompt, type

> mortality=read.csv("mortality.csv", header=T)

Before you type the above command, make sure that R knows where to look for the file “mortality.csv”. To direct R properly, go to the File menu in R and choose Change dir… Use the Browse button in the window that comes up next to select the folder where you have saved the file “mortality.csv”.

If you then type mortality at the command prompt, R will print out, on screen, the contents of the data frame mortality. This will contain all 72 records of data plus the field names. The object mortality which we have just created in R is a data frame object. A data frame contains a matrix of data but also contains names for each of the columns that make up that matrix. To ask R for the names associated with an R data frame, we type names(data frame object name). So, for example, in R we type:

> names(mortality)
[1] "deaths"   "exposure" "int"      "age"      "gender"

For large data sets it is convenient to attach the data frame in R before performing analysis on it. Attaching a data frame places it on R's search path, so that its columns can be referred to directly by name (for example, deaths rather than mortality$deaths). The command in R is attach(data frame object name). So in our mortality example, we type:

> attach(mortality)
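The import steps can be tried end to end without the Blackboard file. The sketch below is an illustration under stated assumptions: it uses only the three records shown in the notes, writes them to a temporary file (the names sample.rows, csv.path and mortality.sample are hypothetical), reads them back, and then compares three ways of referring to the columns.

```r
# The first three records shown in the notes stand in for the full file.
sample.rows <- data.frame(deaths = c(13, 13, 18),
                          exposure = c(48222.5, 52743.5, 58043),
                          int = c(1, 1, 1),
                          age = c(30, 31, 32),
                          gender = c(0, 0, 0))

# Write a temporary .csv file (hypothetical location), then read it back.
csv.path <- tempfile(fileext = ".csv")
write.csv(sample.rows, csv.path, row.names = FALSE)
mortality.sample <- read.csv(csv.path, header = TRUE)

getwd()                    # folder R searches when given a bare file name
names(mortality.sample)    # field names taken from the header row
str(mortality.sample)      # column types and the first few values

# Three equivalent ways to compute deaths per unit of exposure.
rate.dollar <- mortality.sample$deaths / mortality.sample$exposure
rate.with <- with(mortality.sample, deaths / exposure)
attach(mortality.sample)   # columns are visible by name until detach() is called
rate.attach <- deaths / exposure
detach(mortality.sample)
```

Two cautions apply. Writing header = TRUE in full is safer than header = T, because T is an ordinary variable that can be reassigned. After attach(), any workspace object that shares a name with a column takes precedence over that column, so with() or the $ notation is often the less error-prone choice.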

In Section 4 of this unit we will look at generalised linear models – a class of regression models that are very useful in General Insurance pricing and reserving. In the next exercise we consider how the simple linear regression model can be estimated using R.

Lecture Exercise 6

The table below gives the weight in kg of a particular type of animal at the age given in years.

age        0     0     0     0     1     1     1     1     2     2     3     3     4     4
weight   0.9  1.25   1.5  0.81  2.79     3  3.14  2.53  3.28  3.76  4.19  3.67   3.8  4.56

(a) Create a data frame in R containing these data.

(b) Assuming that $E(Y \mid x) = \alpha + \beta e^{-x}$ and $\mathrm{Var}(Y \mid x) = \sigma^2$, obtain estimates of $\alpha$, $\beta$ and $\sigma^2$ using the method of least squares. Y is the response variable in the regression and x denotes the predictor variable.

(c) Plot the observations and your fitted curve.
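One possible approach to this exercise is sketched below. It assumes the mean function has the form alpha + beta * exp(-x), which is linear in alpha and beta, so the built-in lm() function gives the least squares estimates; if the intended model differs, the formula would need to change. The object names animals, fit, sigma2.hat and grid.x are illustrative.

```r
# Data from the exercise table.
age <- c(0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 4, 4)
weight <- c(0.9, 1.25, 1.5, 0.81, 2.79, 3, 3.14, 2.53,
            3.28, 3.76, 4.19, 3.67, 3.8, 4.56)
animals <- data.frame(age, weight)

# Assumed mean function: E(Y | x) = alpha + beta * exp(-x).
fit <- lm(weight ~ exp(-age), data = animals)
coef(fit)    # intercept estimates alpha, slope estimates beta

# Residual sum of squares divided by (n - 2) estimates sigma^2.
sigma2.hat <- sum(resid(fit)^2) / df.residual(fit)
sigma2.hat

# Plot the observations and overlay the fitted curve on a fine grid of ages.
plot(animals$age, animals$weight, xlab = "age (years)", ylab = "weight (kg)")
grid.x <- seq(0, 4, by = 0.05)
lines(grid.x, predict(fit, newdata = data.frame(age = grid.x)))
```

The divisor n - 2 gives the usual unbiased estimate of the variance; dividing by n instead would give the maximum likelihood estimate under normal errors.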

SIMULATION USING R

R can be used to conduct simulation studies. Before we look at some specific R syntax for performing simulations, we do a quick exercise.

Lecture Exercise 7

Consider the standard normal distribution with mean 0 and variance 1. Write down

(a) the 0.975 quantile of this distribution.
(b) the probability that a value from this distribution is less than 1.645.
(c) the value of the probability density function for this variable at 0.

Turning to simulation using R, suppose we wanted to simulate 100 values from the normal distribution with mean 175 and standard deviation 10. The command required in R is

> x=rnorm(100,175,10)

This places a vector of 100 values into an object called x. We can use R to simulate values from many other probability distributions. A selection of those available, with the relevant R commands, is given below.

rchisq(n,df) : used to simulate n values from a chi-squared distribution with df degrees of freedom

rgamma(n,shape,rate,scale) : used to simulate n values from a gamma distribution with parameters shape and rate (scale is just the reciprocal of the rate and can be specified as an alternative parametrisation for the gamma simulations). Note that if the random variable X has a Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$ then the probability density function for X is

$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0.$$

Lecture Exercise 8

Complete the following:

dnorm(x, mean=0, sd=1) : this function returns the value of the probability density function for a standard normal evaluated at a particular value of x. You will recall the standard normal probability density function:

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right), \quad -\infty < x < \infty.$$

> dnorm(1.1,0,1)
[1] 0.2178522

This is correct since

$$\frac{e^{-1.1^2/2}}{\sqrt{2\pi}} = 0.2178522.$$

pnorm(q, mean=0, sd=1) : this returns the cdf for the standard normal distribution evaluated at the specified value of q.

> pnorm(1.96,0,1)
[1] 0.9750021

This is correct since we know that the z-value 1.96 cuts off 2.5% in the upper tail of the standard normal density and therefore 97.5% in the lower tail of that density.

qnorm(p, mean=0, sd=1) : this returns the quantile associated with the probability value p on the standard normal density. That is, if the area to the left of some value (the quantile) is p, then that quantile will be returned by this function.

> qnorm(0.975,0,1)
[1] 1.959964

This is correct since the area to the left of 1.959964 under the standard normal density is 0.975. In other words, the cumulative distribution function of the standard normal evaluated at 1.959964 is 0.975.
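The relationships between dnorm, pnorm and qnorm can be checked numerically. The sketch below is illustrative only; the object names (z, by.hand, p, q, upper) and the N(175, 10^2) example are assumptions made for the demonstration.

```r
# dnorm() agrees with the standard normal density formula evaluated by hand.
z <- 1.1
by.hand <- exp(-z^2 / 2) / sqrt(2 * pi)
all.equal(by.hand, dnorm(z))

# pnorm() and qnorm() are inverses of each other.
p <- 0.975
q <- qnorm(p)    # about 1.959964
pnorm(q)         # recovers p

# The mean and sd arguments shift and scale the distribution: for a
# hypothetical normal variable with mean 175 and standard deviation 10,
# the 0.975 quantile equals 175 + 10 * qnorm(0.975).
upper <- qnorm(0.975, mean = 175, sd = 10)
upper
```

The same d, p, q and r prefixes apply to the other distributions in R, for example dgamma, pgamma, qgamma and rgamma.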

PROGRAMMING WITH R

FLOW CONTROL

Many of the calculations we use in this unit will involve repetition. For example, in approximating roots of non-linear equations we will apply the Newton-Raphson method many times. When programming a computer to perform these calculations, loops are helpful. We will consider the for loop in R. The for() statement in R allows one to specify that a certain operation should be repeated a fixed number of times. The syntax is for(name in vector) {commands}. This sets a variable called name equal to each of the elements of vector, in sequence. For each value, whatever commands are listed within the curly braces will be performed. The curly braces serve to group the commands so that they are treated by R as a single command. If there is only one command to execute, the braces are not needed.
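As a hedged illustration of the for() syntax, the sketch below applies the Newton-Raphson update to the hypothetical equation x^2 - 2 = 0. The function names f and f.prime, the starting value and the choice of six iterations are assumptions made for the example.

```r
# Newton-Raphson for f(x) = x^2 - 2, whose positive root is sqrt(2).
f <- function(x) x^2 - 2
f.prime <- function(x) 2 * x

x <- 1                           # initial guess
for (k in 1:6) {
  x <- x - f(x) / f.prime(x)     # one Newton-Raphson update
  print(x)                       # show the current approximation
}

x - sqrt(2)                      # difference from R's built-in square root
```

The braces are needed here because two commands are executed on each pass through the loop.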

Lecture Exercise 9

Explain what the following R code will output:

> fib=numeric(12)
> fib[1]=fib[2]=1
> for(i in 3:12) fib[i]=fib[i-2]+fib[i-1]
> fib

Lecture Exercise 10

Suppose a car dealer promotes two options for the purchase of a new $20,000 car. The first option is for the customer to pay up front and receive a $1,000 rebate. The second option is “0%-interest financing” where the customer makes 20 monthly payments of $1,000 beginning in one month’s time. Because of option 1, the effective price of the car is really $19,000, so the dealer really is charging some interest rate i for option 2. We can calculate this by solving

$$19000 = 1000 \left( \frac{1 - (1+i)^{-20}}{i} \right).$$

Multiplying both sides of this equation by i and dividing by 19,000, we get the form of a fixed-point problem:

$$i = \frac{1 - (1+i)^{-20}}{19}.$$

By taking an initial guess for i and plugging it into the right-hand side of this equation, we can get an ‘updated’ value for i on the left. For example, if we start with i = 0.006, then our update is

$$i = \frac{1 - (1 + 0.006)^{-20}}{19} = 0.00593.$$

By plugging this updated value into the right-hand side of the equation again, we get a new update. This kind of fixed-point iteration often requires many iterations before we can be confident of an accurate solution. Write R code to perform 1,000 iterations of the fixed-point algorithm using i = 0.006 as the initial value for i.

Next, we consider the if() statement in R. The if statement in R allows us to control which statements are executed. This proves very useful in many situations. The syntax in R is

if (condition) {commands when TRUE}
if (condition) {commands when TRUE} else {commands when FALSE}

This statement causes a set of commands to be invoked if condition evaluates to TRUE. The else part is optional, and provides an alternative set of commands which are to be invoked if condition evaluates to FALSE. A simple example:

> x=3
> if (x>2) y=2*x else y=3*x
> y
[1] 6

Lecture Exercise 11

The following code will be used in this exercise:

> x=c(1,2,3,4,5,6,7,8,9,10)
> x%%2
[1] 1 0 1 0 1 0 1 0 1 0
> x%%3
[1] 1 2 0 1 2 0 1 2 0 1
> x%%4
[1] 1 2 3 0 1 2 3 0 1 2

From the above code and output it is fairly clear that a%%b, where a is a vector and b is a scalar, returns a vector containing the remainders when each element of the vector a is divided by b. If a zero is returned, then the corresponding element of the vector a is a multiple of b.

The function that follows is based on the sieve of Eratosthenes, the oldest known systematic method for listing prime numbers up to a given value n. The idea is as follows: begin with a vector of numbers from 2 to n. Beginning with 2, eliminate all multiples of 2 which are larger than 2. Then move to the next number remaining in the vector, in this case 3. Now, remove all multiples of 3 which are larger than 3. Proceed through all remaining entries of the vector in this way. The entry for 4 would have been removed in the first round, leaving 5 as the next entry to work with after 3; all multiples of 5 would be removed at the next step and so on.

> Eratosthenes=function(n) {
+   if(n>=2) {
+     sieve=seq(2,n)
+     primes=c()
+     for(i in seq(2,n)){
+       if(any(sieve==i)){
+         primes=c(primes,i)
+         sieve=c(sieve[(sieve %% i)!=0],i)
+       }
+     }
+     return(primes)
+   }else{stop("Input value of n should be at least 2.")
+   }
+ }
> Eratosthenes(100)
[1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 ...