Final 21 6 May 2017, questions and answers PDF

Title Final 21 6 May 2017, questions and answers
Course Survival Analysis
Institution University of Leeds
Pages 13

MATH277501

This question paper consists of 7 printed pages, each of which is identified by the reference MATH277501. Statistical tables are provided on pages 6–7.

All calculators must carry an approval sticker issued by the School of Mathematics. Graph paper is provided.

© University of Leeds

School of Mathematics
May/June 2017
MATH277501 Survival Analysis

Time allowed: 2 hours

Answer no more than 4 questions. If you attempt 5, only the best 4 will be counted. All questions carry equal marks.

1.

(a) Baboons sleep in trees. Suppose you are interested in the distribution of times baboons stay in a tree to sleep at night. You plan to observe for each baboon when they climb the tree in the evening to sleep and when they leave the tree in the morning. When you arrive to register the times some of the baboons are already in the tree. When you are leaving in the morning some of the baboons are still in the tree. Are your observations subject to censoring? If yes, which type of censoring? Motivate your answer!

(b) (i) For a random survival time T provide the formula of the hazard rate h(t) in terms of the probability to experience the event in the next interval (t, t + δt] conditional on T > t.
(ii) State a formula for S(t) in terms of the hazard rate h(s).

(c) In an experiment, individuals had to solve a difficult task. Some of the individuals did not want to complete the task and stopped early. Others did not complete the task by the end of the experiment. The experiment lasted for two hours, and the times (in minutes) until finishing were recorded:

50, 51, 66∗, 82, 92, 120∗, 120∗, 120∗,

where the asterisk indicates that the individual did not complete the task. We are interested in modelling the time that an individual completes the task.
(i) Calculate the Kaplan-Meier estimate $\hat{S}(t)$ of the survival function S(t) and sketch its plot. Take care that all necessary information to understand the plot is provided.
(ii) Give $\hat{S}(t)$ for t = 60 minutes and provide its standard error.
(iii) Test the null hypothesis that half of the individuals complete the task. Formulate the null hypothesis and write down the Wald type test statistic.
(iv) Compute the value of the test statistic in part (c) (iii) for the observed data. What is your conclusion with regard to the null hypothesis in part 1 (c) (iii)? Use a significance level of 5%.

2.

(a) (i) For a random survival time T ≥ 0, state the formula for the expectation $E(T_x)$ in terms of the survival distribution S(t). Here, $T_x$ is the residual life time.
(ii) Give the mathematical definition of the actuarial symbols ${}_t q_x$ and ${}_t p_x$.
(iii) Derive the following formula

$${}_t p_x = \exp\left(-\int_0^t \mu_{x+s}\,ds\right),$$

with $\mu_t$ the force of mortality.

(b) Consider the following table of hazard rates of age group versus calendar year, i.e. 0.10303 is the hazard rate to die between 0 and 1 year of age in 1890.

age    1890     1891     1892     1893     1894     1895
0-1    0.10303  0.10061  0.10716  0.0985   0.09263  0.08922
1-2    0.05445  0.05259  0.04718  0.03945  0.03792  0.03575
2-3    0.0324   0.02878  0.027    0.0196   0.02023  0.01808
3-4    0.02159  0.01948  0.01886  0.0132   0.01443  0.0129
4-5    0.01652  0.0148   0.01453  0.00976  0.01051  0.00971
5-6    0.01317  0.01211  0.01184  0.00768  0.00792  0.00734
6-7    0.01099  0.01015  0.00936  0.0063   0.00623  0.00581
7-8    0.00921  0.00859  0.0076   0.0056   0.00544  0.00478
8-9    0.00806  0.00739  0.00651  0.00495  0.00434  0.00418

(i) Calculate the survival probability $\hat{S}(5)$ for a person born in 1891.
(ii) Calculate the survival probability $\hat{S}(5)$ conditional on surviving 2 years (T > 2) and being born in 1891.

(c) (i) Give two reasons to prefer a parametric survival model above a non-parametric survival model.
(ii) Suppose you want to fit an exponential survival model to your data. Before doing that, how would you check whether this model is suitable for your data?
(iii) Write down the log-likelihood function for the exponential model given the data in question 1 (c).

3.

In an experiment to study the effect of lack of water on time to flowering of plants, 50 plants are exposed to a drought regime and 50 plants grow under normal conditions. Every week the plants are checked for flowering. Time of interest is time from start of the experiment to flowering. After 4 weeks all plants have flowered in the normal group, while only 20 plants flowered in the group exposed to drought. In the group exposed to drought 5 plants died between the second and third week and another 10 between the third and fourth week of the experiment. The times to flowering for the two groups are given in the following table

time interval   normal   drought
0-1             5        0
1-2             10       0
2-3             12       5
3-4             23       15

(a) Explain why the life table estimator is a good choice for estimating the survival curve for the two groups.

(b) (i) Compute $\hat{S}(t)$ at t = 2.5 weeks for each of the two groups and also the corresponding two standard errors.
(ii) Write down the test statistic to test the null hypothesis that the survival probability at 2.5 weeks is the same in the two groups.
(iii) Compute the symmetric 95% confidence interval for S(2.5) in the group exposed to drought.
(iv) Suppose that a 95% confidence interval contains values above 1. Briefly explain why and how this situation should be addressed.

(c) Derive an approximate formula to calculate the standard error of

$$\frac{\hat{S}(t)}{1 + \hat{S}(t)^2},$$

using Greenwood's expression for the variance of $\hat{S}(t)$. Here $\hat{S}(t)$ is the life table estimator for S(t). Hint: use the delta method.

(d) Suppose that one of the reasons that plants die, is a virus which typically attacks the plant just before it starts flowering. Do you think that we can assume uninformative censoring? Explain your answer clearly.

4.

Twenty cancer patients are randomized to receive either treatment A or treatment B. Time of interest is time from surgery to relapse. The following times in months are observed for groups A and B, respectively:

A:   1   1   1   1    2   2   2   2    5    6
B:   1   1   2   3∗   4   4   5   6∗   6∗   6∗

where the asterisk indicates right censoring.

(a) The interest is comparing the survival between the two groups of patients.
(i) Write down the null hypothesis and the alternative hypothesis for comparing the survival between the two groups.
(ii) Describe how the normal version of the log-rank test statistic ($U_L = O_A - E_A$) is calculated. Explain any notation that you use.
(iii) State the formula of the variance of $U_L$ under $H_0$.

(b) (i) Calculate the value of the chi-square version of the log-rank test.
(ii) Given the value of the test statistic computed in part (b)(i), what is your conclusion with regard to the null hypothesis? Use a 5% significance level.
(iii) Provide an example where you would prefer the normal version of the log-rank test.

(c) We may also compare the two groups by assuming that the exponential distribution holds in both groups.
(i) Write down the null hypothesis for comparison of the survival distributions of the two groups under the assumption that within the groups the survival follows an exponential distribution. Explain the notation you use.
(ii) Write down the formula of the Wald test statistic for the null hypothesis in part (c) (i) and compute its value for these data.
(iii) Given the value of the test statistic computed in part (c) (ii), what is your conclusion with regard to the null hypothesis? Use a 5% significance level.

5.

The table below shows the time in months until the first absence from school due to influenza in a small group of toddlers at a pre-school,

boys:    6∗   11∗   15   25∗   25∗   28
girls:   4    9     9∗   11∗   20∗   24∗

where an asterisk means censoring due to leaving school for another reason.

(a) The following Cox proportional hazards model is used to model these data:

$$h(t, x) = h_0(t) \exp(\beta x),$$

where $h_0(t)$ is the baseline hazard rate and x = 0 for boys and x = 1 for girls.
(i) Show that the partial likelihood function L(β) is given by

$$L(\beta) = \frac{\exp(2\beta)}{C\{1 + \exp(\beta)\}^2\{2 + \exp(\beta)\}},$$

where C > 0 is a numerical constant, and identify C.
(ii) Calculate the maximum partial likelihood estimate $\hat{\beta}$ of the parameter β.
(iii) Calculate the approximate variance of the estimate $\hat{\beta}$ found in part (a) (ii).
(iv) Using parts (ii) and (iii), test the null hypothesis that boys and girls have the same distribution of time to absence due to influenza at pre-school. Use a 5% significance level.

(b) Write down a second test statistic (different from (a)(iv)) to test the null hypothesis $H_0: \beta = 0$ for this Cox proportional hazards model and describe how you calculate this test statistic.

(c) With regard to time to absence due to influenza at pre-school, researchers were able to find a larger dataset. In addition to gender, they also have information on social economic status of the parents (SES) (continuous variable, higher values reflect higher SES) and whether the child had an anti influenza vaccination (1 if vaccinated and 0 otherwise). To assess the significance of these covariates for the time to absence due to influenza, various Cox proportional hazards models were fitted and the resulting values of the maximised partial log-likelihood were calculated as follows

Model M_i   Variables in M_i    log L̂_i
M0          None                -29.767
M1          Vaccination         -26.783
M2          SES                 -27.691
M3          Sex                 -29.425
M4          Vaccination, SES    -24.832
M5          Vaccination, Sex    -26.259

(i) In your opinion, which model is the best? Motivate your answer!
(ii) Suppose for M4 the $\hat{\beta}_V$ corresponds to the covariate vaccination (1 if vaccinated and 0 otherwise). Explain how to interpret $\hat{\beta}_V$.
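For part (a) of this question, the risk sets behind the partial likelihood can be tabulated mechanically. The sketch below is not taken from the paper or its check sheet. It assumes the data are read as six boys (6∗, 11∗, 15, 25∗, 25∗, 28) and six girls (4, 9, 9∗, 11∗, 20∗, 24∗), and that a child censored at the same time as an event is still counted as at risk. The function names `risk_sets` and `log_partial_likelihood` are illustrative.

```python
import math

# Times in months; x = 0 for boys, x = 1 for girls; event=False marks censoring (*).
# The grouping of the values into boys and girls is an assumed reading of the data.
boys = [(6, False), (11, False), (15, True), (25, False), (25, False), (28, True)]
girls = [(4, True), (9, True), (9, False), (11, False), (20, False), (24, False)]
subjects = [(t, e, 0) for t, e in boys] + [(t, e, 1) for t, e in girls]


def risk_sets(data):
    """For each event, return (x of the failing child, boys at risk, girls at risk)."""
    out = []
    for t, event, x in sorted(data):
        if event:
            at_risk = [xi for ti, _, xi in data if ti >= t]
            out.append((x, at_risk.count(0), at_risk.count(1)))
    return out


def log_partial_likelihood(beta, sets):
    """Cox log partial likelihood for a single binary covariate."""
    return sum(x * beta - math.log(n0 + n1 * math.exp(beta)) for x, n0, n1 in sets)


sets = risk_sets(subjects)

# With these risk sets the score equation reduces to u**2 - u - 4 = 0, u = exp(beta).
beta_hat = math.log((1 + math.sqrt(17)) / 2)
u = math.exp(beta_hat)
# Observed information: minus the second derivative of the log partial likelihood.
information = 2 * u / (1 + u) ** 2 + 2 * u / (2 + u) ** 2
variance = 1 / information
wald = beta_hat / math.sqrt(variance)

print(sets)
print(f"beta_hat={beta_hat:.4f}  var={variance:.4f}  z={wald:.3f}")
```

The constant C can be read off by evaluating the likelihood at β = 0 and comparing with the stated form of L(β); the Wald statistic `wald` is then compared with the standard normal critical value 1.96.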

End of Questions.

Normal Distribution Function Tables

The first table below gives the values of the cumulative distribution function of the standard normal distribution N(0, 1),

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}u^2}\,du, \qquad x \in \mathbb{R},$$

shown as the shaded area in the figure. This is the probability that a random variable, normally distributed with zero mean and unit variance, is less than or equal to x. Values of Φ(x) are tabulated below for x ≥ 0 only; when x < 0 use Φ(x) = 1 − Φ(−x) (as the distribution N(0, 1) is symmetric).

[Figure: density of N(0, 1) for x from −3 to 3, with the area to the left of x shaded.]

For (linear) interpolation use the formula

$$\Phi(x) \approx \Phi(x_1) + \frac{x - x_1}{x_2 - x_1}\left\{\Phi(x_2) - \Phi(x_1)\right\} \qquad (x_1 < x < x_2)$$

Table 1

x     Φ(x)     x     Φ(x)     x     Φ(x)     x     Φ(x)     x     Φ(x)     x     Φ(x)
0.00  0.5000   0.50  0.6915   1.00  0.8413   1.50  0.9332   2.00  0.9772   2.50  0.9938
0.05  0.5199   0.55  0.7088   1.05  0.8531   1.55  0.9394   2.05  0.9798   2.55  0.9946
0.10  0.5398   0.60  0.7257   1.10  0.8643   1.60  0.9452   2.10  0.9821   2.60  0.9953
0.15  0.5596   0.65  0.7422   1.15  0.8749   1.65  0.9505   2.15  0.9842   2.65  0.9960
0.20  0.5793   0.70  0.7580   1.20  0.8849   1.70  0.9554   2.20  0.9861   2.70  0.9965
0.25  0.5987   0.75  0.7734   1.25  0.8944   1.75  0.9599   2.25  0.9878   2.75  0.9970
0.30  0.6179   0.80  0.7881   1.30  0.9032   1.80  0.9641   2.30  0.9893   2.80  0.9974
0.35  0.6368   0.85  0.8023   1.35  0.9115   1.85  0.9678   2.35  0.9906   2.85  0.9978
0.40  0.6554   0.90  0.8159   1.40  0.9192   1.90  0.9713   2.40  0.9918   2.90  0.9981
0.45  0.6736   0.95  0.8289   1.45  0.9265   1.95  0.9744   2.45  0.9929   2.95  0.9984
0.50  0.6915   1.00  0.8413   1.50  0.9332   2.00  0.9772   2.50  0.9938   3.00  0.9987
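The interpolation rule and the symmetry relation Φ(x) = 1 − Φ(−x) stated above can be sketched as follows. This is an illustration only: the dictionary holds two entries copied from Table 1, and the names `TABLE_1` and `interpolate_phi` are hypothetical.

```python
# Two (x, Phi(x)) pairs copied from Table 1.
TABLE_1 = {1.60: 0.9452, 1.65: 0.9505}


def interpolate_phi(x, x1, x2, table=TABLE_1):
    """Linear interpolation of Phi(x) between tabulated points x1 < x < x2."""
    phi1, phi2 = table[x1], table[x2]
    return phi1 + (x - x1) / (x2 - x1) * (phi2 - phi1)


phi_pos = interpolate_phi(1.63, 1.60, 1.65)
phi_neg = 1 - phi_pos  # Phi(-1.63) by symmetry of N(0, 1)
print(f"Phi(1.63) ~ {phi_pos:.4f}, Phi(-1.63) ~ {phi_neg:.4f}")
```

Here Φ(1.63) is obtained from the neighbouring entries at 1.60 and 1.65, which is the situation that arises for the Wald statistic of about 1.63 in question 1.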

The inverse function $\Phi^{-1}(\gamma)$ is tabulated below for various values of γ.

Table 2

γ          0.900    0.950    0.975    0.990    0.995    0.999    0.9995
Φ⁻¹(γ)     1.2816   1.6449   1.9600   2.3263   2.5758   3.0902   3.2905

Percentage Points of the χ²-Distribution

This table gives the percentage points $k_\nu(P)$ of the $\chi^2_\nu$-distribution with ν degrees of freedom for various values of P and ν, as indicated by the figure (plotted in the case ν = 6). That is, if a random variable X has a $\chi^2_\nu$-distribution then P/100 is the probability that $X > k_\nu(P)$. For large values of ν (ν > 80), percentage points $k_\nu(P)$ can be obtained using that $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2\nu - 1}$ and unit variance.

ν \ P   10       5        2.5      1        0.5      0.1      0.05
1       2.706    3.841    5.024    6.635    7.879    10.828   12.116
2       4.605    5.991    7.378    9.210    10.597   13.816   15.202
3       6.251    7.815    9.348    11.345   12.838   16.266   17.730
4       7.779    9.488    11.143   13.277   14.860   18.467   19.997
5       9.236    11.070   12.833   15.086   16.750   20.515   22.105

6       10.645   12.592   14.449   16.812   18.548   22.458   24.103
7       12.017   14.067   16.013   18.475   20.278   24.322   26.018
8       13.362   15.507   17.535   20.090   21.955   26.124   27.868
9       14.684   16.919   19.023   21.666   23.589   27.877   29.666
10      15.987   18.307   20.483   23.209   25.188   29.588   31.420

11      17.275   19.675   21.920   24.725   26.757   31.264   33.137
12      18.549   21.026   23.337   26.217   28.300   32.909   34.821
13      19.812   22.362   24.736   27.688   29.819   34.528   36.478
14      21.064   23.685   26.119   29.141   31.319   36.123   38.109
15      22.307   24.996   27.488   30.578   32.801   37.697   39.719

16      23.542   26.296   28.845   32.000   34.267   39.252   41.308
17      24.769   27.587   30.191   33.409   35.718   40.790   42.879
18      25.989   28.869   31.526   34.805   37.156   42.312   44.434
19      27.204   30.144   32.852   36.191   38.582   43.820   45.973
20      28.412   31.410   34.170   37.566   39.997   45.315   47.498

25      34.382   37.652   40.646   44.314   46.928   52.620   54.947
30      40.256   43.773   46.979   50.892   53.672   59.703   62.162
40      51.805   55.758   59.342   63.691   66.766   73.402   76.095
50      63.167   67.505   71.420   76.154   79.490   86.661   89.561
80      96.578   101.879  106.629  112.329  116.321  124.839  128.261

End of Paper.

IMPORTANT NOTE The attached check-sheet contains the final answers to some, not necessarily all questions on the exam. Answers to questions requiring longer answers, for example proofs, are not given.

Please note. In the exam, students are expected to show their full work on the exam script, not just final answers.

Advice. Use this check-sheet to check your answers AFTER you have worked through the exam as if you were in an exam situation, i.e. without access to notes, books, answers to exercises, etc. This way you will test whether you can tackle the problems without any help as in the exam.

MATH2775 Survival Analysis (Session 2016–2017)
Check Sheet Exam Paper, May/June 2017

1.

(a) Left censoring: some of the baboons are already in the tree, so you miss the time of origin. Right censoring: some of the baboons are still in the tree when you leave, so you miss the end time. If interval censoring is mentioned, an appropriate explanation should be given. [4]

(b) (i) (Bookwork) [2]

$$h(t) = \lim_{\delta t \to 0} \frac{P(T \le t + \delta t \mid T > t)}{\delta t}$$

(ii) (Bookwork) $S(t) = \exp\left(-\int_0^t h(s)\,ds\right)$.

(c) (i) The Kaplan-Meier estimate of the survival curve is given by

time   n.risk   n.event   n.cens   survival
0      8        0         0        1
50     8        1         0        0.875
51     7        1         1        0.750
82     5        1         0        0.600
92     4        1         3        0.450

[4]

(ii) At t = 60, $\hat{S}(60) = \hat{S}(51) = 0.75$ and the variance is $\widehat{\mathrm{Var}}\{\hat{S}(t)\} \approx 0.023$. The standard error is the square root of the variance and equals 0.153.

[2]

[4]

(iii) [2]

$$t = \frac{\hat{S}(60) - 0.5}{\mathrm{s.e.}(\hat{S}(60))}$$

[2]

(iv) The null hypothesis is $H_0: S(60) = 0.5$.

$$t = \frac{0.750 - 0.5}{0.153} = 1.63$$

Since 1.63 < 1.96 and 1.63 > −1.96, we are not able to reject the null hypothesis of S(60) = 0.5. Here, 1.96 is the critical value of the normal distribution. If 2 is used as the critical value, it is fine as well.
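The Kaplan-Meier table, the Greenwood standard error at t = 60 and the Wald statistic above can be reproduced with a short script. This is a minimal sketch in plain Python rather than the method used to prepare the check sheet; the function names `kaplan_meier` and `estimate_at` are illustrative.

```python
import math

# Times in minutes from question 1(c); event=True means the task was completed,
# event=False means the observation is censored (marked with * in the paper).
data = [(50, True), (51, True), (66, False), (82, True), (92, True),
        (120, False), (120, False), (120, False)]


def kaplan_meier(observations):
    """Return rows (time, n_risk, n_event, survival, greenwood_sum) per event time."""
    rows = []
    survival = 1.0
    greenwood_sum = 0.0  # running sum of d / (n * (n - d))
    for t in sorted({time for time, event in observations if event}):
        n_risk = sum(1 for time, _ in observations if time >= t)
        n_event = sum(1 for time, event in observations if time == t and event)
        survival *= (n_risk - n_event) / n_risk
        if n_risk > n_event:
            greenwood_sum += n_event / (n_risk * (n_risk - n_event))
        rows.append((t, n_risk, n_event, survival, greenwood_sum))
    return rows


def estimate_at(rows, t):
    """Return (S_hat(t), Greenwood standard error) from the last event time <= t."""
    survival, greenwood_sum = 1.0, 0.0
    for time, _, _, s, g in rows:
        if time <= t:
            survival, greenwood_sum = s, g
    return survival, survival * math.sqrt(greenwood_sum)


km_rows = kaplan_meier(data)
s60, se60 = estimate_at(km_rows, 60)
wald = (s60 - 0.5) / se60  # Wald-type statistic for H0: S(60) = 0.5

for t, n, d, s, _ in km_rows:
    print(f"t={t:3d}  n.risk={n}  n.event={d}  S={s:.3f}")
print(f"S(60)={s60:.3f}  se={se60:.3f}  z={wald:.2f}")
```

The survival column and the values at t = 60 agree with the check-sheet figures (0.750, 0.153 and 1.63).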

2.
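For part (b) of this question, one way to check the two survival probabilities numerically is sketched below. It rests on assumptions that the paper does not state explicitly: the hazard is taken as constant within each one-year cell, and a person born in 1891 is taken to pass through the age interval a to a + 1 during calendar year 1891 + a, so the relevant cells lie on a diagonal of the hazard table in the question. The names `hazard` and `cohort_survival` are illustrative.

```python
import math

# Hazard rates from the table in question 2(b), keyed by (calendar year, start of age interval).
# Only the diagonal cells for the 1891 birth cohort are needed here.
hazard = {
    (1891, 0): 0.10061,
    (1892, 1): 0.04718,
    (1893, 2): 0.01960,
    (1894, 3): 0.01443,
    (1895, 4): 0.00971,
}


def cohort_survival(birth_year, from_age, to_age):
    """P(T > to_age | T > from_age), assuming a constant hazard within each one-year cell."""
    cumulative = sum(hazard[(birth_year + age, age)] for age in range(from_age, to_age))
    return math.exp(-cumulative)


s5 = cohort_survival(1891, 0, 5)
s5_given_2 = cohort_survival(1891, 2, 5)
print(f"S(5) = {s5:.2f}, S(5 | T > 2) = {s5_given_2:.2f}")
```

Under these assumptions the two values round to 0.83 and 0.96.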

(a) (i) (Bookwork) For a random variable T ≥ 0, [2]

$$E(T_x) = \int_0^\infty S_x(t)\,dt = \int_0^\infty \frac{S(x + t)}{S(x)}\,dt$$

(ii) (Bookwork) ${}_t q_x$ is the distribution of T given T > x, which is $P(T \le t + x \mid T > x)$; ${}_t p_x$ is $1 - {}_t q_x = S_x(t) = P(T > x + t \mid T > x)$. [2]

(iii) (Bookwork) [3]

(b) (i) P(T > 5) = 0.83 [3]
(ii) P(T > 5 | T > 2) = 0.96 [2]

(c) (i) A parametric model is more efficient since only a few parameters need to be estimated. You may know that a specific parametric model holds from previous research and therefore this parametric model is the obvious choice. It might be computationally attractive. [4]
(ii) (Bookwork) Plot $\hat{H}(t)$ (or $-\log(\hat{S}(t))$) versus t; the relationship should be linear and the line should go through the origin. [2]
(iii) The log-likelihood function is $\ell(\lambda) = 4 \log(\lambda) - 701\lambda$, since 4 of the 8 individuals completed the task and the observed times sum to 701 minutes. [4]

3.

For the normal conditions:

time   n_i   d_i   c_i   n_i'   (n_i' − d_i)/n_i'   S(t_i)
0-1    50    5     0     50     0.9                 0.9
1-2    45    10    0     45     0.78                0.70
2-3    35    12    0     35     0.66                0.46
3-4    23    23    0     23     0                   0

For the drought conditions:

time   n_i   d_i   c_i   n_i'   (n_i' − d_i)/n_i'   S(t_i)
0-1    50    0     0     50     1                   1
1-2    50    0     0     50     1                   1
2-3    50    5     5     47.5   0.89                0.89
3-4    40    15    10    35     0.57                0.51

(a) (Bookwork) The event times are given per interval and the exact event times are not known. [3]

(b) (i) $\hat{S}(t)$ at t = 2.5 weeks is $\hat{S}(2)$ and is 0.89 for the drought group and 0.46 for the normal group. The variances and standard errors are Var = 0.002 with s.e. 0.04 for the drought group and Var = 0.005 with s.e. 0.07 for the normal group. [4]

(ii) (Bookwork) [2]

$$\frac{\hat{S}_{\text{normal}} - \hat{S}_{\text{drought}}}{\sqrt{\mathrm{Var}(\hat{S}_{\text{normal}}) + \mathrm{Var}(\hat{S}_{\text{drought}})}}$$

(iii) The symmetric 95% confidence interval for S(2.5) in the group exposed to drought is (0.81, 0.97). [2]

(iv) (Bookwork) When the confidence interval comprises values larger than 1, one may construct a confidence interval for a transformation of $\hat{S}(t)$. The transformed $\hat{S}(t)$ should be able to take all values from −∞ to ∞. A confidence interval for the transformed $\hat{S}(t)$ should then be back-transformed to obtain a confidence interval for S(t). [3]

(c) [3]

$$\mathrm{Var}\left(\frac{\hat{S}(t)}{1 + \hat{S}(t)^2}\right) = \left[\frac{\hat{S}(t)\{1 - \hat{S}(t)^2\}}{\{1 + \hat{S}(t)^2\}^2}\right]^2 \sum_{i=0}^{k} \frac{d_i}{n_i'(n_i' - d_i)}$$

for $t_k \le t < t_{k+1}$.
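The life-table quantities used in part (b) can be recomputed from the interval counts. The sketch below assumes the usual actuarial adjustment n' = n − c/2 for censoring within an interval and Greenwood's formula for the variance. The function name `life_table` and its return layout are illustrative.

```python
import math


def life_table(n_start, events, censored):
    """Actuarial life-table estimate per interval.

    Returns rows (n_adjusted, interval_survival, cumulative_survival, greenwood_se),
    where n_adjusted = n - c / 2 is the adjusted number at risk.
    """
    rows = []
    n = n_start
    survival = 1.0
    greenwood_sum = 0.0  # running sum of d / (n' * (n' - d))
    for d, c in zip(events, censored):
        n_adj = n - c / 2
        p = (n_adj - d) / n_adj
        survival *= p
        if n_adj > d:
            greenwood_sum += d / (n_adj * (n_adj - d))
        rows.append((n_adj, p, survival, survival * math.sqrt(greenwood_sum)))
        n -= d + c
    return rows


# Counts per weekly interval 0-1, 1-2, 2-3, 3-4 from question 3.
normal = life_table(50, events=[5, 10, 12, 23], censored=[0, 0, 0, 0])
drought = life_table(50, events=[0, 0, 5, 15], censored=[0, 0, 5, 10])

# The row for the interval 2-3 is the one used for t = 2.5 weeks.
_, _, s_n, se_n = normal[2]
_, _, s_d, se_d = drought[2]
z = (s_n - s_d) / math.sqrt(se_n**2 + se_d**2)
ci_drought = (s_d - 1.96 * se_d, s_d + 1.96 * se_d)

print(f"normal:  S={s_n:.2f}  se={se_n:.3f}")
print(f"drought: S={s_d:.2f}  se={se_d:.3f}")
print(f"z={z:.2f}  95% CI drought=({ci_drought[0]:.2f}, {ci_drought[1]:.2f})")
```

Values may differ from the check sheet in the second decimal place, because the check sheet works with rounded intermediate quantities (for example a standard error of 0.04 when forming the confidence interval).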

[3]

(d) When plants die because a virus typically attacks the plant just before it starts flowering, the censoring is informative, because it is likely that the plant would have flowered in the next time interval.

4.

(a) (i) (Bookwork) $H_0: S_A(t) = S_B(t)$ for all t > 0 and $H_a: S_A(t) \ne S_B(t)$. [2]

(ii) (Bookwork) $O_A$ is the total number of events in group A. $E_A = \sum_i e_{iA}$, where i runs over the event times $t_i$ of both groups and $e_{iA} = n_{Ai}\,\frac{d_i}{n_i}$, with $d_i$ the total number of events at $t_i$ in groups A and B and $n_i$ the total number of observations in groups A and B at time $t_i$. [3]

(iii) (Bookwork) [2]

$$\sum_i \frac{n_{1i}\, n_{2i}\, d_i\, (n_i - d_i)}{n_i^2\, (n_i - 1)}$$

(b) (i) The table to compute the log-rank statistic is as follows [4]

time    nA   dA   cA   nB   dB   cB   d/n    eiA
1       10   4    0    10   2    0    0.3    3
2       6    4    0    8    1    1    0.36   2.14
4       2    0    0    6    2    0    0.25   0.5
5       2    1    0    4    1    0    0.33   0.66
6       1    1    0    3    0    0    0.25   ...
total        10             6
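The columns of this table follow directly from the data in question 4. The sketch below recomputes $O_A$, $E_A$ and the variance of $U_L$ using the formulas in (a)(ii) and (a)(iii). It assumes the data are read as A: 1, 1, 1, 1, 2, 2, 2, 2, 5, 6 and B: 1, 1, 2, 3∗, 4, 4, 5, 6∗, 6∗, 6∗, which matches the at-risk and event counts in the table. The function name `log_rank` is illustrative.

```python
# Relapse times in months; event=False marks right censoring (*).
# The split of the values between groups A and B is an assumed reading of the data.
group_a = [(1, True)] * 4 + [(2, True)] * 4 + [(5, True), (6, True)]
group_b = [(1, True), (1, True), (2, True), (3, False), (4, True), (4, True),
           (5, True), (6, False), (6, False), (6, False)]


def log_rank(a, b):
    """Return (O_A, E_A, variance of U_L) for the two-sample log-rank test."""
    event_times = sorted({t for t, e in a + b if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for time, _ in a if time >= t)  # at risk in group A
        n_b = sum(1 for time, _ in b if time >= t)  # at risk in group B
        d_a = sum(1 for time, e in a if time == t and e)
        d_b = sum(1 for time, e in b if time == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += n_a * d / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n**2 * (n - 1))
    return observed_a, expected_a, variance


o_a, e_a, var_u = log_rank(group_a, group_b)
chi_square = (o_a - e_a) ** 2 / var_u  # chi-square version, 1 degree of freedom
print(f"O_A={o_a:.0f}  E_A={e_a:.3f}  Var(U_L)={var_u:.3f}  chi2={chi_square:.2f}")
```

The resulting statistic can be compared with the 5% point 3.841 of the χ² distribution with 1 degree of freedom from the table in the paper.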

