Exam June 2015, questions PDF

Title Exam June 2015, questions
Course Survival Analysis
Institution University of Leeds
Pages 16


MATH277501

This question paper consists of 9 printed pages, each of which is identified by the reference MATH277501.

All calculators must carry an approval sticker issued by the School of Mathematics. Statistical tables are attached. Graph paper is provided.

© University of Leeds
School of Mathematics
Examination for the Module MATH2775
SURVIVAL ANALYSIS
May/June 2015
Time allowed: 2 hours

Answer no more than 4 questions. If you attempt 5, only the best 4 will be counted. All questions carry equal marks.


Turn Over

1.

(a) (i) Explain what is meant by censored observations in survival data. Describe interval censoring and illustrate it with a practical example.
(ii) In a study of age at onset of diabetic retinopathy, diagnosis of the condition was recorded in a sample of diabetic patients attending routine clinic appointments. Explain why the age at diagnosis and the age at onset of the condition might differ, and identify the type of censoring involved.

(b) In a certain country, the probability that a newborn baby survives the first year is estimated as 0.993.
(i) Find the same probability for a country where the hazard rate for infants is three times higher.
(ii) Find the same probability for a country where the hazard rate for infants is lower by 0.001 year⁻¹.

(c) A group of individuals with a sexually transmitted disease (STD), all initially HIV-free, was monitored for 12 years until HIV infection was diagnosed. The numbers of those either diagnosed with HIV or censored are presented in the table below, where 'censored' refers to individuals who withdrew from the study prematurely for unrelated reasons, or who did not become HIV-infected during the course of the study.

Time interval (years)   HIV-infected   Censored
0–2                     4              4
2–4                     5              7
4–6                     6              7
6–8                     5              10
8–10                    4              8
10–12                   6              14

(i) How many individuals were involved in the study?
(ii) Treating times to HIV infection as 'survival' times, argue briefly whether it is appropriate to estimate the unknown survival function S(t) using the actuarial (life-table) estimator $\tilde S(t)$.
(iii) Calculate the actuarial estimate $\tilde S(t)$, explaining any notation that you use, and sketch its plot using the graph paper provided.
(iv) Using part (iii), estimate the 10-year survival rate from these data.
(v) Using a Greenwood-type formula, calculate the variance of $\tilde S(5)$ and construct an approximate symmetric 95% confidence interval for S(5).
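Part (b) turns on the relation $S(1) = \exp(-H(1))$ with $H(1) = \int_0^1 h(u)\,du$. Multiplying the hazard by 3 cubes S(1), and lowering it by a constant 0.001 year⁻¹ multiplies S(1) by $e^{0.001}$. Below is a minimal Python check of that arithmetic. It assumes the hazard changes by the stated factor or amount throughout the first year.

```python
import math

# Reference country: S0(1) = exp(-H0(1)) = 0.993, with H0(1) the integral of h0 over [0, 1].
S0 = 0.993
H0 = -math.log(S0)

# (i) Hazard three times higher: H1(1) = 3 * H0(1), so S1(1) = S0(1) ** 3.
S1 = math.exp(-3 * H0)

# (ii) Hazard lower by 0.001 per year over one year: H2(1) = H0(1) - 0.001.
S2 = math.exp(-(H0 - 0.001))

print(f"S1(1) = {S1:.3f}, S2(1) = {S2:.3f}")
```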

2.

(a) (i) Define the actuarial symbols ${}_tq_x$ and ${}_tp_x$, and show that
$${}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0} \qquad (x \ge 0,\ t \ge 0).$$
(ii) Demonstrate that for any fixed $x \ge 0$ the function $y(t) := {}_tp_x$ ($t \ge 0$) satisfies the differential equation $y'(t) = -\mu_{x+t}\,y(t)$, where $\mu_{x+t}$ is the force of mortality at time $x + t$.
(iii) By solving the differential equation of part (ii) subject to a suitable initial condition, obtain the formula
$${}_tp_x = \exp\left(-\int_0^t \mu_{x+s}\,ds\right), \qquad t \ge 0.$$
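One way to carry out the integration in part (iii) is shown below. It assumes $y(t) > 0$, so that the logarithm is defined, and uses the initial condition $y(0) = {}_0p_x = 1$.

```latex
% y(t) = {}_tp_x, with initial condition y(0) = {}_0p_x = 1
\frac{d}{dt}\log y(t) = \frac{y'(t)}{y(t)} = -\mu_{x+t}
\;\Longrightarrow\;
\log y(t) - \log y(0) = -\int_0^t \mu_{x+s}\,ds
\;\Longrightarrow\;
{}_tp_x = \exp\!\left(-\int_0^t \mu_{x+s}\,ds\right), \qquad t \ge 0.
```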

(b) Below is an extract from the Life Table for the Total Population (USA, 2002):

Age x (years)   Number surviving to age x (out of 100,000 births)
80              52,178
90              20,052

(i) Using part (a)(i) or otherwise, verify the equality
$${}_{10}p_{80} = {}_6p_{80}\cdot{}_4p_{86}$$
and explain its intuitive meaning.
(ii) Assuming that the death probability between ages 80 and 90 satisfies the linear interpolation formula ${}_tq_{80} = (t/10)\,{}_{10}q_{80}$ ($0 \le t \le 10$), estimate from the data above the probability that a person aged exactly 86 years will die within the next four years.

(c) A clinician studying survival times of patients with Dukes' C colorectal cancer wishes to use a survival model with the hazard rate $h(t) = \alpha t/(t + 1)$, where $\alpha > 0$ is an unknown parameter. This model is fitted to data comprising n observed survival times $t_1, \dots, t_n$, some of which may be right-censored.
(i) Obtain the survival function S(t) and the probability density f(t) in this model.
(ii) Suggest a suitable coordinate transformation $y = F(S(t))$, $x = G(t)$ that can be used for a graphical assessment of the suitability of this model for a given survival data set. How could one obtain a crude graphical estimate of α?
(iii) Write down the likelihood function L(α), defining any notation that you use, and derive the maximum likelihood estimator $\hat\alpha$ of the parameter α, explaining clearly why this is a maximum.
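The Python sketch below covers the arithmetic behind part (b)(ii) and the estimator in part (c)(iii). For (b) it assumes the interpolation stated in (b)(ii). For (c) it uses $H(t) = \alpha(t - \log(1 + t))$. Under this model, $-\log S(t)$ plotted against $t - \log(1 + t)$ should be a line through the origin with slope α. The MLE is $\hat\alpha = d / \sum_i (t_i - \log(1 + t_i))$, where d is the number of uncensored times. The data passed to `alpha_mle` are hypothetical and serve only to exercise the function.

```python
import math

# (b)(ii): life-table extract (survivors out of 100,000 births).
l80, l90 = 52_178, 20_052
p_10_80 = l90 / l80           # 10p80 = l90 / l80
q_10_80 = 1 - p_10_80         # 10q80

# Interpolation assumption: tq80 = (t/10) * 10q80 for 0 <= t <= 10.
p_6_80 = 1 - 0.6 * q_10_80
# 10p80 = 6p80 * 4p86, hence 4p86 = 10p80 / 6p80.
p_4_86 = p_10_80 / p_6_80
q_4_86 = 1 - p_4_86
print(f"4q86 = {q_4_86:.4f}")

# (c): h(t) = alpha t / (t + 1) gives H(t) = alpha * (t - log(1 + t)).
def alpha_mle(times, deltas):
    """MLE of alpha: number of deaths / sum of (t_i - log(1 + t_i)) over all t_i."""
    d = sum(deltas)
    g = sum(t - math.log1p(t) for t in times)
    return d / g

# Hypothetical illustrative data (not from the exam); delta = 0 marks censoring.
times = [0.5, 1.2, 2.0, 3.1, 4.5]
deltas = [1, 1, 0, 1, 0]
a_hat = alpha_mle(times, deltas)
print(f"alpha_hat = {a_hat:.4f}")
```

The second derivative of the log-likelihood is $-d/\alpha^2 < 0$, which is why the stationary point is a maximum.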

3.

(a) (i) Write down the general formula for the Nelson–Aalen estimator $\hat H_{NA}(t)$ of the unknown cumulative hazard function H(t), and use it to suggest an estimator $\hat S_{NA}(t)$ of the survival function S(t).
(ii) Using the formula for the variance $\mathrm{Var}\{\hat H_{NA}(t)\}$ and applying the delta method as appropriate, derive an approximate expression for the variance of $\hat S_{NA}(t)$.
(iii) Let $t_{(1)}$ be the first (uncensored) death time in a sample. Compare the expression for $\mathrm{Var}\{\hat S_{NA}(t_{(1)})\}$ obtained in part (ii) with the corresponding value provided by Greenwood's formula for the variance of the Kaplan–Meier (product-limit) estimate $\hat S_{KM}(t_{(1)})$, and prove a suitable inequality. When are these two values close to each other?
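The sketch below compares the Nelson–Aalen route of part (a) with Kaplan–Meier and Greenwood's formula, applied to the Group 1 data listed in part (b) below. It assumes $\mathrm{Var}\{\hat H_{NA}(t)\} = \sum_j d_j/n_j^2$; texts differ on this choice. It then applies the delta method, $\mathrm{Var}\{\hat S_{NA}\} \approx \hat S_{NA}^2\,\mathrm{Var}\{\hat H_{NA}\}$.

```python
import math

def risk_table(times, deltas):
    """List of (t_j, d_j, n_j) over the distinct uncensored times t_j.

    n_j counts subjects with t_i >= t_j, so an observation censored at t_j
    is treated as still at risk at t_j (the usual convention).
    """
    out = []
    for tj in sorted({t for t, d in zip(times, deltas) if d == 1}):
        dj = sum(1 for t, d in zip(times, deltas) if t == tj and d == 1)
        nj = sum(1 for t in times if t >= tj)
        out.append((tj, dj, nj))
    return out

def nelson_aalen(times, deltas):
    """(t_j, S_NA, Var S_NA) with S_NA = exp(-H_NA) and H_NA = sum d_j / n_j.

    Assumes Var{H_NA} = sum d_j / n_j**2 (some texts use
    sum d_j (n_j - d_j) / n_j**3). Delta method: Var{S} ~ S**2 * Var{H}.
    """
    H = var_H = 0.0
    out = []
    for tj, dj, nj in risk_table(times, deltas):
        H += dj / nj
        var_H += dj / nj**2
        S = math.exp(-H)
        out.append((tj, S, S**2 * var_H))
    return out

def kaplan_meier(times, deltas):
    """(t_j, S_KM, Greenwood variance); the variance is undefined if some d_j == n_j."""
    S, g = 1.0, 0.0
    out = []
    for tj, dj, nj in risk_table(times, deltas):
        S *= (nj - dj) / nj
        g += dj / (nj * (nj - dj))
        out.append((tj, S, S**2 * g))
    return out

# Group 1 (AG-positive) data from part (b); delta = 0 marks censoring.
t1 = [5, 7, 8, 8, 8, 10, 10, 12, 13, 14, 14, 14, 15]
d1 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
na, km = nelson_aalen(t1, d1), kaplan_meier(t1, d1)
for (t, s_na, v_na), (_, s_km, v_km) in zip(na, km):
    print(f"t={t:>2}  S_NA={s_na:.3f} ({v_na:.4f})  S_KM={s_km:.3f} ({v_km:.4f})")
```

The Kaplan–Meier column reproduces the tabulated $\hat S_1(t)$. Because $e^{-x} \ge 1 - x$, the Nelson–Aalen-based estimate never falls below the product-limit one.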

(b) In a clinical study of acute myeloma leukemia, patients were classified into two groups according to the presence (Group 1) or absence (Group 2) of a certain morphologic characteristic of white cells termed AG. The following death times $t_i$ (in months) were recorded in each group, where $\delta_i = 0$ if $t_i$ is censored and $\delta_i = 1$ otherwise:

Group 1 (AG-positive)
tᵢ    5    7    8    8    8   10   10   12   13   14   14   14   15
δᵢ    1    1    1    0    0    0    0    1    1    0    0    0    0

Group 2 (AG-negative)
tᵢ    3    4    5    6    7    8   10   11   12   14   15
δᵢ    1    1    1    1    1    1    0    1    1    0    0

Values of the Kaplan–Meier estimate $\hat S(t)$ for the two groups are given below:

Group 1 (AG-positive)
t        5      7      8      12     13
Ŝ₁(t)    0.923  0.846  0.769  0.641  0.513

Group 2 (AG-negative)
t        3      4      5      6      7      8      11     12
Ŝ₂(t)    0.909  0.818  0.727  0.636  0.545  0.455  0.341  0.227
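Part (i) relies on the fact that a Weibull model $S(t) = \exp(-\lambda t^\gamma)$ gives $\log(-\log S(t)) = \log\lambda + \gamma\log t$. This is a straight line in the coordinates $(\log t, \log(-\log S))$. The sketch below computes these points from the tabulated Kaplan–Meier values. The exam expects a line fitted by eye; as a stand-in, the sketch takes the least-squares intercept with the slope fixed at γ = 2, so the resulting λ values are only crude estimates.

```python
import math

# Kaplan-Meier values from the tables above: (death times, S_hat).
km = {
    1: ([5, 7, 8, 12, 13], [0.923, 0.846, 0.769, 0.641, 0.513]),
    2: ([3, 4, 5, 6, 7, 8, 11, 12],
        [0.909, 0.818, 0.727, 0.636, 0.545, 0.455, 0.341, 0.227]),
}

def weibull_points(ts, ss):
    """(log t, log(-log S)) pairs; under a Weibull model they lie near a line
    with slope gamma and intercept log(lambda)."""
    return [(math.log(t), math.log(-math.log(s))) for t, s in zip(ts, ss)]

def crude_lambda(points, gamma=2.0):
    """Least-squares intercept with the slope fixed at gamma, mapped back to lambda."""
    intercept = sum(y - gamma * x for x, y in points) / len(points)
    return math.exp(intercept)

lam = {g: crude_lambda(weibull_points(ts, ss)) for g, (ts, ss) in km.items()}
for g, value in lam.items():
    print(f"Group {g}: crude lambda = {value:.4f}")
```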

(i) Apply a suitable graphical method to assess the validity of Weibull's model for Groups 1 and 2 with common shape parameter γ = 2 and scale parameters $\lambda_1$ and $\lambda_2$, respectively. Using fitted lines, obtain estimates for $\lambda_1$ and $\lambda_2$.
(ii) Using the log-likelihood function for a general Weibull model with γ = 2,
$$\ell(\lambda) = \log(2\lambda)\sum_i \delta_i + \sum_i \delta_i \log t_i - \lambda \sum_i t_i^2,$$
calculate the maximum likelihood estimates $\hat\lambda_1$ and $\hat\lambda_2$ of the parameters $\lambda_1$ and $\lambda_2$, respectively, and estimate their variances.
(iii) Apply a statistical test for differences between the two groups (at the 5% significance level) based on the observed value $\hat\lambda_2 - \hat\lambda_1$.
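For part (ii), setting $\ell'(\lambda) = d/\lambda - \sum_i t_i^2 = 0$ (with $d = \sum_i \delta_i$) gives $\hat\lambda = d/\sum_i t_i^2$. The observed information $d/\hat\lambda^2$ then gives $\mathrm{Var}(\hat\lambda) \approx \hat\lambda^2/d$. Below is a minimal sketch of these formulas and of a Wald-type z-test for part (iii), assuming the two groups are independent.

```python
import math

def weibull2_mle(times, deltas):
    """MLE of lambda for S(t) = exp(-lambda t^2), and its approximate variance
    lambda_hat^2 / d from the observed information d / lambda^2."""
    d = sum(deltas)
    lam = d / sum(t * t for t in times)
    return lam, lam**2 / d

t1 = [5, 7, 8, 8, 8, 10, 10, 12, 13, 14, 14, 14, 15]   # Group 1
d1 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
t2 = [3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15]           # Group 2
d2 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0]

lam1, v1 = weibull2_mle(t1, d1)
lam2, v2 = weibull2_mle(t2, d2)
# Wald-type statistic for H0: lambda1 = lambda2 (independent groups).
z = (lam2 - lam1) / math.sqrt(v1 + v2)
print(f"lambda1 = {lam1:.5f}, lambda2 = {lam2:.5f}, z = {z:.3f}")
# Two-sided 5% test: compare |z| with 1.96 from Table 2.
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```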

4.

(a) To study factors affecting recidivism, 432 prisoners were monitored during one year following their release from Maryland state prisons. Survival time recorded for each former prisoner was the number of weeks to first arrest after release; those not arrested during this period were censored at 52 weeks. Cox's proportional hazards regression model was fitted to the data using the following covariates and the corresponding regression coefficients:

• Age (β₁): age (in years since 18th birthday) at the time of release from prison;
• Fin (β₂): value 1 if received financial aid after release, 0 otherwise;
• Rac (β₃): value 1 if black (African American), 0 otherwise;
• Wrk (β₄): value 1 if ever had full-time work experience, 0 otherwise;
• Mar (β₅): value 1 if was married at the time of release, 0 otherwise;
• Pri (β₆): the number of prior convictions.

The results of estimation are shown below:

j   Covariate   β̂ⱼ       sd(β̂ⱼ)   zⱼ       pⱼ
1   Age        −0.057    0.022    −2.591   0.010
2   Fin        −0.379    0.191    −1.984   0.047
3   Rac         0.314    0.308     1.019   0.308
4   Wrk        −0.150    0.212    −0.708   0.479
5   Mar        −0.434    0.382    −1.136   0.256
6   Pri         0.091    0.029     3.138   0.002

where $\hat\beta_j$ is the maximum partial-likelihood estimate of $\beta_j$, $\mathrm{sd}(\hat\beta_j)$ is its standard deviation, $z_j = \hat\beta_j/\mathrm{sd}(\hat\beta_j)$, and $p_j$ is the corresponding p-value.
(i) Define a suitable Cox regression model and identify the characteristics of former prisoners to whom the baseline hazard applies. How could this model be extended to elaborate the analysis?

(ii) Construct a symmetric 95% confidence interval for the regression parameter $\beta_1$.
(iii) How do you interpret the minus sign of the estimate $\hat\beta_1$? Furthermore, observing that the estimated value $\hat\beta_1$ is quite small, would you suggest that the factor Age is not significant? Argue using the confidence interval from part (ii).
(iv) Without constructing confidence intervals for the other parameters, but using the corresponding p-values in the table above, argue briefly which covariates in the model are significant and which are not.
(v) Using the fitted model, calculate and comment on the hazard ratio between:
– a black prisoner aged 40, married, with no record of full-time employment, having one previous conviction, released with financial aid, and
– a white prisoner aged 38, divorced, having 10 years of full-time employment, with three previous convictions, released without financial aid.
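The sketch below works through the arithmetic of parts (ii) and (v) using the tabulated estimates. The coding of the two prisoners is an assumption. Age is entered as years since the 18th birthday, though only the 2-year difference matters. "Divorced" is taken as Mar = 0, and "10 years of full-time employment" as Wrk = 1. The baseline hazard cancels, so the hazard ratio is $\exp\{\hat\beta^{\top}(x_1 - x_2)\}$.

```python
import math

# Estimates from the table above.
beta = {"Age": -0.057, "Fin": -0.379, "Rac": 0.314,
        "Wrk": -0.150, "Mar": -0.434, "Pri": 0.091}
sd_age = 0.022

# (ii) Symmetric 95% interval for beta_1: estimate +/- 1.96 * sd (Table 2).
lo, hi = beta["Age"] - 1.96 * sd_age, beta["Age"] + 1.96 * sd_age
print(f"95% CI for beta_1: ({lo:.4f}, {hi:.4f})")

# (v) Assumed coding of the two prisoners (Age = years since 18th birthday).
x1 = {"Age": 22, "Fin": 1, "Rac": 1, "Wrk": 0, "Mar": 1, "Pri": 1}
x2 = {"Age": 20, "Fin": 0, "Rac": 0, "Wrk": 1, "Mar": 0, "Pri": 3}

# The baseline hazard cancels, so HR = exp(sum_k beta_k * (x1_k - x2_k)).
lin = sum(beta[k] * (x1[k] - x2[k]) for k in beta)
hr = math.exp(lin)
print(f"hazard ratio (first vs second prisoner) = {hr:.3f}")
```

The interval excludes 0, which bears on part (iii) even though $\hat\beta_1$ itself is small in magnitude.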


(b) A study was conducted to compare possible detrimental health effects of two different patterns of exercising: irregular/minor (A), and extensive (B), defined as three hours or more at least five times a week. Subjects aged 50 years were allocated to Group A or B according to their average pattern of past exercising between ages 20 and 40 years, and then monitored until death from heart disease. A small excerpt from the data showing age at death is given below; right-censored observations (such as deaths from an unrelated cause or early withdrawals) are marked with an asterisk ∗.

Group A (irregular/minor):   58    64    66    66∗   70∗   70∗
Group B (extensive):         62∗   66    67    69    70    70∗
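As a numerical companion to parts (iii) and (iv) below, here is a minimal sketch of the Gehan–Breslow computation for these data. It assumes weights $w_j = n_j$ (the total number at risk at the j-th distinct death time) and the usual hypergeometric variance. It also assumes that a subject censored at a death time is still at risk. The function returns $U = \sum_j w_j(d_{Aj} - e_{Aj})$, its variance V, and $U^2/V$, which is compared with the $\chi^2_1$ percentage point. The two-sided normal p-value uses `math.erfc`.

```python
import math

def weighted_logrank(tA, dA, tB, dB, weight=lambda n: n):
    """Weighted log-rank test of group A vs group B.

    weight(n_j) = n_j (number at risk) gives the Gehan-Breslow test;
    weight = lambda n: 1 gives the ordinary log-rank test.
    Returns U = sum w_j (d_Aj - e_Aj), V = sum w_j^2 v_j and U^2 / V.
    """
    data = [(t, d, 0) for t, d in zip(tA, dA)] + [(t, d, 1) for t, d in zip(tB, dB)]
    U = V = 0.0
    for tj in sorted({t for t, d, _ in data if d == 1}):
        # Subjects censored at t_j are still counted as at risk at t_j.
        nA = sum(1 for t, _, g in data if t >= tj and g == 0)
        nB = sum(1 for t, _, g in data if t >= tj and g == 1)
        n = nA + nB
        dj = sum(1 for t, d, _ in data if t == tj and d == 1)
        dAj = sum(1 for t, d, g in data if t == tj and d == 1 and g == 0)
        w = weight(n)
        eA = dj * nA / n                       # expected deaths in A under H0
        vj = dj * nA * nB * (n - dj) / (n**2 * (n - 1)) if n > 1 else 0.0
        U += w * (dAj - eA)
        V += w**2 * vj
    return U, V, U**2 / V

# Ages at death; delta = 0 marks a right-censored value (asterisk).
tA, dA = [58, 64, 66, 66, 70, 70], [1, 1, 1, 0, 0, 0]   # Group A
tB, dB = [62, 66, 67, 69, 70, 70], [0, 1, 1, 1, 1, 0]   # Group B
U, V, X2 = weighted_logrank(tA, dA, tB, dB)
p = math.erfc(math.sqrt(X2 / 2))   # P(chi^2_1 > X2) = 2 * (1 - Phi(sqrt(X2)))
print(f"U = {U:.3f}, V = {V:.3f}, U^2/V = {X2:.4f}, p = {p:.3f}")
print("reject H0" if X2 > 3.841 else "do not reject H0")   # chi^2_1 5% point
```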

(i) State a suitable null hypothesis $H_0$ and an alternative hypothesis $H_1$ to compare the effects of different exercising patterns.
(ii) Describe briefly the Gehan–Breslow (weighted log-rank) test, explaining all the notation required. What is the purpose of using weights in this test?
(iii) Carry out the Gehan–Breslow test for the above data at the 5% significance level. Show your calculations and state the conclusions.
(iv) From a suitable statistical table, calculate an approximate p-value in part (iii), using linear interpolation if necessary. What does this value tell you?

5.

(a) Acme Products operates a policy under which a warranty is offered for x years, where x is the largest integer for which there is no more than a 2% probability of product failure before time x. Acme introduces a new product with hazard rate $h(t) = 0.001(2t + 1)$, where $t \ge 0$ is time measured in years. Calculate the length of warranty for this product.

(b) An actuary working for an insurance company needs to compare two categories of new drivers by analyzing times (measured in weeks) from obtaining a full driving licence until their first car accident leading to a claim. Data were collected for two groups of drivers, non-smokers (Group I) and smokers (Group II):

Group I (non-smokers):   6∗    16    22∗   26∗   28∗   38    40∗
Group II (smokers):      8∗    10    12∗   14∗   28    35    45

where an asterisk ∗ indicates right-censored observations due to loss to follow-up or discontinued driving. The actuary wants to fit a Cox proportional hazards model $h(t, x) = e^{\beta x} h_0(t)$, where $h_0(t)$ is the baseline hazard rate, and x = 1 for a smoking driver and x = 0 otherwise.
(i) State any limitations that apply to the use of Cox's formula for the partial likelihood function.
(ii) If the conditions in part (i) are met for the data above, obtain the partial likelihood function L(β).


(iii) Calculate the maximum partial-likelihood estimate $\hat\beta$ of the parameter β and interpret the result in terms of the hazard ratio of Group II relative to Group I.
(iv) Evaluate the approximate variance of the estimate $\hat\beta$ found in part (iii).
(v) Using parts (iii) and (iv), test statistically (at the 0.1 significance level) whether smoking drivers are prone to a higher risk of car accident than non-smokers.
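Parts (ii)–(v) can be worked through with Newton–Raphson on Cox's log partial likelihood for one binary covariate, as sketched below. It assumes no tied death times; here the death at 28 weeks ties only with a censoring, and that censored subject stays in the risk set. The variance of $\hat\beta$ is taken as the inverse observed information. For these data the score equation reduces, with $u = e^\beta$, to $2u^2 - u - 6 = 0$. The numerical result can be checked against that.

```python
import math

def score_info(beta, data):
    """Score U(beta) and observed information I(beta) of Cox's partial likelihood
    for a single covariate, assuming no tied death times."""
    U = I = 0.0
    for tj, dj, xj in data:
        if dj != 1:
            continue
        risk = [xi for ti, _, xi in data if ti >= tj]   # censored at tj stay at risk
        w = [math.exp(beta * xi) for xi in risk]
        s0 = sum(w)
        s1 = sum(xi * wi for xi, wi in zip(risk, w))
        s2 = sum(xi * xi * wi for xi, wi in zip(risk, w))
        U += xj - s1 / s0
        I += s2 / s0 - (s1 / s0) ** 2
    return U, I

def cox_fit(times, deltas, x, tol=1e-10, max_iter=50):
    """Newton-Raphson maximisation of the partial likelihood, starting at beta = 0."""
    data = list(zip(times, deltas, x))
    beta = 0.0
    for _ in range(max_iter):
        U, I = score_info(beta, data)
        step = U / I
        beta += step
        if abs(step) < tol:
            break
    return beta, score_info(beta, data)[1]

# Group I (non-smokers, x = 0) then Group II (smokers, x = 1); delta = 0 marks '*'.
times = [6, 16, 22, 26, 28, 38, 40, 8, 10, 12, 14, 28, 35, 45]
deltas = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1]
x = [0] * 7 + [1] * 7

beta_hat, info = cox_fit(times, deltas, x)
var_beta = 1 / info
z = beta_hat / math.sqrt(var_beta)
print(f"beta_hat = {beta_hat:.4f}, HR = {math.exp(beta_hat):.3f}, Var = {var_beta:.4f}, z = {z:.3f}")
# One-sided test at the 0.1 level: compare z with 1.2816 from Table 2.
print("reject H0" if z > 1.2816 else "do not reject H0")
```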

(c) Continuing the investigation of the data in part (b) above, the actuary decided, for comparison, to fit an exponential model in each group, and then test again for a higher risk of smokers vs. non-smokers.
(i) Calculate the maximum likelihood estimates for the parameters $\lambda_1$ and $\lambda_2$ of a hypothetical exponential model in Groups I and II, respectively, and hence compute an estimate $\hat\psi$ of the hazard ratio $\psi = \lambda_2/\lambda_1$.
(ii) Evaluate the standard deviation of $\log\hat\psi$ and test (at the 0.1 significance level) the null hypothesis $H_0: \log\psi = 0$ against a suitable one-sided alternative $H_1$.
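The block below gives two short computations for Question 5, both minimal sketches. For part (a), $H(t) = \int_0^t 0.001(2u + 1)\,du = 0.001(t^2 + t)$, and the warranty is the largest integer x with $1 - \exp(-H(x)) \le 0.02$. For part (c), the exponential MLE in each group is $\hat\lambda = d/\sum_i t_i$, that is, deaths over total time at risk. It assumes $\mathrm{Var}(\log\hat\lambda) \approx 1/d$ and independent groups, so that $\mathrm{Var}(\log\hat\psi) \approx 1/d_1 + 1/d_2$.

```python
import math

# (a): cumulative hazard H(t) = 0.001 * (t^2 + t).
def warranty_years(max_fail=0.02):
    """Largest integer x with P(T < x) = 1 - exp(-H(x)) <= max_fail."""
    x = 0
    while 1 - math.exp(-0.001 * ((x + 1) ** 2 + (x + 1))) <= max_fail:
        x += 1
    return x

# (c): exponential MLE in each group, lambda_hat = d / (total time at risk).
t1, d1 = [6, 16, 22, 26, 28, 38, 40], [0, 1, 0, 0, 0, 1, 0]   # Group I
t2, d2 = [8, 10, 12, 14, 28, 35, 45], [0, 1, 0, 0, 1, 1, 1]   # Group II
lam1 = sum(d1) / sum(t1)
lam2 = sum(d2) / sum(t2)
psi = lam2 / lam1
sd_log_psi = math.sqrt(1 / sum(d1) + 1 / sum(d2))
z = math.log(psi) / sd_log_psi

print(f"warranty = {warranty_years()} years")
print(f"lambda1 = {lam1:.4f}, lambda2 = {lam2:.4f}, psi = {psi:.3f}, z = {z:.3f}")
print("reject H0" if z > 1.2816 else "do not reject H0")   # one-sided, 0.1 level
```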

End of Questions


Normal Distribution Function Tables

The first table below gives the values of the cumulative distribution function of the standard normal distribution N(0, 1),
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du, \qquad x \in \mathbb{R},$$
shown as the shaded area in the figure. This is the probability that a random variable, normally distributed with zero mean and unit variance, is less than or equal to x. Values of Φ(x) are tabulated below for x ≥ 0 only; when x < 0, use Φ(x) = 1 − Φ(−x) (as the distribution N(0, 1) is symmetric).

[Figure: standard normal density for −3 ≤ x ≤ 3, with Φ(x) shown as the shaded area to the left of x]

For (linear) interpolation use the formula
$$\Phi(x) \approx \Phi(x_1) + \big(\Phi(x_2) - \Phi(x_1)\big)\,\frac{x - x_1}{x_2 - x_1} \qquad (x_1 < x < x_2).$$
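The interpolation rule above amounts to a one-line function, sketched here with two adjacent entries of Table 1. The point x = 0.562 is chosen only for illustration.

```python
def interp_phi(x, x1, phi1, x2, phi2):
    """Linear interpolation between tabulated points (x1, Phi(x1)) and (x2, Phi(x2))."""
    return phi1 + (phi2 - phi1) * (x - x1) / (x2 - x1)

# Phi(0.562) from the tabulated Phi(0.55) = 0.7088 and Phi(0.60) = 0.7257.
phi = interp_phi(0.562, 0.55, 0.7088, 0.60, 0.7257)
print(f"Phi(0.562) = {phi:.4f}")
```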

Table 1
x     Φ(x)     x     Φ(x)     x     Φ(x)     x     Φ(x)     x     Φ(x)     x     Φ(x)
0.00  0.5000   0.50  0.6915   1.00  0.8413   1.50  0.9332   2.00  0.9772   2.50  0.9938
0.05  0.5199   0.55  0.7088   1.05  0.8531   1.55  0.9394   2.05  0.9798   2.55  0.9946
0.10  0.5398   0.60  0.7257   1.10  0.8643   1.60  0.9452   2.10  0.9821   2.60  0.9953
0.15  0.5596   0.65  0.7422   1.15  0.8749   1.65  0.9505   2.15  0.9842   2.65  0.9960
0.20  0.5793   0.70  0.7580   1.20  0.8849   1.70  0.9554   2.20  0.9861   2.70  0.9965
0.25  0.5987   0.75  0.7734   1.25  0.8944   1.75  0.9599   2.25  0.9878   2.75  0.9970
0.30  0.6179   0.80  0.7881   1.30  0.9032   1.80  0.9641   2.30  0.9893   2.80  0.9974
0.35  0.6368   0.85  0.8023   1.35  0.9115   1.85  0.9678   2.35  0.9906   2.85  0.9978
0.40  0.6554   0.90  0.8159   1.40  0.9192   1.90  0.9713   2.40  0.9918   2.90  0.9981
0.45  0.6736   0.95  0.8289   1.45  0.9265   1.95  0.9744   2.45  0.9929   2.95  0.9984
0.50  0.6915   1.00  0.8413   1.50  0.9332   2.00  0.9772   2.50  0.9938   3.00  0.9987

The inverse function Φ⁻¹(γ) is tabulated below for various values of γ.

Table 2
γ         0.900   0.950   0.975   0.990   0.995   0.999   0.9995
Φ⁻¹(γ)    1.2816  1.6449  1.9600  2.3263  2.5758  3.0902  3.2905


Percentage Points of the χ²-Distribution

This table gives the percentage points $k_\nu(P)$ of the $\chi^2_\nu$-distribution with ν degrees of freedom for various values of P and ν, as indicated by the figure (plotted in the case ν = 6). That is, if a random variable X has a $\chi^2_\nu$-distribution, then P/100 is the probability that $X > k_\nu(P)$. For large values of ν (ν > 80), percentage points $k_\nu(P)$ can be obtained using the fact that $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2\nu - 1}$ and unit variance.

[Figure: χ² density for ν = 6, illustrating the upper percentage point k_ν(P)]

ν \ P   10       5        2.5      1        0.5      0.1      0.05
1       2.706    3.841    5.024    6.635    7.879    10.828   12.116
2       4.605    5.991    7.378    9.210    10.597   13.816   15.202
3       6.251    7.815    9.348    11.345   12.838   16.266   17.730
4       7.779    9.488    11.143   13.277   14.860   18.467   19.997
5       9.236    11.070   12.833   15.086   16.750   20.515   22.105
6       10.645   12.592   14.449   16.812   18.548   22.458   24.103
7       12.017   14.067   16.013   18.475   20.278   24.322   26.018
8       13.362   15.507   17.535   20.090   21.955   26.124   27.868
9       14.684   16.919   19.023   21.666   23.589   27.877   29.666
10      15.987   18.307   20.483   23.209   25.188   29.588   31.420
11      17.275   19.675   21.920   24.725   26.757   31.264   33.137
12      18.549   21.026   23.337   26.217   28.300   32.909   34.821
13      19.812   22.362   24.736   27.688   29.819   34.528   36.478
14      21.064   23.685   26.119   29.141   31.319   36.123   38.109
15      22.307   24.996   27.488   30.578   32.801   37.697   39.719
16      23.542   26.296   28.845   32.000   34.267   39.252   41.308
17      24.769   27.587   30.191   33.409   35.718   40.790   42.879
18      25.989   28.869   31.526   34.805   37.156   42.312   44.434
19      27.204   30.144   32.852   36.191   38.582   43.820   45.973
20      28.412   31.410   34.170   37.566   39.997   45.315   47.498
25      34.382   37.652   40.646   44.314   46.928   52.620   54.947
30      40.256   43.773   46.979   50.892   53.672   59.703   62.162
40      51.805   55.758   59.342   63.691   66.766   73.402   76.095
50      63.167   67.505   71.420   76.154   79.490   86.661   89.561
80      96.578   101.879  106.629  112.329  116.321  124.839  128.261
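The large-ν approximation described above can be inverted. If $\sqrt{2X} \approx N(\sqrt{2\nu - 1}, 1)$, then $k_\nu(P) \approx (z + \sqrt{2\nu - 1})^2/2$ with $z = \Phi^{-1}(1 - P/100)$. The sketch below compares this with the tabulated value at ν = 80. Since that is the largest tabulated ν, the agreement should only be rough.

```python
import math

def chi2_point_approx(nu, z):
    """Approximate k_nu(P) from sqrt(2X) ~ N(sqrt(2 nu - 1), 1):
    k_nu(P) ~ (z + sqrt(2 nu - 1))**2 / 2, where z = Phi^{-1}(1 - P/100)."""
    return (z + math.sqrt(2 * nu - 1)) ** 2 / 2

# P = 5 uses z = 1.6449 from Table 2; the tabulated k_80(5) is 101.879.
approx = chi2_point_approx(80, 1.6449)
print(f"k_80(5) approx {approx:.3f} (table: 101.879)")
```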

End of Paper

IMPORTANT NOTE. The attached check-sheet contains the final answers to some, but not necessarily all, questions on the exam. Answers to questions requiring longer answers, for example proofs, are not given.

Please note. In the exam, students are expected to show their full work on the exam script, not just final answers.

Advice. Use this check-sheet to check your answers AFTER you have worked through the exam as if you were in an exam situation, i.e. without access to notes, books, answers to exercises, etc. This way you will test whether you can tackle the problems without any help as in the exam.

MATH2775 Survival Analysis Check Sheet (Exam Paper May/June 2015)

1. (a) (Bookwork) (i) Answer: Censored observations arise due to incomplete information about the survival of some subjects. (ii) Answer: Interval censoring arises when the end-point event is only known to occur between two predetermined times (e.g., onset of a breast tumour between two annual screenings).

(b) Hint: Use the general formula $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$. For the reference country, we have $S_0(1) = \exp\left(-\int_0^1 h_0(u)\,du\right) = 0.993$.
(i) $h_1(t) = 3h_0(t)$, $S_1(1) = 0.979$.
(ii) $h_2(t) = h_0(t) - 0.001$, $S_2(1) = 0.994$.

(c) (i) Answer: 30 with HIV and 50 censored, so 80 altogether. Hint: Add the numbers.
(ii) Yes (data reported as totals from fixed 2-year intervals $[t'_i, t'_{i+1})$, i = 1, …, 6).
(iii) Here "death" is an HIV-positive diagnosis, and "survival time" is time to HIV (unless censored). The actuarial estimate is given by (bookwork)
$$\tilde S(t) = \prod_{t'_i \le t} \frac{n'_i - d_i}{n'_i}.$$


Hint: You can find the numbers $n_i$ by starting from the last interval and working backwards; e.g., we have $d_6 = 6$, $c_6 = 14$, hence $n_6 = 20$ and $n'_6 = 13$, etc. Alternatively, you can work from the beginning using $n_1 = 80$.
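The sketch below carries out the life-table computation in the hint. It assumes withdrawals are at risk for half their interval, $n'_i = n_i - c_i/2$. It uses the Greenwood-type variance $\mathrm{Var}\{\tilde S\} \approx \tilde S^2 \sum_i d_i/(n'_i(n'_i - d_i))$. The sketch reports end-of-interval values. Which of them serves as $\tilde S(5)$ depends on the step-function convention adopted. Rounding intermediate products can also shift the third decimal place relative to the check-sheet values.

```python
# Question 1(c): deaths d_i and withdrawals c_i in the 2-year intervals [0,2), ..., [10,12).
d = [4, 5, 6, 5, 4, 6]
c = [4, 7, 7, 10, 8, 14]

def life_table(d, c, n0):
    """Actuarial estimate at the end of each interval, with a Greenwood-type variance.

    n'_i = n_i - c_i / 2 (withdrawals assumed at risk for half the interval).
    """
    n, S, g = n0, 1.0, 0.0
    rows = []
    for di, ci in zip(d, c):
        n_adj = n - ci / 2
        S *= (n_adj - di) / n_adj
        g += di / (n_adj * (n_adj - di))
        rows.append((n, n_adj, S, S * S * g))
        n -= di + ci            # number entering the next interval
    return rows

rows = life_table(d, c, n0=sum(d) + sum(c))   # n1 = 80
for i, (n, n_adj, S, var) in enumerate(rows):
    print(f"[{2*i},{2*i+2}): n={n}, n'={n_adj}, S~={S:.3f}, Var={var:.5f}")
```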

Interval (years)   0–2     2–4     4–6     6–8     8–10    10–12
S̃(t)               0.949   0.880   0.786   0.693   0.594   0.319

[Plot: the actuarial estimate S̃(t) over the 2-year intervals, vertical axis from 0.0 to 1.0]

...

