Introduction to Biostatistics for Clinicians

Title	Introduction to Biostatistics for Clinicians
Author	yadgar Hamakarim
Course	Biostatistics
Institution	University of Sulaimani
Pages	147

Introduction to Biostatistics for Clinicians

Geert Verbeke
I-BioStat: Interuniversity Institute for Biostatistics and statistical Bioinformatics
K.U.Leuven & Hasselt University, Belgium
http://perswww.kuleuven.be/geert verbeke

Eli Lilly: Introduction to Biostatistics for Clinicians, February 2009

Contents

I   February 3, 2009
    1  What is statistics?
    2  Hypothesis testing
    3  Some frequently used tests
II  February 10, 2009
    4  Errors in statistics: Basic concepts
    5  Errors in statistics: Practical implications
III February 17, 2009
    6  Diagnostic tests
    7  Survival analysis
Bibliography

Part I February 3, 2009

Chapter 1 What is statistics?

⊲ Example
⊲ Population – sample
⊲ Random variability

1.1 Example: Captopril data

• 15 patients with hypertension
• The response of interest is the supine blood pressure, before and after treatment with Captopril
• Research question: How does treatment affect BP?

• Dataset ‘Captopril’ (blood pressure in mm Hg):

            Before        After
Patient     SBP   DBP     SBP   DBP
1           210   130     201   125
2           169   122     165   121
3           187   124     166   121
4           160   104     157   106
5           167   112     147   101
6           176   101     145    85
7           185   121     168    98
8           206   124     180   105
9           173   115     147   103
10          146   102     136    98
11          174    98     151    90
12          201   119     168    98
13          198   106     179   110
14          148   107     129   103
15          154   100     131    82
Average   176.9 112.3   158.0 103.1
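As a quick arithmetic check, the Average row can be recomputed from the 15 patient rows. This is a minimal Python sketch; the names `captopril`, `LABELS` and `column_average` are illustrative only and do not come from the notes.

```python
# Captopril data from the table above, one tuple per patient:
# (SBP before, DBP before, SBP after, DBP after), all in mm Hg
captopril = [
    (210, 130, 201, 125), (169, 122, 165, 121), (187, 124, 166, 121),
    (160, 104, 157, 106), (167, 112, 147, 101), (176, 101, 145, 85),
    (185, 121, 168, 98), (206, 124, 180, 105), (173, 115, 147, 103),
    (146, 102, 136, 98), (174, 98, 151, 90), (201, 119, 168, 98),
    (198, 106, 179, 110), (148, 107, 129, 103), (154, 100, 131, 82),
]

LABELS = ["Systolic before", "Diastolic before", "Systolic after", "Diastolic after"]


def column_average(rows, index):
    """Average of one column of the table, rounded to 0.1 mm Hg."""
    return round(sum(row[index] for row in rows) / len(rows), 1)


averages = {label: column_average(captopril, i) for i, label in enumerate(LABELS)}
for label, value in averages.items():
    print(f"{label}: {value} mm Hg")
```

Running it prints 176.9, 112.3, 158.0 and 103.1 mm Hg, the values in the Average row.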

• It would be of interest to know how likely the observed changes in BP are to occur by pure chance.
• If this is very unlikely, the above data provide evidence that BP indeed decreases after treatment with Captopril. Otherwise, the above data do not provide evidence for efficacy of Captopril.

• Obviously, we are not interested in drawing conclusions about the 15 observed patients only.
• Instead, we would like to draw conclusions about the effect of Captopril on the total population of all hypertensive patients.
• Conclusion: Statistics aims at drawing conclusions about some population, based on what has been observed in a random sample.

[Figure: POPULATION (effect of Captopril in population) and RANDOM SAMPLE (effect of Captopril in 15 patients), connected by STATISTICS]

1.2 Population versus random sample

• Population: Hypothetical group of current and future subjects, with a specific condition, about which conclusions are to be drawn
• Sample: Subgroup from the population on which observations will be taken
• In order for effects observed in the sample to be generalizable to the total population, the sample should be taken at random

1.3 Random variability

• Descriptive statistics of the observed differences in diastolic BP (mm Hg), after treatment with Captopril, in 15 subjects (change = before − after):

Patient  Before  After  Change
1           130    125       5
2           122    121       1
3           124    121       3
4           104    106      −2
5           112    101      11
6           101     85      16
7           121     98      23
8           124    105      19
9           115    103      12
10          102     98       4
11           98     90       8
12          119     98      21
13          106    110      −4
14          107    103       4
15          100     82      18

• Note that not all subjects experience the same benefit from the treatment.
• An average decrease of 9.27 mm Hg is observed in our sample.
• A new, similar experiment would lead to another sample, hence to another observed change in BP:
  ⊲ More reduction (11.57 mm Hg)?
  ⊲ Less reduction (4.78 mm Hg)?
  ⊲ No change (0.00 mm Hg)?
  ⊲ Increase (−5.23 mm Hg)?
• This shows that the observed decrease of 9.27 mm Hg should not be overinterpreted.
• It also shows that one should not expect 9.27 mm Hg to be exactly the reduction in BP one would observe if the total population were treated with Captopril.
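The point that a new, similar experiment would give a different average can be made concrete with a small simulation. This sketch ASSUMES that changes in diastolic BP in the population are normally distributed, with mean and SD equal to those observed in our 15 patients; that assumption is made here only for illustration, so the simulated averages are not the example values listed above. The helper `simulate_experiment` is a hypothetical name.

```python
import random
import statistics

# Observed changes in DBP (before minus after), mm Hg, from the table above
changes = [5, 1, 3, -2, 11, 16, 23, 19, 12, 4, 8, 21, -4, 4, 18]
observed_mean = statistics.mean(changes)   # 9.27 mm Hg
observed_sd = statistics.stdev(changes)    # sample SD, about 8.6 mm Hg


def simulate_experiment(mu, sigma, n, rng):
    """Mean change in one hypothetical new experiment with n patients,
    ASSUMING population changes are normal with mean mu and SD sigma."""
    return statistics.mean(rng.gauss(mu, sigma) for _ in range(n))


rng = random.Random(2009)  # fixed seed so the run is reproducible
# Illustrative assumption: population mean and SD equal the sample values
new_means = [simulate_experiment(observed_mean, observed_sd, 15, rng) for _ in range(5)]

print(f"observed mean change: {observed_mean:.2f} mm Hg")
print("mean change in 5 simulated experiments:", [round(m, 2) for m in new_means])
```

Each simulated experiment yields its own average, which is exactly why 9.27 mm Hg is treated as an estimate rather than as the population value.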

• Let µ be the average change in BP one would observe if the total population were treated.
• 9.27 mm Hg can then be interpreted as an estimate of µ, based on our sample.
• Question: Is our observed change of 9.27 mm Hg sufficient evidence to conclude that the treatment really affects BP?
• Answer: Hypothesis testing

[Figure: POPULATION (is µ different from 0?) and RANDOM SAMPLE (observed effect of 9.27 mm Hg in 15 randomly selected patients), connected by STATISTICS]

Chapter 2 Hypothesis testing

⊲ Example
⊲ Null and alternative hypothesis
⊲ The p-value and level of significance
⊲ Possible errors in decision making

2.1 Example

• As before, µ is the average change in diastolic BP one would observe if the total population of hypertensive patients were treated with Captopril.
• Note that µ will never be known, but we can use our sample to learn about µ.
• If the treatment had no effect, the average µ would be zero.
• So, if one can show that there is (strong) evidence that µ ≠ 0, then this can be considered as evidence for a treatment effect.
• Based on our sample of 15 observations, we estimated µ by µ̂ = 9.27 mm Hg.
• Obviously, this estimate is relatively far away from 0, suggesting that the treatment might affect BP.
• On the other hand, the observed effect µ̂ = 9.27 could have occurred by pure chance, even if there were no treatment effect at all.

• Question: How likely would that be?
• Only if this is very unlikely will the observed data be considered sufficient evidence for a treatment effect.

2.2 Null and alternative hypothesis

• The procedure to decide whether there is sufficient evidence to believe that the treatment affects BP is called a test of hypothesis.
• In practice, the research question is formulated in terms of a null hypothesis H0 and an alternative hypothesis HA:

    H0 : µ = 0   versus   HA : µ ≠ 0

• Based on our observed data, we will investigate whether H0 can be rejected in favour of HA.
• If not, the null hypothesis H0 is accepted and one decides that the treatment was not effective.

2.3 The p-value and level of significance

• Intuitively, it is obvious that H0 : µ = 0 will be rejected if the observed sample average µ̂ is too far away from 0.
• Question: How far is too far?
• Answers:
  ⊲ If this result is very unlikely to happen by pure chance
  ⊲ If this result is not at all what you would expect to see if µ were 0

• One can calculate that, if Captopril had no effect at all, there would be only a 0.1% chance of observing a sample with an average change in BP at least as far from 0 as 9.27 mm Hg.
• Hence, if Captopril had no effect (i.e., if µ = 0), it would be very unlikely to observe a sample with an average as extreme as 9.27. This would happen only about once in every 1000 times a similar experiment was performed.
• We therefore consider the data observed in our experiment sufficient evidence to reject the null hypothesis, and we conclude that the treatment effect is significantly different from 0, or equivalently, that there is a significant treatment effect.
• The probability 0.1%, which expresses how extreme our observations are if the null hypothesis were true, is denoted by p and is called the p-value.
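The 0.1% can be reproduced with a paired t-test on the 15 changes in diastolic BP; we ASSUME here that this is the test behind the quoted p-value (the paired t-test is listed in chapter 3). To stay within the Python standard library, this sketch obtains the two-sided p-value by integrating the t density numerically with Simpson's rule; in practice one would call a statistics package instead. The function names are hypothetical.

```python
import math
import statistics

# Changes in diastolic BP (before minus after), mm Hg, for the 15 patients
changes = [5, 1, 3, -2, 11, 16, 23, 19, 12, 4, 8, 21, -4, 4, 18]


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) under H0, via Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(i * h, df)
    area = total * h / 3            # P(0 <= T <= |t|)
    return 1 - 2 * area


def paired_t_test(diffs):
    """Test H0: mu = 0 for the mean of the paired differences."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean
    t = mean / se
    return t, n - 1, two_sided_p(t, n - 1)


t, df, p = paired_t_test(changes)
print(f"t = {t:.2f} on {df} df, two-sided p = {p:.4f}")
```

This gives t ≈ 4.17 on 14 degrees of freedom and a two-sided p just below 0.001, in line with the 0.1% above.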

• A small p-value indicates results that would be extreme were H0 true. One then rejects the null hypothesis.
• A large p-value indicates that the observed results are perfectly in line with what can be expected if H0 is true. One then does not reject the null hypothesis, which is equivalent to accepting the null hypothesis.
• In practice, one has to decide how small p should get before the null hypothesis is rejected.
• One therefore specifies the so-called level of significance α:

    p < α ⇒ reject H0
    p ≥ α ⇒ accept H0
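The decision rule is simple enough to write down directly. A minimal sketch; `decide` is a hypothetical helper name, the default α = 0.05 is the level used throughout these notes, and the p-values are the ones quoted in this section.

```python
def decide(p, alpha=0.05):
    """Formal decision at significance level alpha: reject H0 if p < alpha."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return "reject H0" if p < alpha else "accept H0"


# p-values quoted in these notes, judged at alpha = 0.05
examples = {"Captopril": 0.001, "protein diet": 0.0757, "sickness absence": 0.215}
for study, p in examples.items():
    print(f"{study}: p = {p} -> {decide(p)}")
```

Note that p = α itself leads to "accept H0", matching p ≥ α in the rule above.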

• α is typically a small value, such as 0.01, 0.05 or 0.10.
• In the biomedical sciences, α = 0.05 = 5% is standard.
• One then rejects the null hypothesis as soon as a result at least as extreme as the observed one would happen less than 5 times in 100 experiments, assuming that the null hypothesis is correct.
• Strictly speaking, one should always mention what level of significance has been used, and the conclusion would have to be formulated as “the treatment effect is significantly different from 0 at the 5% level of significance,” or equivalently, “there is a significant treatment effect at the 5% level of significance.”
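What α = 0.05 means can be checked by simulation: if H0 is true, about 5% of experiments should still give p < 0.05. The sketch simplifies by ASSUMING normal data with a known SD, so that a z-test replaces the t-test; σ = 8.6 mm Hg and n = 15 are borrowed from the Captopril example purely for illustration, and the function names are hypothetical.

```python
import math
import random
import statistics


def z_test_p(sample, sigma):
    """Two-sided p-value of a z-test of H0: mu = 0, assuming sigma is known."""
    z = statistics.mean(sample) / (sigma / math.sqrt(len(sample)))
    return math.erfc(abs(z) / math.sqrt(2))      # equals 2 * P(Z >= |z|)


def rejection_rate(mu, sigma=8.6, n=15, alpha=0.05, experiments=4000, seed=2009):
    """Fraction of simulated experiments in which H0: mu = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if z_test_p(sample, sigma) < alpha:
            rejections += 1
    return rejections / experiments


rate = rejection_rate(mu=0.0)
print(f"H0 true: rejected in {rate:.1%} of 4000 simulated experiments")
```

The printed rate fluctuates around 5% from seed to seed; that 5% is the price paid for using α = 0.05.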

• Note that specification of α is only required if a formal decision (‘accept’ or ‘reject’) is needed.
• It is therefore not meaningful to report ‘borderline significance’ in examples where p is only slightly larger than α (e.g., p = 0.06 > α = 0.05).

2.4 Possible errors in decision making

• In our example about the Captopril treatment, we obtained p = 0.001, leading to the rejection of the null hypothesis of no treatment effect.
• This should not be considered as formal proof that there is a treatment effect.
• Even if the treatment has no effect at all, a sample at least as extreme as ours would occur about once in every 1000 experiments.
• Perhaps our sample was indeed the extreme one that happens once in every thousand experiments.
• Alternatively, suppose we had obtained p = 0.9812. We then would not have rejected the null hypothesis, and would have concluded that there is no evidence for any treatment effect.

• This should not have been considered as formal proof that a treatment effect is absent.
• Perhaps the treatment effect µ is not 0, but very close to 0. The data one would then observe would look very similar to data observed if µ = 0, so the data would not allow one to detect that µ ≠ 0.
• Conclusion:

“Statistics cannot prove anything”

• Intuitively: Absolute certainty about population characteristics cannot be attained based on a finite sample of observations.
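The second kind of error, failing to detect a µ that is not 0 but close to 0, can be quantified as the probability that the test rejects H0 for a given true µ. This sketch ASSUMES normally distributed changes with a known SD, so that a two-sided z-test can be used; σ = 8.6 mm Hg and n = 15 are illustrative values borrowed from the Captopril example, not results from the notes, and `z_test_power` is a hypothetical name.

```python
from statistics import NormalDist


def z_test_power(mu, sigma=8.6, n=15, alpha=0.05):
    """Probability that a two-sided z-test rejects H0: mu = 0 when the true mean is mu.
    Assumes normally distributed data with known SD sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96 for alpha = 0.05
    shift = mu * n ** 0.5 / sigma                  # true mean in standard-error units
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


for mu in [0.0, 0.5, 2.0, 9.27]:
    print(f"true mu = {mu:5.2f} mm Hg -> P(reject H0) = {z_test_power(mu):.3f}")
```

Under these assumptions, a true µ of 0.5 mm Hg would be detected in only about 6% of experiments, barely more than the 5% rejection rate when µ = 0, which is why a non-significant result is no proof that µ = 0.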

Chapter 3 Some frequently used tests

⊲ The unpaired t-test
⊲ The chi-squared test
⊲ The paired t-test
⊲ Assumptions

3.1 The unpaired t-test

• Consider data from a rat experiment to study weight gain under a high or a low protein diet.
• Group-specific histograms:

• Group-specific summary statistics:
• On average, there is an observed difference of 19 g between the rats on a high protein diet and those on a low protein diet.
• Is this observed difference sufficient evidence to conclude that there indeed is an effect of diet on weight gain?
• It would be of interest to know how likely such a difference of 19 g is to occur if weight gain were completely unrelated to the protein level of the diet.

• Based on the unpaired t-test, it can be calculated that, if the diet did not affect weight gain at all, there would be a p = 0.0757 = 7.57% chance of observing a difference of at least 19 g in a similar experiment.
• So, even if there is no relation at all between the protein content of the diet and weight gain, one can still expect to observe a difference of at least 19 g in 7.6% of future similar experiments.
• Since p = 0.0757 > 0.05 = α, we consider this insufficient evidence to conclude that the protein level affects weight gain.

• Conclusion: There is no significant difference (p = 0.0757) in weight gain between rats on a high protein diet and rats on a low protein diet.
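A sketch of the unpaired t-test, ASSUMING the classical pooled-variance (equal-variance) version; the notes do not state which variant produced p = 0.0757. The rat data are not reproduced in this section, so the usage line runs on two tiny made-up groups purely to show the call. To stay within the standard library, the two-sided p-value is obtained by integrating the t density numerically (Simpson's rule); the function names are hypothetical.

```python
import math
import statistics


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    h = abs(t) / steps
    total = t_density(0.0, df) + t_density(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_density(i * h, df)
    return 1 - 2 * total * h / 3


def unpaired_t_test(group1, group2):
    """Pooled-variance two-sample t-test of H0: equal population means."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, df, two_sided_p(t, df)


# Made-up illustrative groups (NOT the rat data from the notes)
t, df, p = unpaired_t_test([1, 2, 3], [4, 5, 6])
print(f"t = {t:.3f} on {df} df, two-sided p = {p:.4f}")
```

The same function applied to the weight gains of the two diet groups would give the t-statistic and p-value for the rat experiment.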

3.2 The chi-squared test

• We consider data on sickness absence, collected on 585 employees with a similar job:

          Sickness absence
Gender          No     Yes   Total
female         245     184     429
male            98      58     156
Total          343     242     585

• Research question: Is there a relation between absence and gender?
• 184/429 = 42.9% of the females and 58/156 = 37.2% of the males have been absent.
• This suggests that females are more often absent than males.
• However, even if absence due to sickness were equally frequent amongst males and females, results like these could have occurred by pure chance.
• It would therefore be of interest to calculate how likely it would be to observe such differences by pure chance.
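The chi-squared test compares the observed counts with the counts expected if absence and gender were unrelated (row total × column total / grand total). A minimal standard-library sketch; it ASSUMES the plain Pearson statistic without Yates' continuity correction, which gives a p-value matching the 0.215 reported in the notes. For a 2×2 table (1 degree of freedom) the upper tail probability equals erfc(√(χ²/2)). The name `chi_squared_2x2` is hypothetical.

```python
import math

# Observed counts from the sickness-absence table above:
# for each gender, [not absent ("No"), absent ("Yes")]
observed = {
    "female": [245, 184],
    "male": [98, 58],
}


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table, 1 df."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(rows):
        for j, count in enumerate(row):
            # count expected if absence were unrelated to gender
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected
    # for 1 degree of freedom, P(chi-squared >= chi2) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


for gender, (no, yes) in observed.items():
    print(f"{gender}: {yes}/{no + yes} = {yes / (no + yes):.1%} absent")

chi2, p = chi_squared_2x2(observed)
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```

This prints the 42.9% and 37.2% absence rates, χ² ≈ 1.54 and p = 0.215.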

• Based on the chi-squared test, it can be calculated that, even if males and females were equally often absent, there would be a p = 0.215 = 21.5% chance of observing, in a similar experiment, a difference between the groups at least equal to 0.429 − 0.372 = 0.057.
• So, even if there is no relation at all between gend...