Title: Monitoring Part 4 - Tom Cook Spring 21
Course: Statistical Methods For Clinical Trials
Institution: University of Wisconsin-Madison

10 Interim Monitoring

STAT/BMI 641 Spring 2020

10.17 Conditional Power

Recall: formal interim monitoring procedures allow early stopping for
• benefit: the experimental treatment is shown to be superior to control;
• harm: apparent risks of the experimental treatment exceed its potential benefit;
• futility: there is little chance that the trial will be able to meet its objective.
  – β-spending: we can rule out a beneficial effect of clinical importance using formal interim tests of a fixed alternative hypothesis.
  – Conditional or “predictive” power: the probability of achieving a positive result at the planned end of the trial is small for alternative hypotheses of interest.

Suppose we have $H_0: \mu = 0$. Recall that for any true $\mu$ (note that $U$ and $I$ are implicitly evaluated at zero),
• $E[U] = I\mu$
• $E[Z] = \sqrt{I}\,\mu$
• At full information $I_K$,
  – let $\theta = \sqrt{I_K}\,\mu$;
  – $E[Z(1)] = E[B(1)] = \theta$.
• At information fraction $t$,
  – $E[Z(t)] = \sqrt{t}\,\theta$;
  – $E[B(t)] = t\theta$.
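These expectations can be checked by simulating the B-value as Brownian motion with drift θ, built from independent normal increments $B(t_k) - B(t_{k-1}) \sim N(\theta\,\Delta t, \Delta t)$. The sketch below is a minimal illustration under that assumption; `simulate_b_values` is a hypothetical helper, and the drift θ = 3 and the four information fractions are arbitrary choices, not values from these notes.

```python
import math
import random

def simulate_b_values(theta, times, n_paths=20000, seed=1):
    """Simulate B(t) = W(t) + theta * t at the information fractions in `times`.

    Increments are independent: B(t_k) - B(t_{k-1}) ~ N(theta * dt, dt).
    Returns one list of B-values per simulated path.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        b, t_prev, path = 0.0, 0.0, []
        for t in times:
            dt = t - t_prev
            b += rng.gauss(theta * dt, math.sqrt(dt))
            path.append(b)
            t_prev = t
        paths.append(path)
    return paths

theta = 3.0                                  # hypothetical drift
times = [0.25, 0.50, 0.75, 1.00]             # hypothetical information fractions
paths = simulate_b_values(theta, times)
for k, t in enumerate(times):
    mean_b = sum(p[k] for p in paths) / len(paths)
    mean_z = sum(p[k] / math.sqrt(t) for p in paths) / len(paths)
    print(f"t = {t:.2f}: mean B = {mean_b:.3f} (t*theta = {t * theta:.3f}), "
          f"mean Z = {mean_z:.3f} (sqrt(t)*theta = {math.sqrt(t) * theta:.3f})")
```

The simulated means of B(t) track tθ, while those of Z(t) = B(t)/√t track √t θ.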

Aside: recall the general sample size formula
$$I = \frac{(Z_{1-\alpha/2} + Z_{1-\beta})^2}{\mu^2},$$
so at full information $I_K$, with power $1-\beta$ and two-sided level $\alpha$ (not adjusted for interim monitoring),
$$\theta = \sqrt{I_K}\,\mu = Z_{1-\alpha/2} + Z_{1-\beta}.$$

Therefore, if we know the power and significance level, we can find $\theta$ without needing to know anything else about the model parameters.
• Normal data with mean difference $\mu$, variance $\sigma^2$:
  – $I = n\xi_1\xi_2/\sigma^2$
  – $\theta = \sqrt{n\xi_1\xi_2/\sigma^2}\,\mu$
• Binomial data with odds ratio $\psi$, mean rate $p$:
  – $I = n\xi_1\xi_2\,p(1-p)$
  – $\theta = \sqrt{\dfrac{n\xi_1\xi_2}{p(1-p)}}\,(p_0 - p_1) \approx \sqrt{n\xi_1\xi_2\,p(1-p)}\,\log\psi$
• Survival data with hazard ratio $r$, $D$ total events:
  – $I = D\xi_1\xi_2$
  – $\theta = \sqrt{D\xi_1\xi_2}\,\log r$
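The design relationship and the three model-specific expressions for θ can be evaluated directly. This is a minimal sketch using only the Python standard library (`statistics.NormalDist`); the function names are hypothetical, the allocation fractions default to ξ1 = ξ2 = 1/2, and for binomial data the mean rate p is taken as the simple average of the two arm rates, an assumption that corresponds to equal allocation.

```python
from math import log, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal; Z.inv_cdf(q) is the q-quantile

def theta_from_design(alpha, power):
    """theta = Z_{1-alpha/2} + Z_{1-beta} for a two-sided level-alpha test."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)

def theta_normal(n, mu, sigma, xi1=0.5, xi2=0.5):
    """Normal data: theta = sqrt(n xi1 xi2 / sigma^2) * mu."""
    return sqrt(n * xi1 * xi2 / sigma**2) * mu

def theta_binomial(n, p0, p1, xi1=0.5, xi2=0.5):
    """Binomial data: theta = sqrt(n xi1 xi2 p(1-p)) * log(psi); p assumed = simple average."""
    p = (p0 + p1) / 2
    psi = (p0 / (1 - p0)) / (p1 / (1 - p1))   # odds ratio
    return sqrt(n * xi1 * xi2 * p * (1 - p)) * log(psi)

def theta_survival(events, hazard_ratio, xi1=0.5, xi2=0.5):
    """Survival data: theta = sqrt(D xi1 xi2) * log(r)."""
    return sqrt(events * xi1 * xi2) * log(hazard_ratio)

# Two-sided alpha = 0.05 and 90% power fix theta without any model parameters.
theta = theta_from_design(0.05, 0.90)
print(f"theta = {theta:.4f}")

# Normal data with mu = 0.5, sigma = 1, equal allocation: the sample size
# implied by theta = sqrt(n xi1 xi2 / sigma^2) * mu gives back the same theta.
n = theta**2 * 1.0**2 / (0.25 * 0.5**2)
print(f"n = {n:.1f}, theta_normal = {theta_normal(n, 0.5, 1.0):.4f}")
```

The round trip through the sample size illustrates the point of the aside: once α and power are fixed, θ is fixed, whatever the outcome type.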

We will reject $H_0: \mu = 0$ at the end of the trial if $Z(1) = B(1) \ge b_K$. Now suppose that at some interim point we have
• accumulated information fraction $t$;
• observed test statistic $Z(t)$,
  – $B(t) = Z(t)\sqrt{t}$.

The conditional probability of rejecting $H_0$ at the end of the trial, given the observed value $B(t)$ and an assumed value for $\theta$, is called conditional power and is given by
$$\Pr\{\text{Reject } H_0 \mid B(t), \theta\} = \Pr\{B(1) \ge b_K \mid B(t), \theta\}.$$
The following figures show conditional power, ignoring any early stopping for benefit or harm (the tick mark on the right-hand side at approximately $B(1) = 2$ is the final critical value, $b_K$):

[Figure, four panels of $B(t)$ against information fraction $t$ (0 to 1). Panel titles include “Power = 1 − β”, with $E[B(t)] = \theta t$, and “Conditional Power = 1 − β”, with $E[B(t) \mid B(0.60)] = B(0.60) + \theta(t - 0.60)$.]

The upper gray tail probabilities in these figures show the conditional power, given that the hypothesized value of $\theta$ is the true value. Here conditional power decreases over the four panels, with increasing information and minimal evidence of treatment benefit. Because $B(1) - B(t)$ is independent of $B(t)$ and is distributed $N((1-t)\theta,\, 1-t)$, conditional power is calculated using
$$\begin{aligned}
\Pr\{\text{Reject } H_0 \mid B(t), \theta\} &= \Pr\{B(1) \ge b_K \mid B(t), \theta\} \\
&= \Pr\left\{\frac{B(1) - B(t) - (1-t)\theta}{\sqrt{1-t}} \ge \frac{b_K - B(t) - (1-t)\theta}{\sqrt{1-t}} \;\middle|\; B(t), \theta\right\} \\
&= 1 - \Phi\left(\frac{b_K - B(t) - (1-t)\theta}{\sqrt{1-t}}\right).
\end{aligned}$$
• If conditional power is low for alternative hypotheses of interest, then we may consider stopping the trial for futility.
  – Exposing additional subjects to risk when there is little prospect that the trial will yield sufficiently important results is unethical.
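The final line of the derivation translates directly into code. This minimal sketch (standard-library Python; `conditional_power` is a hypothetical name) ignores interim stopping, as the formula does, and uses a hypothetical design with $b_K = 1.96$ and θ chosen to give 80% power.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_power(b_t, t, theta, b_k):
    """Pr{B(1) >= b_k | B(t) = b_t, theta}, ignoring interim stopping.

    Uses the fact that B(1) - B(t) is independent of B(t) and is
    N((1 - t) * theta, 1 - t).
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("information fraction t must lie in [0, 1)")
    return 1.0 - STD_NORMAL.cdf((b_k - b_t - (1.0 - t) * theta) / sqrt(1.0 - t))

b_k = 1.96                                  # hypothetical final critical value
theta = b_k + STD_NORMAL.inv_cdf(0.80)      # drift giving 80% power (about 2.80)
t = 0.5
for b_t in (t * theta, 0.5, 0.0, -0.5):     # on track, then progressively weaker evidence
    cp = conditional_power(b_t, t, theta, b_k)
    print(f"B({t}) = {b_t:5.2f}: conditional power = {cp:.3f}")
```

At t = 0 the function returns the design power (0.80 here); as B(t) falls below its expected value tθ ≈ 1.40, conditional power drops.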

10.17.1 Example: Traumatic Hemorrhagic Shock (THS), Conditional Power

• Planned sample size is 850.
• Assumed mortality rates are 40% and 30% in the saline and DCLHb arms, so
$$\theta = \sqrt{850 \times 0.35 \times 0.65/4}\;\log\frac{40 \times 70}{60 \times 30} = 3.07.$$
• Given the interim monitoring boundaries, $b_K = 2.044$.
  – Note that for 85% power, $Z_{1-\beta} = 1.036$, and $2.044 + 1.036 = 3.080 \approx \theta$.

The two figures below show the planned interim monitoring boundaries for the THS trial that I discussed earlier. The left-hand panel shows the bounds on the Z-scale; the right-hand panel shows the bounds on the B-scale. The red dots show the observed values of Z and B at the time that the trial was stopped. The right-hand panel also shows the unconditional and conditional expected values of B(t).

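[Figure: left panel, Z-score for all-cause mortality against information fraction (0.10 to 1.00), with the planned boundaries and the point at which the trial stopped; right panel, the same converted to the B(t) scale, $B(t) = Z(t)\sqrt{t}$, showing $\theta$, $E[B(t) \mid \theta] = \theta t$ and $E[B(t) \mid B(0.132), \theta] = B(0.132) + \theta(t - 0.132)$.]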

• At the current analysis,
  – $t = 112/850 = 0.132$,
  – $Z(t) = -2.427$,
  – $B(t) = -2.427\sqrt{t} = -0.881$.

Conditional power is
$$1 - \Phi\left(\frac{2.044 - (-0.881) - (1 - 0.132) \times 3.07}{\sqrt{1 - 0.132}}\right) = 0.39,$$
and
$$E[B(1) \mid B(t), \theta] = -0.881 + (1 - 0.132) \times 3.07 = 1.786.$$
The expected value of B(1) is only a little below $b_K$, assuming that the hypothesized θ is correct.
• Conditional power is 39% for the assumed treatment difference.
• The assumed treatment difference is inconsistent with the observed treatment difference.
• We may want to compute conditional power for plausible alternatives, given the observed data.
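The THS arithmetic can be reproduced as follows. This minimal sketch assumes equal allocation (ξ1ξ2 = 1/4, as in the /4 above) and keeps θ unrounded, which is why E[B(1)] comes out at 1.786 rather than the 1.784 implied by the rounded θ = 3.07. The alternatives θ/2 and 0 in the final loop are illustrative choices, not values from the notes.

```python
from math import log, sqrt
from statistics import NormalDist

def conditional_power(b_t, t, theta, b_k):
    """Pr{B(1) >= b_k | B(t) = b_t, theta}; B(1) - B(t) ~ N((1 - t) theta, 1 - t)."""
    return 1 - NormalDist().cdf((b_k - b_t - (1 - t) * theta) / sqrt(1 - t))

# Design assumptions from the THS example: 850 patients, 1:1 allocation,
# 40% (saline) vs 30% (DCLHb) mortality, final critical value b_K = 2.044.
n, p_saline, p_dclhb, b_k = 850, 0.40, 0.30, 2.044
p_bar = (p_saline + p_dclhb) / 2
odds_ratio = (p_saline / (1 - p_saline)) / (p_dclhb / (1 - p_dclhb))
theta = sqrt(n * p_bar * (1 - p_bar) / 4) * log(odds_ratio)    # about 3.07

# Interim data when the trial stopped.
t = 112 / 850
z_t = -2.427
b_t = z_t * sqrt(t)                                             # about -0.881

cp = conditional_power(b_t, t, theta, b_k)                      # about 0.39
expected_b1 = b_t + (1 - t) * theta                             # about 1.786
print(f"theta = {theta:.3f}, B(t) = {b_t:.3f}, CP = {cp:.3f}, E[B(1)] = {expected_b1:.3f}")

# Conditional power under smaller alternatives (illustrative choices).
for label, th in (("theta", theta), ("theta/2", theta / 2), ("0", 0.0)):
    print(f"{label:>7}: CP = {conditional_power(b_t, t, th, b_k):.4f}")
```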


“Predictive power” is the expected conditional power given the observed data and a prior distribution for the true treatment difference.
• Predictive power uses Bayes’ Theorem, and hence is a mixed frequentist/Bayesian procedure:
$$\text{Predictive power} = \int P_C(\theta)\,\pi(\theta \mid B(t))\,d\theta,$$
where
  – $P_C(\theta)$ is the conditional power;
  – $\pi(\theta \mid B(t))$ is the conditional density of $\theta$ given $B(t)$.

For the THS trial, predictive power was 0.045%, and the trial was formally stopped on that basis. The red lines on the figure below represent random draws from the conditional (posterior) distribution of $\theta$. Because the current estimate $\hat\theta$ is negative, the posterior mean is negative, and most samples from this distribution will also be negative. Hence, the conditional probability of demonstrating benefit is extremely small.

[Figure: $B(t) = Z(t)\sqrt{t}$ against information fraction (0.10 to 1.00) for the THS trial, with random draws from $\pi[\theta \mid B(0.132)]$ shown as red lines.]
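The predictive-power integral can be evaluated numerically once a prior is chosen. The notes do not say which prior produced the 0.045% figure, so this sketch assumes a flat prior, under which $\theta \mid B(t) \sim N(B(t)/t,\, 1/t)$; it is not expected to reproduce 0.045% exactly. It averages conditional power over that posterior with Simpson's rule and checks the result against the closed form the flat prior allows, $B(1) \mid B(t) \sim N(B(t)/t,\, (1-t)/t)$. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_power(b_t, t, theta, b_k):
    return 1 - STD_NORMAL.cdf((b_k - b_t - (1 - t) * theta) / sqrt(1 - t))

def predictive_power(b_t, t, b_k, n_grid=4001, width=10.0):
    """Integrate P_C(theta) against the flat-prior posterior N(B(t)/t, 1/t) (Simpson's rule)."""
    post = NormalDist(b_t / t, 1 / sqrt(t))
    lo = post.mean - width * post.stdev
    h = 2 * width * post.stdev / (n_grid - 1)          # n_grid must be odd
    total = 0.0
    for i in range(n_grid):
        th = lo + i * h
        w = 1 if i in (0, n_grid - 1) else (4 if i % 2 else 2)
        total += w * conditional_power(b_t, t, th, b_k) * post.pdf(th)
    return total * h / 3

def predictive_power_closed_form(b_t, t, b_k):
    """Flat prior: marginally, B(1) | B(t) ~ N(B(t)/t, (1 - t)/t)."""
    return 1 - STD_NORMAL.cdf((b_k - b_t / t) / sqrt((1 - t) / t))

# THS interim data: t = 112/850, Z(t) = -2.427, b_K = 2.044.
t = 112 / 850
b_t = -2.427 * sqrt(t)
print(predictive_power(b_t, t, 2.044), predictive_power_closed_form(b_t, t, 2.044))
```

Under this prior the predictive power is also far below 1%, consistent with the conclusion that demonstrating benefit was very unlikely.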

10.18 Inference Following Sequential Testing

• If $Z_k = U_k/\sqrt{I_k}$, then unconditionally, $E[Z_k] = \sqrt{I_k}\,\mu$.
• When sequential testing is used and we stop at analysis $k^*$, the sampling distribution of $Z_{k^*}$ is not $N(\sqrt{I_{k^*}}\,\mu, 1)$, and $E[Z_{k^*}] \ne \sqrt{I_{k^*}}\,\mu$.
• This introduces bias in
  – point estimates of treatment differences;
  – p-values.
• If unbiased estimates and valid p-values are desired, then adjustments to nominal values are required.

Note that because (unconditionally) $E[Z_k] \approx \sqrt{I_k}\,\mu$, we could estimate $\mu$ by the “one-step” estimate
$$\hat\mu = \frac{Z_k}{\sqrt{I_k}} = \frac{B_k}{\sqrt{t_k}\sqrt{t_k I_K}} = \frac{B_k}{t_k\sqrt{I_K}}.$$
Also, because $E[Z_K] = \theta$ and $E[B_k] = \theta t_k$, we have $\hat\theta = B_k/t_k$ and
$$\hat\mu = \frac{\hat\theta}{\sqrt{I_K}},$$
so if we know $I_K$ (the expected full information), then estimating $\theta$ is equivalent to estimating $\mu$. We’ll consider estimation of $\theta$.

10.18.1 Sub-densities when $\theta > 0$

The two figures below show the sub-densities at four analysis times for B(t) and Z(t).

[Figure: sub-densities of $B(t)$ (left, with $E[B(t)] = \theta t$) and of $Z(t) = B(t)/\sqrt{t}$ (right, with $E[Z(t)] = \theta\sqrt{t}$) at information fractions 0.25, 0.50, 0.75 and 1.00, for $\theta = 1.62$.]

The left-hand figure below shows the sub-densities for $\hat\theta$ at each analysis time, where $\hat\theta = Z/\sqrt{t} = B/t$ ($\theta$ is proportional to $\mu$). The right-hand figure simply adds these up horizontally to obtain the sampling distribution of $\hat\theta$. The dashed line shows the density of $\hat\theta$ if there were no early stopping. The mass that is removed from below this line is shifted to the right, so the sampling distribution of $\hat\theta$ is pushed to the right, inducing bias in the estimation of $\theta$.

[Figure: left, sub-densities of $\hat\theta(t) = Z(t)/\sqrt{t} = B(t)/t$ at information fractions 0.25 to 1.00 for $\theta = 1.62$, with $E[\hat\theta(t)] = \theta$; right, their sum (the sampling distribution of $\hat\theta$), with the density $f(\hat\theta)$ under no early stopping shown dashed.]

10.18.2 Bias in $\hat\theta$

The figure below shows the bias, $E[\hat\theta] - \theta$, as a function of the true value of $\theta$. For small $\theta$, the probability of early stopping is small, so the mass that is shifted to the right in the previous figure is small. As the true $\theta$ grows, the bias grows until a maximum is reached near $\theta = 4$. For large enough $\theta$, we will stop at the first interim analysis with high probability, and the bias returns to zero.

[Figure: bias $E[\hat\theta] - \theta$ (0 to 0.30) against the true $\theta$ (0 to 5).]
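The shape of the bias curve can be explored by Monte Carlo. The boundaries behind the figure are not stated, so this sketch assumes the one-sided power-family bounds (2.9552, 2.5593, 2.3008, 2.0919 at t = 0.25, 0.50, 0.75, 1.00) from the `summary(b2)` output in Section 10.19, with stopping at the upper bound only; the simulated values therefore need not match the figure. `simulate_theta_hat` and `bias` are hypothetical names.

```python
import math
import random

TIMES = [0.25, 0.50, 0.75, 1.00]
BOUNDS = [2.9552, 2.5593, 2.3008, 2.0919]   # assumed: power-family bounds from summary(b2)

def simulate_theta_hat(theta, n_trials=20000, seed=2021):
    """Return theta_hat = B(t)/t at the stopping analysis for simulated trials.

    A trial stops at the first analysis with Z(t) >= bound (upper bound only),
    otherwise at the final analysis.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        b, t_prev = 0.0, 0.0
        for k, t in enumerate(TIMES):
            dt = t - t_prev
            b += rng.gauss(theta * dt, math.sqrt(dt))
            t_prev = t
            if b / math.sqrt(t) >= BOUNDS[k] or k == len(TIMES) - 1:
                estimates.append(b / t)
                break
    return estimates

def bias(theta, **kwargs):
    """Monte Carlo estimate of E[theta_hat] - theta."""
    est = simulate_theta_hat(theta, **kwargs)
    return sum(est) / len(est) - theta

for theta in (0.0, 1.0, 1.62, 3.0, 4.0, 5.0):
    print(f"theta = {theta:4.2f}: bias ~ {bias(theta):+.3f}")
```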

Simple approach to bias in estimation: let
$$b(\theta) = E[\hat\theta] - \theta = \text{bias},$$
so $\hat\theta - b(\theta)$ is unbiased if $\theta$ is known (but then we wouldn’t need to estimate it!). So let $\hat\theta_C$ be the solution to
$$\hat\theta_C = \hat\theta - b(\hat\theta_C).$$
Note that the solution can be found iteratively by letting $\hat\theta_0 = \hat\theta$ and $\hat\theta_j = \hat\theta - b(\hat\theta_{j-1})$ for $j = 1, 2, \ldots$ until $\hat\theta_j$ converges to $\hat\theta_C$.

Caveats:
• bias is not of primary interest (MSE is more important);
• $\hat\theta_C$ may have larger variance and larger MSE;
• it depends on future looks (“what would have happened”);
• given the many other sources of bias (does $\theta$ even exist?), the observed $\hat\theta$ is probably (IMO) “good enough”.
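The fixed-point iteration can be sketched with a simulated bias function. To make b(θ) a deterministic function of θ, so that the iteration is well defined, the sketch reuses one fixed set of standard normal draws for every θ (common random numbers). The boundaries are assumed to be the power-family bounds from Section 10.19 with stopping at the upper bound only, and the observed value θ̂ = 3.0 is hypothetical.

```python
import math
import random

TIMES = [0.25, 0.50, 0.75, 1.00]
BOUNDS = [2.9552, 2.5593, 2.3008, 2.0919]   # assumed: power-family bounds from summary(b2)

# Common random numbers: one fixed set of N(0, 1) draws reused for every theta,
# so that bias(theta) is a deterministic function of theta.
_rng = random.Random(641)
DRAWS = [[_rng.gauss(0.0, 1.0) for _ in TIMES] for _ in range(20000)]

def bias(theta):
    """Monte Carlo b(theta) = E[theta_hat] - theta, stopping at the upper bound only."""
    total = 0.0
    for z in DRAWS:
        b, t_prev = 0.0, 0.0
        for k, t in enumerate(TIMES):
            dt = t - t_prev
            b += theta * dt + math.sqrt(dt) * z[k]
            t_prev = t
            if b / math.sqrt(t) >= BOUNDS[k] or k == len(TIMES) - 1:
                total += b / t
                break
    return total / len(DRAWS) - theta

def bias_corrected(theta_hat, tol=1e-3, max_iter=50):
    """Iterate theta_j = theta_hat - b(theta_{j-1}), starting at theta_0 = theta_hat."""
    current = theta_hat
    for _ in range(max_iter):
        updated = theta_hat - bias(current)
        if abs(updated - current) < tol:
            return updated
        current = updated
    raise RuntimeError("fixed-point iteration did not converge")

theta_hat = 3.0                                 # hypothetical observed estimate
theta_c = bias_corrected(theta_hat)
print(f"theta_hat = {theta_hat:.3f}, bias-corrected theta_C = {theta_c:.3f}")
```

The iteration converges quickly here because b(θ) changes slowly with θ; the corrected value also inherits the caveats listed above, including dependence on the assumed future looks.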

10.19 Adjusted p-values

• When a trial completes, it is common practice to report a final p-value for the overall result.
• The p-value is used to
  – determine the overall “success” of the trial (is the result “statistically significant”?);
  – assess the overall strength of evidence in the trial against the null hypothesis.
• The p-value is the probability of a result as or more extreme than the one observed, given that $H_0$ is true.
• In fixed-sample trials (no early stopping), there is a direct relationship between the “strength of evidence” (as quantified by, e.g., the Z-statistic) and the p-value:
$$p = 2(1 - \Phi(|Z|)) \quad \text{(two-sided)}.$$
• However, under sequential sampling, the nominal p-value does not have a uniform distribution under $H_0$.
• If $k^*$ is the stopping stage for a group sequential procedure, then under $H_0$,
$$\Pr\{Z(t_{k^*}) > z\} > 1 - \Phi(z),$$
i.e., the tail probability assuming a standard normal distribution is too small.
• Also, when early stopping is possible, the direct link between “evidence” and p-values breaks down.
• When sequential testing is used, how do we define “more extreme”?

In the figure below, the Z-score for the green trajectory is larger at t = 0.8, the early stopping time, than the Z-score for the red trajectory. On the other hand, the red trajectory continues until the end of the trial, where the final Z-score is larger than the largest Z from the green trajectory. Because the outcomes are multi-dimensional with differing stopping times, it is unclear how to define a coherent ordering.

[Figure: two Z(t) trajectories against t (0.2 to 1.0); the green trajectory stops early at t = 0.8, and the red trajectory continues to the end of the trial.]
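The claim that $\Pr\{Z(t_{k^*}) > z\} > 1 - \Phi(z)$ under $H_0$ is easy to check by simulation. This minimal sketch assumes the one-sided power-family bounds from Section 10.19 (2.9552, 2.5593, 2.3008, 2.0919) with no lower bound; it records Z at the stopping analysis for trials simulated under $H_0$ and compares the empirical tail with the normal tail. At z = 2.0919 (the final bound) the empirical tail is the overall type I error, 0.025, well above $1 - \Phi(2.0919) \approx 0.018$; at z = 2.75 it estimates $\Pr_{H_0}\{Z(t_{k^*}) > 2.75\}$, the quantity the LR p-value targets.

```python
import math
import random
from statistics import NormalDist

TIMES = [0.25, 0.50, 0.75, 1.00]
BOUNDS = [2.9552, 2.5593, 2.3008, 2.0919]   # assumed: power-family bounds from summary(b2)

def z_at_stopping(rng):
    """Simulate one trial under H0 and return Z at the analysis where it stops."""
    b, t_prev = 0.0, 0.0
    for k, t in enumerate(TIMES):
        b += rng.gauss(0.0, math.sqrt(t - t_prev))
        t_prev = t
        z = b / math.sqrt(t)
        if z >= BOUNDS[k]:
            break
    return z

rng = random.Random(12345)
z_stop = [z_at_stopping(rng) for _ in range(100_000)]
for z in (2.0919, 2.75):
    empirical = sum(zs > z for zs in z_stop) / len(z_stop)
    normal_tail = 1 - NormalDist().cdf(z)
    print(f"z = {z}: Pr{{Z(t_k*) > z}} ~ {empirical:.4f} vs 1 - Phi(z) = {normal_tail:.4f}")
```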

• To determine the adjusted p-value, consider all possible realizations of a trial under $H_0$.
• After stopping, we observe $(k, B(t_k))$.
• Given two different realizations $(k, B(t_k))$ and $(k', B(t_{k'}))$, we need to determine which is “more extreme”.

Definitions of “more extreme”, “≻” (from Jennison and Turnbull, 2000), where U is the score function at the stopping time (proportional to B):
• Stage-wise ordering (SW). Using the SW ordering, $(k', U') \succ (k, U)$ if one of the following holds:
  1. $k' = k$ and $U' > U$;
  2. $k' < k$.
• MLE ordering. We consider $(k', U') \succ (k, U)$ if $\hat\theta' = U'/I_{k'} > \hat\theta = U/I_k$.
• Likelihood ratio (LR) ordering. $Z = U/\sqrt{I_k}$ is a monotone function of the likelihood ratio, so we consider $(k', U') \succ (k, U)$ if $U'/\sqrt{I_{k'}} > U/\sqrt{I_k}$.
• Score ordering. In this ordering $(k', U') \succ (k, U)$ if $U' > U$.

Assuming one-sided tests/p-values, the first figure below shows a sample trajectory for which the boundary is crossed at the third analysis. The subsequent figures show the tail probabilities (darker colors) used in the calculation of the adjusted p-value.

[Figures: “Observed Data/Boundaries”, a Z(t) trajectory that crosses the boundary at the third analysis with $Z_3 = 2.75$; then “Stagewise Ordering” and “Likelihood-ratio Ordering” panels shading the tail probabilities for $Z_3 = 2.75$.]

In the next two figures, the Z at the stopping time is larger than in the previous figures. The p-value from the stagewise ordering is essentially unaffected, whereas the p-value from the LR ordering decreases substantially.

[Figures: “Stagewise Ordering” and “Likelihood-ratio Ordering” panels for $Z_3 = 3.50$.]
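The four orderings can be written as comparison functions on stopping outcomes (k, U). In this sketch, `more_extreme` and `outcome` are hypothetical helpers, the information levels $I_k = 25k$ (so $I_K = 100$) are made up, and the two outcomes only mimic the early-stop versus run-to-completion situation described for the green and red trajectories, with invented Z-values; the score is recovered as $U = Z\sqrt{I_k}$.

```python
import math

def more_extreme(a, b, ordering, info):
    """True if stopping outcome a = (k, U) is more extreme than outcome b (upper tail).

    `info[k]` is the information I_k at analysis k (1-based); U is the score.
    """
    (ka, ua), (kb, ub) = a, b
    if ordering == "stagewise":
        return ka < kb or (ka == kb and ua > ub)
    if ordering == "mle":                 # theta_hat = U / I_k
        return ua / info[ka] > ub / info[kb]
    if ordering == "lr":                  # Z = U / sqrt(I_k)
        return ua / math.sqrt(info[ka]) > ub / math.sqrt(info[kb])
    if ordering == "score":
        return ua > ub
    raise ValueError(f"unknown ordering: {ordering}")

info = {k: 25.0 * k for k in range(1, 5)}   # hypothetical: I_K = 100, four equally spaced looks

def outcome(k, z):
    """Express a stopping stage and Z-value as (k, U), using U = Z * sqrt(I_k)."""
    return (k, z * math.sqrt(info[k]))

early = outcome(2, 2.8)     # hypothetical: stops at analysis 2 with Z = 2.8
late = outcome(4, 2.9)      # hypothetical: runs to analysis 4, final Z = 2.9
for name in ("stagewise", "mle", "lr", "score"):
    print(f"{name:>9}: early stop more extreme? {more_extreme(early, late, name, info)}")
```

With these numbers the SW and MLE orderings rank the early stop as more extreme, while the LR and score orderings rank the completed trial as more extreme, which is why the adjusted p-value depends on the ordering chosen.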

Use the power family, one-sided: $g(t) = 0.025\,t^2$.

    > b2 <- ...
    > summary(b2)
    ...
    Type: One-Sided Bounds
    alpha: 0.025
    Spending function: Power Family: alpha * t^phi
    Boundaries:
      Time  Upper  Exit pr.  Diff. pr.
    1 0.25 2.9552 0.0015625 0.0015625
    2 0.50 2.5593 0.0062500 0.0046875
    3 0.75 2.3008 0.0140625 0.0078125
    4 1.00 2.0919 0.0250000 0.0109375
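The power-family bounds in this summary can be reproduced approximately by recursive numerical integration of the B-value sub-densities, solving at each look for the bound whose exit probability equals $\alpha t_k^{\phi} - \alpha t_{k-1}^{\phi}$. This is a minimal pure-Python sketch, not the algorithm used by the R functions above: it uses Simpson's rule on a fixed grid truncated at −8 standard deviations, handles one-sided upper bounds under $H_0$ only, and all helper names are hypothetical.

```python
import math

def upper_tail(x):
    """1 - Phi(x), computed with erfc for accuracy in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def dnorm(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(lo, hi, n=400):
    """Nodes and Simpson's-rule weights on [lo, hi]; n (number of intervals) must be even."""
    h = (hi - lo) / n
    nodes = [lo + i * h for i in range(n + 1)]
    weights = [h / 3 * (1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]
    return nodes, weights

def exit_prob(prev, t_prev, t, z):
    """Pr{no earlier crossing and Z(t) > z} under H0, given the previous sub-density."""
    if prev is None:
        return upper_tail(z)
    sd = math.sqrt(t - t_prev)
    u = z * math.sqrt(t)                                    # bound on the B-scale
    return sum(w * f * upper_tail((u - b) / sd) for b, w, f in zip(*prev))

def continuation_density(prev, t_prev, t, z):
    """Sub-density of B(t) given no crossing so far, on a grid from -8 SD up to z*sqrt(t)."""
    nodes, weights = simpson(-8.0 * math.sqrt(t), z * math.sqrt(t))
    if prev is None:
        dens = [dnorm(b / math.sqrt(t)) / math.sqrt(t) for b in nodes]
    else:
        sd = math.sqrt(t - t_prev)
        dens = [sum(w * f * dnorm((b - c) / sd) for c, w, f in zip(*prev)) / sd
                for b in nodes]
    return nodes, weights, dens

def power_family_bounds(times, alpha=0.025, phi=2.0):
    """One-sided bounds that spend alpha * t**phi cumulatively (H0, no lower bound)."""
    bounds, prev, t_prev = [], None, 0.0
    for t in times:
        target = alpha * t ** phi - alpha * t_prev ** phi  # alpha spent at this look
        lo, hi = 0.0, 8.0
        for _ in range(60):                                 # bisection; exit_prob falls as z rises
            mid = 0.5 * (lo + hi)
            if exit_prob(prev, t_prev, t, mid) > target:
                lo = mid
            else:
                hi = mid
        bounds.append(0.5 * (lo + hi))
        prev = continuation_density(prev, t_prev, t, bounds[-1])
        t_prev = t
    return bounds

times = [0.25, 0.50, 0.75, 1.00]
print([round(z, 4) for z in power_family_bounds(times)])
```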

Suppose we observe $Z_3 = 2.75$. First compute the nominal p-value. The SW p-value can be computed using a boundary with 3 analyses, the first two of which use the boundary values from b2, and the third of which is the observed Z = 2.75:

    > pnorm(2.75, lower=F)
    [1] 0.002979763
    ## Stagewise p-value:
    > summary(ldPower(zb=c(b2$upper[1:2],2.75), za=rep(-10,3),
    + 1:3/4,drift=0))
    ...
    Drift parameters: 0
      Time  Lower probs Upper probs  Exit pr. Cum exit pr.
    1 0.25   7.6199e-24   0.0015625 0.0015625    0.0015625
    2 0.50   7.6196e-24   0.0046875 0.0046875    0.0062500
    3 0.75   7.6096e-24   0.0015989 0.0015989    0.0078488

The stagewise p-value is 0.00785 (compare to 0.00298). For the LR ordering, we need exit probabilities at stages 1-4:

    > lr1 <- ...
    > lr2 <- ...
    > lr3 <- ...
    > lr4 <- ...
    > summary(lr1)
    ...
    Boundaries:
      Time Lower  Upper
    1 0.25   -10 2.9552
    Drift parameters: 0
      Time  Lower probs Upper probs  Exit pr. Cum exit pr.
    1 0.25   7.6199e-24   0.0015625 0.0015625    0.0015625

    > summary(lr2)
    Lan-DeMets method for group sequential boundaries
    n = 2
    Boundaries:
      Time Lower  Upper
    1 0.25   -10 2.9552
    2 0.50   -10 2.7500
    Drift parameters: 0
      Time  Lower probs Upper probs  Exit pr. Cum exit pr.
    1 0.25   7.6199e-24   0.0015625 0.0015625    0.0015625
    2 0.50   7.6196e-24   0.0025661 0.0025661    0.0041286
    > summary(lr3)
    Boundaries:
      Time Lower  Upper
    1 0.25   -10 2.9552
    2 0.50   -10 2.5593
    3 0.75   -10 2.7500
    Drift parameters: 0
      Time  Lower probs Upper probs  Exit pr. Cum exit pr.
    1 0.25   7.6199e-24   0.0015625 0.0015625    0.0015625
    2 0.50   7.6196e-24   0.0046875 0.0046875    0.0062500
    3 0.75   7.6096e-24   0.0015989 0.0015989    0.0078488
    > summary(lr4)
    ...
    Boundaries:
      Time Lower  Upper
    1 0.25   -10 2.9552
    2 0.50   -10 2.5593
    3 0.75   -10 2.3008
    4 1.00   -10 2.7500
    Drift parameters: 0
      Time  Lower probs Upper probs    Exit pr. Cum exit pr.
    1 0.25   7.6199e-24  0.00156250  0.00156250    0.0015625
    2 0.50   7.6196e-24  0.00468746  0.00468746    0.0062500
    3 0.75   7.6096e-24  0.00781248  0.00781248    0.0140624
    4 1.00   7.5676e-24  0.00075286  0.00075286    0.0148153

Get the exit probabilities:

    > lr1$exit
    [1] 0.0015625
    > lr2$exit
    [1] 0.001562500 0.002566141
    > lr3$exit
    [1] 0.001562500 0.004687464 0.001598869
    > lr4$exit
    [1] 0.0015625000 0.0046874638 0.0078124758 0.0007528551
    ## Use final exit probability in each set:
    > lr1$exit[1] + lr2$exit[2] + lr3$exit[3] + lr4$exit[4]
    [1] 0.006480365

The likelihood-ratio p-value is 0.00648 (compare to 0.00298 and 0.00785). The LR p-value is larger than the nominal p-value, but smaller than the SW p-value.

Suppose we observe $Z_3 = 3.5$:

    > pnorm(3.5, lower=F)
    [1] 0.0002326291
    > lr1 <- ...
    > lr2 <- ...
    > lr3 <- ...
    > lr4 <- ...
    > lr1$exit[1] + lr2$exit[2] + lr3$exit[3] + lr4$exit[4]
    [1] 0.0004408144
    #### Stagewise p-value:
    > summary(lr3)
    ...
      Time  Lower probs Upper probs     Exit pr. Cum exit pr.
    1 0.25   7.6199e-24 0.001562500  0.001562500    0.0015625
    2 0.50   7.6196e-24 0.004687464  0.004687464    0.0062500
    3 0.75   7.6096e-24 0.000046147  0.000046147    0.0062961
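The nominal, stagewise and LR p-values from these R runs can be cross-checked with recursive numerical integration under $H_0$. In this sketch (hypothetical helper names, Simpson's rule on a grid truncated at −8 standard deviations, one-sided upper bounds only), the SW p-value adds the exit probabilities of all earlier stages to the probability of reaching stage $k$ and exceeding the observed Z there; the LR p-value adds, over every stage $j$, the probability of stopping at $j$ with $Z_j > \max(c_j, z_{\text{obs}})$, which mirrors the `lr1` to `lr4` calls.

```python
import math

def upper_tail(x):
    """1 - Phi(x), computed with erfc for accuracy in the far tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def dnorm(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(lo, hi, n=400):
    """Nodes and Simpson's-rule weights on [lo, hi]; n must be even."""
    h = (hi - lo) / n
    nodes = [lo + i * h for i in range(n + 1)]
    weights = [h / 3 * (1 if i in (0, n) else 4 if i % 2 else 2) for i in range(n + 1)]
    return nodes, weights

def exit_at_last(times, bounds):
    """Pr{first upper crossing happens at the last listed analysis} under H0 (no lower bound)."""
    prev, t_prev = None, 0.0
    for t, z in zip(times[:-1], bounds[:-1]):
        # Sub-density of B(t) restricted to B(t) < z*sqrt(t), on a grid from -8 SD.
        nodes, weights = simpson(-8.0 * math.sqrt(t), z * math.sqrt(t))
        if prev is None:
            dens = [dnorm(b / math.sqrt(t)) / math.sqrt(t) for b in nodes]
        else:
            sd = math.sqrt(t - t_prev)
            dens = [sum(w * f * dnorm((b - c) / sd) for c, w, f in zip(*prev)) / sd
                    for b in nodes]
        prev, t_prev = (nodes, weights, dens), t
    t, z = times[-1], bounds[-1]
    if prev is None:
        return upper_tail(z)
    sd = math.sqrt(t - t_prev)
    return sum(w * f * upper_tail((z * math.sqrt(t) - b) / sd) for b, w, f in zip(*prev))

def stagewise_p(times, bounds, k_stop, z_obs):
    """SW ordering: every earlier stop, plus exceeding z_obs at stopping stage k_stop (1-based)."""
    earlier = sum(exit_at_last(times[:j], bounds[:j]) for j in range(1, k_stop))
    return earlier + exit_at_last(times[:k_stop], list(bounds[:k_stop - 1]) + [z_obs])

def lr_p(times, bounds, z_obs):
    """LR ordering: stopping at any stage j with Z_j > max(c_j, z_obs)."""
    return sum(exit_at_last(times[:j], list(bounds[:j - 1]) + [max(bounds[j - 1], z_obs)])
               for j in range(1, len(times) + 1))

TIMES = [0.25, 0.50, 0.75, 1.00]
BOUNDS = [2.9552, 2.5593, 2.3008, 2.0919]     # from summary(b2)
for z_obs in (2.75, 3.5):
    print(f"Z3 = {z_obs}: nominal {upper_tail(z_obs):.7f}, "
          f"SW {stagewise_p(TIMES, BOUNDS, 3, z_obs):.7f}, "
          f"LR {lr_p(TIMES, BOUNDS, z_obs):.7f}")
```

Each call rebuilds the sub-densities from scratch, which keeps the code short at the cost of repeated work; with this grid the results should agree with the R output to roughly four significant digits.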

When the “overshoot” is large, the SW p-value is relatively large compared to the nominal p-value because of the “failure to stop” penalty: the SW p-value cannot be smaller than the cumulative stopping probability for the previous stages. In this example, the cumulative stopping probability through stage 2 is 0.00625, whereas the SW p-value is 0.006296, so it is clearly dominated by this probability. On the other hand, the LR p-value does not suffer from this problem: the tail probabilities include only the parts of the tail above the observed Z. We do also need to include the tail probabilities for future analyses (that never happen), but if $Z_{k^*}$ is large, these will be small. We also have to know the number and timing of future analyses, although the LR p-value is relatively robust to the future.

MERIT-HF
• final nominal p-value is 0.00009
• adjusted p-value in main manuscript is 0.0062 (SW)
• smallest possible SW p-value given previous looks is
$$\alpha_L(t_1) + \alpha_U(t_1) = 0.02 \times \tfrac{1}{4} + 0.0048 \times \tfrac{1}{4} = 0.0062,$$
68 times the nominal p-value!
• Adjusted p-value essentially ignores final Z completely!
• LR adjusted p-value is 0....

