
Chapter 6: Total Probability and Bayes' Theorem

6.1 Law of Total Probability

We saw in Example 5.8 that conditional probabilities can be used to compute the "total" probability of an event. To state this more formally, we need the idea of a partition, which we illustrate first with another example.

Example 6.1: Tossing thrice (revisited)
A coin is tossed three times and the sequence of Heads/Tails is recorded, just as in Example 1.2 (and Exercise 5.7). Consider the three events:

E1: "The first toss is a Head";
E2: "The first toss is a Tail and the second toss is a Head";
E3: "The first and second tosses are Tails".

State these events in set notation and consider how they relate to the sample space.

Solution: With the same (obvious) notation as in Example 1.2, each outcome is a list of Heads (h) and Tails (t) in the order in which they are seen. We have

E1 = {hhh, hht, hth, htt},   E2 = {thh, tht},   E3 = {tth, ttt}.

Notice that every outcome in the sample space appears in exactly one of these sets. The three events are pairwise disjoint (Ei ∩ Ej = ∅ for i ≠ j) and S = E1 ∪ E2 ∪ E3. Loosely speaking, the three events "split" the sample space into three parts; more formally we say that E1, E2, and E3 partition the sample space.
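As a quick computational check (not part of the original notes), the following Python sketch enumerates the eight outcomes and verifies that E1, E2, and E3 are pairwise disjoint and together cover the sample space:

```python
from itertools import product

# Sample space for three tosses: strings like 'hht', as in Example 1.2.
S = {''.join(seq) for seq in product('ht', repeat=3)}

E1 = {s for s in S if s[0] == 'h'}    # first toss a Head
E2 = {s for s in S if s[:2] == 'th'}  # Tail then Head
E3 = {s for s in S if s[:2] == 'tt'}  # first two tosses Tails

# Pairwise disjoint and covering S: exactly the definition of a partition.
assert not (E1 & E2) and not (E1 & E3) and not (E2 & E3)
assert E1 | E2 | E3 == S
print(sorted(E1))  # ['hhh', 'hht', 'hth', 'htt']
```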

Definition 6.1. The events E1, E2, ..., En partition S if they are pairwise disjoint (i.e., Ek ∩ Eℓ = ∅ if k ≠ ℓ) and E1 ∪ E2 ∪ ··· ∪ En = S. We can also say that the set {E1, E2, ..., En} is a partition of S.


Remarks:
• Some books explicitly require E1, E2, ..., En to be non-empty sets; we will not insist on that here, although in practice it will usually be true.
• Understanding the definition of a partition is important in seeing how to calculate the (total) probability of an event A from the conditional probabilities P(A|Ek) (i.e., the probabilities under certain constraints) and the so-called marginal probabilities P(Ek).

Theorem 6.2 (Law of total probability). Suppose that E1, E2, ..., En partition S with P(Ek) > 0 for k = 1, 2, ..., n. Then for any event A we have

P(A) = P(A|E1)P(E1) + P(A|E2)P(E2) + ··· + P(A|En)P(En) = Σ_{k=1}^n P(A|Ek)P(Ek).
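Read as code, the theorem is simply a weighted average of the conditional probabilities, with the partition probabilities as weights. A minimal sketch (the function name is my own; the biased coin's P(Head) = 3/4 is an assumption drawn from how the magician's coin is used later, in Example 6.3):

```python
def total_probability(marginals, conditionals):
    """Law of total probability: P(A) = sum over k of P(A|E_k) * P(E_k).

    marginals    -- P(E_k) for each block E_k of the partition
    conditionals -- P(A|E_k), in the same order
    """
    assert abs(sum(marginals) - 1) < 1e-12, "partition probabilities must sum to 1"
    return sum(c * m for c, m in zip(conditionals, marginals))

# Example 5.8's magician: a fair coin (P(Head) = 1/2) or a biased coin
# (P(Head) = 3/4, assumed from Example 6.3 below), each chosen with probability 1/2.
print(total_probability([1/2, 1/2], [1/2, 3/4]))  # 0.625 = 5/8 = P(H1)
```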

Proof: Let Ak = A ∩ Ek, for k = 1, 2, ..., n. Note that, by Definition 6.1, the sets E1, E2, ..., En are pairwise disjoint and E1 ∪ E2 ∪ ··· ∪ En = S. Since Ak ⊆ Ek the events A1, A2, ..., An are also pairwise disjoint and, furthermore,

A1 ∪ A2 ∪ ··· ∪ An = A ∩ (E1 ∪ E2 ∪ ··· ∪ En) = A ∩ S = A.

Hence, by Definition 2.1(c),

P(A) = P(A1) + P(A2) + ··· + P(An).   (6.1)

Now, since P(Ek) > 0, we also have (for k = 1, 2, ..., n)

P(Ak) = P(A ∩ Ek) = [P(A ∩ Ek)/P(Ek)] × P(Ek) = P(A|Ek)P(Ek).   (6.2)

Substituting (6.2) in (6.1) yields the statement of the theorem.

In fact, we have already seen an example of the use of Theorem 6.2 in Example 5.8: by definition F and F^c partition S. More generally, the approach is very widely applicable, but for different problems one needs to think carefully about what partition to use. The technique is called conditioning.

Exercise 6.2: Mind the gap
In a recent survey [YouGov, 11th–16th June 2020], 1088 adults were asked (amongst other questions) if they thought Watford counted as part of London. The following excerpt from the results shows the number of survey participants in different age categories and the percentage of them saying that they did consider Watford as part of London.

Age                                        18–24   25–49   50–64   65+
Number of participants                       124     544     173   247
Percentage saying Watford is in London        31      34      19    15

What is the probability a randomly-chosen participant thinks Watford is in London?

6.2 Total Probability for Conditional Probabilities

There is an analogue of Theorem 6.2 for conditional probabilities.

Theorem 6.3. If E1, E2, ..., En partition S with P(Ek) > 0 for k = 1, 2, ..., n, then for events A and B with P(B ∩ Ek) > 0 for k = 1, 2, ..., n, we have

P(A|B) = P(A|B ∩ E1)P(E1|B) + P(A|B ∩ E2)P(E2|B) + ··· + P(A|B ∩ En)P(En|B) = Σ_{k=1}^n P(A|B ∩ Ek)P(Ek|B).
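Computationally, Theorem 6.3 has the same weighted-average shape as Theorem 6.2, with every probability now conditioned on B. A hypothetical sketch (the function name is my own; the numbers in the demo are taken from Example 6.3 below, where they are derived):

```python
def total_probability_given(marginals_given_b, conditionals):
    """Theorem 6.3: P(A|B) = sum over k of P(A|B ∩ E_k) * P(E_k|B).

    marginals_given_b -- P(E_k|B) for each block of the partition
    conditionals      -- P(A|B ∩ E_k), in the same order
    """
    return sum(c * m for c, m in zip(conditionals, marginals_given_b))

# Example 6.3(b) below: partition {F, F^c}, B = H1, A = H2, with
# P(F|H1) = 2/5, P(F^c|H1) = 3/5, P(H2|H1 ∩ F) = 1/2, P(H2|H1 ∩ F^c) = 3/4.
print(total_probability_given([2/5, 3/5], [1/2, 3/4]))  # 0.65 = 13/20
```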

Proof: The idea is to use the definition of conditional probability together with the result we proved in the previous section. Specifically, we start from Definition 4.1,

P(A|B) = P(A ∩ B) / P(B),

and apply Theorem 6.2 to P(A ∩ B) to yield

P(A|B) = [P(A ∩ B|E1)P(E1) + P(A ∩ B|E2)P(E2) + ··· + P(A ∩ B|En)P(En)] / P(B).   (6.3)

Now, for k = 1, 2, ..., n, we have

(1/P(B)) × P(A ∩ B|Ek)P(Ek) = (1/P(B)) × [P(A ∩ B ∩ Ek)/P(Ek)] × P(Ek)        [by Definition 4.1]
                            = (1/P(B)) × P(A ∩ B ∩ Ek) × [P(B ∩ Ek)/P(B ∩ Ek)]  [using P(B ∩ Ek) > 0]
                            = [P(A ∩ B ∩ Ek)/P(B ∩ Ek)] × [P(B ∩ Ek)/P(B)]
                            = P(A|B ∩ Ek)P(Ek|B)                                 [by Definition 4.1].   (6.4)

Substituting (6.4) in (6.3) yields the statement of the theorem.


Example 6.3: Magic coins (revisited)
Consider again the set-up of the magician in Example 5.8.
(a) Supposing there is a Head on the first toss, determine the probability that the coin is fair.
(b) Use the result from (a) together with Theorem 6.3 to find the probability of getting a Head on the second toss given there is a Head on the first toss.

Solution:
(a) We already have P(F) = 1/2, P(H1|F) = 1/2, and P(H1) = 5/8 (see Example 5.8). We expect P(F|H1) to be different from P(H1|F); to calculate the former, we start with the definition of conditional probability. Using the results we already know, we have¹

P(F|H1) = P(F ∩ H1) / P(H1)        [by Definition 4.1]
        = P(H1|F)P(F) / P(H1)      [by Theorem 4.2]
        = [(1/2) × (1/2)] / (5/8)
        = 2/5.

(b) Using Theorem 6.3 with the partition {F, F^c} gives

P(H2|H1) = P(H2|H1 ∩ F)P(F|H1) + P(H2|H1 ∩ F^c)P(F^c|H1).   (6.5)

We have P(F|H1) = 2/5 [from (a)] and P(F^c|H1) = 1 − P(F|H1) = 3/5 [see Exercise 4.5]. The other two conditional probabilities on the right-hand side of (6.5) look more complicated at first sight. However, the property of conditional independence discussed in Example 5.8 leads to considerable simplification. Starting once again with the definition of conditional probability, we find

P(H2|H1 ∩ F) = P(H2 ∩ H1 ∩ F) / P(H1 ∩ F)        [by Definition 4.1]
             = P(H2 ∩ H1|F)P(F) / [P(H1|F)P(F)]   [by Theorem 4.2]
             = P(H2 ∩ H1|F) / P(H1|F)
             = P(H2|F)P(H1|F) / P(H1|F)            [by conditional independence of H1 and H2 given F]
             = P(H2|F)
             = 1/2.

¹We'll see this method again in the next section.


Similarly, by the conditional independence of H1 and H2 given F^c, we have

P(H2|H1 ∩ F^c) = P(H2|F^c) = 3/4.

Hence, putting everything together, we conclude

P(H2|H1) = (1/2) × (2/5) + (3/4) × (3/5) = 13/20.
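Because the set-up is fully specified (one fair coin and, assuming as above from Example 5.8, one biased coin with P(Head) = 3/4, with the magician picking between them at random), the value P(H2|H1) = 13/20 can also be sanity-checked by simulation. A rough Monte Carlo sketch, not from the notes:

```python
import random

random.seed(2024)
first_head = both_heads = 0
for _ in range(10**6):
    p_head = 0.5 if random.random() < 0.5 else 0.75  # magician picks a coin at random
    h1 = random.random() < p_head                    # first toss of the chosen coin
    h2 = random.random() < p_head                    # second toss of the same coin
    if h1:
        first_head += 1
        both_heads += h2  # count Heads on toss 2 among trials with a Head on toss 1
print(both_heads / first_head)  # ≈ 0.65 = 13/20
```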

Exercise 6.4: Magic coins (re-revisited)
Show that the answer to Example 6.3(b) is consistent with the analysis in Example 5.8(c).

6.3 Bayes' Theorem

As Example 6.3 reminded us, P(A|B) and P(B|A) are different conditional probabilities. However, as seen in that example, we can determine one from the other if we also know P(A) and P(B). The theorem is attributed to Thomas Bayes (1702–1761), although it was not actually published until after his death [Bay63].

Theorem 6.4 (Bayes' theorem). If A and B are events with P(A), P(B) > 0, then

P(B|A) = P(A|B)P(B) / P(A).

Proof: Starting again from Definition 4.1 (and using that P(A), P(B) > 0) we have

P(B|A) = P(B ∩ A) / P(A)
       = [P(A ∩ B) / P(A)] × [P(B) / P(B)]
       = [P(A ∩ B) / P(B)] × [P(B) / P(A)]
       = P(A|B) × P(B) / P(A).

[Instead of multiplying numerator and denominator by P(B) in the second line, one could use the multiplication rule (Theorem 4.2) for P(A ∩ B).]

Remarks:
• Bayes' theorem has many practical applications.
• We often need to use Theorem 6.2 (law of total probability) to calculate the probability in the denominator of Theorem 6.4.
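The second remark suggests packaging the two results together. A minimal sketch (the function name and argument order are my own) that computes P(B|A) from P(A|B), P(B), and P(A|B^c), expanding the denominator by the law of total probability over the partition {B, B^c}:

```python
def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """P(B|A) = P(A|B) P(B) / P(A), with the denominator P(A) computed by
    the law of total probability over the partition {B, B^c}."""
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)
    return p_a_given_b * p_b / p_a

# Re-deriving Example 6.3(a): B = F, A = H1, so P(F|H1) is computed from
# P(H1|F) = 1/2, P(F) = 1/2, and P(H1|F^c) = 3/4.
print(bayes(1/2, 1/2, 3/4))  # 0.4 = 2/5
```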


Example 6.5: Medical test
Suppose there is a disease which 0.1% of the population suffers from. A test for the disease has a 99% chance of giving a positive result for someone with the disease, and only a 0.5% chance of giving a positive result for someone without the disease (a "false positive"). What is the probability that a randomly-chosen person who tests positive actually has the disease?

Solution: Let us define the events:

D: "The selected person has the disease";
P: "The test for the selected person is positive".

We know

P(D) = 1/1000,   P(P|D) = 99/100,   and   P(P|D^c) = 5/1000.

We want to compute P(D|P) so, using Bayes' theorem (Theorem 6.4), we write

P(D|P) = P(P|D)P(D) / P(P).

To calculate P(P) we can use Theorem 6.2 with the partition {D, D^c}:

P(P) = P(P|D)P(D) + P(P|D^c)P(D^c)
     = P(P|D)P(D) + P(P|D^c)(1 − P(D))     [using Proposition 2.2]
     = (99/100) × (1/1000) + (5/1000) × (1 − 1/1000)
     = (99/100) × (1/1000) + (5/1000) × (999/1000).

Hence, we find

P(D|P) = [(99/100) × (1/1000)] / [(99/100) × (1/1000) + (5/1000) × (999/1000)]
       = 990/5985
       = 22/133
       = 0.1654 (to 4 decimal places).

So assuming the test is positive, there is only about a 17% chance that the person has the disease. In other words, about 83% of positive tests are false positives. Does this mean the test is useless, or is there anything one can do about the problem?
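As a check on the arithmetic, here is the same calculation in a few lines of Python (variable names are my own):

```python
p_d = 1 / 1000                # P(D): prevalence of the disease
p_pos_given_d = 99 / 100      # P(P|D): chance of a positive if diseased
p_pos_given_not_d = 5 / 1000  # P(P|D^c): false-positive rate

# Law of total probability for the denominator P(P).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem for P(D|P).
print(p_pos_given_d * p_d / p_pos)  # 0.16541... = 22/133
```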

Exercise 6.6: *Double medical test
Consider the medical testing scenario of Example 6.5. Supposing a person tests positive in two separate tests, what is the probability that they actually have the disease? State clearly any assumptions you make.

6.4 From Axioms to Applications

The law of total probability and Bayes' theorem are crucially important: not only are they needed for many exam questions, but they also have wide-reaching applications in real life. At this point in the course it is worth pausing to see how far we have come. After introducing the language of sets and events, we started in what may have seemed quite an abstract way with Kolmogorov's axioms (Definition 2.1) specifying the properties that probability should have. These simple axioms and the definition of conditional probability (Definition 4.1) are the basic ingredients which have allowed us to build up to proving the more complex results in the present chapter. This illustrates both the beauty and the power of the axiomatic approach to probability.

We are also finally in a position to revisit a burning question from Chapter 0...

Example 6.7: Innocent or guilty (revisited)
Look back at Exercise 0.3. Based on the evidence there, a prosecution lawyer argues that there is a 1 in 50,000 chance of the suspect being innocent.
(a) Why is such an argument flawed?
(b) Suppose that London has a population of 10 million and the murderer is assumed to be one of these. If there is no evidence against the suspect apart from the fingerprint match then it is reasonable to regard the suspect as a randomly-chosen citizen. Under this assumption, what is the probability the suspect is innocent?
(c) How does the argument change if one knows that the suspect is one of only 100 people who had access to the building at the time Professor Damson was killed?

Solution:
(a) Let us write I for the event "the suspect is innocent" and F for the event "the fingerprints of the suspect match those at the crime scene". The prosecutor notes that P(F|I) = 1/50000 and deduces that P(I|F) = 1/50000. This is nonsense. In general there is no reason why two conditional probabilities P(A|B) and P(B|A) should be equal or even close to equal.

(b) By Bayes' theorem combined with the law of total probability (using the partition {I, I^c}),

P(I|F) = P(F|I)P(I) / P(F) = P(F|I)P(I) / [P(F|I)P(I) + P(F|I^c)P(I^c)].

Now, we are told P(F|I) and it is reasonable to assume that P(F|I^c) = 1 since if the suspect is guilty the fingerprints should certainly match.² The quantity P(I) in Bayes' theorem is the probability that the suspect is innocent in the absence of any evidence at all.

²In fact, you could set P(F|I^c) to be any reasonably large probability without changing the general conclusions.


We are told that the suspect should be regarded as a randomly-chosen citizen from a city of 10 million people, so P(I^c) = 1/10000000 and P(I) = 1 − 1/10000000 = 9999999/10000000. This gives

P(I|F) = [(1/50000) × (9999999/10000000)] / [(1/50000) × (9999999/10000000) + 1 × (1/10000000)]
       = 9999999 / (9999999 + 50000)
       = 0.9950 (to 4 decimal places).

Hence there is about a 99.5% chance that the suspect is innocent.

(c) This new information will decrease our initial value of P(I) which, remember, is the probability that our suspect is innocent before we consider the fingerprint evidence. From the information given it is now reasonable to treat the suspect as randomly chosen from among the 100 people with access to the building. Hence we take P(I^c) = 1/100 and P(I) = 99/100, which gives

P(I|F) = [(1/50000) × (99/100)] / [(1/50000) × (99/100) + 1 × (1/100)]
       = 99 / (99 + 50000)
       = 0.0020 (to 4 decimal places).

Hence there is now only about a 0.2% chance that the suspect is innocent.
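Both numerical answers follow from a single Bayes calculation with different priors. A small sketch (the helper name and default value are my own, under the example's stated assumption P(F|I^c) = 1):

```python
def posterior_innocent(prior_innocent, p_match_given_innocent=1/50_000):
    """P(I|F) by Bayes' theorem, assuming P(F|I^c) = 1 (a guilty suspect's
    fingerprints certainly match)."""
    p_match = p_match_given_innocent * prior_innocent + 1.0 * (1 - prior_innocent)
    return p_match_given_innocent * prior_innocent / p_match

print(posterior_innocent(1 - 1/10_000_000))  # (b) random citizen: ≈ 0.9950
print(posterior_innocent(99/100))            # (c) one of 100 people: ≈ 0.0020
```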

The above example illustrates the so-called "prosecutor's fallacy", which is not just of academic interest – it has been associated with several high-profile miscarriages of justice.

Exercise 6.8: Wrongful conviction
Find and discuss some real-life examples where a suspect has been convicted on the basis of faulty probabilistic arguments.

6.5 Further Exercises

Exercise 6.9: Football team
Two important members of a football team are injured. Suppose that their recoveries before the match are independent events and that each recovers with probability p. If both are able to play then the team has probability 2/3 of winning the match; if only one of them plays then the probability of winning is 5/12; and if neither plays then the probability of winning is 1/6. Show that the condition p > 2/3 guarantees that the match is won with probability greater than 1/2.


Exercise 6.10: General partitioning
Which of the following partition S when A and B are arbitrary events? Justify your answers.
(a) The four events A, A^c, B, B^c;
(b) The two events A, B \ A;
(c) The four events A \ B, B \ A, A ∩ B, (A ∪ B)^c;
(d) The three events A ∩ B, A △ B, A^c ∩ B^c;
(e) The three events A, B, (A ∪ B)^c.

Exercise 6.11: Lost key
Mimi and Rodolfo are looking for a key in the dark. Suppose that the key may be under the table, behind the bookshelf or in the corridor, and has a 1/3 chance of being in each of these places. Mimi searches under the table; if the key is there she has a 3/5 chance of finding it. Rodolfo searches behind the bookshelf; if the key is there he has a 1/5 chance of finding it.
(a) Calculate the probability that the key is found.
(b) Suppose that the key is found. Calculate the conditional probability that it is found by Rodolfo.
(c) Suppose that the key is not found. Calculate the conditional probability that it is in the corridor.
[Answers: 4/15, 1/4, 5/11]

Exercise 6.12: *Are you smarter than a pigeon? (revisited)
Consider the version of the Monty Hall problem presented in Exercise 0.2. Let the cards be numbered 1 to 3 with your initial pick being card 1. Assume that in the case where the ace is card 1, the street performer reveals card 2 with probability p, and card 3 with probability 1 − p. Otherwise, the card (out of 2 and 3) which is not the ace is always revealed.
(a) Assuming that you do not switch your choice, compute the probability of winning, conditioned on the performer showing you card 2. Do the same conditioned on the performer showing you card 3.
(b) Assuming that you do switch once the card has been revealed, compute the probability of winning, conditioned on the performer showing you card 2. Again, do the same conditioned on the performer showing you card 3. Check that regardless of the value of p, and regardless of which card is revealed, deciding to switch is always at least as good as deciding not to switch.
(c) Use the law of total probability to calculate the probability of winning in both cases (a) and (b). Explain briefly why your result makes sense.

