MT2320 Summary

Course: Probability
Institution: Royal Holloway, University of London


PROBABILITY MT232 STEFANIE GERKE

I would very much appreciate being told of any corrections or possible improvements to these notes. You are warmly encouraged to ask questions in lectures, and to talk to me after lectures and in my office hours.

Disclaimer: Much of the material is taken from Martin Widmer's lecture notes, some of which are verbatim from the textbook Probability: An Introduction, G. Grimmett and D. Welsh (Oxford 2000), Library Ref. 518.1 GRI.

Why probability theory? A historical example. Two players A and B are playing a tournament.

• Both players contribute the same amount to the prize pot.
• The first player to win 10 games gets the whole prize pot.
• For each game, the probability that A wins equals the probability that B wins.
• The tournament is interrupted after 15 games.
• A has won 8 games and B has won 7.

Question: How to split the prize pot in a fair way?
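The fair split (Pascal and Fermat's "problem of points") gives each player the pot in proportion to their probability of winning had play continued. The following sketch checks this by brute force: A needs 2 more wins, B needs 3, and playing at most 4 more fair games always decides the tournament.

```python
from itertools import product

# A needs 2 more wins, B needs 3; at most 4 further fair games decide it.
# All 2^4 continuations are equally likely; A takes the pot iff A wins >= 2.
outcomes = list(product("AB", repeat=4))
a_wins = sum(1 for seq in outcomes if seq.count("A") >= 2)

p_a = a_wins / len(outcomes)
print(p_a)  # 0.6875, i.e. A should receive 11/16 of the pot
```

So the fair division is 11 : 5 in A's favour, not 8 : 7 as the interim score might suggest.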

1. EVENTS AND PROBABILITIES

1.1. Experiments with Chance. An experiment is an action whose outcome is not completely determined, and which can be formulated as a mathematical object, called a probability space. A probability space can be described by:

• the set of all possible outcomes,
• a list of (interesting) events that may occur,
• an assessment of the likelihoods of these events.

The study of probability spaces is called probability theory.

Date: Second term 2018/19.


1.2. Outcomes and events. Let

• E be an experiment,
• Ω be the sample space, i.e., the set of all possible outcomes of E,
• ω an elementary event, i.e., an element of Ω.

Any event that may occur as the consequence of our experiment can be described as a set of elementary events ω, so as a subset of the sample space Ω. On the other hand, every subset A of Ω describes a certain event, namely the event that the outcome ω lies in A. However, often the list of events that is interesting to us is much smaller than the collection of all possible subsets of Ω. The important thing here is that we want the collection of "interesting events" to have some algebraic structure. Let us make this precise.

Definition 1.1 (Event space). A collection F of subsets of Ω is called an event space if
(1) F is non-empty,
(2) if A ∈ F then also Aᶜ = Ω \ A ∈ F,
(3) if A1, A2, A3, … ∈ F then also ∪_{i=1}^∞ A_i ∈ F.

The elements of F are called events. The following properties of an event space F will be used extensively.

Lemma 1.2. Suppose F is an event space. Then
(1) ∅, Ω ∈ F,
(2) A, B ∈ F ⇒ A ∩ B ∈ F,
(3) A, B ∈ F ⇒ A \ B ∈ F,
(4) A, B ∈ F ⇒ A Δ B ∈ F,
(5) A1, A2, A3, … ∈ F ⇒ ∩_{i=1}^∞ A_i ∈ F.

Proof.
(1) Let A ∈ F; such an A exists since F ≠ ∅ by definition. Now by definition Aᶜ ∈ F and thus Ω = A ∪ Aᶜ ∈ F. Now ∅ = Ω \ Ω ∈ F.
(2) A, B ∈ F ⇒ Aᶜ, Bᶜ ∈ F ⇒ Aᶜ ∪ Bᶜ ∈ F ⇒ A ∩ B = (Aᶜ ∪ Bᶜ)ᶜ ∈ F.
The rest is an exercise. For (3) note that A \ B = (Ω \ B) ∩ A. □

1.3. Probabilities. Consider an experiment E with a sample space Ω and an event space F.

Definition 1.3 (Probability measure). A mapping P : F → R is called a probability measure on (Ω, F) if
(1) P(A) ≥ 0,
(2) P(Ω) = 1,
(3) if A1, A2, A3, … ∈ F are pairwise disjoint events then P(∪_{i=1}^∞ A_i) = ∑_{i=1}^∞ P(A_i).


Lemma 1.4. Let P : F → R be a probability measure on (Ω, F). Then
• A, B ∈ F and A ⊆ B implies P(A) ≤ P(B),
• P(∅) = 0,
• 0 ≤ P(A) ≤ 1 for all A ∈ F,
• A, B ∈ F implies P(A ∩ B) + P(A ∪ B) = P(A) + P(B),
• A ∈ F implies P(Aᶜ) = 1 − P(A).

Exercise: Prove the above lemma.

Exercise: Let Ω be a non-empty set, let A ⊆ Ω with A ≠ ∅, Ω. Set F = {∅, Ω, A, Aᶜ}. Verify that F is an event space, and determine all possible probability measures on (Ω, F).
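The event-space part of the last exercise can be checked mechanically for a small concrete Ω (here Ω = {1, 2, 3, 4} and A = {1, 2}, both hypothetical choices); since F is finite, closure under countable unions reduces to closure under pairwise unions.

```python
omega = frozenset({1, 2, 3, 4})
a = frozenset({1, 2})
f = {frozenset(), omega, a, omega - a}  # candidate event space {∅, Ω, A, Aᶜ}

# (2) closure under complement
assert all(omega - s in f for s in f)

# (3) closure under unions (pairwise suffices here, as F is finite)
for s in f:
    for t in f:
        assert s | t in f

print("F is an event space")
```

For the second part: any probability measure on this F is determined by the single value P(A) ∈ [0, 1], since P(∅) = 0, P(Ω) = 1 and P(Aᶜ) = 1 − P(A) are forced by Lemma 1.4.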


1.4. Probability spaces.

Definition 1.5 (Probability spaces). A probability space (Ω, F, P) is a triple of objects such that
(i) Ω is a non-empty set,
(ii) F is an event space of subsets of Ω,
(iii) P : F → [0, 1] is a probability measure on (Ω, F).

1.5. Countable sample spaces. Suppose (Ω, F, P) is a probability space with a countable sample space Ω. Suppose we want to assign probabilities to all elementary outcomes ω, i.e., to all subsets {ω} ⊆ Ω with one element. Then each of these singletons must be an event, i.e., {ω} ∈ F for all ω ∈ Ω. As Ω is countable, any subset A of Ω is a countable union of singletons, A = ∪_{ω∈A} {ω}, and thus, by the properties of an event space, every subset of Ω is an event. Hence, we are forced to choose F to be the power set¹ of Ω.

Note that in this case the probability measure P on (Ω, F) is determined by the values P({ω}) (ω ∈ Ω). Indeed, as any A ⊆ Ω is countable we have P(A) = P(∪_{ω∈A} {ω}) = ∑_{ω∈A} P({ω}). From now on we shall simply write P(ω) instead of P({ω}).

Warning: We do not always require {ω} ∈ F for all ω ∈ Ω. So even for countable sample spaces Ω the event space F is not necessarily the power set of Ω.

1.6. Conditional probabilities. Let (Ω, F, P) be a probability space.

Definition 1.6. Suppose A, B ∈ F and P(B) > 0. Then the (conditional) probability of A given B is denoted by P(A | B), and defined by

P(A | B) = P(A ∩ B) / P(B).

Theorem 1.7. Suppose B ∈ F with P(B) > 0. Set Q : F → R, Q(A) = P(A | B). Then (Ω, F, Q) is a probability space.

¹The power set of the set Ω is the set of all subsets of Ω.
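Definition 1.6 can be exercised on a small uniform probability space; two fair dice here are an illustrative choice, with exact rational arithmetic so no rounding obscures the answer.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes
p = Fraction(1, len(omega))                   # uniform measure on Ω

a = {w for w in omega if w[0] + w[1] == 8}    # event A: the sum is 8
b = {w for w in omega if w[0] == 3}           # event B: the first die shows 3

p_b = len(b) * p
p_a_given_b = (len(a & b) * p) / p_b          # P(A | B) = P(A ∩ B) / P(B)
print(p_a_given_b)  # 1/6
```

Note P(A) = 5/36 ≠ 1/6 = P(A | B): knowing the first die is 3 changes the probability of the sum being 8.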


1.7. Independent events. Let (Ω, F, P) be a probability space. We would like to say A, B ∈ F are independent if P(A | B) = P(A) and P(B | A) = P(B). But this requires P(A), P(B) > 0. The following definition is more general.

Definition 1.8 (Independent events). Let (Ω, F, P) be a probability space.
• The events A and B are called independent if P(A ∩ B) = P(A)P(B), and called dependent otherwise.
• A family A = {A_i ; i ∈ I} ⊆ F of events is called independent if for all finite J ⊆ I we have P(∩_{i∈J} A_i) = ∏_{i∈J} P(A_i).
• A family A = {A_i ; i ∈ I} ⊆ F of events is called pairwise independent if for all i, j ∈ I with i ≠ j we have P(A_i ∩ A_j) = P(A_i)P(A_j).

Note that every independent family of events is pairwise independent, but we have seen an example in the lecture showing that the converse is not true!

1.8. The partition theorem. If Ω = ∪_{i∈I} B_i is a disjoint union (i.e., for i ≠ j we have B_i ∩ B_j = ∅), then we say {B_i ; i ∈ I} is a partition of Ω. Let (Ω, F, P) be a probability space.

Theorem 1.9 (Partition theorem). Suppose {B1, B2, B3, …} is a partition of Ω, and all B_i are events of positive probability P(B_i) > 0. Then

P(A) = ∑_{i=1}^∞ P(A | B_i) P(B_i)

for all A ∈ F.

Theorem 1.10 (Bayes' theorem). Let {B1, B2, …} be a partition of the sample space Ω such that P(B_i) > 0 for each i. For any event A with P(A) > 0,

P(B_j | A) = P(A | B_j) P(B_j) / ∑_i P(A | B_i) P(B_i).
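A numerical sketch of Theorems 1.9 and 1.10, using a hypothetical two-event partition (a rare condition and its complement, with made-up rates) and exact rational arithmetic:

```python
from fractions import Fraction

# Hypothetical partition {B1, B2}: B1 = "condition present", B2 = its complement.
p_b = [Fraction(1, 100), Fraction(99, 100)]          # P(B_i); sums to 1
p_a_given_b = [Fraction(99, 100), Fraction(5, 100)]  # P(A | B_i), A = "test positive"

# Partition theorem: P(A) = sum_i P(A | B_i) P(B_i)
p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))

# Bayes' theorem: P(B_1 | A) = P(A | B_1) P(B_1) / P(A)
p_b1_given_a = p_a_given_b[0] * p_b[0] / p_a
print(p_a, p_b1_given_a)  # 297/5000 and 1/6
```

Even with a fairly accurate test, P(B1 | A) = 1/6 is much smaller than P(A | B1) = 99/100: Bayes' theorem reweights by the prior P(B1).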


2. DISCRETE RANDOM VARIABLES

Definition 2.1 (Discrete random variable). Let (Ω, F, P) be a probability space. A discrete random variable X on (Ω, F, P) is a mapping X : Ω → R such that
(1) Im X = {X(ω); ω ∈ Ω} is countable,
(2) {ω ∈ Ω; X(ω) = x} ∈ F for all x ∈ R.

"Discrete" refers to (1); condition (2) tells us that for any x ∈ R we can assign a probability to the set {ω ∈ Ω; X(ω) = x}.

Exercise: Suppose we have a countable sample space Ω and suppose F is the power set of Ω. Can you describe the possible discrete random variables on (Ω, F, P)?

Let X and Y be discrete random variables on a probability space (Ω, F, P), and let λ ∈ R. We have seen in the lecture that X + Y (defined as the function (X + Y)(ω) = X(ω) + Y(ω)) is a discrete random variable, and it is a homework question to show that λX (defined as the function (λX)(ω) = λX(ω)) is also a discrete random variable on (Ω, F, P). Therefore, the set of discrete random variables on a fixed probability space (Ω, F, P) is a vector space over the field R. We shall soon introduce the expectation E(X) of a discrete random variable X. It is a linear map from a subspace of the discrete random variables to R.

2.1. Probability mass function.

Definition 2.2 (Probability mass function). Let X be a discrete random variable on a probability space (Ω, F, P). The (probability) mass function p_X of the random variable X is the map p_X : R → [0, 1] given by p_X(x) = P({ω ∈ Ω; X(ω) = x}). By convention we write P({ω ∈ Ω; X(ω) = x}) = P(X = x).

Exercise: Show that Im p_X is countable and ∑_{x∈R} p_X(x) = 1.

Exercise: Conclude that p_X^{-1}((0, 1]) = {x ∈ R; p_X(x) > 0} is countable. (Hint: recall that countable unions of countable sets are countable.)

Which functions occur as probability mass functions?


Question 2.3. Suppose f : R → [0, 1] is a function that satisfies the "obvious" properties of a probability mass function for a discrete random variable, i.e., Im f is countable and ∑_{x∈R} f(x) = 1. Is this also sufficient, i.e., does this imply that there exists a probability space (Ω, F, P) and a discrete random variable X on (Ω, F, P) with p_X = f?

Theorem 2.4. Suppose s_i (i ∈ I) is a countable collection of distinct real numbers and suppose π_i ≥ 0 for all i ∈ I and ∑_{i∈I} π_i = 1. Then there exists a probability space (Ω, F, P) and a discrete random variable X on (Ω, F, P) such that
p_X(s_i) = π_i for i ∈ I,
p_X(s) = 0 if s ∉ {s_i ; i ∈ I}.


2.2. Important examples of distributions.

Bernoulli distribution (B(1, p)): The random variable X has the Bernoulli distribution with parameter p if
• Im X = {0, 1},
• P(X = 0) = q, where q = 1 − p,
• P(X = 1) = p.
Note that p_X(x) = P(X = x) = 0 for all x ∉ Im X = {0, 1}.

Binomial distribution (B(n, p)): The random variable X has the Binomial distribution with parameters n and p if
• Im X = {0, 1, …, n},
• P(X = k) = (n choose k) p^k q^{n−k} for k ∈ {0, …, n}, where q = 1 − p.
Exercise: Show directly that ∑_{x∈R} p_X(x) = 1.

Poisson distribution (P(λ)): The random variable X has the Poisson distribution with parameter λ (λ > 0) if
• Im X = N₀,
• P(X = k) = (λ^k / k!) e^{−λ} for k ∈ N₀.
Exercise: Show directly that ∑_{x∈R} p_X(x) = 1.

Geometric distribution (G(p)): The random variable X has the geometric distribution with parameter p (p > 0) if
• Im X = N,
• P(X = k) = p q^{k−1} for k ∈ N, where q = 1 − p.
Exercise: Show directly that ∑_{x∈R} p_X(x) = 1.

2.3. Examples in action. A coin is tossed n times. At each toss the probability of Heads is p. Then the corresponding probability space (Ω, F, P) is given by
• Ω = {H, T}^n,
• F = the power set of Ω,
• P(ω) = p^k q^{n−k}, where k = number of Heads in ω.
Note that this completely determines the probability measure because here Ω is countable.
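This construction can be checked computationally for small illustrative values (n = 5, p = 1/3 are arbitrary choices here): the elementary probabilities sum to 1, and counting Heads yields exactly the Binomial mass function.

```python
from fractions import Fraction
from itertools import product
from math import comb

n, p = 5, Fraction(1, 3)   # hypothetical small example
q = 1 - p
omega = list(product("HT", repeat=n))   # Ω = {H, T}^n

def prob(w):
    k = w.count("H")
    return p**k * q**(n - k)            # P(ω) = p^k q^(n-k)

# the elementary probabilities sum to 1, so P is a probability measure
assert sum(prob(w) for w in omega) == 1

# S(ω) = number of Heads satisfies S ~ B(n, p)
for k in range(n + 1):
    p_s_k = sum(prob(w) for w in omega if w.count("H") == k)
    assert p_s_k == comb(n, k) * p**k * q**(n - k)
print("S ~ B(n, p)")
```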


Exercise: Show that the constructed "probability measure" really defines a probability measure, i.e., all three necessary conditions are fulfilled.

We define a discrete random variable S : Ω → {0, 1, …, n} by S(ω) = number of Heads in ω. Then S ∼ B(n, p). Indeed,

P(S = k) = ∑_{ω : S(ω)=k} p^k (1 − p)^{n−k} = p^k q^{n−k} ∑_{ω : S(ω)=k} 1 = (n choose k) p^k q^{n−k}.

Note that we can write S(ω) = X1(ω) + · · · + Xn(ω), where X_i(ω) = 1 if the i-th toss is Heads and X_i(ω) = 0 otherwise, and thus X_i ∼ B(1, p). More generally: if X_i ∼ B(1, p) and the X_i are independent (this will be made precise later on in this course) then ∑_{i=1}^n X_i ∼ B(n, p).

2.4. Poisson approximation. For big values of n the binomial coefficients (n choose k) become very large (at least for 2 ≤ k ≤ n − 2). Therefore, to calculate the probabilities for binomially distributed random variables we are often forced to accept approximations. One that works fairly well if n is very big, p is small, and np = λ is of reasonable size (not "too small" and not "too big") is the so-called Poisson approximation. That means one tries to approximate the Binomial distribution B(n, p) by the Poisson distribution P(λ). Indeed, suppose k is a fixed non-negative integer and λ = np > 0 is also fixed. Then, as n tends to infinity,

(n choose k) p^k q^{n−k} → (λ^k / k!) e^{−λ},

and so if S ∼ B(n, p) then we have P(S = k) ≈ (λ^k / k!) e^{−λ}.
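The convergence can be observed numerically; λ = 2 and k = 3 are arbitrary illustrative choices, with p = λ/n shrinking as n grows so that np stays fixed.

```python
from math import comb, exp, factorial

lam, k = 2.0, 3                 # illustrative choices
for n in (10, 100, 10_000):
    p = lam / n                 # p shrinks so that np = lambda stays fixed
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k * exp(-lam) / factorial(k)
    print(n, binom, poisson)    # the gap shrinks as n grows
```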

2.5. Functions of discrete random variables. Suppose X is a discrete random variable on the probability space (Ω, F, P), and let g : R → R be a function.

Theorem 2.5. The map Y : Ω → R defined as Y(ω) = g(X(ω)) is a discrete random variable on (Ω, F, P). Its mass function p_Y is given by

p_Y(y) = P(Y = y) = ∑_{x ∈ g^{-1}(y) ∩ Im X} P(X = x).

2.6. Expectations. Let X be a discrete random variable on the probability space (Ω, F, P).

Definition 2.6. The expectation (or mean value) of X is denoted by E(X) and defined by

E(X) = ∑_{x∈Im X} x P(X = x),

whenever this sum converges absolutely (i.e., ∑_{x∈Im X} |x P(X = x)| < ∞).

• Absolute convergence is needed; otherwise the sum would depend on the order in which we add up the terms.
• If masses with weights π1, π2, π3, … and total weight ∑_i π_i = 1 are placed at the points x1, x2, x3, … in R, then ∑_i x_i π_i is the centre of gravity.

2.7. Expectations of functions of discrete random variables. Suppose X is a discrete random variable on the probability space (Ω, F, P), and let g : R → R be a function.

Theorem 2.7. The discrete random variable g(X) has expectation

E(g(X)) = ∑_{x∈Im X} g(x) P(X = x).
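A sketch of Definition 2.6 and Theorem 2.7 for an infinite Im X: for X ∼ G(p) we approximate the absolutely convergent sums by long partial sums (p = 0.25 is an arbitrary choice; the closed forms 1/p and (2 − p)/p² are standard facts, not derived in these notes so far).

```python
# X ~ G(p): E(X) = sum_{k>=1} k p q^(k-1), and via Theorem 2.7 with
# g(x) = x^2, E(X^2) = sum_{k>=1} k^2 p q^(k-1). Truncate the sums far
# out; the geometric tail is negligible.
p = 0.25
q = 1 - p
e_x = sum(k * p * q**(k - 1) for k in range(1, 2000))
e_x2 = sum(k**2 * p * q**(k - 1) for k in range(1, 2000))
print(e_x, e_x2)  # close to the exact values 1/p = 4 and (2 - p)/p^2 = 28
```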


2.8. Variance. Suppose X is a discrete random variable. We know that E(X) measures the "centre" of the random variable X. Another important quantity of X is the variance, which measures the dispersion of X.

Definition 2.8. The variance of X is denoted by Var(X) and defined by Var(X) = E((X − E(X))²).

Lemma 2.9. We have Var(X) = E(X²) − E(X)².

2.9. Conditional expectation and the partition theorem. Let X be a discrete random variable on the probability space (Ω, F, P) and let B ∈ F with P(B) > 0. If we are given that B occurs then this affects p_X(x) as follows. The probabilities P(X = x) are replaced by the conditional probabilities

P(X = x | B) = P({ω ∈ Ω; X(ω) = x} ∩ B) / P(B).

Definition 2.10. If X is a discrete random variable on the probability space (Ω, F, P) and B ∈ F with P(B) > 0, then the conditional expectation of X given B is denoted by E(X | B) and defined as

E(X | B) = ∑_{x∈Im X} x P(X = x | B),

whenever this sum converges absolutely.

Theorem 2.11. Suppose X is a discrete random variable on the probability space (Ω, F, P), and let {B1, B2, B3, …} be a partition of the sample space such that all B_i are events with P(B_i) > 0. Then

E(X) = ∑_i E(X | B_i) P(B_i),

whenever this sum converges absolutely.
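Lemma 2.9 and Theorem 2.11 can both be verified on a single fair die (an illustrative choice), using the hypothetical partition {even score, odd score}:

```python
from fractions import Fraction

# X = score of one fair die
im_x = range(1, 7)
p_x = {x: Fraction(1, 6) for x in im_x}

e_x = sum(x * p_x[x] for x in im_x)             # E(X) = 7/2
e_x2 = sum(x**2 * p_x[x] for x in im_x)         # E(X^2) = 91/6
var_direct = sum((x - e_x)**2 * p_x[x] for x in im_x)

# Lemma 2.9: Var(X) = E(X^2) - E(X)^2
assert var_direct == e_x2 - e_x**2
print(var_direct)  # 35/12

# Theorem 2.11 with the partition {even, odd}, each of probability 1/2:
e_even = sum(x * Fraction(1, 3) for x in (2, 4, 6))   # E(X | even) = 4
e_odd = sum(x * Fraction(1, 3) for x in (1, 3, 5))    # E(X | odd) = 3
assert e_x == e_even * Fraction(1, 2) + e_odd * Fraction(1, 2)
```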


3. MULTIVARIATE DISCRETE DISTRIBUTIONS AND INDEPENDENCE

3.1. Bivariate discrete distributions. Let X and Y be discrete random variables on the probability space (Ω, F, P). Instead of looking at X and Y separately, we consider (X, Y) as a random vector in R².

Definition 3.1. The joint (probability) mass function p_{X,Y} of X and Y is the function p_{X,Y} : R² → [0, 1] defined by p_{X,Y}(x, y) = P({ω ∈ Ω; X(ω) = x, Y(ω) = y}). Note that {ω ∈ Ω; X(ω) = x, Y(ω) = y} is indeed an event (why?), so the definition makes sense. Convention: p_{X,Y}(x, y) = P(X = x, Y = y).

3.2. Bivariate discrete distributions. Let X and Y be discrete random variables on the probability space (Ω, F, P).

Lemma 3.2.
• p_{X,Y}(x, y) = 0 unless x ∈ Im X and y ∈ Im Y,
• ∑_{x∈Im X} ∑_{y∈Im Y} p_{X,Y}(x, y) = 1,
• P(X = x) = p_X(x) = ∑_{y∈Im Y} p_{X,Y}(x, y),
• P(Y = y) = p_Y(y) = ∑_{x∈Im X} p_{X,Y}(x, y).

Exercise: Prove the above lemma.

Definition 3.3. The functions p_X(x) and p_Y(y) are called the marginal mass functions of X and Y respectively.

3.3. Multivariate discrete distributions. The bivariate case generalises in the obvious way to the multivariate case. Let X1, …, Xn be discrete random variables on the probability space (Ω, F, P), and X = (X1, …, Xn). The joint (probability) mass function p_X of X is the function p_X : R^n → [0, 1] defined by p_X(x1, …, xn) = P({ω ∈ Ω; X1(ω) = x1, …, Xn(ω) = xn}).

3.4. Expectations in the multivariate case. Suppose X and Y are discrete random variables on the probability space (Ω, F, P), and g : R² → R.

Lemma 3.4. The function Z(ω) = g(X(ω), Y(ω)) is also a discrete random variable on the probability space (Ω, F, P).

Exercise: Prove the above lemma.

Theorem 3.5. E(g(X, Y)) = ∑_{x∈Im X} ∑_{y∈Im Y} g(x, y) P(X = x, Y = y).
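The marginal computations of Lemma 3.2 can be tried out on a small, entirely hypothetical joint mass function on {0, 1} × {0, 1, 2}:

```python
from fractions import Fraction

# Hypothetical joint mass function of (X, Y); the six values sum to 1.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8),
}
assert sum(joint.values()) == 1

# Marginal mass functions: sum out the other variable (Lemma 3.2)
p_x = {x: sum(v for (a, b), v in joint.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (a, b), v in joint.items() if b == y) for y in (0, 1, 2)}
print(p_x, p_y)
```

Each marginal is itself a mass function: both `p_x` and `p_y` sum to 1.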


3.5. Independence of discrete random variables. Let X and Y be discrete random variables on the probability space (Ω, F, P).

Definition 3.6. X and Y are called independent if the events {ω; X(ω) = x} and {ω; Y(ω) = y} are independent for all x, y ∈ R. This means exactly P(X = x, Y = y) = P(X = x) P(Y = y) for all x, y ∈ R. And this in turn can be written as

p_{X,Y}(x, y) = (∑_{y'} p_{X,Y}(x, y')) (∑_{x'} p_{X,Y}(x', y))

for all x, y ∈ R.

The above definition of independence carries over to collections of more than 2 random variables. Let X1, X2, X3, … be discrete random variables on the probability space (Ω, F, P).

Definition 3.7. Let n ≥ 2. We say X1, …, Xn are independent if the events {ω ∈ Ω; X1(ω) = x1}, …, {ω ∈ Ω; Xn(ω) = xn} are independent for all x1, …, xn ∈ R. We say that X1, X2, X3, … are independent if all finite collections of n ≥ 2 of these random variables are independent. Of course, as for n = 2, X1, …, Xn are independent means exactly that

P(X1 = x1, …, Xn = xn) = P(X1 = x1) · · · P(Xn = xn)

for all x1, …, xn ∈ R.

Let X and Y be discrete random variables on the probability space (Ω, F, P), and let g, h : R → R.

Question 3.8. Are the discrete random variables g(X) and h(Y) on (Ω, F, P) also independent?

For a1, a2 ∈ R,

P(g(X) = a1, h(Y) = a2) = P(X ∈ g^{-1}(a1), Y ∈ h^{-1}(a2)) = ∑_{b1} ∑_{b2} P(X = b1, Y = b2),

where b1 runs over g^{-1}(a1) ∩ Im X and b2 runs over h^{-1}(a2) ∩ Im Y. Using the independence we conclude that the above equals

∑_{b1} ∑_{b2} P(X = b1) P(Y = b2) = P(X ∈ g^{-1}(a1)) P(Y ∈ h^{-1}(a2)) = P(g(X) = a1) P(h(Y) = a2).

So g(X) and h(Y) are independent.

Let X and Y be discrete random variables on the probability space (Ω, F, P).

Theorem 3.9. X and Y are independent if and only if there exist functions f, g : R → R such that p_{X,Y}(x, y) = f(x) g(y) for all x, y ∈ R. As we have already seen, if X and Y are independent, then p_{X,Y}(x, y) = f(x) g(y) with f(x) = P(X = x) and g(y) = P(Y = y). It remains to prove the other implication.

3.6. Independence and expectations of discrete random variables. Let X and Y be discrete random variables on the probability space (Ω, F, P).

Theorem 3.10. If X and Y are independent and E(X) and E(Y) exist, then E(XY) exists and E(XY) = E(X) E(Y).

Exercise: Show that the converse is not true.

Theorem 3.11. X and Y are independent if and only if E(g(X) h(Y)) = E(g(X)) E(h(Y)) for all functions g, h : R → R for which the latter two expectations exist.
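Theorem 3.10 can be illustrated with two independent fair dice (an illustrative choice), where the joint mass function factorises as p_{X,Y}(x, y) = p_X(x) p_Y(y) = 1/36:

```python
from fractions import Fraction
from itertools import product

# X, Y independent fair dice: p_{X,Y}(x, y) = p_X(x) p_Y(y) = 1/36
im = range(1, 7)
p_joint = Fraction(1, 36)

e_x = sum(x * Fraction(1, 6) for x in im)           # E(X) = E(Y) = 7/2
e_xy = sum(x * y * p_joint for x, y in product(im, im))

assert e_xy == e_x * e_x                            # Theorem 3.10
print(e_xy)  # 49/4
```

For the converse exercise, note that E(XY) = E(X)E(Y) can hold for dependent X and Y, so this check alone never proves independence.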


3.7. Sums of discrete random variables. Let X and Y be discrete random variables on the probability space (Ω, F, P). We know by Lemma 0.4 that Z = X + Y is also a discrete random variable on the probability space (Ω, F, P). What is the mass function of Z?

Theorem 3.12.

P(Z = z) = ∑_{x∈Im X} P(X = x, Y = z − x).

Corollary 3.13. If X and Y are independent, then

P(Z = z) = ∑_{x∈Im X} P(X = x) P(Y = z − x).
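A sketch of Corollary 3.13 with hypothetical choices X ∼ B(2, 1/2) and Y ∼ B(3, 1/2): the convolution of their mass functions should give Z = X + Y ∼ B(5, 1/2), consistent with the remark in Section 2.3 that sums of independent Bernoulli-type variables are Binomial.

```python
from fractions import Fraction
from math import comb

# X ~ B(2, 1/2), Y ~ B(3, 1/2), independent. Since p = q = 1/2, the
# mass functions simplify to comb(n, k) * (1/2)^n.
p = Fraction(1, 2)
px = {k: comb(2, k) * p**2 for k in range(3)}
py = {k: comb(3, k) * p**3 for k in range(4)}

# Corollary 3.13: P(Z = z) = sum_x P(X = x) P(Y = z - x)
pz = {z: sum(px[x] * py.get(z - x, 0) for x in px) for z in range(6)}

assert all(pz[z] == comb(5, z) * p**5 for z in pz)  # Z ~ B(5, 1/2)
print(pz)
```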

4. PROBABILITY GENERATING FU...

