Chapter 3 - YU, Chi Wai PDF

Title	Chapter 3 - YU, Chi Wai
Course	Applied Statistics
Institution	The Hong Kong University of Science and Technology

MATH2411: Applied Statistics | Dr. YU, Chi Wai

Chapter 3: RANDOM VARIABLES

1 WHAT IS A RANDOM VARIABLE?

In chapter 2, we defined an event as a subset of outcomes from the sample space that we are interested in. Most often, however, we are more interested in numerical values than in the events themselves. This leads to the notion of a random variable (hereafter rv).

A Random Variable is a function which associates a real number with each elementary outcome of a sample space.

• Random variables are often denoted by capital letters, say X, Y, Z, and their possible numerical values (called realizations) by the corresponding lowercase letters, say 𝑥, 𝑦, 𝑧.

• Note that the rv X is a random quantity BEFORE the experiment is performed, and its realization 𝑥 is the value of the random quantity AFTER the experiment has been performed. The word RANDOM reminds us that we cannot predict the outcome of the experiment, and consequently its associated numerical value, beforehand.

• Definition: A random variable 𝑋: 𝑆 → 𝑅 is a numerically valued function defined on a sample space 𝑆. That is, a number 𝑋(𝑎) is assigned to each outcome 𝑎 in 𝑆.

• Please always keep in mind that a rv X is a function rather than a number. The value of X depends on the outcome. More rigorously, we write the event { 𝑋 = 𝑥 } to represent {𝑎 ∈ 𝑆|𝑋(𝑎) = 𝑥 }, the event { 𝑋 ≥ 𝑥 } to represent {𝑎 ∈ 𝑆|𝑋(𝑎) ≥ 𝑥 }, etc.

• With the notion of a rv, we now have a NEW interpretation of DATA: DATA ARE THE ACTUAL VALUES (REALIZATIONS) OF THE CORRESPONDING RANDOM VARIABLE.

EXAMPLE

Suppose we roll a pair of fair six-sided dice and the rv X represents their sum. The figure below explicitly defines the sample space and the rv X mapping from the sample space (S) into the set (R) of all real numbers.


Note that X can assign the same value to different elementary outcomes in S, e.g. X({1, 2}) = X({2, 1}) = 3. Therefore, an event such as {𝑎 ∈ 𝑆|𝑋(𝑎) = 𝑥} can contain more than one element. Additionally, we have

𝑃(𝑋 = 𝑥) = 𝑃({𝑋 = 𝑥}) = 𝑃({𝑎 ∈ 𝑆| 𝑋(𝑎) = 𝑥})

From the previous picture, we can find that

X({1, 3}) = X({2, 2}) = X({3, 1}) = 4. Thus,

𝑃(𝑋 = 4) = 𝑃({𝑋 = 4}) = 𝑃({𝑎 ∈ 𝑆| 𝑋(𝑎) = 4}) = 𝑃({1, 3}, {2, 2}, {3, 1}) = 𝑃({1, 3}) + 𝑃({2, 2}) + 𝑃({3, 1}) = 3/36 = 1/12.

Similarly,

𝑃(𝑋 ≤ 4) = 𝑃({𝑋 ≤ 4}) = 𝑃({𝑎 ∈ 𝑆| 𝑋(𝑎) ≤ 4}) = 𝑃({𝑎 ∈ 𝑆| 𝑋(𝑎) = 2, 3, or 4})

= 𝑃({1, 1}, {1, 2}, {2, 1}, {1, 3}, {2, 2}, {3, 1}) = 6/36 = 1/6.
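The outcome-counting procedure above can be sketched in a few lines of Python. This is only an illustration, assuming the 36 ordered outcomes are equally likely; the names `sample_space`, `X` and `prob` are chosen here and do not come from the notes.

```python
from fractions import Fraction
from itertools import product

# Sample space S: all ordered outcomes (first die, second die), equally likely.
sample_space = list(product(range(1, 7), repeat=2))

def X(outcome):
    """The random variable: maps an outcome to the sum of the two dice."""
    return sum(outcome)

def prob(event):
    """P({a in S : event(a)}) when every outcome has probability 1/36."""
    favourable = [a for a in sample_space if event(a)]
    return Fraction(len(favourable), len(sample_space))

p_eq_4 = prob(lambda a: X(a) == 4)   # P(X = 4)
p_le_4 = prob(lambda a: X(a) <= 4)   # P(X <= 4)
print(p_eq_4, p_le_4)
```

Using exact fractions rather than floats keeps the results directly comparable with the hand calculation (1/12 and 1/6).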

Some of you may think that the above procedure of using the probability of the elementary outcomes to find the probability of the rv X is a bit tedious and less efficient.

Indeed, if we know the (probability) distribution of X, then we can use it to find the probability of X directly without using any probability of the elementary outcomes. This is so because the distribution of X can fully specify the rv X itself.

So, what is the probability distribution of a rv exactly?

2 PROBABILITY AND PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE

Note that a rv has its own probability law, that is, a rule that assigns probabilities to the different values of the rv. Such a probability law, or probability assignment, is often called a (probability) distribution.

• In other words, a (probability) distribution of a rv is the collection of all values it can take on along with the probability of each value. So, a rv can be specified by its distribution.

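For the dice sum X of the previous example, the distribution can be shown in tabular form:

𝑥          2     3     4     5     6     7     8     9     10    11    12
𝑃(𝑋 = 𝑥)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

This is a tabular form of showing the distribution of a (discrete) rv; discrete rvs are defined and discussed further in sections 3 and 4.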


Now, the distribution of X is known, so we can find the probability of X directly and easily. For instance,

𝑃(𝑋 ≤ 4) = 𝑃({𝑋 ≤ 4}) = 𝑃(𝑋 = 2, 3, or 4) = 𝑃(𝑋 = 2) + 𝑃(𝑋 = 3) + 𝑃(𝑋 = 4) = 1/36 + 2/36 + 3/36 = 1/6.

Alternatively,

• A (cumulative) distribution function, denoted by 𝐹(𝑥), can also be used to specify a rv.

• Definition: A (cumulative) distribution function (or cdf) 𝐹(𝑥) is defined as

𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥), for all real values 𝑥.

The cdf is particularly useful when we want to find the probability that X falls in a half-open interval (𝑎, 𝑏]. Since the event { 𝑋 ≤ 𝑏 } is the disjoint union of { 𝑋 ≤ 𝑎 } and { 𝑎 < 𝑋 ≤ 𝑏 }, for any 𝑎 < 𝑏 we have

𝑃(𝑎 < 𝑋 ≤ 𝑏) = 𝐹 (𝑏) − 𝐹(𝑎).


EXAMPLE

The distribution of the number X of mortgages approved per week at the local branch office of a bank is given below:

This table lists ALL (why?) possible values of X together with the associated probabilities for the distribution of X.

Questions

i) What is the probability that in a given week fewer than 4 home mortgages were approved?

ii) What is the probability that in a given week more than 2 but no more than 5 home mortgages were approved?

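Answers

i) The probability for fewer than 4 mortgages is 𝑃(𝑋 < 4) = 𝑃(𝑋 ≤ 3) = 𝐹(3), because X takes integer values only.

ii) The probability for more than 2 but no more than 5 mortgages is 𝑃(2 < 𝑋 ≤ 5) = 𝐹(5) − 𝐹(2) = 𝑃(𝑋 = 3) + 𝑃(𝑋 = 4) + 𝑃(𝑋 = 5).

The cdf of X is 𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥), obtained by accumulating the tabulated probabilities of all values up to and including 𝑥.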

MORE ABOUT THE PROBABILITY OF RV

To find the probability of the event 𝑋 > 𝑎 exactly, the distribution of X must be known. Note that 𝑃(𝑋 > 𝑎) can then be found without ever making any observation (i.e. without data). For the corresponding statement about the actual value 𝑥 of X, namely 𝑥 > 𝑎, we know that it either occurs or does not occur, so the respective probability of 𝑥 > 𝑎 is either 1 or 0.

EXAMPLE: Consider the random experiment of tossing a fair coin 5 times, with X the total number of heads, and suppose we want to find 𝑃(𝑋 > 3). We then need to know the distribution of X. Later, we will study some well-known distributions and see that such a rv X follows a Binomial Distribution. According to this distribution, we know that

the exact value of 𝑃(𝑋 > 3) is 𝑃(𝑋 = 4) + 𝑃(𝑋 = 5) = 5/32 + 1/32 = 3/16 = 0.1875.

If the distribution is NOT known, then we can use the relative frequency of the event { 𝑋 > 3 } to approximate 𝑃(𝑋 > 3) by repeatedly performing the random experiment (of tossing the coin 5 times) n times. We would then have n data (realizations) of the random variable X. The relative frequency used in this case is the proportion of data greater than 3. In a probabilistic sense, as n gets larger and larger, the relative frequency gets closer and closer to the exact value of 𝑃(𝑋 > 3). Here we use the free statistical package R (https://www.r-project.org/) to mimic the random experiment, generate data of X, and calculate the relative frequency with n = 100 and n = 500.
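The relative-frequency idea can also be sketched in Python as a stand-in for the R session mentioned above. The seed, the function names and the extra run with n = 20000 are assumptions made for this illustration only; the printed frequencies depend on the seed and will differ from any figures produced in R.

```python
import random
from math import comb

def exact_tail(n_tosses=5, threshold=3, p=0.5):
    """Exact P(X > threshold) for X ~ Binomial(n_tosses, p)."""
    return sum(comb(n_tosses, k) * p**k * (1 - p)**(n_tosses - k)
               for k in range(threshold + 1, n_tosses + 1))

def relative_frequency(n, rng, n_tosses=5, threshold=3):
    """Repeat the 5-toss experiment n times; return the share of runs with X > threshold."""
    hits = 0
    for _ in range(n):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))  # one realization x of X
        hits += heads > threshold
    return hits / n

rng = random.Random(2411)  # arbitrary fixed seed so the run is reproducible
for n in (100, 500, 20000):
    print(n, relative_frequency(n, rng))
print("exact:", exact_tail())
```

With small n the relative frequency can be noticeably off; it settles near 0.1875 only as n grows.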


3 TWO TYPES OF RANDOM VARIABLES

• The range of a rv, denoted by 𝝌, is the collection of all possible values it can take on. For instance,
  o 𝑋 → {0, 1, … , 𝑛} ; 𝑌 → {1, 2, 3, …} ; 𝑍 → [0, ∞).
  o We can then use the range of a rv to classify it as
    • a continuous rv, or
    • a discrete rv.
  o Note that there exist rvs that are partly discrete and partly continuous (mixed rvs), but we will not discuss them in this course.

3.1 DISCRETE RANDOM VARIABLE

• It is a rv that has a finite or countably infinite range.
  o Examples: the number of defective items, the number of sales for a store, …

3.2 CONTINUOUS RANDOM VARIABLE

• It is a rv whose range is an interval over the real line.
  o Examples: weight of an item, time until failure of a mechanical component, length of an object, …

4 DISCRETE RANDOM VARIABLE

4.1 PROBABILITY MASS FUNCTION AND DISTRIBUTION FUNCTION

(Probability mass function, pmf)

The probability MASS function of a DISCRETE rv X, denoted by 𝑝(𝑥), is a function that gives the probability of occurrence of each possible value 𝑥 of X, that is, 𝑝(𝑥) = 𝑃(𝑋 = 𝑥). It is valid for all possible values 𝑥 of X.

• Conditions for a pmf:
  o 0 < 𝑝(𝑥) ≤ 1, for all 𝑥 in the range of X.
  o $\sum_{x \in \chi} p(x) = 1$.

• Cdf of a discrete rv:
  o $F(a) = P(X \le a) = \sum_{x \le a} p(x)$, for all real values 𝑎.

Referring to the pmf in the above question, for which 𝑝(1) = 1/6, 𝑝(2) = 1/3 and 𝑝(3) = 1/2, we have

𝐹(1) = 𝑃(𝑋 ≤ 1) = 𝑃(𝑋 = 1) = 𝑝(1) = 1/6

𝐹(1.26) = 𝑃(𝑋 ≤ 1.26) = 𝑃(𝑋 = 1) = 1/6

𝐹(1.5) = 𝑃(𝑋 ≤ 1.5) = 𝑃(𝑋 = 1) = 1/6

𝐹(2) = 𝑃(𝑋 ≤ 2) = 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2) = 𝑝(1) + 𝑝(2) = 1/2

𝐹(3) = 𝑃(𝑋 ≤ 3) = 𝑝(1) + 𝑝(2) + 𝑝(3) = 1

As can be seen above, the cdf of a discrete random variable is a step function that jumps by 𝑝(𝑎) at each possible value 𝑎 and stays constant in between.

• If the range of the discrete rv X is expressed by $\{x_1, x_2, x_3, \ldots\}$, where $x_1 < x_2 < x_3 < \cdots$, then we have

$$p(x_1) = F(x_1) \quad \text{and} \quad p(x_j) = F(x_j) - F(x_{j-1}),$$

where 𝑗 = 2, 3, ….
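The two directions, pmf to cdf and cdf back to pmf, can be sketched as follows. The pmf used is the one implied by the cdf values above (𝑝(1) = 1/6, 𝑝(2) = 1/3, 𝑝(3) = 1/2); the function names are illustrative choices, not notation from the notes.

```python
from fractions import Fraction

# pmf implied by F(1) = 1/6, F(2) = 1/2, F(3) = 1 in the text.
pmf = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}

def cdf(a, pmf):
    """F(a) = sum of p(x) over all possible values x <= a."""
    return sum((p for x, p in pmf.items() if x <= a), Fraction(0))

def pmf_from_cdf(values, F):
    """Recover p(x_1) = F(x_1) and p(x_j) = F(x_j) - F(x_{j-1}) for sorted values."""
    recovered, previous = {}, Fraction(0)
    for x in sorted(values):
        recovered[x] = F(x) - previous   # size of the jump at x
        previous = F(x)
    return recovered

print(cdf(1.5, pmf), cdf(2, pmf), cdf(3, pmf))
print(pmf_from_cdf(pmf, lambda x: cdf(x, pmf)) == pmf)
```

Evaluating `cdf` at non-integer points such as 1.26 or 1.5 returns the same value as at 1, which is the step-function behaviour described above.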

4.2 POPULATION MEAN AND POPULATION VARIANCE

(Population mean)

If X is a discrete rv with pmf 𝑝(𝑥) and range 𝝌, then the population mean (expectation, expected value) of X is defined by multiplying each possible value 𝑥 by its corresponding probability 𝑝(𝑥) and then summing these products. That is,

$$E(X) = \sum_{x \in \chi} x\,p(x),$$

if the sum exists.

• The population mean is usually denoted by 𝜇 or 𝜇𝑋.
• It is a common measure of central location of the random variable X.
• It is different from the sample mean, the mean of data.
  o The population mean is determined by the distribution of the random variable, while the sample mean is determined by the collection of the actual observations of the random variable.
  o Thus, the population mean is fixed (even though it is often unknown in practice), but the sample mean changes when different data are used.

(Population variance)

If X is a discrete rv with pmf 𝑝(𝑥) and range 𝝌, then the population variance of X is defined as the weighted average of the squared differences between each possible value and the population mean. That is,

$$\mathrm{Var}(X) = \sum_{x \in \chi} (x - \mu)^2\,p(x),$$

if the sum exists.

• The population variance is usually denoted by 𝜎² or 𝜎𝑋².
• The positive square root of 𝜎², denoted by 𝜎, is called the population standard deviation (sd) of X.
• Both 𝜎² and 𝜎 are common measures of spread of the random variable X.
• The population variance (sd) is different from the sample variance (sd), the variance (sd) of data.
  o The population variance (sd) is determined by the distribution of the random variable, while the sample variance (sd) is determined by the collection of the actual observations of the random variable.
  o Thus, the population variance (sd) is fixed (even though it is often unknown in practice), but the sample variance (sd) changes when different data are used.
• It can be shown that

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 .$$
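One way to see the shortcut formula is to expand the square in the definition. The sketch below assumes that E(X²) is computed as the sum of 𝑥² 𝑝(𝑥) over the range, and that all the sums involved exist.

```latex
\begin{aligned}
\mathrm{Var}(X) &= \sum_{x \in \chi} (x - \mu)^2\, p(x) \\
                &= \sum_{x \in \chi} x^2\, p(x) \;-\; 2\mu \sum_{x \in \chi} x\, p(x) \;+\; \mu^2 \sum_{x \in \chi} p(x) \\
                &= E(X^2) - 2\mu \cdot \mu + \mu^2 \cdot 1 \\
                &= E(X^2) - [E(X)]^2 .
\end{aligned}
```

The third line uses the definition of the mean and the fact that a pmf sums to 1.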

EXAMPLE

Toss a fair coin two times, and consider a random variable X of the number of heads.

𝒙       0      1      2
𝒑(𝒙)    0.25   0.50   0.25

[Figure: bar chart of the pmf, with X (0, 1, 2) on the horizontal axis and Probability on the vertical axis.]

Thus, E(X) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1, and Var(X) = (0 − 1)²(0.25) + (1 − 1)²(0.50) + (2 − 1)²(0.25) = 0.5.

EXAMPLE

Consider the probability distribution for the returns on stock A and B provided below:

Probability    X (Return on Stock A)    Y (Return on Stock B)
0.2            1%                       10%
0.3            2%                       6%
0.3            3%                       2%
0.2            4%                       -2%

Thus, the expected return on stock A is

𝜇𝑋 = 𝐸 (𝑋) = 1% × 0.2 + 2% × 0.3 + 3% × 0.3 + 4% × 0.2 = 2.5%

and the expected return on stock B is

𝜇𝑌 = 𝐸(𝑌) = 10% × 0.2 + 6% × 0.3 + 2% × 0.3 + (−2%) × 0.2 = 4%.

Moreover, the variance and standard deviation of return on stock A are:

𝐸(𝑋²) = (1%)² × 0.2 + (2%)² × 0.3 + (3%)² × 0.3 + (4%)² × 0.2 = 0.00073

𝜎𝑋² = 𝐸(𝑋²) − 𝜇𝑋² = 0.00073 − (0.025)² = 0.000105

𝜎𝑋 = √0.000105 = 1.02%

and the variance and standard deviation of return on stock B are:

𝐸(𝑌²) = (10%)² × 0.2 + (6%)² × 0.3 + (2%)² × 0.3 + (−2%)² × 0.2 = 0.00328

𝜎𝑌² = 𝐸(𝑌²) − 𝜇𝑌² = 0.00328 − (0.04)² = 0.00168

𝜎𝑌 = √0.00168 = 4.10%
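The arithmetic for both stocks can be checked with a short script. This is a minimal sketch that stores returns as decimals (1% = 0.01); the helper names `mean` and `variance` are chosen for this illustration.

```python
from math import sqrt

probs = [0.2, 0.3, 0.3, 0.2]
stock_a = [0.01, 0.02, 0.03, 0.04]    # X: return on stock A
stock_b = [0.10, 0.06, 0.02, -0.02]   # Y: return on stock B

def mean(values, probs):
    """E(V) = sum of v * p(v)."""
    return sum(v * p for v, p in zip(values, probs))

def variance(values, probs):
    """Var(V) = E(V^2) - [E(V)]^2, the shortcut formula used in the text."""
    mu = mean(values, probs)
    second_moment = mean([v * v for v in values], probs)
    return second_moment - mu ** 2

for name, values in (("A", stock_a), ("B", stock_b)):
    mu, var = mean(values, probs), variance(values, probs)
    print(f"stock {name}: mean={mu:.4f} var={var:.6f} sd={sqrt(var):.4f}")
```

Up to floating-point rounding, the printed figures should agree with the hand calculations for 𝜇, 𝜎² and 𝜎 above.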


Even though stock B offers a higher expected return than stock A, it is riskier because its variance and standard deviation are greater than stock A’s. [Note that this is only part of the picture because most investors would choose to hold securities as part of a diversified portfolio.]

PROPERTIES

Suppose X is a discrete random variable and c and d are two constants. Then

1. 𝐸(𝑐𝑋 + 𝑑) = 𝑐𝐸(𝑋) + 𝑑.
2. 𝑉𝑎𝑟(𝑐𝑋 + 𝑑) = 𝑐² 𝑉𝑎𝑟(𝑋).

Refer to the example of stock returns. Now suppose a simple portfolio allocates 0.7 of the fund to invest in stock B and 0.3 to a term deposit with 2% fixed interest rate. Thus, the overall return of the portfolio can be expressed as 𝑅 = 0.7 × 𝑌 + 0.3 × 2% = 0.7𝑌 + 0.006 .

Thus, the expected return of the portfolio R is

𝐸(𝑅) = 0.7𝐸(𝑌) + 0.006 = 0.7 × 0.04 + 0.006 = 3.4%

with its risk evaluated as

𝜎𝑅² = 𝑉𝑎𝑟(𝑅) = 0.7² 𝑉𝑎𝑟(𝑌) = 0.7² × 0.00168 = 0.0008232

𝜎𝑅 = √0.0008232 = 2.87%

In this simple example, we demonstrate the risk-return trade-off. Compared with investing the whole fund in stock B, this portfolio reduces the risk substantially (4.1% → 2.87%) with only a small sacrifice in the expected return (4% → 3.4%).
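The portfolio figures can be reproduced in two ways: by applying the two properties to the moments of Y, or by building the distribution of R directly. The sketch below does both as a cross-check; variable names are illustrative, and returns are stored as decimals.

```python
from math import sqrt

probs = [0.2, 0.3, 0.3, 0.2]
stock_b = [0.10, 0.06, 0.02, -0.02]   # Y: return on stock B
c, d = 0.7, 0.3 * 0.02                # R = c*Y + d, with d = 0.006

def mean(values):
    return sum(v * p for v, p in zip(values, probs))

def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 * p for v, p in zip(values, probs))

# Route 1: apply E(cY + d) = cE(Y) + d and Var(cY + d) = c^2 Var(Y).
mean_r = c * mean(stock_b) + d
var_r = c ** 2 * variance(stock_b)

# Route 2: transform every possible value of Y and recompute from scratch.
portfolio = [c * y + d for y in stock_b]
print(mean_r, mean(portfolio))
print(var_r, variance(portfolio))
print(sqrt(var_r))
```

Both routes should agree up to floating-point rounding. Note that the constant d shifts the mean but does not enter the variance.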

5 CONTINUOUS RANDOM VARIABLE

5.1 PROBABILITY DENSITY FUNCTION AND DISTRIBUTION FUNCTION

(Probability density function, pdf)

The probability DENSITY function of a CONTINUOUS rv X, denoted by 𝑓(𝑥), is a function whose value measures how likely it is that X is near 𝑥. It is valid for all possible values 𝑥 of X.

• Conditions for a pdf:
  o 0 < 𝑓(𝑥), for all 𝑥 in the range of X.
  o $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

• Note that the value of 𝑓(𝑥) does NOT give the probability that the corresponding random variable takes on the value 𝑥; indeed, in the continuous case, probabilities are given by integrating 𝑓(𝑥) over a particular interval.

• Probability that X belongs to A: $P(X \in A) = \int_{x \in A} f(x)\,dx$.

• Cdf of a continuous rv:
  o $F(a) = P(X \le a) = \int_{-\infty}^{a} f(x)\,dx$, for all real values 𝑎.

• Convert 𝐹(𝑎) to 𝑓(𝑎):

$$f(a) = \frac{d}{da} F(a).$$

EXAMPLE

Consider the following pdf of a rv X:

$$f(x) = \begin{cases} c(2x - x^2), & \text{for } 0 < x < 2; \\ 0, & \text{otherwise.} \end{cases}$$

Note that, by the condition of being a pdf, the constant c can be solved from

$$1 = \int_0^2 c(2x - x^2)\,dx = c\left[x^2 - \frac{x^3}{3}\right]_0^2 = \frac{4c}{3} \quad\Rightarrow\quad c = \frac{3}{4}.$$

For any 𝑎 and 𝑏 in (0, 2], where 𝑎 < 𝑏, we have

$$P(a \le X \le b) = \int_a^b \frac{3}{4}(2x - x^2)\,dx = \frac{1}{4}\left[3(b^2 - a^2) - (b^3 - a^3)\right],$$

and the cdf of X is

$$F(t) = \begin{cases} 0, & \text{for } t \le 0, \\ \int_{-\infty}^{t} f(x)\,dx = \int_0^t \frac{3}{4}(2x - x^2)\,dx = \frac{1}{4}(3t^2 - t^3), & \text{for } 0 < t \le 2, \\ 1, & \text{for } t > 2. \end{cases}$$
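The normalising constant and the cdf of this example can be checked numerically. The sketch below uses a simple composite midpoint rule written out by hand (an illustrative choice, not a method from the notes), and the interval endpoints 0.5 and 1.5 are arbitrary.

```python
def integrate(g, lo, hi, steps=10_000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (i + 0.5) * width) for i in range(steps))

def pdf(x, c=0.75):
    """f(x) = c(2x - x^2) on (0, 2) and 0 elsewhere, with c = 3/4."""
    return c * (2 * x - x * x) if 0 < x < 2 else 0.0

def cdf(t):
    """Closed form from the example: F(t) = (3t^2 - t^3)/4 on (0, 2]."""
    if t <= 0:
        return 0.0
    return (3 * t**2 - t**3) / 4 if t <= 2 else 1.0

total = integrate(pdf, 0, 2)        # should be close to 1 when c = 3/4
a, b = 0.5, 1.5                     # arbitrary illustrative interval
numeric = integrate(pdf, a, b)      # P(a <= X <= b) by direct integration
closed = cdf(b) - cdf(a)            # the same probability from the cdf
print(total, numeric, closed)
```

The numerical integral and the difference 𝐹(𝑏) − 𝐹(𝑎) should agree closely, which is the continuous counterpart of the interval formula from section 2.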

5.2 POPULATION MEAN AND POPULATION VARIANCE

(Population mean)

If X is a continuous rv with pdf 𝑓(𝑥), then the population mean (expectation, expected value) of X is defined as

$$E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx,$$

if it exists.

• The population mean is usually denoted by 𝜇 or 𝜇𝑋.
• It is a common measure of central location of the random variable X.
• It is different from the sample mean, the mean of data.
  o The population mean is determined by the distribution of the random variable, while the sample mean is determined by the collection of the actual observations of the random variable.
  o Thus, the population mean is fixed (even though it is often unknown in practice), but the sample mean changes when different data are used.

(Population variance)

If X is a continuous rv with pdf 𝑓(𝑥), then the population variance of X is defined as

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2\,f(x)\,dx,$$

if it exists.

• The population variance is usually denoted by 𝜎² or 𝜎𝑋².
• The positive square root of 𝜎², denoted by 𝜎, is called the population standard deviation (sd) of X.
• Both 𝜎² and 𝜎 are common measures of spread of the random variable X.
• The population variance (sd) is different from the sample variance (sd), the variance (sd) of data.
  o The population variance (sd) is determined by the distribution of the random variable, while the sample variance (sd) is determined by the collection of the actual observations of the random variable.
  o Thus, the population variance (sd) is fixed (even though it is often unknown in practice), but the sample variance (sd) changes when different data are used.
• It can be shown that

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 .$$
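As an illustration of these definitions, the pdf from the example in section 5.1, 𝑓(𝑥) = (3/4)(2𝑥 − 𝑥²) on (0, 2), can be worked through by hand. The numbers below are computed for this sketch and are not taken from the notes; E(X²) is assumed to be the integral of 𝑥² 𝑓(𝑥).

```latex
\begin{aligned}
E(X)   &= \int_0^2 x \cdot \tfrac{3}{4}(2x - x^2)\,dx
        = \tfrac{3}{4}\left[\tfrac{2x^3}{3} - \tfrac{x^4}{4}\right]_0^2
        = \tfrac{3}{4}\left(\tfrac{16}{3} - 4\right) = 1, \\
E(X^2) &= \int_0^2 x^2 \cdot \tfrac{3}{4}(2x - x^2)\,dx
        = \tfrac{3}{4}\left[\tfrac{x^4}{2} - \tfrac{x^5}{5}\right]_0^2
        = \tfrac{3}{4}\left(8 - \tfrac{32}{5}\right) = \tfrac{6}{5}, \\
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = \tfrac{6}{5} - 1 = \tfrac{1}{5}.
\end{aligned}
```

The mean of 1 matches the symmetry of this pdf about 𝑥 = 1.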

PROPERTIES

Suppose X is a continuous random variable and c and d are two constants. Then

1. 𝐸(𝑐𝑋 + 𝑑) = 𝑐𝐸(𝑋) + 𝑑.
2. 𝑉𝑎𝑟(𝑐𝑋 + 𝑑) = 𝑐² 𝑉𝑎𝑟(𝑋).

6 WELL-KNOWN DISTRIBUTIONS

The following well-known distributions are important in statistics because many results have been derived for them, which leads to quick analyses.

• Discrete Distribution
  o The distribution of a random variable X is discrete in the sense that all the possible values of X are isolated points.
  o Its cdf only increases at jump points.
• Continuous Distribution
  o Describes events over a continuous range, where the probability of any specific outcome is zero.
  o Its cdf is continuous.

6.1 BINOMIAL DISTRIBUTION

A Binomial distribution is related to a random experiment with the following features:

• Fixed finite number of identical trials, say 𝑛 < ∞.
• Trials are independent.
• Each trial results in one of two possible outcomes, denoted by success and failure.
• The probability of success p is constant across trials.

Here are some typi...