Chapter 6 - normal distribution PDF

Title Chapter 6 - normal distribution
Course Practical Statistics
Institution Brock University
Pages 10


CHAPTER 6: NORMAL DISTRIBUTION

IN THE PREVIOUS CHAPTER, WE DISCUSSED RANDOM VARIABLES OF DISCRETE (INTEGER) TYPE. BUT MANY RANDOM VARIABLES ARE CONTINUOUS (E.G. WEIGHT, VOLUME, TEMPERATURE, HEIGHT, TIME, ETC.).

THE MOST COMMON DISTRIBUTION APPROPRIATE FOR DEALING WITH THESE IS CALLED NORMAL (FIG. 6-1). NOTE THAT THE DISTRIBUTION IS NOW DEFINED BY A (PROBABILITY DENSITY) FUNCTION (GRAPHICALLY, A ‘BELL-SHAPED’ CURVE).

FOR ANY CONTINUOUS DISTRIBUTION, THE PROBABILITY OF A SINGLE EXACT OUTCOME (E.G. WEIGHT BEING EXACTLY EQUAL TO 3.25048.... LB.) IS ZERO, SO ASKING FOR IT IS NOT MEANINGFUL.


WHAT WE CAN DO IS TO FIND THE PROBABILITY OF ANY RANGE OF VALUES (E.G. 120 TO 130 LB., OR 16.5 TO 17.5 LB.).

MATHEMATICALLY, THIS REQUIRES INTEGRATING THE PROBABILITY DENSITY FUNCTION OVER THE CORRESPONDING RANGE (GRAPHICALLY, THIS YIELDS THE AREA BETWEEN THE CURVE AND THE HORIZONTAL AXIS). THIS, BEING RATHER DIFFICULT, HAS BEEN DONE FOR US, ONCE AND FOR ALL, AND THE RESULTS ARE SUMMARIZED IN A TABLE (A26, ALSO THE INSERT) - LEARNING HOW TO USE IT IS THE MAIN PART OF THIS CHAPTER.
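AS A MINIMAL SKETCH OF THE ‘AREA UNDER THE CURVE EQUALS PROBABILITY’ IDEA, THE PYTHON SNIPPET BELOW INTEGRATES A NORMAL DENSITY NUMERICALLY OVER THE 120 TO 130 LB. RANGE AND COMPARES THE RESULT WITH THE BUILT-IN CUMULATIVE FUNCTION (WHICH PLAYS THE ROLE OF THE TABLE). THE MEAN OF 125 LB. AND STANDARD DEVIATION OF 10 LB. ARE HYPOTHETICAL VALUES CHOSEN ONLY FOR ILLUSTRATION, AND THE HELPER area_under_pdf IS OUR OWN, NOT A LIBRARY FUNCTION.

```python
from statistics import NormalDist


def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * h)
    return total * h


# Hypothetical weight distribution (assumed values, not from the text).
weight = NormalDist(mu=125, sigma=10)

numeric = area_under_pdf(weight, 120, 130)       # area by direct integration
from_cdf = weight.cdf(130) - weight.cdf(120)     # same area from the cumulative function

print(f"numerical integration: {numeric:.4f}")
print(f"cumulative function:   {from_cdf:.4f}")
```

THE TWO NUMBERS SHOULD AGREE TO MANY DECIMAL PLACES, WHICH IS WHY A TABLE OF CUMULATIVE VALUES IS ALL WE NEED IN PRACTICE.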

ONE CAN ALSO DEFINE THE NORMAL DISTRIBUTION’S MEAN AND STANDARD DEVIATION (THIS REQUIRES SWITCHING FROM SUMMATION TO INTEGRATION - SO WE WILL NOT BOTHER WITH THE DETAILS). IT TURNS OUT THAT THE MEAN µ IS WHERE THE CURVE PEAKS (AT THE CENTER OF SYMMETRY - THIS IS ALWAYS THE CASE WHEN THE DISTRIBUTION IS SYMMETRIC), AND THE STANDARD DEVIATION σ IS THE DISTANCE FROM µ TO EITHER OF THE TWO INFLECTION POINTS (WHERE THE CURVE CHANGES FROM CURVING UPWARDS TO CURVING DOWNWARDS). THIS GIVES US ONLY A WAY OF GRAPHICALLY ESTIMATING THESE, BUT THAT’S ALL WE NEED. SINCE µ AND σ ARE THE BASIC PARAMETERS OF THE NORMAL DISTRIBUTION, THEY WILL ‘NORMALLY’ BE GIVEN TO US.

NOTE THAT ABOUT 68% (MORE ACCURATELY, 68.3%) OF THE CURVE’S AREA (REPRESENTING PROBABILITY) IS BETWEEN µ - σ AND µ + σ, ABOUT 95% BETWEEN µ - 2σ AND µ + 2σ, AND ALMOST EVERYTHING (ALL BUT ABOUT 0.3%) IN THE µ - 3σ TO µ + 3σ RANGE.
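THESE THREE PERCENTAGES CAN BE CHECKED WITH A SHORT PYTHON SKETCH. IT USES NormalDist FROM THE STANDARD LIBRARY WITH MEAN 0 AND STANDARD DEVIATION 1; THE DICTIONARY NAME areas IS OUR OWN CHOICE.

```python
from statistics import NormalDist

Z = NormalDist()  # mean 0, standard deviation 1

# Area between (mean - k sd) and (mean + k sd) for k = 1, 2, 3.
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

THE SAME AREAS ARE OBTAINED FOR ANY OTHER MEAN AND STANDARD DEVIATION, SINCE ONLY THE DISTANCE FROM THE MEAN IN UNITS OF THE STANDARD DEVIATION MATTERS.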

TO VERIFY THESE (AND ANY OTHER) PROBABILITIES BASED ON TABLE 5 (A26), WE MUST FIRST APPRECIATE THIS: SINCE THE TWO PARAMETERS µ AND σ CAN HAVE ANY REAL VALUES (σ MUST BE POSITIVE), IT APPEARS THAT WE MAY NEED A MULTITUDE OF TABLES TO COVER THEM ALL. BY A SMART TRICK, WE CAN MAKE DO WITH ONLY ONE TABLE, THAT OF THE SO-CALLED STANDARD NORMAL DISTRIBUTION, WHOSE MEAN IS 0 AND STANDARD DEVIATION IS 1.

ANY NORMALLY DISTRIBUTED RANDOM VARIABLE X CAN BE CONVERTED TO THIS STANDARD (Z) DISTRIBUTION WITH THE HELP OF THE FOLLOWING SIMPLE (LINEAR) FORMULA:

Z = (X − µ) / σ

TO LEARN HOW TO ANSWER PROBABILITY QUESTIONS RELATING TO X, WE MUST FIRST LEARN HOW TO DEAL WITH Z. EXAMPLES:

P(Z < 1.83) = 0.9664 (DIRECTLY, IN THE 1.8 ROW, .03 COLUMN)

P(0.47 < Z < 1.06) = P(Z < 1.06) - P(Z < 0.47) = 0.8554 - 0.6808 = 0.1746

P(-1.14 < Z < 0.30) = P(Z < 0.30) - P(Z < -1.14) = 0.6179 - 0.1271 = 0.4908 (UNDERSTAND GRAPHICALLY).

P(Z > 1.4) = 1 - P(Z < 1.4) = 1.0000 - 0.9192 = 0.0808 (COMPLEMENT RULE).
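THE FOUR TABLE LOOKUPS ABOVE CAN BE REPRODUCED IN PYTHON, AS A SKETCH: NormalDist().cdf(z) RETURNS P(Z < z), I.E. THE SAME QUANTITY THE TABLE LISTS. THE VARIABLE NAMES p1 TO p4 ARE OUR OWN.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

p1 = Z.cdf(1.83)                  # P(Z < 1.83), a direct lookup
p2 = Z.cdf(1.06) - Z.cdf(0.47)    # P(0.47 < Z < 1.06), difference of two lookups
p3 = Z.cdf(0.30) - Z.cdf(-1.14)   # P(-1.14 < Z < 0.30)
p4 = 1 - Z.cdf(1.4)               # P(Z > 1.4), complement rule

for label, p in [("p1", p1), ("p2", p2), ("p3", p3), ("p4", p4)]:
    print(f"{label} = {p:.4f}")
```

ROUNDED TO FOUR DECIMAL PLACES, THE RESULTS SHOULD MATCH THE TABLE-BASED ANSWERS (0.9664, 0.1746, 0.4908 AND 0.0808).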

NOW WE CAN TRY ANSWERING QUESTIONS ABOUT ANY NORMALLY DISTRIBUTED RANDOM VARIABLE X. THE TRICK IS TO CONVERT IT INTO A QUESTION ABOUT Z.

EXAMPLE: SUPPOSE THAT X HAS THE NORMAL DISTRIBUTION WITH µ = 26.7 AND σ = 3.4. FIND: P(20 < X < 30).

P(20 < X < 30) = P( (20 − 26.7)/3.4 < (X − 26.7)/3.4 < (30 − 26.7)/3.4 )

= P(-1.97 < Z < 0.97) = P(Z < 0.97) - P(Z < -1.97) = 0.8340 - 0.0244 = 0.8096
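A SHORT PYTHON SKETCH OF THE SAME CONVERSION FOLLOWS. IT STANDARDIZES THE TWO ENDPOINTS BY HAND AND ALSO ASKS NormalDist FOR THE ANSWER DIRECTLY; THE NAMES z_low, z_high, via_z AND direct ARE OUR OWN. BECAUSE THE CODE DOES NOT ROUND z TO TWO DECIMAL PLACES, ITS RESULT MAY DIFFER FROM THE TABLE-BASED 0.8096 IN THE FOURTH DECIMAL PLACE.

```python
from statistics import NormalDist

mu, sigma = 26.7, 3.4

# Step 1: convert the endpoints of X to the Z scale.
z_low = (20 - mu) / sigma
z_high = (30 - mu) / sigma

# Step 2: answer the question about Z.
Z = NormalDist()
via_z = Z.cdf(z_high) - Z.cdf(z_low)

# Cross-check: let the library work with mu and sigma directly.
X = NormalDist(mu, sigma)
direct = X.cdf(30) - X.cdf(20)

print(f"z range: {z_low:.2f} to {z_high:.2f}")
print(f"via Z:  {via_z:.4f}")
print(f"direct: {direct:.4f}")
```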

SOMETIMES, THE QUESTION IS REVERSED, E.G.: FIND THE VALUE OF z SUCH THAT P(Z < z) = 97% (I.E. 0.9700).

THIS MEANS THAT, IN OUR TABLE, WE HAVE TO SEARCH FOR 0.9700 (OR THE NEAREST VALUE). THE CLOSEST WE CAN GET IS 0.9699, AT z = 1.88.

MORE EXAMPLES: FIND z SO THAT P(-z < Z < z) = 90%.

SINCE THE REMAINING 10% IS SPLIT EQUALLY BETWEEN THE TWO TAILS (5% EACH), THIS IS THE SAME AS P(Z < z) = 0.9500. SEARCHING FOR THE NEAREST VALUE YIELDS 0.9495 AT z = 1.64 (OR 0.9505 AT z = 1.65; AT OUR LEVEL OF ACCURACY, EITHER WILL BE CONSIDERED CORRECT).

FIND z SUCH THAT P(Z < z) = 2% (0.0200).

MOVING THROUGH OUR TABLE IN THE RIGHT-TO-LEFT MANNER, WE FIND THAT THE CLOSEST ANSWER IS: z = -2.05.
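THE REVERSED LOOKUPS CAN BE SKETCHED IN PYTHON WITH NormalDist().inv_cdf(p), WHICH RETURNS THE z FOR WHICH P(Z < z) = p. THE VARIABLE NAMES ARE OUR OWN. SINCE THE FUNCTION IS NOT LIMITED TO TWO DECIMAL PLACES, ITS ANSWERS ARE SLIGHTLY MORE PRECISE THAN THE NEAREST TABLE ENTRIES.

```python
from statistics import NormalDist

Z = NormalDist()

z_97 = Z.inv_cdf(0.97)       # P(Z < z) = 97%; the table gives 1.88
z_central_90 = Z.inv_cdf(0.95)  # P(-z < Z < z) = 90% means P(Z < z) = 95%
z_02 = Z.inv_cdf(0.02)       # P(Z < z) = 2%; the table gives -2.05

print(f"{z_97:.3f} {z_central_90:.3f} {z_02:.3f}")
```

THE MIDDLE RESULT FALLS BETWEEN 1.64 AND 1.65, WHICH IS WHY EITHER TABLE VALUE IS ACCEPTED.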

THE SAME KIND OF QUESTION CAN BE POSED IN TERMS OF X (ANY µ AND σ):

ASSUMING THAT X IS NORMAL WITH µ = 113 AND σ = 17, FIND x SUCH THAT P(X < x) = 70%.

FIRST, WE CONVERT TO

P( (X − 113)/17 < (x − 113)/17 ) ≡ P(Z < z) = 0.7000

THIS, WE ALREADY KNOW HOW TO SOLVE (WE GET AS CLOSE AS WE CAN TO 0.7000 BY TAKING z = 0.52). THE CONVERSION FROM z TO x IS DONE WITH THE HELP OF

x = z⋅σ + µ

I.E.

x = 0.52×17 + 113 = 121.84

ASSUMING THAT X IS NORMAL WITH µ = -12.6 AND σ = 8.4, FIND x SUCH THAT P(X > x) = 27%.

FIRST, WE ANSWER: P(Z > z) = 0.2700 OR, EQUIVALENTLY, P(Z < z) = 1.0000 - 0.2700 = 0.7300. THIS IS ACHIEVED (AS CLOSELY AS WE CAN) BY TAKING z = 0.61.

ALL WE NEED TO DO NOW IS THE

x = 0.61 × 8.4 - 12.6 = -7.476

CONVERSION.
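BOTH OF THESE x QUESTIONS FOLLOW THE SAME TWO STEPS (FIND z, THEN CONVERT TO x), WHICH THE PYTHON SKETCH BELOW WRAPS IN A SMALL HELPER. THE FUNCTION NAME x_for_lower_tail IS HYPOTHETICAL, INTRODUCED ONLY FOR THIS ILLUSTRATION. SINCE z IS NOT ROUNDED TO TWO DECIMAL PLACES HERE, THE RESULTS DIFFER SLIGHTLY FROM THE TABLE-BASED 121.84 AND -7.476.

```python
from statistics import NormalDist


def x_for_lower_tail(mu, sigma, p):
    """Return x such that P(X < x) = p for X normal with the given mu and sigma."""
    z = NormalDist().inv_cdf(p)  # step 1: solve P(Z < z) = p
    return z * sigma + mu        # step 2: convert z back to the X scale


# P(X < x) = 70% with mu = 113, sigma = 17
x1 = x_for_lower_tail(113, 17, 0.70)

# P(X > x) = 27% is the same as P(X < x) = 73%, with mu = -12.6, sigma = 8.4
x2 = x_for_lower_tail(-12.6, 8.4, 1 - 0.27)

print(f"x1 = {x1:.2f}")
print(f"x2 = {x2:.3f}")
```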

THE REASON WHY THE NORMAL DISTRIBUTION IS SO IMPORTANT TO US IS THIS: THE EXPERIMENTAL AVERAGE OF ANY QUANTITY (E.G. SAMPLE MEAN) IS A RANDOM VARIABLE WHOSE DISTRIBUTION IS, FOR LARGE n (WHERE ‘LARGE’ IS USUALLY TAKEN TO BE > 30), ALMOST PERFECTLY NORMAL.

THIS EXTENDS TO THE BINOMIAL DISTRIBUTION (WHEN BOTH np AND nq > 5, SEE FIG. 6-37) AND ALSO TO THE POISSON DISTRIBUTION (λ > 30).

THE ONLY ‘IMPERFECTION’ OF THE ACTUAL MATCH IS THE FACT THAT THE BINOMIAL DISTRIBUTION REMAINS DISCRETE (THE HISTOGRAM CAN NEVER BE PERFECTLY ‘SMOOTH’). THIS REQUIRES INTRODUCING THE SO CALLED CONTINUITY CORRECTION.
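THE CLAIM THAT SAMPLE MEANS ARE NEARLY NORMAL FOR LARGE n CAN BE PROBED WITH A SMALL SIMULATION. THE PYTHON SKETCH BELOW IS ONLY AN ILLUSTRATION UNDER OUR OWN ASSUMPTIONS: IT AVERAGES n = 40 ROLLS OF A FAIR DIE, REPEATS THIS 5000 TIMES, AND COUNTS HOW OFTEN THE AVERAGE LANDS WITHIN ONE STANDARD DEVIATION OF ITS MEAN. IT RELIES ON THE STANDARD RESULT (NOT DERIVED IN THIS CHAPTER) THAT THE STANDARD DEVIATION OF A SAMPLE MEAN IS THE POPULATION STANDARD DEVIATION DIVIDED BY THE SQUARE ROOT OF n. THE SEED, n AND THE NUMBER OF TRIALS ARE ARBITRARY CHOICES.

```python
import random
from math import sqrt

random.seed(6)  # fixed seed so that the run is repeatable

n = 40         # sample size (larger than 30)
trials = 5000  # number of simulated samples

mu = 3.5                 # mean of a single roll of a fair die
sigma = sqrt(35 / 12)    # standard deviation of a single roll
sd_of_mean = sigma / sqrt(n)  # standard deviation of the sample mean

inside = 0
for _ in range(trials):
    sample_mean = sum(random.randint(1, 6) for _ in range(n)) / n
    if abs(sample_mean - mu) < sd_of_mean:
        inside += 1

fraction = inside / trials
print(f"fraction within one standard deviation: {fraction:.3f}")
```

IF THE SAMPLE MEAN IS CLOSE TO NORMAL, THE PRINTED FRACTION SHOULD BE IN THE NEIGHBOURHOOD OF 68%, ALTHOUGH THE DISCRETENESS OF THE DIE KEEPS IT FROM MATCHING EXACTLY.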

EXAMPLE: USING THE NORMAL APPROXIMATION, FIND THE PROBABILITY OF OBTAINING MORE THAN 20 SIXES WHEN ROLLING A DIE 100 TIMES.

THE NUMBER OF SIXES (SAY X ) HAS, OBVIOUSLY, THE BINOMIAL DISTRIBUTION WITH n = 100 AND p = 1/6. SINCE np = 16.7 AND nq = 83.3 (EACH > 5), THE NORMAL APPROXIMATION SHOULD YIELD GOOD RESULTS.


THE MEAN OF THE DISTRIBUTION IS np = 16.667 AND THE STANDARD DEVIATION EQUALS √(npq) = 3.727.

TREATING X AS NORMAL, WE HAVE TO COMPUTE P(X > 20.5), SINCE WE WANT TO INCLUDE 21, 22, 23, ... BUT EXCLUDE 20, 19, ... . USING THE HALF-INTEGER BETWEEN THE INCLUDE AND EXCLUDE RANGES IMPROVES THE ACCURACY OF THE RESULT (AND IS CALLED CONTINUITY CORRECTION):

P(X > 20.5) = P( (X − 16.667)/3.727 > (20.5 − 16.667)/3.727 ) = P(Z > 1.03)

= 1.0000 − 0.8485 = 15.15%

WITH THE HELP OF MINITAB, WE CAN EVALUATE THIS PROBABILITY EXACTLY (WE WOULD GET 15.19%, IN EXCELLENT AGREEMENT WITH OUR APPROXIMATION - HERE WE WOULD EXPECT AN ACCURACY OF ABOUT 0.25%).
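WITHOUT MINITAB, THE SAME COMPARISON CAN BE SKETCHED IN PYTHON. THE CODE BELOW COMPUTES THE NORMAL APPROXIMATION WITH THE CONTINUITY CORRECTION AND, SEPARATELY, SUMS THE BINOMIAL PROBABILITIES FOR 21, 22, ..., 100 SIXES. THE VARIABLE NAMES ARE OUR OWN, AND z IS NOT ROUNDED TO TWO DECIMAL PLACES, SO THE APPROXIMATION MAY DIFFER FROM 15.15% IN THE LAST DIGIT.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 1 / 6
q = 1 - p

mean = n * p            # np
sd = sqrt(n * p * q)    # square root of npq

# Normal approximation with continuity correction: P(X > 20.5).
approx = 1 - NormalDist(mean, sd).cdf(20.5)

# Exact binomial probability of more than 20 sixes: sum over k = 21..100.
exact = sum(comb(n, k) * p**k * q**(n - k) for k in range(21, n + 1))

print(f"mean = {mean:.3f}, sd = {sd:.3f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```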

SIMILARLY, IF CUSTOMERS ARRIVE AT AN AVERAGE RATE OF 12.7 PER HOUR, THE PROBABILITY OF GETTING MORE THAN 120 CUSTOMERS IN AN EIGHT-HOUR SHIFT SHOULD BE COMPUTED USING THE POISSON FORMULA WITH λ = 12.7 × 8 = 101.6. IT IS A LOT EASIER (AND QUITE LEGITIMATE, SINCE OUR λ IS A LOT BIGGER THAN 30) TO ASSUME THAT THE DISTRIBUTION IS, TO A GOOD APPROXIMATION, NORMAL, WITH THE MEAN OF 101.6 AND STANDARD DEVIATION OF √101.6 = 10.08.

WE CAN THEN DO:

P(X > 120.5) = P( (X − 101.6)/10.08 > (120.5 − 101.6)/10.08 ) = P(Z > 1.875)

= 1.0000 − 0.9696 = 3.04%

WITH THE HELP OF MINITAB, THE EXACT ANSWER IS 3.31%...
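AS WITH THE DICE EXAMPLE, THE POISSON CALCULATION CAN BE SKETCHED IN PYTHON. THE CODE BELOW COMPUTES THE NORMAL APPROXIMATION P(X > 120.5) AND ALSO ADDS UP THE POISSON PROBABILITIES FOR 0 TO 120 CUSTOMERS TO GET THE EXACT UPPER TAIL. THE VARIABLE NAMES ARE OUR OWN; EACH POISSON TERM IS BUILT FROM THE PREVIOUS ONE TO AVOID COMPUTING LARGE FACTORIALS.

```python
from math import exp, sqrt
from statistics import NormalDist

lam = 12.7 * 8    # expected number of customers in an eight-hour shift
sd = sqrt(lam)    # Poisson standard deviation is the square root of the mean

# Normal approximation with continuity correction: P(X > 120.5).
approx = 1 - NormalDist(lam, sd).cdf(120.5)

# Exact Poisson: P(X > 120) = 1 - P(X <= 120).
term = exp(-lam)          # P(X = 0)
cumulative = term
for k in range(1, 121):
    term *= lam / k       # P(X = k) obtained from P(X = k - 1)
    cumulative += term
exact = 1 - cumulative

print(f"lambda = {lam:.1f}, sd = {sd:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact Poisson:        {exact:.4f}")
```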

