Lecture 11 + 12 notes

Author: John Doe
Course: Introduction to Statistics
Institution: The University of British Columbia


COMMERCE 291 – Lecture Notes 2021 – © Jonathan Berkowitz Not to be copied, used, or revised without explicit written permission from the copyright owner.

Summary of Lectures 11 and 12

More on the Normal; Surveys and Sampling

Why Is the Normal Distribution So Important?

In addition to the fact that so many phenomena can be modelled by the normal curve, there are other important properties.

#1. Adding Normally Distributed Random Variables
If X and Y are random variables each with a normal distribution, then X + Y also has a normal distribution. In short, adding "normals" gives a normal. And, since subtracting is the same as adding a "negative," X − Y is also normal. Multiplying by constants doesn't spoil things either, so aX ± bY is also normal.

Previously we learned the rules for computing the mean, variance, and standard deviation of combinations of random variables. Let's combine these two results. If X is N(μX, σX), Y is N(μY, σY), and X and Y are independent, then:

X + Y is N(μX + μY, √(σX² + σY²)), and

X − Y is N(μX − μY, √(σX² + σY²)).

Example 1. Fred and Neil are playing in a golf tournament. Record-keeping from previous years shows that Fred's scores are normally distributed with mean 110 and standard deviation 10, and that Neil's scores are normally distributed with mean 100 and standard deviation 8. They play independently. What is the probability that Fred will beat Neil?

Solution: Let X = Fred's score and Y = Neil's score. In golf, the low score wins, so Fred beats Neil when X − Y < 0. X − Y is N(110 − 100, √(10² + 8²)) = N(10, √164), so

Pr(X − Y < 0) = Pr(Z < (0 − 10)/√164) = Pr(Z < −0.78) = 0.2177, or about 22%.

For a count X with E(X) = 20 and SD(X) = 4, the chance of passing a test by guessing (i.e., of getting more than 30 correct) is:

Pr(X > 30) = Pr(Z > (30 − 20)/4) = Pr(Z > 2.5) = 0.0062 or 0.62%.

The chance of passing by guessing is very small, about 1 in 200!
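The combination rule can be checked numerically. Here is a short sketch using Python's standard-library statistics.NormalDist, which implements exactly the "add the means, combine the SDs as √(σX² + σY²)" rule for independent normals; the numbers are Fred's and Neil's from Example 1:

```python
from statistics import NormalDist

fred = NormalDist(mu=110, sigma=10)   # Fred's scores: N(110, 10)
neil = NormalDist(mu=100, sigma=8)    # Neil's scores: N(100, 8)

# For independent normals, X - Y is N(110 - 100, sqrt(10**2 + 8**2))
diff = fred - neil
print(diff.mean, diff.stdev)          # 10.0 and sqrt(164), about 12.81

# Fred beats Neil (low score wins) exactly when X - Y < 0
p_fred_wins = diff.cdf(0)
print(round(p_fred_wins, 3))          # about 0.217
```

The exact CDF gives 0.217, a shade below the table-lookup value 0.2177 because the table rounds Z to −0.78.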

Before we leave the Normal, I’ll remind you of the four “Normal” Excel Functions: NORM.DIST, NORM.INV, NORM.S.DIST, and NORM.S.INV. Make sure you are familiar with all of them.
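If you want to check the Excel functions outside a spreadsheet, Python's standard-library statistics.NormalDist offers close equivalents of all four; the inputs below are arbitrary illustrations:

```python
from statistics import NormalDist

z = NormalDist()                   # standard normal: mean 0, sd 1
x = NormalDist(mu=100, sigma=8)    # an arbitrary N(100, 8)

# NORM.S.DIST(1.96, TRUE): area to the left of z = 1.96
print(round(z.cdf(1.96), 4))       # 0.975

# NORM.S.INV(0.975): z-value with 97.5% of the area to its left
print(round(z.inv_cdf(0.975), 2))  # 1.96

# NORM.DIST(110, 100, 8, TRUE): Pr(X <= 110) for N(100, 8)
print(round(x.cdf(110), 4))

# NORM.INV(0.90, 100, 8): 90th percentile of N(100, 8)
print(round(x.inv_cdf(0.90), 1))
```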


Chapter 8: Surveys and Sampling

We now turn to a completely different topic, Surveys and Sampling. A survey is an observational study, where variables of interest are measured but not influenced. It is different from an experiment, where variables are actively manipulated.

Three Principles of Sampling

1. Examine part of the whole
Sampling means "take a subset (i.e., a sample) of a larger whole population and use the information about the sample to give information about the population." A sample is different from a census, which is a collection of data on all individuals in the population. Compared with a sample, a census is generally harder to complete, costlier, more time-consuming, cumbersome to carry out, and useless when destructive testing is involved.

A biased sample is one where the characteristics of the sample differ from the corresponding characteristics of the population. A biased sample has a systematic favouring of certain outcomes. Bias can arise from many sources, including poor wording of the question, undercoverage (leaving out subgroups of the population), nonresponse bias (individuals chosen for the sample refuse to participate), and response bias (attitude or behaviour of the interviewer or respondent, poor memory for the events being asked about, etc.).

A good sample should be representative of the population. That is hard to achieve. To get it, the best strategy is Principle #2.

2. Randomize
Use a "chance" process to prevent bias. Every sample is different: the differences are called sampling error or sampling variability. Note that "error" does not mean "mistake"!

3. The size of the sample is what matters
Sample size determines the generalizability of our conclusions, not the population size (as long as the population is very, very large). There are a number of ways of choosing a sample with randomization.


Terminology for Inference

• Population—the universe of interest; all individuals with a common characteristic of interest. Note that the portion of the population you can actually access and choose from is called the sampling frame.

• Sample—a subset (randomly chosen AND representative) which will represent the population. Note that if you have the entire population it is called a census.

• Parameter—a numerical fact or characteristic about the population of interest.

• Statistic/Estimate—a numerical fact or characteristic computed from the sample that is used to estimate a parameter of the population.

A "clever" mnemonic device:
Population ➔ Parameter
Sample ➔ Statistic / Estimate
Population and Parameter both start with "P"; Sample and Statistic both start with "S". The brilliant part is that even the first syllable of Estimate is pronounced "S"!

Example 1: Suppose you are interested in the mean household income of all Canadian households. The population is all Canadian households; the parameter is the mean household income. Only CRA has this information! However, a survey research firm contacts 500 households at random. These 500 households form the sample; the mean household income of the 500 households is the statistic/estimate.

Example 2: Forecasting Elections – two possible parameters are: a) average age of eligible voters; and b) percentage of eligible voters who actually voted. The population is "all eligible voters"; a polling company would take a sample of, say, 1000 voters.

How good a sample statistic is at estimating a population parameter depends on how representative the sample is. Remember, the best methods involve randomization.


Sampling Designs

• Simple Random Sampling (SRS)—every unit in the population has the same chance of being chosen for the sample. This is the usual approach.

• Stratified Random Sampling—split the population into homogeneous (i.e., similar) subgroups called strata; then choose an SRS in each stratum. This has the benefit of reducing sampling variability.

• Cluster Random Sampling—split the population into heterogeneous (i.e., different) subgroups called clusters, each of which represents the population. Select a few clusters at random and take all the units within each one. If each cluster is fairly representative, the sample will be unbiased.

• Multi-stage Cluster Sampling—repeatedly divide the population into smaller and smaller subgroups and, at each stage, use a chance procedure to pick the sample. This combines SRS and cluster sampling at multiple levels.

• Systematic Sampling—get an approximate SRS by taking units at regular intervals through the population, but with a random starting point.
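Three of the designs above are easy to sketch in code. Here is an illustration with Python's random module, on a made-up sampling frame of 1000 units with an arbitrary two-way strata split:

```python
import random

random.seed(1)                         # for reproducibility
frame = list(range(1000))              # a made-up sampling frame of 1000 units

# Simple Random Sampling: every unit has the same chance of selection
srs = random.sample(frame, 10)

# Systematic Sampling: random start, then every k-th unit
k = len(frame) // 10
systematic = frame[random.randrange(k)::k]

# Stratified Random Sampling: an SRS within each (arbitrary) stratum
strata = [frame[:400], frame[400:]]
stratified = [unit for s in strata for unit in random.sample(s, 5)]

print(len(srs), len(systematic), len(stratified))   # 10 10 10
```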

Note the difference between strata and clusters: strata are each homogeneous and different from one another; clusters are heterogeneous and similar to one another.

Don't rely on convenience sampling, voluntary response, or, worst of all, anecdotal evidence. All are subject to huge bias. Poor samples are those drawn by convenience sampling (choosing respondents that are easiest, that is, most convenient, to reach, such as family, friends, and neighbours). Poor samples are also those drawn by voluntary response, where respondents choose themselves. These are likely to be biased samples because people with strong, and usually negative, opinions are most likely to respond (e.g., open-line talk radio shows). Beware of anecdotal evidence, based on haphazardly selected cases which are not representative (e.g., call-in shows on talk radio, newspaper advice columns).

Observational studies vs. Experimental studies
In an observational study, the experimenter does not try to influence responses but simply measures variables of interest for a group of individuals. This is what surveys are. In an experiment, the experimenter imposes a treatment or intervention on individuals and studies their responses. Experiments are used to control for confounding factors (i.e., "lurking" variables) while comparing groups.


Chapter 9: Sampling Distributions and Confidence Intervals for Proportions

The most important, and trickiest, idea in quantifying uncertainty is sampling distributions. We learned previously that a statistic is a quantity computed from sample information. A "statistic" is used to "estimate" a population "parameter." If a statistic is based on a random sample, then the statistic is a random variable and so it has its own probability distribution, which we call a "sampling distribution."

Sampling Distribution of Proportion and Mean

Sampling Distribution: the theoretical distribution (think "histogram") of the values taken by a statistic if a large number of samples of the same size were drawn from the same population. In other words, it is the distribution of all possible values of a statistic (or estimate) if there were many, many, many repetitions of the sampling process! We need this in order to assess how much the sample statistic or estimate could be expected to change from sample to sample.

We will figure out the sampling distribution of two popular statistics:
• the sample proportion, p̂ (for categorical variables): Section 9.2
• the sample mean, x̄ (for quantitative variables): Section 9.3 (& 11.1)

Sampling Distribution for a Proportion (used for categorical data)

Suppose we take repeated random samples of size n (that is, repeated sets of n independent observations) where each observation can be one of only two possible outcomes—let's call them "success" and "failure." For example, in an opinion poll a "success" could be a "yes" vote and a "failure" a "no" vote. For each of the random samples of size n, the number of successes, X, has a Binomial distribution. Convert each X into a proportion p̂ by dividing by n. Call this the sample proportion:

p̂ = X/n, where X = # of successes.

Do this many, many times (hypothetically, of course) and draw a histogram of all the p̂'s. The shape of this histogram is called the sampling distribution of p̂.

Be careful not to confuse sample distribution and sampling distribution.
=> Sample distribution is the distribution of the individual data.
=> Sampling distribution is the distribution of the statistic or estimate.


The shape of this histogram is approximately Normal! This is not a surprise; we can approximate a Binomial distribution with a Normal distribution.

The centre of this histogram is: E(p̂) = Mean(p̂) = p

The spread of this histogram is: SD(p̂) = √(pq/n)

Thus, the sampling distribution for p̂, the sample proportion, if the values are a random sample (i.e., independent), is approximately Normal with mean p and standard deviation √(pq/n), IF n is large enough.

Remember that p is the true proportion in the population (i.e., it is the value of the parameter of interest here).

Interpretation

When we say the mean of p̂ is p, we are saying the Expected Value of p̂, or long-run average value of p̂, is p. Hence p̂ is an unbiased estimate of p, and we can use p̂ to estimate p. For example, if a survey of 1000 people shows that 630 answered Yes to a question of interest, then p̂ = 630/1000 = 0.63 or 63%. That 63% is an estimate of the true proportion of Yes responses in the entire population.

When we say that the standard deviation of p̂ is √(pq/n), we are saying that the typical distance from p̂ (your estimate) to p (the truth) is about √(pq/n).

When we say that the sampling distribution is Normal, we are saying that p̂ will usually be close to p and occasionally far away, and we can compute the probabilities of how close or how far away using the normal curve.
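All three claims (mean p, spread √(pq/n), roughly normal shape) can be seen in a quick simulation; this sketch uses a made-up true proportion p = 0.6 and sample size n = 100:

```python
import random
from statistics import mean, stdev

random.seed(2)
p, n, reps = 0.6, 100, 20_000   # made-up proportion, sample size, repetitions

# Each repetition: draw n yes/no observations and record the sample proportion
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

print(round(mean(p_hats), 3))   # close to p = 0.6
print(round(stdev(p_hats), 3))  # close to sqrt(0.6 * 0.4 / 100), about 0.049
```

A histogram of p_hats would look bell-shaped, centred at 0.6.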

We can turn p̂ into Z by standardizing:

Z = (p̂ − p) / √(pq/n)



Assumptions and Conditions

How large an n is "large enough"?
• 10% Condition: The population should be at least 10 times as big as the sample.
• Success/Failure Condition: np > 10 and nq > 10.
And don't forget, the sample must be a RANDOM sample.
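The two conditions are simple enough to encode. This helper (a sketch, not part of the notes) flags whether the Normal approximation for p̂ is reasonable:

```python
def normal_approx_ok(n, p, population_size):
    """Conditions from the notes for using the Normal approximation
    to the sampling distribution of p-hat."""
    ten_percent = population_size >= 10 * n             # 10% Condition
    success_failure = n * p > 10 and n * (1 - p) > 10   # np > 10 and nq > 10
    return ten_percent and success_failure

print(normal_approx_ok(100, 0.2, 10_000))   # True
print(normal_approx_ok(100, 0.05, 10_000))  # False: np = 5
```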


Sampling Distribution for a Mean (used for quantitative data)

Suppose we take repeated random samples of size n (that is, repeated sets of n independent observations) of quantitative observations from any population with mean µ and standard deviation σ. For each sample, compute the sample mean x̄. (Note: the text calls it ȳ.)

Draw a histogram of all the x̄'s. The shape of this histogram is called the sampling distribution of x̄. It turns out that the shape is...
... exactly Normal if the distribution of the original population of observations is Normal
... approximately Normal even if the distribution of the original population is not Normal (as long as the sample size, n, is large enough)
More about this later.

The centre of this histogram is: E(x̄) = Mean(x̄) = µ

The spread of this histogram is: SD(x̄) = σ/√n

Thus, the sampling distribution for x̄, the sample mean, if the values are a random sample (i.e., independent), is approximately Normal with mean µ and standard deviation σ/√n, IF n is large enough.

We will discuss the "approximate" part in a few moments. Remember that µ is the true mean of the original population, and σ is the true standard deviation of the original population. (Although there are two parameters here, we are really only going to be interested in the value of µ.)

Interpretation

The "root n" in the denominator explains why the mean of several observations is more precise (i.e., has a smaller standard deviation) than a single observation. (In fact, statisticians can show that among all unbiased estimates of µ, x̄ is the most precise, i.e., has the smallest standard deviation.)

When we say the mean of x̄ is µ, we are saying the Expected Value of x̄, or long-run average value of x̄, is µ. Hence x̄ is an unbiased or fair estimate of µ, and we can use x̄ to estimate µ.

When we say that the standard deviation of x̄ is σ/√n, we are saying that the typical distance from x̄ (your estimate) to µ (the truth) is about σ/√n.


When we say that the sampling distribution is Normal, we are saying that x̄ will usually be close to µ and occasionally far away, and we can compute the probabilities of how close or how far away using the normal curve.

Earlier I said that the shape of the sampling distribution was either exactly Normal or approximately Normal, depending on the population distribution. Let's look at this in more detail.

Case 1. If the original population is Normal, then the sampling distribution of x̄ is exactly normal (not just approximately normal), regardless of sample size. That is, if you start with "normal," you retain the "normality."

Case 2. If the original population has ANY distribution (even a non-normal one), then the sampling distribution of x̄ is approximately normal, if n is large enough.

Case 2 is called the Central Limit Theorem (C.L.T.) and is probably the single most important result in all of statistics!

How large an "n" is needed? It depends on the distribution of the population. If the distribution is reasonably symmetric, 30 is large enough. If the distribution is really skewed, you may need a sample of at least 100. For most practical situations, however, an "n" of 30 is sufficient.

Here's a non-technical explanation of the Central Limit Theorem: if you add enough random quantities together, the sum is approximately normally distributed.
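The Central Limit Theorem is easy to watch happen. This simulation (an illustration; the population choice is arbitrary) draws samples of n = 30 from a clearly right-skewed Exponential population with mean 1 and standard deviation 1:

```python
import random
from statistics import mean, stdev

random.seed(3)
n, reps = 30, 10_000

# Sample means from a right-skewed Exponential(1) population (mu = sigma = 1)
xbars = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(mean(xbars), 2))    # close to mu = 1
print(round(stdev(xbars), 3))   # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183

# Roughly 95% of sample means should land within mu +/- 2 * sigma / sqrt(n)
se = 1 / 30 ** 0.5
within_2se = sum(1 - 2 * se < x < 1 + 2 * se for x in xbars) / reps
print(round(within_2se, 2))
```

Even though single observations are strongly skewed, the x̄'s already behave almost normally at n = 30.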

We can turn x̄ into Z by standardizing:

Z = (x̄ − µ) / (σ/√n)

Assumptions and Conditions

How large an n is "large enough"?
• 10% Condition: The population should be at least 10 times as big as the sample.
• Typically, but not always(!), a sample size greater than 30 is sufficient for the Central Limit Theorem to "work".
And don't forget, the sample must be a RANDOM sample.


Example 1 Revisited. What is the probability of getting between 40 and 60 heads in 100 tosses of a fair coin?

Solution: Previously we answered this by working with X, the count, where E(X) = 100(0.5) = 50 and SD(X) = √(100(0.5)(0.5)) = 5.

Pr(40 < X < 60) = Pr((40 − 50)/5 < Z < (60 − 50)/5) = Pr(−2 < Z < 2) ≈ 0.95

The passing-by-guessing probability can likewise be recomputed in terms of the sample proportion p̂, with p = 0.2 and n = 100:

Pr(p̂ > 0.3) = Pr(Z > (0.3 − 0.2)/√(0.2(0.8)/100)) = Pr(Z > 2.5) = 1 − 0.9938 = 0.0062 or 0.62%
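The coin-toss count can be checked in one line with statistics.NormalDist:

```python
from statistics import NormalDist

# X = # of heads in 100 fair tosses, approximately N(50, 5)
heads = NormalDist(mu=50, sigma=5)
prob = heads.cdf(60) - heads.cdf(40)   # Pr(40 < X < 60)
print(round(prob, 4))                  # 0.9545, the "within 2 SDs" value
```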

Example 3. Political Survey
In a recent election, a Member of Parliament (MP) received 52% of the votes cast. One year after the election, the MP organized a survey that asked a random sample of 300 people whether they would vote for him in the next election. If we assume that his popularity has not changed, what is the probability that more than half of the sample would vote for him?

Solution: Find the probability that the sample proportion is greater than 50%; that is, Pr(p̂ > 0.50). We know that the sample proportion p̂ is approximately normally distributed with mean p = 0.52 and standard deviation √(pq/n) = √((0.52)(0.48)/300) = 0.0288.

Pr(p̂ > 0.50) = Pr((p̂ − p)/√(pq/n) > (0.50 − 0.52)/√((0.52)(0.48)/300)) = Pr(Z > −0.69) = 0.7549 or 75.49%
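Checking Example 3 numerically (the exact normal CDF, rather than a z-table with Z rounded to −0.69, so the last digits differ slightly from 0.7549):

```python
from statistics import NormalDist

p, n = 0.52, 300
se = (p * (1 - p) / n) ** 0.5    # sqrt(pq/n), about 0.0288
p_hat = NormalDist(mu=p, sigma=se)
prob = 1 - p_hat.cdf(0.50)       # Pr(p-hat > 0.50)
print(round(se, 4), round(prob, 3))
```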


If we assume that the level of support remains at 52%, the probability that more than half the sample of 300 people would vote for the representative is just over 75%.

Example 4: Salaries of a Business School's Graduates (Source: Keller)
Deans and other faculty members in professional schools often monitor how well the graduates of their programs fare in the job market. Information about the types of jobs and their salaries may provide useful information about the success of the program. In the advertisements for a large university, the dean of the School of Business claims that the average salary of the school's graduates one year after graduation is $800 per week with a standard deviation of $100. A second-year student in the business school who has just completed his statistics course would like to check whether the claim about the mean is correct. He does a survey of 25 people who graduated one year ago and determines their weekly salary. He discovers the sample mean to be $750. To interpret his finding he needs to calculate the probability that a sample of 25 graduates would have a mean of $750 or less when the population mean is $800 and the standard deviation is $100. After calculating the probability, he needs to draw some conclusion.

Solution: We want the probability that the sample mean is less than $750: Pr(x̄ < 750). The distribution of X, the weekly income, is likely to be positively skewed, but not sufficiently so to make the distribution of x̄ non-normal. Hence, we may assume that x̄ is normal with mean µ = 800 and standard deviation σ/√n = 100/√25 = 20.

Pr(x̄ < 750) = Pr((x̄ − µ)/(σ/√n) < (750 − 800)/(100/√25)) = Pr(Z < −2.50) = 0.0062 or 0.62%

The probability of observing a sample mean as low as $750 when the population mean is $800 is extremely small. Because this event is quite unlikely, we would have to conclude that the dean’s claim is not justified.
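The salary computation, checked with statistics.NormalDist:

```python
from statistics import NormalDist

mu, sigma, n = 800, 100, 25
xbar = NormalDist(mu=mu, sigma=sigma / n ** 0.5)   # sampling distribution: N(800, 20)
prob = xbar.cdf(750)                               # Pr(xbar < 750)
print(round(prob, 4))                              # 0.0062
```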


Example 5: Elevator Safety
A sign in an elevator states that the load limit is 2640 pounds and that no more than 16 persons may occupy the elevator at one time. Assume a normal distribution of weights of elevator riders, with mean 156 pounds and standard deviation 12 pounds. (That is, X, the weight of one rider, is normal with mean 156 and standard deviation 12.) What is the chance of exceeding the load limit with 16 randomly chosen riders?

Solution: If the total weight limit is 2640, divide by 16; the maximum allowable mean weight is 165 pounds per person. The probability of exceeding the load limit is the probability that the mean weight of 16 riders is greater than 165.

Pr(x̄ > 165) = Pr((x̄ − µ)/(σ/√n) > (165 − 156)/(12/√16)) = Pr(Z > 3.00) = 0.0013 or 0.13%. You are very safe!

Compare this with the probability that ONE randomly chosen rider exceeds 165 pounds:

Pr(X > 165) = Pr((X − µ)/σ > (165 − 156)/12) = Pr(Z > 0.75) = 0.2266 or 23%
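Both elevator probabilities, checked with statistics.NormalDist, show how much tighter the distribution of a mean of 16 riders is than the distribution of one rider:

```python
from statistics import NormalDist

mu, sigma, n = 156, 12, 16

# Mean weight of 16 riders: N(156, 12 / sqrt(16)) = N(156, 3)
xbar = NormalDist(mu=mu, sigma=sigma / n ** 0.5)
print(round(1 - xbar.cdf(165), 4))    # Pr(xbar > 165) = 0.0013

# A single rider: N(156, 12)
rider = NormalDist(mu=mu, sigma=sigma)
print(round(1 - rider.cdf(165), 4))   # Pr(X > 165) = 0.2266
```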
...

