STAT2040 notes - W21 Jeremy Balka
Statistics I, University of Guelph

Unit 1 (Chapters 1 & 2)
● Descriptive statistics: plots and numerical summaries are used to describe the distribution of variables, e.g. using a histogram or a boxplot.
○ Boxplots are very useful for comparing the distributions of two or more groups.
● In statistical inference we attempt to make appropriate statements about population parameters based on sample statistics.
● A statistically significant difference means it would be very unlikely to observe a difference of that size if, in reality, the groups had the same true mean, thus giving strong evidence that the observed effect is a real one.
● Individuals (or units, or cases) are the objects on which a measurement is taken. In this example, the bags of popcorn are the units.
● The population is the set of all individuals or units of interest to an investigator. In this example, the population is all bags of popcorn of this type.
○ A population may be finite or infinite.
● A parameter is a numerical characteristic of a population. Examples: the mean, the proportion.
● A sample is a subset of individuals or units selected from the population. The 20 bags of popcorn that were selected are a sample.
● A statistic is a numerical characteristic of a sample. The sample mean of the 20 bags (1210 calories) is a statistic.
○ The value of a parameter is typically unknown, but the value of a statistic is known once the sample is taken.
● Voluntary response sample: people choose whether to respond or not; such samples tend to be strongly biased, since individuals are more likely to respond if they feel very strongly about the issue and less likely to respond if they are indifferent. We avoid this bias by randomly selecting members of the population for our sample.
● Simple random sampling: oftentimes the procedures will be appropriate only if the sample is a simple random sample (SRS) from the population of interest.
○ In a simple random sample of size n from a finite population, each possible sample of size n has the same chance of being selected.
● Stratified random sampling: the population is divided into different strata (e.g. the provinces), then a simple random sample is conducted within each stratum (province).
○ An advantage of this design is that adequate representation from each stratum is assured.
● Cluster sampling: the population is divided into clusters (e.g. the different communities), then the researchers draw a simple random sample of clusters. Within each cluster, the researchers survey all members of the population of interest. (A code sketch of these three designs follows below.)
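As a rough sketch of these three designs (toy population, strata, and clusters invented for illustration, using only Python's standard library):

```python
import random

random.seed(1)  # reproducible example

# Toy population: 12 hypothetical units
population = [f"unit{i}" for i in range(12)]

# Simple random sample: every subset of size n is equally likely
srs = random.sample(population, k=4)

# Stratified sampling: take an SRS within each stratum
strata = {"A": population[:6], "B": population[6:]}
stratified = [u for s in strata.values() for u in random.sample(s, k=2)]

# Cluster sampling: take an SRS of clusters, then survey everyone in them
clusters = [population[i:i + 3] for i in range(0, 12, 3)]
chosen_clusters = random.sample(clusters, k=2)
cluster_sample = [u for c in chosen_clusters for u in c]

print(srs, stratified, cluster_sample, sep="\n")
```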

Observational study (as opposed to an experiment): in an observational study the researchers observe and measure variables, but do not impose any conditions.
● Observational studies do not typically yield strong evidence of a causal relationship.
● The response variable is the variable of interest in the study; an explanatory variable explains or possibly causes changes in the response variable.
● Observational studies in and of themselves do not give strong evidence of a causal relationship. The relationship may be causal, but a single observational study cannot establish this.
● Lurking variables: other unmeasured variables, related to both the explanatory and response variables, that influence the interpretation of the results of an observational study.
Experiment: the researchers applied different conditions (the diets) to the rats. The different diet groups were not pre-existing; they were created by the researchers. In experiments the experimental units are randomly assigned to the different groups wherever possible.
● Well-designed experiments can give strong evidence of a cause-and-effect relationship.
○ If we determine there is a relationship between the explanatory variable and the response variable (using techniques we will learn later), we can be confident that the explanatory variable causes changes in the response variable.
Confounding: two variables are confounded if it is impossible to separate their effects on the response.
● A confounding variable may be a lurking variable, but it may also be a measured variable in the study.
● Confounding variables can occur in both experiments and observational studies.

Unit 2 (Chapter 3)
● A categorical variable is a variable that falls into one of two or more categories.
○ Sometimes referred to as qualitative variables.
Plots for categorical variables
● Frequency table
○ The frequency is the number of observations in a category.
○ The relative frequency is the proportion of observations in a category: relative frequency = frequency / n, where n represents the total number of observations in the sample.
○ The percent relative frequency is the relative frequency expressed as a percentage: percent relative frequency = (frequency / n) × 100%.
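A frequency table with relative and percent relative frequencies can be computed in a few lines (the fruit data is invented for illustration):

```python
from collections import Counter

# Hypothetical categorical data
data = ["apple", "banana", "apple", "cherry", "apple", "banana"]
n = len(data)

freq = Counter(data)
for category, f in freq.most_common():  # sorted largest-first, Pareto-style
    print(f"{category:8s} freq={f}  rel={f / n:.3f}  pct={100 * f / n:.1f}%")
```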

● A bar graph (bar chart) illustrates the distribution of a categorical variable.
○ The categories are often placed on the x-axis, with the frequency, relative frequency, or percent relative frequency on the y-axis.
○ A Pareto diagram is a bar graph in which the frequencies (y-axis) are sorted from largest to smallest.
● Pie charts display the same type of data from a different perspective.
○ In a pie chart, the relative frequency of each category is represented by the area of the pie segment.

A quantitative variable is a numeric variable that represents a measurable quantity. Histograms and boxplots will be the primary methods of illustrating the distribution of a quantitative variable. Stemplots and dot plots will be used on occasion.
● A histogram is a plot of the class frequencies, relative frequencies, or percent relative frequencies against the class boundaries (or class midpoints).
○ To create a histogram, we first create a frequency table. In a frequency table, a quantitative variable is divided into a number of classes (also known as bins), and the class boundaries and frequency of each class are listed.
○ The cumulative frequency of a class is the number of observations in that class and any lower class.
○ The percent relative cumulative frequency of a class is the percentage of observations in that class and any lower class.
○ For mound-shaped distributions, the standard deviation often falls within or close to the interval Range/4 to Range/6 (i.e. between Range/6 and Range/4).
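A minimal check of this guideline on simulated mound-shaped data (values generated from a normal distribution; not data from the notes):

```python
import random
import statistics

random.seed(2)
data = [random.gauss(100, 15) for _ in range(500)]  # roughly mound-shaped

s = statistics.stdev(data)          # sample standard deviation
data_range = max(data) - min(data)  # Range = maximum - minimum

print(f"s = {s:.2f}")
print(f"Range/6 = {data_range / 6:.2f}, Range/4 = {data_range / 4:.2f}")
# s typically lands in or near the interval [Range/6, Range/4]
```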

● A stemplot, also known as a stem-and-leaf display, is a different type of plot that is similar to a histogram. Stemplots are easier to construct by hand than histograms, but unlike histograms, stemplots retain the exact data values.
○ To construct a stemplot: 1. Split each observation into a stem and a leaf. 2. List the stems in ascending order in a column. 3. List the leaves in ascending order next to their corresponding stem. (A code sketch of these steps follows this list.)
● In a symmetric distribution, the left and right sides are mirror images.
○ Right skewness is often seen in variables that involve time to an event, or variables such as housing prices and salaries.
○ Left skewness is not as common.
● The distributions are unimodal (they have a single peak). Most distributions we deal with will be unimodal. But distributions can be bimodal (two peaks) or multimodal (multiple peaks).
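A minimal sketch of the three construction steps, using invented two-digit data:

```python
from collections import defaultdict

# Hypothetical two-digit data values
data = [12, 15, 21, 24, 24, 27, 31, 33, 38, 41]

# 1. Split each observation into a stem (tens digit) and a leaf (ones digit)
stems = defaultdict(list)
for x in sorted(data):          # sorting puts leaves in ascending order
    stems[x // 10].append(x % 10)

# 2-3. List stems in ascending order, with leaves beside their stem
for stem in sorted(stems):
    leaves = "".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
# Output:
# 1 | 25
# 2 | 1447
# 3 | 138
# 4 | 1
```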

Numerical Measures
● In right-skewed distributions, the mean is greater than the median.
● In left-skewed distributions, the mean is less than the median.
● For a perfectly symmetric distribution, the mean and median will be equal. (For an approximately symmetric distribution, the mean and median will be close in value.)
● The mean uses more information than the median, and has some nice mathematical properties that make it the measure of choice for many statistical inference procedures. But the mean can sometimes give a misleading measure of center, since it can be strongly influenced by skewness or extreme values. In these situations, the median is often the preferred descriptive measure of center.

Measures of variability
● The simplest measure of variability is the range: Range = Maximum − Minimum.
● The best measures of variability are based on deviations (xi − x̄) from the mean.
○ For any set of observations, the deviations sum to 0.
● The mean absolute deviation (MAD) is:
○ MAD = Σ|xi − x̄| / n
○ We will use the MAD only sparingly, and only as a descriptive measure of variability.
● The sample variance s² is (roughly) the average squared distance from the mean:
○ s² = Σ(xi − x̄)² / (n − 1)
○ The units of the sample variance are the square of the units of the original variable.
● The (sample) standard deviation is defined to be the square root of the sample variance:
○ s = √s²
● The variance and standard deviation cannot be negative (s² ≥ 0, s ≥ 0). They both have a minimum value of 0, and equal exactly 0 only if all observations in the data set are equal. The larger the variance or standard deviation, the more variable the data set.
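These definitions translate directly into code; the data below is a toy example:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # toy data
n = len(data)
xbar = statistics.mean(data)

mad = sum(abs(x - xbar) for x in data) / n           # mean absolute deviation
var = sum((x - xbar) ** 2 for x in data) / (n - 1)   # sample variance s^2
sd = var ** 0.5                                      # sample standard deviation s

assert abs(sd - statistics.stdev(data)) < 1e-12      # matches the library
print(f"MAD={mad}, s^2={var:.4f}, s={sd:.4f}")
```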

Empirical rule. (Although it's called a rule, it's really more of a rough guideline.) The empirical rule states that for mound-shaped distributions:
● Approximately 68% of the observations lie within 1 standard deviation of the mean.
● Approximately 95% of the observations lie within 2 standard deviations of the mean.
● All or almost all of the observations lie within 3 standard deviations of the mean.
The empirical rule does not apply to skewed distributions, but if the skewness is not very strong we may still think of it as a very rough guideline.
● A z-score measures how many standard deviations an observation is above or below the mean.
○ The z-score for the ith observation in the sample is zi = (xi − x̄) / s
○ If the population parameters µ and σ are known, then: z = (x − µ) / σ
● The mean of all z-scores in a data set will always be 0, and the standard deviation of all z-scores in a data set will always be 1.
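A short sketch computing z-scores for toy data and verifying the mean-0, sd-1 property:

```python
import statistics

data = [61, 72, 85, 90, 97]  # toy data
xbar, s = statistics.mean(data), statistics.stdev(data)

z = [(x - xbar) / s for x in data]  # z_i = (x_i - xbar) / s
print([f"{zi:+.2f}" for zi in z])

# The z-scores themselves always have mean 0 and (sample) sd 1
assert abs(statistics.mean(z)) < 1e-12
assert abs(statistics.stdev(z) - 1) < 1e-12
```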

● The pth percentile is the value of the variable such that p% of the ordered data values are at or below this value.
○ One rule for calculating percentiles (implemented in the sketch after this list):
■ Order the observations from smallest to largest.
■ Calculate n × p/100 (n is the number of observations; p is the desired percentile).
■ If n × p/100 is not an integer, round up to the next largest whole number. The pth percentile is the value with that rank in the ordered list.
■ If n × p/100 is an integer, the pth percentile is the average of the values with ranks n × p/100 and n × p/100 + 1 in the ordered list.
○ Quartiles
■ The first quartile (Q1) is the 25th percentile. The second quartile is the 50th percentile (the median). The third quartile (Q3) is the 75th percentile.
■ The interquartile range (IQR) is the distance between the first and third quartiles: IQR = Q3 − Q1.
○ The five-number summary is a listing of the five values; often illustrated with a boxplot:
■ Minimum, Q1, Median, Q3, Maximum
● Boxplots: illustrate the distribution of a quantitative variable.
○ Boxplots are very useful for comparing two or more distributions.
○ Specifically, a boxplot is made up of:
■ A box extending from Q1 to Q3.
■ A line through the box indicating the median.
■ Lines (whiskers) extending from the box to the largest and smallest observations (to a maximum length of 1.5 × IQR). Any observation outside of these values will be considered an outlier.
■ Outliers are plotted individually outside the whiskers (using lines, dots, asterisks, or another symbol).
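A minimal implementation of the percentile rule above, on toy data. (Note that software packages often use slightly different percentile rules, so results may differ from other tools.)

```python
import math

def percentile(data, p):
    """The notes' rule: rank n*p/100, rounding up when fractional,
    averaging adjacent ranks when n*p/100 is an integer."""
    xs = sorted(data)
    n = len(xs)
    k = n * p / 100
    if k != int(k):                  # not an integer: round up
        return xs[math.ceil(k) - 1]  # ranks are 1-based
    k = int(k)
    return (xs[k - 1] + xs[k]) / 2   # integer: average ranks k and k+1

data = [15, 20, 35, 40, 50]          # toy data
q1, med, q3 = (percentile(data, p) for p in (25, 50, 75))
print(q1, med, q3, "IQR =", q3 - q1)
```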

A linear transformation is of the form x* = a + bx.
● If we add a constant to every value in the data set, the mean increases by that constant, but the standard deviation does not change.
● Multiplying by a constant changes both measures of location and measures of variability.
● To obtain the mean of the transformed variable, simply apply the linear transformation to the mean of the old variable: x̄* = a + bx̄. This holds true for the median as well: median(x*) = a + b·median(x). (The same relationship holds for other percentiles as well, provided b ≥ 0.)
● But the additive constant a does not affect the measures of variability:
○ s(x*) = |b|·s(x)
○ IQR(x*) = |b|·IQR(x)
○ s²(x*) = b²·s²(x)
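A quick numerical check of these transformation rules, with toy data and arbitrarily chosen constants a and b:

```python
import statistics

x = [3, 5, 8, 12]          # toy data
a, b = 10, -2              # arbitrary transformation constants
xstar = [a + b * xi for xi in x]

# The mean transforms directly; the sd picks up |b| and ignores a
assert statistics.mean(xstar) == a + b * statistics.mean(x)
assert abs(statistics.stdev(xstar) - abs(b) * statistics.stdev(x)) < 1e-12
print(statistics.mean(xstar), statistics.stdev(xstar))
```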

Unit 3 (Chapter 4)
● The sample space of an experiment, represented by S, is the set of all possible outcomes of the experiment.
○ The individual outcomes in the sample space are called sample points.
■ e.g. S = {1, 2, 3, 4, 5, 6}
■ The sample points must be mutually exclusive (no two sample points can occur on the same trial) and collectively exhaustive (the collection of sample points contains all possible outcomes).
● An event is a subset of the sample space (a collection of sample points).
○ E.g. for the sample space S = {1, 2, 3, 4, 5, 6}, let event E represent rolling a one, two, or three: E = {1, 2, 3}.
● If all of the sample points are equally likely, then the probability of an event A is:
P(A) = (number of sample points that make up A) / (total number of sample points)
○ For any event A, 0 ≤ P(A) ≤ 1. (All probabilities lie between 0 and 1.)
○ For any sample space S, P(S) = 1. (One of the outcomes in the sample space must occur.)
● The intersection of events A and B is the event that both A and B occur. The intersection of events A and B is denoted by A ∩ B, A and B, or simply AB.
○ Events are mutually exclusive if they have no sample points in common. (A and B are mutually exclusive if and only if P(A ∩ B) = 0.)

● The union of events A and B is the event that either A or B or both occurs. The union is denoted by A ∪ B, or sometimes simply A or B.
○ To find the probability of the union of two events, we can use the addition rule:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
○ Note that if A and B are mutually exclusive, then P(A ∩ B) = 0 and the addition rule simplifies to: P(A ∪ B) = P(A) + P(B).
● The complement of an event A, denoted by Aᶜ, is the event that A does not occur. Aᶜ is the set of all outcomes that are not in A.
○ A and Aᶜ are mutually exclusive events that cover the entire sample space, and thus: P(A) + P(Aᶜ) = 1 and P(Aᶜ) = 1 − P(A).
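A minimal sketch treating events as Python sets for a fair die (equally likely sample points assumed), verifying the addition and complement rules:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one fair die
A = {1, 2, 3}            # event: roll a one, two, or three
B = {2, 4, 6}            # event: roll an even number

def P(event):
    # Equally likely sample points: |event| / |S|
    return Fraction(len(event), len(S))

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Complement rule: P(Aᶜ) = 1 − P(A)
assert P(S - A) == 1 - P(A)
print(P(A | B))  # 5/6
```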

● The conditional probability of event A, given that B has occurred, is:
P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
● The formal definition of independence:
○ Events A and B are independent if and only if P(A ∩ B) = P(A) · P(B).
○ For events with non-zero probability, this implies that P(A|B) = P(A) and P(B|A) = P(B). When two events are independent, the occurrence or nonoccurrence of one event does not change the probability of the other event. We can use any of the following as a check for independence:
■ P(A ∩ B) = P(A) · P(B)
■ P(A|B) = P(A)
■ P(B|A) = P(B)
These statements are either all true or all false. If any one of these statements is shown to be true, then they are all true and A and B are independent. If any one of these statements is shown to be false, then they are all false and A and B are not independent. (If two events are not independent, they are called dependent.)
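Continuing the same die events, a sketch of the independence check; here A and B turn out to be dependent:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {1, 2, 3}            # low roll
B = {2, 4, 6}            # even roll

def P(event):
    return Fraction(len(event), len(S))

def cond(A, B):
    # P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
    return P(A & B) / P(B)

# P(A ∩ B) = 1/6, while P(A)·P(B) = 1/4, so the events are dependent
print(P(A & B) == P(A) * P(B))  # False
print(cond(A, B), P(A))         # P(A|B) = 1/3 differs from P(A) = 1/2
```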

● The multiplication rule: ○ P(A ∩ B) = P(A) · P(B|A) = P(B) · P(A|B)


● In its simplest form, Bayes' theorem is:
P(B|A) = P(B) · P(A|B) / P(A), provided P(A) > 0.

● The law of total probability (where B1, ..., Bk are mutually exclusive and exhaustive events):
○ P(A) = Σ_{i=1}^{k} P(Bi) · P(A|Bi)
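A sketch combining the law of total probability with Bayes' theorem, using invented diagnostic-test numbers (B = has the condition, A = tests positive; none of these values come from the notes):

```python
# Hypothetical inputs
p_B = 0.01           # P(B): prevalence
p_A_given_B = 0.95   # P(A|B): sensitivity
p_A_given_Bc = 0.05  # P(A|Bᶜ): false-positive rate

# Law of total probability over the partition {B, Bᶜ}:
# P(A) = P(B)·P(A|B) + P(Bᶜ)·P(A|Bᶜ)
p_A = p_B * p_A_given_B + (1 - p_B) * p_A_given_Bc

# Bayes' theorem: P(B|A) = P(B)·P(A|B) / P(A)
p_B_given_A = p_B * p_A_given_B / p_A
print(f"P(A) = {p_A:.4f}, P(B|A) = {p_B_given_A:.4f}")  # ≈ 0.0590, 0.1610
```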

● The number of ways x items can be chosen from n distinct items if the order of selection matters is: nPx = n! / (n − x)!
● The number of ways x items can be chosen from n distinct items if the order of selection does not matter is: nCx = n! / (x!(n − x)!)
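Both counting formulas are available directly in Python's standard library (3.8+):

```python
import math

n, x = 5, 2
print(math.perm(n, x))  # 5!/3! = 20 ordered selections
print(math.comb(n, x))  # 5!/(2!·3!) = 10 unordered selections

# Same values from the factorial formulas
assert math.perm(n, x) == math.factorial(n) // math.factorial(n - x)
assert math.comb(n, x) == math.factorial(n) // (math.factorial(x) * math.factorial(n - x))
```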

Unit 4 (Chapter 5)
● Discrete random variables can take on a countable number of possible values (which could be either finite or infinite).
○ For discrete random variables, all probabilities must lie between 0 and 1, and the probabilities must sum to 1.
● A continuous random variable takes on an infinite number of possible values, corresponding to all possible values in an interval.
○ Let Z represent the volume of water in a randomly selected 500 ml bottle of water. Here Z can take on any value between 0 and the maximum capacity of a 500 ml bottle.

● The expected value of a random variable is the theoretical mean of the random variable. For a discrete random variable X: E(X) = µX = Σ x·p(x), where the sum is over all x.
○ We use the symbol µX (or simply µ) to represent the mean of the probability distribution of X.
● To find the expectation of a function of a discrete random variable X (g(X), say):
○ E[g(X)] = Σ g(x)·p(x)
● The variance of a discrete random variable X is:
○ σ²X = E[(X − µ)²] = Σ (x − µ)²·p(x) = E(X²) − [E(X)]²
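A sketch for a made-up discrete distribution, checking the shortcut formula E(X²) − [E(X)]²:

```python
# Hypothetical pmf: values and their probabilities (must sum to 1)
pmf = {0: 0.2, 1: 0.5, 2: 0.3}
assert abs(sum(pmf.values()) - 1) < 1e-12

mu = sum(x * p for x, p in pmf.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())  # E[(X − µ)²]
ex2 = sum(x ** 2 * p for x, p in pmf.items())         # E(X²)

assert abs(var - (ex2 - mu ** 2)) < 1e-12             # shortcut formula agrees
print(mu, var)  # 1.1, 0.49
```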

● For constants a and b:
○ E(a + bX) = a + bE(X)
○ σ²(a+bX) = b²·σ²X (variance)
○ σ(a+bX) = |b|·σX (SD)
● If X and Y are two random variables:
○ E(X + Y) = E(X) + E(Y)
○ E(X − Y) = E(X) − E(Y)
○ σ²(X+Y) = σ²X + σ²Y + 2 · Covariance(X, Y)
○ σ²(X−Y) = σ²X + σ²Y − 2 · Covariance(X, Y)
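A simulation sketch of the variance-of-a-sum identity (all values invented; sample variance and covariance satisfy the same identity exactly, so the two sides agree up to rounding):

```python
import random
import statistics

random.seed(3)
n = 10_000
X = [random.gauss(0, 1) for _ in range(n)]
Y = [0.5 * x + random.gauss(0, 1) for x in X]  # Y is correlated with X

def cov(u, v):
    # Sample covariance with an n − 1 denominator
    mu, mv = statistics.mean(u), statistics.mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

lhs = statistics.variance([x + y for x, y in zip(X, Y)])
rhs = statistics.variance(X) + statistics.variance(Y) + 2 * cov(X, Y)
print(f"Var(X+Y) = {lhs:.4f}, identity RHS = {rhs:.4f}")
```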

● If X and Y are independent, their covariance is equal to 0, and things simplify a great deal:
○ Var(X + Y) = σ²(X+Y) = σ²X + σ²Y
○ Var(X − Y) = σ²(X−Y) = σ²X + σ²Y
● There are some common discrete probability distributions, such as the binomial distribution, the hypergeometric distribution, and the Poisson distribution.
● The binomial distribution is the distribution of the number of successes in n independent Bernoulli trials. There are n independent Bernoulli trials if:
○ There is a fixed number (n) of independent trials.
○ Each individual trial results in one of two possible mutually exclusive outcomes (these outcomes are labelled success and failure).
○ On any individual trial, P(Success) = p and this probability remains the same from trial to trial. (Failure is the complement of success, so on any given trial P(Failure) = 1 − p.)
● Let X be the number of successes in n trials. Then X takes on one of n + 1 possible values (0, 1, 2, ..., n), and X is a binomial random variable with probability mass function:
○ P(X = x) = (n choose x) · pˣ(1 − p)ⁿ⁻ˣ, where (n choose x) = n! / (x!(n − x)!)
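A minimal implementation of the binomial pmf (n, x, and p chosen arbitrarily for illustration):

```python
import math

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) · p^x · (1 − p)^(n − x)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# e.g. P(exactly 3 successes in 10 trials with p = 0.4)
print(binom_pmf(3, 10, 0.4))  # ≈ 0.2150

# Probabilities over all n + 1 possible values sum to 1
assert abs(sum(binom_pmf(x, 10, 0.4) for x in range(11)) - 1) < 1e-9
```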

● The hypergeometric distribution is related to the binomial distribution, but arises in slightly different situations. Suppose we are randomly sampling n objects without replacement from a source that contains a successes and N − a failures. Let X represent the number of successes in the sample. Then X has the hypergeometric distribution:
P(X = x) = [(a choose x) · (N − a choose n − x)] / (N choose n), for x = Max(0, n + a − N), ..., Min(a, n).
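The same approach works for the hypergeometric pmf; the deck-of-cards numbers below are a standard illustration, not from the notes:

```python
import math

def hypergeom_pmf(x, N, a, n):
    # P(X = x) = C(a, x)·C(N − a, n − x) / C(N, n)
    return math.comb(a, x) * math.comb(N - a, n - x) / math.comb(N, n)

# e.g. draw 5 cards from a 52-card deck: P(exactly 2 of the 13 hearts)
N, a, n = 52, 13, 5
print(hypergeom_pmf(2, N, a, n))  # ≈ 0.2743

# Probabilities over the support sum to 1
support = range(max(0, n + a - N), min(a, n) + 1)
assert abs(sum(hypergeom_pmf(x, N, a, n) for x in support) - 1) < 1e-9
```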

● If the sampling were carried out with replacement, then the binomial distribution would be the correct probability distribution. If the sampling is carried out without replacement, but the number sampled (n) is only a small proportion of the total number of objects (N), then the probability calculated based on the binomial distribution would be close to the correct probability calculated based on the hypergeometric distribution.
● Certain conditions need to be true in order for a random variable to have a Poisson distribution. Suppose we are counting the number of occurrences of an event in a fixed unit of time, and events can be thought of as occurring randomly and independently in time. Then X, the number of occurrences in a fixed unit of time, has a Poisson distribution:
P(X = x) = λˣ·e^(−λ) / x!
○ The parameter λ represents the theoretical mean number of events in the time period under discussion.
○ For a Poisson random variable, the mean and variance are both equal to the parameter λ: µ = λ, σ² = λ.
○ A Poisson random variable is a discrete random variable, taking on the values 0, 1, 2, ...
● There is a relationship between the binomial and Poisson distributions. As n → ∞ and p → 0, while λ = np is held constant, the binomial distribution tends toward the Poisson distribution. In practice, this means that we can use the Poisson distribution to approximate the binomial distribution if n is large and p is small.
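A sketch of the Poisson pmf and the binomial-to-Poisson approximation (n and p invented, chosen so that n is large and p is small):

```python
import math

def poisson_pmf(x, lam):
    # P(X = x) = λ^x · e^(−λ) / x!
    return lam**x * math.exp(-lam) / math.factorial(x)

def binom_pmf(x, n, p):
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Large n, small p: binomial probabilities ≈ Poisson with λ = np
n, p = 1000, 0.002
lam = n * p  # λ = 2
for x in range(5):
    print(f"x={x}: binom={binom_pmf(x, n, p):.5f}  poisson={poisson_pmf(x, lam):.5f}")
```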

