Revision for the final exam with summary of each chapter PDF

Title Revision for the final exam with summary of each chapter
Author ngan huynh
Course Statistics for Business
Institution Trường Đại học Kinh tế Thành phố Hồ Chí Minh
Pages 12



Description

CHAP 2: DATA COLLECTION 2.1 Variables and Data 

Observation: a single member of a collection of items that we want to study, such as a person, firm, or region.



Variable: a characteristic of the subject or individual, such as an employee’s income or an invoice amount.



Data set: consists of all the values of all of the variables for all of the observations we have chosen to observe.

Data set       Variables       Example               Typical Tasks
Univariate     One             Income                Histograms, basic statistics
Bivariate      Two             Income, Age           Scatter plots, correlation
Multivariate   More than two   Income, Age, Gender   Regression model

Types of data:
1. Categorical (qualitative): 1. Verbal label 2. Coded
2. Numerical (quantitative): 1. Discrete 2. Continuous
-Time series data: each observation in the sample represents a different, equally spaced point in time (years, months, days)
-Cross-sectional data: each observation represents a different individual unit (e.g., a person) at the same point in time

2.2 Level of measurement

Level of measurement   Characteristics                                  Example
Nominal                Categories only                                  Eye color (blue, brown, green)
Ordinal                Rank has meaning. No clear meaning to distance   Rarely, never
Interval               Distance has meaning                             Temperature
Ratio                  Meaningful zero exists                           Accounts payable

2.3 Sampling concept

-A population involves all of the items one is interested in. It may be finite.
-A sample is a subset of the population and involves looking only at some of the items from the population.
-A census is an examination of all items in a population.
-A parameter is a measurement or characteristic of the population.
-A statistic is a numerical value computed from a sample.

2.4 Sampling method

Random sampling method   Description
Simple Random Sample     Use random numbers to select items from a list
Systematic Sample        Select every kth item from a list or sequence
Stratified Sample        Select randomly within defined strata (by age, occupation, gender)
Cluster Sample           Select random geographical regions (e.g., zip codes) that represent the population

-If we do not allow duplicates when sampling, then we are sampling without replacement; if we allow duplicates, we are sampling with replacement.
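The first three random sampling methods can be sketched in Python as follows. This is only an illustration: the population of 100 item IDs, the fixed seed, and the function names are assumptions made for the example, not part of the course notes.

```python
import random

def simple_random_sample(items, n, seed=0):
    """Pick n items at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(items, n)

def systematic_sample(items, k, start=0):
    """Pick every k-th item, beginning at index `start`."""
    return items[start::k]

def stratified_sample(items, key, n_per_stratum, seed=0):
    """Pick n_per_stratum items at random within each stratum given by key(item)."""
    rng = random.Random(seed)
    strata = {}
    for item in items:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, min(n_per_stratum, len(group))))
    return sample

population = list(range(1, 101))  # hypothetical list of 100 item IDs
srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, k=10)
# Hypothetical strata: odd IDs versus even IDs
strat = stratified_sample(population, key=lambda x: x % 2, n_per_stratum=5)
```

Because `random.Random.sample` never returns the same position twice, this sketch samples without replacement.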

Non-random sampling method   Description
Judgment                     Use expert knowledge to choose “typical” items
Convenience                  Use a sample that happens to be available
Focus Group                  In-depth dialog

Sources of error or bias

Source of error     Characteristics
Nonresponse bias    Respondents differ from nonrespondents
Selection bias      Self-selected respondents are atypical
Response error      Respondents give false information
Coverage error      Incorrect specification of frame or population
Measurement error   Unclear survey instrument wording
Interviewer error   Responses influenced by interviewer
Sampling error      Random and unavoidable

Chap 4: Descriptive Statistics 4.1 Numerical Description -3 key characteristics of numerical data: center, variability, shape

4.2 Measures of Center

Statistic         Excel formula
Mean              =AVERAGE(Data)
Median            =MEDIAN(Data)
Mode              =MODE.SNGL(Data)
Geometric mean    =GEOMEAN(Data)
Midrange          =(MIN(Data)+MAX(Data))/2
5% trimmed mean   =TRIMMEAN(Data,0.1)
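For readers working outside Excel, most of these measures have rough equivalents in Python's standard `statistics` module. The data values below are made up for illustration, and the trimmed mean is left out because the module has no direct counterpart to TRIMMEAN.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # hypothetical sample

mean = statistics.mean(data)               # like =AVERAGE(Data)
median = statistics.median(data)           # like =MEDIAN(Data)
mode = statistics.mode(data)               # like =MODE.SNGL(Data)
geomean = statistics.geometric_mean(data)  # like =GEOMEAN(Data)
midrange = (min(data) + max(data)) / 2     # like =(MIN(Data)+MAX(Data))/2
```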

4.3 Measures of variability
-Population variance: sum of squared deviations from the mean divided by the population size (FORMULA SHEET) =VAR.P(data)
-Sample variance: sum of squared deviations from the mean divided by n-1 (FORMULA SHEET) =VAR.S(data)
-Standard deviation: square root of the variance (FORMULA SHEET) =STDEV.S(data)
-Sample coefficient of variation (FORMULA SHEET, multiplied by 100%) =Standard dev / mean x 100%
-Mean absolute deviation: the average distance from the center (FORMULA SHEET). In Excel: compute the mean => =ABS(A1 - mean) for each value => =SUM(the column just computed)/n, where n is the number of values
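A small Python sketch of the same variability measures, using a made-up data set. The variable names are illustrative, and the mean absolute deviation follows the three Excel steps described above.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

mean = statistics.mean(data)
pop_var = statistics.pvariance(data)   # like =VAR.P(data): divide by N
samp_var = statistics.variance(data)   # like =VAR.S(data): divide by n-1
samp_sd = statistics.stdev(data)       # like =STDEV.S(data)
cv = samp_sd / mean * 100              # coefficient of variation, in percent

# Mean absolute deviation: average of |x - mean|
mad = sum(abs(x - mean) for x in data) / len(data)
```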

4.4 Standardized data
-The Empirical Rule:
+ k=1: 68.26% lie within μ ± 1 SD
+ k=2: 95.44% lie within μ ± 2 SD
+ k=3: 99.73% lie within μ ± 3 SD
-A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean. (FORMULA SHEET) =(A1-mean)/STDEV
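As a rough check of both ideas, the sketch below standardizes a made-up sample and recomputes the Empirical Rule percentages from the standard normal CDF. The data values are assumptions for illustration only.

```python
from statistics import NormalDist, mean, stdev

data = [4, 8, 6, 5, 7]  # hypothetical sample
m, s = mean(data), stdev(data)

# Parentheses matter: subtract the mean first, then divide by the SD
z_scores = [(x - m) / s for x in data]

std_normal = NormalDist()  # mean 0, standard deviation 1
# Share of a normal population within k standard deviations of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
```

The values in `within` come out close to the 68.26%, 95.44% and 99.73% figures quoted in the rule.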

4.5 Percentiles, Quartiles, Box-plots -Percentiles are data that have been divided into 100 groups -Quartiles are scale points that divide the sorted data into four groups of approximately equal size

4.6 Covariance and Correlation
-Covariance: measures the degree to which the values of X and Y change together (FORMULA SHEET) =COVARIANCE.S(XData, YData)
-Correlation coefficient: covariance divided by the product of the standard deviations of X and Y (FORMULA SHEET) =CORREL(XData, YData)
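To make the relationship between the two measures concrete, here is a minimal sketch that computes the sample covariance and then divides by the product of the sample standard deviations. The paired values are invented for the example.

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical X values
y = [2, 4, 5, 4, 5]  # hypothetical Y values
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of cross-products of deviations, divided by n-1
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Sample standard deviations of X and Y
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

# Correlation coefficient: covariance scaled by the two standard deviations
r = cov_xy / (sd_x * sd_y)
```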

4.7 Grouped data
-Weighted mean: a sum that assigns each data value a weight that represents a fraction of the total (FORMULA SHEET) =SUMPRODUCT(Values, Weights)/SUM(Weights)
-Mean for grouped data: 1. Find the midpoint of each class 2. Frequency*midpoint 3. Step 1*Step 2 (midpoint*frequency*midpoint) 4. Mean = sum of Step 2 / sum of frequencies
-Variance for grouped data: 1. Follow the same steps as above through Step 3 2. Variance = (sum of Step 3 / sum of frequencies) - Mean^2
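The grouped-data steps can be traced in a few lines of Python. The three classes and their frequencies below are hypothetical, and the variance uses the shortcut form given in the notes (mean of squares minus squared mean).

```python
# Each class is (lower limit, upper limit, frequency); values are made up
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]

total_f = sum(f for _, _, f in classes)

# Step 1: midpoint of each class
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
# Step 2: frequency * midpoint
f_times_m = [f * m for (_, _, f), m in zip(classes, midpoints)]
# Step 3: midpoint * (frequency * midpoint)
f_times_m2 = [m * fm for m, fm in zip(midpoints, f_times_m)]

# Step 4: grouped mean
grouped_mean = sum(f_times_m) / total_f
# Grouped variance: (sum of step 3 / sum of frequencies) - mean^2
grouped_var = sum(f_times_m2) / total_f - grouped_mean ** 2
```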

CHAP 5: PROBABILITY 5.1 Random Experiments
-A random experiment is an observational process whose results cannot be known in advance
-Sample space: the set of all possible outcomes
-Event: a subset of outcomes in the sample space

5.2 Probability
-The probability of an event is a number that measures the relative likelihood that the event will occur
-The probability of event A [P(A)] must lie within the interval 0 to 1
-Probabilities of all simple events must sum to 1

Approach     How assigned?                                    Example
Empirical    Estimated from observed outcome frequency        3.2% chance of twins in a randomly chosen birth
Classical    Known a priori by the nature of the experiment   50% chance of heads on a coin flip
Subjective   Based on informed opinion or judgement           60% chance Toronto will bid for the 2024 Olympics

-Law of large numbers: as the number of trials increases, any empirical probability approaches its theoretical limit

5.3 Rule of probability
-The complement of an event A is A’
-The union of 2 events consists of all outcomes in the sample space S that are contained either in event A or B or both (A ∪ B)
-The intersection of 2 events is the event consisting of all outcomes in the sample space S that are contained in both A and B (A ∩ B)
-General law of addition (FORMULA SHEET)
-Mutually exclusive events: events A and B are mutually exclusive (disjoint) if their intersection is the null set (it contains no elements)
-Addition rule for mutually exclusive events (FORMULA SHEET)
-Conditional probability: the probability of event A given that event B has occurred (FORMULA SHEET)
-General law of multiplication (FORMULA SHEET)
-Odds in favor of A: P(A) / (1 - P(A))
-Odds against A: (1 - P(A)) / P(A)

5.4 Independent event
-Event A is independent of event B if the conditional probability is the same as the marginal probability. Checked by the multiplication law

5.5 Contingency table -A contingency table is a cross-tabulation of frequencies into rows and columns

5.6 Tree diagram -A tree diagram or decision tree helps visualize all possible outcomes

5.7 Bayes’ Theorem -Bayes’ formula/ theorem (FORMULA SHEET)
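As an illustration of Bayes' theorem for two complementary events B and B’, the sketch below revises a prior P(B) after observing A. The function name and the numbers (a 1% prior, a 95% chance of A given B, a 5% chance of A given B’) are assumptions chosen for the example.

```python
def bayes_posterior(prior_b, p_a_given_b, p_a_given_not_b):
    """Return P(B | A) from P(B), P(A | B) and P(A | B')."""
    joint_b = p_a_given_b * prior_b                # P(A and B)
    joint_not_b = p_a_given_not_b * (1 - prior_b)  # P(A and B')
    return joint_b / (joint_b + joint_not_b)       # P(A and B) / P(A)

# Hypothetical inputs: P(B) = 0.01, P(A|B) = 0.95, P(A|B') = 0.05
posterior = bayes_posterior(0.01, 0.95, 0.05)
```

With these inputs the posterior is about 0.16, which shows how a small prior can keep the revised probability low even when P(A|B) is high.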

Chap 6: Discrete Probability Distributions 6.1 Discrete Distributions
-A random variable is a function or rule that assigns a numerical value to each outcome in the sample space of a random experiment
-A discrete random variable has a countable number of distinct values.
-A discrete probability distribution assigns a probability to each value of a discrete random variable X.
-A probability distribution function (PDF) is a mathematical function that shows the probability of each X value, P(X = x)
-A cumulative distribution function (CDF) is a mathematical function that shows the cumulative sum of probabilities, adding from the smallest to the largest X-value, gradually approaching unity, P(X ≤ x)

6.2 Expected value and variance -Expected value E(X) of a discrete random variable is the sum of all X-values weighted by their respective probabilities. (FORMULA SHEET) -E(X) is the central tendency. -Variance of a discrete random variable (FORMULA SHEET) -Standard deviation is the square root of the variance (FORMULA SHEET)

6.3 Uniform distribution -The discrete uniform distribution describes a random variable with a finite number of integer values from a to b (only 2 parameters)

6.4 Binomial distribution
-Bernoulli experiment: a random experiment with only 2 outcomes.
-How to recognize a binomial distribution:
+the number of trials (n) is fixed
+only 2 outcomes (failure or success)
+the probability of success for each trial remains constant
+trials are independent of each other
+the random variable (X) is the number of successes out of n
-The binomial distribution arises when a Bernoulli experiment is repeated n times.
-Expected value E(X) of binomial distribution (FORMULA SHEET)
-Variance of binomial distribution (FORMULA SHEET)
-Binomial distribution (FORMULA SHEET) =BINOM.DIST(x,n,pi,0) (PDF)
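A minimal Python counterpart to =BINOM.DIST(x,n,pi,0), assuming the standard binomial formula. The values n = 10 and pi = 0.5 are chosen only for illustration.

```python
from math import comb

def binom_pmf(x, n, pi):
    """P(X = x) for a binomial with n trials and success probability pi."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 10, 0.5                       # hypothetical parameters
p_three = binom_pmf(3, n, pi)         # probability of exactly 3 successes
expected_value = n * pi               # E(X) = n * pi
variance = n * pi * (1 - pi)          # Var(X) = n * pi * (1 - pi)
total = sum(binom_pmf(x, n, pi) for x in range(n + 1))
```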

6.5 Poisson Distribution
-The Poisson distribution describes the number of occurrences within a randomly chosen unit of time (minute, hour) or space (foot, mile) =POISSON.DIST(x,lambda,0) (PDF)
-How to recognize a Poisson distribution:
+the event of interest occurs randomly over time or space
+the average arrival rate (lambda) remains constant
+arrivals are independent of each other
+the random variable (X) is the number of events within an observed time interval
-X is the number of events per unit of time
-Lambda represents the mean number of events per unit of time = expected value E(X) = variance
-Standard deviation of Poisson distribution (FORMULA SHEET)
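The Poisson PDF can be sketched the same way, assuming the standard formula λ^x e^(-λ) / x!. The arrival rate of 2 events per unit of time is a made-up value.

```python
from math import exp, factorial, sqrt

def poisson_pmf(x, lam):
    """P(X = x) when events arrive at mean rate lam per unit of time."""
    return lam ** x * exp(-lam) / factorial(x)

lam = 2.0                       # hypothetical mean arrivals per unit of time
p_zero = poisson_pmf(0, lam)    # probability of no arrivals
std_dev = sqrt(lam)             # the variance equals lam, so SD = sqrt(lam)
total = sum(poisson_pmf(x, lam) for x in range(30))  # the tail beyond 29 is negligible
```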

6.6 Hypergeometric Distribution -Hypergeometric distribution is similar to the binomial distribution, but sampling without replacement from a finite population of N items (FORMULA SHEET) =HYPGEOM.DIST(x,n,s,N,0) (PDF)

-How to recognize hypergeometric distribution: finite population (N) with a known number of success (s) and sampling without replacement (n items in the sample)
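A short sketch of the hypergeometric PDF, with arguments in the same order as =HYPGEOM.DIST(x,n,s,N,0). The population of 10 items with 4 successes and a sample of 3 is hypothetical.

```python
from math import comb

def hypergeom_pmf(x, n, s, N):
    """P(X = x): x successes in a sample of n drawn without replacement
    from N items, s of which are successes."""
    return comb(s, x) * comb(N - s, n - x) / comb(N, n)

N, s, n = 10, 4, 3              # hypothetical population, successes, sample size
p_two = hypergeom_pmf(2, n, s, N)
total = sum(hypergeom_pmf(x, n, s, N) for x in range(n + 1))
```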

Chap 7: Continuous probability distributions 7.1 Continuous probability distributions -Probability density function (PDF): +f(x) + nonnegative +area under curve = 1 +mean, var, shape depend on PDF parameter +Reveals the shape of the distribution -Cumulative distribution function (CDF): +F(x) + P(X≤x) +Useful for finding probabilities

7.2 Uniform continuous distribution -Uniform continuous distribution (FORMULA SHEET) -Mean= expected value of uniform continuous distribution (FORMULA SHEET) -Standard deviation of a uniform distribution (FORMULA SHEET)

7.3 Normal distribution -Normal or Gaussian (bell-shaped) distribution. (FORMULA SHEET) =NORM.DIST(x, μ, σ, 0) (PDF) -Denoted N(μ, σ)

7.4 Standard normal distribution -Transform normal random variable to standard normal distribution with mean=0 & STDEV= 1 -Standard normal distribution (FORMULA SHEET) =NORM.S.DIST(z,1) (CDF) -Inverse standard normal distribution (FORMULA SHEET)

7.5 Normal approximations -Binomial probabilities are difficult to calculate when n is large. -Rule: when npi>=10 and n(1-pi)>=10, it is appropriate to use normal approximation to binomial distribution

7.6 Exponential distribution
-The time until the next event follows the exponential distribution (FORMULA SHEET) =EXPON.DIST(x,lambda,0) (PDF)
-The probability of waiting more than x units of time until the next arrival is e^(-λx), while the probability of waiting x units of time or less is 1 - e^(-λx).
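The two waiting-time probabilities can be checked numerically. The arrival rate λ = 0.5 per minute and the waiting time x = 2 minutes are assumed values for illustration.

```python
from math import exp

lam = 0.5   # hypothetical arrival rate (events per minute)
x = 2.0     # waiting time of interest (minutes)

p_wait_more = exp(-lam * x)      # P(X > x) = e^(-lambda * x)
p_wait_within = 1 - p_wait_more  # P(X <= x) = 1 - e^(-lambda * x)
```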

Chap 8: Sampling distribution and estimation 8.1 Sampling and estimation -A sample statistic is a random variable whose value depends on which population items are included in the random sample. -The sampling variation can easily be illustrated by selecting random samples from a large population -An estimator is a statistic derived from a sample to infer the value of a population parameter -An estimate is the value of the estimator in a particular sample.

-The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
-Sampling error is the difference between an estimate and the corresponding population parameter.
-Bias is the difference between the expected value of the estimator and the true parameter, e.g., for the sample mean: bias = E(X̄) - μ
-Efficiency refers to the variance of the estimator’s sampling distribution.
-A consistent estimator converges toward the parameter being estimated as the sample size increases.

8.2 Central limit theorem -Expected value of the mean (the sample mean is an unbiased estimator for u) (FORMULA SHEET) -Standard error of the mean (FORMULA SHEET) -Expected range of sample means

8.3 Sample size and standard error 8.4 Confidence interval for a mean with known standard deviation -Construct a confidence interval for the unknown mean by adding and subtracting a margin of error from x, the mean of our random sample. (FORMULA SHEET) -If STDEV is known and we do not know whether the population is normal, a common rule of thumb is that n >= 30 is sufficient to use the formula
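A minimal sketch of a confidence interval for a mean with known σ, assuming the usual form x̄ ± z·σ/√n. The function name and the sample figures (x̄ = 50, σ = 10, n = 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)                        # z times the standard error
    return xbar - margin, xbar + margin

low, high = z_interval(xbar=50, sigma=10, n=25)
```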

8.5 Confidence interval for a mean with unknown STDEV
-If the population is normal but the STDEV is unknown, then the t distribution should be used instead of the z distribution. (FORMULA SHEET)
-The confidence intervals will be wider because t(a/2) is always greater than z(a/2)

-Degrees of Freedom is a parameter based on the sample size that is used to determine the value of the t statistic

8.6 Confidence interval for a proportion (pi) -The distribution of the sample proportion p=x/n -Expected value E(X) (FORMULA SHEET) -Standard error (FORMULA SHEET)

-Confidence interval for pi(large sample) (FORMULA SHEET)
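The large-sample interval for a proportion can be sketched as p ± z·√(p(1-p)/n). The counts below (40 successes in 100 trials) and the function name are assumptions for the example.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = x / n                                           # sample proportion
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p * (1 - p) / n)
    return p - z * std_error, p + z * std_error

low, high = proportion_interval(x=40, n=100)
```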

8.7 Estimating from finite population
-The finite population correction factor (FPCF) reduces the margin of error and provides a more precise interval estimate.
-Finite population correction factor: √((N - n)/(N - 1)) (N = number of items in population, n = number of items in sample)

8.8 Sample size determination for a mean
-To estimate a population mean with a precision of ± E (allowable error), we need a sample of size n. (FORMULA SHEET)
How to Estimate σ?
• Method 1: Take a Preliminary Sample. Take a small preliminary sample and use the sample s in place of σ in the sample size formula.
• Method 2: Assume Uniform Population. Estimate rough upper and lower limits a and b and set σ = [(b-a)^2/12]^(1/2).
• Method 3: Assume Normal Population. Estimate rough upper and lower limits a and b and set σ = (b-a)/6. This assumes normality with most of the data within μ ± 3σ, so the range is 6σ.
• Method 4: Poisson Arrivals. In the special case when λ is a Poisson arrival rate, σ = √λ.
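A rough sketch of the sample size calculation, assuming the usual formula n = (zσ/E)² rounded up. The limits a = 0 and b = 60 and the allowable error E = 2 are invented for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_mean(sigma, error, confidence=0.95):
    """Smallest n that estimates a mean to within +/- error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)  # always round up

a, b = 0, 60                              # hypothetical lower and upper limits
sigma_uniform = sqrt((b - a) ** 2 / 12)   # Method 2: uniform population
sigma_normal = (b - a) / 6                # Method 3: normal population

n_needed = sample_size_mean(sigma_normal, error=2)
```

Using the larger uniform estimate of σ instead would give a larger, more cautious sample size.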

8.9 Sample size determination for a proportion - To estimate a population proportion with a precision of ± E (allowable error), you would need a sample size. -Since pi is a number between 0 and 1, the allowable error E is also between 0 and 1. How to Estimate π? • Method 1: Assume that π = .50. This conservative method ensures the desired precision. However, the sample may end up being larger than necessary. • Method 2: Take a Preliminary Sample. Take a small preliminary sample and use the sample p in place of π in the sample size formula. • Method 3: Use a Prior Sample or Historical Data. Unfortunately, π might be different enough to make it a questionable assumption.
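The same idea for a proportion, assuming the usual formula n = (z/E)²·π(1-π). The default π = 0.50 reflects Method 1, and the allowable error of 0.05 is a made-up value.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(error, pi=0.5, confidence=0.95):
    """Smallest n that estimates a proportion to within +/- error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / error) ** 2 * pi * (1 - pi))  # always round up

n_conservative = sample_size_proportion(error=0.05)       # Method 1: pi = 0.50
n_with_prior = sample_size_proportion(error=0.05, pi=0.2)  # hypothetical prior estimate of pi
```

Because π(1-π) is largest at π = 0.50, any other assumed value gives a smaller required sample.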

Chap 9: One-Sample Hypothesis Testing 9.1 Logic of hypothesis testing
Steps in hypothesis testing
- Step 1: State the hypothesis to be tested. H0: null hypothesis; H1: alternative hypothesis. We cannot accept a null hypothesis; we can only reject it or fail to reject it.
- Step 2: Specify what level of consistency with the data will lead to rejection of the hypothesis. This is called the decision rule.
- Step 3: Collect data and calculate the necessary statistics to test the hypothesis.
- Step 4: Make a decision. Should the hypothesis be rejected or not?
- Step 5: Take action based on the decision.

9.2 Type I and Type II errors -Type I error: Reject the null hypothesis when it is true. This occurs with probability alpha (level of significance) -Type II error: Failure to reject the null hypothesis when it is false. This occurs with probability beta -The power of a test is a probability that a false hypothesis will be rejected = 1-beta

9.3 Decision rules and critical values
-A statistical hypothesis is a statement about the value of a population parameter
-A hypothesis test is a decision between 2 competing, mutually exclusive and collectively exhaustive hypotheses about the value of the parameter.
-The value of u0 that we are testing is a benchmark based on past experience, an industry standard…

9.4 Testing a mean: known population variance -The test statistic measures the difference between a given sample mean and a benchmark u0 in terms of the standard error of the mean (FORMULA SHEET) -Common z value

-If p-value < alpha we reject H0 -Hypothesis test, one-tail and two-tail (example slide 34)
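A minimal sketch of a two-tailed z test for a mean with known σ, assuming the test statistic z = (x̄ - u0)/(σ/√n). The sample figures and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(xbar, mu0, sigma, n):
    """Return the z statistic and two-tailed p-value for H0: mean = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))          # distance in standard errors
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value

alpha = 0.05  # level of significance
z_stat, p_value = z_test_two_tailed(xbar=52, mu0=50, sigma=10, n=100)
reject_h0 = p_value < alpha  # decision rule: reject H0 if p-value < alpha
```

For a one-tailed test, the p-value would use only the tail named in H1 rather than doubling.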

9.5 Testing a mean: unknown population variance -Using student’s t -Hypothesis test using student’s t (example slide 50)

9.6 Testing a proportion

-Hypothesis testing a proportion (example slide 59)

Chap 10: Two-sample hypothesis tests 10.1 Two-sample tests
-A two-sample test compares two sample estimates with each other
Test procedure:
- State the hypotheses.
- Set up the decision rule.
- Insert the sample statistics.
- Make a decision based on the critical values or using p-values.

10.2 Comparing two means: independent samples

-Case 1: known variances => test statistics(FORMULA SHEET) -Case 2 : unknown variances, assumed equal(FORMULA SHEET) -Case 3: unknown variances, assumed unequal (FORMULA SHEET) Steps in Testing Two Means (See text for examples) • Step 1: State the hypotheses. • Step 2: Specify the decision rule. Choose α (the level of significance) and determine the critical value(s). • Step 3: Calculate the Test Statistic. • Step 4: Make the decision Reject H0 if the test statistic falls in the rejection region(s) as defined by the critical value(s). • Step 5: Take action based on the decision.

10.3 Confidence Interval for the difference of two means u1 – u2 -Confidence interval if STDEV 1 and 2 are unknown and cannot be assumed equal (FORMULA SHEET)

10.4 Comparing two means: paired samples...

