Business Stats Notes PDF

Title Business Stats Notes
Course Business Statistics
Institution Grand Canyon University

Summary

All notes from year...


Description

Business Stats

DCOVA
- D: Define
- C: Collect
- O: Organize
- V: Visualize
- A: Analyze

Define:
- Variable: a characteristic of an item or individual
- Data: the set of individual values associated with a variable
- Statistics: the methods that help transform data into useful information for decision makers
- Categorical (qualitative) variables take categories as their values, such as “yes”, “no” or “blue”, “brown”, “green”
- Numerical (quantitative) variables have values that represent a counted or measured quantity
  - Discrete variables arise from a counting process
  - Continuous variables arise from a measuring process
- An operational definition is a clear and precise statement that provides a common understanding of meaning

Collect:
- Avoid data flawed by biases, ambiguities, or other types of errors
- Results from flawed data will be suspect or in error
- Even the most sophisticated statistical methods are not very useful when the data is flawed

Wednesday Define

Collect

Problem

Sample

Variables (how measured)

Organize

Sources of Data
Primary Sources: data that you collect yourself
- Data from a political survey
- Data collected from an experiment
- Observed data
Secondary Sources: data sets collected by someone else
- Analyzing census data
- Examining data published on the internet

Data collected from a Population or a Sample
- Population: all the items or individuals about which you want to draw a conclusion, Ex: GCU
- Sample: the portion of a population selected for analysis, Ex: one class at GCU

After collecting data there might be a need for data cleaning. It is also helpful to recode variables so that the categories are:
- Mutually exclusive: categories do not overlap, Ex: no card can be both a heart and a diamond
- Collectively exhaustive: categories cover all possible values, Ex: all cards are either a heart, a diamond, a club, or a spade

Nonprobability Samples: Two Categories
- Convenience Sample: items are selected because they are easy to get
- Judgment Sample: you get the opinions of preselected experts

Probability Samples: 4 Categories
- Simple Random Sample: every person has an equal chance of being selected
  - Sampling with replacement: selecting someone and putting them back in the population, with the possibility of being selected again
  - Sampling without replacement: selecting someone and taking them out of the population
- Systematic Sample: the frame is divided into equally sized groups and one item is taken from each group at a fixed interval
- Stratified Sample: the population is divided into two or more subgroups, each sampled in proportion to its size
- Cluster Sample: the population is divided into several clusters, Ex: zip codes
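As a rough illustration of the probability sampling methods above, here is a minimal Python sketch. The frame of 100 student IDs, the two strata, and all function names are assumptions made up for the example, not part of the course material.

```python
import random

def simple_random_sample(frame, n, rng, replace=False):
    """Every item has an equal chance of being selected."""
    if replace:
        # With replacement: an item can be picked again.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: an item leaves the pool once picked.
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Split the frame into n groups of k items; random start, then every k-th item."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        share = round(n * len(items) / total)
        sample.extend(rng.sample(items, share))
    return sample

rng = random.Random(0)          # fixed seed so the sketch is repeatable
students = list(range(1, 101))  # hypothetical frame of 100 student IDs
strata = {"freshman": students[:40], "senior": students[40:]}

srs = simple_random_sample(students, 10, rng)
sys_sample = systematic_sample(students, 10, rng)
strat = stratified_sample(strata, 10, rng)
```

With a 40/60 split, the stratified sample of 10 takes 4 freshmen and 6 seniors, which is the "proportional to size" idea from the notes.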

Comparing Sampling Methods:
Simple Random Sample and Systematic Sample
- Easy to use
- May not give a good representation of the population
Stratified Sample
- Ensures representation of individuals across the population
Cluster Sample
- More cost effective, but less efficient (a larger sample is needed for the same precision)

Evaluation of a Survey
- Is the survey based on a probability sample?
- Was an appropriate frame used?
Errors:
- Coverage error or selection error: exists if groups are excluded from the frame
- Nonresponse error or bias: people who do respond are different from those who don’t
- Sampling error: variation from sample to sample will exist
- Measurement error: weakness in question design

Organizing Categorical Data: two cases
One Categorical Variable:
- Summary table: tallies the frequencies or percentages of items in a set of categories so that you can see differences between categories
Two Categorical Variables:
- Contingency table

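Two types of statistical analysis: Descriptive and Inferential

Descriptive
- Central tendency: the average
- Variation: spread of the data away from the average
- Shape: pattern of the distribution of values from lowest to highest
Mean: X bar is the average
Median: the middle value, at position (n+1)/2 in the sorted data
Mode: the most repeated value; if all values occur equally often there is no mode, and if two values tie for most frequent there are two modes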
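A small Python sketch of the three measures of central tendency, using the standard `statistics` module. The data values are invented for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 9]  # hypothetical sample, n = 7

mean = statistics.mean(data)        # X bar
median = statistics.median(data)    # value at position (n + 1) / 2 = 4
modes = statistics.multimode(data)  # every value tied for most frequent

two_modes = statistics.multimode([1, 1, 2, 2, 3])  # a tie gives two modes
```

One caveat: when every value occurs equally often, `multimode` returns all of them, whereas the notes treat that case as "no mode".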

Coefficient of Variation
- The standard deviation divided by the mean, times 100%
Z Score
- (Data point - mean) / standard deviation
- Outliers have a Z score less than -3 or greater than 3
Describing Shapes
- Skewness: measures the extent to which data is not symmetrical. Mean < median: left skewed; mean = median: symmetrical; mean > median: right skewed
- Kurtosis: measures how sharply the curve rises approaching the center of the distribution. Leptokurtic: sharper peak than a bell curve; mesokurtic: same as a bell curve; platykurtic: flatter than a bell curve
IQR
- Q3 - Q1
- Ignores the extreme values, so outliers do not affect it
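The coefficient of variation and Z score rules above can be sketched in a few lines of Python. The sample values are made up, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]  # hypothetical sample

mean = statistics.mean(data)
sd = statistics.stdev(data)    # sample standard deviation (n - 1)

cv = sd / mean * 100           # coefficient of variation, in percent
z_scores = [(x - mean) / sd for x in data]

# Flag values whose Z score is below -3 or above 3.
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]
```

Here even the largest value, 42, has a Z score below 3, so the outlier list comes back empty.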

5 Number Summary
- Smallest
- Q1, at position (n+1)/4
- Median or Q2, at position (n+1)/2
- Q3, at position 3(n+1)/4
- Largest
Empirical Rule (for bell-shaped data)
- About 68% of all data points are within one SD of the mean
- About 95% of all data points are within two SDs of the mean
- About 99.7% of all data points are within three SDs of the mean
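The position formulas for the five-number summary can be turned into a short Python sketch. The data are hypothetical, and interpolating between neighbors when a position is fractional is an assumption of this example; textbooks often use rounding rules instead.

```python
def value_at_position(sorted_data, pos):
    """Return the value at a 1-based position, interpolating if fractional."""
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return sorted_data[lower - 1]
    step = sorted_data[lower] - sorted_data[lower - 1]
    return sorted_data[lower - 1] + frac * step

data = sorted([9, 1, 6, 12, 3, 8, 4])  # step 1: sort small to large, n = 7
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)      # position 2
median = value_at_position(data, (n + 1) / 2)  # position 4
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 6

five_number_summary = (data[0], q1, median, q3, data[-1])
iqr = q3 - q1
```

With n = 7 all three positions are whole numbers, so no interpolation is needed in this run.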

Scatter Plot
- If the line has a slope there is a relationship (positive or negative)
- If the line doesn’t have a slope (horizontal) there is no relationship
Covariance
- When cov(x,y) > 0, x and y move in the same direction
- When cov(x,y) < 0, x and y move in opposite directions
- When cov(x,y) = 0, x and y have no linear relationship
Coefficient of Correlation
- Usually written r
- Unit free
- Always between -1 and 1
- Closer to -1: stronger negative linear relationship
- Closer to 1: stronger positive linear relationship
- Closer to 0: weaker linear relationship
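To make the covariance and correlation rules concrete, here is a minimal Python sketch on made-up paired data. It assumes the sample versions of both formulas (dividing by n - 1).

```python
import math

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: average product of deviations, dividing by n - 1.
cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))

# Correlation rescales covariance so it is unit free and between -1 and 1.
r = cov_xy / (s_x * s_y)
```

The covariance is positive (x and y move in the same direction) and r is about 0.77, a fairly strong positive linear relationship.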

Test Review
Categorical (qualitative)
- Characterized by non-numerical values
- Ex: gender, degrees / programs
Numerical (quantitative): discrete or continuous
- Characterized by numerical values
- Discrete (counted)
- Continuous (measured)
Population
- The entire group
Sample
- Selection from the population
How to collect a sample:
Nonprobability:
- Convenience: items are selected because they are easy and inexpensive to get
- Judgment Sample: you get the opinions of preselected experts
Probability:
- Simple Random: every individual has an equal chance. Can be with or without replacement
- Systematic: selection from a series; divide the frame into equally sized groups
- Stratified: divide the population into classes (Ex: divide into freshman, sophomore, junior, senior, take 10 from each)
- Cluster: the population is divided into several clusters (such as zip codes)

Types of statistics
- Descriptive: collects, summarizes, and presents the sample. Patterns: central tendency, variation, shape, visual tools
- Inferential: draws conclusions about the population based upon the sample

Organizing Data
Categorical Variables
- One variable: summary table
- Two variables: contingency table
Numerical Data
- Ordered Array
- Frequency Distributions
- Cumulative Distributions

Descriptive Statistics
Variance is usually a large value because it is a squared value
Shape:
- Kurtosis (height)
- Skewness (left to right)
- Left Skewed: Mean < Median
- Symmetric: Mean = Median
- Right Skewed: Mean > Median
Box Plot Steps
- Sort small to large
- Find N
- Calculate POSITIONS of Q1, Q2, Q3
- Determine the values at those positions
Empirical Rule
- 68% of all data fall within 1 standard deviation
- 95% of all data fall within 2 standard deviations
- 99.7% of all data fall within 3 standard deviations
- Anything more than 3 standard deviations away is an outlier
Correlation
- Shows how closely the data follow a line
- -1 means strongly negatively related
- 1 means strongly positively related
- 0 means no linear correlation

Probability
- The chance that an uncertain event occurs; the range is between 0 and 1

Impossible event
- An event that cannot occur has a probability of 0
Certain event
- An event that is sure to occur has a probability of 1
Three ways to determine probability
- A priori: based on prior knowledge of the process (number of ways in which the event occurs / total number of outcomes). Ex: the probability that a randomly selected day is in January is 31 days of January / 365 days
- Empirical: measured from observed data
- Subjective: a guess, differs from person to person
Events:
- Each possible outcome of a variable is an event
Simple event
- An event described by a single characteristic
- Ex: a day in January from all days in 2015
Joint event
- An event described by two or more characteristics
- Ex: a day in January that is also a Wednesday from all days in 2015
Complement event
- All events that are not part of event A
- Ex: all days in 2015 that are not in January
Sample Space
- The collection of all possible outcomes
Mutually exclusive events
- Events that cannot occur simultaneously
- Ex: randomly choosing a day from 2017, A = a day in January; B = a day in February
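The calendar examples above can be checked by counting days directly. This is a minimal Python sketch of the a priori approach (ways the event occurs / total outcomes); the variable names are chosen for the example.

```python
from datetime import date, timedelta
from fractions import Fraction

# Sample space: every day in 2015 (not a leap year, so 365 outcomes).
days = [date(2015, 1, 1) + timedelta(days=i) for i in range(365)]

january = [d for d in days if d.month == 1]          # simple event
jan_wed = [d for d in january if d.weekday() == 2]   # joint event (Monday is 0)

p_january = Fraction(len(january), len(days))
p_january_and_wednesday = Fraction(len(jan_wed), len(days))
p_not_january = 1 - p_january                        # complement event
```

January 2015 contains four Wednesdays, so the joint event has probability 4/365, while the simple event has probability 31/365.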

Collectively exhaustive events
- One of the events must occur
- The set of events covers the entire sample space
Computing joint and marginal probabilities
- P(A and B) is the probability of a joint event A and B
- A and B are independent if P(A|B) = P(A)
Permutations
- n!/(n-X)!
- Order does matter
Combinations
- n!/(X!(n-X)!)
- Order doesn’t matter
Expected Value
- Weighted average
- Sum of xi*P(X=xi) over all outcomes
Binomial Distribution Rules:
- The outcome of one observation does not affect the outcome of another
- Fixed number of observations
Mean of binomial distribution
- Number of observations * probability
Variance of binomial distribution
- Number of observations * probability * (1 - probability)
Binomial in Excel
- Set the cumulative argument to FALSE to get the probability of exactly X
Poisson Distribution
- Number of times an event occurs in an area of opportunity
- Use when wanting to know the number of times an occurrence happens in an area
- Uses lambda
- Mean = lambda
- Variance = lambda
- Standard deviation = square root of lambda
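The counting rules and the two discrete distributions above can be sketched with Python's `math` module. All numbers are invented for illustration, and the helper names `binomial_pmf` and `poisson_pmf` are hypothetical; they use the standard textbook formulas for the probability of exactly x.

```python
import math

permutations = math.perm(5, 2)  # n!/(n-X)!, order matters
combinations = math.comb(5, 2)  # n!/(X!(n-X)!), order does not matter

# Expected value: sum of each outcome times its probability.
distribution = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical P(X = x)
expected_value = sum(x * p for x, p in distribution.items())

def binomial_pmf(x, n, p):
    """Probability of exactly x events of interest in n observations."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences in an area of opportunity."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n_obs, prob = 4, 0.5
binomial_mean = n_obs * prob
binomial_variance = n_obs * prob * (1 - prob)
p_two_of_four = binomial_pmf(2, n_obs, prob)
p_zero_events = poisson_pmf(0, 2.0)  # lambda = 2
```

Choosing 2 items out of 5 gives 20 permutations but only 10 combinations, because each unordered pair is counted twice when order matters.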

Area of opportunity
- A continuous unit or interval of time, volume, or area in which more than one occurrence of an event can occur
Continuous Probability Distributions
- Describe a continuous variable
Normal Distribution
- Bell shaped
- Symmetrical
- Mean, median, and mode are all equal
- Location is determined by the mean
- Spread is determined by the standard deviation
Standardized Normal
- The standardized normal distribution (Z) has a mean of 0 and a standard deviation of 1
Z Score
- (X - mean) / standard deviation
- Can be positive or negative
Excel:
- Continuous: Cumulative = TRUE
- Discrete: Cumulative = FALSE
- NORM.DIST(x, mean, stdev, TRUE) or NORM.S.DIST(z, TRUE) gives everything to the left
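For readers without Excel, the same cumulative normal calculations can be sketched with Python's `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are assumed values for illustration only.

```python
from statistics import NormalDist

mean, sd = 100, 15  # hypothetical population parameters
x = 130

z = (x - mean) / sd  # Z score

# Cumulative area to the left of x, on the original scale ...
left_of_x = NormalDist(mean, sd).cdf(x)
# ... and the same area from the standardized normal (mean 0, SD 1).
left_of_z = NormalDist().cdf(z)

# Area within one standard deviation of the mean.
within_one_sd = NormalDist(mean, sd).cdf(115) - NormalDist(mean, sd).cdf(85)
```

Both routes give the same area (about 0.977), and the area within one standard deviation comes out near 68%, matching the empirical rule.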

Space between two points
- P(x1 < X < x2) = P(X < x2) - P(X < x1)
...
- np > 5 and n(1-p) > 5

If pi is not given and cannot be estimated from a pilot sample, 0.5 is the most conservative value to use

Chapter 9
Hypothesis
- H0: null
- H1 (or HA): alternative
Type I Error
- Rejecting a true null hypothesis
- A Type I error is a “false alarm”
- The probability of a Type I error is alpha
- Alpha is called the level of significance of the test
- Set by the researcher in advance
Type II Error
- Failing to reject a false null hypothesis
- A Type II error represents a “missed opportunity”
- The probability of a Type II error is beta
Confidence Coefficient
- (1-alpha) is the probability of not rejecting H0 when it is true
Confidence Level
- The confidence level of a hypothesis test is (1-alpha)*100%
Power of a Statistical Test
- (1-beta) is the probability of rejecting H0 when it is false
Hypothesis tests for the mean
- Find the Z stat
6 Steps of hypothesis testing
1. State the null hypothesis, H0, and the alternative hypothesis, H1
2. Choose the level of significance, alpha, and the sample size, n. The level of significance is based on the relative importance of Type I and Type II errors
3. Determine the appropriate test statistic and sampling distribution
4. Determine the critical values that divide the rejection and nonrejection regions
5. Collect data and compute the value of the test statistic
6. Make the statistical decision and state the managerial conclusion

Hypothesis testing:

Sigma not known, 2 tailed:
- Tstat = (xbar - mu) / (S / sqrt(n))
- Tcrit = T.INV.2T(alpha, df)
- P value = T.DIST.2T(ABS(tstat), df)
One tail test
- The null hypothesis will always have an equal sign in it
- Lower tail test (<) or upper tail test (>)
- There is only one critical value
- Upper tail: Tcrit = T.INV(1-alpha, df), p value = 1 - T.DIST(tstat, df, TRUE)
- Lower tail: Tcrit = T.INV(alpha, df), p value = T.DIST(tstat, df, TRUE)

Hypothesis testing proportion
- p = x/n
- Standard error = sqrt(pi(1-pi) / n)
- Zstat = (x - n*pi) / sqrt(n*pi*(1-pi))
- Zcrit = +/- NORM.S.INV(alpha / 2)
- P value = 2 * [1 - NORM.S.DIST(ABS(zstat), TRUE)]

Upper tail:
- Zcrit = NORM.S.INV(1-alpha)
- P value = 1 - NORM.S.DIST(zstat, TRUE)
Lower tail:
- Zcrit = NORM.S.INV(alpha)
- P value = NORM.S.DIST(zstat, TRUE)

Exam 3 Review
Chapter 7: sampling distributions
Chapter 8: confidence interval estimate
Chapter 9: fundamentals of hypothesis testing: one-sample tests

Chapter 7
Sampling distribution
- The distribution of all possible values of a sample statistic for a given sample size selected from a population
- Calculating the mean, standard error
Continuous variable distributions
- Z value (Zstat) for the sampling distribution of the sample mean (xbar)
Discrete variable distributions
- Z value for proportions
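As a worked illustration of the Z value for proportions, here is a minimal Python sketch of a two-tailed Z test. The counts, the hypothesized proportion, and alpha are all assumed values; `NormalDist().cdf` and `NormalDist().inv_cdf` stand in for Excel's NORM.S.DIST and NORM.S.INV.

```python
from math import sqrt
from statistics import NormalDist

x, n = 60, 100  # hypothetical: 60 events of interest in 100 observations
pi0 = 0.5       # hypothesized population proportion under H0
alpha = 0.05    # level of significance

p = x / n       # sample proportion
z_stat = (x - n * pi0) / sqrt(n * pi0 * (1 - pi0))

std_normal = NormalDist()  # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha / 2)        # upper critical value
p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))   # two-tailed p value

reject_h0 = abs(z_stat) > z_crit
```

With these numbers the Z stat is 2.0, which lies beyond the critical value of about 1.96, so H0 is rejected at the 0.05 level; the p value of roughly 0.046 leads to the same decision.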

Chapter 8
- CI = Xbar +/- Z(alpha/2) * standard error
- Confidence interval for the mean when sigma is not known: use Student’s t distribution
- Degrees of freedom = n-1
- Sample proportion = x/n
- REQUIREMENTS: must have X > 5 and n-X > 5
- Sample size (given mean) = Z^2 * stdev^2 / e^2
- Sample size (given proportion) = Z(alpha/2)^2 * pi(1-pi) / e^2

Chapter 9
Developing the hypothesis
- The null hypothesis represents the status quo and always contains an equal sign (=, <=, or >=)
- The alternative hypothesis is the opposite of the null

Decision         | H0 True                         | H0 False
Do Not Reject H0 | No error, probability (1-alpha) | Type II Error, probability beta
Reject H0        | Type I Error, probability alpha | No error, probability (1-beta)
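The Chapter 8 formulas above for the confidence interval and the sample size can be sketched in Python as follows. The sample mean, standard deviation, and acceptable errors are assumed values, and the interval uses Z, so it assumes sigma is known.

```python
from math import ceil, sqrt
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # Z for a 95% confidence level

# Confidence interval for the mean (sigma assumed known).
x_bar, sigma, n = 50, 10, 25             # hypothetical sample results
standard_error = sigma / sqrt(n)
ci = (x_bar - z * standard_error, x_bar + z * standard_error)

# Sample size for the mean with acceptable error e, always rounded up.
e_mean = 2
n_for_mean = ceil(z ** 2 * sigma ** 2 / e_mean ** 2)

# Sample size for a proportion; pi = 0.5 is the most conservative choice.
e_prop = 0.05
n_for_proportion = ceil(z ** 2 * 0.5 * (1 - 0.5) / e_prop ** 2)
```

Rounding up matters: the proportion formula gives about 384.1, so 385 observations are needed to keep the error within 0.05.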

F Test
- The F stat should always be more than 1
- The larger variance always goes on top
- The smaller variance always goes on the bottom
Total Variation
- SST = SSA + SSW
- Total Variation (SST)
- Among-group Variation (SSA): variation between groups
- Within-group Variation (SSW): variation within each group
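The identity SST = SSA + SSW can be verified on a tiny made-up data set. This Python sketch uses three hypothetical groups of three values each; the F stat at the end is the usual one-way ANOVA ratio of the among-group to the within-group mean squares.

```python
groups = {"A": [4, 5, 6], "B": [7, 8, 9], "C": [1, 2, 3]}  # hypothetical data

all_values = [v for g in groups.values() for v in g]
n = len(all_values)  # total number of observations
c = len(groups)      # number of groups
grand_mean = sum(all_values) / n

def group_mean(g):
    return sum(g) / len(g)

# Total variation: every value against the grand mean.
sst = sum((v - grand_mean) ** 2 for v in all_values)
# Among-group variation: each group mean against the grand mean.
ssa = sum(len(g) * (group_mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group variation: every value against its own group mean.
ssw = sum((v - group_mean(g)) ** 2 for g in groups.values() for v in g)

f_stat = (ssa / (c - 1)) / (ssw / (n - c))
```

Here SSA = 54 and SSW = 6 add up to SST = 60, and the large F stat of 27 reflects group means that are far apart relative to the spread inside each group.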

Chapter 10.2
- Calculate medians
- Calculate absolute differences
- Tukey-Kramer: critical range
- The test for difference in variation is a Levene test
- ANOVA and Chisq: the test is if it is below
Linear regression
- Predict the future
- The influence of one variable on another
- Beta 0 is the estimated mean value of Y when the value of X is zero (the Y intercept)
- Beta 1 is the estimated change in the mean value of Y for a one-unit change in X (the slope)
Regression
- Dependent variable is y
- Independent variable is x
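To show what the two regression coefficients mean, here is a minimal Python sketch of simple linear regression by least squares. The data are invented; `b0` and `b1` are the sample estimates of Beta 0 and Beta 1, and `predict` is a hypothetical helper name.

```python
x = [1, 2, 3, 4, 5]  # independent variable (hypothetical)
y = [2, 4, 5, 4, 5]  # dependent variable (hypothetical)
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: estimated change in the mean of Y for a one-unit change in X.
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
# Intercept: estimated mean of Y when X is zero.
b0 = y_bar - b1 * x_bar

def predict(x_new):
    """Predicted mean of Y at a given value of X."""
    return b0 + b1 * x_new

y_at_6 = predict(6)
```

The fitted line is Y = 2.2 + 0.6X, so the prediction at X = 6 is 5.8. Predicting far outside the observed range of X is generally less reliable.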

Exam 4...

