Business Statistics Formula PDF

Title Business Statistics Formula
Author Brandon Welsh
Course Business Statistics
Institution University of Technology Sydney
Pages 13
File Size 1.1 MB
File Type PDF

Summary

Key Formulas/Definitions...


Description

Business Statistics Formulas/Summaries

Chap 1: Introduction to statistics
- Two important concepts in statistics are population and sample. A population is a set of units of interest and a sample is a subset of the population.
- Data are broadly classified as qualitative (also called categorical) or quantitative (also called numerical). Qualitative data can be further classified as nominal or ordinal, while quantitative data are further classified as discrete or continuous.
- Data that are collected at a fixed point in time are called cross-sectional data, while data that are collected over time are called time-series data.
- Data that are already available are secondary data, while data collected first-hand for the study are primary data.

Chap 2: Charts and graphs
- Graphical analysis and summary of data help highlight outliers.
- Data that have not been summarised in any way are called raw or ungrouped data. Data that are organised into a frequency distribution are called grouped data.
- Grouped data are organised using class midpoints, relative frequencies and cumulative frequencies.
- Histograms display class frequencies as bars; a frequency polygon joins a line between the midpoints of each interval; an ogive is a cumulative frequency polygon.

Chap 3: Descriptive summary measures
- The measures of central tendency are the mode, the median and the mean.
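These three measures can be computed with Python's standard statistics module; the data set here is invented for illustration:

```python
import statistics

data = [2, 4, 4, 5, 7, 9]        # hypothetical sample
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)         # mean ≈ 5.17, median 4.5, mode 4
```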

Chap 4: Probability
- Three methods of assigning probabilities are: (1) the classical method, (2) the relative frequency of occurrence method and (3) the subjective probability method.
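The classical and relative frequency methods are simple ratios, sketched below with made-up figures:

```python
# Classical method: P(event) = favourable outcomes / total equally likely outcomes
p_even_die = 3 / 6               # rolling an even number on a fair die

# Relative frequency method: P(event) ≈ observed occurrences / total trials
defects, inspected = 12, 400     # hypothetical inspection records
p_defect = defects / inspected

print(p_even_die, p_defect)      # 0.5 0.03
```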

Chap 5: Discrete distributions
- Probability experiments produce random outcomes. Random variables whose set of all possible values is finite or countably infinite are called discrete random variables.
- Measures of central tendency and measures of variability can be applied to discrete distributions to compute a mean and a variance.
- The binomial distribution fits experiments in which only two mutually exclusive outcomes are possible.
- According to the central limit theorem, if the sample is large (n ≥ 30), the sample mean is approximately normally distributed regardless of the shape of the population.
- The sampling distribution of the sample proportion is also approximately normally distributed provided the sample size is large enough that both np > 5 and nq > 5.
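For a binomial experiment, the mean, the variance and the np > 5, nq > 5 normality condition can be checked directly (the n and p values are invented):

```python
n, p = 50, 0.2                        # hypothetical binomial experiment
q = 1 - p
mean = n * p                          # binomial mean: np
variance = n * p * q                  # binomial variance: npq
normal_ok = n * p > 5 and n * q > 5   # normal approximation condition

print(mean, variance, normal_ok)      # 10.0 8.0 True
```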

Chap 8: Statistical inference: estimation for single populations
- An interval estimate for the population mean can be constructed around a point estimate of the population mean when σ is known.
- The confidence interval provides more information about a population parameter than the point estimate alone.
- The confidence interval estimate for the population mean when σ is known is:

  x̄ ± z_{α/2} (σ/√n)
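A quick sketch of this interval in Python, using the standard library's NormalDist for the critical z value (the sample figures are hypothetical):

```python
from statistics import NormalDist
from math import sqrt

x_bar, sigma, n = 85.0, 9.0, 36            # hypothetical sample mean, known σ, sample size
alpha = 0.05                               # for a 95% confidence level
z = NormalDist().inv_cdf(1 - alpha / 2)    # ≈ 1.96

margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(lower, upper)                        # roughly 82.06 to 87.94
```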
A second confidence interval, for the population mean when σ is unknown, uses the t distribution:

  x̄ ± t_{α/2, n−1} (s/√n)

A third estimate of a population parameter is for the population proportion. Using the z distribution, a level of confidence and a specified sample size, a confidence interval can be constructed around a point estimate of the population proportion:

  p̂ ± z_{α/2} √(p̂q̂/n), where q̂ = 1 − p̂
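A sketch of the proportion interval with invented survey figures:

```python
from statistics import NormalDist
from math import sqrt

successes, n = 120, 400              # hypothetical survey result
p_hat = successes / n                # 0.30
q_hat = 1 - p_hat
z = NormalDist().inv_cdf(0.975)      # 95% confidence

margin = z * sqrt(p_hat * q_hat / n)
print(p_hat - margin, p_hat + margin)   # roughly 0.255 to 0.345
```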
A fourth estimate is for the population variance from a sample variance. Using the chi-square distribution, a specified level of confidence and a sample variance, a confidence interval for σ² is:

  (n − 1)s²/χ²_{α/2} ≤ σ² ≤ (n − 1)s²/χ²_{1−α/2}
The sample size should be chosen to minimise the expense of collecting the sample while still being large enough to provide meaningful information; to estimate a mean to within error E, the required sample size is n = (z_{α/2} σ/E)².
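A sketch of the sample-size calculation for estimating a mean, with an assumed σ and desired margin of error E:

```python
from statistics import NormalDist
from math import ceil

z = NormalDist().inv_cdf(0.975)    # 95% confidence
sigma = 12.0                       # assumed population standard deviation
E = 2.0                            # desired margin of error

n = ceil((z * sigma / E) ** 2)     # round up so the interval is at least this tight
print(n)                           # 139
```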

Chap 9: Hypothesis testing for single populations
- Hypothesis testing draws conclusions about a population parameter using sample statistics. It requires an understanding of how to establish the null and alternative hypotheses.
- The null hypothesis always contains an equals sign (=, ≤, ≥), while the alternative hypothesis never contains an equals sign (≠, <, >).
- Look for keywords that indicate direction; these allow you to draw the rejection and non-rejection regions.
- A hypothesis test involves six steps:
  o Establish the null and alternative hypotheses using mathematical symbols.
  o Decide whether a one- or two-tailed test is required and choose a level of significance.
  o Sketch a diagram showing the critical value(s) of the test statistic and write a decision rule in terms of those critical value(s).
  o Collect the sample data and perform the relevant calculations to determine the test statistic.
  o Compare the test statistic with the critical value(s) and reject or do not reject the null hypothesis.
  o State the conclusion: failing to reject means there is not enough evidence in the data at the chosen level of significance.
- If the p-value is less than alpha, reject the null hypothesis.
- When small samples are involved and the variable of interest is normally distributed, the sample standard deviation, s, is used instead of σ and the t distribution is used instead of the z distribution.
- When calculating the standard error of a proportion, the hypothesised value of p is used in the calculation of SEp̂, where SEp̂ = √(p·q/n).
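The steps above can be sketched as a one-sample z test (σ known); all figures are invented:

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical two-tailed test: H0: mu = 100 vs Ha: mu != 100, sigma known
x_bar, mu0, sigma, n = 103.0, 100.0, 8.0, 25
z = (x_bar - mu0) / (sigma / sqrt(n))           # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # two-tailed p-value

alpha = 0.05
reject = p_value < alpha                        # p-value < alpha: reject H0
print(z, p_value, reject)
```

Here the p-value exceeds .05, so the null hypothesis is not rejected at that level.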

Chap 10: Statistical inference about two populations
- Statistical investigation into such comparisons is especially warranted when we cannot readily distinguish between the means and variances of the two populations through graphs and visual inspection of the data.
- Sometimes we are required to test whether the means of two population distributions are significantly different.
- Where the sample data from each population can be assumed to be approximately normal, the observations are random and independent, and the population variances are similar although unknown, the test can be done using the t statistic.
- When we are interested only in comparing the proportion of successes in two populations, we can use a z statistic.
- At other times, rather than focusing on means, we test whether there is a significant difference between the variances of two populations.
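A sketch of the two-proportion z test with hypothetical sample counts, using the pooled proportion under H0:

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical comparison of success proportions in two samples
x1, n1 = 60, 200
x2, n2 = 45, 200
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)        # pooled proportion under H0: p1 = p2

se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)
```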

Chap 11: Analysis of variance and design of experiments
- The design of the experiment should encompass the treatment variables to be studied, manipulated and controlled. These variables are often referred to as the independent variables.
- It is possible to study several independent variables, and several levels or classifications of each of those variables, in one design.
- One-way ANOVA produces an F value that can be compared with table F values to determine whether the computed F value is statistically significant.
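The one-way ANOVA F value can be computed by hand for a small made-up data set:

```python
# Hand computation of a one-way ANOVA F value for three hypothetical treatment groups
groups = [[5, 7, 6], [8, 9, 10], [4, 3, 5]]

n_total = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups and within-groups sums of squares
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

msb = ssb / (k - 1)          # between-groups mean square
msw = ssw / (n_total - k)    # within-groups mean square
f_value = msb / msw
print(f_value)               # ≈ 19.0 for these made-up numbers
```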





A second experimental design is the randomised block design, which contains a treatment variable and a blocking variable. The independent variable is the main variable of interest in this design; the blocking variable is a variable the researcher is interested in controlling rather than studying. A third experimental design is the factorial design, which enables the researcher to test the effects of two or more independent variables simultaneously.

Chap 12: Chi-square tests
- The chi-square goodness-of-fit test is used to determine whether a given set of data or a situation follows a particular distribution (e.g. normal or Poisson).
- The chi-square goodness-of-fit test for two categories is also equivalent to a z test of the equality of two population proportions against a two-sided alternative.
- The chi-square test of independence is used to investigate whether there is dependence (a relationship) or independence (no relationship) between two categorical variables. The data used to conduct a chi-square test of independence are arranged in a two-dimensional table called a contingency table.
- A chi-square test of independence is computed in a manner similar to that used with the chi-square goodness-of-fit test.
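A minimal sketch of the test-of-independence computation for a hypothetical 2×2 contingency table:

```python
# Chi-square test of independence on a hypothetical 2x2 contingency table
observed = [[30, 20],     # rows and columns are two categorical variables
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total   # expected count
        chi_sq += (o - e) ** 2 / e

# df = (rows - 1)(cols - 1) = 1; the critical value at alpha = .05 is 3.841
print(chi_sq)   # 4.0, so independence is rejected at the .05 level
```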

Chap 13: Simple regression analysis
- Simple regression is bivariate (two variables) and linear (only a straight-line fit is attempted). Simple regression analysis produces a model that attempts to predict a y variable, referred to as the dependent variable, using an x variable, referred to as the independent, or predictor, variable.
- Having confirmed that the relationship is approximately linear, the strength of the relationship is measured by the correlation coefficient. The closer in magnitude the correlation coefficient is to 1, the stronger the linear relationship. The scatter plot and the correlation coefficient are used together.
- The equation comprises a constant (the y-intercept value) and the slope of the line (the coefficient of x); the form of the equation is y = b0 + b1x. The coefficient b1 is the slope of the line; the interpretation of this coefficient is that, for a one-unit increase in x, the dependent variable, y, increases by b1 units.
- If the data include the value x = 0, the coefficient b0 is the intercept of the regression line with the vertical axis, and is the value of the dependent variable when the independent variable is zero.
- The residual is the difference between the observed value of y and the predicted value of y: that is, (y − ŷ). Graphs of the residuals can reveal (1) lack of linearity, (2) lack of homogeneity of error variance and (3) dependence of error terms.
- A single value of error measurement called the standard error of the estimate, se, can be computed. The standard error of the estimate is the standard deviation of a model's residuals.
- Another widely used statistic for testing the strength of a regression model is r², the coefficient of determination.
- Testing whether the actual population slope of the regression line is different from zero is essential before accepting the regression model as valid. If the population slope is not considered to be different from zero, we conclude there is no valid regression and no relationship. A t statistic, or a p-value, is used to test the significance of the slope.
- Recognising that the predicted value is a sample result, a confidence interval can be developed for the mean value of y for a given x, and a prediction interval can be specified for a single y value for a given x. The prediction interval is wider because it allows for the wide diversity of individual values, whereas the confidence interval reflects only the range of average y values for a given x.
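The least squares estimates and r² can be computed by hand for a small invented data set:

```python
from math import sqrt

# Least squares fit and correlation for a small hypothetical data set
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b1 = sxy / sxx                # slope
b0 = mean_y - b1 * mean_x     # intercept
r = sxy / sqrt(sxx * syy)     # correlation coefficient
r_sq = r ** 2                 # coefficient of determination

print(b0, b1, r_sq)           # intercept ≈ 2.2, slope 0.6, r² = 0.6
```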

Chap 14: Multiple regression analysis
- Multiple regression analysis is a statistical tool in which a mathematical model is developed in an attempt to predict a dependent variable using two or more independent variables, or in which at least one predictor is nonlinear. Because undertaking multiple regression analysis manually is extremely tedious and time consuming, it is almost always done using a computer.
- An F test for the overall model is computed to determine whether at least one of the regression coefficients is significantly different from zero. This F value is displayed in an ANOVA table, which is part of the regression output.
- If the F test indicates that at least one of the regression coefficients is significantly different from zero, each of the independent variables is tested using the t test for its regression coefficient, t = bj/s_bj.
- The regression output also displays the p-value associated with this t test. If the p-value is greater than the alpha value (normally .05), we conclude that the independent variable is not a valid predictor.
- Because R² is inflated by the inclusion of additional independent variables, an adjusted R² should be used. Unlike R², adjusted R² takes into account the degrees of freedom and the sample size (number of observations).
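The adjustment to R² can be sketched as follows (the R², n and k figures are invented):

```python
# Adjusted R-squared for a hypothetical model with n observations and k predictors
n, k = 50, 4
r_sq = 0.80

adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)
print(adj_r_sq)   # ≈ 0.782, slightly below R² as expected
```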

Chap 15: Time series forecasting and index numbers
- Time-series data are defined as data that have been gathered over a period of time.
- Time-series data can comprise four key components: trend, cyclical, seasonal and irregular (or random). The cyclical component refers to the business and economic cycles that occur over periods of more than one year. The seasonal component refers to the patterns or cycles of data behaviour that occur over time periods of less than one year. The irregular component refers to irregular or unpredictable changes in the time series caused by factors other than trend, cyclical and seasonal movements in the series.
- There are a number of ways to smooth a time series to reduce random variation in the data and reveal the trend. The first is the moving average method, a time-period average that is revised for each time period by including the most recent value(s) in the computation of the average and deleting the value(s) farthest from the present time period.
- Seasonal indices can be computed using the ratio-to-moving-average method. Seasonal indices help us to identify the relative influence of various months or quarters in a time series, and this information can be used to smooth, or 'deseasonalise', the time series.
- The trend in a time series can also be estimated using least squares trend-based forecasting models.
- The presence of autocorrelation presents an opportunity to use the autoregressive model for forecasting. Autoregression is a forecasting technique in which time-series data are predicted by independent variables that are lagged versions of the original dependent variable data.
- The accuracy of the forecasts of two or more models can be assessed by calculating the forecast error using two criteria: mean absolute deviation (MAD) and mean square error (MSE). The forecast error is the difference between the actual value of the variable and the forecast value.
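Moving average smoothing and the MAD/MSE error criteria can be sketched for a short invented series:

```python
# Smoothing a short hypothetical time series with a 3-period moving average,
# then scoring the resulting forecasts with MAD and MSE
series = [12, 15, 14, 16, 19, 18, 20]

window = 3
forecasts = [sum(series[i - window:i]) / window for i in range(window, len(series))]
actuals = series[window:]

errors = [a - f for a, f in zip(actuals, forecasts)]
mad = sum(abs(e) for e in errors) / len(errors)   # mean absolute deviation
mse = sum(e ** 2 for e in errors) / len(errors)   # mean square error
print(forecasts)
print(mad, mse)
```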

