Business Statistics - Lecture Notes 1-3

Author: Ekatarina Akatova
Course: Business Statistics
Institution: University of Technology Sydney

Business Statistics Summary Notes

Chapter 1

1.1 Statistics in Business

Statistics is a mathematical science concerned with the collection, presentation, analysis, interpretation or explanation of data. Aim: to extract the best possible information from data and use it to make business decisions.

Areas of application:

- Monitoring stock markets, investments and commodities
- Reporting economic data (interest rates, unemployment, consumer confidence, etc.)
- Market research, interpreting sales figures, forecasting demand
- Assessing the effectiveness of advertising
- Detecting credit card, mobile phone and banking fraud
- Maintaining internet security
- Weather forecasting
- Improving production methods and assessing product reliability

1.2 Basic Statistical Concepts

Population: a population is a collection of objects (often called units or subjects) of interest. Examples include ALL small businesses, or ALL workers currently employed by McDonald's. Collection of data on a whole population is called a census.

Sample: a sample is a subset of the units in a population. Some forms of data collection are destructive. For example, crash testing a particular model of car destroys every car tested, so sampling is the only option; testing all cars would destroy them all.

There are two steps in analysing data from a sample: exploratory data analysis and statistical inference. These are related and both should be performed for any given data set.

Exploratory data analysis (EDA) is the first step, in which numerical, tabular and graphical summaries of the data are produced to summarise and highlight the key aspects and any special features of the data.

Statistical inference uses sample data to reach conclusions about the population from which the sample was drawn. It is usually the main aim of any statistical exercise and involves more formal data analysis techniques. An inference is a conclusion that patterns observed in the data sample are present in the wider population; it clearly assumes that the sample is representative of the population.

Parameter: a descriptive measure of the population, usually denoted by a Greek letter. Examples of parameters are the population mean, population standard deviation and population variance.

Statistic: a descriptive measure of a sample, usually denoted by a Roman letter. Examples of statistics are the sample mean, sample standard deviation and sample variance.

The distinction between the two terms is important. A business researcher often wants to estimate the value of a parameter or draw inferences about it. However, calculating a parameter directly is usually impossible or infeasible because of the amount of time and money required. In such cases, the business researcher can take a representative sample of the population and use the corresponding sample statistic to estimate the population parameter.

1.3 Types of Data

Categorical data (qualitative): data that are simply identifiers or labels and have no numerical meaning. For example, the occupation of a person (teacher, doctor) cannot be ranked in any meaningful way and is thus a nominal data type. As another example, the grade in a test (A, B, C, D, E, F) is again simply a label and therefore categorical; because test grades have a natural ordering, they form an ordinal data type.

Numerical data (quantitative): data that have a natural order, where the numbers represent some quantity. Two examples are the number of heads in ten tosses of a coin and the weights of rugby players. Note that in the first example we know in advance exactly which values the data may take (0, 1, ..., 10), whereas in the second example all we can give is a range, say 80-140 kg. The first example is a discrete data type, where we can list the possible values; the second is a continuous data type, where we can only give a range of possible values. Discrete data often arise from counting processes, while continuous data arise from measurements.
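To make the parameter/statistic distinction from Section 1.2 concrete, here is a minimal sketch, not from the notes, that simulates a hypothetical population, computes the population mean μ (the parameter), and estimates it with the sample mean x̄ (the statistic). All numbers are made up for illustration.

```python
import random
import statistics

# Hypothetical population: weekly sales (in $'000) for all 5 000 small businesses in a region.
random.seed(1)
population = [random.gauss(mu=50, sigma=12) for _ in range(5000)]

# Parameter: a descriptive measure of the whole population (denoted by the Greek letter mu).
mu = statistics.mean(population)

# Statistic: the same measure computed from a sample (denoted x-bar),
# used to estimate the parameter when a full census is too costly.
sample = random.sample(population, k=100)
x_bar = statistics.mean(sample)

print(f"population mean (parameter) mu    = {mu:.2f}")
print(f"sample mean (statistic)     x-bar = {x_bar:.2f}")
```

In a real study only the sample would be observed; the simulated population here simply makes the comparison visible.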

Cross-sectional data: data collected at a fixed point in time. Such data give a snapshot of the measured variables at that point in time. For example, a monthly survey of consumer confidence provides information on consumer confidence for that month.

Time series data: data collected over time. For example, data on consumer confidence collected over several months.

1.4 Obtaining Data

Chapter 2

2.1 Frequency Distributions

Frequency distributions are a convenient way to group continuous data. A frequency distribution is a summary of the data presented as non-overlapping class intervals covering the entire range of the data, together with their corresponding frequencies.

Ungrouped data (raw data): data that have not been summarised in any way.
Grouped data: data that have been organised into a frequency distribution.

Range: the difference between the largest and smallest data values.

To determine the width of each class interval, divide the range of the data by the number of class intervals and then suitably round the result. The notation (0, 20 000] indicates that the lower end is not included in the interval, but the upper end is included.

Class midpoint (class mark): the midpoint of each class interval.

Relative frequency: the ratio of the frequency of a class interval to the total frequency. This gives the proportion of the data that lies in each class interval.

Cumulative frequency: the running total of frequencies through the classes of a frequency distribution. It is found by adding the frequency of a class interval to the cumulative frequency of the previous class interval.
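As a rough illustration of these definitions, the sketch below builds a frequency distribution for some made-up invoice amounts: it computes the range, derives a rounded class width, forms intervals of the form (lower, upper], and tabulates the frequency, class midpoint, relative frequency and cumulative frequency of each class. The data, the number of classes and the rounding choice are assumptions for the example, not values from the notes.

```python
import random

# Hypothetical data: 50 invoice amounts in dollars (values are made up for illustration).
random.seed(2)
data = [round(random.uniform(500, 99_500), 2) for _ in range(50)]

num_classes = 5
data_range = max(data) - min(data)            # range = largest value - smallest value
width = round(data_range / num_classes, -4)   # divide range by number of classes, round to nearest 10 000

# Non-overlapping class intervals of the form (lower, upper], starting at 0.
edges = [i * width for i in range(num_classes + 1)]

print(f"{'interval':>22} {'midpoint':>10} {'freq':>5} {'rel freq':>9} {'cum freq':>9}")
cumulative = 0
for lower, upper in zip(edges, edges[1:]):
    freq = sum(1 for x in data if lower < x <= upper)   # (lower, upper]: lower excluded, upper included
    cumulative += freq                                   # running total through the classes
    midpoint = (lower + upper) / 2                       # class mark
    rel = freq / len(data)                               # proportion of data in this class
    print(f"({lower:>9,.0f}, {upper:>9,.0f}] {midpoint:>10,.0f} {freq:>5} {rel:>9.2f} {cumulative:>9}")
```

The relative frequencies sum to 1 and the final cumulative frequency equals the total number of observations, which is a quick check that no data point has been missed.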

2.2 Graphical Display of Data

Histograms: a histogram is a vertical bar chart in which the area of each bar is equal to the frequency of the corresponding class interval. It is the most common graph for displaying continuous data. A histogram can show the shape of the distribution, its spread or variability, the central location of the data, and any unusual observations such as outliers.

Outliers: data points that appear outside of the main body of observations.

Frequency polygons: graphs constructed by plotting a dot for the frequency at each class midpoint and connecting the dots.

Ogives: an ogive (pronounced 'o-jive') is a cumulative frequency polygon. A dot of zero frequency is plotted at the beginning of the first class, and construction proceeds by marking a dot at the end of each class interval for the cumulative value. Connecting the dots completes the ogive.
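The following is a hedged sketch of how a histogram and an ogive might be drawn from grouped data, assuming the matplotlib library is available; the class edges and frequencies are made up for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical grouped data: class edges and frequencies (values made up for illustration).
edges = [0, 20_000, 40_000, 60_000, 80_000, 100_000]
freqs = [4, 11, 18, 12, 5]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: adjacent vertical bars whose areas (here simply heights, since all widths
# are equal) represent the class frequencies.
ax1.bar(edges[:-1], freqs, width=20_000, align="edge", edgecolor="black")
ax1.set(title="Histogram", xlabel="Value", ylabel="Frequency")

# Ogive: cumulative frequency polygon, starting from zero at the lower bound of the
# first class and plotting the running total at the upper end of each class.
cumulative = [0]
for f in freqs:
    cumulative.append(cumulative[-1] + f)
ax2.plot(edges, cumulative, marker="o")
ax2.set(title="Ogive", xlabel="Value", ylabel="Cumulative frequency")

plt.tight_layout()
plt.show()
```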

Chapter 3

3.1 Measures of Central Tendency

Measures of central tendency yield information about the centre, or middle part, of a set of numbers. These include the mode, the median and the mean.

Mode: the most frequently occurring value in a set of data. Organising the data into an ordered array helps to locate the mode. It is an appropriate descriptive summary measure for categorical data.
Bimodal: in the case of a tie for the most frequently occurring value, two modes are listed.
Multimodal: data sets that contain two or more modes.

Median: the middle value in an ordered array of numbers. For an array with an odd number of observations, the median is the middle number; if the number of observations is even, it is the average of the two middle numbers. Another way to locate the median is to find the (n + 1)/2-th term in the ordered array, where n equals the total number of observations.

Mean (arithmetic mean): the average of a set of numbers, found by summing all the numbers and dividing the sum by the count of numbers. The sample mean is represented by x̄, pronounced 'x-bar', and the population mean by the Greek letter mu (μ). The capital Greek letter sigma (Σ) is commonly used in mathematics to represent the summation of all the numbers in a grouping, so x̄ = (Σx)/n and μ = (Σx)/N. N is used to represent the number of terms in a population, and n is used to represent the number of terms in a sample.
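The sketch below shows the three measures computed with Python's standard statistics module on a small made-up sample; the data are hypothetical and only illustrate the definitions above.

```python
import statistics

# Hypothetical sample: number of customer complaints logged per day (made-up values).
data = [3, 7, 2, 7, 5, 4, 7, 2, 6, 5]

mode = statistics.mode(data)      # most frequently occurring value
median = statistics.median(data)  # middle value of the ordered array
                                  # (average of the two middle values here, since n is even)
x_bar = statistics.mean(data)     # sample mean: sum of the values divided by n

print(f"mode = {mode}, median = {median}, mean (x-bar) = {x_bar}")

# The median can also be located as the (n + 1)/2-th term of the ordered array:
ordered = sorted(data)
n = len(ordered)
print("ordered array:", ordered, "-> (n + 1)/2 =", (n + 1) / 2)
```

For this sample the position (n + 1)/2 = 5.5 falls halfway between the 5th and 6th ordered values, which is exactly the averaging rule stated above for an even number of observations.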

3.2 Measures of Location

Measures of location yield information about certain sections of a set of numbers when ranked into an ascending array.

Percentiles: measures of location that divide a set of data so that a certain fraction of the data can be described as falling on or below that location. The Pth percentile is the value such that P% of the data are equal to or below that value and (100 - P)% are above or equal to that value. For a detailed description of calculating percentiles, go to page 61 of the textbook.

Interpolation: a prediction of the value of something that is hypothetical or unknown based on other values that are known, such as other values in the data.

Quartiles: measures of location that divide a set of data into four subgroups or parts. The three quartiles are denoted Q1, Q2 and Q3. The first quartile separates the first quarter of the data from the upper three quarters and is equal to the 25th percentile. The second quartile separates the second quarter of the data from the third quarter; it is located at the 50th percentile and equals the median of the data. The third quartile divides the first three quarters of the data from the last quarter and is equal to the value of the 75th percentile...
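A brief sketch of computing quartiles and percentiles with Python's statistics.quantiles, which interpolates linearly between data points; the textbook's exact procedure on page 61 may differ slightly in how it interpolates, and the data here are made up for illustration.

```python
import statistics

# Hypothetical ordered sample: weekly overtime hours for 12 employees (made-up values).
data = sorted([2, 3, 5, 6, 7, 8, 9, 11, 12, 14, 15, 18])

# statistics.quantiles with n=4 returns the three quartiles Q1, Q2, Q3;
# method="inclusive" interpolates between observed data points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(f"Q1 (25th percentile)          = {q1}")
print(f"Q2 (50th percentile / median) = {q2}")
print(f"Q3 (75th percentile)          = {q3}")

# Percentiles more generally: n=100 gives the 1st to 99th percentiles as a list.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
print("90th percentile =", percentiles[89])
```

Note that Q2 from this computation coincides with statistics.median(data), matching the statement above that the second quartile equals the median.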

