ECON1203 Notes PDF

Title ECON1203 Notes
Course Business and Economic Statistics
Institution University of New South Wales
Pages 46


ECON1203: Business and Economic Statistics

Table of Contents

WEEK 1: INTRODUCTION, CENTRAL TENDENCY, SPREAD AND SKEWNESS (CHAPTERS 2 & 3)
DESCRIPTIVE STATISTICS
  TERMS AND DEFINITIONS
  TYPES OF DATA: CONCEPTS AND JARGON
  TYPES OF OBSERVATIONS
DESCRIPTIVE STATISTICS 1: FREQUENCY DISTRIBUTIONS
  OTHER DISPLAYS
  BAR CHARTS AND PIE CHARTS
DESCRIPTIVE STATISTICS 2: HISTOGRAMS
  HISTOGRAMS
  DESCRIBING HISTOGRAMS
DESCRIPTIVE STATISTICS 3: BIVARIATE RELATIONSHIPS?
  INTERPRETING SCATTERPLOT
  TIME SERIES PLOT
NUMERICAL SUMMARIES
  MEASURES OF LOCATION
  MEASURES OF VARIABILITY
  STANDARDISING DATA
  COEFFICIENT OF VARIATION
  MEASURES OF RELATIVE LOCATION
  MEASURES OF ASSOCIATION
  MEASURES OF SHAPE

Week 1: Introduction, Central Tendency, Spread and Skewness (Chapters 2 & 3)

Inferential stats → moving from descriptive statistics of a sample to conclusions (e.g. policies) for the whole nation

Descriptive Statistics

• The nature and potential of data
• Descriptive statistics
  o Frequency distributions and histograms
  o Shapes of distributions
  o Describing bivariate relations
  o Measures of central tendency or location
  o Measures of dispersion or spread
• Measures of association
• Introduction to linear regression

Terms and definitions

• Raw or ungrouped data: data that has not been summarised in any way
• Grouped data: data that has been organised into a frequency distribution
• Range: difference between the largest and smallest data values

Types of data: concepts and jargon

• Variable: characteristic of a population or of a sample from a population
  o Observe values or observations of a variable
  o Data set: contains observations on variables
  o E.g. height, hair colour, skin colour, etc.
• Variables may be:
  o Discrete (countable) or continuous (measurable)
    § Discrete: football scores
    § Continuous: time remaining in a football game
  o Quantitative (numerical) or qualitative (categorical)
    § Quantitative: exam scores and time
    § Qualitative: gender
  o Ordinal qualitative data feature a natural ordering
    § E.g. course evaluations: poor, good, very good
    § Standard & Poor's ratings: AAA > AA+ > AA > …
  o To apply statistical analyses directly to qualitative data, we must convert it to quantitative data

Types of observations

• The type of observation made by the statistician can also be used to classify data
  o Time series data: measurements of the same concept at different points in time
    § E.g. Sydney area births per day, for each day in a year
    § i.e. following the same type of data over a period of time (e.g. each day in March) to see trends, etc.
  o Cross-sectional data: measurements of one or more concepts at a single point in time
    § E.g. age, gender and marital status of a sample of UNSW staff in a particular year
    § i.e. considering many concepts/variables at a single point in time (e.g. January 2019)
• The type of data you have influences what type of analysis is appropriate
  o Time series: examining monthly or seasonal patterns in the number of births
  o Suppose marital status is coded as single = 1, married = 2, divorced = 3, widowed = 4
    § Would it make sense to calculate the “average marital status” of your sample of UNSW staff? No: the codes are only labels, so report the most common category (the mode) instead

Descriptive statistics 1: Frequency distributions

• Summaries of categorical data using counts
• Example: UNSW is interested in how students get to campus for long-term planning
  o Categories need to be mutually exclusive and exhaustive
  o Mutually exclusive: if you are in one category, you cannot be in another category
    § Two events (each with non-zero probability) cannot be both mutually exclusive and independent: once one mutually exclusive event occurs, the probability of the other falls to zero, which violates the definition of independent events
    § An event and its complement cannot occur at the same time and are therefore mutually exclusive
    § The probability that at least one of two events occurs is greater when the events are mutually exclusive than when they are independent
  o Exhaustive: every observation in the data set fits into one of the categories, i.e. all options are covered
  o Relative frequency: frequencies expressed as percentages
    § Represents the proportion of the total data that lies in the class interval; it is the ratio of the frequency of the class interval to the total frequency, e.g. 7/50 = 14%
  o Class midpoint or class mark: the midpoint of each class. It can be calculated by taking the average of the class endpoints

Mode of transport to campus    Frequency    Relative Frequency
Resident
Walk
Cycle
Car
Bus
Other
Total                                       100
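A frequency distribution with relative frequencies, as described in this section, can be sketched in a few lines of Python. This is only an illustration: the survey responses are made up, and `frequency_table` is a hypothetical helper name, not something from the course.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not data from the notes).
responses = ["Bus", "Walk", "Bus", "Car", "Cycle",
             "Bus", "Walk", "Resident", "Car", "Bus"]


def frequency_table(values):
    """Return {category: (frequency, relative frequency in %)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(responses)
for category, (freq, rel) in table.items():
    print(f"{category:<10}{freq:>4}{rel:>8.1f}%")
```

Because each response falls into exactly one category (mutually exclusive) and every response has a category (exhaustive), the relative frequencies add up to 100%.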

Other displays

• Can convert the information contained in a frequency distribution into
  o Cumulative frequency or relative frequency distributions
    § Cumulative frequency: the running total of frequencies through the classes of a frequency distribution, found by adding the frequency of that interval to the cumulative frequency of the previous class interval
      • Aussie marks: how many students got a credit or better?
      • Associated cumulative histograms
  o Stem-and-leaf displays
    § Some information needed to answer a question may be lost in histograms, e.g. do examiners avoid marks close to a borderline?
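The running-total idea behind cumulative frequency can be sketched as follows. The grade bands and counts are hypothetical, chosen only to show how a "credit or better" question is answered from cumulative frequencies.

```python
from itertools import accumulate

# Hypothetical grade bands and counts (illustrative only), lowest band first.
classes = ["Fail", "Pass", "Credit", "Distinction", "High Distinction"]
frequencies = [5, 20, 15, 7, 3]

# Running total of frequencies through the classes.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Credit or better = everyone, minus those at or below "Pass".
credit_or_better = total - cumulative[classes.index("Pass")]

for name, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{name:<18}{freq:>4}{cum:>6}")
print("Credit or better:", credit_or_better)
```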

Bar charts and pie charts

• Bar charts: graphical representation of frequency distributions
• Pie charts: show relative frequencies more explicitly
  o Used to display categorical data. A pie chart is a circular display in which the area of the whole pie represents 100% of the data being studied and the slices represent the percentage contributions of the sublevels

Descriptive statistics 2: Histograms

• Suppose data are ordinal (whether discrete or continuous)
  o Obvious categories for the data values may not exist
  o Can create categories or classes by defining lower and upper class limits
  o Categories need to be mutually exclusive and exhaustive
• How many categories? (or “bins” in Excel)
  o Too many → doesn’t summarise
  o Too few → no information
  o No set rules on the number of bins
  o Bins need not be of equal width and may be open-ended at the top or bottom

Histograms

• Frequency distribution of continuous data as a vertical bar chart
  o If class intervals are equal in size, the bars in the histogram are drawn equal in width and the height of each bar represents the frequency of the corresponding interval
  o If the class intervals chosen are not equal in size, the bars will vary in width (representing the class interval) and height (representing the frequency density)
• Beware of Excel features
  o Gaps between bars don’t look great
  o Bin label default can be confusing
  o Bar areas should be proportional to frequencies
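For unequal class widths, the bar height is the frequency density (frequency divided by class width), which keeps each bar's area proportional to its frequency. A minimal sketch, assuming made-up class limits and counts and a hypothetical helper `frequency_density`:

```python
# Hypothetical class limits (lower, upper) and frequencies; widths are unequal.
classes = [(0, 10), (10, 20), (20, 50)]
frequencies = [5, 10, 15]


def frequency_density(limits, freqs):
    """Bar height = frequency / class width, so that bar area = frequency."""
    return [f / (upper - lower) for (lower, upper), f in zip(limits, freqs)]


heights = frequency_density(classes, frequencies)

# Check: width x height recovers the original frequency for every class.
areas = [h * (upper - lower) for h, (lower, upper) in zip(heights, classes)]
print(heights)
print(areas)
```

The widest class has the largest frequency here, yet its bar is no taller than the first one, which is what stops wide bins from looking misleadingly dominant.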

Describing histograms

• Symmetry (or lack thereof)
  o The left half of a symmetric histogram is a mirror image of the right half
  o The famous “bell-shaped curve” is symmetric
• Skewness
  o Feature of an asymmetric histogram
  o Long tail to the right: positively skewed
  o Long tail to the left: negatively skewed
  o May be associated with outliers
• Number of modal classes/bins
  o Modal class: class with the highest frequency
  o Histograms may be unimodal (one peak), bimodal (two peaks) or multimodal (two or more peaks)

Descriptive statistics 3: Bivariate relationships?

• How can we characterise relationships between variables?
  o Contingency table (“cross tabulation” or “cross tab” table)
    § Captures the relationship between two qualitative variables
  o Scatterplots
    § Capture the relationship between two quantitative variables
    § If one of the variables is “time”, then we get a time series plot
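A contingency table simply counts how often each combination of two qualitative variables occurs. The sketch below uses invented (gender, transport mode) pairs and a hypothetical helper `cross_tab`; it is one simple way to build such a table, not a prescribed method.

```python
from collections import Counter

# Hypothetical (gender, transport mode) observations, illustrative only.
observations = [("F", "Bus"), ("M", "Car"), ("F", "Walk"),
                ("M", "Bus"), ("F", "Bus"), ("M", "Walk")]


def cross_tab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(pairs)


table = cross_tab(observations)
rows = sorted({row for row, _ in observations})
cols = sorted({col for _, col in observations})

print("    " + "".join(f"{c:>6}" for c in cols))
for r in rows:
    # Counter returns 0 for combinations that never occur.
    print(f"{r:<4}" + "".join(f"{table[(r, c)]:>6}" for c in cols))
```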

Interpreting scatterplot

• Positive relationship: higher GDP per capita associated with more cars
• Relationship looks approximately linear
• Some exceptions (outliers), e.g. Japan
• 2 clusters: developed vs undeveloped

Time series plot

• Bivariate relationship between some variable and time, e.g. petrol prices and time
• Interpreting the graph:
  o Upward trend: positive relationship between price and time
  o Increased volatility over time
  o Spike in 1990: Gulf War
  o Gap in data in 1997?
  o Sydney and Melbourne prices move together
  o Brisbane prices lower, especially in later periods

Numerical Summaries

• Key features of a single variable: location, spread, relative location, skewness
• Key features of two variables: measures of linear association
• We are interested in a population, but typically have data only on a sample

Measures of location

• Measures of location: yield information about particular positions in a set of numbers once they are ranked in ascending order
• Parameter: key feature of a population
• Statistic: key feature of a sample
• Arithmetic mean: natural measure of “location” or “central tendency”
  o Average of a set of numbers, computed by summing all the numbers and dividing the sum by the count of numbers
  o Other variants:
    § Weighted mean (WAM)
    § Geometric mean
• Median: middle value of ordered observations
  o When n is odd, the median will be a particular value
  o When n is even, the median is the average of the middle two values
  o The median depends on the ranks of observations, not their absolute values, so doubling the largest observation will not change the median
• Mode: most frequently occurring value
• Mean, median and mode each provide different notions of “representative” or “typical” central values
  o Don’t use mode all that much
  o Mean vs median
    § Symmetric distributions: mean = median
    § For positively skewed data: mean > median
    § For negatively skewed data: mean < median
    § Median often preferred when data contains outliers
    § Which is better depends on what you do with the information
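The three measures of location, and the point that the median ignores the size of the largest observation, can be checked with Python's standard `statistics` module. The data set is hypothetical, picked so that one large value skews it to the right.

```python
import statistics

# Hypothetical data, positively skewed by one large value.
data = [1, 2, 2, 3, 50]

mean = statistics.mean(data)      # sum divided by count
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequently occurring value

# Double the largest observation: the mean moves, the median does not.
doubled = data[:-1] + [data[-1] * 2]
mean_after = statistics.mean(doubled)
median_after = statistics.median(doubled)

# With an even number of observations the median averages the middle two.
even_median = statistics.median([1, 2, 3, 4])

print(mean, median, mode)
print(mean_after, median_after, even_median)
```

The mean sits well above the median here, matching the rule that mean > median for positively skewed data.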

Measures of variability

• Measures of variability: describe the spread or dispersion of a set of data
• Range: simple measure of variability; the difference between the maximum and minimum values of a dataset
  o Range = maximum – minimum
  o Range is simple but potentially misleading, e.g. the data sets 1 1 1 50 50 and 1 10 20 40 50 have the same range (49) but very different spreads
• Variance: most common measure of variability
  o Measures the average squared distance from the mean
  o Variance is in squared units, so every squared distance is non-negative
  o E.g. used to measure the spread of returns
  o For a sample, the sum of squared distances is divided by the number of observations minus 1 (n – 1)
• Standard deviation: spread measured in the original units of the data (not squared)
  o Take the square root of the variance to find the standard deviation
  o Roughly the typical distance of the observations from the mean
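The two data sets mentioned under the range make a handy check of why variance and standard deviation say more than the range. The sketch below computes the sample variance with the n – 1 divisor; the function names are illustrative.

```python
import math


def sample_variance(xs):
    """Sum of squared distances from the mean, divided by n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def sample_std(xs):
    """Square root of the sample variance, in the data's original units."""
    return math.sqrt(sample_variance(xs))


a = [1, 1, 1, 50, 50]
b = [1, 10, 20, 40, 50]

range_a = max(a) - min(a)
range_b = max(b) - min(b)

print(range_a, range_b)                        # identical ranges
print(sample_variance(a), sample_variance(b))  # different variances
print(sample_std(a), sample_std(b))
```

Both sets span the same range, but the first piles its values at the two extremes, so its variance is noticeably larger.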

Standardising Data

• Can create a transformed variable with zero mean and “unit” variance from any original quantitative variable
  o Transformed variable is free of units of measurement
  o Called calculating Z-scores (one Z-score per observation)
  o Calculate [observation – mean] and then divide this difference by the standard deviation
• Z-score: represents the number of standard deviations that a value (x) is above or below the mean of a set of numbers
  o Using z-scores allows translation of a value’s raw distance from the mean into units of standard deviations
  o If the z-score is negative, the raw value (x) is below the mean
  o If the z-score is positive, the raw value (x) is above the mean
• For example: a teacher determines her students’ marks are normally distributed with a mean of 60 and a standard deviation of 5. She wants to determine the z-score of a student with a mark of 70. The z-score is (70 – 60)/5 = 2, so the student’s mark is two standard deviations above the mean
• Suppose that for Mutual Fund A the maximum return is 63%, with a mean return of 10.95 and a standard deviation of 21.89. This point has a z-score of (63 – 10.95)/21.89 = 2.38
  o 63% is 2.38 standard deviations above the mean return
• Z-scores allow direct comparisons across original variables measured in different units
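The two worked z-scores above, and the claim that standardised data have zero mean and unit variance, can be verified with a short sketch. The list of marks passed to `standardise` is hypothetical, and both function names are illustrative.

```python
import statistics


def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def standardise(xs):
    """Return one z-score per observation, using the sample mean and sample sd."""
    mean, sd = statistics.mean(xs), statistics.stdev(xs)
    return [z_score(x, mean, sd) for x in xs]


teacher_z = z_score(70, 60, 5)       # the student's mark from the notes
fund_z = z_score(63, 10.95, 21.89)   # Mutual Fund A's maximum return

# Hypothetical marks, used only to show the zero-mean, unit-variance property.
z = standardise([50, 55, 60, 65, 70])

print(teacher_z, round(fund_z, 2))
print(z)
```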

Coefficient of Variation

• Coefficient of variation (CV): a descriptive summary measure that is the ratio of the standard deviation to the mean, expressed as a percentage
• Sometimes wish to measure variation relative to location
  o Case 1: observations all measured in millions and standard deviation is 20 → relatively little variability
  o Case 2: observations all positive but less than 100 → s = 20 may indicate a lot of variability
• Formula: CV = (standard deviation / mean) × 100%
  o Measure of relative variability
  o Comparable across variables
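The two cases above can be mimicked with invented numbers: both samples below have a standard deviation of 20, but one is measured in the millions and the other in the tens. The data and the helper name are assumptions for illustration.

```python
import statistics


def coefficient_of_variation(xs):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * statistics.stdev(xs) / statistics.mean(xs)


# Hypothetical samples, both with a standard deviation of 20.
in_millions = [1_000_020, 1_000_040, 1_000_060]
under_100 = [20, 40, 60]

cv_large = coefficient_of_variation(in_millions)
cv_small = coefficient_of_variation(under_100)

print(f"{cv_large:.4f}%  {cv_small:.1f}%")
```

The same absolute spread is negligible relative to a mean in the millions and substantial relative to a mean of 40.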

Measures of Relative Location

• Median relies on a ranking of observations to measure location
• Idea generalises to percentiles
  o The Pth percentile is the value for which P percent of observations are less than that value
  o The median is the 50th percentile
  o The 25th and 75th percentiles are called the lower and upper quartiles
  o The difference between the upper and lower quartiles is called the interquartile range, another measure of spread
• Percentiles: measures of location that divide a set of data so that a certain fraction of the data can be described as falling on or below this location
  o Widely used in reporting test results
• Interpolation: prediction of the value of something that is hypothetical or unknown based on other values that are known
• Quartiles: measures of location that divide a set of data into four subgroups or parts
• Interquartile range (IQR): distance between the first and third quartiles
  o Interquartile range: IQR = Q3 – Q1
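Quartiles and the interquartile range can be sketched with `statistics.quantiles` from the standard library. The data are hypothetical, and the choice of `method="inclusive"` is an assumption: textbooks and software use several interpolation conventions, which can give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical ordered data, illustrative only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 cuts the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The middle cut point coincides with the median, and the IQR measures the spread of the central half of the data.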

Measures of Association

• “Do large values of x tend to be associated with large values of y?”
• Covariance is a numerical measure
  o Positive covariance → positive linear association
  o Negative covariance → negative linear association
  o Zero covariance → no linear association
• Covariance between height and weight depends upon the units of measure of each variable
• Correlation: a measure of the degree of relatedness of two or more variables
• Correlation coefficient: standardised, unit-free measure of association
  o +1 (–1) → perfect positive (negative) linear relationship
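The point that covariance depends on units while correlation does not can be checked numerically. The heights and weights below are invented, and the functions are a plain sample-based sketch (n – 1 divisor) rather than any particular library's routine.

```python
import math


def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 divisor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


# Hypothetical heights and weights, illustrative only.
heights_cm = [150, 160, 170, 180]
weights_kg = [50, 58, 69, 80]
heights_m = [h / 100 for h in heights_cm]

cov_cm = sample_covariance(heights_cm, weights_kg)
cov_m = sample_covariance(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)

print(cov_cm, cov_m)  # covariance changes with the unit of height
print(r_cm, r_m)      # correlation does not
```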

Measures of Shape

• Measures of shape: tools that are used to describe the appearance of a distribution of data
  o Mean > mode/median → positively skewed
  o Mean < mode/median → negatively skewed

Week 2: Introduction to Probability and Discrete Random Variables

• Chapters 4 and 5

Introduction

• In descriptive statistics, the objective is to describe what is occurring, e.g. by using an average, a numeric measure that summarises the central tendency of the data
• Inferential statistics involves using sample data to describe and draw conclusions about what is occurring at the population level
  o Probability is the basis for inferential statistics
  o The analyst conducts the inferential process under uncertainty

Probability

• The mathematical means of studying uncertainty
• Provides the logical foundation of statistical inference
  o Studying probability helps us make judgement calls to support decisions on the basis of partial information
• To understand probability formally, need to understand:
  o Independence
  o Mutual exclusivity
  o Joint, marginal and conditional probability
  o Counting rules: combinations and permutations
• Will concentrate on probability distributions
  o Rather than looking at individual events, look at every possible outcome
• A probability is assigned to every possible outcome within a sample space
• In statistics, we have a sample and see what we can infer from the information

Methods of determining probability

• Classical method
  o Involves assuming that each outcome is equally likely to occur, with no assumed knowledge or historical basis for what will occur
  o E.g. if a customer walks into a store that offers a $59 phone plan, a $79 plan, a $99 plan and a $129 plan, what is the probability that they choose the $99 plan? 1/4 = 25%
  o Range of possible probabilities: 0 to 1 (highest value 1)
• Relative frequency of occurrence method
  o Based on cumulated historical data
  o Probability of an event occurring is equal to the number of times the event has occurred in the past divided by the total number of opportunities for the event to have occurred
  o Sometimes called empirical probability
• Subjective probability method
  o Based on the feeling or insights of the person determining the probability
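The classical and relative frequency methods differ only in what goes into the ratio, which the sketch below makes explicit. The phone plans come from the example above; the sales history (30 of 200 customers) is invented, and the function names are illustrative.

```python
from fractions import Fraction


def classical_probability(favourable, outcomes):
    """Equally likely outcomes: favourable outcomes / all possible outcomes."""
    return Fraction(favourable, len(outcomes))


def relative_frequency(times_occurred, opportunities):
    """Empirical probability: past occurrences / past opportunities."""
    return Fraction(times_occurred, opportunities)


plans = [59, 79, 99, 129]
p_classical = classical_probability(1, plans)  # choosing the $99 plan

# Hypothetical history: 30 of the last 200 customers chose the $99 plan.
p_empirical = relative_frequency(30, 200)

print(p_classical, float(p_classical))
print(p_empirical, float(p_empirical))
```

The two answers need not agree: the classical method assumes all four plans are equally popular, while the relative frequency method lets past behaviour speak.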



Structure of probability

• Experiment: a process or activity that produces outcomes (e.g. interviewing 20 consumers and asking them which brand of appliance they prefer)
• Event: an outcome of an experiment
• Elementary events: events that cannot be decomposed or broken down into other events
  o E.g. suppose the experiment is to roll a die. Rolling an even number is an event, but it is not an elementary event because it can be broken down into further events (rolling a 2, a 4 or a 6)
• Sample space: a complete roster or listing of all elementary events for an experiment, e.g. the sample space for the roll of a single die is {1, 2, 3, 4, 5, 6}
• Set notation: the use of various mathematical symbols to define a group or set of events, as well as to describe outcomes relating to their occurrence with other sets of events
• Union: formed by combining elements from each set, i.e. X or Y (X ∪ Y)
• Venn diagram: graphical representation of how any event may occur in terms of its possible co-occurrence with any other event
• Intersection: contains the elements common to all sets, i.e. X and Y (X ∩ Y)
• Collectively exhaustive events: a set of events that together contains all possible elementary events for an experiment

Probability Review

• Let e and f be two events in a sample space, S







o If e and f are mutually exclusive, then...
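The set vocabulary from "Structure of probability" maps directly onto Python sets. The sketch below uses the single-die sample space from the notes; the particular events and the `probability` helper (which assumes equally likely outcomes, as in the classical method) are chosen for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # roll of a single die
even = {2, 4, 6}
odd = {1, 3, 5}
greater_than_three = {4, 5, 6}


def probability(event):
    """Classical probability: elementary events in the event / in the sample space."""
    return Fraction(len(event), len(sample_space))


union = even | greater_than_three         # even OR greater than three
intersection = even & greater_than_three  # even AND greater than three

# Even and odd share no elementary events, so they are mutually exclusive;
# together they cover the whole sample space, so they are collectively exhaustive.
mutually_exclusive = (even & odd) == set()
collectively_exhaustive = (even | odd) == sample_space

print(union, probability(union))
print(intersection, probability(intersection))
print(mutually_exclusive, collectively_exhaustive)
```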

