Stats notes - with Prof sentao miao PDF

Title Stats notes - with Prof sentao miao
Author Aiza Malik
Course Introduction to Financial Accounting
Institution McGill University
Pages 58


Chapter 1 – Examining Distributions

Population vs. Sample
• A sample is a subset of the observations (cases) in a population
  o Population: all houses in Montreal, all financial advisors
  o Sample: 100 houses
• Observation: one observed house in the sample, one observed financial advisor

Variable: a characteristic of an observation
• There can be many variables in an observation
• Ex: a house in MTL is one observation
  o Variables are:
  o Year built
  o Price
  o # of bedrooms
  o Neighborhood
  o Type of property
  o ID
  o Address

Types of variables:
• Quantitative Variable:
  o Numerical values
  o We can do arithmetic with them
  o Ex: year built, price, # of bedrooms
• Categorical Variable:
  o Places an observation into categories
  o We can’t do arithmetic with them
  o Ex: neighborhood, type of property
• Label Variable:
  o Puts a label on the observation
  o We can’t do arithmetic with them
  o Ex: ID, address

Examining quantitative variables:

Measures of position:
• Refer to the central location of the data
• Ex: what is the average price of a house in Montreal?

Measures of dispersion:
• Refer to how spread out (dispersed) the data are around the central location
• Ex: do most houses cost the same, or do prices vary a lot?

Measures of position:
• Mean
• Median
• Percentiles & quartiles
• Mode

Measures of dispersion:
• Variance
• Standard deviation
• Coefficient of variation
• Interquartile range

Measures of position:

Mean: the “center of mass” of the data, i.e. the average

Population mean:
• You have a population of N observations
• Denote the population mean by μ:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

• In Excel, AVERAGE() gives you the mean

Sample mean:
• Sample with n observations
• Observation 1 takes value x_1, observation 2 takes value x_2, …
• Denote the sample mean by x̄:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Median: the value at which 50% of the data lie above and 50% of the data lie below
• Step 1: sort the observations by size
• Step 2:
  o If n is odd, the median is the middle observation
  o If n is even, the median is the mean of the two middle observations
• Unlike the mean, the median is not sensitive to outliers
• MEDIAN() in Excel gives the median
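To see why the median resists outliers while the mean does not, here is a small Python sketch. The house prices are made-up illustration values (in thousands of dollars), not data from the course.

```python
from statistics import mean, median

# Hypothetical house prices in thousands of dollars (illustration only).
prices = [300, 320, 350, 380, 400]

# The same sample with one extreme observation added.
prices_with_outlier = prices + [5000]

mean_before = mean(prices)                  # sum of values divided by n
median_before = median(prices)              # n is odd: the middle sorted value
mean_after = mean(prices_with_outlier)      # pulled strongly toward the outlier
median_after = median(prices_with_outlier)  # n is even: mean of the two middle values

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

Adding one extreme house moves the mean by several hundred thousand dollars, while the median only shifts to the midpoint of the two middle prices.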

Percentiles: the Pth percentile is the value at or below which P% of the observations lie
• 80th percentile: 80% of the observations lie at or below the value P80

Quartiles: these are 3 “special” percentiles, found after sorting the data from the smallest to the largest value
• Quartile 1: the 25th percentile
  o The median of the data below the overall median
• Quartile 2: the 50th percentile (the median)
• Quartile 3: the 75th percentile
  o The median of the data above the overall median

In Excel, PERCENTILE(array, k) gives you the percentile, where k is the percentile written as a fraction between 0 and 1 (e.g. 0.8 for the 80th percentile)
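The quartile definitions can be sketched in Python by taking the median of each half of the sorted data. The function name `quartiles` is hypothetical, and the choice to leave the middle observation out of both halves when n is odd is an assumption. Excel's PERCENTILE interpolates between sorted values, so it can return slightly different quartiles on the same data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves definition.

    Assumption: when n is odd, the middle observation is excluded
    from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, q2, q3)
```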

Mode: the value which occurs most often
• There may be no mode
• There may be several modes
• Same definition for both population and sample
• Excel: MODE() gives you the mode
  o Calculating it in Excel can be problematic if multiple modes exist
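As a rough illustration of the several-modes case, Python's standard library has `statistics.multimode`, which returns every value tied for the highest count. The bedroom counts below are invented for the example.

```python
from statistics import multimode

# Hypothetical number of bedrooms in six sampled houses.
bedrooms = [2, 3, 3, 4, 2, 5]

# Both 2 and 3 occur twice, so there are two modes.
modes = multimode(bedrooms)

# When every value occurs exactly once, all values tie; this is the
# situation the notes describe as "no mode".
all_tied = multimode([1, 2, 3])

print(modes)
print(all_tied)
```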

Variance: measures how far a set of observations is spread out from the mean value
• If the observations are crowded together: variance is small
• If the observations are spread out: variance is large

The population variance for a population with N observations is denoted by σ^2, where μ is the population mean:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

The sample variance for a sample with n observations is denoted by σ^2_sample (or s^2), where x̄ is the sample mean:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Standard deviation: the square root of the variance

Population standard deviation:

$$\sigma = \sqrt{\sigma^2}$$

Sample standard deviation:

$$s = \sqrt{s^2}$$

• Standard deviation has the same units as the data
• Variance has units (units of data)^2
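A short Python sketch of the difference between population and sample versions, on a made-up data set. It assumes the usual convention that the population variance divides by N and the sample variance divides by n - 1, which is also what Excel's STDEV.S() uses.

```python
from statistics import pvariance, variance, pstdev, stdev

# Hypothetical data; the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)    # divides by N = 8
sample_var = variance(data)  # divides by n - 1 = 7
pop_sd = pstdev(data)        # square root of the population variance
sample_sd = stdev(data)      # square root of the sample variance

print(pop_var, pop_sd)
print(sample_var, sample_sd)
```

The sample versions are always slightly larger than the population versions computed on the same numbers, because the divisor is smaller.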

Excel: VAR() gives the sample variance and STDEV.S() gives the sample standard deviation

Variances can only be compared when:
• They have the same units of measurement, and
• Mean values are equal

When these conditions are not met:
• Use the coefficient of variation to compare variability between two samples
• The coefficient of variation is unitless
• In Excel you calculate it manually: 100*STDEV.S()/AVERAGE()

$$CV = 100 \times \frac{s}{\bar{x}}$$

Interquartile range (IQR): the range of the middle 50% of the observations in a data set
• 75th percentile minus the 25th percentile, or
• 3rd quartile minus 1st quartile (Q3 - Q1)
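The manual coefficient-of-variation calculation translates directly into Python. This is a minimal sketch: the function name and both data sets are hypothetical, chosen so that the two variables have different units and different means.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: 100 * s / x-bar."""
    return 100 * stdev(values) / mean(values)

# Two hypothetical variables measured in different units.
prices_k = [300, 350, 400, 450, 500]  # thousands of dollars
bedrooms = [1, 2, 3, 4, 5]            # counts

cv_prices = coefficient_of_variation(prices_k)
cv_bedrooms = coefficient_of_variation(bedrooms)

print(round(cv_prices, 2), round(cv_bedrooms, 2))
```

The standard deviation of the prices is far larger in raw terms, yet relative to its mean the number of bedrooms varies more, which is the comparison the coefficient of variation is meant to allow.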

Graphical analysis: Box plot

Box plot: represents graphically the general shape of a distribution. The graph contains:
• Minimum value of the observations (excluding outliers)
• Quartile 1
• Quartile 2
• Quartile 3
• Maximum value of the observations
• Outliers (sometimes, if necessary)

An observation is considered an outlier if
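One widely used criterion, assumed here rather than taken from the notes, is the 1.5 × IQR rule: an observation is flagged when it lies below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below uses hypothetical function names and made-up prices, and computes quartiles as medians of the lower and upper halves.

```python
from statistics import median

def iqr_fences(values, k=1.5):
    """Return (lower fence, upper fence) under the assumed k * IQR rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    # When n is odd, the middle observation is left out of both halves.
    q3 = median(data[half + 1:] if n % 2 else data[half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """List the observations that fall outside the fences."""
    low, high = iqr_fences(values, k)
    return [x for x in values if x < low or x > high]

# Hypothetical house prices in thousands of dollars.
prices = [300, 320, 350, 380, 400, 420, 5000]
print(iqr_fences(prices))
print(find_outliers(prices))
```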

Histogram:
• Value of the variable is plotted on the horizontal axis
• Frequency is plotted on the vertical axis

Skewness: refers to the shape of the distribution of values (how asymmetric it is)

In Excel, SKEW() gives you the sample skewness; use SKEW.P() for a population

Normal distribution:
• A smooth version of a histogram
• Represents a distribution
• Tells how often outcomes occur
  o Symmetric
  o Bell shaped
  o Possible outcomes range from negative infinity to positive infinity
  o Probabilities are always positive

The Normal distribution is described in terms of:
• Mean μ
• Standard deviation σ

We denote different Normal distributions by N(μ, σ)

The Normal distribution can be used to model continuous variables (ex. temperature in °C, stock returns in $, height of students in meters)
• The mean: determines where the curve is centered
• The standard deviation: determines how spread out the curve is around the mean
• Roughly speaking, the variance measures the spread from the mean value

Area under the curve
• The shaded area represents the probability of something happening

For a continuous distribution, we cannot ask "what is the probability that the exact value x occurs?", because the probability of any single exact value is zero. But we can ask, for example:
• What is the probability that a randomly chosen student at McGill has a GPA between 3.1 and 3.3?
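The GPA question can be sketched with Python's `statistics.NormalDist`. The parameters are purely hypothetical: the notes give no mean or standard deviation for GPAs, so N(3.2, 0.4) is assumed only to make the example run.

```python
from statistics import NormalDist

# Hypothetical parameters: GPAs assumed to follow N(mean=3.2, sd=0.4).
gpa = NormalDist(mu=3.2, sigma=0.4)

def prob_between(dist, low, high):
    """Area under the density curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

# Probability of a GPA between 3.1 and 3.3 under the assumed model.
p_interval = prob_between(gpa, 3.1, 3.3)

# An "interval" of zero width has zero area, hence zero probability.
p_single_point = prob_between(gpa, 3.2, 3.2)

print(round(p_interval, 4))
print(p_single_point)
```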

The 68-95-99.7 rule

The rule is an easy way to remember the area under the Normal curve:
• Approx. 68% of the area is between μ ± σ (within 1 standard deviation of the mean)
• Approx. 95% of the area is between μ ± 2σ (within 2 standard deviations of the mean)
• Approx. 99.7% of the area is between μ ± 3σ (within 3 standard deviations of the mean)
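The three percentages can be checked numerically. This sketch uses the standard Normal from Python's `statistics.NormalDist`; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard Normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1))
```

Because any Normal distribution has the same shape once distances are measured in standard deviations, the same areas hold for every N(μ, σ).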

What if our Normal distribution is not standard? How do we compute probabilities?

Converting any Normal distribution into a standard Normal, N(0, 1):
• We use the z-score
• Let x be an observation from a Normal distribution N(μ, σ); then

$$z = \frac{x - \mu}{\sigma}$$

• One interpretation of the z-score: z tells you how many standard deviations σ the point x is away from the mean value
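A minimal sketch of the conversion, using the standard definition z = (x - μ) / σ. The numbers (μ = 100, σ = 15, x = 130) are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mu) / sigma

# Hypothetical Normal distribution N(100, 15) and observation x = 130.
mu, sigma, x = 100, 15, 130
z = z_score(x, mu, sigma)

# The probability of falling at or below x is the same whether computed on
# the original distribution or on the standard Normal at the z-score.
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)

print(z)
print(round(p_original, 4), round(p_standard, 4))
```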

Chapter 2 – Examining Relationships

Scatter plot: visually examining data
• A scatter plot displays two quantitative variables as pairs of data
• It is your job to decide which is the explanatory variable (x axis)
• and which is the dependent variable (y axis)

What to look for in a scatter plot?
• Direction of association
• Strength of relationship
• Outliers
• Skewness

Direction of association
• There might be a non-linear association between variables
  o Ex: stats for divorce

Outliers
• An observation that has a low probability of occurrence
• Unusual or unexpected
• Outliers can be dangerous

Skewness: when one of the variables (or both) is skewed, it might be difficult to analyze the data visually
• Skewness can sometimes be partly remedied by a log transformation of the data
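To illustrate the effect of a log transformation, the sketch below uses an invented, strongly right-skewed data set and a base-10 logarithm (the choice of base is an assumption; any base changes the values only by a constant factor). A large gap between mean and median is used here as a simple symptom of skewness.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed data: each value is ten times the previous one.
values = [1, 10, 100, 1000, 10000]
logged = [math.log10(v) for v in values]

# Before the transformation the mean sits far above the median.
gap_raw = mean(values) - median(values)

# After the transformation the data are evenly spaced, so mean and median agree.
gap_logged = mean(logged) - median(logged)

print(gap_raw)
print(gap_logged)
```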

Coefficient of correlation:
• Denoted by r
• Measures the strength of the linear association between two variables x and y
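The notes do not give a formula for r at this point. A minimal sketch, assuming the usual Pearson definition (the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations), could look like this. The function name and the data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r between paired observations.

    Assumes both lists have the same length (at least 2) and that
    neither variable is constant, otherwise the denominator is zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pairs lying exactly on an upward and a downward line.
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```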

Properties:
• r is always between -1 and 1
• If r > 0, then the correlation is positive
• If r...

