Title | Week 2 Descriptive stats |
---|---|
Course | eipom |
Institution | University of Central Lancashire |
Week 2 : Descriptive stats
Learning outcomes
- Know the appropriate descriptive and exploratory statistics for given data types
- Compute the measures of central tendency and dispersion
- Interpret descriptive statistics and graphical displays of data from published research
2.1 Analysis Types
Why is analysis important? How is it used?
- To test hypotheses and make inferences about the larger population from which the sample was drawn
- To summarise data from the sample
There are 2 types of statistical analysis: descriptive and inferential.
What is descriptive statistics?
Methods used to summarise or describe the main features/characteristics of a collection of data (the sample).
Other features
- No generalisation beyond the sample
- Uses diagrams and numerical techniques to show patterns in the data
What are the types of diagrams used?
Histograms, box-and-whisker plots, scatter plots, bar charts, pie charts
What are the types of numerical techniques used?
Mean, median, mode, standard deviation, range, interquartile range (IQR), frequencies, percentages (incidence, prevalence, odds, risk etc.)
Think of it like this: raw data on its own is very hard to visualise, especially when there is a lot of it. Descriptive statistics present the data in a more meaningful way that allows simple interpretation.
What is inferential statistics?
Methods used to make inferences from the sample to the larger population.
2.2 Descriptive statistics
2.2.1 Numerical Data
Histogram
Symmetrical / bell-shaped: half of the values are to the left of the centre, half are to the right
Skewed data
- Extreme (tail) values to the right = right / + (positive) skew
- Extreme (tail) values to the left = left / - (negative) skew
Box and whisker plot
- Useful for comparing groups
- Line across the middle of the box = median
- Box runs from the lower quartile (25%) to the upper quartile (75%)
- If the median is not in the centre of the box, the distribution is skewed
- If the upper whisker is much longer than the lower whisker, this gives an impression of positive skewness
- Points outside the whiskers = outliers (denoted by * or a circle)
- The larger the box, the greater the spread of the data
Scatter plots
Used to investigate the correlation or relationship between two variables.
**NOT causation, just association between 2 variables
Possible patterns: positive relationship, negative relationship, no relationship
2.2.2 Categorical Data
Quantitative description: display frequencies / percentages using
- Table
- Bar chart
- Pie chart
- Stacked bar chart
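A minimal sketch of the frequency/percentage table that sits behind these charts. The blood-group sample and the helper name `frequency_table` are made up for illustration and do not come from the lecture.

```python
from collections import Counter

# Hypothetical categorical sample (blood groups); not taken from the notes.
blood_groups = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "B"]


def frequency_table(values):
    """Return {category: (count, percentage)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


table = frequency_table(blood_groups)
for category, (n, pct) in sorted(table.items()):
    print(f"{category:<3} n={n:<2} {pct:.1f}%")
```

The same counts could feed a bar chart or pie chart; the percentages are what a stacked bar chart would display per group.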
2.2.3 Numerical Summaries
Measures of central tendency
- Averages: find the centre of the data
- Mean, median, mode
- Mean = arithmetic average
- Median = middle value of the ordered data set
- Mode = most frequently occurring value
Mean
FORMULA Mean = Sum of data values / # of data points
E.g. 8, 4, 5, 7: (8 + 4 + 5 + 7) / 4 = 24 / 4 = 6
*Mean can be significantly influenced by outliers
Sample mean is x̄ (x-bar), population mean is μ
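As a quick check of the mean formula, here is a small sketch using the example values 8, 4, 5, 7. The extra value 96 is a hypothetical outlier added only to show how strongly one extreme value pulls the mean.

```python
def mean(values):
    """Arithmetic mean: sum of the data divided by the number of data points."""
    return sum(values) / len(values)


data = [8, 4, 5, 7]
print(mean(data))  # 6.0

# Hypothetical outlier (not in the notes) to show its effect on the mean.
with_outlier = data + [96]
print(mean(with_outlier))  # 24.0
```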
Median
Middle value of the ordered data → divides the data into 2 equal halves
If odd # of values, pick the middle number
If even # of values, calculate the average of the 2 middle numbers
*Median is less affected by outliers than the mean
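A minimal sketch of the median rule: sort the data, take the middle value when the count is odd, and average the two middle values when the count is even. The data sets reuse numbers from these notes, plus one hypothetical outlier (96).

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair


print(median([25, 42, 50, 58, 75]))  # 50
print(median([8, 4, 5, 7]))          # 6.0

# Adding a hypothetical outlier barely moves the median.
print(median([8, 4, 5, 7, 96]))      # 7
```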
Mode
Most frequently occurring value
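A short sketch of finding the mode with Python's standard `statistics.multimode`, which returns every value tied for most frequent. Both data sets are hypothetical.

```python
from statistics import multimode

# Hypothetical data: 3 occurs three times, more than any other value.
single_mode = multimode([2, 3, 3, 5, 7, 3, 5])
print(single_mode)  # [3]

# Hypothetical data with a tie: both 1 and 2 occur twice.
two_modes = multimode([1, 1, 2, 2, 3])
print(two_modes)  # [1, 2]
```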
Symmetrical distribution: mean and median are (approximately) the same
Skewed distribution: mean and median are different (the mean is pulled towards the tail)
Measures of dispersion/spread
Variability, spread or dispersion: how far apart the values are around the measure of central tendency
- Range (min, max)
- IQR (25th and 75th percentiles)
- SD (measure of variability around the mean)
Range
Difference between the highest and lowest value
Disadvantage: the range uses only the two most extreme values, so it is not very representative of the data as a whole
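A small sketch of the range and of why it is not very representative. The first data set is the one used later for the standard deviation; the value 200 is a hypothetical outlier.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)


data = [25, 42, 50, 58, 75]
print(value_range(data))  # 50

# One hypothetical extreme value changes the range completely,
# even though the other five values are untouched.
print(value_range(data + [200]))  # 175
```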
IQR
Worked example: Median = (46 + 50) / 2 = 48; Q1 = (36 + 42) / 2 = 39; Q3 = (54 + 58) / 2 = 56; so IQR = Q3 - Q1 = 56 - 39 = 17
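The quartile arithmetic above can be sketched with the "median of each half" method. The 12-value data set below is hypothetical: it was chosen only so that its middle pairs match the numbers in the example (36 and 42, 46 and 50, 54 and 58). Other quartile conventions, such as interpolation, can give slightly different answers.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count the overall median is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


# Hypothetical data set consistent with the worked example.
data = [30, 34, 36, 42, 44, 46, 50, 52, 54, 58, 60, 62]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 39.0 48.0 56.0 17.0
```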
Another worked example: Q1 = (4 + 6) / 2 = 5; Q3 = (12 + 14) / 2 = 26 / 2 = 13; IQR = Q3 - Q1 = 13 - 5 = 8
Standard Deviation
Roughly the average distance of the data points from the mean
FORMULA Standard Deviation
1. Find the sample mean.
2. Subtract the mean from each value in the data set.
3. Square each of these differences individually.
4. Sum up all the squared values.
5. Divide by (number of values - 1).
6. Take the square root of the result.
Dataset: 25, 42, 50, 58, 75
1. Find the sample mean of the data set. Mean = sum of data points / number of data points = (25 + 42 + 50 + 58 + 75) / 5 = 250 / 5 = 50
2. Subtract the mean of the data set from each value. 25 - 50 = -25; 42 - 50 = -8; 50 - 50 = 0; 58 - 50 = 8; 75 - 50 = 25
3. Square all the values. (-25)^2 = 625; (-8)^2 = 64; 0^2 = 0; 8^2 = 64; 25^2 = 625
4. Sum up all the squared values. 625 + 64 + 0 + 64 + 625 = 1,378
5. Divide that sum by (total number of data points - 1). 1378 / (5 - 1) = 1378 / 4 = 344.5
6. Find the square root of the result. √344.5 ≈ 18.56
Ans: SD ≈ 18.6
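The six steps can be sketched in code for the data set 25, 42, 50, 58, 75. The helper name `sample_sd` is made up for illustration. Dividing the sum of squares (1,378) by n - 1 = 4 gives a variance of 344.5, and the result is cross-checked against Python's built-in `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation, following the six steps in the notes."""
    n = len(values)
    mean = sum(values) / n                    # 1. sample mean
    deviations = [x - mean for x in values]   # 2. subtract the mean from each value
    squared = [d ** 2 for d in deviations]    # 3. square each difference
    total = sum(squared)                      # 4. sum of squares
    variance = total / (n - 1)                # 5. divide by (n - 1)
    return math.sqrt(variance)                # 6. square root


data = [25, 42, 50, 58, 75]
sd = sample_sd(data)
print(round(sd, 2))  # 18.56
print(math.isclose(sd, statistics.stdev(data)))  # True
```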
RULE: For data with a symmetric (bell) shape:
- About 68% of the sample data falls within +- 1 SD of the mean
- About 95% of the sample data falls within +- 2 SD of the mean
Example: Mean weight loss = 10kg, standard deviation = 2.5kg
1 S.D.: 10kg +- 2.5kg = 7.5kg and 12.5kg. We can state that about 68% of patients lost between 7.5kg and 12.5kg.
2 S.D.: 10kg +- (2 x 2.5kg) = 5kg and 15kg. We can state that about 95% of patients lost between 5kg and 15kg.
Exercise: Mean = 72kg, S.D. = 8kg. What is the interval of values around the mean that includes (a) 95% of the observed values, (b) 68% of the observed values?
(a) Remember 95% is +- 2 SD: 72kg +- (2 x 8kg) = 72kg +- 16kg; 72 - 16 = 56kg; 72 + 16 = 88kg. Range: 56kg to 88kg
(b) Remember 68% is +- 1 SD: 72kg +- 8kg; 72 - 8 = 64kg; 72 + 8 = 80kg. Range: 64kg to 80kg
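Both interval calculations (weight loss and the 72kg exercise) follow the same pattern, mean +- k x SD, with k = 1 for about 68% and k = 2 for about 95%. A minimal sketch, where `sd_interval` is a made-up helper name and the rule is assumed to apply because the data are roughly symmetric:

```python
def sd_interval(mean, sd, k):
    """Interval from (mean - k*SD) to (mean + k*SD)."""
    return (mean - k * sd, mean + k * sd)


# Weight-loss example: mean 10kg, SD 2.5kg
print(sd_interval(10, 2.5, 1))  # (7.5, 12.5)  about 68% of patients
print(sd_interval(10, 2.5, 2))  # (5.0, 15.0)  about 95% of patients

# Exercise: mean 72kg, SD 8kg
print(sd_interval(72, 8, 2))    # (56, 88)     about 95% of values
print(sd_interval(72, 8, 1))    # (64, 80)     about 68% of values
```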
If the mean and standard deviation are presented: normal distribution
If the median and interquartile range are presented: skewed distribution
Left skew: extreme values on the left
Right skew: extreme values on the right
If there are no outliers and the data are normally distributed (bell-shaped), use the mean. When there are outliers and the data are skewed (asymmetrical), use the median.
Results are skewed, as both the median and IQR were used...