Week 2 Descriptive stats PDF

Title	Week 2 Descriptive stats
Course	eipom
Institution	University of Central Lancashire
Pages	12

Description

Week 2: Descriptive stats

Learning outcomes
- Know the appropriate descriptive and exploratory statistics for given data types
- Compute the measures of central tendency and dispersion
- Interpret descriptive statistics and graphical displays of data from published research

2.1 Analysis Types

Why is analysis important, and how is it used?
- To test hypotheses and make inferences about the larger population from which the sample was drawn
- To summarise data from the sample

There are 2 types of statistical analysis: descriptive and inferential.

What is descriptive statistics?
- Methods used to summarise or describe the main features/characteristics of a collection of data (the sample)

Other features
- No generalisation beyond the sample

- Uses diagrams and numerical techniques to show patterns in the data

What types of diagrams are used?
- Histograms, box-and-whisker plots, scatterplots, bar charts, pie charts

What types of numerical techniques are used?
- Mean, median, mode, standard deviation, range, interquartile range (IQR), frequencies, percentages (incidence, prevalence, odds, risk, etc.)

Think of it like this: if you just presented the raw data, it would be very hard to see what the data show, especially when there is a lot of it. Descriptive statistics let us present the data in a more meaningful way that allows simple interpretation.

What is inferential statistics?
- Methods used to make inferences from the sample to the larger population

2.2 Descriptive statistics

2.2.1 Numerical Data

Histogram
- Symmetrical / bell-shaped: half of the values are to the left of the centre and half are to the right
- Skewed data:
  - Extreme (tail) values to the right = right (positive, +) skew
  - Extreme (tail) values to the left = left (negative, -) skew

Box-and-whisker plot
- Useful for comparing groups
- Line across the middle of the box = median
- The box runs from the lower quartile (25%) to the upper quartile (75%)
- If the median is not in the centre of the box, the distribution is skewed
- If the upper whisker is much longer than the lower whisker, this gives an impression of positive skewness
- Points outside the whiskers = outliers (denoted by * or a circle)
- The larger the box, the greater the spread of the data

Scatter plots
- Used to investigate the correlation or relationship between two variables
- Note: a scatter plot shows association between the 2 variables, NOT causation
- Possible patterns: positive relationship, negative relationship, no relationship
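The three scatter plot patterns can also be expressed as a number. The sketch below goes one small step beyond the notes and computes Pearson's correlation coefficient r by hand. All of the data are hypothetical, invented purely for illustration, and the function name `pearson_r` is an assumption of this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the two means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data, invented for illustration only
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases as x increases
falling = [10, 8, 6, 4, 2]   # decreases as x increases
flat = [3, 1, 4, 1, 3]       # no linear trend with x

r_positive = pearson_r(x, rising)    # close to +1: positive relationship
r_negative = pearson_r(x, falling)   # close to -1: negative relationship
r_none = pearson_r(x, flat)          # close to 0: no linear relationship
print(r_positive, r_negative, r_none)
```

A value of r near +1 or -1 still only describes association between the two variables, not causation.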

2.2.2 Categorical Data and Quantitative Description

Display frequencies / percentages using:
- Table
- Bar chart
- Pie chart
- Stacked bar chart
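A frequency table for categorical data can be sketched in a few lines. This is a minimal illustration only: the blood group data are hypothetical and were invented for this example.

```python
from collections import Counter

# Hypothetical categorical data, invented for illustration only
blood_groups = ["A", "O", "B", "O", "A", "O", "AB", "O", "A", "B"]

counts = Counter(blood_groups)      # frequency of each category
total = sum(counts.values())        # total number of observations

# Percentage = frequency / total * 100
percentages = {group: 100 * n / total for group, n in counts.items()}

# Print the table, most frequent category first
for group, n in counts.most_common():
    print(f"{group:<3} {n:>3} {percentages[group]:>6.1f}%")
```

The same counts are what a bar chart or pie chart would display graphically.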

2.2.3 Numerical Summaries

Measures of central tendency (averages)
- Find the centre of the data
- Mean = arithmetic average
- Median = middle value of the ordered data set
- Mode = most frequently occurring value

Mean

FORMULA
Mean = Sum of data / Number of data points

E.g. for the data 8, 4, 5, 7: Mean = (4 + 5 + 7 + 8) / 4 = 24 / 4 = 6

*The mean can be significantly influenced by outliers.
The sample mean is written X̄ (X-bar); the population mean is written μ.
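The mean formula, and its sensitivity to outliers, can be sketched as follows. The data set 8, 4, 5, 7 comes from the notes. The extra value 100 is a hypothetical outlier added only for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the data divided by the number of data points."""
    return sum(values) / len(values)

data = [8, 4, 5, 7]               # example from the notes
print(mean(data))                 # 24 / 4

# Hypothetical outlier (100), added for illustration only
with_outlier = data + [100]
print(mean(with_outlier))         # 124 / 5, pulled far above the original values
```

One extreme value moves the mean from 6 to about 24.8, which is larger than every one of the original four values.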

Median
- Middle value of the ordered data → divides the data into 2 equal sets
- If there is an odd number of values, pick the middle number
- If there is an even number of values, calculate the average of the 2 middle numbers
*The median is less affected by outliers than the mean
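A minimal sketch of the median rule: sort the data, then take the middle value when the count is odd, or the average of the two middle values when the count is even. Both data sets appear elsewhere in these notes. The outlier 100 is hypothetical.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of middle pair

odd_count = [25, 42, 50, 58, 75]
even_count = [8, 4, 5, 7]

print(median(odd_count))           # middle of 25, 42, 50, 58, 75
print(median(even_count))          # average of 5 and 7 after sorting

# Hypothetical outlier (100), added for illustration only
print(median(even_count + [100]))  # moves only from 6 to 7
```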

Mode
- Most frequently occurring value
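The mode can be found by counting how often each value occurs. This sketch returns a list, on the assumption that ties should all be reported, since a data set can have more than one mode. The data are hypothetical, invented for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

# Hypothetical data, invented for illustration only
one_mode = [2, 3, 3, 5, 7, 3, 5]
two_modes = [1, 1, 2, 2, 4]

print(modes(one_mode))    # 3 occurs three times
print(modes(two_modes))   # 1 and 2 both occur twice
```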

Symmetrical distribution - mean and median are the same (or very close)

Skewed distribution - mean and median are different, because the mean is pulled towards the tail
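The comparison of mean and median can be sketched with Python's standard `statistics` module. The symmetric data set is the one used in the standard deviation example in these notes. The right-skewed version is hypothetical: its largest value has been replaced by an extreme one for illustration.

```python
from statistics import mean, median

symmetric = [25, 42, 50, 58, 75]
# Hypothetical: largest value replaced by an extreme value to create a right tail
right_skewed = [25, 42, 50, 58, 200]

print(mean(symmetric), median(symmetric))        # mean and median agree
print(mean(right_skewed), median(right_skewed))  # mean is pulled towards the tail
```

In the right-skewed case the median stays at 50 while the mean rises to 75.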

Measures of dispersion/spread
- Variability, spread or dispersion describes how far apart the values lie around the measure of central tendency
- Range (min, max)
- IQR (25th and 75th percentiles)
- SD (measure of variability around the mean)

Range
- Difference between the highest and lowest value
- Disadvantage: the range uses only the two most extreme values, so it is not very representative of the data as a whole
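A short sketch of the range and of why it can be unrepresentative. The first data set is taken from the standard deviation example in these notes. The extra value 300 is a hypothetical outlier.

```python
def value_range(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

data = [25, 42, 50, 58, 75]
print(value_range(data))             # 75 - 25

# Hypothetical outlier (300), added for illustration only
print(value_range(data + [300]))     # 300 - 25: one value changes the range completely
```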

IQR

46 50 /2 = 48 Median Q1 = 3642 / 2 = 39 Q3 = 5458 / 2 = 56

Q1 = 46/2 = 5 Q3 = 1214/2 = 26/2 = 13 IQR = Q3 Q1 = 135 = 8 Standard Deviation Average Difference of all data points

FORMULA Standard Deviation
1. Find the sample mean.
2. Subtract the mean from each value in the data set.
3. Square each of the resulting numbers individually.
4. Sum up all the squared values.
5. Divide by (number of values - 1).
6. Take the square root of the result.

Dataset 25,42,50,58,75  Find the sample mean of data set. Mean of data set = Sum of data points/ Number of data points = 2542505875 / 5 = 250/5 = 50 2. Take each value and subtract it from the mean of the dataset. 255025 42508 50500 58508 755025 3. Square all the values. 25^2 = 625 8^264 0^2 = 0 8^264 25^2625 4. Sum up all squared values. 625 64064 + 625 = 1,378 5. Divide that summation by the ( total number of data points 1

= 1378 - 51 = 1378 4 = 1374 6. Find the square root of the entire value Ans : 37

RULE: For data with a symmetric, bell-shaped (normal) distribution:
- About 68% of the sample data fall within ± 1 SD of the mean
- About 95% of the sample data fall within ± 2 SD of the mean
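The 68% and 95% figures can be recovered from the normal distribution itself. The sketch below assumes the data follow an exactly normal distribution, which real samples only approximate. It uses `NormalDist` from Python's standard `statistics` module, and the helper name `share_within` is an assumption of this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(round(share_within(1) * 100))   # percentage within 1 SD
print(round(share_within(2) * 100))   # percentage within 2 SD
```

The values are rounded here, which is why the rule is usually stated as "about" 68% and 95%.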

Example: Mean weight loss = 10kg, Standard deviation = 2.5kg
- 1 S.D.: 10kg ± 2.5kg = 7.5kg and 12.5kg. We can state that about 68% of patients lost between 7.5kg and 12.5kg.
- 2 S.D.: 10kg ± (2 × 2.5kg) = 5kg and 15kg. We can state that about 95% of patients lost between 5kg and 15kg.

Example: Mean = 72kg, S.D. = 8kg. What is the interval of values around the mean that includes (a) 95% of the observed values, (b) 68% of the observed values?

(a) Remember 95% is ± 2 SD.
    72kg ± (2 × 8kg) = 72kg ± 16kg
    72 - 16 = 56kg
    72 + 16 = 88kg
    Interval: 56kg to 88kg
(b) Remember 68% is ± 1 SD.
    72kg ± 8kg
    72 - 8 = 64kg
    72 + 8 = 80kg
    Interval: 64kg to 80kg

- If the mean and standard deviation are presented, the data are (approximately) normally distributed
- If the median and interquartile range are presented, the data are skewed
- Left skew: extreme values on the left
- Right skew: extreme values on the right
- If there are no outliers and the distribution is normal (bell-shaped), use the mean
- When there are outliers and the data are skewed (asymmetrical), use the median

The results are skewed, as both the median and IQR were used...