Mid-Terms Crash Course Pt 3 PDF

Title Mid-Terms Crash Course Pt 3
Course Quantitative reasoning with data
Institution National University of Singapore
Pages 4




● Types of variables

Observing Data Macro-trends
● Data sets are often very large, with many data points
● We want to extract useful information from the data
● 2 ways we can observe data trends:
  ○ Data visualisations, such as:
    ■ Histograms, boxplots, bar graphs, etc.
  ○ Summary statistics:
    ■ Measures of central tendency: mean, median, mode
    ■ Measures of dispersion: standard deviation, interquartile range

Mean
● Mean
  ○ The average (mean) of a list of numbers equals their sum divided by how many numbers there are
● Properties of mean
  ○ Adding a constant c (positive or negative) to every data point changes the mean to mean + c
  ○ Multiplying every data point by a constant c changes the mean to mean × c
● What the mean can tell us
  ○ The data points cannot all lie strictly above the mean, nor can they all lie strictly below it: at least 1 data point is at or above the mean, and at least 1 is at or below it
● What the mean cannot tell us
  ○ The mean alone tells us nothing about how the data points are distributed

  ○ Eg: The distribution can be symmetrical about the mean, or it can be asymmetrical. The data can also form a cluster of very high values and a separate cluster of very low values, etc.
● Finding the mean from the means of subgroups
  ○ Take the weighted average of the subgroup means (each weighted by the size of its subgroup) to get the overall mean

Variance and Standard Deviation
● Standard deviation
  ○ A way of quantifying the "spread" of the data
  ○ Measures the typical size of the deviations of the data points from their average; it is a sort of average deviation (technically, the square root of the variance, which is built from the squared deviations from the mean)
  ○ The square of the standard deviation is the variance, and the square root of the variance is the standard deviation
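The subgroup-mean rule and the standard deviation definition above can be checked numerically. The Python sketch below is illustrative only: the helper names `overall_mean_from_subgroups` and `sample_sd` and the data are made up, and `sample_sd` assumes the sample convention of dividing the sum of squared deviations by n - 1 (dividing by n gives the population version; these notes do not say which one is used).

```python
import math
import statistics

def overall_mean_from_subgroups(groups):
    """Hypothetical helper: weighted average of subgroup means,
    each weighted by the size of its subgroup."""
    total_size = sum(len(g) for g in groups)
    return sum(len(g) * statistics.mean(g) for g in groups) / total_size

def sample_sd(data):
    """Hypothetical helper: square root of the variance, where the variance
    divides the sum of squared deviations from the mean by (n - 1)."""
    m = statistics.mean(data)
    variance = sum((x - m) ** 2 for x in data) / (len(data) - 1)
    return math.sqrt(variance)

group_a = [2, 4, 6]      # size 3, mean 4
group_b = [10, 20]       # size 2, mean 15
combined = group_a + group_b

# Weighted average: (3 * 4 + 2 * 15) / 5 = 42 / 5, the same as the mean of all 5 points
print(overall_mean_from_subgroups([group_a, group_b]))  # 8.4
print(statistics.mean(combined))                        # 8.4

# Averaging the two subgroup means without weights gives (4 + 15) / 2, which is wrong here
print((statistics.mean(group_a) + statistics.mean(group_b)) / 2)  # 9.5

data = [2, 4, 4, 4, 5, 5, 7, 9]
# Our helper agrees with the standard library's sample SD, and SD squared is the variance
print(math.isclose(sample_sd(data), statistics.stdev(data)))          # True
print(math.isclose(sample_sd(data) ** 2, statistics.variance(data)))  # True
```

The unweighted average only matches the overall mean when all subgroups have the same size.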

● Properties of standard deviation
  ○ Standard deviation is always non-negative
  ○ Adding a constant c (positive or negative) to every data point does not change the standard deviation
  ○ Multiplying every data point by a constant c changes the standard deviation to standard deviation × |c|, where |c| is the absolute value of c
● Coefficient of variation
  ○ = standard deviation / mean
  ○ Quantifies the spread relative to the mean; it is undefined when the mean is 0

Median
● Median
  ○ The middle value of a variable after arranging its values in ascending or descending order
  ○ If there are 2 middle values (an even number of data points), the median is the mean of the 2 middle values
● Properties of median

  ○ Adding a constant c (positive or negative) to every data point changes the median to median + c
  ○ Multiplying every data point by a constant c changes the median to median × c
● What the median can tell us
  ○ At least 50% of the data points are at or below the median, and at least 50% are at or above it
● What the median cannot tell us
  ○ The median alone tells us nothing about how the data points are distributed
  ○ Eg: The distribution can be symmetrical about the median, or it can be asymmetrical. For example, take the data set 0, 0, 0, 50, 51, 51, 51, whose values all lie between 0 and 100. The median is 50, but the points below it sit far away (at 0) while the points above it sit just above it (at 51), so the data are asymmetrical about the median.
● Finding the median from the medians of subgroups
  ○ In general, the overall median cannot be found from the subgroup medians alone
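To see why subgroup medians are not enough, the Python sketch below builds two made-up pairs of subgroups that share the same subgroup medians (2 and 5) but produce different overall medians. It also shows a made-up data set with median 50 whose points below the median lie much farther away than the points above it. All data here are hypothetical.

```python
import statistics

# Two pairs of subgroups with identical subgroup medians (2 and 5)
a1, b1 = [1, 2, 3], [4, 5, 6]
a2, b2 = [1, 2, 100], [4, 5, 6]

print(statistics.median(a1), statistics.median(b1))  # 2 5
print(statistics.median(a2), statistics.median(b2))  # 2 5

# ...yet the overall medians differ, so the subgroup medians cannot determine them
print(statistics.median(a1 + b1))  # 3.5 (sorted: 1 2 3 4 5 6)
print(statistics.median(a2 + b2))  # 4.5 (sorted: 1 2 4 5 6 100)

# Median 50, but the values below it (0) are far away and those above it (51) are close
skewed = [0, 0, 0, 50, 51, 51, 51]
print(statistics.median(skewed))  # 50
```

Contrast this with the mean, where the sizes and means of the subgroups are enough to recover the overall mean.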

Quartiles and Interquartile range
● First quartile
  ○ The value Q1 such that about 25% of the data points lie below Q1
● Third quartile
  ○ The value Q3 such that about 75% of the data points lie below Q3
● This is similar to the median, which has about 50% of the data points below it and 50% above it
● Interquartile range
  ○ IQR = Q3 - Q1
  ○ Tells us the spread of the middle 50% of the data
  ○ A small IQR implies that the middle 50% of the data has a small spread
  ○ A large IQR implies that the middle 50% of the data has a large spread
● Properties of IQR
  ○ Adding a constant c (positive or negative) to every data point does not change the IQR
  ○ Multiplying every data point by a constant c changes the IQR to IQR × |c|

Mode
● The mode is the value that appears most frequently in the data
● Properties of mode
  ○ Adding a constant c (positive or negative) to every data point changes the mode to mode + c
  ○ Multiplying every data point by a constant c changes the mode to mode × c
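The shift and scale properties of the mean, standard deviation, median, IQR and mode can all be checked together on one small made-up data set. This Python sketch uses the standard-library `statistics` module; the helper `summary` is hypothetical. Quartile conventions differ between textbooks and software, so `statistics.quantiles(..., method="inclusive")` is an assumed choice that may not match the course's convention. `statistics.mode` returns the first mode it meets if several values tie (Python 3.8+).

```python
import math
import statistics

def summary(data):
    """Hypothetical helper: mean, sample SD, median, IQR and mode.
    Quartiles use the 'inclusive' method of statistics.quantiles (an assumption)."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
        "median": statistics.median(data),
        "iqr": q3 - q1,
        "mode": statistics.mode(data),
    }

data = [3, 5, 5, 8, 9, 12, 20]
c = -2  # a negative constant shows why SD and IQR scale by |c|

base = summary(data)
shifted = summary([x + c for x in data])
scaled = summary([x * c for x in data])

# Adding c shifts the mean, median and mode by c; SD and IQR are unchanged
for key in ("mean", "median", "mode"):
    assert math.isclose(shifted[key], base[key] + c)
for key in ("sd", "iqr"):
    assert math.isclose(shifted[key], base[key])

# Multiplying by c scales the mean, median and mode by c; SD and IQR by |c|
for key in ("mean", "median", "mode"):
    assert math.isclose(scaled[key], base[key] * c)
for key in ("sd", "iqr"):
    assert math.isclose(scaled[key], base[key] * abs(c))

# Coefficient of variation of the original data (spread relative to the mean)
print("coefficient of variation:", base["sd"] / base["mean"])
```

If any property failed, the corresponding `assert` would raise an `AssertionError`; running the script silently past them means all the listed properties hold for this data set.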

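● If a distribution is symmetrical and has a single peak, then mean = median = mode; the converse does not hold, since mean = median = mode does not by itself guarantee symmetry
  ○ Eg: the bell curve, or normal distribution, has mean = median = mode...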
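A quick check of the relationship between symmetry and mean = median = mode, using two made-up data sets in Python: one symmetric and single-peaked, and one that is not symmetric yet still has equal mean, median and mode. The helper `centre_measures` is hypothetical.

```python
import statistics

def centre_measures(data):
    """Hypothetical helper: (mean, median, mode) of a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # symmetric about 3, single peak at 3
asymmetric = [-3, -1, 0, 0, 0, 2, 2]     # NOT symmetric about 0

print(centre_measures(symmetric))   # mean = median = mode = 3
print(centre_measures(asymmetric))  # mean = median = mode = 0, despite the asymmetry

# Reflecting the asymmetric data about 0 gives a different data set, confirming the asymmetry
print(sorted(-x for x in asymmetric) == sorted(asymmetric))  # False
```

So equal mean, median and mode are a consequence of symmetry (with a single peak), not evidence of it.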

