STAT 103 - Lecture notes 1 PDF

Title STAT 103 - Lecture notes 1
Author Julia Hyman
Course Fundmntl Statistics
Institution Loyola University Chicago

Summary

Notes on probability lecture for Professor Bret Longman...


Description

STAT 103

How statistics are misused
All of these statements are based on studies or experiments, which yield data.
● “Dewey Defeats Truman”: the Chicago Daily Tribune reported that Dewey had won the 1948 presidential election, even though Truman won
  ○ Happened because of a biased polling sample (not reflective of the whole country)
● “Classical music makes babies smarter”
  ○ Study was actually done on college students, using a test of spatial relationships
  ○ Only worked once
● “Kids like to mutilate Barbie”
  ○ Study of 100 8-year-old British kids
  ○ Asked them what they THINK about doing to their toys
● “Non-smokers exposed to secondhand smoke are 25% more likely to get lung cancer than those not exposed”

Data:
● Numbers or other concepts in a specific context
● Categorical (qualitative) data
  ○ Data within categories (color, race, gender)
  ○ Calculate the percent within a certain category
● Quantitative data
  ○ Numerical data (weight, height, etc.)
  ○ Calculate averages, standard deviations

Statistics:
● The collection and analysis of a sample of data for predicting the behavior of a population
● Population: the group you’re studying
  ○ Take a sample: a subset of the population used to represent the entire population
    ■ Ex: all Loyola students → take random sample of 182 students
    ■ Ex: pine trees in North Dakota → take random sample of 98 trees
    ■ Ex: 10-year-old children in Pakistan → sample of 212 children
  ○ Goal is to estimate a population parameter (a characteristic of the population)
  ○ Population parameter is estimated by a sample statistic
● Loyola students example: how many hours are they sleeping → 182 students averaged 6.2 hours per night
● Pine trees: % of all trees that have the disease → 61% of the 98 trees have the disease
● Kids: average height of the 10-year-olds → 212 kids averaged 4 ft tall

IMPORTANT! This sentence will be on the exam:
● Population parameters are estimated by sample statistics

Types of stats: (semester outline)

● Descriptive statistics
● Probability
  ○ Print the probability review for class next Tuesday
● Normal distribution
● Confidence intervals & hypothesis testing for the mean
● Confidence intervals & hypothesis testing for proportion

Controlled experiment
● Assigning treatments to experimental units in a controlled environment
  ○ We eliminate confounding factors (factors outside of what you’re trying to research that affect your results)
  ○ This is the advantage of a controlled experiment
● Often we blind the researchers and/or the experimental units to the treatments
● Disadvantage: due to ethical concerns, there are certain things you can’t do experiments on (dangerous things)

Observational studies
● We observe behavior to predict the future
● Experimental units choose which group or treatment they’re in
● Disadvantage: must account for confounding factors
● Advantage: we can study pretty much anything

Descriptive Statistics
● Random variable (Y): a variable whose outcome depends on a chance operation
  ○ Chance operation: flipping a coin
  ○ Random variable: Y = result (heads or tails)
  ○ Ex: random sample of STAT 103 students → Y = height of a STAT 103 student
● Define random variable Y
  ○ Sample size = n
  ○ y1, y2, y3, …, yn = observed values of Y
  ○ yi = ith observed value of Y, for i = 1, 2, 3, …, n
● Measures of center
  ○ Sample mean: average of a sample = sum of the observations divided by the sample size
    ■ Denoted ȳ (y-bar): ȳ = (y1 + y2 + … + yn) / n
  ○ Ex: Y = stem length of tulips
    ■ 76, 72, 65, 70, 82
    ■ n = 5 (sample size)
    ■ ȳ = (76 + 72 + 65 + 70 + 82) / 5 = 73 mm

  ○ Sample median: the value at the center of the data after arranging it in ascending order
    ■ Denoted m
    ■ If n is odd, the median is the middle number
    ■ If n is even, the median is the average of the middle 2 numbers
● Mean vs. median
  ○ Mean has better mathematical qualities
    ■ We use the mean in other analyses
  ○ The median is less affected by unusually large or small values, i.e. the median is more robust/resistant than the mean
● Ex: Trump says “my tax plan will save taxpayers an average of $1,000 next year”
  ○ n = 100; y1 = $1; y2 = $1; …; y99 = $1; y100 = $99,901
  ○ ȳ = $1,000
    ■ But the median is $1

3 data sets
1) 4 5 5 5 6; mean and median both = 5
2) 0 1 5 9 10; mean and median both = 5
3) -100 0 5 50 70; mean and median both = 5

Measures of dispersion (spread)
● Range: maximum observation minus the minimum observation
● Sample standard deviation: the typical distance between an observation and the mean

  ○ Squared deviations: (yi - ȳ)² for each observation, e.g. (y1 - ȳ)²
  ○ Sum of squared deviations
  ○ Divide by n-1...
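
The standard deviation steps above can be sketched in Python and tried on the 3 data sets from these notes. This is only an illustrative sketch: the function names `sample_sd` and `value_range` are made up for this example, and the final square root is the usual last step of the sample standard deviation, which the truncated notes do not show.

```python
import math

def value_range(values):
    """Range: maximum observation minus the minimum observation."""
    return max(values) - min(values)

def sample_sd(values):
    """Sample standard deviation, built up step by step."""
    n = len(values)
    y_bar = sum(values) / n                             # sample mean
    squared_devs = [(y - y_bar) ** 2 for y in values]   # squared deviations
    sum_sq = sum(squared_devs)                          # sum of squared deviations
    variance = sum_sq / (n - 1)                         # divide by n - 1
    return math.sqrt(variance)                          # assumed last step: square root

# The 3 data sets from the notes: same mean and median (5), different spread.
data_sets = {
    "set 1": [4, 5, 5, 5, 6],
    "set 2": [0, 1, 5, 9, 10],
    "set 3": [-100, 0, 5, 50, 70],
}

for name, ys in data_sets.items():
    print(name, "range =", value_range(ys), "sd =", round(sample_sd(ys), 2))
```

All three sets have the same center, but the range grows from 2 to 10 to 170 and the standard deviation grows with it, which is why a measure of center alone does not describe a data set.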

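
Going back to the measures of center, the tulip example and the tax-plan example can be checked with a short Python sketch. The helper names `sample_mean` and `sample_median` are hypothetical and written out by hand to mirror the definitions in the notes; Python's built-in `statistics` module has `mean` and `median` functions that could be used instead.

```python
def sample_mean(values):
    """Sum of the observations divided by the sample size."""
    return sum(values) / len(values)

def sample_median(values):
    """Middle value after sorting; average of the middle 2 if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Tulip stem lengths (mm) from the notes.
tulips = [76, 72, 65, 70, 82]
print("tulip mean:", sample_mean(tulips))
print("tulip median:", sample_median(tulips))

# Tax example from the notes: 99 people save $1, one person saves $99,901.
savings = [1] * 99 + [99_901]
print("tax mean:", sample_mean(savings))
print("tax median:", sample_median(savings))
```

The single very large value pulls the mean of the tax data up to $1,000 while the median stays at $1, which is what "the median is more robust than the mean" refers to.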

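
The exam sentence, "population parameters are estimated by sample statistics", can also be illustrated with a small simulation. Everything in this sketch is made up for illustration: the population is 10,000 simulated sleep times, not real Loyola data, and the numbers 6.5 and 1.2 are arbitrary assumptions. Only the sample size of 182 is borrowed from the notes.

```python
import random
import statistics

rng = random.Random(103)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: nightly sleep hours for 10,000 simulated students.
population = [rng.gauss(6.5, 1.2) for _ in range(10_000)]

# Population parameter: the mean of the whole population.
# In a real study this is unknown, because we cannot measure everyone.
parameter = statistics.mean(population)

# Take a random sample of 182 students, as in the Loyola example.
sample = rng.sample(population, 182)

# Sample statistic: the mean of the sample, used to estimate the parameter.
statistic = statistics.mean(sample)

print("population mean (parameter):", round(parameter, 2))
print("sample mean (statistic):", round(statistic, 2))
```

The sample mean will usually land close to the population mean but not exactly on it, and a different random sample would give a slightly different estimate.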