Stats 250 Completed Notes PDF

Title Stats 250 Completed Notes
Author Han Na Shin
Course Introduction to Statistics and Data Analysis
Institution University of Michigan
Pages 212

Summary

All solutions and notes for 250...


Description



Statistics250 

Interactive LectureNotes Solutions 

Fall2015–Winter2016        Dr.BrendaGunderson DepartmentofStatistics UniversityofMichigan 









Stat250GundersonLectureNotes Introduction 



Statistics...the most important science in the whole world: for upon it depends the practicalapplicationofeveryotherscienceandofeveryart:theonescienceessentialto allpoliticalandsocialadministration,alleducation,allorganizationbasedonexperience, foritonlygivesresultsofourexperience."FlorenceNightingale,Statistician   

Definitions:  Statisticsarenumbersmeasuredforsomepurpose.  Statisticsisacollectionofproceduresandprinciplesforgatheringdataand analyzinginformationinordertohelppeoplemakedecisionswhenfacedwith uncertainty.  

 CourseGoal:Learnvarioustoolsforusingdatatogainunderstandingandmakesounddecisions abouttheworldaroundus.  











Stat250GundersonLectureNotes 1:SummarizingData 

“Youmustnevertellathing.Youmustillustrateit.Welearnthrough theeyeandnotthenoggin." ‐‐WillRogers(1879‐1935)  

 “Simplesummariesofdatacantellaninterestingstoryandareeasiertodigestthanlonglists.” Sowewillbeginbylookingatsomedata.   

RawData 

Rawdatacorrespondtonumbersandcategorylabelsthathavebeencollectedormeasuredbut havenotyetbeen processed inanyway.Onthe nextpage isasetofRAWDATA‐information aboutagroupofitemsinthiscase,individuals.ThedatasettitleisDEPRIVEDandhasinformation about a sample size of n = 86 college students. For each student we are provided with their answertothequestion:“Doyoufeelthat youare sleepdeprived?”(yesor no),andtheirself reportedtypicalamountofsleeppernight(inhours).Theinformationwehaveisorganizedinto variables.  In this case these 86 college students are a subset from a larger population of all collegestudents,sowehavesampledata.   

Definition: Avariableisacharacteristicthatdiffersfromoneindividualtothenext.   Sampledataarecollectedfromasubsetofalargerpopulation.  Populationdataarecollectedwhenallindividualsinapopulationaremeasured.  Astatisticisasummarymeasureofsampledata.  Aparameterisasummarymeasureofpopulationdata. 

 TypesofVariables 

Wehave2variablesinourdataset.Nextwewanttodistinguishbetweenthedifferenttypesof variables‐differenttypesofvariablesprovidedifferentkinds ofinformationandthetypewill guidewhatkindsofsummaries(graphs/numerical)areappropriate. 




Thinkaboutit:  Couldyoucomputethe“AVERAGEAMOUNTOFSLEEP”forthese86students?YES  Couldyoucomputethe“AVERAGESLEEPDEPRIVEDSTATUS”forthese86students?NO (couldcode,butwouldbearbitrary:0and1,orcoulduseanytwovalueslike1and203) 





SLEEPDEPRIVEDSTATUSissaidtobeaCATEGORICALvariable, 



AMOUNTOFSLEEPisaQUANTITATIVE_variable.    



Definitions: A categorical variable places an individual or item into one of several groups or categories.  When thecategories have an ordering or ranking, it is called an ordinal variable. 

Aquantitativevariabletakesnumericalvaluesforwhicharithmeticoperationssuchas adding and averaging make sense. Other names for quantitative variable are: measurementvariableandnumericalvariable.

 TryIt!– Foreachvariablelistedbelow,giveitstypeascategoricalorquantitative. 



 Age(years) 

QUANTITATIVE



 TypicalClassroomSeatLocation(Front,Middle,Back) CATEGORICAL 



 NumberofsongsonaniPodQUANTITATIVE 



 Timespentstudyingmaterialforthisclassinthelast24‐hourperiod(inhours)QUANTITATIVE 



 SoftDrinkSize(small,medium,large,super‐sized) 

CATEGORICAL(ordinal)



 The“Andthen...”countrecordedinapsychologystudyonchildren(detailswillbeprovided) numberoftimesachildsaysthephrase“andthen”whenaskedtorecallastudythey justheard QUANTITATIVE

  Lookingahead:Later,whenwetalkaboutrandomvariables,wewilldiscusswhether avariableismodeleddiscretely(becauseitsvaluesarecountable)orwhetheritwould bemodeledcontinuously(becauseitcan takeanyvalue inan intervalorcollection of intervals).Gobackthroughthelistaboveandthinkaboutisitdiscreteorcontinuous? 










DATASET=DEPRIVED From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.





 FeelSleep Deprived? No No No Yes Yes Yes Yes Yes No No No No Yes Yes Yes No No No Yes Yes No No No Yes No No No No No Yes Yes Yes No Yes No Yes Yes Yes Yes Yes Yes No Yes

AmountSleep perNight(hours) 9 7 8 7 7 8 7 8 10 8 9 8 8 4 6 8 10 4 7 8 9 9 7 8 9 9 8 6 9 7 11 7 9 7 8 7 7 9 1 7 6 8 6



FeelSleep Deprived? No No No Yes Yes Yes Yes Yes No Yes No No Yes Yes No No Yes Yes Yes No Yes Yes No Yes Yes Yes Yes Yes Yes Yes No No Yes Yes Yes Yes Yes Yes No No Yes Yes Yes





5

AmountSleep perNight(hours) 8 7 9 7 7 7 7 6 8 6 9 8 7 8 8 8 7 7 7 7 7 8 7 7 7 7 8 6 6 8 9 7 8 6 7 8 5 6 7 8 8 7 6

Ourdatasetissomewhatlarge,containingalotofmeasurements inalonglist. Presentedasa tablelisting,wecanviewtherecordofaparticularcollegestudent,butitisjustalisting,andnot easytofindthelargest valuefor the amountofsleep or thenumberofstudents whofeltthey aresleepdeprived.Wewouldliketolearnappropriatewaystosummarizethedata. 

SummarizingCategoricalVariables 



NumericalSummaries How would you go about summarizing the SLEEP DEPRIVED STATUS data? The first step is to simplycounthowmanyindividuals/items fallintoeach category.Sincepercentsare generally more meaningful than counts, the second step is to calculate the percent (or proportion) of individuals/itemsthatfallintoeachcategory. 

Note: Count = Frequency. Percent or proportion is OK!

Sleep Deprived?   Count   Percent
Yes               51      (51/86)*100 = 59.3%
No                35      (35/86)*100 = 40.7%
Total             86      100%

The table above provides both the frequency distribution and the relative frequency distributionforthevariableSLEEPDEPRIVEDSTATUS.

 VisualSummaries There are two simple visual summaries for categoricaldata–abargraphorapiechart.Hereis thetablesummaryandbargraphmadewithR. Ifyouweremakingone:Don’tforgettolabeleach axisandshowsomevaluesoneachaxis! 

c oun t s : De p r i v e d No Yes 35 51 per c ent ages : De p r i v e d No Yes 40. 7 59. 3 

Aside:Doesitmatterwhether the‘No’or‘Yes’barisgivenfirst?No,notordinalhere=> weshouldnotcommenton‘shape’(i.e.donotusewordslike“skewed”or“increasing pattern”here)




PieChart:Anothergraphforcategoricaldatawhichhelpsussee whatpartofthewholeeach groupforms.  Piechartsarenotaseasytodrawbyhand. Itisnotaseasytocomparesizesofpie piecesversuscomparingheightsofbars.  Thuswewillprefertouseabargraphfor categoricaldata.   Recap: We have discussed that some variables are categorical and others are quantitative.Wehaveseen thatbargraphs and pie charts can be used to display data for categorical variables.  We turn next to displaying the data for quantitative variables.      

ExploringFeaturesofQuantitativeDatawithPictures  RecallourSleepDeprivedDataforn=86collegestudents.Wehavedataontwovariables:sleep deprivedstatusand hoursofsleeppernight. Howwouldyougoaboutsummarizingthesleep hoursdata?Thesemeasurementsdovary.Howdotheyvary?Whatistherangeofvalues?What isthepatternofvariation?

     



Findthesmallestvalue=____1______andlargestvalue=_____11_______  Takethisoverallrangeandbreakitupintointervals(ofequalwidth). Whatmightbereasonablehere? Perhapsby2’s;butweneedtowatchtheendpoints. 




SummaryTable:  

Class

Frequency (orcount)

RelativeFrequency (orproportion)

Percent

[0,2]

1

1/86=0.012

1.2%

(2,4]

2

2/86=0.023

2.3%

(4,6]

12

0.139

13.9%

(6,8]

56

0.651

65.1%

(8,10]

14

0.163

16.3%

(10,12]

1

0.012

1.2%







watch endpoints‐‐ different softwarewill dodifferent endpoints W/tablewe canreadily drawa histogram



  Graphforquantitativedata=Histogram: 





Note:ifwedivide ‘count’by86, wewouldhave proportionbut ‘picture’would looksame.

  















  Note:eachbarrepresentsaclass,andthebaseofthebarcoverstheclass.  TheabovetableandhistogramshowthedistributionofthisquantitativevariableSLEEPHOURS,  thatis,theoverallpatternofhowoftenthepossiblevaluesoccur. Remembertolabelaxesandaddsomevalues!




RHistograms(defaultontheleftandcustomizedontheright):

  

  Allimages






Howtointerpret?  LookforOverallPattern Threesummarycharacteristicsoftheoveralldistributionofthedata…  Shape(approximatelysymmetric,skewed,bell‐shaped,uniform)           





Location(center,average) Approximatelythemiddlevalueorwhereitwouldbalance

 

Spread(variability) Range(overallandthenwheremostoftheobservationsare)

 



LookfordeviationsfromOverallPattern Outliers=adatapointthatisnotconsistentwiththebulkofthedata. Outliersshouldnotbediscardedwithoutjustification. 

DescribethedistributionforSLEEPHOURS: Approximatelybell‐shaped,symmetricdistribution,unimodal, centeredaround7hours,withmostvaluesbetween4and10hours. Noapparentoutliers.

   

Whatif…youhadsomedataandyoumadeahistogramofitanditlookedlikethis… 

Count

 Whatwouldittellyou?

Response Wewouldcallthisabimodaldistribution.Thereappearstobetwosubgroupsofobservations.It wouldbebesttoinvestigatewhy–(e.g.maybeM/ForOLD/YOUNG).Itmayleadtoanalyzingdata separatelyforeachgroup.






Othercomments–  NOSPACEBETWEENBARS!Unlesstherearenoobservationsinthatinterval.  HowManyClasses?Useyourjudgment:generallysomewherebetween6and15 intervals.  Bettertouserelativefrequenciesontheyaxiswhencomparingtwoormoresetsof observations.  Softwarehasdefaultsandmanyoptionstomodifysettings.

 OneMoreExample: 

AstudywasconductedinDetroit,Michigantofind out the number of hours children aged 8 to 12 Howmany yearsspentwatchingtelevisiononatypicalday. areinfirst  class?3 Alistingofallhouseholdsinacertainhousingarea having children aged 8 to 12 years was first constructed.  Out of the 100 households in this listing, a random sample of 20 households was selectedandallchildrenaged8to12yearsinthe selectedhouseholdswereinterviewed.  Thefollowinghistogramwasobtainedforallthe childrenaged8to12yearsinterviewed.   a. Complete the sentence: Based on this histogram,thedistributionofnumberofhours spentwatchingTVisunimodal, 

withaslightskewnesstothe___left____. 

b. Assumingthatallchildreninterviewedare representedinthehistogram,whatisthetotalnumberofchildreninterviewed?

3+6+9+10+4=32 

c. Whatproportionofchildrenspentlessthan2hourswatchingtelevision?

(3+6)/32=0.281orabout28% 

d. Canyoudeterminethemaximumnumberofhoursspent watching television byoneofthe interviewedchildren?Ifso,reportit.Ifnot,explainwhynot.

No,itissomewherebetween4and5, buttheexactvalueisnotknownforsure.

  

 


NumericalSummariesofQuantitativeVariables 

Wehavediscussedsomeinterestingfeaturesofaquantitativedatasetandlearnedhowtolook fortheminpictures(graphs).Section2.5focusesonnumericalsummariesofthecenterandthe spreadofthedistribution(appropriateforquantitativedataonly). 

Notationforagenericrawsetofdata: x1,x2,x3,…,xnwheren=#itemsinthedatasetorsamplesize 

DescribingtheLocationorCenterofaDataSet Twobasicmeasuresoflocationorcenter: 



Mean -- the numerical average value
We represent the mean of a sample (called a statistic) by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$



Median -- the middle value when the data are arranged from smallest to largest.

n odd: M = middle observation; n even: M = average of the two middle observations

TryIt!FrenchFries Weightmeasurementsfor16smallordersofFrenchfries(ingrams). 78 72 69 81 63 67 65 75  79 74 71 83 71 79 80 69 

Whatshouldwedowithdatafirst?Graphit!

 Basedonourhistogram,the distributionofweightis unimodalandapproximately symmetric, socomputingnumericalsummariesisreasonable.Theweights(ingrams)rangefromthe60’sto thelower80’s,centeredaroundthelower70’s.




1. Computethemeanweight. 78  72  69    80  69 x Does73.6makesense?(yes–lookathistogram)Would83?(no) 16  73.5 grams 2. Computethemedianweight. Ordered:63,65,67,69,69,71,71,72,74,75,78,79,79,80,81,83

 

(n+1)/2=(16+1)=8.5soavg8thand9thobservations=>(72+74)/2=73 Note:½areabove73and½arebelowit.



3. Whatifthesmallestweightwasincorrectlyenteredas3gramsinsteadof63grams? Medianwouldstaythesame.Meanwoulddecrease. 

Note: Themeanis____sensitivetoextremeobservations.  Themedianis________resistanttoextremeobservations.  Mostgraphicaldisplayswouldhavedetectedsuchanoutlyingvalue.

Somedianbetter ifoutliersor stronglyskewed.



SomePictures:MeanversusMedian






DescribingSpread:RangeandInterquartileRange  Midtermsarereturnedandthe“average” wasreportedas76outof100. Youreceivedascoreof88. Howshouldyoufeel?Happytojustbeaboveaverage?  Oftenwhatismissingwhenthe“average”ofsomethingisreported,isacorrespondingmeasure of spread or variability. Here we discuss various measures of variation, each useful in some situations,eachwithsomelimitations. 



Range:

Measuresthespreadover100%ofthedata. Range=Highvalue–Lowvalue=Maximum–Minimum Percentiles: Thepthpercentileisthevaluesuchthatp%oftheobservationsfallatorbelow thatvalue. 



SomeCommonpercentiles: Median:  50thpercentileQ2or .50 th Firstquartile: 25 percentileQ1or .25(medianofvaluesbelowmedian) th Thirdquartile: 75 percentileQ3or .75(medianofvaluesabovemedian) 

FiveNumberSummary:



VariableNameandUnits



(n=numberofobservations)

Median Quartiles Extremes

M

Q1 Min

Q3 Max

Provides a quick overview of the data values and information about the center and spread. Dividesthedatasetintoapproximatequarters. 

InterquartileRange: Measuresthespreadoverthemiddle50%ofthedata. 

Tryit!FrenchFriesData Ordered:63,65,67,69,69,71,71,72,74,75,78,79,79,80,81,83 

Findthefive‐numbersummary:Providesmeasuresoflocationandspread



WeightofFries(ingrams)



(n=16orders)

Median



73



Quartiles

69



79

Extremes

63



83



Range:83–63=20grams   





IQR: 79–69=10grams

14

IQR=Q3–Q1

 AndconfirmingthesevaluesusingRwehave: 

> numSummar y( Fr enchFr i es[ , " Wei ght " ] , st at i st i cs=c( " mean" , " sd" , " I QR" , + " quant i l es" ) , quant i l es =c( 0, . 25, . 5, . 75, 1) )

me a n s d I QR 0 % 2 5 % 50 % 75 % 1 0 0 % n 73. 5 6. 0663 10 63 6 9 7 3 79 83 16 
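The hand method for quartiles (median of the values below the median, median of the values above it) can also be written out directly. The Python sketch below is an illustration under that definition only; `five_number_summary` is a hypothetical helper, not a function from R or from the notes. Software often interpolates quartiles differently, so results can disagree with this method on other data sets even though they agree here.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]          # values below the median position
    upper = xs[half + n % 2:]  # values above it (middle value skipped if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


weights = [78, 72, 69, 81, 63, 67, 65, 75,
           79, 74, 71, 83, 71, 79, 80, 69]

low, q1, m, q3, high = five_number_summary(weights)
print("five-number summary:", low, q1, m, q3, high)
print("range:", high - low)
print("IQR:", q3 - q1)
```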

Example:TestScores The five‐number summary for the distribution of test scores for a very large math class is providedbelow:



TestScore(points)



(n=1200students)

Median Quartiles Extremes

58 46 34

78 95

1. Whatisthetestscoreintervalcontainingthelowest¼ofthestudents?

34to46points 

2. Supposeyouscoreda46onthetest.Whatcanyousayaboutthepercentageofstudentswho scoredhigherthanyou?

75% 

3. Supposeyouscoreda50onthetest.Whatcanyousayaboutthepercentageofstudentswho scoredhigherthanyou?

Between50%and75% 

4. Ifthetop25%ofthestudentsreceivedanAonthetest,whatwasthe...

