Ch.3 PPDAC (Summary)

Author: Sarah Kim
Course: Statistics
Institution: University of Waterloo





• The elements of the Problem address questions starting with "What":
  o What group of things or people do we want to draw conclusions about (target population)?
  o What variates need to be defined?
  o What question(s) are we trying to answer?
  o What conclusions are we trying to draw?
• Types of problems:
  o Descriptive: The problem is to determine a particular attribute of a population.
    § Much of the work of official statistical agencies such as Statistics Canada involves problems of this type (EX. the government needs to know the national unemployment rate and whether it has increased or decreased over the past month).
  o Causative: The problem is to determine the existence or non-existence of a causal relationship between two variates.
    § In this type of problem, the experimenter is interested in whether one variate X tends to cause an increase or a decrease in another variate Y.
    § EX. Does taking a low dose of aspirin reduce the risk of heart disease among men over the age of 50? / Does changing from assignments to multiple term tests improve student learning in STAT 231?
  o Predictive: The problem is to predict the response of a variate for a given unit.
    § This is often the case in finance or economics.
    § EX. Financial institutions need to predict the price of a stock or interest rates in a week or a month because this affects the value of their investments.
• Defining the Problem

  o The first step in describing the Problem is to define the units and the target population or target process.
  o The target population or process is the collection of units to which the conclusions will apply.
  o A variate is a characteristic associated with each unit.
  o An attribute is a function of the variates over a population.
  o EX) Consider a survey of teenagers in Ontario in a specific week to learn about their marijuana smoking behavior.
    § Units: Teenagers in Ontario at the time of the survey
    § Target population: All teenagers in ON
    § Attribute: Proportion of teenagers in the target population who smoke marijuana
    § Possible question of interest: What proportion of teenagers in ON smoke marijuana?
  o EX. Two machines that fill cans of soft drink (pop, soda, etc.) were being compared at a factory. A sample of cans filled by an old machine and by a new machine was taken, and the volumes of soft drink in the cans were compared.
    § Units: Individual cans
    § Target process: All such cans filled now and in the future under current operating conditions
    § Attributes of interest: The average (mean) volume and the variability (variance or standard deviation) of the volumes for all cans filled by each machine under current conditions, now and into the future (the target process)
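To make "an attribute is a function of the variates" concrete for the two examples above, here is a minimal Python sketch. The variate values are hypothetical stand-ins (a yes/no smoking variate per teenager, a fill volume in mL per can), and the function names are illustrative rather than anything defined in the course. Applied to every unit in a population these functions would give the attribute itself; applied to a sample, as in the soft drink comparison, they give estimates of the target-process attributes.

```python
from statistics import mean, stdev

def proportion(flags):
    """Attribute for a yes/no variate: fraction of units with the value True."""
    return sum(flags) / len(flags)

def mean_and_sd(volumes):
    """Average and sample standard deviation (divisor n - 1) of a numeric variate."""
    return mean(volumes), stdev(volumes)

# Hypothetical variate values, for illustration only.
smokes = [True, False, False, True, False]   # one entry per surveyed teenager
old_machine = [354.0, 356.0, 352.0, 355.0]    # volume (mL) of each sampled can
new_machine = [355.0, 355.0, 354.0, 356.0]

print("Proportion who smoke:", proportion(smokes))
print("Old machine (mean, sd):", mean_and_sd(old_machine))
print("New machine (mean, sd):", mean_and_sd(new_machine))
```

With these made-up numbers both machines have similar average volumes, but the old machine's volumes are more spread out, which is exactly the kind of difference the "variability" attribute is meant to capture.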


• The purpose of the Plan step is to decide what units we will examine (the sample), what data we will collect, and how we will do so.
• The Plan depends on the questions posed in the Problem step.
• The study population or study process is the collection of units available to be included in the study.
  o Often the study population is a subset of the target population.
    § In many surveys the study population is a list of people defined by their telephone numbers.
  o In some cases, the study units are not part of the target population.
    § In many medical applications, the study population consists of laboratory animals whereas the target population consists of humans → in this case, the units in the study population are laboratory animals and the units in the target population are humans.
  o The study population is often not identical to the target population.
  o Study error: when the attributes of the study population differ from the attributes of the target population, the difference is called study error.
• Sampling protocol: the procedure used to select a sample of units from the study population.
• Sample size: the number of units sampled.
  o Sample size is usually a compromise between cost, availability, and the desired precision as calculated using a model.
• If the attributes of the sample differ from the attributes of the study population, the difference is called sample error or sampling error.
  o Different sampling protocols are likely to produce different sample errors.
  o Since we do not know the values of the study population attributes, we cannot know the sampling error.
• Measurement error
  o When the value of a variate is determined for a given unit, errors are often introduced by the measurement system that determines the value.
  o If the measured value and the true value of a variate are not identical, the difference is called measurement error.
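Sampling error can only be computed when the study population attribute is known, so one way to see it (and the claim that different sampling protocols produce different sample errors) is a small simulation in which we invent the study population ourselves. This minimal Python sketch assumes a made-up study population of 1,000 units with a 0/1 variate. The two protocols, simple random sampling and "take the first n units on the list", are illustrative choices and are not prescribed in these notes.

```python
import random

def sampling_error(population, sample):
    """Sample proportion minus study population proportion for a 0/1 variate."""
    return sum(sample) / len(sample) - sum(population) / len(population)

# Hypothetical study population of 1,000 units in list order. The variate
# (1 = has the characteristic) is concentrated at the end of the list, so the
# true study population proportion is 400 / 1000 = 0.4.
population = [0] * 600 + [1] * 400

rng = random.Random(231)         # fixed seed so the run is reproducible
n = 100                          # sample size

srs = rng.sample(population, n)  # protocol 1: simple random sample
first_n = population[:n]         # protocol 2: take the first n units listed

# Only possible because we invented the population: in a real study the
# study population attribute is unknown, so the sampling error is unknown too.
print("SRS sampling error:    ", sampling_error(population, srs))
print("First-n sampling error:", sampling_error(population, first_n))
```

The "first n" protocol picks up only units from the start of the list, so its sample proportion is 0 and its sampling error is -0.4. The random sample's error is typically much smaller, although it varies from sample to sample.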





• For an empirical study, the Plan should indicate:
  1) The study population
  2) The sampling protocol
  3) The variates which are to be measured
  4) The quality of the measurement systems that are intended for use
  o Attention must be paid to the various types of error that may occur and how they might affect the conclusions.
• EX) The math faculty were interested in learning about how all UW students take notes in lecture.
  o A: laptops and phones, B: handwritten notes, C: a combination of both, D: do not take any notes, E: others
  o They decided to use students in Section 001 of STAT 231 to investigate this question.
  o The prof used a clicker question to gather data from students who attended a particular class.
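The four items a Plan should indicate can be kept together in one record, along with the target population so that study error can be noted. The minimal Python sketch below uses a hypothetical `StudyPlan` dataclass (not a course or library API). It is filled in with one reading of the clicker example, and the error-source entries are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class StudyPlan:
    """Hypothetical record of what an empirical Plan should indicate."""
    target_population: str
    study_population: str
    sampling_protocol: str
    variates: list[str]
    measurement_system: str
    known_error_sources: list[str] = field(default_factory=list)

clicker_plan = StudyPlan(
    target_population="All UW students",
    study_population="Students in Section 001 of STAT 231",
    sampling_protocol="Students who attended one particular class and answered the clicker question",
    variates=["note-taking method (A-E)"],
    measurement_system="Self-reported clicker response",
    known_error_sources=[
        "Study error: one STAT 231 section may differ from all UW students",
        "Sampling error: students attending that class may differ from the whole section",
        "Measurement error: a student may click an option that does not match what they do",
    ],
)

# Print each part of the Plan on its own line.
for name, value in vars(clicker_plan).items():
    print(f"{name}: {value}")
```

Writing the Plan down this explicitly makes the gaps visible. For example, the target population (all UW students) and the study population (one section of one course) are clearly different.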


• Object of the Data step: to collect the data according to the Plan.
  o Any deviations from the Plan should be noted.
  o The data must be stored in a way that facilitates the Analysis.
• Mistakes can occur in recording or entering data into a database. For complex investigations, it is useful to put checks in place to avoid these mistakes.
• In many studies the units must be tracked and measured over a long period of time (e.g. consider a study examining the ability of aspirin to reduce strokes in which persons are followed for 3 to 5 years). This requires careful management.
• When data are recorded over time or in different locations, the time and place of each measurement should be recorded.
• Departures from the Plan should be recorded, since they may have an important impact on the Analysis and Conclusion.
• In some studies, the amount of data may be extremely large, so database design and management are important.
• Missing Data and Response Bias
  o Missing data, whether due to missing cases or to skipped items, can pose problems for both choosing a statistical analysis and interpreting the results.
  o Bigger issues: Why are there missing data? Do these missing data represent a bias in the results?
  o Response Bias: respondents may not answer survey questions truthfully or accurately.
    § Illegal or unpopular behavior such as drug use
    § Controversial topics
    § The race or gender of the interviewer can influence answers to race- or gender-related questions
    § Respondents often have trouble remembering past events
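The advice to put checks in place, to flag missing data, and to record the time and place of each measurement can be sketched as a small validation step run on each record as it is entered. This Python sketch assumes each record is a dict with hypothetical fields `unit_id`, `response`, `recorded_at` and `location`, and that valid responses are the clicker options A to E from the note-taking example. The field names and records are illustrative only.

```python
from datetime import datetime

VALID_RESPONSES = {"A", "B", "C", "D", "E"}

def check_record(record):
    """Return a list of problems found in one record (empty list if none)."""
    problems = []
    if record.get("response") is None:
        problems.append("missing response")
    elif record["response"] not in VALID_RESPONSES:
        problems.append(f"invalid response {record['response']!r}")
    if record.get("recorded_at") is None:
        problems.append("missing time of measurement")
    if not record.get("location"):
        problems.append("missing place of measurement")
    return problems

# Hypothetical records, for illustration only.
records = [
    {"unit_id": 1, "response": "B", "recorded_at": datetime(2020, 1, 15, 10, 5), "location": "lecture hall"},
    {"unit_id": 2, "response": None, "recorded_at": datetime(2020, 1, 15, 10, 5), "location": "lecture hall"},
    {"unit_id": 3, "response": "F", "recorded_at": None, "location": ""},
]

for r in records:
    print(r["unit_id"], check_record(r) or "ok")
```

A check like this only catches values that are impossible or absent. It cannot detect response bias, where a respondent chooses a valid option that is not true.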


• We discussed different methods of summarizing the data using numerical and graphical summaries.
• A key step in formal analyses is the selection of an appropriate model that can describe the data and how they were collected.
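As a reminder of the numerical summaries mentioned here, this is a minimal Python sketch using only the standard-library `statistics` module. The data are hypothetical can volumes. Quartiles come from `statistics.quantiles` with its default "exclusive" method, which is one of several conventions, so quartiles computed by hand in the course may differ slightly.

```python
import statistics

def numerical_summary(values):
    """Common numerical summaries of a numeric variate."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (divisor n - 1)
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }

# Hypothetical fill volumes (mL), for illustration only.
volumes = [352.0, 354.0, 355.0, 355.0, 356.0, 357.0, 358.0]
for name, value in numerical_summary(volumes).items():
    print(f"{name}: {value}")
```

Summaries like these are usually the first look at the data, before a model (for example, a Gaussian model for the volumes) is chosen for the formal analysis.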


• The purpose of the Conclusion step is to answer the questions posed in the Problem.
• The limitations of the study must also be described...

