Data handling report, general science PDF

Title Data handling report, general science
Author Ella Douthwaite
Course Science
Institution City College Norwich
Pages 10
File Size 613.5 KB
File Type PDF

Summary

Merit level report for general science on data handling...


Description

Data Handling and Statistics
Ella Douthwaite

Statistics is a form of mathematical analysis that uses quantified models to gather, evaluate and draw conclusions from a set of data. (Grant (2019) provided information on statistics.) Statistics are an important factor in making precise scientific discoveries supported by clear evidence. Statistical tests are used to test hypotheses and allow the reader to see clear relationships between different sets of results in different ways, for example in graphs. However, statistics can easily be made inaccurate, which is a drawback: often, if one part of the data is wrong, it will affect all of the data and make the whole set inaccurate and unusable. Statistics are important in medicine because they can provide supporting data for things such as new drugs; they can also show trends in certain medical practices and indicate whether they are working as well as once thought.

The pooled group data was produced by measuring the height and head circumference of each person on the course, using a metre ruler and a tape measure. The data was separated into male and female results and entered into an Excel sheet. Several different groups were used. Accuracy during this was important, as inaccurate data could lead to false information being used.

[Figure: Mean Head Circumference of Females. Mean head circumference (mm) against number of people.]
[Figure: Mean Average Female Heights. Mean height (mm) against number of people (n=5 to n=70).]
[Figure: Mean Average of Male Heights. Mean height (mm) against number of people.]
[Figure: Mean Average of Male Head Circumference. Mean head circumference (mm) against number of people.]

Histograms are a simple way of summarising data so that it is clear and quick to read. They are useful for people who do not have time to work out what a more complex graph is telling them. However, this simplicity comes at the cost of detail, so some information is usually lost.

A null hypothesis states that there is no difference or relationship between the two variables in the data; the null hypothesis for this data was that there would be no correlation between a person’s gender and their head circumference or height. The alternative hypothesis is that there is a relationship between a person’s gender and their height but not their head circumference.

Below is a normal distribution curve, which shows a bell shape following the bars in the graph. The two sides of the mean are equal, making the bell curve symmetrical.

[Figure: normal distribution curve.] (What is a normal distribution curve, Khan Academy, date unknown.)

Measures of central tendency show where the central point of the data lies. They are found by calculating the mean, median and mode of the data, which can be done using formulas in Excel: =AVERAGE gives the mean (average) of the data, =MEDIAN gives the middle value and =MODE gives the most frequently occurring number in the data. The formula is entered into a cell, the data is selected and Enter is pressed, and the result for the selected data is shown.

To find these values manually: the median is found by putting all the data in numerical order and finding the centre value (or the average of the two centre values when there is an even number of values); the mode is found by putting the numbers in order and finding the most frequent number in the data; and the mean is found by adding all the data together and dividing by the number of data points. For example, for the numbers (2, 5, 8, 12, 7, 9), adding them all together gives 2+5+8+12+7+9 = 43. This is then divided by the number of values (6): 43/6 = 7.2 (to one decimal place), which is the mean.

Seen below is an example of a graph made with the mean, median and mode values shown.

(Graph 1)

Skewness is a measure of asymmetry within a histogram. A positive skew is seen when the ‘tail’ of the graph is on the right side of the mean, showing that the mode is less than the median, whereas with a negative skew the ‘tail’ is on the left side, which shows that the mean is less than the median. The ‘tail’ end of skewed data can act as ‘an outlier for the statistical model and we know that outliers adversely affect the model’s performance especially regression-based models.’ (Sharma (2019) provided information on skewed data.) This particular graph has a positively skewed distribution, since the tail extends to the right (Graph 1).

A common logarithm (log10) transformation can be applied to the data to reveal patterns in it, making it usable for statistical models. The log10 of 10 is 1; in general, the log10 of y is the power x to which 10 must be raised to give y. Using transformations can reduce skewness in data; however, log10 cannot always do so. The graph below was transformed using log10, which changed the central tendency values and created a normal distribution curve, as seen in Graph 2.
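As a cross-check on the manual mean calculation above, the same values can be worked out with Python's standard `statistics` module. This is only a minimal sketch and not part of the original method (the report used Excel); the variable names are illustrative.

```python
import statistics

# Example numbers from the worked mean calculation in the text.
values = [2, 5, 8, 12, 7, 9]

mean_value = statistics.mean(values)      # sum of the values / number of values
median_value = statistics.median(values)  # centre of the sorted values
modes = statistics.multimode(values)      # every value tied for most frequent

print(f"mean   = {mean_value:.1f}")
print(f"median = {median_value}")
print(f"modes  = {modes}")
```

Because there is an even number of values, the median is the average of the two centre values (7 and 8). No value repeats in this small set, so every value is tied for the mode.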

(Graph 2)

[Figure: Frequency of Female Head Circumference. Histogram of frequency (0 to 25) against class interval, from 521-530 to 611-620.]
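The effect of a log10 transformation on skewed data can be illustrated with a short Python sketch. The values below are hypothetical and were chosen only because they are strongly right-skewed; they are not measurements from this report, and the `skewness` helper is an illustrative function rather than anything used in the original analysis.

```python
import math


def skewness(data):
    """Population skewness: mean cubed deviation divided by the SD cubed."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    third_moment = sum((x - mean) ** 3 for x in data) / n
    return third_moment / variance ** 1.5


# Hypothetical right-skewed values (assumption, not data from the report).
raw = [1, 10, 10, 100, 100, 100, 1000, 1000, 10000]

# log10(y) is the power x for which 10 ** x == y, e.g. log10(100) is 2.
transformed = [math.log10(y) for y in raw]

raw_skew = skewness(raw)
log_skew = skewness(transformed)

print(f"skewness before log10: {raw_skew:.2f}")
print(f"skewness after log10:  {log_skew:.2f}")
```

A positive result indicates a tail to the right, and a value near zero indicates a roughly symmetrical distribution. For this made-up data the transformed values are symmetrical, but real data will not always behave so neatly.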

Example of positively skewed data
[Figure: example of skewed data.] (Koehrsen (2019) provided skewed data graph.)

Standard deviation shows ‘how much the data deviates from the mean of the data set.’ (Bansal (date unknown) provided information on standard deviation.) A smaller value shows that the data points are closer to the mean, whereas a higher value shows that the data is spread out and there could be ‘outliers’ in the data set. In Excel, =STDEV.S is used; it ignores text and logical values. Note that =STDEV.S divides by n-1 (the sample standard deviation), whereas the manual method below divides by n (the population standard deviation, =STDEV.P in Excel). Standard deviation is calculated manually by finding the mean, squaring each data point’s distance from the mean, summing those values, dividing that sum by the number of data points and taking the square root.

An example of a manually calculated standard deviation for this set of data: 6, 5, 8, 3, 6, 7, 9, 2, 1, 8

1. Find the mean:
   (6+5+8+3+6+7+9+2+1+8)/10 = 55/10 = 5.5
2. Subtract the mean from each of the numbers and square the result:
   6 - 5.5 = 0.5, squared = 0.25
   5 - 5.5 = -0.5, squared = 0.25
   8 - 5.5 = 2.5, squared = 6.25
   3 - 5.5 = -2.5, squared = 6.25
   6 - 5.5 = 0.5, squared = 0.25
   7 - 5.5 = 1.5, squared = 2.25
   9 - 5.5 = 3.5, squared = 12.25
   2 - 5.5 = -3.5, squared = 12.25
   1 - 5.5 = -4.5, squared = 20.25
   8 - 5.5 = 2.5, squared = 6.25
3. Add up the results:
   0.25 + 0.25 + 6.25 + 6.25 + 0.25 + 2.25 + 12.25 + 12.25 + 20.25 + 6.25 = 66.5
4. Divide by n:
   66.5/10 = 6.65
5. Find the square root of that answer: √6.65 = 2.58 (to two decimal places), which is the standard deviation.
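The manual steps listed above (mean, squared deviations, sum, divide, square root) can be sketched in Python as follows. This is an illustrative cross-check using the same ten numbers, not part of the original Excel workflow, and the variable names are assumptions made for the example.

```python
import math
import statistics

# The data set from the worked example in the text.
data = [6, 5, 8, 3, 6, 7, 9, 2, 1, 8]

# Step 1: find the mean.
mean = sum(data) / len(data)

# Step 2: subtract the mean from each number and square the result.
squared_deviations = [(x - mean) ** 2 for x in data]

# Step 3: add up the squared deviations.
sum_of_squares = sum(squared_deviations)

# Steps 4 and 5: divide by n, then take the square root (population SD).
population_sd = math.sqrt(sum_of_squares / len(data))

# Dividing by n - 1 instead gives the sample SD.
sample_sd = math.sqrt(sum_of_squares / (len(data) - 1))

print(f"mean = {mean}, sum of squares = {sum_of_squares}")
print(f"population SD = {population_sd:.2f}")
print(f"sample SD     = {sample_sd:.2f}")

# The standard library gives the same two results directly.
print(statistics.pstdev(data), statistics.stdev(data))
```

Note that a squared deviation can never be negative, which is a quick way to spot slips in a manual calculation. The sample version (dividing by n - 1) is the one Excel's =STDEV.S returns.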

Dependent and independent samples t-tests are known as parametric tests. The t score is the ratio of the difference between two sets of data to the difference within the data. A higher t score means there is more of a difference between the sets of data, and a smaller score means they are more similar. (Definition and example, Statistics How To, 2020)

Pearson’s Correlation Coefficient (PCC) is used to measure the strength of the linear relationship between two variables. PCC is often regarded as the best method for measuring the association between two variables. PCC can be used to study behaviour that could not ordinarily be investigated, and it yields quantitative data, which is simpler to analyse. However, PCC cannot show cause and effect or control for third-party variables, which can interfere with the correlation. Shown below are the PCC graphs: the data points are grouped fairly close together and the line of best fit suggests a correlation between height and cranial size, although there are some anomalies. The R² values on the graphs (0.14241 for females and 0.01314 for males) are low, which indicates that the correlation is weak.

[Figure: Comparison of Female Head Circumference and Height (mm). Scatter graph of head circumference (mm) against height (mm) with line of best fit; R² = 0.14241.]
[Figure: Comparison of Male Head Circumference and Height (mm). Scatter graph of head circumference (mm) against height (mm) with line of best fit; R² = 0.01314.]
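To show how a Pearson correlation coefficient and the R² value on a scatter graph relate to each other, here is a minimal Python sketch. The measurements are hypothetical (they are not the class data behind the graphs above), and `pearson_r` is an illustrative helper written for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y divided by their combined spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


# Hypothetical measurements in mm (assumption, not data from the report).
heights = [1580, 1620, 1650, 1670, 1700, 1730]
head_circumferences = [548, 560, 541, 566, 555, 572]

r = pearson_r(heights, head_circumferences)
r_squared = r ** 2  # matches the R² of a straight line of best fit

print(f"r = {r:.3f}, R² = {r_squared:.3f}")
```

An r close to +1 or -1 indicates a strong linear relationship and an r close to 0 a weak one. Because R² is r squared, an R² near 0 corresponds to widely scattered points even when the line of best fit slopes upwards.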

Spearman Rank Correlation is a non-parametric statistical test, illustrated with a scatter graph, that can only compare two sets of data. With no correlation, the points look spread out with no obvious trend, whereas positive and negative correlations show a clear trend. Below is an example showing a positive correlation. The null hypothesis for this data was that there is no significant correlation between the assessments of the two doctors; this hypothesis was rejected. For 8 pairs at the significance level p=0.05 (5%), the critical value from the critical values table is 0.738. The rs value of 0.857 is greater than 0.738, which shows that the correlation is significant.

(Table 1)
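The comparison of rs against the critical value can be sketched in Python as below. The two doctors' scores are hypothetical: the original table is not reproduced here, so the values were chosen only so that 8 pairs give the rs of 0.857 quoted in the text. The helper functions are illustrative and assume there are no tied scores.

```python
def ranks(scores):
    """Rank scores from 1 (lowest) upwards; assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rs(xs, ys):
    """rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical assessment scores for 8 patients (assumption, not Table 1).
doctor_a = [1, 2, 3, 4, 5, 6, 7, 8]
doctor_b = [3, 2, 1, 5, 4, 7, 6, 8]

CRITICAL_VALUE = 0.738  # from the text: 8 pairs at p = 0.05

rs = spearman_rs(doctor_a, doctor_b)
significant = rs > CRITICAL_VALUE

print(f"rs = {rs:.3f}, significant at p = 0.05: {significant}")
```

When rs exceeds the critical value, the null hypothesis of no correlation is rejected at that significance level.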

An example of a paired t-test is as follows. A study was carried out to investigate whether standing or lying supine affects systolic blood pressure, and the differences were compared. 12 people took part in this study and their blood pressure was taken in both positions. Because both measurements come from the same people, the two sets of data are dependent, which makes this a paired t-test. The null hypothesis is that there is no difference in the mean blood pressure in the two populations. The alternative hypothesis is that there is a difference between the mean blood pressure in the two populations. In conclusion, the null hypothesis could not be rejected.
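A paired t score of this kind can be calculated with the short Python sketch below. The blood pressure readings are hypothetical (the study's data is not reproduced here), and `paired_t` is an illustrative helper rather than the method used in the original study.

```python
import math
import statistics


def paired_t(first, second):
    """t = mean of the paired differences / standard error of that mean."""
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean_difference / standard_error


# Hypothetical systolic readings (mmHg) for 12 people (assumption).
standing = [118, 124, 130, 112, 126, 135, 121, 128, 116, 132, 120, 125]
supine = [120, 122, 133, 113, 124, 138, 121, 130, 115, 134, 123, 124]

t_score = paired_t(standing, supine)
degrees_of_freedom = len(standing) - 1

print(f"t = {t_score:.3f} with {degrees_of_freedom} degrees of freedom")
```

The t score is then compared with the critical value from a t table for n - 1 degrees of freedom (2.201 for 11 degrees of freedom in a two-tailed test at p = 0.05). If the size of t is below the critical value, the null hypothesis is not rejected.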


In conclusion, my research shows that head circumference and height fluctuate within both genders; however, female heights fluctuated slightly more than male heights, which is also evident in the head circumference graphs. The alternative hypothesis is supported by this, as there was a correlation between a person’s gender and their height but not their head circumference.

If this research were repeated, it may be beneficial to change certain aspects of the method, such as having the same person take the measurements for all groups, so that each head and height is measured in the same way. The same ruler would also need to be used, to make sure the measurements are not slightly different because of differences in the printed scale on the ruler. However, the method was simple and easy to use, which saved time. Accuracy during this was important, as inaccurate data could lead to false information being used.

Graphs can be used to manipulate data to support a certain bias, presenting the data differently from what it actually shows.

Statistics can easily be made inaccurate, which is a drawback: often, if one part of the data is wrong, it will affect all of the data and make the whole set inaccurate and unusable. Statistics are important in medicine because they can provide supporting data for things such as new drugs; they can also show trends in certain medical practices and indicate whether they are working as well as once thought.

Personally, I do not think this research was completed accurately and would not recommend its use, because it was not certain that the same person measured each volunteer, not all volunteers were accounted for, and measurements were only taken once, which could lead to mistakes and inaccurate data from the start. Ideally, each measurement should have been taken three times to ensure it was the same every time.

References

Grant, M. (2019) Statistics. Available at: https://www.investopedia.com/terms/s/statistics.asp (Accessed: 15 January 2021).
What is normal distribution (Date unknown) Available at: https://www.khanacademy.org/math/statistics-probability/modeling-distributions-of-data/normaldistributions-library/a/normal-distributions-review (Accessed: 15 January 2021).
Sharma, R. (2019) Skewed Data: A problem to your statistical model. Available at: https://towardsdatascience.com/skewed-data-a-problem-to-your-statistical-model9a6b5bb74e37 (Accessed: 15 January 2021).
Koehrsen, W. (2019) Age at death in Australia, a negatively skewed distribution. Available at: https://towardsdatascience.com/how-90-of-drivers-can-be-above-average-or-why-you-need-tobe-careful-when-talking-statistics-3df7be5cb116 (Accessed: 16 January 2021).
Bansal, S. (Date unknown) How to calculate standard deviation in Excel. Available at: https://trumpexcel.com/standard-deviation/ (Accessed: 16 January 2021).
Statistics How To (2020) Definition and examples. Available at: https://www.statisticshowto.com/probability-and-statistics/t-test/ (Accessed: 16 January 2021).
Advantages and Disadvantages of using a correlation (2015) Available at: getrevising.co.uk (Accessed: 27 April 2021).
