Title | 26134 uts business statistics high distinction |
---|---|
Course | Business Statistics |
Institution | University of Technology Sydney |
Pages | 45 |
26134 – Business Statistics

Lecture 1 – Introduction to Statistics, Data and Graphing Data

Learning Objectives
- How statistics is used in business
- The types of data – qualitative variable and quantitative data
- Measurement properties of data – distinguish among the nominal, ordinal, interval, and ratio levels of measurement
- Graphical presentation of data

Chapter 1 – Introduction to Statistics

TYPES OF STATISTICS
Statistics is a branch of mathematics that transforms data into useful information for decision makers.
- Descriptive statistics – collecting, summarising and describing data.
- Inferential statistics – drawing conclusions and/or making decisions concerning a population based only on sample data.

PROCESS OF DESCRIPTIVE STATISTICS
- Collect data, e.g. survey
- Present data, e.g. tables and graphs
- Characterise data, e.g. sample mean $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

PROCESS OF INFERENTIAL STATISTICS
- Estimation, e.g. estimate the population mean weight using the sample mean weight.
- Hypothesis testing, e.g. test the claim that the population mean weight is 100 kg.
- Drawing conclusions about a large group of individuals based on a subset of the large group.

1.1. STATISTICS IN BUSINESS
Statistics is a mathematical science concerned with the collection, presentation, analysis and interpretation or explanation of data.
The aim of BUSINESS STATISTICS is to extract the best possible information from data and use it to make business decisions.

BASIC VOCABULARY OF STATISTICS
- A population is a collection of all possible individuals, objects, or measurements of interest.
- A sample is a portion, part, or subset of the population of interest.

POPULATION VERSUS SAMPLE
- Measures used to describe the population are called parameters.
- Measures computed from sample data are called statistics.

1.2. BASIC STATISTICAL CONCEPTS
- A population is a collection of objects (often called units or subjects) of interest, e.g. all small businesses, all workers currently employed by BHP Billiton, etc.
- Collection of data on a whole population is called a census.
- A sample is a subset of the units in a population.
- A sample can be expected to be representative of the whole population.
- There are two steps in analysing data from a sample – EDA and statistical inference.
1. Exploratory data analysis (EDA) is the first step, in which numerical, tabular and graphical summaries (such as frequency tables, means, standard deviations and histograms) of data are produced to summarise and highlight the key aspects or any special features of the data.
2. A statistical inference is an inference based on a probability model linking the data to the population.
   - Statistical inference uses sample data to reach conclusions about the population from which the sample was drawn. An inference is a conclusion that patterns observed in the data (sample) are present in the wider population from which the data were collected.
   - As an example, in pharmaceutical research, tests must be limited to a small sample of patients since new drugs are expensive to produce.
   - Researchers design experiments with small, representative samples of patients and draw conclusions about the whole population using statistical inference techniques.

A descriptive measure of the population is called a parameter. Denoted by Greek letters.
Examples are population mean, population standard deviation and population variance.
A descriptive measure of a sample is called a statistic. Denoted by Roman letters.
Examples of statistics are sample mean, sample standard deviation and sample variance.

FOUR LEVELS OF MEASUREMENT OF DATA
- Each level builds on the one below it (e.g. interval has both ordinal and nominal characteristics).
- The levels ascend from least numerical to most numerical (i.e. ratio is most numerical).
- Nominal level – data that is classified into non-overlapping categories and cannot be arranged in any particular order/sorted.
  E.g. eye colour, gender, brand of TV
- Ordinal level – data that is classified into distinct non-overlapping categories in which ranking is implied, i.e. data can be arranged in some order/sorted.
  E.g. test performance is graded as HD, D, C, P or F (descending order).
- Interval level – an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
  E.g. temperature on the Fahrenheit scale, shoe size
  No true zero – for example, 0 degrees Celsius does NOT equal 0 degrees Fahrenheit.
- Ratio level – the interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement. "Zero" is significant (meaningful).
  E.g. price, distance travelled to lecture hall, time taken to reach the lecture hall.
  Meaningful zero – for example, 0 kilograms equals 0 pounds.

1.3. TYPES OF DATA
- Categorical data is simply an identifier or label and has no numerical meaning.
- Categorical data can be further sub-classified as nominal or ordinal.
- Numerical data have a natural order and the numbers represent some quantity.
- Numerical data can be sub-classified as discrete or continuous.
- Discrete data type is where we can list the possible values.
- Continuous data type is where we can give only a RANGE of possible values for the data.
- Data that are collected at a fixed point in time are called cross-sectional data. Such data give a snapshot of the measured variables at that point in time.
- Often data are collected over time and such data are called time-series data. Unlike cross-sectional data, time-series data are time dependent.

1.4. OBTAINING DATA
- Data collected to address a specific need are known as primary data.
- Data that were collected for some other purpose and are already available are known as secondary data.
Chapter 2 – Charts and Graphs

GRAPHICAL REPRESENTATION OF QUALITATIVE/CATEGORICAL DATA
- Pie chart: a circular display of data where the area of the whole pie represents 100% of the data being studied and slices represent a percentage breakdown of the sublevels.
- Bar chart: a graph in which a bar shows each category, the length of which represents the amount, frequency or percentage of values falling into a category.

GRAPHICAL REPRESENTATION OF QUANTITATIVE/NUMERICAL DATA
- Histogram: a type of vertical bar chart where the area of each bar is equal to the frequency of the corresponding interval.
- Scatter plot: a plot or graph of pairwise data from two continuous variables, to explore the relationship between them.

SUMMARY – PRESENTING DATA

DATA COLLECTION
- Cross-section data is collected at one point in time.
  Example: analysis of stock price on 14th April 2000
- Time-series data is collected over time.
  Example: collecting and analysing stock price on 14th April for years – 2000, 2001 … 2008!
2.1. FREQUENCY DISTRIBUTIONS
- Raw data, or data that have not been summarised in any way, are sometimes referred to as ungrouped data.
- Frequency distributions are a convenient way to group continuous data.
- A frequency distribution is a summary of that data presented as non-overlapping class intervals covering the entire range of data and their corresponding frequencies.
- Data that have been organised into a frequency distribution are called grouped data.
Figure – Frequency Distribution Table

Class Interval | Frequency | Class Midpoint | Relative Frequency | Cumulative Frequency |
---|---|---|---|---|
0–500 | 6 | 250 | 0.27 | 6 |
500–1000 | 9 | 750 | 0.41 | 15 |
1000–1500 | 7 | 1250 | 0.32 | 22 |
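The derived columns of the frequency distribution table can be reproduced from the class limits and frequencies alone. The following is a minimal Python sketch; the variable names are illustrative and not part of the course material.

```python
from itertools import accumulate

# Class limits and frequencies taken from the frequency distribution table.
classes = [(0, 500), (500, 1000), (1000, 1500)]
frequencies = [6, 9, 7]

total = sum(frequencies)  # 22 observations in all

# Class midpoint: halfway between the lower and upper class limits.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Relative frequency: class frequency divided by the total frequency.
relative = [round(f / total, 2) for f in frequencies]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for row in zip(classes, frequencies, midpoints, relative, cumulative):
    print(row)
```

Because each relative frequency is rounded to two decimals, the rounded column does not always sum to exactly 1.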
- The range is defined as the difference between the largest and smallest data values.
- The midpoint of each class is called the class midpoint, also sometimes known as the class mark.
- Relative frequency is the ratio of the frequency of the class interval to the total frequency.
- The cumulative frequency is the running total of frequencies through the classes of a frequency distribution.

2.2. GRAPHICAL DISPLAY OF DATA
- A histogram is a vertical bar chart, where the area of the bar is equal to the frequency of the corresponding interval. Histograms are the most useful and common graphs for displaying continuous data.
- A frequency polygon is a graph constructed by plotting a dot for the frequencies at the class mid-points and connecting the dots.
- An ogive is a cumulative frequency polygon. Ogives are most useful when the decision maker wants to see running totals.
- A pie chart is a circular display of data where the area of the whole pie represents 100% of the data being studied and slices represent percentage contributions of the sublevels. It is used to display CATEGORICAL data and shows relative magnitudes of parts to the whole.
- Another way to display continuous data is by using a stem and leaf plot. The leaf is the rightmost digit of the data, and the rest of the digits form the stem.
- A Pareto chart is a vertical bar chart that displays the most common types of defects, ranked in order of occurrence from left to right. It is a special way of displaying categorical data.
- A scatter plot is a graph of pairwise data from two continuous variables. The scatter plot (also called a scatter graph or scattergram) is a graphical technique for qualitatively exploring the relationship between two continuous variables.
Lecture 2 – Descriptive Statistics (Numerical Measures)

Learning Objectives
1. Calculate measures of location/tendency: mean, median and mode
2. Explain the characteristics, uses, advantages, and disadvantages of each measure of location/tendency.
3. Compute and interpret the measures of dispersion/spread: range, variance, standard deviation, interquartile range (IQR), coefficient of variation (CV)
4. Application of measures of tendency and measures of dispersion
Chapter 3 – Descriptive Summary Measures

MEASURE OF CENTRAL TENDENCY
Measures of central tendency yield information about the centre of a set of numbers, e.g. the middle or average value.
Common measures of central tendency are:
- Mode (nominal, ordinal, interval, ratio)
- Median (ordinal, interval, ratio)
- Mean (interval, ratio)

MEASURE OF CENTRAL TENDENCY – MEAN
The arithmetic mean is the most widely used measure of location.
PROPERTIES
- Is the average of a set of numbers
- Applicable for interval and ratio data
- Affected by each value in the data set, including extreme values
- It is calculated by summing the values and dividing by the number of values.

Population mean (mu): $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$

Sample mean (x bar): $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
MEASURE OF CENTRAL TENDENCY – MEDIAN
Median is the middle value of a set of numbers after they have been arranged in order (ascending or descending).
PROPERTIES OF THE MEDIAN
- Applicable for ordinal, interval, and ratio data
- Unaffected by extremely large and extremely small values (outliers)
- Median position = $\frac{n+1}{2}$, counted in the ordered data

MEASURE OF CENTRAL TENDENCY – MODE
Mode is the most frequently occurring value in a data set. It is applicable to all levels of data measurement (nominal, ordinal, interval, and ratio). You can have multiple modes.
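The three measures can be computed with Python's standard `statistics` module, as in the minimal sketch below. The salary figures are hypothetical: the original exercise data are not reproduced in these notes, so the values were chosen only so that the results agree with the totals quoted in the practice exercise that follows.

```python
import statistics

# Hypothetical annual salaries in $000 (NOT the original exercise data).
salaries = [45, 47, 47, 50, 51, 52, 54, 133]

# Mean: sum of the values divided by the number of values (479 / 8).
mean = statistics.mean(salaries)

# Median: with n = 8 the median position is (8 + 1) / 2 = 4.5,
# so the median is the average of the 4th and 5th ordered values.
median = statistics.median(salaries)

# Mode: multimode returns a list because a data set can have several modes.
modes = statistics.multimode(salaries)

print(mean, median, modes)
```

The single large value (133) pulls the mean well above the median, which is why the median and mode describe a typical salary better here.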
PRACTICE EXERCISE #1
1) Mean = 479/8 = 59.9
   Median position = (n + 1)/2 = (8 + 1)/2 = 4.5. Therefore, median = (50 + 51)/2 = 50.5
   Mode = 47 (appears twice)
2) MEDIAN – Half of the salaries are below $50,500 and half of the salaries are above $50,500.
   MODE – The most frequently occurring annual salary is $47,000.
3) Best measures to use – median and mode.
   The mean is affected by each value in the data set, including extreme values, whereas the median and mode do not change very much.

MEASURE OF DISPERSION/SPREAD
Measures of dispersion describe the spread/variability of a set of data.
Common measures of variability:
- Range (need to sort data in ascending order)
- Interquartile range (IQR) (need to sort data in ascending order)
- Variance
- Standard deviation
- Coefficient of variation (CV)
- z scores

RANGE
The range is the difference between the largest and the smallest values in a set of data.
INTERQUARTILE RANGE (IQR)
Interquartile range is the range of values between the first and third quartiles. IQR = Q3 - Q1
- Range of the "middle half", i.e. the middle 50% of data
- First quartile (Q1) equals the 25th percentile, i.e. at least 25% of the data lie below this value.
- Third quartile (Q3) equals the 75th percentile, i.e. at least 75% of the data lie below this value.
- Less influenced by extremes and a useful measure for ordinal data.
- NOTE: See appendix on how to calculate Q1 and Q3.

Interpretation for IQR is – "the values of the middle 50% of sample or population span a range of the IQR."
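As a rough illustration, the quartiles and the IQR can be obtained with `statistics.quantiles` from the Python standard library. The data are hypothetical, and quartile conventions differ between textbooks and software: the sketch uses Python's default "exclusive" method, which is an assumption and may not match the method in the course appendix.

```python
import statistics

# Hypothetical data set, already in ascending order.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into four parts and returns the three cut points.
# ASSUMPTION: Python's default "exclusive" method; other conventions
# can give slightly different quartiles for the same data.
q1, q2, q3 = statistics.quantiles(data, n=4)

# IQR = Q3 - Q1, the range spanned by the middle 50% of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```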
GRAPHICAL TECHNIQUE TO EXAMINE SPREAD OF DATA: BOX PLOTS
Graphically, BOX PLOTS are a useful visual representation of data and represent the 5 number "summary" of data, including the IQR.
The 5 number summary is:
- Smallest value (Min) in data
- First quartile (Q1)
- Median (Q2)
- Third quartile (Q3)
- Greatest value (Max)

VARIANCE AND STANDARD DEVIATION
VARIANCE is the average of the squared deviations from the arithmetic mean.
STANDARD DEVIATION is the square root of the variance.
For populations whose values are dispersed from the mean, the population variance and standard deviation will be large.
Interpretation for standard deviation is simply – "the 'deviation' of individual values from the average value."
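A short sketch of the variance and standard deviation calculations, using Python's `statistics` module and hypothetical data. One point the definition in words leaves implicit: the population variance divides the sum of squared deviations by N, while the usual sample variance divides by n - 1; the sketch shows both.

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions: sum of squared deviations from the mean divided by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)    # square root of the population variance

# Sample versions: sum of squared deviations divided by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)    # square root of the sample variance

print(pop_var, pop_sd, samp_var, samp_sd)
```

Here the squared deviations sum to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7.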
COEFFICIENT OF VARIATION
Coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean, expressed as a percentage.
It is a measure of relative dispersion and is used to compare the standard deviation/variability of data sets with different means.
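The sketch below computes the CV for two hypothetical price series with different means. The numbers are invented for illustration and the helper name `coefficient_of_variation` is not from the course material; the sample standard deviation is used, which is an assumption.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical prices for two stocks with different mean prices.
stock_a = [50, 52, 48, 51, 49]      # mean 50
stock_b = [100, 110, 90, 105, 95]   # mean 100

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print(round(cv_a, 2), round(cv_b, 2))
```

Because the CV is unit-free, the two series can be compared directly even though their means differ; in this made-up case stock B is relatively more variable.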
PRACTICE EXERCISE #2
1. Which is a more useful measure for understanding variability – variance or standard deviation? Why?
   - Standard deviation (reason – UNITS): it is expressed in the same units as the data, whereas the variance is in squared units.
2. Which is the best measure to compare the variability of the two stocks? Why?
   - As seen in the table below, the two stocks have different means, so it is better to use the CV (COEFFICIENT OF VARIATION) over the SD or variance.
Z-SCORE
A z-score represents the number of standard deviations a value (x) is above or below the mean of a set of numbers.
- Population: $z = \frac{x - \mu}{\sigma}$
- Sample: $z = \frac{x - \bar{x}}{s}$

Interpretation of z-score:
- A negative z-score indicates that the item/element is below average.
- A positive z-score means that the item/element is above average.
- A z-score of 0 implies x = µ!
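A minimal sketch of the z-score calculation in Python. The function name `z_score` and the example numbers (mean 100, standard deviation 15) are illustrative assumptions only.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical distribution with mean 100 and standard deviation 15.
print(z_score(130, 100, 15))  # above average, so the z-score is positive
print(z_score(85, 100, 15))   # below average, so the z-score is negative
print(z_score(100, 100, 15))  # equal to the mean, so the z-score is zero
```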
APPENDIX

APPENDIX: PERCENTILE
- Measures of location that divide a set of data into 100 parts.
- At least n% of the data lie below the nth percentile, and at most (100 – n)% of the data lie above the nth percentile.
  Example – the 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it.
- Applicable for ordinal, interval, and ratio data but not applicable for nominal data.
- 25th percentile = first quartile (Q1)
- The median and the 50th percentile have the same value
- 75th percentile = third quartile (Q3)
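The calculation steps for a percentile are not spelled out in this part of the notes. The sketch below follows one common textbook convention, stated here as an assumption: compute the location i = (P/100)·n; if i is a whole number, average the i-th and (i+1)-th ordered values; otherwise round i up and take that ordered value. The function name and the data are hypothetical.

```python
def percentile(values, p):
    """Locate the p-th percentile (p an integer from 1 to 99).

    ASSUMED convention: i = (p / 100) * n; a whole-number i means the
    average of the i-th and (i+1)-th ordered values, otherwise i is
    rounded up to the next position.
    """
    if not 1 <= p <= 99:
        raise ValueError("p must be an integer between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    # Integer arithmetic avoids floating-point error when testing for a whole i.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is whole: average positions i and i + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: round up, i.e. 1-based position whole + 1.
    return ordered[whole]

# Hypothetical data with n = 8 observations.
data = [45, 47, 47, 50, 51, 52, 54, 133]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

Under this convention the 50th percentile agrees with the median position rule (n + 1)/2 used earlier in the notes.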
QUIZ 1 PREPARATION – SUMMARY

Example | Colour | Level of satisfaction | Shoe size | FB page visits daily by a girl |
---|---|---|---|---|
Level of measurement | NOMINAL | ORDINAL | INTERVAL | RATIO |
Measure of central tendency | MODE | MODE, MEDIAN | MODE, MEDIAN, MEAN | MODE, MEDIAN, MEAN |
Measure of variability | N/A | IQR (or range) | Standard Deviation | Standard Deviation |

3.1. MEASURES OF CENTRAL TENDENCY
- Measures of central tendency yield information about the centre, or middle part, of a set of numbers.
- The mode is the most frequently occurring value in a set of data.
- If two values tie for the most frequently occurring value, the data set is bimodal.
- Data sets with two or more modes are referred to as multimodal.
- The median is the middle value in an ordered array of numbers.
- The arithmetic mean is the average of a set of numbers and is computed by summing all numbers and dividing the sum by the count of numbers.

3.2. MEASURES OF LOCATION
- Measures of location yield information about certain sections of a set of numbers when ranked into an ascending array.
- Percentiles are measures of location that divide a set of data so that a certain fraction of data can be described as falling on or below this location.
- In the percentile location formula, i = the percentile location, P = the percentile of interest, and n = the number of observations in the data set.
- Quartiles are measures of location that divide a set of data into four subgroups or parts.

3.3. MEASURES OF VARIABILITY
- Measures of variability describe the spread or dispersion of a set of data.
- The range is the difference between the maximum and minimum values of a data set.
- The interquartile range is the distance between the first and third quartiles. IQR = Q3 – Q1; essentially it is the range of the middle 50% of the data.
- Subtracting the mean from each value of data yields the deviation from the mean.
- The variance is the average of the squared deviations from the mean for a set of numbers.
- The standard deviation is the square root of the variance.
- The empirical rule is an important rule of thumb that is used to state the approximate percentage of values that lie within a given number of standard deviations from the mean of a set of data if the data are normally or approximately normally distributed.
- The normal distribution is a unimodal and symmetrical distribution (bell-shaped distribution).
- A useful application of the EMPIRICAL RULE is in detecting potential outliers. Outliers are observations that are unusually large or small values that appear to be inconsistent with the rest of the data.
- The empirical rule suggests that, for normally distributed data, nearly all (99.7%) observations should fall within three standard deviations of the mean. Only a small number (less than 0.3%) of data are expected to fall outside these ranges if the data are normally distributed.
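The empirical rule percentages can be checked against the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper names and the three-standard-deviation outlier threshold are illustrative assumptions, not part of the course material.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Proportions within 1, 2 and 3 standard deviations, rounded to 3 decimals.
shares = {k: round(share_within(k), 3) for k in (1, 2, 3)}
print(shares)

def is_potential_outlier(x, mean, sd, threshold=3):
    """Flag a value whose z-score is more than `threshold` standard deviations from the mean."""
    return abs((x - mean) / sd) > threshold

# Hypothetical example: mean 100, standard deviation 15.
print(is_potential_outlier(160, 100, 15))  # z = 4, flagged
print(is_potential_outlier(130, 100, 15))  # z = 2, not flagged
```

A flagged value is only a potential outlier: the check assumes the data are at least approximately normally distributed.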
Lecture 3 – Data Description: Issues in Data

Learning Objectives
1. Identify the position of the mean, median, and mode for both symmetric and skewed distributions.
2. Detecting outliers in data
   - IQR
   - Z-score and application of the Empirical Rule
3. Analysing type of relationship of quantitative variables

Chapter 3 – Descriptive Summary Measures

THE RELATIVE POSITIONS OF THE MEAN, MEDIAN AND MODE
Can we identify the position of the mean, median, and mode for both symmetric and skewed distributions? YES!
- Example of symmetric distribution – time taken to complete a marathon run.

DATA ISSUES – OUTLIERS
1. Is it useful to include the value of room type 'P' when examining the prices of standard rooms?
   - NO, it is an extreme observation.
2. How do extreme observations impact the mean?
   - The price of room type 'P' is an extremely high value and tends to push the average upwards!
3. Howd...