26134 uts business statistics high distinction PDF

Title	26134 uts business statistics high distinction
Course	Business Statistics
Institution	University of Technology Sydney
Pages	45
File Size	5.3 MB
File Type	PDF


Description

26134 – Business Statistics

Lecture 1 – Introduction to Statistics, Data and Graphing Data

Learning Objectives
• How statistics is used in business
• The types of data – qualitative and quantitative data
• Measurement properties of data – distinguish among the nominal, ordinal, interval and ratio levels of measurement
• Graphical presentation of data

Chapter1–IntroductiontoStatistics TYPESOFSTATISTICS  Statisticsisabranchofmathematicsthattransformsdataintousefulinformationfor decisionmakers. Descriptivestatistics–collecting,summarisinganddescribingdata. Inferentialstatistics–drawingconclusionsand/ormakingdecisionsconcerningapopulation basedonlyonsampledata. PROCESSOFDESCRIPTIVESTATISTICS  Collectdata ‐ E.g.Survey  Presentdata ‐ E.g.Tablesandgraphs  Characterisedata ∑ 

‐ E.g.Samplemean=    PROCESSOFINFERENTIALSTATISTICS  Estimation ‐ E.g.estimatethepopulationmeanweightusingthesamplemeanweight.  Hypothesistesting ‐ E.g.testtheclaimthatthepopulationmeanweightis100kgs Drawingconclusionsaboutalargegroupofindividualsbasedonasubsetofthelarge group. 1.1. STATISTICSINBUSINESS  Statisticsisamathematicalscienceconcernedwiththecollection,presentation,analysisand interpretationorexplanationofdata.  TheaimofBUSINESSSTATISTICSistoextractthebestpossibleinformationfromdataand useittomakebusinessdecisions. BASICVOCABULARYOFSTATISTICS  Apopulationisacollectionofallpossibleindividuals,objects,ormeasurementsofinterest.  Asampleisaportion,part,orsubsetofthepopulationofinterest. POPULATIONVERSUSSAMPLE  Measuresusedtodescribethepopulationarecalledparameters.  Measurescomputedfromsampledataarecalledstatistics. 1.2. BASICSTATISTICALCONCEPTS  Apopulationisacollectionofobjects(oftencalledunitsorsubjects)ofinterest.  E.g.allsmallbusinesses,allworkerscurrentlyemployedbyBHPBilliton,etc.  Collectionofdataonawholepopulationiscalledacensus.  Asampleisasubsetoftheunitsinapopulation.  Asamplecanbeexpectedtoberepresentativeofthewholepopulation.  Therearetwostepsinanalysingdatafromasample–EDAandstatisticalinference.  

Page1

26134–BusinessStatistics



1. Exploratorydataanalysis(EDA)isthefirststep,inwhichnumerical,tabularandgraphical summaries(suchasfrequencytables,means,standarddeviationsandhistograms)ofdata areproducedtosummariseandhighlightthekeyaspectsoranyspecialfeaturesofthedata. 2. Astatisticalinferenceisaninferencebasedonaprobabilitymodellinkingthedatatothe population. ‐ Statisticalinferenceusessampledatatoreachconclusionsaboutthepopulationfromwhich thesamplewasdrawn.Aninferenceisaconclusionthatpatternsobservedinthedata (sample)arepresentinthewiderpopulationfromwhichthedatawerecollected. Asanexample,inpharmaceuticalresearch,testsmustbelimitedtoasmallsampleof patientssincenewdrugsareexpensivetoproduce.  Researchersdesignexperimentswithsmall,representativesamplesofpatientsand drawconclusionsaboutthewholepopulationusingstatisticalinferencetechniques.   Adescriptivemeasureofthepopulationiscalledaparameter.DenotedbyGreekletters.  Examplesarepopulationmean,populationstandarddeviationandpopulationvariance.  Adescriptivemeasureofasampleiscalledastatistic.DenotedbyRomanletters.  Examplesofstatisticsaresamplemean,samplestandarddeviationandsamplevariance. FOURLEVELSOFMEASUREMENTOFDATA









• Each level builds on the previous one (e.g. interval has both ordinal and nominal characteristics).
• Ascending from least numerical to most numerical (i.e. ratio is most numerical).

Nominal level – data that are classified into non-overlapping categories and cannot be arranged in any particular order/sorted.
  - E.g. eye colour, gender, brand of TV
Ordinal level – data that are classified into distinct non-overlapping categories in which ranking is implied, i.e. data can be arranged in some order/sorted.
  - E.g. test performance is graded as HD, D, C, P or F (descending order).
Interval level – an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
  - E.g. temperature on the Fahrenheit scale, shoe size
  - No true zero – for example, 0 degrees Celsius DOES NOT equal 0 degrees Fahrenheit.
Ratio level – the interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement. "Zero" is significant (meaningful).
  - E.g. price, distance travelled to lecture hall, time taken to reach the lecture hall.
  - Meaningful zero – for example, 0 kilograms equals 0 pounds.

1.3. TYPES OF DATA
• Categorical data is simply an identifier or label and has no numerical meaning.
• Categorical data can be further sub-classified as nominal or ordinal.
• Numerical data have a natural order and the numbers represent some quantity.
• Numerical data can be sub-classified as discrete or continuous.
• Discrete data type is where we can list the possible values.
• Continuous data type is where we can give only a RANGE of possible values for the data.
• Data that are collected at a fixed point in time are called cross-sectional data.
• Such data give a snapshot of the measured variables at that point in time.
• Often data are collected over time and such data are called time-series data.
• Unlike cross-sectional data, time-series data are time dependent.

1.4. OBTAINING DATA
• Data collected to address a specific need are known as primary data.
• Data that were collected for some other purpose and are already available are known as secondary data.

Chapter2–ChartsandGraphs GRAPHICALREPRESENTATIONOFQUALITATIVE/CATEGORICALDATA  PieChart:Acirculardisplayofdatawheretheareaofthewholepierepresents100%ofthe databeingstudiedandslicesrepresentapercentagebreakdownofthesublevels.  BarChart:Agraphinwhichabarshowseachcategory,thelengthofwhichrepresentsthe amount,frequencyorpercentageofvaluesfallingintoacategory.

 GRAPHICALREPRESENTATIONOFQUANTITATIVE/NUMERICALDATA  Histogram:Atypeofverticalbarwheretheareaofeachbarisequaltothefrequencyofthe correspondinginterval.  ScatterPlot:Aplotorgraphofpairwisedatafromtwocontinuousvariables,toexplorethe relationshipbetweenthem.



SUMMARY – PRESENTING DATA

DATA COLLECTION
• Cross-section data is collected at one point in time.
  - Example: analysis of stock price on 14th April 2000
• Time-series data is collected over time.
  - Example: collecting and analysing stock price on 14th April for the years 2000, 2001 … 2008

Page3

 2.1.

26134–BusinessStatistics

FREQUENCYDISTRIBUTIONS  Rawdata,ordatathathavenotbeensummarisedinanyway,aresometimesreferredtoas ungroupeddata.  Frequencydistributionsareaconvenientwaytogroupcontinuousdata.  Afrequencydistributionisasummaryofthatdatapresentedasnon‐overlappingclass intervalscoveringtheentirerangeofdataandtheircorresponding frequencies.  Datathathavebeenorganisedintoafrequencydistributionarecalledgroupeddata.

 ClassInterval Frequency 0–500 6 500–1000 9 1000–1500 7

Figure–FrequencyDistributionTable ClassMidpoint RelativeFrequency CumulativeFrequency 250 0.27 6 750 0.41 15 1250 0.32 22
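The derived columns of the frequency distribution table can be recomputed from the class intervals and frequencies alone. The following is a minimal sketch; the variable names are illustrative and not part of the original notes, and relative frequencies are rounded to two decimals to match the table.

```python
# Class intervals and frequencies taken from the frequency distribution table.
classes = [(0, 500), (500, 1000), (1000, 1500)]
frequencies = [6, 9, 7]

total = sum(frequencies)
rows = []
running_total = 0
for (lower, upper), freq in zip(classes, frequencies):
    running_total += freq  # cumulative frequency is a running total
    rows.append({
        "interval": f"{lower}-{upper}",
        "frequency": freq,
        "midpoint": (lower + upper) / 2,     # class midpoint (class mark)
        "relative": round(freq / total, 2),  # share of the total frequency
        "cumulative": running_total,
    })

for row in rows:
    print(row)
```

The last cumulative frequency always equals the total number of observations (22 here), which is a quick check on the table.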

 Therangeisdefinedasthedifferencebetweenthelargestandsmallestdatavalues. Themidpointofeachclassiscalledtheclassmidpoint,alsosometimesknownasclassmark. Relativefrequencyistheratioofthefrequencyoftheclassintervaltothetotalfrequency. Thecumulativefrequencyistherunningtotaloffrequenciesthroughtheclassesofa frequencydistribution. 2.2. GRAPHICALDISPLAYOFDATA  Ahistogramisaverticalbarchart,wheretheareaofthebarisequaltothefrequencyofthe correspondinginterval.  Histogramsarethemostusefulandcommongraphsfordisplayingcontinuousdata. o Afrequencypolygonisagraphconstructedbyplottingadotforthefrequenciesattheclass mid‐pointsandconnectingthedots. o Anogiveisacumulativefrequencypolygon.  Ogivesaremostusefulwhenthedecisionmakerwantstoseerunningtotals.  Apiechartisacirculardisplayofdatawheretheareapfthewholepierepresents100%of thedatabeingstudiedandslicesrepresentpercentagecontributionsofthesublevels.  UsedtodisplayCATEGORICALdataandshowsrelativemagnitudesofpartstothewhole. Anotherwaytodisplaycontinuousdataisbyusingastemandleafplot. ‐ Theleafistherightmostdigitofthedata,andtherestofthedigitsformthestem. Paretochartisaverticalbarchartthatdisplaysthemostcommontypesofdefects,rankedin orderofoccurrencefromlefttoright. ‐ Aspecialwayofdisplayingcategoricaldata.  Ascatterplotisagraphofpairwisedatafromtwocontinuousvariables. ‐ Thescatterplot(alsocalledascattergraphorscattergram)isagraphicaltechniquefor qualitativelyexploringtherelationshipbetweentwocontinuousvariables.

Lecture2–DescriptiveStatistics(NumericalMeasures) LearningObjectives 1. CalculateMeasuresoflocation/tendency:mean,medianandmode 2. Explainthecharacteristics,uses,advantages,anddisadvantagesofeachmeasureof location/tendency. 3. Computeandinterpretthemeasureofdispersion/spread:range,variance,andstandard deviation,Interquartilerange(IQR),Coefficientofvariation(CV) 4. Applicationofmeasuresoftendencyandmeasuresofdispersion

 

Page4

26134–BusinessStatistics



Chapter3–DescriptiveSummaryMeasures

MEASUREOFCENTRALTENDENCY  Measuresofcentraltendencyyieldinformationaboutthecenterofasetofnumbers–e.g. themiddle,oraveragevalue  CommonMeasuresofCentralTendencyare:  Mode(nominal,ordinal,interval,ratio)  Median(ordinal,interval,ratio)  Mean(interval,ratio) MEASUREOFCENTRALTENDENCY–MEAN  Thearithmeticmeanisthemostwidelyusedmeasureoflocation. ‐ PROPERTIES  Istheaverageofasetofnumbers  Applicableforintervalandratiodata  Affectedbyeachvalueinthedatasetincludingextremevalues.  Itiscalculatedbysummingthevaluesanddividingbythenumberofvalues. Populationmean(mu)





 Samplemean(xbar)





  

Page5

 26134–BusinessStatistics MEASUREOFCENTRALTENDENCY–MEDIAN  Medianisthemiddlevalueofasetofnumbersaftertheyhavebeenarrangedinanorder (ascendingordescending). ‐ PROPERTIESOFTHEMEDIAN  Applicableforordinal,interval,andratiodata  Unaffectedbyextremelylargeandextremelysmallvalues(outliers)  Medianposition=

 





MEASUREOFCENTRALTENDENCY–MODE  Modeisthemostfrequentlyoccurringvalueinadataset.Itisapplicabletoalllevelsofdata measurement(nominal,ordinal,interval,andratio).Youcanhavemultiplemodes.

 PRACTICEEXERCISE#1

 1) Mean=479/8=59.9 Medianposition=(n+1)/2=8+½=4.5.Therefore,median=50+51/2=50.5 Mode=47(appearstwice) 2) MEDIAN–Halfofthesalariesarebelow$50,500andhalfthesalariesareabove$50,500. MODE–Themostfrequentlyoccurringannualsalaryis$47,000. 3) Bestmeasureofuse–MedianandMode Meanisaffectedbyeachvalueinthedatasetincludingextremevalueswhereasthemedian andmodedonotchangeverymuch. MEASUREOFDISPERSION/SPREAD  Measuresofdispersiondescribethespread/variabilityofasetofdata  CommonMeasuresofVariability • Range(needtosortdatainascendingorder) • InterquartileRange(IQR)(needtosortdatainascendingorder) • Variance • StandardDeviation • Coefficientofvariation(CV) • zScores RANGE  RANGE:Thedifferencebetweenthelargestandthesmallestvaluesinasetofdata.  

Page6



26134–BusinessStatistics

 INTERQUARTILERANGE(IQR)  Interquartilerangeistherangeofvaluesbetweenthefirstandthirdquartiles.IQR=Q3‐Q1  Rangeofthe“middlehalf”i.e.middle50%ofdata  Firstquartile(Q1)equals25thpercentilei.e.atleast25%ofthedataliebelowthisvalue.  Thirdquartile(Q3)equals75thpercentilei.e.atleast75%ofthedataliebelowthisvalue.  Lessinfluencedbyextremesandusefulmeasureforordinaldata. ‐ NOTE:SeeappendixonhowtocalculateQ1andQ3. InterpretationforIQRis–‘’the valuesofthemiddle50%ofsample orpopulationspanarangeofthe IQR.’’

 GRAPHICALTECHNIQUETOEXAMINESPREADOFDATA:BOXPLOTS  GraphicallyBOXPLOTSareausefulvisualrepresentationofdataandrepresentthe5 number‘summary’ofdataincludingIQR. 5numbersummaryis:  Smallestvalue(Min)indata  Greatestvalue(Max)  FirstQuartile(Q1)  Median(Q2)  ThirdQuartile(Q3)  VARIANCEANDSTANDARDDEVIATION  VARIANCEistheaverageofthesquareddeviationsfromthearithmeticmean.  STANDARDDEVIATIONisthesquarerootofthevariance.  Forpopulationswhosevaluesaredispersedfromthemean,thepopulationvarianceand standarddeviationwillbelarge.

 



Page7



26134–BusinessStatistics



Interpretationforstandard deviationissimply–‘’the‘deviation’ ofindividualvaluesfromthe averagevalue.’’

 COEFFICIENTOFVARIATION  Coefficientofvariation(CV)isdefinedastheratioofthestandarddeviationtothemean, expressedasapercentage.  Measurementofrelativedispersionandusedtocomparestandarddeviation/variabilityof datasetswithdifferentmeans.



 PRACTISEEXERCISE#2 1. Whichisamoreusefulmeasureofunderstandingvariability–varianceorstandard deviation?Why? ‐ StandardDeviation(reason–UNITS) 2. Whichisthebestmeasuretocomparethevariabilityofthetwostocks?Why? ‐ Asseeninthetablebelow,bothstockshavedifferentmeanssoitisbettertouseCV (COEFFICIENTOFVARIATION)overSDorvariance.  

Page8

26134–BusinessStatistics



 Z‐SCORE  Az‐scorerepresentsthenumberofstandarddeviationsavalue(x)isaboveorbelowthe meanofasetofnumbers.   ‐ ‐ ‐

Interpretationofz‐score:  Anegativez‐scoreindicatesthattheitem/elementisbelowaverage. Apositivez‐scoremeansthattheitem/elementisaboveaverage. Az‐scoreof0impliesx=µ!

 APPENDIX APPENDIX:PERCENTILE Measuresofcentraltendencythatdivideasetofdatainto100parts.  Atleastn%ofthedataliebelowthenthpercentile,andatmost(100–n)%ofthedatalie abovethenthpercentile ‐ Example–90thpercentileindicatesthatatleast90%ofthedataliebelowit,andatmost 10%ofthedatalieaboveit.  Applicableforordinal,interval,andratiodatabutnotapplicablefornominaldata. ‐ 25thpercentile=firstquartile(Q1) ‐ Themedianandthe50thpercentilehavethesamevalue ‐ 75thpercentile=thirdQuartile(Q3)

 



Page9

 Example Levelof measurement Measureof centraltendency

26134–BusinessStatistics

QUIZ1PREPARATION–SUMMARY Levelof Shoesize satisfaction NOMINAL ORDINAL INTERVAL

Colour

MODE

MODE,MEDIAN

MODE, MEDIAN, MEAN Standard Deviation

FBpagevisits dailybyagirl RATIO MODE, MEDIAN, MEAN Standard Deviation

N/A IQR(orrange) Measureof variability 3.1.MEASURESOFCENTRALTENDENCY  Measuresofcentraltendencyyieldinformationaboutthecentre,ormiddlepart,ofasetof numbers.  Themodeisthemostfrequentlyoccurringvalueinasetofdata.  Inthecaseofatieforthemostfrequentlyoccurringvalue–bimodal.  Datasetswithtwoormoremodesarereferredtoasmultimodal.  Themedianisthemiddlevalueinanorderedarrayofnumbers.  Thearithmeticmeanistheaverageofasetofnumbersandiscomputedbysummingall numbersanddividingthesumbythecountofnumbers. 3.2.MEASURESOFLOCATION  Measuresoflocationyieldinformationaboutcertainsectionsofasetofnumberswhen rankedintoanascendingarray.  Percentilesaremeasuresoflocationthatdivideasetofdatasothatacertainfractionof datacanbedescribedasfallingonorbelowthislocation. Where: i=thepercentilelocation P=thepercentileofinterest n=thenumberof observationsinthedataset.   Quartilesaremeasuresoflocationthatdivideasetofdataintofoursubgroupsorparts. 3.3.MEASURESOFVARIABILITY  Measuresofvariabilitydescribethespreadordispersionofasetofdata.  Therangeisthedifferencebetweenthemaximumandminimumvaluesofadataset.  Theinterquartilerangeisthedistancebetweenthefirstandthirdquartiles.  IQR=Q3–Q1,essentiallyitistherangeofthemiddle50%ofthedata.  Subtractingthemeanfromeachvalueofdatayieldsthedeviationfromthemean.  Thevarianceistheaverageofthesquareddeviationsfromthemeanforasetofnumbers.  Thestandarddeviationisthesquarerootofthevariance.  Theempiricalruleisanimportantruleofthumbthatisusedtostatetheapproximate percentageofvaluesthatliewithinagivennumberofstandarddeviationsfromthemeanof asetofdataifthedataarenormallyorapproximatelynormallydistributed.  
Normaldistributionisaunimodalandsymmetricaldistribution(bell‐shapeddistribution). ‐ UsefulapplicationofEMPIRICALRULEisindetectingpotentialoutliers.Outliers are observationsthatareunusuallylargeorsmallvaluesthatappeartobeinconsistentwiththe restofthedata. ‐ Empiricalrulesuggeststhat,fornormallydistributeddata,nearlyall(99.7%)observations shouldfallwithinthreestandarddeviationsofthemean.Onlyasmallnumber(lessthan 0.3%)ofdataareexpectedtofalloutsidetheserangesifthedataarenormallydistributed.  

Page10

26134–BusinessStatistics



Lecture3–DataDescription:IssuesinData LearningObjectives

1. Identifythepositionofthemean,median,andmodeforbothsymmetricandskewed distributions. 2. DetectingOutliersindata  IQR  Z‐scoreandapplicationofEmpiricalRule 3. Analysingtypeofrelationshipofquantitativevariables

Chapter3–DescriptiveSummaryMeasures THERELATIVEPOSITIONSOFTHEMEAN,MEDIANANDMODE  Canweidentifythepositionofthemean,median,andmodeforbothsymmetricand skeweddistributions?YES!

o Exampleofsymmetricdistribution–timetakentocompleteamarathonrun. DATAISSUES–OUTLIERS

 1. Isitusefultoincludethevalueofroomtype‘P’whenexaminingthepricesofstandard rooms? ‐ NO,itisanextremeobservation 2. Howdoextremeobservationsimpactthemean? ‐ Thepriceofroomtype‘P’isanextremelyhighvalueandtendstopushtheaverage upwards! 3. Howd...