Title | SPSS 1 intro LEM TRN 002 Wed |
---|---|
Author | Sin YI Tsang |
Course | IC Training course |
Institution | The Hong Kong Polytechnic University |
Pages | 43 |
SPSS 1 Introduction
1/18/2021
Learning Outcomes
• By the end of this module, you should be able to:
  1. Formulate engineering and business problems as statistical models for computer-aided analysis
  2. Apply computer-aided statistical analysis to discover hidden patterns and trends in survey datasets
  3. Validate statistical hypotheses on experimental data using computer-aided analysis
  4. Compose analysis results based on the outputs of computer-aided analysis
In this section
• Structure of this course
• Statistical Analysis
• SPSS Interface
• Variable Settings and Level of Measurement
• Data Transformation
• Check Data Error Before Data Analysis
Class Schedule (each section lasts 3 hours; * marks a non-graded assignment)
Section | Topic | Graded Assignment / *Non-graded Assignment | Date |
---|---|---|---|
1 | Introduction of SPSS | 1. *Data file design; 2. *Data fraud detection | 27-Jan-21 |
2 | Descriptive statistics, graphical presentation | 1. *Summary Table, Cluster Bar Chart, Cluster Line Chart; 2. *OLAP Cube, Percentiles, Box Plot; 3. *Histogram, Normal Distribution | 3-Feb-21 |
3 | T-Test | 1. T-test on students' grades in quizzes and final exam | 10-Feb-21 |
4 | Correlation | 1. *Multiple correlation study on helping behaviour | 24-Feb-21 |
5 | Linear regression | 1. Multiple regression study on helping behaviour | 3-Mar-21 |
6 | Analysis of variance (ANOVA) | 1. Relationship between years of education, gender, and employment; 2. Relationship between working hours, gender, and highest degree | 10-Mar-21 |
7 | Factor and reliability analysis | 1. Factor analysis on questionnaire; 2. Reliability analysis on questionnaire | 17-Mar-21 |
8 | Tutorial / Report writing | | 24-Mar-21 |
9 | Tutorial / Report writing | | 31-Mar-21 |
10 | Multiple choice test (1-hour) | 1. Test; 2. Report | 7-Apr-21 |
Structure of this course
• Task
  – Classwork for appreciation and practice
  – Not marked, but requires your participation
• Graded Assignment
  – Assignment for in-class practice and evaluation
  – Marked for the final grade
  – You should do it in class
  – Assigned in sections 3, 5, 6 and 7, each carrying equal weighting
  – 50% of the total mark of the module
Structure of this course
• Non-Graded Assignment
  – Assignment for in-class practice
  – Not marked
  – You should do it in class
  – Assigned in sections 1, 2 and 4
Structure of this course
• Assignment submission
  – Name the file as [StudentID]_[SectionNo.], e.g. the section 2 assignment report of student 12345678d is 12345678d_2.doc
  – Store your assignment in D:\student\[Student_ID]\
  – Always leave a backup in your e-mail box, USB drive, etc.
  – Do not put the assignment on the desktop and shut down the machine. The data will be erased after rebooting.
  – Submit by the end of the class, unless otherwise arranged.
  – Late submissions and submissions without attendance will not be marked.
  – Copying cases will be sent to the Academic Registry directly.
Structure of this course
• Test
  – 10 multiple-choice questions
  – 1-hour duration
  – Open-book test
  – 30% of the total mark of the module
• Report
  – Report on the methods, findings and evaluation
  – Submit right after completion of the module
  – 20% of the total mark of the module
Statistical Analysis
• Purpose of statistical analysis
  – Simplify information and filter noise
  – Find patterns and relationships
  – Find meaning in data
• Applications of statistical analysis
  – Measure cause and effect
  – Make predictions
  – Make decisions
  – Data mining
Statistical analysis software
• Do statistical analysis without being a mathematician
• SPSS means:
  – Importing / entering data
  – Processing of data
  – Comparison of data
  – Computation of statistical results
  – Presentation of statistical results
• SPSS does not:
  – Automatically give an answer from a pool of data
SPSS Interface
1. Data editor
  – Data view
  – Variable view
2. Output viewer
  – Generate report
3. Syntax editor
  – Automate tasks with command language
  – File > New > Syntax
SPSS Help
• General help
  – Help > Topics
• Dialog box
  – Right-click a variable > Variable Information
• Pivot table
  – Right-click a row or column header > What's This?
Open a Data File
• Open a data file
  – File > Open > Data
    • Select "grades.sav" and click Open
  – Or drag and drop the data file into the SPSS window
• File formats you can open
  – *.sav (PASW Statistics data file)
  – *.xlsx / *.xls (spreadsheet files)
  – *.txt (delimited / fixed-width data file)
Data View
• Data View is a data table
  – Each row is a Case
    • The object you want to investigate
  – Each column is a Variable
    • E.g. name, date, age…
    • Use Variable View to set the details of a variable
  – Each cell is a Value
    • Content of a variable for a case
  – Assign each case a unique ID
Variable View
• Click Variable View at the bottom of the Data View
• Each row is a variable and its properties
  – E.g. variable name, type, width, decimals, label, value labels, missing, measure, …
Variable Properties
• Value labels
  – Add meaning to values
  – e.g. 1: "male", 2: "female"
• Missing values
  – Assign a value that stands for "no value"
  – e.g. -1 for age
• Measure of a variable
  – 3 levels of measurement
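Outside SPSS, value labels and user-missing codes can be pictured as small lookup tables attached to a variable. The Python sketch below is only an illustration under that assumption: the codes follow the examples on this slide (1: male, 2: female, -1 for a missing age), while the variable names and the three sample rows are invented.

```python
# Hypothetical sketch: value labels and user-missing codes as plain lookups.
# Codes follow the slide's examples; the sample rows are invented.
GENDER_LABELS = {1: "male", 2: "female"}   # value labels
AGE_MISSING = {-1}                         # codes that mean "no value"

rows = [
    {"gender": 1, "age": 34},
    {"gender": 2, "age": -1},   # age not reported
    {"gender": 2, "age": 51},
]


def label(value, labels):
    """Return the label for a coded value, or the raw value if unlabeled."""
    return labels.get(value, value)


def valid_ages(rows):
    """Ages with user-missing codes excluded, as an analysis should see them."""
    return [r["age"] for r in rows if r["age"] not in AGE_MISSING]


labelled = [label(r["gender"], GENDER_LABELS) for r in rows]
ages = valid_ages(rows)
mean_age = sum(ages) / len(ages)

print(labelled)   # gender codes shown as labels
print(mean_age)   # mean computed without the -1 placeholder
```

If -1 were not declared as missing, it would be averaged in as a real age and pull the mean down, which is why the missing code has to be set before analysis.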
Level of measurement
• Measurement can be classified into 3 levels:
  – Nominal:
    • categorical data, e.g. 1: blue, 2: red, 3: green
    • no ranking between values
  – Ordinal:
    • ordered categorical data, e.g. 1: bad, 2: OK, 3: good
    • usually not scalable: e.g. OK is not twice as good as bad
  – Scale (interval, ratio):
    • continuous data whose values can be compared numerically
    • e.g. age of 0-100, population of a city
Level of measurement
• A higher level allows more powerful statistical analysis
  – Scale > Ordinal > Nominal
• Scale data can also be treated as ordinal data; ordinal data can also be treated as nominal data
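One way to picture the Scale > Ordinal > Nominal hierarchy is through the summary statistics that are conventionally considered meaningful at each level: the mode for nominal data, the median as well for ordinal data, and the mean as well for scale data. The Python sketch below encodes that convention; the `ALLOWED` table and the sample values are illustrative assumptions, not part of SPSS.

```python
from statistics import mean, median, mode

# Summary statistics conventionally considered meaningful at each level.
# Each level inherits everything allowed at the levels below it.
ALLOWED = {
    "nominal": [mode],
    "ordinal": [mode, median],
    "scale": [mode, median, mean],
}


def summarize(values, measure):
    """Apply only the statistics appropriate for the given level."""
    return {f.__name__: f(values) for f in ALLOWED[measure]}


colour = [1, 2, 2, 3, 2]          # nominal codes: 1 blue, 2 red, 3 green
rating = [1, 2, 3, 3, 2, 3, 3]    # ordinal codes: 1 bad, 2 OK, 3 good
age = [21, 35, 42, 42, 60]        # scale

colour_summary = summarize(colour, "nominal")
rating_summary = summarize(rating, "ordinal")
age_summary = summarize(age, "scale")
print(colour_summary, rating_summary, age_summary)
```

Setting the wrong measure does not stop software from computing a mean of colour codes; it only makes the result meaningless, so the level has to be chosen by the analyst.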
Task 1: Import Excel File in SPSS
[Figure: a *.csv file with a header row (variable 1, variable 2, variable 3, …) and data rows (1, 10, 100, …; 2, 20, 200, …; 3, 30, 300, …) is imported into an SPSS file with columns Variable 1, Variable 2, Variable 3, …]
1. Double-click the "country.csv" file to check its contents
2. File > Open > Data
  – Set Files of type to "All files"
  – Browse to "country.csv" to open the file and the Text Import Wizard
  – Click Next > select Delimited > select Yes (for "Are variable names included at the top of your file?")
  – Click Next > Next > select Comma and deselect the other options
  – Click Next > Next > Finish
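The choices made in the Text Import Wizard (delimited file, variable names in the first row, comma as the only delimiter) correspond to the parameters of any delimited-text reader. As a rough sketch outside SPSS, the Python standard library can read the same layout; the inline text below stands in for a real file and mirrors the figure's example values, and the function name is hypothetical.

```python
import csv
import io

# Stand-in for a delimited text file; the contents mirror the slide's sketch.
CSV_TEXT = """variable1,variable2,variable3
1,10,100
2,20,200
3,30,300
"""


def read_delimited(text, delimiter=","):
    """Read delimited text whose first line holds the variable names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Every cell is converted to a number here; a real file would need
    # per-column types (strings for names, numbers for measurements).
    return [{name: float(value) for name, value in row.items()} for row in reader]


cases = read_delimited(CSV_TEXT)
variables = list(cases[0].keys())
print(variables)
print(len(cases), "cases")
```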
3. Modify the variables in Variable View to match the following: [Figure: Variable View settings]
Step 4 and 5. Value Label
4. Click the Values column for the "12. develop" variable and set Value Labels: [Figure: Value Labels dialog]
5. To check that the value labels work, go to Data View and click the Value Labels button
Step 6. Value Label
6. In Variable View, click the Values column for the "11. region" variable and set Value Labels: [Figure: Value Labels dialog]
Step 7 and 8. Add New Case
7. In Data View, add one more case at the end of the table by typing the following data:
Variable | Value |
---|---|
country | New Zealand |
pop92 | 3.347 |
urban | 76 |
gdp | 14000 |
lifeexpm | 72 |
lifeexpf | 80 |
birthrat | 15 |
deathrat | 8 |
infmr | 10 |
fertrate | 1.8 |
region | 18 |
develop | 0 |
radio | 90.91 |
phone | 71.43 |
hospbed | 90.09 |
docs | 27.86 |
lndocs | 3.33 |
lnphone | 4.27 |
sequence | 121 |
8. File > Save to save the file
Assignment 1: Design of Data File
• The table in the next slide shows the demographic information of a retail customer database
• Design a data file with appropriate names, labels, values, measures, etc.
• Key in the data
• Save the file with an appropriate filename for future use
Assignment 1: Design of Data File (con't)
Case | Age | Marital Status | Years at current address | Household Income | Price of primary vehicle | Primary vehicle price category | Level of Education | Years employed | Retired | Job Satisfaction | Gender |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 55 | Married | 12 | 72 | 36.2 | Luxury | Did not complete high school | 23 | No | Highly satisfied | Female |
2 | 56 | Unmarried | 29 | 153 | 76.9 | Luxury | Did not complete high school | 35 | Yes | Somewhat satisfied | Male |
3 | 28 | Married | 9 | 28 | 13.7 | Economy | Some college | 4 | No | Neutral | Female |
4 | 24 | Married | 4 | 26 | 12.5 | Economy | College degree | 0 | No | Highly dissatisfied | Male |
5 | 25 | Unmarried | 2 | 23 | 11.3 | Economy | High school degree | 5 | No | Somewhat dissatisfied | Male |
6 | 45 | Married | 9 | 76 | 37.2 | Luxury | Some college | 13 | No | Somewhat dissatisfied | Male |
7 | 42 | Unmarried | 19 | 40 | 19.8 | Standard | Some college | 10 | No | Somewhat dissatisfied | Male |
8 | 35 | Unmarried | 15 | 57 | 28.2 | Standard | High school degree | 1 | No | Highly dissatisfied | Female |
9 | 46 | Unmarried | 26 | 24 | 12.2 | Economy | Did not complete high school | 11 | No | Highly satisfied | Female |
10 | 34 | Married | 0 | 89 | 46.1 | Luxury | Some college | 12 | No | Somewhat satisfied | Male |
11 | 55 | Married | 17 | 72 | 35.5 | Luxury | Some college | 2 | Yes | Neutral | Female |
12 | 28 | Unmarried | 3 | 24 | 11.8 | Economy | College degree | 4 | No | Highly satisfied | Male |
13 | 31 | Married | 9 | 40 | 21.3 | Standard | College degree | 0 | No | Somewhat dissatisfied | Female |
14 | 42 | Unmarried | 8 | 137 | 68.9 | Luxury | Some college | 3 | No | Highly dissatisfied | Female |
15 | 35 | Unmarried | 8 | 70 | 34.1 | Luxury | Some college | 9 | No | Somewhat satisfied | Male |
16 | 52 | Married | 24 | 159 | 78.9 | Luxury | College degree | 16 | No | Highly satisfied | Male |
17 | 21 | Married | 1 | 37 | 18.6 | Standard | Some college | 0 | Yes | Highly dissatisfied | Male |
18 | 32 | Unmarried | 0 | 28 | 13.7 | Economy | Did not complete high school | 2 | No | Somewhat satisfied | Female |
19 | 42 | Unmarried | 9 | 109 | 54.7 | Luxury | Some college | 20 | No | Neutral | Female |
20 | 40 | Married | 12 | 117 | 58.3 | Luxury | High school degree | 19 | No | Highly satisfied | Female |
21 | 30 | Unmarried | 3 | 23 | 11.8 | Economy | Did not complete high school | 3 | No | Neutral | Male |
22 | 48 | Unmarried | 14 | 21 | 9.5 | Economy | Some college | 2 | No | Neutral | Male |
23 | 39 | Married | 17 | 17 | 8.5 | Economy | College degree | 2 | No | Neutral | Male |
24 | 42 | Married | 5 | 34 | 16.6 | Standard | High school degree | 13 | No | Neutral | Female |
25 | 45 | Married | 12 | 115 | 57.4 | Luxury | Did not complete high school | 27 | No | Somewhat satisfied | Female |
Data Transformation
• Variables may require pre-processing before running an analysis
• For example:
  – take the logarithm of the data
  – add an offset to the variable
  – reverse the sequence of ordinal data from 0-6 to 6-0
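Two of these transformations are simple arithmetic and can be sketched in a few lines of Python. The function names and sample scores below are hypothetical; reversing a scale that runs from `low` to `high` amounts to computing `low + high - value` for each value.

```python
def add_offset(values, offset):
    """Shift every value by a constant."""
    return [v + offset for v in values]


def reverse_scale(values, low=0, high=6):
    """Reverse an ordinal scale so that low..high becomes high..low."""
    return [low + high - v for v in values]


scores = [0, 1, 3, 6]                       # invented answers on a 0-6 scale
reversed_scores = reverse_scale(scores)     # 0 becomes 6, 6 becomes 0
shifted_scores = add_offset(scores, 1)      # same answers on a 1-7 scale

print(reversed_scores)
print(shifted_scores)
```

Reverse coding is typically needed when some questionnaire items are worded negatively, so that a high score means the same thing on every item.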
Task: Compute Variable
• To compute a new variable from existing variables:
  1. Transform > Compute Variable
  2. To take the natural logarithm of pop92: Function group > Arithmetic
  3. Functions and Special Variables > double-click Ln > double-click pop92 in the list of variables; you should see LN(pop92) in Numeric Expression
  4. Target Variable > enter lnpop92 (the new variable) > OK
Note: you may use If… to select the cases for computation.
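The same computation can be sketched in Python to show what the dialog does case by case. This is a minimal illustration, not SPSS behaviour in detail: the target name `lnpop92`, the optional `condition` argument (standing in for If…) and the three sample cases are assumptions, and `None` plays the role of a system-missing value when the logarithm is undefined.

```python
import math

# Invented sample cases; only pop92 matters for this sketch.
cases = [
    {"country": "A", "pop92": 27.351},
    {"country": "B", "pop92": 256.561},
    {"country": "C", "pop92": 0.0},     # logarithm undefined
]


def compute_ln(cases, source, target, condition=lambda case: True):
    """Add target = ln(source) to every case that satisfies the condition."""
    for case in cases:
        value = case[source]
        if condition(case) and value is not None and value > 0:
            case[target] = math.log(value)
        else:
            case[target] = None   # plays the role of system-missing
    return cases


compute_ln(cases, "pop92", "lnpop92")
for case in cases:
    print(case["country"], case["lnpop92"])
```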
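Task: Recode Variable
• Create new variables by dividing an existing variable into categories
• E.g. classify population into classes: pop92 → popclass (new variable)
[Figure: pop92 values divided into the popclass categories System-missing, 1, 2, 3, 4 and 5, with the last boundary at 40]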
Task: Recode Variable (con't)
• To recode the existing variable pop92 into popclass:
  1. Transform > Recode into Different Variables
  2. Double-click pop92 > under Output Variable, set Name to popclass and Label to Population Class > Change > Old and New Values
  3. Set Range, LOWEST through value to 0 > System-missing > Add to add the first category (system-missing)
  4. Set Range to 0 through 10 > set Value to 1 > Add to add the second category (1)
  5. Repeat with 10-20, 20-30 and 30-40 for categories 2 to 4
  6. Set Range, value through HIGHEST to 40 > set Value to 5 > Add to add the last category (5)
  7. Click OK
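The recode rules above overlap at their boundaries (10 appears in both 0-10 and 10-20), so the order of the rules matters. The Python sketch below mirrors the list of old and new values as a rule table and assumes that the first matching rule wins, so that a boundary value falls into the lower class; that tie-breaking rule, the `recode` function and the sample values are assumptions of this sketch.

```python
import math

# (low, high, new value) rules in the order they were added in the dialog.
# None stands for system-missing. The first matching rule wins, so a boundary
# value such as 10 falls into the lower class (an assumption of this sketch).
RULES = [
    (-math.inf, 0, None),
    (0, 10, 1),
    (10, 20, 2),
    (20, 30, 3),
    (30, 40, 4),
    (40, math.inf, 5),
]


def recode(value, rules=RULES):
    """Return the new value of the first rule whose range contains value."""
    if value is None:
        return None
    for low, high, new_value in rules:
        if low <= value <= high:
            return new_value
    return None


pop92 = [0, 3.347, 10, 27.351, 256.561]
popclass = [recode(v) for v in pop92]
print(popclass)
```

Writing the ranges as 0-10, 10.01-20 and so on would remove the dependence on rule order, at the cost of leaving small gaps between classes.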
Check Data Value: Data Table
• If the data have errors, data analysis is a waste of time
• Always check for data errors before data analysis:
  1. Check for duplicate cases
  2. Check for missing or out-of-scope data
  3. Check the distinct values of nominal/ordinal data
  4. Check for extreme values that are unreasonable
  5. Check the distribution of values for unusual frequencies
  6. Check for relationships between values that are impossible
Country | Population | GDP | Life expectancy of female | Life expectancy of male | Birth rate | Death rate | Infant mortality rate | Fertility rate | Region | Develop |
---|---|---|---|---|---|---|---|---|---|---|
Canada | 27.351 | - | 74 | 81 | 14 | -7 | 7.3 | 1.7 | North America | Developed country |
United States | 256.561 | 22470 | 85 | 79 | 14 | 9 | 10 | 1.9 | North America | Developed country |
China | 1169.619 | 360 | 69 | 72 | 22 | 7 | 33 | 2.2 | Eastern Asia | Developing country |
China | 0 | 19100 | 77 | 82 | 10 | 100 | 4 | 1.8 | Eastern Asia | Developed country |
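The first two checks in the list are mechanical enough to sketch in code. The Python example below applies them to four cases modelled on the example table; the column names, the helper functions and especially the valid ranges (population above 0, death rate between 0 and 50) are assumptions chosen for illustration, not rules from SPSS or from the course.

```python
from collections import Counter

# Four cases modelled on the example table; None marks a missing value.
cases = [
    {"country": "Canada", "population": 27.351, "gdp": None, "death_rate": -7},
    {"country": "United States", "population": 256.561, "gdp": 22470, "death_rate": 9},
    {"country": "China", "population": 1169.619, "gdp": 360, "death_rate": 7},
    {"country": "China", "population": 0, "gdp": 19100, "death_rate": 100},
]


def duplicate_keys(cases, key):
    """Values of key that occur in more than one case."""
    counts = Counter(case[key] for case in cases)
    return [value for value, n in counts.items() if n > 1]


def missing_rows(cases, field):
    """Row indexes where field has no value."""
    return [i for i, case in enumerate(cases) if case[field] is None]


def out_of_range_rows(cases, field, low, high):
    """Row indexes where a non-missing value lies outside [low, high]."""
    return [
        i for i, case in enumerate(cases)
        if case[field] is not None and not (low <= case[field] <= high)
    ]


duplicates = duplicate_keys(cases, "country")
missing_gdp = missing_rows(cases, "gdp")
# The valid ranges below are assumptions made for this sketch.
bad_population = out_of_range_rows(cases, "population", 0.001, float("inf"))
bad_death_rate = out_of_range_rows(cases, "death_rate", 0, 50)

print(duplicates, missing_gdp, bad_population, bad_death_rate)
```

Checks of this kind only flag candidates; deciding whether a flagged value is a typing error or a genuine extreme still requires looking at the case.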
Check Data Value: Frequency
• Always check for data errors before data analysis, using the six checks listed on the previous slide
[Figure: frequency chart of a variable with categories A to E, marking an unusual frequency, a value "abc" outside the normal range, and a missing value; data table of Case ID by Variable marking duplicate cases]
Task 2a: Check Data Value
• To check for duplicate cases in nominal/ordinal data (Identify Duplicate Cases)
  1. Data > Identify Duplicate Cases
  2. Set Define matching cases by to Country > OK
  3. Look for cases where PrimaryLast equals 0 (Duplicate Case)
• To summarize data values (Case Summaries)
  1. Analyze > Reports > Case Summaries
  2. Select all required variables > OK
• To check the distinct values of nominal/ordinal data (Frequency table)
  1. Analyze > Descriptive Statistics > Frequencies
  2. Set Variables to: region, develop > OK

Example output, Identify Duplicate Cases:
Case ID | Frequency | Percent |
---|---|---|
Duplicate case | 5 | 5% |
Primary case | 95 | 95% |
Total | 100 | 100% |

Example output, Case Summaries:
Case ID | Variable 2 | Variable 3 | Variable 4 |
---|---|---|---|
Case 1 | … | … | … |
Case 2 | … | … | … |
Case 3 | … | … | … |
Case 4 | … | … | … |

Example output, Frequency table:
Value | Frequency | Percent |
---|---|---|
Valid: Value 1 | 80 | 80% |
Valid: Value 2 | 15 | 15% |
Valid: #?@ | 3 | 3% |
Missing | 2 | 2% |
Total | 100 | 100% |
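A frequency table is simply a count of each distinct value together with its share of all cases, which makes stray codes easy to spot. The Python sketch below builds one for an invented `develop` variable; the expected codes (0 and 1), the sample data and the function name are assumptions made for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to (count, percent); None is shown as Missing."""
    counts = Counter("Missing" if v is None else v for v in values)
    total = len(values)
    return {value: (n, 100 * n / total) for value, n in counts.items()}


# Invented data: codes 0 and 1 are expected, anything else is suspicious.
develop = [1, 1, 1, 0, 0, "#?@", None, 1, 0, 1]
table = frequency_table(develop)
unexpected = [v for v in table if v not in {0, 1, "Missing"}]

for value, (count, percent) in table.items():
    print(value, count, f"{percent:.0f}%")
print("Unexpected values:", unexpected)
```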
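Task 2b: Check Data Distribution for Scale Data
• Check extreme values (instead of all distinct values)
  1. Analyze > Descriptive Statistics > Explore > Statistics > check Outliers
  2. Set Dependent List to pop92
• Check the distribution of values
  1. Graphs > Legacy Dialogs > Histogram
  2. Set Variable to gdp

Example output, Extreme Values (outliers) of a variable:
Group | Rank | Case number | Value |
---|---|---|---|
Highest | 1 | … | 1000 |
Highest | 2 | … | 100 |
Highest | 3 | … | 99 |
Highest | 4 | … | 98 |
Highest | 5 | … | 97 |
Lowest | 1 | … | -3 |
Lowest | 2 | … | 0 |
Lowest | 3 | … | 1 |
Lowest | 4 | … | 2 |
Lowest | 5 | … | 3 |
[Figure: histogram of Frequency against Variable (categories A to E)]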
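The extreme-values listing can be imitated by picking the five largest and five smallest values together with their case numbers. The Python sketch below does this for an invented list of values chosen to resemble the example output; case numbers are assumed to be 1-based row positions.

```python
import heapq


def extreme_values(values, n=5):
    """Return the n highest and n lowest values as (case number, value) pairs."""
    numbered = list(enumerate(values, start=1))   # case numbers start at 1
    highest = heapq.nlargest(n, numbered, key=lambda cv: cv[1])
    lowest = heapq.nsmallest(n, numbered, key=lambda cv: cv[1])
    return highest, lowest


# Invented values; 1000 and -3 are the kind of entries worth a second look.
values = [3, 97, 1000, 2, 98, -3, 100, 0, 99, 1, 50, 60]
highest, lowest = extreme_values(values)

print("Highest:", highest)
print("Lowest:", lowest)
```

A value such as 1000 next to 100, 99 and 98 stands out immediately in this listing, whereas it would be buried in a full frequency table of a scale variable.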
Task 2c: Check relationship of values
• To generate a scatter plot:
  1. Graphs > Legacy Dialogs > Scatter/Dot > Simple Scatter > Define
  2. Set Y Axis to Infant mortality rate and X Axis to Fertility rate > OK
  3. Double-click the plot to edit it
  4. Use the Elements > Data Label Mode button, then click a dot to show its case ID
  5. Right-click the dot to go to the case
  6. What pattern is shown in the scatter plot?
• Try to generate another scatter plot for the variables Phones and Natural log of phones
  – What pattern is shown in the scatter plot?
[Figure: scatter plot of Variable 2 against Variable 1]
Select Cases For Analysis
• To select a subset of cases (e.g. gdp > 200) for analysis:
  1. Data > Select Cases
  2. Click If condition is satisfied > If > set the condition to gdp > 200 > Continue
  3. Click Filter out unselected cases > OK
  – Why is there a new variable "filter_$" after selection?
  – To remove the filter: Data > Select Cases > click All cases
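Filtering, as opposed to deleting, keeps every case in the file and only marks which ones take part in the analysis. The Python sketch below imitates that idea with a 0/1 flag stored under the name `filter_$`; the functions and sample cases are hypothetical, and treating a missing gdp as "not selected" is an assumption of this sketch.

```python
# Invented sample cases; None marks a missing gdp.
cases = [
    {"country": "A", "gdp": 22470},
    {"country": "B", "gdp": 360},
    {"country": "C", "gdp": 150},
    {"country": "D", "gdp": None},
]


def select_cases(cases, condition, flag="filter_$"):
    """Mark each case with 1 (selected) or 0 (filtered out); return the selected ones."""
    for case in cases:
        case[flag] = 1 if condition(case) else 0
    return [case for case in cases if case[flag] == 1]


def clear_filter(cases, flag="filter_$"):
    """Drop the flag so that all cases take part in the analysis again."""
    for case in cases:
        case.pop(flag, None)


def gdp_above_200(case):
    return case["gdp"] is not None and case["gdp"] > 200


selected = select_cases(cases, gdp_above_200)
flags = [case["filter_$"] for case in cases]
print([case["country"] for case in selected], flags)
```

Because the unselected cases are only flagged, removing the filter restores the full dataset without reloading the file.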
• To...