Handout Introduction to SPSS Ryerson PDF

Title	Handout Introduction to SPSS Ryerson
Course	Multiple Regression for Business
Institution	Ryerson University

Introduction to SPSS

Facilitator:

Olesya Falenchuk
Research Systems Analyst, Research Design and Analysis Services
3-391, Education Commons, OISE/UT
(416) 978-1956

Data Management and Manipulation

Where does data management and manipulation fit into the quantitative research framework?

Data Collection (Data Entry) → Data Management and Manipulation → Data Analysis

Why do we need to manage and manipulate data?

• Input, save and retrieve data
• Verify data records (data cleaning)
• Combine data from various sources
• Create new data
• Prepare data for further analyses

By the end of this workshop you will know how to:

Use the SPSS environment
• Import data into SPSS
• Navigate in SPSS and use its menus and dialogs
• Use the SPSS Syntax Editor and SPSS commands

Manage your data
• Identify duplicate cases
• Identify faulty entries in your data
• Aggregate your data
• Transpose data
• Merge files by adding cases and variables
• Split a file for further analyses
• Select cases for further analyses

Transform your data
• Recode
• Compute new variables


SPSS Environment

Data Editor Window

To open SPSS, follow Start → IBM SPSS Statistics → IBM SPSS Statistics 21. The Data Editor window will open.

The Data Editor window has two views: Data View and Variable View. You can switch between them using the tabs in the bottom-left corner of the screen. The Data View displays the data, and the Variable View contains information about the data (e.g., variable names, labels). The Data Editor window contains the following menus, which can be accessed at the top of the screen:

• File – used to open existing files, read data files, save data files, and exit SPSS.
• Edit – copy, cut and paste functions.
• View – allows you to switch between Data View and Variable View, and to hide or show toolbars and status bars.
• Data – used to perform various functions on your data (define variables, insert variables or cases, sort cases, transpose, merge files, aggregate, split files, select cases).
• Transform – used to perform computations on variables, recode variables, count and rank cases, replace missing values, etc.
• Analyze – used to perform statistical analyses.
• Direct Marketing – used for marketing applications.
• Graphs – used to generate different types of graphs/charts.
• Window – used to check which window is active (check mark) or to move to any open window.
• Help – used to get help on SPSS topics and to access tutorials.

The toolbar at the top of the Data Editor window gives the user access to common tasks available in SPSS. Hovering the mouse over a tool displays a brief description of its function.


The status bar at the bottom of the SPSS window shows which procedure is currently running (left side of the status bar) and which functions have been applied to the data (right side of the status bar).

Syntax and Output Windows

Most data entry, data manipulation, and data analyses can be conducted in SPSS using pull-down menus. Here are a few scenarios that show why this might not be the best option:

1. You have to conduct the same analysis with similar datasets.
2. After finishing your analyses, you find a typo in the data and have to re-run all of them.
3. Six months after submitting your study to a journal, the reviewers of your paper suggest modifying some of the analyses of your data.

Using the SPSS Syntax Editor instead of pull-down menus allows you to deal with these and other issues quickly and efficiently. Working in the Syntax Editor keeps a precise log of your work with SPSS and lets you run the same program as many times as needed. It also allows you to make minor changes for various analyses and then re-run the program.

To start using syntax, click “Paste” instead of “OK” in any dialog box. To run the analyses from syntax, select the syntax lines and click the “Run” button on the toolbar. You can annotate your syntax by adding comments. Every comment should start with * and end with a period (.).

In addition, you might want to see the syntax lines in your output. This will help you keep track of the analyses you’ve done and the options you’ve selected: Edit > Options > Viewer > check “Display commands in the log”.

Thus, when using syntax, you will work with three SPSS windows: Data Editor, Syntax and Output. Each window has to be saved separately, and each is saved with a different file extension: .sav for SPSS data, .sps for syntax commands, and .spv for output.

Note: Output files created in versions prior to release 16.0 cannot be opened in SPSS 16.0 or later. In Windows operating systems, you can install the SPSS 15.0 Smart Viewer (available on the installation CD or on the SPSS website) to access older (.spo) output files.

The Output window displays the results of your analyses. When you only manage and manipulate your data, you won’t see any output except the syntax commands. SPSS output can be exported in a variety of formats: .doc, .xls, .pdf, .ppt, .html. This is a convenient option because output created with SPSS 16 and newer versions is not compatible with output created with older versions of the software.

Dialog Box

Each menu selection opens a dialog box with three basic features: (1) source variable list – a list of variables (on the left) from the working data file; (2) target variable list(s) – the variables chosen for the analysis; and (3) command buttons – buttons used to run or control procedures (OK, Paste, Reset, Cancel, Help, etc.).

Data Entry/Import in SPSS

The SPSS Data Editor can be a good choice for entering your data. It has a friendly interface that resembles an Excel spreadsheet, and by entering the data directly into SPSS you don't need to worry about converting them from some other format. If you decide to enter your data in SPSS, you first need to tell SPSS the names of your variables. You can double-click the column heading of a variable, which lets you enter information about the variable for that column. Alternatively, you can switch to the Variable View and enter information about all your variables there.

You can also enter your data in another software package and import them into SPSS. For example, you might enter your data in Excel and then import them into SPSS. Other software packages that can be used for data entry are Access, Stata, and SAS. To import your data into SPSS, or to open data that were previously entered into SPSS, open SPSS from the Start menu, then select File > Open > Data and choose the appropriate data file.

Task 1. Open the working file.

1. Open SPSS, then follow File > Open > Data, find the location of the file Employee_data.sav on your computer and click ‘Paste’. This saves the SPSS command for opening the file in the Syntax window.
2. In the Syntax window, select the commands and run them.


Data Management and Manipulation

Once you enter or import your data into SPSS, the next few steps involve checking and validating the data quality.

Step 1
• Review the variables in your data file and determine their valid values, labels, and measurement levels
• Identify combinations of variable values that are impossible but commonly miscoded
• Define validation rules based on this information
• Specify the code for missing values in each variable
• Make sure the variable names are short and intuitive
• Assign variable labels

Step 2
• Run basic checks and checks against defined validation rules to identify invalid cases, variables, and data values. When invalid data are found, investigate and correct the cause.

Step 3
• Identify potential statistical outliers that can cause problems for many statistical analyses
• Examine the shape of the distribution for each variable
• Conduct variable transformations if necessary
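The idea behind Steps 1 and 2 (define validation rules, then check every value against them) can be illustrated outside SPSS. The following is a minimal Python sketch, not an SPSS procedure: the records, the rule thresholds, and the helper name `find_invalid` are all hypothetical and chosen only for illustration.

```python
# Hypothetical records; None stands in for a missing value.
cases = [
    {"id": 1, "gender": "m", "educ": 15, "salary": 57000},
    {"id": 2, "gender": "f", "educ": 12, "salary": 21450},
    {"id": 3, "gender": "x", "educ": 16, "salary": 45000},   # invalid gender code
    {"id": 4, "gender": "f", "educ": 120, "salary": None},   # typo in educ, missing salary
]

# Validation rules (assumed for illustration): one test per variable.
rules = {
    "gender": lambda v: v in {"m", "f"},
    "educ": lambda v: 0 <= v <= 25,
    "salary": lambda v: v > 0,
}


def find_invalid(cases, rules):
    """Return (case id, variable, value) for each non-missing value that breaks its rule."""
    problems = []
    for case in cases:
        for variable, is_valid in rules.items():
            value = case.get(variable)
            if value is None:
                continue  # missing values are skipped here, not treated as invalid
            if not is_valid(value):
                problems.append((case["id"], variable, value))
    return problems


invalid = find_invalid(cases, rules)
for case_id, variable, value in invalid:
    print(f"case {case_id}: {variable} = {value!r} fails its validation rule")
```

Each flagged value still has to be investigated by hand, as Step 2 says; the check only tells you where to look.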

Data Transformation

Recoding variables

The RECODE command allows you to reassign the values of existing variables or to collapse ranges of existing values into new values for a new variable. There are some considerations to keep in mind:

• You can recode numeric and string variables.
• Numeric variables can be recoded to string variables and vice versa.
• When multiple variables are selected, they must all be of the same type; numeric and string variables cannot be recoded together.

To recode a variable, follow Transform > Recode into Different Variables. This opens the Recode into Different Variables dialog box.


Task 2. Recoding a continuous variable into a categorical variable using the Recode into Different Variables function.

We will recode the continuous variable ‘salary’ into a new variable with 3 categories: less than $50,000, $50,000-100,000, more than $100,000.

1. Go to Transform > Recode into Different Variables.
2. Select the variable ‘salary’ and move it to the Input Variable -> Output Variable window.
3. Name your output variable ‘salcat’, label it ‘Salary Categorized’ and click Change. You will now see the output variable name in the Numeric Variable -> Output Variable window.
4. Click Old and New Values.
5. To create the first category ‘less than $50,000’, under Old Value click Range, LOWEST through value and enter the number 49999. Then under New Value enter the number 1 and click Add.
6. To create the second category ‘$50,000-100,000’, under Old Value click Range ... through and enter the numbers 50000 and 100000. Then under New Value enter the number 2 and click Add.
7. To create the third category ‘more than $100,000’, under Old Value click Range, value through HIGHEST and enter the number 100001. Then under New Value enter the number 3 and click Add.
8. Choose the System- or user-missing option under Old Value and System-missing under New Value, and click Add.
9. Click Continue.
10. Click Paste.
11. Go to the Syntax Editor window, add a comment to this command, select it and run it.

Note: To define the value labels of the new categorical variable, switch to the Variable View, go to the new variable, click on the Values cell, and enter an appropriate label for each category.
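To make the logic of this recode explicit, here is a minimal Python sketch of the same three ranges. It is an illustration of the rule set, not the syntax SPSS pastes; the function name `recode_salary` and the sample salaries are hypothetical, and `None` is assumed to play the role of a missing value.

```python
def recode_salary(salary):
    """Apply the three ranges from Task 2; None plays the role of a missing value."""
    if salary is None:
        return None   # missing stays missing (step 8)
    if salary <= 49999:
        return 1      # LOWEST through 49999
    if 50000 <= salary <= 100000:
        return 2      # 50000 through 100000
    if salary >= 100001:
        return 3      # 100001 through HIGHEST
    return None       # value falls in a gap between the ranges and matches no rule


# Hypothetical salaries, including one that falls between two ranges.
salaries = [21450, 50000, 100000, 135000, None, 49999.5]
salcat = [recode_salary(s) for s in salaries]
print(salcat)  # [1, 2, 2, 3, None, None]
```

The last value shows why range boundaries deserve a second look: a salary of 49,999.50 lies between 49999 and 50000, so none of the three ranges covers it. With whole-dollar salaries the gap never matters.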


Task 3. Collapsing categories in categorical variables.

We will recode the response categories ‘clerical’ and ‘custodial’ to ‘staff’ for the variable ‘jobcat’.

1. Go to Transform > Recode into Different Variables.
2. Select the variable ‘jobcat’ and move it to the Input Variable -> Output Variable window.
3. Name your output variable ‘jobcatr’, label it ‘Job category recoded’ and click Change. You will now see the output variable name in the Numeric Variable -> Output Variable window.
4. Click Old and New Values.
5. Enter the following pairs of values in the Old Value and New Value fields: 0 – 0, 1 – 1, 2 – 1, 3 – 2. Click Add after each pair.
6. Choose the System- or user-missing option under Old Value and System-missing under New Value, and click Add.
7. Click Continue.
8. Click Paste.
9. Go to the Syntax Editor window, add a comment to this command, select it and run it.

Note: To define the value labels of the new categorical variable, switch to the Variable View, go to the new variable, click on the Values cell, and enter an appropriate label for each category.
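Collapsing categories amounts to a lookup from old codes to new codes. The Python sketch below uses the value pairs from step 5; the names `JOBCAT_MAP` and `recode_jobcat` and the sample data are hypothetical, and `None` is assumed to stand for a missing value.

```python
# Old value -> new value pairs from step 5 (0-0, 1-1, 2-1, 3-2).
JOBCAT_MAP = {0: 0, 1: 1, 2: 1, 3: 2}


def recode_jobcat(jobcat):
    """Return the collapsed category, or None for missing or unlisted codes."""
    return JOBCAT_MAP.get(jobcat)


# Hypothetical job category codes; None stands in for a missing value.
jobcat = [1, 2, 3, 1, None, 0]
jobcatr = [recode_jobcat(v) for v in jobcat]
print(jobcatr)  # [1, 1, 2, 1, None, 0]
```

Codes 1 and 2 both map to 1, which is the collapsed ‘staff’ category, while code 3 is renumbered to 2.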

Computing new variables

The COMPUTE command computes values for a variable based on numeric transformations of other variables. You can:

• Compute values for numeric or string variables
• Create new variables or replace the values of existing variables. For new variables, you can also specify the variable type and label
• Compute values selectively for subsets of data based on logical conditions

Task 4. Computing a new variable, Salary difference (sal_dif), from the variables Current Salary (salary) and Beginning Salary (salbegin).

1. Go to Transform > Compute Variable.
2. Under Target Variable: type the variable name ‘sal_dif’.
3. Click the Type & Label button, enter ‘Difference between current and beginning salary’ under Label and click Continue.
4. Select the variable ‘salary’ and move it to the Numeric Expression: box.
5. From the keypad provided or from your keyboard, select the minus sign.
6. Select the variable ‘salbegin’ and move it to the Numeric Expression: box.
7. Click Paste.
8. Go to the Syntax Editor window, add a comment to this command, select it and run it.
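The computation in Task 4 is a case-by-case subtraction. The following Python sketch shows that logic under two assumptions: `None` stands for a missing value, and a difference is treated as missing when either input is missing. The helper name `difference` and the sample values are hypothetical.

```python
def difference(current, beginning):
    """Return current minus beginning, or None if either value is missing."""
    if current is None or beginning is None:
        return None
    return current - beginning


# Hypothetical cases using the variable names from Task 4.
cases = [
    {"salary": 57000, "salbegin": 27000},
    {"salary": 40200, "salbegin": 18750},
    {"salary": 21450, "salbegin": None},   # beginning salary missing
]

for case in cases:
    case["sal_dif"] = difference(case["salary"], case["salbegin"])

sal_dif = [case["sal_dif"] for case in cases]
print(sal_dif)  # [30000, 21450, None]
```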

Running analyses separately for groups of cases

If your data set contains sub-samples of cases representing specific demographic groups (defined by age group, income group, gender, etc.), you can run separate analyses to explore the characteristics of each sub-sample. You just need to tell SPSS to split the data file before running your analyses. To split the file into separate groups for analysis based on the values of one or more grouping variables, follow Data > Split File, which opens the Split File dialog box.

Analyze all cases, do not create groups is the default option; if it is chosen, all analyses are conducted on the entire dataset. The second and third options, Compare groups and Organize output by groups, produce the same values but present the output differently. With Compare groups, the output for all groups is presented in the same table; with Organize output by groups, the output is presented separately for each group.

Task 5. Split the file into groups using Gender (gender variable) as the criterion.

1. Go to Data > Split File.
2. Select the Compare groups option.
3. Select the Gender variable and move it to the Groups Based on: window.
4. Click Paste.
5. Go to the Syntax Editor window, add a comment to this command, select it and run it.

Note: There will be no visible output for this command; however, pay attention to the right side of the status bar which will indicate Split File On. Also remember to turn Split file off when you no longer want to analyze your data by group (Data > Split file > Analyze all cases, do not create groups).
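Conceptually, Split File groups the cases by the grouping variable and then repeats each later analysis once per group. The Python sketch below illustrates this with a mean salary per gender; the helper `split_by`, the sample cases, and the choice of the mean as the "analysis" are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases using variable names from the working file.
cases = [
    {"gender": "m", "salary": 57000},
    {"gender": "f", "salary": 21450},
    {"gender": "f", "salary": 21900},
    {"gender": "m", "salary": 45000},
]


def split_by(cases, variable):
    """Group cases by the value of one variable."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[variable]].append(case)
    return dict(groups)


groups = split_by(cases, "gender")

# Run the same analysis (here: mean salary) once per group.
mean_salary = {
    group: mean([case["salary"] for case in members])
    for group, members in sorted(groups.items())
}
print(mean_salary)
```

Presenting all groups in one dictionary loosely corresponds to the Compare groups layout; printing one result per group would correspond to Organize output by groups.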

Running analyses for a selected group of cases

You might also be interested in running some analyses only for a selected group of cases with certain characteristics. In this case, you will need to use the SPSS command called Select Cases. Select Cases provides several methods for selecting a subgroup of cases based on criteria that include variables and complex expressions. You can also select a random sample of cases. The criteria used to define a subgroup can include:

• Variable values and ranges
• Date and time ranges
• Case (row) numbers
• Arithmetic expressions
• Logical expressions
• Functions

To select cases, follow Data > Select Cases. This opens the Select Cases dialog box.

Select Options:

• All cases – this is the default option.
• If condition is satisfied – selects a subgroup of cases based on conditional expressions (e.g., variable names, mathematical operations, functions).
• Random sample of cases – selects a random sample from all cases; the sample can be specified as a percentage or as an exact number of cases.
• Based on time or case range – selects a subgroup of cases by specifying a range of cases or a range of dates or times (available only for time series data).

Output Options:

• Filter out unselected cases – retains all cases in the data file, but only those that meet the selection criteria are used in the analysis.
• Copy selected cases to a new dataset – creates a new dataset containing only the selected cases.
• Delete unselected cases – retains only those cases that meet the selection criteria and removes all other cases from the data file. Note: cases cannot be recovered once you save the data file after deleting them.

Task 6. Selecting cases only for ‘Clerical’ and ‘Custodial’ workers.

1. Go to Data > Select Cases.
2. Click the If condition is satisfied button and then the If button, which brings up the Select Cases: If dialog box.
3. From the list of variables on the left, select the variable (jobcat) and move it to the white space on the right.
4. From the calculator pad (or from the keyboard) choose the appropriate signs until the following appears: ‘jobcat = 1 | jobcat = 2’.
5. Click Continue, then click Paste.
6. Go to the Syntax Editor window, add a comment to this command, select it and run it.

Note: There will be no visible output for this command; however, pay attention to the right side of the status bar, which will indicate Filter On. The cases that were not selected will be crossed out in the Data View window (see the case numbers on the left side). Also remember to turn the filter off when you no longer want to analyze your data by selected group(s) (Data > Select Cases > All cases).
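The condition from Task 6, ‘jobcat = 1 | jobcat = 2’, is a logical OR evaluated for every case. The Python sketch below mimics the Filter out unselected cases behaviour by flagging cases rather than deleting them; the helper `is_selected`, the `selected` flag, and the sample cases are hypothetical.

```python
# Hypothetical cases; jobcat codes follow Task 6 (1 and 2 are the groups of interest).
cases = [
    {"id": 1, "jobcat": 3},
    {"id": 2, "jobcat": 1},
    {"id": 3, "jobcat": 2},
    {"id": 4, "jobcat": 1},
    {"id": 5, "jobcat": 3},
]


def is_selected(case):
    """Same condition as 'jobcat = 1 | jobcat = 2'."""
    return case["jobcat"] == 1 or case["jobcat"] == 2


# Flag each case instead of removing it, so the full data set is kept.
for case in cases:
    case["selected"] = is_selected(case)

selected_ids = [case["id"] for case in cases if case["selected"]]
print(selected_ids)  # [2, 3, 4]
print(len(cases))    # 5, because unselected cases are only filtered, not deleted
```

Keeping a flag instead of dropping rows is what makes the selection reversible, which is the reason to prefer filtering over deleting.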

Univariate Exploratory Data Analysis

SPSS offers a variety of tools for exploratory data analysis. The type of tool you choose depends on the properties of the data (measurement scale) and your goals for the analysis (exploring the entire distribution vs. summarizing the distribution of scores). Below is a decision map that you can use to choose appropriate tools.

Creating a Frequency Table

The Frequencies procedure can be used to describe and summarize one or more categorical variables. By default, it reports the frequencies (counts) and percentages of cases for each category of a variable. In addition, you can request descriptive statistics (measures of central tendency and variability) and graphs (histograms, bar graphs, and pie charts) from this procedure. Note that pie charts are not the best option for displaying your data for academic research purposes! To request the Frequencies procedure, follow Analyze > Descriptive Statistics > Frequencies.

Although histograms and bar graphs can be requested from the Graphs menu in SPSS, the Frequencies procedure is more efficient if you want to obtain these graphs for multiple variables at the same time. You can also get descriptive statistics from the Descriptives option in the Analyze menu. However, this option does not allow you to request the median of a variable. Therefore, if you need to compute the median, use the Frequencies procedure.


Task 7. Explore the distribution of responses of variable ‘educ’ for the entire sample.

1. Select Analyze > Descriptive Statistics > Frequencies.
2. Select the variable ‘educ’ and move it to the Variable(s) window.
3. Click Paste and run the syntax.

Selected SPSS Output

From the output above we can see that most employees in the sample have either 12 or 15 years of education. The majority of the sample (77.0%) has 15 or fewer years of education.
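A frequency table of this kind holds, for each value, a count, a percentage, and a cumulative percentage; the cumulative column is what supports statements such as "15 or fewer years". The Python sketch below computes those three columns for a small hypothetical list of `educ` values. The numbers are invented for illustration and are not the handout's data.

```python
from collections import Counter

# Hypothetical years of education for ten cases.
educ = [12, 15, 12, 16, 8, 12, 15, 19, 12, 15]


def frequency_table(values):
    """Return rows of (value, count, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    table = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        table.append((value, counts[value], 100 * counts[value] / n, 100 * cumulative / n))
    return table


table = frequency_table(educ)
for value, count, percent, cum_percent in table:
    print(f"{value:>5} {count:>5} {percent:>7.1f} {cum_percent:>7.1f}")
```

In this made-up sample the cumulative percentage on the row for 15 is 80.0, which is read the same way as the 77.0% figure in the text above.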


Task 8. Explore the distribution of responses of variable ‘educ’ separately for each gender.

1. Go to Data > Split File. Click the Compare groups button. From the list of variables on the left, select the variable (gender) and move it to the window under Groups Based on.
2. Click OK, or click Paste and then run the syntax.
3. Repeat Task 7.
4. Go to Data > Split File. Click the Analyze all cases, do not create groups button to turn Split File off.

Selected SPSS Output

The frequency table above allows us to compare the distribution of years of education for male and female employees in the sample. As we can see from the table, while the majority of female employees (73.1%) have 12 or fewer years of education, only 32.9% of male employees have this level of education. So, male employees tend to be more educated in this sample.

Running Descriptive Statistics

SPSS offers two procedures, Descriptives and Explore, to describe quantitative data using statistical indices, including measures of central tendency, variability, skewness and kurtosis. In addition, with the Explore procedure, you can explore your data using stem-and-leaf plots and boxplots, and check whether the scores are normally distributed. ...