STAT-3094 - Lecture 7 - Data Step Processing PDF

Title STAT-3094 - Lecture 7 - Data Step Processing
Author M.R. Smith
Course SAS Programming
Institution Virginia Polytechnic Institute and State University

Summary

Lecture number corresponds to the module number for the course.
Professor: EP Smith...


Description

Daniel T. Eisert

STAT-3094

7 – Data Step Processing

Creating a SAS Dataset

There are two phases that occur when SAS runs a DATA step to create a SAS dataset:
- Compilation Phase: each statement in the DATA step is scanned for errors; SAS determines the variables that are written to the SAS dataset.
- Execution Phase: the DATA step reads and processes the input data; the DATA step executes once for each record in the input file unless otherwise directed.

During the compilation phase, an input buffer, an area of memory where raw data from an external file is stored, is created.
- The input buffer is a logical concept rather than a physical storage location, so it cannot be accessed directly.
- The purpose of the input buffer is to hold the raw data one record at a time before it is processed into SAS data.

After the input buffer is made, the program data vector (PDV) is created. The PDV is an area of memory that SAS uses to build the SAS dataset one observation at a time. Like the input buffer, it is logical.
- The PDV sets up the variables that will be read into the SAS dataset.
- SAS also checks for errors here in the compilation phase.
- SAS determines the variables that will be created in the PDV.

Example

In the execution phase, SAS begins to read and process the input data into the program data vector. The DATA step executes once for each record in the raw data file unless otherwise directed.

Compilation Phase:
- SAS scans the submitted SAS code for errors (and not the raw data). If there is a syntax error in the code, you would encounter it here. Some of the errors SAS checks for are missing or misspelled keywords, invalid variable names, missing or invalid punctuation, and invalid options.
- The input buffer and program data vector are created (notice the input statement and the calculation statement in the code).
- The first entries in the PDV are two automatic variables that can be used for processing but are not written to the SAS dataset: _n_ (counts the number of times that the DATA step begins to execute) and _error_ (signals an error caused by the data during execution: 0 = no error, 1 = error).
- After both automatic variables are placed in the PDV, SAS adds the variables that the code creates (see chart).
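The PDV layout left behind by the compilation phase can be pictured with a small Python model. This is only an illustration, not SAS itself: the function names are made up for this sketch, and the variable names item and price are hypothetical (only new_price is mentioned in the notes).

```python
# Toy model of the PDV that exists once compilation finishes.
# Not SAS code; item and price are hypothetical variable names.

AUTOMATIC = ("_N_", "_ERROR_")

def build_pdv(input_vars, computed_vars):
    """Return a PDV with the automatic variables first, then program variables."""
    pdv = {name: None for name in AUTOMATIC}
    for name in list(input_vars) + list(computed_vars):
        pdv[name] = None  # None stands in for a SAS missing value
    return pdv

def output_columns(pdv):
    """Automatic variables live in the PDV but are not written to the dataset."""
    return [name for name in pdv if name not in AUTOMATIC]

pdv = build_pdv(input_vars=["item", "price"], computed_vars=["new_price"])
print(list(pdv))            # variables held in the PDV
print(output_columns(pdv))  # variables that reach the output dataset
```

The second list omits _N_ and _ERROR_, mirroring the point that they are available for processing but are not stored in the dataset.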

Example

Execution Phase:
- SAS begins to process the raw data to create the dataset. The DATA step executes once for each record in the raw data file.
- Starting with the first line in the raw data file, the data is placed into the input buffer (see chart).
- Next, SAS initializes the variables to missing. The _n_ variable starts at 1 and the _error_ variable starts at 0 since there are no errors to begin with (see chart).
- SAS then reads the instructions in the infile statement to figure out how to apply the input statement and read the data into the PDV. SAS now moves the raw data from the input buffer to the PDV. SAS recognizes that a record was read into the PDV. Had SAS reached the end of the file instead, it would exit the DATA step.
- Since there is something in the PDV, SAS executes the other statements in the DATA step (it calculates the new price value in this case). SAS then outputs this information to the SAS dataset.
- SAS then takes the next line of raw data into the input buffer. SAS initializes the variables to missing. The _n_ variable now increments to 2 and the _error_ variable is initialized to zero again. SAS again reads the instructions in the infile statement to figure out how to apply the input statement and read the data into the PDV. SAS now moves the raw data (the 2nd observation, since _n_ = 2) from the input buffer to the PDV. Since there is something in the PDV, SAS executes the other statements in the DATA step (the new_price statement).
- Eventually, there is no more raw data. The _n_ variable now increments to 6 and the _error_ variable is initialized to zero again. SAS is ready to read the instructions in the infile statement to figure out how to apply the input statement, but finds that there is nothing left to read.
- SAS therefore determines it is at the end of the file. There is nothing else to output. The remaining statements are not executed. SAS exits the DATA step.

Key Points
- When is the input buffer not used when creating a SAS dataset?
  o The input buffer is not used if you are creating a new SAS dataset from an existing SAS dataset. Only the PDV is used.
- When is the PDV not used when creating a SAS dataset?
  o The PDV always exists and is used regardless of how you create the SAS dataset.
- Do we need to worry about the information stored in the PDV after the SAS dataset is made?
  o No, the next time we create another SAS dataset, SAS will reset the PDV in the execution phase.
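The execution-phase loop described in these notes can be mimicked with a short Python simulation. It is a sketch under stated assumptions, not SAS code: the five raw records, the variable names item and price, and the 10% markup used for new_price are all hypothetical, since the original example chart is not reproduced here.

```python
# Simulation of DATA step execution: one pass per raw record,
# plus a final pass that detects end of file. Not SAS code.

RAW_LINES = ["apple 2.00", "pear 3.50", "plum 1.20", "kiwi 0.80", "fig 4.00"]  # hypothetical data

def run_data_step(raw_lines):
    dataset = []
    n = 0
    while True:
        n += 1  # _N_ counts how many times the step has begun to execute
        # Variables are reset to missing (None) and _ERROR_ to 0 on each pass.
        pdv = {"_N_": n, "_ERROR_": 0, "item": None, "price": None, "new_price": None}
        if n > len(raw_lines):
            break  # nothing left to read: end of file, exit without output
        input_buffer = raw_lines[n - 1]  # one raw record at a time
        item, price_text = input_buffer.split()
        pdv["item"] = item
        try:
            pdv["price"] = float(price_text)
        except ValueError:
            pdv["_ERROR_"] = 1  # bad data leaves price missing and flags the pass
        if pdv["price"] is not None:
            pdv["new_price"] = round(pdv["price"] * 1.10, 2)  # assumed calculation
        # Output the observation without the automatic variables.
        dataset.append({k: v for k, v in pdv.items() if k not in ("_N_", "_ERROR_")})
    return dataset, n

dataset, final_n = run_data_step(RAW_LINES)
print(len(dataset), "observations written; final _N_ =", final_n)
```

With five records the loop begins a sixth pass, finds nothing to read, and stops, which matches the point in the notes where _n_ reaches 6 before SAS exits the DATA step.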
