UNDERSTANDING THE EXECUTION OF A DATA STEP IN SAS
1. Compilation Phase
This is where SAS prepares to execute your code.
Key tasks:
• Syntax check: SAS checks the code for errors.
• Descriptor portion creation:
o Lists variable names, types (character or numeric), length, and labels.
• Input buffer is created if you're reading raw data (e.g., using INFILE).
• Program data vector (PDV) is created: a temporary memory area where SAS
holds one observation at a time.
2. Execution Phase
This is where SAS processes the data row by row (observation by observation).
What happens:
• SAS reads one row at a time into the PDV.
• Executes DATA step statements (e.g., IF, DO, calculations).
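• At the end of the iteration, the row is written to the output dataset
automatically, unless DELETE or a failed subsetting IF ended the iteration early,
or the step contains an explicit OUTPUT statement (which replaces the automatic
write). DROP and KEEP control which variables are written, not which rows.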
• Then, it loops back for the next observation.
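The loop described above can be sketched with a small self-contained step. The dataset name work.scores, the variable names, and the in-stream data are assumptions made up for illustration. DELETE ends an iteration without writing a row, while DROP only removes a column from the output.

```sas
data work.scores;
    length grade $ 4;
    input name $ score;
    if score < 50 then delete;      /* ends this iteration: no row is written */
    if score >= 80 then grade = 'High';
    else grade = 'Pass';
    drop score;                     /* removes a column, not rows */
    datalines;
Ana 91
Ben 42
Cy 67
;
run;

proc print data=work.scores;
run;
```

With this input, the output dataset should hold two rows (Ana and Cy) and two variables (grade and name). Ben's row is discarded by DELETE, and score is left out by DROP.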
UNDERSTANDING THE COMPILATION PHASE IN SAS
1. Checks for Syntax Errors
SAS reads your code and verifies the syntax. If something is written incorrectly,
SAS writes an error message to the log and does not execute the step.
2. Builds the Input Buffer (for raw data)
If you’re reading from a raw data file (e.g., using an INFILE statement), SAS
creates an input buffer to hold a single record at a time.
3. Creates the Program Data Vector (PDV)
o SAS creates a temporary area in memory to hold one observation (row) at
a time.
o Each variable is assigned space in the PDV.
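o At the start of each iteration, variables created by INPUT or assignment
statements are reset to missing. Variables read with SET, variables named in
RETAIN, and the automatic variables keep their values.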
4. Identifies Variables and Attributes
Based on your code (like INPUT, SET, or LENGTH), SAS figures out:
o Variable names
o Type (character or numeric)
o Length
o Format and informat
5. Determines When to Execute Code
SAS separates declarative statements (e.g., DROP, KEEP, RETAIN, LENGTH), which
take effect during compilation, from executable statements, which run in every
iteration of the execution phase.
6. Prepares the Output Dataset Structure
It defines the structure of the output dataset: the variables it will contain
and their properties.
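A short sketch can make the compile-time work visible. The dataset name work.compile_demo and all variable names are hypothetical. The step relies on two behaviors: a variable is added to the PDV as soon as the compiler sees it, even if the statement that assigns it never runs, and declarative statements such as LENGTH, LABEL, and DROP apply to the whole step.

```sas
data work.compile_demo;
    length city $ 12;                 /* declarative: fixes type and length at compile time */
    input id city $ amount;
    if amount > 1000 then big = 1;    /* never true for these rows, yet BIG is still created */
    label amount = 'Order amount';
    drop id;                          /* declarative: applies to the whole step */
    datalines;
1 Lisbon 50
2 Kyoto 75
;
run;

/* PROC CONTENTS prints the descriptor portion of the output dataset */
proc contents data=work.compile_demo;
run;
```

The PROC CONTENTS listing should show city as a character variable of length 12, and amount and big as numeric variables. id is absent because of DROP, and big is present with missing values in every row.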
UNDERSTANDING THE EXECUTION PHASE IN SAS
After the Compilation Phase sets everything up, the Execution Phase is where SAS
actually runs your code and processes the data. This phase handles each
observation one at a time, applying the logic you've written.
What Happens Step-by-Step:
1. One Observation at a Time is Read
o From a dataset (like with the SET statement) or raw file (INFILE/INPUT).
2. PDV (Program Data Vector) is Populated
o The current observation is loaded into memory (PDV).
3. Executable Code is Run
o Statements like IF, DO, OUTPUT, and assignment statements (calculations)
are applied to that row.
4. Observation is Output
o Unless blocked (e.g., by DELETE, or by an explicit OUTPUT statement that
is not executed in this iteration), the modified observation is added to the
final dataset.
5. Loop Repeats
o SAS goes to the next observation and repeats the steps until all data is
processed.
6. PDV Resets Each Time
o Variables created by INPUT or assignment statements are set back to
missing before the next row is processed. Retained variables and variables
read with SET keep their values.
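One way to watch these steps is to print the PDV at the start and end of every iteration with PUT _ALL_. The sketch below uses made-up names (work.trace, qty, doubled, running_total) and in-stream data so that it is self-contained.

```sas
data work.trace;
    retain running_total 0;                 /* keeps its value across iterations */
    put 'Start of iteration: ' _all_;       /* PDV before the next record is read */
    input name $ qty;
    doubled = qty * 2;
    running_total = running_total + qty;
    put 'End of iteration:   ' _all_;       /* PDV just before the automatic output */
    datalines;
pen 3
ink 5
;
run;
```

In the log, each "Start of iteration" line should show name, qty, and doubled as missing, while running_total carries forward (0, then 3, then 8) and _N_ counts up. A third "Start of iteration" line appears because SAS begins a third iteration and stops only when INPUT finds no more data.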
UNDERSTANDING THE PROGRAM DATA VECTOR (PDV) IN SAS
• Program Data Vector (PDV) in SAS
The Program Data Vector (PDV) is an essential component of the SAS System. It is a
logical area in memory, created during compilation and used during the execution
phase of a DATA step, where SAS builds a data set one observation at a time.
• Purpose of the PDV
The primary purpose of the PDV is to temporarily store data values for each variable
in a DATA step as they are processed. It allows SAS to:
• Execute programming logic (such as IF, DO, and assignment statements) for each
observation.
• Assemble each output observation before writing it to the final data set.
• Contents of the PDV
The PDV includes:
• All variables in the input data set.
• New variables created within the DATA step.
• Automatic variables:
o _N_ – counts the number of times the DATA step has iterated.
o _ERROR_ – indicates whether an error occurred in the current observation
(0 = no error, 1 = error).
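The two automatic variables can be inspected directly, as in the following sketch. The dataset name work.err_demo and the sample records are assumptions. The second record holds a non-numeric value for a numeric variable, which should set _ERROR_ to 1 for that iteration.

```sas
data work.err_demo;
    input id age;
    /* _ERROR_ is 1 when the current iteration hit a data error */
    if _error_ then put 'Invalid data found in iteration ' _n_;
    datalines;
1 34
2 abc
3 29
;
run;
```

SAS still writes all three observations, and age is set to missing in the second one. _N_ and _ERROR_ exist only in the PDV and are not written to the output dataset.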
EXPLAINING THE ROLE OF THE INPUT BUFFER IN SAS
The Input Buffer is a temporary memory area used by SAS only when reading raw
data with the INPUT statement, either from an external file named in an INFILE
statement or from in-stream DATALINES. It holds a single line of raw data as a
character string before SAS processes it into variables.
Purpose of the Input Buffer
• Acts as a staging area to load a line of raw data.
• Allows SAS to scan and read values into variables using INPUT statements.
• Works in conjunction with the Program Data Vector (PDV) to construct each
observation.
When Is It Used?
The Input Buffer is used only when:
• Reading raw data, typically from external files (like .txt or .csv).
• The DATA step uses the INPUT statement, usually together with INFILE.
It is not used when reading from existing SAS data sets (in that case, the SET
statement is used and data goes directly into the PDV).
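The contents of the input buffer can be inspected through the automatic _INFILE_ variable. The sketch below is illustrative only: work.buffer_demo and the sample records are made up, and in-stream DATALINES is used instead of an external file so the step runs as is (in-stream data also goes through the input buffer).

```sas
data work.buffer_demo;
    input name $ age;
    put 'Buffer: ' _infile_;      /* the raw record, still one character string */
    put 'PDV:    ' name= age=;    /* the same record after INPUT parsed it */
    datalines;
Maya 14
Omar 15
;
run;
```

Each pair of log lines shows the same record twice: first as unparsed text in the buffer, then as separate variable values in the PDV.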
Example
data students;
    infile '[Link]';                  /* each record is loaded into the input buffer */
    input name $ age height weight;   /* values move from the buffer into the PDV */
run;
Explanation:
• SAS reads each line from [Link] into the Input Buffer.
• Then, using the INPUT statement, values are read from the buffer and assigned
to variables in the PDV.
• Finally, the observation is written to the output dataset.
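To try the same flow without an existing file, one option is to write a small temporary file first and then read it back. This is a sketch under stated assumptions: the fileref rawfile and the two sample records are hypothetical and do not come from the original example.

```sas
filename rawfile temp;             /* hypothetical fileref for a temporary file */

data _null_;                       /* write two sample records to the file */
    file rawfile;
    put 'Maya 14 152 41';
    put 'Omar 15 160 50';
run;

data students;
    infile rawfile;                /* each record passes through the input buffer */
    input name $ age height weight;
run;

proc print data=students;
run;
```

The second DATA step follows the same path as the example above: record into the input buffer, values into the PDV, observation into the output dataset.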
THANK YOU FOR READING THIS SAS PROGRAMMING RESOURCE
I HOPE THIS DOCUMENT HELPS YOU ON YOUR SAS JOURNEY