SAS Programming Conceptual Doubts
1. INPUT Statement Behavior in DATA Steps
Concept:
When a DATA step contains multiple INPUT statements, each one reads from the input
buffer. Where it starts reading depends on whether a previous statement held the
line with a line-hold specifier (@).
a. Workflow
Single use:
input x y;
Reads both variables from the current input record (row).
Multiple uses (no hold):
input x;
input y;
Each new input statement moves to the next raw data line, so x and y are read
from separate lines.
b. Hold Specifiers
Adding @ at the end holds the current line for more input:
input x @;
input y;
Both x and y read from the same line.
Diagram
Raw data (two lines):
5
10
input x; x=5 (line 1)
input y; y=10 (line 2)
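The two cases above can be sketched with in-stream data. This is a minimal illustration; the dataset names two_lines and one_line are made up for the example.

```sas
/* No hold: each INPUT statement reads a new raw line */
data two_lines;
  input x;      /* reads 5 from line 1 */
  input y;      /* reads 10 from line 2 */
  datalines;
5
10
;
run;

/* Trailing @ holds the line, so both INPUT statements read the same record */
data one_line;
  input x @;    /* reads 5 and holds the line */
  input y;      /* reads 10 from the same line */
  datalines;
5 10
;
run;
```

Both steps produce one observation with x=5 and y=10; only the layout of the raw data differs.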
2. Implicit Data Conversion in SAS
Concept:
SAS automatically converts data types in certain cases (called implicit conversions):
Character-to-numeric:
When you use a character variable in a numeric context.
data test;
charval = "123";
num = charval + 1; /* SAS converts charval to 123 */
run;
Numeric-to-character:
When you use a numeric variable in a character context.
data test;
numval = 456;
charval = numval || 'A'; /* implicit: SAS converts numval with the BEST12. format */
/* put(numval, 8.) would convert explicitly and avoid the log note */
run;
Warning:
SAS writes a NOTE to the log, such as “Numeric values have been converted to
character values...”, whenever an implicit conversion occurs.
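To avoid the conversion notes, the conversions can be written explicitly with the INPUT and PUT functions. A minimal sketch, where the dataset and variable names are chosen only for illustration:

```sas
data convert;
  charval = "123";
  numval  = 456;
  num_from_char = input(charval, 8.);  /* character -> numeric, explicit */
  char_from_num = put(numval, 8.);     /* numeric -> character, explicit */
run;
```

Because both conversions are explicit, the log should show no "have been converted" notes for this step.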
3. Difference between .sas and .sas7bdat
.sas:
o Type: Text program file.
o Use: Contains SAS code, data steps, procedures, macros, etc.
o Editable: Yes, in any text editor.
.sas7bdat:
o Type: Binary data file.
o Use: Stores SAS datasets (tables).
o Editable: Not in a text editor; it is opened through SAS (or other software
that understands the format).
Extension   File Type     Use           Editable?
.sas        SAS Program   Code/Script   Yes
.sas7bdat   SAS Dataset   Data Tables   No
4. Controlled Terminology vs. Codelist
Aspect          Controlled Terminology                  Codelist
Definition      A standard set of terms defined by an   A finite list of allowable values, which may
                authority (e.g., CDISC, MedDRA)         or may not be controlled standards
Source          External standards (MedDRA, CDISC,      Study-specific, sponsor-defined, or external
                SNOMED)                                 source
Change Control  Strict (managed by organizations)       Variable, may be flexible
Example         Gender: M=Male, F=Female (CDISC)        Status: 1=Active, 2=Inactive
5. WHERE Statement Restrictions Inside DATA Steps
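The WHERE statement applies only to observations read from existing SAS datasets
by SET, MERGE, UPDATE, or MODIFY. SAS evaluates it before an observation enters the
DATA step, so it can reference only variables that already exist in the input dataset.
data new;
set old;
where x=1; /* Valid */
/* ... */
run;
The position of the WHERE statement inside the step does not matter. It cannot
reference variables created in the step or automatic variables such as _N_, and a
WHERE statement in a step with no SET, MERGE, UPDATE, or MODIFY causes an error.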
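A short sketch of how WHERE and a subsetting IF can share one step. It assumes the sample dataset sashelp.class that ships with SAS; the dataset name subset and the variable ratio are made up for the example.

```sas
data subset;
  set sashelp.class;
  where age >= 13;          /* filters on a variable that exists in the input dataset */
  ratio = weight / height;  /* new variable created in this step */
  if ratio > 1.5;           /* a new variable must be filtered with a subsetting IF */
run;
```

Writing where ratio > 1.5; instead would fail, because ratio does not exist in sashelp.class.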
6. No Statements Allowed After DATALINES
When using datalines; (or cards;), it must be the last statement in the DATA step:
every statement that belongs to the step has to appear before it.
The data block ends at the first line that contains a semicolon, conventionally a
line holding only a semicolon.
data test;
input x y;
datalines;
1 2
3 4
; /* NO code between this and the run; */
run;
7. WHERE vs. IF After INFILE/INPUT
When reading raw/external data (with infile + input), WHERE statements cannot be
used to filter observations. Use IF instead.
The WHERE statement is only valid when reading from an existing SAS dataset
(e.g., SET statement).
Example:
data filter1;
infile '[Link]';
input id age;
if age > 20; /* Valid */
run;
data filter2;
set data_existing;
where age > 20; /* Valid */
run;
8. FIND Function Case Sensitivity
The find() function in SAS is case-sensitive by default.
o find(string, 'XX') will not match 'xx' in the string.
To ignore case, pass the 'i' modifier as the third argument (findc() accepts an
'i' modifier as well).
o e.g., find(string, 'xx', 'i') ignores case.
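A minimal sketch of the difference; the sample string and variable names are made up for illustration.

```sas
data _null_;
  string = 'The XX marker';
  pos_exact  = find(string, 'xx');       /* 0: case-sensitive search finds nothing */
  pos_nocase = find(string, 'xx', 'i');  /* 5: 'XX' starts at position 5 */
  put pos_exact= pos_nocase=;
run;
```

FIND returns the starting position of the first match, or 0 when there is none.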
9. Nesting DATA Steps in SAS
Standard Practice:
SAS does not allow one DATA step to be placed inside another (no true nesting).
Implementation:
You can have multiple DATA steps sequentially, but not nested:
data a;
/* code */
run;
data b;
/* code */
run;
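If attempted, SAS does not nest the steps: a second DATA statement that appears
before run; acts as a step boundary, so SAS ends the first step there and starts a
new one.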
10. LEAVE vs. BREAK Statements
Statement  Behavior                                          Where Used
LEAVE      Immediately exits current loop (DO or DO          DATA step loops
           WHILE/UNTIL)
BREAK      Used with PROC REPORT for breaking on variable    PROC REPORT only
           value
LEAVE is similar to “break” in other languages: it stops the nearest enclosing loop
and continues afterwards.
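A minimal sketch of LEAVE inside an iterative DO loop; the loop bounds are arbitrary.

```sas
data _null_;
  do i = 1 to 10;
    if i = 4 then leave;  /* exit the loop as soon as i reaches 4 */
  end;
  put i=;                 /* writes i=4 to the log */
run;
```

Execution resumes at the statement after the END of the loop that was left.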
11. Default Length of Character Variables
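By default:
SAS assigns a character variable a length of 8 bytes when nothing else determines
the length, for example when it is read with list input (input name $;).
Exception:
When a character variable is first created from a character function, the function
sets the default length. Several functions, such as TRANWRD() and REPEAT(), return
200 bytes. The CHAR() function returns a single character, so its target variable
defaults to 1 byte.
Variable Creation                                    Default Length
char variable (list input, no explicit length)       8 bytes
result of TRANWRD() or REPEAT()                      200 bytes
result of CHAR()                                     1 byte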
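A minimal sketch of the 8-byte default with list input; the dataset name names and the sample value are made up for illustration.

```sas
data names;
  input name $;          /* no length or informat width given: name gets 8 bytes */
  len = vlength(name);   /* 8 */
  datalines;
Christopher
;
run;
```

The stored value is 'Christop': the last three characters are lost. Placing length name $ 20; before the INPUT statement would avoid the truncation.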
12. Default SAS Date Format
SAS stores dates as numbers (days since 1960-01-01).
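SAS does not apply a date format automatically: an unformatted date prints as the
raw number (0 for 1 January 1960). A commonly used format is DATE9., which displays
that value as 01JAN1960.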
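A minimal sketch that writes the same date value to the log with and without a format:

```sas
data _null_;
  d = '01JAN1960'd;   /* date literal, stored as the number 0 */
  put d=;             /* d=0 */
  put d= date9.;      /* d=01JAN1960 */
run;
```

A FORMAT statement such as format d date9.; attaches the format to the variable so that procedures display it the same way.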
13. Character Variables: Length by Value
If you create a character variable by assigning a value, without a LENGTH statement,
the length of the variable is set by the first assignment SAS encounters.
data test;
x = 'ABCD'; /* x has length 4 */
run;
If a shorter value is assigned later, it is padded with blanks to that length; a
longer value is truncated.
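A minimal sketch of the truncation; the second assignment is added only for illustration.

```sas
data test;
  x = 'ABCD';         /* first assignment fixes the length of x at 4 */
  x = 'ABCDEFG';      /* longer value is truncated to 'ABCD' */
  len = vlength(x);   /* 4 */
  put x= len=;
run;
```

A LENGTH statement placed before the first assignment (for example, length x $ 10;) reserves enough room for the longer value.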
Effect of the DROP Statement on SAS Dataset Variables
Dataset, [Link]:
Name   Gender  Age
Alice  F       23
Bob    M       30
I want to create two datasets:
males: containing only male students (without the Gender variable).
females: containing only female students (without the Gender and Age variables).
SAS Code Example
data males females(drop=age);
set [Link];
drop gender;
if gender = 'M' then output males;
else if gender = 'F' then output females;
run;
What Happens Step by Step
SAS reads each observation from [Link].
It uses the gender variable in the if condition to decide whether to write the record
into males or females.
The drop gender; statement removes the gender variable from both output datasets.
The (drop=age) option removes age only from the females dataset.
All variables are available during processing (so you can use gender in logic),
but they are dropped only when written to the output datasets.
Resulting Datasets
Dataset  Variables Included  Observations
males    Name, Age           Bob (gender = M)
females  Name                Alice (gender = F)
Key Points to Remember
Variables listed in the DROP statement can still be used in DATA step code and
logic.
Variables specified for dropping do not appear in the final output datasets.
Use the DROP statement (and the (drop=...) dataset option) to keep only the
variables you want in your output.
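The same result can be sketched with the (keep=...) dataset option, which lists the variables to retain instead of the ones to remove. This is an alternative, not the approach used above, and the input dataset name students is hypothetical.

```sas
data males(keep=name age) females(keep=name);
  set students;   /* hypothetical input dataset with name, gender, age */
  if gender = 'M' then output males;
  else if gender = 'F' then output females;
run;
```

As with DROP, gender remains available to the IF logic because the option takes effect only when observations are written to the output datasets.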