SPSS Syntax StatLab Advanced SPSS
SPSS Syntax
This section provides an introduction and sampling of basic syntax in SPSS. All illustrations and details are
intended to apply to SPSS v19 for Windows (and most Macs). Most of the information will be the same for other
versions, but there may be discrepancies.
Syntax refers to the command language SPSS uses to complete analyses. While the most commonly used commands
are available through the SPSS menu system (point & click), many more options and functions are available
using syntax.
• Using syntax can save a great deal of time when running repetitive analyses.
• It lets you instantly repeat that work on a new (or updated) data set.
• It lets you 'tweak' your analysis in ways not available through dialog boxes.
If you'd like to follow along, please open the sample data file 'satisf.sav'.
Creating and Saving a Syntax File: Syntax files in SPSS are plain text files with an extension of '.sps'.
1) Double-click the data file and it will open in the Data Editor (Variable View).
3) To save the file, choose File > Save (a dialog box will pop up; name the file and choose a location).
You can create syntax in several ways.
Probably the easiest way to start using syntax is by using the 'paste' button available in most dialog
boxes. It is seen below in the Descriptives dialog box.
First select the options you wish from the dialog box, then, instead of clicking 'OK', click 'Paste'. If you do
not have a syntax window open already, this will open a new syntax window containing the commands you
selected in the dialog box as seen below.
• If you already have a syntax window open, the commands will be pasted at the bottom of the currently
active syntax window. The Syntax Editor allows you to edit a plain text file and submit selected commands
to SPSS directly. Hit the green arrow (run command) and you will see on your output viewer the
descriptive statistics for these data.
• Next, let’s say that we want every variable in the data set.
• Likewise, if you would like only the mean and standard deviation, you can delete MIN and MAX
from the /STATISTICS= subcommand.
• You can add notes (using *), cut, paste and edit just as with any other text file.
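As a rough sketch of these edits, assuming the pasted command was a DESCRIPTIVES run (the variable names 'price' and 'service' are borrowed from later in this handout and may differ in your file), the pasted syntax and two edited versions could look like this:

```spss
* Pasted from the Descriptives dialog box.
DESCRIPTIVES VARIABLES=price service
  /STATISTICS=MEAN STDDEV MIN MAX.

* Edited to use every variable in the data set.
DESCRIPTIVES VARIABLES=ALL
  /STATISTICS=MEAN STDDEV MIN MAX.

* Edited to report only the mean and standard deviation.
DESCRIPTIVES VARIABLES=ALL
  /STATISTICS=MEAN STDDEV.
```

The keyword ALL stands in for the full variable list, so the command keeps working if variables are added to the file later.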
FREQUENCIES
VARIABLES=gender regular
/ORDER= ANALYSIS.
In the syntax window there is a very useful toolbar button called 'Syntax Help'. It is context
sensitive, meaning that it will display a syntax chart for the command where the cursor is
currently located. Clicking on the 'Syntax Help' button provides the following information
about the FREQUENCIES command.
Don't let the long list intimidate you. Many people are surprised to learn that FREQUENCIES
has so many subcommands and specifications!
Subcommands and specifications in square brackets ([ ]) are optional, and those in braces ({
}) indicate a choice between elements. Look closely, there are only two words that are NOT
in brackets, FREQUENCIES and 'varlist'. This means that you can run frequencies on the two
variables 'sex' and 'race' by typing the following into a syntax editor window:
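A minimal version of that command, using the two variable names mentioned above (substitute names from your own file, such as 'gender' and 'regular'), might be:

```spss
FREQUENCIES sex race.
```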
There are many abbreviations allowed in syntax (generally the first 3 or 4 letters of a
command/specification will suffice.) So the following syntax will provide the same output:
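For instance, an abbreviated form of a frequencies run on the same two variables could be written as follows (a sketch; the variable names are the ones used in the text above):

```spss
FREQ VAR=sex race.
```

Abbreviations save typing, but spelled-out commands are usually easier to read when you return to a syntax file months later.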
Don't forget the period at the end of the command. The 'Syntax Help' button provides the
skeleton of the command you are using, but does not provide a detailed explanation of each
possible subcommand and/or specification. That information is found in the Command
Syntax Reference (use the menus: Help >Command Syntax Reference).
Now let's add a little more to the syntax. The following will provide descriptive statistics and
bar charts for our two variables:
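One way such a command could be written is sketched here. The choice of statistics is an assumption; pick whichever keywords suit your variables.

```spss
FREQUENCIES VARIABLES=sex race
  /STATISTICS=MEAN STDDEV MINIMUM MAXIMUM
  /BARCHART.
```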
If you want to add documentation to your syntax file, indicate the start of a comment with an
asterisk (*). Everything between that asterisk and the next period will be ignored by SPSS.
Remember not to add other periods in your documentation if you use this method, since
SPSS will try to interpret everything after the period as commands. Another method is to use
/* and */ to set off a comment. That is, start with /*, insert your comment of as many or as
few words and lines as you want then end the comment with */. Remember not to include
any periods in the comment using this technique either.
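Both comment styles are sketched below on the FREQUENCIES command used earlier. The comment wording is only illustrative, and the /* */ comment is kept on a single line to be safe.

```spss
* Frequencies for the two grouping variables, run before any recoding.
FREQUENCIES VARIABLES=gender regular   /* counts and percentages only */
  /ORDER=ANALYSIS.
```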
• Creating a well documented data file can be quite tedious. Syntax statements can make the process a little
less painful. The MISSING VALUES command declares values for variables as user-missing. User-missing
values are then treated the same as the system-missing values. (That is, they are usually ignored.) Multiple
missing values are separated by commas, and a range of missing values may be declared using the
keywords LO, LOWEST, HI, HIGHEST, and THRU. If the variable(s) are strings, enclose the missing values in
single quotes. Still using the 'demo' file from above, the following command sets values of the variable
'age' of 99 or higher and values of 'region' equal to 999 to user-missing. Don't forget the
period at the end.
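A command matching that description could look like this (a sketch using the 'age' and 'region' variables named above):

```spss
MISSING VALUES age (99 THRU HI) region (999).
```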
• To cancel previously declared missing values, simply reassign the missing values to blank (use () in the
previous statement.) To remove all missing values settings at once, use the following code:
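As a sketch, clearing missing-value declarations might be written in either of these two ways. The first form names specific variables; the second uses the keyword ALL to cover every variable in the file.

```spss
* Cancel the user-missing values for two specific variables.
MISSING VALUES age region ().

* Cancel the user-missing values for every variable in the file.
MISSING VALUES ALL ().
```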
Variable and Value Labels
• VARIABLE LABELS assigns descriptive labels to variables in the data file. You can assign a label to one
variable or to a long list at the same time. The following syntax assigns labels to two variables in the active
file. NOTE: Each variable label can be up to 120 characters long, although most procedures print
fewer than 120 characters. All statistical procedures display at least 40 characters.
• In general, syntax will ignore spaces and lines within commands and subcommands. It is often easier to
read a syntax file if you add spaces and start new lines to create columns, as below.
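A sketch of a VARIABLE LABELS command laid out in columns follows. The variable names come from the COMPUTE example at the end of this handout; pairing 'numitems' with the label 'Variety satisfaction' is an assumption.

```spss
VARIABLE LABELS
  price     'Price satisfaction'
  numitems  'Variety satisfaction'.
```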
• The VALUE LABELS command assigns descriptive labels to values of variables in the data file. Many people
confuse variable and value labels when they are new to them. Variable labels describe the variables and
value labels allow you to assign descriptions to particular values of a variable. In the 'demo' file, ‘contact’ is
either 0 or 1. Value labels help you to remember whether 0 means contact or no contact.
VALUE LABELS
contact 0 'no' 1 'yes'.
• NOTE: The VALUE LABELS command deletes all existing value labels for the specified variable(s) and
assigns new value labels. The ADD VALUE LABELS command can be used to add new labels or to alter labels
for specified values without deleting existing labels.
• To create value labels for additional variables, type a slash (/) after the last value label of the
previous variable, then list the next variable followed by its values and their labels in single quotes.
Remember to put a period only at the very end of the command.
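Labeling two variables in one command, and then adding a label without wiping the existing ones, might look like the sketch below. The coding of 'gender' (0 = male, 1 = female) and the extra code 9 are assumptions made for illustration.

```spss
VALUE LABELS
  contact 0 'no' 1 'yes'
  /gender 0 'male' 1 'female'.

* Add one more label without deleting the existing ones.
ADD VALUE LABELS contact 9 'not recorded'.
```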
Data Management: COMPUTE, RECODE, SPLIT FILE, and FILTER
• In this section we will go over creating new variables (COMPUTE), recoding the values of existing variables
(RECODE), running the same analysis on subgroups (SPLIT FILE) and using filters to select subsections of
your data (FILTER).
o The COMPUTE command creates new numeric variables or modifies the values of existing string or
numeric variables. You may be familiar with the dialog box, accessed by clicking Transform
>Compute
o However, it is often more efficient to write COMPUTE statements in syntax instead of 'pointing and clicking'
your way to a new variable. While the dialog box is especially useful for functions you are not familiar
with, I find it faster to code common formulas directly in syntax. The examples below illustrate why
computing with syntax might save you some time. Note the differences between the 4 different
computations of Satisfaction.
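Four ways of computing an average satisfaction score are sketched here. The target names Satisf1 to Satisf4 are hypothetical, and the five item names are taken from the final COMPUTE example in this handout. The versions differ mainly in how they treat missing item responses.

```spss
* Method 1: plain arithmetic, so the result is missing if any item is missing.
COMPUTE Satisf1 = (price + numitems + service + quality + overall) / 5.
* Method 2: SUM skips missing items but the total is still divided by 5.
COMPUTE Satisf2 = SUM(price, numitems, service, quality, overall) / 5.
* Method 3: MEAN averages over whichever items are valid.
COMPUTE Satisf3 = MEAN(price, numitems, service, quality, overall).
* Method 4: the numeric suffix requires at least 4 valid items.
COMPUTE Satisf4 = MEAN.4(price, numitems, service, quality, overall).
EXECUTE.
```

For a respondent who answered all five items, the four versions give the same value; they diverge only when some items are missing.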
Note: Syntax will not warn you if you are about to overwrite an existing variable. But you will have a record of
what you have done.
Notice the EXECUTE statement at the end. SPSS requires this at the end of COMPUTE commands, unless a
procedural command follows it (e.g. FREQUENCIES or another statistical analysis). If you forget and run
the syntax without it, the Data Editor window will display "Transformations Pending" in the bottom
display area. Either add the EXECUTE statement to the syntax, highlight it and run it, or go to the menus
and click Transform >Run Pending Transforms to complete the command. EXECUTE can be
shortened to EXE.
o The RECODE command allows you to change the values of a variable. For example, it is sometimes useful to
reverse code responses to a survey (change the highest response to the lowest and vice versa.) You may
also need to collapse categories for an analysis. In general, it is safer to recode into a new variable,
rather than change an existing one, so I will not address that option here.
Here is an example from our data set. We would like to change distance from store into a binary
variable where people have traveled either more or less than 10 miles to shop:
Notice, at this time it makes sense to apply desired variable and value labels.
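A sketch of such a recode is shown below, with labels applied right away. It assumes 'distance' is recorded in miles; if it is stored as coded categories, the cut point will differ. The new variable name DistBin and the label text are hypothetical. Because the specifications are applied left to right, a value of exactly 10 falls into the first group.

```spss
* Assumes distance is recorded in miles.
RECODE distance
  (MISSING = SYSMIS) (LO THRU 10 = 0) (10 THRU HI = 1)
  INTO DistBin.
VARIABLE LABELS DistBin 'Traveled more than 10 miles'.
VALUE LABELS DistBin 0 '10 miles or less' 1 'more than 10 miles'.
EXECUTE.
```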
o The RECODE command needs a list of variables to act upon (yes, you can recode many variables at once
by listing them after the RECODE statement). The variable(s) are followed by a list of recodes each
enclosed in parentheses. Here is an example, splitting the satisfaction variables into low and high
groups.
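A multi-variable recode of this kind might look like the following sketch. It assumes the satisfaction items are scored 1 to 5; the cut point between 'low' and 'high' and the new variable names are illustrative only.

```spss
RECODE price numitems service quality overall
  (MISSING = SYSMIS) (1 THRU 3 = 1) (4 THRU 5 = 2)
  INTO price2 numitems2 service2 quality2 overall2.
EXECUTE.
```

The INTO list must name one new variable for each variable being recoded, in the same order.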
o The keywords MISSING and SYSMIS both refer to missing values. MISSING includes both system-
missing values (no value entered in the data set) and user-missing (values entered but set to missing by
the user.) SYSMIS refers only to system-missing values. Since I included MISSING, I did not need to
include SYSMIS.
o Remember that you can get reference information by clicking on the 'Syntax Help' button in the Syntax
Editor, or by looking in the Command Reference.
o Another great feature that is not available through the recode dialogue boxes is the ability to assign
values conditionally with IF statements. Let’s say that we want to create a variable that tells us if the customer
who answered the survey was an older woman or not, because we are curious if the older female
customers were getting the same attention as the other shoppers by the sales associates. With syntax we
can create a variable with two (or more) conditional statements. Here is what that would look like.
COMPUTE FemOlder = 0.
EXECUTE.
EXECUTE.
Now we have a variable that we can cross with Contact to check our ideas.
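A complete version of this idea is sketched below. The variable 'age' and the cutoff of 50 are hypothetical, since this handout does not show how age is stored in the file; gender = 1 is taken to mean female, as in the SELECT IF example later on. Cases with a missing age keep the starting value of 0.

```spss
COMPUTE FemOlder = 0.
IF (gender = 1 AND age >= 50) FemOlder = 1.
EXECUTE.

CROSSTABS /TABLES=FemOlder BY contact.
```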
o If you are interested in running the same analysis on a set of subgroups in your data you can use SPLIT
FILE to accomplish this. Using the survey data, let's see if the distribution of Satisfaction is different for
customers who did and did not have contact with an employee. We'll use SPLIT FILE and FREQUENCIES to get the output we're looking for. SPLIT
FILE requires that the data be sorted by the variable(s) we want to split by. The SORT CASES command
will sort by the variables listed after the command. The default is to sort ascending. If you want to sort
descending, add (D) after that variable.
SORT CASES BY contact.
SPLIT FILE LAYERED BY contact.
FREQUENCIES Satisfaction
/HISTOGRAM.
o By default, SPLIT FILE will produce output with the values of the split variable as the outermost
column entries of a table (the LAYERED BY option.) If you want split file groups to display in separate
tables, use SEPARATE BY instead. It is a good practice to 'turn off' splits as soon as you complete the
analysis. SPLIT FILE will be overridden by a later SPLIT FILE command. Until SPLIT FILE OFF is
entered, all analyses will be carried out on a split file.
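The options described above can be combined as in this sketch: a descending sort, separate tables for each group, and the split turned off as soon as the analysis is done.

```spss
SORT CASES BY contact (D).
SPLIT FILE SEPARATE BY contact.
FREQUENCIES VARIABLES=Satisfaction
  /HISTOGRAM.
SPLIT FILE OFF.
```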
o Filter variables (also known as indicator variables and dummy variables) are used to identify cases that
meet the criteria you have specified. For example, let's create a variable to identify all the women in the
data set who have had contact with a sales employee and live within 10 miles of the store that they visited. To
accomplish this enter:
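USE ALL.
COMPUTE filter_$=(gender=1 & contact=1 & distance >2).
VARIABLE LABELS filter_$ 'gender=1 & contact=1 & distance >2 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.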
o The USE ALL command is added by the dialogue box to ensure that no other filters are active when
creating the new variable.
o The COMPUTE command creates a new variable 'filter_$'. This is the default SPSS name for filters. If you
used the dialogue boxes to create another filter, it would also be named 'filter_$' which would overwrite
the previous filter. You could, of course, rename the filter after using the dialogue boxes, but it is
surprisingly easy to forget. Using syntax, you can name the variable something more useful as you
create it.
o The next three lines of syntax (VARIABLE LABEL, VALUE LABEL, and FORMAT) are not absolutely
necessary but are very helpful for documentation sake, not to mention ease of use of the data set.
Remember, if you change the name of the filter variable, you need to change 'filter_$' to the new name
in each of the next four lines as well.
o The next line of the syntax applies the filter to your data. FILTER BY filters out all cases where the filter
variable is 0 or missing. In other words, applying a filter selects only those cases where the filter variable
has a valid, nonzero value (1 in this example).
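The same filter written directly in syntax with a more descriptive name is sketched below. The name FemContact and the label text are hypothetical; the selection rule is the one used above. FILTER OFF restores all cases once the analysis has run.

```spss
USE ALL.
COMPUTE FemContact = (gender = 1 & contact = 1 & distance > 2).
VARIABLE LABELS FemContact 'Women with employee contact, distance > 2'.
VALUE LABELS FemContact 0 'Not Selected' 1 'Selected'.
FORMATS FemContact (F1.0).
FILTER BY FemContact.
FREQUENCIES VARIABLES=overall.
FILTER OFF.
```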
o The SELECT IF statement is another way to select subsets of cases. Instead of using a filter variable, the
logical expression (or formula) to select cases is specified in the command. The major difference is that
this command permanently deletes non-selected cases. So, SELECT IF can be used if you need to
create a new data set that is a subset of existing data. For example, the following syntax will provide a
data set that includes only females:
SELECT IF (gender = 1).
o SELECT IF will evaluate the expression you enter between the parentheses as either True, False, or
Missing; all the False and Missing cases are dropped from the active data set.
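Because the deletion is permanent, one cautious pattern is to save the subset under a new file name straight away so the original file is never overwritten. A sketch, with a hypothetical output file name:

```spss
SELECT IF (gender = 1).
SAVE OUTFILE='satisf_females.sav'.
```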
o Using the TEMPORARY command will allow you to return to the original data set once a command that
reads the data is run. That is, temporary transformations only apply until the next command that reads
data and are no longer in effect once that command has run. So, the following syntax will temporarily
filter out males, and then run DESCRIPTIVES for the variables 'distance' and 'overall'. Since the DESCRIPTIVES
command reads the data, it turns off the TEMPORARY command. Repeating the same DESCRIPTIVES
command will then act upon the entire data set, not just females.
TEMPORARY.
SELECT IF (gender=1).
DESCRIPTIVES distance overall.
o Which is better: SELECT IF or creating a filter? That depends on how you prefer to work in SPSS. If
you are writing a syntax file to document an analysis, it may be easier to follow if you use SELECT IF,
particularly if you need to run only one command on a subset of data. If you need to run multiple
commands on a subset of data, it may be easier to use filters to subset your data, depending upon how
you need to split the data. If you are in the midst of an analysis and are switching back and forth from
syntax to point & click, it is useful to have permanent filter variables instead of re-creating them each
time you want to use them.
o Syntax can also be helpful in keeping a good record of the preparation that has gone into a data set
BEFORE hypothesis testing begins. For example let’s consider our data set. Let’s say that we want to
test out customer satisfaction. We have 6 different variables regarding customer satisfaction. The ideal
is to test these items for normality, to confirm a single latent factor in these items (of satisfaction), and
to confirm this idea with reliability analysis. Syntax can serve as a good record of due diligence in data
preparation.
To run frequencies:
/HISTOGRAM NORMAL
/ORDER=ANALYSIS.
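A complete FREQUENCIES command of this kind might look like the sketch below. It lists the five satisfaction variables whose names appear in the COMPUTE example at the end of this handout; the variable name for the Organization item is not shown in this handout, so add it to the list in your own run.

```spss
FREQUENCIES VARIABLES=price numitems service quality overall
  /HISTOGRAM NORMAL
  /ORDER=ANALYSIS.
```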
We can use the dialogue box under Analyze > Dimension Reduction > Factor
Next Click on Extraction. In this case, we will allow for 3 factors to emerge. Under Extract click
Fixed number of factors and then specify 3 in the dialogue box.
Press Continue. Once back to the main dialogue box, click PASTE.
FACTOR
/MISSING LISTWISE
/EXTRACTION PC
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/METHOD=CORRELATION.
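For reference, a full pasted FACTOR command with a fixed number of 3 factors typically has the shape sketched below. The variable list is an assumption (the five satisfaction variables named at the end of this handout; add the Organization item's variable name in your own run), and the /PRINT keywords reflect common dialog defaults rather than settings stated here.

```spss
FACTOR
  /VARIABLES price numitems service quality overall
  /MISSING LISTWISE
  /ANALYSIS price numitems service quality overall
  /PRINT INITIAL EXTRACTION ROTATION
  /CRITERIA FACTORS(3) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /METHOD=CORRELATION.
```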
The Output :
To run reliability:
Go to Analyze > Scale > Reliability Analysis. You will see a dialogue box such as this:
Press the Statistics Button and check off item, scale, scale if item deleted, means, variances
and then hit continue.
Once back to the main dialogue box, enter the variables of interest
RELIABILITY
/MODEL=ALPHA
/STATISTICS=DESCRIPTIVE SCALE
Let’s run it and make some decisions about our measure of satisfaction. The analysis tells us that although
Item quality satisfaction is not really hurting us, Organization satisfaction actually is.
Item-Total Statistics
                            Scale Mean   Scale Variance   Corrected     Squared       Cronbach's
                            if Item      if Item          Item-Total    Multiple      Alpha if Item
Item                        Deleted      Deleted          Correlation   Correlation   Deleted
Price satisfaction          15.57        23.133           .731          .589          .772
Variety satisfaction        15.55        23.404           .701          .595          .779
Organization satisfaction   15.38        27.817           .267          .096          .867
Service satisfaction        15.50        23.235           .675          .489          .783
Item quality satisfaction   15.48        23.422           .612          .395          .797
Overall satisfaction        15.52        24.100           .653          .453          .789
Here is one of the beauties of syntax. Now we simply make a note about what we see in our syntax (i.e. *alpha at
.828, but shows that measure could be improved by the removal of Organization), copy and paste the first
reliability command, remove the Organization satisfaction variable and run it again. Now we have
alpha at .867 and the following output:
Item-Total Statistics
                            Scale Mean   Scale Variance   Corrected     Squared       Cronbach's
                            if Item      if Item          Item-Total    Multiple      Alpha if Item
Item                        Deleted      Deleted          Correlation   Correlation   Deleted
Price satisfaction          12.35        18.132           .748          .584          .825
Variety satisfaction        12.33        18.072           .750          .590          .824
Service satisfaction        12.28        18.290           .682          .483          .841
Item quality satisfaction   12.26        18.460           .615          .388          .859
Overall satisfaction        12.30        19.016           .665          .452          .845
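The second, five-item reliability run described above could be written as in this sketch. The variable names come from the COMPUTE example that follows; the scale name 'Satisfaction' and the /SUMMARY keywords are assumptions based on the dialog box choices listed earlier.

```spss
* Six-item alpha suggested dropping the Organization item, so rerun with five items.
RELIABILITY
  /VARIABLES=price numitems service quality overall
  /SCALE('Satisfaction') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE
  /SUMMARY=TOTAL MEANS VARIANCE.
```

Keeping both runs in the syntax file, each with its own comment, documents why the item was dropped.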
Creating our new variable.
o Now we know how to do this! Let’s compute our new variable of satisfaction.
COMPUTE Satisfaction=SUM(price,numitems,service,quality,overall)/
NVALID (price,numitems,service,quality,overall).
EXECUTE.
These procedures can be modified and used whenever data preparation is necessary BEFORE that
all-important analysis is done. It is crucial to understand the distributions and the nature of your
measures before you begin hypothesis testing. Syntax is a very useful way to keep a record of your
work, both for yourself and for others.