Intermediate SPSS

StatLab Workshop February 21, 2013

Sherlock Campbell & Oriana Aragón

SPSS Syntax
This section provides an introduction and sampling of basic syntax in SPSS. All illustrations and details are
intended to apply to SPSS v19 for Windows (and most Macs). Most of the information will be the same for other
versions, but there may be discrepancies.

Syntax refers to the computer language SPSS uses to complete analyses. While most commonly used commands
are available through the menu system of SPSS (point & click), many more options and functions are available
using syntax.

Why Use Syntax?

Using syntax can save a great deal of time when running repetitive analyses.

It is also an easy way to document your work.

It allows you to instantly duplicate that work with a new (or updated) data set.

It allows you to 'tweak' your analysis in ways not available through dialog boxes.

If you'd like to follow along, please open the sample data file 'satisf.sav'.

It should be located in C:\Program Files\IBM\SPSS\Statistics\19\Samples\English\satisf.sav.

Creating and Saving a Syntax File: Syntax files in SPSS are plain text files with an extension of '.sps'.

1) Double-click the data file; it will open in the Data Editor (Variable View).

2) Next, open a new syntax file:

File > New > Syntax

3) To save the file, choose File > Save (a dialogue box will pop up; name the file and choose a location).

Archiving Procedure and Describing Data

You can create syntax in several ways.
Probably the easiest way to start using syntax is by using the 'Paste' button available in most dialog
boxes. It is seen below in the Descriptives dialog box.

First select the options you wish from the dialog box, then, instead of clicking 'OK', click 'Paste'. If you do
not have a syntax window open already, this will open a new syntax window containing the commands you
selected in the dialog box as seen below.

• If you already have a syntax window open, the commands will be pasted at the bottom of the currently
active syntax window. The Syntax Editor allows you to edit a plain text file and submit selected commands
to SPSS directly. Hit the green arrow (run command) and you will see on your output viewer the
descriptive statistics for these data.

• Next, let’s say that we want every variable in the data set.

• Likewise, if you would only like the mean and standard deviation, you can delete MIN and MAX
from the /STATISTICS subcommand.

• You can add notes (using *), cut, paste and edit just as with any other text file.
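
As a minimal sketch of the two edits just described (the syntax pasted from your own dialog box may differ in its details), the keyword ALL requests every variable in the data set, and trimming the /STATISTICS list leaves only the mean and standard deviation:

```spss
* Descriptives for every variable, mean and standard deviation only.
DESCRIPTIVES VARIABLES=ALL
  /STATISTICS=MEAN STDDEV.
```

String variables cannot be summarized this way, so expect SPSS to skip or warn about any that ALL picks up.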

Structure of Commands in SPSS syntax


• Commands in SPSS begin with a keyword that is the name of the command followed by any subcommands
and user specifications. The end of the command is marked by a period/full stop.
• In SPSS syntax files, commands must always be placed in the first column. Refer to the Command Syntax
Reference for a discussion of available commands and options. It can be found in the menus under Help >
Command Syntax Reference.
o Example: the FREQUENCIES command:
▪ FREQUENCIES produces tables of frequency counts and percentages of the values of
individual variables. FREQUENCIES is used to obtain frequencies and statistics for categorical
variables and to obtain statistics and graphical displays for continuous variables.
▪ By default, SPSS will paste syntax with commands and specifications in all caps, and will
display variables as you have entered them. Commands and specifications do not have to be
entered in all caps, but I will continue to display them that way to help differentiate them
from variables. In the syntax below, just as pasted from the dialog box, 'gender' and 'regular' are
the variables that I selected. NOTE: Variable names in SPSS are generally separated by
spaces.

FREQUENCIES
VARIABLES=gender regular
/ORDER= ANALYSIS.

▪ In the syntax window there is a very useful toolbar button called 'Syntax Help'. It is context
sensitive, meaning that it will display a syntax chart for the command where the cursor is
currently located. Clicking on the 'Syntax Help' button provides the following information
about the FREQUENCIES command.

▪ Don't let the long list intimidate you. Many people are surprised to learn that FREQUENCIES
has so many subcommands and specifications!
▪ Subcommands and specifications in square brackets ([ ]) are optional, and those in braces ({
}) indicate a choice between elements. Look closely: there are only two words that are NOT
in brackets, FREQUENCIES and 'varlist'. This means that you can run frequencies on the two
variables 'gender' and 'regular' by typing the following into a syntax editor window:

FREQUENCIES gender regular.

▪ There are many abbreviations allowed in syntax (generally the first 3 or 4 letters of a
command/specification will suffice). So the following syntax will provide the same output:

FREQ gender regular.

▪ Don't forget the period at the end of the command. The 'Syntax Help' button provides the
skeleton of the command you are using, but does not provide a detailed explanation of each
possible subcommand and/or specification. That information is found in the Command
Syntax Reference (use the menus: Help > Command Syntax Reference).

▪ Now let's add a little more to the syntax. The following will provide descriptive statistics and
bar charts for our two variables:

FREQUENCIES gender regular
/STATISTICS=STDDEV MINIMUM MAXIMUM MEAN MEDIAN MODE
/BARCHART.

▪ If you want to add documentation to your syntax file, indicate the start of a comment with an
asterisk (*). Everything between that asterisk and the next period will be ignored by SPSS.
Remember not to add other periods in your documentation if you use this method, since
SPSS will try to interpret everything after the period as commands. Another method is to use
/* and */ to set off a comment. That is, start with /*, insert your comment of as many or as
few words and lines as you want, then end the comment with */. Remember not to include
any periods in the comment using this technique either.
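
The two comment styles described above might look like this in a syntax file. This is only a sketch: the wording of the comments is invented for illustration, and both comments deliberately contain no period except the one that closes the asterisk comment:

```spss
* This comment documents the command that follows
  and runs over two lines until the closing period.
FREQUENCIES gender regular.   /* frequencies for two survey items */
```

Indenting the continuation line of the asterisk comment is a cautious habit, so that the second line is not mistaken for the start of a new command.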

Missing Values, Variable and Value Labels


Missing Values

• Creating a well-documented data file can be quite tedious. Syntax statements can make the process a little
less painful. The MISSING VALUES command declares values for variables as user-missing. User-missing
values are then treated the same as the system-missing values. (That is, they are usually ignored.) Multiple
missing values are separated by commas, and a range of missing values may be declared using the
keywords LO, LOWEST, HI, HIGHEST, and THRU. If the variable(s) are strings, enclose the missing values in
single quotes. For example, in a data file with the variables 'age' and 'region', the following command sets
values of 'age' of 99 and higher and values of 'region' equal to 999 to user-missing. Don't forget the
period at the end.

MISSING VALUES age (99 THRU HIGHEST) region (999).

• To cancel previously declared missing values, simply reassign the missing values to blank (use () in the
previous statement.) To remove all missing values settings at once, use the following code:

MISSING VALUES ALL ().
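
A short sketch of the other forms mentioned above: a range combined with a single value, quoted values for a string variable, and cancelling the settings for one variable. The variable names income and state, and all of the codes, are hypothetical and are not part of the sample file:

```spss
* Hypothetical variables, income (numeric) and state (short string).
MISSING VALUES income (LOWEST THRU 0, 9999) state ('NA', 'XX').
* Cancel the user-missing values for income only.
MISSING VALUES income ().
```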

Variable and Value Labels

• VARIABLE LABELS assigns descriptive labels to variables in the data file. You can assign a label to one
variable or to a long list at the same time. The following syntax assigns labels to two variables in the active
file. NOTE: Each variable label can be up to 256 characters long, although some procedures will print
fewer characters than that. All statistical procedures display at least 40 characters.

VARIABLE LABELS contact 'Contact with Employee' regular 'Shopping Frequency'.

• In general, syntax will ignore spaces and lines within commands and subcommands. It is often easier to
read a syntax file if you add spaces and start new lines to create columns, as below.

VARIABLE LABELS contact 'Contact with Employee'
                regular 'Shopping Frequency'.

• The VALUE LABELS command assigns descriptive labels to values of variables in the data file. Many people
confuse variable and value labels when they are new to them. Variable labels describe the variables, and
value labels allow you to assign descriptions to particular values of a variable. In the sample file, 'contact' is
either 0 or 1. Value labels help you to remember whether 0 means contact or no contact.

VALUE LABELS
contact 0 'no' 1 'yes'.

• NOTE: The VALUE LABELS command deletes all existing value labels for the specified variable(s) and
assigns new value labels. The ADD VALUE LABELS command can be used to add new labels or to alter labels
for specified values without deleting existing labels.

• To create value labels for additional variables, add a slash (/) after the last value label of the
previous variable, then list the next variable followed by its values and their labels in single quotes.
Remember to put a period only at the very end of the command.
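
As a sketch of labelling two variables in one command and then adding a label afterwards: the coding of gender (0 = male, 1 = female) and the extra code 9 are assumptions made for illustration, so check them against your own data before running anything like this:

```spss
* Assumes gender is coded 0 for male and 1 for female.
VALUE LABELS
  contact 0 'no' 1 'yes'
  /gender 0 'Male' 1 'Female'.
* Add a label for one more value without deleting the labels above.
ADD VALUE LABELS contact 9 'not recorded'.
```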

Data Management: COMPUTE, RECODE, SPLIT FILE, and FILTER
• In this section we will go over creating new variables (COMPUTE), recoding the values of existing variables
(RECODE), running the same analysis on subgroups (SPLIT FILE) and using filters to select subsections of
your data (FILTER).

The COMPUTE command

o The COMPUTE command creates new numeric variables or modifies the values of existing string or
numeric variables. You may be familiar with the dialog box, accessed by clicking Transform
> Compute.

o However, it is often more efficient to write COMPUTE statements in syntax, instead of 'pointing and clicking'
your way to a new variable. While the dialog box is especially useful for functions you are not familiar
with, I find it faster to code common formulas directly in syntax. The examples below illustrate why
computing with syntax might save you some time. Note the differences between the 4 different
computations of Satisfaction.
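
One way four such computations might look is sketched here. This is not the workshop's original illustration: the target names sat_a to sat_d are hypothetical, and the five items are taken from the satisfaction variables used later in this handout. The versions differ mainly in how they treat missing values:

```spss
* Hypothetical variable names sat_a to sat_d, for illustration only.
* (a) Arithmetic by hand, which is missing if any item is missing.
COMPUTE sat_a = (price + numitems + service + quality + overall) / 5.
* (b) The MEAN function, which uses whatever items are valid.
COMPUTE sat_b = MEAN(price, numitems, service, quality, overall).
* (c) The suffixed form below requires at least four valid items.
COMPUTE sat_c = MEAN.4(price, numitems, service, quality, overall).
* (d) Sum of the valid items divided by the number of valid items.
COMPUTE sat_d = SUM(price, numitems, service, quality, overall) /
                NVALID(price, numitems, service, quality, overall).
EXECUTE.
```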

Note: Syntax will not warn you if you are about to overwrite an existing variable, but you will have a record of
what you have done.

Notice the EXECUTE statement at the end. SPSS requires this at the end of COMPUTE commands, unless a
procedural command follows it (e.g., FREQUENCIES or other statistical analyses). If you forget and run
the syntax without it, the Data Editor window will display "Transformations Pending" in the bottom
display area. Either add the EXECUTE statement to the syntax, highlight it and run it, or go to the menus
and click Transform > Run Pending Transforms to complete the command. EXECUTE can be
shortened to EXE.

The RECODE command

o The RECODE command allows you to change the values of a variable. For example, it is sometimes useful to
reverse code responses to a survey (change the highest response to the lowest and vice versa). You may
also need to collapse categories for an analysis. In general, it is safer to recode into a new variable
rather than change an existing one, so I will not address recoding into the same variable here.

Here is an example from our data set. We would like to change distance from store into a binary
variable where people have traveled either more or less than 10 miles to shop:

Notice that this is a good time to apply the desired variable and value labels.
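
A sketch of such a recode follows. It assumes that 'distance' is stored as ordered category codes and that codes 1 to 3 all correspond to less than 10 miles; that cutoff, the new variable name dist10, and the label text are assumptions, so check the value labels of 'distance' before using it:

```spss
* Assumes distance is stored as ordered category codes and that
  codes 1 to 3 all mean less than 10 miles, so check the value labels first.
RECODE distance (1 THRU 3 = 0) (4 THRU HIGHEST = 1) INTO dist10.
VARIABLE LABELS dist10 'Traveled 10 miles or more (hypothetical cutoff)'.
VALUE LABELS dist10 0 'Less than 10 miles' 1 '10 miles or more'.
EXECUTE.
```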

o The RECODE command needs a list of variables to act upon (yes, you can recode many variables at once
by listing them after the RECODE statement). The variable(s) are followed by a list of recodes each
enclosed in parentheses. Here is an example, splitting the satisfaction variables into low and high
groups.
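
A sketch of recoding several variables at once is given here. It assumes the satisfaction items use a 1 to 5 scale; the cut point and the target names price_hi and service_hi are illustrative only:

```spss
* Assumes a 1 to 5 response scale, with 4 and 5 treated as high.
RECODE price service (1 THRU 3 = 0) (4 THRU 5 = 1) (MISSING = SYSMIS)
  INTO price_hi service_hi.
EXECUTE.
```

The number of names after INTO must match the number of variables being recoded.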

o The keywords MISSING and SYSMIS both refer to missing values. MISSING includes both system-
missing values (no value entered in the data set) and user-missing (values entered but set to missing by
the user.) SYSMIS refers only to system-missing values. Since I included MISSING, I did not need to
include SYSMIS.

RECODE price (1=0) (2=0) (3=0) (ELSE=SYSMIS) INTO price_low.
EXECUTE.

o Remember that you can get reference information by clicking on the 'Syntax Help' button in the Syntax
Editor, or by looking in the Command Syntax Reference.
o Another great feature that is not available through the dialogue boxes is the ability to assign values
based on IF statements. Let's say that we want to create a variable that tells us if the customer
who answered the survey was an older woman or not, because we are curious if the older female
customers were getting the same attention from the sales associates as the other shoppers. With syntax we
can create a variable with two (or more) conditional statements. Here is what that would look like.

COMPUTE FemOlder = 0.
IF (gender = 1 AND agecat > 4) FemOlder = 1.
EXECUTE.

Now we have a variable that we can cross with Contact to check our ideas.
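
One way to do that cross is a simple crosstabulation. This is a sketch only; the choice of cell statistics is an assumption about what you would want to see:

```spss
* Cross FemOlder with contact and show counts and row percentages.
CROSSTABS
  /TABLES=FemOlder BY contact
  /CELLS=COUNT ROW.
```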

The SPLIT FILE command

o If you are interested in running the same analysis on a set of subgroups in your data you can use SPLIT
FILE to accomplish this. Using the survey data, let's see if the distribution of satisfaction is different for
customers who did and did not have contact with an employee. We'll use SPLIT FILE and FREQUENCIES to get
the output we're looking for. SPLIT FILE requires that the data be sorted by the variable(s) we want to split
by. The SORT CASES command will sort by the variables listed after the command. The default is to sort
ascending. If you want to sort descending, add (D) after that variable.

SORT CASES BY contact.
SPLIT FILE SEPARATE BY contact.
FREQUENCIES Satisfaction
/HISTOGRAM.

SORT CASES BY contact.
SPLIT FILE LAYERED BY contact.
FREQUENCIES Satisfaction
/HISTOGRAM.

SPLIT FILE OFF.

o By default, SPLIT FILE will produce output with the values of the split variable as the outermost
column entries of a table (the LAYERED BY option.) If you want split file groups to display in separate
tables, use SEPARATE BY instead. It is a good practice to 'turn off' splits as soon as you complete the
analysis. SPLIT FILE will be overridden by a later SPLIT FILE command. Until SPLIT FILE OFF is
entered, all analyses will be carried out on a split file.
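
Splitting by more than one variable follows the same pattern. The sketch below assumes you want overall satisfaction broken down by gender and then by contact; the point to notice is that the sort uses the same variables, in the same order, as the split:

```spss
* Split by two variables after sorting by the same two variables.
SORT CASES BY gender contact.
SPLIT FILE LAYERED BY gender contact.
DESCRIPTIVES overall.
SPLIT FILE OFF.
```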

The FILTER and SELECT IF commands

o Filter variables (also known as indicator variables and dummy variables) are used to identify cases that
meet the criteria you have specified. For example, let's create a variable to identify all the women in the
data set who have had contact with a sales employee and live within 10 miles of the store that they visited. To
accomplish this enter:

USE ALL.

COMPUTE filter_$=(gender=1 & contact=1 & distance >2).

VARIABLE LABELS filter_$ 'gender=1 & contact=1 & distance >2 (FILTER)'.

VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.

FORMATS filter_$ (f1.0).

FILTER BY filter_$.

EXECUTE.

o The USE ALL command is added by the dialogue box to ensure that no other filters are active when
creating the new variable.
o The COMPUTE command creates a new variable 'filter_$'. This is the default SPSS name for filters. If you
used the dialogue boxes to create another filter, it would also be named 'filter_$', which would overwrite
the previous filter. You could, of course, rename the filter after using the dialogue boxes, but it is
surprisingly easy to forget. Using syntax, you can name the variable something more useful as you
create it.
o The next three lines of syntax (VARIABLE LABELS, VALUE LABELS, and FORMATS) are not absolutely
necessary but are very helpful for documentation's sake, not to mention ease of use of the data set.
Remember, if you change the name of the filter variable, you need to change 'filter_$' to the new name
in each of the next four lines as well.
o The next line of the syntax applies the filter to your data. FILTER BY filters out all cases where the filter
variable is 0 or missing. In other words, applying a filter selects only those cases where the filter variable
has a valid, non-zero value (here, 1).
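
A sketch of the same idea with a more descriptive name is shown here. The name fem_contact and the simplified condition are chosen only for illustration; FILTER OFF at the end restores all cases for later analyses:

```spss
* fem_contact is a hypothetical name chosen for this sketch.
USE ALL.
COMPUTE fem_contact = (gender = 1 & contact = 1).
VARIABLE LABELS fem_contact 'Women who had contact with an employee (FILTER)'.
VALUE LABELS fem_contact 0 'Not Selected' 1 'Selected'.
FILTER BY fem_contact.
DESCRIPTIVES overall.
FILTER OFF.
```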
o The SELECT IF statement is another way to select subsets of cases. Instead of using a filter variable, the
logical expression (or formula) to select cases is specified in the command. The major difference is that
this command permanently deletes non-selected cases from the active data set. So, SELECT IF can be used
if you need to create a new data set that is a subset of existing data. For example, the following syntax will
provide a data set that includes only females:

SELECT IF (gender = 1).

o SELECT IF will evaluate the expression you enter between the parentheses as either True, False, or
Missing; all the False and Missing cases are dropped from the active data set.
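
Because the dropped cases are gone from the active data set, it is safest to save the subset under a new file name rather than over the original. A minimal sketch follows; the output path is hypothetical:

```spss
* The output path is hypothetical, so change it to a folder you can write to.
SELECT IF (gender = 1).
SAVE OUTFILE='C:\Temp\satisf_women.sav'.
```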
o Using the TEMPORARY command will allow you to return to the original data set once a command that
reads the data is run. That is, temporary transformations only apply until the next command that reads
data and are no longer in effect once that command has run. So, the following syntax will temporarily
filter out males, and then run DESCRIPTIVES for the variables 'distance' and 'overall'. Since the DESCRIPTIVES
command reads the data, it ends the effect of the TEMPORARY command. Repeating the same DESCRIPTIVES
command will then act upon the entire data set, not just females.

TEMPORARY.
SELECT IF (gender=1).
DESCRIPTIVES distance overall.

o Which is better, SELECT IF or creating a filter? That depends on how you prefer to work in SPSS. If
you are writing a syntax file to document an analysis, it may be easier to follow if you use SELECT IF,
particularly if you need to run only one command on a subset of data. If you need to run multiple
commands on a subset of data, it may be easier to use filters to subset your data, depending upon how
you need to split the data. If you are in the midst of an analysis and are switching back and forth from
syntax to point & click, it is useful to have permanent filter variables instead of re-creating them each
time you want to use them.

Preparing Data for Analysis

o Syntax can also be helpful in keeping a good record of the preparation that has gone into a data set
BEFORE hypothesis testing begins. For example let’s consider our data set. Let’s say that we want to
test out customer satisfaction. We have 6 different variables regarding customer satisfaction. The ideal
is to test these items for normality, to confirm a single latent factor in these items (of satisfaction), and
to confirm this idea with reliability analysis. Syntax can serve as a good record of due diligence in data
preparation.
▪ To run frequencies:

FREQUENCIES VARIABLES=price numitems org service quality overall
/STATISTICS=STDDEV MEAN MEDIAN
/HISTOGRAM NORMAL
/ORDER=ANALYSIS.

▪ To identify latent factors:

We can use the dialogue box under Analyze > Dimension Reduction > Factor.

Next, click on Extraction. In this case, we will allow for 3 factors to emerge. Under Extract, click
Fixed number of factors and then specify 3 in the dialogue box.

Press Continue. Once back in the main dialogue box, click Paste.

Here is the resulting syntax.

FACTOR
/VARIABLES price numitems org service quality overall
/MISSING LISTWISE
/ANALYSIS price numitems org service quality overall
/PRINT INITIAL EXTRACTION ROTATION
/CRITERIA FACTORS(3) ITERATE(25)
/EXTRACTION PC
/CRITERIA ITERATE(25)
/ROTATION VARIMAX
/METHOD=CORRELATION.

The output:

Rotated Component Matrix (a)

                                 Component
                                 1       2       3
Price satisfaction              .700    .480    .098
Variety satisfaction            .683    .539   -.040
Organization satisfaction       .127    .096    .982
Service satisfaction            .846    .165    .142
Item quality satisfaction       .271    .907    .134
Overall satisfaction            .803    .202    .121

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.

The Rotated Component Matrix shows that not all items load onto a single factor. It appears that
what loads with Overall satisfaction are the variables Price, Variety, and Service. The
items Organization satisfaction and Item quality satisfaction seem to have different
origins and possibly should not be considered part of the same scale. But we do not need to guess about this:
we can now run a reliability analysis that will provide inter-item correlations so that we may verify the internal
consistency of this dependent variable of satisfaction.

▪ To run reliability:

Go to Analyze > Scale > Reliability Analysis. You will see a dialogue box such as this:

Press the Statistics button and check off item, scale, scale if item deleted, means, and variances,
and then hit Continue.

Once back in the main dialogue box, enter the variables of interest.

Then hit Paste. You will see syntax like this:

RELIABILITY
/VARIABLES=price numitems org service quality overall
/SCALE('ALL VARIABLES') ALL
/MODEL=ALPHA
/STATISTICS=DESCRIPTIVE SCALE
/SUMMARY=TOTAL MEANS VARIANCE.

Let’s run it and make some decisions about our measure of satisfaction. The analysis tells us that although
Item quality satisfaction is not really hurting us, Organization satisfaction actually is.

Item-Total Statistics

                             Scale Mean   Scale Variance   Corrected     Squared       Cronbach's
                             if Item      if Item          Item-Total    Multiple      Alpha if Item
                             Deleted      Deleted          Correlation   Correlation   Deleted
Price satisfaction           15.57        23.133           .731          .589          .772
Variety satisfaction         15.55        23.404           .701          .595          .779
Organization satisfaction    15.38        27.817           .267          .096          .867
Service satisfaction         15.50        23.235           .675          .489          .783
Item quality satisfaction    15.48        23.422           .612          .395          .797
Overall satisfaction         15.52        24.100           .653          .453          .789

Here is one of the beauties of syntax. Now we simply make a note about what we see in our syntax (i.e. *alpha at
.828, but shows that measure could be improved by the removal of Organization), copy and paste the first
reliability command, remove the Organization satisfaction variable and run it again. Now we have
alpha at .867 and the following output:

Item-Total Statistics

                             Scale Mean   Scale Variance   Corrected     Squared       Cronbach's
                             if Item      if Item          Item-Total    Multiple      Alpha if Item
                             Deleted      Deleted          Correlation   Correlation   Deleted
Price satisfaction           12.35        18.132           .748          .584          .825
Variety satisfaction         12.33        18.072           .750          .590          .824
Service satisfaction         12.28        18.290           .682          .483          .841
Item quality satisfaction    12.26        18.460           .615          .388          .859
Overall satisfaction         12.30        19.016           .665          .452          .845
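
The annotated re-run described above could look something like the following sketch: the first RELIABILITY command copied, with 'org' removed from the variable list and a note added. The wording of the note is illustrative:

```spss
* Second run of the reliability command, with org removed from the variable list.
RELIABILITY
  /VARIABLES=price numitems service quality overall
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /STATISTICS=DESCRIPTIVE SCALE
  /SUMMARY=TOTAL MEANS VARIANCE.
```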

Creating our new variable.

o Now we know how to do this! Let’s compute our new variable of satisfaction.

COMPUTE Satisfaction=SUM(price,numitems,service,quality,overall)/
NVALID(price,numitems,service,quality,overall).
EXECUTE.

VARIABLE LABELS Satisfaction 'Measure of Satisfaction'.

These procedures can be modified and used whenever data preparation is necessary BEFORE that
all-important analysis is done. It is crucial to understand the distributions and the nature of your
measures before you begin hypothesis testing. Using syntax is a very useful way to keep track of
your work, both for yourself and for others.
