Theory Action
Select a measure of (dis)similarity
Hierarchical methods:
► Analyze ► Classify ► Hierarchical Cluster ► Method ► Measure
Depending on the scale level, select the measure;
convert variables with multiple categories into a set of binary variables and
use matching coefficients; standardize variables if necessary (on a range of
0 to 1).
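The preparation steps for hierarchical methods can be illustrated outside SPSS. The following is a minimal Python sketch, assuming min-max scaling for the 0 to 1 range and the simple matching coefficient as the matching measure; the function names and the toy data are hypothetical and serve only as illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dummy_code(value, categories):
    """Convert one categorical value into a list of 0/1 indicators."""
    return [1 if value == c else 0 for c in categories]


def simple_matching(a, b):
    """Share of binary variables on which two objects agree."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical toy data: income of four customers.
income = [20, 35, 50, 80]
scaled_income = min_max_scale(income)

# Hypothetical toy data: preferred brand of two customers.
brands = ["A", "B", "C"]
customer_1 = dummy_code("A", brands)
customer_2 = dummy_code("C", brands)
similarity = simple_matching(customer_1, customer_2)
```

Note that the simple matching coefficient counts joint absences (0/0) as agreement; other matching coefficients treat them differently, so the choice should follow the data.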
k-means clustering:
Uses Euclidean distances by default.
Two-step clustering:
► Analyze ► Classify ► Two-Step Cluster ► Distance Measure
Use Euclidean distances when all variables are continuous; for a mix of
continuous and categorical variables, you have to use the log-likelihood
measure.
Deciding on the number of clusters
Hierarchical clustering:
Examine the dendrogram:
► Analyze ► Classify ► Hierarchical Cluster ► Plots ► Dendrogram
Draw a scree plot: Double-click the Agglomeration Schedule in the output
window, highlight all values in the Coefficients column, and right-click.
In the menu that opens, select Create Graph ► Line.
Compute the VRC using an ANOVA:
► Analyze ► Compare Means ► One-Way ANOVA
Move the cluster membership variable into the Factor box and the clustering
variables into the Dependent List box.
Compute VRC for each segment solution and compare values.
Include practical considerations in your decision.
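The VRC comparison can also be reproduced by hand. The sketch below assumes the common definition VRC = (SS_B / (k - 1)) / (SS_W / (n - k)), with the between- and within-cluster sums of squares pooled across all clustering variables; for a single clustering variable this ratio coincides with the F-value of a one-way ANOVA. The function name and the toy data are hypothetical.

```python
def variance_ratio_criterion(data, labels):
    """Compute (SS_B / (k - 1)) / (SS_W / (n - k)) for one cluster solution."""
    n = len(data)
    n_vars = len(data[0])
    clusters = sorted(set(labels))
    k = len(clusters)

    # Mean of each clustering variable over all objects.
    grand_mean = [sum(row[j] for row in data) / n for j in range(n_vars)]

    ss_between = 0.0
    ss_within = 0.0
    for c in clusters:
        members = [row for row, lab in zip(data, labels) if lab == c]
        centroid = [
            sum(row[j] for row in members) / len(members)
            for j in range(n_vars)
        ]
        # Between: squared distance of the centroid from the grand mean,
        # weighted by cluster size.
        ss_between += len(members) * sum(
            (centroid[j] - grand_mean[j]) ** 2 for j in range(n_vars)
        )
        # Within: squared distance of each member from its centroid.
        ss_within += sum(
            (row[j] - centroid[j]) ** 2
            for row in members
            for j in range(n_vars)
        )

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: six objects measured on two clustering variables.
data = [[1, 2], [2, 1], [1, 1], [8, 9], [9, 8], [9, 9]]
two_clusters = [1, 1, 1, 2, 2, 2]
three_clusters = [1, 1, 2, 3, 3, 3]

vrc_2 = variance_ratio_criterion(data, two_clusters)
vrc_3 = variance_ratio_criterion(data, three_clusters)
```

In this toy example the two-cluster solution yields the larger VRC, which would favor two segments, subject to the practical considerations mentioned above.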
k-means:
Run a hierarchical cluster analysis and decide on the number of segments
based on a dendrogram or scree plot; use this information to run k-means
with k clusters.
Compute the VRC using an ANOVA:
► Analyze ► Classify ► K-Means Cluster ► Options ► ANOVA table
Compute VRC for each segment solution and compare values.
Include practical considerations in your decision.
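A scree plot is usually read by looking for a distinct break in the agglomeration coefficients. The sketch below is a rough heuristic, not a replacement for inspecting the plot: it assumes the coefficients are listed in stage order (one value per merge) and suggests stopping just before the merge that causes the largest increase. The function name and the coefficient values are hypothetical.

```python
def suggest_number_of_clusters(coefficients):
    """Suggest k from the largest jump between consecutive merge stages."""
    # n objects are merged in n - 1 stages.
    n_objects = len(coefficients) + 1
    # Increase in the coefficient from each stage to the next.
    jumps = [b - a for a, b in zip(coefficients, coefficients[1:])]
    # 1-based number of the last stage before the largest jump.
    last_stage_before_jump = jumps.index(max(jumps)) + 1
    # After stage s, n - s clusters remain.
    return n_objects - last_stage_before_jump


# Hypothetical agglomeration coefficients for six objects (five stages).
coefficients = [1.0, 1.2, 1.5, 2.0, 9.0]
k = suggest_number_of_clusters(coefficients)
```

The resulting k can then serve as the pre-specified number of clusters in the k-means run.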
Two-step clustering:
Specify the maximum number of clusters:
► Analyze ► Classify ► Two-Step Cluster ► Number of Clusters
Run separate analyses using the AIC and BIC as clustering criteria:
► Analyze ► Classify ► Two-Step Cluster ► Clustering Criterion
Examine the model summary output.
Include practical considerations in your decision.
Validating and interpreting the cluster solution
Stability
Re-run the analysis using different clustering procedures, algorithms, or
distance measures.
Change the order of objects in the dataset.
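To judge how similar two solutions are after such a re-run, the cluster memberships can be compared pair by pair. The sketch below uses the Rand index, one common agreement measure that is not prescribed by the steps above: it is the share of object pairs that both solutions treat the same way (together in both or apart in both). The membership vectors are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of object pairs on which two cluster solutions agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agree += 1
        total += 1
    return agree / total


# Hypothetical memberships of six objects from three different runs.
run_1 = [1, 1, 1, 2, 2, 2]
run_2 = [2, 2, 2, 1, 1, 1]  # same partition, different cluster numbering
run_3 = [1, 1, 2, 2, 2, 2]  # one object switches clusters

identical = rand_index(run_1, run_2)
partial = rand_index(run_1, run_3)
```

Because the index works on pairs, it does not depend on how the clusters are numbered in each run.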
Differentiation of the data
Compare the cluster centroids across the different clusters for significant
differences.
If possible, assess the solution’s criterion validity.
Profiling
Identify observable variables (e.g., demographics) that best mirror the
partition of the objects based on the clustering variables.
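One simple way to see whether an observable variable mirrors the partition is to tabulate its categories within each cluster. The following is a minimal sketch with hypothetical names and toy data; it reports shares only and does not test them for significance.

```python
from collections import Counter, defaultdict


def profile_clusters(labels, observable):
    """Share of each category of an observable variable within each cluster."""
    counts = defaultdict(Counter)
    for label, value in zip(labels, observable):
        counts[label][value] += 1
    return {
        label: {
            value: count / sum(counter.values())
            for value, count in counter.items()
        }
        for label, counter in counts.items()
    }


# Hypothetical cluster memberships and age groups of six customers.
membership = [1, 1, 1, 2, 2, 2]
age_group = ["young", "young", "old", "old", "old", "old"]

shares = profile_clusters(membership, age_group)
```

A variable whose categories are distributed very differently across the clusters is a good candidate for profiling.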
Interpretation of the cluster solution
Identify names or labels for each cluster and characterize each cluster using
observable variables.