SPSS - Using K-means clustering after factor analysis

Question

I am a developer who has been tasked with working out how previous results were produced in SPSS, so we can repeat the process with some new data. We can't ask the person who did the original analysis because he is sadly no longer with us, so it has fallen to me to unravel what he did.

I am not a statistician and do not need to understand the principles involved. I really just need to know what menu items to navigate to.

We had a survey done, which asked a lot of questions of 10,000 people. A subset of 15 of these questions is being used for the analysis.

I know that factor analysis was done to reduce the data to 4 sets. K-means clustering was then used to find the cluster centers. This is what I'm after now.

I have worked out how to do the factor analysis to get the component score coefficient matrix that matches the data I have in my database. This was done by going to Analyze > Dimension Reduction > Factor. I then chose a fixed number of factors (4) in the "Extract" section, "Varimax" rotation in the "Rotation" section, and checked the "Display factor score coefficient matrix" option in the "Scores" section.

This gave data like this:

Matrix   Value 1   Value 2   Value 3   Value 4
Q1       -0.0756   0.2134    -0.0245   -0.1236
Q2       ...       ...       ...       ...
Q3       ...       ...       ...       ...
...

What I have no idea of is how to proceed with this to do the k-means clustering.

The results I have in the database look like this:

Cluster centers   Value 1   Value 2   Value 3   Value 4   Value 5
FAC1_1            -0.8373   -0.5766   0.2100    1.3499    0.2940
FAC2_1            ...       ...       ...       ...       ...
FAC3_1            ...       ...       ...       ...       ...
FAC4_1            ...       ...       ...       ...       ...

Now, I know that k-means clustering can be done on the original data set by using Analyze > Classify > K-means Cluster, but I don't know how to reference the factor analysis I've done.

Could someone give me some insight into how to create these cluster centers using SPSS?

Jignesh Sutar · Accepted Answer · 2015-05-21T10:33:09

In the GUI for FACTOR analysis (Analyze > Dimension Reduction > Factor), open the "Scores" sub-dialog and make sure "Save as variables" is checked.

This will save the factor scores in your data, i.e. as the variables FAC1_1, FAC2_1, FAC3_1, FAC4_1.

It is these variables that you then need to add as input variables in the K-means GUI.
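Since you are a developer, it may help to see what the saved FAC variables are in terms of the coefficient matrix you already matched. As far as I understand it, with principal components extraction and regression scores, each factor score is the case's standardized answers multiplied by the score coefficient matrix. The sketch below illustrates that relationship in plain Python; the function name `factor_scores` and the toy numbers are made up for illustration, and the use of the sample standard deviation (n - 1) over the cases in the analysis is an assumption about how SPSS standardizes, so treat it as a cross-check rather than a definitive reimplementation.

```python
from statistics import mean, stdev


def factor_scores(rows, coef):
    """Return one list of factor scores per case.

    rows: list of cases, each a list of p numeric answers.
    coef: p x m score coefficient matrix (one row per question).
    Hypothetical helper, not an SPSS function.
    """
    p = len(coef)
    m = len(coef[0])
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    sds = [stdev(col) for col in columns]
    scores = []
    for row in rows:
        # Standardize each answer, then weight it by the coefficients.
        z = [(row[j] - means[j]) / sds[j] for j in range(p)]
        scores.append([sum(z[j] * coef[j][k] for j in range(p)) for k in range(m)])
    return scores


# Toy data: 3 cases, 2 questions, 1 factor (values are illustrative only).
toy_rows = [[1, 2], [2, 4], [3, 6]]
toy_coef = [[0.5], [0.5]]
print(factor_scores(toy_rows, toy_coef))
```

If scores computed this way from your stored coefficient matrix agree with FAC1_1 to FAC4_1 for a handful of cases, you can be fairly confident the factor step has been replicated.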

It is better to set up your work in syntax, so that anyone who ever wants to replicate it can do so. Ideally your predecessor would have left his breadcrumbs in a syntax document too; if there is even a remote possibility that such a document exists (a file with the .sps extension), I would make every attempt to find it.

Here's how you'd set this up in syntax and what his/her workings may have looked like:

/* Replicate the factor analysis (four factors) and save the factor score variables */.
FACTOR
  /VARIABLES < INPUT THE 15 VARIABLES HERE >
  /MISSING LISTWISE 
  /ANALYSIS < INPUT THE 15 VARIABLES HERE >
  /PRINT EXTRACTION ROTATION FSCORE
  /FORMAT SORT BLANK(.10)
  /PLOT ROTATION
  /CRITERIA FACTORS(4) ITERATE(25)
  /EXTRACTION PC
  /CRITERIA ITERATE(25)
  /ROTATION VARIMAX
  /SAVE REG(ALL)
  /METHOD=CORRELATION.

/* Replicate the clustering using factor scores as inputs, generating 5 segments */.
QUICK CLUSTER FAC1_1 FAC2_1 FAC3_1 FAC4_1
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(5) MXITER(10) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /SAVE CLUSTER (Seg5)
  /PRINT INITIAL.

/* Check centroids match*/.
MEANS FAC1_1 FAC2_1 FAC3_1 FAC4_1 BY Seg5 /CELLS MEAN.

If you can replicate the FACTOR score variables so that they match exactly, that is a good start. If the centroids then do not match even though the factor scores do, the most likely reason is that the segment assignments are now different. Even with the same inputs and methodology, K-means QUICK CLUSTER can, and most likely will, yield different segment assignments if the case ordering differs from the original run, because the starting cluster centers it picks depend on the order of the cases in the file.
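To show why the starting centers matter, here is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is not SPSS's exact QUICK CLUSTER procedure; the function names `nearest_center` and `kmeans` and the four toy points are hypothetical and chosen only so that two different starts settle on two different solutions.

```python
def nearest_center(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(point, center))
    return min(range(len(centers)), key=lambda i: sq_dist(centers[i]))


def kmeans(points, initial_centers, max_iter=10):
    """Plain Lloyd's k-means started from the given centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = [nearest_center(p, centers) for p in points]
        new_centers = []
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New center is the mean of the cases assigned to it.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep the old center if no case was assigned to it.
                new_centers.append(centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment against the final centers.
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


# Four toy points on the corners of a rectangle (illustrative only).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
centers_a, labels_a = kmeans(points, [(0, 0), (4, 0)])
centers_b, labels_b = kmeans(points, [(0, 0), (0, 1)])
print(labels_a, centers_a)
print(labels_b, centers_b)
```

The same four points end up split left/right from the first start and top/bottom from the second, and both runs are stable. The same kind of thing can happen with your factor scores when the starting centers change, which is why matching factor scores do not guarantee matching centroids.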

I don't know of any way around this, but in principle these are the likely steps he/she took.
