0
votes

I've conducted a series of long-term surveys with the same group of 44 respondents (not many, but I could not do better).

I need to cluster the sample in SPSS using two-step cluster analysis, but there are a great many variables. Because 6 different survey questionnaires were administered, there are about 200 quantitative questions (variables), not counting the qualitative ones.

My first question is: should I use all the quantitative variables in the cluster analysis? All the manuals I have read use a selected subset of variables for the clustering solution rather than all of them.

The second problem is that I tried hierarchical clustering with all the quantitative data, but SPSS reported:

Warnings

Not enough valid cases to perform the cluster analysis.

...which seems to mean that my data set cannot be used for cluster analysis. In that case, what should I do to perform the cluster analysis?


2 Answers

1
votes

This sounds rather problematic. You have a huge number of variables. You haven't said how many cases, but it sounds like it might be only 44 x 6. This is not a good combination. What is the purpose of the clustering exercise?

You might consider extracting a few principal components from the quantitative variables to use in clustering, and adding a small number of other variables. The message from the hierarchical clustering procedure is a warning.
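To make the principal-components suggestion concrete, here is a minimal sketch in Python with NumPy (not SPSS syntax). The data are synthetic stand-ins for 44 respondents and 200 variables, the function name `pca_scores` is made up for illustration, and the choice of 3 components is an assumption, not a recommendation. It also assumes no missing values and no constant columns.

```python
import numpy as np

def pca_scores(data, n_components):
    """Return (scores, explained_ratio) for the first n_components components.

    Assumes `data` has no missing values and no constant columns.
    """
    X = np.asarray(data, dtype=float)
    # Standardize each variable so questions on different scales count equally.
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized data; rows of Vt are the component directions.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = X @ Vt[:n_components].T
    # Share of total variance carried by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Synthetic stand-in: 44 respondents, 200 variables driven by 3 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(44, 3))
loadings = rng.normal(size=(3, 200))
answers = factors @ loadings + 0.5 * rng.normal(size=(44, 200))

scores, explained = pca_scores(answers, n_components=3)
print(scores.shape)        # one row per respondent, one column per component
print(explained.round(3))  # variance share of each retained component
```

The columns of `scores` would then replace the 200 original variables as inputs to the clustering procedure. How many components to keep is a judgment call that depends on the explained-variance shares in your own data.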

0
votes

Question 1:

Although you have 200 variables, some of them may be strongly correlated with one another. It is good practice to cluster on variables that are only weakly correlated with each other.
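One simple way to pick weakly correlated variables is a greedy filter: walk through the variables and keep each one only if it is not too strongly correlated with any variable already kept. The sketch below uses Python with NumPy. The function name `drop_correlated`, the toy variables `q1` to `q3`, and the 0.8 cutoff are all assumptions made for illustration. The result depends on variable order, so treat it as a starting point rather than a definitive selection.

```python
import numpy as np

def drop_correlated(data, names, threshold=0.8):
    """Greedily keep variables whose |correlation| with every kept one is below threshold."""
    corr = np.abs(np.corrcoef(np.asarray(data, dtype=float), rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        # Keep variable j only if it is not a near-duplicate of a kept variable.
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Toy data: q2 is almost a copy of q1, q3 is unrelated to both.
rng = np.random.default_rng(1)
q1 = rng.normal(size=44)
q2 = q1 + 0.05 * rng.normal(size=44)
q3 = rng.normal(size=44)
data = np.column_stack([q1, q2, q3])

selected = drop_correlated(data, ["q1", "q2", "q3"])
print(selected)
```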

Alternatively, you can use an unsupervised method such as principal component analysis to reduce the dimension of the data set and transform it into a space of uncorrelated components.

Question 2:

The following link provides a good explanation of your SPSS error: http://www-01.ibm.com/support/docview.wss?uid=swg21481097
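A quick check worth running before clustering is to count how many respondents have a valid value on every clustering variable. The sketch below assumes that the warning comes from listwise deletion, where a case is dropped if any one of its variables is missing. That is a common cause, but the linked page is the authority on your specific message. The data and helper names are made up for illustration, with `None` marking a missing answer.

```python
def count_complete_cases(rows):
    """Count rows that have no missing (None) values."""
    return sum(1 for row in rows if all(v is not None for v in row))

def missing_per_variable(rows, names):
    """Return a dict mapping each variable name to its number of missing values."""
    return {
        name: sum(1 for row in rows if row[i] is None)
        for i, name in enumerate(names)
    }

# Toy data: every respondent skipped one question, each a different one.
names = ["q1", "q2", "q3"]
rows = [
    [None, 3, 4],
    [2, None, 5],
    [1, 4, None],
    [3, 2, None],
]

complete = count_complete_cases(rows)
missing = missing_per_variable(rows, names)
print(complete)
print(missing)
```

In this toy case no variable is missing more than half of its values, yet no respondent is complete on all three. With about 200 variables and 44 respondents the same thing happens easily, so dropping the variables with the most missing values, or reducing the variable set first, can restore enough valid cases.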