The SAS documentation for PROC CLUSTER includes an example that performs cluster analysis on the Iris dataset:

proc cluster data=iris method=ward print=15 ccc pseudo;
   var petal: sepal:;
   copy species;
run;

proc tree noprint ncl=3 out=out;
   copy petal: sepal: species;
run;

From PROC FREQ we can see that there are 16 misclassifications:

proc freq;
   tables cluster*species / nopercent norow nocol plot=none;
run;

How can I plot all pairwise projections of the variables (6 in total), with cluster membership indicated by color and all misclassifications highlighted with a separate color or marker shape?

I know that a scatter plot can be produced with PROC SGPLOT, but I can't work out how to highlight the misclassified observations.

How do you know there's a mismatch? What's the rule for that? I would add a new variable that indicates whether an observation is a mismatch and then use the GROUP= option on the SGPLOT SCATTER statement to highlight those records. - Reeza
@Reeza Well, sashelp.iris already has a species column with three possible values (150 rows in total, 50 for each species/cluster). That is what we are trying to achieve: a clustering result as close as possible to the species column. - adeline

1 Answer

The mismatch rules below are defined manually, but you could streamline this by assuming that, within each cluster, the species with the lower count is the misclassified one.

data mismatched;
    set out;
    length mismatch $20;

    /* Minority species within each cluster, read off the PROC FREQ table */
    if (cluster=3 and species='Versicolor') or (cluster=1 and species='Virginica') then
        mismatch="Mismatched";
    else
        mismatch="Matched";
run;
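
The streamlining mentioned above could look roughly like the following. This is only a sketch: it assumes the most frequent species in each cluster is the "correct" one, that no two species tie within a cluster, and that each cluster has a different majority species. The table names `counts`, `majority` and `mismatched_auto` and the column `majority_species` are made up for illustration.

```sas
proc sql;
    /* Count observations for each cluster/species combination */
    create table counts as
    select cluster, species, count(*) as n
    from work.out
    group by cluster, species;

    /* Keep the most frequent species per cluster.
       SAS remerges max(n) back onto each row; a tie would keep several rows. */
    create table majority as
    select cluster, species as majority_species
    from counts
    group by cluster
    having n = max(n);

    /* Flag observations whose species differs from their cluster's majority */
    create table mismatched_auto as
    select o.*,
           case when o.species = m.majority_species then 'Matched'
                else 'Mismatched'
           end as mismatch length=20
    from work.out as o
         left join majority as m
         on o.cluster = m.cluster;
quit;
```

The plot that follows still uses the manually defined `mismatched` dataset; `mismatched_auto` should carry the same flag as long as the assumptions above hold.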

proc sgplot data=mismatched;
    scatter x=petalLength y=petalWidth / group=mismatch;
run;
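
To cover all six pairwise projections in one graphic, one option is PROC SGSCATTER with a single grouping variable that holds the cluster for correctly placed observations and a separate level for the misclassified ones. The sketch below assumes the `mismatched` dataset from the step above; the names `plotdata` and `plotgroup` are made up for illustration.

```sas
/* Build one grouping variable: the cluster number for matched
   observations, and a separate level for misclassified ones */
data plotdata;
    set mismatched;
    length plotgroup $20;
    if mismatch = "Mismatched" then plotgroup = "Misclassified";
    else plotgroup = catx(" ", "Cluster", cluster);
run;

/* One panel per pair of variables: 4 variables give 6 pairs */
proc sgscatter data=plotdata;
    plot petalLength*petalWidth  petalLength*sepalLength  petalLength*sepalWidth
         petalWidth*sepalLength  petalWidth*sepalWidth    sepalLength*sepalWidth
         / group=plotgroup columns=3;
run;
```

Each level of `plotgroup` gets its own color and marker from the active ODS style, so the misclassified points stand out from the three clusters. The trade-off is that the plot no longer shows which cluster a misclassified observation was assigned to.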