
I'm currently using PROC DISCRIM in SAS to run a kNN analysis on a data set, but my problem may require the list of the top k neighbors for each row in my table. How can I get this list from SAS?

Thanks for the answer, but I'm looking for the neighbor list for each data point. For example, if I have this data set:

name   age  zipcode  alcohol
John   26   08439    yes
Cathy  49   47789    no
Smith  37   90897    no
Tom    34   88642    yes

then I need this list:

name   neighbor1  neighbor2
John   Tom        Cathy
Cathy  Tom        Smith
Smith  Cathy      Tom
Tom    John       Cathy

I could not find this output in SAS. Is there any way I can program it to get this list? Thank you!

A similar question was asked here: stats.stackexchange.com/questions/276866/… (hanna)

1 Answer


I am not a SAS user, but a quick web search seems to turn up a good answer to your problem:

As far as I know, you do not have to implement kNN yourself. PROC DISCRIM is enough.

Code for iris data from http://www.sas-programming.com/2010/05/k-nearest-neighbor-in-sas.html

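/* Assumes a WORK.iris data set exists, e.g. a copy of sashelp.iris */

/* Flag a 50% simple random sample; OUTALL keeps every row and adds Selected */
ods select none;
proc surveyselect data=iris out=iris2
                  samprate=0.5 method=srs outall;
run;
ods select all;

/* Number of nearest neighbors */
%let k=5;

/* Train on the sampled half (Selected=1) and score the other half (Selected=0) */
proc discrim data=iris2(where=(selected=1))
             test=iris2(where=(selected=0))
             testout=iris2testout
             method=NPAR k=&k
             listerr crosslisterr;
      class Species;
      var SepalLength SepalWidth PetalLength PetalWidth;
      title2 'Using KNN on Iris Data';
run;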
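
To check how the held-out rows were classified, the scored data set can be cross-tabulated. This is a small sketch that assumes the `iris2testout` data set produced by the step above; `_INTO_` is the variable PROC DISCRIM writes to the TESTOUT= data set to hold the assigned class.

```sas
/* Cross-tabulate the true class against the class assigned by the k-NN rule */
proc freq data=iris2testout;
    tables Species * _INTO_ / norow nocol nopercent;
run;
```

Note that this shows the predicted class for each test row, not which training rows were its neighbors.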

A long and detailed description is also available here: http://analytics.ncsu.edu/sesug/2012/SD-09.pdf

And from the sas community:

Simply ask PROC DISCRIM to use the nonparametric method with the options "METHOD=NPAR K=". Note: do not use the "R=" option at the same time, since it corresponds to the radius-based nearest-neighbor method. Also pay attention to how PROC DISCRIM treats categorical data automatically. Sometimes you may want to convert categorical data into metric coordinates in advance. Since PROC DISCRIM doesn't output the tree it builds internally, use the "data= test= testout=" options to score a new data set.
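
Because PROC DISCRIM does not expose the neighbors themselves, one possible way to get the per-row neighbor list asked for in the question is to compute pairwise distances directly. The following is only a minimal sketch, not what PROC DISCRIM does internally. It assumes the toy data from the question, unique values of `name`, and plain Euclidean distance on `age` plus a hypothetical 0/1 recoding of `alcohol` called `alc`; the data set names `people`, `pairs`, `knn_long` and `knn_list` are made up for illustration. Unscaled, `age` dominates the distance, so real data would likely need standardizing first.

```sas
/* Toy data from the question; alc is a hypothetical 0/1 recoding of alcohol */
data people;
    input name $ age zipcode $ alcohol $;
    alc = (alcohol = 'yes');
    datalines;
John 26 08439 yes
Cathy 49 47789 no
Smith 37 90897 no
Tom 34 88642 yes
;
run;

/* Number of neighbors to keep per row */
%let k = 2;

/* All ordered pairs of distinct rows with their Euclidean distance */
proc sql;
    create table pairs as
    select a.name as name,
           b.name as neighbor,
           sqrt((a.age - b.age)**2 + (a.alc - b.alc)**2) as dist
    from people as a, people as b
    where a.name ne b.name
    order by a.name, dist;
quit;

/* Keep the k closest rows for each name and number them 1..k */
data knn_long;
    set pairs;
    by name;
    if first.name then rank = 0;
    rank + 1;
    if rank <= &k;
run;

/* Reshape to one row per name: neighbor1, neighbor2, ... */
proc transpose data=knn_long out=knn_list(drop=_name_) prefix=neighbor;
    by name;
    id rank;
    var neighbor;
run;

proc print data=knn_list noobs;
run;
```

The self-join compares every row with every other row, so the cost grows quadratically with the number of rows. That is fine for small tables but may be too slow for large ones.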