0
votes

my data looks like this,

S. No   AB001A  AB0002A AB362   VAR1    VAR2    VAR3    SUM %Match  Rank
    1   -/-        C/C   A/A                        
    2   C/C        C/C   A/A                        
    3   C/C        C/C   A/A                        
    4   C/C        C/C   A/A                        
    5   C/C        C/C   A/A                        
    6   C/C        C/C   A/A                        
    7   C/C        C/C   A/A                        
    8   -/-        -/-   -/-                        
    9   C/C        C/C   A/A                        
    10  C/C        C/C   A/A                        
    11  -/-        C/C   A/A                        
    12  C/C        C/C   A/A                        
    13  C/C        C/C   A/A                        
    14  C/C        C/C   A/A                        
    16  C/C        -/-   A/A                        
    17  -/-        C/C   A/A                        
    18  C/C        C/C   A/A                        
    19  C/C        C/C   A/A                        

I want to match obs 3 with obs 2 if it exactly matched then score will be 1 else 0, that will be stored in var1 for AB001a, in var2 for ab0002a and in var3 for ab362 and i want to calculate sum of all the 1's and observation match percent and their rank (top ten matchers), I did this successfully in excel but it took me lot of time, i used if condition in excel like (=if(A3=A$2,1,0) and then i dragged among all obs and i did sum of all obs, their %match and rank. My question is how can i do this in sas? can i use arrays for this? or in combinations of loops and arrays? my data is so big with 5,15,567 obs. can any one guide me how to do this in sas because i want to reduce my time to analyze my data. Thanking you Regards,

1
Could you try to explain a bit better want your trying to achieve, because I don't understand how the result should be like. Do you want to count the number of times each value occurs in a column and then run some statistics on them?Laurent de Walick

1 Answers

0
votes

You don't need to use loops or arrays if you're only summarising 3 variables, although I would use an array if there are many more. I'm using the LAG function to perform this task, it does require a certain amount of care to ensure it works how you want. Make sure you don't use IF...THEN...ELSE statements. I've calculated all the summary statistics except Rank as I'm not sure which variable you're ranking. If you check out PROC RANK in the online documentation then you should be able to work out what to do. Hope this helps.

data have;
input S_No AB001A $  AB0002A $ AB362 $;
datalines;
    1   -/-        C/C   A/A                        
    2   C/C        C/C   A/A                        
    3   C/C        C/C   A/A                        
    4   C/C        C/C   A/A                        
    5   C/C        C/C   A/A                        
    6   C/C        C/C   A/A                        
    7   C/C        C/C   A/A                        
    8   -/-        -/-   -/-                        
    9   C/C        C/C   A/A                        
    10  C/C        C/C   A/A                        
    11  -/-        C/C   A/A                        
    12  C/C        C/C   A/A                        
    13  C/C        C/C   A/A                        
    14  C/C        C/C   A/A                        
    16  C/C        -/-   A/A                        
    17  -/-        C/C   A/A                        
    18  C/C        C/C   A/A                        
    19  C/C        C/C   A/A
;
run;

/* count number of variables starting with AB */
proc sql noprint;
select (count(*) into :num_ab
from dictionary.columns
where libname='WORK' and upper(memname)='HAVE' and name eqt 'AB';
quit;

%put numvars = &num_ab.;

data want;
set have;
array mole{&num_ab.} AB: ;
array newvar{&num_ab.} var1 - var%left(&num_ab.);
do i=1 to dim(mole);
    newvar{i} = (mole{i}=lag(mole{i}));
end;
sumvar=sum(of var1-var%left(&num_ab.));
match_pc=sumvar/&num_ab.;
drop i;
run;