I have a dataset in which each observation has two character variables, each holding a space-separated list. I want a third variable that describes the overlap between the two lists (full, partial, or none). Based on another SO post, I've written a macro that calculates the overlap, but I can't work out how to call it from a DATA step to populate the third variable.
This is my dataset, with dummy data:
data use;
    infile datalines dlm='~~';
    input list1:$100. list2:$100. expected_match:$10.;
    datalines;
Homer Bart~~Homer Bart~~Full
Marge Lisa~~Lisa Marge~~Full
Homer Marge~~Marge~~Partial
Bart Lisa~~Bart~~Partial
Homer Marge Bart Lisa~~Maggie~~None
;;;;
run;
This is the macro, with tests (all of which pass):
%macro list_overlap(list1, list2);
    %local i matches match_type;
    %let matches = 0;
    %do i = 1 %to %sysfunc(countw(&list1, %str( )));
        %if %sysfunc(findw(&list2, %scan(&list1, &i,, s)))
            %then %let matches = %eval(&matches + 1);
    %end;
    %if &matches = %sysfunc(countw(&list1, %str( )))
        and %sysfunc(countw(&list1, %str( ))) = %sysfunc(countw(&list2, %str( )))
        %then %let match_type = 'Full';
    %else %if &matches = 0 %then %let match_type = 'None';
    %else %let match_type = 'Partial';
    match_type = &match_type%str(;)
%mend list_overlap;
%put NOTE: %list_overlap(Homer Bart,Homer Bart);
%put NOTE: %list_overlap(Marge Lisa,Lisa Marge);
%put NOTE: %list_overlap(Homer Marge,Marge);
%put NOTE: %list_overlap(Bart Lisa,Bart);
%put NOTE: %list_overlap(Homer Marge Bart Lisa,Maggie);
This is how I'm trying to implement it in a DATA step:
data matches;
    set use;
    call execute(catt('%list_overlap(', list1, ',', list2, ')'));
run;
This produces the following error in the log:
NOTE: Line generated by the CALL EXECUTE routine.
1    + match_type = 'Full';
       __________
       180
ERROR 180-322: Statement is not valid or it is used out of proper order.
I've tried other ways too, but this is the closest I've got.