I need to filter a number of character variables out of a large dataset. The unwanted variables are those whose entries are all identical or all missing, and I want to drop them from the dataset before processing the data further. The datasets are very large, so this cannot be done manually, and I will be doing it many times, so I am trying to write a macro that does it. I have created a macro variable listing all character variables using the following code (my real data is different, but I use the same kind of code):
data test;
input Obs ID Age;
datalines;
1 2 3
2 2 1
3 2 2
4 3 1
5 3 2
6 3 3
7 4 1
8 4 2
run;
proc contents
data = test
noprint
out = test_info(keep=name);
run;
proc sql noprint;
select name into : testvarlist separated by ' ' from test_info;
quit;
My idea is then to use a data step to drop this list of variables from the original dataset. The problem is that I need to loop over each variable and determine whether all of its observations are the same. My plan is a macro that loops over all variables and, for each one, builds a table counting the occurrences of each entry. Since the number of rows in this table equals the number of unique entries, the variable should be dropped if the table has exactly one row. My attempt so far is the following code:
%macro ListScanner (org_list);
    %local i next_name name_list;
    %let name_list = &org_list;
    %let i=1;
    %do %while (%scan(&name_list, &i) ne );
        %let next_name = %scan(&name_list, &i);
        %put &next_name;
        proc sql;
            create table char_occurrences as
            select &next_name, count(*) as numberofoccurrences
            from &name_list group by &next_name;
            select count(*) as countrec from char_occurrences;
        quit;
        %if countrec = 1 %then %do;
            proc sql;
                delete &next_name from &org_list;
            quit;
        %end;
        %let i = %eval(&i + 1);
    %end;
%mend;
%ListScanner(org_list = &testvarlist);
However, I get syntax errors, and with my real data I run into other problems where the data is not read correctly, but I am taking one step at a time. I suspect I may be overcomplicating things, so if anyone has a simpler solution or can see what is wrong, I would be very grateful.
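To make the goal concrete, here is a minimal hand-written sketch of the check I want to perform for a single variable. It assumes the test dataset above and SAS 9.3 or later (for the trimmed option); the macro variable name n_distinct is just a placeholder I made up for illustration. Since count(distinct ...) ignores missing values, a result of 0 or 1 would mark the variable as one to drop:

```sas
/* Sketch only: count the distinct non-missing values of one variable.  */
/* ID is picked purely as an example; n_distinct is a made-up name.     */
proc sql noprint;
    select count(distinct ID) into :n_distinct trimmed
    from test;
quit;

/* Write the count to the log so it can be checked by eye. */
%put ID has &n_distinct distinct non-missing values;
```

For ID in the test data this gives 3 (the values 2, 3 and 4), so ID would be kept. What I cannot get right is wrapping this kind of check in a loop over the whole variable list.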