I have a very large number of datasets that are not consistently formatted - I am trying to read them into SAS and normalize them.
The basic need here is to locate a 'key column' that contains a certain string - from there I know what to do with all the variables to the left and right of that column.
The 'GREP' macro from the sas website (http://support.sas.com/kb/33/078.html) seems like it can handle this, but I need help adapting the code in the following ways:
1 - I only need to search one dataset at a time, already in the 'work' library.
2 - I need to capture the name of the variable (and the position number of it) that prints to the log at the end of this macro. This seems like it would be easy but it just returns the last column in the dataset instead of the (correct) column that prints to the log at the end.
Current code below:
%macro grep(librf,string); /* parameters are unquoted, libref name, search string */
%let librf = %upcase(&librf);
proc sql noprint;
select left(put(count(*),8.)) into :numds
from dictionary.tables
where libname="&librf";
select memname into :ds1 - :ds&numds
from dictionary.tables
where libname="&librf";
%do i=1 %to &numds;
proc sql noprint;
select left(put(count(*),8.)) into :numvars
from dictionary.columns
where libname="&librf" and memname="&&ds&i" and type='char';
/* create list of variable names and store in a macro variable */
%if &numvars > 0 %then %do;
select name into :var1 - :var&numvars
from dictionary.columns
where libname="&librf" and memname="&&ds&i" and type='char';
quit;
data _null_;
set &&ds&i;
%do j=1 %to &numvars;
if &&var&j = "&string" then
put "String &string found in dataset &librf..&&ds&i for variable &&var&j";
%end;
run;
%end;
%end;
%mend;
%grep(work,Source Location);
The log returns: "String Source Location found in dataset WORK.RAW_IMPORT for variable C" (the third), which is correct.
I just need usable macro variables equal to "C" and "3" at the end. This macro will be part of a larger macro (or a prelude to it) so the two macro variables need to reset with each dataset I run through it. Thanks for any help offered.