0
votes

I have a group of data sets where certain variables have been defined as having lengths >2000 characters. What I want to do is create a macro that identifies these variables and then creates a set of new variables to hold the values.

doing this in base code would be something like:

data new_dset;
set old_dset:
length colnam1 colnam2 colnam3 2000.;
colnam1 = substr(long_column,1,2000);
colnam2 = substr(long_column,2001,2000);
run;

I can build up the list of variable names and lengths as a set of macro variables, But I don't know how to create the new variables from the macro variables.

What I was thinking it would look like is:

%macro split;

data new_dset;
set old_dset;
%do i = 1%to &num_cols;

   if &&collen&i > 2000 then do;

      &&colnam&i 1 = substr(&&colnam&i,1,2000);
   end;
%en;
run;
%mend;

I know that doesn't work, but that's the idea I have.

If anyone can help em work out how I can do this I would be very grateful.

Thanks

Bryan

1

1 Answers

1
votes

Your macro doesn't need to be an entire data step. In this case it's helpful to see exactly what you're replicating and then write a macro based on that.

So your code is:

data new_dset;
set old_dset:
length colnam1 colnam2 colnam3 2000.;
colnam1 = substr(long_column,1,2000);
colnam2 = substr(long_column,2001,2000);
run;

Your macro then really needs to be:

length colnam1 colnam2 colnam3 2000.;
colnam1 = substr(long_column,1,2000);
colnam2 = substr(long_column,2001,2000);

So what you can do is put that in a macro:

%macro split(colname=);
 length &colname._1 &colname._2 $2000;
 &colname._1 = substr(&colname.,1,2000);
 &colname._2 = substr(&colname.,2001,4000);
%mend;

Then you generate a list of calls:

proc sql;
 select cats('%split(colname=',name,')') into :calllist separated by ' '
  from dictionary.columns 
  where libname = 'WORK' and memname='MYDATASET'
  and length > 2000;
quit;

Then you run them:

data new_dset;
set old_dset;
&calllist;
run;

Now you're done :) &calllist contains a list of %split(colname) calls. If you may need more than 2 variables (ie, > 4000 length), you may want to add a new parameter 'length'; or if you're in 9.2 or newer you can just use SUBPAD instead of SUBSTR and generate all three variables for each outer variable.