1
votes

I'm trying to automatically create some integrity constraints based on this dataset:

ds_name | var_name | ic_clause                          | ic_msg
--------+----------+------------------------------------+-----------------------
tableA  | var1     | primary key($var$)                 | $var$ is a primary key
tableB  | var2     | check(where=($var$ in ('a', 'b'))) | invalid $var$ value

The idea is to create a generic program that loops through this dataset and creates the ICs accordingly. In this specific case, the equivalent hard coded program would be:

proc datasets nolist;
    modify tableA;
        ic create primary key(var1)
            message = "var1 is a primary key";
quit;

proc datasets nolist;
    modify tableB;
        ic create check(where=(var2 in ('a', 'b')))
            message = "invalid var2 value";
quit;

These are the steps I imagine in the program, but I need help to translate them into actual code:

  1. Fetch values for a row and put them in macro variables
  2. Replace $var$ substrings with the actual variable name in column var_name
  3. Run a generic proc datasets, eg.:

        proc datasets nolist;
            modify &my_ds;
                ic create &my_clause
                    message = &my_msg;
        quit;
    
  4. Loop through all rows

Can anyone please help me with this code? I don't know whether the steps I've suggested are the best way to implement what I'm trying to do. Basically I'm trying to simulate a relational database within SAS, and to automate things the maximum possible.

Thank you!

3
@user667489 thanks for the article! Data-driven programming really is an interesting concept.Will Razen

3 Answers

0
votes

You probably will find that you cannot turn SAS into a DBMS. It might be better to use your metadata to generate programs that check the data instead of trying to implement integrity constraints.

But the concept of data driven code generation is interesting one so lets see if we can use your example to demonstrate how to generate code from metadata. I find that it works better when you match the variable names in the metadata to the code that needs to be generated. So lets call the variable that is used to create the MESSAGE= option on the IC statement MESSAGE.

Now we can use a simple data step to generate the code. Not sure why you used pseudo code in the constraint and message fields instead of just hard coding the values, but we can use TRANWRD() function to replace the $varname$ strings with the value of the VARNAME variable.

So let's make a sample metadata file.

data ic_metadata;
  infile datalines dlm="|";
  length libname $8 memname $32 varname $32 constraint message $200;
  input libname memname varname constraint message ;
datalines;
work|tableA|var1|primary key($varname$)                |$varname$ is a primary key
work|tableB|var2|check(where=($varname$ in ('a', 'b')))|invalid $varname$ value
;

And some sample data to work on.

data tablea tableb ;
 length var1 8 var2 $8 ;
 var1+1;
 var2='a';
run;

Now let's use the metadata to generate the code and %INCLUDE to run it.

 filename code temp;
 data _null_;
   file code ;
   set ic_metadata ;
   by libname memname ;
   if first.libname then put 'proc datasets lib=' libname 'nolist;' ;
   if first.memname then put '  modify ' memname ';' ;
   constraint=tranwrd(constraint,'$varname$',trim(varname));
   message=tranwrd(message,'$varname$',trim(varname));
   put 'ic create ' constraint message= :$quote. ';' ;
   if last.memname then put 'run;';
   if last.libname then put 'quit;' ;
 run;
 %include code / source2 ;

So running the example we get a SAS log like this:

161  +proc datasets lib=work nolist;
162  +  modify tableA ;
163  +ic create primary key(var1) message="var1 is a primary key" ;
NOTE: Integrity constraint _PK0001_ defined.
164  +run;

NOTE: MODIFY was successful for WORK.TABLEA.DATA.
165  +  modify tableB ;
166  +ic create check(where=(var2 in ('a', 'b'))) message="invalid var2 value" ;
NOTE: Integrity constraint _CK0001_ defined.
167  +run;

NOTE: MODIFY was successful for WORK.TABLEB.DATA.
168  +quit;
0
votes

You can use the data variables themselves to write your macro statements as you suggested. You basically want to create a new, long variable that is what a macro call would look like written in a datastep for each row by stringing together all the variables. You can use the tranwrd function to replace the placeholder text with the actual VAR_NAME. Below should work:

data test;
    infile datalines dlm="|";
    length DS_NAME VAR_NAME IC_CLAUSE IC_MSG $50;
    input DS_NAME $ VAR_NAME $ IC_CLAUSE $ IC_MSG $;
    datalines;
    tableA  | var1     | primary key($var$)                 | $var$ is a primary key
    tableB  | var2     | check(where=($var$ in ('a', 'b'))) | invalid $var$ value
    ;
run;

** write your master macro **;
%MACRO master_loop(DS_NAME=,IC_CLAUSE=,IC_MSG=);
proc datasets nolist;
    modify &DS_NAME.;
        ic create &IC_CLAUSE.
            message = "&IC_MSG.";
quit;
%MEND;

** create all your macro statements **;
data master_strings; 
    length STR $200;
    set test;
    IC_CLAUSE1 = tranwrd(IC_CLAUSE,"$var$",strip(VAR_NAME)); /* replace the placeholder with the actual VAR_NAME contents */
    IC_MSG1 = tranwrd(IC_MSG,"$var$",strip(VAR_NAME)); /* replace the placeholder with the actual VAR_NAME contents */
    STR = %nrstr("%master_loop("||"DS_NAME="||strip(DS_NAME)||",IC_CLAUSE="||strip(IC_CLAUSE1)||",IC_MSG="||strip(IC_MSG1)||");");
run;

** put all macro statements into a list**;
** this would look similar to writing out multiple %master_loop statements if hard-coded **;
proc sql noprint;
    select STR
    into: all_macro_calls separated by " "
    from master_strings;
quit;

** submit all your macro calls **;
%put &all_macro_calls.;
0
votes

You can use call execute to make your 'hardcoded' program completely dynamic (where IC is your base dataset with the constraints):

data _null_;
set IC;
call execute("proc datasets nolist;modify "||strip(ds_name)
   ||";ic create "||tranwrd(strip(ic_clause),'$var$',strip(var_name))
   ||" message = '"||tranwrd(strip(ic_msg),'$var$',strip(var_name))
   ||"';quit;");
run;

Basically, for each observation in your dataset, call execute will execute the appropriate proc datasets by inserting the variable values (ds_name,var_name, etc...) in the right place. The tranwrd functions will take care of replacing the $var$ placeholder by the actual value of var_name.