1
votes

Is there a way to be able to determine a list of all the columns used in a sas script? I want to be able to write a sas code which can go through a SAS code file and determine all the columns used (in select, conditional processing/joins, etc.).

Has anyone ever done anything similar. Need suggestions on how to do it.

Thanks!

2

2 Answers

1
votes

To do this perfectly, you would need to parse the SAS source files and extract the information you want from the resulting 'abstract syntax tree' (AST), finding all statements that deal with columns. Your code would need to 'know' about every statement, function and PROC available. You would also need to parse and execute any macro statements in the code - otherwise you wouldn't know what additional column-related statements might get generated by the macro code. Because the macro code may rely on data generated in earlier PROCs or data steps, executing the macro code would require executing the underlying SAS code as well, so to get correct results you would basically need to recreate the entire SAS system.

There are tools like ANTLR (http://www.antlr.org/) that are intended to generate code in some language (typically Java) that can parse/interpret/compile code in other languages. There's discussion of ANLTR grammars for SAS here:

https://groups.google.com/forum/#!topic/comp.soft-sys.sas/omyZ2sZR0pk

But I would say that solving this problem in the general case is virtually impossible, certainly if you want to do it just from a set of source files, ie without actually running the programs.

However, for certain well-defined subsets, you would be able to do it... Your question seems to be talking about SQL specifically, rather than SAS in general. If you don't need to process macro code, and limit yourself to just any PROC SQL steps, you might even be able to use a pre-existing SQL parser (there are several listed here: http://www.antlr3.org/grammar/list.html) or just write SAS code to look for a limited subset of SQL statements and clauses - SELECT, ON, WHERE etc.

Good luck!

0
votes

A way to obtain metadata is Proc contents.

proc contents data=File_in_question 
    out= Data_out(keep=name) noprint;
run;

There is a lot of more information there. To access it drop the (keep=name) restriction. For more on proc contents see: http://libguides.library.kent.edu/SAS/ViewContents