2
votes

I'm trying to find a way to identify the subset of numeric variables that represent dates, times or datetimes from an arbitrary dataset that contains many variables of both numeric or character type. The goal is to have a macro or template that I could run against any given dataset. I believe that the SAS formats for these variables should correctly represent what the variables are, but I'm trying to see if I can avoid having to create some code that parses the text of the SAS format for any of the applicable format names.

I was originally thinking it should be possible to do something like the following to get the SAS formats for a dataset and then do various things based on the columns that come back as date/time/datetime.

PROC SQL;
   CREATE TABLE lib_X.Dataset_A AS
   SELECT tbl_CL.LIBNAME
         ,tbl_CL.MEMNAME AS TABLE_NAME
         ,tbl_CL.NAME    AS COLUMN_NAME
         ,tbl_CL.FORMAT  AS SAS_Format
         ,tbl_CL.TYPE    AS SAS_Type
         ,tbl_CL.varnum
         ,tbl_CL.LENGTH
   FROM DICTIONARY.COLUMNS AS tbl_CL
   WHERE tbl_CL.MEMNAME = %UPCASE("&SAS_DatasetNm")
     AND tbl_CL.LIBNAME = %UPCASE("&SAS_LibNm");
QUIT;

Followed by:

DATA lib_X.Dataset_A;
   SET lib_X.Dataset_A;
   IF FCN_FORMAT_TYPE(SAS_Format) = "DATETIME" 
   THEN ...;
RUN;

Where the FCN_FORMAT_TYPE function is some function that examines the SAS format and returns which of the 4 categories of SAS Format (character, numeric, date/time, or ISO 8601) the variable is, but there doesn't seem to be an existing function that does this that I can find. Perhaps there's another approach that would work?

Currently running on SAS 9.4 M2 on a Linux server. The code will primarily be used via batch file, but possibly via Enterprise Guide as well (EG 7.1 or 6.1)

2

2 Answers

1
votes

No out-of-the-box solution, but here's a workable one.

For simplicity, let's work with dates only, excluding time and datetime formats. The solution can easily be generalized to include those.

As a first step, you work out the filters that will select the plausible date formats you can expect. Here's an example of a query that will match about 100 SAS date formats.

proc sql;
  create table fmts as
  select fmtname
    from sashelp.vformat
   where fmttype = "F"
         and fmtname not like '$%'
         and prxmatch("/(DATE|YY|YEAR)/",fmtname)
         and not prxmatch("/(DT|TIME)/",fmtname)
   order by fmtname;
quit;

Once you've fine-tuned your WHERE filter, you can use it in a macro routine like the following, where the 3rd argument vmname is the name of the global macro variable you wish to store the date column names in.

%macro getDateCols(libname, dataset, vmname);

  %global &vmname;

  proc sql noprint;
    select name
      into :&vmname separated by ", " /* Use ", " or " " depending on your needs */
      from dictionary.columns
     where libname = upcase("&libname")
           and memname = upcase("&dataset")
           and format not like '$%'
           and prxmatch("/(DATE|YY|YEAR|MONTH|DAY|WEEK)/", format)
           and not prxmatch("/(DT|TIME)/", format);
  quit;

%mend getDateCols;

After calling the macro, you should get the column name HireDate stored in &dsdates:

%getDateCols(sasuser, empdata, dsdates);
%put &dsdates;

Then you can use the macro variable in proc sql, for example...

proc sql;
  select &dsdates 
   from sasuser.empdata;
quit;
1
votes

Dominic's answer gave me an idea about how to solve it, using the same general approach, but putting it in a PROC FCMP function instead of a macro.

PROC FCMP OUTLIB=LIB_X.FCMP_FUNCS.Format_Category_ID;
   FUNCTION Format_Category(VAR_Format $, VAR_Type $) $ 16
        GROUP = 'Format Category ID'
        LABEL = 'Pass SAS variable type and format as CHAR and it will return what sort of variable it is. 
                 Breaks down date/time/dt formats into independent categories.';
        LENGTH FORMAT_CATEGORY $16;
        SELECT;
            WHEN        (MISSING(VAR_Format))                                                   FORMAT_CATEGORY = 'MISSING_FORMAT';
            WHEN        (VAR_Type = 'CHAR')                                                     FORMAT_CATEGORY = 'CHARACTER_FORMAT';
            WHEN        (PRXMATCH("/S370FZDT/", VAR_Format))                                    FORMAT_CATEGORY = 'NUMERIC_FORMAT';
            WHEN        (PRXMATCH("/8601/", VAR_Format))                                        FORMAT_CATEGORY = 'ISO_8601_FORMAT';
            WHEN        (PRXMATCH("/(DT|DATEAMPM|DATETIME)/", VAR_Format))                      FORMAT_CATEGORY = 'DATETIME_FORMAT';
            WHEN        (PRXMATCH("/(HHMM|HOUR|MMSS|TIME|TOD)/", VAR_Format))                   FORMAT_CATEGORY = 'TIME_FORMAT';
            WHEN        (PRXMATCH("/(DAT|DAY|YY|DOW|JUL|MON|QTR|WEEK|YEAR)/", VAR_Format))      FORMAT_CATEGORY = 'DATE_FORMAT';
            WHEN        (VAR_Type = 'NUM')                                                      FORMAT_CATEGORY = 'NUMERIC_FORMAT';
            OTHERWISE                                                                           FORMAT_CATEGORY = 'UNKNOWN';
        END;
        RETURN(FORMAT_CATEGORY);
   ENDSUB;

It's definitely not bullet-proof, but it worked for my purposes and will hopefully be useful to others. The biggest caveat I'd add is that handling user-defined formats will be touch and go since it's simply parsing the SAS format which is passed as a character variable, so if the name overlaps with the regular expressions I'm using, all bets are off.