2
votes

I have approximately 1,000,000 rows and 25 columns of data and I'm trying to return a list of column names, the number of distinct values and whether there are missing values.

I am not able to directly code in column names in PROC SQL and count distinct as I have numerous data sets with different column names and I'm trying to automatically return the desired outcome for all tables with one piece of code.

I've tried running the following code

proc freq nlevels data= &DATASET_NAME;
    ods output nlevels=nlevels ;
    tables _all_ NOPRINT;
run;

This returns an out of memory error. Is there another way to achieve the result, avoiding the out of memory error.

2

2 Answers

0
votes

It is unnecessary to input column name by table _all_, but it possibly makes out of memory by inputting all columns at the same time, try to separate column to do proc freq and then combine results:

proc sql;
   create table name as
   select name from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;

data want;
run;

data _null_;
   set name;
   call execute(
      'proc freq data=class nlevels;
       table '||name||';
       ods output nlevels=nlevels;
       run;
       data want;
          set want nlevels;
       run;'
    );
run;
0
votes

This question is very similar to SAS summary statistic from a dataset

The answers cover techniques for

  • transpose + freq
  • hash
  • freq w/ ODS exclude+output