0
votes

I need help checking if several variables are not empty. Normally, a "where VarName is not missing" would suffice, however, the number of variables that are generated will vary.

I have the following macro I found which correctly determines the number of variables when called in a data step:

%macro var_count(var_count_name);
    array vars_char _character_;
    array vars_num _numeric_;
    &var_count_name = dim(vars_char) + dim(vars_num);
%mend;

My datasets is creating a variable number of COLs (i.e. COL1, COL2, COL3, etc.) depending on the dataset I use. I would like to use this in a data step where it returns observations where each of my generated COL1, COL2, COL3, etc. variables are looked at. I envision something like below:

Data Want;
    set Have;

    where cats(COL, %var_count(total_vars)) is not missing;
run;

But this does not work. I would very much like to avoid having to write "where COL1 is not missing or COL2 is not missing or ..." everytime the program is run. Any and all help will be appreciated.

EDIT: I fear I may have been too unspecific in my needs above. I will try to be more clear below. Not sure if I should make a new post, but here goes.

Dataset that I have (CVal = Character value)

| ID | COL1 | COL2 | COL3 | COL4 | COL5 | COL6 | COL7 |
| 1  |      |      |      |      |      |      | CVal |
| 2  | CVal | CVal |      |      |      |      |      |
| 3  |      |      |      |      |      |      |      | 
| 4  |      | CVal |      |      |      |      |      | 

I would like to keep ID1, 2 and 4, due to there being information in either COL1 through COL7 in each of these.

Essentially I would like a piece of code that can do the following:

Data Want;
    Set data have; 

    if missing(COL1) and missing(COL2) and missing(COL3) and missing(COL4) 
    and missing(COL5) and missing(COL6) and missing(COL7) then delete; 

run;

My problem is then, the number of COLs will vary depending on the input dataset. It may sometimes just be COL1-COL5, sometimes COL1-COL20. How can this be made "automatic", so it will automatically register the number of COL-columns and then automatically check those columns, if they are all empty and then delete the observation.

2
What is the purpose of the WHERE statement in your example? Are you trying to exclude observations that only have missing values?Tom
I see I could have specified a lot more clearly. I want to create a dynamic where statement, where I only include COLs that are not empty. The number of COLs (COL1, COL2, COL3, etc.) may vary though from time to time when running the script depending on the dataset I receive.Andreas Redsted
I updated my answer for your example dataset. if not missing(cats(of COL:));Tom

2 Answers

3
votes

In your case to test if any of the COL: variables is non-empty you can just test if the concatenation of them is non-empty.

data want;
  set have;
  if not missing(cats(of COL:));
run;

You need to use subsetting IF because you cannot use variable lists in a WHERE statement.

Example:

35   data test;
36    set sashelp.class;
37    where nmiss(of height weight) > 0 ;
                     ------
                     22
                     76
ERROR: Syntax error while parsing WHERE clause.
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, ), *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ,
              GE, GT, IN, IS, LE, LIKE, LT, NE, NOT, NOTIN, OR, ^, ^=, |, ||, ~, ~=.

ERROR 76-322: Syntax error, statement will be ignored.

38   run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEST may be incomplete.  When this step was stopped there were 0 observations and 5 variables.
WARNING: Data set WORK.TEST was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

Note that if any of the COL... variables is numeric you would need to modify the test a little. When the MISSING option is set to ' ' then the line above will work. But if it is set to the normal '.' then the numeric missing values will appear as periods. If you don't mind also treating any character values that just have periods as missing also then you could just use compress() to remove the periods.

if not missing(compress(cats(of COL:),'.'));
1
votes

You can use N to count the number of non-missing numerics and CATS to check for some character values being not-missing.

Example:

Presume numeric and character variables are segregated with prior variable based array statements such as

  array chars col1-col7;
  array nums x1-x10;

The subsetting if would be

  if N(of vars_num(*)) or not missing (CATS(of vars_char(*)));

or test using COALESCE and COALESCEC

  if not missing(coalesce(of nums(*))) or
     not missing(coalesceC(of chars(*)));

If you don't know the variable names ahead of time, you will need to examine the data set ahead of the subsetting DATA step and codegen the array statements into macro variables.