1
votes

I want to try to find a way to count the number of observations within multiple (more than two) BY group variables. After which, I wish to delete observations pertaining to ones whose count is less than two. Here is an example of what I am trying to do:

VAR1 VAR2 VAR3
 a     a    1
 a     a    2
 a     b    1
 a     b    2
 b     a    1
 b     a    2
 b     b    1
 b     b    2
 c     a    1
 c     b    1
 d     a    1

Over here, I would like to make sure that there are exactly two distinct values of VAR3, with respect to VAR1 and VAR2.

In this example, you can see that I want to delete the last three observations as there is one value per VAR1/VAR2 pair.

Was there a simple way to do this?

I have tried:

data want;
set have;
by VAR1 VAR2 VAR3;
if first.VAR3 = last.VAR3 then delete;
run;

But that did not work as it deleted observations with the same VAR3 within the same VAR1. I need to help building something more robust.

In the end, I want this:

VAR1 VAR2 VAR3
 a     a    1
 a     a    2
 a     b    1
 a     b    2
 b     a    1
 b     a    2
 b     b    1
 b     b    2

Would appreciate any help. Thank you.

EDIT:

Giving extra clarity for what I need. I'd like to check if VAR3 contains both of the values 1 AND 2 for each combination of VAR1 and VAR2 present. Else delete if the entry if it contains only one of the values or none.

Thank you.

2

2 Answers

1
votes

Since your condition depends on all of the values in the VAR1*VAR2 group you probably want to use a double DOW loop. In the first loop calculate flags and in the second loop use those to decide which observations to write.

data have;
  input VAR1 $ VAR2 $ VAR3 @@;
cards;
a a 1 a a 2 a b 1 a b 2 b a 1 b a 2 b b 1 b b 2 c a 1 c b 1 d a 1
;


data want;
  do until(last.var2);
    set have;
    by VAR1 VAR2 VAR3;
    if var3=1 then any1=1;
    else if var3=2 then any2=1;
    else anyother=1;
  end;
  do until(last.var2);
    set have;
    by VAR1 VAR2 VAR3;
    if any1 and any2 and not anyother then output;
  end;
  drop any1 any2 anyother;
run;
0
votes

something like this.

  data have;
input
VAR1 $ VAR2 $;
datalines;
a     a
a     a
a     b
a     b
b     a
b     a
b     b
b     b
c     a
c     b
d     a 
;

proc sort data=have ;
 by var1 var2;
run;

data want;
set have;
by var1 var2;
if first.var1 or first.var2 then var3=1;
else var3+1;
if (first.var1 and last.var1) or (first.var2 and last.var2) then delete;
run;

proc print;

enter image description here