Dataset HAVE includes id values and a character variable of names. Values in names are usually missing. If names is missing for all values of an id EXCEPT one, the obs for IDs with missing values in names can be deleted. If names is completely missing for all id of a certain value (like id = 2 or 5 below), one record for this id value must be preserved.
In other words, I need to turn HAVE:
id names
1
1
1 Matt, Lisa, Dan
1
2
2
2
3
3
3 Emily, Nate
3
4
4
4 Bob
5
into WANT:
id names
1 Matt, Lisa, Dan
2
3 Emily, Nate
4 Bob
5
I currently do this by deleting all records where names is missing, then merging the results onto a new dataset KEY with one variable id that contains all original values (1, 2, 3, 4, and 5):
data WANT_pre;
set HAVE;
if names = " " then delete;
run;
data WANT;
merge KEY
WANT_pre;
by id;
run;
This is ideal for HAVE because I know that id is a set of numeric values ranging from 1 to 5. But I am less sure how I could do this efficiently (A) on a much larger file, and (B) if if I couldn't simply create an id KEY dataset by counting from 1 to n. If your HAVE had a few million observations and your id values were more complex (e.g., hexadecimal values like XR4GN), how would you produce WANT?
namesin different rows with the sameid? If so, whichnameswould you keep? - user667489