This should be very simple, but somehow I confuse myself.

data in_both 
   missing_name (drop = name);

   merge employee (in=in_employee)
         hours (in = in_hours);

         by ID;

   if in_employee and in_hours then output in_both;
   else if in_employee and not in_hours then output missing_name;

run;

I have two questions:

(1) For the first statement, "missing_name (drop = name)", I understand that it means keep all the data except the column named name. But which data is kept here? What is the input?

(2) I know we can create two datasets within one data step, but doesn't that mean we should write "data in_both missing_name" instead of "data in_both"?

Many thanks for your time and attention. I appreciate your help.

3 Answers

0
votes

(1) The DROP= option refers to dropping variables from the dataset MISSING_NAME. With no drop= or keep= option, all variables that exist in EMPLOYEE or HOURS would be written to MISSING_NAME. You can run PROC CONTENTS on the four datasets to see which variables are included in each.
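As a rough sketch of that check, assuming the data step from the question has already run so that all four datasets exist in the WORK library, and assuming NAME comes from EMPLOYEE:

```sas
/* List the variables in the two inputs and the two outputs. */
proc contents data=employee;     run;
proc contents data=hours;        run;
proc contents data=in_both;      run;

/* NAME should be absent here because of the DROP= dataset option. */
proc contents data=missing_name; run;
```

Comparing the variable lists for IN_BOTH and MISSING_NAME should show that the only difference is the dropped variable.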

(2) As written, your code will output two datasets, IN_BOTH and MISSING_NAME. As @Tom just commented, your current DATA statement already lists both datasets: the semicolon ends the statement, not the white space or line break.
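To make that concrete, here is the same step with the DATA statement written on a single line. Only the layout changes; this is meant as an equivalent rewrite of the posted code, not a different approach:

```sas
/* Both output datasets are named before the first semicolon. */
data in_both missing_name (drop = name);
   merge employee (in = in_employee)
         hours    (in = in_hours);
   by ID;

   if in_employee and in_hours then output in_both;
   else if in_employee and not in_hours then output missing_name;
run;
```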

0
votes

The DATA statement determines which datasets the data step creates. Dataset options, like the DROP= option in your example, can be used to control which variables are written to each of those datasets.

The OUTPUT statement decides which observations are written. In your example, the IF/THEN/ELSE logic determines which OUTPUT statement executes for each observation.
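A small self-contained sketch may help show the two mechanisms side by side. The sample rows and the variable hours_worked are made up for illustration; only the dataset names and the final step come from the question:

```sas
/* Hypothetical sample data, not from the question. */
data employee;
   input ID name $;
   datalines;
1 Ann
2 Bob
3 Cat
;
run;

data hours;
   input ID hours_worked;
   datalines;
1 40
3 35
4 20
;
run;

/* The step from the question, unchanged in logic. */
data in_both
     missing_name (drop = name);
   merge employee (in = in_employee)
         hours    (in = in_hours);
   by ID;

   /* OUTPUT picks the destination dataset for each observation. */
   if in_employee and in_hours then output in_both;
   else if in_employee and not in_hours then output missing_name;
run;
```

With this data, IDs 1 and 3 should go to IN_BOTH with all three variables, ID 2 should go to MISSING_NAME without NAME, and ID 4 (present only in HOURS) matches neither condition and is written nowhere.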

0
votes

Using your posted code:

data in_both 
   missing_name (drop = name);
   merge employee (in=in_employee)
          hours (in = in_hours);
         by ID;
run;

Inputs: employee and hours (the datasets listed on the MERGE statement)

Outputs: in_both and missing_name

In this example the output missing_name has the column NAME dropped.

If the line breaks are confusing, the best way to see what's going on is to look for the semicolon, since that is what ends each statement. At first glance I got a little confused too!