0
votes

I have two data sets that come from different sources, but contain similar variables. One such similar variable is "Procedure" from data set A and "SurgicalProcedure" from data set B.

I have already merged these two data sets into one giant data set we'll call data set C.

Sometimes the observations in these two variables match, sometimes they are different, sometimes an observation is missing in one but present in the other.

What I would like to do is merge the two variables so that "Procedure" overwrites whatever is in "SurgicalProcedure", but if "Procedure" is missing an observation and "SurgicalProcedure" has that observation, then the "SurgicalProcedure" observation is kept in the final merged column.

Example:

Procedure          SurgicalProcedure
1                  total mastectomy
2                  segmental mastectomy
.                  MRM
5                  Seg. Mast with IOLM
7                  .
10                 total mast. w/ IOLM, SLNB
.                  Seg. Mast with IOLM, SLNB
3                  OLM, SLNB

Desired Result:

Procedure
1
2
MRM
5
7
10
Seg. mast. w/ IOLM, SLNB
3

Can someone help me solve my problem in SAS? If not SAS, then R is okay, or even Excel, but prefer SAS.

Thanks!

2
How are you "merging" the files? You did not show any id variables that could be use to match the observations.Tom
Are you just looking for the coalescec() function?Tom

2 Answers

0
votes

The following is rather crude, but it may meet your needs. It is crude in that it assumes the value in "surgical" is always going to be what you want. There is no error checking and no warning if the value is not what you want.

The data step "MergeAB" performs a "One-to-One" reading of data sets A and B. It then examines the "procedure" variable. If the value is missing, it is replaced with the value of the "surgical" variable. Otherwise, the value of the "procedure" variable is kept. Remove the drop statement and you will be able to more clearly see how the MergeAB data step is functioning. That is, replace the line data MergedAB (drop = procedure surgical); with just data MergedAB;.

This recreates the data you describe. The ampersand & in the input statement tells SAS to read in spaces.

/*Create dataset A*/
data A;
  length procedure $ 100;
  input procedure;

datalines;
1                 
2                 
.                 
5                 
7                 
10                
.                 
3                 
;
run;  



/*Create data set B*/
data B;
  length surgical $ 100;
  input surgical &;

datalines;
total mastectomy
segmental mastectomy
MRM
Seg. Mast with IOLM
.
total mast. w/ IOLM, SLNB
Seg. Mast with IOLM, SLNB
OLM, SLNB
;
run;

This step actually performs the merge:

/*Merge A and B*/
data MergedAB (drop = procedure surgical);
  set A;
  set B;

  if missing(procedure) then merged = surgical;
  else merged = procedure;
run;

I should note that I assumed your data was of character type. However, it looks like the "procedure" column could be numeric data. If this is the case, you may need to use a PUT statement to convert the numeric to character to avoid implicit type conversions.

0
votes

From where you are just use a data step. The following should do the trick:

data want;
drop procedure SurgicalProcedure;    
set have;
if procedure = . then char_procedure = put(SurgicalProcedure, 7.) ; 
else char_procedure = procedure;
run;

Notice that your new variable is a character variable