Inputting missing value in primary dataset based on values in secondary dataset and a matching condition

Question

my understanding of SAS is very elementary. I am trying to do something like this and i need help.

I have a primary dataset A with 20,000 observations where Col1 stores the CITY and Col2 stores the MILES. Col2 contains a lot of missing data. Which is as shown below.

+----------------+---------------+
|       Col1     |    Col2       |
+----------------+---------------+
|  Gary,IN       |       242.34  |
+----------------+---------------+
| Lafayette,OH   |       .       |
+----------------+---------------+
|   Ames, IA     |       123.19  |
+----------------+---------------+
|  San Jose,CA   |       212.55  |
+----------------+---------------+
| Schuaumburg,IL |       .       |
+----------------+---------------+
| Santa Cruz,CA  |       454.44  |
+----------------+---------------+

I have another secondary dataset B this has around 5000 observations and very similar to dataset A where Col1 stores the CITY and Col2 stores the MILES. However in this dataset B, Col2 DOES NOT CONTAIN MISSING DATA.

+----------------+---------------+
|       Col1     |    Col2       |
+----------------+---------------+
| Lafayette,OH   |       321.45  |
+----------------+---------------+
|  San Jose,CA   |       212.55  |
+----------------+---------------+
| Schuaumburg,IL |       176.34  |
+----------------+---------------+
| Santa Cruz,CA  |       454.44  |
+----------------+---------------+

My goal is to fill the missing miles in Dataset A based on the miles in Dataset B by matching the city names in col1.

In this example, I am trying to fill in 321.45 in Dataset A from Dataset B and similarly 176.34 by matching Col1 (city names) between the two datasets.

I am need help doing this in SAS

Pekka Pekka · Accepted Answer · 2014-08-16T19:07:48

You just have to merge the two datasets. Note that values of Col1 needs to match exactly in the two datasets.

Also, I am assuming that Col1 is unique in dataset B. Otherwise you need to somehow tell more exactly what value you want to use or remove the duplicates (for example by adding nodupkey in proc sort statement).

Here is an example how to merge in SAS:

proc sort data=A;
    by Col1;

proc sort data=B;
    by Col1;

data AB;
    merge A(in=a) B(keep=Col1 Col2 rename=(Col2 = Col2_new));
    by Col1;
    if a;
    if missing(Col2) then Col2 = Col2_new;
    drop Col2_new;
run;

This includes all observations and columns from dataset A. If Col2 is missing in A then we use the value from B.

Inputting missing value in primary dataset based on values in secondary dataset and a matching condition

2 Answers