0
votes

my understanding of SAS is very elementary. I am trying to do something like this and i need help.

I have a primary dataset A with 20,000 observations where Col1 stores the CITY and Col2 stores the MILES. Col2 contains a lot of missing data. Which is as shown below.

+----------------+---------------+
|       Col1     |    Col2       |
+----------------+---------------+
|  Gary,IN       |       242.34  |
+----------------+---------------+
| Lafayette,OH   |       .       |
+----------------+---------------+
|   Ames, IA     |       123.19  |
+----------------+---------------+
|  San Jose,CA   |       212.55  |
+----------------+---------------+
| Schuaumburg,IL |       .       |
+----------------+---------------+
| Santa Cruz,CA  |       454.44  |
+----------------+---------------+

I have another secondary dataset B this has around 5000 observations and very similar to dataset A where Col1 stores the CITY and Col2 stores the MILES. However in this dataset B, Col2 DOES NOT CONTAIN MISSING DATA.

+----------------+---------------+
|       Col1     |    Col2       |
+----------------+---------------+
| Lafayette,OH   |       321.45  |
+----------------+---------------+
|  San Jose,CA   |       212.55  |
+----------------+---------------+
| Schuaumburg,IL |       176.34  |
+----------------+---------------+
| Santa Cruz,CA  |       454.44  |
+----------------+---------------+

My goal is to fill the missing miles in Dataset A based on the miles in Dataset B by matching the city names in col1.

In this example, I am trying to fill in 321.45 in Dataset A from Dataset B and similarly 176.34 by matching Col1 (city names) between the two datasets.

I am need help doing this in SAS

2

2 Answers

1
votes

You just have to merge the two datasets. Note that values of Col1 needs to match exactly in the two datasets.

Also, I am assuming that Col1 is unique in dataset B. Otherwise you need to somehow tell more exactly what value you want to use or remove the duplicates (for example by adding nodupkey in proc sort statement).

Here is an example how to merge in SAS:

proc sort data=A;
    by Col1;

proc sort data=B;
    by Col1;

data AB;
    merge A(in=a) B(keep=Col1 Col2 rename=(Col2 = Col2_new));
    by Col1;
    if a;
    if missing(Col2) then Col2 = Col2_new;
    drop Col2_new;
run;

This includes all observations and columns from dataset A. If Col2 is missing in A then we use the value from B.

1
votes

Pekka's solution is perfectly working, I add an alternative solution for the sake of completeness.

Sometimes in SAS a PROC SQL lets you skip some passages compared to a DATA step (with the relative gain in storage resources and computational time), and a MERGE is a typical example.

Here you can avoid sorting both input datasets and handling the renaming of variables (here the matching key has the same name col1 but in general this is not the case).

proc sql;
    create table want as
    select A.col1,
            coalesce(A.col2,B.col2) as col2
        from A left join B
            on A.col1=B.col1
        order by A.col1;
quit;

The coalesce() function returns the first non missing element encountered in the arguments list.