Merging two datasets with different formats in sas

Question

I have two sas data sets as mentioned below ie Table1 and Table2. There is one common variable among these two datasets by name Account. But the problem I am facing is that the format of the data in table1 account and table2 account are diffrent as shown below. I have 2 problems here: Problem1: There are double inverted commas("") and hyphen (-) in the Table1 Account Problem2: The Table2 Account has continous numbers with different digit length. I want the Table2 Account to be a 12 digit number by prefixing required number of zero's to make them 12 digit number. Also change the format of Account number as present in Table1 so that I can pull the account numbers in Table1 matching the Table2 account numbers.

Table1

   ID    Account              dt
    1   "212-3276-45600"    454
    2   "562-3248-45674"    565
    3   "789-3946-45888"    6767
    4   "382-3776-45612"    766
    5   "232-3446-45674"    767
    6   "038-3276-45674"    77
    7   "232-3246-45674"    88

Table2

Account
562324845674
789394645888
38327645674

Joe Joe · Accepted Answer · 2013-10-30T20:41:42

To use an actual data step merge, you first have to make the variables match in type/format/etc. You could do this in SQL:

proc sql;
create table want as select t1.account, t1.id, t1.dt
  from table1 t1, table2 t2
  where input(compress(t1.account,,'kd'),BEST12.) = t2.account;
quit;

You can't manipulate the merge variable in a data step merge, but you could in a previous data step.

data table1_fixed;
set table1;
new_account = input(compress(account,,'kd'),BEST12.);
run;

Then rename table2's account to the same thing. I don't recommend trying to make t2's account fit t1, as it's more complex.

What I'm doing here is using compress to remove or keep unwanted characters; the third argument 'k' means 'keep' (not remove) and 'd' means 'digits'. So it keeps only numeric digits and removes the rest. Then input converts it to a number.

Merging two datasets with different formats in sas

1 Answers