1
votes

I have two data sets in which some records are relevant to each other.

E.g.

Dataset1

Var1

abcde

bad man

big bang

strange

everyday

exactly

Dataset2

var1

abc

cde

bad

bad man a

stranger

Now I want to compare those records using a loop logic, and here is my code.

%let id1=%sysfunc(open(dataset2,in)); %let colterm=%sysfunc(varnum(&id1,var1)); %do %while(%sysfunc(fetch(&id1)) eq 0); %let vterm=%sysfunc(getvarc(&id1,&colterm));

        data dataset1;
            set dataset1;
            if index(strip(var1),strip("&vterm"))>0 or index(strip("&vterm"),strip(var1))>0 then do;/*when one contains the other*/
            match="Fuzzy";
            cnt=cnt+1;
            end;
        run;
    %end;

    proc sql noprint;
      select max(cnt) into:maxnum/*to get max cnt*/
      from dataset1;
    quit;

Now dataset1 looks like below

Var1 cnt match

abcde 2 Fuzzy

bad man 2 Fuzzy

big bang 0

strange 1 Fuzzy

everyday 0

exactly 0

I want to merge those relevant records in dataset2 into dataset1, and the new dataset1 should look like below

Var1 cnt match FM_dataset2_1 FM_dataset2_2

abcde 2 Fuzzy abc cde

bad man 2 Fuzzy bad bad man a

big bang 0

strange 1 Fuzzy stranger

everyday 0

exactly 0

As you can see the new variables FM_dataset2_1 and FM_dataset2_2 are auto-reassigned ones based one counter, cnt. But I just couldn't think out a proper way of realizing this step using SAS code.

Further more, I need to output the dataset into an xml file. And the result should look like below

<text>abcde</text>
<match>Fuzzy</match>
  <matchitem>abc</matchitem> 
  <matchitem>tecde</matchitem> 

The problem, as with the issue above, is also about how to determine the number of matchitem element and write into the file. In xml map file, I can determine the position as follows

     <COLUMN name="FM_dataset2_1">
            <PATH syntax="XPath">/../matchitem[position()=**1**]</PATH>
...

     <COLUMN name="FM_dataset2_2">
            <PATH syntax="XPath">/../matchitem[position()=**2**]</PATH>

But this has to be done mannually case by case. Is it possible to customize map file based on cnt counter(maxnum) automatically?

Can anybody suggest?

1

1 Answers

1
votes

I'm sure there is more efficient code than the following, but I tried to stay with your line of thought. I am not familiar with working with the XML engine, so I'll leave that part to someone else. Otherwise, if you need to create it manually then you were on the right track creating the MAXNUM macro variable, then you can use it in a loop.

%let id1=%sysfunc(open(dataset2,in));
%let colterm=%sysfunc(varnum(&id1,var1));
%do %while(%sysfunc(fetch(&id1)) eq 0);
%let vterm=%sysfunc(getvarc(&id1,&colterm));

   data dataset1;
      set dataset1;
      format vterm $20.;
      if match eq "Fuzzy" then output;
      if index(strip(var1),strip("&vterm"))>0 or index(strip("&vterm"),strip(var1))>0 then do;
         cnt=sum(cnt,1);
         match="Fuzzy";
         vterm = "&vterm";
         output;
      end;
      else do;
         cnt=sum(cnt,0);
         output;
      end;
   run;

   proc sort data=dataset1;
      by var1 match vterm descending cnt;
   proc sort data=dataset1 nodupkey;
      by var1 match vterm;
   run;
%end;

proc sql;
   create table maxcnt as
      select
         var1,
         match,
         max(cnt) as cnt
      from dataset1
      group by 1,2
      ;
   quit;
run;

proc transpose data=dataset1 out=dataset1(drop=FM_dataset2_0 _name_) prefix=FM_dataset2_;
   by var1 match;
   id cnt;
   var vterm;
run;

data dataset1;
   merge dataset1 maxcnt;
   by var1 match;
run;

%let id2=%sysfunc(close(&id1));    /*closes out dataset2 in case you need it later */