0
votes

I need to create a (simple?) error report from a dataset. The dataset looks like this:

Contract   DrvrNum   LicNum
 -------   -------   ---------
 2212621         2   8241323
 2212621         2   65256129
 6385371         1   973385261
 6385371         3   973385261
 2366922         1   B931151BA
 2366922         2   B931151BA
 1007922         1   60916004
 1007922         2   60916004

The first 2 observations indicate that I have two different license numbers for the same driver, whereas the next three pairs of observations indicate that a license number has been duplicated for two or more drivers.

My output needs to look like this:

Contract   DrvrNum   LicNum     ErrorReason
 -------   -------   ---------  -----------
 2212621         2   8241323    
 2212621         2   65256129   Multiple License Numbers for Same Driver
 6385371         1   973385261
 6385371         3   973385261  Duplicate License Number
 2366922         1   B931151BA
 2366922         2   B931151BA  Duplicate License Number
 1007922         1   60916004
 1007922         2   60916004   Duplicate License Number

I have tried using the LAG() function in a data step combined with first.Contract = 0, but because every other observation is FALSE, the lag values got all out of whack, keeping me from doing something like:

if LicNum = lag(LicNum) then ErrorReason = 'Duplicate License Number';
else ErrorReason = 'Multiple License Numbers for Same Driver';

If anyone can provide some assistance, I would be grateful. I'm at my wit's end with this one.

Thanks!

1
tried using the LAG() function in a data step combined with first.Contract = 0, can you show that code ? Were they in separate statements or combined in a single statement. Values that need to be carried over to the next iteration of the implicit loop should be stored in a retained variable.Richard

1 Answers

1
votes

The queue of values for LAG() is based on when you execute the LAG() function. It has nothing to do with the observations in your dataset. So in general you do not want to conditionally execute the LAG() function. So instead you should assign the value unconditionally to a variable and then you can conditionally test the value of the variable.

But LAG() will not solve your problem if there are more than two observations per contract. Try something like this that will keep track of all of the license numbers under the contract.

data have ;
  input Contract :$10. DrvrNum LicNum :$10. ;
cards;
2212621 2 8241323
2212621 2 65256129
6385371 1 973385261
6385371 3 97338526x
6385371 3 973385261
2366922 1 B931151BA
2366922 2 B931151BA
1007922 1 60916004
1007922 2 60916004
;

data want ;
   set have ;
   by contract drvrnum licnum notsorted;
   length ErrorReason $100 LicenseList $200 ;
   retain licenselist ;
   if first.contract then licenselist=LicNum;
   else do;
     if indexw(licenselist,LicNum,' ') then
       ErrorReason = catx(' ',ErrorReason,'Duplicate License Number.')
     ;
     else licenselist=catx(' ',licenselist,licnum);
     if first.licnum and not first.drvrnum then
       ErrorReason = catx(' ',ErrorReason,'Multiple License Numbers for Same Driver.')
     ;
   end;
run;

enter image description here