0
votes

Long story short, I need to compare two data sets (A and B). B should be a copy of A but on an Impala server. Everyday I am retrieving B to SAS from the Impala server through a SQL pass-through.

I am writing a program that will be run everyday to make sure those two data sets match 1:1. However, I am struggling with numerical approximations (precision).

Here is an example of A:

ID value
01 0
02 5000
03 978908.69
04 109789503.12
05 49505954.92

In order to make the comparison, I am concatenating the two columns in a conc column and then I am comparing A and B.

data want;
set have;
conc=cats(id,value);
run;

This basically works for all the observations except one. The observation has a value for A of 9128554507.9 and B of 9128554507.8. No format or informat are applied to the variables.

                                                enter image description here

However, when applying a comma32.30 format on the value variable, I see that for A and B, the value is the same 9128554507.850000000000000000000. So they should be the same.

                                 enter image description here

Then I saw the following in the documentation for the CATS function

The CATS function removes leading and trailing blanks from numeric arguments after it formats the numeric value with the BESTw. format.

Then I thought it would be a good idea to transform the numeric variables to character variables. I did it using this macro:

/*macro to convert all numeric to char*/
%macro vars(dsn, outp);
  %let list=;
  %let type=;
  %let dsid=%sysfunc(open(&dsn));
  %let cnt=%sysfunc(attrn(&dsid,nvars));
   %do i = 1 %to &cnt;
    %let list=&list %sysfunc(varname(&dsid,&i));
    %let type=&type %sysfunc(vartype(&dsid,&i));
   %end;
  %let rc=%sysfunc(close(&dsid));
  data &outp(drop=
    %do i = 1 %to &cnt;
     %let temp=%scan(&list,&i);
       _&temp
    %end;);
   set &dsn(rename=(
    %do i = 1 %to &cnt;
     %let temp=%scan(&list,&i);
       &temp=_&temp
    %end;));
    %do j = 1 %to &cnt;
     %let temp=%scan(&list,&j);
   /** Change C to N for numeric to character conversion  **/
     %if %scan(&type,&j) = N %then %do;
   /** Also change INPUT to PUT for numeric to character  **/
      &temp=PUT(_&temp,best.);
     %end;
     %else %do;
      &temp=_&temp;
     %end;
    %end;
  run;
%mend vars;

See that the best. format was used. Unfortunately I end up with the same value has before, a value for A of 9128554507.9 and B of 9128554507.8. I don't know what is happening between the two data sets? The "real" value seems to be the same but when applying the best. format for both of then, SAS rounds the value of A to .9 and the value of B to .8.

                                                enter image description here

Is there any workaround ? I can't reproduce the error when trying to manually input the observation in a datalines statement. I don't want to round the values to a pre-defined decimal. Ideally, I would want to dynamically truncate the value to the number of actual decimal - 1 (e.g. 3462829.374 becomes 3462829.37 and 18726347.39 becomes 18726347.3) for both tables and then compare them.

2
I thought the goal was to compare them, so why are concatenating the two variables? Why not just compare using the original two variables?Tom
In the larger problem, I have ~15 columns in A and in B. I concat them and use md5() to compute a hash that I compare between the two tables. I only showed the variable (value) because this is the only one that cause problem.Kermit
I am curious why you are using macro functions for this instead of arrays in the data step. It would make the code easier to work with.Stu Sztukowski

2 Answers

2
votes

I have no idea what your larger problem is but to answer the question at the end just convert the number to a string using BEST32. format and remove the last character when the resulting string contains a decimal point.

data test;
  input number expect $32. ;
  string=put(number,best32.-l);
  if index(string,'.') then string=substrn(string,1,length(string)-1);
  format number best32. ;
cards;
3462829.374  3462829.37
18726347.39  18726347.3
12345 12345
.
;

Results:

Obs         number    expect        string

 1     3462829.374    3462829.37    3462829.37
 2     18726347.39    18726347.3    18726347.3
 3           12345    12345         12345
 4               .
0
votes

Can you compute the fuzz and examine that ?

* retrieve a data from impala;
proc sql;
  create table b as select * from connection to impala (select * from a);

data a_b_fuzz;
  merge a b(rename=number=impala_number);
  by id;
  fuzz = number - impala_number;
run;