Long story short, I need to compare two data sets (A
and B
). B
should be a copy of A
but on an Impala server. Everyday I am retrieving B
to SAS from the Impala server through a SQL pass-through.
I am writing a program that will be run everyday to make sure those two data sets match 1:1. However, I am struggling with numerical approximations (precision).
Here is an example of A:
ID | value |
---|---|
01 | 0 |
02 | 5000 |
03 | 978908.69 |
04 | 109789503.12 |
05 | 49505954.92 |
In order to make the comparison, I am concatenating the two columns in a conc
column and then I am comparing A
and B
.
data want;
set have;
conc=cats(id,value);
run;
This basically works for all the observations except one.
The observation has a value for A
of 9128554507.9
and B
of 9128554507.8
. No format or informat are applied to the variables.
However, when applying a comma32.30
format on the value variable, I see that for A
and B
, the value is the same 9128554507.850000000000000000000
. So they should be the same.
Then I saw the following in the documentation for the CATS function
The CATS function removes leading and trailing blanks from numeric arguments after it formats the numeric value with the BESTw. format.
Then I thought it would be a good idea to transform the numeric variables to character variables. I did it using this macro:
/*macro to convert all numeric to char*/
%macro vars(dsn, outp);
%let list=;
%let type=;
%let dsid=%sysfunc(open(&dsn));
%let cnt=%sysfunc(attrn(&dsid,nvars));
%do i = 1 %to &cnt;
%let list=&list %sysfunc(varname(&dsid,&i));
%let type=&type %sysfunc(vartype(&dsid,&i));
%end;
%let rc=%sysfunc(close(&dsid));
data &outp(drop=
%do i = 1 %to &cnt;
%let temp=%scan(&list,&i);
_&temp
%end;);
set &dsn(rename=(
%do i = 1 %to &cnt;
%let temp=%scan(&list,&i);
&temp=_&temp
%end;));
%do j = 1 %to &cnt;
%let temp=%scan(&list,&j);
/** Change C to N for numeric to character conversion **/
%if %scan(&type,&j) = N %then %do;
/** Also change INPUT to PUT for numeric to character **/
&temp=PUT(_&temp,best.);
%end;
%else %do;
&temp=_&temp;
%end;
%end;
run;
%mend vars;
See that the best.
format was used. Unfortunately I end up with the same value has before, a value for A
of 9128554507.9
and B
of 9128554507.8
. I don't know what is happening between the two data sets? The "real" value seems to be the same but when applying the best.
format for both of then, SAS rounds the value of A
to .9
and the value of B
to .8
.
Is there any workaround ? I can't reproduce the error when trying to manually input the observation in a datalines statement. I don't want to round the values to a pre-defined decimal. Ideally, I would want to dynamically truncate the value to the number of actual decimal - 1
(e.g. 3462829.374
becomes 3462829.37
and 18726347.39
becomes 18726347.3
) for both tables and then compare them.