1
votes

I have a binary file of fixed length records created in MS SSIS which I need to read into SAS 9.4 64bit. Currently the file is read within a data step using this code:

data outputdata.(EOC=no
                  compress = yes 
                  keep = a b c);

length a $4.;
length b 4.;

infile "&inputfile." obs= 999999999 lrecl=308 recfm=F;

input @5    a $4.
      @9    b ib4.
      @13   c rb4.
;
...
...
...

All variables are read correctly into the output dataset except c. c is a floating point number with 2dp, minimum value 0.00 and maximum value 99.99. In case it's useful, c starts its life off as a VB.Net Single value which is converted to binary using VB.Net's BitConverter.GetBytes(Single) which returns a 4-byte array. This array is then written to the binary record.

From what I can tell from my research on the subject rb4. is the correct way to read a 4-byte floating point ('real'?) value from a binary record in SAS so presumably the issue lies in how to then format that value so that it appears correctly in the output dataset. I've tried the following:

format c rb2.2;
format c 2.2;
format c 4.;

along with variations on the values of the formats statements (e.g. format c 5.; etc). None of the formats I've tried have resulted in anything close to the correct values; most result in numbers in scientific form such as 17E9.

c is a new addition to the binary file and is the only 'real' variable contained within it so I don't have an example to work from. I'm new to SAS and have inherited this project so there's a good chance the issue is something fairly fundamental!

Any guidance appreciated. Thanks

1
Is this SSIS /etc. on a Windows machine? And for reference, best12. should be a perfectly fine format to use (which is the default if you don't mention a format). Your issue is input informat, not output format, likely.Joe
From a quick scan of the docs for the VB.NET function and SAS's RBw.d format, I would say that RB4. is a reasonable choice, but FLOAT4. might be better - see the note about IEEE floats here: support.sas.com/documentation/cdl/en/leforinforref/69823/HTML/…. It would also be worth checking the byte order in the source binary file - you might be reading the value with the wrong endian-ness. Try opening the file in a hex editor and converting the bytes you find with this: h-schmidt.net/FloatConverter/IEEE754.htmlChris Long
For the most useful help, you might read it in with $4 and do put c $hex4. and post a few lines, with the value you think it should be, and what $hex4shows it as. Chris might be correct about endian issues, though SSIS should be on Windows I would expect...Joe
Yes, sorry - all on Windows, so hopefully endianness shouldn't be an issue. Thanks for the pointers; will give them a try and report backSpink
Chris hit the nail on the head - I needed float4.. The values now come out perfectly (with a little tweaking via a format statement). Thanks both for your comments - I don't know if this is the done thing here but Chris if you'd like to construct a formal answer I'll mark it as correct.Spink

1 Answers

1
votes

Repeating my comment as an answer...

You should use FLOAT4. to read a value that was written by the VB.NET BitConverter.GetBytes(Single) function. The RB4. informat reads four input bytes as if they are a truncated double-precision floating-point value, but the output of the VB.NET function is a single-precision floating-point value, aka a 'float', which is not the same thing.

The note on SAS's documentation page for the FLOAT format explains:

The FLOATw.d informat is useful in operating environments where a float value is not the same as a truncated double.

On the IBM mainframe systems, a four-byte floating-point number is the same as a truncated eight-byte floating-point number. However, in operating environments that use the IEEE floating-point standard, such as the IBM PC-based operating environments and most UNIX platforms, a four-byte floating-point number is not the same as a truncated double. Therefore, the RB4. informat does not produce the same results as FLOAT4. Floating-point representations other than IEEE might have this same characteristic. Values read with FLOAT4. typically come from some other external program that is running in your operating environment.