Reading binary file into SAS

Question

I have a binary file of fixed length records created in MS SSIS which I need to read into SAS 9.4 64bit. Currently the file is read within a data step using this code:

data outputdata.(EOC=no
                  compress = yes 
                  keep = a b c);

length a $4.;
length b 4.;

infile "&inputfile." obs= 999999999 lrecl=308 recfm=F;

input @5    a $4.
      @9    b ib4.
      @13   c rb4.
;
...
...
...

All variables are read correctly into the output dataset except c. c is a floating point number with 2dp, minimum value 0.00 and maximum value 99.99. In case it's useful, c starts its life off as a VB.Net Single value which is converted to binary using VB.Net's BitConverter.GetBytes(Single) which returns a 4-byte array. This array is then written to the binary record.

From what I can tell from my research on the subject rb4. is the correct way to read a 4-byte floating point ('real'?) value from a binary record in SAS so presumably the issue lies in how to then format that value so that it appears correctly in the output dataset. I've tried the following:

format c rb2.2;
format c 2.2;
format c 4.;

along with variations on the values of the formats statements (e.g. format c 5.; etc). None of the formats I've tried have resulted in anything close to the correct values; most result in numbers in scientific form such as 17E9.

c is a new addition to the binary file and is the only 'real' variable contained within it so I don't have an example to work from. I'm new to SAS and have inherited this project so there's a good chance the issue is something fairly fundamental!

Any guidance appreciated. Thanks

Is this SSIS /etc. on a Windows machine? And for reference, best12. should be a perfectly fine format to use (which is the default if you don't mention a format). Your issue is input informat, not output format, likely. — Joe
From a quick scan of the docs for the VB.NET function and SAS's RBw.d format, I would say that RB4. is a reasonable choice, but FLOAT4. might be better - see the note about IEEE floats here: support.sas.com/documentation/cdl/en/leforinforref/69823/HTML/…. It would also be worth checking the byte order in the source binary file - you might be reading the value with the wrong endian-ness. Try opening the file in a hex editor and converting the bytes you find with this: h-schmidt.net/FloatConverter/IEEE754.html — Chris Long
For the most useful help, you might read it in with $4 and do put c $hex4. and post a few lines, with the value you think it should be, and what $hex4shows it as. Chris might be correct about endian issues, though SSIS should be on Windows I would expect... — Joe
Yes, sorry - all on Windows, so hopefully endianness shouldn't be an issue. Thanks for the pointers; will give them a try and report back — Spink
Chris hit the nail on the head - I needed float4.. The values now come out perfectly (with a little tweaking via a format statement). Thanks both for your comments - I don't know if this is the done thing here but Chris if you'd like to construct a formal answer I'll mark it as correct. — Spink

Chris Long Chris Long · Accepted Answer · 2017-07-12T12:00:26

Repeating my comment as an answer...

You should use FLOAT4. to read a value that was written by the VB.NET BitConverter.GetBytes(Single) function. The RB4. informat reads four input bytes as if they are a truncated double-precision floating-point value, but the output of the VB.NET function is a single-precision floating-point value, aka a 'float', which is not the same thing.

The note on SAS's documentation page for the FLOAT format explains:

The FLOATw.d informat is useful in operating environments where a float value is not the same as a truncated double.

On the IBM mainframe systems, a four-byte floating-point number is the same as a truncated eight-byte floating-point number. However, in operating environments that use the IEEE floating-point standard, such as the IBM PC-based operating environments and most UNIX platforms, a four-byte floating-point number is not the same as a truncated double. Therefore, the RB4. informat does not produce the same results as FLOAT4. Floating-point representations other than IEEE might have this same characteristic. Values read with FLOAT4. typically come from some other external program that is running in your operating environment.

Reading binary file into SAS

1 Answers