EDIT: My earlier speculations were all wrong. The problem was that the database file had big-endian numbers and the data were being read on a little-endian computer. Skip down to the section marked "START READING HERE" to avoid wasting your time with the early speculations (left here for their very limited historical value).
I suspect that your problem is related to the printf()
format specification %lf
used to print values, together with the fact that you declared d_buffer
to be of type int
. Likely sizeof(double) > sizeof(int)
is true, so you are interpreting extra bytes of data to be part of the float values.
I don't know this for sure since I can't see the data your program is using to run, but if your fieldsize
for float data is sizeof(float)
rather than sizeof(double)
then perhaps you are storing the float values into d_buffer
just fine but messing them up when printing. Alternatively, if fieldsize
is equal to sizeof(double)
, and sizeof(double)
is greater than sizeof(int)
, then you are writing off the end of d_buffer
and then something is corrupting your data.
I suggest you change the declaration to double d_buffer[10];
and see if your program works better. Also, see if fieldsize
is set to sizeof(float)
or sizeof(double)
; if it is sizeof(float)
then declare d_buffer
as type float
and use this code:
if ( DataDes[j].fieldtype == 'F' )
{
double d = d_buffer[j];
printf("%lf ", d);
}
EDIT: Also, I recommend that you switch to using fopen()
and fread()
for all your I/O. The basic open()
and read()
can return EINTR
which means you need to retry the operation, so these are only properly used inside a loop that will retry the call if it returns EINTR
. The fopen()
and fread()
calls insulate you from that sort of detail, and have some buffering that can make your programs run more smoothly. (I'm pretty sure your current project is only reading and writing small amounts of data, so the performance difference likely doesn't matter much to you at this time.)
Also, I suggest you get rid of your eof()
function; it is very unusual and likely a bit slow to read a character and then use fseek()
to put it back. The C library has the function fgetc()
which gets one character from the input stream, and you can test that character to see if it is the EOF
value to detect end-of-file. And the C library also has the function ungetc()
that can push back one character; this will not involve actually seeking the disk, but rather will just put back the character into some buffer somewhere. But you don't even need to use fgetc()
or ungetc()
for your code; just check the return value from fread()
and if you get a zero-length read, you know you hit end-of-file. In production code, you will need to check the return value of each function call anyway; you can't get away with just hoping that every read succeeds.
EDIT: One more thing you could try: change the format code from "%lf"
to just "%f"
and see what happens. I'm not exactly sure what that l
will do and you shouldn't need it. Plain old "%f"
should format a double
. But it may not change anything: according to this web page I found, in printf()
there is no difference between "%lf"
and "%f"
.
http://www.dgp.toronto.edu/~ajr/209/notes/printf.html
* START READING HERE *
EDIT: Okay, one thing I have figured out for sure. The database format is an index value (an integer value), then a float value, then a string value. You will need to read each one to advance the current position within the file. So your current code that is checking format codes and deciding which thing to read? Not correct; you need to read an integer, a float, and a string for each record.
EDIT: Okay, here is a correct Python program that can read the database. It doesn't bother to read the metadata file; I just hard-coded in the constants (it's okay since it is just a throw-away program).
import struct
_format_rec = ">id20s" # big-endian: int, double, 20-char string
_cb_rec = struct.calcsize(_format_rec) # count of bytes in this format
def read_records(fname):
with open(fname) as in_f:
try:
while True:
idx, f, s = struct.unpack(_format_rec, in_f.read(_cb_rec))
# Python doesn't chop at NUL byte by default so do it now.
s, _, _ = s.partition('\0')
yield (idx, f, s)
except struct.error:
pass
if __name__ == "__main__":
for i, (idx, f, s) in enumerate(read_records("db-data")):
print "%d) index: %d\tfloat: %f \ttext: \"%s\"" % (i, idx, f, s)
So the index values are 32-bit integers, big-endian; the floats are 64-bit floats, big-endian; and the text fields are 20 characters fixed (so a string of 0 to 19 characters plus a terminating NUL byte).
Here's the output of the above program:
0) index: 1 float: 3.141593 text: "Pi"
1) index: 2 float: 12.345000 text: "Secret Key"
2) index: 3 float: 2.718282 text: "The number E"
Now, when I try to compile your C code, I get garbage values out because my computer is little-endian. Are you trying to run your C code on a little-endian computer?
EDIT: To answer the most recent comment, which is a question: For each input record, you must call read()
three times. The first time you read the index, which is a 4-byte integer (big-endian). The second read is an 8-byte float value, also big-endian. The third read is 20 bytes as a string. Each read moves the current position in the file; the three reads together read a single record from the file. After you have read and printed three records, you are done.
Since my computer is little-endian, it's tricky to get the values out correctly in C, but I did it. I made a union that lets me read out an 8-byte value as an integer or as a float, and I used that as the buffer for the call to read()
; then I called __builtin_bswap64()
(a feature in GCC) to swap the big-endian value to little-endian, stored the result as a 64-bit integer, and read it out as a float. I also used __builtin_bswap32()
to swap the integer index. My C program now prints:
The database content is:
1 3.141593 Pi
2 12.345000 Secret Key
3 2.718282 The number E
So, read each record, make sure you are correct with respect to endian-ness of the data, and you will have a working program.
EDIT: Here are code fragments that show how I fixed the endianness problem.
typedef union
{
unsigned char buf[8];
double d;
int64_t i64;
int32_t i32;
} U;
// then, inside of main():
printf("\nThe database content is:\n");
{ /* ------------------
Read next record
------------------ */
for (j = 0; j < n_fields; j++)
{
U u;
read(fd, u.buf, 4);
u.i32 = __builtin_bswap32(u.i32);
i_buffer[j] = u.i32;
read(fd, u.buf, 8);
u.i64 = __builtin_bswap64(u.i64);
d_buffer[j] = u.d;
read(fd, c_buffer[j], 20);
}
I'm sort of surprised the database was in big-endian format. Any computer using an x86 family processor is little-endian.
You should definitely not hard-code those numbers (4, 8, 20) like I did; you should use the metadata you received. I'll leave that for you.
EDIT: Also you shouldn't call __builtin_bswap32()
or __builtin_bswap64()
. You should call ntohl()
and... I'm not sure what the 64-bit one is. But ntohl()
is portable; it will skip the swap if you are compiling on a big-endian computer, and do the swap if you are compiling on a little-endian computer.
EDIT: I found a solution to a 64-bit equivalent to ntohl()
, right here on StackOverflow.
https://stackoverflow.com/a/4410728/166949
It's simple if you only care about Linux: instead of using #include <arpa/inet.h>
you can use #include <endian.h>
and then use be32toh()
and be64toh()
.
Once you have this you can read the database file like so:
u.i64 = be64toh(u.i64);
If you compile the above code on a big-endian machine, it will compile to a do-nothing and read the big-endian value. If you compile the above code on a little-endian machine, it will compile to the equivalent of __builtin_bswap64()
and swap the bytes so the 64-bit value is read correctly.
EDIT: I said in a couple of places that you need to do three separate reads: one to get the index, one to get the float, and one to get the string. Actually, you can declare a struct
and issue a single read that would pull in all the data in one read. The tricky part, though, is that the C compiler might insert "padding" bytes inside your struct
that would cause the byte structure of the struct
to not exactly match the byte structure of the record read from the file. The C compiler should provide a way to control the alignment bytes (a #pragma
statement) but I didn't want to get into the details. For this simple program, simply reading three times solves the problem.
fieldsize
is for the float values. – stevehaprintf()
for the float value is using format code"%d"
; it should be"%f"
. – steveha1
read on a little-endian machine prints as16777216
. That's 2 raised to the 24th power. It works the other way, too: a little-endian1
read on a big-endian machine will also print as16777216
. If I had memorized that, I could have figured this out much faster! – steveha