reading big-endian files in little-endian system

Question

I have a data file that I need to read in C. It is comprised of two alternating columns of 16-bit integers stored in binary form, and I need only the first column (i.e., every other entry, starting at 0).

I have a simple Python script that reads the files accurately:

import numpy as np


fname = '[filename]'
columntypes = np.dtype([('curr_pA', '>i2'),('volts', '>i2')])
test = np.memmap(fname, dtype=columntypes,mode='r')['curr_pA']

I want to port this to C. Because my machine is natively little-endian I need to manually perform the byte swap. Here's what I have done:

void swapByteOrder_int16(double *current, int16_t *rawsignal, int64_t length)
{
    int64_t i;
    for (i=0; i<length; i++)
    {
        current[i] = ((rawsignal[2*i] << 8) | ((rawsignal[2*i] >> 8) & 0xFF));
    }
}


int64_t read_current_int16(FILE *input, double *current, int16_t *rawsignal, int64_t position, int64_t length)
{
    int64_t test;

    int64_t read = 0;

    if (fseeko64(input,(off64_t) position*2*sizeof(int16_t),SEEK_SET))
    {
        return 0;
    }
    test = fread(rawsignal, sizeof(int16_t), 2*length, input);
    read = test/2;
    if (test != 2*length)
    {
        perror("End of file reached");
    }
    swapByteOrder_int16(current, rawsignal, length);
    return read;
}

In the read_current_int16 function I use fread to read a large chunk of data (both columns) into the rawsignal array. I then call swapByteOrder_int16 to pick off every other value and swap its bytes around. The result is then converted to double and stored in current.

It doesn't work. I get garbage as the output in the C code. I think I've been staring at it for too long and can no longer see my own errors. Can anyone spot anything glaringly wrong?

Why does code use double in swapByteOrder_int16(double *current, int16_t *rawsignal, int64_t length) instead of int16_t? — chux - Reinstate Monica
The rest of the code uses double to process the output of the I/O section of the code. Since I'm casting from int16_t to double there should be no loss of precision. There is an implicit cast in the assignment to current[i]. — KBriggs
rawsignal[2*i] means that the input is 32 bits per int. Now, if those are also swapped, you should use rawsignal[2*(i+1)-1]. Use a debugger to check what is happening. — Paul Ogilvie
rawsignal should be unsigned. Otherwise the >> 8 may move in 1-bits (sign extension). — Paul Ogilvie
You may not need to port this to C. See "Byte-swapping: Introduction to byte ordering and ndarrays" in the NumPy docs. — Steven Rumbalski
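The concern in the comments about doing the swap on a signed value can be illustrated with a small sketch. It assumes a platform where int is 32 bits wide; the function names swap_in_int and swap_in_u16 are made up for this illustration. The first function repeats the question's expression, which is evaluated in int after integer promotion and is never truncated back to 16 bits. The second does the same swap on an unsigned 16-bit copy.

```c
#include <stdint.h>

/* The question's expression, evaluated in int after integer promotion.
   Only meant for non-negative x here, because left-shifting a negative
   int is undefined behaviour. Assumes int is 32 bits wide. */
static int swap_in_int(int16_t x)
{
    return (x << 8) | ((x >> 8) & 0xFF);
}

/* The same swap done on an unsigned 16-bit copy. The cast to uint16_t
   after the shifts discards everything above bit 15. Converting back to
   int16_t is implementation-defined for values above INT16_MAX; common
   compilers wrap in two's complement. */
static int16_t swap_in_u16(int16_t x)
{
    uint16_t u = (uint16_t)x;
    u = (uint16_t)((u << 8) | (u >> 8));
    return (int16_t)u;
}
```

For the big-endian value 1 (file bytes 00 01, read natively on a little-endian machine as 0x0100), swap_in_int yields 65537 because the bit shifted past position 15 survives, while swap_in_u16 yields 1.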

chux - Reinstate Monica · Accepted Answer · 2016-09-21T17:14:45

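Perform the endian swap in unsigned 16-bit arithmetic, convert the result back to int16_t, and only then assign it to the double. Converting to uint16_t first keeps sign-extension bits from the integer promotion out of the swapped value.

void swapByteOrder_int16(double *current, const int16_t *rawsignal, size_t length) {
    for (size_t i = 0; i < length; i++) {
      /* Every other entry: the first column of each (current, volts) pair. */
      uint16_t x = (uint16_t)rawsignal[2*i];
      /* Swap the two bytes; the cast discards bits above bit 15. */
      x = (uint16_t)((x << 8) | (x >> 8));
      /* Reinterpret as signed before widening to double. */
      current[i] = (int16_t)x;
    }
}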
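An alternative sketch, not taken from the answer above: instead of reading native int16_t values and swapping them, each value can be assembled from its two bytes. This gives the same result on little-endian and big-endian hosts. The names decode_be16 and first_column_be16 are hypothetical, and the 4-byte record layout (current, then volts) is assumed from the dtype in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian signed 16-bit value from two raw bytes.
   The bytes are never reinterpreted as a native integer, so host
   byte order does not matter. */
static int16_t decode_be16(const unsigned char *p)
{
    uint16_t u = (uint16_t)(((unsigned)p[0] << 8) | p[1]);
    /* Map 0x8000..0xFFFF onto the negative range explicitly, without
       relying on implementation-defined narrowing. */
    return (u < 0x8000u) ? (int16_t)u : (int16_t)((int32_t)u - 65536);
}

/* Hypothetical helper: pull the first column out of nrecords
   interleaved (current, volts) records, 4 bytes per record. */
static void first_column_be16(double *current, const unsigned char *raw,
                              size_t nrecords)
{
    for (size_t i = 0; i < nrecords; i++) {
        current[i] = decode_be16(raw + 4 * i);
    }
}
```

With this approach the buffer would be an unsigned char array filled by something like fread(raw, 1, 4 * nrecords, input), rather than an int16_t array.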
