5
votes

I'm facing some problems with WAV files in Java.

WAV format: PCM_SIGNED 44100.0 Hz, 24-bit, stereo, 6 bytes/frame, little-endian.

  • I extracted the WAV data to a byte array with no problems.
  • I'm trying to convert the byte array to a double array, but some of the resulting doubles are NaN.

Code:

ByteBuffer byteBuffer = ByteBuffer.wrap(byteArray);
double[] doubles = new double[byteArray.length / 8];
for (int i = 0; i < doubles.length; i++) {
    doubles[i] = byteBuffer.getDouble(i * 8);
}

Having to handle 16/24/32-bit and mono/stereo variants confuses me.

I intend to pass the double[] to an FFT algorithm to get the audio frequencies.

3
Sounds like some of these numbers can't actually be interpreted as double... - Louis Wasserman
@maszter: not a duplicate, as the bytes don't represent doubles. - MvG
@Leandro T if you are satisfied with the answer, then do accept it, or leave a comment explaining why you think an answer should be improved. - Vishrant

3 Answers

12
votes

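Try this. These helpers convert between arrays when the bytes really do hold 64-bit doubles or 32-bit ints in big-endian order (ByteBuffer's default):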

public static byte[] toByteArray(double[] doubleArray){
    int times = Double.SIZE / Byte.SIZE;
    byte[] bytes = new byte[doubleArray.length * times];
    for(int i=0;i<doubleArray.length;i++){
        ByteBuffer.wrap(bytes, i*times, times).putDouble(doubleArray[i]);
    }
    return bytes;
}

public static double[] toDoubleArray(byte[] byteArray){
    int times = Double.SIZE / Byte.SIZE;
    double[] doubles = new double[byteArray.length / times];
    for(int i=0;i<doubles.length;i++){
        doubles[i] = ByteBuffer.wrap(byteArray, i*times, times).getDouble();
    }
    return doubles;
}

public static byte[] toByteArray(int[] intArray){
    int times = Integer.SIZE / Byte.SIZE;
    byte[] bytes = new byte[intArray.length * times];
    for(int i=0;i<intArray.length;i++){
        ByteBuffer.wrap(bytes, i*times, times).putInt(intArray[i]);
    }
    return bytes;
}

public static int[] toIntArray(byte[] byteArray){
    int times = Integer.SIZE / Byte.SIZE;
    int[] ints = new int[byteArray.length / times];
    for(int i=0;i<ints.length;i++){
        ints[i] = ByteBuffer.wrap(byteArray, i*times, times).getInt();
    }
    return ints;
}

4
votes

Your WAV format is 24 bit, but a double uses 64 bits, so the quantities stored in your WAV can't be doubles. You have one 24-bit signed integer per channel in each frame, i.e. 3 bytes × 2 channels, which gives the 6 bytes per frame mentioned.

You could do something like this:

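// Reads one signed 24-bit little-endian sample and advances the buffer by 3 bytes.
private static double readDouble(ByteBuffer buf) {
  int v = (buf.get() & 0xff);      // least significant byte, unsigned
  v |= (buf.get() & 0xff) << 8;    // middle byte, unsigned
  v |= buf.get() << 16;            // most significant byte, keeps the sign
  return (double) v;
}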

You'd call that method once for the left channel and once for the right; in a stereo WAV the left sample comes first in each frame. The bytes are read from the least significant one to the most significant one, as little-endian indicates. The lower two bytes are masked with 0xff in order to treat them as unsigned. The most significant byte is left signed, since it carries the sign of the signed 24-bit integer.
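To make the byte order and sign handling described above concrete, here is a minimal sketch that decodes a single sample from a byte array. The class name Pcm24 and the method decodeSample are made up for illustration; they are not part of any library.

```java
final class Pcm24 {
    private Pcm24() {}

    /**
     * Decodes one signed 24-bit little-endian PCM sample that starts at
     * data[offset]. Hypothetical helper, shown only as an illustration.
     */
    static int decodeSample(byte[] data, int offset) {
        return (data[offset] & 0xff)              // low byte, treated as unsigned
             | ((data[offset + 1] & 0xff) << 8)   // middle byte, treated as unsigned
             | (data[offset + 2] << 16);          // high byte, sign-extended by Java
    }
}
```

With this helper, the bytes FF FF FF decode to -1, 00 00 80 decode to -8388608 (the most negative 24-bit value), and FF FF 7F decode to 8388607 (the most positive one).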

If you operate on arrays, you can do it without the ByteBuffer, e.g. like this:

double[] doubles = new double[byteArray.length / 3];
for (int i = 0, j = 0; i != doubles.length; ++i, j += 3) {
  doubles[i] = (double)( (byteArray[j  ] & 0xff) | 
                        ((byteArray[j+1] & 0xff) <<  8) |
                        ( byteArray[j+2]         << 16));
}

You will get samples for both channels interleaved, so you might want to separate these afterwards.
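Separating the interleaved samples could look roughly like the sketch below. It assumes the samples have already been converted to a double[] and that the frame layout is channel 0, channel 1, and so on within each frame; Deinterleave and split are hypothetical names.

```java
final class Deinterleave {
    private Deinterleave() {}

    /**
     * Splits interleaved samples into one array per channel.
     * result[c][f] is the sample of channel c in frame f.
     */
    static double[][] split(double[] interleaved, int channels) {
        if (channels <= 0 || interleaved.length % channels != 0) {
            throw new IllegalArgumentException("sample count must be a multiple of the channel count");
        }
        int frames = interleaved.length / channels;
        double[][] result = new double[channels][frames];
        for (int f = 0; f < frames; f++) {
            for (int c = 0; c < channels; c++) {
                result[c][f] = interleaved[f * channels + c];
            }
        }
        return result;
    }
}
```

For a stereo file, split(doubles, 2)[0] would then hold the left channel and split(doubles, 2)[1] the right one, and each can be passed to the FFT separately.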

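If you have mono, there is only one channel, so nothing is interleaved. For 16 bit you can use byteBuffer.getShort(), and for 32 bit byteBuffer.getInt(), provided you first switch the buffer to little-endian with byteBuffer.order(ByteOrder.LITTLE_ENDIAN), since ByteBuffer defaults to big-endian. But 24 bit isn't commonly used for computation, so ByteBuffer doesn't have a method for it. If you have unsigned samples, you'll have to mask all bytes, including the most significant one, and then subtract an offset from the result, but I guess unsigned WAV is rather uncommon outside of 8-bit files.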
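As a rough sketch of the other sample sizes: 16-bit samples can be read with getShort() once the buffer is set to little-endian, and unsigned 8-bit samples need a mask plus an offset. The class and method names here are invented for the example, and the offset of 128 assumes the usual unsigned 8-bit PCM convention where 128 represents silence.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PcmOtherSizes {
    private PcmOtherSizes() {}

    /** Converts signed 16-bit little-endian PCM bytes to doubles. */
    static double[] signed16ToDoubles(byte[] data) {
        // ByteBuffer is big-endian by default, so the order must be set explicitly.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[data.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getShort();
        }
        return out;
    }

    /** Converts unsigned 8-bit PCM bytes to doubles centred on zero. */
    static double[] unsigned8ToDoubles(byte[] data) {
        double[] out = new double[data.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (data[i] & 0xff) - 128;  // mask to 0..255, then shift to -128..127
        }
        return out;
    }
}
```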

0
votes

For floating-point types, DSP code usually prefers values in the range [-1, 1] or [-1, 1), so you should divide each signed 24-bit sample by 2^23. Do as in MvG's answer above, but with some changes:

int t = ((byteArray[j  ] & 0xff) <<  0) |
        ((byteArray[j+1] & 0xff) <<  8) |
         (byteArray[j+2]         << 16);
return t / (double) 0x800000;  // 0x800000 == 2^23

But double is really a waste of space and CPU time for data processing. I would recommend converting to 32-bit int instead, or to float, which has the same precision (a 24-bit significand) but a bigger range. In practice, 32-bit int or float is usually the largest type used per channel in audio or video processing.
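A minimal sketch of the float variant, assuming signed 24-bit little-endian input and one common convention of scaling by 2^23 so the results land in [-1, 1). Pcm24Float and toFloats are hypothetical names.

```java
final class Pcm24Float {
    private Pcm24Float() {}

    /** Converts signed 24-bit little-endian PCM bytes to floats in [-1, 1). */
    static float[] toFloats(byte[] data) {
        final float scale = 1.0f / (1 << 23);  // 2^-23, exactly representable as a float
        float[] out = new float[data.length / 3];
        for (int i = 0, j = 0; i < out.length; i++, j += 3) {
            int v = (data[j] & 0xff)
                  | ((data[j + 1] & 0xff) << 8)
                  | (data[j + 2] << 16);       // sign comes from the top byte
            out[i] = v * scale;                // every 24-bit value fits a float exactly
        }
        return out;
    }
}
```

Because a float has a 24-bit significand, every 24-bit sample converts without rounding, so nothing is lost compared with double at this step.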

Finally, you can use multithreading and SIMD to accelerate the conversion.
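One way the multithreaded part might be sketched in plain Java is with Arrays.parallelSetAll, which fills each output element independently on the common fork/join pool. This is only an illustration under the assumption of signed 24-bit little-endian input; ParallelPcm24 is a made-up name, any speed-up is likely to show only on large buffers, and vectorisation is left to the JIT here.

```java
import java.util.Arrays;

final class ParallelPcm24 {
    private ParallelPcm24() {}

    /** Converts signed 24-bit little-endian PCM bytes to doubles in parallel. */
    static double[] toDoubles(byte[] data) {
        double[] out = new double[data.length / 3];
        // Each index is computed independently, so the work can be split across threads.
        Arrays.parallelSetAll(out, i -> {
            int j = i * 3;
            int v = (data[j] & 0xff)
                  | ((data[j + 1] & 0xff) << 8)
                  | (data[j + 2] << 16);
            return (double) v;
        });
        return out;
    }
}
```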