I am aiming to convert wave data (read from .wav file with wave module) to a numpy array.

The data is currently a byte array, so each element is 8 bits wide. My wav file is mono, so it contains only 1 channel. Most wav files are stereo, however, in which case the data is a sequence of samples for the left and right channels, interleaved.

The samples are 16 bits, so each pair of sequential bytes in the array is one 16-bit sample. Some audio files use 24 bits per sample. The number of bytes per sample (multiply by 8 for bits) can be obtained from

len(bytearray) // (wave.getnframes() *  wave.getnchannels())

So I need to somehow

  • group bytes into pairs of bytes (samples)
  • copy the pairs of bytes to some new storage with a "stride". For mono, stride = 0? For stereo, the stride is presumably 1? (It would depend on how Python counts in memory.)
  • convert the new storage to a numpy array
  • at some point convert from 16 bit signed integer format to floating point format, which could be done at any stage of the process

I could implement a C++ style solution, using for loops and indices, but I assume this would be very slow in Python.

My guess is that Python (probably) includes some functions for

  • conversion between int and float/double formats (perhaps as a numpy array or perhaps elsewhere in the above described process)
  • "de-interleaving" data (separating out the left/right channel data from the byte array)
  • converting a byte array with a specified format to a numpy array with a specified type

However, I have no idea what these might be or what form they might take. (Builtins? Libraries/modules?)

This seemed like a problem that should be easily "duckduckgoable" - but I had no luck. Probably working with wav format data these days is a bit of a niche application?

Even a simple answer with a list of things to type into duckduckgo would be appreciated. I can read/figure out the documentation, I just don't know what to search for.

1 Answer

I usually do this with scipy.io.wavfile.read. It parses the wave file header and returns the sampling frequency read from the header together with the data as a numpy array.
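As a minimal sketch of that approach (the file name, sample rate and test tone are made up so the example can run on its own), something like this should work for 16-bit PCM:

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

# Write a short 16-bit stereo test file so the example is self-contained.
# (demo.wav, the 8000 Hz rate and the 440 Hz tone are arbitrary choices.)
rate_in = 8000
t = np.arange(rate_in) / rate_in
left = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
right = np.zeros_like(left)
path = os.path.join(tempfile.mkdtemp(), "demo.wav")
wavfile.write(path, rate_in, np.column_stack([left, right]))

# read() returns the sample rate and the samples. For a multi-channel file
# the array has shape (n_frames, n_channels); for mono it is 1-D.
rate, data = wavfile.read(path)

# 16-bit PCM comes back as int16; scale to floats in [-1.0, 1.0).
float_data = data.astype(np.float32) / 32768.0
```

The channels are already separated into columns here, so data[:, 0] is the left channel and data[:, 1] the right.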

If you really want to start from the bytes, you could use numpy.frombuffer:

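import numpy as np
# raw: the bytes object returned by readframes(); "<i2" is little-endian signed 16-bit
data_s16 = np.frombuffer(raw, dtype="<i2")
float_data = data_s16 * 0.5**15  # scale to floats in [-1.0, 1.0)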
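For a stereo file the samples are interleaved (L R L R ...), so reshaping the flat array to one row per frame separates the channels without any loop. A rough sketch using only the wave module and numpy, assuming 16-bit PCM (the in-memory test file and its values are invented for illustration):

```python
import io
import wave

import numpy as np

# Build a tiny 16-bit stereo WAV in memory so the sketch is self-contained.
frames_in = np.array([[1000, -1000], [2000, -2000], [3000, -3000]], dtype="<i2")
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)  # bytes per sample
    w.setframerate(8000)
    w.writeframes(frames_in.tobytes())

# Read it back the way you would read a file on disk.
buf.seek(0)
with wave.open(buf, "rb") as w:
    n_channels = w.getnchannels()
    sample_width = w.getsampwidth()  # bytes per sample, straight from the header
    raw = w.readframes(w.getnframes())

if sample_width != 2:
    raise ValueError("this sketch only handles 16-bit PCM")

# Interpret each pair of bytes as one little-endian signed 16-bit sample.
samples = np.frombuffer(raw, dtype="<i2")

# One row per frame, one column per channel; this de-interleaves the data.
frames = samples.reshape(-1, n_channels)
left, right = frames[:, 0], frames[:, 1]

# Convert to floating point in [-1.0, 1.0).
float_frames = frames / 32768.0
```

The same reshape works for mono (it just gives a single column). 24-bit files need extra unpacking, because numpy has no 3-byte integer dtype.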