I am trying to convert a .wav file to a NumPy array. I use Python's wave module to read the file into a bytes object with readframes(), then convert that bytes object to a NumPy array with np.frombuffer().
My .wav file has the following properties:
- Number of channels: 2
- Sample width: 2
- Frame rate: 44100
- Number of frames: 4692480
- Compression type: NONE
- Compression type (human-readable name): not compressed
When I convert it to a NumPy array and check the shape, I get (18769920,), which is 4 times the expected 4692480. I suspect the factor of 4 comes from the number of channels (2) multiplied by the sample width (2 bytes).
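As a quick sanity check of that suspicion, here is the arithmetic as a small sketch. The variable names are just for illustration; the values are copied from the properties listed above.

```python
# Values copied from the file properties listed above.
n_channels = 2
sample_width = 2      # bytes per sample
n_frames = 4692480

# readframes() returns raw bytes, and a uint8 view has one element per byte,
# so the array length should be frames * channels * bytes-per-sample.
expected_bytes = n_frames * n_channels * sample_width
print(expected_bytes)  # 18769920
```

That matches the length I observe, so the uint8 array seems to hold every byte of every sample for both channels.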
I have included my code below:
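import wave
import numpy as np

# Read every frame of the file as one raw bytes object
wave_read = wave.open('sample_jazz.wav', mode='rb')
frames = wave_read.readframes(wave_read.getnframes())
# View the raw bytes as unsigned 8-bit integers (one element per byte)
sound_arr = np.frombuffer(frames, dtype=np.uint8)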
I have two closely related questions:
- How do I read one channel at a time?
- How do I read 2 bytes at a time instead of 1? Can I just change dtype to np.uint16? I ask because I am not sure how the bytes are ordered in the bytes object. Perhaps the order is channel 1, frame 1, byte 1; then channel 2, frame 1, byte 1; then channel 1, frame 1, byte 2; then channel 2, frame 1, byte 2. In that case I could not simply interpret two contiguous bytes as a uint16, because the channels would be interleaved byte by byte and I would end up combining byte 1 from two different channels into one 16-bit int.
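One way to probe the byte order is to write a tiny stereo file with known sample values and look at what readframes() gives back. The sketch below is a minimal experiment, not a definitive answer: it builds the WAV in memory with io.BytesIO instead of using sample_jazz.wav, and the names left, right, channel_1 and channel_2 are made up for illustration. It assumes 16-bit PCM, which WAV stores as signed little-endian integers, hence dtype '<i2' rather than np.uint16.

```python
import io
import wave

import numpy as np

# Known test samples for each channel (hypothetical values, 16-bit signed).
left = np.array([1, 2, 3], dtype='<i2')
right = np.array([-1, -2, -3], dtype='<i2')

# Stack as (frames, channels); C-order bytes are then L0 R0 L1 R1 L2 R2.
interleaved = np.column_stack((left, right))

# Write a small stereo, 2-bytes-per-sample WAV into an in-memory buffer.
buf = io.BytesIO()
with wave.open(buf, 'wb') as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(interleaved.tobytes())

# Read it back the same way as in the question.
buf.seek(0)
with wave.open(buf, 'rb') as r:
    n_channels = r.getnchannels()
    frames = r.readframes(r.getnframes())

# First frame as raw bytes: both bytes of the channel-1 sample come first,
# followed by both bytes of the channel-2 sample.
print(list(frames[:4]))  # [1, 0, 255, 255]

# Interpret each pair of bytes as one signed little-endian 16-bit sample,
# then reshape so each row is a frame and each column is a channel.
samples = np.frombuffer(frames, dtype='<i2')
sound_arr = samples.reshape(-1, n_channels)
channel_1 = sound_arr[:, 0]
channel_2 = sound_arr[:, 1]
print(sound_arr.shape)  # (3, 2)
```

In this experiment the two bytes of each sample stay contiguous and the channels alternate sample by sample, not byte by byte. If the same holds for my file, reshape(-1, 2) on an int16 view should give shape (4692480, 2), with one column per channel.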