This post is more about clarification than it is about implementing some sort of audio waveform algorithm. I've read a myriad of posts concerning the subject (both on SO and out on the web), and here's what I've gathered:
- In the context of 16-bit WAV, I want to read every two bytes as a `short`, which will result in a value between -32768 and 32767.
- With a sample rate of 44.1kHz, I'll have 44,100 samples for every second of audio.
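For context, this is roughly the kind of read I have in mind (the byte values are just placeholders):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SampleRead {
    public static void main(String[] args) {
        // Two consecutive bytes as they might appear in the WAV "data" chunk.
        byte[] raw = { (byte) 0x34, (byte) 0x12 };

        // WAV stores 16-bit PCM little-endian, so the low byte comes first.
        short sample = ByteBuffer.wrap(raw)
                                 .order(ByteOrder.LITTLE_ENDIAN)
                                 .getShort();

        System.out.println(sample); // 4660 here; always within -32768..32767
    }
}
```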
This is pretty straightforward; however, I have the following questions:
- A WAV rendered in mono has only one channel, which is two bytes of information per frame. In stereo, this becomes four bytes of information per frame. In my situation, I'm not required to display both channels, so would I simply skip the right channel and read only the left? Some solutions I've read mention combining both the left and right channels, though I'm not sure if this is required. (Both options are sketched after this list.)
- Say I had an audio file that is two seconds long, and another that is thirty seconds long. If I need to grab a minimum of 800 samples to represent the waveform, would grabbing 800 samples along the length of the file introduce accuracy issues, e.g. a bucket of `(44,100 * 2) / 800` samples per point for the two-second file versus `(44,100 * 30) / 800` for the thirty-second file? (The second sketch after this list shows how I'm picturing the bucketing.)
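For the channel question, this is roughly what I'm picturing for a stereo file; it assumes the 16-bit samples have already been decoded into a `short[]`, interleaved L, R, L, R, ...:

```java
// Option 1: keep only the left channel (one short per stereo frame).
static short[] leftChannel(short[] interleaved) {
    short[] left = new short[interleaved.length / 2];
    for (int i = 0; i < left.length; i++) {
        left[i] = interleaved[2 * i];
    }
    return left;
}

// Option 2: mix both channels down to mono by averaging each frame.
static short[] mixDown(short[] interleaved) {
    short[] mono = new short[interleaved.length / 2];
    for (int i = 0; i < mono.length; i++) {
        mono[i] = (short) ((interleaved[2 * i] + interleaved[2 * i + 1]) / 2);
    }
    return mono;
}
```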
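And for the second question, the bucketing I have in mind is something like the following: the bucket size is just the total sample count divided by 800, so it scales with the length of the file, and each drawn point is the peak absolute value within its bucket.

```java
// Reduce a mono sample array to `points` values for drawing,
// taking the peak absolute value in each bucket so short spikes survive.
static int[] toWaveform(short[] samples, int points) {
    int[] peaks = new int[points];
    int samplesPerPoint = Math.max(1, samples.length / points);
    for (int p = 0; p < points; p++) {
        int start = p * samplesPerPoint;
        int end = Math.min(samples.length, start + samplesPerPoint);
        int peak = 0;
        for (int i = start; i < end; i++) {
            peak = Math.max(peak, Math.abs(samples[i]));
        }
        peaks[p] = peak;
    }
    return peaks;
}
```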
An explanation would really be appreciated!