0
votes

For example, take a stereo WAV file with a sample rate of 44100 Hz and a bit depth of 16 bits.

Exactly how is the 16 bits divided up?


In the audio clip that I was using, the first 4 bytes had data about the first audio channel. The next 4 bits: I have no idea what they are (even when they are replaced with 0, there is no effect on the final audio file).


The next 4 bytes had data about the second audio channel. The next 4 bits: again, I have no idea what they are (even when they are replaced with 0, there is no effect on the final audio file).

So I would like to figure out what those 4 bits are.

2
The 16 bits aren't divided up - it is just a 16 bit number giving the sound amplitude at that instant. - greg-449

2 Answers

2
votes

A WAV file contains several chunks. The fmt chunk specifies the format of the audio data. The actual audio data is in the data chunk, and its layout depends on that format. Let's assume the following format as an example:

PCM, 16 bit, 2 channels with a sample rate of 44100 Hz.

Audio data is represented as samples. In this case each sample takes 16 bits = 2 bytes. If there are multiple channels (in this example 2 = stereo), the samples are interleaved like this:

left sample, right sample, left sample, right sample, ...

Since each sample takes 2 bytes (16 bits), we get something like this:

Byte 1 | Byte 2 | Byte 3 | Byte 4 | Byte 5 | Byte 6 | Byte 7 | Byte 8 | ...
left sample     | right sample    | left sample     | right sample    | ...

Each second of audio contains 44100 samples for EACH channel. So in total, one second of audio takes 44100 * (16 / 8) * 2 = 176400 bytes.
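
As a rough illustration of the interleaved layout above, here is a minimal Python sketch that splits raw 16 bit stereo PCM bytes into left and right samples. It assumes little-endian signed samples (the usual case for 16 bit WAV); the function names `split_stereo_16bit` and `bytes_per_second` are made up for this example.

```python
import struct

def split_stereo_16bit(data: bytes):
    """Split interleaved 16 bit little-endian stereo PCM into two lists."""
    frame_count = len(data) // 4  # one frame = 2 bytes * 2 channels
    # "<" means little-endian, "h" means signed 16 bit integer
    samples = struct.unpack("<%dh" % (frame_count * 2), data[:frame_count * 4])
    left = list(samples[0::2])   # even positions hold the left channel
    right = list(samples[1::2])  # odd positions hold the right channel
    return left, right

def bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Size in bytes of one second of uncompressed PCM audio."""
    return sample_rate * (bit_depth // 8) * channels

# Two stereo frames laid out as: left, right, left, right
raw = bytes([0x01, 0x00, 0xFF, 0xFF, 0x00, 0x01, 0x02, 0x00])
left, right = split_stereo_16bit(raw)
print(left, right)
print(bytes_per_second(44100, 16, 2))
```

Note that all 16 bits of each sample carry amplitude information; none of them are reserved for anything else.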

0
votes

A WAV format audio file typically starts with a 44 byte header (it can be longer if extra chunks are present) followed by the payload, which in the common case is uncompressed raw PCM audio data ... in the payload area, as you walk across the PCM data, each sample (point on the audio curve) will contain data for all channels ... the header will tell you the number of channels ... for stereo using a bit depth of 16 you will see two bytes (16 bits == bit depth) for a given channel immediately followed by the two bytes of the next channel, etc...

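For a given channel, the bytes of a sample (2 bytes in your case) can appear in one of two possible layouts determined by the choice of endianness ... least significant byte first (little endian) or most significant byte first (big endian) ... the ordering is important here ... the header also tells you which endianness is in use: a file that starts with RIFF is little endian, which is what you will typically see for WAV, while the rare RIFX variant is big endian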
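
To make the header discussion concrete, here is a small sketch using Python's standard `wave` module. It writes a tiny stereo file into memory and reads the header fields back; the in-memory buffer and the single test frame are assumptions made purely for illustration, and a real file may carry extra chunks that make the header longer.

```python
import io
import wave

# Build a minimal 16 bit stereo WAV in memory (one frame of audio).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 2 bytes per sample = 16 bit
    w.setframerate(44100)
    w.writeframes(b"\x01\x00\xff\xff")  # left sample, then right sample

file_bytes = buf.getvalue()
magic = file_bytes[:4]               # RIFF marks a little-endian file
header_size = len(file_bytes) - 4    # everything before the 4 payload bytes

# Read the header fields back.
buf.seek(0)
with wave.open(buf, "rb") as r:
    channels = r.getnchannels()
    sample_width = r.getsampwidth()
    frame_rate = r.getframerate()
    frame_count = r.getnframes()

print(magic, header_size, channels, sample_width, frame_rate, frame_count)
```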

each channel will generate its own audio curve

in your code, to convert from PCM data into a usable audio curve data point you must combine all bytes of a given sample for a given channel into a single value ... typically it's an integer and not floating point, and again the header defines which ... an integer could be signed or unsigned: in WAV, 8 bit samples are unsigned while 16 bit and wider samples are signed ... little endian means that as you read the file the first (left most) byte is the least significant byte, followed by each subsequent byte, which is the next most significant byte

in pseudo code :

int mydatapoint  // allocate your integer audio curve data point

step 0

mydatapoint = most-significant-byte  // for little endian this is the LAST byte of the sample in the file

stop here for bit depth of 8

... if you have bit depth greater than 8 bits now left shift this to make room for the following byte if any

step 1

mydatapoint = mydatapoint << 8 // shove data to the left by 8 bits
                               // which effectively jacks up its value
                               // and leaves empty those right most 8 bits

step 2

// following operation is a bit wise OR operation
mydatapoint = mydatapoint  OR next-most-significant-byte

now repeat steps 1 & 2 for each remaining byte of the sample, working from most significant to least significant (for little endian that is the reverse of the order in which the bytes appear in the file) ... a 16 bit sample needs one pass through steps 1 & 2, and repeating is essential for any bit depth beyond 16, so for 24 bit or 32 bit audio you will need to combine 3 or 4 bytes of PCM data into your single integer output audio curve data point
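
The shift-and-OR steps above can be sketched in Python roughly as follows. This is an illustrative version, not the only way to do it: the function name `sample_from_le_bytes` is hypothetical, and it assumes little-endian, signed (two's complement) samples, which applies to 16, 24 and 32 bit integer PCM but not to 8 bit WAV data.

```python
def sample_from_le_bytes(raw: bytes) -> int:
    """Combine the little-endian bytes of one sample into a signed integer."""
    value = 0
    # Little endian stores the least significant byte first, so walk the
    # bytes in reverse to start from the most significant one.
    for byte in reversed(raw):
        value = (value << 8) | byte  # step 1 (shift) and step 2 (OR)
    # Reinterpret the unsigned result as a two's complement signed value.
    bits = 8 * len(raw)
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value

print(sample_from_le_bytes(b"\x34\x12"))      # 16 bit sample
print(sample_from_le_bytes(b"\x00\x00\x80"))  # 24 bit sample
```

In everyday Python code the built-in `int.from_bytes(raw, "little", signed=True)` gives the same result; the loop is spelled out here only to mirror the steps.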

Why are we doing this bit shifting nonsense?

The level of audio fidelity when converting from analog to digital is driven by how accurately you record the audio curve ... analog audio is a continuous curve, however to become digital it must be sampled into discrete points along the curve ... two factors determine the fidelity when sampling the analog curve to create its digital representation ... the left to right spacing of points along the analog audio curve is determined by the sample rate, and the up and down resolution is determined by the bit depth ... a higher sample rate gives you more samples per second and a greater bit depth gives you more vertical points to approximate the instantaneous height of the analog audio curve

bit depth  8 == 2^8  ==   256 distinct vertical values to record curve height
bit depth 16 == 2^16 == 65536 distinct vertical values to record curve height

so to more accurately record into digital the height of our analog audio curve we want to become as granular as possible ... so the resultant audio curve is as smooth as possible and not jagged, which would happen if we only allocated 2 bits, which would give us 2^2 which is 4 distinct values ... try to connect the dots when your audio curve only has 4 vertical values to choose from on your plot ... the bit shifting is simply building up a single integer value from many bytes of data ... one byte holds only 256 distinct values (0 to 255), so numbers greater than 255 cannot fit into one byte and must be spread across multiple bytes of PCM data
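
A quick way to check the numbers above, as a small Python sketch (the helper names `levels` and `bytes_needed` are invented for this example, and `bytes_needed` only considers non-negative values):

```python
def levels(bit_depth: int) -> int:
    """Number of distinct vertical values available at a given bit depth."""
    return 2 ** bit_depth

def bytes_needed(value: int) -> int:
    """Bytes required to store a non-negative integer."""
    return max(1, (value.bit_length() + 7) // 8)

for depth in (2, 8, 16, 24):
    print(depth, levels(depth))

print(bytes_needed(255), bytes_needed(256))
```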

http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html