Starting with a stereo wav file with a bit depth of 16 bits at a 44,100 Hz ( 44.1 kHz ) sample rate you have a standard CD quality audio file ... issue this on the command line to display such stats on a file
ffprobe Cesária_Évora.wav
typical output
Duration: 00:00:21.51, bitrate: 1411 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
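The 1411 kb/s figure is not arbitrary, it falls straight out of the stream parameters shown above. A quick sketch of the arithmetic in Python (the variable names are just illustrative):

```python
sample_rate = 44100   # Hz, from the ffprobe output
bit_depth = 16        # bits per sample (s16)
channels = 2          # stereo

bits_per_second = sample_rate * bit_depth * channels
bytes_per_second = bits_per_second // 8

print(bits_per_second)    # 1411200, which ffprobe reports as 1411 kb/s
print(bytes_per_second)   # 176400 bytes of PCM per second of audio
```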
to create a PCM file from the wav issue
ffmpeg -i Cesária_Évora.wav -f s16le -acodec pcm_s16le cesaria.dat
be aware a wav file in its simplest ( canonical ) form is a 44 byte header followed by the payload, which is the raw audio curve in PCM format ... headers can be longer when extra chunks are present ... for stereo the PCM payload is strictly interleaved L1 R1 L2 R2 nothing more nothing less ... any notion of frames is an abstraction of how we parse the data, with no bits dedicated to implementing a frame (like start/end markers) ... to write code to manipulate PCM data keep in mind your bit depth as well as whether your file is little endian or big endian ... whenever your file has a bit depth of 8 bits you can safely ignore endianness since each sample is a single byte and you never need to combine bytes ... however since the above file has a bit depth of 16 bits, each point of the audio curve is represented by a single 16 bit number per channel ( stereo is two channels, mono one channel )
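To see the header plus payload split concretely, here is a minimal sketch that builds a tiny WAV in memory with Python's standard `wave` module and slices it apart. It assumes the simplest PCM header, as written by that module, on a little endian machine. Real world files may carry extra chunks, so robust code should walk the chunks rather than hard-code 44. The sample values are arbitrary.

```python
import io
import struct
import wave

# Two stereo frames of 16 bit samples in the order L1 R1 L2 R2
samples = [1000, -1000, 2000, -2000]
payload = struct.pack("<4h", *samples)  # "<" little endian, "h" signed 16 bit

# Write a minimal WAV into an in-memory buffer
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)       # stereo
    w.setsampwidth(2)       # bytes per sample, i.e. 16 bits
    w.setframerate(44100)
    w.writeframes(payload)

wav_bytes = buf.getvalue()

# Assumption: canonical 44 byte header, everything after it is raw PCM
header, data = wav_bytes[:44], wav_bytes[44:]

print(header[:4], header[8:12])  # b'RIFF' b'WAVE'
print(len(data))                 # 8 bytes: 2 frames * 2 channels * 2 bytes
print(data == payload)           # True
```

The `data` slice here is exactly what ends up in a headerless raw PCM file.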
when reading such a file each 16 bit number is stored across two bytes ... if little endian, the left most byte ( first encountered in your loop as you iterate across the file ) is the least significant byte, followed by the more significant byte ... so the two stereo points
L1 R1 L2 R2
are laid out as follows when we speak of the individual bytes used to store them
Llittle1 Lbig1 Rlittle1 Rbig1 Llittle2 Lbig2 Rlittle2 Rbig2
note above shows 8 bytes ... similarly if we had a bit depth of 24 bits it would be the following for one raw audio sample on each channel ( 6 bytes )
Llittle1 Lbigger1 Lbiggest1 Rlittle1 Rbigger1 Rbiggest1
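The byte layouts above can be walked in code. Below is a rough sketch of a hypothetical helper, `decode_pcm`, that splits interleaved little endian signed PCM into one list per channel. It works for 16 bit (2 bytes per sample) and 24 bit (3 bytes per sample) alike, and it assumes signed samples, as in the s16le file above.

```python
def decode_pcm(raw: bytes, bytes_per_sample: int, channels: int) -> list:
    """Split interleaved little endian signed PCM into one list of ints per channel."""
    frame_size = bytes_per_sample * channels
    usable = len(raw) - len(raw) % frame_size  # ignore a trailing partial frame
    out = [[] for _ in range(channels)]
    for offset in range(0, usable, bytes_per_sample):
        chunk = raw[offset:offset + bytes_per_sample]
        value = int.from_bytes(chunk, byteorder="little", signed=True)
        # samples alternate L R L R ..., so the sample index picks the channel
        out[(offset // bytes_per_sample) % channels].append(value)
    return out


# 16 bit stereo, two frames: Llittle1 Lbig1 Rlittle1 Rbig1 Llittle2 Lbig2 Rlittle2 Rbig2
raw16 = bytes([0x01, 0x00, 0xFF, 0xFF, 0x00, 0x01, 0x00, 0x80])
print(decode_pcm(raw16, 2, 2))  # [[1, 256], [-1, -32768]]

# 24 bit stereo, one frame: Llittle1 Lbigger1 Lbiggest1 Rlittle1 Rbigger1 Rbiggest1
raw24 = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF])
print(decode_pcm(raw24, 3, 2))  # [[1], [-1]]
```

Notice there is no frame marker anywhere in the bytes ... the channel is determined purely by position.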
so conceptually when reading a little endian file of bit depth 16 bits here is how you parse the PCM for one channel for one point on the raw audio curve
Llittle1 Lbig1
now to generate a single value L1
you conceptually do this
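L1 = ( Lbig1 << 8 ) + Llittle1
where << 8 shifts the big byte 8 bits to the left ... this gives an unsigned number from 0 to 65535, and since s16le samples are signed, any result of 32768 or more stands for that result minus 65536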
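The shift-and-add step can be sketched in Python as follows. The function name `s16le_to_int` is made up for illustration. The sign correction is needed because the format reported above is s16 (signed), and the `struct` call is just a cross-check against the standard library.

```python
import struct


def s16le_to_int(little: int, big: int) -> int:
    """Combine two bytes of a little endian signed 16 bit sample into one int."""
    value = (big << 8) | little   # big byte moves up 8 bits, little byte fills the low 8
    if value >= 0x8000:           # top bit set means a negative two's complement value
        value -= 0x10000
    return value


print(s16le_to_int(0x34, 0x12))   # 4660, i.e. 0x1234
print(s16le_to_int(0xFF, 0xFF))   # -1
print(s16le_to_int(0x00, 0x80))   # -32768

# Same answer from the standard library: "<h" = little endian signed 16 bit
print(struct.unpack("<h", bytes([0x34, 0x12]))[0])  # 4660
```

In practice you would let `struct` or `int.from_bytes` do this for you, but doing it by hand once makes the byte order concrete.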
Not sure if this is the level of abstraction you were looking for, however it is a stepping stone to nailing digital audio
Super helpful tool Audacity permits you to import a raw audio file in PCM format as we generated above cesaria.dat ... Audacity -> File -> Import -> Raw Data -> choose cesaria.dat ->