After going through the PyAudio documentation and reading some other articles on the web, I am unsure whether my understanding is correct.
This is the code for audio recording found on pyaudio's site:
import pyaudio
import wave
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)
print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done recording")
stream.stop_stream()
stream.close()
p.terminate()
If I add these lines, I am able to play back whatever I recorded:
play = pyaudio.PyAudio()
stream_play = play.open(format=FORMAT,
                        channels=CHANNELS,
                        rate=RATE,
                        output=True)
for data in frames:
    stream_play.write(data)
stream_play.stop_stream()
stream_play.close()
play.terminate()
- "RATE" is the number of samples collected per second.
- "CHUNK" is the number of frames in the buffer.
- Each frame will have 2 samples as "CHANNELS=2".
- Size of each sample is 2 bytes, calculated using the function pyaudio.get_sample_size(pyaudio.paInt16).
- Therefore size of each frame is 4 bytes.
- In the "frames" list, the size of each element must be 1024*4 bytes; for example, the size of frames[0] must be 4096 bytes. However, sys.getsizeof(frames[0]) returns 4133, but len(frames[0]) returns 4096.
- The for loop executes int(RATE / CHUNK * RECORD_SECONDS) times, and I can't understand why. Here is the same question answered by "Ruben Sanchez", but I can't be sure whether it is correct, as he says CHUNK=bytes. And according to his explanation, it must be int(RATE / (CHUNK*2) * RECORD_SECONDS), as (CHUNK*2) is the number of samples read into the buffer with each iteration.
- Finally, when I write print frames[0], it prints gibberish, as it tries to treat the string as ASCII encoded, which it is not; it is just a stream of bytes. So how do I print this stream of bytes in hexadecimal using the struct module? And if I later replace each of the hexadecimal values with values of my choice, will it still produce a playable sound?
Everything I wrote above is my own understanding, and much of it may be wrong.
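For the hexadecimal question, this is a rough sketch of what I have in mind, written for Python 3 and using only the standard struct module. The chunk variable is a hypothetical stand-in for a few bytes of frames[0], and the little-endian signed 16-bit layout ("<h") is an assumption about how paInt16 data is stored on my machine, not something I have confirmed.

```python
import struct

# Hypothetical stand-in for part of frames[0]: two stereo frames of
# signed 16-bit samples, assumed to be little-endian.
chunk = struct.pack("<4h", 0, 1000, -1000, 32767)

# Raw bytes as a hexadecimal string (bytes.hex() needs Python 3.5+).
hex_dump = chunk.hex()

# Decode the bytes into integers, one per 2-byte sample.
count = len(chunk) // 2
samples = struct.unpack("<%dh" % count, chunk)

# Replace the samples with values of my choice (here: half amplitude)
# and pack them back into a bytes object of the same length.
new_samples = [s // 2 for s in samples]
new_chunk = struct.pack("<%dh" % count, *new_samples)

print(hex_dump)
print(samples)
print(new_chunk.hex())
```

If the replacement values stay within the signed 16-bit range and the length remains a whole number of frames, I would expect new_chunk to be accepted by stream_play.write() like any recorded chunk, although whether it sounds meaningful depends entirely on the values chosen.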