The reason is that raw audio files carry no information about the format of the audio they contain, so you have to supply it yourself. The sample rate is just one such parameter; you will need to specify a few others as well.
Quoted from sox.sourceforge.net:
SoX can work with ‘self-describing’ and ‘raw’ audio files. ‘self-describing’ formats (e.g. WAV, FLAC, MP3) have a header that completely describes the signal and encoding attributes of the audio data that follows. ‘raw’ or ‘headerless’ formats do not contain this information, so the audio characteristics of these must be described on the SoX command line or inferred from those of the input file.
The following four characteristics are used to describe the format of audio data such that it can be processed with SoX:
- sample rate: The sample rate in samples per second (‘Hertz’ or ‘Hz’). Digital telephony traditionally uses a sample rate of 8000 Hz (8 kHz), though these days, 16 and even 32 kHz are becoming more common. Audio Compact Discs use 44100 Hz (44.1 kHz). Digital Audio Tape and many computer systems use 48 kHz. Professional audio systems often use 96 kHz.
- sample size [...]
- data encoding [...]
- channels [...]
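To make the difference concrete, here is a small sketch using only Python's standard library (not SoX itself). It writes the same samples into a WAV container, reads the format back from the header, and then shows how the duration implied by the headerless bytes changes with the sample rate you assume. The values 8000 Hz, 16 bits and mono are assumptions chosen for illustration, and `duration_seconds` is a hypothetical helper, not part of any library.

```python
import io
import wave

RATE, BITS, CHANNELS = 8000, 16, 1  # assumed format, for illustration only

# One second of 16-bit mono silence: 8000 frames of 2 bytes each.
# This is what a "raw" file holds: samples and nothing else.
raw_bytes = b"\x00\x00" * RATE

# A WAV file stores the same samples plus a header that describes them.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(CHANNELS)
    w.setsampwidth(BITS // 8)
    w.setframerate(RATE)
    w.writeframes(raw_bytes)

# Reading the WAV back recovers the format from the header.
buf.seek(0)
with wave.open(buf, "rb") as r:
    header = (r.getframerate(), r.getsampwidth() * 8, r.getnchannels())


def duration_seconds(n_bytes, rate, bits, channels):
    """Duration implied by a byte count under an assumed format."""
    return n_bytes / (rate * (bits // 8) * channels)


print(header)
# The raw bytes alone are ambiguous: the result depends on the assumed rate.
print(duration_seconds(len(raw_bytes), 8000, 16, 1))
print(duration_seconds(len(raw_bytes), 16000, 16, 1))
```

The same 16000 bytes come out as one second at 8 kHz but half a second at 16 kHz, which is why a wrong rate makes raw audio play at the wrong speed and pitch.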
The pysox documentation describes the set_input_format method:
set_input_format(file_type=None, rate=None, bits=None, channels=None, encoding=None, ignore_length=False)
Sets input file format arguments. This is primarily useful when dealing with audio files without a file extension. Overwrites any previously set input file arguments.
If this function is not explicitly called the input format is inferred from the file extension or the file’s header.
Parameters:
file_type : str or None, default=None
The file type of the input audio file. Should be the same as what the file extension would be, for ex. ‘mp3’ or ‘wav’.
rate : float or None, default=None
The sample rate of the input audio file. If None the sample rate is inferred.
[...]
So, you should set the rate, together with the other format parameters, as follows:
tfm.set_input_format(file_type='raw', rate=8000, bits=16, channels=1, encoding='signed-integer')
You'll have to adjust the values to match what is actually encoded in that raw file. The input format is stored on the transformer object (tfm), so it applies to every file you subsequently process with it; if you process more than one raw file with the same characteristics, there is no need to call the method again. You only need to call it again, with the appropriate values, when a different raw file has different characteristics.
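As a rough sketch of how this could look when converting several raw files that share one format: the helper names (`raw_input_format`, `wav_path_for`, `convert_raw_files`) and the file names are hypothetical, and the format values are the same assumed ones as above. Only `sox.Transformer`, `set_input_format` and `build` come from pysox, and the conversion itself needs both pysox and the SoX binary installed.

```python
import os


def raw_input_format(rate, bits, channels, encoding):
    """Keyword arguments for set_input_format describing a raw file."""
    return {
        "file_type": "raw",
        "rate": rate,
        "bits": bits,
        "channels": channels,
        "encoding": encoding,
    }


def wav_path_for(raw_path):
    """Hypothetical naming rule: same base name with a .wav extension."""
    return os.path.splitext(raw_path)[0] + ".wav"


def convert_raw_files(raw_paths, fmt):
    """Convert several raw files that share one format to WAV."""
    import sox  # imported lazily; requires pysox and the SoX binary

    tfm = sox.Transformer()
    tfm.set_input_format(**fmt)  # set once, reused by every build() below
    for path in raw_paths:
        tfm.build(path, wav_path_for(path))


# Assumed values; replace them with the real characteristics of your data.
fmt = raw_input_format(rate=8000, bits=16, channels=1, encoding="signed-integer")
# convert_raw_files(["a.raw", "b.raw"], fmt)  # hypothetical file names
```

If one of the files uses a different format, build a second format dictionary for it and call set_input_format again before processing that file.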