FFmpeg remove silence with exact duration detected by detect silence

Question

I have an audio file that has some silences. I detect them with ffmpeg's silencedetect filter and then try to remove them with silenceremove, but I see some strange behavior. Specifically:

1) Basic file info from ffprobe (-show_streams)

Input #0, mp3, from 'my_file.mp3':
  Metadata:
    encoder         : Lavf58.64.100
  Duration: 00:00:25.22, start: 0.046042, bitrate: 32 kb/s
    Stream #0:0: Audio: mp3, 24000 Hz, mono, fltp, 32 kb/s

2) Using silencedetect

ffmpeg -i my_file.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -

I get this result:

[mp3float @ 000001ee50074280] overread, skip -7 enddists: -1 -1
[silencedetect @ 000001ee5008a1c0] silence_start: 6.21417
[silencedetect @ 000001ee5008a1c0] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect @ 000001ee5008a1c0] silence_start: 16.44
[silencedetect @ 000001ee5008a1c0] silence_end: 17.1547 | silence_duration: 0.714708
[mp3float @ 000001ee50074280] overread, skip -10 enddists: -3 -3
[mp3float @ 000001ee50074280] overread, skip -5 enddists: -4 -4
[silencedetect @ 000001ee5008a1c0] silence_start: 24.4501
size=N/A time=00:00:25.17 bitrate=N/A speed=1.32e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 000001ee5008a1c0] silence_end: 25.176 | silence_duration: 0.725917

These values also match the silence points shown in Adobe Audition.

So far all good.
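If you want to drive this from code (for example to decide which silence to cut, as in step 3), the silencedetect lines are easy to parse. Below is a minimal Python sketch. It assumes the log text has already been captured from ffmpeg's stderr, for instance with subprocess.run(..., capture_output=True, text=True).stderr. parse_silencedetect is a hypothetical helper name, and the regexes only cover the plain decimal format seen in the output above.

```python
import re
from typing import List, Tuple

# Matches "silence_start: 6.21417"
START_RE = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
# Matches "silence_end: 6.91712 | silence_duration: 0.702958"
END_RE = re.compile(
    r"silence_end:\s*(-?\d+(?:\.\d+)?)\s*\|\s*silence_duration:\s*(\d+(?:\.\d+)?)"
)


def parse_silencedetect(log: str) -> List[Tuple[float, float, float]]:
    """Collect (start, end, duration) tuples from silencedetect log text.

    Decoder warnings and progress lines are skipped. A silence_start with no
    matching silence_end is dropped.
    """
    intervals = []
    start = None
    for line in log.splitlines():
        end_match = END_RE.search(line)
        if end_match and start is not None:
            intervals.append(
                (start, float(end_match.group(1)), float(end_match.group(2)))
            )
            start = None
            continue
        start_match = START_RE.search(line)
        if start_match:
            start = float(start_match.group(1))
    return intervals


# Excerpt of the stderr output from step 2
LOG = """\
[silencedetect @ 000001ee5008a1c0] silence_start: 6.21417
[silencedetect @ 000001ee5008a1c0] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect @ 000001ee5008a1c0] silence_start: 16.44
[silencedetect @ 000001ee5008a1c0] silence_end: 17.1547 | silence_duration: 0.714708
[mp3float @ 000001ee50074280] overread, skip -10 enddists: -3 -3
[silencedetect @ 000001ee5008a1c0] silence_start: 24.4501
size=N/A time=00:00:25.17 bitrate=N/A speed=1.32e+03x
[silencedetect @ 000001ee5008a1c0] silence_end: 25.176 | silence_duration: 0.725917
"""

for start, end, duration in parse_silencedetect(LOG):
    print(start, end, duration)
```

The progress line can appear between a silence_start and its silence_end (as it does for the last silence here), so the parser pairs lines by state rather than by adjacency.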

3) Now, based on some calculations (driven by my application's logic for what the final duration of the audio should be), I am trying to delete the silence with a duration of 0.725917 s. For that, I followed the ffmpeg docs (https://ffmpeg.org/ffmpeg-filters.html#silencedetect):

Trim all silence encountered from beginning to end where there is more than 1 second of silence in audio: silenceremove=stop_periods=-1:stop_duration=1:stop_threshold=-90dB

I run this command:

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72 result1.mp3

So I expect it to delete only the silence with a duration of 0.725917 s (the last one in the output above). However, it deletes the silence that starts at 16.44 s with a duration of 0.714708 s. See the following comparison:

4) Running silencedetect on result1.mp3 with the same options gives even stranger results:

ffmpeg -i result1.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -

Result:

[mp3float @ 0000017723404280] overread, skip -5 enddists: -4 -4
[silencedetect @ 0000017723419540] silence_start: 6.21417
[silencedetect @ 0000017723419540] silence_end: 6.92462 | silence_duration: 0.710458
[mp3float @ 0000017723404280] overread, skip -7 enddists: -6 -6
[mp3float @ 0000017723404280] overread, skip -7 enddists: -2 -2
[mp3float @ 0000017723404280] overread, skip -6 enddists: -1 -1
    Last message repeated 1 times
[silencedetect @ 0000017723419540] silence_start: 23.7308
size=N/A time=00:00:24.45 bitrate=N/A speed=1.33e+03x
video:0kB audio:1146kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 0000017723419540] silence_end: 24.456 | silence_duration: 0.725167

So, the results are:

With the command to remove silences longer than 0.72 s, the silence of 0.714708 s was removed, while the silence of 0.725917 s remained (although its duration changed slightly, as described in the third point).
The first silence, which started at 6.21417 s and had a duration of 0.702958 s, now has a duration of 0.710458 s.
The third silence, which started at 24.4501 s (it now starts at 23.7308 s, obviously because the second silence was removed) and had a duration of 0.725917 s, now lasts 0.725167 s. The difference is small, but why should removing a different silence change this one's duration at all?
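To put these changes in perspective, here is the arithmetic that converts them to samples at the file's 24000 Hz rate (from the ffprobe output in step 1). It only uses the numbers printed in the logs, which silencedetect rounds to about six significant digits, so treat the results as approximate. delta_in_samples is a hypothetical helper name.

```python
SAMPLE_RATE = 24000  # Hz, from the ffprobe output in step 1


def delta_in_samples(before: float, after: float, rate: int = SAMPLE_RATE) -> int:
    """Change in a silence's duration, rounded to whole samples."""
    return round((after - before) * rate)


# (label, duration before, duration after) as printed by silencedetect
changes = [
    ("first silence", 0.702958, 0.710458),
    ("third silence", 0.725917, 0.725167),
]

for label, before, after in changes:
    ms = (after - before) * 1000
    print(f"{label}: {ms:+.3f} ms, {delta_in_samples(before, after):+d} samples")
# first silence: +7.500 ms, +180 samples
# third silence: -0.750 ms, -18 samples
```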

Accordingly, the expected results are:

Only the silences that match the given condition (stop_duration=0.72) should be removed: in this specific example only the last one, but in general any silence that meets the length condition, regardless of its position (start, middle, or end).
All other silences should keep exactly the same duration they had before.
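Because the goal is to cut exactly the intervals that silencedetect reports, one possible approach (not something tried in this thread, and untested against this particular file) is to skip re-detection entirely and build a sample-accurate filtergraph from those intervals using ffmpeg's asplit, atrim, asetpts, and concat filters. The Python sketch below assumes the intervals were already parsed into (start, end, duration) tuples and that silencedetect's timestamps line up with atrim's start/end timeline; verify that on files with a non-zero start time (this MP3 reports start: 0.046042). keep_segments, build_filter, and cut.wav are hypothetical names. It uses a WAV input, following the accepted answer's advice.

```python
from typing import List, Optional, Tuple

Silence = Tuple[float, float, float]  # (start, end, duration) from silencedetect


def keep_segments(silences: List[Silence], min_duration: float,
                  total: Optional[float] = None) -> List[Tuple[float, Optional[float]]]:
    """Audio segments to keep after dropping silences longer than min_duration.

    An end of None means "until the end of the stream". If total (the stream
    end time) is given, an empty trailing segment is omitted.
    """
    segments = []
    cursor = 0.0
    for start, end, duration in sorted(silences):
        if duration <= min_duration:
            continue  # shorter silences are left untouched
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if total is None or cursor < total:
        segments.append((cursor, None))
    return segments


def build_filter(segments: List[Tuple[float, Optional[float]]]) -> str:
    """filter_complex string that trims each kept segment and concatenates them."""
    n = len(segments)
    parts = [f"[0:a]asplit={n}" + "".join(f"[s{i}]" for i in range(n))]
    for i, (start, end) in enumerate(segments):
        trim = f"start={start}" if end is None else f"start={start}:end={end}"
        parts.append(f"[s{i}]atrim={trim},asetpts=PTS-STARTPTS[k{i}]")
    labels = "".join(f"[k{i}]" for i in range(n))
    parts.append(f"{labels}concat=n={n}:v=0:a=1[out]")
    return ";".join(parts)


silences = [(6.21417, 6.91712, 0.702958),
            (16.44, 17.1547, 0.714708),
            (24.4501, 25.176, 0.725917)]

graph = build_filter(keep_segments(silences, 0.71))
cmd = ["ffmpeg", "-i", "my_file.wav", "-filter_complex", graph,
       "-map", "[out]", "cut.wav"]
print(cmd)  # pass to subprocess.run(cmd, check=True) to execute
```

The design choice here is to treat silencedetect as the single source of truth: whatever intervals it reports are the ones removed, so positions in the middle or at the start are handled the same way as the end.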

FFmpeg: 4.2.4-1ubuntu0.1, Ubuntu: 20.04.2

Some attempts and results from experimenting with ffmpeg options:

a)

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:detection=peak tmp1.mp3

Result: the first and second silences are removed; the third silence's duration remains exactly the same.

b)

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.71 tmp_0.71.mp3

Result: the first and second silences are removed; the third silence remains, but its duration becomes 0.72075 s.

c)

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.7 tmp_0.7.mp3

Result: all 3 silences are removed.

d) The edge case

This command still removes the second silence (after which the first silence becomes exactly as in point #4, and the last silence becomes 0.721375 s):

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72335499999 tmp_0.72335499999.mp3

But this one again does not remove any silence:

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.723355 tmp_0.723355.mp3
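For comparison with the attempts above, this is what the naive rule "remove every silence strictly longer than stop_duration", applied to the durations silencedetect reported in step 2, would select for each value tried. It models the expectation stated in step 3, not silenceremove's actual algorithm, and the observed results in step 3 and attempts a) to d) differ from it. expected_removed is a hypothetical helper name.

```python
# Silence durations reported by silencedetect for my_file.mp3 (step 2)
DURATIONS = [0.702958, 0.714708, 0.725917]


def expected_removed(stop_duration: float, durations=DURATIONS) -> list:
    """1-based indices of silences strictly longer than stop_duration."""
    return [i + 1 for i, d in enumerate(durations) if d > stop_duration]


# stop_duration values tried in step 3 and in attempts a) to d)
for t in [0.72, 0.71, 0.7, 0.72335499999, 0.723355]:
    print(f"stop_duration={t}: {expected_removed(t)}")
# stop_duration=0.72: [3]
# stop_duration=0.71: [2, 3]
# stop_duration=0.7: [1, 2, 3]
# stop_duration=0.72335499999: [3]
# stop_duration=0.723355: [3]
```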

e) window parameter, case 0.03

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0.03 window_0.03.mp3

This does not remove any silence, but running silencedetect on the output

ffmpeg -i window_0.03.mp3 -af silencedetect=noise=-50dB:d=0.2 -f null -

gives this result (compare with the silences in result1.mp3 from point #4):

[mp3float @ 000001c5c8824280] overread, skip -5 enddists: -4 -4
[silencedetect @ 000001c5c883a040] silence_start: 6.21417
[silencedetect @ 000001c5c883a040] silence_end: 6.92462 | silence_duration: 0.710458
[mp3float @ 000001c5c8824280] overread, skip -7 enddists: -6 -6
[mp3float @ 000001c5c8824280] overread, skip -7 enddists: -2 -2
[silencedetect @ 000001c5c883a040] silence_start: 16.4424
[silencedetect @ 000001c5c883a040] silence_end: 17.1555 | silence_duration: 0.713167
[mp3float @ 000001c5c8824280] overread, skip -6 enddists: -1 -1
    Last message repeated 1 times
[silencedetect @ 000001c5c883a040] silence_start: 24.4508
size=N/A time=00:00:25.17 bitrate=N/A speed=1.24e+03x
video:0kB audio:1180kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 000001c5c883a040] silence_end: 25.176 | silence_duration: 0.725167

f) window parameter, case 0.01

ffmpeg -i my_file.mp3 -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0.01 window_0.01.mp3

This removes the first and second silences; silencedetect with the same parameters gives the following result:

[mp3float @ 000001ea631d4280] overread, skip -5 enddists: -4 -4
    Last message repeated 1 times
[mp3float @ 000001ea631d4280] overread, skip -7 enddists: -2 -2
[mp3float @ 000001ea631d4280] overread, skip -6 enddists: -1 -1
    Last message repeated 1 times
[silencedetect @ 000001ea631ea1c0] silence_start: 23.0108
size=N/A time=00:00:23.73 bitrate=N/A speed=1.2e+03x
video:0kB audio:1113kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[silencedetect @ 000001ea631ea1c0] silence_end: 23.736 | silence_duration: 0.725167

Any thoughts, ideas, points are much appreciated.

@llogan here is the file: dropbox.com/s/rd8oopeicjam3sm/my_file.mp3?dl=0 — dav
@llogan the goal is to remove only those silences that match the given condition (i.e. longer than x seconds), regardless of their position (start, middle, or end). In this example the silence happens to be at the end, but in other cases the silence to be removed might be in the middle or at the beginning. (I updated the question to clarify that part as well.) — dav
@dav I updated my post over at the other ffmpeg question you were interested in. I ended up supplying a big Python script that covers much of the functionality being asked for. stackoverflow.com/questions/47910301/… — Mark H

Mark H · Accepted Answer · 2021-03-17T07:33:06

You're suffering from two things:

You are converting back to MP3 (a lossy format), which causes result1.mp3 to be re-encoded and end up slightly different from a perfect cut. The fix is to work with WAV files (a lossless format).
The silenceremove filter uses a detection window; you need to set window=0 so that detection happens sample by sample.
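A toy illustration of the window point (a simplified model for intuition only, not ffmpeg's silenceremove code, which has its own peak/rms detection modes and other details): if detection averages power over a trailing window of w samples, a window only reads as silent once it contains no loud samples, so a silence of N samples is measured as N - w + 1 samples, while w = 1 measures it exactly. For scale, the filter documentation lists 0.02 s as the default window, which would be 480 samples at this file's 24000 Hz. silent_run_lengths is a hypothetical helper name.

```python
from typing import List


def silent_run_lengths(samples: List[float], threshold: float, window: int) -> List[int]:
    """Lengths (in samples) of runs where the trailing-window mean power is below threshold**2.

    Toy model only: window=1 means sample-by-sample detection. This is NOT a
    reimplementation of ffmpeg's silenceremove, just an illustration of how a
    detection window shifts the measured edges of a silence.
    """
    runs, current = [], 0
    for i in range(len(samples)):
        frame = samples[max(0, i - window + 1): i + 1]
        mean_power = sum(x * x for x in frame) / len(frame)
        if mean_power < threshold ** 2:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# 10 loud samples, 100 silent samples, 10 loud samples
signal = [1.0] * 10 + [0.0] * 100 + [1.0] * 10

print(silent_run_lengths(signal, threshold=0.01, window=1))   # [100]
print(silent_run_lengths(signal, threshold=0.01, window=20))  # [81]
```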

ffmpeg -i my_file.mp3 my_file.wav
ffmpeg -i my_file.wav -af silencedetect=noise=-50dB:d=0.2 -f null -
ffmpeg -i my_file.wav -af silenceremove=stop_periods=-1:stop_threshold=-50dB:stop_duration=0.72:window=0 result1.wav
ffmpeg -i result1.wav -af silencedetect=noise=-50dB:d=0.2 -f null -

Final output of the last command. I would consider this a solid solution, because the silence start times and durations match their values before the cut exactly:

[silencedetect @ 0x5570a855b400] silence_start: 6.21417
[silencedetect @ 0x5570a855b400] silence_end: 6.91712 | silence_duration: 0.702958
[silencedetect @ 0x5570a855b400] silence_start: 16.44
[silencedetect @ 0x5570a855b400] silence_end: 17.1547 | silence_duration: 0.714708
size=N/A time=00:00:24.45 bitrate=N/A speed=4.49e+03x

You can then re-encode it to MP3 if you want.
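To check this kind of result programmatically, a sketch along these lines compares the silencedetect durations before and after the cut. durations_preserved is a hypothetical helper; it assumes the logs were already parsed into (start, end, duration) tuples and that every silence longer than stop_duration was removed completely, as in the output above. Start times are ignored because they legitimately shift after an earlier cut. The sample data are the numbers from this answer and from step 4 of the question.

```python
from typing import List, Tuple

Silence = Tuple[float, float, float]  # (start, end, duration) from silencedetect


def durations_preserved(before: List[Silence], after: List[Silence],
                        stop_duration: float, tol: float = 1e-6) -> bool:
    """True if every silence not longer than stop_duration keeps its duration.

    Assumes silences longer than stop_duration were removed entirely; start
    times are ignored because they shift after an earlier cut.
    """
    expected = [d for _, _, d in before if d <= stop_duration]
    actual = [d for _, _, d in after]
    return len(expected) == len(actual) and all(
        abs(e - a) <= tol for e, a in zip(expected, actual)
    )


before = [(6.21417, 6.91712, 0.702958), (16.44, 17.1547, 0.714708),
          (24.4501, 25.176, 0.725917)]
after_wav = [(6.21417, 6.91712, 0.702958), (16.44, 17.1547, 0.714708)]   # this answer, WAV + window=0
after_mp3 = [(6.21417, 6.92462, 0.710458), (23.7308, 24.456, 0.725167)]  # question, step 4

print(durations_preserved(before, after_wav, 0.72))  # True
print(durations_preserved(before, after_mp3, 0.72))  # False
```

The tolerance of 1e-6 s is well below one sample at 24000 Hz (about 4.2e-5 s); loosen it if you compare values that were rounded differently.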
