ffmpeg conversion to mp4 shifts the audio by one frame

1

votes

I have a .mov file (codec = motion jpeg) that has an audio stream that includes small pulses at every second.

When I convert this file to mp4 using ffmpeg I notice that all my pulses are now off by one frame.

I simply used "ffmpeg -i source_file.mov target_file.mp4"

Here is an image of the comparison between the audio signals:

A1 is the original audio (.mov) and A2 is the mp4 output audio of ffmpeg. As you can see the pulses are one frame late compared to the original.

I know that the h264 codec is lossy but one frame offset seems like a big loss if you ask me.

Is there an ffmpeg option I could use to keep the audio stream in sync with the original?

Here is the input file: https://www.dropbox.com/s/6y5g7lo5dvu0ub1/BBB_09_tree_trunk_009_ANIM_001.mov?dl=0

Here is the output file: https://www.dropbox.com/s/10zuzwn0qs8l853/BBB_09_tree_trunk_009_ANIM_001.mp4?dl=0

audio ffmpeg mp4

2

votes

If you copy the audio over, you shouldn't get the shift.

ffmpeg -i source_file.mov -c:a copy target_file.mp4
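To check whether a given conversion moved the audio, one option is to compare the audio start time that ffprobe reports for both files and express the difference in video frames. This is only a rough sketch: the helper names are made up, the 24 fps frame rate is an assumption (the question doesn't state it), and the reported start time may not match what an editor displays, since that depends on how each tool reads the container's timing metadata.

```shell
#!/bin/sh
# audio_start_time FILE
# Prints the start_time (in seconds) that ffprobe reports for the first
# audio stream. Hypothetical helper; needs ffprobe on the PATH.
audio_start_time() {
    ffprobe -v error -select_streams a:0 \
        -show_entries stream=start_time \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# offset_in_frames START_A START_B FPS
# Prints (START_B - START_A) expressed in video frames, two decimals.
offset_in_frames() {
    awk -v a="$1" -v b="$2" -v fps="$3" \
        'BEGIN { printf "%.2f\n", (b - a) * fps }'
}

# Example with made-up values: a 0.041667 s delay at an assumed 24 fps.
offset_in_frames 0.000000 0.041667 24   # prints 1.00

# With real files (not run here), something like:
# offset_in_frames "$(audio_start_time source_file.mov)" \
#                  "$(audio_start_time target_file.mp4)" 24
```

If the two start times come out identical but the pulses still look late in your editor, comparing the decoded waveforms directly is probably the more reliable test.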

0

votes

I've been working on this issue for my own needs and my file format has to be mp4. I'm working from mxf files. I've tried several options and found this to give the most accurate result (I've removed specifics for simplicity):

ffmpeg -ss 00:00:00.021 -i "input.mxf" -itsoffset -0.044 -i "input.mxf" -c:v libx264 -c:a aac -map 0:a -map 1:v "output.mp4"

Starting the first input at 21 ms and mapping it as the audio, then shifting the video back by 44 ms, gave me the most accurate sync (within several samples). I don't know why 22 ms wasn't as accurate (when that's what the priming-sample issue seems to equate to), and I found nothing that let me work at a finer granularity, in samples. A filter with a PTS offset had no effect. Perhaps it works differently with different file formats.

It's also worth noting that the same command without -itsoffset gave the same sync result, with one difference: the video stream duration was 1 frame and 1 ms off from the audio and container durations. With -itsoffset, the durations were only 1 ms apart. You can use 22 ms to get an accurate duration, but check your sync; it might be out by that slightest bit more.

Also worth noting: I stumbled across some developer commentary on the -itsoffset option which clarified that it doesn't work on audio, only on video. The answer above seems to suggest applying the offset to the audio, which apparently is not how the option is built to work. https://trac.ffmpeg.org/ticket/1349
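For anyone wanting to sanity-check the millisecond values in this answer, here is a small sketch of the arithmetic. It assumes an AAC encoder delay of 1024 samples, audio at 48 kHz or 44.1 kHz, and 24 fps video; none of these figures come from the answer itself, so swap in your own. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: the encoder adds 1024 priming samples. Other AAC encoders
# may use a different delay, so adjust this for your setup.
PRIMING_SAMPLES=1024

# samples_to_ms SAMPLES RATE -> duration in milliseconds, two decimals
samples_to_ms() {
    awk -v n="$1" -v r="$2" 'BEGIN { printf "%.2f\n", n * 1000 / r }'
}

# ms_to_samples MS RATE -> nearest whole number of samples
ms_to_samples() {
    awk -v ms="$1" -v r="$2" 'BEGIN { printf "%d\n", ms * r / 1000 + 0.5 }'
}

# frame_ms FPS -> duration of one video frame in milliseconds
frame_ms() {
    awk -v fps="$1" 'BEGIN { printf "%.2f\n", 1000 / fps }'
}

samples_to_ms "$PRIMING_SAMPLES" 48000   # prints 21.33
samples_to_ms "$PRIMING_SAMPLES" 44100   # prints 23.22
ms_to_samples 21 48000                   # prints 1008
frame_ms 24                              # prints 41.67
```

Under these assumptions, 1024 samples at 48 kHz is about 21.33 ms, which might be why 21 ms landed closer than 22 ms. A 21 ms seek covers 1008 samples, leaving 16 samples unaccounted for, which would fit "within several samples". That is only a guess, though; check it against your own sample rate.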
