I'm trying to convert batches of PNG images into a single H.264 (libx264) mp4 video using ffmpeg. For reasons I won't go into, the conversion first encodes groups of frames into short mp4 chunks; I then want to merge those chunks into the final video at a specific frame rate (30fps in this case).

My understanding of ffmpeg and the x264 options is limited. I can produce the individual mp4 chunks from the source PNG frames without trouble, but the final merge always ends up duplicating and/or dropping frames, especially with very short chunks (< 4 frames).

The conversion from png to mp4 uses this command:

ffmpeg -start_number 1001 -framerate 30 -f image2 -i 'intermediate.%d.png' -c:v libx264 -crf 1 -pix_fmt yuv420p -movflags +faststart -frames:v 4 -r 30 chunk.1.mp4 -y

which appears to work as expected: I get a playable mp4 chunk containing, in this case, 4 frames of the PNG sequence at 30fps. The length of each chunk can be anywhere from 1 frame to around 100 frames.

When all the chunks are generated, I've been trying to use the concat demuxer to combine without re-encoding, placing all the source chunk paths in a file:

concat.txt:

file 'chunk.1.mp4'
file 'chunk.2.mp4'
file 'chunk.3.mp4'
...

and then running this ffmpeg command:

ffmpeg -f concat -i concat.txt -c:v copy merged.mp4 -y

but it says this during the concatenation:

[concat @ 0x315ff80] Estimating duration from bitrate, this may be inaccurate

and the resulting mp4 has dropped/duplicated frames. So I tried adding duration info to the concat.txt file:

file 'chunk.1.mp4'
duration 0.133333
file 'chunk.2.mp4'
duration 0.133333
file 'chunk.3.mp4'
duration 0.066666

in this case, two 4-frame/30fps chunks and one 2-frame/30fps chunk. That gets rid of the estimation warning, but the result still has duplicated/dropped frames.
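
For reference, each duration value is just the chunk's frame count divided by the frame rate. A minimal sketch of generating the list that way follows; make_concat_list and its FILE:FRAMES argument format are names made up for this illustration (nothing ffmpeg provides), and it assumes every chunk is constant frame rate.

```shell
#!/bin/sh
# Hypothetical helper: print concat-demuxer list entries, deriving each
# "duration" value as frames / fps.
# Usage: make_concat_list FPS FILE:FRAMES [FILE:FRAMES ...]
make_concat_list() {
    fps=$1
    shift
    for entry in "$@"; do
        file=${entry%:*}      # everything before the last ':'
        frames=${entry##*:}   # everything after the last ':'
        printf "file '%s'\n" "$file"
        awk -v n="$frames" -v fps="$fps" 'BEGIN { printf "duration %.6f\n", n / fps }'
    done
}

# Example: two 4-frame chunks and one 2-frame chunk at 30 fps.
make_concat_list 30 chunk.1.mp4:4 chunk.2.mp4:4 chunk.3.mp4:2
# file 'chunk.1.mp4'
# duration 0.133333
# file 'chunk.2.mp4'
# duration 0.133333
# file 'chunk.3.mp4'
# duration 0.066667
```

Redirecting the output to concat.txt gives a list in the same format as the one above. Rounding yields 0.066667 for the 2-frame chunk, where the hand-written list has the truncated 0.066666.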

I'm not sure where I'm going wrong here. What do I need to change, either in the production of the short mp4 segments or in the combination stage, to get a single mp4 at the right frame rate with no duplicated or dropped frames?

As suggested, here's the console output for the conversion from png->mp4 chunks:

ffmpeg -loglevel verbose -start_number 1001 -framerate 30 -f image2 -i 'intermediate.%d.png' -c:v libx264 -crf 1 -pix_fmt yuv420p -movflags +faststart -frames:v 4 -r 30 chunk.1.mp4 -y
ffmpeg version 2.5.4 Copyright (c) 2000-2015 the FFmpeg developers
  built on Feb 26 2015 10:23:42 with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-3)
  configuration: --prefix=/dept/srd/vendor/ffmpeg/bundle.rhel6/ffmpeg2.5.4 --enable-static --enable-pthreads --enable-gpl --enable-version3 --disable-ffserver --disable-ffplay --disable-ffprobe --enable-x11grab --enable-nonfree --extra-cflags=-I/dept/srd/vendor/ffmpeg/extern/rhel6/include --extra-ldflags=-L/dept/srd/vendor/ffmpeg/extern/rhel6/lib --enable-libx264 --enable-fontconfig --enable-libfreetype --enable-swscale --enable-libmp3lame --enable-libfaac --disable-yasm
  libavutil      54. 15.100 / 54. 15.100
  libavcodec     56. 13.100 / 56. 13.100
  libavformat    56. 15.102 / 56. 15.102
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Input #0, image2, from 'intermediate.%d.png':
  Duration: 00:00:00.27, start: 0.000000, bitrate: N/A
    Stream #0:0: Video: png, rgba, 1024x1024 (0x0), 30 fps, 30 tbr, 30 tbn, 30 tbc
[graph 0 input from stream 0:0 @ 0x273e9c0] w:1024 h:1024 pixfmt:rgba tb:1/30 fr:30/1 sar:0/1 sws_param:flags=2
[auto-inserted scaler 0 @ 0x2737ea0] w:iw h:ih flags:'0x4' interl:0
[format @ 0x273ece0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'Parsed_null_0' and the filter 'format'
[auto-inserted scaler 0 @ 0x2737ea0] w:1024 h:1024 fmt:rgba sar:0/1 -> w:1024 h:1024 fmt:yuv420p sar:0/1 flags:0x4
[libx264 @ 0x273c540] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0x273c540] profile High, level 3.2
[libx264 @ 0x273c540] 264 - core 142 - H.264/MPEG-4 AVC codec - Copyleft 2003-2014 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=36 lookahead_threads=6 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'chunk.1.mp4':
  Metadata:
    encoder         : Lavf56.15.102
    Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1024x1024, q=-1--1, 30 fps, 15360 tbn, 30 tbc
    Metadata:
      encoder         : Lavc56.13.100 libx264
Stream mapping:
  Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
No more output streams to write to, finishing.
[mp4 @ 0x273baa0] Starting second pass: moving the moov atom to the beginning of the file
frame=    4 fps=0.0 q=-1.0 Lsize=     197kB time=00:00:00.06 bitrate=24228.7kbits/s    
video:196kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.439751%
Input file #0 (intermediate.%d.png):
  Input stream #0:0 (video): 8 packets read (2341016 bytes); 5 frames decoded; 
  Total: 8 packets (2341016 bytes) demuxed
Output file #0 (chunk.3.mp4):
  Output stream #0:0 (video): 4 frames encoded; 4 packets muxed (201023 bytes); 
  Total: 4 packets (201023 bytes) muxed
[libx264 @ 0x273c540] frame I:1     Avg QP: 0.47  size:116049
[libx264 @ 0x273c540] frame P:1     Avg QP: 2.29  size: 37932
[libx264 @ 0x273c540] frame B:2     Avg QP: 2.37  size: 23184
[libx264 @ 0x273c540] consecutive B-frames: 25.0%  0.0% 75.0%  0.0%
[libx264 @ 0x273c540] mb I  I16..4: 80.0%  4.5% 15.5%
[libx264 @ 0x273c540] mb P  I16..4:  0.2%  0.1%  0.4%  P16..4:  8.1%  3.6%  3.7%  0.0%  0.0%    skip:83.9%
[libx264 @ 0x273c540] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8:  4.8%  1.2%  1.6%  direct: 4.3%  skip:88.1%  L0:38.6% L1:39.3% BI:22.1%
[libx264 @ 0x273c540] 8x8 transform intra:4.6% inter:14.8%
[libx264 @ 0x273c540] coded y,uvDC,uvAC intra: 20.7% 22.9% 22.8% inter: 8.7% 10.1% 10.0%
[libx264 @ 0x273c540] i16 v,h,dc,p: 95%  1%  3%  1%
[libx264 @ 0x273c540] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 21% 22%  6%  6%  6%  7%  5%  6%
[libx264 @ 0x273c540] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 17% 18%  7%  8%  7%  8%  6%  8%
[libx264 @ 0x273c540] i8c dc,h,v,p: 89%  4%  4%  3%
[libx264 @ 0x273c540] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x273c540] ref B L1: 89.5% 10.5%
[libx264 @ 0x273c540] kb/s:12020.88

As I said, this appears to produce a valid mp4 at 30fps with no duplicated or dropped frames from the input images.

Here's the output of the combine phase:

ffmpeg -loglevel verbose -f concat -i concat.txt -c:v copy merged.mp4 -y
ffmpeg version 2.5.4 Copyright (c) 2000-2015 the FFmpeg developers
  built on Feb 26 2015 10:23:42 with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-3)
  configuration: --prefix=/dept/srd/vendor/ffmpeg/bundle.rhel6/ffmpeg2.5.4 --enable-static --enable-pthreads --enable-gpl --enable-version3 --disable-ffserver --disable-ffplay --disable-ffprobe --enable-x11grab --enable-nonfree --extra-cflags=-I/dept/srd/vendor/ffmpeg/extern/rhel6/include --extra-ldflags=-L/dept/srd/vendor/ffmpeg/extern/rhel6/lib --enable-libx264 --enable-fontconfig --enable-libfreetype --enable-swscale --enable-libmp3lame --enable-libfaac --disable-yasm
  libavutil      54. 15.100 / 54. 15.100
  libavcodec     56. 13.100 / 56. 13.100
  libavformat    56. 15.102 / 56. 15.102
  libavdevice    56.  3.100 / 56.  3.100
  libavfilter     5.  2.103 /  5.  2.103
  libswscale      3.  1.101 /  3.  1.101
  libswresample   1.  1.100 /  1.  1.100
  libpostproc    53.  3.100 / 53.  3.100
Input #0, concat, from 'concat.txt':
  Duration: 00:00:00.67, start: 0.000000, bitrate: 2 kb/s
    Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1024x1024, 7791 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc
Output #0, mp4, to 'merged.mp4':
  Metadata:
    encoder         : Lavf56.15.102
    Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1024x1024 (0x0), q=2-31, 7791 kb/s, 30 fps, 15360 tbn, 15360 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
No more output streams to write to, finishing.
frame=   20 fps=0.0 q=-1.0 Lsize=     748kB time=00:00:00.56 bitrate=10805.0kbits/s    
video:746kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.141687%
Input file #0 (concat.txt):
  Input stream #0:0 (video): 20 packets read (764361 bytes); 
  Total: 20 packets (764361 bytes) demuxed
Output file #0 (merged.mp4):
  Output stream #0:0 (video): 20 packets muxed (764361 bytes); 
  Total: 20 packets (764361 bytes) muxed
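
Since the build above is configured with --disable-ffprobe, one way to compare frame counts between the chunks and the merged file is to read the final frame= counter from ffmpeg's own progress output. The sketch below only does the parsing; last_frame_count is a name invented for this example, and it assumes the progress line has the same "frame=   N fps=..." shape as in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: print the last frame= counter found in ffmpeg
# progress output read from stdin. Progress updates are separated by
# carriage returns, so those are turned into newlines first.
last_frame_count() {
    tr '\r' '\n' | sed -n 's/^frame= *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Example using the final progress line from the merge log above.
printf 'frame=   20 fps=0.0 q=-1.0 Lsize=     748kB time=00:00:00.56 bitrate=10805.0kbits/s\n' | last_frame_count
# 20
```

To count the frames that actually decode, the function can be fed from a decode to the null muxer, for example: ffmpeg -i merged.mp4 -map 0:v:0 -f null - 2>&1 | last_frame_count. A matching total only rules out a net gain or loss; a duplicate that offsets a drop would still need a visual check.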

1 Answer

Please include your console output.

That said, out of curiosity, can you try using the concat filter to combine everything in one command and see if that changes anything? It would look something like this:

ffmpeg -framerate 30 -start_number 1001 -i 'intermediate_1.%d.png' -framerate 30 -i 'intermediate_2.%d.png' -framerate 30 -i 'intermediate_3.%d.png' -filter_complex "[0:v][1:v][2:v]concat=n=3:v=1:a=0[v]" -map "[v]" -c:v libx264 -pix_fmt yuv420p -movflags +faststart Output.mp4
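
With the concat filter, the n value has to match the number of segments being joined, and there has to be one [i:v] input label per segment when v=1 and a=0. Since the number of chunks varies, here is a rough sketch of building that filtergraph string for any count; concat_filter_graph is a made-up helper name, not part of ffmpeg.

```shell
#!/bin/sh
# Hypothetical helper: print a concat filtergraph for N video-only inputs.
concat_filter_graph() {
    n=$1
    labels=""
    i=0
    while [ "$i" -lt "$n" ]; do
        labels="${labels}[${i}:v]"   # one input pad label per segment
        i=$((i + 1))
    done
    printf '%sconcat=n=%s:v=1:a=0[v]\n' "$labels" "$n"
}

concat_filter_graph 3
# [0:v][1:v][2:v]concat=n=3:v=1:a=0[v]
```

The printed string can be passed as the -filter_complex argument, e.g. -filter_complex "$(concat_filter_graph 3)", with the matching number of -i inputs in front of it.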