I'm trying to convert batches of png images into a single H.264 (libx264) mp4 video using ffmpeg. For reasons I won't go into, the conversion first encodes groups of frames into short mp4 chunks, and I then want to merge those chunks into the final video at a specific frame rate (30fps in this case).
My understanding of ffmpeg and the x264 options is limited. I can produce the individual mp4 chunks from the source png frames without trouble, but the final merge always ends up duplicating and/or dropping frames, especially with very short chunks (fewer than 4 frames).
The conversion from png to mp4 uses this command:
ffmpeg -start_number 1001 -framerate 30 -f image2 -i 'intermediate.%d.png' -c:v libx264 -crf 1 -pix_fmt yuv420p -movflags +faststart -frames:v 4 -r 30 chunk.1.mp4 -y
which appears to work as expected: I get a playable mp4 chunk containing, in this case, 4 frames of the png sequence at 30fps. Each chunk can be anywhere from 1 frame to around 100 frames long.
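To make the per-chunk step concrete, here is a rough sketch of how the command above changes from one chunk to the next. The helper name chunk_cmd and its arguments are made up for illustration, and the start numbers in the example assume the chunks cover consecutive frames. It only prints the commands, it does not run ffmpeg:

```shell
# Print (not run) the encode command for one chunk.
# $1 = chunk index, $2 = first frame number, $3 = frame count
# (all three are hypothetical parameters, not part of ffmpeg itself).
chunk_cmd() {
    printf "ffmpeg -start_number %d -framerate 30 -f image2 -i 'intermediate.%%d.png' -c:v libx264 -crf 1 -pix_fmt yuv420p -movflags +faststart -frames:v %d -r 30 chunk.%d.mp4 -y\n" "$2" "$3" "$1"
}

# Example: two 4-frame chunks followed by a 2-frame chunk,
# assuming the frames are consecutive starting at 1001.
chunk_cmd 1 1001 4
chunk_cmd 2 1005 4
chunk_cmd 3 1009 2
```

Only -start_number, -frames:v and the output name differ between chunks; everything else is identical to the command shown above.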
Once all the chunks are generated, I've been trying to combine them with the concat demuxer, without re-encoding, by listing the chunk paths in a file:
concat.txt:
file 'chunk.1.mp4'
file 'chunk.2.mp4'
file 'chunk.3.mp4'
...
and then running this ffmpeg command:
ffmpeg -f concat -i concat.txt -c:v copy merged.mp4 -y
but ffmpeg prints this warning during the concatenation:
[concat @ 0x315ff80] Estimating duration from bitrate, this may be inaccurate
and the resulting mp4 has dropped/duplicated frames. So I tried adding duration info to the concat.txt file:
file 'chunk.1.mp4'
duration 0.133333
file 'chunk.2.mp4'
duration 0.133333
file 'chunk.3.mp4'
duration 0.066666
That is two 4-frame chunks and one 2-frame chunk, all at 30fps. This gets rid of the estimation warning, but the result still has duplicated/dropped frames.
I'm not sure where I'm going wrong here. What do I need to change, either in the production of the short mp4 segments or in the combination stage, to get a single mp4 at the right frame rate with no duplicated or dropped frames?
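For reference, the durations in that list are just frame count divided by frame rate. A minimal sketch of how the entries could be generated instead of typed by hand follows; the function name concat_entry and its argument order are my own convention, not anything ffmpeg defines:

```shell
# Print one concat-demuxer entry: a "file" line plus a "duration" line
# computed as frames / fps and rounded to six decimal places.
# $1 = chunk path, $2 = frames in the chunk, $3 = frames per second
# (hypothetical helper; paths containing single quotes are not handled).
concat_entry() {
    printf "file '%s'\n" "$1"
    LC_ALL=C awk -v n="$2" -v fps="$3" 'BEGIN { printf "duration %.6f\n", n / fps }'
}

# Example: the three chunks listed above (4, 4 and 2 frames at 30fps).
# Redirect the output to concat.txt to use it with -f concat.
concat_entry chunk.1.mp4 4 30
concat_entry chunk.2.mp4 4 30
concat_entry chunk.3.mp4 2 30
```

Note that this rounds rather than truncates, so the 2-frame chunk comes out as 0.066667 where the hand-written list has 0.066666.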
As suggested, here's the console output for the png -> mp4 chunk conversion:
ffmpeg -loglevel verbose -start_number 1001 -framerate 30 -f image2 -i 'intermediate.%d.png' -c:v libx264 -crf 1 -pix_fmt yuv420p -movflags +faststart -frames:v 4 -r 30 chunk.1.mp4 -y
ffmpeg version 2.5.4 Copyright (c) 2000-2015 the FFmpeg developers
built on Feb 26 2015 10:23:42 with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-3)
configuration: --prefix=/dept/srd/vendor/ffmpeg/bundle.rhel6/ffmpeg2.5.4 --enable-static --enable-pthreads --enable-gpl --enable-version3 --disable-ffserver --disable-ffplay --disable-ffprobe --enable-x11grab --enable-nonfree --extra-cflags=-I/dept/srd/vendor/ffmpeg/extern/rhel6/include --extra-ldflags=-L/dept/srd/vendor/ffmpeg/extern/rhel6/lib --enable-libx264 --enable-fontconfig --enable-libfreetype --enable-swscale --enable-libmp3lame --enable-libfaac --disable-yasm
libavutil 54. 15.100 / 54. 15.100
libavcodec 56. 13.100 / 56. 13.100
libavformat 56. 15.102 / 56. 15.102
libavdevice 56. 3.100 / 56. 3.100
libavfilter 5. 2.103 / 5. 2.103
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
Input #0, image2, from 'intermediate.%d.png':
Duration: 00:00:00.27, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, rgba, 1024x1024 (0x0), 30 fps, 30 tbr, 30 tbn, 30 tbc
[graph 0 input from stream 0:0 @ 0x273e9c0] w:1024 h:1024 pixfmt:rgba tb:1/30 fr:30/1 sar:0/1 sws_param:flags=2
[auto-inserted scaler 0 @ 0x2737ea0] w:iw h:ih flags:'0x4' interl:0
[format @ 0x273ece0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'Parsed_null_0' and the filter 'format'
[auto-inserted scaler 0 @ 0x2737ea0] w:1024 h:1024 fmt:rgba sar:0/1 -> w:1024 h:1024 fmt:yuv420p sar:0/1 flags:0x4
[libx264 @ 0x273c540] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
[libx264 @ 0x273c540] profile High, level 3.2
[libx264 @ 0x273c540] 264 - core 142 - H.264/MPEG-4 AVC codec - Copyleft 2003-2014 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=36 lookahead_threads=6 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'chunk.1.mp4':
Metadata:
encoder : Lavf56.15.102
Stream #0:0: Video: h264 (libx264) ([33][0][0][0] / 0x0021), yuv420p, 1024x1024, q=-1--1, 30 fps, 15360 tbn, 30 tbc
Metadata:
encoder : Lavc56.13.100 libx264
Stream mapping:
Stream #0:0 -> #0:0 (png (native) -> h264 (libx264))
Press [q] to stop, [?] for help
No more output streams to write to, finishing.
[mp4 @ 0x273baa0] Starting second pass: moving the moov atom to the beginning of the file
frame= 4 fps=0.0 q=-1.0 Lsize= 197kB time=00:00:00.06 bitrate=24228.7kbits/s
video:196kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.439751%
Input file #0 (intermediate.%d.png):
Input stream #0:0 (video): 8 packets read (2341016 bytes); 5 frames decoded;
Total: 8 packets (2341016 bytes) demuxed
Output file #0 (chunk.3.mp4):
Output stream #0:0 (video): 4 frames encoded; 4 packets muxed (201023 bytes);
Total: 4 packets (201023 bytes) muxed
[libx264 @ 0x273c540] frame I:1 Avg QP: 0.47 size:116049
[libx264 @ 0x273c540] frame P:1 Avg QP: 2.29 size: 37932
[libx264 @ 0x273c540] frame B:2 Avg QP: 2.37 size: 23184
[libx264 @ 0x273c540] consecutive B-frames: 25.0% 0.0% 75.0% 0.0%
[libx264 @ 0x273c540] mb I I16..4: 80.0% 4.5% 15.5%
[libx264 @ 0x273c540] mb P I16..4: 0.2% 0.1% 0.4% P16..4: 8.1% 3.6% 3.7% 0.0% 0.0% skip:83.9%
[libx264 @ 0x273c540] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 4.8% 1.2% 1.6% direct: 4.3% skip:88.1% L0:38.6% L1:39.3% BI:22.1%
[libx264 @ 0x273c540] 8x8 transform intra:4.6% inter:14.8%
[libx264 @ 0x273c540] coded y,uvDC,uvAC intra: 20.7% 22.9% 22.8% inter: 8.7% 10.1% 10.0%
[libx264 @ 0x273c540] i16 v,h,dc,p: 95% 1% 3% 1%
[libx264 @ 0x273c540] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 21% 21% 22% 6% 6% 6% 7% 5% 6%
[libx264 @ 0x273c540] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 17% 18% 7% 8% 7% 8% 6% 8%
[libx264 @ 0x273c540] i8c dc,h,v,p: 89% 4% 4% 3%
[libx264 @ 0x273c540] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x273c540] ref B L1: 89.5% 10.5%
[libx264 @ 0x273c540] kb/s:12020.88
As I said, this appears to produce a valid mp4 at 30fps with no duplicated or dropped frames from the input images.
Here's the output of the combine phase:
ffmpeg -loglevel verbose -f concat -i concat.txt -c:v copy merged.mp4 -y
ffmpeg version 2.5.4 Copyright (c) 2000-2015 the FFmpeg developers
built on Feb 26 2015 10:23:42 with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-3)
configuration: --prefix=/dept/srd/vendor/ffmpeg/bundle.rhel6/ffmpeg2.5.4 --enable-static --enable-pthreads --enable-gpl --enable-version3 --disable-ffserver --disable-ffplay --disable-ffprobe --enable-x11grab --enable-nonfree --extra-cflags=-I/dept/srd/vendor/ffmpeg/extern/rhel6/include --extra-ldflags=-L/dept/srd/vendor/ffmpeg/extern/rhel6/lib --enable-libx264 --enable-fontconfig --enable-libfreetype --enable-swscale --enable-libmp3lame --enable-libfaac --disable-yasm
libavutil 54. 15.100 / 54. 15.100
libavcodec 56. 13.100 / 56. 13.100
libavformat 56. 15.102 / 56. 15.102
libavdevice 56. 3.100 / 56. 3.100
libavfilter 5. 2.103 / 5. 2.103
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
Input #0, concat, from 'concat.txt':
Duration: 00:00:00.67, start: 0.000000, bitrate: 2 kb/s
Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1024x1024, 7791 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc
Output #0, mp4, to 'merged.mp4':
Metadata:
encoder : Lavf56.15.102
Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1024x1024 (0x0), q=2-31, 7791 kb/s, 30 fps, 15360 tbn, 15360 tbc
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
No more output streams to write to, finishing.
frame= 20 fps=0.0 q=-1.0 Lsize= 748kB time=00:00:00.56 bitrate=10805.0kbits/s
video:746kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.141687%
Input file #0 (concat.txt):
Input stream #0:0 (video): 20 packets read (764361 bytes);
Total: 20 packets (764361 bytes) demuxed
Output file #0 (merged.mp4):
Output stream #0:0 (video): 20 packets muxed (764361 bytes);
Total: 20 packets (764361 bytes) muxed
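In case it helps with checking, here is a small sketch for pulling the final frame count out of console output like the logs above, so the total can be compared against the sum of the chunk lengths. The function name last_frame_count is made up; it only parses text on stdin and does not call ffmpeg itself:

```shell
# Read ffmpeg console output on stdin and print the last reported frame count.
# Progress lines may be separated by carriage returns, so those are turned
# into newlines before matching lines that start with "frame=".
last_frame_count() {
    tr '\r' '\n' | sed -n 's/^frame= *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Example with the summary line from the merge log above; prints 20.
printf 'frame= 20 fps=0.0 q=-1.0 Lsize= 748kB time=00:00:00.56 bitrate=10805.0kbits/s\n' | last_frame_count
```

With a real file, I believe something like `ffmpeg -i merged.mp4 -map 0:v:0 -c copy -f null - 2>&1 | last_frame_count` would feed it the right output, though I haven't verified that on this 2.5.4 build.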