Opened 13 days ago
#10967 new defect
MP4 Inconsistencies in presentation time when negative ctts_offsets are present.
Reported by: | Casey Bateman | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avformat |
Version: | 5.1.4 | Keywords: | fmp4 pts cts dts dtsshift |
Cc: | Casey Bateman | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
I am attempting to encode HLS variants packaged as fMP4. I have noticed a discrepancy between the PTS calculated by the demuxer (mov.c) and the PTS calculated by the muxer (movenc.c). I believe this discrepancy makes it difficult to align HLS segment boundaries. The demuxer calculates PTS as follows:
PTS = DTS + DTS_SHIFT + CTTS_OFFSET
Whereas the muxer (movenc.c) calculates the earliest presentation time written to the sidx, or simply the PTS of a packet, as follows:
PTS = DTS + CTTS_OFFSET
The result is a sidx earliest presentation time that does not match the reported PTS time for the first packet in a video stream with negative cts offsets.
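To make the mismatch concrete, here is a minimal sketch of the two calculations as described above. The function names and the tick values are hypothetical and chosen only for illustration; they are not taken from mov.c, movenc.c, or the files in this ticket.

```python
def demuxer_pts(dts: int, dts_shift: int, ctts_offset: int) -> int:
    """PTS as this ticket describes the demuxer (mov.c) calculation."""
    return dts + dts_shift + ctts_offset


def muxer_pts(dts: int, ctts_offset: int) -> int:
    """PTS as this ticket describes the muxer (movenc.c) calculation."""
    return dts + ctts_offset


# Hypothetical values in track timescale ticks (assumption, illustration only).
dts, dts_shift, ctts_offset = 1024, 512, -512

gap = demuxer_pts(dts, dts_shift, ctts_offset) - muxer_pts(dts, ctts_offset)
print(demuxer_pts(dts, dts_shift, ctts_offset))  # 1024
print(muxer_pts(dts, ctts_offset))               # 512
print(gap)                                       # 512, i.e. exactly dts_shift
```

Under these two formulas the values always differ by DTS_SHIFT, so they can only agree when DTS_SHIFT is zero.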
This is particularly problematic for my use case, which is encoding HLS variant segments on demand. We do this to save on storage costs and to deliver qualities matched to each user's needs and capabilities. When I run my first segment through ffprobe to inspect the DTS/PTS of the first packet, I get the following:
    ffprobe -show_packets -select_streams v -read_intervals "%+#1" -print_format json -hide_banner -loglevel error -i "concat:1080_0000.mp4|1080_0000.m4s"
    {
        "packets": [
            {
                "codec_type": "video",
                "stream_index": 0,
                "pts": 4096,
                "pts_time": "0.320000",
                "dts": 2048,
                "dts_time": "0.160000",
                "duration": 512,
                "duration_time": "0.040000",
                "size": "452669",
                "pos": "2702",
                "flags": "K_"
            }
        ]
    }
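For repeating this check across segments, the same ffprobe call can be scripted. This is a rough sketch: the helper names (`first_video_packet`, `parse_first_packet`, `ticks_to_seconds`) are my own, the ffprobe options are the ones used above, and the 12800 timescale is the `tbn` value ffmpeg reports for this stream.

```python
import json
import subprocess
from fractions import Fraction


def first_video_packet(path: str) -> dict:
    """Run ffprobe with the options from this ticket and return the first video packet."""
    cmd = [
        "ffprobe", "-show_packets", "-select_streams", "v",
        "-read_intervals", "%+#1", "-print_format", "json",
        "-hide_banner", "-loglevel", "error", "-i", path,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_first_packet(out)


def parse_first_packet(ffprobe_json: str) -> dict:
    """Extract integer pts/dts of the first packet from ffprobe's JSON output."""
    packet = json.loads(ffprobe_json)["packets"][0]
    return {"pts": int(packet["pts"]), "dts": int(packet["dts"])}


def ticks_to_seconds(ticks: int, timescale: int) -> Fraction:
    """Convert timescale ticks to exact seconds."""
    return Fraction(ticks, timescale)


# Trimmed copy of the ffprobe output for the 1080 segment.
sample = '{"packets": [{"codec_type": "video", "pts": 4096, "dts": 2048}]}'
pkt = parse_first_packet(sample)
print(pkt["pts"] - pkt["dts"])                     # 2048
print(float(ticks_to_seconds(pkt["pts"], 12800)))  # 0.32
```

Calling `first_video_packet("concat:1080_0000.mp4|1080_0000.m4s")` would run ffprobe itself; the demo uses a pasted sample so it runs without the media files.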
I then attempt to encode the first HLS segment to a lower 720p variant, using the following command, and get the following output:
    ../../ffmpeg -i "concat:1080_0000.mp4|1080_0000.m4s" -color_trc bt709 -colorspace bt709 -color_primaries bt709 -vf "scale=width=1280:height=720:flags=bicubic, setdar=dar=16/9, setsar=sar=1/1" -c:v libx264 -pix_fmt yuv420p -max_muxing_queue_size 9999 -s 1280x720 -maxrate 1200k -bufsize 1200k -profile:v main -hls_segment_type fmp4 -hls_time 999 -hls_fmp4_init_filename "720_0000.mp4" -hls_segment_filename "720_%04d.m4s" -hls_segment_options "fragment_index=1:movflags=+frag_discont+cmaf+negative_cts_offsets" -start_number 0 -fps_mode passthrough -copyts -enc_time_base -1 "playlist_720.m3u8"

    ffmpeg version n5.1.2-6-gc3bdbb5 Copyright (c) 2000-2022 the FFmpeg developers
      built with gcc 10 (GCC)
      configuration:
      libavutil 57. 28.100 / 57. 28.100
      libavcodec 59. 37.100 / 59. 37.100
      libavformat 59. 27.100 / 59. 27.100
      libavfilter 8. 44.100 / 8. 44.100
      libswscale 6. 7.100 / 6. 7.100
      libswresample 4. 7.100 / 4. 7.100
      libpostproc 56. 6.100 / 56. 6.100
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'concat:1080_0000.mp4|1080_0000.m4s':
      Metadata:
        major_brand : iso6
        minor_version : 512
        compatible_brands: iso6cmfcmp41
        title : Hudl
        encoder : Lavf59.27.100
      Duration: 00:00:04.16, start: 0.320000, bitrate: 5609 kb/s
      Stream #0:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv, progressive), 1920x1080, 5603 kb/s, SAR 1:1 DAR 16:9, 25 fps, 25 tbr, 12800 tbn (default)
        Metadata:
          handler_name : VideoHandler
          vendor_id : [0][0][0][0]
          encoder : Lavc59.37.100 libwz265
    Stream mapping:
      Stream #0:0 -> #0:0 (hevc (native) -> h264 (libx264))
    Press [q] to stop, [?] for help
    [libx264 @ 0x4b36640] using SAR=1/1
    [libx264 @ 0x4b36640] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
    [libx264 @ 0x4b36640] profile Main, level 3.1, 4:2:0, 8-bit
    [libx264 @ 0x4b36640] 264 - core 164 - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x1:0x111 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=22 lookahead_threads=3 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 vbv_maxrate=1200 vbv_bufsize=1200 crf_max=0.0 nal_hrd=none filler=0 ip_ratio=1.40 aq=1:1.00
    [hls @ 0x4b378c0] Opening '720_0000.mp4' for writing
    Output #0, hls, to 'playlist_720.m3u8':
      Metadata:
        major_brand : iso6
        minor_version : 512
        compatible_brands: iso6cmfcmp41
        title : Hudl
        encoder : Lavf59.27.100
      Stream #0:0(und): Video: h264, yuv420p(tv, bt709, progressive), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 25 fps, 12800 tbn (default)
        Metadata:
          handler_name : VideoHandler
          vendor_id : [0][0][0][0]
          encoder : Lavc59.37.100 libx264
        Side data:
          cpb: bitrate max/min/avg: 1200000/0/0 buffer size: 1200000 vbv_delay: N/A
    [hls @ 0x4b378c0] Opening '720_0000.m4s' for writingbitrate=N/A speed= 0x
    [hls @ 0x4b378c0] Opening 'playlist_720.m3u8.tmp' for writing
    frame= 100 fps=0.0 q=-1.0 Lsize=N/A time=00:00:00.00 bitrate=N/A speed= 0x
    video:498kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
    [libx264 @ 0x4b36640] frame I:1 Avg QP:36.23 size: 66380
    [libx264 @ 0x4b36640] frame P:25 Avg QP:37.11 size: 14045
    [libx264 @ 0x4b36640] frame B:74 Avg QP:41.92 size: 1235
    [libx264 @ 0x4b36640] consecutive B-frames: 1.0% 0.0% 3.0% 96.0%
    [libx264 @ 0x4b36640] mb I I16..4: 21.9% 0.0% 78.1%
    [libx264 @ 0x4b36640] mb P I16..4: 0.8% 0.0% 0.3% P16..4: 37.0% 18.4% 12.2% 0.0% 0.0% skip:31.4%
    [libx264 @ 0x4b36640] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 39.2% 0.4% 0.1% direct: 0.1% skip:60.3% L0:35.5% L1:64.3% BI: 0.2%
    [libx264 @ 0x4b36640] coded y,uvDC,uvAC intra: 66.2% 26.6% 0.1% inter: 5.9% 1.1% 0.0%
    [libx264 @ 0x4b36640] i16 v,h,dc,p: 28% 24% 29% 18%
    [libx264 @ 0x4b36640] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 13% 17% 26% 7% 9% 6% 11% 4% 7%
    [libx264 @ 0x4b36640] i8c dc,h,v,p: 85% 9% 6% 0%
    [libx264 @ 0x4b36640] Weighted P-Frames: Y:8.0% UV:0.0%
    [libx264 @ 0x4b36640] ref P L0: 68.7% 17.2% 12.0% 1.9% 0.2%
    [libx264 @ 0x4b36640] ref B L0: 98.3% 1.5% 0.1%
    [libx264 @ 0x4b36640] ref B L1: 96.8% 3.2%
    [libx264 @ 0x4b36640] kb/s:1017.76
Running ffprobe on the new variant to query the first packet, I see the following:
    ffprobe -show_packets -select_streams v -read_intervals "%+#1" -print_format json -hide_banner -loglevel error -i "concat:720_0000.mp4|720_0000.m4s"
    {
        "packets": [
            {
                "codec_type": "video",
                "stream_index": 0,
                "pts": 5120,
                "pts_time": "0.400000",
                "dts": 4096,
                "dts_time": "0.320000",
                "size": "67125",
                "pos": "1816",
                "flags": "K_"
            }
        ]
    }
When inspecting the sidx of both files, the earliest presentation time for 1080_0000.m4s is 2048, while for 720_0000.m4s it is 4096. In both cases ffprobe reports a different PTS (4096 and 5120 respectively), because the demuxer's PTS calculation includes DTS_SHIFT. As a result, I am unable to align timestamps using this approach. So my questions are: Is there a bug in the calculation in the muxer, or in the demuxer? When aligning HLS segment boundaries in fMP4, which value (sidx > earliest_presentation_time or ffprobe packet -> pts) is correct, so as not to introduce discontinuities?
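For anyone who wants to check the sidx values without a dedicated box inspector, here is a minimal sketch that reads `earliest_presentation_time` from the first top-level sidx box of a segment. The function name is hypothetical, and the field layout follows my reading of the ISO BMFF `sidx` definition (version/flags, reference_ID, timescale, then a 32-bit or 64-bit earliest_presentation_time depending on version). The demo bytes are synthetic, not taken from the files in this ticket.

```python
import struct


def find_sidx_earliest_pts(data: bytes):
    """Return (timescale, earliest_presentation_time) of the first top-level sidx, or None."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # 64-bit largesize follows the box type
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:  # box runs to the end of the data
            size = len(data) - pos
        if size < header:
            return None  # malformed box, stop scanning
        if box_type == b"sidx":
            body = pos + header
            version = data[body]
            _reference_id, timescale = struct.unpack_from(">II", data, body + 4)
            fmt = ">I" if version == 0 else ">Q"
            (earliest,) = struct.unpack_from(fmt, data, body + 12)
            return timescale, earliest
        pos += size
    return None


# Synthetic segment prefix: a 16-byte styp box followed by a version 0 sidx box.
styp = struct.pack(">I4s4sI", 16, b"styp", b"msdh", 0)
sidx = struct.pack(">I4s", 32, b"sidx") + struct.pack(
    ">B3xIIIIHH", 0, 1, 12800, 2048, 0, 0, 0
)
demo = styp + sidx
print(find_sidx_earliest_pts(demo))  # (12800, 2048)
```

With a real segment this would be called as `find_sidx_earliest_pts(open("1080_0000.m4s", "rb").read())`.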
I believe I can align the value reported in the sidx by using a setpts filter, but I have been unable to align the value reported by ffprobe.
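As a rough sketch of the setpts idea, the helper below builds a filter expression that shifts timestamps by the difference between two sidx earliest presentation times. The helper name is hypothetical, and I have not verified that the resulting sidx actually matches; only the `setpts` filter and its `PTS` and `TB` constants come from FFmpeg itself.

```python
def setpts_shift_expr(current_ept: int, target_ept: int, timescale: int) -> str:
    """Build a setpts expression that moves current_ept onto target_ept.

    Both values are in ticks of `timescale`; the shift is written in seconds
    and divided by TB so the filter converts it to its own timebase.
    """
    delta = current_ept - target_ept
    sign = "-" if delta >= 0 else "+"
    return f"setpts=PTS{sign}({abs(delta)}/{timescale})/TB"


# sidx values from this ticket: 4096 for the 720 segment, 2048 for the 1080 segment.
print(setpts_shift_expr(4096, 2048, 12800))  # setpts=PTS-(2048/12800)/TB
```

The expression would be appended to the existing `-vf` chain; whether it interacts cleanly with `-copyts` and `-enc_time_base -1` is something I have not tested.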
720 primer