Opened 3 years ago
Last modified 2 years ago
#9808 new defect
Curly brace characters in WebVTT subtitle files not encoded correctly
Reported by: | Gavin Llewellyn | Owned by: | |
---|---|---|---|
Priority: | normal | Component: | avcodec |
Version: | unspecified | Keywords: | |
Cc: | Gavin Llewellyn, tfischer | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Summary of the bug:
I am trying to add subtitles from a VTT file to an MP4 video. However, where the VTT file has a pair of curly braces, I only see a backslash character in the MP4 when played back with QuickTime Player.
I can see the same issue when using ffmpeg to convert a VTT file to an SRT file.
How to reproduce:
% ./ffmpeg -i curly_braces.vtt output.srt
ffmpeg version N-107064-g7adeeff91f-tessus Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 11.0.0 (clang-1100.0.33.17)
  configuration: --cc=/usr/bin/clang --prefix=/opt/ffmpeg --extra-version=tessus --enable-avisynth --enable-fontconfig --enable-gpl --enable-libaom --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libmysofa --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopus --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-version3 --pkg-config-flags=--static --disable-ffplay
  libavutil 57. 26.100 / 57. 26.100
  libavcodec 59. 33.100 / 59. 33.100
  libavformat 59. 24.100 / 59. 24.100
  libavdevice 59. 6.100 / 59. 6.100
  libavfilter 8. 40.100 / 8. 40.100
  libswscale 6. 6.100 / 6. 6.100
  libswresample 4. 6.100 / 4. 6.100
  libpostproc 56. 5.100 / 56. 5.100
Input #0, webvtt, from 'curly_braces.vtt':
  Duration: N/A, bitrate: N/A
  Stream #0:0: Subtitle: webvtt
Output #0, srt, to 'output.srt':
  Metadata:
    encoder : Lavf59.24.100
  Stream #0:0: Subtitle: subrip
    Metadata:
      encoder : Lavc59.33.100 srt
Stream mapping:
  Stream #0:0 -> #0:0 (webvtt (native) -> subrip (srt))
Press [q] to stop, [?] for help
size= 0kB time=00:00:03.00 bitrate= 0.3kbits/s speed=4.44e+03x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2040.000000%
The input file:
$ cat curly_braces.vtt
WEBVTT

1
00:00:01.000 --> 00:00:06.000
{

2
00:00:02.000 --> 00:00:07.000
}

3
00:00:03.000 --> 00:00:08.000
{}
The output file:
$ cat output.srt
1
00:00:01,000 --> 00:00:06,000
\{

2
00:00:02,000 --> 00:00:07,000
\}

3
00:00:03,000 --> 00:00:08,000
\
As far as I can tell from the WebVTT spec, these characters do not need to be escaped in the VTT file: https://www.w3.org/TR/webvtt1/#webvtt-cue-text-span
Attachments (1)
Change History (3)
comment:1 by , 2 years ago
Component: | undetermined → avcodec |
---|
comment:2 by , 2 years ago
Cc: | added |
---|
I can confirm this bug. It happens when a WebVTT file is decoded by the function webvtt_event_to_ass in libavcodec/webvttdec.c. See also the comment on line 40 of that file: "escape to avoid ASS markup conflicts".
Although removing line 40 would "solve" this particular bug, it may introduce new problems in other areas of subtitle handling, especially ASS markup.
My proposed solution is to change the function webvtt_encode_frame in libavcodec/webvttenc.c so that it "unescapes" the two affected characters, "{" and "}".
I am going to attach a patch that implements this change.
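To illustrate the idea only, here is a minimal sketch of the unescaping step in plain C. It is not the attached patch and does not use any FFmpeg API: the helper name unescape_curly_braces and its buffer-based signature are assumptions made for this example, and where such a step would hook into webvtt_encode_frame is left open.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of FFmpeg: copy `src` into `dst`,
 * turning the escaped sequences "\{" and "\}" back into "{" and "}".
 * Every other byte, including a backslash that is not followed by a
 * curly brace, is copied unchanged. Output is truncated to fit
 * `dst_size` and is always NUL-terminated.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
static size_t unescape_curly_braces(char *dst, size_t dst_size, const char *src)
{
    size_t n = 0;

    assert(dst != NULL && src != NULL && dst_size > 0);

    while (*src && n + 1 < dst_size) {
        /* Drop the backslash only when a curly brace follows it. */
        if (src[0] == '\\' && (src[1] == '{' || src[1] == '}'))
            src++;
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return n;
}
```

With this sketch, the escaped text "\{\}" comes back as "{}", while other backslash sequences such as "\N" pass through untouched, so ASS line breaks would not be affected by this step.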