Opened 2 years ago

Closed 21 months ago

#9804 closed defect (worksforme)

MP4 captions break when extracted to scc

Reported by: Zach
Owned by:
Priority: normal
Component: undetermined
Version: git-master
Keywords: cc
Cc: Zach
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description (last modified by Zach)

Summary of the bug:

I have MP4 files coming in from outside content providers that contain user space 608 and 708 captions. When I try to extract an SCC file, some of the caption data is lost, resulting in misspellings in the resulting captions file when it is opened in Adobe Premiere 2022. SRT extraction and transcoding to an MPEG-TS transport stream or to MXF files also break the captions in different ways.

How to reproduce:

% ffmpeg -f lavfi -i "movie=\'D:\\IIW242.mp4\'"[out+subcc] -map 0:1 -c:s copy "D:\IIW242.scc"
ffmpeg version 2022-06-06-git-73302aa193-full_build-www.gyan.dev

Link to sample file:
https://drive.google.com/file/d/1zdX-Vw3iU37xWR2SPRwvonMhoT1gPEUz/

Attachments (4)

IIW242.scc (268.5 KB ) - added by Zach 2 years ago.
scc extraction from longer program
iiw242.html (8.5 KB ) - added by Zach 2 years ago.
source file mediainfo export
IIW242pp.scc (82.2 KB ) - added by Zach 2 years ago.
SCC Converted from MCC via Premiere Pro
IIW242mcc.zip (577.0 KB ) - added by Zach 2 years ago.
MCC File extracted with ccextract (drastic tv)


Change History (10)

by Zach, 2 years ago

Attachment: IIW242.scc added

scc extraction from longer program

by Zach, 2 years ago

Attachment: iiw242.html added

source file mediainfo export

comment:1 by Zach, 2 years ago

Description: modified (diff)

comment:2 by Balling, 2 years ago

What misspellings? The only thing I can see is that the SRT file produced from the SCC is slightly different from the SRT file produced directly, but only in timestamps:

00:00:00,567 --> 00:00:06,006

vs

00:00:00,561 --> 00:00:06,000

or
00:00:27,427 --> 00:00:29,763

vs
00:00:27,396 --> 00:00:29,759
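To put numbers on the timestamp differences quoted above, here is a minimal Python sketch that converts SRT timestamps to milliseconds and subtracts them. The helper name `srt_ms` is made up for illustration, and the pairs are simply the four values listed in this comment (first listing minus second listing).

```python
def srt_ms(ts):
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to milliseconds."""
    hms, ms = ts.split(",")
    h, m, s = (int(x) for x in hms.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(ms)

# Timestamp pairs copied from the two SRT listings in this comment.
pairs = [
    ("00:00:00,567", "00:00:00,561"),
    ("00:00:06,006", "00:00:06,000"),
    ("00:00:27,427", "00:00:27,396"),
    ("00:00:29,763", "00:00:29,759"),
]

for first, second in pairs:
    print(first, second, srt_ms(first) - srt_ms(second), "ms")
```

For these four values the differences are 6, 6, 31 and 4 ms, so each pair is within roughly one video frame of the other.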

Last edited 2 years ago by Balling

by Zach, 2 years ago

Attachment: IIW242pp.scc added

SCC Converted from MCC via Premiere Pro

by Zach, 2 years ago

Attachment: IIW242mcc.zip added

MCC File extracted with ccextract (drastic tv)

comment:3 by Zach, 2 years ago

The spelling errors are in the decoded captions. Ezekiel becomes Ezekl, etc. That is the easiest example to see in the first 45 seconds of the program. I uploaded some other files to help explain this that are from the entire program. The proper output should be very similar to the scc file I obtained by extracting with ccextract from Drastic TV and converted to SCC via Premiere Pro.
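One way to check whether a word such as Ezekiel survives in an extracted .scc file, independent of any editor, is to decode the basic characters straight from the hex pairs. The following is a rough Python sketch under stated assumptions, not a full CEA-608 decoder: the function name `scc_line_text` is hypothetical, special and extended characters are ignored, any pair whose first byte (after removing the parity bit) falls in 0x10 to 0x1F is treated as a control code and skipped, and the sample line is constructed by hand rather than taken from the attached files.

```python
def scc_line_text(line):
    """Return the basic CEA-608 characters found on one SCC data line."""
    parts = line.split()
    if len(parts) < 2 or ":" not in parts[0]:
        return ""  # header line ("Scenarist_SCC V1.0") or blank line
    chars = []
    for word in parts[1:]:
        if len(word) != 4:
            continue
        try:
            value = int(word, 16)
        except ValueError:
            continue
        # Each byte carries odd parity in bit 7; mask it off.
        b1 = (value >> 8) & 0x7F
        b2 = value & 0x7F
        if 0x10 <= b1 <= 0x1F:
            continue  # control code pair (positioning, roll-up, etc.)
        for b in (b1, b2):
            # Letters and digits match ASCII; a handful of codes
            # (accented characters) differ and are not mapped here.
            if b >= 0x20:
                chars.append(chr(b))
    return "".join(chars)

# Hand-built sample: two control codes followed by "Ezekiel" plus padding.
sample = "00:00:01;00\t9420 9420 457a e56b e9e5 ec80"
print(scc_line_text(sample))
```

Running this over each line of two .scc files and searching the output for the word in question would show whether the characters are missing from the data itself or only from a particular application's rendering of it.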

comment:4 by Balling, 2 years ago

> Ezekiel becomes Ezekl, etc.

No, it does not. Ezekiel is correctly preserved.

> I obtained by extracting with ccextract from Drastic TV and converted to SCC via Premiere Pro.

Well, it has some problems, like

getting what you deserve."</font><font face="Monospace">{\an7}Well, yes,</font>

is on one line, even though that looks wrong. There are also some \h problems, but that is a known bug in ffmpeg.

Oh, and I also see that in "this is It is written" the "this is" part is not preserved.

Last edited 2 years ago by Balling (previous) (diff)

comment:5 by Marton Balint, 22 months ago

Priority: important → normal

Not a regression or a crash, so does not qualify as important.

comment:6 by Carl Eugen Hoyos, 21 months ago

Keywords: cc added
Resolution: worksforme
Status: new → closed

There are definitely no captions missing.
