Opened 7 months ago
Closed 7 months ago
#11082 closed defect (duplicate)
Converting multichannel audio in FLTP sample format to stereo in S16 attenuates volume unexpectedly
| Reported by: | Jiamin.X | Owned by: | |
|---|---|---|---|
| Priority: | important | Component: | swresample |
| Version: | unspecified | Keywords: | resampling |
| Cc: | Jiamin.X | Blocked By: | |
| Blocking: | | Reproduced by developer: | no |
| Analyzed by developer: | no | | |
Description
When converting multichannel audio in FLTP sample format to stereo in S16 sample format, the volume is unexpectedly reduced.
The original 6-channel audio input file in FLTP sample format:
```
% ffprobe multich-audio.mp4
ffprobe version 6.0 Copyright (c) 2007-2023 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.0.40.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil 58. 2.100 / 58. 2.100
  libavcodec 60. 3.100 / 60. 3.100
  libavformat 60. 3.100 / 60. 3.100
  libavdevice 60. 1.100 / 60. 1.100
  libavfilter 9. 3.100 / 9. 3.100
  libswscale 7. 1.100 / 7. 1.100
  libswresample 4. 10.100 / 4. 10.100
  libpostproc 57. 1.100 / 57. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Metadata:
    major_brand : isom
    minor_version : 512
    compatible_brands: isomdby1iso2mp41
    encoder : www.aliyun.com - Media Transcoding
  Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
    Metadata:
      handler_name : SoundHandler
      vendor_id : [0][0][0][0]
    Side data:
      audio service type: main
```
Converted from 6-channel fltp to 2-channel flt, output: stereo-flt.mkv
```
% ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_f32le stereo-flt.mkv
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_f32le (native))
Output #0, matroska, to 'stereo-flt.mkv':
  Stream #0:0(und): Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz, stereo, flt, 3072 kb/s (default)
```
Converted from 6-channel fltp to 2-channel s16, output: stereo-s16.mkv
```
% ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_s16le stereo-s16.mkv
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
Output #0, matroska, to 'stereo-s16.mkv':
  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s (default)
```
Converted from 6-channel fltp to 2-channel s16 with -rematrix_maxval 1000, output: stereo-s16-rematrix_maxval-1000.mkv
```
% ffmpeg -i multich-audio.mp4 -rematrix_maxval 1000 -ac 2 -c:a pcm_s16le stereo-s16-rematrix_maxval-1000.mkv
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
Output #0, matroska, to 'stereo-s16-rematrix_maxval-1000.mkv':
  Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo, s16, 1536 kb/s (default)
```
Using volumedetect to check the max and mean volumes of the original file and the 3 generated files above:
- Volume statistics of multich-audio.mp4 (the original file):
```
% ffmpeg -i multich-audio.mp4 -af "volumedetect" -vn -sn -f null /dev/null
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
  Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz, 5.1(side), fltp, 256 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
Output #0, null, to '/dev/null':
  Stream #0:0(und): Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s (default)
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] n_samples: 19150848
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] mean_volume: -25.4 dB
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] max_volume: -1.9 dB
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_1db: 68
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_2db: 2022
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_3db: 3665
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_4db: 6371
[Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_5db: 10144
```
- Volume statistics of stereo-flt.mkv:
```
% ffmpeg -i stereo-flt.mkv -af "volumedetect" -vn -sn -f null /dev/null
Input #0, matroska,webm, from 'stereo-flt.mkv':
  Stream #0:0: Audio: pcm_f32le, 48000 Hz, 2 channels, flt, 3072 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
Output #0, null, to '/dev/null':
  Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
[Parsed_volumedetect_0 @ 0x7fce1e404200] n_samples: 6383616
[Parsed_volumedetect_0 @ 0x7fce1e404200] mean_volume: -21.6 dB
[Parsed_volumedetect_0 @ 0x7fce1e404200] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_0db: 1466
[Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_1db: 1310
[Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_2db: 3452
[Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_3db: 4591
```
- Volume statistics of stereo-s16-rematrix_maxval-1000.mkv:
```
% ffmpeg -i stereo-s16-rematrix_maxval-1000.mkv -af "volumedetect" -vn -sn -f null /dev/null
Input #0, matroska,webm, from 'stereo-s16-rematrix_maxval-1000.mkv':
  Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Output #0, null, to '/dev/null':
  Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
[Parsed_volumedetect_0 @ 0x7febc6a06180] n_samples: 6383616
[Parsed_volumedetect_0 @ 0x7febc6a06180] mean_volume: -21.6 dB
[Parsed_volumedetect_0 @ 0x7febc6a06180] max_volume: 0.0 dB
[Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_0db: 1466
[Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_1db: 1310
[Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_2db: 3452
[Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_3db: 4591
```
- Volume statistics of stereo-s16.mkv:
```
% ffmpeg -i stereo-s16.mkv -af "volumedetect" -vn -sn -f null /dev/null
Input #0, matroska,webm, from 'stereo-s16.mkv':
  Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
Output #0, null, to '/dev/null':
  Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s (default)
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] n_samples: 6383616
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] mean_volume: -29.3 dB
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] max_volume: -5.9 dB
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_5db: 21
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_6db: 294
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_7db: 598
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_8db: 831
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_9db: 1816
[Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_10db: 4223
```
From the statistics above, converting to flt, or to s16 with -rematrix_maxval 1000, gives exactly the same max and mean volume, while converting to s16 directly without setting -rematrix_maxval gives a much lower volume (mean -29.3 dB vs. -21.6 dB, max -5.9 dB vs. 0.0 dB). If I convert from 2-channel flt to 2-channel s16 instead, the volume is not affected at all.
I checked the code related to the rematrix_maxval setting in libswresample/rematrix.c (shown below). If rematrix_maxval is not set manually, its value depends on the sample formats: when av_get_packed_sample_fmt() of either the output sample format or the internal sample format (int_sample_fmt) is below AV_SAMPLE_FMT_FLT, which is the case for integer formats such as S16, maxval is set to 1.0; otherwise it is INT_MAX. This maxval then affects the matrix coefficients used by the later rematrix step: in swr_build_matrix, if maxcoef exceeds maxval, every coefficient is divided by maxcoef / maxval. This attenuates the downmix coefficients and, as a result, the output volume.
What confuses me is why the output sample format has to be checked, and rematrix_maxval adjusted accordingly, before downmixing. It seems to me that only the input sample format should affect the rematrix/downmix process, because the downmix operates on the input data in the same way regardless of the output sample format. If I am right, we may need to remove the av_get_packed_sample_fmt(s->out_sample_fmt) < AV_SAMPLE_FMT_FLT check from the following code (if this is correct, I may send a pull request later):
```c
av_cold static int auto_matrix(SwrContext *s)
{
    double maxval;
    int ret;

    if (s->rematrix_maxval > 0) {
        maxval = s->rematrix_maxval;
    } else if (   av_get_packed_sample_fmt(s->out_sample_fmt) < AV_SAMPLE_FMT_FLT
               || av_get_packed_sample_fmt(s->int_sample_fmt) < AV_SAMPLE_FMT_FLT) {
        maxval = 1.0;
    } else
        maxval = INT_MAX;
    ...
}

av_cold int swr_build_matrix(uint64_t in_ch_layout_param, uint64_t out_ch_layout_param,
                             double center_mix_level, double surround_mix_level,
                             double lfe_mix_level, double maxval,
                             double rematrix_volume, double *matrix_param,
                             int stride, enum AVMatrixEncoding matrix_encoding,
                             void *log_context)
{
    ...
    if(maxcoef > maxval || rematrix_volume < 0){
        maxcoef /= maxval;
        for(i=0; i<SWR_CH_MAX; i++)
            for(j=0; j<SWR_CH_MAX; j++){
                matrix_param[stride*i + j] /= maxcoef;
            }
    }
    ...
}
```