Opened 22 months ago
Last modified 22 months ago
#10229 open defect
Wrong data size for WAV (RIFF) format
Reported by: | polochon | Owned by: | mkver |
---|---|---|---|
Priority: | normal | Component: | undetermined |
Version: | unspecified | Keywords: | pcm_u8 RIFF WAV |
Cc: | | Blocked By: | |
Blocking: | | Reproduced by developer: | no |
Analyzed by developer: | no | | |
Description
Hello,
I think there is sometimes an issue when converting a WAV file to another WAV file with a specific format.
A problematic input file is freely available from this free sound bank: https://lasonotheque.org/detail-0254-klaxon-de-vielle-voiture-1.html (click the red "télécharger" button to download it). The issue is not systematic but happens quite often with the different files I tried.
What I do: I need to convert sounds for the Thymio robot project, which accepts only:
- 8000Hz sampling
- mono channel
- PCM u8 unsigned
- absolutely no metadata in the file.
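The first three constraints can be checked on a converted file with a short script. The sketch below uses Python's standard `wave` module; the helper names `check_thymio_constraints` and `make_wav` are hypothetical, the module does not report extra metadata chunks, and it cannot tell signed from unsigned samples (8-bit PCM in WAV is unsigned by convention), so it only covers part of the list.

```python
import io
import wave


def check_thymio_constraints(stream):
    """Return a list of problems; an empty list means the three checks pass."""
    problems = []
    with wave.open(stream, "rb") as w:
        if w.getframerate() != 8000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 8000")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels, expected 1")
        if w.getsampwidth() != 1:
            problems.append(f"{8 * w.getsampwidth()}-bit samples, expected 8")
    return problems


def make_wav(rate, channels, width, frames=8):
    """Build a small in-memory WAV with placeholder samples (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width)
        w.setframerate(rate)
        w.writeframes(bytes(frames * channels * width))
    return io.BytesIO(buf.getvalue())


print(check_thymio_constraints(make_wav(8000, 1, 1)))   # matches the constraints
print(check_thymio_constraints(make_wav(44100, 2, 2)))  # violates all three
```

To check a real file, pass its path (for example `"Pwork.wav"`) instead of the in-memory stream.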
The command line used:
ffmpeg -i originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav -f wav -bitexact -map_metadata -1 -c:a pcm_u8 -ac 1 -ar 8000 Pwork.wav
Observed behavior: the Thymio robot doesn't play the sound.
Hypothesis:
There is an issue in the data size field at address 0x28, right after the "64 61 74 61" ("data") chunk ID.
I understand (from reading some docs on RIFF files) that the next 4 bytes are calculated like this: size of the file - size of the header, which is 44 bytes in my case.
For my file:
hexdump -C -n 60 Pwork.wav
00000000 52 49 46 46 78 07 00 00 57 41 56 45 66 6d 74 20 |RIFFx...WAVEfmt |
00000010 10 00 00 00 01 00 01 00 40 1f 00 00 40 1f 00 00 |........@...@...|
00000020 01 00 08 00 64 61 74 61 53 07 00 00 7f 7f 7f 7f |....dataS.......|
00000030 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f |............|
0000003c
In this case, the file is 1920 bytes, so the data size should be 1920 - 44 = 1876 bytes.
The stored size is 0x753, which is 1875 bytes, so there is a 1-byte error.
In this case, I'm not able to play the sound with the Thymio robot. But if I put in the right size (by editing the file manually), it works well. I suppose that the robot firmware does some verification before playing a sound.
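The header fields can be decoded directly from the hexdump above. The following is a minimal sketch in Python, assuming the canonical layout seen in this file (a 16-byte fmt chunk, so the data chunk header sits at offset 36 and the samples start at offset 44); the function and variable names are illustrative only.

```python
import struct

# First 48 bytes of Pwork.wav, copied from the hexdump above.
HEADER = bytes.fromhex(
    "52494646 78070000 57415645 666d7420"
    "10000000 01000100 401f0000 401f0000"
    "01000800 64617461 53070000 7f7f7f7f"
)
FILE_SIZE = 1920  # size of Pwork.wav on disk, as reported in the ticket


def describe(header, file_size):
    """Decode the RIFF and data size fields and compare them with the file size."""
    riff_id, riff_size, wave_id = struct.unpack_from("<4sI4s", header, 0)
    data_id, data_size = struct.unpack_from("<4sI", header, 36)
    return {
        "riff_size": riff_size,                    # field at offset 0x04
        "data_size": data_size,                    # field at offset 0x28
        "bytes_after_data": file_size - 44 - data_size,
    }


info = describe(HEADER, FILE_SIZE)
print(info)
```

For this file the sketch reports a RIFF size of 1912, a data size of 1875, and one byte in the file that the data chunk does not claim.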
Thanks
Polochon
Additional info with complete log:
ffmpeg -v 9 -loglevel 99 -i originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav -f wav -bitexact -map_metadata -1 -c:a pcm_u8 -ac 1 -ar 8000 Pwork.wav
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '9'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument '99'.
Reading option '-i' ... matched as input url with argument 'originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'wav'.
Reading option '-bitexact' ... matched as option 'bitexact' (bitexact mode) with argument '1'.
Reading option '-map_metadata' ... matched as option 'map_metadata' (set metadata information of outfile from infile) with argument '-1'.
Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'pcm_u8'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '1'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '8000'.
Reading option 'Pwork.wav' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 9.
Successfully parsed a group of options.
Parsing a group of options: input url originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav.
Successfully parsed a group of options.
Opening an input file: originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav.
[NULL @ 0x562bce2d47c0] Opening 'originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav' for reading
[file @ 0x562bce2d5240] Setting default whitelist 'file,crypto'
Probing wav score:99 size:2048
[wav @ 0x562bce2d47c0] Format wav probed with size=2048 and score=99
[wav @ 0x562bce2d47c0] Before avformat_find_stream_info() pos: 44 bytes read:76136 seeks:3 nb_streams:1
[wav @ 0x562bce2d47c0] probing stream 0 pp:32
Probing mp3 score:1 size:4096
[wav @ 0x562bce2d47c0] Probe with size=4096, packets=2469 detected mp3 with score=1
[wav @ 0x562bce2d47c0] probing stream 0 pp:31
[wav @ 0x562bce2d47c0] probing stream 0 pp:30
[wav @ 0x562bce2d47c0] probing stream 0 pp:29
[wav @ 0x562bce2d47c0] probing stream 0 pp:28
[wav @ 0x562bce2d47c0] probing stream 0 pp:27
[wav @ 0x562bce2d47c0] probing stream 0 pp:26
[wav @ 0x562bce2d47c0] probing stream 0 pp:25
[wav @ 0x562bce2d47c0] probing stream 0 pp:24
[wav @ 0x562bce2d47c0] probing stream 0 pp:23
[wav @ 0x562bce2d47c0] probing stream 0 pp:22
[wav @ 0x562bce2d47c0] probing stream 0 pp:21
[wav @ 0x562bce2d47c0] probed stream 0
[wav @ 0x562bce2d47c0] parser not found for codec pcm_s16le, packets or times may be invalid.
[wav @ 0x562bce2d47c0] stream 0: start_time: -209146758205323.719 duration: 0.234
[wav @ 0x562bce2d47c0] format: start_time: -9223372036854.775 duration: 0.234 bitrate=1447 kb/s
[wav @ 0x562bce2d47c0] After avformat_find_stream_info() pos: 42394 bytes read:118486 seeks:3 frames:11
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav':
Metadata:
comment : Klaxon de vielle voiture 1
encoded_by : LaSonotheque.org
originator_reference: 0254
date : 2007-03-30
time_reference : 0
coding_history : A=PCM,F=44100,W=16,M=stereo
copyright : CC0 / WTFPL / Domaine public
IKEY : voiture,klaxon,alerte,signal,sonore,vehicule,auto,automobile,pouet,car,horn,vehicle,klaxonner
Duration: 00:00:00.23, bitrate: 1447 kb/s
Stream #0:0, 11, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s
Successfully opened the file.
Parsing a group of options: output url Pwork.wav.
Applying option f (force format) with argument wav.
Applying option bitexact (bitexact mode) with argument 1.
Applying option map_metadata (set metadata information of outfile from infile) with argument -1.
Applying option c:a (codec name) with argument pcm_u8.
Applying option ac (set number of audio channels) with argument 1.
Applying option ar (set audio sampling rate (in Hz)) with argument 8000.
Successfully parsed a group of options.
Opening an output file: Pwork.wav.
File 'Pwork.wav' already exists. Overwrite ? [y/N] y
[file @ 0x562bce2e8d80] Setting default whitelist 'file,crypto'
Successfully opened the file.
Stream mapping:
Press [q] to stop, ? for help
cur_dts is invalid st:0 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
detected 8 logical cores
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'time_base' to value '1/44100'
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'sample_rate' to value '44100'
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'sample_fmt' to value 's16'
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'channel_layout' to value '0x3'
[graph_0_in_0_0 @ 0x562bce2f5000] tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3
[format_out_0_0 @ 0x562bce2f5740] Setting 'sample_fmts' to value 'u8'
[format_out_0_0 @ 0x562bce2f5740] Setting 'sample_rates' to value '8000'
[format_out_0_0 @ 0x562bce2f5740] Setting 'channel_layouts' to value '0x4'
[format_out_0_0 @ 0x562bce2f5740] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 0x562bce2e8880] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 0x562bce2f86c0] [SWR @ 0x562bce2f8bc0] Using s16p internally between filters
[auto_resampler_0 @ 0x562bce2f86c0] [SWR @ 0x562bce2f8bc0] Matrix coefficients:
[auto_resampler_0 @ 0x562bce2f86c0] [SWR @ 0x562bce2f8bc0] FC: FL:0.500000 FR:0.500000
[auto_resampler_0 @ 0x562bce2f86c0] ch:2 chl:stereo fmt:s16 r:44100Hz -> ch:1 chl:mono fmt:u8 r:8000Hz
Output #0, wav, to 'Pwork.wav':
Stream #0:0, 0, 1/8000: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 8000 Hz, mono, u8, 64 kb/s
Metadata:
encoder : Lavc pcm_u8
[out_0_0 @ 0x562bce2f5f80] EOF on sink link out_0_0:default.
No more output streams to write to, finishing.
size= 2kB time=00:00:00.23 bitrate= 65.5kbits/s speed=57.3x
video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.400000%
Input file #0 (originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav):
Input stream #0:0 (audio): 11 packets read (41332 bytes); 11 frames decoded (10333 samples);
Total: 11 packets (41332 bytes) demuxed
Output file #0 (Pwork.wav):
Output stream #0:0 (audio): 12 frames encoded (1875 samples); 12 packets muxed (1875 bytes);
Total: 12 packets (1875 bytes) muxed
11 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x562bce2d7f80] Statistics: 4 seeks, 5 writeouts
[AVIOContext @ 0x562bce2dd5c0] Statistics: 118486 bytes read, 3 seeks
Attachments (4)
Change History (14)
by , 22 months ago
Attachment: | VEHHorn_Klaxon de vielle voiture 1 (ID 0254)_LS.wav added |
---|
comment:1 by , 22 months ago
mediainfo does not warn; here is its trace:
000 WAVE (12 bytes)
000 Header (12 bytes)
000 Name: RIFF
004 Size: 1912 (0x00000778)
008 Real Name: WAVE
00C --------------------------
00C --- Wave, accepted ---
00C --------------------------
00C --------------------------
00C --- Wave, accepted ---
00C --------------------------
000 Wave (0 bytes)
00C Stream format - Audio (24 bytes)
00C Header (8 bytes)
00C Name: fmt
010 Size: 16 (0x00000010)
014 FormatTag: 1 (0x0001)
016 Channels: 1 (0x0001)
018 SamplesPerSec: 8000 (0x00001F40)
01C AvgBytesPerSec: 8000 (0x00001F40)
020 BlockAlign: 1 (0x0001)
022 BitsPerSample: 8 (0x0008)
024 Raw datas (8 bytes)
024 Header (8 bytes)
024 Name: data
028 Size: 1875 (0x00000753)
02C Alignement: (1 bytes)
02D ...Continued - Unknown - 1 (0x1) (1875 bytes)
02D ...Continued (0 bytes)
02D Block (1875 bytes)
02D Data: (1875 bytes)
And here it is without -bitexact:
000 WAVE (12 bytes)
000 Header (12 bytes)
000 Name: RIFF
004 Size: 1946 (0x0000079A)
008 Real Name: WAVE
00C --------------------------
00C --- Wave, accepted ---
00C --------------------------
00C --------------------------
00C --- Wave, accepted ---
00C --------------------------
000 Wave (0 bytes)
00C Stream format - Audio (24 bytes)
00C Header (8 bytes)
00C Name: fmt
010 Size: 16 (0x00000010)
014 FormatTag: 1 (0x0001)
016 Channels: 1 (0x0001)
018 SamplesPerSec: 8000 (0x00001F40)
01C AvgBytesPerSec: 8000 (0x00001F40)
020 BlockAlign: 1 (0x0001)
022 BitsPerSample: 8 (0x0008)
024 Tags (34 bytes)
024 Header (12 bytes)
024 Name: LIST
028 Size: 26 (0x0000001A)
02C Real Name: INFO
030 Encoded_Application - Lavf60.4.100 (22 bytes)
030 Header (8 bytes)
030 Name: ISFT
034 Size: 13 (0x0000000D)
038 Value: Lavf60.4.100
045 Alignement: (1 bytes)
046 Raw datas (8 bytes)
046 Header (8 bytes)
046 Name: data
04A Size: 1875 (0x00000753)
04E Alignement: (1 bytes)
04F ...Continued - Unknown - 1 (0x1) (1875 bytes)
04F ...Continued (0 bytes)
04F Block (1875 bytes)
04F Data: (1875 bytes)
Audition 2023 agrees:
000 WAVE (12 bytes)
000 Header (12 bytes)
000 Name: RIFF
004 Size: 1912 (0x00000778)
008 Real Name: WAVE
00C --------------------------
00C --- Wave, accepted ---
00C --------------------------
00C --------------------------
00C --- Wave, accepted ---
00C --------------------------
000 Wave (0 bytes)
00C Stream format - Audio (24 bytes)
00C Header (8 bytes)
00C Name: fmt
010 Size: 16 (0x00000010)
014 FormatTag: 1 (0x0001)
016 Channels: 1 (0x0001)
018 SamplesPerSec: 8000 (0x00001F40)
01C AvgBytesPerSec: 8000 (0x00001F40)
020 BlockAlign: 1 (0x0001)
022 BitsPerSample: 8 (0x0008)
024 Raw datas (8 bytes)
024 Header (8 bytes)
024 Name: data
028 Size: 1875 (0x00000753)
02C Alignement: (1 bytes)
02D ...Continued - Unknown - 1 (0x1) (1875 bytes)
02D ...Continued (0 bytes)
02D Block (1875 bytes)
02D Data: (1875 bytes)
780 -------------------------
780 --- PCM, accepted ---
780 -------------------------
780 ------------------------
780 --- PCM, filling ---
780 ------------------------
780 -------------------------
780 --- Wave, filling ---
780 -------------------------
780 --------------------------
780 --- Wave, finished ---
780 --------------------------
comment:2 by , 22 months ago
Audacity also agrees: there are 1874 samples counting from 0, so 1875 samples.
The only quirk here is that VEHHorn_Klaxon de vielle voiture 1 (ID 0254)_LS.wav is slightly longer. But that is expected: one more sample would make it more than half a sample longer. In fact, with -af aresample=resampler=soxr the output is even one sample shorter. LOL
comment:3 by , 22 months ago
Thank you for your answer. Which arguments are you using with the mediainfo command? I didn't know this tool, and the --full option doesn't show the same output.
Several points:
- I see there is an "alignment byte":
Alignement: (1 bytes)
02D ...Continued - Unknown - 1 (0x1) (1875 bytes)
but I can't find it with hexdump. I'm surprised: the sequence I have is 53 07 00 00 7f 7f 7f 7f... But maybe that is normal; I confess I haven't read the RIFF specification completely. Just in case: do you know exactly where I can find the part of the spec describing the size and the alignment?
But if you consider that there is no issue, then maybe the real issue is on the "player side", that is, the robot's firmware.
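On where the alignment byte sits: as far as I understand the RIFF layout, a chunk whose ckSize is odd is followed by one pad byte after its data, so the byte would be at the very end of this file rather than right after the size field. That may be why it does not appear in the first 60 bytes shown by hexdump. The Python sketch below builds a file with the same layout in memory and walks its chunks; `build_wav` and `walk_chunks` are hypothetical helpers written for this illustration, not part of any tool mentioned here.

```python
import struct


def build_wav(payload: bytes) -> bytes:
    """Build a minimal PCM u8, mono, 8000 Hz WAV around the given sample bytes."""
    fmt = struct.pack("<HHIIHH", 1, 1, 8000, 8000, 1, 8)
    pad = b"\x00" if len(payload) % 2 else b""  # pad odd-sized data to an even length
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += b"data" + struct.pack("<I", len(payload)) + payload + pad
    return b"RIFF" + struct.pack("<I", len(body)) + body


def walk_chunks(blob: bytes):
    """Yield (chunk id, data offset, ckSize, pad offset or None) for each subchunk."""
    pos = 12  # skip 'RIFF', the RIFF size, and 'WAVE'
    while pos + 8 <= len(blob):
        ck_id, ck_size = struct.unpack_from("<4sI", blob, pos)
        data_start = pos + 8
        pad_at = data_start + ck_size if ck_size % 2 else None
        yield ck_id, data_start, ck_size, pad_at
        pos = data_start + ck_size + (ck_size % 2)


wav = build_wav(b"\x80" * 1875)
for chunk in walk_chunks(wav):
    print(chunk)
```

With a 1875-byte payload this produces a 1920-byte file whose data chunk starts at offset 44 and whose pad byte sits at offset 1919 (0x77F), the last byte of the file.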
I also added the -bitexact option to remove the extra info that I get without it; please see the ffprobe output:
Metadata:
encoder : Lavf58.29.100
This causes another issue with the robot, so I added -bitexact. If you have a better option, please let me know!
thanks A LOT
Pol
comment:4 by , 22 months ago
Hi, sorry to insist but there is something strange here. This is my mediainfo output:
Channel(s) : 1 channel
Sampling rate : 8000
Sampling rate : 8 000 Hz
Samples count : 1876
Bit depth : 8
It says 1876 samples, not 1875.
I've attached the file converted with the issue to this ticket.
Please, can you compare it with the file you converted on your side? It seems that ffmpeg doesn't produce the same result...
comment:5 by , 22 months ago
It says 1876 samples, not 1875.
Correct, it is 1876 samples; that is a bug in Audacity. Adobe Audition correctly says 1876 samples (or 1875 counting from 0). Strange, but it is what it is.
comment:6 by , 22 months ago
I'm not sure I understand the point: what is the Audacity bug you are talking about?
Regarding this file:
- mediainfo says 1876 samples
- according to you, Adobe Audition says 1876 samples (if you count from 0 to 1875, the result is still 1876).
=> both tools say the same thing.
So why is the data size encoded in the file by ffmpeg 0x0753 => 1875? Basically, if you do a memcpy(dest, src, size) or a for (i = 0; i < size; i++), the right size value is 1876, not 1875, or you will miss a sample.
I've attached to the ticket a new file which is correctly encoded by ffmpeg: aston.wav (source) and astonEncoded.wav (converted by ffmpeg). This file is read correctly by my robot firmware.
mediainfo output:
Channel(s) : 1 channel
Sampling rate : 8000
Sampling rate : 8 000 Hz
Samples count : 51972
output of converted file:
00000000 52 49 46 46 28 cb 00 00 57 41 56 45 66 6d 74 20 |RIFF(...WAVEfmt |
00000010 10 00 00 00 01 00 01 00 40 1f 00 00 40 1f 00 00 |........@...@...|
00000020 01 00 08 00 64 61 74 61 04 cb 00 00 83 83 83 83 |....data........|
00000030 81 81 81 81 80 7d 7e 7f 7e 80 80 81 81 80 7e 7e |.....}~.~.....~~|
00000040 7e 7e 80 81 81 80 81 81 7f 7d 7e 7e 7d 7e 7e 7e |~~.......}~~}~~~|
00000050
0xCB04 => 51972 samples, exactly the size computed by mediainfo.
To summarize:
So, when ffmpeg encodes the right size:
- all the tools agree on the same size value
- I can compute the same value with the formula "file size - 44 bytes" (in my case, without metadata)
- there is no issue on the Thymio robot
When ffmpeg is off by 1 byte in the size field: the Thymio robot can't play the sound
If I manually edit the size field to put in the same value as mediainfo -> the robot can play the file
Maybe it's an issue under certain conditions, maybe it is because of the 8000 Hz data rate, but I still strongly suspect that ffmpeg sometimes misses 1 byte in the data size field.
Pol
comment:7 by , 22 months ago
So why is the data size encoded in the file by ffmpeg 0x0753 => 1875?
Because they count from 0. Just like Adobe or Audacity.
comment:8 by , 22 months ago
It's pretty hard to discuss this; you don't answer all my points... So this is my last question.
Based on the principle that there is an odd number of samples (I am deliberately setting aside the mediainfo output), and regarding the official specifications:
- https://www.mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf
- https://learn.microsoft.com/en-us/windows/win32/directshow/avi-riff-file-reference#riff-file-format
I've extracted this very important sentence:
ckSize gives the size of the valid data in the chunk; it does not include the padding, the size of ckID, or the size of ckSize.
If we look again at Pwork.wav:
at address 0x28: 0x753 = 1875 samples
And at the end of the file:
00000770 80 81 81 80 80 80 80 80 80 80 80 80 80 80 80 00 |................|
The last byte is the padding byte, OK! So whatever the real number of samples, this is consistent with the specification, which is a good point.
However, in this case the ckSize at address 0x04 (0x0778) includes this padding byte, even though the specification seems to say it should not... It seems that at this point ffmpeg considers the last byte to be data like any other and counts it in the global size.
What is your view on this point? Is this behavior specific to ffmpeg, or do all programs behave the same way?
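One possible reading of the quoted sentence is that a subchunk's pad byte is excluded from that subchunk's own ckSize but still lies inside the data of the enclosing RIFF chunk, and is therefore counted in the outer size. Under that assumption (which nobody in this ticket has confirmed), the outer size can be computed from the subchunk sizes and compared with the two hexdumps shown here. The function name `riff_size` is illustrative only.

```python
def riff_size(subchunk_sizes):
    """Expected RIFF ckSize: 4 bytes for 'WAVE' plus, for each subchunk,
    its 8-byte header, its data, and one pad byte when the data length is odd."""
    return 4 + sum(8 + size + (size % 2) for size in subchunk_sizes)


# fmt chunk (16 bytes) and data chunk sizes taken from the hexdumps in this ticket.
print(hex(riff_size([16, 1875])))   # Pwork.wav
print(hex(riff_size([16, 51972])))  # astonEncoded.wav
```

Both results (0x778 and 0xcb28) match the values stored at offset 0x04 in the two files, so ffmpeg's output is at least consistent with this reading.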
Thanks in advance; a lot of kids are stuck with no sound in their robots at the moment :/
comment:9 by , 22 months ago
Owner: | set to |
---|---|
Status: | new → open |
Based on the principle that there is an odd number of samples
No, there are 1876 samples. You are correct.
It seems that at this point ffmpeg considers the last byte to be data like any other and counts it in the global size.
Possible. We need some people who know the spec. mkver?
comment:10 by , 22 months ago
It appears that if you change wav to -f flac or -f wavpack, it says there are two streams. It appears that this wav is two streams joined together. That is why the last sample in Audacity is so strange, yet it still allows a right click and a cut in that last sample.
input file to reproduce the issue