Opened 12 months ago

Last modified 12 months ago

#10229 open defect

Wrong data size for WAV (RIFF) format

Reported by: polochon
Owned by: mkver
Priority: normal
Component: undetermined
Version: unspecified
Keywords: pcm_u8 RIFF WAV
Cc:
Blocked By:
Blocking:
Reproduced by developer: no
Analyzed by developer: no

Description

Hello,

I think there is sometimes an issue when converting WAV files to another WAV file with a specific format.

A problematic input file is freely available from this free sound library: https://lasonotheque.org/detail-0254-klaxon-de-vielle-voiture-1.html (click the red "télécharger" (download) button). The issue does not occur every time, but it happened quite often with the different files I tried.

What I am doing: I need to convert sounds for the Thymio robot project, which accepts only:

  • 8000Hz sampling
  • mono channel
  • unsigned 8-bit PCM (u8)
  • absolutely no metadata in the file.
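As a rough aid, the requirements above can be checked in code before copying a file to the robot. The sketch below is an illustration only: the Thymio firmware's actual validation logic is not known here, and `chunk_ids` and `meets_thymio_requirements` are hypothetical helper names. It reads the format fields with Python's standard `wave` module and treats "no metadata" as "the only chunks are `fmt ` and `data`", which is an assumption about what the robot accepts.

```python
import io
import struct
import wave


def chunk_ids(wav_bytes: bytes) -> list[str]:
    """List the top-level chunk IDs inside a RIFF/WAVE file."""
    if wav_bytes[0:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    ids, pos = [], 12
    while pos + 8 <= len(wav_bytes):
        ck_id, ck_size = struct.unpack_from("<4sI", wav_bytes, pos)
        ids.append(ck_id.decode("ascii", "replace"))
        # Skip the 8-byte header, the payload, and the pad byte after an odd payload.
        pos += 8 + ck_size + (ck_size & 1)
    return ids


def meets_thymio_requirements(wav_bytes: bytes) -> bool:
    """Hypothetical check: 8000 Hz, mono, unsigned 8-bit PCM, no metadata chunks."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        fmt_ok = (
            w.getframerate() == 8000
            and w.getnchannels() == 1
            and w.getsampwidth() == 1  # 8-bit WAV PCM is unsigned by definition
            and w.getcomptype() == "NONE"
        )
    # Assumption: "no metadata" means no chunk besides 'fmt ' and 'data'.
    return fmt_ok and chunk_ids(wav_bytes) == ["fmt ", "data"]
```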

The command line used:
ffmpeg -i originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav -f wav -bitexact -map_metadata -1 -c:a pcm_u8 -ac 1 -ar 8000 Pwork.wav

Observed behavior: the Thymio robot does not play the sound.

Hypothesis:
There is an issue in the data size field at address 0x28, right after the "64 61 74 61" ("data") chunk ID.
From reading some documentation on RIFF files, I understand that the 4 bytes after it are calculated as: size of the file - size of the header, which is 44 bytes in my case.
For my file:
hexdump -C -n 60 Pwork.wav
00000000 52 49 46 46 78 07 00 00 57 41 56 45 66 6d 74 20 |RIFFx...WAVEfmt |
00000010 10 00 00 00 01 00 01 00 40 1f 00 00 40 1f 00 00 |........@...@...|
00000020 01 00 08 00 64 61 74 61 53 07 00 00 7f 7f 7f 7f |....dataS.......|
00000030 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f 7f |............|
0000003c

In this case, the file is 1920 bytes, so the data size should be 1920 - 44 = 1876 bytes.
The size is 0x753, which is 1875 bytes, so there is a 1-byte error.
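To make the hexdump above easier to read, here is a sketch that decodes it with Python's `struct` module. It assumes the canonical 44-byte layout (RIFF header, a 16-byte `fmt ` chunk, then the `data` chunk header), which only holds for a file with no metadata chunks; the field names are descriptive labels chosen for this example.

```python
import struct

# First 44 bytes of Pwork.wav, copied from the hexdump above.
HEADER = bytes.fromhex(
    "52494646 78070000 57415645 666d7420"
    "10000000 01000100 401f0000 401f0000"
    "01000800 64617461 53070000"
)

FIELDS = ("riff_id", "riff_size", "wave_id", "fmt_id", "fmt_size",
          "format_tag", "channels", "sample_rate", "byte_rate",
          "block_align", "bits_per_sample", "data_id", "data_size")


def parse_canonical_header(header: bytes) -> dict:
    """Decode a 44-byte little-endian RIFF/WAVE header with no extra chunks."""
    values = struct.unpack("<4sI4s4sIHHIIHH4sI", header[:44])
    return dict(zip(FIELDS, values))


hdr = parse_canonical_header(HEADER)
print(hex(hdr["riff_size"]), hdr["riff_size"])   # 0x778 1912
print(hex(hdr["data_size"]), hdr["data_size"])   # 0x753 1875
```

Note that 8 + riff_size = 1920 matches the file size, while 44 + data_size = 1919, so one trailing byte lies outside the size declared for the data chunk.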

In this case, I'm not able to play the sound on the Thymio robot. But if I put in the right size (editing the file manually), it works well. I suppose the robot firmware does some validation before playing a sound.

Thanks
Polochon

Additional information, with the complete log:
ffmpeg -v 9 -loglevel 99 -i originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav -f wav -bitexact -map_metadata -1 -c:a pcm_u8 -ac 1 -ar 8000 Pwork.wav
ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers

built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100

Splitting the commandline.
Reading option '-v' ... matched as option 'v' (set logging level) with argument '9'.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument '99'.
Reading option '-i' ... matched as input url with argument 'originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav'.
Reading option '-f' ... matched as option 'f' (force format) with argument 'wav'.
Reading option '-bitexact' ... matched as option 'bitexact' (bitexact mode) with argument '1'.
Reading option '-map_metadata' ... matched as option 'map_metadata' (set metadata information of outfile from infile) with argument '-1'.
Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'pcm_u8'.
Reading option '-ac' ... matched as option 'ac' (set number of audio channels) with argument '1'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '8000'.
Reading option 'Pwork.wav' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option v (set logging level) with argument 9.
Successfully parsed a group of options.
Parsing a group of options: input url originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav.
Successfully parsed a group of options.
Opening an input file: originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav.
[NULL @ 0x562bce2d47c0] Opening 'originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav' for reading
[file @ 0x562bce2d5240] Setting default whitelist 'file,crypto'
Probing wav score:99 size:2048
[wav @ 0x562bce2d47c0] Format wav probed with size=2048 and score=99
[wav @ 0x562bce2d47c0] Before avformat_find_stream_info() pos: 44 bytes read:76136 seeks:3 nb_streams:1
[wav @ 0x562bce2d47c0] probing stream 0 pp:32
Probing mp3 score:1 size:4096
[wav @ 0x562bce2d47c0] Probe with size=4096, packets=2469 detected mp3 with score=1
[wav @ 0x562bce2d47c0] probing stream 0 pp:31
[wav @ 0x562bce2d47c0] probing stream 0 pp:30
[wav @ 0x562bce2d47c0] probing stream 0 pp:29
[wav @ 0x562bce2d47c0] probing stream 0 pp:28
[wav @ 0x562bce2d47c0] probing stream 0 pp:27
[wav @ 0x562bce2d47c0] probing stream 0 pp:26
[wav @ 0x562bce2d47c0] probing stream 0 pp:25
[wav @ 0x562bce2d47c0] probing stream 0 pp:24
[wav @ 0x562bce2d47c0] probing stream 0 pp:23
[wav @ 0x562bce2d47c0] probing stream 0 pp:22
[wav @ 0x562bce2d47c0] probing stream 0 pp:21
[wav @ 0x562bce2d47c0] probed stream 0
[wav @ 0x562bce2d47c0] parser not found for codec pcm_s16le, packets or times may be invalid.
[wav @ 0x562bce2d47c0] stream 0: start_time: -209146758205323.719 duration: 0.234
[wav @ 0x562bce2d47c0] format: start_time: -9223372036854.775 duration: 0.234 bitrate=1447 kb/s
[wav @ 0x562bce2d47c0] After avformat_find_stream_info() pos: 42394 bytes read:118486 seeks:3 frames:11
Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav':

Metadata:

comment : Klaxon de vielle voiture 1
encoded_by : LaSonotheque.org
originator_reference: 0254
date : 2007-03-30
time_reference : 0
coding_history : A=PCM,F=44100,W=16,M=stereo
copyright : CC0 / WTFPL / Domaine public
IKEY : voiture,klaxon,alerte,signal,sonore,vehicule,auto,automobile,pouet,car,horn,vehicle,klaxonner

Duration: 00:00:00.23, bitrate: 1447 kb/s

Stream #0:0, 11, 1/44100: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s

Successfully opened the file.
Parsing a group of options: output url Pwork.wav.
Applying option f (force format) with argument wav.
Applying option bitexact (bitexact mode) with argument 1.
Applying option map_metadata (set metadata information of outfile from infile) with argument -1.
Applying option c:a (codec name) with argument pcm_u8.
Applying option ac (set number of audio channels) with argument 1.
Applying option ar (set audio sampling rate (in Hz)) with argument 8000.
Successfully parsed a group of options.
Opening an output file: Pwork.wav.
File 'Pwork.wav' already exists. Overwrite ? [y/N] y
[file @ 0x562bce2e8d80] Setting default whitelist 'file,crypto'
Successfully opened the file.
Stream mapping:

Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_u8 (native))

Press [q] to stop, ? for help
cur_dts is invalid st:0 (0) [init:0 i_done:0 finish:0] (this is harmless if it occurs once at the start per stream)
detected 8 logical cores
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'time_base' to value '1/44100'
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'sample_rate' to value '44100'
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'sample_fmt' to value 's16'
[graph_0_in_0_0 @ 0x562bce2f5000] Setting 'channel_layout' to value '0x3'
[graph_0_in_0_0 @ 0x562bce2f5000] tb:1/44100 samplefmt:s16 samplerate:44100 chlayout:0x3
[format_out_0_0 @ 0x562bce2f5740] Setting 'sample_fmts' to value 'u8'
[format_out_0_0 @ 0x562bce2f5740] Setting 'sample_rates' to value '8000'
[format_out_0_0 @ 0x562bce2f5740] Setting 'channel_layouts' to value '0x4'
[format_out_0_0 @ 0x562bce2f5740] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_0'
[AVFilterGraph @ 0x562bce2e8880] query_formats: 4 queried, 6 merged, 3 already done, 0 delayed
[auto_resampler_0 @ 0x562bce2f86c0] [SWR @ 0x562bce2f8bc0] Using s16p internally between filters
[auto_resampler_0 @ 0x562bce2f86c0] [SWR @ 0x562bce2f8bc0] Matrix coefficients:
[auto_resampler_0 @ 0x562bce2f86c0] [SWR @ 0x562bce2f8bc0] FC: FL:0.500000 FR:0.500000
[auto_resampler_0 @ 0x562bce2f86c0] ch:2 chl:stereo fmt:s16 r:44100Hz -> ch:1 chl:mono fmt:u8 r:8000Hz
Output #0, wav, to 'Pwork.wav':

Stream #0:0, 0, 1/8000: Audio: pcm_u8 ([1][0][0][0] / 0x0001), 8000 Hz, mono, u8, 64 kb/s
Metadata:

encoder : Lavc pcm_u8

[out_0_0 @ 0x562bce2f5f80] EOF on sink link out_0_0:default.
No more output streams to write to, finishing.
size= 2kB time=00:00:00.23 bitrate= 65.5kbits/s speed=57.3x
video:0kB audio:2kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.400000%
Input file #0 (originalWav/VEHHorn_Klaxon_de_vielle_voiture_1.wav):

Input stream #0:0 (audio): 11 packets read (41332 bytes); 11 frames decoded (10333 samples);
Total: 11 packets (41332 bytes) demuxed

Output file #0 (Pwork.wav):

Output stream #0:0 (audio): 12 frames encoded (1875 samples); 12 packets muxed (1875 bytes);
Total: 12 packets (1875 bytes) muxed

11 frames successfully decoded, 0 decoding errors
[AVIOContext @ 0x562bce2d7f80] Statistics: 4 seeks, 5 writeouts
[AVIOContext @ 0x562bce2dd5c0] Statistics: 118486 bytes read, 3 seeks
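The byte counts in the log can be cross-checked by hand. This sketch assumes 4 bytes per input frame (16-bit stereo, as the stream line says), a 44-byte output header, and one pad byte after the odd-sized data payload; it also assumes ffmpeg computes "muxing overhead" as (file size - payload) / payload. Under those assumptions it reproduces the 10333 decoded samples, the 1920-byte file size reported in the ticket, and the logged 2.400000% overhead.

```python
# Input side: "41332 bytes" of pcm_s16le stereo (2 bytes x 2 channels per frame).
input_bytes = 41332
input_frames = input_bytes // (2 * 2)          # matches "10333 samples" in the log

# Output side: "1875 bytes" of pcm_u8 mono (1 byte per sample).
payload = 1875
header = 44                                     # RIFF + 16-byte fmt chunk + data header
pad = payload & 1                               # assumed RIFF pad byte for odd sizes
file_size = header + payload + pad

# Assumed definition of the overhead figure printed by ffmpeg.
overhead_pct = 100.0 * (file_size - payload) / payload
print(input_frames, file_size, f"{overhead_pct:.6f}%")   # 10333 1920 2.400000%
```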

Attachments (4)

VEHHorn_Klaxon de vielle voiture 1 (ID 0254)_LS.wav (41.4 KB ) - added by polochon 12 months ago.
input file to reproduce the issue
Pwork.wav (1.9 KB ) - added by polochon 12 months ago.
file-with-issue.wav
aston.wav (610.2 KB ) - added by polochon 12 months ago.
new aston martin source file
astonEncoded.wav (50.8 KB ) - added by polochon 12 months ago.
aston file encoded no issue


Change History (14)

by polochon, 12 months ago

input file to reproduce the issue

comment:1 by Balling, 12 months ago

mediainfo does not warn; here is its trace:

000 WAVE (12 bytes)
000  Header (12 bytes)
000   Name:                                 RIFF
004   Size:                                 1912 (0x00000778)
008   Real Name:                            WAVE
00C  --------------------------
00C  ---   Wave, accepted   ---
00C  --------------------------
00C --------------------------
00C ---   Wave, accepted   ---
00C --------------------------
000 Wave (0 bytes)
00C  Stream format - Audio (24 bytes)
00C   Header (8 bytes)
00C    Name:                                fmt 
010    Size:                                16 (0x00000010)
014   FormatTag:                            1 (0x0001)
016   Channels:                             1 (0x0001)
018   SamplesPerSec:                        8000 (0x00001F40)
01C   AvgBytesPerSec:                       8000 (0x00001F40)
020   BlockAlign:                           1 (0x0001)
022   BitsPerSample:                        8 (0x0008)
024  Raw datas (8 bytes)
024   Header (8 bytes)
024    Name:                                data
028    Size:                                1875 (0x00000753)
02C   Alignement:                           (1 bytes)
02D  ...Continued - Unknown - 1 (0x1) (1875 bytes)
02D   ...Continued (0 bytes)
02D   Block (1875 bytes)
02D    Data:                                (1875 bytes)

And here it is without -bitexact:

000 WAVE (12 bytes)
000  Header (12 bytes)
000   Name:                                 RIFF
004   Size:                                 1946 (0x0000079A)
008   Real Name:                            WAVE
00C  --------------------------
00C  ---   Wave, accepted   ---
00C  --------------------------
00C --------------------------
00C ---   Wave, accepted   ---
00C --------------------------
000 Wave (0 bytes)
00C  Stream format - Audio (24 bytes)
00C   Header (8 bytes)
00C    Name:                                fmt 
010    Size:                                16 (0x00000010)
014   FormatTag:                            1 (0x0001)
016   Channels:                             1 (0x0001)
018   SamplesPerSec:                        8000 (0x00001F40)
01C   AvgBytesPerSec:                       8000 (0x00001F40)
020   BlockAlign:                           1 (0x0001)
022   BitsPerSample:                        8 (0x0008)
024  Tags (34 bytes)
024   Header (12 bytes)
024    Name:                                LIST
028    Size:                                26 (0x0000001A)
02C    Real Name:                           INFO
030   Encoded_Application - Lavf60.4.100 (22 bytes)
030    Header (8 bytes)
030     Name:                               ISFT
034     Size:                               13 (0x0000000D)
038    Value:                               Lavf60.4.100
045    Alignement:                          (1 bytes)
046  Raw datas (8 bytes)
046   Header (8 bytes)
046    Name:                                data
04A    Size:                                1875 (0x00000753)
04E   Alignement:                           (1 bytes)
04F  ...Continued - Unknown - 1 (0x1) (1875 bytes)
04F   ...Continued (0 bytes)
04F   Block (1875 bytes)
04F    Data:                                (1875 bytes)
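The difference between the two traces can be recomputed by hand. Without -bitexact, ffmpeg writes a `LIST`/`INFO` chunk containing an `ISFT` (software) entry. The sketch below assumes the string is stored with a NUL terminator and that an odd-sized chunk is followed by one pad byte which its parent's size counts; with those assumptions the numbers match the trace (ISFT size 13, LIST size 26, 34 bytes in total, RIFF size 1912 + 34 = 1946, data header moved from 0x24 to 0x46). `padded` is a helper name chosen for this example.

```python
def padded(size: int) -> int:
    """Bytes a chunk payload occupies on disk: odd sizes get one pad byte."""
    return size + (size & 1)


isft_size = len(b"Lavf60.4.100") + 1            # string plus NUL terminator
list_size = 4 + 8 + padded(isft_size)           # "INFO" + ISFT header + padded payload
list_chunk_total = 8 + list_size                # LIST ID and size field + payload

riff_bitexact = 1912                            # RIFF size from the -bitexact trace
print(isft_size, list_size, list_chunk_total, riff_bitexact + list_chunk_total)
# 13 26 34 1946
```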

Audition 2023 agrees:

000 WAVE (12 bytes)
000  Header (12 bytes)
000   Name:                                 RIFF
004   Size:                                 1912 (0x00000778)
008   Real Name:                            WAVE
00C  --------------------------
00C  ---   Wave, accepted   ---
00C  --------------------------
00C --------------------------
00C ---   Wave, accepted   ---
00C --------------------------
000 Wave (0 bytes)
00C  Stream format - Audio (24 bytes)
00C   Header (8 bytes)
00C    Name:                                fmt 
010    Size:                                16 (0x00000010)
014   FormatTag:                            1 (0x0001)
016   Channels:                             1 (0x0001)
018   SamplesPerSec:                        8000 (0x00001F40)
01C   AvgBytesPerSec:                       8000 (0x00001F40)
020   BlockAlign:                           1 (0x0001)
022   BitsPerSample:                        8 (0x0008)
024  Raw datas (8 bytes)
024   Header (8 bytes)
024    Name:                                data
028    Size:                                1875 (0x00000753)
02C   Alignement:                           (1 bytes)
02D  ...Continued - Unknown - 1 (0x1) (1875 bytes)
02D   ...Continued (0 bytes)
02D   Block (1875 bytes)
02D    Data:                                (1875 bytes)
780   -------------------------
780   ---   PCM, accepted   ---
780   -------------------------
780   ------------------------
780   ---   PCM, filling   ---
780   ------------------------
780 -------------------------
780 ---   Wave, filling   ---
780 -------------------------
780 --------------------------
780 ---   Wave, finished   ---
780 --------------------------

Last edited 12 months ago by Balling

comment:2 by Balling, 12 months ago

Audacity also agrees: there are 1874 samples counting from 0, so 1875 samples.

The only quirk here is that VEHHorn_Klaxon de vielle voiture 1 (ID 0254)_LS.wav is slightly longer. That is expected: the next sample would make the output more than half a sample longer. In fact, with -af aresample=resampler=soxr the output is even one sample shorter. LOL
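The expected output length follows from the ratio of sample rates. Using the input frame count from the log (10333 frames at 44100 Hz), the ideal length at 8000 Hz is not an integer, which is why resamplers can disagree by one sample. This sketch only shows the arithmetic; how swresample or soxr actually decide how many samples to emit (rounding, filter delay, flushing) is not modeled here.

```python
import math
from fractions import Fraction

in_frames, in_rate, out_rate = 10333, 44100, 8000   # from the ffmpeg log

ideal = Fraction(in_frames * out_rate, in_rate)      # exact ideal output length
shorter, longer = math.floor(ideal), math.ceil(ideal)

print(f"ideal length: {float(ideal):.2f} samples")            # about 1874.47
print(f"{shorter} samples undershoots by {float(ideal - shorter):.2f}")
print(f"{longer} samples overshoots by {float(longer - ideal):.2f}")
```

Rounding up overshoots the ideal length by slightly more than half a sample, which is the situation the comment above describes.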

comment:3 by polochon, 12 months ago

Thank you for your answer. Which arguments are you using with the mediainfo command? I didn't know this tool, and the --full option doesn't show the same output.

Several points:

  • I see there is an "alignment byte":

Alignement: (1 bytes)
02D ...Continued - Unknown - 1 (0x1) (1875 bytes)
but I can't find it with hexdump. I'm surprised: the sequence I have is 53 07 00 00 7f 7f 7f 7f..., but maybe that is normal; I confess I haven't read the whole RIFF specification. Just in case: do you know exactly where I can find the part of the spec describing the size and the alignment?
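By the RIFF even-alignment rule, the pad byte comes after the last sample of an odd-sized chunk, not right after the chunk header, so it would not show up at the start of the hexdump. A minimal sketch of the offset arithmetic, assuming the chunk positions of Pwork.wav shown in the trace (`pad_byte_offset` is a hypothetical helper):

```python
def pad_byte_offset(chunk_offset: int, ck_size: int):
    """Return the file offset of a chunk's pad byte, or None if ckSize is even.

    chunk_offset is where the 4-byte chunk ID starts; the payload begins 8 bytes later.
    """
    if ck_size % 2 == 0:
        return None
    return chunk_offset + 8 + ck_size


# In Pwork.wav the data chunk ID starts at 0x24 and ckSize is 0x753 (1875).
offset = pad_byte_offset(0x24, 0x753)
print(hex(offset))   # 0x77f, i.e. the last byte of the 1920-byte file
```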

But if you are sure there is no issue, then maybe the real issue is on the "player side", which is the robot's firmware.

I also added the -bitexact option to remove extra information that I get without it; see the ffprobe output:
Metadata:

encoder : Lavf58.29.100

This causes another issue with the robot, so I added -bitexact. If you know a better option, please let me know!

thanks A LOT
Pol

by polochon, 12 months ago

Attachment: Pwork.wav added

file-with-issue.wav

comment:4 by polochon, 12 months ago

Hi, sorry to insist but there is something strange here. This is my mediainfo output:

Channel(s)                               : 1 channel
Sampling rate                            : 8000
Sampling rate                            : 8 000 Hz
Samples count                            : 1876
Bit depth                                : 8

It says 1876 samples, not 1875.
I've attached the converted file that has the issue to this ticket.
Could you please compare it with the file you converted on your side? It seems that ffmpeg doesn't produce the same result...

Last edited 12 months ago by polochon

comment:5 by Balling, 12 months ago

It says 1876 samples, not 1875.

Correct, it is 1876 samples; that is a bug in Audacity. Adobe Audition correctly says 1876 samples (or 1875 counting from 0). Strange, but it is what it is.

Last edited 12 months ago by Balling

by polochon, 12 months ago

Attachment: aston.wav added

new aston martin source file

by polochon, 12 months ago

Attachment: astonEncoded.wav added

aston file encoded no issue

comment:6 by polochon, 12 months ago

I'm not sure I understand your point: which Audacity bug are you talking about?

Regarding this file:

  • mediainfo says 1876 samples
  • according to you, Adobe Audition says 1876 samples (if you count from 0 to 1875, the result is still 1876).

=> both tools say the same thing.

So why is the data size that ffmpeg writes to the file 0x0753 = 1875? Basically, if you do a memcpy(dest, src, size) or a for (i = 0; i < size; i++), the right size value is 1876, not 1875, or you will miss a sample.
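For comparison, here is a sketch of a reader that relies on the chunk's own size field instead of the file size: it copies exactly ckSize bytes from the `data` chunk and skips the pad byte after any odd-sized chunk it passes over. It is written in Python rather than the firmware's language, `read_data_chunk` is a hypothetical helper, and the 3-sample file is synthetic test data.

```python
import struct


def read_data_chunk(wav: bytes) -> bytes:
    """Return the sample bytes of the 'data' chunk, excluding any pad byte."""
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(wav):
        ck_id, ck_size = struct.unpack_from("<4sI", wav, pos)
        payload_start = pos + 8
        if ck_id == b"data":
            # Copy exactly ckSize bytes: this is the memcpy(dest, src, size) length.
            return wav[payload_start:payload_start + ck_size]
        # Skip other chunks, including the pad byte after an odd-sized payload.
        pos = payload_start + ck_size + (ck_size & 1)
    raise ValueError("no data chunk found")


# Synthetic 3-sample file: odd data size, so one pad byte follows the samples.
samples = bytes([0x7F, 0x80, 0x81])
fmt = struct.pack("<HHIIHH", 1, 1, 8000, 8000, 1, 8)   # PCM, mono, 8000 Hz, 8-bit
body = (b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
        + b"data" + struct.pack("<I", len(samples)) + samples + b"\x00")
wav = b"RIFF" + struct.pack("<I", len(body)) + body

print(len(read_data_chunk(wav)), len(wav) - 44)   # 3 4
```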

I've attached to the ticket a new file that ffmpeg encodes correctly: aston.wav (source) and astonEncoded.wav (converted by ffmpeg). My robot firmware reads this file correctly.

mediainfo output:

Channel(s)                               : 1 channel
Sampling rate                            : 8000
Sampling rate                            : 8 000 Hz
Samples count                            : 51972

Hexdump of the converted file:

00000000  52 49 46 46 28 cb 00 00  57 41 56 45 66 6d 74 20  |RIFF(...WAVEfmt |
00000010  10 00 00 00 01 00 01 00  40 1f 00 00 40 1f 00 00  |........@...@...|
00000020  01 00 08 00 64 61 74 61  04 cb 00 00 83 83 83 83  |....data........|
00000030  81 81 81 81 80 7d 7e 7f  7e 80 80 81 81 80 7e 7e  |.....}~.~.....~~|
00000040  7e 7e 80 81 81 80 81 81  7f 7d 7e 7e 7d 7e 7e 7e  |~~.......}~~}~~~|
00000050

0xCB04 => 51972 samples, exactly the size computed by mediainfo.

To summarize:

When ffmpeg encodes the right size:

  • all the tools agree on the same size value
  • I can compute the same value with the formula "file size - 44 bytes" (in my case, without metadata)
  • there is no issue on the Thymio robot

When ffmpeg misses 1 byte in the size field, the Thymio robot can't play the sound.
If I manually edit the size field to the same value as mediainfo, the robot can play the file.

Maybe it only happens under certain conditions, maybe it is because of the 8000 Hz sample rate, but I still strongly suspect that ffmpeg sometimes misses 1 byte in the data size field.

Pol

comment:7 by Balling, 12 months ago

So why the data size encoded to the file by ffmpeg is 0x0753 => 1875?

Because they count from 0. Just like Adobe or Audacity.

Last edited 12 months ago by Balling

comment:8 by polochon, 12 months ago

It's pretty hard to discuss this when you don't answer all my points... So this is my last question.

Assuming that there is an odd number of samples (I'm deliberately setting the mediainfo output aside), and looking at the official specification:

I've extracted the key sentence:

ckSize gives the size of the valid data in the chunk; it does not include the padding, the size of ckID, or the size of ckSize.

If we look again at Pwork.wav:
at address 0x28: 0x753 = 1875 samples
And the end of the file:

00000770  80 81 81 80 80 80 80 80  80 80 80 80 80 80 80 00  |................|

The last byte is the padding byte. OK! So whatever the real number of samples, this is consistent with the specification, which is a good point.

However, in this case the ckSize at address 0x04 (0x0778) includes this padding byte, even though the specification seems to assume it shouldn't... It seems that ffmpeg, at this point, treats the last byte as data like any other and counts it in the global size.
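The size arithmetic behind this question can be written out. The sketch below assumes the layout used in this ticket (RIFF header, 16-byte `fmt ` chunk, `data` chunk, nothing else) and the convention visible in the mediainfo traces: the pad byte after an odd-sized chunk is excluded from that chunk's ckSize but counted in the enclosing RIFF size. Under that assumption it reproduces the header values of both attached files, and shows that "file size - 44" equals the data ckSize only when the data size is even. `wav_sizes` is a hypothetical helper name.

```python
def wav_sizes(data_size: int, fmt_size: int = 16) -> dict:
    """Header sizes for a RIFF/WAVE file holding only 'fmt ' and 'data' chunks.

    Assumes fmt_size is even (it is 16 here), so only the data chunk can need a pad.
    """
    pad = data_size & 1                                       # pad byte if odd
    riff_size = 4 + (8 + fmt_size) + (8 + data_size + pad)    # "WAVE" + both chunks
    file_size = 8 + riff_size                                 # + "RIFF" ID and size field
    return {"data_ckSize": data_size, "pad": pad,
            "riff_ckSize": riff_size, "file_size": file_size,
            "file_size_minus_44": file_size - 44}


for name, data_size in (("Pwork.wav", 0x753), ("astonEncoded.wav", 0xCB04)):
    s = wav_sizes(data_size)
    print(name, hex(s["riff_ckSize"]), s["file_size"], s["file_size_minus_44"])
# Pwork.wav 0x778 1920 1876
# astonEncoded.wav 0xcb28 52016 51972
```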

What is your point of view on this? Is this behavior specific to ffmpeg, or do all programs do the same?
Thanks in advance; a lot of kids are stuck with no sound in their robots right now :/

comment:9 by Balling, 12 months ago

Owner: set to mkver
Status: new → open

Based on the principe that there is an odd samples number

No, there are 1876 samples. You are correct.

It seems ffmpeg at this point consider that the last byte is a data like another and count it in the global size.

Possible. We need some people who know the spec. mkver?

Last edited 12 months ago by Balling

comment:10 by Balling, 12 months ago

It appears that if you change -f wav to -f flac or -f wavpack, it reports two streams. So this WAV appears to be two streams joined together. That is why the last sample in Audacity looks so strange, yet still lets you right-click and cut at that last sample.
