[BUG] Passthrough audio changes sample rate on multiple audio streams
#1
Hello,

I have some PAL SD videos captured from a digital camera using Windows Media Player over FireWire. They are Type-2 AVIs (the audio stream is stored twice: once interleaved in the video, and once separately, where the interleaved stream is disabled). Here's the relevant ffprobe output:

  Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
  Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s

For some reason, the interleaved stream is at 48000 Hz (wrong!), but the real stream is correct at 32000 Hz. This shouldn't be a problem because the wrong stream is disabled anyway. When I encode with Hybrid using "passthrough all", I expect the streams to be preserved. However, it seems like instead, the second one gets resampled to 48000 Hz:

  Stream #0:0: Video: utvideo (ULY0 / 0x30594C55), yuv420p(tv, bt470bg/bt470bg/bt709), 720x576, SAR 16:15 DAR 4:3, 50 fps, 50 tbr, 1k tbn
    Metadata:
      ENCODER         : Lavc59.25.100 utvideo
      BPS             : 117941433
      DURATION        : 00:00:52.040000000
      NUMBER_OF_FRAMES: 2602
      NUMBER_OF_BYTES : 767209028
      _STATISTICS_WRITING_APP: mkvmerge v67.0.0 ('Under Stars') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-07-09 14:27:00
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
    Metadata:
      BPS             : 1536016
      DURATION        : 00:00:34.757000000
      NUMBER_OF_FRAMES: 869
      NUMBER_OF_BYTES : 6673416
      _STATISTICS_WRITING_APP: mkvmerge v67.0.0 ('Under Stars') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-07-09 14:27:00
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
  Stream #0:2: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
    Metadata:
      BPS             : 1535994
      DURATION        : 00:00:51.840000000
      NUMBER_OF_FRAMES: 1296
      NUMBER_OF_BYTES : 9953244
      _STATISTICS_WRITING_APP: mkvmerge v67.0.0 ('Under Stars') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-07-09 14:27:00
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES

As you would expect, this resampling completely messes up the audio on playback.
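
The byte counts in the mkvmerge statistics above can be turned into durations as a rough sanity check. This is only a sketch: the helper name `pcm_duration_seconds` is made up for illustration, and it assumes plain 16-bit stereo PCM with no container overhead.

```python
def pcm_duration_seconds(num_bytes, sample_rate, channels=2, bytes_per_sample=2):
    """Duration of raw PCM data, assuming 16-bit stereo unless told otherwise."""
    return num_bytes / (sample_rate * channels * bytes_per_sample)

# NUMBER_OF_BYTES values copied from the output file's statistics above.
first_stream_bytes = 6673416
second_stream_bytes = 9953244

# First stream at the rate it is tagged with in the output (48 kHz).
print(round(pcm_duration_seconds(first_stream_bytes, 48000), 3))
# First stream if its samples were really recorded at 32 kHz.
print(round(pcm_duration_seconds(first_stream_bytes, 32000), 3))
# Second stream at the rate it is tagged with in the output (48 kHz).
print(round(pcm_duration_seconds(second_stream_bytes, 48000), 3))
```

At 48 kHz the first stream comes out at about 34.76 s, which matches its reported DURATION but is far shorter than the 52.04 s video; at 32 kHz the same bytes would last about 52.14 s. The second stream lasts about 51.84 s at 48 kHz, which is consistent with it having been resampled from 32 kHz rather than merely relabeled. This suggests, but does not prove, that the first stream holds 32 kHz samples tagged as 48 kHz.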

Out of curiosity, why does Hybrid extract the audio streams and then merge them back in? This sounds like an error-prone process.

Actually, even if you fix the "passthrough" issue so that both streams keep their sample rate, I think that would still break my case, because when encoding the final video, Hybrid won't set the second stream as default or the first one as disabled. Would it be possible to add an audio option "auto add (last)", the same way there is "auto add (first)"? This would be a nice trick for all Type-2 AVIs. I was planning to work around this issue by using the mkvmerge custom option "--audio-tracks 1" to select the second stream, but that fails since Hybrid already uses "--no-audio" due to the extraction mechanism above.
I can't make these selections manually because I am running an automated pipeline with zero user interaction.

My system:
- Windows 11
- Hybrid 2022.07.01.1

Thank you!
Reply
#2
Quote:Out of curiosity, why does Hybrid extract the audio streams and then merge them back in? This sounds like an error-prone process.
Hybrid processes audio and video separately because that offers far more variety in how each of them can be handled.
(This normally works fine, but I haven't had to work with 32 kHz audio for 10+ years. ;))

Quote:Would it be possible to just add audio "auto add (last)" the same way there is "auto add (first)"?
Possible in theory, sure. But sorry, this won't come soon. (I haven't changed anything in the general auto audio routines for ages and would need some time to re-read how everything is intertwined.)
(Side note: Hybrid does allow filtering by language if the audio is tagged accordingly, which your streams do not seem to be.)

If you can share a small sample of such a source I can try to reproduce and fix the sample rate issue.

Quote:They are Type-2 AVIs (the audio stream is stored twice - once interleaved in the video, and once separately, where the interleaved stream is disabled).
Out of curiosity: Who does that and why?

Cu Selur
Reply
#3
Quote:Out of curiosity: Who does that and why?

Type-1 and Type-2 AVIs date back to the times when computers were severely underpowered. Here is a very good explanation by Microsoft themselves:

Quote:DV cameras produce interleaved audio-video; each frame of video also contains the audio information. If you save DV data to an AVI file, you have a choice:

- Store the interleaved data as one stream in the AVI file. This is known as a type-1 file.
- Split the interleaved data into separate audio and video streams. This is known as a type-2 file.

For video capture, where maximum throughput is crucial, it is better to use a type-1 file, because type-2 files carry redundant audio data. (The video stream still has the audio data. The audio is simply hidden by labeling the stream as video.) Also, writing a type-2 file requires some additional processor time to split the interleaved stream.

On the other hand, type-1 files are less efficient for real-time editing. The application must extract the audio from the interleaved stream, make the edits, and interleave the data again. Also, the type-1 format is not compatible with Microsoft® Video for Windows® (VFW). DirectShow can handle both types of files.

Source: https://docs.microsoft.com/en-us/windows...-avi-files

Quote:If you can share a small sample of such a source I can try to reproduce and fix the sample rate issue.

A sample incoming to your PMs!

Thanks
Reply
#4
Okay, the problem seems to be that MediaInfo reports only one audio stream:
General
Count                                    : 331
Count of stream of this kind             : 1
Kind of stream                           : General
Kind of stream                           : General
Stream identifier                        : 0
Count of video streams                   : 1
Count of audio streams                   : 1
Video_Format_List                        : DV
Video_Format_WithHint_List               : DV
Codecs Video                             : DV
Audio_Format_List                        : PCM
Audio_Format_WithHint_List               : PCM
Audio codecs                             : PCM
Complete name                            : C:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi
Folder name                              : C:\Users\Selur\Desktop
File name extension                      : multiaudio_32kHz_2006-04-02 17.24.00.avi
File name                                : multiaudio_32kHz_2006-04-02 17.24.00
File extension                           : avi
Format                                   : AVI
Format                                   : AVI
Format/Info                              : Audio Video Interleave
Format/Extensions usually used           : avi
Commercial name                          : AVI DVCAM
Commercial name                          : DVCAM
Internet media type                      : video/vnd.avi
File size                                : 22880256
File size                                : 21.8 MiB
File size                                : 22 MiB
File size                                : 22 MiB
File size                                : 21.8 MiB
File size                                : 21.82 MiB
Duration                                 : 6320
Duration                                 : 6 s 320 ms
Duration                                 : 6 s 320 ms
Duration                                 : 6 s 320 ms
Duration                                 : 00:00:06.320
Duration                                 : 00:00:06:08
Duration                                 : 00:00:06.320 (00:00:06:08)
Overall bit rate mode                    : CBR
Overall bit rate mode                    : Constant
Overall bit rate                         : 28962349
Overall bit rate                         : 29.0 Mb/s
Frame rate                               : 25.000
Frame rate                               : 25.000 FPS
Frame count                              : 158
Stream size                              : 128256
Stream size                              : 125 KiB (1%)
Stream size                              : 125 KiB
Stream size                              : 125 KiB
Stream size                              : 125 KiB
Stream size                              : 125.2 KiB
Stream size                              : 125 KiB (1%)
Proportion of this stream                : 0.00561
Recorded date                            : 2009-07-12 19:57:55.000
File creation date                       : UTC 2022-07-10 04:47:11.314
File creation date (local)               : 2022-07-10 06:47:11.314
File last modification date              : UTC 2022-07-10 04:47:11.421
File last modification date (local)      : 2022-07-10 06:47:11.421

Video
Count                                    : 381
Count of stream of this kind             : 1
Kind of stream                           : Video
Kind of stream                           : Video
Stream identifier                        : 0
StreamOrder                              : 0
ID                                       : 0
ID                                       : 0
Format                                   : DV
Format                                   : DV
Commercial name                          : DVCAM
Commercial name                          : DVCAM
Internet media type                      : video/DV
Duration                                 : 6320
Duration                                 : 6 s 320 ms
Duration                                 : 6 s 320 ms
Duration                                 : 6 s 320 ms
Duration                                 : 00:00:06.320
Duration                                 : 00:00:06:08
Duration                                 : 00:00:06.320 (00:00:06:08)
Bit rate mode                            : CBR
Bit rate mode                            : Constant
Bit rate                                 : 24441600
Bit rate                                 : 24.4 Mb/s
Encoded bit rate                         : 28800000
Encoded bit rate                         : 28.8 Mb/s
Width                                    : 720
Width                                    : 720 pixels
Height                                   : 576
Height                                   : 576 pixels
Pixel aspect ratio                       : 1.067
Display aspect ratio                     : 1.333
Display aspect ratio                     : 4:3
Frame rate mode                          : CFR
Frame rate mode                          : Constant
Frame rate                               : 25.000
Frame rate                               : 25.000 FPS
Frame count                              : 158
Standard                                 : PAL
Color space                              : YUV
Chroma subsampling                       : 4:2:0
Chroma subsampling                       : 4:2:0
Bit depth                                : 8
Bit depth                                : 8 bits
Scan type                                : Interlaced
Scan type                                : Interlaced
Scan order                               : BFF
Scan order                               : Bottom Field First
Compression mode                         : Lossy
Compression mode                         : Lossy
Bits/(Pixel*Frame)                       : 2.357
Delay                                    : 3755440
Delay                                    : 1 h 2 min
Delay                                    : 1 h 2 min 35 s 440 ms
Delay                                    : 1 h 2 min
Delay                                    : 01:02:35.440
Delay                                    : 01:02:35:11
Delay                                    : 01:02:35.440 (01:02:35:11)
Delay_DropFrame                          : No
Delay, origin                            : Stream
Delay, origin                            : Raw stream
Time code of first frame                 : 01:02:35:11
TimeCode_DropFrame                       : No
Time code source                         : Subcode time code
Stream size                              : 22752000
Stream size                              : 21.7 MiB (99%)
Stream size                              : 22 MiB
Stream size                              : 22 MiB
Stream size                              : 21.7 MiB
Stream size                              : 21.70 MiB
Stream size                              : 21.7 MiB (99%)
Proportion of this stream                : 0.99439

Audio
Count                                    : 285
Count of stream of this kind             : 1
Kind of stream                           : Audio
Kind of stream                           : Audio
Stream identifier                        : 0
ID                                       : 0-0
ID                                       : 0-0
Format                                   : PCM
Format                                   : PCM
Commercial name                          : PCM
Format settings                          : Big / Signed
Format settings, Endianness              : Big
Format settings, Sign                    : Signed
Muxing mode                              : DV
Muxing mode, more info                   : Muxed in Video #1
Duration                                 : 6320
Duration                                 : 6 s 320 ms
Duration                                 : 6 s 320 ms
Duration                                 : 6 s 320 ms
Duration                                 : 00:00:06.320
Duration                                 : 00:00:06.320
Bit rate mode                            : CBR
Bit rate mode                            : Constant
Bit rate                                 : 1536000
Bit rate                                 : 1 536 kb/s
Encoded bit rate                         : 0
Encoded bit rate                         : 0 b/s
Channel(s)                               : 2
Channel(s)                               : 2 channels
Sampling rate                            : 48000
Sampling rate                            : 48.0 kHz
Samples count                            : 303360
Bit depth                                : 16
Bit depth                                : 16 bits
Delay                                    : 3755440
Delay                                    : 1 h 2 min
Delay                                    : 1 h 2 min 35 s 440 ms
Delay                                    : 1 h 2 min
Delay                                    : 01:02:35.440
Delay                                    : 01:02:35.440
Delay, origin                            : Stream
Delay, origin                            : Raw stream
Delay relative to video                  : 0
Delay relative to video                  : 00:00:00.000
Delay relative to video                  : 00:00:00.000
Stream size                              : 1213440
Stream size                              : 1.16 MiB (5%)
Stream size                              : 1 MiB
Stream size                              : 1.2 MiB
Stream size                              : 1.16 MiB
Stream size                              : 1.157 MiB
Stream size                              : 1.16 MiB (5%)
Proportion of this stream                : 0.05303
Encoded stream size                      : 0
Encoded stream size                      : 0.00 Byte (0%)
Encoded stream size                      :  Byte0
Encoded stream size                      : 0.0 Byte
Encoded stream size                      : 0.00 Byte
Encoded stream size                      : 0.000 Byte
Encoded stream size                      : 0.00 Byte (0%)
StreamSize_Encoded_Proportion            : 0.00000


FFmpeg/FFprobe/FFplay report:
Input #0, avi, from 'c:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi':
  Duration: 00:00:06.32, start: 0.000000, bitrate: 28962 kb/s
  Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
  Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
MPlayer reports:
======= VIDEO Format ======
  biSize 40
  biWidth 720
  biHeight 576
  biPlanes 1
  biBitCount 0
  biCompression 1685288548='dvsd'
  biSizeImage 0
===========================
[lavf] stream 0: video (dvvideo), -vid 0
==> Found audio stream: 1
ID_AUDIO_ID=0
======= WAVE Format =======
Format Tag: 1 (0x1)
Channels: 2
Samplerate: 48000
avg byte/sec: 192000
Block align: 1
bits/sample: 16
cbSize: 0
==========================================================================
[lavf] stream 1: audio (pcm_s16le), -aid 0
==> Found audio stream: 2
ID_AUDIO_ID=1
======= WAVE Format =======
Format Tag: 1 (0x1)
Channels: 2
Samplerate: 32000
avg byte/sec: 128000
Block align: 1
bits/sample: 16
cbSize: 0
==========================================================================
[lavf] stream 2: audio (pcm_s16le), -aid 1
LAVF: 2 audio and 1 video streams found
LAVF: build 3868772
VIDEO:  [dvsd]  720x576  0bpp  25.000 fps  25000.0 kbps (3051.8 kbyte/s)
[V] filefmt:35  fourcc:0x64737664  size:720x576  fps:25.000  ftime:=0.0400
==========================================================================
This is the first time I've seen MediaInfo report fewer audio streams than libav/FFmpeg, so I'm looking into it.

Cu Selur
Reply
#5
Okay, after reading https://forum.videohelp.com/threads/3747...io-streams and testing that with your file, I see that FFmpeg still only copies the first audio stream.
Reply
#6
Seems like ffmpeg's (default?) behavior is to copy the first audio track only. I can get it to copy both audio streams by using:
-map 0
The sample rate is preserved. I can also force the first stream to be skipped like this:
-map 0 -map -0:a:0
Sadly, in both cases the "correct" audio stream is silent, apart from a high-pitched buzz at the start...
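
For an automated pipeline, the stream selection above could be generated rather than typed by hand. The following is a minimal sketch, not something Hybrid does: `build_copy_command` and the file names are hypothetical, and the number of audio streams is assumed to be known already (for example from ffprobe). It uses only the `-map` and `-c copy` options of ffmpeg and selects the last audio stream by its index among the audio streams.

```python
import shlex

def build_copy_command(src, dst, audio_count, keep="last"):
    """Build an ffmpeg stream-copy command that keeps exactly one audio stream."""
    if audio_count < 1:
        raise ValueError("source has no audio streams")
    index = audio_count - 1 if keep == "last" else 0
    return [
        "ffmpeg", "-i", src,
        "-map", "0:v",            # all video streams of the first input
        "-map", f"0:a:{index}",   # one audio stream, counted among audio streams only
        "-c", "copy",             # stream copy: no re-encoding, no resampling
        dst,
    ]

# Hypothetical file names, two audio streams as in the sample above.
cmd = build_copy_command("input.avi", "output.mkv", audio_count=2)
print(shlex.join(cmd))
```

This only selects streams; as noted above, on these particular files the second stream turned out to be silent, so the selection alone does not rescue the audio.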
Reply
#7
Okay, I know what happens.
MediaInfo reports a sample rate of 48 kHz for the first audio stream and nothing for the second. Hybrid then assumes that the audio sample rate is 48 kHz, which is above what is reported by FFmpeg for the second stream, and thus Hybrid uses 48 kHz.
I changed that for testing, but, as in your case, when I use
ffmpeg -y -threads 8 -i "C:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi" -map 0:1 -vn -sn -ac 2 -ar 48000 -acodec pcm_s16le -f wav -map_metadata -1 -metadata encoding_tool="Hybrid 2022.07.08.1" "E:\Temp\iId_1_aid_0_2022-07-11@16_07_06_7110_01.wav"
to extract the first audio stream and
ffmpeg -y -threads 8 -i "C:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi" -map 0:2 -vn -sn -ac 2 -ar 32000 -acodec pcm_s16le -f wav -map_metadata -1 -metadata encoding_tool="Hybrid 2022.07.08.1" "E:\Temp\iId_2_aid_1_2022-07-11@16_07_06_7110_02.wav"
to extract the second audio stream, the second audio stream is silent.

I'm not sure whether I should keep my workaround for this case or for .avi files in general, especially since the second audio stream, when extracted, should be the same as the first.
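
The sample-rate mix-up described at the top of this post can be illustrated with a small sketch. This is not Hybrid's actual code: `resolve_sample_rates` is a hypothetical helper that assumes per-stream rates from both tools are available as lists, and it shows one possible policy (trust the per-stream ffprobe value, and never apply a rate reported for one stream to another).

```python
def resolve_sample_rates(ffprobe_rates, mediainfo_rates):
    """Pick one sample rate per audio stream.

    ffprobe_rates:   per-stream rates in Hz, in stream order (None if unknown)
    mediainfo_rates: per-stream rates in Hz, may be shorter than ffprobe_rates
    """
    resolved = []
    for i, probe_rate in enumerate(ffprobe_rates):
        info_rate = mediainfo_rates[i] if i < len(mediainfo_rates) else None
        # Fall back to the other tool only when ffprobe has no value for
        # this stream; a rate from stream 0 never overrides stream 1.
        resolved.append(probe_rate if probe_rate else info_rate)
    return resolved

# The sample file: ffprobe sees two streams, MediaInfo only one.
print(resolve_sample_rates([48000, 32000], [48000]))
```

With the values from the sample file, the second stream keeps 32000 Hz instead of inheriting 48000 Hz from the first.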

Cu Selur
Reply
#8
I'm actually not too sure anymore whether my videos are really Type-2 (one interleaved audio stream, one separate). I tried the tool dvdate (https://paulglagla.com/en/dvdate-2/), and it identified the file as Type-1. When I converted it to Type-2 using dvdate, it identifies like this:

Input #0, avi, from '2005-08-14 14.19.26_type2.avi':
  Duration: 00:00:52.04, start: 0.000000, bitrate: 30029 kb/s
  Stream #0:0: Video: dvvideo (dvsd / 0x64737664), yuv420p, 720x576 [SAR 16:15 DAR 4:3], 28822 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 32000 Hz, 2 channels, s16, 1024 kb/s

Then I converted the Type-2 back to Type-1 using dvdate, and it looks like this:

Input #0, avi, from '2005-08-14 14.19.26_type2_type1.avi':
  Duration: 00:00:52.04, start: 0.000000, bitrate: 28890 kb/s
  Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
  Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s

Both files produced by dvdate work fine with ffmpeg and Hybrid. I think that in my video's case, and for Type-1 in general, the second audio stream is not a real audio stream but just the timecode and metadata for the interleaved audio stream, and this is why extracting it while dropping the interleaved stream causes a high-pitched buzz and no sound. I suspect that what Hybrid and ffmpeg need to do, and what VLC Player does, is use the metadata from the second stream and apply it to the content of the first/interleaved stream. I'm not sure how feasible this is, or whether it's worth the effort at all, since it sounds like a super edge case.
Reply
#9
Not feasible for Hybrid, unless there's an option in FFmpeg to do this.

Cu Selur
Reply
#10
I agree. I think the main issue here is MediaInfo's and ffmpeg's handling of Type-1 AVIs. MediaInfo reports one audio stream (the real/interleaved stream), while the latter reports two streams (the interleaved stream and the timecode/metadata stream). Both seem unable to apply the timecode/metadata stream's sample rate onto the real/interleaved stream. Only VLC player appears to do that. Honestly though, this is an extreme edge case, since in almost all other videos the metadata of both streams matches anyway. I worked around this issue by converting my problematic videos back and forth using dvdate, which properly applies the correct sample rate to both audio streams. Then, I simply do "auto add (first)" in Hybrid, and I am done.

Fun discovery: Even with the proper Type-1 AVI produced by dvdate, the second audio stream is silent. This confirms my theory that it does not contain any audio content, but merely metadata.
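
In an unattended pipeline, the files that need the dvdate round trip could be picked out automatically by checking whether ffprobe reports different sample rates for the audio streams. The following is a rough sketch under assumptions: the helper names are made up, the ffprobe JSON is assumed to carry `sample_rate` as a string per audio stream, and `probe` is only defined here, not run.

```python
import json
import subprocess

def probe(path):
    """Return ffprobe's JSON description of the audio streams in `path`."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a",
         "-show_entries", "stream=index,sample_rate", "-of", "json", path],
        capture_output=True, text=True, check=True)
    return result.stdout

def audio_sample_rates(ffprobe_json):
    """Extract the per-stream sample rates (in Hz) from ffprobe JSON output."""
    data = json.loads(ffprobe_json)
    return [int(s["sample_rate"]) for s in data.get("streams", [])]

def rates_disagree(rates):
    """True when the audio streams do not all share one sample rate."""
    return len(set(rates)) > 1

# Hand-written JSON mirroring the two streams reported for the sample file.
sample = ('{"streams": [{"index": 1, "sample_rate": "48000"},'
          ' {"index": 2, "sample_rate": "32000"}]}')
rates = audio_sample_rates(sample)
print(rates, rates_disagree(rates))
```

For a real file, `audio_sample_rates(probe(path))` would replace the hand-written JSON; files where `rates_disagree` returns True are candidates for the conversion described above.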
Reply

