[BUG] Passthrough audio changes sample rate on multiple audio streams
#1
Hello,

I have some PAL SD videos captured from a digital camera using Windows Media Player over FireWire. They are Type-2 AVIs (the audio stream is stored twice: once interleaved in the video, and once separately, with the interleaved stream disabled). Here is sample ffprobe output:

  Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
  Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s

For some reason, the interleaved stream is at 48000 Hz (wrong!), but the real stream is correct at 32000 Hz. This shouldn't be a problem because the wrong stream is disabled anyway. When I encode with Hybrid using "passthrough all", I expect the streams to be preserved. However, it seems like instead, the second one gets resampled to 48000 Hz:

  Stream #0:0: Video: utvideo (ULY0 / 0x30594C55), yuv420p(tv, bt470bg/bt470bg/bt709), 720x576, SAR 16:15 DAR 4:3, 50 fps, 50 tbr, 1k tbn
    Metadata:
      ENCODER         : Lavc59.25.100 utvideo
      BPS             : 117941433
      DURATION        : 00:00:52.040000000
      NUMBER_OF_FRAMES: 2602
      NUMBER_OF_BYTES : 767209028
      _STATISTICS_WRITING_APP: mkvmerge v67.0.0 ('Under Stars') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-07-09 14:27:00
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s (default)
    Metadata:
      BPS             : 1536016
      DURATION        : 00:00:34.757000000
      NUMBER_OF_FRAMES: 869
      NUMBER_OF_BYTES : 6673416
      _STATISTICS_WRITING_APP: mkvmerge v67.0.0 ('Under Stars') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-07-09 14:27:00
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
  Stream #0:2: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
    Metadata:
      BPS             : 1535994
      DURATION        : 00:00:51.840000000
      NUMBER_OF_FRAMES: 1296
      NUMBER_OF_BYTES : 9953244
      _STATISTICS_WRITING_APP: mkvmerge v67.0.0 ('Under Stars') 64-bit
      _STATISTICS_WRITING_DATE_UTC: 2022-07-09 14:27:00
      _STATISTICS_TAGS: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES

As you would expect, this resampling completely messes up the audio on playback.
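For reference, the bit rates in the ffprobe listings follow directly from the PCM parameters (sample rate × channels × bits per sample). The sketch below only redoes that arithmetic on the numbers quoted above; the function names are made up for illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, channels: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * channels * bits_per_sample


def pcm_duration_s(num_bytes: int, sample_rate_hz: int, channels: int,
                   bits_per_sample: int) -> float:
    """Playback duration of a raw PCM payload of num_bytes bytes."""
    return num_bytes * 8 / pcm_bit_rate(sample_rate_hz, channels, bits_per_sample)


# 16-bit stereo at the two sample rates reported for the source file
print(pcm_bit_rate(32000, 2, 16))  # 1024000 -> "1024 kb/s"
print(pcm_bit_rate(48000, 2, 16))  # 1536000 -> "1536 kb/s"

# NUMBER_OF_BYTES of the second output track, played at 48000 Hz
print(round(pcm_duration_s(9953244, 48000, 2, 16), 2))  # 51.84
```

The same calculation for the first output track (6673416 bytes at 48000 Hz) gives about 34.76 s, roughly two thirds of the 52.04 s clip. That would be consistent with 32 kHz data being labeled as 48 kHz, though this is only a guess from the numbers.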

Out of curiosity, why does Hybrid extract the audio streams and then merge them back in? This sounds like an error-prone process.

Actually, even if you fix the "passthrough" issue so that both streams keep their sample rate, I think my case would still break, because when encoding the final video, Hybrid won't set the second stream as default or the first one as disabled. Would it be possible to add an audio option "auto add (last)", the same way there is "auto add (first)"? This would be a nice trick for all Type-2 AVIs. I was planning to work around this by using the mkvmerge custom option "--audio-tracks 1" to select the second stream, but that fails since Hybrid already uses "--no-audio" due to the extraction mechanism above.
I can't just make these selections manually because I am running an automated pipeline with zero user interaction.
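To illustrate what an "auto add (last)" rule could look like in a scripted pipeline, here is a minimal sketch that parses ffprobe's human-readable stream lines and picks the last audio stream. The function names and the regular expression are assumptions made for this example; this is not how Hybrid selects audio.

```python
import re

# Stream lines as printed by ffprobe for the source file in this post
SAMPLE = """\
Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
"""


def audio_streams(ffprobe_text):
    """Return one dict per audio stream line found in ffprobe's text output."""
    streams = []
    for line in ffprobe_text.splitlines():
        match = re.search(r"Stream #\d+:(\d+)[^:]*: Audio: (\w+)", line)
        if not match:
            continue
        rate = re.search(r"(\d+) Hz", line)
        streams.append({
            "index": int(match.group(1)),
            "codec": match.group(2),
            "sample_rate": int(rate.group(1)) if rate else None,
        })
    return streams


def auto_add_last(streams):
    """Hypothetical 'auto add (last)': pick the last audio stream, if any."""
    return streams[-1] if streams else None


print(auto_add_last(audio_streams(SAMPLE)))
```

For the sample above this selects stream index 2, which a script could then pass to ffmpeg as `-map 0:2`.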

My system:
- Windows 11
- Hybrid 2022.07.01.1

Thank you!
#2
Quote:Out of curiosity, why does Hybrid extract the audio streams and then merge them back in? This sounds like an error-prone process.
Hybrid processes audio and video separately, since that offers far more variety in how to deal with the audio and video.
(This normally works fine, but I haven't had to work with 32 kHz audio for 10+ years ;) )

Quote:Would it be possible to just add audio "auto add (last)" the same way there is "auto add (first)"?
Possible in theory, sure. But sorry, this won't come soon. (I haven't changed anything in the general auto audio routines for ages and would need some time to re-read how everything is intertwined.)
(Side note: Hybrid does allow filtering by language if the audio is tagged accordingly, which your streams do not seem to be.)

If you can share a small sample of such a source I can try to reproduce and fix the sample rate issue.

Quote:They are Type-2 AVIs (the audio stream is stored twice - once interleaved in the video, and once separately, where the interleaved stream is disabled).
Out of curiosity: Who does that and why?

Cu Selur
----
Dev versions are in the 'experimental'-folder of my GoogleDrive, which is linked on the download page.
Offline between (including) 29th of June and 5th of July => RochHarz Festival
#3
Quote:Out of curiosity: Who does that and why?

Type-1 and Type-2 AVI date back to the times when computers were severely underpowered. Here is a very good explanation by Microsoft themselves:

Quote:DV cameras produce interleaved audio-video; each frame of video also contains the audio information. If you save DV data to an AVI file, you have a choice:

- Store the interleaved data as one stream in the AVI file. This is known as a type-1 file.
- Split the interleaved data into separate audio and video streams. This is known as a type-2 file.

For video capture, where maximum throughput is crucial, it is better to use a type-1 file, because type-2 files carry redundant audio data. (The video stream still has the audio data. The audio is simply hidden by labeling the stream as video.) Also, writing a type-2 file requires some additional processor time to split the interleaved stream.

On the other hand, type-1 files are less efficient for real-time editing. The application must extract the audio from the interleaved stream, make the edits, and interleave the data again. Also, the type-1 format is not compatible with Microsoft® Video for Windows® (VFW). DirectShow can handle both types of files.

Source: https://docs.microsoft.com/en-us/windows...-avi-files

Quote:If you can share a small sample of such a source I can try to reproduce and fix the sample rate issue.

A sample incoming to your PMs!

Thanks
#4
Okay, the problem seems to be that MediaInfo reports only one audio stream:
General
Count : 331
Count of stream of this kind : 1
Kind of stream : General
Kind of stream : General
Stream identifier : 0
Count of video streams : 1
Count of audio streams : 1
Video_Format_List : DV
Video_Format_WithHint_List : DV
Codecs Video : DV
Audio_Format_List : PCM
Audio_Format_WithHint_List : PCM
Audio codecs : PCM
Complete name : C:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi
Folder name : C:\Users\Selur\Desktop
File name extension : multiaudio_32kHz_2006-04-02 17.24.00.avi
File name : multiaudio_32kHz_2006-04-02 17.24.00
File extension : avi
Format : AVI
Format : AVI
Format/Info : Audio Video Interleave
Format/Extensions usually used : avi
Commercial name : AVI DVCAM
Commercial name : DVCAM
Internet media type : video/vnd.avi
File size : 22880256
File size : 21.8 MiB
File size : 22 MiB
File size : 22 MiB
File size : 21.8 MiB
File size : 21.82 MiB
Duration : 6320
Duration : 6 s 320 ms
Duration : 6 s 320 ms
Duration : 6 s 320 ms
Duration : 00:00:06.320
Duration : 00:00:06:08
Duration : 00:00:06.320 (00:00:06:08)
Overall bit rate mode : CBR
Overall bit rate mode : Constant
Overall bit rate : 28962349
Overall bit rate : 29.0 Mb/s
Frame rate : 25.000
Frame rate : 25.000 FPS
Frame count : 158
Stream size : 128256
Stream size : 125 KiB (1%)
Stream size : 125 KiB
Stream size : 125 KiB
Stream size : 125 KiB
Stream size : 125.2 KiB
Stream size : 125 KiB (1%)
Proportion of this stream : 0.00561
Recorded date : 2009-07-12 19:57:55.000
File creation date : UTC 2022-07-10 04:47:11.314
File creation date (local) : 2022-07-10 06:47:11.314
File last modification date : UTC 2022-07-10 04:47:11.421
File last modification date (local) : 2022-07-10 06:47:11.421

Video
Count : 381
Count of stream of this kind : 1
Kind of stream : Video
Kind of stream : Video
Stream identifier : 0
StreamOrder : 0
ID : 0
ID : 0
Format : DV
Format : DV
Commercial name : DVCAM
Commercial name : DVCAM
Internet media type : video/DV
Duration : 6320
Duration : 6 s 320 ms
Duration : 6 s 320 ms
Duration : 6 s 320 ms
Duration : 00:00:06.320
Duration : 00:00:06:08
Duration : 00:00:06.320 (00:00:06:08)
Bit rate mode : CBR
Bit rate mode : Constant
Bit rate : 24441600
Bit rate : 24.4 Mb/s
Encoded bit rate : 28800000
Encoded bit rate : 28.8 Mb/s
Width : 720
Width : 720 pixels
Height : 576
Height : 576 pixels
Pixel aspect ratio : 1.067
Display aspect ratio : 1.333
Display aspect ratio : 4:3
Frame rate mode : CFR
Frame rate mode : Constant
Frame rate : 25.000
Frame rate : 25.000 FPS
Frame count : 158
Standard : PAL
Color space : YUV
Chroma subsampling : 4:2:0
Chroma subsampling : 4:2:0
Bit depth : 8
Bit depth : 8 bits
Scan type : Interlaced
Scan type : Interlaced
Scan order : BFF
Scan order : Bottom Field First
Compression mode : Lossy
Compression mode : Lossy
Bits/(Pixel*Frame) : 2.357
Delay : 3755440
Delay : 1 h 2 min
Delay : 1 h 2 min 35 s 440 ms
Delay : 1 h 2 min
Delay : 01:02:35.440
Delay : 01:02:35:11
Delay : 01:02:35.440 (01:02:35:11)
Delay_DropFrame : No
Delay, origin : Stream
Delay, origin : Raw stream
Time code of first frame : 01:02:35:11
TimeCode_DropFrame : No
Time code source : Subcode time code
Stream size : 22752000
Stream size : 21.7 MiB (99%)
Stream size : 22 MiB
Stream size : 22 MiB
Stream size : 21.7 MiB
Stream size : 21.70 MiB
Stream size : 21.7 MiB (99%)
Proportion of this stream : 0.99439

Audio
Count : 285
Count of stream of this kind : 1
Kind of stream : Audio
Kind of stream : Audio
Stream identifier : 0
ID : 0-0
ID : 0-0
Format : PCM
Format : PCM
Commercial name : PCM
Format settings : Big / Signed
Format settings, Endianness : Big
Format settings, Sign : Signed
Muxing mode : DV
Muxing mode, more info : Muxed in Video #1
Duration : 6320
Duration : 6 s 320 ms
Duration : 6 s 320 ms
Duration : 6 s 320 ms
Duration : 00:00:06.320
Duration : 00:00:06.320
Bit rate mode : CBR
Bit rate mode : Constant
Bit rate : 1536000
Bit rate : 1 536 kb/s
Encoded bit rate : 0
Encoded bit rate : 0 b/s
Channel(s) : 2
Channel(s) : 2 channels
Sampling rate : 48000
Sampling rate : 48.0 kHz
Samples count : 303360
Bit depth : 16
Bit depth : 16 bits
Delay : 3755440
Delay : 1 h 2 min
Delay : 1 h 2 min 35 s 440 ms
Delay : 1 h 2 min
Delay : 01:02:35.440
Delay : 01:02:35.440
Delay, origin : Stream
Delay, origin : Raw stream
Delay relative to video : 0
Delay relative to video : 00:00:00.000
Delay relative to video : 00:00:00.000
Stream size : 1213440
Stream size : 1.16 MiB (5%)
Stream size : 1 MiB
Stream size : 1.2 MiB
Stream size : 1.16 MiB
Stream size : 1.157 MiB
Stream size : 1.16 MiB (5%)
Proportion of this stream : 0.05303
Encoded stream size : 0
Encoded stream size : 0.00 Byte (0%)
Encoded stream size : Byte0
Encoded stream size : 0.0 Byte
Encoded stream size : 0.00 Byte
Encoded stream size : 0.000 Byte
Encoded stream size : 0.00 Byte (0%)
StreamSize_Encoded_Proportion : 0.00000


FFmpeg/FFprobe/FFplay report:
Input #0, avi, from 'c:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi':
  Duration: 00:00:06.32, start: 0.000000, bitrate: 28962 kb/s
  Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
  Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
MPlayer reports:
======= VIDEO Format ======
biSize 40
biWidth 720
biHeight 576
biPlanes 1
biBitCount 0
biCompression 1685288548='dvsd'
biSizeImage 0
===========================
[lavf] stream 0: video (dvvideo), -vid 0
==> Found audio stream: 1
ID_AUDIO_ID=0
======= WAVE Format =======
Format Tag: 1 (0x1)
Channels: 2
Samplerate: 48000
avg byte/sec: 192000
Block align: 1
bits/sample: 16
cbSize: 0
==========================================================================
[lavf] stream 1: audio (pcm_s16le), -aid 0
==> Found audio stream: 2
ID_AUDIO_ID=1
======= WAVE Format =======
Format Tag: 1 (0x1)
Channels: 2
Samplerate: 32000
avg byte/sec: 128000
Block align: 1
bits/sample: 16
cbSize: 0
==========================================================================
[lavf] stream 2: audio (pcm_s16le), -aid 1
LAVF: 2 audio and 1 video streams found
LAVF: build 3868772
VIDEO: [dvsd] 720x576 0bpp 25.000 fps 25000.0 kbps (3051.8 kbyte/s)
[V] filefmt:35 fourcc:0x64737664 size:720x576 fps:25.000 ftime:=0.0400
==========================================================================
This is the first time I have seen MediaInfo report fewer audio streams than libav/FFmpeg; looking into it.
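For anyone who wants to script this kind of comparison, ffprobe can emit its stream list as JSON, which is easier to check than the text output. The sketch below is only an illustration: the helper names are made up, and the sample JSON is hand-written to mirror the values reported above rather than captured from the actual file.

```python
import json
import subprocess


def probe_json(path):
    """Run ffprobe and return its JSON stream listing (requires ffprobe on PATH)."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=index,codec_type,codec_name,sample_rate",
        "-of", "json", path,
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def audio_sample_rates(ffprobe_json):
    """Map stream index -> sample rate for every audio stream in the listing."""
    data = json.loads(ffprobe_json)
    return {
        s["index"]: int(s["sample_rate"])  # ffprobe prints the rate as a string
        for s in data.get("streams", [])
        if s.get("codec_type") == "audio" and "sample_rate" in s
    }


# Hand-written sample shaped like ffprobe's JSON output (assumption, see above)
SAMPLE_JSON = """
{"streams": [
  {"index": 0, "codec_name": "dvvideo", "codec_type": "video"},
  {"index": 1, "codec_name": "pcm_s16le", "codec_type": "audio", "sample_rate": "48000"},
  {"index": 2, "codec_name": "pcm_s16le", "codec_type": "audio", "sample_rate": "32000"}
]}
"""

rates = audio_sample_rates(SAMPLE_JSON)
print(len(rates), rates)
```

A pipeline could compare `len(rates)` against the audio stream count from MediaInfo and flag files where the two tools disagree.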

Cu Selur
#5
Okay, reading https://forum.videohelp.com/threads/3747...io-streams and testing that with your stream, I see that FFmpeg still only copies the first audio stream.
#6
Seems like ffmpeg's (default?) behavior is to copy the first audio track only. I can get it to copy both audio streams by using:
-map 0
The sample rate is preserved. I can also force the first stream to be skipped like this:
-map 0 -map -0:a:0
Sadly, in both cases the "correct" audio stream is silent, apart from a high-pitched buzz at the start...
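For scripted use, the two `-map` variants above can be assembled programmatically. This is a rough sketch: the function name is made up, and `-c copy` (stream copy without re-encoding) is added here as an assumption, since the post does not show the rest of the command line.

```python
import shlex


def copy_command(src, dst, skip_audio=()):
    """Build an ffmpeg command that copies all streams except the given audio streams.

    skip_audio holds zero-based positions among the audio streams, so 0 is the
    first audio stream (the "0:a:0" specifier used above).
    """
    cmd = ["ffmpeg", "-i", src, "-map", "0"]  # -map 0: select every stream of input 0
    for n in skip_audio:
        cmd += ["-map", f"-0:a:{n}"]          # negative mapping: deselect that stream
    cmd += ["-c", "copy", dst]                # assumption: stream copy, no re-encode
    return cmd


print(shlex.join(copy_command("in.avi", "out.mkv")))
print(shlex.join(copy_command("in.avi", "out.mkv", skip_audio=[0])))
```

The file names are placeholders. Nothing is executed here; the list can be handed to `subprocess.run` once it looks right.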
#7
Okay, I know what happens.
MediaInfo reports a sample rate of 48 kHz for the first audio stream and nothing for the second. Hybrid then assumes that the audio sample rate is 48 kHz, which is above what ffmpeg reports, and thus Hybrid uses 48 kHz.
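To make the described behavior concrete, here is one possible reading of it as code, next to an alternative that trusts ffmpeg's per-stream value when MediaInfo has none. Both functions and the 48000 Hz fallback are assumptions for illustration only; this is not Hybrid's actual source.

```python
def rate_as_described(mediainfo_rate, ffmpeg_rate, fallback=48000):
    """One reading of the behavior above: assume a rate when MediaInfo gives
    none, then keep whichever of the two values is higher."""
    assumed = mediainfo_rate if mediainfo_rate is not None else fallback
    return max(assumed, ffmpeg_rate)


def rate_per_stream(mediainfo_rate, ffmpeg_rate):
    """Alternative: use MediaInfo when it reports a rate, else ffmpeg's value."""
    return mediainfo_rate if mediainfo_rate is not None else ffmpeg_rate


# Second audio stream of the sample: MediaInfo reports nothing, ffmpeg 32000 Hz
print(rate_as_described(None, 32000))  # 48000
print(rate_per_stream(None, 32000))    # 32000
```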
I changed that for testing, but, like you, I end up with a silent second stream. Using
ffmpeg -y -threads 8 -i "C:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi" -map 0:1 -vn -sn -ac 2 -ar 48000 -acodec pcm_s16le -f wav -map_metadata -1 -metadata encoding_tool="Hybrid 2022.07.08.1" "E:\Temp\iId_1_aid_0_2022-07-11@16_07_06_7110_01.wav"
to extract the first audio stream and
ffmpeg -y -threads 8 -i "C:\Users\Selur\Desktop\multiaudio_32kHz_2006-04-02 17.24.00.avi" -map 0:2 -vn -sn -ac 2 -ar 32000 -acodec pcm_s16le -f wav -map_metadata -1 -metadata encoding_tool="Hybrid 2022.07.08.1" "E:\Temp\iId_2_aid_1_2022-07-11@16_07_06_7110_02.wav"
to extract the second, the second audio is silent.

Not sure whether I should keep my workaround for this case or .avi files, especially since the second audio, when extracted, should be the same as the first.

Cu Selur
#8
I'm actually not too sure anymore whether my videos are really Type-2 (one interleaved audio stream, one separate). I tried the tool dvdate (https://paulglagla.com/en/dvdate-2/), and it identified the file as Type-1. When I converted it to Type-2 using dvdate, it identifies like this:

Input #0, avi, from '2005-08-14 14.19.26_type2.avi':
  Duration: 00:00:52.04, start: 0.000000, bitrate: 30029 kb/s
  Stream #0:0: Video: dvvideo (dvsd / 0x64737664), yuv420p, 720x576 [SAR 16:15 DAR 4:3], 28822 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 32000 Hz, 2 channels, s16, 1024 kb/s

Then I converted the Type-2 back to Type-1 using dvdate, and it looks like this:

Input #0, avi, from '2005-08-14 14.19.26_type2_type1.avi':
  Duration: 00:00:52.04, start: 0.000000, bitrate: 28890 kb/s
  Stream #0:0: Video: dvvideo, yuv420p, 720x576 [SAR 16:15 DAR 4:3], 25000 kb/s, 25 fps, 25 tbr, 25 tbn
  Stream #0:1: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s
  Stream #0:2: Audio: pcm_s16le, 32000 Hz, stereo, s16, 1024 kb/s

Both files produced by dvdate work perfectly with ffmpeg and Hybrid. I think that in my videos' case, and for Type-1 in general, the second audio stream is not a real audio stream but just the timecode and metadata for the interleaved audio stream, and this is why extracting it while dropping the interleaved stream causes a high-pitched buzz and no sound. I suspect that what Hybrid and ffmpeg need to do, and what VLC Player does, is take the metadata from the second stream and apply it to the content of the first/interleaved stream. Not sure how feasible this is, or whether it's worth the effort at all, since it sounds like a super edge case.
#9
Not feasible for Hybrid, unless there's an option in FFmpeg to do this.

Cu Selur
#10
I agree. I think the main issue here is MediaInfo's and ffmpeg's handling of Type-1 AVIs. MediaInfo reports one audio stream (the real/interleaved stream), while the latter reports two streams (the interleaved stream and the timecode/metadata stream). Both seem unable to apply the timecode/metadata stream's sample rate onto the real/interleaved stream. Only VLC player appears to do that. Honestly though, this is an extreme edge case, since in almost all other videos the metadata of both streams matches anyway. I worked around this issue by converting my problematic videos back and forth using dvdate, which properly applies the correct sample rate to both audio streams. Then, I simply do "auto add (first)" in Hybrid, and I am done.

Fun discovery: Even with the proper Type-1 AVI produced by dvdate, the second audio stream is silent. This confirms my theory that it does not contain any audio content, but merely metadata.
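For anyone who wants to check an extracted track for silence without listening to it, here is a small sketch using only Python's standard library. It assumes the track has been extracted to a 16-bit PCM WAV file; the function names and the threshold parameter are made up for this example.

```python
import array
import io
import sys
import wave


def peak_amplitude(wav_file):
    """Largest absolute sample value in a 16-bit PCM WAV (path or file object)."""
    with wave.open(wav_file, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return max((abs(s) for s in samples), default=0)


def is_silent(wav_file, threshold=0):
    """True if no sample exceeds the threshold (0 means digital silence)."""
    return peak_amplitude(wav_file) <= threshold


def make_wav(samples, rate=32000):
    """Demo helper: build an in-memory stereo 16-bit WAV from interleaved samples."""
    data = array.array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(data.tobytes())
    buf.seek(0)
    return buf


print(is_silent(make_wav([0] * 64)))            # True
print(is_silent(make_wav([0, 1000, -2000, 0]))) # False
```

A track with a short buzz at the start, as described earlier in the thread, would not pass the default threshold of 0, so a real check would probably skip the first moments or count the share of zero samples instead.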