[HELP] perf. RTX 3060 ti slower than GTX 1070 - NVEnc
#11
I just tested with both GPUs installed: the GTX 1070 is GPU 0 and the RTX 3060 Ti is GPU 1.

When I run the encode, it runs on GPU 1 with the same performance as with a single GPU (the RTX, of course).

This is my latest encoding log:



This log is only intended for user information. It should not be part of a bug/problem report!!
Detected NVIDIA PureVideo compatible cards: NVIDIA GeForce RTX 3060 Ti NVIDIA GeForce GTX 1070
Detected vfwDecoders woth 32bit:  VIDC.FFDS  VIDC.FPS1  VIDC.RTV1  vidc.DIVX  vidc.cvid  vidc.i420  vidc.iyuv  vidc.mrle  vidc.msvc  vidc.pDAD  vidc.uyvy  vidc.yuy2  vidc.yv12  vidc.yvu9  vidc.yvyu
Detected vfw64BitDecoders:  VIDC.FFDS  VIDC.FPS1  VIDC.RTV1  vidc.DIVX  vidc.cvid  vidc.i420  vidc.iyuv  vidc.mrle  vidc.msvc  vidc.pDAD  vidc.uyvy  vidc.yuy2  vidc.yv12  vidc.yvu9  vidc.yvyu
  Avisynth+ is available,..
    DGDecNV available,..
  Vapoursynth is available,..
    DGDecNV available,..
  added new job with id 2022-01-06@15_46_20_8610
Finished startup, finished after 17.051s
starting 2022-01-06@15_46_20_8610_01_audio@15:59:39.387 - J:\Download 10 To\4K UHD enc\The.Prote test 3.mkv
2022-01-06@15_46_20_8610_01_audio finished after 00:00:20.526
starting 2022-01-06@15_46_20_8610_02_video@15:59:59.927 - J:\Download 10 To\4K UHD enc\The.Prote test 3.mkv
2022-01-06@15_46_20_8610_02_video finished after 00:08:31.779
starting 2022-01-06@15_46_20_8610_05_muxing@16:08:31.720 - J:\Download 10 To\4K UHD enc\The.Prote test 3.mkv
2022-01-06@15_46_20_8610_05_muxing finished after 00:00:06.577
delete D:\Temp\mkvtags_2022-01-06@15_46_20_8610__03.xml
delete D:\Temp\The.Prote test 3_2022-01-06@15_46_20_8610_02.265
delete D:\Temp\2022-01-06@15_46_20_8610__04.chp
delete D:\Temp\iId_2_aid_0_lang_fr_DELAY_24ms_2022-01-06@15_46_20_8610_01.eac3
delete J:\Download 10 To\4K UHD enc\The.Protege.2021.2160p.UHD-001_id_2_lang_fr_forced.srt
Job 2022-01-06@15_46_20_8610 finished!


How do I share the whole decoding chain? How and where do I get that info?
#12
Easiest way: create a debug output (you should have read the sticky and know how to do it)
More complicated:
a. disable 'Minimize job command lines', then check the sub jobs and copy the _video subjob calls.
b. if you use Vapoursynth or Avisynth, copy the scripts that are shown in the script preview.

Cu Selur
#13
The speed figures that I provided are for standard 1080p movies.

Of course, the encoding speed decreases when encoding Ultra-HD movies.

To give an example, these are the encoding speeds I get with my RTX 3060 on Ultra-HD movies:

Movie size: 4096x1744 -> speed: 104 fps
Movie size: 3840x2160 -> speed: 93 fps

If you want maximum speed, I strongly suggest letting NVEnc fully manage the video tasks.
To do that, enable the checkbox "NVEnc->Hardware->Only use encoder".
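
As a rough sanity check on the figures above, fps can be converted to pixels per second. This is only a back-of-the-envelope sketch: `pixel_rate` is a hypothetical helper name, and the inputs are the two RTX 3060 results quoted in this post.

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Return encoder throughput in pixels per second."""
    return width * height * fps


# Figures quoted above for the RTX 3060 (Ultra-HD sources).
samples = [
    (4096, 1744, 104),
    (3840, 2160, 93),
]

for width, height, fps in samples:
    mpx_per_s = pixel_rate(width, height, fps) / 1e6
    print(f"{width}x{height} @ {fps} fps -> {mpx_per_s:.1f} Mpx/s")
```

Both results land in the same range (roughly 743 and 771 Mpx/s), which matches the expectation that fps drops roughly in proportion to frame area.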
#14
Link to the debug file: https://uptobox.com/qhklaejpoo1u

I don't use Vapoursynth or Avisynth

I know that the size of the video decreases the fps for encoding; what I don't understand is why the RTX is slower than the GTX with exactly the same settings.
#15
(06.01.2022, 21:12)mimile Wrote: Link to the debug file: https://uptobox.com/qhklaejpoo1u

I don't use Vapoursynth or Avisynth

I know that the size of the video decreases the fps for encoding; what I don't understand is why the RTX is slower than the GTX with exactly the same settings.

Hybrid uses ffmpeg to send the video frames to NVEnc, so it is difficult to say whether the problem is due to ffmpeg or to NVEnc.
Please run the test again with both cards, with the checkbox "NVEnc->Hardware->Only use encoder" enabled.
This way ffmpeg is not used and NVEnc is the only software involved in the encoding.
#16
(06.01.2022, 21:24)Dan64 Wrote:
(06.01.2022, 21:12)mimile Wrote: Link to the debug file: https://uptobox.com/qhklaejpoo1u

I don't use Vapoursynth or Avisynth

I know that the size of the video decreases the fps for encoding; what I don't understand is why the RTX is slower than the GTX with exactly the same settings.

Hybrid uses ffmpeg to send the video frames to NVEnc, so it is difficult to say whether the problem is due to ffmpeg or to NVEnc.
Please run the test again with both cards, with the checkbox "NVEnc->Hardware->Only use encoder" enabled.
This way ffmpeg is not used and NVEnc is the only software involved in the encoding.

This is what I just did; checking the box did not change anything with the RTX.
#17
Looking at the debug output:
The decoding call is:
"C:\Program Files\Hybrid\64bit\ffmpeg.exe" -y -loglevel fatal -noautorotate -nostdin -threads 9 -i "J:\Download 10 To\Film 4K UHD\The.Protege.2021.2160p.UHD-001.mkv" -map 0:0 -an -sn -vf zscale=rangein=tv:range=tv -pix_fmt yuv420p10le -strict -1 -vsync 0 -f yuv4mpegpipe -
the encoding call is:
"C:\Program Files\Hybrid\64bit\NVEncC.exe" --y4m -i - --fps 23.976 --codec h265 --profile main10 --level 5.1 --tier high --sar 1:1 --lookahead 32 --output-depth 10 --vbrhq 19769 --max-bitrate 10000 --gop-len 0 --ref 3 --bframes 0 --no-b-adapt --mv-precision Q-pel --preset quality --colorrange limited --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020c --max-cll 1000,923 --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) --cuda-schedule sync --psnr --ssim --output "J:\Download 10 To\Film 4K UHD\The.Prote test 1_2022-01-06@19_41_20_0610_02.265"
NVEncC reports:
OS Version     Windows 8 x64 (9200) [UTF-8]
CPU            Intel Xeon(R) E5-2697 v3 @ 2.60GHz [TB: 3.40GHz] (14C/28T)
GPU            #0: NVIDIA GeForce RTX 3060 Ti (4864 cores, 1695 MHz)[PCIe3x16][497.29]
NVENC / CUDA   NVENC API 11.1, CUDA 11.5, schedule mode: sync
Input Buffers  CUDA, 41 frames
Input Info     y4m(yv12(10bit))->p010 [AVX2], 3840x2160, 24000/1001 fps
Vpp Filters    copyHtoD
               ssim psnr (yv12(10bit))
Output Info    H.265/HEVC main10 @ Level 5.1
               3840x2160p 1:1 23.976fps (24000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      2pass-full
Bitrate        19769 kbps (Max: 10000 kbps)
Target Quality auto
Initial QP     I:20  P:23  B:25
QP Offset      cb:0  cr:0
VBV buf size   auto
Lookahead      on, 32 frames, Adaptive I Insert
GOP length     240 frames
B frames       0 frames [ref mode: disabled]
Ref frames     3 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
VUI            matrix:bt2020c,colorprim:bt2020,transfer:smpte2084,range:limited
MasteringDisp  G(0.265000 0.690000) B(0.150000 0.060000) R(0.680000 0.320000)
               WP(0.312700 0.329000) L(1000.000000 0.000100)
MaxCLL/MaxFALL 1000/923
Others         mv:Q-pel repeat-headers
and
NVEnc output: 10541 frames: 21.38 fps, 9682 kb/s, GPU 7%, VE 100%, VD 5%
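
For comparing runs between the two cards, the summary line above can be picked apart with a few lines of Python. This is a minimal sketch, assuming the line always has the layout shown here; `parse_summary` and `busiest_unit` are hypothetical helper names, not part of Hybrid or NVEncC.

```python
import re

# Pattern for the summary line format shown in this post (an assumption:
# other NVEncC versions may format the line differently).
SUMMARY_RE = re.compile(
    r"(?P<frames>\d+) frames: (?P<fps>[\d.]+) fps, (?P<kbps>\d+) kb/s, "
    r"GPU (?P<gpu>\d+)%, VE (?P<ve>\d+)%, VD (?P<vd>\d+)%"
)


def parse_summary(line: str) -> dict:
    """Extract frame count, fps, bitrate and utilization percentages."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    fields = match.groupdict()
    return {
        "frames": int(fields["frames"]),
        "fps": float(fields["fps"]),
        "kbps": int(fields["kbps"]),
        "gpu": int(fields["gpu"]),
        "ve": int(fields["ve"]),
        "vd": int(fields["vd"]),
    }


def busiest_unit(stats: dict) -> str:
    """Return which of gpu/ve/vd reports the highest utilization."""
    return max(("gpu", "ve", "vd"), key=lambda name: stats[name])


line = "NVEnc output: 10541 frames: 21.38 fps, 9682 kb/s, GPU 7%, VE 100%, VD 5%"
stats = parse_summary(line)
encode_seconds = stats["frames"] / stats["fps"]
print(busiest_unit(stats), round(encode_seconds), "seconds")
```

For this run the video encoder (VE) is the unit at 100% while GPU and VD are almost idle, so the same parse on the GTX 1070 log gives a like-for-like comparison.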

Options to speed up:
a. enable "NVEnc->Hardware->Only use encoder" and recreate your job; changing settings without creating a new job does nothing.
(this way the video decoder chip is used and the decoded content is sent directly to the encoder)
or
b. enable "Config->Input->Decoding->Use gpu for decoding"
(this way the video decoder chip is used through ffmpeg, and the decoded content is then processed and sent to the encoder)

At the moment, decoding is done by the ffmpeg software decoder (so on your CPU).
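
To make the piped setup concrete, here is a hedged Python sketch of how a decode call and an encode call like the ones above are chained. The flags are copied from the debug output in this thread (the hardware-decode flags are the ones that appear in a later decode call in this thread, and the encoder flags are a trimmed subset); the function names and the file names `in.mkv` and `out.265` are hypothetical, and this is not how Hybrid itself is implemented.

```python
import shlex
import subprocess


def build_decode_cmd(ffmpeg: str, source: str, hw_decode: bool = False) -> list:
    """Build the ffmpeg call that writes y4m frames to stdout."""
    cmd = [ffmpeg, "-y", "-loglevel", "fatal", "-noautorotate", "-nostdin"]
    if hw_decode:
        # Flags seen with "Use gpu for decoding" enabled.
        cmd += ["-hwaccel_device", "0", "-hwaccel", "auto", "-threads", "1"]
    else:
        # Software decoding on the CPU.
        cmd += ["-threads", "9"]
    cmd += [
        "-i", source, "-map", "0:0", "-an", "-sn",
        "-vf", "zscale=rangein=tv:range=tv",
        "-pix_fmt", "yuv420p10le", "-strict", "-1", "-vsync", "0",
        "-f", "yuv4mpegpipe", "-",
    ]
    return cmd


def build_encode_cmd(nvencc: str, output: str) -> list:
    """Build an NVEncC call that reads y4m frames from stdin."""
    return [
        nvencc, "--y4m", "-i", "-", "--codec", "h265", "--profile", "main10",
        "--output-depth", "10", "--preset", "quality", "--output", output,
    ]


def run_pipeline(decode_cmd: list, encode_cmd: list) -> int:
    """Pipe the decoder's stdout into the encoder's stdin."""
    decoder = subprocess.Popen(decode_cmd, stdout=subprocess.PIPE)
    encoder = subprocess.Popen(encode_cmd, stdin=decoder.stdout)
    decoder.stdout.close()  # let the decoder see a broken pipe if the encoder exits
    encoder.wait()
    decoder.wait()
    return encoder.returncode


# Print the two commands for inspection without running anything.
print(shlex.join(build_decode_cmd("ffmpeg", "in.mkv", hw_decode=True)))
print(shlex.join(build_encode_cmd("NVEncC", "out.265")))
```

Calling `run_pipeline(...)` with real paths starts both processes. In this piped form every decoded frame passes through system memory, which is the step option a. avoids.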

Cu Selur
#18
I redid the test with your settings and nothing changes: the "video encoder" part is always at 100% during encoding, and the card's fans also run faster. I have the impression that the RTX is restricted. Is the LHR the cause? Normally not...
I may have missed a setting in Windows...
I feel like I bought an RTX for nothing...
I will do one last test with the GTX and send you the debug output to compare.
#19
I tested with the latest NVIDIA drivers (after a reset), and also with the NVIDIA Studio drivers (511.09); it's still the same.

I redid the test with the GTX 1070 to compare the debug files. It turns out that it is slower with the "Use gpu for decoding" option, and without it, it behaves as before.

debug with the "Use gpu for decoding" option:
https://uptobox.com/0r5i6kf0feh0

debug without the "Use gpu for decoding" option:
https://uptobox.com/o5mr38jjc0xi

I don't know what to do with the RTX 3060 Ti ... pff
#20
Decode call:
"C:\Program Files\Hybrid\64bit\ffmpeg.exe" -y -loglevel fatal -noautorotate -nostdin -hwaccel_device 0 -hwaccel auto -threads 1 -i "J:\Download 10 To\Film 4K UHD\The.Protege.2021.2160p.UHD-001.mkv" -map 0:0 -an -sn -vf zscale=rangein=tv:range=tv -pix_fmt yuv420p10le -strict -1 -vsync 0 -f yuv4mpegpipe -
looks fine,
as does the encoding:
"C:\Program Files\Hybrid\64bit\NVEncC.exe" --y4m -i - --fps 23.976 --codec h265 --profile main10 --level 5.1 --tier high --sar 1:1 --lookahead 32 --output-depth 10 --vbrhq 19769 --max-bitrate 10000 --gop-len 0 --ref 3 --bframes 0 --no-b-adapt --mv-precision Q-pel --preset quality --colorrange limited --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020c --max-cll 1000,923 --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) --cuda-schedule sync --output "J:\Download 10 To\4K UHD enc\The.Prote test 1_2022-01-07@00_10_44_6810_02.265"
The only option seems to be to use "NVEnc->Hardware->Only use encoder".

=> please create a debug output, as described in the sticky, that contains the analysis of the source, the part where you change the settings, and the creation of the jobs, and make sure to enable "NVEnc->Hardware->Only use encoder".

Cu Selur