[HELP] Performance: RTX 3060 Ti slower than GTX 1070 - NVEnc
#31
I think you have the same problem as me while encoding: look at "video engine load", it is at 99%, the same as for me. ;)

I'm not trying to benchmark my GPU.
In addition, a benchmark will not necessarily reflect the problem because it appears during encoding.
#32
Video engine load should be (near) full, since you are using the video engine for encoding; a higher value only means more of the chip is in use, so usually it is not something to worry about.
To compare your results, you should both use the same command line and the same source file for re-encoding.

Cu Selur
--- mainly offline 20.-26 of May ---
#33
(08.01.2022, 04:36)mimile Wrote: I think you have the same problem as me while encoding, look at "video engine load" it is 99%, it is the same for me.  Wink

When CUDA is used, the encoding is performed by the NVIDIA Video Engine. The capabilities reported by NVEnc are the capabilities reported by the NVIDIA Video Engine, so it is normal that it is used at 99%.

Moreover, I think a benchmark is the right way to compare your GTX 1070 with your RTX 3060 Ti.

For example, this is the benchmark of my RTX:

[Image: Benchmark-GPU.png]

As you can see, in this case the benchmark score is 96.8%: not the best, but good enough. I have not applied any overclock, which is why it is performing below its potential. For comparison, my old GTX 1070 was reported with a benchmark score of about 78%.
#34
That's right, Selur: in GPU-Z, "video engine load" does not reflect the actual use of the encoder. I did the test with my GTX.

[Image: NgRWtYz]

So, Dan64, I'll ask my question again: is your RTX 3060 at 100% (video encoder in Windows Task Manager) while encoding?

This is my benchmark for comparison:

[Image: tqqQtPh]

That's what I thought: the benchmark is therefore useless and does not reflect my problem. I already know that my RTX is more powerful than the GTX in compute and in games.
So it is not logical that the GTX encodes twice as fast as the RTX with the same parameters, do we agree?

Selur is right, we must compare with the same settings:

[Image: Th9xq04]

I'm using a UHD HDR10 (2160p) video, 7.34 min long and 4.38 GB in size: the first minutes of the film "The Protégé". However, I can't upload it to uptobox...
You can use an equivalent video of your choice.


My command line:

NVEnc --avhw -i -INPUT- --fps 23.976 --codec h265 --profile main10 --level 5.1 --tier high --sar 1:1 --lookahead 32 --output-depth 10 --vbrhq 19769 --max-bitrate 10000 --gop-len 0 --ref 3 --bframes 0 --no-b-adapt --mv-precision Q-pel --preset quality --colorrange limited --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020c --max-cll 1000,923 --master-display G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1) --vpp-resize auto --output-res 3840x2160 --vpp-gauss disabled --cuda-schedule sync --keyfile GENERATED_KEY_FILE --output "J:\Download 10 To\4K UHD enc\The.Prote test 4.265"
#35
Hello,

  I cannot see the pictures that you posted.
  I suggest using the "Big Buck Bunny" demo for the test: https://4kmedia.org/big-buck-bunny-4k-demo/
  But at this point, I think this is an issue for rigaya: https://github.com/rigaya/NVEnc/issues
#36
(09.01.2022, 01:35)Dan64 Wrote: Hello,

  I cannot see the pictures that you posted.
  I suggest to use for the test "Big-Buck Demo" : https://4kmedia.org/big-buck-bunny-4k-demo/
  But at this point I think that is an issue for rigaya: https://github.com/rigaya/NVEnc/issues

Hello,
Right-click on the image and select "Open image in new tab" ;)
Sorry, the image would not load normally.

OK for the test with Big Buck Bunny using my settings: I unchecked "use GPU for decoding" and did not add an audio track.
I adapted the settings a bit in the NVEnc "misc" tab:

[Image: QDwJBn9]

Test result with the GTX:
encoded 38072 frames, 52.49 fps, 8862.06 kbps, 670.35 MB
encode time 0:12:05, CPU: 0.1, GPU: 5.9, VE: 51.7, VD: 24.3

Test result with the RTX:
encoded 38072 frames, 21.51 fps, 8703.42 kbps, 658.35 MB
encode time 0:29:29, CPU: 0.1, GPU: 2.4, VE: 100.0, VD: 9.8
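The two summaries above can be compared mechanically. The sketch below assumes NVEnc always prints its final summary in exactly the two-line layout shown in this post (the regular expression is written for that layout only, not taken from NVEnc's documentation). It extracts the numbers, computes the GTX/RTX speed ratio and checks that frames / fps roughly matches the reported encode time.

```python
import re

# Summary lines copied verbatim from the two test runs above.
GTX_LOG = """encoded 38072 frames, 52.49 fps, 8862.06 kbps, 670.35 MB
encode time 0:12:05, CPU: 0.1, GPU: 5.9, VE: 51.7, VD: 24.3"""

RTX_LOG = """encoded 38072 frames, 21.51 fps, 8703.42 kbps, 658.35 MB
encode time 0:29:29, CPU: 0.1, GPU: 2.4, VE: 100.0, VD: 9.8"""

# Written for the layout shown above only (an assumption, not a documented format).
SUMMARY_RE = re.compile(
    r"encoded (?P<frames>\d+) frames, (?P<fps>[\d.]+) fps, "
    r"(?P<kbps>[\d.]+) kbps, (?P<size_mb>[\d.]+) MB\s+"
    r"encode time (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}), "
    r"CPU: (?P<cpu>[\d.]+), GPU: (?P<gpu>[\d.]+), "
    r"VE: (?P<ve>[\d.]+), VD: (?P<vd>[\d.]+)"
)


def parse_summary(log: str) -> dict:
    """Extract frame count, speed, bitrate, encode time and VE load from a summary."""
    match = SUMMARY_RE.search(log)
    if match is None:
        raise ValueError("no NVEnc summary found in log")
    g = match.groupdict()
    return {
        "frames": int(g["frames"]),
        "fps": float(g["fps"]),
        "kbps": float(g["kbps"]),
        "encode_seconds": int(g["h"]) * 3600 + int(g["m"]) * 60 + int(g["s"]),
        "ve_load": float(g["ve"]),
    }


gtx = parse_summary(GTX_LOG)
rtx = parse_summary(RTX_LOG)
print(f"GTX is {gtx['fps'] / rtx['fps']:.2f}x faster")  # GTX is 2.44x faster

for name, run in (("GTX", gtx), ("RTX", rtx)):
    # frames / fps should agree with the reported wall-clock encode time.
    expected = run["frames"] / run["fps"]
    print(f"{name}: {expected:.0f} s from frames/fps, {run['encode_seconds']} s reported")
```

Both runs encode the same 38072 frames, so the whole difference is in throughput: the RTX run takes about 2.4 times as long, even though it reports VE: 100.0.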

https://github.com/rigaya/NVEnc/issues/280
Another person has the same problem, and rigaya only suggests lowering the quality...
I tested it, and I have to use --preset P5 to go faster than the GTX, but the GPU (video encode) is still at 100%.

I will check with rigaya whether there is another solution.


#37
Hello,

   Rigaya's answer says it all. You need to adjust your settings to match your RTX's capabilities. For example, the RTX is able to encode using adaptive B-frames, while the GTX 1070 is not, and this feature decreases the size with little impact on quality.
  
   I encoded the test movie using your settings and obtained a speed of 20.37 fps (close to your result).

   [Image: Big-Buck-4-K-Base.png]

  As suggested by Rigaya, it is better to adjust the CQP instead of using 2-pass VBR encoding. I found that the following CQP values match your quality: 9-10-13

[Image: Big-Buck-4-K-Base-CQ.png]

  With these settings, the speed increases to 37.41 fps. But, as suggested by Rigaya, it is also necessary to change the preset to "default", as shown in the picture above. Using this preset, the speed increases to 101.63 fps, with a small decrease in PSNR (from 54.242 to 54.231) and a very small increase in SSIM (from 0.997560 to 0.997562).

  But I would never use your CQP values, because they are very close to "placebo" values: this means you get a large file without an improvement in quality noticeable to the human eye. Moreover, I would use the extra speed obtained to enable B-frames, as shown in the picture below.

  [Image: Big-Buck-4-K-Base-Frame.png]

    Using your implied CQP of 9-10-13 with B-frames, the bitrate decreases from 82,237.40 kbps to 61,253.99 kbps: a size reduction of about 25%, with no noticeable decrease in quality (PSNR = 54.03, SSIM = 0.9975) and only a small decrease in speed (92 fps).
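The speed and size figures quoted in this post can be checked with a few lines of arithmetic. This is a minimal sketch; the labels are my own shorthand for the runs described above, and the numbers are copied from the text, not re-measured.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new in percent (negative means a decrease)."""
    return (new - old) / old * 100.0


# Encoding speeds (fps) reported in this post for the Big Buck Bunny test.
speeds = {
    "original VBR settings": 20.37,
    "CQP 9-10-13": 37.41,
    "CQP 9-10-13, preset default": 101.63,
    "CQP 9-10-13 + B-frames": 92.0,
}

baseline = speeds["original VBR settings"]
for label, fps in speeds.items():
    # Speed-up relative to the original VBR settings.
    print(f"{label:30} {fps:7.2f} fps  {fps / baseline:.2f}x")

# Bitrate without and with B-frames (kbps), as quoted above.
print(f"B-frames bitrate change: {pct_change(82_237.40, 61_253.99):.1f}%")  # -25.5%
```

Switching from VBR to CQP with the default preset gives roughly a 5x speed-up, and the B-frames run is still about 4.5x faster than the original settings.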

     Even though the size decreased significantly with B-frames, I would avoid near-"placebo" values. The advantage of CQP is that you don't have to guess the best encoding "size" for a given movie, which depends on its content. Once you have decided on your "acceptable" CQP settings, you can be sure that all movies will be encoded at the same quality. For example, try the CQP values 18-20-25 and see whether there are noticeable differences compared with the near-"placebo" values 9-10-13.
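To try the suggested CQP values with the command line posted earlier in the thread, the VBR options have to be swapped for a constant-QP one. The sketch below uses a hypothetical helper (to_cqp is not part of NVEnc or Hybrid); it assumes NVEnc's --cqp option accepts an I:P:B triple and that each replaced option takes exactly one value. Check the NVEnc option reference for your version before relying on it.

```python
import shlex

# Rate-control related options replaced by this sketch; each is assumed to take one value.
REPLACED = {"--vbrhq", "--max-bitrate", "--cqp", "--preset", "--bframes"}


def to_cqp(args: list[str], qp: tuple[int, int, int],
           preset: str = "default", bframes: int = 0) -> list[str]:
    """Hypothetical helper: drop VBR/preset/B-frame options and append constant-QP ones."""
    out: list[str] = []
    i = 0
    while i < len(args):
        if args[i] in REPLACED:
            i += 2  # skip the option and its value
            continue
        out.append(args[i])
        i += 1
    qp_i, qp_p, qp_b = qp
    out += ["--cqp", f"{qp_i}:{qp_p}:{qp_b}", "--preset", preset,
            "--bframes", str(bframes)]
    return out


# Shortened version of the command line above; input.mkv and out.265 are placeholders.
original = shlex.split(
    "NVEnc --avhw -i input.mkv --codec h265 --vbrhq 19769 --max-bitrate 10000 "
    "--bframes 0 --preset quality --output out.265"
)
print(shlex.join(to_cqp(original, (18, 20, 25), bframes=3)))
# NVEnc --avhw -i input.mkv --codec h265 --output out.265 --cqp 18:20:25 --preset default --bframes 3
```

Working on an argument list rather than on the raw string avoids quoting problems with values such as the --master-display parameter.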

Good luck! :)
#38
Hello,
For now, I don't know how CQP works; I will read up on it.
But I prefer to stay with variable bitrate for my movies in order to "control" the output size: this way, I know I can put 3 or 4 movies on a Blu-ray while making the best use of the disc space. Depending on the content of the film, I adapt its size with values I have already calculated to get close to 18000 kbit/s, sometimes a little less or a little more.
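Planning a VBR target from the disc size is simple arithmetic. A minimal sketch, assuming a nominal 50 GB (BD-50) disc, a hypothetical 3% margin for container and file-system overhead, and ignoring audio tracks; the capacity, margin and runtimes in the example are placeholders, not values from this thread.

```python
def max_avg_kbps(capacity_bytes: float, runtimes_min: list[float],
                 margin: float = 0.03) -> float:
    """Average video bitrate (kbit/s) at which all movies fit on one disc.

    `margin` is a hypothetical reserve for container/file-system overhead;
    audio tracks are ignored.
    """
    usable_bits = capacity_bytes * (1.0 - margin) * 8
    total_seconds = sum(runtimes_min) * 60
    return usable_bits / total_seconds / 1000


def movie_size_gb(kbps: float, runtime_min: float) -> float:
    """Approximate video stream size in GB (10**9 bytes) at a given average bitrate."""
    return kbps * 1000 * runtime_min * 60 / 8 / 1e9


BD50 = 50e9  # nominal dual-layer Blu-ray capacity (assumption)
print(f"{max_avg_kbps(BD50, [120, 110, 130]):.0f} kbit/s for three movies")  # 17963 kbit/s for three movies
print(f"{movie_size_gb(18000, 120):.1f} GB for 2 h at 18000 kbit/s")  # 16.2 GB for 2 h at 18000 kbit/s
```

With CQP, the bitrate (and therefore the size) is an output rather than an input, which is why it does not fit this kind of fixed disc budget as directly as VBR does.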

According to NVIDIA, it is "normal" for the RTX to work differently; here is the table of the new settings to adopt:

https://docs.nvidia.com/video-technologi...ion-guide/

So I will use preset P5, but I still do not know whether it is normal for the "video encoder" to be at 100% during encoding...
#39
Hello,

  I performed some tests to see the impact of the presets on speed and quality. In Hybrid, it is possible to configure the encoder so that it outputs SSIM and PSNR values. Explaining these metrics is outside the scope of this post; in short, they provide a quantitative measure of the encoded quality.

   In the table below, I report some summary statistics of the encoding for each available NVEnc preset (with CQP = 18-20-25).

[Image: Big-Buck-4-K-Presets.png]

   As you can see, presets P1, P2 and P3 are essentially the same, since their bitrate, SSIM and PSNR are equal. Starting from preset P4 (default), the speed, bitrate, SSIM and PSNR start to decrease. But while there is a significant drop in speed between P4 and P7 (quality), the impact on the quality metrics SSIM and PSNR is small.

   The problem with these metrics is that it is difficult to say whether a given figure is good or bad. For this reason, I computed them for two specific CQP values: CQP = 1, which I consider the "placebo" value and which represents the best available encoding quality, and CQP = 50, which represents the worst possible quality. The calculated values are reported in the table below.

[Image: Big-Buck-4-K-Quantizer.png]

    As you can see, with CQP = 1 the bitrate increases to 186,965 kbps, while with CQP = 50 it drops to 567 kbps. Having defined the best and worst values, I calculated the following weighted figures:

W_SSIM = (SSIM - SSIM_worst) / (SSIM_placebo - SSIM_worst)
W_PSNR = (PSNR - PSNR_worst) / (PSNR_placebo - PSNR_worst)
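The two weighting formulas above are the same min-max normalization applied to different metrics. A minimal sketch: weighted_score is a hypothetical name, and the numbers in the usage lines are made up for illustration, since the measured values are only in the posted table images. A value outside the CQP = 50 / CQP = 1 range would give a score below 0% or above 100%.

```python
def weighted_score(value: float, worst: float, placebo: float) -> float:
    """Min-max normalization: 0% at the CQP=50 result, 100% at the CQP=1 result."""
    if placebo == worst:
        raise ValueError("placebo and worst values must differ")
    return (value - worst) / (placebo - worst) * 100.0


# Illustrative numbers only; the measured values are in the posted table image.
w_ssim = weighted_score(value=0.9960, worst=0.9000, placebo=0.9990)
w_psnr = weighted_score(value=50.0, worst=30.0, placebo=55.0)
print(f"W_SSIM = {w_ssim:.1f}%, W_PSNR = {w_psnr:.1f}%")
```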

    These values range between 100% (best available quality) and 0% (worst quality). As you can see, in this case the P4 (default) preset behaves strangely, since its quality metrics are slightly lower than those of the performance presets. In summary, given the above analysis, in my opinion the optimal preset to use with NVEnc and the RTX 3060 is P3.
#40
Quote:in my opinion, the optimal preset to use with NVEnc and RTX 3060 is P3.
Following your data, and since "..P1, P2 and P3 are essentially the same..", you could also say that P1 is optimal for you. :P

As side notes:
  • I hope you are aware that if you did your testing on only one sample, the result is probably only true for that sample. ;)
  • Neither PSNR nor SSIM always aligns with human perception.

Quote: As it is possible to see the presets P1, P2 and P3 are essentially the same since the BitRate, SSIM and PSNR are equal. Starting from the preset P4 (default) the speed, BitRate, SSIM and PSNR start to decrease. But while there is a significant drop in speed between P4 and P7 (quality) there is a little impact on the quality metrics SSIM and PSNR.
Your numbers do not really show that: SSIM changes by 0.0001, PSNR by 0.069, W_SSIM by 0.16% and W_PSNR by 0.4%. These all seem like rather minimal changes.


Cu Selur

