Selur's Little Message Board
[HELP] perf. RTX 3060 ti slower than GTX 1070 - NVEnc



RE: perf. RTX 3060 ti slower than GTX 1070 - NVEnc - Dan64 - 11.01.2022

(11.01.2022, 06:31)Selur Wrote:
Quote:in my opinion, the optimal preset to use with NVEnc and RTX 3060 is P3.
Following your data and since ".. P1, P2 and P3 are essentially the same .." you could also say that P1 is optimal for you.

As side notes:
  • I hope you are aware that if you ran your tests on only one sample, the results probably hold only for that sample.
  • Both PSNR and SSIM don't always align with human perception.

I performed the same tests with other movies. The figures differ for every movie, but what does not change is that the quality and size of the movies encoded with P1, P2 and P3 are the same. I selected P3 in the hope that one day some difference between P1, P2 and P3 will be implemented; in that case P3 is the more conservative choice. Another behavior common to all the encoded movies is that the encoding bitrate is a decreasing function of the preset (i.e. P1 has a higher bitrate than P4, P4 has a higher bitrate than P5, and so on). For some movies I also found that P1, P2 and P3 have not only the highest bitrate but also the best SSIM and PSNR figures. Of course you are free to run your own tests and form your own opinion.


RE: perf. RTX 3060 ti slower than GTX 1070 - NVEnc - mimile - 13.01.2022

Hello,

for testing, I would like to know what the 3 values of the CQP correspond to?

thank you


RE: perf. RTX 3060 ti slower than GTX 1070 - NVEnc - Selur - 13.01.2022

Quantizer used for I-/P-/B-frames; usually it is recommended to set the QP values so that I < P < B.
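
As a small illustration of the I < P < B recommendation, the sketch below parses a CQP triple and checks the ordering. The colon-separated "I:P:B" string format and the names Cqp, parse_cqp and follows_recommendation are assumptions made up for this example; they are not part of Hybrid or NVEnc.

```python
from typing import NamedTuple


class Cqp(NamedTuple):
    """Quantizer values for I-, P- and B-frames (a lower QP compresses less)."""
    i: int
    p: int
    b: int


def parse_cqp(text: str) -> Cqp:
    """Parse a hypothetical 'I:P:B' string such as '20:22:28'."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three values 'I:P:B', got {text!r}")
    return Cqp(*(int(part) for part in parts))


def follows_recommendation(cqp: Cqp) -> bool:
    """True when the values follow the usual I < P < B ordering."""
    return cqp.i < cqp.p < cqp.b


if __name__ == "__main__":
    for candidate in ("20:22:28", "25:23:20"):
        verdict = "ok" if follows_recommendation(parse_cqp(candidate)) else "not I < P < B"
        print(candidate, "->", verdict)
```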

Cu Selur


RE: perf. RTX 3060 ti slower than GTX 1070 - NVEnc - Dan64 - 15.01.2022

Hello,

  you can find a short description of the I, P and B frame types at the following link:

    https://en.wikipedia.org/wiki/Video_compression_picture_types
  
  As you can guess, most encoding standards try to improve the compression and quality of these frames in order to improve the overall compression of the video. Since Rigaya's NVEnc uses the NVIDIA video engine, you can find the features available in the NVIDIA video encoder at the following link:

    https://docs.nvidia.com/video-technologies/video-codec-sdk/nvenc-application-note/
 
  As you can see, the "HEVC B frame" feature is supported only by Ampere and Turing GPUs.

  The rate controls provided by NVIDIA video engine are described at the following link:

    https://docs.nvidia.com/video-technologies/video-codec-sdk/nvenc-video-encoder-api-prog-guide/#rate-control
  
   Rigaya's NVEnc uses the name "Constant Quality" for the rate control that NVIDIA calls "Target Quality"; it is similar to the CRF (constant rate factor) mode available in x265. The NVEnc "Constant Quantizer" is the rate control that NVIDIA calls "Constant QP". In this case all frames of the same type are encoded using the same "quantization"; you can find a description of this video quality parameter at the following link: https://en.wikipedia.org/wiki/Quantization_(image_processing)
   In NVEnc you can define a quantization value for each frame type: I, P and B. Unfortunately there is no simple rule that maps the Target Quality to these quantization values.

   As explained at the following link: https://slhck.info/video/2017/02/24/crf-guide.html
  
   CQP is a “constant quality” encoding mode, as opposed to constant bitrate (CBR). Typically you would achieve constant quality by compressing every frame of the same type the same amount, that is, throwing away the same (relative) amount of information. In tech terminology, you maintain a constant QP (quantization parameter). The quantization parameter defines how much information to discard from a given block of pixels (a Macroblock). This typically leads to a hugely varying bitrate over the entire sequence. The Constant Rate Factor (CRF) is a little more sophisticated than that. It will compress different frames by different amounts, thus varying the QP as necessary to maintain a certain level of perceived quality. It does this by taking motion into account. A constant QP encode at QP=18 will stay at QP=18 regardless of the frame (there is some small offset for different frame types, but it is negligible here). Constant Rate Factor at CRF=18 will increase the QP to, say, 20, for high motion frames (compressing them more) and lower it down to 16 for low motion parts of the sequence. This will essentially change the bitrate allocation over time.

   
   Constant QP is very good to evaluate new tools, and that’s how MPEG evaluate coding tools for new standards - they don’t want to take into account your rate control allocation algorithm. CRF maintains quality by varying bit rate as needed.
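
To make the quoted QP=18 / CRF=18 example concrete, here is a toy sketch. It is only an illustration: the linear motion-to-QP mapping, the motion scores and the function names are assumptions, and this is not how x265 or NVEnc actually compute their quantizers.

```python
from __future__ import annotations


def cqp_schedule(base_qp: int, motion: list[float]) -> list[int]:
    """Constant QP: every frame gets the same quantizer, whatever its content."""
    return [base_qp for _ in motion]


def crf_like_schedule(base_qp: int, motion: list[float], swing: int = 2) -> list[int]:
    """Toy CRF-like rule: shift the QP by up to +/- swing depending on motion.

    Motion scores are assumed to lie in [0, 1], with 0.5 treated as average.
    High motion raises the QP (more compression), low motion lowers it.
    """
    return [base_qp + round(swing * (2.0 * m - 1.0)) for m in motion]


if __name__ == "__main__":
    motion = [0.0, 0.5, 1.0]  # hypothetical low, average and high motion frames
    print("CQP     :", cqp_schedule(18, motion))
    print("CRF-like:", crf_like_schedule(18, motion))
```

With these made-up motion scores the constant-QP schedule stays at 18 for every frame, while the CRF-like schedule moves between 16 and 20, as in the quoted description.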


RE: perf. RTX 3060 ti slower than GTX 1070 - NVEnc - Dan64 - 15.01.2022

To provide an example of the difference between CQP and CRF, I encoded the movie "Big Buck Bunny" at 1080p using the x265 encoder with CRF=22, which is the suggested setting for HD movies. The log below shows the output quality parameters calculated by x265.

[13:53:05] work: average encoding speed for job is 58.058235 fps
[13:53:05] comb detect: heavy 4 | light 561 | uncombed 13750 | total 14315
[13:53:05] decomb: deinterlaced 4 | blended 561 | unfiltered 13750 | total 14315
[13:53:05] vfr: 14315 frames output, 0 dropped and 0 duped for CFR/PFR
[13:53:05] vfr: lost time: 0 (0 frames)
[13:53:05] vfr: gained time: 0 (0 frames) (0 not accounted for)
[13:53:05] hevc-decoder done: 14315 frames, 0 decoder errors
[13:53:05] sync: got 14315 frames, 14315 expected
[13:53:05] sync: framerate min 24.000 fps, max 24.000 fps, avg 24.000 fps
x265 [info]: frame I:    152, Avg QP:19.95  kb/s: 30377.15
x265 [info]: frame P:   2865, Avg QP:22.34  kb/s: 7530.22
x265 [info]: frame B:  11298, Avg QP:28.35  kb/s: 1030.20
x265 [info]: Weighted P-Frames: Y:4.7% UV:3.4%
x265 [info]: consecutive B-frames: 5.1% 0.6% 0.8% 1.7% 91.8%
encoded 14315 frames in 247.50s (57.84 fps), 2642.73 kb/s, Avg QP:27.06
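
The summary line of this log can be cross-checked from the three per-frame-type lines. The sketch below assumes that the overall Avg QP and bitrate are frame-count-weighted means of the per-type values; that assumption is mine, but it reproduces the logged 27.06 and 2642.73 kb/s to within rounding.

```python
# (frame count, Avg QP, kb/s) per frame type, copied from the x265 log.
X265_STATS = {
    "I": (152, 19.95, 30377.15),
    "P": (2865, 22.34, 7530.22),
    "B": (11298, 28.35, 1030.20),
}


def weighted_average(stats, column):
    """Frame-count-weighted mean of one column (1 = Avg QP, 2 = kb/s)."""
    total_frames = sum(row[0] for row in stats.values())
    return sum(row[0] * row[column] for row in stats.values()) / total_frames


if __name__ == "__main__":
    print(f"Avg QP  : {weighted_average(X265_STATS, 1):.2f}")
    print(f"Bitrate : {weighted_average(X265_STATS, 2):.2f} kb/s")
```

The B-frames make up about 79% of the frames, which is why the overall Avg QP sits so close to the B-frame value of 28.35.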
 
   As you can see, even though I set CRF=22, the Avg QP of the movie is 27.06, with an average bitrate of 2642.73 kb/s. So the average quantizer of the movie is well above the value of 22 set for the encoding. For comparison I encoded the same movie with NVEnc using a "constant quality" of 27 (near the Avg QP). The log is below.

Output Info    H.265/HEVC main10 @ Level auto
1920x1080p 1:1 24.000fps (24/1fps)
Encoder Preset default
Rate Control   VBR
Multipass      2pass-full
Bitrate        0 kbps (Max: 240000 kbps)
Target Quality 27.00
Initial QP     I:20  P:23  B:25
QP Offset      cb:0  cr:0
VBV buf size   auto
Lookahead      on, 16 frames, Adaptive I, B Insert
GOP length     240 frames
B frames       4 frames [ref mode: each]
Ref frames     4 frames, MultiRef L0:auto L1:auto
AQ             on
CU max / min   auto / auto
VUI            matrix:bt709
Others         mv:Q-pel nonrefp
encoded 14315 frames, 300.17 fps, 2788.63 kbps, 198.28 MB
encode time 0:00:47, CPU: 3.7, GPU: 42.4, VE: 95.1, VD: 85.9, GPUClock: 1878MHz, VEClock: 1657MHz
frame type IDR   137
frame type I     137,  total size   16.71 MB
frame type P    3322,  total size  104.46 MB
frame type B   10856,  total size   77.11 MB
ssim/psnr/vmaf: SSIM YUV: 0.993079 (21.598220), 0.992116 (21.032410), 0.993133 (21.632587), All: 0.992927 (21.504229), (Frames: 14315)
ssim/psnr/vmaf: PSNR YUV: 45.265175, 48.234129, 49.389327, Avg: 46.152116, (Frames: 14315)

  The average bitrate in this case is 2788.63 kbps. Unfortunately the Avg QP is not available, but SSIM and PSNR are.
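
  A note on reading the SSIM line: the numbers in parentheses look consistent with SSIM expressed in dB, and the "All" value looks consistent with a 4:1:1 weighting of the Y, U and V planes. Both relations are inferred from the logged values, not taken from the NVEnc documentation, so treat the sketch below as an assumption that happens to match this log.

```python
import math


def ssim_to_db(ssim: float) -> float:
    """Express SSIM in decibels: -10 * log10(1 - SSIM)."""
    return -10.0 * math.log10(1.0 - ssim)


def weighted_yuv(y: float, u: float, v: float) -> float:
    """Combine per-plane scores with a 4:1:1 weighting (luma counts four times)."""
    return (4.0 * y + u + v) / 6.0


if __name__ == "__main__":
    y, u, v = 0.993079, 0.992116, 0.993133  # SSIM YUV from the constant quality log
    print(f"Y in dB : {ssim_to_db(y):.3f}")
    print(f"U in dB : {ssim_to_db(u):.3f}")
    print(f"All     : {weighted_yuv(y, u, v):.6f}")
```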
  Just to check whether I can match the Avg QP of x265, I encoded the movie again using the "constant quantizer" with CQP: 20-22-28, which is near the per-frame-type Avg QP values calculated by x265. The encoding output is the following.

Output Info    H.265/HEVC main10 @ Level auto
1920x1080p 1:1 24.000fps (24/1fps)
Encoder Preset default
Rate Control   CQP  I:20  P:22  B:28
ChromaQPOffset cb:0  cr:0
Lookahead      on, 16 frames, Adaptive I, B Insert
GOP length     240 frames
B frames       4 frames [ref mode: each]
Ref frames     4 frames, MultiRef L0:auto L1:auto
AQ             off
CU max / min   auto / auto
VUI            matrix:bt709
Others         mv:Q-pel nonrefp
encoded 14315 frames, 321.76 fps, 2600.68 kbps, 184.92 MB
encode time 0:00:44, CPU: 3.4, GPU: 42.6, VE: 90.6, VD: 93.5, GPUClock: 1860MHz, VEClock: 1645MHz
frame type IDR   137
frame type I     137,  total size   19.77 MB
frame type P    3322,  total size  112.50 MB
frame type B   10856,  total size   52.65 MB
ssim/psnr/vmaf: SSIM YUV: 0.992313 (21.142605), 0.992405 (21.194449), 0.993497 (21.869194), All: 0.992526 (21.264396), (Frames: 14315)
ssim/psnr/vmaf: PSNR YUV: 45.414124, 48.511711, 49.641120, Avg: 46.322234, (Frames: 14315)

 Now the average bitrate is 2600.68 kbps, which is near the 2642.73 kb/s obtained with x265 and CRF=22. The problem is that it is not possible to know the optimal CQP values in advance, so using "constant quality" (aka CRF) with NVEnc is probably the best option.
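
 The two NVEnc logs also show how differently the two rate controls spend their bits. The sketch below only re-uses the "total size" values from the logs; the dictionary layout and the function name are my own choices for this example.

```python
# Total size in MB per frame type, copied from the two NVEnc logs.
SIZES_MB = {
    "constant quality 27": {"I": 16.71, "P": 104.46, "B": 77.11},
    "CQP 20-22-28": {"I": 19.77, "P": 112.50, "B": 52.65},
}


def size_shares(sizes):
    """Return each frame type's share of the total encoded size, in percent."""
    total = sum(sizes.values())
    return {frame_type: 100.0 * size / total for frame_type, size in sizes.items()}


if __name__ == "__main__":
    for mode, sizes in SIZES_MB.items():
        shares = size_shares(sizes)
        summary = ", ".join(f"{t}: {s:.1f}%" for t, s in shares.items())
        print(f"{mode:>20} -> {summary}")
```

 With these numbers the B-frames take roughly 39% of the file in "constant quality" mode but only about 28% with CQP 20-22-28, even though the frame counts per type are identical.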


RE: perf. RTX 3060 ti slower than GTX 1070 - NVEnc - Dan64 - 15.01.2022

Just to complete the comparison, in the chart below you can find the value of Netflix VMAF for the movies encoded with x265 (CRF=22) and NVenc with CQ=27 (aka CRF=27).

   [Image: VMAF-CRF22-vs-CQ27-small.jpg]
   I think that VMAF is the best metric for comparing rate controls that try to improve the perceived quality. The VMAF of 95.28 obtained by NVEnc is very near the VMAF of 95.31 obtained with the x265 software encoder. So at the cost of an increase of about 5% in bitrate, NVEnc is able to encode at about 5x the speed of x265 with comparable quality. Moreover, the Constant Quality (aka CRF) value of NVEnc is very near the average "quantization", while the CRF used by x265 does not seem to be linked to the final average "quantization" of the encoded movie. As a rule of thumb, the CRF used by x265 has to be increased by about 20% to obtain the equivalent CQ value for NVEnc.
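
   The ratios behind these round figures can be recomputed from the two logs. This is just the arithmetic on the logged fps, bitrate and quality settings; the variable names are mine, and the "about 20%" rule rests on this single movie.

```python
# Values copied from the x265 (CRF=22) and NVEnc (CQ=27) logs.
X265 = {"fps": 57.84, "kbps": 2642.73, "setting": 22.0}
NVENC = {"fps": 300.17, "kbps": 2788.63, "setting": 27.0}


def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return 100.0 * (new - old) / old


speedup = NVENC["fps"] / X265["fps"]
bitrate_increase = percent_increase(NVENC["kbps"], X265["kbps"])
setting_increase = percent_increase(NVENC["setting"], X265["setting"])

if __name__ == "__main__":
    print(f"speed-up         : {speedup:.2f}x")
    print(f"bitrate increase : {bitrate_increase:.1f}%")
    print(f"CQ vs CRF        : +{setting_increase:.1f}%")
```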

    Another interesting chart is the following

[Image: VMAF-CQP-vs-CQ-small.jpg]

   In this case I compared the movie encoded with NVEnc using CQP:20-22-28 with the movie encoded using Constant Quality (aka CRF) = 27.8. I used a CQ of 27.8 because it gives almost the same bitrate as CQP:20-22-28. The VMAF obtained using the "constant quantizer" is 94.2, while the VMAF obtained with "constant quality" is 95.0. So although the size of the encoded movies is almost the same, the quality obtained with "constant quality" is better. The reason is that CQP uses the same "quantization" for all frames of a given type, while "constant quality" uses a better algorithm that is able to increase the quality on frames where the details are more visible (bright frames) and lower it where they are less visible (dark frames); in this way this rate control allocates the "quantization" better. It is a kind of 2-pass VBR encoding where the target is not the size but the overall "quantization", which is probably why NVIDIA called this rate control "Target Quality".