mogobime · 05.01.2023, 03:23

A few comparisons between KNLMeansCL (strength increased by 50%, from 1.2 to 1.8) and FFT3DGPU (sigma reduced by 35%, from 2.0 to 1.3), both filtering YUV:

1080p:
KNLMeansCL: 27.14 fps, 7312.13 kbps
FFT3DGPU mode=2 precision=0: 57.18 fps (111% faster), 6618.81 kbps (-9.5%)

2160p (downscaled with Lanczos to 1080p after filtering):
KNLMeansCL: 7.49 fps, 7143.20 kbps
ConvertBits(10) FFT3DGPU mode=2 precision=2: 11.98 fps (60% faster), 6369.00 kbps (-10.8%)
ConvertBits(10) FFT3DGPU mode=1 precision=2: 15.47 fps (107% faster), 6495.53 kbps (-9.1%)
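
For anyone who wants to re-check the percentages, here is a minimal Python sketch that recomputes them from the fps and kbps values listed above. The helper name `pct_change` and the run labels are only illustrative; each FFT3DGPU run is compared against the KNLMeansCL run at the same source resolution.

```python
def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100.0

# (fps, kbps) pairs copied from the results above:
# label, FFT3DGPU run, KNLMeansCL reference run
runs = [
    ("1080p mode=2 precision=0", (57.18, 6618.81), (27.14, 7312.13)),
    ("2160p mode=2 precision=2", (11.98, 6369.00), (7.49, 7143.20)),
    ("2160p mode=1 precision=2", (15.47, 6495.53), (7.49, 7143.20)),
]

for label, (fps, kbps), (ref_fps, ref_kbps) in runs:
    print(f"{label}: speed {pct_change(fps, ref_fps):+.1f}%, "
          f"bitrate {pct_change(kbps, ref_kbps):+.1f}%")
```

The computed values agree with the rounded figures quoted in the results.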

All the tunings described earlier were applied to FFT3DGPU: ConvertBits(10) at 4K, bw=bh=64, wintype=2 and Prefetch(1,7).

Encoder settings:

NVEncC (x64) 7.06 (r2388) by rigaya, Dec 10 2022 12:26:56 (VC 1929/Win)
OS Version     Windows 10 x64 (19043) [UTF-8]
CPU            AMD FX(tm)-8350 Eight-Core Processor [4.54GHz] (4C/8T)
GPU            #0: NVIDIA RTX A2000 (3328 cores, 1200 MHz)[PCIe2x16][527.27]
NVENC / CUDA   NVENC API 12.0, CUDA 12.0, schedule mode: sync
Input Buffers  CUDA, 46 frames
Input Info     y4m(yv12(10bit))->p010 [SSE2], 1920x1080, 60000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level 6.2
               1920x1080p 1:1 59.940fps (60000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 768000 kbps)
Target Quality 25.75
Initial QP     I:20  P:23  B:25
QP Offset      cb:0  cr:0
VBV buf size   auto
Lookahead      on, 32 frames, Adaptive I, B Insert
GOP length     600 frames
B frames       5 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:6 L1:2
AQ             on
CU max / min   auto / auto
VUI            matrix:bt709,range:limited
Others         mv:Q-pel

For comparison, some suboptimal FFT3DGPU default settings:
2160p:
FFT3DGPU untuned defaults, with banding (no additional Prefetch(1,7), no ConvertBits(10)); fastest setting without picture errors: mode=1, precision=1, bw=bh=32, wintype=1:
12.23 fps, 6451.48 kbps (the tuned mode=1 run is 26.5% faster)

FFT3DGPU untuned defaults, with banding (no additional Prefetch(1,7), no ConvertBits(10)); fastest setting without picture errors: mode=1, precision=2, bw=bh=32, wintype=1 (precision=2 does not reduce banding; only precision=0 does, but it causes grid errors):
11.94 fps, 6456.07 kbps (the tuned mode=1 run with ConvertBits(10), Prefetch(1,7) and bw=bh=64 is 29.6% faster)

1080p:
FFT3DGPU untuned, mode=2, precision=0 (no additional Prefetch(1,7), bw=bh=32, wintype=1):
29.66 fps, 4372.59 kbps (the tuned mode=2 run with Prefetch(1,7) and bw=bh=64 is 92.8% faster)

-> Prefetch becomes more important when other multithreaded filters that need more resources, such as a simple resizer, follow FFT3DGPU.
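
The "faster" percentages for the tuned versus untuned FFT3DGPU runs can be re-derived the same way. This is only a small Python sketch using the fps values from this post; the function name `speedup_pct` and the labels are illustrative.

```python
def speedup_pct(tuned_fps, untuned_fps):
    """How much faster the tuned run is than the untuned run, in percent."""
    return (tuned_fps - untuned_fps) / untuned_fps * 100.0

# (label, tuned fps, untuned fps), all values copied from this post
comparisons = [
    ("2160p mode=1, untuned precision=1", 15.47, 12.23),
    ("2160p mode=1, untuned precision=2", 15.47, 11.94),
    ("1080p mode=2, precision=0", 57.18, 29.66),
]

for label, tuned, untuned in comparisons:
    print(f"{label}: tuned run is {speedup_pct(tuned, untuned):.1f}% faster")
```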
