Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff?
#2
A few comparisons between KNLMeansCL (strength increased by 50%, from 1.2 to 1.8) and FFT3DGPU (sigma reduced by 35%, from 2.0 to 1.3), both filtering YUV:

1080p:
KNLMeansCL: 27.14 fps, 7312.13 kbps
FFT3DGPU mode=2 precision=0: 57.18 fps (111% faster), 6618.81 kbps (-9.5%)

2160p (downscaled to 1080p with Lanczos after filtering):
KNLMeansCL: 7.49 fps, 7143.20 kbps
ConvertBits(10) FFT3DGPU mode=2 precision=2: 11.98 fps (60% faster), 6369.00 kbps (-10.8%)
ConvertBits(10) FFT3DGPU mode=1 precision=2: 15.47 fps (107% faster), 6495.53 kbps (-9.1%)
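
For anyone who wants to check the percentages, here is a small Python sketch that recomputes them from the fps and kbps figures above. The function and variable names are my own and purely illustrative; only the numbers come from the measurements in this post.

```python
def percent_change(value, baseline):
    """Relative change of value versus baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# (fps, kbps) pairs copied from the results above; the dict layout is illustrative.
RESULTS = {
    "1080p": {
        "baseline": (27.14, 7312.13),  # KNLMeansCL
        "FFT3DGPU mode=2 precision=0": (57.18, 6618.81),
    },
    "2160p": {
        "baseline": (7.49, 7143.20),  # KNLMeansCL
        "FFT3DGPU mode=2 precision=2": (11.98, 6369.00),
        "FFT3DGPU mode=1 precision=2": (15.47, 6495.53),
    },
}


def compare(results):
    """Return (resolution, run name, fps change %, bitrate change %) per run."""
    rows = []
    for resolution, runs in results.items():
        base_fps, base_kbps = runs["baseline"]
        for name, (fps, kbps) in runs.items():
            if name == "baseline":
                continue  # the baseline is not compared against itself
            rows.append((
                resolution,
                name,
                percent_change(fps, base_fps),
                percent_change(kbps, base_kbps),
            ))
    return rows


for resolution, name, d_fps, d_kbps in compare(RESULTS):
    print(f"{resolution} {name}: {d_fps:+.1f}% fps, {d_kbps:+.1f}% kbps")
```

The speed figures are relative to the KNLMeansCL run at the same resolution, so "faster" here means the fps gain over that baseline, and the bitrate change is the difference in output size at the same encoder settings.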

All of the tunings described were applied to FFT3DGPU: ConvertBits(10) at 4K, bw=bh=64, wintype=2 and Prefetch(1,7).

Encoder settings:
NVEncC (x64) 7.06 (r2388) by rigaya, Dec 10 2022 12:26:56 (VC 1929/Win)
OS Version     Windows 10 x64 (19043) [UTF-8]
CPU            AMD FX(tm)-8350 Eight-Core Processor [4.54GHz] (4C/8T)
GPU            #0: NVIDIA RTX A2000 (3328 cores, 1200 MHz)[PCIe2x16][527.27]
NVENC / CUDA   NVENC API 12.0, CUDA 12.0, schedule mode: sync
Input Buffers  CUDA, 46 frames
Input Info     y4m(yv12(10bit))->p010 [SSE2], 1920x1080, 60000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level 6.2 1920x1080p 1:1 59.940fps (60000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 768000 kbps)
Target Quality 25.75
Initial QP     I:20  P:23  B:25
QP Offset      cb:0  cr:0
VBV buf size   auto
Lookahead      on, 32 frames, Adaptive I, B Insert
GOP length     600 frames
B frames       5 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:6 L1:2
AQ             on
CU max / min   auto / auto
VUI            matrix:bt709,range:limited
Others         mv:Q-pel

Some suboptimal FFT3DGPU default settings:
2160p:
FFT3DGPU untuned default with banding (no additional Prefetch(1,7), no ConvertBits(10)), fastest setting without picture errors: mode=1, precision=1, bw=bh=32, wintype=1:
12.23 fps, 6451.48 kbps (the improved mode 1 is 26.5% faster)

FFT3DGPU untuned default with banding (no additional Prefetch(1,7), no ConvertBits(10)), fastest setting without picture errors: mode=1, precision=2, bw=bh=32, wintype=1 (precision=2 doesn't reduce banding; only precision=0 does, but that causes grid errors):
11.94 fps, 6456.07 kbps (the improved mode 1 with ConvertBits(10), Prefetch(1,7) and bw=bh=64 is 29.6% faster)

1080p:
FFT3DGPU untuned, mode=2, precision=0 (no additional Prefetch(1,7), bw=bh=32, wintype=1):
29.66 fps, 4372.59 kbps (the improved mode 2 with Prefetch(1,7) and bw=bh=64 is 92.8% faster)

Prefetch becomes more important when FFT3DGPU is followed by other multithreaded filters that need more resources, such as a simple resizer.
Messages In This Thread
RE: Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff? - by mogobime - 05.01.2023, 03:23