
Selur's Little Message Board
Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff?




Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff? - mogobime - 04.01.2023

Hi,

unfortunately, the only FFT3DGPU configuration that produces almost no banding is mode=2 with precision=0 (plus a border of 2-4, ideally wintype=2, and sharpening turned off).
As you know, precision=0 does not work properly with 4K material in any mode and creates grid artifacts that destroy the image. Therefore you have to live with the banding of precision=1 or precision=2, which gets worse with sigma values above 1.5.

However, I have found a fairly good solution to the problem. Apparently the 32-bit float processing only handles 10-bit input well, so simply putting "ConvertBits(10)" in front of the call is enough to fight the banding. This works in mode 1 as well as in mode 2.

Even if you convert back to 8 bit afterwards, the banding is reduced; if you encode in 10 bit, it is practically eradicated. You get remarkably good quality and significantly better compressibility, even with low sigma values like 1.3, which cause hardly any loss of sharpness.
With 4K material the speed loss is only about 10-15% compared to plain precision=2 (without ConvertBits(10)), which is tolerable. With material up to WQHD (1440p) the loss is somewhat higher, but there, as mentioned, mode=2 works with precision=0, which also produces hardly any banding.
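
As a rough illustration of why a higher working bit depth helps against banding, the Python sketch below quantizes a gentle gradient to 8 and 10 bit and counts how many distinct levels survive. The test signal (a ramp from 40% to 42% brightness over 1000 pixels) is an assumption, and the sketch models plain rounding only, not FFT3DGPU's internal float path, so it shows the general effect rather than explaining the filter itself.

```python
def quantize(values, bits):
    """Round normalized [0, 1] samples to the nearest code of a `bits`-deep integer scale."""
    max_code = (1 << bits) - 1
    return [round(v * max_code) for v in values]


def distinct_levels(values, bits):
    """Number of different codes left after quantization; fewer levels means wider, more visible bands."""
    return len(set(quantize(values, bits)))


# Assumed test signal: a smooth, sky-like ramp from 40% to 42% brightness across 1000 pixels.
gradient = [0.40 + 0.02 * i / 999 for i in range(1000)]

for bits in (8, 10):
    levels = distinct_levels(gradient, bits)
    print(f"{bits}-bit: {levels} levels, bands ~{len(gradient) / levels:.0f} px wide")
```

At 8 bit the ramp collapses into 6 flat steps of roughly 167 pixels each; at 10 bit it keeps 22 steps of about 45 pixels, so each step is smaller and less likely to be seen as a band.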

1.) I think an "HQ 10 bit 4K filtering" option (or similar), with an explanation of the grid artifact and banding problem, would be a good idea.
2.) An option to insert Prefetch(1,2) up to Prefetch(1,7) right after the FFT3DGPU call would also boost speed significantly in many cases, for example when the script ends with a Prefetch of 2-5 threads.
3.) Also, bw=bh=64 would be a better default for the banding-free mode 2. On my system it increases speed by almost 100% at 1080p and by 500% at 4K with ConvertBits(10)! In general, both mode 1 and mode 2 seem to benefit from bw=bh=64. Maybe this could be realized with an "auto" setting that switches to bw=bh=64 when material larger than WQHD is detected, or whenever mode 2 is active (it doesn't necessarily have to be the default).
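
The "auto" idea from points 1 and 3 could be expressed as a simple decision rule. The Python sketch below is purely hypothetical: the function name, the pixel-count threshold for "larger than WQHD" and the returned keys are illustrative and are not Hybrid options. It only encodes the suggestions above (ConvertBits(10) above WQHD, 64x64 blocks above WQHD or whenever mode 2 is used, otherwise the 32x32 default).

```python
WQHD_PIXELS = 2560 * 1440  # assumed threshold: "larger than WQHD" measured by pixel count


def suggest_fft3dgpu_settings(width: int, height: int, mode: int) -> dict:
    """Hypothetical 'auto' rule derived from the suggestions in this post, not from Hybrid code."""
    above_wqhd = width * height > WQHD_PIXELS
    block = 64 if above_wqhd or mode == 2 else 32  # 32 is the untuned default block size
    return {
        "convert_to_10_bit": above_wqhd,  # i.e. ConvertBits(10) before the FFT3DGPU call
        "bw": block,
        "bh": block,
    }


print(suggest_fft3dgpu_settings(3840, 2160, mode=1))
print(suggest_fft3dgpu_settings(1920, 1080, mode=2))
print(suggest_fft3dgpu_settings(1920, 1080, mode=1))
```

Keeping the rule as a pure function of resolution and mode makes it easy to adjust the threshold later without touching anything else.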

With these tricks, FFT3DGPU achieves at least the quality of KNLMeansCL at significantly higher speed and with better compressibility of the video. It would be a shame if most users missed out on this!


RE: Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff? - mogobime - 05.01.2023

A few comparisons between KNLMeansCL (strength increased by 50%, from 1.2 to 1.8) and FFT3DGPU (sigma reduced by 35%, from 2.0 to 1.3), both filtering YUV:

1080p:
KNLMeansCL: 27.14 fps, 7312.13 kbps
FFT3DGPU mode=2 precision=0: 57.18 fps (111% faster), 6618.81 kbps (-9.5%)

2160p (downscaled with Lanczos to 1080p after filtering):
KNLMeansCL: 7.49 fps, 7143.20 kbps
ConvertBits(10) + FFT3DGPU mode=2 precision=2: 11.98 fps (60% faster), 6369.00 kbps (-10.8%)
ConvertBits(10) + FFT3DGPU mode=1 precision=2: 15.47 fps (107% faster), 6495.53 kbps (-9.1%)

All described tunings were applied to FFT3DGPU (ConvertBits(10) at 4K, bw=bh=64, wintype=2 and Prefetch(1,7)).
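
The relative figures above follow from the raw numbers as (new / baseline - 1) x 100. A small Python check that uses only the fps and kbps values from this post:

```python
def change_pct(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0


# (label, FFT3DGPU fps, KNLMeansCL fps, FFT3DGPU kbps, KNLMeansCL kbps), values as posted
results = [
    ("1080p mode=2 precision=0", 57.18, 27.14, 6618.81, 7312.13),
    ("2160p mode=2 precision=2", 11.98, 7.49, 6369.00, 7143.20),
    ("2160p mode=1 precision=2", 15.47, 7.49, 6495.53, 7143.20),
]

for label, fps, ref_fps, kbps, ref_kbps in results:
    print(f"{label}: speed {change_pct(fps, ref_fps):+.1f}%, bitrate {change_pct(kbps, ref_kbps):+.1f}%")
```

Rounded, this reproduces the posted +111% / -9.5%, +60% / -10.8% and +107% / -9.1%.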

Encoder settings:
NVEncC (x64) 7.06 (r2388) by rigaya, Dec 10 2022 12:26:56 (VC 1929/Win)
OS Version     Windows 10 x64 (19043) [UTF-8]
CPU            AMD FX(tm)-8350 Eight-Core Processor [4.54GHz] (4C/8T)
GPU            #0: NVIDIA RTX A2000 (3328 cores, 1200 MHz)[PCIe2x16][527.27]
NVENC / CUDA   NVENC API 12.0, CUDA 12.0, schedule mode: sync
Input Buffers  CUDA, 46 frames
Input Info     y4m(yv12(10bit))->p010 [SSE2], 1920x1080, 60000/1001 fps
Vpp Filters    copyHtoD
Output Info    H.265/HEVC main10 @ Level 6.2
1920x1080p 1:1 59.940fps (60000/1001fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 768000 kbps)
Target Quality 25.75
Initial QP     I:20  P:23  B:25
QP Offset      cb:0  cr:0
VBV buf size   auto
Lookahead      on, 32 frames, Adaptive I, B Insert
GOP length     600 frames
B frames       5 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:6 L1:2
AQ             on
CU max / min   auto / auto
VUI            matrix:bt709,range:limited
Others         mv:Q-pel

For comparison, some suboptimal FFT3DGPU default settings:
2160p:
FFT3DGPU untuned default with banding (no additional Prefetch(1,7), no ConvertBits(10)), fastest setting without picture errors: mode=1, precision=1, bw=bh=32, wintype=1:
12.23 fps, 6451.48 kbps (the tuned mode 1 is 26.5% faster)

FFT3DGPU untuned default with banding (no additional Prefetch(1,7), no ConvertBits(10)), fastest setting without picture errors: mode=1, precision=2, bw=bh=32, wintype=1 (precision=2 doesn't reduce banding; only precision=0 does, and that causes grid errors):
11.94 fps, 6456.07 kbps (the tuned mode 1 with ConvertBits(10), Prefetch(1,7) and bw=bh=64 is 29.6% faster)

1080p:
FFT3DGPU untuned, mode=2, precision=0 (no additional Prefetch(1,7), bw=bh=32, wintype=1):
29.66 fps, 4372.59 kbps (the tuned mode 2 with Prefetch(1,7) and bw=bh=64 is 92.8% faster)
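
The "X% faster" figures here compare the tuned runs listed earlier (15.47 fps for 2160p mode 1, 57.18 fps for 1080p mode 2) with these untuned runs. Recomputed in Python from the posted fps values only:

```python
def speedup_pct(tuned_fps: float, untuned_fps: float) -> float:
    """How much faster the tuned run is than the untuned one, in percent."""
    return (tuned_fps / untuned_fps - 1.0) * 100.0


# label -> (tuned fps, untuned fps), values as posted
comparisons = {
    "2160p mode=1 vs. untuned precision=1": (15.47, 12.23),
    "2160p mode=1 vs. untuned precision=2": (15.47, 11.94),
    "1080p mode=2 vs. untuned mode=2": (57.18, 29.66),
}

for label, (tuned, untuned) in comparisons.items():
    print(f"{label}: {speedup_pct(tuned, untuned):.1f}% faster")
```

The results (26.5%, 29.6% and 92.8%) match the numbers given in the post.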

-> Prefetch becomes more important when other resource-hungry multithreaded filters, even a simple resizer, follow FFT3DGPU.


RE: Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff? - Selur - 05.01.2023

Okay, so as a workaround for now, one would use a custom section containing:
ConvertBits(10)
# bitdepth 10
(placed before the denoising).

I'll look into adding a bit depth option to FFT3DGPU.

Cu
Selur


RE: Integrate FFT3DGPU 10 bit hack in Hybrid to reduce banding with 4K stuff? - Selur - 05.01.2023

Sent you a link to a dev version that allows specifying an input bit depth for FFT3DGPU.

Cu
Selur