
Deoldify Vapoursynth filter
#41
This is something related to vs-deoldify only.
Your wrapper should have an option to set it, and always set it.
#42
This message is triggered by "numexpr", a numpy-related library that gets loaded along with Deoldify.

I will add an option to the filter to set the number of threads (default = 8).

It is enough to add the following code to the filter:

import os
# numexpr reads this variable when it is first imported, so set it early
os.environ['NUMEXPR_MAX_THREADS'] = '8'
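A sketch of how the planned thread option could feed that variable, assuming it runs before the heavy imports. As far as I know numexpr only reads NUMEXPR_MAX_THREADS once, at import time. `set_numexpr_threads` is a hypothetical name, not the actual vs-deoldify code:

```python
import os
import sys
import warnings


def set_numexpr_threads(n_threads: int = 8) -> None:
    """Hypothetical helper: map a filter option onto NUMEXPR_MAX_THREADS.

    numexpr reads the variable when it is first imported, so this has to run
    before Deoldify (or anything else that imports numexpr) is loaded.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    if "numexpr" in sys.modules:
        # Too late: the already-imported module will not re-read the variable.
        warnings.warn("numexpr already imported; NUMEXPR_MAX_THREADS may be ignored")
    os.environ["NUMEXPR_MAX_THREADS"] = str(n_threads)


set_numexpr_threads(8)  # call first, then import the colorization code
```

Setting the variable unconditionally (instead of `os.environ.setdefault`) follows the advice in #41 to always set it.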

Dan
#43
Great, going to bed now. :)
#44
I tested vsPipe with the "y4m" format:

D:\PProjects\vs-deoldify_dev>set NUMEXPR_MAX_THREADS=10

D:\PProjects\vs-deoldify_dev>"D:\Programs\Hybrid\64bit\Vapoursynth\vspipe.exe" "D:\PProjects\vs-deoldify_dev\encoding.vpy" - -c y4m   | "D:\Programs\Hybrid\64bit\x265.exe" --preset fast --input - --fps 24000/1001 --output-depth 10 --y4m --profile main10 --b-adapt 2 --crf 21.00 --psy-rd 2.00 --deblock=-1:-1 --psnr --ssim --range limited --sar 1:1 --output "D:\PProjects\vs-deoldify_dev\VideoTest1_720p.265"

y4m  [info]: 1280x692 fps 24000/1001 i420p10 sar 1:1 unknown frame count
raw  [info]: output file: D:\PProjects\vs-deoldify_dev\VideoTest1_720p.265
x265 [info]: HEVC encoder version 3.5+115-88fd6d3ad
x265 [info]: build info [Windows][GCC 13.2.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [warning]: --psnr used with psy on: results will be invalid!
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
x265 [info]: Main 10 profile, Level-3.1 (Main tier)
x265 [info]: Thread pool created using 20 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 4 / wpp(11 rows)
x265 [warning]: Source height < 720p; disabling lookahead-slices
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut / bias  : 23 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 15 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-21.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 rskip mode=1 signhide tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing deblock(tC=-1:B=-1) sao
Output 2594 frames in 415.95 seconds (6.24 fps)
x265 [info]: frame I:     14, Avg QP:19.85  kb/s: 8718.74   PSNR Mean: Y:48.786 U:51.412 V:51.627  SSIM Mean: 0.991435 (20.673dB)
x265 [info]: frame P:    663, Avg QP:20.24  kb/s: 3345.85   PSNR Mean: Y:48.327 U:50.204 V:50.376  SSIM Mean: 0.991880 (20.904dB)
x265 [info]: frame B:   1917, Avg QP:25.28  kb/s: 653.67    PSNR Mean: Y:47.784 U:48.858 V:48.975  SSIM Mean: 0.991540 (20.726dB)
x265 [info]: Weighted P-Frames: Y:11.3% UV:10.4%

encoded 2594 frames in 415.78s (6.24 fps), 1385.29 kb/s, Avg QP:23.96, Global PSNR: 48.267, SSIM Mean Y: 0.9916264 (20.771 dB)

Now the speed drops to 6.24 fps.
So what speeds up the encoding is not the pipe itself but the "raw" format.
A 1.9x speed increase is worth your attention.

You should consider abandoning the "y4m" format for vsPipe and switching to the "raw" format. Please check whether you get the same speed increase.
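For context on what actually differs between the two formats: a y4m stream carries the same planar frame data as raw, plus one text header (size, frame rate, colorspace) and a 6-byte `FRAME\n` marker before every frame; raw has neither, so the encoder must be told the format on its command line. A small Python sketch of reading that header (the header bytes in the example are illustrative, not captured from these runs; `parse_y4m_header` is a hypothetical helper):

```python
FRAME_MARKER = b"FRAME\n"  # precedes every frame in a y4m stream


def parse_y4m_header(line: bytes) -> dict:
    """Split a YUV4MPEG2 header line into {tag letter: value}.

    Repeated tags (e.g. several X... extension fields) keep only the last one.
    """
    fields = line.rstrip(b"\n").split(b" ")
    if fields[0] != b"YUV4MPEG2":
        raise ValueError("not a YUV4MPEG2 stream")
    return {f[:1].decode(): f[1:].decode() for f in fields[1:] if f}


header = parse_y4m_header(b"YUV4MPEG2 W1280 H692 F24000:1001 Ip A1:1 C420p10\n")
print(header["W"], header["H"], header["F"], header["C"])

# Marker overhead relative to one 10-bit 1280x692 4:2:0 frame (16-bit samples)
frame_bytes = 1280 * 692 * 3 // 2 * 2
print(len(FRAME_MARKER) / frame_bytes)  # roughly 2e-06, i.e. negligible
```

So the container overhead alone is unlikely to explain a large speed difference between y4m and raw.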

Thanks,
Dan
#45
Will do some testing tomorrow. The question is whether vspipe or x265 is faster with raw, and whether this is always the case.
Also note that the main downside of raw video pipes is that any output to stdout will break the stream,...
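In a .vpy script, one way to reduce that risk is to send Python-level prints to stderr while noisy code (model loading and the like) runs. A minimal standard-library sketch; `quiet_stdout` is a hypothetical helper, and it cannot catch native (C/C++) code that writes straight to file descriptor 1:

```python
import contextlib
import sys


def quiet_stdout():
    """Redirect Python-level stdout to stderr for the duration of a with-block.

    Only output written through sys.stdout (e.g. print) is redirected; native
    code writing directly to file descriptor 1 is not affected. Python
    warnings already go to stderr by default.
    """
    return contextlib.redirect_stdout(sys.stderr)


# Hypothetical usage inside a .vpy script:
with quiet_stdout():
    print("loading model ...")  # ends up on stderr, not in the video stream
```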

F:\Hybrid\64bit>"F:\Hybrid\64bit\Vapoursynth\vspipe.exe" "J:\tmp\encodingTempSynthSkript_2024-03-01@20_35_55_0010_0.vpy" - -c y4m | "F:\Hybrid\64bit\NVEncC.exe" --y4m -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1"
--------------------------------------------------------------------------------
J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1
--------------------------------------------------------------------------------

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")

NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.50GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.61]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers  CUDA, 20 frames
Input Info     y4m(yv12(10bit))->p010 [AVX2], 640x352, 25/1 fps
Vpp Filters    copyHtoD
Output Info    AV1 main 10bit @ Level auto
               640x352p 1:1 25.000fps (25/1fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 0 kbps)
Target Quality 23.00
Initial QP     I:20  P:23  B:25
QP range       I:0-255  P:0-255  B:0-255
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     250 frames
B frames       3 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:auto L1:auto
AQ             on (spatial, temporal, strength 5)
Part size      max auto / min auto
Tile num       columns auto / rows auto
TemporalLayers max 1
Refs           forward auto, backward auto
VUI            matrix:bt470bg,range:limited
Others         mv:Q-pel
Output 429 frames in 29.80 seconds (14.40 fps)

encoded 429 frames, 14.48 fps, 1135.94 kbps, 2.32 MB
encode time 0:00:29, CPU: 0.0%, GPU: 7.9%, GPUClock: 2805MHz, VEClock: 2175MHz
frame type IDR   2
frame type I     2,  total size  0.03 MB
frame type P   108,  total size  0.01 MB
frame type B   319,  total size  2.28 MB

F:\Hybrid\64bit>"F:\Hybrid\64bit\Vapoursynth\vspipe.exe" "J:\tmp\encodingTempSynthSkript_2024-03-01@20_35_55_0010_0.vpy" - | "F:\Hybrid\64bit\NVEncC.exe" --raw --input-res 640x352 -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1"
--------------------------------------------------------------------------------
J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1
--------------------------------------------------------------------------------

NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.61]
NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers  CUDA, 20 frames
Input Info     raw(yv12)->nv12 [AVX2], 640x352, 25/1 fps
Vpp Filters    copyHtoD
               cspconv(nv12 -> p010)
Output Info    AV1 main 10bit @ Level auto
               640x352p 1:1 25.000fps (25/1fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 0 kbps)
Target Quality 23.00
Initial QP     I:20  P:23  B:25
QP range       I:0-255  P:0-255  B:0-255
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      off
GOP length     250 frames
B frames       3 frames [ref mode: middle]
Ref frames     7 frames, MultiRef L0:auto L1:auto
AQ             on (spatial, temporal, strength 5)
Part size      max auto / min auto
Tile num       columns auto / rows auto
TemporalLayers max 1
Refs           forward auto, backward auto
VUI            matrix:bt470bg,range:limited
Others         mv:Q-pel
Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")

Output 429 frames in 29.29 seconds (14.65 fps)

encoded 858 frames, 21.79 fps, 2979.71 kbps, 12.19 MB
encode time 0:00:39, CPU: 0.0%, GPU: 10.2%, GPUClock: 2796MHz, VEClock: 2171MHz
frame type IDR   4
frame type I     4,  total size   0.19 MB
frame type P   216,  total size   0.07 MB
frame type B   638,  total size  11.93 MB


Too sleepy..., but this looks wrong; check whether the outputs in your examples really play correctly.

encoded 429 frames, 14.48 fps, 1135.94 kbps, 2.32 MB (429 is the correct frame count)

encoded 858 frames, 21.79 fps, 2979.71 kbps, 12.19 MB (same input but double the number of frames)
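A likely cause, inferred from the NVEncC logs above: the y4m run reports `yv12(10bit)` input, while the raw run reports plain `raw(yv12)`. If the script outputs 10-bit 4:2:0, vspipe writes each sample as a 16-bit word, so an encoder that assumes 8-bit raw input sees every real frame as two frames. The arithmetic as a Python sketch (helper names are made up for illustration); the same factor of two would also explain 5188 vs 2594 in the x265 numbers:

```python
def yuv420_frame_bytes(width: int, height: int, bit_depth: int) -> int:
    """Size of one planar 4:2:0 frame as written to a raw pipe."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U and V at quarter resolution
    bytes_per_sample = 1 if bit_depth <= 8 else 2  # >8-bit stored in 16-bit words
    return (luma + chroma) * bytes_per_sample


def misread_frame_count(real_frames, width, height, real_depth, assumed_depth):
    """Frames an encoder would count if it assumes the wrong bit depth."""
    total_bytes = real_frames * yuv420_frame_bytes(width, height, real_depth)
    return total_bytes // yuv420_frame_bytes(width, height, assumed_depth)


print(misread_frame_count(429, 640, 352, 10, 8))    # 858, the NVEncC test clip
print(misread_frame_count(2594, 1280, 692, 10, 8))  # 5188, the x265 test clip
```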


Cu Selur
#46
I released a new version: https://github.com/dan64/vs-deoldify/rel...tag/v1.0.2

I applied the following changes:
  • updated the readme.
  • filtered out the torch warnings
  • added the new parameter "n_threads" to set the number of threads used by numpy (default=8)

These changes should make encoding via vsPipe work.
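One way to filter out the two torch warnings shown earlier in the thread, matching the start of their messages with the standard `warnings` module. This is a sketch, not necessarily how vs-deoldify 1.0.2 does it; `silence_torch_warnings` is a hypothetical name:

```python
import warnings

# Message prefixes copied from the two UserWarnings in the logs (regex syntax).
_TORCH_WARNING_PATTERNS = (
    r"Arguments other than a weight enum",
    r"torch\.nn\.utils\.weight_norm is deprecated",
)


def silence_torch_warnings() -> None:
    """Ignore the known, harmless torch/torchvision deprecation warnings."""
    for pattern in _TORCH_WARNING_PATTERNS:
        # filterwarnings matches the regex against the start of the message
        warnings.filterwarnings("ignore", message=pattern, category=UserWarning)
```

Matching on the message rather than on the whole UserWarning category keeps any other, possibly useful, warnings visible.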

Dan
#47
The new dev version with vs-deoldify 1.0.2 is working. :)

Thanks,
Dan

(01.03.2024, 22:17) Dan64 wrote:
ffmpeg with pipe:
encoded 5188 frames in 446.18s (11.63 fps), 15130.72 kb/s, Avg QP:37.53

vsPipe:
encoded 5188 frames in 440.83s (11.77 fps), 15130.72 kb/s, Avg QP:37.53

ffmpeg.exe -f vapoursynth:
encoded 2594 frames in 416.57s (6.23 fps), 463.80 kb/s, Avg QP:32.24

This is a big improvement! :)
The encoding speed of the Jupyter version of Deoldify on my PC is about 5.6 fps.

Dan

The fps reported in raw mode is wrong. For some reason, in raw mode the number of encoded frames is reported as 5188, while in reality there are only half as many, 2594. That is why the reported fps doubled.
But the total encoding time is almost the same: 446 s, 440 s, 416 s.
Yesterday I was too tired to notice it. The raw mode does not bring any encoding speed increase. :(
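The corrected speeds, recomputed from the true frame count (2594) and the total times quoted above, as a quick Python check:

```python
TRUE_FRAMES = 2594

# Total encode times quoted above, in seconds
RUNS = {
    "ffmpeg with pipe": 446.18,
    "vsPipe": 440.83,
    "ffmpeg.exe -f vapoursynth": 416.57,
}


def true_fps(frames: int, seconds: float) -> float:
    """Frames per second from a frame count and an elapsed time."""
    return frames / seconds


for name, seconds in RUNS.items():
    print(f"{name}: {true_fps(TRUE_FRAMES, seconds):.2f} fps")
```

This gives about 5.81, 5.88 and 6.23 fps, so all three methods run at roughly the same speed.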

Dan
#48
I agree. I also did some tests, and I can't detect any real speed difference either (nothing outside the normal margin of error).

Cu Selur

P.S.: vs-deoldify is now also included in the torch add-on.
#49
BTW, using Merge to combine ddcolor and Deoldify looks surprisingly interesting:
[Image: grafik.png]
(not good enough to get integrated into Hybrid)

Cu Selur
#50
If you are looking for the perfect colorizer, I think you will have to wait many years.
To me the result looks good enough. In Stable Diffusion it is possible to set a weight, which they call "visibility", for every "filter".
I understand that implementing this feature in Hybrid for every filter is a mess.
But you could at least consider some way to Merge the two filters.
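For reference, VapourSynth's `std.Merge(clipa, clipb, weight)` already acts like such a "visibility" slider: weight 0 returns the first clip unchanged, 1 the second, and values in between blend them per pixel. In a script that would presumably look like `core.std.Merge(deoldify_clip, ddcolor_clip, weight=0.5)`, where the clip names are hypothetical. The blend arithmetic itself, as a plain-Python sketch on a few made-up sample values (the real filter works per plane and handles bit depth and rounding itself):

```python
def merge(a, b, weight=0.5):
    """Blend two equal-length sample lists: a*(1 - weight) + b*weight."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [round(x * (1.0 - weight) + y * weight) for x, y in zip(a, b)]


deoldify_px = [120, 80, 200]  # made-up sample values
ddcolor_px = [100, 160, 40]
print(merge(deoldify_px, ddcolor_px, 0.25))  # mostly Deoldify, 25% ddcolor
```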

Thanks,
Dan

P.S.
In the meantime, I posted this request to rigaya: https://github.com/rigaya/NVEnc/issues/564