Posts: 10.976
Threads: 56
Joined: May 2017
This is something related to vs-deoldify only.
Your wrapper should have an option to set it, and always set it.
Posts: 736
Threads: 70
Joined: Feb 2020
This message is triggered by "numpy", which is used by Deoldify.
I will add an option to the filter to set the number of threads (default = 8).
It is enough to add the following code to the filter:
os.environ['NUMEXPR_MAX_THREADS'] = '8'
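A minimal sketch of how such an option could wrap that line. The helper name `set_numexpr_threads` is hypothetical and not taken from vs-deoldify. As far as I know, the variable is read when the library is first imported, so the call should come before any import that pulls it in.

```python
import os


def set_numexpr_threads(n_threads: int = 8) -> str:
    """Set NUMEXPR_MAX_THREADS (hypothetical helper, not the filter's real code).

    Call this before the library that reads the variable is imported.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Environment values must be strings, not integers.
    os.environ["NUMEXPR_MAX_THREADS"] = str(n_threads)
    return os.environ["NUMEXPR_MAX_THREADS"]


set_numexpr_threads(8)
```

Setting the variable in the shell before starting vspipe (as in `set NUMEXPR_MAX_THREADS=10`) should have the same effect, since the child process inherits the environment.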
Dan
Posts: 10.976
Threads: 56
Joined: May 2017
top, going to bed now.
Posts: 736
Threads: 70
Joined: Feb 2020
I tested vsPipe with "y4m"
D:\PProjects\vs-deoldify_dev>set NUMEXPR_MAX_THREADS=10
D:\PProjects\vs-deoldify_dev>"D:\Programs\Hybrid\64bit\Vapoursynth\vspipe.exe" "D:\PProjects\vs-deoldify_dev\encoding.vpy" - -c y4m | "D:\Programs\Hybrid\64bit\x265.exe" --preset fast --input - --fps 24000/1001 --output-depth 10 --y4m --profile main10 --b-adapt 2 --crf 21.00 --psy-rd 2.00 --deblock=-1:-1 --psnr --ssim --range limited --sar 1:1 --output "D:\PProjects\vs-deoldify_dev\VideoTest1_720p.265"
y4m [info]: 1280x692 fps 24000/1001 i420p10 sar 1:1 unknown frame count
raw [info]: output file: D:\PProjects\vs-deoldify_dev\VideoTest1_720p.265
x265 [info]: HEVC encoder version 3.5+115-88fd6d3ad
x265 [info]: build info [Windows][GCC 13.2.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [warning]: --psnr used with psy on: results will be invalid!
x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!
x265 [info]: Main 10 profile, Level-3.1 (Main tier)
x265 [info]: Thread pool created using 20 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 4 / wpp(11 rows)
x265 [warning]: Source height < 720p; disabling lookahead-slices
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut / bias : 23 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 15 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-21.0 / 0.60
x265 [info]: tools: rd=2 psy-rd=2.00 rskip mode=1 signhide tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing deblock(tC=-1:B=-1) sao
Output 2594 frames in 415.95 seconds (6.24 fps)
x265 [info]: frame I: 14, Avg QP:19.85 kb/s: 8718.74 PSNR Mean: Y:48.786 U:51.412 V:51.627 SSIM Mean: 0.991435 (20.673dB)
x265 [info]: frame P: 663, Avg QP:20.24 kb/s: 3345.85 PSNR Mean: Y:48.327 U:50.204 V:50.376 SSIM Mean: 0.991880 (20.904dB)
x265 [info]: frame B: 1917, Avg QP:25.28 kb/s: 653.67 PSNR Mean: Y:47.784 U:48.858 V:48.975 SSIM Mean: 0.991540 (20.726dB)
x265 [info]: Weighted P-Frames: Y:11.3% UV:10.4%
encoded 2594 frames in 415.78s (6.24 fps), 1385.29 kb/s, Avg QP:23.96, Global PSNR: 48.267, SSIM Mean Y: 0.9916264 (20.771 dB)
Now the speed decreases to 6.24 fps.
So what speeds up the encoding is not the pipe but the "raw" format.
A 1.9x speed increase is worth your attention.
You should consider abandoning the "y4m" format for vsPipe and switching to the "raw" format; please check whether you obtain the same speed increase.
Thanks,
Dan
Posts: 10.976
Threads: 56
Joined: May 2017
01.03.2024, 23:01
(This post was last modified: 01.03.2024, 23:21 by Selur.)
Will do some testing tomorrow; the question is whether vspipe or x265 is faster with raw, and whether this is always the case.
Also note that the main downside of raw video pipes is that any output to stdout will corrupt the stream,...
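To illustrate the stdout concern: when the frames themselves travel over stdout, any diagnostic text a script prints there ends up inside the video data. A small sketch, assuming the script is free to choose where its messages go; the `log` helper is hypothetical:

```python
import sys


def log(message: str) -> None:
    """Write a diagnostic message to stderr so stdout stays free for frame data."""
    print(message, file=sys.stderr, flush=True)


log("script loaded")
```

Messages from third-party libraries are not covered by this; those would have to be silenced or redirected separately.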
F:\Hybrid\64bit>"F:\Hybrid\64bit\Vapoursynth\vspipe.exe" "J:\tmp\encodingTempSynthSkript_2024-03-01@20_35_55_0010_0.vpy" - -c y4m | "F:\Hybrid\64bit\NVEncC.exe" --y4m -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1"
--------------------------------------------------------------------------------
J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1
--------------------------------------------------------------------------------
Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)
OS Version Windows 11 x64 (22631) [UTF-8]
CPU AMD Ryzen 9 7950X 16-Core Processor [5.50GHz] (16C/32T)
GPU #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.61]
NVENC / CUDA NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers CUDA, 20 frames
Input Info y4m(yv12(10bit))->p010 [AVX2], 640x352, 25/1 fps
Vpp Filters copyHtoD
Output Info AV1 main 10bit @ Level auto
640x352p 1:1 25.000fps (25/1fps)
Encoder Preset quality
Rate Control VBR
Multipass none
Bitrate 0 kbps (Max: 0 kbps)
Target Quality 23.00
Initial QP I:20 P:23 B:25
QP range I:0-255 P:0-255 B:0-255
QP Offset cb:0 cr:0
VBV buf size auto
Split Enc Mode auto
Lookahead off
GOP length 250 frames
B frames 3 frames [ref mode: middle]
Ref frames 7 frames, MultiRef L0:auto L1:auto
AQ on (spatial, temporal, strength 5)
Part size max auto / min auto
Tile num columns auto / rows auto
TemporalLayers max 1
Refs forward auto, backward auto
VUI matrix:bt470bg,range:limited
Others mv:Q-pel
Output 429 frames in 29.80 seconds (14.40 fps)%
encoded 429 frames, 14.48 fps, 1135.94 kbps, 2.32 MB
encode time 0:00:29, CPU: 0.0%, GPU: 7.9%, GPUClock: 2805MHz, VEClock: 2175MHz
frame type IDR 2
frame type I 2, total size 0.03 MB
frame type P 108, total size 0.01 MB
frame type B 319, total size 2.28 MB
F:\Hybrid\64bit>"F:\Hybrid\64bit\Vapoursynth\vspipe.exe" "J:\tmp\encodingTempSynthSkript_2024-03-01@20_35_55_0010_0.vpy" - | "F:\Hybrid\64bit\NVEncC.exe" --raw --input-res 640x352 -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1"
--------------------------------------------------------------------------------
J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1
--------------------------------------------------------------------------------
NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)
OS Version Windows 11 x64 (22631) [UTF-8]
CPU AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)
GPU #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.61]
NVENC / CUDA NVENC API 12.1, CUDA 12.4, schedule mode: sync
Input Buffers CUDA, 20 frames
Input Info raw(yv12)->nv12 [AVX2], 640x352, 25/1 fps
Vpp Filters copyHtoD
cspconv(nv12 -> p010)
Output Info AV1 main 10bit @ Level auto
640x352p 1:1 25.000fps (25/1fps)
Encoder Preset quality
Rate Control VBR
Multipass none
Bitrate 0 kbps (Max: 0 kbps)
Target Quality 23.00
Initial QP I:20 P:23 B:25
QP range I:0-255 P:0-255 B:0-255
QP Offset cb:0 cr:0
VBV buf size auto
Split Enc Mode auto
Lookahead off
GOP length 250 frames
B frames 3 frames [ref mode: middle]
Ref frames 7 frames, MultiRef L0:auto L1:auto
AQ on (spatial, temporal, strength 5)
Part size max auto / min auto
Tile num columns auto / rows auto
TemporalLayers max 1
Refs forward auto, backward auto
VUI matrix:bt470bg,range:limited
Others mv:Q-pel
Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Output 429 frames in 29.29 seconds (14.65 fps)0%
encoded 858 frames, 21.79 fps, 2979.71 kbps, 12.19 MB
encode time 0:00:39, CPU: 0.0%, GPU: 10.2%, GPUClock: 2796MHz, VEClock: 2171MHz
frame type IDR 4
frame type I 4, total size 0.19 MB
frame type P 216, total size 0.07 MB
frame type B 638, total size 11.93 MB
Too sleepy,... but this looks wrong; check whether the outputs in your examples are really playable.
encoded 429 frames, 14.48 fps, 1135.94 kbps, 2.32 MB (429 is the correct frame count)
encoded 858 frames, 21.79 fps, 2979.71 kbps, 12.19 MB (same input but double the number of frames)
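One possible explanation for the doubling, offered as an assumption rather than a confirmed diagnosis: the y4m log above shows the input as yv12(10bit), while the raw log shows plain yv12. If the 10-bit samples are written as 2 bytes each and the reader assumes 1 byte per sample, every real frame is counted as two. The arithmetic for the 640x352 clip:

```python
def yuv420_frame_bytes(width: int, height: int, bytes_per_sample: int) -> int:
    """Size of one planar YUV 4:2:0 frame in bytes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U and V planes, each quarter size
    return (luma + chroma) * bytes_per_sample


width, height, real_frames = 640, 352, 429

written = yuv420_frame_bytes(width, height, 2)  # assumption: 10-bit stored in 16 bits
assumed = yuv420_frame_bytes(width, height, 1)  # assumption: reader expects 8-bit

stream_bytes = real_frames * written
apparent_frames = stream_bytes // assumed

print(written, assumed, apparent_frames)  # 675840 337920 858
```

That matches the 858 frames reported for the raw run, but it does not prove this is what happened.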
Cu Selur
Posts: 736
Threads: 70
Joined: Feb 2020
I released a new version: https://github.com/dan64/vs-deoldify/rel...tag/v1.0.2
I applied the following changes:
- updated the readme.
- filtered out the torch warnings
- added the new parameter "n_threads" to set the number of threads used by numpy (default=8)
These changes should enable the encoding using vsPipe.
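For reference, a sketch of how warnings like the two torch ones in the logs above can be filtered with Python's standard `warnings` module. This is not necessarily how vs-deoldify 1.0.2 does it; the function name and the message patterns are assumptions based on the warning texts shown in this thread.

```python
import warnings


def suppress_torch_warnings() -> None:
    """Ignore the two UserWarnings seen in the logs (patterns are assumptions)."""
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message=r"Arguments other than a weight enum",
        category=UserWarning,
    )
    warnings.filterwarnings(
        "ignore",
        message=r"torch\.nn\.utils\.weight_norm is deprecated",
        category=UserWarning,
    )


suppress_torch_warnings()
```

The filters have to be installed before the code that emits the warnings runs, so the call belongs near the top of the filter module.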
Dan
Posts: 736
Threads: 70
Joined: Feb 2020
The new dev version with vs-deoldify 1.0.2 is working.
Thanks,
Dan
(01.03.2024, 22:17)Dan64 Wrote: ffmpeg with pipe:
encoded 5188 frames in 446.18s (11.63 fps), 15130.72 kb/s, Avg QP:37.53
vsPipe:
encoded 5188 frames in 440.83s (11.77 fps), 15130.72 kb/s, Avg QP:37.53
ffmpeg.exe -f vapoursynth:
encoded 2594 frames in 416.57s (6.23 fps), 463.80 kb/s, Avg QP:32.24
This is a big improvement!
The encoding speed of the Jupyter version of Deoldify on my PC is about 5.6 fps.
Dan
The fps reported in raw mode is wrong. For some reason, raw mode reports 5188 encoded frames, while in reality there are only half as many, 2594. This is why the reported fps doubled.
But the total encoding time is almost the same: 446s, 440s, 416s.
Yesterday I was too tired to notice it. Raw mode does not bring any increase in encoding speed.
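The correction can be checked by dividing the real frame count by each reported encoding time. A small sketch using only the figures quoted above:

```python
TRUE_FRAMES = 2594  # real number of frames in the clip

# label -> (frames reported by the encoder, encoding time in seconds)
runs = {
    "ffmpeg with pipe": (5188, 446.18),
    "vsPipe": (5188, 440.83),
    "ffmpeg.exe -f vapoursynth": (2594, 416.57),
}


def corrected_fps(seconds: float, true_frames: int = TRUE_FRAMES) -> float:
    """Frames per second based on the real frame count, not the reported one."""
    return true_frames / seconds


corrected = {name: round(corrected_fps(sec), 2) for name, (_, sec) in runs.items()}
for name, fps in corrected.items():
    print(f"{name}: {fps} fps")
# ffmpeg with pipe: 5.81 fps
# vsPipe: 5.88 fps
# ffmpeg.exe -f vapoursynth: 6.23 fps
```

With the real frame count, all three runs land within roughly 7% of each other.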
Dan
Posts: 10.976
Threads: 56
Joined: May 2017
02.03.2024, 13:37
(This post was last modified: 02.03.2024, 13:39 by Selur.)
I agree, I also did some tests and I too can't detect any real speed difference (that isn't in the normal error range).
Cu Selur
Ps.: also includes vs-deoldify in the torch-addon.
Posts: 10.976
Threads: 56
Joined: May 2017
02.03.2024, 15:48
(This post was last modified: 02.03.2024, 16:00 by Selur.)
btw. using Merge combining ddcolor and deoldify surprisingly does look interesting:
file
(not good enough to get integrated into Hybrid)
Cu Selur
Posts: 736
Threads: 70
Joined: Feb 2020
If you are looking for the perfect colorizer, I think it will be necessary to wait many more years.
To me the result looks good enough. In Stable Diffusion it is possible to set a weight for every "filter", which they call "visibility".
I understand that implementing this feature in Hybrid for every filter would be a mess.
But at least you could consider the possibility of merging the 2 filters in some way.
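To make the weight idea concrete, here is a plain-Python sketch of a weighted blend of two colorized results, applied per sample. As far as I know, VapourSynth's `std.Merge` blends two clips with a `weight` parameter in the same spirit; the function below is only an illustration on lists of numbers, not Hybrid's or VapourSynth's code.

```python
def merge_weighted(a: list[float], b: list[float], weight: float = 0.5) -> list[float]:
    """Blend two equally sized sample lists: weight 0 keeps a, weight 1 keeps b."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return [x * (1.0 - weight) + y * weight for x, y in zip(a, b)]


# Hypothetical chroma samples from two colorizers
deoldify_samples = [0.0, 100.0]
ddcolor_samples = [100.0, 200.0]

print(merge_weighted(deoldify_samples, ddcolor_samples, 0.25))  # [25.0, 125.0]
```

A single weight like this would play the role of the "visibility" setting, with 0.5 giving both colorizers equal influence.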
Thanks,
Dan
P.S.
In the meantime, I posted this request to rigaya: https://github.com/rigaya/NVEnc/issues/564