
***Selur*** · 01.03.2024, 22:28

This is something related to vs-deoldify only.
Your wrapper should have an option to set it, and always set it.

Dan64 · 01.03.2024, 22:42

This message is triggered by NumExpr (the "numexpr" package, which builds on numpy), which is loaded as part of Deoldify's dependencies.

I will add an option to the filter to set the number of threads (default = 8).

It is enough to add the following code to the filter:

import os
os.environ['NUMEXPR_MAX_THREADS'] = '8'  # must run before numexpr is imported
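Selur's point was that the wrapper should always set the variable. A minimal sketch of such an option, assuming a hypothetical helper name `set_numexpr_max_threads` (not part of vs-deoldify): it writes NUMEXPR_MAX_THREADS unconditionally, and has to be called before numexpr (or a library that imports it, pandas for example) is first imported, because numexpr reads the variable at import time.

```python
import os


def set_numexpr_max_threads(n_threads: int = 8) -> None:
    """Hypothetical helper: cap NumExpr's thread pool via NUMEXPR_MAX_THREADS.

    numexpr reads this variable when it is first imported, so this must be
    called before importing numexpr or anything that imports it.
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Always set it, so the "NUMEXPR_MAX_THREADS not set" notice never appears
    os.environ["NUMEXPR_MAX_THREADS"] = str(n_threads)
```

Usage: call `set_numexpr_max_threads(8)` at the very top of the filter module, before the Deoldify imports.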

Dan

***Selur*** · 01.03.2024, 22:45

Great, going to bed now. :)

Dan64 · 01.03.2024, 22:52

I tested vsPipe with "y4m"

D:\PProjects\vs-deoldify_dev>set NUMEXPR_MAX_THREADS=10

D:\PProjects\vs-deoldify_dev>"D:\Programs\Hybrid\64bit\Vapoursynth\vspipe.exe" "D:\PProjects\vs-deoldify_dev\encoding.vpy" - -c y4m   | "D:\Programs\Hybrid\64bit\x265.exe" --preset fast --input - --fps 24000/1001 --output-depth 10 --y4m --profile main10 --b-adapt 2 --crf 21.00 --psy-rd 2.00 --deblock=-1:-1 --psnr --ssim --range limited --sar 1:1 --output "D:\PProjects\vs-deoldify_dev\VideoTest1_720p.265"

y4m  [info]: 1280x692 fps 24000/1001 i420p10 sar 1:1 unknown frame count

raw  [info]: output file: D:\PProjects\vs-deoldify_dev\VideoTest1_720p.265

x265 [info]: HEVC encoder version 3.5+115-88fd6d3ad

x265 [info]: build info [Windows][GCC 13.2.0][64 bit] 10bit

x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2

x265 [warning]: --psnr used with psy on: results will be invalid!

x265 [warning]: --tune psnr should be used if attempting to benchmark psnr!

x265 [info]: Main 10 profile, Level-3.1 (Main tier)

x265 [info]: Thread pool created using 20 threads

x265 [info]: Slices                              : 1

x265 [info]: frame threads / pool features       : 4 / wpp(11 rows)

x265 [warning]: Source height < 720p; disabling lookahead-slices

x265 [info]: Coding QT: max CU size, min CU size : 64 / 8

x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra

x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 2

x265 [info]: Keyframe min / max / scenecut / bias  : 23 / 250 / 40 / 5.00

x265 [info]: Lookahead / bframes / badapt        : 15 / 4 / 2

x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0

x265 [info]: References / ref-limit  cu / depth  : 3 / on / on

x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1

x265 [info]: Rate Control / qCompress            : CRF-21.0 / 0.60

x265 [info]: tools: rd=2 psy-rd=2.00 rskip mode=1 signhide tmvp fast-intra

x265 [info]: tools: strong-intra-smoothing deblock(tC=-1:B=-1) sao

Output 2594 frames in 415.95 seconds (6.24 fps)

x265 [info]: frame I:     14, Avg QP:19.85  kb/s: 8718.74   PSNR Mean: Y:48.786 U:51.412 V:51.627  SSIM Mean: 0.991435 (20.673dB)

x265 [info]: frame P:    663, Avg QP:20.24  kb/s: 3345.85   PSNR Mean: Y:48.327 U:50.204 V:50.376  SSIM Mean: 0.991880 (20.904dB)

x265 [info]: frame B:   1917, Avg QP:25.28  kb/s: 653.67    PSNR Mean: Y:47.784 U:48.858 V:48.975  SSIM Mean: 0.991540 (20.726dB)

x265 [info]: Weighted P-Frames: Y:11.3% UV:10.4%

encoded 2594 frames in 415.78s (6.24 fps), 1385.29 kb/s, Avg QP:23.96, Global PSNR: 48.267, SSIM Mean Y: 0.9916264 (20.771 dB)

Now the speed drops to 6.24 fps.
So what speeds up the encoding is not the pipe but the "raw" format.
A 1.9x speed increase is worth your attention.

You should consider abandoning the "y4m" format for vspipe and switching to the "raw" format; please check whether you get the same speed increase.
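As a sanity check on the y4m vs. raw hypothesis: assuming the standard YUV4MPEG2 layout (one text header line per stream, then a "FRAME" line before each frame's raw planes), y4m adds only a few bytes per frame on top of the raw data. The sketch below computes that share for planar 4:2:0 with even dimensions; 10-bit samples are carried in 2 bytes. For the 1280x692 clip above it is about two millionths of the stream, so the container itself is an unlikely source of a 1.9x difference.

```python
def y4m_overhead_ratio(width: int, height: int, bytes_per_sample: int) -> float:
    """Fraction of each y4m frame that is not pixel data (planar 4:2:0, even dims)."""
    marker = len(b"FRAME\n")  # y4m writes this line before every frame's planes
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U and V at quarter resolution
    payload = (luma + chroma) * bytes_per_sample
    return marker / (marker + payload)


# 1280x692, 10-bit (stored as 2 bytes per sample), as in the x265 test above
print(y4m_overhead_ratio(1280, 692, 2))
```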

Thanks,
Dan

***Selur*** · (This post was last modified: 01.03.2024, 23:21 by Selur.)

I will do some testing tomorrow; the question is whether vspipe and x265 are really faster with raw, and whether this is always the case.
Also note that the main downside of raw video pipes is that any output to stdout will corrupt the stream,...
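One defensive measure on the script side, offered as an assumption rather than something vspipe requires, is to route stray `print()` output to stderr while running noisy code. The hypothetical `run_quietly` helper below uses the standard `contextlib.redirect_stdout`; it only covers writes that go through Python's `sys.stdout`, so native libraries writing straight to the stdout file descriptor are not affected.

```python
import contextlib
import sys


def run_quietly(func, *args, **kwargs):
    """Hypothetical helper: call func with Python-level stdout sent to stderr.

    Writes made by C extensions directly to file descriptor 1 are NOT redirected.
    """
    with contextlib.redirect_stdout(sys.stderr):
        return func(*args, **kwargs)
```

For example, wrapping a model-loading call as `run_quietly(load_model)` would keep its `print()` chatter out of the video pipe.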

F:\Hybrid\64bit>"F:\Hybrid\64bit\Vapoursynth\vspipe.exe" "J:\tmp\encodingTempSynthSkript_2024-03-01@20_35_55_0010_0.vpy" - -c y4m | "F:\Hybrid\64bit\NVEncC.exe" --y4m -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1"

--------------------------------------------------------------------------------

J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1

--------------------------------------------------------------------------------

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.

  warnings.warn(msg)

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.

  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")

NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)

OS Version     Windows 11 x64 (22631) [UTF-8]

CPU            AMD Ryzen 9 7950X 16-Core Processor [5.50GHz] (16C/32T)

GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.61]

NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync

Input Buffers  CUDA, 20 frames

Input Info     y4m(yv12(10bit))->p010 [AVX2], 640x352, 25/1 fps

Vpp Filters    copyHtoD

Output Info    AV1 main 10bit @ Level auto

               640x352p 1:1 25.000fps (25/1fps)

Encoder Preset quality

Rate Control   VBR

Multipass      none

Bitrate        0 kbps (Max: 0 kbps)

Target Quality 23.00

Initial QP     I:20  P:23  B:25

QP range       I:0-255  P:0-255  B:0-255

QP Offset      cb:0  cr:0

VBV buf size   auto

Split Enc Mode auto

Lookahead      off

GOP length     250 frames

B frames       3 frames [ref mode: middle]

Ref frames     7 frames, MultiRef L0:auto L1:auto

AQ             on (spatial, temporal, strength 5)

Part size      max auto / min auto

Tile num       columns auto / rows auto

TemporalLayers max 1

Refs           forward auto, backward auto

VUI            matrix:bt470bg,range:limited

Others         mv:Q-pel

Output 429 frames in 29.80 seconds (14.40 fps)%

encoded 429 frames, 14.48 fps, 1135.94 kbps, 2.32 MB

encode time 0:00:29, CPU: 0.0%, GPU: 7.9%, GPUClock: 2805MHz, VEClock: 2175MHz

frame type IDR   2

frame type I     2,  total size  0.03 MB

frame type P   108,  total size  0.01 MB

frame type B   319,  total size  2.28 MB

F:\Hybrid\64bit>"F:\Hybrid\64bit\Vapoursynth\vspipe.exe" "J:\tmp\encodingTempSynthSkript_2024-03-01@20_35_55_0010_0.vpy" - | "F:\Hybrid\64bit\NVEncC.exe" --raw --input-res 640x352 -i - --fps 25.000 --codec av1 --sar 1:1 --output-depth 10 --vbr 0 --vbr-quality 23.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 7 --multiref-l0 3 --multiref-l1 3 --bframes 3 --bref-mode auto --mv-precision Q-pel --preset quality --colorrange limited --colormatrix bt470bg --cuda-schedule sync --output "J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1"

--------------------------------------------------------------------------------

J:\tmp\test_1_2024-03-01@20_35_55_0010_02.av1

--------------------------------------------------------------------------------

NVEncC (x64) 7.41 (r2681) by rigaya, Jan 22 2024 13:02:15 (VC 1929/Win)

OS Version     Windows 11 x64 (22631) [UTF-8]

CPU            AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)

GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][551.61]

NVENC / CUDA   NVENC API 12.1, CUDA 12.4, schedule mode: sync

Input Buffers  CUDA, 20 frames

Input Info     raw(yv12)->nv12 [AVX2], 640x352, 25/1 fps

Vpp Filters    copyHtoD

               cspconv(nv12 -> p010)

Output Info    AV1 main 10bit @ Level auto

               640x352p 1:1 25.000fps (25/1fps)

Encoder Preset quality

Rate Control   VBR

Multipass      none

Bitrate        0 kbps (Max: 0 kbps)

Target Quality 23.00

Initial QP     I:20  P:23  B:25

QP range       I:0-255  P:0-255  B:0-255

QP Offset      cb:0  cr:0

VBV buf size   auto

Split Enc Mode auto

Lookahead      off

GOP length     250 frames

B frames       3 frames [ref mode: middle]

Ref frames     7 frames, MultiRef L0:auto L1:auto

AQ             on (spatial, temporal, strength 5)

Part size      max auto / min auto

Tile num       columns auto / rows auto

TemporalLayers max 1

Refs           forward auto, backward auto

VUI            matrix:bt470bg,range:limited

Others         mv:Q-pel

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet101_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet101_Weights.DEFAULT` to get the most up-to-date weights.

  warnings.warn(msg)

Warning: F:\Hybrid\64bit\Vapoursynth\Lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.

  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")

Output 429 frames in 29.29 seconds (14.65 fps)0%

encoded 858 frames, 21.79 fps, 2979.71 kbps, 12.19 MB

encode time 0:00:39, CPU: 0.0%, GPU: 10.2%, GPUClock: 2796MHz, VEClock: 2171MHz

frame type IDR   4

frame type I     4,  total size   0.19 MB

frame type P   216,  total size   0.07 MB

frame type B   638,  total size  11.93 MB

Too sleepy... this looks wrong; check whether the outputs in your examples really play correctly.

encoded 429 frames, 14.48 fps, 1135.94 kbps, 2.32 MB (429 is the correct frame count)

encoded 858 frames, 21.79 fps, 2979.71 kbps, 12.19 MB (same input but double the number of frames)

Cu Selur

Dan64 · 02.03.2024, 00:07

I released a new version: https://github.com/dan64/vs-deoldify/rel...tag/v1.0.2

I applied the following changes:

- updated the README
- filtered out the torch warnings
- added the new parameter "n_threads" to set the number of threads used by numpy (default=8)
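A minimal sketch of how the two torch warnings from the log could be filtered with Python's standard `warnings` module. The message prefixes are copied from the vspipe output above; this is an illustration, not necessarily how v1.0.2 does it, and `silence_torch_warnings` is a hypothetical name.

```python
import re
import warnings

# Start of the two UserWarnings seen in the vspipe log
_NOISY_PREFIXES = (
    "Arguments other than a weight enum or `None` for 'weights' are deprecated",
    "torch.nn.utils.weight_norm is deprecated",
)


def silence_torch_warnings() -> None:
    """Hypothetical helper: ignore only the known torch/torchvision deprecation notices."""
    for prefix in _NOISY_PREFIXES:
        # `message` is a regex matched against the start of the warning text
        warnings.filterwarnings("ignore", message=re.escape(prefix), category=UserWarning)
```

Filtering by message keeps every other warning visible, which is safer than silencing all `UserWarning`s.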

These changes should make encoding with vspipe work.

Dan

Dan64 · 02.03.2024, 10:21

The new dev version with vs-deoldify 1.0.2 is working. :)

Thanks,
Dan

(01.03.2024, 22:17) Dan64 wrote:
ffmpeg with pipe: encoded 5188 frames in 446.18s (11.63 fps), 15130.72 kb/s, Avg QP:37.53
vsPipe: encoded 5188 frames in 440.83s (11.77 fps), 15130.72 kb/s, Avg QP:37.53
ffmpeg.exe -f vapoursynth: encoded 2594 frames in 416.57s (6.23 fps), 463.80 kb/s, Avg QP:32.24

This is a big improvement!
The encoding speed of the Jupyter version of Deoldify on my PC is about 5.6 fps.

Dan

The fps reported in raw mode is wrong. For some reason, raw mode reports 5188 encoded frames, while in reality there are half as many, 2594. This is why the reported fps doubled.
But the total encoding time is almost the same: 446s, 440s, 416s.
Yesterday I was too tired to notice it. Raw mode does not bring any increase in encoding speed. :(
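A likely explanation for the exact doubling, assuming the encoders read the 10-bit stream as 8-bit: NVEncC's raw run reports its input as "raw(yv12)", while the y4m run reports "yv12(10bit)". A raw pipe carries no format information, and 10-bit samples occupy 2 bytes, so each real frame is exactly the size of two 8-bit frames. The encoder then counts twice as many (garbled) frames, while the bytes pushed through the pipe, and hence the encode time, stay the same. A sketch of the arithmetic:

```python
def frame_bytes_420(width: int, height: int, bit_depth: int) -> int:
    """Size of one planar 4:2:0 frame; samples above 8 bit occupy 2 bytes."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    luma = width * height
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample


def frames_seen(real_frames: int, width: int, height: int,
                real_depth: int, assumed_depth: int) -> float:
    """Frames a reader assuming `assumed_depth` would count in the same byte stream."""
    stream_bytes = real_frames * frame_bytes_420(width, height, real_depth)
    return stream_bytes / frame_bytes_420(width, height, assumed_depth)


print(frames_seen(429, 640, 352, real_depth=10, assumed_depth=8))    # 858.0 (NVEncC test)
print(frames_seen(2594, 1280, 692, real_depth=10, assumed_depth=8))  # 5188.0 (x265 test)
```

With the true count of 2594 frames, the 440.83 s vspipe run works out to about 5.9 fps. For raw input, x265 presumably also needs to be told the bit depth explicitly (`--input-depth 10`, alongside `--input-res` and `--fps`).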

Dan

***Selur*** · (This post was last modified: 02.03.2024, 13:39 by Selur.)

I agree; I also ran some tests, and I can't detect any real speed difference beyond the normal measurement noise either.
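Since reported fps can be misleading (as the doubled frame count showed), comparing wall-clock time of the whole pipe is more robust. A minimal sketch using the standard `subprocess` module; `time_pipeline` and `median_time` are hypothetical helpers, and taking the median of a few runs helps separate a real difference from run-to-run noise.

```python
import statistics
import subprocess
import time
from typing import Sequence


def time_pipeline(producer: Sequence[str], consumer: Sequence[str]) -> float:
    """Run `producer | consumer` and return wall-clock seconds until both exit."""
    start = time.perf_counter()
    prod = subprocess.Popen(producer, stdout=subprocess.PIPE)
    cons = subprocess.Popen(consumer, stdin=prod.stdout)
    prod.stdout.close()  # so the producer sees a broken pipe if the consumer exits early
    cons.wait()
    prod.wait()
    if prod.returncode != 0 or cons.returncode != 0:
        raise RuntimeError(f"pipeline failed: {prod.returncode}, {cons.returncode}")
    return time.perf_counter() - start


def median_time(producer: Sequence[str], consumer: Sequence[str], runs: int = 3) -> float:
    """Median wall-clock time over several runs of the same pipeline."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return statistics.median(time_pipeline(producer, consumer) for _ in range(runs))
```

For example, `producer` could be the vspipe command from the y4m test split into a list, and `consumer` the matching x265 command, timed once with `-c y4m` and once with raw output plus the raw input options.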

Cu Selur

P.S.: I also included vs-deoldify in the torch add-on.

***Selur*** · (This post was last modified: 02.03.2024, 16:00 by Selur.)

By the way, combining ddcolor and deoldify with Merge looks surprisingly interesting (attached file).

(Though not good enough to be integrated into Hybrid.)

Cu Selur

Dan64 · 02.03.2024, 16:19

If you are looking for the perfect colorizer, I think you will have to wait many years.
To me the result looks good enough. In Stable Diffusion it is possible to set a weight, which they call "visibility", for every "filter".
I understand that implementing this feature in Hybrid for every filter is a mess.
But at least you could consider some way to merge the two filters.
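A "visibility"-style weight between two colorizers roughly amounts to a weighted blend of their outputs, which is what VapourSynth's `core.std.Merge(clipa, clipb, weight=...)` computes: `clipa * (1 - weight) + clipb * weight` per sample. The sketch below shows that arithmetic on plain lists standing in for pixel values; the sample values are made up for illustration, and real clips would be blended plane by plane by Merge itself.

```python
from typing import List, Sequence


def merge(a: Sequence[float], b: Sequence[float], weight: float = 0.5) -> List[float]:
    """Per-sample blend a*(1 - weight) + b*weight, the formula std.Merge applies."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of samples")
    return [x * (1.0 - weight) + y * weight for x, y in zip(a, b)]


ddcolor = [100.0, 120.0, 140.0]   # illustrative sample values
deoldify = [80.0, 130.0, 100.0]
print(merge(ddcolor, deoldify, 0.25))  # [95.0, 122.5, 130.0]
```

A weight of 0 keeps only the first clip and 1 keeps only the second, so a single slider per pair of filters would cover the "visibility" idea.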

Thanks,
Dan

P.S.
In the meantime, I posted this request to rigaya: https://github.com/rigaya/NVEnc/issues/564
