Posts: 528
Threads: 59
Joined: Oct 2022
20.12.2023, 20:50
Howdy,
Today I want to try out the torch add-on for the first time, now that I own a 40xx series NVIDIA card xD.
So I dumped all the folders into the x64 folder... right?
My question: does an unpacked size of 30+ GB sound about right?
If so, my poor 256GB OS drive ...
cheers,
TD
Posts: 10.985
Threads: 57
Joined: May 2017
20.12.2023, 20:56
(This post was last modified: 20.12.2023, 20:56 by Selur.)
Yes, extracting all the add-ons requires ~39GB of free space:
- onnx_models ~8.96GB
- Vapoursynth_torch ~15.5GB
- vsgan_models ~8.51GB
- vs-mlrt ~6.04GB
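To check up front whether a drive has room for all of that, something like the following could be used. It is only a sketch: the sizes are the ones listed above, the helper names are made up for illustration, and it assumes decimal GB (10^9 bytes).

```python
import shutil

# Add-on sizes in GB, as listed above.
ADDON_SIZES_GB = {
    "onnx_models": 8.96,
    "Vapoursynth_torch": 15.5,
    "vsgan_models": 8.51,
    "vs-mlrt": 6.04,
}

def total_size_gb(sizes):
    """Sum of all add-on sizes in GB."""
    return sum(sizes.values())

def has_room(path, needed_gb):
    """True if the drive containing `path` has at least `needed_gb` GB free."""
    free_gb = shutil.disk_usage(path).free / 1e9  # assumption: 1 GB = 10^9 bytes
    return free_gb >= needed_gb

total = total_size_gb(ADDON_SIZES_GB)
print(f"Add-ons need about {total:.2f} GB")
print("Enough room on this drive:", has_room(".", total))
```

Replace "." with the folder the add-ons are extracted to.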
Cu Selur
Posts: 528
Threads: 59
Joined: Oct 2022
(20.12.2023, 20:56)Selur Wrote: Yes, extracting all the add-ons requires ~39GB of free space:
- onnx_models ~8.96GB
- Vapoursynth_torch ~15.5GB
- vsgan_models ~8.51GB
- vs-mlrt ~6.04GB
Cu Selur
Damn 0^0 ... thanks for the confirmation.
In addition, which green GPU do you have, Selur, and what fps do you get using x4 ESRGAN on a 720x576 SD source upscaled to 1080p?
My 4060 Ti can barely hold it together.
thanks,
td
...Must say, really impressive results, without the need for denoise/sharpening filters ^^ ... duck like very much hmm.. ° ^ °
Odd: 2x looks much better than 4x, and it's way less GPU-intensive too 0_o ... CPU utilization is practically non-existent, LoL.
I am curious to see what the performance will be on the new 40xx Super cards or even the 50xx NVIDIA series.
Posts: 10.985
Threads: 57
Joined: May 2017
In my system, I got:
- NVIDIA GeForce RTX 4080
- Intel® Arc A380
- AMD Radeon of my AMD Ryzen 9 9850X
I usually use the NVIDIA card when using machine learning stuff.
No clue about the speed,...
Did a quick test, resizing 720x576 (PAR 16x15) to 1920x1440(PAR 1x1) through:
# Imports
import ctypes
import os
import site
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# Adding torch dependencies to PATH
path = site.getsitepackages()[0]+'/torch_dependencies/bin/'
ctypes.windll.kernel32.SetDllDirectoryW(path)
path = path.replace('\\', '/')
os.environ["PATH"] = path + os.pathsep + os.environ["PATH"]
# Loading Plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/Support/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/DGDecNV/DGDecodeNV.dll")
# source: 'G:\TestClips&Co\files\MPEG-2\pal_lpcm.vob'
# current color space: YUV420P8, bit depth: 8, resolution: 720x576, fps: 25, color matrix: 470bg, yuv luminance scale: limited, scanorder: progressive
# Loading G:\TestClips&Co\files\MPEG-2\pal_lpcm.vob using DGSource
clip = core.dgdecodenv.DGSource("J:/tmp/vob_4c623effa02143dabae36eb7e17be929_853323747.dgi")# 25 fps, scanorder: progressive
# Setting detected color matrix (470bg).
clip = core.std.SetFrameProps(clip, _Matrix=5)
# Setting color transfer info (470bg), when it is not set
clip = clip if core.text.FrameProps(clip,'_Transfer') else core.std.SetFrameProps(clip, _Transfer=5)
# Setting color primaries info (5), when it is not set
clip = clip if core.text.FrameProps(clip,'_Primaries') else core.std.SetFrameProps(clip, _Primaries=5)
# Setting color range to TV (limited) range.
clip = core.std.SetFrameProp(clip=clip, prop="_ColorRange", intval=1)
# making sure frame rate is set to 25
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
clip = core.std.SetFrameProp(clip=clip, prop="_FieldBased", intval=0) # progressive
from vsrealesrgan import realesrgan as RealESRGAN
# adjusting color space from YUV420P8 to RGBH for vsRealESRGAN
clip = core.resize.Bicubic(clip=clip, format=vs.RGBH, matrix_in_s="470bg", range_s="limited")
# resizing using RealESRGAN
clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3) # 2880x2304
# resizing 2880x2304 to 1920x1440
# adjusting resizing
clip = core.resize.Bicubic(clip=clip, format=vs.RGBS, range_s="limited")
clip = core.fmtc.resample(clip=clip, w=1920, h=1440, kernel="spline64", interlaced=False, interlacedd=False)
# adjusting output color from: RGBS to YUV420P10 for x265Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P10, matrix_s="470bg", range_s="limited", dither_type="error_diffusion")
# set output frame rate to 25fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# Output
clip.set_output()
(after waiting ages until the .engine file is built)
VSPipe.exe c:\Users\Selur\Desktop\Testing.vpy --progress -c y4m NUL
reports:
Script evaluation done in 3.88 seconds
Output 5367 frames in 258.18 seconds (20.79 fps)
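For anyone wanting to relate these numbers to real time, here is a small sketch of the arithmetic. It only uses the frame count, elapsed time and 25 fps source rate reported above; the function names are just for illustration.

```python
def throughput(frames, seconds):
    """Processing speed in frames per second."""
    return frames / seconds

def clip_duration(frames, fps):
    """Playback length of the clip in seconds."""
    return frames / fps

frames = 5367      # frames reported by VSPipe
elapsed = 258.18   # seconds reported by VSPipe
source_fps = 25    # frame rate of the source clip

fps = throughput(frames, elapsed)
duration = clip_duration(frames, source_fps)
realtime_ratio = fps / source_fps  # 1.0 would be exactly real time

print(f"{fps:.2f} fps, clip length {duration:.2f} s, "
      f"{realtime_ratio:.0%} of real-time speed")
```

So the run is a bit slower than real time: the clip plays for about 214.68 s and took 258.18 s to process.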
Cu Selur
Posts: 528
Threads: 59
Joined: Oct 2022
(20.12.2023, 21:26)Selur Wrote: In my system, I got:
- NVIDIA GeForce RTX 4080
- Intel® Arc A380
- AMD Radeon of my AMD Ryzen 9 9850X
I usually use the NVIDIA card when using machine learning stuff.
First off → 9850X!? Are you still on an AM3 AMD Phenom quad-core system, mayhap?
I assume you meant to write R9 5950X, right? Just one step higher than mine → 5900X.
Also, are you using the Intel Arc GPU in conjunction with your NVIDIA card in an SLI config? Is that even possible, using mixed brands in one system?
(20.12.2023, 21:26)Selur Wrote: No clue about the speed,...
Did a quick test, resizing 720x576 (PAR 16x15) to 1920x1440(PAR 1x1) through:
I guess you did, and you do know..
Haven't you done ↓ a 53xx-frame PAL clip in about 25x seconds at a 20 fps average speed? Which is a very impressive speed if that's true!
(20.12.2023, 21:26)Selur Wrote: Script evaluation done in 3.88 seconds
Output 5367 frames in 258.18 seconds (20.79 fps)
Cu Selur
Also, I can see you used ESRGAN, but which model did you use? E.g. 2xPlus, 4xPlus, realsr?
BIG ↑ difference, quality/speed-wise!!
cheers,
Posts: 10.985
Threads: 57
Joined: May 2017
20.12.2023, 21:49
(This post was last modified: 20.12.2023, 21:51 by Selur.)
Sorry, meant: AMD Ryzen 9 7950X for the cpu.
No, I use the Intel card separately, mainly for AV1 encoding.
I used https://github.com/HolyWu/vs-realesrgan
clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3)
which is the general model:
0 = ESRGAN_SRx4_DF2KOST_official-ff704c30 (official ESRGAN x4 model)
1 = RealESRGAN_x2plus (x2 model for general images)
2 = RealESRGAN_x4plus (x4 model for general images)
3 = RealESRGAN_x4plus_anime_6B (x4 model optimized for anime images)
4 = realesr-animevideov3 (x4 model optimized for anime videos)
5 = realesr-general-x4v3 (tiny small x4 model for general scenes)
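As a quick illustration of what the model index implies for the intermediate frame size, here is a sketch. The names and scale factors are copied from the list above; the lookup table and helper are only for illustration and are not part of vs-realesrgan itself.

```python
# Model index -> (name, native scale factor), copied from the list above.
MODELS = {
    0: ("ESRGAN_SRx4_DF2KOST_official-ff704c30", 4),
    1: ("RealESRGAN_x2plus", 2),
    2: ("RealESRGAN_x4plus", 4),
    3: ("RealESRGAN_x4plus_anime_6B", 4),
    4: ("realesr-animevideov3", 4),
    5: ("realesr-general-x4v3", 4),
}

def upscaled_size(width, height, model):
    """Frame size right after the model ran, before any further resizing."""
    _name, scale = MODELS[model]
    return width * scale, height * scale

# The 720x576 source with model=5 ends up at 2880x2304,
# which the script then scales down to 1920x1440.
print(MODELS[5][0], upscaled_size(720, 576, 5))
print(MODELS[1][0], upscaled_size(720, 576, 1))
```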
Cu Selur
Posts: 528
Threads: 59
Joined: Oct 2022
(20.12.2023, 21:49)Selur Wrote: Sorry, meant: AMD Ryzen 9 7950X for the cpu.
No, I use the Intel card separately, mainly for AV1 encoding.
clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3)
which is the general model:
0 = ESRGAN_SRx4_DF2KOST_official-ff704c30 (official ESRGAN x4 model)
1 = RealESRGAN_x2plus (x2 model for general images)
2 = RealESRGAN_x4plus (x4 model for general images)
3 = RealESRGAN_x4plus_anime_6B (x4 model optimized for anime images)
4 = realesr-animevideov3 (x4 model optimized for anime videos)
5 = realesr-general-x4v3 (tiny small x4 model for general scenes)
Cu Selur
There you go → model=5. Check.
So, once again, in a nutshell: you upscaled DVD/SD content of about 3 min, give or take, to 1080p using realesr-general (which is quite a bit faster than 4x, btw) in more or less real time (25x secs); that's about 20-ish frames per second, if your source is 25 fps.
If that info is about right... damn...
Using the same settings, my poor GPU (4060 Ti) achieves about 3 fps in Hybrid, which means the 4080 is like 700% faster... really? LOL!
And it's not like my CPU is the bottleneck; it has hardly anything to do, since it's mostly GPU-intensive!
I think I'll go for an NVIDIA Super card next year, for sure.
cheers,
...but erm, is it possible I get such slow speeds because I have to use sRestore on 50 fps content!?
But then again, that's more of a CPU task if you ask me...?
cheers,
TD
Posts: 10.985
Threads: 57
Joined: May 2017
20.12.2023, 22:05
(This post was last modified: 20.12.2023, 22:07 by Selur.)
Note that I used TensorRT, which takes quite a while to create a .engine file (one is created for each resolution), but is then quite a bit faster than ncnn.
But yes, after building the .engine file, processing ran at ~21fps: the 3:34.68 clip took 258.18 seconds (20.79 fps).
Looking at the CUDA and RT core counts (4060 Ti vs. 4080):
            4060 Ti   4080
CUDA cores  4352      9728
RT cores    32        76
and taking into account that the 4080 uses faster memory, I would have expected it to be ~2.5x faster.
=> I suspect you didn't use TensorRT, which could explain the speed difference.
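To put numbers on that suspicion, here is a rough sketch comparing the core-count ratios with the speeds reported in this thread. The core counts are the ones quoted above; the 4060 Ti speed is the approximate "about 3 fps" figure, so treat the result as a ballpark only.

```python
# Core counts as quoted in the post above (4060 Ti vs. 4080).
CUDA_CORES = {"4060 Ti": 4352, "4080": 9728}
RT_CORES = {"4060 Ti": 32, "4080": 76}

# Speeds reported in this thread, in fps. The 4060 Ti value is the rough
# "about 3 fps" figure; the 4080 value is the TensorRT run.
MEASURED_FPS = {"4060 Ti": 3.0, "4080": 20.79}

def ratio(values, fast="4080", slow="4060 Ti"):
    """How many times larger the 4080 value is than the 4060 Ti value."""
    return values[fast] / values[slow]

cuda_ratio = ratio(CUDA_CORES)
rt_ratio = ratio(RT_CORES)
observed_ratio = ratio(MEASURED_FPS)

print(f"CUDA cores: {cuda_ratio:.2f}x, RT cores: {rt_ratio:.2f}x, "
      f"observed speed: {observed_ratio:.2f}x")
```

The hardware alone accounts for a factor of roughly 2.2 to 2.4, while the observed gap is close to 7x, which is why a difference in settings is the more likely explanation.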
Quote:...but erm, is it possible I get such slow speeds because I have to use sRestore on 50 fps content!?
But then again, that's more of a CPU task if you ask me...?
What speeds do you get if you use, for example, spline64 as the resizer (instead of RealESRGAN)? I still suspect it's mainly due to not using TensorRT.
Cu Selur
Posts: 528
Threads: 59
Joined: Oct 2022
(20.12.2023, 22:05)Selur Wrote: Note that I used TensorRT, which takes quite a while to create a .engine file (one is created for each resolution), but is then quite a bit faster than ncnn.
But yes, after building the .engine file, processing ran at ~21fps: the 3:34.68 clip took 258.18 seconds (20.79 fps).
Looking at the CUDA and RT core counts (4060 Ti vs. 4080):
            4060 Ti   4080
CUDA cores  4352      9728
RT cores    32        76
and taking into account that the 4080 uses faster memory, I would have expected it to be ~2.5x faster.
=> I suspect you didn't use TensorRT, which could explain the speed difference.
Cu Selur
First off: indeed, twice as many cores, but 3x the price (minimum), and the same applies to power consumption, LoL.
Very true, you guessed right. I did tick TensorRT first, then both TensorRT + fp16 (for faster speeds, or so it claims, but...). It took SO LONG for the preview that I thought it would be much slower using TensorRT...
But like you said, if it's doing a prep step and we users don't know that, because we just don't see all the ongoing processes in Hybrid, you know... If you put it like that, it might be faster overall.
But is it, now? Have you added up the time it took to prepare before the actual encode started?
Btw, my poor 4060 Ti has increased its speed to 3.04 fps in the meantime.
cheers,
TD
Posts: 10.985
Threads: 57
Joined: May 2017
20.12.2023, 22:16
(This post was last modified: 20.12.2023, 22:16 by Selur.)
Quote:But is it, now? Have you added up the time it took to prepare before the actual encode started?
The .engine file is built per set of settings, so you can reuse it for different files as long as the settings stay the same.
In my case it was named:
realesr-general-x4v3.pth_NVIDIA GeForce RTX 4080_trt-8.6.1_720x576_fp16_workspace-1073741824_denoise-0.5.pt
(vs-mlrt uses .engine as extension)
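To make it visible which settings are baked into that name, here is a small parsing sketch. It simply splits the file name shown above on underscores; the field labels are my reading of the name, not something taken from the vs-realesrgan source, and the split would not work as-is for model names that themselves contain underscores.

```python
ENGINE_NAME = ("realesr-general-x4v3.pth_NVIDIA GeForce RTX 4080_trt-8.6.1_"
               "720x576_fp16_workspace-1073741824_denoise-0.5.pt")

def parse_engine_name(name):
    """Split the cache file name into the settings it appears to encode.

    Assumption: the seven fields are separated by underscores and the
    model name itself contains none (true for this example only).
    """
    stem = name[:-len(".pt")] if name.endswith(".pt") else name
    model, gpu, trt, resolution, precision, workspace, denoise = stem.split("_")
    width, height = (int(v) for v in resolution.split("x"))
    return {
        "model": model,
        "gpu": gpu,
        "tensorrt": trt.split("-", 1)[1],
        "width": width,
        "height": height,
        "precision": precision,
        "workspace_bytes": int(workspace.split("-", 1)[1]),
        "denoise": float(denoise.split("-", 1)[1]),
    }

info = parse_engine_name(ENGINE_NAME)
for key, value in info.items():
    print(f"{key}: {value}")
```

If any of these values changes (another resolution, another GPU, fp32 instead of fp16), the name changes too, which fits with a new engine being built in that case.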
Using TensorRT doesn't make sense for a short clip, but it does make sense for longer material.
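As a rough way to judge where "longer" starts, one could estimate the break-even point like this. All three input numbers below are hypothetical placeholders (no build time was measured in this thread, and the two speeds should come from the same card), so plug in your own measurements.

```python
def break_even_frames(build_seconds, fps_fast, fps_slow):
    """Number of frames after which 'build engine + fast run' beats 'slow run'.

    build_seconds: one-time engine build time
    fps_fast:      speed with the prebuilt engine
    fps_slow:      speed without it
    """
    if fps_fast <= fps_slow:
        raise ValueError("fps_fast must be higher than fps_slow")
    saved_per_frame = 1.0 / fps_slow - 1.0 / fps_fast  # seconds saved per frame
    return build_seconds / saved_per_frame

# Hypothetical example values, not measurements from this thread:
build_seconds = 600.0   # assumed 10 minute engine build
fps_fast = 20.0         # assumed speed with TensorRT
fps_slow = 8.0          # assumed speed without TensorRT

frames = break_even_frames(build_seconds, fps_fast, fps_slow)
print(f"Break-even after ~{frames:.0f} frames "
      f"(~{frames / 25:.0f} s of 25 fps material)")
```

Since the engine is reused for later files with the same settings, the build time only has to be paid back once across all of them.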
Cu Selur