Torch Addon ...
#1
Howdy, 


Today I want to try out the torch add-on for the first time, now that I own a 40xx series NVIDIA card xD.
So, I dumped all the folders into the x64 folder, right?

My question: does an (unpacked) file size of 30+ GB sound about right?

If so, my poor 256GB OS drive ...


cheers,
TD
#2
Yes, extracting all the add-ons requires ~39GB of free space.
  • onnx_models ~8.96GB
  • Vapoursynth_torch ~15.5GB
  • vsgan_models ~8.51GB
  • vs-mlrt ~6.04GB
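
For anyone checking whether the add-ons will fit before extracting, here is a minimal Python sketch that sums the sizes listed above and compares the total to the free space on a drive. The helper names (`total_size_gb`, `addons_fit`, `free_space_gb`) and the 5 GB safety margin are assumptions for illustration; they are not part of Hybrid.

```python
import shutil

# Approximate unpacked sizes in GB, taken from the list above.
ADDON_SIZES_GB = {
    "onnx_models": 8.96,
    "Vapoursynth_torch": 15.5,
    "vsgan_models": 8.51,
    "vs-mlrt": 6.04,
}


def total_size_gb(sizes=ADDON_SIZES_GB):
    """Sum of all add-on sizes in GB."""
    return sum(sizes.values())


def addons_fit(free_gb, margin_gb=5.0, sizes=ADDON_SIZES_GB):
    """True if free_gb covers all add-ons plus a (hypothetical) safety margin."""
    return free_gb >= total_size_gb(sizes) + margin_gb


def free_space_gb(path):
    """Free space on the drive containing path, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    free = free_space_gb(".")
    print(f"needed: ~{total_size_gb():.2f} GB, free: {free:.1f} GB, fits: {addons_fit(free)}")
```

Pass the drive where Hybrid is installed instead of `"."` to check the right volume.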

Cu Selur
#3
(20.12.2023, 20:56)Selur Wrote: Yes, extracting all the add-ons requires ~39GB of free space.
  • onnx_models ~8.96GB
  • Vapoursynth_torch ~15.5GB
  • vsgan_models ~8.51GB
  • vs-mlrt ~6.04GB

Cu Selur

Damn 0^0 .. thanks for the confirmation.

Also, which green GPU do you have, Selur, and what fps do you get using x4 ESRGAN on a 720x576 SD source upscaled to 1080p?
My 4060 Ti can barely hold it together.

thanks,
td

....I must say, really impressive results, without the need for denoise/sharpening filters ^^ .. duck like very much hmm.. ° ^ °

Oddly, 2x looks much better than 4x, and it's way less GPU intensive too 0_o .. CPU utilization is almost non-existent, LoL.
I am curious to see what the performance will be on the new 40xx Super cards or even the 50xx NVIDIA series.
#4
In my system, I have:
  • NVIDIA GeForce RTX 4080
  • Intel® Arc™ A380
  • AMD Radeon™ of my AMD Ryzen 9 9850X
I usually use the NVIDIA card for machine learning stuff.

No clue about the speed,...
Did a quick test, resizing 720x576 (PAR 16x15) to 1920x1440(PAR 1x1) through:
# Imports
import ctypes
import os
import site
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# Adding torch dependencies to PATH
path = site.getsitepackages()[0]+'/torch_dependencies/bin/'
ctypes.windll.kernel32.SetDllDirectoryW(path)
path = path.replace('\\', '/')
os.environ["PATH"] = path + os.pathsep + os.environ["PATH"]
# Loading Plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/Support/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/DGDecNV/DGDecodeNV.dll")
# source: 'G:\TestClips&Co\files\MPEG-2\pal_lpcm.vob'
# current color space: YUV420P8, bit depth: 8, resolution: 720x576, fps: 25, color matrix: 470bg, yuv luminance scale: limited, scanorder: progressive
# Loading G:\TestClips&Co\files\MPEG-2\pal_lpcm.vob using DGSource
clip = core.dgdecodenv.DGSource("J:/tmp/vob_4c623effa02143dabae36eb7e17be929_853323747.dgi")# 25 fps, scanorder: progressive
# Setting detected color matrix (470bg).
clip = core.std.SetFrameProps(clip, _Matrix=5)
# Setting color transfer info (470bg), when it is not set
clip = clip if core.text.FrameProps(clip,'_Transfer') else core.std.SetFrameProps(clip, _Transfer=5)
# Setting color primaries info (5), when it is not set
clip = clip if core.text.FrameProps(clip,'_Primaries') else core.std.SetFrameProps(clip, _Primaries=5)
# Setting color range to TV (limited) range.
clip = core.std.SetFrameProp(clip=clip, prop="_ColorRange", intval=1)
# making sure frame rate is set to 25
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
clip = core.std.SetFrameProp(clip=clip, prop="_FieldBased", intval=0) # progressive
from vsrealesrgan import realesrgan as RealESRGAN
# adjusting color space from YUV420P8 to RGBH for vsRealESRGAN
clip = core.resize.Bicubic(clip=clip, format=vs.RGBH, matrix_in_s="470bg", range_s="limited")
# resizing using RealESRGAN
clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3) # 2880x2304
# resizing 2880x2304 to 1920x1440
# adjusting resizing
clip = core.resize.Bicubic(clip=clip, format=vs.RGBS, range_s="limited")
clip = core.fmtc.resample(clip=clip, w=1920, h=1440, kernel="spline64", interlaced=False, interlacedd=False)
# adjusting output color from: RGBS to YUV420P10 for x265Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P10, matrix_s="470bg", range_s="limited", dither_type="error_diffusion")
# set output frame rate to 25fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# Output
clip.set_output()
(after waiting ages until the .engine file was built)
VSPipe.exe  c:\Users\Selur\Desktop\Testing.vpy --progress -c y4m NUL
reports:
Script evaluation done in 3.88 seconds
Output 5367 frames in 258.18 seconds (20.79 fps)

Cu Selur
#5
(20.12.2023, 21:26)Selur Wrote: In my system, I got:
  • NVIDIA GeForce RTX 4080
  • Intel® Arc™ A380
  • AMD Radeon™ of my AMD Ryzen 9 9850X
I usually use the NVIDIA card when using machine learning stuff.

First off: 9850X!? Are you still on an AM3 AMD Phenom quad-core system, perhaps?
I assume you meant to write R9 5950X, right? Just one step higher than mine, the 5900X.

Also, are you using the Intel Arc GPU in conjunction with your NVIDIA card in an SLI config? Is that even possible, using mixed brands in one system?


(20.12.2023, 21:26)Selur Wrote: No clue about the speed,...
Did a quick test, resizing 720x576 (PAR 16x15) to 1920x1440(PAR 1x1) through:


I guess you did, and you do know..

Didn't you process ↓ a PAL clip of roughly 53xx frames in about 25x seconds, at an average speed of 20 fps? That is very impressive if true!

(20.12.2023, 21:26)Selur Wrote: Script evaluation done in 3.88 seconds
Output 5367 frames in 258.18 seconds (20.79 fps)


Cu Selur
 


Also, I can see you used ESRGAN, but which model did you use? E.g. 2xPlus, 4xPlus, realsr?
That makes a BIG difference, quality and speed wise!

cheers,
#6
Sorry, meant: AMD Ryzen 9 7950X for the cpu.
No, I use the Intel card separately, mainly for AV1 encoding.

I used https://github.com/HolyWu/vs-realesrgan
clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3)
which is the general model:
0 = ESRGAN_SRx4_DF2KOST_official-ff704c30 (official ESRGAN x4 model)
1 = RealESRGAN_x2plus (x2 model for general images)
2 = RealESRGAN_x4plus (x4 model for general images)
3 = RealESRGAN_x4plus_anime_6B (x4 model optimized for anime images)
4 = realesr-animevideov3 (x4 model optimized for anime videos)
5 = realesr-general-x4v3 (tiny small x4 model for general scenes)
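
To make the `model` index easier to read in a script, the list above can be written as a lookup table. This is only a sketch: the dictionary `MODELS` and the helper `upscaled_size` are hypothetical names, the scale factors are read off the descriptions in the list, and nothing here calls vs-realesrgan itself.

```python
# model index -> (model name, upscale factor), as listed above.
MODELS = {
    0: ("ESRGAN_SRx4_DF2KOST_official-ff704c30", 4),
    1: ("RealESRGAN_x2plus", 2),
    2: ("RealESRGAN_x4plus", 4),
    3: ("RealESRGAN_x4plus_anime_6B", 4),
    4: ("realesr-animevideov3", 4),
    5: ("realesr-general-x4v3", 4),
}


def upscaled_size(model, width, height):
    """Return (model name, output width, output height) for a source resolution."""
    name, scale = MODELS[model]
    return name, width * scale, height * scale


print(upscaled_size(5, 720, 576))
```

For model 5 and a 720x576 source this gives 2880x2304, matching the comment in the script from post #4.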
Cu Selur
#7
(20.12.2023, 21:49)Selur Wrote: Sorry, meant: AMD Ryzen 9 7950X for the cpu.
No, I use the Intel card separately, mainly for AV1 encoding.


clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3)
which is the general model:
0 = ESRGAN_SRx4_DF2KOST_official-ff704c30 (official ESRGAN x4 model)
1 = RealESRGAN_x2plus (x2 model for general images)
2 = RealESRGAN_x4plus (x4 model for general images)
3 = RealESRGAN_x4plus_anime_6B (x4 model optimized for anime images)
4 = realesr-animevideov3 (x4 model optimized for anime videos)
5 = realesr-general-x4v3 (tiny small x4 model for general scenes)
Cu Selur

There you go → model=5. Check.

So again, in a nutshell: you upscaled DVD/SD content of about 3 minutes, give or take, to 1080p using realesr-general (which is quite a bit faster than 4x, btw) in more or less real time (25x seconds). That's about 20-ish frames per second for a 25 fps source, uhuh..

If that info is about right.. damn ...

Using the same settings, my poor GPU (4060 Ti) achieves about 3 fps in Hybrid, which means the 4080 is about 7 times as fast ... really? LOL!
And it's not like my CPU is the bottleneck; it has hardly anything to do, since this is mostly GPU intensive!

I think I'll go for an NVIDIA Super card next year, for sure.

cheers,

...but erm, is it possible that I get such slow speeds because I have to use sRestore on 50fps content?
But then again, that's more of a CPU task if you ask me ...?


cheers,
TD
#8
Note that I use TensorRT, which takes quite a while to create a .engine file (one is created for each resolution), but is then quite a bit faster than ncnn.
But yes, after building the .engine file, processing ran at ~21fps: the 3:34.68 clip took 258.18 seconds (20.79 fps).
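
The numbers above can be cross-checked with a few lines of Python. This is just arithmetic on the VSPipe output from post #4 (5367 frames, 258.18 seconds) and the 25 fps source rate; the function names are made up for illustration.

```python
def clip_seconds(frames, source_fps):
    """Playback duration of the clip in seconds."""
    return frames / source_fps


def processing_fps(frames, elapsed_seconds):
    """Average processing speed in frames per second."""
    return frames / elapsed_seconds


def format_duration(seconds):
    """Format seconds as m:ss.ss."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:05.2f}"


frames, elapsed, source_fps = 5367, 258.18, 25

duration = clip_seconds(frames, source_fps)
speed = processing_fps(frames, elapsed)

print(format_duration(duration))          # clip length
print(round(speed, 2))                    # processing fps
print(round(speed / source_fps, 2))       # fraction of real-time speed
```

So the run was a bit slower than real time: roughly 0.83 of the 25 fps playback rate.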

Looking at the CUDA and RT core counts (4060 Ti vs 4080):
              4060 Ti    4080
CUDA cores    4352       9728
RT cores      32         76
and taking into account that the 4080 uses faster memory, I would have expected it to be ~2.5x faster.
=> I suspect you didn't use TensorRT, which could explain the speed difference.
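
As a rough illustration of why the gap looks suspicious, the sketch below compares the CUDA core ratio with the ratio of the reported speeds. The 3 fps figure is TD's number from post #7 and 20.79 fps is from the TensorRT run above; the two runs used different settings, so this is not a like-for-like benchmark, and the helper name `speedup` is made up.

```python
CUDA_CORES = {"4060 Ti": 4352, "4080": 9728}
# Reported speeds from this thread (different backends, so only indicative).
REPORTED_FPS = {"4060 Ti": 3.0, "4080": 20.79}


def speedup(values, fast="4080", slow="4060 Ti"):
    """Ratio of the faster card's value to the slower card's value."""
    return values[fast] / values[slow]


core_ratio = speedup(CUDA_CORES)
fps_ratio = speedup(REPORTED_FPS)

print(f"CUDA core ratio:    {core_ratio:.2f}x")
print(f"reported fps ratio: {fps_ratio:.2f}x")
```

The reported fps ratio is about three times larger than the core ratio, which is more than faster memory alone would be expected to explain.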

Quote:...but erm... is it possible i have such slow speeds... cause i have to use sRestore on 50fps content !??
but then again, that's more of a cpu task if you ask me ...?
What speeds do you get if you use, for example, spline64 as the resizer instead of RealESRGAN? (I still suspect it's mainly due to not using TensorRT.)


Cu Selur
#9
(20.12.2023, 22:05)Selur Wrote: Note that I use TensorRT, which takes quite a while to create a .engine file (one is created for each resolution), but is then quite a bit faster than ncnn.
But yes, after building the .engine file, processing ran at ~21fps: the 3:34.68 clip took 258.18 seconds (20.79 fps).

Looking at the CUDA and RT core counts (4060 Ti vs 4080):
              4060 Ti    4080
CUDA cores    4352       9728
RT cores      32         76
and taking into account that the 4080 uses faster memory, I would have expected it to be ~2.5x faster.
=> I suspect you didn't use TensorRT, which could explain the speed difference.

Cu Selur



First off, indeed twice as many cores, but 3x the price (minimum..), and the same applies to power consumption, LoL.

Very true, you guessed right. I did try TensorRT first, then TensorRT + fp16 (for more speed, so it claims, but..). It took SO LONG for the preview that I thought it would be much slower using TensorRT.

But like you said, if it's doing a preparation step and we users don't know that, because we just don't see all the ongoing processes in Hybrid.. If you put it like that, it might be faster overall.

But is it, really? Have you added up the time it took to prepare before the actual encode started?

Btw, my poor 4060 Ti has increased its speed to 3.04 in the meantime.

cheers,
TD
#10
Quote:but is it now ? Have you add up the time it took to prepare before the actuall encode started ?
The .engine file is built for a specific set of settings, so you can reuse it for different files as long as the settings stay the same.
In my case it was named:
realesr-general-x4v3.pth_NVIDIA GeForce RTX 4080_trt-8.6.1_720x576_fp16_workspace-1073741824_denoise-0.5.pt
(vs-mlrt uses .engine as extension)
Using TensorRT doesn't make sense for a short clip, but it does make sense for longer material.
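
To put a rough number on "short" versus "longer", here is a hedged break-even sketch. It assumes a one-time engine build of `build_seconds` followed by processing at `fps_with_trt`, compared with processing at `fps_without_trt` from the start. All three inputs in the example are hypothetical placeholders, since the actual build time was not measured in this thread.

```python
import math


def break_even_frames(build_seconds, fps_without_trt, fps_with_trt):
    """Number of frames after which (engine build + TensorRT run) beats the slower path.

    Solves build_seconds + n / fps_with_trt = n / fps_without_trt for n.
    Returns math.inf if TensorRT is not faster per frame.
    """
    if fps_with_trt <= fps_without_trt:
        return math.inf
    return build_seconds / (1.0 / fps_without_trt - 1.0 / fps_with_trt)


# Hypothetical example: 10 minute engine build, 10 fps without and 20 fps with TensorRT.
frames = break_even_frames(build_seconds=600, fps_without_trt=10.0, fps_with_trt=20.0)
print(f"break-even after ~{frames:.0f} frames (~{frames / 25 / 60:.1f} min of 25 fps video)")
```

Once the engine file exists and is reused for later clips with the same settings, `build_seconds` is effectively zero and TensorRT is ahead from the first frame.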

Cu Selur

