
ToiletDuck · 20.12.2023, 20:50

Howdy,

Today I want to try out the torch add-ons for the first time, now that I own a 40xx-series NVIDIA card xD.
So I dumped all the folders into the x64 folder... right?

My question: does an unpacked file size of 30+ GB sound about right? Huh

If so, my poor 256 GB system drive...

cheers,
TD

***Selur*** · (This post was last modified: 20.12.2023, 20:56 by Selur.)

Yes, extracting all the add-ons requires ~39GB of free space.

onnx_models ~8.96GB
Vapoursynth_torch ~15.5GB
vsgan_models ~8.51GB
vs-mlrt ~6.04GB
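If you want to check a drive before unpacking, a small Python sketch like the one below compares its free space with the sizes listed above. The dictionary just restates the numbers from this post; measuring in decimal GB (10**9 bytes) is an assumption, since the forum figures could also be GiB, so keep some headroom.

```python
import shutil

# Approximate unpacked sizes in GB, copied from the list above.
ADDON_SIZES_GB = {
    "onnx_models": 8.96,
    "Vapoursynth_torch": 15.5,
    "vsgan_models": 8.51,
    "vs-mlrt": 6.04,
}


def total_required_gb(sizes=ADDON_SIZES_GB):
    """Total size of the given add-ons, rounded to 2 decimals."""
    return round(sum(sizes.values()), 2)


def has_room(target_dir, sizes=ADDON_SIZES_GB):
    """True if the drive holding target_dir has at least the total free.

    Free space is measured in decimal GB (10**9 bytes); whether the forum
    figures are GB or GiB is an assumption, so leave some headroom.
    """
    free_gb = shutil.disk_usage(target_dir).free / 10**9
    return free_gb >= total_required_gb(sizes)


if __name__ == "__main__":
    print(f"Needed: ~{total_required_gb()} GB")
    print("Enough free space here:", has_room("."))
```

The four sizes add up to 39.01 GB, which matches the ~39GB figure.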

Cu Selur

ToiletDuck · 20.12.2023, 21:04

(20.12.2023, 20:56) Selur Wrote: Yes, extracting all the add-ons requires ~39GB of free space.
onnx_models ~8.96GB
Vapoursynth_torch ~15.5GB
vsgan_models ~8.51GB
vs-mlrt ~6.04GB

Cu Selur

Damn 0^0... thanks for the confirmation.

Also, which green GPU do you have, Selur? And what fps do you get using x4 ESRGAN to upscale 720x576 SD to 1080p? Big Grin

My 4060 Ti can barely hold it together. Confused

thanks,
td

...Must say, really impressive results, without the need for denoise/sharpening filters ^^. Duck likes very much, hmm... ° ^ °

Oddly, 2x looks much better than 4x, and it's way less GPU-intensive too 0_o. CPU utilization is practically nonexistent LoL.
I'm curious to see what the performance will be on the new 40xx Super cards or even the NVIDIA 50xx series. Tongue

...indeed.

***Selur*** · 20.12.2023, 21:26

In my system, I got:

NVIDIA GeForce RTX 4080
Intel® Arc™ A380
AMD Radeon™ of my AMD Ryzen 9 9850X

I usually use the NVIDIA card when using machine learning stuff.

No clue about the speed...
I did a quick test, resizing 720x576 (PAR 16:15) to 1920x1440 (PAR 1:1) with:

# Imports
import ctypes
import os
import site
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# Adding torch dependencies to PATH
path = site.getsitepackages()[0]+'/torch_dependencies/bin/'
ctypes.windll.kernel32.SetDllDirectoryW(path)
path = path.replace('\\', '/')
os.environ["PATH"] = path + os.pathsep + os.environ["PATH"]
# Loading Plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/Support/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/DGDecNV/DGDecodeNV.dll")
# source: 'G:\TestClips&Co\files\MPEG-2\pal_lpcm.vob'
# current color space: YUV420P8, bit depth: 8, resolution: 720x576, fps: 25, color matrix: 470bg, yuv luminance scale: limited, scanorder: progressive
# Loading G:\TestClips&Co\files\MPEG-2\pal_lpcm.vob using DGSource
clip = core.dgdecodenv.DGSource("J:/tmp/vob_4c623effa02143dabae36eb7e17be929_853323747.dgi") # 25 fps, scanorder: progressive
# Setting detected color matrix (470bg).
clip = core.std.SetFrameProps(clip, _Matrix=5)

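# Setting color transfer info (470bg), when it is not set.
# core.text.FrameProps() returns a clip (a text overlay), not the property value,
# and a clip is always truthy, so it cannot serve as the condition here.
# Checking the props of the first frame does what the comment intends.
props = clip.get_frame(0).props
if '_Transfer' not in props:
    clip = core.std.SetFrameProps(clip, _Transfer=5)
# Setting color primaries info (470bg), when it is not set
if '_Primaries' not in props:
    clip = core.std.SetFrameProps(clip, _Primaries=5)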

# Setting color range to TV (limited) range.
clip = core.std.SetFrameProp(clip=clip, prop="_ColorRange", intval=1)
# making sure frame rate is set to 25
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
clip = core.std.SetFrameProp(clip=clip, prop="_FieldBased", intval=0) # progressive
from vsrealesrgan import realesrgan as RealESRGAN
# adjusting color space from YUV420P8 to RGBH for vsRealESRGAN
clip = core.resize.Bicubic(clip=clip, format=vs.RGBH, matrix_in_s="470bg", range_s="limited")
# resizing using RealESRGAN
clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3) # 2880x2304
# resizing 2880x2304 to 1920x1440
# adjusting resizing
clip = core.resize.Bicubic(clip=clip, format=vs.RGBS, range_s="limited")
clip = core.fmtc.resample(clip=clip, w=1920, h=1440, kernel="spline64", interlaced=False, interlacedd=False)
# adjusting output color from: RGBS to YUV420P10 for x265Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P10, matrix_s="470bg", range_s="limited", dither_type="error_diffusion")
# set output frame rate to 25fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# Output
clip.set_output()

(after waiting ages for the .engine file to be built)

VSPipe.exe c:\Users\Selur\Desktop\Testing.vpy --progress -c y4m NUL

reports:

Script evaluation done in 3.88 seconds

Output 5367 frames in 258.18 seconds (20.79 fps)
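The numbers in this test fit together. As a plain-arithmetic check in Python (not part of Hybrid or VapourSynth; the helper names are made up for illustration): a 720x576 source with PAR 16:15 is a 4:3 picture, which is why 1920x1440 is the square-pixel target, and 5367 frames in 258.18 s is the reported 20.79 fps for a 3:34.68 clip at 25 fps.

```python
from fractions import Fraction


def square_pixel_width(width, par):
    """Width after correcting a pixel aspect ratio (PAR) to 1:1."""
    return Fraction(width) * par


def fps(frames, seconds):
    """Average throughput, rounded to 2 decimals as in the output above."""
    return round(frames / seconds, 2)


def clip_length(frames, frame_rate):
    """Clip duration formatted as m:ss.ss."""
    minutes, secs = divmod(frames / frame_rate, 60)
    return f"{int(minutes)}:{secs:05.2f}"


if __name__ == "__main__":
    # 720x576 with PAR 16:15 displays as 768x576, i.e. 4:3 like 1920x1440.
    width = square_pixel_width(720, Fraction(16, 15))
    print(width / 576 == Fraction(1920, 1440))
    # Size of the x4 model output before the fmtc.resample step.
    print(720 * 4, 576 * 4)
    print(fps(5367, 258.18), clip_length(5367, 25))
```

Note that going from 2880x2304 to 1920x1440 scales by 2/3 horizontally but by 0.625 vertically; that non-uniform step is where the PAR correction happens.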

Cu Selur

ToiletDuck · 20.12.2023, 21:42

(20.12.2023, 21:26) Selur Wrote: In my system, I got:
NVIDIA GeForce RTX 4080
Intel® Arc™ A380
AMD Radeon™ of my AMD Ryzen 9 9850X

I usually use the NVIDIA card when using machine learning stuff.

First off → 9850X? Dodgy

Are you still on an AM3 AMD Phenom quad-core system, mayhap? Tongue

I assume you meant to write R9 5950X, right? Just one step above mine → 5900X Big Grin

Also, are you using the Intel Arc GPU in conjunction with your NVIDIA card in an SLI config? Is that even possible, mixing brands in one system? Huh

(20.12.2023, 21:26) Selur Wrote: No clue about the speed,...
Did a quick test, resizing 720x576 (PAR 16x15) to 1920x1440 (PAR 1x1) through:

I guess you did, and you do know...

Haven't you done ↓ a 53xx-ish-frame PAL clip in like 25x seconds at a 20 fps average? That's very impressive speed, if true!

(20.12.2023, 21:26) Selur Wrote: Script evaluation done in 3.88 seconds
Output 5367 frames in 258.18 seconds (20.79 fps)

Cu Selur

Also, I can see you used ESRGAN, but which model did you use? E.g. 2xPlus, 4xPlus, realsr?
BIG ↑ difference, quality- and speed-wise!!

cheers,

***Selur*** · (This post was last modified: 20.12.2023, 21:51 by Selur.)

Sorry, I meant: AMD Ryzen 9 7950X for the CPU.
No, I use the Intel card separately, mainly for AV1 encoding.

I used https://github.com/HolyWu/vs-realesrgan

clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3)

which is the general model (5 in this list):

0 = ESRGAN_SRx4_DF2KOST_official-ff704c30 (official ESRGAN x4 model)
1 = RealESRGAN_x2plus (x2 model for general images)
2 = RealESRGAN_x4plus (x4 model for general images)
3 = RealESRGAN_x4plus_anime_6B (x4 model optimized for anime images)
4 = realesr-animevideov3 (x4 model optimized for anime videos)
5 = realesr-general-x4v3 (tiny small x4 model for general scenes)
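To see what each model index does to the resolution, here is a small Python lookup table built from the models listed above, indices 0 to 5 in the order given. The scale factors are taken from the model descriptions; this is a hypothetical planning helper, not part of vs-realesrgan's API, and newer vs-realesrgan releases may number their models differently.

```python
# Model indices for vs-realesrgan's model parameter, as listed in this thread.
# Scale factors come from the model descriptions (x2 / x4).
MODELS = {
    0: ("ESRGAN_SRx4_DF2KOST_official-ff704c30", 4),
    1: ("RealESRGAN_x2plus", 2),
    2: ("RealESRGAN_x4plus", 4),
    3: ("RealESRGAN_x4plus_anime_6B", 4),
    4: ("realesr-animevideov3", 4),
    5: ("realesr-general-x4v3", 4),
}


def output_size(model, width, height):
    """Return (model name, output width, output height) for an input size."""
    name, scale = MODELS[model]
    return name, width * scale, height * scale


if __name__ == "__main__":
    for index in MODELS:
        print(index, *output_size(index, 720, 576))
```

For the 720x576 source, model 5 gives 2880x2304 (as in the script comment), while the x2 model gives 1440x1152, so reaching 1920x1440 with it still needs a conventional upscale afterwards.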

Cu Selur

ToiletDuck · 20.12.2023, 21:55

(20.12.2023, 21:49) Selur Wrote: Sorry, meant: AMD Ryzen 9 7950X for the cpu.
No, I use the Intel card separately, mainly for AV1 encoding.

clip = RealESRGAN(clip=clip, model=5, device_index=0, trt=True, trt_cache_path=r"J:\tmp", num_streams=3)
which is the general model:

0 = ESRGAN_SRx4_DF2KOST_official-ff704c30 (official ESRGAN x4 model)
1 = RealESRGAN_x2plus (x2 model for general images)
2 = RealESRGAN_x4plus (x4 model for general images)
3 = RealESRGAN_x4plus_anime_6B (x4 model optimized for anime images)
4 = realesr-animevideov3 (x4 model optimized for anime videos)
5 = realesr-general-x4v3 (tiny small x4 model for general scenes)

Cu Selur

There you go... → model=5 Check Wink

So again, in a nutshell: you upscaled about 3 min (give or take) of DVD/SD content to 1080p using realesr-general (which is quite a bit faster than 4xPlus btw) more or less in real time (25x secs). That's about 20-ish frames per second for a 25 fps source, uh-huh...

If that info is about right... damn...

With the same settings, my poor GPU (4060 Ti) achieves like 3 fps in Hybrid. Huh

...that means the 4080 is about 7x as fast... really? LOL!
And it's not like my CPU is the bottleneck... it has hardly anything to do, since the work is mostly GPU-intensive!!

I think I'll go for an NVIDIA Super card next year... for sure. Tongue

cheers,

...but erm... is it possible I get such slow speeds because I have to use sRestore on 50 fps content!??
But then again, that's more of a CPU task if you ask me...?

cheers,
TD

***Selur*** · (This post was last modified: 20.12.2023, 22:07 by Selur.)

Note that I use TensorRT, which takes quite a while to create the .engine file (one is created for each resolution), but is then quite a bit faster than ncnn.
But yes, after building the .engine file, processing ran at ~21 fps, and the 3:34.68 clip took 258.18 seconds (20.79 fps).

Looking at the CUDA and RT core counts (4060 Ti vs 4080):

                4060 Ti   4080
CUDA cores      4352      9728
RT cores        34        76

and taking into account that the 4080 uses faster memory, I would have expected it to be ~2.5x faster.
=> I suspect you didn't use TensorRT, which could explain the speed difference.
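Putting the two ratios side by side in Python makes the argument concrete: the CUDA core counts quoted above predict roughly 2.2x, while the fps figures reported in this thread (3.04 vs 20.79) differ by roughly 6.8x. The two runs are not the same workload (TD's script also runs sRestore on 50 fps content), so this is only a rough sketch suggesting that hardware alone does not explain the gap.

```python
# CUDA core counts as quoted above.
CUDA_CORES = {"RTX 4060 Ti": 4352, "RTX 4080": 9728}


def speed_ratio(slow, fast):
    """How many times larger `fast` is than `slow`, rounded to 2 decimals."""
    return round(fast / slow, 2)


if __name__ == "__main__":
    hardware = speed_ratio(CUDA_CORES["RTX 4060 Ti"], CUDA_CORES["RTX 4080"])
    measured = speed_ratio(3.04, 20.79)  # fps values reported in this thread
    print(hardware, measured)
```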

Quote: ...but erm... is it possible I get such slow speeds because I have to use sRestore on 50 fps content!??
But then again, that's more of a CPU task if you ask me...?

What speeds do you get if you use, for example, Spline64 as the resizer instead of RealESRGAN? (I still suspect it's mainly due to not using TensorRT.)

Cu Selur

ToiletDuck · 20.12.2023, 22:10

(20.12.2023, 22:05) Selur Wrote: Note that I use TensorRT, which takes quite a while to create the .engine file (one is created for each resolution), but is then quite a bit faster than ncnn.
But yes, after building the .engine file, processing ran at ~21 fps, and the 3:34.68 clip took 258.18 seconds (20.79 fps).

Looking at the CUDA and RT core counts (4060 Ti vs 4080):

                4060 Ti   4080
CUDA cores      4352      9728
RT cores        34        76
and taking into account that the 4080 uses faster memory, I would have expected it to be ~2.5x faster.
=> I suspect you didn't use TensorRT, which could explain the speed difference.

Cu Selur

First off, indeed: twice as many cores, but 3x the price (minimum...), and the same goes for power consumption LoL Tongue

Very true... you guessed right. I did check TensorRT first, then both TensorRT + fp16 (faster, so it claims, but...)... Indeed, the preview took SOO LONG that I thought it would be much slower using TensorRT...

But like you said, if it's doing some preparation and we users don't know that because we don't see all the ongoing processes in Hybrid, you know... If you put it like that, it might be faster overall...

But is it, though? Have you added up the time it took to prepare before the actual encode started?

Btw... my poor 4060 Ti has increased its speed to 3.04 fps in the meantime Tongue

cheers,
TD

***Selur*** · (This post was last modified: 20.12.2023, 22:16 by Selur.)

Quote: But is it, though? Have you added up the time it took to prepare before the actual encode started?

The .engine file is built for a specific set of settings, so you can reuse it for different files as long as the settings stay the same...
In my case it was named:

realesr-general-x4v3.pth_NVIDIA GeForce RTX 4080_trt-8.6.1_720x576_fp16_workspace-1073741824_denoise-0.5.pt

(vs-mlrt uses .engine as extension)
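The file name shows which settings a cached engine depends on: model file, GPU, TensorRT version, input resolution, precision, workspace size and denoise strength; changing any of them means a new build. The hypothetical Python helper below only reproduces the layout of that one observed name; the field order is inferred from this single example, not taken from vs-realesrgan's source.

```python
def trt_cache_name(model_file, gpu, trt_version, width, height,
                   precision="fp16", workspace=1 << 30, denoise=0.5):
    """Build a cache file name in the same layout as the example above.

    Hypothetical helper: the layout is inferred from one observed file name.
    1 << 30 bytes (1 GiB) is the workspace value seen in that name.
    """
    return (f"{model_file}_{gpu}_trt-{trt_version}_{width}x{height}_"
            f"{precision}_workspace-{workspace}_denoise-{denoise}.pt")


if __name__ == "__main__":
    # Same settings, different input resolution -> a different cache file.
    print(trt_cache_name("realesr-general-x4v3.pth", "NVIDIA GeForce RTX 4080",
                         "8.6.1", 720, 576))
    print(trt_cache_name("realesr-general-x4v3.pth", "NVIDIA GeForce RTX 4080",
                         "8.6.1", 720, 480))
```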
Using TensorRT doesn't make sense for short clips, but it does for longer material.
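Whether the one-time build pays off is simple arithmetic: it does once the time saved per frame, summed over the clip, exceeds the build time. A minimal Python sketch, assuming you have measured the build time and both frame rates yourself; no real measurements are implied here.

```python
def break_even_frames(build_seconds, fps_without, fps_with):
    """Frames after which a one-time engine build pays off.

    Solves build_seconds + n / fps_with = n / fps_without for n.
    """
    if fps_with <= fps_without:
        raise ValueError("TensorRT must be faster for a break-even to exist")
    per_frame_saving = 1 / fps_without - 1 / fps_with
    return build_seconds / per_frame_saving


if __name__ == "__main__":
    # Purely hypothetical figures, for illustration only.
    print(break_even_frames(600, 5, 20))
```

With those hypothetical figures (600 s build, 5 fps without TensorRT, 20 fps with it) the build pays off after about 4000 frames, i.e. 160 s of 25 fps video. Because the engine is reused as long as the settings stay the same, later clips with identical settings don't pay the build cost again.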

Cu Selur
