Selur's Little Message Board

Full Version: Feature Request: Dynamic TensorRT models and build options
Hi @Selur,
I have a few questions and suggestions.

First, could you add support for dynamic TensorRT models? Specifically, the ability to include min, opt, and max shapes during the model building process.

Second, would it be possible not to disable certain optimizations by default (like -Jitt cudnn...), or alternatively, allow the user to select a "classic" build using only a flag like --fp16, --bf16, or --fp32? This would let trtexec find the best optimizations on its own.
Here's why I'm asking: for some models, when I create them the same way the hybrid build does by default, I get drastically slower FPS or tile-FPS. This is in comparison to a classic build using a simple command like:
trtexec --onnx=model.onnx --saveEngine=model.engine --shapes=input:1x3x1080x1920 --fp16 --verbose
This issue is particularly noticeable with more complex and demanding GAN models.

A third point: I'm not 100% certain, but I believe the author of the vs-mlrt addon managed to enable fp16 for tensorrt_RTX. I'm not sure if they did this by first quantizing the model with NVIDIA's ModelOpt or through the standard onnxconverter-common process. As we know, the standard tensorrt_rtx build doesn't natively support the --bf16 and --fp16 flags.
Thanks a lot for all your work, @Selur.
Best regards.
Quote:First, could you add support for dynamic TensorRT models? Specifically, the ability to include min, opt, and max shapes during the model building process.
Assuming you are referring to vs-mlrt and
TRT_RTX:
Code:
static_shape: bool = True
min_shapes: typing.Tuple[int, int] = (0, 0)
opt_shapes: typing.Optional[typing.Tuple[int, int]] = None
max_shapes: typing.Optional[typing.Tuple[int, int]] = None
TRT:
Code:
static_shape: bool = True
min_shapes: typing.Tuple[int, int] = (0, 0)
max_shapes: typing.Optional[typing.Tuple[int, int]] = None
opt_shapes: typing.Optional[typing.Tuple[int, int]] = None
In theory yes, but I would need to know:
a. What are allowed values for those tuples?
b. Are these independent of tiling settings?
c. What are the defaults for min_shapes and opt_shapes?
=> if you can answer these, I can look into adding support for non-static shapes for TRT&TRT_RTX.
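For reference, TensorRT optimization profiles require min <= opt <= max for every dynamic dimension. A minimal sketch of that constraint, assuming the tuples are (width, height) as in the signatures quoted above; the helper name is hypothetical, not part of vsmlrt:

```python
from typing import Optional, Tuple


def validate_shapes(
    min_shapes: Tuple[int, int],
    opt_shapes: Optional[Tuple[int, int]],
    max_shapes: Optional[Tuple[int, int]],
) -> bool:
    """Check min <= opt <= max per dimension, mirroring TensorRT's
    optimization-profile rule. Tuples assumed to be (width, height)."""
    if opt_shapes is None or max_shapes is None:
        # a dynamic build needs all three profiles
        return False
    if any(v <= 0 for v in opt_shapes + max_shapes):
        return False
    return all(
        mn <= op <= mx
        for mn, op, mx in zip(min_shapes, opt_shapes, max_shapes)
    )
```

So e.g. min=(8, 8), opt=(1920, 1080), max=(3840, 2160) would be a valid profile, while an opt below min would not.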

Quote:Second, would it be possible not to disable certain optimizations by default (like -Jitt cudnn...), or alternatively, allow the user to select a "classic" build using only a flag like --fp16, --bf16, or --fp32? This would let trtexec find the best optimizations on its own.
Here's why I'm asking: for some models, when I create them the same way the hybrid build does by default, I get drastically slower FPS or tile-FPS. This is in comparison to a classic build using a simple command like:
trtexec --onnx=model.onnx --saveEngine=model.engine --shapes=input:1x3x1080x1920 --fp16 --verbose
This issue is particularly noticeable with more complex and demanding GAN models.
I don't see how I could do that while using vsmlrt.py and its convenience wrappers.
=> If you can tell me how to do this, I can think about adding support for it.

Quote:A third point: I'm not 100% certain, but I believe the author of the vs-mlrt addon managed to enable fp16 for tensorrt_RTX. I'm not sure if they did this by first quantizing the model with NVIDIA's ModelOpt or through the standard onnxconverter-common process. As we know, the standard tensorrt_rtx build doesn't natively support the --bf16 and --fp16 flags.
afaik:
v15.13.cu13 (latest TensorRT libraries): does not support TRT_RTX at all.
v15.13.ort (latest ONNX Runtime libraries): fp16 inference for RIFE v2 and SAFA models, as well as fp32/fp16 inference for some SwinIR models, is not currently working in TRT_RTX.
Hybrid itself has allowed setting FP16 with TRT_RTX in the dev versions for a week or so.
It allows calling SCUNet for example with:
Code:
clip = vsmlrt.SCUNet(clip=clip, model=4, overlap=16, backend=Backend.TRT_RTX(fp16=True, device_id=0, verbose=True, use_cuda_graph=True, num_streams=3, builder_optimization_level=3, engine_folder="J:/TRT"))

Cu Selur
The main goal and advantage of generating dynamic TensorRT models is to avoid having to create a new model for every single input resolution. This would make things much more efficient.
I noticed that the author of VideoJaNai (which also uses vsmlrt as a backend) is doing something similar. 
[Image: 1.png]

[Image: 2.png]

[Image: 3.png]
https://github.com/the-database/VideoJaNai/

To create a single dynamic model that supports a wide range of resolutions (from an 8-pixel video all the way up to 1080p), they use the following trtexec command:
--fp16 --minShapes=input:1x3x8x8 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference

This is in contrast to their static model creation, which is tied to a specific resolution:
--fp16 --optShapes=input:%video_resolution% --inputIOFormats=fp16:chw --outputIOFormats=fp16:chw --tacticSources=+CUDNN,-CUBLAS,-CUBLAS_LT --skipInference

I was thinking, could you implement a similar approach? It would be great if hybrid and vsmlrt could use these dynamic models, making them universal for almost all resolutions.

The ideal workflow would be: once a dynamic model is created for a specific ONNX file, the system would reuse that same TensorRT model for any future processing with that ONNX, as long as the new input material fits within the dimensional range the model was built for (i.e., between minShapes and maxShapes).
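The reuse rule described above boils down to a simple range check. A minimal sketch, assuming a hypothetical helper (this is not Hybrid's actual logic, just an illustration of the proposed workflow):

```python
from typing import Tuple


def engine_fits(
    width: int,
    height: int,
    min_shape: Tuple[int, int],  # (min_width, min_height) the engine was built with
    max_shape: Tuple[int, int],  # (max_width, max_height) the engine was built with
) -> bool:
    """Return True if an existing dynamic engine can serve this input,
    i.e. the clip dimensions lie between minShapes and maxShapes."""
    return (min_shape[0] <= width <= max_shape[0]
            and min_shape[1] <= height <= max_shape[1])
```

If the check passes, the cached engine is reused; otherwise a new build (or a separate engine) would be needed.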

If this is too complicated to look into right now, no worries at all—maybe something to consider for the future? I'm not exactly sure which values would need to be changed, but I thought it was a promising idea to share! Big Grin
Okay, that does not answer my questions clearly. Sad
But after some reading,...

a. min_shapes in vsmlrt is the minimum resolution of the content the built engine can handle.
b. max_shapes in vsmlrt is the maximum resolution of the content the built engine can handle.
c. opt_shapes in vsmlrt is the resolution the engine is actually optimized for.
The more min and max differ from opt, the less reliable/efficient the engine becomes.

So if the content you apply a model to only comes in a few resolutions, you get better results building a separate engine for each resolution.
If the resolutions of your content cluster around a few fixed resolutions, creating multiple engine files around those fixed resolutions is a good idea.
Using something like:
Code:
min_shapes = (720, 480)       # smallest resolution
opt_shapes = (1920, 1080)     # most frequent resolution
max_shapes = (3840, 2160)     # highest resolution
static_shape = False
might be 'okay' if you only want to create one engine, but the min and max resolutions will be less efficient than individual models.
If you use tiling, note that the tile size is the inference resolution, so if you always use 256x256 tiling it does not make sense to use dynamic shapes.
Likewise, if a model requires a specific minimum resolution, it does not make sense to use a min_shapes value below that resolution.
It would be interesting to read about some tests and experiences regarding how much divergence between opt<>min and opt<>max is 'okay' before one really should create a separate engine file.
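As an illustration of the "multiple engines around a few fixed resolutions" idea, a sketch that picks, for a given clip, the prebuilt engine whose opt resolution is closest by pixel count (names and cache layout are hypothetical):

```python
from typing import Dict, Tuple


def pick_engine(
    width: int,
    height: int,
    engines: Dict[str, Tuple[int, int]],  # engine file -> (opt_width, opt_height)
) -> str:
    """Choose the engine whose opt resolution is closest to the input,
    measured by absolute pixel-count difference."""
    return min(
        engines,
        key=lambda name: abs(engines[name][0] * engines[name][1] - width * height),
    )
```

For example, with engines built at 720x480, 1920x1080, and 3840x2160, a 704x576 clip would be routed to the 720x480 engine.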

Cu Selur

PS: I sent you a link to a test version => let me know whether it works as expected.
Thanks for the test version. 
It looks like you've successfully implemented the dynamic TensorRT build.
I don't think there's any visual difference in the output file when using the dynamic model. It just runs a bit slower when the resolution isn't optimal, and the performance difference can vary from model to model.

I tested the SCUNet vsmlrt plugin by creating a single engine for resolutions from 8x8 to 4096x4096. I then tried it with clips ranging from 176x144 to 4K input, and it worked perfectly without needing to rebuild the TensorRT engine. The dynamic engine did its job as expected. I assume that DPIR and the other internal vsmlrt filters are working correctly too.

However, I'm getting a syntax-related error in the resize section; please see the screenshot and the error text.
[Image: EGRr4o-Zj8p.png]
Code:
# Imports
import vapoursynth as vs
# getting Vapoursynth core
import logging
import site
import sys
import os
core = vs.core
# Import scripts folder
scriptPath = 'C:/Program Files/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
os.environ["CUDA_MODULE_LOADING"] = "LAZY"
# Force logging to std:err
logging.StreamHandler(sys.stderr)
# loading plugins
core.std.LoadPlugin(path="C:/Program Files/Hybrid/64bit/vsfilters/Support/vszip.dll")
core.std.LoadPlugin(path="C:/Program Files/Hybrid/64bit/vs-mlrt/vstrt.dll")
core.std.LoadPlugin(path="C:/Program Files/Hybrid/64bit/vsfilters/Support/fmtconv.dll")
core.std.LoadPlugin(path="C:/Program Files/Hybrid/64bit/vsfilters/SourceFilter/LSmashSource/LSMASHSource.dll")
# Import scripts
from importlib.machinery import SourceFileLoader
vsmlrt = SourceFileLoader('vsmlrt', 'C:/Program Files/Hybrid/64bit/vs-mlrt/vsmlrt.py').load_module()
from vsmlrt import Backend
import validate
# Source: 'F:\REMASTERIZACIJA\KLIPOVI ONLY BD\15.februar\davor\PRINC RIMA.AVI'
# Current color space: YUV422P8, bit depth: 8, resolution: 320x240, frame rate: 15fps, scanorder: progressive, yuv luminance scale: full, matrix: 470bg, format: JPEG
# Loading F:\REMASTERIZACIJA\KLIPOVI ONLY BD\15.februar\davor\PRINC RIMA.AVI using LWLibavSource
clip = core.lsmas.LWLibavSource(source="F:/REMASTERIZACIJA/KLIPOVI ONLY BD/15.februar/davor/PRINC RIMA.AVI", format="YUV422P8", stream_index=0, cache=0, prefer_hw=0)
frame = clip.get_frame(0)
# setting color matrix to 470bg.
clip = core.std.SetFrameProps(clip, _Matrix=vs.MATRIX_BT470_BG)
# setting color transfer (vs.TRANSFER_BT601), if it is not set.
if validate.transferIsInvalid(clip):
  clip = core.std.SetFrameProps(clip=clip, _Transfer=vs.TRANSFER_BT601)
# setting color primaries info (to vs.PRIMARIES_BT470_BG), if it is not set.
if validate.primariesIsInvalid(clip):
  clip = core.std.SetFrameProps(clip=clip, _Primaries=vs.PRIMARIES_BT470_BG)
# setting color range to PC (full) range.
clip = core.std.SetFrameProps(clip=clip, _ColorRange=vs.RANGE_FULL)
# making sure frame rate is set to 15fps
clip = core.std.AssumeFPS(clip=clip, fpsnum=15, fpsden=1)
# making sure the detected scan type is set (detected: progressive)
clip = core.std.SetFrameProps(clip=clip, _FieldBased=vs.FIELD_PROGRESSIVE) # progressive
# adjusting color space from YUV422P8 to RGBH for vsVSMLRT
clip = core.resize.Bicubic(clip=clip, format=vs.RGBH, matrix_in_s="470bg", range_in_s="full", range_s="full")
# resizing using VSMLRT, target: 640x480
clip = vsmlrt.inference([clip],network_path="D:/ONNX MODELI/2x_FeMaSR_SRX2_model_g-sim.onnx", backend=Backend.TRT(fp16=True,device_id=0,bf16=False,num_streams=1,verbose=True,use_cuda_graph=True,workspace=1073741824,builder_optimization_level=3,engine_folder="C:/Users/admin/Videos/HYBRIDMODELI",,static_shape=False,opt_shapes=[1280,1280],min_shapes=[8,8],max_shapes=[2560,2560])) # 640x480
# making sure 0-1 limits are respected
clip = core.vszip.Limiter(clip=clip, min=[0,0,0], max=[1,1,1])
# adjusting output color from: RGBH to YUV444P16 for FFV1Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV444P16, matrix_s="470bg", range_in_s="full", range_s="full") # additional resize to allow target color sampling
# set output frame rate to 15fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=15, fpsden=1)
# output
clip.set_output()


Quote:2025-09-07 20:09:39.260
Failed to evaluate the script:
Python exception: invalid syntax (tempPreviewVapoursynthFile20_09_39_120.vpy, line 47)
Traceback (most recent call last):
File "src/cython/vapoursynth.pyx", line 3364, in vapoursynth._vpy_evaluate
File "C:\Users\admin\AppData\Local\Temp\tempPreviewVapoursynthFile20_09_39_120.vpy", line 47
clip = vsmlrt.inference([clip],network_path="D:/ONNX MODELI/2x_FeMaSR_SRX2_model_g-sim.onnx", backend=Backend.TRT(fp16=True,device_id=0,bf16=False,num_streams=1,verbose=True,use_cuda_graph=True,workspace=1073741824,builder_optimization_level=3,engine_folder="C:/Users/admin/Videos/HYBRIDMODELI",,static_shape=False,opt_shapes=[1280,1280],min_shapes=[8,8],max_shapes=[2560,2560])) # 640x480
^
SyntaxError: invalid syntax



 The same thing happens when I try to enable vsmlrt with dynamic dimensions under "VS-Others-Vsmlrt". I'm guessing it's a small and subtle bug in the code.

Let me know if you need a more detailed log. I have to admit, I forgot where the log-debug files are saved, but I do know how to enable the option in Hybrid Tongue
Typo: there was one comma too many.
=> I updated the Hybrid_dynamic_shapes download, which hopefully fixes the problem.

Cu Selur