[HELP] Things went wrong after saving global profile with VS yuv420 chroma location left


RE: Things went wrong after saving global profile with VS yuv420 chroma location left - andrewschen - 21.06.2026

Quote:Here it does not happen mainly when using drag&drop, but if that is not the case for you the culprit must be somewhere else.

So I ran a few more drag&drop tests because of what you said.

Loading the same file twice:

1. first drag&drop, then open dialog: error -> this is the case I tested earlier; I had not noticed the other cases below
2. first open dialog, then drag&drop: ok
3. drag&drop twice: ok
4. open dialog twice: error

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - Selur - 21.06.2026

I uploaded a new dev version, which seems to fix the issue here.
I added some additional restrictions on when the VapourSynth script gets refreshed, so it now happens less frequently.
This seems to help, but I'm not sure whether it has some unintended side effect.

=> Let me know whether this dev fixes the problem for you.

Cu Selur

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - andrewschen - 21.06.2026

I'm downloading the new dev. If I want to move the already installed add-ons from the old version's directory to the new one, what is the right way to do that? Is it enough to delete the __pycache__ folders in \Hybrid_dev_folder\64bit\Vapoursynth\Lib\site-packages\ and \Hybrid_dev_folder\64bit\vs-mlrt, two folders in total?

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - Selur - 21.06.2026

For vs-mlrt: move all the vs-mlrt related folders to the side, then copy them back after uninstalling and reinstalling Hybrid.
For the torch add-on: move all the torch add-on folders (especially the Vapoursynth folder) to the side; after uninstalling and reinstalling Hybrid, delete the existing Vapoursynth folder and copy the torch add-on folders back.
For avisynth 32bit: same as with vs-mlrt, move all related folders to the side and copy them back after a clean install.
(In all cases, delete the __pycache__ folders.)
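
The "delete __pycache__ folders" step can also be scripted. The following is only a minimal sketch, not something Hybrid ships: the function name `remove_pycache` is made up for illustration, and the folder you pass in is whatever your own Hybrid install directory happens to be.

```python
import shutil
from pathlib import Path


def remove_pycache(root: Path) -> list[Path]:
    """Delete every __pycache__ directory below root; return the removed paths."""
    removed = []
    # Materialize the search result first so deleting does not disturb the walk.
    for cache_dir in sorted(root.rglob("__pycache__")):
        # A cache nested in an already deleted one no longer exists; skip it.
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

Calling `remove_pycache(Path(...))` with the Hybrid folder returns the list of deleted directories, so you can print it and check that nothing unexpected was touched. Python recreates the caches on the next run.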

Cu Selur

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - andrewschen - 21.06.2026

Hybrid_dev_2026.06.21-184241: loading error fixed.

I guess that's all for now. I might try reinstalling Windows later.

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - andrewschen - 22.06.2026

Gemini told me I can change part of vsmlrt.py from

Quote: if input_format != output_format:
    raise ValueError("input format must be equal to output format")

to

Quote: output_format = input_format
if input_format != output_format:
    raise ValueError("input format must be equal to output format")

Now DPIRDenoise(mlrt) (TensorRT FP16) is running at my original speed of 15 fps, and SCUNet(mlrt) (TensorRT FP16) is back to 3~5 fps.

Could this cause any problems?

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - Selur - 22.06.2026

You are basically removing that check.
So, yes, if that check was intended, it can cause problems.
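
To illustrate why the added line removes the check, here is a small standalone sketch. It is not the vsmlrt.py code: the function names and the "fp16"/"fp32" values are placeholders chosen for the example.

```python
def check_formats(input_format: str, output_format: str) -> None:
    """Original-style guard: reject mismatched formats."""
    if input_format != output_format:
        raise ValueError("input format must be equal to output format")


def check_formats_patched(input_format: str, output_format: str) -> None:
    """Patched guard: the assignment makes both names refer to the same value."""
    output_format = input_format  # overwrites whatever the caller passed in
    if input_format != output_format:  # can never be true after the assignment
        raise ValueError("input format must be equal to output format")


def raises(func, *args) -> bool:
    """Return True if calling func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


print(raises(check_formats, "fp16", "fp32"))          # True
print(raises(check_formats_patched, "fp16", "fp32"))  # False
```

The patched variant silently accepts the mismatch, so whatever the check was protecting against is no longer caught.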

This https://pastebin.com/hyHQ42dg is the vsmlrt.py used on my system (3.23.1). It is the current version, part of vs-mlrt_2026.06.06, and also the latest version in the project: https://github.com/AmusementClub/vs-mlrt/blob/master/scripts/vsmlrt.py

What version are you using?
Does "3.22.39" work for you (https://github.com/AmusementClub/vs-mlrt/blob/8e330a145d556c0f46af089f9e719f47ac66baff/scripts/vsmlrt.py)?

Cu Selur

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - andrewschen - 22.06.2026

The test I did was run on the same version, 3.23.1, included in vs-mlrt_2026.06.06, only with the extra line "output_format = input_format" added.

So I replaced it with version 3.22.39 and ran another test (cleared the engine folder and deleted all the __pycache__ folders in the Hybrid dir), but now it fails with both TensorRT and TensorRT FP16. Below is the error log for TensorRT only:

Quote:2026-06-22 19:37:46.556
Plugin D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt/vstrt.dll is using API3 which is deprecated and will be removed shortly.
Plugin D:/APPD/Hybrid_dev_20260621-184241/64bit/vsfilters/SourceFilter/LSmashSource/LSMASHSource.dll is using API3 which is deprecated and will be removed shortly.
2026-06-22 19:37:47.149
Failed to evaluate the script:
Python exception: trtexec execution fails, log has been written to C:\Users\mmddffkk\AppData\Local\Temp\trtexec_260622_193746.log

Traceback (most recent call last):
  File "vapoursynth.pyx", line 3623, in vapoursynth._vpy_evaluate
  File "vapoursynth.pyx", line 3624, in vapoursynth._vpy_evaluate
  File "D:\APPD\HybridFiles\Temp\tempPreviewVapoursynthFile19_37_46_002.vpy", line 52, in <module>
    clip = vsmlrt.DPIR(clip, strength=5.000, overlap=16, model=1, backend=Backend.TRT(fp16=False,device_id=0,bf16=False,verbose=True,use_cuda_graph=False,num_streams=1,builder_optimization_level=3,engine_folder="D:/APPD/HybridFiles/Engine"))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt/vsmlrt.py", line 595, in DPIR
    clip = inference_with_fallback(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt/vsmlrt.py", line 3056, in inference_with_fallback
    raise e
  File "D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt/vsmlrt.py", line 3033, in inference_with_fallback
    ret = _inference(
          ^^^^^^^^^^^
  File "D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt/vsmlrt.py", line 2850, in _inference
    engine_path = trtexec(
                  ^^^^^^^^
  File "D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt/vsmlrt.py", line 2194, in trtexec
    raise RuntimeError(f"trtexec execution fails, log has been written to {log_filename}")
RuntimeError: trtexec execution fails, log has been written to C:\Users\mmddffkk\AppData\Local\Temp\trtexec_260622_193746.log

Update: here is trtexec_260622_193746.log:

Quote:&&&& RUNNING TensorRT.trtexec [TensorRT v110000] [b114] # D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt\vsmlrt-cuda\trtexec --onnx=D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt\models\dpir\drunet_color.onnx --timingCacheFile=D:/APPD/HybridFiles/Engine\469697a4.engine.cache --device=0 --saveEngine=D:/APPD/HybridFiles/Engine\469697a4.engine --shapes=input:1x4x816x1920 --verbose --tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN,+EDGE_MASK_CONVOLUTIONS,+JIT_CONVOLUTIONS --skipInference --noTF32 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --builderOptimizationLevel=3 --precisionConstraints=obey --layerPrecisions=Conv_123:fp32
=== Model Options ===
--onnx=<file> ONNX model

=== Build Options ===
--minShapes=spec Build with dynamic shapes using a profile with the min shapes provided
--optShapes=spec Build with dynamic shapes using a profile with the opt shapes provided
--maxShapes=spec Build with dynamic shapes using a profile with the max shapes provided
--inputIOFormats=spec Memory layout of each of the input tensors (default = chw)
See --outputIOFormats help for the grammar of format list.
Note: If this option is specified, please set comma-separated formats for all
inputs following the same order as network inputs ID (even if only one input
needs specifying IO format) or set the format once for broadcasting.
--outputIOFormats=spec Memory layout of each of the output tensors (default = chw)
Note: If this option is specified, please set comma-separated formats for all
outputs following the same order as network outputs ID (even if only one output
needs specifying IO format) or set the format once for broadcasting.
IO Formats: spec ::= IOfmt[","spec]
IOfmt ::= ("chw"|"chw2"|"hwc8"|"chw4"|"chw16"|"chw32"|"dhwc8"|
"cdhw32"|"hwc"|"dla_linear"|"dla_hwc4"|"hwc16"|"dhwc")["+"IOfmt]
--memPoolSize=poolspec Specify the size constraints of the designated memory pool(s)
Supports the following base-2 suffixes: B (Bytes), G (Gibibytes), K (Kibibytes), M (Mebibytes).
If none of suffixes is appended, the defualt unit is in MiB.
Note: Also accepts decimal sizes, e.g. 0.25M. Will be rounded down to the nearest integer bytes.
In particular, for dlaSRAM the bytes will be rounded down to the nearest power of 2.
Pool constraint: poolspec ::= poolfmt[","poolspec]
poolfmt ::= pool:size
pool ::= "workspace"|"dlaSRAM"|"dlaLocalDRAM"|"dlaGlobalDRAM"|"tacticSharedMem"
--profilingVerbosity=mode Specify profiling verbosity. mode ::= layer_names_only|detailed|none (default = layer_names_only).
Please only assign once.
--avgTiming=M Set the number of times averaged in each iteration for kernel selection (default = 8)
--refit Mark the engine as refittable. This will allow the inspection of refittable layers
and weights within the engine.
--stripWeights Strip weights from plan. This flag works with either refit or refit with identical weights. Default
to latter, but you can switch to the former by enabling both --stripWeights and --refit at the same
time.
--stripAllWeights Alias for combining the --refit and --stripWeights options. It marks all weights as refittable,
disregarding any performance impact. Additionally, it strips all refittable weights after the
engine is built.
--versionCompatible, --vc Mark the engine as version compatible. This allows the engine to be used with newer versions
of TensorRT on the same host OS, as well as TensorRT's dispatch and lean runtimes.
--pluginInstanceNorm, --pi Set `kNATIVE_INSTANCENORM` to false in the ONNX parser. This will cause the ONNX parser to use
a plugin InstanceNorm implementation over the native implementation when parsing.
--uint8AsymmetricQuantizationDLA Set `kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA` to true in the ONNX parser. This directs the
ONNX parser to allow UINT8 as a quantization data type and import zero point values directly
without converting to float type or all-zero values. Should only be set with DLA software version
>= 3.16.
--reportCapabilityDLA Set `kREPORT_CAPABILITY_DLA` to true in the ONNX parser. This signals the ONNX parser to validate
that all nodes in the model can run on DLA. This flag is set to be OFF by default.
--adjustForDLA Set `kADJUST_FOR_DLA` to true in the ONNX parser. This signals the ONNX parser to opportunistically
rewrite or modify layers to make them more amenable to running on DLA. This flag is set to be OFF
by default.
--enablePluginOverride Set `kENABLE_PLUGIN_OVERRIDE` to true in the ONNX parser. This allows the ONNX parser to use
a plugin implementation over the standard ONNX operator implementation when parsing.
--useRuntime=runtime TensorRT runtime to execute engine. "lean" and "dispatch" require loading VC engine and do
not support building an engine.
runtime::= "full"|"lean"|"dispatch"
--leanDLLPath=<file> External lean runtime DLL to use in version compatible mode.
--excludeLeanRuntime When --versionCompatible is enabled, this flag indicates that the generated engine should
not include an embedded lean runtime. If this is set, the user must explicitly specify a
valid lean runtime to use when loading the engine.
--monitorMemory Enable memory monitor report for debugging usage. (default = disabled)
Disables CUDA timing cache and profile streams. Only allowed when building
a safe engine (--safe) with remote auto-tuning (--remoteAutoTuningConfig).
(default = disabled)
--sparsity=spec Control sparsity (default = disabled).
Sparsity: spec ::= "disable", "enable", "force"
Note: Description about each of these options is as below
disable = do not enable sparse tactics in the builder (this is the default)
enable = enable sparse tactics in the builder (but these tactics will only be
considered if the weights have the right sparsity pattern)
force = enable sparse tactics in the builder and force-overwrite the weights to have
a sparsity pattern (even if you loaded a model yourself)
[Deprecated] this knob has been deprecated.
Please use <polygraphy surgeon prune> to rewrite the weights.
--noTF32 Disable tf32 precision (default is to enable tf32, in addition to fp32)
--stronglyTyped [Deprecated] Strongly typed network is now enabled by default. This flag is a no-op.
--directIO [Deprecated] Avoid reformatting at network boundaries. (default = disabled)
--layerDeviceTypes=spec Specify layer-specific device type.
The specs are read left-to-right, and later ones override earlier ones. If a layer does not have
a device type specified, the layer will opt for the default device type.
Per-layer device type spec ::= layerDeviceTypePair[","spec]
layerDeviceTypePair ::= layerName":"deviceType
deviceType ::= "GPU"|"DLA"
--decomposableAttentions=spec Specify decomposable attentions by comma-separated names.
The specs are read left-to-right, and later ones override earlier ones. Each layer name can
contain at most one wildcard ('*') character.
--safe Enable build safety certified engine.
If DLA is enabled, --buildDLAStandalone will be specified
--dumpKernelText Dump the kernel text to a file, only available when --safe is enabled
--buildDLAStandalone Enable build DLA standalone loadable which can be loaded by cuDLA, when this option is enabled,
--allowGPUFallback is disallowed and --skipInference is enabled by default. Additionally,
specifying --inputIOFormats and --outputIOFormats restricts memory layout
(default = disabled)
--allowGPUFallback When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
--consistency Perform consistency checking on safety certified engine
--saveEngine=<file> Save the serialized engine
--loadEngine=<file> Load a serialized engine
--asyncFileReader Load a serialized engine using async stream reader. Should be combined with --loadEngine.
--getPlanVersionOnly Print TensorRT version when loaded plan was created. Works without deserialization of the plan.
Use together with --loadEngine. Supported only for engines created with 8.6 and forward.
--tacticSources=tactics Specify the tactics to be used by adding (+) or removing (-) tactics from the default
tactic sources (default = all available tactics).
Note: Currently only edge mask convolutions and JIT convolutions are listed as optional
tactics.
Tactic Sources: tactics ::= tactic[","tactics]
tactic ::= (+|-)lib
lib ::= "EDGE_MASK_CONVOLUTIONS"|"JIT_CONVOLUTIONS"
For example, to disable edge mask convolutions: --tacticSources=-EDGE_MASK_CONVOLUTIONS
--noBuilderCache Disable timing cache in builder (default is to enable timing cache)
--noCompilationCache Disable Compilation cache in builder, and the cache is part of timing cache (default is to enable compilation cache)
--errorOnTimingCacheMiss Emit error when a tactic being timed is not present in the timing cache (default = false)
--timingCacheFile=<file> Save/load the serialized global timing cache
--preview=features Specify preview feature to be used by adding (+) or removing (-) preview features from the default
Preview Features: features ::= feature[","features]
feature ::= (+|-)flag
flag ::= "aliasedPluginIO1003"
|"runtimeActivationResize"
--builderOptimizationLevel Set the builder optimization level. (default is 3)
A Higher level allows TensorRT to spend more time searching for better optimization strategy.
Valid values include integers from 0 to the maximum optimization level, which is currently 5.
--maxTactics Set the maximum number of tactics to time when there is a choice of tactics. (default is -1)
Larger number of tactics allow TensorRT to spend more building time on evaluating tactics.
Default value -1 means TensorRT can decide the number of tactics based on its own heuristic.
--hardwareCompatibilityLevel=mode Make the engine file compatible with other GPU architectures. (default = none)
Hardware Compatibility Level: mode ::= "none" | "ampere+" | "sameComputeCapability"
none = no compatibility
ampere+ = compatible with Ampere and newer GPUs
sameComputeCapability = compatible with GPUs that have the same Compute Capability version
--runtimePlatform=platform Set the target platform for runtime execution. (default = SameAsBuild)
When this option is enabled, --skipInference is enabled by default.
RuntimePlatfrom: platform ::= "SameAsBuild" | "WindowsAMD64"
SameAsBuild = no requirement for cross-platform compatibility.
WindowsAMD64 = set the target platform for engine execution as Windows AMD64 system
--tempdir=<dir> Overrides the default temporary directory TensorRT will use when creating temporary files.
See IRuntime::setTemporaryDirectory API documentation for more information.
--tempfileControls=controls Controls what TensorRT is allowed to use when creating temporary executable files.
Should be a comma-separated list with entries in the format (in_memory|temporary):(allow|deny).
in_memory: Controls whether TensorRT is allowed to create temporary in-memory executable files.
temporary: Controls whether TensorRT is allowed to create temporary executable files in the
filesystem (in the directory given by --tempdir).
For example, to allow in-memory files and disallow temporary files:
--tempfileControls=in_memory:allow,temporary:deny
If a flag is unspecified, the default behavior is "allow".
--maxAuxStreams=N Set maximum number of auxiliary streams per inference stream that TRT is allowed to use to run
kernels in parallel if the network contains ops that can run in parallel, with the cost of more
memory usage. Set this to 0 for optimal memory usage. (default = using heuristics)
--profile Build with dynamic shapes using a profile with the min/max/opt shapes provided. Can be specified
multiple times to create multiple profiles with contiguous index.
(ex: --profile=0 --minShapes=<spec> --optShapes=<spec> --maxShapes=<spec> --profile=1 ...)
--allowWeightStreaming Enable a weight streaming engine. TensorRT will disable
weight streaming at runtime unless --weightStreamingBudget is specified.
--markDebug Specify list of names of tensors to be marked as debug tensors. Separate names with a comma
--markUnfusedTensorsAsDebugTensors Mark unfused tensors as debug tensors
--tilingOptimizationLevel Set the tiling optimization level. (default is 0)
A Higher level allows TensorRT to spend more time searching for better optimization strategy.
Valid values include integers from 0 to the maximum tiling optimization level(3).
--l2LimitForTiling Set the L2 cache usage limit for tiling optimization(default is -1)
--remoteAutoTuningConfig Set the remote auto tuning config. Must be specified with --safe.
Format: protocol://username[:password]@hostname[:port]?param1=value1&param2=value2
Example: ssh://user:pass@192.0.2.100:22?remote_exec_path=/opt/tensorrt/bin&remote_lib_path=/opt/tensorrt/lib
--refitFromOnnx Refit the loaded engine with the weights from the provided ONNX model.
The model should be identical to the one used to generate the engine.
--cpuOnly Build the engine with CPU-only mode. No local GPU is required on the build machine.
Must be specified with --remoteAutoTuningConfig and --safe flags.

=== Inference Options ===
--shapes=spec Set input shapes for dynamic shapes inference inputs.
Note: Input names can be wrapped with escaped single quotes (ex: 'Input:0').
Example input shapes spec: input0:1x3x256x256, input1:1x3x128x128
For scalars (0-D shapes), use input0:scalar or simply input0: with nothing after the colon.
Each input shape is supplied as a key-value pair where key is the input name and
value is the dimensions (including the batch dimension) to be used for that input.
Each key-value pair has the key and value separated using a colon (:).
Multiple input shapes can be provided via comma-separated key-value pairs, and each input
name can contain at most one wildcard ('*') character.
--loadInputs=spec Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
Input values spec ::= Ival[","spec]
Ival ::= name":"file
Consult the README for more information on generating files for custom inputs.
--iterations=N Run at least N inference iterations (default = 10)
--warmUp=N Run for N milliseconds to warmup before measuring performance (default = 200)
--duration=N Run performance measurements for at least N seconds wallclock time (default = 3)
If -1 is specified, inference will keep running unless stopped manually
--sleepTime=N Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
--idleTime=N Sleep N milliseconds between two continuous iterations(default = 0)
--infStreams=N Instantiate N execution contexts to run inference concurrently (default = 1)
--exposeDMA Serialize DMA transfers to and from device (default = disabled).
--includeDataTransfers Enable DMA transfers to and from device (default = disabled). Note some device-to-host
data transfers will remain if output dumping is enabled via the --dumpOutput or
--exportOutput flags.
--noDataTransfers [Deprecated] DMA transfers are now disabled by default. This flag is a no-op.
--useManagedMemory Use managed memory instead of separate host and device allocations (default = disabled).
--noSpinWait Disable spin wait and use blocking synchronization instead. This may reduce CPU usage but increase synchronization time (default = spin wait enabled)
--useSpinWait [Deprecated] Spin wait is now enabled by default. This flag is a no-op.
--threads Enable multithreading to drive engines with independent threads or speed up refitting (default = disabled)
--noCudaGraph Disable CUDA graph capture and launch (default = CUDA graph enabled).
--useCudaGraph [Deprecated] CUDA graph is now enabled by default. This flag is a no-op.
--timeDeserialize Time the amount of time it takes to deserialize the network and exit.
--timeRefit Time the amount of time it takes to refit the engine before inference.
--separateProfileRun [Deprecated] Separate profile run is now always enabled. This flag is a no-op.
--skipInference Exit after the engine has been built and skip inference perf measurement (default = disabled)
--persistentCacheRatio Set the persistentCacheLimit in ratio, 0.5 represent half of max persistent L2 size (default = 0)
--useProfile Set the optimization profile for the inference context (default = 0 ).
--allocationStrategy=spec Specify how the internal device memory for inference is allocated.
Strategy: spec ::= "static"|"profile"|"runtime"
static = Allocate device memory based on max size across all profiles.
profile = Allocate device memory based on max size of the current profile.
runtime = Allocate device memory based on the actual input shapes.
--saveDebugTensors Specify list of names of tensors to turn on the debug state
and filename to save raw outputs to.
These tensors must be specified as debug tensors during build time.
Input values spec ::= Ival[","spec]
Ival ::= name":"file
--saveAllDebugTensors Save all debug tensors to files.
Including debug tensors marked by --markDebug and --markUnfusedTensorsAsDebugTensors
Multiple file formats can be saved simultaneously.
Input values spec ::= format[","format]
format ::= "summary"|"numpy"|"string"|"raw"
--weightStreamingBudget Set the maximum amount of GPU memory TensorRT is allowed to use for weights.
It can take on the following values:
-2: (default) Disable weight streaming at runtime.
-1: TensorRT will automatically decide the budget.
0-100%: Percentage of streamable weights that reside on the GPU.
0% saves the most memory but will have the worst performance.
Requires the '%' character.
>=0B: The exact amount of streamable weights that reside on the GPU. Supports the
following base-2 suffixes: B (Bytes), G (Gibibytes), K (Kibibytes), M (Mebibytes).

=== Reporting Options ===
--verbose Use verbose logging (default = false)
--avgRuns=N Report performance measurements averaged over N consecutive iterations (default = 10)
--percentile=P1,P2,P3,... Report performance for the P1,P2,P3,... percentages (0<=P_i<=100, 0 representing max perf, and 100 representing min perf; (default = 90,95,99%)
--dumpRefit Print the refittable layers and weights from a refittable engine
--dumpOutput Print the output tensor(s) of the last inference iteration (default = disabled)
--dumpRawBindingsToFile Print the input/output tensor(s) of the last inference iteration to file(default = disabled)
--dumpProfile Print profile information per layer (default = disabled)
--dumpLayerInfo Print layer information of the engine to console (default = disabled)
--dumpOptimizationProfile Print the optimization profile(s) information (default = disabled)
--exportTimes=<file> Write the timing results in a json file (default = disabled)
--exportOutput=<file> Write the output tensors to a json file (default = disabled)
--exportProfile=<file> Write the profile information per layer in a json file (default = disabled)
--exportLayerInfo=<file> Write the layer information of the engine in a json file (default = disabled)

=== System Options ===
--device=N Select cuda device N (default = 0)
--staticPlugins Plugin library (.so) to load statically (can be specified multiple times)
--useDLACore=N Select DLA core N for layers that support DLA (default = none)
--staticPlugins Plugin library (.so) to load statically (can be specified multiple times)
--dynamicPlugins Plugin library (.so) to load dynamically and may be serialized with the engine if they are included in --setPluginsToSerialize (can be specified multiple times)
--setPluginsToSerialize Plugin library (.so) to be serialized with the engine (can be specified multiple times)
--ignoreParsedPluginLibs By default, when building a version-compatible engine, plugin libraries specified by the ONNX parser
are implicitly serialized with the engine (unless --excludeLeanRuntime is specified) and loaded dynamically.
Enable this flag to ignore these plugin libraries instead.
--safetyPlugins Plugin library (.so) for TensorRT auto safety to manually load safety plugins specified by the command line arguments.
Example: --safetyPlugins=/path/to/plugin_lib.so[pluginNamespace1::plugin1,pluginNamespace2::plugin2].
The option can be specified multiple times with different plugin libraries.

=== Help ===
--help, -h Print this message
[06/22/2026-19:37:47] [E] Invalid TensorFormat fp32:chw
&&&& FAILED TensorRT.trtexec [TensorRT v110000] [b114] # D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt\vsmlrt-cuda\trtexec --onnx=D:/APPD/Hybrid_dev_20260621-184241/64bit/vs-mlrt\models\dpir\drunet_color.onnx --timingCacheFile=D:/APPD/HybridFiles/Engine\469697a4.engine.cache --device=0 --saveEngine=D:/APPD/HybridFiles/Engine\469697a4.engine --shapes=input:1x4x816x1920 --verbose --tacticSources=-CUBLAS,-CUBLAS_LT,-CUDNN,+EDGE_MASK_CONVOLUTIONS,+JIT_CONVOLUTIONS --skipInference --noTF32 --inputIOFormats=fp32:chw --outputIOFormats=fp32:chw --builderOptimizationLevel=3 --precisionConstraints=obey --layerPrecisions=Conv_123:fp32
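
Since the log is mostly the trtexec usage text, the one line that matters is easy to miss. A rough sketch for pulling out only the error lines follows; it assumes errors are tagged `[E]` after a bracketed timestamp, as the `Invalid TensorFormat fp32:chw` line is in this log. The helper name `trtexec_errors` is made up for the example.

```python
import re

# Matches lines like "[06/22/2026-19:37:47] [E] some message" and captures the message.
ERROR_LINE = re.compile(r"^\[[^\]]+\] \[E\] (.*)$")


def trtexec_errors(log_text: str) -> list[str]:
    """Return the messages of all lines tagged [E] in a trtexec log."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            errors.append(match.group(1))
    return errors


sample = """=== Help ===
--help, -h Print this message
[06/22/2026-19:37:47] [E] Invalid TensorFormat fp32:chw
&&&& FAILED TensorRT.trtexec [TensorRT v110000] [b114]"""

print(trtexec_errors(sample))  # ['Invalid TensorFormat fp32:chw']
```

For the log above this leaves just the complaint about `fp32:chw`, the value passed to --inputIOFormats and --outputIOFormats on the command line.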

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - Selur - 22.06.2026

Okay, so 3.22.39 doesn't work with the current version.
=> I'm uploading vs-mlrt_2026.03.26 to the 'experimental/old' folder in the Google Drive share (should be up in ~40 min); maybe that version works for you.

Cu Selur

RE: Things went wrong after saving global profile with VS yuv420 chroma location left - Selur - 22.06.2026

vs-mlrt_2026.03.26 is up