4 hours ago
What “flexible output” is for in VS-MLRT
In normal VapourSynth usage, a filter usually returns a regular VideoNode: either a single-plane GRAY clip, a three-plane RGB clip, or a three-plane YUV clip. That works well for classic image/video restoration models, where the ONNX model takes an image and returns another image.
But not every ONNX model has that simple shape. Many ML models return tensors such as:
    1 x 2 x H x W
    1 x 4 x H x W
    1 x 6 x H x W
    1 x 8 x H x W
    1 x N x H x W
Those channels are not always “RGB”. They may be:
- U/V chroma planes
- alpha
- mask
- confidence map
- optical flow
- depth
- luma/chroma residuals
- multiple temporal outputs
- intermediate restoration planes
- auxiliary model outputs
- packed model-specific data
The problem is that a generic frontend like Hybrid cannot safely assume that a multi-channel ONNX output should be converted to RGB. If the model outputs 2 channels, 4 channels, 6 channels, or more, forcing it into a normal VapourSynth clip either fails or destroys the semantic meaning of the output.
That is exactly what VS-MLRT’s flexible output path solves.
VS-MLRT already exposes flexible_inference as a public API, alongside the regular inference function. In vsmlrt.py, flexible_inference is included in __all__, so it is intentionally part of the wrapper’s public interface, not just an internal hack.
How VS-MLRT flexible output works internally
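The normal inference() path returns one VideoNode. That is suitable when the ONNX output can be represented as a standard VapourSynth clip.
The flexible path is different. flexible_inference_with_fallback() calls _inference() with a flexible_output_prop argument. Instead of returning only a normal clip, VS-MLRT receives a dictionary containing:
    ret["clip"]
    ret["num_planes"]
Then it extracts each output channel separately:
    planes = [
        clip.std.PropToClip(prop=f"{flexible_output_prop}{i}")
        for i in range(num_planes)
    ]
and returns a Python list of VideoNode objects, one per output channel.
So conceptually:
    ONNX output: N x C x H x W
    regular inference:
        expects C to be representable as a normal clip
    flexible inference:
        exposes output channel 0 as clip[0]
        exposes output channel 1 as clip[1]
        exposes output channel 2 as clip[2]
        ...
        exposes output channel C-1 as clip[C-1]
That means Hybrid would not need to guess whether C=2, C=4, C=6, or C=8 means RGB, YUV, mask, alpha, flow, or something else. It can expose the channels, then let the user or script decide how to combine them.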
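As a rough illustration of that extraction step, here is a pure-Python sketch that needs neither VapourSynth nor VS-MLRT. It is not the VS-MLRT implementation: a plain dict stands in for the per-frame properties, and fake_backend, split_planes and the prop name "flex" are hypothetical names chosen for this example. It only shows the indexing scheme, where channel i is stored under the prop name plus i and then pulled out into its own entry.

```python
from typing import Any, Dict, List

Plane = List[List[float]]  # one H x W channel as nested lists


def fake_backend(channels: List[Plane], prop: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a backend call made with a flexible output prop.

    Each output channel is stored under the key "<prop><index>", and the
    channel count is reported next to it.
    """
    props = {f"{prop}{i}": plane for i, plane in enumerate(channels)}
    return {"clip": props, "num_planes": len(channels)}


def split_planes(ret: Dict[str, Any], prop: str) -> List[Plane]:
    """Return one entry per output channel, in channel order.

    A dict lookup plays the role that std.PropToClip plays on real clips.
    """
    clip = ret["clip"]
    return [clip[f"{prop}{i}"] for i in range(ret["num_planes"])]


# A 2-channel output with H=1, W=2, e.g. what a chroma model might return.
model_output = [[[0.25, 0.5]], [[0.75, 1.0]]]
planes = split_planes(fake_backend(model_output, "flex"), "flex")
print(len(planes))  # 2
```

The caller never has to know what the two channels mean; it only gets them back separately and in order.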
Why this matters for multi-channel models
A lot of ONNX models used in VapourSynth are not just “RGB in, RGB out”. Some are more like tensor processors.
For example:
    input:  1 x 3 x H x W
    output: 1 x 2 x H x W
This could mean the model predicts two chroma planes.
Another model might be:
    input:  1 x 6 x H x W
    output: 1 x 3 x H x W
That could mean two RGB frames are packed on input and one interpolated or restored RGB frame is returned.
Another one:
    input:  1 x 3 x H x W
    output: 1 x 4 x H x W
That might be RGB plus alpha, or RGB plus mask, or Y/U/V plus confidence. Without flexible output, Hybrid has no clean generic way to support that.
Another example:
    input:  1 x 3 x H x W
    output: 1 x 8 x H x W
This cannot be represented as a normal RGB/YUV/GRAY VapourSynth clip at all. But it is still a valid ONNX model, and VS-MLRT can expose those eight channels individually through flexible output.
That is the key point: the ONNX model is valid, and VS-MLRT can run it, but a frontend that only expects one normal clip cannot represent the result correctly.
Existing proof inside VS-MLRT: ArtCNN chroma models
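This is not only theoretical. The current vsmlrt.py already uses flexible output for ArtCNN chroma models.
For chroma variants, VS-MLRT calls:
    clip_u, clip_v = flexible_inference_with_fallback(...)
Then it reconstructs a YUV clip with:
    clip = core.std.ShufflePlanes([clip, clip_u, clip_v], [0, 0, 0], vs.YUV)
So the model output is not treated as one RGB image. Instead, the two output channels are extracted separately as clip_u and clip_v, then combined with the original luma plane.
That is an excellent example to show Selur, because it proves the feature is already useful in real VS-MLRT code:
    model output channels → separate VapourSynth clips → custom recombination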
For Hybrid, this means flexible output would not be an exotic feature. It would expose a capability VS-MLRT already uses internally.
Why forcing everything into RGB is wrong
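A frontend may be tempted to do something simple like:
    if output has 3 channels → RGB
    if output has 1 channel → GRAY
    else reject
That is safe only for very simple models.
But it breaks down quickly:
    2 channels → could be UV, flow x/y, mask pair, chroma residuals
    3 channels → not always RGB; could be YUV, Lab, residuals, flow+mask
    4 channels → could be RGBA, RGB+mask, YUV+alpha, 4 feature maps
    6 channels → could be two RGB frames, bidirectional flow, temporal packed data
    8+ channels → often model-specific tensor data
The channel count alone does not define the meaning.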
Flexible output avoids pretending that the frontend knows the model’s semantics. It gives Hybrid a lower-level, lossless way to access the model result.
That is important because with ML models, channel order is part of the model contract. Treating arbitrary channels as RGB can silently produce wrong colors, wrong masks, wrong temporal behavior, or completely meaningless output.
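To make this concrete, the sketch below implements the naive channel-count rule and runs it over the channel counts discussed in this post. The function name naive_clip_format and the returned strings are made up for this illustration; they are not part of Hybrid or VS-MLRT.

```python
from typing import Optional


def naive_clip_format(channels: int) -> Optional[str]:
    """Naive frontend rule: 3 channels -> RGB, 1 channel -> GRAY, else reject (None)."""
    if channels == 3:
        return "RGB"
    if channels == 1:
        return "GRAY"
    return None


# Channel counts mentioned in this post.
counts = [1, 2, 3, 4, 6, 8]
rejected = [c for c in counts if naive_clip_format(c) is None]
print(rejected)  # [2, 4, 6, 8]
```

Every rejected count belongs to a model that VS-MLRT could still run. Even the accepted 3-channel case is only a guess, because the rule cannot know whether those channels are RGB, YUV, or something else.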
Why Hybrid should implement it
The main reason is simple:
Hybrid should not be less capable than VS-MLRT itself.
If VS-MLRT can run a model and expose all output channels, Hybrid should ideally allow the user to access that functionality instead of blocking the model or forcing a wrong interpretation.
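The benefit for Hybrid would be:
1. Support more ONNX models.
2. Avoid wrong assumptions about output channel meaning.
3. Preserve model semantics.
4. Allow advanced users to combine channels manually.
5. Enable models that output masks, alpha, UV, flow, confidence, or auxiliary planes.
6. Use VS-MLRT’s native mechanism instead of inventing a workaround.
VS-MLRT’s own code path already passes flexible_output_prop into backend model calls when flexible output is requested, so the feature is designed to work through the existing backend infrastructure.
Quote:
VS-MLRT already supports flexible output through flexible_inference. This is useful for ONNX models whose output tensor has an arbitrary number of channels, not only 1 or 3. Some models output chroma planes, masks, alpha, optical flow, confidence maps, residuals, or other auxiliary data. Forcing such outputs into a normal RGB or GRAY VapourSynth clip either fails or destroys the model semantics.
With flexible output, VS-MLRT exposes every output channel as a separate VideoNode. Hybrid would not need to guess whether the output is RGB, YUV, UV, mask, alpha, or something else. It could simply expose the channels and let the user or script combine them correctly.
This is already used in VS-MLRT itself: for example, ArtCNN chroma models return separate U and V outputs through flexible_inference_with_fallback, and then VS-MLRT recombines them with the original luma plane into a YUV clip. So this is not a theoretical feature; it is already part of the intended VS-MLRT workflow.
Implementing flexible output in Hybrid would make Hybrid compatible with a broader class of ONNX models without modifying the models and without losing channel semantics.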
Suggested Hybrid-level behavior
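A practical Hybrid UI/script design could be:
    Mode: normal output
        Use vsmlrt.inference()
        Expect output to be directly usable as GRAY/RGB/YUV.
    Mode: flexible output
        Use vsmlrt.flexible_inference()
        Return/output channel clips separately:
            output_0
            output_1
            output_2
            ...
Then Hybrid could optionally provide helpers:
- combine first 3 channels as RGB
- combine first 3 channels as YUV
- use channel 0 as GRAY
- use channels 0/1 as UV
- use channel 3 as alpha/mask
- export all channels separately
But the important point is that these should be explicit choices, not automatic assumptions.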
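One way to keep such helpers explicit is to make the user name the helper, and have the code only pick the channels that helper asks for. The following is a minimal sketch under that assumption. The helper names, the HELPERS table and pick_channels are hypothetical and do not exist in Hybrid or VS-MLRT, and plain strings stand in for the per-channel clips.

```python
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical helper table: helper name -> channel indices it uses.
# None means "keep every channel".
HELPERS: Dict[str, Optional[Tuple[int, ...]]] = {
    "rgb": (0, 1, 2),
    "yuv": (0, 1, 2),
    "gray": (0,),
    "uv": (0, 1),
    "alpha": (3,),
    "all": None,
}


def pick_channels(planes: Sequence[T], helper: str) -> List[T]:
    """Return the channels the named helper uses, or raise if it does not fit."""
    if helper not in HELPERS:
        raise ValueError(f"unknown helper: {helper!r}")
    indices = HELPERS[helper]
    if indices is None:
        return list(planes)
    if max(indices) >= len(planes):
        raise ValueError(
            f"helper {helper!r} needs channel {max(indices)}, "
            f"but the model only produced {len(planes)} channel(s)"
        )
    return [planes[i] for i in indices]


# Stand-ins for the per-channel clips of a 4-channel model output.
planes = ["output_0", "output_1", "output_2", "output_3"]
print(pick_channels(planes, "rgb"))    # ['output_0', 'output_1', 'output_2']
print(pick_channels(planes, "alpha"))  # ['output_3']
```

Note that "rgb" and "yuv" select the same three channels. They differ only in how the channels would be combined afterwards, which is exactly the decision that has to come from the user rather than from the channel count.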
One-line summary
Flexible output is important because some ONNX models output tensors, not normal images. VS-MLRT can already expose those tensor channels safely; Hybrid should support that path so it does not reject valid models or incorrectly force their outputs into RGB/GRAY.

