(12.05.2026, 19:51)Selur Wrote: Quick and Dirty: just running all frames through the server:
# Imports
import sys
import os
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# Limit frame cache to 48449MB
core.max_cache_size = 48449
# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
# loading plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/Vapoursynth/Lib/site-packages/vapoursynth/plugins2/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/Vapoursynth/Lib/site-packages/vapoursynth/plugins2/libbestsource.dll")
# Import scripts
import validate
# Source: 'G:\TestClips&Co\files\test.avi'
# clip current meta; color space: YUV420P8, bit depth: 8, resolution: 640x352, fps: 25, color matrix: 470bg, color primaries: Unspecific, color transfer: Unspecified, yuv luminance scale: limited, scanorder: progressive, full height: true ((Source))
# Loading 'G:\TestClips&Co\files\test.avi' using BestSource
clip = core.bs.VideoSource(source="G:/TestClips&Co/files/test.avi", cachepath="J:/tmp/test_bestSource", track=0, hwdevice="opencl")
import xmlrpc.client
import io
import numpy as np
from PIL import Image
clip_rgb = core.resize.Bicubic(clip, format=vs.RGB24, matrix_in_s="470bg")
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8765/", use_builtin_types=True)
PROMPT = "Colorize this black and white image with natural, realistic colors."
def frame_to_png_bytes(f):
    w, h = f.width, f.height
    # VapourSynth R55+: planes are accessed with frame[plane]
    r = np.asarray(f[0])
    g = np.asarray(f[1])
    b = np.asarray(f[2])
    arr = np.dstack([r, g, b])
    img = Image.fromarray(arr, "RGB")
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return xmlrpc.client.Binary(buf.getvalue())

def write_png_to_frame(fout, png_bytes_data):
    out_img = Image.open(io.BytesIO(bytes(png_bytes_data))).convert("RGB")
    out_arr = np.array(out_img)
    for plane_idx in range(3):
        np.copyto(np.asarray(fout[plane_idx]), out_arr[:, :, plane_idx])
# Process pairs: frame N and N+1 together
# Use FrameEval with a clip-of-clips approach, or simply process even frames
# and carry the paired result. A simpler approach for offline encoding:
num_frames = clip_rgb.num_frames
results = {} # cache colorized frames
def colorize_paired(n, f):
    if n in results:
        return results.pop(n)
    fout = f.copy()
    # Get frame n
    png1 = frame_to_png_bytes(f)
    # Get frame n+1 (if exists)
    n2 = min(n + 1, num_frames - 1)
    f2 = clip_rgb.get_frame(n2)
    png2 = frame_to_png_bytes(f2)
    fout2 = f2.copy()
    result = proxy.colorize_frame_pair(png1, png2, PROMPT, 8)
    # gap_px=8 is the separator between the two images during inference
    if result["ok"]:
        write_png_to_frame(fout, result["data1"])
        write_png_to_frame(fout2, result["data2"])
        if n2 != n:
            results[n2] = fout2 # cache the second result
    return fout
colorized = core.std.ModifyFrame(clip_rgb, clip_rgb, colorize_paired)
output = core.resize.Bicubic(colorized, format=vs.YUV420P8, matrix_s="470bg")
output.set_output()
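The pairing logic of the script above can be checked without VapourSynth or the server. The following is only a minimal sketch: `simulate_pair_schedule` is a hypothetical helper (not part of the script) and it assumes frames are requested strictly in order 0, 1, 2, ...

```python
def simulate_pair_schedule(num_frames):
    """Return the frame numbers that trigger a paired server request,
    assuming frames are requested strictly in ascending order."""
    cached = set()   # stands in for the `results` dict of the script
    rpc_calls = []   # frames n for which the pair (n, n+1) is sent

    for n in range(num_frames):
        if n in cached:
            # Second half of a pair: served from the cache, no request.
            cached.discard(n)
            continue
        # First half of a pair: one request covers n and n+1.
        rpc_calls.append(n)
        n2 = min(n + 1, num_frames - 1)
        if n2 != n:
            cached.add(n2)
    return rpc_calls


print(simulate_pair_schedule(7))  # [0, 2, 4, 6]
```

With sequential access every second frame comes from the cache, so the number of requests is about half the number of frames. If frames are requested out of order (for example when seeking in a preview), pairs can overlap and some frames would be sent more than once.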
Cu Selur
Hi, Selur and Dan
Everything worked for me when installing and using the server:
(.venv) PS E:\DiTServerRPC> .\.venv\Scripts\activate
(.venv) PS E:\DiTServerRPC>
(.venv) PS E:\DiTServerRPC> python dit_client_pair_example.py --use-shm
[INFO] Connecting to http://127.0.0.1:8765/ ...
[INFO] Server is reachable.
[INFO] Transport: shared memory
[INFO] Pipeline is loaded on server.
[INFO] Image 1: sample1_bw.jpg (1480x1080 px)
[INFO] Image 2: sample2_bw.jpg (1480x1080 px)
[INFO] Running paired inference (gap=8px) ...
[INFO] Inference time : 5.96s total (2.98s per image)
[INFO] Round-trip time: 6.08s
[INFO] Saved: sample1_colorized.jpg
[INFO] Saved: sample2_colorized.jpg
(.venv) PS E:\DiTServerRPC>
But I never managed to do what Selur did, i.e. pairing the DiT server with Hybrid for direct video colorization.
It would be very good if working with this server were automated in Hybrid.
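The timings in the log above allow a rough estimate of how long a whole clip would take. This is only back-of-the-envelope arithmetic: `estimate_runtime_hours` is a hypothetical helper, the 2.98 s per image is the value reported for 1480x1080 images on that particular machine, and 25 fps is the frame rate of Selur's test clip.

```python
def estimate_runtime_hours(num_frames, seconds_per_image=2.98):
    """Estimated pure inference time in hours, ignoring transport overhead."""
    return num_frames * seconds_per_image / 3600.0


frames_per_minute = 25 * 60  # one minute of video at 25 fps
print(f"{estimate_runtime_hours(frames_per_minute):.2f} h per minute of video")
```

Other resolutions, models or GPUs will give different per-image times, so the number should be replaced by one measured on the target system.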
I did not use this with Hybrid, I just used the files from Hybrid.
The script is mainly written by hand and was just a quick test, to see whether the speed was as Dan64 suggested and to test the basic usage of the RPC API.
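For testing just the client side of the RPC usage without loading a model, a stand-in server can be built with the Python standard library. This is a sketch under assumptions: the mock `colorize_frame_pair` below only echoes the images back and mirrors the call and the `ok`/`data1`/`data2` keys seen in the script above; it says nothing about how the real DiTServerRPC server is implemented.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def colorize_frame_pair(png1, png2, prompt, gap_px):
    """Hypothetical stand-in: returns both images unchanged."""
    return {"ok": True, "data1": png1, "data2": png2}


def start_mock_server():
    # Port 0 lets the OS pick a free port (the real server used 8765 above).
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False,
                                use_builtin_types=True)
    server.register_function(colorize_frame_pair)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_mock_server()
port = server.server_address[1]
proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/",
                                  use_builtin_types=True)

# With use_builtin_types=True, binary payloads travel as plain bytes.
result = proxy.colorize_frame_pair(b"frame-1", b"frame-2", "test prompt", 8)
roundtrip_ok = (result["ok"]
                and result["data1"] == b"frame-1"
                and result["data2"] == b"frame-2")
print(roundtrip_ok)

server.shutdown()
server.server_close()
```

Pointing the VapourSynth script at such a mock makes it possible to check the frame conversion and caching code before the GPU server is involved.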
Cu Selur
----
Dev versions are in the 'experimental'-folder of my GoogleDrive, which is linked on the download page. Offline between (including) 29th of June and 5th of July => RochHarz Festival
13.05.2026, 19:15 (This post was last modified: 13.05.2026, 19:35 by Selur.)
@Dan64: Side question: is misc still needed for ProPainter, ColorAdjust and HAVC? (I am working on adjusting all other scripts to work without it.)
(13.05.2026, 19:15)Selur Wrote: @Dan64: Side question: is misc still needed for ProPainter, ColorAdjust and HAVC? (I am working on adjusting all other scripts to work without it.)
Yes, misc is still used by ProPainter and HAVC, because they use its SCDetect() filter.
In HAVC the filter is loaded automatically from the Hybrid plugin location; if you are planning to change the path, please let me know.
The table below lists all plugins used by HAVC.
These plugins are loaded automatically by HAVC, so there is no need for Hybrid to load them separately inside the script.
About the paths: due to the changes in R75+, current Hybrid loads plugins from "Vapoursynth/Lib/site-packages/vapoursynth/plugins2" if they are available through pip. The main gain is that, if a dll comes in multiple versions (for example zsmooth, vszip, hysteresis, cranexpr), Vapoursynth automatically loads the best-suited one when the base .dll (i.e. zsmooth.dll) is loaded.
(I install/update the plugins through pip and move them to the plugins2 folder to avoid autoloading.)
=> I plan to remove the dlls that are in the plugins2 folder from vsfilters in the (near) future. (Atm. they are still there.)
Not sure whether I will switch to autoloading dlls in the future, but atm. that is not planned, so the plugins2 folder is likely to stay for now.
Side note: the next problem will come when Vapoursynth drops API3 support (atm. you just get annoying warning messages),...
(Hopefully this will not happen as fast as new Vapoursynth versions come out nowadays,...)
Cu Selur
I updated the DiTServerRPC project.
The most important change is the addition of the new model "gguf-qwen".
The server can now use quantized GGUF models.
Since, to my knowledge, the only project able to manage DiT GGUF models properly is ComfyUI, I had to develop a "comfy bridge" which incorporates the ComfyUI code for GGUF management (about 30% of the total ComfyUI code).
The folder config contains the JSON files with the configuration of the supported GGUF models.
In this way I was able to lower the VRAM requirement to 12GB and the RAM requirement to 32GB.
Unfortunately the model "gguf-qwen" is about 3x slower than "nunchaku-qwen". Even worse, in some cases spurious artifacts that are not present in the source image may appear in the colorized output, and/or the colors are washed out. For production, nunchaku-qwen (FP4/INT4) must be used, since it is not affected by these problems.
In my opinion the problem is the 4-step Lightning model (used in the "gguf-model"), which is not very good.
I need to do more tests. But the important thing is that there is now GGUF support!
Nice, congratulations.
Can't test today, but will give it a spin tomorrow after work and report back.
Cu Selur
I updated the DiTServerRPC project; the quality of the "gguf-qwen" model is now improved.
I suggest using Q3 or Q4 quantization; for the colorization process these quantizations are fine.
First of all, congratulations on the excellent work. The results are very impressive.
How do you manage to avoid inconsistent coloring across different scenes when the same objects appear?
For example, in one video I extracted 4,000 frames. To achieve consistent colors for the same objects, I had to manually remove about 3,800 of them. In some cases, a lady's clothing was colored in five different colors across different frames. If I let the process run automatically, those color variations remained visible throughout the movie.
I hope that was just a test, because removing 3800 out of 4000 is really too much.
The trick is to not create too many reference frames; 1 every 30 or 40 frames is enough.
CMNET2 will be able to keep color consistency.
But it could happen that even in this case there are color inconsistencies.
In that case the only working solution is to remove the inconsistent reference frames manually, before starting the colorization.
In my experience it is necessary to remove no more than 10-20 reference frames out of a total of 4000 frames.
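To put the "1 every 30 or 40 frames" rule into numbers, here is a minimal sketch. `reference_frame_indices` is a hypothetical helper, and starting at frame 0 with a fixed step is an assumption made for illustration; the actual tools may pick reference frames differently (for example per scene).

```python
def reference_frame_indices(total_frames, interval=40):
    """Indices of evenly spaced reference frames, starting at frame 0."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, total_frames, interval))


refs = reference_frame_indices(4000, interval=40)
print(len(refs), refs[:3])  # 100 [0, 40, 80]
```

For a 4000-frame clip this gives about 100 reference frames at an interval of 40 (134 at an interval of 30), so manually discarding 10-20 inconsistent ones is a far smaller job than pruning thousands.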
Given the logic behind CMNET2, if different colors are found in the permanent memory for the same feature (for example a car), the output color will be a blend of the colors found (they are merged with equal weight). For this reason it is better to keep max_memory_frames in the range 20-50. I tried max_memory_frames=500, but it was a disaster: too many conflicts, and the output colors were faded because too many colors were used in the blending.
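Why equal-weight blending of conflicting colors looks faded can be shown with plain arithmetic. This is only an illustration, not CMNET2's actual code: the RGB values are made up, and `blend_equal` and `channel_spread` are hypothetical helpers.

```python
def blend_equal(colors):
    """Average a list of (R, G, B) tuples with equal weight."""
    n = len(colors)
    return tuple(round(sum(c[i] for c in colors) / n) for i in range(3))


def channel_spread(rgb):
    """Crude saturation measure: distance between strongest and weakest channel."""
    return max(rgb) - min(rgb)


# The same object colored differently in three reference frames (made-up values).
conflicting = [(200, 40, 40), (40, 60, 200), (50, 180, 60)]

blended = blend_equal(conflicting)
print(blended, channel_spread(blended))        # (97, 93, 100) 7
print(channel_spread(conflicting[0]))          # 160
```

Each individual reference color is strongly saturated, but their average is close to gray. The more conflicting entries are kept in memory, the more likely this kind of washed-out result becomes, which fits the advice to keep max_memory_frames small.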