Using Stable Diffusion models for Colorization - Selur's Little Message Board (https://forum.selur.net) - Forum: Small Talk (https://forum.selur.net/forum-7.html) - Thread: /thread-4287.html
RE: Using Stable Diffusion models for Colorization - Dan64 - 11.05.2026

(11.05.2026, 15:44) Selur wrote: I then stopped the server, and called start_server.cmd:

The problem is clear: the .cmd files have Unix line endings (LF) instead of Windows line endings (CRLF). When Windows CMD reads an LF-only file, it doesn't split the lines correctly and interprets comments and configuration text as commands to execute, hence all those errors. The cause: the files were created with LF endings, and when they are downloaded from GitHub with core.autocrlf=false, they stay LF.

Immediate fix: convert the downloaded files with Notepad++ → Edit → EOL Conversion → Windows (CRLF), or in VS Code by clicking LF in the bottom-right corner and choosing CRLF.

Does it really make sense to add this to Hybrid?

No. This is the reason why I split the project into a client and a server: only the client part, which is lightweight and has no dependencies, will be implemented in HAVC. Anyone who wants to use the client must download the server from GitHub and run it.

Dan

P.S. If you run python dit_client_pair_example.py --pipeline-config qwen_config_int4.json --use-shm

(11.05.2026, 18:12) didris wrote: installation was not complicated, but I also could not get it to work this way.

I don't understand why you were not able to follow the instructions provided on GitHub; maybe you had the same LF/CRLF problem as Selur. On my RTX 5070 Ti I'm getting the same speed of about 8-9 sec per image. Here is my output:

(.venv) PS D:\PProjects\DiTServerRPC> python dit_client_pair_example.py --pipeline-config qwen_config_fp4.json --use-shm
[INFO] Connecting to http://127.0.0.1:8765/ ...
[INFO] Server is reachable.
[INFO] Transport: shared memory
[INFO] Pipeline already loaded on server.
[INFO] Image 1: sample1_bw.jpg (1480x1080 px)
[INFO] Image 2: sample2_bw.jpg (1480x1080 px)
[INFO] Running paired inference (gap=8px) ...
[INFO] Inference time : 8.12s total (4.06s per image)
[INFO] Round-trip time: 8.28s
[INFO] Saved: sample1_colorized.jpg
[INFO] Saved: sample2_colorized.jpg

If you run python dit_client_pair_example.py --pipeline-config qwen_config_fp4.json --use-shm you are able to colorize 2 images at the same speed as 1 image, a 2x speed increase for free. You can change qwen_config_fp4.json as follows to use your HF cache dir (backslashes must be doubled inside a JSON string):
{
    "model_name": "nunchaku-qwen",
    "model_precision": "fp4",
    "model_rank": "32",
    "model_inference_steps": "4",
    "cache_dir": "C:\\Users\\YOUR_USERNAME\\.cache\\huggingface\\hub",
    "full_model_path": ""
}

Dan

P.S. Use my version, which uses shared memory instead of converting the image PNG->bytes; you will be able to increase the speed by 25% (from 5 sec. to 4 sec.). Also try changing line 143 in dit_colorize_main.py to:
if torch.cuda.get_device_properties(0).total_memory / (1024 ** 3) < 48:

RE: Using Stable Diffusion models for Colorization - didris - 11.05.2026

Hi Dan, I will try your optimization tips and get back to you. In ComfyUI I get about 5 seconds per frame, so it is workable. It would be very good if qwen_edit were integrated into Hybrid, if it is not too impudent to ask. Is this script correct for subsequent colorization in Hybrid with reference frames, and can it be improved in any way? I have already encoded a movie with it and it turned out pretty well:

import vapoursynth as vs
from vapoursynth import core
import sys
import os
# ------------------------------------------------------------
# PATH TO HYBRID VSSCRIPTS (IMPORTANT FIX)
# ------------------------------------------------------------
scriptPath = r"D:/Programs/Hybrid/64bit/vsscripts"
sys.path.insert(0, os.path.abspath(scriptPath))
# ------------------------------------------------------------
# IMPORT HAVC (actually vsdeoldify wrapper)
# ------------------------------------------------------------
import vsdeoldify as havc
# ------------------------------------------------------------
# PATHS
# ------------------------------------------------------------
VideoPath = r"E:\Hybrid\video.mkv"
RefDir = r"E:\DiTServerRPC\output"
# ------------------------------------------------------------
# LOAD VIDEO
# ------------------------------------------------------------
clip = havc.HAVC_read_video(source=VideoPath)
# ------------------------------------------------------------
# COLOR PROPAGATION (HAVC)
# ------------------------------------------------------------
clip = havc.HAVC_cmnet2(
clip,
method=4,
sc_framedir=RefDir,
encode_mode=0,
render_speed="auto",
max_memory_frames=50,
ref_mode=0,
render_vivid=False
)
# ------------------------------------------------------------
# RGB -> YUV420P10 (for x265)
# ------------------------------------------------------------
clip = core.resize.Bicubic(
clip,
format=vs.YUV420P10,
matrix_in_s="709",
matrix_s="709",
range_in_s="full",
range_s="limited",
dither_type="error_diffusion"
)
# ------------------------------------------------------------
# OUTPUT TO VAPOURSYNTH PIPE
# ------------------------------------------------------------
clip.set_output()

RE: Using Stable Diffusion models for Colorization - Dan64 - 12.05.2026

Hi didris,

your script seems OK; the call to HAVC_cmnet2() is the one described in post #22.

Using ComfyUI my inference speed is about 22 sec.; using the heavily optimized server code I was able to increase the speed by about 5x. So on your RTX 5090 you should be able to perform the inference in less than 2 sec (using the pair() trick), probably in about 1 sec.

The total disk space of the files needed to run the server is:
venv: 4.96GB (of which 4.28GB is the torch package)
.cache: 23.3GB (nunchaku-qwen-image) + 15.7GB (vae + text_encoder) = 39GB
In summary, about 44GB are needed to run the server.

The total memory (RAM + VRAM) needed to run the server is about 46GB (see post #5). On top of this you have to add the RAM needed by the Windows OS (about 12GB), for a total of about 58GB. As you can see, this is not the amount of memory usually available on a standard PC, so I think the usage of this model is limited to high-end workstations.

I'm happy to know that Selur was able to run the model on his RTX 4080; using the pair() trick he should probably be able to perform the inference of a full frame in about 5 sec. Using one reference frame every 25 frames, this implies that it could be possible to colorize a clip at about 5 fps, not too bad for a DiT model.

I don't see any advantage in including the server in Hybrid, only disadvantages. Both Selur and you are asking for that, but I don't understand why. If the steps to run the server are too complex, please suggest which steps should be improved. In any case, to run the full DiT colorization in Hybrid it will be necessary to split the process into client and server, as I already did for CMNET2, because these processes are not compatible with VapourSynth threading.
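The disk, memory and speed figures in the post above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in the thread; the function names are hypothetical, and effective_fps assumes that propagating colors to the frames between two reference frames (e.g. with HAVC_cmnet2) costs nothing, so it gives an upper bound rather than a measured speed.

```python
# Figures quoted in the thread, in GB (the keys are just labels).
DISK_GB = {
    "venv": 4.96,
    "nunchaku-qwen-image": 23.3,
    "vae + text_encoder": 15.7,
}
SERVER_MEMORY_GB = 46.0  # RAM + VRAM used by the server (post #5)
WINDOWS_RAM_GB = 12.0    # rough OS overhead quoted in the post


def total_disk_gb() -> float:
    """Disk space for the venv plus the Hugging Face cache."""
    return sum(DISK_GB.values())


def total_memory_gb() -> float:
    """Server memory (RAM + VRAM) plus the RAM taken by Windows."""
    return SERVER_MEMORY_GB + WINDOWS_RAM_GB


def effective_fps(seconds_per_reference: float, frames_per_reference: int) -> float:
    """Frames colorized per second when only every Nth frame goes through the DiT model.

    Assumption of this sketch: color propagation to the in-between frames is free,
    so the result is an upper bound.
    """
    return frames_per_reference / seconds_per_reference


if __name__ == "__main__":
    print(f"disk:   {total_disk_gb():.2f} GB")
    print(f"memory: {total_memory_gb():.0f} GB")
    print(f"RTX 4080, 5 s per reference, 1 every 25 frames: {effective_fps(5.0, 25):.1f} fps")
    print(f"RTX 5090, 1 s per reference, 1 every 25 frames: {effective_fps(1.0, 25):.1f} fps")
```

With the roughly 1 s per reference frame estimated for an RTX 5090, the same formula gives 25 fps, the upper end of the 20-25 fps estimate for a rented RTX 5090.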
Moreover, a client/server architecture will allow users who want to use the DiT colorizer with standard hardware to rent a powerful GPU to run the server for a few hours. It is the cheapest solution compared to a hardware upgrade (especially these days). For example, with a rented RTX 5090 it could be possible to colorize a clip at about 20-25 fps (almost real-time).

Let me know what you think.

Dan

RE: Using Stable Diffusion models for Colorization - Dan64 - 12.05.2026

(11.05.2026, 15:44) Selur wrote: ... here's what I did:

Hello Selur, your results are good; using the pair() trick I expect that you could obtain an inference speed of about 5-6 sec per image. The RTX 4080 should have 16GB of VRAM; how much RAM do you have on your PC (just to better understand the HW requirements)? The comments I wrote to didris in the previous post also apply to you. Please let me know what you think.

Dan

RE: Using Stable Diffusion models for Colorization - Selur - 12.05.2026

I got 64GB of RAM.

RE: Using Stable Diffusion models for Colorization - Dan64 - 12.05.2026

Nice. But what do you think about the integration in Hybrid?

Thanks,
Dan

RE: Using Stable Diffusion models for Colorization - Selur - 12.05.2026

I have no problem with adding support for using the server. Assuming you add support for it to HAVC or write a separate wrapper for the RPC calls, it doesn't seem like much work in Hybrid:
a. ask the user for the data (1. the server URL, 2. maybe how many frames should be processed in parallel (1, 2, ?))
b. convert the video to (probably) RGB
c. call the wrapper
The UI elements would be minimal, and from what I gather the client would not need anything that the torch add-on with HAVC in it doesn't already provide.
IIRC the plan was to use this for reference images every xy frames, so adding it to HAVC would make sense... and it would require just a few additional parameters (a. server URL, b. interval at which reference frames get created, c. number of frames to process in parallel).

Cu Selur

RE: Using Stable Diffusion models for Colorization - Selur - 12.05.2026

Quick and dirty: just running all frames through the server:

# Imports
import sys
import os
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# Limit frame cache to 48449MB
core.max_cache_size = 48449
# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
# loading plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/Vapoursynth/Lib/site-packages/vapoursynth/plugins2/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/Vapoursynth/Lib/site-packages/vapoursynth/plugins2/libbestsource.dll")
# Import scripts
import validate
# Source: 'G:\TestClips&Co\files\test.avi'
# clip current meta; color space: YUV420P8, bit depth: 8, resolution: 640x352, fps: 25, color matrix: 470bg, color primaries: Unspecific, color transfer: Unspecified, yuv luminance scale: limited, scanorder: progressive, full height: true ((Source))
# Loading 'G:\TestClips&Co\files\test.avi' using BestSource
clip = core.bs.VideoSource(source="G:/TestClips&Co/files/test.avi", cachepath="J:/tmp/test_bestSource", track=0, hwdevice="opencl")
import xmlrpc.client
import io
import numpy as np
from PIL import Image
clip_rgb = core.resize.Bicubic(clip, format=vs.RGB24, matrix_in_s="470bg")
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8765/", use_builtin_types=True)
PROMPT = "Colorize this black and white image with natural, realistic colors."
def frame_to_png_bytes(f):
    # VapourSynth R55+: planes are accessed with frame[plane]
    r = np.asarray(f[0])
    g = np.asarray(f[1])
    b = np.asarray(f[2])
    arr = np.dstack([r, g, b])
    img = Image.fromarray(arr, "RGB")
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return xmlrpc.client.Binary(buf.getvalue())

def write_png_to_frame(fout, png_bytes_data):
    out_img = Image.open(io.BytesIO(bytes(png_bytes_data))).convert("RGB")
    out_arr = np.array(out_img)
    for plane_idx in range(3):
        np.copyto(np.asarray(fout[plane_idx]), out_arr[:, :, plane_idx])

# Process pairs: frame N and N+1 together
# Use FrameEval with a clip-of-clips approach, or simply process even frames
# and carry the paired result. A simpler approach for offline encoding:
num_frames = clip_rgb.num_frames
results = {}  # cache colorized frames

def colorize_paired(n, f):
    if n in results:
        return results.pop(n)
    fout = f.copy()
    # Get frame n
    png1 = frame_to_png_bytes(f)
    # Get frame n+1 (if exists)
    n2 = min(n + 1, num_frames - 1)
    f2 = clip_rgb.get_frame(n2)
    png2 = frame_to_png_bytes(f2)
    fout2 = f2.copy()
    # gap_px=8 is the separator between the two images during inference
    result = proxy.colorize_frame_pair(png1, png2, PROMPT, 8)
    if result["ok"]:
        write_png_to_frame(fout, result["data1"])
        write_png_to_frame(fout2, result["data2"])
        if n2 != n:
            results[n2] = fout2  # cache the second result
    return fout
colorized = core.std.ModifyFrame(clip_rgb, clip_rgb, colorize_paired)
output = core.resize.Bicubic(colorized, format=vs.YUV420P8, matrix_s="470bg")
output.set_output()

With this I get ~4 s/frame:

2026-05-12 19:44:26,630 [INFO] colorize_frame_pair: 13.94s (6.97s/frame)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.64s/it]
2026-05-12 19:44:54,805 [INFO] colorize_frame_pair: 7.55s (3.78s/frame)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.63s/it]
2026-05-12 19:49:00,028 [INFO] colorize_frame_pair: 8.40s (4.20s/frame)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.73s/it]
2026-05-12 19:49:37,091 [INFO] colorize_frame_pair: 8.01s (4.00s/frame)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.64s/it]
2026-05-12 19:49:44,713 [INFO] colorize_frame_pair: 7.48s (3.74s/frame)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.64s/it]
2026-05-12 19:49:52,345 [INFO] colorize_frame_pair: 7.49s (3.75s/frame)
100%|████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.67s/it]
2026-05-12 19:50:00,027 [INFO] colorize_frame_pair: 7.54s (3.77s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.63s/it]
2026-05-12 19:50:07,564 [INFO] colorize_frame_pair: 7.40s (3.70s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.66s/it]
2026-05-12 19:50:15,198 [INFO] colorize_frame_pair: 7.50s (3.75s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.67s/it]
2026-05-12 19:50:22,959 [INFO] colorize_frame_pair: 7.62s (3.81s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.67s/it]
2026-05-12 19:50:30,665 [INFO] colorize_frame_pair: 7.56s (3.78s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.63s/it]
2026-05-12 19:50:38,270 [INFO] colorize_frame_pair: 7.46s (3.73s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.63s/it]
2026-05-12 19:50:45,998 [INFO] colorize_frame_pair: 7.58s (3.79s/frame)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.68s/it]
2026-05-12 19:50:53,992 [INFO] colorize_frame_pair: 7.84s (3.92s/frame)

Cu Selur

RE: Using Stable Diffusion models for Colorization - Dan64 - 12.05.2026

Nice. As you can see, once the model is fully loaded, the inference time drops to about 3.8 seconds per frame, much better than with ComfyUI, and at this speed it makes sense to add it to Hybrid. Tomorrow I'm going on holiday and will be away for one week, so I hope to be able to deliver the new RC for HAVC 5.8.5 in 2 weeks.

Thanks,
Dan

RE: Using Stable Diffusion models for Colorization - Selur - 12.05.2026

Take your time.
Btw., would it be complicated/possible to support RGBS, RGBH, YUV444PS and YUV444PH in HAVC?
Alternatively, I'll think about a wrapper that:
a. converts the original video (if it's high bit depth) to YUV444PS and sets its Y plane aside,
b. converts the video to RGB24 and applies HAVC,
c. converts the HAVC output to YUV444PS and then combines its UV planes with the Y plane of the original YUV444PS.
This way at least the high bit depth of the luma would be preserved.

Cu Selur
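Why keeping the original luma helps can be illustrated with a toy model. This is a sketch, not Hybrid or HAVC code: it assumes simple full-range rescaling between 10-bit and 8-bit codes (VapourSynth's resizers apply their own scaling and dithering), represents planes as plain Python objects, and all function names are hypothetical. In a real VapourSynth wrapper the recombination step would presumably map to core.std.ShufflePlanes(clips=[original, colorized, colorized], planes=[0, 1, 2], colorfamily=vs.YUV).

```python
def to_8bit(code10: int) -> int:
    """Full-range 10-bit -> 8-bit by rescaling and rounding (simplified model)."""
    return round(code10 * 255 / 1023)


def to_10bit(code8: int) -> int:
    """Full-range 8-bit -> 10-bit by rescaling and rounding (simplified model)."""
    return round(code8 * 1023 / 255)


def distinct_luma_levels(preserve_luma: bool) -> int:
    """Count how many distinct luma codes remain after the pipeline.

    preserve_luma=False: luma goes through an 8-bit (RGB24-style) round trip.
    preserve_luma=True: the original 10-bit Y plane is kept, as in the proposed wrapper.
    """
    levels = set()
    for y in range(1024):
        levels.add(y if preserve_luma else to_10bit(to_8bit(y)))
    return len(levels)


def merge_planes(original_yuv, colorized_yuv):
    """Keep Y from the original clip and take U and V from the colorized clip."""
    y_orig, _, _ = original_yuv
    _, u_col, v_col = colorized_yuv
    return (y_orig, u_col, v_col)


if __name__ == "__main__":
    print("luma levels via 8-bit round trip:", distinct_luma_levels(False))
    print("luma levels with original Y kept:", distinct_luma_levels(True))
```

Under this model the 8-bit round trip collapses 1024 luma codes to 256, while the merged clip keeps all 1024; only the chroma is limited to what HAVC's 8-bit output provides.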