Using Stable Diffusion models for Colorization
#41
(12.05.2026, 19:51)Selur Wrote: Quick and Dirty: just running all frames through the server:
# Imports
import sys
import os
import vapoursynth as vs
# getting Vapoursynth core
core = vs.core
# Limit frame cache to 48449MB
core.max_cache_size = 48449
# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
# loading plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/Vapoursynth/Lib/site-packages/vapoursynth/plugins2/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/Vapoursynth/Lib/site-packages/vapoursynth/plugins2/libbestsource.dll")
# Import scripts
import validate
# Source: 'G:\TestClips&Co\files\test.avi'
# clip current meta; color space: YUV420P8, bit depth: 8, resolution: 640x352, fps: 25, color matrix: 470bg, color primaries: Unspecific, color transfer: Unspecified, yuv luminance scale: limited, scanorder: progressive, full height: true ((Source))
# Loading 'G:\TestClips&Co\files\test.avi' using BestSource
clip = core.bs.VideoSource(source="G:/TestClips&Co/files/test.avi", cachepath="J:/tmp/test_bestSource", track=0, hwdevice="opencl")

import xmlrpc.client
import io
import numpy as np
from PIL import Image

clip_rgb = core.resize.Bicubic(clip, format=vs.RGB24, matrix_in_s="470bg")
proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:8765/", use_builtin_types=True)
PROMPT = "Colorize this black and white image with natural, realistic colors."

def frame_to_png_bytes(f):
    w, h = f.width, f.height
    # VapourSynth R55+: planes are accessed with frame[plane]
    r = np.asarray(f[0])
    g = np.asarray(f[1])
    b = np.asarray(f[2])
    arr = np.dstack([r, g, b])
    img = Image.fromarray(arr, "RGB")
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return xmlrpc.client.Binary(buf.getvalue())

def write_png_to_frame(fout, png_bytes_data):
    out_img = Image.open(io.BytesIO(bytes(png_bytes_data))).convert("RGB")
    out_arr = np.array(out_img)
    for plane_idx in range(3):
        np.copyto(np.asarray(fout[plane_idx]), out_arr[:, :, plane_idx])

# Process pairs: frame N and N+1 together
# Use FrameEval with a clip-of-clips approach, or simply process even frames
# and carry the paired result. A simpler approach for offline encoding:
num_frames = clip_rgb.num_frames
results = {}  # cache colorized frames

def colorize_paired(n, f):
    if n in results:
        return results.pop(n)
    fout = f.copy()
    # Get frame n
    png1 = frame_to_png_bytes(f)
    # Get frame n+1 (if exists)
    n2 = min(n + 1, num_frames - 1)
    f2 = clip_rgb.get_frame(n2)
    png2 = frame_to_png_bytes(f2)
    fout2 = f2.copy()
    result = proxy.colorize_frame_pair(png1, png2, PROMPT, 8)
    # gap_px=8 is the separator between the two images during inference
    if result["ok"]:
        write_png_to_frame(fout, result["data1"])
        write_png_to_frame(fout2, result["data2"])
        if n2 != n:
            results[n2] = fout2  # cache the second result
    # always return a frame; if the server call failed, this is the unmodified copy
    return fout

colorized = core.std.ModifyFrame(clip_rgb, clip_rgb, colorize_paired)
output = core.resize.Bicubic(colorized, format=vs.YUV420P8, matrix_s="470bg")
output.set_output()
Cu Selur
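
The pairing logic of the script above can also be looked at in isolation. The following is a minimal sketch, assuming frames are requested in ascending order; `fake_colorize_pair` is a hypothetical stand-in for the `colorize_frame_pair` RPC call and only tags its inputs, so that the number of server round trips can be counted.

```python
calls = []  # records every (n, n2) pair "sent to the server"


def fake_colorize_pair(a, b):
    """Hypothetical stand-in for proxy.colorize_frame_pair(); no real inference."""
    calls.append((a, b))
    return f"color({a})", f"color({b})"


def make_paired_getter(num_frames):
    results = {}  # frame index -> already colorized result

    def get(n):
        # Second frame of an earlier pair: serve it from the cache.
        if n in results:
            return results.pop(n)
        n2 = min(n + 1, num_frames - 1)  # the last frame is paired with itself
        out1, out2 = fake_colorize_pair(n, n2)
        if n2 != n:
            results[n2] = out2
        return out1

    return get


get = make_paired_getter(5)
frames = [get(n) for n in range(5)]
print(frames)      # five colorized frames
print(len(calls))  # 3 round trips: (0, 1), (2, 3), (4, 4)
```

With non-sequential access (for example seeking in a preview), cached entries may never be popped, which is why this approach fits offline encoding best.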

Hi Selur and Dan,
everything worked for me when installing and using the server:
(.venv) PS E:\DiTServerRPC> .\.venv\Scripts\activate
(.venv) PS E:\DiTServerRPC>
(.venv) PS E:\DiTServerRPC> python dit_client_pair_example.py --use-shm
[INFO] Connecting to http://127.0.0.1:8765/ ...
[INFO] Server is reachable.
[INFO] Transport: shared memory
[INFO] Pipeline is loaded on server.
[INFO] Image 1: sample1_bw.jpg  (1480x1080 px)
[INFO] Image 2: sample2_bw.jpg  (1480x1080 px)
[INFO] Running paired inference (gap=8px) ...
[INFO] Inference time : 5.96s total  (2.98s per image)
[INFO] Round-trip time: 6.08s
[INFO] Saved: sample1_colorized.jpg
[INFO] Saved: sample2_colorized.jpg
(.venv) PS E:\DiTServerRPC>

But I never managed to do what Selur did: pairing the DiT server with Hybrid for direct video colorization.
It would be very good if working with this server were automated in Hybrid.
#42
I did not use this with Hybrid, I just used the files from Hybrid.
The script was mainly written by hand and was just a quick test to see whether the speed was as Dan64 suggested and to try the basic usage of the RPC API. :)
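
For anyone who wants to try the call pattern without a GPU or the real server, here is a minimal sketch using only the Python standard library. The stub below is an assumption for illustration: it reuses the method name and the `ok`/`data1`/`data2` result keys seen in the script above, but simply echoes the input bytes instead of running a model, and it listens on a random free port rather than 8765.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def colorize_frame_pair(png1, png2, prompt, gap_px):
    """Stub: echoes the input bytes instead of running a model."""
    return {"ok": True, "data1": png1, "data2": png2}


def demo_roundtrip():
    # Port 0 lets the OS pick a free port for this local test server.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False,
                                allow_none=True, use_builtin_types=True)
    server.register_function(colorize_frame_pair)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # use_builtin_types=True makes binary payloads arrive as plain bytes.
        proxy = xmlrpc.client.ServerProxy(f"http://127.0.0.1:{port}/",
                                          use_builtin_types=True)
        return proxy.colorize_frame_pair(b"frame-1", b"frame-2", "Colorize", 8)
    finally:
        server.shutdown()
        server.server_close()


result = demo_roundtrip()
print(result["ok"], len(result["data1"]), len(result["data2"]))  # True 7 7
```

Swapping the stub for the real server should only require pointing `ServerProxy` at its address and sending real PNG bytes.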

Cu Selur
----
Dev versions are in the 'experimental'-folder of my GoogleDrive, which is linked on the download page.
Offline between (including) 29th of June and 5th of July => RochHarz Festival
#43
@Dan64: Side question: is misc still needed for ProPainter, ColorAdjust and HAVC? (I'm working on adjusting all other scripts to work without it.)
#44
(13.05.2026, 19:15)Selur Wrote: @Dan64: Side question: is misc still needed for ProPainter, ColorAdjust and HAVC? (I'm working on adjusting all other scripts to work without it.)

Yes, misc is still used by ProPainter and HAVC, because they use the SCDetect() filter.
In HAVC the filter is loaded automatically from the Hybrid plugin location; if you are planning to change the path, please let me know.

The table below lists all plugins used by HAVC.
These plugins are loaded automatically by HAVC, so Hybrid does not need to load them in the script.

[Image: attachment.php?aid=3596]

Dan


#45
About the paths: due to the changes in R75+, current Hybrid loads plugins from "Vapoursynth/Lib/site-packages/vapoursynth/plugins2" if they are available through pip. The main gain is that, if a dll comes in multiple versions (for example zsmooth, vszip, hysteresis, cranexpr), Vapoursynth will automatically load the best-suited one when the base .dll (e.g. zsmooth.dll) is loaded.
(I install/update the plugins through pip and move them to the plugins2 folder to avoid autoloading.)
=> I plan to remove dlls that are in the plugins2 folder from vsfilters in the (near) future. (Atm. they are still there.)
Not sure whether I will switch to autoloading dlls in the future, but atm. that is not planned, so the plugins2 folder is likely to stay for now.
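
For scripts that load their own plugins, one way to cope with the transition is to look in plugins2 first and fall back to the older folder. This is only a sketch under assumptions: the base path is taken from the example script earlier in the thread, the flat `vsfilters` location is a guess (the real folder may use subfolders), and `find_plugin` is a hypothetical helper, not part of Hybrid or Vapoursynth.

```python
from pathlib import Path


def find_plugin(dll_name, search_dirs):
    """Return the first existing path to dll_name, or None if no folder has it."""
    for folder in search_dirs:
        candidate = Path(folder) / dll_name
        if candidate.is_file():
            return candidate
    return None


# Assumed layout; adjust to the actual installation.
HYBRID = Path("F:/Hybrid/64bit")
SEARCH_DIRS = [
    HYBRID / "Vapoursynth/Lib/site-packages/vapoursynth/plugins2",  # pip-provided plugins
    HYBRID / "vsfilters",  # older location, still populated for now
]

path = find_plugin("zsmooth.dll", SEARCH_DIRS)
if path is not None:
    # In a real script one would now call: core.std.LoadPlugin(path=str(path))
    print(f"would load {path}")
else:
    print("zsmooth.dll not found in any search folder")
```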

Side note: the next problem will come when Vapoursynth drops API3 support (atm. you just get annoying warning messages),...
(Hopefully this will not happen as quickly as new Vapoursynth versions are coming out nowadays,...)

Cu Selur
#46
Hello Selur,

I updated the DiTServerRPC project.
The most important change is the addition of the new model "gguf-qwen".
The server can now use quantized GGUF models.

As far as I know, the only project able to handle DiT GGUF models properly is ComfyUI, so I had to develop a "comfy bridge" that incorporates ComfyUI's GGUF-handling code (about 30% of the total ComfyUI code).

The config folder contains the JSON files with the configuration of the supported GGUF models.

In this way I was able to lower the VRAM requirement to 12GB and the RAM requirement to 32GB.

Unfortunately, the "gguf-qwen" model is about 3x slower than "nunchaku-qwen". Even worse, in some cases spurious artifacts that are not present in the source image may appear in the colorized output, and/or the colors are washed out. For production, nunchaku-qwen (FP4/INT4) must be used, since it is not affected by these problems.

In my opinion the problem is the 4-step Lightning model (used in the "gguf-model"), which is not so good.

I need to do more tests, but the important thing is that there is now GGUF support!

Dan
#47
Nice, congratulations.
Can't test today, but will give it a spin tomorrow after work and report back.

Cu Selur
#48
I updated the DiTServerRPC project; the quality of the "gguf-qwen" model is now improved.
I suggest using Q3 or Q4 quantization; for the colorization process these quantizations are fine.

Dan
#49
Hi, Dan

First of all, congratulations on the excellent work. The results are very impressive.
 
How do you manage to avoid inconsistent coloring across different scenes when the same objects appear?
For example, in one video I extracted 4,000 frames. To achieve consistent colors for the same objects, I had to manually remove about 3,800 of them. In some cases, a lady's clothing was colored in five different colors across different frames. If I let the process run automatically, those color variations remained visible throughout the movie.

How do you handle this issue?
#50
Hi didris,

  I hope that was just a test, because removing 3800 out of 4000 frames is really too much.
  The trick is not to create too many reference frames; 1 every 30 or 40 frames is enough.
  CMNET2 will be able to keep the colors consistent.
  But it can happen that even in this case there are color inconsistencies.
  Then the only working solution is to remove the inconsistent reference frames manually before starting the colorization.
  In my experience, no more than 10-20 reference frames need to be removed for a total of 4000 frames.
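
To put numbers on the spacing suggested above, here is a small sketch. It assumes a fixed stride starting at frame 0 and ignores scene changes; `reference_frame_indices` is a hypothetical helper, not part of HAVC.

```python
def reference_frame_indices(total_frames, step):
    """Indices of frames to use as references, one every `step` frames."""
    return list(range(0, total_frames, step))


for step in (30, 40):
    refs = reference_frame_indices(4000, step)
    print(f"step {step}: {len(refs)} reference frames, first few {refs[:4]}")
# step 30: 134 reference frames, first few [0, 30, 60, 90]
# step 40: 100 reference frames, first few [0, 40, 80, 120]
```

So for a 4000-frame clip this gives roughly 100 to 134 reference frames to review, instead of thousands.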

  Given the logic behind CMNET2, if different colors are found in the permanent memory for the same feature (for example a car), the output color will be a blend of the colors found (they are merged with equal weight). For this reason it is better to keep max_memory_frames in the range of 20-50. I tried max_memory_frames=500, but it was a disaster: too many conflicts, and the output colors were faded because too many colors went into the blend.
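
The fading effect can be illustrated with plain arithmetic. The sketch below is not CMNET2 code; it only assumes an equal-weight average of RGB colors, as described above, and uses the spread between the strongest and weakest channel as a rough colorfulness measure.

```python
def blend_equal(colors):
    """Equal-weight mean of a list of (r, g, b) tuples with components in 0..1."""
    n = len(colors)
    return tuple(sum(c[i] for c in colors) / n for i in range(3))


def chroma(rgb):
    """Rough colorfulness: spread between the strongest and weakest channel."""
    return max(rgb) - min(rgb)


RED, BLUE, GREEN = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0)

for memory in ([RED], [RED, BLUE], [RED, BLUE, GREEN]):
    mixed = blend_equal(memory)
    print(len(memory), "colors ->", tuple(round(v, 2) for v in mixed),
          "chroma", round(chroma(mixed), 2))
# 1 colors -> (1.0, 0.0, 0.0) chroma 1.0
# 2 colors -> (0.5, 0.0, 0.5) chroma 0.5
# 3 colors -> (0.33, 0.33, 0.33) chroma 0.0
```

The more conflicting colors end up in the average, the closer the result moves toward gray, which matches the faded output seen with a very large memory.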

Dan