Selur's Little Message Board
Using Stable Diffusion models for Colorization - Printable Version

+- Selur's Little Message Board (https://forum.selur.net)
+-- Forum: Talk, Talk, Talk (https://forum.selur.net/forum-5.html)
+--- Forum: Small Talk (https://forum.selur.net/forum-7.html)
+--- Thread: Using Stable Diffusion models for Colorization (/thread-4287.html)



Using Stable Diffusion models for Colorization - Dan64 - 14.12.2025

Recently I received some requests to include stable diffusion models in the HAVC colorization process.

So I decided to analyze the problem and write this post to describe my findings.

First of all, it is necessary to understand that stable diffusion models were developed for the text-to-image process: they build an image based on a textual description of it.
This "specialization" is the main problem, because I want to use them to color an image that is already available.

For example, if I try to describe the following image to a stable diffusion model

 [Image: attachment.php?aid=3419]

In the best case I can obtain something like this

[Image: attachment.php?aid=3420]

So I had to develop a complex pipeline, and after many attempts I was able to obtain "decent" colored images using the following models (a rough sketch of the data flow is shown after the list):

1) Juggernaut-XL_v9_RunDiffusionPhoto_v2 (for the text-to-image colorization)
2) control-LoRA-recolor-rank256 (a LoRA specialized to force the stable diffusion model to produce an image equal to the source in the gray-space)
3) DDColor_Modelscope (to provide a colored image as reference)
4) Qwen3-VL-2B (to describe the image provided by DDColor and to provide the "text" to Juggernaut, which tries to "mimic" DDColor)
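
Here is the promised sketch: a minimal Python rendering of the data flow through the four models above. The callables are hypothetical placeholders (the real wiring lives in the ComfyUI workflow attached below); the sketch only shows how the pieces feed each other:

Code:
from typing import Callable
from PIL import Image

# Hypothetical component interfaces, one per model listed above.
DDColorFn = Callable[[Image.Image], Image.Image]       # gray frame -> rough colored reference
CaptionFn = Callable[[Image.Image], str]               # colored reference -> text description (Qwen3-VL-2B)
RecolorFn = Callable[[Image.Image, str], Image.Image]  # gray frame + prompt -> Juggernaut-XL + recolor LoRA

def colorize_frame(gray: Image.Image,
                   ddcolor: DDColorFn,
                   caption: CaptionFn,
                   recolor: RecolorFn) -> Image.Image:
    """Approximate data flow of the pipeline described above."""
    reference = ddcolor(gray)        # DDColor provides a colored image as reference
    prompt = caption(reference)      # Qwen3-VL-2B turns the reference into a text prompt
    return recolor(gray, prompt)     # Juggernaut-XL, constrained by the recolor Control-LoRA, mimics it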

Using this pipeline I obtained the following result (source image on the left generated with AI)

[Image: attachment.php?aid=3421]

A full description of the pipeline that I used is too complex to include in this post, but I can say that I built the colorization pipeline with ComfyUI.

For those familiar with it, I've attached an image (Recolor_Workflow.png) containing the workflow that I used. 
You need to drag and drop the image into ComfyUI to view the workflow (it is very big). 
Of course, it will be necessary to install all the missing nodes and models (available on Hugging Face).

In summary, apart from the speed (stable diffusion models are about 50x slower), I don't see any significant improvement in using stable diffusion models for the colorization process.
They could be used with an Image-to-Image or Image-Edit process to change the colors of an already colored image to be used as a reference.
But this process is totally manual and cannot be included in the automatic colorization pipeline used by HAVC.
 

Dan


RE: Using Stable Diffusion models for Colorization - Dan64 - 25.12.2025

Good news, I tried a new approach that seems promising.

I used the latest version of Qwen Image Edit with ComfyUI, with the simple prompt: "restore and colorize this photo. Repair the damaged white background. Maintain the consistency between the characters and the background", and I obtained the following result:

[Image: attachment.php?aid=3458]


As you can see, Qwen recognized the castle and colorized it properly. In all my tests Qwen Image Edit was always able to provide colorized images that were better than the ones colorized with DDColor (any model) and DeOldify (any model).

For sure this approach represents the future of colorization. 

But there are some big problems that need to be addressed:

1) model storage: Qwen Image Edit requires about 30GB of storage for the diffusion model, text encoder and VAE
2) speed: on my RTX 5070 Ti the colorization process takes about 20sec to get the colored picture.

Moreover, it will be necessary to write all the necessary Python code from scratch, because I cannot use ComfyUI with Vapoursynth.
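
To give an idea of what that standalone Python code could look like, here is a minimal sketch based on the diffusers library. It assumes a recent diffusers release that ships a QwenImageEditPipeline and that the model is published as "Qwen/Qwen-Image-Edit" on Hugging Face; both are assumptions on my side, so the attached ComfyUI workflow remains the reference:

Code:
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline  # assumed to exist in a recent diffusers build

# Assumed Hugging Face model id; the weights are ~30 GB, as noted above.
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

gray = Image.open("frame_bw.png").convert("RGB")

result = pipe(
    image=gray,
    prompt=("restore and colorize this photo. Repair the damaged white background. "
            "Maintain the consistency between the characters and the background"),
    num_inference_steps=40,
)
result.images[0].save("frame_color.png")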

I will start investigating the possibility of developing a filter using this model; in the meantime, for those familiar with ComfyUI, I've attached an image (QwenIE_Recolor_small_workflow.png) containing the workflow that I used.

Merry Christmas,
Dan


RE: Using Stable Diffusion models for Colorization - Selur - 25.12.2025

The DDColor version seems to be more realistic to me, to be frank.
Stable Diffusion probably only got the colors right in this case because it was trained on images of Disneyland, while DDColor was not.
So maybe figuring out how to create more models for DDColor might be a better, more resource-friendly goal.

> 1) model storage: Qwen Image Edit requires about 30GB of storage for the diffusion model, text encoder and VAE
> 2) speed: on my RTX 5070 Ti the colorization process takes about 20sec to get the colored picture.
Hmm... so in ~20 years your GPU can do the coloring for you live. Smile
(Real time at 25-30 fps needs a speedup of about 20 s × 25 fps ≈ 500x; assuming GPU compute roughly doubles every 2 years, that is reached after about 2 × log2(500) ≈ 18 years.)


Cu Selur


RE: Using Stable Diffusion models for Colorization - Dan64 - 25.12.2025

I provided this sample as an example of image recognition; a normal castle will be colored properly.

Given the high quality of Diffusion Transformer (DiT) models, these models will surely represent the future of colorization.

It will probably take some time, maybe years, before a fast and lightweight model is available, but in the meantime I want to try to build at least a prototype working on Vapoursynth.

In the coming months z-image edit is expected to be released, which should be faster and more lightweight than Qwen-image-edit.

In the meantime, using quantized models (Q4 or Q3) it is possible to reduce the Qwen-IE model storage to about 10-11 GB, so that all the processing could be done on a GPU with 12 GB of VRAM.

By doing all the inference in VRAM, without offloading to system RAM, I expect that the inference time can decrease to about 10 sec, maybe even less.
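
For reference, in diffusers the choice between keeping everything in VRAM and offloading to system RAM is usually a single call on the pipeline. This is only a sketch, reusing the assumed QwenImageEditPipeline from the earlier post; the quantized GGUF/Nunchaku loading path is different and is not shown here:

Code:
import torch
from diffusers import QwenImageEditPipeline  # assumed, as in the earlier sketch

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
)

# a) keep everything resident in VRAM: fastest inference, needs the largest GPU
pipe.to("cuda")

# b) alternatively, offload idle components to system RAM and move them to the
#    GPU on demand: fits smaller cards, but every step pays the transfer cost
# pipe.enable_model_cpu_offload()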

Finally, by using ColorMNet it could be possible to perform the inference only on the key frames, significantly reducing the delay introduced by Qwen-IE.
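
In code form the idea is simple; here is a rough sketch where detect_key_frames, colorize_with_qwen and propagate_with_colormnet are hypothetical placeholders (they are not real HAVC or ColorMNet APIs):

Code:
from typing import Callable, Dict, List
from PIL import Image

def colorize_clip(frames: List[Image.Image],
                  detect_key_frames: Callable[[List[Image.Image]], List[int]],
                  colorize_with_qwen: Callable[[Image.Image], Image.Image],
                  propagate_with_colormnet: Callable[[List[Image.Image], Dict[int, Image.Image]], List[Image.Image]],
                  ) -> List[Image.Image]:
    """Run the expensive DiT model only on key frames, then propagate the colors."""
    key_idx = detect_key_frames(frames)
    # slow step: Qwen-IE colors only the key frames (roughly 10-20 sec each)
    refs = {i: colorize_with_qwen(frames[i]) for i in key_idx}
    # fast step: ColorMNet propagates the key-frame colors to all the other frames
    return propagate_with_colormnet(frames, refs)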

I don't know if this approach will be practical even after all these optimizations, but in the meantime I will be able to build all the necessary knowledge and code, so that when, in the coming months or years, a new model is released that fits all the video coloring constraints, I will be ready to include it in HAVC.

Dan


RE: Using Stable Diffusion models for Colorization - Dan64 - 27.12.2025

Good News: Finally I was able to build a working prototype.
Bad News: Hardware requirements are very high: RTX 50x GPU (Blackwell) and 64GB RAM.

The main recent innovation in the diffusion model family is the introduction of Transformer technology (the same technology used by current LLMs).
This introduction has significantly improved diffusion models, which are now called DiT (Diffusion Transformer) models.

But Transformer models are memory-hungry, as evidenced by the recent RAM shortage and the crazy increase in RAM prices (fortunately I upgraded my PC some months before the shortage).

I can do nothing to solve this problem; I hope that in the coming months models with lower hardware requirements will be released.

But the problem is not easy to solve; for example, these are the RAM requirements for the model that I'm using:

[Image: attachment.php?aid=3463]

As you can see, even if the storage size of the model on disk is about 23 GB (not too much for a DiT model), the RAM usage is about 2x that, i.e. 46 GB. On top of this it is necessary to add the RAM used by the OS and background programs, about 17 GB, so the total RAM necessary to perform the colorization is 63 GB.

But the use of DiT models for colorization is really a game changer, as shown in the image below

[Image: attachment.php?aid=3462]

 
 I recently started colorizing my old B&W films with the latest version of HAVC. Naturally, I had to make a lot of color compromises, but when I colorized the film Miracle on 34th Street (1947) and saw Santa Claus wearing a gray/brown costume, I rebelled and began to delve into the technology for colorizing photos with DiT models, and the results I achieved are astonishing.

As a test, I tried colorizing the film Miracle on 34th Street (138687 frames) using the following pipeline:

1) export of the reference frames; in this case I obtained about 3000+ frames (time: 25m)
2) selection of the key reference frames; I obtained 994 key frames (time: 20m)
3) colorization of the key frames using Nunchaku Qwen Image Edit 2509 (pure Python code, not using ComfyUI), at a speed of 12 sec/frame (time: 3h30m)
4) colorization of the full movie using HAVC (ColorMNet) with Vivid=False (time: 2h40m)

The total time required to colorize the film was about 6h55m.

The time needed to colorize the movie with HAVC using the "slower" preset was 6h37m.

So the new approach requires only 18m more than the old HAVC approach.
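
As a quick sanity check on these numbers (all taken from the steps above):

Code:
# all times in minutes, as reported above
export_refs   = 25            # step 1: export of the reference frames
select_keys   = 20            # step 2: selection of the 994 key frames
qwen_keys     = 3 * 60 + 30   # step 3: Qwen-IE colorization of the key frames
colormnet_run = 2 * 60 + 40   # step 4: HAVC (ColorMNet) pass over the full movie

total = export_refs + select_keys + qwen_keys + colormnet_run
old_havc = 6 * 60 + 37        # HAVC alone with the "slower" preset

print(f"new pipeline: {total // 60}h{total % 60:02d}m")    # -> 6h55m
print(f"extra time over plain HAVC: {total - old_havc}m")  # -> 18m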

It is possible to see the colorized film at this link (Santa Claus is perfectly colored): miracle-on-34th-street-colorized-1947

Dan


RE: Using Stable Diffusion models for Colorization - Dan64 - 28.12.2025

I tested another DiT model: Flux.1 Kontext

The HW requirements are acceptable: 32 GB RAM and an RTX 3060 GPU or above.

Unfortunately this model, which is older than Qwen-IE-2509, is 2x slower than Qwen-IE-2509 and in some cases provides the wrong colorization, as shown in the example below:

[Image: attachment.php?aid=3465]

As you can see, Flux.1 didn't recognize Santa Claus and mistook people for plants.

In any case, for those willing to test it, I've attached an image (flux1-workflow_small.png) containing the Flux.1 workflow that I used.
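
For completeness, the same kind of standalone sketch can be written for Flux.1 Kontext with diffusers, assuming a build that ships FluxKontextPipeline and the "black-forest-labs/FLUX.1-Kontext-dev" checkpoint (both are assumptions, and the colorization prompt below is only illustrative; the attached workflow remains the reference):

Code:
import torch
from PIL import Image
from diffusers import FluxKontextPipeline  # assumed to exist in a recent diffusers build

# Assumed Hugging Face checkpoint id for Flux.1 Kontext [dev]
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps staying within ~32 GB RAM and a mid-range GPU

gray = Image.open("frame_bw.png").convert("RGB")
result = pipe(
    image=gray,
    prompt=("Colorize this black and white photo with realistic colors "
            "while strictly preserving all shapes, edges and background details."),
)
result.images[0].save("frame_color_flux.png")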

Dan


RE: Using Stable Diffusion models for Colorization - Dan64 - 29.12.2025

For those interested in trying out DiT workflows for coloring the reference frames, I suggest installing the Windows portable version of ComfyUI.

ComfyUI is a little complex to use, but it is very powerful and there is a lot of documentation available; see: https://docs.comfy.org/

Once you have installed ComfyUI, download the workflow for Qwen Image Edit (similar to Nano Banana but free):  qwen-image-edit-2511

Once you have the workflow loaded in ComfyUI, load a B&W image that you want to colorize. 

Then change the prompt included in the example to: "Colorize this black and white photo with realistic, vibrant colors while preserving skin tones. Strictly preserve all shapes, edges and background details."

I suggest ComfyUI as a free alternative to Nano Banana; moreover, more skilled users could try to automate the coloring of the reference frames with ComfyUI (better to ask Gemini how to do that).
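
As a starting point for that automation, here is a minimal sketch that queues a saved workflow against ComfyUI's local HTTP API (the /prompt endpoint) for every B&W frame in a folder. It assumes the workflow was exported from ComfyUI in "API format", and the node ids below are placeholders that must be replaced with the ids from your own export:

Code:
import copy
import json
from pathlib import Path
import requests

COMFY_URL = "http://127.0.0.1:8188"  # default address of a local ComfyUI instance
WORKFLOW = json.loads(Path("qwen_edit_api.json").read_text())  # workflow saved in "API format"

PROMPT_TEXT = ("Colorize this black and white photo with realistic, vibrant colors "
               "while preserving skin tones. Strictly preserve all shapes, edges and "
               "background details.")

# Placeholder node ids: open your own API-format JSON and copy the real ids
# of the "Load Image" node and of the prompt text node.
LOAD_IMAGE_NODE = "10"
PROMPT_NODE = "6"

for frame in sorted(Path("bw_reference_frames").glob("*.png")):
    wf = copy.deepcopy(WORKFLOW)
    wf[LOAD_IMAGE_NODE]["inputs"]["image"] = frame.name  # frame must be in ComfyUI's input folder
    wf[PROMPT_NODE]["inputs"]["text"] = PROMPT_TEXT
    r = requests.post(f"{COMFY_URL}/prompt", json={"prompt": wf})
    r.raise_for_status()
    print(frame.name, "queued as", r.json().get("prompt_id"))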

Currently my coloring pipeline is similar to the one described in the chapter "Advanced coloring using adjusted reference frames" of my user guide; the only difference is that, instead of manually adjusting the reference frames with Photoshop Elements, I'm now using Nunchaku Qwen-Image-Edit-2511 (also available in ComfyUI). Given the very good results obtained with this DiT model, I decided to color all the reference frames in this way; then I use ColorMNet (with Vivid unchecked) with the method "external RF different from Video" to colorize the whole movie, as described on pages 50-59 of my user guide.

I recently published this movie: A Night To Remember (Colorized, 1958), which was colored using this new pipeline. Looking at this video (which can easily be downloaded) you can see that the colors are very accurate, vivid and quite stable, something that cannot be achieved using DeOldify and/or DDColor. The colors are so accurate that it looks like the movie was filmed in color rather than "colorized".

Dan