Using Stable Diffusion models for Colorization
#91
Hi Dan,

The two photos turned out very well. 😊

Even though the generated reference images contain some fictional/non-existent figures, CMNET2 seems to handle them surprisingly well during the actual colorization process and does not transfer those artifacts into the final video.

It seems that the solution may really be in finding the right prompt. Perhaps adding a negative prompt could help reduce these generated figures even further, or maybe a newer model in the future will handle this type of scene more accurately.
#92
(09.06.2026, 19:41)safshe Wrote: Please see the attached image.

This is a security problem on your PC: python and/or vspipe cannot be found because they are being blocked.

Try this solution:

1. Open Windows Security.
2. Go to Virus & threat protection > Manage settings.
3. Scroll to Exclusions and click Add or remove exclusions.
4. Click Add an exclusion > Folder.
5. Select your entire project folder: F:\AI_Works\DiTServerRPC\.venv
6. Run the command again.
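Before or after changing the exclusions, it can help to confirm which executables are actually invisible to the environment. The following is only a minimal sketch using the standard-library `shutil.which`; the helper name `locate_tools` and the `extra_dirs` argument are made up for illustration and are not part of the project.

```python
import shutil
from pathlib import Path


def locate_tools(names, extra_dirs=()):
    """Return {tool name: resolved path or None} for each executable name.

    `locate_tools` and `extra_dirs` are hypothetical names for this sketch.
    """
    found = {}
    for name in names:
        # First try the normal PATH lookup.
        path = shutil.which(name)
        # Then try explicitly supplied folders (for example .venv\Scripts).
        for folder in extra_dirs:
            if path is None:
                path = shutil.which(name, path=str(Path(folder)))
        found[name] = path
    return found


if __name__ == "__main__":
    for tool, path in locate_tools(["python", "vspipe"]).items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

If a tool is reported as NOT FOUND even though the file exists on disk, that would be consistent with the executable being blocked or missing from PATH.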

Dan

(09.06.2026, 20:02)didris Wrote: Hi Dan,

The two photos turned out very well. 😊

Even though the generated reference images contain some fictional/non-existent figures, CMNET2 seems to handle them surprisingly well during the actual colorization process and does not transfer those artifacts into the final video.

It seems that the solution may really be in finding the right prompt. Perhaps adding a negative prompt could help reduce these generated figures even further, or maybe a newer model in the future will handle this type of scene more accurately.

Lightning models do not support negative prompts. Try experimenting with different prompts until you get the result that works best for you.

Dan
#93
(09.06.2026, 19:03)Dan64 Wrote: Moreover, please provide the output of the following commands:

PS D:\PProjects\DiTServerRPC_dev> .\.venv\Scripts\activate
(.venv) PS D:\PProjects\DiTServerRPC_dev> pip list

(.venv) PS D:\PProjects\DiTServerRPC_dev> Get-CimInstance Win32_VideoController | Select-Object Name


Dan

PS E:\DITservercolorize> .\.venv\Scripts\activate
(.venv) PS E:\DITservercolorize> pip list
Package                     Version
--------------------------- ---------------------
accelerate                  1.12.0
annotated-doc               0.0.4
anyio                       4.13.0
av                          17.1.0
build                       1.5.0
certifi                     2026.5.20
charset-normalizer          3.4.7
click                       8.4.1
colorama                    0.4.6
comfy-aimdo                 0.4.7
comfy-kitchen               0.2.10
diffusers                   0.37.0.dev0
einops                      0.8.2
Expr                        0.96
FFmpegSource2               5.0
filelock                    3.29.0
FreeSimpleGUI               5.2.0.post1
fsspec                      2026.4.0
gguf                        0.19.0
h11                         0.16.0
hf-xet                      1.5.0
httpcore                    1.0.9
httpx                       0.28.1
huggingface_hub             0.36.2
idna                        3.18
ImageIO                     2.37.3
importlib_metadata          9.0.0
Jinja2                      3.1.6
lazy-loader                 0.5
LSMASHSource                1282
markdown-it-py              4.2.0
MarkupSafe                  3.0.3
mdurl                       0.1.2
mpmath                      1.3.0
networkx                    3.6.1
numpy                       2.4.4
nunchaku                    1.2.1+cu13.0torch2.10
opencv-python               4.13.0.92
packaging                   26.2
peft                        0.19.1
pillow                      12.2.0
pip                         26.1.2
protobuf                    7.35.0
psutil                      7.2.2
Pygments                    2.20.0
pyproject_hooks             1.2.0
PyYAML                      6.0.3
regex                       2026.5.9
requests                    2.34.2
rich                        15.0.0
safetensors                 0.8.0rc1
scikit-image                0.26.0
scipy                       1.17.1
Send2Trash                  2.1.0
sentencepiece               0.2.1
setuptools                  70.2.0
shellingham                 1.5.4
spatial_correlation_sampler 0.5.0
sympy                       1.14.0
TCanny                      14
tifffile                    2026.6.1
tokenizers                  0.22.2
torch                       2.10.0+cu130
torchaudio                  2.10.0+cu130
torchsde                    0.2.6
torchvision                 0.25.0+cu130
tqdm                        4.68.1
trampoline                  0.1.2
transformers                4.57.6
typer                       0.25.1
typing_extensions           4.15.0
urllib3                     2.7.0
uv-build                    0.11.19
VapourSynth                 74
vscmnet2                    1.0.0
vsrepo                      2.0.0
vsstubs                     2.1.1
zipp                        4.1.0
(.venv) PS E:\DITservercolorize> Get-CimInstance Win32_VideoController | Select-Object Name

Name
----
AMD Radeon™ Graphics
NVIDIA GeForce RTX 5090


(.venv) PS E:\DITservercolorize> python -c "import torch; print(torch.cuda.get_device_name(0))"
NVIDIA GeForce RTX 5090
(.venv) PS E:\DITservercolorize> python -c "import torch; print(torch.cuda.is_available())"
True
#94
Hi Dan,
I’ve done a clean install of everything from scratch, and it is working now. Based on my initial testing, here are my observations and a few feature suggestions:


1. Temporal Color Consistency Issue
While the model successfully colorizes and outputs all images, consistency across frames is a major issue. The model tends to colorize the same shot differently from frame to frame. For example, in a tracking shot where a man walks from a distance toward the camera, his shirt color shifts multiple times throughout the sequence.
2. Shot-Based Segmentation & Keyframe Reference (Feature Suggestion)
Since we already have scene detection capabilities, we could leverage it to fix this consistency problem. Here is a potential workflow:
  • Automated Batching: The system could automatically segment each detected shot into its own dedicated folder.
  • Keyframe Guidance: Once the first image (or a chosen keyframe) is colorized, the model could use it as a reference for the remaining frames in that folder. A prompt fallback like "colorize the remaining images using the color profile and references from the first image" could drastically improve uniformity.
  • Granular Control via GUI: Adding a dedicated GUI tab for shot management would be incredibly helpful. If a specific shot's colorization fails or looks off, we could easily navigate to that shot's folder via the interface and re-run the colorization for just that sequence.
3. Manual Reference & Control Net Tab (Feature Suggestion)
For scenes requiring high accuracy—such as maintaining the specific historical colors of an institutional logo, emblem, or uniform—we need a way to manually intervene.
  • It would be fantastic to have an additional tab where we can upload a specific external reference image.
  • We could then instruct the model with a prompt like: "colorize this sequence, but match the emblem's colors exactly to the attached reference image."
Implementing these features would give us the comprehensive, granular control needed to restore and colorize video content with professional accuracy.
#95
(10.06.2026, 11:05)safshe Wrote: Based on my initial testing, here are my observations and a few feature suggestions:



Hi safshe,

  Thank you for your observations. I have already tried to find a solution to some of the points raised in your post; you can find my thoughts below.

1. Temporal Color Consistency Issue

The only "reasonable" solution to this problem is to enforce color consistency manually by inspecting the colored frames in the folder "ref_qwen". If reference frames are missing, you can add them manually to the folder "ref_tht10" and re-run the colorization task. The program will colorize only the missing frames (there is no need to restart the colorization from zero). If there are frames with inconsistent colors, you can remove or modify them.
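To see in advance which frames a re-run would still have to colorize, the two folders can be compared. This is only a rough sketch: it assumes that a colorized frame in "ref_qwen" keeps the same file name as its reference in "ref_tht10", which is not confirmed here, and the helper name `missing_colorized` is made up for illustration.

```python
import tempfile
from pathlib import Path


def missing_colorized(ref_dir, colored_dir, pattern="ref_*.jpg"):
    """List reference frames that have no same-named file in colored_dir.

    Assumption: colorized frames reuse the reference file names.
    """
    colored = {p.name for p in Path(colored_dir).glob(pattern)}
    return sorted(
        p.name for p in Path(ref_dir).glob(pattern) if p.name not in colored
    )


# Demo on throwaway folders so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    refs = Path(tmp, "ref_tht10")
    colored = Path(tmp, "ref_qwen")
    refs.mkdir()
    colored.mkdir()
    for n in (0, 120, 345):
        (refs / f"ref_{n:06d}.jpg").touch()
    (colored / "ref_000000.jpg").touch()  # only one frame already colorized
    todo = missing_colorized(refs, colored)
    print(todo)  # ['ref_000120.jpg', 'ref_000345.jpg']
```

Deleting a badly colored frame from "ref_qwen" would make it show up in this list again, which matches the remove-and-re-run workflow described above.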

2. Shot-Based Segmentation & Keyframe Reference (Feature Suggestion)

The program already implements a scene-detection algorithm that I consider quite good. The algorithm identifies scene boundaries by analyzing structural differences between frames rather than relying solely on raw pixel changes. The core method computes frame differences between temporally offset frames and enhances them using an edge mask built from Kirsch and TCanny operators. This produces an edge-weighted difference metric, which emphasizes meaningful structural changes (e.g., object boundaries) while reducing sensitivity to noise or flat regions. Scene changes are detected when both:
  • the global frame difference, and
  • the edge-weighted difference
exceed configurable thresholds, while also respecting a minimum distance between cuts. Additional safeguards include:
  • luma filtering, which rejects frames that are too dark or too bright,
  • override conditions for very strong changes or external detector hints.
Optionally, a second stage refines detections using SSIM and histogram comparison, removing false positives when consecutive frames are still perceptually similar.
The algorithm annotates each frame with scene-change flags and metadata, providing both detection results and information about the decision process.  
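The decision rule described above can be sketched in a few lines of plain Python. This is not the actual implementation: it assumes the per-frame metrics (global difference, edge-weighted difference, average luma) have already been computed elsewhere, all threshold values are placeholders, the names `FrameStats` and `detect_cuts` are hypothetical, and the way the "very strong change" override interacts with the other checks is a guess. The optional SSIM/histogram refinement and external detector hints are left out.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    diff: float       # global difference vs. the temporally offset frame
    edge_diff: float  # edge-weighted difference
    luma: float       # average luma, normalized to 0..1


def detect_cuts(stats, diff_thr=0.10, edge_thr=0.15, min_gap=10,
                luma_min=0.05, luma_max=0.95, strong_thr=0.50):
    """Return frame indices flagged as scene changes (placeholder thresholds)."""
    cuts = []
    last_cut = None
    for n, s in enumerate(stats):
        # Respect a minimum distance between cuts.
        if last_cut is not None and n - last_cut < min_gap:
            continue
        # Luma filtering: reject frames that are too dark or too bright.
        if not (luma_min <= s.luma <= luma_max):
            continue
        # Normal rule: BOTH metrics must exceed their thresholds.
        both = s.diff > diff_thr and s.edge_diff > edge_thr
        # Override (assumed form): a very strong global change is enough.
        strong = s.diff > strong_thr
        if both or strong:
            cuts.append(n)
            last_cut = n
    return cuts


# Synthetic metrics for 50 frames with a few interesting cases.
stats = [FrameStats(0.01, 0.01, 0.5) for _ in range(50)]
stats[5] = FrameStats(0.20, 0.30, 0.5)    # both thresholds exceeded: cut
stats[8] = FrameStats(0.20, 0.30, 0.5)    # too close to frame 5: ignored
stats[20] = FrameStats(0.20, 0.05, 0.5)   # edge metric too low: no cut
stats[22] = FrameStats(0.60, 0.00, 0.5)   # very strong change: override
stats[40] = FrameStats(0.30, 0.30, 0.01)  # too dark: rejected by luma filter
cuts = detect_cuts(stats)
print(cuts)  # [5, 22]
```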

I developed this algorithm because I was unable to find a good scene-detection method in the open-source world. Shot-based segmentation and keyframe-reference colorization are already handled by two tasks: 1) Extract Reference Frames and 2) Colorize Frames. However, as I wrote in my previous answer, a perfect result requires manual adjustment. Don't expect to be able to do that automatically.

3. Manual Reference & Control Net Tab (Feature Suggestion)

I already tried changing the prompt to enforce color consistency, but the results were bad. For example, because a car in one clip was colored both blue and red, I asked in the prompt to always colorize cars in blue; the result was that Qwen added a blue car even to frames where there was no car. I also tried giving Qwen two images as input, asking the model to colorize the first image using the colors available in the second. The result was bad; it seems that Qwen was not trained to handle this type of prompt properly. Unless models trained to enforce color consistency become available in the future, the only viable solution is the one described in point 1.

Dan
#96
(10.06.2026, 16:05)Dan64 Wrote: I developed this algorithm because I was unable to find a good scene-detection method in the open-source world. Shot-based segmentation and keyframe-reference colorization are already handled by two tasks: 1) Extract Reference Frames and 2) Colorize Frames. However, as I wrote in my previous answer, a perfect result requires manual adjustment. Don't expect to be able to do that automatically.



Dan

Thanks for the reply.
Regarding scene change detection, can we use DaVinci Resolve to detect scene cuts and then use the generated EDL in VapourSynth? I believe the free version of Resolve should be sufficient for scene cut detection.
#97
Hi safshe,

  The GUI has 4 tasks (the last one is optional and only suitable when the input is an already colorized clip).

   [Image: attachment.php?aid=3637]

   You can skip the first task and generate the reference frames using other external tools. You only need to give them the right name (ref_nnnnnn.jpg) and put them in the folder called "ref_tht10". Then you can start the colorization using Qwen-IE.
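As an illustration of the external-tool route, a cut list exported from an editor as a CMX3600-style EDL could be turned into frame numbers and matching file names. This is a sketch under several assumptions: the timecode is non-drop-frame, the frame rate is an integer, the first record-in timecode corresponds to frame 0 of the clip, and "nnnnnn" in ref_nnnnnn.jpg is the zero-padded frame number. The helper names and the sample EDL text are invented for the example.

```python
import re

# Matches HH:MM:SS:FF (a ';' separator is accepted but treated as non-drop).
TC = re.compile(r"(\d{2}):(\d{2}):(\d{2})[:;](\d{2})")


def tc_to_frames(match, fps):
    """Convert a matched non-drop-frame timecode to an absolute frame count."""
    h, m, s, f = (int(g) for g in match.groups())
    return (h * 3600 + m * 60 + s) * fps + f


def edl_cut_frames(edl_text, fps=25):
    """Return cut positions as frame numbers relative to the first event."""
    record_ins = []
    for line in edl_text.splitlines():
        if not re.match(r"\s*\d+\s", line):
            continue  # skip TITLE, FCM, blank and comment lines
        codes = list(TC.finditer(line))
        if len(codes) >= 4:
            # Event lines end with: source in, source out, record in, record out.
            record_ins.append(tc_to_frames(codes[-2], fps))
    if not record_ins:
        return []
    start = min(record_ins)
    return sorted({t - start for t in record_ins})


def ref_names(frames):
    """Assumed naming scheme: six-digit zero-padded frame number."""
    return [f"ref_{n:06d}.jpg" for n in frames]


SAMPLE_EDL = """TITLE: demo
FCM: NON-DROP FRAME

001  AX       V     C        00:00:00:00 00:00:04:00 01:00:00:00 01:00:04:00
002  AX       V     C        00:00:04:00 00:00:10:12 01:00:04:00 01:00:10:12
003  AX       V     C        00:00:10:12 00:00:12:00 01:00:10:12 01:00:12:00
"""

frames = edl_cut_frames(SAMPLE_EDL, fps=25)
print(frames)             # [0, 100, 262]
print(ref_names(frames))  # ['ref_000000.jpg', 'ref_000100.jpg', 'ref_000262.jpg']
```

The frames themselves would still have to be exported as JPEG files with these names by whatever tool is used.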

   [Image: attachment.php?aid=3638]

    Splitting the pipeline into independent tasks provides more flexibility. After the colorization you can check the colored frames and adjust them if necessary.
    Finally, you can run the full clip colorization using CMNET2 by running task #3.

Dan

P.S.

You can add other extraction scripts to the folder "GUI\scripts"; to be visible they need to be named *extract*.vpy. If you want to use other filters for scene detection and they are available as VS filters, you can add them to your (.venv) and create a new script that uses them. Look at the script extract_refs_edge.vpy as an example.

You can also add other colorization scripts; it is enough to name them *encode*.vpy. If you want to use the colorization functions available in HAVC, it is enough to install the filter in your (.venv) environment (remember to add all the necessary model weights). Look at the script encode_cmnet2.vpy as an example.
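To check whether a new script name would match these patterns, the naming rule can be tried out with a simple glob. This only illustrates the *extract*.vpy / *encode*.vpy convention; how the GUI really scans the folder is not shown here, and the helper `find_scripts` and the extra file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def find_scripts(scripts_dir, keyword):
    """Return sorted names of .vpy files whose name contains `keyword`."""
    return sorted(p.name for p in Path(scripts_dir).glob(f"*{keyword}*.vpy"))


# Demo in a throwaway folder standing in for GUI\scripts.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("extract_refs_edge.vpy", "encode_cmnet2.vpy",
                 "my_extract_resolve.vpy", "helper.vpy", "notes.txt"):
        Path(tmp, name).touch()
    extract_scripts = find_scripts(tmp, "extract")
    encode_scripts = find_scripts(tmp, "encode")
    print(extract_scripts)  # ['extract_refs_edge.vpy', 'my_extract_resolve.vpy']
    print(encode_scripts)   # ['encode_cmnet2.vpy']
```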

