15.04.2026, 11:03
Hi NASS,
good news! I extended ColorMNet with XMem2.
I named the project CMNET2, you can find it at the following link: https://github.com/dan64/cmnet2
The key features implemented are:
- Reference-based colorization
- Permanent memory (XMem++ style)
- Preloading API
- Sliding window memory management
- Adaptive VRAM management
- DINOv2 + ResNet50 fusion backbone
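To give an idea of how the permanent memory and the sliding window interact, here is a minimal sketch. The class and method names are hypothetical, not the actual CMNET2 API: it just illustrates an XMem++-style memory where reference features are never evicted while recent working frames live in a bounded window.

```python
from collections import deque

class SlidingWindowMemory:
    """Illustrative sketch (not the real CMNET2 API): permanent slots for
    user-provided reference frames plus a bounded sliding window of
    recent working frames."""

    def __init__(self, window_size=5):
        self.permanent = []                      # reference features, never evicted
        self.window = deque(maxlen=window_size)  # recent frames, oldest evicted first

    def add_reference(self, feat):
        self.permanent.append(feat)

    def add_frame(self, feat):
        self.window.append(feat)  # deque drops the oldest entry automatically

    def readout(self):
        # Memory visible to the decoder: permanent references + recent frames
        return self.permanent + list(self.window)
```

With `window_size=5`, adding seven working frames keeps only the last five, while every reference stays available for readout.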
I also tried to add DINOv3, but a full implementation requires a complete retraining, which would take a lot of time to perform; given that my time to develop this project is limited, I decided to skip this extension (my attempt to train only the last 7M nodes was unsuccessful).
The pipeline this model will be used in involves extracting a number of reference frames from a B&W video, coloring them with Qwen-Image-Edit, and passing them to CMNET2, which then colorizes the full video. In this context there are two main problems: 1) some frames may have no nearby reference image; in this case the colors produced by the model are faded, with people's faces appearing gray; 2) the same object may appear in multiple reference frames with different colors; in this case the model often produces an intermediate color between the two.
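The pipeline above can be sketched as follows. This is a hypothetical outline under my own assumptions: `qwen_colorize` and `cmnet2_colorize` are injected placeholders standing in for the real models, and the fixed-stride reference selection is an assumption (the real pipeline may use scene detection instead).

```python
def select_reference_indices(num_frames, stride=50):
    """Pick evenly spaced reference frames from a B&W clip.
    The fixed stride is an assumption for illustration only."""
    return list(range(0, num_frames, stride))

def colorize_pipeline(bw_frames, qwen_colorize, cmnet2_colorize, stride=50):
    """Sketch of the described pipeline; the two callables stand in
    for Qwen-Image-Edit and CMNET2 respectively."""
    ref_idx = select_reference_indices(len(bw_frames), stride)
    refs = [qwen_colorize(bw_frames[i]) for i in ref_idx]  # color the references
    return cmnet2_colorize(bw_frames, refs, ref_idx)       # propagate colors to all frames
```

Problem 1 corresponds to the gap between consecutive entries of `ref_idx` growing too large; problem 2 arises when `qwen_colorize` assigns different colors to the same object in different references.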
DINOv3 doesn't solve either of the two problems:
Problem 1 (frame without reference → faded colors): this is a temporal memory coverage problem, not a feature quality problem. DINOv3 extracts better features, but if there is no reference close in time, the result will still be faded.
Problem 2 (same object with different colors across references): this is a semantic inconsistency between the references, introduced by Qwen. DINOv3 doesn't know that two references show the same object with different colors; it would compute the same average as DINOv2.
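A toy example of why problem 2 is backbone-independent: memory readout is an attention-weighted average of the reference values, so two equally similar references with different colors yield an intermediate color, whichever backbone computes the similarities. The softmax readout below is a standard attention formulation used for illustration, not CMNET2's exact readout code.

```python
import numpy as np

def readout(query_sim, ref_colors):
    """Attention-style memory readout: softmax over similarities,
    then a weighted average of the reference color values."""
    w = np.exp(query_sim) / np.exp(query_sim).sum()  # softmax attention weights
    return (w[:, None] * ref_colors).sum(axis=0)     # weighted average of colors

ref_colors = np.array([[1.0, 0.0],   # reference A: one color for the object
                       [0.0, 1.0]])  # reference B: a different color
# Equal similarity to both references → intermediate color (0.5, 0.5),
# regardless of which backbone produced the similarity scores.
print(readout(np.array([2.0, 2.0]), ref_colors))
```

Better features would only change the similarity scores; as long as both references match the object, the readout still blends their colors.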
Instead, I'm working on including SAM3 in the pipeline; I hope it will further improve the coloring process. We'll see...
Dan

