15.04.2026, 11:03
Hi NASS,
good news! I extended ColorMNet with XMem2.
I named the project CMNET2, you can find it at the following link: https://github.com/dan64/cmnet2
The key features implemented are:
- Reference-based colorization
- Permanent memory (XMem++ style)
- Preloading API
- Sliding window memory management
- Adaptive VRAM management
- DINOv2 + ResNet50 fusion backbone
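To give an idea of how the permanent memory and the sliding window interact, here is a minimal sketch. The class and method names are hypothetical, not the actual CMNET2 API: it just illustrates an XMem++-style memory where reference features are never evicted while recent working frames live in a bounded window.

```python
from collections import deque

class SlidingWindowMemory:
    """Illustrative sketch (not the real CMNET2 API): permanent slots for
    user-provided reference frames plus a bounded sliding window of
    recent working frames."""

    def __init__(self, window_size=5):
        self.permanent = []                      # reference features, never evicted
        self.window = deque(maxlen=window_size)  # recent frames, oldest evicted first

    def add_reference(self, feat):
        self.permanent.append(feat)

    def add_frame(self, feat):
        self.window.append(feat)  # deque drops the oldest entry automatically

    def readout(self):
        # Memory visible to the decoder: permanent references + recent frames
        return self.permanent + list(self.window)
```

With `window_size=5`, adding seven working frames keeps only the last five, while every reference stays available for readout.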
I also tried to add DINOv3, but a full implementation requires a complete retraining, which would take a lot of time to perform; given that my time to develop this project is limited, I decided to skip this extension (my attempt to train only the last 7M nodes was unsuccessful).
The pipeline this model will be used in involves extracting a number of reference frames from a B&W video, coloring them with Qwen-Image-Edit, and passing them to CMNET2, which then colorizes the full video. In this context there are two main problems: 1) some frames may have no nearby reference image; in this case the colors produced by the model are faded, with people's faces appearing gray; 2) the same object may appear in multiple reference frames with different colors; in this case the model often produces an intermediate color between the two.
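The pipeline above can be sketched as follows. This is a hypothetical outline under my own assumptions: `qwen_colorize` and `cmnet2_colorize` are injected placeholders standing in for the real models, and the fixed-stride reference selection is an assumption (the real pipeline may use scene detection instead).

```python
def select_reference_indices(num_frames, stride=50):
    """Pick evenly spaced reference frames from a B&W clip.
    The fixed stride is an assumption for illustration only."""
    return list(range(0, num_frames, stride))

def colorize_pipeline(bw_frames, qwen_colorize, cmnet2_colorize, stride=50):
    """Sketch of the described pipeline; the two callables stand in
    for Qwen-Image-Edit and CMNET2 respectively."""
    ref_idx = select_reference_indices(len(bw_frames), stride)
    refs = [qwen_colorize(bw_frames[i]) for i in ref_idx]  # color the references
    return cmnet2_colorize(bw_frames, refs, ref_idx)       # propagate colors to all frames
```

Problem 1 corresponds to the gap between consecutive entries of `ref_idx` growing too large; problem 2 arises when `qwen_colorize` assigns different colors to the same object in different references.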
DINOv3 doesn't solve either of the two problems:
Problem 1 (frame without reference → faded colors): this is a temporal memory coverage problem, not a feature quality problem. DINOv3 extracts better features, but if there is no reference close in time, the result will still be faded.
Problem 2 (same object with different colors across references): this is a semantic inconsistency between the references, introduced by Qwen. DINOv3 doesn't know that two references show the same object with different colors; it would compute the same average as DINOv2.
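A toy example of why problem 2 is backbone-independent: memory readout is an attention-weighted average of the reference values, so two equally similar references with different colors yield an intermediate color, whichever backbone computes the similarities. The softmax readout below is a standard attention formulation used for illustration, not CMNET2's exact readout code.

```python
import numpy as np

def readout(query_sim, ref_colors):
    """Attention-style memory readout: softmax over similarities,
    then a weighted average of the reference color values."""
    w = np.exp(query_sim) / np.exp(query_sim).sum()  # softmax attention weights
    return (w[:, None] * ref_colors).sum(axis=0)     # weighted average of colors

ref_colors = np.array([[1.0, 0.0],   # reference A: one color for the object
                       [0.0, 1.0]])  # reference B: a different color
# Equal similarity to both references → intermediate color (0.5, 0.5),
# regardless of which backbone produced the similarity scores.
print(readout(np.array([2.0, 2.0]), ref_colors))
```

Better features would only change the similarity scores; as long as both references match the object, the readout still blends their colors.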
Instead, I'm working on including SAM3 in the pipeline; I hope it will further improve the coloring process. We'll see...
Dan

