10.04.2026, 00:27
Hello Dan & Selur,
I am working on a custom video colorization pipeline heavily inspired by ColorMNet, but I have overhauled its core architecture with the goal of pushing it to state-of-the-art quality:
1. Backbone Upgrade: Replaced DINOv2 with DINOv3 for denser and richer semantic feature extraction.
2. Memory Upgrade: Upgraded the tracking engine to the XMem++ architecture (incorporating Permanent Memory).
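To illustrate why a permanent memory tier helps tracking, here is a toy sketch. It is not the XMem++ code and not my model's code. As I understand it, XMem++ uses a richer design (sensory, working, long-term and permanent memory, with its own affinity function), so treat this as a simplified illustration only. The class `TwoTierMemory` is hypothetical, and it uses a plain dot-product softmax readout in place of XMem's similarity. The idea: reference-frame features go into a tier that is never evicted, while per-frame features go into a bounded FIFO. This is why a reference color can survive long occlusions.

```python
import math
from collections import deque


class TwoTierMemory:
    """Hypothetical toy key-value memory with a permanent tier.

    Permanent entries (e.g. features from user reference frames) are never
    evicted; working entries (from recently processed frames) live in a
    bounded FIFO and the oldest is dropped once capacity is reached.
    """

    def __init__(self, working_capacity):
        self.permanent = []  # (key, value) pairs, never evicted
        self.working = deque(maxlen=working_capacity)  # oldest dropped first

    def add_permanent(self, key, value):
        self.permanent.append((key, value))

    def add_working(self, key, value):
        self.working.append((key, value))

    def read(self, query):
        """Softmax-weighted readout over both tiers (dot-product affinity)."""
        entries = self.permanent + list(self.working)
        if not entries:
            raise ValueError("memory is empty")
        scores = [sum(q * k for q, k in zip(query, key)) for key, _ in entries]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(entries[0][1])
        return [
            sum(w * value[i] for w, (_, value) in zip(weights, entries)) / total
            for i in range(dim)
        ]


# Usage: one reference entry ("red car"), then many unrelated frames.
mem = TwoTierMemory(working_capacity=2)
mem.add_permanent([1.0, 0.0], [1.0, 0.0])
for _ in range(5):
    mem.add_working([0.0, 1.0], [0.0, 1.0])  # only the last 2 are kept
out = mem.read([10.0, 0.0])  # query resembling the reference key
print(len(mem.working), out)
```

Even after many frames have passed through the working tier, the reference entry is still available at readout time. A working-memory-only design would have evicted it.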
The Progress:
I have trained the model from scratch for 145,000 iterations on DAVIS, REDS, and 16mm film footage.
The temporal stability and object tracking are mind-blowing. If I provide a reference frame with a red car, the car stays perfectly red throughout the whole video, even through severe occlusions.
The Problem:
While the tracking is perfect, I am seeing a spatial issue: color bleeding/spilling, specifically onto the ground/road and the sky.
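To compare fixes, it may help to put a number on the bleeding. Below is a minimal sketch of a hypothetical diagnostic (`spill_ratio` is my own illustrative helper, not part of the attached code). It assumes you have the predicted CIELAB a/b chroma per pixel and a ground-truth object mask, such as the DAVIS annotations. It reports the fraction of pixels outside the object whose chroma is close to the object's reference color, for example the red car's color leaking onto the road. The tolerance of 10 a/b units is an arbitrary assumption.

```python
import math


def spill_ratio(ab, mask, target_ab, tol=10.0):
    """Fraction of pixels outside `mask` whose (a, b) chroma lies within
    `tol` of `target_ab` (e.g. the reference color of the car).

    ab:   2-D grid (list of rows) of (a, b) tuples, CIELAB chroma per pixel
    mask: 2-D grid of bools, True where the object actually is
    """
    outside = spilled = 0
    for ab_row, mask_row in zip(ab, mask):
        for (a, b), inside in zip(ab_row, mask_row):
            if inside:
                continue  # only count pixels that should NOT carry the color
            outside += 1
            if math.hypot(a - target_ab[0], b - target_ab[1]) <= tol:
                spilled += 1
    return spilled / outside if outside else 0.0


# Usage: one car pixel, one road pixel tinted red, two neutral pixels.
ab = [[(60.0, 40.0), (55.0, 38.0), (0.0, 0.0), (2.0, -1.0)]]
mask = [[True, False, False, False]]
r = spill_ratio(ab, mask, target_ab=(60.0, 40.0))
print(r)
```

Averaging this over frames gives a rough, single-number way to check whether a change actually reduces spill on the road and sky.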
Call for Collaboration:
I am reaching out to see if we can team up to stabilize this model. Once we fix this spatial bleeding, I truly believe this will be the ultimate upgrade to ColorMNet.
To get things started, I have attached all the files to this post:
The complete training and inference source code.
The test scripts.
The trained model weights (at 145k iterations).
The visual results along with the reference images.
Let's build something great together. Any advice or pull requests are welcome!
Best
NASS
Script and model: https://drive.google.com/file/d/1JV7V2pp...sp=sharing
Results: https://drive.google.com/file/d/1aKtCB5Q...sp=sharing
To test: python nass.py --input 0000.mp4 --ref_path REF --model saves/color_v3_3090_145000.pth