11.04.2026, 12:39
Hi Dan, thanks for your reply. Yes, I trained the model from scratch! And you're probably right: there may be an issue because I included a dataset scanned from 16mm film, and the model might have interpreted the film grain as color. Also, I don't think I included the full XMem++ code (https://github.com/mbzuai-metaverse/XMem2): after a closer analysis, I found that it works with a permanent memory (already included) plus a working memory and a 15-frame jump (neither of which was included in my training).
You know, all of the ColorMNet scripts are based on the XMem scripts. We were able to improve colorization by adding DINOv3 and ResNet features. Now we need to understand how XMem2's memory works. (XMem2 follows a reference image with surgical precision! For example, if you apply a color to a person or a car, it stays consistent throughout the video, even if the object moves or enters the field of view.)
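To make the discussion concrete, here is a rough sketch of how I picture that two-level memory. This is only my mental model based on the description above: the class and method names are mine, not XMem2's actual API, and the tensor shapes are illustrative.

import torch

class TwoPartMemory:
    """Hypothetical sketch of an XMem2-style memory: a permanent bank of
    user-chosen reference frames that is never evicted, plus a working
    bank that only admits every `jump`-th frame (the 15-frame jump
    mentioned above). Not XMem2's real interface."""

    def __init__(self, jump=15, max_working=10):
        self.jump = jump                  # admit a working frame every `jump` frames
        self.max_working = max_working    # cap on working-memory entries
        self.permanent_keys, self.permanent_values = [], []
        self.working_keys, self.working_values = [], []

    def add_reference(self, key, value):
        # Reference (ground-truth color) frames go to permanent memory
        # and stay there for the whole video.
        self.permanent_keys.append(key)
        self.permanent_values.append(value)

    def maybe_add_working(self, frame_idx, key, value):
        # Recent frames enter working memory only on the jump schedule.
        if frame_idx % self.jump != 0:
            return
        self.working_keys.append(key)
        self.working_values.append(value)
        if len(self.working_keys) > self.max_working:
            # FIFO eviction; the permanent bank is never touched.
            self.working_keys.pop(0)
            self.working_values.pop(0)

    def readout(self, query):
        # Attention readout over both banks: softmax(q . K^T) @ V.
        # query: (Q, C); keys: (N, C); values: (N, D) -> output: (Q, D)
        keys = torch.cat(self.permanent_keys + self.working_keys, dim=0)
        values = torch.cat(self.permanent_values + self.working_values, dim=0)
        affinity = torch.softmax(query @ keys.T, dim=-1)
        return affinity @ values

If this is roughly right, it would explain the consistency: the reference colors can never be pushed out by newer frames, while the working memory still tracks appearance changes as the object moves.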
I've sent you the latest version of the training script, train.py (the one I used for training):
https://drive.google.com/file/d/1qr9jkCa...sp=sharing
The command to run: python train.py --davis_root datasets/480p --exp_id ColorMNet_V3_Final --s2_batch_size 4 --s2_num_frames 10 --s2_lr 2e-5 --key_dim 128 --value_dim 512 --hidden_dim 64
Best regards,
NASS

