10.04.2026, 12:06
Hi Dan, thanks for your reply. For DINOv3 Base (https://huggingface.co/facebook/dinov3-v...n-lvd1689m), which I used, you can request access from Meta; they typically get back to you within 24 hours.
For ColorMnet 2023, we use DINOv2 Small (training data: approximately 142 million images) and XMem.
For ColorMnet v2 2026, I used DINOv3 Base (training data: approximately 1.7 billion images) and XMem++, which is superior to XMem.
I've noticed that the temporal consistency of the model I trained with DINOv3 Base and XMem++ is far better than the old model's, thanks, I believe, to XMem++'s permanent memory. There are no color jumps, and DINOv3 provides much stronger object recognition in the video, which allows precise integration of the reference images.
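To make the "integration of reference images" point concrete, here is a minimal, hypothetical sketch of the core idea behind using DINO-style patch features for reference-guided colorization: each grayscale target patch is matched to its most similar reference patch by cosine similarity, and the reference patch's color can then be transferred. The function name and the random stand-in features are mine, not from ColorMnet; 768 is just the ViT-B embedding width.

```python
import numpy as np

def match_reference_patches(target_feats, ref_feats):
    """For each target patch, return the index of the most similar
    reference patch by cosine similarity (hypothetical sketch of how
    DINO patch features can guide reference-based color transfer)."""
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sim = t @ r.T                 # (N_target, N_ref) cosine similarities
    return sim.argmax(axis=1)     # best-matching reference patch per target

# Toy demo: random vectors stand in for DINOv3 ViT-B patch tokens.
rng = np.random.default_rng(0)
ref = rng.normal(size=(196, 768))                          # 14x14 patch grid
tgt = ref[[3, 7, 42]] + 0.01 * rng.normal(size=(3, 768))   # noisy near-copies
print(match_reference_patches(tgt, ref))                   # → [ 3  7 42]
```

In the real pipeline the matched colors would then be propagated frame-to-frame by the memory network rather than recomputed independently per frame.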
You're right: based on my research, it's XMem++ that needs to be properly configured to avoid those stray colors on the ground and in the sky. Aside from that, if we can stabilize it, it will truly be superior to other image-guided colorization models. Let me know what you think.

