ColormnetV2 Project
#13
Hi Dan, you're absolutely right: starting from scratch is the right approach! To help you avoid the issues I ran into, here's a technical report on my previous attempt:

Technical Report on Migrating and Optimizing the ColorMNet Pipeline for DINOv3 Base (the major update to DINOv2) and XMem2 (the major update to XMem)
1. Background and Initial Objective

The project is based on the ColorMNet architecture (ECCV 2024), which was originally designed with a DINOv2 Small backbone (ViT-S/14). The objective was to turn this prototype into a high-fidelity film colorization tool.
2. Technical Conversion Process (Backbone & Latent Space)

The first phase involved replacing the model’s “eyes”:

    Backbone: Switched to DINOv3 Base (ViT-B/16). This substantially changed the feature dimensionality: concatenating the last 4 hidden states takes the feature vector from 1536 channels (4x384 with ViT-S) to 3072 channels (4x768 with ViT-B).

    Semantic Compression: To reduce computation on the RTX 3090, a projection layer (conv_proj) was implemented to compress these 3072 channels into a working space of 1024 channels (ValueDim) and 128 channels (KeyDim).

    Spatial Alignment: DINOv3's 16-pixel patch size matches the stride of the ResNet-50 branch (16), so both branches produce feature maps on the same grid. This gives an exact cell-to-cell correspondence between the two feature maps in the PVGFE fusion module.
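The channel and grid arithmetic in the list above can be checked with a few lines of Python. This is a minimal sketch, not the project's code: `BackboneSpec`, `conv1x1_params`, `feature_grid` and the other names are hypothetical, `conv_proj` is assumed to be two separate 1x1 convolutions with bias (one for ValueDim, one for KeyDim), and frames are assumed to be padded up to a multiple of the stride rather than resized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneSpec:
    """Hypothetical description of a ViT backbone; field names are illustrative."""
    hidden_dim: int          # width of one hidden state (384 for ViT-S, 768 for ViT-B)
    patch_size: int          # patch size in pixels (14 for DINOv2 ViT-S/14, 16 for DINOv3 ViT-B/16)
    layers_concat: int = 4   # number of final hidden states concatenated


def concat_channels(spec: BackboneSpec) -> int:
    """Channels after concatenating the last `layers_concat` hidden states."""
    return spec.layers_concat * spec.hidden_dim


def conv1x1_params(in_ch: int, out_ch: int, bias: bool = True) -> int:
    """Weights (plus optional bias) of an assumed 1x1 convolution mapping in_ch -> out_ch."""
    return in_ch * out_ch + (out_ch if bias else 0)


def pad_to_multiple(size: int, multiple: int) -> int:
    """Smallest size >= `size` that divides evenly by `multiple` (integer arithmetic only)."""
    return ((size + multiple - 1) // multiple) * multiple


def feature_grid(height: int, width: int, stride: int) -> tuple[int, int]:
    """Feature-map size after padding the frame up to a multiple of `stride`."""
    return pad_to_multiple(height, stride) // stride, pad_to_multiple(width, stride) // stride


dinov2_small = BackboneSpec(hidden_dim=384, patch_size=14)
dinov3_base = BackboneSpec(hidden_dim=768, patch_size=16)
RESNET50_STRIDE = 16              # ResNet-50 branch stride, as stated in the report
VALUE_DIM, KEY_DIM = 1024, 128    # ValueDim / KeyDim from the report

in_ch = concat_channels(dinov3_base)
print(concat_channels(dinov2_small), in_ch)  # 1536 3072
print(conv1x1_params(in_ch, VALUE_DIM), conv1x1_params(in_ch, KEY_DIM))  # 3146752 393344

# A 1920x1080 frame is padded to 1920x1088, so both stride-16 branches land on one grid.
vit_grid = feature_grid(1080, 1920, dinov3_base.patch_size)
cnn_grid = feature_grid(1080, 1920, RESNET50_STRIDE)
print(vit_grid, cnn_grid, vit_grid == cnn_grid)  # (68, 120) (68, 120) True

# With a 14-pixel patch the token grid no longer matches a stride-16 map.
print(feature_grid(1080, 1920, dinov2_small.patch_size))  # (78, 138)
```

Under these assumptions the two projections add about 3.5 M parameters in total, and the padding step is what keeps the ViT and CNN grids identical for frame sizes that are not multiples of 16.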

3. Integration of XMem++ (XMem2) Memory

To address the memory loss in the original pipeline, which caused colors to fade after a few seconds, we implemented the XMem2 logic:

    Permanent Memory Bank: Unlike the original model, where the reference image was held only in volatile memory, we created an immutable anchor in VRAM. The reference image is injected as “eternal memory” that is consulted at every frame to stop color drift.
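The permanent-anchor idea can be illustrated with a toy key-value memory. This is a hedged sketch under simplifying assumptions, not XMem2's implementation: `AnchoredMemory` is a hypothetical class, keys and values are plain Python lists instead of GPU tensors, the working memory is a simple first-in-first-out buffer, and the readout is a softmax over negative squared L2 distances without XMem's shrinkage and selection terms.

```python
import math
from collections import deque

Vector = list[float]


class AnchoredMemory:
    """Hypothetical toy memory: permanent reference entries plus a bounded working memory."""

    def __init__(self, max_working: int) -> None:
        self.permanent: list[tuple[Vector, Vector]] = []  # anchor entries, never evicted
        self.working: deque = deque(maxlen=max_working)     # oldest entry dropped when full

    def add_permanent(self, key: Vector, value: Vector) -> None:
        """Store the reference frame's key/value as an anchor."""
        self.permanent.append((key, value))

    def add_working(self, key: Vector, value: Vector) -> None:
        """Store a propagated frame; deque(maxlen=...) discards the oldest entry when full."""
        self.working.append((key, value))

    def read(self, query: Vector) -> Vector:
        """Attention readout over permanent + working entries for one query position."""
        entries = self.permanent + list(self.working)
        if not entries:
            raise ValueError("memory is empty")
        # Simplified affinity: negative squared L2 distance between query and each key.
        sims = [-sum((q - k) ** 2 for q, k in zip(query, key)) for key, _ in entries]
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]  # numerically stable softmax
        total = sum(weights)
        dim = len(entries[0][1])
        return [sum(w * value[i] for w, (_, value) in zip(weights, entries)) / total
                for i in range(dim)]


memory = AnchoredMemory(max_working=2)
memory.add_permanent([0.0, 0.0], [1.0, 0.0])      # reference frame: "color A"
for t in range(5):                                 # later frames drift towards "color B"
    memory.add_working([10.0 + t, 10.0], [0.0, 1.0])

print(len(memory.permanent), len(memory.working))  # 1 2
print(memory.read([0.0, 0.0]))                     # dominated by the anchor's value
```

Because the anchor lives outside the FIFO buffer, no number of new frames can push the reference out of memory; the trade-off in a real pipeline is that the anchor's keys are matched against every query position on every frame.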



My recommendation: start from the original XMem2 GitHub repository and try integrating the DINOv3 Base (ViT-B/16) backbone and ResNet-50, using a colorization technique inspired by the old ColorMNet 2023 pipeline.

I'm happy to help if you need any assistance.

Best

NASS


Messages In This Thread
ColormnetV2 Project - by NASS - 10.04.2026, 00:27
RE: Deoldify Vapoursynth filter - by Dan64 - 10.04.2026, 09:51
RE: ColormnetV2 Project - by Selur - 10.04.2026, 10:32
RE: ColormnetV2 Project - by NASS - 10.04.2026, 12:06
RE: ColormnetV2 Project - by Dan64 - 10.04.2026, 16:58
RE: ColormnetV2 Project - by NASS - 10.04.2026, 18:48
RE: ColormnetV2 Project - by Dan64 - 11.04.2026, 10:15
RE: ColormnetV2 Project - by Dan64 - 11.04.2026, 12:14
RE: ColormnetV2 Project - by NASS - 11.04.2026, 12:39
RE: ColormnetV2 Project - by Dan64 - 11.04.2026, 16:09
RE: ColormnetV2 Project - by NASS - 11.04.2026, 16:40
RE: ColormnetV2 Project - by Dan64 - 11.04.2026, 17:31
RE: ColormnetV2 Project - by NASS - 11.04.2026, 18:44
RE: ColormnetV2 Project - by Dan64 - 15.04.2026, 11:03
RE: ColormnetV2 Project - by NASS - 16.04.2026, 19:34
RE: ColormnetV2 Project - by Dan64 - 16.04.2026, 21:52
RE: ColormnetV2 Project - by NASS - 16.04.2026, 22:21
RE: ColormnetV2 Project - by Dan64 - 19.04.2026, 15:16
