11.04.2026, 16:40
Hi Dan, thanks for your reply. I trained the model with xmem2 using train-xmem2.py (I didn't use the original training files because they needed modifications for compatibility between Dinov3+ResNet and xmem2). Training took about 48 hours in FP16 AMP on my RTX 3090.
I modified dataset/vos_dataset.py so it can read all types of datasets (DAVIS/REDS/16mm).
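For context, the multi-dataset support works by detecting which layout a given root folder uses and routing it to the right reader. This is only an illustrative sketch of that idea, not the actual vos_dataset.py change; the function name `detect_dataset_type` and the layout heuristics are my assumptions here:

```python
import os

def detect_dataset_type(root, listdir=os.listdir):
    """Guess which dataset layout `root` uses (hypothetical helper).

    - DAVIS-style roots ship JPEGImages/ and Annotations/ subfolders.
    - REDS-style roots contain zero-padded numbered clip folders (000, 001, ...).
    - Anything else is treated as flat 16mm film scans.
    The `listdir` parameter is injectable so the logic can be tested
    without touching the filesystem.
    """
    entries = set(listdir(root))
    if "JPEGImages" in entries and "Annotations" in entries:
        return "davis"
    if any(e.isdigit() for e in entries):
        return "reds"
    return "16mm"
```

The loader can then branch on the returned tag to build the frame/annotation lists appropriately for each source.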
In the script I shared, everything is set up to run a training session, except that we still need to find a way to stabilize the model, assuming we understand xmem2 correctly. (Full disclosure: I used Gemini to write the xmem2 code; I should have worked from the GitHub repo directly from the start.)
Even though I haven't been able to stabilize it yet, I'm really surprised by how accurately the reference images are integrated across a full video! Even with Deep Exemplar-Based and Colormnet 2023, I couldn't get this result!
Best
NASS

