25.12.2025, 22:08
I provided this sample as an example of image recognition; a normal castle will be colored properly.
Given the high quality of Diffusion Transformer (DiT) models, these models will surely represent the future of colorization.
It will probably take some time, maybe years, before a fast and lightweight model is available, but in the meantime I want to try to build at least a prototype working on Vapoursynth.
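To make the prototype idea concrete, here is a minimal sketch of how a DiT colorizer could be wired into a Vapoursynth script. The function names havc_dit_prototype() and colorize_image() are hypothetical placeholders (the latter standing in for the actual model call); only the Vapoursynth plumbing around them is real API:

```python
import numpy as np
import vapoursynth as vs

core = vs.core

def colorize_image(rgb: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder: call the DiT editing model here
    # (e.g. Qwen-IE with a "colorize this photo" instruction) and
    # return a uint8 HxWx3 array of the same size.
    raise NotImplementedError

def colorize_frame(n: int, f: vs.VideoFrame) -> vs.VideoFrame:
    fout = f.copy()
    # Stack the three RGB planes into one HxWx3 array for the model.
    rgb = np.stack([np.asarray(f[p]) for p in range(3)], axis=2)
    colored = colorize_image(rgb)
    for p in range(3):
        np.copyto(np.asarray(fout[p]), colored[:, :, p])
    return fout

def havc_dit_prototype(clip: vs.VideoNode) -> vs.VideoNode:
    # RGB24 so that each plane maps to a uint8 numpy array.
    rgb = core.resize.Bicubic(clip, format=vs.RGB24, matrix_in_s="709")
    return rgb.std.ModifyFrame(clips=rgb, selector=colorize_frame)
```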
In the coming months z-image edit is expected to be released, which should be faster and more lightweight than Qwen-image-edit.
In the meantime, using quantized models (Q4 or Q3), it is possible to reduce the Qwen-IE model size to about 10-11 GB, so that all the encoding could be done on a GPU with 12 GB of VRAM.
By doing all the inference in VRAM, without offloading to system RAM, I expect the inference time to drop to about 10 seconds, maybe even less.
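As a rough illustration of the quantized, fully-resident setup, here is a sketch using the diffusers library. The GGUF file name is a placeholder, and I am assuming diffusers exposes Qwen-Image-Edit the same way it does other DiT pipelines (QwenImageEditPipeline, QwenImageTransformer2DModel); the exact class names and GGUF support may differ depending on the diffusers version:

```python
import torch
from diffusers import GGUFQuantizationConfig, QwenImageEditPipeline
from diffusers.models import QwenImageTransformer2DModel

# Placeholder path: point this at the actual Q4/Q3 GGUF file you downloaded.
gguf_file = "qwen-image-edit-Q4_K_M.gguf"

# Load only the DiT transformer from the quantized GGUF checkpoint.
transformer = QwenImageTransformer2DModel.from_single_file(
    gguf_file,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

# Build the full pipeline around the quantized transformer.
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

# Keep everything resident in VRAM: deliberately avoid
# pipe.enable_model_cpu_offload(), which saves memory but adds the
# RAM<->VRAM transfer delay we are trying to eliminate.
pipe.to("cuda")
```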
Finally, by using ColorMNet it could be possible to run the inference only on the key frames, significantly reducing the delay introduced by Qwen-IE.
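The key-frame idea could look like the following sketch. Here qwen_colorize() and colormnet_propagate() are hypothetical wrappers around the two models, and is_scene_change() is assumed to come from a scene-change detector; the heavy DiT cost is then paid only once per scene:

```python
def colorize_video(frames, is_scene_change, qwen_colorize, colormnet_propagate):
    """Run the heavy DiT model only on key frames, then let ColorMNet
    spread those colors to the frames in between."""
    out = []
    exemplar = None
    for i, gray in enumerate(frames):
        if exemplar is None or is_scene_change(i):
            # Key frame: pay the ~10 s Qwen-IE cost once per scene.
            exemplar = qwen_colorize(gray)
            out.append(exemplar)
        else:
            # In-between frame: cheap exemplar-based propagation.
            out.append(colormnet_propagate(gray, exemplar))
    return out
```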
I don't know if, after all these optimizations, this approach will be practical to use, but in the meantime I will be able to build all the knowledge and code necessary, so that when a new model that fits all the video colorization constraints is released in the coming months or years, I will be ready to include it in HAVC.
Dan

