The previous comparison was unfair because a ground truth image was used as the
reference image.
In a real situation the "ground truth" image is not available and cannot be supplied manually inside an automatic video coloring process.
The most realistic solution is to supply, at every scene change, a reference image generated by another automatic image coloring model.
In this case the best candidate for providing the reference images is DDColor.
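To make the idea concrete, here is a minimal sketch of how one reference frame per scene could be selected. It is only an illustration under my own assumptions, not the code of any of the tools mentioned: the function names, the per-frame difference scores, and the threshold value are all hypothetical. In a real pipeline each selected frame would be colored by the image model and passed to the video model as reference for the frames of that scene.

```python
from typing import List, Sequence


def scene_starts(frame_diffs: Sequence[float], threshold: float = 0.15) -> List[int]:
    """Return the indices of the frames that open a new scene.

    frame_diffs[i] is a hypothetical difference score between frame i and
    frame i - 1. frame_diffs[0] is ignored: frame 0 always opens a scene.
    The threshold is an arbitrary illustrative value.
    """
    starts = [0] if len(frame_diffs) > 0 else []
    for i in range(1, len(frame_diffs)):
        if frame_diffs[i] > threshold:
            starts.append(i)
    return starts


def reference_for_each_frame(num_frames: int, starts: Sequence[int]) -> List[int]:
    """Map every frame to the scene-start frame that serves as its reference."""
    refs = []
    current = 0
    pending = list(starts)
    for n in range(num_frames):
        # When a new scene begins, switch to its first frame as reference.
        if pending and pending[0] == n:
            current = pending.pop(0)
        refs.append(current)
    return refs


# Made-up difference scores with scene changes at frames 3 and 6.
diffs = [0.0, 0.02, 0.01, 0.40, 0.03, 0.02, 0.55, 0.01]
starts = scene_starts(diffs)
refs = reference_for_each_frame(len(diffs), starts)
print(starts)  # [0, 3, 6]
print(refs)    # [0, 0, 0, 3, 3, 3, 6, 6]
```

With this mapping only frames 0, 3 and 6 would need a reference image from the image coloring model; all the other frames inherit the reference of their scene.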
Given that the comparison with "DDeoldify(Stable)" was already provided, I will show the comparison with "DDColor(rf=32)".
1) Start frame (the DDColor image was provided to Bistnet as reference)
in this case the two images are identical by construction
2) Frame 25
in this case Bistnet is stable, while in DDColor the boys' shirts are starting to turn green
3) Frame 80
in this case too Bistnet is stable, while in DDColor the boy's shirt has become green and his trousers are almost blue
4) Frame 113
in this case Bistnet is still stable, while in DDColor the girl's skirt also turned blue
This example shows that Bistnet can improve the stability of the images produced by DDColor when the DDColor images are used only as reference, but the comparison is also a clear example of DDColor's instability.
Comparing the Bistnet images obtained in this example with the "DDeoldify(Stable)" images provided previously, it is possible to see that (apart from the choice of colors) both show good color stability.
But "DDeoldify(Stable)" is about 33x faster than Bistnet, which is too slow to be used in practice.
The author wrote that he will release a new version called
ColorMNet that should be faster and less memory hungry. Let's see...
Dan