
Using Stable Diffusion models for Colorization
#33
Hi didris,

your script seems OK; the call to the function HAVC_cmnet2() is the one described in this post: #22

Using ComfyUI my inference speed is about 22 sec; using the super-optimized code of the server I was able to increase the speed by about 5x.
So on your RTX5090 you should be able to perform the inference in less than 2 sec (using the pair() trick), probably in about 1 sec.
  
The total size of the files necessary to run the server is:

   venv   :  4.96GB (of which 4.28GB is the torch package)
   .cache : 23.3GB (nunchaku-qwen-image) + 15.7GB (vae + text_encoder) = 39GB

In summary, about 44GB of disk space is necessary to run the server.

The total memory (RAM + VRAM) necessary to run the server is about 46GB (see post #5); on top of this you need to add the RAM necessary to run the Windows OS (about 12GB), for a total of about 58GB of RAM. As you can see, this is not the amount of RAM that is usually available on a standard PC.
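As a quick back-of-the-envelope check of the figures above (all sizes are the approximate ones quoted in this post, not measured values):

```python
# Disk space (sizes as quoted above, approximate)
venv_gb  = 4.96            # of which ~4.28 GB is the torch package
cache_gb = 23.3 + 15.7     # nunchaku-qwen-image + (vae + text_encoder)
print(round(venv_gb + cache_gb, 1))  # -> 44.0 GB on disk

# Memory (RAM + VRAM)
server_gb  = 46            # total memory to run the server (see post #5)
windows_gb = 12            # rough RAM footprint of the Windows OS
print(server_gb + windows_gb)        # -> 58 GB total
```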

   So I think that the usage of this model is limited to high-end workstations.

I'm happy to know that Selur was able to run the model on his RTX4080; using the pair() trick he should be able to perform the inference of a full frame in about 5 sec.
Using a reference frame every 25 frames, this implies that it should be possible to colorize a clip at a speed of about 5fps, not too bad for a DiT model.
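To make the throughput estimate explicit (assuming the propagation of the non-reference frames takes negligible time compared to the DiT inference):

```python
def est_fps(ref_interval: int, t_infer_s: float) -> float:
    # One DiT inference produces a reference frame every `ref_interval`
    # frames, so the average colorization speed is ref_interval / t_infer_s
    # (frame propagation between references is assumed negligible).
    return ref_interval / t_infer_s

print(est_fps(25, 5.0))  # RTX4080, ~5 s per inference -> 5.0 fps
print(est_fps(25, 1.0))  # RTX5090, ~1 s per inference -> 25.0 fps
```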

I don't see any advantage in including the server in Hybrid, only disadvantages; yet both Selur and you are asking for that, and I don't understand why.
If the steps to run the server are too complex, please suggest which steps should be improved.

In any case, to run the full DiT colorization in Hybrid it will be necessary to split the process into client/server, as I already did for CMNET2, because these processes are not compatible with Vapoursynth threading.
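The split described above can be sketched as follows. This is NOT the actual HAVC/CMNET2 protocol: the endpoint, the 4-byte length framing, and the colorize() stub are all placeholders. The point is only that the GPU model lives in a separate process, so the Vapoursynth client never touches the model's threading.

```python
# Hypothetical client/server sketch: server owns the model, client sends
# one frame per request. All names and the wire format are assumptions.
import socket
import struct
import threading

HOST, PORT = "127.0.0.1", 50007  # assumed local endpoint

def colorize(frame: bytes) -> bytes:
    """Placeholder for the real DiT inference step."""
    return frame[::-1]

def recv_exact(conn: socket.socket, n: int) -> bytes:
    # Read exactly n bytes (recv may return short reads).
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed early")
        buf += chunk
    return buf

def serve_once(srv: socket.socket) -> None:
    # Server side: accept one connection, process one frame, reply.
    conn, _ = srv.accept()
    with conn:
        (size,) = struct.unpack("!I", recv_exact(conn, 4))
        out = colorize(recv_exact(conn, size))
        conn.sendall(struct.pack("!I", len(out)) + out)

def send_frame(frame: bytes) -> bytes:
    # Client side: this is the part that would run inside Vapoursynth.
    with socket.create_connection((HOST, PORT)) as conn:
        conn.sendall(struct.pack("!I", len(frame)) + frame)
        (size,) = struct.unpack("!I", recv_exact(conn, 4))
        return recv_exact(conn, size)

if __name__ == "__main__":
    srv = socket.socket()
    srv.bind((HOST, PORT))
    srv.listen(1)
    worker = threading.Thread(target=serve_once, args=(srv,))
    worker.start()
    print(send_frame(b"gray-frame"))  # prints b'emarf-yarg'
    worker.join()
    srv.close()
```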

Moreover, a client/server architecture will allow users willing to use the DiT colorizer with standard hardware to rent a powerful GPU to run the server for a few hours. It is the cheapest solution compared to a hardware upgrade (especially these days). For example, renting an RTX5090, it should be possible to colorize a clip at a speed of about 20-25 fps (almost in real-time).

 Let me know what you think.


Dan
Messages In This Thread
RE: Using Stable Diffusion models for Colorization - by Dan64 - 12.05.2026, 09:48
