12.05.2026, 09:48
Hi didris,
Your script looks OK; the call to the function HAVC_cmnet2() is the one described in post #22.
With ComfyUI my inference time is about 22 sec. With the heavily optimized server code I was able to make it about 5x faster.
So on your RTX5090 you should be able to perform the inference in less than 2 sec (using the pair() trick), probably in about 1 sec.
The total disk space of the files needed to run the server is:
venv : 4.96GB (of which 4.28GB belongs to the torch package)
.cache : 23.3GB (nunchaku-qwen-image) + 15.7GB (vae + text_encoder) = 39GB
In summary, about 44GB of disk space is needed to run the server.
The total memory (RAM + VRAM) needed to run the server is about 46GB (see post #5). On top of this you have to add the RAM needed by Windows itself (about 12GB), for a total of about 58GB. As you can see, this is more memory than a standard PC usually has.
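As a quick check of the numbers above, here is a small sketch that adds up the disk and memory budget. The constant and function names are only illustrative; the values are the ones quoted in this post.

```python
# Figures quoted in this post, all in GB.
VENV_GB = 4.96             # Python venv (mostly the torch package)
CACHE_MODEL_GB = 23.3      # .cache: nunchaku-qwen-image
CACHE_VAE_TEXT_GB = 15.7   # .cache: vae + text_encoder
SERVER_MEMORY_GB = 46      # RAM + VRAM used by the server
WINDOWS_RAM_GB = 12        # approximate RAM used by Windows itself


def disk_total_gb() -> float:
    """Disk space needed for the server files."""
    return VENV_GB + CACHE_MODEL_GB + CACHE_VAE_TEXT_GB


def memory_total_gb() -> float:
    """Server memory (RAM + VRAM) plus the RAM used by the OS."""
    return SERVER_MEMORY_GB + WINDOWS_RAM_GB


if __name__ == "__main__":
    print(f"disk:   about {disk_total_gb():.0f} GB")
    print(f"memory: about {memory_total_gb():.0f} GB")
```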
So I think that the usage of this model is limited to high-end workstations.
I'm happy to know that Selur was able to run the model on his RTX4080. Using the pair() trick he should probably be able to perform the inference of a full frame in about 5 sec.
With a reference frame every 25 frames, this implies that it should be possible to colorize a clip at about 5 fps, not too bad for a DiT model.
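The estimate above is just the reference interval divided by the inference time of one reference frame. A minimal sketch, assuming that the DiT inference of the reference frames is the only significant cost (the time to propagate the colors to the other frames is ignored) and using a function name invented for this example:

```python
def estimated_fps(ref_interval_frames: int, ref_inference_sec: float) -> float:
    """Rough clip throughput when one frame every `ref_interval_frames`
    needs a DiT inference that takes `ref_inference_sec` seconds.

    Assumption: the cost of the non-reference frames is negligible.
    """
    if ref_interval_frames <= 0 or ref_inference_sec <= 0:
        raise ValueError("interval and inference time must be positive")
    return ref_interval_frames / ref_inference_sec


if __name__ == "__main__":
    # One reference frame every 25 frames, about 5 sec per inference.
    print(estimated_fps(25, 5.0))
```

With the same interval and an inference time of 1 sec, the same formula gives 25 fps.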
I don't see any advantage in including the server in Hybrid, only disadvantages. Both Selur and you are asking for it, but I don't understand why.
If the steps to run the server are too complex, please tell me which steps should be improved.
In any case, to run the full DiT colorization in Hybrid it will be necessary to split the process into client and server, as I have already done for CMNET2, because these processes are not compatible with Vapoursynth threading.
Moreover, a client/server architecture will allow users who want to use the DiT colorizer on standard hardware to rent a powerful GPU to run the server for a few hours. That is cheaper than a hardware upgrade (especially these days). For example, with a rented RTX5090 it could be possible to colorize a clip at about 20-25 fps (almost real time).
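To make the client/server split more concrete, here is a minimal sketch of the idea using only the Python standard library (multiprocessing.connection). It is not the actual server code: fake_colorize() is a hypothetical stand-in for the model, and all function names are invented for this example. The point is only that the heavy model lives in its own server, and the client sends frames to it and waits for the replies.

```python
import threading
from multiprocessing.connection import Client, Listener


def fake_colorize(frame: bytes) -> bytes:
    """Hypothetical stand-in for the model: just inverts every byte."""
    return bytes(255 - b for b in frame)


def serve_one_client(listener: Listener) -> None:
    """Server side: answer frames from one client until it disconnects."""
    with listener.accept() as conn:
        while True:
            try:
                frame = conn.recv_bytes()
            except EOFError:  # the client closed the connection
                break
            conn.send_bytes(fake_colorize(frame))


def colorize_remote(address, frames):
    """Client side: send each frame to the server and collect the replies."""
    results = []
    with Client(address) as conn:
        for frame in frames:
            conn.send_bytes(frame)
            results.append(conn.recv_bytes())
    return results


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; a real server would use a fixed one.
    with Listener(("127.0.0.1", 0)) as listener:
        server = threading.Thread(target=serve_one_client, args=(listener,))
        server.start()
        print(colorize_remote(listener.address, [b"\x00\x10", b"\xff"]))
        server.join()
```

In this demo the server runs in a thread only to keep the example in a single file. In practice it would be a separate process, possibly on another (rented) machine, with the client pointing at its address.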
Let me know what you think.
Dan

