Dan64 · 11.05.2026, 20:08

(11.05.2026, 15:44)Selur Wrote: I then stopped the server, and called start_server.cmd:

(.venv) F:\Hybrid\64bit\Vapoursynth\DiTServerRPC-main>start_server.cmd
Der Befehl "erver.cmd" ist entweder falsch geschrieben oder konnte nicht gefunden werden.
Der Befehl "age:" ist entweder falsch geschrieben oder konnte nicht gefunden werden.
Der Befehl "rt_server.cmd" ist entweder falsch geschrieben oder konnte nicht gefunden werden.
...............................
Der Befehl "f" ist entweder falsch geschrieben oder konnte nicht gefunden werden.
[ERROR] Unknown precision argument: "". Use "fp4" or "int4".
(The German message means: the command "..." is either misspelled or could not be found.)
=> That didn't work.

Does it really make sense to add this to Hybrid? If yes, in what way?
Adding it to the torch add-on seems like a bad idea, since updates and the like could break things too easily (it is also huge).
So the only way this seems to make sense would be to create a separate add-on; depending on whether it is present, additional options could be available in HAVC, assuming the plan is to use this in HAVC.

Cu Selur

P.S.: The 'DiTServerRPC-main' folder is ~5 GB in size.

The problem is clear: the .cmd files have Unix line endings (LF) instead of Windows ones (CRLF). When Windows CMD reads an LF-only file, it does not split the lines correctly and interprets comments and configuration text as commands to execute, hence all those errors.

The cause: the files were created with LF endings, and when they are downloaded from GitHub with core.autocrlf=false, they stay LF.

Immediate fix: convert the downloaded files with Notepad++ (Edit → EOL Conversion → Windows (CRLF)), or with VS Code by clicking LF in the bottom right and choosing CRLF.
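If there are several .cmd files, the same conversion can be scripted. This is only a minimal sketch: the helper names to_crlf and convert_cmd_files are made up for this post and are not part of DiTServerRPC, and it assumes the files use only LF or CRLF endings.

```python
from pathlib import Path


def to_crlf(data: bytes) -> bytes:
    """Return data with every LF or CRLF line ending written as CRLF."""
    # Collapse CRLF to LF first so already-converted lines are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_cmd_files(root: Path) -> list[Path]:
    """Rewrite every *.cmd file under root in place; return the files that changed."""
    changed = []
    for path in sorted(root.rglob("*.cmd")):
        original = path.read_bytes()
        converted = to_crlf(original)
        if converted != original:
            path.write_bytes(converted)
            changed.append(path)
    return changed
```

Calling convert_cmd_files(Path(".")) from inside the downloaded server folder rewrites the files in place, so keep a copy if you are unsure. Files that already use CRLF are left untouched.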

Does it really make sense to add this to Hybrid? No, and this is the reason why I split the project into client and server. Only the client part, which is lightweight and has no dependencies, will be implemented in HAVC. Anyone who wants to use the client must download the server from GitHub and run it.

Dan

P.S.
If you run

python dit_client_pair_example.py --pipeline-config qwen_config_int4.json --use-shm

you should be able to colorize 2 images in about 12 sec, i.e. 6 sec per image, a 2x increase in speed for free.

(11.05.2026, 18:12)didris Wrote: installation was not complicated, but I also could not get it to work this way.
I installed it on embedded python - there are quite a few models - C:\Users\YOUR_USERNAME\.cache\huggingface\hub\ - 30.8 GB

The result is an average of 9 seconds per image, using on average 23 GB of GPU memory and 39 GB of RAM during the process (my card is an RTX 5090). It is probably possible to improve the time, but at the expense of quality. It does not colorize quite consistently: the same object is sometimes colored differently on different frames, which probably depends a lot on what is set in the prompt.

Once again, thanks to Dan and Selur for what they have done.

I don't understand why you were not able to follow the instructions provided on GitHub; maybe you had the same LF/CRLF problem as Selur.

On my RTX 5070 Ti I'm getting about the same speed of 8 to 9 sec per image; here is my output:

(.venv) PS D:\PProjects\DiTServerRPC> python dit_client_pair_example.py --pipeline-config qwen_config_fp4.json --use-shm
[INFO] Connecting to http://127.0.0.1:8765/ ...
[INFO] Server is reachable.
[INFO] Transport: shared memory
[INFO] Pipeline already loaded on server.
[INFO] Image 1: sample1_bw.jpg  (1480x1080 px)
[INFO] Image 2: sample2_bw.jpg  (1480x1080 px)
[INFO] Running paired inference (gap=8px) ...
[INFO] Inference time : 8.12s total  (4.06s per image)
[INFO] Round-trip time: 8.28s
[INFO] Saved: sample1_colorized.jpg
[INFO] Saved: sample2_colorized.jpg

If you run

python dit_client_pair_example.py --pipeline-config qwen_config_fp4.json --use-shm

you can colorize 2 images in the time it takes for 1 image, a 2x increase in speed for free.
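To compare the numbers quoted in this thread, the per-image time and the speedup can be worked out as below. This is just arithmetic on the figures already posted (12 sec and 8.12 sec for a pair, 9 sec for a single image); the function names are made up for illustration.

```python
def per_image_seconds(total_seconds: float, images: int) -> float:
    """Average time per image for one inference call."""
    return total_seconds / images


def speedup(before_seconds: float, after_seconds: float) -> float:
    """How many times faster 'after' is compared to 'before'."""
    return before_seconds / after_seconds


int4_pair = per_image_seconds(12.0, 2)  # int4 run mentioned above
fp4_pair = per_image_seconds(8.12, 2)   # fp4 run from the log above
single = 9.0                            # single-image time reported by didris

print(f"int4 pair: {int4_pair:.2f} s per image")
print(f"fp4 pair:  {fp4_pair:.2f} s per image")
print(f"fp4 pair vs single image: {speedup(single, fp4_pair):.2f}x")
```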

You can change qwen_config_fp4.json as follows

{
    "model_name":            "nunchaku-qwen",
    "model_precision":       "fp4",
    "model_rank":            "32",
    "model_inference_steps": "4",
    "cache_dir":             "C:\\Users\\YOUR_USERNAME\\.cache\\huggingface\\hub",
    "full_model_path":       ""
}

to use your HF cache dir.
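Note that backslashes in a JSON string must be doubled (or replaced by forward slashes), otherwise the file does not parse. To avoid typing the path by hand, the config text can be generated. This is a sketch that assumes the default Hugging Face cache location under your home directory; if you moved the cache (for example via HF_HOME), adjust cache_dir accordingly.

```python
import json
from pathlib import Path

# Assumed default Hugging Face hub cache location under the user's home directory.
cache_dir = Path.home() / ".cache" / "huggingface" / "hub"

config = {
    "model_name": "nunchaku-qwen",
    "model_precision": "fp4",
    "model_rank": "32",
    "model_inference_steps": "4",
    "cache_dir": str(cache_dir),
    "full_model_path": "",
}

# json.dumps escapes backslashes, so a Windows path is written as "C:\\Users\\...".
text = json.dumps(config, indent=4)
print(text)

# Round-trip check: parsing the text gives back the original path.
assert json.loads(text)["cache_dir"] == str(cache_dir)
```

The printed text can then be pasted into qwen_config_fp4.json.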

Dan

P.S.
Use my version, which uses shared memory instead of converting the image to PNG bytes; you will be able to increase the speed by 25% (from 5 sec to 4 sec).
Also try changing line 143 in dit_colorize_main.py to

if torch.cuda.get_device_properties(0).total_memory / (1024 ** 3) < 48:

to see whether the optimizations implemented for 16 GB VRAM also work on your RTX 5090.
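To see why a threshold of 48 matters, here is the same comparison as a standalone sketch. The function name is hypothetical, and it takes the byte count as a plain number instead of calling torch, so it runs without a GPU. The sizes are nominal; the value reported by the driver is usually slightly lower.

```python
GIB = 1024 ** 3  # bytes per GiB, the same divisor as in the condition


def below_vram_threshold(total_memory_bytes: int, threshold_gib: float) -> bool:
    """Same comparison as the condition: True when VRAM is below the threshold."""
    return total_memory_bytes / GIB < threshold_gib


# With a threshold of 48, both a 16 GB and a 32 GB card satisfy the condition.
for name, gib in [("16 GB card", 16), ("32 GB card", 32), ("48 GB card", 48)]:
    print(name, below_vram_threshold(gib * GIB, threshold_gib=48))
```

So with this change a 32 GB card such as the RTX 5090 also satisfies the condition, while a 48 GB card does not.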
