
Dan64 · 28.05.2024, 22:56

There is a difference; see frame 74, for example.

Here is the comparison: https://imgsli.com/MjY3OTA5

Using

clip = propainter(clip, length=100, mask_path="sample2_mask.png", enable_fp16=True, mask_dilation=8)

the speed increases to 1.11 fps, and the mask is big enough to cover all the edges.

Dan

With more frames provided, the paper is also rendered better: https://imgsli.com/MjY3OTE5

I attached RC2, with the color shift fixed.

The defaults are now GPU 0 and fp16 enabled.

Dan

***Selur*** · (This post was last modified: 29.05.2024, 07:17 by Selur.)

In Lib\site-packages\vspropainter\model\misc.py, line 56:

IS_HIGH_VERSION = [int(m) for m in list(re.findall(r"^([0-9]+)\.([0-9]+)\.([0-9]+)([^0-9][a-zA-Z0-9]*)?(\+git.*)?$",\

should be:

IS_HIGH_VERSION = [int(m) for m in list(re.findall(r"^(\d+)\.(\d+)\.(\d+)([\w\d\.].*)?$",\

to be 12.1
see: https://github.com/sczhou/CodeFormer/pul...2bb40536dc
(I did that before, but forgot to mention it)
With that adjustment, RC2 works as (adjusted) RC1 does here.
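
To make the effect of the pattern change easier to see, here is a minimal sketch that runs both patterns on two version strings. The helper parse_version and the two version strings are illustrative assumptions; the thread does not say which exact version string caused the failure.

```python
import re

# Pattern originally used in misc.py (as quoted above)
OLD_PATTERN = r"^([0-9]+)\.([0-9]+)\.([0-9]+)([^0-9][a-zA-Z0-9]*)?(\+git.*)?$"
# Replacement pattern from the fix
NEW_PATTERN = r"^(\d+)\.(\d+)\.(\d+)([\w\d\.].*)?$"


def parse_version(pattern, version):
    """Return [major, minor, patch] as ints, or None if the pattern does not match."""
    matches = re.findall(pattern, version)
    if not matches:
        # The one-liner in misc.py indexes [0] directly, so it would raise IndexError here.
        return None
    return [int(part) for part in matches[0][:3]]


# Hypothetical version strings, chosen only for illustration
plain = "2.3.0"
dev_build = "2.4.0.dev20240528+cu121"

for label, pattern in (("old", OLD_PATTERN), ("new", NEW_PATTERN)):
    print(label, parse_version(pattern, plain), parse_version(pattern, dev_build))
```

With these inputs the old pattern parses the plain string but finds nothing in the dev-build string, while the new pattern parses both.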

Cu Selur

***Selur*** · (This post was last modified: 29.05.2024, 10:00 by Selur.)

The same test I ran in #9 is slower now.
Before I got:

encoded 192 frames in 50.79s (3.78 fps), 1707.74 kb/s, Avg QP:21.51

now I get:

encoded 192 frames in 87.40s (2.20 fps), 1707.83 kb/s, Avg QP:21.45

Okay, strangely, commenting out the '@torch.inference_mode()' lines brings the speed back up to 3.8 fps.
Side note: increasing ref_stride to 100 raises the speed further to 3.99 fps, while increasing raft_iter to 30 slows the encoding down to 2.02 fps.

Keeping the '@torch.inference_mode()'-lines and using ref_stride=100 increases the speed to 5.14 fps.
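
For comparing runs like these, a small sketch that parses the encoder summary lines and recomputes the speed may help. The regular expression and the helper parse_summary are my own assumptions, written against the two lines quoted in this post.

```python
import re

# Matches summary lines of the form quoted in this thread, e.g.
# "encoded 192 frames in 50.79s (3.78 fps), 1707.74 kb/s, Avg QP:21.51"
SUMMARY_RE = re.compile(r"encoded (\d+) frames in ([\d.]+)s \(([\d.]+) fps\)")


def parse_summary(line):
    """Return (frames, seconds, fps recomputed from frames / seconds)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not an encoder summary line: {line!r}")
    frames = int(match.group(1))
    seconds = float(match.group(2))
    return frames, seconds, frames / seconds


before = parse_summary("encoded 192 frames in 50.79s (3.78 fps), 1707.74 kb/s, Avg QP:21.51")
after = parse_summary("encoded 192 frames in 87.40s (2.20 fps), 1707.83 kb/s, Avg QP:21.45")

# Ratio of wall-clock times for the same 192 frames
slowdown = after[1] / before[1]
print(f"before: {before[2]:.2f} fps, after: {after[2]:.2f} fps, slowdown: {slowdown:.2f}x")
```

For the two runs above, this works out to the RC2 run taking roughly 1.7 times as long as the earlier one.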

Cu Selur

PS: I did a small test with a non-transparent logo (see attachment).

***Selur*** · 29.05.2024, 15:55

Cropping and filtering only part of the image seems to work, but width and height need to be at least 256.

# Imports
import vapoursynth as vs
import site
import sys
import os

# Initialize VapourSynth core
core = vs.core

# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))

# Load plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/LSmashSource/LSMASHSource.dll")

# Import scripts
import validate

# Load source
clip = core.lsmas.LWLibavSource(source=r"running_car.mp4", format="YUV420P8", stream_index=0, cache=0, fpsnum=25, repeat=True, prefer_hw=0)

# changing range from limited to full range for propainter
clip = core.resize.Bicubic(clip, range_in_s="limited", range_s="full")
# setting color range to PC (full) range.
clip = core.std.SetFrameProps(clip=clip, _ColorRange=0)

# adjusting color space from YUV420P8 to RGB24 for propainter
clip = core.resize.Bicubic(clip=clip, format=vs.RGB24, matrix_in_s="709", range_s="full")

# Define the region coordinates and size
x = 80
y = 70
width = 400
height = 256

# Create a blank clip for the mask of the same size as the original clip
mask = core.std.BlankClip(clip=clip, color=[255, 255, 255])
mask = core.std.SetFrameProps(clip=mask, _ColorRange=1)
mask = core.std.CropRel(mask, left=x, top=y, right=clip.width - (x + width), bottom=clip.height - (y + height))
mask = core.std.AddBorders(mask, left=x, top=y, right=clip.width - (x + width), bottom=clip.height - (y + height))
# Convert the mask to grayscale
mask = core.resize.Bicubic(mask, format=vs.GRAY8, matrix_s="709")
# Binarize the mask
mask = core.std.BinarizeMask(mask, 1)


# Crop to the region of interest
cropped_clip = core.std.CropRel(clip, left=x, top=y, right=clip.width - (x + width), bottom=clip.height - (y + height))
# Apply propainter to the cropped region
from vspropainter import propainter
processed_cropped_clip = propainter(cropped_clip, length=250, mask_path="running_car_mask_cropped_400x256.png", device_index=0, enable_fp16=True)
# Pad the processed region back to the original size
padded_clip = core.std.AddBorders(processed_cropped_clip, left=x, top=y, right=clip.width - (x + width), bottom=clip.height - (y + height))


# Merge the processed region back into the original frame
final_clip = core.std.MaskedMerge(clip, padded_clip, mask, planes=[0,1,2])

# undo range change
final_clip = core.resize.Bicubic(final_clip, range_in_s="full", range_s="limited")


# Adjust output color from RGB24 to YUV420P8
final_clip = core.resize.Bicubic(clip=final_clip, format=vs.YUV420P8, matrix_s="709", range_s="limited")
# Set output frame rate to 25fps (progressive)
final_clip = core.std.AssumeFPS(clip=final_clip, fpsnum=25, fpsden=1)

# Output the final clip
final_clip.set_output()

Speed increased to:

encoded 192 frames in 48.05s (4.00 fps), 1831.56 kb/s, Avg QP:21.5
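
The script computes the same right/bottom margins several times. As a sketch, that arithmetic could be factored into one helper that also enforces the 256-pixel minimum. The helper name crop_margins is hypothetical, and the 596x336 frame size is an assumption inferred from the mask_region expressions that appear later in the thread.

```python
MIN_REGION_SIZE = 256  # minimum width/height reported above for the cropped region


def crop_margins(clip_width, clip_height, x, y, width, height):
    """Return the left/top/right/bottom margins for std.CropRel and std.AddBorders."""
    if width < MIN_REGION_SIZE or height < MIN_REGION_SIZE:
        raise ValueError(f"region must be at least {MIN_REGION_SIZE}x{MIN_REGION_SIZE}")
    right = clip_width - (x + width)
    bottom = clip_height - (y + height)
    if x < 0 or y < 0 or right < 0 or bottom < 0:
        raise ValueError("region does not fit inside the frame")
    return {"left": x, "top": y, "right": right, "bottom": bottom}


# 596x336 is an assumed frame size, not stated in the script itself
margins = crop_margins(596, 336, x=80, y=70, width=400, height=256)
print(margins)
```

The resulting dictionary could then be passed as core.std.CropRel(clip, **margins) and core.std.AddBorders(processed_cropped_clip, **margins), since both filters take left, right, top and bottom.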

Cu Selur

Dan64 · 29.05.2024, 17:12

(29.05.2024, 07:13)Selur Wrote: in Lib\site-packages\vspropainter\model\misc.py", line 56

IS_HIGH_VERSION = [int(m) for m in list(re.findall(r"^([0-9]+)\.([0-9]+)\.([0-9]+)([^0-9][a-zA-Z0-9]*)?(\+git.*)?$",\
should be:

IS_HIGH_VERSION = [int(m) for m in list(re.findall(r"^(\d+)\.(\d+)\.(\d+)([\w\d\.].*)?$",\
to be 12.1
see: https://github.com/sczhou/CodeFormer/pul...2bb40536dc
(I did that before, but forgot to mention it)
With that adjustment, RC2 works as (adjusted) RC1 does here.

Cu Selur

Thanks for this fix, but the color shift was fixed by changing the following lines:

if convert_to_pil:
            np_comp_frames = [Image.fromarray(cv2.cvtColor(cv2.resize(f, self.out_size), cv2.COLOR_BGR2RGB), 'RGB') for f in comp_frames]

to

if convert_to_pil:
            np_comp_frames = [Image.fromarray(cv2.resize(f, self.out_size), 'RGB') for f in comp_frames]

In any case, in the upcoming RC3 the conversion to a PIL image will no longer be necessary.
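
To illustrate why the extra conversion caused a color shift: a BGR-to-RGB conversion simply reverses the channel order, so applying it to frames that are already RGB swaps red and blue. The following pure-Python sketch stands in for the cv2 call; the helper swap_red_blue and the tiny test frame are illustrative assumptions.

```python
def swap_red_blue(frame):
    """Reverse the channel order of every pixel, which is what a BGR<->RGB conversion does."""
    return [[tuple(reversed(pixel)) for pixel in row] for row in frame]


# A 1x2 frame that is already RGB: one pure red pixel, one pure green pixel
rgb_frame = [[(255, 0, 0), (0, 255, 0)]]

# Red turns into blue; green keeps its position in the middle channel
shifted = swap_red_blue(rgb_frame)
print(shifted)
```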

Dan

***Selur*** · 29.05.2024, 17:14

Yes, the above fix is needed for compatibility with newer PyTorch versions.

Dan64 · 29.05.2024, 17:38

I have released the new RC3 (attached).

If no bugs are found, this should also be the initial release.

This is the (final) header:

def propainter(
    clip: vs.VideoNode,
    length: int = 100,
    clip_mask: vs.VideoNode = None,
    img_mask_path: str = None,
    mask_dilation: int = 8,
    neighbor_length: int = 10,
    ref_stride: int = 10,
    raft_iter: int = 20,
    mask_region: tuple[int, int, int, int] = None,
    weights_dir: str = model_dir,
    enable_fp16: bool = True,
    device_index: int = 0,
    inference_mode: bool = False
) -> vs.VideoNode:
    """ProPainter: Improving Propagation and Transformer for Video Inpainting

    :param clip:            Clip to process. Only RGB24 format is supported.
    :param length:          Sequence length that the model processes (min. 12 frames). High values will
                            increase the inference speed but will also increase the memory usage. Default: 100
    :param clip_mask:       Clip mask, must be of the same size and length as the input clip. Default: None
    :param img_mask_path:   Path of the mask image. Default: None
    :param mask_dilation:   Mask dilation for video and flow masking. Default: 8
    :param neighbor_length: Length of local neighboring frames. Low values decrease the
                            memory usage. Default: 10
    :param ref_stride:      Stride of global reference frames. High values reduce the
                            memory usage and increase the inference speed. Default: 10
    :param raft_iter:       Iterations for RAFT inference. Low values will increase the inference
                            speed but could affect the output quality. Default: 20
    :param mask_region:     Allows restricting the region of the mask, format: (width, height, left, top).
                            The region must be big enough to allow the inference. Available only if clip_mask
                            is specified. Default: None
    :param enable_fp16:     If True use fp16 (half precision) during inference. Default: True (for RTX30 or above)
    :param device_index:    Device ordinal of the GPU (if = -1 CPU mode is enabled). Default: 0
    :param inference_mode:  Enable/Disable torch inference mode. Default: False
    """

I added some interesting parameters.

1) clip_mask: now it is possible to pass a clip mask. You can test it using the provided sample and the following code:

# build clip mask
clip_mask = core.imwri.Read(["running_car_mask.png"])
clip_mask = core.std.Loop(clip=clip_mask, times=clip.num_frames)
clip_mask = core.std.AssumeFPS(clip=clip_mask, fpsnum=25, fpsden=1)
# remove mask using propainter
from vspropainter import propainter
clip = propainter(clip, length=96, clip_mask=clip_mask)

In my tests, using a mask clip seems to improve the speed, even if the mask consists of only one frame.

2) mask_region: it is possible to define a smaller area for applying the propainter mask. You can test it using the provided sample and the following code:

# build clip mask
clip_mask = core.imwri.Read(["running_car_mask.png"])
clip_mask = core.std.Loop(clip=clip_mask, times=clip.num_frames)
clip_mask = core.std.AssumeFPS(clip=clip_mask, fpsnum=25, fpsden=1)
# remove mask using propainter
from vspropainter import propainter
clip = propainter(clip, length=96, clip_mask=clip_mask, mask_region=(596-68*2, 336-28*2, 68, 28))
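
The mask_region tuple above follows the (width, height, left, top) format and leaves equal margins on opposite sides of the frame. A small sketch of that arithmetic, assuming a 596x336 frame as implied by the expressions; the helper centered_mask_region is hypothetical and not part of vspropainter.

```python
def centered_mask_region(clip_width, clip_height, margin_x, margin_y):
    """Build a (width, height, left, top) tuple that leaves equal margins on opposite sides."""
    width = clip_width - 2 * margin_x
    height = clip_height - 2 * margin_y
    if width <= 0 or height <= 0:
        raise ValueError("margins are larger than the frame")
    return (width, height, margin_x, margin_y)


# 596x336 is the frame size implied by the expressions in the examples
print(centered_mask_region(596, 336, 68, 28))
print(centered_mask_region(596, 336, 88, 38))
```

The second call uses the larger margins from the faster example later in this post and still yields a region of at least 256 pixels in both dimensions.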

3) inference_mode: if True, the "torch" inference mode is enabled. I have not noticed any speed increase or decrease from enabling it.

I was able to significantly speed up the inference using this code:

# build clip mask
clip_mask = core.imwri.Read(["running_car_mask.png"])
clip_mask = core.std.Loop(clip=clip_mask, times=clip.num_frames)
clip_mask = core.std.AssumeFPS(clip=clip_mask, fpsnum=25, fpsden=1)
# remove mask using propainter
from vspropainter import propainter
clip = propainter(clip, length=96, clip_mask=clip_mask, mask_region=(596-88*2, 336-38*2, 88, 38), ref_stride=25, raft_iter=15, inference_mode=True)

Have fun!

Dan

***Selur*** · 29.05.2024, 17:45

Nice. I'll do some testing. :)

Cu Selur

***Selur*** · (This post was last modified: 29.05.2024, 19:57 by Selur.)

using:

clip = propainter(clip, clip_mask=clip_mask, length=96, mask_region=(596-88*2, 336-38*2, 88, 38), ref_stride=25, raft_iter=15, inference_mode=True)

I get:

encoded 192 frames in 39.53s (4.86 fps), 1645.06 kb/s, Avg QP:21.51

using:

clip = propainter(clip, clip_mask=clip_mask, length=250, mask_region=(596-88*2, 336-38*2, 88, 38), ref_stride=25, raft_iter=15, inference_mode=True)

I get:

encoded 192 frames in 38.54s (4.98 fps), 1649.41 kb/s, Avg QP:21.51

Nice. Not sure whether I'll get around to it today, but tomorrow I'll create a Hybrid dev version with support for ProPainter.
I'll send you a link via PM once I'm ready. :)

Cu Selur

Dan64 · 29.05.2024, 18:52

New version RC4: I added support for "mask_region" with a single-image mask as well.

It is possible to test it using the following code:

from vspropainter import propainter
clip = propainter(clip, length=96, img_mask_path="running_car_mask.png", mask_region=(596-68*2, 336-28*2, 68, 28), inference_mode=True)

Dan
