I released the new RC3 (attached).
If no bugs are found, this will also become the initial release.
This is the (final) header:
Code:
def propainter(
clip: vs.VideoNode,
length: int = 100,
clip_mask: vs.VideoNode = None,
img_mask_path: str = None,
mask_dilation: int = 8,
neighbor_length: int = 10,
ref_stride: int = 10,
raft_iter: int = 20,
mask_region: tuple[int, int, int, int] = None,
weights_dir: str = model_dir,
enable_fp16: bool = True,
device_index: int = 0,
inference_mode: bool = False
) -> vs.VideoNode:
"""ProPainter: Improving Propagation and Transformer for Video Inpainting
:param clip: Clip to process. Only RGB24 format is supported.
:param length: Sequence length that the model processes (min. 12 frames). Higher values will
increase the inference speed but will also increase the memory usage. Default: 100
:param clip_mask: Clip mask, must be the same size and length as the input clip. Default: None
:param img_mask_path: Path of the mask image. Default: None
:param mask_dilation: Mask dilation for video and flow masking. Default: 8
:param neighbor_length: Length of local neighboring frames. Low values decrease the
memory usage. Default: 10
:param ref_stride: Stride of global reference frames. Higher values reduce the
memory usage and increase the inference speed. Default: 10
:param raft_iter: Iterations for RAFT inference. Lower values will increase the inference
speed but could affect the output quality. Default: 20
:param mask_region: Allows restricting the region of the mask, format: (width, height, left, top).
The region must be big enough to allow the inference. Available only if clip_mask
is specified. Default: None
:param enable_fp16: If True, use fp16 (half precision) during inference, recommended
for RTX 30 series or above. Default: True
:param device_index: Device ordinal of the GPU (if -1, CPU mode is enabled). Default: 0
:param inference_mode: Enable/Disable torch inference mode. Default: False
"""
I added some interesting parameters.
1) clip_mask: now it is possible to pass a clip mask. You can test it using the provided sample and the following code:
Code:
# build clip mask
clip_mask = core.imwri.Read(["running_car_mask.png"])
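# repeat the single mask frame to match the source clip length, then set the frame rate (25 fps here) to match the source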
clip_mask = core.std.Loop(clip=clip_mask, times=clip.num_frames)
clip_mask = core.std.AssumeFPS(clip=clip_mask, fpsnum=25, fpsden=1)
# remove mask using propainter
from vspropainter import propainter
clip = propainter(clip, length=96, clip_mask=clip_mask)
In my tests it seems that using a clip mask, even if it contains only one frame, improves the speed.
2) mask_region: it is possible to define a smaller region on which to apply the ProPainter mask. You can test it using the provided sample and the following code:
Code:
# build clip mask
clip_mask = core.imwri.Read(["running_car_mask.png"])
clip_mask = core.std.Loop(clip=clip_mask, times=clip.num_frames)
clip_mask = core.std.AssumeFPS(clip=clip_mask, fpsnum=25, fpsden=1)
# remove mask using propainter
from vspropainter import propainter
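# mask_region is (width, height, left, top): trim 68 px from each side and 28 px from top/bottom (values assume the 596x336 sample)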
clip = propainter(clip, length=96, clip_mask=clip_mask, mask_region=(596-68*2, 336-28*2, 68, 28))
3) inference_mode: if True, torch inference mode will be enabled. I haven't noticed any speed increase or decrease by enabling it.
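For reference, this is just torch's torch.inference_mode() context manager; conceptually it wraps the model call, roughly like the following sketch (not the actual vspropainter code, "model", "frames" and "masks" are only illustrative):
Code:
import torch

# inference_mode disables autograd tracking entirely, similar to no_grad but stricter
with torch.inference_mode():
    output = model(frames, masks)  # hypothetical model call, for illustration only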
I was able to significantly speed up the inference using this code:
Code:
# build clip mask
clip_mask = core.imwri.Read(["running_car_mask.png"])
clip_mask = core.std.Loop(clip=clip_mask, times=clip.num_frames)
clip_mask = core.std.AssumeFPS(clip=clip_mask, fpsnum=25, fpsden=1)
# remove mask using propainter
from vspropainter import propainter
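# a larger ref_stride and fewer RAFT iterations trade some quality for speed; mask_region restricts processing to the area around the mask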
clip = propainter(clip, length=96, clip_mask=clip_mask, mask_region=(596-88*2, 336-38*2, 88, 38), ref_stride=25, raft_iter=15, inference_mode=True)
have fun!
Dan