4k (HDR) x265 → 1080p (SDR) x264, slow speed and low resource utilization
#1
Helluh,

I'm currently testing a 4K to 1080p encode.

And to my surprise, I only get around 20 fps..

Now, I'm using an R9 5900X CPU with multiple cores peaking at 5.0 GHz (PBO2), yet for some reason the encode is going very slow.. peaking at 20 fps 😐

I have checked a couple of filters, and none are CPU/GPU intensive though! i.e. color: basic, detail sharpen, debanding and HDRtoSDR .. that's pretty much it..

Furthermore, the CPU sits at around 30 - 50% utilization and the GPU at 12 - 25% .. so very few resources are being used for some reason, if you ask me!?

Did something significant / fundamental change about the codecs?

Currently using the latest Hybrid, 2024.06.16.1 ..


EDIT: Only the Debanding filter is somewhat GPU intensive and is the bottleneck there, but only by 5 - 6 fps.. So yes, I could have sworn I got more fps with this CPU converting 4K to 1080p in the past.. When using no GPU-intensive filters, all cores peak at 4650 MHz .. so the CPU gets utilized at least 99%.

Pretty shocked 4K is still a thing for modern CPUs, even for GPUs (NVEnc)!?

ta ta
Reply
#2
Quote:Did something significant / fundamental change about the codecs?
In the last few years: No.

Quote:Pretty shocked 4K is still a thing for modern CPUs, even for GPUs (NVEnc)!?
Converting 4k (HDR, H.265) -> 2k (SDR, H.264) using just the chips on the graphics card:
"F:\Hybrid\64bit\NVEncC.exe" --avhw  -i "G:\TestClips&Co\files\HDR\HDR10\4K sun HDR test.mp4" --fps 25.000 --codec h264 --profile high --level auto --sar 1:1 --lookahead 32 --vbr 0 --vbr-quality 18.00 --aq --aq-strength 5 --aq-temporal --gop-len 0 --ref 6 --multiref-l0 3 --multiref-l1 3 --bframes 3 --direct auto --bref-mode auto --no-b-adapt --mv-precision Q-pel --cabac --deblock --preset quality --colorrange limited --colormatrix bt709 --vpp-colorspace hdr2sdr=mobius,source_peak=1000,ldr_nits=100,transition=0.3,peak=1 --vpp-resize auto --output-res 1920x1080 --vpp-gauss disabled --cuda-schedule sync --output "J:\tmp\4K sun HDR test_1_2024-06-30@07_13_41_7010_01.264"
--------------------------------------------------------------------------------
J:\tmp\4K sun HDR test_1_2024-06-30@07_13_41_7010_01.264
--------------------------------------------------------------------------------
NVEncC (x64) 7.57 (r2924) by rigaya, Jun 29 2024 14:09:42 (VC 1929/Win)
OS Version     Windows 11 x64 (22631) [UTF-8]
CPU            AMD Ryzen 9 7950X 16-Core Processor [5.52GHz] (16C/32T)
GPU            #0: NVIDIA GeForce RTX 4080 (9728 cores, 2505 MHz)[PCIe4x16][555.99]
NVENC / CUDA   NVENC API 12.2, CUDA 12.5, schedule mode: sync
Input Buffers  CUDA, 44 frames
Input Info     avcuvid: H.265/HEVC, 3840x2160, 25/1 fps
Vpp Filters    colorspace: cspconv(p010 -> yuv444(16bit))
matrix:bt2020nc->GBR
transfer:smpte2084->linear
hdr2sdr(mobius): source_peak=1000.00 ldr_nits=100.00
transition 0.30, peak 1.00
desat base 0.18, strength 0.75, exp 1.50
prim:bt2020->bt709
transfer:linear->bt709
matrix:GBR->bt709
cspconv(yuv444(16bit) -> yv12)
resize(bicubic): 3840x2160 -> 1920x1080
cspconv(yv12 -> nv12)
Output Info    H.264/AVC high @ Level auto
1920x1080p 1:1 25.000fps (25/1fps)
Encoder Preset quality
Rate Control   VBR
Multipass      none
Bitrate        0 kbps (Max: 162000 kbps)
Target Quality 18.00
QP range       I:0-51  P:0-51  B:0-51
QP Offset      cb:0  cr:0
VBV buf size   auto
Split Enc Mode auto
Lookahead      on, 32 frames, Adaptive I Insert
GOP length     250 frames
B frames       3 frames [ref mode: middle]
Ref frames     6 frames, MultiRef L0:3 L1:3
AQ             on (spatial, temporal, strength 5)
VUI            matrix:bt709,range:limited
Others         mv:Q-pel cabac deblock adapt-transform:auto bdirect:auto
encoded 125 frames, 154.32 fps, 13202.87 kbps, 7.87 MB
encode time 0:00:00, CPU: 0.5, GPU: 82.0, VE: 53.0, VD: 49.0, GPUClock: 2505MHz, VEClock: 2055MHz
frame type IDR  1
frame type I    1,  total size  0.20 MB
frame type P   31,  total size  3.33 MB
frame type B   93,  total size  4.34 MB
2024-06-30@07_13_41_7010_01_video finished after 00:00:01.719
finished...
I get 150fps here. Which seems like a decent speed.

If it's too slow for your taste, you should:
  • figure out whether it's the decoding, filtering or encoding that is slow.
  • if it's the decoding, try different decoders.
  • if it's the filtering, try adjusting the filter order (for example, move filters behind the resizer) and try different filters and settings.
  • if it's the encoding, try tweaking your settings.
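One way to work through the checklist above is to time the chain in growing prefixes: decode only, decode plus filter, then the full chain. The block below is only a minimal stdlib sketch of that idea. `decode`, `tonemap` and `encode` are hypothetical stand-ins simulated with `time.sleep`, not Hybrid or Vapoursynth calls, and the per-frame costs are made up for illustration.

```python
import time
from typing import Iterable, Iterator

Frame = int  # stand-in for a real video frame


def decode(n_frames: int) -> Iterator[Frame]:
    """Hypothetical decoder: yields frame numbers, roughly 1 ms each."""
    for i in range(n_frames):
        time.sleep(0.001)
        yield i


def tonemap(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Hypothetical slow filter: roughly 4 ms per frame."""
    for frame in frames:
        time.sleep(0.004)
        yield frame


def encode(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Hypothetical encoder: roughly 2 ms per frame."""
    for frame in frames:
        time.sleep(0.002)
        yield frame


def measure_fps(frames: Iterable[Frame]) -> float:
    """Consume the whole iterator and return frames per second."""
    start = time.perf_counter()
    count = sum(1 for _ in frames)
    elapsed = time.perf_counter() - start
    return count / elapsed


def profile_stages(n_frames: int = 50) -> dict[str, float]:
    """Time each prefix of the chain; the first big drop marks the bottleneck."""
    return {
        "decode only": measure_fps(decode(n_frames)),
        "decode + filter": measure_fps(tonemap(decode(n_frames))),
        "decode + filter + encode": measure_fps(encode(tonemap(decode(n_frames)))),
    }


if __name__ == "__main__":
    for stage, fps in profile_stages().items():
        print(f"{stage:26s} {fps:7.1f} fps")
```

The stand-ins run strictly one after another, while a real decoder, filter chain and encoder overlap across threads, so treat this as a way to see the method rather than a model of real throughput.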

Using Vapoursynth:
# Imports
import vapoursynth as vs
# getting Vapoursynth core
import sys
import os
core = vs.core
# Import scripts folder
scriptPath = 'F:/Hybrid/64bit/vsscripts'
sys.path.insert(0, os.path.abspath(scriptPath))
# loading plugins
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/Support/fmtconv.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/ColorFilter/DGHDRtoSDR/DGHDRtoSDR.dll")
core.std.LoadPlugin(path="F:/Hybrid/64bit/vsfilters/SourceFilter/DGDecNV/DGDecodeNV.dll")
# Import scripts
import validate
# Source: 'G:\TestClips&Co\files\HDR\HDR10\4K sun HDR test.mp4'
# Current color space: YUV420P10, bit depth: 10, resolution: 3840x2160, frame rate: 25fps, scanorder: progressive, yuv luminance scale: limited, matrix: 2020ncl, transfer: smpte2084, primaries: bt.2020
# Loading G:\TestClips&Co\files\HDR\HDR10\4K sun HDR test.mp4 using DGSource
clip = core.dgdecodenv.DGSource("J:/tmp/mp4_103cd4c1d7cbc771969218d2162207ff_853323747.dgi")# 25 fps, scanorder: progressive
frame = clip.get_frame(0)
# Setting detected color matrix (2020ncl).
clip = core.std.SetFrameProps(clip=clip, _Matrix=9)
# setting color transfer (2084), if it is not set.
if validate.transferIsInvalid(clip):
  clip = core.std.SetFrameProps(clip=clip, _Transfer=16)
# setting color primaries info (to 2020), if it is not set.
if validate.primariesIsInvalid(clip):
  clip = core.std.SetFrameProps(clip=clip, _Primaries=9)
# setting color range to TV (limited) range.
clip = core.std.SetFrameProps(clip=clip, _ColorRange=1)
# making sure frame rate is set to 25fps
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# making sure the detected scan type is set (detected: progressive)
clip = core.std.SetFrameProps(clip=clip, _FieldBased=0) # progressive
# adjusting color using HDR to SDR (DG)
clip = core.dghdrtosdr.DGHDRtoSDR(clip=clip, impl="255", mode="pq", fulldepth=True)
# Resizing using 10 - bicubic spline
clip = core.fmtc.resample(clip=clip, kernel="spline16", w=1920, h=1080, interlaced=False, interlacedd=False) # resolution 1920x1080 before YUV420P16 after YUV420P16
# adjusting output color from: YUV420P16 to YUV420P8 for x264Model
clip = core.resize.Bicubic(clip=clip, format=vs.YUV420P8, range_s="limited", dither_type="error_diffusion")
# set output frame rate to 25fps (progressive)
clip = core.std.AssumeFPS(clip=clip, fpsnum=25, fpsden=1)
# output
clip.set_output()
and x264:
"F:\Hybrid\64bit\x264.exe" --preset veryfast --crf 18.00 --profile high --level 5.1 --ref 3 --direct auto --b-adapt 0 --sync-lookahead 48 --qcomp 0.50 --rc-lookahead 40 --qpmax 51 --partitions i4x4,p8x8,b8x8 --no-fast-pskip --subme 5 --aq-mode 0 --vbv-maxrate 300000 --vbv-bufsize 300000 --sar 1:1 --non-deterministic --range tv --colormatrix bt709 --demuxer y4m --input-range tv --fps 25/1 --output-depth 8 --output "J:\tmp\2024-06-30@07_15_12_2510_03.264" -
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
x264 [info]: profile High, level 5.1, 4:2:0, 8-bit
x264 [info]: frame I:1     Avg QP:21.97  size: 57170
x264 [info]: frame P:31    Avg QP:21.27  size: 25223
x264 [info]: frame B:93    Avg QP:24.33  size:  8126
x264 [info]: consecutive B-frames:  0.8%  0.0%  0.0% 99.2%
x264 [info]: mb I  I16..4: 32.8% 52.5% 14.7%
x264 [info]: mb P  I16..4: 21.5%  0.0%  5.7%  P16..4: 28.1%  4.1%  1.5%  0.0%  0.0%    skip:39.2%
x264 [info]: mb B  I16..4:  0.8%  0.0%  1.3%  B16..8:  7.5%  1.9%  0.5%  direct: 1.2%  skip:86.8%  L0:34.3% L1:43.1% BI:22.6%
x264 [info]: 8x8 transform intra:4.6% inter:30.7%
x264 [info]: direct mvs  spatial:80.6% temporal:19.4%
x264 [info]: coded y,uvDC,uvAC intra: 28.6% 27.9% 11.4% inter: 3.5% 2.2% 0.0%
x264 [info]: i16 v,h,dc,p: 28% 38% 12% 22%
x264 [info]: i8 v,h,dc,ddl,ddr,vr,hd,vl,hu:  7% 36% 45%  2%  2%  1%  4%  1%  3%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu:  9% 27% 17%  5% 10%  6% 13%  4%  9%
x264 [info]: i8c dc,h,v,p: 63% 26%  7%  3%
x264 [info]: Weighted P-Frames: Y:6.5% UV:0.0%
x264 [info]: ref P L0: 58.9% 23.4% 17.8%
x264 [info]: ref B L0: 88.3% 10.1%  1.5%
x264 [info]: ref B L1: 95.6%  4.4%
x264 [info]: kb/s:2551.74
encoded 125 frames, 100.89 fps, 2551.74 kb/s
2024-06-30@07_15_12_2510_03_video finished after 00:00:01.935
finished...
I get 100fps.

Quote:Furthermore, the CPU sits at around 30 - 50% utilization and the GPU at 12 - 25% .. so very few resources are being used for some reason, if you ask me!?
No worries, I won't ask you, since you clearly didn't do any testing to figure out where your 'problems' are.

Using LibavSMASHSource(cpu) instead of DGDecNV, I get ~63fps.
Using LibavSMASHSource(gpu) instead of DGDecNV, I get ~63fps.
Using FFMS2 instead of DGDecNV, I get ~63fps.

Since I suspect the main slow down is due to the HDR->SDR conversion (x264 encoding):
Using HDR to SDR (DG) with DGDecNV, I get ~100fps.
Using HDRToSDR with DGDecNV, I get ~11fps.
Using ToneMap with DGDecNV, I get ~48fps.
Using ToneMap (Placebo) with DGDecNV, I get ~50fps.
Using TimeCube with DGDecNV, I get ~80fps.
Using HDR to SDR (DG) with DGDecNV and using DGDec for Resizing, I get ~206fps. (this way resizing is done during the decoding)
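To make the HDR->SDR numbers above easier to compare, it can help to convert fps into time per frame. The sketch below only uses the fps figures quoted in this post; the helper names are made up for illustration, and the times are for the whole decode/filter/encode chain, not for the tone-mapping filter alone.

```python
# fps figures quoted above (x264 encoding, DGDecNV as decoder)
measured_fps = {
    "HDR to SDR (DG)": 100,
    "HDRToSDR": 11,
    "ToneMap": 48,
    "ToneMap (Placebo)": 50,
    "TimeCube": 80,
    "HDR to SDR (DG) + DGDec resize": 206,
}


def ms_per_frame(fps: float) -> float:
    """Average wall-clock time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def slowdown_vs_fastest(results: dict[str, float]) -> dict[str, float]:
    """How many times slower each variant is than the fastest one."""
    fastest = max(results.values())
    return {name: fastest / fps for name, fps in results.items()}


if __name__ == "__main__":
    slowdown = slowdown_vs_fastest(measured_fps)
    for name, fps in sorted(measured_fps.items(), key=lambda item: -item[1]):
        print(f"{name:32s} {ms_per_frame(fps):6.1f} ms/frame  x{slowdown[name]:.1f}")
```

Seen this way, HDRToSDR at ~11fps costs about 91 ms per frame, against roughly 10 ms for HDR to SDR (DG) at ~100fps.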

Additional filtering can slow things down again,... but with some testing you should be able to figure out what the bottlenecks are and what the alternatives are.
In the end, it comes down to which filters & co. you want and what speed penalty you have to pay to use them.

=> Hybrid offers a variety of screws to turn to tweak stuff. You might benefit from testing different decoder, filter and encoder settings.

Cu Selur

Ps.: I adjusted the title, since yours was crap.
Reply
#3
(30.06.2024, 07:44)Selur Wrote:
Quote:Pretty shocked 4K is still a thing for modern CPUs, even for GPUs (NVEnc)!?
Converting 4k (HDR, H.265) -> 2k (SDR, H.264) using just the chips on the graphics card:

First off.. since when is 2K (2560x1440) = 1080p (Output res = 1920x1080 🤔)?

You probably meant 4K to <2K ☝️😀 .. right?



(30.06.2024, 07:44)Selur Wrote: I get 150fps here. Which seems like a decent speed.

If it's too slow for your taste, you should:
  • figure out whether it's the decoding, filtering or encoding that is slow.
  • if it's the decoding, try different decoders.
  • if it's the filtering, try adjusting the filter order (for example, move filters behind the resizer) and try different filters and settings.
  • if it's the encoding, try tweaking your settings.


and x264:
I get 100fps.


Anyway.. I have found out which filter is bottlenecking the performance ..

Using your settings I now get 👇

NVEnc = 130 - 140 fps 👀 😏
x264 = 110 - 120 fps



(30.06.2024, 07:44)Selur Wrote: No worries, I won't ask you, since you clearly didn't do any testing to figure out where your 'problems' are.

Using LibavSMASHSource(cpu) instead of DGDecNV, I get ~63fps.
Using LibavSMASHSource(gpu) instead of DGDecNV, I get ~63fps.
Using FFMS2 instead of DGDecNV, I get ~63fps.
Since I suspect the main slow down is due to the HDR->SDR conversion (x264 encoding):
Using HDR to SDR (DG) with DGDecNV, I get ~100fps.
Using HDRToSDR with DGDecNV, I get ~11fps.
Using ToneMap with DGDecNV, I get ~48fps.
Using ToneMap (Placebo) with DGDecNV, I get ~50fps.
Using TimeCube with DGDecNV, I get ~80fps.
Using HDR to SDR (DG) with DGDecNV and using DGDec for Resizing, I get ~206fps. (this way resizing is done during the decoding)



The long troubleshooting list ↑ you've posted is missing one particular filter combination that might cripple a HIGH END system as well .. 😏

I'm referring to the filters Colors → basic → Levels & Colors HDR to SDR (DG)!

The root cause is using both filters simultaneously, which crippled my system down to a single-digit fps encode 😐!!

Fun fact → Using the filters independently = issue solved = system going full throttle 😗👌



(30.06.2024, 07:44)Selur Wrote: Ps.: I adjusted the title, since yours was crap.

Hey, topic title sure wos short uhuh.. but to the point  ☝️(°^ °).. no?



Like, what's the first thing you think of when you see a duck on a toilet?

That's right.. A → ToiletDuck... ☝️😁

And not a "2x2x4 feathered poop drainage system"  ... right ¯\_(ツ)_/¯?


Ta TA
ToiletDuck (°^ °)
Reply
#4
Quote:First off.. since when is 2K (2560x1440) = 1080p (Output res = 1920x1080 🤔) ?
I adjusted the title to 1080p.

Happy you found what's causing the bottleneck with your settings.

Cu Selur
Reply
#5
(30.06.2024, 18:43)Selur Wrote:
Quote:First off.. since when is 2K (2560x1440)  = 1080p (Output res = 1920x1080 🤔) ?
I adjusted the title to 1080p.

Happy you found what's causing the bottleneck with your settings.

Cu Selur

🤭👌

I still want to emphasize that something isn't right when a basic color filter has such a huge impact on encoding performance 😐!
You might want to take a look at that ↑ !!!

ta ta
Reply
#6
Some combinations of filters are slower than others.
Filters get slower at higher resolutions, especially non-GPU filters.
When downscaling, moving filters behind the resizer can have a huge impact on speed.

Encoding (AV1) at 4k, I get ~71fps with Levels enabled and ~84fps without it, and like you wrote, this is a rather simple filter. Other filters will decrease the speed even more.

=> Nothing to look at, from my point of view. This is just how things are with the way you configured stuff.
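As a rough back-of-the-envelope check of the points above, here is a small sketch. It takes the two resolutions from this thread and the ~71fps / ~84fps figures for Levels, and it assumes that a per-pixel filter costs roughly in proportion to the number of pixels it touches, which is an assumption and not something measured here. The function names are made up for illustration.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in one frame."""
    return width * height


def added_ms_per_frame(fps_without: float, fps_with: float) -> float:
    """Extra time per frame that enabling a filter adds, in milliseconds."""
    return 1000.0 / fps_with - 1000.0 / fps_without


# Source and target resolutions from this thread
uhd_pixels = pixel_count(3840, 2160)
fhd_pixels = pixel_count(1920, 1080)
pixel_ratio = uhd_pixels / fhd_pixels

# Levels at 4k: ~84fps without, ~71fps with
levels_cost_ms = added_ms_per_frame(fps_without=84, fps_with=71)

if __name__ == "__main__":
    print(f"4k frame:    {uhd_pixels} pixels")
    print(f"1080p frame: {fhd_pixels} pixels ({pixel_ratio:.0f}x fewer)")
    print(f"Levels at 4k adds about {levels_cost_ms:.2f} ms per frame")
```

A 4k frame has four times the pixels of a 1080p frame, so under that assumption a filter placed behind the resizer has about a quarter of the work to do.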

Cu Selur

Ps.: the simple color filters can probably be ported to GLSL code (if you write a working filter, I could add it to Hybrid), which could then be used through vs-placebo like the other GLSL filters (without direct integration, those can be used through Filtering->Vapoursynth->Other->GLSL).
Reply
#7
(01.07.2024, 09:46)Selur Wrote: Some combinations of filters are slower than others.
Filters get slower at higher resolutions, especially non-GPU filters.
When downscaling, moving filters behind the resizer can have a huge impact on speed.

I get that, but are you suggesting changing the filter order and doing the filtering before resizing? Wouldn't that have a bigger impact in a bad way, since you'd then apply the filters at the original (4K) resolution!


(01.07.2024, 09:46)Selur Wrote: Encoding (AV1) at 4k, I get ~71fps with Levels enabled and ~84fps without it, and like you wrote, this is a rather simple filter. Other filters will decrease the speed even more.

I get that too.. ain't it great 😁👌

But in your example you talk about a 10, maybe 20 fps decrease worst case.. In my example I'm losing more like 50 - 80 fps on average when using the 2 color filters combined 🤯!


(01.07.2024, 09:46)Selur Wrote: Ps.: the simple color filters can probably be ported to GLSL code (if you write a working filter, I could add it to Hybrid), which could then be used through vs-placebo like the other GLSL filters (without direct integration, those can be used through Filtering->Vapoursynth->Other->GLSL).

Sure...easy peasy lemon squeezy 🙄
Reply
#8
I'm offline tomorrow till next Monday (festival), but maybe I'll find some time to look into writing some GLSL stuff for mpv for simple gamma & co. manipulations. (should not be easy,.. last time I wrote GLSL code was 10+ years ago 😉)

Cu Selur
Reply
#9
(01.07.2024, 22:56)Selur Wrote: I'm offline tomorrow till next monday (festival)

ROCK 💃  ON .. my majesty


(01.07.2024, 22:56)Selur Wrote: (should not be easy,.. last time I wrote GLSL code was 10+ years ago 😉)

Right... ↓

...as your humble servant has said...

"Easy - Peasy - Lemon - Squeezy" 😜
Reply
#10
Wrote a small gamma GLSL filter. (will add it to the next release)
So far, Hybrid only has:
* Saturation
* Gamma (new)
as simple GLSL filters.
Hybrid also has filters for:
* DeRinging
* Luma Sharpening
* Adaptive Sharpening
* CAS
* Filmgrain
and a bunch of other filters that are not directly integrated into the GUI, see: https://github.com/Selur/hybrid-glsl-filters/
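For anyone wondering what a simple gamma filter does per pixel, here is a minimal Python sketch of the usual power-law form. It is not the GLSL filter mentioned above. The function names and the convention `out = in ** (1 / gamma)` are assumptions for illustration, and the actual filter may define gamma differently.

```python
def apply_gamma(value: float, gamma: float) -> float:
    """Power-law gamma on a normalized sample in [0, 1].

    Assumed convention: gamma > 1 brightens midtones, gamma < 1 darkens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    clamped = min(max(value, 0.0), 1.0)  # keep the sample inside [0, 1]
    return clamped ** (1.0 / gamma)


def apply_gamma_8bit(sample: int, gamma: float) -> int:
    """Same operation on an 8-bit sample (0..255)."""
    return round(apply_gamma(sample / 255.0, gamma) * 255.0)


if __name__ == "__main__":
    for sample in (0, 64, 128, 255):
        print(sample, "->", apply_gamma_8bit(sample, 2.0))
```

Each output sample depends only on its own input sample, which is why this kind of filter maps naturally onto a GPU shader.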


Cu Selur
Reply

