08:58mriesch: hey all! i have got a v4l2 device that delivers nv12 and a gstreamer pipeline with v4l2src ! queue ! waylandsink
08:58mriesch: if i use the contiguous variant (v4l2 nv12), everything is fine
09:00mriesch: however, if i use the non-contiguous variant (v4l2 nm12, one buf for luma, one for chroma), the colors are all wrong and there are halos
09:01mriesch: halos = the chroma plane (half the size in nv12) is somehow stretched over the complete output
09:01mriesch: any idea what could be going wrong here?
09:07soreau: with weston?
09:08mriesch: yep
09:08soreau: weston git?
09:08soreau: have you tried any other compositor by chance?
09:08mriesch: weston 13.0.3
09:08mriesch: uh no
09:09soreau: I'd recommend trying weston git if you haven't, and maybe another compositor to see what happens there
09:09soreau: just to gain a better scope on the issue
09:10soreau: if it doesn't work with weston git though, you could file an issue
09:10mriesch: ok, will do
09:10daniels: I wouldn't expect anything's changed in git
09:10daniels: but I would be very interested to see what the offset param is when the dmabuf gets imported
09:11daniels: it sounds to me like the luma plane is being sampled again as the chroma plane, and yeah, nm12 is going to be funny because v4l2 mplane doesn't bother telling you what the byte offset is going to be
09:12mriesch: what is the meaning of "byte offset" in this scope?
09:13daniels: the offset into the dmabuf at which the buffer data begins, e.g. for the chroma plane in NM12, the byte at which the first UV pair can be found
09:14mriesch: so.. should be zero for both buffers in NM12 case, right?
09:16soreau: basically the offset would be the size of the Y buffer data, IIUC
09:17mriesch: soreau: in the nv12 case, surely
09:17mriesch: however, in nm12 you have two separate buffers, right?
09:19soreau: ah, I see
09:20soreau: so U would be the second plane @ 0 and the V would be second plane @ size of U
09:21mriesch: UV are interleaved, but yes, second plane @ 0
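
The offsets discussed above can be sketched as follows. This is a minimal illustration, assuming 8-bit NV12 with no padding between planes and a stride equal to the width; the `Plane` type and `nv12_planes` helper are hypothetical names, not part of V4L2, GStreamer, or Weston.

```python
from dataclasses import dataclass


@dataclass
class Plane:
    buffer_index: int  # which dmabuf the plane lives in
    offset: int        # byte offset of the plane's first sample in that dmabuf
    stride: int        # bytes per row
    size: int          # bytes occupied by the plane


def nv12_planes(width, height, stride=None, contiguous=True):
    """Describe the Y and interleaved UV planes of an 8-bit NV12 frame.

    Assumes no padding between planes; real drivers may align or pad.
    """
    stride = stride or width
    y_size = stride * height
    # Chroma is subsampled 2x vertically; each row holds width/2 CbCr pairs,
    # which is again `stride` bytes.
    uv_size = stride * ((height + 1) // 2)
    y = Plane(0, 0, stride, y_size)
    if contiguous:
        # NV12: UV follows Y inside the same buffer.
        uv = Plane(0, y_size, stride, uv_size)
    else:
        # NM12: UV has its own buffer and starts at byte 0 of it.
        uv = Plane(1, 0, stride, uv_size)
    return [y, uv]


nv12 = nv12_planes(3840, 2160, contiguous=True)
nm12 = nv12_planes(3840, 2160, contiguous=False)
print(nv12[1].offset, nm12[1].offset)
```

Under these assumptions the chroma plane of a 3840x2160 frame starts at byte 8294400 in the contiguous case and at byte 0 of the second buffer in the non-contiguous case.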
09:22mriesch: daniels: ok, two params, both have offset 1 and stride 3840 (for my 3840x2160 video input)
09:23mriesch: (offset 1 seems strange to me, but this is due to some gstreamer "skip" magic that i don't get)
09:24soreau: hm..
09:25soreau: mriesch: what if you pipe to file instead of waylandsink? or try glimagesink
09:25soreau: maybe it's gstreamer being funny
09:25mriesch: soreau: good point, forgot to mention that. i tried ! glupload ! glcolorconvert ! glimagesink and everything works fine, even in the nm12 case
09:26soreau: ah
09:26soreau: so this smells kinda like a waylandsink gstreamer bug
09:26mriesch: wouldn't want to waste too much of your time here, but could we talk briefly about what is happening here at the big picture level?
09:26soreau: mriesch: @ 3840x2160, you certainly have a big picture ;)
09:27mriesch: soreau: nice one ;-)
09:28mriesch: how are the two buffers combined to a single picture in weston?
09:28daniels: if the underlying GL or KMS driver supports importing as a single image then we do that, if not then we fall back to one R8 and one GR88 buffer and do the colourspace conversion manually
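
A rough sketch of what sampling from two separate planes amounts to, written as plain Python rather than a shader. The function name `sample_nv12` and the tiny test frame are made up for illustration, and the sketch assumes both planes share the same stride; it is not Weston's actual code.

```python
def sample_nv12(y_plane: bytes, uv_plane: bytes, stride: int, x: int, y: int):
    """Return (Y, Cb, Cr) for pixel (x, y) of an 8-bit NV12 frame."""
    # One luma byte per pixel (the single-channel plane).
    luma = y_plane[y * stride + x]
    # One CbCr byte pair per 2x2 block of pixels (the two-channel plane).
    uv_row = (y // 2) * stride
    uv_col = (x // 2) * 2
    cb = uv_plane[uv_row + uv_col]
    cr = uv_plane[uv_row + uv_col + 1]
    return luma, cb, cr


# Hypothetical 4x2 frame: 8 luma bytes, one chroma row with two CbCr pairs.
y_plane = bytes(range(8))
uv_plane = bytes([100, 200, 110, 210])
print(sample_nv12(y_plane, uv_plane, 4, 3, 1))
```

Passing the luma data as `uv_plane` by mistake would yield chroma values taken from brightness data, which is the kind of mix-up suspected earlier in the conversation.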
09:28daniels: what hardware & driver are you using?
09:29mriesch: this is on the rockchip rk3568 with mainline drivers
09:29mriesch: i.e., vop2 drm + panfrost gpu
09:32wlb: wayland-protocols Merge request !369 closed (xdg-shell: Add request to make a toplevel surface always on top)
09:32daniels: mriesch: hmm yeah, that certainly should work
09:33daniels: but it seems like it's definitely an issue in either GStreamer itself or waylandsink I'm afraid, because both NV12 and NM12 on the V4L2/GSt side resolve to just NV12 on the GPU/display side
09:36mriesch: daniels: possibly, yes. i guess this is quite a corner case with some barely tested code paths involved
09:37soreau: maybe try gstreamer git if you're not already and file an issue if it's still a problem
09:37mriesch: where does this check happen whether direct import is possible?
09:41soreau: looks like there are some fresh commits (Dec. 18th) that might be related https://gitlab.freedesktop.org/gstreamer/gstreamer/-/commits/main
09:45mriesch: soreau: ok, those *are* fresh
09:45mriesch: i'll give them a try, thx
09:46soreau: np, and of course there is also #gstreamer on this network too
09:52daniels: mriesch: yeah, NV12 in Weston is exhaustively tested, but NM12 in GSt probably less so
09:55mriesch: daniels: but what about nm12 in weston?
09:56mriesch: from an outsider's point of view nm12 does make a difference, even if at some point nv12 and nm12 are handled the same way
09:57daniels: weston never sees 'nm12'
09:57daniels: weston only ever sees DRM_FORMAT_NV12, with the Y and UV plane defined separately
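
To illustrate the point about format codes: both V4L2 and DRM build their pixel format identifiers from four ASCII characters. The sketch below recomputes them in Python; the `fourcc` helper is a hypothetical stand-in for the C macros, and the constants are recomputed here rather than imported from any header.

```python
def fourcc(a: str, b: str, c: str, d: str) -> int:
    """Pack four ASCII characters into a little-endian 32-bit code."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)


V4L2_PIX_FMT_NV12 = fourcc("N", "V", "1", "2")   # contiguous Y + UV
V4L2_PIX_FMT_NV12M = fourcc("N", "M", "1", "2")  # non-contiguous ("nm12")
DRM_FORMAT_NV12 = fourcc("N", "V", "1", "2")     # what the compositor sees

print(hex(V4L2_PIX_FMT_NV12), hex(V4L2_PIX_FMT_NV12M), hex(DRM_FORMAT_NV12))
```

On the DRM side there is only the one NV12 code; whether the planes share a buffer is expressed through the per-plane buffer and offset, not through the format.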
09:58daniels: so if V4L2/GSt NV12 works but V4L2/GSt NM12 doesn't, then either the V4L2 device is producing erroneous NM12, or GSt is erroneously translating it
09:58soreau: garbage in.. garbage out
10:00mriesch: ah ok
10:02mriesch: just for the sake of completeness, could the drm driver be the culprit?
10:03soreau: if it works with glupload/glimagesink, it sounds more like gst to me
10:04linkmauve: soreau, there are two DRM drivers here, panfrost will serve the GL parts, and rockchip-drm will take either an RGBX8888 dmabuf from panfrost, or an NV12 buffer from V4L2 (after some mangling in GStreamer).
10:04soreau: mriesch: I'm just kinda curious why you'd want waylandsink over glimagesink
10:05linkmauve: But I believe I have already tested decoding on the rk3568, although IIRC WebP decoding was broken on that SoC.
10:06mriesch: soreau: in my experience, better performance since the gpu is not involved
10:06soreau: huh
10:06linkmauve: soreau, it is more efficient, as it can use the specialized hardware present in the SoC to scale the buffer and convert it from Rec.709 YUV to RGB.
10:06linkmauve: Instead of doing it all in shaders.
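
For reference, the per-pixel conversion being offloaded looks roughly like this. It is a minimal sketch assuming 8-bit limited-range ("video range") Rec.709 input; the function names are hypothetical, and real hardware or shaders may use different range and rounding conventions.

```python
# Rec.709 luma coefficients.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def clamp8(value: float) -> int:
    """Round and clamp to the 0..255 range."""
    return max(0, min(255, round(value)))


def bt709_limited_to_rgb(y: int, cb: int, cr: int):
    """Convert one 8-bit limited-range Rec.709 YCbCr sample to 8-bit RGB."""
    # Normalise: luma spans 16..235, chroma spans 16..240 centred on 128.
    yn = (y - 16) / 219.0
    pb = (cb - 128) / 224.0
    pr = (cr - 128) / 224.0
    r = yn + 2.0 * (1.0 - KR) * pr
    b = yn + 2.0 * (1.0 - KB) * pb
    # Green follows from the luma definition Y = KR*R + KG*G + KB*B.
    g = (yn - KR * r - KB * b) / KG
    return tuple(clamp8(255.0 * c) for c in (r, g, b))


print(bt709_limited_to_rgb(16, 128, 128))
print(bt709_limited_to_rgb(235, 128, 128))
```

At 3840x2160 this arithmetic runs for more than eight million pixels per frame, which is why doing it in dedicated display hardware instead of shaders can matter.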
10:06soreau: I see..
10:06linkmauve: At 4K resolutions it can make a huge difference, but even at lower resolutions.
10:07soreau: so for my case of live streaming rtsp streams to wl windows, I should use waylandsink instead of glimagesink for better/best performance?
10:08linkmauve: On my Intel laptop it can be the difference between watching a series for three hours on a train, or watching it for eight hours, something like that IIRC.
10:08linkmauve: Before the battery runs out.
10:08linkmauve: soreau, if your hardware supports it and you don’t need anything else from OpenGL, probably.
10:09soreau: linkmauve: how would I check if the hw supports it?
10:10linkmauve: % v4l2-ctl --list-formats-out -d0
10:10linkmauve: That’s for /dev/video0, increase the -d parameter for /dev/video1 etc.
10:10soreau: no, this is rtsp streams
10:10linkmauve: And on desktop, you most likely want % vainfo instead.
10:10soreau: well I am using vaapi..
10:10linkmauve: soreau, which you demux on the CPU, and then pass on to the hardware for decoding/presentation.
10:11linkmauve: soreau, if it’s already output by vaapi, then your hardware certainly supports it.
10:11soreau: I'm using rtspsrc ... name=src src. ! rtph264depay ! h264parse ! vaapih264dec ! queue ! vaapipostproc ! glimagesink src. ! decodebin ! audioconvert ! queue ! autoaudiosink
10:12linkmauve: Try s/glimagesink/waylandsink/ and see for yourself. :)
10:12linkmauve: You can use # intel_gpu_top to monitor the usage of the 3D parts.
10:13soreau: it's radeon, but thanks
10:13soreau: I'll give it a go
10:13linkmauve: # radeontop then, and possibly some other tools to check the power usage of the whole computer.
10:14soreau: sure
10:14soreau:types `top`
10:15linkmauve: That likely won’t give you anything but the CPU’s relative usage.
10:16soreau:colors linkmauve informative
10:18linkmauve: soreau, # powertop is what I use on my laptop.
10:19soreau: linkmauve: ok, thanks
10:19soreau: I usually just judge cpu usage by listening to the fan xD
12:04linkmauve: soreau, in neither case should there be much CPU usage, as you are delegating to either the GPU (in the OpenGL case) or to various dedicated bits of hardware (in the Wayland + DRM planes case, if your compositor supports that).
14:40soreau: linkmauve: using wlroots without libliftoff if it matters.. I'm not sure if it's placebo effect because I haven't properly measured, but it seems waylandsink uses slightly less resources in this scheme
15:11wlb: wayland Issue #516 opened by () Decouple Wayland server from window manager https://gitlab.freedesktop.org/wayland/wayland/-/issues/516
15:19wlb: wayland Issue #516 closed \o/ (Decouple Wayland server from window manager https://gitlab.freedesktop.org/wayland/wayland/-/issues/516)