IRC Logs of #dri-devel on irc.freenode.net for 2023-10-26

07:55 lumag: mripard, if you have time, could you please look at https://patchwork.freedesktop.org/patch/561683/?series=123224&rev=2 ?
08:36 austriancoder: emersion: maybe you as wl expert might have some time to have a look at this MR? https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/26
08:37 emersion: pq: ^ you wrote an explanation of why it's not a good idea to check in scanner-generated code right?
08:37 emersion: if you have a link that'd be useful
09:47 wv: pq, you were a weston developer, right? I'm trying to debug my gstwaylandsink to get the best memory path. But it's failing on dmabuf creation, so it falls back to SHM. Would you have any clue where to look?
09:47 wv: it appears zwp_linux_buffer_params_v1_create is failing on me
09:48 wv: but I have no idea where this one is going to
09:49 wv: width, height, format is 360x240, NV12
09:57 pq: emersion, I don't think I can find any explanation I might have written about checking in wayland-scanner generated code. But I should have no problem thinking up the points again.
09:59 pq: wv, hmm, I wonder if weston actually had any debugging already written for that...
10:01 pq: wv, it's most likely Mesa that rejects the dmabuf when you get a graceful failure. Or whatever you use as your EGL implementation.
10:02 wv: well, the graceful failure is gstreamer reporting it cannot create the linux-dmabuf buffer, depending on !data.wbuf
10:03 wv: if I know the code where zwp_linux_buffer_params_v1_create is actually calling to, I can maybe add some debugging there, but I'm a bit lost currently
10:03 pq: the graceful error I'm talking about is the Wayland event zwp_linux_buffer_params_v1.failed.
10:04 pq: wv, right, so you can see that Weston's linux-dmabuf.c has very few paths that would fail without sending a protocol error.
10:07 pq: most likely it's gl_renderer_import_dmabuf() that fails
10:09 wv: how can I output wl_resource_post_error?
10:09 pq: wv, env var WAYLAND_DEBUG=client, but I would hope that gst would print it too somehow.
10:10 pq: it's a protocol error, and protocol errors always disconnect the client, which means the app usually quits
10:10 pq: it cannot fall back
10:10 pq: that's why I think you are not hitting a protocol error
10:13 wv: ct_wl_buffer:<GstWlDisplay@0x1628e00> plane 1, offset 88320, stride 368
10:13 wv: [2471746.758] -> zwp_linux_buffer_params_v1@20.add(fd 32, 1, 88320, 368, 0, 0)
10:13 wv: [2471751.129] -> zwp_linux_buffer_params_v1@20.create(360, 240, 842094158, 0)
10:13 wv: [2471760.513] zwp_linux_buffer_params_v1@20.failed()
10:13 wv: [2471761.044] -> zwp_linux_buffer_params_v1@20.destroy()
10:14 pq: yup
10:15 wv: that's for plane 1
10:15 pq: what about the other planes?
10:15 wv: ct_wl_buffer:<GstWlDisplay@0x1628e00> plane 0, offset 0, stride 368
10:15 wv: [2471738.734] -> zwp_linux_buffer_params_v1@20.add(fd 31, 0, 0, 368, 0, 0)
10:16 wv: there are only 2 layers
10:16 wv: *planes
10:16 pq: that's correct for NV12, right?
10:17 pq: also modifier is linear for both, so nothing odd here
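The numbers in the WAYLAND_DEBUG trace above can be sanity-checked by hand. A minimal Python sketch, assuming the usual DRM conventions (a fourcc is four ASCII bytes packed little-endian, the 64-bit modifier is sent as a high and a low 32-bit half, and modifier 0 means linear). The helper names are made up for illustration and are not part of Weston, Mesa, or GStreamer:

```python
import struct

DRM_FORMAT_MOD_LINEAR = 0  # value of the linear modifier in drm_fourcc.h


def fourcc_to_str(code: int) -> str:
    """Unpack a DRM fourcc code into its four ASCII characters."""
    return struct.pack("<I", code).decode("ascii")


def modifier_from_halves(hi: int, lo: int) -> int:
    """Recombine the modifier_hi / modifier_lo arguments of params.add()."""
    return (hi << 32) | lo


def nv12_chroma_offset(stride: int, height: int) -> int:
    """Offset of plane 1 if the chroma plane directly follows the luma rows."""
    return stride * height


# Values copied from the trace: create(360, 240, 842094158, 0),
# add(fd 32, 1, 88320, 368, 0, 0)
print(fourcc_to_str(842094158))                             # NV12
print(modifier_from_halves(0, 0) == DRM_FORMAT_MOD_LINEAR)  # True
print(nv12_chroma_offset(368, 240))                         # 88320
```

So the format code really is NV12, both planes are linear, and the plane 1 offset is consistent with a chroma plane that starts right after 240 luma rows of 368 bytes each. None of this explains the failed() event, which fits the guess that the rejection comes from the EGL import rather than from malformed parameters.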
10:18 pq: wv, remind me, is this upstream version of Weston and Mesa drivers?
10:19 wv: yes, weston 10.0.2, mesa 23.1.8
10:20 pq: my likely guess is that the GPU driver just doesn't like the dmabuf for some reason
10:21 wv: I'm just seeing my weston version is quite far off from mesa... Upstream weston is already at 12.x
10:21 pq: IIRC NV12 should have two import paths in Weston, as NV12 and as R8+GR88, and both would fail here.
10:21 wv: Well, I've limited my planes to be rgb565, maybe that can be an issue?
10:21 pq: well, should be easy to get the latest Weston and test, right?
10:22 pq: I don't remember anything that might have improved that inside Weston, but it's possible
10:23 pq: it could be something as silly as the dmabuf pointing to wrong type of memory, like VRAM vs. sysram, or some more subtle difference
10:24 pq: that's why zwp_linux_dmabuf_v1 latest version tells the client which DRM devices the dmabuf should work with, so that the app can try its best
10:25 pq: I don't know how much gst tries to make use of that information
10:33 wv: I'll try some things out. Would it be possible that it's because I limited my planes' capabilities to be rgb565 only?
10:34 pq: of course!
10:34 pq: erm...
10:35 pq: if you hacked EGL, then yes
10:35 pq: NV12 is not RGB565, obviously
10:36 pq: and if you removed NV12 support from KMS planes, then naturally NV12 cannot be scanned out directly but needs to go through a GPU copy/conversion.
10:36 pq: For dmabuf import in Weston though, the important part is EGL.
10:37 pq: Weston will reject any dmabuf it cannot import to EGL for texturing by the GPU, because that is the ultimate fallback path that always needs to work.
10:40 pq: emersion, https://gitlab.freedesktop.org/freedreno/freedreno/-/merge_requests/26/diffs#note_2143774
10:41 emersion: ty pq!
10:43 pq: emersion, feel free to lift that to anywhere more accessible :-)
10:53 tomba: In a situation where the bootloader sets up the display, and simplefb provides an early fb, are there any mechanisms/helpers to move/copy the early fb or its contents to the main DRM driver's fb?
10:54 tomba: (In a context of the fb being in the system ram)
11:12 cwabbott: alyssa: I'm looking at one of the GL CTS double tests because of something else and it looks like there's a bug with the preamble pass where it's storing extra values in the preamble that are never used
11:13 cwabbott: it's doing ceil() of a dvec2 uniform, and thanks to your work the entire thing is getting shoved in the preamble and the main shader just reads 4 constants, however the preamble is writing way more than 4 constants
11:13 cwabbott: the test is KHR-GL46.gpu_shader_fp64.builtin.ceil_dvec2 if you wanna reproduce
11:14 cwabbott: I'm not sure if I'll have time to look into it
11:15 tursulin: robclark: ping on https://lists.freedesktop.org/archives/dri-devel/2023-September/424905.html - can you live with it or object?
11:16 alyssa: cwabbott: IIRC there's a known bug where all the booleans used by if's get stored even when they're not read
11:17 alyssa: but the patch I had to fix that hosed freedreno (or shaderdb or something) so I dropped and forgot about it
11:17 cwabbott: ah yeah, I think that's it
11:18 cwabbott: I can look at it later if you have a patch
11:18 alyssa: will try to cook something up
11:18 cwabbott: right now I gotta reduce this monstrosity to find what's causing it to hang with my changes to use shared regs :/
11:31 llyyr: is the envvars document not supposed to be an exhaustive list? It's missing quite a few options, at least for AMD_DEBUG like useaco or noefc
12:09 alyssa: llyyr: patches welcome :)
12:50 mclasen: pq: is there any documentation or examples for what clients should do with the device information in the dmabuf protocol?
12:52 emersion: mclasen: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/blob/main/stable/linux-dmabuf/feedback.rst?ref_type=heads
12:54 mclasen: yes, I've read that (or rather, the identical part of the xml), but my dmabufs are coming from pipewire or gstreamer. Should I give them the device? If so, how?
12:55 pq: I guess the most fundamental problem is still unsolved? some sort of API where you could list all devices and tell it to allocate.
12:55 pq: or The Unix Device Allocator project
12:56 swick[m]: mclasen: in general you could try to allocate on that device to increase your chances to get direct scanout but I'm not sure if pipewire or gstreamer can be told which device to use for something
12:56 emersion: there is no "identical part of the xml"
12:57 emersion: the xml and this document are complementary
12:57 pq: ...or to get dmabuf import working at all to begin with
12:57 swick[m]: true
12:58 mclasen: emersion: good to know, I had so far mistaken the xml as the sole protocol docs
12:58 emersion: the xml is pretty terse/dry because it's normative
12:58 emersion: the docs give specific guidelines for specific client architectures
12:59 mclasen: thanks, very useful
12:59 emersion: "if your client does XXX, then this is how you should use the protocol"
12:59 emersion: (and in particular interpret the device events)
13:12 Company: next step: do a fullscreen playback of a screencast and try to make it hit direct scanout
13:13 Company: and if you do it right, you invented a method to make the compositor loop-de-loop present
13:14 Company: the reddit guy with the drawing tablet is gonna love that
13:15 emersion: sway supports this already, via wl-mirror
13:15 Company: the one who used obs to mirror his HDR monitor onto the tablet when drawing
13:17 alyssa: cwabbott: ok, here's the issue
13:18 alyssa: this patch "fixes" it https://rosenzweig.io/0001-nir-opt_preamble-Don-t-store-if-conditions.patch
13:18 alyssa: in the sense it no longer stores the if conds
13:18 alyssa: BUT it's a huge shaderdb loss
13:18 alyssa: because it also no longer stores if-conds for if's that we end up not moving
13:19 alyssa: (regressing stuff where you have a big uniform expression for a condition but the interior of the if is not uniform)
13:19 alyssa: see android/angle/hill_climb_racing/73.shader_test in rob's shaderdb for ex
13:20 alyssa: i've paged out the candidate vs can_move vs replace vs .. details so not sure rn what the proper fix is
13:21 MrCooper: Company: the client would probably get stuck, because the compositor can't release the scanout buffer
13:22 Company: MrCooper: there's a pool involved though, so that should work (assuming the pool is large enough)
13:22 MrCooper: hmm, frozen screen might actually be the expected result for that, since the client is asking the compositor to present what it's already presenting
13:23 Company: it's a fun question to think about what should happen
13:24 Company: you can then take 2 monitors and play the game with a screencast of monitor 1 displaying on monitor 2 and vice versa
13:25 Company: do they display the same buffer?
13:41 robclark: tursulin: a-b
13:42 tursulin: robclark: ty!
15:17 gfxstrand: dcbaker: Any progress on 32-bit proc macro stuff?
15:17 gfxstrand: dcbaker: I'd really like to land NAK next week and that's the blocker.
15:22 Venemo: is there any reason we have "feature", "feature_request" and "enhancement" tags in GitLab? it seems they are redundant
15:35 gfxstrand: Do the a*_vk_full tests normally fail?
15:36 pendingchaos: I suppose that features are user facing (an extension, for example), while enhancements (an optimization or maybe a cleanup) might not be
15:38 gfxstrand: cwabbott, danylo: They're blowing up on !25894 and I don't think anything I'm doing there should affect blits...
15:39 cwabbott: gfxstrand: there's a commit from anholt_ that fixes the expectations, I think
15:39 alyssa: gfxstrand: also _full is a nightly fwiw
15:39 cwabbott: usually they blow up because someone updated CI and forgot to run the _full jobs
15:40 cwabbott: sorry, updated deqp
15:42 cwabbott: the blit tests failing are actually sorta your fault because we have the same issue intel had where we canonicalize NaNs when copying and you were the one to sign onto disallowing copies doing that
15:43 alyssa: ...heh
15:43 alyssa: In file included from ../src/asahi/lib/shaders/texture.cl:6:
15:43 cwabbott: that happened like 6 years ago, but then later someone (qcom?) ran into the same issue, didn't see your issue, and changed the CTS to avoid NaNs in the copy tests, then recently someone noticed that and reverted it
15:43 alyssa: In file included from /usr/lib/llvm-15/lib/clang/15.0.6/include/stdint.h:52:
15:43 alyssa: /usr/include/stdint.h:26:10: fatal error: 'bits/libc-header-start.h' file not found
15:43 alyssa: Error executing LLVM compilation action.
15:43 alyssa: ~~opencl was a mistake i regret my choices~~
15:45 ccr: :/
15:45 alyssa: happening in ci but not locally, ofc
15:45 ccr: of course.
15:51 cwabbott: gfxstrand: I'm going to have to determine exactly how screwed we are later
15:51 dj-death: alyssa: I was going to ask, how's that going
15:51 dj-death: ?
15:51 alyssa: dj-death: waiting on review mostly
15:52 alyssa: WARNING: Tried to mix libraries for machines 0 and 1 in target 'asahi_clc' This will fail in cross build.
15:52 alyssa: exciting
15:52 cwabbott: decompressing uses the same path as normal blits, so we can't just decompress and blit with a different format since the decompression step will also canonicalize
15:52 cwabbott: it sounds like that was what you did in anv to solve the problem, but we can't do that
15:53 alyssa: exciting
15:54 gfxstrand: cwabbott: Oh, that makes a depressing amount of sense.
15:55 gfxstrand: cwabbott: Sadly, I think the whole thing with copies is pretty settled at this point. It's a memcpy
15:55 gfxstrand: Which sucks in some cases
15:55 cwabbott: yeah, that might completely screw us over
15:55 cwabbott: as in, no compression at all for any float formats
15:55 gfxstrand: But it's been a while
15:55 gfxstrand: Yeah, Intel can't compress 11_11_10 for that reason
15:55 gfxstrand: And it costs a bit of perf in a few benchmarks
15:56 alyssa: this sounds like a vk spec bug
15:56 cwabbott: isn't R16 more common than 11_11_10 though?
15:56 alyssa: >V
15:56 cwabbott: R16_FLOAT that is
15:56 gfxstrand: Yeah, R16_FLOAT kinda sucks but Intel can compress that as R16_UINT
15:56 cwabbott: we... can't
15:56 gfxstrand: Ugh
15:56 cwabbott: qcom added stuff on a7xx to do reinterpreting copies correctly
15:57 gfxstrand: Do you remember the bug number?
15:57 cwabbott: hold on, one sec
15:57 gfxstrand: I paged this out a long time ago
15:57 cwabbott: https://gitlab.khronos.org/vulkan/vulkan/-/issues/527
15:57 cwabbott: yeah, it's quite old
15:58 cwabbott: but it got brought up again recently, where someone pointed to this issue
15:58 cwabbott: which i think led to some CTS changes that aimed to avoid NaNs getting reverted, breaking us
15:59 cwabbott: qualcomm themselves have stopped caring, and I would love to not care about a6xx too, but it's what has actual users at this point
16:00 anholt_: gfxstrand: lots of vk full fails due to timeouts currently post cts 1.3.7.0. I'm probably going to disable those jobs shortly, in the absence of a workaround.
16:01 anholt_: gfxstrand: there's one job iirc that does complete but didn't get updated, which is in my draft VK CTS MR.
16:06 cwabbott: hmm, maybe it's just happenstance that the test changed to introduce NaNs into the mix
16:06 gfxstrand: anholt_: In this case, it was the copy fails that cwabbott just mentioned.
16:07 cwabbott: it was happening because the source texture is white (1.0) and 1.0 for unorm_16 and snorm_16 is a NaN in fp16
16:25 gfxstrand: Yeah, it's also a problem for INT16_MIN in an SNORM image
16:26 cwabbott: yeah, we already disable compression for snorm (although on a7xx we can reinterpret snorm as unorm)
16:27 cwabbott: I have to do more research, but I think the situation is that on a7xx we can make everything work perfectly by using new HW features but on a6xx we're just screwed
16:28 cwabbott: especially if we have to disable compression for all floating-point formats
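The two problem values mentioned at 16:07 and 16:25 can be checked with plain bit arithmetic. A rough Python sketch, assuming IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits) and the standard SNORM decode that clamps to -1.0. Reading the INT16_MIN remark as "two bit patterns decode to the same value" is an interpretation, and the function names are illustrative only:

```python
def is_fp16_nan(bits: int) -> bool:
    """True if a 16-bit pattern is a NaN when reinterpreted as fp16."""
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    return exponent == 0x1F and mantissa != 0


def snorm16_to_float(bits: int) -> float:
    """Decode a 16-bit SNORM value, clamping INT16_MIN to -1.0."""
    value = bits - 0x10000 if bits & 0x8000 else bits
    return max(value / 32767.0, -1.0)


UNORM16_ONE = 0xFFFF  # 1.0 in a 16-bit UNORM channel
SNORM16_ONE = 0x7FFF  # 1.0 in a 16-bit SNORM channel

# Both encodings of 1.0 are NaN bit patterns when the copy path treats
# the data as fp16, so a NaN-canonicalizing copy will not preserve them.
print(is_fp16_nan(UNORM16_ONE), is_fp16_nan(SNORM16_ONE))    # True True

# 0x8000 (INT16_MIN) and 0x8001 decode to the same SNORM value, so a copy
# that round-trips through the decoded value cannot tell them apart.
print(snorm16_to_float(0x8000) == snorm16_to_float(0x8001))  # True
```

In both cases the copy is only bit-exact if the hardware moves the raw bits, which is what the "it's a memcpy" requirement amounts to.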
16:31 alyssa: sigh
16:31 alyssa: I suspect this will screw over M1 too..
16:32 DemiMarie: Time for a global configuration file?
16:33 DemiMarie: That would allow users to select between “I want full conformance at the cost of performance” and “I want improved performance at the cost of being slightly out of spec in ways that will almost certainly not break anything”
16:33 cwabbott: alyssa: there's a nuclear option here, but you won't like it
16:34 cwabbott: can't believe I'm even typing this, but...
16:34 cwabbott: decompress in a compute shader
16:35 DemiMarie: cwabbott: what about allowing users to pick between fast and conformant?
16:35 alyssa: we haven't r/e'd the compression algo and i'd rather not
16:35 cwabbott: me too
16:36 DemiMarie: I know that certain Arm CPUs allow choosing between “RunFast” mode (flush to zero + denormals are zero + no traps), which is implemented entirely in hardware, and strict conformance mode, which requires certain operations to be implemented in software.
16:36 DemiMarie: alyssa: is this because of effort or because of e.g. patents?
16:42 alyssa: hopefully we can cast to uint
16:42 alyssa: unsure
17:27 alyssa: dj-death: review welcome ;)
17:28 alyssa: (big MR but most of it is driver changes, just a few common code commits on top)
19:28 bl4ckb0ne: are all the gl_renderbuffer_attachment Zoffset and CubeFaceMap fields the same for a given gl_framebuffer?
19:29 bl4ckb0ne: im trying to add a given layer to `do_blit_framebuffer`
19:44 bl4ckb0ne: Zoffset is set at 2 places in fbobject.c
20:18 austriancoder: emersion: pq: thanks for the feedback .. will try to generate the files. Except for the wayland-scanner thing.. does the rest of the wl changes look okay?
20:18 emersion: yeah, sounds fine
20:18 emersion: dispatch_pending, not sure why that call is necessary though
20:25 austriancoder: emersion: without it no listener was called and there was no window shown that displays .. in my case some green quads
20:25 emersion: that's a bit weird
20:26 austriancoder: emersion: the test I used is: https://gitlab.freedesktop.org/freedreno/freedreno/-/blob/master/tests-3d/test-varyings.c?ref_type=heads
20:31 austriancoder: I use these demos to capture cmd streams from blob and mesa and compare them.
20:49 agd5f: anyone seeing this with drm-next? https://pastebin.com/9aGTq5Pg
21:51 airlied: agd5f: I am, now that I turned on the option
21:53 airlied: I assume adding disconnect is the correct answer
21:56 airlied: agd5f: pushed a fix
22:41 agd5f: airlied, thanks!