IRC Logs of #dri-devel on irc.freenode.net for 2024-05-06

00:34 kode54: "oops"
00:34 kode54: meanwhile
00:35 kode54: https://gitlab.freedesktop.org/drm/amd/-/issues/3343 this ruined my games recently
00:35 kode54: but also called to my attention that I have had Resizable BAR disabled for several weeks now
01:23 kode54: Resizable BAR was set to “auto” which meant disabled due to CSM being default enabled
07:06 tshikaboom: hello! could anyone assign marge to https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26386 please?
07:39 MrCooper: Lynne: gfxXX (GFX block) and NaviXX (whole GPU) are separate nomenclatures
08:30 mlankhorst: agd5f: I'd like to get some discussion going on https://lore.kernel.org/all/20240424165937.54759-1-friedrich.vock@gmx.de/T/#meebfc2b08ce76f0eda8c4c5eb02ab460e0f6a9b6
12:16 sima: airlied, oops that epoll thread
13:04 sima: airlied, I tried to fix a few misconceptions around what dma_buf/fence is, but I think Linus' take is that this isn't our bug ...?
13:15 MrCooper: Company: the SUBOPTIMAL thing likely happens because the compositor sends no-change dma-buf feedback to Xwayland; Mesa 24.1 has a workaround for this
13:15 sima: airlied, ah the fix is in -rc7
13:16 Company: MrCooper: hrm - nvidia seems to cause OUT_OF_DATE errors in that situation, would that be an issue with nvidia's Vulkan driver?
13:16 MrCooper: sounds like it
15:53 regularityon: So dma-buf API has no unified hardware configurator without interrupting the CPU to allow or reject wraparound; currently it handles one or the other based on gcc flags on integer overflow arithmetic intrinsics. While Windows defaults to wraparound, Linux currently defaults to no wraparound based on gcc flags and trackers interrupt. But the opening is the same for normal modes dma wraparound to zero from max vs no wraparound in incremental mode,
15:53 regularityon: and opener like provided is fairly intelligent, needs no fifo aka circular mode but one shot burst mode. dma-buf API is bulky and primitive, should be advanced to dma compute machine yet.
16:05 regularityon: Conversion routines for dma offloading are fairly simple, it needs no alu dictionaries, but conversion magic weight numbers, since plus and minus are handled without any answer set logics, and the skeleton is hence the same proportionally for compressed and uncompressed formats for back and forth encoding decoding, I've already solved everything but not yet the rounding procedures.
16:23 regularityon: Rounding ceil floor and such are at compile and relink phase, and are fairly simple to add into the logics once I start on the alu dictionaries as to how I see, but I have not looked at all the details of rounding yet, very clear it's not a blocker though.
18:55 sghuge: Has anyone encountered a strange issue with LLVM 17 vs LLVM 16? We have some CL kernels that get compiled for ray tracing, but it looks like something has changed between LLVM 16 and LLVM 17: things are failing on LLVM 17 and passing on LLVM 16. I tried to find the delta between the SPIR-V outputs, but there are a lot of differences and I'm not sure how to track down the changes. Just checking if someone has
18:55 sghuge: already encountered a similar issue before.
19:02 mattst88: karolherbst has dealt with a bunch of issues around llvm-17. he might have some advice
19:06 karolherbst: sghuge: llvm-17 only has opaque pointers, so that's causing quite some headache, because external functions might have different pointer types in their signature
19:06 karolherbst: and I wouldn't be surprised if that's the main cause of regressions (or well.. opaque pointers in general)
19:07 karolherbst: could try enabling them with llvm-16 and see if you get the same regressions
19:11 sghuge: is reading about the opaque pointers
19:16 kisak: llvm 16 has not-opaque pointers? I thought the forced transition was earlier than that
19:17 karolherbst: 16 has them disabled by default
19:17 karolherbst: or maybe it was enabled by default but we've disabled it
19:17 karolherbst: something something
19:17 kisak: my memory says 15 was disabled by default.
19:17 karolherbst: but non opaque got removed with 17
19:19 kisak: okay, so the factoid in my memory is incorrect regardless.
19:27 sghuge: karolherbst: Okay, enabling the opaque-pointers with LLVM 16 reproduces the failure.
19:29 karolherbst: I don't know if that is a good or a bad thing...
19:29 karolherbst: what kind of failure btw?
19:30 sghuge: ray tracing tests are failing with the LLVM 17 :(
19:30 karolherbst: soo.. runtime behavior changes?
19:30 karolherbst: given the size of ray tracing kernels... good luck?
19:31 karolherbst: but it might be that we optimize some strides away or other random things
19:31 karolherbst: to work around opaque pointers the translator added a bunch of casts
19:31 karolherbst: sghuge: it might make sense to check with llvm-18 and llvm-19 and see if it got fixed in the translator
19:32 karolherbst: backporting stuff only happens on demand there
19:33 sghuge: yeah it's a runtime behavior change...and yep, it's a hell of a lot of kernels for RT :(
19:34 karolherbst: I haven't hit any runtime regressions since moving to llvm-17 sadly, so I can only guess what's up
19:34 sghuge: let me try and update the llvm version to see if it got fixed in the latest version.
19:40 dcbaker: karolherbst: I have an... interesting issue, and it might be something particular to my setup. Can you run `meson setup builddir -Drust_args="-Clink-arg=-fsanitize=address" -Db_sanitize=address` on that rust test branch you have? When I run that with clang it works, when I run it with GCC it doesn't, and I'm suspecting there's just something wrong with my gcc
19:46 karolherbst: dcbaker: you mean like it still fails to link on your end?
19:47 dcbaker: Yea, but only when I use gcc as the linker, when I compile with clang it works fine
19:48 karolherbst: same, but curious
19:48 karolherbst: why are we hitting all the compiler bugs...
19:48 karolherbst: *and linker
19:49 dcbaker: This would all be easy if rustc would go ahead and stabilize their sanitizer support, lol
19:49 karolherbst: heh, fair
19:50 dcbaker: Because that does work with gcc
19:51 karolherbst: mhhhhh
19:51 karolherbst: with clang: note: /usr/bin/ld: cannot find /usr/bin/../lib/clang/17/lib/linux/libclang_rt.asan_static-x86_64.a: No such file or directory
19:52 karolherbst: but... that sounds like a fedora bug or something? I have no idea
19:56 karolherbst: dcbaker: mhhhhh, it might be caused by things compiled by gcc and rustc being mixed
19:56 karolherbst: in the final object I mean
19:56 karolherbst: rusticl_test is special because it basically links against a lot of mesa
19:57 karolherbst: and probably the reason why the other test binaries compile just fine
19:58 dcbaker: Yeah, that makes sense to me. I don’t really know where to go with it, since it’s a bug easy to hit on CI, but I don’t know how often people will hit it in production
19:58 karolherbst: you have to enable the sanitizer, soo...
19:59 karolherbst: I'm more concerned about the other issue tbh :D
19:59 karolherbst: but that only seems to happen when enabling lto?
19:59 karolherbst: it's kinda weird
19:59 karolherbst: or rather.. I think lto kinda causes a lot of random symbols to be linked in and the linker just doesn't nuke unused things early enough
20:11 dcbaker: I actually expect GCC + rustc with lto enabled to not work, since GCC and LLVM have different formats for annotating LTO objects
20:11 dcbaker: One of these days I'll finish wiring up gcc-rs to Meson and I would expect that to work
20:17 karolherbst: mhhh
20:17 karolherbst: I'm still reluctant to guarantee any sort of support for gcc-rs
20:18 karolherbst: maybe in 5 years, once we don't constantly bump up the req rust version or so
20:25 dcbaker: Yeah, I just meant if someone is dead set on LTO for gcc, we can say "well, there's gcc-rs, otherwise you can use clang"
20:25 dcbaker: please keep the pieces
20:26 karolherbst: yeah, fair
20:27 karolherbst: I wonder if the libasan problem is fundamentally the same, or are gcc and clang handling it similarly enough that it should still all link fine?
21:19 alyssa: other than broadcom, is anyone CI'ing vulkan drivers with WSI?
21:19 alyssa: dEQP-VK.wsi.* seems.... very unstable. v3dv flakes file is large, no other driver marks those tests but i'm not convinced they run either
21:21 dj-death: I think we run them internally
21:33 alyssa: hmm, ok
21:33 alyssa: no flakes?
21:35 alyssa: oh, exciting
21:35 alyssa: dEQP-VK.wsi.xlib.maintenance1.present_modes.heterogenous.fifo_relaxed_fifo_relaxed_immediate
21:35 alyssa: fails on sway and passes on gnome
21:35 alyssa: (-:
21:36 robclark: I think we test on weston in CI, fwiw
22:02 Company: (Vulkan in general feels more unstable than GL. I'd describe it as solid code but not enough serious usage to weed out all the weird stuff.)
22:05 alyssa: robclark: for turnip?
22:05 alyssa: Company: i'm not sure "stable" applies to GL (-:
22:15 Company: GL is surprisingly solid for the size of the API
22:16 Company: but I was more talking about the kinds of issues that GTK got since we switched to Vulkan by default
22:17 Company: the nvidia driver spins up all its suspended gpus on dlopen() of the driver for example, so vkCreateInstance() blocks for 5 seconds
22:17 Company: or Intel advertises YcbcrConversion but doesn't implement it
22:18 Company: stuff like that
23:10 sghuge: karolherbst: Okay, I think the same issue exists in llvm-18 as well. I can still repro the failure with llvm-18 :( Will try and see if I can get llvm-19
23:18 karolherbst: Company: vulkan not validating by default might also cause random issues I bet
23:18 karolherbst: but yeah.. that GPU resume issue is probably a huge blocker, because I doubt users want this lag when launching any app
23:21 Company: I don't think any of those issues are bad, they're just indications of developer focus
23:21 Company: someone just has to sit down and make the driver do the same thing that the EGL driver does
23:22 karolherbst: I'd consider the app only starting in 5 seconds to be a major blocker tbh
23:22 Company: or implement the YcbcrConversion/not advertise it
23:22 Company: yeah, it'd certainly be more critical if it was October already
23:22 karolherbst: yeah, fair
23:24 Company: but it's also only dual-GPU setups that have gpu suspend enabled and use the nvidia driver
23:24 karolherbst: every driver has it enabled afaik
23:25 karolherbst: though
23:25 Company: I didn't follow the whole discussion, I think it had something to do with what kernel module was used etc
23:25 karolherbst: maybe mesa is better at not accessing the suspended device in that case
23:25 karolherbst: normally the GPU gets resumed once you start accessing it via ioctls or other things
23:26 Company: I haven't heard complaints from AMD users
23:26 Company: but I have heard that nvidia's EGL-wayland had the same issue and they fixed it
23:26 karolherbst: ahh
23:26 karolherbst: probably just the way they init their drivers then
23:26 Company: yeah
23:27 Company: Vulkan use is heavily biased towards DXVK atm
23:27 Company: nobody runs a tiny GTK app with Vulkan, so all those features haven't seen much use
23:28 Company: well, DXVK and other Vulkan-targeting things, which are generally fullscreen apps that want the dgpu if there is one
23:28 karolherbst: that reminds me that running 32 bit games on my quadro with the nvidia driver at some point caused OOM situations, because the driver thought it was funny to preallocate ~3GB of VM space
23:28 karolherbst: (quadro as in 48GB GPU)
23:30 Sachiel: Company: what's this about intel advertising ycbcr conversion but not implementing it? At least the cts tests for it are working fine, which I grant doesn't say much, but I'd expect "advertise but not implement" to fail them utterly
23:30 Company: the worst thing about drivers doing that is that people run their favorite top clone and then complain that every GTK4 app uses gigabytes of memory while GTK3 didn't
23:31 Company: Sachiel: no idea about the cts tests - it might require yuv dmabuf imports
23:55 Company: Sachiel: turns out I hadn't filed it. It's https://gitlab.freedesktop.org/mesa/mesa/-/issues/11125 now
23:56 Sachiel: oh, modifiers, fun