03:24karenthedorf: So I tried out AMDGPU on my 8610G/M laptop. Ended up with the most cursed setup ever. Vulkan support was reported, but it wasn't possible to make a swapchain. But what was really cursed was that OpenGL worked *using Zink*. So Vulkan apps didn't work, but OpenGL apps worked, thanks to using Vulkan instead. Is that kind of complete chaos to be expected with AMDGPU + SI cards? Any way to get Vulkan swapchains working?
06:21Ristovski: karenthedorf: Did you load amdgpu with `si_support=1`?
06:21Ristovski: (I assume you did, else it wouldn't work at all, but still..)
06:25karenthedorf: Yes. Had to blacklist radeon completely, with radeon.si_support=0 it still got loaded (and couldn't be unloaded) but nothing graphical would start.
06:27Ristovski: karenthedorf: If you still have access to the setup, can you do `MESA_LOADER_DRIVER_OVERRIDE=zink glxinfo -B`? If it's actually using Zink over amdgpu/radv, it should report something like `Device: zink Vulkan 1.3(AMD Radeon Graphics (RADV RENOIR) (MESA_RADV)) (0x1638)`
06:28karenthedorf: I'm 95% certain it was using Zink from the messages in syslog and the fact that glxgears was spitting out vulkan validation messages. I'll note that down and try later.
06:45karenthedorf: Okay, I was wrong. I don't know how OpenGL is working and why anything using OGL is spitting out vulkan validation messages and messages about Zink. https://pastebin.com/BVHqQee7
06:46karenthedorf: So if I understand this correctly, *everything* is going through llvmpipe and not hardware? O.o
06:48Ristovski: Looks like it
06:48Ristovski: Check `dmesg` for any amdgpu messages
06:49karenthedorf: sudo dmesg | grep amdgpu ?
06:49karenthedorf: https://pastebin.com/tkRGnLCX
06:51karenthedorf: There's similar UBSAN messages in radeon (more, in fact)
06:52Ristovski: odd, seems like it's loading fine
06:56karenthedorf: So in lspci -k. There's a [8670A/8670M/8690M/R5 M330/M340/Radeon 520 Mobile] Display Controller that says "Kernel driver in use: amdgpu, Kernel Modules: amdgpu, radeon". There's also an 8610G VGA compatible controller that just says "Kernel Modules: amdgpu, radeon" No module in use listed
06:57karenthedorf: So my (naive) reading of that is this has dual graphics and amdgpu is being loaded for the 'good' gpu but not the low-power -G one?
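The per-device check described above (which kernel driver, if any, is bound to each GPU) can be scripted. This is a minimal sketch, assuming the usual `lspci -k` layout of an unindented device line followed by indented `Kernel driver in use:` and `Kernel modules:` lines. The PCI addresses and device strings in SAMPLE are invented for illustration; only the driver and module lines mirror what was reported in the channel.

```python
import re

# Illustrative sample only: addresses and device strings are invented;
# the driver/module lines mirror what karenthedorf described.
SAMPLE = """\
00:01.0 VGA compatible controller: 8610G (hypothetical device string)
\tKernel modules: amdgpu, radeon
01:00.0 Display controller: 8670M (hypothetical device string)
\tKernel driver in use: amdgpu
\tKernel modules: amdgpu, radeon
"""

def drivers_in_use(lspci_k_output):
    """Map each device line to its bound kernel driver, or None if unbound."""
    result = {}
    current = None
    for line in lspci_k_output.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new device entry.
            current = line
            result[current] = None
        elif current is not None:
            m = re.match(r"\s+Kernel driver in use:\s*(\S+)", line)
            if m:
                result[current] = m.group(1)
    return result

for device, driver in drivers_in_use(SAMPLE).items():
    print(f"{device.split()[0]}: {driver or 'no driver bound'}")
```

On a real system the function would be fed the actual output of `lspci -k` instead of SAMPLE. A device that lists kernel modules but no driver in use is the situation described for the 8610G.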
07:39MrCooper: karenthedorf: plausible, there are no SI APUs so it's more likely CIK; try adding "radeon.cik_support=0 amdgpu.cik_support=1" instead of blacklisting radeon
07:40karenthedorf: Yeah I've tried that before (radeon 0 for si and cik, amdgpu 1 for both). Radeon still gets loaded and everything works like I'd not changed anything.
07:41MrCooper: right, actually the APU is pre-SI, so only supported by radeon
07:43karenthedorf: Right, well, that's solved it I guess. -G chip can only run on Radeon, I am not going to amdgpu today
07:49MrCooper: radeon for the APU & amdgpu for the dGPU should work in principle though, sounds like there's a bug somewhere
07:50karenthedorf: I don't see a way to have that happen
07:51karenthedorf: So if I have amdgpu si_support=1 cik_support=1 and radeon si_support=0 cik_support=0, you think I should see the 8610G use radeon and the 8670M use amdgpu?
07:56karenthedorf: Rebooting to test, away I go
08:09karenthedorf: Thank you so much! I have it working in the described hybrid configuration. The bug appears to be with sddm. When first setting up this hybrid configuration sddm failed to start and I had to drop to a text console. But after installing gdm3 I can load into Plasma(Wayland) and run glxgears/vkcube/vkcube-wayland all fine.
08:28MrCooper: karenthedorf: cool, FWIW the cik_support parameters have no effect on your system, you can drop those again
08:30karenthedorf: Seems ridiculous I have to use the GNOME display manager to launch a KDE Plasma session, but there you go.
08:42DavidHeidelberg: karolherbst: btw. found another regression "to be bisected". tinygrad now runs 950ms per token, while before it was around 400ms and CPU was 1.2s
08:42DavidHeidelberg: (sadly talking about not-yet-merged freedreno code)
08:48DavidHeidelberg: my clear_buffer patches got it to 550ms.... but that means, it should now run probably around 200ms per token (before the perf regression)
09:32karolherbst: DavidHeidelberg: maybe check out RUSTICL_DEBUG=perf https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30324
09:32karolherbst: I've added some performance warnings if the code stalls too often
09:33karolherbst: I want to get down GPU stalls a lot, but that will mean doing some tricks here and there
09:34karolherbst: DavidHeidelberg: but anyway, would be interesting to know why it got slower, but yeah, might just be stalling and https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30082 which might cause some perf regressions maybe
09:39DavidHeidelberg: karolherbst: perf doesn't print anything (I checked, I have the MR referenced in)
09:39karolherbst: oh interesting...
09:40karolherbst: but those kernels might be small enough that https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30152 matters
09:40karolherbst: worth a try
09:42DavidHeidelberg: karolherbst: can u rebase?
09:44karolherbst: DavidHeidelberg: done
09:46DavidHeidelberg: karolherbst: thanks. so far same result, or the incremental compilation doesn't work :D
09:57karolherbst: DavidHeidelberg: can you get numbers on how busy the GPU is?
09:58DavidHeidelberg: karolherbst: just turned off the phone and going for dinner, later I can
11:46sima: narmstrong, the 3 panel patches you reviewed, do you also plan to apply them?
11:46sima: (reviewed yesterday, I'm a notch behind)
12:02DavidHeidelberg: karolherbst: 35 - 40%
12:02karolherbst: DavidHeidelberg: yeah.. so something stalls a lot and I'm not quite sure whether it's rusticl or tinygrad :D
12:06DavidHeidelberg: hmm, also 40% would explain why it's so much slower than before, if it was running around 90-100% previously... I'm kinda almost done today, so I can try to do some testing of older tinygrad and see where the problem is
12:11karolherbst: DavidHeidelberg: might also be a regression in rusticl
12:11karolherbst: I'm still stalling way too often
12:11karolherbst: but anyway, would be interesting to figure out where exactly that happens
12:12DavidHeidelberg: I'll check :)
12:34DavidHeidelberg: karolherbst: the original git revision which I ran is giving me "terminate called after throwing an instance of 'std::bad_alloc'" and everything after is not getting over 50% CPU usage. This peaks the GPU usage at 100% but then immediately crashes
12:36DavidHeidelberg: though the tinygrad codebase isn't famous for its correctness :D
12:41DavidHeidelberg: karolherbst: cannot cleanly revert 30082, I'll look into it tomorrow
13:15narmstrong: sima: you mean https://lore.kernel.org/all/20240710084715.1119935-2-yangcong5@huaqin.corp-partner.google.com/ ? yes I'm applying #1-#3 right now
13:18sima: narmstrong, https://lore.kernel.org/all/20240706102338.99231-1-kikuchan98@gmail.com/ this is the series I meant
13:19narmstrong: sima: conor requested a commit message rewrite of patch #3
13:20sima: ah missed that
13:20narmstrong: but I can apply #1 #2 #4
13:25DemiMarie: How difficult would it be to use all of the Mesa UMDs on Windows with a virtio KMD? Is this something that would be upstreamable?
13:28karolherbst: DavidHeidelberg: if current mesa gives you high CPU usage, that might be worth looking into and profiling to see where the overhead is coming from
13:30zmike: ishitatsuyuki: another linker q: if I'm linking shared lib A to B, and I have declared a symbol in B as PUBLIC (and readelf shows it GLOBAL), why am I getting undefined symbol errors from linking A to B for that symbol?
13:31ishitatsuyuki: should work, but can you paste the line from readelf of that symbol?
13:31zmike: 68231: 000000000006e01a 30 FUNC GLOBAL DEFAULT 13 driGetExtensions
13:32ishitatsuyuki: so you are linking A.so to B.so
13:32zmike: yep
13:32ishitatsuyuki: or rather, A.so is the link product
13:33ishitatsuyuki: that should resolve, what's your linker error?
13:33zmike: /home/zmike/src/mesa/build/../src/glx/drisw_glx.c:986: undefined reference to `driGetExtensions'
13:33zmike: or mold: error: undefined symbol: src/glx/libglx.a(src/glx/libglx.a.p/drisw_glx.c.o): driGetExtensions
13:34ishitatsuyuki: can you paste the failing compiler / linker invocation command line to a pastebin?
13:34zmike: I could push a branch if that's easier
13:35ishitatsuyuki: that would work
13:36zmike: ishitatsuyuki: zmike/test
13:36zmike: you should see errors from linking glx
13:41ishitatsuyuki: zmike: driGetAPIMask is marked LOCAL, did you forget to export the symbol?
13:42zmike: errr
13:42ishitatsuyuki: sounds like linker script actually
13:42zmike: it's GLOBAL/DEFAULT here?
13:42ishitatsuyuki: all the symbols are LOCAL for me
13:42zmike: weird
13:43zmike: which symbols, in which lib?
13:43ishitatsuyuki: build/src/gallium/targets/dri/libgallium-24.3.0-devel.so
13:43ishitatsuyuki: symbols: driGet*
13:43zmike: okay so same lib
13:43zmike: 68244: 000000000006ee42 32 FUNC GLOBAL DEFAULT 13 driGetAPIMask
13:44zmike: (this is a different branch than my original question)
13:44ishitatsuyuki: I checked out test (7b938237495)
13:44zmike: yeah that's correct
13:44zmike: it has linker issues
13:44zmike: just not with driGetExtensions specifically
13:44zmike: you can see at HEAD~1
13:45zmike: the symbols that should be exported
13:46ishitatsuyuki: there is a --dynamic-list passed for libgallium, everything not listed there will not be exported
13:46ishitatsuyuki: not sure why you are seeing GLOBAL
13:46zmike: aha
13:46zmike: okay, that explains it
13:47ishitatsuyuki: not sure why we pass both dynamic-list and version-script
13:47ishitatsuyuki: that sounds like a recipe for confusion
13:49ishitatsuyuki: should update both dri.sym.in and dri.dyn to be safe, but honestly we should just eliminate the less supported one of them
13:50zmike: dri.dyn seems redundant
13:51zmike: at least for dri target
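The symbol rows pasted above can be checked mechanically when comparing two builds. A minimal sketch, assuming the standard `readelf -s` column order (Num, Value, Size, Type, Bind, Vis, Ndx, Name); the `Sym` type and `parse_symbol_row` helper are hypothetical names, and the two rows are the ones quoted in the channel.

```python
from typing import NamedTuple

class Sym(NamedTuple):
    num: int
    value: int
    size: int
    type: str
    bind: str
    vis: str
    ndx: str
    name: str

def parse_symbol_row(row):
    """Parse one `readelf -s` style row: Num Value Size Type Bind Vis Ndx Name."""
    num, value, size, typ, bind, vis, ndx, name = row.split()
    # Value is printed in hex; size is decimal, or 0x-prefixed for large symbols.
    return Sym(int(num.rstrip(":")), int(value, 16), int(size, 0),
               typ, bind, vis, ndx, name)

rows = [
    "68231: 000000000006e01a 30 FUNC GLOBAL DEFAULT 13 driGetExtensions",
    "68244: 000000000006ee42 32 FUNC GLOBAL DEFAULT 13 driGetAPIMask",
]
for sym in map(parse_symbol_row, rows):
    # A symbol another library can link against must be defined here and not LOCAL.
    linkable = (sym.bind in ("GLOBAL", "WEAK")
                and sym.vis == "DEFAULT" and sym.ndx != "UND")
    print(f"{sym.name}: bind={sym.bind} vis={sym.vis} defined_and_visible={linkable}")
```

`readelf -s` prints both the `.dynsym` and `.symtab` tables, so it may be worth confirming which table a given row came from; `readelf --dyn-syms` limits the output to the dynamic symbol table, which is the one consulted when linking against a shared library.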
14:43zmike: ishitatsuyuki: I hit another weird case
14:44zmike: public symbol exported by static lib -> linked into shared lib -> linked into another lib, and the symbol is gone after step 2
14:45zmike: seems like it's getting pruned because it has no users even though it's declared public
15:00Company: zmike: I think the link_whole solved that one - I used to use whole source files when they weren't used in a static lib and only exported as public symbols
15:01zmike: oh good point
15:01zmike: I didn't check that
15:01zmike: thx for the reminder
15:02zmike: ishitatsuyuki: cancel ping
15:52austriancoder: DavidHeidelberg: Not sure if you still need this information, but piglit/bin/cl-api-enqueue-fill-buffer is a pass for rusticl and etnaviv
15:54jenatali: Demi: Probably difficult
15:55jenatali: zmike: When linking static libs, the linker only pulls in object files that are referenced (and then can prune unused symbols after everything is resolved, as an optimization). If you want to ensure a symbol is present in a linked output, you need to explicitly list the .o, not a .a
15:55jenatali: Which is what link_whole does
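The behaviour jenatali describes can be illustrated with a toy model. This is a simplified sketch, not how any real linker is implemented: object and symbol names are hypothetical, archive ordering rules are ignored, and the `whole_archive` flag stands in for what `link_whole` (or `--whole-archive`) asks for.

```python
def link(objects, archives, whole_archive=False):
    """Toy model of static-archive resolution.

    objects:  list of (name, defined_symbols, undefined_symbols), always linked
    archives: archive members in the same tuple form
    Returns the names of everything that ends up in the output.
    """
    linked = list(objects)
    pending = list(archives)
    if whole_archive:
        # Every archive member is pulled in unconditionally.
        linked += pending
        pending = []
    changed = True
    while changed:
        changed = False
        defined = {s for _, d, _ in linked for s in d}
        undefined = {s for _, _, u in linked for s in u} - defined
        for member in pending:
            # A member is only pulled in if it resolves something still undefined.
            if undefined & set(member[1]):
                linked.append(member)
                pending.remove(member)
                changed = True
                break
    return [name for name, _, _ in linked]

objects = [("user.o", {"main"}, {"helper"})]
archive = [
    ("helper.o", {"helper"}, set()),
    # Declares a public symbol, but nothing references it.
    ("public_api.o", {"public_entry_point"}, set()),
]
print(link(objects, archive))                      # public_api.o is left out
print(link(objects, archive, whole_archive=True))  # public_api.o is kept
```

In the first call the member that only defines an unreferenced public symbol never enters the link, which matches the "symbol is gone after step 2" symptom zmike reported.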
15:56DemiMarie: jenatali: what makes it so hard?
15:57jenatali: 1. Porting to cross-platform code isn't easy
15:58jenatali: 2. Adding a backend that talks Windows kernel stuff instead of drm fds is probably a lot of work for most Mesa drivers
15:58DemiMarie: I see.
16:00DemiMarie: The context is that I want to provide GPU acceleration to Windows guests under Qubes OS. These guests will get a paravirtualized GPU that is basically the Linux kernel driver API.
16:02jenatali: Are you just trying to get VK? Or do you want any other graphics APIs? Because if you do, the UMDs need to talk WDDM
16:13Company: AMD has done it again - libva is producing P010 dmabufs and the radv can only import P016
16:13Company: this driver...
16:16mattst88: which driver do you think is bad? cause one of them is not written or maintained by amd :)
16:16mattst88: (and it's the good one :)
16:17Company: I would just hope the left hand and the right hand talk to each other some more
16:17Company: so that interop is a bit less likely to frustrate me
16:18mattst88: IMO, at least three customers of AMD's vaapi driver are pretty displeased with the status
16:18mattst88: we could really benefit from some CI for vaapi drivers
16:18daniels: mattst88: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26113
16:19mattst88: awesome \o/
16:19daniels: was mostly blocked behind a pretty infuriating inability to update deqp-runner, but that's been resolved as of ~now
16:19mattst88: daniels: is AMD involved in this effort?
16:23Company: so, is this an AMD hardware issue?
16:23Company: or is this just missing P010 support in the radv driver?
16:23jenatali: Huh. Wonder if we should get some of this onboarded for the d3d12 vaapi stuff
16:24DavidHeidelberg: austriancoder: etnaviv doesn't have custom impl, it uses u_default_clear_buffer fallback. But thanks :) freedreno impl. now passes all tests, so I'm OK.
16:36Company: I need to take that back
16:36austriancoder: DavidHeidelberg: 👍🏻
16:36Company: I incorrectly blamed AMD, while it was a mixup of P010 and P210 Vulkan formats in my table
20:39mattst88: any clues what might cause nir_lower_vars_to_ssa to leave a deref_var and deref_struct in place?
20:40mattst88: https://dpaste.com/4RHWWT2J8 is the diff of the nir_print_shader output for the last time nir_lower_vars_to_ssa ran and made progress
20:40mattst88: and you can see that it didn't lower a deref_var and a deref_struct
20:43mattst88: I'm suspecting that `32 %317 = @load_deref (%316) (access=none)` is the problem
20:55mattst88: or is it something about var #1 being a push constant...?
21:45jenatali: mattst88: You need lower_io / lower_explicit_io to fully remove derefs
21:45mattst88: okay, thanks
21:46jenatali: vars_to_ssa just removes store / load pairs that can trivially be turned into direct SSA references or phis
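A toy illustration of the store/load forwarding described above. This is not NIR and not Mesa's API: the instruction tuples, the `forward_stores` helper, and the variable names are all hypothetical, and only straight-line code is handled (no phis). The `push_const` load stands in for a variable the pass does not handle, which is assumed here to be left in place for a later lowering pass.

```python
def forward_stores(instrs, local_vars):
    """Forward stored values to later loads of the same local variable.

    Instruction forms: ("store", var, value), ("load", dest, var),
    ("use", value, None).
    """
    value_of = {}   # local var -> SSA value last stored to it
    alias = {}      # removed load result -> SSA value that replaces it
    out = []
    for op, a, b in instrs:
        if op == "store" and a in local_vars:
            value_of[a] = alias.get(b, b)      # remember the value, drop the store
        elif op == "load" and b in value_of:
            alias[a] = value_of[b]             # drop the load, forward the value
        elif op == "use":
            out.append((op, alias.get(a, a), None))
        else:
            out.append((op, a, b))             # not handled here, left as-is
    return out

prog = [
    ("store", "tmp", "%1"),
    ("load", "%2", "tmp"),
    ("use", "%2", None),
    ("load", "%3", "push_const"),
    ("use", "%3", None),
]
for ins in forward_stores(prog, local_vars={"tmp"}):
    print(ins)
```

The store/load pair on `tmp` disappears and its use refers to `%1` directly, while the load from `push_const` survives untouched.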