IRC Logs of #dri-devel on irc.freenode.net for 2024-10-25

04:15 DavidHeidelberg: Was at a RISC-V conference, tried running eglgears/glxgears on some of the RISC-V laptops on display, got segfaults & other unhealthy output from Mesa (sometimes 22.x :/ so, not even updated)...
04:16 DavidHeidelberg: I'm not crying here, just thinking.. there are like ~6 months until the next Debian starts entering freeze (so we can bump the CI and cover RISC-V).
04:17 DavidHeidelberg: eventually, we could at least drop in an Alpine container with RISC-V, maybe into nightly runs... not sure how well it'll run on x86_64. We don't have any RISC-V machines (yet, but some should be available in the near future)
04:17 DavidHeidelberg: I have 3 of them at home, but they're so slow that x86_64 emulation would be 10x faster.
04:19 DavidHeidelberg: For the HW, what I saw was Imagination (proprietary) or AMD (PCIe, working kinda well on SiFive). Other than that, everything was softpipe or llvmpipe (thanks to ORCJIT :) those were the ones that worked)
04:31 airlied: DavidHeidelberg: with PCIe it's often that the PCIe hw is just broken
04:31 airlied: SoC PCIE hw is rarely tested on the things GPUs want
04:31 airlied: good to know orcjit works :-)
04:32 DavidHeidelberg: surprisingly, the PCIe GPUs seemed to work well (haha, let's be honest, no AAA titles were running there, just OpenArena-level games)
04:34 * DavidHeidelberg regrets he didn't take pictures, they usually had cool PC cases on display
04:38 HdkR: perhaps fortunately, I don't play with RISCV hardware with GPUs. ARM hardware has mostly broken PCIe fabric :P
04:39 Company: how fast is llvmpipe on those things?
04:44 airlied: no idea what sort of vector support they have
04:46 DemiMarie: airlied: Is SoC HW broken, or are GPU drivers the broken ones?
04:47 airlied: DemiMarie: usually the hw
04:47 DemiMarie: airlied: in what ways?
04:47 airlied: they don't implement PCIE conformantly
04:47 DemiMarie: In what way?
04:47 airlied: usually they don't implement snooping support
04:47 DavidHeidelberg: Company: The answer usually falls into two groups: on small boards, no; on boards with PCIe (and better cores), yes, but you use a normal GPU anyway, so it doesn't matter :D
04:48 DemiMarie: airlied: I thought that the DMA API did not guarantee snooping.
04:48 DavidHeidelberg: yup, usually weird workarounds are needed, also I heard that for example recent AMD GPUs have some adjustments to work on these boards
04:49 Company: I'm just curious because people build PoS systems with those low-powered non-gpu systems
04:49 airlied: what has the DMA API got to do with the hw not doing it?
04:49 Company: and I'm waiting for the time when software GL is fast enough on those things
04:49 airlied: DavidHeidelberg: often the AMD adjustments are just hacks that disable a path, but overall the hw is screwed
04:50 DemiMarie: Company: doubt it will happen, once the CPU is fast enough I suspect they will want more things and it's back to a dedicated SW renderer.
04:50 airlied: there are endless threads on dri-devel with various non-x86 cpus trying to disable codepaths
04:50 airlied: the Loongson folks being the most horrible
04:51 airlied: there was one Loongson that I think used an AMD integrated GPU on an x86 southbridge
04:51 airlied: or rather northbridge
04:51 HdkR: Most ARM hardware has heartburn for nGnRnE PCIe mappings
04:52 airlied: always gives me bad AGP flashbacks
04:52 DemiMarie: airlied: My understanding is that the DMA API does not guarantee snooping, so Linux drivers that assume it are buggy from a DMA API perspective (they need explicit cache flushes).
04:53 airlied: DemiMarie: we don't use the DMA API
04:53 HdkR: AGP PTSD, oh no
04:53 airlied: or at least we work around its lack of support for snooping, since the GPU needs it
04:54 DemiMarie: Why do GPUs need snooping and not flushes?
04:54 airlied: hw designers gonna design hw :-)
04:55 DemiMarie: airlied: why can't one add SW flushes?
04:56 airlied: because we have userspace mappings
04:56 airlied: we don't just map stuff in the kernel
04:59 airlied: you also don't want to be throwing away your whole cache all the time
05:02 DemiMarie: I thought one could force writeback without invalidating. Are syscalls for cache maintenance too expensive?
05:03 airlied: why bother adding all that when the hw is meant to support it
05:03 DemiMarie: Obviously any mapping that is written from both sides would be busted, but that's racy anyway.
05:03 airlied: you have just a bunch of code that never gets tested
05:03 DemiMarie: airlied: which HW?
05:03 airlied: PCIE hw
05:03 DemiMarie: Are there any Arm SoCs that get this right?
05:03 DemiMarie: I presume POWER does
05:03 airlied: the new Ampere seems good
05:04 DemiMarie: That's a server board
05:04 airlied: plugging a 16x GPU into an SoC is often hard, but I've no idea, there are a lot of SoCs
05:05 airlied: I think jetson might have been good
05:05 airlied: not sure plugging a GPU into the RPi PCIe 1x ever worked :-P
05:06 airlied: oh maybe the rpi5 works
05:16 DemiMarie: airlied: thanks for the explanation
05:35 HdkR: DemiMarie: NVIDIA ARM boards get PCIe correct, including Xavier and Orin and of course Thor next year. Plus their server Grace offering of course.
16:11 cheako: Hey, I was doing "well" but not great at writing vklayers in rust. ash, where I'm getting the vulkan types from, is good at writing vulkan structs... but is bad at reading them and I've a lot of code for just doing that. If ppl are interested in writing vulkan ICDs in rust, we should share code.
17:06 digetx: could anyone from Intel please ack the last patch of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988 that updates the CI flake expectations for a Zink/ADL test? Marge refuses to apply this mesa-cache MR because Intel CI fails due to the flaky test
17:44 kisak: mattst88: fwiw, media-libs/mesa-24.1.7 USE="opencl" was FTBFS for me. After transitioning back to my irregular mesa ebuild to go back to 24.2.5, it's fine again.
17:46 kisak: I should have grabbed a build log snippet. Alas...
20:19 benjaminl: is there a reliable way to get the nir variable associated with a {load,store}_input intrinsic?
21:20 karolherbst: benjaminl: usually you do that before IO lowering is done on the shader; after lowering, all the relevant information should be part of the load/store instruction
21:35 benjaminl: karolherbst: thanks! figured out that there was already a pass to do exactly this for the property I was interested in :)
21:35 karolherbst: ahh, cool
21:39 alyssa: benjaminl: my usual advice is "don't"
21:39 benjaminl: curious about the design intention behind doing it that way instead of preserving the var association?
21:39 alyssa: use locations on lowered i/o instead if you can
22:26 DemiMarie: airlied: which Ampere? Ampere Altra has an erratum (PCIE_65) that makes it not work, with the workaround being to emulate unaligned accesses in the kernel.
22:27 HdkR: They were referring to "the new Ampere", so that would be the AmpereOne
22:27 HdkR: New is relative considering it's already over a year old :D
22:29 HdkR: Just need System76 to immediately replace their new system with a recent chip instead
22:36 DemiMarie: Also, do the Nvidia chips work with generic PCIe GPUs, or just with their own? The Nvidia driver has a workaround for the bug.
22:36 DemiMarie: HdkR: IMO Linux should just add an emulator for unaligned access faults and enable it by default on all Arm machines.
22:37 HdkR: DemiMarie: I have a Radeon Pro W7500 plugged in to my Jetson Orin
22:37 HdkR: Eh, maybe the unaligned handler for everything makes sense, but I'd prefer if ARM vendors just fix their broken hardware.
22:43 HdkR: Although I definitely don't recommend buying an Orin board. It's old and has bugged atomics
22:44 HdkR: Thor will be a significant upgrade :)
22:45 daniels: trapping and fixing unaligned access is … not great for performance
22:46 HdkR: Also hard to be entirely correct when crossing 16B and cacheline access granularies
22:46 HdkR: granularities*
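HdkR's point about 16B and cacheline granularities can be made concrete: a trap handler that splits an unaligned access into smaller pieces can only preserve atomicity if the access stays inside one granule (some Arm cores, for example, only guarantee single-copy atomicity for unaligned accesses that stay within an aligned 16-byte block). A minimal sketch, assuming power-of-two granule and access sizes; the helper names are hypothetical and not taken from the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a `size`-byte access starting at `addr` spans more than one
 * aligned `gran`-byte granule (e.g. 16 bytes, or a 64-byte cacheline).
 * Assumes `gran` is a power of two. */
static bool crosses_granule(uintptr_t addr, size_t size, size_t gran)
{
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    assert(size != 0);
    size_t offset = addr & (gran - 1); /* offset of first byte in its granule */
    return offset + size > gran;
}

/* True if `addr` is a multiple of `size`; `size` must be a power of two. */
static bool naturally_aligned(uintptr_t addr, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) == 0;
}
```

With `gran = 16`, an 8-byte access at address 12 crosses a granule (12 + 8 > 16), so an emulator would have to split it, while the same access at address 8 ends exactly on the boundary and does not.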
22:53 DemiMarie: daniels: what about recompiling everything with `-fstrict-align`?
22:53 DemiMarie: I’d prefer for hardware to be fixed too, but in the absence of that then an unaligned access emulator is the best option I know of.
22:53 iive: DemiMarie, I think that's a different type of align
22:54 DemiMarie: iive: the idea is to prevent the compiler from generating unaligned accesses so they don't need to be trapped
22:54 iive: why is the compiler generating unaligned accesses at all on an architecture that doesn't support them?
23:01 iive: I understand if there is a bug where pointer arithmetic leads to unaligned access. but stuff that is entirely controlled by the compiler...
23:08 iive: my bad, it's the same align. but isn't it supposed to be set by march or target by default?
23:10 iive: hum... arm arch doesn't even have the options. aarch64 does.
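The idea DemiMarie describes, keeping the compiler from emitting unaligned accesses, pairs with writing source that never dereferences misaligned pointers directly. A common idiom is to copy through `memcpy`, which lets the compiler choose a single load where the target allows it, or byte-wise accesses when alignment is strict (GCC and Clang spell the AArch64 option `-mstrict-align`). A minimal sketch; the function names are hypothetical and not from Mesa or the kernel:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly misaligned address. Casting `p` to
 * `const uint32_t *` and dereferencing it would be undefined behaviour when
 * `p` is misaligned; memcpy is always valid, and the compiler picks an
 * instruction sequence the target permits. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store counterpart of load_u32(). */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Round trip at offset 1 of a byte buffer (most likely a misaligned address). */
static void roundtrip_demo(void)
{
    unsigned char buf[8] = {0};
    store_u32(buf + 1, 0xdeadbeefu);
    assert(load_u32(buf + 1) == 0xdeadbeefu);
}
```

Note that this only covers accesses the compiler generates from C code; library routines such as an optimized `memcpy` may still use unaligned loads internally unless they are built with the same constraint.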
23:27 iive: apparently ampere is aarch64.
23:28 HdkR: Indeed it is
23:28 HdkR: Neoverse-N1 based cores in the Ampere Altra, custom design in the AmpereOne