02:43 DavidHeidelberg: mareko: what about requiring llvm18 only when new HW support is enabled? For CI it would be most reasonable to wait until Debian reaches soft freeze and then bump it
03:02 airlied: Ermine: TTM has no core drm impact
03:02 airlied: so drivers can just use it
04:51 a-865: Is this likely describing a bug?:
04:51 a-865: Sep 16 23:30:08 kernel: [drm] Initialized nouveau 1.4.0 20120801 for 0000:01:00.0 on minor 0
04:51 a-865: Sep 16 23:30:08 kernel: nouveau 0000:01:00.0: DRM: DDC responded, but no EDID for DVI-D-1
04:51 a-865: Sep 16 23:30:08 kernel: nouveau 0000:01:00.0: [drm] Cannot find any crtc or sizes
04:51 a-865: started post-6.6.x kernel with GK107 10de:0fc1 in Feb. or Mar.
04:55 a-865: symptom?
04:57 airlied: it means we tried to probe the monitor: it's there, but it didn't respond with an EDID
05:13 a-865: airlied: can you postulate why?
05:14 a-865: local I/O totally absent with kernels doing this
05:14 K900: Bad cable?
05:14 K900: Could just be a bad cable
05:15 airlied: sounds like something in the ddc/i2c broke or bad cable
05:15 K900: Though it's pretty hard to fuck up a cable so that I2C only works _sometimes_
05:16 a-865: nothing wrong with cables. Occurs only with kernels > 6.6.50, on every installed distro, regardless whether both displays are connected to DVI ports, or one DVI and one HDMI
05:17 airlied: a-865: 6.6.49 works?
05:17 airlied: or not narrowed down that far?
05:20 a-865: It's between late Feb and mid March in Leap's 6.4 kernel. 6.6.50 works. journalctl -b --dmesg: https://paste.opensuse.org/b833c8a4376d
05:22 airlied: ah okay so anything 6.7+ doesn't
05:22 a-865: after 22 Feb., by 18 Mar.
05:23 a-865: everything 6.7+ locks up local I/O: black screen, no keyboard
05:45 a-865: journalctl -b --dmesg with Tumbleweed 6.9.9: https://paste.opensuse.org/14fabf9513df
06:44 airlied: a-865: not sure what debug option might be useful
06:44 airlied: Lyude: ^ you might know nouveau.debug=debug or drm.debug=255
06:46 emersion: Lynne: sync_files are binary
06:47 emersion: syncobj might be binary or timeline
06:47 emersion: wlr has code for these things fwiw
07:20 a-865: I used drm.debug=255 for https://bugzilla.opensuse.org/show_bug.cgi?id=1225296#c35
08:25 MrCooper: Ermine: FYI, the brand-new xe driver uses TTM
08:50 Lynne: emersion: syncobj?
09:04 emersion: Lynne: https://dri.freedesktop.org/docs/drm/gpu/drm-mm.html#drm-sync-objects
09:05 emersion: a sync_file may only exist once GPU work has been submitted
09:05 emersion: a drm_syncobj is a container which can hold zero, one or more sync_files
09:41 Lynne: emersion: ah, I see, so I can get an implicit sync_file out of a dmabuf, or, with explicit sync, I could get a syncobj
09:44 emersion: Lynne: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/dmabuf_linux.c?ref_type=heads#L89
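For reference, a minimal sketch of the dance discussed above, assuming a recent kernel with DMA_BUF_IOCTL_EXPORT_SYNC_FILE and libdrm's syncobj helpers; the function name is hypothetical and error handling is mostly elided:

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/dma-buf.h>
    #include <xf86drm.h>

    /* Extract the dmabuf's current implicit-sync fences as a sync_file,
     * then park that fence in a drm_syncobj container. */
    int dmabuf_fence_to_syncobj(int drm_fd, int dmabuf_fd, uint32_t *syncobj)
    {
        struct dma_buf_export_sync_file export_args = {
            .flags = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE,
        };
        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &export_args) < 0)
            return -1;

        /* The syncobj starts out empty; importing the sync_file sets its
         * fence to whatever the dmabuf was carrying. */
        if (drmSyncobjCreate(drm_fd, 0, syncobj) == 0)
            drmSyncobjImportSyncFile(drm_fd, *syncobj, export_args.fd);
        close(export_args.fd);
        return 0;
    }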
10:26 dj-death: karolherbst: maybe you have some interest in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30711 ?
10:27 dj-death: karolherbst: not sure what changed in ubuntu 24.10, but without that change it always fails to build anything using the llvm headers
10:28 dj-death: since it's just adding another directory I don't think it'll break other distros
10:44 daniels: mareko: hmm, I thought AMD drivers were just supposed to be dropped into any distro without any system changes
11:11 mareko: daniels: with ACO that's true
11:12 mareko: DavidHeidelberg: let's wait until Debian upgrades to llvm18 and then bump it
11:14 mupuf: in debian stable or sid?
11:14 mupuf: or testing?
11:15 mupuf: FYI, it is already in testing and sid
11:16 mupuf: if stable, then you'll have to wait another year
11:23 tjaalton: sid is already on 18, ubuntu 24.10 is on 19
11:24 tjaalton: the failure to find opencl-c-base.h is probably due to some change in the llvm-toolchain-19 packaging
11:24 tjaalton: happens on debian too now that I tried
11:25 dj-death: I think it's 18+
11:26 tjaalton: but the mesa packaging is on 18 already, and didn't fail before
11:26 dj-death: hmm then I don't know
11:26 dj-death: something else :)
11:30 DavidHeidelberg: mupuf: not a whole year; I think we can switch images during the soft freeze, when we don't have to be afraid of breaking changes
11:30 mupuf: DavidHeidelberg: ack, well, you know better than me Debian's practices
11:31 mupuf: I mean, my understanding is ~0, I just usually get annoyed and walk away grumbling :D
11:31 DavidHeidelberg: mupuf: I was thinking.. maybe.. the CI could be switched to Ubuntu (6-month release cycle), but I'm not sure anyone would spend the time bumping it
11:32 mupuf: realistically, what are the odds of breaking changes in testing though?
11:32 DavidHeidelberg: mupuf: I recently became a Debian Maintainer, so I got used to it :D Modern packages are fine; the older and special ones are pain and suffering
11:32 tjaalton: dj-death: well, the patch did help with llvm-19 anyway ;)
11:33 dj-death: tjaalton: good
11:33 DavidHeidelberg: mupuf: well, I remember bumping the CI to bookworm and... it was a lot of work, and a lot of expectations unexpectedly passed/failed..
11:33 mupuf: oh, wow, ok
11:33 mupuf: bbl!
11:53 mareko: DavidHeidelberg: I'd like to bump the LLVM version requirement to 18, but I can do it whenever it's most convenient for the CI
12:09 karolherbst: mareko: as I'm looking more into SVM these days, is there a way with amdgpu to guarantee that allocations across GPUs have the same virtual address?
12:09 pepp: mareko: couldn't the requirement be bumped per gfx version? I don't think LLVM18 makes a major difference vs LLVM15 for gfx8 for instance
12:11 mareko: pepp: it's a build-time requirement
12:12 mareko: karolherbst: for ROCm or Mesa?
12:15 karolherbst: mesa
12:16 karolherbst: I already have a prototype which works across drivers/devices for anything doing userspace vm management, but I don't think radeonsi supports that, nor do I know if it's in theory supported by amdgpu
12:17 pepp: mareko: hmm true. I guess LLVM is not meant to be used with a runtime version different from the build-time version
12:17 mareko: yes, each llvm version is a separate lib, libllvm18 etc.
12:19 mareko: karolherbst: the addresses are assigned by userspace, it's solvable with mesa changes only I think
12:19 karolherbst: ohh, they are?
12:20 mareko: the kernel only exposes an empty virtual address space with a small area reserved for the kernel, the rest is up to us to assign
12:21 mareko: libdrm provides a virtual address allocator as a utility
12:21 karolherbst: ohh, libdrm is doing it, figures
12:21 mareko: but we can use any address range that's free
12:21 karolherbst: mhhh
12:21 pepp: karolherbst: see https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/345/commits
12:21 karolherbst: so the way I considered doing it, is that rusticl reserves an address range and manages it itself
12:22 karolherbst: and drivers tell in which range an area can be reserved
12:22 mareko: yes that's doable
12:22 karolherbst: okay, cool
12:23 mareko: the only limitation is that address bits 47-63 must be 1
12:24 karolherbst: right.. iris has the same limitation
12:24 mareko: that naturally doesn't intersect with any CPU addresses
12:25 karolherbst: so for iris I just excluded that range rusticl could reserve, because it's just messy to use those ranges for SVM
12:25 mareko: it also doesn't conflict with ROCm sharing the same address space and mirroring the whole CPU address space in there
12:26 karolherbst: (like in theory the shader could sign-extend the address, but halving the available VM range isn't an issue at all)
12:26 karolherbst: mhh, maybe I can get SVM working with iris + radeonsi by XDC then... would be kinda cool
12:28 mareko: the only short-term obstacle is that radeonsi also uses the same range for command buffers, shaders, and the scratch buffer for spills and arrays
12:28 mareko: and ring buffers for the gfx pipeline
12:29 karolherbst: mhh, why would that be an issue?
12:29 mareko: you can't use those addresses
12:30 mareko: it will only work if we reserve a piece of the address range just for rusticl
12:30 karolherbst: yeah, that's the idea
12:30 karolherbst: it's already implemented for e.g. iris
12:30 karolherbst: so rusticl just allocates some VM space from the driver and assigns addresses out of that range to resources allocated with a special flag
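A sketch of what that reservation could look like with the libdrm amdgpu VA utility from the MR pepp linked; the fixed base address and the helper name are illustrative, not an existing Mesa API:

    #include <stdint.h>
    #include <amdgpu.h>

    /* Reserve a fixed chunk of GPU virtual address space up front so a
     * userspace allocator (e.g. rusticl) can hand out sub-ranges itself. */
    static amdgpu_va_handle reserve_svm_range(amdgpu_device_handle dev,
                                              uint64_t base, uint64_t size)
    {
        uint64_t va = 0;
        amdgpu_va_handle handle = NULL;

        /* A non-zero va_base_required asks for that exact address and
         * fails if anything else (command buffers, shaders, ...) already
         * lives there. */
        if (amdgpu_va_range_alloc(dev, amdgpu_gpu_va_range_general, size,
                                  0 /* alignment */, base, &va, &handle, 0))
            return NULL;
        return handle;
    }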
12:31 mareko: karolherbst: on a different note, it's non-conformant to evaluate constant expressions at compile time because the rounding mode can be unknown if it's changed at runtime
12:32 karolherbst: there is no way to change it at runtime tho
12:32 karolherbst: maybe there is an extension and maybe it was planned to do it, but core CL doesn't allow it
12:32 mareko: so ROCm doesn't evaluate constant expressions at compile time because of that
12:33 karolherbst: heh
12:33 mareko: e.g. sqrt(3)
12:33 karolherbst: interesting
12:34 karolherbst: maybe ROCm implements an extension so it matters, but it's still a bit strange
12:35 karolherbst: there was e.g. cl_khr_select_fprounding_mode to change the rounding mode at compile time
12:35 karolherbst: maybe it has CUDA/HIP reasons for why they aren't doing it
12:35 mareko: arguably even the current way is non-conformant because sqrt evaluated on the CPU and GPU can produce slightly different results
12:35 karolherbst: yeah..
12:35 karolherbst: I don't think the CL spec says anything at all here
12:36 karolherbst: so I think it matters what the C99 spec says
12:40 mareko: it seems unlikely that sqrt is allowed to generate 2 different results for the same value (one is a constant and the other isn't)
12:41 mareko: a compiler could run a shader to evaluate constant expressions on the target device instead of the CPU
12:42 karolherbst: mhhhh yeah...
12:42 karolherbst: I think the conclusion from the last time I had this discussion was that constant folding is allowed to produce different results
12:43 karolherbst: I even landed a CTS fix recently which moved a calculation from compile time to runtime
12:44 karolherbst: but maybe I need to dig deeper into the C spec and figure out what's the situation there
12:45 karolherbst: mareko: there is e.g. the gcc -frounding-math compiler flag, which disables constant folding that assumes some default rounding mode
12:46 karolherbst: which is disabled by default, which means that the compiler will be happy to constant fold
12:48 karolherbst: though floating point constants in C code are also usually promoted to double, so not sure it even matters much
12:48 karolherbst: (and maybe they get promoted due to this reason)
12:50 karolherbst: C99: "If a floating expression is evaluated in the translation environment, the arithmetic precision and range shall be at least as great as if the expression were being evaluated in the execution environment."
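A toy C illustration of the hazard being discussed, assuming a libc with fesetround(): the folded constant may be computed in the translation environment with round-to-nearest, while the volatile division happens in the execution environment under whatever mode is current.

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON

    int main(void)
    {
        volatile double x = 1.0, y = 3.0;  /* volatile blocks folding */

        fesetround(FE_UPWARD);
        double runtime = x / y;      /* evaluated at runtime, FE_UPWARD */
        double folded  = 1.0 / 3.0;  /* may be folded at compile time   */

        /* Without -frounding-math the two can differ in the last bit. */
        printf("runtime %.17g\nfolded  %.17g\n", runtime, folded);
        return 0;
    }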
13:48 a-865: airlied: https://gitlab.freedesktop.org/drm/nouveau/-/issues/385 is my report about this Kepler no-EDID issue
14:44 cambrian_invader: if I have a gpgpu+display driver, and I want to set up a framebuffer, who is responsible for ensuring that both the gpgpu and the display driver can DMA the framebuffer?
14:46 cambrian_invader: e.g. if the display driver supports a larger address space than the gpu
14:48 cambrian_invader: then the sequence drm_create_dumb(display); drm_handle_to_fd(display); drm_fd_to_handle(gpu) will fail since the GPU won't be able to address the buffer
14:48 cambrian_invader: (or might not, but it's likely)
15:43 cambrian_invader: huh "Note that dumb objects may not be used for gpu acceleration"
15:43 cambrian_invader: so maybe this is a mesa bug
15:44 emersion: cambrian_invader: dumb buffers are specifically for display
15:45 emersion: and for CPU rendering
15:45 cambrian_invader: yeah, but as noted above the problem occurs when sharing it with the gpu
15:45 emersion: if you want a buffer which can do rendering + display, you'd need to use GBM
15:46 cambrian_invader: I don't know what I want :P
15:49 cambrian_invader: here's a dump of the relevant calls https://paste.debian.net/1329632/
15:50 cambrian_invader: and this is all downstream of gbm_dri_bo_create so I guess I am using GBM
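For context, the GBM path emersion means looks roughly like this; the implementation is responsible for picking a placement that satisfies both the render and the scanout device (sketch only, helper name hypothetical):

    #include <gbm.h>

    /* Ask GBM for a buffer valid both as a render target and for
     * scanout; the driver resolves the placement constraints. */
    struct gbm_bo *create_shared_bo(struct gbm_device *gbm,
                                    uint32_t width, uint32_t height)
    {
        return gbm_bo_create(gbm, width, height, GBM_FORMAT_XRGB8888,
                             GBM_BO_USE_SCANOUT | GBM_BO_USE_RENDERING);
    }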
16:07 cambrian_invader: emersion: should mesa be using something other than renderonly_create_kms_dumb_buffer_for_resource? like drm_mode_addfb?
16:11 cambrian_invader: that one doesn't seem to have any handles for controlling the buffer location
16:12 MrCooper: addfb just allocates a KMS framebuffer handle for an existing BO, it doesn't create a new BO
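In other words, something like this, which only wraps an already-allocated BO handle in a KMS framebuffer object (sketch, single-plane linear XRGB8888):

    #include <stdint.h>
    #include <xf86drmMode.h>
    #include <drm_fourcc.h>

    int wrap_bo_as_fb(int kms_fd, uint32_t bo_handle, uint32_t width,
                      uint32_t height, uint32_t stride, uint32_t *fb_id)
    {
        uint32_t handles[4] = { bo_handle };
        uint32_t pitches[4] = { stride };
        uint32_t offsets[4] = { 0 };

        /* No allocation happens here; the BO must already exist. */
        return drmModeAddFB2(kms_fd, width, height, DRM_FORMAT_XRGB8888,
                             handles, pitches, offsets, fb_id, 0);
    }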
16:13 cambrian_invader: hm
16:13 cambrian_invader: do you know of any examples of "hardware-specific ioctls"? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_dumb_buffers.c#n55
16:17 cambrian_invader: https://lore.kernel.org/dri-devel/20230803100041.387404-1-contact@emersion.fr/ makes the same point, which seems to indicate that the strategy employed by mesa here is incorrect
16:38 cambrian_invader: ok, so I guess mesa should use lima_bo_create somehow?
16:39 cambrian_invader: "In practice, it would mean that you'd stop using renderonly_create_kms_dumb_buffer_for_resource, and replace it with your own driver-specific allocation ioctl on the render node." https://gitlab.freedesktop.org/mesa/mesa/-/issues/5510#note_1248284
18:33 Lyude: a-865: hm, looking at https://gitlab.freedesktop.org/drm/nouveau/-/issues/385 are you saying this would happen between upstream kernels 6.6.50 and 6.7.9?
18:50 a-865: Lyude: 6.6.11 (suse), 6.6.18 (suse), 6.6.24 (suse), 6.6.28 (mageia), 6.6.32 (suse), 6.6.50 (mageia & suse) all work as expected
18:50 Ermine: airlied, MrCooper thank you for your answers
19:06 uriah: pepp: hi, sorry to bother you about this, but I'm very curious and willing to experiment with updating your qemu amdgpu native context commits for newer qemu. Is this work by chance already elsewhere, or being collaborated on between various driver devs for compatibility with all the various efforts?
19:07 pepp: uriah: hi. digetx might have a more recent branch
19:07 uriah: Ok
19:07 uriah: I saw that and just wanted to be sure
19:07 uriah: There are two: one for Intel and one possibly more general?
19:09 uriah: Also, I saw v12 qemu venus patch series; is that meant to be separate, or behind the native context commits?
19:09 pepp: I think so. The native-context one seems to have everything required for amdgpu nctx
19:09 uriah: Ok
19:09 uriah: Sweet thanks
19:09 pepp: venus / nctx are separate, but share some infrastructure work in qemu
19:09 uriah: Ok
19:10 uriah: So I should try both at once and see where I get?
19:10 uriah: As in, both patch series, and test whether either works individually
19:12 uriah: Like, in the future... would people adopt a choice between both?
19:12 uriah: Especially considering wanting venus for legacy hardware that won't be supported by native context?
19:13 uriah: Or am I misunderstanding the intel side, which is currently iris and up?
19:13 uriah: (I would also be interested in trying to port digetx's work to i915/i965)
19:14 uriah: I have an old retina haswell laptop xD
19:21 uriah: Anyhow thank you so much for answering
19:45 DemiMarie: budgetazoo: support for Intel native context on older hardware would be great, but IOCTLs should be gated based on the hardware version to avoid exposing unnecessary attack surface.
19:48 uriah: Demi: noted.
19:49 cambrian_invader: ok, so I wrote a patch for mesa to create the framebuffer with the gpu driver and then transfer it to the display driver
19:49 DemiMarie: budgetazoo: thanks! At some point it might make more sense to have a separate renderer but I’m not sure.
19:49 cambrian_invader: but it doesn't always work because drm_gem_dma_prime_import_sg_table wants a contiguous buffer
19:50 cambrian_invader: the DMA for this device does support scatter-gather; do I need to convert to shmem for that to work?
19:54 cambrian_invader: and where do I hook into the dma stuff?
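If the plan is to move the display driver off the contiguous-only DMA helpers, the shmem GEM helpers already import non-contiguous sg tables; a sketch of the kernel-side hookup, with "foo" standing in for the real driver:

    #include <drm/drm_drv.h>
    #include <drm/drm_gem_shmem_helper.h>

    static const struct drm_driver foo_driver = {
        .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
        /* Wires up shmem-backed dumb_create and, importantly,
         * gem_prime_import_sg_table, which accepts scatter-gather
         * dma-bufs instead of requiring contiguous memory. */
        DRM_GEM_SHMEM_DRIVER_OPS,
        .name = "foo",
        .desc = "example shmem-backed KMS driver",
    };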
22:20 alyssa: robclark: for honeykrisp+virtgpu, I'm getting a lot of log spam:
22:20 alyssa: TU: error: ../src/freedreno/vulkan/tu_knl_drm_virtio.cc:1299: could not get connect vdrm: No such file or directory (VK_ERROR_INCOMPATIBLE_DRIVER)
22:21 alyssa: wonder if we can make turnip less noisy...? "have virtgpu but not adreno virtgpu" shouldn't really be different from "don't have msm or kgsl"
22:24 Ermine: I guess drm_gem_cma_mmap() is not a thing nowadays?
22:30 cambrian_invader: Ermine: it's drm_gem_dma_* now
22:37 uriah: pepp: Demi: digetx's qemu patches did the trick for amdgpu native context to work here, it seems! Thanks so much!
22:37 Ermine: cambrian_invader: oh, thank you
22:37 DemiMarie: budgetazoo: which qemu patches?
22:37 uriah: alyssa: and thanks to your discussion irl too :>
22:37 uriah: Uh oh I matrixed apologies
22:39 uriah: Demi: the 4 last commits applied on top of v12 venus patch series https://gitlab.freedesktop.org/digetx/qemu/-/commits/native-context
22:39 uriah: Using qemu-9.1 and linux 6.10.10 rn
22:39 uriah: I will push a repo for how to get it working on gentoo asap
22:42 uriah: Demi: v17 actually, sorry. From August 22
22:42 uriah: v12 is what I used for kernel venus under pepp's linux commits
22:43 uriah: (Which I edited a bit to get up to 6.10.10)
22:51 Ermine: Also, in Documentation/gpu/drm-mm.rst: "To use drm_gem_mmap(), drivers must fill the struct struct drm_driver gem_vm_ops field with a pointer to VM operations" --- seems like outdated statement
23:27 robclark: alyssa: yeah, I guess that could probably be quieted down.. but probably not something I can look at this week