11:29 emersion: daniels: hm so we're discussing split render/display again and we were wondering which of these two approaches would be better?
11:29 emersion: list of display-only/render-only drivers https://gitlab.freedesktop.org/mesa/libdrm/-/merge_requests/327/diffs
11:29 emersion: platform bus matching https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/5379/diffs#f7533503f405d332eea1b431b039fe2347804715_352_380
11:30 emersion: latter is now not-so-bad since we got the faux bus, but some drivers are still abusing the platform bus (e.g. https://github.com/DisplayLink/evdi/issues/571)
11:32 daniels: emersion: I think the latter is much better tbh, and maybe we can just special-case udl/evdi until they fix things?
11:32 emersion: udl is fine, it uses the USB bus
11:33 emersion: i don't have high hopes wrt evdi :/
11:33 emersion: daniels: so i've also been wondering if we can match the platform bus name
11:34 daniels: you mean like the sysfs path or?
11:34 emersion: on rpi both render and display nodes are under /sys/bus/platform/devices/axi/
11:34 daniels: heh, right
11:34 daniels: I guess usb would be a pretty strong hint? or does it use something else?
11:34 emersion: but that's only on the vendor kernel, not in upstream
11:34 emersion: (https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/5379#note_3492891)
11:35 emersion: yeah, ideally evdi would use usb and then it's all good
11:35 emersion: but they assume platform bus in various userspace :/
11:35 emersion: and also they create a single platform bus for all displaylink devices iirc
11:35 emersion: so it would be quite a large change
11:36 emersion: daniels, i also wondered if there are devices where multiple devicetree devices can't directly access each others' memory
11:36 emersion: s/devices/systems/
11:38 daniels: emersion: not that I know of - generally they’re all pretty interchangeable with UMA. maybe they need CMA but that’s another problem :P I think this is a good start and then we can add exceptions as needed, rather than the other way around which is getting pretty tedious by now
11:38 emersion: well, exceptions is a no-go for me
11:38 emersion: i don't want to add driver-specific checks in wlroots
11:38 emersion: these are bad to duplicate across all compositors
14:49 daniels: emersion: ah sorry, what I meant is that I’d rather have a general rule with very few exceptions, than a much longer list of every applicable driver
14:50 daniels: but yeah, obviously better to fix udl/evdi and have no exceptions at all
14:51 emersion: i see - i think i'd prefer to have a "i'm 100% sure these are compatible" rule rather than "i'm 100% sure these are not compatible"
14:52 emersion: if it's just about evdi, i think i don't care enough :P
14:52 emersion: do you think we'll need to add more exceptions for future hw?
14:52 emersion: specifically hw where the faux bus doesn't make sense?
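A rough sketch of the "platform bus matching" idea discussed above, assuming the bus of a DRM node can be read from the `device/subsystem` symlink in sysfs. This is not the code from the wlroots merge request; the function names and the `sysfs_root` parameter are hypothetical and exist only for illustration. As noted in the discussion, a driver that sits on the platform bus without being a real platform device (evdi) would be a false positive for a check like this.

```python
import os


def drm_bus_name(node, sysfs_root="/sys"):
    """Return the bus name ("platform", "pci", "usb", ...) of a DRM node.

    `node` is a name such as "card0" or "renderD128". Returns None if the
    node has no parent device or the symlink cannot be read.
    """
    link = os.path.join(sysfs_root, "class", "drm", node, "device", "subsystem")
    try:
        target = os.readlink(link)
    except OSError:
        return None
    # The link target ends in ".../bus/<name>"; the last component is the bus.
    return os.path.basename(target)


def both_on_platform_bus(render_node, display_node, sysfs_root="/sys"):
    """Hypothetical "known compatible" rule: both devices are platform devices."""
    return (
        drm_bus_name(render_node, sysfs_root) == "platform"
        and drm_bus_name(display_node, sysfs_root) == "platform"
    )
```

On a real system this would be called as `both_on_platform_bus("renderD128", "card0")`. The rule is deliberately one-sided: it only says "these are assumed compatible" and makes no claim about pairs on other buses.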
18:05 daniels: emersion: I can’t really think of hardware where a platform-bus GPU can share with a non-platform display controller for free (or it’s just required), or something with the kind of performance penalty you get from faux, but on a platform bus
18:07 daniels: maybe for those, the best heuristic is that if users see a KMS device that prefers shadow (on any bus), and a GPU as a different device, to render headless and then explicitly copy? afaik those are mostly devices which take a copy through the kernel anyway, e.g. SPI panels
18:17 HdkR: daniels: DGX Spark where the GPU lives on one side of the NVLink-C2C and the display controller lives on the other side? :P
18:19 daniels: HdkR: a platform where people famously run general-purpose compositors
18:19 HdkR: Famously indeed
18:22 HdkR: I run i3 on mine, so it's absolutely as general purpose as possible :P
18:24 karolherbst: I got pinged on a darktable bug report on DGX Spark, because apparently clEnqueueReadImage is super slow if the destination host memory doesn't have populated pages yet :')
18:24 karolherbst: and madvise with MADV_POPULATE_WRITE does fix it..
18:25 HdkR: Oh hey, Spark is why my micro-bench does MAP_POPULATE on its mappings first :D
18:25 karolherbst: :D
18:25 karolherbst: would be nice if the driver would just do it for us...
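A minimal sketch of the workaround mentioned above: populating the pages of a host buffer before handing it to a read-back call. It assumes Linux and spells out the numeric value of `MADV_POPULATE_WRITE` (23, available since kernel 5.14) because Python's `mmap` module may not export that constant. The helper name `prefaulted_buffer` is hypothetical, and the fallback of touching one byte per page is a substitute technique for kernels or platforms where the advice is rejected.

```python
import mmap
import sys

# Linux value of MADV_POPULATE_WRITE; assumed here because the mmap module
# may not provide the constant.
MADV_POPULATE_WRITE = getattr(mmap, "MADV_POPULATE_WRITE", 23)


def prefaulted_buffer(size):
    """Return an anonymous private mapping of `size` bytes with populated pages."""
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        if not sys.platform.startswith("linux"):
            raise OSError("MADV_POPULATE_WRITE is Linux-specific")
        # Ask the kernel to fault in every page as writable up front.
        buf.madvise(MADV_POPULATE_WRITE)
    except (AttributeError, OSError):
        # Fallback: write one byte per page so each page gets faulted in.
        for offset in range(0, size, mmap.PAGESIZE):
            buf[offset] = 0
    return buf
```

Passing `MAP_POPULATE` at `mmap` time, as in the micro-benchmark mentioned above, is the other way to get a similar effect for a fresh mapping; `madvise` has the advantage of also working on memory that was mapped earlier.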
19:28 karolherbst: Could it be that "nir_intrinsic_load_subgroup_size" has kinda broken semantics atm? Like what is this supposed to return if the subgroup size reported by the hardware/driver is 32, but e.g. the workgroup only contains 16 invocations?
19:29 karolherbst: atm a lot of lowering/drivers seem to implement nir_intrinsic_load_subgroup_size as nir_intrinsic_load_subgroup_max_size actually (which I'll need for non-uniform workgroup support I guess)
19:33 karolherbst: or maybe it's always supposed to be the native subgroup size even with smaller workgroups.. it's kinda hard to tell because docs are also pretty much non existent
19:34 karolherbst: though at least in CL the "normal" one is implementation defined where the "max" one is... the max
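To make the ambiguity concrete, here is a small model of one possible reading of the CL-style distinction: the "max" size is a constant, while the "normal" size is the number of invocations actually present in a given subgroup. This is only an illustration of the question raised above, not a statement of what NIR or any driver does; the function name and the packing of invocations into subgroups in order are assumptions.

```python
def subgroup_sizes(workgroup_invocations, max_subgroup_size):
    """Split a workgroup into subgroups of at most `max_subgroup_size`.

    Returns the per-subgroup invocation counts, assuming invocations are
    packed in order so that only the last subgroup can be partial.
    """
    if workgroup_invocations < 0 or max_subgroup_size <= 0:
        raise ValueError("sizes must be non-negative / positive")
    full, rest = divmod(workgroup_invocations, max_subgroup_size)
    return [max_subgroup_size] * full + ([rest] if rest else [])
```

Under this model, a workgroup of 16 invocations on hardware with a subgroup size of 32 gives `subgroup_sizes(16, 32) == [16]`, while the max size stays 32. An implementation that always reports the native size would return 32 in both cases, which is exactly the mismatch in question.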