05:56tomeu: if anybody has trouble building rusticl with clang 16, try upgrading bindgen to something newer than 0.59.2 (what is in CI currently)
06:31kode54: wow
06:31kode54: check out this um, thing
06:31kode54: https://youtu.be/4lOmJmH4pas
06:31kode54: (video recording of monitor showing some fine sync looking jank)
06:32kode54: (video on screen is of a bald eagle getting its beak planed, because that's a thing captive birds need?)
07:47sima: kode54, looks like tile-y ccs compression lolz
12:59rodrigovivi: airlied sima: what's your take on this https://lore.kernel.org/all/2023051029-overspend-sherry-1b85@gregkh/ ?
13:00rodrigovivi: should we continue using STAGING config in our drm drivers and make sure we taint the kernel? or should we create a new DRM_STAGING tag depending on EXPERT and move the current usages to it and then add it to Xe?
13:01rodrigovivi: demarchi ^
13:17sima: rodrigovivi, for out-of-tree it's kinda whatever
13:18sima: once it's merged I think the alpha_support_flag is good enough until it's all fully validated
13:18sima: since that should taint too
13:18sima: ofc if you're already 100% confident on the uapi and "not too many bugs to the point we can't support the thing" then you can enable right away imo
13:19sima: since either way there's no taking back with uapi really
13:53karolherbst: sima: do you have some time to help me understand what's going on with https://gitlab.freedesktop.org/drm/nouveau/-/issues/209 ?
13:54sima: regrets?
13:54karolherbst: how so? I just don't see the problem lockdep points out here
13:55sima: oh just general sarcasm, there's usually not a good time ahead when your lockdep splat has 4 locks involved
13:55karolherbst: yeah, but I also don't see the deadlock anyway
13:56sima: btw if you have anyhing with more than 2 contexts (2 locks or lock + irq) the "Possible unsafe locking scenario:" is bs
13:56sima: it's only correct for 2
13:56karolherbst: ahh
13:56sima: so you need to build the full graph yourself
13:56karolherbst: shouldn't get printed then
13:56sima: where's the entertainment in that
13:56sima: but yeah the loop seems to be calling request_fw while holding a lock
13:56sima: which has a chain back to mmap_sem
13:57sima: which deadlocks, because request_fw needs userspace and that needs to be able to take faults :-)
13:57sima: so pull that request_fw call out of any mutex section and you should be good
13:57karolherbst: ahh
13:57karolherbst: so generally, request_firmware while holidng locks == bad
13:57sima: you get the splat only into some arbitrary vfs locks but that's the root issue I think
13:57sima: yup
13:58karolherbst: I have to try to trigger this anyway
14:00karolherbst: mhhh
14:01karolherbst: sima: yeah soo. bad news, I doubt we can move that request_firmware out of locked sections
14:01sima: maybe put a mmap_read_lock(mm); mmap_read_unlock(mm); into request_fw to trigger it more reliably
14:01sima: karolherbst, hence regrets :-P
14:03sima: karolherbst, from a very quick look
14:03sima: you get to add another hook plus bail-out path
14:03sima: so you can request the fw in the ioctl, stuff it somewhere and retry
14:03karolherbst: mhh. annoying
14:03sima: or you make sure you load all these at driver init time, which is kinda how it should
14:04karolherbst: yeah.. we already moved some firmware to that
14:04sima: (aside from the annoyance of delays and I guess harder to hack around on)
14:04karolherbst: and I was thinking about doing the same with this one then
14:04sima: yeah, somehow you need to pull it out
14:05sima: the annoyance with delays is that when you do this at driver load it's either a stall, or you do it async and then it's still a dependency, but you might accidentally cheat lockdep out of noticing it
14:05karolherbst: Maybe I just talk to Ben about it, because with GSP all that changes anyway
14:05karolherbst: and with GSP we get like multi MB firmware
14:06karolherbst: sima: wasn't the problem with loading firmware on init, that we need an FS in the first place?
14:07karolherbst: which kinda requires a userspace program context?
14:07sima: karolherbst, the fs can be the initramfs
14:08karolherbst: right.. but I was kinda under the impression it needs to start from an ioctl, $because
14:09karolherbst: ohh wait, that was just a problem with suspend/resume/whatever
14:09karolherbst: so on suspect you can't load firmware
14:09karolherbst: *suspend
14:11karolherbst: sima: so I guess what we should do is to call request_firmware_nowait on all firmware files we need and just make code needing that firmware to wait on its completion?
14:11karolherbst: which we should be able to do outside of locked areas
14:12karolherbst: mhh
14:12karolherbst: maybe not
14:12karolherbst: uhhh
14:12karolherbst: annoying
14:15sima: karolherbst, yeah the wait is still an issue
14:15sima: but if you just wait for a completion lockdep wont see the chain anymore
14:15sima: "there I fixed it"
14:44karolherbst: sima: yeah.. so I think the issue here is, that when all users of the GPU disappear, we kinda clean stuff up, but once there is another user, we reinitialize things, which might trigger the firmware to be loaded again. So my assumption is, if we just cache it the first time it should be fine...
14:45karolherbst: it's kinda a pita situation tho... I mean the entire way of how firmware loading things
14:45karolherbst: *works
14:45sima: karolherbst, why do you clean up when all users disappear?
14:45karolherbst: I have no idea
14:45sima: is that just poor runtime pm because vgaswitcheroo didn't need more?
14:45karolherbst: I doubt it
14:46sima: but yeah loading the fw once and caching it is the way to go I think
14:46karolherbst: we also do so because we had issues doing firmware stuff on suspend
14:46karolherbst: for some stuff
14:46sima: but first time still needs to be outside the locks
14:46karolherbst: guess we should do it for more
14:46karolherbst: mhhh
14:46sima: well reloading fw on resume makes sense
14:46sima: it's more "why is this connected to users"
14:46sima: since ideally you suspend stuff even when there are users, as long as they didn't do anything for a while
14:46karolherbst: the problem was, we needed new firmware on suspend
14:47sima: yeah you need to cache
14:47karolherbst: we already do
14:47sima: hm what do you mean with "new fw" then?
14:47karolherbst: I mean, we fixed _that_ issue already
14:47karolherbst: but I don't think we need to do the initial loading outside locks
14:48karolherbst: the problem is more or less: process releases all fds to nouveau, takes mmap lock on it clean up path. Another process spawns opening an fd to nouveau and starts the init path, that might trigger a firmware load and with bad timing -> dead lock
14:48karolherbst: we kinda have locks per subsystem
14:48karolherbst: and uhh.. it's kinda annoying
14:49karolherbst: so I can see this happening midway through deinit/init
14:50sima: karolherbst, "I can explain how lockdep is wrong in this specific case" is not very good approach to shutting up lockdep :-)
14:50karolherbst: I'm not saying lockdep is wrong
14:50kode54: sima: maybe 14013111325?
14:50sima: kode54, ?
14:50kode54: workaround/bug number
14:51kode54: something to do with MCS?
14:51karolherbst: I'm just saying it's a deadlock when you have a process freeing the last nouveau resources vs one allocating one and it's just bad timing
14:51sima: karolherbst, hm I guess I misunderstood
14:51sima: but also my take on lockdep splats is that they tell you the design isn't clean
14:51karolherbst: so we if cache the firmware, on reinit it should be fine
14:51sima: even if there's some specific reasons for why it can or cannot happen in practice
14:52karolherbst: that case only happens if the firmware already got loaded once is my understanding here
14:52karolherbst: and doesn't on first load
14:52sima: yeah that's what I mean with "in practice this can't happen on first load, lockdep is wrong"
14:53sima: because it's different files/mmap_sem/whatever
14:53karolherbst: yeah, it's not complaining on first load here
14:53sima: hm I need to look at the splat again
14:53karolherbst: `__x64_sys_munmap` kinda gives it away I think
14:54karolherbst: mhh not sure
14:56sima: well if you say it doesn't splat on load
14:56sima: then that means the chain of lock deps matter
14:56sima: to tie the loop
14:56sima: and a lot of those are in nouveau cleanup code
14:56sima: so maybe there's another place where you have a not-so-great dependency that shouldn't be there for a clean design
14:56karolherbst: I'm sure there are some of those
14:57karolherbst: that splat with driver_probe_device is kinda weird
14:58karolherbst: maybe it's also a race on driver init and userpsace processes already using the driver?
14:59sima: lockdep doesn't care about races
14:59sima: as long as you hit all the call paths
14:59karolherbst: I know
14:59karolherbst: but
15:00karolherbst: mhhh
15:01sima: so I don't know enough nouveau.ko by far to say anything
15:02sima: but in general taking big locks in ctor/dtor paths is smelly
15:02sima: and having a request_fw makes these locks the biggest you can have in the kernel
15:02sima: big = starting points in the dep graph
15:03sima: the issue could be taking these locks in dtor/ctor (but in the issue there's 3, so which one is it?)
15:03sima: or that the innermost one is too big due to the request_fw and really should just be a very, very inner mutex to serialize hw access
15:04sima: sometimes it's unavoidable and then you need to push dtor work into a cleanup queue
15:05karolherbst: right, but this is more ref counting stuff, where nouveau frees up resources on last use
15:06karolherbst: it's also to deal with the fact, that you can enable/disable certain parts of the GPU
15:06sima: yeah holding the mutex over more than the dec is an antipattern
15:06sima: it's kref_put (good) vs kref_put_mutex (pain)
15:08karolherbst: yeah.. so we guard the initialization of those bits with locks
15:09karolherbst: and we take the same lock on unref
15:09karolherbst: mhh, it's all kinda weird, maybe Ben has any ideas
15:34kode54: might have been mesa!22802
15:52q4a: Hi. Is there channel for Imagination GPU? like #_oftc_#panfrost:matrix.org for Mali or #_oftc_#d3d9:matrix.org for Nine.
15:55alyssa: yeah
15:55alyssa: #powervr
16:13q4a: Thanks!
16:36pinchartl: sima: would you consider dim to be usable for other subsystems ? not necessarily as such, but would the required refactoring of the script to move DRM-specific bits to a configuration file be something that would be a) doable and b) accepted upstream ?
16:38sima: not sure
16:38sima: in the end it's bash
16:39pinchartl: at least it's not perl ;-)
16:40sima: but also, it has config files, both for user specific stuff (.dimrc) and for what kind of trees/branches you have and how the integration tree is built (nightly.conf for historical reasons)
16:40sima: so I think the only thing needed is to make the "where is the branch with nightly.conf" configurable you're probably 90% there
16:41pinchartl: I may have a look, depending on how much buy-in I would get for using it for V4L2
18:33DemiMarie: pinchartl: Perl > bash
20:12karolherbst: gfxstrand: I think I know how to deal with crossworkgroup initializers: make them the frontends problem. We should simply add some APIs (maybe to clc) to query for all of them and the frontends just creates the nir_shader normally for them. And then it's entirely up to the frontend to when to call them
20:15karolherbst: mhh actually... those are always blocks of constant memory, no? Though for destructors it might not be
20:17karolherbst: Initializer + Finalizer
20:17karolherbst: but yeah, for those I think clc should just expose that information
20:18karolherbst: and make it part of `clc_kernel_info` or something
20:19karolherbst: I wonder if we want to lower CrossWorkgroup Variable initializers to a kernel function just writing to memory passed in as the only arg
20:19karolherbst: hh
20:19karolherbst: mhhh
20:19jenatali: Yeah, that was my thinking about how to handle it too
20:20karolherbst: sadly I really need this soonish now :)
20:20jenatali: Ping me if you need code review
20:20karolherbst: I think the Initializer + Finalizer case is pretty straightforward
20:20karolherbst: the variable initializer one is the annoying part
20:21karolherbst: jenatali: ohh btw, I have a trivial one for you: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893/diffs?commit_id=5353eafc75b4ff68a6787a10caebfa72da96271c
20:21karolherbst: also
20:22karolherbst: I think that `cl_khr_subgroups` line is actually wrong
20:22karolherbst: soo.. this is where it's getting weird
20:22karolherbst: OpenCL core is different from cl_khr_subgroups
20:22jenatali: For initializers, I think it can just be reflected out and make it the frontend's problem. It just needs to ensure that the data is uploaded into the buffer before the first kernel is executed
20:22karolherbst: the extension requires CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS where OpenCL core doesn't
20:22karolherbst: so I'm not sure if it's safe to always expose the extension string via clc
20:23jenatali: Oof
20:23jenatali: Yeah maybe split that
20:23karolherbst: subgroup + subgroup_ifp or subgroup_khr_ext?
20:23jenatali: I think subgroup_ifp
20:24karolherbst: the CTS is happy eihter way, but I don't want to debug applications where they actually check that correct in kernel and do things differently :)
20:25karolherbst: *correctly
20:26karolherbst: but I'm also not sure if we could just return true for all hw vendors here. I just don't know
20:27karolherbst_: ohh it's for threads within a sub group
20:27karolherbst_: yeah.. I guess not then
20:29karolherbst: ehh no, it's about subgroups towards each other actually
20:29karolherbst: that's way more sane
20:32airlied: konstantin_: asan with just a cts run doesn't produce anything locally?
20:33konstantin_: One sec, lemme try
20:33konstantin_: passes
20:34konstantin_: airlied: btw, why does llvmpipe copy constant buffers? I had to add it back because it caused some failures with the llvmpipe job but now, all dEQP-VK.descriptor_indexing.sampler_after_bind* test fail
20:35karolherbst: jenatali: I think I have a better idea... spirv_to_nir sould just return a array like for constant, and if the frontend allocates memory it just uploads the data...
20:36karolherbst: and for Initializer + Finalizer we just compile the kernel in the frontend as discussed above
20:36jenatali: karolherbst: Yeah isn't that what I said?
20:36jenatali: "For initializers, I think it can just be reflected out and make it the frontend's problem. It just needs to ensure that the data is uploaded into the buffer before the first kernel is executed"
20:36karolherbst: ohh, didn't see that
20:36karolherbst: yeah, it can be done on creation of the cl_program object and it just lives there or something...
20:36karolherbst: yeah...
20:37karolherbst: shouldn't be too hard then
20:37jenatali: Yeah it has to live there, since it can be shared among kernels in the program
20:37karolherbst: finalizers are funky
20:37karolherbst: I kinda don't see the point, but I can see weird pointer magic happening
20:37karolherbst: I suspect you could store pointers to memory there and retrieve it after the program object was destroyed
20:38karolherbst: it's messed up, but...
21:15airlied: konstantin_: complete CTS run though? it seems weird venus is triggering anything
21:17airlied: konstantin_: I'm not actually 100% what was behind the consts copying, I think it comes from the olden days where consts could get overwritten somehow
21:17airlied: so it would copy the consts, I'm not sure what value there really is doing it now if any
22:18karolherbst: dcbaker: updated my rust.bindgen + llvm fix https://github.com/mesonbuild/meson/pull/11733
22:19karolherbst: I don't really understand that loop in get_compiler_system_dirs, so I just copied it and hope for the best