IRC Logs of #dri-devel on irc.freenode.net for 2023-03-09

00:49 bluetail: hello. Is it possible to disable MPO (https://www.kernel.org/doc/html/next/gpu/amdgpu/display/mpo-overview.html) in archlinux? I read somewhere that this allegedly can fix amdgpu related issues.
00:49 bluetail: I mean, disabling without working on the kernel, perhaps even something simpler? Forcing performance mode to true didn't help
04:02 agd5f: bluetail, I doubt you are using MPO. Not many desktops use overlays at this point
09:24 eric_engestrom: is there a way to get deqp to print its version? there's no `--version`
09:24 vliaskov: jadahl ajax gnome-shell/x11-backend mutter crashes on intel using Mesa 23.0.0 https://pastebin.com/52uFkWp5 , the culprit Mesa commit is "19c57ea3 glx: Remove pointless GLX_INTEL_swap_event paranoia". Is that event really supposed to be handled differently by mutter as that commit suggests, or did Mesa glx drop the ball here?
09:25 eric_engestrom: I'm asking because it looks like hakzsam's cts update wasn't applied to the debian/armhf_test container as it doesn't have some tests, but printing the version would confirm that
09:29 vliaskov: a colleague opened a mesa issue regarding the GLX_INTEL_swap_event related crash , discussion should happen there: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8542
09:29 jadahl: vliaskov: that'd mean we got a glx swap buffer event without doing a swapbuffer, which seems to be what that commit was being (seemingly rightfully) paranoid about
09:31 vliaskov: thanks jadahl, let's continue in the bug
09:38 MrCooper: lina: the deadlock described by Christian is real, though very subtle
09:39 MrCooper: I'd advise saving the time & energy arguing against that
09:43 kode54: how useful, my client can't auto recover my name due to registration quiet rules
09:44 MrCooper: what registration quiet rules?
09:44 kode54: +M moderation of unregistered users
09:45 MrCooper: that was an issue for me before on FreeNode, not here though
09:45 kode54: network somehow didn't decide to match my client cert because the name didn't match
09:46 kode54: oh well
09:46 MrCooper: authenticating with NickServ works for me even while I'm on channels with +M
09:46 kode54: this network is set to client cert sasl rather than nickserv
09:46 emersion: kode54: yeah i've hit this as well, this is so annoying
09:47 emersion: OFTC for the win :S
09:48 emersion: MrCooper: happens when the client degrades to an alt-nick
09:48 kode54: I mean, libera and freenode had the same rule with nick degrading to an alt nick on registration only moderated channels
09:48 emersion: libera has SASL
09:48 MrCooper: right, but in contrast to FreeNode before, authenticating with NickServ works after that and changes to the proper nick
09:49 kode54: freenode never changed your nick automatically
09:49 kode54: though some clients, notably peter pawlowski's client, would forcibly do the full auth handshake before rejoining channels
10:07 MrCooper: lina: FWIW, the deadlock doesn't happen in any of the code touched by your series, but deep down in the core kernel MM code
11:45 eric_engestrom: hakzsam: I think f775873f81d1b8dd01e9b should have bumped DEBIAN_BASE_TAG as well?
11:46 eric_engestrom: we really need some way to document this
11:46 eric_engestrom: perhaps at the top of each file we could list the image tags that need to be updated
11:59 mupuf: +1 for that
12:20 hakzsam: did I make a mistake?
12:22 zmike: gasp
12:59 eric_engestrom: I think? but it's basically impossible to know which deqp version is running so I'm not sure
13:00 eric_engestrom: I raised https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/4327 about having `deqp-vk --version` or something
13:21 javierm: jfalempe, tzimmermann: do you know why virtgpu doesn't advertise FB_DAMAGE_CLIPS ?
13:22 javierm: AFAICT it already has a .dirty callback and calls drm_atomic_helper_damage_merged() in its primary plane update callback
13:22 tzimmermann: javierm, it should
13:24 javierm: tzimmermann: yeah, that's what I thought. A colleague mentioned it to me and I suggested he post a patch to call drm_plane_enable_fb_damage_clips()
13:24 javierm: I believe that should be enough
13:24 tzimmermann: but i recently had a discussion about this with danvet. and it sounded like it was deliberately doing things differently. i just don't really remember if this was part of it
13:24 tzimmermann: maybe post that patch and wait for feedback
13:24 javierm: tzimmermann: maybe your discussion was about why it was not using the GEM shmem helpers ?
13:26 tzimmermann: javierm, no IIRC the discussion was about the use of shadow planes. but i don't remember the details. and now that i think about it, i might be confusing virtgpu with vkms here
13:26 danvet: tzimmermann, I think "forgot to enable the atomic property for damage update" is just lack of userspace using that
13:27 danvet: I don't have any memories why a driver would not do that
13:27 danvet: maybe we should have a check that if you have a ->dirty callback, are atomic but didn't enable atomic property, then WARN()
13:27 danvet: but would need a check to make sure drivers do this
13:27 tzimmermann: me neither. maybe i'm really confusing things between virtio and vkms here
13:28 danvet: vkms is the other way round
13:28 tzimmermann: danvet, we have a check in the damage helpers already. virtgpu already warns
13:28 danvet: it can enable the property, but it still needs to compute the crc over the entire thing
13:28 danvet: otherwise the crc computation breaks :-)
13:29 javierm: tzimmermann: yeah, virtgpu already warns
13:29 tzimmermann: danvet, right. i think that's what the discussion was about
13:29 danvet: tzimmermann, yeah that discussion I remember, but nothing about virtio
13:30 danvet: probably just a virtio oversight to not enable that
13:30 danvet: btw where's that warning?
13:30 tzimmermann: javierm, cirrus and a bunch of others also warned. it's just that no one cared much, because it's an optimization that is not immediately recognizable
13:30 danvet: there's much less atomic userspace that supports damage than legacy dirtyfb damage
13:31 javierm: tzimmermann: Ok, I'll tell him then to post a patch
13:31 danvet: it's a bit of a chicken/egg thing
13:31 javierm: tzimmermann, danvet: the issue reported was that some qemu ui backend did a full plane update: https://gitlab.com/qemu-project/qemu/-/blob/master/ui/dbus-listener.c#L216
13:31 javierm: tried to optimize that but then the driver didn't advertise FB_DAMAGE_CLIPS
13:32 tzimmermann: danvet, see https://elixir.bootlin.com/linux/v6.2/source/drivers/gpu/drm/drm_plane.c#L1502
13:33 javierm: danvet, tzimmermann: thanks for the confirmation. It looked to me like it was just missing that, but I wondered why it was missing
13:33 javierm: "nobody cared" is a good answer :)
13:33 danvet: lol I even wrote that
13:33 danvet: and yeah I didn't find it because I was looking in the helpers
13:33 danvet: tzimmermann, thx
13:34 tzimmermann: danvet, and we had that same conversation just a few weeks ago where you asked me about this warning :D
13:34 javierm: haha
13:36 danvet: sometimes I'm a bit too stateless
13:36 javierm: I also noticed that the virtgpu driver is using drm_atomic_helper_damage_merged()
13:36 javierm: wonder when that is the correct thing to do over iterating over the damage areas
13:37 javierm: I guess if doing a big rectangle update is more efficient than doing many small updates
13:37 javierm: but probably not the case for virtgpu and more for small I2C/serial panels?
13:37 tzimmermann: javierm, that would be my guess as well
13:38 javierm: tzimmermann: Ok. I'll suggest he look at that as well. Thanks!
13:38 tzimmermann: but it probably depends on the exact size of the big rectangle and the number of extra pixels
13:39 javierm: tzimmermann: yeah, in general should be about the same. But I remember jfalempe had some perf issues on some old gfx cards
13:39 javierm: because the damage clips were too far apart so the resulting rectangle was quite big
13:39 tzimmermann: javierm, IIRC i had this discussion with jfalempe wrt mgag200. he mentioned that damage_merged() was always slower than individual updates
13:40 javierm: tzimmermann: right, that's the discussion I was remembering
13:40 jfalempe: javierm, yes it matters on mgag200, the graphic memory is very slow
13:41 tzimmermann: my take is to remove damage_merged() entirely and leave it to compositors. they can track how much overlap updates have and merge areas in userspace already
13:41 javierm: tzimmermann: 100%
13:42 jfalempe: and I had the case where you move the pointer at one corner of the screen, and the date/or some UI element updates at the other corner, triggering a full repaint.
13:44 javierm: jfalempe: that's an excellent example indeed
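[editor's note] The trade-off discussed above (one merged rectangle versus iterating over the individual damage clips) can be illustrated with a little arithmetic. The following is only a sketch in Python, not kernel code: the `Rect` class, the `merge` function, the screen size and the clip coordinates are all hypothetical, chosen to mirror jfalempe's "pointer in one corner, clock in the other" case. `merge` models the bounding box that a merged-damage helper would produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x1: int
    y1: int
    x2: int  # exclusive
    y2: int  # exclusive

    def area(self) -> int:
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


def merge(clips):
    """Bounding box enclosing every clip (model of a merged-damage rectangle)."""
    return Rect(
        min(c.x1 for c in clips),
        min(c.y1 for c in clips),
        max(c.x2 for c in clips),
        max(c.y2 for c in clips),
    )


# Hypothetical 1920x1080 screen: a 64x64 pointer update in the top-left
# corner and a 200x30 clock update in the bottom-right corner.
clips = [Rect(0, 0, 64, 64), Rect(1720, 1050, 1920, 1080)]

merged_pixels = merge(clips).area()
# Summing areas is exact only because these two clips do not overlap.
individual_pixels = sum(c.area() for c in clips)

print(merged_pixels, individual_pixels)
```

With these made-up numbers the merged rectangle covers the whole screen while the two clips together touch only a small fraction of it, which is why the merged approach hurts on hardware with slow video memory. When clips overlap heavily or sit close together, the balance can tip the other way.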
13:44 javierm: tzimmermann, jfalempe: should we add that item to the TODO ?
13:44 tzimmermann: javierm, sure why not
13:45 javierm: tzimmermann: Ok
13:45 tzimmermann: there aren't too many users of the function
13:46 javierm: tzimmermann: right. Then we can just address it in a patch series
13:47 tzimmermann: javierm, i know of one exception in ast: the cursor code needs to update the full 64x64 rectangle. damage_merged is used to test if there are any image updates at all. maybe other drivers have a similar requirement
13:48 tzimmermann: i do have an update for cirrus already. i just need to prepare another patchset
14:01 javierm: tzimmermann: I see. And then maybe makes sense to just disable damage clips for the cursor plane?
14:01 javierm: and only keep it enabled for the primary plane
14:02 tzimmermann: javierm, no. please keep it as-is. we don't want to update the cursor image unconditionally. the damage helpers are there for this situation
14:03 tzimmermann: ast cannot update only a sub-image of the cursor because there's some checksum computation involved. this needs the whole image. but it's only 64x64
14:06 javierm: tzimmermann: right. So the damage helpers are used to determine whether the cursor has to be updated or not, but when it is, it's the full 64x64 rectangle
14:06 tzimmermann: right
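[editor's note] The ast cursor pattern described above (use the damage information only to decide whether anything changed, then upload the whole image) might be modelled roughly as follows. This is a hedged Python sketch, not the ast driver's code: `cursor_upload_rect` and the tuple-based clip format are hypothetical, and only the 64x64 size and the "checksum needs the whole image" constraint come from the discussion.

```python
CURSOR_SIZE = 64  # cursor image is 64x64 per the discussion


def cursor_upload_rect(damage_clips):
    """Return the rectangle to upload, or None if nothing changed.

    damage_clips: list of (x1, y1, x2, y2) tuples, with x2/y2 exclusive.
    """
    non_empty = [c for c in damage_clips if c[2] > c[0] and c[3] > c[1]]
    if not non_empty:
        return None  # no image damage: skip the upload entirely
    # Any damage at all forces a full-image upload, because the checksum
    # is computed over the whole cursor image.
    return (0, 0, CURSOR_SIZE, CURSOR_SIZE)
```

For example, `cursor_upload_rect([(3, 3, 5, 5)])` yields the full `(0, 0, 64, 64)` rectangle, while an empty clip list yields `None`, so the cursor image is not rewritten unconditionally.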
14:09 javierm: hmm, for this case the helper does fit quite well
14:09 javierm: tzimmermann: I think that won't touch anything then, as it seems that whether using a merged rectangle or iterating over the damage areas is really on a case by case basis
14:11 tzimmermann: javierm, it might make sense to review all instances. i think ast is a bit of a special case here
14:14 javierm: tzimmermann: Ok
15:05 lina: MrCooper: Sorry, I wasn't checking IRC... but I'm kind of confused now, I thought he was talking about the signaling logic for forward progress of the scheduler, not MM/alloc related deadlocks...?
15:07 lina: I know there's a lot of subtlety around the MM stuff I don't understand yet, and we did touch on that in the thread, but I don't think that's what the argument was about...
15:13 javierm: lina: the work you are doing is pretty amazing, thanks a lot for paving the road for future rust DRM drivers :)
15:18 lina: Thank you! ^^
15:19 ccr: =)
16:07 danylo: In Freedreno (partially) and Turnip we are thinking about moving to C++ to use templates instead of genX macros to share code between gens (Turnip currently works only on a single gen, Freedreno just duplicated code), and to have nice c++ things. Anyway, there are no troubles with gallium driver, but with Turnip I'm not so sure, since e.g. previously Dozen used C++ for a bit but went back to C afterwards. Though it was done, from what I saw,
16:07 danylo: for reasons mostly not relevant to us (like older msvc compiler and lack of CI testing). After silencing a number of warnings, putting a big '-fpermissive', some changes to Turnip, and a few small changes to common code - Turnip compiled as C++. Templated entrypoints look good enough, works like this: https://godbolt.org/z/hf5rPE1E1.
16:07 danylo: Any downsides of going this way?
16:09 zmike: gallium drivers have been doing template functions for a while
16:09 zmike: seems fine
16:12 danylo: It's probably less about whether driver would be fine and a bit more about whether common code would be fine. Dozen returning to C was almost a year ago, now I had to make only a few small changes to common code, so from my POV it would be fine, but I have a terrible overview of things =)
16:12 hazl: hooray, another bit of useful info found on the dark souls 3 front: wined3d has no problem, so i guess I'm going to figure out where to take that info
16:17 danylo: @gfxstrand probably you would have a thought or two on this matter
16:19 hazl: i'm launching elden ring to see if it does the same thing, and if it does, i'm just going to stop trying things for now out of fear of working myself up into etching a crucifix into this deck to try to remove the curses
16:21 hazl: oh wait, i can't, dx12 wouldn't allow that. it has the same issues i was finding in ds3 with not wanting to get past 40-50fps and being unstable on frame pacing
16:58 anholt_: Looks like mesa-swrast-* got owned last night. it'll be a bit until I can make new vms and upgrade and such so it hopefully doesn't happen again.
18:01 idr: So.... are we just not able to have MRs land?
18:01 cmarcelo: ?
18:01 idr: Because now DavidHeidelberg[m] has manually unassigned my MR from marge, so... what's the deal?
18:01 daniels: idr: they can land, just exceptionally painfully
18:01 daniels: anholt_: damn. :( container escape?
18:02 DavidHeidelberg[m]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21815 must go first
18:02 daniels: idr: we've been pushing fixes into the tree as fast as we can. which isn't very fast.
18:02 DavidHeidelberg[m]: sorry about that idr , it's the only way...
18:02 idr: daniels: It might be worth sending a message to the mailing list so that people know to be patient.
18:03 idr: Is the root cause known? Are the runners just down or ... ?
18:03 DavidHeidelberg[m]: idr: I'll re-assign your jobs just after fixing missing runners :)
18:03 DavidHeidelberg[m]: but yeah, you're right. I should at least write on IRC
18:03 DavidHeidelberg[m]: mesa-swrast is down
18:04 idr: Those particular jobs tend to catch a lot of bugs, so hopefully they'll be back soon. :(
18:04 idr: DavidHeidelberg[m]: Thanks.
18:04 DavidHeidelberg[m]: idr: already pinged Emma and Rob
18:05 DavidHeidelberg[m]: also there is an improvement in the queue for Marge job priority, so reliability will go up soon
18:05 DavidHeidelberg[m]: (at least a bit)
18:24 DavidHeidelberg[m]: === Please do not assign to Marge for the next ~1 hour, I need to merge about 3 MRs improving the CI workflow itself.
18:25 dj-death: ah okay
18:45 anholt_: daniels: no obviously sketchy job during the evening
18:57 swiftgeek: this is confusing me https://docs.mesa3d.org/gallium/distro.html#xorg-ddx
18:58 robclark: probably referring to the xa frontend?
18:58 swiftgeek: it says tracker, so I would assume it's somewhere inside mesa?
18:58 swiftgeek: ah thaks
18:58 swiftgeek: *thanks
19:00 swiftgeek: > should investigate sharing the loadig mechanism with the EGL gallium frontend.
19:01 swiftgeek: that's a typo isn't it
19:07 robclark: sounds like it
19:20 anholt_: daniels: so, I don't see an equivalent thing to metal's userdata for gcloud instance creation?
19:22 hazl: i just realized (for the dark souls 3 issue) that with wined3d it shouldn't (??) be able to use the steam shadercache that dxvk uses but there isn't any noticeable stuttering loading in or running around and that means it's either generating shaders very fast or managing to use the compatdata vulkan cache
19:23 anholt_: I do like the idea of this script that brings up the instance, rather than building them from disk snapshots I've incrementally built up, but I think I'd have to write my own init?
19:29 anholt_: something something instance metadata and maybe a specific boot os and it works?
19:32 daniels: anholt_: yeah, set the metadata field of the VM, and at least Debian expected that (if it started with '#cloud-config') to contain cloud-init stuff
19:43 DemiMarie: anholt_: do you know how they got pwned?
20:07 anholt_: DemiMarie: no
20:07 anholt_: but we run arbitrary code from users, so :shrug:
20:08 DemiMarie: anholt_: do you spin up a new VM for each run?
20:08 anholt_: no, that's too slow.
20:09 DemiMarie: do the CI runs need access to the GPU?
20:09 anholt_: no, this is swrast.
20:10 DemiMarie: first suggestion would be to make sure your containers are unprivileged (uid 0 on host not mapped in container) and that you have SELinux enforcing
20:10 DemiMarie: Also check your host kernel; kernels from distros like RHEL are often out of date
20:12 kisak: This is probably not the best channel to do a brain dump of every hypothetical security best practice.
20:13 DemiMarie: kisak: fair
20:25 anholt_: also, you lack a lot of context, so generic suggestions are not very helpful.
21:26 demarchi: mattrope: do you know if we have a drm_printer that is equivalent to drm_dbg()?
21:27 demarchi: I see we have drm_debug_printer, but that is not the same
21:27 demarchi: all the callsite decoration is gone with that one
21:53 mattrope: demarchi: I don't think we have drm_printers equivalent to the drm_device-centric drm_dbg/drm_err/etc. print calls yet.
21:54 demarchi: the drm_info_printer() is the closest one since it receives the device as param
21:54 demarchi: but even that is not the same
22:31 robclark: the drm_printer thing is mainly intended to have something to share code between things like debugfs and other things.. not sure if call-site annotation makes sense for that, but if there is a use-case you can add drm_print_dbg() or something along those lines
22:34 demarchi: robclark: my use case is exactly for that. There is the debugfs file that dumps the content of a table, and during the driver probe we also print entry by entry while applying
22:35 demarchi: robclark: I will take a look into adding a drm_print_dbg() later if I have a few more users
22:35 demarchi: thanks