04:08soreau: why does radeontop make my compositor frame events perfectly on time whereas without, they are latent by some margin? ref: https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/2087
06:23airlied: hey fossilise/shader-db experts, is there anything written to compare the compile speed between two runs?
06:39soreau: it seems like /sys/class/drm/card1/device/power_dpm_force_performance_level should be set to "high" for workstations by default
06:41soreau: doing this achieves the same frame timing perfection as running radeontop
08:05glehmann: karolherbst: workgroup size does not matter for load_subgroup_size for VK/GL
08:06glehmann: they are separate concepts, and if you use a too small workgroup size you will get partially empty subgroups.
08:08glehmann: so back to your example, if the hardware/driver subgroup size is 32 and the workgroup only has 16 invocations, you get a subgroup size of 32
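A small arithmetic sketch of glehmann's point, in C. The helper names are hypothetical and are not Mesa, Vulkan or GL API; the sketch assumes a fixed hardware/driver subgroup size and that invocations fill subgroups in order.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not Mesa or Vulkan API. */

/* Number of subgroups needed to hold every invocation of one workgroup. */
unsigned num_subgroups(unsigned workgroup_size, unsigned subgroup_size)
{
    assert(subgroup_size > 0);
    return (workgroup_size + subgroup_size - 1) / subgroup_size;
}

/* Lanes that exist in the launched subgroups but carry no invocation.
 * The reported subgroup size stays subgroup_size regardless of this. */
unsigned inactive_lanes(unsigned workgroup_size, unsigned subgroup_size)
{
    return num_subgroups(workgroup_size, subgroup_size) * subgroup_size
           - workgroup_size;
}
```

Under these assumptions, a workgroup of 16 invocations on a subgroup size of 32 occupies one subgroup with 16 inactive lanes, while the subgroup size itself is still 32.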
08:34MoeIcenowy: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/101245813 oh Mesa CI fails sanity check
08:37MrCooper: looks like an infra issue which should be raised on #freedesktop
08:50pq: emersion, daniels, last time (several years ago) I looked at evdi, all it did was to route stuff back to userspace, so it really had no device to attach to, USB or otherwise, regardless of what the (proprietary) userspace on the other end did.
09:03emersion: pq, userspace could surface back the USB device details and create one KMS device per USB device
09:03emersion: but that's a bad architecture anyways, shrug
09:06tzimmermann: emersion, pq, i work on some improvements to the kernel's damage handling. if DRM detects a modeset, it frees the damage clips from user space. but for mere page flips, it keeps them. do you know if that is an intentional design?
09:06MoeIcenowy: well at least evdi isn't something mainlined?
09:06emersion: i have no idea
09:07tzimmermann: emersion, which question are you answering to?
09:08emersion: yours ;P
09:08tzimmermann: oh, ok. do you know if any user space re-uses any such damage clips in later commits?
09:21emersion: oh, you mean the damage clips blob?
09:22emersion: i think all userspace i know of keeps it strictly per-commit
09:22emersion: it would be weird to persist it across commits, damage changes all the time
09:30pq: yeah, what emersion said.
09:30javierm: tzimmermann, emersion: if this is cleaned up and the damage clips are not carried over between page flips, can the ignore_damage_clips then be dropped ?
09:31javierm: I mean the struct drm_plane_state.ignore_damage_clips
09:31tzimmermann: emersion, right. that makes sense. the kernel's logic is very inconsistent. it clears the blob in the middle of the atomic check. then different pipeline elements see different state
09:32tzimmermann: javierm, i'd rather make it the single flag for damage iterators
09:32pq: tzimmermann, what do you mean by "frees the damage clips from user space" exactly? Does it implicitly destroy the blob and free its id?
09:33tzimmermann: pq, exactly this
09:33emersion: i wonder what happens when switching VTs from a FB_DAMAGE_CLIPS-aware DRM master to one which doesn't support FB_DAMAGE_CLIPS
09:33emersion: is the new master unable to update any region outside of the clip?
09:34pq: I see, that freeing probably needs to be kept then, surely some userspace already depends on it.
09:34emersion: pq, if the free'ing is delayed a bit, until next commit, probably it's fine?
09:34tzimmermann: emersion. that would be a full modeset; hence no damage clips
09:34pq: emersion, yeah
09:34emersion: tzimmermann: VT switch doesn't imply full modeset
09:34pq: Are blob IDs re-used?
09:34zamundaaa[m]: Don't damage clips automatically reset to zero, like IN_FENCE_FD?
09:35tzimmermann: specifically at https://elixir.bootlin.com/linux/v7.0.10/source/drivers/gpu/drm/drm_damage_helper.c#L81 if there's an implicit modeset, the kernel clears the blob
09:35emersion: zamundaaa[m]: it would've been nice, with the wrinkle that ffmpeg kmsgrab wouldn't be able to read it
09:35emersion: but it's a hack anyways
09:36tzimmermann: now my question is: does user space require or want this? or rather not?
09:36emersion: zamundaaa[m]: a bit too late for that anyways
09:36pq: What if the kernel implicitly frees a blob ID, and then userspace later frees the same ID? Could that ID refer to something else by then? Are blobs ref-counted btw?
09:36emersion: tzimmermann: does git blame say anything interesting?
09:38zamundaaa[m]: emersion: making kmsgrab less useful sounds like a bonus to me :D
09:38emersion: aha
09:39emersion: tzimmermann: maybe it's a workaround for the VT switch or sth
09:39tzimmermann: emersion, it says exactly nothing: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v7.1-rc6&id=d9778b40260950a01a00852be43ca6c5c2d97f69
09:40emersion: well, i think there's no point in taking damage into account for modeset
09:40emersion: and it's safer to ignore/reject damage, so that a modeset always starts from a blank state
09:41emersion: but maybe the way the kernel implements this is not the right way, that i don't know
09:41tzimmermann: emersion. exactly, we ignore damage in full modesets. but in some cases, user space supplied a page flip, but the driver upgrades to a full commit. in those cases, the kernel frees the damage blob
09:42emersion: what do you mean by "full commit"?
09:42emersion: do you mean modeset?
09:42tzimmermann: emersion, a full modeset
09:42emersion: kernel should never upgrade a regular commit to a modeset
09:42emersion: if ALLOW_MODESET is not set, the kernel is not allowed to perform a modeset
09:43tzimmermann: emersion, by regular commit, you mean page flip?
09:43emersion: i mean an atomic commit without ALLOW_MODESET
09:43emersion: which would be page-flip in legacy uAPI parlance
09:44emersion: well, it's perfectly possible that there's a quirk of the legacy uAPI that i'm not aware of
09:45tzimmermann: emersion, ALLOW_MODESET is still respected
09:47emersion: pq, yes it's ref'counted, and user-space holds a ref iirc
09:49javierm: tzimmermann: what I meant is that if you make the core also do drm_property_blob_put() on page flips, then things like https://elixir.bootlin.com/linux/v7.0.10/source/drivers/gpu/drm/virtio/virtgpu_plane.c#L113 won't be needed
09:50javierm: AFAIU that's a workaround to the fact that damage clips are carried over during page flips
09:51tzimmermann: javierm, it's more complicated than that
09:52javierm: tzimmermann: ah, I see. You still need it if the buffer attached to the plane is changed between commits...
09:53tzimmermann: the damage blob is for mere page flips. we cannot _put() them.
09:53tzimmermann: the code in virtio is for flushing the framebuffer data (i think), while damage clips is for planes.
09:54tzimmermann: i have a series that keeps damage blobs unconditionally across all state changes. instead it maintains ignore_damage_clips to control DRM's usage of the damage clipping
09:55javierm: tzimmermann: and what will the semantics of ignore_damage_clips be? To always ignore it or ?
09:55tzimmermann: i would assume from the comments here that user space always updates damage clips as required.
09:58tzimmermann: javier, by default damage clips are enabled if present. if there's a special case that requires full-screen updates, drivers/helpers set the flag during atomic_check. later damage handling during atomic_commit would then ignore damage clips and go for the full update
09:58tzimmermann: javierm ^
09:59tzimmermann: that cleanly separates the user-supplied property from the kernel's interpretation
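The split tzimmermann describes can be sketched with a toy model in C. This is not the kernel's `struct drm_plane_state` or any DRM helper; every name below is hypothetical. It also assumes that a plane without any damage clips gets a full update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model; NOT the kernel's data structures. */
struct rect { int x1, y1, x2, y2; };

struct plane_state_model {
    const struct rect *damage_clips; /* user-supplied, left untouched */
    size_t num_damage_clips;
    bool ignore_damage_clips;        /* kernel-side interpretation */
    struct rect full;                /* full plane area */
};

/* "atomic_check" step: record whether a full update is required.
 * The user-supplied clips are kept as they are. */
void model_check(struct plane_state_model *s, bool needs_full_update)
{
    assert(s);
    s->ignore_damage_clips = needs_full_update;
}

/* "atomic_commit" step: pick the rectangles to update.
 * Assumption: no clips at all also means a full update. */
const struct rect *model_damage(const struct plane_state_model *s,
                                size_t *count)
{
    assert(s && count);
    if (s->ignore_damage_clips || s->num_damage_clips == 0) {
        *count = 1;
        return &s->full;
    }
    *count = s->num_damage_clips;
    return s->damage_clips;
}
```

In this model the clips survive the check step unchanged, so every later stage sees the same user-supplied state and only the flag decides whether it is used.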
09:59javierm: tzimmermann: got it
10:00emersion: tzimmermann: so it would be nicer to set that flag there instead of dropping the blob?
10:00tzimmermann: yes. from the discussion here, it seems that user-space is ok with this
10:00tzimmermann: emersion ^
10:00emersion: yes, seems so
10:01javierm: that also sounds more consistent with what drivers do when they want the damage clips to be ignored so it makes sense to me as well
10:03tzimmermann: you might want to look over this series, although it's still buggy in edge cases: https://lore.kernel.org/dri-devel/20260530185716.65688-1-tzimmermann@suse.de/T/#mb57af259fbcf28aae46e6f560962c74956df97b7
10:03javierm: originally I was thinking the opposite, to drop the blob and set it to NULL everywhere, but using the existing ignore_damage_clips instead makes more sense
10:03javierm: and is more explicit indeed
10:03javierm: tzimmermann: sure, I'll take a look
10:04tzimmermann: thanks, javierm
10:06tzimmermann: javierm, when i was looking into the damage-handling code, i found various conditions and tests that affect damage handling. the easiest resolution appears to be consolidating this around that flag. although i know that this wasn't the initial intention of the flag
10:07javierm: tzimmermann: yeah, as mentioned I originally thought that could be dropped and instead use the lack of the blob as the condition
10:07javierm: but given that there's a flag already, makes sense to use it and make the condition more explicit
10:34javierm: tzimmermann: I reviewed your series. It's a nice cleanup, I just had two small comments
10:36javierm: tzimmermann: now going back to virtio-gpu (and vmwgfx), I wonder if the 'new_state->fb != old_state->fb' condition shouldn't be part of drm_atomic_helper_check_plane_damage() as well
10:38javierm: since there's no support for buffer damage currently, I believe that check could be part of the core and be dropped from these drivers
10:49tzimmermann: javierm, why? if we switch the fb, damage clips tell us what has changed. that's the main purpose.
10:50tzimmermann: we cannot move this into check_plane_damage
10:51tzimmermann: IIRC gnome uses 2 fbs to flip between them. simple mouse cursor movements would become really slow
10:53karolherbst: glehmann: okay, at least in OpenCL it is implementation defined, and the CL CTS test I was looking at was relying on specific behavior, so in the end I fixed the CTS and all is good. Was just curious how it's defined generally.
10:56javierm: tzimmermann: right, I forgot about the whole per-buffer uploads vs per-plane or per-CRTC uploads distinction
10:56javierm: tzimmermann: these virtual drivers are the odd ones so it makes sense for the check to be in the drivers
10:56javierm: that's actually why I preferred to ask here instead of the mailing list since I wasn't sure about it
11:27tzimmermann: javierm. yeah, makes sense to split the patch. will do that for the rev
11:29tzimmermann: 'the next rev'
11:32javierm: tzimmermann: thanks, and yeah if you split it, it would be easier for stable to cherry-pick
12:41wens: looking to do some more GEM DMA helper related cleanup, and this bit of code (lines 242-244) looks very weird:
12:41wens: https://elixir.bootlin.com/linux/v7.0.10/source/drivers/gpu/drm/imx/dcss/dcss-plane.c#L241
12:46wens: I imagine pitches should already be the correct value and do not need to be divided?
13:19wens: and also it uses the dma address from gem dma object of plane 0 as the base for calculating the address of plane 1
13:19wens: seems a bit iffy, even though I suppose most applications just use the same dma buffer for all planes, just one plane after the other
14:10jnoorman: NIR (and related places like `spirv_to_nir`) often uses patterns like `nir_instr1(nir_instr2(...), nir_instr3(...))` to build instructions. Since the evaluation order of arguments is unspecified in C, this introduces nondeterminism in the compiler. I'm currently hitting this issue because my x86 drm-shim doesn't always produce the same output as my aarch64 native build. I imagine this could be an issue for tests as well (e.g., any test
14:10jnoorman: using something like `check_nir_string`). So I was wondering how people feel about this: would it be a useful goal for Mesa to have a deterministic compiler?
14:19karolherbst: jnoorman: given that most of those are generated, we might be able to enforce ordering globally for a lot of things
14:20karolherbst: could wrap it with a macro, or something
14:20karolherbst: though that might suck in different ways
14:24karolherbst: do we have a simple way to determine if a nir_intrinsic uses a variable? mhh
14:24karolherbst: well image variable to be specific
14:24karolherbst: ohh I have a better idea, nvm
14:26jnoorman: karolherbst:
14:27jnoorman: you mean creating macros for all builders to enforce ordering? I tried to think of a way to do that but couldn't immediately think of one.
14:29karolherbst: yeah.. just take the expressions as arguments and evaluate them in order in the macro and then call into a builder C function
14:29karolherbst: but that turns all those into macros..
14:41glehmann: ideally C would just have a defined execution order for this, it's just silly
14:44karolherbst: you'd think there is a compiler option for that, but no :)
14:46pq: what's one more compiler in Mesa, use it to compile that code with determinism...
14:51jnoorman: glehmann: I agree, but I have found no way to force this somehow.
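A minimal C sketch of the evaluation-order issue discussed above. `emit` and `combine` are hypothetical stand-ins for builder calls, not NIR API: each one appends a letter to a log so the emission order can be observed.

```c
#include <assert.h>

/* Hypothetical stand-ins for builder calls; not NIR API. */
char emit_log[64];
int emit_count;

void reset_log(void)
{
    emit_count = 0;
    emit_log[0] = '\0';
}

/* "Emit" an instruction: append its name, return its index. */
int emit(char name)
{
    assert(emit_count + 1 < (int)sizeof(emit_log));
    emit_log[emit_count] = name;
    emit_log[emit_count + 1] = '\0';
    return emit_count++;
}

/* "Emit" an instruction that consumes two earlier results. */
int combine(int a, int b)
{
    (void)a;
    (void)b;
    return emit('c');
}

/* Unspecified order: the compiler may evaluate either argument first,
 * so the log can read "abc" or "bac". */
int build_nested(void)
{
    return combine(emit('a'), emit('b'));
}

/* Sequenced: separate statements are evaluated in source order,
 * so the log always reads "abc". */
int build_sequenced(void)
{
    int a = emit('a');
    int b = emit('b');
    return combine(a, b);
}
```

Hoisting the arguments into temporaries is the manual form of the ordering a macro wrapper would have to enforce. The nested form is valid C either way; only the order of the side effects differs between compilers and targets.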
19:57airlied: karolherbst: do you have llama.cpp fossils?
19:57karolherbst: never created some, but you can run individual things easily
19:58karolherbst: but yeah need to create fossils yourself
22:25airlied: loop unroll 972 blocks to 13467 blocks might explain the slowdown
22:30karolherbst: oof
22:31karolherbst: maybe don't want to unroll that aggressively :)
23:02airlied: the loop is marked as unroll in the spir-v though