07:22pq: sima, I saw your F_ISDUP callout, but I don't think I have any recollection right now about where and why it's used. daniels might recall something where we need to check if two fds are dups of each other?
07:22daniels: not inside the compositor, no
07:23emersion: we need it for vulkan import
07:24emersion: but i use a DRM FD, and compare GEM handles
07:24daniels: yeah, nice
07:24daniels: we don't need to dedup GEM handles because we hide behind GBM/EGL to do that for us and give the uniqueness guarantee
07:24daniels: if we were doing it directly, indeed we'd just use a hash table of GEM handles which is iirc what Mesa does
07:26pq: daniels, wasn't there some GBM thing where I complained that something should check for something... thing?
07:27pq: maybe you just fixed that already
07:27MrCooper: pq: Mesa needs to know if two DRM fds passed in by the app reference the same file description, for correct tracking of GEM handles
07:27daniels: depends on what 'that' was ...
07:27sima: I thought there was a case where we needed to compare dma-buf fd for sameness and there was no drmfd around to do the import trick
07:28MrCooper: doesn't ring a bell offhand
07:28emersion: i think we always have a DRM FD…
07:29emersion: would need to check mesa usage…
07:33MrCooper: emersion: incidentally, I just thought of the GEM handle comparison trick while taking a shower this morning :) out of curiosity, does wlroots also handle the case where the DRM file descriptions are separate, but the imported GEM handle matches the original one by coincidence?
07:34emersion: hm, what do you mean?
07:36MrCooper: even if the DRM file descriptions are separate, the GEM handle of the imported BO may have the same value as the original exported one by coincidence
07:39emersion: for vulkan we want to know if DISJOINT imports are required or not
07:40emersion: if two DMA-BUFs represent the same underlying memory, we don't need DISJOINT
07:40emersion: importing the FDs as GEM handles and comparing these handles is enough for this use-case
07:41emersion: you're saying that maybe the file descriptions are different, but the GEM handles are the same? can this really happen?
07:42emersion: hm, i suppose mesa might be using kcmp for a different use-case then?
07:43emersion: trying to figure out whether two DRM FDs are the same file description?
07:43emersion: i haven't looked at mesa usage
07:44pq: emersion, I think the question is whether you always import on the same DRM device fd, or sometimes on different DRM open files?
07:45emersion: I have multiple DRM FDs
07:46pq: GEM handles are specific to an opened DRM device file, aren't they?
07:47pq: import buffer A on device 1, and buffer B on device 2, you might accidentally get the same number as a GEM handle I think
07:48emersion: sure
07:49emersion: for the equality test, i use the same DRM FD of course
07:49pq: right, I think that was the question
07:49pq: so no accidental match is possible
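The point about per-file GEM handle namespaces can be illustrated with a toy model. This is only a sketch: `ToyDrmFile` and `import_buffer` are hypothetical names that stand in for an open DRM file description and a PRIME import, and no real kernel interface is called.

```python
import itertools


class ToyDrmFile:
    """Toy stand-in for one open DRM file description (hypothetical, not a real API)."""

    def __init__(self):
        self._next_handle = itertools.count(1)  # each file numbers its own handles
        self._handles = {}                      # buffer name -> handle in this file

    def import_buffer(self, buffer_name):
        # Importing a buffer that this file already knows returns the same handle.
        if buffer_name not in self._handles:
            self._handles[buffer_name] = next(self._next_handle)
        return self._handles[buffer_name]


def same_buffer(drm_file, buf_a, buf_b):
    """Compare two buffers by importing both on the SAME toy DRM file."""
    return drm_file.import_buffer(buf_a) == drm_file.import_buffer(buf_b)


dev1, dev2 = ToyDrmFile(), ToyDrmFile()

# Buffer A on file 1 and buffer B on file 2 can get the same number by coincidence.
coincidence = dev1.import_buffer("A") == dev2.import_buffer("B")

# On a single file, equal handles mean the same buffer and nothing else.
a_equals_a = same_buffer(dev1, "A", "A")
a_equals_b = same_buffer(dev1, "A", "B")
```

In this model a handle value only means something together with the file it came from, which is why the comparison has to use one DRM FD for both imports.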
07:50emersion: ah actually i misremembered
07:50emersion: i use the DMA-BUF inode number instead
07:50emersion: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/vulkan/texture.c#L490
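A minimal sketch of an inode-based sameness check, assuming plain temporary files as stand-ins for DMA-BUF fds (real DMA-BUFs need a DRM device). The helper name `same_inode` is made up for illustration, and also comparing `st_dev` is an assumption of this sketch, not something taken from the linked wlroots code.

```python
import os
import tempfile


def same_inode(fd_a: int, fd_b: int) -> bool:
    """Return True if both fds refer to the same inode on the same device."""
    a, b = os.fstat(fd_a), os.fstat(fd_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


# Stand-ins for DMA-BUF fds: two unrelated temp files, plus a dup of the first.
with tempfile.TemporaryFile() as f1, tempfile.TemporaryFile() as f2:
    dup_fd = os.dup(f1.fileno())
    try:
        dup_matches = same_inode(f1.fileno(), dup_fd)
        distinct_differs = not same_inode(f1.fileno(), f2.fileno())
    finally:
        os.close(dup_fd)
```

The check is only meaningful if every underlying buffer has its own inode, which is the caveat raised further down in the log.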
07:53MrCooper: yep, different use case from Mesa then
07:56MrCooper: I was thinking Mesa could export a dma-buf from the reference DRM fd, import it into the comparison DRM fd, and compare the GEM handles; if they're different, it's definitely not the same DRM file description, otherwise would need to make sure they don't just have the same value by coincidence though
07:58sima: emersion, isn't there an older kernel where this kinda fails because it's all the same anon inode underneath all the dma_buf files?
07:58emersion: i think this would have bad side effects
07:58pq: if a coincidence could happen, then it can lie both ways: false-match and false-different.
07:58sima: yeah gem dma-buf fd2handle isn't guaranteed to be cheap
07:59emersion: MrCooper: importing a DMA-BUF will keep a ref to the memory
07:59emersion: there's no atomic way to know if the import created a new GEM handle or not
07:59emersion: so you can't close it either
07:59MrCooper: I know, this would use a BO created for this purpose
08:00MrCooper: pq: false-different can't happen, importing a dma-buf back into the exporting DRM file description results in the same GEM handle
08:01pq: I guess I didn't understand what kind of algorithm you're describing. No worries.
08:02pq: I didn't even understand if you wanted to compare buffers or devices.
08:02emersion: why does mesa need to compare DRM FDs?
08:02MrCooper: DRM file descriptions
08:03pq: I assumed you wanted to compare buffers, because that's what has been talked about.
08:03MrCooper: emersion: it can get multiple DRM fds from separate API calls, and needs to know if they reference the same DRM file description, for correct tracking of GEM handles
08:03sima: emersion, just looked, before ed63bb1d1f846 your inode comparison trick doesn't work because all dma-buf share a singleton inode
08:03emersion: if it's about GEM handle ref'counting, it would be much cleaner to have proper ref'counting in the kernel…
08:04sima: that was v5.3
08:04emersion: sima, is there a way to check?
08:04MrCooper: pq: well the whole thing about comparing file descriptions started with the use case I'm talking about, some 4 years ago
08:05sima: emersion, create two different buffers, export them, if they have matching inode number then it's the singleton
08:05emersion: yeah not going to do that
08:05sima: or just assume that you won't get bug reports from these old kernels
08:05emersion:shrugs
08:05MrCooper: emersion: maybe, I hope your time machine is working ;)
08:05emersion: i had no bug report about this yet
08:06pq: MrCooper, my memory goes back about an hour or two. :-)
08:08emersion: MrCooper: could be done with import flags or such
08:20MrCooper: you mean a flag which always creates a new GEM handle on import? I guess that could work
08:21emersion: or a flag that increments a counter
08:22MrCooper: I see
08:22emersion: always creating a new handle would be simpler, if it covers the use-cases
08:28MrCooper: I do suspect there might be use cases which need to know if it's the same underlying BO
10:24tomba: Is it fine to push fixes to drm-misc-next-fixes, if the fixes are already on drm-misc-next?
10:29sima: tomba, needs a "cherry picked from $git_citation" line and a ping to drm-misc maintainers so they're aware because cherry picks tend to cause strange conflicts
10:29sima: but otherwise fine
10:29sima: mlankhorst, tzimmermann ^^
10:30tzimmermann: tomba, please use 'dim cherry-pick'
10:31tzimmermann: it's usually better to merge fixes into drm-misc-next-fixes or drm-misc-fixes first.
10:31tzimmermann: we can then backmerge into drm-misc-next
10:31tzimmermann: tomba, which commit needs to be cherry-picked?
10:32sima: yeah cherry-picks should be only for misplaced patches, not the default approach
10:33tomba: tzimmermann: 87f36e03c0f1 and c72211751870. I pushed them to drm-misc-next before rc6, but I guess I was still late for the feature freeze.
10:40tomba: tzimmermann: do you mean that it's better to merge fixes to drm-misc-next-fixes even outside the feature freeze? or only when in feature freeze, or close to it?
10:41tzimmermann: tomba, only when in feature freeze
10:42tzimmermann: tomba, these commits would have deserved a Fixes tag
10:42tzimmermann: tomba, shall i cherry-pick them into drm-misc-next-fixes? i'd add the Fixes tag then
10:43tomba: tzimmermann: uh that is true. I'm not sure how I missed the fixes tag...
10:43tomba: tzimmermann: yes please
10:43tzimmermann: ok
10:45tomba: Is there a way to know when we are in feature freeze? https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html just says "occurs after -rc6".
10:47tzimmermann: tomba, -rc6 is the last tag that still receives new features from drm
10:48tzimmermann: any change must be in drm-misc-next (or a similar per-driver tree) in the week before -rc6
10:48tzimmermann: maybe wednesday at the latest
10:48tzimmermann: PRs usually go out on thursdays and fridays. -rc6 happens on sunday
10:49tzimmermann: it's much harder to pick the correct branch during the merge window. i still haven't fully figured that out
10:50tzimmermann: i think drm-misc-next-fixes closes when the linux release happens
10:50tomba: tzimmermann: ok. so me pushing fixes on Saturday, and rc6 tagged on Sunday, didn't quite fit in =). but is there a way for me to know, between -rc5 and -rc6, when I should push a fix to drm-misc-next and when to drm-misc-next-fixes?
10:51tzimmermann: before wednesdays, -misc-next is safe
10:51tzimmermann: mondays after -rc6, it should be drm-misc-next-fixes
10:51tzimmermann: for the time in between, it's probably best to not push fixes
10:52tzimmermann: drm-misc-next might be done already, but drm-misc-next-fixes hasn't yet caught up
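The rule of thumb for fixes around -rc6 can be written down as a small helper. This is a sketch under assumptions: the function name, the `"hold"` return value and the example dates are made up, "before wednesdays" is read as strictly before the Wednesday preceding the -rc6 Sunday, and what happens once the release closes drm-misc-next-fixes is not modeled.

```python
from datetime import date, timedelta


def pick_branch_for_fix(today: date, rc6: date) -> str:
    """Suggest where to push a fix, given the (Sunday) date of the -rc6 tag."""
    wednesday_before = rc6 - timedelta(days=4)  # Wednesday of the week leading up to -rc6
    monday_after = rc6 + timedelta(days=1)      # first Monday after -rc6

    if today < wednesday_before:
        return "drm-misc-next"        # still safely before the feature freeze
    if today >= monday_after:
        return "drm-misc-next-fixes"  # freeze is in effect
    return "hold"                     # in between: better not to push fixes


rc6_day = date(2024, 4, 28)  # an example Sunday, used only for illustration
tuesday = pick_branch_for_fix(date(2024, 4, 23), rc6_day)
saturday = pick_branch_for_fix(date(2024, 4, 27), rc6_day)
monday = pick_branch_for_fix(date(2024, 4, 29), rc6_day)
```

With these inputs a push on the Saturday before -rc6 falls into the "hold" window, which matches the situation described above.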
10:54tomba: would it be possible to communicate the phase with tags? say, "if branch X has tag Y, but Linus hasn't tagged the final release, it's feature freeze".
10:55tzimmermann: no idea
10:55tzimmermann: maybe dim could warn about that
10:59tzimmermann: tomba, i've cherry-picked the commits
11:00tomba: tzimmermann: thanks!
11:01tzimmermann: sure, np
11:01pq: I had no idea that epoll would be *that* broken vs fork. OTOH, I believe the only good use of fork is to exec ASAP in the child.
11:02DragoonAethis: pq: Quite a lot of software forks without execve, web servers especially like it
11:03emersion: sh(1) as well
11:03pq: DragoonAethis, I guess the first thing they guarantee is that there never is more than one thread in a process?
11:03pq: I'm just reading https://lists.freedesktop.org/archives/dri-devel/2024-May/452958.html
11:04DragoonAethis: Nope, most of them are heavily threaded in worker processes too
11:05DragoonAethis: Another fun use case: Factorio (factory building game) on Linux can fork off to a separate process to implement non-blocking saving
11:05pq: DragoonAethis, how do they avoid the twilight zone of needing to be async signal safe in the child?
11:05pq: IOW, random deadlocks inside e.g. libc
11:05vsyrjala: hwentlan_: agd5f: can i get an rb or ack for https://lore.kernel.org/dri-devel/20240408190611.24914-2-ville.syrjala@linux.intel.com/ ?
11:07DragoonAethis: pq: No idea, but AFAIK fork spawns a single new child thread, and fork explicitly requires you to use async-signal-safe calls only in child processes from that point on: https://www.man7.org/linux/man-pages/man7/signal-safety.7.html
11:07pq: exactly, so how can they pull that off? :-)
11:09pq: things like malloc() and fwrite() are no-go there
11:09DragoonAethis: Hmm, I can't say how web servers etc do it, but I worked on a project once that forked off into worker processes to handle incoming clients
11:10DragoonAethis: And tbh things just worked, I didn't care much about signal safety and what not, but there was effectively no shared state between the controller/worker processes
11:10DragoonAethis: It was a C++ project too
11:10pq: libc probably has some mutexes inside, so if you manage to fork just at the time when any of those mutexes are locked...
11:11DragoonAethis: It was doing the polling in the controller process and worker processes just got an exclusive handle to work with from that point on
11:12pq: the thread that locked the mutex won't exist in the child, so nothing will release the mutex
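The locked-mutex-across-fork problem can be shown with a small POSIX-only sketch. It assumes a Python `threading.Lock` as a stand-in for a mutex inside libc; the function name is made up, and the child uses a non-blocking acquire so the demo reports the state instead of deadlocking.

```python
import os
import threading


def child_sees_lock_held() -> bool:
    """Fork while another thread holds a lock; report whether the child finds it locked."""
    lock = threading.Lock()
    locked = threading.Event()
    release = threading.Event()

    def holder():
        with lock:
            locked.set()    # tell the main thread that the lock is now held
            release.wait()  # keep holding it until told to stop

    thread = threading.Thread(target=holder)
    thread.start()
    locked.wait()

    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so the holder thread
        # that would release the lock is gone. Probe without blocking.
        got_it = lock.acquire(blocking=False)
        os.write(write_end, b"0" if got_it else b"1")
        os._exit(0)

    # Parent: read the child's verdict, then let the holder thread finish.
    os.close(write_end)
    verdict = os.read(read_end, 1)
    os.close(read_end)
    os.waitpid(pid, 0)
    release.set()
    thread.join()
    return verdict == b"1"


lock_held_in_child = child_sees_lock_held()
```

A blocking acquire in the child would wait forever, since the copy of the lock is held and no thread in the child will ever release it. Recent Python versions may print a DeprecationWarning about forking a multi-threaded process when this runs.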
11:13DragoonAethis: Unless they do some cleanup in the fork wrapper as well
11:13DragoonAethis: So that the child process automatically does some internal libc cleanup
11:13pq: somehow I doubt that
11:14DragoonAethis: https://git.musl-libc.org/cgit/musl/tree/src/process/fork.c
11:14DragoonAethis: Yeah it does that, for a few of these at least
11:14DragoonAethis: Not all though
11:14pq: if the clean-up was trust-worthy, then why the requirement for async-signal-safe...
11:17DragoonAethis: Because libc is a bag of evil no-good APIs that can't be reasonably cleaned up, and somebody just needed to have an Exit Jail Free card in the docs so that proper cleanup did not have to be implemented?
11:18sima: tomba, tzimmermann I wouldn't worry too much, since if something is misplaced we can fix it with a cherry-pick
11:18sima: it's a few days out of 2.5 months after all
11:18pq: or because different libc work differently here
11:18sima: picking the right branch after freeze seems to be more tricky anyway
11:19DragoonAethis: pq: https://github.com/bminor/glibc/blob/master/posix/fork.c
11:19DragoonAethis: >Although POSIX has dropped async-signal-safe requirement for fork (Austin Group tracker issue #62)
11:20sima: imo worrying so much that you delay pushing a fix is worse, since then it might get lost entirely
11:20sima: and that happened plenty enough
11:20heat: you only need async-signal-safe if you're forking a MT process, because the lock state (and anything locked) will be messed up
11:22pq: heat, yeah, and many libraries use threads under the hood.
11:22heat: the code you're looking at (that releases a bunch of locks) is a best-effort thing, see https://git.musl-libc.org/cgit/musl/commit/src/process/fork.c?id=167390f05564e0a4d3fcb4329377fd7743267560
11:23heat: basically "we would rather crash rarely vs deadlock consistently"
11:25pq: DragoonAethis, I think your link is talking about making fork() async-signal-safe. As in, calling fork() from inside a signal handler.
11:26DragoonAethis: pq: Yeah, you're right, sorry
11:39Company: robertmader[m]: Have you ever tested YUV playback with GTK's Vulkan renderer on non-intel non-amd?
11:40Company: I'm trying to confirm that this code has seen usage outside of radv
11:41Company: context is https://gitlab.freedesktop.org/mesa/mesa/-/issues/11125
11:53robert_mader: Company: writing again in IRC, not Matrix...: no, unfortunately not - but I got a RPi5 and soon a RK3588 which should allow me to test it in a while.
11:53robert_mader: P.S.: love the image you chose in the issue :)
11:53robert_mader: Company in theory it should be possible to make things work on Nvidia if we get https://github.com/elFarto/nvidia-vaapi-driver to work with Gst
11:53robert_mader: Or once the Gst vulkan decoders support dmabuf export
11:53robert_mader: Finally, one could test it on Qualcomm devices when building Gst with https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/6114
11:57Company: I just wanted to know how confident I should be in my code being correct
11:57Company: so that I know how authoritative I want to come across when arguing with dj-death ;)
11:58robert_mader: I mean - the fact that Intel is currently broken is evident by the big warning, no?
11:58Company: yeah, that's not in question
11:58Company: the question is more if GTK is doing the right thing - say if a fix comes up that doesn't work with GTK - is that likely a bug in anv or in GTK
11:59Company: and if GTK is only tested against radv, it might very well be a GTK bug
11:59Company: that was the way I was thinking
12:04robert_mader: Meh, unfortunately it won't be particularly easy to test on other platforms for now. I guess we can't count on panvk to support it, v3dv probably should but the HW decoder on the RPi5 is weird - Qualcomm might be the best bet for now. Assuming that lavapipe is out of question.
12:06Company: lavapipe can't do dmabufs afaik
12:07Company: software rendering not doing dmabufs was always one of my big complaints
12:15DemiMarie: Is there a way to validate a dmabuf from a potentially malicious client, such that if it passes, it won't cause undefined behavior to use it?
12:17DemiMarie: The use-case is virtualization without copying buffers back to the CPU.
12:19Company: I would hope so, because that's how flatpak apps render.
12:32pq: DemiMarie, I believe there should never be undefined or explosive behavior, it is up to the import API to ensure that.
12:33pq: DemiMarie, if the dmabuf is a restricted buffer where a wrong access would cause a machine check fault and reboot or whatever, it is the kernel drivers' responsibility to make sure any attempt of such access is prevented before it can happen. IOW, fail the import.
12:34pq: reality can be lacking and buggy, of course
12:35pq: DemiMarie, I guess you'd want to have explicit fences, either straight from the client or extracted from dmabuf, so your own rendering won't be stalled.
12:37DemiMarie: pq: what about shadow compressed state on Intel?
12:37pq: I dunno, what about it?
12:38pq: why would it be an exception?
12:38DemiMarie: I recall some Vulkan API not doing proper validation and potentially causing GPU hangs.
12:39pq: and you want to work around driver bugs in your userspace program?
12:42pq: I feel like it should be easier to fix the driver and distribute that instead of coming up with a reliable out-of-driver check.
12:44Company: you could also just not advertise the critical modifiers (or formats) across process boundaries
12:45Company: that assumes you know which ones those are ofc
12:46Company: but flatpak/compositor has that problem, kernel/anything has it, and browser web process/ui process does, too - so it's not just vms
13:04MrCooper: Company: FWIW, lavapipe supports VK_EXT_external_memory_dma_buf now: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27805
13:09Company: that would mean if I compile me a mesa git and force lavapipe, I should get dmabuf support?
13:09Company: let's see what happens
13:12Company: that's also a neat way for testing GStreamer dmabuf negotiation I guess
13:16penguin42: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29105 says 'pipeline waiting for manual action' - what's that about - it seems to let me click the buttons to kick individual tasks to run
13:17agd5f: vsyrjala, feel free to add my RB
13:23Company: MrCooper: hrm, that's still using udmabuf and my udmabuf (F40) is not user-writable
13:24Company: (or user-readable for that matter)
13:30MrCooper: udmabuf seems to be the intended long-term solution for this
13:39MrCooper: jkhsjdhjs: https://patchwork.freedesktop.org/patch/593130/ should fix the SIGBUS issue
13:57jkhsjdhjs: MrCooper: yep thank you! I tested the patch on 6.9-rc7 and it seems to work fine
14:01MrCooper: cool
14:02cwabbott: ManMower: while looking at this I realized that upstream weston-rdp doesn't work with vulkan because it doesn't support dma-buf
14:03cwabbott: this seems to work, just copy-pasting the dma-buf stuff from the drm backend (minus feedback and direct display):
14:03cwabbott: https://www.irccloud.com/pastebin/fdnkjm4I/
14:03cwabbott: but I have no idea if that's "proper" or not... is there a reason it hasn't been done already?
14:04ManMower: my guess is that it was just forgotten :)
14:05ManMower: though maybe it's problematic for multi-backend situations.
14:05cwabbott: fwiw, on the hw I'm working on there's no GL except through zink
14:06ManMower: the future is now
14:06cwabbott: :)
14:06cwabbott: so no graphics support at all except through dma-buf
14:07ManMower: most of my rdp testing lately has been multi-backend, so the drm backend would've initted that for me and I'd not have seen that problem.
14:09cwabbott: we've been using weston-rdp a lot recently because you really don't want to actually use a device like this as your daily driver, and display is often broken
14:10ManMower: cwabbott: I think that new bit should just be in the switch statement later in that function under the WESTON_RENDERER_GL case
14:11cwabbott: ManMower: is there a reason it's not that way in the drm backend?
14:12cwabbott: it has a similar switch-case but calls linux_dmabuf_setup() later
14:12ManMower: drm backend is a "primary backend" exclusively, so it should always be first to init and doesn't have to check
14:13ManMower: is weston's multi-backend stuff useful to you at all? you can light up the drm output and have the rdp output show the same thing...
14:14cwabbott: right now my only drm output is a phone display
14:14ManMower: that is not my preferred way to look at the matrix.
14:14cwabbott: I don't really want my desktop to have those dimensions...
14:14ManMower: gotcha
14:17cwabbott: in general weston-rdp headless is really useful for bringing up stuff when you don't have display working yet, or don't want to dedicate a monitor to the device
14:18cwabbott: and of course for newer devices it's going to be more and more zink-only, at least at first
14:49vsyrjala: agd5f: thanks. ok if i push that into drm-misc-next?
14:49agd5f: vsyrjala, yes, please go ahead
15:01mlankhorst: agd5f: Hey, can I get some discussion going again on cgroups? https://lists.freedesktop.org/archives/dri-devel/2024-May/452150.html
15:01mlankhorst: Obviously appears to be a need for it
15:04agd5f: mlankhorst, sure. Will take a look. Seems like every time we try and do this through, it gets shot down by the cgroups maintainers
15:17MrCooper: JoshuaAshton: in wsi_explicit_sync_free_levels[], what's the reason for preferring (WSI_ES_STATE_RELEASE_MATERIALIZED | WSI_ES_STATE_ACQUIRE_SIGNALLED) over (WSI_ES_STATE_RELEASE_MATERIALIZED | WSI_ES_STATE_RELEASE_SIGNALLED) ?
15:59penguin42: is the fdo gitlab having a bad day ? I'm getting a 'Permission denied (publickey)' when it was working earlier
16:02penguin42: oh yeh, I see someone saying it on list
16:35DemiMarie: Is there hope for Linux getting to where Windows is, where there is no implicit sync at all and everything is done via explicit sync?
16:36jenatali: FWIW, GDI on Windows still operates with implicit sync
16:42zackr: i'm looking at igt's prime_mmap_kms.c (https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/prime_mmap_kms.c?ref_type=heads) which sets up the crtc, then grabs a prime fd for the framebuffer, then calls paint that just mmap's the dma-buf and writes to it, then it seems to expect the framebuffer to magically update (https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/intel/prime_mmap_kms.c?ref_type=heads#L226)
16:43zackr: does anyone know if that's actual dma-buf/kms expected behavior? meaning grabbing a prime fd for the active framebuffer and writing to it is expected to update the screen without any explicit flips in there
17:06MrCooper: DemiMarie: not realistically, we can't break backward compatibility with user space which depends on implicit sync
17:07DemiMarie: MrCooper: will it ever be possible for userspace that _does_ support explicit sync to be able to use features like userspace fences that don’t have completion guarantees?
17:08emersion: zackr: that's called front-buffer rendering
17:08emersion: i'm not sure KMS drivers are required to support it
17:09emersion: in particular, some KMS drivers might not realize that the buffer changed
17:09emersion: (especially things like GUD)
17:11zackr: emersion: yea, i'm not sure if that's the expected behavior. i can do dirty tracking for this stuff for vmwgfx but it's a pain so if that behavior is not expected i'd prefer to avoid it
17:13emersion: there's also PSR for instance
17:14emersion: by "dirty tracking", do you mean FB_DAMAGE_CLIPS and/or the dirtyfb ioctls?
17:14emersion: or something else?
17:14emersion: i would expect user-space doing front-buffer rendering to flip with the same FB after the buffer has been written to, personally
17:18zackr: emersion: for vmwgfx "dirty tracking" is a combination of all that plus implicit magic to cover up for the fact that the gem buffer/dma-buf isn't really a gpu buffer but a shadowed copy that will have to be updated (readback if written to on the gpu, upload when changed on the cpu)
17:18zackr: which is why front-buffer updates are a pain. we need to track the updates, then notice that the given buffer is actually a front-buffer then upload that immediately to be able to present it
17:18emersion: do you somehow detect that someone did a SYNC_END IOCTL?
17:19emersion: maybe sima knows what's expected here
17:21zackr: well, tbh currently we're broken wrt to this (meaning prime and fb updates). for a similar behavior on coherent surfaces we just page-fault on maps, explicitly mark whatever was accessed as dirty internally and then we scan command buffers and if we see that something has been used and is dirty we update it
17:23zackr: if we had more igt tests for dma-buf/kms/sync's that'd make it obvious. although not sure if we'd want igt to be the reference for expected behavior
17:24penguin42: (the gitlab ssh seems to have woken up)
17:24daniels: the prime user for frontbuffer rendering is xserver, though it's not (unless it's got weirder whilst I haven't been looking) mmaping a dmabuf to do that
17:28zackr: tbh, i'm fishing a little bit. i know the new kde/kwin is broken on vmwgfx and it's definitely dma-buf/prime related. i know that the DRIVER_ANY tests in kms pass on vmwgfx, so i'm just looking at driver specific tests to see if i can dig up something to test that doesn't involve debugging a whole desktop
17:28emersion: daniels, dumb buffer mmap?
17:29emersion: with or without a flip after dirtying the mmap?
17:30daniels: I think yes, and without
17:35zmike: linyaa: can you test https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29108 and confirm that it solves your issue
17:43emersion: daniels, do drivers have a way to know when user-space has mutated the dumb buffer?
17:57daniels: emersion: I can't remember if flipping is mandatory or not tbh
17:59Company: dj-death: you got the reproducing working for !11125 ?
18:01Company: DavidHeidelberg: your PASS => FAIL thing in !9989 is not related to F39=>F40 which had the change to qsort() behavior in libc?
18:02jannau: flipping is not mandatory and it doesn't happen with the modesetting driver. This breaks X applications on apple silicon since the display driver has no vsync interrupt. That results in fast animations since everything relies on vsync sequences for timing
18:03emersion: maybe we should have a CAP for this
18:04emersion: or… just make user-space flip
18:04emersion: jannau: does modesetting use dirtyfb ioctls?
18:08DavidHeidelberg: Company: wow. this. Could be the difference....
18:08Company: yay?
18:09DavidHeidelberg: now to find, when Debian did it
18:09DavidHeidelberg: because it worked for me before, but then it stopped (and I'm using testing/unstable)
18:09DavidHeidelberg: MesaCI is on Debian stable
18:09DavidHeidelberg: and there it works
18:09Company: what's Intel CI on?
18:10DavidHeidelberg: Company: they use apt, but question is which ver
18:10jenatali: Ugh, qsort has been a nightmare for Windows/MSVC too, causing all kinds of app compat bugs
18:11Company: jenatali: you get partial credit for that anyway, because you told me about that change back when
18:11jannau: emersion: it does: https://gitlab.freedesktop.org/xorg/xserver/-/blob/master/hw/xfree86/drivers/modesetting/driver.c
18:13emersion: jannau: so the driver could simulate a flip after the dirtyfb ioctl?
18:14jannau: since tearfree was merged it even does real flips if the option is enabled
18:17jannau: yes, we probably could do a page flip after in the dirty update assuming that a flip with the same framebuffer does not confuse the firmware
18:19dj-death: Company: yeah, but only on TGL/ADL
18:19dj-death: Company: I already spotted one bug
18:19dj-death: Company: still digging
18:19dj-death: DG2+ is not affected
18:20Company: I am on TGL
18:20Company: but had reports from somewhere else
18:20dj-death: I assumed, or something older
18:20Company: yeah
18:20dj-death: all older ones should be affected in a similar way
18:21Company: I just assumed everyone is affected because of the debug messages it prints
18:23mattst88: Company: what is the qsort behavior change? (is it related to the security issue from a couple of months ago?)
18:23Company: mattst88: equal items are ordered differently now
18:24Company: mattst88: I think it's related to https://www.phoronix.com/news/Intel-AVX-512-Quicksort-Numpy making its way into glibc?
18:24Company: not sure though
18:28mattst88: interesting.
18:28mattst88: I'm guessing https://sourceware.org/git/?p=glibc.git;a=commit;h=709fbd3ec3595f2d1076b4fec09a739327459288 is relevant
18:29mattst88: looks like that commit (and the one it references) were both in glibc-2.39
18:29Company: so they reverted it again?
18:29janesma: Intel CI is on debian testing
18:30Company: so that in a year when we're all back to the old sort, MesaCI switches to the broken one?
18:30janesma: Company ^
18:30Company: DavidHeidelberg ^
18:30DavidHeidelberg: janesma: I'm on testing/unstable too.. so that would explain it
18:31Company: so CTS is comparing something as equal where the order actually matters?
18:31DavidHeidelberg: zmike: what do you run the tests on?
18:31zmike: define "what"
18:31DavidHeidelberg: system and version
18:32janesma: our docker containers have libc-bin/testing,now 2.37-15 amd64 [installed,automatic]
18:33zmike: f39
18:33Company: Fedora 39 has glibc 2.38 and the change was in glibc 2.39 I think which is Fedora 40
18:35janesma: DavidHeidelberg: isn't the pbuffer config selected at deqp compile time? we build with -DDEQP_TARGET=x11_egl
18:35DavidHeidelberg: I used x11_egl locally. I'll try to build it with mesaci config
18:35janesma:expected recompiling deqp was the action that changed the behaviour
18:36zmike: I'm not using surfaceless locally
18:36janesma:attempted pbuffer/surfaceless in the distant past and found that the configs were *not* the same
18:36zmike: but if it really is a glibc change breaking this then...
18:37DavidHeidelberg: ..then we'll need to fix CTS :D
18:37janesma: it would be lovely if we didn't have to launch X, but the test suite was not in a position to test without it.
18:37DavidHeidelberg: janesma: you can use weston + xwayland :) in MesaCI it's usually less buggy than X
18:38Sachiel: it runs into the same issues as plain x
18:42DavidHeidelberg: janesma: hmm, nope, I built it like MesaCI and I'm still getting the fail
18:42Company: zmike: time to dnf install --releasever=40 glibc and find out!
18:43zmike: that sounds like a project for tomorrow's me
18:46DavidHeidelberg:section: dark humor; task: break previously working stuff
19:07penguin42: looks like the panfrost-g52 in CI is toast
19:14daniels: penguin42: it's been disabled
19:15dj-death: Company: fixed
19:15dj-death: Company: will upload the MR shortly
19:16dj-death: let it be known I used the most advanced debug tool to find the issue : printf
19:17Company: debugging video is like that
19:23dj-death: Company: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29111
19:37DemiMarie:wonders if all libcs should just use block sort
19:38Company: POSIX should just demand a stable sort
19:39penguin42: daniels: Ah, thanks; do I need to do anything with a pipeline in which it's already failed?
19:45DemiMarie: Company: yup, and block is a stable in-place comparison sort, hence my suggestion.
20:14DavidHeidelberg: Company: I tried downgrading to 2.38 without patches and sadly it still failed
20:16Company: boooo
20:18daniels: penguin42: nope, just reassign
20:24penguin42: daniels: Thanks
20:24daniels: np
20:29Company: DavidHeidelberg: just to make sure: none of those tools are statically linked, right?
21:16DavidHeidelberg: Company: rebuilt mesa and GL-CTS with 2.38-1 package and still failing
21:19Company: then I'm sorry to have wasted your afternoon
21:19Company: it was a good idea at least
21:23DavidHeidelberg: well, it's not like we have any idea where the issue is :D so every idea can be the right one
22:14DavidHeidelberg: so, for me and Intel, it's the GL-CTS which doesn't validate it correctly
22:14DavidHeidelberg: I checked against modified eglinfo https://gitlab.freedesktop.org/mesa/demos/-/merge_requests/176 and the output is correct, but GL-CTS expectation is ... suddenly wrong
22:15DavidHeidelberg: Is there someone with exhaustive ability to see if there is something uncertain within this patch: https://github.com/KhronosGroup/VK-GL-CTS/commit/88ba9ac270db5be600b1ecacbc6d9db0c55d5be4 ?
22:16DavidHeidelberg: SOMETHING has to be undefined or random enough: when it gets compiled in one setup, it validates correctly, but it fails when compiled with different options/libs/arch/whatever
22:17DavidHeidelberg: From my examination, the "Expected" state seems to ignore this patch 100%, while the "Got:" part seems to be correct (at least regarding EGL_CONFIG_SELECT_GROUP_EXT behaviour)
22:27Sachiel: what test fails?
22:31DavidHeidelberg: ok, I think I have it. GL-CTS checks for the extension, and if it's not there, it completely ignores it
22:31DavidHeidelberg: sample one: ./modules/egl/deqp-egl -n "dEQP-EGL.functional.choose_config.simple.selection_and_sort.buffer_size"
22:32DavidHeidelberg: but the whole group fails
22:33DavidHeidelberg: So in my build I don't see EGL_CONFIG_SELECT_GROUP_EXT exposed anymore
22:33Company: DavidHeidelberg: Not that I want to beat a dead horse - but that's C++ and using std::sort() - did libstdc++ have a similar issue maybe?
22:34jenatali: std::sort isn't guaranteed stable, there's std::stable_sort if you want stability
22:34Company: yeah, that's what I was about to suggest
22:34Company: switching to std::stable_sort for testing
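For illustration (not part of the log): a minimal C++ sketch of the std::sort vs std::stable_sort point above. `Config`, `byBufferSize`, `stableOrder`, and `checkStableOrder` are hypothetical names made up for this example, not GL-CTS code. The comparator looks at one field only, so entries with the same value compare as equivalent. std::sort may leave such ties in any order, while std::stable_sort keeps them in input order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an EGL config: only `bufferSize` takes part in
// the comparison; `name` records which entry is which.
struct Config {
    int bufferSize;
    std::string name;
};

// Orders by bufferSize only, so configs with equal bufferSize are
// equivalent and their relative order is up to the sort algorithm.
inline bool byBufferSize(const Config& a, const Config& b) {
    return a.bufferSize < b.bufferSize;
}

// Returns the names after a stable sort: ties keep their input order.
inline std::vector<std::string> stableOrder(std::vector<Config> configs) {
    std::stable_sort(configs.begin(), configs.end(), byBufferSize);
    std::vector<std::string> names;
    for (const Config& c : configs) names.push_back(c.name);
    return names;
}

// Small self-check: with std::stable_sort the tie order is deterministic.
inline void checkStableOrder() {
    const std::vector<Config> in = {{32, "a"}, {24, "b"}, {32, "c"}, {24, "d"}};
    const std::vector<std::string> expected = {"b", "d", "a", "c"};
    assert(stableOrder(in) == expected);
}
```

Swapping std::stable_sort for std::sort in `stableOrder` would still sort by `bufferSize`, but the order of "b"/"d" and "a"/"c" would no longer be guaranteed, which is the kind of build-dependent difference being suspected here.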
22:40DavidHeidelberg: I see EGL_KHR_debug (one line above), but I don't see EGL_EXT_config_select_group
22:42Sachiel: how does that work? Does the driver the X server is using need to know about the extension too or is just a client thing?
22:45DavidHeidelberg: ... or maybe it always worked with GLVND disabled, but not enabled. We'll see after compilation finishes
22:46DavidHeidelberg: YUP. GLVND on/off is the difference.
22:48DavidHeidelberg: because with GLVND on it doesn't report the extension. So GL-CTS says.. I don't care... and this test fails
22:48Sachiel: the test should probably check for the extension then
22:50DavidHeidelberg: it does.
22:50DavidHeidelberg: But mesa compiled with GLVND won't report it
22:51DavidHeidelberg: I verified I'm running with the correct mesa, formats are listed, but the EXT is not reported
22:51DavidHeidelberg: that's why it worked for some people and setups.. where GLVND was not used
22:52zmike: so glvnd needs to explicitly add support for this or something?
22:52Company: sweet, progress!
22:52Sachiel: what happens with the cts for anything that doesn't support the extension?
22:52DavidHeidelberg: I need to figure out why the extension is not added there, the processing is different than in non-GLVND workflows
22:52DavidHeidelberg: Sachiel: it just does the test the old way
22:53DavidHeidelberg: with the extension, GL-CTS accounts for our extension when sorting visuals (so it allows us to put some extra visuals at the end of the list)
22:53DavidHeidelberg: but without EXT, it would scream and say there is some irregularity, as it does now for GLVND
22:54Sachiel: ah, the failure is the cts thinking there's no extension while the driver is using it, ok
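For illustration (not part of the log): a minimal C++ sketch of the kind of extension check being described. `hasExtension` is a hypothetical helper, not the GL-CTS implementation. It assumes the extension list is a space-separated string, which is the form eglQueryString(display, EGL_EXTENSIONS) returns, and matches whole tokens so that a longer name sharing the same prefix does not count.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `name` appears as a whole token in
// the space-separated `extensionList`.
inline bool hasExtension(const std::string& extensionList,
                         const std::string& name) {
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;
    }
    return false;
}

// Small self-check mirroring the situation in the log: the same driver
// behaviour, but the name is present in one list and missing in the other.
inline void checkHasExtension() {
    const std::string wanted = "EGL_EXT_config_select_group";
    assert(hasExtension("EGL_KHR_debug EGL_EXT_config_select_group", wanted));
    assert(!hasExtension("EGL_KHR_debug", wanted));
}
```

If the string is missing from the reported list, a test built on a check like this falls back to the old sorting rules even though the driver still orders its configs by group, which matches the mismatch Sachiel summarizes.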
22:54Sachiel: GL land is fun
22:54DavidHeidelberg: I thought GLVND might strip the last string from the array (something like a \0 problem), but that seems not to be the case (I added an extra entry, which is not accounted for in GLVND mode either)
22:54DavidHeidelberg: :D
22:55zmike: 🤕
22:55DavidHeidelberg: janesma: btw. maybe you could disable GLVND for CI, or do you find it useful there?
22:57zmike: well it SHOULD work with glvnd
22:57zmike: what's going on there?
22:58karolherbst: this reminds me of the issue, that some applications used a fixed max buffer for the ext string and at some point mesa reported too much
22:59Sachiel: maybe someone should write GL_EXT_report_extensions_in_a_sensible_manner
22:59DavidHeidelberg: interesting, I compiled mesa without glvnd, now with and I see both extensions (I added one extra).
23:00DavidHeidelberg: I removed the extra EXT I added for testing, recompiled. Nothing, still see it. re-ran meson setup, recompiled, still see it
23:01DavidHeidelberg: maybe I need to study glvnd a bit more, but still that's weird it happens also in Intel CI then, because it should be a clean build
23:02zmike: seems like they use glvnd
23:02zmike: if it's failing
23:02DavidHeidelberg: that's for sure
23:03DavidHeidelberg: that's why I asked janesma what their motivation is to use it in CI for one vendor, one driver
23:04Sachiel: to test what distributions will use
23:04airlied: yeah we should probably prefer glvnd testing
23:05airlied: since I don't think anyone actually ships non-glvnd configurations
23:55DavidHeidelberg: I opened https://gitlab.freedesktop.org/glvnd/libglvnd/-/merge_requests/294 I hope it's the right place; so far, for GLVND-enabled builds, I haven't found a way to sneak it in there other than adding it to GLVND
23:56janesma: thank you for investing the time to get to the bottom of this.
23:59DavidHeidelberg: good thing is, it'll work even without new glvnd, but the system will not know about it (generally no test was failing except the sort, so it shouldn't matter to real-world apps)
23:59DavidHeidelberg: janesma: thank you for helping :)