IRC Logs of #wayland on irc.freenode.net for 2023-07-04

06:04 vyivel: regarding https://gitlab.freedesktop.org/wayland/wayland-protocols/-/issues/108 and https://gitlab.freedesktop.org/wayland/wayland-protocols/-/issues/109, can it be specified that acking a configure while uninitialized is an error (xdg_surface.error.not_constructed), or would that be a breaking change?
06:46 wlb: weston Merge request !1301 merged \o/ (tests: Initialise breakpoint list for all test types https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1301)
06:46 wlb: weston/main: Daniel Stone * tests: Initialise breakpoint list for all test types https://gitlab.freedesktop.org/wayland/weston/commit/17e4a778310a tests/weston-test.c
08:22 pq: Reading the GPU reset discussion on dri-devel@, if gfx API starts ignoring calls, leading to application window looking as if it was frozen, is there anything a compositor could or should do to help the end user? E.g. a dialog "This app seems to be stuck. Do you want to: - try to close it gracefully, - terminate it?"
08:22 pq: Ignoring gfx API calls means the Wayland client is not stuck from protocol perspective, it just happens to not be posting any new surface contents.
08:23 pq: I can't think of a really good way to detect that situation in a Wayland compositor.
08:25 orowith2os: pq I don't think a compositor has any sane way to know anything's going on, realistically?
08:25 emersion: what would eglSwapBuffers do?
08:25 pq: not without some kind of "send me a new buffer now" event, anyway
08:25 emersion: a commit with no new buffer?
08:25 emersion: what if the compositor wants to resize?
08:25 pq: emersion, good question.
08:26 emersion: it seems like it could pretty easily end up with a protocol error
08:26 orowith2os: all it can do is see if it responds to pings across Wayland - unless somehow that stops responding too, which sounds more common?
08:26 orowith2os: based on my current knowledge
08:27 orowith2os: what are the chances that OpenGL or Vulkan dies, and the window containing their contexts doesn't stop functioning and responding?
08:27 pq: I'm thinking of the UX here which would be horrible: the window looks frozen but the app is still acting on all input as it would normally
08:28 emersion: right. so from UX PoV, it would be best to detect the reset as soon as it happens, and then gray out the window
08:28 pq: what would stop the end user from clicking all over the place before they try to close the window? Possibly doing things they never intended.
08:28 emersion: also, the reset can be per-context, so per-window, rather than per-process
08:29 emersion: pq, if the compositor greys out the window, it can also block input
08:29 pq: I don't think the compositor would know if the app is actually using robustness and has handled the reset already
08:29 emersion: i'm purely talking from a UX PoV for now
08:29 emersion: ie, ignoring current APIs and technical limitations
08:29 pq: yes, that would be ideal, but how could we get there?
08:30 emersion: so, to react, the compositor needs some kind of signal
08:30 pq: or a better question: is this a problem worth solving?
08:30 emersion: i'm not sure
08:30 pq: me neither
08:30 jadahl: seems worth solving IMO. a frozen window with reactive invisible buttons seems bad
08:30 emersion: i've seen GPU resets only affect the whole GPU for now
08:31 emersion: jadahl: but have you seen this in practice so far?
08:31 jadahl: emersion: personally no, I have an intel GPU and the only hiccups I have are crashes and deadlocks deep in iris code
08:32 emersion: any user bug reports?
08:32 jadahl: mostly nvidia proprietary and amd, but we don't really handle resets very well
08:32 emersion: all i've seen so far from my side (personally and bug reports) are whole GPU reset
08:32 jadahl: yea, whole gpu
08:32 pq: I've seen games on Proton/Xorg get stuck leading exactly to this UX, and I suspect it's not a GPU hang or reset.
08:33 jadahl: but let's say we have an opengl client that doesn't know how to handle resets. how do we know a surface invalidation response has content from a new and shiny context?
08:33 pq: I'd assume gfx APIs refuse to post a buffer if they cannot actually draw it.
08:34 emersion: jadahl: there cannot be content from an old context, since the old context is gone
08:36 pq: maybe a compositor would take re-using old buffers as not having handled the reset
08:37 pq: or does reset handling not require re-creating gfx API surfaces too? e.g. EGLSurface
08:37 emersion: pretty sure it does
08:37 emersion: the GPU memory is gone
08:38 emersion: the whole EGL context needs to be re-created, and EGLSurface depends on the EGLContext
08:38 emersion: i haven't checked vulkan
08:38 emersion: (but would assume something similar)
08:41 jadahl: emersion: so if a new buffer is attached, it should be fine then
08:41 emersion: i think so, yeah
08:41 jadahl: just need to verify it's actually *new* and not attaching some old buffer
08:41 jadahl: so if it is dumb and re-attaches an old buffer, it must continue to be greyed out
08:42 jadahl: only that we won't know that because it might have attached new valid buffers before the invalidation.. hmm...
08:43 emersion: vulkan: the whole logical device is lost, which the swapchain depends on
08:44 emersion: (https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#devsandqueues-lost-device)
08:44 emersion: right, the ordering between compositor and client reset notifications is undefined
08:47 jadahl: an easy way out is to go on about that it's the responsibility of the client to have an up to date graphics context etc and if it doesn't support that, don't implement that extension
08:47 jadahl: for wl_shm ones it's trivial since the CPU is the graphics context
08:48 jadahl: but then I guess there is a chance that the compositor sees the reset, sends an invalidation, before the client knows about the reset
08:53 pq: emersion, does an EGLSurface actually depend on an EGLContext?
08:53 pq: eglMakeCurrent can mix and match arbitrarily
08:55 emersion: ehhh
08:55 emersion: > Any EGL rendering context that was created with respect to config can be used to render into the surface.
08:55 pq: it kinda depends on EGLConfig, yes, but... :-)
08:57 pq: emersion, a GPU reset might not invalidate *all* GPU memory. But I think it should invalidate all memory a failing context may have been writing to. OTOH, the allocation might still persist and only contents are lost, which might be remedied by simply writing it all again, meaning the wl_buffer is still the same. Something would need to not do such shortcuts.
08:59 emersion: pq, eglMakeCurrent will report EGL_CONTEXT_LOST
08:59 emersion: but yeah the EGLSurface may continue to work with another context
08:59 emersion: in particular, with a fresh new context
09:01 emersion: i don't know enough to say whether a buffer can survive a context loss
09:01 pq: exactly, and are buffers associated with EGLSurface, not context?
09:01 emersion: buffers are not exposed
09:01 emersion: it's all hidden inside the EGL impl
09:01 pq: yeah, but they exist
09:01 pq: does the app need to destroy EGLSurface to ensure no old buffers are re-used?
09:02 emersion: right, so you mean EGL impls store buffers per-surface
09:02 pq: does context lost require apps to destroy also EGLSurfaces?
09:03 pq: if yes and yes, then a compositor could be certain that when it sees a brand new buffer, it cannot come from the failed client context
09:04 jadahl: pq: how can it know the "brand new" buffer is from just before or just after the gpu reset?
09:04 pq: no and yes would be fine too. Yes and no would not.
09:05 pq: jadahl, exercise left to the reader :-p
09:05 jadahl: :P
09:15 MrCooper: the compositor can't know if the client has created a new context, or even if the client needs to create a new context
09:25 wlb: weston Issue #769 opened by Marius Vlad (mvlad) xwayland _NET_WM_STATE_ABOVE property / gitk crash relative & abs positioning https://gitlab.freedesktop.org/wayland/weston/-/issues/769 [XWayland]
09:33 wlb: weston Issue #770 opened by Marius Vlad (mvlad) Timeline debug scope crash at start-up / begin fence sync https://gitlab.freedesktop.org/wayland/weston/-/issues/770 [GL renderer]
10:14 pq: emersion, swick[m], what's the design for libdisplay-info high level API returning structs? Should I store them in struct di_info to have implicit lifetime, or let the caller free() them, or?
10:25 emersion: i think it depends whether they are the same for the whole lifetime of the di_info
10:25 pq: they are the same
10:26 emersion: if that's the case, we've usually just stored them in di_info/di_edid/etc and then returned a const pointer… but now that i think about it…
10:26 emersion: this case is a tad different, since the struct is computed from di_edid
10:27 pq: from di_info, yeah
10:27 emersion: returning an alloc'ed struct would work, but if we add some more alloc'ed fields in there, then we'll leak, so would probably need to expose a _destroy() function for it too
10:27 pq: yes
10:28 emersion: taking a pointer and filling it would cause more issues, ABI-wise and alloc-wise
10:28 emersion: i think the simpler solution is still to return a const pointer, with everything in it owner by di_info
10:28 pq: I could store a pointer in struct di_info that is NULL until the getter is called, and automatically all freed when di_info is destroyed?
10:28 emersion: owned*
10:29 emersion: yeah, that would work too
10:29 emersion: and would just be an impl detail, the API would be the same
10:29 pq: yup
10:30 emersion: i suppose one detail is whether this function would return NULL on failure
10:30 emersion: if the struct is alloc'ed, it needs to be able to return NULL
10:31 emersion: if the struct is embedded in di_info, and the function cannot fail, it can guarantee that it never returns NULL
10:31 emersion: but this also restricts our future API extensions
10:31 emersion: as always, it's a balance
10:32 emersion: i'd be fine with it either way
10:32 pq: we have precedence of high level API returning NULL already
10:32 pq: *precedent
10:33 pq: do you mean the caller should be able to tell the difference between alloc fail and no info?
10:33 emersion: hm, no
10:34 pq: I'd pick dyn alloc, store pointer in struct di_info for automatic free, and doc the API maybe returning NULL.
10:35 emersion: i mean that if the function never returns NULL, it'd save callers from doing NULL checks (at the cost of restricting our extensibility)
10:35 emersion: yeaah, that's fine by me
10:36 pq: cool
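
The getter design agreed on above (allocate on first use, keep the pointer inside the info object, return a const pointer that may be NULL, free everything on destroy) could look roughly like the sketch below. Every name here (`demo_info`, `demo_summary`, the `demo_info_*` functions) is hypothetical and only illustrates the ownership pattern; none of it is libdisplay-info's actual API, and the "summary" contents are made up. The getter takes a non-const pointer for simplicity, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical high-level struct handed out to callers. */
struct demo_summary {
	int width_mm;
	int height_mm;
};

/* Hypothetical stand-in for the parsed info object. */
struct demo_info {
	int raw_width_cm;
	int raw_height_cm;
	struct demo_summary *summary; /* NULL until the getter is first called */
};

struct demo_info *demo_info_create(int width_cm, int height_cm)
{
	struct demo_info *info = calloc(1, sizeof(*info));
	if (!info)
		return NULL;
	info->raw_width_cm = width_cm;
	info->raw_height_cm = height_cm;
	return info;
}

/* Returns a pointer owned by info, or NULL if allocation fails.
 * The caller must not free it; it stays valid until demo_info_destroy(). */
const struct demo_summary *demo_info_get_summary(struct demo_info *info)
{
	assert(info);
	if (!info->summary) {
		struct demo_summary *s = calloc(1, sizeof(*s));
		if (!s)
			return NULL;
		s->width_mm = info->raw_width_cm * 10;
		s->height_mm = info->raw_height_cm * 10;
		info->summary = s;
	}
	return info->summary;
}

/* Frees the cached struct together with the info object. */
void demo_info_destroy(struct demo_info *info)
{
	if (!info)
		return;
	free(info->summary);
	free(info);
}
```

Because the result is cached, repeated calls return the same pointer, and adding more allocated fields to the struct later only changes the destroy function, not the caller's code.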
10:36 emersion: also, fwiw, while we are on this topic… i've been wondering whether i want to guarantee ABI stability for my libraries across major and minor releases
10:36 emersion: (API stability for sure, but ABI is different)
10:38 emersion: the consequence would be that dependent binaries would need a rebuild for major and minor library upgrades
10:39 emersion: and that it would allow us to be less constrained when designing APIs (especially when structs are fed to the library)
10:40 emersion: but yeah, it's an Unpopular Opinion™
10:48 pq: you'd be using something else than semantic versioning if I understood right?
10:50 pq: I'd assume distributions would hate it as much as they hate bundling libs in apps.
10:57 pq: E-EDID... I guess priorities are DDDB > DI-EXT > base
11:00 emersion: semantic versioning is just about API AFAIK
11:01 emersion: not about ABI
11:01 emersion: so it's still semantic versioning:
11:01 emersion: - major versions break API
11:01 emersion: - major and minor versions break ABI
11:02 emersion: IOW, minor versions never require downstream code to change, just a rebuild
11:02 emersion: example ABI change which isn't an API change: adding a field to a struct
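
A small sketch of the "adding a field to a struct" case mentioned above. The two struct definitions stand for the same public struct as declared in two hypothetical releases (1.0 and 1.1) of an imaginary library; the names are invented for illustration. Source code that only sets `width` and `height` compiles unchanged against either header, but a binary built against 1.0 reserves less memory than a 1.1 library would read, which is why a rebuild is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Layout as declared by the hypothetical 1.0 header. */
struct demo_opts_v1_0 {
	int width;
	int height;
};

/* Layout as declared by the hypothetical 1.1 header: one field appended. */
struct demo_opts_v1_1 {
	int width;
	int height;
	int refresh_mhz;
};

/* Bytes an application compiled against 1.0 allocates for the struct. */
size_t demo_caller_size_v1_0(void)
{
	return sizeof(struct demo_opts_v1_0);
}

/* Bytes a 1.1 library assumes it may access through the pointer. */
size_t demo_library_size_v1_1(void)
{
	return sizeof(struct demo_opts_v1_1);
}

/* The existing fields keep their offsets (API-compatible source),
 * but the overall size grows (ABI-incompatible binaries). */
void demo_check_layouts(void)
{
	assert(offsetof(struct demo_opts_v1_0, width) ==
	       offsetof(struct demo_opts_v1_1, width));
	assert(offsetof(struct demo_opts_v1_0, height) ==
	       offsetof(struct demo_opts_v1_1, height));
	assert(demo_library_size_v1_1() > demo_caller_size_v1_0());
}
```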
11:10 pq: I've always thought the ABI is the most important thing to keep stable.
11:13 pq: maybe that's just a quirk of C again, where ABI and API can be distinguished
11:14 pq: https://github.com/semver/semver/issues/590 says "up to you"
11:16 davidre: Qt and KDE Frameworks for example keep ABI between Major versions
11:16 davidre: And we have a fun page that explains what you can and cannot do
11:17 emersion: pq, why is the ABI important to keep stable?
11:17 emersion: in all other languages, one doesn't need to concern itself with ABI
11:17 pq: emersion, so that distributions (and users) can upgrade libs without rebuilding the world.
11:17 emersion: badly phrased
11:17 emersion: but that's not an issue with Rust, or Go, or…
11:18 pq: yes, the C world is weird but it's also big
11:19 pq: yeah, it's not an issue if shared libs do not exist in the first place. You cannot update any single dependency without rebuilding a whole lot.
11:19 pq: I don't think you install any Rust or Go deps via distros?
11:20 kennylevinsen: you do, but it's a bit weird
11:20 kennylevinsen: I think that's why rust added support for rust shared libs
11:20 emersion: debian insists on packaging each and every Go dependency, as well as JS, etc
11:20 kennylevinsen: just like how some distros package python packages outside pip
11:24 pq: python packages I kinda understand since you never statically link anything... do you?
11:25 kennylevinsen: different technology, same intent
11:26 pq: Rust and Go are built on statically linking a monolithic executable, and anything aside from that is "you keep the pieces", isn't it? So "installing deps" is really just getting the sources, and not build artifacts?
11:27 pq: If "linking" and "build from source" are not distinguishable steps, then ABI in libs seems completely irrelevant.
11:32 kennylevinsen: Rust supports building shared libs for use as Rust dependencies
11:32 kennylevinsen: Not sure if any distro uses it currently, but I imagine it allows packaging similar to C shared libraries
11:33 pq: kennylevinsen, yeah, but I understood that it is also absolutely unstable to changes in anything.
11:33 pq: so it might as well not exist for practical purposes
11:33 pq: It's hard for me to argue in any direction here, because I am only assuming what distributions want, and I've been criticised for that before.
11:34 kennylevinsen: I suppose that issue goes away when the distro picks compiler and source, but it does start to feel like a tangent. :)
11:35 pq: yeah, and then every app uses a slightly different version of the lib :-p
11:36 pq: I guess the fundamental reason to do anything is to be able to address (security) bugs with minimum work.
11:37 emersion: note, bugfix releases can still guarantee ABI stability
11:38 pq: I suppose we're not talking about languages which do not meaningfully have shared libs like C does.
11:38 pq: so, yes
11:39 pq: I guess distros do exactly that, backport patches to libs themselves.
11:46 wlb: weston/main: marius vlad * backend-drm: Use resize_output to allow changing the fb https://gitlab.freedesktop.org/wayland/weston/commit/3044d8ed7259 libweston/backend-drm/drm.c
11:46 wlb: weston Merge request !1300 merged \o/ (backend-drm: Use resize_output to allow changing the fb https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1300)
11:51 emersion: what is a good test client with heavyweight rendering such that DMA-BUFs aren't ready when submitted to the compositor?
11:51 emersion: weston-simple-dmabuf-egl -m isn't enough it seems
11:52 emersion: (i get POLLIN immediately)
11:53 emersion: i would've asked MrCooper, but apparently not here
11:55 pq: the mandelbrot is not enough? tune the params there? shouldn't you be polling for writable?
11:58 emersion: i've tested with https://github.com/ascent12/compositor-killer
11:58 emersion: and that was enough
11:59 emersion: no, POLLOUT checks the exclusive fence, not the shared one
12:01 daniels: emersion: obnoxious-fbo-load should do a reasonable job at https://gitlab.collabora.com/daniels/texture-atlas-test/-/tree/master
12:01 swick[m]: https://blogs.gnome.org/shell-dev/2023/03/30/ensuring-steady-frame-rates-with-gpu-intensive-clients/
12:02 swick[m]: GpuTest apparently works well
12:02 emersion: "pix = 0xdeadbeefUL;"
12:03 emersion: ty for the suggestions!
12:09 pq: hmm, but if you want to read, shouldn't you ensure that the writer has finished? That is, check that no-one is holding the exclusive fence? There was something weird about this which may have been an AMD quirk. Polling per se does not "take locks" on the buffer, it's just a way to check if anyone else has a fence open.
12:16 emersion: AMD used to have a bug where POLLIN would always return immediately
12:16 emersion: but it's been fixed
12:17 pq: cool, that was probably it
12:17 emersion: https://docs.kernel.org/driver-api/dma-buf.html#implicit-fence-poll-support
12:17 emersion: "Checking for EPOLLIN, i.e. read access, can be use to query the state of the most recent write or exclusive fence."
12:17 emersion: and of course i got it the other way around in my message above :<
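
For reference, a minimal sketch of the non-blocking check being discussed, following the kernel documentation linked above: POLLIN reports the state of the most recent write (exclusive) fence, so it tells a reader such as a compositor whether the producer has finished; POLLOUT reports the state of all attached fences. The helper names are hypothetical, and the sketch assumes the caller already holds a DMA-BUF file descriptor; nothing here is taken from an actual compositor.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Zero-timeout poll: true only if one of the requested events is
 * already signalled on fd. Errors and invalid fds count as "not ready". */
static bool demo_fd_signalled(int fd, short events)
{
	struct pollfd pfd = { .fd = fd, .events = events };

	assert(events != 0);
	if (poll(&pfd, 1, 0) <= 0)
		return false;
	return (pfd.revents & events) != 0;
}

/* Reader's view (e.g. a compositor about to sample the buffer):
 * POLLIN is signalled once pending writes have completed. */
bool demo_dmabuf_ready_for_read(int dmabuf_fd)
{
	return demo_fd_signalled(dmabuf_fd, POLLIN);
}

/* Writer's view: POLLOUT is signalled once all fences,
 * reads and writes alike, have completed. */
bool demo_dmabuf_ready_for_write(int dmabuf_fd)
{
	return demo_fd_signalled(dmabuf_fd, POLLOUT);
}
```

A compositor would more likely add the fd to its event loop and wait for POLLIN rather than poll with a zero timeout; the immediate check is just the shortest way to show which event maps to which fence.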
12:18 pq: heh
12:23 Momentum: is weston usable as a desktop WM?
12:24 Momentum: for daily driving
12:35 MrCooper: emersion: e.g. GpuTest plot3d, as described on https://gitlab.gnome.org/GNOME/mutter/-/issues/1162
12:36 emersion: aha, thanks, Sebastian has said the same thing
12:37 MrCooper: emersion: FWIW, shared libraries must change SONAME on backward incompatible ABI changes, or distros will complain (which they might also if the SONAME changes too often)
12:37 emersion: yes
12:39 emersion: with my scheme, i'd bump SONAME on major and minor versions
12:39 daniels: Momentum: if your focus is really just on the apps, and all you need is a launcher and a clock, then sure. if you want more detailed stuff like network/audio/etc control, notifications, etc, then it doesn't have those features
12:40 Momentum: i see
12:40 Momentum: i thought those are generally not related to the compositor
12:46 MrCooper: emersion: don't expect enthusiasm for that from distros, e.g. they'll have to make sure a process doesn't end up linking in multiple versions of that library
12:47 emersion: same when a new major version is shipped
13:03 daniels: Momentum: some compositors allow you to add external panels/etc, but none of the libweston-based ones do
13:04 wlb: weston Merge request !1302 opened by Philipp Zabel (pH5) backend-vnc: make to finish frames with timestamps in the past https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1302
14:11 wlb: weston Merge request !695 closed (libweston: remove checks for invalid dmabuf)
14:38 kchibisov: Assuming I have 3 configure events [A, B, C] that arrived in the said order, is it fine if I ack B, draw with B, and then on the next size of the iteration I'll ack C and draw with C?
14:39 kchibisov: s/next size/next loop iteration/
14:40 kchibisov: From what I read it's fine, but I'm not sure how sane it would be to do so. My case is: my loop and window are on the main thread, but I render from some other thread and commit a surface from it as well, thus acking a configure from it would be a more robust solution from what I can see?
14:41 kchibisov: And the reason I use `B` is because I've started the rendering operation with B, but got C while doing the rendering.
14:41 kennylevinsen: I don't see why not - seems no different than C racing with a single-threaded client busy acking and rendering for B
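
The scenario above can be modelled with a small "latest configure wins" mailbox: the main thread posts every configure it receives, and the render thread takes a snapshot at the start of a frame and acks exactly that serial when it commits. This is only a sketch with hypothetical names (`demo_configure`, `demo_mailbox_*`); it leaves out the Wayland calls and, as a loudly stated assumption, the locking that a real two-thread setup would need around the mailbox.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The parts of a configure the renderer cares about (hypothetical). */
struct demo_configure {
	uint32_t serial;
	int width;
	int height;
};

/* Single-slot mailbox shared between the two threads.
 * Assumption: real code protects it with a mutex or similar. */
struct demo_mailbox {
	struct demo_configure latest;
	bool pending;
};

/* Main thread: store each incoming configure; newer ones overwrite older. */
void demo_mailbox_post(struct demo_mailbox *box, struct demo_configure cfg)
{
	assert(box);
	box->latest = cfg;
	box->pending = true;
}

/* Render thread: snapshot the newest configure at the start of a frame.
 * Returns false if nothing new arrived since the previous snapshot. */
bool demo_mailbox_take(struct demo_mailbox *box, struct demo_configure *out)
{
	assert(box && out);
	if (!box->pending)
		return false;
	*out = box->latest;
	box->pending = false;
	return true;
}
```

With A and B posted before the frame starts, the snapshot is B; C arriving mid-frame does not disturb it, and the render thread would ack B's serial (for example with `xdg_surface_ack_configure()`) before committing, then pick up C on the next frame.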
14:43 kchibisov: It's a bit weird to design an API in library for that though.
14:43 kchibisov: Since most libraries do ack unconditionally, but they assume single threaded sometimes.
14:56 wlb: weston/main: Derek Foreman * libweston: Build z_order_list after view_list https://gitlab.freedesktop.org/wayland/weston/commit/8a673efada16 libweston/compositor.c
14:56 wlb: weston/main: Derek Foreman * libweston: Build view list for all outputs at once https://gitlab.freedesktop.org/wayland/weston/commit/3ec2ebc7e274 libweston/compositor.c
14:56 wlb: weston Merge request !1287 merged \o/ (Build view list for all outputs at once https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1287)
15:17 wlb: weston/main: Daniel Stone * 20 commits https://gitlab.freedesktop.org/wayland/weston/compare/3ec2ebc7e274ed1ff647c7b6026c59e5b7196666...23ea8655085272df86418b580e63f648599dd7cb
15:17 wlb: weston Merge request !1285 merged \o/ (Create and destroy subsurface views at source, many view/surface cleanups https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/1285)
15:34 wlb: weston Issue #771 opened by Edgar Neubauer (Desperado17) What can cause frequent calls to glDrawArrays in Weston? https://gitlab.freedesktop.org/wayland/weston/-/issues/771
15:44 wlb: weston Issue #771 closed \o/ (What can cause frequent calls to glDrawArrays in Weston? https://gitlab.freedesktop.org/wayland/weston/-/issues/771)