06:21MoeIcenowy: how to prevent deqp from timing out?
06:22MoeIcenowy: attaching it to gdb to debug driver (llvmpipe) now, but when I do too many things in breakpoint, it times out
06:29dwfreed: increase the timeout? or set commands for the breakpoint instead of doing it manually
06:29MoeIcenowy: oops teglGLES2SharingThreadedTests has its own timer
06:29MoeIcenowy: I should disable it
09:29ondracka: Does anyone have any idea why egl would report (with eglinfo for example) MSAA configs with 2, 4 and 6 samples for the X11 platform but only 4 samples for GBM? This is with the r300 driver.
09:44mripard: sima, pq: could you have a look at https://lore.kernel.org/all/20240715-drm-bridge-connector-fix-hdmi-reset-v4-2-61e6417cfd99@linaro.org/ ?
09:44mripard: emersion: ^
09:45emersion: hm, not sure i'm a fan
09:45emersion: if a property is designed to be set by the user (e.g. "Colorspace") but only has a single choice because of hw limitations
09:45emersion: i wouldn't want to mark it as immutable
09:47mripard: I think the larger discussion is that there's an IGT test that checks for this
09:47sima: mripard, might be the igt is just wrong
09:48mripard: so we should either document (and enforce it) in the kernel, or get rid of that test
09:48mripard: both are equally valid to me :)
09:48sima: ack
09:48sima: well imo just documenting is kinda silly, should enforce it with a check, or maybe just automatically set it when creating the prop
09:49mripard: the next patch in that series enforces it for max_bpc, but we can move that check into drm_property_create_range if that's what we decide
09:53emersion: "immutable" to me is an indication of how the info flows
09:53emersion: is it kernel → userspace, or userspace → kernel
09:53emersion: so it's completely orthogonal to "are there multiple choices"
09:53emersion: maybe pq has an opinion on this
09:54sima: mripard, vsyrjala wrote the test originally, maybe can shed some light on the why
09:54sima: emersion, yeah for me immutable is also "the kernel sets this"
09:55sima: but I guess that does fit for a prop with only one valid value
09:55emersion: maybe user-space is supposed to set it, but because of the monitor or hw limitations there is a single choice
09:55emersion: see "Colorspace" example above
09:58sima: yeah it could definitely also surprise userspace in a bad way
09:58sima: I guess either way we go, needs a pile of acks ...
09:59sima: since a "single choice props might or might not be immutable" doc patch feels really awkward :-/
09:59emersion: awkward?
10:00sima: suboptimal
10:00sima: Swiss sarcasm for a really shit situation
10:06emersion: i don't really see why it would be awkward/suboptimal
10:09sima: inconsistent uapi across drivers just means more chances that compositors don't work across all of them
10:11emersion: it's not inconsistent
10:12emersion: given a property name, it should be defined whether or not it's mutable or immutable for all drivers, regardless of the choices exposed to userspace
10:12emersion: with the core helpers to create the property, that's already the case, i think
10:17zamundaaa[m]: Oh ffs, stupid matrix bridge
10:18zamundaaa[m]: sima: properties being immutable on some drivers but not on others would be what would be really inconsistent and cause userspace not to work on some drivers
10:27sima: emersion, zamundaaa[m] yeah per-property consistency would be a good option too I think
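The position emersion and zamundaaa[m] argue for above can be put as a small sketch: whether userspace may set a property depends only on the immutable flag, never on how many choices the driver happens to expose. The `prop_info` struct and the helper names are hypothetical and exist only for this illustration; `PROP_IMMUTABLE` is assumed to mirror the value of `DRM_MODE_PROP_IMMUTABLE` in the kernel's uapi `drm_mode.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror DRM_MODE_PROP_IMMUTABLE from the kernel uapi header. */
#define PROP_IMMUTABLE (1u << 2)

/* Hypothetical, simplified view of a property as userspace might see it. */
struct prop_info {
    const char *name;
    uint32_t flags;
    size_t num_values; /* number of choices the driver exposes */
};

/* Immutable means the information flows kernel -> userspace only. */
static bool prop_is_immutable(const struct prop_info *p)
{
    assert(p != NULL);
    return (p->flags & PROP_IMMUTABLE) != 0;
}

/*
 * Under the per-property rule, settability ignores num_values entirely:
 * a property with a single valid choice is still userspace-owned
 * unless the flag says otherwise.
 */
static bool prop_user_settable(const struct prop_info *p)
{
    return !prop_is_immutable(p);
}
```

With this rule, a "Colorspace"-style property that exposes one choice because of hardware limits stays settable on every driver, which is the consistency userspace would rely on.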
11:09daniels: ondracka: is this actually a regression from current?
11:09daniels: ondracka: like is it something dril changes, or?
11:10daniels: I can't really see how dril would affect r300 gbm
11:10ondracka: No, I don't think so.
11:10daniels: ok :)
11:11ondracka: However the idea for dril was to figure out the supported MSAA configs to expose from EGL GBM configs, but if not all of the configs reported for X11 are there, then this is not going to work.
11:13daniels: so in src/gallium/frontends/dri/dri_screen.c, you've got dri_init_screen() calling dri_fill_in_modes(), which adds a DRIconfig for each driver-supported configuration, including all the MSAA axes
11:13daniels: the first thing to check is that it's calling driCreateConfigs() with msaa_modes[] populated for the sample rates you want them to have
11:14daniels: after that you're jumping to src/egl/drivers/dri2/platform_drm.c, where drm_add_configs_for_visuals() takes the list from driCreateConfigs as driver_configs[], and its own pre-determined (from src/gbm/backends/dri/gbm_dri.c) table of { pipe_format, gbm_format } mappings
11:15daniels: it'll walk driver_configs[] and try to find a match for that in the visual_table mapping
11:15daniels: it doesn't care at all about sample rates there, it's just trying to match the colour format
11:15daniels: so my suspicion is that dri_fill_in_modes() is already failing to correctly populate the config table with MSAA configs
11:16daniels: but yeah, stick a pile of debugging in there and try to find the first point things go wrong
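A rough model of the matching step daniels describes may help show why he suspects the earlier stage. This is a sketch with hypothetical stand-in types and names, not Mesa's real structures or the actual code in platform_drm.c: it only illustrates that a walk which compares colour formats and ignores sample counts cannot by itself drop some MSAA configs and keep others.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a driver config and a visual table entry. */
struct driver_config {
    int color_format;
    int samples;
};

struct visual {
    int color_format;
    int gbm_format;
};

/*
 * Walk the driver configs and count those whose colour format appears in
 * the visual table. The sample count is deliberately never inspected,
 * matching the description above.
 */
static size_t count_matched_configs(const struct driver_config *configs,
                                    size_t n_configs,
                                    const struct visual *visuals,
                                    size_t n_visuals)
{
    size_t matched = 0;

    assert(configs != NULL && visuals != NULL);
    for (size_t i = 0; i < n_configs; i++) {
        for (size_t j = 0; j < n_visuals; j++) {
            if (configs[i].color_format == visuals[j].color_format) {
                matched++;
                break;
            }
        }
    }
    return matched;
}
```

If configs with 0, 2, 4 and 6 samples all share a colour format that the table knows, all four come out matched, so a missing sample count would have to be lost before this step.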
11:16ondracka: just to be clear, this happens for example when I run eglinfo, or already earlier?
11:21daniels: when you run eglinfo
11:22daniels: (assuming you're running eglinfo on the GBM/DRM platform, not X11)
11:28ondracka: So if I do eglinfo under X11, then the output I get for the GBM platform is swrast configs, right?
11:31daniels: -EPARSE
11:31daniels: I don't see how swrast is really involved here
11:31daniels: just do it from a terminal
11:31daniels: not under X11, and just touch GBM directly
11:31daniels: that's the first point things are presumably going wrong, so just keep it as simple as possible and let's narrow down the behaviour with GBM first
11:31daniels: after that's giving us what we want, then we can step up the stack to trying to debug dril
11:37ondracka: Well, my current theory is that dril is actually getting the swrast configs and not r300 configs (because I only get 4 samples for GBM platform when I run eglinfo under X11 and there it seems to be using swrast), how could I test this?
11:38ondracka: In fact eglinfo in a terminal on a tty without X seems to properly show 2, 4 and 6 sample configs for r300 and the GBM platform.
11:48mripard: sima, emersion, zamundaaa[m]: so should we just send a patch removing the IGT test and start the discussion there?
11:49daniels: ondracka: print the device from dril?
11:52zamundaaa[m]: mripard: yeah
11:52emersion: sounds good
11:52ondracka: daniels: right, sorry I have no clue about EGL, so assuming I have an EGLDisplay, how do I print what driver it is using?
11:53swick[m]: +1 for "a prop is either mutable or immutable regardless of the number of choices"
11:59daniels: ondracka: sorry, am on my phone on my way to an appointment so can’t look specifically at code atm, but eglQueryString is your friend here
11:59daniels: emersion: ++
12:36zmike: ondracka_: have you tried following the egl calls with gdb
12:39zmike: like during xorg startup you can just follow eglInitialize or whatever and see why you're getting swrast
12:41ondracka: zmike: no because I failed to get gdb working in interactive mode with X, but I can try again
12:41zmike: that's probably your best bet for getting to the bottom of this
12:47ondracka: zmike: OK, I'm attached now and in init_dri2_configs, so what should I look for?
12:48zmike: uh
12:48zmike: one sec
12:49zmike: ondracka: you want to basically trace through eglInitialize and see when it does _eglDriver.Initialize() where it's going
12:49zmike: and figure out why it's hitting swrast in dri_device_create() in gbm
12:50zmike: I guess dri_screen_create() is failing somehow?
13:23ondracka: zmike: so loader_open_device returns -1, for /dev/dri/card0... Do I need to run X as root maybe or something?
13:24zmike: maybe you aren't in the right groups?
13:24zmike: sudo usermod -aG video <username>
13:24ondracka: I'm in video
13:24zmike: huh
13:25zmike: you can try chmod the nodes I suppose
13:25zmike: like g+r
13:27ondracka: crw-rw----+ 1 root video 226, 0 16. čec 09.12 /dev/dri/card0
13:27sima: mripard, yeah sounds good, and if we go with per-prop consistency on this then would need per-prop documentation fix anyway
13:27ondracka: I guess I need to go deeper in loader_open_device to see what happens
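When an open on a device node returns -1 as above, the errno value usually separates a permissions problem from a missing node more quickly than stepping through the loader. A minimal sketch, assuming a plain open(2) with `O_RDWR | O_CLOEXEC`; `try_open_node` is a hypothetical helper for this illustration and is not Mesa's loader_open_device.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to open a device node read/write.
 * Returns the file descriptor on success, or -errno on failure, and
 * prints the reason so EACCES (permissions) can be told apart from
 * ENOENT (no such node).
 */
static int try_open_node(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        int err = errno; /* save before any other call can clobber it */
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(err));
        return -err;
    }
    return fd;
}
```

Calling `try_open_node("/dev/dri/card0")` as the same user that runs X would return `-EACCES` for a group or ACL problem; the caller should close() the descriptor when the call succeeds.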
14:10DragoonAethis: sima: Hey, I'm looking for a lockdep expert :) Is it sensible to disable CONFIG_DEBUG_LOCKDEP while running driver tests in CI?
14:10DragoonAethis: The Kconfig blurb suggests it's additional runtime checks for lockdep itself, but https://linux-kernel-labs.github.io/refs/heads/master/lectures/debugging.html suggests it does a bit more, not sure if it's valuable for day-to-day driver testing
14:11ondracka: zmike: the loader_open_device was false alarm. Actually what seems to be failing is loader_open_driver_lib (driver_name=0x2651d10 "radeon", lib_suffix=0xb6f67043 "_gbm", search_path_vars=0xb6f6e144 <backend_search_path_vars>, default_search_path=0xb6f67048 "/home/paulie/graphics/install/lib/gbm", warn_on_fail=false)
14:12zmike: hm
14:14zmike: ondracka: when that fails does it call loader_open_driver_lib again?
14:14ondracka: So probably just another mesa misconfiguration on my part? I suppose there should be radeon_gbm or something in the lib/gbm directory?
14:14zmike: you should get a callstack like https://paste.centos.org/view/191078ba
14:15zmike: no, that call should fail
14:15zmike: and then it should call again like this^
14:18ondracka: zmike: https://paste.centos.org/view/7afb66aa
14:18zmike: ok so that's as expected
14:18zmike: and the second one succeeds?
14:24ondracka: zmike: actually the second one seems to succeed
14:24zmike: yup
14:27ondracka: However, continuing, it seems loader_open_driver is called once more with kms_swrast later
14:30MrCooper: ondracka: I also get swrast for the GBM platform while in a display session, I suspect it's kind of expected, since the GBM platform can't get DRM master in that case
14:32MrCooper: the GBM platform is for display servers and other bare-metal apps
14:35MrCooper: BTW, eglinfo prints the EGL driver name shortly after the platform name
14:37ondracka: MrCooper: well, not on my old debian, but yeah, I confirmed with eglGetDisplayDriverName manually
14:38zmike: ondracka: are you able to figure out why it's being called a third time?
14:38alyssa: FYI - I'm planning to turn fddx & friends into intrinsics, this should fix a lot of weird special cases in NIR
14:38alyssa: figured I'd mention that here before I hog all the labels again ;)
14:40ondracka: zmike: here is the backtrace where it happens https://paste.centos.org/view/04de0011
14:41zmike: ondracka: that's what I expected, but why is dri_screen_create failing? can you confirm this
14:41zmike: (this would be gbm_dri.c:1296)
14:50sima: DragoonAethis, depends what you're testing really
14:51sima: if you're validating mesa against a stable kernel, then yeah disable all the costly debug options, and lockdep is definitely one of the expensive ones
14:51sima: since you want full throughput
14:51sima: if you want to ci the kernel, then you need all these debug checks or the results have enough gaps to be pretty much pointless
14:53alyssa: (Well. Maybe. TBD how painful this might get.)
14:53sima: also lockdep isn't just about locking nowadays, a lot of the annotations are much more about api semantics than just deadlocks
14:53sima: plus lockdep_assert_held is actually precise
15:06glehmann: alyssa: shouldn't !30066 have changed this too? https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/compiler/nir/nir_opt_constant_folding.c#L78
15:31alyssa: glehmann: thanks, https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30214
15:34glehmann: now I just need to remember to copy the fixes to my MR
16:39gfxstrand: eric_engestrom: Couple of requests for ci_run_n_monitor.py:
16:40gfxstrand: 1. It currently doesn't actually monitor if you kicked off manual jobs with --force-manual. It stops as soon as all non-manual jobs finish.
16:41gfxstrand: 2. It's still a giant pain to test the build. Most build jobs hit the manual test mess.
16:42gfxstrand: 3. (or 2a?) I'd be happy to do build tests locally if we had some neat script that used local docker/podman to fire off all the non-Windows CI build things on my laptop. It doesn't have to be on the build farm necessarily. I just want an easy way to run all the build tests so I know they're going to pass.
16:42gfxstrand: eric_engestrom: Do you want me to put all that in issues?
16:43daniels: gfxstrand: please in an issue that's also cc sergi
16:43gfxstrand: Okay
16:43gfxstrand: I'll file 2 separate ones
16:44daniels: #3 in particular would be super-awesome but has just never been quite high enough prio
16:44gfxstrand: I think a bunch of people would love 3 and it would reduce the load on the farm as well
16:46gfxstrand: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11517
16:50gfxstrand: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11518
17:15MoeIcenowy: ah `remote: GitLab: http post to gitlab api /post_receive endpoint: Internal API unreachable`
17:15MoeIcenowy: gitlab.fd.o boom?
17:16MoeIcenowy: (trying to push a branch to my mesa fork, and create a mr
17:17vyivel: just slow
17:17MoeIcenowy: I hadn't seen this remote error before, and then when I try to create the MR, I can select the branch in the source branch part, but trying to create it leads to `Source branch "orcjit-mutex-lookup" does not exist`
17:17MoeIcenowy: my commit can be seen at https://gitlab.freedesktop.org/icenowy/mesa/-/commit/1bca5dc6442cfd4480e63637297681730483490b
17:18MoeIcenowy: but I still cannot make the MR ...
17:18MoeIcenowy: I created a branch that exists and does not exit at the same time ...
17:21MoeIcenowy: s/exit/exist/
17:28lumag: mlankhorst, mripard: would it be possible to merge drm-misc-fixes into drm-misc-next? I'd like to finally push https://patchwork.freedesktop.org/patch/602338/?series=135234&rev=2, but it depends on both -fixes and -next. Or we'd better postpone that to the -rc1 time?
17:42DragoonAethis: sima: Alright, thanks - this is about the kernel CI, with IGTs, wrt https://gitlab.freedesktop.org/drm/xe/ci/-/merge_requests/55
17:43DragoonAethis: We don't have any Mesa testing on that level (yet) (if ever)
17:44sima: DragoonAethis, that looks more like disabling lockdep debugging instead of lockdep itself?
17:45DragoonAethis: Well, yeah, that's CONFIG_DEBUG_LOCKDEP
17:46DragoonAethis: The link above suggests that's needed for some lockdep checks
17:53MoeIcenowy: okay the gitlab.fd.o finally recognizes my branch as legal source branch
22:41alyssa: did anything change in the vk runtime recently that would cause tests like dEQP-VK.api.object_management.max_concurrent.device to start failing nondeterministically?
22:41alyssa: (I'm wondering if I missed a refactor or something. I really need to get honeykrisp upstream lolz)
22:44Sachiel: alyssa: we've seen those in intel CI, but they didn't reproduce on my system, so I thought it was something wrong on CI itself. Didn't follow through with it though
22:44alyssa: Sachiel: ouch x_x
22:44alyssa: I never saw it before today and now it's spamming all over my CTS runs so clearly *something* changed but, ?!?!
22:45Sachiel: the entire object_management group passes cleanly for me. Maybe I need to try a full cts run to see if somehow that's causing them
22:46alyssa: today I'm hitting it with just ./deqp-vk -n 'dEQP-VK.api.object_management.*'
22:46alyssa: and it's a heisenbug, because of course it is (:
22:46Sachiel: well, ngcortes says that rolling back some meson and deps changes makes the problem go away
22:46Sachiel: a fun one to debug this one will be
22:46alyssa: !!
22:49alyssa: oh, the dEQP-VK.api.object_management.single_alloc* group is reliably failing all but the first
22:50alyssa: hopefully this is related, because that seems easier to figure out
22:58alyssa: ...Uh
22:59alyssa: commenting out the body of vk_free so mesa never calls pfnFree ... I'm still getting "Attempt to free.. which has not been allocated"
22:59alyssa: go back to sleep CTS
23:01Sachiel: oh, huh... I see them fail if I don't use my built mesa
23:15Sachiel: I blame the loader
23:19Sachiel: alyssa: if VK_DRIVER_FILES includes at least two different drivers, I see all but the first single_alloc tests fail. If there's only one driver there, everything passes fine
23:21Sachiel: I wonder if we are leaking something in the case of "try to open, not compatible, bail" case
23:24ngcortes: Sachiel, alyssa https://drive.google.com/file/d/11vwhyUbvNFvpp6tXwAlBLaHyLWV4k-Tq/view?usp=drive_link
23:25ngcortes: ^ there's the delta between the packages (new on left, old on right)
23:25Sachiel: no access, pastebin is usually easier
23:26ngcortes: whoops sorry, try again
23:26Sachiel: that is pretty unreadable
23:27ngcortes: lol I'll throw it in pastebin one sec
23:27Sachiel: I do see libvulkan1 there, so it might be a loader issue
23:31Sachiel: alyssa: https://github.com/KhronosGroup/Vulkan-Loader/pull/1504 maybe?
23:31Sachiel: on to build the loader and see if that's it...
23:38Sachiel: alyssa: yeah, that commit fixes it for me
23:38ngcortes: Sachiel, alyssa https://paste.centos.org/view/f1d57698
23:38ngcortes: there we go
23:38ngcortes: that should be more readable