12:15karolherbst: zmike: ever run into the issue where you get "radv/amdgpu: The CS has been rejected, see dmesg for more information (-14)." but dmesg contains nothing? I'm hitting "../src/gallium/drivers/zink/zink_context.c:4265: zink_wait_on_batch: Assertion `batch_id' failed." but no idea why. validation layers are also clean
12:24tnt: I've got an application triggering "GL_OUT_OF_MEMORY in glTextureStorage2DMultisample(texture too large)". The texture in question is 3582x3582 GL_RGBA32F with 8 samples, so that should be a ~1.6G texture. I've got about 15G of free RAM and it's an iris iGPU, so are there other limits that apply?
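For reference, tnt's size estimate can be checked with a quick calculation. This is a sketch that assumes GL_RGBA32F takes 16 bytes per texel (four 32-bit floats) and ignores any padding, alignment, or auxiliary surface overhead the driver may add; `msaa_texture_bytes` is a hypothetical helper, not a Mesa function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rough size of a multisampled 2D texture, assuming a
 * tightly packed layout with no padding, alignment, or auxiliary surfaces.
 * The first operand is widened to 64 bits so the product cannot overflow. */
uint64_t msaa_texture_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_texel, uint32_t samples)
{
    assert(samples > 0);
    return (uint64_t)width * height * bytes_per_texel * samples;
}
```

With tnt's numbers, `msaa_texture_bytes(3582, 3582, 16, 8)` gives 1,642,332,672 bytes, about 1.64 GB (roughly 1.53 GiB), so the ~1.6G estimate holds; whatever limit iris is enforcing is not captured by this arithmetic.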
12:26zmike: karolherbst: that's a gpu hang
12:26karolherbst: yeah sure, but normally amdgpu prints stuff in dmesg
12:26karolherbst: which it doesn't
12:26zmike: idk that's usually what I see
12:27karolherbst: I wonder if it's a kernel bug tho...
12:27karolherbst: or something else going on
12:27karolherbst: -14 is a weird error code
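A minimal way to decode the number, assuming the value in the radv message is a negated Linux errno (the usual convention for libdrm/ioctl return codes). `describe_neg_errno` is a hypothetical helper.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a negative errno-style return code (e.g. -14)
 * into the C library's description of that errno. Non-negative values are
 * not treated as errors. */
const char *describe_neg_errno(int ret)
{
    return ret < 0 ? strerror(-ret) : "not an error";
}
```

On Linux, 14 is EFAULT ("Bad address"), which the kernel returns when it cannot access a user-space address. That would be consistent with the userptr explanation found later in the log, though the connection is an inference.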
12:28karolherbst: maybe the driver just forgets to print something, but then again... anyway, I was just curious if you know more
12:28zmike: I do not
12:29zmike: usually those types of hangs are something being fucked up with descriptors
12:29karolherbst: thing is it only happens randomly, so not quite sure what sometimes goes wrong
12:29zmike: or maybe that's -21
12:30karolherbst: I think the -14 comes from validation, so maybe I just add some printks and see if I can figure out why the kernel is unhappy
12:34zmike: you could try with vvl and ZINK_DESCRIPTORS=lazy to see if it finds something then
12:34zmike: validation can't really handle descriptor buffer
12:46karolherbst: mhh.. nothing showing up with that either
12:47zmike: then it's something unusually tricky
12:52zmike: did you try with lavapipe?
12:54karolherbst: this issue only happens with radv
12:54karolherbst: uhh actually iris as well... oops
12:55karolherbst: but I don't see it with lvp at all
12:57zmike: maybe try ZINK_DEBUG=sync on radv
12:57zmike: to rule out sync issues
12:58karolherbst: doesn't help
13:02zmike: then my guess is something descriptor-related
13:03zmike: unless you're actually crashing the gpu in a shader somehow
13:03zmike: but it works in lavapipe, which is odd
13:05karolherbst: it doesn't run shaders
13:09karolherbst: I'm sure it's something really silly somewhere
13:16karolherbst: it seems to go away with libasan...
13:18karolherbst: mhh with tsan it triggers reliably
13:18karolherbst: *sigh*
13:19karolherbst: 40 data races with iris when it runs successfully, 46 if not..
13:21karolherbst: heh.. that's just error handling
15:19karolherbst: zmike: okay.. I think I found it, and it's as silly as anticipated. So it's related to userptrs and I think kernels reject the submissions if there are stale buffers referenced whose userptr has been freed by the application already
15:20karolherbst: if I make the CTS stop freeing them, I don't see submits failing anymore, but the test is failing, so I wonder if userptr support is busted in one way or another
16:08zmike: huh
18:22karolherbst: mhhh.. soo it looks like it works in principle, just that parts of the buffer don't contain the expected values... doesn't look like a sync issue, maybe something with caches?
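One way to follow up on the caches hunch is to compare the buffer against the expected contents chunk by chunk. The sketch below reports the first differing byte and how many line-sized chunks contain at least one mismatch, which helps show whether the bad data is clustered in a few lines or spread out. It assumes a 64-byte line size in the usage below; `count_mismatched_lines` is a hypothetical helper, not part of the CTS or Mesa.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits both buffers into line_size-byte chunks and
 * returns how many chunks differ in at least one byte. *first_bad_offset is
 * set to the offset of the first differing byte, or to len if none differ. */
size_t count_mismatched_lines(const unsigned char *expected,
                              const unsigned char *actual,
                              size_t len, size_t line_size,
                              size_t *first_bad_offset)
{
    assert(line_size > 0);
    size_t bad_lines = 0;
    *first_bad_offset = len; /* len means "no mismatch found" */

    for (size_t start = 0; start < len; start += line_size) {
        /* the last chunk may be shorter than line_size */
        size_t n = (len - start < line_size) ? len - start : line_size;
        if (memcmp(expected + start, actual + start, n) == 0)
            continue;

        bad_lines++;
        if (*first_bad_offset == len) {
            /* locate the first differing byte inside this chunk */
            for (size_t i = 0; i < n; i++) {
                if (expected[start + i] != actual[start + i]) {
                    *first_bad_offset = start + i;
                    break;
                }
            }
        }
    }
    return bad_lines;
}
```

For example, with two 256-byte buffers that differ only at bytes 70 and 200, `count_mismatched_lines(a, b, 256, 64, &first)` returns 2 (chunks 1 and 3) and sets `first` to 70.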