IRC Logs of #dri-devel on irc.freenode.net for 2024-08-23

09:00 karolherbst: pendingchaos: mhhh.. looks like the ISA has v_mul_hi_i32_i24_e32 and v_mul_hi_u32_u24_e32, but LLVM selects the former.. but I also don't see a way through the LLVM API to make sure the unsigned version to be picked... but I also don't know if that one would fix the issue anyway
09:01 karolherbst: but it looks like this bug exists for quite some time, but I don't know if it's technically a LLVM bug or a nir to llvm one
09:01 karolherbst: nir to llvm just does a 64 bit mul and shifts the result
09:02 pendingchaos: v_mul_hi_u32_u24_e32 can't be used because 0xff803fe1 is not an unsigned 24-bit integer
09:02 karolherbst: ohhh....
09:03 karolherbst: sadly llvm only has mul, no umul/imul split
09:04 karolherbst: what would I need to do to emit v_mul_hi_u32 directly?
09:05 karolherbst: mhh I guess one of the llvm.amdgcn. intrinsics if one for that one exists...
09:06 pendingchaos: there would be no difference between umul and imul
09:06 pendingchaos: this is probably a llvm bug, if it's turning a mul(zext(a), zext(b))>>32 into v_mul_hi_i32_i24
09:07 karolherbst: yeah.. that seems what's happening here
09:14 karolherbst: .... rocm has the same bug
09:15 karolherbst: Intel's CL stack behaves like the reporter expects it (different than rocm)
09:18 karolherbst: as a workaround we could make nir_opt_idiv_const to not emit negative numbers...
09:28 karolherbst: yeah... that works
09:28 karolherbst: llvm emits v_mul_hi_u32 then
09:29 karolherbst: just need to avoid numbers which can be interpreted as negative signed integers
09:32 karolherbst: though this emits an additional uadd_sat
09:32 karolherbst: could make it configurable
09:33 pendingchaos: here's another idea: https://www.irccloud.com/pastebin/2k1n5Xzc/
09:34 pendingchaos: prevents s_mul_hi_u32/s_mul_hi_i32 from being used, though
09:36 karolherbst: v_mul_hi_u32 v2, 0xff803fe1, v2 ; D56A0002 000204FF FF803FE1
09:36 karolherbst: seems fine?
09:36 pendingchaos: s_mul_hi_{u,i}32 is introduced in gfx9 though, and gfx9 and later don't seem to have the v_mul_hi_i32_i24 bug for some reason
09:36 pendingchaos: so this workaround could be limited to gfx6-8
09:37 karolherbst: I think I'm on gfx9
09:37 karolherbst: what's gfx9 again?
09:37 karolherbst: I'm on rdna2 here
09:38 karolherbst: or is that 8?
09:39 pendingchaos: rdna2 is gfx10.3
09:39 karolherbst: heh
09:39 karolherbst: oh right
09:40 karolherbst: but anyway, your suggestion seems to fix it for me as well
09:40 karolherbst: though not really sure what it's doing...
09:45 pendingchaos: the inline assembly (which does nothing) prevents LLVM from knowing that the sources are within [-8388608,8388607]
09:46 karolherbst: ohh, I see
09:48 karolherbst: I wished there would be a more reliable way of telling llvm to not use the 24 bit mul here..
09:48 karolherbst: or a more explicit way
09:53 DemiMarie: Finally ditch LLVM for AMD?
10:23 mareko: karolherbst: do you have LLVM IR producing the incorrect v_mul?
10:25 pendingchaos: mareko: https://www.irccloud.com/pastebin/GLrsszIo/
10:26 mareko: and please a description of the problem that I can send to the LLVM team?
10:26 karolherbst: yeah, looks similar to what I got
10:27 karolherbst: using those inputs yields the wrong result doing mul_hi: 67451429, 684514641, 694514641, 794514641, 894514641, 99451464, 123343442, 23111252, 412341412, 354325253
10:27 karolherbst: wait.. I can even write a CL file whith shows the bug with rocm
10:30 karolherbst: mareko: anything else? https://gitlab.freedesktop.org/mesa/mesa/-/issues/11761#note_2539707
10:34 mareko: thanks, that will be the most detailed bug report they've ever seen
10:34 karolherbst: 🙃
10:35 mareko: in the meantime, we can use ac_build_optimization_barrier guarded by LLVM_VERSION_MAJOR checks
10:36 mareko: and the required LLVM version can be set to 9999
10:36 mareko: for those checks
10:41 karolherbst: yep, seems to work as well
10:43 karolherbst: mhhh
10:43 karolherbst: mareko: though your suggestion leads to worse code than the one from pendingchaos
10:44 karolherbst: now it's doing v_mul_hi_u32 + v_mad_u64_u32
10:51 karolherbst: mareko: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30810
11:01 mareko: karolherbst: you can keep pendingchaos's suggestion, but replace block_range_analysis with ac_build_optimization_barrier
11:01 karolherbst: okay, cool
11:02 mareko: the effect should be identical
11:02 pendingchaos: block_range_analysis is more optimizable
11:02 pendingchaos: ac_build_optimization_barrier can't be CSE'd and claims to have side effects
11:02 karolherbst: ehh wait, I misunderstood what you said
11:02 mareko: ok
11:02 pendingchaos: my suggestion doesn't need to change emit_imul_high()
11:02 karolherbst: yeah.. I've put the asm in the MR
11:03 karolherbst: it's a signficiant difference
11:03 mareko: you can keep pendingchaos's suggestion as-is
11:03 karolherbst: pendingchaos: any thoughts on using LLVMIsConstant on both sources? not really sure what's the idea behind the else there
11:03 mareko: pendingchaos: you said "gfx9 and later don't seem to have the v_mul_hi_i32_i24 bug for some reason", is that true given that karolherbst is on rdna2?
11:04 pendingchaos: the inline assembly needs to be used for at least one source, and using it for constant ones prevents the constant from being combined into the instruction
11:04 karolherbst: okay
11:05 pendingchaos: apparently the gfx9 and later thing isn't true, since it happens for karolherbst on rdna2
11:05 pendingchaos: not sure what I'm missing in my testing
11:05 mareko: ok
11:05 karolherbst: pendingchaos: did you test with the division or the optimized umul_high pattern?
11:06 karolherbst: and what did you test? Anyway, ROCm doesn't show the bug with the divisions, only with using umul_high directly here
11:06 pendingchaos: I tested a vkrunner thing: https://www.irccloud.com/pastebin/GLrsszIo/
11:07 pendingchaos: the GLSL used division, which was optimized to the umul_high stuff
11:07 karolherbst: I see
11:07 karolherbst: if you give me your vkrunner file I can check on rdna2 here as well
11:08 pendingchaos: https://www.irccloud.com/pastebin/Q5rnqZvi/llvm_mulhi_bug.shader_test
11:11 karolherbst: mhhh yeah.. that seems to emit v_mul_hi_u32 v0, 0xff803fe1, v0 ; D56A0000 000200FF FF803FE1 here
11:11 karolherbst: LLVM: %12 = mul nuw nsw i64 %11, 4286595041
11:11 karolherbst: maybe llvm takes more into account or something...
11:18 karolherbst: I haven't tried to figure out where the optimization gets applied inside LLVM, but a quick git grep didn't show anything (but I might have used the wrong things to search for)
11:25 karolherbst: mhh.. okay.. ac_build_optimization_barrier works as well, I just modified the code incorrectly
11:26 karolherbst: (I accidentally swaped the condition in regards to src0 and src1
12:41 FireBurn: Hey, a commit in the last 24hrs has broken Chromium Vulkan EGL, I'm just bisecting now
12:49 zmike:sweats nervously
12:51 FireBurn: heh I was eying up https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30647 but I'll confirm shortly
12:53 FireBurn: Should I start creating the bug now?
13:05 karolherbst: pendingchaos: sooo, it seems like I can prevent llvm to make that opt by doing "unsigned int tmp1 = A[0];" instead of "unsigned int tmp1 = A[get_global_id(0)];" 🙃
13:07 pendingchaos: it uses s_mul_hi_u32?
13:07 karolherbst: yeah
13:07 pendingchaos: well, that's just because the source is no longer divergent
13:07 karolherbst: I guess there is no s_mul_hi_i32_i24_e32 then?
13:07 pendingchaos: no
13:07 pendingchaos: s_mul_hi_u32 is as fast as any other SALU
13:08 karolherbst: ahh...
13:08 karolherbst: anyway, your suggestion prevents the use of s_mul_hi_u32 if it's convergent
13:09 pendingchaos: yeah, I don't know how to prevent that without the divergence analysis idea mentioned in the MR
13:11 karolherbst: though LLVM could choose to still decide differently than nir's analysis would guess, no?
13:46 mareko: karolherbst: that was quick: https://github.com/llvm/llvm-project/pull/105831
13:47 karolherbst: nice
13:47 karolherbst: let me verify that
13:47 karolherbst: mhh.. will this even apply cleanly here :D
13:48 karolherbst: seems to apply cleanly on 18.x (ignoring the test)
14:18 karolherbst: mareko: yep, that fixes the bug, thanks!
14:50 alyssa: karolherbst: ..does clang not support cl_khr_gl_msaa_sharing? :(
14:51 alyssa: oh, I need to patch clc proper. ok
14:53 alyssa: Unimplemented SPIR-V capability: SpvCapabilityImageMipmap (15)
14:53 alyssa: Oh come on.
14:53 karolherbst: 🙃 not sure if I've added that one
14:53 karolherbst: wait..
14:53 karolherbst: yeah well.. soo mesa doens't implement cl_khr_gl_msaa_sharing ...
14:53 karolherbst: alyssa: do you need it fir something?
14:54 alyssa: gpu crimes yeah
14:54 karolherbst: mhh
14:54 karolherbst: this ext depends on cl_khr_gl_depth_images
14:54 alyssa: i'll just backdoor it with a vendor intrinsic because I do not want to think about llvm right now
14:54 karolherbst: which... is something I wanted to add at some point as well
14:54 karolherbst: yeah, fair
14:54 karolherbst: depth_images are a bit of a pain to add
14:55 alyssa: trying to get my decompression kernel to pass GL CTS
14:55 karolherbst: but I can prioritize it a bit higher if others want to use it as well
14:56 karolherbst: though I think it's mostly API stuff I need to add here, but the spirv/nir side might also need a bit of work
14:56 alyssa: yeah I'll backdoor for now and we can port to the standard syntax when the rusticl side is there
14:56 karolherbst: okay, cool
14:56 karolherbst: soo.. three features: depth/stencil images, cl_khr_gl_depth_images and cl_khr_gl_msaa_sharing
15:13 alyssa: k
15:13 alyssa: oops
15:43 jenatali: karolherbst: Feel free to ping me for reviews on the spirv/nir stuff. Not sure if/when I'd get to plumbing it myself but I'd like to make sure it's workable
15:43 karolherbst: jenatali: for depth images and co?
15:43 jenatali: Yeha
15:43 karolherbst: will do
16:11 FireBurn: @zmike device_select: shortcut EnumeratePhysicalDevice* for count-only calls
16:24 zmike: FireBurn: okay, just make a MR to revert that then
16:30 FireBurn: Will do
16:37 FireBurn: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30820
16:37 FireBurn: I dumped the info in there too
16:37 FireBurn: (into a bug sorry)
16:38 zmike: 🤝
16:38 FireBurn: IF there's any debugging that might be useful, let me know
16:38 zmike: nah it's fine
17:29 oneforall2: hmm no mesa chanel or is this it?
17:29 jenatali: This is it
17:34 oneforall2: thanks https://pastebin.com/msNHBgjH not liking rust :)
17:37 oneforall2: rust-1.80.1
17:37 oneforall2: mesa 24.2.0
18:03 orbea: oneforall2: unless you need opencl might be able to just disable it to work around the errors
18:04 oneforall2: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30710/diffs?commit_id=93e96da9458c9d0348f2390dc0bea67cf140b1a0thislookslikethepatch?
18:16 oneforall2: https://gitlab.freedesktop.org/mesa/mesa/-/commit/93e96da9458c9d0348f2390dc0bea67cf140b1a0.patch compiled
18:17 karolherbst: oneforall2: yeah.. ultimately this seems to be a meson bug, but I've added that workaround for now
18:34 oneforall2: 1 thing I dislike about meson is it doesn't like to work with flags to well like LDFLAGS to pick the right lib dir 32bit or 64
18:35 oneforall2: like right now :)
18:45 alyssa: everybody needs opencl
18:45 alyssa:laughs villainously
18:50 urja:shudders
18:51 ccr: the knights who say "opencl"
19:20 Company: I don't understand this whole dri2_query_image()/resource_get_handle() thing but I think llvmpipe doesn't dup() the fds it returns from eglExportDMABUFImageMESA() - or it dups them and returns the wrong one?
19:21 karolherbst: jenatali: looks like we are already good in regards to depth images
19:21 jenatali: Oh cool
19:22 Company: LIBGL_ALWAYS_SOFTWARE=1 GSK_RENDERER=vulkan gtk4-demo --run=gears
19:22 Company: if anyone wanna help me figure this out
19:23 Company: oh, that might need gtk from main
19:23 Company: or F41/rawhide
19:26 Company: yup, it does
19:27 airlied: it doesn't appear to, but I've also no idea if it is meant to
19:27 Company: it is meant to
19:27 airlied: https://paste.centos.org/view/5a06a63a would dup it
19:27 airlied: but as you mentioned previously it already has a long lived dup
19:27 airlied: maybe we should drop the other one
19:28 Company: what happened 12 hours ago is that our sysadmins finally exposed /dev/udmabuf into our CI
19:29 Company: and suddenly both Vulkan and GL llvmpipe feel the full force of our dmabuf import/exports
19:29 Company: because they finally hit those codepaths
19:29 Company: and it seems that is not a very common thing that people do - export and import dmabufs from software renderers ;)
19:34 karolherbst: jenatali: I think msaa will need some work because that actually has its own GLSL_SAMPER_DIM type, but depth images are just like plain ones
19:34 jenatali: 👍
19:35 karolherbst: clEnqueueFillImage needs fixing, because the pixel size is one, not four, but... that's just API stuff
19:36 Company: airlied: that works for avoiding the double close but runs into VK_ERROR_INVALID_DRM_FORMAT_MODIFIER_PLANE_LAYOUT_EXT when importing
19:37 Company: (with AMD, lvp Vulkan doesn't complain but also doesn't show the texture)
19:45 airlied: Company: no idea on that, would probably require debugging :-)
19:46 Company: yeah, I suppose I get to file a bunch of issues
19:46 Company: once I've figured out how to unblock GTK's CI
19:48 airlied: rm /dev/udmabuf :-P
19:48 airlied: or maybe chmod
19:49 Company: I worked hard to get it turned on
19:53 DemiMarie: Company: do you use containers or ephemeral VMs for CI?
19:54 airlied: where are you importing into?
19:54 Company: DemiMarie: I use whatever the gnome sysadmins give me - and that's some docker image
19:56 Company: airlied: the current test I'm debugging is GL content in Vulkan, ie llvmpipe is exporting, lvp is importing - and when I'm running it locally I also run the test against my AMD (usually because I forget the env vars)
19:57 Company: airlied: and the reproducer above does the same thing
19:57 airlied: Company: not sure how importing into amd should work there
19:58 Company: I have no idea if it should - it might be perfectly fine that it's complaining
19:58 Company: no idea what the state of udmabuf is
19:59 Company: or if llvmpipe creates dmabufs that AMD likes with its stride requirements
19:59 Company: the worse problem is that lvp imports it but then doesn't show anything
20:09 airlied: Company: yeah definitely should file that one and see where it's going wrong
20:47 Company: airlied: it's https://gitlab.freedesktop.org/mesa/mesa/-/issues/11773 now
22:52 Company: airlied: with udmabuf, does one have to keep the memfd around as long as the dmabuf_fd exists or can I close() it? It's not documented anywhere