09:00karolherbst: pendingchaos: mhhh.. looks like the ISA has v_mul_hi_i32_i24_e32 and v_mul_hi_u32_u24_e32, but LLVM selects the former.. but I also don't see a way through the LLVM API to make sure the unsigned version to be picked... but I also don't know if that one would fix the issue anyway
09:01karolherbst: but it looks like this bug exists for quite some time, but I don't know if it's technically a LLVM bug or a nir to llvm one
09:01karolherbst: nir to llvm just does a 64 bit mul and shifts the result
09:02pendingchaos: v_mul_hi_u32_u24_e32 can't be used because 0xff803fe1 is not an unsigned 24-bit integer
09:02karolherbst: ohhh....
09:03karolherbst: sadly llvm only has mul, no umul/imul split
09:04karolherbst: what would I need to do to emit v_mul_hi_u32 directly?
09:05karolherbst: mhh I guess one of the llvm.amdgcn. intrinsics if one for that one exists...
09:06pendingchaos: there would be no difference between umul and imul
09:06pendingchaos: this is probably a llvm bug, if it's turning a mul(zext(a), zext(b))>>32 into v_mul_hi_i32_i24
09:07karolherbst: yeah.. that seems what's happening here
09:14karolherbst: .... rocm has the same bug
09:15karolherbst: Intel's CL stack behaves like the reporter expects it (different than rocm)
09:18karolherbst: as a workaround we could make nir_opt_idiv_const to not emit negative numbers...
09:28karolherbst: yeah... that works
09:28karolherbst: llvm emits v_mul_hi_u32 then
09:29karolherbst: just need to avoid numbers which can be interpreted as negative signed integers
09:32karolherbst: though this emits an additional uadd_sat
09:32karolherbst: could make it configurable
09:33pendingchaos: here's another idea: https://www.irccloud.com/pastebin/2k1n5Xzc/
09:34pendingchaos: prevents s_mul_hi_u32/s_mul_hi_i32 from being used, though
09:36karolherbst: v_mul_hi_u32 v2, 0xff803fe1, v2 ; D56A0002 000204FF FF803FE1
09:36karolherbst: seems fine?
09:36pendingchaos: s_mul_hi_{u,i}32 is introduced in gfx9 though, and gfx9 and later don't seem to have the v_mul_hi_i32_i24 bug for some reason
09:36pendingchaos: so this workaround could be limited to gfx6-8
09:37karolherbst: I think I'm on gfx9
09:37karolherbst: what's gfx9 again?
09:37karolherbst: I'm on rdna2 here
09:38karolherbst: or is that 8?
09:39pendingchaos: rdna2 is gfx10.3
09:39karolherbst: heh
09:39karolherbst: oh right
09:40karolherbst: but anyway, your suggestion seems to fix it for me as well
09:40karolherbst: though not really sure what it's doing...
09:45pendingchaos: the inline assembly (which does nothing) prevents LLVM from knowing that the sources are within [-8388608,8388607]
09:46karolherbst: ohh, I see
09:48karolherbst: I wished there would be a more reliable way of telling llvm to not use the 24 bit mul here..
09:48karolherbst: or a more explicit way
09:53DemiMarie: Finally ditch LLVM for AMD?
10:23mareko: karolherbst: do you have LLVM IR producing the incorrect v_mul?
10:25pendingchaos: mareko: https://www.irccloud.com/pastebin/GLrsszIo/
10:26mareko: and please a description of the problem that I can send to the LLVM team?
10:26karolherbst: yeah, looks similar to what I got
10:27karolherbst: using those inputs yields the wrong result doing mul_hi: 67451429, 684514641, 694514641, 794514641, 894514641, 99451464, 123343442, 23111252, 412341412, 354325253
10:27karolherbst: wait.. I can even write a CL file whith shows the bug with rocm
10:30karolherbst: mareko: anything else? https://gitlab.freedesktop.org/mesa/mesa/-/issues/11761#note_2539707
10:34mareko: thanks, that will be the most detailed bug report they've ever seen
10:34karolherbst: 🙃
10:35mareko: in the meantime, we can use ac_build_optimization_barrier guarded by LLVM_VERSION_MAJOR checks
10:36mareko: and the required LLVM version can be set to 9999
10:36mareko: for those checks
10:41karolherbst: yep, seems to work as well
10:43karolherbst: mhhh
10:43karolherbst: mareko: though your suggestion leads to worse code than the one from pendingchaos
10:44karolherbst: now it's doing v_mul_hi_u32 + v_mad_u64_u32
10:51karolherbst: mareko: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30810
11:01mareko: karolherbst: you can keep pendingchaos's suggestion, but replace block_range_analysis with ac_build_optimization_barrier
11:01karolherbst: okay, cool
11:02mareko: the effect should be identical
11:02pendingchaos: block_range_analysis is more optimizable
11:02pendingchaos: ac_build_optimization_barrier can't be CSE'd and claims to have side effects
11:02karolherbst: ehh wait, I misunderstood what you said
11:02mareko: ok
11:02pendingchaos: my suggestion doesn't need to change emit_imul_high()
11:02karolherbst: yeah.. I've put the asm in the MR
11:03karolherbst: it's a signficiant difference
11:03mareko: you can keep pendingchaos's suggestion as-is
11:03karolherbst: pendingchaos: any thoughts on using LLVMIsConstant on both sources? not really sure what's the idea behind the else there
11:03mareko: pendingchaos: you said "gfx9 and later don't seem to have the v_mul_hi_i32_i24 bug for some reason", is that true given that karolherbst is on rdna2?
11:04pendingchaos: the inline assembly needs to be used for at least one source, and using it for constant ones prevents the constant from being combined into the instruction
11:04karolherbst: okay
11:05pendingchaos: apparently the gfx9 and later thing isn't true, since it happens for karolherbst on rdna2
11:05pendingchaos: not sure what I'm missing in my testing
11:05mareko: ok
11:05karolherbst: pendingchaos: did you test with the division or the optimized umul_high pattern?
11:06karolherbst: and what did you test? Anyway, ROCm doesn't show the bug with the divisions, only with using umul_high directly here
11:06pendingchaos: I tested a vkrunner thing: https://www.irccloud.com/pastebin/GLrsszIo/
11:07pendingchaos: the GLSL used division, which was optimized to the umul_high stuff
11:07karolherbst: I see
11:07karolherbst: if you give me your vkrunner file I can check on rdna2 here as well
11:08pendingchaos: https://www.irccloud.com/pastebin/Q5rnqZvi/llvm_mulhi_bug.shader_test
11:11karolherbst: mhhh yeah.. that seems to emit v_mul_hi_u32 v0, 0xff803fe1, v0 ; D56A0000 000200FF FF803FE1 here
11:11karolherbst: LLVM: %12 = mul nuw nsw i64 %11, 4286595041
11:11karolherbst: maybe llvm takes more into account or something...
11:18karolherbst: I haven't tried to figure out where the optimization gets applied inside LLVM, but a quick git grep didn't show anything (but I might have used the wrong things to search for)
11:25karolherbst: mhh.. okay.. ac_build_optimization_barrier works as well, I just modified the code incorrectly
11:26karolherbst: (I accidentally swaped the condition in regards to src0 and src1
12:41FireBurn: Hey, a commit in the last 24hrs has broken Chromium Vulkan EGL, I'm just bisecting now
12:49zmike:sweats nervously
12:51FireBurn: heh I was eying up https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30647 but I'll confirm shortly
12:53FireBurn: Should I start creating the bug now?
13:05karolherbst: pendingchaos: sooo, it seems like I can prevent llvm to make that opt by doing "unsigned int tmp1 = A[0];" instead of "unsigned int tmp1 = A[get_global_id(0)];" 🙃
13:07pendingchaos: it uses s_mul_hi_u32?
13:07karolherbst: yeah
13:07pendingchaos: well, that's just because the source is no longer divergent
13:07karolherbst: I guess there is no s_mul_hi_i32_i24_e32 then?
13:07pendingchaos: no
13:07pendingchaos: s_mul_hi_u32 is as fast as any other SALU
13:08karolherbst: ahh...
13:08karolherbst: anyway, your suggestion prevents the use of s_mul_hi_u32 if it's convergent
13:09pendingchaos: yeah, I don't know how to prevent that without the divergence analysis idea mentioned in the MR
13:11karolherbst: though LLVM could choose to still decide differently than nir's analysis would guess, no?
13:46mareko: karolherbst: that was quick: https://github.com/llvm/llvm-project/pull/105831
13:47karolherbst: nice
13:47karolherbst: let me verify that
13:47karolherbst: mhh.. will this even apply cleanly here :D
13:48karolherbst: seems to apply cleanly on 18.x (ignoring the test)
14:18karolherbst: mareko: yep, that fixes the bug, thanks!
14:50alyssa: karolherbst: ..does clang not support cl_khr_gl_msaa_sharing? :(
14:51alyssa: oh, I need to patch clc proper. ok
14:53alyssa: Unimplemented SPIR-V capability: SpvCapabilityImageMipmap (15)
14:53alyssa: Oh come on.
14:53karolherbst: 🙃 not sure if I've added that one
14:53karolherbst: wait..
14:53karolherbst: yeah well.. soo mesa doens't implement cl_khr_gl_msaa_sharing ...
14:53karolherbst: alyssa: do you need it fir something?
14:54alyssa: gpu crimes yeah
14:54karolherbst: mhh
14:54karolherbst: this ext depends on cl_khr_gl_depth_images
14:54alyssa: i'll just backdoor it with a vendor intrinsic because I do not want to think about llvm right now
14:54karolherbst: which... is something I wanted to add at some point as well
14:54karolherbst: yeah, fair
14:54karolherbst: depth_images are a bit of a pain to add
14:55alyssa: trying to get my decompression kernel to pass GL CTS
14:55karolherbst: but I can prioritize it a bit higher if others want to use it as well
14:56karolherbst: though I think it's mostly API stuff I need to add here, but the spirv/nir side might also need a bit of work
14:56alyssa: yeah I'll backdoor for now and we can port to the standard syntax when the rusticl side is there
14:56karolherbst: okay, cool
14:56karolherbst: soo.. three features: depth/stencil images, cl_khr_gl_depth_images and cl_khr_gl_msaa_sharing
15:13alyssa: k
15:13alyssa: oops
15:43jenatali: karolherbst: Feel free to ping me for reviews on the spirv/nir stuff. Not sure if/when I'd get to plumbing it myself but I'd like to make sure it's workable
15:43karolherbst: jenatali: for depth images and co?
15:43jenatali: Yeha
15:43karolherbst: will do
16:11FireBurn: @zmike device_select: shortcut EnumeratePhysicalDevice* for count-only calls
16:24zmike: FireBurn: okay, just make a MR to revert that then
16:30FireBurn: Will do
16:37FireBurn: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30820
16:37FireBurn: I dumped the info in there too
16:37FireBurn: (into a bug sorry)
16:38zmike: 🤝
16:38FireBurn: IF there's any debugging that might be useful, let me know
16:38zmike: nah it's fine
17:29oneforall2: hmm no mesa chanel or is this it?
17:29jenatali: This is it
17:34oneforall2: thanks https://pastebin.com/msNHBgjH not liking rust :)
17:37oneforall2: rust-1.80.1
17:37oneforall2: mesa 24.2.0
18:03orbea: oneforall2: unless you need opencl might be able to just disable it to work around the errors
18:04oneforall2: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30710/diffs?commit_id=93e96da9458c9d0348f2390dc0bea67cf140b1a0thislookslikethepatch?
18:16oneforall2: https://gitlab.freedesktop.org/mesa/mesa/-/commit/93e96da9458c9d0348f2390dc0bea67cf140b1a0.patch compiled
18:17karolherbst: oneforall2: yeah.. ultimately this seems to be a meson bug, but I've added that workaround for now
18:34oneforall2: 1 thing I dislike about meson is it doesn't like to work with flags to well like LDFLAGS to pick the right lib dir 32bit or 64
18:35oneforall2: like right now :)
18:45alyssa: everybody needs opencl
18:45alyssa:laughs villainously
18:50urja:shudders
18:51ccr: the knights who say "opencl"
19:20Company: I don't understand this whole dri2_query_image()/resource_get_handle() thing but I think llvmpipe doesn't dup() the fds it returns from eglExportDMABUFImageMESA() - or it dups them and returns the wrong one?
19:21karolherbst: jenatali: looks like we are already good in regards to depth images
19:21jenatali: Oh cool
19:22Company: LIBGL_ALWAYS_SOFTWARE=1 GSK_RENDERER=vulkan gtk4-demo --run=gears
19:22Company: if anyone wanna help me figure this out
19:23Company: oh, that might need gtk from main
19:23Company: or F41/rawhide
19:26Company: yup, it does
19:27airlied: it doesn't appear to, but I've also no idea if it is meant to
19:27Company: it is meant to
19:27airlied: https://paste.centos.org/view/5a06a63a would dup it
19:27airlied: but as you mentioned previously it already has a long lived dup
19:27airlied: maybe we should drop the other one
19:28Company: what happened 12 hours ago is that our sysadmins finally exposed /dev/udmabuf into our CI
19:29Company: and suddenly both Vulkan and GL llvmpipe feel the full force of our dmabuf import/exports
19:29Company: because they finally hit those codepaths
19:29Company: and it seems that is not a very common thing that people do - export and import dmabufs from software renderers ;)
19:34karolherbst: jenatali: I think msaa will need some work because that actually has its own GLSL_SAMPER_DIM type, but depth images are just like plain ones
19:34jenatali: 👍
19:35karolherbst: clEnqueueFillImage needs fixing, because the pixel size is one, not four, but... that's just API stuff
19:36Company: airlied: that works for avoiding the double close but runs into VK_ERROR_INVALID_DRM_FORMAT_MODIFIER_PLANE_LAYOUT_EXT when importing
19:37Company: (with AMD, lvp Vulkan doesn't complain but also doesn't show the texture)
19:45airlied: Company: no idea on that, would probably require debugging :-)
19:46Company: yeah, I suppose I get to file a bunch of issues
19:46Company: once I've figured out how to unblock GTK's CI
19:48airlied: rm /dev/udmabuf :-P
19:48airlied: or maybe chmod
19:49Company: I worked hard to get it turned on
19:53DemiMarie: Company: do you use containers or ephemeral VMs for CI?
19:54airlied: where are you importing into?
19:54Company: DemiMarie: I use whatever the gnome sysadmins give me - and that's some docker image
19:56Company: airlied: the current test I'm debugging is GL content in Vulkan, ie llvmpipe is exporting, lvp is importing - and when I'm running it locally I also run the test against my AMD (usually because I forget the env vars)
19:57Company: airlied: and the reproducer above does the same thing
19:57airlied: Company: not sure how importing into amd should work there
19:58Company: I have no idea if it should - it might be perfectly fine that it's complaining
19:58Company: no idea what the state of udmabuf is
19:59Company: or if llvmpipe creates dmabufs that AMD likes with its stride requirements
19:59Company: the worse problem is that lvp imports it but then doesn't show anything
20:09airlied: Company: yeah definitely should file that one and see where it's going wrong
20:47Company: airlied: it's https://gitlab.freedesktop.org/mesa/mesa/-/issues/11773 now
22:52Company: airlied: with udmabuf, does one have to keep the memfd around as long as the dmabuf_fd exists or can I close() it? It's not documented anywhere