06:21 airlied: we don't have a nir pass that merges common variable derefs, do we?
10:17 karolherbst: airlied: isn't nir_opt_deref doing that?
10:17 karolherbst: or cse?
10:17 karolherbst: mhhh
11:34 mripard: mlankhorst: ping?
12:45 Tom^: not sure if this is the correct place to ask, but I'll give it a shot. I've been looking a bit at various wayland compositors and their code, trying to grasp why this intel igpu just isn't keeping up the pageflips with the native Hz of the panel. I've debugged it down to simply page_flip_handler and drmHandleEvent taking their sweet time. I get that it's most likely the fence signaling
12:45 Tom^: being slow, but is that the whole story when I'm seeing barely any gpu or cpu usage in the various resource monitors like nvtop/intel_gpu_top etc.?
12:55 Tom^: and what really nags me is that x11 does it better somehow
13:04 llyyr: if it works better on x11, you can force use of the legacy KMS API to see whether that solves the problem
13:05 llyyr: look at the specific compositor options to see how to do that
13:08 tzimmermann: javierm, thanks for the fast review. did you see the other recent fixes?
13:09 tzimmermann: https://patchwork.freedesktop.org/series/156107/
13:09 tzimmermann: https://patchwork.freedesktop.org/series/156108/
13:11 Tom^: llyyr: hm, that's true. x11 is generally using legacy then, I assume?
13:11 javierm: tzimmermann: yes, I have both in my todo queue
13:11 llyyr: yes, x11 can't use the atomic interface
13:11 tzimmermann: i see. thanks again
13:11 javierm: it's just that this one was trivial enough that I could r-b it on a first look
13:11 tzimmermann: :)
13:11 javierm: tzimmermann: yw
13:13 MrCooper: Tom^: the DRM event handling itself should be light; you might be seeing the compositor work for the next frame getting triggered from that, though
13:15 Tom^: MrCooper: yeah, I suspect as much. it's just grinding my gears that I can't seem to profile it down to a slow cause, with barely any resource monitors catching anything besides the flips being slow. I'm probably just looking at the wrong things, but
13:16 MrCooper: maybe the GPU fails to finish drawing the frame before the next display refresh cycle, so the atomic commit / flip is delayed by at least one cycle
13:16 zamundaaa[m]: Tom^: sounds like you're experiencing https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4363
13:18 Tom^: zamundaaa[m]: oh maybe yeah
13:33 mareko: airlied: do you mean a pass that unifies variables that have the same location? lowering and unlowering IO is one way to unify them
13:50 Tom^: zamundaaa[m]: reading that hackfest page, does it mean there are multiple commits scheduled, and the dynamically adjusting safety margin is merely trying to hold off a scheduled commit until the previous commit is actually done?
13:51 zamundaaa[m]: Tom^: no, it's about how long it takes from calling the atomic ioctl to the commit actually being programmed into the hardware
13:52 zamundaaa[m]: If that takes a long time, then a compositor committing even 2ms before vblank will miss the deadline for that frame
13:56 Tom^: zamundaaa[m]: but doesn't the drmEventContext with page_flip_handler2 and drmHandleEvent trigger when that's done?
13:57 Tom^: sorry if I'm asking dumb questions, just trying to understand what I'm actually profiling here :D
14:01 Tom^: I'm going to see what i915.enable_dsb / xe.enable_dsb does over here, as that #4363 issue mentions regressions regarding that, though
14:04 zamundaaa[m]: Tom^: no, the pageflip event triggers for the vblank that the image has been presented in
14:04 zamundaaa[m]: Which is not the same as how long it takes to program the hardware to trigger a pageflip on the next suitable occasion
14:05 Tom^: zamundaaa[m]: oh makes sense, thanks
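A minimal sketch of what is being measured in this exchange, assuming a compositor driving KMS directly through libdrm: the time from the atomic-commit ioctl returning to the pageflip event arriving covers both the kernel's programming latency and the wait for the vblank the flip lands in. drmEventContext, drmHandleEvent and page_flip_handler2 are the real libdrm entry points; the timing helpers around them are illustrative.

```c
#include <stdio.h>
#include <time.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static struct timespec commit_time; /* taken right after the atomic commit ioctl */

/* Fires once the flip has actually been presented in a vblank, NOT when
 * the hardware was programmed -- the distinction zamundaaa describes above. */
static void page_flip_handler2(int fd, unsigned int sequence,
                               unsigned int tv_sec, unsigned int tv_usec,
                               unsigned int crtc_id, void *user_data)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    double ms = (now.tv_sec - commit_time.tv_sec) * 1e3 +
                (now.tv_nsec - commit_time.tv_nsec) / 1e6;
    printf("crtc %u: commit -> flip event took %.2f ms\n", crtc_id, ms);
}

void wait_for_flip(int drm_fd)
{
    drmEventContext evctx = {
        .version = DRM_EVENT_CONTEXT_VERSION,
        .page_flip_handler2 = page_flip_handler2,
    };
    /* Blocks reading the DRM fd and dispatches the pending event. */
    drmHandleEvent(drm_fd, &evctx);
}
```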
14:53 jhugo: daniels: Looks like "drm/rockchip: Use temporary variables" has a merge conflict in drm-tip. Could you fix it up? I don't want to make a mess in unfamiliar code
14:53 daniels:blinks
14:54 daniels: I didn't realise Heiko had applied that
14:55 daniels: mmind00: ^ did you get a conflict when pushing for tip?
14:56 mmind00: daniels: oh ...
14:57 mmind00: daniels: dim: FAILURE: Could not merge drm-misc/drm-misc-next ... didn't realize that it ended in that
14:58 mmind00: daniels: I did pick up the ones before the patch I commented on :-)
14:59 mmind00: I guess I should read that conflict-resolution doc
15:05 mmind00: daniels: jhugo: I think I've resolved the problem now
15:05 mmind00: sorry about not noticing this earlier
15:07 mmind00: conflicting patch was "drm/rockchip: vop2: use correct destination rectangle height check" ... which fixed an identical call that used drm_rect_width() instead of drm_rect_height() ... and that is handled correctly in daniels' new change too
15:08 jhugo: Looks good on my end. Thank you for the prompt resolution
15:34 dcbaker: karolherbst: sigh, that cuda dependency has been the bane of my existence recently. I'll have a look. I have been working on refactoring the dependencies to split the include directories out of the compile args for *all* dependencies, so hopefully that will just work in Meson 1.10 or 1.11, lol.
15:34 dcbaker: looking at the documentation though, the way we handle the cudart is not good
15:35 dcbaker: since cuda absolutely allows a none value there...
15:35 karolherbst: dcbaker: okay... yeah, anyway: so far, after I got rid of all direct calls, the linker is able to drop the dep, so I can keep using it, but I'd also rather have some compile-time check to enforce it
15:35 karolherbst: though I'm not sure if one can have libcuda.so but not libcudart.so installed...
15:36 karolherbst: but I'm sure it's also easier for packagers to only need the header files...
15:36 karolherbst: but if that's fixed in a future meson release, that's good
15:36 karolherbst: dcbaker: but I was jokingly thinking about a "no_automagic_plz" option 🙃
15:37 karolherbst: or maybe having a way to adjust behavior for certain deps in a documented way?
15:37 karolherbst: wouldn't mind if I'd have to explicitly say "no cudart please"
15:37 dcbaker: at the very least we should have a cudart_none (or really just a cuda_rt: static | shared | none option)
15:37 karolherbst: but as I only need the headers, it also doesn't particularly matter
15:38 dcbaker: Because the docs seem to say that the cudart isn't required, it's just something that you probably want
15:38 karolherbst: if partial_dependency would work, that would solve all my problems... but I suspect the dependency lookup would fail if the .so isn't there anyway?
15:38 karolherbst: yeah.. I get the reasons
15:38 karolherbst: it would be a pain for all the projects to specify "cudart" explicitly
15:39 karolherbst: and I'm like maybe the only one who doesn't need it lol
15:39 dcbaker: I can at the very least add a cudart_none that translates to -cudart=none
15:39 karolherbst: anyway.. already ported the code to use dlopen + dlsym + cuGetProcAddress, so effectively the final binary is fine
15:40 karolherbst: dcbaker: well as long as it's documented and guaranteed to work I don't really mind either way.
15:40 dcbaker: https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#cudart-none-shared-static-cudart suggests that it is
15:41 karolherbst: ohh I meant documentation on the meson side
15:42 karolherbst: just come up with some way to specify it + fix the header-only partial_dependency thing and I'll be happy either way
15:47 elibrokeit: is it the kind of thing where as long as the library exists at build time the build still succeeds and -Wl,--as-needed removes the dependency again so users don't actually have to install it?
15:50 elibrokeit: (so then it would still be needed in e.g. distro build chroots but I assume that -dev packages already eagerly depend on everything they can)
15:59 karolherbst: elibrokeit: yep
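A sketch of the dlopen + cuGetProcAddress approach karolherbst mentions above: compile against the CUDA headers only and resolve the driver at runtime, so the final binary carries no DT_NEEDED entry for libcuda and -Wl,--as-needed has nothing left to drop. Error paths are abbreviated; the four-argument cuGetProcAddress shown is the CUDA 11.x signature (CUDA 12 headers add a _v2 variant, but the v1 symbol is still exported), and "libcuda.so.1" is the usual driver soname.

```c
#include <dlfcn.h>
#include <cuda.h> /* headers only -- libcuda is never linked at build time */

typedef CUresult (*PFN_cuGetProcAddress)(const char *symbol, void **pfn,
                                         int cudaVersion, cuuint64_t flags);

static CUresult (*pfn_cuInit)(unsigned int flags);

int load_cuda_driver(void)
{
    /* Resolve the driver library at runtime instead of link time. */
    void *lib = dlopen("libcuda.so.1", RTLD_NOW | RTLD_LOCAL);
    if (!lib)
        return -1;

    /* Bootstrap: dlsym only cuGetProcAddress, fetch everything else with it. */
    PFN_cuGetProcAddress get_proc =
        (PFN_cuGetProcAddress)dlsym(lib, "cuGetProcAddress");
    if (!get_proc ||
        get_proc("cuInit", (void **)&pfn_cuInit, 11030,
                 CU_GET_PROC_ADDRESS_DEFAULT) != CUDA_SUCCESS)
        return -1;

    return pfn_cuInit(0) == CUDA_SUCCESS ? 0 : -1;
}
```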
18:12 cwabbott: anyone know what to do about https://gitlab.freedesktop.org/mesa/mesa/-/jobs/86016800 ?
18:12 cwabbott: perfetto uprev is failing because of some warnings emitted that our CI turns into errors
18:27 anholt: cwabbott: maybe something like dep_perfetto = declare_dependency(dependencies: dep_perfetto, compile_args: "-Wno-array-bounds")? not sure if that'll work.
19:00 airlied: mareko: no, just a simple "this block has multiple derefs of the same var, merge them into one" so I can find things easier; it's pretty trivial and probably useless
19:01 jenatali1: Wouldn't CSE do that?
19:02 airlied: I thought it did, but it doesn't
19:04 Kayden: somehow I thought nir_opt_deref did that, but maybe it doesn't
19:07 airlied: I did as well, I could just add it there, it's a pretty trivial hash table across a block
19:08 jenatali1: Looking at the CSE code, it should do it
19:13 airlied: might go figure out why cse isn't doing it, it makes sense that it should
19:31 airlied:goes back to try cse again in case I was just an idiot
19:33 airlied: ah found the idiot, it was me
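For reference, the "pretty trivial hash table across a block" airlied describes would look roughly like this. A hedged sketch against current NIR: nir_foreach_instr_safe, nir_instr_as_deref, nir_def_rewrite_uses and the _mesa hash-table helpers are real, but the pass itself is illustrative, handles only plain variable derefs, and assumes the caller created the table with _mesa_pointer_hash_table_create. As the thread concludes, nir_opt_cse already does this.

```c
#include "nir.h"
#include "util/hash_table.h"

/* Illustrative only: within one block, replace repeated
 * nir_deref_type_var derefs of the same variable with the first one --
 * roughly the job nir_opt_cse already handles. */
static bool
dedup_var_derefs_block(nir_block *block, struct hash_table *seen)
{
    bool progress = false;
    _mesa_hash_table_clear(seen, NULL);

    nir_foreach_instr_safe(instr, block) {
        if (instr->type != nir_instr_type_deref)
            continue;

        nir_deref_instr *deref = nir_instr_as_deref(instr);
        if (deref->deref_type != nir_deref_type_var)
            continue;

        struct hash_entry *entry = _mesa_hash_table_search(seen, deref->var);
        if (entry) {
            /* Seen this var before in the block: reuse the first deref. */
            nir_deref_instr *first = entry->data;
            nir_def_rewrite_uses(&deref->def, &first->def);
            nir_instr_remove(&deref->instr);
            progress = true;
        } else {
            _mesa_hash_table_insert(seen, deref->var, deref);
        }
    }
    return progress;
}
```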
19:59 zmike: spiderman-pointing.meme
21:28 Venemo: karolherbst: do you know anything about when shader_info::workgroup_size_variable is used?
21:28 karolherbst: Venemo: rusticl only
21:28 Venemo: can you elaborate a bit more on how it's used?
21:28 karolherbst: CL allows you to specify additional shared memory at dispatch time
21:29 karolherbst: like you can have kernel arguments that are shared memory buffers and you specify the size when setting those arguments from the API side
21:29 karolherbst: which happens after compilation
21:29 karolherbst: or compute state creation in gallium terms
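For context, the CL mechanism karolherbst is describing: a __local kernel argument gets its size through clSetKernelArg with a NULL value, after the kernel has been compiled. A minimal sketch of the standard API usage (kernel setup elided):

```c
#include <CL/cl.h>

/* Kernel side:
 *   __kernel void reduce(__global float *out, __local float *tmp)
 * The size of tmp is unknown at compile time; it is set per-dispatch: */
void set_local_arg(cl_kernel kernel, size_t local_bytes)
{
    /* For __local arguments, arg_value must be NULL and arg_size is the
     * amount of local (shared) memory to allocate for this dispatch. */
    clSetKernelArg(kernel, 1, local_bytes, NULL);
}
```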
21:30 Venemo: what does that have to do with the workgroup size?
21:30 karolherbst: so this shader_info flag simply indicates that something like that might happen with the shader
21:30 karolherbst: ohhhh
21:30 karolherbst: I misread :')
21:30 karolherbst: confused it with has_variable_shared_mem
21:30 karolherbst: anyway
21:31 karolherbst: workgroup_size_variable means that the workgroup size is only known at dispatch time
21:31 karolherbst: and it's used in CL and GL
21:31 Venemo: how does that work in practice? when is it actually determined?
21:32 karolherbst: so with CL you either declare a required workgroup size, or you don't. And I think in GLSL you have something similar where you can say that the workgroup size isn't known at compile time
21:32 karolherbst: and then the workgroup size from the pipe_grid_info object is used
21:32 Venemo: okay, I get that. but then when is it specified?
21:33 karolherbst: here is the GL extension: https://github.com/KhronosGroup/OpenGL-Registry/blob/main/extensions/ARB/ARB_compute_variable_group_size.txt
21:33 karolherbst: Venemo: workgroup_size_variable? At compile time
21:33 Venemo: I mean when is the real workgroup size determined, and how
21:33 karolherbst: at dispatch time
21:34 Venemo: based on something that the application passes in?
21:34 karolherbst: yes
21:34 karolherbst: in GL you use DispatchComputeGroupSizeARB where you pass the size along
21:34 karolherbst: and in CL it's just part of the normal kernel launch
21:34 karolherbst: (And if it's not variable, the application has to pass the compile time size along)
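A hedged sketch of the GL side just described: both the local_size_variable layout qualifier and glDispatchComputeGroupSizeARB come from ARB_compute_variable_group_size; the loader header and the concrete sizes are illustrative.

```c
#include <GL/glew.h> /* or any loader exposing ARB_compute_variable_group_size */

/* Compute shader declaring a variable workgroup size:
 *   #version 430
 *   #extension GL_ARB_compute_variable_group_size : require
 *   layout(local_size_variable) in;
 * This is what ends up as shader_info::workgroup_size_variable in Mesa. */

void dispatch(GLuint num_groups_x, GLuint num_groups_y, GLuint num_groups_z)
{
    /* The actual workgroup size is only supplied here, at dispatch time. */
    glDispatchComputeGroupSizeARB(num_groups_x, num_groups_y, num_groups_z,
                                  64, 4, 1 /* group size x/y/z, 256 total */);
}
```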
21:36 karolherbst: in GL you have a different queryable max workgroup size for variable shaders. In CL you can ask for the max workgroup size of a specific kernel object
21:36 Venemo: thanks
21:36 karolherbst: so drivers don't have to make sure the shader is able to run with every workgroup size and can restrict applications there as they see fit
21:37 Venemo: are there limits which the driver must respect? i.e., is there a minimum amount?
21:39 karolherbst: apparently the GL extension requires a variable workgroup size of 512
21:39 Venemo: is that a total, or is that for each dimension?
21:39 karolherbst: total
21:39 airlied: does vulkan care?
21:39 karolherbst: vulkan doesn't have it
21:40 airlied: weird, I wonder why nobody has dropped a spec for it
21:40 karolherbst: per dimension the limits are 512/512/64
21:40 karolherbst: airlied: well zink emulates it with a spec constant
21:41 karolherbst: which.. isn't great, but it's one solution :')
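For reference, the specialization-constant emulation zink uses: Vulkan allows gl_WorkGroupSize to be specialized at pipeline creation (layout(local_size_x_id = ...) in GLSL), so every size change forces a pipeline respecialization, which is the ugliness being complained about. A minimal sketch of the host side; the helper name is illustrative.

```c
#include <vulkan/vulkan.h>

/* Shader side:
 *   layout(local_size_x_id = 0, local_size_y_id = 1, local_size_z_id = 2) in;
 * Host side: feed the "variable" size through specialization constants,
 * passed via VkPipelineShaderStageCreateInfo::pSpecializationInfo. */
VkSpecializationInfo make_wg_size_info(const uint32_t wg_size[3])
{
    static const VkSpecializationMapEntry entries[3] = {
        { .constantID = 0, .offset = 0, .size = 4 },
        { .constantID = 1, .offset = 4, .size = 4 },
        { .constantID = 2, .offset = 8, .size = 4 },
    };
    return (VkSpecializationInfo){
        .mapEntryCount = 3,
        .pMapEntries   = entries,
        .dataSize      = 12,
        .pData         = wg_size, /* caller must keep this alive */
    };
}
```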
21:41 Venemo: and how does variable workgroup size work w.r.t the dimensions? is the app allowed to use arbitrary dimensions as long as the product is <= 512?
21:41 karolherbst: yes
21:41 karolherbst: well
21:41 karolherbst: the driver can specify how big it can be in each dimension
21:41 karolherbst: x/y are 512 minimum, z is 64
21:41 Venemo: right
21:42 karolherbst: but yeah.. not sure why this has never been a vulkan extension, because being able to save the trouble of recreating the pipeline should be argument enough...
21:43 Venemo: I am asking about this from the point of view of NIR, basically trying to figure out how to fit this into a new NIR pass that I'm writing.
21:44 karolherbst: well depends on what your pass is doing
21:44 karolherbst: but the semantics are that if "workgroup_size_variable" is set, you can't use "workgroup_size"
21:44 Venemo: the pass lowers a shader that uses a higher workgroup size into one that does the same work but in a smaller workgroup
21:45 karolherbst: so you'd run the same shader 2 or 4 times or something, but with more regs available, and therefore you can run more at the same time?
21:46 Venemo: I hadn't considered that use, but it could be an interesting experiment.
21:47 karolherbst: I think if the size is variable you simply can't do that optimization... or maybe a factor is part of the shader
21:47 Venemo: at the moment my main use case is to work around HW limitations. the primary user of this pass will be NVK's mesh shaders, but it could be useful for anyone
21:47 karolherbst: ahh
21:47 karolherbst: right...
21:47 Venemo: a secondary user could be to work around cs_regalloc_hang_bug in old AMD GPUs
21:48 karolherbst: I wonder how that fits into mesh shaders... unless you want to run it always as a workgroup size of 32?
21:48 zmike: I think that was the only thing I had to use spec constants for and I hated it
21:48 karolherbst: and then I added a second thing :P
21:49 zmike: and I hated that too
21:49 Venemo: karolherbst: NVidia doesn't support real workgroups in their mesh shaders. if your application uses a higher workgroup size, NVidia's compiler will have to "loop" the shader. this is actually one of the main blockers for mesh shaders in NVK
21:49 karolherbst: both are so preventable.. we should really draft extensions for those
21:49 karolherbst: Venemo: right.... but the issue is that those run on the 3D pipeline and therefore you have weirdo restrictions
21:50 karolherbst: mainly.. you don't have workgroups :)
21:50 Venemo: karolherbst: exactly.
21:50 Venemo: at least, not on NVidia
21:50 Venemo: hence, NVK will have to lower all mesh shaders to a workgroup size of 32
21:50 karolherbst: right
21:50 karolherbst: but then you won't have to bother with variable
21:50 karolherbst: unless...
21:50 karolherbst: does GL have mesh shaders?
21:51 Sachiel: it does now
21:51 karolherbst: does it specify interactions with the variable workgroup size extension?
21:51 zmike: you're welcome.
21:51 karolherbst: or is it using a different specifier?
21:51 Venemo: karolherbst: the reason I ask about variable workgroup size is because I'd like to use this pass to mitigate "cs_regalloc_hang_bug" in radeonsi.
21:52 karolherbst: ahh I see
21:52 Venemo: as you said above, indeed the factor needs to be part of the shader, there is no way around that
21:52 karolherbst: yeah I think if you lower the size and put a factor on the shader_info, then drivers making use of that pass also have to read out the factor and honor it at dispatch time
21:53 karolherbst: if the workgroup size isn't variable, you can just change the workgroup size in the shader directly
21:53 Venemo: with that in mind, it should be just a matter of computing the local_invocation_id at runtime, hence my questions
21:54 karolherbst: I mean that's just details of the lowering, right? You'll still have to make it look like all those invocations exist
21:54 Venemo: yes
21:55 Venemo: I am trying to figure out exactly those details. but first I needed to understand how all of this works. thanks karolherbst for explaining it! I now have an idea how to solve this in the pass :)
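A rough, one-dimensional sketch of the lowering Venemo is describing, with illustrative names rather than the actual pass: the hardware dispatches a smaller workgroup and the shader loops, synthesizing the invocation IDs the original code expects.

```c
/* Illustrative only. The original shader was written for workgroup size
 * ORIG_SIZE; the hardware dispatches HW_SIZE (HW_SIZE divides ORIG_SIZE). */
#define ORIG_SIZE 128
#define HW_SIZE    32

void lowered_main(unsigned hw_local_id /* 0..HW_SIZE-1 */)
{
    for (unsigned iter = 0; iter < ORIG_SIZE / HW_SIZE; iter++) {
        /* The virtual ID the original shader body sees. */
        unsigned local_invocation_id = hw_local_id + iter * HW_SIZE;
        original_body(local_invocation_id); /* hypothetical inlined body */
    }
    /* Workgroup barriers inside original_body break this simple form:
     * every virtual invocation must reach the barrier before any may
     * proceed, which a per-HW-invocation loop cannot express -- the
     * caveat discussed below. */
}
```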
21:56 karolherbst: it might make sense to do the reverse on nvidia 🙃
21:56 Venemo: reverse?
21:57 karolherbst: e.g. if the shader uses a lot of shared memory, you might only be able to run a single workgroup concurrently. So why not rework the shader so it runs two workgroups inside one, but makes better use of shared memory this way. Not sure it's worth the effort though
21:58 Venemo: it is definitely worth considering using my pass for that
21:58 karolherbst: I could see that we can do better than running two workgroups sequentially .. but who knows
21:59 Venemo: main issue with that is workgroup barriers
21:59 Venemo: if the shader has any barriers, it is not generally possible to lower it to a smaller workgroup and make it use less shared memory
22:00 karolherbst: could just duplicate the entire shader and put it sequentially and hope that nir optimizations are helping somewhat
22:00 Venemo: i.e. you can't just duplicate the shader if it has barriers in it
22:01 karolherbst: if you don't mess with the workgroup size? why not
22:01 Venemo: I don't really see what you mean, sorry
22:01 Venemo: do you mean running two workgroups in one basically?
22:02 karolherbst: yeah, but like sequentially inside the shader instead of the hw running the shader again
22:02 karolherbst: and maybe nir can optimize it a bit better
22:02 Venemo: yes, that could be feasible, although that's a different problem from what I'm solving right now
22:02 karolherbst: yeah..
22:06 Venemo: karolherbst: do you have a test case in mind for variable workgroup size?
23:41 airlied: zmike: not like you to not create a full vulkan ext to avoid ugly :-P
23:42 zmike: I think that was in the early years when I was unaware of this awesome, terrible power I now wield
23:42 zmike: also it wasn't that bad and khronos would've been way more work
23:44 karolherbst: Venemo: doesn't the GL CTS repo have tests for ARB_compute_variable_group_size?
23:45 zmike: might just be piglit
23:46 karolherbst: impressive... it's an ARB extension and yet...
23:46 zmike: GL cts coverage is really bad in some places and really good in other places
23:48 airlied: you can sometimes tell which vendor wrote what ext by the cts coverage :-P
23:50 zmike: in vulkan you definitely can :P