00:28karolherbst: nice.. I think I reduced the CPU overhead enough, so I got like 5% more performance :D
00:31karolherbst: clCreateKernel 20% -> 4%
00:48karolherbst: I've written some silly code :')
00:48karolherbst: but I think I was just very optimistic about what's going to be a hot path for some applications
00:53karolherbst: 3.4% mhhh
01:01karolherbst: now 5% of the CPU time is spent on setting kernel args and 7% on enqueuing kernels... maybe I can bring down those numbers as well
01:03karolherbst: and 3% on releasing the kernel, which is literally just some free
01:35karolherbst: mhh yeah, nothing I can easily do... oh well...
01:36karolherbst: (well, besides figuring out why it's slow on the GPU side here or something)
02:13airlied: mwalle: robertfoss is probably who to ping
02:13airlied: robertfoss: ^
07:27mwalle: airlied: thanks
07:28mwalle: robertfoss: just in case the URL was lost: anything on https://lore.kernel.org/r/D09FZ0P0ARBE.1YPEPPM160VJK@kernel.org/ ? There has been no response for months now :/
10:36MrCooper: interesting, "mangohud vkcube-wayland --present_mode 1" shows ~10% higher frame rate (~3600 vs ~3300) with implicit sync compared to explicit sync with RADV on Navi 14, CPU load ~10% in both cases
11:45zamundaaa[m]: MrCooper: about 10% difference on my PC (11k vs 10k) as well, that's pretty weird
12:18MrCooper: zamundaaa[m]: yeah, while it was clear that explicit sync can't really be faster, I didn't expect it to hurt that much
12:22zamundaaa[m]: I would've expected it to be slightly faster, as it can avoid using buffers the compositor is still using on the GPU side
12:24MrCooper: not convinced that really makes a difference, the resulting client GPU work can't start earlier anyway
12:35zamundaaa[m]: Of course it can start earlier, by the client picking a buffer that's free
12:36zamundaaa[m]: If it picks the buffer the compositor is still busy on, then the client's GPU work has to wait until the compositor is done with it
12:37MrCooper: that's the point, it has to wait anyway, since the compositor uses a higher priority context
12:43zamundaaa[m]: right, if you assume the GPU is a serial device, then it shouldn't make a difference in practice
12:43zamundaaa[m]: unless the client is rendering on a different GPU. Gonna test that
12:47zamundaaa[m]: I get maybe 1% more fps with vkcube-wayland running on the integrated GPU with explicit sync vs without. It fluctuates a bit with both explicit and implicit sync though, idk how reproducible those results are
12:48zamundaaa[m]: More important than the slight improvement is... I don't get the 10% drop in fps
12:52MrCooper: I see more than 1% fluctuation in the frame rate while it's running, so that would need to be qualified more :)
12:53zamundaaa[m]: The 1% improvement is in the maximum fps. Idk how to tell mangohud to average over a longer time period to check that part out
12:56MrCooper: anyway, I don't think your explanation for an improvement works in this case either, since the client doesn't draw to the same buffer the compositor samples from, so the client GPU work doesn't have to wait with implicit sync either
12:57zamundaaa[m]: I don't think Mesa does a blit with Vulkan
12:57MrCooper: hmm, or doesn't Mesa always use a separate shared buffer with Vulkan?
12:57zamundaaa[m]: It only does that with EGL
12:58zamundaaa[m]: At least that's how I understood the code last time I looked into the multi gpu changes for the linux dmabuf protocol
12:59MrCooper: hmm, that means non-optimal access characteristics for either GPU, and/or buffer storage migration ping-pong though
13:00MrCooper: or not, since the client uses the integrated GPU
13:01zamundaaa[m]: yeah, in this case it should work out. In the other direction it might not be that great though
13:01MrCooper: yep
13:18MrCooper: zamundaaa[m]: weird, with a second dGPU, implicit sync starts out >10% slower than explicit sync, but after a while the frame rate jumps up to the same range (reproduced 3x in a row)
13:19MrCooper: never mind
13:20MrCooper: it starts out slower with explicit sync as well, guess the GPU clocks are lower
13:20MrCooper: can't see any difference in this case
13:35MrCooper: zamundaaa[m]: '"MANGOHUD_CONFIG=frame_count mangohud vkcube-wayland --present_mode 1" & sleep 60 && killall -STOP vkcube-wayland' shows how many frames it presented in a minute
14:20zamundaaa[m]: MrCooper: seems to be slightly better (<1%) with explicit sync still, but the difference between two runs is a few percent so it's hard to make any actual statements without running tons of tests
14:21MrCooper: or at least running the numbers through something like ministat, yeah
14:22MrCooper: sounds like no significant difference though
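[Editor's sketch] The point above about run-to-run variance can be illustrated with a small comparison of two sets of per-run frame counts. This is only a rough stand-in for what a tool like ministat reports, not a reimplementation of it: it computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom using the Python standard library. The function name `welch_t` and the sample numbers are hypothetical and are not measurements from this conversation.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two runs")
    na, nb = len(a), len(b)
    # Per-sample variance of the mean (sample variance divided by run count).
    sa, sb = variance(a) / na, variance(b) / nb
    se2 = sa + sb
    if se2 == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical frames-per-minute counts from repeated runs (made-up data).
    implicit_sync = [198000, 201500, 199200, 200300]
    explicit_sync = [199100, 202000, 200800, 198700]
    t, df = welch_t(explicit_sync, implicit_sync)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

A |t| that is small relative to the critical value for the computed degrees of freedom suggests the difference between the two configurations is within the noise, which is the situation described above when runs differ by a few percent.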
14:22zamundaaa[m]: Either way, more important is that there's an fps drop if it's on the same GPU, but not if it's a different one. It's at least a clue to where the problem comes from
15:00MrCooper: zamundaaa[m]: BTW, AFAICT there's a blit for PRIME with Vulkan as well
15:18zamundaaa[m]: Ah, indeed, that seems to have been fixed a while ago. Sorry for spreading outdated information
15:27robertfoss: mwalle, airlied: thanks, having a look
15:27robertfoss: mwalle: sorry about the delay, i'll have a look today or tomorrow
15:30zamundaaa[m]: MrCooper: do I see that correctly, the blit goes from the swapchain buffer to just another buffer on the same device as the swapchain?
15:31MrCooper: I think so, it uses system memory for storage though
15:32zamundaaa[m]: okay, that's less bad then. Dunno if that actually guarantees the import to the main device to be possible though
15:33zamundaaa[m]: but if it works in practice it's probably not a real problem
15:40orbea: idk if this is the right place to ask, but any ideas what happened to https://gitlab.com/panfork/panfork ?
16:26DavidHeidelberg: Serious question: should egl/glxinfo have a -d option for decimal output OR should GL-CTS print hex values?
16:26DavidHeidelberg: I'm getting mad converting back and forth between these two and it gets seriously annoying.
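[Editor's sketch] Until one of the tools gains such an option, the back-and-forth conversion described above could be done with a small text filter. The sketch below is an assumption about what would help, not part of eglinfo, glxinfo, or GL-CTS: the function names `hex_to_dec` and `dec_to_hex` are hypothetical, and the sample strings are made up rather than copied from real tool output.

```python
import re

# Matches hexadecimal literals such as 0x3038 or 0X1f.
HEX_RE = re.compile(r"\b0[xX][0-9a-fA-F]+\b")
# Matches standalone decimal integers; digits inside a hex literal or an
# identifier are skipped because no word boundary precedes or follows them.
DEC_RE = re.compile(r"\b\d+\b")


def hex_to_dec(text):
    """Rewrite every 0x... literal in text as a decimal number."""
    return HEX_RE.sub(lambda m: str(int(m.group(), 16)), text)


def dec_to_hex(text):
    """Rewrite every standalone decimal number in text as a 0x... literal."""
    return DEC_RE.sub(lambda m: f"0x{int(m.group()):x}", text)


if __name__ == "__main__":
    # Made-up sample lines, only to show the round trip.
    print(hex_to_dec("config id 0x10 attribute 0x3038"))
    print(dec_to_hex("config id 16 attribute 12344"))
```

Piping either tool's output through one of these functions keeps the rest of each line intact, so the two outputs can be compared directly.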
17:13daniels: orbea: only one person knows, and they aren't here
17:14orbea: daniels: do you mind elaborating?
17:14daniels: orbea: panfork was a one-person effort, and that person has not been part of the mesa community for quite some time now
17:14orbea: ah, and they are now MIA?
17:15daniels: unsure
17:15orbea: okay, thanks for the info
17:18daniels: np
17:46DavidHeidelberg: daniels: what do you think about the hex/dec eglinfo output vs CTS?
17:48daniels: DavidHeidelberg: adding a flag seems reasonable yeah
17:50DavidHeidelberg: on eglinfo side?
18:06daniels: yeah
18:25DavidHeidelberg: daniels: done
18:57daniels: DavidHeidelberg: where'd you get to with the mesa-rootfs bucket?
19:04DavidHeidelberg: daniels: I think sergi claimed the task. Generally it should be just changing the part and re-running the uploads, right?
19:08daniels: yeah