06:34tokyovigilante: Hi, I noticed that the commit adding bfloat16 support to RADV includes the comment "GFX11 seems to have precision issues, so don't enable the extension there for now." Is this likely to be revisited at some point? I see that it is enabled in AMDVLK for my hw (7900 XT) on Windows.
06:59glehmann: tokyovigilante: it's trivial to enable it, but gfx11 doesn't pass CTS with bf16
07:00glehmann: what does amdvlk expose? just bf16 cooperative matrix with fp32 accumulator, or also with bf16 accumulator and also the dot products?
07:04glehmann: looking at gpuinfo.org, they only support conversions and dot products, no bf16 cooperative matrix
07:04glehmann: we could do that too, but it's kind of useless
07:08tokyovigilante: yup, I did see the flag; good to know re the matrix. I'm currently only running whisper and a small LLM to clean up, so performance isn't too bad regardless.
14:22mareko: zmike: there is a TLS pointer to an array of GL functions (it's set to one of the tables in struct gl_dispatch). glapi has asm code (or C code if you disable asm) implementing the GL functions that are returned from GetProcAddress; each of them just calls the corresponding function from the TLS dispatch table, which is, again, one of the tables in struct gl_dispatch. That's how the glapi dispatch works, and it's as simple as it could possibly be; I don't see how it can be untangled more
14:23mareko: the GL functions returned by GetProcAddress are generated by Python, and the output is either asm or C
14:24mareko: if glvnd is disabled, the generated functions are also built into and exported from the libGL, libGLESv1, and libGLESv2 loaders
14:26zmike: I meant mentally untangle
14:27zmike: your description matches up with what I managed to figure out, so that's good
14:28zmike: re: gl_dispatch though, suppose I wanted to inject another function in front of the mesa/glthread functions; would it be enough to override the Current table and then have the injected functions call the old table?
14:28mareko: yes
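A minimal C sketch of the interposition idea confirmed above: copy the table that is currently installed, replace the entries you want to wrap, make the copy current, and have each wrapper forward to the saved table. It assumes a single thread and a simplified two-entry table; all names are hypothetical, and it does not handle the original table being reinstalled later (for example when a context is made current again).

```c
#include <assert.h>

/* Hypothetical two-entry dispatch table, as in a simplified glapi. */
typedef struct dispatch_table {
    void (*Clear)(unsigned mask);
    void (*Flush)(void);
} dispatch_table;

static _Thread_local const dispatch_table *current_dispatch;

/* Stand-in for the table Mesa/glthread would normally install. */
static unsigned driver_clear_mask;
static void driver_Clear(unsigned mask) { driver_clear_mask = mask; }
static void driver_Flush(void) {}
static const dispatch_table driver_table = { driver_Clear, driver_Flush };

/* Interposer state: the table we replaced and our own wrapper table.
 * Plain globals, so this sketch assumes a single thread/context. */
static const dispatch_table *wrapped;
static dispatch_table tracing_table;
static int traced_calls;

/* Wrapper: do the extra work (e.g. record a trace entry), then forward. */
static void trace_Clear(unsigned mask) {
    traced_calls++;
    wrapped->Clear(mask);
}

/* Put the wrappers in front of whatever table is current on this thread.
 * Entries we do not wrap are copied through unchanged. */
void install_tracing(void) {
    wrapped = current_dispatch;
    tracing_table = *wrapped;          /* start from the old table */
    tracing_table.Clear = trace_Clear; /* override only what we trace */
    current_dispatch = &tracing_table;
}

/* Generated-style entry point; it is unaware of the interposer. */
void entry_glClear(unsigned mask) { current_dispatch->Clear(mask); }
```

Copying the whole table first means unwrapped entries keep pointing at the original implementations, so only the traced functions pay for the extra indirection.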
14:29zmike: nice
14:29mareko: with glvnd, you could return your functions from GetProcAddress (there is no other way to get GL functions), and then your functions can call the current dispatch table functions, which is how the dispatch works today
14:31mareko: by default, there is either a function call (without asm), or a load and jump to the dispatch table function pointer (with asm)
14:32zmike: probably need it to work with and without glvnd
14:32mareko: without glvnd, the same functions are also built in and exported from libGL, libGLESv1, and libGLESv2
14:35mareko: the complexities that are impossible or detrimental to remove: 1) if glvnd is disabled, the generated GL functions must also be part of the loaders, which complicates stuff (with glvnd, they are only in libgallium); 2) the asm code can be removed, but the C version is slightly slower; 3) if glthread is always enabled for all drivers, we could return marshal functions from GetProcAddress and remove almost all of the dispatch code, but then glthread can't ever be disabled
14:36zmike: I'm trying to see if I can jam the apitrace tracing wrappers into mesa as a compile option so that mesa itself can generate traces
14:37zmike: it's okay to take on certain restrictions for that (e.g., glthread must be enabled)
14:37mareko: at some point, we might want the following: 1) removal of non-glvnd support, 2) glthread enabled permanently for all drivers, with no way to disable it
14:43zmike: glthread is very bad on machines without enough cores, so we probably don't want to force enable it globally
14:44mareko: it's the only way to delete most of glapi complexities
14:45zmike: that's a bit out of scope for my current investigation
14:48mareko: note that the functions GetProcAddress returns must work with all drivers, so they can't just be marshal functions at the moment, though I suppose we could do that on x86, where glthread can be enabled permanently
14:51mareko: I'd say the removal of non-glvnd paths is something we could do