14:17 robclark: make clpeak float/half fast with this one little trick... anyone care to review simple change to add missing opt_algebraic rules: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39899
14:49 dj-death: is this a network issue on CI? https://gitlab.freedesktop.org/mesa/mesa/-/jobs/93535370
14:56 daniels: dj-death: nah, it's a transient issue that sergi & our admins are fixing, see #freedesktop
14:58 sergi: dj-death: it should be working again now. Let me know if there is any problem
15:08 dj-death: thanks
17:16 idr: glehmann: Do you have a branch in progress for SPV_VALVE_mixed_float_dot_product?
17:21 alyssa: `(('ior', ('bcsel', a, '#b', '#c'), '#d'), ('bcsel', a, ('ior', b, d), ('ior', c, d)))` would clean up junk on a CTS shader, no idea if similar permutations with other opcodes might be profitable on real shaders
17:22 alyssa: (should be correct for /any/ binop but we can't express that with nir algebraic..)
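A small brute-force check of the claim above, that pushing a binop into both arms of a `bcsel` is correct for any binop. This is only an illustrative sketch in plain Python, not NIR code: `bcsel`, `BINOPS` and `distributes` are hypothetical helpers written for this example, and the 8-bit wrap on `iadd` is an arbitrary choice.

```python
import itertools
import operator


def bcsel(cond, x, y):
    """NIR-style select: x when cond is true, otherwise y."""
    return x if cond else y


# A few integer binops to try; any two-operand function would do.
BINOPS = {
    "ior": operator.or_,
    "iand": operator.and_,
    "ixor": operator.xor,
    "iadd": lambda x, y: (x + y) & 0xFF,  # wrap to 8 bits (arbitrary width)
}


def distributes(binop, bits=3):
    """Check binop(bcsel(a, b, c), d) == bcsel(a, binop(b, d), binop(c, d))
    exhaustively over all small operand values."""
    values = range(1 << bits)
    for a in (False, True):
        for b, c, d in itertools.product(values, repeat=3):
            lhs = binop(bcsel(a, b, c), d)
            rhs = bcsel(a, binop(b, d), binop(c, d))
            if lhs != rhs:
                return False
    return True


results = {name: distributes(op) for name, op in BINOPS.items()}
```

The rewrite only pays off when `b`, `c` and `d` are constants (the `#` markers in the pattern), because then both inner binops fold away and one instruction disappears.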
17:40 glehmann: idr: yes
17:42 glehmann: idr: https://gitlab.freedesktop.org/DadSchoorse/mesa/-/commits/radv-mixed-float-dot
17:42 idr: glehmann: I'll see if we can do something clever for those opcodes.
17:43 glehmann: to be honest, I thought no other vendor would be interested
17:44 glehmann: even had a look at the intel isa doc, but I only saw integer versions
17:45 idr: glehmann: We won't have any specific opcodes, but... it may be possible to do something clever for at least some of them.
17:51 glehmann: alyssa: ideally I would say we should have something that cleans this up for literally any opcode
17:51 glehmann: but yeah, opt_algebraic is the wrong tool here, kind of
17:51 glehmann: but we already have similar patterns
17:52 alyssa: Yeah
18:10 idr: I could have sworn that I added some patterns like that to opt_algebraic.
18:13 idr: fe3c5182775b does that for comparisons.
18:14 idr: I guess that just made changes to the even older 604ae33c8b95.
18:14 jenatali: Seems like a general "constant reassociation" pass might be useful
18:15 idr: Maybe I had more patterns in a branch that didn't pan out...
18:15 idr: checks
18:19 alyssa: jenatali: I mean we have nir_opt_reassociate but that's for something else
18:20 alyssa: <it reassociates things like ((x + 1) + uniform) + 2 to ((1 + 2) + uniform) + x>
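A toy sketch of the kind of reassociation described above: flatten an add chain, order the operands constants first, then uniforms, then everything else, and rebuild it so the constants end up adjacent and foldable. This is not how nir_opt_reassociate is implemented; the tuple expression format and the `flatten`, `rank` and `reassociate` helpers are assumptions made up for the example.

```python
def flatten(expr):
    """Collect the leaves of a nested ('add', lhs, rhs) chain, left to right."""
    if isinstance(expr, tuple) and expr[0] == "add":
        return flatten(expr[1]) + flatten(expr[2])
    return [expr]


def rank(leaf, uniforms):
    """Lower rank sorts first: constants, then uniforms, then the rest."""
    if isinstance(leaf, int):
        return 0
    if leaf in uniforms:
        return 1
    return 2


def reassociate(expr, uniforms):
    """Rebuild the add chain as a left-leaning tree in rank order."""
    # sorted() is stable, so operands of equal rank keep their original order.
    leaves = sorted(flatten(expr), key=lambda leaf: rank(leaf, uniforms))
    result = leaves[0]
    for leaf in leaves[1:]:
        result = ("add", result, leaf)
    return result


# ((x + 1) + uniform) + 2
before = ("add", ("add", ("add", "x", 1), "uniform"), 2)
# ((1 + 2) + uniform) + x
after = reassociate(before, uniforms={"uniform"})
```

After this, `1 + 2` is a single constant-foldable node, and the uniform part of the sum no longer depends on `x`.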
18:26 glehmann: sadly nir_opt_reassociate is both too aggressive and not aggressive enough for the things I would like it to do
18:27 idr: Looks like I have two things...
18:27 glehmann: but I should get back to that at some point, radv really needs it to clean up cooperative matrix addressing math
18:27 idr: A branch that does that kind of reassociation in GLSL IR. :screaming:
18:27 idr: And a branch that does some stuff like (('ior', ('bcsel(is_used_once)', a, '#b', 0), c), ('bcsel', a, ('ior', b, c), c)).
18:28 alyssa: glehmann: Yeah...
20:02 karolherbst: soo.. I need a RA expert for a problem I'm seeing in NAK: We do have vector alignment requirements for instructions writing/reading to/from more than 32 bits (e.g. loads or tex ops), and because they need to be vec aligned (to the size of the entire vec, POT only), this also makes SSA regalloc a tad more complicated. Atm NAK doesn't really take
20:02 karolherbst: any additional steps to make this work and RA inserts tons of movs (like half the shader can end up being movs in really bad cases) to satisfy vector alignments and such. I was trying to fix this and make RA allocate vectors as whole things, but then that conflicts in shaders which write to single scalars of it and register usage blows up, because
20:02 karolherbst: the whole RA allocated vector gets replaced instead of single components updated. And so I was wondering if there are relevant papers on that matter on how to deal with this properly.
20:05 alyssa: karolherbst: what NAK does *is* the state of the art, afaik.
20:05 karolherbst: oof
20:05 karolherbst: so my attempt was to track vectors across phi nodes and that made things better in some cases, but worse in others
20:06 karolherbst: but NAK also aggressively splits up phi vectors
20:06 glehmann: aco's RA does pretty much what NAK does for some RT instructions, and it avoids movs by having good vector affinity handling
20:07 karolherbst: it's kinda very 32 bit register focused, not sure if that's the core issue here and if I find more on it on the relevant paper
20:07 karolherbst: glehmann: okay so "good vector affinity handling" might sound like the thing NAK is missing
20:08 karolherbst: like my attempt was to improve the PhiWeb handling, but I'm also like.. not an RA expert at all
20:08 alyssa: who is?
20:08 alyssa: as a wise RA developer once told me when I said I was too dumb for RA,
20:08 alyssa: "we're all too dumb for RA"
20:08 karolherbst: heh
20:09 karolherbst: yeah anyway, point is.. I have shaders where I see like 7 or 8 movs between actual instructions 🙃
20:09 karolherbst: it's kinda wild
20:09 glehmann: I can only refer to daniel and rhys for more involved questions about aco RA
20:09 karolherbst: okay.. my assumption was that aco doesn't really need it, because it's not as vector based as nvidia's ISA, but maybe I should scan a bit through aco's RA code and see if I find something interesting there
20:13 glehmann: if anything aco has to deal with even more vectors
20:14 glehmann: we have 16x32bit SMEM loads after all
20:14 karolherbst: and the registers need to be vector aligned? sounds annoying :)
20:15 glehmann: it depends, SGPR vectors need to be aligned to 2 or 4 regs, VGPRs can be any alignment
20:15 karolherbst: well.. on NV it always has to be aligned to the vector size up to the next POT
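To make the alignment rule above concrete, here is a minimal sketch of "aligned to the vector size rounded up to the next power of two", counting in 32-bit registers. The helper names are hypothetical and nothing here is taken from NAK's allocator.

```python
def next_pow2(n):
    """Smallest power of two that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def vec_alignment(num_regs):
    """Required alignment, in registers, for a vector of num_regs registers."""
    return next_pow2(num_regs)


def first_aligned_reg(start, num_regs):
    """First register index >= start at which such a vector may begin."""
    align = vec_alignment(num_regs)
    return (start + align - 1) & ~(align - 1)


# A vec3 is treated like a vec4 for alignment purposes.
vec3_align = vec_alignment(3)
# With r0..r4 already taken, a vec3 cannot start before r8.
vec3_start = first_aligned_reg(5, 3)
```

The gap between the first free register and `vec3_start` is where the padding and the extra movs come from when values have to be shuffled into an aligned range.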
20:15 jenatali: At some point I need to do a massive rework of RA in WARP, I've also seen shaders where most of the shader is movs
20:16 glehmann: on some new cdna chips, VGPRs need to be aligned to 2 regs too, but we don't support those yet
20:16 karolherbst: though not sure if we need vec8 alignment on Blackwell's new 256 bit loads...
20:17 karolherbst: right my point is, on nvidia you always have to align no matter what it is about, so every load/store on global memory needs 64 bit alignment
20:17 karolherbst: no matter what type of register it is
20:18 karolherbst: jenatali: you have to do RA in WARP?
20:19 karolherbst: ohh it's the sw thing...
20:19 jenatali: Yeah, I'm one of the primary maintainers of it these days
20:19 karolherbst: just use LLVM 🙃
20:19 glehmann: surprised it's not llvm based already
20:19 jenatali: Yeah, sorry not super relevant, just talking about RA is making me remember that it's awful there too
20:19 jenatali: Yeah, pros and cons
20:20 karolherbst: well at least you don't have to run on 100 CPU archs
20:20 jenatali: It'd bloat binary size by... well, a lot
20:20 jenatali: Yeah, just 2 these days
20:21 karolherbst: good times when I was looking into s390x big endian llvmpipe bugs, because colors were all weirdly wrong :')
20:22 karolherbst: anyway yeah.. that's a core issue I want to fix in NAK, because it's really causing significant perf issues :')
20:28 linkmauve: I have yet to figure out how to fix colours being all weirdly wrong (RGBA<->ABGR swapped most likely, or BGRA<->ARGB) on PowerPC, since I now have more PowerPC computers than x86 computers on my desk.
20:28 linkmauve: Still more ARM than both though. :D
20:29 karolherbst: tldr: reversing the channel order isn't how to make this all big endian compatible :P
20:30 K900: If anyone wants some big endian fun, spirv-tools tests explode
20:30 K900: Fixing that would be a good idea probably
20:30 karolherbst: like on a quick glance it looks like the "right" idea, but once you dig into RGB565 and other weird formats it all explodes
20:30 K900: (also spirv is host endian whyyyyyyyy)
20:31 K900: (like literally most of the test failures are just "oops test data is LE")
20:31 karolherbst: K900: so the app is responsible? sounds like a smart move
20:31 K900: (but not all of them)
20:32 linkmauve: My G4’s Radeon Mobility 9550 only does GL 2.1 and GLES 2.0, SPIR-V will only be useful for lavapipe I expect.
20:32 K900: karolherbst: And the app has to ship two copies of all shaders if it wants to run on both LE and BE, yes
20:32 karolherbst: it can swap in memory tho
20:32 linkmauve: And the CPU is already slow enough, on llvmpipe it does 4 fps where my laptop does 170 fps on the same game.
20:33 K900: karolherbst: Yeah but then you still need an endianness aware parser
20:33 K900: And you need to round trip through it
20:33 K900: Which kinda defeats the whole point
20:33 K900: If there even was a point in the first place
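For reference, a minimal sketch of the "swap in memory" step discussed above. A SPIR-V module is a stream of 32-bit words whose first word is the magic number 0x07230203, so a consumer can tell the byte order from how that word reads. `normalize_spirv` is a hypothetical helper written for this example, not a spirv-tools API.

```python
import struct

SPIRV_MAGIC = 0x07230203


def normalize_spirv(blob: bytes) -> list:
    """Decode a SPIR-V binary into 32-bit words, whichever byte order the
    file was written in, using the magic number to pick the order."""
    if len(blob) < 4 or len(blob) % 4 != 0:
        raise ValueError("SPIR-V binaries are a whole number of 32-bit words")
    count = len(blob) // 4
    for order in ("<", ">"):  # try little endian first, then big endian
        words = struct.unpack(f"{order}{count}I", blob)
        if words[0] == SPIRV_MAGIC:
            return list(words)
    raise ValueError("magic number not found, not a SPIR-V module")


# The same three words serialized in both byte orders decode identically.
sample = (SPIRV_MAGIC, 0x00010600, 42)
from_le = normalize_spirv(struct.pack("<3I", *sample))
from_be = normalize_spirv(struct.pack(">3I", *sample))
```

This is the round trip K900 is complaining about: every consumer that wants to accept both byte orders has to do this pass before it can parse anything else.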
20:34 karolherbst: heh
20:35 K900: Like I'm sorry but I genuinely do not understand how you design a supposedly hardware independent bytecode format AND MAKE IT HOST ENDIAN
20:35 karolherbst: yeah, should have been little endian :P
20:36 K900: I mean Java is big endian and no one complains
20:36 K900: Despite it being wrong for like 99.9% of computers in active use
20:36 karolherbst: yeah but games run mostly on little endian machines, and making it more painful for 0.0001% of all users is a bad idea :P
20:36 K900: But like just pick one
20:36 K900: Either one
20:37 karolherbst: I'm sure it's more fun on mixed endian systems
20:37 K900: Host endian in practice just means LE and also if you're running a BE system then fuck you
20:37 karolherbst: GPUs are kinda irrelevant on BE tho
20:37 K900: And like just make it LE
20:37 K900: Sorry I'm unreasonably mad about bytecode design
20:38 K900: karolherbst: There's people running ppc64be on POWER9
20:38 karolherbst: without GPUs, yes
20:38 K900: Those things are weird but they can most certainly run a modern GPU
20:38 K900: I know at least one person using one as a workstation
20:38 karolherbst: I'm sure it's secretly LE
20:40 karolherbst: or software
20:40 karolherbst: (or an old GPU)
20:40 K900: But also like, it's the principle of the thing
20:40 K900: Gonna set up a gofundme to pay for my Khronos membership so I can get into every SPIR-V WG meeting and yell at people about endianness until they agree to make it LE in the next major version
20:41 karolherbst: BE is dead, just let it die
20:41 K900: I think it's some sort of GCN and they claim at least OpenGL works
20:41 K900: Vulkan doesn't because spirv-tools
20:41 karolherbst: this is too painful to watch
20:41 K900: I don't even disagree
20:42 K900: I just want the spec to reflect that instead of forcing the BE containment breach upon literally everyone
20:42 karolherbst: but the thing is, llvmpipe does smart memory vectorization tricks that don't really work on BE, because of texture formats and it's just pain
20:45 karolherbst: yeah... dunno.. maybe there was a good reason for it or just nobody really bothered discussing it and it just became host endian because nobody cared
21:16 robclark: karolherbst: btw, for cts do you run_conformance.py or is there some better way to run at least parts of it in parallel?