17:28 alyssa: oof.
17:37 mareko: do we have a NIR pass that converts VS+GS to a mesh shader?
17:38 mareko: prim restart and vertex reuse are not needed
20:03 alyssa: mareko: No, but src/poly/ has a bunch of prior art.
20:04 alyssa: particularly src/poly/nir/poly_nir_lower_gs.c
20:04 alyssa: (poly does geometry/tessellation emulation atop pure compute + VS, I originally wrote it for AGX and now there's an MR to use it for panfrost)
20:05 alyssa: for NGG GS, not sure there's anything to share in terms of code, but maybe ideas.
20:05 alyssa: e.g. for
20:05 alyssa: > The main issue with NGG GS is that it launches as many shader invocations as there are output vertices, but only a very small number of invocations do the actual GS work. The rest are waiting for the GS threads to complete. Any way we could distribute GS work across multiple threads would be a win.
20:05 alyssa: the GS emulation in poly sort-of solves this by lowering GS into VS
20:06 alyssa: (VS->GS->FS becomes CS->VS->FS)
20:06 alyssa: (VS->TCS->TES->GS->FS becomes CS->CS->[CS->]CS->VS->FS)
20:06 alyssa: (VS->TCS->TES->FS becomes CS->CS->[CS->]VS->FS)
20:07 alyssa: "NGG GS: Determine primitive connectivity at compile time for simple cases of GS"
20:07 alyssa: poly does this
20:07 alyssa: see `optimize_static_topology`
20:09 alyssa: apple gets hw mesh on m3+, which mesa doesn't support (yet?) so that's why this is all pure compute
20:09 alyssa: mali doesn't have hw mesh at all
20:40 ccr:'s poor brain becomes confused by the -> chains despite not understanding anything anyway
21:27 alyssa: ccr: compute/vertex/fragment shader pipelines
21:28 ccr: thought so, but what are TCS, TES?
21:43 Company: tessellation control / evaluation
22:15 mareko: alyssa: NGG is basically MS, i.e. it's launched as a workgroup and outputs are indices, vertices, and how many of those are generated; the thing with NGG is that VS, TES, and GS system values and the input assembler ([index fetch -> prim restart -> vertex reuse, tessellator] -> workgroup generation) are also available, so you get VertexID/InstanceID/TessCoords per vertex invocation,
22:15 mareko: PrimitiveID/GSInvocationID per primitive invocation, etc. so we already have GS as MS except we rely on the fixed-func input assembler to do the hard stuff (index fetch, prim restart, vertex reuse, tessellator)
22:17 mareko: VS+GS already runs as MS but with a fixed-func input assembler, so the next step is to make it just like MS by implementing the input assembler in MS too
22:20 mareko: TS could determine if prim restart flips triangle strip winding and pass that as bool into MS, and each MS workgroup can do the rest of input assembly by itself
22:22 mareko: TS can just divide a draw into MS workgroups and that's it
22:42 mareko: TS->MS == CS->CS (mostly)
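
The two TS jobs mareko describes (dividing a draw into MS workgroups, and telling each workgroup whether primitive restart has flipped the triangle strip winding at its starting point) can be sketched on the CPU as below. This is only an illustration under stated assumptions, not Mesa or NGG code: `PRIMS_PER_WORKGROUP`, `num_workgroups` and `strip_winding_flipped` are hypothetical names, the workgroup size of 64 is arbitrary, and the linear scan for the last restart index is the simplest possible approach rather than what a real task shader would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: number of primitives handled by one MS workgroup. */
#define PRIMS_PER_WORKGROUP 64u

/* Number of workgroups needed to cover prim_count primitives, rounding up
 * without risking overflow near UINT32_MAX. */
static uint32_t num_workgroups(uint32_t prim_count)
{
    return prim_count / PRIMS_PER_WORKGROUP +
           (prim_count % PRIMS_PER_WORKGROUP != 0);
}

/* In a triangle strip, the triangle starting at strip-local offset t uses
 * indices t, t+1, t+2, and odd offsets have flipped winding. A restart
 * index begins a new strip, which resets that parity.
 *
 * Returns true when a triangle whose first index sits at position `pos`
 * of the index buffer has flipped winding. */
static bool strip_winding_flipped(const uint32_t *indices,
                                  uint32_t index_count,
                                  uint32_t pos,
                                  uint32_t restart_index)
{
    assert(pos <= index_count);

    /* Find where the current strip begins: just after the most recent
     * restart index that precedes pos, or 0 if there is none. */
    uint32_t strip_start = 0;
    for (uint32_t i = 0; i < pos; i++) {
        if (indices[i] == restart_index)
            strip_start = i + 1;
    }

    return ((pos - strip_start) & 1u) != 0;
}
```

In this sketch the TS side would call `num_workgroups` once per draw and evaluate `strip_winding_flipped` at each workgroup's first index position, passing the result down as the bool mareko mentions; each workgroup then handles the rest of input assembly on its own.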