06:38neatversion: It's possible that 524 has to be split into eight fields, with the index/offset coming from the PC, because every index goes up to 32 and the offset is 4*32-offset. So 1 4 8 12 has indexes 1 2 3 4 and offset 1, and 2 6 10 14 has indexes 5 6 7 8 and offset 2. So when you do the ALUs, every ALU graduates when the last field becomes known. So 1 4 8 12 maps to 15; if 10 goes down, ALU indexes are passed as a bit-position index, so 2 6 14 are not enabled and are hence all zeros,
06:38neatversion: when from the second we get 6, 3, 4, 5, 6, 7 and 8, the answer is at index 10+6+3+4+5+6+7+8 = 49, and hence the mul is performed with 49 as the first operand; if the second operand is at 12, that would release subindex 12 from the 49th bank, the same 8 fields we started with. Anyhow, even if I got this wrong, it should be most compact in the 524 format with 8 fields of 4 bits. Otherwise things would collide, as 1+2 is the same as 3, and 2+3 is the same as 1+4. So each ALU answer is 524*524 long, drawn from a set of 524 such answers.
06:39neatversion: So each set is 524*524*524 (143,877,824), hence quite a few ALUs can be batched
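The collision argument above (1+2 = 3, 2+3 = 1+4) is the usual reason to pack fields positionally instead of adding them. A minimal Python sketch of eight 4-bit fields in one 32-bit word; the field order (field 0 in the lowest nibble) and the helper names are assumptions for illustration, and it does not try to model the 524 scheme itself:

```python
from typing import List, Sequence

FIELD_BITS = 4  # bits per field ("8 fields of 4 bits")
NUM_FIELDS = 8  # 8 * 4 = 32 bits in total


def pack_fields(fields: Sequence[int]) -> int:
    """Pack eight 4-bit fields into one integer, field 0 in the lowest nibble."""
    if len(fields) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(fields)}")
    word = 0
    for i, value in enumerate(fields):
        if not 0 <= value < (1 << FIELD_BITS):
            raise ValueError(f"field {i} out of range: {value}")
        word |= value << (i * FIELD_BITS)
    return word


def unpack_fields(word: int) -> List[int]:
    """Inverse of pack_fields."""
    mask = (1 << FIELD_BITS) - 1
    return [(word >> (i * FIELD_BITS)) & mask for i in range(NUM_FIELDS)]


# Summing the fields loses information: both of these sum to 3 ...
a = [1, 2, 0, 0, 0, 0, 0, 0]
b = [3, 0, 0, 0, 0, 0, 0, 0]
assert sum(a) == sum(b)
# ... but their packed words differ, and each unpacks back to its own fields.
assert pack_fields(a) != pack_fields(b)
assert unpack_fields(pack_fields(a)) == a
```

With all eight fields at 15 the word is 0xFFFFFFFF, so the format covers exactly 2^32 values.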
06:50mareko: DemiMarie: Pierre-Eric is working on amdgpu native context
06:51DemiMarie: mareko: thanks!
06:55neatversion: So it's probably coming, as many thousands of ALUs can be batched together.
09:05DavidHeidelberg: mupuf: hey, I like the wildcard usage within flakes/fails etc. in general, but it raises one concern of mine: searching for the flake... So far, I do "rg first.second.third.fourth" and remove parts whenever I don't find it, to account for wildcards, but this is impossible to do with the "first..*third.fourth" syntax
09:13mupuf: DavidHeidelberg: why do you search for it?
09:15DavidHeidelberg: mupuf: for example, sometimes it makes sense to compare with other devices/drivers, whether the flake is present there etc.
09:16mupuf: Right, I see.
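With regex entries, a plain `rg` for the full test name no longer finds them; the lookup has to be inverted, testing each entry against the name. A minimal sketch, assuming one unanchored regular expression per line with `#` comments and blank lines skipped (the runner's own matching rules may differ); `find_matching_entries` is a hypothetical helper name:

```python
import re
from typing import List, Sequence, Tuple


def find_matching_entries(lines: Sequence[str], test_name: str) -> List[Tuple[int, str]]:
    """Return (line_number, entry) for every flake/fail entry whose regex
    matches test_name. Line numbers are 1-based, like rg's output."""
    hits: List[Tuple[int, str]] = []
    for lineno, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        try:
            if re.search(entry, test_name):  # unanchored match (assumption)
                hits.append((lineno, entry))
        except re.error:
            continue  # not a valid regex; a real tool should report it
    return hits
```

For example, `find_matching_entries(["# flakes", "first..*third.fourth", "other.test"], "first.second.third.fourth")` returns `[(2, "first..*third.fourth")]`. Running it over the lists of other devices/drivers answers the comparison question directly.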
09:17mupuf: Honestly, I don't like listing fails in the tree, but it works OK most of the time. Flakes, on the other hand, are a nightmare
09:17DavidHeidelberg: I don't think it's a blocker for doing the "random place" wildcards... maybe we can have a tool which loads the full list and translates regexes into full names
09:17DavidHeidelberg:not volunteering now, just throwing out ideas
09:17mupuf: A tool could work, yeah, but would still suck
09:18DavidHeidelberg: it could also report, for example, that a test doesn't exist in the current .*-CTS
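The tool suggested here could be as simple as matching every entry against the full case list of the CTS in use. A minimal sketch, assuming the full list is available as plain test names and that entries are unanchored regular expressions; `expand_patterns` is a hypothetical name:

```python
import re
from typing import Dict, List, Sequence, Tuple


def expand_patterns(
    patterns: Sequence[str], all_tests: Sequence[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Expand each wildcard entry into the full test names it matches, and
    collect entries that match nothing (stale, e.g. the test was renamed or
    removed from the CTS)."""
    expanded: Dict[str, List[str]] = {}
    stale: List[str] = []
    for pattern in patterns:
        regex = re.compile(pattern)
        matches = [name for name in all_tests if regex.search(name)]
        expanded[pattern] = matches
        if not matches:
            stale.append(pattern)
    return expanded, stale
```

The expanded names can then be grepped as before, and the `stale` list covers the "test doesn't exist" report.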
09:18mupuf: What I hate about the flakes list is that it ends up being append-only
09:19mupuf: This is because getting reproduction rate statistics doesn't work well with the current model
09:19mupuf: With CI Bug Log, it would be easy to see changes in the reproduction rate
09:19mupuf: And remove the flake
09:20mupuf: But with these lists, ... Good luck :s
09:20mupuf: We'll be starting to do nightly/weekly stress tests soon
09:21mupuf: Maybe we can use that to auto-generate the flake list
09:21mupuf: Not pretty, but better
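A minimal sketch of how nightly/weekly stress-test results could feed an auto-generated flake list, assuming each run yields one (test name, passed) pair per test. The function names and the pass/fail/flake rule are illustrative assumptions, not an existing Mesa CI tool:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def classify_runs(results: Iterable[Tuple[str, bool]]) -> Dict[str, Tuple[str, float]]:
    """results: (test_name, passed) pairs gathered from repeated stress runs.
    Returns test_name -> (status, failure_rate), where status is "pass"
    (never failed), "fail" (always failed) or "flake" (anything in between)."""
    runs: Counter = Counter()
    fails: Counter = Counter()
    for name, passed in results:
        runs[name] += 1
        if not passed:
            fails[name] += 1
    classified: Dict[str, Tuple[str, float]] = {}
    for name in sorted(runs):
        rate = fails[name] / runs[name]
        if fails[name] == 0:
            status = "pass"
        elif fails[name] == runs[name]:
            status = "fail"
        else:
            status = "flake"
        classified[name] = (status, rate)
    return classified


def flake_list(classified: Dict[str, Tuple[str, float]]) -> List[str]:
    """Test names to write into a generated flakes file, one per line."""
    return [name for name, (status, _) in classified.items() if status == "flake"]
```

A test that fails with probability p per run shows no failure in N independent runs with probability (1 - p)^N, so a 1% flake stays hidden in 100 runs about 37% of the time. Keeping the per-test rate, rather than only the list, is what would allow entries to be removed again once their rate drops to zero.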
09:29DavidHeidelberg: hmm, we'll see.
09:30DavidHeidelberg: btw, "ERROR - Piglit error: ES 2 not available."... how tf is this an error :D
09:30karolherbst: dcbaker: mhhh.. so https://github.com/mesonbuild/meson/commit/36210f64f22dc10d324db76bb1a7988c9cd5b14e added some very annoying behavior. Build systems generally check through meson whether certain compiler flags are recognized, and once they assume they have gcc/msvc/whatever, they'll simply add those, even globally, but that screws things up with clang :')
09:32karolherbst: the question is rather, as it's causing a regression, should we have to opt in on the bindgen call side instead? Like adding an "inherit-c-flags": ["global"|"project"|"cli"] option, and then projects can manage what they want inherited themselves
09:32karolherbst: in mesa it's causing issues with `-mtls-dialect=gnu2` and some warnings we disabled, though only the former is causing errors
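To make the proposed opt-in concrete, here is a minimal Python model of its semantics. It is not Meson code: `inherit-c-flags` is only a suggestion in this discussion, and `bindgen_c_args`, the dictionary layout, and which source a given flag comes from are all hypothetical:

```python
from typing import Dict, List, Sequence

# The three flag sources named in the proposal, in a fixed order chosen for this sketch.
FLAG_SOURCES = ("global", "project", "cli")


def bindgen_c_args(flags_by_source: Dict[str, List[str]], inherit: Sequence[str]) -> List[str]:
    """Hypothetical model of the proposed opt-in: return only the C flags
    from the sources the project asked bindgen to inherit."""
    unknown = [s for s in inherit if s not in FLAG_SOURCES]
    if unknown:
        raise ValueError(f"unknown flag source(s): {unknown}")
    args: List[str] = []
    for source in FLAG_SOURCES:  # fixed order, however `inherit` is written
        if source in inherit:
            args.extend(flags_by_source.get(source, []))
    return args
```

For example, with `{"global": ["-mtls-dialect=gnu2"], "project": ["-Wno-unused-parameter"]}` and `inherit=["project"]`, only `-Wno-unused-parameter` would reach bindgen, so a GCC-oriented flag like the one causing errors here stays out unless the project asks for it.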
09:37DavidHeidelberg: mupuf: removing two red "^ERROR: " lines from the CI log, what do u say? https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/842 :P
09:37DavidHeidelberg:needs some ACKs
12:11neatversion: You likely thought I was seeing some deluded dream, but it's not that at all; this algorithm I made is most likely exactly the same as Elias-Fano. If you pin, eliminate, and pass on results with the correct inverse magic values, it allows insane solutions. Yeah, it's true: a bitcount is not needed per result if you link the transitions, which is a bit cheaper. I have done some minor tests and will soon implement the whole thing.
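For reference, Elias-Fano is a compact encoding for a sorted sequence of n integers below a bound u. Each value is split into l = floor(log2(u/n)) low bits, stored verbatim, and a high part, stored in unary: element i sets bit high_i + i of an upper bitvector. A minimal Python sketch of the standard scheme follows; whether neatversion's algorithm really matches it is his claim, not something the log shows, and plain lists stand in for the packed bitvectors and select index a real implementation would use:

```python
from typing import List, Sequence, Tuple


def ef_encode(values: Sequence[int], universe: int) -> Tuple[int, List[int], List[int]]:
    """Elias-Fano encode a non-decreasing sequence of ints in [0, universe).
    Returns (l, low, upper): l low bits per value, the low parts, and the
    unary-coded high parts as a bit list."""
    n = len(values)
    if n == 0:
        return 0, [], []
    if any(b < a for a, b in zip(values, values[1:])):
        raise ValueError("values must be non-decreasing")
    if values[0] < 0 or values[-1] >= universe:
        raise ValueError("values must lie in [0, universe)")
    # l = floor(log2(universe / n)), clamped at 0
    l = max(0, (universe // n).bit_length() - 1)
    low = [v & ((1 << l) - 1) for v in values]
    # element i sets bit (v >> l) + i; the largest index is ((universe - 1) >> l) + n - 1
    upper = [0] * (((universe - 1) >> l) + n)
    for i, v in enumerate(values):
        upper[(v >> l) + i] = 1
    return l, low, upper


def ef_decode(l: int, low: List[int], upper: List[int]) -> List[int]:
    """Invert ef_encode: the i-th set bit at position p gives high part p - i."""
    out: List[int] = []
    i = 0
    for pos, bit in enumerate(upper):
        if bit:
            out.append(((pos - i) << l) | low[i])
            i += 1
    return out
```

For `[3, 4, 7, 13, 14, 15, 21, 43]` with u = 44 this picks l = 2 and uses 8 * 2 low bits plus 18 upper bits, 34 bits in total.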
12:12neatversion: The last thing I worked on was not having a dedicated pass, in contrast to branching with global variables and GPIO.
12:16neatversion: And if some chips have too few ALU instructions available, I want to work around that with multi-pass rendering, but that needs dynamic geometry through EGL.
12:17neatversion: Current calculations say that even the most vintage cards have enough; 64 instructions is way above what's needed.
12:28neatversion: In the research others did, they solved such a thing through a format you all know, SSA; that suits too, but I don't have enough memory for it, so I do it differently.
12:32neatversion: Offhand, I don't remember how the lifting worked; the best links I had were on another HDD, which is no longer in service.
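In SSA (static single assignment) form every variable is assigned exactly once, which makes it easy to see which definition each use reads. A minimal sketch of renaming a single straight-line block into SSA, assuming a toy (destination, operation, operands) instruction format; control flow and the phi nodes it requires are deliberately left out:

```python
from typing import Dict, List, Tuple

# Toy three-address instruction (an assumption for illustration):
# (destination, operation, operands)
Instr = Tuple[str, str, Tuple[str, ...]]


def to_ssa(block: List[Instr]) -> List[Instr]:
    """Rename a straight-line block so every variable is assigned once.
    Names never assigned in the block (inputs, literals) pass through."""
    version: Dict[str, int] = {}  # how many times each variable was assigned
    current: Dict[str, str] = {}  # variable -> its latest SSA name
    out: List[Instr] = []
    for dest, op, args in block:
        # Rename operands first, so `a = b * a` reads the previous `a`.
        new_args = tuple(current.get(arg, arg) for arg in args)
        version[dest] = version.get(dest, 0) + 1
        new_dest = f"{dest}_{version[dest]}"
        current[dest] = new_dest
        out.append((new_dest, op, new_args))
    return out
```

For example, `a = 1; b = a + a; a = b * a` becomes `a_1 = 1; b_1 = a_1 + a_1; a_2 = b_1 * a_1`.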
13:48neatversion: But I think that since the 4 GB memory space is presented as a similar max-524 eight-field value, it's very cheap to carry it with every hash; the same value just can't be accessed or eliminated more than once. I'm wondering whether CISC could read globals more than once in the same basic block; aside from multiple threads, a loop should copy it to a register. It's pointless to access memory again if it was already in a register. Micro-ops do not copy the memory to a GPR :(, but whether an implicit copy was done in any case before, I dunno; need to go to the MacBook. 524*524 and some linked-list offsets for the whole 4 GB memory space, because hardware segmentation deals with memory.
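Not going back to memory for a global that is already in a register is what a local redundant-load elimination pass does within one basic block. A minimal sketch over a toy IR (the instruction tuples and register names are illustrative assumptions, not any real compiler's IR), assuming a single thread and that distinct address names never alias:

```python
from typing import Dict, List, Tuple

# Toy IR, assumed for illustration only:
#   ("load", dst_reg, addr)    dst_reg = mem[addr]
#   ("store", addr, src_reg)   mem[addr] = src_reg
#   ("mov", dst_reg, src_reg)  dst_reg = src_reg
Instr = Tuple[str, str, str]


def eliminate_redundant_loads(block: List[Instr]) -> List[Instr]:
    """Within one basic block, replace a repeated load of the same address
    with a register move (or drop it) while the first loaded value is still
    intact. Any store to the address forgets the cached copy."""
    held_in: Dict[str, str] = {}  # addr -> register holding mem[addr]
    out: List[Instr] = []

    def clobber(reg: str) -> None:
        # Once reg is overwritten it no longer holds any address's value.
        for addr in [a for a, r in held_in.items() if r == reg]:
            del held_in[addr]

    for instr in block:
        op, x, y = instr
        if op == "load":
            src = held_in.get(y)
            if src is None:
                out.append(instr)  # first read: keep the memory access
                clobber(x)
                held_in[y] = x
            elif src != x:
                out.append(("mov", x, src))  # reuse the register copy
                clobber(x)
            # src == x: the register already holds mem[y]; drop the load
        elif op == "store":
            held_in.pop(x, None)  # memory changed; conservatively forget it
            out.append(instr)
        elif op == "mov":
            out.append(instr)
            clobber(x)
        else:
            raise ValueError(f"unknown op: {op}")
    return out
```

Reading `g` into eax, then into edi, then into eax again yields one load, one `mov edi, eax`, and no third access.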
14:14neatversion: I already looked at the asm: a memory operand can only be the destination, so that simplifies things; hence it always reads it into a register
14:17neatversion: I meant the x86 add instruction, or any ALU instruction that would be split into micro-ops
14:18neatversion: Mov can read mem too
15:17neatversion: My compiler reads the written memory again, addressing the location relative to the PC it was assigned; that's without optimization. At optimization level 1 and above it would not read that memory
15:17neatversion: But would work in registers instead
15:20neatversion: Should try with nopic
15:33neatversion: No difference, but I messed up: it just stored my read from global memory into var a, which is also in memory; even noop would not read from memory again if the value was already read into a register
15:34neatversion: It's just that global memory seems to grow downward; it has nothing to do with PC-relative addressing
15:39neatversion: Hell, it's still PC-relative; that's how objdump always disassembles it
15:39neatversion: Should try gdb
15:40neatversion: Anyway it's the needed output in both cases so whatever
15:46neatversion: It very much seems that computer designers have thought through every scenario, including the super-fast execution I have in mind, because the segmentation unit does it exactly the way needed
16:57neatversion: That would end up as one check in the compiler: if RIP is offset to earlier memory, replace the destination with the source and the source with the destination.
17:08neatversion: So it reads the globals, say two in asm, puts one into eax and the other into edi, then writes both from whatever registers, then reads them from memory again. That ends up as: when relPC+mem is eax
17:10neatversion: For the write, it can write to eax
17:11neatversion: If it's edi, it can write to edi, then the same with the read... read eax, write whatever; read edi instead of memory, write whatever
17:12neatversion: Write to wherever I meant
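Reading eax or edi instead of re-reading memory that was just written from them is known as store-to-load forwarding. A minimal sketch for one basic block over a toy IR (instruction tuples and register names are illustrative assumptions), assuming a single thread and no aliasing between distinct address names:

```python
from typing import Dict, List, Tuple

# Toy IR, assumed for illustration only:
#   ("load", dst_reg, addr)    dst_reg = mem[addr]
#   ("store", addr, src_reg)   mem[addr] = src_reg
#   ("mov", dst_reg, src_reg)  dst_reg = src_reg
Instr = Tuple[str, str, str]


def forward_stores(block: List[Instr]) -> List[Instr]:
    """Turn a load that follows a store to the same address into a register
    move (or drop it when the register already matches). Loads with no
    earlier store in the block are kept as they are."""
    stored_in: Dict[str, str] = {}  # addr -> register last stored there
    out: List[Instr] = []

    def clobber(reg: str) -> None:
        # Once reg is overwritten it no longer mirrors any address.
        for addr in [a for a, r in stored_in.items() if r == reg]:
            del stored_in[addr]

    for instr in block:
        op, x, y = instr
        if op == "store":
            stored_in[x] = y  # mem[x] now equals register y
            out.append(instr)
        elif op == "load":
            src = stored_in.get(y)
            if src is None:
                out.append(instr)  # value not known: keep the memory read
                clobber(x)
            elif src != x:
                out.append(("mov", x, src))
                clobber(x)
            # src == x: the register already holds mem[y]; drop the load
        elif op == "mov":
            out.append(instr)
            clobber(x)
        else:
            raise ValueError(f"unknown op: {op}")
    return out
```

In the two-global case described above (g1 via eax, g2 via edi, both written back, then read again), the re-read of g1 into ecx becomes `mov ecx, eax` and the re-read of g2 into edi disappears.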
17:56neatversion: But there are two more ways. One in particular is decreasing the PC, because hashed execution would not care and would carry the memory space with every block; in that case memory writes point to where the access happens, which is also good. It requires no modifications.