01:15medfly: drivers/gpu/drm/amd/display/dc/dcn{31,32,35,301,314,321}/Makefile have an interesting copyright notice probably forgotten before an initial import
01:25DemiMarie: FL4SHK: GPUs are SIMT for a reason, MIMD will have too much overhead for performance.
01:39FL4SHK[m]: <DemiMarie> "FL4SHK: GPUs are SIMT for a..." <- can you tell me why?
01:39FL4SHK[m]: MIMD just means all the cores have separate instruction streams
01:40FL4SHK[m]: though separate caches...
01:45FL4SHK[m]: maybe it's the bus accesses?
01:46FL4SHK[m]: or the cache coherence, but I'm not sure how you avoid cache coherence with SIMT
01:46FL4SHK[m]: oh
01:46FL4SHK[m]: Wait I think I get it
01:46FL4SHK[m]: though
01:47FL4SHK[m]: wait no you can't necessarily avoid data cache coherence if you have two cores using the same address
01:51FL4SHK[m]: guess it might have to do with atomics
01:51FL4SHK[m]: though if you have multiple threads, you need those too
02:41airlied: agd5f: might want to fix the above ^
05:19DemiMarie: <FL4SHK[m]> "MIMD just means all the cores..." <- That’s the overhead. GPUs use SIMT to amortize the overhead of instruction decoders and the like.
05:25DemiMarie: Without SIMT, most of the die area would be spent on control logic, rather than on compute. Nowadays, SIMT is a large part of what makes a GPU a GPU — there are compute GPUs that don’t even have the ability to render graphics.
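[editor's note] The amortization point above can be sketched with a toy model. This is not any real GPU's pipeline; the warp size and instruction set are made up for illustration. The idea is that a SIMT machine decodes an instruction once per warp, while a MIMD machine pays that control-logic cost once per core.

```python
# Toy SIMT model: one fetched/decoded instruction drives every lane of a
# warp, so decode cost is paid once per warp instead of once per core (MIMD).
WARP_SIZE = 32  # hypothetical warp width

def simt_execute(program, lanes):
    """Decode each instruction once, then apply it to all lanes."""
    decodes = 0
    for op, operand in program:          # one fetch/decode per instruction
        decodes += 1
        for i in range(len(lanes)):      # the same op executes on every lane
            if op == "add":
                lanes[i] += operand
            elif op == "mul":
                lanes[i] *= operand
    return decodes

lanes = list(range(WARP_SIZE))
program = [("add", 1), ("mul", 2)]
decodes = simt_execute(program, lanes)
# A MIMD machine running the same work needs decode logic active per core:
# len(program) * WARP_SIZE decode events' worth of control, vs. len(program).
```

The die-area argument falls out directly: the inner loop is "free" ALU replication, while the outer loop is the expensive control logic that SIMT shares.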
05:31einecardiograph: they do not amortize the overhead of decoders across multiple cores, though; the MIMD vs SIMD remark sounds right. VLIW is something similar to MIMD.
05:58einecardiograph: i see a minor ambiguity between the decoders/encoders I deal with and the decoders in hw too, btw. I also see trouble doing my encoders without loops and bitshifts; it's not possible. However, DMA packets can handle bitshifts or unaligned access well. But on the same subject: if you amortized the cost of OpenCL decoders you would lose the cores. A GPU is very complex hardware, and
05:58einecardiograph: nowadays they most often use so-called programmable shading hw, which is not fixed-function anymore but most often unified.
09:34sima: lumag, I replied some more on the fastrpc thread, maybe I was in a too good mood on Fri :-)
09:41lumag: sima, :-D
11:15nerdopolis: How do I add the Bisected tag to a Mesa issue I opened a week ago? I recently found out the commit causing a build issue
11:18ccr: nerdopolis, you don't, unless you have the permissions to do it.
11:18ccr: you can edit the title and add [bisected], someone with the rights may eventually tag it
11:20nerdopolis: Ah, thanks. I will do that.
11:43FL4SHK[m]: <DemiMarie> "That’s the overhead. GPUs use..." <- gotcha
11:44FL4SHK[m]: Well, I'll go for my idea anyway I think
11:45FL4SHK[m]: what I was thinking of doing was each core is a partitioned SIMD, but the cores would be arranged in a MIMD fashion
11:45FL4SHK[m]: if I keep the control small I should still be able to fit a lot of cores
11:46FL4SHK[m]: since this is an FPGA implementation, I can always change it
11:53alarumbe: hi robclark, do you think you could have a quick look into the last two patches in this series: https://lore.kernel.org/dri-devel/uzsqh2b3j7hp6z3zcjcsxxudt2sucgutzwof5bhsvjjaeusigy@wvfhibqtyz4y/T/#t
13:13FL4SHK[m]: question: do GPUs include virtual memory?
13:13stsquad: I'm trying to decode some acronyms, any idea what a shmem virgl GBM FB BO is and why they get allocated for KMS rendering and not under a GUI toolkit?
13:14karolherbst: FL4SHK[m]: yes
13:14FL4SHK[m]: hmm
13:14FL4SHK[m]: how does that work?
13:14FL4SHK[m]: I'll have to include virtual memory in my design
13:14karolherbst: applications can allocate a VMA context of whatever form, and then commands submitted by applications have a VM attached
13:15FL4SHK[m]: what's VMA? virtual memory address?
13:15karolherbst: ehh I meant VM
13:15FL4SHK[m]: gotcha
13:15FL4SHK[m]: I'll have to develop this machine further
13:15karolherbst: command buffers can contain things like "copy from this VMA to this VMA"
13:15FL4SHK[m]: what's VMA?
13:16karolherbst: an address in this case
13:16FL4SHK[m]: gotcha, so it's a virtual address?
13:16karolherbst: yeah
13:16karolherbst: shaders also operate on virtual addresses for things like ssbos or global memory in compute APIs
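[editor's note] The per-context VM that karolherbst describes can be sketched as a page-table translation. Real GPUs use multi-level tables and hardware walkers; the single-level table, page size, and mappings below are hypothetical, just to show what "commands carry virtual addresses" means.

```python
# Toy single-level GPU page table: virtual page number -> physical page number.
PAGE_SHIFT = 12                          # 4 KiB pages (assumption)
PAGE_MASK = (1 << PAGE_SHIFT) - 1

page_table = {0x10: 0x80, 0x11: 0x2A}    # hypothetical per-context mappings

def translate(vaddr):
    """Resolve a virtual address through the context's page table."""
    vpn = vaddr >> PAGE_SHIFT
    if vpn not in page_table:
        raise RuntimeError("GPU page fault at %#x" % vaddr)
    return (page_table[vpn] << PAGE_SHIFT) | (vaddr & PAGE_MASK)

# A "copy from this VMA to this VMA" command buffer entry would translate
# both its source and destination through the submitting context's table.
paddr = translate(0x10004)
```

Shaders accessing SSBOs or compute global memory go through the same translation, which is why each submission is attached to a VM context.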
13:16FL4SHK[m]: I see
13:16FL4SHK[m]: my plan is to make a partially MIMD GPU, just to speed up compute tasks
13:17karolherbst: yeah.. ignoring 3D for now isn't the worst idea
13:17FL4SHK[m]: ... but with each core having SIMD operations
13:17FL4SHK[m]: well 3D is something I think I can handle
13:18karolherbst: the question is mainly how much of 3D do you want to do in hardware and how much in software
13:18FL4SHK[m]: I wrote a software rasterizer to learn about the graphics pipeline
13:18FL4SHK[m]: well
13:18FL4SHK[m]: I heard that rasterization is often done in hardware
13:18FL4SHK[m]: the actual drawing-triangles stuff
13:18FL4SHK[m]: and clipping I think could be done in hardware
13:19FL4SHK[m]: Could be wise to do those two in their own pipelines
13:19vsyrjala: mlankhorst: tzimmermann: ack for merging via drm-intel https://lore.kernel.org/all/ZnLDKM2I8WWrWwmO@intel.com/ ?
13:19FL4SHK[m]: texture mapping could be in software
13:19FL4SHK[m]: but I'm not sure how fast I can make that
13:20FL4SHK[m]: to me it makes sense to include multiple pipelines for rendering
13:20tzimmermann: vsyrjala, ack
13:20vsyrjala: tzimmermann: thanks
13:20FL4SHK[m]: pipelines here being hardware pipelines
13:20FL4SHK[m]: not talking about the graphics pipeline in this case
13:21FL4SHK[m]: perhaps a tiled renderer could be wise for what I'm doing
13:22FL4SHK[m]: I hope that answers your question karolherbst:
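[editor's note] The "rasterization in hardware" point above usually refers to edge-function evaluation, which is cheap, parallel arithmetic that hardware rasterizers run for many pixels per clock. A minimal coverage test, assuming counter-clockwise winding and a made-up triangle:

```python
# Edge function: signed area of the parallelogram (b - a) x (p - a).
# A pixel is inside a CCW triangle iff all three edge functions are >= 0.
def edge(ax, ay, bx, by, px, py):
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def inside(tri, px, py):
    (ax, ay), (bx, by), (cx, cy) = tri
    return (edge(ax, ay, bx, by, px, py) >= 0 and
            edge(bx, by, cx, cy, px, py) >= 0 and
            edge(cx, cy, ax, ay, px, py) >= 0)

tri = [(0, 0), (8, 0), (0, 8)]   # hypothetical CCW triangle
```

Because each test is three multiply-adds with no data dependence between pixels, this step maps naturally onto a fixed-function block, which is why it is rarely left to software.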
13:25MrCooper: Company: one issue with using EGL on X instead of GLX is that there's no functionality corresponding to GLX_OML_sync_control & GLX_INTEL_swap_event, which makes frame scheduling hard
13:34Company: MrCooper: interestingly, nobody has complained about that yet (and afaik nvidia has been using EGL over GLX for years)
13:35Company: but good that you made me aware of that difference
13:45jadahl: Company: NVIDIA doesn't support GLX_INTEL_swap_event
13:46Company: true, it could be that nvidia is just worse no matter what you do
16:13DemiMarie: FL4SHK: do you plan to make your GPU something that would have decent performance if implemented on an ASIC?
16:17FL4SHK[m]: Demi: I have no way to get an ASIC made, so probably not?
16:17FL4SHK[m]: ASICs are expensive
16:19DemiMarie: FL4SHK: There are actually ways to do that (on old processes) via various programs.
16:19FL4SHK[m]: really?
16:19FL4SHK[m]: I've heard of Google Skywater
16:19DemiMarie: Yes
16:19FL4SHK[m]: Hmm
16:19DemiMarie: But an FPGA is still good for development.
16:20DemiMarie: FYI, GPUs used to be SIMD and later moved to an SIMT model.
16:20FL4SHK[m]: right
16:21DemiMarie: I am more interested in whether your design is overall high performance and would be within an order of magnitude of the performance of modern hardware if built on the same process.
16:21FL4SHK[m]: Honestly I don't know
16:22FL4SHK[m]: if it were, that would be nice
16:22DemiMarie: Also, SIMT is not an implementation detail anymore. It is explicitly exposed in APIs.
16:22FL4SHK[m]: I see
16:22FL4SHK[m]: I've done some CUDA development
16:22DemiMarie: So you will need an SIMT architecture to support the latest features.
16:22FL4SHK[m]: huh
16:23FL4SHK[m]: that's... interesting
16:23FL4SHK[m]: Not exactly what I wanted to hear
16:23DemiMarie: Wave and subgroup ops are what you are looking for.
16:24DemiMarie: I suggest implementing a Vulkan driver for your hardware before finalizing the hardware design.
16:24FL4SHK[m]: I wanted to do that
16:24DemiMarie: That will ensure you have all the needed features.
16:25FL4SHK[m]: Also as long as I don't get an ASIC made I'm free to change the design
16:25DemiMarie: Yup
16:26FL4SHK[m]: I have never really done ASIC designs before
16:26FL4SHK[m]: just FPGA dev
16:27FL4SHK[m]: can you tell me more about wave and subgroup ops?
16:27FL4SHK[m]: Like, what are they
16:28FL4SHK[m]: I might be able to implement SIMT on my MIMD design, if I understand correctly
16:30FL4SHK[m]: but still be better in some ways for general purpose compute
16:31FL4SHK[m]: not sure this is the case
16:33jenatali: Wave and subgroup ops are exposing the SIMT data as SIMD instead
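[editor's note] jenatali's point, that wave/subgroup ops expose the SIMT lanes as an explicit SIMD vector, can be sketched by simulating one wave. The wave width and the butterfly pattern below are illustrative (compare Vulkan's subgroupAdd or CUDA's __shfl_xor_sync); this is not any particular API's semantics.

```python
# Toy wave: lanes exchange register values directly ("shuffle") instead of
# going through shared memory, because they execute in lockstep.
WAVE = 8   # hypothetical wave width (must be a power of two here)

def subgroup_add(lane_values):
    """Butterfly reduction: every lane ends up holding the sum over the wave."""
    vals = list(lane_values)
    offset = len(vals) // 2
    while offset:
        # each lane adds the value held by the lane `offset` away (shuffle-xor)
        vals = [vals[i] + vals[i ^ offset] for i in range(len(vals))]
        offset //= 2
    return vals

result = subgroup_add(range(WAVE))
```

This is the feature that is hard to retrofit onto a pure-MIMD design: the lanes must share a program counter for a cross-lane read to be this cheap.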
16:33FL4SHK[m]: I see
16:36FL4SHK[m]: I'll study up
16:55FL4SHK[m]: Demi: can you give me some documentation I could read about this stuff?
16:55FL4SHK[m]: or point me to it
16:55DemiMarie: FL4SHK: I would look up the SPIR-V and Vulkan specs, along with the operations supported by the latest GLSL to SPIR-V translators.
16:56FL4SHK[m]: ah
16:56FL4SHK[m]: thanks
19:37retroassembling: I do not have any info left to deliver anymore. I am coding again as of now, cleaning my home lab, and tomorrow I'll drop most of the OFTC usernames; there are many :)
19:37retroassembling: https://www.alfonsobeato.net/arm/how-to-access-safely-unaligned-data/
19:40retroassembling: i understand that I mixed up the meanings of minuend and subtrahend, but overall the research is still entirely concluded; no more details
19:44retroassembling: also DemiMarie was talking about some MMU stuff that I was not able to grasp, but a soft-MMU can be exposed in much the same compacted form; as long as one allocates enough, basically anything could now be done from sw.
19:45retroassembling: I have so much programming to do. I figured I'd deal with RTLinux and RTOSes as of now, to see if they spawn a minimal number of processes; Zephyr does not support x86 well
19:50Mis012[m]: what would "support well" mean? adding ACPI support to Zephyr would be kinda insane, and the other staple of x86 designs is initializing stuff in UEFI and then pretending to the OS that it doesn't need initialization, which in this case probably doesn't hurt too much?
19:53Mis012[m]: stsquad: shared-memory virtual GL generic buffer manager framebuffer buffer object, probably, but virgl refers to OpenGL pass-through to a VM afaik
19:55retroassembling: yes, indeed, i agree with this comment. There is native_posix, and that was the maximum it could offer to users.
19:56retroassembling: i actually have 5 laptops and, by coincidence, all of them are supported by coreboot, and i have one fitPC which also has some support.
20:15retroassembling: but i do not recommend random people deal with synthesizing GPU hardware; it's the most complex commodity hw. I spent nearly five years looking at the MIAOW code, mainly because my life sucked and I had nothing to do, but the results I came up with matched the comments on the mailing list by Tom Stellard. The only thing was that he commented on this in
20:15retroassembling: 2012, but my conclusions matched one-to-one in 2018 or so, 6 years later, so under normal conditions this was a waste of time, but once again I was sanctioned in my country and had nothing else to do.
20:24uis: Can rendering be sped up by binning groups of triangles in the engine?
21:11karolherbst: uis: you mean like glDrawElements?
22:06uis: karolherbst: in drawcalls, yes.
22:06uis: I know there is some binning happening in llvmpipe
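[editor's note] The binning uis mentions (as done by tilers and llvmpipe's binner) can be sketched as sorting triangle bounding boxes into screen tiles, so each tile only rasterizes the triangles that may touch it. Tile size and scene below are hypothetical.

```python
# Bin each triangle's screen-space bounding box into the tiles it overlaps.
TILE = 64  # hypothetical tile size in pixels

def bin_triangles(tris, width, height):
    bins = {}                                   # (tile_x, tile_y) -> [tri idx]
    for idx, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        x0, x1 = max(min(xs), 0), min(max(xs), width - 1)
        y0, y1 = max(min(ys), 0), min(max(ys), height - 1)
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins

bins = bin_triangles([[(10, 10), (50, 10), (10, 50)],
                      [(100, 100), (120, 100), (100, 120)]], 256, 256)
```

Whether doing this again in the engine helps depends on the driver: if the rasterizer already bins (as llvmpipe does), coarse engine-side binning mostly helps culling and batching, not rasterization itself.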