02:12DemiMarie: For drm-panic, has kexec (or something like it) been considered?
10:38karolherbst: jenatali: how are you handling memory maps with OpenCL? Allocate a CPU side buffer and synchronize accordingly? I'm considering no longer handing out pointers to mappings and just having host side copies
12:20jenatali: karolherbst: ALLOC_HOST_PTR resources are allocated out of sysmem. Everything else allocates a temp buffer and copies
13:39karolherbst: jenatali: with temp buffer, you mean a buffer located in VRAM or host memory mapped into the GPU address space?
13:40karolherbst: or rather.. I'm considering synchronizing mapped memory like that (GPU copies VRAM -> mapped buffer via compute shaders)
13:40jenatali: karolherbst: host memory
13:41karolherbst: my only concern is that this is quite inefficient, because the application might just copy it to its own buffer somewhere, but oh well I guess (and it also uses RAM)
13:41karolherbst: I wonder if I want to keep what I'm currently doing as an optimized path for single device contexts...
13:42karolherbst: or something similar
13:52jenatali: karolherbst: if it's ALLOC_HOST_PTR then it should be placed in host memory
13:52karolherbst: ehh, I meant the mapping part here
13:52jenatali: If it's not in host memory then I always copy to host memory for mapping instead of mapping VRAM directly
13:53karolherbst: right, I'm considering doing that as well now
13:54karolherbst: atm for `CL_MEM_ALLOC_HOST_PTR` I'm using `PIPE_USAGE_STAGING`, but I wonder if I should allocate myself and use `resource_from_user_memory` instead, so the same allocation can be shared across devices
13:54karolherbst: but...
13:54karolherbst: I think that would require gallium changes so that drivers can tell me when it won't fail
13:55karolherbst: but anyway, that's going to be a fun rework
13:57jenatali: karolherbst: staging seems like the right thing there
13:57karolherbst: right.. but if you have three devices, they'll all allocate in system RAM individually
13:57karolherbst: probably
13:58jenatali: Yeah
13:58jenatali: Is that a thing that people actually do?
13:58karolherbst: doing what?
13:59jenatali: Using ALLOC_HOST_PTR on multiple devices
13:59karolherbst: I have no idea
13:59jenatali: Then I wouldn't worry about it until you see it be a problem
14:00karolherbst: though then I'd also have to do an additional allocation on the host for mapping
14:01karolherbst: and then ALLOC_HOST_PTR would kinda perform terribly as well
14:03karolherbst: I guess I should try a few things out here
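A minimal sketch of the "copy to host memory for mapping" approach discussed above, as a toy Python model rather than actual rusticl, CLOn12, or gallium code. The class `ShadowMappedBuffer`, its `vram` field, and the `map`/`unmap` methods are hypothetical names chosen for illustration; a `bytearray` stands in for device-local memory, and plain slice copies stand in for the GPU-side transfer.

```python
class ShadowMappedBuffer:
    """Toy model of mapping through a host-side copy (all names hypothetical)."""

    def __init__(self, size: int) -> None:
        self.vram = bytearray(size)  # stands in for device-local memory
        self._shadow = None          # host copy handed out while mapped
        self._offset = 0
        self._writable = False

    def map(self, offset: int, length: int, write: bool = False) -> memoryview:
        """Copy a device range into a host buffer and hand out that buffer."""
        if self._shadow is not None:
            raise RuntimeError("buffer is already mapped")
        if offset < 0 or length < 0 or offset + length > len(self.vram):
            raise ValueError("mapping out of range")
        # Device -> host copy; the application never sees the device storage.
        self._shadow = bytearray(self.vram[offset:offset + length])
        self._offset = offset
        self._writable = write
        return memoryview(self._shadow)

    def unmap(self) -> None:
        """Write the host copy back if it was mapped for writing, then drop it."""
        if self._shadow is None:
            raise RuntimeError("buffer is not mapped")
        if self._writable:
            end = self._offset + len(self._shadow)
            # Host -> device copy; only now do the writes become visible.
            self.vram[self._offset:end] = self._shadow
        self._shadow = None
```

In this model, writes through the mapping only reach `vram` at `unmap()`, and a read-only mapping is never written back. The extra host allocation and the two copies are the cost karolherbst is worried about above.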
19:45Mis012[m]: robclark: wtf is this https://blog.chromium.org/2024/06/building-faster-smarter-chromebook.html
19:45Mis012[m]: > To continue rolling out new Google AI features to users at a faster and even larger scale, we’ll be embracing portions of the Android stack, like the Android Linux kernel and Android frameworks, as part of the foundation of ChromeOS.
19:46Mis012[m]: what's next, they will stop respecting ownership and will blow fuses?
19:46Mis012[m]: if anything, the android team should be more like cros
20:53karolherbst: Mis012[m]: please don't be like this. If you are displeased with Google's decision, maybe redirect your anger towards the people actually able to make those decisions and not just individual developers
20:54Mis012[m]: this is not me directing anger at robclark, if anything I'm worried about his ability to keep his job
20:55Mis012[m]: I doubt he supports this decision
20:55Mis012[m]: I just wonder if he has some insight, though I'm not very optimistic about this being a misunderstanding
20:56karolherbst: you don't know that, but yeah, I think it's fair to ask for his take on this
20:58robclark: it's tbd how much cros will rub off on android vs how much android will rub off on cros.. given the 10yr support window for cros I don't think the android kernel model actually works and I think a lot of the cros kernel devs have the same opinion. Some of leadership and PMs have magical thinking, we'll see how it plays out.
21:12DemiMarie: robclark: For what it is worth, I agree with you.
21:16DemiMarie: Is it worth adding virtio-GPU native context support for nouveau, or should this wait until the Nova driver is ready?
21:19robclark: if the new driver will support all the hw the old one does, then I guess probably wait? Tbh I've not thought about it much because nv isn't a thing we have to care about for cros ;-)
21:52karolherbst: nova will be Turing+ only
21:53karolherbst: or rather, it will be GSP only
21:53karolherbst: DemiMarie: ^^
21:53DemiMarie: karolherbst: is that the case I should care about?
21:54karolherbst: mhh?
21:55DemiMarie: Is pre-GSP hardware worth caring about?
21:56karolherbst: it depends
21:56DemiMarie: what on?
21:57karolherbst: on many things I suppose? I can't really make the decision for any distribution. Newer kernel modules generally always require some newer hardware, because often the reason for a new kernel driver is that the hardware changed significantly, and splitting things up makes it easier to maintain
21:58karolherbst: such a decision isn't being made based on use of hardware
21:58karolherbst: if you want to add virtio-gpu native context to nouveau, then that's kinda on you to decide what hardware to care about
22:00DemiMarie: What is the attack surface like? I know that GSP firmware replaces much of the driver, but I don’t know how much of the code that was moved is userspace-accessible attack surface and how much is purely internal to the GPU.
22:02karolherbst: it's just used to configure the GPU, e.g. display, hardware contexts, and probably other things. But the main purpose is hardware configuration. VM management is still done outside of GSP afaik
22:02Mis012[m]: sadly it doesn't matter if the hardware changed if the firmware interface did and changing the firmware is not possible
22:03Mis012[m]: unless you can go around the firmware I guess, which to my understanding you can't
22:04karolherbst: GSP is not possible to use before Turing due to hardware reasons
22:04karolherbst: anyway, it takes care of a lot of hardware specific programming
22:06DemiMarie: I read that it improves performance. Is this because of fewer PCIe bus round-trips?
22:06karolherbst: no
22:06karolherbst: it reclocks the GPU
22:06karolherbst: and does full power management
22:06DemiMarie: This was for articles aimed at Windows users
22:06karolherbst: including fan control and everything
22:06DemiMarie: So the comparison was to the proprietary driver, which would have already had these features.
22:07karolherbst: I don't see why performance should change per se, however, the GPU can change its performance levels without having to wait for the kernel to tell it to do so
22:08karolherbst: so maybe it just reacts quicker and that leads to higher power efficiency?
22:08DemiMarie: Maybe?
22:08karolherbst: as in.. you can run at lower clocks, because you can mitigate perf spikes quicker without having to poll too often
22:08karolherbst: doing so in the kernel is just an objectively bad idea
22:09DemiMarie: and without interrupt latency guarantees the hardware would need to be safe even if the kernel didn’t respond
22:09karolherbst: yeah. but that's a different thing
22:10karolherbst: you don't need to reclock in order to reduce power consumption
22:10karolherbst: but the benefit is, that you can lower clocks to bring down temps, which is more efficient than what the hardware was able to do before
22:10DemiMarie: I see
22:10karolherbst: there was a temperature triggered clock divider, which could cut the clocks to 1/8 or something on high temperatures
22:10karolherbst: which maybe brings down power by 50%
22:10karolherbst: but at terrible perf
22:11karolherbst: but that worked without kernel intervention since forever basically
22:11karolherbst: and is pretty cheap to do in hardware
22:12karolherbst: at some point I figured out how to program that part, because I wanted to know if nouveau has to or if something else does so already. Turns out, the vbios already programmed it in sane ways
22:12DemiMarie: On Maxwell/Pascal/Volta, is Nouveau usable for anything interesting, or would users be just as well off using integrated graphics?
22:13Mis012[m]: robclark: well, hopefully the cros team will be able to convince the decision makers that their idea is absolutely insane...
22:13Mis012[m]: android without HALs would certainly be possible, I'm working on https://gitlab.com/android_translation_layer/android_translation_layer for example, but now Google decided to add GKI and encourage borderline GPL violations there so HALs are no longer the only issue
22:13karolherbst: external displays mostly
22:13DemiMarie: those don’t require native contexts or accelerated rendering
22:13karolherbst: correct
22:14karolherbst: the only gen which is somewhat worth caring about is kepler and 1st gen maxwell
22:14karolherbst: but that's still manually reclocking
22:15DemiMarie: why not pre-Kepler?
22:15karolherbst: because on fermi nouveau doesn't support reclocking either, and then you have tesla, which are kinda old
22:15Mis012[m]: I don't think there would be a massive performance hit for Linux controlling the reclocking the same way it does for the CPU, I always assumed that it's not able to access the required registers but the fw can
22:15karolherbst: and won't support vulkan
22:16Mis012[m]: and/or the registers are not documented and can't be RE'd either because the proprietary driver doesn't use them
22:16karolherbst: Mis012[m]: it's a pointless discussion. Intel also moves more and more of its reclocking outside the kernel. It's where the industry is heading and it's a good thing
22:16karolherbst: also on nvidia those registers literally can't be accessed by the host anyway
22:17karolherbst: at least some of them, e.g. fan speed control and voltage regulation
22:17karolherbst: heck, even your own written firmware won't be allowed to access those
22:18Mis012[m]: my firmware won't be allowed to run, which is somehow not considered property rights violation
22:18karolherbst: it can run
22:18karolherbst: it just can't access certain registers
22:19karolherbst: anyway, nothing we discuss here will change that fact
22:19Mis012[m]: if it can't run with the same privileges, it's more of a shader than a firmware
22:19karolherbst: I'm getting tired of this nonsense
22:19Mis012[m]: firmware is typically used for "software that you better not even think about changing"
22:20karolherbst: look, I'm not happy either, but being angry on IRC won't change it either
22:20DemiMarie: karolherbst: Why is it a good thing? Lower latency?
22:20Mis012[m]: intel handling reclocking outside the kernel could maybe be a good thing if it's completely transparent, if it needs a driver anyway then it's very much not helping anything
22:21karolherbst: DemiMarie: yeah, and also it doesn't rely on the kernel to be responsive. You can adjust the clocks way quicker. So instead of running at a 60% target, you might get away with running at 80% without risking fps drops
22:21karolherbst: the lower voltage you can use, the more efficient your hardware runs
22:21karolherbst: and the higher the clocks, the higher the voltage
22:21DemiMarie: karolherbst: drops because the clocks could not react fast enough and the hardware thermal safety tripped?
22:22Mis012[m]: and if the driver is hidden in ACPI then that's abuse of something that was supposed to be board-level in order to play pretend with x86 "just working"
22:22karolherbst: DemiMarie: no, just if you want to keep your GPU at 80% load, you need lower clocks than if you'd keep it at 60%
22:22karolherbst: but if you get a spike in load, you have to jump up the clocks quick enough
22:22karolherbst: there are idle counters on the GPU telling you how busy the engines are
22:22DemiMarie: which the kernel can’t do without polling too frequently, ruining CPU-side performance?
22:22karolherbst: and the firmware can just read them out directly, instead of the kernel having to poll and waste IRQs on it
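A rough sketch of the load-target idea in the messages above, not how nouveau or the GSP firmware actually picks clocks. It assumes (purely for illustration) that work scales linearly with clock, that the idle counters give a busy percentage for the last interval, and that the clock levels are a fixed list; `pick_clock` and all its parameters are hypothetical names.

```python
def pick_clock(busy_pct: int, cur_mhz: int, levels_mhz: list[int], target_pct: int) -> int:
    """Return the lowest clock level that keeps projected load at or below target_pct.

    Assumes load scales inversely with clock, which is only an approximation.
    Integer math is used so the threshold comparison is exact.
    """
    work = busy_pct * cur_mhz  # work done in the last interval, in pct*MHz
    for level in sorted(levels_mhz):
        # Projected load at this level is work / level; compare without dividing.
        if work <= target_pct * level:
            return level
    # Nothing satisfies the target: fall back to the highest available clock.
    return max(levels_mhz)


levels = [300, 600, 900, 1200]
# Same workload (40% busy at 1200 MHz), two different load targets.
print(pick_clock(40, 1200, levels, target_pct=80))  # prints 600
print(pick_clock(40, 1200, levels, target_pct=60))  # prints 900
```

With the 80% target the same workload fits at a lower clock than with the 60% target, which is the efficiency gain described above. The catch is that less headroom remains, so whatever runs this loop has to sample the counters and raise the clock quickly when load spikes.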
22:23Mis012[m]: polling and using irqs are two separate things surely?
22:23karolherbst: it depends on how you poll
22:23Mis012[m]: if you poll in sw then the hw design is insane
22:23karolherbst: you really don't want to do those things in the kernel, because that's just another poll thing keeping your CPU from idling
22:23Mis012[m]: to make you do that
22:24karolherbst: hence you do it in firmware
22:24DemiMarie: The firmware runs on a much smaller processor than the ones Linux runs on. That means that the processor is slower at executing instructions, but it also means that it uses much less power and has much less state to be flushed when an interrupt comes in.
22:25karolherbst: and there is no PCIe bus in the way
22:25DemiMarie: You really want your main CPU to go to sleep to save power, but it can take quite a while to wake it up from that low-power state.
22:25karolherbst: anyway, the tldr is, it makes perfect sense to do that in firmware
22:25DemiMarie: I suspect the firmware can wake up much, much faster, and given how small a processor it runs on, it might even be able to get away with busy-polling.
22:26karolherbst: nvidia's firmware coprocessors did support timers
22:26karolherbst: I'm sure the new ones also do
22:27Mis012[m]: eh, using a timer for polling doesn't make it much less sad
22:27karolherbst: at least I'm sure their RPC works like that. You send an IRQ and the firmware handles the RPC request. But it can also configure a timer for itself, not sure if that also uses an IRQ or not, but probably just handled on the chip itself
22:28karolherbst: Mis012[m]: how else do you think those things work in hardware?
22:28DemiMarie: The other part of the issue is that Linux can’t run on the coprocessors that the firmware runs on. That’s one reason the firmware is separate software from the OS.
22:28karolherbst: stuff just magically wakes up after 1.5s because it just knows a value changed?
22:28karolherbst: this isn't userspace programming
22:28Mis012[m]: you could directly fire an irq when a value changes, that seems easy enough
22:29Mis012[m]: to the mcu obviously
22:29karolherbst: and what part would know that a value changed?
22:29karolherbst: anyway, I'm done with your better knowing attitude, good night
22:30Mis012[m]: lots of things that don't contain MCUs can fire IRQs, I'm sure it's possible
22:30Mis012[m]: but the MCU probably doesn't have much other stuff to worry about so it could be fine to use a timer
22:30DemiMarie: Possible? Yes. Is it what they did? No, and presumably they have good reasons for that, such as reducing the risk of the hardware design.
22:31DemiMarie: It is much cheaper to fix an issue in firmware than to have to recall the silicon.
22:32DemiMarie: So it makes sense to have as much as possible in the firmware, with the silicon only having what is necessary.
22:32Mis012[m]: can always work around a hw issue in fw or even in the OS, vendors love this one weird trick
22:32DemiMarie: exactly
22:33DemiMarie: The more that is in FW instead of HW, the more likely they are to be able to do this.
22:34Mis012[m]: fw being much easier to change also means that for example for one EC you need at least dozens of different drivers
22:34Mis012[m]: which is one reason why I prefer stuff being done in hw
22:35DemiMarie: That isn’t the way things are going, though.
22:35Mis012[m]: fw middleman is an option for standardizing a protocol but somehow that doesn't usually happen so it actually ends up worse
22:37DemiMarie: And more importantly, this is not a channel about embedded hardware design, so this discussion is off-topic.
22:37Mis012[m]: right
22:38DemiMarie: This channel is about how things are, not how we would like them to be.
22:39DemiMarie: Unless “how we would like them to be” can be achieved by reasonable changes to Linux, Mesa, or another part of the open source graphics stack, it’s not relevant here.
22:41Mis012[m]: reasonable people disagree about whether flashing custom fw would be reasonable, not that it's a possibility
22:41Mis012[m]: * a possibility in this case
22:41DemiMarie: This isn't the place for that discussion
22:42Mis012[m]: it could qualify as a change to Linux
22:43DemiMarie: no, because the signature checks are done either by hardware or by ROM
22:44Mis012[m]: well, signature checks are not always done, but I don't know about anything graphics-related where they aren't
22:44Mis012[m]: I heard something about AMD possibly opening something up
22:45Mis012[m]: arguably standardizing a fw interface across a single vendor is absolutely useless
22:45Mis012[m]: well, I got the answer to my original question, anything else is tangential
22:45Mis012[m]: 'night