02:17androidui: how exactly does a virtio-pci device (such as virtio-pmem) communicate with a virtio-pmem driver via virtio-over-pci?
02:43swardokplec: I assume that permutations and combinations are not the same thing. Likely one of them allows/considers collisions and the other does not, but I do not remember clearly which one. But the paradigm can be tested, and I expect the outcome to come out similar to how I stated it: that all the ALU collision spectrum can be based on the more sparsely removed adder collision spectrum, but it is tough
02:43swardokplec: logic behind the theory; immediate reactions could trigger opinions that I am insane indeed, however this is likely, as with other scientists who did number theory, not so. So you just test my words on a CPU and move to DMA after the theory is validated. I do the same thing btw.
03:28swardokplec: As it will also lead to a filesystem, as in storing such minified powers and just having DMA encoders/decoders, all becomes opened up; there is a path for secure, performant and low-power protocols, immediately making anything possible. Then you get the real idea of how human worth becomes less and less, even more so if you do not practice; robots start to take over. Humans designed
03:28swardokplec: the computers to be too capable, but those were very bright humans from the head; not all can keep up with the smartest ones' works later on. But it's not about the moral side of all that; such outcomes are possible, and whether good or bad depends overall on many factors.
03:39swardokplec: Microtubules in anatomy are, yes, something that manipulates sensory nerve feedback.
04:06swardokplec: I've been having a good rest in the countryside on my own; some of the logic appeared to signal towards more reality paths. However, I think one can find a solver that does such things already, like finding all permutations of a set that was stashed into a bigger number, then eliminating some of the numbers and combining; you could very quickly kickstart executable code that way. But I have not
04:06swardokplec: looked at this lately. With one friend I just talked about how classical debuggers would not forward such info, but some solvers likely would; needs a touch more research.
04:09swardokplec: cheers.
04:55Lynne: managed to crash glslang
04:57Lynne: I think we've succeeded in running code on GPUs, just not writing code that runs on GPUs
08:29colinmarc: I'm currently reading about GPU (para-)virtualization (@demi I just watched your talk about Xen, that was super helpful, thank you!). I have a question about SR-IOV. Given that consumer nvidia cards seem to have hardware support for SR-IOV, is there a future where nvk/nouveau support it? Or is that off the table for some reason?
11:55karolherbst: colinmarc: needs somebody to write the code
11:57androidui: does anyone know how to implement a virtio-over-pci device?
13:04androidui: sadly this uses MMIO ;-; https://github.com/redoste/riscv-emulator/blob/master/emulator/device_virtio.c
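(A rough sketch for androidui's question above, assuming the virtio 1.x "modern" PCI transport: instead of a fixed MMIO register map, a virtio-pci device such as virtio-pmem puts vendor-specific capabilities (cap ID 0x09) into its PCI config space, each pointing at a structure inside one of its BARs. The driver walks that list, negotiates features and sets up virtqueues through the common config structure, and from then on communication is descriptor rings in guest memory plus doorbell writes via the notify area and interrupts. Struct and constant names follow the virtio specification; this is illustrative, not a drop-in implementation.)

```c
/* virtio 1.x PCI transport, as defined by the virtio specification.
 * A virtio-pci device exposes several of these vendor-specific capabilities
 * (PCI cap ID 0x09) in its configuration space; each one points at a
 * structure that lives inside one of the device's BARs. */
#include <stdint.h>

#define VIRTIO_PCI_CAP_COMMON_CFG  1   /* common configuration (struct below) */
#define VIRTIO_PCI_CAP_NOTIFY_CFG  2   /* queue notification ("doorbell") area */
#define VIRTIO_PCI_CAP_ISR_CFG     3   /* ISR status byte */
#define VIRTIO_PCI_CAP_DEVICE_CFG  4   /* device-specific config, e.g. the pmem range */
#define VIRTIO_PCI_CAP_PCI_CFG     5   /* alternative access window via config space */

struct virtio_pci_cap {
    uint8_t  cap_vndr;    /* PCI_CAP_ID_VNDR (0x09) */
    uint8_t  cap_next;    /* link to the next capability */
    uint8_t  cap_len;     /* length of this capability */
    uint8_t  cfg_type;    /* one of VIRTIO_PCI_CAP_* above */
    uint8_t  bar;         /* which BAR holds the structure */
    uint8_t  padding[3];
    uint32_t offset;      /* offset of the structure within that BAR (little-endian) */
    uint32_t length;      /* length of the structure in bytes (little-endian) */
};

/* The COMMON_CFG capability points at this register block.  The driver uses
 * it to negotiate features, drive the status handshake, and program each
 * virtqueue's size and guest-physical ring addresses; afterwards it kicks
 * the device by writing the queue index into the notify area, and the device
 * answers through the used ring plus an interrupt (MSI-X or INTx). */
struct virtio_pci_common_cfg {
    /* whole-device fields */
    uint32_t device_feature_select;
    uint32_t device_feature;
    uint32_t driver_feature_select;
    uint32_t driver_feature;
    uint16_t msix_config;
    uint16_t num_queues;
    uint8_t  device_status;
    uint8_t  config_generation;
    /* per-queue fields, indexed by queue_select */
    uint16_t queue_select;
    uint16_t queue_size;
    uint16_t queue_msix_vector;
    uint16_t queue_enable;
    uint16_t queue_notify_off;
    uint64_t queue_desc;     /* guest-physical address of the descriptor table */
    uint64_t queue_driver;   /* ... of the available (driver) ring */
    uint64_t queue_device;   /* ... of the used (device) ring */
};
```

(The device-model side has to present the same layout: register a PCI function with the virtio vendor ID 0x1af4, emit these capabilities in config space, and back the referenced BAR ranges with read/write handlers.)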
13:46DemiMarie: Colin Marc: You're welcome! karolherbst is correct that someone would need to write the code. Also, there are concerns that the feature might have gotten no testing and so be broken in a security-relevant way.
13:47colinmarc: cloud providers presumably use it, no?
13:48karolherbst: cloud providers don't use nouveau
13:48colinmarc: ah, I thought "the feature" meant sr-iov
13:48karolherbst: ohh, sure
13:48karolherbst: but if nouveau implements it and nobody uses it, it's kinda meaningless
13:49colinmarc: makes sense
13:49karolherbst: it's quite low on the priority list, because there are way more critical things to work on for the time being, maybe long-term at some point once things are figured out
13:51colinmarc: that also makes sense :) I was asking in case the answer was like "licensing means that'll never be a thing" or something
14:00DemiMarie: karolherbst: Is Nouveau ready for serious use on recent cards, or will that need to wait for Nova?
14:02DemiMarie: Colin Marc: Cloud workloads using Nvidia GPUs need the blobby stack for CUDA.
14:43DemiMarie: Colin Marc: Have you considered native contexts?
14:55androidui: the closest one i can find is https://github.com/matthias-prangl/VirtualBox6/blob/2db58873a4141dd42b3f7d34dbd713dca2fa1984/src/VBox/Devices/VirtIO/Virtio.cpp#L832-#L905
15:10colinmarc: <DemiMarie> "Colin Marc: Have you considered..." <- In your talk you said malicious or broken code could reset the whole host GPU with native context, but that's not the case with SR-IOV - or did I misunderstand that? Otherwise the native context stuff seems really cool
15:11colinmarc: I don't really understand how sr-iov would prevent that, so I probably just misunderstood it
15:14DemiMarie: Colin Marc: SR-IOV *may* do better at containing the impact of a fault. This depends *entirely* on the GPU hardware and firmware.
15:16colinmarc: I'm trying to read between the lines here, but if google/aws use SR-IOV for gpu instances they must feel reasonably good about it?
15:17DemiMarie: The only major cloud I know of that offers SR-IOV VFs is Azure.
15:17DemiMarie: I suspect Microsoft's answer is to have a support contract with the GPU vendor.
15:19colinmarc: then I really misunderstood something. do you mean "offer" in the sense of they let you use SR-IOV? or they partition their GPUs into "GPU instances" with SR-IOV behind the scenes?
15:20DemiMarie: The latter I think
15:23DemiMarie: What is your use-case? On desktop denial of service is often more of a reliability issue than a security issue.
15:23colinmarc: I'm pondering what it would look like to offer a hosted version of my game streaming thing (https://github.com/colinmarc/magic-mirror)
15:24colinmarc: so one question would be, what kind of GPUs would I need, and could I actually run them in a meaningfully multitenant way
15:25colinmarc: it's closer to the "I'm a cloud provider and want to offer gpu instances" than the desktop case, since I have to assume remote code execution probably
15:27DemiMarie: Personally, I would run that on top of an actual cloud provider and pass the costs to the user. That makes the cloud provider responsible for security.
15:28DemiMarie: Specifically GPU security
15:28colinmarc: that's probably not really tenable, because of cost and because of how AI has changed the cloud gpu landscape
15:29DemiMarie: Azure has instances specifically for VDI (Virtual Desktop Infrastructure).
15:30colinmarc: but my background is in cloud infra so I'm not that scared of running computers/vms actually... and also I'm just doing research at the moment :)
15:31DemiMarie: The big cloud providers will do things like conduct white-box security audits of vendor firmware and build custom hardware specifically for their needs.
15:32colinmarc: since you're warning me off this, what do you see as the biggest threat in terms of gpu security? assuming single-tenant for now
15:32colinmarc: (that is, 1:1 gpus-VMs)
15:32DemiMarie: For PCI passthrough: hardware infection
15:33DemiMarie: It's trivially easy to prevent with custom board designs and very hard to prevent otherwise.
15:34colinmarc: would you mind pointing me to an example exploit so I can read up on it?
15:34DemiMarie: I don't know of one actually, but I do know that vBIOS locks have been defeated.
15:36DemiMarie: The OpenStack docs have a good explanation of what hardware infection is.
15:36DemiMarie: I do wonder what Google Stadia used before it was shut down.
15:37colinmarc: they had custom amd gpus made
15:37colinmarc: and used SR-IOV
15:37colinmarc: based on something I read today
15:37DemiMarie: That would not surprise me at all.
15:38colinmarc: native context would be better than PCI-passthrough even for a single tenant, then, if I want to avoid hardware infection
15:38DemiMarie: Correct
15:38DemiMarie: Also weekly kernel updates (so low uptimes).
15:40colinmarc: thanks for the input!
15:40DemiMarie: You're welcome!
15:40DemiMarie: What is your intended user base for this?
15:41colinmarc: this is getting off-topic :) I'll DM you if that's ok
15:41DemiMarie: Sure
16:08markieiole: So the hardware supposedly cannot be allowed to make any mistakes on the bus proportions, be that digital or analog, because all is mathematically proven, so the first operand is rounded to the closest real power of two, which by the way is so because the next, not previous, closest power comes closer, with any number of previous powers yielding a result a distance away from the next power. In which
16:08markieiole: case, getting the next possible value that is unique is where the first operand is the concrete power manipulated and the second operand is verbatim. This would give the correct result from the ALU, so considering the ALU's multiply op becomes pinned to the hash yielding roughly 1k*1k combinations, I expect a single ALU to cost roughly one million times 5 storage, so without intrinsics the storage
16:08markieiole: needed starts somewhere from 22 bits, so that fits only roughly 800 ALU banks; using intrinsics you can supposedly stretch the precision and fit as many 800-banks as needed, limited to the max 1k storage allocated for index and bases. So with intrinsics you could stretch max/5thousand, which is already 858993, i.e. nearing a million times 800 ALUs, so it is now maxing out at 687 million ALUs, so
16:08markieiole: with intrinsics you get more executions, a wider range of executed ALUs; it's like a telescope, so to speak, supervision in terms of throughput on lower storage requirements at the same time. Altogether it is expected that there are 1 million unique non-collision-based answers per ALU with a large but not so large possible error, without even testing it. The error can be at maximum a
16:08markieiole: million off as I see it, but it does not matter. It is close enough that wrong answers are never possible, so maximum precision; I understand such theories, not the ones you do, that is why I am different "a bit".
19:26airlied: colinmarc: nvidia don't supply a stable ABI there so it's a mess to use in nouveau upstream
19:26airlied: also SRIOV enabled GPUs are quite rare and sometimes need special tooling to turn on sriov
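(For reference: when no vendor-specific tooling is involved, the generic Linux knob for turning on SR-IOV is the sriov_totalvfs / sriov_numvfs sysfs attributes on the physical function, assuming the PF driver supports it. A minimal sketch below; the PCI address is a placeholder, it needs root, error handling is kept minimal, and sriov_numvfs has to be 0 before a new value is accepted.)

```c
/* Minimal sketch: enable as many SR-IOV VFs as the physical function
 * advertises, via the standard Linux sysfs interface.  The device address
 * is a placeholder; run as root against a PF whose driver supports SR-IOV. */
#include <stdio.h>

int main(void)
{
    const char *dev = "/sys/bus/pci/devices/0000:01:00.0"; /* placeholder BDF */
    char path[256];
    int total = 0;

    /* How many VFs can this PF expose at most? */
    snprintf(path, sizeof(path), "%s/sriov_totalvfs", dev);
    FILE *f = fopen(path, "r");
    if (!f || fscanf(f, "%d", &total) != 1 || total <= 0) {
        fprintf(stderr, "%s: no SR-IOV VFs advertised\n", dev);
        return 1;
    }
    fclose(f);

    /* Writing a count here asks the PF driver to create that many VFs;
     * the attribute must currently be 0 before a non-zero value is accepted. */
    snprintf(path, sizeof(path), "%s/sriov_numvfs", dev);
    f = fopen(path, "w");
    if (!f || fprintf(f, "%d\n", total) < 0) {
        fprintf(stderr, "failed to enable %d VFs on %s\n", total, dev);
        return 1;
    }
    fclose(f);

    printf("enabled %d VFs on %s\n", total, dev);
    return 0;
}
```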
23:58androidui: ughhh why are PCI registration interfaces so different ;-;