02:43llyyr: is there any movement towards solving #11383? mpv switched to using vulkan by default as of the last release, and we constantly get bug reports about mpv taking ages to start up now, because on multi-GPU systems enumerating vulkan devices has to wait for sleeping devices to wake up
02:43llyyr: note that the issue mentions nvk, but it's a problem for every vulkan driver, both in mesa and outside it (nvidia)
03:21bluetail: llyyr: what can I do to reproduce? I got https://0x0.st/8GZp.txt but none of the issues you are mentioning
03:21bluetail: using mpv v0.40.0
03:21bluetail: and yes I use a 6950xt and a w7500 pro
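Reproducing this presumably requires the secondary GPU to actually be runtime-suspended before mpv starts. A minimal C sketch for checking that, assuming the standard Linux sysfs attribute `power/runtime_status` under the GPU's PCI device directory (reading it reports the runtime PM state and should not itself resume the device). The PCI address in the usage note is a placeholder, not taken from the logs.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the runtime_status contents say "suspended", 0 otherwise.
 * sysfs values end in a newline, so only the word itself is compared. */
static int runtime_status_is_suspended(const char *contents)
{
    return strncmp(contents, "suspended", strlen("suspended")) == 0;
}

/* Reads a power/runtime_status file.
 * Returns 1 if suspended, 0 if not, -1 if the file can't be read. */
static int gpu_is_runtime_suspended(const char *status_path)
{
    char buf[32] = {0};
    FILE *f = fopen(status_path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return runtime_status_is_suspended(buf);
}
```

Usage would be something like `gpu_is_runtime_suspended("/sys/bus/pci/devices/0000:03:00.0/power/runtime_status")` right before launching mpv; if it reports 0, the card is already awake and the startup delay won't show up.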
11:43karolherbst: mareko: is there anything I can do in rusticl to speed up data uploads to the GPU? Or things I need to consider like what flags are important to get optimal speed
11:51karolherbst: I have a couple of tests which run for minutes and spend like 50% of the CPU time uploading/downloading VRAM, so would be nice to figure out how to speed things up if there are simple ways
12:22mareko: karolherbst: AMD_DEBUG=info glxgears; prints the theoretical PCIe bandwidth
12:22mareko: karolherbst: AMD_TEST=dmaperf glxgears; tests the real bandwidth available to copies
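For reference, the theoretical PCIe number is plain arithmetic: per-lane transfer rate times line-code efficiency times lane count. A small C sketch of that calculation for PCIe generations 1 to 5 (8b/10b encoding for gen 1-2, 128b/130b for gen 3-5). It ignores packet and protocol overhead, so real copy throughput, as measured by `AMD_TEST=dmaperf`, will come in lower. This does not reproduce the exact output format of `AMD_DEBUG=info`.

```c
/* Raw per-lane signalling rate in GT/s for PCIe gen 1-5; 0 if unknown. */
static double pcie_lane_gts(int gen)
{
    switch (gen) {
    case 1: return 2.5;
    case 2: return 5.0;
    case 3: return 8.0;
    case 4: return 16.0;
    case 5: return 32.0;
    default: return 0.0;
    }
}

/* Line-code efficiency: 8b/10b for gen 1-2, 128b/130b for gen 3-5. */
static double pcie_encoding_efficiency(int gen)
{
    return gen <= 2 ? 8.0 / 10.0 : 128.0 / 130.0;
}

/* Theoretical bandwidth per direction in GB/s, ignoring protocol overhead. */
static double pcie_bandwidth_gbs(int gen, int lanes)
{
    if (lanes <= 0)
        return 0.0;
    /* GT/s * efficiency = usable Gbit/s per lane; divide by 8 for bytes. */
    return pcie_lane_gts(gen) * pcie_encoding_efficiency(gen) * lanes / 8.0;
}
```

A PCIe 4.0 x16 link works out to about 31.5 GB/s per direction and PCIe 3.0 x16 to about 15.75 GB/s.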
12:24mareko: don't access VRAM with the CPU; use staging allocations and do copies, unless it's an APU
12:24mareko: the roundtrip VRAM latency is insane from the CPU
12:30mareko: the GPU can hide the latency because it can request lots of data at once and then just waits for everything to arrive at maximum available throughput
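The latency-hiding point follows from Little's law: sustained bandwidth equals bytes in flight divided by round-trip latency. A short C sketch of that relationship. The 1000 ns latency and 64-byte requests in the usage note are illustrative assumptions, not measurements, and the model assumes the CPU effectively has only one uncached read outstanding at a time.

```c
/* Little's law: sustained bandwidth = bytes in flight / round-trip latency.
 * Bytes per nanosecond is numerically equal to GB/s (1e9 bytes/s). */
static double achievable_gbs(double requests_in_flight, double bytes_per_request,
                             double latency_ns, double link_limit_gbs)
{
    if (latency_ns <= 0.0)
        return 0.0;
    double bw = requests_in_flight * bytes_per_request / latency_ns;
    /* More requests in flight can't push past what the link carries. */
    return bw < link_limit_gbs ? bw : link_limit_gbs;
}
```

With those assumed numbers, one outstanding 64-byte read gives `achievable_gbs(1, 64, 1000, 31.5)` of about 0.064 GB/s, while 256 outstanding requests give about 16.4 GB/s, and 1024 outstanding requests are capped at the 31.5 GB/s link limit. That is the gap between a CPU memcpy out of VRAM and a GPU-side copy.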
12:31karolherbst: okay, that might be helpful. The thing that I saw in perf profiles is mostly just the CPU spending time in memcpy
12:31karolherbst: but yeah.. I was considering doing more staging copy things, but not sure yet how to go about it, not even sure it's the issue here
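To put the memcpy time from the profiles in context, a host-to-host baseline helps. This C sketch times plain `memcpy` between two `malloc` buffers using C11 `timespec_get`. It only measures system RAM; pointing the destination at a mapped VRAM buffer would need the driver's map call, which is deliberately left out here, so the function name and approach are just an illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock seconds via C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Copies `size` bytes `iters` times between two host buffers and returns
 * the observed throughput in GB/s, or -1.0 on bad input or failure. */
static double measure_memcpy_gbs(size_t size, int iters)
{
    if (size == 0 || iters <= 0)
        return -1.0;
    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xab, size);
    memset(dst, 0, size);
    double t0 = now_seconds();
    for (int i = 0; i < iters; i++) {
        src[0] = (unsigned char)i; /* vary the source between copies */
        memcpy(dst, src, size);
    }
    double t1 = now_seconds();
    /* Read the destination through a volatile so the copies stay observable. */
    volatile unsigned char sink = dst[size - 1];
    (void)sink;
    free(src);
    free(dst);
    double secs = t1 - t0;
    if (secs <= 0.0)
        return -1.0;
    return (double)size * iters / secs / 1e9;
}
```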
12:33karolherbst: mareko: I was wondering if I could let the GPU write to a userptr allocation and if that would be any faster overall
12:33mareko: PIPE_USAGE_STAGING if you're gonna read it by the CPU, PIPE_USAGE_STREAM if you're never gonna read it; allocate a temp buffer, do memcpy into it, then do resource_copy_region to upload it, then release the temp buffer; similar for downloads
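The upload/download recipe above, modelled in plain C so it compiles on its own. All `mock_*` names are hypothetical stand-ins, not the Gallium API: `mock_buffer_create` plays the role of allocating a `PIPE_USAGE_STREAM` or `PIPE_USAGE_STAGING` temp buffer, and `mock_copy_region` the role of `pipe_context::resource_copy_region`. In a real driver that copy runs asynchronously on the GPU, so a download must wait for it (for example on a fence) before the CPU reads the temp buffer; the mock copies synchronously and skips that step.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for PIPE_USAGE_DEFAULT/STAGING/STREAM. */
enum mock_usage { MOCK_USAGE_DEFAULT, MOCK_USAGE_STAGING, MOCK_USAGE_STREAM };

struct mock_buffer {
    enum mock_usage usage;
    size_t size;
    unsigned char *data; /* stands in for VRAM or system-memory backing */
};

static struct mock_buffer *mock_buffer_create(enum mock_usage usage, size_t size)
{
    struct mock_buffer *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = calloc(1, size ? size : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    b->usage = usage;
    b->size = size;
    return b;
}

static void mock_buffer_destroy(struct mock_buffer *b)
{
    if (b) {
        free(b->data);
        free(b);
    }
}

/* Plays the role of the GPU-side copy; bounds-checked, synchronous here. */
static int mock_copy_region(struct mock_buffer *dst, size_t dst_off,
                            const struct mock_buffer *src, size_t src_off, size_t len)
{
    if (src_off > src->size || len > src->size - src_off ||
        dst_off > dst->size || len > dst->size - dst_off)
        return -1;
    memcpy(dst->data + dst_off, src->data + src_off, len);
    return 0;
}

/* Upload: CPU writes sequentially into a STREAM temp buffer, the "GPU"
 * copies it into VRAM, then the temp buffer is released. */
static int upload_via_staging(struct mock_buffer *vram, size_t off,
                              const void *src, size_t len)
{
    struct mock_buffer *tmp = mock_buffer_create(MOCK_USAGE_STREAM, len);
    if (!tmp)
        return -1;
    memcpy(tmp->data, src, len);
    int r = mock_copy_region(vram, off, tmp, 0, len);
    mock_buffer_destroy(tmp);
    return r;
}

/* Download: the "GPU" copies VRAM into a CPU-cached STAGING temp buffer,
 * then the CPU reads from it. */
static int download_via_staging(const struct mock_buffer *vram, size_t off,
                                void *dst, size_t len)
{
    struct mock_buffer *tmp = mock_buffer_create(MOCK_USAGE_STAGING, len);
    if (!tmp)
        return -1;
    int r = mock_copy_region(tmp, 0, vram, off, len);
    if (r == 0)
        memcpy(dst, tmp->data, len);
    mock_buffer_destroy(tmp);
    return r;
}
```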
12:35mareko: PIPE_USAGE_STAGING enables CPU caching, the GPU has to go through the CPU cache for all access; PIPE_USAGE_STREAM is uncached for CPU reads, write-combined for CPU writes, the GPU skips the CPU cache and accesses RAM directly
12:36mareko: STAGING is best for any random access and reads, STREAM might be better for sequential writes only
12:37karolherbst: I use PIPE_USAGE_STAGING when the application requests fast CPU access atm
12:38karolherbst: but it doesn't really matter here, because all the data is only used once anyway in those tests
12:39karolherbst: for writes I just use subdata and trust it to do the right thing
12:39mareko: yes that should do the right thing
12:40karolherbst: though as I mentioned above, I was considering doing a userptr (resource_from_user) allocation instead and letting the GPU copy from userptr -> VRAM
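One practical detail with wrapping application memory as a userptr: kernel drivers typically want a page-aligned address and size (amdgpu's userptr ioctl rejects unaligned ranges), so an arbitrary host pointer has to be widened to page boundaries and the copy then starts at an offset inside that range. A C sketch of that arithmetic under that assumption; whether the Gallium-level userptr path already does this for the caller isn't covered here, and `userptr_align` is a hypothetical helper name. The host memory also has to stay valid until the GPU copy finishes.

```c
#include <stddef.h>
#include <stdint.h>

struct userptr_range {
    uintptr_t start; /* page-aligned start to hand to the driver */
    size_t size;     /* page-aligned size covering the whole buffer */
    size_t offset;   /* where the caller's data begins inside the range */
};

/* Widens [ptr, ptr + len) to page boundaries. page_size must be a power
 * of two. Returns 0 on success, -1 on bad input or address overflow. */
static int userptr_align(const void *ptr, size_t len, size_t page_size,
                         struct userptr_range *out)
{
    if (!ptr || len == 0 || page_size == 0 || (page_size & (page_size - 1)))
        return -1;
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t mask = (uintptr_t)page_size - 1;
    /* Reject ranges whose rounded-up end would wrap around. */
    if (len > UINTPTR_MAX - addr || addr + len > UINTPTR_MAX - mask)
        return -1;
    uintptr_t start = addr & ~mask;
    uintptr_t end = (addr + len + mask) & ~mask;
    out->start = start;
    out->size = (size_t)(end - start);
    out->offset = (size_t)(addr - start);
    return 0;
}
```

For example, 0x20 bytes at 0x1ff0 with 4 KiB pages straddle a page boundary, so the range becomes 0x1000 to 0x3000 (two pages) with the data at offset 0xff0.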
12:41karolherbst: "PIPE_USAGE_STREAM" is useful when the CPU only uploads once and never touches it ever again, right?
12:41mareko: not necessarily, it just has to be sequential write-only access
12:41mareko: you can try STAGING though
12:42karolherbst: yeah.. doesn't make much of a difference
12:42karolherbst: it's a test testing image copies for all sorts of formats, so it's uploading/downloading a lot of data
12:42karolherbst: so not sure if caching matters that much there
12:44karolherbst: there are a couple of "CL_MEM_HOST_*" flags I don't make use of yet.. so maybe I should look into this as well
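One possible way to use the OpenCL 1.2 `CL_MEM_HOST_*` hints is to feed them into mareko's rule: STREAM only for sequential, write-only host access, STAGING for anything the CPU reads or writes randomly. A C sketch of that mapping. The enum values are hypothetical stand-ins (real code would take the flags from `CL/cl.h` and return `PIPE_USAGE_*`), and returning "no temp buffer" for `CL_MEM_HOST_NO_ACCESS` is an assumption about how rusticl might treat it.

```c
/* Stand-ins for CL_MEM_HOST_WRITE_ONLY, CL_MEM_HOST_READ_ONLY and
 * CL_MEM_HOST_NO_ACCESS; real code would use the values from CL/cl.h. */
enum host_flags {
    HOST_WRITE_ONLY = 1u << 0,
    HOST_READ_ONLY  = 1u << 1,
    HOST_NO_ACCESS  = 1u << 2,
};

/* Stand-ins for PIPE_USAGE_STREAM / PIPE_USAGE_STAGING, plus "no temp buffer". */
enum temp_usage { TEMP_NONE, TEMP_STREAM, TEMP_STAGING };

/* Chooses the usage of the temporary buffer used for host transfers.
 * sequential_writes: the host writes front to back (e.g. a memcpy for
 * clEnqueueWriteBuffer) rather than poking at a mapping randomly. */
static enum temp_usage pick_temp_usage(unsigned flags, int sequential_writes)
{
    if (flags & HOST_NO_ACCESS)
        return TEMP_NONE;    /* host never touches the data */
    if ((flags & HOST_WRITE_ONLY) && sequential_writes)
        return TEMP_STREAM;  /* write-combined, never read back by the CPU */
    return TEMP_STAGING;     /* reads or random access: CPU-cached */
}
```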
12:55karolherbst: anyway yeah.. will play around with doing the copies on the GPU side, because I wanted to do that to get rid of stalls that way anyway. Maybe this also speeds up transfers which would be great
21:30karolherbst: this might be a weird question, but how can I create a pipe_fence_handle without pipe_screen::flush
21:37jenatali: To do what with?
21:38karolherbst: cl_khr_semaphore
21:39karolherbst: basically want to create a fence that only gets signalled through fence_server_signal
21:39jenatali: You want an external fence then
21:41jenatali: create_fence_fd / create_fence_win32
21:41jenatali: I don't know why the fd one is on a context and not the screen...
21:42karolherbst: but I don't necessarily have an fd, nor do I really want to create one
21:42karolherbst: I think I'll have to change gallium there anyway, because I also need to deal with payloads and be able to query those
21:43karolherbst: just trying to figure out what's there already and what's not
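The semantics being asked for (an object that starts unsignalled, changes only on an explicit signal, and has a queryable payload) can be sketched in C with an atomic 64-bit counter. This is purely hypothetical: `sw_sem_*` is not `pipe_fence_handle` or `fence_server_signal`, it has no blocking wait and no GPU or fd backing, and treating a binary semaphore as payload 0 -> 1 is an assumption. It only shows the state a Gallium-side object would need to track.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical software semaphore: only sw_sem_signal() changes it. */
struct sw_sem {
    _Atomic uint64_t payload;
};

static void sw_sem_init(struct sw_sem *s, uint64_t initial)
{
    atomic_init(&s->payload, initial);
}

/* Timeline semantics: the payload only moves forward.
 * Returns 0 on success, -1 if value is not greater than the current payload. */
static int sw_sem_signal(struct sw_sem *s, uint64_t value)
{
    uint64_t cur = atomic_load(&s->payload);
    do {
        if (value <= cur)
            return -1;
        /* On failure, cur is reloaded and the check above runs again. */
    } while (!atomic_compare_exchange_weak(&s->payload, &cur, value));
    return 0;
}

static uint64_t sw_sem_query(struct sw_sem *s)
{
    return atomic_load(&s->payload);
}

/* A waiter for `value` is satisfied once the payload has reached it. */
static int sw_sem_is_signalled(struct sw_sem *s, uint64_t value)
{
    return atomic_load(&s->payload) >= value;
}
```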