DRM Driver uAPI

drm/i915 uAPI

uevents generated by i915 on its device node

I915_L3_PARITY_UEVENT - Generated when the driver receives a parity mismatch event from the GPU L3 cache. Additional information supplied is ROW, BANK, SUBBANK, SLICE of the affected cacheline. Userspace should keep track of these events, and if a specific cacheline seems to have a persistent error, remap it with the L3 remapping tool supplied in intel-gpu-tools. The value supplied with the event is always 1.
I915_ERROR_UEVENT - Generated upon error detection, currently only via hangcheck. The error detection event is a good indicator of when things began to go badly. The value supplied with the event is a 1 upon error detection, and a 0 upon reset completion, signifying no more error exists. NOTE: Disabling hangcheck or reset via module parameter will cause the related events to not be seen.
I915_RESET_UEVENT - Event is generated just before an attempt to reset the GPU. The value supplied with the event is always 1. NOTE: Disabling reset via module parameter will cause this event to not be seen.

struct i915_user_extension: Base class for defining a chain of extensions

Definition:

struct i915_user_extension {
    __u64 next_extension;
    __u32 name;
    __u32 flags;
    __u32 rsvd[4];
};

Members

next_extension

Pointer to the next struct i915_user_extension, or zero if the end.

name

Name of the extension.

Note that the name here is just some integer.

Also note that the name space for this is not global for the whole driver, but rather its scope/meaning is limited to the specific piece of uAPI which has embedded the struct i915_user_extension.

flags

MBZ

All undefined bits must be zero.

rsvd

MBZ

Reserved for future use; must be zero.

Description

Many interfaces need to grow over time. In most cases we can simply extend the struct and have userspace pass in more data. Another option, as demonstrated by Vulkan’s approach to providing extensions for forward and backward compatibility, is to use a list of optional structs to provide those extra details.

The key advantage to using an extension chain is that it allows us to redefine the interface more easily than an ever growing struct of increasing complexity, and for large parts of that interface to be entirely optional. The downside is more pointer chasing; chasing across the __user boundary with pointers encapsulated inside u64.

Example chaining:

struct i915_user_extension ext3 = {
        .next_extension = 0, // end
        .name = ...,
};
struct i915_user_extension ext2 = {
        .next_extension = (uintptr_t)&ext3,
        .name = ...,
};
struct i915_user_extension ext1 = {
        .next_extension = (uintptr_t)&ext2,
        .name = ...,
};

Typically the struct i915_user_extension would be embedded in some uAPI struct, and in this case we would feed it the head of the chain (i.e. ext1), which would then apply all of the above extensions.
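As a rough illustration of the pointer chasing described above, the following sketch walks a chain from its head. It uses a local mirror of the struct so that it builds without the kernel headers; `struct user_extension`, `to_u64()`, `chain_length()` and `chain_find()` are hypothetical names for this example and are not part of the uAPI. It shows a userspace-side walk only, not how the driver itself processes a chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct i915_user_extension (assumption: same layout as
 * the definition above), so the sketch builds without uAPI headers. */
struct user_extension {
	uint64_t next_extension;
	uint32_t name;
	uint32_t flags;
	uint32_t rsvd[4];
};

/* Hypothetical helper: encode a pointer into a u64 link field. */
static uint64_t to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Hypothetical helper: count the links in a zero-terminated chain. */
static size_t chain_length(uint64_t head)
{
	size_t n = 0;

	while (head) {
		const struct user_extension *ext =
			(const struct user_extension *)(uintptr_t)head;

		n++;
		head = ext->next_extension;
	}
	return n;
}

/* Hypothetical helper: first extension with the given name, or NULL. */
static const struct user_extension *chain_find(uint64_t head, uint32_t name)
{
	while (head) {
		const struct user_extension *ext =
			(const struct user_extension *)(uintptr_t)head;

		if (ext->name == name)
			return ext;
		head = ext->next_extension;
	}
	return NULL;
}
```

A caller would pass `to_u64(&ext1)` as the head. Because names are scoped to the embedding uAPI struct, a lookup by name is only meaningful within a single chain.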

enum drm_i915_gem_engine_class: uapi engine type enumeration

Constants

I915_ENGINE_CLASS_RENDER: Render engines support instructions used for 3D, Compute (GPGPU), and programmable media workloads. These instructions fetch data and dispatch individual work items to threads that operate in parallel. The threads run small programs (called “kernels” or “shaders”) on the GPU’s execution units (EUs).
I915_ENGINE_CLASS_COPY: Copy engines (also referred to as “blitters”) support instructions that move blocks of data from one location in memory to another, or that fill a specified location of memory with fixed data. Copy engines can perform pre-defined logical or bitwise operations on the source, destination, or pattern data.
I915_ENGINE_CLASS_VIDEO: Video engines (also referred to as “bit stream decode” (BSD) or “vdbox”) support instructions that perform fixed-function media decode and encode.
I915_ENGINE_CLASS_VIDEO_ENHANCE: Video enhancement engines (also referred to as “vebox”) support instructions related to image enhancement.
I915_ENGINE_CLASS_COMPUTE: Compute engines support a subset of the instructions available on render engines: compute engines support Compute (GPGPU) and programmable media workloads, but do not support the 3D pipeline.
I915_ENGINE_CLASS_INVALID: Placeholder value to represent an invalid engine class assignment.

Description

Different engines serve different roles, and there may be more than one engine serving each role. This enum provides a classification of the role of the engine, which may be used when requesting operations to be performed on a certain subset of engines, or for providing information about that group.

struct i915_engine_class_instance: Engine class/instance identifier

Definition:

struct i915_engine_class_instance {
    __u16 engine_class;
#define I915_ENGINE_CLASS_INVALID_NONE -1
#define I915_ENGINE_CLASS_INVALID_VIRTUAL -2
    __u16 engine_instance;
};

Members

engine_class: Engine class from enum drm_i915_gem_engine_class
engine_instance: Engine instance.

Description

There may be more than one engine fulfilling any role within the system. Each engine of a class is given a unique instance number, and therefore any engine can be specified by its class:instance tuple. APIs that allow access to any engine in the system will use struct i915_engine_class_instance for this identification.

perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915

struct drm_i915_getparam: Driver parameter query structure.

Definition:

struct drm_i915_getparam {
    __s32 param;
    int __user *value;
};

Members

param

Driver parameter to query.

value

Address of memory where queried value should be put.

WARNING: Using pointers instead of fixed-size u64 means we need to write compat32 code. Don’t repeat this mistake.

type drm_i915_getparam_t: Driver parameter query structure. See struct drm_i915_getparam.

struct drm_i915_gem_mmap_offset: Retrieve an offset so we can mmap this buffer object.

Definition:

struct drm_i915_gem_mmap_offset {
    __u32 handle;
    __u32 pad;
    __u64 offset;
    __u64 flags;
#define I915_MMAP_OFFSET_GTT    0
#define I915_MMAP_OFFSET_WC     1
#define I915_MMAP_OFFSET_WB     2
#define I915_MMAP_OFFSET_UC     3
#define I915_MMAP_OFFSET_FIXED  4
    __u64 extensions;
};

Members

handle

Handle for the object being mapped.

pad

Must be zero

offset

The fake offset to use for subsequent mmap call

This is a fixed-size type for 32/64 compatibility.

flags

Flags for extended behaviour.

One of the MMAP_OFFSET types must be included:

I915_MMAP_OFFSET_GTT: Use mmap with the object bound to GTT. (Write-Combined)
I915_MMAP_OFFSET_WC: Use Write-Combined caching.
I915_MMAP_OFFSET_WB: Use Write-Back caching.
I915_MMAP_OFFSET_FIXED: Use object placement to determine caching.

On devices with local memory I915_MMAP_OFFSET_FIXED is the only valid type. On devices without local memory, this caching mode is invalid.

When I915_MMAP_OFFSET_FIXED is specified, the caching mode will be WC or WB, depending on the object placement at creation. WB will be used when the object can only exist in system memory, WC otherwise.

extensions

Zero-terminated chain of extensions.

No current extensions defined; mbz.

Description

This struct is passed as argument to the DRM_IOCTL_I915_GEM_MMAP_OFFSET ioctl, and is used to retrieve the fake offset to mmap an object specified by handle.

The legacy way of using DRM_IOCTL_I915_GEM_MMAP is removed on gen12+. DRM_IOCTL_I915_GEM_MMAP_GTT is an older supported alias to this struct, but will behave as setting the extensions to 0, and flags to I915_MMAP_OFFSET_GTT.
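The rules for the flags member can be summarised in a small check. This is a sketch of the documented rules only, not the driver's actual validation, which may apply further platform-specific checks. The `MMAP_OFFSET_*` constants mirror the values in the definition above; `mmap_offset_type_valid()`, `fixed_mode_caching()` and the `has_local_memory` parameter are hypothetical names. Treating I915_MMAP_OFFSET_UC as valid on devices without local memory is an assumption, since it appears in the definition but not in the list of types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from the struct definition; names local to this sketch. */
#define MMAP_OFFSET_GTT   0
#define MMAP_OFFSET_WC    1
#define MMAP_OFFSET_WB    2
#define MMAP_OFFSET_UC    3
#define MMAP_OFFSET_FIXED 4

enum fixed_caching { FIXED_CACHING_WB, FIXED_CACHING_WC };

/* With local memory only FIXED is valid; without it FIXED is invalid. */
static bool mmap_offset_type_valid(uint64_t type, bool has_local_memory)
{
	if (has_local_memory)
		return type == MMAP_OFFSET_FIXED;

	switch (type) {
	case MMAP_OFFSET_GTT:
	case MMAP_OFFSET_WC:
	case MMAP_OFFSET_WB:
	case MMAP_OFFSET_UC: /* assumption: accepted without local memory */
		return true;
	default:
		return false;
	}
}

/* Caching used for FIXED: WB if the object can only live in system
 * memory, WC otherwise. */
static enum fixed_caching fixed_mode_caching(bool system_memory_only)
{
	return system_memory_only ? FIXED_CACHING_WB : FIXED_CACHING_WC;
}
```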

struct drm_i915_gem_set_domain: Adjust the object’s write or read domain, in preparation for accessing the pages via some CPU domain.

Definition:

struct drm_i915_gem_set_domain {
    __u32 handle;
    __u32 read_domains;
    __u32 write_domain;
};

Members

handle

Handle for the object.

read_domains

New read domains.

write_domain

New write domain.

Note that having something in the write domain implies it’s in the read domain, and only that read domain.

Description

Specifying a new write or read domain will flush the object out of the previous domain (if required), before updating the object’s domain tracking with the new domain.

Note this might involve waiting for the object first if it is still active on the GPU.

Supported values for read_domains and write_domain:

I915_GEM_DOMAIN_WC: Uncached write-combined domain

I915_GEM_DOMAIN_CPU: CPU cache domain

I915_GEM_DOMAIN_GTT: Mappable aperture domain

All other domains are rejected.

Note that for discrete, starting from DG1, this is no longer supported, and is instead rejected. On such platforms the CPU domain is effectively static, where we also only support a single drm_i915_gem_mmap_offset cache mode, which can’t be set explicitly and instead depends on the object placements, as per the below.

Implicit caching rules, starting from DG1:

If any of the object placements (see drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and mapped as write-combined only.

Everything else is always allocated and mapped as write-back, with the guarantee that everything is also coherent with the GPU.

Note that this is likely to change in the future again, where we might need more flexibility on future devices, so making this all explicit as part of a new drm_i915_gem_create_ext extension is probable.
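The implicit caching rules above amount to a single scan over the object placements. The sketch below assumes the placements are available as a plain array of memory classes; `enum mem_class`, `enum cpu_caching` and `implicit_caching()` are hypothetical names for this illustration, and the function only models the documented rule rather than any driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the memory classes named in the text. */
enum mem_class { MEM_CLASS_SYSTEM, MEM_CLASS_DEVICE };

enum cpu_caching { CPU_CACHING_WB, CPU_CACHING_WC };

/* Any device-memory placement makes the object write-combined only;
 * everything else is write-back. */
static enum cpu_caching implicit_caching(const enum mem_class *placements,
					 size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (placements[i] == MEM_CLASS_DEVICE)
			return CPU_CACHING_WC;
	}
	return CPU_CACHING_WB;
}
```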

struct drm_i915_gem_exec_fence: An input or output fence for the execbuf ioctl.

Definition:

struct drm_i915_gem_exec_fence {
    __u32 handle;
    __u32 flags;
#define I915_EXEC_FENCE_WAIT            (1<<0)
#define I915_EXEC_FENCE_SIGNAL          (1<<1)
#define __I915_EXEC_FENCE_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SIGNAL << 1))
};

Members

handle

User’s handle for a drm_syncobj to wait on or signal.

flags

Supported flags are:

I915_EXEC_FENCE_WAIT: Wait for the input fence before request submission.

I915_EXEC_FENCE_SIGNAL: Return request completion fence as output

Description

The request will wait for input fence to signal before submission.

The returned output fence will be signaled after the completion of the request.

struct drm_i915_gem_execbuffer_ext_timeline_fences: Timeline fences for execbuf ioctl.

Definition:

struct drm_i915_gem_execbuffer_ext_timeline_fences {
#define DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES 0
    struct i915_user_extension base;
    __u64 fence_count;
    __u64 handles_ptr;
    __u64 values_ptr;
};

Members

base: Extension link. See struct i915_user_extension.
fence_count: Number of elements in the handles_ptr and values_ptr arrays.
handles_ptr: Pointer to an array of struct drm_i915_gem_exec_fence of length fence_count.
values_ptr: Pointer to an array of u64 values of length fence_count. Values must be 0 for a binary drm_syncobj. A value of 0 for a timeline drm_syncobj is invalid, as it turns the drm_syncobj into a binary one.

Description

This structure describes an array of drm_syncobj and associated points for timeline variants of drm_syncobj. It is invalid to append this structure to the execbuf if I915_EXEC_FENCE_ARRAY is set.

struct drm_i915_gem_execbuffer2: Structure for DRM_I915_GEM_EXECBUFFER2 ioctl.

Definition:

struct drm_i915_gem_execbuffer2 {
    __u64 buffers_ptr;
    __u32 buffer_count;
    __u32 batch_start_offset;
    __u32 batch_len;
    __u32 DR1;
    __u32 DR4;
    __u32 num_cliprects;
    __u64 cliprects_ptr;
    __u64 flags;
#define I915_EXEC_RING_MASK              (0x3f)
#define I915_EXEC_DEFAULT                (0<<0)
#define I915_EXEC_RENDER                 (1<<0)
#define I915_EXEC_BSD                    (2<<0)
#define I915_EXEC_BLT                    (3<<0)
#define I915_EXEC_VEBOX                  (4<<0)
#define I915_EXEC_CONSTANTS_MASK        (3<<6)
#define I915_EXEC_CONSTANTS_REL_GENERAL (0<<6)
#define I915_EXEC_CONSTANTS_ABSOLUTE    (1<<6)
#define I915_EXEC_CONSTANTS_REL_SURFACE (2<<6)
#define I915_EXEC_GEN7_SOL_RESET        (1<<8)
#define I915_EXEC_SECURE                (1<<9)
#define I915_EXEC_IS_PINNED             (1<<10)
#define I915_EXEC_NO_RELOC              (1<<11)
#define I915_EXEC_HANDLE_LUT            (1<<12)
#define I915_EXEC_BSD_SHIFT      (13)
#define I915_EXEC_BSD_MASK       (3 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_DEFAULT    (0 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_RING1      (1 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_BSD_RING2      (2 << I915_EXEC_BSD_SHIFT)
#define I915_EXEC_RESOURCE_STREAMER     (1<<15)
#define I915_EXEC_FENCE_IN              (1<<16)
#define I915_EXEC_FENCE_OUT             (1<<17)
#define I915_EXEC_BATCH_FIRST           (1<<18)
#define I915_EXEC_FENCE_ARRAY   (1<<19)
#define I915_EXEC_FENCE_SUBMIT          (1 << 20)
#define I915_EXEC_USE_EXTENSIONS        (1 << 21)
#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS << 1))
    __u64 rsvd1;
    __u64 rsvd2;
};

Members

buffers_ptr

Pointer to a list of gem_exec_object2 structs

buffer_count

Number of elements in buffers_ptr array

batch_start_offset

Offset in the batchbuffer to start execution from.

batch_len

Length in bytes of the batch buffer, starting from the batch_start_offset. If 0, length is assumed to be the batch buffer object size.

DR1

deprecated

DR4

deprecated

num_cliprects

See cliprects_ptr

cliprects_ptr

Kernel clipping was a DRI1 misfeature.

It is invalid to use this field if I915_EXEC_FENCE_ARRAY or I915_EXEC_USE_EXTENSIONS flags are not set.

If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array of drm_i915_gem_exec_fence and num_cliprects is the length of the array.

If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a single i915_user_extension and num_cliprects is 0.

flags

Execbuf flags

rsvd1

Context id

rsvd2

in and out sync_file file descriptors.

When I915_EXEC_FENCE_IN or I915_EXEC_FENCE_SUBMIT flag is set, the lower 32 bits of this field will have the in sync_file fd (input).

When I915_EXEC_FENCE_OUT flag is set, the upper 32 bits of this field will have the out sync_file fd (output).
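The bit layout of flags and rsvd2 described above can be handled with a few small helpers. This is a sketch under the assumption that only the documented layout matters: the low bits of flags select the engine (masked by I915_EXEC_RING_MASK), the lower 32 bits of rsvd2 carry the input fd, and the upper 32 bits carry the output fd. The `EXEC_*` constants mirror values from the definition; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from the definition above; names local to this sketch. */
#define EXEC_RING_MASK 0x3fu
#define EXEC_FENCE_IN  (1u << 16)
#define EXEC_FENCE_OUT (1u << 17)

/* Engine selector (or engine map index) carried in the low bits of flags. */
static unsigned int exec_ring_selector(uint64_t flags)
{
	return (unsigned int)(flags & EXEC_RING_MASK);
}

/* Store an input sync_file fd in the lower 32 bits of rsvd2, leaving the
 * upper 32 bits untouched. */
static uint64_t rsvd2_set_in_fence(uint64_t rsvd2, int32_t fd)
{
	assert(fd >= 0);
	return (rsvd2 & ~UINT64_C(0xffffffff)) | (uint32_t)fd;
}

/* Read the output sync_file fd from the upper 32 bits of rsvd2. */
static int32_t rsvd2_get_out_fence(uint64_t rsvd2)
{
	return (int32_t)(uint32_t)(rsvd2 >> 32);
}
```

A caller would set EXEC_FENCE_IN (or the submit-fence flag) alongside `rsvd2_set_in_fence()`, and read the output fd only when EXEC_FENCE_OUT was set on a successful submission.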

struct drm_i915_gem_caching: Set or get the caching for given object handle.

Definition:

struct drm_i915_gem_caching {
    __u32 handle;
#define I915_CACHING_NONE               0
#define I915_CACHING_CACHED             1
#define I915_CACHING_DISPLAY            2
    __u32 caching;
};

Members

handle

Handle of the buffer to set/get the caching level.

caching

The GTT caching level to apply or possible return value.

The supported caching values:

I915_CACHING_NONE:

GPU access is not coherent with CPU caches. Default for machines without an LLC. This means manual flushing might be needed, if we want GPU access to be coherent.

I915_CACHING_CACHED:

GPU access is coherent with CPU caches and furthermore the data is cached in last-level caches shared between CPU cores and the GPU GT.

I915_CACHING_DISPLAY:

Special GPU caching mode which is coherent with the scanout engines. Transparently falls back to I915_CACHING_NONE on platforms where no special cache mode (like write-through or gfdt flushing) is available. The kernel automatically sets this mode when using a buffer as a scanout target. Userspace can manually set this mode to avoid a costly stall and clflush in the hotpath of drawing the first frame.

Description

Allow userspace to control the GTT caching bits for a given object when the object is later mapped through the ppGTT (or GGTT on older platforms lacking ppGTT support, or if the object is used for scanout). Note that this might require unbinding the object from the GTT first, if its current caching value doesn’t match.

Note that this all changes on discrete platforms: starting from DG1, set/get caching is no longer supported and is rejected. Instead, the CPU caching attributes (WB vs WC) become an immutable creation-time property of the object, along with the GTT caching level. For now no new uAPI is exposed for this; on DG1 it is all implicit, although this largely shouldn’t matter since DG1 is coherent by default (without any way of controlling it).

Implicit caching rules, starting from DG1:

If any of the object placements (see drm_i915_gem_create_ext_memory_regions) contain I915_MEMORY_CLASS_DEVICE then the object will be allocated and mapped as write-combined only.

Everything else is always allocated and mapped as write-back, with the guarantee that everything is also coherent with the GPU.

Note that this is likely to change in the future again, where we might need more flexibility on future devices, so making this all explicit as part of a new drm_i915_gem_create_ext extension is probable.

Side note: Part of the reason for this is that changing the at-allocation-time CPU caching attributes for the pages might be required (and is expensive) if we then need to CPU map the pages later with different caching attributes. This inconsistent caching behaviour, while supported on x86, is not universally supported on other architectures. So for simplicity we opt for setting everything at creation time, whilst also making it immutable, on discrete platforms.

struct drm_i915_gem_context_create_ext: Structure for creating contexts.

Definition:

struct drm_i915_gem_context_create_ext {
    __u32 ctx_id;
    __u32 flags;
#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS        (1u << 0)
#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE       (1u << 1)
#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN         (-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
    __u64 extensions;
#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
#define I915_CONTEXT_CREATE_EXT_CLONE 1
};

Members

ctx_id

Id of the created context (output)

flags

Supported flags are:

I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS:

Extensions may be appended to this structure, and the driver must check for those. See extensions.

I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE:

The created context will have a single timeline.

extensions

Zero-terminated chain of extensions.

I915_CONTEXT_CREATE_EXT_SETPARAM: Context parameter to set or query during context creation. See struct drm_i915_gem_context_create_ext_setparam.

I915_CONTEXT_CREATE_EXT_CLONE: This extension has been removed. On the off chance someone somewhere has attempted to use it, never re-use this extension number.

struct drm_i915_gem_context_param: Context parameter to set or query.

Definition:

struct drm_i915_gem_context_param {
    __u32 ctx_id;
    __u32 size;
    __u64 param;
#define I915_CONTEXT_PARAM_BAN_PERIOD   0x1
#define I915_CONTEXT_PARAM_NO_ZEROMAP   0x2
#define I915_CONTEXT_PARAM_GTT_SIZE     0x3
#define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE     0x4
#define I915_CONTEXT_PARAM_BANNABLE     0x5
#define I915_CONTEXT_PARAM_PRIORITY     0x6
#define I915_CONTEXT_MAX_USER_PRIORITY        1023
#define I915_CONTEXT_DEFAULT_PRIORITY         0
#define I915_CONTEXT_MIN_USER_PRIORITY        -1023
#define I915_CONTEXT_PARAM_SSEU         0x7
#define I915_CONTEXT_PARAM_RECOVERABLE  0x8
#define I915_CONTEXT_PARAM_VM           0x9
#define I915_CONTEXT_PARAM_ENGINES      0xa
#define I915_CONTEXT_PARAM_PERSISTENCE  0xb
#define I915_CONTEXT_PARAM_RINGSIZE     0xc
#define I915_CONTEXT_PARAM_PROTECTED_CONTENT    0xd
#define I915_CONTEXT_PARAM_LOW_LATENCY          0xe
#define I915_CONTEXT_PARAM_CONTEXT_IMAGE        0xf
    __u64 value;
};

Members

ctx_id: Context id
size: Size of the parameter value
param: Parameter to set or query
value: Context parameter value to be set or queried

Virtual Engine uAPI

Virtual engine is a concept where userspace is able to configure a set of physical engines, submit a batch buffer, and let the driver execute it on any engine from the set as it sees fit.

This is primarily useful on parts which have multiple engine instances of the same class, for example GT3+ Skylake parts with their two VCS engines.

For instance userspace can enumerate all engines of a certain class using the previously described Engine Discovery uAPI. After that userspace can create a GEM context with a placeholder slot for the virtual engine (using I915_ENGINE_CLASS_INVALID and I915_ENGINE_CLASS_INVALID_NONE for class and instance respectively) and finally using the I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE extension place a virtual engine in the same reserved slot.

Example of creating a virtual engine and submitting a batch buffer to it:

I915_DEFINE_CONTEXT_ENGINES_LOAD_BALANCE(virtual, 2) = {
        .base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
        .engine_index = 0, // Place this virtual engine into engine map slot 0
        .num_siblings = 2,
        .engines = { { I915_ENGINE_CLASS_VIDEO, 0 },
                     { I915_ENGINE_CLASS_VIDEO, 1 }, },
};
I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 1) = {
        .engines = { { I915_ENGINE_CLASS_INVALID,
                       I915_ENGINE_CLASS_INVALID_NONE } },
        .extensions = to_user_pointer(&virtual), // Chains after load_balance extension
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
        .base = {
                .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
        },
        .param = {
                .param = I915_CONTEXT_PARAM_ENGINES,
                .value = to_user_pointer(&engines),
                .size = sizeof(engines),
        },
};
struct drm_i915_gem_context_create_ext create = {
        .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
        .extensions = to_user_pointer(&p_engines),
};

ctx_id = gem_context_create_ext(drm_fd, &create);

// Now we have created a GEM context with its engine map containing a
// single virtual engine. Submissions to this slot can go either to
// vcs0 or vcs1, depending on the load balancing algorithm used inside
// the driver. The load balancing is dynamic from one batch buffer to
// another and transparent to userspace.

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0 which is the virtual engine
gem_execbuf(drm_fd, &execbuf);

struct i915_context_engines_parallel_submit: Configure engine for parallel submission.

Definition:

struct i915_context_engines_parallel_submit {
    struct i915_user_extension base;
    __u16 engine_index;
    __u16 width;
    __u16 num_siblings;
    __u16 mbz16;
    __u64 flags;
    __u64 mbz64[3];
    struct i915_engine_class_instance engines[];
};

Members

base

base user extension.

engine_index

slot for parallel engine

width

number of contexts per parallel engine or in other words the number of batches in each submission

num_siblings

number of siblings per context or in other words the number of possible placements for each submission

mbz16

reserved for future use; must be zero

flags

all undefined flags must be zero; currently no flags are defined

mbz64

reserved for future use; must be zero

engines

2-d array of engine instances to configure parallel engine

length = width (i) * num_siblings (j)
index = j + i * num_siblings

Description

Set up a slot in the context engine map to allow multiple BBs to be submitted in a single execbuf IOCTL. Those BBs will then be scheduled to run on the GPU in parallel. Multiple hardware contexts are created internally in the i915 to run these BBs. Once a slot is configured for N BBs, only N BBs can be submitted in each execbuf IOCTL. This is implicit behavior: the user doesn’t tell the execbuf IOCTL there are N BBs; the execbuf IOCTL knows how many BBs there are based on the slot’s configuration. The N BBs are the last N buffer objects, or the first N if I915_EXEC_BATCH_FIRST is set.

The default placement behavior is to create implicit bonds between each context if each context maps to more than 1 physical engine (e.g. context is a virtual engine). Also, we only allow contexts of the same engine class, and these contexts must be in logically contiguous order. Examples of the placement behavior are described below. Lastly, the default is to not allow BBs to be preempted mid-batch. Instead, coordinated preemption points are inserted on all hardware contexts between each set of BBs. Flags could be added in the future to change both of these default behaviors.

Returns -EINVAL if hardware context placement configuration is invalid or if the placement configuration isn’t supported on the platform / submission interface. Returns -ENODEV if extension isn’t supported on the platform / submission interface.

Examples syntax:
CS[X] = generic engine of same class, logical instance X
INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE

Example 1 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=1,
             engines=CS[0],CS[1])

Results in the following valid placement:
CS[0], CS[1]

Example 2 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=2,
             engines=CS[0],CS[2],CS[1],CS[3])

Results in the following valid placements:
CS[0], CS[1]
CS[2], CS[3]

This can be thought of as two virtual engines, each containing two
engines thereby making a 2D array. However, there are bonds tying the
entries together and placing restrictions on how they can be scheduled.
Specifically, the scheduler can choose only vertical columns from the 2D
array. That is, CS[0] is bonded to CS[1] and CS[2] to CS[3]. So if the
scheduler wants to submit to CS[0], it must also choose CS[1] and vice
versa. Likewise, choosing CS[2] requires also using CS[3].
VE[0] = CS[0], CS[2]
VE[1] = CS[1], CS[3]

Example 3 pseudo code:
set_engines(INVALID)
set_parallel(engine_index=0, width=2, num_siblings=2,
             engines=CS[0],CS[1],CS[1],CS[3])

Results in the following valid and invalid placements:
CS[0], CS[1]
CS[1], CS[3] - Not logically contiguous, return -EINVAL

Context Engine Map uAPI

Context engine map is a new way of addressing engines when submitting batch buffers, replacing the existing way of using identifiers like I915_EXEC_BLT inside the flags field of struct drm_i915_gem_execbuffer2.

To use it, created GEM contexts need to be configured with a list of engines the user is intending to submit to. This is accomplished using the I915_CONTEXT_PARAM_ENGINES parameter and struct i915_context_param_engines.

For such contexts the I915_EXEC_RING_MASK field becomes an index into the configured map.

Example of creating such context and submitting against it:

I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
        .engines = { { I915_ENGINE_CLASS_RENDER, 0 },
                     { I915_ENGINE_CLASS_COPY, 0 } }
};
struct drm_i915_gem_context_create_ext_setparam p_engines = {
        .base = {
                .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
        },
        .param = {
                .param = I915_CONTEXT_PARAM_ENGINES,
                .value = to_user_pointer(&engines),
                .size = sizeof(engines),
        },
};
struct drm_i915_gem_context_create_ext create = {
        .flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
        .extensions = to_user_pointer(&p_engines),
};

ctx_id = gem_context_create_ext(drm_fd, &create);

// We have now created a GEM context with two engines in the map:
// Index 0 points to rcs0 while index 1 points to bcs0. Other engines
// will not be accessible from this context.

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 0; // Submits to index 0, which is rcs0 for this context
gem_execbuf(drm_fd, &execbuf);

...
execbuf.rsvd1 = ctx_id;
execbuf.flags = 1; // Submits to index 1, which is bcs0 for this context
gem_execbuf(drm_fd, &execbuf);

struct drm_i915_gem_context_create_ext_setparam: Context parameter to set or query during context creation.

Definition:

struct drm_i915_gem_context_create_ext_setparam {
    struct i915_user_extension base;
    struct drm_i915_gem_context_param param;
};

Members

base: Extension link. See struct i915_user_extension.
param: Context parameter to set or query. See struct drm_i915_gem_context_param.

struct drm_i915_gem_vm_control: Structure to create or destroy VM.

Definition:

struct drm_i915_gem_vm_control {
    __u64 extensions;
    __u32 flags;
    __u32 vm_id;
};

Members

extensions: Zero-terminated chain of extensions.
flags: reserved for future usage, currently MBZ
vm_id: Id of the VM created or to be destroyed

Description

DRM_I915_GEM_VM_CREATE -

Create a new virtual memory address space (ppGTT) for use within a context on the same file. Extensions can be provided to configure exactly how the address space is setup upon creation.

The id of the new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is returned in the outparam vm_id.

An extension chain may be provided, starting with extensions, and terminated by the next_extension being 0. Currently, no extensions are defined.

DRM_I915_GEM_VM_DESTROY -

Destroys a previously created VM id, specified in vm_id.

No extensions or flags are allowed currently, and so must be zero.

struct drm_i915_gem_userptr: Create GEM object from user allocated memory.

Definition:

struct drm_i915_gem_userptr {
    __u64 user_ptr;
    __u64 user_size;
    __u32 flags;
#define I915_USERPTR_READ_ONLY 0x1
#define I915_USERPTR_PROBE 0x2
#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
    __u32 handle;
};

Members

user_ptr

The pointer to the allocated memory.

Needs to be aligned to PAGE_SIZE.

user_size

The size in bytes for the allocated memory. This will also become the object size.

Needs to be aligned to PAGE_SIZE, and should be at least PAGE_SIZE, or larger.

flags

Supported flags:

I915_USERPTR_READ_ONLY:

Mark the object as read-only; this also means GPU access can only be read-only. This is only supported on HW which supports read-only access through the GTT. If the HW can’t support read-only access, an error is returned.

I915_USERPTR_PROBE:

Probe the provided user_ptr range and validate that the user_ptr is indeed pointing to normal memory and that the range is also valid. For example if some garbage address is given to the kernel, then this should complain.

Returns -EFAULT if the probe failed.

Note that this doesn’t populate the backing pages, and also doesn’t guarantee that the object will remain valid when the object is eventually used.

The kernel supports this feature if I915_PARAM_HAS_USERPTR_PROBE returns a non-zero value.

I915_USERPTR_UNSYNCHRONIZED:

NOT USED. Setting this flag will result in an error.

handle

Returned handle for the object.

Object handles are nonzero.

Description

Userptr objects have several restrictions on what ioctls can be used with the object handle.
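Before filling in user_ptr and user_size, userspace can check the alignment requirements listed under Members. The sketch below assumes the page size reported by `sysconf(_SC_PAGESIZE)` matches the PAGE_SIZE the kernel checks against and that it is a power of two; `userptr_args_ok()` is a hypothetical helper, not part of the uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Returns true if the address and size are both page aligned and the
 * size covers at least one page. */
static bool userptr_args_ok(uint64_t user_ptr, uint64_t user_size)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t mask;

	if (page_size <= 0)
		return false;

	/* Assumption: page size is a power of two, so size - 1 is a mask. */
	mask = (uint64_t)page_size - 1;

	return user_size >= (uint64_t)page_size &&
	       (user_ptr & mask) == 0 &&
	       (user_size & mask) == 0;
}
```

Memory obtained from `mmap()` or `aligned_alloc()` with page alignment would satisfy the address check. A plain `malloc()` result generally would not.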

struct drm_i915_perf_oa_config

Definition:

struct drm_i915_perf_oa_config {
    char uuid[36];
    __u32 n_mux_regs;
    __u32 n_boolean_regs;
    __u32 n_flex_regs;
    __u64 mux_regs_ptr;
    __u64 boolean_regs_ptr;
    __u64 flex_regs_ptr;
};

Members

uuid: String formatted like “%08x-%04x-%04x-%04x-%012x”
n_mux_regs: Number of mux regs in mux_regs_ptr.
n_boolean_regs: Number of boolean regs in boolean_regs_ptr.
n_flex_regs: Number of flex regs in flex_regs_ptr.
mux_regs_ptr: Pointer to tuples of u32 values (register address, value) for mux registers. Expected length of buffer is (2 * sizeof(u32) * n_mux_regs).
boolean_regs_ptr: Pointer to tuples of u32 values (register address, value) for boolean registers. Expected length of buffer is (2 * sizeof(u32) * n_boolean_regs).
flex_regs_ptr: Pointer to tuples of u32 values (register address, value) for flex registers. Expected length of buffer is (2 * sizeof(u32) * n_flex_regs).

Description

Structure to upload perf dynamic configuration into the kernel.
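The expected buffer lengths stated for the three register pointers follow from each entry being an (address, value) pair of u32 values. A minimal sketch of that arithmetic, with the hypothetical names `struct oa_reg_pair` and `oa_regs_buffer_len()`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One register entry: a (register address, value) tuple of u32 values. */
struct oa_reg_pair {
	uint32_t addr;
	uint32_t value;
};

/* Expected buffer length in bytes for n register tuples, i.e.
 * 2 * sizeof(u32) * n as stated in the member descriptions. */
static size_t oa_regs_buffer_len(uint32_t n_regs)
{
	return 2 * sizeof(uint32_t) * (size_t)n_regs;
}
```

An array of `struct oa_reg_pair` with n elements has exactly this length, so it can serve as the backing storage for one of the register pointers.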

struct drm_i915_query_item: An individual query for the kernel to process.

Definition:

struct drm_i915_query_item {
    __u64 query_id;
#define DRM_I915_QUERY_TOPOLOGY_INFO            1
#define DRM_I915_QUERY_ENGINE_INFO              2
#define DRM_I915_QUERY_PERF_CONFIG              3
#define DRM_I915_QUERY_MEMORY_REGIONS           4
#define DRM_I915_QUERY_HWCONFIG_BLOB            5
#define DRM_I915_QUERY_GEOMETRY_SUBSLICES       6
#define DRM_I915_QUERY_GUC_SUBMISSION_VERSION   7
    __s32 length;
    __u32 flags;
#define DRM_I915_QUERY_PERF_CONFIG_LIST          1;
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2;
#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID   3;
    __u64 data_ptr;
};

Members

query_id

The id for this query. Currently accepted query IDs are:

DRM_I915_QUERY_TOPOLOGY_INFO (see struct drm_i915_query_topology_info)
DRM_I915_QUERY_ENGINE_INFO (see struct drm_i915_engine_info)
DRM_I915_QUERY_PERF_CONFIG (see struct drm_i915_query_perf_config)
DRM_I915_QUERY_MEMORY_REGIONS (see struct drm_i915_query_memory_regions)
DRM_I915_QUERY_HWCONFIG_BLOB (see GuC HWCONFIG blob uAPI)
DRM_I915_QUERY_GEOMETRY_SUBSLICES (see struct drm_i915_query_topology_info)
DRM_I915_QUERY_GUC_SUBMISSION_VERSION (see struct drm_i915_query_guc_submission_version)

length

When set to zero by userspace, this is filled with the size of the data to be written at the data_ptr pointer. The kernel sets this value to a negative value to signal an error on a particular query item.

flags

When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0.

When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the following:

DRM_I915_QUERY_PERF_CONFIG_LIST

DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID

DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID

When query_id == DRM_I915_QUERY_GEOMETRY_SUBSLICES, must contain a struct i915_engine_class_instance that references a render engine.

data_ptr

Data will be written at the location pointed to by data_ptr when the value of length matches the length of the data to be written by the kernel.

Description

The behaviour is determined by the query_id. Note that exactly what data_ptr points to also depends on the specific query_id.

struct drm_i915_query¶: Supply an array of struct drm_i915_query_item for the kernel to fill out.

Definition:

struct drm_i915_query {
    __u32 num_items;
    __u32 flags;
    __u64 items_ptr;
};

Members

num_items: The number of elements in the items_ptr array
flags: Unused for now. Must be cleared to zero.
items_ptr: Pointer to an array of struct drm_i915_query_item. The number of array elements is num_items.

Description

Note that this is generally a two step process for each struct drm_i915_query_item in the array:

Call the DRM_IOCTL_I915_QUERY, giving it our array of struct drm_i915_query_item, with drm_i915_query_item.length set to zero. The kernel will then fill in the size, in bytes, which tells userspace how much memory it needs to allocate for the blob (say for an array of properties).
Next we call DRM_IOCTL_I915_QUERY again, this time with the drm_i915_query_item.data_ptr equal to our newly allocated blob. Note that the drm_i915_query_item.length should still be the same as what the kernel previously set. At this point the kernel can fill in the blob.

Note that for some query items it can make sense for userspace to just pass in a buffer/blob equal to or larger than the required size. In this case only a single ioctl call is needed. For some smaller query items this can work quite well.
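The two-step process above can be sketched as follows. To keep the example runnable without a GPU, the ioctl is hidden behind a callback: `struct query_item` is a local mirror of struct drm_i915_query_item, and `query_fn`, `query_two_pass` and `fake_query` are hypothetical names introduced only for this illustration. Real code would issue DRM_IOCTL_I915_QUERY where the callback is invoked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of struct drm_i915_query_item, for illustration only. */
struct query_item {
    uint64_t query_id;
    int32_t  length;
    uint32_t flags;
    uint64_t data_ptr;
};

/* Hypothetical stand-in for ioctl(fd, DRM_IOCTL_I915_QUERY, ...). */
typedef int (*query_fn)(struct query_item *item);

/* Runs both passes; returns a heap blob the caller frees, or NULL. */
static void *query_two_pass(query_fn do_query, uint64_t query_id,
                            int32_t *len_out)
{
    struct query_item item;
    void *blob;

    assert(do_query != NULL && len_out != NULL);
    memset(&item, 0, sizeof(item));
    item.query_id = query_id;

    /* Pass 1: length == 0 asks the kernel for the required size. */
    if (do_query(&item) != 0 || item.length <= 0)
        return NULL;

    blob = calloc(1, (size_t)item.length);
    if (!blob)
        return NULL;

    /* Pass 2: length is left as reported, data_ptr points at the blob. */
    item.data_ptr = (uint64_t)(uintptr_t)blob;
    if (do_query(&item) != 0 || item.length <= 0) {
        free(blob);
        return NULL;
    }

    *len_out = item.length;
    return blob;
}

/* Fake "kernel": reports an 8-byte blob and fills it with 0xAB. */
static int fake_query(struct query_item *item)
{
    const int32_t need = 8;

    if (item->length == 0) {
        item->length = need;
        return 0;
    }
    if (item->length < need) {
        item->length = -22; /* a negative length signals a per-item error */
        return 0;
    }
    memset((void *)(uintptr_t)item->data_ptr, 0xAB, (size_t)need);
    item->length = need;
    return 0;
}
```

Checking item.length for a negative value after each call matters because the ioctl itself can succeed while an individual item reports an error.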

struct drm_i915_query_topology_info¶

Definition:

struct drm_i915_query_topology_info {
    __u16 flags;
    __u16 max_slices;
    __u16 max_subslices;
    __u16 max_eus_per_subslice;
    __u16 subslice_offset;
    __u16 subslice_stride;
    __u16 eu_offset;
    __u16 eu_stride;
    __u8 data[];
};

Members

flags

Unused for now. Must be cleared to zero.

max_slices

The number of bits used to express the slice mask.

max_subslices

The number of bits used to express the subslice mask.

max_eus_per_subslice

The number of bits in the EU mask that correspond to a single subslice’s EUs.

subslice_offset

Offset in data[] at which the subslice masks are stored.

subslice_stride

Stride at which each of the subslice masks for each slice are stored.

eu_offset

Offset in data[] at which the EU masks are stored.

eu_stride

Stride at which each of the EU masks for each subslice are stored.

data

Contains three pieces of information:

The slice mask with one bit per slice telling whether a slice is available. The availability of slice X can be queried with the following formula:
```
(data[X / 8] >> (X % 8)) & 1
```
Starting with Xe_HP platforms, Intel hardware no longer has traditional slices so i915 will always report a single slice (hardcoded slicemask = 0x1) which contains all of the platform’s subslices. I.e., the mask here does not reflect any of the newer hardware concepts such as “gslices” or “cslices” since userspace is capable of inferring those from the subslice mask.
The subslice mask for each slice with one bit per subslice telling whether a subslice is available. Starting with Gen12 we use the term “subslice” to refer to what the hardware documentation describes as “dual-subslices”. The availability of subslice Y in slice X can be queried with the following formula:
```
(data[subslice_offset + X * subslice_stride + Y / 8] >> (Y % 8)) & 1
```
The EU mask for each subslice in each slice, with one bit per EU telling whether an EU is available. The availability of EU Z in subslice Y in slice X can be queried with the following formula:
```
(data[eu_offset +
      (X * max_subslices + Y) * eu_stride +
      Z / 8
 ] >> (Z % 8)) & 1
```

Description

Describes slice/subslice/EU information queried by DRM_I915_QUERY_TOPOLOGY_INFO
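The three formulas given for the data member can be wrapped in small helpers. This is a sketch under stated assumptions: `struct topo_hdr` is a local type that copies only the header fields the formulas need, the helper names are hypothetical, and `data` is assumed to point at the data[] array returned by the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Header fields copied from struct drm_i915_query_topology_info. */
struct topo_hdr {
    uint16_t max_subslices;
    uint16_t subslice_offset;
    uint16_t subslice_stride;
    uint16_t eu_offset;
    uint16_t eu_stride;
};

/* Test one bit in a little-endian-by-byte bitmask. */
static bool mask_bit(const uint8_t *mask, unsigned int bit)
{
    return (mask[bit / 8] >> (bit % 8)) & 1;
}

/* (data[X / 8] >> (X % 8)) & 1 */
static bool slice_available(const uint8_t *data, unsigned int x)
{
    assert(data != NULL);
    return mask_bit(data, x);
}

/* (data[subslice_offset + X * subslice_stride + Y / 8] >> (Y % 8)) & 1 */
static bool subslice_available(const struct topo_hdr *h, const uint8_t *data,
                               unsigned int x, unsigned int y)
{
    assert(h != NULL && data != NULL);
    return mask_bit(data + h->subslice_offset + x * h->subslice_stride, y);
}

/* (data[eu_offset + (X * max_subslices + Y) * eu_stride + Z / 8] >> (Z % 8)) & 1 */
static bool eu_available(const struct topo_hdr *h, const uint8_t *data,
                         unsigned int x, unsigned int y, unsigned int z)
{
    assert(h != NULL && data != NULL);
    return mask_bit(data + h->eu_offset +
                    (x * h->max_subslices + y) * h->eu_stride, z);
}
```

Callers would typically loop X over max_slices, Y over max_subslices and Z over max_eus_per_subslice, skipping slices and subslices that are not available.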

Engine Discovery uAPI

Engine discovery uAPI is a way of enumerating physical engines present in a GPU associated with an open i915 DRM file descriptor. This supersedes the old way of using DRM_IOCTL_I915_GETPARAM and engine identifiers like I915_PARAM_HAS_BLT.

The need for this interface arose with Icelake and newer GPUs, which started to establish a pattern of having multiple engines of the same class, where not all instances were always completely functionally equivalent.

Entry point for this uapi is DRM_IOCTL_I915_QUERY with the DRM_I915_QUERY_ENGINE_INFO as the queried item id.

Example for getting the list of engines:

struct drm_i915_query_engine_info *info;
struct drm_i915_query_item item = {
        .query_id = DRM_I915_QUERY_ENGINE_INFO,
};
struct drm_i915_query query = {
        .num_items = 1,
        .items_ptr = (uintptr_t)&item,
};
int err, i;

// First query the size of the blob we need, this needs to be large
// enough to hold our array of engines. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
//
// Alternatively a large buffer can be allocated straightaway enabling
// querying in one pass, in which case item.length should contain the
// length of the provided buffer.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with info on all engines.
item.data_ptr = (uintptr_t)info;

err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

// We can now access each engine in the array
for (i = 0; i < info->num_engines; i++) {
        struct drm_i915_engine_info einfo = info->engines[i];
        __u16 class = einfo.engine.engine_class;
        __u16 instance = einfo.engine.engine_instance;
        ....
}

free(info);

Each of the enumerated engines, apart from being defined by its class and instance (see struct i915_engine_class_instance), also can have flags and capabilities defined as documented in i915_drm.h.

For instance video engines which support HEVC encoding will have the I915_VIDEO_CLASS_CAPABILITY_HEVC capability bit set.

Engine discovery only fully comes into its own when combined with the new way of addressing engines when submitting batch buffers using contexts with engine maps configured.

struct drm_i915_engine_info¶

Definition:

struct drm_i915_engine_info {
    struct i915_engine_class_instance engine;
    __u32 rsvd0;
    __u64 flags;
#define I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE           (1 << 0)
    __u64 capabilities;
#define I915_VIDEO_CLASS_CAPABILITY_HEVC                (1 << 0)
#define I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC     (1 << 1)
    __u16 logical_instance;
    __u16 rsvd1[3];
    __u64 rsvd2[3];
};

Members

engine: Engine class and instance.
rsvd0: Reserved field.
flags: Engine flags.
capabilities: Capabilities of this engine.
logical_instance: Logical instance of engine
rsvd1: Reserved fields.
rsvd2: Reserved fields.

Description

Describes one engine and its capabilities as known to the driver.
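As a sketch of how the flags and capabilities fields might be consumed, the helpers below count HEVC-capable video engines and read the logical instance only when the corresponding flag is set. The struct and macro names are local mirrors of the definitions above, the helper names are hypothetical, and the value 2 for the video engine class is an assumption about I915_ENGINE_CLASS_VIDEO. Treating logical_instance as valid only when I915_ENGINE_INFO_HAS_LOGICAL_INSTANCE is set is inferred from the flag name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the uAPI layouts; real code includes the i915 header. */
struct engine_class_instance {
    uint16_t engine_class;
    uint16_t engine_instance;
};

struct engine_info {
    struct engine_class_instance engine;
    uint32_t rsvd0;
    uint64_t flags;
    uint64_t capabilities;
    uint16_t logical_instance;
    uint16_t rsvd1[3];
    uint64_t rsvd2[3];
};

#define ENGINE_CLASS_VIDEO               2           /* assumed value */
#define ENGINE_INFO_HAS_LOGICAL_INSTANCE (1ull << 0)
#define VIDEO_CLASS_CAPABILITY_HEVC      (1ull << 0)

/* Capability bits are per class, so the class is checked first. */
static size_t count_hevc_engines(const struct engine_info *engines, size_t n)
{
    size_t i, count = 0;

    assert(engines != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (engines[i].engine.engine_class == ENGINE_CLASS_VIDEO &&
            (engines[i].capabilities & VIDEO_CLASS_CAPABILITY_HEVC))
            count++;
    }
    return count;
}

/* Returns the logical instance, or -1 when the flag is not set. */
static int engine_logical_instance(const struct engine_info *e)
{
    assert(e != NULL);
    if (!(e->flags & ENGINE_INFO_HAS_LOGICAL_INSTANCE))
        return -1;
    return e->logical_instance;
}
```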

struct drm_i915_query_engine_info¶

Definition:

struct drm_i915_query_engine_info {
    __u32 num_engines;
    __u32 rsvd[3];
    struct drm_i915_engine_info engines[];
};

Members

num_engines: Number of struct drm_i915_engine_info structs following.
rsvd: MBZ
engines: Marker for drm_i915_engine_info structures.

Description

Engine info query enumerates all engines known to the driver by filling in an array of struct drm_i915_engine_info structures.

struct drm_i915_query_perf_config¶

Definition:

struct drm_i915_query_perf_config {
    union {
        __u64 n_configs;
        __u64 config;
        char uuid[36];
    };
    __u32 flags;
    __u8 data[];
};

Members

{unnamed_union}

anonymous

n_configs

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 sets this field to the number of configurations available.

config

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID, i915 will use the value in this field as configuration identifier to decide what data to write into config_ptr.

uuid

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID, i915 will use the value in this field as configuration identifier to decide what data to write into config_ptr.

String formatted like “%08x-%04x-%04x-%04x-%012x”

flags

Unused for now. Must be cleared to zero.

data

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 will write an array of __u64 of configuration identifiers.

When drm_i915_query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID or DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID, i915 will write a struct drm_i915_perf_oa_config. If the following fields of struct drm_i915_perf_oa_config are not set to 0, i915 will write into the associated pointers the values submitted when the configuration was created:

drm_i915_perf_oa_config.n_mux_regs

drm_i915_perf_oa_config.n_boolean_regs

drm_i915_perf_oa_config.n_flex_regs

Description

Data written by the kernel with query DRM_I915_QUERY_PERF_CONFIG.

enum drm_i915_gem_memory_class¶: Supported memory classes

Constants

I915_MEMORY_CLASS_SYSTEM: System memory
I915_MEMORY_CLASS_DEVICE: Device local-memory

struct drm_i915_gem_memory_class_instance¶: Identify particular memory region

Definition:

struct drm_i915_gem_memory_class_instance {
    __u16 memory_class;
    __u16 memory_instance;
};

Members

memory_class: See enum drm_i915_gem_memory_class
memory_instance: Which instance

struct drm_i915_memory_region_info¶: Describes one region as known to the driver.

Definition:

struct drm_i915_memory_region_info {
    struct drm_i915_gem_memory_class_instance region;
    __u32 rsvd0;
    __u64 probed_size;
    __u64 unallocated_size;
    union {
        __u64 rsvd1[8];
        struct {
            __u64 probed_cpu_visible_size;
            __u64 unallocated_cpu_visible_size;
        };
    };
};

Members

region

The class:instance pair encoding

rsvd0

MBZ

probed_size

Memory probed by the driver

Note that it should not be possible to ever encounter a zero value here. Also note that no current region type will ever return -1 here, although for future region types this might be a possibility. The same applies to the other size fields.

unallocated_size

Estimate of memory remaining

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this (or if this is an older kernel) the value here will always equal the probed_size. Note this is only currently tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).

{unnamed_union}

anonymous

rsvd1

MBZ

{unnamed_struct}

anonymous

probed_cpu_visible_size

Memory probed by the driver that is CPU accessible.

This will always be <= probed_size, and the remainder (if there is any) will not be CPU accessible.

On systems without small BAR, the probed_size will always equal the probed_cpu_visible_size, since all of it will be CPU accessible.

Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_size).

Note that if the value returned here is zero, then this must be an old kernel which lacks the relevant small-bar uAPI support (including I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on such systems we should never actually end up with a small BAR configuration, assuming we are able to load the kernel module. Hence it should be safe to treat this the same as when probed_cpu_visible_size == probed_size.

unallocated_cpu_visible_size

Estimate of CPU visible memory remaining.

Note this is only tracked for I915_MEMORY_CLASS_DEVICE regions (for other types the value here will always equal the probed_cpu_visible_size).

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this the value here will always equal the probed_cpu_visible_size.

If this is an older kernel the value here will be zero, see also probed_cpu_visible_size.

Description

Note this is using both struct drm_i915_query_item and struct drm_i915_query. For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS at drm_i915_query_item.query_id.
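The rules given for probed_cpu_visible_size can be condensed into two helpers. This is a minimal sketch with hypothetical function names: a zero value is treated as "old kernel, everything is CPU visible", as the member description suggests, and a region counts as small BAR when only part of it is CPU visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Effective CPU-visible size of a region. A reported value of zero means
 * the kernel predates the small-BAR uAPI, in which case the whole region
 * is treated as CPU accessible.
 */
static uint64_t effective_cpu_visible_size(uint64_t probed_size,
                                           uint64_t probed_cpu_visible_size)
{
    assert(probed_cpu_visible_size <= probed_size);
    if (probed_cpu_visible_size == 0)
        return probed_size;
    return probed_cpu_visible_size;
}

/* True when only a subset of the region is CPU accessible (small BAR). */
static bool region_is_small_bar(uint64_t probed_size,
                                uint64_t probed_cpu_visible_size)
{
    return effective_cpu_visible_size(probed_size,
                                      probed_cpu_visible_size) < probed_size;
}
```

A userspace driver might use the second helper to decide whether I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS is worth setting at all.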

struct drm_i915_query_memory_regions¶

Definition:

struct drm_i915_query_memory_regions {
    __u32 num_regions;
    __u32 rsvd[3];
    struct drm_i915_memory_region_info regions[];
};

Members

num_regions: Number of supported regions
rsvd: MBZ
regions: Info about each supported region

Description

The region info query enumerates all regions known to the driver by filling in an array of struct drm_i915_memory_region_info structures.

Example for getting the list of supported regions:

struct drm_i915_query_memory_regions *info;
struct drm_i915_query_item item = {
        .query_id = DRM_I915_QUERY_MEMORY_REGIONS,
};
struct drm_i915_query query = {
        .num_items = 1,
        .items_ptr = (uintptr_t)&item,
};
int err, i;

// First query the size of the blob we need, this needs to be large
// enough to hold our array of regions. The kernel will fill out the
// item.length for us, which is the number of bytes we need.
err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

info = calloc(1, item.length);
// Now that we allocated the required number of bytes, we call the ioctl
// again, this time with the data_ptr pointing to our newly allocated
// blob, which the kernel can then populate with all the region info.
item.data_ptr = (uintptr_t)info;

err = ioctl(fd, DRM_IOCTL_I915_QUERY, &query);
if (err) ...

// We can now access each region in the array
for (i = 0; i < info->num_regions; i++) {
        struct drm_i915_memory_region_info mr = info->regions[i];
        __u16 class = mr.region.memory_class;
        __u16 instance = mr.region.memory_instance;

        ....
}

free(info);

struct drm_i915_query_guc_submission_version¶: query GuC submission interface version

Definition:

struct drm_i915_query_guc_submission_version {
    __u32 branch;
    __u32 major;
    __u32 minor;
    __u32 patch;
};

Members

branch: Firmware branch version.
major: Firmware major version.
minor: Firmware minor version.
patch: Firmware patch version.

GuC HWCONFIG blob uAPI

The GuC produces a blob with information about the current device. i915 reads this blob from GuC and makes it available via this uAPI.

The format and meaning of the blob content are documented in the Programmer’s Reference Manual.

struct drm_i915_gem_create_ext¶: Existing gem_create behaviour, with added extension support using struct i915_user_extension.

Definition:

struct drm_i915_gem_create_ext {
    __u64 size;
    __u32 handle;
#define I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS (1 << 0)
    __u32 flags;
#define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0
#define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1
#define I915_GEM_CREATE_EXT_SET_PAT 2
    __u64 extensions;
};

Members

size

Requested size for the object.

The (page-aligned) allocated size for the object will be returned.

On platforms like DG2/ATS the kernel will always use 64K or larger pages for I915_MEMORY_CLASS_DEVICE. The kernel also requires a minimum of 64K GTT alignment for such objects.

NOTE: Previously the ABI here required a minimum GTT alignment of 2M on DG2/ATS, due to how the hardware implemented 64K GTT page support, where we had the following complications:

1) The entire PDE (which covers a 2MB virtual address range), must contain only 64K PTEs, i.e. mixing 4K and 64K PTEs in the same PDE is forbidden by the hardware.

2) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM objects.

However on actual production HW this was completely changed to now allow setting a TLB hint at the PTE level (see PS64), which is a lot more flexible than the above. With this the 2M restriction was dropped where we now only require 64K.

handle

Returned handle for the object.

Object handles are nonzero.

flags

Optional flags.

Supported values:

I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS - Signal to the kernel that the object will need to be accessed via the CPU.

Only valid when placing objects in I915_MEMORY_CLASS_DEVICE, and only strictly required on configurations where some subset of the device memory is directly visible/mappable through the CPU (which we also call small BAR), like on some DG2+ systems. Note that this is quite undesirable, but due to various factors like the client CPU, BIOS etc it’s something we can expect to see in the wild. See drm_i915_memory_region_info.probed_cpu_visible_size for how to determine if this system applies.

Note that one of the placements MUST be I915_MEMORY_CLASS_SYSTEM, to ensure the kernel can always spill the allocation to system memory, if the object can’t be allocated in the mappable part of I915_MEMORY_CLASS_DEVICE.

Also note that since the kernel only supports flat-CCS on objects that can only be placed in I915_MEMORY_CLASS_DEVICE, we therefore don’t support I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS together with flat-CCS.

Without this hint, the kernel will assume that non-mappable I915_MEMORY_CLASS_DEVICE is preferred for this object. Note that the kernel can still migrate the object to the mappable part, as a last resort, if userspace ever CPU faults this object, but this might be expensive, and so ideally should be avoided.

On older kernels which lack the relevant small-bar uAPI support (see also drm_i915_memory_region_info.probed_cpu_visible_size), usage of the flag will result in an error, but it should NEVER be possible to end up with a small BAR configuration, assuming we can also successfully load the i915 kernel module. In such cases the entire I915_MEMORY_CLASS_DEVICE region will be CPU accessible, and as such there are zero restrictions on where the object can be placed.

extensions

The chain of extensions to apply to this object.

This will be useful in the future when we need to support several different extensions, and we need to apply more than one when creating the object. See struct i915_user_extension.

If we don’t supply any extensions then we get the same old gem_create behaviour.

For I915_GEM_CREATE_EXT_MEMORY_REGIONS usage see struct drm_i915_gem_create_ext_memory_regions.

For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see struct drm_i915_gem_create_ext_protected_content.

For I915_GEM_CREATE_EXT_SET_PAT usage see struct drm_i915_gem_create_ext_set_pat.

Description

Note that new buffer flags should be added here, at least for the stuff that is immutable. Previously we would have two ioctls, one to create the object with gem_create, and another to apply various parameters, however this creates some ambiguity for the params which are considered immutable. Also in general we’re phasing out the various SET/GET ioctls.
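Applying more than one extension means linking them through i915_user_extension.next_extension. The sketch below chains a memory-regions extension and a set-PAT extension. All types are local mirrors of the uAPI structs, the macro and function names are hypothetical, and whether the kernel accepts a given combination of extensions is not something this example verifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the uAPI layouts; real code includes the i915 header. */
struct user_extension {
    uint64_t next_extension;
    uint32_t name;
    uint32_t flags;
    uint32_t rsvd[4];
};

struct create_ext_memory_regions {
    struct user_extension base;
    uint32_t pad;
    uint32_t num_regions;
    uint64_t regions;
};

struct create_ext_set_pat {
    struct user_extension base;
    uint32_t pat_index;
    uint32_t rsvd;
};

struct gem_create_ext {
    uint64_t size;
    uint32_t handle;
    uint32_t flags;
    uint64_t extensions;
};

#define EXT_MEMORY_REGIONS 0 /* mirrors I915_GEM_CREATE_EXT_MEMORY_REGIONS */
#define EXT_SET_PAT        2 /* mirrors I915_GEM_CREATE_EXT_SET_PAT */

/* Link create -> regions -> pat; a zero next_extension ends the chain. */
static void chain_extensions(struct gem_create_ext *create,
                             struct create_ext_memory_regions *regions,
                             struct create_ext_set_pat *pat)
{
    assert(create != NULL && regions != NULL && pat != NULL);

    regions->base.name = EXT_MEMORY_REGIONS;
    regions->base.next_extension = (uint64_t)(uintptr_t)pat;

    pat->base.name = EXT_SET_PAT;
    pat->base.next_extension = 0;

    create->extensions = (uint64_t)(uintptr_t)regions;
}
```

All structures in the chain have to stay alive until the ioctl returns, since the kernel follows the user pointers while processing the call.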

struct drm_i915_gem_create_ext_memory_regions¶: The I915_GEM_CREATE_EXT_MEMORY_REGIONS extension.

Definition:

struct drm_i915_gem_create_ext_memory_regions {
    struct i915_user_extension base;
    __u32 pad;
    __u32 num_regions;
    __u64 regions;
};

Members

base

Extension link. See struct i915_user_extension.

pad

MBZ

num_regions

Number of elements in the regions array.

regions

The regions/placements array.

An array of struct drm_i915_gem_memory_class_instance.

Description

Set the object with the desired set of placements/regions in priority order. Each entry must be unique and supported by the device.

This is provided as an array of struct drm_i915_gem_memory_class_instance, or an equivalent layout of class:instance pair encodings. See struct drm_i915_query_memory_regions and DRM_I915_QUERY_MEMORY_REGIONS for how to query the supported regions for a device.

As an example, on discrete devices, if we wish to set the placement as device local-memory we can do something like:

struct drm_i915_gem_memory_class_instance region_lmem = {
        .memory_class = I915_MEMORY_CLASS_DEVICE,
        .memory_instance = 0,
};
struct drm_i915_gem_create_ext_memory_regions regions = {
        .base = { .name = I915_GEM_CREATE_EXT_MEMORY_REGIONS },
        .regions = (uintptr_t)&region_lmem,
        .num_regions = 1,
};
struct drm_i915_gem_create_ext create_ext = {
        .size = 16 * PAGE_SIZE,
        .extensions = (uintptr_t)&regions,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...

At which point we get the object handle in drm_i915_gem_create_ext.handle, along with the final object size in drm_i915_gem_create_ext.size, which should account for any rounding up, if required.

Note that userspace has no means of knowing the current backing region for objects where num_regions is larger than one. The kernel will only ensure that the priority order of the regions array is honoured, either when initially placing the object, or when moving memory around due to memory pressure.

On Flat-CCS capable HW, compression is supported for objects residing in I915_MEMORY_CLASS_DEVICE. If such a compressed object also lists another memory class in regions and is migrated (by i915, due to memory constraints) to a non-I915_MEMORY_CLASS_DEVICE region, i915 would need to decompress the content. However, i915 doesn’t have the information required to decompress objects that userspace compressed.

So i915 supports Flat-CCS only on objects which can reside exclusively in I915_MEMORY_CLASS_DEVICE regions.
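The placement rules scattered through this section (unique entries, a system-memory fallback for I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS, device-only placement for Flat-CCS) can be checked in userspace before calling the ioctl. This is a sketch only: the struct is a local mirror of struct drm_i915_gem_memory_class_instance, the function names are hypothetical, and the class values 0 and 1 are assumed to match enum drm_i915_gem_memory_class. The kernel remains the authority on what it accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mem_region {
    uint16_t memory_class;
    uint16_t memory_instance;
};

#define MEMORY_CLASS_SYSTEM 0 /* assumed value */
#define MEMORY_CLASS_DEVICE 1 /* assumed value */

/* Each class:instance pair may appear only once. */
static bool placements_unique(const struct mem_region *r, size_t n)
{
    size_t i, j;

    assert(r != NULL || n == 0);
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (r[i].memory_class == r[j].memory_class &&
                r[i].memory_instance == r[j].memory_instance)
                return false;
    return true;
}

static bool has_class(const struct mem_region *r, size_t n, uint16_t cls)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (r[i].memory_class == cls)
            return true;
    return false;
}

/* NEEDS_CPU_ACCESS: device placement plus a system-memory fallback. */
static bool cpu_access_flag_allowed(const struct mem_region *r, size_t n)
{
    return has_class(r, n, MEMORY_CLASS_DEVICE) &&
           has_class(r, n, MEMORY_CLASS_SYSTEM);
}

/* Flat-CCS: every placement must be device local-memory. */
static bool flat_ccs_possible(const struct mem_region *r, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (r[i].memory_class != MEMORY_CLASS_DEVICE)
            return false;
    return n > 0;
}
```

Note that the last two checks are mutually exclusive by construction, which matches the statement that the flag is not supported together with Flat-CCS.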

struct drm_i915_gem_create_ext_protected_content¶: The I915_GEM_CREATE_EXT_PROTECTED_CONTENT extension.

Definition:

struct drm_i915_gem_create_ext_protected_content {
    struct i915_user_extension base;
    __u32 flags;
};

Members

base: Extension link. See struct i915_user_extension.
flags: reserved for future usage, currently MBZ

Description

If this extension is provided, buffer contents are expected to be protected by PXP encryption and require decryption for scan out and processing. This is only possible on platforms that have PXP enabled; in all other scenarios using this extension will cause the ioctl to fail and return -ENODEV. The flags parameter is reserved for future expansion and must currently be set to zero.

The buffer contents are considered invalid after a PXP session teardown.

The encryption is guaranteed to be processed correctly only if the object is submitted with a context created using the I915_CONTEXT_PARAM_PROTECTED_CONTENT flag. This will also enable extra checks at submission time on the validity of the objects involved.

Below is an example of how to create a protected object:

struct drm_i915_gem_create_ext_protected_content protected_ext = {
        .base = { .name = I915_GEM_CREATE_EXT_PROTECTED_CONTENT },
        .flags = 0,
};
struct drm_i915_gem_create_ext create_ext = {
        .size = PAGE_SIZE,
        .extensions = (uintptr_t)&protected_ext,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...

struct drm_i915_gem_create_ext_set_pat¶: The I915_GEM_CREATE_EXT_SET_PAT extension.

Definition:

struct drm_i915_gem_create_ext_set_pat {
    struct i915_user_extension base;
    __u32 pat_index;
    __u32 rsvd;
};

Members

base: Extension link. See struct i915_user_extension.
pat_index: PAT index to be set. The PAT index is a bit field in the Page Table Entry that controls caching behaviors for GPU accesses. The definition of the PAT index is platform dependent and can be found in the hardware specifications.
rsvd: reserved for future use

Description

If this extension is provided, the specified caching policy (PAT index) is applied to the buffer object.

Below is an example of how to create an object with a specific caching policy:

struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
        .base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
        .pat_index = 0,
};
struct drm_i915_gem_create_ext create_ext = {
        .size = PAGE_SIZE,
        .extensions = (uintptr_t)&set_pat_ext,
};

int err = ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext);
if (err) ...

drm/nouveau uAPI¶

VM_BIND / EXEC uAPI¶

Nouveau’s VM_BIND / EXEC UAPI consists of three ioctls: DRM_NOUVEAU_VM_INIT, DRM_NOUVEAU_VM_BIND and DRM_NOUVEAU_EXEC.

To use the UAPI, a user client must first initialize the VA space using the DRM_NOUVEAU_VM_INIT ioctl, specifying which region of the VA space should be managed by the kernel and which by the UMD.

The DRM_NOUVEAU_VM_BIND ioctl provides clients an interface to manage the userspace-manageable portion of the VA space. It provides operations to map and unmap memory. Mappings may be flagged as sparse. Sparse mappings are not backed by a GEM object and the kernel will ignore GEM handles provided alongside a sparse mapping.

Userspace may request memory backed mappings either within or outside of the bounds (but not crossing those bounds) of a previously mapped sparse mapping. Subsequently requested memory backed mappings within a sparse mapping will take precedence over the corresponding range of the sparse mapping. If such memory backed mappings are unmapped the kernel will make sure that the corresponding sparse mapping will take their place again. Requests to unmap a sparse mapping that still contains memory backed mappings will result in those memory backed mappings being unmapped first.

Unmap requests are not bound to the range of existing mappings and can even overlap the bounds of sparse mappings. For such a request the kernel will make sure to unmap all memory backed mappings within the given range, splitting up memory backed mappings which are only partially contained within the given range. Unmap requests with the sparse flag set must match the range of a previously mapped sparse mapping exactly though.

While the kernel generally permits arbitrary sequences and ranges of memory backed mappings being mapped and unmapped, either within a single or multiple VM_BIND ioctl calls, there are some restrictions for sparse mappings.

The kernel does not permit userspace to:

unmap non-existent sparse mappings
unmap a sparse mapping and map a new sparse mapping overlapping the range of the previously unmapped sparse mapping within the same VM_BIND ioctl
unmap a sparse mapping and map new memory backed mappings overlapping the range of the previously unmapped sparse mapping within the same VM_BIND ioctl

When using the VM_BIND ioctl to request the kernel to map memory to a given virtual address in the GPU’s VA space there is no guarantee that the actual mappings are created in the GPU’s MMU. If the given memory is swapped out at the time the bind operation is executed, the kernel will stash the mapping details into its internal allocator and create the actual MMU mappings once the memory is swapped back in. While this is transparent for userspace, it is guaranteed that all the backing memory is swapped back in and all the memory mappings, as requested by userspace previously, are actually mapped once the DRM_NOUVEAU_EXEC ioctl is called to submit an exec job.

A VM_BIND job can be executed either synchronously or asynchronously. If executed asynchronously, userspace may provide a list of syncobjs this job will wait for and/or a list of syncobjs the kernel will signal once the VM_BIND job has finished execution. If executed synchronously, the ioctl will block until the bind job is finished. For synchronous jobs the kernel will not permit any syncobjs to be submitted.

To execute a push buffer the UAPI provides the DRM_NOUVEAU_EXEC ioctl. EXEC jobs are always executed asynchronously and, like VM_BIND jobs, provide the option to synchronize them with syncobjs.

Besides that, EXEC jobs can be scheduled for a specified channel to execute on.

Since VM_BIND jobs update the GPU’s VA space on job submit, EXEC jobs do have an up-to-date view of the VA space. However, the actual mappings might still be pending. Hence, EXEC jobs must have the fences of the VM_BIND jobs they depend on attached to them.
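The dependency between an asynchronous VM_BIND job and a later EXEC job can be sketched by letting the bind job signal a syncobj that the exec job waits on. The structs below are local mirrors of the uAPI layouts documented further down, and `link_bind_to_exec` is a hypothetical helper. The syncobj handle is assumed to have been created beforehand through the generic DRM syncobj interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the nouveau uAPI layouts, for illustration only. */
struct nv_sync {
    uint32_t flags;
    uint32_t handle;
    uint64_t timeline_value;
};

struct nv_vm_bind {
    uint32_t op_count, flags, wait_count, sig_count;
    uint64_t wait_ptr, sig_ptr, op_ptr;
};

struct nv_exec {
    uint32_t channel, push_count, wait_count, sig_count;
    uint64_t wait_ptr, sig_ptr, push_ptr;
};

#define NV_SYNC_SYNCOBJ      0x0 /* mirrors DRM_NOUVEAU_SYNC_SYNCOBJ */
#define NV_VM_BIND_RUN_ASYNC 0x1 /* mirrors DRM_NOUVEAU_VM_BIND_RUN_ASYNC */

/*
 * Make the bind job signal syncobj_handle and the exec job wait on it.
 * bind_sig and exec_wait are caller-owned storage for the two sync entries.
 */
static void link_bind_to_exec(struct nv_vm_bind *bind, struct nv_exec *exec,
                              struct nv_sync *bind_sig,
                              struct nv_sync *exec_wait,
                              uint32_t syncobj_handle)
{
    assert(bind != NULL && exec != NULL);
    assert(bind_sig != NULL && exec_wait != NULL);

    bind_sig->flags = NV_SYNC_SYNCOBJ;
    bind_sig->handle = syncobj_handle;
    bind_sig->timeline_value = 0;
    *exec_wait = *bind_sig;

    /* Syncobjs are only accepted for asynchronous bind jobs. */
    bind->flags |= NV_VM_BIND_RUN_ASYNC;
    bind->sig_count = 1;
    bind->sig_ptr = (uint64_t)(uintptr_t)bind_sig;

    exec->wait_count = 1;
    exec->wait_ptr = (uint64_t)(uintptr_t)exec_wait;
}
```

The bind ioctl would then be submitted first, followed by the exec ioctl, so that the exec job only runs once the mappings it relies on are in place.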

struct drm_nouveau_sync¶: sync object

Definition:

struct drm_nouveau_sync {
    __u32 flags;
#define DRM_NOUVEAU_SYNC_SYNCOBJ 0x0
#define DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ 0x1
#define DRM_NOUVEAU_SYNC_TYPE_MASK 0xf
    __u32 handle;
    __u64 timeline_value;
};

Members

flags

the flags for a sync object

The first 8 bits are used to determine the type of the sync object.

handle

the handle of the sync object

timeline_value

The timeline point of the sync object in case the syncobj is of type DRM_NOUVEAU_SYNC_TIMELINE_SYNCOBJ.

Description

This structure serves as synchronization mechanism for (potentially) asynchronous operations such as EXEC or VM_BIND.
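A short sketch of how the type encoded in flags might be set and read back. The struct is a local mirror of struct drm_nouveau_sync, the macros copy the values from the definition above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct drm_nouveau_sync, for illustration only. */
struct nv_sync {
    uint32_t flags;
    uint32_t handle;
    uint64_t timeline_value;
};

#define NV_SYNC_SYNCOBJ          0x0
#define NV_SYNC_TIMELINE_SYNCOBJ 0x1
#define NV_SYNC_TYPE_MASK        0xf

/* Build a sync entry referring to a point on a timeline syncobj. */
static struct nv_sync nv_timeline_sync(uint32_t handle, uint64_t point)
{
    struct nv_sync s;

    s.flags = NV_SYNC_TIMELINE_SYNCOBJ;
    s.handle = handle;
    s.timeline_value = point;
    return s;
}

/* Extract the sync object type from the flags word. */
static uint32_t nv_sync_type(const struct nv_sync *s)
{
    assert(s != 0);
    return s->flags & NV_SYNC_TYPE_MASK;
}

/* timeline_value is only meaningful for timeline syncobjs. */
static bool nv_sync_uses_timeline(const struct nv_sync *s)
{
    return nv_sync_type(s) == NV_SYNC_TIMELINE_SYNCOBJ;
}
```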

struct drm_nouveau_vm_init¶: GPU VA space init structure

Definition:

struct drm_nouveau_vm_init {
    __u64 kernel_managed_addr;
    __u64 kernel_managed_size;
};

Members

kernel_managed_addr: start address of the kernel managed VA space region
kernel_managed_size: size of the kernel managed VA space region in bytes

Description

Used to initialize the GPU’s VA space for a user client, telling the kernel which portion of the VA space is managed by the UMD and kernel respectively.

For the UMD to use the VM_BIND uAPI, this must be called before any BOs or channels are created; if called afterwards DRM_IOCTL_NOUVEAU_VM_INIT fails with -ENOSYS.

struct drm_nouveau_vm_bind_op¶: VM_BIND operation

Definition:

struct drm_nouveau_vm_bind_op {
    __u32 op;
#define DRM_NOUVEAU_VM_BIND_OP_MAP 0x0
#define DRM_NOUVEAU_VM_BIND_OP_UNMAP 0x1
    __u32 flags;
#define DRM_NOUVEAU_VM_BIND_SPARSE (1 << 8)
    __u32 handle;
    __u32 pad;
    __u64 addr;
    __u64 bo_offset;
    __u64 range;
};

Members

op

the operation type

Supported values:

DRM_NOUVEAU_VM_BIND_OP_MAP - Map a GEM object to the GPU’s VA space. Optionally, the DRM_NOUVEAU_VM_BIND_SPARSE flag can be passed to instruct the kernel to create sparse mappings for the given range.

DRM_NOUVEAU_VM_BIND_OP_UNMAP - Unmap an existing mapping in the GPU’s VA space. If the region the mapping is located in is a sparse region, new sparse mappings are created where the unmapped (memory backed) mapping was mapped previously. To remove a sparse region the DRM_NOUVEAU_VM_BIND_SPARSE flag must be set.

flags

the flags for a drm_nouveau_vm_bind_op

Supported values:

DRM_NOUVEAU_VM_BIND_SPARSE - Indicates that an allocated VA space region should be sparse.

handle

the handle of the DRM GEM object to map

pad

32 bit padding, should be 0

addr

the address the VA space region or (memory backed) mapping should be mapped to

bo_offset

the offset within the BO backing the mapping

range

the size of the requested mapping in bytes

Description

This structure represents a single VM_BIND operation. UMDs should pass an array of this structure via struct drm_nouveau_vm_bind’s op_ptr field.
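As a rough illustration of the map and unmap operations described above, the sketch below fills in one operation of each kind. It uses a local mirror of the structure (`struct example_vm_bind_op`) and locally named constants so that it compiles on its own; the `example_*` names are hypothetical, and real code should use the kernel's nouveau uAPI header instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct drm_nouveau_vm_bind_op as listed above, for
 * illustration only. */
struct example_vm_bind_op {
    uint32_t op;
    uint32_t flags;
    uint32_t handle;
    uint32_t pad;
    uint64_t addr;
    uint64_t bo_offset;
    uint64_t range;
};

/* Values copied from the definition above. */
#define EXAMPLE_VM_BIND_OP_MAP   0x0
#define EXAMPLE_VM_BIND_OP_UNMAP 0x1
#define EXAMPLE_VM_BIND_SPARSE   (1 << 8)

/* Map `range` bytes of GEM object `handle`, starting at `bo_offset`,
 * at GPU virtual address `addr`. */
static struct example_vm_bind_op
example_map_op(uint32_t handle, uint64_t addr, uint64_t bo_offset,
               uint64_t range)
{
    struct example_vm_bind_op op;

    memset(&op, 0, sizeof(op)); /* also keeps pad at 0 */
    op.op = EXAMPLE_VM_BIND_OP_MAP;
    op.handle = handle;
    op.addr = addr;
    op.bo_offset = bo_offset;
    op.range = range;
    return op;
}

/* Unmap [addr, addr + range). Pass sparse != 0 to remove a sparse region. */
static struct example_vm_bind_op
example_unmap_op(uint64_t addr, uint64_t range, int sparse)
{
    struct example_vm_bind_op op;

    memset(&op, 0, sizeof(op));
    op.op = EXAMPLE_VM_BIND_OP_UNMAP;
    op.flags = sparse ? EXAMPLE_VM_BIND_SPARSE : 0;
    op.addr = addr;
    op.range = range;
    return op;
}
```

An array of such operations is what op_ptr in struct drm_nouveau_vm_bind would point to.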

struct drm_nouveau_vm_bind¶: structure for DRM_IOCTL_NOUVEAU_VM_BIND

Definition:

struct drm_nouveau_vm_bind {
    __u32 op_count;
    __u32 flags;
#define DRM_NOUVEAU_VM_BIND_RUN_ASYNC 0x1
    __u32 wait_count;
    __u32 sig_count;
    __u64 wait_ptr;
    __u64 sig_ptr;
    __u64 op_ptr;
};

Members

op_count

the number of drm_nouveau_vm_bind_op

flags

the flags for a drm_nouveau_vm_bind ioctl

Supported values:

DRM_NOUVEAU_VM_BIND_RUN_ASYNC - Indicates that the given VM_BIND operation should be executed asynchronously by the kernel.

If this flag is not supplied the kernel executes the associated operations synchronously and doesn’t accept any drm_nouveau_sync objects.

wait_count

the number of wait drm_nouveau_syncs

sig_count

the number of drm_nouveau_syncs to signal when finished

wait_ptr

pointer to drm_nouveau_syncs to wait for

sig_ptr

pointer to drm_nouveau_syncs to signal when finished

op_ptr

pointer to the drm_nouveau_vm_bind_ops to execute
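The rule that syncs are only accepted together with DRM_NOUVEAU_VM_BIND_RUN_ASYNC can be checked in userspace before the ioctl is issued. The following is a minimal sketch under that reading of the text above; `struct example_vm_bind` is a local mirror and both helper functions are hypothetical, not part of the uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_nouveau_vm_bind (illustration only). */
struct example_vm_bind {
    uint32_t op_count;
    uint32_t flags;
    uint32_t wait_count;
    uint32_t sig_count;
    uint64_t wait_ptr;
    uint64_t sig_ptr;
    uint64_t op_ptr;
};

/* Value copied from the definition above. */
#define EXAMPLE_VM_BIND_RUN_ASYNC 0x1

/* Returns 1 if the sync counts are compatible with the flags: without
 * RUN_ASYNC, no wait or signal syncs may be passed. */
static int example_vm_bind_syncs_ok(const struct example_vm_bind *req)
{
    assert(req != NULL);
    if (req->flags & EXAMPLE_VM_BIND_RUN_ASYNC)
        return 1;
    return req->wait_count == 0 && req->sig_count == 0;
}

/* Describe a synchronous bind of `op_count` operations stored at `ops`. */
static struct example_vm_bind
example_sync_bind(const void *ops, uint32_t op_count)
{
    struct example_vm_bind req = {
        .op_count = op_count,
        .flags = 0, /* synchronous: no syncs allowed */
        .op_ptr = (uint64_t)(uintptr_t)ops,
    };

    return req;
}
```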

struct drm_nouveau_exec_push¶: EXEC push operation

Definition:

struct drm_nouveau_exec_push {
    __u64 va;
    __u32 va_len;
    __u32 flags;
#define DRM_NOUVEAU_EXEC_PUSH_NO_PREFETCH 0x1
};

Members

va: the virtual address of the push buffer mapping
va_len: the length of the push buffer mapping
flags: the flags for this push buffer mapping

Description

This structure represents a single EXEC push operation. UMDs should pass an array of this structure via struct drm_nouveau_exec’s push_ptr field.

struct drm_nouveau_exec¶: structure for DRM_IOCTL_NOUVEAU_EXEC

Definition:

struct drm_nouveau_exec {
    __u32 channel;
    __u32 push_count;
    __u32 wait_count;
    __u32 sig_count;
    __u64 wait_ptr;
    __u64 sig_ptr;
    __u64 push_ptr;
};

Members

channel: the channel to execute the push buffer in
push_count: the number of drm_nouveau_exec_push ops
wait_count: the number of wait drm_nouveau_syncs
sig_count: the number of drm_nouveau_syncs to signal when finished
wait_ptr: pointer to drm_nouveau_syncs to wait for
sig_ptr: pointer to drm_nouveau_syncs to signal when finished
push_ptr: pointer to drm_nouveau_exec_push ops
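To show how the push array and the EXEC structure relate, here is a small sketch that describes a submission without any syncs. The structures are local mirrors of the definitions above and `example_exec_init()` is a hypothetical helper; issuing the actual ioctl is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the two structures above (illustration only). */
struct example_exec_push {
    uint64_t va;
    uint32_t va_len;
    uint32_t flags;
};

struct example_exec {
    uint32_t channel;
    uint32_t push_count;
    uint32_t wait_count;
    uint32_t sig_count;
    uint64_t wait_ptr;
    uint64_t sig_ptr;
    uint64_t push_ptr;
};

/* Describe an EXEC of `count` pushes on `channel`, without any syncs. */
static struct example_exec
example_exec_init(uint32_t channel, const struct example_exec_push *pushes,
                  uint32_t count)
{
    struct example_exec req = {
        .channel = channel,
        .push_count = count,
        .push_ptr = (uint64_t)(uintptr_t)pushes,
    };

    /* A non-empty array needs a valid pointer. */
    assert(count == 0 || pushes != NULL);
    return req;
}
```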

drm/panthor uAPI¶

Introduction

This documentation describes the Panthor IOCTLs.

Just a few generic rules about the data passed to the Panthor IOCTLs:

Structures must be aligned on 64-bit/8-byte boundaries. If the object is not naturally aligned, a padding field must be added.
Fields must be explicitly aligned to their natural type alignment with pad[0..N] fields.
All padding fields will be checked by the driver to make sure they are zeroed.
Flags can be added, but not removed/replaced.
New fields can be added to the main structures (the structures directly passed to the ioctl). Those fields can be added at the end of the structure, or replace existing padding fields. Any new field being added must preserve the behavior that existed before those fields were added when a value of zero is passed.
New fields can be added to indirect objects (objects pointed to by the main structure), iff those objects are passed a size to reflect the size known by the userspace driver (see drm_panthor_obj_array::stride or drm_panthor_dev_query::size).
If the kernel driver is too old to know some fields, those will be ignored if zero, and otherwise rejected (and so will be zero on output).
If userspace is too old to know some fields, those will be zeroed (input) before the structure is parsed by the kernel driver.
Each new flag/field addition must come with a driver version update so the userspace driver doesn’t have to trial and error to know which flags are supported.
Structures should not contain unions, as this would defeat the extensibility of such structures.
IOCTLs can’t be removed or replaced. New IOCTL IDs should be placed at the end of the drm_panthor_ioctl_id enum.

MMIO regions exposed to userspace.

DRM_PANTHOR_USER_MMIO_OFFSET¶

File offset for all MMIO regions being exposed to userspace. Don’t use this value directly, use DRM_PANTHOR_USER_<name>_OFFSET values instead. pgoffset passed to mmap2() is an unsigned long, which forces us to use a different offset on 32-bit and 64-bit systems.

DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET¶

File offset for the LATEST_FLUSH_ID register. The userspace driver controls GPU cache flushing through CS instructions, but the flush reduction mechanism requires a flush_id. This flush_id could be queried with an ioctl, but Arm provides a well-isolated register page containing only this read-only register, so let’s expose this page through a static mmap offset and allow direct mapping of this MMIO region so we can avoid the user <-> kernel round-trip.

IOCTL IDs

enum drm_panthor_ioctl_id - IOCTL IDs

Place new ioctls at the end, don’t re-order, don’t replace or remove entries.

These IDs are not meant to be used directly. Use the DRM_IOCTL_PANTHOR_xxx definitions instead.

IOCTL arguments

struct drm_panthor_obj_array¶: Object array.

Definition:

struct drm_panthor_obj_array {
    __u32 stride;
    __u32 count;
    __u64 array;
};

Members

stride: Stride of object struct. Used for versioning.
count: Number of objects in the array.
array: User pointer to an array of objects.

Description

This object is used to pass an array of objects whose size is subject to changes in future versions of the driver. In order to support this mutability, we pass a stride describing the size of the object as known by userspace.

You shouldn’t fill drm_panthor_obj_array fields directly. You should instead use the DRM_PANTHOR_OBJ_ARRAY() macro that takes care of initializing the stride to the object size.

DRM_PANTHOR_OBJ_ARRAY¶

DRM_PANTHOR_OBJ_ARRAY (cnt, ptr)

Initialize a drm_panthor_obj_array field.

Parameters

cnt: Number of elements in the array.
ptr: Pointer to the array to pass to the kernel.

Description

Macro initializing a drm_panthor_obj_array based on the object size as known by userspace.
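The idea of deriving the stride from the element type can be sketched as follows. `EXAMPLE_OBJ_ARRAY()` is a hypothetical stand-in written for this illustration, not the header's definition of DRM_PANTHOR_OBJ_ARRAY(), and the two structures are local mirrors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_panthor_obj_array (illustration only). */
struct example_obj_array {
    uint32_t stride;
    uint32_t count;
    uint64_t array;
};

/* Hypothetical stand-in for DRM_PANTHOR_OBJ_ARRAY(): the stride is taken
 * from the element type, so it always matches what userspace was built
 * against. */
#define EXAMPLE_OBJ_ARRAY(cnt, ptr) \
    { .stride = sizeof((ptr)[0]), .count = (cnt), \
      .array = (uint64_t)(uintptr_t)(ptr) }

/* Local mirror of struct drm_panthor_sync_op, used as the element type. */
struct example_sync_op {
    uint32_t flags;
    uint32_t handle;
    uint64_t timeline_value;
};

/* Wrap an array of `n` sync operations. */
static struct example_obj_array
example_wrap_syncs(struct example_sync_op *syncs, uint32_t n)
{
    struct example_obj_array arr = EXAMPLE_OBJ_ARRAY(n, syncs);

    assert(n == 0 || syncs != NULL);
    return arr;
}
```

Because the stride travels with the array, a newer kernel can tell how much of each element an older userspace actually filled in.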

enum drm_panthor_sync_op_flags¶: Synchronization operation flags.

Constants

DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK: Synchronization handle type mask.
DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_SYNCOBJ: Synchronization object type.
DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ: Timeline synchronization object type.
DRM_PANTHOR_SYNC_OP_WAIT: Wait operation.
DRM_PANTHOR_SYNC_OP_SIGNAL: Signal operation.

struct drm_panthor_sync_op¶: Synchronization operation.

Definition:

struct drm_panthor_sync_op {
    __u32 flags;
    __u32 handle;
    __u64 timeline_value;
};

Members

flags: Synchronization operation flags. Combination of DRM_PANTHOR_SYNC_OP values.
handle: Sync handle.
timeline_value: MBZ if (flags & DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_MASK) != DRM_PANTHOR_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ.
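The timeline_value rule above lends itself to a small userspace check. In this sketch the flag values are assumptions chosen for illustration (the text above does not give the numeric encoding), so take the real values from the panthor uAPI header; the `example_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_panthor_sync_op (illustration only). */
struct example_sync_op {
    uint32_t flags;
    uint32_t handle;
    uint64_t timeline_value;
};

/* ASSUMED flag encoding, for illustration only. */
#define EXAMPLE_SYNC_OP_HANDLE_TYPE_MASK             0xffu
#define EXAMPLE_SYNC_OP_HANDLE_TYPE_SYNCOBJ          0u
#define EXAMPLE_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ 1u
#define EXAMPLE_SYNC_OP_WAIT                         (0u << 31)
#define EXAMPLE_SYNC_OP_SIGNAL                       (1u << 31)

/* timeline_value must be zero unless the handle is a timeline syncobj. */
static int example_sync_op_valid(const struct example_sync_op *op)
{
    assert(op != NULL);
    if ((op->flags & EXAMPLE_SYNC_OP_HANDLE_TYPE_MASK) !=
        EXAMPLE_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ)
        return op->timeline_value == 0;
    return 1;
}

/* Wait for `point` on a timeline syncobj. */
static struct example_sync_op
example_wait_timeline(uint32_t handle, uint64_t point)
{
    struct example_sync_op op = {
        .flags = EXAMPLE_SYNC_OP_HANDLE_TYPE_TIMELINE_SYNCOBJ |
                 EXAMPLE_SYNC_OP_WAIT,
        .handle = handle,
        .timeline_value = point,
    };

    return op;
}

/* Signal a binary syncobj. */
static struct example_sync_op example_signal_syncobj(uint32_t handle)
{
    struct example_sync_op op = {
        .flags = EXAMPLE_SYNC_OP_HANDLE_TYPE_SYNCOBJ |
                 EXAMPLE_SYNC_OP_SIGNAL,
        .handle = handle,
    };

    return op;
}
```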

enum drm_panthor_dev_query_type¶: Query type

Constants

DRM_PANTHOR_DEV_QUERY_GPU_INFO: Query GPU information.
DRM_PANTHOR_DEV_QUERY_CSIF_INFO: Query command-stream interface information.
DRM_PANTHOR_DEV_QUERY_TIMESTAMP_INFO: Query timestamp information.
DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO: Query allowed group priorities information.

Description

Place new types at the end, don’t re-order, don’t remove or replace.

struct drm_panthor_gpu_info¶: GPU information

Definition:

struct drm_panthor_gpu_info {
    __u32 gpu_id;
#define DRM_PANTHOR_ARCH_MAJOR(x)               ((x) >> 28)
#define DRM_PANTHOR_ARCH_MINOR(x)               (((x) >> 24) & 0xf)
#define DRM_PANTHOR_ARCH_REV(x)                 (((x) >> 20) & 0xf)
#define DRM_PANTHOR_PRODUCT_MAJOR(x)            (((x) >> 16) & 0xf)
#define DRM_PANTHOR_VERSION_MAJOR(x)            (((x) >> 12) & 0xf)
#define DRM_PANTHOR_VERSION_MINOR(x)            (((x) >> 4) & 0xff)
#define DRM_PANTHOR_VERSION_STATUS(x)           ((x) & 0xf)
    __u32 gpu_rev;
    __u32 csf_id;
#define DRM_PANTHOR_CSHW_MAJOR(x)               (((x) >> 26) & 0x3f)
#define DRM_PANTHOR_CSHW_MINOR(x)               (((x) >> 20) & 0x3f)
#define DRM_PANTHOR_CSHW_REV(x)                 (((x) >> 16) & 0xf)
#define DRM_PANTHOR_MCU_MAJOR(x)                (((x) >> 10) & 0x3f)
#define DRM_PANTHOR_MCU_MINOR(x)                (((x) >> 4) & 0x3f)
#define DRM_PANTHOR_MCU_REV(x)                  ((x) & 0xf)
    __u32 l2_features;
    __u32 tiler_features;
    __u32 mem_features;
    __u32 mmu_features;
#define DRM_PANTHOR_MMU_VA_BITS(x)              ((x) & 0xff)
    __u32 thread_features;
    __u32 max_threads;
    __u32 thread_max_workgroup_size;
    __u32 thread_max_barrier_size;
    __u32 coherency_features;
    __u32 texture_features[4];
    __u32 as_present;
    __u32 pad0;
    __u64 shader_present;
    __u64 l2_present;
    __u64 tiler_present;
    __u32 core_features;
    __u32 pad;
    __u64 gpu_features;
};

Members

gpu_id: GPU ID.
gpu_rev: GPU revision.
csf_id: Command stream frontend ID.
l2_features: L2-cache features.
tiler_features: Tiler features.
mem_features: Memory features.
mmu_features: MMU features.
thread_features: Thread features.
max_threads: Maximum number of threads.
thread_max_workgroup_size: Maximum workgroup size.
thread_max_barrier_size: Maximum number of threads that can wait simultaneously on a barrier.
coherency_features: Coherency features.
texture_features: Texture features.
as_present: Bitmask encoding the number of address spaces exposed by the MMU.
pad0: MBZ.
shader_present: Bitmask encoding the shader cores exposed by the GPU.
l2_present: Bitmask encoding the L2 caches exposed by the GPU.
tiler_present: Bitmask encoding the tiler units exposed by the GPU.
core_features: Used to discriminate core variants when they exist.
pad: MBZ.
gpu_features: Bitmask describing supported GPU-wide features

Description

Structure grouping all queryable information relating to the GPU.
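As an illustration of how the gpu_id decoding macros from the definition above might be used, the sketch below splits a gpu_id value into its architecture fields. The macros are re-declared locally (guarded by #ifndef) so the example compiles without the uAPI header; `struct example_arch`, `example_decode_arch()` and the sample value in the usage note are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the macros shown in the definition above. */
#ifndef DRM_PANTHOR_ARCH_MAJOR
#define DRM_PANTHOR_ARCH_MAJOR(x)    ((x) >> 28)
#define DRM_PANTHOR_ARCH_MINOR(x)    (((x) >> 24) & 0xf)
#define DRM_PANTHOR_ARCH_REV(x)      (((x) >> 20) & 0xf)
#define DRM_PANTHOR_PRODUCT_MAJOR(x) (((x) >> 16) & 0xf)
#endif

/* Hypothetical container for the decoded fields. */
struct example_arch {
    uint32_t major;
    uint32_t minor;
    uint32_t rev;
    uint32_t product_major;
};

/* Split a gpu_id value into its architecture and product fields. */
static void example_decode_arch(uint32_t gpu_id, struct example_arch *out)
{
    assert(out != NULL);
    out->major = DRM_PANTHOR_ARCH_MAJOR(gpu_id);
    out->minor = DRM_PANTHOR_ARCH_MINOR(gpu_id);
    out->rev = DRM_PANTHOR_ARCH_REV(gpu_id);
    out->product_major = DRM_PANTHOR_PRODUCT_MAJOR(gpu_id);
}
```

For the made-up value 0xa8670000 this yields major 10, minor 8, rev 6 and product major 7.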

struct drm_panthor_csif_info¶: Command stream interface information

Definition:

struct drm_panthor_csif_info {
    __u32 csg_slot_count;
    __u32 cs_slot_count;
    __u32 cs_reg_count;
    __u32 scoreboard_slot_count;
    __u32 unpreserved_cs_reg_count;
    __u32 pad;
};

Members

csg_slot_count

Number of command stream group slots exposed by the firmware.

cs_slot_count

Number of command stream slots per group.

cs_reg_count

Number of command stream registers.

scoreboard_slot_count

Number of scoreboard slots.

unpreserved_cs_reg_count

Number of command stream registers reserved by the kernel driver to call a userspace command stream.

All registers can be used by a userspace command stream, but the [cs_reg_count - unpreserved_cs_reg_count .. cs_reg_count] registers are used by the kernel when DRM_IOCTL_PANTHOR_GROUP_SUBMIT is called.

pad

Padding field, set to zero.

Description

Structure grouping all queryable information relating to the command stream interface.

struct drm_panthor_timestamp_info¶: Timestamp information

Definition:

struct drm_panthor_timestamp_info {
    __u64 timestamp_frequency;
    __u64 current_timestamp;
    __u64 timestamp_offset;
};

Members

timestamp_frequency: The frequency of the timestamp timer or 0 if unknown.
current_timestamp: The current timestamp.
timestamp_offset: The offset of the timestamp timer.

Description

Structure grouping all queryable information relating to the GPU timestamp.

struct drm_panthor_group_priorities_info¶: Group priorities information

Definition:

struct drm_panthor_group_priorities_info {
    __u8 allowed_mask;
    __u8 pad[3];
};

Members

allowed_mask

Bitmask of the allowed group priorities.

Each bit represents a variant of the enum drm_panthor_group_priority.

pad

Padding fields, MBZ.

Description

Structure grouping all queryable information relating to the allowed group priorities.

struct drm_panthor_dev_query¶: Arguments passed to DRM_IOCTL_PANTHOR_DEV_QUERY

Definition:

struct drm_panthor_dev_query {
    __u32 type;
    __u32 size;
    __u64 pointer;
};

Members

type

the query type (see drm_panthor_dev_query_type).

size

size of the type being queried.

If pointer is NULL, size is updated by the driver to provide the output structure size. If pointer is not NULL, the driver will only copy min(size, actual_structure_size) bytes to the pointer, and update the size accordingly. This allows us to extend query types without breaking userspace.

pointer

user pointer to a query type struct.

Pointer can be NULL, in which case, nothing is copied, but the actual structure size is returned. If not NULL, it must point to a location that’s large enough to hold size bytes.
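The size negotiation described for size and pointer can be modelled in plain userspace code. The function below is only a model of the documented behaviour (copy min(size, actual_structure_size) bytes and report the resulting size); it is not the driver's implementation, and `example_query_copy()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the documented DEV_QUERY size handling.
 *
 * dst/user_size stand for drm_panthor_dev_query::pointer and ::size,
 * src/actual_size for the structure the driver knows about.
 * Returns the value that size would hold after the call. */
static uint32_t example_query_copy(void *dst, uint32_t user_size,
                                   const void *src, uint32_t actual_size)
{
    uint32_t n;

    assert(src != NULL);

    /* NULL pointer: nothing is copied, only the real size is reported. */
    if (dst == NULL)
        return actual_size;

    /* Otherwise copy only what both sides know about. */
    n = user_size < actual_size ? user_size : actual_size;
    memcpy(dst, src, n);
    return n;
}
```

A typical caller would first query with a NULL pointer to learn the size, then allocate a buffer and query again. An older userspace that passes a smaller size simply receives a truncated structure.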

struct drm_panthor_vm_create¶: Arguments passed to DRM_IOCTL_PANTHOR_VM_CREATE

Definition:

struct drm_panthor_vm_create {
    __u32 flags;
    __u32 id;
    __u64 user_va_range;
};

Members

flags

VM flags, MBZ.

id

Returned VM ID.

user_va_range

Size of the VA space reserved for user objects.

The kernel will pick the remaining space to map kernel-only objects to the VM (heap chunks, heap context, ring buffers, kernel synchronization objects, ...). If the space left for kernel objects is too small, kernel object allocation will fail further down the road. One can use drm_panthor_gpu_info::mmu_features to extract the total virtual address range, and choose a user_va_range that leaves some space to the kernel.

If user_va_range is zero, the kernel will pick a sensible value based on TASK_SIZE and the virtual range supported by the GPU MMU (the kernel/user split should leave enough VA space for userspace processes to support SVM, while still allowing the kernel to map some amount of kernel objects in the kernel VA range). The value chosen by the driver will be returned in user_va_range.

User VA space always starts at 0x0, kernel VA space is always placed after the user VA range.
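One possible way to derive user_va_range from drm_panthor_gpu_info::mmu_features is sketched below. The amount of VA space to leave to the kernel (`kernel_reserve`) is an arbitrary input of this example rather than a value the text above prescribes, and `example_pick_user_va_range()` is a hypothetical helper. It returns zero whenever it cannot compute a sensible split, which, as described above, lets the kernel pick.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the macro shown in struct drm_panthor_gpu_info. */
#ifndef DRM_PANTHOR_MMU_VA_BITS
#define DRM_PANTHOR_MMU_VA_BITS(x) ((x) & 0xff)
#endif

/* Pick a user VA range that leaves `kernel_reserve` bytes at the top of
 * the GPU VA space for kernel objects. Returns 0 (meaning "let the kernel
 * choose") if the inputs do not allow such a split. */
static uint64_t example_pick_user_va_range(uint32_t mmu_features,
                                           uint64_t kernel_reserve)
{
    uint32_t va_bits = DRM_PANTHOR_MMU_VA_BITS(mmu_features);
    uint64_t total;

    /* Shifting by 64 or more is undefined, so bail out early. */
    if (va_bits == 0 || va_bits >= 64)
        return 0;

    total = 1ull << va_bits;
    if (kernel_reserve >= total)
        return 0;

    return total - kernel_reserve;
}
```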

struct drm_panthor_vm_destroy¶: Arguments passed to DRM_IOCTL_PANTHOR_VM_DESTROY

Definition:

struct drm_panthor_vm_destroy {
    __u32 id;
    __u32 pad;
};

Members

id: ID of the VM to destroy.
pad: MBZ.

enum drm_panthor_vm_bind_op_flags¶: VM bind operation flags

Constants

DRM_PANTHOR_VM_BIND_OP_MAP_READONLY

Map the memory read-only.

Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.

DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC

Map the memory not-executable.

Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.

DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED

Map the memory uncached.

Only valid with DRM_PANTHOR_VM_BIND_OP_TYPE_MAP.

DRM_PANTHOR_VM_BIND_OP_TYPE_MASK

Mask used to determine the type of operation.

DRM_PANTHOR_VM_BIND_OP_TYPE_MAP

Map operation.

DRM_PANTHOR_VM_BIND_OP_TYPE_UNMAP

Unmap operation.

DRM_PANTHOR_VM_BIND_OP_TYPE_SYNC_ONLY

No VM operation.

Just serves as a synchronization point on a VM queue.

Only valid if DRM_PANTHOR_VM_BIND_ASYNC is set in drm_panthor_vm_bind::flags, and drm_panthor_vm_bind_op::syncs contains at least one element.

struct drm_panthor_vm_bind_op¶: VM bind operation

Definition:

struct drm_panthor_vm_bind_op {
    __u32 flags;
    __u32 bo_handle;
    __u64 bo_offset;
    __u64 va;
    __u64 size;
    struct drm_panthor_obj_array syncs;
};

Members

flags

Combination of drm_panthor_vm_bind_op_flags flags.

bo_handle

Handle of the buffer object to map. MBZ for unmap or sync-only operations.

bo_offset

Buffer object offset. MBZ for unmap or sync-only operations.

va

Virtual address to map/unmap. MBZ for sync-only operations.

size

Size to map/unmap. MBZ for sync-only operations.

syncs

Array of struct drm_panthor_sync_op synchronization operations.

This array must be empty if DRM_PANTHOR_VM_BIND_ASYNC is not set on the drm_panthor_vm_bind object containing this VM bind operation.

This array shall not be empty for sync-only operations.
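The per-operation rules above (which fields must be zero for each operation type, and when syncs may be present) can be gathered into one userspace pre-check. This is a sketch under stated assumptions: the numeric flag values are guesses made for illustration and must be taken from the panthor uAPI header in real code, and the structures and `example_bind_op_valid()` are local, hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors (illustration only). */
struct example_obj_array {
    uint32_t stride;
    uint32_t count;
    uint64_t array;
};

struct example_vm_bind_op {
    uint32_t flags;
    uint32_t bo_handle;
    uint64_t bo_offset;
    uint64_t va;
    uint64_t size;
    struct example_obj_array syncs;
};

/* ASSUMED flag encoding, for illustration only. */
#define EXAMPLE_OP_MAP_READONLY   (1u << 0)
#define EXAMPLE_OP_MAP_NOEXEC     (1u << 1)
#define EXAMPLE_OP_MAP_UNCACHED   (1u << 2)
#define EXAMPLE_OP_MAP_FLAGS      (EXAMPLE_OP_MAP_READONLY | \
                                   EXAMPLE_OP_MAP_NOEXEC | \
                                   EXAMPLE_OP_MAP_UNCACHED)
#define EXAMPLE_OP_TYPE_MASK      (0xfu << 28)
#define EXAMPLE_OP_TYPE_MAP       (0u << 28)
#define EXAMPLE_OP_TYPE_UNMAP     (1u << 28)
#define EXAMPLE_OP_TYPE_SYNC_ONLY (2u << 28)

/* `async` is nonzero if DRM_PANTHOR_VM_BIND_ASYNC is set on the request
 * that contains this operation. Returns 1 if the documented rules hold. */
static int example_bind_op_valid(const struct example_vm_bind_op *op,
                                 int async)
{
    uint32_t type;

    assert(op != NULL);
    type = op->flags & EXAMPLE_OP_TYPE_MASK;

    /* Syncs are only accepted on asynchronous requests. */
    if (!async && op->syncs.count != 0)
        return 0;

    /* The MAP_* attribute flags are only valid on map operations. */
    if (type != EXAMPLE_OP_TYPE_MAP && (op->flags & EXAMPLE_OP_MAP_FLAGS))
        return 0;

    switch (type) {
    case EXAMPLE_OP_TYPE_MAP:
        return 1;
    case EXAMPLE_OP_TYPE_UNMAP:
        /* bo_handle and bo_offset are MBZ for unmap. */
        return op->bo_handle == 0 && op->bo_offset == 0;
    case EXAMPLE_OP_TYPE_SYNC_ONLY:
        /* Everything but syncs is MBZ, and syncs must not be empty. */
        return async && op->syncs.count != 0 &&
               op->bo_handle == 0 && op->bo_offset == 0 &&
               op->va == 0 && op->size == 0;
    default:
        return 0;
    }
}
```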

enum drm_panthor_vm_bind_flags¶: VM bind flags

Constants

DRM_PANTHOR_VM_BIND_ASYNC: VM bind operations are queued to the VM queue instead of being executed synchronously.

struct drm_panthor_vm_bind¶: Arguments passed to DRM_IOCTL_PANTHOR_VM_BIND

Definition:

struct drm_panthor_vm_bind {
    __u32 vm_id;
    __u32 flags;
    struct drm_panthor_obj_array ops;
};

Members

vm_id: VM targeted by the bind request.
flags: Combination of drm_panthor_vm_bind_flags flags.
ops: Array of struct drm_panthor_vm_bind_op bind operations.

enum drm_panthor_vm_state¶: VM states.

Constants

DRM_PANTHOR_VM_STATE_USABLE

VM is usable.

New VM operations will be accepted on this VM.

DRM_PANTHOR_VM_STATE_UNUSABLE

VM is unusable.

Something put the VM in an unusable state (like an asynchronous VM_BIND request failing for any reason).

Once the VM is in this state, all new MAP operations will be rejected, and any GPU job targeting this VM will fail. UNMAP operations are still accepted.

The only way to recover from an unusable VM is to create a new VM, and destroy the old one.

struct drm_panthor_vm_get_state¶: Get VM state.

Definition:

struct drm_panthor_vm_get_state {
    __u32 vm_id;
    __u32 state;
};

Members

vm_id

VM targeted by the get_state request.

state

state returned by the driver.

Must be one of the enum drm_panthor_vm_state values.

enum drm_panthor_bo_flags¶: Buffer object flags, passed at creation time.

Constants

DRM_PANTHOR_BO_NO_MMAP: The buffer object will never be CPU-mapped in userspace.

struct drm_panthor_bo_create¶: Arguments passed to DRM_IOCTL_PANTHOR_BO_CREATE.

Definition:

struct drm_panthor_bo_create {
    __u64 size;
    __u32 flags;
    __u32 exclusive_vm_id;
    __u32 handle;
    __u32 pad;
};

Members

size

Requested size for the object

The (page-aligned) allocated size for the object will be returned.

flags

Flags. Must be a combination of drm_panthor_bo_flags flags.

exclusive_vm_id

Exclusive VM this buffer object will be mapped to.

If not zero, the field must refer to a valid VM ID, and implies that:

the buffer object will only ever be bound to that VM
the buffer object cannot be exported as a PRIME fd

handle

Returned handle for the object.

Object handles are nonzero.

pad

MBZ.

struct drm_panthor_bo_mmap_offset¶: Arguments passed to DRM_IOCTL_PANTHOR_BO_MMAP_OFFSET.

Definition:

struct drm_panthor_bo_mmap_offset {
    __u32 handle;
    __u32 pad;
    __u64 offset;
};

Members

handle: Handle of the object we want an mmap offset for.
pad: MBZ.
offset: The fake offset to use for subsequent mmap calls.

struct drm_panthor_queue_create¶: Queue creation arguments.

Definition:

struct drm_panthor_queue_create {
    __u8 priority;
    __u8 pad[3];
    __u32 ringbuf_size;
};

Members

priority: Defines the priority of queues inside a group. Goes from 0 to 15, 15 being the highest priority.
pad: Padding fields, MBZ.
ringbuf_size: Size of the ring buffer to allocate to this queue.

enum drm_panthor_group_priority¶: Scheduling group priority

Constants

PANTHOR_GROUP_PRIORITY_LOW

Low priority group.

PANTHOR_GROUP_PRIORITY_MEDIUM

Medium priority group.

PANTHOR_GROUP_PRIORITY_HIGH

High priority group.

Requires CAP_SYS_NICE or DRM_MASTER.

PANTHOR_GROUP_PRIORITY_REALTIME

Realtime priority group.

Requires CAP_SYS_NICE or DRM_MASTER.

struct drm_panthor_group_create¶: Arguments passed to DRM_IOCTL_PANTHOR_GROUP_CREATE

Definition:

struct drm_panthor_group_create {
    struct drm_panthor_obj_array queues;
    __u8 max_compute_cores;
    __u8 max_fragment_cores;
    __u8 max_tiler_cores;
    __u8 priority;
    __u32 pad;
    __u64 compute_core_mask;
    __u64 fragment_core_mask;
    __u64 tiler_core_mask;
    __u32 vm_id;
    __u32 group_handle;
};

Members

queues

Array of drm_panthor_queue_create elements.

max_compute_cores

Maximum number of cores that can be used by compute jobs across CS queues bound to this group.

Must be less than or equal to the number of bits set in compute_core_mask.

max_fragment_cores

Maximum number of cores that can be used by fragment jobs across CS queues bound to this group.

Must be less than or equal to the number of bits set in fragment_core_mask.

max_tiler_cores

Maximum number of tilers that can be used by tiler jobs across CS queues bound to this group.

Must be less than or equal to the number of bits set in tiler_core_mask.

priority

Group priority (see enum drm_panthor_group_priority).

pad

Padding field, MBZ.

compute_core_mask

Mask encoding cores that can be used for compute jobs.

This field must have at least max_compute_cores bits set.

The bits set here should also be set in drm_panthor_gpu_info::shader_present.

fragment_core_mask

Mask encoding cores that can be used for fragment jobs.

This field must have at least max_fragment_cores bits set.

The bits set here should also be set in drm_panthor_gpu_info::shader_present.

tiler_core_mask

Mask encoding cores that can be used for tiler jobs.

This field must have at least max_tiler_cores bits set.

The bits set here should also be set in drm_panthor_gpu_info::tiler_present.

vm_id

VM ID to bind this group to.

All submission to queues bound to this group will use this VM.

group_handle

Returned group handle. Passed back when submitting jobs or destroying a group.
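The relationship between the max_*_cores fields, the *_core_mask fields and the *_present bitmasks from drm_panthor_gpu_info can be expressed as a short check. This is a sketch with hypothetical helper names; it treats the "should also be set" wording above as a hard requirement, which is a choice made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count (number of bits set). */
static unsigned int example_popcount64(uint64_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Check one (max_cores, core_mask) pair against the matching *_present
 * bitmask: the mask must be a subset of `present` and must have at least
 * `max_cores` bits set. */
static int example_core_cfg_valid(uint8_t max_cores, uint64_t core_mask,
                                  uint64_t present)
{
    if (core_mask & ~present)
        return 0;
    return max_cores <= example_popcount64(core_mask);
}
```

The same check would be applied three times: compute and fragment masks against shader_present, and the tiler mask against tiler_present.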

struct drm_panthor_group_destroy¶: Arguments passed to DRM_IOCTL_PANTHOR_GROUP_DESTROY

Definition:

struct drm_panthor_group_destroy {
    __u32 group_handle;
    __u32 pad;
};

Members

group_handle: Group to destroy
pad: Padding field, MBZ.

struct drm_panthor_queue_submit¶: Job submission arguments.

Definition:

struct drm_panthor_queue_submit {
    __u32 queue_index;
    __u32 stream_size;
    __u64 stream_addr;
    __u32 latest_flush;
    __u32 pad;
    struct drm_panthor_obj_array syncs;
};

Members

queue_index

Index of the queue inside a group.

stream_size

Size of the command stream to execute.

Must be 64-bit/8-byte aligned (the size of a CS instruction).

Can be zero if stream_addr is zero too.

When the stream size is zero, the queue submit serves as a synchronization point.

stream_addr

GPU address of the command stream to execute.

Must be 64-byte aligned.

Can be zero if stream_size is zero too.

latest_flush

FLUSH_ID read at the time the stream was built.

This allows cache flush elimination for the automatic flush+invalidate(all) done at submission time, which is needed to ensure the GPU doesn’t get garbage when reading the indirect command stream buffers. If you want the cache flush to happen unconditionally, pass a zero here.

Ignored when stream_size is zero.

pad

MBZ.

syncs

Array of struct drm_panthor_sync_op sync operations.

Description

This describes the userspace command stream to call from the kernel command stream ring-buffer. Queue submission is always part of a group submission, taking one or more jobs to submit to the underlying queues.
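The alignment and zero-pairing rules for stream_size and stream_addr can be checked before submission. The sketch below uses local mirror structures and a hypothetical `example_queue_submit_valid()` helper; it only encodes the constraints stated in the member descriptions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors (illustration only). */
struct example_obj_array {
    uint32_t stride;
    uint32_t count;
    uint64_t array;
};

struct example_queue_submit {
    uint32_t queue_index;
    uint32_t stream_size;
    uint64_t stream_addr;
    uint32_t latest_flush;
    uint32_t pad;
    struct example_obj_array syncs;
};

/* Returns 1 if the documented constraints hold. */
static int example_queue_submit_valid(const struct example_queue_submit *qs)
{
    assert(qs != NULL);

    if (qs->pad != 0)
        return 0;
    if (qs->stream_size % 8 != 0)   /* one CS instruction is 8 bytes */
        return 0;
    if (qs->stream_addr % 64 != 0)  /* stream must be 64-byte aligned */
        return 0;

    /* Size and address are either both zero (pure synchronization
     * point) or both nonzero. */
    return (qs->stream_size == 0) == (qs->stream_addr == 0);
}
```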

struct drm_panthor_group_submit¶: Arguments passed to DRM_IOCTL_PANTHOR_GROUP_SUBMIT

Definition:

struct drm_panthor_group_submit {
    __u32 group_handle;
    __u32 pad;
    struct drm_panthor_obj_array queue_submits;
};

Members

group_handle: Handle of the group to queue jobs to.
pad: MBZ.
queue_submits: Array of drm_panthor_queue_submit objects.

enum drm_panthor_group_state_flags¶: Group state flags

Constants

DRM_PANTHOR_GROUP_STATE_TIMEDOUT

Group had unfinished jobs.

When a group ends up with this flag set, no jobs can be submitted to its queues.

DRM_PANTHOR_GROUP_STATE_FATAL_FAULT

Group had fatal faults.

When a group ends up with this flag set, no jobs can be submitted to its queues.

DRM_PANTHOR_GROUP_STATE_INNOCENT

Group was killed during a reset caused by other groups.

This flag can only be set if DRM_PANTHOR_GROUP_STATE_TIMEDOUT is set and DRM_PANTHOR_GROUP_STATE_FATAL_FAULT is not.

struct drm_panthor_group_get_state¶: Arguments passed to DRM_IOCTL_PANTHOR_GROUP_GET_STATE

Definition:

struct drm_panthor_group_get_state {
    __u32 group_handle;
    __u32 state;
    __u32 fatal_queues;
    __u32 pad;
};

Members

group_handle: Handle of the group to query state on
state: Combination of DRM_PANTHOR_GROUP_STATE_* flags encoding the group state.
fatal_queues: Bitmask of queues that faced fatal faults.
pad: MBZ

Description

Used to query the state of a group and decide whether a new group should be created to replace it.
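A userspace driver might interpret the returned state roughly as follows. The bit positions used here are assumptions made for the sketch (the text above does not list the numeric values), and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions, for illustration only; use the values from the
 * panthor uAPI header in real code. */
#define EXAMPLE_GROUP_STATE_TIMEDOUT    (1u << 0)
#define EXAMPLE_GROUP_STATE_FATAL_FAULT (1u << 1)
#define EXAMPLE_GROUP_STATE_INNOCENT    (1u << 2)

/* A group that timed out or faulted no longer accepts jobs, so a new
 * group has to be created to replace it. */
static int example_group_needs_replacement(uint32_t state)
{
    return (state & (EXAMPLE_GROUP_STATE_TIMEDOUT |
                     EXAMPLE_GROUP_STATE_FATAL_FAULT)) != 0;
}

/* INNOCENT may only appear together with TIMEDOUT and without
 * FATAL_FAULT. */
static int example_group_state_consistent(uint32_t state)
{
    if (!(state & EXAMPLE_GROUP_STATE_INNOCENT))
        return 1;
    return (state & EXAMPLE_GROUP_STATE_TIMEDOUT) &&
           !(state & EXAMPLE_GROUP_STATE_FATAL_FAULT);
}
```

An innocent group was killed by a reset that other groups caused, so its work can reasonably be resubmitted on the replacement group.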

struct drm_panthor_tiler_heap_create¶: Arguments passed to DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE

Definition:

struct drm_panthor_tiler_heap_create {
    __u32 vm_id;
    __u32 initial_chunk_count;
    __u32 chunk_size;
    __u32 max_chunks;
    __u32 target_in_flight;
    __u32 handle;
    __u64 tiler_heap_ctx_gpu_va;
    __u64 first_heap_chunk_gpu_va;
};

Members

vm_id

VM ID the tiler heap should be mapped to

initial_chunk_count

Initial number of chunks to allocate. Must be at least one.

chunk_size

Chunk size.

Must be page-aligned and lie in the [128k:8M] range.

max_chunks

Maximum number of chunks that can be allocated.

Must be at least initial_chunk_count.

target_in_flight

Maximum number of in-flight render passes.

If the heap has more than target_in_flight tiler jobs in-flight, the FW will wait for render passes to finish before queuing new tiler jobs.

handle

Returned heap handle. Passed back to DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY.

tiler_heap_ctx_gpu_va

Returned heap GPU virtual address.

first_heap_chunk_gpu_va

First heap chunk.

The tiler heap is formed of heap chunks forming a singly linked list. This is the first element in the list.
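The input constraints on the tiler heap arguments can be summarised in a short pre-check. This sketch assumes that "128k" and "8M" mean 128 KiB and 8 MiB, and it takes the page size as a parameter because the text above does not fix one; the mirror structure and `example_heap_args_valid()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_panthor_tiler_heap_create (illustration). */
struct example_tiler_heap_create {
    uint32_t vm_id;
    uint32_t initial_chunk_count;
    uint32_t chunk_size;
    uint32_t max_chunks;
    uint32_t target_in_flight;
    uint32_t handle;
    uint64_t tiler_heap_ctx_gpu_va;
    uint64_t first_heap_chunk_gpu_va;
};

/* ASSUMED interpretation of the [128k:8M] range. */
#define EXAMPLE_MIN_CHUNK_SIZE (128u * 1024)
#define EXAMPLE_MAX_CHUNK_SIZE (8u * 1024 * 1024)

/* Returns 1 if the input fields satisfy the documented constraints. */
static int
example_heap_args_valid(const struct example_tiler_heap_create *args,
                        uint32_t page_size)
{
    assert(args != NULL);
    assert(page_size != 0);

    if (args->initial_chunk_count < 1)
        return 0;
    if (args->max_chunks < args->initial_chunk_count)
        return 0;
    if (args->chunk_size % page_size != 0)
        return 0;
    return args->chunk_size >= EXAMPLE_MIN_CHUNK_SIZE &&
           args->chunk_size <= EXAMPLE_MAX_CHUNK_SIZE;
}
```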

struct drm_panthor_tiler_heap_destroy¶: Arguments passed to DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY

Definition:

struct drm_panthor_tiler_heap_destroy {
    __u32 handle;
    __u32 pad;
};

Members

handle

Handle of the tiler heap to destroy.

Must be a valid heap handle returned by DRM_IOCTL_PANTHOR_TILER_HEAP_CREATE.

pad

Padding field, MBZ.

struct drm_panthor_bo_set_label¶: Arguments passed to DRM_IOCTL_PANTHOR_BO_SET_LABEL

Definition:

struct drm_panthor_bo_set_label {
    __u32 handle;
    __u32 pad;
    __u64 label;
};

Members

handle

Handle of the buffer object to label.

pad

MBZ.

label

User pointer to a NUL-terminated string

Length cannot be greater than 4096

struct drm_panthor_set_user_mmio_offset¶: Arguments passed to DRM_IOCTL_PANTHOR_SET_USER_MMIO_OFFSET

Definition:

struct drm_panthor_set_user_mmio_offset {
    __u64 offset;
};

Members

offset

User MMIO offset to use.

Must be either DRM_PANTHOR_USER_MMIO_OFFSET_32BIT or DRM_PANTHOR_USER_MMIO_OFFSET_64BIT.

Use DRM_PANTHOR_USER_MMIO_OFFSET (which selects OFFSET_32BIT or OFFSET_64BIT based on the size of an unsigned long) unless you have a very good reason to overrule this decision.

Description

This ioctl is only really useful if you want to support userspace CPU emulation environments where the size of an unsigned long differs between the host and the guest architectures.

DRM_IOCTL_PANTHOR¶

DRM_IOCTL_PANTHOR (__access, __id, __type)

Build a Panthor IOCTL number

Parameters

__access: Access type. Must be R, W or RW.
__id: One of the DRM_PANTHOR_xxx ids.
__type: Suffix of the type being passed to the IOCTL.

Description

Don’t use this macro directly, use the DRM_IOCTL_PANTHOR_xxx values instead.

Return

An IOCTL number to be passed to ioctl() from userspace.

drm/xe uAPI¶

Xe Device Block Diagram

The diagram below represents a high-level simplification of a discrete GPU supported by the Xe driver. It shows some device components which are necessary to understand this API, as well as their relations to each other. This diagram does not represent real hardware:

┌──────────────────────────────────────────────────────────────────┐
│ ┌──────────────────────────────────────────────────┐ ┌─────────┐ │
│ │        ┌───────────────────────┐   ┌─────┐       │ │ ┌─────┐ │ │
│ │        │         VRAM0         ├───┤ ... │       │ │ │VRAM1│ │ │
│ │        └───────────┬───────────┘   └─GT1─┘       │ │ └──┬──┘ │ │
│ │ ┌──────────────────┴───────────────────────────┐ │ │ ┌──┴──┐ │ │
│ │ │ ┌─────────────────────┐  ┌─────────────────┐ │ │ │ │     │ │ │
│ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │  │ ┌─────┐ ┌─────┐ │ │ │ │ │     │ │ │
│ │ │ │ │EU│ │EU│ │EU│ │EU│ │  │ │RCS0 │ │BCS0 │ │ │ │ │ │     │ │ │
│ │ │ │ └──┘ └──┘ └──┘ └──┘ │  │ └─────┘ └─────┘ │ │ │ │ │     │ │ │
│ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │  │ ┌─────┐ ┌─────┐ │ │ │ │ │     │ │ │
│ │ │ │ │EU│ │EU│ │EU│ │EU│ │  │ │VCS0 │ │VCS1 │ │ │ │ │ │     │ │ │
│ │ │ │ └──┘ └──┘ └──┘ └──┘ │  │ └─────┘ └─────┘ │ │ │ │ │     │ │ │
│ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │  │ ┌─────┐ ┌─────┐ │ │ │ │ │     │ │ │
│ │ │ │ │EU│ │EU│ │EU│ │EU│ │  │ │VECS0│ │VECS1│ │ │ │ │ │ ... │ │ │
│ │ │ │ └──┘ └──┘ └──┘ └──┘ │  │ └─────┘ └─────┘ │ │ │ │ │     │ │ │
│ │ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │  │ ┌─────┐ ┌─────┐ │ │ │ │ │     │ │ │
│ │ │ │ │EU│ │EU│ │EU│ │EU│ │  │ │CCS0 │ │CCS1 │ │ │ │ │ │     │ │ │
│ │ │ │ └──┘ └──┘ └──┘ └──┘ │  │ └─────┘ └─────┘ │ │ │ │ │     │ │ │
│ │ │ └─────────DSS─────────┘  │ ┌─────┐ ┌─────┐ │ │ │ │ │     │ │ │
│ │ │                          │ │CCS2 │ │CCS3 │ │ │ │ │ │     │ │ │
│ │ │ ┌─────┐ ┌─────┐ ┌─────┐  │ └─────┘ └─────┘ │ │ │ │ │     │ │ │
│ │ │ │ ... │ │ ... │ │ ... │  │                 │ │ │ │ │     │ │ │
│ │ │ └─DSS─┘ └─DSS─┘ └─DSS─┘  └─────Engines─────┘ │ │ │ │     │ │ │
│ │ └───────────────────────────GT0────────────────┘ │ │ └─GT2─┘ │ │
│ └────────────────────────────Tile0─────────────────┘ └─ Tile1──┘ │
└─────────────────────────────Device0───────┬──────────────────────┘
                                            │
                     ───────────────────────┴────────── PCI bus

Xe uAPI Overview

This section aims to describe Xe’s IOCTL entries, their structs, and other Xe-related uAPI such as uevents and PMU (Platform Monitoring Unit) related entries and usage.

List of supported IOCTLs:

DRM_IOCTL_XE_DEVICE_QUERY
DRM_IOCTL_XE_GEM_CREATE
DRM_IOCTL_XE_GEM_MMAP_OFFSET
DRM_IOCTL_XE_VM_CREATE
DRM_IOCTL_XE_VM_DESTROY
DRM_IOCTL_XE_VM_BIND
DRM_IOCTL_XE_EXEC_QUEUE_CREATE
DRM_IOCTL_XE_EXEC_QUEUE_DESTROY
DRM_IOCTL_XE_EXEC_QUEUE_GET_PROPERTY
DRM_IOCTL_XE_EXEC
DRM_IOCTL_XE_WAIT_USER_FENCE
DRM_IOCTL_XE_OBSERVATION
DRM_IOCTL_XE_MADVISE
DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS

Xe IOCTL Extensions

Before detailing the IOCTLs and their structs, it is important to highlight that every IOCTL in Xe is extensible.

Many interfaces need to grow over time. In most cases we can simply extend the struct and have userspace pass in more data. Another option, as demonstrated by Vulkan’s approach to providing extensions for forward and backward compatibility, is to use a list of optional structs to provide those extra details.

The key advantage to using an extension chain is that it allows us to redefine the interface more easily than an ever growing struct of increasing complexity, and for large parts of that interface to be entirely optional. The downside is more pointer chasing; chasing across the __user boundary with pointers encapsulated inside u64.

Example chaining:

struct drm_xe_user_extension ext3 = {
        .next_extension = 0, // end
        .name = ...,
};
struct drm_xe_user_extension ext2 = {
        .next_extension = (uintptr_t)&ext3,
        .name = ...,
};
struct drm_xe_user_extension ext1 = {
        .next_extension = (uintptr_t)&ext2,
        .name = ...,
};

Typically the struct drm_xe_user_extension would be embedded in some uAPI struct, and in this case we would feed it the head of the chain (i.e. ext1), which would then apply all of the above extensions.
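To make the chaining concrete, the sketch below builds a three-element chain and walks it from the head. It runs entirely in userspace with a local mirror of the structure, so it only illustrates the linked-list layout; the kernel would have to copy each element across the __user boundary instead of dereferencing it directly. The extension names (1, 2, 3), the walk limit and the `example_*` functions are all made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct drm_xe_user_extension (illustration only). */
struct example_user_extension {
    uint64_t next_extension;
    uint32_t name;
    uint32_t pad;
};

/* Count the extensions in a chain, giving up after `limit` entries so
 * that a cyclic chain cannot loop forever. Returns -1 in that case. */
static int example_chain_length(uint64_t head, int limit)
{
    int n = 0;

    assert(limit > 0);
    while (head != 0) {
        const struct example_user_extension *ext =
            (const struct example_user_extension *)(uintptr_t)head;

        if (++n > limit)
            return -1;
        head = ext->next_extension;
    }
    return n;
}

/* Build ext1 -> ext2 -> ext3 and count from the head of the chain. */
static int example_build_and_count(void)
{
    struct example_user_extension ext3 = {
        .next_extension = 0, /* end */
        .name = 3,
    };
    struct example_user_extension ext2 = {
        .next_extension = (uintptr_t)&ext3,
        .name = 2,
    };
    struct example_user_extension ext1 = {
        .next_extension = (uintptr_t)&ext2,
        .name = 1,
    };

    return example_chain_length((uintptr_t)&ext1, 16);
}
```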

struct drm_xe_user_extension¶: Base class for defining a chain of extensions

Definition:

struct drm_xe_user_extension {
    __u64 next_extension;
    __u32 name;
    __u32 pad;
};

Members

next_extension

Pointer to the next struct drm_xe_user_extension, or zero if the end.

name

Name of the extension.

Note that the name here is just some integer.

Also note that the name space for this is not global for the whole driver, but rather its scope/meaning is limited to the specific piece of uAPI which has embedded the struct drm_xe_user_extension.

pad

MBZ

All undefined bits must be zero.

struct drm_xe_ext_set_property¶: Generic set property extension

Definition:

struct drm_xe_ext_set_property {
    struct drm_xe_user_extension base;
    __u32 property;
    __u32 pad;
    __u64 value;
    __u64 reserved[2];
};

Members

base: base user extension
property: property to set
pad: MBZ
value: property value
reserved: Reserved

Description

A generic struct that allows any of Xe’s IOCTLs to be extended with a set_property operation.

struct drm_xe_engine_class_instance¶: instance of an engine class

Definition:

struct drm_xe_engine_class_instance {
#define DRM_XE_ENGINE_CLASS_RENDER              0
#define DRM_XE_ENGINE_CLASS_COPY                1
#define DRM_XE_ENGINE_CLASS_VIDEO_DECODE        2
#define DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE       3
#define DRM_XE_ENGINE_CLASS_COMPUTE             4
#define DRM_XE_ENGINE_CLASS_VM_BIND             5
    __u16 engine_class;
    __u16 engine_instance;
    __u16 gt_id;
    __u16 pad;
};

Members

engine_class: engine class id
engine_instance: engine instance id
gt_id: Unique ID of this GT within the PCI Device
pad: MBZ

Description

It is returned as part of the drm_xe_engine, but it is also used as the input of engine selection for both drm_xe_exec_queue_create and drm_xe_query_engine_cycles.

The engine_class can be:

DRM_XE_ENGINE_CLASS_RENDER
DRM_XE_ENGINE_CLASS_COPY
DRM_XE_ENGINE_CLASS_VIDEO_DECODE
DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE
DRM_XE_ENGINE_CLASS_COMPUTE
DRM_XE_ENGINE_CLASS_VM_BIND - Kernel-only class (not an actual hardware engine class). Used for creating ordered queues of VM bind operations.

struct drm_xe_engine¶: describe hardware engine

Definition:

struct drm_xe_engine {
    struct drm_xe_engine_class_instance instance;
    __u64 reserved[3];
};

Members

instance: The drm_xe_engine_class_instance
reserved: Reserved

struct drm_xe_query_engines¶: describe engines

Definition:

struct drm_xe_query_engines {
    __u32 num_engines;
    __u32 pad;
    struct drm_xe_engine engines[];
};

Members

num_engines: number of engines returned in engines
pad: MBZ
engines: The returned engines for this device

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_ENGINES, then the reply uses struct drm_xe_query_engines in .data, whose engines member is an array of struct drm_xe_engine.

enum drm_xe_memory_class¶: Supported memory classes.

Constants

DRM_XE_MEM_REGION_CLASS_SYSMEM: Represents system memory.
DRM_XE_MEM_REGION_CLASS_VRAM: On discrete platforms, this represents the memory that is local to the device, which we call VRAM. Not valid on integrated platforms.

struct drm_xe_mem_region¶: Describes some region as known to the driver.

Definition:

struct drm_xe_mem_region {
    __u16 mem_class;
    __u16 instance;
    __u32 min_page_size;
    __u64 total_size;
    __u64 used;
    __u64 cpu_visible_size;
    __u64 cpu_visible_used;
    __u64 reserved[6];
};

Members

mem_class

The memory class describing this region.

See enum drm_xe_memory_class for supported values.

instance

The unique ID for this region, which serves as the index in the placement bitmask used as argument for DRM_IOCTL_XE_GEM_CREATE.

min_page_size

Min page-size in bytes for this region.

When the kernel allocates memory for this region, the underlying pages will be at least min_page_size in size. Buffer objects with an allowable placement in this region must be created with a size aligned to this value. GPU virtual address mappings of (parts of) buffer objects that may be placed in this region must also have their GPU virtual address and range aligned to this value. Affected IOCTLS will return -EINVAL if alignment restrictions are not met.

total_size

The usable size in bytes for this region.

used

Estimate of the memory used in bytes for this region.

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this the value here will always equal zero.

cpu_visible_size

How much of this region can be CPU accessed, in bytes.

This will always be <= total_size, and the remainder (if any) will not be CPU accessible. If the CPU accessible part is smaller than total_size then this is referred to as a small BAR system.

On systems without small BAR (full BAR), the cpu_visible_size will always equal the total_size, since all of it will be CPU accessible.

Note this is only tracked for DRM_XE_MEM_REGION_CLASS_VRAM regions (for other types the value here will always equal zero).

cpu_visible_used

Estimate of CPU visible memory used, in bytes.

Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. Without this the value here will always equal zero. Note this is only currently tracked for DRM_XE_MEM_REGION_CLASS_VRAM regions (for other types the value here will always be zero).

reserved

Reserved
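The min_page_size and cpu_visible_size rules above can be sketched with two small helpers. This is a minimal sketch, not driver code: align_to_min_page() and is_small_bar() are hypothetical names, and treating a cpu_visible_size of zero as "not tracked" is an assumption based on the note that the field is only tracked for VRAM regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a requested size up to the next multiple of the region's
 * min_page_size, as required for objects that may be placed there. */
static uint64_t align_to_min_page(uint64_t size, uint32_t min_page_size)
{
    assert(min_page_size != 0);
    uint64_t rem = size % min_page_size;

    return rem ? size + (min_page_size - rem) : size;
}

/* A region is "small BAR" when only part of it is CPU accessible.
 * Assumption: zero means the value is not tracked for this region. */
static bool is_small_bar(uint64_t cpu_visible_size, uint64_t total_size)
{
    return cpu_visible_size != 0 && cpu_visible_size < total_size;
}
```

The same rounding applies to the address and range of GPU virtual address mappings that may land in the region.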

struct drm_xe_query_mem_regions¶: describe memory regions

Definition:

struct drm_xe_query_mem_regions {
    __u32 num_mem_regions;
    __u32 pad;
    struct drm_xe_mem_region mem_regions[];
};

Members

num_mem_regions: number of memory regions returned in mem_regions
pad: MBZ
mem_regions: The returned memory regions for this device

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_MEM_REGIONS, then the reply uses struct drm_xe_query_mem_regions in .data.

struct drm_xe_query_config¶: describe the device configuration

Definition:

struct drm_xe_query_config {
    __u32 num_params;
    __u32 pad;
#define DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID           0
#define DRM_XE_QUERY_CONFIG_FLAGS                       1
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM               (1 << 0)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY        (1 << 1)
#define DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR    (1 << 2)
#define DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT               2
#define DRM_XE_QUERY_CONFIG_VA_BITS                     3
#define DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY     4
    __u64 info[];
};

Members

num_params: number of parameters returned in info
pad: MBZ
info: array of elements containing the config info

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_CONFIG, then the reply uses struct drm_xe_query_config in .data.

The index in info can be:

DRM_XE_QUERY_CONFIG_REV_AND_DEVICE_ID - Device ID (lower 16 bits) and the device revision (next 8 bits)
DRM_XE_QUERY_CONFIG_FLAGS - Flags describing the device configuration, see list below
- DRM_XE_QUERY_CONFIG_FLAG_HAS_VRAM - Flag is set if the device has usable VRAM
- DRM_XE_QUERY_CONFIG_FLAG_HAS_LOW_LATENCY - Flag is set if the device has low latency hint support
- DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR - Flag is set if the device has CPU address mirroring support
DRM_XE_QUERY_CONFIG_MIN_ALIGNMENT - Minimal memory alignment required by this device, typically SZ_4K or SZ_64K
DRM_XE_QUERY_CONFIG_VA_BITS - Maximum bits of a virtual address
DRM_XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY - Value of the highest available exec queue priority

struct drm_xe_gt¶: describe an individual GT.

Definition:

struct drm_xe_gt {
#define DRM_XE_QUERY_GT_TYPE_MAIN               0
#define DRM_XE_QUERY_GT_TYPE_MEDIA              1
    __u16 type;
    __u16 tile_id;
    __u16 gt_id;
    __u16 pad[3];
    __u32 reference_clock;
    __u64 near_mem_regions;
    __u64 far_mem_regions;
    __u16 ip_ver_major;
    __u16 ip_ver_minor;
    __u16 ip_ver_rev;
    __u16 pad2;
    __u64 reserved[7];
};

Members

type: GT type: Main or Media
tile_id: Tile ID where this GT lives (Information only)
gt_id: Unique ID of this GT within the PCI Device
pad: MBZ
reference_clock: A clock frequency for timestamp
near_mem_regions: Bit mask of instances from drm_xe_query_mem_regions that are nearest to the current engines of this GT. Each index in this mask refers directly to the struct drm_xe_query_mem_regions’ instance, no assumptions should be made about order. The type of each region is described by struct drm_xe_query_mem_regions’ mem_class.
far_mem_regions: Bit mask of instances from drm_xe_query_mem_regions that are far from the engines of this GT. In general, they have extra indirections when compared to the near_mem_regions. For a discrete device this could mean system memory and memory living in a different tile. Each index in this mask refers directly to the struct drm_xe_query_mem_regions’ instance, no assumptions should be made about order. The type of each region is described by struct drm_xe_query_mem_regions’ mem_class.
ip_ver_major: Graphics/media IP major version on GMD_ID platforms
ip_ver_minor: Graphics/media IP minor version on GMD_ID platforms
ip_ver_rev: Graphics/media IP revision version on GMD_ID platforms
pad2: MBZ
reserved: Reserved

Description

To be used with drm_xe_query_gt_list, which will return a list with all the existing GT individual descriptions. Graphics Technology (GT) is a subset of a GPU/tile that is responsible for implementing graphics and/or media operations.

The type can be:

DRM_XE_QUERY_GT_TYPE_MAIN
DRM_XE_QUERY_GT_TYPE_MEDIA

struct drm_xe_query_gt_list¶: A list with GT description items.

Definition:

struct drm_xe_query_gt_list {
    __u32 num_gt;
    __u32 pad;
    struct drm_xe_gt gt_list[];
};

Members

num_gt: number of GT items returned in gt_list
pad: MBZ
gt_list: The GT list returned for this device

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_GT_LIST, then the reply uses struct drm_xe_query_gt_list in .data.

struct drm_xe_query_topology_mask¶: describe the topology mask of a GT

Definition:

struct drm_xe_query_topology_mask {
    __u16 gt_id;
#define DRM_XE_TOPO_DSS_GEOMETRY        1
#define DRM_XE_TOPO_DSS_COMPUTE         2
#define DRM_XE_TOPO_L3_BANK             3
#define DRM_XE_TOPO_EU_PER_DSS          4
#define DRM_XE_TOPO_SIMD16_EU_PER_DSS   5
    __u16 type;
    __u32 num_bytes;
    __u8 mask[];
};

Members

gt_id: GT ID the mask is associated with
type: type of mask
num_bytes: number of bytes in requested mask
mask: little-endian mask of num_bytes

Description

This is the hardware topology which reflects the internal physical structure of the GPU.

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_GT_TOPOLOGY, then the reply uses struct drm_xe_query_topology_mask in .data.

The type can be:

DRM_XE_TOPO_DSS_GEOMETRY - To query the mask of Dual Sub Slices (DSS) available for geometry operations. For example a query response containing the following in mask: DSS_GEOMETRY ff ff ff ff 00 00 00 00 means 32 DSS are available for geometry.
DRM_XE_TOPO_DSS_COMPUTE - To query the mask of Dual Sub Slices (DSS) available for compute operations. For example a query response containing the following in mask: DSS_COMPUTE ff ff ff ff 00 00 00 00 means 32 DSS are available for compute.
DRM_XE_TOPO_L3_BANK - To query the mask of enabled L3 banks. This type may be omitted if the driver is unable to query the mask from the hardware.
DRM_XE_TOPO_EU_PER_DSS - To query the mask of Execution Units (EU) available per Dual Sub Slices (DSS). For example a query response containing the following in mask: EU_PER_DSS ff ff 00 00 00 00 00 00 means each DSS has 16 SIMD8 EUs. This type may be omitted if device doesn’t have SIMD8 EUs.
DRM_XE_TOPO_SIMD16_EU_PER_DSS - To query the mask of SIMD16 Execution Units (EU) available per Dual Sub Slices (DSS). For example a query response containing the following in mask: SIMD16_EU_PER_DSS ff ff 00 00 00 00 00 00 means each DSS has 16 SIMD16 EUs. This type may be omitted if device doesn’t have SIMD16 EUs.
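The mask examples above can be checked with a small bit-counting sketch. The helper names topology_mask_count() and topology_mask_test() are hypothetical, and the bit numbering (bit n lives in byte n / 8, at position n % 8) is an assumption drawn from the "little-endian mask" description of the mask member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in a topology mask of num_bytes bytes. */
static unsigned int topology_mask_count(const uint8_t *mask, uint32_t num_bytes)
{
    unsigned int count = 0;

    assert(mask != NULL || num_bytes == 0);
    for (uint32_t i = 0; i < num_bytes; i++)
        for (uint8_t b = mask[i]; b; b &= (uint8_t)(b - 1))
            count++; /* each iteration clears the lowest set bit */
    return count;
}

/* Test a single bit; bits beyond the mask are reported as not set. */
static bool topology_mask_test(const uint8_t *mask, uint32_t num_bytes,
                               unsigned int bit)
{
    assert(mask != NULL || num_bytes == 0);
    if (bit / 8 >= num_bytes)
        return false;
    return (mask[bit / 8] >> (bit % 8)) & 1;
}
```

For the DSS_GEOMETRY example mask ff ff ff ff 00 00 00 00, the count is 32, matching the description above.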

struct drm_xe_query_engine_cycles¶: correlate CPU and GPU timestamps

Definition:

struct drm_xe_query_engine_cycles {
    struct drm_xe_engine_class_instance eci;
    __s32 clockid;
    __u32 width;
    __u64 engine_cycles;
    __u64 cpu_timestamp;
    __u64 cpu_delta;
};

Members

eci: This is input by the user and is the engine for which command streamer cycles are queried.
clockid: This is input by the user and is the reference clock id for CPU timestamp. For definition, see clock_gettime(2) and perf_event_open(2). Supported clock ids are CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME, CLOCK_BOOTTIME, CLOCK_TAI.
width: Width of the engine cycle counter in bits.
engine_cycles: Engine cycles as read from its register at 0x358 offset.
cpu_timestamp: CPU timestamp in ns. The timestamp is captured before reading the engine_cycles register using the reference clockid set by the user.
cpu_delta: Time delta in ns captured around reading the lower dword of the engine_cycles register.

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_ENGINE_CYCLES, then the reply uses struct drm_xe_query_engine_cycles in .data. struct drm_xe_query_engine_cycles is allocated by the user and .data points to this allocated structure.

The query returns the engine cycles, which along with GT’s reference_clock, can be used to calculate the engine timestamp. In addition the query returns a set of cpu timestamps that indicate when the command streamer cycle count was captured.
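The conversion from engine cycles to an engine timestamp might be sketched as follows. This is a sketch under assumptions: it takes the GT's reference_clock to be a frequency in Hz, it masks the counter to the reported width, and engine_cycles_to_ns() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a cycle count to nanoseconds.
 * Assumption: reference_clock_hz is the GT reference_clock, in Hz.
 * The division is split so the intermediate product cannot overflow;
 * the final result can still wrap for very large cycle counts. */
static uint64_t engine_cycles_to_ns(uint64_t cycles, uint32_t width,
                                    uint32_t reference_clock_hz)
{
    assert(reference_clock_hz != 0);
    assert(width >= 1 && width <= 64);

    if (width < 64)
        cycles &= (UINT64_C(1) << width) - 1; /* keep only the valid bits */

    uint64_t secs = cycles / reference_clock_hz;
    uint64_t rem = cycles % reference_clock_hz;

    return secs * UINT64_C(1000000000) +
           rem * UINT64_C(1000000000) / reference_clock_hz;
}
```

Pairing the result with cpu_timestamp, with cpu_delta as the uncertainty of the read, gives one correlation point between the two clocks.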

struct drm_xe_query_uc_fw_version¶: query a micro-controller firmware version

Definition:

struct drm_xe_query_uc_fw_version {
#define XE_QUERY_UC_TYPE_GUC_SUBMISSION 0
#define XE_QUERY_UC_TYPE_HUC 1
    __u16 uc_type;
    __u16 pad;
    __u32 branch_ver;
    __u32 major_ver;
    __u32 minor_ver;
    __u32 patch_ver;
    __u32 pad2;
    __u64 reserved;
};

Members

uc_type: The micro-controller type to query firmware version
pad: MBZ
branch_ver: branch uc fw version
major_ver: major uc fw version
minor_ver: minor uc fw version
patch_ver: patch uc fw version
pad2: MBZ
reserved: Reserved

Description

Given a uc_type this will return the branch, major, minor and patch version of the micro-controller firmware.

struct drm_xe_query_pxp_status¶: query if PXP is ready

Definition:

struct drm_xe_query_pxp_status {
    __u32 status;
    __u32 supported_session_types;
};

Members

status: current PXP status
supported_session_types: bitmask of supported PXP session types

Description

If PXP is enabled and no fatal error has occurred, the status will be set to one of the following values:

0: PXP init still in progress
1: PXP init complete

If PXP is not enabled or something has gone wrong, the query will fail with one of the following error codes:

-ENODEV: PXP not supported or disabled
-EIO: fatal error occurred during init, so PXP will never be enabled
-EINVAL: incorrect value provided as part of the query
-EFAULT: error copying the memory between kernel and userspace

The status can only be 0 in the first few seconds after driver load. If everything works as expected, the status will transition to init complete in less than 1 second, while in case of errors the driver might take longer to start returning an error code, but it should still take less than 10 seconds.

The supported session type bitmask is based on the values in enum drm_xe_pxp_session_type. TYPE_NONE is always supported and therefore is not reported in the bitmask.

struct drm_xe_device_query¶: Input of DRM_IOCTL_XE_DEVICE_QUERY - main structure to query device information

Definition:

struct drm_xe_device_query {
    __u64 extensions;
#define DRM_XE_DEVICE_QUERY_ENGINES             0
#define DRM_XE_DEVICE_QUERY_MEM_REGIONS         1
#define DRM_XE_DEVICE_QUERY_CONFIG              2
#define DRM_XE_DEVICE_QUERY_GT_LIST             3
#define DRM_XE_DEVICE_QUERY_HWCONFIG            4
#define DRM_XE_DEVICE_QUERY_GT_TOPOLOGY         5
#define DRM_XE_DEVICE_QUERY_ENGINE_CYCLES       6
#define DRM_XE_DEVICE_QUERY_UC_FW_VERSION       7
#define DRM_XE_DEVICE_QUERY_OA_UNITS            8
#define DRM_XE_DEVICE_QUERY_PXP_STATUS          9
#define DRM_XE_DEVICE_QUERY_EU_STALL            10
    __u32 query;
    __u32 size;
    __u64 data;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
query: The type of data to query
size: Size of the queried data
data: Queried data is placed here
reserved: Reserved

Description

The user selects the type of data to query among DRM_XE_DEVICE_QUERY_* and sets the value in the query member. This determines the type of the structure provided by the driver in data, among struct drm_xe_query_*.

The query can be:

DRM_XE_DEVICE_QUERY_ENGINES
DRM_XE_DEVICE_QUERY_MEM_REGIONS
DRM_XE_DEVICE_QUERY_CONFIG
DRM_XE_DEVICE_QUERY_GT_LIST
DRM_XE_DEVICE_QUERY_HWCONFIG - Query type to retrieve the hardware configuration of the device such as information on slices, memory, caches, and so on. It is provided as a table of key / value attributes.
DRM_XE_DEVICE_QUERY_GT_TOPOLOGY
DRM_XE_DEVICE_QUERY_ENGINE_CYCLES
DRM_XE_DEVICE_QUERY_PXP_STATUS

If size is set to 0, the driver fills it with the required size for the requested type of data to query. If size is equal to the required size, the queried information is copied into data. If size is set to a value different from 0 and different from the required size, the IOCTL call returns -EINVAL.

For example the following code snippet allows retrieving and printing information about the device engines with DRM_XE_DEVICE_QUERY_ENGINES:

struct drm_xe_query_engines *engines;
struct drm_xe_device_query query = {
    .extensions = 0,
    .query = DRM_XE_DEVICE_QUERY_ENGINES,
    .size = 0,
    .data = 0,
};
ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);
engines = malloc(query.size);
query.data = (uintptr_t)engines;
ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query);
for (int i = 0; i < engines->num_engines; i++) {
    printf("Engine %d: %s\n", i,
        engines->engines[i].instance.engine_class ==
            DRM_XE_ENGINE_CLASS_RENDER ? "RENDER":
        engines->engines[i].instance.engine_class ==
            DRM_XE_ENGINE_CLASS_COPY ? "COPY":
        engines->engines[i].instance.engine_class ==
            DRM_XE_ENGINE_CLASS_VIDEO_DECODE ? "VIDEO_DECODE":
        engines->engines[i].instance.engine_class ==
            DRM_XE_ENGINE_CLASS_VIDEO_ENHANCE ? "VIDEO_ENHANCE":
        engines->engines[i].instance.engine_class ==
            DRM_XE_ENGINE_CLASS_COMPUTE ? "COMPUTE":
        "UNKNOWN");
}
free(engines);
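The snippet above omits error handling. One possible shape for the two-call size protocol with checks is sketched below. To keep it buildable without a device, the ioctl is abstracted behind a callback: query_fn, query_alloc(), struct fake_device and fake_query() are all hypothetical names, the callback returns 0 or a negative errno (unlike ioctl(2), which returns -1 and sets errno), and fake_query() is only an in-memory model of the size rules described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ioctl(fd, DRM_IOCTL_XE_DEVICE_QUERY, &query):
 * reads and updates *size, fills data, returns 0 or a negative errno. */
typedef int (*query_fn)(void *ctx, uint32_t *size, void *data);

/* Ask for the size, allocate, then fetch. The caller frees *out. */
static int query_alloc(query_fn fn, void *ctx, void **out, uint32_t *out_size)
{
    uint32_t size = 0;
    void *buf;
    int err;

    assert(fn != NULL && out != NULL && out_size != NULL);
    err = fn(ctx, &size, NULL);   /* size == 0: only the size is filled in */
    if (err)
        return err;
    if (size == 0)
        return -EINVAL;
    buf = calloc(1, size);
    if (!buf)
        return -ENOMEM;
    err = fn(ctx, &size, buf);    /* exact size: data is copied */
    if (err) {
        free(buf);
        return err;
    }
    *out = buf;
    *out_size = size;
    return 0;
}

/* In-memory model of the documented size rules, for exercising the helper. */
struct fake_device {
    const void *payload;
    uint32_t payload_size;
};

static int fake_query(void *ctx, uint32_t *size, void *data)
{
    const struct fake_device *dev = ctx;

    if (*size == 0) {
        *size = dev->payload_size;
        return 0;
    }
    if (*size != dev->payload_size)
        return -EINVAL;           /* neither 0 nor the required size */
    memcpy(data, dev->payload, dev->payload_size);
    return 0;
}
```

In real code the callback body would fill a struct drm_xe_device_query and call ioctl(), translating a -1 return into -errno.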

struct drm_xe_gem_create¶: Input of DRM_IOCTL_XE_GEM_CREATE - A structure for gem creation

Definition:

struct drm_xe_gem_create {
#define DRM_XE_GEM_CREATE_EXTENSION_SET_PROPERTY        0
#define DRM_XE_GEM_CREATE_SET_PROPERTY_PXP_TYPE         0
    __u64 extensions;
    __u64 size;
    __u32 placement;
#define DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING            (1 << 0)
#define DRM_XE_GEM_CREATE_FLAG_SCANOUT                  (1 << 1)
#define DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM       (1 << 2)
    __u32 flags;
    __u32 vm_id;
    __u32 handle;
#define DRM_XE_GEM_CPU_CACHING_WB                      1
#define DRM_XE_GEM_CPU_CACHING_WC                      2
    __u16 cpu_caching;
    __u16 pad[3];
    __u64 reserved[2];
};

Members

extensions

Pointer to the first extension struct, if any

size

Size of the object to be created, must match region (system or vram) minimum alignment (min_page_size).

placement

A mask of memory instances where the BO can be placed. Each index in this mask refers directly to the struct drm_xe_query_mem_regions’ instance; no assumptions should be made about order. The type of each region is described by struct drm_xe_query_mem_regions’ mem_class.

flags

Flags, a mask of the DRM_XE_GEM_CREATE_FLAG_* values listed in the Description below

vm_id

Attached VM, if any

If a VM is specified, this BO:

Must only ever be bound to that VM.

Cannot be exported as a PRIME fd.

handle

Returned handle for the object.

Object handles are nonzero.

cpu_caching

The CPU caching mode to select for this object. If mmapping the object, the mode selected here will also be used. The exception is when mapping system memory (including data evicted to system) on discrete GPUs. The caching mode selected will then be overridden to DRM_XE_GEM_CPU_CACHING_WB, and coherency between GPU and CPU is guaranteed. The caching mode of existing CPU-mappings will be updated transparently to user-space clients.

pad

MBZ

reserved

Reserved

Description

The flags can be:

DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING - Modify the GEM object allocation strategy by deferring physical memory allocation until the object is either bound to a virtual memory region via VM_BIND or accessed by the CPU. As a result, no backing memory is reserved at the time of GEM object creation.
DRM_XE_GEM_CREATE_FLAG_SCANOUT - Indicates that the GEM object is intended for scanout via the display engine. When set, kernel ensures that the allocation is placed in a memory region compatible with the display engine requirements. This may impose restrictions on tiling, alignment, and memory placement to guarantee proper display functionality.
DRM_XE_GEM_CREATE_FLAG_NEEDS_VISIBLE_VRAM - When using VRAM as a possible placement, ensure that the corresponding VRAM allocation will always use the CPU accessible part of VRAM. This is important for small-bar systems (on full-bar systems this gets turned into a noop). Note1: System memory can be used as an extra placement if the kernel should spill the allocation to system memory, if space can’t be made available in the CPU accessible part of VRAM (giving the same behaviour as the i915 interface, see I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS). Note2: For clear-color CCS surfaces the kernel needs to read the clear-color value stored in the buffer, and on discrete platforms we need to use VRAM for display surfaces, therefore the kernel requires setting this flag for such objects, otherwise an error is thrown on small-bar systems.

cpu_caching supports the following values:

DRM_XE_GEM_CPU_CACHING_WB - Allocate the pages with write-back caching. On iGPU this can’t be used for scanout surfaces. Currently not allowed for objects placed in VRAM.
DRM_XE_GEM_CPU_CACHING_WC - Allocate the pages as write-combined. This is uncached. Scanout surfaces should likely use this. All objects that can be placed in VRAM must use this.

This ioctl supports setting the following properties via the DRM_XE_GEM_CREATE_EXTENSION_SET_PROPERTY extension, which uses the generic drm_xe_ext_set_property struct:

DRM_XE_GEM_CREATE_SET_PROPERTY_PXP_TYPE - set the type of PXP session this object will be used with. Valid values are listed in enum drm_xe_pxp_session_type. DRM_XE_PXP_TYPE_NONE is the default behavior, so there is no need to explicitly set that. Objects used with session of type DRM_XE_PXP_TYPE_HWDRM will be marked as invalid if a PXP invalidation event occurs after their creation. Attempting to flip an invalid object will cause a black frame to be displayed instead. Submissions with invalid objects mapped in the VM will be rejected.
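Building the placement mask from the queried memory regions could be sketched as below. struct region_info is a cut-down, hypothetical mirror of the two struct drm_xe_mem_region fields that matter here, and placement_mask() is a hypothetical helper name; the memory class to match is passed in by the caller, so no enum values are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of struct drm_xe_mem_region. */
struct region_info {
    uint16_t mem_class;
    uint16_t instance;
};

/* Set bit "instance" for every region of the wanted class. Instances that
 * do not fit in the 32-bit placement field are skipped. */
static uint32_t placement_mask(const struct region_info *regions,
                               uint32_t count, uint16_t wanted_class)
{
    uint32_t mask = 0;

    assert(regions != NULL || count == 0);
    for (uint32_t i = 0; i < count; i++) {
        if (regions[i].mem_class != wanted_class)
            continue;
        if (regions[i].instance >= 32)
            continue;
        mask |= UINT32_C(1) << regions[i].instance;
    }
    return mask;
}
```

Because the mask is indexed by instance rather than by array position, the helper makes no assumption about the order in which regions are returned.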

struct drm_xe_gem_mmap_offset¶: Input of DRM_IOCTL_XE_GEM_MMAP_OFFSET

Definition:

struct drm_xe_gem_mmap_offset {
    __u64 extensions;
    __u32 handle;
#define DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER     (1 << 0)
    __u32 flags;
    __u64 offset;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
handle: Handle for the object being mapped.
flags: Flags
offset: The fake offset to use for subsequent mmap call
reserved: Reserved

Description

The flags can be:

DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER - For user to query special offset for use in mmap ioctl. Writing to the returned mmap address will generate a PCI memory barrier with low overhead (avoiding IOCTL call as well as writing to VRAM which would also add overhead), acting like an MI_MEM_FENCE instruction.

Note

The mmap size can be at most 4K, due to HW limitations. As a result this interface is only supported on CPU architectures that support 4K page size. The mmap_offset ioctl will detect this and gracefully return an error, where userspace is expected to have a different fallback method for triggering a barrier.

Roughly the usage would be as follows:

struct drm_xe_gem_mmap_offset mmo = {
    .handle = 0, // must be set to 0
    .flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER,
};

err = ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo);
map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo.offset);
map[i] = 0xdeadbeaf; // issue barrier

struct drm_xe_vm_create¶: Input of DRM_IOCTL_XE_VM_CREATE

Definition:

struct drm_xe_vm_create {
    __u64 extensions;
#define DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE      (1 << 0)
#define DRM_XE_VM_CREATE_FLAG_LR_MODE           (1 << 1)
#define DRM_XE_VM_CREATE_FLAG_FAULT_MODE        (1 << 2)
    __u32 flags;
    __u32 vm_id;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
flags: Flags
vm_id: Returned VM ID
reserved: Reserved

Description

The flags can be:

DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE - Map the whole virtual address space of the VM to scratch page. A vm_bind would overwrite the scratch page mapping. This flag is mutually exclusive with the DRM_XE_VM_CREATE_FLAG_FAULT_MODE flag, except on xe2 and xe3 platforms.
DRM_XE_VM_CREATE_FLAG_LR_MODE - An LR, or Long Running VM accepts exec submissions to its exec_queues that don’t have an upper time limit on the job execution time. But exec submissions to these don’t allow any of the sync types DRM_XE_SYNC_TYPE_SYNCOBJ, DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ, used as out-syncobjs, that is, together with sync flag DRM_XE_SYNC_FLAG_SIGNAL. LR VMs can be created in recoverable page-fault mode using DRM_XE_VM_CREATE_FLAG_FAULT_MODE, if the device supports it. If that flag is omitted, the UMD can not rely on the slightly different per-VM overcommit semantics that are enabled by DRM_XE_VM_CREATE_FLAG_FAULT_MODE (see below), but KMD may still enable recoverable pagefaults if supported by the device.
DRM_XE_VM_CREATE_FLAG_FAULT_MODE - Requires also DRM_XE_VM_CREATE_FLAG_LR_MODE. It allows memory to be allocated on demand when accessed, and also allows per-VM overcommit of memory. The xe driver internally uses recoverable pagefaults to implement this.
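The flag combinations described above can be expressed as a small userspace pre-check. This is only a sketch of the documented rules, not the kernel's validation: the VM_FLAG_* macros are local stand-ins carrying the bit values shown in the struct definition, vm_create_flags_valid() is a hypothetical name, and the scratch_with_fault_ok parameter is a hypothetical way for the caller to express the platform exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the DRM_XE_VM_CREATE_FLAG_* bits. */
#define VM_FLAG_SCRATCH_PAGE (1u << 0)
#define VM_FLAG_LR_MODE      (1u << 1)
#define VM_FLAG_FAULT_MODE   (1u << 2)

/* Returns true if the combination follows the documented rules; on
 * failure *why points at a short explanation. */
static bool vm_create_flags_valid(uint32_t flags, bool scratch_with_fault_ok,
                                  const char **why)
{
    const uint32_t known =
        VM_FLAG_SCRATCH_PAGE | VM_FLAG_LR_MODE | VM_FLAG_FAULT_MODE;

    assert(why != NULL);
    *why = NULL;
    if (flags & ~known) {
        *why = "unknown flag bits set";
        return false;
    }
    if ((flags & VM_FLAG_FAULT_MODE) && !(flags & VM_FLAG_LR_MODE)) {
        *why = "FAULT_MODE requires LR_MODE";
        return false;
    }
    if ((flags & VM_FLAG_FAULT_MODE) && (flags & VM_FLAG_SCRATCH_PAGE) &&
        !scratch_with_fault_ok) {
        *why = "SCRATCH_PAGE and FAULT_MODE are mutually exclusive here";
        return false;
    }
    return true;
}
```

A pre-check like this only saves a round trip; the ioctl's own return value remains the authority on what a given device accepts.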

struct drm_xe_vm_destroy¶: Input of DRM_IOCTL_XE_VM_DESTROY

Definition:

struct drm_xe_vm_destroy {
    __u32 vm_id;
    __u32 pad;
    __u64 reserved[2];
};

Members

vm_id: VM ID
pad: MBZ
reserved: Reserved

struct drm_xe_vm_bind_op¶: run bind operations

Definition:

struct drm_xe_vm_bind_op {
    __u64 extensions;
    __u32 obj;
    __u16 pat_index;
    __u16 pad;
    union {
        __u64 obj_offset;
        __u64 userptr;
        __s64 cpu_addr_mirror_offset;
    };
    __u64 range;
    __u64 addr;
#define DRM_XE_VM_BIND_OP_MAP           0x0
#define DRM_XE_VM_BIND_OP_UNMAP         0x1
#define DRM_XE_VM_BIND_OP_MAP_USERPTR   0x2
#define DRM_XE_VM_BIND_OP_UNMAP_ALL     0x3
#define DRM_XE_VM_BIND_OP_PREFETCH      0x4
    __u32 op;
#define DRM_XE_VM_BIND_FLAG_READONLY            (1 << 0)
#define DRM_XE_VM_BIND_FLAG_IMMEDIATE           (1 << 1)
#define DRM_XE_VM_BIND_FLAG_NULL                (1 << 2)
#define DRM_XE_VM_BIND_FLAG_DUMPABLE            (1 << 3)
#define DRM_XE_VM_BIND_FLAG_CHECK_PXP           (1 << 4)
#define DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR     (1 << 5)
#define DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET   (1 << 6)
    __u32 flags;
#define DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC      -1
    __u32 prefetch_mem_region_instance;
    __u32 pad2;
    __u64 reserved[3];
};

Members

extensions

Pointer to the first extension struct, if any

obj

GEM object to operate on, MBZ for MAP_USERPTR, MBZ for UNMAP

pat_index

The platform defined pat_index to use for this mapping. The index basically maps to some predefined memory attributes, including things like caching, coherency, compression etc. The exact meaning of the pat_index is platform specific and defined in the Bspec and PRMs. When the KMD sets up the binding the index here is encoded into the ppGTT PTE.

For coherency the pat_index needs to be at least 1way coherent when drm_xe_gem_create.cpu_caching is DRM_XE_GEM_CPU_CACHING_WB. The KMD will extract the coherency mode from the pat_index and reject if there is a mismatch (see note below for pre-MTL platforms).

Note: On pre-MTL platforms there is only a caching mode and no explicit coherency mode, but on such hardware there is always a shared-LLC (or is dgpu) so all GT memory accesses are coherent with CPU caches even with the caching mode set as uncached. It’s only the display engine that is incoherent (on dgpu it must be in VRAM which is always mapped as WC on the CPU). However to keep the uapi somewhat consistent with newer platforms the KMD groups the different cache levels into the following coherency buckets on all pre-MTL platforms:

ppGTT UC -> COH_NONE
ppGTT WC -> COH_NONE
ppGTT WT -> COH_NONE
ppGTT WB -> COH_AT_LEAST_1WAY

In practice UC/WC/WT should only ever be used for scanout surfaces on such platforms (or perhaps in general for dma-buf if shared with another device) since it is only the display engine that is actually incoherent. Everything else should typically use WB given that we have a shared-LLC. On MTL+ this completely changes and the HW defines the coherency mode as part of the pat_index, where incoherent GT access is possible.

Note: For userptr and externally imported dma-buf the kernel expects either 1WAY or 2WAY for the pat_index.

For DRM_XE_VM_BIND_FLAG_NULL bindings there are no KMD restrictions on the pat_index. For such mappings there is no actual memory being mapped (the address in the PTE is invalid), so the various PAT memory attributes likely do not apply. Simply leaving as zero is one option (still a valid pat_index). Same applies to DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR bindings as for such mapping there is no actual memory being mapped.

pad

MBZ

{unnamed_union}

anonymous

obj_offset

Offset into the object, MBZ for CLEAR_RANGE, ignored for unbind

userptr

user pointer to bind on

cpu_addr_mirror_offset

Offset from GPU addr to create CPU address mirror mappings. MBZ with current level of support (e.g. 1 to 1 mapping between GPU and CPU mappings only supported).

range

Number of bytes from the object to bind to addr, MBZ for UNMAP_ALL

addr

Address to operate on, MBZ for UNMAP_ALL

op

Bind operation to perform

flags

Bind flags

prefetch_mem_region_instance

Memory region to prefetch VMA to. It is a region instance, not a mask. To be used only with DRM_XE_VM_BIND_OP_PREFETCH operation.

pad2

MBZ

reserved

Reserved

Description

The op can be:

DRM_XE_VM_BIND_OP_MAP
DRM_XE_VM_BIND_OP_UNMAP
DRM_XE_VM_BIND_OP_MAP_USERPTR
DRM_XE_VM_BIND_OP_UNMAP_ALL
DRM_XE_VM_BIND_OP_PREFETCH

and the flags can be:

DRM_XE_VM_BIND_FLAG_READONLY - Set up the page tables as read-only to ensure write protection
DRM_XE_VM_BIND_FLAG_IMMEDIATE - On a faulting VM, do the MAP operation immediately rather than deferring the MAP to the page fault handler. This is implied on a non-faulting VM as there is no fault handler to defer to.
DRM_XE_VM_BIND_FLAG_NULL - When the NULL flag is set, the page tables are set up with a special bit which indicates writes are dropped and all reads return zero. In the future, the NULL flags will only be valid for DRM_XE_VM_BIND_OP_MAP operations, the BO handle MBZ, and the BO offset MBZ. This flag is intended to implement VK sparse bindings.
DRM_XE_VM_BIND_FLAG_CHECK_PXP - If the object is encrypted via PXP, reject the binding if the encryption key is no longer valid. This flag has no effect on BOs that are not marked as using PXP.
DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR - When the CPU address mirror flag is set, no mappings are created; rather, the range is reserved for CPU address mirroring, which will be populated on GPU page faults or prefetches. Only valid on VMs with DRM_XE_VM_CREATE_FLAG_FAULT_MODE set. The CPU address mirror flag is only valid for DRM_XE_VM_BIND_OP_MAP operations, the BO handle MBZ, and the BO offset MBZ.
DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET - Can be used in combination with DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR to reset madvises when the underlying CPU address space range is unmapped (typically with munmap(2) or brk(2)). The madvise values set with DRM_IOCTL_XE_MADVISE are reset to the values that were present immediately after the DRM_IOCTL_XE_VM_BIND. The reset GPU virtual address range is the intersection of the range bound using DRM_IOCTL_XE_VM_BIND and the virtual CPU address space range unmapped. This functionality is present to mimic the behaviour of CPU address space madvises set using madvise(2), which are typically reset on unmap.

Note

free(3) may or may not call munmap(2) and/or brk(2), and may thus not invoke autoreset. Neither will stack variables going out of scope. Therefore it’s recommended to always explicitly reset the madvises when freeing the memory backing a region used in a DRM_IOCTL_XE_MADVISE call.
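The autoreset range described for DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET is the intersection of the bound range and the unmapped CPU range. The sketch below only illustrates that arithmetic in userspace; struct va_range and va_range_intersect() are hypothetical names, not part of the uAPI, and ranges are assumed to be half-open [start, end).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: a half-open address range [start, end). */
struct va_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Compute the overlap of the bound range and the unmapped range.
 * Returns false (and leaves *out untouched) when they do not overlap.
 */
static bool va_range_intersect(struct va_range bound, struct va_range unmapped,
                               struct va_range *out)
{
    uint64_t start = bound.start > unmapped.start ? bound.start : unmapped.start;
    uint64_t end = bound.end < unmapped.end ? bound.end : unmapped.end;

    if (start >= end)
        return false;

    out->start = start;
    out->end = end;
    return true;
}
```

A userspace allocator could use such a helper to predict which madvise settings it has to re-apply after an munmap(2).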

The prefetch_mem_region_instance for DRM_XE_VM_BIND_OP_PREFETCH can also be:

DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, which ensures prefetching occurs in the memory region advised by madvise.

struct drm_xe_vm_bind¶: Input of DRM_IOCTL_XE_VM_BIND

Definition:

struct drm_xe_vm_bind {
    __u64 extensions;
    __u32 vm_id;
    __u32 exec_queue_id;
    __u32 pad;
    __u32 num_binds;
    union {
        struct drm_xe_vm_bind_op bind;
        __u64 vector_of_binds;
    };
    __u32 pad2;
    __u32 num_syncs;
    __u64 syncs;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
vm_id: The ID of the VM to bind to
exec_queue_id: exec_queue_id, must be of class DRM_XE_ENGINE_CLASS_VM_BIND and exec queue must have same vm_id. If zero, the default VM bind engine is used.
pad: MBZ
num_binds: number of binds in this IOCTL
{unnamed_union}: anonymous
bind: used if num_binds == 1
vector_of_binds: userptr to array of struct drm_xe_vm_bind_op if num_binds > 1
pad2: MBZ
num_syncs: number of syncs to wait on
syncs: pointer to struct drm_xe_sync array
reserved: Reserved

Description

Below is an example of a minimal use of drm_xe_vm_bind to asynchronously bind the buffer data at address BIND_ADDRESS to illustrate userptr. It can be synchronized by using the example provided for drm_xe_sync.

// obj_offset carries the CPU address of the userptr allocation
void *data = aligned_alloc(ALIGNMENT, BO_SIZE);
struct drm_xe_vm_bind bind = {
    .vm_id = vm,
    .num_binds = 1,
    .bind.obj = 0,
    .bind.obj_offset = to_user_pointer(data),
    .bind.range = BO_SIZE,
    .bind.addr = BIND_ADDRESS,
    .bind.op = DRM_XE_VM_BIND_OP_MAP_USERPTR,
    .bind.flags = 0,
    .num_syncs = 1,
    // syncs is a __u64, so the pointer is converted explicitly
    .syncs = to_user_pointer(&sync),
    .exec_queue_id = 0,
};
ioctl(fd, DRM_IOCTL_XE_VM_BIND, &bind);

struct drm_xe_exec_queue_create¶: Input of DRM_IOCTL_XE_EXEC_QUEUE_CREATE

Definition:

struct drm_xe_exec_queue_create {
#define DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY                0;
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY               0;
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE              1;
#define DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE               2;
    __u64 extensions;
    __u16 width;
    __u16 num_placements;
    __u32 vm_id;
#define DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT      (1 << 0);
    __u32 flags;
    __u32 exec_queue_id;
    __u64 instances;
    __u64 reserved[2];
};

Members

extensions

Pointer to the first extension struct, if any

width

submission width (number of BBs per exec) for this exec queue

num_placements

number of valid placements for this exec queue

vm_id

VM to use for this exec queue

flags

flags to use for this exec queue

exec_queue_id

Returned exec queue ID

instances

user pointer to a 2-d array of struct drm_xe_engine_class_instance

length = width (i) * num_placements (j)

index = j + i * width

reserved

Reserved

Description

This ioctl supports setting the following properties via the DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY extension, which uses the generic drm_xe_ext_set_property struct:

DRM_XE_EXEC_QUEUE_SET_PROPERTY_PRIORITY - set the queue priority. CAP_SYS_NICE is required to set a value above normal.

DRM_XE_EXEC_QUEUE_SET_PROPERTY_TIMESLICE - set the queue timeslice duration in microseconds.

DRM_XE_EXEC_QUEUE_SET_PROPERTY_PXP_TYPE - set the type of PXP session this queue will be used with. Valid values are listed in enum drm_xe_pxp_session_type. DRM_XE_PXP_TYPE_NONE is the default behavior, so there is no need to explicitly set that. When a queue of type DRM_XE_PXP_TYPE_HWDRM is created, the PXP default HWDRM session (XE_PXP_HWDRM_DEFAULT_SESSION) will be started, if it isn’t already running. The user is expected to query the PXP status via the query ioctl (see DRM_XE_DEVICE_QUERY_PXP_STATUS) and to wait for PXP to be ready before attempting to create a queue with this property. When a queue is created before PXP is ready, the ioctl will return -EBUSY if init is still in progress or -EIO if init failed. Given that going into a power-saving state kills PXP HWDRM sessions, runtime PM will be blocked while queues of this type are alive. All PXP queues will be killed if a PXP invalidation event occurs.

The example below shows how to use drm_xe_exec_queue_create to create a simple exec_queue (no parallel submission) of class DRM_XE_ENGINE_CLASS_RENDER.

struct drm_xe_engine_class_instance instance = {
    .engine_class = DRM_XE_ENGINE_CLASS_RENDER,
};
struct drm_xe_exec_queue_create exec_queue_create = {
     .extensions = 0,
     .vm_id = vm,
     .width = 1,
     .num_placements = 1,
     .instances = to_user_pointer(&instance),
};
ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);

Users can provide a hint to the kernel for workloads that demand a low latency profile. Note that this has an impact on power consumption. The hint is given by setting the DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT flag when creating the exec queue, as shown below:

struct drm_xe_exec_queue_create exec_queue_create = {
     .flags = DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT,
     .extensions = 0,
     .vm_id = vm,
     .width = 1,
     .num_placements = 1,
     .instances = to_user_pointer(&instance),
};
ioctl(fd, DRM_IOCTL_XE_EXEC_QUEUE_CREATE, &exec_queue_create);

struct drm_xe_exec_queue_destroy¶: Input of DRM_IOCTL_XE_EXEC_QUEUE_DESTROY

Definition:

struct drm_xe_exec_queue_destroy {
    __u32 exec_queue_id;
    __u32 pad;
    __u64 reserved[2];
};

Members

exec_queue_id: Exec queue ID
pad: MBZ
reserved: Reserved

struct drm_xe_exec_queue_get_property¶: Input of DRM_IOCTL_XE_EXEC_QUEUE_GET_PROPERTY

Definition:

struct drm_xe_exec_queue_get_property {
    __u64 extensions;
    __u32 exec_queue_id;
#define DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN      0;
    __u32 property;
    __u64 value;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
exec_queue_id: Exec queue ID
property: property to get
value: property value
reserved: Reserved

Description

The property can be:

DRM_XE_EXEC_QUEUE_GET_PROPERTY_BAN

struct drm_xe_sync¶: sync object

Definition:

struct drm_xe_sync {
    __u64 extensions;
#define DRM_XE_SYNC_TYPE_SYNCOBJ                0x0;
#define DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ       0x1;
#define DRM_XE_SYNC_TYPE_USER_FENCE             0x2;
    __u32 type;
#define DRM_XE_SYNC_FLAG_SIGNAL (1 << 0);
    __u32 flags;
    union {
        __u32 handle;
        __u64 addr;
    };
    __u64 timeline_value;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
type: Type of this sync object
flags: Sync Flags
{unnamed_union}: anonymous
handle: Handle for the object
addr: Address of user fence. When the sync is passed in via the exec IOCTL this is a GPU address in the VM. When the sync is passed in via the VM bind IOCTL this is a user pointer. In either case, it is the user’s responsibility that this address is present and mapped when the user fence is signalled. Must be qword aligned.
timeline_value: Input for the timeline sync object. Must be nonzero when used with DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ.
reserved: Reserved

Description

The type can be:

DRM_XE_SYNC_TYPE_SYNCOBJ
DRM_XE_SYNC_TYPE_TIMELINE_SYNCOBJ
DRM_XE_SYNC_TYPE_USER_FENCE

and the flags can be:

DRM_XE_SYNC_FLAG_SIGNAL

A minimal use of drm_xe_sync looks like this:

struct drm_xe_sync sync = {
    .flags = DRM_XE_SYNC_FLAG_SIGNAL,
    .type = DRM_XE_SYNC_TYPE_SYNCOBJ,
};
struct drm_syncobj_create syncobj_create = { 0 };
ioctl(fd, DRM_IOCTL_SYNCOBJ_CREATE, &syncobj_create);
sync.handle = syncobj_create.handle;
    ...
    use of &sync in drm_xe_exec or drm_xe_vm_bind
    ...
struct drm_syncobj_wait wait = {
    .handles = to_user_pointer(&sync.handle),
    .timeout_nsec = INT64_MAX,
    .count_handles = 1,
    .flags = 0,
    .first_signaled = 0,
    .pad = 0,
};
ioctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &wait);

struct drm_xe_exec¶: Input of DRM_IOCTL_XE_EXEC

Definition:

struct drm_xe_exec {
    __u64 extensions;
    __u32 exec_queue_id;
    __u32 num_syncs;
    __u64 syncs;
    __u64 address;
    __u16 num_batch_buffer;
    __u16 pad[3];
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
exec_queue_id: Exec queue ID for the batch buffer
num_syncs: Number of struct drm_xe_sync in array.
syncs: Pointer to struct drm_xe_sync array.
address: address of batch buffer if num_batch_buffer == 1 or an array of batch buffer addresses
num_batch_buffer: number of batch buffers in this exec, must match the width of the exec queue
pad: MBZ
reserved: Reserved

Description

This is an example to use drm_xe_exec for execution of the object at BIND_ADDRESS (see example in drm_xe_vm_bind) by an exec_queue (see example in drm_xe_exec_queue_create). It can be synchronized by using the example provided for drm_xe_sync.

struct drm_xe_exec exec = {
    .exec_queue_id = exec_queue,
    .syncs = to_user_pointer(&sync),
    .num_syncs = 1,
    .address = BIND_ADDRESS,
    .num_batch_buffer = 1,
};
ioctl(fd, DRM_IOCTL_XE_EXEC, &exec);

struct drm_xe_wait_user_fence¶: Input of DRM_IOCTL_XE_WAIT_USER_FENCE

Definition:

struct drm_xe_wait_user_fence {
    __u64 extensions;
    __u64 addr;
#define DRM_XE_UFENCE_WAIT_OP_EQ        0x0;
#define DRM_XE_UFENCE_WAIT_OP_NEQ       0x1;
#define DRM_XE_UFENCE_WAIT_OP_GT        0x2;
#define DRM_XE_UFENCE_WAIT_OP_GTE       0x3;
#define DRM_XE_UFENCE_WAIT_OP_LT        0x4;
#define DRM_XE_UFENCE_WAIT_OP_LTE       0x5;
    __u16 op;
#define DRM_XE_UFENCE_WAIT_FLAG_ABSTIME (1 << 0);
    __u16 flags;
    __u32 pad;
    __u64 value;
    __u64 mask;
    __s64 timeout;
    __u32 exec_queue_id;
    __u32 pad2;
    __u64 reserved[2];
};

Members

extensions

Pointer to the first extension struct, if any

addr

user pointer address to wait on, must be qword aligned

op

wait operation (type of comparison)

flags

wait flags

pad

MBZ

value

compare value

mask

comparison mask

timeout

how long to wait before bailing, value in nanoseconds. Without the DRM_XE_UFENCE_WAIT_FLAG_ABSTIME flag set (relative timeout), it contains the time to wait expressed in nanoseconds (the fence wait will expire at now() + timeout). When the DRM_XE_UFENCE_WAIT_FLAG_ABSTIME flag is set (absolute timeout), the wait will end at timeout (uses system MONOTONIC_CLOCK). Passing a negative timeout leads to a never-ending wait.

On relative timeout this value is updated with the time left (for restarting the call in case of signal delivery). On absolute timeout this value stays intact (a restarted call still expires at the same point in time).

exec_queue_id

exec_queue_id returned from xe_exec_queue_create_ioctl

pad2

MBZ

reserved

Reserved

Description

Wait on a user fence. XE will wake up on every HW engine interrupt in the instances list and check if the user fence is complete:

(*addr & MASK) OP (VALUE & MASK)

Returns to user on user fence completion or timeout.

The op can be:

DRM_XE_UFENCE_WAIT_OP_EQ
DRM_XE_UFENCE_WAIT_OP_NEQ
DRM_XE_UFENCE_WAIT_OP_GT
DRM_XE_UFENCE_WAIT_OP_GTE
DRM_XE_UFENCE_WAIT_OP_LT
DRM_XE_UFENCE_WAIT_OP_LTE

and the flags can be:

DRM_XE_UFENCE_WAIT_FLAG_ABSTIME
DRM_XE_UFENCE_WAIT_FLAG_SOFT_OP

The mask values can be for example:

0xffu for u8
0xffffu for u16
0xffffffffu for u32
0xffffffffffffffffu for u64
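The completion check (*addr & MASK) OP (VALUE & MASK) can be modelled in plain C, which is handy for unit-testing userspace code that polls a fence itself. This is a sketch, not the driver's implementation: the enum mirrors the op values from the definition above under hypothetical local names, and unsigned 64-bit comparison is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the op values from the definition above (illustration only). */
enum ufence_op {
    UFENCE_OP_EQ  = 0x0,
    UFENCE_OP_NEQ = 0x1,
    UFENCE_OP_GT  = 0x2,
    UFENCE_OP_GTE = 0x3,
    UFENCE_OP_LT  = 0x4,
    UFENCE_OP_LTE = 0x5,
};

/* Evaluate (fence & mask) OP (value & mask); unknown ops never pass. */
static bool ufence_condition_met(enum ufence_op op, uint64_t fence,
                                 uint64_t value, uint64_t mask)
{
    uint64_t lhs = fence & mask;
    uint64_t rhs = value & mask;

    switch (op) {
    case UFENCE_OP_EQ:  return lhs == rhs;
    case UFENCE_OP_NEQ: return lhs != rhs;
    case UFENCE_OP_GT:  return lhs > rhs;
    case UFENCE_OP_GTE: return lhs >= rhs;
    case UFENCE_OP_LT:  return lhs < rhs;
    case UFENCE_OP_LTE: return lhs <= rhs;
    }
    return false;
}

/* The fence address must be qword (8-byte) aligned. */
static bool ufence_addr_is_qword_aligned(const void *addr)
{
    return ((uintptr_t)addr & 7) == 0;
}
```

With a mask of 0xffu only the low byte takes part in the comparison, so a fence value of 0x1234 compares equal to 0x34.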

enum drm_xe_observation_type¶: Observation stream types

Constants

DRM_XE_OBSERVATION_TYPE_OA: OA observation stream type
DRM_XE_OBSERVATION_TYPE_EU_STALL: EU stall sampling observation stream type

enum drm_xe_observation_op¶: Observation stream ops

Constants

DRM_XE_OBSERVATION_OP_STREAM_OPEN: Open an observation stream
DRM_XE_OBSERVATION_OP_ADD_CONFIG: Add observation stream config
DRM_XE_OBSERVATION_OP_REMOVE_CONFIG: Remove observation stream config

struct drm_xe_observation_param¶: Input of DRM_XE_OBSERVATION

Definition:

struct drm_xe_observation_param {
    __u64 extensions;
    __u64 observation_type;
    __u64 observation_op;
    __u64 param;
};

Members

extensions: Pointer to the first extension struct, if any
observation_type: observation stream type, of enum drm_xe_observation_type
observation_op: observation stream op, of enum drm_xe_observation_op
param: Pointer to actual stream params

Description

The observation layer enables multiplexing observation streams of multiple types. The actual params for a particular stream operation are supplied via the param pointer (use __copy_from_user to get these params).

enum drm_xe_observation_ioctls¶: Observation stream fd ioctl’s

Constants

DRM_XE_OBSERVATION_IOCTL_ENABLE: Enable data capture for an observation stream
DRM_XE_OBSERVATION_IOCTL_DISABLE: Disable data capture for an observation stream
DRM_XE_OBSERVATION_IOCTL_CONFIG: Change observation stream configuration
DRM_XE_OBSERVATION_IOCTL_STATUS: Return observation stream status
DRM_XE_OBSERVATION_IOCTL_INFO: Return observation stream info

Description

Information exchanged between userspace and kernel for observation fd ioctl’s is stream type specific.

enum drm_xe_oa_unit_type¶: OA unit types

Constants

DRM_XE_OA_UNIT_TYPE_OAG: OAG OA unit. OAR/OAC are considered sub-types of OAG. For OAR/OAC, use OAG.
DRM_XE_OA_UNIT_TYPE_OAM: OAM OA unit
DRM_XE_OA_UNIT_TYPE_OAM_SAG: OAM_SAG OA unit

struct drm_xe_oa_unit¶: describe OA unit

Definition:

struct drm_xe_oa_unit {
    __u64 extensions;
    __u32 oa_unit_id;
    __u32 oa_unit_type;
    __u64 capabilities;
#define DRM_XE_OA_CAPS_BASE             (1 << 0);
#define DRM_XE_OA_CAPS_SYNCS            (1 << 1);
#define DRM_XE_OA_CAPS_OA_BUFFER_SIZE   (1 << 2);
#define DRM_XE_OA_CAPS_WAIT_NUM_REPORTS (1 << 3);
#define DRM_XE_OA_CAPS_OAM              (1 << 4);
    __u64 oa_timestamp_freq;
    __u64 reserved[4];
    __u64 num_engines;
    struct drm_xe_engine_class_instance eci[];
};

Members

extensions: Pointer to the first extension struct, if any
oa_unit_id: OA unit ID
oa_unit_type: OA unit type of drm_xe_oa_unit_type
capabilities: OA capabilities bit-mask
oa_timestamp_freq: OA timestamp freq
reserved: MBZ
num_engines: number of engines in eci array
eci: engines attached to this OA unit

struct drm_xe_query_oa_units¶: describe OA units

Definition:

struct drm_xe_query_oa_units {
    __u64 extensions;
    __u32 num_oa_units;
    __u32 pad;
    __u64 oa_units[];
};

Members

extensions: Pointer to the first extension struct, if any
num_oa_units: number of OA units returned in oa_units[]
pad: MBZ
oa_units: struct drm_xe_oa_unit array returned for this device. Written below as a u64 array to avoid problems with nested flexible arrays with some compilers

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_OA_UNITS, then the reply uses struct drm_xe_query_oa_units in .data.

OA unit properties for all OA units can be accessed using a code block such as the one below:

struct drm_xe_query_oa_units *qoa;
struct drm_xe_oa_unit *oau;
u8 *poau;

// malloc qoa and issue DRM_XE_DEVICE_QUERY_OA_UNITS. Then:
poau = (u8 *)&qoa->oa_units[0];
for (int i = 0; i < qoa->num_oa_units; i++) {
        oau = (struct drm_xe_oa_unit *)poau;
        // Access 'struct drm_xe_oa_unit' fields here
        poau += sizeof(*oau) + oau->num_engines * sizeof(oau->eci[0]);
}

enum drm_xe_oa_format_type¶: OA format types as specified in PRM/Bspec 52198/60942

Constants

DRM_XE_OA_FMT_TYPE_OAG: OAG report format
DRM_XE_OA_FMT_TYPE_OAR: OAR report format
DRM_XE_OA_FMT_TYPE_OAM: OAM report format
DRM_XE_OA_FMT_TYPE_OAC: OAC report format
DRM_XE_OA_FMT_TYPE_OAM_MPEC: OAM SAMEDIA or OAM MPEC report format
DRM_XE_OA_FMT_TYPE_PEC: PEC report format

enum drm_xe_oa_property_id¶: OA stream property id’s

Constants

DRM_XE_OA_PROPERTY_OA_UNIT_ID: ID of the OA unit on which to open the OA stream, see oa_unit_id in ‘struct drm_xe_query_oa_units’. Defaults to 0 if not provided.
DRM_XE_OA_PROPERTY_SAMPLE_OA: A value of 1 requests inclusion of raw OA unit reports or stream samples in a global buffer attached to an OA unit.
DRM_XE_OA_PROPERTY_OA_METRIC_SET: OA metrics defining contents of OA reports, previously added via DRM_XE_OBSERVATION_OP_ADD_CONFIG.
DRM_XE_OA_PROPERTY_OA_FORMAT: OA counter report format
DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENT: Requests periodic OA unit sampling with sampling period proportional to 2^(period_exponent + 1)
DRM_XE_OA_PROPERTY_OA_DISABLED: A value of 1 will open the OA stream in a DISABLED state (see DRM_XE_OBSERVATION_IOCTL_ENABLE).
DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID: Open the stream for a specific exec_queue_id. OA queries can be executed on this exec queue.
DRM_XE_OA_PROPERTY_OA_ENGINE_INSTANCE: Optional engine instance to pass along with DRM_XE_OA_PROPERTY_EXEC_QUEUE_ID or will default to 0.
DRM_XE_OA_PROPERTY_NO_PREEMPT: Allow preemption and timeslicing to be disabled for the stream exec queue.
DRM_XE_OA_PROPERTY_NUM_SYNCS: Number of syncs in the sync array specified in DRM_XE_OA_PROPERTY_SYNCS
DRM_XE_OA_PROPERTY_SYNCS: Pointer to struct drm_xe_sync array with array size specified via DRM_XE_OA_PROPERTY_NUM_SYNCS. OA configuration will wait till input fences signal. Output fences will signal after the new OA configuration takes effect. For DRM_XE_SYNC_TYPE_USER_FENCE, addr is a user pointer, similar to the VM bind case.
DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE: Size of OA buffer to be allocated by the driver in bytes. Supported sizes are powers of 2 from 128 KiB to 128 MiB. When not specified, a 16 MiB OA buffer is allocated by default.
DRM_XE_OA_PROPERTY_WAIT_NUM_REPORTS: Number of reports to wait for before unblocking poll or read

Description

Stream params are specified as a chain of drm_xe_ext_set_property struct’s, with property values from enum drm_xe_oa_property_id and drm_xe_user_extension base.name set to DRM_XE_OA_EXTENSION_SET_PROPERTY. param field in struct drm_xe_observation_param points to the first drm_xe_ext_set_property struct.

Exactly the same mechanism is also used for stream reconfiguration using the DRM_XE_OBSERVATION_IOCTL_CONFIG observation stream fd ioctl, though only a subset of the properties above can be specified for stream reconfiguration.
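Two of the property values lend themselves to a quick sanity check before they are placed in the property chain. The sketch below validates a DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE value and estimates the sampling period for a DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENT value. The helper names are hypothetical. It is an assumption (not stated on this page) that one report is produced every 2^(period_exponent + 1) ticks of oa_timestamp_freq, that the exponent does not exceed 31, and that both ends of the buffer size range are inclusive.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* True if size is a power of two between 128 KiB and 128 MiB inclusive. */
static bool oa_buffer_size_is_valid(uint64_t size)
{
    const uint64_t min = 128ULL << 10;
    const uint64_t max = 128ULL << 20;

    return size >= min && size <= max && (size & (size - 1)) == 0;
}

/*
 * Sampling period in ns, assuming one report every 2^(exponent + 1)
 * OA timestamp ticks. Returns 0 for inputs outside the assumed range.
 */
static uint64_t oa_period_ns(uint32_t period_exponent, uint64_t oa_timestamp_freq)
{
    if (period_exponent > 31 || oa_timestamp_freq == 0)
        return 0;

    /* 2ULL << 31 times NSEC_PER_SEC still fits in 64 bits. */
    return (2ULL << period_exponent) * NSEC_PER_SEC / oa_timestamp_freq;
}
```

The oa_timestamp_freq argument corresponds to the field of the same name in struct drm_xe_oa_unit.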

struct drm_xe_oa_config¶: OA metric configuration

Definition:

struct drm_xe_oa_config {
    __u64 extensions;
    char uuid[36];
    __u32 n_regs;
    __u64 regs_ptr;
};

Members

extensions: Pointer to the first extension struct, if any
uuid: String formatted like “%08x-%04x-%04x-%04x-%012x”
n_regs: Number of regs in regs_ptr
regs_ptr: Pointer to (register address, value) pairs for OA config registers. Expected length of buffer is: (2 * sizeof(u32) * n_regs).

Description

Multiple OA configs can be added using DRM_XE_OBSERVATION_OP_ADD_CONFIG. A particular config can be specified when opening an OA stream using DRM_XE_OA_PROPERTY_OA_METRIC_SET property.
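Note that uuid is exactly 36 bytes, so a UUID in the documented format fills it with no room for a terminating NUL. The sketch below formats into a temporary buffer and copies 36 bytes, and also computes the expected regs_ptr buffer length. The helper names are hypothetical, and using %012llx for the final 48-bit group is an assumption made so that the value fits a standard C type.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define OA_UUID_LEN 36 /* matches char uuid[36]; no terminating NUL */

/* Format a UUID as 8-4-4-4-12 hex digits into a 36-byte, non-terminated field. */
static int oa_format_uuid(char out[OA_UUID_LEN], uint32_t a, uint16_t b,
                          uint16_t c, uint16_t d, uint64_t e)
{
    char tmp[OA_UUID_LEN + 1];
    int n = snprintf(tmp, sizeof(tmp), "%08x-%04x-%04x-%04x-%012llx",
                     (unsigned int)a, (unsigned int)b, (unsigned int)c,
                     (unsigned int)d,
                     (unsigned long long)(e & 0xffffffffffffULL));

    if (n != OA_UUID_LEN)
        return -1;

    memcpy(out, tmp, OA_UUID_LEN);
    return 0;
}

/* Expected byte length of the (address, value) pair buffer behind regs_ptr. */
static size_t oa_regs_buf_len(uint32_t n_regs)
{
    return 2 * sizeof(uint32_t) * (size_t)n_regs;
}
```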

struct drm_xe_oa_stream_status¶: OA stream status returned from DRM_XE_OBSERVATION_IOCTL_STATUS observation stream fd ioctl. Userspace can call the ioctl to query stream status in response to EIO errno from observation fd read().

Definition:

struct drm_xe_oa_stream_status {
    __u64 extensions;
    __u64 oa_status;
#define DRM_XE_OASTATUS_MMIO_TRG_Q_FULL         (1 << 3);
#define DRM_XE_OASTATUS_COUNTER_OVERFLOW        (1 << 2);
#define DRM_XE_OASTATUS_BUFFER_OVERFLOW         (1 << 1);
#define DRM_XE_OASTATUS_REPORT_LOST             (1 << 0);
    __u64 reserved[3];
};

Members

extensions: Pointer to the first extension struct, if any
oa_status: OA stream status (see Bspec 46717/61226)
reserved: reserved for future use
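After an EIO from read(), the oa_status bits tell userspace what went wrong. A minimal decoding sketch follows. The local OASTATUS_* macros mirror the bit positions listed in the definition above, and oa_status_to_string() is a hypothetical helper, not part of the uAPI.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local mirrors of the bit positions in the definition above (illustration only). */
#define OASTATUS_REPORT_LOST      (1ULL << 0)
#define OASTATUS_BUFFER_OVERFLOW  (1ULL << 1)
#define OASTATUS_COUNTER_OVERFLOW (1ULL << 2)
#define OASTATUS_MMIO_TRG_Q_FULL  (1ULL << 3)

/* Write a comma-separated list of the set status bits; returns the string length. */
static size_t oa_status_to_string(uint64_t oa_status, char *buf, size_t len)
{
    static const struct {
        uint64_t bit;
        const char *name;
    } flags[] = {
        { OASTATUS_REPORT_LOST, "REPORT_LOST" },
        { OASTATUS_BUFFER_OVERFLOW, "BUFFER_OVERFLOW" },
        { OASTATUS_COUNTER_OVERFLOW, "COUNTER_OVERFLOW" },
        { OASTATUS_MMIO_TRG_Q_FULL, "MMIO_TRG_Q_FULL" },
    };
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
        if (!(oa_status & flags[i].bit))
            continue;

        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "," : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0'; /* out of space: keep only what fit completely */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```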

struct drm_xe_oa_stream_info¶: OA stream info returned from DRM_XE_OBSERVATION_IOCTL_INFO observation stream fd ioctl

Definition:

struct drm_xe_oa_stream_info {
    __u64 extensions;
    __u64 oa_buf_size;
    __u64 reserved[3];
};

Members

extensions: Pointer to the first extension struct, if any
oa_buf_size: OA buffer size
reserved: reserved for future use

enum drm_xe_pxp_session_type¶: Supported PXP session types.

Constants

DRM_XE_PXP_TYPE_NONE: PXP not used
DRM_XE_PXP_TYPE_HWDRM: HWDRM sessions are used for content that ends up on the display.

Description

We currently only support HWDRM sessions, which are used for protected content that ends up being displayed, but the HW supports multiple types, so we might extend support in the future.

enum drm_xe_eu_stall_property_id¶: EU stall sampling input property ids.

Constants

DRM_XE_EU_STALL_PROP_GT_ID: gt_id of the GT on which EU stall data will be captured.
DRM_XE_EU_STALL_PROP_SAMPLE_RATE: Sampling rate in GPU cycles from sampling_rates in struct drm_xe_query_eu_stall
DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS: Minimum number of EU stall data reports to be present in the kernel buffer before unblocking a blocked poll or read.

Description

These properties are passed to the driver at open as a chain of drm_xe_ext_set_property structures with property set to these properties’ enums and value set to the corresponding values of these properties. drm_xe_user_extension base.name should be set to DRM_XE_EU_STALL_EXTENSION_SET_PROPERTY.

With the file descriptor obtained from open, user space must enable the EU stall stream fd with DRM_XE_OBSERVATION_IOCTL_ENABLE before calling read(). EIO errno from read() indicates HW dropped data due to full buffer.

struct drm_xe_query_eu_stall¶: Information about EU stall sampling.

Definition:

struct drm_xe_query_eu_stall {
    __u64 extensions;
    __u64 capabilities;
#define DRM_XE_EU_STALL_CAPS_BASE               (1 << 0);
    __u64 record_size;
    __u64 per_xecore_buf_size;
    __u64 reserved[5];
    __u64 num_sampling_rates;
    __u64 sampling_rates[];
};

Members

extensions: Pointer to the first extension struct, if any
capabilities: EU stall capabilities bit-mask
record_size: size of each EU stall data record
per_xecore_buf_size: internal per XeCore buffer size
reserved: Reserved
num_sampling_rates: Number of sampling rates in sampling_rates array
sampling_rates: Flexible array of sampling rates sorted in the fastest to slowest order. Sampling rates are specified in GPU clock cycles.

Description

If a query is made with a struct drm_xe_device_query where .query is equal to DRM_XE_DEVICE_QUERY_EU_STALL, then the reply uses struct drm_xe_query_eu_stall in .data.

struct drm_xe_madvise¶: Input of DRM_IOCTL_XE_MADVISE

Definition:

struct drm_xe_madvise {
    __u64 extensions;
    __u64 start;
    __u64 range;
    __u32 vm_id;
#define DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC     0;
#define DRM_XE_MEM_RANGE_ATTR_ATOMIC            1;
#define DRM_XE_MEM_RANGE_ATTR_PAT               2;
    __u32 type;
    union {
        struct {
#define DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE     0;
#define DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM     -1;
            __u32 devmem_fd;
#define DRM_XE_MIGRATE_ALL_PAGES                0;
#define DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES        1;
            __u16 migration_policy;
            __u16 pad;
            __u64 reserved;
        } preferred_mem_loc;
        struct {
#define DRM_XE_ATOMIC_UNDEFINED 0;
#define DRM_XE_ATOMIC_DEVICE    1;
#define DRM_XE_ATOMIC_GLOBAL    2;
#define DRM_XE_ATOMIC_CPU       3;
            __u32 val;
            __u32 pad;
            __u64 reserved;
        } atomic;
        struct {
            __u32 val;
            __u32 pad;
            __u64 reserved;
        } pat_index;
    };
    __u64 reserved[2];
};

Members

extensions

Pointer to the first extension struct, if any

start

start of the virtual address range

range

size of the virtual address range

vm_id

vm_id of the virtual range

type

type of attribute

{unnamed_union}

anonymous

preferred_mem_loc

preferred memory location

Used when type == DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC

Supported values for preferred_mem_loc.devmem_fd:

DRM_XE_PREFERRED_LOC_DEFAULT_DEVICE: set vram of fault tile as preferred loc
DRM_XE_PREFERRED_LOC_DEFAULT_SYSTEM: set smem as preferred loc

Supported values for preferred_mem_loc.migration_policy:

DRM_XE_MIGRATE_ALL_PAGES
DRM_XE_MIGRATE_ONLY_SYSTEM_PAGES

atomic

Atomic access policy

Used when type == DRM_XE_MEM_RANGE_ATTR_ATOMIC.

Supported values for atomic.val:

DRM_XE_ATOMIC_UNDEFINED: Undefined or default behaviour. Support both GPU and CPU atomic operations for system allocator. Support GPU atomic operations for normal(bo) allocator.
DRM_XE_ATOMIC_DEVICE: Support GPU atomic operations.
DRM_XE_ATOMIC_GLOBAL: Support both GPU and CPU atomic operations.
DRM_XE_ATOMIC_CPU: Support CPU atomic only, no GPU atomics supported.

pat_index

Page attribute table index

Used when type == DRM_XE_MEM_RANGE_ATTR_PAT.

reserved

Reserved

Description

This structure is used to set memory attributes for a virtual address range in a VM. The type of attribute is specified by type, and the corresponding union member is used to provide additional parameters for type.

Supported attribute types:

DRM_XE_MEM_RANGE_ATTR_PREFERRED_LOC: Set preferred memory location.
DRM_XE_MEM_RANGE_ATTR_ATOMIC: Set atomic access policy.
DRM_XE_MEM_RANGE_ATTR_PAT: Set page attribute table index.

Example

struct drm_xe_madvise madvise = {
     .vm_id = vm_id,
     .start = 0x100000,
     .range = 0x2000,
     .type = DRM_XE_MEM_RANGE_ATTR_ATOMIC,
     .atomic.val = DRM_XE_ATOMIC_DEVICE,
};

ioctl(fd, DRM_IOCTL_XE_MADVISE, &madvise);

struct drm_xe_mem_range_attr¶: Output of DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS

Definition:

struct drm_xe_mem_range_attr {
    __u64 extensions;
    __u64 start;
    __u64 end;
    struct {
        __u32 devmem_fd;
        __u32 migration_policy;
    } preferred_mem_loc;
    struct {
        __u32 val;
        __u32 reserved;
    } atomic;
    struct {
        __u32 val;
        __u32 reserved;
    } pat_index;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
start: start of the memory range
end: end of the memory range
preferred_mem_loc: preferred memory location
atomic: Atomic access policy
pat_index: Page attribute table index
reserved: Reserved

Description

This structure is provided by userspace and filled by KMD in response to the DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS ioctl. It describes the memory attributes of memory ranges within a user-specified address range in a VM.

The structure includes information such as atomic access policy, page attribute table (PAT) index, and preferred memory location. Userspace allocates an array of these structures and passes a pointer to the ioctl to retrieve the attributes of each memory range.

struct drm_xe_vm_query_mem_range_attr¶: Input of DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS

Definition:

struct drm_xe_vm_query_mem_range_attr {
    __u64 extensions;
    __u32 vm_id;
    __u32 num_mem_ranges;
    __u64 start;
    __u64 range;
    __u64 sizeof_mem_range_attr;
    __u64 vector_of_mem_attr;
    __u64 reserved[2];
};

Members

extensions: Pointer to the first extension struct, if any
vm_id: vm_id of the virtual range
num_mem_ranges: number of mem_ranges in range
start: start of the virtual address range
range: size of the virtual address range
sizeof_mem_range_attr: size of struct drm_xe_mem_range_attr
vector_of_mem_attr: userptr to array of struct drm_xe_mem_range_attr
reserved: Reserved

Description

This structure is used to query memory attributes of memory regions within a user specified address range in a VM. It provides detailed information about each memory range, including atomic access policy, page attribute table (PAT) index, and preferred memory location.

Userspace first calls the ioctl with num_mem_ranges = 0, sizeof_mem_range_attr = 0 and vector_of_mem_attr = NULL to retrieve the number of memory regions and the size of each memory range attribute. Then, it allocates a buffer of that size and calls the ioctl again to fill the buffer with memory range attributes.

If the second call fails with -ENOSPC, the memory ranges changed between the first call and the second. In that case, retry the ioctl with num_mem_ranges = 0, sizeof_mem_range_attr = 0 and vector_of_mem_attr = NULL, followed by the second ioctl call.

Example

struct drm_xe_vm_query_mem_range_attr query = {
     .vm_id = vm_id,
     .start = 0x100000,
     .range = 0x2000,
};

// First ioctl call to get num of mem regions and sizeof each attribute
ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);

// Allocate buffer for the memory region attributes
// (char * so that the pointer arithmetic below is standard C)
char *ptr = malloc(query.num_mem_ranges * query.sizeof_mem_range_attr);
char *ptr_start = ptr;

query.vector_of_mem_attr = (uintptr_t)ptr;

// Second ioctl call to actually fill the memory attributes
ioctl(fd, DRM_IOCTL_XE_VM_QUERY_MEM_RANGE_ATTRS, &query);

// Iterate over the returned memory region attributes
for (unsigned int i = 0; i < query.num_mem_ranges; ++i) {
   struct drm_xe_mem_range_attr *attr = (struct drm_xe_mem_range_attr *)ptr;

   // Do something with attr

   // Move pointer by one entry, using the size reported by the kernel
   ptr += query.sizeof_mem_range_attr;
}

free(ptr_start);

drm/asahi uAPI¶

Introduction to the Asahi UAPI

This documentation describes the Asahi IOCTLs.

Just a few generic rules about the data passed to the Asahi IOCTLs (cribbed from Panthor):

Structures must be aligned on 64-bit/8-byte boundaries. If the object is not naturally aligned, a padding field must be added.
Fields must be explicitly aligned to their natural type alignment with pad[0..N] fields.
All padding fields will be checked by the driver to make sure they are zeroed.
Flags can be added, but not removed/replaced.
New fields can be added to the main structures (the structures directly passed to the ioctl). Those fields can be added at the end of the structure, or replace existing padding fields. Any new field being added must preserve the behavior that existed before those fields were added when a value of zero is passed.
New fields can be added to indirect objects (objects pointed to by the main structure), iff those objects are passed a size to reflect the size known by the userspace driver (see drm_asahi_cmd_header::size).
If the kernel driver is too old to know some fields, those will be ignored if zero, and otherwise rejected (and so will be zero on output).
If userspace is too old to know some fields, those will be zeroed (input) before the structure is parsed by the kernel driver.
Each new flag/field addition must come with a driver version update so the userspace driver doesn’t have to guess which flags are supported.
Structures should not contain unions, as this would defeat the extensibility of such structures.
IOCTLs can’t be removed or replaced. New IOCTL IDs should be placed at the end of the drm_asahi_ioctl_id enum.
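The alignment and padding rules above can be checked at compile time. The sketch below uses a made-up structure (struct example_asahi_args is hypothetical and not part of the Asahi uAPI) and <stdint.h> types in place of the kernel's __u32/__u64 so that it builds on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ioctl argument, not part of the real uAPI. */
struct example_asahi_args {
    uint32_t handle;
    uint16_t kind;
    uint16_t pad0;   /* explicit padding so 'offset' is naturally aligned; MBZ */
    uint64_t offset;
    uint32_t flags;
    uint32_t pad1;   /* explicit tail padding keeps the size a multiple of 8; MBZ */
};

/* The structure size must be a multiple of 8 bytes. */
_Static_assert(sizeof(struct example_asahi_args) % 8 == 0,
               "structure size must be a multiple of 8");
/* Each field must sit at its natural alignment. */
_Static_assert(offsetof(struct example_asahi_args, offset) % 8 == 0,
               "64-bit field must be 8-byte aligned");
/* The sum of the field sizes equals sizeof, so the compiler added no padding. */
_Static_assert(sizeof(struct example_asahi_args) == 4 + 2 + 2 + 8 + 4 + 4,
               "no implicit padding allowed");

/* Mimics the rule that every padding field must be zero. */
static int example_asahi_args_pad_ok(const struct example_asahi_args *args)
{
    return args->pad0 == 0 && args->pad1 == 0;
}
```

Zero-initialising the whole structure before filling it in is the simplest way to satisfy the padding check.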

enum drm_asahi_ioctl_id¶: IOCTL IDs

Constants

DRM_ASAHI_GET_PARAMS: Query device properties.
DRM_ASAHI_GET_TIME: Query device time.
DRM_ASAHI_VM_CREATE: Create a GPU VM address space.
DRM_ASAHI_VM_DESTROY: Destroy a VM.
DRM_ASAHI_VM_BIND: Bind/unbind memory to a VM.
DRM_ASAHI_GEM_CREATE: Create a buffer object.
DRM_ASAHI_GEM_MMAP_OFFSET: Get offset to pass to mmap() to map a given GEM handle.
DRM_ASAHI_GEM_BIND_OBJECT: Bind memory as a special object
DRM_ASAHI_QUEUE_CREATE: Create a scheduling queue.
DRM_ASAHI_QUEUE_DESTROY: Destroy a scheduling queue.
DRM_ASAHI_SUBMIT: Submit commands to a queue.

Description

Place new ioctls at the end, don’t re-order, don’t replace or remove entries.

These IDs are not meant to be used directly. Use the DRM_IOCTL_ASAHI_xxx definitions instead.

struct drm_asahi_params_global¶: Global parameters.

Definition:

struct drm_asahi_params_global {
    __u64 features;
    __u32 gpu_generation;
    __u32 gpu_variant;
    __u32 gpu_revision;
    __u32 chip_id;
    __u32 num_dies;
    __u32 num_clusters_total;
    __u32 num_cores_per_cluster;
    __u32 max_frequency_khz;
    __u64 core_masks[DRM_ASAHI_MAX_CLUSTERS];
    __u64 vm_start;
    __u64 vm_end;
    __u64 vm_kernel_min_size;
    __u32 max_commands_per_submission;
    __u32 max_attachments;
    __u64 command_timestamp_frequency_hz;
};

Members

features

Feature bits from drm_asahi_feature

gpu_generation

GPU generation, e.g. 13 for G13G

gpu_variant

GPU variant as a character, e.g. ‘C’ for G13C

gpu_revision

GPU revision in BCD, e.g. 0x00 for ‘A0’ or 0x21 for ‘C1’

chip_id

Chip ID in BCD, e.g. 0x8103 for T8103

num_dies

Number of dies in the SoC

num_clusters_total

Number of GPU clusters (across all dies)

num_cores_per_cluster

Number of logical cores per cluster (including inactive/nonexistent)

max_frequency_khz

Maximum GPU core clock frequency

core_masks

Bitmask of present/enabled cores per cluster

vm_start

VM range start VMA. Together with vm_end, this defines the window of valid GPU VAs. Userspace is expected to subdivide VAs out of this window.

This window contains all virtual addresses that userspace needs to know about. There may be kernel-internal GPU VAs outside this range, but that detail is not relevant here.

vm_end

VM range end VMA

vm_kernel_min_size

Minimum kernel VMA window size.

When creating a VM, userspace is required to carve out a section of virtual addresses (within the range given by vm_start and vm_end). The kernel will allocate various internal structures within the specified VA range.

Allowing userspace to choose the VA range for the kernel, rather than the kernel reserving VAs and requiring userspace to cope, can assist in implementing SVM.

max_commands_per_submission

Maximum number of supported commands per submission. This mirrors firmware limits. Userspace must split up larger command buffers, which may require inserting additional synchronization.

max_attachments

Maximum number of drm_asahi_attachment structures per command

command_timestamp_frequency_hz

Timebase frequency for timestamps written during command execution, specified via drm_asahi_timestamp structures. As this rate is controlled by the firmware, it is a queryable parameter.

Userspace must divide by this frequency to convert timestamps to seconds, rather than hardcoding a particular firmware’s rate.

Description

This struct may be queried by drm_asahi_get_params.
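As an illustration of two of the fields above, the following sketch decodes gpu_revision and converts a timestamp using command_timestamp_frequency_hz. The helper names are hypothetical. The nibble layout of gpu_revision is inferred only from the two examples given (0x00 for 'A0', 0x21 for 'C1'), so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode gpu_revision into a two-character name, e.g. 0x21 -> "C1".
 * Assumption: high nibble selects the letter, low nibble the digit.
 */
static void sketch_revision_name(uint32_t gpu_revision, char out[3])
{
    out[0] = (char)('A' + ((gpu_revision >> 4) & 0xf));
    out[1] = (char)('0' + (gpu_revision & 0xf));
    out[2] = '\0';
}

/*
 * Convert timestamp ticks to nanoseconds using the queried frequency
 * instead of a hardcoded rate. Returns 0 if frequency_hz is 0.
 * Splitting into whole seconds and a remainder avoids overflowing the
 * intermediate product for large tick counts.
 */
static uint64_t sketch_ticks_to_ns(uint64_t ticks, uint64_t frequency_hz)
{
    const uint64_t ns_per_sec = 1000000000ull;

    if (frequency_hz == 0)
        return 0;

    uint64_t secs = ticks / frequency_hz;
    uint64_t rem = ticks % frequency_hz;

    return secs * ns_per_sec + (rem * ns_per_sec) / frequency_hz;
}
```

With a hypothetical 24 MHz timebase, 12 ticks would convert to 500 ns.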

enum drm_asahi_feature¶: Feature bits

Constants

DRM_ASAHI_FEATURE_SOFT_FAULTS

GPU has “soft fault” enabled. Shader loads of unmapped memory will return zero. Shader stores to unmapped memory will be silently discarded. Note that only shader load/store is affected. Other hardware units are not affected, notably including texture sampling.

Soft fault is set when initializing the GPU and cannot be runtime toggled. Therefore, it is exposed as a feature bit and not a userspace-settable flag on the VM. When soft fault is enabled, userspace can speculate memory accesses more aggressively.

Description

This covers only features that userspace cannot infer from the architecture version. Most features don’t need to be here.

struct drm_asahi_get_params¶: Arguments passed to DRM_IOCTL_ASAHI_GET_PARAMS

Definition:

struct drm_asahi_get_params {
    __u32 param_group;
    __u32 pad;
    __u64 pointer;
    __u64 size;
};

Members

param_group: Parameter group to fetch (MBZ)
pad: MBZ
pointer: User pointer to write parameter struct
size: Size of the user buffer. In case of older userspace, this may be less than sizeof(struct drm_asahi_params_global). The kernel will not write past the length specified here, allowing extensibility.
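A minimal sketch of how userspace might fill this structure before issuing the ioctl. struct sketch_get_params is a local mirror of the definition above so that the example compiles without the uAPI header, and sketch_prepare_get_params() is a hypothetical helper. The ioctl call itself is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct drm_asahi_get_params as documented above. */
struct sketch_get_params {
    uint32_t param_group;
    uint32_t pad;
    uint64_t pointer;
    uint64_t size;
};

static_assert(sizeof(struct sketch_get_params) == 24, "layout mismatch");

/*
 * Describe an output buffer of len bytes. The buffer is zeroed first so
 * that any bytes the kernel does not write read back as zero.
 */
static struct sketch_get_params sketch_prepare_get_params(void *buf, size_t len)
{
    struct sketch_get_params args;

    memset(buf, 0, len);
    memset(&args, 0, sizeof(args)); /* param_group and pad are MBZ */
    args.pointer = (uint64_t)(uintptr_t)buf;
    args.size = len;
    return args;
}
```

Passing the size of the structure that userspace was compiled against lets the kernel avoid writing past it.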

struct drm_asahi_vm_create¶: Arguments passed to DRM_IOCTL_ASAHI_VM_CREATE

Definition:

struct drm_asahi_vm_create {
    __u64 kernel_start;
    __u64 kernel_end;
    __u32 vm_id;
    __u32 pad;
};

Members

kernel_start

Start of the kernel-reserved address range. See drm_asahi_params_global::vm_kernel_min_size.

Both kernel_start and kernel_end must be within the range of valid VAs given by drm_asahi_params_global::vm_start and drm_asahi_params_global::vm_end. The size of the kernel range (kernel_end - kernel_start) must be at least drm_asahi_params_global::vm_kernel_min_size.

Userspace must not bind any memory on this VM into this reserved range; it is for kernel use only.

kernel_end

End of the kernel-reserved address range. See kernel_start.

vm_id

Returned VM ID

pad

MBZ
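The constraints on kernel_start and kernel_end can be checked before calling the ioctl. The sketch below uses hypothetical helper names. It assumes that kernel_end may equal vm_end, and placing the carveout at the top of the window is an arbitrary choice made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Check a proposed kernel range against the queried global parameters. */
static bool sketch_kernel_range_ok(uint64_t vm_start, uint64_t vm_end,
                                   uint64_t vm_kernel_min_size,
                                   uint64_t kernel_start, uint64_t kernel_end)
{
    if (kernel_end < kernel_start)
        return false;
    if (kernel_start < vm_start || kernel_end > vm_end)
        return false;
    return kernel_end - kernel_start >= vm_kernel_min_size;
}

/*
 * Carve the minimum-size kernel range out of the top of the VA window.
 * Returns false if the window is too small.
 */
static bool sketch_carve_kernel_range(uint64_t vm_start, uint64_t vm_end,
                                      uint64_t vm_kernel_min_size,
                                      uint64_t *kernel_start,
                                      uint64_t *kernel_end)
{
    if (vm_end < vm_start || vm_end - vm_start < vm_kernel_min_size)
        return false;

    *kernel_end = vm_end;
    *kernel_start = vm_end - vm_kernel_min_size;
    return true;
}
```

Userspace then allocates its own VAs only from the remainder of the window.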

struct drm_asahi_vm_destroy¶: Arguments passed to DRM_IOCTL_ASAHI_VM_DESTROY

Definition:

struct drm_asahi_vm_destroy {
    __u32 vm_id;
    __u32 pad;
};

Members

vm_id: VM ID to be destroyed
pad: MBZ

enum drm_asahi_gem_flags¶: Flags for GEM creation

Constants

DRM_ASAHI_GEM_WRITEBACK

BO should be CPU-mapped as writeback.

Map as writeback instead of write-combine. This optimizes for CPU reads.

DRM_ASAHI_GEM_VM_PRIVATE

BO is private to this GPU VM (no exports).

struct drm_asahi_gem_create¶: Arguments passed to DRM_IOCTL_ASAHI_GEM_CREATE

Definition:

struct drm_asahi_gem_create {
    __u64 size;
    __u32 flags;
    __u32 vm_id;
    __u32 handle;
    __u32 pad;
};

Members

size: Size of the BO
flags: Combination of drm_asahi_gem_flags flags.
vm_id: VM ID to assign to the BO, if DRM_ASAHI_GEM_VM_PRIVATE is set
handle: Returned GEM handle for the BO
pad: MBZ

struct drm_asahi_gem_mmap_offset¶: Arguments passed to DRM_IOCTL_ASAHI_GEM_MMAP_OFFSET

Definition:

struct drm_asahi_gem_mmap_offset {
    __u32 handle;
    __u32 flags;
    __u64 offset;
};

Members

handle: Handle for the object being mapped.
flags: Must be zero
offset: The fake offset to use for subsequent mmap call

enum drm_asahi_bind_flags¶: Flags for GEM binding

Constants

DRM_ASAHI_BIND_UNBIND

Instead of binding a GEM object to the range, simply unbind the GPU VMA range.

DRM_ASAHI_BIND_READ

Map BO with GPU read permission

DRM_ASAHI_BIND_WRITE

Map BO with GPU write permission

DRM_ASAHI_BIND_SINGLE_PAGE

Map a single page of the BO repeatedly across the VA range.

This is useful to fill a VA range with scratch pages or zero pages. It is intended as a mechanism to accelerate sparse resources.

struct drm_asahi_gem_bind_op¶: Description of a single GEM bind operation.

Definition:

struct drm_asahi_gem_bind_op {
    __u32 flags;
    __u32 handle;
    __u64 offset;
    __u64 range;
    __u64 addr;
};

Members

flags

Combination of drm_asahi_bind_flags flags.

handle

GEM object to bind (except for UNBIND)

offset

Offset into the object (except for UNBIND).

For a regular bind, this is the beginning of the region of the GEM object to bind.

For a single-page bind, this is the offset to the single page that will be repeatedly bound.

Must be page-size aligned.

range

Number of bytes to bind/unbind to addr.

Must be page-size aligned.

addr

Address to bind to.

Must be page-size aligned.
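The alignment requirements on offset, range and addr can be validated in userspace before submitting a bind. In this sketch, struct sketch_gem_bind_op mirrors the definition above, the helper names are hypothetical, and the page size is a parameter because this section does not state its value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of struct drm_asahi_gem_bind_op as documented above. */
struct sketch_gem_bind_op {
    uint32_t flags;
    uint32_t handle;
    uint64_t offset;
    uint64_t range;
    uint64_t addr;
};

static_assert(sizeof(struct sketch_gem_bind_op) == 32, "layout mismatch");

/* True if value is a multiple of page_size (a power of two). */
static bool sketch_is_aligned(uint64_t value, uint64_t page_size)
{
    return (value & (page_size - 1)) == 0;
}

/* Check that offset, range and addr are all page-size aligned. */
static bool sketch_bind_op_aligned(const struct sketch_gem_bind_op *op,
                                   uint64_t page_size)
{
    /* The mask test is only meaningful for a nonzero power of two. */
    if (page_size == 0 || (page_size & (page_size - 1)) != 0)
        return false;

    return sketch_is_aligned(op->offset, page_size) &&
           sketch_is_aligned(op->range, page_size) &&
           sketch_is_aligned(op->addr, page_size);
}
```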

struct drm_asahi_vm_bind¶: Arguments passed to DRM_IOCTL_ASAHI_VM_BIND

Definition:

struct drm_asahi_vm_bind {
    __u32 vm_id;
    __u32 num_binds;
    __u32 stride;
    __u32 pad;
    __u64 userptr;
};

Members

vm_id: The ID of the VM to bind to
num_binds: Number of binds in this IOCTL.
stride: Stride in bytes between consecutive binds. This allows extensibility of drm_asahi_gem_bind_op.
pad: MBZ
userptr: User pointer to an array of num_binds structures of type drm_asahi_gem_bind_op and size stride bytes.
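A sketch of how stride relates the array at userptr to its elements. The structs are local mirrors of the definitions above, and both helpers are hypothetical. sketch_bind_at() shows the address arithmetic a consumer of the array would use: element i starts i * stride bytes after userptr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the structures documented above. */
struct sketch_gem_bind_op {
    uint32_t flags;
    uint32_t handle;
    uint64_t offset;
    uint64_t range;
    uint64_t addr;
};

struct sketch_vm_bind {
    uint32_t vm_id;
    uint32_t num_binds;
    uint32_t stride;
    uint32_t pad;
    uint64_t userptr;
};

static_assert(sizeof(struct sketch_vm_bind) == 24, "layout mismatch");

/* Describe an array of bind ops using the element size known to userspace. */
static struct sketch_vm_bind
sketch_prepare_vm_bind(uint32_t vm_id, const struct sketch_gem_bind_op *ops,
                       uint32_t count)
{
    struct sketch_vm_bind args;

    memset(&args, 0, sizeof(args)); /* pad is MBZ */
    args.vm_id = vm_id;
    args.num_binds = count;
    args.stride = (uint32_t)sizeof(ops[0]);
    args.userptr = (uint64_t)(uintptr_t)ops;
    return args;
}

/* Address of bind i: elements are stride bytes apart. */
static const void *sketch_bind_at(const struct sketch_vm_bind *args, uint32_t i)
{
    return (const void *)(uintptr_t)(args->userptr +
                                     (uint64_t)i * args->stride);
}
```

Because the element size travels with the request, drm_asahi_gem_bind_op can grow without changing the layout of drm_asahi_vm_bind.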

enum drm_asahi_bind_object_op¶: Special object bind operation

Constants

DRM_ASAHI_BIND_OBJECT_OP_BIND: Bind a BO as a special GPU object
DRM_ASAHI_BIND_OBJECT_OP_UNBIND: Unbind a special GPU object

enum drm_asahi_bind_object_flags¶: Special object bind flags

Constants

DRM_ASAHI_BIND_OBJECT_USAGE_TIMESTAMPS: Map a BO as a timestamp buffer.

struct drm_asahi_gem_bind_object¶: Arguments passed to DRM_IOCTL_ASAHI_GEM_BIND_OBJECT

Definition:

struct drm_asahi_gem_bind_object {
    __u32 op;
    __u32 flags;
    __u32 handle;
    __u32 vm_id;
    __u64 offset;
    __u64 range;
    __u32 object_handle;
    __u32 pad;
};

Members

op: Bind operation (enum drm_asahi_bind_object_op)
flags: Combination of drm_asahi_bind_object_flags flags.
handle: GEM object to bind/unbind (BIND)
vm_id: The ID of the VM to operate on (MBZ currently)
offset: Offset into the object (BIND only)
range: Number of bytes to bind/unbind (BIND only)
object_handle: Object handle (out for BIND, in for UNBIND)
pad: MBZ

enum drm_asahi_cmd_type¶: Command type

Constants

DRM_ASAHI_CMD_RENDER

Render command, executing on the render subqueue. Combined vertex and fragment operation.

Followed by a drm_asahi_cmd_render payload.

DRM_ASAHI_CMD_COMPUTE

Compute command on the compute subqueue.

Followed by a drm_asahi_cmd_compute payload.

DRM_ASAHI_SET_VERTEX_ATTACHMENTS

Software command to set attachments for subsequent vertex shaders in the same submit.

Followed by (possibly multiple) drm_asahi_attachment payloads.

DRM_ASAHI_SET_FRAGMENT_ATTACHMENTS

Software command to set attachments for subsequent fragment shaders in the same submit.

Followed by (possibly multiple) drm_asahi_attachment payloads.

DRM_ASAHI_SET_COMPUTE_ATTACHMENTS

Software command to set attachments for subsequent compute shaders in the same submit.

Followed by (possibly multiple) drm_asahi_attachment payloads.

enum drm_asahi_priority¶: Scheduling queue priority.

Constants

DRM_ASAHI_PRIORITY_LOW

Low priority queue.

DRM_ASAHI_PRIORITY_MEDIUM

Medium priority queue.

DRM_ASAHI_PRIORITY_HIGH

High priority queue.

Reserved for future extension.

DRM_ASAHI_PRIORITY_REALTIME

Real-time priority queue.

Reserved for future extension.

Description

These priorities are forwarded to the firmware to influence firmware scheduling. The exact policy is ultimately decided by firmware, but these enums allow userspace to communicate the intentions.

struct drm_asahi_queue_create¶: Arguments passed to DRM_IOCTL_ASAHI_QUEUE_CREATE

Definition:

struct drm_asahi_queue_create {
    __u32 flags;
    __u32 vm_id;
    __u32 priority;
    __u32 queue_id;
    __u64 usc_exec_base;
};

Members

flags

MBZ

vm_id

The ID of the VM this queue is bound to

priority

One of drm_asahi_priority

queue_id

The returned queue ID

usc_exec_base

GPU base address for all USC binaries (shaders) on this queue. USC addresses are 32-bit relative to this 64-bit base.

This sets the following registers on all queue commands:

USC_EXEC_BASE_TA (vertex)
USC_EXEC_BASE_ISP (fragment)
USC_EXEC_BASE_CP (compute)

While the hardware lets us configure these independently per command, we do not have a use case for this. Instead, we expect userspace to fix a 4GiB VA carveout for USC memory and pass its base address here.
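The relationship between a 64-bit GPU VA and a 32-bit USC address can be sketched as follows. sketch_usc_offset() is a hypothetical helper. It computes only the plain offset and ignores any configuration bits that tagged USC pointers carry in their low bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert a GPU VA inside the USC carveout into a 32-bit USC address
 * relative to usc_exec_base. Returns false if the VA lies outside the
 * 4GiB window that starts at the base.
 */
static bool sketch_usc_offset(uint64_t usc_exec_base, uint64_t gpu_va,
                              uint32_t *usc_addr)
{
    if (gpu_va < usc_exec_base)
        return false;

    uint64_t delta = gpu_va - usc_exec_base;

    if (delta > UINT32_MAX)
        return false;

    *usc_addr = (uint32_t)delta;
    return true;
}
```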

struct drm_asahi_queue_destroy¶: Arguments passed to DRM_IOCTL_ASAHI_QUEUE_DESTROY

Definition:

struct drm_asahi_queue_destroy {
    __u32 queue_id;
    __u32 pad;
};

Members

queue_id: The queue ID to be destroyed
pad: MBZ

enum drm_asahi_sync_type¶: Sync item type

Constants

DRM_ASAHI_SYNC_SYNCOBJ: Binary sync object
DRM_ASAHI_SYNC_TIMELINE_SYNCOBJ: Timeline sync object

struct drm_asahi_sync¶: Sync item

Definition:

struct drm_asahi_sync {
    __u32 sync_type;
    __u32 handle;
    __u64 timeline_value;
};

Members

sync_type: One of drm_asahi_sync_type
handle: The sync object handle
timeline_value: Timeline value for timeline sync objects

DRM_ASAHI_BARRIER_NONE¶

DRM_ASAHI_BARRIER_NONE

Command index for no barrier

Description

This special value may be passed in to drm_asahi_cmd_header::vdm_barrier or drm_asahi_cmd_header::cdm_barrier to indicate that the respective subqueue should not wait on any previous work.

struct drm_asahi_cmd_header¶: Top level command structure

Definition:

struct drm_asahi_cmd_header {
    __u16 cmd_type;
    __u16 size;
    __u16 vdm_barrier;
    __u16 cdm_barrier;
};

Members

cmd_type

One of drm_asahi_cmd_type

size

Size of this command, not including this header.

For hardware commands, this enables extensibility of commands without requiring extra command types. Passing a command that is shorter than expected is explicitly allowed for backwards-compatibility. Truncated fields will be zeroed.

For the synthetic attachment setting commands, this implicitly encodes the number of attachments. These commands take multiple fixed-size drm_asahi_attachment structures as their payload, so size equals number of attachments * sizeof(struct drm_asahi_attachment).

vdm_barrier

VDM (render) command index to wait on.

Barriers are indices relative to the beginning of a given submit. A barrier of 0 waits on commands submitted to the respective subqueue in previous submit ioctls. A barrier of N waits on N previous commands on the subqueue within the current submit ioctl. As a special case, passing DRM_ASAHI_BARRIER_NONE avoids waiting on any commands in the subqueue.

Examples:

0: This waits on all previous work.

NONE: This does not wait for anything on this subqueue.

1: This waits on the first render command in the submit. This is valid only if there are multiple render commands in the same submit.

Barriers are valid only for hardware commands. Synthetic software commands to set attachments must pass NONE here.

cdm_barrier

CDM (compute) command index to wait on.

See vdm_barrier, and replace VDM/render with CDM/compute.

Description

This struct is core to the command buffer definition and therefore is not extensible.
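As a sketch of how size encodes the attachment count for the synthetic attachment commands, the helper below fills a header for a given number of attachments. The structs are local mirrors of the definitions in this document. SKETCH_BARRIER_NONE is an assumed stand-in for DRM_ASAHI_BARRIER_NONE, whose numeric value is not given here, and the command type is passed in by the caller for the same reason.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the structures documented in this section. */
struct sketch_cmd_header {
    uint16_t cmd_type;
    uint16_t size;
    uint16_t vdm_barrier;
    uint16_t cdm_barrier;
};

struct sketch_attachment {
    uint64_t pointer;
    uint64_t size;
    uint32_t pad;
    uint32_t flags;
};

static_assert(sizeof(struct sketch_cmd_header) == 8, "layout mismatch");
static_assert(sizeof(struct sketch_attachment) == 24, "layout mismatch");

/* ASSUMPTION: placeholder value, take the real one from the uAPI header. */
#define SKETCH_BARRIER_NONE 0xFFFFu

/*
 * Fill the header of an attachment-setting command carrying count
 * attachments. Returns false if the payload does not fit in the 16-bit
 * size field.
 */
static bool sketch_attachment_header(uint16_t cmd_type, size_t count,
                                     struct sketch_cmd_header *hdr)
{
    if (count > UINT16_MAX / sizeof(struct sketch_attachment))
        return false;

    hdr->cmd_type = cmd_type;
    hdr->size = (uint16_t)(count * sizeof(struct sketch_attachment));
    /* Software commands must not carry barriers. */
    hdr->vdm_barrier = SKETCH_BARRIER_NONE;
    hdr->cdm_barrier = SKETCH_BARRIER_NONE;
    return true;
}
```

A reader of the command recovers the count as size / sizeof(struct drm_asahi_attachment).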

struct drm_asahi_submit¶: Arguments passed to DRM_IOCTL_ASAHI_SUBMIT

Definition:

struct drm_asahi_submit {
    __u64 syncs;
    __u64 cmdbuf;
    __u32 flags;
    __u32 queue_id;
    __u32 in_sync_count;
    __u32 out_sync_count;
    __u32 cmdbuf_size;
    __u32 pad;
};

Members

syncs

An optional pointer to an array of drm_asahi_sync. The first in_sync_count elements are in-syncs, then the remaining out_sync_count elements are out-syncs. Using a single array with explicit partitioning simplifies handling.

cmdbuf

Pointer to the command buffer to submit.

This is a flat command buffer. By design, it contains no CPU pointers, which makes it suitable for a virtgpu wire protocol without requiring any serializing/deserializing step.

It consists of a series of commands. Each command begins with a fixed-size drm_asahi_cmd_header header and is followed by a variable-length payload according to the type and size in the header.

The combined count of “real” hardware commands must be nonzero and at most drm_asahi_params_global::max_commands_per_submission.

flags

Flags for command submission (MBZ)

queue_id

The queue ID to be submitted to

in_sync_count

Number of sync objects to wait on before starting this job.

out_sync_count

Number of sync objects to signal upon completion of this job.

cmdbuf_size

Command buffer size in bytes

pad

MBZ
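The flat command buffer layout described under cmdbuf can be walked as in this sketch, which advances header by header and checks that every payload fits inside cmdbuf_size. struct sketch_cmd_header mirrors drm_asahi_cmd_header and sketch_count_commands() is a hypothetical helper. It counts every command, so enforcing the limit on hardware commands would additionally require inspecting cmd_type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct drm_asahi_cmd_header as documented above. */
struct sketch_cmd_header {
    uint16_t cmd_type;
    uint16_t size;
    uint16_t vdm_barrier;
    uint16_t cdm_barrier;
};

static_assert(sizeof(struct sketch_cmd_header) == 8, "layout mismatch");

/*
 * Walk a flat command buffer. Returns the number of commands found, or
 * -1 if a header or payload runs past the end of the buffer.
 */
static int sketch_count_commands(const uint8_t *cmdbuf, size_t cmdbuf_size)
{
    size_t pos = 0;
    int count = 0;

    while (pos < cmdbuf_size) {
        struct sketch_cmd_header hdr;

        if (cmdbuf_size - pos < sizeof(hdr))
            return -1;

        /* memcpy avoids assuming that the buffer is suitably aligned. */
        memcpy(&hdr, cmdbuf + pos, sizeof(hdr));
        pos += sizeof(hdr);

        /* size excludes the header itself. */
        if (cmdbuf_size - pos < hdr.size)
            return -1;

        pos += hdr.size;
        count++;
    }

    return count;
}
```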

struct drm_asahi_attachment¶: Describe an “attachment”.

Definition:

struct drm_asahi_attachment {
    __u64 pointer;
    __u64 size;
    __u32 pad;
    __u32 flags;
};

Members

pointer: Base address of the attachment
size: Size of the attachment in bytes
pad: MBZ
flags: MBZ

Description

Attachments are any memory written by shaders, notably including render target attachments written by the end-of-tile program. This is purely a hint about the accessed memory regions. It is optional to specify, which is fortunate as it cannot be specified precisely with bindless access anyway. But where possible, it’s probably a good idea for userspace to include these hints, forwarded to the firmware.

This struct is implicitly sized and therefore is not extensible.

struct drm_asahi_zls_buffer¶: Describe a depth or stencil buffer.

Definition:

struct drm_asahi_zls_buffer {
    __u64 base;
    __u64 comp_base;
    __u32 stride;
    __u32 comp_stride;
};

Members

base: Base address of the buffer
comp_base: If the load buffer is compressed, address of the compression metadata section.
stride: If layered rendering is enabled, the number of bytes between each layer of the buffer.
comp_stride: If layered rendering is enabled, the number of bytes between each layer of the compression metadata.

Description

These fields correspond to hardware registers in the ZLS (Z Load/Store) unit. There are three hardware registers for each field respectively for loads, stores, and partial renders. In practice, it makes sense to set all to the same values, except in exceptional cases not yet implemented in userspace, so we do not duplicate here for simplicity/efficiency.

This struct is embedded in other structs and therefore is not extensible.

struct drm_asahi_timestamp¶: Describe a timestamp write.

Definition:

struct drm_asahi_timestamp {
    __u32 handle;
    __u32 offset;
};

Members

handle: Handle of the timestamp buffer, or 0 to skip this timestamp. If nonzero, this must equal the value returned in drm_asahi_gem_bind_object::object_handle.
offset: Offset to write into the timestamp buffer

Description

The firmware can optionally write the GPU timestamp at render pass granularities, but it needs to be mapped specially via DRM_IOCTL_ASAHI_GEM_BIND_OBJECT. This structure therefore describes where to write as a handle-offset pair, rather than a GPU address like normal.

This struct is embedded in other structs and therefore is not extensible.

struct drm_asahi_timestamps¶: Describe timestamp writes.

Definition:

struct drm_asahi_timestamps {
    struct drm_asahi_timestamp start;
    struct drm_asahi_timestamp end;
};

Members

start: Timestamp recorded at the start of the operation
end: Timestamp recorded at the end of the operation

Description

Each operation that can be timestamped can be timestamped at both its start and its end. Therefore, drm_asahi_timestamp structs always come in pairs, bundled together into drm_asahi_timestamps.

This struct is embedded in other structs and therefore is not extensible.

struct drm_asahi_helper_program¶: Describe helper program configuration.

Definition:

struct drm_asahi_helper_program {
    __u32 binary;
    __u32 cfg;
    __u64 data;
};

Members

binary

USC address to the helper program binary. This is a tagged pointer with configuration in the bottom bits.

cfg

Additional configuration bits for the helper program.

data

Data passed to the helper program. This value is not interpreted by the kernel, firmware, or hardware in any way. It is simply a sideband for userspace, set with the submit ioctl and read via special registers inside the helper program.

In practice, userspace will pass a 64-bit GPU VA here pointing to the actual arguments, which presumably don’t fit in 64 bits.

Description

The helper program is a compute-like kernel required for various hardware functionality. Its most important role is dynamically allocating scratch/stack memory for individual subgroups, by partitioning a static allocation shared for the whole device. It is supplied by userspace via drm_asahi_helper_program and internally dispatched by the hardware as needed.

This struct is embedded in other structs and therefore is not extensible.

struct drm_asahi_bg_eot¶: Describe a background or end-of-tile program.

Definition:

struct drm_asahi_bg_eot {
    __u32 usc;
    __u32 rsrc_spec;
};

Members

usc: USC address of the hardware USC words binding resources (including images and uniforms) and the program itself. Note this is an additional layer of indirection compared to the helper program, avoiding the need for a sideband for data. This is a tagged pointer with additional configuration in the bottom bits.
rsrc_spec: Resource specifier for the program. This is a packed hardware data structure describing the required number of registers, uniforms, bound textures, and bound samplers.

Description

The background and end-of-tile programs are dispatched by the hardware at the beginning and end of rendering. As the hardware “tilebuffer” is simply local memory, these programs are necessary to implement API-level render targets. The fragment-like background program is responsible for loading either the clear colour or the existing render target contents, while the compute-like end-of-tile program stores the tilebuffer contents to memory.

This struct is embedded in other structs and therefore is not extensible.

struct drm_asahi_cmd_render¶: Command to submit 3D

Definition:

struct drm_asahi_cmd_render {
    __u32 flags;
    __u32 isp_zls_pixels;
    __u64 vdm_ctrl_stream_base;
    struct drm_asahi_helper_program vertex_helper;
    struct drm_asahi_helper_program fragment_helper;
    __u64 isp_scissor_base;
    __u64 isp_dbias_base;
    __u64 isp_oclqry_base;
    struct drm_asahi_zls_buffer depth;
    struct drm_asahi_zls_buffer stencil;
    __u64 zls_ctrl;
    __u64 ppp_multisamplectl;
    __u64 sampler_heap;
    __u32 ppp_ctrl;
    __u16 width_px;
    __u16 height_px;
    __u16 layers;
    __u16 sampler_count;
    __u8 utile_width_px;
    __u8 utile_height_px;
    __u8 samples;
    __u8 sample_size_B;
    __u32 isp_merge_upper_x;
    __u32 isp_merge_upper_y;
    struct drm_asahi_bg_eot bg;
    struct drm_asahi_bg_eot eot;
    struct drm_asahi_bg_eot partial_bg;
    struct drm_asahi_bg_eot partial_eot;
    __u32 isp_bgobjdepth;
    __u32 isp_bgobjvals;
    struct drm_asahi_timestamps ts_vtx;
    struct drm_asahi_timestamps ts_frag;
};

Members

flags

Combination of drm_asahi_render_flags flags.

isp_zls_pixels

ISP_ZLS_PIXELS register value. This contains the depth/stencil width/height, which may differ from the framebuffer width/height.

vdm_ctrl_stream_base

VDM_CTRL_STREAM_BASE register value. GPU address to the beginning of the VDM control stream.

vertex_helper

Helper program used for the vertex shader

fragment_helper

Helper program used for the fragment shader

isp_scissor_base

ISP_SCISSOR_BASE register value. GPU address of an array of scissor descriptors indexed in the render pass.

isp_dbias_base

ISP_DBIAS_BASE register value. GPU address of an array of depth bias values indexed in the render pass.

isp_oclqry_base

ISP_OCLQRY_BASE register value. GPU address of an array of occlusion query results written by the render pass.

depth

Depth buffer

stencil

Stencil buffer

zls_ctrl

ZLS_CTRL register value

ppp_multisamplectl

PPP_MULTISAMPLECTL register value

sampler_heap

Base address of the sampler heap. This heap is used for both vertex shaders and fragment shaders. The registers are per-stage, but there is no known use case for separate heaps.

ppp_ctrl

PPP_CTRL register value

width_px

Framebuffer width in pixels

height_px

Framebuffer height in pixels

layers

Number of layers in the framebuffer

sampler_count

Number of samplers in the sampler heap.

utile_width_px

Width of a logical tilebuffer tile in pixels

utile_height_px

Height of a logical tilebuffer tile in pixels

samples

Number of samples in the framebuffer. Must be 1, 2, or 4.

sample_size_B

Number of bytes in the tilebuffer required per sample.

isp_merge_upper_x

32-bit float used in the hardware triangle merging. Calculate as: tan(60 deg) * width.

Making these values UAPI avoids requiring floating-point calculations in the kernel in the hot path.

isp_merge_upper_y

32-bit float. Calculate as: tan(60 deg) * height. See isp_merge_upper_x.

bg

Background program run for each tile at the start

eot

End-of-tile program run for each tile at the end

partial_bg

Background program run at the start of each tile when resuming the render pass during a partial render.

partial_eot

End-of-tile program run at the end of each tile when pausing the render pass during a partial render.

isp_bgobjdepth

ISP_BGOBJDEPTH register value. This is the depth buffer clear value, encoded in the depth buffer’s format: either a 32-bit float or a 16-bit unorm (with upper bits zeroed).

isp_bgobjvals

ISP_BGOBJVALS register value. The bottom 8 bits contain the stencil buffer clear value.

ts_vtx

Timestamps for the vertex portion of the render

ts_frag

Timestamps for the fragment portion of the render

Description

This command submits a single render pass. The hardware control stream may include many draws and subpasses, but within the command, the framebuffer dimensions and attachments are fixed.

The hardware requires the firmware to set a large number of control registers that establish state at render pass granularity before each 3D rendering command. The firmware bundles this state into data structures. Unfortunately, we cannot expose any of that directly to userspace, because the kernel-firmware ABI is not stable. Although we can guarantee that the firmware updates in tandem with the kernel, we cannot break old userspace when upgrading the firmware and kernel. Therefore, we need to abstract the data structures well enough to avoid tying our hands with future firmware versions.

The bulk of drm_asahi_cmd_render therefore consists of values of hardware control registers, marshalled via the firmware interface.

The framebuffer/tilebuffer dimensions are also specified here. In addition to being passed to the firmware/hardware, the kernel requires these dimensions to calculate various essential tiling-related data structures. It is unfortunate that our submits are heavier than on vendors with saner hardware-software interfaces. The upshot is all of this information is readily available to userspace with all current APIs.

It looks odd - but it’s not overly burdensome and it ensures we can remain compatible with old userspace.
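As an example of a value userspace computes on the kernel's behalf, isp_merge_upper_x and isp_merge_upper_y hold the bit pattern of a 32-bit float. The sketch below uses hypothetical helper names and hardcodes tan(60 deg) as the constant sqrt(3), about 1.7320508, so that it does not depend on the math library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Reinterpret a float as its raw 32-bit pattern without aliasing issues. */
static uint32_t sketch_float_bits(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof(bits));
    return bits;
}

/*
 * Compute tan(60 deg) * dimension as float bits. Pass width_px for
 * isp_merge_upper_x and height_px for isp_merge_upper_y.
 */
static uint32_t sketch_isp_merge_upper(uint16_t dimension_px)
{
    const float tan_60_deg = 1.7320508f; /* tan(60 deg) == sqrt(3) */

    return sketch_float_bits(tan_60_deg * (float)dimension_px);
}
```

The result is stored directly in the __u32 field, and the kernel forwards it without doing any floating-point arithmetic.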

struct drm_asahi_cmd_compute¶: Command to submit compute

Definition:

struct drm_asahi_cmd_compute {
    __u32 flags;
    __u32 sampler_count;
    __u64 cdm_ctrl_stream_base;
    __u64 cdm_ctrl_stream_end;
    __u64 sampler_heap;
    struct drm_asahi_helper_program helper;
    struct drm_asahi_timestamps ts;
};

Members

flags: MBZ
sampler_count: Number of samplers in the sampler heap.
cdm_ctrl_stream_base: CDM_CTRL_STREAM_BASE register value. GPU address to the beginning of the CDM control stream.
cdm_ctrl_stream_end: GPU base address to the end of the hardware control stream. Note this only considers the first contiguous segment of the control stream, as the stream might jump elsewhere.
sampler_heap: Base address of the sampler heap.
helper: Helper program used for this compute command
ts: Timestamps for the compute command

Description

This command submits a control stream consisting of compute dispatches. There is essentially no limit on how many compute dispatches may be included in a single compute command, although timestamps are at command granularity.

struct drm_asahi_get_time¶: Arguments passed to DRM_IOCTL_ASAHI_GET_TIME

Definition:

struct drm_asahi_get_time {
    __u64 flags;
    __u64 gpu_timestamp;
};

Members

flags: MBZ.
gpu_timestamp: On return, the GPU timestamp in nanoseconds.

DRM_IOCTL_ASAHI¶

DRM_IOCTL_ASAHI (__access, __id, __type)

Build an Asahi IOCTL number

Parameters

__access: Access type. Must be R, W or RW.
__id: One of the DRM_ASAHI_xxx IDs.
__type: Suffix of the type being passed to the IOCTL.

Description

Don’t use this macro directly, use the DRM_IOCTL_ASAHI_xxx values instead.

Return

An IOCTL number to be passed to ioctl() from userspace.