TTM Fencing

More information can be found here: http://marc.theaimsgroup.com/?t=120407426000002&r=1&w=2

Fence Flags

fence_flags tweak the behavior of calls into the fence manager; a usage sketch follows the list.

  * DRM_FENCE_FLAG_EMIT causes a fence to enter the command stream instead of just being created.
  * DRM_FENCE_FLAG_SHAREABLE makes sure the fence object is shareable between processes.
  * DRM_FENCE_FLAG_WAIT_LAZY hints that a polling wait should sleep in-between polls.
  * DRM_FENCE_FLAG_IGNORE_SIGNALS ignores signals while waiting.
  * DRM_FENCE_FLAG_NO_USER doesn't return a user-space fence object on a superioctl; the fence just lives in the kernel.
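As a hedged sketch of how these combine in practice (the entry point name and signature here follow the historical out-of-tree drm code, but treat them as assumptions that may not match your tree):

    /* Sketch: create a fence, emit it into the command stream, and make
     * it shareable between processes. */
    struct drm_fence_object *fence;
    int ret;

    ret = drm_fence_object_create(dev, fence_class,
                                  DRM_BO_FENCE_TYPE_EXE,
                                  DRM_FENCE_FLAG_EMIT |
                                  DRM_FENCE_FLAG_SHAREABLE,
                                  &fence);
    if (ret)
            return ret;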

Fence Type

fence_type is something completely different. It's there to expose GPU states, or how far the GPU has proceeded with the command sequence associated with the fence. This info is needed so that the buffer object code knows when a buffer object is idle. A TTM object (buffer, register, whatever) is considered idle if ((object->fence_type & object->fence->signaled_types) == object->fence_type).
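A minimal sketch of that test, assuming the field names used above (the struct name is made up for illustration):

    /* An object is idle once every fence type it was fenced with has
     * signaled on its fence. */
    static int ttm_object_is_idle(const struct ttm_object *object)
    {
            return (object->fence_type & object->fence->signaled_types) ==
                    object->fence_type;
    }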

Let's take i915 as a typical example. Batch buffers are idle and can be reused as soon as the GPU command reader is done with the buffer. At that point fence->signaled_types will contain DRM_BO_FENCE_TYPE_EXE. However, render targets and textures aren't idle until that has happened AND an MI_FLUSH has executed, at which point fence->signaled_types will also contain DRM_I915_FENCE_TYPE_RW.
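To illustrate (the bit values are invented; only the type names come from the text above), a batch buffer and a render target would be fenced with different type masks:

    #define DRM_BO_FENCE_TYPE_EXE   (1 << 0)  /* command reader done */
    #define DRM_I915_FENCE_TYPE_RW  (1 << 1)  /* MI_FLUSH executed   */

    batch_bo->fence_type  = DRM_BO_FENCE_TYPE_EXE;
    target_bo->fence_type = DRM_BO_FENCE_TYPE_EXE | DRM_I915_FENCE_TYPE_RW;

The batch buffer is idle as soon as EXE signals; the render target also needs RW.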

In an ideal situation we would like to issue an MI_FLUSH only at the end of a full scene. However, we might need a huge number of batch buffers to build that scene, and we'd like to be able to reuse them as soon as they've delivered their contents. We don't want them hanging around, useless, until someone issues an MI_FLUSH at the end of the scene.

Another use for fence_type is when different engines are fed through the same command submission mechanism. A typical example is the Unichrome, which has different ways to signal 2D done, 3D done, video blit done, MPEG done, etc., with all of these engines fed through the same command submission mechanism.
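A sketch of what that could look like as fence types (names and values are hypothetical; the point is that a single fence_class carries several engine-specific types):

    #define VIA_FENCE_TYPE_EXE    (1 << 0)  /* command reader done */
    #define VIA_FENCE_TYPE_2D     (1 << 1)  /* 2D engine done      */
    #define VIA_FENCE_TYPE_3D     (1 << 2)  /* 3D engine done      */
    #define VIA_FENCE_TYPE_VIDEO  (1 << 3)  /* video blit done     */
    #define VIA_FENCE_TYPE_MPEG   (1 << 4)  /* MPEG decoder done   */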

This is in contrast to fence_class, which is intended for separate command submission mechanisms; the TTM has ways to synchronize these on a per-buffer level. Typical usage would be hardware with completely separate 2D and 3D engines that operate in parallel, with separate command FIFOs.

The callback has_irq should take fence_type and not fence_flags. I think this is just a bad naming of the parameter that has survived cleanups.

The callback "needed_flush" is there to tell the fence manager whether some kind of GPU flush is needed to signal any fence type in fence->waiting_types. In the intel example case above, If the DRM_I915_FENCE_TYPE_RW flag is set in fence->waiting_types, a flush would be needed, but only if there isn't already a flush pending that would make sure this flag signals. Also, since the flushes that are currently implemented on intel are asynchronous, a flush is not needed until the DRM_BO_FENCE_TYPE_EXE has signaled. Otherwise we would flush a previous command sequence instead of the intended one.

The callback "flush" should schedule a GPU flush for the fc->pending_flush types, and clear fc->pending_flush when the flush has been scheduled.

Flush

flush() only needs to start a flush, and poll() will report when it's done. Note, however, that on intel, for example, the currently implemented flush mechanism doesn't have an IRQ that reports when it's done; hence the polling part of the intel fence wait. A better way of implementing this would be to have flushes go through a high-priority ring, accompanied by a user IRQ. That's not done yet. A third way is to just add an MI_FLUSH to the ring and keep track of its execution. The disadvantage of this is that the ring might be full of rendering commands that will take a long time to complete, and we would drain the ring waiting for idle buffers.

Note also that if poll() sees that fc->pending_flush is non-zero, it should also start a flush and clear fc->pending_flush. It can be set to non-zero in the drm fence handler.

So what happens is that the drm fence handler calls needed_flush() when it has signaled a fence_type on a fence. If needed_flush() determines that further flushing is needed, it should then check whether there is a flush in the command stream that would perform the required action, and if so return zero. A simple way to track flushes is to store the sequence number of the last flush. If that sequence number is higher than the sequence number of the fence in question, no flush is needed, and needed_flush() returns zero.
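A sketch of that bookkeeping (last_flush_sequence is a hypothetical driver-private counter, updated whenever a flush is emitted to the ring; sequence wrap-around handling is omitted):

    /* Returns nonzero if a new flush must be emitted for this fence. */
    static int flush_needed(const struct my_driver_priv *priv,
                            const struct drm_fence_object *fence)
    {
            /* A flush emitted with a higher sequence number sits after
             * this fence in the ring and will flush its commands too. */
            if (priv->last_flush_sequence > fence->sequence)
                    return 0;
            return 1;
    }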

So yes, you need to check fc->pending_flush during poll, and make sure needed_flush() is smart enough to check the ring for previous flushes that would do the job, so we don't issue any unnecessary flushes.

So over to the question: how does the poll() function tell that a flush has executed? One option is to keep a separate list of all flushes in the ring, each with a corresponding sequence number. Every time we hit a new sequence number we check the list to see whether a flush has executed, and in that case call drm_fence_handler() with the correct fence_type.
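A sketch of the list option (the structure and helper are hypothetical; drm_fence_handler() is the entry point the text refers to, though its exact signature varies between TTM revisions):

    struct ring_flush {
            struct list_head head;
            uint32_t sequence;  /* executed once the ring passes this */
    };

    /* Called from poll() whenever a new sequence number is observed. */
    static void process_flushes(struct drm_device *dev, uint32_t fence_class,
                                uint32_t cur_seq, struct list_head *flushes)
    {
            struct ring_flush *rf, *next;

            list_for_each_entry_safe(rf, next, flushes, head) {
                    if (cur_seq >= rf->sequence) {
                            /* The flush has executed: signal its type. */
                            drm_fence_handler(dev, fence_class, rf->sequence,
                                              DRM_I915_FENCE_TYPE_RW, 0);
                            list_del(&rf->head);
                            kfree(rf);
                    }
            }
    }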

Another option is to use fence->native_type. When TYPE_EXE signals on a fence with a non-zero native_type, it will also automatically signal fence->native_type on itself and on all previous fences. There is a problem with this, however: such a fence might disappear when it's no longer referenced by a buffer, and we would then lose track of that flush. This can probably be fixed with refcounting tricks, but then it starts to get more and more complicated.

The Radeon driver is probably going to use a simpler and more attractive approach: encoding flush presence in the fence sequence number. Bit 31 will be reserved for this.
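As a sketch of that encoding (macro and function names invented here):

    #define RADEON_FENCE_FLUSH_PRESENT  (1u << 31)
    #define RADEON_FENCE_SEQ_MASK       (~RADEON_FENCE_FLUSH_PRESENT)

    static uint32_t radeon_encode_sequence(uint32_t sequence,
                                           int contains_flush)
    {
            /* On emit: set bit 31 if the sequence contains a flush. */
            return (sequence & RADEON_FENCE_SEQ_MASK) |
                   (contains_flush ? RADEON_FENCE_FLUSH_PRESENT : 0);
    }

On poll, bit 31 of the read-back value then tells us that a flush has executed together with that sequence.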

If you look in the intel driver, native_type is used in exactly this way to track flushes that are programmed by user-space. Here, a driver-specific fence flag is used to tell the intel fence driver that we have programmed an MI_FLUSH. However, if you look even more closely, the intel needed_flush() implementation ignores these user-space flushes unless they occur on the current fence. This is to avoid the previously mentioned ring round-trip delay. So currently it's a bit stupid, since even if a flush occurs on the next fence, it will still go on initiating a sync_flush operation.

Command stream barrier

A related topic is the command stream barrier, which is used when the hardware cannot synchronize by itself and needs software assistance. A typical case is hardware with multiple command streams. Assume that you want to render to a buffer using the 3D engine and blit from it using the 2D engine, but the engines have no way to synchronize and are fed with two different command streams. When the TTM detects, during buffer validation, that you want to change command stream (which is the fence_class), it will, by default, idle the buffer. However, it might be that the 2D engine has a command to wait for a 3D event. In that case you'd want to add that command to the 2D stream, making it wait for the 3D engine to finish rendering to the buffer in question. You should implement this in the command_stream_barrier callback, as sketched below. Note that the default behaviour is always to idle the buffer, which should be safe under all circumstances.
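A sketch of such a callback (the signature follows the historical TTM buffer-object driver interface; the class names and the wait-emission helper are assumptions):

    static int my_command_stream_barrier(struct drm_buffer_object *bo,
                                         uint32_t new_fence_class,
                                         uint32_t new_fence_type,
                                         int no_wait)
    {
            /* The 2D engine can wait for a 3D event in its own stream:
             * emit that command instead of idling the buffer. */
            if (bo->fence_class == MY_CLASS_3D &&
                new_fence_class == MY_CLASS_2D) {
                    emit_2d_wait_for_3d_event(bo->dev);
                    return 0;
            }

            /* Default, always-safe behaviour: idle the buffer.
             * Assumed signature: (bo, lazy, ignore_signals, no_wait). */
            return drm_bo_wait(bo, 0, 1, no_wait);
    }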

Another use case is flushing when changing access mode. Intel, for example, needs a flush when changing from 3D GPU write to 2D GPU read, while still using the same command stream. The way to implement this is to have one fence_type for READ and one for WRITE. When, during buffer validation, you want to change bo->fence_type from WRITE to READ, the TTM sync mechanism will detect that you are dropping a fence_type on the buffer, and will either idle the buffer (which will automatically flush it) or call command_stream_barrier() if it's implemented. In this case, command_stream_barrier() simply issues a write flush to the command stream and returns, and the WRITE flag in bo->fence_type is dropped when the buffer is fenced with the new fence. Of course, command_stream_barrier() should first check that there isn't already a flush in the command stream that would do the job.
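Continuing the sketch, the access-mode case inside command_stream_barrier() could look like this (again with hypothetical names):

    /* Validation is dropping WRITE from bo->fence_type while staying on
     * the same fence_class. */
    if ((bo->fence_type & MY_FENCE_TYPE_WRITE) &&
        !(new_fence_type & MY_FENCE_TYPE_WRITE)) {
            /* First check for a flush already in the command stream that
             * would do the job; only emit a new one if there isn't. */
            if (!write_flush_already_pending(bo->dev))
                    emit_write_flush(bo->dev);
            return 0;
    }

The WRITE flag itself is dropped only when the buffer is fenced with the new fence, which is what keeps the error case below safe.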

Now, what happens if we issue such a barrier (flush) and then hit an error, say an EAGAIN, which is common during X server operation? Then it's extremely important that we don't drop the WRITE type flag on the buffer, because then it might seem to be idle before the flush (barrier) has actually executed. This case is taken care of by the fact that the type flag will not be dropped until the buffer is fenced with the new fence. If an error occurs, the buffer retains its old type flags and old fence.

In that case one might think that we lose track of the flush, and that a new attempt to issue the command sequence will issue a new (unnecessary) barrier flush. That's not the case, because command_stream_barrier() will check the command stream for old flushes.