CompositeSwap

Issues

X and GL applications need to control how their data is displayed for several reasons:

Plain X doesn't have many ways of doing the above (it mostly assumes immediate mode drawing to the visible frame buffer), but GLX provides many extensions for controlling how buffers are displayed on the user-visible screen. These extensions need to be supported in DRI2 for the Linux graphics stack to really shine.

There are also issues related to memory consumption, swap behavior, and performance that can be addressed:

Memory savings

Performance

Behavior

OpenGL and GLX swap and throttling related extensions

Under a compositor, the behavior of these routines could change, or additional compositor<->client protocol could be added to support similar behavior. Changes are noted below, though in general "video frame" should be thought of as a virtualized compositor frame rate rather than a monitor refresh (e.g. the compositor may report 60fps to applications even though it only updates the screen when recompositing is needed):

Compositor extensions

To support the above, feedback from the currently active compositor is necessary. Compositor<->client protocol is needed for:

Triple buffering

Triple buffering means different things to different people. For convenience, we list here the types we're concerned with in this discussion. All have a high memory cost and should generally only be enabled for a small number of clients at a time.

  1. compositor based - compositor keeps a private copy of each application's front buffer; this means it always has a consistent, fully drawn pixmap to use for creating screen frames. glXSwapBuffers updates the redirected front and notifies the compositor a new one is ready to pick up as its private copy.
    • good for avoiding ugly partial drawing artifacts
  2. server based - server keeps last ready client buffer around and returns available buffers to the client after a glXSwapBuffers occurs
    • can help keep clients busy at the cost of extra memory (good to keep frame rates up while preserving vblank sync'd swapping)
  3. client based - server returns 3 buffers to the client, which somehow requests copies among them using glXSwapBuffers
    • like (2) but totally client side.

Another factor for triple buffering is how to handle extra frames. If frames are rendered faster than can be displayed, some applications may want to discard the extra frames, while others may want to queue them (and likely be throttled).
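The two ways of handling extra frames can be contrasted with a small model. This is only a sketch: the policy names, the `frame_queue` structure and both functions are hypothetical and do not correspond to any existing GLX, DRI2 or compositor interface. It assumes two pending slots in addition to the frame being displayed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy names; not taken from any GLX or DRI2 header. */
enum extra_frame_policy { POLICY_DISCARD, POLICY_QUEUE };

#define MAX_PENDING 2 /* pending slots, in addition to the displayed frame */

struct frame_queue {
    enum extra_frame_policy policy;
    int pending[MAX_PENDING]; /* frame ids waiting for display, oldest first */
    int count;
};

/* Submit a finished frame. Returns false when the client should be
 * throttled: the queue is full under POLICY_QUEUE and the frame is
 * not accepted. Under POLICY_DISCARD the newest frame replaces any
 * frame that has not been displayed yet. */
bool submit_frame(struct frame_queue *q, int frame_id)
{
    assert(q != NULL);
    if (q->policy == POLICY_DISCARD) {
        q->pending[0] = frame_id;
        q->count = 1;
        return true;
    }
    if (q->count == MAX_PENDING)
        return false;
    q->pending[q->count++] = frame_id;
    return true;
}

/* Called once per display frame. Returns the id of the frame to show,
 * or -1 if no new frame is ready. */
int next_frame_to_display(struct frame_queue *q)
{
    assert(q != NULL);
    if (q->count == 0)
        return -1;
    int id = q->pending[0];
    for (int i = 1; i < q->count; i++)
        q->pending[i - 1] = q->pending[i];
    q->count--;
    return id;
}
```

With the discard policy a fast client never waits but some of its frames are never shown; with the queue policy every accepted frame is shown in order and the client is held back once the queue fills.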

New code

All of this means new DRI2, display server and compositor code, but it should be doable. In particular we may want:

Implementation: SGI_swap_control for the X server

The SGI_swap_control extension allows applications to control their glXSwapBuffers frequency. The glXSwapIntervalSGI call lets clients specify how frequently, in frames, their buffer swaps should occur. Implementing this requires server support, since with DRI2 the server is responsible for performing swaps. The basic flow is as follows:

  1. When an application calls glXSwapBuffers, a swap is scheduled in the server for the current frame count plus the interval count (though if the swap is to be a page flip, the flip is queued one frame ahead of that target, since flips occur at the next vblank after being queued).
  2. The server calls into the DDX driver's ->ScheduleSwap routine, which is responsible for requesting a kernel frame event for when the swap should occur.
  3. Control is returned to the client immediately.
    1. Note: any further GLX calls requiring a GLX context to be bound will block until the swap completes.
  4. When the DDX receives the associated frame event, it performs the scheduled swap.
  5. The swap count increases.
  6. The client is unblocked if necessary.
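The frame arithmetic in step 1 can be sketched as follows. The function names are hypothetical and this is not the server's or any driver's actual code. It reads the page flip remark in step 1 as "request the event one frame early so the flip lands on the target frame", which is an assumption about the intended behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Frame count at which a swap requested now should become visible:
 * the current frame count plus the swap interval. */
uint64_t swap_target_frame(uint64_t current_frame, int interval)
{
    assert(interval >= 0);
    return current_frame + (uint64_t)interval;
}

/* Frame count for which the kernel frame event is requested.
 * Assumption: a page flip takes effect at the vblank after it is
 * queued, so it is queued one frame before the target; a blit is
 * performed when the target frame itself is reached. */
uint64_t swap_event_frame(uint64_t current_frame, int interval, bool page_flip)
{
    uint64_t target = swap_target_frame(current_frame, interval);
    if (page_flip && target > current_frame)
        target -= 1;
    return target;
}
```

For example, with the frame count at 100 and an interval of 2, a blit would be tied to the event for frame 102, while a flip would be queued on the event for frame 101.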

Implementation: SGI_video_sync for the X server

SGI_video_sync gives clients control over their frame rate by exposing the frame count and allowing apps to wait on a given frame count. glXGetVideoSyncSGI returns the current frame count for the display the drawable is on, and glXWaitVideoSyncSGI allows a client to block until a given frame count is reached on its drawable's display. The flow in the server and client for the direct rendered glXGetVideoSyncSGI case is:

  1. Client calls glXGetVideoSyncSGI.
  2. Mesa code receives the request and turns it into a DRI2GetMSCReq protocol request.
  3. The display server receives the request and calls the DDX driver's ->GetMSC hook.
  4. The DDX driver returns a frame count based on the drawable's location: the count for the CRTC with the greatest intersection with the drawable, or 0 if the drawable is currently offscreen (the GLX spec should be updated to reflect this). Note that this could also return an error (possibly BadDrawable).
  5. The display server returns a DRI2GetMSCReply with the current MSC count to the client.
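The CRTC selection described in step 4 amounts to a largest-overlap search. The sketch below illustrates that idea only; `struct box`, `covering_crtc` and `drawable_msc` are hypothetical names, boxes are assumed to be half-open rectangles in screen coordinates, and a real driver would take this data from its own CRTC structures.

```c
#include <assert.h>
#include <stddef.h>

/* Screen-space rectangle; x2 and y2 are exclusive. */
struct box { int x1, y1, x2, y2; };

/* Area of the intersection of two boxes, 0 if they do not overlap. */
static long overlap_area(struct box a, struct box b)
{
    int x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    int y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    int x2 = a.x2 < b.x2 ? a.x2 : b.x2;
    int y2 = a.y2 < b.y2 ? a.y2 : b.y2;
    if (x2 <= x1 || y2 <= y1)
        return 0;
    return (long)(x2 - x1) * (y2 - y1);
}

/* Index of the CRTC with the greatest intersection with the drawable,
 * or -1 if the drawable is entirely offscreen. */
int covering_crtc(struct box drawable, const struct box *crtcs, size_t n)
{
    assert(crtcs != NULL || n == 0);
    int best = -1;
    long best_area = 0;
    for (size_t i = 0; i < n; i++) {
        long area = overlap_area(drawable, crtcs[i]);
        if (area > best_area) {
            best_area = area;
            best = (int)i;
        }
    }
    return best;
}

/* Frame count to report for the drawable: the count of its covering
 * CRTC, or 0 when the drawable is offscreen. */
unsigned long long drawable_msc(struct box drawable, const struct box *crtcs,
                                const unsigned long long *crtc_msc, size_t n)
{
    int crtc = covering_crtc(drawable, crtcs, n);
    return crtc < 0 ? 0 : crtc_msc[crtc];
}
```

A window straddling two monitors reports the count of whichever monitor shows more of it, so the value can change when the window is dragged across the boundary.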

Flow for the direct rendered glXWaitVideoSyncSGI case:

  1. Client calls glXWaitVideoSyncSGI.
  2. Mesa code receives the request and turns it into a DRI2WaitMSCReq protocol request.
  3. The display server receives the request and calls the DDX driver's ->ScheduleWaitMSC hook, which is responsible for requesting a kernel frame event for the specified WaitVideoSync values.
  4. The DDX blocks the client until the requested frame is reached (which could be immediate if the frame has already passed).
  5. The DDX receives the event and unblocks the client, calling into the server to complete the reply.
  6. The display server returns a DRI2WaitMSCReply with the current MSC and SBC to the client.

Again, if the window is redirected or the drawable is offscreen, the client won't block; an MSC reply with all zeros will be returned (again, this could also return BadDrawable).
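glXWaitVideoSyncSGI takes a divisor and a remainder and waits for a frame count satisfying count mod divisor = remainder. One way a driver might turn those values into a concrete frame to wait for is sketched below. The function names are hypothetical, and treating a currently matching count as "wait for the next match" is an assumption, not something stated above.

```c
#include <assert.h>
#include <stdint.h>

/* First frame count strictly after `current` for which
 * count % divisor == remainder. */
uint64_t next_matching_frame(uint64_t current, uint64_t divisor,
                             uint64_t remainder)
{
    assert(divisor > 0 && remainder < divisor);
    /* Start from the matching count within the current divisor window. */
    uint64_t target = current - (current % divisor) + remainder;
    /* If that point is now or in the past, use the next window. */
    if (target <= current)
        target += divisor;
    return target;
}

/* Number of frames the client stays blocked when waiting for an
 * absolute target: zero if the frame has already passed. */
uint64_t frames_to_wait(uint64_t current, uint64_t target)
{
    return target > current ? target - current : 0;
}
```

A client that wants to wake on every odd frame would use a divisor of 2 and a remainder of 1; at frame 10 that resolves to frame 11, and at frame 11 to frame 13.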

Testing

The code for the above is present in several repos and patches:

  1. kernel - 2.6.33-rc
  2. libdrm - 2.4.17 or newer
  3. dri2proto - 2.2 or newer
  4. glproto - 1.4.11 or newer
  5. mesa - master branch (will be in 7.8)
  6. xserver - master branch (will be in 1.9)
  7. xf86-video-intel - master branch (will be in 2.11)

See Graphics stack git development for information on how to build a stack with the above.

The direct rendered cases outlined in the implementation notes above are complete, but there's a bug in the async glXSwapBuffers that sometimes causes clients to hang after swapping rather than continue.

Open issues

This implementation does not guarantee tear-free drawing in the non-composited, non-fullscreen (flip) case, but does provide the throttling feature implicit in SGI_swap_control. For a tear-free guarantee, the application must be performing full screen swaps eligible for page flipping, or a compositor using page flipping must be present and the application's window redirected. Alternatively, the driver can synchronize its blit activity with the scanout position to avoid tearing. However, this approach can negatively affect performance.

The implementation also needs screen information from the compositor in the case of redirected windows, so it can request a vblank event from the kernel for the correct CRTC.

Using the new code

Some Linux applications may assume that glXSwapBuffers blocks until the swap has completed. With the above code, that's no longer the case (it's also not the case on other GLX implementations, so it's a non-portable assumption).

Similarly, some code may assume no throttling of swaps occurs. Now this behavior can be controlled with glXSwapIntervalSGI or by using glXSwapBuffersMscOML.

See below for specific use cases.

Avoiding tearing

Tearing occurs when blits or scanout buffer changes aren't synchronized with vertical retrace, causing parts of two frames to appear on screen at once, one above the other (see Screen_tearing for an example). Since vertical blank periods are often very short (especially on LCD panels) and CPU scheduling between processes can be highly variable, simply blocking until a vertical retrace completes (e.g. using glXWaitForMscOML or glXWaitVideoSyncSGI) is not a reliable way of avoiding tearing.
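To give a sense of how short the vertical blank period is, the sketch below computes it from mode timings. The function name is hypothetical. The example figures assume the standard 1920x1080 mode at 60 Hz, which has 1080 visible lines out of 1125 total; other modes, particularly reduced blanking ones, leave even less time.

```c
#include <assert.h>

/* Duration of the vertical blanking interval in milliseconds, given
 * the number of visible lines, the total number of lines per frame
 * (visible plus blanking) and the refresh rate. */
double vblank_ms(int vdisplay, int vtotal, double refresh_hz)
{
    assert(vdisplay > 0 && vtotal > vdisplay && refresh_hz > 0.0);
    double frame_ms = 1000.0 / refresh_hz;
    return frame_ms * (vtotal - vdisplay) / vtotal;
}
```

For the 1080p example, `vblank_ms(1080, 1125, 60.0)` comes to roughly two thirds of a millisecond, a window that a client woken by the scheduler can easily miss.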

Some DDX drivers provide an option to synchronize DRI2CopyRegion requests (generated by glXCopySubBufferMESA calls and some paths in glXSwapBuffers calls); this can prevent tearing at a potentially significant performance cost since some GPUs will stall until the vertical retrace is outside the region to be copied.

Another way to avoid tearing, assuming you're running a kernel with page flipping support, is to run your application full screen or under a compositing manager. If your app or compositing manager uses glXSwapBuffers to display new frames (as opposed to using glXCopySubBufferMESA without driver vertical retrace synchronization), the DDX and server should coordinate to flip whole new scanout buffers through the kernel, which synchronizes the flip to vertical retrace.

Throttling rendering

Use the SGI_video_sync, SGI_swap_control, OML_sync_control or ARB_sync extensions to render at a constant rate. Depending on the application, it may be appropriate to throttle your rendering to an integer fraction of the refresh rate (using SGI_swap_control or OML_sync_control) if you can't keep up with it; this avoids a variable frame rate, which can be visually distracting. However, for many animations, especially those simulating physical activities (e.g. a bounce or slide), rendering at the full refresh rate is critical to visual quality. In those cases, when your application can't keep up with the refresh rate, reducing rendering quality may be a better option than dropping frames or displaying every other frame.
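One way to choose a throttled rate is to pick the smallest swap interval whose duration covers the time it takes to render a frame. The helper below is a hypothetical sketch of that calculation; measuring the render time and deciding when to re-evaluate it are left to the application.

```c
#include <assert.h>

/* Smallest swap interval, in refresh periods, that is long enough to
 * render one frame taking `render_ms` milliseconds on a display
 * refreshing at `refresh_hz`. */
int swap_interval_for(double render_ms, double refresh_hz)
{
    assert(render_ms >= 0.0 && refresh_hz > 0.0);
    double period_ms = 1000.0 / refresh_hz;
    int interval = 1;
    while (interval * period_ms < render_ms)
        interval++;
    return interval;
}
```

The result is the kind of value an application would pass to glXSwapIntervalSGI. On a 60 Hz display a frame that takes 20 ms to render leads to an interval of 2, which gives a steady 30 frames per second instead of a rate that wavers between 30 and 60.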

Controlling buffer swap behavior

Use a TBD_swap_method_control to select triple buffering, blit or exchange methods.

Mutter

Misc notes (rib Thu Sep 3 20:40:43 BST 2009)

Reference

For reference: