CompositeSwap

Issues

X and GL applications need to control how their data is displayed for several reasons:

Plain X doesn't have many ways of doing the above (it mostly assumes immediate mode drawing to the visible frame buffer), but GL provides many extensions related to handling the actual display of buffers on the user visible screen.

There are also issues related to memory consumption, swap behavior, and performance that can be addressed:

Memory savings

Performance

Behavior

OpenGL and GLX swap and throttling related extensions

Under a compositor, the behavior of these routines should change. Changes are noted below, though in general "video frame" should be thought of as a virtualized compositor frame rate rather than a monitor refresh (e.g. the compositor may report 60fps to applications even though it only updates the screen when recompositing is needed):

Compositor extensions

To support the above, feedback from the currently active compositor is necessary. We'll need protocol for:

Triple buffering

Triple buffering means different things to different people. For convenience, we list the types we're concerned about for this discussion here. All have a high memory cost and should generally only be enabled for a small number of clients at a time.

  1. compositor based - compositor keeps a private copy of each application's front buffer; this means it always has a consistent, fully drawn pixmap to use for creating screen frames. glXSwapBuffers updates the redirected front and notifies the compositor a new one is ready to pick up as its private copy.
    • good for avoiding ugly partial drawing artifacts
  2. server based - server keeps last ready client buffer around and returns available buffers to the client after a glXSwapBuffers occurs
    • can help keep clients busy at the cost of extra memory (good to keep frame rates up while preserving vblank sync'd swapping)
  3. client based - server returns 3 buffers to the client, which somehow requests copies between them using glXSwapBuffers
    • like (2) but totally client side.

Another factor for triple buffering is how to handle extra frames. If frames are rendered faster than can be displayed, some applications may want to discard the extra frames, while others may want to queue them (and likely be throttled).
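
The two ways of handling extra frames can be sketched as a small pending-frame queue. This is an illustrative model only: the policy names, the frame_queue structure and both functions are hypothetical and not part of any existing extension, and frame ids are assumed to be non-negative integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy names; no existing extension defines these. */
enum extra_frame_policy { FRAME_DISCARD, FRAME_QUEUE };

/* Assumption: with one buffer on screen, triple buffering leaves two
 * buffers that can hold finished but not yet displayed frames. */
#define MAX_PENDING 2

struct frame_queue {
    enum extra_frame_policy policy;
    int pending[MAX_PENDING];   /* frame ids awaiting display, oldest first */
    size_t count;
};

/* The client finished rendering frame_id. Returns true if the frame was
 * accepted, false if the client would have to be throttled (queue full). */
bool submit_frame(struct frame_queue *q, int frame_id)
{
    assert(q && frame_id >= 0);
    if (q->policy == FRAME_DISCARD) {
        /* Keep only the newest frame; older undisplayed frames are dropped. */
        q->pending[0] = frame_id;
        q->count = 1;
        return true;
    }
    if (q->count == MAX_PENDING)
        return false;           /* caller blocks until a refresh frees a slot */
    q->pending[q->count++] = frame_id;
    return true;
}

/* Called once per display refresh. Returns the frame id to show,
 * or -1 if nothing new is pending. */
int next_frame_to_display(struct frame_queue *q)
{
    assert(q);
    if (q->count == 0)
        return -1;
    int id = q->pending[0];
    for (size_t i = 1; i < q->count; i++)
        q->pending[i - 1] = q->pending[i];
    q->count--;
    return id;
}
```

In discard mode a fast client never blocks but only its latest frame reaches the screen; in queue mode every frame is shown in order and the client is throttled once both spare buffers are full.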

New code

All of this means new DRI2, display server and compositor code, but it should be doable. In particular we may want:

Implementation: SGI_swap_control for the X server

The SGI_swap_control extension allows applications to control their glXSwapBuffers frequency. The glXSwapIntervalSGI call lets clients specify how frequently, in frames, their buffer swaps should occur. Implementing this requires server support, since with DRI2 the server is responsible for performing swaps. The basic flow is as follows:

  1. When an application calls glXSwapBuffers, a swap is scheduled in the server for the current frame count plus the interval count (though if the swap is to be a page flip, the server schedules the queuing of the flip rather than the flip itself, since flips occur at the next vblank after being queued).
  2. The server calls into the DDX driver's ->SetupSwap hook to request an event for the frame in which the swap will occur (DDX support is necessary to request the event on the correct CRTC)
  3. Control is returned to the client immediately
    1. Note: any further GLX calls requiring a GLX context to be bound will block until the swap completes
  4. When the server receives a vblank event, it will perform all the swaps scheduled for the frame number received
  5. All sleeping clients for the frame number received are unblocked
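
The flow above can be modelled as a small frame-indexed scheduler. This is a minimal sketch, not the X server's code: swap_sched, schedule_swap and handle_vblank are hypothetical names, the fixed-size table is an arbitrary simplification, and the DDX ->SetupSwap call is represented only by recording the target frame. The sketch also treats a swap as due when its target frame is less than or equal to the frame reported, which is an assumption made here to cover a missed vblank event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SWAPS 16            /* arbitrary table size for the sketch */

struct pending_swap {
    int client;                 /* hypothetical client id */
    uint64_t target;            /* frame in which the swap should occur */
    bool in_use;
};

struct swap_sched {
    uint64_t frame;             /* frame count from the last vblank event */
    struct pending_swap swaps[MAX_SWAPS];
};

/* Steps 1-3: record a swap for (current frame + interval) and return at
 * once. Returns false if the table is full. */
bool schedule_swap(struct swap_sched *s, int client, unsigned interval,
                   uint64_t *target_out)
{
    assert(s && target_out);
    for (int i = 0; i < MAX_SWAPS; i++) {
        if (!s->swaps[i].in_use) {
            s->swaps[i] = (struct pending_swap){ client, s->frame + interval, true };
            *target_out = s->swaps[i].target;
            return true;
        }
    }
    return false;
}

/* Steps 4-5: a vblank event for `frame` arrived. Completes every swap that
 * is due and stores the ids of the clients to unblock in `woken`.
 * Returns the number of swaps completed. */
int handle_vblank(struct swap_sched *s, uint64_t frame, int woken[MAX_SWAPS])
{
    assert(s && woken);
    int n = 0;
    s->frame = frame;
    for (int i = 0; i < MAX_SWAPS; i++) {
        if (s->swaps[i].in_use && s->swaps[i].target <= frame) {
            woken[n++] = s->swaps[i].client;    /* swap done, client may run */
            s->swaps[i].in_use = false;
        }
    }
    return n;
}
```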

Implementation: SGI_video_sync for the X server

SGI_video_sync gives clients control over their framerate by exposing the frame count and allowing apps to wait on a given frame count. glXGetVideoSyncSGI returns the current frame count for the display the drawable is on, and glXWaitVideoSyncSGI allows a client to block until a given frame count is reached on its drawable's display. Flow in the server and client for the glXGetVideoSyncSGI direct rendered case:

  1. Client calls glXGetVideoSyncSGI
  2. Mesa code receives the request and turns it into a DRI2GetMSCReq protocol request
  3. Display server receives the request and calls DDX driver's ->GetMSC hook
  4. DDX driver returns frame count based on drawable location (count is returned for the CRTC with the greatest intersection with the drawable)
  5. Display server returns a DRI2GetMSCReply with the current MSC count to the client

If the window is redirected, compositor protocol will be required to handle step 3 instead of DDX code. If the drawable is an offscreen pixmap, this call is invalid.
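
The CRTC choice in step 4 comes down to comparing overlap areas. The following is a minimal sketch under assumed types: struct box, overlap_area and pick_crtc are hypothetical and do not come from any DDX driver, and ties are resolved in favour of the lowest CRTC index, which is a choice made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open rectangle in screen coordinates: [x1, x2) x [y1, y2). */
struct box { int x1, y1, x2, y2; };

/* Area shared by two rectangles, or 0 if they do not overlap. */
static long overlap_area(struct box a, struct box b)
{
    int x1 = a.x1 > b.x1 ? a.x1 : b.x1;
    int y1 = a.y1 > b.y1 ? a.y1 : b.y1;
    int x2 = a.x2 < b.x2 ? a.x2 : b.x2;
    int y2 = a.y2 < b.y2 ? a.y2 : b.y2;
    if (x2 <= x1 || y2 <= y1)
        return 0;
    return (long)(x2 - x1) * (y2 - y1);
}

/* Returns the index of the CRTC with the greatest intersection with the
 * drawable, or -1 if the drawable overlaps none of them. */
int pick_crtc(struct box drawable, const struct box *crtcs, size_t n)
{
    assert(crtcs || n == 0);
    int best = -1;
    long best_area = 0;
    for (size_t i = 0; i < n; i++) {
        long area = overlap_area(drawable, crtcs[i]);
        if (area > best_area) {     /* strict: the first CRTC wins a tie */
            best_area = area;
            best = (int)i;
        }
    }
    return best;
}
```

For two side-by-side 1920x1080 CRTCs, a window spanning x = 1800 to 2200 overlaps the second CRTC more than the first, so the frame count would be taken from the second.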

Flow for glXWaitVideoSyncSGI direct rendered case:

  1. Client calls glXWaitVideoSyncSGI
  2. Mesa code receives the request and turns it into a DRI2WaitMSCReq protocol request
  3. Display server receives the request and calls DDX driver's ->SetupWaitMSC hook to request an event for the specified frame
  4. Server blocks the client until the requested frame is received (which could be immediate if the frame has already passed)
  5. Display server returns a DRI2WaitMSCReply with current MSC, SBC to the client

Again, if the window is redirected, the compositor will need to be called at step 3 instead. And again, if the drawable is an offscreen pixmap, this call is invalid.
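
The blocking decision in steps 3 to 5 of the wait flow can be sketched as follows. The names msc_waiter, wait_reply, setup_wait_msc and wait_msc_on_vblank are hypothetical; the sketch only models when a client blocks and when the reply carrying MSC and SBC would be sent, not the actual protocol handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values the reply carries back to the client (step 5). */
struct wait_reply { uint64_t msc; uint64_t sbc; };

struct msc_waiter {
    uint64_t target;    /* frame count the client asked to wait for */
    bool blocked;       /* true while the client is asleep */
};

/* Steps 3-4: register the wait. Returns true if the client must block,
 * false if the target frame has already been reached and the reply can be
 * sent immediately. */
bool setup_wait_msc(struct msc_waiter *w, uint64_t current_msc,
                    uint64_t target_msc)
{
    assert(w);
    w->target = target_msc;
    w->blocked = target_msc > current_msc;
    return w->blocked;
}

/* A vblank event reporting `msc` arrived. If the waiter's frame has been
 * reached, unblock it and fill in the reply. Returns true if a reply
 * should be sent now. */
bool wait_msc_on_vblank(struct msc_waiter *w, uint64_t msc, uint64_t sbc,
                        struct wait_reply *reply)
{
    assert(w && reply);
    if (!w->blocked || msc < w->target)
        return false;
    w->blocked = false;
    reply->msc = msc;
    reply->sbc = sbc;
    return true;
}
```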

Testing

The code for the above is present in several repos and patches:

  1. kernel - the DRM events patch is in the drm-next branch of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6; the page flipping patch is not needed yet
  2. libdrm - master branch
  3. dri2proto - dri2-swapbuffers branch
  4. mesa - git://people.freedesktop.org/~jbarnes/mesa master branch
  5. xserver - git://people.freedesktop.org/~jbarnes/xserver master branch
  6. xf86-video-intel - git://people.freedesktop.org/~jbarnes/xf86-video-intel master branch

See Graphics stack git development for information on how to build a stack with the above.

The direct rendered cases outlined in the implementation notes above are complete, but there's a bug in the async glXSwapBuffers that sometimes causes clients to hang after swapping rather than continue.

Open issues

This implementation does not guarantee tear-free drawing in the non-composited, non-fullscreen (flip) case, but does provide the throttling feature implicit in SGI_swap_control. For a tear-free guarantee, either the application must be performing full screen swaps eligible for page flipping, or a compositor using page flipping must be present and the application's window redirected.

The implementation also needs screen information from the compositor in the case of redirected windows, so it can request a vblank event from the kernel for the correct CRTC.

Best (future) practices

These are the best practices we're shooting for with this work.

Avoiding tearing

Run your app under a compositing manager or make sure it's full screen and can use page flipping. It's important *not* to try to use SGI_video_sync or other "wait on vblank" calls for this, as tear-free drawing in that case will depend on the length of the vblank period (very small on some monitors/modes), interrupt latency and scheduling latency (i.e. your application may not be awakened to run until the end or after the vblank period).

Throttling rendering

Use SGI_video_sync, SGI_swap_control, OML_sync_control or ARB_sync extensions.

Controlling buffer swap behavior

Use a TBD_swap_method_control extension to select triple buffering, blit or exchange methods.

Mutter

Misc notes (rib Thu Sep 3 20:40:43 BST 2009)

Reference

For reference: