Issues
X and GL applications need to control how their data is displayed for several reasons:
- preventing distracting artifacts - ugly "tearing" or partial drawing for example
- smoothness - constant, predictable frame rates are a requirement for animation, video and high end simulation programs
- performance - apps need to know how well they're keeping up with their desired frame rate so they can throttle back or simplify drawing as needed
Plain X doesn't have many ways of doing the above (it mostly assumes immediate mode drawing to the visible frame buffer), but GL provides many extensions related to handling the actual display of buffers on the user visible screen.
There are also issues related to memory consumption, swap behavior, and performance that can be addressed:
Memory savings
- memory consumption could be reduced if the private back/front pair was reduced to just a private back with the compositor copy acting as front (though there are issues with front buffer rendering in this case)
- it could also be reduced by throwing away the private back buffer in between frame rendering
Performance
- performance could be significantly improved on low bandwidth platforms if buffer swaps could be simple pointer exchanges when windowed (similar to the way page flips work for full screen applications)
- requires window managers to draw decorations independently of the application window (i.e. exchange is only possible if the front & back window pixmaps are the same size)
- page flipping for full screen swaps is also a significant win on bandwidth limited platforms
Behavior
- some applications want to control how buffer swaps occur (e.g. to preserve back buffer contents after a swap)
- triple buffering should be available to applications that need it, with configurable behavior for discard vs. queue if the buffers are rendered faster than can be displayed
OpenGL and GLX swap and throttling related extensions
Under a compositor, the behavior of these routines should change. Changes noted below, though in general "video frame" should be thought of as a virtualized compositor frame rate rather than a monitor refresh (e.g. the compositor may report 60fps to applications even though it only updates the screen when recompositing is needed):
SGI_swap_control - controls how frequently glXSwapBuffers swaps occur (in frames)
- for redirected windows, interval is in compositor frames rather than monitor video frames
SGI_video_sync - allows clients to query frame counts and wait on specific counts or divisor/remainders thereof
- for redirected windows, glXGetVideoSyncSGI returns the compositor frame count rather than the monitor frame count
- for redirected windows, glXWaitVideoSyncSGI will block until the compositor frame count satisfies the specified conditions
OML_sync_control - combines the above and adds the notion of a "swap count" and frame timestamp, allowing applications to closely monitor their performance and swap frequency
- for redirected windows, MSC represents the compositor frame count; likewise UST indicates the time when the compositor last generated a frame
- for redirected windows, SBC could be incremented when the compositor copies a buffer rather than when the buffer is copied from back to front
OML_swap_method - exposes swap method (whether copy, exchange or undefined) to the FBconfig
- the way X and compositors work now makes it hard to report anything other than 'undefined' as the swap method; page flipping is opportunistic, and exchange is difficult for windows (redirected or not) due to reparenting by window managers
SGIX_swap_group/NV_swap_group - allow clients to swap in a synchronized manner
- for redirected windows, swap groups could depend on compositor copies (see SBC count above)
SGIX_swap_barrier - controls swap group behavior
- should be unaffected
TBD_swap_control - allows selection of triple/N buffering
- needs to be defined, can fail if driver can't handle requested number of buffers
INTEL_swap_event - deliver swap information to clients after a swap completes, useful for integrating swap based throttling into client event loops
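The divisor/remainder wait described for SGI_video_sync above can be read as modular arithmetic over the frame count. The following is a minimal Python model, not GLX code. It assumes the wait is satisfied by the first frame count at or after the current one for which count % divisor == remainder; whether a count that already matches returns immediately is an assumption of this sketch. `first_matching_frame` is a hypothetical helper name. For a redirected window the count would be the compositor frame count rather than the monitor frame count.

```python
def first_matching_frame(current: int, divisor: int, remainder: int) -> int:
    """Return the first frame count >= current with count % divisor == remainder."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= remainder < divisor:
        raise ValueError("remainder must be in [0, divisor)")
    # Distance from the current count to the next matching count
    # (0 if the current count already matches).
    delta = (remainder - current % divisor) % divisor
    return current + delta


# A client that wants to wake on every other frame (even counts only):
print(first_matching_frame(101, 2, 0))  # 102
print(first_matching_frame(100, 2, 0))  # 100
```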
Compositor extensions
To support the above, feedback from the currently active compositor is necessary. We'll need protocol for:
CompositeNotifyPixmapCopied(pixmap) - compositor notifies server that pixmap has been copied, unblocks clients blocked on swapping or rendering to front
- should be doable with xsync already, as a defined protocol between the compositor and clients
NotifyPixmapReady(pixmap) - server notifies compositor that an application has a new frame ready (e.g. after a glXSwapBuffers)
- should be similar to a damage event, maybe damage is sufficient?
CompositeNotifyFrameDone - compositor notifies server that the compositor has finished drawing a new frame; clients blocked on frame related events can continue. Would increment the MSC (everywhere you see "video frame" in the above extensions would refer to this notification).
- again, should be doable with xsync
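To make the interaction between the three notifications above concrete, here is a toy Python model of the bookkeeping they imply. It is only a sketch: the class, its method names and the "unblocked" log are hypothetical, and nothing here is real X, Composite or XSync protocol. It assumes that a pixmap-copied notification wakes the client that submitted the pixmap, and that a frame-done notification increments a virtual MSC and wakes any client whose target count has been reached.

```python
class CompositorFeedbackModel:
    """Toy model of the proposed notifications; not real X protocol."""

    def __init__(self):
        self.msc = 0              # virtual "video frame" count
        self.ready = []           # pixmaps waiting for the compositor
        self.blocked = {}         # pixmap -> client blocked until it is copied
        self.frame_waiters = []   # (target_msc, client) pairs
        self.unblocked = []       # clients woken so far, in order

    def notify_pixmap_ready(self, pixmap, client):
        # Server -> compositor: a new frame is ready (e.g. after glXSwapBuffers).
        self.ready.append(pixmap)
        self.blocked[pixmap] = client

    def wait_for_frame(self, client, target_msc):
        # Client asks to sleep until the virtual frame count reaches target_msc.
        if target_msc <= self.msc:
            self.unblocked.append(client)
        else:
            self.frame_waiters.append((target_msc, client))

    def composite_notify_pixmap_copied(self, pixmap):
        # Compositor -> server: pixmap copied, so its client may render again.
        self.ready.remove(pixmap)
        self.unblocked.append(self.blocked.pop(pixmap))

    def composite_notify_frame_done(self):
        # Compositor -> server: a frame finished; bump the MSC and wake waiters.
        self.msc += 1
        still_waiting = []
        for target, client in self.frame_waiters:
            if target <= self.msc:
                self.unblocked.append(client)
            else:
                still_waiting.append((target, client))
        self.frame_waiters = still_waiting


model = CompositorFeedbackModel()
model.notify_pixmap_ready("pixmap-A", "client-A")
model.wait_for_frame("client-B", target_msc=1)
model.composite_notify_pixmap_copied("pixmap-A")
model.composite_notify_frame_done()
print(model.msc, model.unblocked)  # 1 ['client-A', 'client-B']
```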
Triple buffering
Triple buffering means different things to different people. For convenience, we list the types we're concerned about for this discussion here. All have a high memory cost and should generally only be enabled for a small number of clients at a time.
- compositor based - compositor keeps a private copy of each application's front buffer; this means it always has a consistent, fully drawn pixmap to use for creating screen frames. glXSwapBuffers updates the redirected front and notifies the compositor a new one is ready to pick up as its private copy.
- good for avoiding ugly partial drawing artifacts
- server based - server keeps last ready client buffer around and returns available buffers to the client after a glXSwapBuffers occurs
- can help keep clients busy at the cost of extra memory (good to keep frame rates up while preserving vblank sync'd swapping)
- client based - server returns 3 buffers to the client, which somehow requests copies between them using glXSwapBuffers
- like the server based case, but handled entirely on the client side
Another factor for triple buffering is how to handle extra frames. If frames are rendered faster than can be displayed, some applications may want to discard the extra frames, while others may want to queue them (and likely be throttled).
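The difference between the discard and queue policies can be sketched with a small Python model. This is an illustration only: `SwapQueue`, its `policy` strings and the `max_pending` parameter are hypothetical names. It assumes a triple-buffered setup with one buffer on screen, one being rendered, and at most one finished frame waiting for display.

```python
from collections import deque


class SwapQueue:
    """Finished frames submitted by a client but not yet displayed."""

    def __init__(self, policy: str, max_pending: int = 1):
        if policy not in ("discard", "queue"):
            raise ValueError("policy must be 'discard' or 'queue'")
        self.policy = policy
        self.max_pending = max_pending
        self.pending = deque()

    def submit(self, frame) -> bool:
        """Submit a finished frame; False means the client must be throttled."""
        if len(self.pending) < self.max_pending:
            self.pending.append(frame)
            return True
        if self.policy == "discard":
            # Drop the oldest undisplayed frame in favor of the newest one.
            self.pending.popleft()
            self.pending.append(frame)
            return True
        # "queue": no free slot, so the client has to wait for a display update.
        return False

    def vblank(self):
        """Display the next pending frame, if any, and return it."""
        return self.pending.popleft() if self.pending else None


discard = SwapQueue("discard")
queue = SwapQueue("queue")
for frame in ("f1", "f2", "f3"):   # three frames rendered within one refresh
    discard.submit(frame)
    queue.submit(frame)
print(discard.vblank(), queue.vblank())  # f3 f1
```

With "discard" the display always shows the most recent frame at the cost of wasted rendering; with "queue" every accepted frame is shown, but the client is held back to the display rate.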
New code
All of this means new DRI2, display server and compositor code, but it should be doable. In particular we may want:
- DRI2 proto for waiting on a given swap or frame count
- DRI2 swapbuffers support for frame count & divisor/remainder delayed swaps
- new server code for handling swap groups
- new server code for handling indirect clients doing frame count or swap buffer count waits
Implementation: SGI_swap_control for the X server
The SGI_swap_control extension allows applications to control their glXSwapBuffers frequency. The glXSwapIntervalSGI call lets clients specify how frequently, in frames, their buffer swaps should occur. Implementing this requires server support, since with DRI2 the server is responsible for performing swaps. The basic flow is as follows:
- When an application calls glXSwapBuffers, a swap is scheduled in the server for the current frame count plus the interval count (though if the swap is to be a page flip, it is scheduled to be queued ahead of that frame, since flips occur at the next vblank after being queued).
- The server calls into the DDX driver's ->SetupSwap hook to request an event for the frame in which the swap will occur (DDX support is necessary to request the event on the correct CRTC)
- Control is returned to the client immediately
- Note: any further GLX calls requiring a GLX context to be bound will block until the swap completes
- When the server receives a vblank event, it will perform all the swaps scheduled for the frame number received
- All sleeping clients for the frame number received are unblocked
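The flow above can be modelled in a few lines of Python. This is a sketch of the scheduling logic only, not server code: `SwapScheduler` and its attributes are hypothetical names, page flips are left out for simplicity, and the `blocked` set merely records which clients would block on their next context-bound GLX call.

```python
from collections import defaultdict


class SwapScheduler:
    """Toy model of server-side swap scheduling for SGI_swap_control."""

    def __init__(self):
        self.frame = 0                       # current frame count
        self.scheduled = defaultdict(list)   # target frame -> [client, ...]
        self.blocked = set()                 # clients with a swap outstanding
        self.completed = []                  # (frame, client) log

    def swap_buffers(self, client, interval: int = 1) -> int:
        """Schedule a swap and return immediately with the target frame."""
        if interval < 1:
            raise ValueError("interval must be at least 1")
        target = self.frame + interval
        self.scheduled[target].append(client)
        # Further GLX calls needing a bound context would block until the swap.
        self.blocked.add(client)
        return target

    def vblank(self):
        """Handle a vblank event: perform due swaps and unblock their clients."""
        self.frame += 1
        for client in self.scheduled.pop(self.frame, []):
            self.completed.append((self.frame, client))
            self.blocked.discard(client)


sched = SwapScheduler()
sched.swap_buffers("glxgears", interval=2)
sched.vblank()   # frame 1: nothing is due yet
sched.vblank()   # frame 2: the swap is performed and the client unblocked
print(sched.completed)  # [(2, 'glxgears')]
```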
Implementation: SGI_video_sync for the X server
SGI_video_sync gives clients control over their framerate by exposing the frame count and allowing apps to wait on a given frame count. glXGetVideoSyncSGI returns the current frame count for the display the drawable is on, and glXWaitVideoSyncSGI allows a client to block until a given frame count is reached on its drawable's display. Flow in the server & client for glXGetVideoSyncSGI direct rendered case:
- Client calls glXGetVideoSyncSGI
- Mesa code receives the request and turns it into a DRI2GetMSCReq protocol request
- display server receives the request and calls DDX driver's ->GetMSC hook
- DDX driver returns frame count based on drawable location (count is returned for the CRTC with the greatest intersection with the drawable)
- display server returns a DRI2GetMSCReply with the current MSC count to the client
If the window is redirected, compositor protocol will be required to handle step 3 instead of DDX code. If the drawable is an offscreen pixmap, this call is invalid.
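The "CRTC with the greatest intersection" rule used when returning the frame count can be sketched as a rectangle-overlap computation. The Python below is illustrative only: `Rect`, `intersection_area` and `pick_crtc` are hypothetical names, and a real DDX driver would work on its own drawable and CRTC structures.

```python
from typing import NamedTuple, Optional, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def intersection_area(a: Rect, b: Rect) -> int:
    """Area of overlap between two rectangles (0 if they do not overlap)."""
    w = min(a.x + a.width, b.x + b.width) - max(a.x, b.x)
    h = min(a.y + a.height, b.y + b.height) - max(a.y, b.y)
    return max(w, 0) * max(h, 0)


def pick_crtc(drawable: Rect, crtcs: Sequence[Rect]) -> Optional[int]:
    """Index of the CRTC covering most of the drawable, or None if off-screen."""
    best, best_area = None, 0
    for index, crtc in enumerate(crtcs):
        area = intersection_area(drawable, crtc)
        if area > best_area:
            best, best_area = index, area
    return best


# Two side-by-side 1920x1080 CRTCs; the window straddles the boundary with
# 120 pixels of its width on CRTC 0 and 280 pixels on CRTC 1.
crtcs = [Rect(0, 0, 1920, 1080), Rect(1920, 0, 1920, 1080)]
window = Rect(1800, 100, 400, 300)
print(pick_crtc(window, crtcs))  # 1
```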
Flow for glXWaitVideoSyncSGI direct rendered case:
- Client calls glXWaitVideoSyncSGI
- Mesa code receives the request and turns it into a DRI2WaitMSCReq protocol request
- display server receives the request and calls DDX driver's ->SetupWaitMSC hook to request an event for the specified frame
- server blocks the client until the requested frame is received (which could be immediate if the frame has already passed)
- display server returns a DRI2WaitMSCReply with current MSC, SBC to the client
Again, if the window is redirected, the compositor will need to be called at step 3 instead. And again, if the drawable is an offscreen pixmap, this call is invalid.
Testing
The code for the above is present in several repos and patches:
- kernel - DRM events patch is in the drm-next branch of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6; page flipping patch (not needed yet)
- libdrm - master branch
- dri2proto - dri2-swapbuffers branch
- mesa - git://people.freedesktop.org/~jbarnes/mesa master branch
- xserver - git://people.freedesktop.org/~jbarnes/xserver master branch
- xf86-video-intel - git://people.freedesktop.org/~jbarnes/xf86-video-intel master branch
See Graphics stack git development for information on how to build a stack with the above.
The direct rendered cases outlined in the implementation notes above are complete, but there's a bug in the async glXSwapBuffers that sometimes causes clients to hang after swapping rather than continue.
Open issues
This implementation does not guarantee tear-free drawing in the non-composited, non-fullscreen (flip) case, but does provide the throttling feature implicit in SGI_swap_control. For a tear-free guarantee, either the application must be performing full screen swaps eligible for page flipping, or a compositor using page flipping must be present and the application's window redirected.
The implementation also needs screen information from the compositor in the case of redirected windows, so it can request a vblank event from the kernel for the correct CRTC.
Best (future) practices
These are the best practices we're shooting for with this work.
Avoiding tearing
Run your app under a compositing manager or make sure it's full screen and can use page flipping. It's important *not* to try to use SGI_video_sync or other "wait on vblank" calls for this, as tear-free drawing in that case will depend on the length of the vblank period (very small on some monitors/modes), interrupt latency and scheduling latency (i.e. your application may not be awakened to run until the end or after the vblank period).
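To give a feel for how short the vblank period can be, the following Python snippet works out the numbers for one common mode. The timing values are an assumption brought in for illustration (the standard CEA-861 1920x1080 at 60 Hz timing: 148.5 MHz pixel clock, 2200 total pixels per line, 1125 total lines); other modes will differ.

```python
def vblank_duration_ms(pixel_clock_hz: float, htotal: int,
                       vtotal: int, vactive: int) -> float:
    """Time spent in vertical blanking per frame, in milliseconds."""
    line_time_s = htotal / pixel_clock_hz
    return (vtotal - vactive) * line_time_s * 1000.0


def frame_time_ms(pixel_clock_hz: float, htotal: int, vtotal: int) -> float:
    """Duration of one full frame (active plus blanking), in milliseconds."""
    return htotal * vtotal / pixel_clock_hz * 1000.0


# Assumed mode: 1920x1080@60 with 2200 x 1125 total and a 148.5 MHz clock.
blank = vblank_duration_ms(148.5e6, 2200, 1125, 1080)
frame = frame_time_ms(148.5e6, 2200, 1125)
print(f"vblank {blank:.3f} ms of a {frame:.3f} ms frame")
# vblank 0.667 ms of a 16.667 ms frame
```

Under these assumptions an application woken by a vblank wait has well under a millisecond to be scheduled and finish its copy before scanout resumes, which is why the compositor or page flip route is preferred.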
Throttling rendering
Use SGI_video_sync, SGI_swap_control, OML_sync_control or ARB_sync extensions.
Controlling buffer swap behavior
Use a TBD_swap_method_control to select triple buffering, blit or exchange methods.
Mutter
- When memory isn't a concern:
- Triple buffer all composited applications:
- 1) busy being used as part of the compositor's next render.
- 2) a front buffer for the application to queue render commands against; swapping with 3) when done.
- 3) a back buffer waiting to be picked up by the compositor, to be swapped with 1)
- Ideally the compositor never has to wait to pick up the application's next front buffer, and no copying is required.
- use the SGI_swap_control extension to call glXSwapIntervalSGI(1) in applications and compositor.
- compositor "video frame periods" are defined - as normal - so we flip at the first vblank after a render completes.
- Allow the compositor to drive the video frame period of redirected applications (i.e. consider the compositor to be a pseudo display for redirected drawables)
- (Let's assume the compositor is rendering to multiple windows to cover multiple displays, because the full size of the monitors exceeds the GPU render target limits)
- The compositor can use a swap group to ensure each of these windows presents in sync.
- A new fence/sync-object-like extension could be implemented that allows the compositor to say: "when my sync group becomes ready and swaps, please notify all these composited drawables of a video frame progression".
- I'm not sure what the best way to link composited drawables to a compositor is at the moment
- When the compositor's group swap completes, I imagine it would be possible to avoid waiting for the compositor to be scheduled so that it can send a DRI2 request to the X server, which in turn sends events to clients. That is, instead of using new DRI2 protocol to send the notification, could it be dealt with by the DRM driver, which would presumably know when the compositor's group swap completes? It would then know to advance the video frame period for the associated set of composited drawables, and any that have passed their designated swap interval could be unblocked (or, if we have asynchronous swap buffers, see below, an event could be sent by writing to the device file).
- asynchronous glXSwapBuffers and swap-buffers-complete events.
- Although the applications shouldn't run ahead of the compositor, since there's no point rendering more frames than the compositor can keep up with, applications also shouldn't be blocked from queueing up commands for the next frame. If we had asynchronous swapping + poll-able event notifications for swap-completion the application could stop itself painting when it knows it is two frames ahead of the compositor.
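The self-throttling idea in the asynchronous swap item above, where an application stops painting once it is two frames ahead, amounts to counting swaps in flight. A minimal Python sketch follows; `FrameThrottle` and its methods are hypothetical names, and the swap-complete event is assumed to arrive through the application's event loop.

```python
class FrameThrottle:
    """Client-side bookkeeping for asynchronous swaps (hypothetical API)."""

    def __init__(self, max_ahead: int = 2):
        self.max_ahead = max_ahead
        self.submitted = 0   # swaps requested by the application
        self.completed = 0   # swap-complete events received

    def can_paint(self) -> bool:
        # Paint only while fewer than max_ahead swaps are still outstanding.
        return self.submitted - self.completed < self.max_ahead

    def swap_submitted(self):
        self.submitted += 1

    def swap_completed(self):
        # Called from the event loop when a swap-complete event arrives.
        self.completed += 1


throttle = FrameThrottle()
painted = 0
while throttle.can_paint():
    painted += 1                 # render a frame, then request an async swap
    throttle.swap_submitted()
print(painted, throttle.can_paint())  # 2 False

throttle.swap_completed()        # the compositor picked up one frame
print(throttle.can_paint())      # True
```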
Misc notes (rib Thu Sep 3 20:40:43 BST 2009)
- It seems that composited apps should never need to know about real world screen vblank issues, that's only relevant to non-redirected windows including the compositor's. When dealing with a redirected window it seems it would be acceptable to come up with an entirely fake number for all existing extensions that care about vblanks. Somehow tying it to the render/swap-complete of the current compositor seems reasonable.
- Assuming we have the compositor generating a fake swap interval as above, and the compositor itself is responsible for synchronizing all the windows it manages, it seems like all the swap group related extensions may just work.
- Walking through a hypothetical example of a composited flightgear simulator across multiple monitors seems to add up...
- Let's say the compositor has two windows across two monitors (assuming it would exceed render target limits to just have one)
- Say flightgear also creates two windows for the same reason and wants to use a swap group to ensure they get presented at the same time.
- Assume the compositor is itself also using a swap group to ensure all its windows get presented at the same time, and it drives the video frame period according to its own swap group becoming ready and completing.
- If the first flightgear window is drawn to and a swap issued, it becomes ready but doesn't actually swap yet (so the compositor won't see it)
- The compositor may at this point complete its current frame and swap, and the latest flightgear window won't be shown.
- The second flightgear window can be drawn to and a swap issued, which makes the group ready, so now both windows are swapped and become available to the compositor.
- The compositor will pick up the new window contents, and since it is itself using a swap group, both windows will be presented in sync.
- I can't see how GLX_OML_swap_method can be supported at all, given that GLX doesn't know ahead of time if any glx window will be later redirected?
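The flightgear walkthrough above relies on one property of a swap group: no member's swap takes effect until every member has issued one. A toy Python model of that property is given below; `SwapGroup` is a hypothetical class for illustration and does not reflect the SGIX_swap_group or NV_swap_group API.

```python
class SwapGroup:
    """Toy swap group: members swap together once every member is ready."""

    def __init__(self, members):
        self.members = set(members)
        self.ready = set()
        self.swap_count = 0

    def swap_buffers(self, member) -> bool:
        """Mark a member ready; True if this call made the whole group swap."""
        if member not in self.members:
            raise KeyError(member)
        self.ready.add(member)
        if self.ready == self.members:
            # Everyone is ready: all members swap at once.
            self.ready.clear()
            self.swap_count += 1
            return True
        return False


flightgear = SwapGroup(["fg-left", "fg-right"])
first = flightgear.swap_buffers("fg-left")    # ready, but nothing swaps yet
# A compositor frame completed here would still show the old contents.
second = flightgear.swap_buffers("fg-right")  # group ready: both swap now
print(first, second, flightgear.swap_count)   # False True 1
```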
Reference
For reference:
Apple GL Programming Guide - covers best practices for GL programmers in Apple's composited environment


