Some time ago, we implemented page flipping for several of the DRI drivers. Page flipping is an optimization where, on glXSwapBuffers, rather than blitting the contents of the GL window's back buffer to the front buffer, the driver simply updates the monitor's scanout registers to display the "back" buffer instead of the front; the back and front buffer essentially switch roles on every buffer swap. Note that one buffer swap is one frame of animation.

It seemed like a good idea at the time, but I (EricAnholt) have been arguing that it was a mistake because of the added complexity, the increased overhead for 2D, and the X-versus-OpenGL nonconformance issues caused by the shadow buffer used to maintain 2D contents between front and back.

So, how about some benchmarks? I ran quake3 with pageflipping enabled and disabled on a Radeon 64M VIVO, and got:

```
x /home/anholt/pf-q3-disable
+ /home/anholt/pf-q3-enable
+--------------------------------------------------------------------------+
| x + |
| x + |
| x + |
|x x + + |
| |AM |__AM||
+--------------------------------------------------------------------------+
    N      Min      Max   Median      Avg         Stddev
x   5     70.7     70.8     70.8    70.78     0.04472136
+   5     73.1     73.3     73.3    73.26    0.089442719
Difference at 95.0% confidence
2.48 +/- 0.103127
3.50381% +/- 0.145701%
(Student's t, pooled s = 0.0707107)
```
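For readers unfamiliar with this output format (it looks like the output of the `ministat` utility), the "Difference at 95.0% confidence" lines can be reproduced from the summary rows alone. The following is a minimal sketch, assuming the usual pooled-variance Student's t interval; the function name `pooled_difference` is made up for illustration, and the critical value 2.306 (two-sided 95%, 8 degrees of freedom) is hardcoded rather than computed.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Return (difference, half_width, pooled_sd) for two samples.

    Uses the pooled standard deviation of both samples, as the
    "(Student's t, pooled s = ...)" line in the output suggests.
    """
    dof = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof)
    half_width = t_crit * pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return mean2 - mean1, half_width, pooled_sd


# Assumption: two-sided 95% t critical value for 5 + 5 - 2 = 8 degrees of freedom.
T_CRIT_95_DOF8 = 2.306

# Summary rows from the Radeon 64M VIVO run above (x = disabled, + = enabled).
diff, err, s = pooled_difference(5, 70.78, 0.04472136,
                                 5, 73.26, 0.089442719,
                                 T_CRIT_95_DOF8)
print(f"{diff:.2f} +/- {err:.4f} fps ({100 * diff / 70.78:.2f}%), pooled s = {s:.5f}")
```

Under these assumptions the result should land close to the reported 2.48 +/- 0.103 fps, which is about 3.5% of the page-flipping-disabled average.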

With a Radeon 8500:

```
x /home/anholt/pf-q3-disable
+ /home/anholt/pf-q3-enable
+--------------------------------------------------------------------------+
| + |
| x + |
| x + |
|x x x + +|
| |____________A________M___| |__M__A______| |
+--------------------------------------------------------------------------+
    N      Min      Max   Median      Avg         Stddev
x   5     71.6     71.8     71.8    71.74    0.089442719
+   5       72     72.1       72    72.02     0.04472136
Difference at 95.0% confidence
0.28 +/- 0.103127
0.390298% +/- 0.143752%
(Student's t, pooled s = 0.0707107)
```

And with a Radeon 7000:

```
x /home/anholt/pf-q3-disable
+ /home/anholt/pf-q3-enable
+--------------------------------------------------------------------------+
|x +|
|x +|
|x +|
|A A|
+--------------------------------------------------------------------------+
    N      Min      Max   Median      Avg         Stddev
x   5     31.2     31.2     31.2     31.2              0
+   3     32.9     32.9     32.9     32.9  4.7683716e-07
Difference at 95.0% confidence
1.7 +/- 4.91975e-07
5.44872% +/- 1.57684e-06%
(Student's t, pooled s = 2.75302e-07)
```

So, with quake3 at 1024x768 we see a framerate improvement of between 0.4% and 5.4%. But I suppose that's not really fair for the Radeon 7000 (rv100), since nobody would actually run it at that resolution on that card. So how about 640x480?

```
x /home/anholt/pf-q3-disable
+ /home/anholt/pf-q3-enable
+--------------------------------------------------------------------------+
| x + |
| x x + + |
| x x + + |
||_M___A______| |______A___M_||
+--------------------------------------------------------------------------+
    N      Min      Max   Median      Avg         Stddev
x   5     59.5     59.6     59.5    59.54    0.054772256
+   5       60     60.1     60.1    60.06    0.054772256
Difference at 95.0% confidence
0.52 +/- 0.0798822
0.873362% +/- 0.134166%
(Student's t, pooled s = 0.0547723)
```

Now, we see that if we crank down the resolution to a playable framerate, page flipping again becomes unimportant, even on the Radeon 7000.

From a theoretical perspective, let's assume that pageflipping turns glXSwapBuffers into a basically free operation, and that the back-to-front blit execution time is dominated by the video card's VRAM bandwidth. The peak performance win of pageflipping is therefore

```
              2 * width * height * depth (in bytes)
framerate * -------------------------------------
                          bandwidth
```

Where `framerate` is the FPS count without pageflipping, and the constant factor of 2 comes from the fact that every pixel in the blit is read once and written once. In the example above of a Radeon 7000 at 640x480x32, you have 2.6GB/s of card bandwidth to work with, and about 2.34MB of data moved per blit (2 * 640 * 480 * 4 bytes). So each blit takes about 0.88ms to complete. By making swapping a "free" operation and starting at 59.6fps, you'd gain at most an additional 52.5ms per second, which is about 5% more time available for rendering. Likewise, in the 1024x768 example you'd end up with at most an additional 70.3ms per second, or about 7% higher framerate.
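The arithmetic behind this upper bound can be checked with a short script. This is only a sketch of the formula above: the helper name `blit_time_fraction` is made up for illustration, and it assumes that "2.6GB/s" is meant in binary units (2.6 * 1024^3 bytes per second), since that is what reproduces the 0.88ms figure.

```python
def blit_time_fraction(width, height, bytes_per_pixel, bandwidth, fps):
    """Fraction of each second spent on back-to-front blits.

    Each blit reads and writes every pixel once, hence the factor of 2.
    """
    bytes_per_blit = 2 * width * height * bytes_per_pixel
    seconds_per_blit = bytes_per_blit / bandwidth
    return fps * seconds_per_blit


# Assumption: 2.6GB/s interpreted as 2.6 GiB/s.
BANDWIDTH = 2.6 * 1024**3

# (label, width, height, fps without pageflipping) for the Radeon 7000 runs.
cases = [
    ("640x480", 640, 480, 59.6),
    ("1024x768", 1024, 768, 31.2),
]

for label, width, height, fps in cases:
    fraction = blit_time_fraction(width, height, 4, BANDWIDTH, fps)
    print(f"{label}: {1000 * fraction:.1f} ms of blitting per second "
          f"({100 * fraction:.1f}% of frame time)")
```

With these inputs the two cases come out at roughly 52.5ms and 70.3ms per second, well above the 0.9% and 5.4% gains actually measured, which is expected for a peak estimate that treats the swap as entirely free.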