The effect of AGP settings on 3D performance comes up often, and I've always told people not to bother with tweaking AGP since it doesn't matter. But I realized I should really make sure that what I say is true. So, I've benchmarked a couple of things with AGP at 1x and 4x. I chose Quake3, since it's an old standby and I've got scripts for it, and ipers, since it's very vertex-dispatch intensive (though vtxfmt fallbacks may be hurting it a lot these days and making it a less useful test). The machine was a P4 running at 1GHz with a Radeon 64MB VIVO, with Option "AGPMode" "4" set in xorg.conf for the 4x runs and no AGPMode option for the 1x runs. Drivers were Mesa CVS as of around 2004-11-01.

Quake3 (640x480, defaults, running demofour script from idr):

x 1x
+ 4x
+--------------------------------------------------------------------------+
|          x                                                   +          +|
|x         x          x                              +         +          +|
|   |______A_______|                                     |_____M_A________||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         127.2         127.4         127.3         127.3   0.070710678
+   5         127.7         127.9         127.8        127.82   0.083666003
Difference at 95.0% confidence
        0.52 +/- 0.11297
        0.408484% +/- 0.0887435%
        (Student's t, pooled s = 0.0774597)
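
The "Difference at 95.0% confidence" lines can be checked by hand from the summary rows. The following is a minimal sketch of that arithmetic, assuming a pooled-variance Student's t interval as the output indicates; the function name `pooled_difference` and the small `T_95` lookup table are made up for this illustration and only cover the sample sizes used on this page.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Assumption: only the two values needed for the runs on this page are listed.
T_95 = {6: 2.447, 8: 2.306}

def pooled_difference(n1, avg1, sd1, n2, avg2, sd2):
    """Return (difference, half-width of the 95% interval, pooled s, percent change)."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two samples.
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # Half-width of the confidence interval on the difference of the means.
    half_width = T_95[df] * pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    diff = avg2 - avg1
    return diff, half_width, pooled, 100.0 * diff / avg1

# Summary rows of the 1x and 4x runs with default settings.
diff, err, s, pct = pooled_difference(5, 127.3, 0.070710678, 5, 127.82, 0.083666003)
print(f"{diff:.2f} +/- {err:.5f} ({pct:.4f}%), pooled s = {s:.7f}")
```

The same function can be fed the N, Avg and Stddev columns of the other result blocks; the runs with N = 4 use the 6-degrees-of-freedom entry.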

So, there was a 0.4% improvement by going to AGP 4x. However, what about the thrashing case? I hacked the DDX to only allocate 1MB for on-card textures and re-ran the test. (Note that with the default allocation of 8MB of AGP memory, this means I've got 5MB of space available for textures.)

x 1x-1M
+ 4x-1M
+--------------------------------------------------------------------------+
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|A                                                                        A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   4          96.4          96.4          96.4          96.4             0
+   4         125.3         125.4         125.4       125.375          0.05
Difference at 95.0% confidence
        28.975 +/- 0.061175
        30.0571% +/- 0.0634595%
        (Student's t, pooled s = 0.0353553)

A 30% difference in a serious texture thrashing case. OK, interesting. I dropped the first run of each of them because they were clearly outliers. I can't explain why, but check the next test. I used the RADEON_GARTTEXTURING_FORCE_DISABLE environment variable to disable GART texturing and re-ran, still with the 1MB on-card texture limit.

x 1x-1M-noagptex
+ 4x-1M-noagptex
+--------------------------------------------------------------------------+
|x                                                                       + |
|xx                                                                      ++|
|xx                                                                      ++|
|A|                                                                      A||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5          52.2          52.3          52.2         52.24   0.054772256
+   5          60.3          60.4          60.3         60.34   0.054772256
Difference at 95.0% confidence
        8.1 +/- 0.0798822
        15.5054% +/- 0.152914%
        (Student's t, pooled s = 0.0547723)

Interestingly, these framerates are just what the outliers on the first runs of the previous test were. Perhaps we've got a bug, where the first client doesn't get AGP texturing? Odd, but possible. Anyway, with no AGP texturing, we're seeing a 15% performance improvement from increased AGP speeds, probably because textures are being uploaded to the card as hostdata blits (the texture data is written to the command ring which is in AGP space, and sent to the card that way).

How about the case of using AGP textures, with the card space being limited but AGP space being relatively large? I set Option "AGPSize" "64", giving me 61MB of AGP texture space.

x 1x-1M-61Magp
+ 4x-1M-61Magp
+--------------------------------------------------------------------------+
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|x                                                                        +|
|A                                                                        A|
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   4          95.4          95.4          95.4          95.4             0
+   4         125.4         125.5         125.5        125.45   0.057735027
Difference at 95.0% confidence
        30.05 +/- 0.0706388
        31.499% +/- 0.0740449%
        (Student's t, pooled s = 0.0408248)

Again, I threw out the first run of each test for being outliers. A 31% performance improvement, nearly the same as with the default AGP size, so I probably wasn't thrashing when I was running with AGP texturing the first time.

Unfortunately, ipers doesn't have any reporting mechanism, and its measurement is too imprecise to say much. However, both at 1x and 4x the framerate flipped between 12.50 and 14.29 fps.

It appears that for normal usage on cards where you aren't significantly restricted in terms of local memory, 4x AGP buys you almost nothing (0.4%). If you need to fetch nearly all of your textures from AGP, it can be a performance improvement of about 30%, and it can improve texture upload performance to card memory enough for a 15% improvement when uploads are made to be the bottleneck.
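
Percentages of framerate can hide how much time is actually saved per frame. As a rough sketch, the averages reported above can be converted to milliseconds per frame; the labels and the `frame_ms` helper are just names chosen for this illustration, and the figures are the Avg columns from the four Quake3 comparisons.

```python
def frame_ms(fps):
    """Convert a framerate in frames per second to milliseconds per frame."""
    return 1000.0 / fps

# (label, average fps at AGP 1x, average fps at AGP 4x), copied from the results above.
RESULTS = [
    ("defaults", 127.3, 127.82),
    ("1M on-card", 96.4, 125.375),
    ("1M on-card, no AGP texturing", 52.24, 60.34),
    ("1M on-card, 61M AGP", 95.4, 125.45),
]

# Time saved per frame by going from 1x to 4x, in milliseconds.
saved = {label: frame_ms(fps_1x) - frame_ms(fps_4x) for label, fps_1x, fps_4x in RESULTS}

for label, ms in saved.items():
    print(f"{label}: {ms:.2f} ms saved per frame")
```

Seen this way, the default configuration saves only a few hundredths of a millisecond per frame, while each of the three restricted configurations saves on the order of 2.5 ms per frame.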

-- EricAnholt