Programmed Input/Output (PIO)
Basically, PIO means that the processor is responsible for writing (or reading) a byte/word at a time to a single I/O address in order to communicate a block of data. Note that on x86, "I/O space" means something different than on most other architectures, because x86 has separate I/O and memory address spaces, accessed by different instructions: in/out vs. load/store (okay, it's mov in x86 parlance, but if you know that, you should already know what I'm talking about). Think of PIO as writing to or reading from one end of a FIFO while the target reads/writes the other end.
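To make the FIFO picture concrete, here is a minimal C sketch of the PIO write path. It assumes a hypothetical device whose FIFO sits behind one port. Real x86 code would issue `outb` (glibc's `<sys/io.h>` form is `outb(value, port)`, and it needs `ioperm` or root), so the port write is simulated here by the hypothetical helper `port_write8` to keep the sketch runnable. The point is the shape of the loop: the CPU moves every byte itself, and every write goes to the same address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device: a FIFO sitting behind a single I/O port.
 * FIFO_DEPTH is an arbitrary illustrative size. */
#define FIFO_DEPTH 64

struct pio_device {
    uint8_t fifo[FIFO_DEPTH];
    size_t  count;              /* bytes the device has received so far */
};

/* Simulated stand-in for outb(value, port): every write targets the same
 * address and lands at the far end of the device's FIFO. */
static void port_write8(struct pio_device *dev, uint8_t value)
{
    assert(dev->count < FIFO_DEPTH);   /* real hardware would need flow control */
    dev->fifo[dev->count++] = value;
}

/* Programmed I/O: the CPU itself moves every byte, one write at a time,
 * always to the same port. */
static void pio_send_block(struct pio_device *dev, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        port_write8(dev, buf[i]);
}
```

Usage is just `pio_send_block(&dev, buf, len)`; the device sees the bytes in order, and the CPU is busy for the entire transfer.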
Memory-Mapped Input/Output (MMIO)
Like PIO, but the FIFO is mapped directly into the memory address space. Again, this distinction means more on x86, which has the weird I/O vs. memory split that 68k and others do not have. The FIFO has a fixed length, and writes to successive locations are writes "further down" the FIFO. MMIO also lets a set of registers be addressed directly by location, whereas a PIO system would address them indirectly: write the register number to an address register, write/read the data to/from a data register, repeat. MMIO is generally simpler to program for than PIO, but PIO is a lot easier to design hardware for.
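The difference in register addressing can be sketched in C as below. Everything here is hypothetical and simulated (`struct device_regs`, `indexed_write`, `mmio_write`, the 16-register layout): the indexed device needs an index-port write before every data access, while the MMIO device exposes each register at its own address. The `volatile` qualifier is what you would use on a real mapped register window so the compiler does not cache or drop the accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device with 16 32-bit registers (illustrative only). */
#define NUM_REGS 16

struct device_regs {
    uint32_t reg[NUM_REGS];
};

/* PIO-style: registers reached through an index port plus a data port. */
struct indexed_device {
    struct device_regs regs;   /* the device's internal registers */
    uint32_t index;            /* value last written to the index port */
};

static void index_port_write(struct indexed_device *d, uint32_t idx)
{
    assert(idx < NUM_REGS);
    d->index = idx;
}

static void data_port_write(struct indexed_device *d, uint32_t v)
{
    d->regs.reg[d->index] = v;
}

static uint32_t data_port_read(const struct indexed_device *d)
{
    return d->regs.reg[d->index];
}

/* Two port accesses per register: select it, then move the data. */
static void indexed_write(struct indexed_device *d, uint32_t idx, uint32_t v)
{
    index_port_write(d, idx);
    data_port_write(d, v);
}

/* MMIO-style: every register has its own address, so one access suffices. */
static void mmio_write(volatile struct device_regs *regs, uint32_t idx, uint32_t v)
{
    assert(idx < NUM_REGS);
    regs->reg[idx] = v;
}

static uint32_t mmio_read(const volatile struct device_regs *regs, uint32_t idx)
{
    assert(idx < NUM_REGS);
    return regs->reg[idx];
}
```

Reading a register back through the indexed interface means re-selecting it with `index_port_write` first, which is exactly the "write register #, then read data, repeat" dance that MMIO avoids.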
Direct Memory Access (DMA)
This is the most complex of the data-transfer schemes, as both the host and the slave can access each other's memory in any way they choose. As most !CS or !CprE people can quickly see, that only leads to more problems. So DMA programming is the most complex of the three common types of data transfer. With DMA, an external chip/processor (let's be honest, a 3D processor counts as an external processor in its own right) can see anything in our memory space, and we can see anything in its memory space, barring limitations from things like the GART. With DMA, a typical interaction with an external chip/processor is: "Here's an address at which you can find a complex data structure in my memory space, in which you can find the commands and data describing the operation I wish you to perform. Have fun. Oh, and tell me when you're done." DMA normally allows the highest level of performance, but it does require some compromises that may sacrifice some latency for bandwidth.
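A minimal sketch of that DMA conversation, under loudly stated assumptions: the command format (`struct dma_cmd`, `OP_FILL`), the descriptor layout, and the doorbell function `dma_kick` are all invented for illustration, and the "device" is simulated by an ordinary function. On real hardware the device would run concurrently with the CPU, the address handed over would be a bus address (possibly translated by something like the GART) rather than a CPU pointer, and completion would usually be signalled by an interrupt or a status write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set, not modeled on any real chip. */
enum dma_op { OP_END = 0, OP_FILL = 1 };

struct dma_cmd {
    uint32_t  op;      /* enum dma_op */
    uint32_t  value;   /* fill value */
    uint32_t *dst;     /* where the device should write */
    size_t    count;   /* number of 32-bit words */
};

/* Descriptor the host builds in its own memory. */
struct dma_descriptor {
    const struct dma_cmd *cmds;   /* command list terminated by OP_END */
    volatile uint32_t     done;   /* device sets this to 1 when finished */
};

/* Simulated device: walks the host's command list on its own. */
static void device_process(struct dma_descriptor *desc)
{
    for (const struct dma_cmd *c = desc->cmds; c->op != OP_END; c++) {
        if (c->op == OP_FILL)
            for (size_t i = 0; i < c->count; i++)
                c->dst[i] = c->value;
    }
    desc->done = 1;   /* "tell me when you're done" */
}

/* Host side: the only thing handed over is one address. On real hardware
 * this would be a single register ("doorbell") write; here it calls the
 * simulated device directly. */
static void dma_kick(struct dma_descriptor *desc)
{
    desc->done = 0;
    device_process(desc);
}
```

The host's direct involvement is limited to building the command list and handing over one address; the device does the rest.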
All of these communication schemes were originally used for disk I/O before graphics entered the picture, so you may wish to view them in that light; it makes it more obvious how they compare in terms of efficiency. In PIO, you had to write a loop that wrote a chunk of data, one byte or word at a time, to a single address in I/O space with outb or outw.
With MMIO, you could use a simple memcpy call and take advantage of all of the optimizations inherent in that.
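A sketch of the MMIO case, assuming a hypothetical device that exposes its FIFO as a fixed-size window (`struct mmio_fifo_window`; the 256-byte size is arbitrary). Because successive addresses map to successive FIFO slots, the block goes over with one `memcpy` instead of a per-byte loop. This is simulated with ordinary memory; on real hardware the window would be an uncached mapping, and kernels often provide helpers (Linux has `memcpy_toio`) because a plain `memcpy` may use access sizes or orderings the device does not accept.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-length FIFO window in the memory map: offset 0, 1, 2...
 * correspond to successive slots further down the FIFO. */
#define WINDOW_SIZE 256

struct mmio_fifo_window {
    uint8_t window[WINDOW_SIZE];
};

/* MMIO: successive addresses are successive FIFO slots, so an ordinary
 * memcpy moves the whole block. */
static void mmio_send_block(struct mmio_fifo_window *w, const void *buf, size_t len)
{
    assert(len <= WINDOW_SIZE);   /* the window has a fixed length */
    memcpy(w->window, buf, len);
}
```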
With DMA, you wrote a few bytes to an I/O address to tell an external chip to "go to town" on data you had already composed. Meanwhile, you could work on getting the next chunk ready, so the most work gets done simultaneously in this last case.