Even on a surface-level analysis the GTX 980 is a major step up over the GTX 680/770, the comparison you’ll find many sites drawing from the briefing material for the new Maxwell GPUs. Most notable is the increased CUDA core count, up by a third to 2048, but also intriguing are the increased number of ROPs (up from 32) and the larger Level 2 cache (not listed, but increased to 2MB from 512KB). In theory this will aid both overall rendering speed and anti-aliasing performance significantly, a definite plus for image quality enthusiasts.
Comparisons to the GTX 780 are more difficult to make. The number of shaders is down from 2304, as is the memory bus width. However, the core frequency, memory frequency and amount of video memory are all significantly higher. Guessing just how close the two cards will be in terms of performance is difficult, but by naming it the GTX 980 NVIDIA likely intends the card to at least go toe-to-toe with the smallest of the Big Kepler parts.
However, as we’ve mentioned, the most eye-opening figure is the TDP: 165W. We can see where the card makes its savings by taking a look at the GM204 block layout.
GK104 organised the GPU by way of four graphics processing clusters (GPCs), and GM204 does much the same. However, whilst Kepler split each GPC into two streaming multiprocessors known as SMXs, Maxwell partitions far more aggressively: four SMMs of 128 CUDA cores each per GPC, with yet further internal partitions. Interestingly, the GM107 (seen on the GTX 750 Ti) made use of five SMMs per GPC. The GeForce GTX 980 is a fully unlocked GM204 part, whereas the GTX 970 makes use of 13 SMMs for a total of 1664 CUDA cores.
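The core counts quoted above follow directly from the GPC and multiprocessor layout, and a few lines of Python make the arithmetic explicit. This is only an illustrative sketch: the function name `cuda_cores` is our own, and the figure of 192 CUDA cores per Kepler SMX is an assumption taken from NVIDIA's published Kepler specifications rather than from anything stated in this article.

```python
# Core counts derived from the GPC -> multiprocessor -> CUDA core hierarchy.
# ASSUMPTION: 192 cores per Kepler SMX comes from NVIDIA's published Kepler
# specifications; it is not stated in the article text.

def cuda_cores(sm_count: int, cores_per_sm: int) -> int:
    """Total CUDA cores for a GPU with sm_count enabled multiprocessors."""
    return sm_count * cores_per_sm

GK104_SMX = 4 * 2   # four GPCs, two SMXs each
GM204_SMM = 4 * 4   # four GPCs, four SMMs each

gtx_680 = cuda_cores(GK104_SMX, 192)  # fully enabled GK104
gtx_980 = cuda_cores(GM204_SMM, 128)  # fully enabled GM204
gtx_970 = cuda_cores(13, 128)         # GM204 with 13 of 16 SMMs enabled

print(gtx_680, gtx_980, gtx_970)
print(f"GTX 980 vs GTX 680: {gtx_980 / gtx_680 - 1:.0%} more cores")
```

Under those assumptions the GTX 980 comes out a third ahead of the GTX 680 on core count, and the GTX 970 figure is simply 13 SMMs of 128 cores apiece.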
The original GK104 was the GPU within four discrete 600-series parts, including both GTX 660 variants and the GTX 670. With that approach having proved so successful, NVIDIA are highly likely to use similar measures to flesh out the rest of the 900-series mid-to-high-end card range; in essence, expect a GTX 960/960 Ti based on the GM204 in the not too distant future.
Each Maxwell SMM is subdivided four ways into what can be thought of as further sub-SMs, such that each SMM now has four sets of instruction buffers, warp schedulers, dispatch units and registers rather than the single set per SMX found in Kepler. This, in addition to offloading some scheduling to the CPU where appropriate, is why the card can be so power-efficient and, perhaps more importantly, make more intelligent use of its resources.
One further development NVIDIA have made to the memory architecture is the implementation of Third Generation Delta Colour Compression. This new lossless algorithm allows them to make better use of the memory bandwidth available, which is essential for higher resolutions, larger textures and higher IQ settings. The benefits realised largely depend on the scene, but internally NVIDIA estimate that the GTX 980 makes between 17% and 25% more effective use of memory bandwidth than the GTX 680.
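To put that range into more concrete terms, the rough sketch below converts it into an equivalent bandwidth figure. Two assumptions are ours and not the article's: that the GTX 980 uses a 256-bit bus with 7Gbps memory (224GB/s raw), and that the 17% to 25% estimate can be treated as a simple multiplier on raw bandwidth. NVIDIA's own accounting may differ, so treat the output as a ballpark only.

```python
# Rough estimate of "effective" memory bandwidth with colour compression.
# ASSUMPTIONS (not from the article): a 256-bit bus at 7 Gbps per pin, and
# the quoted efficiency gain applied as a simple multiplier on raw bandwidth.

def effective_bandwidth(raw_gb_s: float, gain: float) -> float:
    """Scale raw bandwidth (GB/s) by a fractional efficiency gain."""
    return raw_gb_s * (1.0 + gain)

BUS_WIDTH_BITS = 256
DATA_RATE_GBPS = 7.0
RAW_GB_S = BUS_WIDTH_BITS / 8 * DATA_RATE_GBPS  # bytes per transfer x rate

low = effective_bandwidth(RAW_GB_S, 0.17)
high = effective_bandwidth(RAW_GB_S, 0.25)

print(f"Raw: {RAW_GB_S:.0f} GB/s")
print(f"Effective: {low:.0f} to {high:.0f} GB/s")
```

Because the gain depends on how compressible each scene is, the real figure will move around within (and possibly outside) this range from game to game.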
As well as hardware improvements, NVIDIA have also implemented new technologies at the driver level which can make better use of Maxwell’s resources, perhaps most notably DSR and MFAA...