NVIDIA GTX670 Review

by Richard Weatherstone, 08-05-12
Specification & Kepler Architecture

It was almost two years ago to the day that we saw the introduction of Fermi, NVIDIA's previous architecture. The GF100 featured 512 stream processors, arranged as 16 streaming multiprocessors of 32 cores each, and was built by TSMC on a 40nm process. Sporting OpenGL 4.0 and DirectX 11 support, it had enthusiasts climbing over each other to get the latest and greatest GPU from NVIDIA. While the performance was there, power efficiency was very poor, with the cards consuming lots of power and running very hot.

Enter the 500 series eight months later, and many would say the GTX580 was the card the GTX480 should have been. It was more powerful thanks to its fully unlocked core (all 16 streaming multiprocessors, giving 512 CUDA cores against the GTX480's 480), ran cooler and thus became a firm favourite in the high-end GPU market.

For a little over a year the GTX580 enjoyed sitting atop the GPU pile (save for dual-GPU variants), and it wasn't until the AMD HD7970 emerged in December 2011 that the GTX580 lost its performance crown. NVIDIA, however, had not been resting on their laurels...

Kepler Core
The successor to Fermi is codenamed Kepler. Listening to gamers' feedback on Fermi, NVIDIA set out to create a graphics card that was not only the best money could buy in terms of performance but, crucially, also ran cool and quiet thanks to much better power efficiency. Kepler addresses these issues in two ways: GPU Boost, which we will discuss further in the overclocking section of this review, and a redesigned streaming multiprocessor.

The GTX670 is equipped with the same GK104 core as the GTX680, which NVIDIA claim is their highest-performing and most power-efficient GPU to date. Fabricated on a 28nm process, every component was designed with power efficiency in mind to give the best performance-per-watt possible.

While the 28nm manufacturing process provides the bulk of the power savings, the way the new architecture works is also key to furthering Kepler's power reduction. SMX, an evolution of Fermi's SM, is NVIDIA's new streaming multiprocessor:

GTX680 Kepler Core

To feed the new SMX, each unit has four warp schedulers, each capable of issuing two instructions per warp every clock. The scheduling logic itself has also been redesigned and is far less complex. On Fermi, a multi-port register scoreboard and a dependency checker worked out at run time which decoded instructions were ready to issue. Because the latencies of Kepler's math pipelines are fixed, the compiler can determine this in advance and pass the information directly to the hardware, rather than going round the houses with a scoreboard and dependency check before each instruction is issued. This method is both quicker and, more importantly, more power efficient.
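To make the compiler-versus-scoreboard distinction more concrete, here is a minimal Python sketch of static scheduling under stated assumptions: every math instruction has the same fixed latency (the 9-clock figure is a made-up placeholder), and the `Instr` format, opcodes and register names are hypothetical, not Kepler's real instruction set. The "compiler" tracks when each register's result becomes valid and records how many clocks the hardware must wait before issuing each instruction, so at run time the hardware only has to obey that count.

```python
from dataclasses import dataclass

# Assumed fixed latency, in clocks, for every math instruction.
# Placeholder value for illustration, not a published Kepler figure.
MATH_LATENCY = 9


@dataclass
class Instr:
    """Hypothetical instruction: an opcode, one destination, some sources."""
    op: str
    dst: str
    srcs: tuple[str, ...]


def schedule(program: list[Instr], latency: int = MATH_LATENCY) -> list[tuple[Instr, int]]:
    """Pair each instruction with the stall count a compiler could precompute.

    With fixed latencies, the cycle at which every register becomes valid is
    known in advance, so no run-time scoreboard or dependency check is needed:
    the hardware just waits the encoded number of clocks before issuing.
    """
    ready_at: dict[str, int] = {}  # register -> first clock its value is valid
    prev_issue = -1                # issue clock of the previous instruction
    scheduled = []
    for ins in program:
        # Issue no earlier than the next clock and no earlier than all sources are ready.
        issue = max([prev_issue + 1] + [ready_at.get(r, 0) for r in ins.srcs])
        scheduled.append((ins, issue - prev_issue - 1))
        ready_at[ins.dst] = issue + latency
        prev_issue = issue
    return scheduled


program = [
    Instr("FMUL", "r2", ("r0", "r1")),
    Instr("FADD", "r3", ("r0", "r1")),        # independent of the FMUL
    Instr("FFMA", "r4", ("r2", "r3", "r0")),  # must wait for r2 and r3
]
for ins, stall in schedule(program):
    print(f"{ins.op:5} {ins.dst}  stall={stall}")
```

With the assumed 9-clock latency, the two independent instructions issue back to back and the FFMA carries a precomputed stall of 8 clocks. Variable-latency operations such as memory loads cannot be timed this way at compile time; the sketch only covers the fixed-latency math case described above.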

GTX670 block diagram (7 SMX)

The block diagram above shows the GTX670's Kepler core layout, which has one fewer SMX than its bigger brother, the flagship GTX680.

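Perhaps the biggest change to the processor core is the removal of the separate shader clock. The shader clock was showing its age: it dated back to the Tesla architecture, where running the execution units at a higher clock than the rest of the chip (twice the core clock on Fermi) was implemented as an area optimisation, because fewer copies of each execution unit were needed for a given throughput. The catch is that the faster clocking logic costs power. Kepler instead executes instructions in a much more streamlined manner across a far larger number of cores running at the lower graphics clock, trading die area for power efficiency.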
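The trade-off can be illustrated with a quick peak-throughput estimate. This Python sketch assumes the usual rule of thumb that each CUDA core retires one fused multiply-add (2 FLOPs) per clock, and uses the GTX580's reference 1544 MHz shader clock (twice its 772 MHz core clock) against the GTX670's 915 MHz base clock. It gives theoretical peak single-precision figures only, not a prediction of game performance.

```python
def peak_gflops(cores: int, clock_mhz: float) -> float:
    """Peak FP32 GFLOPS, assuming every core retires one FMA (2 FLOPs) per clock."""
    return cores * 2 * clock_mhz / 1000


# Fermi (GTX580): 512 cores on the 1544 MHz shader ("hot") clock.
gtx580 = peak_gflops(512, 1544)
# Kepler (GTX670): 1344 cores on the 915 MHz base clock, no separate shader clock.
gtx670 = peak_gflops(1344, 915)

print(f"GTX580: {gtx580:.1f} GFLOPS")
print(f"GTX670: {gtx670:.1f} GFLOPS")
print(f"Ratio:  {gtx670 / gtx580:.2f}x")
```

The GTX670 needs roughly 2.6 times as many cores to reach about 1.56 times the GTX580's theoretical peak, because each core runs at a much lower clock; that extra silicon is the area NVIDIA spent in exchange for lower power.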

Another improvement over Fermi that arrives with SMX is the redesigned PolyMorph Engine 2.0, which takes care of the GTX680's tessellation workload. It is designed to ensure that the tessellation we see on screen has as little impact as possible on rendering performance. What you may find bizarre, though, is that NVIDIA have halved the PolyMorph engine count from Fermi's 16 to 8, yet claim it is almost twice as quick. This is thanks to the improved core and memory clock speeds.

Overall, then, it seems NVIDIA have outdone themselves: higher clock speeds on both core and memory, and three times the number of stream processors (1536 on the GTX680 against 512 on the GTX580), yet streamlined by eliminating the shader clock to cut power consumption without compromising overall performance.

Here are the official specifications of the NVIDIA GTX670:

Processing Units
Graphics Processing Clusters: –
SMXs: 7
CUDA Cores: 1344
Texture Units: 112
ROP Units: 32

Clock Speeds
Base Clock: 915 MHz
Boost Clock: 980 MHz
Memory Clock (data rate): 6008 MHz
L2 Cache Size: 512KB

Memory
Total Video Memory: 2048MB
Memory Interface: 256-bit GDDR5
Total Memory Bandwidth: 192.2 GB/s
Texture Filtering Rate (Bilinear): 102.5 GigaTexels/sec

Fabrication Process: 28 nm
Transistor Count: 3.54 Billion
Connectors: 2 x Dual-Link DVI, 1 x HDMI, 1 x DisplayPort
Form Factor: Dual Slot

Power & Thermal
Power Connectors: 2 x 6-pin
Recommended Power Supply: 500 Watts
Thermal Design Power (TDP): 170 Watts
Thermal Threshold: 98°C
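Several of the figures above are derived from a handful of base numbers. The Python sketch below recomputes them, assuming GK104's SMX layout of 192 CUDA cores and 16 texture units per SMX (values not listed in the table) and that the quoted texture rate is based on the 915 MHz base clock.

```python
SMX_COUNT = 7
CORES_PER_SMX = 192          # assumed GK104 SMX layout
TEX_UNITS_PER_SMX = 16       # assumed GK104 SMX layout
BASE_CLOCK_MHZ = 915
MEMORY_DATA_RATE_MHZ = 6008  # effective GDDR5 data rate
BUS_WIDTH_BITS = 256

cuda_cores = SMX_COUNT * CORES_PER_SMX
texture_units = SMX_COUNT * TEX_UNITS_PER_SMX

# Bandwidth = transfers per second x bytes per transfer (256 bits = 32 bytes);
# MHz x bytes gives MB/s, so divide by 1000 for GB/s.
bandwidth_gbs = MEMORY_DATA_RATE_MHZ * (BUS_WIDTH_BITS // 8) / 1000

# Bilinear texture rate = texture units x core clock (one texel per unit per clock).
texture_rate_gts = texture_units * BASE_CLOCK_MHZ / 1000

print(f"CUDA cores:     {cuda_cores}")
print(f"Texture units:  {texture_units}")
print(f"Bandwidth:      {bandwidth_gbs:.3f} GB/s")
print(f"Texture rate:   {texture_rate_gts:.2f} GT/s")
```

The bandwidth works out to 192.256 GB/s (quoted above as 192.2 GB/s) and the texture rate to 102.48 GigaTexels/sec (quoted as 102.5). Both scale with clock speed, so a card running above its base or reference memory clock will exceed them.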
