Technical Specifications
Shader Modules :- 28
CUDA Cores :- 3584
Base Clock :- 1480 MHz
GPU Boost Clock :- 1582 MHz
Texture Units :- 224
Texel fill-rate :- 331.5 GT/s
Memory Clock (Data Rate) :- 5505 MHz
Memory Bandwidth :- 484 GB/s
ROPs :- 88
L2 Cache Size :- 2816 KB
TDP :- 250W
Transistors :- 12 billion
Die Size :- 471mm^2
Manufacturing Process :- 16nm
SLI Support :- Yes (High Bandwidth Bridge)
Video Outputs :- HDMI 2.0b, 3 x DisplayPort 1.4
HDCP :- 2.2
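Several of the listed figures follow from one another. As a rough cross-check, the sketch below divides the chip-wide unit counts across the 28 Shader Modules and recomputes the texel fill-rate from the texture unit count and base clock. It assumes one texel per texture unit per clock and an even split of units across modules; the function and constant names are illustrative only.

```python
# Figures copied from the specification list above.
SHADER_MODULES = 28
CUDA_CORES = 3584
TEXTURE_UNITS = 224
BASE_CLOCK_MHZ = 1480


def per_module(total: int, modules: int) -> int:
    """Split a chip-wide unit count evenly across the shader modules."""
    if total % modules:
        raise ValueError("units do not divide evenly across modules")
    return total // modules


def texel_fill_rate_gt_s(texture_units: int, clock_mhz: float) -> float:
    """Assumes one texel per texture unit per clock; result in GT/s."""
    return texture_units * clock_mhz / 1000.0


cores_per_module = per_module(CUDA_CORES, SHADER_MODULES)
tmus_per_module = per_module(TEXTURE_UNITS, SHADER_MODULES)
texel_rate = texel_fill_rate_gt_s(TEXTURE_UNITS, BASE_CLOCK_MHZ)

print(cores_per_module, tmus_per_module, round(texel_rate, 1))
```

Under those assumptions the listed 331.5 GT/s corresponds to the base clock rather than the boost clock.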
The GeForce GTX 1080 Ti is NVIDIA’s brand new flagship GeForce GTX graphics card, and it’s not hard to see why. Built using the Pascal architecture and based on the same GP102 GPU which is the centrepiece of the NVIDIA TITAN X, the GTX 1080 Ti brings new performance levels to the GeForce line at a price far below that of the professional-class card. It’s tempting to call it a ‘slimmed down TITAN X’, but there’s a little more to it than simply slapping a new shroud on it and calling it a day.
The block diagram of an NVIDIA Pascal-based GPU should now be pretty familiar to everyone. Shader Modules are grouped into six GPCs, all controlled by the GigaThread Engine, which distributes highly parallelised workloads. Above is the block diagram for the GTX 1080 Ti’s GP102, indicating both the lower number of 32-bit Memory Controllers (11 vs 12) and the smaller L2 Cache (2816 KB vs 3072 KB) compared with the TITAN X; not shown are the four Double Precision units per SM.
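The memory controller count determines both the bus width and, if the L2 cache is split evenly between controllers, the cache size. The sketch below works through that arithmetic; the even split is an assumption made here for illustration, and the names are hypothetical.

```python
CONTROLLER_WIDTH_BITS = 32  # each memory controller is 32 bits wide (from the text)


def bus_width_bits(controllers: int) -> int:
    """Total memory bus width for a given number of 32-bit controllers."""
    return controllers * CONTROLLER_WIDTH_BITS


def l2_slice_kb(total_l2_kb: int, controllers: int) -> float:
    """L2 per controller, ASSUMING the cache is divided evenly between them."""
    return total_l2_kb / controllers


gtx_bus = bus_width_bits(11)       # GTX 1080 Ti: 11 controllers
gtx_slice = l2_slice_kb(2816, 11)  # GTX 1080 Ti: 2816 KB of L2

# Projecting the same slice size onto a 12-controller configuration.
full_bus = bus_width_bits(12)
full_l2 = 12 * gtx_slice

print(gtx_bus, gtx_slice, full_bus, full_l2)
```

On that assumption each controller is paired with 256 KB of L2, so disabling one controller removes 32 bits of bus width and 256 KB of cache together.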
While the NVIDIA TITAN X might specialise in photorealistic rendering and deep learning applications, NVIDIA have balanced the spec of the GTX 1080 Ti to lean towards gaming performance. In the process the bill of materials has gone down, justifying a lower price point and improving the price/performance landscape for desktop GPUs. The consequence is that it still features the same 3584 CUDA cores, but has fewer ROPs and a smaller frame buffer, which are balanced out by higher base and boost frequencies.
The GTX 1080 Ti is also the debut of higher performance GDDR5X VRAM. Now hitting 11Gbps, the VRAM will also be rolled out in the coming weeks on a range of ‘OC’ SKUs for the GTX 1080 and GTX 1060.
Power & Cooling
The size and complexity of the GTX 1080 Ti compared to the 1080 necessitated a substantial upgrade to its power delivery system. Now sporting a 7-phase 2x dual-FET power supply, it’s capable of supplying up to 250A of conditioned power to the GPU, aiding system stability and reducing thermal waste. Compared with the TITAN X, which sported a 6-phase power design, the GTX 1080 Ti could see higher stable overclocks too.
Nonetheless the 1080 Ti is a power-hungry card, and that inevitably generates heat. Rated at 250W TDP against the 180W of the GTX 1080, it required a new cooler that can dissipate considerably more heat. The cooler’s metal base-plate and back-plate help to cool low-profile surface-mounted components, whilst a copper vapour chamber, heatsink and radial fan do the majority of the work keeping the GPU itself cool. The vapour chamber is mounted on the GPU base-plate, which should serve to reduce mechanical stress on the PCB.
To increase cooling efficiency NVIDIA widened the exhaust vent, removing the DVI-D connector in the process and replacing it with an additional DisplayPort output (for a total of 3 x DisplayPort and 1 x HDMI on the bottom tier). Additionally, NVIDIA have tweaked the back-plate design: half the plate can now be removed to improve airflow, a handy feature in cramped dual-GPU SLI configurations.
NVIDIA have indicated that internal tests for the 1080 Ti pushed the card up to 2GHz, but as usual take those sorts of numbers with a pinch of salt. The only guarantees are the Base and Boost clocks, although those are impressive in themselves.
Comparative Specifications
The GTX 1080 Ti is expected to be the final major member of the GTX 10-series, rounding out a release which began with the much-heralded GTX 1080 in June of last year. However, where other new entrants in the range have catered to mainstream gaming workloads (i.e. 1080p60 gaming), the 1080 Ti is all about high resolutions and frame rates. Don’t be surprised if the design is promoted as the first 5K-capable gaming GPU, and one to seriously consider if you need consistent >90fps gameplay at higher than 1440p resolutions and maximised image quality settings.
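To put those targets in perspective, the sketch below compares the raw number of pixels per second each one implies. The pixel dimensions are the usual 16:9 figures for each name, and pairing 5K with 60fps is an assumption for illustration, since no frame rate is quoted for it above. Pixel rate is only a crude proxy for rendering load.

```python
# Standard 16:9 pixel dimensions for the resolutions mentioned in the text.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "5K": (5120, 2880),
}


def megapixels_per_second(name: str, fps: int) -> float:
    """Pixels drawn per second at a given resolution and frame rate, in MP/s."""
    width, height = RESOLUTIONS[name]
    return width * height * fps / 1e6


mainstream = megapixels_per_second("1080p", 60)
high_refresh = megapixels_per_second("1440p", 90)
five_k = megapixels_per_second("5K", 60)  # 60fps is an assumed figure

print(round(mainstream, 1), round(high_refresh, 1), round(five_k, 1))
print(round(five_k / mainstream, 2))
```

By this measure 1440p at 90fps asks for roughly 2.7 times the pixel throughput of 1080p60, and 5K at 60fps roughly 7 times.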
Tiled Caching
Over the course of architecture launches NVIDIA have allowed enthusiasts to peek behind the curtain a little at some of the memory optimisations present within the updated hardware. One example was 4:1 and 8:1 Delta Color Compression modes, present on all Pascal GPUs but only revealed alongside the GTX 1060 last summer and believed to improve effective memory bandwidth by as much as 20%. The same is true of the GTX 1080 Ti’s launch.
Boosting effective memory bandwidth on NVIDIA GTX GPUs is a technique they’re calling Tiled Caching, present on both Maxwell and Pascal architectures but only revealed at GDC 2017. The revised rendering methodology aims to reduce the high overdraw characteristic of Immediate Rendering modes, taking cues from the Tiled Rendering system often used by mobile GPUs.
“With a tiled renderer, the screen is broken into many separate tiles, and rendering is performed in two passes. A first pass processes the geometry and determines which tiles are covered by each triangle, and writes this information out to DRAM. Then in a second pass, the geometry list is reprocessed for each tile, one tile at a time – each tile is rendered to completion before moving to the next tile. Tiles are sized so all rendering happens on-chip, with only the final color written to DRAM.
[..]
We stick with the proven immediate rendering model for graphics, with no binning prepass. However, within the immediate rendering pipeline, we add a binner that writes to an on-chip geometry queue in L2. Once the binned data fills a preset buffer size within L2, Pascal renders this geometry, a tile at a time, until the queue is processed. The modified raster behaviour has the effect of modifying the working set of output pixels to naturally stay within the capacity of the existing on-chip L2 cache, whereas single pass rasterization would have overflowed the cache capacity. We call this technique “Tiled Caching” – an approach that uses tiled rasterizer behaviour to improve the effectiveness of the L2 Cache. Compared to a traditional tiler architecture, this architecture realizes similar pixel bandwidth savings benefits, but without any geometry bandwidth tax or binning pass latency.”
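The binning idea in the quote can be illustrated with a toy model. The sketch below is not NVIDIA’s implementation: it bins triangles conservatively by bounding box, uses an arbitrary 16-pixel tile, assumes integer pixel coordinates, and lets a hypothetical `batch_size` parameter stand in for the preset buffer size within L2. Each batch is binned and then visited one tile at a time, with no separate pre-pass over the whole scene.

```python
from collections import defaultdict

TILE = 16  # tile edge in pixels (illustrative, not a hardware figure)


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the triangles whose bounding box touches it."""
    bins = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the screen and convert to tile coordinates.
        tx0 = max(0, int(min(xs)) // tile)
        tx1 = min(max_tx, int(max(xs)) // tile)
        ty0 = max(0, int(min(ys)) // tile)
        ty1 = min(max_ty, int(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(index)
    return bins


def render_in_batches(triangles, width, height, batch_size):
    """Bin one fixed-size batch at a time, then visit its tiles in order."""
    visits = []
    for start in range(0, len(triangles), batch_size):
        batch = triangles[start:start + batch_size]
        bins = bin_triangles(batch, width, height)
        for tile_xy in sorted(bins):
            ids = [start + i for i in bins[tile_xy]]
            # A real GPU would rasterise here; this only records the visit.
            visits.append((start // batch_size, tile_xy, ids))
    return visits


triangles = [
    ((1, 1), (10, 1), (1, 10)),
    ((20, 2), (30, 2), (20, 12)),
    ((2, 20), (12, 20), (2, 30)),
    ((4, 4), (28, 4), (4, 28)),
]
for visit in render_in_batches(triangles, 32, 32, batch_size=2):
    print(visit)
```

Because each batch is small, the set of pixels touched while processing it is bounded, which is the property the quote credits with keeping the working set inside the L2 cache.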
By investing in the optimisation of memory, both by partnering with vendors to produce higher performance DRAM and by improving algorithms and GPU architectures, NVIDIA improves overall hardware performance without incurring the substantial cost of transitioning to new memory paradigms on the desktop platform (such as HBM2).
Performance Expectations
It’s no surprise that the GTX 1080 Ti is being lauded by NVIDIA as the best Ti to date: 40% more CUDA cores alone will account for an enormous performance improvement over the previous flagship. NVIDIA themselves state that overall you can expect to see an average net benefit of 35% compared to the GTX 1080, the largest margin a flagship Ti has held over its non-Ti counterpart for at least three generations.
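A back-of-the-envelope comparison shows where a figure of around 35% might sit. The GTX 1080 numbers used below (2560 CUDA cores, 1733 MHz boost, 320 GB/s) are reference-card figures that do not appear in this article, so treat them as assumptions; the peak-throughput formula assumes one fused multiply-add (two operations) per core per clock at the rated boost frequency.

```python
def fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Theoretical peak FP32 rate, assuming 2 FLOPs per core per clock."""
    return 2 * cuda_cores * boost_mhz / 1e6


# GTX 1080 Ti figures come from the specification list in this article.
ti_tflops = fp32_tflops(3584, 1582)
# GTX 1080 figures are ASSUMED reference-card values, not from the article.
gtx1080_tflops = fp32_tflops(2560, 1733)

core_uplift = 3584 / 2560 - 1
compute_uplift = ti_tflops / gtx1080_tflops - 1
bandwidth_uplift = 484 / 320 - 1

print(f"cores +{core_uplift:.0%}, compute +{compute_uplift:.0%}, "
      f"bandwidth +{bandwidth_uplift:.0%}")
```

On these assumptions the quoted 35% falls between the gain in theoretical compute and the gain in memory bandwidth, which is plausible for workloads limited by a mix of the two.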
That said, comparisons will perhaps be more readily drawn with the NVIDIA TITAN X, given their broadly similar specifications. Held up more as a workstation component, the TITAN X features both a larger frame buffer (12GB vs 11GB) and wider memory bus (384-bit vs 352-bit), and yet due to the faster memory on the GTX 1080 Ti total memory bandwidth available differs by less than 1%.
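The bandwidth claim can be checked with the usual formula: bus width multiplied by per-pin data rate, divided by eight to convert bits to bytes. The 11Gbps rate for the GTX 1080 Ti is quoted earlier in the article; the 10Gbps rate used for the TITAN X is an assumption not stated here.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s from bus width and per-pin data rate."""
    return bus_width_bits * data_rate_gbps / 8


gtx_1080_ti = memory_bandwidth_gb_s(352, 11.0)
titan_x = memory_bandwidth_gb_s(384, 10.0)  # 10Gbps is an ASSUMED figure
gap_percent = (gtx_1080_ti - titan_x) / titan_x * 100

print(gtx_1080_ti, titan_x, round(gap_percent, 2))
```

The GTX 1080 Ti result matches the 484 GB/s in the specification list, and on the assumed TITAN X data rate the two cards end up within 1% of each other despite the narrower bus.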
More substantial is the reduction in the number of ROPs in the GTX 1080 Ti compared to the TITAN X. This will have an impact on overall frame render times and will further influence performance when rendering with hardware anti-aliasing. It’s here, combined with the larger frame buffer, that the TITAN X makes its case in professional and enterprise environments.
Where the GTX 1080 Ti pulls ahead is in raw GPU clock speed. Both Base and Boost frequencies receive a small but significant improvement which should have a particular impact in gaming, and of course the GPU is overclockable beyond this.





