By David Mitchelson, 30-10-18
The Turing TU102 GPU

Here’s the NVIDIA TU102 in all its glory. Yes, it’s huge: measuring 754mm² and composed of 18.6 billion transistors, it is by some margin the largest consumer GPU released to date. It’s also the most complex, incorporating a brand new architecture that re-imagines the Streaming Multiprocessor (SM) as a hybrid rendering engine capable of rasterization, real-time ray tracing and deep learning inferencing simultaneously.

A fully equipped TU102 is an absolute beast. Up to 4608 CUDA cores are supported, 20% more than the 3840 in the Pascal-based TITAN Xp. But that’s only the starting point. With Turing, two new components are now part of the Streaming Multiprocessor: Tensor Cores and RT Cores. Each SM incorporates eight of the former and one of the latter, which across the full die’s 72 SMs gives totals of 576 and 72 respectively.
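The unit counts above are simple per-SM multiplication, sketched below. The Tensor and RT Core figures per SM come from the text; the 64 CUDA cores per SM is inferred from 4608 / 72 rather than stated in the article, and the helper name `unit_totals` is purely illustrative.

```python
# Per-SM unit counts. Tensor and RT figures are from the article text;
# 64 CUDA cores per SM is an inference from 4608 / 72, not a quoted spec.
CUDA_PER_SM = 64
TENSOR_PER_SM = 8
RT_PER_SM = 1

FULL_TU102_SMS = 72          # one RT Core per SM, 72 RT Cores in total
TITAN_XP_CUDA_CORES = 3840   # Pascal-based comparison point from the text


def unit_totals(sm_count):
    """Return total CUDA, Tensor and RT Core counts for a given SM count."""
    return {
        "cuda": sm_count * CUDA_PER_SM,
        "tensor": sm_count * TENSOR_PER_SM,
        "rt": sm_count * RT_PER_SM,
    }


full = unit_totals(FULL_TU102_SMS)
uplift = full["cuda"] / TITAN_XP_CUDA_CORES - 1

print(full)
print(f"CUDA core uplift over TITAN Xp: {uplift:.0%}")
```

Passing a smaller SM count to `unit_totals` gives the totals for a cut-down part built on the same die.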

Pascal vs Turing SM Block Diagrams (simplified)

Turing Tensor Cores are extremely capable at mixed-precision workloads and, by natively supporting INT8 and INT4 operations, are well suited to the task of inferencing. The RT Core, by comparison, is dedicated to accelerating ray tracing; thanks to this dedicated hardware, NVIDIA claims roughly a tenfold speed-up over the GTX 1080 Ti.

The RTX 2080 Ti slightly reduces the number of active SMs to 68, commensurately reducing the number of active Tensor and RT Cores to 544 and 68. With the full die held in reserve, don’t be surprised if the TITAN branding gets another turn at bat in this generation.


A New Workload Model – RTX-OPS

The RTX-OPS of an RTX 2080 Ti

The Turing architecture changes the way NVIDIA graphics hardware renders a frame, running workloads with mixed data structures and precisions side by side. To capture this, NVIDIA have introduced RTX-OPS (RTX Operations Per Second), a metric intended to describe how fast the GPU is with these parallel workloads. The RTX 2080 Ti Founders Edition is rated at 78 Tera RTX-OPS; it remains to be seen whether this manner of delineating performance will become as widely used as TFLOPS.

As you can see, the RTX-OPS model is strongly parallelized. Ray tracing and INT32 shading operate alongside FP32 shading (i.e. rasterization), while neural network processing operates once the frame is rendered. It’s a neat arrangement, and helps explain how NVIDIA were able to push ray tracing to the forefront even after algorithmic optimisations were developed.
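To make the weighting idea concrete, here is a minimal sketch of how a headline figure like 78 can be assembled from per-unit throughput. None of the numbers below come from this article: the utilisation weights and peak throughput values are what NVIDIA’s Turing whitepaper gives for the RTX 2080 Ti Founders Edition as best we recall, so treat them, along with the helper name `rtx_ops`, as assumptions.

```python
# Assumed share of frame time each unit is busy (recalled from NVIDIA's
# Turing whitepaper, not stated in this article).
WEIGHTS = {"fp32": 0.80, "int32": 0.28, "rt": 0.40, "tensor": 0.20}

# Assumed peak throughput for the RTX 2080 Ti Founders Edition,
# in tera-operations per second.
RTX_2080_TI_FE = {"fp32": 14.0, "int32": 14.0, "rt": 100.0, "tensor": 114.0}


def rtx_ops(throughput, weights=WEIGHTS):
    """Weighted sum of per-unit throughput (hypothetical helper)."""
    return sum(weights[unit] * throughput[unit] for unit in weights)


score = rtx_ops(RTX_2080_TI_FE)
print(f"{score:.2f} Tera RTX-OPS, quoted as {round(score)}")
```

Under these assumptions the weighted sum lands just under 78, with the ray tracing and Tensor terms contributing most of the total. That is worth bearing in mind when comparing RTX-OPS against a plain FP32 TFLOPS figure.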
