Ampere & Mellanox Take Centre Stage In Delayed NVIDIA GTC Keynote

By Tim Harmer, 14.05.2020 22:59:27

NVIDIA CEO Jensen Huang finally gave his GPU Technology Conference keynote - dubbed the first 'Kitchen Keynote' - today through a series of packaged videos available on demand. The content focussed on new innovations in datacentre architecture, particularly new scalar technologies and application frameworks that accelerate key workloads including deep learning and inferencing, but also didn't skip the myriad other applications for their increasingly general-purpose solutions. And for the first time NVIDIA showcased technology from Mellanox, NVIDIA's new acquisition that specialises in exascale networking hardware.

The highlight for many will be the introduction of the long-awaited Ampere architecture in the form of the NVIDIA A100, the most complex processor assembled to date. It effectively supplants the Volta-based V100 GPU at the top of NVIDIA's processor lineup, and is the first of their GPUs to leverage TSMC's 7nm manufacturing process (in a manner that has been 'optimised for NVIDIA'). Its 54bn transistor count is roughly 2.5x that of the V100, and some of those transistors make up 3rd generation Tensor cores that sit alongside HBM2 memory, which pushes the available memory bandwidth to 1.2 TB/sec. It is, by any measure, a beast.

NVIDIA's A100 is integrated into the 3rd generation DGX server and workstation product line for High Performance Computing, shipped as a general purpose solution for artificial intelligence learning and inferencing. Eight A100 processors are augmented by Mellanox-powered HDR InfiniBand interconnects, two 64-core AMD Rome 7742 CPUs and 15TB storage at an initial price of $199,000. It offers 5 Petaflops of performance in a single node.

One innovation that helps the A100 stand apart from prior generations is its 3rd Gen Tensor Core design, which offers a peak throughput of up to 20x that of comparable hardware on the V100. This is partially due to the use of the Tensor Float32 (TF32) data format, which keeps FP32's 8-bit exponent but trims its 23-bit mantissa to 10 bits, while not necessitating a change to the underlying code being run.
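To give a feel for what that reduced precision means in practice, here is a minimal Python sketch that mimics the TF32 layout in software by clearing the lowest 13 mantissa bits of an FP32 value. The function names are our own, and plain truncation is an assumption made for simplicity; this is an illustration of the bit layout rather than a description of how the A100's hardware performs the conversion.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def to_tf32(x: float) -> float:
    """Hypothetical helper: keep the sign bit, the 8-bit exponent and the
    top 10 mantissa bits of an FP32 value, zeroing the remaining 13.

    Truncation is an assumption here; real hardware may round differently.
    """
    bits = float_to_bits(x)
    bits &= 0xFFFFE000  # 1 sign + 8 exponent + 10 mantissa bits survive
    return bits_to_float(bits)


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 1.0 + 2.0 ** -11, 1e38):
        print(value, "->", to_tf32(value))
```

Because the exponent field is untouched, very large and very small values keep their range and only the fine detail of each number is lost, which is consistent with existing FP32 code being able to run without modification.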

There was some discussion of the innovations unveiled during NVIDIA's traditional GTC window, particularly DLSS 2.0 and ray-tracing in Minecraft for Windows 10. Jensen admitted that initial responses to DLSS were somewhat mixed, but went on to promote its 2nd generation successor. Deep Learning Super Sampling's revised implementation utilises a vector matrix in its training and inferencing algorithms that allows the processed image to more closely resemble the very high resolution 'ground truth', allegedly surpassing even a frame rendered at the monitor's native resolution.

It should be noted, however, that the demonstration - itself a comparison of single images at DLSS 2.0 rendered resolution, 720p, 1080p 'native', 1080p DLSS 2.0 and 16K 'ground truth' - relied on Epic's 'Infiltrator' canned benchmark. As has been observed in the past, the quality of DLSS in 'real world' gameplay can vary significantly; the second generation still has a lot to prove.

Other core implementations of NVIDIA's technology were showcased during the hour-plus keynote, including natural language processing for conversational AI, automotive applications, data-driven recommendation engines and, of course, scientific investigation. They also introduced Omniverse, a platform for 3D visualisation that ties many of these use-cases together. It is based on open standards and has been developed in partnership with industry names such as Pixar, Autodesk, Adobe and Unity.

The full keynote in nine parts, as well as a few extra treats, is available on demand from NVIDIA. The strides made by Ampere are exciting, and we eagerly await their implementation in consumer graphics in the future.
