AMD Ryzen 9 3900X Review

by Matthew Hodgson, 07-07-19

Zen 2 Explained

AMD Ryzen 3000-series CPUs

Developed under the codename 'Matisse', AMD's Ryzen 3000-series succeeds the 'Pinnacle Ridge' 2000-series CPUs for desktop and is the first consumer design to incorporate the Zen 2 microarchitecture. Whereas Zen+ was a refresh of the prior family on a slightly more efficient manufacturing process, Zen 2 is a substantial update to the CPU core and a radically new approach to chip design as a whole.

Two aspects of Zen 2 are integral to the new CPU design: TSMC's 7nm lithographic process, and a 'chiplet' ethos which goes beyond even Threadripper’s parameters. In tandem they offer better manufacturing yields, higher operating frequencies, better price competitiveness and more cores, all while maintaining CPU socket continuity.

The Matisse CPU - Flexible Design for Desktop

A Matisse CPU comprises two or optionally three 'chiplet' dies that between them cover all the bases of a conventional CPU SoC. There are two main classes of Zen 2 chiplet: the Core Chiplet die (CCD) and the IO Chiplet die (cIOD). Each Matisse CPU has one or two CCDs and always exactly one cIOD.

The CCD is manufactured using TSMC's 7nm process and is identical to the core chiplet found in Rome, the Zen 2 server CPU (2nd-generation EPYC). This allows great economies of scale and efficient binning, making the best use of every die manufactured and maintaining affordability despite 7nm being on the cutting edge of lithographic technology.

Core Chiplet functionality, i.e. the CPU cores and cache structures, benefits significantly from the transition to 7nm, since operating frequency and voltage tend to scale well as the manufacturing process shrinks. The same is not true of other parts of the CPU, namely the memory, PCI-Express and other I/O controllers, which gain relatively little from a die shrink.

In contrast, the cIOD is manufactured by GlobalFoundries on its 12nm process and incorporates the dual-channel memory controller, PCIe signalling and SoC functionality. Because desktop and server platforms have different requirements (particularly memory channels, PCIe lanes and the sheer number of connected CCDs), it differs significantly from the equivalent I/O die on Rome.

Communication between the CCD and cIOD runs over a GMI link, a development of AMD's Infinity Fabric technology. Its clock can now be decoupled from the memory clock, and it is also more stable at higher frequencies.

Updates to the Zen CCX in Zen 2

Almost every aspect of the Zen core architecture has been improved, augmented or buffed to a mirror shine with Zen 2. Each change is driven by the need to increase IPC and reduce latency, two key relative weaknesses of the Zen architecture prior to Matisse.

Key Changes:

Fetch Cycle updated to reduce mispredict rate in branch prediction:
- New Tagged Geometric (TAGE) branch predictor
- Larger Branch Target Buffers
- Larger 1K indirect target array
Decode Process incorporates Op cache improvements to increase effective throughput
- Doubled capacity to 4K instructions
Floating Point Unit (FPU) and Load/Store width doubled to 256b, multiply latency improved
- Native single-pass execution of 256b AVX2 instructions
Improved many aspects of Integer Execution, including SMT fairness
Broad Cache updates:
- Double L1 Cache Load/Store bandwidth to 32B/clk
- L3 Cache size doubled

Zen 2 carries forward AMD's Simultaneous Multithreading approach to processing two threads per physical CPU core. It proved to be a very efficient approach in Zen/Zen+, and benefits from other improved aspects of the Zen 2 architecture.

The first major departure from Zen is a doubling of L3 Cache to 4MB per core/16MB per Core Complex (CCX). This larger cache helps offset the architecture's relatively high main-memory latency and more generally serves to significantly improve Instructions Per Clock (IPC). L3 Cache is shared across all active cores in the CCX, while L2 and L1 remain private to their associated core.

We've previously discussed AMD's enlarged cache, marketed as 'GameCache' (the combined L2 and L3 capacity).

Each CPU core now also has double the L1 Cache load/store bandwidth, serving to eke out further IPC improvements alongside deeper queues and larger Op caches. Fetch and prefetch techniques have also been tweaked, in part to reduce overall branch misprediction rates.

Despite the increase in L3 Cache size, the Zen 2 CCX is physically 47% smaller than the Zen CCX at just 31mm² (72mm² per CCD), reflecting the huge density improvement offered by 7nm production. Two of these quad-core CCXs sit on each Core Chiplet Die, although individual cores may be disabled depending on the SKU.

Thanks to a beefier Floating Point Unit, Zen 2 also now executes 256-bit AVX2 instructions natively, rather than splitting each one into two 128-bit operations as Zen and Zen+ did. These workloads are important for certain rendering, video encode/decode and cryptographic applications, and were yet another weakness that the latest generation attempts to smooth out.

In a general sense, the new Zen 2 CPU core and Core Complex are beefier and more complex. While ostensibly new, the design is also a maturation of Zen that comprehensively applies lessons learned to the new architectural approach. It's difficult to imagine, for instance, a 16-core monolithic chip with 72MB of cache being economical to manufacture, even on 7nm.
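The core and cache totals implied by this layout can be reproduced with simple arithmetic. The Python sketch below is illustrative only: it assumes each CCX keeps its full 16MB of L3 even when cores are disabled (as on the 12-core Ryzen 9 3900X, which pairs two CCDs with three active cores per CCX), 512KB of L2 per core and two SMT threads per core. The `MatisseConfig` class is a hypothetical helper, not an AMD tool.

```python
from dataclasses import dataclass

# Assumed per-unit figures (see lead-in); not read from any AMD API.
L3_PER_CCX_MB = 16      # L3 retained per CCX regardless of disabled cores
L2_PER_CORE_MB = 0.5    # 512KB private L2 per active core
CCX_PER_CCD = 2         # two quad-core CCXs per Core Chiplet Die
MAX_CORES_PER_CCX = 4


@dataclass
class MatisseConfig:
    """Hypothetical description of a Matisse part by CCD count and active cores per CCX."""
    ccds: int
    cores_per_ccx: int

    def __post_init__(self) -> None:
        if not 1 <= self.ccds <= 2:
            raise ValueError("Matisse carries one or two CCDs")
        if not 1 <= self.cores_per_ccx <= MAX_CORES_PER_CCX:
            raise ValueError("a Zen 2 CCX has at most four cores")

    @property
    def ccx_count(self) -> int:
        return self.ccds * CCX_PER_CCD

    @property
    def cores(self) -> int:
        return self.ccx_count * self.cores_per_ccx

    @property
    def threads(self) -> int:
        # Two SMT threads per physical core.
        return self.cores * 2

    @property
    def l3_mb(self) -> int:
        return self.ccx_count * L3_PER_CCX_MB

    @property
    def l2_plus_l3_mb(self) -> float:
        return self.l3_mb + self.cores * L2_PER_CORE_MB


r9_3900x = MatisseConfig(ccds=2, cores_per_ccx=3)
sixteen_core = MatisseConfig(ccds=2, cores_per_ccx=4)
print(r9_3900x.cores, r9_3900x.threads, r9_3900x.l3_mb)  # 12 24 64
print(sixteen_core.cores, sixteen_core.l2_plus_l3_mb)    # 16 72.0
```

With four active cores per CCX across two CCDs, L2 plus L3 comes to the 72MB cited above for a 16-core part.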

Matisse cIOD - Memory and More

Zen 2’s cIOD is a critical component of the Matisse CPU design that is easily overlooked given the natural focus on the CCD. Larger than the CCD and built on 12nm, it takes care of the vast array of memory and I/O requirements of the Matisse SoC.

Default dual-channel DDR4 memory support now extends to DDR4-3200 (from DDR4-2933) in capacities up to 128GB (from 64GB). Reported memory overclocks have reached far beyond this point, necessitating the introduction of a Memory : Infinity Fabric clock ratio rather than keeping the two clock domains tightly coupled. Infinity Fabric's effective upper limit for stable operation is almost 1870MHz, so above DDR4-3733 (a memory clock of roughly 1866MHz) an automatic 1:2 ratio kicks in, running the IF/GMI link at half the memory clock. DDR4-3733 has been described as a ‘sweet spot’ for Ryzen 3000-series memory overclocking, i.e. the point at which memory and GMI link combine to give the lowest latencies while remaining stable.
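The ratio behaviour can be sketched as follows. This is a simplified model, not AMD's firmware logic: it assumes a fixed fabric limit of 1867MHz (standing in for the "almost 1870MHz" figure) and treats "1:2" as the fabric running at half the memory clock. The `fabric_clock` and `memclk_mhz` helpers are hypothetical names.

```python
# Assumed stable ceiling for the Infinity Fabric clock, in MHz (illustrative).
FCLK_LIMIT_MHZ = 1867.0


def memclk_mhz(transfer_rate_mts: float) -> float:
    """DDR transfers data twice per clock, so MEMCLK is half the MT/s rating."""
    return transfer_rate_mts / 2


def fabric_clock(transfer_rate_mts: float,
                 limit_mhz: float = FCLK_LIMIT_MHZ) -> tuple[float, str]:
    """Return (fabric clock in MHz, ratio label) for a given DDR4 transfer rate."""
    mclk = memclk_mhz(transfer_rate_mts)
    if mclk <= limit_mhz:
        # Fabric can keep pace with memory: coupled 1:1.
        return mclk, "1:1"
    # Above the limit, the fabric drops to half the memory clock.
    return mclk / 2, "1:2"


for rate in (3200, 3600, 3733, 3800, 4000):
    fclk, ratio = fabric_clock(rate)
    print(f"DDR4-{rate}: MEMCLK {memclk_mhz(rate):.1f}MHz, FCLK {fclk:.1f}MHz ({ratio})")
```

The sharp drop in fabric clock just past DDR4-3733 is why that speed is described as the sweet spot: a faster memory kit can end up with a much slower GMI link.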

In principle, these changes could make Ryzen 3000-series chips immensely popular with enthusiasts and overclockers. Zen CPUs have always been sensitive to memory and IF speeds: raising either meaningfully improved benchmark results, whereas competing CPUs responded relatively little to faster memory. Pushing these clocks higher and tuning the GMI/IF frequency for stability would be a welcome outlet for overclocking expertise, which has been somewhat undermined by the quality of the dynamic core overclocking CPUs now apply by default.

PCI-Express 4.0 support is provided by the cIOD, as are other SoC features. The standard doubles PCIe 3.0's effective bandwidth to almost 2GB/sec per lane in each direction while maintaining backwards compatibility with legacy PCIe devices. The chip as a whole offers 16 PCIe 4.0 lanes for discrete graphics, four lanes for NVMe storage, and four lanes for the link between CPU and chipset (which itself will offer improved functionality with PCIe 4.0 peripheral devices). Four native USB 3.1 10Gbps ports are also available from the cIOD, bolstered by more from the chipset on most desktop motherboards. The platform as a whole offers yet more PCIe 4.0 lanes, muxed off the motherboard chipset, for additional high-speed NVMe storage or other functionality.
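The "almost 2GB/sec per lane" figure follows from PCIe 4.0's 16GT/s signalling rate and its 128b/130b line encoding, which it shares with PCIe 3.0 at 8GT/s. The sketch below ignores packet and protocol overhead, so real-world throughput will be somewhat lower; `lane_gbytes_per_s` is a hypothetical helper, and the lane map simply restates the cIOD allocation described in this section.

```python
# generation: (transfer rate in GT/s, payload bits, encoded bits)
GENERATIONS = {
    "3.0": (8.0, 128, 130),
    "4.0": (16.0, 128, 130),
}


def lane_gbytes_per_s(gen: str) -> float:
    """Raw per-lane, per-direction throughput in GB/s (10^9 bytes), after line coding."""
    rate_gts, payload, encoded = GENERATIONS[gen]
    return rate_gts * payload / encoded / 8  # bits -> bytes


# cIOD lane allocation as described in the text.
CIOD_LANES = {"graphics": 16, "nvme": 4, "chipset": 4}

for name, lanes in CIOD_LANES.items():
    print(f"{name}: x{lanes} -> {lanes * lane_gbytes_per_s('4.0'):.2f} GB/s per direction")
print("total lanes:", sum(CIOD_LANES.values()))
```

Because both generations use the same encoding, doubling the transfer rate doubles usable bandwidth exactly: roughly 31.5GB/s per direction for the x16 graphics slot and about 7.9GB/s for an x4 NVMe drive.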

Differences between 7nm and 12nm manufacturing and packaging techniques meant that AMD needed an idiosyncratic approach to assembling Matisse. Even PCIe 4.0's more stringent signalling requirements, and hence material requirements, affected the physical design of the chip. Maintaining backwards compatibility with Socket AM4 imposed further hurdles.

For all the benefits of a chiplet design, the Matisse CPU remains exceptionally complex due to signal pathing, mounting differences between 7nm and 12nm chiplets, a new 12-layer substrate, and far tighter tolerances than were in place for Zen and Zen+.