THREADRIPPER ARCHITECTURE
As is well known by now, AMD’s Threadripper CPUs have a very idiosyncratic layout. Nonetheless, the DNA of the desktop Ryzen line continues to run through the HEDT platform.
Each Threadripper CPU incorporates four discrete dies rather than a single monolithic die. Each die is very similar to that found in a desktop Ryzen CPU, incorporating two Core Complex (CCX) modules with four cores each. It’s this structure which gives Zen its enviable scalability, without incurring the costs of a single monolithic die with a huge core count (as in Intel Skylake-X).
The ‘glue’ which holds all this together is known as Infinity Fabric. This high-bandwidth, low-latency I/O channel runs at the speed of DRAM and allows cores to communicate not only between CCXs, but also between dies.
First Generation Threadripper had four dies per package, but two were disabled. As a result, a maximum of 16 cores was available on the flagship, along with quad-channel DDR4 memory (thanks to each active die having its own memory controller). Fitting four dies also made a Threadripper CPU mechanically more stable: the oversized heatspreader is supported by four dies rather than two, allowing for more secure heatsink mounting and more inherent rigidity.
The same package design and a similar socket are also used in AMD’s EPYC server CPUs. The difference is that EPYC processors incorporate up to four fully active dies rather than just two, as well as eight-channel DDR4 memory support and 128 PCI-Express 3.0 lanes.
That being said, what’s new with 2nd Generation Threadripper?
UP TO FOUR ENABLED DIES
Like their EPYC cousins, the new Threadripper CPUs with more than 16 cores have all four dies enabled, rather than just two. As with EPYC, as many as 32 cores are available to the platform, rather than just 16. And just like the first generation, the use of smaller dies connected by Infinity Fabric has lower associated costs and higher yields than a single large die with the same number of cores.
Despite featuring four enabled dies, certain features remain exclusive to EPYC. The most obvious is memory support. 2nd Gen. Threadripper remains a quad-channel platform: two of the dies are directly connected to system memory, and the other two have their memory access routed through them.
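The memory arrangement described above can be sketched as a small model. This is only an illustration, not AMD's actual topology description: the `Die` class, the helper names, and the choice of which two dies are memory-attached are assumptions, and the figure of two channels per memory controller is inferred from the channel counts quoted in the text (four channels across two attached dies, eight across four).

```python
from dataclasses import dataclass

# Inferred from the text: two memory-attached dies give four channels,
# four memory-attached dies (EPYC) give eight.
CHANNELS_PER_CONTROLLER = 2


@dataclass(frozen=True)
class Die:
    index: int
    has_local_memory: bool  # True if this die's controller is wired to DIMM slots


def memory_channels(dies):
    """Total DDR4 channels exposed by the package."""
    return sum(CHANNELS_PER_CONTROLLER for die in dies if die.has_local_memory)


def remote_dies(dies):
    """Dies that reach DRAM only through another die over Infinity Fabric."""
    return [die.index for die in dies if not die.has_local_memory]


# Which two dies are memory-attached is an assumption made for illustration.
threadripper_wx = [Die(0, True), Die(1, False), Die(2, True), Die(3, False)]
epyc = [Die(i, True) for i in range(4)]

print(memory_channels(threadripper_wx), remote_dies(threadripper_wx))
print(memory_channels(epyc), remote_dies(epyc))
```

In this model the four-die Threadripper exposes four channels with two dies relying on a neighbour for memory access, while the EPYC layout exposes eight channels with no such dies.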
The flagship Threadripper 2990WX has all eight processing cores active per die, giving it a total of 32 cores and support for 64 threads. In contrast, the 2970WX has six cores enabled per die (three per CCX) for a total of 24 cores and 48 threads. Nonetheless, the 2970WX still has access to the entire L3 cache available on the CPU – 64MB, just as much as the 2990WX.
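The core, thread and cache totals follow directly from the die layout, as this short sketch shows. The function and constant names are illustrative, and the 8MB of L3 per CCX is inferred from the quoted 64MB total spread over four dies of two CCXs each.

```python
DIES = 4
CCX_PER_DIE = 2
THREADS_PER_CORE = 2   # 64 threads on 32 cores
L3_PER_CCX_MB = 8      # inferred: 64MB total / (4 dies x 2 CCXs)


def package_totals(active_cores_per_ccx):
    """Return core, thread and L3 totals for a four-die package."""
    ccx_count = DIES * CCX_PER_DIE
    cores = ccx_count * active_cores_per_ccx
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        # L3 does not depend on how many cores are enabled in each CCX.
        "l3_mb": ccx_count * L3_PER_CCX_MB,
    }


print("2990WX:", package_totals(active_cores_per_ccx=4))
print("2970WX:", package_totals(active_cores_per_ccx=3))
```

Disabling one core per CCX reduces the core and thread counts but leaves the L3 total untouched, which is why both parts report 64MB.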
GLOBALFOUNDRIES 12NM MANUFACTURING PROCESS
Like 2nd Generation Ryzen, Threadripper’s refreshed CPU lineup is manufactured using GlobalFoundries’ 12nm FinFET process. This has allowed AMD to push up core clock speeds relative to the first generation, as shown by an increase of 300MHz in the maximum (non-XFR) boost frequency of the 12-core 2920X relative to the 1920X. The TDP envelope of these new counterparts to the previous generation has remained the same, so no exotic new cooling is necessary.
This transition to 12nm is particularly important for the high core count processors. These CPUs run at relatively high clock speeds compared to the server/enterprise EPYC chips, primarily to maintain strong performance in gaming (a key use case for Threadripper, but not so for EPYC). Even on 12nm, the TDP envelope of the 2990WX is 250W; any higher and even high-end air cooling may not have been suitable.
It’s not all about power efficiency. Improved transistor packing has reduced the L1, L2 and L3 cache latencies by approximately 13%, 34% and 16% respectively, an aspect that will have implications no matter the workload.
PRECISION BOOST 2
AMD Precision Boost was introduced with first generation Ryzen as a means of controlling per-core frequencies beyond the base level. Boost frequencies had an upper cap, but the operating frequency at any given time was controlled by the Precision Boost algorithm, which analysed workloads across the CPU package. The highest boost frequencies kicked in when up to two cores were under heavy load; this pair of cores was boosted to a high frequency while the other cores remained at the base level.
Inherited from 2nd Generation Ryzen, Precision Boost 2 more rigorously governs the operating frequency of the CPU depending on the load each core is under and sensor data collected through [url=https://www.amd.com/en/technologies/sense-mi]AMD SenseMI[/url]. This implementation is far more aggressive than the initial iteration, holding the clocks higher for longer under heavier multi-core workloads. Plus, it goes one step further.
While Precision Boost pushed two cores to the maximum boost frequency when only those two were under heavy load, Precision Boost 2 will raise the frequency on every core that is under load, even if more than two are being taxed. Not only is it therefore more appropriate for strongly multi-threaded workloads (for example video rendering), it doesn’t penalise workloads where few cores are under load (such as gaming).
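The difference between the two boost schemes described above can be illustrated with a toy model. It is a deliberately simplified sketch rather than AMD's algorithm: the clock values are hypothetical, the linear taper in `boost_v2` is an assumption chosen only to show the shape of the behaviour, and the real implementation also weighs temperature, current and power data that are ignored here.

```python
BASE_MHZ = 3000        # hypothetical base clock
MAX_BOOST_MHZ = 4200   # hypothetical maximum boost clock


def boost_v1(loaded):
    """First-generation scheme as described: full boost only while
    at most two cores are loaded, otherwise every core sits at base."""
    boost_allowed = sum(loaded) <= 2
    return [MAX_BOOST_MHZ if (busy and boost_allowed) else BASE_MHZ
            for busy in loaded]


def boost_v2(loaded):
    """Second-generation scheme as described: every loaded core is
    raised above base. The linear taper is an illustrative assumption."""
    busy_count = sum(loaded)
    headroom = MAX_BOOST_MHZ - BASE_MHZ
    if busy_count <= 2:
        target = MAX_BOOST_MHZ  # lightly threaded work is not penalised
    else:
        # Give back a share of the headroom for each loaded core beyond two.
        target = MAX_BOOST_MHZ - headroom * (busy_count - 2) // len(loaded)
    return [target if busy else BASE_MHZ for busy in loaded]


two_busy = [True, True] + [False] * 6
four_busy = [True] * 4 + [False] * 4

print(boost_v1(two_busy), boost_v2(two_busy))
print(boost_v1(four_busy), boost_v2(four_busy))
```

With two busy cores both models give the same result; with four busy cores the first model drops everything to base, while the second keeps the loaded cores above base.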