AMD GameCache - Canny Marketing Or Critical Feature?

👤by Tim Harmer 📅01.07.2019 19:43:08

When AMD's Ryzen 3000-series launches next week, consumers will finally be able to buy into a ground-breaking new approach to consumer CPU design, but navigating all the marketing spiel will be difficult even with the aid of objective press reviews. With that in mind, we thought it best to explain one of the terms that'll be heard a lot in the next few weeks: AMD GameCache. First off, however, a simplified rundown of the purpose of CPU cache is warranted.

In normal operation CPUs process instructions which are resident in, or formulated from data in, system memory (AKA RAM). Unfortunately RAM is relatively slow: faster than an SSD or mechanical hard drive, but still pretty pedestrian by a CPU's standards.

Minute-by-minute operation will have a CPU processing the same instructions multiple times, but not always in sequence. Repeatedly requesting data from system memory would therefore effectively throttle your CPU, so it makes sense for the CPU to have ready access to frequently used data.

Cache on a Zen 2 Core Complex (CCX)

Cache, typically physically located on the CPU die itself, serves as an intermediary data store for frequently used data that's much faster to access than system memory (i.e. it has lower latency). Modern CPUs have three levels: fast but small L1 cache, larger L2 cache, and the largest but slowest L3 cache.

When looking for data a common approach is for a processor to look in L1 first, then L2, then L3 and finally physical memory. Once the data is found it will be promoted to L1, pushing out data from L1 to L2, which in turn pushes data out from L2 to L3. A cache that is filled this way, with data evicted from the level above it, is known as a 'victim' cache. On AMD Ryzen that label applies most strictly to L3, which is populated with data that's been pushed out of L2 over time.
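The lookup-and-demote behaviour described above can be sketched in a few lines of Python. This is a toy model only: the class name VictimHierarchy, the capacities and the least-recently-used eviction rule are our own assumptions for illustration, and real CPU caches are set-associative hardware with more nuanced policies.

```python
from collections import OrderedDict


class VictimHierarchy:
    """Toy exclusive cache hierarchy: a line lives in at most one level."""

    def __init__(self, capacities):
        # Capacities are counted in cache lines, fastest level first.
        self.capacities = list(capacities)
        self.levels = [OrderedDict() for _ in self.capacities]

    def access(self, address):
        """Return where the line was found: 'L1', 'L2', 'L3' or 'RAM'."""
        found = "RAM"
        for index, level in enumerate(self.levels):
            if address in level:
                found = f"L{index + 1}"
                del level[address]  # remove it here before promoting it
                break
        self._insert(0, address)  # the requested line always ends up in L1
        return found

    def _insert(self, index, address):
        if index >= len(self.levels):
            return  # pushed out of the last level: only RAM holds it now
        level = self.levels[index]
        level[address] = True  # newest entry sits at the end
        if len(level) > self.capacities[index]:
            victim, _ = level.popitem(last=False)  # least recently used
            self._insert(index + 1, victim)  # the victim moves one level down


# Hypothetical sizes: 2 lines of L1, 4 of L2, 8 of L3.
cache = VictimHierarchy((2, 4, 8))
print([cache.access(a) for a in [1, 2, 3, 1, 4, 5, 2]])
# → ['RAM', 'RAM', 'RAM', 'L2', 'RAM', 'RAM', 'L2']
```

Addresses 1 and 2 are pushed out of the tiny L1 before they are needed again, but the second request finds them one level down rather than all the way out in system memory.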

Ideally you'd like a cache to be as large as possible, but they already occupy a lot of area on the processor die. The larger your die area the worse your manufacturing yields, and hence the more expensive any processor will be. That's a big strike against traditional so-called 'Monolithic' CPUs, where everything - CPU core, Cache, memory controller and more - is on a single die.
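To put rough numbers on the link between die area and yield, here is a minimal sketch using the simple Poisson yield model found in textbooks, where the fraction of defect-free dies is exp(-defect density x area). The defect density and die areas below are hypothetical values chosen for illustration, not figures from AMD or any foundry.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-defects_per_cm2 * area_cm2)


DEFECT_DENSITY = 0.5  # hypothetical defects per cm^2

for area in (75, 150, 300):  # hypothetical die areas in mm^2
    share = poisson_yield(area, DEFECT_DENSITY)
    print(f"{area} mm^2 die: {share:.0%} of dies defect-free")
```

Under this model yield falls off exponentially with area, which is why several small dies can be cheaper to produce than one large die of the same total size.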

AMD Ryzen 3000-series Topology - 2 CCXs per CCD, up to 2 CCDs per CPU

AMD's Zen 2 architecture is a radically different approach to CPU design, utilising a 'chiplet' rather than monolithic style. The CPU has one or two small CPU core chiplets (CCDs) and one larger IO Die (cIOD); CPU cache remains on the CCD but a new hop (i.e. communication performed over the GMI links) to the memory controller on the cIOD adds latency for system memory access.

Each CCD has two quad-core Core Complexes (CCXs) with their own hierarchical L1/L2/L3 cache structure. L1 and L2 cache are associated with only a single core, but L3 cache is shared across the whole Core Complex, and can even be accessed by cores in the neighbouring CCX (at slightly slower speed).

It's clear therefore that the new chiplet-style Zen 2 architecture will add a small but significant amount of latency to system memory access compared with Zen and Zen+. If major IPC improvements are your aim then that's a problem which needs to be compensated for. Hence AMD doubled the shared L3 cache of a Zen 2 CCX from 2MB to 4MB per core, for a total of 16MB per CCX and 32MB per Core Chiplet Die; it's unlikely that would have been feasible on a monolithic CPU simply due to the increase in die area it would represent.

Performance Improvements

The impact to performance of each of Zen 2's architectural changes will depend greatly on whether the workload leans heavily on cache or system memory access, but AMD have at least shared their assessment of how doubling L3 cache affects effective memory latency and gaming performance.

Effect of increasing memory speed vs doubling L3 Cache on Ryzen 3000-series CPUs.
Baseline = Ryzen 3000-series CPU with 50% L3 cache disabled & memory at DDR4-2667.

According to AMD, doubling L3 cache in Zen 2 boosted performance by as much as 21%. That's a better improvement than increasing DDR4 memory speeds from DDR4-2666 (default for Ryzen 3000-series CPUs) to DDR4-3600, despite Zen's proven scaling with memory speeds. While not necessarily a great measure, AMD calculate that the net reduction in effective memory latency is 33ns across representative workloads. And of course end-users can choose to overclock memory at the same time for compounded improvements in performance.
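'Effective memory latency' is essentially a weighted average: requests that hit in L3 are served quickly, and the rest pay the full trip to system memory. The sketch below shows why a higher L3 hit rate can outweigh faster RAM. Every hit rate and latency in it is a hypothetical value picked for illustration; it does not attempt to reproduce AMD's 33ns figure.

```python
def effective_latency_ns(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    """Average latency of a request that has already missed L1 and L2."""
    miss_rate = 1.0 - l3_hit_rate
    return l3_hit_rate * l3_latency_ns + miss_rate * dram_latency_ns


# All figures below are hypothetical.
scenarios = {
    "baseline (half L3, slower RAM)": effective_latency_ns(0.50, 10.0, 90.0),
    "faster RAM only": effective_latency_ns(0.50, 10.0, 75.0),
    "doubled L3 only": effective_latency_ns(0.70, 10.0, 90.0),
    "both together": effective_latency_ns(0.70, 10.0, 75.0),
}

for name, latency in scenarios.items():
    print(f"{name}: {latency:.1f} ns")
```

With these made-up inputs the larger cache gives the bigger single improvement, and combining both changes gives the lowest figure of all, which mirrors the shape of AMD's claim rather than its exact numbers.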

Given AMD's claims of a massive (20%+) net IPC gain in the transition from Zen+ to Zen 2, and gaming performance that rivals Intel's best consumer chips, it's clear that the transition to chiplets has been worth the drawbacks in their eyes. But we cannot ignore the possibility that there will be a class or classes of CPU workload that may not be as well suited to the new approach, due chiefly to a reliance on frequent memory access in a manner that doesn't benefit from caching.

Strong Reason for new Terminology?

With the Ryzen 3000-series' launch AMD are taking a page out of Intel's book by aggregating their L2 and L3 cache together into the previously mentioned marketing term they're calling GameCache, a parallel to Intel's Smartcache. Single Core Chiplet Die models with up to 8 cores are equipped with up to 36MB (512KB L2 per core plus 16MB L3 per 4-core CCX) GameCache, whereas the 16-core dual Core Chiplet Die models feature 72MB. Even if a CCX doesn't have all cores enabled (as in the 6-core and 12-core models) the full 16MB L3 cache of each CCX is available as a shared store.
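The GameCache figure is simple addition: 512KB of L2 for every active core, plus the 16MB of L3 that each CCX carries regardless of how many of its cores are enabled. A quick sketch of that sum follows; the function and constant names are our own, not AMD's.

```python
L2_PER_CORE_MB = 0.5  # 512KB of L2 per active core
L3_PER_CCX_MB = 16    # full L3 per CCX, even with cores disabled
CCX_PER_CCD = 2


def gamecache_mb(active_cores, ccds):
    """Combined L2 + L3 in MB for a given core count and chiplet count."""
    l2_total = active_cores * L2_PER_CORE_MB
    l3_total = ccds * CCX_PER_CCD * L3_PER_CCX_MB
    return l2_total + l3_total


for cores, ccds in [(6, 1), (8, 1), (12, 2), (16, 2)]:
    print(f"{cores} cores, {ccds} CCD(s): {gamecache_mb(cores, ccds)} MB")
# → 6 cores, 1 CCD(s): 35.0 MB
# → 8 cores, 1 CCD(s): 36.0 MB
# → 12 cores, 2 CCD(s): 70.0 MB
# → 16 cores, 2 CCD(s): 72.0 MB
```

Note how little the total moves between the 6-core and 8-core parts: the L3 dominates, so disabling cores barely dents the headline number.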

It's been dubbed GameCache and prominently highlighted due to both a significant impact on game performance as outlined above, and the obvious (and flattering) parallels with Intel Smartcache. Let's not kid ourselves: if Coffee Lake CPUs had vastly more 'Smartcache' than AMD's GameCache, it's unlikely the figure would be pushed quite so heavily.

The issue is that Intel and AMD architectures are so different now that direct comparison between these two isolated specifications is almost meaningless. You can't simply say 'the i9-9900K has 16MB Smartcache but Ryzen 9 3950X has 72MB GameCache so the 3950X is clearly better,' even if such a huge differential looks great on promotional material. Actual performance is what should matter the most, but if it did then marketing as a whole would be a fruitless exercise.

In isolation, however, the mammoth 'GameCache' size of each of the next-gen Ryzen CPUs is a vitally important factor in its design. It highlights a technology transition from Zen to Zen 2, and is a more accessible term than most descriptors of the new architecture. As a result it's a pretty useful shorthand that could be much more important if AMD choose to substantially broaden their CPU stack in future generations, particularly in low-power and budget segments.

So AMD gets a pass on this one, for at least as long as Intel continue to use Smartcache. And, if nothing else, GameCache is a great jumping-off point for learning more about the complexities and quirks of AMD's Zen 2 architecture.

AMD's Ryzen 3000-series CPUs go on sale this Sunday, July 7th, alongside X570 motherboards and Radeon 5700-series GPUs. They include models ranging from 6 to 12 cores, while the 16-core Ryzen 9 3950X will be available later this summer.