Written by Tim Harmer
High Bandwidth Memory – a First Look
GDDR5 is broken. Okay, that’s a little hyperbolic… it’s more accurate to say that GDDR5 is old, and after having served its time needs to be retired sooner rather than later. One of the worst-kept secrets in computing today is that graphics card designers and manufacturers have been aware of this fact for some considerable time, and thankfully AMD are now willing to peer above the proverbial parapet and talk about the next step in performance memory for graphics cards: High Bandwidth Memory.
High Bandwidth Memory, or HBM as it has become known, has been in development for around seven years. Not coincidentally, GDDR5 itself was introduced in 2008, but you get the feeling that even then time was ticking away on this DRAM standard. However it was only in 2014 that the first serious announcements were made of HBM for graphics, and then in the form of NVIDIA putting it firmly on their public roadmap. If we only had them to go by we’d be waiting for Pascal, the follow-up to Maxwell, and who knows when that may arrive.
Thankfully NVIDIA aren’t the only horse in this race, and in a rather refreshing approach AMD have chosen to wait until it can realistically be implemented before announcing their intentions. Yes, that’s right: they’re ready to go with HBM, and it will be part of their next-generation graphics architecture alongside a GPU understood to carry the codename ‘Fiji’. Exactly which SKUs it will appear in – 300-series, 400-series, etc. – is beside the point; what matters is that consumer cards are imminent, which means AMD are able to share a little more about their thinking with regards to HBM, as well as their initial implementation.
The Problem of GDDR5 and memory-hungry trends
GDDR5 is an old technology which, like GDDR4, was based on DDR3. Graphics cards incorporating GDDR5 memory began shipping in 2008 with the AMD Radeon HD 4870, a card that swiftly became a leader in its class due (in part) to the increased memory bandwidth the transition allowed. However the intervening years have been some of the longest without a revision in the underlying graphics memory technology, with GDDR4 and GDDR3 each lasting only around three years at the performance end of the spectrum. The reasons behind this are straightforward.
Memory bandwidth with GDDR5 VRAM is improved in two ways central to the way such memory operates: increasing memory clock frequencies, and increasing the overall memory bus width. The latter has been favoured by AMD, including on the recent R9 290X with its 512-bit bus and memory clocked at (only) an effective 5GHz, whilst NVIDIA have taken the opposite approach to achieve broadly similar overall bandwidths. Unfortunately, even as games and compute workloads demand ever larger frame buffers and faster retrieval, GDDR5 is reaching its limit.
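As a rough illustration of how those two levers combine, peak bandwidth is simply the bus width multiplied by the effective per-pin data rate. The sketch below uses the R9 290X’s published figures and, for NVIDIA’s narrower-but-faster approach, the GTX 780 Ti’s (384-bit at an effective 7GHz); the helper function is ours, purely for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, effective_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbps) / 8."""
    return bus_width_bits * effective_rate_gbps / 8

# AMD's approach: a wide bus at modest clocks (R9 290X)
print(peak_bandwidth_gbs(512, 5.0))   # 320.0 GB/s

# NVIDIA's approach: a narrower bus at higher clocks (GTX 780 Ti)
print(peak_bandwidth_gbs(384, 7.0))   # 336.0 GB/s
```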
The problem is that to achieve these ever higher read/write rates you either need faster memory or more complex memory controller logic, even as industry trends dictate a reduced overall system TDP. Voltage requirements for higher memory frequencies increase non-linearly (i.e. there are diminishing returns), whilst improving the efficiency of the controller logic is non-trivial. You can see where we’re going with this.
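The rule of thumb here is worth spelling out: the dynamic power of an interface scales roughly as P ≈ C·V²·f, linearly with switching frequency but with the square of voltage. Chasing higher clocks, which in turn tend to demand higher voltages, is therefore doubly punishing, whereas a wide, slow, low-voltage interface is comparatively cheap in power terms, a point that will become relevant shortly.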
Once we made do with 512MB of VRAM, and we were content. Relatively inexpensive DRAM has since allowed specifications to balloon, yet games are now cramped by even the roomy 4GB you’ll find on an R9 290X or GTX 980, especially as we push beyond 1080p and toward high-fidelity 4K. Memory power requirements aren’t insignificant, however, and become all the more pressing with every gigabyte leap in frame buffer size.
The trade-off often comes in reduced power for the GPU itself as both AMD and NVIDIA try to stay under the TDP limits imposed by the PCI-Express standards, which beyond a critical point means reduced system performance, all because of the demand for larger textures and higher resolutions. AMD believe that tipping point has almost arrived with current GDDR5 implementations. Hence the proposed solution: High Bandwidth Memory.
High Bandwidth Memory – AMD Implementation
As outlined, there are a number of ways you can approach bringing higher bandwidth to graphics systems, each with pros and cons. One approach, seized on by certain CPU and SoC developers, is to embed DRAM on-die, but this is quite cost-prohibitive and balloons the size of the die. AMD needed an alternative take on the problem.
In partnership with memory manufacturer SK Hynix, AMD hit on an approach which was distinctly different from both embedded DRAM and existing VRAM solutions, but in a way that learned from both.
The breakthrough came with the development of through-silicon via (TSV) technology. Normally DRAM chips are distributed around the GPU in a rather space-hungry array, each connecting to it independently via traces on the circuit board. TSV by contrast allows the stacking of (exceedingly thin) memory dies like office blocks, with signalling passing through and between the layers to a logic die at the base of the stack.
A follow-up step is to place multiple stacks on an interposer substrate, around the central processor (be it a GPU, CPU or SoC). By dint of their small size they can sit in much closer proximity to the GPU than standard DRAM, with the majority of signalling passing through the interposer layer, and as a result you shave power requirements.
Closer proximity between processor and memory has other benefits. Most notably the bus width per device can be far wider than GDDR5’s, with AMD indicating that 1,024 bits per stack will be the norm compared with 32 bits per chip. Furthermore signalling is simpler and doesn’t need to be clocked as highly, streamlining logic and reducing operating voltage and hence overall power.
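Plugging those widths into the same rule of thumb as before shows why the approach works despite the much lower clocks. The per-pin rates below are assumptions on our part: an effective 7 Gbps for a GDDR5 chip, and roughly 1 Gbps (500MHz DDR) for a first-generation HBM stack, the figure widely reported for the initial standard; the four-stack configuration is likewise our illustration rather than a confirmed specification.

```python
# Same rule of thumb as before: width (bits) x per-pin rate (Gbps) / 8 = GB/s
gddr5_chip = 32   * 7.0 / 8   # one GDDR5 chip at an effective 7 Gbps
hbm_stack  = 1024 * 1.0 / 8   # one HBM stack at ~1 Gbps per pin (~500MHz DDR,
                              # our assumption of the first-generation figure)

print(gddr5_chip)             # 28.0 GB/s per chip
print(hbm_stack)              # 128.0 GB/s per stack
print(4 * hbm_stack)          # 512.0 GB/s from a hypothetical four-stack configuration
```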
The result is a memory system which on paper is better in virtually every metric. The key bandwidth-per-watt figure, critical for servers but also important with respect to that trade-off we mentioned earlier, as much as triples with HBM compared to GDDR5. This means that AMD have more power to work with on the GPU itself, but also indicates that there is overhead to play with as the technology matures. Absolute video memory bandwidth can also be improved, though this may primarily come with later generations.
Since the first prototyping of HBM AMD have cultivated partnerships with ASE, Amkor and UMC to produce this interposer solution in high volumes, ensuring that they will have sufficient quantities of these unassuming but critical components.
Enthusiasts may be concerned that the new memory standard could spell the end of memory overclocking, but that doesn’t seem to be the case. AMD were quick to assure us that the simplified clocks and changes to the logic won’t preclude it; if anything it could become more straightforward. Overclocking GDDR5 was relatively difficult as GPU and memory clocks are asymmetric, whereas simpler logic and a low base memory clock make overclocking not only possible but potentially more rewarding.
Physical Consequences
HBM stacks have a smaller surface area than traditional DRAM chips, and AMD raise the prospect of replacing four cumbersome DRAM chips with a single HBM stack, saving around 94% of the real estate. As a result the entire interposer layer, on which the GPU and memory sit, would have around 50% of the footprint of current-generation performance GPU+DRAM packages.
AMD provide a great illustrative example. The combined surface area of the memory and GPU components on their Radeon R9 290X is 9,900mm², whilst the footprint of the HBM-based next-generation GPU+memory package is only 4,900mm². Of course there are other components on a graphics card, but this saving is far too great to be a mere curiosity.
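A quick back-of-the-envelope check of those claims is below. The per-gigabyte footprints are our reading of AMD’s comparison (four GDDR5 packages at roughly 28mm × 24mm versus one HBM stack at roughly 7mm × 5mm) rather than figures quoted in this article, so treat them as illustrative.

```python
# Per-gigabyte footprints: our illustrative assumptions, not figures from the article
gddr5_1gb_mm2 = 28 * 24    # ~672 mm2 for 1GB of GDDR5 (four chips plus spacing)
hbm_1gb_mm2   = 7 * 5      # ~35 mm2 for a single 1GB HBM stack

saving = 1 - hbm_1gb_mm2 / gddr5_1gb_mm2
print(f"Per-gigabyte real estate saved: {saving:.0%}")   # ~95%, in line with AMD's ~94% claim

# Package-level figures quoted above
r9_290x_mm2 = 9900         # GPU + GDDR5 footprint on the R9 290X
hbm_gpu_mm2 = 4900         # GPU + HBM footprint on the next-generation part
print(f"Relative package footprint: {hbm_gpu_mm2 / r9_290x_mm2:.0%}")   # ~49%, i.e. about half
```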
Given roughly the same heat output (two performance cards will have roughly the same TDP, after all), a more compact cooling solution becomes possible. You might recall a very compact card design with a liquid cooling system leaked earlier this month, and this is a strong indication that such a card does exist. Naturally the use of HBM may give rise to other graphics card form factors, and this flexibility is just another of the benefits of HBM.
Right now you’re probably thinking ‘ah, but what about the different heights of the GPU and DRAM stacks, won’t that be a problem for heatsinks?’ Well, not so much as it turns out. The heights of the HBM stacks and the GPU are pretty close, to the extent that AMD claim the differences are within the tolerances that thermal pads even out. If anything cooling the DRAM components should be far more straightforward thanks to the simplified, compact layout, allowing smaller heatsinks and waterblocks.
Challenges of the HBM Approach
Before you think that everything is flowers and daydreams, however, it’s important to recognise that introducing HBM isn’t without its risks. Although AMD are sure that now is the time, it’s worth remembering that they’re only introducing HBM within their performance range rather than extending it throughout their lineup. This may be down to a number of factors, including an unwillingness to introduce new mid-range GPUs without first moving down to 16/14nm, but for now HBM is the province of the top end only.
Partly this is down to cost. HBM is a new technology with only modest DRAM production capacity at this time; performance parts have margins which can accommodate added costs in a way that mid-range and budget GPUs can’t. Even though the tight power budgets of low- to mid-range GPUs mean that introducing HBM could have its largest performance impact in these sectors, it’s simply not feasible to introduce it there. Yet.
One other factor which is perhaps more pressing, and certainly more self-evident, is purely one of frame buffer size. HBMv1 is limited to a maximum of 4GB (each first-generation stack tops out at 1GB), and the first generation of AMD GPUs to use the technology will commonly have two and four gigabyte configurations. However it will have escaped no-one’s attention that some current AAA titles are utilising more than 4GB of VRAM even at 1440p, scaling far higher at 4K resolutions.
Both AMD and NVIDIA have recognised that large frame buffers are critical for high resolutions and the best texture quality, factors which are big selling points for performance GPUs. NVIDIA’s TITAN X for example has 12GB of VRAM, whilst upcoming cards may well come with 6GB as standard. AMD’s reference specification for the flagship R9 290X allocates 4GB per GPU, and there is a continual push for more.
AMD claim better memory management will allow them to offset the potential impact of a smaller frame buffer than their competition’s, but that remains to be seen. With the market currently fixated on 4K, as well as a new focus on super-sampling anti-aliasing methods which render at a high resolution and then scale the output frame down (e.g. AMD’s VSR), not being capable of these resolutions on your flagship would be an unwise compromise.
Finally, we must reserve space to mention AMD’s partners, who will face challenges of their own. Whilst the new approach could allow for a far more compact cooling package, board partners have typically added value to their offerings through more elaborate cooling solutions, each with high-density heatsink fins to maximise heat dissipation. It will be interesting to see how companies like ASUS, GIGABYTE and MSI innovate in this regard, and how many (if any) go down the liquid cooling path. Equally they may opt for redesigned heatsinks of similar magnitude to those already in use.
Conclusion
Moving to HBM is a risk, but then it was always going to be. AMD should be applauded for once again innovating ahead of the competition, but it remains to be seen if they are rewarded for early adoption.
For all the self-evident benefits HBM brings to the table, cost and performance will remain the key to gaining market share. Rarely have AMD needed a genuine, out-and-out win more, but with the competition sticking to the status quo for the time being they have the opportunity to really take advantage of an opening.
As promising as the technology is, HBM may well come into its own in later iterations and GPU generations, bringing higher memory power efficiency and larger frame buffers to the 16/14nm GPUs of the future. Greater power efficiency throughout the product stack seems like a no-brainer, whilst the applications to APUs, and by extension heterogeneous computing as a whole, are intriguing. AMD’s Zen architecture is due at some point next year, and it will be interesting to see if HBM makes an appearance.
The proof of the pudding is in the eating, but with AMD’s next generation due to be revealed as early as the 16th of June, we hopefully won’t have much longer to wait.