Does The GTX 970 Have Memory Addressing Issues? NVIDIA Responds.

by Tim Harmer · 26.01.2015 13:36:19
UPDATE:

After a discussion with Jonah Alben, NVIDIA’s Senior VP of GPU Engineering, Ryan Shrout at PC Perspective has published a more comprehensive follow-up article addressing the concerns. The explanation goes into some depth, outlining both the memory subsystem differences between the GTX 970 and 980 and the means NVIDIA used to overcome the problems that fusing off some functionality can cause, and is well worth reading.

A (very) brief summary is as follows:

1) Whilst only 13 of the 16 possible SMMs are activated on each GTX 970, this lower number of SMMs isn't directly responsible for the differences in memory performance between the GTX 970 and 980.

2) Initial technical specifications released for the GTX 970 were incorrect, and it in fact has 56 ROPs and 1792 KB of L2 cache compared to 64 ROPs and 2048 KB of L2 cache for the GTX 980. However, the impact of the reduced ROP count is small because 13 SMMs can only supply 52 pixels/clock, which is already less than the 56 pixels/clock that 56 ROPs can accommodate.

3) Communication between the SMMs and memory subsystem is via an interface called a Crossbar. On the GTX 980 eight ports on the crossbar allow access to the L2 cache and memory, but the GTX 970 eliminates one block of L2 cache and its associated port. Under normal circumstances therefore it would only address 3.5GB of DRAM. Instead NVIDIA have split the available 4GB into two pools of 3.5GB and 0.5GB.

4) The 0.5GB block is substantially slower than the 3.5GB block (1/7th the speed), and NVIDIA has given its use a low priority relative to the 3.5GB block. Even so, it's still substantially faster than system memory and is therefore still potentially useful.

5) Some monitoring and synthetic benchmarking applications cannot see the whole 4GB available, resulting in much of the conflicting information currently circulating.

6) NVIDIA assess that the performance impact of this relatively reduced memory bandwidth compared to the GTX 980 is on the order of 4-6%, and likely imperceptible in real-world scenarios.
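The figures in points 1 to 4 hang together arithmetically, and a short sketch makes that easier to check. It assumes (the summary does not say so explicitly) that L2 cache, ROPs and addressable DRAM are divided evenly across the eight crossbar ports, and that pixel output scales evenly per SMM; the function and constant names are illustrative only.

```python
# Figures quoted in the summary above.
PORTS_980 = 8            # crossbar ports to L2/memory on the GTX 980
PORTS_970 = 7            # the GTX 970 loses one L2 block and its port
L2_KB_980 = 2048
ROPS_980 = 64
VRAM_GB = 4.0
ACTIVE_SMMS_970 = 13

# Derived from "13 SMMs ... 52 pixels/clock".
# Assumption: the rate is the same for every SMM.
PIXELS_PER_CLOCK_PER_SMM = 52 // 13


def gtx970_summary():
    """Recompute the GTX 970 figures from the GTX 980 ones.

    Assumption: L2, ROPs and DRAM are split evenly across the ports.
    """
    l2_kb_per_port = L2_KB_980 // PORTS_980    # 256 KB
    rops_per_port = ROPS_980 // PORTS_980      # 8 ROPs
    gb_per_port = VRAM_GB / PORTS_980          # 0.5 GB
    return {
        "l2_kb": l2_kb_per_port * PORTS_970,
        "rops": rops_per_port * PORTS_970,
        "smm_pixels_per_clock": ACTIVE_SMMS_970 * PIXELS_PER_CLOCK_PER_SMM,
        "fast_pool_gb": gb_per_port * PORTS_970,
        "slow_pool_gb": gb_per_port * (PORTS_980 - PORTS_970),
    }


if __name__ == "__main__":
    for name, value in gtx970_summary().items():
        print(f"{name}: {value}")
```

Under those assumptions the sketch reproduces the 1792 KB, 56 ROP and 3.5GB/0.5GB figures, and shows that the SMMs (52 pixels/clock) rather than the ROPs (56 pixels/clock) set the ceiling on pixel throughput.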


NVIDIA will naturally be castigated for providing incorrect information to reviewers, blamed on a miscommunication between engineers and PR. That said, the greater concern is that the memory has reduced performance compared to the expectations of some. In that vein NVIDIA want to refer readers back to independent benchmark results which show the performance level a GTX 970 can attain, standing by the quality of their product.

Arguments will no doubt rage well into the night, but it's a relief to now have an official and logical explanation for the inconsistencies unearthed.

---




For some weeks persistent rumours have percolated through enthusiast forums that allege NVIDIA's GeForce GTX 970, a graphics card which has received near universal acclaim, is exhibiting problems in niche cases when trying to address more than 3.5GB of video memory.

Rumours started when GTX 970 owners began to analyse the performance of their card in more depth, specifically looking at memory load with high-end resolution and image quality settings in their particular game of choice, but have been continuously muddled with questions over 32-bit and 64-bit addressing. Concrete evidence has been tough to come by. Many sites unofficially benchmarked their own GTX 970 in scenarios that load the VRAM, to outcomes which often appear (internally at least) conflicting. In some games close to the full 4GB would be allocated, whilst in others the GTX 970 would top out at 3.5GB, and this would differ on a title-by-title basis. Perhaps one to chalk up simply to inconsistent data collection tools, or perhaps not.

Grumbling spiked at the end of last week. Firstly, this video was linked widely, showing severe stuttering when the user upped the memory usage to over 3.5GB*, and secondly a homebrewed tool written by German coder Nai was released. The tool, written specifically for NVIDIA's CUDA platform and making use of NVIDIA DLLs, progressively loaded data into video memory and recorded the rate at which that data was written and read. Nai collected results from both a GTX 980 and GTX 970 which indicated that the 970's final 500-600MB block of memory had a much lower read/write rate than the preceding 3.5GB, in contrast to uniform results for the GTX 980.
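To illustrate the pattern such a chunk-by-chunk test would be expected to show, here is a toy model in Python. It is not Nai's tool and it measures nothing: it simply assumes a 3.5GB pool at a normalised rate of 1.0 followed by a 0.5GB pool at 1/7th of that rate (the figures given in the update above), and walks through the address range in fixed-size chunks. The 128MB chunk size and the function name are assumptions chosen for illustration.

```python
def relative_rate_per_chunk(total_mb=4096, fast_mb=3584,
                            chunk_mb=128, slow_ratio=1 / 7):
    """Model the relative transfer rate of each chunk of VRAM.

    Hypothetical model, not a measurement: the first `fast_mb` run at
    rate 1.0 and the remainder at `slow_ratio`.
    Returns a list of (chunk_start_mb, relative_rate) tuples.
    """
    results = []
    for start in range(0, total_mb, chunk_mb):
        end = start + chunk_mb
        # Portion of this chunk that lies inside the fast pool.
        fast_part = max(0, min(end, fast_mb) - start)
        slow_part = chunk_mb - fast_part
        # Time is size divided by rate; the fast pool rate is 1.0.
        time_units = fast_part / 1.0 + slow_part / slow_ratio
        results.append((start, chunk_mb / time_units))
    return results


if __name__ == "__main__":
    for start, rate in relative_rate_per_chunk():
        print(f"chunk at {start:4d} MB: relative rate {rate:.3f}")
```

With these assumed parameters the first 28 chunks report a rate of 1.0 and the last four drop to roughly 0.143, the same shape as the 500-600MB slow tail reported for the GTX 970. A uniform card such as the GTX 980 can be modelled by setting `fast_mb` equal to `total_mb`.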

It's known that the GTX 970 utilises the same Maxwell-based GM204 GPU as the GTX 980, but has three of the sixteen total SMMs fused off. It's therefore been theorised that this process also fundamentally changes how much VRAM the GPU can address directly and that this would impact performance. Some go as far as to say that the card shouldn't have been marketed as a 4GB design if this flaw was known, levelling greater criticism at NVIDIA despite the performance record of the 970 thus far.

Responding to PCPer on Friday NVIDIA said:

The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory. However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section. The GPU has higher priority access to the 3.5GB section. When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands. When a game requires more than 3.5GB of memory then we use both segments.

We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment. The best way to test that is to look at game performance. Compare a GTX 980 to a 970 on a game that uses less than 3.5GB. Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.

Here’s an example of some performance data:



On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.


Unsurprisingly NVIDIA focus on performance, but have also stated that a more comprehensive statement is being prepared.
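The differences quoted in NVIDIA's statement are gaps in percentage points between the two cards' own drops. A short sketch, using only the percentages from the statement, also expresses them as how much further the GTX 970 falls relative to the GTX 980 once settings are raised. The function name and the second metric are illustrative additions, not part of NVIDIA's statement.

```python
# (GTX 980 drop %, GTX 970 drop %) as quoted in NVIDIA's statement.
DROPS = {
    "Shadows of Mordor": (24, 25),
    "Battlefield 4": (47, 50),
    "CoD: AW": (41, 44),
}


def compare(drop_980_pct, drop_970_pct):
    """Return (percentage-point gap, relative extra loss of the 970 in %)."""
    point_gap = drop_970_pct - drop_980_pct
    retained_980 = 1 - drop_980_pct / 100
    retained_970 = 1 - drop_970_pct / 100
    # Change in the 970-to-980 performance ratio at the higher settings.
    relative_loss_pct = (1 - retained_970 / retained_980) * 100
    return point_gap, relative_loss_pct


if __name__ == "__main__":
    for game, (d980, d970) in DROPS.items():
        gap, rel = compare(d980, d970)
        print(f"{game}: {gap} point gap, {rel:.1f}% relative to GTX 980")
```

On the quoted figures the gaps are 1, 3 and 3 percentage points, which corresponds to the GTX 970 losing roughly 1.3%, 5.7% and 5.1% relative to the GTX 980 at the higher settings.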

Source: PCPer, Guru3D (Includes modified version of Nai's testing tool), GeForce.com

*A previous version of this article described the video as showing 'severe graphical artefacting'. The creator of the video clarified that he observed excessive stuttering in gameplay, but graphical artefacting was a result of the recording process and not observable in-game. We apologise for this error and have corrected the article.

