AIBs Respond to GeForce RTX 30-Series 'POSCAPs' Issue

by Tim Harmer, 28.09.2020 21:25:26


No one could describe the launch window for NVIDIA's GeForce RTX 30-series as untroubled. After website and stock problems affecting retailers worldwide (including NVIDIA's own official webstore), not all appears to be progressing seamlessly even now that the cards are in the hands of consumers. Dotted across forums - both official and on Reddit - have been reports of instability with some members of the RTX 3080 family, not even a fortnight after the first cards were officially available. And now the collective wisdom of the internet has seized on a culprit, finally forcing NVIDIA's AIB partners (if not NVIDIA themselves) to respond.

GPU instability when gaming is notoriously tough to pin down, as it often manifests as strange geometry errors and unexplained crashes to desktop. There's no error code and very little to diagnose, leading many to blame one of the myriad factors involved with little or no evidence. So what's different here?

Speculation arose due to a common thread in many of the complaints - crashes occurred when the GPU boosted to speeds approaching and exceeding 2GHz, and the array of capacitors on the rear of the board, directly behind the GPU, used components that differed from the Founders Edition layout.


The design spec for the RTX 3080. Credit: Igor's Lab


'Weaker' cards tended to utilise an array of six polymer capacitors commonly referred to as POSCAPs or SP-CAPs (POSCAP is a brand name for a particular type of large black polymer capacitor, but has become synonymous with this story). 'Stronger' ones utilised four or five POSCAPs and one or two arrays of smaller MLCC capacitors. NVIDIA's Founders Edition models made use of two 10-MLCC capacitor arrays.

The problem is that teething issues are nothing new in the hardware launch cycle, and are often blown out of all proportion. And yet this particular explanation is being treated with credulity.

The role of capacitors

Igor at Igor's Lab has done much of the foundational work on this story, detailed here and with a more technical follow-up here. We strongly recommend that you check them out for a deeper technical discussion of this topic and its root causes.

To summarise, capacitors have a dual role in this particular aspect of graphics card design. It's already commonly understood that they 'clean up' potentially noisy voltage signals as they reach the sensitive GPU (which is why they're located as close to the GPU as possible). That noise becomes a bigger factor in stability as the platform approaches its design tolerances.

Their other major role is to serve as a buffer when the GPU demands greater supply in an exceptionally short period of time. Voltage regulators aren't able to respond instantaneously to higher GPU power demands, which can cause a precipitous drop in voltage on the GPU's core supply rail. By draining the capacitors during the short time it takes the voltage regulator to adapt, a stable supply is maintained on that rail, at least until the capacitors are out of charge.
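As a rough illustration of that buffering role, the sketch below estimates how far a rail sags if a capacitor bank alone has to carry a sudden load step until the regulator catches up. It uses a first-order approximation (an immediate resistive drop plus a linear discharge), and every number in it is an assumption chosen for illustration, not a measurement from any RTX 3080.

```python
def droop_volts(load_step_amps, response_time_s, capacitance_f, esr_ohms):
    """Approximate sag on a rail fed only by a capacitor bank.

    First-order model: the step current causes an immediate drop across the
    bank's equivalent series resistance (ESR), then discharges the bank
    linearly until the regulator responds.
    """
    resistive_drop = load_step_amps * esr_ohms
    discharge_drop = load_step_amps * response_time_s / capacitance_f
    return resistive_drop + discharge_drop


# Hypothetical figures: a 50 A load step, a 1 microsecond regulator delay,
# a 2000 uF bank and 1 mOhm of combined ESR.
sag = droop_volts(50, 1e-6, 2000e-6, 1e-3)
print(f"Estimated sag: {sag * 1000:.0f} mV")
```

On a rail that sits at roughly a volt, a sag of a few tens of millivolts is a meaningful fraction of the supply, which is why both the size of the bank and its series resistance matter.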

POSCAPs are slow to buffer and discharge, but tend to hold lots of charge. MLCCs are much faster in both respects, but typically hold much less. Combining the two helps to maintain a consistent supply that's also reactive to extreme changes in GPU load. It's theorised that equipping the card with no MLCCs means that the card is unable to maintain a stable core voltage when the GPU dynamically boosts from a relatively low P-State to a high P-State.
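One way to picture the 'slow but large' versus 'fast but small' trade-off is to model each capacitor as a series resistor, inductor and capacitor, then compare the impedance of a whole bank at a low and a high frequency. The sketch below does this for a six-polymer bank and for a four-polymer plus twenty-MLCC bank (the two layouts named in EVGA's statement). The per-part values are illustrative assumptions, not datasheet or measured figures, so only the general trend should be read from it.

```python
import math

# Illustrative per-capacitor values (assumptions, not datasheet figures).
POLYMER = dict(capacitance_f=330e-6, esr_ohms=6e-3, esl_h=1.5e-9)
MLCC = dict(capacitance_f=22e-6, esr_ohms=3e-3, esl_h=0.5e-9)


def cap_impedance(freq_hz, capacitance_f, esr_ohms, esl_h):
    """Complex impedance of one capacitor modelled as a series R-L-C."""
    omega = 2 * math.pi * freq_hz
    return complex(esr_ohms, omega * esl_h - 1 / (omega * capacitance_f))


def bank_impedance(freq_hz, groups):
    """Impedance magnitude of groups of identical capacitors in parallel.

    groups is a list of (count, part) pairs; admittances simply add.
    """
    admittance = sum(count / cap_impedance(freq_hz, **part)
                     for count, part in groups)
    return abs(1 / admittance)


ALL_POLYMER = [(6, POLYMER)]
MIXED = [(4, POLYMER), (20, MLCC)]

for freq in (1e4, 1e7):
    print(f"{freq:>10.0f} Hz"
          f"  all-polymer {bank_impedance(freq, ALL_POLYMER) * 1e3:.2f} mOhm"
          f"  mixed {bank_impedance(freq, MIXED) * 1e3:.2f} mOhm")
```

With these assumed values the all-polymer bank has slightly lower impedance at 10 kHz, where bulk capacitance dominates, while the mixed bank is far lower at 10 MHz, where parasitic inductance dominates. That is the pattern the 'combine the two' argument relies on.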



Due to either a miscommunication or a misinterpretation of the NVIDIA specification document, some AIBs utilise the weaker capacitor configuration of 6 POSCAPs over a stronger one that mixes POSCAPs and MLCCs, potentially in an effort to save on costs (both in terms of materials and complications to the production line). Others have utilised a full 6 arrays of 10 MLCCs.

To be clear, some partner cards with an all-POSCAP solution may have been able to handle the large power draw spikes thanks to specific conditions - high quality GPU silicon requiring less power to reach higher frequencies, or relatively high ambient temperatures preventing the card from reaching the highest P-states. All of this makes diagnosing the issue more difficult.

AIB Responses

At the end of last week the story was still developing, with NVIDIA and AIBs both silent to press inquiries. The first to respond officially was EVGA, who issued the following statement on their user forums:

Recently there has been some discussion about the EVGA GeForce RTX 3080 series.

During our mass production QC testing we discovered a full 6 POSCAPs solution cannot pass the real world applications testing. It took almost a week of R&D effort to find the cause and reduce the POSCAPs to 4 and add 20 MLCC caps prior to shipping production boards, this is why the EVGA GeForce RTX 3080 FTW3 series was delayed at launch. There were no 6 POSCAP production EVGA GeForce RTX 3080 FTW3 boards shipped.

But, due to the time crunch, some of the reviewers were sent a pre-production version with 6 POSCAP’s, we are working with those reviewers directly to replace their boards with production versions. EVGA GeForce RTX 3080 XC3 series with 5 POSCAPs + 10 MLCC solution is matched with the XC3 spec without issues.

Also note that we have updated the product pictures at EVGA.com to reflect the production components that shipped to gamers and enthusiasts since day 1 of product launch.
Once you receive the card you can compare for yourself, EVGA stands behind its products!


This is the first and thus-far only official confirmation that in some cases the 6-POSCAP solution didn't meet the stability requirements of internal QA testing. EVGA are careful not to mention the RTX 3080 more widely, focussing only on their SKUs.

GALAX and Gainward responded similarly on their Chinese support forums, addressing their own products specifically.



ASUS may not feel the need to enter the conversation - their TUF Gaming RTX 3080 OC features 6 arrays of 10 MLCCs each and no POSCAPs, appearing to rule them out of contention for this issue. ZOTAC, GIGABYTE and MSI remain quiet for now, with the latter today quietly updating official product photos of their RTX 3080 GAMING cards to show configurations with two arrays of 10 MLCCs rather than the single array shown previously. Colorful withdrew review samples prior to launch, citing unaccounted-for instabilities.



All that being said, this seems like an issue that's fixable and should be resolved in future card revisions. Firmware and driver updates for affected cards might be issued to knock down or eliminate the top GPU Boost bin if that's the trigger for instability; an updated card design can be introduced for future variants (as occurred for the EVGA RTX 3080 FTW3); and AIBs will become more able to assess the capabilities of their silicon before binning appropriately. In fact it's likely that each of these is currently in the pipeline across all manufacturers.

So should we all be decrying NVIDIA in this? It would certainly be straightforward to do so: a design specification document that was insufficiently robust, and a QA process that certified GPUs which didn't meet their guidelines. Equally the affected AIBs could be in the firing line; who are they to play fast and loose with NVIDIA's guidelines in an effort to pinch pennies on an $800 card? Before throwing any party under the bus, however, remember that this is all speculation; a full accounting is likely to take many more weeks.

Mountains out of Molehills?


NVIDIA's RTX 30-series Founders Edition family coming up smelling of roses?


As noted, reports of problems with a product launch aren't uncommon. Astute readers may remember 2018's RTX 20-series launch, where rumours of poor voltage regulation quickly spread across the internet, only for retailers to report that card return rates weren't above industry norms and were in fact lower in many instances.

More technically-minded members of the tech press have taken it upon themselves to begin detailed testing of their review samples, searching for differences in the behaviour of cards with different capacitor specs. More confident conclusions will likely be made by those parties later this week.

The matter is complicated by the RTX 3080 and 3090's stringent PSU requirements compared to all prior GPU generations. Eliminating that particular variable while maintaining all other conditions won't be easy, particularly when reports are effectively being crowd-sourced. It's still possible that weaker PSUs are the root cause of the instabilities rather than the component selection by AIBs.

For now the only responsible course to take is to note that this continues to be speculation. Instability when overclocking, even to modest levels, is not uncommon; but returning a card that fails at stock settings is well within your rights as a consumer. It's wise to remember that.

In the meantime, early reports are that today's GeForce 456.55 Game Ready drivers alleviate some of the symptoms described. Hopefully this will turn out to be a long-term fix with negligible performance impact.

SOURCES: Igor's Lab, Absolutely Hardcore Overclocking, VideoCardz

