AMD Crimson Bug Locks Fan Speed, Overheats GPUs Due To Software Conflict
AMD's Radeon Software Crimson Edition was released to a hugely positive reception just last week but already appears to be suffering from the lack of an open beta testing phase. Four days ago reports began to surface of a small bug that has critical ramifications, a bug which may already have bricked a number of overclocked non-reference cards due to heat damage.
Eagle-eyed Reddit users of AMD GPUs reported that, having updated to RSCE and running monitoring tools, their GPU was getting pretty hot. Abnormally hot. Higher than 90C on an R9 280X hot. But all the while the GPU fan wasn't kicking into a higher gear to compensate, as would normally be the case with either custom fan profiles or fan speeds set to auto.
A quick delve into the Crimson Edition Global Overdrive Settings made the problem very clear indeed: for some reason the fan speed had defaulted and locked to 20% (or another fixed value), rather than operating on either manufacturer-specified auto speeds or a custom fan profile. Playing games or benchmarking for anything more than a short period would result in temperatures rocketing to high levels, especially on cards with sub-standard coolers. Changing the setting back to 'Off' would return behaviour to normal, but only until the next system restart.
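For anyone logging temperatures and fan speeds with a monitoring tool, the symptom described above (a hot GPU with a fan that never ramps up) can be checked with a few lines of Python. This is only a rough sketch: the function name, the 90C threshold (borrowed from the R9 280X reports) and the 5-point fan tolerance are assumptions for illustration, not anything AMD or the monitoring tools provide, and the readings have to be supplied by hand or from your own log.

```python
def fan_looks_stuck(samples, hot_c=90.0, min_span_pct=5.0):
    """Return True if the GPU got hot while the fan duty barely moved.

    samples: iterable of (temperature_c, fan_percent) readings.
    hot_c and min_span_pct are illustrative thresholds (assumptions).
    """
    samples = list(samples)
    if not samples:
        return False  # nothing logged, nothing to flag

    temps = [temp for temp, _ in samples]
    fans = [fan for _, fan in samples]

    got_hot = max(temps) >= hot_c
    # A healthy auto or custom profile should raise fan duty as temps climb.
    fan_flat = (max(fans) - min(fans)) < min_span_pct
    return got_hot and fan_flat


# Hypothetical log: temperature climbs while the fan sits at 20%.
stuck_log = [(60, 20), (80, 20), (92, 20)]
# Hypothetical log: fan ramps up as the card heats.
healthy_log = [(60, 20), (80, 45), (92, 70)]

print(fan_looks_stuck(stuck_log))
print(fan_looks_stuck(healthy_log))
```

A flat fan reading on a cool card is not flagged, since an idle GPU legitimately holds a low fan speed; the check only fires once the temperature crosses the chosen threshold.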
It's believed that the problem may be caused by a conflict with 3rd party overclocking software such as MSI Afterburner, Sapphire Trixx and GPU Tweak, and only on non-reference cards, but that isn't confirmed. Theoretically, confusion over which fan setting should take precedence may cause this behaviour, although it's surprising that the resolved state would be one which locks the fan speed rather than setting it to automated behaviour.
Unfortunately not everyone was lucky enough to catch the problem early. Threads on both /r/AMD and /r/PCMasterRace are littered with reports of bricked R9 280 and 290 GPUs, both of which would be out of warranty at this stage. More worrying still is that GPUs aren't heavily throttled or automatically shut down long before temps rise to dangerous levels.
For their part, AMD acknowledged the problem on social media yesterday and will be rolling out a hotfix today. This isn't as quick as anyone would like, least of all AMD themselves, but in falling over the Thanksgiving weekend the situation has turned out to be something of a perfect storm. You should be safe if you don't use 3rd party OC tools and have a reference card, but until the fix is issued you should probably hold off on gaming just in case.
This is not the first time driver releases have borked cards, and overheating in general has been an occasional issue for both camps over the years. Famously, in 2010 an uncapped frame rate in Starcraft 2's menu caused NVIDIA GPUs to overheat. Hopefully AMD will have a fix in place for users soon, but there's no word on compensation for owners of damaged cards.