The saga of dying Intel processors is not over: Intel has found another trigger

It might have seemed like Intel’s Raptor Lake (13th and 14th generation Desktop Core) instability and crashes were over—and Intel would have liked it to be—when BIOSes with microcode fixes started rolling out last month. But that’s not the case. The company has now issued a new statement saying that even the previous fixes are still not enough and owners will need one more update to prevent these processors from slowly failing.

On Wednesday, Intel released an update on the problem of instability and degradation of Raptor Lake processors on its community website (where it has been publishing information about the whole thing since the beginning, although it probably deserves more visibility due to the impact). This follows information from July in which Intel confirmed that the instability is a hardware issue caused by degradation due to the high voltages that processors are subjected to during normal use. We discussed it in previous articles.

Intel said at the time that analysis was ongoing to rule out other possible factors and confirm that the issue was indeed fully identified (and fixed). This week, the company added more information: it has completed its investigation and officially confirms that high voltage is indeed the root cause. This means that other, still unresolved factors behind the instability should now be ruled out. So that’s good news.

Four causes of degradation

However, the report also says that further mechanisms have been discovered through which this already identified cause (high voltage) reaches the processors. Although there is only one “murder weapon”, it is wielded in several ways. Intel now states that excessive voltage supplied to the processor can arise for a total of four reasons. The first three have already been addressed in some way:

1) Raising CPU power consumption above the recommended values. Intel still considers this to be part of the problem, which means the company still insists that boards should have power limits set according to the “Intel Default Settings” profiles, not unlocked limits that lead to higher performance. As a result, the processors will definitely perform below the level they showed in the reviews at release.

2) The second factor is that the processors ignored the “regulation” that the maximum boost frequency available with the so-called Thermal Velocity Boost should only be used at a temperature of up to 70 °C. That’s how it’s supposed to be officially, but we’ve mostly seen in reviews that this condition has been ignored in order to achieve higher performance and better benchmark scores. However, Thermal Velocity Boost also increases the voltage, and in combination with the higher temperature, the risk of chip damage also increases.

Intel said this summer that this was indeed a problem. Although this specification violation has been well known for some two years, Intel has now referred to the matter as a “bug found in microcode”. However, it is possible that, just like in the case of the power limits that no one respected, it was a deliberate attempt to increase performance by what was effectively overclocking, without officially acknowledging it.

Anyway, this “bug” has been fixed by microcode 0x125. It only applies to Core i9 models, for which the fix may slightly worsen the maximum performance in single-threaded programs (because Thermal Velocity Boost, when working correctly, won’t be activated as often as before).

3) The third cause of excessively high voltages, so far described as the main one in the sense that it probably bears the largest share of responsibility, is the voltage control in the processors themselves. This was too lax and allowed the processor to request dangerously high voltages in an attempt to compensate for voltage dips (Vdroop) during sudden load changes. It turned out, however, that instead of just compensating for the dip, this produced high voltage peaks, which gradually damage the processor until it stops working correctly.

Intel euphemistically refers to this degradation as “Vmin shift”, meaning that the voltage the processor needs to function correctly gradually increases. But this is nothing other than degradation. In layman’s terms, the CPU suddenly behaves as if it were overclocked and requires a higher voltage (or underclocking) for stability, even though it is not overclocked. As the condition gradually worsens, the added voltage would have to be higher and higher, the CPU’s power consumption gets worse, and sooner or later the processor will essentially fail.

Intel’s solution was to have voltage management limit the processor’s voltage requests to a maximum of 1.55 V, which should prevent dangerous spikes. However, it is difficult to say now whether they will be eliminated completely or only to an extent sufficient to significantly reduce the degradation. The fix came in microcode update 0x129, and it is strongly recommended that you apply it by flashing the latest BIOS to the board.

Intel Statement on Raptor Lake Processor Instability and Degradation Issue (September 25, 2024)

Credit: Intel, Cnews image

Another cause of dangerous voltage discovered, another update on the way

4) However, Intel recently added another, separate fourth problem, which apparently means that engineers needed to make yet another change in voltage management to eliminate dangerous voltage spikes. It is not entirely clear what exactly it consists of; Intel writes that the problem is that “microcode and BIOS requested (from voltage regulators) voltage that was too high”.

This wording probably means that the algorithms taking care of voltage control (and selection) were not precise enough or did not react quickly enough to fluctuations and had to be changed in various ways; it may not be a matter of fixing any one specific “bug”. According to Intel, the changes should mainly concern the behavior of the processor during low loads and inactivity. This confirms that the processors were not damaged only by long, heavy loads at high temperatures and power draw; the dangerous voltage fluctuations during rapid transitions between idle and boost were perhaps most to blame, as we concluded in previous articles. Therefore, you are not necessarily safe even if you don’t use the processor for demanding tasks and gaming, but “just for normal things”.

But whatever the details of this voltage control error, it will again be addressed by a microcode fix. This time the update is labeled 0x12B. This version incorporates the previous patches 0x125 and 0x129 (the label is a hexadecimal number, so 12B is greater than 129). So look for microcode version 0x12B or higher when you want to verify that your computer has already been fixed.
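The version check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes a Linux system, where /proc/cpuinfo reports a "microcode" field in hexadecimal, and the helper names (parse_microcode, fixes_present, is_fully_patched) are hypothetical, not part of any Intel tool. The comparison is only meaningful for desktop Raptor Lake processors, since other CPU families use unrelated revision numbers.

```python
import re

# Microcode revisions named in the article and the fix each one introduced.
# According to the article, 0x12B incorporates the earlier 0x125 and 0x129 fixes.
FIXES = [
    (0x125, "Thermal Velocity Boost temperature-limit fix"),
    (0x129, "1.55 V cap on voltage requests"),
    (0x12B, "idle / light-load voltage request fix"),
]


def parse_microcode(cpuinfo_text):
    """Return the first microcode revision in /proc/cpuinfo-style text, or None."""
    match = re.search(r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE)
    return int(match.group(1), 16) if match else None


def fixes_present(revision):
    """List the fixes covered by a given revision (assumes cumulative updates)."""
    return [name for minimum, name in FIXES if revision >= minimum]


def is_fully_patched(revision):
    """True if the revision is 0x12B or higher, as the article recommends."""
    return revision >= 0x12B


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            rev = parse_microcode(f.read())
    except OSError:
        rev = None  # not Linux, or the file is unreadable
    if rev is None:
        print("Could not determine the microcode revision.")
    else:
        print(f"microcode {rev:#x}, fully patched: {is_fully_patched(rev)}")
        for name in fixes_present(rev):
            print(" -", name)
```

Comparing the values as integers rather than as strings avoids mistakes such as treating "0x12B" and "0x12b" as different, or ordering hexadecimal labels alphabetically.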

Intel Core i9-13900K

Author: Ľubomír Samák

This microcode patch will be distributed as part of the motherboard’s UEFI (BIOS) update, so it must be delivered to the processor via this route. The CPU microcode itself cannot be replaced permanently; instead, the board loads the updated version every time the PC starts up. Raptor Lake processors will therefore always have to be operated in boards with corrected BIOSes, otherwise they will start damaging themselves again.

Intel is now working with motherboard manufacturers to release BIOSes that include this fix, but we don’t know exactly when they will be out; the company says it could take several weeks. If you have a 13th or 14th generation Core processor, check for a BIOS update from your motherboard manufacturer over the next two months or so, and install it as soon as it appears. The fact that it contains the 0x12B microcode will usually be mentioned in the change description. You will need a BIOS update with the fix even if you have a branded computer from HP, Acer, Asus, Dell, Lenovo and so on, in which case the update must be provided by the PC manufacturer.

The fix may slow down the processor slightly

Intel admits that the fix will have some negative impact on performance. The company says that according to its internal measurements, the drop in performance will usually be small, within normal testing variation. But that doesn’t mean CPUs won’t be consistently slower. If there were no impact on performance, Intel would have chosen a different wording (that “no significant impact on performance is expected”). Here, the effect is probably visible in the benchmarks to some degree.

The changes in voltage management probably require the ramp-up to maximum turbo boost, which depends on sufficient voltage, to be slowed down a bit, so the timing will probably be less aggressive now. Performance impacts are unlikely to occur for long-lasting, steady CPU loads (on higher i7 and i9 models), but may appear for short tasks or where the load keeps changing and being interrupted. Either way, take this as a necessary evil; there’s no point in rejecting the update because of the lost performance.

Next-gen Arrow Lake should be trouble-free

Intel further reiterated that the problem only affects desktop Core 13th and 14th generation processors (Raptor Lake, Raptor Lake Refresh), but not notebook models. And according to the company, it will not affect the new generation of Core Ultra 200 (Arrow Lake) processors on the LGA 1851 platform, which will be released next month.

Source: Intel

Source: www.cnews.cz