Original URL: http://www.reghardware.co.uk/2006/04/28/amd_opteron_fpu_bug/
Exclusive AMD today admitted it has inadvertently allowed a number of 2.6GHz and 2.8GHz single-core Opteron x52 and x54 processors, which could corrupt data under extreme conditions, to escape into the wild.
It is believed that the glitch is triggered when the affected chip's FPU is made to loop through a series of memory-fetch, multiplication and addition operations without any condition checks on the result of the calculations. The loop has to run over and over again for long enough to cause localised heating which, combined with high ambient temperatures, could cause the result of the operation to be recorded incorrectly, leading to data corruption.
To trigger the effect, the loop has to be run millions of times, an AMD customer source told Reg Hardware, potentially for hours at a time with no other operations being introduced during the run.
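For readers who want a picture of the kind of loop being described, here is a rough sketch in Python. It is purely illustrative: AMD's synthetic benchmark has not been published, the function names and sample values are our own invention, and the Python interpreter adds branching and overhead of its own, so this will not reproduce the fault. It only shows the pattern of fetch, multiply and add with no check on the result, next to a variant that does test the running total on every step.

```python
def multiply_accumulate(a, b, passes):
    """Repeat fetch, multiply, add over two arrays; never inspect the running sum."""
    total = 0.0
    for _ in range(passes):
        for x, y in zip(a, b):
            total += x * y  # fetch two operands, multiply them, add to the total
    return total


def multiply_accumulate_checked(a, b, passes):
    """Same arithmetic, but with a condition check on the result at every step."""
    total = 0.0
    for _ in range(passes):
        for x, y in zip(a, b):
            total += x * y
            if total != total:  # true only for NaN: a compare on the result
                raise ArithmeticError("accumulator became NaN")
    return total


if __name__ == "__main__":
    # Hypothetical sample data, chosen only so the arithmetic is exact.
    a = [0.5, 1.5, 2.0]
    b = [2.0, 4.0, 0.25]
    print(multiply_accumulate(a, b, 1000))
    print(multiply_accumulate_checked(a, b, 1000))
```

In the scenario described, the first form would have to run uninterrupted for hours; the second is the sort of loop that, according to the account given to Reg Hardware, would not be affected.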
According to the source - who claimed to be party to emails highlighting the issue and sent by AMD to a number of the chip maker's major customers and partners - AMD has investigated the problem and found it was only able to reproduce the bug's effects in a synthetic benchmark test.
The problem is believed to affect only a fraction of the chips shipped - perhaps no more than 3,000 individual CPUs - which managed to slip through AMD's screening net. It is not known how this so-called 'test escape' occurred, but it took place "in part of 2005 and early 2006", an AMD spokesman said.
AMD said it has introduced another screening test to catch any further affected parts. Chips caught by this test in future will be re-rated at a lower clock speed to prevent the problem. The company is also working with OEMs to identify affected parts and contact customers who could be affected. Those who are will be offered free replacements.
AMD stressed the problem was due to "a convergence of three specific simultaneous conditions", not a fault with the Opteron architecture. The company claimed the issue had not been observed on systems running commercially available applications.
"It's very hard to imagine this type of [tight FP loop] code in our [financial services] environment," Reg Hardware's source said. "The only thing I could think that would be coded this way would be some type of strange cipher code. For example, any type of 'for' loop that uses a compare operation would not have the problem." ®
AMD unveils 'next gen' CPU plans (16 May 2006)
http://www.reghardware.co.uk/2006/05/16/amd_next_gen/
AMD VP sees more Opteron growth and a Dell win (10 May 2006)
http://www.theregister.co.uk/2006/05/10/amd_henri_interview/
Cray places order for Opteron Helper (3 May 2006)
http://www.theregister.co.uk/2006/05/03/cray_drc/
AMD quad-core Opterons to gain L3 cache in 2008? (3 May 2006)
http://www.reghardware.co.uk/2006/05/03/amd_quad-core_roadmap/
AMD reportedly 'delays' dual-core Turion debut (3 May 2006)
http://www.reghardware.co.uk/2006/05/03/amd_reschedules_dual-core_turion/
Investors practice tough love with AMD (13 April 2006)
http://www.theregister.co.uk/2006/04/13/amd_slip/
Opteron carries AMD to banner Q1 (12 April 2006)
http://www.theregister.co.uk/2006/04/12/amd_q1/
AMD readies Opteron 2xx, 8xx speed bump (30 March 2006)
http://www.reghardware.co.uk/2006/03/30/amd_preps_opteron_update/
AMD ponders maths boost to quad-core Opterons (15 March 2006)
http://www.reghardware.co.uk/2006/03/15/amd_clearspeed_opteron_maths_co-pro/
AMD kicks Opterons up a notch (6 March 2006)
http://www.reghardware.co.uk/2006/03/06/opteron_boost/
'All' AMD CPUs to support virtualisation mid-2006 (7 February 2006)
http://www.reghardware.co.uk/2006/02/07/amd_pacifica_support/
AMD quad-cores to use DDR 2-oriented Socket F interconnect (27 January 2006)
http://www.reghardware.co.uk/2006/01/27/amd_quad-core_socket_f/
AMD's server share is no chimera (26 January 2006)
http://www.reghardware.co.uk/2006/01/26/amd_q4_2005_market_share/