Inside the Pentium II Math Bug

By Robert R. Collins

 


Just two days before its biggest processor announcement in years, Intel was hit by reports of a math bug in its Pentium Pro and (soon-to-be-announced) Pentium II processors. The bad timing prompted reports that the bug disclosure was deliberately timed to coincide with the Pentium II announcement, thereby maximizing the embarrassment to Intel. Another early rumor put AMD behind the bug report. Yet another industry rumor said that the Pentium II used for the tests was illegally obtained. As intriguing as these theories may be, none of them is true. How do I know? Because I wrote the bug report. The news media and Internet community knew the bug as the Dan-0411 bug. Intel had its own name for it: the Flag Erratum.

The Facts

I received e-mail from "Dan" who asked if I could reproduce what he thought was a bug in the Pentium Pro. After contemplating my involvement for ten days, I finally decided to help out (see the accompanying text box). I wrote an assembly-language program that checked into the problem. I ran the test on Pentium Pro, Pentium II, Pentium classic (P54C), Pentium MMX (P55C), and AMD K6 processors. (I had purchased the Pentium II over the counter at Fry's Electronics in Sunnyvale, California, six weeks before its official introduction. There was nothing illegal about the acquisition of the Pentium II processor.) After running the test on these various processors, I came to the conclusion that a bug did exist in the Pentium Pro and Pentium II.

Why Dan-0411? These days, astronomers name new stars and comets by combining the discoverer's name and some number. Why should microprocessor bugs be different? In this case, "Dan" is the discoverer of the bug, and 04-11 (1997) is the date on which I got my first e-mail about it. So I've named the bug "Dan-0411" after its discoverer and the date he first reported it to me. (Please refer to http://www.rcollins.org/secrets/Dan0411.html for the text of the original bug announcement.)

What is the Bug and What Does it Affect?

The bug relates to operations that convert floating-point numbers into integer numbers. All floating-point numbers are stored inside of the microprocessor in an 80-bit format. Even though the external representation of a number may not be an 80-bit format, once the number is loaded into the microprocessor, it is converted to an 80-bit format. Integer numbers are stored externally in two different sizes. A short integer is stored in 16 bits, and a long integer is stored in 32 bits. It is often desirable to store the floating-point numbers as integer numbers. On occasion, the converted numbers won't fit into the smaller integer format. This is when the bug occurs.
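As a rough illustration of when a converted number "won't fit," the following Python sketch checks a value against the 16-bit and 32-bit signed ranges described above. It is not taken from the FISTBUG source; the names INT_RANGES and fits_in_integer are hypothetical, and the sketch assumes round-to-nearest-even rounding, which is what Python's built-in round() does.

```python
# Illustrative sketch only; names are hypothetical and not from FISTBUG.
INT_RANGES = {
    16: (-2**15, 2**15 - 1),  # 16-bit signed integer store
    32: (-2**31, 2**31 - 1),  # 32-bit signed integer store
}


def fits_in_integer(value: float, bits: int) -> bool:
    """Return True if value, rounded to nearest-even, fits in a signed integer of this width."""
    lowest, highest = INT_RANGES[bits]
    return lowest <= round(value) <= highest


if __name__ == "__main__":
    for value in (1234.5, -32768.0, -40000.0, -3.0e9):
        for bits in (16, 32):
            verdict = "fits" if fits_in_integer(value, bits) else "overflows"
            print(f"{value} {verdict} in {bits} bits")
```

A value such as -40000.0 fits in 32 bits but not in 16, so only the 16-bit store is a candidate for the conversion error discussed next.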

The microprocessor is supposed to warn the host software when such a floating-point conversion error occurs by setting a specific error flag in a floating-point status register. If the microprocessor fails to set this flag, it does not comply with the IEEE Floating Point Standards, which mandate such behavior. In the Dan-0411 bug, the Pentium II and Pentium Pro fail to set this error flag in many cases.

It is interesting to note that a launch failure of the Ariane 5 rocket, which happened less than a minute into the launch, was traced to behavior around an overflow condition. In this case, it was a software bug, not a microprocessor bug, that caused the problem. One of the computers on board had a floating-point to integer conversion that had overflowed. The overflow was not expected and, therefore, not detected by the computer software. As a result, the computer did a dump of its memory. Unfortunately, this memory dump was interpreted by the rocket as instructions to its rocket nozzles. Result: Boom!

The case of the Ariane rocket is a sensational example of the drastic consequences of an unhandled float-to-integer overflow. Pentium Pro and Pentium II users, on the other hand, are most likely to see the results of this bug in their graphics displays or in heavy-duty numerical analysis programs. Intel says ordinary users might see a temporary screen glitch on some games when this bug occurs.

The Nature of the Bug

The Dan-0411 bug occurs when a large negative floating-point number is stored to memory in an integer format. Under normal operation, the largest negative integer (MAXNEG) is stored in memory when a floating-point number is too large to fit in the integer format, and the FPU Status Word is supposed to indicate that an Invalid Operand Exception (#IE) occurred (FSW.IE = 1). Floating-point numbers that overflow the "real number" format are supposed to behave differently from floating-point numbers that overflow the "integer number" format. Float-to-real overflows are supposed to set the overflow flag (FSW.OE = 1); float-to-integer overflows are supposed to set the Invalid Operand Exception flag (FSW.IE = 1). Section 7.8.4 of the Pentium Pro Family Developer's Manual, Volume 2 makes this difference quite clear:

Float-to-real overflows:

The FPU reports a floating-point numeric overflow exception (#O) whenever the rounded result of an arithmetic instruction exceeds the largest allowable finite value that will fit into the real format of the destination operand. For example, if the destination format is extended-real (80 bits), overflow occurs when the rounded result falls outside the unbiased range of -1.0 x 2^16384 to 1.0 x 2^16384 (exclusive). Numeric overflow can occur on arithmetic operations where the result is stored in an FPU data register. It can also occur on store real operations (with the FST and FSTP instructions), where a within-range value in a data register is stored in memory in a single- or double-real format. The overflow threshold range for the single-real format is -1.0 x 2^128 to 1.0 x 2^128; the range for the double-real format is -1.0 x 2^1024 to 1.0 x 2^1024.

Float-to-integer overflows:

The numeric overflow exception cannot occur when overflow occurs when storing values in an integer or BCD integer format. Instead, the invalid-arithmetic-operand exception is signaled.
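The documented behavior for a float-to-integer store with exceptions masked can be sketched in Python as follows. This is a simplified model under stated assumptions, not Intel's implementation and not the FISTBUG test code: the function name expected_fist is hypothetical, only round-to-nearest-even is modeled, and the flag constants use the x87 status-word bit positions (IE is bit 0, OE is bit 3, PE is bit 5).

```python
import math

# x87 FPU Status Word exception flag bits.
FSW_IE = 1 << 0  # invalid operation
FSW_OE = 1 << 3  # numeric overflow (never set by an integer store)
FSW_PE = 1 << 5  # precision (inexact result)


def expected_fist(value: float, bits: int) -> tuple:
    """Model the documented masked-exception result of a float-to-integer store.

    Returns (stored_integer, fsw_flags). Hypothetical helper for illustration;
    only round-to-nearest-even rounding is modeled.
    """
    maxneg = -(1 << (bits - 1))
    maxpos = (1 << (bits - 1)) - 1
    if math.isnan(value) or math.isinf(value):
        # NaNs and infinities cannot be represented as integers either.
        return maxneg, FSW_IE
    rounded = round(value)
    if rounded < maxneg or rounded > maxpos:
        # Float-to-integer overflow: store MAXNEG and flag invalid, not #O.
        return maxneg, FSW_IE
    # In range: the precision flag is set only if rounding changed the value.
    flags = FSW_PE if rounded != value else 0
    return rounded, flags


if __name__ == "__main__":
    for value in (100.0, 1234.5, -40000.0):
        stored, flags = expected_fist(value, 16)
        print(f"{value}: stored={stored} flags={flags:#04x}")
```

In this model an out-of-range operand never sets FSW.OE and always sets FSW.IE, which is the distinction the manual draws between the two kinds of overflow.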

On the Pentium Pro and Pentium II, however, the Invalid Operand Exception (FSW.IE) bit is not set when the bug occurs; only the precision exception (FSW.PE) bit is set. The precision-exception flag indicates that a computation can't be precisely represented by the floating-point operation - in this case, the float-to-integer store operation. In most cases, this bit is ignored by programmers. Therefore, when the conditions are met for the Dan-0411 bug to occur, programmers may never know that an error occurred. If that isn't bad enough, it gets worse. The Dan-0411 bug occurs for three out of four rounding modes, and whether exceptions are masked or unmasked. In the case of masked exceptions, the correct value is stored to memory; only the Floating-Point Status Word (FSW) is incorrectly set. For unmasked exceptions, the errant behavior is more serious:

  • No exception occurs. The floating-point exception handler is not invoked. Therefore, the errant condition is undetectable.
  • MAXNEG is returned to memory. Storing MAXNEG to memory is an errant condition. When exceptions are unmasked, nothing is supposed to be stored to memory. This means the microprocessor is spuriously storing data to memory when no data is expected.
  • In the case of the FISTP instruction, the floating-point value is popped from the floating-point stack. When exceptions are unmasked, the floating-point stack is supposed to remain unchanged to allow for error recovery. In this case, the value is popped from the stack and gone forever. Even if the errant condition was detectable, it would be unrecoverable after the FISTP instruction.
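The unmasked-exception symptoms listed above can be summarized as a small checklist. The Python sketch below is hypothetical and only classifies observations that a test harness would have to collect by other means; the Observation fields and the unmasked_symptoms function are illustrative names, not part of FISTBUG, and the flag constant assumes the x87 status-word layout (IE is bit 0).

```python
from dataclasses import dataclass

FSW_IE = 1 << 0  # x87 invalid-operation flag


@dataclass
class Observation:
    """What a (hypothetical) harness saw after FISTP of an out-of-range
    operand with the invalid-operation exception unmasked."""
    fsw: int               # FPU Status Word after the store
    handler_invoked: bool  # did the floating-point exception handler run?
    memory_written: bool   # was the destination operand modified?
    stack_popped: bool     # was the source value popped off the FPU stack?


def unmasked_symptoms(obs: Observation) -> list:
    """Return the list of deviations from the documented unmasked behavior."""
    symptoms = []
    if not (obs.fsw & FSW_IE):
        symptoms.append("FSW.IE not set")
    if not obs.handler_invoked:
        symptoms.append("exception handler not invoked")
    if obs.memory_written:
        symptoms.append("value stored to memory")
    if obs.stack_popped:
        symptoms.append("operand popped from FPU stack")
    return symptoms


if __name__ == "__main__":
    documented = Observation(fsw=FSW_IE, handler_invoked=True,
                             memory_written=False, stack_popped=False)
    errant = Observation(fsw=1 << 5, handler_invoked=False,
                         memory_written=True, stack_popped=True)
    print(unmasked_symptoms(documented))
    print(unmasked_symptoms(errant))
```

An empty list corresponds to the documented behavior; the errant observation, with only the precision flag set, reports every symptom described in the list above.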

The Chronology of Dan-0411

Friday, April 11th, 1997. "Dan" sends me e-mail, saying he thinks he's discovered a math bug in the Pentium Pro. Dan doesn't have the means to write assembly-language source code to verify the bug, so he contacts me. Initially, I decide not to get involved because this could be a real hot potato. I've already had my problems with Intel and I don't want any more - especially if this turns out to be a serious bug. My web site receives a typical 31,258 hits on this day.

Monday, April 21st, 1997. After more pressure from Dan and discussing the possible consequences with friends, I decide to write the source code that determines whether or not Dan has found a bug. Intel stock closes at approximately $140 per share. My web site receives 29,795 hits.

Tuesday, April 22nd, 1997. I send Dan confirmation that he has found a bug in the Pentium Pro. I also tell Dan that I have access to a Pentium II, and the bug also appears in Intel's newest microprocessor. Intel stock closes at near $142 per share. My web site receives 30,707 hits.

Tuesday, April 29th, 1997. I post a message to comp.sys.intel saying that I think a new math bug exists in the Pentium Pro and Pentium II. (I expected my message would be largely ignored as a "troll" for a flame fight. As I shortly found out, mixing the words "MAJOR," "FLOATING POINT BUG," and "PENTIUM PRO" in a message subject line is a volatile combination.) Intel stock closes near $150 per share. My web site receives 35,720 hits.

Wednesday, April 30th, 1997. An Intel representative calls me on the phone wanting any information I can give him. I tell him that I'm pretty sure it's a bug, but that I don't have Dan's permission to give him any further information. Intel stock closes near $154 per share. My web site receives 37,959 hits.

Thursday, May 1st, 1997. I start getting strange phone calls about the bug. The first comes from an "investor" with a British accent who only cares about the authenticity of the Usenet report. After I try to explain the possible insignificance of the bug, he says "I don't care, I'm an investor," then promptly hangs up. Fifteen minutes later, I receive a second phone call - this time from a man claiming to be an attorney. He claims to represent clients who file class-action product-defect lawsuits. I don't bite his bait and become suspicious. I ask who his client is and why he's poking around this bug issue. The "lawyer" becomes defensive and refuses to tell me who he works for or to explain his interest in this bug report. Finally, he threatens me with a libel lawsuit if I can't prove that a math bug really exists in the Pentium Pro and Pentium II. I automatically assume that this lawyer is somehow connected with the investor who called me a few minutes earlier. This just couldn't be a coincidence.

Unnerved by the phone call, I call my contact at Intel. He denies that any Intel attorneys are involved. He promises to keep the attorneys off my back. He uses the opportunity to press me for further details about the bug. Again I decline, citing my lack of Dan's permission to give him any more information. Intel stock closes near $155 per share. My web site receives 42,231 hits.

Friday, May 2nd, 1997. CNet reporter Brooke Crothers sends e-mail to ask me for confirmation of the bug report. Crothers doesn't wait for my response, and posts the breaking story on the CNet web site (http://www.cnet.com/). Intel contacts me one last time before the coming weekend. Again, I decline to give any further information. At 11:00 PM, I get a call to notify me about the CNet article. I had no prior knowledge of its existence. Intel stock closes near $158 per share. My web site receives 45,272 hits.

Saturday, May 3rd, 1997. With the CNet article published, the threat of a libel lawsuit begins to weigh heavily on me. I know that I must write the bug report or risk legal consequences. I convince Dan that the report must be made public. I decide to post a message on Usenet informing the Internet community that the bug report will be made public on May 5th, at 0900 PST. I send a copy of the message to Intel. The stock market is closed for the weekend. Typically, my web site traffic decreases by 20% on weekends. Instead, activity increases as the pressure to publish the bug builds. My web site receives 52,581 hits.

Sunday, May 4th, 1997. I work all day Saturday and Sunday perfecting the article and giving it to various individuals for peer review. I also perfect the assembly-language source code and decide to offer it, as well as binary executable programs, to anybody who wants to detect the bug.

By 10:00 PM, I have packaged the article and source code for publication. Sunday is typically the slowest day at my web site. I usually receive fewer than 20,000 hits. On this day, I receive 39,654 hits.

Monday, May 5th, 1997. I make some finishing touches to the article. I'm finally finished at 0830 PST, just 30 minutes before publication. (I send a copy to Intel before the official publication time.) Even though the 0900 hour hasn't come, reporters are already calling me in my office and on my cellular telephone. I don't give any advance information, yet their articles run before the official 0900 publication time.

Within minutes of publication, e-mail messages start rolling in from various press organizations: CNN, MSNBC, Wall Street Journal, EE Times, CNet, PC Week, and others. Luckily, I forget to take my cellular phone with me and am unable to take their phone calls; thus, they send their interview requests via e-mail. With news of the bug report, web site activity soars. Intel stock closes near $163 per share. My web site receives 349,418 hits.

The bug report goes international. I start getting interview requests from France, Germany, Australia, Japan, and other nations. Various attempts are made to discredit the bug report or to minimize it.

Tuesday, May 6th, 1997. The bug report shows up in various print publications. The online publications report the bug as a top story, eclipsing Intel's Pentium II announcement. Reports start surfacing that I'm getting revenge on Intel for its legal action against me. (No legal action ever occurred.) Another report surfaces that AMD is behind the bug report. Another report surfaces that this bug report was calculated to be timed with the Pentium II announcement (even though a rudimentary analysis of the chronology that appeared in the original article would have refuted such claims).

Two industry analysts downgrade Intel's stock from "buy" to "neutral" because of the bug report. After an initial surge in Intel's stock price, it loses five points to close near $162 per share. My web site receives an all-time record of 503,989 hits.

Wednesday, May 7th, 1997. I get a private e-mail from a guy (I'll call him Mr. X) who claims that I may have made a mistake in my bug analysis. He claims that the bug may be much more severe than I had originally reported. He also claims to have spoken with me in the week prior, and that he's an investor in AMD, but not selling short on Intel stock. (A strange set of information.) Within minutes, CNet sends me e-mail asking me to confirm this guy's claims. I wonder how CNet got a copy of this e-mail, and become suspicious. CNet claims Mr. X sent them a copy of his e-mail to me.

Intel's stock loses three more points to close under $160 per share. My web site receives 345,531 hits.

Later that night, I call CNet and ask a few questions of my own. I want to know how they learned of this story - reporting it three days before their competition. They tell me Mr. X tipped them off.

Thursday, May 8th, 1997. I confirm the claims by Mr. X. My original article contained an ambiguous paragraph that minimized the severity of the bug. I fix the ambiguity and call CNet to confirm Mr. X's claims. By now, I'm a little suspicious of Mr. X, and I wonder if CNet has actually talked to him. They confirm they have, and tell me he's got a British accent. I become convinced that Mr. X is the same investor who called me on May 1st. Both have a British accent and both claim to have spoken to me.

After I confirm the ambiguity in my original report, a second round of press reporting is sparked. Now, everybody's reporting that the bug is more severe than originally anticipated. Intel stock starts to regain some lost ground, closing near $161 per share. My web site receives 262,937 hits.

Friday, May 9th, 1997. I go on vacation. I need to get away.

Intel posts its response after the stock market closes. The bug is much more severe than originally discovered during my analysis.

I get a phone call from a representative of Senator Torricelli's (D-NJ) office in Washington, DC, telling me about legislation introduced in the Senate that I might be interested in (not related to the bug, though). He mentions that my web site is reasonably well known in the Capitol. Intel stock continues to inch back up, closing near $162 per share. My web site receives 234,809 hits.

Saturday, May 10th, 1997. I appear on a computer radio show. Afterwards, life gets back to normal. The stock market is closed for the weekend. My web site receives 120,072 hits. Slowly, my web site traffic returns to normal, though "normal traffic" appears to be more than double the level it was before the bug was reported. My 15 minutes of fame are now over.

Why Wasn't this Bug Detected Before?

I'm not sure why this bug wasn't detected sooner, but there are clues that could help provide an explanation. Professor William Kahan of the University of California, Berkeley, has written a suite of floating-point test programs in FORTRAN (see http://http.cs.berkeley.edu/~wkahan/). These programs are commonly used to test the Float-to-Integer Store instructions (FIST and FISTP). Dan ported Dr. Kahan's FORTRAN programs to C and ran the tests against the Pentium Pro - this is when the bug came to light. So in the end, there are three possibilities: Intel failed to run Dr. Kahan's test on the Pentium Pro, Intel misconfigured the program, or a FORTRAN compiler hid the bug in the chip.

Source Code and Programs

Source code and two executable programs are available for download. The programs are executable versions of the stand-alone assembly-language source code. The first program, FISTBUG.EXE, demonstrates the bug in a straightforward manner. When you run the program, all that appears on the screen is either the simple message "*** Dan-0411 bug found. ***", or "Dan-0411 not found." The second program, FISTBUGV.EXE, runs exactly the same tests as the first, but is much more verbose. This program shows the microprocessor stepping information and itemized results. Each operand under test is printed to the screen, along with pass/fail status for four different testing methods.


View results of FISTBUG

http://www.rcollins.org/ftp/source/fistbug/fistbug.res

Source Code Availability

View source code for FISTBUG.EXE and FISTBUGV.EXE
http://www.rcollins.org/ftp/source/fistbug/fistbug.asm
http://www.rcollins.org/ftp/source/fistbug/makefile

Executable Programs

Download FISTBUG.EXE and FISTBUGV.EXE binary executables.
http://www.rcollins.org/ftp/source/fistbug/fistbug.exe
http://www.rcollins.org/ftp/source/fistbug/fistbugv.exe
http://www.rcollins.org/secrets/Dan0411.zip

The Entire FISTBUG Archive

Download fistbug.zip archive. Archive contains source code, binary executables, and my results.
http://www.rcollins.org/ftp/dloads/fistbug.zip

