### Floating point data types

In C on an x86 system:

- A `float` is 32 bits.
- A `double` is 64 bits.
- A `long double` is compiler-dependent. A `long double` in gcc contains 80 bits of data, although it's padded in storage to maintain alignment: 96 bits (12 bytes) on 32-bit x86 and 128 bits (16 bytes) on x86-64. Visual C++ treats a `long double` as 64 bits, just like a `double` (and has been criticized for doing so; see here and here).

I had been under the impression that `long double`s were nonstandard or were a recent addition to the standard, but as far as I can tell, they've been part of standard C since C89, well before C99.

### Selecting a floating point data type

See this portion of another post.

### Floating point storage

This is covered on many sites; the most concise and thorough description I've found is in chapter 2 of Sun's Numerical Computation Guide.

### NaNs and Infinity

See my previous post.

### Comparing floating point numbers

A simple equality comparison (such as `a == b`) will often fail, even for values that you would expect to be equal, due to rounding errors (especially rounding introduced by converting between binary and decimal). (Do any compilers or static code analysis tools emit warnings if you try to do a naive equality comparison?)

- The simplest solution is to check if two floating point numbers are within a small developer-chosen epsilon value of each other; this is the approach used by the CUnit and CppUnit testing frameworks, but it has the disadvantage of requiring you to choose an epsilon value yourself and make sure that it's appropriate for the data being compared.
- "Comparing Floating Point Numbers," by Bruce Dawson, discusses this absolute epsilon approach as well as more sophisticated approaches such as comparing using a relative epsilon and comparing based on the number of possible floating point values that could occur between the two operands.
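- The simplest good approach, provided by CERT, is to compare using compiler-provided absolute epsilons (`FLT_EPSILON`, `DBL_EPSILON`, and `LDBL_EPSILON` from `<float.h>`). (As I understand it, CERT's approach should be equivalent to Dawson's approach with a max ulps of 1, though only for values near 1.0, since that is where the epsilon equals one ulp.)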

### Handling floating point exceptions

See this post.

### Low-level floating point calculations

If you need to do floating point work at the assembly level, use the Intel Software Developer's Manuals as a reference. Volume 1 has some background on the FPU; volume 2A contains most floating point instructions (since they start with F).

### Example code

Rather than simple math, the following code covers manipulating floating point numbers' bit representations, handling NaNs and infinities, and so on.

- This earlier post has several code snippets.
- The Google C++ Testing Framework has a FloatingPoint class template in include/gtest/internal/gtest-internal.h and an associated FloatingPointTest test fixture in test/gtest_unittest.cc. Although FloatingPoint is intended for use by the framework, it should be general enough to be useable elsewhere.

### For further reading

In no particular order...

- "What Every Computer Scientist Should Know About Floating-Point Arithmetic," by David Goldberg. Very thorough and math-intensive. Despite the title, I haven't finished reading this paper.
- "Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic," by W. Kahan. Kahan is the primary architect of the IEEE 754 floating point standard; the section entitled "Ruminations on Programming Languages" is particularly interesting.
- "Comparing Floating Point Numbers," by Bruce Dawson. Also referenced above.
- The Floating Point Arithmetic section of the CERT Secure Coding Standards. Primarily a list of "gotchas" which experienced programmers hopefully already know to avoid.
- Wikipedia's article on floating point and linked articles. Honestly, I've found Wikipedia's coverage to be a bit sprawling; hence my attempt at a shorter, more targeted list.

**EDIT:** (3/15/2009) Added sections on "Selecting a floating point data type" and "Handling floating point exceptions."
