Monday, December 8, 2008

NaNs, Uninitialized Variables, and C++

I've become a big fan of fail-fast programming, or what Matthew Wilson colorfully refers to as "hairshirt programming." In other words, if I make a mistake in my code, or if I use a piece of code in a way that violates the assumptions under which it was written, I want to know as soon as possible. C++ is fairly good at doing this at compile time, thanks to a static, reasonably strong, extensible type system, and thanks to add-on features like BOOST_STATIC_ASSERT. For problems that can't be detected at compile time, the liberal use of assertions can help enforce preconditions and postconditions, and classes like std::string and boost::array add error checking over C-style strings and arrays.
When I first read about signaling NaNs, I thought that they would be a wonderful addition to this fail-fast toolbox. As documented, a signaling NaN is basically a magic value that, when assigned to a floating point variable, causes any attempt to use that variable to raise an exception immediately. This could be an immensely useful tool for tracking down uninitialized variable use and for tracing the operation of legacy code (to answer questions such as, "Is this variable really unused along this code path?"). Unfortunately, signaling NaNs don't work nearly as well as advertised. But first, some background...
NaNs are used in the IEEE 754 floating point standard to describe indeterminate forms such as 0/0, ∞ - ∞, 0 * ∞, and so on. NaNs are not the same thing as infinity (either positive or negative infinity). Attempting to do arithmetic on a NaN (almost always) results in another NaN. So if 0/0 is not a number, then 0/0 + 1 is also not a number, and 0/0 * 100 is not a number, and so on. Under some circumstances, this NaN propagation may be what you want; under other circumstances, you may want to catch NaNs as soon as possible (to invoke error recovery, or to launch a separate calculation path, or whatever).
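The propagation rule is easy to check for yourself. Here is a minimal sketch (the function names are mine, purely for illustration, and it assumes the compiler is not running in a fast-math mode) that produces a NaN from 0/0 at run time, confirms that arithmetic on it yields another NaN, and relies on the fact that a NaN is the only value that compares unequal to itself:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Produce a NaN at run time from the indeterminate form 0/0.
// The zero is read through a volatile so the division is not folded away.
inline double make_nan()
{
    volatile double zero = 0.0;
    const double numerator = zero;
    const double denominator = zero;
    return numerator / denominator;
}

// A NaN is the only floating point value that compares unequal to itself.
inline bool is_nan_by_comparison(double x)
{
    return x != x;
}

// Check the propagation claims from the text with plain assertions.
inline void nan_propagation_demo()
{
    const double x = make_nan();
    assert(is_nan_by_comparison(x));
    assert(is_nan_by_comparison(x + 1.0));    // 0/0 + 1 is still a NaN
    assert(is_nan_by_comparison(x * 100.0));  // 0/0 * 100 is still a NaN

    // Infinity is a different thing: it equals itself, and inf + 1 is inf.
    const double inf = std::numeric_limits<double>::infinity();
    assert(!is_nan_by_comparison(inf));
    assert(inf + 1.0 == inf);

    // But inf - inf is one of the indeterminate forms, so it is a NaN.
    assert(std::isnan(inf - inf));
}
```

Calling nan_propagation_demo() from main() should run silently; none of this involves exceptions, because everything here is a quiet NaN.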
Presumably to accommodate this, IEEE 754 distinguishes between quiet NaNs (which quietly propagate when used) and signaling NaNs (which throw exceptions when used). NaNs are represented by storing certain bit patterns in place of a "normal" floating point number. For details, see here. Interestingly, many of the bits in a NaN encoding are unused. (I.e., a NaN can be represented by many different bit patterns.) I've seen suggestions that the unused bits could be used to encode a line number or module number, although I've seen no code that takes advantage of this.
C++ supports quiet and signaling NaNs through its std::numeric_limits class template. For example:
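#include <cstring>
#include <iostream>
#include <limits>

int main(int argc, char *argv[])
{
  if (std::numeric_limits<double>::has_quiet_NaN) {
      double d = std::numeric_limits<double>::quiet_NaN();
      // Copy the raw bits out with memcpy instead of casting the pointer,
      // which would violate the strict aliasing rules.
      // Assumes sizeof(unsigned long long) == sizeof(double).
      unsigned long long bits;
      std::memcpy(&bits, &d, sizeof(bits));
      std::cout << "The bit pattern of a quiet NaN on your platform is 0x"
           << std::hex << bits << std::endl;
  } else {
      std::cout << "You do not have a quiet NaN for doubles" << std::endl;
  }
}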
Similarly, std::numeric_limits<double>::signaling_NaN() returns a signaling NaN. Unfortunately, as far as I can tell, this feature of the Standard C++ Library is completely useless:
  • If floating point exceptions are enabled, then processing a signaling NaN raises an exception. Merely returning the value from std::numeric_limits<double>::signaling_NaN() counts as processing it, so the call itself raises the exception.
  • If floating point exceptions are disabled, then processing a signaling NaN transforms it to a quiet NaN.
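The second bullet can be observed directly. The following is a sketch, assuming IEEE 754 doubles, a 64-bit target that does its arithmetic in SSE or a similar unit, a C++11 <cfenv>, and the usual convention that the top fraction bit marks a quiet NaN. The struct and function names are hypothetical. With traps left in their default (masked) state, adding 1 to a signaling NaN only sets the FE_INVALID status flag and hands back a quiet NaN:

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>
#include <cstring>

struct SnanAddResult {
    bool invalid_flag_set;  // FE_INVALID was raised by the addition
    bool result_is_nan;     // the sum is still a NaN
    bool result_is_quiet;   // ...and its quiet bit is now set
};

inline SnanAddResult add_one_to_snan()
{
    // Assumption: bit 51 of an IEEE 754 double is the "quiet" bit.
    const std::uint64_t quiet_bit = 0x0008000000000000ULL;

    std::feclearexcept(FE_ALL_EXCEPT);

    // Read the bits through a volatile so nothing is folded at compile time.
    volatile std::uint64_t source = 0x7ff0000000000001ULL;  // signaling NaN
    const std::uint64_t snan_bits = source;
    double snan;
    std::memcpy(&snan, &snan_bits, sizeof snan);

    // The volatile store forces the addition to happen before the flag test.
    volatile double sink = snan + 1.0;
    const bool raised = std::fetestexcept(FE_INVALID) != 0;

    const double sum = sink;
    std::uint64_t sum_bits;
    std::memcpy(&sum_bits, &sum, sizeof sum_bits);

    SnanAddResult r;
    r.invalid_flag_set = raised;
    r.result_is_nan = (sum != sum);
    r.result_is_quiet = (sum_bits & quiet_bit) != 0;
    return r;
}

inline void masked_snan_demo()
{
    const SnanAddResult r = add_one_to_snan();
    assert(r.invalid_flag_set);
    assert(r.result_is_nan);
    assert(r.result_is_quiet);
}
```

On an x87-only build the load of the signaling NaN may itself quiet the value before the addition runs, which is the same compiler dependence that shows up throughout this post.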
You can store a signaling NaN to a variable by directly assigning the appropriate bit pattern:
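void set_snan(double& f)
{
  // Exponent bits all ones, quiet bit clear, nonzero fraction.
  // Assumes sizeof(long long) == sizeof(double).
  *((long long*)&f) = 0x7ff0000000000001LL;
}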
Or, depending on your C++ library implementation, you might find it simpler to steal its signaling_NaN() implementation:
// Assign a signaling NaN using Dinkumware's implementation; works in MSVC
// and C++Builder.
void set_snan(double& f)
{
  memcpy(&f, &_Snan._Double, sizeof(f));
}
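Whichever way you store the bit pattern, it helps to be able to check what actually ended up in memory. Below is a rough sketch, assuming 64-bit IEEE 754 doubles and the common convention (x86, ARM) that the most significant fraction bit distinguishes quiet from signaling NaNs. The names classify_nan and set_snan_memcpy are mine. It uses memcpy instead of pointer casts so that it does not depend on the compiler tolerating aliasing violations:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

enum NanKind { not_a_nan, quiet_nan, signaling_nan };

// Inspect the raw bits of d. Taking a reference avoids passing the value
// through a floating point register on the way in.
inline NanKind classify_nan(const double& d)
{
    const std::uint64_t exponent_mask = 0x7ff0000000000000ULL;
    const std::uint64_t fraction_mask = 0x000fffffffffffffULL;
    const std::uint64_t quiet_bit     = 0x0008000000000000ULL;

    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);

    if ((bits & exponent_mask) != exponent_mask) return not_a_nan;  // finite
    if ((bits & fraction_mask) == 0) return not_a_nan;              // infinity
    return (bits & quiet_bit) ? quiet_nan : signaling_nan;
}

// Store the same signaling NaN pattern as above, but via memcpy.
inline void set_snan_memcpy(double& f)
{
    const std::uint64_t bits = 0x7ff0000000000001ULL;
    std::memcpy(&f, &bits, sizeof f);
}

inline void classify_demo()
{
    double f = 0.0;
    assert(classify_nan(f) == not_a_nan);

    set_snan_memcpy(f);
    assert(classify_nan(f) == signaling_nan);

    f = std::numeric_limits<double>::quiet_NaN();
    assert(classify_nan(f) == quiet_nan);

    f = std::numeric_limits<double>::infinity();
    assert(classify_nan(f) == not_a_nan);
}
```

A check like this is handy right after a suspicious assignment: if the variable has silently become a quiet NaN, something along the way processed it.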
Even after you take care of assigning a signaling NaN, making sure that it actually signals is non-trivial:
  • It appears to be somewhat compiler-specific which actions actually raise an exception. Assigning the value of one variable to another does not raise an exception in CodeGear C++Builder but does in Visual C++. The rule seems to be that "anything involving the FPU" can raise an exception if signaling NaNs are involved, and assignment may be done through the FPU or may be done as a memory copy.
  • Raising an exception only occurs if "invalid operation" exceptions are enabled, and there is no portable way of controlling floating point exceptions. C99 specifies the fetestexcept family of functions in <fenv.h>, at least some of which are also specified in TR1, but those functions only test, raise, and clear the exception flags; actually enabling traps takes an extension such as glibc's feenableexcept. Of the compilers I tested, only gcc (but not Apple's version!) supports fetestexcept. Visual C++ provides _control87 and _controlfp, which are rather low-level for my taste. CodeGear C++Builder provides Math::SetMaskException, which uses Object Pascal-inspired sets instead of C-style bitmask manipulations but is completely nonportable. If all else fails, you'll have to set the FPU's control word using inline assembly; see here.
  • The floating point exception mask is, as best I can tell, part of the per-thread CPU state. This means that third-party code which your code uses can change it, violating your expectations of which floating point operations will raise exceptions.
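One defensive measure against that last problem is to save the floating point environment before calling code you do not control and to restore it afterwards. This sketch assumes a C++11 <cfenv>; FenvGuard is a hypothetical name. Whether the trap mask is restored along with the status flags depends on what your implementation stores in fenv_t, so check that before relying on it:

```cpp
#include <cassert>
#include <cfenv>

// Saves the floating point environment on construction and restores it
// on destruction, so a callee's changes do not leak into the caller.
class FenvGuard {
public:
    FenvGuard() { std::fegetenv(&saved_); }
    ~FenvGuard() { std::fesetenv(&saved_); }

    FenvGuard(const FenvGuard&) = delete;
    FenvGuard& operator=(const FenvGuard&) = delete;

private:
    std::fenv_t saved_;
};

// Stand-in for third-party code that disturbs the environment.
inline void misbehaving_library_call()
{
    std::feraiseexcept(FE_DIVBYZERO);
}

inline void fenv_guard_demo()
{
    std::feclearexcept(FE_ALL_EXCEPT);
    {
        FenvGuard guard;
        misbehaving_library_call();
        assert(std::fetestexcept(FE_DIVBYZERO) != 0);  // flag set inside
    }
    // The guard's destructor put the saved (clean) environment back.
    assert(std::fetestexcept(FE_DIVBYZERO) == 0);
}
```

This only protects the calling thread, and it cannot help if the third-party code runs on threads of its own.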
To summarize:
  • There is no portable way to assign a signaling NaN.
  • Even if there were, there is no portable way to make sure that performing an operation on a signaling NaN actually signals.
  • It is not entirely straightforward which operations will cause a signaling NaN to signal.
  • Even if you get all of the above right, third-party code may go and mess it up.
So, to conclude, signaling NaNs are probably not a good approach to handling uninitialized variables for the "fail fast" toolbox. In fact, I don't know what practical use signaling NaNs have in the C++ standard at all.
Is there a good solution for catching uninitialized variable use? I haven't yet had time to test out options against my code base to see what would work for me, but a few possibilities come to mind:
  • Compiler warnings help, although code complexity can easily exceed a compiler's ability to track uninitialized variables. GCC suffers an additional disadvantage; its uninitialized variable detection depends on its code flow analysis, and code flow analysis is only enabled if optimizations are enabled, so you often won't see these warnings in debug builds.
  • boost::optional works, although it's not a drop-in replacement.
  • Never have an uninitialized variable again: use STLSoft's must_init by appending _init_t to all of your fundamental data types (int → int_init_t, double → double_init_t, etc.). Invasive, but an interesting idea...
  • When I asked this question on Stack Overflow, this class was suggested as one possible answer.
  • Signaling NaNs are perhaps the easiest approach and remain an option, in spite of their shortcomings.
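To make the wrapper-class idea concrete, here is a minimal sketch of a type that fails fast on a read before the first assignment. It is neither STLSoft's must_init nor the class from Stack Overflow; checked_init is a hypothetical name, and the design (throwing std::logic_error from the conversion operator) is just one possible choice:

```cpp
#include <cassert>
#include <stdexcept>

// Wraps a value and remembers whether it has ever been assigned.
template <typename T>
class checked_init {
public:
    checked_init() : value_(), initialized_(false) {}
    checked_init(const T& v) : value_(v), initialized_(true) {}

    checked_init& operator=(const T& v)
    {
        value_ = v;
        initialized_ = true;
        return *this;
    }

    // Every read goes through this conversion, so a read before the first
    // assignment is reported immediately instead of yielding garbage.
    operator const T&() const
    {
        if (!initialized_)
            throw std::logic_error("read of uninitialized variable");
        return value_;
    }

    bool initialized() const { return initialized_; }

private:
    T value_;
    bool initialized_;
};

inline void checked_init_demo()
{
    checked_init<double> d;
    bool threw = false;
    try {
        double x = d;  // read before any assignment
        (void)x;
    } catch (const std::logic_error&) {
        threw = true;
    }
    assert(threw);

    d = 2.5;
    assert(d.initialized());
    assert(d + 1.0 == 3.5);  // reads now behave like a plain double
}
```

Unlike a signaling NaN, this works for any type and on any compiler, at the cost of an extra flag per variable and a check on every read.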
Finally, for anyone who's persistent enough to have read this far, here's a complete test program for playing with the various issues raised in this posting. Tested on Visual C++ 2008 and Debian 4.0's g++.
#include <float.h>
#include <string.h>  // memcpy
#include <iostream>
#include <sstream>
#include <string>
#include <limits>
#include <iomanip>

#if defined(__unix)
#include <fenv.h>
#endif

#if defined(_YMATH)

// (Ab)use Dinkumware's implementation if found.

void set_snan(long double& f)
{
  memcpy(&f, &_LSnan._Long_double, sizeof(f));
}

void set_snan(double& f)
{
  memcpy(&f, &_Snan._Double, sizeof(f));
}

void set_snan(float& f)
{
  memcpy(&f, &_FSnan._Float, sizeof(f));
}

#else

// Add some type safety to our evil, non-portable bit-flipping.
#include <boost/static_assert.hpp>
BOOST_STATIC_ASSERT(
  sizeof(long double) == sizeof(long long) + sizeof(long)
  && sizeof(double) == sizeof(long long)
  && sizeof(float) == sizeof(long));

void set_snan(long double& f)
{
  *((long long*)&f) = 0x0000000000000001LL;
  *((long*)&f + 2) = 0x7fff;
}

void set_snan(double& f)
{
  *((long long*)&f) = 0x7ff0000000000001LL;
}

void set_snan(float& f)
{
  *((long*)&f) = 0x7f800001L;
}

#endif

// Return a string containing p's raw bits, in hex value.
// Assume little endian.
template<typename T>
std::string ascii_bits(const T& p)
{
  std::ostringstream o;
  o << "0x" << std::setfill('0');
  for (int i = sizeof(p) - 1; i >= 0; i--) {
    o << std::hex << std::setw(2)
      << int(reinterpret_cast<const unsigned char*>(&p)[i]);
  }
  return o.str();
}

using namespace std;

int main(int argc, char* argv[])
{
  typedef double TYPE_TO_TEST;
  TYPE_TO_TEST f, g;

  // Enable exceptions.  A real app would be more selective and may
  // need to save the previous mask.
#if !defined(__unix)
  _control87(0, _EM_INVALID);
#else
  feenableexcept(FE_ALL_EXCEPT);
#endif

  f = std::numeric_limits<TYPE_TO_TEST>::quiet_NaN();
  cout << "Has quiet NaN?  "
       << std::numeric_limits<TYPE_TO_TEST>::has_quiet_NaN << endl;
  cout << "Quiet NaN is printed like this: " << f << endl;
  cout << "Bit pattern for quiet NaN:      "
       << ascii_bits(f) << endl;

  set_snan(f);

  cout << "Has signaling NaN?  "
       << std::numeric_limits<TYPE_TO_TEST>::has_signaling_NaN << endl;
  cout << "Bit pattern for signaling NaN:  " << ascii_bits(f) << endl;

  g = f;

  cout << "Depending on your compiler, you may see this." << endl;

  g = f + 1;

  cout << "You should never see this." << endl;

  return 0;
}