Sunday, February 22, 2009

Things I Didn't Know about C++: Local Classes

C++ is a complex language, and although I thought I knew it pretty well, I keep finding areas of the language that I either didn't know or didn't understand well enough. So, on the (possibly narcissistic) assumption that others may not know enough about them either, here's a brief series of postings about them.

First up are local classes, which are not the same thing as nested classes.

#include <vector>

class imitation_string {
public:
  // This is a nested class.  Other code can then use imitation_string::iterator.
  class iterator {
  public:
    iterator& operator++();
    iterator& operator--();
    char& operator*();
  };
};

void string_tokenizer(const imitation_string& in, std::vector<imitation_string>& out)
{
  // This is a local class.  It can only be used from within this function.
  class token {
    // Insert class definition...
  };
}

Local classes are closely related to nested (local) functions, which are not permitted in C++.

// Not valid C++.
int f()
{
  int i = 0;
  void g()
  {
    // A nested function has access to its enclosing function's variables.
    i++;
  }
  g();
  return i;
}

The position of local classes within the design of C++ seems very awkward to me. "True" nested functions require special support from the compiler, because they must be able to access their enclosing functions' stack frames, and calling a nested function (via a function pointer) after its enclosing function has exited is a quick way to crash a program. This might be reason enough to omit them from the C++ language, but it didn't stop GCC from implementing them as an extension to C. Stroustrup dismisses local functions by saying that "most often, the use of a local function is a sign that a function is too large," yet local classes are permitted (with the same "too large" caveat).

Nested functions occupy a much more comfortable position in other languages: Pascal (and therefore Object Pascal and Delphi) has supported them for ages, they're a key development technique in Javascript (used for everything from closures to jQuery event handlers), lambda expressions offer equivalent functionality in languages which support them, and so on.

Even ignoring their awkward position, local classes in C++ have some odd traits. First, under the C++03 rules they cannot be used as template arguments (so no smart pointers to local classes). Local class templates and member templates inside local classes are likewise prohibited. Second, although a local class's methods cannot access the enclosing function's automatic variables (unlike a nested function), they can access static local variables of the enclosing function.

// Valid C++!
int f()
{
  static int i;
  class g {
  public:
    static void Execute() { i++; }
  };
  g::Execute();
  return i;
}

Similarly, local classes within a class method can access static members (even protected and private static members) of the enclosing class, and such access is implicitly scoped. More generally, a local class has the same access as its enclosing function, so it can also reach the protected and private members of any class that has declared the enclosing class a friend.

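class F {
public:
  int DoStuff();
private:
  static int i;
};

// A static data member needs a definition outside the class.
int F::i = 0;

int F::DoStuff()
{
  class G {
  public:
    // No F:: qualification is needed, even though i is private.
    static void Execute() { i++; }
  };
  G::Execute();
  return i;
}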

Various workarounds have been proposed for local classes' inability to access local variables of their enclosing functions. Herb Sutter gives a rundown of the various options in GotW #58. His final solution is worth mentioning: in a bit of C++ judo, he turns the enclosing function into a class, whose constructor contains the function body and which is implicitly convertible to the desired return value. The function's local variables become member variables, and the local functions and nested classes become class methods that can access these member variables.
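A minimal sketch of that function-as-class idea follows. It is not Sutter's own code; the name count_vowels and its helper visit() are hypothetical, chosen only to show the shape of the technique.

```cpp
#include <cassert>
#include <string>

// Hypothetical example: count_vowels looks like a function to its callers
// but is really a class.
class count_vowels {
public:
  // The constructor body plays the role of the function body.
  count_vowels(const std::string& text) : count_(0) {
    for (std::string::size_type i = 0; i < text.size(); ++i) {
      visit(text[i]);
    }
  }

  // Implicit conversion to the "return type" of the function.
  operator int() const { return count_; }

private:
  // What would have been a local function: as a member, it can update
  // count_, which would have been a local variable.
  void visit(char c) {
    if (std::string("aeiouAEIOU").find(c) != std::string::npos) {
      ++count_;
    }
  }

  int count_;
};

// Usage reads like an ordinary function call.
inline void count_vowels_demo() {
  int n = count_vowels("banana");
  assert(n == 3);
}
```

The cost is that the "function" can no longer be overloaded or have its address taken the way a real function can, which is part of why this reads as judo rather than everyday style.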

Local classes have two main uses. First, they can be used to augment the regular control flow of a function. For example, Boost's new ScopeExit library lets you write arbitrary code to be executed whenever a code block exits (whether it exits by reaching the end, by an explicit return statement, or by throwing an exception). It implements this by defining a local class whose destructor executes the code RAII-fashion.

The Google C++ Testing Framework offers another example of augmenting control flow with local classes. A unit testing framework needs both a way to abort the current test if a test assertion fails and (ideally) a way to signify that certain failed assertions are expected. The standard way to do this is to have failed assertions throw exceptions, so that expected failures can be caught and handled. However, a design goal of Google Test is to avoid requiring exceptions, for maximum portability, so a failed assertion is handled by a simple return statement. Assertions that are expected to fail are wrapped in a local class method so that the return statement doesn't abort the entire function.

static int i = 1;
// This assertion:
ASSERT_EQ(1, i);
// expands to code vaguely resembling this:
if (1 != i) {
  ReportFailedAssertion(1, i, "ASSERT_EQ(1, i)", __LINE__);
  return;
}

// This assertion:
EXPECT_FATAL_FAILURE(ASSERT_EQ(2, i));
// expands to code vaguely resembling this.  Note the local class scoped to
// a dummy do/while block.
do {
  class GTestExpectFatalFailureHelper {
  public:
    static void Execute() {
      if (2 != i) {
        ReportFailedAssertion(2, i, "ASSERT_EQ(2, i)", __LINE__);
        return;
      }
    }
  };
  GTestExpectFatalFailureHelper::Execute();
} while(false);
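The scope-exit use mentioned earlier can be sketched by hand in the same spirit. This is not Boost.ScopeExit's macro interface or its actual implementation; process(), exit_marker, and the -1 marker value are hypothetical names picked for illustration. Because a local class cannot see the enclosing function's automatic variables, the state it needs is passed in through its constructor.

```cpp
#include <vector>

// Hypothetical function: records its progress in `trace` and guarantees
// that a final marker is appended however the function exits.
int process(std::vector<int>& trace, bool fail_early) {
  // Local class whose destructor runs the "on exit" code.
  class exit_marker {
  public:
    explicit exit_marker(std::vector<int>& target) : target_(target) {}
    ~exit_marker() { target_.push_back(-1); }  // -1 marks "scope exited"
  private:
    std::vector<int>& target_;
  };

  exit_marker guard(trace);
  trace.push_back(1);
  if (fail_early) {
    return 0;  // guard's destructor still runs here
  }
  trace.push_back(2);
  return 1;    // ...and here
}
```

Calling process() with fail_early set leaves the trace as 1, -1; without it, the trace is 1, 2, -1. The same destructor would also run if an exception propagated out of the function.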

Second, local classes can be used like any other class or function, as a mechanism for reusing and refactoring code. If you have code that appears in more than one place within a function, but is too specific to be used outside of that function, then following the principles of information hiding (if you prefer the dry CS term) or Spartan programming (if you prefer classical historical allusions or gratuitous Frank Miller references), you can use a local class to keep the code scoped as narrowly as possible. Personally, I've very rarely found code that's repeated within a function but is so specific that it will never be used outside of that function. However, this could simply be a case of my available tools determining how I solve problems. For example, Delphi permits local functions, and Delphi developers seem to find them moderately useful; the latest incarnation of Delphi's Visual Component Library, consisting of roughly 11,000 methods across 220,000 lines of code, uses about 190 local functions. Walter Bright, the creator of the D programming language, gives several examples of how nested functions can be used. He concludes,

Lack of nested functions is a significant deficit in the expressive power of C and C++, necessitating lengthy and error-prone workarounds. Nested functions are applicable to a wide variety of common programming situations, providing a symbolic, pointer-free, and type-safe solution.

They could be added to C and C++, but to my knowledge they are not being considered for inclusion. Nested functions and delegates are available now with D. As I get used to them, I find more and more uses for them in my own code and find it increasingly difficult to do without them in C++.
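For comparison, here is a rough sketch of the narrowly scoped reuse described above, done with a local class in C++ rather than a true nested function. The function full_name() and its helper local::trim() are hypothetical names invented for this example.

```cpp
#include <string>

// Hypothetical function: joins two name parts, cleaning each the same way.
std::string full_name(const std::string& first, const std::string& last) {
  // The helper is needed twice in this function and nowhere else, so it
  // lives in a local class and is invisible to the rest of the program.
  struct local {
    static std::string trim(const std::string& s) {
      const char* whitespace = " \t";
      std::string::size_type begin = s.find_first_not_of(whitespace);
      if (begin == std::string::npos) {
        return std::string();  // empty or all whitespace
      }
      std::string::size_type end = s.find_last_not_of(whitespace);
      return s.substr(begin, end - begin + 1);
    }
  };

  return local::trim(first) + " " + local::trim(last);
}
```

The helper only works here because it needs nothing from full_name()'s own variables; anything it did need would have to be passed in as an argument, which is exactly the limitation a real nested function would not have.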

Sunday, February 15, 2009

The Problem with the Web

In "Why Chrome is Shiny," Jonathan Edwards does a wonderful job of articulating the vague misgivings that I've had about the current rush of interest in web applications:

I have realized that Internet browsers are a dead end, much like MS-DOS was... Some examples of these problems:

  1. No multithreading
  2. Reference counting GC, causing memory leaks
  3. Only 2 outgoing TCP sockets, and only to same site as URL
  4. Whole-page rendering, making dynamic layout changes unpleasant
  5. DOM incompatibilities
  6. Event firing and handling incompatibilities
  7. Limited standard libraries, and poor support for large-scale programming

All of this reminds me very much of MS-DOS. Like the browser, it was essentially a toy that was not originally intended to be used for anything serious or intense. Hackers came in and discovered they could do all sorts of things beyond those original intentions, and that they could get rich in the process. In the resulting gold rush there was no time to worry about fixing the platform. MS-DOS willfully ignored the existing body of knowledge about how to design an operating system.

Javascript has its good parts - prototypes, first-class functions, JSON - but it has plenty of problems too. People are expending prodigious effort finding solutions or workarounds for those problems - for example, the V8 and TraceMonkey teams' efforts to improve Javascript's performance - but I can't help but wonder if that effort would be better spent on a language that was, you know, designed for the purpose for which it's being used.

And Javascript is only one problematic aspect of current web development. Web security, for example, is awful; Jeremiah Grossman estimates that 90% of sites have vulnerabilities, and he and his team can generally find a vulnerability in under two minutes. (And he writes some great war stories about doing so.) Security for web development is roughly where security for network server development was twenty years ago: it's possible to write secure code, but only if you know what you're doing, and only if you never make a mistake. Twenty years ago, that meant never overflowing a buffer, never using sprintf/strcpy/strcat, and being very careful how you used the nonintuitive strncpy/strncat. Now, it means never building an SQL query from user data, always HTML-escaping text before outputting it, blocking CSRF, and so on. Development on the desktop and in network services is finally advancing from "secure if you try really hard" to "secure by default", thanks to training and awareness efforts by SANS and others, more secure (harder to misuse) C libraries such as Microsoft's and OpenBSD's, the selective replacement of C and C++ with higher-level languages that eliminate the possibility of memory allocation or buffer overrun errors, and vendors (primarily Microsoft) finally embracing the principle of least privilege. Web development has yet to make this transition from "secure if you try hard" to "secure by default." (Unless there have been advances that I'm not aware of? The most intriguing idea I've read is to use a strong type system, such as Haskell's, to distinguish between unsanitized and sanitized data.)
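The type-system idea can be roughed out even without Haskell. The C++ sketch below is only an illustration of the principle, not a port of any particular library; raw_text, html_safe, escape_html(), and render_paragraph() are all hypothetical names. The point is that the only way to obtain an html_safe value is to go through the escaping function, so forgetting to escape becomes a compile error rather than a vulnerability.

```cpp
#include <string>

// Untrusted input, exactly as it arrived from the user.
class raw_text {
public:
  explicit raw_text(const std::string& s) : value_(s) {}
  const std::string& value() const { return value_; }
private:
  std::string value_;
};

// Text that is known to be HTML-escaped.  The constructor is private, so
// only escape_html() can create one.
class html_safe {
public:
  const std::string& str() const { return value_; }
private:
  explicit html_safe(const std::string& s) : value_(s) {}
  std::string value_;
  friend html_safe escape_html(const raw_text& in);
};

inline html_safe escape_html(const raw_text& in) {
  std::string out;
  const std::string& s = in.value();
  for (std::string::size_type i = 0; i < s.size(); ++i) {
    switch (s[i]) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += s[i];     break;
    }
  }
  return html_safe(out);
}

// Output functions accept only html_safe, never raw_text or std::string.
inline std::string render_paragraph(const html_safe& body) {
  return "<p>" + body.str() + "</p>";
}
```

With these types, render_paragraph(raw_text("...")) does not compile; the caller has to write render_paragraph(escape_html(raw_text("..."))). A real system would need more than one kind of "safe" (HTML body, attribute, URL, SQL), which is where a richer type system starts to earn its keep.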

In spite of all of the problems with web development, it's the best (only?) method we've found of writing cross-platform, zero-deployment, sandboxed apps that can share data where needed and access data from anywhere. These are valuable features.

What should be done about the problems with web development? Silverlight might have the right idea, but I prefer my web without vendor lock-in. One approach (apparently favored by Edwards in "Why Chrome is Shiny") is to use a platform like the Google Web Toolkit or haXe to build on top of the current web platform and (hopefully) hide many of its shortcomings. At the very least, I figure we should be aware of the problems of the current technology craze and be a bit cautious of jumping on the bandwagon. That's good advice for technology in general.

Edit: I don't feel like I communicated very well... What seems strange about much of web development - what seems surreal about so much of software development in general - is that none of it has to be this way. It's not like other fields of engineering where our solutions are constrained by the laws of physics or by the chemical properties of the substances we're working with or similar; it's all, as Fred Brooks says, "castles in the air, [built] from air." It's largely an accident of history and of market forces that we ended up with this combination of HTML and CSS (incompatible between vendors and often done improperly on web sites), Javascript (rushed out the door by Netscape and saddled with some bad design decisions), and XHR. And now, developers have spent prodigious effort making this concoction work, and with so much brainpower and sweat put into it, it often works amazingly well. But it's a place no one would choose to start from if they had a choice.