Friday, January 30, 2009

Floods, Recessions, and Other Forces

I found out today that a friend of mine was getting laid off.

I make computers obey me for a living.  I consistently try to improve my skills, and I'd like to think that I'm pretty good at what I do.  I tend to assume that I can master any problem or challenge that comes my way.  But today was a reminder that there's still a great deal that I can't control.  Professionally, I can't control my boss, my coworkers, vendors, or customers; in the broader realm of life, I can control even less.

As Dennis Bakke said in Joy at Work, when there's a hundred-year record flood, it doesn't matter how well your house is built.

It was an important reminder.

Wednesday, January 28, 2009

A Simple Essay

Like many developers, I sometimes over-complicate my code, whether in an attempt to generalize and future-proof it or just to test out some new technique. In theory I know that over-complication is bad, but trying to avoid it in practice raises questions that I don't know how to answer. So, following a time-honored blogging tradition, I'm going to provide quotes from better-known, more insightful people who address the topic, and I'll intersperse a few thoughts of my own so that I can act like I'm adding something to the discussion. (Each of the linked articles and papers is recommended reading for more treatment of this topic.)

“Controlling complexity is the essence of computer programming.” - Brian W. Kernighan (source unknown)
“Complexity is the business we are in, and complexity is what limits us.” - Frederick P. Brooks, The Mythical Man-Month, chapter 17

The main reason for pursuing simplicity is our own limitations:

“Programming is a desperate losing battle against the unconquerable complexity of code, and the treachery of requirements... A lesson I have learned the hard way is that we aren’t smart enough... The human mind can not grasp the complexity of a moderately sized program, much less the monster systems we build today. This is a bitter pill to swallow, because programming attracts and rewards the intelligent, and its culture encourages intellectual arrogance. I find it immensely helpful to work on the assumption that I am too stupid to get things right. This leads me to conservatively use what has already been shown to work, to cautiously test out new ideas before committing to them, and above all to prize simplicity.” - Jonathan Edwards, “Beautiful Code”

Edsger Dijkstra made a similar point over thirty-five years ago:

"The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague." - Edsger W. Dijkstra, "The Humble Programmer"

Part of Dijkstra's remedy, as I understand it, was to push for radical simplicity in programming languages, on the belief that we must be able to easily and entirely understand “our basic tools.” The trend has instead been to push more and more complexity into our tools in hopes that it will make our applications manageably simple: languages continue to sprout new features, optimizing compilers rearrange our functions and variables behind our backs, garbage collection makes resource reclamation non-deterministic, and environments like the JVM and CLI become massive projects in their own right.

Accommodating human frailty is the philosophical reason for simplicity, but there are practical benefits as well:

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?” - Brian W. Kernighan and P. J. Plauger, The Elements of Programming Style, p. 10.

(A well-known quote, but have debugging advancements like IDE integration, VM playback, and omniscient debugging, along with techniques like TDD, changed this?)

“One reason for [Extreme Programming's encouragement of simplicity] is economic. If I have to do any work that's only used for a feature that's needed tomorrow, that means I lose effort from features that need to be done for this iteration... This economic disincentive is compounded by the chance that we may not get it right. However certain we may be about how this function works, we can still get it wrong - especially since we don't have detailed requirements yet. Working on the wrong solution early is even more wasteful than working on the right solution early. And the XPerts generally believe that we are much more likely to be wrong than right (and I agree with that sentiment.) The second reason for simple design is that a complex design is more difficult to understand than a simple design. Therefore any modification of the system is made harder by added complexity. This adds a cost during the period between when the more complicated design was added and when it was needed.” - Martin Fowler, “Is Design Dead?”

Although simplicity has many benefits, complexity is often unavoidable:

“The complexity of software is an essential property, not an accidental one. Hence descriptions of a software entity that abstract away its complexity often abstract away its essence. Mathematics and the physical sciences made great strides for three centuries by constructing simplified models of complex phenomena, deriving properties from the models, and verifying those properties experimentally. This worked because the complexities ignored in the models were not the essential properties of the phenomena. It does not work when the complexities are the essence.” - Frederick P. Brooks, “No Silver Bullet: Essence and Accident in Software Engineering” (as printed in The Mythical Man-Month)

Brooks gives other reasons for complexity being an essential property of software: communication, the number of possible states, and interdependencies all scale nonlinearly, and unlike other human constructs, no two parts of a program are identical.

“I find languages that support just one programming paradigm constraining. They buy their simplicity (whether real or imagined) by putting programmers into an intellectual straitjacket or by pushing complexity from the language into the applications.” - Bjarne Stroustrup, “The Real Stroustrup Interview”

This is one of the few “pro-complexity” quotes I've been able to find. Stroustrup is arguably biased, considering the complexity of his most famous creation, but his point is valid: some complexity is unavoidable and sometimes the best you can do is to push it to the language or the libraries so that the applications can (hopefully) avoid dealing with it. This is why I think that, e.g., the Boost libraries are a good thing, despite or because of their complexity.

Complexity is a major risk to software projects:

“What is the most common mistake on C++ and OO projects? Unnecessary complexity – the plague of OO technology. Complexity, like risk, is a fact of life that can't be avoided. Some software systems have to be complex because the business processes they represent are complex. But unfortunately many intermediate developers try to “make things better” by adding generalization and flexibility that no one has asked for or will ever need. The customer wants a cup of tea, and the developers build a system that can boil the ocean [thanks to John Vlissides for this quip].” - Marshall Cline et al., C++ FAQs, Second Edition, p. 36
“What's the Software Peter Principle? The Software Peter Principle is in operation when unwise developers “improve” and “generalize” the software until they themselves can no longer understand it, then the project slowly dies.” - Marshall Cline et al., C++ FAQs, Second Edition, p. 37

The term “Software Peter Principle” was coined in C++ FAQs, but it's at least spread enough to get its own Wikipedia page with some added discussion. “Avoid Death by Complexity,” by Alan Keefer, covers the same subject matter and is too long to quote here; just go read it.

Development methodologies can help in the pursuit of simplicity:

“I suggest: Exploiting the mass market to avoid constructing what can be bought. Using rapid prototyping as part of a planned iteration in establishing software requirements. Growing software organically, adding more and more function to systems as they are run, used, and tested. Identifying and developing the great conceptual designers of the rising generation.” - Frederick P. Brooks, “No Silver Bullet”

Popular perception often views “No Silver Bullet” and The Mythical Man-Month as pessimistic, but Brooks argued that the essential complexity could be steadily reduced; his approach of rapid prototyping and iterative organic growth anticipates some of the agile development techniques.

“Do the simplest thing that could possibly work.” “You Aren't Gonna Need It.” - Extreme Programming maxims.

Keep in mind, though, that XP practices reinforce each other; YAGNI and “do the simplest thing” won't work unless you're also practicing unit tests (to find out if your simplest thing actually does work, and to permit refactoring) and refactoring (to keep your design clean as your “simplest” and “not needed” code necessarily grows).

As I try to figure out the balance between simplicity and complexity, it's encouraging to read that someone as experienced as Martin Fowler seems to struggle with some of the same questions:

“So we want our code to be as simple as possible. That doesn't sound like that's too hard to argue for, after all who wants to be complicated? But of course this begs the question "what is simple?" ... [One major criterion for XP is] clarity of code. XP places a high value on code that is easily read. In XP "clever code" is a term of abuse. But some people's intention revealing code is another's cleverness... In his XP 2000 paper, Josh Kerievsky points out a good example of this. He looks at possibly the most public XP code of all - JUnit. JUnit uses decorators to add optional functionality to test cases, such things as concurrency synchronization and batch set up code. By separating out this code into decorators it allows the general code to be clearer than it otherwise would be. But you have to ask yourself if the resulting code is really simple... So might we conclude that JUnit's design is simpler for experienced designers but more complicated for less experienced people? ... Simplicity is still a complicated thing to find. Recently I was involved in doing something that may well be over-designed. It got refactored and some of the flexibility was removed. But as one of the developers said "it's easier to refactor over-design than it is to refactor no design." It's best to be a little simpler than you need to be, but it isn't a disaster to be a little more complex. The best advice I heard on all this came from Uncle Bob (Robert Martin). His advice was not to get too hung up about what the simplest design is. After all you can, should, and will refactor it later. In the end the willingness to refactor is much more important than knowing what the simplest thing is right away.” - Martin Fowler, “Is Design Dead?”

Of course, this struggle between simplicity and complexity is hardly new with or specific to programming.

“'Tis the gift to be simple, 'tis the gift to be free...” - Joseph Brackett Jr., “Simple Gifts,” 1848

Saturday, January 17, 2009

No Boost for Google

I recently came across the Google C++ Style Guide, and since the folks at Google are geniuses, I thought it would be worth a read. Among the various sound practices, sage advice, and sensible conventions which it laid out, I found this surprising requirement:

Use only approved libraries from the Boost library collection... Some Boost libraries encourage coding practices which can hamper readability, such as metaprogramming and other advanced template techniques, and an excessively "functional" style of programming... In order to maintain a high level of readability for all contributors who might read and maintain code, we only allow an approved subset of Boost features [consisting of roughly five and a half of Boost's nearly one hundred libraries]...

I confess: I love the Boost libraries. I curse my compiler for not supporting Pointer Container. I gaze in awe at the metaprogramming that makes Units and Accumulators possible. I use Format, despite its documented performance problems, because it's so darn convenient. I painstakingly craft Lambda and Preprocessor expressions, in spite of their unforgiving syntax and obtuse error messages, just to have code that's a bit less repetitive and a bit more extensible. (It's truly amazing that the Boost devs have figured out how to add preprocessor-based code generation and lambda expressions to C++.)

But is my approach really best? As already mentioned, the folks at Google are geniuses; maybe they're on to something here? The core goal of the Style Guide authors in restricting Boost seems to be to promote simplicity, which is often in short supply as a software project develops. Some of a project's increase in complexity is unavoidable, as it gains features and adapts to handle real-world problems. Some of it is the natural result of entropy and is best handled by a healthy regimen of refactoring. Some of it, though, can only be staved off by a radical commitment to simplicity, as the folks at Google (seem to) suggest. But this raises a number of questions and tradeoffs:
  • Is the simplicity of straightforward, somewhat repetitive boilerplate code better or worse than fully generic C++ template wizardry?
  • Is boilerplate code better or worse than depending on a DSL or scripting language to generate code?
  • What about code that's simple to an experienced developer but complex to a novice (or vice versa - code that's simple for a junior developer to write but fails to use simple-to-maintain techniques that a senior developer would know)?
  • To what extent is it okay to make a class's internals complicated if it makes the class simpler to use?
  • Do I use Lambda and Preprocessor because they're the best tools for the job, or do I use them because I find wrapping my head around them to be a more interesting challenge than maintaining the legacy code base I'm working on?
  • Are the benefits (to my productivity and to future extensibility) of adding a dependency on yet another external library worth the costs to my coworkers and future maintainers of having to understand yet another library before they can work on my code?

At this point in my career, I'm not sure how to answer these questions. I suspect that the approach advocated by the Google C++ Style Guide is flawed; if simplicity is the goal, it's difficult to see how rolling your own IPC or threading library could be simpler than using Boost's, and even small classes and functions like optional and lexical_cast can make code simpler and more readable. However, I suspect that my approach is flawed too; much has been written about the fact that even relatively simple programs are too complex to hold in your head at once, and pursuing the design goals of elegance, extensibility, and robustness without also remembering simplicity can too quickly land you in architecture astronaut territory.

Like I said, I love the Boost libraries. But I've been revising some code I wrote six months ago to no longer use Lambda. It's just simpler this way.

Sunday, January 4, 2009

Floating Point Reference

"Fractions are your friends." This was my high school algebra teacher's standard response when students complained about the sometimes tedious math that fractions could require. Many years later, I'm finding that floating point math, while not tedious, is rather more involved than it appears at first; here's my attempt to summarize issues that may come up.

Floating point data types

In C on an x86 system:
  • A float is 32 bits.
  • A double is 64 bits.
  • A long double is compiler-dependent. A long double in gcc contains 80 bits of data, although it's padded to 96 bits (12 bytes) on 32-bit x86 or 128 bits (16 bytes) on x86-64 to maintain word alignment. Visual C++ treats a long double as 64 bits, just like a double (and has been criticized for doing so; see here and here). I had been under the impression that long doubles were nonstandard or were a recent addition to the standard, but as far as I can tell, they've been around since before C99.

Selecting a floating point data type

See this portion of another post.

Floating point storage

This is covered on many sites; the most concise and thorough description I've found is in chapter 2 of Sun's Numerical Computation Guide.

NaNs and Infinity

See my previous post.

Comparing floating point numbers

A simple equality comparison (such as a == b) will often fail, even for values which you would expect to be equal, due to rounding errors (especially rounding introduced by converting between binary and decimal). (Do any compilers or static code analysis tools emit warnings if you try to do a naive equality comparison?)
  • The simplest solution is to check if two floating point numbers are within a small developer-chosen epsilon value of each other; this is the approach used by the CUnit and CppUnit testing frameworks, but it has the disadvantage of requiring you to choose an epsilon value yourself and make sure that it's appropriate for the data being compared.
  • "Comparing Floating Point Numbers," by Bruce Dawson, discusses this absolute epsilon approach as well as more sophisticated approaches such as comparing using a relative epsilon and comparing based on the number of possible floating point values that could occur between the two operands.
  • The simplest good approach, provided by CERT, is to compare using the compiler-provided epsilon constants (FLT_EPSILON and DBL_EPSILON from <float.h>). (As I understand it, CERT's approach should be equivalent to Dawson's approach with a max ulps of 1.)
Greater than / less than comparisons generally require no special handling, although at the assembly level, a comparison may return a result of "unordered" if NaNs are involved, and one of the compilers I tested (CodeGear C++Builder) fails to account for this and so may return incorrect results when comparing NaNs.

Handling floating point exceptions

See this post.

Low-level floating point calculations

If you need to do floating point work at the assembly level, use the Intel Software Developer's Manuals as a reference. Volume 1 has some background on the FPU; volume 2A contains most floating point instructions (since they start with F).

Example code

Rather than simple math, the following code covers manipulating floating point numbers' bit representations, handling NaNs and infinities, and so on.

For further reading

In no particular order...

EDIT: (3/15/2009) Added sections on "Selecting a floating point data type" and "Handling floating point exceptions."