Sunday, May 17, 2009

Comparing Version Numbers

Comparing version numbers is one of those operations that seems like it ought to be simple but can have a fair bit of complexity under the surface. How would you write an algorithm that gives the correct result for all of the following?

  • 2.10 comes after 2.9
  • 2.11-beta comes after 2.11-alpha
  • 2.11 comes after 2.11-beta
  • 7 comes after 2003
  • the revision that was checked in on 1/30/2009 and never given a version number comes after the revision that was checked in on 8/1/2008 and never given a version number

The Debian Project deals with more version numbers than anyone - over 30,000 packages over more than fifteen years, each of which must be separately versioned and tracked for upgrade and depedency purposes - so it's no surprise that they've developed a policy and algorithm that can handle all of the above cases. Debian's algorithm neatly handles cases like 2.9 versus 2.10, and it has two main additions over other version comparison algorithms: it states that ~ is sorted before anything, even the empty string, so 2.11~beta comes before 2.11, and it permits the addition of an epoch to "reset" version numbers, so 2003, which has the default epoch of 0, comes before 1:7, which has an epoch of 1. All of this makes Debian's algorithm a good approach for comparing version numbers even if you're not a Debian package maintainer.

I recently needed to compare some Debian version numbers, but the only C or C++ implementations of this algorithm that I could find were GPL'ed, and my target environment wouldn't permit me to simply shell out to dpkg --compare-versions, so I wrote my own implementation. Here it is, under the public domain. Note that, although it's written as a C++ class, the method bodies are in C style to make it easier to reuse them in a non-C++ project.

DebVersion.h:

#ifndef DEBVERSIONH
#define DEBVERSIONH

#include <stdlib.h>
#include <iostream>
#include <boost/operators.hpp>

class DebVersion : public boost::totally_ordered<DebVersion> {
public:
  DebVersion() : mEpoch(0), mUpstream(NULL), mRevision(NULL) {}
  DebVersion(const DebVersion& other) { *this = other; }
  explicit DebVersion(const char *version);
  ~DebVersion();

  int Epoch() const { return mEpoch; }
  const char *Upstream() const { return mUpstream; }
  const char *Revision() const { return mRevision; }

  void Epoch(int new_epoch) { mEpoch = new_epoch; }
  void Upstream(const char *new_upstream) { free(mUpstream); mUpstream = strdup(new_upstream); }
  void Revision(const char *new_revision) { free(mRevision); mRevision = strdup(new_revision); }
  void ClearRevision() { Revision("0"); }

  DebVersion& operator=(const DebVersion& other);
  bool operator<(const DebVersion& other) const;
  bool operator==(const DebVersion& other) const;

protected:
  int mEpoch;
  char *mUpstream;
  char *mRevision;

  static int CompareComponent(const char *a, const char *b);
};

inline DebVersion::~DebVersion()
{
  free(mUpstream);
  free(mRevision);
}

inline bool DebVersion::operator==(const DebVersion& other) const
{
  return Epoch() == other.Epoch()
    && strcmp(Upstream(), other.Upstream()) == 0
    && strcmp(Revision(), other.Revision()) == 0;
}

inline DebVersion& DebVersion::operator=(const DebVersion& other)
{
  mEpoch = other.Epoch();
  mUpstream = strdup(other.Upstream());
  mRevision = strdup(other.Revision());
  return *this;
}

void operator<<(std::ostream& o, const DebVersion& ver);

#endif

DebVersion.cpp:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>

#include "DebVersion.h"

void operator<<(std::ostream& o, const DebVersion& ver)
{
  if (ver.Epoch() != 0) {
    o << ver.Epoch() << ":";
  }
  o << ver.Upstream();
  if (0 != strcmp(ver.Revision(), "0")) {
    o << "-" << ver.Revision();
  }
}

DebVersion::DebVersion(const char *version)
{
  // Extract epoch.  The Debian Policy Manual says that epoch must be
  // numeric, so atoi is safe.
  const char *epoch_end = strchr(version, ':');
  if (epoch_end != NULL) {
    mEpoch = atoi(version);
  } else {
    mEpoch = 0;
  }

  const char *upstream_start;
  if (epoch_end) {
    upstream_start = epoch_end + 1;
  } else {
    upstream_start = version;
  }

  const char *upstream_end = strrchr(upstream_start, '-');
  size_t upstream_size;
  if (upstream_end == NULL) {
    upstream_size = strlen(upstream_start);
  } else {
    upstream_size = upstream_end - upstream_start;
  }

  mUpstream = (char*) malloc(upstream_size + 1);
  strncpy(mUpstream, upstream_start, upstream_size);
  mUpstream[upstream_size] = '\0';

  if (upstream_end == NULL) {
    mRevision = strdup("0");
  } else {
    mRevision = strdup(upstream_end + 1);
  }
}

/**Compare two components of a Debian-style version.  Return -1, 0, or 1
 * if a is less than, equal to, or greater than b, respectively.
 */
int DebVersion::CompareComponent(const char *a, const char *b)
{
  while (*a && *b) {

    while (*a && *b && !isdigit(*a) && !isdigit(*b)) {
      if (*a != *b) {
        if (*a == '~') return -1;
        if (*b == '~') return 1;
        return *a < *b ? -1 : 1;
      }
      a++;
      b++;
    }
    if (*a && *b && (!isdigit(*a) || !isdigit(*b))) {
      if (*a == '~') return -1;
      if (*b == '~') return 1;
      return isdigit(*a) ? -1 : 1;
    }

    char *next_a, *next_b;
    long int num_a = strtol(a, &next_a, 10);
    long int num_b = strtol(b, &next_b, 10);
    if (num_a != num_b) {
      return num_a < num_b ? -1 : 1;
    }
    a = next_a;
    b = next_b;

  }
  if (!*a && !*b) {
    return 0;
  } else if (*a) {
    return *a == '~' ? -1 : 1;
  } else {
    return *b == '~' ? 1 : -1;
  }
}

bool DebVersion::operator<(const DebVersion& other) const
{
  if (Epoch() != other.Epoch()) {
    return Epoch() < other.Epoch();
  }

  int result = CompareComponent(Upstream(), other.Upstream());
  if (result) {
    return -1 == result;
  }

  return -1 == CompareComponent(Revision(), other.Revision());
}

And, because no code is complete without some good unit tests, here's DebVersionTest.cpp, using the excellent Google Test framework:

#include <gtest/gtest.h>

#include "DebVersion.h"

TEST(DebVersionTest, ParsesUpstreamOnly)
{
  DebVersion v("1");
  EXPECT_EQ(0, v.Epoch());
  EXPECT_STREQ("1", v.Upstream());
  EXPECT_STREQ("0", v.Revision());
}

TEST(DebVersionTest, ParsesEpochs)
{
  DebVersion v("1:2.0.1");
  EXPECT_EQ(1, v.Epoch());
  EXPECT_STREQ("2.0.1", v.Upstream());
  EXPECT_STREQ("0", v.Revision());
}

TEST(DebVersionTest, ParsesRevisions)
{
  DebVersion v("10:4.0.1~alpha-4-5");
  EXPECT_EQ(10, v.Epoch());
  EXPECT_STREQ("4.0.1~alpha-4", v.Upstream());
  EXPECT_STREQ("5", v.Revision());
}

TEST(DebVersionTest, ComparesNumbers)
{
  EXPECT_LT(DebVersion("1"), DebVersion("2"));
  EXPECT_EQ(DebVersion("10"), DebVersion("10"));
  EXPECT_LT(DebVersion("9"), DebVersion("10"));
  EXPECT_GT(DebVersion("10"), DebVersion("9"));
}

TEST(DebVersionTest, ComparesEpochs)
{
  EXPECT_GT(DebVersion("2:1"), DebVersion("1:2"));
  EXPECT_LT(DebVersion("10"), DebVersion("1:2"));
}

TEST(DebVersionTest, ComparesAlphas)
{
  EXPECT_LT(DebVersion("alpha"), DebVersion("beta"));
  EXPECT_LT(DebVersion("alpha1"), DebVersion("alpha2"));
  EXPECT_GT(DebVersion("alpha10"), DebVersion("alpha2"));
}

TEST(DebVersionTest, ComparesTildes)
{
  EXPECT_LT(DebVersion("3.0~beta1"), DebVersion("3.0"));
  EXPECT_GT(DebVersion("3.0~beta"), DebVersion("3.0~~prebeta"));
  EXPECT_LT(DebVersion("3.0~beta4"), DebVersion("3.0~rc1"));
}

TEST(DebVersionTest, ComparesRevisions)
{
  EXPECT_LT(DebVersion("3.0-2"), DebVersion("3.0-10"));
}

Monday, May 11, 2009

Minding Your Manners

I recently went to a book sale where shoppers could browse countless bins of completely disorganized books and buy as many as could fit in a box for $30. I quickly learned that the only way to navigate a sale like this was to disregard my traditional short list of topics and authors that I follow, grab anything remotely interesting, and sort it out later. One of my finds was New Rules @ Work, by Barbara Pachter and Ellen Coleman.

book cover

The book's subtitle sums up its contents: "79 Etiquette Tips, Tools, and Techniques to Get Ahead and Stay Ahead." It covers topics such as greeting people, navigating social gatherings, corporate attire, and communications skills (whether face-to-face or by phone or email). For me, at least, the tips presented are an odd mix of the very obvious (stand up when shaking a guest's hand), the rather irrelevant (how to sample wine at restaurants fancier than my humble software developer's expense account would cover), and the desperately needed (how to mingle at a social gathering). Even though many of the suggestions fell in the first category for me, the book is a quick read and is interspersed with plenty of amusing anecdotes about various etiquette blunders to keep it interesting.

All of this has little to do with coding, but there's a lot more to software development than just coding:

  • gathering requirements
  • managing the infrastructure (version control, build system, continuous integration)
  • documentation (developer docs if not end-user docs)
  • testing
  • release management and deployment
  • customer support (possibly indirectly, depending on the size of your organization)

And there's a lot more to a career in software development than just software development:

And, of course, workplace etiquette. New Rules @ Work emphasizes the effects that etiquette (or the lack thereof) can have on your career, and it seems to present etiquette primarily as a tool for advancing your career. Considering the book's focus, this is probably okay, but it seems a bit utilitarian to me. I've always preferred Brendan Frasier's character's explanation in the movie Blast from the Past:

Good manners are just a way of showing other people we have respect for them... A lady or a gentleman is someone who always tries to make sure the people around him or her are as comfortable as possible.

Etiquette is a way to serve others.

As software developers, it can be our tendency to focus too much on the technology. We want to try out the latest development tools or techniques even if they're not best for the job; we pursue certain standards or ideals even if they don't make business sense; we see a technical challenge and try to solve it even if it's not the right problem to solve. As a remedy to this, we're told to focus on delivering business value. This is good advice, but it raises the question, why pursue business value? The standard answer is "profits," but as Max De Pree says, "Profits are like breathing. Breathing is not the goal of life, but it is pretty good evidence of whether or not you are alive." The goal of business - the reason that business can exist, the way that business makes profits - is service: meeting people's needs by offering goods or services.

There's a story that, during the initial development of the Apple Macintosh, Steve Jobs complained that it booted too slowly. He argued that shaving ten seconds off of the boot time, multiplied by millions of users, would save several lifetimes. My software may never save a lifetime, but every feature that I add, every bug that I fix, every UI interaction that I streamline means that users can do their jobs a bit faster, a bit easier, with a bit less stress.

Software development is about serving users. Etiquette is just a way to do in the small what our businesses and our careers do in the large.

Sunday, May 3, 2009

Memberwise Visitation

This code was the original impetus for my postings on template template parameters and multi-paradigm programming, but those didactic digressions grew into articles of their own. I don't know how helpful such digressions are - I'm learning that truly effective teaching requires the patience to develop good examples and an awareness of where your target audience is, and I'm not sure that I possess either, at least in this format - but at least they're out of the way...

C++'s object-oriented and generic programming each offer a lot of capability, but they can be combined for even more power. Consider, for example, the following code:

#include <iostream>
#include <boost/ptr_container/ptr_vector.hpp>
#include <boost/shared_ptr.hpp>

template <typename Class>
class ClassVisitor {
public:

  class MemberVisitor {
  public:
    virtual ~MemberVisitor() {}
    virtual void Visit(Class& instance) = 0;
  };

  template <typename T>
  class MemberVisitorImpl : public MemberVisitor {
  public:
    MemberVisitorImpl(T Class::*ptr) : mMember(ptr) {}
    void Visit(Class& instance) {
      // Sample visitor method.  You can define whatever member-specific
      // behavior you want.
      std::cout << instance.*mMember << ' ';
    }
  private:
    T Class::*mMember;
  };

  template <typename T>
  ClassVisitor& Declare(T Class::*ptr);

  void Visit(Class& instance);

private:
  typedef boost::ptr_vector<MemberVisitor> MemberVisitorList;
  MemberVisitorList mMemberVisitors;
};

template <typename Class>
template <typename T>
ClassVisitor<Class>& ClassVisitor<Class>::Declare(T Class::*ptr)
{
  mMemberVisitors.push_back(new MemberVisitorImpl<T>(ptr));
  return *this; // supports a fluent interface
}

template <typename Class>
void ClassVisitor<Class>::Visit(Class& instance)
{
  for (typename MemberVisitorList::iterator i = mMemberVisitors.begin();
       i != mMemberVisitors.end();
       i++) {
    i->Visit(instance);
  }
}

struct Annotation {
  const char *mMessage;
  int mX;
  int mY;
};

int main(int argc, char *argv[])
{
  ClassVisitor<Annotation> def;
  def.Declare(&Annotation::Message)
    .Declare(&Annotation::X)
 .Declare(&Annotation::Y);

  Annotation annotation;
  annotation.Message = "Note this";
  annotation.X = 10;
  annotation.Y = 20;
  def.Visit(annotation);
  return 0;
}

This code offers one approach to visiting each member of a class with a client-specified operation. Generic programming is used MemberVisitor subclasses handle members of any type, while OOP's polymorphism is used to let different types be handled at runtime. (Normally templates can only be applied at compile-time.) The example code simply dumps member variables' contents to screen and ends up being merely an overcomplicated operator<<(std::ostream&, const Annotation&), but MemberVisitor instances can contain additional information about their member variables and can define more than one Visit operation. In my current code's memberwise visitor implementation:

  • MemberVisitor instances contain member variables' descriptions, defaults, and constraints (minimum, maximum, etc.).
  • Copy constructors, operator=, and default constructors can be implemented by iterating over the list of member variables and default values that the MemberVisitor list provides.
  • MemberVisitors' information is shared by visit operations: per-member serialization and unserialization use members' types and formats, constraints are used during unserialization to check for corrupt archives and at runtime to check design by contract-style invariants, member descriptions and defaults can be used to report errors and reset to a known good state if a corrupt archive is encountered during deserialization, and the class can easily be dumped one member at a time to a human-readable format for inspection and debugging.
  • A second set of visitors handles binding members to a form's fields: loading form fields from a class instance, checking whether a form is dirty, validating a form, and saving form fields to a class instance are each implemented as Visit operations, and information such as MemberVisitor constraints can be used when a form is loaded to configure the UI widgets and before a form is saved to check for validity.

The result is that the MemberVisitors offer some introspection/reflection capability for C++ as well as something like a database schema for each class, similar to what would be provided by most frameworks' database support, but not tied in to a database backend. This multiparadigm (OO-and-generic) combination is only one approach to implementing this functionality; other implementations could use more of a pure template metaprogramming approach or could rely on preprocessor macros, code generation via an external tool, or reflection (via vendor extensions or a third-party library). Each implementation offers its own tradeoffs for complexity, portability (template metaprogramming often runs into compiler bugs; reflection is often vendor-specific), and build system support (in the case of code generation and reflection).

I'm still playing around with different approaches trying to find the best balance of tradeoffs and trying to figure out the best way to take advantage of C++'s multiparadigm support without overcomplicating my code. Are there other examples of code that directly leverages C++'s multiparadigm support? Are there preexisting implementations of this kind of member schema functionality for C++? Would there be any interest in a (cleaned up and open-sourced) member schema library, or is that functionality adequately covered by database libraries? Comments welcome.