The Pain of Late Bug Detection
[The web] is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is... [1]When it comes to quick implementation, Python is an efficient language used by many web archiving projects. Indeed, a quick search of github for WARC and Python yields a list of 80 projects and forks. Python is also the language used for my research into the temporal coherence of existing web archive holdings.
The sheer size of the Web means lots of variation and lots low-frequency edge cases. These variations and edge cases are naturally reflected in web archive holdings. Code used to research the Web and web archives naturally contains many, many code branches.
Python struggles under these conditions. It struggles because minor changes can easily introduce bugs that go undetected until much later. And later for Python means at run time. Indeed the sheer number of edge cases introduces code branches that are exercised so infrequently that code rot creeps in. Of course, all research code dealing with the web should create checkpoints and be restartable as a matter of self defense—and mine does defend itself. Still, detecting as many of these kinds of errors up front, before run time is much better than dealing with a mid-experiment crash.
[1] Douglas Adams may have actually written something a little different.
Static Typing to the Rescue
Static typing allows detection of many types of errors before code is executed. Consider the function definitions in figure 1 below. Heuristic is an abstract base class for memento selection heuristics. In my early research, memento selection heuristics required only Memento-Datetime. Subsequent work introduced selection based on both Memento-Datetime and Last-Modified. When the last_modified parameter was added, the cost functions were update accordingly—or so I thought. Notice that the last_modified parameter is missing from the PastPreferred cost function. Testing did not catch this oversight (see "Testing?" below). The addition of static type checking did.class Heuristic(object): |
... |
class MinDist(Heuristic): def cost(self, memento_datetime, last_modified=None): class Bracket(Heuristic): def cost(self, memento_datetime, last_modified): class PastPreferred(Heuristic): def cost(self, memento_datetime): |
Figure 1. Original Code |
class Heuristic(object): |
def cost(self, memento_datetime: datetime, last_modified: Optional[datetime]) \ -> Tuple[int,datetime]: raise NotImplementedError |
class MinDist(Heuristic): def cost(self, memento_datetime: datetime, last_modified: Optional[datetime] = None) \ -> Tuple[int,datetime]: class Bracket(Heuristic): def cost(self, memento_datetime: datetime, last_modified: Optional[datetime]) \ -> Tuple[int,datetime]: class PastPreferred(Heuristic): def cost(self, memento_datetime: datetime, last_modified: Optional[datetime] = None) \ -> Tuple[int,datetime]: |
Figure 2. Type Hinted Code |
Testing?
Many have argued that if code is well tested, the extra work introduced by static type checking out weighs the benfits. But what about bugs in the tests? (After all tests are code too—and not immune from programmer error). The code shown in Figure 1 had a complete set of tests (i.e. 100% coverage). However when Last-Modified was added, the PastPreferred tests were not updated and continued to pass. The addition of static type checking revealed the PastPreferred test bug, three research code bugs missed by the tests, and over dozen other test bugs. Remember, "Test coverage is of little use as a numeric statement of how good your tests are."— Scott G. Ainsworth