2016-07-24: Improve research code with static type checking

The Pain of Late Bug Detection

[The web] is big. Really big. You just won't believe how vastly, hugely, mindbogglingly big it is... [1]

When it comes to quick implementation, Python is an efficient language used by many web archiving projects. Indeed, a quick search of GitHub for WARC and Python yields a list of 80 projects and forks. Python is also the language I use for my research into the temporal coherence of existing web archive holdings.

The sheer size of the Web means lots of variation and lots of low-frequency edge cases. These variations and edge cases are naturally reflected in web archive holdings. As a result, code used to research the Web and web archives contains many, many code branches.

Python struggles under these conditions because minor changes can easily introduce bugs that go undetected until much later, and for Python, later means run time. Indeed, the sheer number of edge cases introduces code branches that are exercised so infrequently that code rot creeps in. Of course, all research code dealing with the web should create checkpoints and be restartable as a matter of self-defense, and mine does defend itself. Still, detecting as many of these errors as possible up front, before run time, is much better than dealing with a mid-experiment crash.

[1] Douglas Adams may have actually written something a little different.

Static Typing to the Rescue

Static typing allows detection of many types of errors before code is executed. Consider the function definitions in Figure 1 below. Heuristic is an abstract base class for memento selection heuristics. In my early research, memento selection heuristics required only Memento-Datetime. Subsequent work introduced selection based on both Memento-Datetime and Last-Modified. When the last_modified parameter was added, the cost functions were updated accordingly, or so I thought. Notice that the last_modified parameter is missing from the PastPreferred cost function. Testing did not catch this oversight (see "Testing?" below). The addition of static type checking did.

class Heuristic(object):

...

class MinDist(Heuristic):
    def cost(self, memento_datetime, last_modified=None):

class Bracket(Heuristic):
    def cost(self, memento_datetime, last_modified):

class PastPreferred(Heuristic):
    def cost(self, memento_datetime):

Figure 1. Original Code

Static type checking is available for Python through the use of type hinting. Type hinting is specified in PEP 484 and is implemented in mypy. Type hints do not change Python execution; they simply allow mypy to programmatically check expectations set by the programmer. Figure 2 shows the heuristics code with type hints added. Note the addition of the cost function to the Heuristic class. Although not implemented, it allows the type checker to ensure that all cost functions conform to expectations. (This is the addition that led to finding the PastPreferred.cost bug.)

from datetime import datetime
from typing import Optional, Tuple

class Heuristic(object):

    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime]) \
             -> Tuple[int,datetime]:
        raise NotImplementedError 

class MinDist(Heuristic):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime] = None) \
             -> Tuple[int,datetime]:

class Bracket(Heuristic):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime]) \
             -> Tuple[int,datetime]:

class PastPreferred(Heuristic):
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime] = None) \
             -> Tuple[int,datetime]:

Figure 2. Type Hinted Code
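
To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of mismatch described above. It is not the original research code: the method bodies, the select helper, and the demo function are hypothetical, and the returned tuple is a placeholder. The stale PastPreferred.cost override is the sort of signature mypy is expected to report as incompatible with the base class, while plain Python only complains when the call actually happens.

```python
from datetime import datetime
from typing import Optional, Tuple


class Heuristic:
    def cost(self, memento_datetime: datetime,
             last_modified: Optional[datetime]) -> Tuple[int, datetime]:
        raise NotImplementedError


class PastPreferred(Heuristic):
    # Stale signature: last_modified was never added.
    # A static checker such as mypy is expected to flag this override
    # as incompatible with Heuristic.cost before the code is ever run.
    def cost(self, memento_datetime: datetime) -> Tuple[int, datetime]:
        return (0, memento_datetime)  # placeholder result


def select(heuristic: Heuristic, memento_datetime: datetime,
           last_modified: Optional[datetime]) -> Tuple[int, datetime]:
    # Hypothetical caller: it only knows the base-class signature.
    return heuristic.cost(memento_datetime, last_modified)


def demo() -> str:
    when = datetime(2016, 7, 24)
    try:
        select(PastPreferred(), when, None)
    except TypeError as exc:
        # Without static checking, the mismatch only surfaces here, at run time.
        return "runtime failure: " + type(exc).__name__
    return "no failure"
```

In a long-running experiment, the TypeError in this sketch would appear only when the rarely used heuristic is finally exercised, which is exactly the late detection that static checking is meant to avoid.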

Testing?

Many have argued that if code is well tested, the extra work introduced by static type checking outweighs the benefits. But what about bugs in the tests? (After all, tests are code too, and they are not immune to programmer error.) The code shown in Figure 1 had a complete set of tests (i.e. 100% coverage). However, when Last-Modified was added, the PastPreferred tests were not updated and continued to pass. The addition of static type checking revealed the PastPreferred test bug, three research code bugs missed by the tests, and over a dozen other test bugs. Remember, "Test coverage is of little use as a numeric statement of how good your tests are."
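
As an illustration of how a test can pass while still being wrong, consider the following sketch. It is a made-up example, not one of the bugs found in the research code: is_past and both test functions are hypothetical names. The buggy test passes ISO date strings where datetime objects are expected; string comparison happens to give the same answer, so the test passes and the line is covered, yet a type checker such as mypy is expected to reject the call.

```python
from datetime import datetime


def is_past(memento_datetime: datetime, target: datetime) -> bool:
    """Hypothetical helper: True when the memento is not later than the target."""
    return memento_datetime <= target


def test_is_past_with_strings() -> bool:
    # Buggy test: strings are passed where datetimes are expected.
    # Python compares the strings lexicographically, so the test still
    # passes at run time; a static type checker is expected to flag
    # both arguments as having the wrong type.
    return is_past("2016-07-01", "2016-07-24")


def test_is_past_with_datetimes() -> bool:
    # Corrected test: the arguments match the annotated types.
    return is_past(datetime(2016, 7, 1), datetime(2016, 7, 24))
```

Both tests pass and both give full coverage of is_past, but only the second one exercises the function the way the rest of the code would.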

— Scott G. Ainsworth
