Python: Unit Testing with Doctest

What is Unit testing and what it is not?

The title of this section, begs another question: "Why do I care?" One answer is that Unit testing is a best practice that has been evolving toward its current form over most of the time that programming has existed. Another answer is that the core principles of Unit testing are just good sense; it might actually be a little embarrassing to our community as a whole that it took us so long to recognize them.

Alright, so what is Unit testing? In its most fundamental form, Unit testing can be defined as testing the smallest meaningful pieces of code (such pieces are called units), in such a way that each piece's success or failure depends only on itself. For the most part, we've been following this principle already.

There's a reason for each part of this definition: we test the smallest meaningful pieces of code because, when a test fails, we want that failure to tell where the problem is us as specifically as possible. We make each test independent because we don't want a test to make any other test succeed, when it should have failed; or fail when it should have succeeded. When tests aren't independent, you can't trust them to tell you what you need to know.

Traditionally, automated testing is associated with Unit testing. Automated testing makes it fast and easy to run unit tests, which tend to be amenable to automation. We'll certainly make heavy use of automated testing with doctest and later with tools such as unittest and Nose as well.

Any test that involves more than one unit is automatically not a unit test. That matters because the results of such tests tend to be confusing. The effects of the different units get tangled together, with the end result that not only do you not know where the problem is (is the mistake in this piece of code, or is it just responding correctly to bad input from some other piece of code?), you're also often unsure exactly what the problem is this output is wrong, but how does each unit contribute to the error? Empirical scientists must perform experiments that check only one hypothesis at a time, whether the subject at hand is chemistry, physics, or the behavior of a body of program code.

Time for action – identifying units

Imagine that you're responsible for testing the following code:

class testable:
  def method1(self, number):
    number += 4
    number **= 0.5
    number *= 7
    return number
  def method2(self, number):
    return ((number * 2) ** 1.27) * 0.3
  def method3(self, number):
    return self.method1(number) + self.method2(number)
  def method4(self):
    return 1.713 * self.method3(id(self))

In this example, what are the units? Is the whole class a single unit, or is each method a separate unit. How about each statement, or each expression? Keep in mind that the definition of a unit is somewhat subjective (although never bigger than a single class), and make your own decision.

Think about what you chose. What would the consequences have been if you chose otherwise? For example, if you chose to think of each method as a unit, what would be different if you chose to treat the whole class as a unit?

Consider method4. Its result depends on all of the other methods working correctly. On top of that, it depends on something that changes from one test run to another, the unique ID of the self object. Is it even possible to treat method4 as a unit in a self-contained test? If we could change anything except method4, what would we have to change to enable method4 to run in a self-contained test and produce a predictable result?

What just happened?

By answering those three questions, you thought about some of the deeper aspects of unit testing.

The question of what constitutes a unit, is fundamental to how you organize your tests. The capabilities of the language affects this choice. C++ and Java make it difficult or impossible to treat methods as units, for example, so in those languages each class is usually treated as a single unit. C, on the other hand, doesn't support classes as language features at all, so the obvious choice of unit is the function. Python is flexible enough that either classes or methods could be considered units, and of course it has stand-alone functions as well, which are also natural to think of as units. Python can't easily handle individual statements within a function or method as units, because they don't exist as separate objects when the test runs. They're all lumped together into a single code object that's part of the function.

The consequences of your choice of unit are far-reaching. The smaller the units are, the more useful the tests tend to be, because they narrow down the location and nature of bugs more quickly. For example, one of the consequences of choosing to treat the testable class as a single unit is that tests of the class will fail if there is a mistake in any of the methods. That tells you that there's a mistake in testable, but not (for example) that it's in method2. On the other hand, there is a certain amount of rigmarole involved in treating method4 and its like as units. Even so, I recommend using methods and functions as units most of the time, because it pays off in the long run.

In answering the third question, you probably discovered that the functions id and self.method3 would need to have different definitions, definitions that produced a predictable result, and did so without invoking code in any of the other units. In Python, replacing the real function with such stand-ins is fairly easy to do in an ad hoc manner.

Unit testing throughout the development process

We'll walk through the development of a single class, treating it with all the dignity of a real project. We'll be strictly careful to integrate unit testing into every phase of the project. This may seem silly at times, but just play along. There's a lot to learn from the experience.

The example we'll be working with is a PID controller. The basic idea is that a PID controller is a feedback loop for controlling some piece of real-world hardware. It takes input from a sensor that can measure some property of the hardware, and generates a control signal that adjusts that property toward some desired state. The position of a robot arm in a factory might be controlled by a PID controller.

If you want to know more about PID controllers, the Internet is rife with information. The Wikipedia entry is a good place to start: http://en.wikipedia.org/wiki/PID_controller.

Design phase

Our notional client comes to us with the following (rather sparse) specification:

We want a class that implements a PID controller for a single variable. The measurement, setpoint, and output should all be real numbers.

We need to be able to adjust the setpoint at runtime, but we want it to have a memory, so that we can easily return to the previous setpoint.

Time for action - unit testing during design

Time to make that specification a bit more formal—and complete—by writing unit tests that describe the desired behavior.

We need to write a test that describes the PID constructor. After checking our references, we determine that a PID controller is defined by three gains, and a setpoint. The controller has three components: proportional, integral and derivative (hence the name PID). Each gain is a number that determines how much one of the three parts of the controller has on the final result. The setpoint determines what the goal of the controller is; in other words, to where it's trying to move the controlled variable. Looking at all that, we decide that the constructor should just store the gains and the setpoint, along with initializing some internal state that we know we'll need due to reading up on the workings of a PID controller:
```
>>> import pid
>>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0)
>>> controller.gains
(0.5, 0.5, 0.5)
>>> controller.setpoint
[0.0]
>>> controller.previous_time is None
True
>>> controller.previous_error
0.0
>>> controller.integrated_error
0.0
```

We need to write tests that describe measurement processing. This is the controller in action, taking a measured value as its input and producing a control signal that should smoothly move the measured variable to the setpoint. For this to work correctly, we need to be able to control what the controller sees as the current time. After that, we plug our test input values into the math that defines a PID controller, along with the gains, to figure out what the correct outputs would be:
```
>>> import time
>>> real_time = time.time
>>> time.time = (float(x) for x in xrange(1, 1000)).next
>>> pid = reload(pid)
>>> controller = pid.PID(P=0.5, I=0.5, D=0.5, setpoint=0)
>>> controller.measure(12)
-6.0
>>> controller.measure(6)
-3.0
>>> controller.measure(3)
-4.5
>>> controller.measure(-1.5)
-0.75
>>> controller.measure(-2.25)
-1.125
>>> time.time = real_time
```

We need to write tests that describe setpoint handling. Our client asked for a setpoint stack, so we write tests that check such stack behavior. Writing code that uses this stack behavior brings to our attention that fact that a PID controller with no setpoint is not a meaningful entity, so we add a test that checks that the PID class rejects that situation by raising an exception.

>>> pid = reload(pid)
>>> controller = pid.PID(P = 0.5, I = 0.5, D = 0.5, setpoint = 0)
>>> controller.push_setpoint(7)
>>> controller.setpoint
[0.0, 7.0]
>>> controller.push_setpoint(8.5)
>>> controller.setpoint
[0.0, 7.0, 8.5]
>>> controller.pop_setpoint()
8.5
>>> controller.setpoint
[0.0, 7.0]
>>> controller.pop_setpoint()
7.0
>>> controller.setpoint
[0.0]
>>> controller.pop_setpoint()
Traceback (most recent call last):
ValueError: PID controller must have a setpoint

What just happened?

Our clients gave us a pretty good initial specification, but it left a lot of details to assumption. By writing these tests, we've codified exactly what our goal is. Writing the tests forced us to make our assumptions explicit. Additionally, we've gotten a chance to use the object, which gives us an understanding of it that would otherwise be hard to get at this stage.

Normally we'd place the doctests in the same file as the specification, and in fact that's what you'll find in the book's code archive. In the book format, we used the specification text as the description for each step of the example.

You could ask how many tests we should write for each piece of the specification. After all, each test is for certain specific input values, so when code passes it, all it proves is that the code produces the right results for that specific input. The code could conceivably do something entirely wrong, and still pass the test. The fact is that it's usually a safe assumption that the code you'll be testing was supposed to do the right thing, and so a single test for each specified property fairly well distinguishes between working and non-working code. Add to that tests for any boundaries specified—for "The X input may be between the values 1 and 7, inclusive" you might add tests for X values of 0.9 and 7.1 to make sure they weren't accepted—and you're doing fine.

There were a couple of tricks we pulled to make the tests repeatable and independent. In every test after the first, we called the reload function on the pid module, to reload it from the disk. That has the effect of resetting anything that might have changed in the module, and causes it to re-import any modules that it depends on. That latter effect is particularly important, since in the tests of measure, we replaced time.time with a dummy function. We want to be sure that the pid module uses the dummy time function, so we reload the pid module. If the real time function is used instead of the dummy, the test won't be useful, because there will be only one time in all of history at which it would succeed. Tests need to be repeatable.

The dummy time function is created by making an iterator that counts through the integers from 1 to 999 (as floating point values), and binding time.time to that iterator's next method. Once we were done with the time-dependent tests, we replaced the original time.time.

Right now, we have tests for a module that doesn't exist. That's good! Writing the tests was easier than writing the module will be, and it gives us a stepping stone toward getting the module right, quickly and easily. As a general rule, you always want to have tests ready before the code that they test is written.

Have a go hero

Try this a few times on your own: Describe some program or module that you'd enjoy having access to in real life, using normal language. Then go back through it and try writing tests, describing the program or module. Keep an eye out for places where writing the test makes you aware of ambiguities in your prior description, or makes you realize that there's a better way to do something.