Test all the things with Python

20 min read

The first testing tool we’re going to look at is called doctest. The name is short for “document testing” or perhaps “testable document”. Either way, it’s a literate tool designed to make it easy to write tests in such a way that both computers and humans benefit from them. Ideally, a doctest both informs human readers and tells the computer what to expect.

Mixing tests and documentation helps us:

  • Keep the documentation up to date with reality
  • Make sure that the tests express the intended behavior
  • Reuse some of the effort involved in documentation and test creation


Where doctest performs best

The design decisions that went into doctest make it particularly well suited to writing acceptance tests at the integration and system testing levels. This is because doctest mixes human-only text with examples that both humans and computers can read. This structure doesn’t support or enforce any of the formalizations of testing, but it conveys information beautifully while still giving the computer the ability to say whether something works. As an added bonus, it is about the easiest way to write tests you’ll ever see.

In other words, a doctest file is a truly excellent program specification that you can have the computer check against your actual code any time you want. API documentation also benefits from being written as doctests and checked alongside your other tests. You can even include doctests in your docstrings.

The basic idea you should be getting from all this is that doctest is ideal for uses where humans and computers will both benefit from reading them.

The doctest language

Like program source code, doctest tests are written in plain text. The doctest module extracts the tests and ignores the rest of the text, which means that the tests can be embedded in human-readable explanations or discussions. This is the feature that makes doctest suitable for uses such as program specifications.

Example – creating and running a simple doctest

We are going to create a simple doctest file, to show the fundamentals of using the tool. Perform the following steps:

  1. Open a new text file in your editor, and name it test.txt.
  2. Insert the following text into the file:
    This is a simple doctest that checks some of Python's arithmetic
    >>> 2 + 2
    4
    >>> 3 * 3
    10
  3. We can now run the doctest. At the command prompt, change to the directory where you saved test.txt. Type the following command:
    $ python3 -m doctest test.txt
  4. When the test is run, you should see output like this (details such as line numbers may vary slightly between Python versions):

    **********************************************************************
    File "test.txt", line 4, in test.txt
    Failed example:
        3 * 3
    Expected:
        10
    Got:
        9
    **********************************************************************
    1 items had failures:
       1 of   2 in test.txt
    ***Test Failed*** 1 failures.

Result – three times three does not equal ten

You just wrote a doctest file that describes a couple of arithmetic operations, and ran it to check whether Python behaved as the tests said it should. You ran the tests by telling Python to execute doctest on the file containing the tests.

In this case, Python’s behavior differed from the tests because, according to the tests, three times three equals ten. However, Python disagrees on that. As doctest expected one thing and Python did something different, doctest presented you with a nice little error report showing where to find the failed test, and how the actual result differed from the expected result. At the bottom of the report is a summary showing how many tests failed in each file tested, which is helpful when you have more than one file containing tests.
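If you'd rather drive doctest from Python than from the command line, the doctest module exposes the same machinery programmatically. Here's a hedged sketch that rebuilds the arithmetic test file in a temporary directory and runs it with doctest.testfile (the file contents mirror the example above; the temporary path is incidental):

```python
import doctest
import os
import tempfile

# Rebuild the arithmetic test file from the example above.
content = (
    "This is a simple doctest that checks some of Python's arithmetic\n"
    ">>> 2 + 2\n"
    "4\n"
    ">>> 3 * 3\n"
    "10\n"
)

path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w") as f:
    f.write(content)

# The failure report prints to stdout, just as with python3 -m doctest;
# testfile then returns a named tuple of (failed, attempted) counts.
results = doctest.testfile(path, module_relative=False, verbose=False)
print(results)  # TestResults(failed=1, attempted=2)
```

As on the command line, the 3 * 3 example fails, so the returned counts show one failure out of two attempted examples.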

The syntax of doctests

You might have already figured it out from looking at the previous example: doctest recognizes tests by looking for sections of text that look like they’ve been copied and pasted from a Python interactive session. Anything that can be expressed in Python is valid within a doctest.

Lines that start with a >>> prompt are sent to a Python interpreter. Lines that start with a ... prompt are sent as continuations of the code from the previous line, allowing you to embed complex block statements into your doctests. Finally, any lines that don’t start with >>> or ..., up to the next blank line or >>> prompt, represent the output expected from the statement. The output appears as it would in an interactive Python session, including both the return value and anything printed to the console. If you don’t have any output lines, doctest assumes it to mean that the statement is expected to have no visible result on the console, which usually means that it returns None.

The doctest module ignores anything in the file that isn’t part of a test, which means that you can put explanatory text, HTML, line-art diagrams, or whatever else strikes your fancy in between your tests. We took advantage of this in the previous doctest to add an explanatory sentence before the test itself.
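If you're curious how these rules work in practice, the standard doctest.DocTestParser class can show you. This hedged sketch parses a small session and prints how the prompts are stripped from the source and where the expected output ends:

```python
import doctest

# A two-line block statement followed by its expected output.
text = """\
>>> for i in range(2):
...     print(i)
0
1
"""

# get_examples() returns one Example per statement; the prompts are
# stripped from the source, and the lines up to the next blank line or
# prompt become the expected output ("want").
example = doctest.DocTestParser().get_examples(text)[0]
print(repr(example.source))  # 'for i in range(2):\n    print(i)\n'
print(repr(example.want))    # '0\n1\n'
```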

Example – a more complex test

Add the following code to your test.txt file, separated from the existing code by at least one blank line:

Now we're going to take some more of doctest's syntax for a spin.

>>> import sys
>>> def test_write():
...     sys.stdout.write("Hello\n")
...     return True
>>> test_write()
Hello
True
Now take a moment to consider before running the test. Will it pass or fail? Should it pass or fail?

Result – five tests run?

Just as we discussed before, run the test using the following command:

python3 -m doctest test.txt

You should see a result like this:

**********************************************************************
File "test.txt", line 4, in test.txt
Failed example:
    3 * 3
Expected:
    10
Got:
    9
**********************************************************************
1 items had failures:
   1 of   5 in test.txt
***Test Failed*** 1 failures.

Because we added the new tests to the same file containing the tests from before, we still see the notification that three times three does not equal ten. Now, though, the summary reports that five tests were run, which means our new tests ran and were successful.

Why five tests? As far as doctest is concerned, we added the following three tests to the file:

  • The first one says that, when we import sys, nothing visible should happen
  • The second test says that, when we define the test_write function, nothing visible should happen
  • The third test says that, when we call the test_write function, Hello and True should appear on the console, in that order, on separate lines

Since all three of these tests pass, doctest doesn’t bother to say much about them. All it did was increase the number of tests reported at the bottom from two to five.
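You can confirm this count with doctest's own parser. This sketch rebuilds the file's examples as a string and shows that the def block, continuation lines and all, counts as a single example:

```python
import doctest

# The same examples as the test file, rebuilt as a string.
text = """\
>>> 2 + 2
4
>>> 3 * 3
10
>>> import sys
>>> def test_write():
...     sys.stdout.write("Hello\\n")
...     return True
>>> test_write()
Hello
True
"""

# Five examples: the def statement and its continuation lines are one.
examples = doctest.DocTestParser().get_examples(text)
print(len(examples))  # 5
```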

Expecting exceptions

That’s all well and good for testing that things work as expected, but it is just as important to make sure that things fail when they’re supposed to fail. Put another way: sometimes your code is supposed to raise an exception, and you need to be able to write tests that check that behavior as well.

Fortunately, doctest follows nearly the same principle in dealing with exceptions as it does with everything else; it looks for text that looks like a Python interactive session. This means it looks for text that looks like a Python exception report and traceback, and matches it against any exception that gets raised.

The doctest module does handle exceptions a little differently from the way it handles other things. It doesn’t just match the text precisely and report a failure if it doesn’t match. Exception tracebacks tend to contain many details that are not relevant to the test, but that can change unexpectedly. The doctest module deals with this by ignoring the traceback entirely: it’s only concerned with the first line, Traceback (most recent call last):, which tells it that you expect an exception, and the part after the traceback, which tells it which exception you expect. The doctest module only reports a failure if one of these parts does not match.

This is helpful for a second reason as well: manually figuring out what the traceback will look like, when you’re writing your tests, would require a significant amount of effort and would gain you nothing. It’s better to simply omit them.
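As mentioned earlier, doctests can also live in docstrings, and the same exception-matching rules apply there. Here's a hedged sketch: divide() is a hypothetical helper, not part of the article's test file, and its docstring expects an exception the same way a doctest file would.

```python
import doctest

def divide(a, b):
    """Divide a by b. (A hypothetical helper for illustration only.)

    >>> divide(10, 2)
    5.0
    >>> divide(1, 0)
    Traceback (most recent call last):
    ZeroDivisionError: division by zero
    """
    return a / b

# Parse and run the docstring's examples directly.
parser = doctest.DocTestParser()
test = parser.get_doctest(divide.__doc__, {"divide": divide}, "divide", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)  # TestResults(failed=0, attempted=2)
```

The traceback body is omitted from the docstring entirely; doctest only checks the Traceback header and the final exception line.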

Example – checking for an exception

This is yet another test that you can add to test.txt, this time testing some code that ought to raise an exception.

Insert the following text into your doctest file, as always separated by at least one blank line:

Here we use doctest's exception syntax to check that Python is correctly enforcing its grammar. The error is a missing ) on the def line.

>>> def faulty(:
...     yield from [1, 2, 3, 4, 5]
Traceback (most recent call last):
SyntaxError: invalid syntax

The test is supposed to raise an exception, so it will fail if it doesn’t raise the exception or if it raises the wrong exception. Make sure that you have your mind wrapped around this: if the test code executes successfully, the test fails, because it expected an exception.

Run the tests using the following command:

python3 -m doctest test.txt

Result – success at failing

The code contains a syntax error, which means this raises a SyntaxError exception, which in turn means that the example behaves as expected; this signifies that the test passes.

When dealing with exceptions, it is often desirable to be able to use a wildcard matching mechanism. The doctest module provides this facility through the +ELLIPSIS directive, which we’ll discuss shortly.

Expecting blank lines

The doctest module uses the first blank line after a >>> line to identify the end of the expected output, so what do you do when the expected output actually contains a blank line?

The doctest module handles this situation by matching a line that contains only the text <BLANKLINE> in the expected output against a real blank line in the actual output.
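To see <BLANKLINE> in action without editing test.txt, this sketch runs a tiny doctest from a string using the standard parser and runner classes:

```python
import doctest

# The real blank line in print's output is spelled <BLANKLINE> in the
# expected output.
text = """\
>>> print("first\\n\\nthird")
first
<BLANKLINE>
third
"""

test = doctest.DocTestParser().get_doctest(text, {}, "blankline_demo", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)  # TestResults(failed=0, attempted=1)
```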

Controlling doctest behavior with directives

Sometimes, the default behavior of doctest makes writing a particular test inconvenient. For example, doctest might look at a trivial difference between the expected and real outputs and wrongly conclude that the test has failed. This is where doctest directives come to the rescue. Directives are specially formatted comments that you can place after the source code of a test and that tell doctest to alter its default behavior in some way.

A directive comment begins with # doctest:, after which comes a comma-separated list of options that either enable or disable various behaviors. To enable a behavior, write a + (plus symbol) followed by the behavior name. To disable a behavior, write a - (minus symbol) followed by the behavior name. We’ll take a look at several directives in the following sections.

Ignoring part of the result

It’s fairly common that only part of the output of a test is actually relevant to determining whether the test passes. By using the +ELLIPSIS directive, you can make doctest treat the text ... (called an ellipsis) in the expected output as a wildcard that will match any text in the actual output.

When you use an ellipsis, doctest will scan until it finds text matching whatever comes after the ellipsis in the expected output, and continue matching from there. This can lead to surprising results such as an ellipsis matching against a 0-length section of the actual output, or against multiple lines. For this reason, it needs to be used thoughtfully.

Example – ellipsis test drive

We’re going to use the ellipsis in a few different tests to better get a feel of how it works. As an added bonus, these tests also show the use of doctest directives.

Add the following code to your test.txt file:

Next up, we're exploring the ellipsis.

>>> sys.modules # doctest: +ELLIPSIS
{...'sys': <module 'sys' (built-in)>...}

>>> 'This is an expression that evaluates to a string'
... # doctest: +ELLIPSIS
'This is ... a string'

>>> 'This is also a string' # doctest: +ELLIPSIS
'This is ... a string'

>>> import datetime
>>> datetime.datetime.now().isoformat() # doctest: +ELLIPSIS
'...-...-...T...:...:...'

Result – ellipsis elides

The tests all pass, where they would all fail without the ellipsis. The first and last tests, in which we checked for the presence of a specific module in sys.modules and confirmed a specific format while ignoring the contents of a string, demonstrate the kind of situation where the ellipsis is really useful, because it lets you focus on the part of the output that is meaningful and ignore the rest. The middle tests demonstrate how different outputs can match the same expected result when an ellipsis is in play.

Look at the last test. Can you imagine any output that wasn’t an ISO-formatted time stamp, but that would match the example anyway? Remember that the ellipsis can match any amount of text.

Ignoring white space

Sometimes, white space (spaces, tabs, newlines, and their ilk) is more trouble than it’s worth. Maybe you want to be able to break a single line of expected output across several lines in your test file, or maybe you’re testing a system that uses lots of white space but doesn’t convey any useful information with it.

The doctest module gives you a way to “normalize” white space, turning any sequence of white space characters, in both the expected output and the actual output, into a single space. It then checks whether these normalized versions match.

Example – invoking normality

We’re going to write a couple of tests that demonstrate how whitespace normalization works.

Insert the following code into your doctest file:

Next, a demonstration of whitespace normalization.

>>> [1, 2, 3, 4, 5, 6, 7, 8, 9] # doctest: +NORMALIZE_WHITESPACE
[1, 2, 3,
 4, 5, 6,
 7, 8, 9]

>>> sys.stdout.write("This text\n contains weird     spacing.\n")
... # doctest: +NORMALIZE_WHITESPACE
This text contains weird spacing.
39


Result – white space matches any other white space

Both of these tests pass, in spite of the fact that the result of the first one has been wrapped across multiple lines to make it easy for humans to read, and the result of the second one has had its strange newlines and indentations left out, also for human convenience.

Notice how one of the tests inserts extra whitespace in the expected output, while the other one ignores extra whitespace in the actual output? When you use +NORMALIZE_WHITESPACE, you gain a lot of flexibility with regard to how things are formatted in the text file.

You may have noted the value 39 on the last line of the last example. Why is that there? It’s because the write() method returns the number of characters written, which in this case happens to be 39. Since doctest considers the return value part of the output, leaving that line out would make the test fail.

Skipping an example

On some occasions, doctest will recognize some text as an example to be checked, when in truth you want it to be simply text. This situation is rarer than it might at first seem, because usually there’s no harm in letting doctest check everything it can. In fact, usually it’s very helpful to have doctest check everything it can. For those times when you want to limit what doctest checks, though, there’s the +SKIP directive.

Example – humans only

Append the following code to your doctest file:

Now we're telling doctest to skip a test

>>> 'This test would fail.' # doctest: +SKIP
If it were allowed to run.

Result – it looks like a test, but it’s not

Before we added this last example to the file, doctest reported thirteen tests when we ran the file through it. After adding this code, doctest still reports thirteen tests. Adding the skip directive to the code completely removed it from consideration by doctest. It’s not a test that passes, nor a test that fails. It’s not a test at all.
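You can observe the same thing programmatically: a skipped example is not counted as attempted at all. Here's a small sketch using the standard parser and runner:

```python
import doctest

# The skipped example from above, run from a string.
text = """\
>>> 'This test would fail.' # doctest: +SKIP
If it were allowed to run.
"""

test = doctest.DocTestParser().get_doctest(text, {}, "skip_demo", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)  # TestResults(failed=0, attempted=0)
```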

The other directives

There are a number of other directives that can be issued to doctest, should you find the need. They’re not as broadly useful as the ones already mentioned, but the time might come when you require one or more of them.

The full documentation for all of the doctest directives can be found at http://docs.python.org/3/library/doctest.html#doctest-options.

The remaining directives of doctest in the Python 3.4 version are as follows:

  • DONT_ACCEPT_TRUE_FOR_1: This makes doctest differentiate between boolean values and numbers
  • DONT_ACCEPT_BLANKLINE: This removes support for the <BLANKLINE> feature
  • IGNORE_EXCEPTION_DETAIL: This makes doctest only care that an exception is of the expected type

Strictly speaking, doctest supports several other options that can be set using the directive syntax, but they don’t make any sense as directives, so we’ll ignore them here.

The execution scope of doctest tests

When doctest is running the tests from text files, all the tests from the same file are run in the same execution scope. This means that, if you import a module or bind a variable in one test, that module or variable is still available in later tests. We took advantage of this fact several times in the tests written so far in this article: the sys module was only imported once, for example, although it was used in several tests.
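The shared scope is easy to demonstrate with a two-example doctest run from a string; the name x bound by the first example is still visible to the second:

```python
import doctest

# Both examples run in the same namespace, so x persists between them.
text = """\
>>> x = 41
>>> x + 1
42
"""

test = doctest.DocTestParser().get_doctest(text, {}, "shared_scope", None, 0)
results = doctest.DocTestRunner(verbose=False).run(test)
print(results)  # TestResults(failed=0, attempted=2)
```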

This behavior is not necessarily beneficial, because tests need to be isolated from each other. We don’t want them to contaminate each other because, if a test depends on something that another test does, or if it fails because of something that another test does, these two tests are in some sense combined into one test that covers a larger section of your code. You don’t want that to happen, because then knowing which test has failed doesn’t give you as much information about what went wrong and where it happened.

So, how can we give each test its own execution scope? There are a few ways to do it. One would be to simply place each test in its own file, along with whatever explanatory text is needed. This works well in terms of functionality, but running the tests can be a pain unless you have a tool to find and run all of them for you. Another problem with this approach is that it breaks the idea that the tests contribute to a human-readable document.

Another way to give each test its own execution scope is to define each test within a function, as follows:

>>> def test1():
...     import frob
...     return frob.hash('qux')
>>> test1()
By doing this, the only thing that ends up in the shared scope is the test function (named test1 here). The frob module and any other names bound inside the function are isolated, with the caveat that things that happen inside imported modules are not isolated. If frob.hash() changes state inside the frob module, that state will still be changed if a different test imports the frob module again.

The third way is to exercise caution with the names you create, and be sure to set them to known values at the beginning of each test section. In many ways this is the easiest approach, but this is also the one that places the most burden on you, because you have to keep track of what’s in the scope.

Why does doctest behave in this way, instead of isolating tests from each other? The doctest files are intended not just for computers to read, but also for humans. They often form a sort of narrative, flowing from one thing to the next. It would break the narrative to be constantly repeating what came before. In other words, this approach is a compromise between being a document and being a test framework, a middle ground that works for both humans and computers.

Check your understanding

Once you’ve decided on your answers to these questions, check them by writing a test document and running it through doctest:

  • How does doctest recognize the beginning of a test in a document?
  • How does doctest know when a test continues to further lines?
  • How does doctest recognize the beginning and end of the expected output of a test?
  • How would you tell doctest that you want to break the expected output across several lines, even though that’s not how the test actually outputs it?
  • Which parts of an exception report are ignored by doctest?
  • When you assign a variable in a test file, which parts of the file can actually see that variable?
  • Why do we care what code can see the variables created by a test?
  • How can we make doctest not care what a section of output contains?

Exercise – English to doctest

Time to stretch your wings a bit. I’m going to give you a description of a single function in English. Your job is to copy the description into a new text file, and then add tests that describe all the requirements in a way that the computer can understand and check.

Try to make the doctests so that they’re not just for the computer. Good doctests tend to clarify things for human readers as well. By and large, this means that you present them to human readers as examples interspersed with the text.

Without further ado, here is the English description:

The fib(N) function takes a single integer as its only parameter N. If N is 0 or 1, the function returns 1. If N is less than 0, the function raises a ValueError. Otherwise, the function returns the sum of fib(N - 1) and fib(N - 2). The returned value will never be less than 1. A naïve implementation of this function would get very slow as N increased.

I’ll give you a hint and point out that the last sentence, about the function being slow, isn’t really testable. As computers get faster, any test you write that depends on an arbitrary definition of “slow” will eventually fail. Also, there’s no good way to test the difference between a slow function and a function stuck in an infinite loop, so there’s not much point in trying. If you find yourself needing to do that, it’s best to back off and try a different solution.

Not being able to tell whether a function is stuck or just slow is called the halting problem by computer scientists. We know that it can’t be solved unless we someday discover a fundamentally better kind of computer. Faster computers won’t do the trick, and neither will quantum computers, so don’t hold your breath.

The next-to-last sentence also provides some difficulty, since to test it completely would require running every positive integer through the fib() function, which would take forever (except that the computer will eventually run out of memory and force Python to raise an exception). How do we deal with this sort of thing, then?

The best solution is to check whether the condition holds true for a random sample of viable inputs. The random.randrange() and random.choice() functions in the Python standard library make that fairly easy to do.
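To make the hint concrete, here is a hedged sketch. The fib() implementation below is hypothetical (writing the doctests for it is your exercise); the point is the sampling check, which spot-checks the "never less than 1" requirement on random inputs rather than on every positive integer:

```python
import random

# A hypothetical fib() implementation matching the English description.
def fib(n):
    if n < 0:
        raise ValueError("n must not be negative")
    if n in (0, 1):
        return 1
    return fib(n - 1) + fib(n - 2)

# Spot-check the "never less than 1" property on a random sample of
# viable inputs; the range bound of 20 is an arbitrary choice that keeps
# the naive implementation fast enough.
sample = [random.randrange(0, 20) for _ in range(10)]
print(all(fib(n) >= 1 for n in sample))  # True
```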


In this article, we learned the syntax of doctest and went through several examples describing how to use it.

Specifically, we covered doctest’s default syntax and the directives that alter it, how to write doctests in text files, and what it feels like to use doctest to turn a specification into tests. If you want to learn more about Python testing, you can refer to the following books:

Expert Python Programming: https://www.packtpub.com/application-development/expert-python-programming

Python Testing Cookbook: https://www.packtpub.com/application-development/python-testing-cookbook
