In this article by Claus Führer, the author of the book Scientific Computing with Python 3, we focus on two aspects of testing for scientific programming: manual and automated testing. Manual testing is what every programmer does to quickly check that an implementation is working. Automated testing is the refined, automated variant of that idea. We will introduce some tools available for automated testing in general, with a view to the particular case of scientific computing.
During the development of code you carry out many small tests in order to check its functionality. This could be called manual testing. Typically, you would verify that a given function does what it is supposed to do by calling it in an interactive environment.
For instance, suppose that you implement the bisection algorithm. It is an algorithm that finds a zero (root) of a scalar nonlinear function. To start the algorithm, an interval has to be given with the property that the function takes different signs on the interval boundaries.
You would then typically test an implementation of that algorithm by checking that it finds the zero of a simple function and that it fails as expected on an invalid initial interval.
Manual testing, as necessary as it may seem, is unsatisfactory. Once you have convinced yourself that the code does what it is supposed to do, you formulate a relatively small number of demonstration examples to convince others of its quality. At that stage one often loses interest in the tests made during development, and they are forgotten or even deleted.
As soon as you change a detail and things no longer work correctly, you might regret that your earlier tests are no longer available.
The correct way to develop any piece of code is to use automatic testing.
The advantages are obvious: the tests stay with the code, they can be rerun automatically after every modification, and they document the expected behavior of the code.
We suggest developing tests in parallel to the code. Good test design is an art of its own, and there is rarely an investment that guarantees as good a pay-off in development-time savings as the investment in good tests.
Now we will go through the implementation of a simple algorithm with the automated testing methods in mind.
Let us examine automated testing for the bisection algorithm. With this algorithm, a zero of a real-valued function is found. An implementation of the algorithm can have the following form:
def bisect(f, a, b, tol=1.e-8):
    """
    Implementation of the bisection algorithm

    f   real valued function
    a,b interval boundaries (float) with
        the property f(a) * f(b) <= 0
    tol tolerance (float)
    """
    if f(a) * f(b) > 0:
        raise ValueError("Incorrect initial interval [a, b]")
    for i in range(100):
        c = (a + b) / 2.
        if f(a) * f(c) <= 0:
            b = c
        else:
            a = c
        if abs(a - b) < tol:
            return (a + b) / 2
    raise Exception('No root found within the given tolerance {}'.format(tol))
We assume this to be stored in a file bisection.py.
As a first test case, we test that the zero of the function f(x) = x is found:
from numpy import allclose

def test_identity():
    result = bisect(lambda x: x, -1., 1.)  # the identity function has its zero at 0
    expected = 0.
    assert allclose(result, expected), 'expected zero not found'

test_identity()
In this code you meet the Python keyword assert for the first time. It raises an AssertionError exception if its first argument evaluates to False. Its optional second argument is a string with additional information.
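A minimal illustration of how assert behaves:

```python
x = 3
assert x > 0, 'x must be positive'  # condition is True: nothing happens

try:
    assert x < 0, 'x must be negative'
except AssertionError as err:
    # condition is False: an AssertionError carrying the message is raised
    caught = str(err)

print(caught)  # prints: x must be negative
```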
We use the function allclose (from NumPy) in order to test floats for equality. Let us comment on some of the features of the test function. We use an assertion to make sure that an exception will be raised if the code does not behave as expected. We have to run the test manually, in the line test_identity(). There are many tools to automate this kind of call. Let us now set up a test that checks whether bisect raises an exception when the function has the same sign on both ends of the interval. For now, we will suppose that the exception raised is a ValueError.
Example: Checking the sign for the bisection algorithm.
def test_badinput():
    try:
        bisect(lambda x: x, 0.5, 1)
    except ValueError:
        pass
    else:
        raise AssertionError()

test_badinput()
In this case, an AssertionError is raised if bisect does not raise the expected ValueError.
There are tools to simplify the above construction for checking that an exception is raised. Another useful kind of test is the edge case test. Here we test arguments or user input which are likely to create mathematically undefined situations or program states not foreseen by the programmer.
For instance, what happens if both bounds are equal? What happens if a > b? We can easily set up such tests, for instance:
def test_equal_boundaries():
    result = bisect(lambda x: x, 1., 1.)
    expected = 0.
    assert allclose(result, expected), 'test equal interval bounds failed'

def test_reverse_boundaries():
    result = bisect(lambda x: x, 1., -1.)
    expected = 0.
    assert allclose(result, expected), 'test reverse interval bounds failed'

test_equal_boundaries()
test_reverse_boundaries()
The standard Python package unittest greatly facilitates automated testing. It requires that we rewrite our tests a little to be compatible. The first test would have to be rewritten as a method of a class, as follows:
import unittest
from bisection import bisect

class TestIdentity(unittest.TestCase):
    def test(self):
        result = bisect(lambda x: x, -1.2, 1., tol=1.e-8)
        expected = 0.
        self.assertAlmostEqual(result, expected)

if __name__ == '__main__':
    unittest.main()
Let us examine the differences from the previous implementation. First, the test is now a method and part of a class. The class must inherit from unittest.TestCase. The test method's name must start with test. Note that we may now use one of the assertion tools of the package, namely assertAlmostEqual. Finally, the tests are run using unittest.main.
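By default, assertAlmostEqual considers two floats equal when they agree after rounding their difference to seven decimal places; its places parameter adjusts this, as the following small sketch shows:

```python
import unittest

class DemoAlmostEqual(unittest.TestCase):
    def runTest(self):
        # the values agree to 7 decimal places, so this passes
        self.assertAlmostEqual(0.5, 0.5 + 1e-8)
        # a coarser comparison: only 3 decimal places are required
        self.assertAlmostEqual(0.5, 0.5 + 1e-4, places=3)

result = unittest.TextTestRunner().run(unittest.TestSuite([DemoAlmostEqual()]))
```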
We recommend writing the tests in a file separate from the code to be tested; that is why the test module starts with an import.
The test passes and returns:
.
----------------------------------------------------------------------
Ran 1 test in 0.002s

OK
If we had run it with a loose tolerance parameter, e.g., 1.e-3, a failure of the test would have been reported:
F
======================================================================
FAIL: test (__main__.TestIdentity)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython-input-11-e44778304d6f>", line 5, in test
    self.assertAlmostEqual(result, expected)
AssertionError: 0.00017089843750002018 != 0.0 within 7 places

----------------------------------------------------------------------
Ran 1 test in 0.004s

FAILED (failures=1)
Tests can and should be grouped together as methods of a test class:
Example:
import unittest
from bisection import bisect

class TestIdentity(unittest.TestCase):
    def identity_fcn(self, x):
        return x
    def test_functionality(self):
        result = bisect(self.identity_fcn, -1.2, 1., tol=1.e-8)
        expected = 0.
        self.assertAlmostEqual(result, expected)
    def test_reverse_boundaries(self):
        result = bisect(self.identity_fcn, 1., -1.)
        expected = 0.
        self.assertAlmostEqual(result, expected)
    def test_exceeded_tolerance(self):
        tol = 1.e-80
        self.assertRaises(Exception, bisect, self.identity_fcn,
                          -1.2, 1., tol)

if __name__ == '__main__':
    unittest.main()
Here, the last test needs some comments: we used the method unittest.TestCase.assertRaises. It tests whether an exception is correctly raised. Its first parameter is the exception type, for example, ValueError or Exception, and its second argument is the function which is expected to raise the exception. The remaining arguments are the arguments for this function.
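As an aside, assertRaises can also be used as a context manager, which many find more readable; a minimal sketch, where the bisect definition is only a stand-in reproducing the input check discussed above:

```python
import unittest

def bisect(f, a, b, tol=1.e-8):
    # minimal stand-in with the same input check as the implementation above
    if f(a) * f(b) > 0:
        raise ValueError("Incorrect initial interval [a,b]")
    while abs(a - b) >= tol:
        c = (a + b) / 2.
        if f(a) * f(c) <= 0:
            b = c
        else:
            a = c
    return (a + b) / 2

class TestBadInput(unittest.TestCase):
    def test_badinput(self):
        # the with-block must raise ValueError, otherwise the test fails
        with self.assertRaises(ValueError):
            bisect(lambda x: x, 0.5, 1)

result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestBadInput))
```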
The command unittest.main() creates an instance of the class TestIdentity and executes those methods whose names start with test.
The class unittest.TestCase provides two special methods, setUp and tearDown, which are run before and after every call to a test method. This is needed when testing objects which are exhausted after every test, such as files or generators. We demonstrate this here by testing a program which checks in which line of a file a given string occurs for the first time:
class NotFoundError(Exception):
    pass

def find_string(file, string):
    for i, lines in enumerate(file.readlines()):
        if string in lines:
            return i
    raise NotFoundError('String {} not found in File {}'.format(
                                                 string, file.name))
We assume that this code is saved in a file find_in_file.py. A test has to prepare the file, open it, and remove it after the test:
import unittest
import os  # used here for deleting files
from find_in_file import find_string, NotFoundError

class TestFindInFile(unittest.TestCase):
    def setUp(self):
        file = open('test_file.txt', 'w')
        file.write('aha')
        file.close()
        self.file = open('test_file.txt', 'r')
    def tearDown(self):
        self.file.close()
        os.remove(self.file.name)
    def test_exists(self):
        line_no = find_string(self.file, 'aha')
        self.assertEqual(line_no, 0)
    def test_not_exists(self):
        self.assertRaises(NotFoundError, find_string, self.file, 'bha')

if __name__ == '__main__':
    unittest.main()
Before each test setUp is run and afterwards tearDown is executed.
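A variant of the same set-up using the standard tempfile module avoids hard-coding a file name; find_string is reproduced inline here only to keep the sketch self-contained:

```python
import os
import tempfile
import unittest

def find_string(file, string):
    # same logic as in find_in_file.py, without the custom exception
    for i, line in enumerate(file.readlines()):
        if string in line:
            return i
    raise Exception('String {} not found'.format(string))

class TestWithTempFile(unittest.TestCase):
    def setUp(self):
        # delete=False so the file survives closing and can be reopened
        handle = tempfile.NamedTemporaryFile('w', delete=False, suffix='.txt')
        handle.write('aha')
        handle.close()
        self.file = open(handle.name, 'r')

    def tearDown(self):
        self.file.close()
        os.remove(self.file.name)

    def test_exists(self):
        self.assertEqual(find_string(self.file, 'aha'), 0)

result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestWithTempFile))
```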
One frequently wants to repeat the same test set-up with different data sets. When using the functionality of unittest, this requires us to automatically generate test cases with the corresponding methods injected.
To this end, we first construct a test case with one or several methods that will be used when we later set up the test methods. Let us consider the bisection method again and check whether the values it returns are really zeros of the given function. We first build the test case and the method which we will use for the tests:
class Tests(unittest.TestCase):
    def checkifzero(self, fcn_with_zero, interval):
        result = bisect(fcn_with_zero, *interval, tol=1.e-8)
        function_value = fcn_with_zero(result)
        expected = 0.
        self.assertAlmostEqual(function_value, expected)
Then we dynamically create test functions as attributes of this class:
test_data = [
    {'name': 'identity', 'function': lambda x: x,
     'interval': [-1.2, 1.]},
    {'name': 'parabola', 'function': lambda x: x**2 - 1,
     'interval': [0, 10.]},
    {'name': 'cubic', 'function': lambda x: x**3 - 2*x**2,
     'interval': [0.1, 5.]},
]

def make_test_function(dic):
    return lambda self: self.checkifzero(dic['function'],
                                         dic['interval'])

for data in test_data:
    setattr(Tests, "test_{name}".format(name=data['name']),
            make_test_function(data))

if __name__ == '__main__':
    unittest.main()
In this example, the data is provided as a list of dictionaries. The function make_test_function dynamically generates a test function which uses a particular data dictionary to perform the test with the previously defined method checkifzero. This test function is made a method of the class Tests by using the Python command setattr.
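Since Python 3.4, a similar effect can also be obtained with the subTest context manager, which reports each data set separately without dynamically injecting methods; a sketch with a minimal bisect stand-in:

```python
import unittest

def bisect(f, a, b, tol=1.e-8):
    # minimal bisection stand-in, as discussed above
    if f(a) * f(b) > 0:
        raise ValueError("Incorrect initial interval [a,b]")
    while abs(a - b) >= tol:
        c = (a + b) / 2.
        if f(a) * f(c) <= 0:
            b = c
        else:
            a = c
    return (a + b) / 2

test_data = [
    {'name': 'identity', 'function': lambda x: x, 'interval': [-1.2, 1.]},
    {'name': 'parabola', 'function': lambda x: x**2 - 1, 'interval': [0, 10.]},
]

class TestsWithSubTest(unittest.TestCase):
    def test_zeros(self):
        for data in test_data:
            # each failing sub-test is reported under its own name
            with self.subTest(name=data['name']):
                result = bisect(data['function'], *data['interval'])
                self.assertAlmostEqual(data['function'](result), 0.)

result = unittest.TextTestRunner().run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestsWithSubTest))
```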
No program development without testing! In this article we showed the importance of well organized and documented tests. Some professionals even start development by first specifying tests. A useful tool for automatic testing is unittest, which we explained in detail.
While testing improves the reliability of code, profiling is needed to improve its performance: alternative ways of coding the same task may result in large performance differences, which makes it worthwhile to measure computation time and to localize bottlenecks in your code.