
Writing code isn’t easy. Even the best programmer in the world can’t foresee every possible alternative and flow of the code. This means that executing our code will always produce surprises and unexpected behavior. Some will be very evident and others will be very subtle, but the ability to identify and remove these defects in the code is critical to building solid software.

These defects in software are known as bugs, and therefore removing them is called debugging. Inspecting the code just by reading it is rarely enough. There are always surprises, and complex code is difficult to follow. That’s why the ability to debug by stopping execution and taking a look at the current state of things is important.

This article is an excerpt from a book written by Jaime Buelta titled Python Automation Cookbook. The Python Automation Cookbook helps you develop a clear understanding of how to automate your business processes using Python, including detecting opportunities by scraping the web, analyzing information to generate automatic spreadsheet reports with graphs, and communicating with automatically generated emails. To follow along with the examples implemented in the article, you can find the code on the book’s GitHub repository.

In this article, we will see some of the tools and techniques for debugging, and apply them specifically to Python scripts. The scripts will have some bugs that we will fix as part of the recipe.

Debugging through logging

A simple, yet very effective, debugging approach is to output variables and other information at strategic parts of your code to follow the flow of the program. The simplest form of this approach is called print debugging: inserting print statements at certain points to print the values of variables or to mark that a point has been reached.

Taking this technique a little bit further and combining it with logging allows us to create a semi-permanent trace of the execution of the program, which can be really useful when detecting issues in a running program.
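As a minimal sketch of the idea (not the code from the book’s repository), the snippet below traces each intermediate value at DEBUG level and a summary at INFO level. The function running_total and the logger name debug_sketch are made up for illustration, and the in-memory buffer is used only so that the captured trace can be inspected afterward.

```python
import io
import logging

# Hypothetical logger that writes to an in-memory buffer
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))

logger = logging.getLogger('debug_sketch')
logger.addHandler(handler)
logger.propagate = False  # keep the sketch's output out of the root logger
logger.setLevel(logging.INFO)  # switch to logging.DEBUG to see every step


def running_total(values):
    """Add up values, tracing each intermediate step at DEBUG level."""
    total = 0
    logger.info(f'Adding values: {values}')
    for value in values:
        total += value
        logger.debug(f'value={value} total={total}')
    logger.info(f'Total: {total}')
    return total


running_total([1, 2, 3])
print(buffer.getvalue())
```

With the level set to INFO, only the two summary lines are recorded. The debug calls stay in the code at no cost to readability of the output and can be switched on when something looks wrong.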

Getting ready

Download the debug_logging.py file from GitHub. It contains an implementation of the bubble sort algorithm, which is the simplest way to sort a list of elements. It iterates several times over the list, and on each iteration, two adjacent values are checked and interchanged, so the bigger one is after the smaller. This makes the bigger values ascend like bubbles in the list.

When run, it checks the following list to verify that it is correct:

assert [1, 2, 3, 4, 7, 10] == bubble_sort([3, 7, 10, 2, 4, 1])

How to do it…

  1. Run the debug_logging.py script and check whether it fails:
$ python debug_logging.py
INFO:Sorting the list: [3, 7, 10, 2, 4, 1]
INFO:Sorted list:      [2, 3, 4, 7, 10, 1]
Traceback (most recent call last):
  File "debug_logging.py", line 17, in <module>
    assert [1, 2, 3, 4, 7, 10] == bubble_sort([3, 7, 10, 2, 4, 1])
AssertionError
  2. Enable the debug logging by changing the second line of the debug_logging.py script:
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.INFO)

Change the preceding line to the following one:

logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG)

Note the different level.

  3. Run the script again, which now displays more information:
$ python debug_logging.py
INFO:Sorting the list: [3, 7, 10, 2, 4, 1]
DEBUG:alist: [3, 7, 10, 2, 4, 1]
DEBUG:alist: [3, 7, 10, 2, 4, 1]
DEBUG:alist: [3, 7, 2, 10, 4, 1]
DEBUG:alist: [3, 7, 2, 4, 10, 1]
DEBUG:alist: [3, 7, 2, 4, 10, 1]
DEBUG:alist: [3, 2, 7, 4, 10, 1]
DEBUG:alist: [3, 2, 4, 7, 10, 1]
DEBUG:alist: [2, 3, 4, 7, 10, 1]
DEBUG:alist: [2, 3, 4, 7, 10, 1]
DEBUG:alist: [2, 3, 4, 7, 10, 1]
INFO:Sorted list : [2, 3, 4, 7, 10, 1]
Traceback (most recent call last):
  File "debug_logging.py", line 17, in <module>
    assert [1, 2, 3, 4, 7, 10] == bubble_sort([3, 7, 10, 2, 4, 1])
AssertionError
  4. After analyzing the output, we realize that the last element of the list is not sorted. We analyze the code and discover an off-by-one error in line 7. Do you see it? Let’s fix it by changing the following line:
for passnum in reversed(range(len(alist) - 1)):

Change the preceding line to the following one:

for passnum in reversed(range(len(alist))):

(Notice the removal of the -1 operation.)

  5. Run it again and you will see that it works as expected. The debug logs are not displayed here:
$ python debug_logging.py
INFO:Sorting the list: [3, 7, 10, 2, 4, 1]
...
INFO:Sorted list     : [1, 2, 3, 4, 7, 10]

How it works…

Step 1 presents the script and shows that the code is faulty, as it’s not properly sorting the list. The script already has some logs to show the start and end result, as well as some debug logs that show each intermediate step.

In step 2, we activate the display of the DEBUG logs, as in step 1 only the INFO ones were shown.

Step 3 runs the script again, this time displaying extra information, showing that the last element in the list is not sorted.

The bug is an off-by-one error, a very common kind of mistake: the outer loop should cover the whole length of the list, but it stopped one element short. This is fixed in step 4.

Step 5 shows that the fixed script runs correctly.
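The repository file is not reproduced in this article, so the following is only a sketch of a bubble sort that is consistent with the output shown above, not necessarily the author’s exact code. The fixed parameter is an assumption added here to compare the faulty and the corrected loop range side by side.

```python
def bubble_sort(alist, fixed=True):
    """Sort a copy of alist with bubble sort.

    fixed=False reproduces the off-by-one range discussed in the recipe.
    """
    alist = list(alist)  # work on a copy so the input is left untouched
    passes = len(alist) if fixed else len(alist) - 1
    for passnum in reversed(range(passes)):
        # Compare adjacent pairs up to position passnum
        for i in range(passnum):
            if alist[i] > alist[i + 1]:
                alist[i], alist[i + 1] = alist[i + 1], alist[i]
    return alist


print(bubble_sort([3, 7, 10, 2, 4, 1], fixed=False))  # [2, 3, 4, 7, 10, 1]
print(bubble_sort([3, 7, 10, 2, 4, 1]))               # [1, 2, 3, 4, 7, 10]
```

With the shorter range, the first pass never compares the last pair, so the final element is left where it started.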

Debugging with breakpoints

Python has a ready-to-go debugger called pdb. Because Python code is interpreted, the execution can be stopped at any point by setting a breakpoint, which drops into a command line where any code can be run to analyze the situation and execute any number of instructions.

Let’s see how to do it.

Getting ready

Download the debug_algorithm.py script, available from GitHub. The code checks whether numbers follow certain properties:

def valid(candidate):
    if candidate <= 1:
        return False

    lower = candidate - 1
    while lower > 1:
        if candidate / lower == candidate // lower:
            return False
        lower -= 1

    return True

assert not valid(1)
assert valid(3)
assert not valid(15)
assert not valid(18)
assert not valid(50)
assert valid(53)

It is possible that you recognize what the code is doing but bear with me so that we can analyze it interactively.

How to do it…

  1. Run the code to check that all the assertions pass (no output means success):
$ python debug_algorithm.py
  2. Add breakpoint() inside the while loop, just before line 7, resulting in the following:
    while lower > 1:
        breakpoint()
        if candidate / lower == candidate // lower:
  3. Execute the code again and see that it stops at the breakpoint, entering the interactive Pdb mode:
$ python debug_algorithm.py
> .../debug_algorithm.py(8)valid()
-> if candidate / lower == candidate // lower:
(Pdb)
  4. Check the value of candidate and of the two operations. This line checks whether candidate is divisible by lower, that is, whether the float division and the integer division give the same result:
(Pdb) candidate
3
(Pdb) candidate / lower
1.5
(Pdb) candidate // lower
1
  5. Continue to the next instruction with n. See that it ends the while loop and returns True:
(Pdb) n
> ...debug_algorithm.py(10)valid()
-> lower -= 1
(Pdb) n
> ...debug_algorithm.py(6)valid()
-> while lower > 1:
(Pdb) n
> ...debug_algorithm.py(12)valid()
-> return True
(Pdb) n
--Return--
> ...debug_algorithm.py(12)valid()->True
-> return True
  6. Continue the execution until another breakpoint is found with c. Note that this is the next call to valid(), which has 15 as an input:
(Pdb) c
> ...debug_algorithm.py(8)valid()
-> if candidate / lower == candidate // lower:
(Pdb) candidate
15
(Pdb) lower
14
  7. Continue running and inspecting the numbers until what the valid function is doing makes sense. Are you able to find out what the code does? (If you can’t, don’t worry and check the next section.) When you’re done, exit with q. This stops the execution:
(Pdb) q
...
bdb.BdbQuit

How it works…

The code is, as you probably know already, checking whether a number is a prime number. It tries to divide the number by every integer greater than 1 and lower than the number itself. If at any point the division is exact, it returns False, because the number is not a prime.
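Comparing the float and integer divisions works for the small numbers used here, but the same divisibility test is more commonly written with the modulo operator. The sketch below is an equivalent rewrite for illustration only; the name is_prime is ours and does not appear in the book’s script.

```python
def is_prime(candidate):
    """Trial division: same logic as valid(), using % for divisibility."""
    if candidate <= 1:
        return False
    # Try every divisor from candidate - 1 down to 2
    for lower in range(candidate - 1, 1, -1):
        if candidate % lower == 0:
            return False
    return True


print(is_prime(15))  # False
print(is_prime(53))  # True
```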

After checking the general execution in step 1, in step 2, we introduced a breakpoint in the code.

When the code is executed in step 3, it stops at the breakpoint position, entering into an interactive mode. In the interactive mode, we can inspect the values of any variable as well as perform any kind of operation.

As demonstrated in step 4, sometimes, a line of code can be better analyzed by reproducing its parts. The code can be inspected and regular operations can be executed in the command line.

The next line of code can be executed by calling n(ext), as done in step 5 several times, to see the flow of the code.

Step 6 shows how to resume the execution with the c(ontinue) command in order to stop at the next breakpoint. All these operations can be repeated to see the flow and values, and to understand what the code is doing at any point.

The execution can be stopped with q(uit), as demonstrated in step 7.

Improving your debugging skills

In this recipe, we will analyze a small script that replicates a call to an external service, analyzing it and fixing some bugs. We will show different techniques to improve the debugging.

The script will send some personal names to an internet server (httpbin.org, a test site) and get them back, simulating their retrieval from an external server. It will then split them into first and last name and prepare them to be sorted by surname. Finally, it will sort them.

The script contains several bugs that we will detect and fix.

Getting ready

For this recipe, we will use the requests and parse modules and include them in our virtual environment:

$ echo "requests==2.18.3" >> requirements.txt
$ echo "parse==1.8.2" >> requirements.txt
$ pip install -r requirements.txt

The debug_skills.py script is available from GitHub. Note that it contains bugs that we will fix as part of this recipe.

How to do it…

  1. Run the script, which will generate an error:
$ python debug_skills.py
Traceback (most recent call last):
 File "debug_skills.py", line 26, in <module>
 raise Exception(f'Error accessing server: {result}')
Exception: Error accessing server: <Response [405]>
  2. Analyze the status code. We get 405, which means that the method we sent is not allowed. We inspect the code and realize that for the call in line 24, we used GET when the proper one is POST (as described in the URL). Replace the code with the following:
# ERROR Step 2. Using .get when it should be .post
# (old) result = requests.get('http://httpbin.org/post', json=data)
result = requests.post('http://httpbin.org/post', json=data)

We keep the old buggy code commented with (old) for clarity of changes.

  3. Run the code again, which will produce a different error:
$ python debug_skills.py
Traceback (most recent call last):
  File "debug_skills_solved.py", line 34, in <module>
    first_name, last_name = full_name.split()
ValueError: too many values to unpack (expected 2)
  4. Insert a breakpoint in line 33, just before the line that raises the error. Run it again and enter debugging mode:
$ python debug_skills_solved.py
> ...debug_skills.py(35)<module>()
-> first_name, last_name = full_name.split()
(Pdb) n
> ...debug_skills.py(36)<module>()
-> ready_name = f'{last_name}, {first_name}'
(Pdb) c
> ...debug_skills.py(34)<module>()
-> breakpoint()

Running n does not produce an error, meaning that the first value is not the problematic one. After running c a few times, we realize that this is not the right approach, as we don’t know which input is the one generating the error.

  5. Instead, we wrap the line with a try...except block and trigger a breakpoint only when the exception is raised:
    try:
        first_name, last_name = full_name.split()
    except:
        breakpoint()
  6. We run the code again. This time the code stops at the moment the data produces an error:
$ python debug_skills.py
> ...debug_skills.py(38)<module>()
-> ready_name = f'{last_name}, {first_name}'
(Pdb) full_name
'John Paul Smith'
  7. The cause is now clear: line 35 only works when the name splits into exactly two words, and raises an error if a middle name is added. After some testing, we settle on this line to fix it:
    # ERROR Step 6 split only two words. Some names have middle names
    # (old) first_name, last_name = full_name.split()
    first_name, last_name = full_name.rsplit(maxsplit=1)
  8. We run the script again. Be sure to remove the breakpoint and the try...except block. This time, it generates a list of names, and they are sorted alphabetically by surname. However, a few of the names look incorrect:
$ python debug_skills_solved.py
['Berg, Keagan', 'Cordova, Mai', 'Craig, Michael', 'Garc\\u00eda, Roc\\u00edo', 'Mccabe, Fathima', "O'Carroll, S\\u00e9amus", 'Pate, Poppy-Mae', 'Rennie, Vivienne', 'Smith, John Paul', 'Smyth, John', 'Sullivan, Roman']

Who’s called O'Carroll, S\\u00e9amus?

  9. To analyze this particular case but skip the rest, we create an if condition in line 33 that breaks only for that name. Notice the use of in, which avoids having to match the whole name exactly:
    full_name = parse.search('"custname": "{name}"', raw_result)['name']
    if "O'Carroll" in full_name:
        breakpoint()
  10. Run the script once more. The breakpoint stops at the proper moment:
$ python debug_skills.py
> debug_skills.py(38)<module>()
-> first_name, last_name = full_name.rsplit(maxsplit=1)
(Pdb) full_name
"S\\u00e9amus O'Carroll"
  11. Move upward in the code and check the different variables:
(Pdb) full_name
"S\\u00e9amus O'Carroll"
(Pdb) raw_result
'{"custname": "S\\u00e9amus O\'Carroll"}'
(Pdb) result.json()
{'args': {}, 'data': '{"custname": "S\\u00e9amus O\'Carroll"}', 'files': {}, 'form': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'close', 'Content-Length': '37', 'Content-Type': 'application/json', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.3'}, 'json': {'custname': "Séamus O'Carroll"}, 'origin': '89.100.17.159', 'url': 'http://httpbin.org/post'}
  12. In the result.json() dictionary, there’s actually a different field that seems to be rendering the name properly, which is called 'json'. Let’s look at it in detail; we can see that it’s a dictionary:
(Pdb) result.json()['json']
{'custname': "Séamus O'Carroll"}
(Pdb) type(result.json()['json'])
<class 'dict'>
  13. Change the code: instead of parsing the raw value in 'data', use the 'json' field from the result directly. This simplifies the code, which is great!
    # ERROR Step 11. Obtain the value from a raw value. Use
    # the decoded JSON instead
    # raw_result = result.json()['data']
    # Extract the name from the result
    # full_name = parse.search('"custname": "{name}"', raw_result)['name']
    raw_result = result.json()['json']
    full_name = raw_result['custname']
  14. Run the code again. Remember to remove the breakpoint:
$ python debug_skills.py
['Berg, Keagan', 'Cordova, Mai', 'Craig, Michael', 'García, Rocío', 'Mccabe, Fathima', "O'Carroll, Séamus", 'Pate, Poppy-Mae', 'Rennie, Vivienne', 'Smith, John Paul', 'Smyth, John', 'Sullivan, Roman']

This time, it’s all correct! You have successfully debugged the program!

How it works…

The structure of the recipe is divided into three different problems. Let’s analyze it in small blocks:

First error—Wrong call to the external service:

After showing the first error in step 1, we carefully read the resulting error, which says that the server is returning a 405 status code. This corresponds to "method not allowed", indicating that our calling method is not correct.

Inspect the following line:

result = requests.get('http://httpbin.org/post', json=data)

It shows that we are using a GET call to a URL that’s defined for POST, so we make the change in step 2.
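When a numeric status code is unfamiliar, the standard library can translate it. This small sketch uses http.HTTPStatus, which is not part of the book’s script, to look up the meaning of code 405; in the recipe, the number itself comes from the response object returned by requests.

```python
from http import HTTPStatus

# Look up the standard name of the status code seen in the traceback
status = HTTPStatus(405)
print(status.value, status.phrase)  # 405 Method Not Allowed
print(status.description)           # short explanation of the code
```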

We run the code in step 3 to find the next problem.

Second error—Wrong handling of middle names:

In step 3, we get an error of too many values to unpack. In step 4, we create a breakpoint at this point to analyze the data, but discover that not all the data produces this error. The analysis shows that it can be very confusing to stop the execution when no error is produced, as we have to keep continuing until one is. We know that the error is produced at this point, but only for certain kinds of data.

As we know that the error is being produced at some point, we capture it in a try...except block in step 5. When the exception is produced, we trigger the breakpoint.

This makes the execution of the script in step 6 stop when full_name is 'John Paul Smith'. This produces an error because the unpacking expects split to return two elements, not three.
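The pattern from step 5 can be sketched in isolation as follows. The function name split_names and the sample list are hypothetical, and the sketch catches ValueError specifically rather than using a bare except, so that unrelated errors are not swallowed.

```python
def split_names(full_names):
    """Return 'Surname, First' strings, pausing on names that fail to split."""
    ready_names = []
    for full_name in full_names:
        try:
            first_name, last_name = full_name.split()
        except ValueError:
            # Only reached for the problematic input; inspect full_name here
            breakpoint()
            continue
        ready_names.append(f'{last_name}, {first_name}')
    return ready_names


# Example call (opens Pdb when it reaches 'John Paul Smith'):
# split_names(['John Smyth', 'John Paul Smith'])
```

Because the breakpoint sits inside the except block, the debugger opens with full_name already bound to the offending value, so there is no need to step through the inputs that work.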

This is fixed in step 7, allowing everything except the last word to be part of the first name, grouping any middle name(s) into the first element. This fits our purpose for this program, to sort by the last name.

The following line does that with rsplit:

first_name, last_name = full_name.rsplit(maxsplit=1)

It divides the text by words, starting from the right and making a maximum of one split, so that at most two elements are returned. Note that a name consisting of a single word would still fail to unpack.
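The difference between the two calls can be checked in isolation with the name that triggered the error. This is a standalone illustration, not a fragment of the script:

```python
full_name = 'John Paul Smith'

# split() yields three words, so unpacking into two names fails
try:
    first_name, last_name = full_name.split()
except ValueError as error:
    print(f'split() failed: {error}')

# rsplit(maxsplit=1) splits only once, starting from the right
first_name, last_name = full_name.rsplit(maxsplit=1)
print(first_name)  # John Paul
print(last_name)   # Smith
```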

When the code is changed, step 8 runs the code again to discover the next error.

Third error—Using a wrong returned value by the external service:

Running the code in step 8 displays the list and does not produce any errors. But, examining the results, we can see that some of the names are incorrectly processed.

We pick one of the examples in step 9 and create a conditional breakpoint. We only activate the breakpoint if the data fulfills the if condition.
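A conditional breakpoint of this kind can be sketched as below. The function sort_by_surname and its watch parameter are hypothetical names chosen for illustration; the script in the recipe places the condition directly in its main loop.

```python
def sort_by_surname(full_names, watch="O'Carroll"):
    """Prepare and sort names, pausing only for names containing watch."""
    ready_names = []
    for full_name in full_names:
        if watch in full_name:
            # Conditional breakpoint: skipped for every other name
            breakpoint()
        first_name, last_name = full_name.rsplit(maxsplit=1)
        ready_names.append(f'{last_name}, {first_name}')
    return sorted(ready_names)


# Example call (opens Pdb once, for the O'Carroll entry):
# sort_by_surname(['John Smyth', "S\\u00e9amus O'Carroll", 'Keagan Berg'])
```

Using a substring test keeps the condition robust: it matches even though the first name arrives in its escaped form.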

The code is run again in step 10. From there, once we have validated that the data is as expected, we work backward to find the root of the problem. Step 11 analyzes previous values and the code up to that point, trying to find out what led to the incorrect value.

We then discover that we used the wrong field of the value returned by the server. The value in the json field is better for this task, and it’s already parsed for us. Step 12 checks the value and sees how it should be used.

In step 13, we change the code accordingly. Notice that the parse module is no longer needed and that the code is actually cleaner using the json field.
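The reason the raw field looked wrong can be reproduced without the server. In this sketch, the response dictionary is a simulated fragment of what httpbin returns, built by hand for illustration: 'data' holds the request body as JSON text, where non-ASCII characters are escaped, while 'json' holds the same body already decoded.

```python
import json

# Simulated fragment of the httpbin response (not a real request)
payload = {'custname': "Séamus O'Carroll"}
response = {'data': json.dumps(payload), 'json': payload}

print(response['data'])              # raw text, contains \u00e9
print(response['json']['custname'])  # Séamus O'Carroll

# Decoding the raw text gives the same value as the 'json' field
decoded = json.loads(response['data'])
print(decoded == response['json'])   # True
```

Extracting the name from the raw text with a pattern keeps the escape sequence, whereas a JSON decoder turns it back into the intended character.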

Once this is fixed, the code is run again in step 14. Finally, the code is doing what’s expected, sorting the names alphabetically by surname. Notice that the other name that contained strange characters is fixed as well.

To summarize, this article discussed different methods and tips to help in the debugging process and ensure the quality of your software. It leverages the great introspection capabilities of Python and its out-of-the-box debugging tools for fixing problems and producing solid automated software.

If you found this post useful, do check out the book, Python Automation Cookbook.
