8 min read

(For more resources on Python 3, see here.)

Managing objects

The difference between these objects and most of the examples we’ve seen so far is that our examples tend to represent concrete ideas. Management objects are more like office managers; they don’t do the actual “visible” work out on the floor, but without them, there would be no communication between departments and nobody would know what they are supposed to do. Analogously, the attributes on a management class tend to refer to other objects that do the “visible” work; the behaviors on such a class delegate to those other classes at the right time, and pass messages between them.

As an example, we’ll write a program that does a find and replace action for text files stored in a compressed ZIP file. We’ll need objects to represent the ZIP file and each individual text file (luckily, we don’t have to write these classes, they’re available in the Python Standard Library). The manager object will be responsible for ensuring three steps occur in order:

  1. Unzipping the compressed file.
  2. Performing the find and replace action.
  3. Zipping up the new files.

The class is initialized with the .zip filename and search and replace strings. We create a temporary directory to store the unzipped files in, so that the folder stays clean. We also add a useful helper method for internal use that helps identify an individual filename inside that directory:

import sys
import os
import shutil
import zipfile

class ZipReplace:
def __init__(self, filename, search_string,
replace_string):
self.filename = filename
self.search_string = search_string
self.replace_string = replace_string
self.temp_directory = "unzipped-{}".format(
filename)


def _full_filename(self, filename):
return os.path.join(self.temp_directory, filename)

Then we create an overall “manager” method for each of the three steps. This method delegates responsibility to other methods. Obviously, we could do all three steps in one method, or indeed, in one script without ever creating an object. There are several advantages to separating the three steps:

  1. Readability: The code for each step is in a self-contained unit that is easy to read and understand. The method names describe what the method does, and no additional documentation is required to understand what is going on.
  2. Extensibility: If a subclass wanted to use compressed TAR files instead of ZIP files, it could override the zip and unzip methods without having to duplicate the find_replace method.
  3. Partitioning: An external class could create an instance of this class and call the find and replace method directly on some folder without having to zip the content.

The delegation method is the first in the code below; the rest of the methods are included for completeness:

def zip_find_replace(self):
self.unzip_files()
self.find_replace()
self.zip_files()


def unzip_files(self):
os.mkdir(self.temp_directory)
zip = zipfile.ZipFile(self.filename)
try:
zip.extractall(self.temp_directory)
finally:
zip.close()

def find_replace(self):
for filename in os.listdir(self.temp_directory):
with open(self._full_filename(filename)) as file:
contents = file.read()
contents = contents.replace(
self.search_string, self.replace_string)
with open(
self._full_filename(filename), "w") as file:
file.write(contents)

def zip_files(self):
file = zipfile.ZipFile(self.filename, 'w')
for filename in os.listdir(self.temp_directory):
file.write(
self._full_filename(filename), filename)
shutil.rmtree(self.temp_directory)

if __name__ == "__main__":
ZipReplace(*sys.argv[1:4]).zip_find_replace()

For brevity, the code for zipping and unzipping files is sparsely documented. Our current focus is on object-oriented design; if you are interested in the inner details of the zipfile module, refer to the documentation in the standard library, either online at http://docs.python.org/library/zipfile.html or by typing import zipfile ; help(zipfile) into your interactive interpreter. Note that this example only searches the top-level files in a ZIP file; if there are any folders in the unzipped content, they will not be scanned, nor will any files inside those folders.

The last two lines in the code allow us to run the example from the command line by passing the zip filename, search string, and replace string as arguments:

python zipsearch.py hello.zip hello hi

Of course, this object does not have to be created from the command line; it could be imported from another module (to perform batch ZIP file processing) or accessed as part of a GUI interface or even a higher-level management object that knows what to do with ZIP files (for example to retrieve them from an FTP server or back them up to an external disk).

As programs become more and more complex, the objects being modeled become less and less like physical objects. Properties are other abstract objects and methods are actions that change the state of those abstract objects. But at the heart of every object, no matter how complex, is a set of concrete properties and well-defined behaviors.

Removing duplicate code

Often the code in management style classes such as ZipReplace is quite generic and can be applied in many different ways. It is possible to use either composition or inheritance to help keep this code in one place, thus eliminating duplicate code. Before we look at any examples of this, let’s discuss a tiny bit of theory. Specifically: why is duplicate code a bad thing?

There are several reasons, but they all boil down to readability and maintainability. When we’re writing a new piece of code that is similar to an earlier piece, the easiest thing to do is copy the old code and change whatever needs to change (variable names, logic, comments) to make it work in the new location. Alternatively, if we’re writing new code that seems similar, but not identical to code elsewhere in the project, the easiest thing to do is write fresh code with similar behavior, rather than figure out how to extract the overlapping functionality.

But as soon as someone has to read and understand the code and they come across duplicate blocks, they are faced with a dilemma. Code that might have made sense suddenly has to be understood. How is one section different from the other? How are they the same? Under what conditions is one section called? When do we call the other? You might argue that you’re the only one reading your code, but if you don’t touch that code for eight months it will be as incomprehensible to you as to a fresh coder. When we’re trying to read two similar pieces of code, we have to understand why they’re different, as well as how they’re different. This wastes the reader’s time; code should always be written to be readable first.

I once had to try to understand someone’s code that had three identical copies of the same three hundred lines of very poorly written code. I had been working with the code for a month before I realized that the three “identical” versions were actually performing slightly different tax calculations. Some of the subtle differences were intentional, but there were also obvious areas where someone had updated a calculation in one function without updating the other two. The number of subtle, incomprehensible bugs in the code could not be counted.

Reading such duplicate code can be tiresome, but code maintenance is an even greater torment. As the preceding story suggests, keeping two similar pieces of code up to date can be a nightmare. We have to remember to update both sections whenever we update one of them, and we have to remember how the multiple sections differ so we can modify our changes when we are editing each of them. If we forget to update both sections, we will end up with extremely annoying bugs that usually manifest themselves as, “but I fixed that already, why is it still happening?”

The result is that people who are reading or maintaining our code have to spend astronomical amounts of time understanding and testing it compared to if we had written the code in a non-repetitive manner in the first place. It’s even more frustrating when we are the ones doing the maintenance. The time we save by copy-pasting existing code is lost the very first time we have to maintain it. Code is both read and maintained many more times and much more often than it is written. Comprehensible code should always be paramount.

This is why programmers, especially Python programmers (who tend to value elegant code more than average), follow what is known as the Don’t Repeat Yourself, or DRY principle. DRY code is maintainable code. My advice to beginning programmers is to never use the copy and paste feature of their editor. To intermediate programmers, I suggest they think thrice before they hit Ctrl + C.

But what should we do instead of code duplication? The simplest solution is often to move the code into a function that accepts parameters to account for whatever sections are different. This isn’t a terribly object-oriented solution, but it is frequently sufficient. For example, if we have two pieces of code that unzip a ZIP file into two different directories, we can easily write a function that accepts a parameter for the directory to which it should be unzipped instead. This may make the function itself slightly more difficult to read, but a good function name and docstring can easily make up for that, and any code that invokes the function will be easier to read.

That’s certainly enough theory! The moral of the story is: always make the effort to refactor your code to be easier to read instead of writing bad code that is only easier to write.

LEAVE A REPLY

Please enter your comment!
Please enter your name here