Documenting Your Python Project, Part 2

Building the Documentation

An easier way to guide your readers and your writers is to provide each one of them with helpers and guidelines, as we have learned in the previous section of this article.

From a writer's point of view, this is done by having a set of reusable templates together with a guide that describes how and when to use them in a project. It is called a documentation portfolio.

From a reader's point of view, browsing the documentation painlessly and learning where to find information efficiently is made possible by building a document landscape.

Building the Portfolio

There are many kinds of documents a software project can have, from low-level documents that refer directly to the code, to design papers that provide a high-level overview of the application.

For instance, Scott Ambler defines an extensive list of document types in his book Agile Modeling (http://www.agilemodeling.com/essays/agileArchitecture.htm). He builds a portfolio that runs from early specifications to operations documents. Even project management documents are covered, so all documentation needs are met with a standardized set of templates.

Since a complete portfolio is tightly related to the methodologies used to build the software, this article will only focus on a common subset that you can complete with your specific needs. Building an efficient portfolio takes a long time, as it captures your working habits.

A common set of documents in software projects can be classified into three categories:

Design: All documents that provide architectural information, and low-level design information, such as class diagrams, or database diagrams
Usage: Documents on how to use the software; this can be in the shape of a cookbook and tutorials, or a module-level help
Operations: Provide guidelines on how to deploy, upgrade, or operate the software

Design

The purpose of design documentation is to describe how the software works and how the code is organized. It is used by developers to understand the system but is also a good entry point for people who are trying to understand how the application works.

The different kinds of design documents a software project can have are:

Architecture overview
Database models
Class diagrams with dependencies and hierarchy relations
User interface wireframes
Infrastructure description

Mostly, these documents are composed of some diagrams and a minimum amount of text. The conventions used for the diagrams are very specific to the team and the project, and this is perfectly fine as long as it is consistent.

UML (as of version 2.0) provides thirteen diagram types that cover most aspects of software design. The class diagram is probably the most widely used one, but UML as a whole makes it possible to describe nearly every aspect of a piece of software. See http://en.wikipedia.org/wiki/Unified_Modeling_Language#Diagrams.

A specific modeling language such as UML is rarely followed in full; teams usually develop their own conventions from shared experience. They pick up good practices from UML or other modeling languages, and create their own recipes.

For instance, for architecture overview diagrams, some designers just draw boxes and arrows on a whiteboard without following any particular design rules and take a picture of it. Others work with simple drawing programs such as Dia (http://www.gnome.org/projects/dia) or Microsoft Visio (proprietary and not free), since that is enough to convey the design.

Database model diagrams depend on the kind of database you are using. There are complete data modeling software applications that provide drawing tools to automatically generate tables and their relations. But this is overkill in Python most of the time. If you are using an ORM such as SQLAlchemy (for instance), simple boxes with lists of fields, together with table relations are enough to describe your mappings before you start to write them.

Class diagrams are often simplified UML class diagrams: There is no need in Python to specify the protected members of a class, for instance. So the tools used for an architectural overview diagram fit this need too.
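As a rough illustration of such a simplified view, the sketch below prints a text "box" for a class that lists only its public members. The class_outline helper and the Feed class are hypothetical names invented for this example, and the leading-underscore filter is simply the usual Python naming convention, not a rule taken from UML.

```python
def class_outline(cls):
    """Return a plain-text box for cls: name, bases, public members."""
    bases = ", ".join(base.__name__ for base in cls.__bases__)
    # Keep only names defined on the class itself that do not start
    # with an underscore; protected and private members are skipped.
    public = sorted(name for name in vars(cls) if not name.startswith("_"))
    lines = [f"{cls.__name__}({bases})"]
    lines.extend(f"  + {name}" for name in public)
    return "\n".join(lines)


class Feed:  # hypothetical class, for illustration only
    def __init__(self, url):
        self.url = url
        self._cache = None

    def fetch(self):
        return []

    def _refresh(self):
        self._cache = None


print(class_outline(Feed))
```

Only fetch appears in the outline; _refresh and the special methods are left out, which is usually all a design reader needs.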

User interface diagrams depend on whether you are writing a web or a desktop application. Web applications often describe only the center of the screen, since the header, footer, left, and right panels are common to all pages. Many web developers just hand-draw those screens and capture them with a camera or a scanner. Others create prototypes in HTML and take screen snapshots. For desktop applications, snapshots of prototype screens, or annotated mock-ups made with tools such as Gimp or Photoshop, are the most common approach.

Infrastructure overview diagrams are like architecture diagrams, but they focus on how the software interacts with third-party elements, such as mail servers, databases, or any kind of data streams.

Common Template

The important point when creating such documents is to make sure the target readership is perfectly known, and the content scope is limited. So a generic template for design documents can provide a light structure with a little advice for the writer.

Such a structure can include:

Title
Author
Tags (keywords)
Description (abstract)
Target (Who should read this?)
Content (with diagrams)
References to other documents

The content should be three or four screens (a 1024x768 average screen) at the most, to be sure to limit the scope. If it gets bigger, it should be split into several documents or summarized.

The template also provides the author's name and a list of tags, to track the document's evolution and ease its classification. This will be covered later in the article.

Paster is the right tool to use to provide templates for documentation. pbp.skels implements the design template described, and can be used exactly like code generation. A target folder is provided and a few questions are answered:

$ paster create -t pbp_design_doc design
Selected and implied templates:
pbp.skels#pbp_design_doc A Design document
Variables:
egg: design
package: design
project: design
Enter title ['Title']: Database specifications for atomisator.db
Enter short_name ['recipe']: mappers
Enter author (Author name) ['John Doe']: Tarek
Enter keywords ['tag1 tag2']: database mapping sql
Creating template pbp_design_doc
Creating directory ./design
Copying +short_name+.txt_tmpl to ./design/mappers.txt

The result can then be completed:

=========================================
Database specifications for atomisator.db
=========================================

:Author: Tarek
:Tags: database mapping sql

:abstract:

    Write here a small abstract about your design document.

.. contents ::

Who should read this ?
::::::::::::::::::::::

Explain here who is the target readership.

Content
:::::::

Write your document here. Do not hesitate to split it in several
sections.

References
::::::::::

Put here references, and links to other documents.
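If Paster is not at hand, a similar skeleton can be produced with the standard library alone. The following is a minimal sketch and not how pbp.skels is implemented: the render_design_doc function and the DESIGN_TEMPLATE constant are hypothetical names, and the skeleton text is modelled on the design template shown above.

```python
from string import Template

# Skeleton modelled on the design template shown above.
DESIGN_TEMPLATE = Template("""\
$rule
$title
$rule

:Author: $author
:Tags: $tags

:abstract:

    Write here a small abstract about your design document.

.. contents ::

Who should read this ?
::::::::::::::::::::::

Explain here who is the target readership.

Content
:::::::

Write your document here.

References
::::::::::

Put here references, and links to other documents.
""")


def render_design_doc(title, author, tags):
    """Fill the skeleton; the title rule matches the title length."""
    return DESIGN_TEMPLATE.substitute(
        title=title,
        rule="=" * len(title),
        author=author,
        tags=" ".join(tags),
    )


text = render_design_doc(
    "Database specifications for atomisator.db",
    "Tarek",
    ["database", "mapping", "sql"],
)
print(text)
```

Computing the rule from the title length keeps the overline and underline at least as long as the title, which reStructuredText requires for section titles.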

Usage

Usage documentation describes how a particular part of the software works. This documentation can describe low-level parts, such as how a function works, but also high-level parts, such as command-line arguments for calling the program. This is the most important part of documentation in framework applications, since the target readership is mainly the developers who are going to reuse the code.

The three main kinds of documents are:

Recipe: A short document that explains how to do something. This kind of document targets one readership and focuses on one specific topic.
Tutorial: A step-by-step document that explains how to use a feature of the software. This document can refer to recipes, and each instance is intended for one readership.
Module helper: A low-level document that explains what a module contains. This document could be shown (for instance) when you call the help built-in over a module.
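To make the module helper idea concrete, the short sketch below renders the text that the help built-in would display for a module. It uses the standard json module purely as an example target; pydoc.render_doc and the pydoc.plaintext renderer are part of the standard library.

```python
import json
import pydoc

# render_doc builds the text that help() would page through;
# the plaintext renderer avoids terminal bold sequences.
text = pydoc.render_doc(json, renderer=pydoc.plaintext)

# Show only the beginning, which starts with the NAME section
# (module name and the first line of the module docstring).
print("\n".join(text.splitlines()[:12]))
```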

Recipe

A recipe answers a very specific problem and provides a solution to resolve it.

For example, ActiveState provides a Python Cookbook online (a cookbook is a collection of recipes), where developers can describe how to do something in Python (http://aspn.activestate.com/ASPN/Python/Cookbook).

These recipes must be short and are structured like this:

Title
Submitter
Last updated
Version
Category
Description
Source (the source code)
Discussion (the text explaining the code)
Comments (from the web)

Often, they are one screen long and do not go into great detail. This structure fits a software project's needs well and can be adapted into a generic structure, where the target readership is added and the category is replaced by tags:

Title (short sentence)
Author
Tags (keywords)
Who should read this?
Prerequisites (other documents to read, for example)
Problem (a short description)
Solution (the main text, one or two screens)
References (links to other documents)

The date and version are not useful here, since we will see later that the documentation is managed like source code in the project.

As with the design template, pbp.skels provides a pbp_recipe_doc template that can be used to generate this structure:

$ paster create -t pbp_recipe_doc recipes
Selected and implied templates:
pbp.skels#pbp_recipe_doc A recipe
Variables:
egg: recipes
package: recipes
project: recipes
Enter title (use a short question): How to use atomisator.db
Enter short_name ['recipe'] : atomisator-db
Enter author (Author name) ['John Doe']: Tarek
Enter keywords ['tag1 tag2']: atomisator db
Creating template pbp_recipe_doc
Creating directory ./recipes
Copying +short_name+.txt_tmpl to ./recipes/atomisator-db.txt

The result can then be completed by the writer:

========================
How to use atomisator.db
========================

:Author: Tarek
:Tags: atomisator db

.. contents ::

Who should read this ?
::::::::::::::::::::::

Explain here who is the target readership.

Prerequisites
:::::::::::::

Put here the prerequisites for people to follow this recipe.

Problem
:::::::

Explain here the problem resolved in a few sentences.

Solution
::::::::

Put here the solution.

References
::::::::::

Put here references, and links to other recipes.
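Once writers start filling in recipes, a small check can confirm that a document still follows the structure above. The sketch below is one possible approach and is not part of pbp.skels: the section_titles and missing_sections helpers are hypothetical, and the check assumes section titles are underlined with colons, as in the generated file.

```python
import re

# Section titles expected by the recipe structure described above.
RECIPE_SECTIONS = [
    "Who should read this ?",
    "Prerequisites",
    "Problem",
    "Solution",
    "References",
]


def section_titles(text):
    """Return the titles underlined with a run of ':' characters."""
    lines = text.splitlines()
    titles = []
    for title, underline in zip(lines, lines[1:]):
        if title.strip() and re.fullmatch(r":{3,}", underline.strip()):
            titles.append(title.strip())
    return titles


def missing_sections(text, expected=RECIPE_SECTIONS):
    """List the expected sections that the document does not contain."""
    found = set(section_titles(text))
    return [name for name in expected if name not in found]


draft = """\
Who should read this ?
::::::::::::::::::::::

Developers.

Problem
:::::::

Storing feed entries.
"""

print(missing_sections(draft))
```

For this draft, the function reports that Prerequisites, Solution, and References are still missing.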

Tutorial

A tutorial differs from a recipe in its purpose. It is not intended to resolve an isolated problem, but rather describes how to use a feature of the application step by step. This can be longer than a recipe and can concern many parts of the application. For example, Django provides a list of tutorials on its website. Writing your first Django App, part 1 (http://www.djangoproject.com/documentation/tutorial01) explains in ten screens how to build an application with Django.

A structure for such a document can be:

Title (short sentence)
Author
Tags (words)
Description (abstract)
Who should read this?
Prerequisites (other documents to read, for example)
Tutorial (the main text)
References (links to other documents)

The pbp_tutorial_doc template is provided in pbp.skels as well with this structure, which is similar to the design template.

Module Helper

The last template that can be added in our collection is the module helper template. A module helper refers to a single module and provides a description of its contents, together with usage examples.

Some tools, like Epydoc (http://epydoc.sourceforge.net), can automatically build such documents by extracting the docstrings and computing module help, as pydoc does. So it is possible to generate extensive documentation based on API introspection. This kind of documentation is often provided in Python frameworks. For instance, Plone provides an http://api.plone.org server that keeps an up-to-date collection of module helpers.
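The introspection such tools rely on can be sketched with the standard inspect module. This is only an illustration of the principle, not how Epydoc or pydoc are implemented; summarize_module is a hypothetical helper, and textwrap is used only because it is a small standard module.

```python
import inspect
import textwrap  # any importable module works; textwrap is just an example


def summarize_module(module):
    """Map each public function or class to the first line of its docstring."""
    summary = {}
    for name, member in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(member) or inspect.isclass(member)):
            continue
        # Skip objects that were imported from another module.
        if inspect.getmodule(member) is not module:
            continue
        doc = inspect.getdoc(member) or "(undocumented)"
        summary[name] = doc.splitlines()[0]
    return summary


for name, line in summarize_module(textwrap).items():
    print(f"{name}: {line}")
```

Note that the helper documents every public name it finds, with no judgment about which ones deserve a reader's attention.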

The main problems with this approach are:

There is no smart selection performed over the modules that are really interesting to document.
The code can be obfuscated by the documentation.

Furthermore, module documentation provides examples that sometimes refer to several parts of the module, and are hard to split between the docstrings of its functions and classes. The module docstring could be used for that purpose by writing the text at the top of the module. But this results in a hybrid file composed of a block of text followed by a block of code, which obscures the code when it represents less than 50% of the total length. If you are the author, this is perfectly fine. But people who want to read the code (not the documentation) will have to skip over the docstrings.

Another approach is to put the text in its own file. A manual selection can then be made to decide which Python modules will have a module helper file. The documents can then be separated from the code base and allowed to live their own life, as we will see in the next part. This is how Python is documented.

Many developers will disagree that separating documentation from code is better than using docstrings. The separate-file approach requires the documentation process to be fully integrated into the development cycle; otherwise, the documents will quickly become obsolete. The docstrings approach solves this problem by keeping the code and its usage examples close together, but it does not bring the text to a higher level: a document that can be used as part of standalone documentation.

The template for Module Helper is really simple, as it contains just a little metadata before the content is written. The target is not defined since it is the developers who wish to use the module: