Python 3: Building a Wiki Application


 

Python 3 Web Development Beginner’s Guide

Use Python to create, theme, and deploy unique web applications


Nowadays, a wiki is a well-known tool to enable people to maintain a body of knowledge in a cooperative way. Wikipedia (http://wikipedia.org) might be the most famous example of a wiki today, but countless numbers of forums use some sort of wiki and many tools and libraries exist to implement a wiki application.

In this article, we will develop a wiki of our own, and in doing so, we will focus on two important concepts in building web applications. The first one is the design of the data layer. The second one is input validation. A wiki is normally a very public application that might not even employ a basic authentication scheme to identify users. This makes contributing to a wiki very simple, yet also makes a wiki vulnerable in the sense that anyone can put anything on a wiki page. It’s therefore a good idea to verify the content of any submitted change. You may, for example, strip out any HTML markup or disallow external links.

Enhancing user interactions in a meaningful way is often closely related to input validation. Client-side input validation helps prevent the user from entering unwanted input and is therefore a valuable addition to any application. It is not a substitute for server-side input validation, however, because we cannot trust the outside world not to try to access our server in unintended ways.
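To make the server-side check concrete, here is a minimal sketch using only the standard library. It is not the validation code of the wiki itself: the names ALLOWED_HOSTS, external_links() and sanitize() are hypothetical, the host wiki.example.org is a placeholder, and the sketch escapes HTML markup rather than stripping it, which is a simpler alternative to the stripping mentioned above.

```python
import html
import re

# Hypothetical policy (an assumption): links to these hosts are allowed.
ALLOWED_HOSTS = {"wiki.example.org"}

# Captures the host part of any http or https URL in a piece of text.
URL_PATTERN = re.compile(r"https?://([^/\s:]+)", re.IGNORECASE)


def external_links(text):
    """Return the hosts of all links in text that are not in ALLOWED_HOSTS."""
    return [host for host in URL_PATTERN.findall(text)
            if host.lower() not in ALLOWED_HOSTS]


def sanitize(text):
    """Reject external links, then escape HTML so markup is shown as text."""
    offending = external_links(text)
    if offending:
        raise ValueError("external links not allowed: " + ", ".join(offending))
    return html.escape(text)
```

A check like this runs on the server for every submitted change, regardless of what the client-side code already verified.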

The data layer

A wiki consists of quite a number of distinct entities we can identify. We will implement these entities and the relations that exist between them by reusing the Entity/Relation framework developed earlier.

 

Time for action – designing the wiki data model

As with any application, when we start developing our wiki application we must first take a few steps to create a data model that can act as a starting point for the development:

  1. Identify each entity that plays a role in the application. This might depend on the requirements. For example, because we want the user to be able to change the title of a topic and we want to archive revisions of the content, we define separate Topic and Page entities.
  2. Identify direct relations between entities. Our decision to define separate Topic and Page entities implies a relation between them, but there are more relations that can be identified, for example, between Topic and Tag. Do not specify indirect relations: All topics marked with the same tag are in a sense related, but in general, it is not necessary to record these indirect relations as they can easily be inferred from the recorded relation between topics and tags.
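The point about indirect relations can be illustrated with a small sketch. It assumes the recorded topic/tag relation is available as a plain list of (topic, tag) pairs; the sample data and the function related_topics() are hypothetical and not part of the wiki code.

```python
from collections import defaultdict

# Hypothetical (topic, tag) pairs, as a topic/tag relation might record them.
topic_tags = [
    ("Python", "language"),
    ("Ruby", "language"),
    ("CherryPy", "framework"),
    ("Python", "scripting"),
]


def related_topics(topic, pairs):
    """Return the topics that share at least one tag with the given topic."""
    topics_by_tag = defaultdict(set)
    for t, tag in pairs:
        topics_by_tag[tag].add(t)
    related = set()
    for t, tag in pairs:
        if t == topic:
            related |= topics_by_tag[tag]
    related.discard(topic)  # a topic is not related to itself
    return sorted(related)
```

Because the indirect relation is computed on demand from the direct one, there is nothing extra to keep in sync when tags are added or removed.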

The image shows the different entities and relations we can identify in our wiki application.

In the diagram, we represent Page as a stack of rectangles and User as a single rectangle. This is a rather informal way to illustrate that a Topic may have more than one Page while a Page refers to a single User. In this manner, we can grasp the most relevant aspects of the relations at a glance. When we want to show more relations or relations with different characteristics, it might be a good idea to use more formal methods and tools. A good starting point is the Wikipedia entry on UML: http://en.wikipedia.org/wiki/Unified_Modelling_Language.

[Figure: entities and relations in the wiki application]

What just happened?

With the entities and relations in our data model identified, we can have a look at their specific qualities.

The basic entity in a wiki is a Topic. A topic, in this context, is basically a title that describes what this topic is about. A topic has any number of associated Pages. Each instance of a Page represents a revision; the most recent revision is the current version of a topic. Each time a topic is edited, a new revision is stored in the database. This way, we can simply revert to an earlier version if we made a mistake or compare the contents of two revisions. To simplify identifying revisions, each revision has a modification date. We also maintain a relation between the Page and the User that modified that Page.
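The revision scheme can be sketched directly with the sqlite3 module. This is only an illustration under stated assumptions: the table and column names are made up, and the functions add_revision() and current_content() are hypothetical stand-ins for what the Entity/Relation framework would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (topic_id INTEGER PRIMARY KEY,
                        title TEXT UNIQUE NOT NULL);
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, content TEXT,
                       modified TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP);
    CREATE TABLE topicpage (topic_id INTEGER, page_id INTEGER);
""")


def add_revision(title, content):
    """Store content as a new page and link it to the topic (created if new)."""
    conn.execute("INSERT OR IGNORE INTO topic (title) VALUES (?)", (title,))
    topic_id = conn.execute(
        "SELECT topic_id FROM topic WHERE title = ?", (title,)).fetchone()[0]
    page_id = conn.execute(
        "INSERT INTO page (content) VALUES (?)", (content,)).lastrowid
    conn.execute("INSERT INTO topicpage VALUES (?, ?)", (topic_id, page_id))
    return page_id


def current_content(title):
    """Return the content of the most recent revision, or None if unknown."""
    row = conn.execute("""
        SELECT page.content FROM page
        JOIN topicpage USING (page_id)
        JOIN topic USING (topic_id)
        WHERE topic.title = ?
        ORDER BY page.modified DESC, page.page_id DESC
        LIMIT 1""", (title,)).fetchone()
    return row[0] if row else None
```

Editing never overwrites a row, so earlier revisions stay available for comparison or for reverting. The page id is used as a tie-breaker because CURRENT_TIMESTAMP has a resolution of one second.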

In the wiki application that we will develop, it is also possible to associate any number of tags with a topic. A Tag entity consists simply of a tag attribute. The important part is the relation that exists between the Topic entity and the Tag entity.

Like a Tag, a Word entity consists of a single attribute. Again, the important bit is the relation, this time, between a Topic and any number of Words. We will maintain this relation to reflect the words used in the current versions (that is, the last revision of a Page) of a Topic. This will allow for fairly responsive full text search facilities.
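The idea behind the topic/word relation can be sketched in memory. The class WordIndex and the function words_in() are hypothetical, and the rule for what counts as a word (runs of letters and digits, lowercased) is an assumption; the wiki itself keeps this relation in the database.

```python
import re

WORD = re.compile(r"[a-z0-9]+")


def words_in(content):
    """Return the set of lowercase words that occur in a page's content."""
    return set(WORD.findall(content.lower()))


class WordIndex:
    """Maps each word to the topics whose current revision contains it."""

    def __init__(self):
        self.topics_by_word = {}
        self.words_by_topic = {}

    def update(self, topic, content):
        """Replace the words recorded for a topic by those of its newest revision."""
        for word in self.words_by_topic.get(topic, set()):
            self.topics_by_word[word].discard(topic)
        new_words = words_in(content)
        self.words_by_topic[topic] = new_words
        for word in new_words:
            self.topics_by_word.setdefault(word, set()).add(topic)

    def search(self, *words):
        """Return the topics whose current revision contains all given words."""
        sets = [self.topics_by_word.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*sets)) if sets else []
```

A search then only needs to look up a few words instead of scanning the text of every page, which is what makes the full text search responsive.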

The final entity we encounter is the Image entity. We will use this to store images alongside the pages with text. We do not define any relation between topics and images. Images might be referred to in the text of the topic, but besides this textual reference, we do not maintain a formal relation. If we wanted to maintain such a relation, we would be forced to scan for image references each time a new revision of a page was stored, and we would probably need to signal something if an attempt was made to refer to a non-existing image. In this case, we choose to ignore this: references to images that do not exist in the database will simply show nothing:

Chapter6/wikidb.py

from entity import Entity
from relation import Relation

class User(Entity): pass
class Topic(Entity): pass
class Page(Entity): pass
class Tag(Entity): pass
class Word(Entity): pass
class Image(Entity): pass

class UserPage(Relation): pass
class TopicPage(Relation): pass
class TopicTag(Relation): pass
class ImagePage(Relation): pass
class TopicWord(Relation): pass

def threadinit(db):
    User.threadinit(db)
    Topic.threadinit(db)
    Page.threadinit(db)
    Tag.threadinit(db)
    Word.threadinit(db)
    Image.threadinit(db)
    UserPage.threadinit(db)
    TopicPage.threadinit(db)
    TopicTag.threadinit(db)
    ImagePage.threadinit(db)
    TopicWord.threadinit(db)

def inittable():
    User.inittable(userid="unique not null")
    Topic.inittable(title="unique not null")
    Page.inittable(content="",
                   modified="not null default CURRENT_TIMESTAMP")
    Tag.inittable(tag="unique not null")
    Word.inittable(word="unique not null")
    Image.inittable(type="", data="blob", title="",
                    modified="not null default CURRENT_TIMESTAMP",
                    description="")
    UserPage.inittable(User, Page)
    TopicPage.inittable(Topic, Page)
    TopicTag.inittable(Topic, Tag)
    TopicWord.inittable(Topic, Word)

Because we can reuse the entity and relation modules we developed earlier, the actual implementation of the database layer is straightforward (full code is available as wikidb.py). After importing both modules, we first define a subclass of Entity for each entity we identified in our data model. All these classes are used as is, so they have only a pass statement as their body.

Likewise, we define a subclass of Relation for each relation we need to implement in our wiki application.

All these Entity and Relation subclasses still need their initialization code to be called each time the application starts, and that is where the convenience functions threadinit() and inittable() come in. They bundle the initialization code for each entity and relation.

Many entities we define here are simple, but a few warrant a closer inspection. The Page entity contains a modified column that has a not null constraint. It also has a default: CURRENT_TIMESTAMP. This default is SQLite specific (other database engines will have other ways of specifying such a default) and will initialize the modified column to the current date and time if we create a new Page record without explicitly setting a value.
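The effect of this column definition can be tried out with the sqlite3 module alone. The table below is a hand-written approximation of what the framework might create for the Page entity; the exact statement the framework generates is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    content,
    modified NOT NULL DEFAULT CURRENT_TIMESTAMP)""")

# No value for modified: SQLite fills in the current UTC date and time.
conn.execute("INSERT INTO page (content) VALUES (?)", ("first revision",))
modified = conn.execute("SELECT modified FROM page").fetchone()[0]
print(modified)  # text of the form YYYY-MM-DD HH:MM:SS

# An explicit NULL is not replaced by the default; it violates the constraint.
try:
    conn.execute("INSERT INTO page (content, modified) VALUES (?, NULL)",
                 ("second revision",))
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

Note that the default only applies when the column is left out of the insert statement altogether.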

The Image entity also has a definition that is a little bit different: its data column is explicitly defined to have a blob affinity. This will enable us to store binary data without any problem in this table, something we need to store and retrieve the binary data contained in an image. Of course, SQLite will happily store anything we pass it in this column, but if we pass it an array of bytes (not a string, that is), that array is stored as is.
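A short experiment with the sqlite3 module shows this behavior. The table definition is a simplified, hand-written stand-in for the Image entity, and the eight bytes used as data are just the signature that starts every PNG file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (image_id INTEGER PRIMARY KEY, type, data BLOB, title)")

# The signature bytes of a PNG file, used here as stand-in binary data.
png_header = b"\x89PNG\r\n\x1a\n"
conn.execute("INSERT INTO image (type, data, title) VALUES (?, ?, ?)",
             ("image/png", png_header, "logo"))

# A str passed to the same column is accepted as well, but stored as text.
conn.execute("INSERT INTO image (type, data, title) VALUES (?, ?, ?)",
             ("text/plain", "just a string", "note"))

rows = conn.execute(
    "SELECT data, typeof(data) FROM image ORDER BY image_id").fetchall()
stored, kind = rows[0]
```

The bytes object comes back unchanged as bytes, so image data can be served to a browser exactly as it was uploaded.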

 

The delivery layer

With the data layer in place as a foundation, we can build the delivery layer on top of it. Between the delivery layer and the database layer, there is an additional layer that encapsulates the domain-specific knowledge (for example, it knows how to verify that the title of a new Topic entity conforms to the requirements we set for it before it stores it in the database):

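[Figure: the layers of the wiki application, with the domain-specific layer between the delivery layer and the database layer]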

Each different layer in our application is implemented in its own file or files. It is easy to get confused, so before we delve further into these files, have a look at the following table. It lists the different files that together make up the wiki application and refers to the names of the layers.

[Table: the files that make up the wiki application and the layer each one implements]

We’ll focus on the main CherryPy application first to get a feel for the behavior of the application.

 

Time for action – implementing the opening screen

The opening screen of the wiki application shows a list of all defined topics on the right and several ways to locate topics on the left. Note that it still looks quite rough because, at this point, we haven’t applied any style sheets:

[Screenshot: the unstyled opening screen of the wiki application]

Let us first take a few steps to identify the underlying structure. This structure is what we would like to represent in the HTML markup:

  • Identify related pieces of information that are grouped together. These form the backbone of a structured web page. In this case, the search features on the left form a group of elements distinct from the list of topics on the right.
  • Identify distinct pieces of functionality within these larger groups. For example, the elements (input field and search button) that together make up the word search are such a piece of functionality, as are the tag search and the tag cloud.
  • Try to identify any hidden functionality, that is, necessary pieces of information that will have to be part of the HTML markup, but are not directly visible on a page. In our case, we have links to the jQuery and jQuery UI JavaScript libraries and links to CSS style sheets.

Identifying these distinct pieces will not only help to put together HTML markup that reflects the structure of a page, but also help to identify necessary functionality in the delivery layer because each of these functional pieces is concerned with specific information processed and produced by the server.

What just happened?

Let us look in somewhat more detail at the structure of the opening page that we identified.

Most notable are three search input fields to locate topics based on words occurring in their bodies, on their actual title, or on tags associated with a topic. These search fields feature autocomplete functionality that allows for comma-separated lists. In the same column, there is also room for a tag cloud, an alphabetical list of tags with font sizes dependent on the number of topics marked with that tag.
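How the font sizes of a tag cloud could be derived from topic counts is sketched below. The function tag_cloud(), the linear scaling, and the size range of 80 to 200 percent are all assumptions made for illustration; the wiki application may compute its sizes differently.

```python
def tag_cloud(counts, min_size=80, max_size=200):
    """Return (tag, font size in percent) pairs in alphabetical order.

    counts maps each tag to the number of topics marked with that tag.
    """
    if not counts:
        return []
    low, high = min(counts.values()), max(counts.values())
    spread = high - low
    cloud = []
    for tag in sorted(counts, key=str.lower):
        if spread == 0:
            # All tags are equally popular: use a size in the middle.
            size = (min_size + max_size) // 2
        else:
            size = min_size + (counts[tag] - low) * (max_size - min_size) // spread
        cloud.append((tag, size))
    return cloud
```

With counts of 1, 3 and 5 topics, the least used tag is shown at 80 percent, the most used one at 200 percent, and the one in between at 140 percent.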

The structural components

The HTML markup for this opening page is shown next. It is available as the file basepage.html and the contents of this file are served by several methods in the Wiki class implementing the delivery layer, each with a suitable content segment. Also, some of the content will be filled in by AJAX calls, as we will see in a moment:

Chapter6/basepage.html


<html>
<head>
  <title>Wiki</title>
  <script
    src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"
    type="text/javascript">
  </script>
  <script
    src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.3/jquery-ui.min.js"
    type="text/javascript">
  </script>
  <link rel="stylesheet"
    href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.3/themes/smoothness/jquery-ui.css"
    type="text/css" media="all" />
  <link rel="stylesheet" href="/wiki.css"
    type="text/css" media="all" />
</head>
<body>
  <div id="navigation">
    <div class="navitem">
      <a href="./">Wiki Home</a>
    </div>
    <div class="navitem">
      <span class="label">Search topic</span>
      <form id="topicsearch">
        <input type="text" >
        <button type="submit" >Search</button>
      </form>
    </div>
    <div class="navitem">
      <span class="label">Search word</span>
      <form id="wordsearch">
        <input type="text" >
        <button type="submit" >Search</button>
      </form>
    </div>
    <div class="navitem">
      <span class="label">Search tag</span>
      <form id="tagsearch">
        <input type="text" >
        <button type="submit" >Search</button>
      </form>
    </div>
    <div class="navitem">
      <p id="tagcloud">Tag cloud</p>
    </div>
  </div>
  <div id="content">%s</div>
  <script src="/wikiweb.js" type="text/javascript"></script>
</body>
</html>

The <head> element contains both links to CSS style sheets and <script> elements that refer to the jQuery libraries. This time, we choose again to retrieve these libraries from a public content delivery network.

The highlighted lines show the top-level <div> elements that define the structure of the page. In this case, we have identified a navigation part and a content part and this is reflected in the HTML markup.

Enclosed in the navigation part are the search functions, each in their own <div> element. The content part contains just an interpolation placeholder %s for now, that will be filled in by the method that serves this markup. Just before the end of the body of the markup is a final <script> element that refers to a JavaScript file that will perform actions specific to our application and we will examine those later.
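The interpolation placeholder works through Python's % string formatting, which has one consequence worth knowing. The sketch below uses shortened, made-up templates instead of the real basepage.html: any literal percent sign in such a template has to be doubled, otherwise the interpolation fails.

```python
# A shortened stand-in for the contents of basepage.html.
basepage = '<div id="content">%s</div>'
page = basepage % "<p>Hello</p>"

# A literal percent sign in the template must be written as %%.
styled = '<div style="width:100%%">%s</div>'
styled_page = styled % "<p>Hello</p>"

# Without the doubling, Python treats the sign as the start of a format code.
broken = '<div style="width:100%">%s</div>'
try:
    broken % "<p>Hello</p>"
    failed = False
except ValueError:
    failed = True
```

The markup shown above contains no percent signs other than the placeholder, so it can be used as a template as is.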

The application methods

The markup from the previous section is served by methods of the Wiki class, an instance of which class can be mounted as a CherryPy application. The index() method, for example, is where we produce the markup for the opening screen (the complete file is available as wikiweb.py and contains several other methods that we will examine in the following sections):

Chapter6/wikiweb.py

@cherrypy.expose
def index(self):
    # Markup for a single topic: a list item linking to show?topic=<title>
    item = '<li><a href="show?topic=%s">%s</a></li>'
    topiclist = "\n".join(
        [item % (t, t) for t in wiki.gettopiclist()])
    content = '<div id="wikihome"><ul>%s</ul></div>' % (
        topiclist,)
    return basepage % content

First, we define the markup for every topic we will display in the main area of the opening page (highlighted). The markup consists of a list item that contains an anchor element that refers to a URL relative to the page showing the opening screen. Using relative URLs allows us to mount the class that implements this part of the application anywhere in the tree that serves the CherryPy application. The show() method that will serve this URL takes a topic parameter whose value is interpolated in the next line for each topic that is present in the database.

The result is joined to a single string that is interpolated into yet another string that encapsulates all the list items we just generated in an unordered list (a <ul> element in the markup) and this is finally returned as the interpolated content of the basepage variable.

In the definition of the index() method, we see a pattern that will be repeated often in the wiki application: methods in the delivery layer, like index(), concern themselves with constructing and serving markup to the client and delegate the actual retrieval of information to a module that knows all about the wiki itself. Here the list of topics is produced by the wiki.gettopiclist() function, while index() converts this information to markup. Separation of these activities helps to keep the code readable and therefore maintainable.
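One thing the markup-building code has to watch is that topic titles may contain characters with a special meaning in URLs or HTML. The following sketch shows one possible way to deal with that using the standard library; the helper functions topic_item() and topic_list() are hypothetical and not part of wikiweb.py, which may rely on validation of titles elsewhere.

```python
from html import escape
from urllib.parse import quote


def topic_item(title):
    """Return a list item linking to show?topic=<title>, safely encoded."""
    return '<li><a href="show?topic=%s">%s</a></li>' % (
        quote(title, safe=""),  # percent-encode the title for use in a URL
        escape(title))          # escape &, < and > for use as visible text


def topic_list(titles):
    """Wrap the list items for all titles in the markup used by index()."""
    return '<div id="wikihome"><ul>%s</ul></div>' % "\n".join(
        topic_item(t) for t in titles)
```

A title such as Tom & Jerry then ends up as Tom%20%26%20Jerry in the link and as Tom &amp;amp; Jerry in the visible text, while a plain title like Python is left unchanged.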

 

Time for action – implementing a wiki topic screen

When we request a URL of the form show?topic=value, this will result in calling the show() method. If value equals an existing topic, the following (as yet unstyled) screen is the result:

[Screenshot: the unstyled topic screen of the wiki application]

Just as for the opening screen, we take steps to:

  • Identify the main areas on screen
  • Identify specific functionality
  • Identify any hidden functionality

The page structure is very similar to the opening screen, with the same navigational items, but instead of a list of topics, we see the content of the requested topic together with some additional information, such as the tags associated with this subject and a button that may be clicked to edit the contents of this topic. After all, collaboratively editing content is what a wiki is all about.

We deliberately made the choice not to refresh the contents of just a part of the opening screen with an AJAX call, but opted instead for a simple link that replaces the whole page. This way, there will be an unambiguous URL in the address bar of the browser that will point at the topic. This allows for easy bookmarking. An AJAX call would have left the URL of the opening screen that is visible in the address bar of the browser unaltered and although there are ways to alleviate this problem, we settle for this simple solution here.

What just happened?

As the main structure we identified is almost identical to the one for the opening page, the show() method will reuse the markup in basepage.html.

Chapter6/wikiweb.py

@cherrypy.expose
def show(self, topic):
    topic = topic.capitalize()
    currentcontent, tags = wiki.gettopic(topic)
    currentcontent = "".join(wiki.render(currentcontent))
    # Each tag becomes a list item that links to a search for that tag
    tags = ['<li><a href="searchtags?tags=%s">%s</a></li>' % (
        t, t) for t in tags]
    content = '''
<div>
<h1>%s</h1><a href="edit?topic=%s">Edit</a>
</div>
<div id="wikitopic">%s</div>
<div id="wikitags"><ul>%s</ul></div>
<div id="revisions">revisions</div>
''' % (topic, topic, currentcontent, "\n".join(tags))
    return basepage % content

The show() method delegates most of the work to the wiki.gettopic() function (highlighted), which we will examine in the next section, and concentrates on creating the markup it will deliver to the client. wiki.gettopic() returns a tuple that consists of both the current content of the topic and a list of tags.

Those tags are converted to <li> elements with anchors that point to the searchtags URL. This list of tags provides a simple way for the reader to find related topics with a single click. The searchtags URL takes a tags argument so a single <li> element constructed this way may look like this: <li><a href="searchtags?tags=Python">Python</a></li>.

The content and the clickable list of tags are embedded in the markup of the basepage together with an anchor that points to the edit URL. Later, we will style this anchor to look like a button and when the user clicks it, it will present a page where the content may be edited.

 
