To keep things simple though, the article doesn’t discuss how to implement database-backed web pages with Python, concentrating only on how to connect Python with MySQL.
Sample Application
The best way to learn new programming techniques is to write an application that exercises them. This article will walk you through the process of building a simple Python application that interacts with a MySQL database. In a nutshell, the application picks up some live data from a web site and then persists it to an underlying MySQL database. For the sake of simplicity, it doesn’t deal with a large dataset. Rather, it picks up a small subset of data, storing it as a few rows in the underlying database.
In particular, the application gets the latest post from the Packt Book Feed page available at http://feeds.feedburner.com/packtpub/sDsa?format=xml. Then, it analyzes the post’s title, finding appropriate tags for the article associated with the post, and finally inserts information about the post into the posts and posttags underlying database tables. As you might guess, a single post may be associated with more than one tag, meaning a record in the posts table may be related to several records in the posttags table.
Diagrammatically, the sample application components and their interactions might look like this:
Note the use of appsample.py. This script file will contain all the application code written in Python. In particular, it will contain the list of tags, as well as several Python functions packaging application logic.
Software Components
To build the sample discussed in the article you’re going to need the following software components installed on your computer:
- Python 2.5.x
- MySQLdb 1.2.x
- MySQL 5.1
All these software components can be downloaded and used for free. Although you may already have these pieces of software installed on your computer, here’s a brief overview of where you can obtain them.
You can download an appropriate Python release from the Downloads page at Python’s web site at http://python.org/download/. You may be tempted to download the most recent release. Before you choose the release, however, it is recommended that you visit the Python for MySQL page at http://sourceforge.net/projects/mysql-python/ to check what Python releases are supported by the current MySQLdb module that will be used to connect your Python installation with MySQL.
MySQLdb is the Python DB API-2.0 interface for MySQL. You can pick up the latest MySQLdb package (version 1.2.2 at the time of writing) from the sourceforge.net’s Python for MySQL page at http://sourceforge.net/projects/mysql-python/. Before you can install it, though, make sure you have Python installed in your system.
You can obtain the MySQL 5.1 distribution from the mysql.com web site at http://dev.mysql.com/downloads/mysql/5.1.html, picking up the package designed for your operating system.
Setting up the Database
Assuming you have all the software components that were outlined in the preceding section installed in your system, you can now start building the sample application. The first step is to create the posts and posttags tables in your underlying MySQL database. As mentioned earlier, a single post may be associated with more than one tag. What this means in practice is that the posts and posttags tables should have a foreign key relationship. In particular, you might create these tables as follows:
CREATE TABLE posts ( title VARCHAR(256) PRIMARY KEY, guid VARCHAR(1000), pubDate VARCHAR(50) ) ENGINE = InnoDB;
CREATE TABLE posttags (
title VARCHAR(256),
tag VARCHAR(20),
PRIMARY KEY(title,tag),
FOREIGN KEY(title) REFERENCES posts(title)
)
ENGINE = InnoDB;
As you might guess, you don’t need to populate above tables with data now. This will be automatically done later when you launch the sample.
Developing the Script
Now that you have the underlying database ready, you can move on and develop the Python code to complete the sample. In particular, you’re going to need to write the following components in Python:
- tags nested list of tags that will be used to describe the posts obtained from the Packt Book Feed page.
- obtainPost function that will be used to obtain the information about the latest post from the Packt Book Feed page.
- determineTags function that will determine appropriate tags to be applied to the latest post obtained from the Packt Book Feed page.
- insertPost function that will insert the information about the post obtained into the underlying database tables: posts and posttags.
- execPr function that will make calls to the other, described above functions. You will call this function to launch the application.
All the above components will reside in a single file, say, appsample.py that you can create in your favorite text editor, such as vi or Notepad.
First, add the following import declarations to appsample.py:
import MySQLdb import urllib2 import xml.dom.minidom
As you might guess, the first module is required to connect Python with MySQL, providing the Python DB API-2.0 interface for MySQL. The other two are needed to obtain and then parse the Packt Book Feed page’s data. You will see them in action in the obtainPost function in a moment.
But first let’s create a nested list of tags that will be used by the determineTags function that determines the tags appropriate for the post being analyzed. To save space here, the following list contains just a few tags. You may and should include more tags to this list, of course.
tags=["Python","Java","Drupal","MySQL","Oracle","Open Source"]
The next step is to add the obtainPost function responsible for getting the data from the Packt Book Feed page and generating the post dictionary that will be utilized in further processing:
def obtainPost(): addr = "http://feeds.feedburner.com/packtpub/sDsa?format=xml" xmldoc = xml.dom.minidom.parseString(urllib2.urlopen(addr).read()) item = xmldoc.getElementsByTagName("item")[0] title = item.getElementsByTagName("title")[0].firstChild.data guid = item.getElementsByTagName("guid")[0].firstChild.data pubDate = item.getElementsByTagName("pubDate")[0].firstChild.data post ={"title": title, "guid": guid, "pubDate": pubDate} return post
Now that you have obtained all the required information about the latest post on the Packt Book Feed page, you can analyze the post’s title to determine appropriate tags. For that, add the determineTags function to appsample.py:
def determineTags(title, tagslist): curtags=[] for curtag in tagslist: if title.find(curtag)>-1:curtags.append(curtag) return curtags
By now, you have both the post and tags to be persisted to the database. So, add the insertPost function that will handle this task (don’t forget to change the parameters specified to the MySQLdb.connect function for the actual ones):
def insertPost(title, guid, pubDate, curtags): db=MySQLdb.connect(host="localhost",user="usrsample",passwd="pswd",db="dbsample") c=db.cursor() c.execute("""INSERT INTO posts (title, guid, pubDate) VALUES(%s, %s,%s)""", (title, guid, pubDate)) db.commit() for tag in curtags: c.execute("""INSERT INTO posttags (title, tag) VALUES(%s,%s)""", (title, tag)) db.commit() db.close()
All that is left to do is add the execPr function that brings all the pieces together, calling the above functions in the proper order:
def execPr(): p = obtainPost() t = determineTags(p["title"],tags) insertPost(p["title"], p["guid"], p["pubDate"], t)
Now let’s test the code we just wrote. The simplest way to do this is through Python’s interactive command line. To start an interactive Python session, you can type python at your system shell prompt.
It’s important to realize that since the sample discussed here is going to obtain some data from the web, you must connect to the Internet before you launch the application. Once you’re connected, you can launch the execPr function in your Python session, as follows:
>>>import appsample >>>appsample.execPr()
If everything is okay, you should see no messages. To make sure that everything really went as planned, you can check the posts and posttags tables. To do this, you might connect to the database with the MySQL command-line tool and then issue the following SQL commands:
SELECT * FROM posts;
The above should generate the output that might look like this:
|title |guid
|pubDate
------------------------------------------------------------------
Open Source CMS Award Voting Now Closed | http://www.packtpub.com/
article/2008-award-voting-closed | Tue, 21 Oct 2008 09:29:54 +0100
Then, you might want to check out the posttags table:
SELECT * FROM posttags;
This might generate the following output:
|title |tag Open Source CMS Award Voting Now Closed | Open Source
Please note that you may see different results since you are working with live data. Another thing to note here is that if you want to re-run the sample, you first need to empty the posts and posttags tables. Otherwise, you will encounter the problem related to the primary key constraints. However, that won’t be a problem at all if you re-run the sample in a few days, when a new post or posts appear on the Packt Book Feed page.
Conclusion
In this article you looked at a simple Python application persisting data to an underlying MySQL database. Although, for the sake of simplicity, the sample discussed here doesn’t offer a web interface, it illustrates how you can obtain data from the Internet, and then utilize it within your application, and finally store that data in the database.