Python Data Persistence using MySQL Part II: Moving Data Processing to the Data

8 min read


To move data processing to the data, you can use stored procedures, stored functions, and triggers. All these components are implemented inside the underlying database, and can significantly improve performance of your application due to reducing network overhead associated with multiple calls to the database. It is important to realize, though, the decision to move any piece of processing logic into the database should be taken with care. In some situations, this may be simply inefficient. For example, if you decide to move some logic dealing with the data stored in a custom Python list into the database, while still keeping that list implemented in your Python code, this can be inefficient in such a case, since it only increases the number of calls to the underlying database, thus causing significant network overhead. To fix this situation, you could move the list from Python into the database as well, implementing it as a table.

Starting with version 5.0, MySQL supports stored procedures, stored functions, and triggers, making it possible for you to enjoy programming on the underlying database side. In this article, you will look at triggers in action. Stored procedures and functions can be used similarly.

Planning Changes for the Sample Application

Assuming you have followed the instructions in Python Data Persistence using MySQL, you should already have the application structure to be reorganized here. To recap, what you should already have is:

  • tags nested list of tags used to describe the posts obtained from the Packt Book Feed page.
  • obtainPost function obtains the information about the most recent post on the Packt Book Feed page.
  • determineTags function determines tags appropriate to the latest post obtained from the Packt Book Feed page.
  • insertPost function inserts the information about the obtained post into the underlying database tables: posts and posttags.
  • execPr function brings together the functionality of the described above functions.

That’s what you should already have on the Python side. And on the database side, you should have the following components:

  • posts table contains records representing posts obtained from the Packt Book Feed page.
  • posttags table contains records each of which represents a tag associated with a certain post stored in the posts table.

Let’s figure out how we can refactor the above structure, moving some data processing inside the database.

The first thing you might want to do is to move the tags list from Python into the database, creating a new table tags for that. Then, you can move the logic implemented with the determineTags function inside the database, defining the AFTER INSERT trigger on the posts table. From within this trigger, you will also insert rows into the posttags table, thus eliminating the need to do it from within the insertPost function.

Once you’ve done all that, you can refactor the Python code implemented in the appsample module. To summarize, here are the steps you need to perform in order to refactor the sample application discussed in the earlier article:

  • Create tags table and populate it with the data currently stored in the  tags list implemented in Python.
  • Define the AFTER INSERT trigger on the posts table.
  • Refactor the insertPost function in the module.
  • Remove the tags list from the module.
  • Remove the determineTags function from the module.
  • Refactor the execPr function in the module.

Refactoring the Underlying Database

To keep things simple, the tags table might contain a single column tag with the primary key constraint defined on it. So, you can create the tags table as follows:


Then, you might want to modify the posttags table, adding a foreign key constraint to its tag column. Before you can do that, though, you will need to delete all the rows from this table. This can be done with the following query:

DELETE FROM posttags;

Now you can move on and alter posttags as follows:


The next step is to populate the tags table. You can automate this process with the help of the following Python script:

>>> import MySQLdb
>>> import appsample
>>> db=MySQLdb.connect(host="localhost",user="usrsample",passwd="pswd",db=">>> dbsample")
>>> c=db.cursor()
>>> c.executemany("""INSERT INTO tags VALUES(%s)""", appsample.tags)
>>> db.commit()
>>> db.close()

As a result, you should have the tags table populated with the data taken from the tags list discussed in Python Data Persistence using MySQL. To make sure it has done so, you can turn back to the mysql prompt and issue the following query against the tags table:


The above should output the list of tags you have in the tags list. Of course, you can always extend this list, adding new tags with the INSERT statement. For example, you could issue the following statement to add the Visual Studio tag:

INSERT INTO tags VALUES('Visual Studio');

Now you can move on and define the AFTER INSERT trigger on the posts table:

delimiter //
INSERT INTO posttags(title, tag) SELECT NEW.title as title, tag FROM tags WHERE LOCATE(tag, NEW.title)>0;
delimiter ;

As you can see, the posttags table will be automatically populated with appropriate tags just after a new row is inserted into the posts table. Notice the use of the INSERT … SELECT statement in the body of the trigger. Using this syntax lets you insert several rows into the posttags table at once, without having to use an explicit loop. In the WHERE clause of SELECT, you use standard MySQL string function LOCATE returning the position of the first occurrence of the substring, passed in as the first argument, in the string, passed in as the second argument. In this particular example, though, you are not really interested in obtaining the position of an occurrence of the substring in the string. All you need to find out here is whether the substring appears in the string or not. If it is, it should appear in the posttags table as a separate row associated with the row just inserted into the posts table.

Refactoring the Sample’s Python Code

Now that you have moved some data and data processing from Python into the underlying database, it’s time to reorganize the appsample custom Python module created as discussed in Python Data Persistence using MySQL. As mentioned earlier, you need to rewrite the insertPost and execPr functions and remove the determineTags function and the tags list.

This is what the appsample module should look like after revising:

import MySQLdb
import urllib2
import xml.dom.minidom

def obtainPost():
addr = ""
xmldoc = xml.dom.minidom.parseString(urllib2.urlopen(addr).read())
item = xmldoc.getElementsByTagName("item")[0]title = item.getElementsByTagName("title")[0]
guid = item.getElementsByTagName("guid")[0]
pubDate = item.getElementsByTagName("pubDate")[0]
post ={"title": title, "guid": guid, "pubDate": pubDate}
return post

def insertPost(title, guid, pubDate):
c.execute("""INSERT INTO posts (title, guid, pubDate) VALUES(%s,%s,%s)""", (title, guid, pubDate))

def execPr():
p = obtainPost()
insertPost(p["title"], p["guid"], p["pubDate"])

If you compare it with appsample discussed in Part 1, you should notice that the revision is much shorter. It’s important to note, however, that nothing has changed from the user standpoint. So, if you now start the execPr function in your Python session:

>>>import appsample

This should insert a new record into the posts table, inserting automatically corresponding tags records into the posttags table, if any. The difference lies in the way it’s going on behind the scenes. Now the Python code is responsible only for obtaining the latest post from the Packt Book Feed page and then inserting a record into the posts table. Dealing with tags is now responsibility of the logic implemented inside the database. In particular, the AFTER INSERT trigger defined on the posts table should take care of inserting the rows into the posttags table.

To make sure that everything has worked smoothly, you can now check out the content of the posts and posttags tables. To look at the latest post stored in the posts table, you could issue the following query:

SELECT title, str_to_date(pubDate,'%a, %e %b %Y') lastdate FROM posts ORDER BY lastdate DESC LIMIT 1;

Then, you might want to look at the related tags stored in the posttags tables, by issuing the following query:

SELECT p.title, t.tag, str_to_date(p.pubDate,'%a, %e %b %Y') lastdate FROM posts p, posttags t
WHERE p.title=t.title ORDER BY lastdate DESC LIMIT 1;


In this article, you looked at how some business logic of a Python/MySQL application can be moved from Python into MySQL. For that, you continued with the sample application originally discussed in Python Data Persistence using MySQL.


Please enter your comment!
Please enter your name here