Tracking SQL Queries for a Request using Django

For a typical Django application, database interactions are of key importance. Ensuring that the database queries being made are correct helps to ensure that the application results are correct. Further, ensuring that the database queries produced for the application are efficient helps to make sure that the application will be able to support the desired number of concurrent users.

Django provides support in this area by making the database query history available for examination. This type of access is useful to see the SQL that is issued as a result of calling a particular model method. However, it is not helpful in learning about the bigger picture of what SQL queries are made during the processing of a particular request.

This section will show how to include information about the SQL queries needed for production of a page in the page itself. We will alter our existing survey application templates to include query information, and examine the query history for some of the existing survey application views. Though we are not aware of any problems with the existing views, we may learn something in the process of verifying that they issue the queries we expect.

Settings for accessing query history in templates

Before the query history can be accessed from a template, we need to ensure some required settings are configured properly. Three settings are needed in order for the SQL query information to be available in a template. First, the debug context processor, django.core.context_processors.debug, must be included in the TEMPLATE_CONTEXT_PROCESSORS setting. This context processor is included in the default value for TEMPLATE_CONTEXT_PROCESSORS. We have not changed that setting; therefore we do not need to do anything to enable this context processor in our project.

Second, the IP address of the machine sending the request must be listed in the INTERNAL_IPS setting. This is not a setting we have used before, and it is empty by default, so we will need to add it to the settings file. When testing using the same machine as where the development server runs, setting INTERNAL_IPS to include the loopback address is sufficient:

# Addresses for internal machines that can see potentially sensitive
# information such as the query history for a request.
INTERNAL_IPS = ('127.0.0.1', )

If you also test from other machines, you will need to include their IP addresses in this setting as well.

Third and finally, DEBUG must be True in order for the SQL query history to be available in templates.

When those three settings conditions are met, the SQL query history may be available in templates via a template variable named sql_queries. This variable contains a list of dictionaries. Each dictionary contains two keys: sql and time. The value for sql is the SQL query itself, and the value for time is the number of seconds the query took to execute.

Note that the sql_queries context variable is set by the debug context processor. Context processors are only called during template rendering when a RequestContext is used to render the template. Up until now, we have not used RequestContexts in our survey application views, since they were not necessary for the code so far. But in order to access the query history from the template, we will need to start using RequestContexts. Therefore, in addition to modifying the templates, we will need to change the view code slightly in order to include query history in the generated pages for the survey application.

SQL queries for the home page

Let's start by seeing what queries are issued in order to generate the survey application home page. Recall that the home page view code is:

def home(request):
today = datetime.date.today()
active = Survey.objects.active()
completed = Survey.objects.completed().filter(closes__gte=today-
datetime.timedelta(14))
upcoming = Survey.objects.upcoming().filter(
opens__lte=today+datetime.timedelta(7))
return render_to_response('survey/home.html',
{'active_surveys': active,
'completed_surveys': completed,
'upcoming_surveys': upcoming,
})

There are three QuerySets rendered in the template, so we would expect to see that this view generates three SQL queries. In order to check that, we must first change the view to use a RequestContext:

from django.template import RequestContext
def home(request):
today = datetime.date.today()
active = Survey.objects.active()
completed = Survey.objects.completed().filter(closes__gte=today-
datetime.timedelta(14))
upcoming = Survey.objects.upcoming().filter(
opens__lte=today+datetime.timedelta(7))
return render_to_response('survey/home.html',
{'active_surveys': active,
'completed_surveys': completed,
'upcoming_surveys': upcoming,},
RequestContext(request))

The only change here is to add the RequestContext(request) as a third parameter to render_to_response, after adding an import for it earlier in the file. When we make this change, we may as well also change the render_to_response lines for the other views to use RequestContexts as well. That way when we get to the point of examining the SQL queries for each, we will not get tripped up by having forgotten to make this small change.

Second, we'll need to display the information from sql_queries somewhere in our survey/home.html template. But where? We don't necessarily want this information displayed in the browser along with the genuine application data, since that could get confusing. One way to include it in the response but not have it be automatically visible on the browser page is to put it in an HTML comment. Then the browser will not display it on the page, but it can be seen by viewing the HTML source for the displayed page.

As a first attempt at implementing this, we might change the top of survey/home.html to look like this:

{% extends "survey/base.html" %}
{% block content %}

This template code prints out the contents of sql_queries within an HTML comment at the very beginning of the content block supplied by survey/home. html. First, the number of queries is noted by filtering the list through the length filter. Then the code iterates through each dictionary in the sql_queries list and displays sql, followed by a note in parentheses about the time taken for each query.

How well does that work? If we try it out by retrieving the survey home page (after ensuring the development server is running), and use the browser's menu item for viewing the HTML source for the page, we might see that the comment block contains something like:

<!--
1 queries

SELECT `django_session`.`session_key`, `django_session`.`session_data`,
`django_session`.`expire_date` FROM `django_session` WHERE (`django_
session`.`session_key` = d538f13c423c2fe1e7f8d8147b0f6887 AND `django_
session`.`expire_date` > 2009-10-24 17:24:49 ) (0.001 seconds)

-->

Note that the exact number of queries displayed here will depend on the version of Django you are running. This result is from Django 1.1.1; later versions of Django may not show any queries displayed here. Furthermore, the history of the browser's interaction with the site will affect the queries issued. This result is from a browser that had been used to access the admin application, and the last interaction with the admin application was to log out. You may see additional queries if the browser had been used to access the admin application but the user had not logged out. Finally, the database in use can also affect the specific queries issued and their exact formatting. This result is from a MySQL database.

That's not exactly what we expected. First, a minor annoyance, but 1 queries is wrong, it should be 1 query. Perhaps that wouldn't annoy you, particularly just in internal or debug information, but it would annoy me. I would change the template code that displays the query count to use correct pluralization:

{% with sql_queries|length as qcount %}
{{ qcount }} quer{{ qcount|pluralize:"y,ies" }}
{% endwith %}

Here, since the template needs to use the length result multiple times, it is first cached in the qcount variable by using a {% with %} block. Then it is displayed, and it is used as the variable input to the pluralize filter that will put the correct letters on the end of quer depending on the qcount value. Now the comment block will show 0 queries, 1 query, 2 queries, and so on.

With that minor annoyance out of the way, we can concentrate on the next, larger, issue, which is that the displayed query is not a query we were expecting. Furthermore, the three queries we were expecting, to retrieve the lists of completed, active, and upcoming surveys, are nowhere to be seen. What's going on? We'll take each of these in turn.

The query that is shown is accessing the django_session table. This table is used by the django.contrib.sessions application. Even though the survey application does not use this application, it is listed in our INSTALLED_APPS, since it is included in the settings.py file that startproject generates. Also, the middleware that the sessions application uses is listed in MIDDLEWARE_CLASSES.

The sessions application stores the session identifier in a cookie, named sessionid by default, that is sent to the browser as soon as any application uses a session. The browser will return the cookie in all requests to the same server. If the cookie is present in a request, the session middleware will use it to retrieve the session data. This is the query we see previously listed: the session middleware is retrieving the data for the session identified by the session cookie sent by the browser.

But the survey application does not use sessions, so how did the browser get a session cookie in the first place? The answer is that the admin application uses sessions, and this browser had previously been used to access the admin application. At that time, the sessionid cookie was set in a response, and the browser faithfully returns it on all subsequent requests. Thus, it seems likely that this django_session table query is due to a sessionid cookie set as a side-effect of using the admin application.

Can we confirm that? If we find and delete the cookie from the browser and reload the page, we should see that this SQL query is no longer listed. Without the cookie in the request, whatever code was triggering access to the session data won't have anything to look up. And since the survey application does not use sessions, none of its responses should include a new session cookie, which would cause subsequent requests to include a session lookup. Is this reasoning correct? If we try it, we will see that the comment block changes to:

<!--

0 queries

-->

Thus, we seem to have confirmed, to some extent, what happened to cause a django_session table query during processing of a survey application response. We did not track down what exact code accessed the session identified by the cookie—it could have been middleware or a context processor, but we probably don't need to know the details. It's enough to keep in mind that there are other applications running in our project besides the one we are working on, and they may cause database interactions independent of our own code. If we observe behavior which looks like it might cause a problem for our code, we can investigate further, but for this particular case we will just avoid using the admin application for now, as we would like to focus attention on the queries our own code is generating.

Now that we understand the query that was listed, what about the expected ones that were not listed? The missing queries are due to a combination of the lazy evaluation property of QuerySets and the exact placement of the comment block that lists the contents of sql_queries. We put the comment block at the top of the content block in the home page, to make it easy to find the SQL query information when looking at the page source. The template is rendered after the three QuerySets are created by the view, so it might seem that the comment placed at the top should show the SQL queries for the three QuerySets.

However, QuerySets are lazy; simply creating a QuerySet does not immediately cause interaction with the database. Rather, sending the SQL to the database is delayed until the QuerySet results are actually accessed. For the survey home page, that does not happen until the parts of the template that loop through each QuerySet are rendered. Those parts are all below where we placed the sql_queries information, so the corresponding SQL queries had not yet been issued. The fix for this is to move the placement of the comment block to the very bottom of the content block.

When we do that we should also fix two other issues with the query display. First, notice that the query displayed above has &gt; shown instead of the > symbol that would actually have been in the query sent to the database. Furthermore, if the database in use is one (such as PostgreSQL) that uses straight quotes instead of back quotes for quoting, all of the back quotes in the query would be shown as &quot;. This is due to Django's automatic escaping of HTML markup characters. This is unnecessary and hard to read in our HTML comment, so we can suppress it by sending the sql query value through the safe filter.

Second, the query is very long. In order to avoid needing to scroll to the right in order to see the entire query, we can also filter the sql value through wordwrap to introduce some line breaks and make the output more readable.

To make these changes, remove the added comment block from the top of the content block in the survey/home.html template and instead change the bottom of this template to be:

{% endif %}

{% endblock content %}

Now, if we again reload the survey home page and view the source for the returned page, we will see the queries listed in a comment at the bottom:

<!--

3 queries

SELECT `survey_survey`.`id`, `survey_survey`.`title`,
`survey_survey`.`opens`, `survey_survey`.`closes` FROM
`survey_survey` WHERE (`survey_survey`.`opens` <= 2009-10-25
AND `survey_survey`.`closes` >= 2009-10-25 ) (0.000 seconds)

SELECT `survey_survey`.`id`, `survey_survey`.`title`,
`survey_survey`.`opens`, `survey_survey`.`closes` FROM
`survey_survey` WHERE (`survey_survey`.`closes` < 2009-10-25
AND `survey_survey`.`closes` >= 2009-10-11 ) (0.000 seconds)

SELECT `survey_survey`.`id`, `survey_survey`.`title`,
`survey_survey`.`opens`, `survey_survey`.`closes` FROM
`survey_survey` WHERE (`survey_survey`.`opens` > 2009-10-25
AND `survey_survey`.`opens` <= 2009-11-01 ) (0.000 seconds)

-->

That is good, those look like exactly what we expect to see for queries for the home page. Now that we seem to have some working template code to show queries, we will consider packaging up this snippet so that it can easily be reused elsewhere.