Django Foreign Key Object Patcher

This article may be helpful for optimizing query performance when fetching foreign key objects in Django on large tables. Most of the time using the select_related queryset function is enough to group the population of foreign key objects into the original query. However, if you are using a relational database and the tables used in select_related become sufficiently large (1m+ rows), then select_related will begin to perform very poorly. Today, we will discuss a generic tool we use at Votizen.com to patch foreign key objects for a couple of large tables.

A quick note: if you are using Django 1.4, there is a prefetch feature that should mitigate this, but if you are using an older version, you may need your own patching mechanism.

Getting ready

We will use the following data model for today’s code:

from django.contrib.auth.models import User
from django.db import models

class Message(models.Model):
    """
    Messages sent to a user.
    """
    message = models.TextField()
    user = models.ForeignKey(User)

This might be used for a system where messages are delivered to users on your site. Anyway, this will create a message table that foreign keys to the Django auth_user table. Normally, you might query all messages like:

message_qs = Message.objects.all().select_related('user')

Which would make one query, but populate both the message object and the user foreign key object of the messages fetched. However, if the message and auth_user tables becomes sufficiently large, then you will start having performance problems.

How do it…

Here is the patching function:

from django.db.models.query import QuerySet

def generic_model_patcher(element_list, *args):
    """
    Accepts a list of patching tuples, each tuples should
        have the following definition:
            (target_property, query_set_or_manager,)
        the target_property is the property to set values on

        so patching Item would be:
            ('user', Message.objects,)
    """
    l_maps = []

    # validate args and setup maps
    for t in args:
        if not (isinstance(t, tuple) and 2 == len(t) and
                isinstance(t[0], (str, unicode,)) and
                isinstance(t[1], (QuerySet, models.Manager,))):
            raise ValueError('arguments must be tuples, '
                             '(target_property, query_set,)')
        l_maps.append({})

    new_element_list = list(element_list)

    # gather all ids up
    for element in new_element_list:
        for i, t in enumerate(args):
            id_value = getattr(element, u'%s_id' % t[0])

            if id_value:  # support nullable ids
                l_maps[i][id_value] = element

    # fetch the data and patch it onto elements
    for i, t in enumerate(args):
        map = l_maps[i]
        for new_element in t[1].filter(id__in=map.keys()):
            element = map[new_element.id]
            setattr(element, t[0], new_element)

    return new_element_list

To use it with our example:

list_of_messages = list(Message.objects.all()[:30])
patched_list_of_messages = generic_model_patcher(list_of_messages, (
    'user', Message.objects,))

How it works…

This technique will collect all foreign key ids from a list of objects and make one additional query to fetch them, using a map to patch the fetched foreign key objects back onto the original object. So you will be making two queries instead of one, but with large tables it is often faster to make two queries against unique indexes, than joining the tables, even if joining on two indexed columns. However, accessing message.user when iterating over patched_list_of_messages will cause no additional queries, just like when using select_related.

It is important that only lists of objects are passed into generic_model_patcher and not querysets, because the whole set will be iterated on to collect the foreign key ids. The patching function supports patching of multiple foreign key objects at the same time, by passing multiple tuples.

The tuple should have two values, the name of the property for the foreign key objects (user in this case) and a queryset or manager to fetch them. The manager is required, to give maximum control to the developers so they can decide whether to use the default or something more specialized.

There’s more…

We use this technique most often to patch pagination objects before they are passed into templates. Here is an example using pagination:

from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger

message_paginator = Paginator(Message.objects.all(), 30)

page = request.GET.get('page', 1)
try:
    message_paginated = message_paginator.page(page)
except PageNotAnInteger:
    # If page is not an integer, deliver first page.
    message_paginated = message_paginator.page(1)
except EmptyPage:
    # If page is out of range (e.g. 9999), deliver last page of results.
    message_paginated = message_paginator.page(message_paginator.num_pages)

message_paginated.objects_list = generic_model_patcher(
    message_paginated.objects_list, ('user', Message.objects,))

You would simply iterate of message_paginated.objects_list in your template and it would be the paginated and patched results.