Sometimes you need exactly one instance of some data throughout the lifetime of a program. This can be a class instance, a list, or a dictionary, for example. Creating a second instance is undesirable because it can lead to logical errors or a malfunctioning program. The design pattern that ensures only one instance exists is called singleton. In this article, you will learn about module-level, classic, and borg singletons: how they work and when to use them. You will also build a two-threaded web crawler that uses a singleton to access a shared resource.

A singleton is a good candidate when you need to do the following:

  • Control concurrent access to a shared resource
  • Provide a global point of access to a resource from multiple or different parts of the system
  • Have only one object of a given kind

Some typical use cases for a singleton are:

  • The logging class and its subclasses (global point of access for the logging class to send messages to the log)
  • Printer spooler (your application should only have a single instance of the spooler in order to avoid conflicting requests for the same resource)
  • Managing a connection to a database
  • File manager
  • Retrieving and storing information on external configuration files
  • Read-only singletons storing some global states (user language, time, time zone, application path, and so on)

There are several ways to implement singletons. We will look at the module-level singleton, the classic singleton, and the borg singleton.

Module-level singleton

All modules are singletons by nature because of Python’s module importing steps:

  1. Check whether the module is already imported. If yes, return it. If not, find the module, initialize it, and return it.
  2. Initializing a module means executing its code, including all module-level assignments.
  3. When you import the module for the first time, all of the initialization is done; however, if you import the module a second time, Python returns the already initialized module. Thus, the initialization is not repeated, and you get the previously imported module with all of its data.
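The caching described in these steps can be observed through sys.modules, the dictionary in which Python keeps already imported modules. The following is a minimal sketch; the standard json module is used purely as a stand-in for any module, and the marker attribute is a made-up name for illustration. It uses only syntax that runs on both Python 2.7 and Python 3.

```python
import sys
import importlib

import json                    # first import: the module is found and initialized
json.marker = "set once"       # attach a (hypothetical) attribute to the module object

# A second import does not re-run the module code; it is served from the cache
same_module = importlib.import_module("json")

is_same_object = same_module is json and sys.modules["json"] is json
marker_seen = same_module.marker

print(is_same_object)
print(marker_seen)
```

Because both names refer to one module object, the attribute set through the first name is visible through the second.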

So, if you want to quickly make a singleton, keep the shared data as a module attribute, as in the following three files.

   singleton.py:

   only_one_var = "I'm only one var"

   module1.py:

   import singleton
   print singleton.only_one_var
   singleton.only_one_var += " after modification"
   import module2

   module2.py:

   import singleton
   print singleton.only_one_var

Here, module1 imports the singleton module and changes the value of its module-level variable. When module2 imports singleton afterward, it gets the same module object and therefore sees the changed value.

This approach is quick and sometimes is all that you need; however, consider the following drawbacks:

  • It’s pretty error-prone. For example, if you forget the global statement inside a function, an assignment creates a variable local to the function and the module’s variable is not changed, which is not what you want.
  • It’s ugly, especially if you have a lot of objects that should remain singletons.
  • It pollutes the module namespace with unnecessary variables.
  • It doesn’t permit lazy allocation and initialization; all global variables are loaded during the module import process.
  • It’s not possible to reuse the code because you cannot use inheritance.
  • There are no special methods and no object-oriented programming benefits at all.
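The first drawback in the list above is easy to reproduce. The following sketch assumes a module-level variable named only_one_var, as in the earlier example; the two function names are made up for illustration.

```python
only_one_var = "original"


def change_without_global():
    # No global statement: this assignment creates a new local variable
    only_one_var = "changed locally"
    return only_one_var


def change_with_global():
    # The global statement makes the assignment target the module variable
    global only_one_var
    only_one_var = "changed globally"


change_without_global()
after_local = only_one_var     # the module variable is untouched

change_with_global()
after_global = only_one_var    # the module variable has been replaced

print(after_local)
print(after_global)
```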

Classic singleton

In the classic singleton in Python, we check whether an instance has already been created. If it has, we return it; otherwise, we create a new instance, assign it to a class attribute, and return it.

Let’s try to create a dedicated singleton class:

class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

Here, we override the special __new__ method, which is called right before __init__, and check in it whether an instance was created earlier. If not, we create a new instance; otherwise, we return the already created instance.

Let’s check how it works:

>>> singleton = Singleton()
>>> another_singleton = Singleton()
>>> singleton is another_singleton
True
>>> singleton.only_one_var = "I'm only one var"
>>> another_singleton.only_one_var
"I'm only one var"
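One consequence worth knowing is that Python still calls __init__ on the cached object every time the class is called, because __new__ returns an instance of the class. The sketch below illustrates this with a hypothetical Config subclass and an init_calls counter, neither of which is part of the original example.

```python
class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance


class Config(Singleton):
    """Hypothetical subclass, used only to illustrate the __init__ behaviour."""
    init_calls = 0

    def __init__(self):
        # Runs on every Config() call, even when __new__ returns the cached object
        Config.init_calls += 1
        self.language = 'en'


first = Config()
first.language = 'de'
second = Config()    # same object, but __init__ has reset language to 'en'

print(first is second)
print(Config.init_calls)
print(first.language)
```

If the setup must happen only once, one option is to guard the body of __init__ with a flag, or to do the setup inside __new__ when the instance is first created.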

Now, try to subclass the Singleton class:

class Child(Singleton):
    pass

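If Child is a successor of Singleton, you might expect all of its instances to be the Singleton instance too, thus sharing its state. With this implementation, however, the result depends on which class is instantiated first, because hasattr also finds an instance attribute inherited from the parent class. If Singleton has already been instantiated, Child() simply returns that Singleton object, which is not a Child at all. If Child is instantiated first, as in the following fresh interpreter session, the two classes end up with separate instances that share nothing:

>>> child = Child()
>>> singleton = Singleton()
>>> child is singleton
False
>>> singleton.only_one_var = "I'm only one var"
>>> child.only_one_var
AttributeError: 'Child' object has no attribute 'only_one_var'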
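The hasattr check in the classic singleton also finds attributes inherited from a parent class, so what Child() returns depends on whether Singleton was instantiated before it. If each class in a hierarchy should instead own exactly one instance, one possible variant checks the class's own __dict__. This is a sketch under that assumption, not the article's implementation; the class name PerClassSingleton is made up.

```python
class PerClassSingleton(object):
    """Variant that keeps one instance per class in the hierarchy."""

    def __new__(cls):
        # Look only at the class's own namespace, not at inherited attributes
        if 'instance' not in cls.__dict__:
            cls.instance = super(PerClassSingleton, cls).__new__(cls)
        return cls.instance


class Child(PerClassSingleton):
    pass


parent = PerClassSingleton()
child = Child()

print(parent is PerClassSingleton())
print(child is Child())
print(child is parent)
print(type(child).__name__)
```

With this variant the parent and the child never share state, so it does not replace the borg pattern when shared state across subclasses is the goal.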

To avoid this situation, the borg singleton is used.

Borg singleton

Borg is also known as monostate. In the borg pattern, all of the instances are different, but they share the same state.

In the following code, the shared state is maintained in the _shared_state class attribute, and every new instance of the Borg class gets this state assigned in the __new__ method.

class Borg(object):
    _shared_state = {}

    def __new__(cls, *args, **kwargs):
        # object.__new__ takes no extra arguments, so pass only the class
        obj = super(Borg, cls).__new__(cls)
        obj.__dict__ = cls._shared_state
        return obj

Generally, Python stores the instance state in the __dict__ dictionary, and when instantiated normally, every instance has its own __dict__. Here, however, we deliberately assign the class variable _shared_state as the __dict__ of every created instance.
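The difference between a per-instance __dict__ and a shared one can be checked directly. In this sketch, Plain is a hypothetical ordinary class added for comparison, and language is a made-up attribute.

```python
class Plain(object):
    pass


class Borg(object):
    _shared_state = {}

    def __new__(cls):
        obj = super(Borg, cls).__new__(cls)
        obj.__dict__ = cls._shared_state
        return obj


plain_a, plain_b = Plain(), Plain()
borg_a, borg_b = Borg(), Borg()

# Ordinary instances each get their own attribute dictionary
plain_shares = plain_a.__dict__ is plain_b.__dict__
# Borg instances are distinct objects that point at one dictionary
borg_shares = borg_a.__dict__ is borg_b.__dict__

borg_a.language = 'en'    # written through one instance...
seen_by_b = borg_b.language    # ...and read through the other

print(plain_shares)
print(borg_shares)
print(seen_by_b)
```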

Here is how it works with subclassing:

class Child(Borg):
    pass

>>> borg = Borg()
>>> another_borg = Borg()
>>> borg is another_borg
False
>>> child = Child()
>>> borg.only_one_var = "I'm the only one var"
>>> child.only_one_var
"I'm the only one var"

So, although the objects are not identical and cannot be matched with the is operator, all child objects share the parent’s state.

If you want to have a class that is a descendant of the Borg class but has a different state, you can reset _shared_state as follows:

class AnotherChild(Borg):
    _shared_state = {}

>>> another_child = AnotherChild()
>>> another_child.only_one_var
AttributeError: 'AnotherChild' object has no attribute 'only_one_var'

Which type of singleton should be used is up to you. If you expect that your singleton will not be inherited, you can choose the classic singleton; otherwise, it’s better to stick with borg.

Implementation in Python

As a practical example, we’ll create a simple web crawler that scans a website you point it at, follows all the links that lead to other pages of the same website, and downloads all of the images it finds.

To do this, we’ll need two functions: one that scans the website for links to other pages in order to build a set of pages to visit, and one that scans a page for images and downloads them.

To make it quicker, we’ll download images in two threads. These two threads should not interfere with each other, so a thread should not scan a page that the other thread has already scanned, and should not download an image that is already downloaded.

So, a set with downloaded images and scanned web pages will be a shared resource for our application, and we’ll keep it in a singleton instance.
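Checking a shared set and then adding to it are two separate steps, so two threads can occasionally both pass the check for the same item. One common way to close that gap is to guard the check-and-add with a lock stored on the singleton. The following is a sketch under that assumption; the lock attribute and the claim helper are hypothetical and are not part of the crawler code in this article.

```python
import threading


class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance


shared = Singleton()
shared.downloaded = set()
shared.lock = threading.Lock()    # hypothetical attribute guarding the set


def claim(src):
    """Return True only for the first caller that claims src."""
    state = Singleton()
    with state.lock:
        if src in state.downloaded:
            return False
        state.downloaded.add(src)
        return True


results = []


def worker():
    # Every worker tries to claim the same three image names
    for src in ('a.png', 'b.png', 'c.png'):
        if claim(src):
            results.append(src)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

Each name ends up in results exactly once, no matter how the four threads interleave.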

In this example, you will need a library for parsing and screen scraping websites named BeautifulSoup and an HTTP client library httplib2. It should be sufficient to install both with either of the following commands:

  • $ sudo pip install BeautifulSoup httplib2
  • $ sudo easy_install BeautifulSoup httplib2

First of all, we’ll create a Singleton class. Let’s use the classic singleton in this example:

import httplib2
import os
import re
import threading
import urllib
from urlparse import urlparse, urljoin

from BeautifulSoup import BeautifulSoup

class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

It returns the same singleton object to every part of the code that requests it.

Next, we’ll create a class for creating a thread. In this thread, we’ll download images from the website:

class ImageDownloaderThread(threading.Thread):
    """A thread for downloading images in parallel."""
    def __init__(self, thread_id, name, counter):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        print 'Starting thread ', self.name
        download_images(self.name)
        print 'Finished thread ', self.name

The following function traverses the website using breadth-first search (BFS), finds links, and adds them to a set of pages whose images will be downloaded later. The max_links parameter limits the number of pages to collect in case the website is too large.
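The queue discipline behind this traversal (new links are added to the front of a list, and the next page is popped from the end) can be tried without any network access. The sketch below uses a made-up in-memory link graph named LINKS in place of a real website, and a list instead of a set for the visited pages so that the visiting order stays visible.

```python
# Hypothetical in-memory "site": page -> pages it links to
LINKS = {
    '/': ['/about', '/news'],
    '/about': ['/team', '/'],
    '/news': ['/news/1', '/about'],
    '/team': [],
    '/news/1': [],
}


def traverse(root, max_links=10):
    queue_to_parse = [root]
    to_visit = []
    while queue_to_parse and len(to_visit) < max_links:
        page = queue_to_parse.pop()    # the oldest entry sits at the end
        if page in to_visit:
            continue
        to_visit.append(page)
        for link in LINKS.get(page, []):
            if link not in to_visit and link not in queue_to_parse:
                # New links go to the front, so they are served last (FIFO)
                queue_to_parse = [link] + queue_to_parse
    return to_visit


order = traverse('/')
limited = traverse('/', max_links=3)

print(order)
print(limited)
```

Pages one link away from the root are visited before pages two links away, which is the defining property of breadth-first search.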

def traverse_site(max_links=10):
    link_parser_singleton = Singleton()

    # While we have pages to parse in queue
    while link_parser_singleton.queue_to_parse:
        # If collected enough links to download images, return
        if len(link_parser_singleton.to_visit) == max_links:
            return

        url = link_parser_singleton.queue_to_parse.pop()

        http = httplib2.Http()
        try:
            status, response = http.request(url)
        except Exception:
            continue

        # Skip if not a web page; the header may carry a charset suffix,
        # for example 'text/html; charset=utf-8'
        if not status.get('content-type', '').startswith('text/html'):
            continue

        # Add the link to queue for downloading images
        link_parser_singleton.to_visit.add(url)
        print 'Added', url, 'to queue'

        bs = BeautifulSoup(response)

        for link in bs.findAll('a'):
            link_url = link.get('href')

            # An <a> tag may not contain an href attribute
            if not link_url:
                continue

            parsed = urlparse(link_url)

            # If link follows to external webpage, skip it
            if parsed.netloc and parsed.netloc != parsed_root.netloc:
                continue

            # Construct a full url from a link which can be relative
            link_url = ((parsed.scheme or parsed_root.scheme) + '://' +
                        (parsed.netloc or parsed_root.netloc) +
                        (parsed.path or ''))

            # If link was added previously, skip it
            if link_url in link_parser_singleton.to_visit:
                continue

            # Add a link for further parsing
            link_parser_singleton.queue_to_parse = [link_url] + link_parser_singleton.queue_to_parse
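As an alternative to assembling the URL by hand, urljoin (already imported in the code above) can resolve a link against the page it appears on, which also handles paths without a leading slash. The following is a sketch; same_site_url is a hypothetical helper, the example URLs are made up, and the import is written so that it runs on both Python 3 and the Python 2 used in this article.

```python
try:
    from urllib.parse import urljoin, urlparse    # Python 3
except ImportError:
    from urlparse import urljoin, urlparse        # Python 2, as in the article

root = 'http://python.org'
page = 'http://python.org/about/'


def same_site_url(page_url, href, root_url):
    """Return an absolute URL for href, or None if it leaves the site."""
    absolute = urljoin(page_url, href)
    if urlparse(absolute).netloc != urlparse(root_url).netloc:
        return None
    return absolute


relative = same_site_url(page, 'apps/', root)
rooted = same_site_url(page, '/downloads/', root)
external = same_site_url(page, 'http://example.com/x', root)

print(relative)
print(rooted)
print(external)
```

Unlike the hand-built version, urljoin keeps any query string and fragment from the link, so whether that is desirable depends on the site being crawled.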

The following function takes pages from the singleton.to_visit set one at a time, downloads their images, and saves them to the images directory. Here, we use a singleton to share data between the two threads: the set of pages to visit and the set of images already downloaded.

def download_images(thread_name):
    singleton = Singleton()
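    # While there are pages whose images we have not downloaded yet
    while singleton.to_visit:
        try:
            url = singleton.to_visit.pop()
        except KeyError:
            # The other thread took the last page between the check and pop
            break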

        http = httplib2.Http()
        print thread_name, 'Starting downloading images from', url

        try:
            status, response = http.request(url)
        except Exception:
            continue

        bs = BeautifulSoup(response)

        # Find all <img> tags
        images = bs.findAll('img')

        for image in images:
            # Get image source url which can be absolute or relative
            src = image.get('src')
            # Skip <img> tags without a src attribute
            if not src:
                continue
            # Construct a full url. If the image url is relative,
            # it will be resolved against the web page url.
            # If the image url is absolute, it will remain as is
            src = urljoin(url, src)

            # Get a base name, for example 'image.png' to name file locally
            basename = os.path.basename(src)

            if src not in singleton.downloaded:
                singleton.downloaded.add(src)
                print 'Downloading', src
                # Download image to local filesystem
                urllib.urlretrieve(src, os.path.join('images', basename))

        print thread_name, 'finished downloading images from', url

Our client code is as follows:


if __name__ == '__main__':
    root = 'http://python.org'

    parsed_root = urlparse(root)

    singleton = Singleton()
    singleton.queue_to_parse = [root]
    # A set of urls to download images from
    singleton.to_visit = set()
    # Downloaded images
    singleton.downloaded = set()

    traverse_site()

    # Create images directory if not exists
    if not os.path.exists('images'):
        os.makedirs('images')

    # Create new threads
    thread1 = ImageDownloaderThread(1, "Thread-1", 1)
    thread2 = ImageDownloaderThread(2, "Thread-2", 2)

    # Start new Threads
    thread1.start()
    thread2.start()

Run the crawler using the following command:

$ python crawler.py

While it runs, the crawler prints each page it adds to the queue and each image a thread downloads. Your output will vary between runs because the order in which the threads access resources is not predictable.

If you go to the images directory, you will find the downloaded images there.
