In this article we’ll understand how functional programming can be applied to web services in Python.
This article is an extract from the 2nd edition of the bestseller, Functional Python Programming, written by Steven Lott.
We’ll look at a RESTful web service, which can slice and dice a source of data and provide downloads as JSON, XML, or CSV files. We’ll provide an overall WSGI-compatible wrapper. The functions that do the real work of the application won’t be narrowly constrained to fit the WSGI standard.
We’ll use a simple dataset with four subcollections: the Anscombe Quartet. It’s a small set of data but it can be used to show the principles of a RESTful web service.
We’ll split our application into two tiers: a web tier, which will be a simple WSGI application, and a data service tier, which will be more typical functional programming. We’ll look at the web tier first so that we can focus on a functional approach to provide meaningful results.
We need to provide two pieces of information to the web service:
- The quartet that we want: this is a slice and dice operation. The idea is to slice up the information by filtering and extracting meaningful subsets.
- The output format we want.
The data selection is commonly done through the request path. We can request /anscombe/I/ or /anscombe/II/ to pick specific datasets from the quartet. The idea is that a URL defines a resource, and there’s no good reason for the URL to ever change. In this case, the dataset selectors aren’t dependent on dates or some organizational approval status, or other external factors. The URL is timeless and absolute.
The output format is not a first-class part of the URL. It’s just a serialization format, not the data itself. In some cases, the format is requested through the HTTP Accept header. This is hard to use from a browser, but easy to use from an application using a RESTful API. When extracting data from the browser, a query string is commonly used to specify the output format. We’ll append a query string such as ?form=json to the path to specify the JSON output format.
A URL we can use will look like this:
http://localhost:8080/anscombe/III/?form=csv
This would request a CSV download of the third dataset.
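As a rough illustration of how such a URL breaks down, the standard library’s urllib.parse module can separate the path (the resource) from the query string (the format). This is only a sketch; the variable names are ours and aren’t part of the application.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://localhost:8080/anscombe/III/?form=csv"
parts = urlsplit(url)

# The path identifies the resource; the query string carries the format.
print(parts.path)             # /anscombe/III/
print(parse_qs(parts.query))  # {'form': ['csv']}
```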
Creating the Web Server Gateway Interface
First, we’ll use a simple URL pattern-matching expression to define the one and only routing in our application. In a larger or more complex application, we might have more than one such pattern:
import re
path_pat = re.compile(r"^/anscombe/(?P<dataset>.*?)/?$")
This pattern allows us to define an overall script in the WSGI sense at the top level of the path. In this case, the script is anscombe. We’ll take the next level of the path as a dataset to select from the Anscombe Quartet. The dataset value should be one of I, II, III, or IV.
We used a named parameter for the selection criteria. In many cases, RESTful APIs are described using a syntax like the following:
/anscombe/{dataset}/
We translated this idealized pattern into a proper, regular expression, and preserved the name of the dataset selector in the path.
Here are some example URL paths that demonstrate how this pattern works:
>>> m1 = path_pat.match("/anscombe/I")
>>> m1.groupdict()
{'dataset': 'I'}
>>> m2 = path_pat.match("/anscombe/II/")
>>> m2.groupdict()
{'dataset': 'II'}
>>> m3 = path_pat.match("/anscombe/")
>>> m3.groupdict()
{'dataset': ''}
Each of these examples shows the details parsed from the URL path. When a specific series is named, this is located in the path. When no series is named, then an empty string is found by the pattern.
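The translation from an idealized template such as /anscombe/{dataset}/ to a regular expression can also be done mechanically. Here’s a minimal sketch; template_to_pattern() is a hypothetical helper of our own, and it uses [^/]* for each segment rather than the .*? shown previously, so a named group never spans a slash.

```python
import re

def template_to_pattern(template: str) -> "re.Pattern[str]":
    """Translate a template such as /anscombe/{dataset}/ into a regex.

    Hypothetical helper: each {name} becomes a named group matching one
    path segment, and the trailing slash is made optional.
    """
    body = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]*)", template.rstrip("/"))
    return re.compile("^" + body + "/?$")

pattern = template_to_pattern("/anscombe/{dataset}/")
match = pattern.match("/anscombe/II/")
print(match.groupdict())  # {'dataset': 'II'}
```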
Here’s the overall WSGI application:
import traceback
import urllib.parse

def anscombe_app(
        environ: Dict, start_response: SR_Func
    ) -> Iterable[bytes]:
    log = environ['wsgi.errors']
    try:
        match = path_pat.match(environ['PATH_INFO'])
        set_id = match.group('dataset').upper()
        query = urllib.parse.parse_qs(environ['QUERY_STRING'])
        print(environ['PATH_INFO'], environ['QUERY_STRING'],
              match.groupdict(), file=log)
        dataset = anscombe_filter(set_id, raw_data())
        content_bytes, mime = serialize(
            query['form'][0], set_id, dataset)
        headers = [
            ('Content-Type', mime),
            ('Content-Length', str(len(content_bytes))),
        ]
        start_response("200 OK", headers)
        return [content_bytes]
    except Exception as e:  # pylint: disable=broad-except
        traceback.print_exc(file=log)
        tb = traceback.format_exc()
        content = error_page.substitute(
            title="Error", message=repr(e), traceback=tb)
        content_bytes = content.encode("utf-8")
        headers = [
            ('Content-Type', "text/html"),
            ('Content-Length', str(len(content_bytes))),
        ]
        start_response("404 NOT FOUND", headers)
        return [content_bytes]
This application will extract two pieces of information from the request: the values of the PATH_INFO and QUERY_STRING keys in the environment dictionary. The PATH_INFO value will define which set to extract. The QUERY_STRING value will specify an output format.
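Because a WSGI application is just a callable, it can be exercised without a server by handing it a dictionary and a stand-in for start_response. The following sketch uses a tiny echo_app() of our own instead of the full application; the SR_Func alias is an assumption about a type hint that isn’t defined in this extract.

```python
import io
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed alias; SR_Func is used in the application but not defined here.
SR_Func = Callable[[str, List[Tuple[str, str]]], None]

def echo_app(environ: Dict, start_response: SR_Func) -> Iterable[bytes]:
    """Stand-in application that reports the two keys it was given."""
    body = "{0}?{1}".format(
        environ["PATH_INFO"], environ["QUERY_STRING"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

captured = {}

def fake_start_response(status: str, headers: List[Tuple[str, str]]) -> None:
    # Record what the application reported instead of writing to a socket.
    captured["status"] = status
    captured["headers"] = dict(headers)

environ = {
    "PATH_INFO": "/anscombe/III/",
    "QUERY_STRING": "form=csv",
    "wsgi.errors": io.StringIO(),
}
result = b"".join(echo_app(environ, fake_start_response))
```

The same technique should work for any WSGI callable, which makes the web tier straightforward to unit test.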
It’s important to note that query strings can be quite complex. Rather than assume it is simply a string like ?form=json, we’ve used the urllib.parse module to properly locate all of the name-value pairs in the query string. The value with the 'form' key in the dictionary extracted from the query string can be found in query['form'][0]. This should be one of the defined formats. If the form parameter is missing, a KeyError exception will be raised and an error page displayed; an unrecognized format name is handled later by the serialize() function, which falls back to HTML.
After locating the path and query string, the core application processing takes place in two statements: the assignment to dataset and the call to serialize(). These two statements rely on three functions to gather, filter, and serialize the results:
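A short sketch shows why the [0] subscript is needed and what happens when the parameter is absent. The query strings here are made up for illustration.

```python
from urllib.parse import parse_qs

# Every value is a list, because a name can be repeated.
query = parse_qs("form=json&form=csv&debug=1")
print(query["form"])     # ['json', 'csv']
print(query["form"][0])  # json

# A missing name is simply absent, so subscripting raises KeyError.
empty = parse_qs("")
try:
    empty["form"][0]
    outcome = "found"
except KeyError:
    outcome = "KeyError"
print(outcome)           # KeyError
```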
- The raw_data() function reads the raw data from a file. The result is a dictionary with lists of Pair objects.
- The anscombe_filter() function accepts a selection string and the dictionary of raw data and returns a single list of Pair objects.
- The list of pairs is then serialized into bytes by the serialize() function. The serializer is expected to produce bytes, which can then be packaged with an appropriate header, and returned.
We elected to produce an HTTP Content-Length header as part of the result. This header isn’t required, but it’s polite for large downloads. Because we decided to emit this header, we are forced to create a bytes object with the serialization of the data so we can count the bytes.
If we elected to omit the Content-Length header, we could change the structure of this application dramatically. Each serializer could be changed to a generator function, which would yield bytes as they are produced. For large datasets, this can be a helpful optimization. For the user watching a download, however, it might not be so pleasant because the browser can’t display how much of the download is complete.
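As a sketch of that alternative, here’s a hypothetical generator-based CSV serializer that yields one encoded line at a time. The name serialize_csv_iter() is ours, and Pair is redefined so the example stands alone. A WSGI application could return this iterator directly, provided it doesn’t emit a Content-Length header.

```python
import csv
import io
from itertools import chain
from typing import Iterable, Iterator, NamedTuple

class Pair(NamedTuple):
    x: float
    y: float

def serialize_csv_iter(series: str, data: Iterable[Pair]) -> Iterator[bytes]:
    """Hypothetical streaming variant: yield one encoded line at a time."""
    buffer = io.StringIO()
    wtr = csv.writer(buffer)
    # The heading row comes first, followed lazily by the data rows.
    for row in chain([Pair._fields], data):
        wtr.writerow(row)
        yield buffer.getvalue().encode("utf-8")
        # Reset the buffer so each chunk holds only the newest line.
        buffer.seek(0)
        buffer.truncate(0)

chunks = list(serialize_csv_iter("test", [Pair(2, 3), Pair(5, 7)]))
print(chunks)  # [b'x,y\r\n', b'2,3\r\n', b'5,7\r\n']
```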
A common optimization is to break the transaction into two parts. The first part computes the result and places a file into a Downloads directory. The response is a 302 FOUND with a Location header that identifies the file to download. Generally, most clients will then request the file based on this initial response. The file can be downloaded by Apache httpd or Nginx without involving the Python application.
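The redirect half of that two-part transaction might look something like the following sketch. The redirect_to_download() function and the /downloads/ path are hypothetical, the step that writes the file is omitted, and SR_Func is an assumed alias for the start_response type.

```python
from typing import Callable, Iterable, List, Tuple

# Assumed alias for the start_response callable.
SR_Func = Callable[[str, List[Tuple[str, str]]], None]

def redirect_to_download(
        location: str, start_response: SR_Func) -> Iterable[bytes]:
    """Send the client to a file that the web server delivers directly."""
    headers = [
        ("Location", location),
        ("Content-Length", "0"),
    ]
    start_response("302 FOUND", headers)
    return [b""]

seen = []
body = redirect_to_download(
    "/downloads/anscombe_III.csv",  # hypothetical path
    lambda status, headers: seen.append((status, headers)))
```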
For this example, all errors are treated as a 404 NOT FOUND error. This could be misleading, since a number of individual things might go wrong. More sophisticated error handling could use additional try:/except: blocks to provide more informative feedback.
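One way to provide more informative feedback is to classify the exception before building the response. The mapping below is a hypothetical sketch, not part of the application; a real service would choose its own categories and messages.

```python
from typing import Tuple

def classify_error(exc: Exception) -> Tuple[str, str]:
    """Map an exception to an HTTP status line and a short message.

    Hypothetical mapping, shown only to illustrate the idea.
    """
    if isinstance(exc, KeyError):
        # Unknown dataset name or a missing query parameter.
        return "404 NOT FOUND", "Unknown resource: {0}".format(exc)
    if isinstance(exc, (ValueError, TypeError)):
        return "400 BAD REQUEST", "Malformed request"
    if isinstance(exc, OSError):
        # For example, the data file could not be opened.
        return "500 INTERNAL SERVER ERROR", "Data source unavailable"
    return "500 INTERNAL SERVER ERROR", "Unexpected error"

status, message = classify_error(KeyError("V"))
```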
For debugging purposes, we’ve provided a Python stack trace in the resulting web page. Outside the context of debugging, this is a very bad idea. Feedback from an API should be just enough to fix the request, and nothing more. A stack trace provides too much information to potentially malicious users.
Getting raw data
Here’s what we’re using for this application:
from Chapter_3.ch03_ex5 import (
    series, head_map_filter, row_iter)
from typing import (
    NamedTuple, Callable, List, Tuple, Iterable, Dict, Any)

RawPairIter = Iterable[Tuple[float, float]]

class Pair(NamedTuple):
    x: float
    y: float

pairs: Callable[[RawPairIter], List[Pair]] \
    = lambda source: list(Pair(*row) for row in source)

def raw_data() -> Dict[str, List[Pair]]:
    with open("Anscombe.txt") as source:
        data = tuple(head_map_filter(row_iter(source)))
        mapping = {
            id_str: pairs(series(id_num, data))
            for id_num, id_str in enumerate(
                ['I', 'II', 'III', 'IV'])
        }
    return mapping
The raw_data() function opens the local data file and applies the row_iter() function to return each line of the file parsed into a row of separate items. We applied the head_map_filter() function to remove the heading from the file. The result is a tuple-of-list structure, which is assigned to the variable data. This handles parsing the input into a structure that’s useful. Each series is then built as a list of Pair objects; Pair is a subclass of the NamedTuple class, with two fields that have float as their type hints.
We used a dictionary comprehension to build the mapping from id_str to pairs assembled from the results of the series() function. The series() function extracts (x, y) pairs from the input document. In the document, each series is in two adjacent columns. The series named I is in columns zero and one; the series() function extracts the relevant column pairs.
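The three imported functions aren’t shown in this extract. To make the column-pair idea concrete, here’s a sketch with simplified stand-ins of our own; the real row_iter(), head_map_filter(), and series() functions may differ in detail, and the tab-delimited layout of the sample is an assumption.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

def row_iter(source: Iterable[str]) -> Iterator[List[str]]:
    """Stand-in: split each tab-delimited line into a list of strings."""
    return csv.reader(source, delimiter="\t")

def head_map_filter(rows: Iterable[List[str]]) -> Iterator[List[float]]:
    """Stand-in: keep only rows in which every cell parses as a number."""
    for row in rows:
        try:
            yield [float(cell) for cell in row]
        except ValueError:
            continue  # a heading row

def series(n: int, data: Iterable[List[float]]) -> Iterator[Tuple[float, float]]:
    """Stand-in: series n occupies columns 2*n and 2*n + 1."""
    for row in data:
        yield row[2 * n], row[2 * n + 1]

# Two rows of the first two series, with a heading line.
sample = io.StringIO(
    "x\ty\tx\ty\n"
    "10.0\t8.04\t10.0\t9.14\n"
    "8.0\t6.95\t8.0\t8.14\n"
)
data = tuple(head_map_filter(row_iter(sample)))
first = list(series(0, data))
second = list(series(1, data))
print(first)   # [(10.0, 8.04), (8.0, 6.95)]
print(second)  # [(10.0, 9.14), (8.0, 8.14)]
```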
The pairs() function is created as a lambda object because it’s a small function with a single parameter that wraps a generator expression. This function builds the desired NamedTuple objects from the sequence of anonymous tuples created by the series() function.
Since the output from the raw_data() function is a mapping, we can do something like the following example to pick a specific series by name:
>>> raw_data()['I']
[Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ...
Given a key, for example, 'I', the series is a list of Pair objects that have the x, y values for each item in the series.
Applying a filter
In this application, we’re using a simple filter. The entire filter process is embodied in the following function:
def anscombe_filter(
        set_id: str, raw_data_map: Dict[str, List[Pair]]
    ) -> List[Pair]:
    return raw_data_map[set_id]
We made this trivial expression into a function for three reasons:
- The functional notation is slightly more consistent and a bit more flexible than the subscript expression
- We can easily expand the filtering to do more
- We can include separate unit tests in the docstring for this function
While a simple lambda would work, it wouldn’t be quite as convenient to test.
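To illustrate the third reason, here’s a sketch of the same function with a test in its docstring. The sample mapping in the docstring is ours, and Pair is redefined so the example stands alone.

```python
import doctest
from typing import Dict, List, NamedTuple

class Pair(NamedTuple):
    x: float
    y: float

def anscombe_filter(
        set_id: str, raw_data_map: Dict[str, List[Pair]]) -> List[Pair]:
    """
    >>> sample = {'I': [Pair(10.0, 8.04)], 'II': [Pair(10.0, 9.14)]}
    >>> anscombe_filter('II', sample)
    [Pair(x=10.0, y=9.14)]
    """
    return raw_data_map[set_id]

# Run the embedded example; nothing is printed when it passes.
doctest.run_docstring_examples(
    anscombe_filter,
    {"Pair": Pair, "anscombe_filter": anscombe_filter})
```

A lambda has no docstring of its own, which is why the def form is more convenient here.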
For error handling, we’ve done exactly nothing. We’ve focused on what’s sometimes called the happy path: an ideal sequence of events. Any problems that arise in this function will raise an exception. The WSGI wrapper function should catch all exceptions and return an appropriate status message and error response content.
For example, it’s possible that the set_id value will be wrong in some way. Rather than obsess over all the ways it could be wrong, we’ll simply allow Python to raise an exception. Indeed, this function follows the Python advice that it’s better to seek forgiveness than to ask permission. This advice is materialized in code by avoiding permission-seeking: there are no preparatory if statements that seek to qualify the arguments as valid. There is only forgiveness handling: an exception will be raised and handled in the WSGI wrapper. This essential advice applies to the preceding raw data function and the serialization that we will see now.
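The contrast between the two styles can be sketched as follows. The function names are hypothetical, and wrapper() stands in for the role played by the WSGI application.

```python
from typing import Dict, List

def lookup_lbyl(set_id: str, mapping: Dict[str, List[float]]) -> List[float]:
    # Permission-seeking: validate first, then act.
    if set_id not in mapping:
        raise KeyError(set_id)
    return mapping[set_id]

def lookup_eafp(set_id: str, mapping: Dict[str, List[float]]) -> List[float]:
    # Forgiveness: just act; a bad key raises KeyError on its own.
    return mapping[set_id]

def wrapper(set_id: str, mapping: Dict[str, List[float]]) -> str:
    # The single place where failures are handled, like the WSGI wrapper.
    try:
        return "200 OK: {0}".format(lookup_eafp(set_id, mapping))
    except Exception as e:  # pylint: disable=broad-except
        return "404 NOT FOUND: {0!r}".format(e)

good = wrapper("I", {"I": [1.0]})
bad = wrapper("V", {"I": [1.0]})
```

Both lookups fail in the same way for a bad key; the permission-seeking version only adds a check that the subscript already performs.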
Serializing the results
Serialization is the conversion of Python data into a stream of bytes, suitable for transmission. Each format is best described by a simple function that serializes just that one format. A top-level generic serializer can then pick from a list of specific serializers. The picking of serializers leads to the following collection of functions:
Serializer = Callable[[str, List[Pair]], bytes]
SERIALIZERS: Dict[str, Tuple[str, Serializer]] = {
    'xml': ('application/xml', serialize_xml),
    'html': ('text/html', serialize_html),
    'json': ('application/json', serialize_json),
    'csv': ('text/csv', serialize_csv),
}

def serialize(
        format: str, title: str, data: List[Pair]
    ) -> Tuple[bytes, str]:
    mime, function = SERIALIZERS.get(
        format.lower(), ('text/html', serialize_html))
    return function(title, data), mime
The overall serialize() function locates a specific serializer in the SERIALIZERS dictionary, which maps a format name to a two-tuple. The tuple has a MIME type that must be used in the response to characterize the results. The tuple also has a function based on the Serializer type hint. This function will transform a name and a list of Pair objects into bytes that will be downloaded.
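The lookup-with-default behavior can be seen in isolation with a pair of stub serializers. Everything in this sketch (the stubs, the format names, and the DEMO_SERIALIZERS mapping) is made up for illustration; only the dispatch technique matches the serialize() function.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple[float, float]]
Serializer = Callable[[str, Rows], bytes]

def as_text(title: str, data: Rows) -> bytes:
    return "{0}: {1}".format(title, data).encode("utf-8")

def as_count(title: str, data: Rows) -> bytes:
    return "{0} has {1} rows".format(title, len(data)).encode("utf-8")

# Stand-in registry with made-up format names.
DEMO_SERIALIZERS: Dict[str, Tuple[str, Serializer]] = {
    "text": ("text/plain", as_text),
    "count": ("text/plain", as_count),
}

def demo_serialize(format: str, title: str, data: Rows) -> Tuple[bytes, str]:
    # An unknown format name falls back to the default two-tuple.
    mime, function = DEMO_SERIALIZERS.get(
        format.lower(), ("text/plain", as_text))
    return function(title, data), mime

known = demo_serialize("COUNT", "I", [(1.0, 2.0)])
fallback = demo_serialize("nope", "I", [])
```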
The serialize() function doesn’t do any data transformation. It merely maps a name to a function that does the hard work of transformation. Returning a function permits the overall application to manage the details of memory or file-system serialization. Serializing to the file system, while slow, permits larger files to be handled.
We’ll look at the individual serializers below. The serializers fall into two groups: those that produce strings and those that produce bytes. A serializer that produces a string will need to have the string encoded as bytes for download. A serializer that produces bytes doesn’t need any further work.
For the serializers that produce strings, we can use function composition with a standardized convert-to-bytes function. Here’s a decorator that can standardize the conversion to bytes:
from typing import Callable, TypeVar, Any, cast
from functools import wraps

def to_bytes(
        function: Callable[..., str]
    ) -> Callable[..., bytes]:
    @wraps(function)
    def decorated(*args, **kw):
        text = function(*args, **kw)
        return text.encode("utf-8")
    return cast(Callable[..., bytes], decorated)
We’ve created a small decorator named @to_bytes. This will evaluate the given function and then encode the results using UTF-8 to get bytes. Note that the decorator changes the decorated function from having a return type of str to a return type of bytes. We haven’t formally declared parameters for the decorated function, and used ... instead of the details. We’ll show how this is used with JSON, CSV, and HTML serializers. The XML serializer produces bytes directly and doesn’t need to be composed with this additional function.
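Here’s a small, self-contained sketch of the decorator at work. The greeting() function is hypothetical, and this version of to_bytes() omits the cast() used previously, since the cast only matters to the type checker.

```python
from functools import wraps
from typing import Callable

def to_bytes(function: Callable[..., str]) -> Callable[..., bytes]:
    @wraps(function)
    def decorated(*args, **kw) -> bytes:
        # Evaluate the wrapped function, then encode its text result.
        return function(*args, **kw).encode("utf-8")
    return decorated

@to_bytes
def greeting(name: str) -> str:
    """Hypothetical string-producing function used only for illustration."""
    return "héllo, {0}".format(name)

encoded = greeting("x")
print(encoded)            # b'h\xc3\xa9llo, x'
print(greeting.__name__)  # greeting
```

Because of @wraps, the decorated function keeps its original name and docstring, so doctest can still find the embedded examples.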
We could also do the functional composition in the initialization of the serializers mapping. Instead of decorating the function definition, we could decorate the reference to the function object. Here’s an alternative definition for the serializer mapping:
SERIALIZERS = {
    'xml': ('application/xml', serialize_xml),
    'html': ('text/html', to_bytes(serialize_html)),
    'json': ('application/json', to_bytes(serialize_json)),
    'csv': ('text/csv', to_bytes(serialize_csv)),
}
This replaces decoration at the site of the function definition with decoration when building this mapping data structure. It seems potentially confusing to defer the decoration.
Serializing data into JSON or CSV formats
The JSON and CSV serializers are similar because both rely on Python’s libraries to serialize. The libraries are inherently imperative, so the function bodies are strict sequences of statements.
Here’s the JSON serializer:
import json

@to_bytes
def serialize_json(series: str, data: List[Pair]) -> str:
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_json( "test", data )
    b'[{"x": 2, "y": 3}, {"x": 5, "y": 7}]'
    """
    obj = [dict(x=r.x, y=r.y) for r in data]
    text = json.dumps(obj, sort_keys=True)
    return text
We created a list-of-dict structure and used the json.dumps() function to create a string representation. The JSON module requires a materialized list object; we can’t provide a lazy generator function. The sort_keys=True argument value is helpful for unit testing. However, it’s not required for the application and represents a bit of overhead.
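The need for a materialized list is easy to confirm. In this sketch, plain tuples stand in for Pair objects.

```python
import json

rows = [(2, 3), (5, 7)]

# A generator expression is not JSON serializable...
try:
    json.dumps(dict(x=x, y=y) for x, y in rows)
    outcome = "serialized"
except TypeError:
    outcome = "TypeError"

# ...so the rows are materialized as a list first.
text = json.dumps([dict(x=x, y=y) for x, y in rows], sort_keys=True)
print(outcome)  # TypeError
print(text)     # [{"x": 2, "y": 3}, {"x": 5, "y": 7}]
```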
Here’s the CSV serializer:
import csv
import io

@to_bytes
def serialize_csv(series: str, data: List[Pair]) -> str:
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_csv("test", data)
    b'x,y\\r\\n2,3\\r\\n5,7\\r\\n'
    """
    buffer = io.StringIO()
    wtr = csv.DictWriter(buffer, Pair._fields)
    wtr.writeheader()
    wtr.writerows(r._asdict() for r in data)
    return buffer.getvalue()
The CSV module’s readers and writers are a mixture of imperative and functional elements. We must create the writer, and properly create headings in a strict sequence. We’ve used the _fields attribute of the Pair namedtuple to determine the column headings for the writer.
The writerows() method of the writer will accept a lazy generator expression. In this case, we used the _asdict() method of each Pair object to return a dictionary suitable for use with the CSV writer.
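The two NamedTuple features used by this serializer can be seen on their own in the following sketch, which redefines Pair so that it stands alone.

```python
from typing import NamedTuple

class Pair(NamedTuple):
    x: float
    y: float

p = Pair(2, 3)

# _fields supplies the column headings for the DictWriter.
print(Pair._fields)  # ('x', 'y')

# _asdict() maps each field name to its value for one row.
row = dict(p._asdict())
print(row)           # {'x': 2, 'y': 3}
```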
Serializing data into XML
We’ll look at one approach to XML serialization using the built-in libraries. This will build a document from individual tags. A common alternative approach is to use Python introspection to examine and map Python objects and class names to XML tags and attributes.
Here’s our XML serialization:
import xml.etree.ElementTree as XML

def serialize_xml(series: str, data: List[Pair]) -> bytes:
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_xml( "test", data )
    b'<series name="test"><row><x>2</x><y>3</y></row><row><x>5</x><y>7</y></row></series>'
    """
    doc = XML.Element("series", name=series)
    for row in data:
        row_xml = XML.SubElement(doc, "row")
        x = XML.SubElement(row_xml, "x")
        x.text = str(row.x)
        y = XML.SubElement(row_xml, "y")
        y.text = str(row.y)
    return cast(bytes, XML.tostring(doc, encoding='utf-8'))
We created a top-level element, <series>, and placed <row> sub-elements underneath that top element. Within each <row> sub-element, we’ve created <x> and <y> tags, and assigned text content to each tag.
The interface for building an XML document using the ElementTree library tends to be heavily imperative. This makes it a poor fit for an otherwise functional design. In addition to the imperative style, note that we haven’t created a DTD or XSD. We have not properly assigned a namespace to our tags. We also omitted the <?xml version="1.0"?> declaration that is generally the first item in an XML document.
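If the declaration is wanted, the tostring() function in Python 3.8 and later accepts an xml_declaration argument. The sketch below compares the two outputs on a one-element document of our own; it assumes Python 3.8 or newer.

```python
import xml.etree.ElementTree as XML

doc = XML.Element("series", name="test")

# With UTF-8 and no other arguments, no declaration is written.
plain = XML.tostring(doc, encoding="utf-8")

# Python 3.8 and later accept xml_declaration=True to request one.
declared = XML.tostring(doc, encoding="utf-8", xml_declaration=True)

print(plain.startswith(b"<?xml"))     # False
print(declared.startswith(b"<?xml"))  # True
```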
The XML.tostring() function has a type hint that states it returns str. This is generally true, but when we provide the encoding parameter, the result type changes to bytes. There’s no easy way to formalize the idea of variant return types based on parameter values, so we use an explicit cast() to inform mypy of the actual type.
A more sophisticated serialization library could be helpful here. There are many to choose from. Visit https://wiki.python.org/moin/PythonXml for a list of alternatives.
Serializing data into HTML
In our final example of serialization, we’ll look at the complexity of creating an HTML document. The complexity arises because in HTML, we’re expected to provide an entire web page with a great deal of context information. Here’s one way to tackle this HTML problem:
import string
data_page = string.Template("""\
<html>
<head><title>Series ${title}</title></head>
<body>
<h1>Series ${title}</h1>
<table>
<thead><tr><td>x</td><td>y</td></tr></thead>
<tbody>
${rows}
</tbody>
</table>
</body>
</html>
""")

@to_bytes
def serialize_html(series: str, data: List[Pair]) -> str:
    """
    >>> data = [Pair(2,3), Pair(5,7)]
    >>> serialize_html("test", data) #doctest: +ELLIPSIS
    b'<html>...<tr><td>2</td><td>3</td></tr>\\n<tr><td>5</td><td>7</td></tr>...
    """
    text = data_page.substitute(
        title=series,
        rows="\n".join(
            "<tr><td>{0.x}</td><td>{0.y}</td></tr>".format(row)
            for row in data)
    )
    return text
Our serialization function has two parts. The first part is a string.Template object that contains the essential HTML page. It has two placeholders where data can be inserted into the template. The ${title} placeholder shows where title information can be inserted, and the ${rows} placeholder shows where the data rows can be inserted.
The function creates individual data rows using a simple format string. These are joined into a longer string, which is then substituted into the template.
While workable for simple cases like the preceding example, this isn’t ideal for more complex result sets. There are a number of more sophisticated template tools to create HTML pages. A number of these include the ability to embed the looping in the template, separate from the function that initializes serialization.
If you found this tutorial useful and would like to learn more such techniques, head over to get Steven Lott’s bestseller, Functional Python Programming.