Some functional programming languages offer clever approaches to the problem of working with statically typed function definitions. The problem is that many functions we’d like to write are entirely generic with respect to data type. For example, most of our statistical functions are identical for `int` or `float` numbers, as long as the division returns a value that is a subclass of `numbers.Real` (for example, `Decimal`, `Fraction`, or `float`).
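As a minimal sketch of this idea (the `mean()` function here is illustrative, not from the text), a single generic definition serves several numeric types, because `/` simply returns whatever the operand types produce:

```python
from fractions import Fraction

def mean(data):
    """Generic mean: works for any sequence of numbers.Real values."""
    return sum(data) / len(data)

# One definition, several numeric types.
print(mean([1, 2, 3, 4]))                      # ints -> 2.5
print(mean([0.5, 1.5, 2.5]))                   # floats -> 1.5
print(mean([Fraction(1, 3), Fraction(2, 3)]))  # Fractions -> Fraction(1, 2)
```

Note that the `Fraction` inputs yield an exact `Fraction` result; the function never commits to a concrete numeric type.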

In many functional languages, sophisticated type or type-pattern matching rules are used by the compiler to make a single generic definition work for multiple data types. Python doesn’t have this problem and doesn’t need pattern matching at compile time. In this article, we’ll see how to achieve polymorphism and type-pattern matching in Python.

This Python tutorial is an extract taken from the 2nd edition of the bestseller, Functional Python Programming, authored by Steven Lott.

Instead of the (possibly) complex features of statically typed functional languages, Python changes the approach dramatically. Python uses dynamic selection of the final implementation of an operator based on the data types being used. In Python, we always write generic definitions. The code isn’t bound to any specific data type. The Python runtime will locate the appropriate operations based on the types of the actual objects in use. The *3.3.7 Coercion rules* section of the language reference manual and the `numbers` module in the library provide details on how this mapping from operation to special method name works.
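As a quick illustration of this dynamic dispatch, the same `+` operator resolves to a different special method depending on the operand types at runtime:

```python
from fractions import Fraction

# The same operation dispatches to a different __add__() at runtime.
print((2).__add__(3))                   # int.__add__ -> 5
print((0.5).__add__(0.25))              # float.__add__ -> 0.75
print(Fraction(1, 2) + Fraction(1, 3))  # Fraction.__add__ -> Fraction(5, 6)
```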

This means that the compiler doesn’t certify that our functions are expecting and producing the proper data types. We generally rely on unit testing and the **mypy** tool for this kind of type checking.

In rare cases, we might need to have different behavior based on the types of data elements. We have two ways to tackle this:

- We can use the `isinstance()` function to distinguish the different cases
- We can create our own subclass of `numbers.Number` or `NamedTuple` and implement proper polymorphic special method names

In some cases, we’ll actually need to do both so that we can include appropriate data type conversions for each operation. Additionally, we’ll also need to use the `cast()` function to make the types explicit to the **mypy** tool.
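Here is a minimal sketch of both techniques together (the `as_float_tuple()` helper is hypothetical, not from the original text): an `isinstance()` check selects the conversion at runtime, and a `cast()` tells **mypy** the type of an otherwise ambiguous empty tuple:

```python
from typing import Any, Tuple, Union, cast

def as_float_tuple(values: Union[Tuple[float, ...], Any]) -> Tuple[float, ...]:
    """Pass tuples through unchanged; convert any other iterable of numbers."""
    if isinstance(values, tuple):
        # Runtime check selects this branch; cast() makes the type explicit.
        return cast(Tuple[float, ...], values)
    # The empty tuple's type can't be inferred; cast() tells mypy what it is.
    empty: Tuple[float, ...] = cast(Tuple[float, ...], ())
    return empty + tuple(float(v) for v in values)

print(as_float_tuple((1.5, 2.5)))  # passed through: (1.5, 2.5)
print(as_float_tuple([1, 2]))      # converted: (1.0, 2.0)
```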

The ranking example in the previous section is tightly bound to the idea of applying rank-ordering to simple pairs. While this is the way the Spearman correlation is defined, a multivariate dataset may require rank-order correlation among all of its variables.

The first thing we’ll need to do is generalize our idea of rank-order information. The following is a `NamedTuple` value that handles a `tuple` of ranks and a `raw` data object:

```
from typing import NamedTuple, Tuple, Any

class Rank_Data(NamedTuple):
    rank_seq: Tuple[float]
    raw: Any
```

A typical use of this kind of class definition is shown in this example:

```
>>> data = {'key1': 1, 'key2': 2}
>>> r = Rank_Data((2, 7), data)
>>> r.rank_seq[0]
2
>>> r.raw
{'key1': 1, 'key2': 2}
```

The row of raw data in this example is a dictionary. There are two rankings for this particular item in the overall list. An application can get the sequence of rankings as well as the original raw data item.

We’ll add some syntactic sugar to our ranking function. In many previous examples, we’ve required either an iterable or a concrete collection. The `for` statement is graceful about working with either one. However, we don’t always use the `for` statement, and for some functions, we’ve had to explicitly use `iter()` to make an `iterable` out of a collection. We can handle this situation with a simple `isinstance()` check, as shown in the following code snippet:

```
def some_function(seq_or_iter: Union[Sequence, Iterator]) -> Iterator:
    if isinstance(seq_or_iter, Sequence):
        yield from some_function(iter(seq_or_iter))
        return
    # Do the real work of the function using the Iterator
```

This example includes a type check to handle the small difference between a `Sequence` object and an `Iterator`. Specifically, the function uses `iter()` to create an `Iterator` from a `Sequence`, and calls itself recursively with the derived value.
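A small, self-contained sketch of this pattern (the `doubled()` function is hypothetical) shows how one generator can accept either a concrete collection or a bare iterator:

```python
from typing import Iterator, Sequence, Union

def doubled(seq_or_iter: Union[Sequence[int], Iterator[int]]) -> Iterator[int]:
    # Normalize: a Sequence becomes an Iterator via iter(), then recurse.
    if isinstance(seq_or_iter, Sequence):
        yield from doubled(iter(seq_or_iter))
        return
    # The real work assumes an Iterator.
    for item in seq_or_iter:
        yield item * 2

print(list(doubled([1, 2, 3])))        # a concrete list works: [2, 4, 6]
print(list(doubled(iter([1, 2, 3]))))  # so does a bare iterator: [2, 4, 6]
```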

For rank-ordering, we’ll support `Union[Sequence, Iterator]`. Because the source data must be sorted for ranking, it’s easier to use `list()` to transform a given iterator into a concrete sequence. The essential `isinstance()` check will be used, but instead of creating an iterator from a sequence (as shown previously), the following examples will create a sequence object from an iterator.

In the context of our rank-ordering function, we can make the function somewhat more generic. The following two expressions define the inputs:

```
Source = Union[Rank_Data, Any]
Union[Sequence[Source], Iterator[Source]]
```

There are four combinations defined by these two types:

- `Sequence[Rank_Data]`
- `Sequence[Any]`
- `Iterator[Rank_Data]`
- `Iterator[Any]`

## Handling the four combinations of data types

Here’s the `rank_data()` function with three cases for handling the four combinations of data types:

```
from typing import (
    Any, Callable, Iterable, Iterator, List,
    Sequence, Tuple, TypeVar, Union, cast
)

K_ = TypeVar("K_")  # Some comparable key type used for ranking.
Source = Union[Rank_Data, Any]

def rank_data(
        seq_or_iter: Union[Sequence[Source], Iterator[Source]],
        key: Callable[[Rank_Data], K_] = lambda obj: cast(K_, obj)
    ) -> Iterable[Rank_Data]:
    if isinstance(seq_or_iter, Iterator):
        # Iterator? Materialize a sequence object.
        yield from rank_data(list(seq_or_iter), key)
        return

    data: Sequence[Rank_Data]
    if isinstance(seq_or_iter[0], Rank_Data):
        # A collection of Rank_Data is what we prefer.
        data = seq_or_iter
    else:
        # Convert to Rank_Data and process.
        empty_ranks: Tuple[float] = cast(Tuple[float], ())
        data = list(
            Rank_Data(empty_ranks, raw_data)
            for raw_data in cast(Sequence[Source], seq_or_iter)
        )

    for r, rd in rerank(data, key):
        new_ranks = cast(
            Tuple[float],
            rd.rank_seq + cast(Tuple[float], (r,)))
        yield Rank_Data(new_ranks, rd.raw)
```

We’ve decomposed the ranking into three cases to cover the four different types of data. The following are the cases defined by the union of unions:

- Given an `Iterator` (an object without a usable `__getitem__()` method), we’ll materialize a `list` object to work with. This will work for `Rank_Data` as well as any other raw data type. This case covers objects which are `Iterator[Rank_Data]` as well as `Iterator[Any]`.
- Given a `Sequence[Any]`, we’ll wrap the unknown objects into `Rank_Data` tuples with an empty collection of rankings to create a `Sequence[Rank_Data]`.
- Finally, given a `Sequence[Rank_Data]`, add yet another ranking to the tuple of ranks inside each `Rank_Data` container.

The first case calls `rank_data()` recursively. The other two cases both rely on a `rerank()` function that builds a new `Rank_Data` tuple with additional ranking values. This contains several rankings for a complex record of raw data values.

Note that a relatively complex `cast()` expression is required to disambiguate the use of generic tuples for the rankings. The **mypy** tool offers a `reveal_type()` function that can be incorporated to debug the inferred types.

The `rerank()` function follows a slightly different design from the `rank()` function shown previously. It yields two-tuples with the rank and the original data object:

```
def rerank(
        rank_data_iter: Iterable[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    sorted_iter = iter(
        sorted(
            rank_data_iter, key=lambda obj: key(obj.raw)
        )
    )
    # Apply ranker to the head and the remaining sorted items.
    head = next(sorted_iter)
    yield from ranker(sorted_iter, 0, [head], key)
```

The idea behind `rerank()` is to sort a collection of `Rank_Data` objects. The first item, `head`, is used to provide a seed value to the `ranker()` function. The `ranker()` function can examine the remaining items in the iterable to see if they match this initial value; this allows computing a proper rank for a batch of matching items.

The `ranker()` function accepts a sorted iterable of data, a base rank number, and an initial collection of items of the minimum rank. The result is an iterable sequence of two-tuples with a rank number and an associated `Rank_Data` object:

```
def ranker(
        sorted_iter: Iterator[Rank_Data],
        base: float,
        same_rank_seq: List[Rank_Data],
        key: Callable[[Rank_Data], K_]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    try:
        value = next(sorted_iter)
    except StopIteration:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        return
    if key(value.raw) == key(same_rank_seq[0].raw):
        yield from ranker(
            sorted_iter, base, same_rank_seq+[value], key)
    else:
        dups = len(same_rank_seq)
        yield from yield_sequence(
            (base+1+base+dups)/2, iter(same_rank_seq))
        yield from ranker(
            sorted_iter, base+dups, [value], key)
```

This starts by attempting to extract the next item from the `sorted_iter` collection of sorted `Rank_Data` items. If this fails with a `StopIteration` exception, there is no next item and the source is exhausted. The final output is the final batch of equal-valued items in the `same_rank_seq` sequence.

If the sequence has a next item, the `key()` function extracts the key value. If this new value matches the keys in the `same_rank_seq` collection, it is accumulated into the current batch of same-valued keys. The final result is based on the rest of the items in `sorted_iter`, the current rank base, a larger batch of same-rank items that now includes the new `value`, and the original `key()` function.

If the next item’s key doesn’t match the current batch of equal-valued items, the final result has two parts. The first part is the batch of equal-valued items accumulated in `same_rank_seq`. This is followed by the reranking of the remainder of the sorted items. The base value for these is incremented by the number of equal-valued items, a fresh batch of equal-rank items is initialized with the distinct key, and the original `key()` extraction function is provided.
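The shared rank for a batch of ties is the average of the positions those items occupy, which is what the `(base+1+base+dups)/2` expression computes. A small illustrative helper (`midpoint_rank()` is not part of the original code) makes the arithmetic concrete:

```python
def midpoint_rank(base: float, dups: int) -> float:
    """Average of ranks base+1 .. base+dups: the rank shared by tied items."""
    return (base + 1 + base + dups) / 2

# A single item after rank 0 gets rank 1.0.
print(midpoint_rank(0, 1))  # 1.0
# Two tied items occupying ranks 2 and 3 share the midpoint 2.5.
print(midpoint_rank(1, 2))  # 2.5
```

This matches the output shown later, where the two equal `1.2` values both receive rank `2.5`.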

The output from `ranker()` depends on the `yield_sequence()` function, which looks as follows:

```
def yield_sequence(
        rank: float,
        same_rank_iter: Iterator[Rank_Data]
    ) -> Iterator[Tuple[float, Rank_Data]]:
    try:
        head = next(same_rank_iter)
    except StopIteration:
        # PEP 479: a bare StopIteration inside a generator
        # would otherwise become a RuntimeError.
        return
    yield rank, head
    yield from yield_sequence(rank, same_rank_iter)
```

We’ve written this in a way that emphasizes the recursive definition. For any practical work, this should be optimized into a single `for` statement.
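A minimal sketch of that optimization (the name `yield_sequence_iterative()` is ours, not from the text) replaces the recursion with a single `for` statement and avoids Python's recursion-depth limit:

```python
from typing import Iterator, Tuple, TypeVar

T = TypeVar("T")

def yield_sequence_iterative(
        rank: float,
        same_rank_iter: Iterator[T]
    ) -> Iterator[Tuple[float, T]]:
    # Yields the same (rank, item) pairs as the recursive version.
    for item in same_rank_iter:
        yield rank, item

print(list(yield_sequence_iterative(2.5, iter(["a", "b"]))))
# [(2.5, 'a'), (2.5, 'b')]
```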

The following are some examples of using this function to rank (and rerank) data. We’ll start with a simple collection of scalar values:

```
>>> scalars = [0.8, 1.2, 1.2, 2.3, 18]
>>> list(rank_data(scalars))
[Rank_Data(rank_seq=(1.0,), raw=0.8),
 Rank_Data(rank_seq=(2.5,), raw=1.2),
 Rank_Data(rank_seq=(2.5,), raw=1.2),
 Rank_Data(rank_seq=(4.0,), raw=2.3),
 Rank_Data(rank_seq=(5.0,), raw=18)]
```

Each value becomes the `raw` attribute of a `Rank_Data` object.

When we work with a slightly more complex object, we can also have multiple rankings. The following is a sequence of two tuples:

```
>>> pairs = ((2, 0.8), (3, 1.2), (5, 1.2), (7, 2.3), (11, 18))
>>> rank_x = list(rank_data(pairs, key=lambda x: x[0]))
>>> rank_x
[Rank_Data(rank_seq=(1.0,), raw=(2, 0.8)),
 Rank_Data(rank_seq=(2.0,), raw=(3, 1.2)),
 Rank_Data(rank_seq=(3.0,), raw=(5, 1.2)),
 Rank_Data(rank_seq=(4.0,), raw=(7, 2.3)),
 Rank_Data(rank_seq=(5.0,), raw=(11, 18))]
```

```
>>> rank_xy = list(rank_data(rank_x, key=lambda x:x[1] ))
>>> rank_xy
[Rank_Data(rank_seq=(1.0, 1.0), raw=(2, 0.8)),
Rank_Data(rank_seq=(2.0, 2.5), raw=(3, 1.2)),
Rank_Data(rank_seq=(3.0, 2.5), raw=(5, 1.2)),
Rank_Data(rank_seq=(4.0, 4.0), raw=(7, 2.3)),
Rank_Data(rank_seq=(5.0, 5.0), raw=(11, 18))]
```

Here, we defined a collection of pairs. We then ranked the two-tuples by their first member, assigning the sequence of `Rank_Data` objects to the `rank_x` variable. We then ranked this collection of `Rank_Data` objects by the second member, creating a second rank value and assigning the result to the `rank_xy` variable.

The resulting sequence can be used for a slightly modified `rank_corr()` function to compute the rank correlations of any of the available values in the `rank_seq` attribute of the `Rank_Data` objects. We’ll leave this modification as an exercise for you.

If you found this tutorial useful and would like to learn more such techniques, head over to get Steven Lott’s bestseller, Functional Python Programming.

## Read Next:

*Why functional programming in Python matters: Interview with best selling author, Steven Lott*