
In this article by Silas Toms, author of the book ArcPy and ArcGIS – Geospatial Analysis with Python, we will see how programming languages share a concept that has aided programmers for decades: functions. The idea of a function, loosely speaking, is to create blocks of code that perform an action on a piece of data, transforming it as required by the programmer and returning the transformed data back to the main body of the code.

Functions are used because they solve many different needs within programming. Functions reduce the need to write repetitive code, which in turn reduces the time needed to create a script. They can be used to create ranges of numbers (the range() function), to determine the maximum value of a list (the max() function), or to create a SQL statement to select a set of rows from a feature class. They can even be copied and used in another script or included as part of a module that can be imported into scripts. Function reuse has the added bonus of making programming more useful and less of a chore. When a scripter starts writing functions, it is a major step towards making programming part of a GIS workflow.


Technical definition of functions

Functions, also called subroutines or procedures in other programming languages, are blocks of code that have been designed either to accept input data and transform it, or to provide data to the main program when called without any input required. In theory, a function will only transform data that has been passed to it as a parameter; it should not change any other part of the script that has not been passed to the function. To make this possible, the concept of namespaces is invoked.

Namespaces make it possible for a variable name used within a function to represent one value, while the same variable name elsewhere in the script represents another. This becomes especially important when importing modules written by other programmers; the variables inside that module and its functions might have names identical to variable names within the main script.
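As a minimal illustration of this idea (the function and variable names here are invented for demonstration and are not part of the book's script), the name distance below refers to two different values; the one inside the function lives in the function's local namespace and does not overwrite the one in the main body of the script:

distance = 400   # a variable in the main body of the script

def convertToMeters(distance):
    'convert a distance in feet to meters; this distance is local to the function'
    return distance * 0.3048

print convertToMeters(1000)   # prints 304.8
print distance                # prints 400; the main script's variable is untouched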

In a high-level programming language such as Python, there is built-in support for functions, including the ability to define function names and the data inputs (also known as parameters). Functions are created using the keyword def plus a function name, along with parentheses that may or may not contain parameters. Parameters can also be defined with default values, so parameters only need to be passed to the function when they differ from the default. The values that are returned from the function are also easily defined.

A first function

Let’s create a function to get a feel for what is possible when writing functions. First, we define the function using the def keyword, a name, and parentheses. The firstFunction() will return a string when called:

def firstFunction():
    'a simple function returning a string'
    return "My First Function"
>>> firstFunction()

The output is as follows:

'My First Function'

Notice that this function has a documentation string, or docstring (a simple function returning a string), that describes what the function does; this string can be accessed later to find out what the function does, using the __doc__ attribute:

>>> print firstFunction.__doc__

The output is as follows:

'a simple function returning a string' 

The function is defined and given a name, and then the parentheses are added, followed by a colon. The following lines must then be indented (a good IDE will add the indentation automatically). The function does not have any parameters, so the parentheses are empty. The function then uses the keyword return to return a value, in this case a string, from the function.

Next, the function is called by adding parentheses to the function name. When it is called, it will do what it has been instructed to do: return the string.

Functions with parameters

Now let’s create a function that accepts parameters and transforms them as needed. This function will accept a number and multiply it by 3:

def secondFunction(number):
    'this function multiplies numbers by 3'
    return number * 3
>>> secondFunction(4)

The output is as follows:

12

The function has one flaw, however: there is no assurance that the value passed to the function is a number. We need to add a conditional to the function to make sure it only multiplies numbers and does not throw an exception:

def secondFunction(number):
    'this function multiplies numbers by 3'
    if type(number) == type(1) or type(number) == type(1.0):
        return number * 3
>>> secondFunction(4.0)

The output is as follows:

12.0
>>> secondFunction(4)

The output is as follows:

12
>>> secondFunction("String")
>>> 

The function now accepts a parameter, checks what type of data it is, and returns the parameter multiplied by 3 if it is an integer or a float. If it is a string or some other data type, as shown in the last example, no value is returned.
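As an aside, an equivalent and slightly more idiomatic way to write this check is the built-in isinstance() function, which tests a value against one or more types in a single call. Here is a minimal sketch; the name secondFunctionAlt is invented for comparison and is not used in the rest of the article:

def secondFunctionAlt(number):
    'the same multiply-by-3 behavior, with the type check written using isinstance()'
    if isinstance(number, (int, float)):
        return number * 3

Either version returns a value only when a number is passed in.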

There is one more adjustment to the simple function that we should discuss: parameter defaults. By including default values in the definition of the function, we avoid having to provide parameters that rarely change. If, for instance, we wanted a different multiplier than 3 in the simple function, we would define it like this:

def thirdFunction(number, multiplier=3):
    'this function multiplies a number by the multiplier (default 3)'
    if type(number) == type(1) or type(number) == type(1.0):
        return number * multiplier
>>> thirdFunction(4)

The output is as follows:

12
>>> thirdFunction(4, 5)

The output is as follows:

20

The function will work when only the number to be multiplied is supplied, as the multiplier has a default value of 3. However, if we need another multiplier, the value can be adjusted by adding another value when calling the function. Note that the second value doesn’t have to be a number, as there is no type checking on it. Also, in the function definition, parameters with default values must follow the parameters without defaults (alternatively, all parameters can have a default value, and the arguments can be supplied to the function either in order or by name).
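For example, using the thirdFunction() defined above, the parameters can be supplied by name in any order (a quick illustrative call, not from the original text):

>>> thirdFunction(number=4, multiplier=5)

The output is as follows:

20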

Using functions to replace repetitive code

One of the main uses of functions is to ensure that the same code does not have to be written over and over.

The first portion of the script that we could convert into a function is the three ArcPy functions. Doing so will allow the script to be applicable to any of the stops in the Bus Stop feature class and have an adjustable buffer distance:

bufferDist = 400
buffDistUnit = "Feet"
lineName = '71 IB'
busSignage = 'Ferry Plaza'
sqlStatement = "NAME = '{0}' AND BUS_SIGNAG = '{1}'"

def selectBufferIntersect(selectIn, selectOut, bufferOut,
                          intersectIn, intersectOut, sqlStatement,
                          bufferDist, buffDistUnit, lineName, busSignage):
    'a function to perform a bus stop analysis'
    arcpy.Select_analysis(selectIn, selectOut,
                          sqlStatement.format(lineName, busSignage))
    arcpy.Buffer_analysis(selectOut, bufferOut,
                          "{0} {1}".format(bufferDist, buffDistUnit),
                          "FULL", "ROUND", "NONE", "")
    arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn),
                             intersectOut, "ALL", "", "INPUT")
    return intersectOut

This function demonstrates how the analysis can be adjusted to accept the input and output feature class variables as parameters, along with some new variables.

The function adds a parameter to replace the hardcoded SQL statement, parameters to adjust which bus stop line is selected, and a tweaked buffer distance statement so that both the distance and the unit can be adjusted. The feature class name variables defined earlier in the script have all been replaced with local variable names; the global variable names could have been retained, but doing so would reduce the portability of the function.

The next function will accept the result of the selectBufferIntersect() function and search it using the Search Cursor, passing the results into a dictionary. The dictionary will then be returned from the function for later use:

def createResultDic(resultFC):
    'search results of analysis and create results dictionary'
    dataDictionary = {}
    with arcpy.da.SearchCursor(resultFC, ["STOPID", "POP10"]) as cursor:
        for row in cursor:
            busStopID = row[0]
            pop10 = row[1]
            if busStopID not in dataDictionary.keys():
                dataDictionary[busStopID] = [pop10]
            else:
                dataDictionary[busStopID].append(pop10)
    return dataDictionary

This function only requires one parameter: the feature class returned from the selectBufferIntersect() function. The dictionary that will hold the results is created first and then populated by the search cursor, with the busStopID attribute used as a key and the census block population attribute appended to a list assigned to that key.
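To make the structure of the returned dictionary concrete, it might look something like this for a few stops (the stop IDs and population values below are invented purely for illustration):

{1122: [345, 107, 598], 1123: [678, 212], 1124: [904]}

Each key is a bus stop ID, and each value is the list of POP10 values from the census blocks that intersect that stop's buffer.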

The dictionary, having been populated with the organized data, is returned from the function for use in the final function, createCSV(). This function accepts the dictionary and the name of the output CSV file as a string:

def createCSV(dictionary, csvname):
    'a function that takes a dictionary and creates a CSV file'
    with open(csvname, 'wb') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',')
        for busStopID in dictionary.keys():
            popList = dictionary[busStopID]
            averagePop = sum(popList)/len(popList)
            data = [busStopID, averagePop]
            csvwriter.writerow(data)

The final function creates the CSV using the csv module. The name of the file, a string, is now a customizable parameter (meaning the output can be any valid file path ending in the extension .csv). The csvfile object is passed to the csv module’s writer() method and assigned to the variable csvwriter; the dictionary is then iterated over, and each bus stop ID and its average population are passed as a list to csvwriter.writerow() to be written as a row of the CSV file. Open the finished CSV file with Excel or a text editor such as Notepad.
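One caveat worth noting: in Python 2, dividing one integer by another performs integer division, so if the POP10 values come back from the cursor as integers, the computed average will be silently truncated. If a precise average is wanted (an adjustment not made in the original script), one operand can be cast to a float:

            averagePop = sum(popList) / float(len(popList))

The 'wb' mode passed to open() is also specific to the Python 2 csv module; in Python 3 the file would instead be opened with open(csvname, 'w', newline='').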

To run the functions, we will call them in the script following the function definitions:

analysisResult = selectBufferIntersect(Bus_Stops, Inbound71,
                                       Inbound71_400ft_buffer, CensusBlocks2010,
                                       Intersect71Census, sqlStatement,
                                       bufferDist, buffDistUnit,
                                       lineName, busSignage)
dictionary = createResultDic(analysisResult)
createCSV(dictionary, r'C:\Projects\Output\Averages.csv')

Now, the script has been divided into three functions, which replace the code of the first modified script. The modified script looks like this:

# -*- coding: utf-8 -*-
# ---------------------------------------------------------------------------
# 8662_Chapter4Modified1.py
# Created on: 2014-04-22 21:59:31.00000
#   (generated by ArcGIS/ModelBuilder)
# Description: 
# Adjusted by Silas Toms
# 2014 05 05
# ---------------------------------------------------------------------------
 
# Import arcpy module
import arcpy
import csv
 
# Local variables:
Bus_Stops = r"C:\Projects\PacktDB.gdb\SanFrancisco\Bus_Stops"
CensusBlocks2010 = r"C:\Projects\PacktDB.gdb\SanFrancisco\CensusBlocks2010"
Inbound71 = r"C:\Projects\PacktDB.gdb\Chapter3Results\Inbound71"
Inbound71_400ft_buffer = r"C:\Projects\PacktDB.gdb\Chapter3Results\Inbound71_400ft_buffer"
Intersect71Census = r"C:\Projects\PacktDB.gdb\Chapter3Results\Intersect71Census"
bufferDist = 400
lineName = '71 IB'
busSignage = 'Ferry Plaza'
def selectBufferIntersect(selectIn, selectOut, bufferOut, intersectIn,
                          intersectOut, bufferDist, lineName, busSignage):
    arcpy.Select_analysis(selectIn,
                          selectOut,
                          "NAME = '{0}' AND BUS_SIGNAG = '{1}'".format(lineName, busSignage))
    arcpy.Buffer_analysis(selectOut,
                          bufferOut,
                          "{0} Feet".format(bufferDist),
                          "FULL", "ROUND", "NONE", "")
    arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn),
                             intersectOut, "ALL", "", "INPUT")
    return intersectOut

def createResultDic(resultFC):
    dataDictionary = {}
    with arcpy.da.SearchCursor(resultFC,
                               ["STOPID", "POP10"]) as cursor:
        for row in cursor:
            busStopID = row[0]
            pop10 = row[1]
            if busStopID not in dataDictionary.keys():
                dataDictionary[busStopID] = [pop10]
            else:
                dataDictionary[busStopID].append(pop10)
    return dataDictionary

def createCSV(dictionary, csvname):
    with open(csvname, 'wb') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',')
        for busStopID in dictionary.keys():
            popList = dictionary[busStopID]
            averagePop = sum(popList)/len(popList)
            data = [busStopID, averagePop]
            csvwriter.writerow(data)

analysisResult = selectBufferIntersect(Bus_Stops, Inbound71,
                                       Inbound71_400ft_buffer, CensusBlocks2010,
                                       Intersect71Census, bufferDist,
                                       lineName, busSignage)
dictionary = createResultDic(analysisResult)
createCSV(dictionary, r'C:\Projects\Output\Averages.csv')
print "Data Analysis Complete"

Further generalization of the functions

While we have created functions from the original script that can be used to extract more data about bus stops in San Francisco, our new functions are still very specific to the dataset and analysis for which they were created. That can be perfectly acceptable for a long and laborious analysis that will not need to be repeated, because the first use of functions is simply to remove the need to repeat code. The next goal is to make that code reusable. Let’s discuss some ways in which we can convert the functions from one-offs into reusable functions or even modules.
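As a quick preview of the module idea, the finished functions could be saved into their own file and imported wherever they are needed. The file name useful_functions.py below is a hypothetical example used only to show the pattern:

# useful_functions.py would contain createResultDic, createCSV, and the other functions
from useful_functions import createResultDic, createCSV

dictionary = createResultDic(Intersect71Census)
createCSV(dictionary, r'C:\Projects\Output\Averages.csv')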

First, let’s examine the first function:

def selectBufferIntersect(selectIn, selectOut, bufferOut, intersectIn,
                          intersectOut, bufferDist, lineName, busSignage):
    arcpy.Select_analysis(selectIn,
                          selectOut,
                          "NAME = '{0}' AND BUS_SIGNAG = '{1}'".format(lineName, busSignage))
    arcpy.Buffer_analysis(selectOut,
                          bufferOut,
                          "{0} Feet".format(bufferDist),
                          "FULL", "ROUND", "NONE", "")
    arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn),
                             intersectOut, "ALL", "", "INPUT")
    return intersectOut

This function appears to be pretty specific to the bus stop analysis. It’s so specific, in fact, that while there are a few ways in which we could tweak it to make it more general (that is, useful in other scripts that might not involve the same steps), we should not attempt to. Generalizing the whole analysis as a single function would introduce too many variables into the script in an effort to simplify it, which is counterproductive. Instead, let’s focus on ways to generalize the ArcPy tools themselves.

The first step will be to split the three ArcPy tools and examine what can be adjusted with each of them. The Select tool should be adjusted to accept a string as the SQL select statement. The SQL statement can then be generated by another function or by parameters accepted at runtime.

For instance, if we wanted to make the script accept multiple bus stops for each run of the script (for example, the inbound and outbound stops for each line), we could create a function that would accept a list of the desired stops and a SQL template, and would return a SQL statement to plug into the Select tool. Here is an example of how it would look:

def formatSQLIN(dataList, sqlTemplate):
    'a function to generate a SQL IN statement'
    sql = sqlTemplate  # for example, "OBJECTID IN "
    step = "("
    for data in dataList:
        step += str(data) + ","
    sql += step.rstrip(",") + ")"
    return sql
 
def formatSQL(dataList, sqlTemplate):
    'a function to generate a SQL statement'
    sql = ''
    for count, data in enumerate(dataList):
        if count != len(dataList)-1:
            sql += sqlTemplate.format(data) + ' OR '
        else:
            sql += sqlTemplate.format(data)
    return sql
 
>>> dataVals = [1,2,3,4]
>>> sqlOID = "OBJECTID = {0}"
>>> sql = formatSQL(dataVals, sqlOID)
>>> print sql

The output is as follows:

OBJECTID = 1 OR OBJECTID = 2 OR OBJECTID = 3 OR OBJECTID = 4

This new function, formatSQL(), is a very useful function. Let’s review what it does by comparing the function to the results that follow it. The function is defined to accept two parameters: a list of values and a SQL template. The first local variable is the empty string sql, which will be built up using string addition. The function inserts each value into the SQL template using string formatting and adds the resulting clause to the sql string (note that sql += is equivalent to sql = sql +). An operator (OR) is also appended so that the final SQL statement matches every data row that fits the pattern. The function uses the built-in enumerate() function to count the iterations of the list; once it has reached the last value in the list, the operator is not added to the SQL statement.
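The same result can also be produced a little more compactly with the string join() method, which places the operator between the formatted clauses and removes the need to track the last element. This is an optional alternative, not the version used in the book, and the name formatSQLJoin is invented here for comparison:

def formatSQLJoin(dataList, sqlTemplate):
    'generate a SQL statement using str.join() instead of enumerate()'
    return ' OR '.join(sqlTemplate.format(data) for data in dataList)

Calling formatSQLJoin(dataVals, sqlOID) returns the same string as formatSQL() above; the explicit loop in formatSQL() is arguably easier to follow when first learning functions.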

Note that we could also add one more parameter to the function to make it possible to use an AND operator instead of OR, while still keeping OR as the default:

def formatSQL2(dataList, sqlTemplate, operator=" OR "):
    'a function to generate a SQL statement'
    sql = ''
    for count, data in enumerate(dataList):
        if count != len(dataList)-1:
            sql += sqlTemplate.format(data) + operator
        else:
            sql += sqlTemplate.format(data)
    return sql
 
>>> sql = formatSQL2(dataVals, sqlOID," AND ")
>>> print sql

The output is as follows:

OBJECTID = 1 AND OBJECTID = 2 AND OBJECTID = 3 AND OBJECTID = 4

While it would make no sense to use an AND operator on ObjectIDs, there are other cases where it would make sense, hence leaving OR as the default while allowing for AND. Either way, this function can now be used to generate our bus stop SQL statement for multiple stops (ignoring, for now, the bus signage field):

>>> sqlTemplate = "NAME = '{0}'"
>>> lineNames = ['71 IB','71 OB']
>>> sql = formatSQL2(lineNames, sqlTemplate)
>>> print sql

The output is as follows:

NAME = '71 IB' OR NAME = '71 OB'

However, we can’t ignore the Bus Signage field for the inbound line, as there are two starting points for the line, so we will need to adjust the function to accept multiple values:

def formatSQLMultiple(dataList, sqlTemplate, operator=" OR "):
    'a function to generate a SQL statement'
    sql = ''
    for count, data in enumerate(dataList):
        if count != len(dataList)-1:
            sql += sqlTemplate.format(*data) + operator
        else:
            sql += sqlTemplate.format(*data)
    return sql
 
>>> sqlTemplate = "(NAME = '{0}' AND BUS_SIGNAG = '{1}')"
>>> lineNames = [('71 IB', 'Ferry Plaza'),('71 OB','48th Avenue')]
>>> sql = formatSQLMultiple(lineNames, sqlTemplate)
>>> print sql

The output is as follows:

(NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza') OR (NAME = '71 OB' AND BUS_SIGNAG = '48th Avenue')

The slight difference in this function is the asterisk before the data variable, which allows the values inside each tuple to be correctly formatted into the SQL template by unpacking (exploding) the values within the tuple. Notice that the SQL template has been created to segregate each conditional with parentheses. The functions are now ready for reuse, and the SQL statement is now ready for insertion into the Select tool:

sql = formatSQLMultiple(lineNames, sqlTemplate)
arcpy.Select_analysis(Bus_Stops, Inbound71, sql)
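To see the tuple unpacking used by formatSQLMultiple() on its own, here is a quick interactive illustration (the values are taken from the lineNames list above):

>>> "{0} {1}".format(*('71 IB', 'Ferry Plaza'))

The output is as follows:

'71 IB Ferry Plaza'

The asterisk explodes the tuple so that its two values fill the {0} and {1} placeholders, which is exactly what happens to each tuple inside formatSQLMultiple().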

Next up is the Buffer tool. We have already taken steps towards making it more general by adding a variable for the distance. In this case, we will only add one more variable to it: a unit variable that will make it possible to adjust the buffer unit from feet to meters or any other allowed unit. We will leave the other defaults alone.

Here is an adjusted version of the Buffer tool:

bufferDist = 400
bufferUnit = "Feet"
arcpy.Buffer_analysis(Inbound71, 
                     Inbound71_400ft_buffer, 
                     "{0} {1}".format(bufferDist, bufferUnit), 
                     "FULL", "ROUND", "NONE", "")

Now both the buffer distance and the buffer unit are controlled by variables defined earlier in the script, which makes them easy to adjust if it is decided that the original distance was not sufficient.
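If the buffer step itself needed to be reused across scripts, it could also be wrapped in a small helper function in the same spirit as the SQL formatters. This is an optional sketch, not something the original script does, and the name bufferFeatures and the default unit of "Feet" are assumptions made for this example:

def bufferFeatures(inFC, outFC, bufferDist, bufferUnit="Feet"):
    'buffer a feature class by a distance and unit, returning the output path'
    arcpy.Buffer_analysis(inFC, outFC,
                          "{0} {1}".format(bufferDist, bufferUnit),
                          "FULL", "ROUND", "NONE", "")
    return outFC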

The next step towards adjusting the ArcPy tools is to write a function that will allow any number of feature classes to be intersected together using the Intersect tool. This new function will be similar to the formatSQL() functions shown previously, as it will use string formatting and addition to process a list of feature classes into the correct string format for the Intersect tool to accept. However, as this function is meant to be as general as possible, it must be designed to accept any number of feature classes to be intersected:

def formatIntersect(features):
    'a function to generate an intersect string'
    formatString = ''
    for count, feature in enumerate(features):
        if count != len(features)-1:
            formatString += feature + " #;"
        else:
            formatString += feature + " #"
    return formatString
>>> shpNames = ["example.shp","example2.shp"]
>>> iString = formatIntersect(shpNames)
>>> print iString

The output is as follows:

example.shp #;example2.shp #

Now that we have written the formatIntersect() function, all that needs to be created is a list of the feature classes to be passed to the function. The string returned by the function can then be passed to the Intersect tool:

intersected = [Inbound71_400ft_buffer, CensusBlocks2010]
iString = formatIntersect(intersected)
# Process: Intersect
arcpy.Intersect_analysis(iString,
                         Intersect71Census, "ALL", "", "INPUT")

Because we avoided creating a function that only fits this script or analysis, we now have two (or more) useful functions that can be applied in later analyses, and we know how to manipulate the ArcPy tools to accept the data that we want to supply to them.

Summary

In this article, we discussed how to take autogenerated code and make it generalized, while adding functions that can be reused in other scripts and will make the generation of the necessary code components, such as SQL statements, much easier.
