6 min read

In this article by Kurt Menke, GISP, Dr. Richard Smith Jr., GISP, Dr. Luigi Pirelli, Dr. John Van Hoesen, GISP, authors of the book Mastering QGIS, we’ll have a look at how to geocode address-based date using QGIS and MMQGIS.

(For more resources related to this topic, see here.)

Geocoding addresses has many applications, such as mapping the customer base for a store, members of an organization, public health records, or incidence of crime. Once mapped, the points can be used in many ways to generate information. For example, they can be used as inputs to generate density surfaces, linked to parcels of land, and characterized by socio-economic data. They may also be an important component of a cadastral information system.

An address geocoding operation typically involves the tabular address data and a street network dataset. The street network needs to have attribute fields for address ranges on the left- and right-hand side of each road segment. You can geocode within QGIS using a plugin named MMQGIS (http://michaelminn.com/linux/mmqgis/).

MMQGIS has many useful tools. For geocoding, we will use the tools found in MMQGIS | Geocode. There are two tools there: Geocode CSV with Google/ OpenStreetMap and Geocode from Street Layer as shown in the following screenshot. The first tool allows you to geocode a table of addresses using either the Google Maps API or the OpenStreetMap Nominatim web service. This tool requires an Internet connection but no local street network data as the web services provide the street network. The second tool requires a local street network dataset with address range attributes to geocode the address data:

Mastering QGIS

How address geocoding works

The basic mechanics of address geocoding are straightforward. The street network GIS data layer has attribute columns containing the address ranges on both the even and odd side of every street segment. In the following example, you can see a piece of the attribute table for the Streets.shp sample data. The columns LEFTLOW, LEFTHIGH, RIGHTLOW, and RIGHTHIGH contain the address ranges for each street segment:

Mastering QGIS

In the following example we are looking at Easy Street. On the odd side of the street, the addresses range from 101 to 199. On the even side, they range from 102 to 200. If you wanted to map 150 Easy Street, QGIS would assume that the address is located halfway down the even side of that block. Similarly, 175 Easy Street would be on the odd side of the street three quarters the way down the block. Address geocoding assumes that the addresses are evenly spaced along the linear network. QGIS should place the address point very close to its actual position, but due to variability in lot sizes not every address point will be perfectly positioned.

Mastering QGIS

Now that you’ve learned the basics, let’s work through an example. Here we will geocode addresses using web services. The output will be a point shapefile containing all the attribute fields found in the source Addresses.csv file.

An example – geocoding using web services

Here are the steps for geocoding the Addresses.csv sample data using web services.

  1. Load the Addresses.csv and the Streets.shp sample data into QGIS Desktop.
  2. Open Addresses.csv and examine the table. These are addresses of municipal facilities. Notice that the street address (for example, 150 Easy Street) is contained in a single field. There are also fields for the city, state, and country. Since both Google and OpenStreetMap are global services, it is wise to include such fields so that the services can narrow down the geography.
  3. Install and enable the MMQGIS plugin.
  4. Navigate to MMQGIS | Geocode | Geocode CSV with Google/OpenStreetMap. The Web Service Geocode dialog window will open.
  5. Select Input CSV File (UTF-8) by clicking on Browse… and locating the delimited text file on your system.
  6. Select the address fields by clicking on the drop-down menu and identifying the Address Field, City Field, State Field, and Country Field fields. MMQGIS may identify some or all of these fields by default if they are named with logical names such as Address or State.
  7. Choose the web service.
  8. Name the output shapefile by clicking on Browse….
  9. Name Not Found Output List by clicking on Browse…. Any records that are not matched will be written to this file. This allows you to easily see and troubleshoot any unmapped records.
  10. Click on OK. The status of the geocoding operation can be seen in the lower-left corner of QGIS. The word Geocoding will be displayed, followed by the number of records that have been processed.

The output will be a point shapefile and a CSV file listing that addresses were not matched.

Mastering QGIS

Two additional attribute columns will be added to the output address point shapefile: addrtype and addrlocat. These fields provide information on how the web geocoding service obtained the location. These may be useful for accuracy assessment.

Addrtype is the Google <type> element or the OpenStreetMap class attribute. This will indicate what kind of address type this is (highway, locality, museum, neighborhood, park, place, premise, route, train_station, university etc.).

Addrlocat is the Google <location_type> element or OpenStreetMap type attribute. This indicates the relationship of the coordinates to the addressed feature (approximate, geometric center, node, relation, rooftop, way interpolation, and so on).

If the web service returns more than one location for an address, the first of the locations will be used as the output feature.

Use of this plugin requires an active Internet connection. Google places both rate and volume restrictions on the number of addresses that can be geocoded within various time limits. You should visit the Google Geocoding API website: (http://code.google.com/apis/maps/documentation/geocoding/) for more details, and current information and Google’s terms of service.

Geocoding via these web services can be slow. If you don’t get the desired results with one service, try the other.

Geocoding operations rarely have 100% success. Street names in the street shapefile must match the street names in the CSV file exactly. Any discrepancies between the name of a street in the address table, and the street attribute table will lower the geocoding success rate.

The following image shows the results of geocoding addresses via street address ranges. The addresses are shown with the street network used in the geocoding operation:

Mastering QGIS

Geocoding is often an iterative process. After the initial geocoding operation, you can review the Not Found CSV file. If it’s empty then all the records were matched. If it has records in it, compare them with the attributes of the streets layer. This will help you determine why those records were not mapped. It may be due to inconsistencies in the spelling of street names. It may also be due to a street centerline layer that is not as current as the addresses. Once the errors have been identified they can be corrected by editing the data, or obtaining a different street centreline dataset. The geocoding operation can be re-run on those unmatched addresses. This process can be repeated until all records are matched.

Use the Identify tool to inspect the mapped points, and the roads, to ensure that the operation was successful. Never take a GIS operation for granted. Check your results with a critical eye.

Mastering QGIS

Summary

This article introduced you to the process of address geocoding using QGIS and the MMQGIS plugin.

Resources for Article:


Further resources on this subject:


1 COMMENT

LEAVE A REPLY

Please enter your comment!
Please enter your name here