(For more resources on Python, see here.)
Shapely is a very capable library for performing various calculations on geo-spatial data. Let’s put it through its paces with a complex, real-world problem.
The U.S. Census Bureau makes available a Shapefile containing something called Core Based Statistical Areas (CBSAs), which are polygons defining urban areas with a population of 10,000 or more. At the same time, the GNIS website provides lists of placenames and other details. Using these two datasources, we will identify any parks within or close to an urban area.
Because of the volume of data we are potentially dealing with, we will limit our search to California. Feel free to download the larger data sets if you want, though you will have to optimize the code or your program will take a very long time to check all the CBSA polygon/placename combinations.
The XXX in the above file name is a date stamp, and will vary depending on when you download the data. Just remember the name of the file as you’ll need to refer to it in your source code.
shapefile = osgeo.ogr.Open("tl_2009_06_cbsa.shp")
layer = shapefile.GetLayer(0)
for i in range(layer.GetFeatureCount()):
feature = layer.GetFeature(i)
geometry = feature.GetGeometryRef()
...
Make sure you add directory paths to your osgeo.ogr.Open() statement (and to the file() statement below) to match where you’ve placed these files.
wkt = geometry.ExportToWkt()
shape = shapely.wkt.loads(wkt)
f = file("CA_Features_XXX.txt", "r")
for line in f.readlines():
chunks = line.rstrip().split("|")
if chunks[2] == "Park":
name = chunks[1]
latitude = float(chunks[9])
longitude = float(chunks[10])
...
Remember that the GNIS placename database is a pipedelimited text file. That’s why we have to split the line up using line.rstrip().split(“|”).
The second option is faster when dealing with a large number of points as we can pre-calculate the dilated polygons and then use them to compare against each point in turn. Let’s take this option:
# findNearbyParks.py
import osgeo.ogr
import shapely.geometry
import shapely.wkt
MAX_DISTANCE = 0.1 # Angular distance; approx 10 km.
print "Loading urban areas..."
urbanAreas = {} # Maps area name to Shapely polygon.
shapefile = osgeo.ogr.Open("tl_2009_06_cbsa.shp")
layer = shapefile.GetLayer(0)
for i in range(layer.GetFeatureCount()):
feature = layer.GetFeature(i)
name = feature.GetField("NAME")
geometry = feature.GetGeometryRef()
shape = shapely.wkt.loads(geometry.ExportToWkt())
dilatedShape = shape.buffer(MAX_DISTANCE)
urbanAreas[name] = dilatedShape
print "Checking parks..."
f = file("CA_Features_XXX.txt", "r")
for line in f.readlines():
chunks = line.rstrip().split("|")
if chunks[2] == "Park":
parkName = chunks[1]
latitude = float(chunks[9])
longitude = float(chunks[10])
pt = shapely.geometry.Point(longitude, latitude)
for urbanName,urbanArea in urbanAreas.items():
if urbanArea.contains(pt):
print parkName + " is in or near " + urbanName
f.close()
Don’t forget to change the name of the CA_Features_XXX.txt file to match the actual name of the file you downloaded. You may also need to change the path names to the tl_2009_06_CBSA.shp file and the CA_Features file if you placed them in a different directory.
If you run this program, you will get a master list of all the parks that are in or close to an urban area:
% python findNearbyParks.py
Loading urban areas...
Checking parks...
Imperial National Wildlife Refuge is in or near El Centro, CA
TwinLakesStateBeach is in or near Santa Cruz-Watsonville, CA
AdmiralWilliamStandleyState Recreation Area is in or near Ukiah, CA
Agate Beach County Park is in or near San Francisco-Oakland-Fremont, CA
...
Note that our program uses angular distances to decide if a park is in or near a given urban area. An angular distance is the angle (in decimal degrees) between two rays going out from the center of the Earth to the Earth’s surface. Because a degree of angular measurement (at least for the latitudes we are dealing with here) roughly equals 100 km on the Earth’s surface, an angular measurement of 0.1 roughly equals a real distance of 10 km.
Using angular measurements makes the distance calculation easy and quick to calculate, though it doesn’t give an exact distance on the Earth’s surface. If your application requires exact distances, you could start by using an angular distance to filter out the features obviously too far away, and then obtain an exact result for the remaining features by calculating the point on the polygon’s boundary that is closest to the desired point, and then calculating the linear distance between the two points. You would then discard the points that exceed your desired exact linear distance.
Imagine that you have two points on the Earth’s surface with a straight line drawn between them:
Each point can be described as a coordinate using some arbitrary coordinate system (for example, using latitude and longitude values), while the length of the straight line could be described as the distance between the two points.
Given any two coordinates, it is possible to calculate the distance between them. Conversely, you can start with one coordinate, a desired distance and a direction, and then calculate the coordinates for the other point.
Of course, because the Earth’s surface is not flat, we aren’t really dealing with straight lines at all. Rather, we are calculating geodetic or Great Circle distances across the surface of the Earth.
The pyproj Python library allows you to perform these types of calculations for any given datum. You can also use pyproj to convert from projected coordinates back to geographic coordinates, and vice versa, allowing you to perform these sorts of calculations for any desired datum, coordinate system, and projection.
Ultimately, a geometry such as a line or a polygon consists of nothing more than a list of connected points. This means that, using the above process, you can calculate the geodetic distance between each of the points in any polygon and total the results to get the actual length for any geometry. Let’s use this knowledge to solve a realworld problem.
I remember deciding to pursue my first IT certification, the CompTIA A+. I had signed…
Key takeaways The transformer architecture has proved to be revolutionary in outperforming the classical RNN…
Once we learn how to deploy an Ubuntu server, how to manage users, and how…
Key-takeaways: Clean code isn’t just a nice thing to have or a luxury in software projects; it's a necessity. If we…
While developing a web application, or setting dynamic pages and meta tags we need to deal with…
Software architecture is one of the most discussed topics in the software industry today, and…