(For more resources related to this topic, see here.)

Classifying images

Automated Remote Sensing ( ARS ) is rarely ever done in the visible spectrum. The most commonly available wavelengths outside of the visible spectrum are infrared and near-infrared. The following scene is a thermal image (band 10) from a fairly recent Landsat 8 flyover of the US Gulf Coast from New Orleans, Louisiana to Mobile, Alabama. Major natural features in the image are labeled so you can orient yourself:

image-classification-and-feature-extraction-images-img-0

Because every pixel in that image has a reflectance value, it is information. Python can "see" those values and pick out features the same way we intuitively do by grouping related pixel values. We can colorize pixels based on their relation to each other to simplify the image and view related features. This technique is called classification. Classifying can range from fairly simple groupings based only on some value distribution algorithm derived from the histogram to complex methods involving training data sets and even computer learning and artificial intelligence. The simplest forms are called unsupervised classifications, whereas methods involving some sort of training data to guide the computer are called supervised. It should be noted that classification techniques are used across many fields, from medical doctors trying to spot cancerous cells in a patient's body scan, to casinos using facial-recognition software on security videos to automatically spot known con-artists at blackjack tables.

To introduce remote sensing classification we'll just use the histogram to group pixels with similar colors and intensities and see what we get. First you'll need to download the Landsat 8 scene here:

http://geospatialpython.googlecode.com/files/thermal.zip

Instead of our histogram() function from previous examples, we'll use the version included with NumPy that allows you to easily specify a number of bins and returns two arrays with the frequency as well as the ranges of the bin values. We'll use the second array with the ranges as our class definitions for the image. The lut or look-up table is an arbitrary color palette used to assign colors to classes. You can use any colors you want.

import gdalnumeric

# Input file name (thermal image)
src = "thermal.tif"

# Output file name
tgt = "classified.jpg"

# Load the image into numpy using gdal
srcArr = gdalnumeric.LoadFile(src)

# Split the histogram into 20 bins as our classes
classes = gdalnumeric.numpy.histogram(srcArr, bins=20)[1]

# Color look-up table (LUT) - must be len(classes)+1.
# Specified as R,G,B tuples 
lut = [[255,0,0],[191,48,48],[166,0,0],[255,64,64],
[255,115,115],[255,116,0],[191,113,48],[255,178,115],
[0,153,153],[29,115,115],[0,99,99],[166,75,0],
[0,204,0],[51,204,204],[255,150,64],[92,204,204],[38,153,38],
 [0,133,0],[57,230,57],[103,230,103],[184,138,0]]

# Starting value for classification
start = 1

# Set up the RGB color JPEG output image
rgb = gdalnumeric.numpy.zeros((3, srcArr.shape[0],
srcArr.shape[1],), gdalnumeric.numpy.float32)
       
# Process all classes and assign colors
for i in range(len(classes)):
    mask = gdalnumeric.numpy.logical_and(start <= 
 srcArr, srcArr <= classes[i])
    for j in range(len(lut[i])):
      rgb[j] = gdalnumeric.numpy.choose(mask, (rgb[j],  lut[i][j]))
    start = classes[i]+1 

# Save the image    
gdalnumeric.SaveArray(rgb.astype(gdalnumeric.numpy.uint8), 
tgt, format="JPEG")

The following image is our classification output, which we just saved as a JPEG. We didn't specify the prototype argument when saving as an image, so it has no georeferencing information.

image-classification-and-feature-extraction-images-img-1

This result isn't bad for a very simple unsupervised classification. The islands and coastal flats show up as different shades of green. The clouds were isolated as shades of orange and dark blues. We did have some confusion inland where the land features were colored the same as the Gulf of Mexico. We could further refine this process by defining the class ranges manually instead of just using the histogram.