Note: This article is an excerpt from the book Machine Learning for Developers by Rodolfo Bonnin. Surprisingly, the question frequently asked by developers across the globe is, "How do I get started in Machine Learning?". One reason could be the vastness of the subject area. The book is a systematic guide that teaches you how to implement various Machine Learning techniques and apply them in day-to-day development.
In the tutorial below, we implement convolution in a practical example, applying it to a real image to build an intuitive idea of its effect. We will use different kernels to detect high-detail features, and then apply a subsampling (pooling) operation to obtain a compact, lower-resolution (and, with max pooling, brighter) representation of the image.
This is a simple, intuitive implementation of the discrete convolution concept, applying it to a sample image with different types of kernels. Let's import the required libraries. As we will implement the algorithms in the clearest possible way, we will just use the minimum necessary ones, such as NumPy:

import matplotlib.pyplot as plt
import imageio
import numpy as np
Using the imread method of the imageio package, let's read the image (imported as three equal channels, as it is grayscale). We then slice the first channel, convert it to floating point, and show it using matplotlib:

arr = imageio.imread("b.bmp")[:, :, 0].astype(np.float32)
plt.imshow(arr, cmap=plt.get_cmap('binary_r'))
plt.show()
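As a quick sanity check (a minimal sketch, not part of the original recipe), we can print the shape and data type of the resulting array to confirm that we are now working with a single-channel floating-point matrix:

# Confirm we now have a single-channel (2D) floating-point array.
print(arr.shape)   # (height, width), with no channel dimension
print(arr.dtype)   # float32, after the astype conversion above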
Now it's time to define the kernel convolution operation. As we did previously, we will simplify the operation on a 3 x 3 kernel in order to better understand the border conditions. apply3x3kernel will apply the kernel over all the elements of the image, returning a new equivalent image. Note that we are restricting the kernels to 3 x 3 for simplicity, and so the 1-pixel border of the image won't have a new value because we are not taking padding into consideration:
class ConvolutionalOperation:
    def apply3x3kernel(self, image, kernel):  # Simple 3x3 kernel operation
        # Start from a copy so the 1-pixel border keeps its original values
        newimage = np.array(image)
        for m in range(1, image.shape[0] - 1):
            for n in range(1, image.shape[1] - 1):
                newelement = 0
                for i in range(0, 3):
                    for j in range(0, 3):
                        newelement = newelement + image[m - 1 + i][n - 1 + j] * kernel[i][j]
                newimage[m][n] = newelement
        return newimage
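As a quick, hedged check of this class (the tiny 5 x 5 array and averaging kernel below are made up purely for illustration), convolving an image that contains a single bright pixel with a 3 x 3 averaging kernel spreads that brightness over the pixel's neighborhood, while the 1-pixel border keeps its original values:

# Illustrative sketch: a 5x5 image with one bright pixel in the center.
tiny = np.zeros((5, 5), dtype=np.float32)
tiny[2, 2] = 9.0
# 3x3 averaging kernel (every coefficient is 1/9).
avg_kernel = [[1. / 9.] * 3 for _ in range(3)]
print(ConvolutionalOperation().apply3x3kernel(tiny, avg_kernel))
# Every interior pixel whose 3x3 neighborhood touches the bright pixel
# becomes 1.0; the border rows and columns stay at their original 0.0.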
As we saw in the previous sections, the different kernel configurations highlight different elements and properties of the original image, building filters that in conjunction can specialize, after many epochs of training, in very high-level features such as eyes, ears, and doors. Here, we will generate a dictionary of kernels with a name as the key, and the coefficients of the kernel arranged in a 3 x 3 array. The Blur filter is equivalent to calculating the average of the 3 x 3 point neighborhood, Identity simply returns the pixel value as is, Laplacian is a classic derivative filter that highlights borders, and then the two Sobel filters mark vertical edges in the first case (Left Sobel) and horizontal ones in the second (Upper Sobel):
kernels = {
    "Blur": [[1./16., 1./8., 1./16.], [1./8., 1./4., 1./8.], [1./16., 1./8., 1./16.]],
    "Identity": [[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]],
    "Laplacian": [[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
    "Left Sobel": [[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]],
    "Upper Sobel": [[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]]
}
Let's generate a ConvolutionalOperation object and a comparative graphical chart to see how the kernels compare:

conv = ConvolutionalOperation()
fig = plt.figure(figsize=(30, 30))
j = 1
for key, value in kernels.items():
    axs = fig.add_subplot(3, 2, j)
    axs.set_title(key)  # label each panel with the kernel name
    out = conv.apply3x3kernel(arr, value)
    plt.imshow(out, cmap=plt.get_cmap('binary_r'))
    j = j + 1
plt.show()
In the final chart you can clearly see how the kernels detect several high-detail features of the image: the Identity kernel leaves the image unchanged, the Laplacian acts as an edge finder, the Left and Upper Sobel kernels detect vertical and horizontal borders respectively, and the Blur kernel smooths the image:
Having reviewed the main characteristics of the convolution operation for the continuous and discrete fields, we can conclude by saying that, basically, convolution kernels highlight or hide patterns. Depending on the trained or (in our example) manually set parameters, we can begin to discover many elements in the image, such as orientation and edges in different dimensions. We can also smooth out unwanted details or outliers with blurring kernels, for example. Additionally, by piling layers of convolutions, we can even highlight higher-order composite elements, such as eyes or ears.
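To give a flavor of what piling convolutions means in code, here is a minimal sketch (reusing the conv object, the kernels dictionary, and the arr image defined above) that chains two of our kernels: the image is blurred first, and the Left Sobel edge detector is then applied to the blurred result, so the edges are extracted from a smoother, less noisy input:

# Minimal sketch of stacking two convolutions: blur, then edge detection.
blurred = conv.apply3x3kernel(arr, kernels["Blur"])
edges_of_blurred = conv.apply3x3kernel(blurred, kernels["Left Sobel"])
plt.imshow(edges_of_blurred, cmap=plt.get_cmap('binary_r'))
plt.show()

In a real convolutional network the kernels of each layer are learned rather than chosen by hand, but the composition principle is the same.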
This characteristic of convolutional neural networks is their main advantage over previous data-processing techniques: we can determine with great flexibility the primary components of a certain dataset, and represent further samples as a combination of these basic building blocks.
Now it’s time to look at another type of layer that is commonly used in combination with the former—the pooling layer.
Subsampling operation (pooling)
The subsampling operation consists of applying a kernel (of varying dimensions) and reducing the extension of the input by dividing the image into m x n blocks and taking one element to represent each block, thus reducing the image resolution by some determinate factor. In the case of a 2 x 2 kernel, each spatial dimension of the image will be reduced by half. The most well-known operations are maximum (max pool), average (avg pool), and minimum (min pool).
The following image gives you an idea of a 2 x 2 max pool kernel applied to a one-channel 16 x 16 matrix: each output element simply keeps the maximum value of the 2 x 2 zone the kernel covers:
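As a small, hedged illustration of the same mechanism (the 4 x 4 values below are made up purely for demonstration), here is a NumPy sketch that max pools a 4 x 4 matrix down to 2 x 2 by reshaping it into non-overlapping 2 x 2 blocks and keeping the maximum of each block:

# Illustrative sketch: 2x2 max pooling of a small 4x4 matrix via reshape.
sample = np.array([[1, 3, 2, 4],
                   [5, 6, 1, 2],
                   [7, 2, 9, 1],
                   [3, 4, 8, 6]])
# Split into 2x2 blocks and take the maximum of each block.
pooled = sample.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]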
Now that we have seen this simple mechanism, let's ask ourselves: what's its main purpose? The purpose of subsampling layers complements that of the convolutional layers: to reduce the quantity and complexity of information while retaining the most important information elements. In other words, they build a compact representation of the underlying information.
Now it's time to write a simple pooling operator. It's much easier and more direct to write than a convolutional operator, and in this case we will only be implementing max pooling, which chooses the brightest pixel in each 2 x 2 vicinity and projects it to the final image:
class PoolingOperation:
    def apply2x2pooling(self, image, stride):  # Simple 2x2 max pooling operation
        # stride is kept for the interface; this simple version always uses steps of 2
        newimage = np.zeros((int(image.shape[0] / 2), int(image.shape[1] / 2)), np.float32)
        for m in range(0, image.shape[0] - 1, 2):
            for n in range(0, image.shape[1] - 1, 2):
                # Keep the maximum value of each non-overlapping 2x2 block
                newimage[int(m / 2), int(n / 2)] = np.max(image[m:m + 2, n:n + 2])
        return newimage
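As a quick sanity check (a minimal sketch, assuming arr is the grayscale array loaded earlier), we can verify that pooling halves each spatial dimension:

# Each spatial dimension should be halved after 2x2 pooling.
pool_check = PoolingOperation()
print(arr.shape, "->", pool_check.apply2x2pooling(arr, 1).shape)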
Let's apply the newly created pooling operation. As you can see, the final image's resolution is much blockier and, because each block keeps its maximum pixel, the details are generally brighter:
pool = PoolingOperation()
fig = plt.figure(figsize=(20, 10))
axs = fig.add_subplot(1, 2, 1)
plt.imshow(arr, cmap=plt.get_cmap('binary_r'))
out = pool.apply2x2pooling(arr, 1)
axs = fig.add_subplot(1, 2, 2)
plt.imshow(out, cmap=plt.get_cmap('binary_r'))
plt.show()
Here you can see the differences, even though they are subtle. The final image is of lower precision, and the chosen pixels, being the maximum of their neighborhood, produce a brighter image:
This simple implementation with various kernels illustrated the working mechanism of the discrete convolution operation on a 2D dataset. Different kernels unveil different patterns hidden in the data, such as edges with a particular orientation, while the subsampling operation condenses the result into a compact, lower-resolution representation that keeps the maximum (brightest) pixel of each block.
If you found this article interesting, do check out Machine Learning for Developers to learn about advancements in deep learning, adversarial networks, and popular programming frameworks, and to prepare yourself for the ubiquitous field of machine learning.