Computer vision applications have become quite ubiquitous in our lives. The applications
are varied, ranging from apps that play Virtual Reality (VR) or Augmented Reality (AR)
games to applications for scanning documents using smartphone cameras.
On our smartphones, we have QR code scanning and face detection, and now we even have facial recognition techniques. Online, we can now search using images and find similar looking images. Photo sharing applications can identify people and make an album based on the friends or family found in the photos. Due to improvements in image stabilization techniques, even with shaky hands, we can create stable videos.
In this context, we will learn about basic computer vision, reading an image and image color conversion.
With the recent advancements in deep learning techniques, applications like image
classification, object detection, and tracking have become more accurate, and this has led to the development of more complex autonomous systems, such as drones, self-driving cars, and humanoids. Deep learning can also transform images in more elaborate ways; for example, images can be converted into Van Gogh style paintings.
Such progress in several domains makes a non-expert wonder how computer vision is capable of inferring this information from images. The motivation lies in human perception and the way we can perform complex analyses of the environment around us. We can estimate the closeness, structure, and shape of objects, and estimate the texture of a surface too. Even under different lighting, we can identify objects and even recognize something if we have seen it before.
Considering these advancements and motivations, one of the basic questions that arises is:
what is computer vision? In this article, we will begin by answering this question and then
provide a broader overview of the various sub-domains and applications within computer
vision. Later in the article, we will start with basic image operations.
What is computer vision?
In order to begin the discussion on computer vision, observe the following image:
Even if we have never done this activity before, we can clearly tell that the image is of
people skiing in the snowy mountains on a cloudy day. This information that we perceive is
quite complex and can be subdivided into more basic inferences for a computer vision system.
The most basic observation that we can get from an image is of the things or objects in it. In the previous image, the various things that we can see are trees, mountains, snow, sky,
people, and so on. Extracting this information is often referred to as image classification,
where we would like to label an image with a predefined set of categories. In this case, the
labels are the things that we see in the image.
A wider observation that we can get from the previous image is landscape. We can tell that
the image consists of snow, mountains, and sky, as shown in the following image:
Although it is difficult to create exact boundaries for where the snow, mountain, and sky are in the image, we can still identify approximate regions of the image for each of them. This is often termed as segmentation of an image, where we break it up into regions according to object occupancy.
Making our observation more concrete, we can further identify the exact boundaries of objects in the image, as shown in the following figure:
In the image, we see that people are doing different activities and as such have different
shapes; some are sitting, some are standing, some are skiing. Even with this many
variations, we can detect objects and can create bounding boxes around them. Only a few
bounding boxes are shown in the image for clarity; we can observe many more than these.
Although the image shows rectangular bounding boxes around some objects, we are not
yet categorizing what object is in each box. The next step would be to say that a box contains a
person. This combined task of detecting an object and categorizing it is often referred to as object detection.
Extending our observation of people and surroundings, we can say that different people in the image have different heights, even though some are nearer and others are farther from the camera. This is due to our intuitive understanding of image formation and the relations of objects. We know that a tree is usually much taller than a person, even if the trees in the image are shorter than the people nearer to the camera. Extracting the information about geometry in the image is another sub-field of computer vision, often referred to as image reconstruction.
Computer vision is everywhere
In the previous section, we developed an initial understanding of computer vision. With
this understanding, there are several algorithms that have been developed and are used in
industrial applications. Studying these not only improves our understanding of the system
but can also seed newer ideas to improve overall systems.
In this section, we will extend our understanding of computer vision by looking at various
applications and their problem formulations:
- Image classification: In the past few years, categorizing images based on the
objects within has gained popularity. This is due to advances in algorithms as
well as the availability of large datasets. Deep learning algorithms for image
classification have significantly improved in accuracy when trained on
datasets like ImageNet. The trained model is often further used to improve other recognition algorithms like object detection, as well as image categorization in online applications. In this book, we will see how to create a simple algorithm to classify images using deep learning models.
- Object detection: Not just self-driving cars, but robotics, automated retail stores,
traffic detection, smartphone camera apps, image filters, and many more
applications use object detection. These also benefit from deep learning and
vision techniques as well as the availability of large, annotated datasets. We saw
an introduction to object detection in the previous section: it produces
bounding boxes around objects and also categorizes what object is inside each box.
- Object tracking: Following robots, surveillance cameras, and people interaction
are a few of the many applications of object tracking. This consists of locating
objects and keeping track of the corresponding objects across a sequence of images.
- Image geometry: This is often referred to as computing the depth of objects from
the camera. There are several applications in this domain too. Smartphone apps
are now capable of computing three-dimensional structures from video
captured on the device. Using the reconstructed three-dimensional digital models,
further extensions such as AR or VR applications are developed to interface the image
world with the real world.
- Image segmentation: This is creating cluster regions in images, such that one
cluster has similar properties. The usual approach is to cluster image pixels
belonging to the same object. Recent applications have grown in self-driving cars
and healthcare analysis using image regions.
- Image generation: This has had a large impact in the artistic domain, through merging
different image styles or generating completely new ones. Now, we can mix and
merge Van Gogh’s painting style with smartphone camera images to create
images that appear as if they were painted in a similar style to Van Gogh’s.
The field is quickly evolving, not only through making newer methods of image analysis
but also finding newer applications where computer vision can be used. Therefore,
applications are not just limited to those explained above.
Getting started with image operations
In this section, we will see basic image operations for reading and writing images. We will
also see how images are represented digitally. Before we proceed further with image I/O, let’s see what an image is made up of in the digital world.
An image is simply a two-dimensional array, with each cell of the array containing an intensity value. The simplest image is a black and white image, with 0s representing black and 1s representing white. This is also referred to as a binary image. A further extension of this is dividing black and white into a broader grayscale with a range of 0 to 255. An image of this type, in the three-dimensional view, is as follows, where x and y are pixel locations and z is the intensity value:
This is a top view, but on viewing sideways we can see the variation in the intensities that make up the image:
We can see that there are several peaks and that the image intensities are not smooth. Let’s apply a smoothing algorithm.
As we can see, pixel intensities form more continuous formations, even though there is no
significant change in the object representation.
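To see why smoothing flattens the peaks, it helps to look at what a 3x3 box blur does: each pixel is replaced by the mean of its 3x3 neighborhood. The following is a minimal NumPy sketch of that idea, not OpenCV's implementation. The function name box_blur_3x3 and the tiny test image are hypothetical, and the border is handled by mirroring, which is assumed here to approximate OpenCV's default border mode.

```python
import numpy as np

def box_blur_3x3(gray):
    """Replace each pixel with the mean of its 3x3 neighborhood (illustrative helper)."""
    # Mirror the border so edge pixels also have a full neighborhood (assumption).
    padded = np.pad(gray.astype(np.float64), 1, mode="reflect")
    h, w = gray.shape
    total = np.zeros((h, w), dtype=np.float64)
    # Sum the nine shifted copies of the image, one per neighborhood offset.
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return total / 9.0

# A single sharp peak of intensity 90 in an otherwise flat image.
spike = np.zeros((5, 5), dtype=np.uint8)
spike[2, 2] = 90

smoothed = box_blur_3x3(spike)
print(smoothed[2, 2])  # 10.0
```

The peak of 90 is spread evenly over its nine neighbors, so the surface becomes lower and wider while the total intensity in the interior is unchanged.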
The plots in this section can be generated with code along the following lines:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib versions
import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/building_sm.png')

# convert the color to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# resize the image (optional)
gray = cv2.resize(gray, (160, 120))

# apply smoothing operation
gray = cv2.blur(gray, (3, 3))

# create grid to plot using numpy (rows first, then columns)
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure with a 3D axis
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, gray, rstride=1, cstride=1, cmap=plt.cm.gray, linewidth=1)

# show it
plt.show()
This code uses the following libraries: NumPy, OpenCV, and matplotlib. In the further sections of this article, we will see operations on images using their color properties. Please download the relevant images from the website to view them clearly.
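As a concrete illustration of the array representation described earlier in this section, the following sketch builds two tiny images by hand with NumPy. The pixel values are made up for illustration; the point is only that an image is a two-dimensional array indexed as [row, column], that is, [y, x].

```python
import numpy as np

# A hypothetical 3x4 binary image: every cell is either 0 or 1.
binary = np.array([[0, 1, 1, 0],
                   [1, 0, 0, 1],
                   [0, 1, 1, 0]], dtype=np.uint8)

# A hypothetical 3x4 grayscale image: every cell is an intensity from 0 to 255.
gray = np.array([[0, 64, 128, 255],
                 [32, 96, 160, 224],
                 [16, 80, 144, 208]], dtype=np.uint8)

# shape is (rows, columns), i.e. (height, width)
print(binary.shape)  # (3, 4)

# Indexing is [y, x]: row 1, column 2
print(gray[1, 2])

# The full range of an 8-bit grayscale image
print(gray.min(), gray.max())
```

Images loaded with OpenCV in Python are NumPy arrays of this kind, so the same indexing applies to them.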
Reading an image
We can use the OpenCV library to read an image, as follows. Here, change the path to the
image file according to use:
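import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# imread returns None instead of raising an error if the file cannot be read
if img is None:
    raise FileNotFoundError('could not read ../figures/flower.png')

# display the image that was just read
cv2.imshow("Image", img)

# keep the window open until a key is pressed
cv2.waitKey(0)

# clear all window buffers
cv2.destroyAllWindows()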
The resulting image is shown in the following screenshot:
Here, we read the image in BGR color format, where B is blue, G is green, and R is red. Each pixel in the output is represented collectively by the values of these three colors. An example of a pixel location and its color values is shown at the bottom of the previous figure.
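The BGR ordering is easy to get wrong when moving between libraries, so here is a small sketch of what it means in practice. It uses a hand-built 2x2 NumPy array in place of a file read from disk (an assumption made so the example runs without an image file); an array returned by cv2.imread has the same layout.

```python
import numpy as np

# A hypothetical 2x2 color image: height 2, width 2, 3 channels in B, G, R order.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)  # pure blue when interpreted as BGR
img[0, 1] = (0, 0, 255)  # pure red when interpreted as BGR

# shape is (height, width, channels)
print(img.shape)  # (2, 2, 3)

# Read the three color values of the pixel at row 0, column 1
b, g, r = img[0, 1]
print(int(b), int(g), int(r))  # 0 0 255

# Reversing the last axis gives the RGB ordering that many other libraries expect
rgb = img[:, :, ::-1]
print(tuple(int(v) for v in rgb[0, 0]))  # (0, 0, 255)
```

Displaying a BGR array with a library that assumes RGB swaps red and blue, which is a common source of odd-looking colors.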
Image color conversions
An image is made up of pixels and is usually visualized according to the values stored. There is also an additional property that distinguishes different kinds of image: each value stored in a pixel is linked to a fixed representation. For example, a pixel value of 10 can represent a gray intensity value of 10, or a blue color intensity value of 10, and so on. It is therefore important to understand the different color types and their conversion. In this section, we will see color types and conversions using OpenCV.
Grayscale: This is a simple one-channel image with values ranging from 0 to 255 that represent the intensity of pixels. The previous image can be converted to grayscale, as follows:
import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# convert the color to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# display the grayscale image
cv2.imshow("Image", gray)

# keep the window open until a key is pressed
cv2.waitKey(0)

# clear all window buffers
cv2.destroyAllWindows()
The resulting image is as shown in the following screenshot:
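To make the conversion less of a black box, the sketch below computes grayscale by hand as a weighted sum of the three channels, using the weights 0.299 for red, 0.587 for green, and 0.114 for blue that OpenCV documents for its gray conversion. The function name bgr_to_gray and the test pixels are hypothetical, and the rounding here is an assumption, so results may differ by one level from cv2.cvtColor on some inputs.

```python
import numpy as np

def bgr_to_gray(img_bgr):
    """Weighted-sum grayscale conversion for a BGR uint8 image (illustrative helper)."""
    b = img_bgr[..., 0].astype(np.float64)
    g = img_bgr[..., 1].astype(np.float64)
    r = img_bgr[..., 2].astype(np.float64)
    # Green contributes most because the eye is most sensitive to it.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# One row of four pixels in BGR order: blue, green, red, white
pixels = np.array([[[255, 0, 0],
                    [0, 255, 0],
                    [0, 0, 255],
                    [255, 255, 255]]], dtype=np.uint8)

print(bgr_to_gray(pixels))
```

Pure green comes out much brighter than pure blue, which is why a grayscale image is not a simple average of the three channels.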
HSV and HLS: These are alternative representations of color, where H is hue, S is saturation, V is value, and L is lightness. They are motivated by the human perception system. An example of image conversion for these is as follows:
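import cv2

# load and read an image from the path to the file
img = cv2.imread('../figures/flower.png')

# convert the color to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# convert the color to hls
hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)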
This conversion is shown in the following figure, where an input image read in BGR
format is converted to each of the HLS (on the left) and HSV (on the right) color types:
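For a feel of what the HSV values look like numerically, the following sketch converts single pixels using Python's standard colorsys module as a stand-in for OpenCV. The helper name bgr_to_hsv_8bit is hypothetical. The scaling follows the convention OpenCV documents for 8-bit images, where hue is stored as degrees divided by two (0 to 179) and saturation and value are scaled to 0 to 255; the rounding is an assumption and may differ slightly from cv2.cvtColor.

```python
import colorsys

def bgr_to_hsv_8bit(b, g, r):
    """Convert one 8-bit BGR pixel to OpenCV-style 8-bit HSV (illustrative helper)."""
    # colorsys expects R, G, B in the range 0..1 and returns h, s, v in 0..1
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Hue: 0..1 maps to 0..360 degrees, stored as degrees / 2 so it fits in 8 bits
    return round(h * 180) % 180, round(s * 255), round(v * 255)

print(bgr_to_hsv_8bit(0, 0, 255))      # pure red   -> (0, 255, 255)
print(bgr_to_hsv_8bit(0, 255, 0))      # pure green -> (60, 255, 255)
print(bgr_to_hsv_8bit(255, 0, 0))      # pure blue  -> (120, 255, 255)
print(bgr_to_hsv_8bit(128, 128, 128))  # mid gray   -> (0, 0, 128)
```

Note how the hue alone identifies the color family, while gray pixels have zero saturation regardless of brightness. This separation is what makes HSV convenient for color-based selection.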
LAB color space: Denoted L for lightness, A for green-red colors, and B for blue-yellow colors, this consists of all perceivable colors. It is used to convert from one type of color space (for example, RGB) to another (such as CMYK) because of its device independence. On devices where the format is different from that of the image that is sent, the incoming image color space is first converted to LAB and then to the corresponding space available on the device. The output of converting an RGB image is as follows:
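The meaning of the three LAB axes can be checked on single pixels. The sketch below is a plain-Python version of the textbook sRGB to CIE L*a*b* formulas with a D65 white point; it is not taken from the book and is not OpenCV's implementation, and the function name srgb_to_lab is hypothetical. Values are returned in the natural ranges (L from 0 to 100, a and b signed), whereas OpenCV rescales them to fit 8 bits for uint8 images.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (illustrative helper, D65 white)."""
    def linearize(c):
        # Undo the sRGB gamma curve to get linear light in 0..1
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear RGB to CIE XYZ
    x = 0.412453 * rl + 0.357580 * gl + 0.180423 * bl
    y = 0.212671 * rl + 0.715160 * gl + 0.072169 * bl
    z = 0.019334 * rl + 0.119193 * gl + 0.950227 * bl

    # Normalize by the reference white
    xn, yn, zn = 0.950456, 1.0, 1.088754

    def f(t):
        return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

for name, rgb in [("white", (255, 255, 255)), ("red", (255, 0, 0)),
                  ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    L, a, b_ = srgb_to_lab(*rgb)
    print(name, round(L, 1), round(a, 1), round(b_, 1))
```

White lands at full lightness with a and b near zero, red has a positive a, green has a negative a, and blue has a strongly negative b, which matches the green-red and blue-yellow axes described above.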
This article is an excerpt from the book Practical Computer Vision written by Abhinav Dadhich. This book will teach you different computer vision techniques and show how to apply them in practical applications.