OpenCV (Open Source Computer Vision) is an open source library containing more than 500 optimized algorithms for image and video analysis. Since its introduction in 1999, it has been widely adopted as the primary development tool by the community of researchers and developers in computer vision. OpenCV was originally developed at Intel by a team led by Gary Bradski as an initiative to advance research in vision and promote the development of rich, vision-based, CPU-intensive applications.

In this article by **Robert Laganière**, author of OpenCV 2 Computer Vision Application Programming Cookbook, we will cover:

- Calibrating a camera
- Computing the fundamental matrix of an image pair
- Matching images using random sample consensus
- Computing a homography between two images


# Introduction

Images are generally produced using a digital camera that captures a scene by projecting light onto an image sensor going through its lens. The fact that an image is formed through the projection of a 3D scene onto a 2D plane imposes the existence of important relations between a scene and its image, and between different images of the same scene. Projective geometry is the tool that is used to describe and characterize, in mathematical terms, the process of image formation. In this article, you will learn some of the fundamental projective relations that exist in multi-view imagery and how these can be used in computer vision programming. But before we start the recipes, let’s explore the basic concepts related to scene projection and image formation.

## Image formation

Fundamentally, the process used to produce images has not changed since the beginning of photography. The light coming from an observed scene is captured by a camera through a frontal **aperture** and the captured light rays hit an **image plane** (or **image sensor**) located on the back of the camera. Additionally, a lens is used to concentrate the rays coming from the different scene elements. This process is illustrated by the following figure:

Here, **do** is the distance from the lens to the observed object, **di** is the distance from the lens to the image plane, and **f** is the **focal length** of the lens. These quantities are related by the so-called **thin lens equation**:

$$\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}$$

In computer vision, this camera model can be simplified in a number of ways. First, we can neglect the effect of the lens by considering a camera with an infinitesimal aperture since, in theory, this does not change the image. Only the central ray is therefore considered. Second, since most of the time we have *do>>di*, we can assume that the image plane is located at the focal distance. Finally, we can notice from the geometry of the system, that the image on the plane is inverted. We can obtain an identical but upright image by simply positioning the image plane in front of the lens. Obviously, this is not physically feasible, but from a mathematical point of view, this is completely equivalent. This simplified model is often referred to as the **pin-hole** camera model and it is represented as follows:

From this model, and using the law of similar triangles, we can easily derive the basic projective equation:

$$h_i = f\,\frac{h_o}{d_o}$$

The size (**hi**) of the image of an object (of height **ho**) is therefore inversely proportional to its distance (**do**) from the camera, which matches everyday experience. This relation allows the position of the image of a 3D scene point to be predicted on the image plane of a camera.

# Calibrating a camera

From the introduction of this article, we learned that the essential parameters of a camera under the pin-hole model are its focal length and the size of the image plane (which defines the field of view of the camera). Also, since we are dealing with digital images, the number of pixels on the image plane is another important characteristic of a camera. Finally, in order to be able to compute the position of an image's scene point in pixel coordinates, we need one additional piece of information. Considering the line coming from the focal point that is orthogonal to the image plane, we need to know at which pixel position this line pierces the image plane. This point is called the **principal point**. It would be logical to assume that this principal point is at the center of the image plane, but in practice, it may be off by a few pixels depending on how precisely the camera has been manufactured.

Camera calibration is the process by which the different camera parameters are obtained. One can obviously use the specifications provided by the camera manufacturer, but for some tasks, such as 3D reconstruction, these specifications are not accurate enough. Camera calibration will proceed by showing known patterns to the camera and analyzing the obtained images. An optimization process will then determine the optimal parameter values that explain the observations. This is a complex process but made easy by the availability of OpenCV calibration functions.

## How to do it…

To calibrate a camera, the idea is to show it a set of scene points whose 3D positions are known. You must then determine where these points project on the image. Obviously, for accurate results, we need to observe several such points. One way to achieve this would be to take one picture of a scene with many known 3D points. A more convenient way is to take several images, from different viewpoints, of a set of some 3D points. This approach is simpler, but in addition to computing the internal camera parameters, it requires computing the position of each camera view, which is fortunately feasible.

OpenCV proposes to use a chessboard pattern to generate the set of 3D scene points required for calibration. This pattern creates points at the corners of each square, and since this pattern is flat, we can freely assume that the board is located at Z=0 with the X and Y axes well aligned with the grid. In this case, the calibration process simply consists of showing the chessboard pattern to the camera from different viewpoints. Here is one example of a calibration pattern image:

The nice thing is that OpenCV has a function that automatically detects the corners of this chessboard pattern. You simply provide an image and the size of the chessboard used (number of vertical and horizontal inner corner points). The function will return the position of these chessboard corners on the image. If the function fails to find the pattern, then it simply returns false:

```cpp
// output vector of image points
std::vector<cv::Point2f> imageCorners;
// number of inner corners on the chessboard
cv::Size boardSize(6,4);
// Get the chessboard corners
bool found = cv::findChessboardCorners(image,
                                       boardSize, imageCorners);
```

Note that this function accepts additional parameters, not discussed here, that can be used to tune the detection algorithm. There is also a function that draws the detected corners on the chessboard image, with lines connecting them in sequence:

```cpp
// Draw the corners
cv::drawChessboardCorners(image,
                          boardSize, imageCorners,
                          found); // corners have been found
```

The image obtained is seen here:

The lines connecting the points show the order in which the points are listed in the vector of detected points. To calibrate the camera, we now need to input a set of such image points together with the coordinates of the corresponding 3D points. Let's encapsulate the calibration process in a *CameraCalibrator* class:

```cpp
class CameraCalibrator {

    // input points:
    // the points in world coordinates
    std::vector<std::vector<cv::Point3f>> objectPoints;
    // the point positions in pixels
    std::vector<std::vector<cv::Point2f>> imagePoints;
    // output matrices
    cv::Mat cameraMatrix;
    cv::Mat distCoeffs;
    // flag to specify how calibration is done
    int flag;
    // used in image undistortion
    cv::Mat map1, map2;
    bool mustInitUndistort;

  public:
    CameraCalibrator() : flag(0), mustInitUndistort(true) {}
```

As mentioned previously, the 3D coordinates of the points on the chessboard pattern can be easily determined if we conveniently place the reference frame on the board. The method that accomplishes this takes a vector of chessboard image filenames as input:

```cpp
// Open chessboard images and extract corner points
int CameraCalibrator::addChessboardPoints(
        const std::vector<std::string>& filelist,
        cv::Size& boardSize) {

    // the points on the chessboard
    std::vector<cv::Point2f> imageCorners;
    std::vector<cv::Point3f> objectCorners;

    // 3D Scene Points:
    // Initialize the chessboard corners
    // in the chessboard reference frame
    // The corners are at 3D location (X,Y,Z) = (i,j,0)
    for (int i = 0; i < boardSize.height; i++) {
        for (int j = 0; j < boardSize.width; j++) {
            objectCorners.push_back(cv::Point3f(i, j, 0.0f));
        }
    }

    // 2D Image points:
    cv::Mat image; // to contain chessboard image
    int successes = 0;

    // for all viewpoints
    for (std::size_t i = 0; i < filelist.size(); i++) {

        // Open the image in grayscale
        image = cv::imread(filelist[i], 0);

        // Get the chessboard corners
        bool found = cv::findChessboardCorners(
                         image, boardSize, imageCorners);
        if (!found) continue; // pattern not detected in this view

        // Get subpixel accuracy on the corners
        cv::cornerSubPix(image, imageCorners,
                         cv::Size(5,5),
                         cv::Size(-1,-1),
                         cv::TermCriteria(cv::TermCriteria::MAX_ITER +
                                          cv::TermCriteria::EPS,
                                          30,    // max number of iterations
                                          0.1)); // min accuracy

        // If we have a good board, add it to our data
        if (imageCorners.size() == boardSize.area()) {
            // Add image and scene points from one view
            addPoints(imageCorners, objectCorners);
            successes++;
        }
    }

    return successes;
}
```

The first loop defines the 3D coordinates of the chessboard corners, specified here in an arbitrary unit of one square. The corresponding image points are the ones returned by the *cv::findChessboardCorners* function, and this is done for all available viewpoints. In addition, to obtain a more accurate image point location, the function *cv::cornerSubPix* can be used; as the name suggests, the image points are then localized with sub-pixel accuracy. The termination criterion specified by the *cv::TermCriteria* object defines a maximum number of iterations and a minimum accuracy in sub-pixel coordinates; whichever of these two conditions is reached first stops the corner refinement process.

When a set of chessboard corners has been successfully detected, these points are added to our vector of image and scene points:

```cpp
// Add scene points and corresponding image points
void CameraCalibrator::addPoints(const std::vector<cv::Point2f>& imageCorners,
                                 const std::vector<cv::Point3f>& objectCorners) {
    // 2D image points from one view
    imagePoints.push_back(imageCorners);
    // corresponding 3D scene points
    objectPoints.push_back(objectCorners);
}
```

Note that both vectors contain *std::vector* instances: each element is the vector of points from one view.

Once a sufficient number of chessboard images have been processed (and consequently a large number of 3D scene point/2D image point correspondences are available), we can initiate the computation of the calibration parameters:

```cpp
// Calibrate the camera
// returns the re-projection error
double CameraCalibrator::calibrate(cv::Size& imageSize)
{
    // undistorter must be reinitialized
    mustInitUndistort = true;

    // Output rotations and translations
    std::vector<cv::Mat> rvecs, tvecs;

    // start calibration
    return cv::calibrateCamera(objectPoints, // the 3D points
                               imagePoints,  // the image points
                               imageSize,    // image size
                               cameraMatrix, // output camera matrix
                               distCoeffs,   // output distortion matrix
                               rvecs, tvecs, // Rs, Ts
                               flag);        // set options
}
```

In practice, 10 to 20 chessboard images are sufficient, provided they are taken from different viewpoints at different depths. The two important outputs of this function are the camera matrix and the distortion parameters. The camera matrix will be described in the next section; for now, let's consider the distortion parameters. So far, we have assumed that under the pin-hole camera model we can neglect the effect of the lens. But this is only valid if the lens used to capture an image does not introduce significant optical distortions. Unfortunately, that is often the case with lower-quality lenses or with lenses having a very short focal length. You may have already noticed that in the image we used for our example, the chessboard pattern is clearly distorted: the edges of the rectangular board are curved in the image. You can also notice that this distortion becomes more important as we move away from the center of the image. This is the typical distortion observed with a fish-eye lens and is called **radial distortion**. The lenses used in common digital cameras do not exhibit such a high degree of distortion, but in the case of the lens used here, these distortions certainly cannot be ignored.

It is possible to compensate for these deformations by introducing an appropriate model. The idea is to represent the distortions induced by a lens by a set of mathematical equations. Once established, these equations can then be inverted in order to undo the distortions visible in the image. Fortunately, the exact parameters of the transformation that corrects the distortions can be obtained, together with the other camera parameters, during the calibration phase. Once this is done, any image from the newly calibrated camera can be undistorted:

```cpp
// remove distortion in an image (after calibration)
cv::Mat CameraCalibrator::remap(const cv::Mat& image) {

    cv::Mat undistorted;

    if (mustInitUndistort) { // called once per calibration
        cv::initUndistortRectifyMap(
            cameraMatrix,  // computed camera matrix
            distCoeffs,    // computed distortion matrix
            cv::Mat(),     // optional rectification (none)
            cv::Mat(),     // camera matrix to generate undistorted image
            image.size(),  // size of undistorted image
            CV_32FC1,      // type of output map
            map1, map2);   // the x and y mapping functions

        mustInitUndistort = false;
    }

    // Apply mapping functions
    cv::remap(image, undistorted, map1, map2,
              cv::INTER_LINEAR); // interpolation type

    return undistorted;
}
```

Which results in the following image:

As you can see, once the image is undistorted, we obtain a regular perspective image.

## How it works…

In order to explain the result of the calibration, we need to go back to the figure in the introduction which describes the pin-hole camera model. More specifically, we want to demonstrate the relation between a point in 3D at position (X,Y,Z) and its image (x,y) on a camera specified in pixel coordinates. Let’s redraw this figure by adding a reference frame that we position at the center of the projection as seen here:

Note that the Y-axis points downward to obtain a coordinate system compatible with the usual convention, which places the image origin in the upper-left corner. We learned previously that the point (X,Y,Z) will be projected onto the image plane at (fX/Z, fY/Z). Now, if we want to translate this coordinate into pixels, we need to divide the 2D image position by the pixel width (px) and height (py), respectively. By dividing the focal length f, given in world units (most often meters or millimeters), by px, we obtain the focal length expressed in (horizontal) pixels; let's define this term as fx. Similarly, fy = f/py is the focal length expressed in vertical pixel units. The complete projective equation is therefore:

$$x = f_x \frac{X}{Z} + u_0 \qquad y = f_y \frac{Y}{Z} + v_0$$

Recall that (u0,v0) is the principal point, which is added to the result in order to move the origin to the upper-left corner of the image. These equations can be rewritten in matrix form through the introduction of **homogeneous coordinates**, in which 2D points are represented by 3-vectors and 3D points by 4-vectors (the extra coordinate is simply an arbitrary scale factor s that needs to be removed when a 2D coordinate is extracted from a homogeneous 3-vector). Here is the projective equation rewritten:

$$s\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

The second matrix is a simple projection matrix. The first matrix includes all of the camera parameters which are called the **intrinsic parameters** of the camera. This 3×3 matrix is one of the output matrices returned by the *cv::calibrateCamera* function. There is also a function called *cv::calibrationMatrixValues* that returns the value of the intrinsic parameters given a calibration matrix.

More generally, when the reference frame is not at the projection center of the camera, we need to add a rotation (a 3×3 matrix R) and a translation vector (a 3×1 matrix t). These two matrices describe the rigid transformation that must be applied to the 3D points in order to bring them back into the camera reference frame. Therefore, we can rewrite the projection equation in its most general form:

$$s\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & t \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Remember that in our calibration example, the reference frame was placed on the chessboard. Therefore, there is a rigid transformation (rotation and translation) that must be computed for each view. These are in the output parameter list of the *cv::calibrateCamera* function. The rotation and translation components are often called the **extrinsic parameters** of the calibration and they are different for each view. The intrinsic parameters remain constant for a given camera/lens system. The intrinsic parameters of our test camera obtained from a calibration based on 20 chessboard images are fx=167, fy=178, u0=156, v0=119. These results are obtained by *cv::calibrateCamera* through an optimization process aimed at finding the intrinsic and extrinsic parameters that will minimize the difference between the predicted image point position, as computed from the projection of the 3D scene points, and the actual image point position, as observed on the image. The sum of this difference for all points specified during the calibration is called the **re-projection error**.

To correct the distortion, OpenCV uses a polynomial function that is applied to the image points in order to move them to their undistorted positions. By default, 5 coefficients are used; a model made of 8 coefficients is also available. Once these coefficients are obtained, it is possible to compute two mapping functions (one for the x coordinate and one for the y) that give the new undistorted position of an image point on a distorted image. These are computed by the function *cv::initUndistortRectifyMap*, and the function *cv::remap* remaps all of the points of an input image to a new image. Note that, because of the non-linear transformation, some pixels of the input image now fall outside the boundary of the output image. You can expand the size of the output image to compensate for this loss of pixels, but you will then obtain output pixels that have no corresponding values in the input image (they will be displayed as black pixels).

## There’s more…

When a good estimate of the camera's intrinsic parameters is known, it can be advantageous to input them into the *cv::calibrateCamera* function. They will then be used as initial values in the optimization process. To do so, you just need to add the *CV_CALIB_USE_INTRINSIC_GUESS* flag and input these values in the calibration matrix parameter. It is also possible to impose a fixed value for the principal point (*CV_CALIB_FIX_PRINCIPAL_POINT*), which can often be assumed to be the central pixel. You can also impose a fixed ratio for the focal lengths fx and fy (*CV_CALIB_FIX_ASPECT_RATIO*), in which case you assume that the pixels are square.