In this article by Fábio M. Soares and Alan M.F. Souza, the authors of the book Neural Network Programming with Java - Second Edition, we will cover pattern recognition, neural networks in pattern recognition, and text recognition (OCR).

We all know that humans can read and recognize images faster than any supercomputer; however we have seen so far that neural networks show amazing capabilities of learning through data in both supervised and unsupervised way. In this article we present an additional case of pattern recognition involving an example of Optical Character Recognition (OCR). Neural networks can be trained to strictly recognize digits written in an image file. The topics of this article are:

Pattern recognition
- Defined classes
- Undefined classes
Neural networks in pattern recognition
- MLP
Text recognition (OCR)
- Preprocessing and Classes definition

(For more resources related to this topic, see here.)

Pattern recognition

Patterns are a bunch of data and elements that look similar to each other, in such a way that they can occur systematically and repeat from time to time. This is a task that can be solved mainly by unsupervised learning by clustering; however, when there are labelled data or defined classes of data, this task can be solved by supervised methods. We as humans perform this task more often than we can imagine. When we see objects and recognize them as belonging to a certain class, we are indeed recognizing a pattern. Also when we analyze charts discrete events and time series, we might find an evidence of some sequence of events that repeat systematically under certain conditions. In summary, patterns can be learned by data observations.

Examples of pattern recognition tasks include, not liming to:

Shapes recognition
Objects classification
Behavior clustering
Voice recognition
OCR
Chemical reactions taxonomy

Defined classes

In the existence of a list of classes that has been predefined for a specific domain, then each class is considered to be a pattern, therefore every data record or occurrence is assigned one of these predefined classes.

The predefinition of classes can usually be performed by an expert or based on a previous knowledge of the application domain. Also it is desirable to apply defined classes when we want the data to be classified strictly into one of the predefined classes.

One illustrated example for pattern recognition using defined classes is animal recognition by image, shown in the next figure. The pattern recognizer however should be trained to catch all the characteristics that formally define the classes. In the example eight figures of animals are shown, belonging to two classes: mammals and birds. Since this is a supervised mode of learning, the neural network should be provided with a sufficient number of images that allow it to properly classify new images:

text-recognition-img-0

Of course, sometimes the classification may fail, mainly due to similar hidden patterns in the images that neural networks may catch and also due to small nuances present in the shapes. For example, the dolphin has flippers but it is still a mammal. Sometimes in order to obtain a better classification, it is necessary to apply preprocessing and ensure that the neural network will receive the appropriate data that would allow for classification.

Undefined classes

When data are unlabeled and there is no predefined set of classes, it is a scenario for unsupervised learning. Shapes recognition are a good example since they may be flexible and have infinite number of edges, vertices or bindings:

text-recognition-img-1

In the previous figure, we can see some sorts of shapes and we want to arrange them, whereby the similar ones can be grouped into the same cluster. Based on the shape information that is present in the images, it is likely for the pattern recognizer to classify the rectangle, the square and the rectangular triangle in into the same group. But if the information were presented to the pattern recognizer, not as an image, but as a graph with edges and vertices coordinates, the classification might change a little.

In summary, the pattern recognition task may use both supervised and unsupervised mode of learning, basically depending of the objective of recognition.

Neural networks in pattern recognition

For pattern recognition the neural network architectures that can be applied are the MLPs (supervised) and the Kohonen network (unsupervised). In the first case, the problem should be set up as a classification problem, that is. the data should be transformed into the X-Y dataset, where for every data record in X there should be a corresponding class in Y. The output of the neural network for classification problems should have all of the possible classes, and this may require preprocessing of the output records.

For the other case, the unsupervised learning, there is no need to apply labels on the output, however, the input data should be properly structured as well. To remind the reader the schema of both neural networks are shown in the next figure:

text-recognition-img-2

Data pre-processing

We have to deal with all possible types of data, that is., numerical (continuous and discrete) and categorical (ordinal or unscaled).

But here we have the possibility to perform pattern recognition on multimedia content, such as images and videos. So how could multimedia be handled? The answer of this question lies in the way these contents are stored in files. Images, for example, are written with a representation of small colored points called pixels. Each color can be coded in an RGB notation where the intensity of red, green and blue define every color the human eye is able to see. Therefore an image of dimension 100x100 would have 10,000 pixels, each one having 3 values for red, green and blue, yielding a total of 30,000 points. That is the challenge for image processing in neural networks.

Some methods, may reduce this huge number of dimensions. Afterwards an image can be treated as big matrix of numerical continuous values.

For simplicity in this article we are applying only gray-scaled images with small dimension.

Text recognition (OCR)

Many documents are now being scanned and stored as images, making necessary the task of converting these documents back into text, for a computer to apply edition and text processing. However, this feature involves a number of challenges:

Variety of text font
Text size
Image noise
Manuscripts

In the spite of that, humans can easily interpret and read even the texts written in a bad quality image. This can be explained by the fact that humans are already familiar with the text characters and the words in their language. Somehow the algorithm must become acquainted with these elements (characters, digits, signalization, and so on), in order to successfully recognize texts in images.

Digits recognition

Although there are a variety of tools available in the market for OCR, this remains still a big challenge for an algorithm to properly recognize texts in images. So we will be restricting our application in a smaller domain, so we could face simpler problems. Therefore, in this article we are going to implement a Neural Network to recognize digits from 0 to 9 represented on images. Also the images will have standardized and small dimensions, for simplicity purposes.