[box type=”note” align=”” class=”” width=””]This article is an excerpt from a book by Michael Beyeler titled *Machine Learning for OpenCV*. The code and related files are available on GitHub here.[/box]

A famous dataset in the world of machine learning is the **Iris** dataset. It contains measurements of 150 iris flowers from three different species: **setosa**, **versicolor**, and **virginica**. These measurements include the length and width of the petals and the length and width of the sepals, all measured in centimeters.

**Understanding logistic regression**

Despite its name, **logistic regression** can actually be used as a model for classification. It uses a **logistic function** (or **sigmoid**) to convert any real-valued input *x* into a predicted output value *ŷ* that takes values between 0 and 1, as shown in the following figure:

**The logistic function**

Rounding *ŷ* to the nearest integer effectively classifies the input as belonging either to class 0 or 1.
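As a minimal sketch of this idea, the logistic function and the rounding step can be written with NumPy. The helper name `sigmoid` and the input values are our own choices for illustration; they are not part of OpenCV or scikit-learn.

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# A few real-valued inputs, from strongly negative to strongly positive
x = np.array([-5.0, -1.0, 0.5, 1.0, 5.0])
y_hat = sigmoid(x)

# Thresholding at 0.5 plays the role of rounding to the nearest integer
labels = (y_hat >= 0.5).astype(int)
print(y_hat)
print(labels)
```

Negative inputs end up below 0.5 and are assigned to class 0, while positive inputs end up above 0.5 and are assigned to class 1.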

Of course, most problems have more than one input or feature value. For example, the Iris dataset provides a total of four features. For the sake of simplicity, let’s focus here on the first two features: sepal length, which we will call feature *f*₁, and sepal width, which we will call *f*₂. Using the tricks we learned when talking about linear regression, we know we can express the input *x* as a **linear combination** of the two features, *f*₁ and *f*₂:

*x* = *w*₁ *f*₁ + *w*₂ *f*₂

However, in contrast to linear regression, we are not done yet. From the previous section, we know that the sum of products would result in a real-valued output, but we are interested in a categorical value, zero or one. This is where the logistic function comes in: it acts as a **squashing function**, *σ*, that compresses the range of possible output values to the range [0, 1]:

*ŷ* = *σ*(*x*)

[box type=”shadow” align=”” class=”” width=””]Because the output is always between 0 and 1, it can be interpreted as a probability. If we only have a single input variable x, the output value ŷ can be interpreted as the probability of x belonging to class 1.[/box]
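To make the two steps concrete, here is a small sketch that first forms the linear combination of two features and then squashes the result. The weights `w1` and `w2` and the two flowers are made up purely for illustration; nothing here is learned from data.

```python
import numpy as np

def sigmoid(x):
    """Logistic (squashing) function."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical weights, chosen only for illustration (not learned from data)
w1, w2 = 1.5, -2.0

# Two made-up flowers: (sepal length f1, sepal width f2) in centimeters
features = np.array([[5.0, 3.5],
                     [6.5, 2.8]])

# Step 1: linear combination x = w1*f1 + w2*f2 (one value per flower)
x = features @ np.array([w1, w2])

# Step 2: squash the real-valued scores into the range between 0 and 1
y_hat = sigmoid(x)
print(x)
print(y_hat)
```

The scores in `x` can be any real number, whereas every entry of `y_hat` can be read as the probability of belonging to class 1.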

Now let’s apply this knowledge to the Iris dataset!

**Loading the training data**

The Iris dataset is included with scikit-learn. We first load all the necessary modules, as we did in our earlier examples:

```
In [1]: import numpy as np
... import cv2
... from sklearn import datasets
... from sklearn import model_selection
... from sklearn import metrics
... import matplotlib.pyplot as plt
... %matplotlib inline
In [2]: plt.style.use('ggplot')
```

Then, loading the dataset is a one-liner:

`In [3]: iris = datasets.load_iris()`

This function returns a dictionary-like object that we call iris, which contains several fields:

```
In [4]: dir(iris)
Out[4]: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
```

Here, all the data points are contained in ‘data’. There are 150 data points, each of which has four feature values:

```
In [5]: iris.data.shape
Out[5]: (150, 4)
```

These four features correspond to the sepal and petal dimensions mentioned earlier:

```
In [6]: iris.feature_names
Out[6]: ['sepal length (cm)', 'sepal width (cm)',
         'petal length (cm)', 'petal width (cm)']
```

For every data point, we have a class label stored in target:

```
In [7]: iris.target.shape
Out[7]: (150,)
```

We can also inspect the class labels, and find that there is a total of three classes:

```
In [8]: np.unique(iris.target)
Out[8]: array([0, 1, 2])
```
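If you want to know which species each integer label stands for, a short sketch like the following can help. It relies on the target_names field listed earlier; the variable name `counts` is our own.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Count how many samples carry each integer label (0, 1, 2)
samples_per_label = np.bincount(iris.target)

# Pair every species name with the number of samples of that species
counts = dict(zip(iris.target_names.tolist(), samples_per_label.tolist()))
print(counts)
```

The position of a name in target_names is the integer label used in target, so label 2 corresponds to the third species name.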

**Making it a binary classification problem**

For the sake of simplicity, we want to focus on a **binary classification problem** for now, where we only have two classes. The easiest way to do this is to discard all data points belonging to a certain class, such as class label 2, by selecting all the rows that **do not** belong to class 2:

```
In [9]: idx = iris.target != 2
... data = iris.data[idx].astype(np.float32)
... target = iris.target[idx].astype(np.float32)
```
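The selection above works through a Boolean mask. The following sketch shows the mechanism on a tiny made-up array (`toy_target` and `toy_data` are stand-ins we invented, not the Iris data). The conversion to 32-bit floats is there because OpenCV's machine learning module generally expects its inputs in that format.

```python
import numpy as np

# Toy stand-ins for iris.target and iris.data (values are made up)
toy_target = np.array([0, 1, 2, 1, 2, 0])
toy_data = np.arange(12).reshape(6, 2)

# The comparison yields one True/False per row: True where the label is not 2
idx = toy_target != 2
print(idx)

# Indexing with the mask keeps only the rows marked True;
# astype then converts the result to 32-bit floats
kept_data = toy_data[idx].astype(np.float32)
kept_target = toy_target[idx].astype(np.float32)
print(kept_data.shape, kept_target)
```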

**Inspecting the data**

Before you get started with setting up a model, it is always a good idea to have a look at the data. We did this earlier for the town map example, so let’s continue our streak. Using Matplotlib, we create a **scatter plot **where the color of each data point corresponds to the class label:

```
In [10]: plt.scatter(data[:, 0], data[:, 1], c=target, cmap=plt.cm.Paired, s=100)
... plt.xlabel(iris.feature_names[0])
... plt.ylabel(iris.feature_names[1])
Out[10]: <matplotlib.text.Text at 0x23bb5e03eb8>
```

To make plotting easier, we limit ourselves to the first two features (iris.feature_names[0] being the sepal length and iris.feature_names[1] being the sepal width). We can see a nice separation of classes in the following figure:

**Plotting the first two features of the Iris dataset**

**Splitting the data into training and test sets**

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn’s many helper functions:

```
In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )
```

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. Since 100 data points remain after discarding class 2, inspecting the returned arrays shows that we ended up with exactly 90 training data points and 10 test data points:

```
In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
```
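To build some intuition for what such a split does, here is a rough NumPy-only sketch: shuffle the row indices, then hold out a fraction of them. The function `simple_split` is a hypothetical helper of our own and is not how scikit-learn implements train_test_split, so it will not reproduce the same rows for a given seed.

```python
import numpy as np

def simple_split(X, y, test_size=0.1, seed=42):
    """Shuffle the rows, then hold out the given fraction as a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_size * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Made-up data with the same shape as our two-class Iris subset
X = np.arange(400, dtype=np.float32).reshape(100, 4)
y = np.repeat([0, 1], 50).astype(np.float32)

X_tr, X_te, y_tr, y_te = simple_split(X, y)
print(X_tr.shape, y_tr.shape)
print(X_te.shape, y_te.shape)
```

Because every row index appears exactly once in the shuffled order, no data point can end up in both the training and the test set.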

**Training the classifier**

Creating a logistic regression classifier involves pretty much the same steps as setting up *k*-NN:

`In [14]: lr = cv2.ml.LogisticRegression_create()`

We then have to specify the desired training method. Here, we can choose cv2.ml.LogisticRegression_BATCH or cv2.ml.LogisticRegression_MINI_BATCH. For now, all we need to know is that we want to update the model after every data point, which can be achieved with the following code:

```
In [15]: lr.setTrainMethod(cv2.ml.LogisticRegression_MINI_BATCH)
... lr.setMiniBatchSize(1)
```
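To illustrate what "updating the model after every data point" means, here is a simplified NumPy sketch of logistic regression trained with a mini-batch size of 1. This is not OpenCV's implementation: the function `train_sgd`, the learning rate, the meaning of `iterations` (full passes over the data), and the toy dataset are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    """Logistic (squashing) function."""
    return 1.0 / (1.0 + np.exp(-x))

def train_sgd(X, y, learning_rate=0.1, iterations=100):
    """Logistic regression trained one sample at a time (mini-batch size 1)."""
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(iterations):
        for features, label in zip(X, y):
            y_hat = sigmoid(features @ weights + bias)
            error = label - y_hat
            # Update immediately after this single data point
            weights += learning_rate * error * features
            bias += learning_rate * error
    return weights, bias

# Tiny, made-up, linearly separable dataset
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [-1.5, -1.5],
              [1.0, 2.0], [2.0, 1.0], [1.5, 1.5]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

weights, bias = train_sgd(X, y)
predictions = (sigmoid(X @ weights + bias) >= 0.5).astype(float)
print(weights, bias)
print(predictions)
```

In batch mode, by contrast, the errors of all data points would be accumulated first and the weights updated only once per pass.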

We also want to specify the number of iterations the algorithm should run before it terminates:

`In [16]: lr.setIterations(100)`

We can then call the training method of the object (in the exact same way as we did earlier), which will return True upon success:

`In [17]: lr.train(X_train, cv2.ml.ROW_SAMPLE, y_train) Out[17]: True`

As we just saw, the goal of the training phase is to find a set of weights that best transform the feature values into an output label. A single data point is given by its four feature values (*f*₀, *f*₁, *f*₂, *f*₃). Since we have four features, we might expect to get four weights. However, as discussed previously, the algorithm adds an extra weight that acts as an offset or bias. OpenCV prepends a constant 1 to every data point before training, so the bias is the first of the five values: *x* = *w*₀ + *w*₁ *f*₀ + *w*₂ *f*₁ + *w*₃ *f*₂ + *w*₄ *f*₃, and *ŷ* = *σ*(*x*). We can retrieve these weights as follows:

```
In [18]: lr.get_learnt_thetas()
Out[18]: array([[-0.04109113, -0.01968078, -0.16216497, 0.28704911,
0.11945518]], dtype=float32)
```

This means that the input to the logistic function is *x* = -0.0411 - 0.0197 *f*₀ - 0.162 *f*₁ + 0.287 *f*₂ + 0.119 *f*₃. Then, when we feed in a new data point (*f*₀, *f*₁, *f*₂, *f*₃) that belongs to class 1, the output *ŷ* = *σ*(*x*) should be greater than 0.5, so that it rounds to 1. But how well does that actually work?
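We can check this by hand with a short sketch. It assumes that the first of the five returned values is the bias (OpenCV prepends a constant 1 to each sample) and that the remaining four multiply the features in order. The two sample rows are the first setosa and the first versicolor flower in the Iris dataset.

```python
import numpy as np

def sigmoid(x):
    """Logistic (squashing) function."""
    return 1.0 / (1.0 + np.exp(-x))

# Values copied from the output of lr.get_learnt_thetas() above
thetas = np.array([-0.04109113, -0.01968078, -0.16216497,
                   0.28704911, 0.11945518])

# Assumption: the first value is the bias, the other four weight the features
bias, weights = thetas[0], thetas[1:]

# First setosa (class 0) and first versicolor (class 1) rows of the dataset
samples = np.array([[5.1, 3.5, 1.4, 0.2],
                    [7.0, 3.2, 4.7, 1.4]])

x = samples @ weights + bias
y_hat = sigmoid(x)
labels = (y_hat >= 0.5).astype(int)
print(x)
print(y_hat)
print(labels)
```

The setosa flower gets a negative input and the versicolor flower a positive one, so thresholding the sigmoid output at 0.5 separates the two.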

**Testing the classifier**

Let’s see for ourselves by calculating the accuracy score on the training set:

```
In [19]: ret, y_pred = lr.predict(X_train)
In [20]: metrics.accuracy_score(y_train, y_pred)
Out[20]: 1.0
```

Perfect score! However, this only means that the model was able to perfectly **memorize** the training dataset. This does not mean that the model would be able to classify a new, unseen data point. For this, we need to check the test dataset:

```
In [21]: ret, y_pred = lr.predict(X_test)
... metrics.accuracy_score(y_test, y_pred)
Out[21]: 1.0
```
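The accuracy score itself is simply the fraction of predictions that match the true labels. The sketch below computes it by hand on made-up labels, assuming that the predictions arrive as a column vector of shape (N, 1), which is how OpenCV typically returns them.

```python
import numpy as np

# Made-up labels; predictions are shaped (N, 1) like a column vector
y_true = np.array([0, 1, 1, 0, 1], dtype=np.float32)
y_pred = np.array([[0], [1], [0], [0], [1]], dtype=np.float32)

# Flatten the column vector so that both arrays have shape (N,)
y_pred_flat = y_pred.ravel()

# Accuracy is the fraction of predictions that match the true labels
accuracy = np.mean(y_pred_flat == y_true)
print(accuracy)
```

Flattening matters when comparing by hand: comparing an (N, 1) array with an (N,) array directly would broadcast to an N-by-N grid rather than compare element by element.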

Luckily, we get another perfect score! The test set contains only 10 data points, but this is an encouraging sign that the model we built generalizes to data it has not seen before.

*If you enjoyed building a classifier using logistic regression and would like to learn more machine learning tasks using OpenCV, be sure to check out the book, Machine Learning for OpenCV, where this section originally appears.*