The development of neural networks was inspired by the activity of the human brain; this type of network is a computational model that mimics the pattern of the human mind. Support vector machines, in contrast, first map the input data into a high-dimensional feature space defined by a kernel function, and then find the optimum hyperplane that separates the training data with the maximum margin. In short, we can think of a support vector machine as a linear algorithm in a high-dimensional space.
In this article, we will cover:

- Training a neural network with neuralnet
- Visualizing a neural network trained by neuralnet
A neural network is constructed from an interconnected group of nodes, which involves an input, connection weights, a processing element, and an output. Neural networks can be applied to many areas, such as classification, clustering, and prediction. To train a neural network in R, you can use neuralnet, which is built to train multilayer perceptrons in the context of regression analysis and contains many flexible functions for training feedforward neural networks. In this recipe, we will introduce how to use neuralnet to train a neural network.
In this recipe, we will use the iris dataset as our example. We will first split the iris dataset into training and testing datasets.
Perform the following steps to train a neural network with neuralnet:
> # Split iris into training (~70%) and testing (~30%) sets;
> # call set.seed() first if you need a reproducible split.
> data(iris)
> ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
> trainset = iris[ind == 1,]
> testset = iris[ind == 2,]
> install.packages("neuralnet")
> library(neuralnet)
> # Add one binary indicator column per species to serve as the labels
> trainset$setosa = trainset$Species == "setosa"
> trainset$virginica = trainset$Species == "virginica"
> trainset$versicolor = trainset$Species == "versicolor"
> # Train a network with a single hidden layer of three neurons
> network = neuralnet(versicolor + virginica + setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, trainset, hidden=3)
> network
Call: neuralnet(formula = versicolor + virginica + setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = trainset, hidden = 3)
1 repetition was calculated.
Error Reached Threshold Steps
1 0.8156100175 0.009994274769 11063
> network$result.matrix
1
error 0.815610017474
reached.threshold 0.009994274769
steps 11063.000000000000
Intercept.to.1layhid1 1.686593311644
Sepal.Length.to.1layhid1 0.947415215237
Sepal.Width.to.1layhid1 -7.220058260187
Petal.Length.to.1layhid1 1.790333443486
Petal.Width.to.1layhid1 9.943109233330
Intercept.to.1layhid2 1.411026063895
Sepal.Length.to.1layhid2 0.240309549505
Sepal.Width.to.1layhid2 0.480654059973
Petal.Length.to.1layhid2 2.221435192437
Petal.Width.to.1layhid2 0.154879347818
Intercept.to.1layhid3 24.399329878242
Sepal.Length.to.1layhid3 3.313958088512
Sepal.Width.to.1layhid3 5.845670010464
Petal.Length.to.1layhid3 -6.337082722485
Petal.Width.to.1layhid3 -17.990352566695
Intercept.to.versicolor -1.959842102421
1layhid.1.to.versicolor 1.010292389835
1layhid.2.to.versicolor 0.936519720978
1layhid.3.to.versicolor 1.023305801833
Intercept.to.virginica -0.908909982893
1layhid.1.to.virginica -0.009904635231
1layhid.2.to.virginica 1.931747950462
1layhid.3.to.virginica -1.021438938226
Intercept.to.setosa 1.500533827729
1layhid.1.to.setosa -1.001683936613
1layhid.2.to.setosa -0.498758815934
1layhid.3.to.setosa -0.001881935696
> head(network$generalized.weights[[1]])
A neural network is made up of artificial neurons (or nodes). There are three types of neurons within the network: input neurons, hidden neurons, and output neurons. Neurons in the network are connected, and the connection strength between two neurons is called a weight. If the weight is greater than zero, the connection is in an excitation status; otherwise, it is in an inhibition status. Input neurons receive the input information; the higher the input value, the greater the activation. The activation value is then passed through the network according to the weights and transfer functions. The hidden neurons (or output neurons) sum up the incoming activation values and transform the sum with the transfer function. The activation flows through the hidden neurons until it reaches the output nodes. As a result, one can use the output values from the output neurons to classify the data.
Figure: An artificial neural network
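To make this flow of activation concrete, here is a minimal sketch (not part of the recipe) of a single forward pass through one hidden layer. The weight matrices here are randomly generated stand-ins, and the logistic function matches the default transfer function used by neuralnet:
> logistic <- function(x) 1 / (1 + exp(-x))
> # Hypothetical weights for 4 inputs -> 3 hidden neurons
> W_hidden <- matrix(rnorm(12), nrow = 3); b_hidden <- rnorm(3)
> # Hypothetical weights for 3 hidden neurons -> 3 outputs
> W_out <- matrix(rnorm(9), nrow = 3); b_out <- rnorm(3)
> x <- c(5.1, 3.5, 1.4, 0.2)
> hidden <- logistic(W_hidden %*% x + b_hidden)  # weighted sum + transfer
> output <- logistic(W_out %*% hidden + b_out)   # output activations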
The advantages of a neural network are: firstly, it can detect nonlinear relationships between the dependent and independent variables. Secondly, one can efficiently train large datasets using its parallel architecture. Thirdly, it is a nonparametric model, so one can eliminate errors in the estimation of parameters. The main disadvantages of a neural network are that it often converges to a local minimum rather than the global minimum, and that it may overfit when the training process runs for too long.
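One common way to reduce the risk of stopping in a poor local minimum is to train several repetitions of the network from different random starting weights and keep the one with the lowest error; neuralnet supports this through its rep argument. A minimal sketch:
> # Train five repetitions; result.matrix then has one column per repetition
> network_multi <- neuralnet(versicolor + virginica + setosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, trainset, hidden = 3, rep = 5)
> # Inspect the error of each repetition; the smallest is the best fit
> network_multi$result.matrix["error", ]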
In this recipe, we demonstrate how to train a neural network. First, we split the iris dataset into training and testing datasets, and then install the neuralnet package and load the library into an R session. Next, we add the columns versicolor, setosa, and virginica, based on whether the value in the Species column matches each name. We then use the neuralnet function to train the network model. Besides specifying the labels (the columns versicolor, virginica, and setosa) and the training attributes in the formula, we also configure a single hidden layer with three neurons by setting hidden=3.
Then, we examine the basic information about the training process and the trained network saved in network. The output message shows that the training process needed 11,063 steps until all of the absolute partial derivatives of the error function were lower than 0.01 (the value specified by the threshold argument). The error refers to the value of the error function, which by default is the sum of squared errors. To see detailed information, you can access the result.matrix of the built neural network to see the estimated weights. The output reveals that the estimated weights range from roughly -17.99 to 24.40; the intercepts of the first hidden layer are 1.69, 1.41, and 24.40, and the four weights leading to the first hidden neuron are estimated as 0.95 (Sepal.Length), -7.22 (Sepal.Width), 1.79 (Petal.Length), and 9.94 (Petal.Width). Lastly, the trained network stores generalized weights, which express the effect of each covariate. In this recipe, the model generates 12 sets of generalized weights, which are the combinations of the four covariates (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) with the three responses (setosa, virginica, versicolor).
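As a side note beyond the recipe itself, the held-out testset can be classified with the trained network using neuralnet's compute function; the predicted class is the output neuron with the largest activation. A minimal sketch:
> # Feed the four attributes of the testset through the trained network
> result <- compute(network, testset[, 1:4])
> # Columns follow the formula order: versicolor, virginica, setosa
> pred <- c("versicolor", "virginica", "setosa")[apply(result$net.result, 1, which.max)]
> table(pred, testset$Species)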
The neuralnet package provides the plot function to visualize a built neural network and the gwplot function to visualize the generalized weights. In the following recipe, we will cover how to use these two functions.
You need to have completed the previous recipe by training a neural network and having all of the basic information saved in the variable network.
Perform the following steps to visualize the neural network and the generalized weights:
> plot(network)
Figure 10: The plot of trained neural network
> par(mfrow=c(2,2))
> gwplot(network,selected.covariate="Petal.Width")
> gwplot(network,selected.covariate="Sepal.Width")
> gwplot(network,selected.covariate="Petal.Length")
> gwplot(network,selected.covariate="Sepal.Length")
Figure 11: The plot of generalized weights
In this recipe, we demonstrate how to visualize the trained neural network and the generalized weights of each trained attribute. The plot of the network includes the estimated weights, the intercepts, and basic information about the training process. At the bottom of the figure, one can find the overall error and the number of steps required to converge.
If all the generalized weights of a covariate are close to zero on the plot, the covariate has little effect on the response. However, if the overall variance of the generalized weights is greater than one, the covariate has a nonlinear effect.
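To complement the visual inspection, one can compute these variances directly. A small sketch; the columns of the generalized weight matrix correspond to the covariate and response combinations in formula order:
> # Variance of the generalized weights for each covariate/response pair
> gw <- network$generalized.weights[[1]]
> apply(gw, 2, var)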
For more information about gwplot, use the help function:
> ?gwplot