Using Genetic Algorithms for optimizing your models [Tutorial]

While, at present, deep learning (DL) is on top in terms of both application and employability, it has close competition with evolutionary algorithms. These algorithms are inspired by the natural process of evolution, the world's best optimizers. In this article, we will explore what is a genetic algorithm, advantages of genetic algorithms, and various uses of genetic algorithm in optimizing your models.

This article is an excerpt taken from the book 'Hands-On Artificial Intelligence for IoT' written by Amita Kapoor. The book explores building smarter systems by combining artificial intelligence and the Internet of Things—two of the most talked about topics today.

Let's now learn how can we implement the genetic algorithm. Genetic Algorithm was developed by John Holland in 1975. It was shown that it can be used to solve an optimization problem by his student Goldberg, who used genetic algorithms to control gas pipeline transmission. Since then, genetic algorithms have remained popular, and have inspired various other evolutionary programs.

To apply genetic algorithms in solving optimization problems using the computer, as the first step we will need to encode the problem variables into genes. The genes can be a string of real numbers or a binary bit string (series of 0s and 1's). This represents a potential solution (individual) and many such solutions together form the population at time t. For instance, consider a problem where we need to find two variables, a and b, such that the two lie in the range (0, 255). For binary gene representation, these two variables can be represented by a 16-bit chromosome, with the higher 8 bits representing gene a and the lower 8 bits for b. The encoding will need to be later decoded to get the real values of the variables a and b.

The second important requirement for genetic algorithms is defining a proper fitness function, which calculates the fitness score of any potential solution (in the preceding example, it should calculate the fitness value of the encoded chromosome). This is the function that we want to optimize by finding the optimum set of parameters of the system or the problem at hand. The fitness function is problem-dependent. For example, in the natural process of evolution, the fitness function represents the organism's ability to operate and to survive in its environment.

Pros and cons of Genetic algorithm

Genetic algorithms sound cool, right! Now, before we try and build code around them, let's point out certain advantages and disadvantages of genetic algorithms.

Advantages

Genetic algorithms offer some intriguing advantages and can produce results when the tradition gradient-based approaches fail:

They can be used to optimize either continuous or discrete variables.

Unlike gradient descent, we do not require derivative information, which also means that there is no need for the fitness function to be continuous and differentiable.

It can simultaneously search from a wide sampling of the cost surface.

We can deal with a large number of variables without a significant increase in computation time.

The generation of the population and calculating their fitness values can be performed in parallel, and hence genetic algorithms are well suited for parallel computers.

They can work even when the topological surface is extremely complex because crossover and mutation operators help them in jumping out of a local minimum.

They can provide more than one optimum solution.

We can use them with numerically generated data, experimental data, or even analytical functions. They specifically work well for large-scale optimization problems.

Disadvantages

Despite the previously mentioned advantages, we still do not find genetic algorithms to be a ubiquitous solution to all optimization problems. This is for the following reasons:

If the optimization function is a well-behaved convex function, then gradient-based methods will give a faster convergence

The large population of solutions that helps genetic algorithms cover the search space more extensively also results in slow convergence

Designing a fitness function can be a daunting task

Coding genetic algorithms using Distributed Evolutionary Algorithms in Python

Now that we understand how genetic algorithms work, let's try solving some problems with them. They have been used to solve NP-hard problems such as the traveling salesman problem. To make the task of generating a population, performing the crossover, and performing mutation operations easy, we will make use of Distributed Evolutionary Algorithms in Python (DEAP). It supports multiprocessing and we can use it for other evolutionary algorithms as well. You can download DEAP directly from PyPi using this:

pip install deap

It is compatible with Python 3.

To learn more about DEAP, you can refer to its GitHub repository and its user's guide.

Guess the word

In this program, we use genetic algorithms to guess a word. The genetic algorithm will know the number of letters in the word and will guess those letters until it finds the right answer. We decide to represent the genes as a single alphanumeric character; strings of these characters thus constitute a chromosome. And our fitness function is the sum of the characters matching in the individual and the right word:

As the first step, we import the modules we will need. We use the string module and the random module to generate random characters from (a—z, A—Z, and 0—9). From the DEAP module, we use creator, base, and tools:

import string
import random

from deap import base, creator, tools

In DEAP, we start with creating a class that inherits from the deep.base module. We need to tell it whether we are going to have a minimization or maximization of the function; this is done using the weights parameter. A value of +1 means we are maximizing (for minimizing, we give the value -1.0). The following code line will create a class, FitnessMax, that will maximize the function:

creator.create("FitnessMax", base.Fitness, weights=(1.0,))

We also define an Individual class, which will inherit the class list, and tell the DEAP creator module to assign FitnessMax as its fitness attribute:

creator.create("Individual", list, fitness=creator.FitnessMax)

Now, with the Individual class defined, we use the toolbox of DEAP defined in the base module. We will use it to create a population and define our gene pool. All the objects that we will need from now onward—an individual, the population, the functions, the operators, and the arguments—are stored in a container called toolbox. We can add or remove content to/from the toolbox container using the register() and unregister() methods:

toolbox = base.Toolbox()
# Gene Pool
toolbox.register("attr_string", random.choice, \
               string.ascii_letters + string.digits )

Now that we have defined how the gene pool will be created, we create an individual and then a population by repeatedly using the Individual class. We will pass the class to the toolbox responsible for creating a N parameter , telling it how many genes to produce:

#Number of characters in word
# The word to be guessed
word = list('hello')
N = len(word)
# Initialize population
toolbox.register("individual", tools.initRepeat, \
         creator.Individual, toolbox.attr_string, N )
toolbox.register("population",tools.initRepeat, list,\
         toolbox.individual)

We define the fitness function. Note the comma in the return statement. This is because the fitness function in DEAP is returned as a tuple to allow multi-objective fitness functions:

def evalWord(individual, word):
    return sum(individual[i] == word[i] for i in\
            range(len(individual))),

Add the fitness function to the container. Also, add the crossover operator, mutation operator, and parent selector operator. You can see that, for this, we are using the register function. In the first statement, we register the fitness function that we have defined, along with the additional arguments it will take. The next statement registers the crossover operation; it specifies that here we are using a two-point crossover (cxTwoPoint). Next, we register the mutation operator; we choose the mutShuffleIndexes option, which shuffles the attributes of the input individual with a probability indpb=0.05. And finally, we define how the parents are selected; here, we have defined the method of selection as tournament selection with a tournament size of 3:

toolbox.register("evaluate", evalWord, word)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

Now we have all the ingredients, so we will write down the code of the genetic algorithm, which will perform the steps we mentioned earlier in a repetitive manner:

def main():
    random.seed(64)
    # create an initial population of 300 individuals 
    pop = toolbox.population(n=300)
    # CXPB is the crossover probability 
    # MUTPB is the probability for mutating an individual
    CXPB, MUTPB = 0.5, 0.2
print("Start of evolution")
# Evaluate the entire population
fitnesses = list(map(toolbox.evaluate, pop))
for ind, fit in zip(pop, fitnesses):
ind.fitness.values = fit

print(" Evaluated %i individuals" % len(pop))

# Extracting all the fitnesses of individuals in a list
fits = [ind.fitness.values[0] for ind in pop]
# Variable keeping track of the number of generations
g = 0

# Begin the evolution
while max(fits) < 5 and g < 1000:

# A new generation
g += 1
print("-- Generation %i --" % g)

# Select the next generation individuals
offspring = toolbox.select(pop, len(pop))
# Clone the selected individuals
offspring = list(map(toolbox.clone, offspring))

# Apply crossover and mutation on the offspring
for child1, child2 in zip(offspring[::2], offspring[1::2]):
# cross two individuals with probability CXPB
if random.random() < CXPB:

toolbox.mate(child1, child2)
# fitness values of the children
# must be recalculated later
del child1.fitness.values
del child2.fitness.values
for mutant in offspring:
# mutate an individual with probability MUTPB
if random.random() < MUTPB:

toolbox.mutate(mutant)
del mutant.fitness.values

# Evaluate the individuals with an invalid fitness
invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
fitnesses = map(toolbox.evaluate, invalid_ind)
for ind, fit in zip(invalid_ind, fitnesses):
ind.fitness.values = fit

print(" Evaluated %i individuals" % len(invalid_ind))

# The population is entirely replaced by the offspring
pop[:] = offspring

# Gather all the fitnesses in one list and print the stats
fits = [ind.fitness.values[0] for ind in pop]

length = len(pop)
mean = sum(fits) / length
sum2 = sum(x*x for x in fits)
std = abs(sum2 / length - mean**2)**0.5

print(" Min %s" % min(fits))
print(" Max %s" % max(fits))
print(" Avg %s" % mean)
print(" Std %s" % std)

print("-- End of (successful) evolution --")

best_ind = tools.selBest(pop, 1)[0]
print("Best individual is %s, %s" % (''.join(best_ind),\
best_ind.fitness.values))

9. Here, you can see the result of this genetic algorithm. In seven generations, we reached the right word:

using-genetic-algorithms-for-optimizing-your-models-tutorial-img-0

Genetic algorithm for LSTM optimization

In a genetic CNN, we use genetic algorithms to estimate the optimum CNN architecture; in genetic RNN, we will now use a genetic algorithm to find the optimum hyperparameters of the RNN, the window size, and the number of hidden units. We will find the parameters that reduce the root-mean-square error (RMSE) of the model. The hyperparameters window size and number of units are again encoded in a binary string with 6 bits for window size and 4 bits for the number of units. Thus, the complete encoded chromosome will be of 10 bits. The LSTM is implemented using Keras. The code we implement is taken from https://github.com/aqibsaeed/Genetic-Algorithm-RNN:

1. The necessary modules are imported. This time, we are using Keras to implement the LSTM model:

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split as split

from keras.layers import LSTM, Input, Dense
from keras.models import Model

from deap import base, creator, tools, algorithms
from scipy.stats import bernoulli
from bitstring import BitArray

np.random.seed(1120)

The dataset we need for LSTM has to be time series data; we use the wind-power forecasting data from Kaggle (https://www.kaggle.com/c/GEF2012-wind-forecasting/data):

data = pd.read_csv('train.csv')
data = np.reshape(np.array(data['wp1']),(len(data['wp1']),1))

train_data = data[0:17257]
test_data = data[17257:]

Define a function to prepare the dataset depending upon the chosen window_size:

def prepare_dataset(data, window_size):
    X, Y = np.empty((0,window_size)), np.empty((0))
    for i in range(len(data)-window_size-1):
        X = np.vstack([X,data[i:(i + window_size),0]])
        Y = np.append(Y,data[i + window_size,0])   
    X = np.reshape(X,(len(X),window_size,1))
    Y = np.reshape(Y,(len(Y),1))
    return X, Y

The train_evaluate function creates the LSTM network for a given individual and returns its RMSE value (fitness function):

def train_evaluate(ga_individual_solution):   
    # Decode genetic algorithm solution to integer for window_size and num_units
    window_size_bits = BitArray(ga_individual_solution[0:6])
    num_units_bits = BitArray(ga_individual_solution[6:]) 
    window_size = window_size_bits.uint
    num_units = num_units_bits.uint
    print('\nWindow Size: ', window_size, ', Num of Units: ', num_units)
    
    # Return fitness score of 100 if window_size or num_unit is zero
    if window_size == 0 or num_units == 0:
        return 100, 
    
    # Segment the train_data based on new window_size; split into train and validation (80/20)
    X,Y = prepare_dataset(train_data,window_size)
    X_train, X_val, y_train, y_val = split(X, Y, test_size = 0.20, random_state = 1120)
    
    # Train LSTM model and predict on validation set
    inputs = Input(shape=(window_size,1))
    x = LSTM(num_units, input_shape=(window_size,1))(inputs)

    predictions = Dense(1, activation='linear')(x)
    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='adam',loss='mean_squared_error')
    model.fit(X_train, y_train, epochs=5, batch_size=10,shuffle=True)
    y_pred = model.predict(X_val)
    
    # Calculate the RMSE score as fitness score for GA
    rmse = np.sqrt(mean_squared_error(y_val, y_pred))
    print('Validation RMSE: ', rmse,'\n')
    
    return rmse,

Next, we use DEAP tools to define Individual (again, since the chromosome is represented by a binary encoded string (10 bits), we use Bernoulli's distribution), create the population, use ordered crossover, use mutShuffleIndexes mutation, and use the roulette wheel selection for selecting the parents:

population_size = 4
num_generations = 4
gene_length = 10

# As we are trying to minimize the RMSE score, that's why using -1.0. 
# In case, when you want to maximize accuracy for instance, use 1.0
creator.create('FitnessMax', base.Fitness, weights = (-1.0,))
creator.create('Individual', list , fitness = creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register('binary', bernoulli.rvs, 0.5)
toolbox.register('individual', tools.initRepeat, creator.Individual, toolbox.binary, n = gene_length)
toolbox.register('population', tools.initRepeat, list , toolbox.individual)

toolbox.register('mate', tools.cxOrdered)
toolbox.register('mutate', tools.mutShuffleIndexes, indpb = 0.6)
toolbox.register('select', tools.selRoulette)
toolbox.register('evaluate', train_evaluate)

population = toolbox.population(n = population_size)
r = algorithms.eaSimple(population, toolbox, cxpb = 0.4, mutpb = 0.1, ngen = num_generations, verbose = False)

We get the best solution, as follows:

best_individuals = tools.selBest(population,k = 1)
best_window_size = None
best_num_units = None

for bi in best_individuals:
    window_size_bits = BitArray(bi[0:6])
    num_units_bits = BitArray(bi[6:]) 
    best_window_size = window_size_bits.uint
    best_num_units = num_units_bits.uint
    print('\nWindow Size: ', best_window_size, ', Num of Units: ', best_num_units)

And finally, we implement the best LSTM solution:

X_train,y_train = prepare_dataset(train_data,best_window_size)
X_test, y_test = prepare_dataset(test_data,best_window_size)

inputs = Input(shape=(best_window_size,1))
x = LSTM(best_num_units, input_shape=(best_window_size,1))(inputs)
predictions = Dense(1, activation='linear')(x)
model = Model(inputs = inputs, outputs = predictions)
model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(X_train, y_train, epochs=5, batch_size=10,shuffle=True)
y_pred = model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print('Test RMSE: ', rmse)

Yay! Now, you have the best LSTM network for predicting wind power.

In this article, we looked at an interesting nature-inspired algorithm family: genetic algorithms. We learned how to convert our optimization problems into a form suitable for genetic algorithms. Crossover and mutation, two very crucial operations in genetic algorithms, were explained. We applied what we learned from two very different optimization problems. We used it to guess a word and to find the optimum hyperparameters for an LSTM network. If you want to explore more topics related to genetic algorithms, be sure to check out the book 'Hands-On Artificial Intelligence for IoT'.

How AI is transforming the Smart Cities IoT? [Tutorial]

How to stop hackers from messing with your home network (IoT)

Defending your business from the next wave of cyberwar: IoT Threats