Genetic algorithms in Python

Although I’m fond of numerical optimization through gradients, … there are some times where a global optimization is much more powerfull. For instance, I have to generate two sequences/combs that are orthogonal and for which their autocorrelation is almost an impulse. The two combs have a fixed number of impulse, so it’s a perfect job for genetic algorithms.

Genetic algorithms are a global optimization technique. The parameters are encoded in a genome, and then different populations are grown. The fittest individuals survive and give new individuals. The usual implementation in Python is PyEvolve, a pure Python package that isn’t depended on anything except if you want to save and plot populations.

Creating a genome and a cost function

So my goal is to minimize crosscorrelation and autocorrelation, safe for the middle value of autocorrelation. I won’t use a fixed-size sequence, I will encode in the genome the distance between two impulses. This will allow for a really small genome with a lot of possibilities.

For this purpose, I create a class named OrthogonalSequences:

class OrthogonalSequences(object):
  def __init__(self, size, number):
    self.size = size
    self.number = number
    self.genome_length = size * number

  def convert_slice_to_time(chromosome):
    Converts a chromosome in a time sequence
    total = numpy.cumsum(chromosome)
    array = numpy.zeros(total[-1]+1)
    array[total] = 1
    array[0] = 1
    return array

  def evaluate(self, chromosomes):
    Evaluate the cost (square of error) of the genome by evaluating cross- and autocorrelation.
    value = 0
    for i in range(self.number):
      slice_i = self.convert_slice_to_time(chromosomes[i * self.size:(i+1) * self.size])
      value += .1 * len(slice_i)
      for j in range(i, self.number):
        slice_j = self.convert_slice_to_time(chromosomes[j * self.size:(j+1) * self.size])
        correlation = numpy.correlate(slice_i, slice_j, mode = "full")
        if (i == j):
          correlation[len(slice_i)-1] -= self.size

        value += numpy.sum(correlation ** 2)

    return value

In fact this class will only be used to evaluate a genome. As I have two combs (and more if you’d like), I need to pass this information to the evaluation function, and thus nothing is more easy than Python for this. I’ve also added a small factor that adds a cost for lengthy combs.

Setting up the optimizer

Now, the only thing remaining is to set all objects and launch the optimization:

if __name__ == "__main__":
  from pyevolve import G1DList, GSimpleGA, Consts
  import sys

  size = int(sys.argv[1]) if len(sys.argv) > 1 else 10
  size -= 1
  number = int(sys.argv[2]) if len(sys.argv) > 2 else 2
  rangemax = int(sys.argv[3]) if len(sys.argv) > 3 else 100
  generations = int(sys.argv[4]) if len(sys.argv) > 4 else 50
  population = int(sys.argv[5]) if len(sys.argv) > 5 else 100

  genome = G1DList.G1DList(number*size)
  sequences = OrthogonalSequences(size, number)
  ga = GSimpleGA.GSimpleGA(genome)

  genome.setParams(rangemin=1, rangemax=rangemax)


  ga.evolve(freq_stats = generations/10)
  print ga.bestIndividual()


Now the result is the following for 2 sequences of 10 impulses:

Gen. 1 (2.00%): Max/Min/Avg Fitness(Raw) [436.73(387.20)/317.81(349.20)/363.94(363.94)]
Gen. 5 (10.00%): Max/Min/Avg Fitness(Raw) [417.89(375.20)/340.39(345.20)/348.24(348.24)]
Gen. 10 (20.00%): Max/Min/Avg Fitness(Raw) [419.21(379.20)/339.65(345.20)/349.34(349.34)]
Gen. 15 (30.00%): Max/Min/Avg Fitness(Raw) [417.72(377.20)/341.16(345.20)/348.10(348.10)]
Gen. 20 (40.00%): Max/Min/Avg Fitness(Raw) [419.40(375.20)/337.80(345.20)/349.50(349.50)]
Gen. 25 (50.00%): Max/Min/Avg Fitness(Raw) [419.86(389.20)/341.55(345.20)/349.88(349.88)]
Gen. 30 (60.00%): Max/Min/Avg Fitness(Raw) [418.25(381.20)/341.41(345.20)/348.54(348.54)]
Gen. 35 (70.00%): Max/Min/Avg Fitness(Raw) [418.08(375.20)/340.08(345.20)/348.40(348.40)]
Gen. 40 (80.00%): Max/Min/Avg Fitness(Raw) [418.42(375.20)/339.53(345.20)/348.68(348.68)]
Gen. 45 (90.00%): Max/Min/Avg Fitness(Raw) [418.80(381.20)/340.76(345.20)/349.00(349.00)]
Gen. 50 (100.00%): Max/Min/Avg Fitness(Raw) [417.94(371.20)/338.92(345.20)/348.28(348.28)]
Total time elapsed: 2.478 seconds.
- GenomeBase
        Score:                   345.200000
        Fitness:                 338.919595

        Slot [Evaluator] (Count: 1)
                Name: evaluate
        Slot [Initializator] (Count: 1)
                Name: G1DListInitializatorInteger
                Doc:  Integer initialization function of G1DList

   This initializator accepts the *rangemin* and *rangemax* genome parameters.

        Slot [Mutator] (Count: 1)
                Name: G1DListMutatorSwap
                Doc:  The mutator of G1DList, Swap Mutator
        Slot [Crossover] (Count: 1)
                Name: G1DListCrossoverSinglePoint
                Doc:  The crossover of G1DList, Single Point

   .. warning:: You can't use this crossover method for lists with just one element.

- G1DList
        List size:       18
        List:            [36, 48, 1, 11, 95, 37, 34, 38, 25, 10, 29, 64, 19, 11, 62, 68, 7, 15]

Here are the visual results for this optimization:

The first comb
The second comb
Correlations between the combs

They are not perfect, far from it, but remember that they are only combs with positive impulses. This means that you can’t get a perfect result. An almost-perfect result would be that the crosscorrelation is below 1, but with only one value of 2, it’s still pretty good.

Before the end

The process is very fast for 2 sequences of 10 impulses. If you add more sequences and more impulses, the computation time will rise fastly, but PyEvolve has now the ability to use the multiprocessing module, which means it can do parallel optimizations now.

PyEvolve can do much more, I only barely scratch the surface of its capablities, but I hope this example will make you try this package.

Buy Me a Coffee!
Other Amount:
Your Email Address:

2 thoughts on “Genetic algorithms in Python

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.