Of course, the purpose of sexual reproduction is to perpetuate our species by having offspring. Combined with natural selection, it enables our genes to fit our environment quickly. But why does it take two mates to produce a single offspring? Would asexual reproduction or having 3+ parents be more advantageous?

The end goal of reproduction is to have the offspring breed their own offspring. Since genes influence the probability of survival and breeding, one goal of reproduction is to give the offspring as many genes that help them survive and reproduce as possible. This is a very information-theoretic view: how quickly can biology transfer important information?

A species can evolve more quickly (reaching optimal fitness sooner) when its members share genetic information! Sharing genetic material is especially powerful when combined with natural selection. Correspondingly, most species use two mates when reproducing.

This isn’t surprising when compared with asexual reproduction, which involves a single parent. Sharing the information used to survive is more advantageous than not sharing it.

But would having three or more parents help our genome advance even more quickly? In this post I’ll run a simulation to estimate an answer to this question.

This post is inspired by an information theory lecture by Varun Jog, which is in turn inspired by a chapter in the textbook “Information Theory, Inference, and Learning Algorithms” by David MacKay.

Model

A simulation of evolution needs four main parts:

  • an individual’s genes
  • a way to determine an individual’s fitness
  • inheritance of genes when producing offspring
  • natural selection

Individual gene and fitness

Each individual’s DNA will be a binary sequence. This is a fair representation of DNA. The main difference is that actual DNA carries 2 bits of information per base pair, since each position holds one of four bases.
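The sketch below makes the 2-bits-per-base-pair comparison concrete by packing a DNA string into the same kind of binary sequence. The mapping BASE_TO_BITS and the helper dna_to_bits are hypothetical. Any one-to-one assignment of the four bases to 2-bit codes would work equally well.

```python
import numpy as np

# Hypothetical 2-bit code for the four bases; any one-to-one mapping would do.
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def dna_to_bits(sequence):
    """Encode a DNA string as a flat 0/1 array using 2 bits per base."""
    return np.array([bit for base in sequence for bit in BASE_TO_BITS[base]], dtype=np.int8)


bits = dna_to_bits("GATTACA")
print(len(bits))      # 7 bases -> 14 bits
print(bits.tolist())  # [1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
```

The model in this post skips this encoding step and works with the bits directly.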

We’ll model the fitness of an individual with genes $g_i \in \{0, 1\}$ as

$$ \text{fitness } = \sum_i g_i $$

This is a sensible model because it mirrors actual fitness in relative terms. If one is already really fit, a little more fitness won’t help much (e.g., fitness of 99 → 100 is a small percentage change). If one is not fit, getting a little fitter helps a ton (e.g., fitness of 1 → 2 is a large percentage change).
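Each extra gene adds the same absolute fitness. What differs is the relative change. For the two examples above:

$$ \frac{100 - 99}{99} \approx 1\% \qquad \text{versus} \qquad \frac{2 - 1}{1} = 100\% $$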

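import numpy as np

def fitness(member):
    """Fitness of one genome (1D array), or mean fitness of a population (2D array)."""
    member = np.asarray(member)
    if member.ndim == 2:
        # one row per individual: average the per-individual fitnesses
        return member.sum(axis=1).mean()
    # a single individual: fitness is the number of 1 genes
    return member.sum()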

Inheritance

We’ll model the probability of pulling each gene from parent $i$ as $\frac{1}{n}$ when there are $n$ parents, which mirrors how reproduction works.

While implementing this we also include mutation, which flips each gene independently with probability $p$. Mutation is a naturally occurring process.

def produce_offspring(parents, mutate=False, p=0.00):
    """Build one child from `parents`, an array of shape (n_parents, n_genes)."""
    n_parents, n_genes = parents.shape
    # for each gene position, pick the parent it is copied from (probability 1/n_parents each)
    gene_to_pull = np.random.randint(n_parents, size=n_genes)
    child = parents[gene_to_pull, np.arange(n_genes)]  # advanced indexing returns a copy

    if mutate:
        # flip each gene independently with probability p
        genes_to_flip = np.random.random(size=n_genes) < p
        child[genes_to_flip] = 1 - child[genes_to_flip]

    return child
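The sketch below is a quick numerical check of the two probabilities that produce_offspring relies on. It draws parent assignments and mutation flips directly, using NumPy's Generator API instead of the np.random functions used in the post. The values n_parents = 3, n_genes = 300_000 and p = 0.01 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the check is reproducible
n_parents, n_genes, p = 3, 300_000, 0.01  # arbitrary illustrative values

# which parent each gene is copied from: uniform over the n parents
source = rng.integers(n_parents, size=n_genes)
share_per_parent = np.bincount(source, minlength=n_parents) / n_genes

# mutation: flip each gene independently with probability p
flipped = rng.random(n_genes) < p
flip_rate = flipped.mean()

print(share_per_parent)  # each entry should be close to 1/3
print(flip_rate)         # should be close to 0.01
```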

One generation

Each generation of parents will produce twice as many children as there are parents, and we’ll kill half of those children to simulate natural selection.

We’ll produce $2N$ children when we have $N$ parents, regardless of how many parents are required to produce each offspring. To do this, each group of $n$ parents produces $2n$ children. If some settings produced more children, natural selection would have more candidates to choose from. That would bias the comparison, because natural selection keeps the strongest candidates.

def one_generation(members, n_parents=2, **kwargs):
    """ members: 2D np.ndarray, members.shape = (n_members, n_genes)
        n_parents: number of parents needed to produce each child """
    parents = np.random.permutation(members)
    children = []
    for group in range(len(parents) // n_parents):
        parents_group = parents[group * n_parents : (group + 1) * n_parents]
        children += [produce_offspring(parents_group, **kwargs) for _ in range(2 * n_parents)]
    children = np.array(children)

    # we produce (approximately) 2*N children from N parents; members that don't
    # fill a whole group are skipped, so we can fall short by less than 2*n_parents
    assert abs(len(children) - 2 * len(parents)) < 2 * n_parents
    return children
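The child count in one_generation is only approximately 2N, because members that don't fill a whole group are skipped. The sketch below computes that count. The helper n_children is hypothetical and simply mirrors the loop in one_generation.

```python
def n_children(n_members, group_size):
    """Children one_generation would produce: n_members // group_size groups,
    each making 2 * group_size children; leftover members are skipped."""
    n_groups = n_members // group_size
    return n_groups * 2 * group_size


for group_size in [1, 2, 3, 4]:
    shortfall = 2 * 2000 - n_children(2000, group_size)
    print(group_size, n_children(2000, group_size), shortfall)
# 1 4000 0
# 2 4000 0
# 3 3996 4
# 4 4000 0
```

The shortfall is 2 × (N mod n), which is always less than 2n. With 2000 members and 3 mates, the first generation comes up 4 children short of 2N.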

Simulations

Now we want to simulate evolution, starting from a random initial population.

In the natural selection process we’ll kill off the less fit half of the children, leaving $N$ parents for the next generation.
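Keeping the fitter half in this way is sometimes called truncation selection. The sketch below applies this step to a toy population, assuming fitness is the row sum as in our model. The function name truncation_select is hypothetical.

```python
import numpy as np


def truncation_select(children):
    """Keep the fitter half of `children` (one genome per row, fitness = row sum)."""
    fitness = children.sum(axis=1)
    order = np.argsort(fitness)  # indices in ascending order of fitness
    return children[order[len(children) // 2:]]


children = np.array([[0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 0, 1]])  # fitnesses 1, 3, 0, 2
survivors = truncation_select(children)
print(survivors.sum(axis=1))  # [2 3]
```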

At each generation we’ll record the mean fitness of the surviving parents along with the simulation parameters. We’ll plot this fitness below.

def evolve(n_parents=2000, n_mates=2, n_genes=200, n_generations=100, p_fit=0.5, verbose=10):
    # random initial population: each gene is 1 with probability p_fit
    parents = np.random.choice([0, 1], size=(n_parents, n_genes), p=[1-p_fit, p_fit])

    data = []
    for generation in range(n_generations):
        if verbose and generation % verbose == 0:
            print('Generation {} for n_mates = {} with {} parents'.format(generation, n_mates, len(parents)))

        children = one_generation(parents, n_parents=n_mates, mutate=True, p=0.01)

        # natural selection: sort children by fitness and kill the less fit half
        children_fitness = children.sum(axis=1)
        i = np.argsort(children_fitness)
        parents = children[i][len(children) // 2:]

        data += [{'fitness': fitness(parents), 'generation': generation,
                  'n_mates': n_mates, 'n_parents': n_parents,
                  'n_genes': n_genes, 'n_generations': n_generations}]

    return data
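The per-child Python loops above are easy to read but slow for large populations. Below is a vectorized sketch of the same model using NumPy's Generator API and np.take_along_axis. The function name evolve_vectorized and the p_mutate and seed parameters are hypothetical choices of ours. The function returns only the mean fitness per generation, and it is not the code that produced the results below.

```python
import numpy as np


def evolve_vectorized(n_members=2000, n_mates=2, n_genes=200, n_generations=100,
                      p_fit=0.5, p_mutate=0.01, seed=None):
    """Vectorized sketch of the model above; returns mean fitness per generation."""
    rng = np.random.default_rng(seed)
    # initial population: each gene is 1 with probability p_fit
    population = (rng.random((n_members, n_genes)) < p_fit).astype(np.int8)
    history = []
    for _ in range(n_generations):
        # shuffle, drop members that don't fill a whole group, split into groups
        n_groups = len(population) // n_mates
        shuffled = rng.permutation(population)[: n_groups * n_mates]
        groups = shuffled.reshape(n_groups, n_mates, n_genes)

        # each group makes 2 * n_mates children; every gene of every child is
        # copied from a parent chosen uniformly at random within its group
        source = rng.integers(n_mates, size=(n_groups, 2 * n_mates, n_genes))
        children = np.take_along_axis(groups, source, axis=1).reshape(-1, n_genes)

        # mutation: flip each gene independently with probability p_mutate
        flip = rng.random(children.shape) < p_mutate
        children = np.where(flip, 1 - children, children)

        # natural selection: keep the fitter half of the children
        order = np.argsort(children.sum(axis=1))
        population = children[order[len(children) // 2:]]
        history.append(population.sum(axis=1).mean())
    return history


history = evolve_vectorized(n_members=200, n_genes=50, n_generations=20, seed=0)
print(len(history), history[0], history[-1])
```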

Data collection

Then we can run this for different numbers of mates required to produce one offspring:

data = []
for n_mates in [1, 2, 3, 4]:
    data += evolve(n_mates=n_mates, p_fit=0.50)

Results

from altair import Chart
import pandas as pd

df = pd.DataFrame(data)
# 'n_mates:N' treats the number of mates as a category, giving one line color per value
Chart(df).mark_line().encode(
    x='generation', y='fitness', color='n_mates:N')

Asexual reproduction requires ~75 generations to reach the fitness that sexual reproduction reaches in ~15 generations. Sexual reproduction appears to be fairly close to optimal in this model.
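One way to read comparisons like this from the recorded data instead of the plot is a small helper. For each n_mates, it finds the first generation whose mean fitness reaches a chosen threshold. The name first_generation_reaching is hypothetical. The toy records below are made-up values with the same keys as data, not simulation results.

```python
def first_generation_reaching(records, threshold):
    """For each n_mates, return the first generation whose mean fitness is at
    least `threshold` (None if that setting never reaches it)."""
    result = {}
    for rec in sorted(records, key=lambda r: (r["n_mates"], r["generation"])):
        k = rec["n_mates"]
        result.setdefault(k, None)
        if result[k] is None and rec["fitness"] >= threshold:
            result[k] = rec["generation"]
    return result


# made-up toy records with the same keys as `data` (not simulation output)
toy = [
    {"n_mates": 1, "generation": 0, "fitness": 100.0},
    {"n_mates": 1, "generation": 1, "fitness": 104.0},
    {"n_mates": 2, "generation": 0, "fitness": 100.0},
    {"n_mates": 2, "generation": 1, "fitness": 115.0},
]
print(first_generation_reaching(toy, threshold=110))  # {1: None, 2: 1}
```

On the real results, the call would be first_generation_reaching(data, threshold) with a threshold read off the plot.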

This post is available to download as a Jupyter notebook, Reproduction.ipynb.