Neuroevolution in the Snake Game
Overview
This project was developed during my M.Sc. studies as an exploration of applying artificial intelligence to classic arcade environments.
It implements a Snake Game controlled by an AI agent trained via neuroevolution —
a technique that applies evolutionary algorithms to optimize the parameters of neural networks.
The underlying model is a Multilayer Perceptron evolved through a genetic algorithm,
which iteratively refines the network’s weights and biases across successive generations to improve performance.
Neural Network Architecture
The network consists of an input layer with 25 neurons, two hidden layers with 16 and 8 neurons (each using the ReLU activation function), and an output layer with 4 neurons.
This architecture maps the snake’s perception of its environment to its movement decisions.
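As a concrete illustration, here is a minimal NumPy sketch of such a feed-forward pass (the function names and the Gaussian initialization are illustrative assumptions, not the project’s actual code):

```python
import numpy as np

# Layer sizes as described above: 25 inputs, hidden layers of 16 and 8, 4 outputs.
LAYER_SIZES = (25, 16, 8, 4)

def init_network(rng):
    """Randomly initialize one individual's weights and biases."""
    params = []
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:]):
        w = rng.normal(0.0, 1.0, size=(n_in, n_out))
        b = rng.normal(0.0, 1.0, size=n_out)
        params.append((w, b))
    return params

def forward(params, x):
    """Feed-forward pass: ReLU on the two hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:      # hidden layers only
            x = np.maximum(x, 0.0)   # ReLU
    return x  # four raw activations, one per movement direction
```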
Input Representation
At each timestep, the snake observes its surroundings along eight spatial directions.
For each direction, three quantities are computed:
- The normalized distance to the nearest wall
- A binary indicator denoting the presence of an apple
- A binary indicator denoting the presence of the snake’s own body
These values yield \(3 \times 8 = 24\) features, augmented by a final input representing the current body length, for a total of 25 input neurons.
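The exact board representation is project-specific; the sketch below assumes a square grid with integer cell coordinates, a set of body cells, and the eight compass directions (all hypothetical names):

```python
import numpy as np

# The eight look directions: N, NE, E, SE, S, SW, W, NW (an assumed ordering).
DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1)]

def observe(head, body, apple, grid_size):
    """Build the 25-dimensional input vector described above.

    head and apple are (x, y) cells, body is a set of (x, y) cells, and
    grid_size is the side length of the square board (all assumptions).
    """
    features = []
    for dx, dy in DIRECTIONS:
        x, y = head
        distance = 0
        sees_apple = 0.0
        sees_body = 0.0
        # Walk along the ray until the next step would leave the board.
        while 0 <= x + dx < grid_size and 0 <= y + dy < grid_size:
            x, y = x + dx, y + dy
            distance += 1
            if (x, y) == apple:
                sees_apple = 1.0
            if (x, y) in body:
                sees_body = 1.0
        features += [distance / grid_size, sees_apple, sees_body]
    # 25th input: body length (normalized here; the original may use the raw count).
    features.append(len(body) / (grid_size * grid_size))
    return np.array(features)  # shape (25,)
```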
Output Semantics
The network produces four continuous outputs, corresponding to the possible movement directions:
- Left (L)
- Down (D)
- Right (R)
- Up (U)
The direction associated with the highest activation value is selected as the next movement action.
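In code, this decision rule is a simple argmax over the four outputs (the ordering below is assumed to match the list above):

```python
import numpy as np

ACTIONS = ("L", "D", "R", "U")  # assumed to match the order of the output units

def choose_action(outputs):
    """Pick the direction whose output unit has the highest activation."""
    return ACTIONS[int(np.argmax(outputs))]

# Example: with these raw activations the snake moves Right.
print(choose_action(np.array([0.1, -0.3, 0.9, 0.2])))  # -> "R"
```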
Neuroevolutionary Training Process
The agent’s control policy is optimized using a genetic algorithm applied to a population of 4,000 neural networks.
Each individual (i.e., snake) encodes its network’s weights and biases as a chromosome initialized with random values.
In each generation, individuals are evaluated with a fitness function that rewards efficient goal-seeking and penalizes unproductive behavior.
Through successive generations, this evolutionary process drives the emergence of increasingly competent strategies.
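At a high level, the training loop might look like the following sketch. Here `init_network` comes from the architecture sketch above, `select_parent`, `crossover`, and `mutate` are defined in the sketch under "Selection and Genetic Operators" below, and `play_episode` is a placeholder for running one game and returning the individual’s fitness; the generation count is illustrative, as the source does not state a budget.

```python
import numpy as np

POPULATION_SIZE = 4000
GENERATIONS = 100  # illustrative; not specified by the project

def evolve(rng):
    population = [init_network(rng) for _ in range(POPULATION_SIZE)]
    for _ in range(GENERATIONS):
        # Evaluate: play one game per individual, scored by the fitness function.
        scores = [play_episode(individual) for individual in population]
        # Reproduce: fitness-proportional selection, then crossover and mutation.
        offspring = []
        for _ in range(POPULATION_SIZE):
            p1 = select_parent(population, scores, rng)
            p2 = select_parent(population, scores, rng)
            offspring.append(mutate(crossover(p1, p2, rng), rng))
        population = offspring
    return population
```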
Fitness Function
The fitness function guiding evolution is defined as:
\[f(a, s_c, s_f) = e^{a} + 0.05 \cdot s_c - 0.1 \cdot s_f\]
where:
- \(a\): number of apples collected
- \(s_c\): number of steps taken toward the apple
- \(s_f\): number of steps taken away from the apple
The exponential term makes apple collection dominate the score, while the linear step terms reward efficient approach paths and penalize wandering.
All distances are measured using the Euclidean metric.
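Written out directly, the fitness function is a one-to-one transcription of the formula above:

```python
import math

def fitness(a, s_c, s_f):
    """f(a, s_c, s_f) = e^a + 0.05 * s_c - 0.1 * s_f

    a:   apples collected
    s_c: steps that decreased the Euclidean distance to the apple
    s_f: steps that increased it
    """
    return math.exp(a) + 0.05 * s_c - 0.1 * s_f
```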
Selection and Genetic Operators
Parent selection is performed using the roulette-wheel method, favoring individuals with higher fitness scores.
Offspring are generated through crossover and mutation, with a mutation rate of \(0.005\%\).
This combination of selection pressure and stochastic variation enables the progressive refinement of the neural controllers, converging toward effective behavioral policies over time.
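The source does not specify the crossover scheme or the mutation distribution, so the sketch below assumes uniform crossover and Gaussian mutation over a flattened chromosome; only the roulette-wheel selection and the 0.005% mutation rate come from the description above:

```python
import numpy as np

MUTATION_RATE = 0.00005  # 0.005%, applied per gene

def flatten(params):
    """Concatenate all weights and biases into one chromosome vector."""
    return np.concatenate([arr.ravel() for w, b in params for arr in (w, b)])

def unflatten(chromosome, like):
    """Rebuild (weight, bias) pairs with the same shapes as `like`."""
    params, i = [], 0
    for w, b in like:
        nw, nb = w.size, b.size
        params.append((chromosome[i:i + nw].reshape(w.shape),
                       chromosome[i + nw:i + nw + nb].reshape(b.shape)))
        i += nw + nb
    return params

def select_parent(population, scores, rng):
    """Roulette wheel: sample an individual proportionally to its fitness."""
    s = np.asarray(scores, dtype=float)
    s = s - s.min() + 1e-9  # shift so every probability is positive
    return population[rng.choice(len(population), p=s / s.sum())]

def crossover(p1, p2, rng):
    """Uniform crossover (an assumption): each gene comes from either parent."""
    c1, c2 = flatten(p1), flatten(p2)
    mask = rng.random(c1.size) < 0.5
    return unflatten(np.where(mask, c1, c2), p1)

def mutate(params, rng):
    """Gaussian perturbation (an assumption) of a tiny fraction of genes."""
    c = flatten(params)
    mask = rng.random(c.size) < MUTATION_RATE
    c[mask] += rng.normal(0.0, 1.0, size=int(mask.sum()))
    return unflatten(c, params)
```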