Evolving neural networks, revisited
- Evolving what
- Weights
- Unit biases, gains, time constants (as in Beer, 1996)
- Reasons for evolving rather than learning the parameters
- No suitable learning algorithm
- May avoid local minima
- Once found, solutions are stored in the genome and don't have to be rediscovered by each individual
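As a rough illustration of evolving network parameters rather than learning them, the sketch below runs a mutation-only genetic algorithm over the weights, biases, and gains of a tiny feedforward network. The XOR task, the network size, the truncation selection scheme, and all names (`evolve`, `forward`, `GENOME_LEN`) are assumptions made for this example. Time constants are left out because the toy network has no dynamics.

```python
import math
import random

# XOR as a toy task (an assumption; the notes do not name a task).
CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_HIDDEN = 2
# Genome layout: each hidden unit has 2 weights + bias + gain;
# the output unit has N_HIDDEN weights + bias + gain.
GENOME_LEN = N_HIDDEN * 4 + N_HIDDEN + 2


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def forward(genome, inputs):
    hidden = []
    for h in range(N_HIDDEN):
        w1, w2, bias, gain = genome[4 * h: 4 * h + 4]
        hidden.append(sigmoid(gain * (w1 * inputs[0] + w2 * inputs[1] + bias)))
    out = genome[4 * N_HIDDEN:]
    weights, bias, gain = out[:N_HIDDEN], out[N_HIDDEN], out[N_HIDDEN + 1]
    net = sum(w * h for w, h in zip(weights, hidden)) + bias
    return sigmoid(gain * net)


def fitness(genome):
    # Negative summed squared error: higher is better, 0 is perfect.
    return -sum((forward(genome, x) - y) ** 2 for x, y in CASES)


def evolve(generations=200, pop_size=50, sigma=0.5, seed=0):
    rng = random.Random(seed)
    pop = [[rng.gauss(0, 1) for _ in range(GENOME_LEN)] for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[: pop_size // 5]  # truncation selection
        children = [
            [g + rng.gauss(0, sigma) for g in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children  # parents survive unchanged (elitism)
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history
```

Because the parents are carried over unchanged, the best solution found so far is never lost, which is one concrete reading of "solutions are saved".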
The Baldwin Effect
- The Baldwin effect: learning appears to affect evolution; learned traits eventually become genetically determined
- Learning can change the fitness landscape, making it more likely for evolution to find particular solutions and creating the illusion that learned solutions are inherited.
- How learning can solve problems that evolution can't (Hinton & Nowlan, 1987)
- Neural network with 20 potential connections, each either on or off; only one of the `2^20` possible configurations is correct, giving a very spiky fitness surface
- Genome specifies each connection as yes (prob. 0.25), no (prob. 0.25), or ? (prob. 0.5); the ?s are explored during the lifetime through learning trials that set them at random
- Learning makes the fitness landscape smoother, allowing far better performance than for a population with all of the connections specified genetically.
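The setup above can be sketched as a single-lifetime fitness evaluation. This is a minimal sketch under stated assumptions, not the authors' code: the correct configuration is taken to be "all connections on", each lifetime has 1000 learning trials, and fitness is `1 + 19 * n / 1000` with `n` the number of trials left after the correct configuration is first found. The names `random_genome` and `lifetime_fitness` are hypothetical.

```python
import random

N = 20         # number of potential connections
TRIALS = 1000  # learning trials per lifetime (assumed)


def random_genome(rng):
    # '1' = connection fixed on, '0' = fixed off, '?' = left to learning.
    return rng.choices("10?", weights=[0.25, 0.25, 0.5], k=N)


def trials_until_success(unknown, rng, trials):
    """Trials used before every '?' is guessed as on, or None if never."""
    if unknown == 0:
        return 0  # nothing to learn
    for t in range(1, trials + 1):
        if all(rng.random() < 0.5 for _ in range(unknown)):
            return t
    return None


def lifetime_fitness(genome, rng, trials=TRIALS):
    # A connection that is genetically fixed off cannot be repaired by learning.
    if "0" in genome:
        return 1.0
    used = trials_until_success(genome.count("?"), rng, trials)
    if used is None:
        return 1.0
    # Assumed fitness rule: reward finding the solution early in life.
    return 1.0 + 19.0 * (trials - used) / trials
```

Without learning, only the all-ones genome scores above the baseline. With learning, any genome whose fixed alleles are all correct can score above baseline, and the fewer ?s it has the sooner it tends to succeed, which is the smoothing of the landscape described above.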
- How genetic search and hill climbing can work together
- Genetic search gives evidence about what confers fitness in widely separated parts of the search space
- Hill climbing excels at local optimization
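One way to picture this division of labour is a genetic search in which each genotype is scored after a short bout of hill climbing. The sketch below is Baldwinian rather than Lamarckian: the improved value is used only for scoring, and the unimproved genotype is what gets inherited. The one-dimensional landscape `f`, the step-halving hill climber, and all parameter values are hypothetical choices for illustration.

```python
import math
import random


def f(x):
    # Hypothetical multimodal landscape: many local peaks, global peak at x = 0.
    return math.cos(3 * x) - 0.1 * x * x


def hill_climb(x, step=0.1, iters=50):
    # Deterministic local search: move to the best of {x - step, x, x + step},
    # halving the step whenever no neighbour improves.
    for _ in range(iters):
        best = max((x - step, x, x + step), key=f)
        if best == x:
            step /= 2
        x = best
    return x


def evolve(learn, generations=30, pop_size=20, sigma=1.0, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10, 10) for _ in range(pop_size)]
    # With learning, a genotype is scored by the peak it can climb to.
    score = (lambda g: f(hill_climb(g))) if learn else f
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 4]
        children = [rng.choice(parents) + rng.gauss(0, sigma)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children  # genotypes are inherited unimproved
    best = max(pop, key=score)
    return best, score(best)
```

Calling `evolve(learn=True)` and `evolve(learn=False)` with the same seed gives a quick way to compare the two regimes on this toy landscape.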
- Why Lamarckian evolution would be hard to implement: it requires inverting the forward function that maps from genotypes, through development and learning, to adapted phenotypes
- Selective pressure for genes that facilitate the development of certain useful characteristics in response to the environment
- The same characteristic will tend to develop regardless of the environmental factors that originally controlled it; environmental control of the process is supplanted by internal genetic control
- The Baldwin effect is only useful when the space is difficult to search without help from an adaptive process
- Combining reinforcement learning and evolution (Ackley & Littman, slides)
- Separate evaluation (hard-wired; one output unit) and action (learned) networks
- Weights of evaluation network and initial weights of action network specified by genome
- Action network learns by complementary reinforcement back-propagation
- Back-propagation network
- Input vector represents state
- Output vector is a search vector, `vec s`, whose components are the probabilities that the corresponding action bits are 1
- Search vector is stochastically converted to a binary action vector, `vec o`
- For positive reinforcement, the error signal is `vec o - vec s` (target `vec o`)
- For negative reinforcement, the error signal is `(vec 1 - vec o) - vec s` (target `vec 1 - vec o`)
- Separate learning rates for positive and negative cases
- Additional forward passes and weight updates until the original output is generated (positive reinforcement) or a different output is generated (negative reinforcement)
- Reinforcement signal is evaluation network's current output minus its last output
- In an ALife simulation with simple sensory input, food, predators, and obstacles, evolution plus learning outperforms either evolution alone or learning alone
- Early in evolution, the action network weights change more; later the evaluation network weights change more (a Baldwin effect)
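The complementary reinforcement update described above can be sketched for a single layer of sigmoid units. This is a simplification and an assumption: a full back-propagation network would pass the same output error back through hidden layers. The learning rates, the pass limit, and the names `search_vector`, `sample_action`, and `crbp_update` are hypothetical. The reinforcement `r` would be the evaluation network's current output minus its previous output.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def search_vector(weights, state):
    # One sigmoid unit per action bit; the last entry of each row is the bias.
    return [sigmoid(sum(w * x for w, x in zip(row, state)) + row[-1])
            for row in weights]


def sample_action(s, rng):
    # Each bit of o is 1 with the probability given by the search vector s.
    return [1 if rng.random() < p else 0 for p in s]


def crbp_update(weights, state, o, r, rng,
                lr_pos=0.3, lr_neg=0.1, max_passes=20):
    """Update weights in place after action o received reinforcement r."""
    if r == 0:
        return
    for _ in range(max_passes):
        s = search_vector(weights, state)
        if r > 0:   # push s toward the action that was taken
            error, lr = [oi - si for oi, si in zip(o, s)], lr_pos
        else:       # push s toward the complement of that action
            error, lr = [(1 - oi) - si for oi, si in zip(o, s)], lr_neg
        for row, e, si in zip(weights, error, s):
            delta = lr * e * si * (1.0 - si)  # sigmoid derivative
            for j, x in enumerate(state):
                row[j] += delta * x
            row[-1] += delta
        # Stop once the original output (r > 0) or a different one (r < 0)
        # is generated again.
        resampled = sample_action(search_vector(weights, state), rng)
        if (r > 0) == (resampled == o):
            break
```

A typical step would call `sample_action(search_vector(weights, state), rng)` to act, compute `r` from two successive evaluation outputs, and then call `crbp_update` with the state and action that were used.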