Due Monday, 2009-02-23
Download
these ten Python modules and one data file.
You will create a new module, learner.py, and will
also make minor edits to thing.py.
Submit all files that you create or edit.
The program runs similarly, except for a few improvements suggested by various students and a few changes that make it possible to implement learning.
The main difference is in the Bunny class.
If you use the settings in world,
bunnies have two sensors, a 360-degree Touch
sensor and a four-feeler Feel sensor.
Each feeler feels what is at its end or any
Wall that it passes through.
To make the world relatively constant, kitties and veggies
never die.
Bunnies die when their strength goes to 0; they do not age.
There are three methods in the Critter class that you will need to understand.
sensory_state() calls the critter's sensors and
returns a vector of numbers that gets passed to the critter's network (an instance of the Network class).
For Touch and Feel, these are 1s and 0s representing the presence or absence of different "textures" (see sense.py) as detected by the four feelers and the touch sensor.
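As a rough illustration of what such a vector of 1s and 0s might look like, the sketch below flattens one texture reading per sensing point (four feelers plus the touch sensor) into a single list. The texture names, the ordering, and the encode_sensors() helper are all hypothetical; the real textures and layout are defined in sense.py and may differ.

```python
# Hypothetical texture names; the real ones are defined in sense.py.
TEXTURES = ["soft", "furry", "hard"]


def encode_sensors(readings, textures=TEXTURES):
    """Flatten per-sensor texture readings into a vector of 1s and 0s.

    readings holds one entry per sensing point (for example four feelers
    plus the touch sensor). Each entry is a texture name, or None when
    nothing is sensed at that point.
    """
    vector = []
    for sensed in readings:
        # One slot per texture: 1 if that texture is present, else 0.
        vector.extend(1 if sensed == texture else 0 for texture in textures)
    return vector


# Four feelers followed by the touch sensor (assumed ordering).
example = encode_sensors(["hard", None, "soft", None, "furry"])
```

With three textures and five sensing points, the vector has 15 entries, and a sensing point that detects nothing contributes three 0s.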
run_network() runs the critter's network with
the output of sensory_state() as input.
If a non-empty target list is provided,
the network updates its weights using the delta rule.
Note that the Bunny class is set so that the outputs of its network
have the identity function as their activation function (linear=True).
In other words, the activations are not restricted to the [-1,1] range.
If the network's window is open (open it by clicking on the bunny),
the activations and target are displayed in the window.
You can see the weights by pressing the mouse on an output unit.
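The weight update described above can be sketched as follows, assuming a single layer of weights feeding linear output units. This is not the Network class itself: NO_TARGET here is a hypothetical stand-in for Network.NO_TARGET, and the function names and the list-of-lists weight layout are assumptions made for illustration.

```python
NO_TARGET = None  # hypothetical stand-in for Network.NO_TARGET


def linear_outputs(weights, inputs):
    """Identity activation: each output is just its weighted input sum."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def delta_rule_update(weights, inputs, targets, lr):
    """Apply one delta-rule step in place and return the pre-update outputs.

    weights[j][i] connects input i to output j. Outputs whose target is
    NO_TARGET are left untouched.
    """
    outputs = linear_outputs(weights, inputs)
    for j, target in enumerate(targets):
        if target is NO_TARGET:
            continue  # no error signal for this output unit
        error = target - outputs[j]
        for i, x in enumerate(inputs):
            weights[j][i] += lr * error * x
    return outputs


weights = [[0.0, 0.0], [0.5, 0.5]]
outputs = delta_rule_update(weights, inputs=[1, 0], targets=[1.0, NO_TARGET], lr=0.5)
```

Because the outputs are linear, the error term is simply the target minus the activation, with no derivative of a squashing function involved.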
step() calls sensory_state() and runs its network
with this as input.
Then it calls decide() in the critter's memory
to get the index of an action and executes the action.
Then it updates the critter's strength on the basis of the different
reinforcements received.
The parameter values given in world specify stable populations of kitties and veggies, a variable population of bunnies, and walls dividing the world.
The reinforcement values that are relevant for this assignment are
STEP_COST, the cost of living one time step
MOVE_COST, the cost incurred with a move
TURN_COST, the cost incurred with a turn
BUMP_COST, the cost incurred for bumping into a wall
EAT_COST, the cost incurred for eating (or attempting to eat)
HIDE_COST, the cost incurred for hiding
CHEWED_COST, the cost incurred when a bunny is chewed on by a kitty
FOOD_VALUE, the proportion of a veggie's strength that a bunny gains when it eats the veggie
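To make the role of these values concrete, here is a minimal sketch of how one time step's net reinforcement might be totalled. The numbers are invented for illustration (the real ones are in world), the costs are assumed to be stored as positive magnitudes that get subtracted, and net_reinforcement() and the action names are hypothetical; the actual bookkeeping in step() may combine these differently.

```python
# Hypothetical magnitudes, chosen only for illustration.
STEP_COST = 1
MOVE_COST = 2
TURN_COST = 1
BUMP_COST = 5
EAT_COST = 1
HIDE_COST = 1
CHEWED_COST = 20
FOOD_VALUE = 0.5

# Assumed mapping from action name to its cost.
ACTION_COSTS = {
    "move": MOVE_COST,
    "turn": TURN_COST,
    "eat": EAT_COST,
    "hide": HIDE_COST,
}


def net_reinforcement(action, bumped=False, chewed=False, veggie_strength=0):
    """Total reinforcement for one time step under the assumptions above."""
    reinforcement = -STEP_COST - ACTION_COSTS[action]
    if bumped:
        reinforcement -= BUMP_COST
    if chewed:
        reinforcement -= CHEWED_COST
    if action == "eat":
        # The bunny gains a proportion of the veggie's strength.
        reinforcement += FOOD_VALUE * veggie_strength
    return reinforcement


successful_meal = net_reinforcement("eat", veggie_strength=40)
wall_bump = net_reinforcement("move", bumped=True)
```

Under these made-up values, a failed attempt to eat (veggie_strength of 0) still pays STEP_COST and EAT_COST, which is consistent with the cost being charged "for eating (or attempting to eat)".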
Three parameters are relevant for Q-learning:
DISCOUNT, the discount rate (γ) in the Q-learning rule
LR, the learning rate in the network
EXPLOITATION, which controls the extent to which decisions are based on neural network weights
In addition to creating learner.py, you will have to do the following in thing.py:
Add from learner import * at the top.
Write the body of Bunny.set_learner(), which should replace pass in the method.
Replace pass below # Q-learning in step() with code that interacts with the learner module, completing the implementation of Q-learning.
The bunny calls decide() in its memory to select an action index.
decide() uses the Luce choice rule, with EXPLOITATION as a parameter.
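One common way to write a Luce choice rule with an exploitation parameter is to make each action's response strength the exponential of its network output times that parameter, and then choose in proportion to strength. The sketch below assumes that form; the decide() supplied with the assignment may compute its strengths differently, and the names luce_probabilities() and choose_action() are hypothetical.

```python
import math
import random


def luce_probabilities(values, exploitation):
    """Choice probabilities proportional to exp(exploitation * value)."""
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    highest = max(values)
    strengths = [math.exp(exploitation * (v - highest)) for v in values]
    total = sum(strengths)
    return [s / total for s in strengths]


def choose_action(values, exploitation, rng=random):
    """Sample an action index according to the Luce probabilities."""
    probabilities = luce_probabilities(values, exploitation)
    return rng.choices(range(len(values)), weights=probabilities, k=1)[0]


index = choose_action([0.2, 1.5, -0.3], exploitation=2.0, rng=random.Random(0))
```

In this form, an exploitation value of 0 makes every action equally likely (pure exploration), while larger values concentrate the choice on the action with the highest network output.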
Use Network.NO_TARGET for
the irrelevant positions in the list.
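A minimal sketch of building such a target list under the standard Q-learning rule: only the position of the action actually taken gets a target, namely the reinforcement received plus the discounted best network output for the next state. NO_TARGET below is a hypothetical stand-in for Network.NO_TARGET, and q_targets(), its arguments, and the DISCOUNT value are assumptions for illustration, not the interface that learner.py is required to have.

```python
NO_TARGET = None  # hypothetical stand-in for Network.NO_TARGET
DISCOUNT = 0.9    # hypothetical value; the real setting is in world


def q_targets(n_actions, action, reinforcement, next_q_values, discount=DISCOUNT):
    """Build a target list with a Q-learning target for one action only.

    The target for the chosen action is r + discount * max over a' of
    Q(s', a'); every other position is marked NO_TARGET so the network
    leaves the corresponding output unit alone.
    """
    targets = [NO_TARGET] * n_actions
    targets[action] = reinforcement + discount * max(next_q_values)
    return targets


targets = q_targets(3, action=1, reinforcement=2.0,
                    next_q_values=[0.0, 4.0, 1.0], discount=0.5)
```

Here next_q_values would be the network's outputs for the sensory state observed after the action, so the network has to be run on the new state before the target for the previous state can be formed.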
world.
If you adjust any of these, explain how and why.
DISCOUNT
LR
EXPLOITATION
TURN_COST