Degrees of freedom
- Correspondence between geometry of effector and
geometry of world
- Effectors and degrees of freedom
- Degrees of freedom for the system:
the least number of independent coordinates required to specify
the position of the system elements without violating
any geometric constraints
- DOF for object in space (6), simple object on a flat surface (3),
car (3), car with trailer (4)
- Controllable DOF for car (2); turntable (3); joints: hinge (1), prismatic (1),
rotary (1), ball-and-socket (3); human arm (3+1+3=7)
- Controllable DOF (c) and number of DOF required to specify end state (s)
- When c is less than s, certain end states cannot be reached with a single
set of settings.
- When c equals s, there is one set of settings for each end state.
- When c is greater than s, there are multiple ways to reach a given
end state (indeterminacy).
- The controllable parameters in the system may not be completely
independent: certain combinations of settings may be unreachable.
- Humans have the capacity for motor equivalence: the ability to
achieve the same physical objective in more than one
way; they can handle indeterminacy.
- Cost functions (based on time or energy requirements) may be
included to facilitate choice among actions.
- The inherent dynamical characteristics of the physical system may
be exploited, so that its attractors correspond to desired trajectories.
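The case where c is greater than s, and the role of a cost function, can be made concrete with a toy system. The sketch below is illustrative only: it assumes two prismatic joints in series (c = 2) positioning a point along a line (s = 1), so that the end state is y = u1 + u2, and it assumes a sum-of-squares effort cost. The names `A`, `end_state`, and `effort` are invented for this example.

```python
import numpy as np

# Hypothetical redundant system: two prismatic joints in series (c = 2)
# position a point along a line (s = 1). End state: y = u1 + u2.
A = np.array([[1.0, 1.0]])

def end_state(u):
    """Forward map from joint settings u to the end state y."""
    return A @ u

def effort(u):
    """Assumed cost function: sum of squared joint displacements."""
    return float(u @ u)

y_goal = np.array([1.0])

# Three different settings that all reach the same end state (indeterminacy).
candidates = [
    np.array([1.0, 0.0]),
    np.array([0.25, 0.75]),
    np.array([0.5, 0.5]),
]

# The cost function selects one of the equivalent settings.
best = min(candidates, key=effort)

# For a linear system, the pseudoinverse gives the minimum-effort setting directly.
u_min = np.linalg.pinv(A) @ y_goal
```

All three candidates reach y = 1; the cost function singles out the setting that splits the displacement evenly between the joints, which is also what the pseudoinverse returns.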
Kinematics
- Forward kinematics: given a set of joint angles (or some
other settings of the system), specify a resultant state
- Inverse kinematics: given a goal state, specify a set of
joint angles (or other settings); indeterminacy arises when the number of
controllable degrees of freedom in the system exceeds the number of
degrees of freedom needed to specify the task
- Examples of inverse kinematics problems
- Getting a (human, robotic, animated) hand into a desired
position and orientation
- Producing a given linguistic sound with a set of articulators
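A minimal sketch of both problems for a planar two-link arm, assuming unit link lengths (`LINK1`, `LINK2`) and a goal that specifies hand position only. The function names are illustrative and the closed-form inverse applies to this particular arm, not to effectors in general.

```python
import math

LINK1, LINK2 = 1.0, 1.0  # assumed link lengths

def forward(theta1, theta2):
    """Forward kinematics: joint angles -> hand position (x, y)."""
    x = LINK1 * math.cos(theta1) + LINK2 * math.cos(theta1 + theta2)
    y = LINK1 * math.sin(theta1) + LINK2 * math.sin(theta1 + theta2)
    return x, y

def inverse(x, y):
    """Inverse kinematics: hand position -> list of (theta1, theta2) solutions.

    Returns an empty list when the position is out of reach.
    """
    d = (x * x + y * y - LINK1 ** 2 - LINK2 ** 2) / (2 * LINK1 * LINK2)
    if abs(d) > 1.0:
        return []  # goal lies outside the reachable workspace
    solutions = []
    for sign in (1, -1):  # the two signs give the two mirror-image elbow poses
        theta2 = sign * math.acos(d)
        theta1 = math.atan2(y, x) - math.atan2(
            LINK2 * math.sin(theta2), LINK1 + LINK2 * math.cos(theta2)
        )
        solutions.append((theta1, theta2))
    return solutions

goal = forward(0.3, 0.8)
solutions = inverse(*goal)
```

Even in this small case the inverse is not a function: most reachable positions have two solutions (mirror-image elbow poses), and positions beyond the arm's reach have none.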
Inverse kinematics and learning to control a robot or a body
- The goal is to learn the controller, which takes a goal
state and a state of the world as input and produces a command or
series of commands to the system's effectors.
We'll assume the controller is a neural network.
- If f maps motor patterns onto target states, the problem becomes one of
inverting f. But f is not known a priori, and the inverse of f may not exist.
- The motor commands have consequences in the world: via the environment
(or plant), they get turned into a state of the world.
- But the environment (the physics of the body and other relevant parts
of the world) is outside the system's "nervous system", that is, not
trainable.
- The control system may include both a feedforward and a feedback component.
- Ideally, the system configures itself so that the composition of the
controller (inverse model) and the environment is the
identity function: the inverse model should be an
inverse of the environment.
(y* is the goal state, y the state of the world that is actually
reached. u is the output of the feedforward controller, that is, an action; u′ is the output modified in response to feedback. With no feedback, u = u′. The magenta parts are "inside" (under the control of) the agent.)
Approaches to learning inverse kinematics
- Use feedback: Move the system in the desired direction.
But for humans, response to feedback is slow relative to the high-precision movements they can make, so feedforward approaches seem more fruitful.
- Reinforcement learning:
- For each input state, there is a set of possible actions u, each with an associated Q-value for that state.
- Once an action is selected and transmitted to the environment, a reinforcement signal is computed as a function of the difference between the response and the goal state.
- The Q-value for the selected action and the given input state is adjusted accordingly.
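The three reinforcement-learning steps above can be sketched with a Q-table. Everything specific here is an assumption made for illustration: a hypothetical one-dimensional plant y = 2u, five discrete goal states and five discrete actions, a reinforcement equal to the negative distance between response and goal, and epsilon-greedy action selection. Because each trial consists of a single action, the update has no next-state term; that is a simplification and not necessarily the rule the notes intend.

```python
import random

ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical discrete action set
GOALS = [-2.0, -1.0, 0.0, 1.0, 2.0]    # input states: the goal to be reached

def plant(u):
    """Hypothetical environment, unknown to the learner."""
    return 2.0 * u

def reinforcement(response, goal):
    """Reinforcement as a function of the response-goal difference."""
    return -abs(goal - response)

def greedy_action(q, goal):
    """Action with the highest Q-value for this input state."""
    return max(ACTIONS, key=lambda a: q[(goal, a)])

def train(episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(g, a): 0.0 for g in GOALS for a in ACTIONS}
    for _ in range(episodes):
        goal = rng.choice(GOALS)
        # Select an action: mostly greedy, sometimes exploratory.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = greedy_action(q, goal)
        # Transmit the action to the environment and compute the reinforcement.
        r = reinforcement(plant(action), goal)
        # Adjust the Q-value of the selected action for this input state.
        q[(goal, action)] += alpha * (r - q[(goal, action)])
    return q

q_table = train()
```

After training, `greedy_action(q_table, goal)` picks, for each goal in this toy setup, the action whose response matches the goal.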
- Direct inverse modeling
- Build an inverse model directly by observing the IO behavior of the environment.
- The feedforward controller is an associative learning device trained using gradient descent.
- To generate a training pair, give a random action u′ to the environment, and observe the resulting state of the world y.
- For each training pair <u′, y>, train the feedforward controller, using y as the input and u′ as the target.
- Problem: this approach is not goal-directed: states y that are irrelevant to the task get associated with the test actions u′ that produced them, while the real targets (y*) may be ignored.
- Another problem: with excess DOFs, the IO relationship for the controller is one-to-many. For a given input, an output must be selected. The learning rule averages over the possible outputs, and the average of several valid actions need not itself be valid when the environment is nonlinear.
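The averaging problem can be checked numerically. The sketch below assumes a planar two-link arm with unit link lengths as the nonlinear environment (an illustrative choice, not taken from the notes) and two hand-picked actions `u_a` and `u_b` that produce the same hand position. A least-squares learner given both pairs for the same input is pulled toward the mean of the two targets, so the code evaluates what that mean action does.

```python
import math

def forward(theta1, theta2, len1=1.0, len2=1.0):
    """Assumed nonlinear environment: planar two-link arm, angles -> (x, y)."""
    x = len1 * math.cos(theta1) + len2 * math.cos(theta1 + theta2)
    y = len1 * math.sin(theta1) + len2 * math.sin(theta1 + theta2)
    return x, y

# Two different actions (mirror-image elbow poses) with the same outcome.
u_a = (0.3, 0.8)
u_b = (1.1, -0.8)
y_a = forward(*u_a)
y_b = forward(*u_b)

# The average of the two valid actions ...
u_mean = tuple((a + b) / 2 for a, b in zip(u_a, u_b))
y_mean = forward(*u_mean)

# ... does not reach the state that both of them reach.
miss = math.dist(y_mean, y_a)
```

Here `y_a` and `y_b` coincide, but the averaged action straightens the elbow and overshoots the shared target by about 0.16 length units.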
- Forward modeling (Jordan & Rumelhart)
- Have a separate component in the system (usually a separate neural network) whose job is to learn a model of the environment.
- Train this forward model, using gradient descent, by observing the environment when you give it
different actions.
- Then freeze the forward model, and use it to train the controller.
- The system being trained now includes a model of the world, as well as the controller.
- Since the "world" is now in the system, error can be
propagated all the way back to the controller from the predicted
outcome (y).
- To train the controller, the goal state y* serves as both the input and the target.
- The error is the difference between the target y* and the observed behavior of the whole system, y (or the behavior predicted by the forward model, y′).
- For dynamic performance, use sequences of angular accelerations as goals and torques as actions.
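The forward-modeling procedure can be sketched end to end on a toy problem. To keep the example short and verifiable, it assumes a linear plant with three controllable DOF and a two-dimensional end state, and uses single linear layers for both the forward model and the controller; a real system would use multilayer networks and backpropagation. The names `A_TRUE`, `plant`, `train_forward_model`, and `train_controller`, as well as the learning rates and step counts, are all invented for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plant, unknown to the learner: 3 controllable DOF, 2-D end state.
A_TRUE = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, 0.5]])

def plant(u):
    """The environment: turns an action u into a state of the world y."""
    return A_TRUE @ u

def train_forward_model(steps=2000, lr=0.1):
    """Learn a model of the environment from random actions (gradient descent)."""
    w_f = np.zeros((2, 3))
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0, size=3)  # random test action
        y = plant(u)                        # observed outcome
        y_pred = w_f @ u                    # forward model's prediction y'
        w_f += lr * np.outer(y - y_pred, u)
    return w_f

def train_controller(w_f, steps=3000, lr=0.1):
    """Train the controller through the frozen forward model."""
    w_c = np.zeros((3, 2))
    for _ in range(steps):
        y_goal = rng.uniform(-1.0, 1.0, size=2)  # y*: both input and target
        u = w_c @ y_goal                         # controller's action
        y = plant(u)                             # observed behavior
        err = y_goal - y                         # error measured on outcomes
        grad_u = w_f.T @ err                     # propagate error back through the model
        w_c += lr * np.outer(grad_u, y_goal)     # only the controller is updated
    return w_c

w_f = train_forward_model()
w_c = train_controller(w_f)

goal = np.array([0.4, -0.7])
reached = plant(w_c @ goal)
```

In this sketch the error is computed from the observed outcome y and converted into an action-space error by the frozen forward model. After training, `reached` matches `goal` closely, so the composition of controller and plant behaves as the identity on goal states.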