`E = E_0 + 1/2 epsilon sum_i sum_j w_(ij)^2`
`w_(ij)^(text(new)) = (1 - epsilon) w_(ij)^(text(old))`
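The link between the penalty term and the shrinking update can be sketched in a few lines. This is a minimal illustration, assuming a plain gradient step of unit size taken on the penalty term alone (its derivative with respect to `w_(ij)` is `epsilon w_(ij)`); the function names `decay_penalty` and `decay_step` are hypothetical and not taken from the source.

```python
def decay_penalty(weights, epsilon):
    """Return 0.5 * epsilon * (sum of squared weights), the term added to E_0."""
    return 0.5 * epsilon * sum(w * w for row in weights for w in row)


def decay_step(weights, epsilon):
    """Apply w_new = (1 - epsilon) * w_old to every weight.

    Assumption: this is one unit-size gradient step on the penalty alone,
    i.e. w - epsilon * w; the gradient of E_0 is handled separately.
    """
    return [[(1.0 - epsilon) * w for w in row] for row in weights]


# Illustrative 2x2 weight matrix (made-up values).
weights = [[1.0, -2.0], [0.5, 0.0]]
shrunk = decay_step(weights, epsilon=0.1)
```

Each call to `decay_step` pulls every weight toward zero by the same factor, so weights that are not reinforced by the gradient of `E_0` fade away over repeated steps.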
`E = E_0 + 1/2 lambda sum_i sum_j (w_(ij) / w_0)^2 / (1 + (w_(ij) / w_0)^2)`
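The behaviour of this penalty is easier to see numerically. The sketch below evaluates the penalty term for a weight matrix; the function name `elimination_penalty` and the sample values are assumptions made for illustration only.

```python
def elimination_penalty(weights, lam, w0):
    """Return 0.5 * lam * sum of r / (1 + r), where r = (w / w0) ** 2."""
    total = 0.0
    for row in weights:
        for w in row:
            r = (w / w0) ** 2
            # Each term lies in [0, 1): close to r when |w| << w0,
            # close to 1 when |w| >> w0.
            total += r / (1.0 + r)
    return 0.5 * lam * total


# Illustrative values: one weight far below w0, one equal to w0, one far above.
small = elimination_penalty([[0.001]], lam=2.0, w0=1.0)
equal = elimination_penalty([[1.0]], lam=2.0, w0=1.0)
large = elimination_penalty([[1000.0]], lam=2.0, w0=1.0)
```

For weights much smaller than `w_0` each term behaves like the quadratic decay penalty, while for weights much larger than `w_0` each term saturates near 1, so the sum roughly counts the large weights rather than measuring their size.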
A relatively large `w_0` results in a preference for many small weights; a relatively small `w_0` results in a preference for fewer large weights.
create-unit, and again train all weights to the output layer.
Create-unit