MAICS96: Best, Benkert, Monahan, & Stinson

Comparing Subjective Contour Perception in Artificial Neural Networks and Infants

Bradley J. Best, Michele M. Benkert, John S. Monahan, Michael C. Stinson

Central Michigan University

Brad Best
PO Box 2152
Midland, MI 48641
71044.3661@compuserve.com

Michele M. Benkert
819 E. Gaylord St.
Mt. Pleasant, MI 48858

John S. Monahan
Department of Psychology
Central Michigan University
Mt. Pleasant, MI 48859
john.monahan@cmich.edu

Michael C. Stinson
Department of Computer Science
Central Michigan University
Mt. Pleasant, MI 48859
stinson@cps.cmich.edu

Neural networks and biological plausibility

Artificial Neural Networks (ANNs) are often claimed to share features with biological neural networks. Fukushima, Miyake, and Ito (1983) introduced the Neocognitron, a network architecture based on generalizations of the pioneering work of Hubel and Wiesel in determining cortical organization of the visual system. Kohonen (1982) developed a self-organizing memory in which clusters developed by the system have a meaningful topology -- neighboring clusters respond to similar stimuli. Grossberg (1995) postulated that Adaptive Resonance Theory (ART) processes may be universal throughout the brain.

Despite ANNs' neurophysiological heritage, the field has not stressed comparisons with human performance. The traditional approach has been application-oriented -- e.g., tailoring a network to select stocks based on market data -- using the statistical pattern recognition capabilities of ANNs without supporting the parallels drawn with biological systems.

This may be because opinions regarding how biological systems learn and operate range from everything being "hard-wired", or genetically coded, to everything being learned through experience. ANNs span these possibilities from the self-organizing Kohonen and ART networks to the Neocognitron in which cells within each layer are trained to respond to specific stimuli. The debate on "hard-wiring" is, however, focused on early visual processing. Learning certainly occurs in later visual processing through an unsupervised method probably integrating Hebbian learning (connections between cells that fire simultaneously are strengthened) with decorrelation, or anti-Hebbian learning (connections between cells that do not fire simultaneously are weakened; Oram & Perret, 1994).
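The Hebbian and anti-Hebbian rules described above can be sketched as a single weight update. This is only an illustration of the two rules as worded in the text, not a model taken from Oram and Perret (1994); the function name `hebbian_step`, the learning rate, and the use of binary activity vectors are assumptions made for the example.

```python
import numpy as np

def hebbian_step(weights, pre, post, rate=0.1):
    """Return updated weights for one presentation (hypothetical helper).

    weights[i, j] links input cell j to output cell i.
    pre and post are binary activity vectors (1 = firing).
    """
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    # Hebbian term: 1 where both cells fire at the same time.
    both_active = np.outer(post, pre)
    # Anti-Hebbian term: 1 where exactly one of the two cells fires.
    mismatch = np.outer(post, 1 - pre) + np.outer(1 - post, pre)
    return weights + rate * both_active - rate * mismatch

w = hebbian_step(np.zeros((2, 2)), pre=[1, 0], post=[1, 0])
```

In this sketch the connection between the two co-active cells grows, the connections between an active and a silent cell shrink, and the connection between two silent cells is left unchanged.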

Once trained, a network may operate in a feedforward manner, in which information travels solely from input layers toward output layers (e.g., the Neocognitron), or in a recurrent manner, in which information may flow from early layers to later layers and back until the network stabilizes (e.g., ART). Biological networks exhibit both processes: object recognition operates in a feedforward mode, but the time needed to perceive subjective contours probably indicates recurrent operation (see below; Oram & Perret, 1994).

The human visual system

The brain keeps many complete maps of the retinal images, progressing from the Lateral Geniculate Nucleus (LGN), where simple information is present (e.g., local relative contrast), to cortical areas where cells respond to highly complex information (e.g., only responding to faces). At each stage the brain actively transforms sensory input. LGN cells construct center-surround response patterns, which respond best to contrast between the center of the receptive field and the surrounding area, and pass information to the primary visual cortex, where the response pattern of cells changes. Cortical "simple" cells respond best to a bar- or slit-shaped stimulus of a particular orientation within their receptive fields. Complex cells generally have larger receptive fields and combine the inputs of several simple cells. Hypercomplex cells represent a deeper processing level and may respond to an optimally oriented stimulus in a position-invariant manner. Each local cortical area exhaustively samples input by maintaining a complete set of cells sensitive to all possible orientations of stimuli. Within these modules the cells are stacked in columns sharing orientation specificity (Hubel, 1988).

Oram and Perret (1994) divide primate form-processing into four stages: 1) contour extraction and feature grouping (visual areas V1-V2/V3), 2) combination of features and contours into complex detectors at specific receptive field sites (V4, PIT, CIT), 3) processing approximate object features at a specific size and orientation (AIT), and 4) view specific generalization across object instances (STPa).

The essential characteristics that an ANN model of human vision must capture appear to be the hierarchical structure of simple to complex information and the representation of a transformation of the complete input image at different stages.

Neural Network Models

ANN training methods adapt the network weights to produce the desired output. In supervised training methods the correct network output is provided to compare with the current network output. The weights are then refined to produce a result closer to the proper output. In unsupervised training methods the network separates the data by recognizing its statistical regularities (Zurada, 1992). An alternate method is to assume network weights based on some a priori judgment (Rumelhart, Hinton, & Williams, 1986) but this requires the designer to know in advance what the connection values must be.

After "learning", a network may be presented with stimuli and will produce a response. Depending on the type of network it will use either feedforward recall or recurrent recall. Oram and Perret (1994) demonstrated that the rapid discriminations made within the visual system "...cannot therefore rely on lateral or feedback processing… because such processing would delay the response latencies…" (p. 961). Therefore, object recognition in the primate visual system proceeds in a primarily feedforward fashion (other perceptual processes may involve feedback loops).

The Neocognitron, a feedforward network, is described as a "multilayered network with a hierarchical structure similar to the hierarchical model for the visual system proposed by Hubel and Wiesel" (Fukushima, Miyake, & Ito, 1983). Fukushima et al. (1983) demonstrated an implementation that recognized deformed and varied handwritten characters in different spatial locations, thus displaying positional invariance in feature detection. The Neocognitron uses either supervised or unsupervised learning and can be adapted to use other ANN learning laws.

Adaptive Resonance Theory (ART) relies on the principle that our perceptions are matched against our expectations and that visual object recognition is dependent on category assignment of perceived objects (Grossberg, 1995). When learning, an ART network compares stimuli to stored patterns. If the input differs enough from the closest stored pattern (by an amount set by the "vigilance" parameter) a new cluster is formed. If the input is instead similar enough to an existing cluster the weights corresponding only to this cluster are updated. Because of this, an ART network ". . . exhibits a degree of plasticity when acquiring new cluster data . . . the recall of the data already learned is not affected" (Zurada, 1992, p. 444). Like biological networks, ART networks can generalize (categorical learning) and learn in an unsupervised mode. Also, they employ feedback loops in both learning and recall modes.
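The vigilance test described above can be sketched as follows. This is a simplified, fast-learning ART1-style clusterer for binary patterns, written for illustration; it is not the implementation used in this study, and the function name `art1_cluster`, the `beta` tie-breaking constant, and the default vigilance value are assumptions.

```python
import numpy as np

def art1_cluster(patterns, vigilance=0.7, beta=0.5):
    """Cluster binary patterns with an ART1-like vigilance test (sketch)."""
    prototypes = []  # one binary prototype per cluster
    labels = []
    for p in patterns:
        p = np.asarray(p, dtype=bool)
        # Search existing clusters in order of the choice function
        # |p AND w| / (beta + |w|), best candidate first.
        order = sorted(
            range(len(prototypes)),
            key=lambda j: -(p & prototypes[j]).sum() / (beta + prototypes[j].sum()),
        )
        for j in order:
            # Vigilance test: fraction of the input covered by the prototype.
            match = (p & prototypes[j]).sum() / max(p.sum(), 1)
            if match >= vigilance:
                # Resonance: only the matching cluster's weights change.
                prototypes[j] = p & prototypes[j]
                labels.append(j)
                break
        else:
            # No stored pattern is similar enough: form a new cluster.
            prototypes.append(p.copy())
            labels.append(len(prototypes) - 1)
    return labels, prototypes

labels, prototypes = art1_cluster(
    [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1]],
    vigilance=0.6,
)
```

In this toy run the first two patterns share a cluster and the third, which has no overlap with the stored prototype, founds a new one. Raising `vigilance` makes the sketch split patterns into more clusters; lowering it merges them.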

The human visual system also feeds information back to previous layers (Oram & Perret, 1994). In ART, feedback loops allow the network to resonate -- as an input pattern is transformed through processing and then presented again at the inputs, it eventually results in the output of a previously learned pattern. Grossberg (1995) states "Once the reciprocal feedback equilibrates, the bottom-up and top-down signals lock the activity pattern in a resonant state... only resonant states of the brain can achieve consciousness, and the time needed to develop resonance helps to explain why an event's perception takes so long" (p. 440).

Like ART, the Kohonen network clusters patterns without supervision using a winner-take-all strategy, where only the cluster with the strongest output has its connections strengthened. These units are arranged in an explicit topology where neighboring clusters learn similar patterns which results in similar clusters being close together. As single layer networks, both Kohonen and ART networks may be limited to modeling single stages of visual processing.
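A single Kohonen training step can be sketched as below. The sketch assumes a one-dimensional chain of units and treats the unit whose weight vector is nearest the input as the one with the strongest output; the function name `kohonen_step` and the `rate` and `radius` parameters are hypothetical, and the study's own Pascal implementation may have differed.

```python
import numpy as np

def kohonen_step(weights, x, rate=0.5, radius=1):
    """Move the winning unit (and its chain neighbors) toward input x.

    weights is a float array of shape (units, features), updated in place.
    With radius=0 this is pure winner-take-all learning.
    """
    x = np.asarray(x, dtype=float)
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    # Neighbors along the chain share the update, which is what places
    # similar clusters next to each other in the map.
    lo = max(0, winner - radius)
    hi = min(len(weights), winner + radius + 1)
    weights[lo:hi] += rate * (x - weights[lo:hi])
    return winner

weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
winner = kohonen_step(weights, [2.0, 2.0], rate=0.5, radius=0)
```

With `radius=0` only the winning unit moves halfway toward the input; a larger radius also pulls adjacent units along.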

Infants and Perception of Subjective Contours

One problem in relating ANN performance to human performance is constructing a reasonable experiment. Adults bring considerable knowledge to the experimental setting, which influences their responses. Therefore, perceptual studies of infants may be a better area for comparison. Infants share several qualities with ANNs that constrain the experimental questions that can be posed. First, infants do not have the perceptual experience of adults and therefore do not perceive in exactly the same way. Second, infants cannot be verbally instructed. Third, perceptual studies of infants typically involve stimulus discrimination. Based on the presence or absence of a discrimination, inferences are made comparing infants to adults. This paradigm suits neural network experiments since category classification is a fundamental ANN function.

Figure 1a: Figure 1b: Figure 1c:

Subjective contours, edges we perceive that have no physical reality, have been studied in this way with infants. In figure 1a, we perceive a white square, brighter than the surround, covering four dark corner objects, although the white is of identical brightness throughout. The illusion is created by neural perceptual processes in the presence of discontinuities in the image boundaries (Shipley & Kellman, 1990). Because the illusion takes time to stabilize, it may be due to feedback processes (Bertenthal, Campos, & Haith, 1980).

Bertenthal et al. (1980) determined that infants 7 months of age consistently respond differentially to images containing subjective contours when compared to similar images that do not. Their stimuli (figures 1a-1c) have identical overall intensity, number of elements, and individual element appearance. The only difference is the orientation of the corner elements. Infants express more interest, measured by length of gaze, in the configuration where the corner elements all open inward. Because infants are sensitive to the arrangement that produces the subjective contour in adults, we infer they perceive subjective contours.

We tested the ANNs using these stimuli to examine two hypotheses: 1) infants' discrimination might be replicated by a network simply on the basis of statistical regularities in the images, and 2) a hierarchical network with a structure more consistent with the visual system might explain the presence of subjective contours.

Method

The Kohonen, ART1, and Neocognitron simulations were developed with Borland Pascal 7.0 and run on IBM PC compatible 486 computers. The Kohonen network implementation used a "conscience" mechanism (Hecht-Nielsen, 1990, p. 69) to ensure equiprobable cluster distribution. We implemented the Neocognitron architecture using the Kohonen learning law, as suggested by Hecht-Nielsen (1990, p. 210), to develop the network weights.
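The idea behind a conscience mechanism can be sketched as follows: each unit keeps a running estimate of how often it wins, and units that win more than their fair share are handicapped in later competitions. This is one common formulation written for illustration, not a transcription of the Pascal code used here; the function name `conscience_winner` and the constants `gamma` and `beta` are assumptions.

```python
import numpy as np

def conscience_winner(weights, x, win_freq, gamma=10.0, beta=0.01):
    """Pick a winner with a conscience bias (sketch).

    win_freq is a float array holding each unit's estimated win rate;
    it is updated in place after the competition.
    """
    n = len(weights)
    dist = np.sum((weights - np.asarray(x, dtype=float)) ** 2, axis=1)
    # Units winning more often than 1/n get a negative bias, which
    # makes them look farther from the input than they really are.
    bias = gamma * (1.0 / n - win_freq)
    winner = int(np.argmin(dist - bias))
    won = np.zeros(n)
    won[winner] = 1.0
    win_freq += beta * (won - win_freq)
    return winner

units = np.array([[0.0, 0.0], [1.0, 1.0]])
fair = conscience_winner(units, [0.4, 0.4], np.array([0.5, 0.5]))
biased = conscience_winner(units, [0.4, 0.4], np.array([0.9, 0.1]))
```

With equal win rates the nearer unit wins; when the nearer unit has been winning 90% of the time, the bias hands the win to the other unit, which over many presentations pushes the clusters toward being used equally often.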

Figure 2a: 4 circles Figure 2b: square

We trained the networks to discriminate between four black circles on a white background (figure 2a) and a white square on a black background (figure 2b). We then presented the stimuli from the Bertenthal et al. (1980) experiment for classification. We also presented stimuli where the corner elements were rectangular (figure 3a), squared-off circles (figure 3b), or open to the center but misaligned (figure 3c).

Figure 3a: Figure 3b: Figure 3c:

The Kohonen network was allowed to develop four clusters, two of which adapted to represent the training inputs. The ART1 network developed an identical cluster pair.

The images were presented to the networks as 49 by 49 bitmaps, the smallest image possible (for computational reasons) that still caused the subjective contour illusion in human judges.

Results

The ART and Kohonen networks performed identically. They assigned the two training images to separate clusters and then classified the subjective contour stimulus (Fig. 1a), the other arrangements with the same corner elements (Fig. 1b and 1c), and the misaligned stimulus (Fig. 3c) as belonging to the cluster defined by the four black circles. Both networks classified the stimuli with squared-off corner elements (Fig. 3a and 3b) as belonging to the cluster defined by the white square (the specific cluster label is arbitrary; what matters is whether the labels match).

ART and Kohonen networks both rely on a distance measure to compare unknowns with stored patterns. They make matches based on the number of elements an unknown shares with a memory.
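The matching rule described above can be illustrated by counting mismatched elements between an unknown bitmap and each stored pattern. This is a toy sketch using short one-dimensional patterns rather than the 49 by 49 stimuli; the function name `nearest_pattern` and the example vectors are made up for illustration.

```python
import numpy as np

def nearest_pattern(stored, unknown):
    """Return the index of the stored pattern with the fewest mismatches."""
    unknown = np.asarray(unknown)
    mismatches = [int(np.sum(np.asarray(s) != unknown)) for s in stored]
    return int(np.argmin(mismatches)), mismatches

# Toy stand-ins for the two training images (1 = black, 0 = white).
mostly_white = [1, 1, 0, 0, 0, 0, 1, 1]
mostly_black_edges = [0, 0, 1, 1, 1, 1, 0, 0]
best, counts = nearest_pattern(
    [mostly_white, mostly_black_edges],
    [1, 0, 0, 0, 0, 0, 1, 1],
)
```

Because the score is only a count of differing elements, where those elements sit relative to one another plays no role, which is consistent with these networks grouping stimuli by the shape of the corner elements rather than by their arrangement.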

The Neocognitron matched the three figures that produce subjective contours (fig. 1a, 3a, and 3b) with a cluster unit that neither of the training vectors matched. In addition, the configuration with two elements open inward and two outward (fig. 1b), which produces a weaker subjective contour, was mapped onto this cluster. The misaligned corner element stimulus (fig. 3c) was also mapped onto this cluster rather than onto the cluster representing the four circles (fig. 2a). The outward opening stimulus (fig. 1c) was matched with a cluster that matched no other stimuli.

Table 1:
Figure - Description         ART            Kohonen        Neocognitron   Contour? 
1a - subj. contour           A              A              C              Yes
1b - half subj. contour      A              A              C              Yes
1c - reversed corners        A              A              D              No
2a - circles (training)      A              A              A              No
2b - square (training)       B              B              B              No
3a - squared corners         B              B              C              Yes
3b - blunted corners         B              B              C              Yes
3c - misaligned corners      A              A              C              No

Discussion

The Kohonen and ART networks used a decision criterion made obvious by the use of visual images: the number of matching elements. The position and shapes created by these elements were irrelevant to this matching procedure. Infants, who discriminate between fig. 1a and 1c, are apparently making a qualitatively different decision than these networks. Shape and position were, however, clearly important to the Neocognitron, which considered the misaligned stimulus the same as the subjective contour stimulus.

The point of interest is whether the ANNs discriminate between stimuli that give rise to subjective contours and those that do not, and whether they represent shapes internally in a way that would make this possible. Because the stimuli share many characteristics (average intensity, number of elements, appearance of elements), the networks cannot discriminate between the stimuli based on anything except the orientation of the corner elements. Although we could have trained the networks to do exactly that, we were interested in whether neural network models that made claims to biological plausibility gave rise to basic perceptual phenomena.

ART is frequently discussed as a physiological model (e.g., Grossberg, 1995), but its recurrent operation conflicts with the physiological data, which show that object recognition can proceed in a purely feedforward manner (the model may still be relevant to subjective contours). Further, unlike the human visual system, an ART network is highly noise sensitive (Zurada, 1992) and may classify dissimilar patterns as belonging to the same category or classify similar patterns as belonging to different categories. Finally, although Grossberg (1995) applied the network to model processes known to take place in the LGN, the striate cortex, and the inferotemporal cortex (IT) and postulated that ART processes may be universal within the brain, ART only models a single layer, which appears insufficient to produce or predict the existence of subjective contours.

The roots of subjective contours might be found instead in multi-layer hierarchical networks such as the Neocognitron. Our expectation was that the Neocognitron output units representing both the square and the circle would fire only when the corners were aligned in a way that created a subjective contour illusion. The network seems to have met this expectation by matching only those figures which produce subjective contours to a cluster that did not match either training pattern. The misaligned stimulus was clustered with the subjective contour figures, which is consistent with the deformation resistance of this network. There is, however, a difficulty in determining exactly what the cluster represents since many transformations have occurred by the output layer. It may be possible to drive the network backwards to reconstruct a prototypical input represented by the cluster.

Although the Neocognitron normally learns "by a sequential directed learning procedure, where only a single layer at a time is plastic, and where the types of features that the layer should respond to were known" (Anderson & Rosenfeld, 1988, p. 535), in the interest of plausibility we explored a suggestion by Hecht-Nielsen (1990) to combine features of the Neocognitron and Kohonen networks. Using the architecture of the Neocognitron as described by Fukushima et al. (1983) and combining it with the Kohonen learning method to train the individual layers of the network, we created a network that consisted of self-organizing hierarchical feature maps. The network determined features through clustering random sections of the input. Low level features derived included corners and edges of particular orientations, which were combined into more complex representations in higher layers.

Analyzing the results from the Neocognitron is difficult given that the network consists of 8 different layers, each made up of up to 37 independent sub-layers, with each sub-layer representing the presence of a particular feature within the input. By examining the activation levels of the sub-layers during execution, it was possible to find network cells that responded to both light and dark edges. Further, the network activated many of the same cells for the subjective contour stimulus that were active for the two training stimuli, indicating some internal confusion as to which object was actually present. Although we cannot say the network perceived a subjective contour, the building blocks of subjective contours may be present.

In inferring what it means when ANNs can or cannot discriminate between stimuli, it is important to note concerns expressed in the perception literature. Bertenthal, Campos, and Haith (1980) noted that infants might just respond to the configuration in which the corner elements were all open to the inside and not to any subjective contour. This is also true of the ANNs used in this study. For this reason the misaligned stimulus (figure 3c) was included to see whether the networks would discriminate between aligned and misaligned stimuli. That the Neocognitron did so is evidence of a qualitatively different kind of discrimination when compared to the other networks used here.

Combining ANN research and infant perceptual studies allows a cross-fertilization in which stimuli and infant perceptual performance can be examined in terms of their statistical properties, and ANNs in terms of their perceptual properties. We believe this is an area of research that holds great promise.

References

Anderson, J. A. & Rosenfeld, E. (Eds.). (1988). Neurocomputing. Cambridge, MA: The MIT Press.

Bertenthal, B. I., Campos, J. J., & Haith, M. M. (1980). Development of Visual Organization: The Perception of Subjective Contours. Child Development, 51, 1072-1080.

Fukushima, K., Miyake, S. & Ito, T. (1983). Neocognitron: A neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13, 826-834.

Grossberg, S. (1995). The attentive brain: Neural networks that match sensory inputs with learned expectations help explain how humans see, hear, learn and recognize information. American Scientist, 83, 438-449.

Hecht-Nielsen, R. (1990). Neurocomputing. Reading, MA: Addison-Wesley.

Hubel, D. H. (1988). Eye, Brain, and Vision. New York: W. H. Freeman and Company.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.

Oram, M. W. & Perret, D. I. (1994). Modeling visual recognition from neurobiological constraints. Neural Networks, 7, 945-972.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1, 318-362.

Shipley, T. F. & Kellman, P. J. (1990). The role of discontinuities in the perception of subjective figures. Perception & Psychophysics, 48, 259-270.

Zurada, J. M. (1992). An Introduction to Artificial Neural Systems. St. Paul: West Publishing Company.