V.S. Ramachandran: “[Most old theories of perception] are based on a now largely discredited ‘bucket brigade’ model of vision, the sequential hierarchical model that ascribes our esthetic response only to the very last stage—the big jolt of recognition. In my view … there are minijolts at each stage of visual segmentation before the final ‘Aha’. Indeed the very act of perceptual groping for objectlike entities may be pleasurable in the same way a jigsaw puzzle is. Art, in other words, is visual foreplay before the final climax of recognition.”[83]

In fact, today we know that the visual systems in our brains receive many more signals from the rest of the brain than from our eyes.

Richard Gregory: “Such a major contribution of stored knowledge to perception is consistent with the recently discovered richness of downgoing pathways in brain anatomy. Some 80% of fibers to the lateral geniculate nucleus relay station come downwards from the cortex, and only about 20% from the retinas.”[84]

Presumably those signals suggest which kinds of features to detect or which kinds of objects might be in sight. Thus, once you suspect that you’re inside a kitchen, you will be more disposed to recognize objects as saucers or cups.
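One way to picture this kind of top-down bias is as a context-dependent prior that re-weights the evidence for each bottom-up hypothesis. The sketch below is only illustrative: the scores, the context table, and the recognize function are all invented, and nothing this simple is claimed about real brains.

```python
# A minimal sketch of top-down biasing: expectations from the current
# context re-weight the bottom-up evidence for each object hypothesis.
# All numbers and names here are invented for illustration.

# Bottom-up detectors report how well each object matches the image.
bottom_up = {"cup": 0.30, "saucer": 0.28, "hubcap": 0.32, "frisbee": 0.31}

# Higher levels supply expectations: in a kitchen, cups and saucers
# are far more plausible than hubcaps.
context_prior = {
    "kitchen": {"cup": 0.9, "saucer": 0.8, "hubcap": 0.05, "frisbee": 0.1},
}

def recognize(evidence, context):
    """Combine bottom-up evidence with top-down expectation."""
    prior = context_prior[context]
    scores = {obj: score * prior.get(obj, 0.01)
              for obj, score in evidence.items()}
    return max(scores, key=scores.get)

print(recognize(bottom_up, "kitchen"))
# -> 'cup', even though 'hubcap' had the strongest raw evidence
```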

All this means that the higher levels of your brain never perceive a visual scene as just a collection of pigment spots; instead, your Scene-Describing resources must represent this block-arch in terms like, for example, “a horizontal block on top of two upright ones.” Without the use of such ‘high-level’ Ifs, reaction-rules would rarely be practical.

Accordingly, for Builder to use sensory evidence, it needed some knowledge of what that data might possibly mean, so we provided Builder with representations of the shapes of the objects that it was to face. Then, by assuming that something was made of rectangular blocks, one of those programs could frequently ‘figure out’ just which blocks appeared in a scene, based only on seeing its silhouette! It did this by making a series of guesses like these:

Once that program discerns a few of those edges, it imagines more parts of the blocks they belong to, and then uses those guesses to search for more clues, moving up and down among those stages. The program was frequently better at this than were the researchers who programmed it.[85]
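One way to picture that guess-and-refine cycle is as a loop that proposes blocks to explain the edges found so far, and then goes looking for the further edges each proposal predicts. The sketch below is only a cartoon of the idea: the edge names, the block ‘catalog’, and the look_for helper are invented, and the real program worked on far richer line-drawing data.

```python
# A sketch of the guess-and-refine cycle: find a few edges, hypothesize
# blocks that could explain them, then hunt for the further edges each
# hypothesis predicts. Data structures and thresholds are invented.

def look_for(edge):
    # Stand-in for a directed low-level search; here we assume that a
    # faint edge can always be confirmed once we know where to look.
    return True

def interpret(found_edges, catalog):
    """Accept a block once all the edges it predicts are accounted for."""
    accepted = []
    changed = True
    while changed:                              # move up and down the stages
        changed = False
        for block, needed in catalog.items():
            if block in accepted:
                continue
            if len(needed & found_edges) >= 2:  # enough evidence to guess
                missing = needed - found_edges
                found_edges |= {e for e in missing if look_for(e)}
                if needed <= found_edges:       # every predicted edge found
                    accepted.append(block)
                    changed = True
    return accepted

catalog = {"lintel":     {"e1", "e2", "e3", "e4"},
           "left post":  {"e5", "e6", "e7"},
           "right post": {"e8", "e9", "e10"}}
print(interpret({"e1", "e2", "e5", "e6", "e8", "e9"}, catalog))
# -> ['lintel', 'left post', 'right post']
```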

We also gave Builder additional knowledge about the most usual ‘meanings’ of corners and edges. For example, if the program found edges like these, then it could guess that they all might belong to a single block; then the program would try to find an object that might be hiding the rest of those edges.[86]
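In the spirit of the junction-analysis programs cited in the note, here is a toy version of that kind of guess: Guzman-style ‘arrow’ and ‘fork’ corners suggest that the regions meeting there are faces of one block, while a ‘tee’ suggests that one object occludes another. The junction types are real; the scene encoding below is invented for illustration.

```python
# A toy version of junction analysis: certain corner types hint that
# the regions meeting there belong to one block. The scene below is
# invented; real systems work from full line drawings.

# Each junction: (type, regions that meet there).
junctions = [
    ("arrow", ["A", "B"]),       # an ARROW links its two inner regions
    ("fork",  ["A", "B", "C"]),  # a FORK (Y) links all three regions
    ("tee",   ["C", "D"]),       # a TEE suggests occlusion: no link
]

# Union-find grouping: regions linked by 'strong' junctions are
# guessed to be faces of the same block.
parent = {}

def find(r):
    parent.setdefault(r, r)
    while parent[r] != r:
        r = parent[r]
    return r

def union(r, s):
    parent[find(r)] = find(s)

for jtype, regions in junctions:
    if jtype in ("arrow", "fork"):   # evidence of a common block
        for other in regions[1:]:
            union(regions[0], other)

groups = {}
for r in ["A", "B", "C", "D"]:
    groups.setdefault(find(r), []).append(r)
print(list(groups.values()))         # -> [['A', 'B', 'C'], ['D']]
```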

Our low-level systems see patches and fragments, but then we use ‘context’ to guess what they mean—and then confirm those conjectures by using several levels and types of intermediate processes. In other words, we ‘re-cognize’ things by being ‘re-minded’ of familiar objects that could match fragments of incomplete evidence. But we still do not know enough about how our high-level expectations affect which features our low-level systems detect. For example, why don’t we see the middle figure below as having the same shape as its neighbors?

In an excellent survey of this subject, Zenon Pylyshyn describes several theories about such things, but concludes that we still have a great deal to learn.[87]

∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞

§5-8. The Concept of a “Simulus”

Reality leaves a lot to the imagination.

—John Lennon.

All of us can recognize an arch composed of rectangular blocks.

But also, we all can imagine how it would look if its top were replaced by a three-sided block.

How could a program or mind ‘imagine’ things that are not present in the scene? We could do this by ‘envisioning’ a change at any perceptual stage!

Making changes at very low levels: In principle, we could make a new image by changing each spot of the retinal picture, but in practice such changes would need huge computations. Also, if you wanted to shift your point of view, you would have to compute the whole image again. Worse, before you could do such a computation, some part of your mind would first have to know precisely what that picture depicts; to know that, it would already have to represent the scene at some higher descriptive level. But then, why do all that low-level calculation at all?

Making changes at intermediate stages: One could change, not the picture itself, but parts of higher-level descriptions. For example, at the level of Region-Finders one could change the name of that top block’s front face from “rectangle” to “triangle.” However, this would cause trouble at other levels, because that triangle’s edges would not have the proper relations to the edges of regions that neighbor on it.

Below we’ll see that it would be better to replace the whole block at the higher Object-Finder level.

Visualizer: I sometimes have troubles like that in my mind. When I try to imagine a triangular shape, I know where its three lines should appear, but I ‘see’ them as nebulous, glowing streaks whose ill-defined ends may not properly meet. When I try to correct this by ‘pushing’ a line, it abruptly moves at some constant speed that I cannot change, and when I tell that line to stop, it tends to keep moving anyway (though, strangely, it never gets far away).

That person was trying to change descriptions but had trouble maintaining the proper relationships between their parts. Imagining is somewhat like seeing, except that when we alter internal representations, they may not maintain their consistency. A real object cannot move at two different speeds at once, nor can two real lines both intersect and fail to meet, but imagination has no such constraints.

Making changes at the higher semantic levels: You could imagine replacing the top of that arch merely by changing the name of its shape, e.g., by replacing rectangular with triangular in “A rectangular block supported by two upright blocks.”

Now think about how efficient this is! To make such a change at the sensory level, you would have to alter thousands of ‘pixels’ (the items of data that make up a picture), whereas at an abstract linguistic level you need only change a single word, because there an entire thing is represented by one or a few compact symbols. Of course, those symbols are useless unless each one is connected to structures that give it more details or ‘meanings’.
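The difference in cost is easy to make concrete. In the sketch below, the image size is invented and the description string is just the one from the text; the point is only the ratio between the two kinds of edit.

```python
# Comparing the cost of the two kinds of 'imagining'. The image size
# is invented for illustration.

import numpy as np

image = np.zeros((512, 512, 3), dtype=np.uint8)  # a modest picture
description = "a rectangular block supported by two upright blocks"

# Pixel level: every element of the redrawn region must be recomputed.
print(image.size)    # 786432 values in the full picture

# Semantic level: one token changes.
print(description.replace("rectangular", "triangular"))
```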

Our Builder system could do such things by making changes in what we call “Semantic Networks.” For example, it could represent a three-block Arch by describing relations between three blocks.[88] Then, to ‘imagine’ a triangular block on the top, Builder needs only to change a single link:
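In code, one might sketch such a network as a small structure of labeled links, as below. The node names and relation labels are invented, and Builder’s actual format differed in detail; the point is only that ‘imagining’ the new arch touches one link, while everything else, such as the support relations, stays intact.

```python
# A sketch of a semantic network for the three-block arch: nodes for
# the parts, labeled links for their shapes and relations. (Invented
# encoding; Builder's actual representation differed in detail.)

arch = {
    "top":   {"shape": "rectangular", "supported-by": ["left", "right"]},
    "left":  {"shape": "rectangular", "stands-on": "ground"},
    "right": {"shape": "rectangular", "stands-on": "ground"},
}

# To 'imagine' a triangular block on top, change a single link:
arch["top"]["shape"] = "triangular"

print(arch["top"])
# {'shape': 'triangular', 'supported-by': ['left', 'right']}
```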

83. V.S. Ramachandran, Science, vol. 305, no. 5685, 6 August 2004.

85. This program was based on ideas of Yoshiaki Shirai (and Manuel Blum); see ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-263.pdf. However, I should add that Builder had almost no competence for any but neat geometrical scenes, and, so far as I know, there still are no ‘general-purpose vision machines’ that can, for example, look around a room and recognize the everyday objects therein. I suspect that this is mainly because they lack enough knowledge about real-world objects; we’ll discuss this more in Chapter 6.

86. See papers by Adolfo Guzman and David Waltz at ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-139.pdf and ftp://publications.ai.mit.edu/ai-publications/pdf/AITR-271.pdf.

87. See Zenon Pylyshyn, http://ruccs.rutgers.edu/faculty/ZPbbs98.html. The octagon example is from G. Kanizsa, “Seeing and Thinking,” Acta Psychologica 59 (1985), 23–33.

88. In this kind of diagram, each object is represented by a network that describes relationships between its parts. Then each part, in turn, is further described in terms of relationships between its parts, and so on, until those sub-descriptions descend to a level at which each one becomes a simple list of properties, such as an object’s color, size, and shape. For more details, see §§Frames, Quillian’s thesis in Semantic Information Processing, and Patrick Winston’s book The Psychology of Computer Vision.