Freud’s ‘Id’ consists of instinctive drives, while his ‘Superego’ embodies our learned ideals (many of which are inhibitions). The ‘Ego’ would then be those parts in between—the deliberate and reflective levels—whose principal function, at least in Freud’s view, is to resolve the conflicts between our instincts and our acquired ideals. Then a person’s ego may represent itself as being in control of things—whereas a friend or psychiatrist may see that ego as a battlefield.

Student: To repeat the question I earlier asked, would you use the word ‘conscious’ for such a machine? It seems to me that Model Six includes all the features you mentioned in §4-1, namely, short-term memory, serial processing, high-level descriptions and room for self-models.

It would not surprise me if such a machine, after having acquired the right kinds of knowledge, were to declare that it is as conscious as we claim to be. This sort of thing could happen if, as we’ll suggest in Chapter §9, its highest levels built models that represent its ‘Self’ as a single, self-aware entity. Of course, other entities might disagree.

∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞

This chapter began by asking how we could conceive of things that we’ve never seen or experienced. The rest of this chapter will show more details of how our imagination could result from multiple levels of processing.

∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞∞

§5-7. Imagination

“We don’t see things as they are. We see things as we are.”

—Anais Nin

When Carol picks up one of her blocks, that action seems utterly simple to her: she just reaches out, grasps it, and lifts it up. She just sees that block and knows how to act. No ‘thinking’ seems to intervene.

However, the seeming ‘directness’ of seeing the world is an illusion that comes from our failure to sense its complexity. For, most of what we think we see comes from our knowledge and from our imaginations. Thus, consider this portrait of Abraham Lincoln made by my old friend Leon Harmon, a pioneer in computerized graphics. (To the right is a portrait that I made of Leon.)

How do you recognize features in pictures so sparse that a nose or an eye is but three or four patches of darkness or light? Clearly, you do this by using additional knowledge. For example, when you sit at a table across from your friends, you cannot see their backs or legs—but your knowledge-based systems assume by default that all those body-parts are present. Thus we take our perceptual talents for granted—but ‘seeing’ seems simple only because the rest of our minds are virtually blind to the processes that we use to do it.

In 1965 our goal was to develop machines that could do some of the things that most children can do—such as pouring a liquid into a cup, or building arches and towers like this from disorderly clutters of building blocks.[80] To do this, we built a variety of mechanical hands and electronic eyes—and we connected these to our computer.

When we built that first robot for building with blocks, it made hundreds of different kinds of mistakes.[81] It would try to put blocks on top of themselves, or try to put two of them in the same place, because it did not yet have the commonsense knowledge one needs to manipulate physical objects! Even today, no one has yet made a visual system that behaves in anything close to humanlike ways to distinguish the objects in everyday visual scenes. But eventually, our army of students developed programs that could “see” arrangements of plain wooden blocks well enough to recognize that a structure like this is “a horizontal block on top of two upright ones.”

It took several years for us to make a computer-based robot called Builder that could do a variety of such things—such as building an arch or tower from a disorderly pile of children’s blocks, after seeing a single example of it. We encountered troubles at every stage, but sometimes those programs managed to work when arranged into a sequence of levels of processes. (Note that these do not much resemble the levels of Model Six, but do tend to progress from highly specific to very abstract.)

Begin with an image of separate points.

Identify these as textures and edges, etc.

Group these into regions and shapes.

Assemble these into possible objects.

Try to identify them as familiar things.

Describe their spatial relationships.
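The six steps above can be sketched as a chain of small functions, each of which sees only what the stage before it produced. This is a toy illustration in Python, not the code that Builder ran: the function names, the tiny character-grid "image" (one letter per texture, "." for background), and the rule that a region taller than it is wide counts as an upright block are all assumptions made for this example.

```python
# Toy image (assumed for illustration): letters are textures, "." is background.
IMAGE = [
    "aaaaaaaa",
    "aaaaaaaa",
    "bb....cc",
    "bb....cc",
    "bb....cc",
]

def read_points(image):
    """Stage 1: begin with an image of separate points."""
    return [(r, c, ch) for r, row in enumerate(image)
            for c, ch in enumerate(row) if ch != "."]

def find_textures(points):
    """Stage 2: collect the points that share a texture."""
    by_texture = {}
    for r, c, ch in points:
        by_texture.setdefault(ch, []).append((r, c))
    return by_texture

def group_regions(by_texture):
    """Stage 3: summarize each texture patch as a rectangular region."""
    regions = []
    for cells in by_texture.values():
        rows = [r for r, _ in cells]
        cols = [c for _, c in cells]
        regions.append({"top": min(rows), "bottom": max(rows),
                        "left": min(cols), "right": max(cols)})
    return regions

def propose_objects(regions):
    """Stage 4: treat each region as one candidate object with a size."""
    return [dict(reg,
                 width=reg["right"] - reg["left"] + 1,
                 height=reg["bottom"] - reg["top"] + 1)
            for reg in regions]

def identify(objects):
    """Stage 5: name each candidate as a familiar thing (a crude shape rule)."""
    for obj in objects:
        taller = obj["height"] > obj["width"]
        obj["kind"] = "upright block" if taller else "horizontal block"
    return objects

def describe(objects):
    """Stage 6: describe the spatial relationships among the objects."""
    tops = [o for o in objects if o["kind"] == "horizontal block"]
    uprights = [o for o in objects if o["kind"] == "upright block"]
    for top in tops:
        supports = [u for u in uprights
                    if u["top"] == top["bottom"] + 1
                    and top["left"] <= u["left"]
                    and u["right"] <= top["right"]]
        if len(supports) == 2:
            return "a horizontal block on top of two upright ones"
    return "no familiar structure"

STAGES = [read_points, find_textures, group_regions,
          propose_objects, identify, describe]

def run_pipeline(image):
    """Pass the data through every stage, in one direction only."""
    data = image
    for stage in STAGES:
        data = stage(data)
    return data

print(run_pipeline(IMAGE))
```

In this sketch nothing ever flows backward, so a wrong guess made at any stage is simply handed on to the next one.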

However, those low-level stages would frequently fail to find enough usable features. Look at this magnified digital view of the lower front edge of the top of that arch:

That particular edge is hard to see because the two regions that it bounds have almost identical textures.[82] We tried a dozen different ways to recognize edges, but no single method worked well by itself. Eventually we got better results by finding ways to combine them. We had the same experience at every level: no single method ever sufficed, but it helped to combine several different ones. Still, in the end, that step-by-step model failed, because Builder still made too many mistakes. We concluded that this was because the information in our system flowed only in the input-to-output direction, so if any level made a mistake, there was no further chance to correct it. To fix this we had to add many ‘top-down’ paths, so that knowledge could flow both down and up.
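One way to picture such a ‘top-down’ path is a higher level that expects an edge in a certain place and, when the first pass finds none there, asks the low-level detector to look again with a more lenient setting. The Python below is a minimal sketch under that assumption. The brightness values, both thresholds, and the function names are invented for illustration and are not taken from the Builder programs.

```python
# One row of pixel brightness (made-up numbers): a faint step between
# index 3 and 4, and a strong step between index 6 and 7.
ROW = [100, 100, 101, 100, 96, 95, 96, 40, 40]

def find_edges(row, threshold, window=None):
    """Low level: indices i where brightness jumps by at least `threshold`
    between row[i] and row[i + 1], optionally limited to window [start, stop)."""
    start, stop = window if window else (0, len(row) - 1)
    return [i for i in range(start, stop)
            if abs(row[i + 1] - row[i]) >= threshold]

def see_with_feedback(row, expected_window, strict=20, lenient=3):
    """Bottom-up pass first; then a top-down request to look harder
    wherever a higher level expects an edge that was not found."""
    edges = find_edges(row, strict)
    lo, hi = expected_window
    if not any(lo <= i < hi for i in edges):
        # Knowledge flows downward: relax the threshold only inside the
        # region where the higher level predicts an edge should be.
        edges = edges + find_edges(row, lenient, expected_window)
    return sorted(set(edges))

print(find_edges(ROW, 20))              # bottom-up only
print(see_with_feedback(ROW, (2, 6)))   # with a top-down expectation
```

The lenient threshold is applied only inside the expected window, so the faint edge is recovered without flooding the rest of the row with false alarms.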

The same applies to the actions we take because, when we want to change the situation we’re in, we’ll need plans for what we will do, so all this applies to the Do’s of our rules. For example, a rule like “If you see a block, Do pick it up” leads to a complex sequence of acts: before you begin to lift a block, you need to form an action-plan to direct your shoulder, arm, and hand to do this without upsetting the objects surrounding that block. So again, one needs high-level processes, and making these plans will equally need to use multiple levels of processing—so our diagram must become something like this:

Each Action Planner reacts to a scene by composing a sequence of Motion-Goals, which in turn will execute Motor-Skills like ‘reach for,’ ‘grasp,’ ‘lift up,’ and then ‘move’. Each Motor-Skill is a specialist at controlling how certain muscles and joints will move—so what started out as a simple Reaction-Machine turned into a large and complex system in which each If and Do involves multiple steps, and the processes at every stage exchange signals from both below and above.
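The layering of rule, planner, and skills can be sketched in a few lines of Python. Everything here is hypothetical and chosen only to mirror the names in the text: the scene is a plain list of labels, each Motor-Skill merely returns a description of the movement it would command, and the planner always produces the same four Motion-Goals.

```python
# Motor-Skills: each is a specialist that turns one Motion-Goal into a
# (described) command for particular muscles and joints.
def reach_for(target):
    return f"shoulder and arm: extend toward {target}"

def grasp(target):
    return f"hand: close fingers on {target}"

def lift_up(target):
    return f"arm: raise {target}"

def move(target):
    return f"arm: carry load to {target}"

MOTOR_SKILLS = {"reach for": reach_for, "grasp": grasp,
                "lift up": lift_up, "move": move}

def plan_pick_up(block, destination):
    """Action Planner: compose a sequence of Motion-Goals (skill, target)."""
    return [("reach for", block), ("grasp", block),
            ("lift up", block), ("move", destination)]

def execute(plan):
    """Hand each Motion-Goal to the Motor-Skill that knows how to do it."""
    return [MOTOR_SKILLS[skill](target) for skill, target in plan]

def react(scene, destination="the table"):
    """The rule 'If you see a block, Do pick it up', expanded into a plan."""
    if "block" in scene:
        return execute(plan_pick_up("block", destination))
    return []

for command in react(["ball", "block"]):
    print(command)
```

A fuller version would let each skill report back to the planner, for example when a neighboring object is in the way, which is where the signals from below would enter.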

In earlier times the most common view was that our visual systems work from “bottom to top,” first by discerning the low-level features of scenes, then assembling them into regions and shapes, and finally recognizing the objects. However, in recent years it has become clear that our highest-level expectations affect what happens in the “earliest” stages.

[80] Some early steps in that project are described in ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-200.pdf.

[81] See http://web.media.mit.edu/~minsky/papers/PR1971.html

[82] In fact, that darker horizontal streak is not the lower edge, but is part of the surface next to that edge, slightly shadowed because that edge is worn down.