There is massively promiscuous metaphor-mixing going on here, and I could deconstruct it 'til the cows come home, but I won't. Consider only one word: "document." When we document something in the real world, we make fixed, permanent, immutable records of it. But computer documents are volatile, ephemeral constellations of data. Sometimes (as when you've just opened or saved them) the document as portrayed in the window is identical to what is stored, under the same name, in a file on the disk, but other times (as when you have made changes without saving them) it is completely different. In any case, every time you hit "Save" you annihilate the previous version of the "document" and replace it with whatever happens to be in the window at the moment. So even the word "save" is being used in a sense that is grotesquely misleading--"destroy one version, save another" would be more accurate.
Anyone who uses a word processor for very long inevitably has the experience of putting hours of work into a long document and then losing it because the computer crashes or the power goes out. Until the moment that it disappears from the screen, the document seems every bit as solid and real as if it had been typed out in ink on paper. But in the next moment, without warning, it is completely and irretrievably gone, as if it had never existed. The user is left with a feeling of disorientation (to say nothing of annoyance) stemming from a kind of metaphor shear--you realize that you've been living and thinking inside of a metaphor that is essentially bogus.
So GUIs use metaphors to make computing easier, but they are bad metaphors. Learning to use them is essentially a word game, a process of learning new definitions of words like "window" and "document" and "save" that are different from, and in many cases almost diametrically opposed to, the old. Somewhat improbably, this has worked very well, at least from a commercial standpoint, which is to say that Apple/Microsoft have made a lot of money off of it. All of the other modern operating systems have learned that in order to be accepted by users they must conceal their underlying gutwork beneath the same sort of spackle. This has some advantages: if you know how to use one GUI operating system, you can probably work out how to use any other in a few minutes. Everything works a little differently, like European plumbing--but with some fiddling around, you can type a memo or surf the web.
Most people who shop for OSes (if they bother to shop at all) are comparing not the underlying functions but the superficial look and feel. The average buyer of an OS is not really paying for, and is not especially interested in, the low-level code that allocates memory or writes bytes onto the disk. What we're really buying is a system of metaphors. And--much more important--what we're buying into is the underlying assumption that metaphors are a good way to deal with the world.
Recently a lot of new hardware has become available that gives computers numerous interesting ways of affecting the real world: making paper spew out of printers, causing words to appear on screens thousands of miles away, shooting beams of radiation through cancer patients, creating realistic moving pictures of the Titanic. Windows is now used as an OS for cash registers and bank tellers' terminals. My satellite TV system uses a sort of GUI to change channels and show program guides. Modern cellular telephones have a crude GUI built into a tiny LCD screen. Even Legos now have a GUI: you can buy a Lego set called Mindstorms that enables you to build little Lego robots and program them through a GUI on your computer.
So we are now asking the GUI to do a lot more than serve as a glorified typewriter. Now we want it to become a generalized tool for dealing with reality. This has become a bonanza for companies that make a living out of bringing new technology to the mass market.
Obviously you cannot sell a complicated technological system to people without some sort of interface that enables them to use it. The internal combustion engine was a technological marvel in its day, but useless as a consumer good until a clutch, transmission, steering wheel and throttle were connected to it. That odd collection of gizmos, which survives to this day in every car on the road, made up what we would today call a user interface. But if cars had been invented after Macintoshes, carmakers would not have bothered to gin up all of these arcane devices. We would have a computer screen instead of a dashboard, and a mouse (or at best a joystick) instead of a steering wheel, and we'd shift gears by pulling down a menu:
PARK --- REVERSE --- NEUTRAL ---- 3 2 1 --- Help...
A few lines of computer code can thus be made to substitute for any imaginable mechanical interface. The problem is that in many cases the substitute is a poor one. Driving a car through a GUI would be a miserable experience. Even if the GUI were perfectly bug-free, it would be incredibly dangerous, because menus and buttons simply can't be as responsive as direct mechanical controls. My friend's dad, the gentleman who was restoring the MGB, never would have bothered with it if it had been equipped with a GUI. It wouldn't have been any fun.
The steering wheel and gearshift lever were invented during an era when the most complicated technology in most homes was a butter churn. Those early carmakers were simply lucky, in that they could dream up whatever interface was best suited to the task of driving an automobile, and people would learn it. Likewise with the dial telephone and the AM radio. By the time of the Second World War, most people knew several interfaces: they could not only churn butter but also drive a car, dial a telephone, turn on a radio, summon flame from a cigarette lighter, and change a light bulb.
But now every little thing--wristwatches, VCRs, stoves--is jammed with features, and every feature is useless without an interface. If you are like me, and like most other consumers, you have never used ninety percent of the available features on your microwave oven, VCR, or cellphone. You don't even know that these features exist. The small benefit they might bring you is outweighed by the sheer hassle of having to learn about them. This has got to be a big problem for makers of consumer goods, because they can't compete without offering features.
It's no longer acceptable for engineers to invent a wholly novel user interface for every new product, as they did in the case of the automobile, partly because it's too expensive and partly because ordinary people can only learn so much. If the VCR had been invented a hundred years ago, it would have come with a thumbwheel to adjust the tracking and a gearshift to change between forward and reverse and a big cast-iron handle to load or to eject the cassettes. It would have had a big analog clock on the front of it, and you would have set the time by moving the hands around on the dial. But because the VCR was invented when it was--during a sort of awkward transitional period between the era of mechanical interfaces and GUIs--it just had a bunch of pushbuttons on the front, and in order to set the time you had to push the buttons in just the right way. This must have seemed reasonable enough to the engineers responsible for it, but to many users it was simply impossible. Thus the famous blinking 12:00 that appears on so many VCRs. Computer people call this "the blinking twelve problem." When they talk about it, though, they usually aren't talking about VCRs.