Выбрать главу

"foo" ; the string containing the characters f, o, and o.

"fo\o" ; the same string

"fo\\o" ; the string containing the characters f, o, \, and o.

"fo\"o" ; the string containing the characters f, o, ", and o.

Names used in Lisp programs, such as FORMAT and hello-world, and *db* are represented by objects called symbols. The reader knows nothing about how a given name is going to be used—whether it's the name of a variable, a function, or something else. It just reads a sequence of characters and builds an object to represent the name.[41] Almost any character can appear in a name. Whitespace characters can't, though, because the elements of lists are separated by whitespace. Digits can appear in names as long as the name as a whole can't be interpreted as a number. Similarly, names can contain periods, but the reader can't read a name that consists only of periods. Ten characters that serve other syntactic purposes can't appear in names: open and close parentheses, double and single quotes, backtick, comma, colon, semicolon, backslash, and vertical bar. And even those characters can, if you're willing to escape them by preceding the character to be escaped with a backslash or by surrounding the part of the name containing characters that need escaping with vertical bars.

Two important characteristics of the way the reader translates names to symbol objects have to do with how it treats the case of letters in names and how it ensures that the same name is always read as the same symbol. While reading names, the reader converts all unescaped characters in a name to their uppercase equivalents. Thus, the reader will read foo, Foo, and FOO as the same symboclass="underline" FOO. However, \f\o\o and |foo| will both be read as foo, which is a different object than the symbol FOO. This is why when you define a function at the REPL and it prints the name of the function, it's been converted to uppercase. Standard style, these days, is to write code in all lowercase and let the reader change names to uppercase.[42]

To ensure that the same textual name is always read as the same symbol, the reader interns symbols—after it has read the name and converted it to all uppercase, the reader looks in a table called a package for an existing symbol with the same name. If it can't find one, it creates a new symbol and adds it to the table. Otherwise, it returns the symbol already in the table. Thus, anywhere the same name appears in any s-expression, the same object will be used to represent it.[43]

Because names can contain many more characters in Lisp than they can in Algol-derived languages, certain naming conventions are distinct to Lisp, such as the use of hyphenated names like hello-world. Another important convention is that global variables are given names that start and end with *. Similarly, constants are given names starting and ending in +. And some programmers will name particularly low-level functions with names that start with % or even %%. The names defined in the language standard use only the alphabetic characters (A-Z) plus *, +, -, /, 1, 2, <, =, >, and &.

The syntax for lists, numbers, strings, and symbols can describe a good percentage of Lisp programs. Other rules describe notations for literal vectors, individual characters, and arrays, which I'll cover when I talk about the associated data types in Chapters 10 and 11. For now the key thing to understand is how you can combine numbers, strings, and symbols with parentheses-delimited lists to build s-expressions representing arbitrary trees of objects. Some simple examples look like this:

x ; the symbol X

() ; the empty list

(1 2 3) ; a list of three numbers

("foo" "bar") ; a list of two strings

(x y z) ; a list of three symbols

(x 1 "foo") ; a list of a symbol, a number, and a string

(+ (* 2 3) 4) ; a list of a symbol, a list, and a number.

An only slightly more complex example is the following four-item list that contains two symbols, the empty list, and another list, itself containing two symbols and a string:

(defun hello-world ()

(format t "hello, world"))

S-expressions As Lisp Forms

After the reader has translated a bunch of text into s-expressions, the s-expressions can then be evaluated as Lisp code. Or some of them can—not every s-expressions that the reader can read can necessarily be evaluated as Lisp code. Common Lisp's evaluation rule defines a second level of syntax that determines which s-expressions can be treated as Lisp forms.[44] The syntactic rules at this level are quite simple. Any atom—any nonlist or the empty list—is a legal Lisp form as is any list that has a symbol as its first element.[45]

Of course, the interesting thing about Lisp forms isn't their syntax but how they're evaluated. For purposes of discussion, you can think of the evaluator as a function that takes as an argument a syntactically well-formed Lisp form and returns a value, which we can call the value of the form. Of course, when the evaluator is a compiler, this is a bit of a simplification—in that case, the evaluator is given an expression and generates code that will compute the appropriate value when it's run. But this simplification lets me describe the semantics of Common Lisp in terms of how the different kinds of Lisp forms are evaluated by this notional function.

The simplest Lisp forms, atoms, can be divided into two categories: symbols and everything else. A symbol, evaluated as a form, is considered the name of a variable and evaluates to the current value of the variable.[46] I'll discuss in Chapter 6 how variables get their values in the first place. You should also note that certain "variables" are that old oxymoron of programming: "constant variables." For instance, the symbol PI names a constant variable whose value is the best possible floating-point approximation to the mathematical constant pi.

All other atoms—numbers and strings are the kinds you've seen so far—are self-evaluating objects. This means when such an expression is passed to the notional evaluation function, it's simply returned. You saw examples of self-evaluating objects in Chapter 2 when you typed 10 and "hello, world" at the REPL.

It's also possible for symbols to be self-evaluating in the sense that the variables they name can be assigned the value of the symbol itself. Two important constants that are defined this way are T and NIL, the canonical true and false values. I'll discuss their role as booleans in the section "Truth, Falsehood, and Equality."

Another class of self-evaluating symbols are the keyword symbols—symbols whose names start with :. When the reader interns such a name, it automatically defines a constant variable with the name and with the symbol as the value.

вернуться

41

In fact, as you'll see later, names aren't intrinsically tied to any one kind of thing. You can use the same name, depending on context, to refer to both a variable and a function, not to mention several other possibilities.

вернуться

42

The case-converting behavior of the reader can, in fact, be customized, but understanding when and how to change it requires a much deeper discussion of the relation between names, symbols, and other program elements than I'm ready to get into just yet.

вернуться

43

I'll discuss the relation between symbols and packages in more detail in Chapter 21.

вернуться

44

Of course, other levels of correctness exist in Lisp, as in other languages. For instance, the s-expression that results from reading (foo 1 2) is syntactically well-formed but can be evaluated only if foo is the name of a function or macro.

вернуться

45

One other rarely used kind of Lisp form is a list whose first element is a lambda form. I'll discuss this kind of form in Chapter 5.

вернуться

46

One other possibility exists—it's possible to define symbol macros that are evaluated slightly differently. We won't worry about them.