I can't really make the complete case for Lisp's syntax until I've explained Lisp's macros a bit more thoroughly, but I can start with an historical tidbit that suggests it may be worth keeping an open mind: when John McCarthy first invented Lisp, he intended to implement a more Algol-like syntax, which he called M-expressions. However, he never got around to it. He explained why not in his article "History of Lisp."[36]
The project of defining M-expressions precisely and compiling them or at least translating them into S-expressions was neither finalized nor explicitly abandoned. It just receded into the indefinite future, and a new generation of programmers appeared who preferred [S-expressions] to any FORTRAN-like or ALGOL-like notation that could be devised.
In other words, the people who have actually used Lisp over the past 45 years have liked the syntax and have found that it makes the language more powerful. In the next few chapters, you'll begin to see why.
Breaking Open the Black Box
Before we look at the specifics of Lisp's syntax and semantics, it's worth taking a moment to look at how they're defined and how this differs from many other languages.
In most programming languages, the language processor, whether an interpreter or a compiler, operates as a black box: you shove a sequence of characters representing the text of a program into the black box, and it either executes the behaviors indicated (if it's an interpreter) or produces a compiled version of the program that will execute the behaviors when it's run (if it's a compiler).
Inside the black box, of course, language processors are usually divided into subsystems that are each responsible for one part of the task of translating a program text into behavior or object code. A typical division is to split the processor into three phases, each of which feeds into the next: a lexical analyzer breaks up the stream of characters into tokens and feeds them to a parser that builds a tree representing the expressions in the program, according to the language's grammar. This tree—called an abstract syntax tree—is then fed to an evaluator that either interprets it directly or compiles it into some other language such as machine code. Because the language processor is a black box, the data structures used by the processor, such as the tokens and abstract syntax trees, are of interest only to the language implementer.
In Common Lisp things are sliced up a bit differently, with consequences for both the implementer and for how the language is defined. Instead of a single black box that goes from text to program behavior in one step, Common Lisp defines two black boxes, one that translates text into Lisp objects and another that implements the semantics of the language in terms of those objects. The first box is called the reader, and the second is called the evaluator.[37]
Each black box defines one level of syntax. The reader defines how strings of characters can be translated into Lisp objects called s-expressions.[38] Since the s-expression syntax includes syntax for lists of arbitrary objects, including other lists, s-expressions can represent arbitrary tree expressions, much like the abstract syntax tree generated by the parsers for non-Lisp languages.
The evaluator then defines a syntax of Lisp forms that can be built out of s-expressions. Not all s-expressions are legal Lisp forms any more than all sequences of characters are legal s-expressions. For instance, both (foo 1 2) and ("foo" 1 2) are s-expressions, but only the former can be a Lisp form since a list that starts with a string has no meaning as a Lisp form.
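Here's a small sketch of that distinction, using the standard functions READ-FROM-STRING and EVAL. The variable names are hypothetical, and I've used (+ 1 2) in place of (foo 1 2) only because no function named foo has been defined. Exactly which condition is signaled for the illegal form varies between implementations, so the sketch catches ERROR in general.

```lisp
;; The reader accepts both texts, because both are valid s-expressions.
(defparameter *good* (read-from-string "(+ 1 2)"))
(defparameter *bad*  (read-from-string "(\"foo\" 1 2)"))

(listp *good*)           ; true: the reader built a list
(stringp (first *bad*))  ; true: a list whose first element is a string

;; Only the first is a legal form, so only it evaluates to a value.
(eval *good*)            ; => 3

;; Evaluating the second is expected to signal an error; the exact
;; condition type is implementation-dependent, so catch ERROR broadly.
(defparameter *outcome*
  (handler-case (eval *bad*)
    (error () :not-a-legal-form)))
```

The reader is happy with both inputs; it's only the evaluator that objects to the second one.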
This split of the black box has a couple of consequences. One is that you can use s-expressions, as you saw in Chapter 3, as an externalizable data format for data other than source code, using READ to read it and PRINT to print it.[39] The other consequence is that since the semantics of the language are defined in terms of trees of objects rather than strings of characters, it's easier to generate code within the language than it would be if you had to generate code as text. Generating code completely from scratch is only marginally easier; building up lists vs. building up strings is about the same amount of work. The real win, however, is that you can generate code by manipulating existing data. This is the basis for Lisp's macros, which I'll discuss in much more detail in future chapters. For now I'll focus on the two levels of syntax defined by Common Lisp: the syntax of s-expressions understood by the reader and the syntax of Lisp forms understood by the evaluator.
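Both consequences can be sketched in a few lines. This is only an illustration: the variable names and the sample record are made up, and I've used PRIN1-TO-STRING and READ-FROM-STRING, the string-based relatives of PRINT and READ, so the example doesn't need a file or a stream.

```lisp
;; 1. S-expressions as a data format: print a list to text, read it back.
(defparameter *record* '(:title "Home" :rating 9))
(defparameter *text* (prin1-to-string *record*))
(defparameter *copy* (read-from-string *text*))
(equal *record* *copy*)        ; true: same structure after the round trip

;; 2. Code is a tree of objects, so new code can be derived from old code.
(defparameter *sum* (list '+ 2 3))               ; the form (+ 2 3)
(defparameter *product* (cons '* (rest *sum*)))  ; the form (* 2 3)
(eval *sum*)                   ; => 5
(eval *product*)               ; => 6
```

The second half never touches a string: the new form is made by taking an existing list apart and putting a different operator on the front.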
S-expressions
The basic elements of s-expressions are lists and atoms. Lists are delimited by parentheses and can contain any number of whitespace-separated elements. Atoms are everything else.[40] The elements of lists are themselves s-expressions (in other words, atoms or nested lists). Comments—which aren't, technically speaking, s-expressions—start with a semicolon, extend to the end of a line, and are treated essentially like whitespace.
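To see these rules at work, you can hand the reader a piece of text and inspect what comes back. The text and variable names below are invented for illustration; READ-FROM-STRING, LISTP, and ATOM are standard functions.

```lisp
;; A list holding a symbol, a nested list, a comment, a string, and a number.
(defparameter *text* "(a (b c) ; this comment runs to the end of the line
 \"d\" 42)")

(defparameter *object* (read-from-string *text*))

(length *object*)          ; => 4, the comment contributed no element
(listp (second *object*))  ; true: (b c) is a nested list
(atom (first *object*))    ; true: a symbol is an atom
(atom (third *object*))    ; true: so is the string "d"
```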
And that's pretty much it. Since lists are syntactically so trivial, the only remaining syntactic rules you need to know are those governing the form of different kinds of atoms. In this section I'll describe the rules for the most commonly used kinds of atoms: numbers, strings, and names. After that, I'll cover how s-expressions composed of these elements can be evaluated as Lisp forms.
Numbers are fairly straightforward: any sequence of digits, possibly prefaced with a sign (+ or -), containing a decimal point (.) or a solidus (/), or ending with an exponent marker, is read as a number. For example:
123 ; the integer one hundred twenty-three
3/7 ; the ratio three-sevenths
1.0 ; the floating-point number one in default precision
1.0e0 ; another way to write the same floating-point number
1.0d0 ; the floating-point number one in "double" precision
1.0e-4 ; the floating-point equivalent to one-ten-thousandth
+42 ; the integer forty-two
-42 ; the integer negative forty-two
-1/4 ; the ratio negative one-quarter
-2/8 ; another way to write negative one-quarter
246/2 ; another way to write the integer one hundred twenty-three
These different forms represent different kinds of numbers: integers, ratios, and floating point. Lisp also supports complex numbers, which have their own notation and which I'll discuss in Chapter 10.
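You can ask Lisp which kind of number the reader produced with the standard function TYPEP. This is a quick sketch that assumes the reader's default settings, under which a float written without a d exponent marker is read as a single-float.

```lisp
(typep 123 'integer)         ; true
(typep 3/7 'ratio)           ; true
(typep 1.0 'single-float)    ; true under the default reader settings
(typep 1.0d0 'double-float)  ; true
(typep 1.0e-4 'float)        ; true
```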
As some of these examples suggest, you can notate the same number in many ways. But regardless of how you write them, all rationals (integers and ratios) are represented internally in "simplified" form. In other words, the objects that represent -2/8 or 246/2 aren't distinct from the objects that represent -1/4 and 123. Similarly, 1.0 and 1.0e0 are just different ways of writing the same number. On the other hand, 1.0, 1.0d0, and 1 can all denote different objects because the different floating-point representations and integers are different types. We'll save the details about the characteristics of different kinds of numbers for Chapter 10.
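These claims can be checked with two standard comparison functions that this chapter hasn't introduced yet: EQL, which for numbers requires the same type and the same value, and =, which compares numeric value only. The sketch assumes the default reader settings and an implementation where single and double floats are distinct representations, which is the usual case.

```lisp
(eql -2/8 -1/4)    ; true: ratios are reduced to lowest terms
(eql 246/2 123)    ; true
(integerp 246/2)   ; true: 246/2 is read as the integer 123
(eql 1.0 1.0e0)    ; true: two spellings of the same float
(eql 1.0 1.0d0)    ; false: single versus double precision
(eql 1.0 1)        ; false: float versus integer
(= 1.0 1.0d0 1)    ; true: numerically they are all one
```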
String literals, as you saw in the previous chapter, are enclosed in double quotes. Within a string a backslash (\) escapes the next character, causing it to be included in the string regardless of what it is. The only two characters that must be escaped within a string are double quotes and the backslash itself. All other characters can be included in a string literal without escaping, regardless of their meaning outside a string. Some example string literals are as follows:
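The literals in this sketch are illustrative choices of my own rather than a definitive list. Each line uses a standard function (STRING=, LENGTH, or CHAR) to check what the reader actually stored in the string.

```lisp
(string= "fo\o" "foo")  ; true: an escaped o is still just o
(length "fo\\o")        ; => 4: f, o, a backslash, and o
(char "fo\\o" 2)        ; => #\\, the backslash character itself
(length "fo\"o")        ; => 4: f, o, a double quote, and o
(length "(a ; b)")      ; => 7: parentheses and semicolons need no escape
```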
[37] Lisp implementers, like implementers of any language, have many ways they can implement an evaluator, ranging from a "pure" interpreter that interprets the objects given to the evaluator directly to a compiler that translates the objects into machine code that it then runs. In the middle are implementations that compile the input into an intermediate form such as bytecodes for a virtual machine and then interpret the bytecodes. Most Common Lisp implementations these days use some form of compilation even when evaluating code at run time.
[38] Sometimes the phrase
[39] Not all Lisp objects can be written out in a way that can be read back in. But anything you can READ can be printed back out "readably" with PRINT.