(defun parse-cons-form (sexp)
  (if (consp (first sexp))
      (parse-explicit-attributes-sexp sexp)
      (parse-implicit-attributes-sexp sexp)))

(defun parse-explicit-attributes-sexp (sexp)
  (destructuring-bind ((tag &rest attributes) &body body) sexp
    (values tag attributes body)))

(defun parse-implicit-attributes-sexp (sexp)
  (loop with tag = (first sexp)
        for rest on (rest sexp) by #'cddr
        while (and (keywordp (first rest)) (second rest))
        when (second rest)
          collect (first rest) into attributes and
          collect (second rest) into attributes
        end
        finally (return (values tag attributes rest))))
Now that you have the basic language specified, you can think about how you're actually going to implement the language processors. How do you get from a series of FOO forms to the desired HTML? As I mentioned previously, you'll be implementing two language processors for FOO: an interpreter that walks a tree of FOO forms and emits the corresponding HTML directly and a compiler that walks a tree and translates it into Common Lisp code that'll emit the same HTML. Both the interpreter and compiler will be built on top of a common foundation of code, which provides support for things such as escaping reserved characters and generating nicely indented output, so it makes sense to start there.
Character Escaping
The first bit of the foundation you'll need to lay is the code that knows how to escape characters with a special meaning in HTML. There are three such characters, and they must not appear in the text of an element or in an attribute value; they are <, >, and &. In element text or attribute values, these characters must be replaced with the character reference entities &lt;, &gt;, and &amp;. Similarly, in attribute values, the quotation marks used to delimit the value must be escaped, ' with &apos; and " with &quot;. Additionally, any character can be represented by a numeric character reference entity consisting of an ampersand, followed by a sharp sign, followed by the numeric code as a base 10 integer, and followed by a semicolon. These numeric escapes are sometimes used to embed non-ASCII characters in HTML.
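To make the numeric form concrete, here is a minimal sketch that rewrites every character outside the ASCII range as a numeric character reference. The names numeric-char-ref and escape-non-ascii are hypothetical and not part of FOO, and the sketch assumes your Lisp's char-code values are Unicode code points, which is common but not guaranteed by the language standard.

```lisp
;; Hypothetical helper: build the &#NNN; form for one character.
(defun numeric-char-ref (char)
  (format nil "&#~d;" (char-code char)))

;; Hypothetical helper: copy STRING, replacing each character whose
;; code is 128 or above with its numeric character reference.
;; Assumes CHAR-CODE returns Unicode code points.
(defun escape-non-ascii (string)
  (with-output-to-string (out)
    (loop for char across string
          do (if (< (char-code char) 128)
                 (write-char char out)
                 (write-string (numeric-char-ref char) out)))))
```

Under that assumption, (numeric-char-ref #\A) returns "&#65;", and a string containing the character with code 233 has that character replaced by &#233;.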
The Package
Since FOO is a low-level library, the package you develop it in doesn't rely on much external code—just the usual dependency on names from the
The following function accepts a single character and returns a string containing a character reference entity for that character:
(defun escape-char (char)
  (case char
    (#\& "&amp;")
    (#\< "&lt;")
    (#\> "&gt;")
    (#\' "&apos;")
    (#\" "&quot;")
    (t (format nil "&#~d;" (char-code char)))))
You can use this function as the basis for a function, escape, that takes a string and a sequence of characters and returns a copy of the first argument with all occurrences of the characters in the second argument replaced with the corresponding character entity returned by escape-char.
(defun escape (in to-escape)
  (flet ((needs-escape-p (char) (find char to-escape)))
    (with-output-to-string (out)
      (loop for start = 0 then (1+ pos)
            for pos = (position-if #'needs-escape-p in :start start)
            do (write-sequence in out :start start :end pos)
            when pos do (write-sequence (escape-char (char in pos)) out)
            while pos))))
You can also define two parameters: *element-escapes*, which contains the characters you need to escape in normal element data, and *attribute-escapes*, which contains the set of characters to be escaped in attribute values.
(defparameter *element-escapes* "<>&")
(defparameter *attribute-escapes* "<>&\"'")
Here are some examples:
HTML> (escape "foo & bar" *element-escapes*)
"foo &amp; bar"
HTML> (escape "foo & 'bar'" *element-escapes*)
"foo &amp; 'bar'"
HTML> (escape "foo & 'bar'" *attribute-escapes*)
"foo &amp; &apos;bar&apos;"
Finally, you'll need a variable, *escapes*, that will be bound to the set of characters that need to be escaped. It's initially set to the value of *element-escapes*, but when generating attributes, it will, as you'll see, be rebound to the value of *attribute-escapes*.
(defvar *escapes* *element-escapes*)
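Because *escapes* is defined with DEFVAR, it's a dynamic variable, so a LET can rebind it for the duration of some code and the old value comes back automatically afterward. The following sketch shows that mechanism in isolation; escaped-now-p and demo-rebinding are hypothetical names used only for illustration, and the three variable definitions are repeated from the text so the sketch loads on its own.

```lisp
;; Repeated from the text so this sketch is self-contained.
(defparameter *element-escapes* "<>&")
(defparameter *attribute-escapes* "<>&\"'")
(defvar *escapes* *element-escapes*)

;; Hypothetical helper: true if CHAR is in the set currently in effect.
(defun escaped-now-p (char)
  (and (find char *escapes*) t))

;; Hypothetical demo: check the apostrophe before, during, and after
;; rebinding *ESCAPES* to the attribute set.
(defun demo-rebinding ()
  (list (escaped-now-p #\')
        (let ((*escapes* *attribute-escapes*))
          (escaped-now-p #\'))
        (escaped-now-p #\')))
```

Calling (demo-rebinding) returns (NIL T NIL): the apostrophe needs escaping only while the attribute set is in effect.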
Indenting Printer
To handle generating nicely indented output, you can define a class indenting-printer, which wraps around an output stream, and functions that use an instance of that class to emit strings to the stream while keeping track of when it's at the beginning of the line. The class looks like this:
(defclass indenting-printer ()
  ((out                 :accessor out                 :initarg :out)
   (beginning-of-line-p :accessor beginning-of-line-p :initform t)
   (indentation         :accessor indentation         :initform 0)
   (indenting-p         :accessor indenting-p         :initform t)))
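As a quick way to see how the slots behave, the following sketch creates a printer and reads its state back. The class definition is repeated from the text so the sketch loads on its own; make-printer and printer-state are hypothetical helpers, not part of FOO.

```lisp
;; Repeated from the text so this sketch is self-contained.
(defclass indenting-printer ()
  ((out                 :accessor out                 :initarg :out)
   (beginning-of-line-p :accessor beginning-of-line-p :initform t)
   (indentation         :accessor indentation         :initform 0)
   (indenting-p         :accessor indenting-p         :initform t)))

;; Hypothetical helper: wrap STREAM in a fresh printer. Only OUT has an
;; initarg; the other slots start from their initforms.
(defun make-printer (stream)
  (make-instance 'indenting-printer :out stream))

;; Hypothetical helper: summarize the bookkeeping slots as a list.
(defun printer-state (ip)
  (list (beginning-of-line-p ip) (indentation ip) (indenting-p ip)))
```

A new printer reports (T 0 T) from printer-state: it starts at the beginning of a line, with no indentation, and with indenting turned on. Since the slots have accessors, something like (incf (indentation ip) 2) is one way code could adjust the indentation level.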
The main function that operates on indenting-printers is emit, which takes the printer and a string and emits the string to the printer's output stream, keeping track of when it emits a newline so it can reset the beginning-of-line-p slot.
(defun emit (ip string)
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        do (emit/no-newlines ip string :start start :end pos)
        when pos do (emit-newline ip)
        while pos))
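The start/pos loop in emit can be tried in isolation. The sketch below uses the same idiom but collects the newline-free segments into a list instead of printing them; newline-segments is a hypothetical name, not part of FOO.

```lisp
;; Hypothetical helper: return the pieces of STRING that lie between
;; newlines, using the same START/POS stepping as EMIT.
(defun newline-segments (string)
  (loop for start = 0 then (1+ pos)
        for pos = (position #\Newline string :start start)
        ;; When POS is NIL, SUBSEQ runs to the end of the string.
        collect (subseq string start pos)
        while pos))
```

For a string consisting of "a", a newline, "bc", and a final newline, this returns ("a" "bc" ""). The trailing empty segment corresponds to emit's last call to emit/no-newlines, which has nothing left to write after the final newline.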
To actually emit the string, emit uses the function emit/no-newlines, which emits any needed indentation, via the helper indent-if-necessary, and then writes the string to the stream. This function can also be called directly by other code to emit a string that's known not to contain any newlines.