2013-09-27-bcfd3b8
March 31st, 2016 by john.warden@gmail.com

The Problem

Why were keywords and vectors added to LISP, a language that was already based on symbols and lists?

The answer: because keywords and vectors are self-evaluating. Like string and number literals, they are what they are. Unlike symbols and lists, they are never treated as code that is replaced with some other value when evaluated.

You might argue that if you don’t want to symbol or a list to be treated as code, just don’t eval it! But this gets complicated if you are evaluating code that has non-code symbolic data embedded in it.

For example, it’s clear that the code below prints “2”.


> (def function 'print)
> (def arg 2)
> (eval (list function arg))
; prints 2

But what about this?


> (def function 'print)
> (def arg '(red blue))
> (eval (list function arg))
; ERROR: red is not defined

Even though we used the quote operator when we first defined arg to make sure the value was treated as a list of colors and not code, this value gets evaluated again during the evaluation of the generated code.

It can be hard to keep track of exactly when symbols and lists might be evaluated and thus need to be quoted. This is one reason LISP programmers prefer vectors and keywords for symbolic data that is not code. If data is not meant to be interpreted as code, it’s better if it can’t be interpreted as code — just as it’s better if the number 2 can’t be interpreted as anything but the number 2. You can eval vectors, keywords, and numbers all day long and they will never change.


> (def function 'print)
> (def arg [:red :blue])
> (eval (list 'print arg))
; prints [:red :blue]

There is more good discussion about why we use keywords as symbols in the Arcane Sentiment blog.

An Alternative for Self-Evaluating Data

So a way to embed self-evaluating, non-code symbolic data literals inside of LISP code is valuable. Keywords and vectors make this possible.

But another way of meeting this need would be to make symbols and lists self-evaluating by default! We could then have a way to deliberately mark expressions that are meant to be evaluated as code.

Below I explore how this might work in a variant of LISP.

: as Quote Operator

First, let’s make : the quote operator, instead of ', since LISP programmers are used to thinking of : as meaning “self-evaluating”.


> :(red blue)
(red blue)

If a single symbol is quoted, it looks like a keyword. But the : is not part of the value as it is in common LISP — it’s just the quote operator.


> :blue
blue

But since quoted data is self-evaluating by default, a symbol evaluates to itself.


> (eval :blue)
blue

' for Aliases

Second, let’s continue to use ' to quote expressions that are meant to be interpreted as code, since LISP programmers are used to that.

But instead of quoting a symbol, we can make ' part of the symbol. Let’s call symbols that start with ' aliases.


> 'x
'x

So now we have inverted the usage of : and '. In common LISP, : is a prefix attached to keywords, and ' is the quote operator. In this new LISP, : is the quote operator, and ' is a prefix attached to aliases.

Quoting Code

So let’s say aliases are evaluated in the same way as symbols are evaluated in common lisp — including rules for special forms. So to quote code, you’d need to replace every symbol in a common LISP expression with an explicit alias in this new LISP.


> eval ('let (['x 4]) ('sqrt 'x))
2

This is inconvenient. So let’s say placing ' in front of an expression is a syntactic shortcut for marking all symbols in the expression as aliases.


> '(let ([x 4]) (sqrt x))
'(let ([x 4]) (sqrt x))

' is actually part of the data, marking the entire expression as code.

The equivalent self-evaluating data structure would not include the '.


> :(let ([x 4]) (sqrt x))
(let ([x 4]) (sqrt x))

So the ' operator can still be used just as it is used common LISP, for quoting symbolic data that represents code. But now we we have the : operator for quoting self-evaluating symbolic data.

Self-Evaluating Lists

Now, common LISP assumes that lists represent function calls when evaluated. So the head of the list has to evaluate to a function, or we get an error.

> ("sqrt" 2)
ERROR: "sqrt" is not a function

Let’s say that in this new LISP, if the head of the list evaluates to a symbol, there is no error — the list is just left as a list.

> (:green "SUCCESS")
(green "SUCCESS")

Another way of looking at this is to say that a symbols acts like a value constructor that returns a list with itself as the head and its arguments as the tail.

Here’s another example of symbols as value constructors:


> (defun makeSquare (s) (:rect s s))
> (makeSquare 2)
(rect 2 2)

Back to Original Example

Now going back to the example at the beginning of this post, we can insert symbolic data in code and eval that code, and we get the expected the result.


> (def function 'print)
> (def arg :(red blue))
> (eval (list function arg))
; prints (red blue)

So we constructed a list where the head is an alias to a function, and the argument is a list of symbols. The argument self-evaluates to a list of colors, because its head is a self evaluating symbol. But, the overall expression evaluates as a function call, because the head is an alias that evaluates to the function it is bound to.

Summary of : and ' Quotes.

  • : means quote
  • ' means mark all symbols as aliases
  • symbols are self-evaluating
  • aliases are evaluated as symbols are in common LISP
  • lists are evaluated as:
    • function-applications if the head evaluates to a function
    • lists if the head evaluates to a symbol
    • errors if the head evaluates to anything else

Lists for Trees, Vectors for Lists

It’s common for the head of a list to be a symbol that defines the semantics of the list, whether the list is used to represent code…

'(sqrt 2)

…or self-evaluating symbolic data where the head is a symbol indicating the data type.

:(rect 2 2)

This is called “tagged data” in SICP. In the case of both tagged data and code, lists function like trees, with a root that indicates the type or behavior of that node, and zero or more subtrees.

Contrast this to a list where the first item doesn’t have any special significance: it just happens to be first.

[:red :blue]

LISP programmers are more likely to use vectors in this latter case, partly because LISP will never try to eval a vector as a function-application, and partly because of the syntactic similarity of vectors to lists in other languages.

I’d argue that in LISP, lists have proved themselves most appropriate for representing tree-structured data, either code or tagged data, where the head of the list is a symbol or expression acting as the root of the tree. Vectors on the other hand have proved themselves most useful for representing, well, lists.

This code snippet (using the alternative LISP we just defined) illustrates the use of lists for code, lists for self-evaluating tagged-data, and vectors for lists of things.

(render [(:rect 3 2) (:rect 8 10)])

Nesting :– and '-quotes

Naturally, you can nest '-quoted code:

> (eval '(let ([f 'print]) (f 2))) 
; prints red

And you can nest self-evaluating data literals (:-quotes) inside of '-quoted code:

> (eval '[ (square 2), (:square 2) ])
[4, (square 2)]

And there’s no reason you shouldn’t also be able to nest code inside of self-evaluating data literals:

> (eval :[ ('square 2), (square 2) ])
[4, (square 2)]

The two expressions above produce identical results, but in the first case, we assume code and quote the part of the expression that we don’t want to be treated as code, while in the second case, we assume self-evaluating data and quote the part of the expression that we do want to be treated as code.

So just as in common LISP, any data can be evaluated. Data might be purely self-evaluating, or pure code, or code with some nested self-evaluating data literals, or self-evaluating data with nested code. When data is evaluated, aliases will be replaced with whatever they are bound to, and any lists where the head has evaluated to a function will be replaced with the result of applying that function — all with the exception of special forms.

Summary

So we have inverted the default assumptions about evaluation of symbols and lists. Instead of treating symbols as aliases for other values by default, and using : to mark exceptions, we treat symbols as self-evaluating by default, and use ' to mark exceptions. And instead of treating lists as function applications by default, we treat lists as self-evaluating if the head evaluates to a symbol, and only treat lists as function applications if the head evaluates to a function.

Code is data, but not all data is code. When LISP was created, it was intended that symbols and lists be used to represent all sorts of symbolic expressions — not just code. But the requirement for self-evaluating non-code data literals that can be included as literals in code, led to the introduction of keywords and vectors as self-evaluating supplements to symbols and lists.

With this new variant of LISP, we go back to using symbols and lists in self-evaluating data, and introduce aliases as non-self-evaluating symbols for use in code. We also allow both : and ' to be used for quoting expressions, letting programmers make higher-level decisions on whether quoted data is code or not, freeing them from the granular, case-by-case decisions about whether to use a keyword or a symbol.

Finally, making symbols act as value constructors, creates a natural way to use lists either as trees representing tagged data or trees representing code, while leaving vectors to play the more natural role of listing things.

Posted in Programming Language Design Tagged with: , ,