Understanding homoiconicity through Clojure macros

code is data, data is code

Having been primarily involved with .Net languages in my career so far, homoiconicity was a new idea to me when I first encountered it in Clojure (and also later in Elixir).

If you look it up on wikipedia, you’ll find the usual wordy definition that vaguely makes sense.

“…homoiconicity is a property of some programming languages in which the program structure is similar to its syntax, and therefore the program’s internal representation can be inferred by reading the text’s layout…”

so, in short: code is data, data is code.

quote & eval

Take a simple example:

this line of code performs a temporary binding (binds x to the value 1) and then increments x to give the return value of 2.

So it’s code that can be executed and yields some data.

But equally, it can also be thought of as a list with three elements:

a symbol named let;
a vector with two elements — a symbol named x, and an integer;
a list with two elements — a symbol named inc, and a symbol named x.

quote

You can use the quote function to take some Clojure code and instead of evaluating it, return it as data.

sidebar: any F# developer reading this will notice the similarity to code quotations in F#, although the representation you get back is not nearly as easy to manipulate nor is there a built-in way to evaluate it. That said, you do have some options, including:

F# Quotation Evaluator, and
Vagabond, which is how MBrace executes your code in the cloud

eval

On the flip side, you have the eval function. It takes data and executes it as code.

After you have captured some executable code as data, you can also manipulate it before executing the transformed code. This is where macros come in.

Macros

clojure.test for instance, is a unit test framework written with macros. You can do a simple assertion using the ‘is’ macro:

and contrast this with the error message we get from, say, NUnit.

Isn’t it great that the failing expressions are printed out so you can straight away see what was wrong? It’s much more informative than the generic message we get from NUnit, which forces us to dig around and figure out which line of the test failed.

Update 23/05/2015:

As Vasily pointed out in the comments, there is an assertion library for F# called Unquote which uses F# code quotations (mentioned above) and produces user-friendly error messages similar to clojure.test. It goes to show that, even without macros, just being able to easily capture code as data structures in your language can enable many use cases — Phil Trelford’s Foq mocking library is another good example.

Building an assert-equals macro

As a process of discovery, let’s see how this can be done via macros.

Version 1

To start off, we will define the simplest macro that might work:

oops, so that last case didn’t work.

That’s because the actual and expected values passed into the macro are code, not the integer value 2.

Version 2

So what if we just throw an eval in there?

that works, right? right?

Well, not quite.

Instead of manipulating the data representing our code, we have evaluated them at compile time (macros runs at compile time).

You can verify this by using macroexpand:

so you can see that our macro has transformed the input code into the boolean value true and returned it as code.

Version 3

What we ought to do is return the code we want to execute as data, which we know how to do already — using the quote function. In the returned code, we also need to error when the assertion fails.

So starting with the code we want to execute given that:

well, we’d want to:

compare the evaluated values of actual and expected and throw an AssertionError if they are not equal
display the actual expression (inc 1) and expected expression (+ 0 1) in the error message
display the evaluated value for actual — 2

so something along the lines of the following, perhaps?

Now that we know our endgame we can work backwards to define our macro:

See the resemblance? The important thing to note here is that we have quoted the whole let block (via the ‘ shorthand). But in order to reference the actual and expected expressions and return them as they are, i.e. (inc 1), (+ 0 1) , we had to selectivelyunquote certain things using the ~ operator.

You can expand the macro and see that it’s semantically identical to the code that we wanted to output:

Before we move on, you might be wondering about some of the quote-unquote, unquote-quote actions going on here, so let’s spend a few moments to dwell into them.

Outputting the actual expression to be evaluated

Remember, the actual and expected arguments in our defmacro block are the quoted versions of (inc 1) and (+ 0 1).

We want to evaluate actual only once for efficiency, and in case it causes side effects. Which is why we need to evaluate it and bind the result to a symbol.

In order to generate the output code (let [actual-value (inc 1)] …) which will evaluate (inc 1) at runtime, we need to reference the actual expression in its quoted form, hence ~actual.

Note the difference in the expanded code if we don’t unquote actual.

without the ~, the generated code would look for a local variable called actual which will fail because it doesn’t exist.

Outputting the actual-value symbol

In order to output the actual-value symbol in the let binding we had to write ~’actual-value, that is, (unquote (quote actual-value)).

I know, right!? Took me a while to get my head around it too.

Q. Can we not just write ‘(let [actual-value ~actual] …) ?

A. No, because it’ll translate to (let [user/actual-value (inc 1)]…) which is not a valid let binding.

Q. Ok, how about ~actual-value?

A. No, because the macro won’t compile as we’ll be looking for a non-existent local variable actual-value inside the scope of defmacro.

Q. Ok.. or ‘actual-value?

A. No, because it’ll translate to (let [(quote actual-value) (inc 1)]…) which fails at runtime because that’s not a valid syntax for binding.

Q. So how does ~’actual-value work exactly?

A. The following:

(quote actual-value) to capture the symbol actual-value
unquote the symbol so that it appears as it is in the output code

Outputting the actual and expected expressions

Finally, when formulating the error message, we also saw ‘~actual and ‘~expected.

Here are the expanded code with and without the quote.

See the difference?

Without the quote, the generated code will have evaluated (inc 1) and printed FAIL in 2.

With the quote, it’d have printed FAIL in (inc 1) instead, which is what we want.

Rule of thumb

to capture a symbol, use ~’symbol-name
to reference an argument to the macro and generate code that will be evaluated at runtime, use ~arg-name
to reference an argument to the macro and generate code that quotes it at runtime, use ‘~arg-name

Finally, let’s test out our new macro.

Sweet! So that’s it?

Almost.

There’s a minor problem with our macro here — it’s not safe from name collisions on actual-value.

Version 4

If you see # at the end of a symbol then this is used to automatically generate a new symbol with a random name. This is useful in macros as it keeps the symbols declared in macros from leaking out.

So instead of using ~’actual-value in the let binding we might do the following instead:

When expanded, you can see the let binding is using a randomly generated symbol actual-value__16087__auto__:

Not only is this version safer, it’s also more readable without the mind-bending (unquote (quote actual-value)) business!

So there, a quick(-ish) introduction to homoiconicity and Clojure macros. Macros are a powerful tool to have in one’s toolbox, and allows you to extend the language in a very natural way as clojure.test does. I hope you find the idea interesting and I have done the topic justice and explained it clearly enough.

Feel free to let me know in the comments if anything’s not clear.