Unhygienic ("anaphoric") Clojure macros for fun and profit

"Anaphoric"? What a name!

Before I get into describing how I've found anaphoric macros handy, we need some background. Anaphoric is a word borrowed from natural-language grammarians, referring basically to pronouns: "using a pronoun or similar word instead of repeating a word used earlier".

In programming languages, anaphoric means much the same thing: Perl's $_ is a perfect example of a pronoun. The Perl manual on this topic calls $_ a "topicalizer", and gives the English example "Carrots, I hate 'em!"

In languages from the Lisp family, anaphoric is sometimes used a little more generally, to mean (roughly) "a macro that introduces, or uses, symbols it does not explicitly 'own', which might conflict with other bindings of the same name".

If you have experience writing Common Lisp macros, that definition may frighten you: it looks like symbol capture! And it's true: unintentional symbol capture is the source of some very subtle, tricky bugs, and reacting strongly to it is a good survival instinct. On the other hand, Clojure makes it very difficult to capture symbols unintentionally (thanks!), and well-documented intentional symbol capture can lead to some very elegant and convenient macros. Here's an example in Clojure of a class of macros Paul Graham explains in On Lisp:

(defmacro anarithmetic [& exprs]
  (reduce (fn [rest expr]
            `(let ~['it expr]
               ~rest))
          'it
          (reverse exprs)))

(anarithmetic (+ 10 20) (* 2 it))
;; expands to
(let [it (+ 10 20)] 
  (let [it (* 2 it)] 
    it))

This is a pretty basic example, but it shows how you can use an anaphoric macro to express the same code in a different way, by introducing the symbol "it" into your code to implicitly represent intermediate results. The resulting code is equivalent to (* 2 (+ 10 20)). Note that the Common Lisp version needs to be more careful about not sharing bindings between different levels of lets, because you might use setf at some point; Clojure's immutable data types make this an issue not really worth worrying about.
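To make the "hard to capture by accident" point concrete, here is a small sketch. Inside a syntax-quote, a bare symbol is resolved to a namespace-qualified one, and a qualified symbol cannot be used as a let binding, so the macro author has to write ~'it to capture on purpose. The aif macro is modeled on the one in On Lisp; the other names are only for illustration.

```clojure
;; Syntax-quote qualifies bare symbols with the current namespace,
;; so a plain `it` inside a template would not be a legal let binding.
(def qualified `it)     ; a symbol such as user/it
(def captured  `~'it)   ; the plain, unqualified symbol it

;; Deliberate capture: ~'it emits the bare symbol into the expansion,
;; so the caller's `then` and `else` forms can refer to it.
(defmacro aif [test then else]
  `(let [~'it ~test]
     (if ~'it ~then ~else)))

(def aif-result (aif (first [1 2 3]) (inc it) :empty)) ; 2
```

Leaving out the ~' in aif would make the let binding a qualified symbol, which the compiler rejects, so the mistake shows up at compile time rather than as a silent capture.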

That was just a toy

In real life I don't have any reason to use anarithmetic: Clojure's -> and ->> operators make for an even more readable solution, in my opinion. But I have recently found several reasons to introduce symbol capture into my macros to make them more powerful. My first example might be considered "hacky", rather than "elegant": I'm okay with that, because it still reduces the number of repeated ideas in my code, and I don't plan to expose it as a public API.
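For comparison, the toy above might be written with the built-in threading macros roughly like this. as-> is the closest relative of anarithmetic, since it also lets you name the intermediate value; the var names here are only for illustration.

```clojure
;; -> threads the previous result in as the first argument of each form.
(def threaded (-> (+ 10 20)
                  (* 2)))

;; as-> binds the previous result to a name you choose, here `it`.
(def named (as-> (+ 10 20) it
             (* 2 it)))
```

Both evaluate to 60, the same as (anarithmetic (+ 10 20) (* 2 it)).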

Here's the problem I wanted to solve: I wanted to write several defmethods that were similar, and there's no easy way to do that without writing a macro that expands into one or more defmethods, and then calling it. Note that this example uses my macro-do templating utility: it defines an anonymous macro, and then immediately calls it on every argument supplied. If you find this hinders readability, go check out the source of macro-do.
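If you don't have macro-do handy, the following is a rough sketch of the idea, not its real source. The name macro-do-sketch is hypothetical, and instead of defining a true anonymous macro it builds an expander function with eval at macroexpansion time, which is a simplification of my own.

```clojure
;; Hypothetical stand-in for macro-do: treat `template` as the body of a
;; function of `params`, call it on each group of arguments, and wrap all
;; of the resulting forms in a single do.
(defmacro macro-do-sketch
  [params template & args]
  (let [expander (eval `(fn ~params ~template))]
    `(do ~@(map #(apply expander %)
                (partition (count params) args)))))

;; Usage: one template, expanded once per [name value] pair.
(macro-do-sketch [name value]
  `(def ~name ~value)

  forty-two 42
  greeting "hi")
```

The call expands to a do containing (def forty-two 42) and (def greeting "hi").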

Target expansion code

(defmethod permute clojure.lang.IPersistentMap
  [swaps orig]
  (reduce (fn [acc [a b]]
            (assoc acc
              a (orig b)
              b (orig a)))
          orig
          swaps))

(defmethod permute clojure.lang.IPersistentVector
  [swaps orig]
  (reduce (fn [acc [a b]]
            (assoc acc b (orig a)))
          orig
          swaps))

There's a lot of duplicated code there: in fact, the only differences are in the class of the defmethod, and the body of the reduce function. But if I wanted to write a purely hygienic macro to generalize this boilerplate, it would have to take a, b, and acc as arguments, which is a lot of wasted typing. Really the "cleanest" way to do this would be to have it take the whole (fn [...] ...) part as an argument, but that's half of the boilerplate!

Instead, I just have my macro introduce a, b, and acc as bindings in the expansion, and splice in the expression as part of the reduction function. If someone wants to use different names for their variables, tough. I'm only writing this macro to simplify writing several methods that all do essentially the same pairwise reduction over a vector of swaps; when I want to do something not pairwise, I'll just write the defmethods by hand.

Making it happen

(macro-do
 [class how]
 `(defmethod permute ~class [swaps# ~'orig]
             (reduce (fn ~'[acc [a b]]
                       ~how)
                     ~'orig swaps#))

 clojure.lang.IPersistentMap
 (assoc acc
   a (orig b)
   b (orig a))

 clojure.lang.IPersistentVector
 (assoc acc b (orig a)))

Because I know exactly what context the expressions will be spliced into, I'm able to save such a large portion of the coding in each defmethod that the macro "pays for itself" even with only two uses.
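To try this at a REPL without macro-do, the same capturing template can live in an ordinary named macro. This is a sketch under assumptions: def-permute is a hypothetical name, and the defmulti with its dispatch on the class of orig is a guess, since the dispatch function isn't shown here.

```clojure
;; Assumed dispatch: on the class of the collection being permuted.
(defmulti permute (fn [swaps orig] (class orig)))

;; Same template as above: orig, acc, a and b are captured on purpose,
;; while swaps# stays a hygienic auto-gensym.
(defmacro def-permute
  [class how]
  `(defmethod permute ~class [swaps# ~'orig]
     (reduce (fn ~'[acc [a b]]
               ~how)
             ~'orig swaps#)))

(def-permute clojure.lang.IPersistentMap
  (assoc acc
    a (orig b)
    b (orig a)))

(def-permute clojure.lang.IPersistentVector
  (assoc acc b (orig a)))

(def swapped-map (permute [[:a :b]] {:a 1 :b 2}))       ; {:a 2, :b 1}
(def moved-vec   (permute [[0 1] [1 0]] [:x :y]))       ; [:y :x]
```

Each how expression can mention acc, a, b, and orig freely, because the macro guarantees those exact names exist at the splice point.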

What about a less-gross example?

Okay, introducing bindings in that last example definitely shrank the code, but it's not exactly documented how a, b, and acc behave. If someone other than me wanted to use this macro (not that they can - I made it anonymous to save them from themselves!) they'd have to do some studying to find out what all the bindings "mean".

Even in the example from On Lisp, "it" feels a little unnatural: in real life the macro would be well-documented, but using it in your code is bound to feel a little awkward even so. Is there a real-world example that feels elegant instead of ugly?

A digression: lazy-seq

Clojure loves lazy sequences. Everything is lazy: map, iterate, filter, concat - these all produce sequences that don't do any computations until the caller requests an item from the sequence. They do this with the magic of the lazy-seq macro, but there are so many lazy sequence functions available that when you need to produce lazy sequences of your own, you can usually do so by combining existing functions, rather than using lazy-seq yourself.
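As a small illustration of that laziness, here is a hand-rolled infinite sequence built directly with lazy-seq. The counter atom and the names are only there to make the deferred computation visible; they are not part of any library.

```clojure
;; Counts how many elements have actually been computed.
(def realized (atom 0))

(defn counting-from [n]
  (lazy-seq
    (swap! realized inc)                 ; runs only when this element is requested
    (cons n (counting-from (inc n)))))

(def numbers (counting-from 0))          ; builds the sequence, computes nothing
(def realized-before @realized)          ; still 0 at this point

(def first-three (doall (take 3 numbers))) ; forces the first three elements
```

Even though counting-from recurs without a base case, nothing loops forever: each recursive call is wrapped in lazy-seq and only runs when a consumer asks for the next element.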

However, as I discovered recently, often it's simpler and less error-prone to use lazy-seq anyway, when the existing functions would require a lot of glue to squish into the shape you want. So twice recently I've written some code that looks like this:

An anaphoric macro: lazy-loop

(defn unfold [next done? seed]
  ((fn unfold* [seed]
     (lazy-seq
      (when-not (done? seed)
        (let [[value new-seed] (next seed)]
          (cons value
                (unfold* new-seed))))))
   seed))

Aside: some prefer to use letfn to declare the local function, and then call it by name; I'd rather give it a name only for purposes of self-reference, and call it at the moment I define it. That's what my ((fn name [arg] ...) actual-arg) syntax does.
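For readers who do prefer letfn, the same function might look like the sketch below. The names unfold-letfn and step are mine, chosen to avoid clashing with the version above.

```clojure
;; The letfn style: declare the local function first, then call it by name.
(defn unfold-letfn [next done? seed]
  (letfn [(step [seed]
            (lazy-seq
              (when-not (done? seed)
                (let [[value new-seed] (next seed)]
                  (cons value
                        (step new-seed))))))]
    (step seed)))

;; Squares of 1 through 4: next returns [value new-seed] for each seed.
(def squares
  (unfold-letfn (fn [n] [(* n n) (inc n)])
                #(> % 4)
                1))
;; squares is (1 4 9 16)
```

Either way, the initial call still sits at the bottom, away from the parameter it feeds.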

Anyway, declaring the internal function, wrapping everything up with lazy-seq, and then calling that function with the initial arguments is kinda repetitive. Can I automate it with a macro? Certainly, but why should I have to specify a name for the internal function? It would be much nicer if it were defined with anaphoric magic:

(defn unfold [next done? seed]
  (lazy-loop [seed seed]
    (when-not (done? seed)
      (let [[value new-seed] (next seed)]
        (cons value
              (lazy-recur new-seed))))))

This is only two lines shorter, but the lines are much simpler to read. lazy-loop and lazy-recur behave like the built-in loop/recur, so the reader immediately grasps what they're doing; with my original code, you have to figure out what unfold*'s place in all this is, and the seed argument hanging at the bottom looks out of place: it's not immediately obvious that this is being passed as the argument to unfold*. With lazy-loop and lazy-recur, control flows naturally from the top of the function to the bottom, with no need to figure out what seed belongs to: it's clearly the initial value for the lazy-loop binding, just as in (loop [seed seed] ...).

So how complicated is the macro that enables this? It's trivial!

(defmacro lazy-loop
  [bindings & body]
  (let [inner-fn 'lazy-recur
        names (take-nth 2 bindings)
        values (take-nth 2 (rest bindings))]
    `((fn ~inner-fn
        ~(vec names)
        (lazy-seq
          ~@body))
      ~@values)))

It really looks like just a template for the original version, except with some logic for peeling apart loop bindings into names and values, since fn wants the names in its parameter list, and the values in the call to it. I'm not very happy with my repetition of take-nth: this code will probably see some improvement over time. But it's still pretty easy to read, and it's clear that it's introducing a symbol called "lazy-recur". In the real version I have explicitly documented this (and in fact the documentation is about as long as the code), but this Hub is focusing on the code that makes it happen.
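One possible way to avoid the doubled take-nth, offered only as a sketch, is to partition the bindings into pairs first. The name lazy-loop-alt is hypothetical, so that it doesn't shadow the real macro; the expansion is meant to be the same.

```clojure
;; Variant of lazy-loop that splits the bindings into [name value] pairs.
;; lazy-recur is still captured deliberately via ~'lazy-recur.
(defmacro lazy-loop-alt
  [bindings & body]
  (let [pairs  (partition 2 bindings)
        names  (map first pairs)
        values (map second pairs)]
    `((fn ~'lazy-recur ~(vec names)
        (lazy-seq
          ~@body))
      ~@values)))

(defn unfold-alt [next done? seed]
  (lazy-loop-alt [seed seed]
    (when-not (done? seed)
      (let [[value new-seed] (next seed)]
        (cons value
              (lazy-recur new-seed))))))

;; Fibonacci numbers: the seed is a pair [a b], the emitted value is a.
(def fibs (unfold-alt (fn [[a b]] [a [b (+ a b)]])
                      (constantly false)
                      [0 1]))
;; (take 5 fibs) is (0 1 1 2 3)
```

One difference from the built-in loop worth knowing about: because the initial values are passed as ordinary call arguments, a later binding's initial value cannot refer to an earlier binding's name, the way it can in loop or let.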

Go forth and do it

I'm definitely glad that Clojure makes it hard to introduce symbol capture accidentally, but doing it explicitly has proved to have more benefits than I'd expected. Next time you're writing some repetitive code, and you inevitably realize that it should be split up into a separate macro or function, ask yourself if it would be more convenient or more readable to have the macro "magically" introduce some bindings for you as well. If so, give it a try - you may find symbol capture isn't such a bad thing after all!
