Don't use XML/JSON for Clojure-only persistence/messaging


XML is a popular, generic data-storage format, mainly because its plain-text encoding and mandatory matching close-tags make it easy for humans to read. It's not especially easy for humans to edit, though, because it is so dang verbose: the format is full of redundancy. There is also a wealth of libraries available, especially in Java, for creating, parsing, and otherwise manipulating XML. So if you're getting started with Clojure and need to store or transfer some data across a JVM boundary (e.g. to disk, or to another Clojure program over a socket), you can hardly be blamed for deciding that XML is a perfect fit. Just grab clojure.contrib.prxml for writing, and clojure.xml plus clojure.zip for parsing, and have a blast, right?

Easier than in Java

And it's true, this would not be so bad. It certainly takes less work than doing the same thing in Java. If you need to share your data with programs written in other languages, XML might even be the best choice (although JSON may well be better, depending on the circumstances). But Clojure is a Lisp, which means that Clojure data structures are even easier to parse than XML or JSON, and the functionality for doing so is built right into the language, with no secondary library required.

Writing with pr-str and spit

clojure.core has a built-in pr function, which will write any built-in Clojure data structure (as deeply nested as you like, of course) in a machine-readable format. You can either bind *out* to a writer on the file you want to write to and then simply pr your data structure, or you can use pr-str to get the output as a string and then use spit to send it to a file.
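Here is a minimal sketch of both approaches side by side, assuming Clojure 1.2 or later (for clojure.java.io). The file names config-a.clj and config-b.clj are made up for illustration.

```clojure
(require '[clojure.java.io :as io])

;; Sample data; the file names below are hypothetical.
(def config {:size 10, :name 'dave, :friends ['charles 'nancy]})

;; Approach 1: bind *out* to a writer on the file, then pr straight into it.
;; with-open closes (and therefore flushes) the writer when done.
(with-open [w (io/writer "config-a.clj")]
  (binding [*out* w]
    (pr config)))

;; Approach 2: render the data to a string with pr-str, then spit it to disk.
(spit "config-b.clj" (pr-str config))
```

Both files should end up with the same text, since pr-str is just pr with its output captured in a string.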

For ultra-high fidelity, you can bind *print-dup* to true while printing, and Clojure will make an extra effort to ensure that reading the output back produces values of exactly the same types. I don't generally recommend this approach, though, because it produces less legible files and is rarely necessary.
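To see the legibility cost, here is a small sketch that prints the same map with and without *print-dup*. The map named settings is a hypothetical example. With *print-dup* on, Clojure typically wraps collections in #=(...) constructor forms, and both strings should still read back as equal data.

```clojure
;; A small, hypothetical map of keywords and numbers, printed both ways.
(def settings {:size 10, :retries 3})

;; Ordinary printing: compact and easy to read.
(def plain-str (pr-str settings))

;; print-dup printing: may wrap values in #=(...) forms so the reader
;; rebuilds the same concrete types, at some cost in legibility.
(def dup-str (binding [*print-dup* true] (pr-str settings)))

(println plain-str)
(println dup-str)
```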

Reading with slurp, read, and read-string

The inverse of spit is slurp: it reads an entire file into a single string. You can then use read-string to turn those characters into a meaningful Clojure data structure. Or you can bypass the string phase entirely by binding *in* to a java.io.PushbackReader on the file and calling read directly. For real config-file power, you can even eval the data structure that was read in, effectively giving your config-file language the full power of Clojure: users can write let blocks to avoid duplication, for expressions to generate repetitive config settings, and so on.
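The three reading routes can be sketched like this, assuming Clojure 1.2 or later. The file names config.clj and computed-config.clj and the :http/:admin settings are hypothetical. Note that read expects a PushbackReader, which is why the file reader is wrapped before *in* is bound to it.

```clojure
(require '[clojure.java.io :as io])

;; Write a small config file to read back (hypothetical file name).
(spit "config.clj" (pr-str {:size 10, :name 'dave}))

;; Route 1: slurp the whole file into a string, then parse it with read-string.
(def from-string (read-string (slurp "config.clj")))

;; Route 2: skip the intermediate string. read requires a PushbackReader,
;; so wrap the file's reader in one before binding *in* to it.
(def from-reader
  (with-open [r (java.io.PushbackReader. (io/reader "config.clj"))]
    (binding [*in* r]
      (read))))

;; Route 3: eval the form that was read, so the config file may contain
;; ordinary Clojure code such as a let. Only do this with trusted files.
(spit "computed-config.clj" "(let [base 8080] {:http base, :admin (inc base)})")
(def computed (eval (read-string (slurp "computed-config.clj"))))
```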

Putting it together: an example

user=> (def config {:size 10, 
                    :name 'dave, 
                    :friends ['charles 'nancy]})
#'user/config
user=> (spit "config.clj" (pr-str config))
nil
user=> (slurp "config.clj")
"{:size 10, :name dave, :friends [charles nancy]}"
user=> (def config2 (read-string (slurp "config.clj")))
#'user/config2
user=> config2
{:size 10, :name dave, :friends [charles nancy]}
user=> (= config config2)
true

Advanced topics: the print-dup multimethod

The idea behind print-dup is to write values as forms that can reliably be read back in to produce identical values. This is useful for values that can't be conveniently expressed as a list, vector, or hash-map. How do you print a readable version of a java.util.Date? You can't just write out "(Date. 1 2 3)", because that's a list of four elements, not an actual Date object; the code won't be evaluated when read in by read-string.
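A quick sketch of that problem: reading the text of a constructor call gives back plain data, a list whose first element is the symbol java.util.Date., and only eval would turn it into an actual Date. The names form and date are just for illustration.

```clojure
;; Reading the text of a constructor call produces data, not a Date:
;; a list of four elements whose first element is the symbol java.util.Date.
(def form (read-string "(java.util.Date. 1 2 3)"))

(println (list? form) (count form) (first form))

;; Only evaluating that list would actually construct the object.
(def date (eval form))
```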

Clojure provides a hook into the reader that lets arbitrary objects be embedded in code as if they were literals: when read encounters a form like #=(some code), it does not return a list. Instead, it executes the code and includes a reference to the resulting object in the data structure it returns. Because this allows code injection any time you call read on untrusted input, you can disable it for security reasons by binding *read-eval* to false while reading.
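Here is a small sketch of both sides of that hook. The helper read-untrusted is hypothetical, not part of clojure.core: it turns off #= evaluation and returns :rejected whenever reading fails, including when the input contains a #= form.

```clojure
;; With *read-eval* at its default of true, #= runs the code at read time,
;; so read-string hands back a real Date rather than a list.
(def eval-read (read-string "#=(java.util.Date. 0)"))

;; Hypothetical helper for untrusted input: disables #= evaluation and
;; returns :rejected if reading fails for any reason (including a #= form).
(defn read-untrusted [s]
  (binding [*read-eval* false]
    (try
      (read-string s)
      (catch Exception _ :rejected))))
```

Catching every exception is a deliberately blunt choice for the sketch; a real application would probably want to distinguish malformed input from a blocked #= form.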

Many of Clojure's built-in data types can automatically print themselves readably if you bind *print-dup* to true. For example, Namespace objects:

user> (pr [1 2 *ns*])
[1 2 #<Namespace user>]nil
user> (read-string (with-out-str (pr [1 2 *ns*])))
; Evaluation aborted.
user> (binding [*print-dup* true] (pr [1 2 *ns*]))
[1 2 #=(find-ns user)]nil
user> (read-string (with-out-str (binding [*print-dup* true] 
                                   (pr [1 2 *ns*]))))
[1 2 #<Namespace user>]

Rolling your own

But java.util.Date objects don't know anything about Clojure, and can't be expected to print themselves in a way that Clojure's reader can understand. Enter the print-dup multimethod, defined in clojure.core: if you add an implementation of it for the class you're interested in, it will be called whenever someone asks for a readable printed version of an instance of that class:

user> (import java.util.Date)
java.util.Date
user> (pr-str (Date. 1 2 3))
"#<Date Sun Mar 03 00:00:00 PST 1901>"
user> (read-string (pr-str (Date. 1 2 3)))
java.lang.Exception: Unreadable form
user> (binding [*print-dup* true] (pr-str (java.util.Date. 1 2 3)))
java.lang.IllegalArgumentException: No method in multimethod 'print-dup' for dispatch value: class java.util.Date
user> (defmethod print-dup Date [d out]
        (.write out
                (str "#="
                     `(Date. ~(.getYear d)
                             ~(.getMonth d)
                             ~(.getDate d)))))
#<MultiFn clojure.lang.MultiFn@18f2e35>
user> (binding [*print-dup* true] (read-string (pr-str (Date. 1 2 3))))
#<Date Sun Mar 03 00:00:00 PST 1901>
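The method above records only year, month, and day, so any time of day is lost on the round trip. A hypothetical variant, sketched below, writes the epoch milliseconds from getTime instead, so the reader rebuilds the exact instant via the Date(long) constructor. Newer Clojure releases may already print Dates readably as #inst literals; a defmethod like this one simply replaces whatever print-dup method is registered for Date.

```clojure
(import java.util.Date)

;; Hypothetical variant of the print-dup method above: record the instant
;; as epoch milliseconds so the time of day survives the round trip.
(defmethod print-dup Date [^Date d ^java.io.Writer out]
  (.write out (str "#=(java.util.Date. " (.getTime d) ")")))

(def now (Date.))
(def printed (binding [*print-dup* true] (pr-str now)))

;; read-string evaluates the #= form (with *read-eval* at its default of true).
(def restored (read-string printed))
```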



Comments (9)


Simone Smith 5 years ago from San Francisco

Wow, very cool! To be honest, I had not even heard of Clojure before reading this Hub. Though I'm afraid much of it went over my head, I really appreciate your explanation and love the highlighted code!



amalloy 5 years ago from Los Angeles Author

Thanks, Simone! Basically Clojure is one of the programming languages that is kinda "trendy" at the moment - I do most of my away-from-work programming in it.

As it happens one of my first Hubs was called "What is Clojure": it's probably still too technical for someone who isn't a programmer, but it's more introductory than this one, if you want to take a gander.



Simone Smith 5 years ago from San Francisco

I shall do so!


Alexander Yakushev 5 years ago

Nice entry! Storing data as s-expressions was very popular back in the Common Lisp days, and with Clojure you get access to even more sophisticated data structures, such as maps.

The only thing I don't quite agree with is the idea of eval'ing the config file. After all, with the power of eval comes responsibility, and you can never be sure that the user hasn't put some malicious code in his config. Unless you do some explicit checks to minimize that possibility, of course.



amalloy 5 years ago from Los Angeles Author

I see your point (and considered it, of course). Most of the time, the only person your user can harm by putting malicious code in their own config file is themselves, so this isn't an issue. If you have a high-security application, you will usually be looking very closely at all uses of eval anyway, and this one would stand out. But to be honest, I have a hard time imagining a Clojure application where the user can trick you into doing something for him that he couldn't do himself anyway. Just don't run any of your programs as setuid root (ha, ha), and that's usually enough.


Alexander Yakushev 5 years ago

That's the answer I was hoping to hear. You are right that abuse of eval by a local user is either his own fault or just not that dangerous. This is just my Common Lisp anti-eval reflex; I can't help it :)


Paul Dorman 5 years ago

A great coincidence that I was thinking of how to directly persist clojure code earlier today, for the umpteenth time since starting to learn the language, and here you are, describing one way you approach it.

I wonder if it would be practical to extend what you've described here to implement a network protocol for clojure for the purpose of passing functions between remote clojure programs, which could be compiled and executed with few intervening steps. My naïve idea is that fewer transformations would make for higher performance, though I'm completely prepared for this logic (or at least the way I'm thinking about it) to be proved false.

Thanks for the informative article!

Paul



amalloy 5 years ago from Los Angeles Author

I read something on the Clojure google group (http://groups.google.com/group/clojure) a couple months back about someone who had done this, in a way that was flexible enough to serialize/persist even closures. I can't find the thread now, but it may have been technomancy (Phil Hagelberg) who did it, so maybe give him a buzz.

Edit: found it! GG's search function really sucks.

Basically he defines a new (serializable-function) macro that expands into a call to (fn) but attaches ::source and ::env metadata, and uses those to implement print-dup.

https://github.com/Seajure/serializable-fn/blob/ma...


Bertrand Dechoux 4 years ago

Great! I was looking for information about how to store clojure code/data and all the pointers are there. Thanks a lot.
