Don't use XML/JSON for Clojure-only persistence/messaging

Updated on March 25, 2011

XML is a popular, generic data-storage format, mainly because its plain-text encoding and mandatory matching close-tags make it easy for humans to read. It's not especially easy for humans to edit, though, because it is so dang verbose: there is a lot of redundancy in the format. Still, there is a wealth of libraries available, especially in Java, for creating, parsing, and otherwise manipulating XML. So if you're getting started with Clojure, and need to store or transfer some data across a JVM boundary (e.g. to disk, or to another Clojure program over a socket), you can hardly be blamed for deciding that XML is a perfect fit. Just grab clojure.contrib.prxml (c.c.prxml) for writing, and clojure.zip (c.zip) for parsing, and have a blast, right?

Easier than in Java

And it's true, this would not be so bad. It certainly requires less work than doing the same thing in Java. If you need to share your data with some other programming language, it might even be the best choice (although JSON may well be better, depending on the circumstances). But Clojure is a Lisp, which means that Clojure data structures are even easier to parse than XML or JSON, and that the functionality for doing so is built right into the language, with no dependency on a secondary library.

Writing with pr-str and spit

clojure.core has a built-in pr function, which will write any built-in Clojure data structure (as deeply nested as you like, of course) to a machine-readable format. You can either bind *out* to a writer on the file you want to write to, and then simply pr your data structure, or you can use pr-str to get the output in a string and then use spit to send the output to a file.
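
The two writing approaches described above can be sketched side by side. This is a minimal sketch, not the only way to do it: the file names and the sample map are made up for illustration, and clojure.java.io is assumed to be available for opening the writer.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical sample data; any nested Clojure data structure would do.
(def data {:size 10, :name "dave", :friends ["charles" "nancy"]})

;; Option 1: bind *out* to a writer on the file, then simply pr the data.
;; with-open closes (and therefore flushes) the writer when the body exits.
(with-open [w (io/writer "example-data.clj")]
  (binding [*out* w]
    (pr data)))

;; Option 2: render to a string with pr-str, then spit the string to a file.
(spit "example-data-2.clj" (pr-str data))
```

Both files should end up with the same readable text. Going through pr or pr-str, rather than handing spit the raw data, keeps strings quoted so that they read back in as strings.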

For ultra-high fidelity, you can bind *print-dup* to true while printing, and Clojure will make an extra effort to make sure the output is machine-readable. I don't recommend this approach generally though, because it leads to less legible files, and is rarely necessary.
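
To get a feel for what the extra fidelity buys you, here is a small sketch using a sorted map, which is my own choice of example. Printed normally, a sorted map looks like any other map literal, so its concrete type is not recorded; with *print-dup* bound to true the printer includes enough information to rebuild the original type.

```clojure
(def m (sorted-map :b 2, :a 1))

;; Plain round trip: the text is an ordinary map literal.
(def plain (read-string (pr-str m)))

;; Round trip with *print-dup* bound to true while printing.
(def dup (read-string (binding [*print-dup* true] (pr-str m))))

;; Same contents either way; only the print-dup copy is still a sorted map.
(def same-contents? (= m plain dup))
(def plain-sorted? (sorted? plain))
(def dup-sorted? (sorted? dup))
```

If all you care about is the contents, the plain version is enough, which is why the readable format is usually the better default.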

Reading with slurp, read, and read-string

The inverse of spit is slurp: it reads an entire file into a single string. You can then use read-string to turn those characters into a meaningful Clojure data structure. Or you can bypass the string phase entirely by binding *in* to a reader on the file, and calling read directly. For real config-file power, you can even choose to eval the data structure that was read in, effectively giving your config file language the full power of Clojure: users can write let blocks to avoid duplication, for comprehensions to generate repetitive config settings, and so on.
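
Here is a rough sketch of the read-then-eval idea. The file name and the config contents are invented for the example, and clojure.java.io is assumed for opening the reader. Note that read wants a java.io.PushbackReader, so the plain reader gets wrapped in one before *in* is bound to it.

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.PushbackReader)

;; A hypothetical config file whose body is code rather than a bare literal.
(spit "example-config.clj"
      "(let [base 10] {:size base, :double-size (* 2 base)})")

;; Bind *in* to a reader on the file and call read with no arguments.
(def form
  (with-open [r (PushbackReader. (io/reader "example-config.clj"))]
    (binding [*in* r]
      (read))))

;; At this point form is just a list starting with the symbol let.
;; eval turns it into the actual config map.
(def config (eval form))
;; config is now {:size 10, :double-size 20}
```

Keeping read and eval as separate steps means you can still inspect or validate the form before deciding to run it.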

Putting it together: an example

user=> (def config {:size 10, 
                    :name 'dave, 
                    :friends ['charles 'nancy]})
#'user/config
user=> (spit "config.clj" (pr-str config))
nil
user=> (slurp "config.clj")
"{:size 10, :name dave, :friends [charles nancy]}"
user=> (def config2 (read-string (slurp "config.clj")))
#'user/config2
user=> config2
{:size 10, :name dave, :friends [charles nancy]}
user=> (= config config2)
true

Advanced topics: the print-dup multimethod

The idea behind print-dup is to write values as forms that can reliably be read back in to produce identical values. This is useful for values that can't be conveniently expressed as a list, vector, or hash-map. How do you print a readable version of a java.util.Date? You can't just write out "(Date. 1 2 3)", because that's a list of four elements, not an actual Date object; the code won't be evaluated when read in by read-string.

Clojure provides a hook into the reader to allow arbitrary objects to be effectively embedded in code as if they were literals: a form that looks like #=(some code) will not be returned from read as a list; instead, the code will be executed, and a reference to the resultant object will be included in the data structure returned by read. If you need to disable this for security reasons (it does allow code injection anytime you call read), you can bind *read-eval* to false when you are reading.
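
A quick sketch of the difference *read-eval* makes, using a harmless #= form of my own invention. The last part assumes Clojure 1.5 or later, where the clojure.edn namespace offers a reader that only ever produces plain data; that namespace postdates this article, so treat it as an optional extra rather than part of the approach described here.

```clojure
(require '[clojure.edn :as edn])

;; With the default settings, the reader executes the #= form.
(def evaluated (read-string "#=(+ 1 2)"))   ; 3

;; With *read-eval* bound to false, the same input is rejected.
(def rejected?
  (try
    (binding [*read-eval* false]
      (read-string "#=(+ 1 2)"))
    false
    (catch Exception _ true)))

;; Assumption: Clojure 1.5+. The edn reader handles plain data only.
(def safe (edn/read-string "{:size 10, :friends [charles nancy]}"))
```

For data that arrives from somewhere you don't control, such as a socket, one of the two restricted forms is probably the safer default.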

Many of Clojure's built-in data types can automatically print themselves readably, if you bind *print-dup* to ask them to. For example, Namespace objects:

user> (pr [1 2 *ns*])
[1 2 #<Namespace user>]nil
user> (read-string (with-out-str (pr [1 2 *ns*])))
; Evaluation aborted.
user> (binding [*print-dup* true] (pr [1 2 *ns*]))
[1 2 #=(find-ns user)]nil
user> (read-string (with-out-str (binding [*print-dup* true] 
                                   (pr [1 2 *ns*]))))
[1 2 #<Namespace user>]

Rolling your own

But java.util.Date objects don't know anything about Clojure, and can't be expected to print themselves in a way that its reader can understand. Enter the print-dup multimethod, which is defined in clojure.core. If you add an implementation of this multimethod for the class you're interested in, it will be called whenever someone asks for a readable printed version of an instance of that class:

user> (import java.util.Date)
java.util.Date
user> (pr-str (Date. 1 2 3))
"#<Date Sun Mar 03 00:00:00 PST 1901>"
user> (read-string (pr-str (Date. 1 2 3)))
java.lang.Exception: Unreadable form
user> (binding [*print-dup* true] (pr-str (java.util.Date. 1 2 3)))
java.lang.IllegalArgumentException: No method in multimethod 'print-dup' for dispatch value: class java.util.Date
user> (defmethod print-dup Date [d out]
        (.write out
                (str "#="
                     `(Date. ~(.getYear d)
                             ~(.getMonth d)
                             ~(.getDate d)))))
#<MultiFn clojure.lang.MultiFn@18f2e35>
user> (binding [*print-dup* true] (read-string (pr-str (Date. 1 2 3))))
#<Date Sun Mar 03 00:00:00 PST 1901>
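
One thing to watch with the method above: it rebuilds the date from year, month, and day only, so any time of day is dropped on the way back in. A possible variant, sketched here as an illustration rather than a recommendation, writes the epoch-millisecond constructor instead. The round-trip helper is a hypothetical convenience function. Also note that Clojure 1.4 and later print dates as #inst literals out of the box, so on a newer version you may not need a custom method for Date at all.

```clojure
(import 'java.util.Date)

;; Variant print-dup method: emit a call to the Date(long) constructor so
;; that hours, minutes, seconds, and milliseconds survive the round trip.
(defmethod print-dup Date [^Date d ^java.io.Writer out]
  (.write out (str "#=(java.util.Date. " (.getTime d) ")")))

;; Hypothetical helper: print with *print-dup* on, then read the text back.
(defn round-trip [x]
  (read-string (binding [*print-dup* true] (pr-str x))))

(def original (Date.))
(def restored (round-trip original))
;; restored is a fresh Date object equal to original
```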

Comments

    • Bertrand Dechoux, 6 years ago

      Great! I was looking for information about how to store clojure code/data and all the pointers are there. Thanks a lot.

    • amalloy (author), 7 years ago from Los Angeles

      I read something on the Clojure google group (http://groups.google.com/group/clojure) a couple months back about someone who had done this, in a way that was flexible enough to serialize/persist even closures. I can't find the thread now, but it may have been technomancy (Phil Hagelberg) who did it, so maybe give him a buzz.

      Edit: found it! GG's search function really sucks.

      Basically he defines a new (serializable-function) macro that expands into a call to (fn) but attaches ::source and ::env metadata, and uses those to implement print-dup.

      https://github.com/Seajure/serializable-fn/blob/ma...

    • Paul Dorman, 7 years ago

      A great coincidence that I was thinking of how to directly persist clojure code earlier today, for the umpteenth time since starting to learn the language, and here you are, describing one way you approach it.

      I wonder if it would be practical to extend what you've described here to implement a network protocol for clojure for the purpose of passing functions between remote clojure programs, which could be compiled and executed with few intervening steps. My naïve idea is that fewer transformations would make for higher performance, though I'm completely prepared for this logic to be proved false (or at least how I'm thinking about it).

      Thanks for the informative article!

      Paul

    • Alexander Yakushev, 7 years ago

      That's the answer I would like to hear. You are right that the abuse of eval done by local user is either his own fault or just not so dangerous. This is just my Common Lisp anti-eval reflex, can't help it :) .

    • amalloy (author), 7 years ago from Los Angeles

      I see your point (and considered it, of course). Most of the time, the only person your user can harm by putting malicious code in their own config file is themselves, so this isn't an issue. If you have a high-security application, you usually will be looking very closely at all uses of eval anyway, and this one would stand out. But to be honest I have a hard time imagining a Clojure application where the user can trick you into doing something for him that he couldn't do himself anyway. Just don't run any of your programs as setuid root (ha, ha) and that's usually enough.

    • Alexander Yakushev, 7 years ago

      Nice entry! Storing data as s-expressions back in Common Lisp times was very popular, now with Clojure you can get access even to more sophisticated data structures such as maps.

      The only thing that I quite don't agree with is the idea of eval'ing the config file. After all, with the power of eval comes the responsibility and you can never be sure that user hadn't put some malicious code inside his config. Unless you do some explicit checks and minimize the possibility of this, of course.

    • Simone Haruko Smith, 7 years ago from San Francisco

      I shall do so!

    • amalloy (author), 7 years ago from Los Angeles

      Thanks, Simone! Basically Clojure is one of the programming languages that is kinda "trendy" at the moment - I do most of my away-from-work programming in it.

      As it happens one of my first Hubs was called "What is Clojure": it's probably still too technical for someone who isn't a programmer, but it's more introductory than this one, if you want to take a gander.

    • Simone Haruko Smith, 7 years ago from San Francisco

      Wow, very cool! To be honest, I had not even heard of Clojure before reading this Hub. Though I'm afraid much of it went over my head, I really appreciate your explanation and love the highlighted code!
