DevTalk.net

ActiveMesa's software development blog. MathSharp, R2P, X2C and more!

JSON Processing in F#

Intro

The F# programming language is really fun to work with. It lets you improve your code almost infinitely, not only in terms of performance but also readability. Having finished reading through McConnell’s Code Complete, I decided to add some declarativeness to the JSON processing code I developed for my little pet project, Linq2vk. And since JSON is ubiquitous these days, I hope the technique described below might be of use to others as well.

Data

Most modern online services provide a public API with JSON as one of the available data formats, and the data we get in response from those services is either a JavaScript object or an array. Let me leave objects for a later investigation and concentrate on arrays like the following one:

[123, "Maxim", "Moiseev",  "http://bitbucket.org/moiseev"]

This is an array (or list, if you will) of simple values, which we know contains an integer Id followed by three string values: first name, last name and homepage URL. We are not going to parse the JSON text ourselves; the great Json.NET library is better suited for that task. Our target is a function of type JToken[] -> Person, where JToken is a single JSON token provided by Json.NET (so JToken[] is an array of them) and Person is a simple user-defined .NET class. In other words, we are going to deserialize a JSON array into an instance of a .NET class.
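
As a minimal sketch of where the JToken[] comes from, the snippet below asks Json.NET to parse the sample array and turns the result into an array of tokens. It assumes an F# script run with dotnet fsi, so that the #r "nuget: ..." directive can fetch the Newtonsoft.Json package; the names json and tokens are only illustrative.

```fsharp
#r "nuget: Newtonsoft.Json"
open Newtonsoft.Json.Linq

// The sample array from the text, as a JSON string.
let json = """[123, "Maxim", "Moiseev", "http://bitbucket.org/moiseev"]"""

// JArray is a sequence of JToken, so Seq.toArray yields a JToken[].
let tokens : JToken[] = JArray.Parse(json) |> Seq.toArray

printfn "%d tokens, the first one is of type %A" tokens.Length tokens.[0].Type
```

Everything that follows works on such an array and never looks at the JSON text again.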

Initial approach (naïve)

There are (as always) several ways of achieving this. The simplest one, which I have used until now, is as follows (for simplicity I won’t define the Person class here and will use a tuple instead):

let createPerson (ts: JToken[]) =
    let id = ts.[0] |> int
    let name = ts.[1] |> string
    let lastName = ts.[2] |> string
    let url = ts.[3] |> string
    (id, name, lastName, url)

Note that int and string here are type conversion functions, so we can use any other converter instead, for example this one:

let intBool (token:JToken) =
    0 <> (token |> int)

This approach works perfectly until, say, the URL becomes optional so that the source JavaScript list can sometimes contain 4 tokens and sometimes only 3.

let createPerson (ts: JToken[]) =
    let id = ts.[0] |> int
    let name = ts.[1] |> string
    let lastName = ts.[2] |> string
    let url =
        if ts.Length > 3 then
            ts.[3] |> string
        else
            String.Empty
    (id, name, lastName, url)

This still works but looks rather ugly, thanks to the if expression inside the otherwise almost declarative code. Let’s try to avoid ifs.

let optional<'T> (tokens:JToken[]) (index:int) convert =
    if index < tokens.Length then
        convert tokens.[index]
    else
        Unchecked.defaultof<'T>

This optional function lets us process our list of JTokens like this:

let createPerson (ts: JToken[]) =
    let id = ts.[0] |> int
    let name = ts.[1] |> string
    let lastName = ts.[2] |> string
    let url = optional<string> ts 3 string
    (id, name, lastName, url)

Much better now, but the field processing code still does not look unified: normal fields are accessed directly while the optional one goes through a separate function, and the conversion function is applied through a pipe in the former case but passed as a parameter in the latter. Still not good enough.
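
One detail of optional is easy to miss: for a missing string field it returns Unchecked.defaultof<string>, which is null rather than String.Empty as in the previous version. The sketch below, assuming an F# script with Json.NET loaded via #r "nuget: ...", shows this on a three-token array; the names threeTokens and urlOrEmpty are made up for the example.

```fsharp
#r "nuget: Newtonsoft.Json"
open System
open Newtonsoft.Json.Linq

// Same helper as in the text, with the converter type spelled out.
let optional<'T> (tokens: JToken[]) (index: int) (convert: JToken -> 'T) =
    if index < tokens.Length then
        convert tokens.[index]
    else
        Unchecked.defaultof<'T>

// Three tokens only: the URL at index 3 is missing.
let threeTokens = JArray.Parse("""[123, "Maxim", "Moiseev"]""") |> Seq.toArray

let lastName = optional<string> threeTokens 2 string   // token present, converted
let url = optional<string> threeTokens 3 string        // token missing, so null

// Callers that want an empty string have to substitute it themselves.
let urlOrEmpty = if isNull url then String.Empty else url
```

This is one more reason to look for a design where the default value is stated explicitly at the call site.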

Better approach

Here comes the idea to use all the knowledge from multiple infinitely long and smart articles about monads and parser combinators. Luckily F# gives us its Computation Expressions feature, which is all about those monads.

Fine. Let’s go. We’ll start with the definition of our parser type:

type JsonArrayParser<'a> =
    JsonArrayParser of (JToken[] -> ('a * JToken[]) list)

The parser is essentially a function that takes an array of tokens as its input and returns a list of pairs, each holding a value of some type 'a and the input array. In case of success the result is a singleton list; otherwise it is an empty list. (A list is used here for simplicity; we could instead define a discriminated union with two cases, or use the standard Option<'a> type.)
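
For comparison, here is a rough sketch of the Option-based alternative mentioned in the parenthesis. It is not the route this post takes, and OptionParser, runOptionParser, succeed, atOpt and bindOpt are hypothetical names introduced only for this illustration; Json.NET is assumed to be loaded via #r "nuget: ..." in an F# script.

```fsharp
#r "nuget: Newtonsoft.Json"
open Newtonsoft.Json.Linq

// Some (value, tokens) on success, None on failure.
type OptionParser<'a> =
    OptionParser of (JToken[] -> ('a * JToken[]) option)

let runOptionParser (OptionParser p) tokens = p tokens

// Always succeeds with x and leaves the tokens untouched.
let succeed x = OptionParser (fun ts -> Some (x, ts))

// Succeeds with the token at idx, fails when idx is out of range.
let atOpt idx =
    OptionParser (fun (ts: JToken[]) ->
        if idx >= 0 && idx < ts.Length then Some (ts.[idx], ts) else None)

// Sequencing: run p, then feed its result to f; any failure short-circuits.
let bindOpt p f =
    OptionParser (fun ts ->
        match runOptionParser p ts with
        | Some (r, rest) -> runOptionParser (f r) rest
        | None -> None)

let tokens = JArray.Parse("[1, 2]") |> Seq.toArray
let firstAsInt = bindOpt (atOpt 0) (fun (t: JToken) -> succeed (int t))
let found = runOptionParser firstAsInt tokens |> Option.map fst
let notFound = runOptionParser (atOpt 5) tokens |> Option.map fst
```

With Option there are exactly two outcomes to match on, so the "list with more than one element" case cannot arise.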

Let’s add some auxiliary code:

// running our parser
let runParser (JsonArrayParser p) tokens =
    p tokens
exception JsonParserException of string
// getting the parsing result
let getParsingResult = function
    | [] -> raise <| JsonParserException("Parsing failed")
    | [(res, _)] -> res
    | _ -> raise <| System.NotSupportedException("This should not normally happen")
// and a combination of these functions
let parse parser tokens =
    runParser parser tokens
        |> getParsingResult

It is time now to define the Computation Expression builder:

type JsonArrayBuilder() =
    member this.Return(x) =
        JsonArrayParser (fun ts -> [(x, ts)])
    member this.Bind(p, f) =
        JsonArrayParser (fun ts ->
            match runParser p ts with
            | [(r, ts')] -> runParser (f r) ts'
            | _ -> []
        )
let jarr = JsonArrayBuilder()

That’s all!

  • Return – well… returns a given value;
  • Bind – given a parser A and a function f, creates a new parser B that runs A and, depending on the outcome, either applies f to A’s result and runs the parser it returns, or returns an empty list as a sign of failure.
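
To make the roles of Return and Bind more concrete, here is a sketch of how a jarr { ... } block corresponds to explicit calls on the builder. The compiler performs roughly this translation; the parsers first and second and the names viaBuilder and byHand are made up for the illustration, and Json.NET is assumed to be loaded via #r "nuget: ..." in an F# script.

```fsharp
#r "nuget: Newtonsoft.Json"
open Newtonsoft.Json.Linq

type JsonArrayParser<'a> =
    JsonArrayParser of (JToken[] -> ('a * JToken[]) list)

let runParser (JsonArrayParser p) tokens = p tokens

type JsonArrayBuilder() =
    member this.Return(x) = JsonArrayParser (fun ts -> [(x, ts)])
    member this.Bind(p, f) =
        JsonArrayParser (fun ts ->
            match runParser p ts with
            | [(r, ts')] -> runParser (f r) ts'
            | _ -> [])

let jarr = JsonArrayBuilder()

// Two trivial parsers that succeed without looking at the tokens.
let first : JsonArrayParser<int> = jarr.Return(1)
let second : JsonArrayParser<string> = jarr.Return("x")

// Written with the computation expression syntax...
let viaBuilder =
    jarr {
        let! a = first
        let! b = second
        return (a, b)
    }

// ...and the same thing spelled out: each let! becomes a Bind call.
let byHand =
    jarr.Bind(first, fun a ->
        jarr.Bind(second, fun b ->
            jarr.Return((a, b))))

let noTokens : JToken[] = [||]
let sameResult = (runParser viaBuilder noTokens = runParser byHand noTokens)
```

Each let! hands the rest of the block to Bind as a function, which is why a failing parser can stop everything that follows it.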

Frankly speaking, this is not enough. We would also need some simple parsers:

let at idx =
    JsonArrayParser (fun ts ->
        if ts.Length <> 0 && idx < ts.Length then
            [(ts.[idx], ts)]
        else
            []
    )
let tokenAtAs idx converter =
    jarr {
        let! t = at idx
        return (converter t)
    }
let (<@>) converter idx =
    tokenAtAs idx converter
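
Before assembling a whole record, it may help to run a single indexed parser on its own. The sketch below repeats the definitions so that it can be pasted into an F# script (Json.NET is assumed to be loaded via #r "nuget: ..."); the names idResult and missing are illustrative only.

```fsharp
#r "nuget: Newtonsoft.Json"
open Newtonsoft.Json.Linq

type JsonArrayParser<'a> =
    JsonArrayParser of (JToken[] -> ('a * JToken[]) list)

let runParser (JsonArrayParser p) tokens = p tokens

type JsonArrayBuilder() =
    member this.Return(x) = JsonArrayParser (fun ts -> [(x, ts)])
    member this.Bind(p, f) =
        JsonArrayParser (fun ts ->
            match runParser p ts with
            | [(r, ts')] -> runParser (f r) ts'
            | _ -> [])

let jarr = JsonArrayBuilder()

// Succeeds with the token at idx, fails when idx is out of range.
let at idx =
    JsonArrayParser (fun (ts: JToken[]) ->
        if idx >= 0 && idx < ts.Length then [(ts.[idx], ts)] else [])

let tokenAtAs idx converter =
    jarr {
        let! t = at idx
        return (converter t)
    }

let (<@>) converter idx = tokenAtAs idx converter

let tokens = JArray.Parse("""[123, "Maxim"]""") |> Seq.toArray

// Index 0 exists: a singleton list holding the converted value.
let idResult = runParser (int <@> 0) tokens |> List.map fst
// Index 5 does not exist: an empty list, and the converter is never called.
let missing = runParser (string <@> 5) tokens |> List.map fst
```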

Here is how we are now able to write our JSON processing:

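// createPerson is itself a parser value, so it takes no token argument
let createPerson =
    jarr {
        let! id = int <@> 0
        let! name = string <@> 1
        let! lastName = string <@> 2
        let! url = string <@> 3
        return (id, name, lastName, url)
    }
// simple test: all four tokens must be present, otherwise parse raises JsonParserException
let tokens = JArray.Parse("[123, \"Maxim\", \"Moiseev\", \"http://bitbucket.org/moiseev\"]") |> Seq.toArray
let me = parse createPerson tokens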

Finally we are back where we started, and again we are missing support for optional fields. This should not be a big deal this time, however: since every piece of data processing is a parser, what we need is just another parser, or more precisely a parser combinator (a function that takes a parser as its input and returns a new parser) that yields a default value if the initial parser fails.

let parseOrDefault parser defaultValue =
    JsonArrayParser (fun ts ->
        match runParser parser ts with
        | [] -> [(defaultValue, ts)]
        | res -> res
    )
let (<|>) = parseOrDefault

Voilà! Support for optionals is implemented:

let createPerson =
    // ...
    let! url = string <@> 3 <|> String.Empty
    // ...

Please note that we don’t need parentheses here: <@> and <|> have the same precedence and are left-associative, so the expression is read as (string <@> 3) <|> String.Empty.

A little more fun

Let’s imagine that parsing of a field sometimes depends on some external condition. For example, if the id field is greater than 1000, the URL will not be present and we should use a default value instead.

let parseOrFail condition parser =
    JsonArrayParser (fun ts ->
        if condition then
            runParser parser ts
        else
            []
    )
let (<?>) = parseOrFail

As a result of applying <?> to any parser A we get a new one, B, that, depending on a condition, either does exactly what A does or returns an empty list without even running A.

let createPerson (ts: JToken[]) =
    jarr {
        let! id = int <@> 0
        let! name = string <@> 1
        let! lastName = string <@> 2
        let! url = (id <= 1000) <?> (string <@> 3) <|> String.Empty
        return (id, name, lastName, url)
    }

See how using <?> and <|> in the same expression resembles the C-style ternary operator ?:.
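
Putting the pieces together, the following is a self-contained sketch of the whole pipeline that can be pasted into an F# script (Json.NET is assumed to be loaded via #r "nuget: ..."). It follows the rule stated in the text, reading the URL only when the id does not exceed 1000; the helper toTokens, the parser name person and the three sample inputs are assumptions added for the illustration.

```fsharp
#r "nuget: Newtonsoft.Json"
open System
open Newtonsoft.Json.Linq

type JsonArrayParser<'a> =
    JsonArrayParser of (JToken[] -> ('a * JToken[]) list)

let runParser (JsonArrayParser p) tokens = p tokens

exception JsonParserException of string

let parse parser tokens =
    match runParser parser tokens with
    | [(res, _)] -> res
    | _ -> raise (JsonParserException "Parsing failed")

type JsonArrayBuilder() =
    member this.Return(x) = JsonArrayParser (fun ts -> [(x, ts)])
    member this.Bind(p, f) =
        JsonArrayParser (fun ts ->
            match runParser p ts with
            | [(r, ts')] -> runParser (f r) ts'
            | _ -> [])

let jarr = JsonArrayBuilder()

let at idx =
    JsonArrayParser (fun (ts: JToken[]) ->
        if idx >= 0 && idx < ts.Length then [(ts.[idx], ts)] else [])

// converter <@> idx: take the token at idx and convert it.
let (<@>) converter idx =
    jarr {
        let! t = at idx
        return (converter t)
    }

// parser <|> defaultValue: fall back to defaultValue when parser fails.
let (<|>) parser defaultValue =
    JsonArrayParser (fun ts ->
        match runParser parser ts with
        | [] -> [(defaultValue, ts)]
        | res -> res)

// condition <?> parser: run parser only when condition holds, fail otherwise.
let (<?>) condition parser =
    JsonArrayParser (fun ts -> if condition then runParser parser ts else [])

let person =
    jarr {
        let! id = int <@> 0
        let! name = string <@> 1
        let! lastName = string <@> 2
        let! url = (id <= 1000) <?> (string <@> 3) <|> String.Empty
        return (id, name, lastName, url)
    }

let toTokens (json: string) = JArray.Parse(json) |> Seq.toArray

// id <= 1000 and URL present: the URL is read.
let small = parse person (toTokens """[123, "Maxim", "Moiseev", "http://bitbucket.org/moiseev"]""")
// id > 1000: the URL token is ignored and the default is used.
let large = parse person (toTokens """[2000, "Maxim", "Moiseev", "http://bitbucket.org/moiseev"]""")
// URL missing altogether: the default is used as well.
let short = parse person (toTokens """[5, "Maxim", "Moiseev"]""")
```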

Conclusion

In the end we have a declarative way of processing JSON tokens and a simple parser framework that can be extended with new combinators as new parsing needs arise. As a next step we can implement a similar framework for processing JSON objects, with the only difference being the usage of field names instead of indices.
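
A possible starting point for that next step is sketched below. It is only an assumption of how the object variant could look, not code from Linq2vk: JsonObjectParser, runObjectParser, jobj, field and fieldAs are hypothetical names, and the sample field names id and first_name are invented. Json.NET is assumed to be loaded via #r "nuget: ..." in an F# script.

```fsharp
#r "nuget: Newtonsoft.Json"
open Newtonsoft.Json.Linq

// Same shape as JsonArrayParser, but the input is a JObject.
type JsonObjectParser<'a> =
    JsonObjectParser of (JObject -> ('a * JObject) list)

let runObjectParser (JsonObjectParser p) (o: JObject) = p o

type JsonObjectBuilder() =
    member this.Return(x) = JsonObjectParser (fun o -> [(x, o)])
    member this.Bind(p, f) =
        JsonObjectParser (fun o ->
            match runObjectParser p o with
            | [(r, o')] -> runObjectParser (f r) o'
            | _ -> [])

let jobj = JsonObjectBuilder()

// Succeeds with the token stored under the given name, fails if it is absent.
let field (name: string) =
    JsonObjectParser (fun (o: JObject) ->
        match o.TryGetValue(name) with
        | true, token -> [(token, o)]
        | _ -> [])

let fieldAs name converter =
    jobj {
        let! t = field name
        return (converter t)
    }

let person =
    jobj {
        let! id = fieldAs "id" int
        let! name = fieldAs "first_name" string
        return (id, name)
    }

let sample = JObject.Parse("""{ "id": 123, "first_name": "Maxim" }""")
let parsed = runObjectParser person sample |> List.map fst
let absent = runObjectParser (fieldAs "homepage" string) sample |> List.map fst
```

The combinators for defaults and conditions would carry over unchanged in spirit, since they only depend on the success-or-empty-list convention.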

Written by Maxim Moiseev

January 11th, 2011 at 4:25 pm

Posted in FSharp

  • Larry

    Really interested in how you recommend using F# with MongoDB (which has a recently released C# driver)