Tuesday, May 13, 2014

Mapping objects to JSON with Fleece

In the last post I briefly explained some of the consequences of breaking parametricity, in particular for JSON serialization. I used JSON serialization here as a particular case only to introduce Fleece, but breaking parametricity anywhere has similar consequences.

So how can we serialize JSON without breaking parametricity?

The simplest thing we could do is to write multiple monomorphic Serialize functions, one for each type we want to serialize, e.g.:

class Person {
    public readonly int Id;
    public readonly string Name;

    public Person(int id, string name) {
        Id = id;
        Name = name;
    }
}

string Serialize(Person p) {
    return @"{""Id"": " + p.Id + @", ""Name"": """ + p.Name + @"""}";
}

You’ll notice that this code doesn’t escape the Name value and therefore will generate broken JSON in general. Why doesn’t this happen with the Id? Types! The lowly int type has restricted the values it can have and therefore we can statically assure that it doesn’t need any escaping. Cool, huh?
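To make the escaping problem concrete, here is a small self-contained sketch that feeds the string-concatenating Serialize a name containing a double quote. The EscapingDemo class and the sample name are made up for illustration; the Serialize body is the one shown above.

```csharp
using System;

class Person {
    public readonly int Id;
    public readonly string Name;

    public Person(int id, string name) {
        Id = id;
        Name = name;
    }
}

// Hypothetical wrapper class, only here to make the snippet runnable.
static class EscapingDemo {
    // Same naive serializer as above: the name is pasted in verbatim.
    public static string Serialize(Person p) {
        return @"{""Id"": " + p.Id + @", ""Name"": """ + p.Name + @"""}";
    }

    static void Main() {
        // Made-up input: a name that contains double quotes.
        var p = new Person(1, "Tim \"Tentacle\" Schafer");

        // Prints: {"Id": 1, "Name": "Tim "Tentacle" Schafer"}
        // The inner quotes end the string early, so this is not valid JSON.
        Console.WriteLine(Serialize(p));
    }
}
```

The Id can never cause this, because no int value renders as text that needs escaping.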

So how do we solve the escaping problem for the Name field? With more types, of course! Instead of handling JSON as strings, we create some types to model more precisely what JSON can do, and let the compiler check that for us. System.Json does just that, so let’s use it. Strictly speaking it’s still a bit too loose, but it’s decent enough. Our code becomes:

JsonObject Serialize(Person p) {
    return new JsonObject {
        {"Id", p.Id},
        {"Name", p.Name},
    };
}

Now we don’t have to worry about encoding issues: JsonObject takes care of that. Also, the function now returns a JsonObject instead of a string, which allows us to safely compose the result into larger JSON objects. When we’re done composing, we call ToString() to get the serialized JSON.
One way to look at this is that we’re defining a tiny language here, with JsonValue.ToString() being one possible interpreter.
I’m sorry if these explanations sound a bit patronizing, but I really want to emphasize how types make our job easier.
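As a rough illustration of composing typed JSON values, the sketch below nests one object inside another and only turns the result into text at the very end. System.Json is not available in every .NET runtime, so this example swaps in System.Text.Json.Nodes (assuming .NET 6 or later), whose JsonObject plays a similar role. It is not the code used in this post, and ComposeDemo and the "Founder" key are made up.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical demo class; System.Text.Json.Nodes stands in for System.Json.
static class ComposeDemo {
    // Builds a typed JSON object instead of a string.
    public static JsonObject Serialize(int id, string name) {
        return new JsonObject {
            ["Id"] = id,
            ["Name"] = name,
        };
    }

    static void Main() {
        JsonObject person = Serialize(1, "Tim \"Tentacle\" Schafer");

        // Compose the object into a larger one; no string handling involved.
        var wrapper = new JsonObject { ["Founder"] = person };

        // Only now do we "interpret" the value as text.
        string text = wrapper.ToJsonString();
        Console.WriteLine(text);

        // Parsing the text back recovers the original name, quotes included,
        // which shows that the library handled the escaping.
        JsonNode parsed = JsonNode.Parse(text);
        Console.WriteLine(parsed["Founder"]["Name"].GetValue<string>());
    }
}
```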

Let’s raise the abstraction bar a bit and write a function that serializes an IEnumerable<T>. Since we don’t want to break parametricity, we must ensure that this will work for any type T such that T can itself be serialized. But we don’t know how to turn any arbitrary type T into a JsonValue, and we’ve ruled out runtime type inspection, so we have to pass the conversion explicitly:

JsonArray Serialize<T>(IEnumerable<T> list, Func<T, JsonValue> serializerT) {
    return new JsonArray(list.Select(serializerT));
}

That works, but it’s not very nice. We have to compose these serializer functions manually! Every time we want to serialize an IEnumerable<T> we’ll have to look up where the function to serialize T is. Is there any way to avoid that?
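To show what this manual composition looks like in practice, here is a sketch that serializes a list of lists of Person. As an assumption, it uses System.Text.Json.Nodes (.NET 6 or later) in place of System.Json so that it runs on a current runtime; the names SerializePerson, SerializeList and ManualComposition are made up to keep the overloads apart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

class Person {
    public readonly int Id;
    public readonly string Name;

    public Person(int id, string name) {
        Id = id;
        Name = name;
    }
}

// Hypothetical demo class; JsonNode stands in for System.Json's JsonValue.
static class ManualComposition {
    public static JsonNode SerializePerson(Person p) {
        return new JsonObject { ["Id"] = p.Id, ["Name"] = p.Name };
    }

    // The serializer for the element type is passed in explicitly.
    public static JsonArray SerializeList<T>(IEnumerable<T> list, Func<T, JsonNode> serializerT) {
        return new JsonArray(list.Select(serializerT).ToArray());
    }

    static void Main() {
        var teams = new List<List<Person>> {
            new List<Person> { new Person(1, "Tim Schafer") },
            new List<Person>(),
        };

        // Each level of nesting needs its serializer wired up by hand.
        JsonArray json = SerializeList(teams, team => SerializeList(team, SerializePerson));
        Console.WriteLine(json.ToJsonString());
    }
}
```

Every extra layer of nesting adds another lambda, which is exactly the bookkeeping we would like the compiler to do for us.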

We could pass the conversion by using interfaces. If we have:

public interface IToJson {
    JsonValue ToJson();
}

We could make Person implement IToJson simply by moving the Serialize(Person p) function above to the definition of Person:

class Person: IToJson {
    public readonly int Id;
    public readonly string Name;

    public Person(int id, string name) {
        Id = id;
        Name = name;
    }

    public JsonValue ToJson() {
        return new JsonObject {
            {"Id", Id},
            {"Name", Name},
        };
    }
}

Then serializing a list can be restricted to types implementing IToJson:

// An extension method must be declared static, inside a static class.
static JsonValue ToJson<T>(this IEnumerable<T> list) where T: IToJson {
    return new JsonArray(list.Select(x => x.ToJson()));
}

But this only works for types we control. We can’t make IEnumerable<T> implement IToJson. We can wrap it:

class JsonList<T>: IToJson where T: IToJson {
    public readonly IEnumerable<T> List;

    public JsonList(IEnumerable<T> list) {
        List = list;
    }

    public JsonValue ToJson() {
        return new JsonArray(List.Select(x => x.ToJson()));
    }
}

But this is inconvenient. Suppose we have a class Company with a list of Person as employees. Here’s what serialization would look like:

class Company: IToJson {
    public readonly string Name;
    public readonly List<Person> Employees;

    public Company(string name, List<Person> employees) {
        Name = name;
        Employees = employees;
    }

    public JsonValue ToJson() {
        return new JsonObject {
            {"Name", Name},
            {"Employees", new JsonList<Person>(Employees).ToJson()}
        };
    }
}

That’s not much better than manually inlining the code for JsonList.ToJson(). Ideally, we just want to call a simple function ToJson and have the compiler somehow figure out if there’s a suitable function declared for the type of the argument. Overloading would be great if, at compile time, the generic ToJson overload for lists could recursively check whether a suitable overload is defined for T.

As far as I know, C# can’t do that, but it turns out that F# can.

Enter Fleece

With Fleece, converting Person and Company to JSON goes like this, without implementing any IToJson interface:

type Person with
    static member ToJSON (x: Person) =
        jobj [
            "Id" .= x.Id
            "Name" .= x.Name
        ]

type Company with
    static member ToJSON (x: Company) =
        jobj [
            "Name" .= x.Name
            "Employees" .= x.Employees
        ]

let company = { Company.Name = "Double Fine"; Employees = [{ Person.Id = 1; Name = "Tim Schafer" }] }

let jsonAsString = (toJSON company).ToString()

Fleece here is statically ensuring that the type of the argument of toJSON has a definition of a static member ToJSON. It also checks statically that every type to the right of a .= expression has a suitable definition of ToJSON, and chooses it automatically. If there isn’t a suitable definition, you get a compile-time error.

Moreover, it "composes" (probably not the right term to use here) the serializers automatically at compile-time. In previous examples, we had to compose the list serializer with the Person serializer “manually” to serialize a concrete list of persons. Fleece defines a parametric serializer for lists, and then the compiler composes this serializer with the serializer for Person we have defined here.

What makes this possible in F# are inline functions and static member constraints. Here’s what the definition of ToJSON for a list looks like:

type ToJSONClass with
    static member inline ToJSON (x: 'a list) =
        JArray (listAsReadOnly (List.map toJSON x))

If you hover in Visual Studio over the ToJSON member in this definition, the tooltip says ToJSON: 'a list -> JsonValue (requires member ToJSON). This means that F# is inferring that the type 'a must have a ToJSON member in order to call toJSON on a list of 'a.

This is equivalent to Haskell’s:

instance (ToJSON a) => ToJSON [a] where
   toJSON = Array . V.fromList . map toJSON

Haskell has dedicated syntax for this which makes the constraint explicit compared to F#.

This has long been used in F# for generic math. F# also has an ad-hoc, limited form of this in the EqualityConditionalOn attribute, where the equality of a type depends on a type argument having “equality”. So for example, this in F#:

type MyBox<[<EqualityConditionalOn>] 'a> = ...

Roughly corresponds to Haskell’s:

instance (Eq a) => Eq (MyBox a) where ...

With the difference that F# can sometimes derive equality automatically depending on the equality of type arguments.

The bottom line here is that the toJSON function is overloaded only for supported types, instead of being parametrically polymorphic, so it does not break parametricity. This overloading is also called ad-hoc polymorphism. Haskell achieves ad-hoc polymorphism thanks to typeclasses, and this technique is based on that. In fact, Fleece is based on Aeson, the de facto standard JSON library for Haskell.

Anton Tayanovskyy had also prototyped a similar typeclass-based JSON library for F# some time ago.

Fleece only scratches the surface of what’s possible with inline-encoded typeclasses (a.k.a. “type methods”). I recommend reading Gustavo León’s blog to learn more about this technique and checking out the FsControl and FSharpPlus projects.

Refactor safety

Now, in my last post I claimed that this approach would save code by saving tests. But here we see that we have to write serializer code for each type that we want to serialize, unlike reflection-based libraries! That’s quite a bit of boilerplate! I could argue that it’s a small price to pay to get code you can reason about, but the truth is this is pretty trivial code that could be easily generated at compile-time.
In fact, Haskell can do that, either with Template Haskell or the more recent GHC.Generics, in which case you only need to make your type derive Generic and declare the typeclass instance. The actual serialization code is filled in by the compiler.

But there is a bigger problem with both reflection- and GHC.Generics-derived serialization: they tie JSON output to identifiers in your code. So whenever you rename some identifier (for example a field name in a record) in a type that is used to model some JSON output, you’re implicitly changing your JSON schema. You’re making a breaking change in the output of the program in what’s normally a safe operation. To quote a tweet:

[embedded tweet not preserved]

Or more bluntly:

[embedded tweet not preserved]

Still, it might be useful to start prototyping with a reflection-based serializer while being aware of these issues, then switch to explicit serialization once the initial prototype stage is done. Some Haskellers do this with GHC.Generics-derived serialization (thanks to Jonathan Fischoff for confirming this on #haskell IRC).
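The renaming hazard described above can be sketched in a few lines of C#. This is only an illustration, assuming .NET 6 or later and using System.Text.Json as the reflection-based serializer. PersonV1 and PersonV2 are made-up records standing for the same type before and after a rename, and Explicit is a made-up hand-written serializer.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// The same type before and after a "harmless" rename of one property.
record PersonV1(int Id, string Name);
record PersonV2(int Id, string FullName);

// Hypothetical demo class.
static class RenameDemo {
    // Explicit serializer: the JSON key is a string literal,
    // so renaming the property does not touch the output.
    public static JsonObject Explicit(PersonV2 p) {
        return new JsonObject { ["Id"] = p.Id, ["Name"] = p.FullName };
    }

    static void Main() {
        // Reflection-based: the JSON key follows the property name.
        Console.WriteLine(JsonSerializer.Serialize(new PersonV1(1, "Tim"))); // key is "Name"
        Console.WriteLine(JsonSerializer.Serialize(new PersonV2(1, "Tim"))); // key is now "FullName"

        // Explicit: the key is still "Name" after the rename.
        Console.WriteLine(Explicit(new PersonV2(1, "Tim")).ToJsonString());
    }
}
```

With the explicit version, the compiler forces us to update the property access after the rename, but the JSON schema only changes if we edit the string literal on purpose.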

Even in C#, without these F# fake typeclasses, you should be very wary of breaking parametricity and the maintenance cost it implies. In my opinion, the boilerplate is worth it to avoid breaking parametricity.

In the next post we’ll see how to use Fleece to map in the opposite direction: JSON to objects. We’ll also see some of the drawbacks of these fake typeclasses.
