Tuesday, May 6, 2014

On parametric polymorphism and JSON serialization

A couple of months ago I wrote Fleece, a JSON mapper for F#. What does that mean? It provides a library of functions to help map a JSON value tree onto .NET typed instances, plus functions for the inverse operation: mapping typed .NET values onto a JSON tree.
Fleece delegates the actual JSON parsing and serialization to System.Json.

But what’s the purpose of Fleece? Why another JSON library? Why F#? Is Fleece merely another case of Not-Invented-Here Syndrome? After all, anyone can serialize things easily with something like ServiceStack.Text, for example:

class Person {
    public int Id { get; set; }
    public string Name { get; set; }
}
var john = ServiceStack.Text.JsonSerializer.SerializeToString(new Person {Id = 1, Name = "John"});

Right?

However, there’s a problem here. How do we know that the code above works for this definition of Person? Well, of course we can just run it and get the expected result stored in the john variable. But can you be sure that it will work for this Person type, without running it? It seems obvious that it will work, otherwise the library wouldn’t be useful, would it?

And yet, if we slightly change the definition of Person:

class Person {
    public readonly int Id;
    public readonly string Name;

    public Person(int id, string name) {
        Id = id;
        Name = name;
    }
}
var john = ServiceStack.Text.JsonSerializer.SerializeToString(new Person(id: 1, name: "John"));

It will compile, but john will contain the string "{}", i.e. an empty object. Definitely not what anyone would want! Yes, set the magic IncludePublicFields flag and it works, but why do we have to guess this? Would it make any difference if it threw an exception instead of generating an empty JSON object? We spend a lot of time compiling things, can’t the compiler check this for us?

Even worse, SerializeToString will happily accept any instance of any type, even if it doesn’t make any sense:

var json = ServiceStack.Text.JsonSerializer.SerializeToString(new Func<int, int>(x => x + 1));
Console.WriteLine(json); // empty string

By the way, don’t think this is to bash ServiceStack in particular. Most JSON libraries for .NET have this problem:

static void NewtonsoftJson() {
    var json = Newtonsoft.Json.JsonConvert.SerializeObject(new Func<int, int>(x => x + 1));
    Console.WriteLine(json);

    // {"Delegate":{},"method0":{"Name":"<Newtonsoft>b__3","AssemblyName":"SerializationLies, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null","ClassName":"SerializationLies.Program","Signature":"Int32 <Newtonsoft>b__3(Int32)","Signature2":"System.Int32 <Newtonsoft>b__3(System.Int32)","MemberType":8,"GenericArguments":null}}
}

DataContractJsonSerializer at least acknowledges its partiality and throws:

static void DataContract() {
    using (var ms = new MemoryStream()) {
        new DataContractJsonSerializer(typeof(Func<int, int>)).WriteObject(ms, new Func<int, int>(x => x + 1));
        Console.WriteLine(Encoding.ASCII.GetString(ms.ToArray()));
    }

/* throws:             
Unhandled Exception: System.Runtime.Serialization.SerializationException: DataContractJsonSerializer does not support the setting of the FullTypeName of the object to be serialized to a value other than the default FullTypeName. 
Attempted to serialize object with full type name 'System.DelegateSerializationHolder' and default full type name 
'System.Func`2[[System.Int32, mscorlib, Version=4.0.0.0,Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]'.
*/
}

And by extension, pretty much all web frameworks (probably the most common producers of JSON) have this problem too:

In ASP.NET MVC 4:

public class HomeController : Controller {
    public ActionResult Index() {
        return Json(new Func<int, int>(x => x + 1), JsonRequestBehavior.AllowGet);
    }
}

Throws System.InvalidOperationException: A circular reference was detected while serializing an object of type 'System.Reflection.RuntimeModule'.

And therefore also Figment, since I based it on ASP.NET MVC:

get "/" (json ((+) 1))

In ASP.NET Web API your controllers can return any object and the serializers will try to serialize it according to the result of content negotiation. In the case of JSON, the default serializer is Newtonsoft, so you end up with the same result I showed above for Newtonsoft.

In NancyFX:

public class Home : NancyModule {
    public Home() {
        Get["/home"] = _ => new Func<int, int>(x => x + 1);
    }
}

Visit /home.json and get an InvalidOperationException: Circular reference detected.

In ServiceStack:

[Route("/home")]
public class Home { }
public class HomeService : Service {
    public object Any(Home h) {
        return new Func<int, int>(x => x + 1);
    }
} 

Visit /home?format=json and you get:

{"Method":{"__type":"System.Reflection.RuntimeMethodInfo, mscorlib","Name":"b__0","DeclaringType":"ServiceStackTest.HomeService, ServiceStackTest","ReflectedType":"ServiceStackTest.HomeService, ServiceStackTest","MemberType":8,"MetadataToken":100663310,"Module":{"__type":"System.Reflection.RuntimeModule, mscorlib","MDStreamVersion":131072,"FullyQualifiedName":"G:\\Windows\\Microsoft.NET\\Framework\\v4.0.30319\\Temporary ASP.NET Files\\root\\c11d5664\\d0efee07\\assembly\\dl3\\51c16a64\\5cdba327_3150cf01\\ServiceStackTest.dll","ModuleVersionId":"125112c7a82d4c2099718b901637e950","MetadataToken":1,"ScopeName":"ServiceStackTest.dll","Name":"ServiceStackTest.dll","Assembly":{"__type":"System.Reflection.RuntimeAssembly, mscorlib","CodeBase":"file:///g:/prg/ServiceStackTest/ServiceStackTest/bin/ServiceStackTest.DLL","FullName":"ServiceStackTest, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null","DefinedTypes":["ServiceStackTest.Global, ServiceStackTest","ServiceStackTest.AppHost, ServiceStackTest","Service{"ResponseStatus":{"ErrorCode":"InvalidCastException","Message":"Unable to cast object of type 'System.Security.Policy.Zone' to type 'System.Security.Policy.Url'.","StackTrace":"   at ServiceStack.Text.Common.WriteType`2.WriteProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.Common.WriteListsOfElements`1.WriteIEnumerable(TextWriter writer, Object oValueCollection)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteAbstractProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteAbstractProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteAbstractProperties(TextWriter writer, Object 
value)\r\n   at ServiceStack.Text.Common.WriteType`2.WriteProperties(TextWriter writer, Object value)\r\n   at ServiceStack.Text.JsonSerializer.SerializeToStream(Object value, Type type, Stream stream)\r\n   at ServiceStack.Text.JsonSerializer.SerializeToStream[T](T value, Stream stream)\r\n   at ServiceStack.Serialization.JsonDataContractSerializer.SerializeToStream[T](T obj, Stream stream)\r\n   at ServiceStack.Host.ContentTypes.b__5(IRequest r, Object o, Stream s)\r\n   at ServiceStack.Host.ContentTypes.<>c__DisplayClass2.b__1(IRequest httpReq, Object dto, IResponse httpRes)\r\n   at ServiceStack.HttpResponseExtensionsInternal.WriteToResponse(IResponse response, Object result, ResponseSerializerDelegate defaultAction, IRequest request, Byte[] bodyPrefix, Byte[] bodySuffix)"}}

You get the point.

I don’t think anyone really wants this, but what are the alternatives?

Many would say “just write a test for it”, but that would mean writing a test for every type we serialize, hardly a good use of our time and very easy to forget. Since we’re working in a statically-typed language, can’t we make the compiler work for us?

The first step is understanding why this is really wrong. When you write a function signature like this in C#:

string Serialize<T>(T obj)

when interpreted under the Curry-Howard isomorphism this is actually saying: “I propose a function named ‘Serialize’ which turns any value of any type into a string”. Which is a blatant lie when implemented, since we’ve seen that you can’t get a meaningful string out of many types. The only logical implementation for such a function, without breaking parametricity, is a constant string.

Well, in .NET we could also implement it by calling ToString() on the argument, but you’ll notice that Object.ToString() has essentially the same signature as our Serialize above, and therefore the same arguments apply.

And you have to give up side effects too, otherwise you could implement this function simply by ignoring the argument and reading a string from a file. The same goes for runtime type inspection (i.e. reflection), as it breaks parametricity.
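To make the difference concrete, here is a minimal sketch of both kinds of implementation. The class and method names are made up for illustration and don't come from any of the libraries above.

```csharp
using System;
using System.Globalization;

static class ParametricitySketch {
    // Parametric: never looks at obj, so every T gets the same constant string.
    public static string SerializeParametric<T>(T obj) {
        return "";
    }

    // Not parametric: the result depends on the runtime type of obj.
    public static string SerializeByInspection<T>(T obj) {
        if (obj is int) {
            return ((int)(object)obj).ToString(CultureInfo.InvariantCulture);
        }
        if (obj is string) {
            return "\"" + (string)(object)obj + "\"";
        }
        // Any other type, including delegates, silently falls through to here.
        return "{}";
    }
}
```

SerializeByInspection(42) returns "42", but SerializeByInspection(new Func<int, int>(x => x + 1)) compiles just as happily and returns "{}". The signature promised "any T", and the body can only keep that promise by guessing at runtime.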

These restrictions and the property of parametricity are important because they enable code you can reason about, and very precisely. They will help you refactor towards more general code. You can even derive theorems just from their signatures.

I won't explain Curry-Howard or parametricity here, but if you're not familiar with these concepts I highly recommend following the links above. I found them to be very important concepts, especially in statically-typed languages.

You may think that all this reasoning and these theorems are only for academics and don’t apply to your job, but we’ll see how by following these restrictions we can get the compiler to check what otherwise would have taken a test for each serialized type. This means less code and fewer runtime errors, a very real benefit! The more you abide by these restrictions, the more the compiler and types will help you.

The opposite of this programming by reasoning is programming by coincidence. You start "trying out things" to somehow hit those magical lines of code that do what you wanted to do... at least for the inputs you have considered.

So to sum up: use of unconstrained type parameters (generics) in concrete methods/classes for things that don’t truly represent “for all types” is a logic flaw.
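Just to illustrate what a more honest signature could look like, here is a rough sketch using a plain generic constraint. IJsonWritable and the Json class are hypothetical names I'm making up for this example; this is one possible direction under those assumptions, not Fleece's API and not necessarily the best approach.

```csharp
using System;
using System.Globalization;

// Hypothetical interface, not part of any library mentioned in this post.
interface IJsonWritable {
    string ToJson();
}

sealed class Person : IJsonWritable {
    public readonly int Id;
    public readonly string Name;

    public Person(int id, string name) {
        Id = id;
        Name = name;
    }

    public string ToJson() {
        // Naive escaping of backslashes and quotes, good enough for a sketch only.
        var name = Name.Replace("\\", "\\\\").Replace("\"", "\\\"");
        return "{\"Id\":" + Id.ToString(CultureInfo.InvariantCulture)
             + ",\"Name\":\"" + name + "\"}";
    }
}

static class Json {
    // The constraint narrows "for all T" to "for all T that can write themselves as JSON".
    public static string Serialize<T>(T obj) where T : IJsonWritable {
        return obj.ToJson();
    }
}
```

With this, Json.Serialize(new Person(1, "John")) returns {"Id":1,"Name":"John"} even though Person only has readonly fields, while Json.Serialize(new Func<int, int>(x => x + 1)) is rejected by the compiler because Func<int, int> doesn't satisfy the constraint.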

Now that I've briefly and badly explained the problem, what not to do and why, in the next post we'll see what we can do.
