Monday, August 22, 2011

Validating with applicative functors and LINQ

In my last post I introduced the basics of validation with applicative functors. I used a very simple example that didn't do it justice, so let's fix that now. I'll borrow a more complex example from the FluentValidation wiki:

class Address {
    public string Line1 { get; set; }
    public string Line2 { get; set; }
    public string Town { get; set; }
    public string County { get; set; }
    public string Postcode { get; set; }        
}

class Order {
    public string ProductName { get; set; }
    public decimal? Cost { get; set; }
}

class Customer {
    public int Id { get; set; }
    public string Surname { get; set; }
    public string Forename { get; set; }
    public decimal Discount { get; set; }
    public Address Address { get; set; }
    public IList<Order> Orders { get; set; }
}

In this domain, we'll do a few somewhat arbitrary validations:

  • A customer's surname can't be null.
  • A customer's surname can't be 'foo'.
  • An address postcode can't be null.
  • Address lines are optional, but content in the second line is not allowed if there is no content in the first line.
  • An order's product name can't be null.
  • If the order's product name is valid, check that its cost is positive.

Let's do this top-down, starting with the customer:

var customer = new Customer { ... };
var result =
    from surname in NonNull(customer.Surname, "Surname can't be null")
    join surname2 in NotEqual(customer.Surname, "foo", "Surname can't be foo") on 1 equals 1
    join address in ValidateAddress(customer.Address) on 1 equals 1
    join orders in ValidateOrders(customer.Orders) on 1 equals 1
    select customer;

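surname, surname2, etc. are all dummies here; we never really use them. Similarly, the "on 1 equals 1" clause is there only because C#'s join syntax requires a key comparison. Now we'll describe each of these functions, starting with NonNull(), which is almost trivial: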

static FSharpChoice<T, Errors> NonNull<T>(T value, string err) where T: class {
    if (value == null)
        return FSharpChoice.Error<T>(err);
    return FSharpChoice.Ok(value);
}

where Errors is just a type alias for FSharpList<string>, and FSharpChoice.Error() and FSharpChoice.Ok() are just less-verbose FSharpChoice constructors for this purpose.

NotEqual() is similar: as one would expect, it validates that a value is not equal to some other value, and it also returns an FSharpChoice<T, Errors>.
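The post doesn't show NotEqual(), so here is a minimal sketch of what it could look like. To keep it self-contained, it uses a small hypothetical Validation<T> class as a stand-in for FSharpChoice<T, Errors> (with a plain string array instead of FSharpList<string>); that type and the NotEqualSketch class name are assumptions for illustration, not the library's actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for FSharpChoice<T, Errors>:
// either a validated value or a list of error messages.
public sealed class Validation<T> {
    public readonly bool IsOk;
    public readonly T Value;
    public readonly string[] Errors;

    private Validation(bool isOk, T value, string[] errors) {
        IsOk = isOk; Value = value; Errors = errors;
    }
    public static Validation<T> Ok(T value) {
        return new Validation<T>(true, value, new string[0]);
    }
    public static Validation<T> Error(params string[] errors) {
        return new Validation<T>(false, default(T), errors);
    }
}

public static class NotEqualSketch {
    // Succeeds with the value unless it equals the forbidden one.
    public static Validation<T> NotEqual<T>(T value, T forbidden, string err) {
        if (EqualityComparer<T>.Default.Equals(value, forbidden))
            return Validation<T>.Error(err);
        return Validation<T>.Ok(value);
    }
}
```

Note that in this sketch a null surname passes NotEqual(), since null is not equal to "foo"; catching nulls is left to the separate NonNull() check.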

Now let's see what ValidateAddress() looks like:

static FSharpChoice<Address, Errors> ValidateAddress(Address a) {
    return from x in NonNull(a.Postcode, "Post code can't be null")
           join y in ValidateAddressLines(a) on 1 equals 1
           select a;
}

OK, but what's in ValidateAddressLines()?

static FSharpChoice<Address, Errors> ValidateAddressLines(Address a) {
    if (a.Line1 != null || a.Line2 == null)
        return FSharpChoice.Ok(a);
    return FSharpChoice.Error<Address>("Line1 is empty but Line2 is not");
}

Note how both NonNull() and ValidateAddressLines() use the form "if condition then ok else error". It's a pretty common pattern, so let's abstract it into a higher-order function:

static Func<T, FSharpChoice<T, Errors>> Validator<T>(Predicate<T> pred, string err) {
    return x => {
        if (pred(x))
            return FSharpChoice.Ok(x);
        return FSharpChoice.Error<T>(err);
    };
}

and now we can write:

static readonly Func<Address, FSharpChoice<Address, Errors>> ValidateAddressLines =
    Validator<Address>(x => x.Line1 != null || x.Line2 == null, 
                       "Line1 is empty but Line2 is not");
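The same higher-order function can express NonNull() as well. The following is only a sketch: Validation<T> is a hypothetical stand-in for FSharpChoice<T, Errors> so that the example compiles on its own, and ValidatorSketch is an illustrative class name.

```csharp
using System;

// Hypothetical stand-in for FSharpChoice<T, Errors>:
// either a validated value or a list of error messages.
public sealed class Validation<T> {
    public readonly bool IsOk;
    public readonly T Value;
    public readonly string[] Errors;

    private Validation(bool isOk, T value, string[] errors) {
        IsOk = isOk; Value = value; Errors = errors;
    }
    public static Validation<T> Ok(T value) {
        return new Validation<T>(true, value, new string[0]);
    }
    public static Validation<T> Error(params string[] errors) {
        return new Validation<T>(false, default(T), errors);
    }
}

public static class ValidatorSketch {
    // "if condition then ok else error", abstracted over the condition.
    public static Func<T, Validation<T>> Validator<T>(Predicate<T> pred, string err) {
        return x => pred(x) ? Validation<T>.Ok(x) : Validation<T>.Error(err);
    }

    // NonNull rebuilt on top of Validator: only the predicate is specific.
    public static Validation<T> NonNull<T>(T value, string err) where T : class {
        return Validator<T>(x => x != null, err)(value);
    }
}
```

With this in place, each concrete validator boils down to a predicate and an error message.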

Only ValidateOrders() is left to explain. Let's see first how to validate a single order, and then we'll figure out how to make that operate on a list of orders:

static FSharpChoice<Order, Errors> ValidateOrder(Order o) {
    return
        from name in NonNull(o.ProductName, "Product name can't be null")
        from cost in GreaterThan(o.Cost, 0, string.Format("Cost for product '{0}' must be positive", name))
        select o;
}

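Here we used monadic validation (from...from...) which, as explained before, causes the second validation to run only if the first was successful. That matters here because the cost error message uses name, which is only meaningful once the product name has been validated.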
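To make the difference between the two styles concrete, here is a rough sketch of the LINQ operators behind them. It is written against a hypothetical Validation<T> stand-in for FSharpChoice<T, Errors>, so the names (Validation, ValidationLinq, Demo) are assumptions and this is not the library's actual implementation. The C# compiler translates "from...from..." into SelectMany() and "join...on 1 equals 1" into Join(), so the behaviour depends entirely on how those two methods are defined.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for FSharpChoice<T, Errors>.
public sealed class Validation<T> {
    public readonly bool IsOk;
    public readonly T Value;
    public readonly string[] Errors;

    private Validation(bool isOk, T value, string[] errors) {
        IsOk = isOk; Value = value; Errors = errors;
    }
    public static Validation<T> Ok(T value) {
        return new Validation<T>(true, value, new string[0]);
    }
    public static Validation<T> Error(params string[] errors) {
        return new Validation<T>(false, default(T), errors);
    }
}

public static class ValidationLinq {
    // Monadic: the second validation is a function of the first value,
    // so it can only run if the first one succeeded.
    public static Validation<V> SelectMany<T, U, V>(this Validation<T> source,
            Func<T, Validation<U>> next, Func<T, U, V> project) {
        if (!source.IsOk) return Validation<V>.Error(source.Errors);
        var second = next(source.Value);
        if (!second.IsOk) return Validation<V>.Error(second.Errors);
        return Validation<V>.Ok(project(source.Value, second.Value));
    }

    // Applicative: both validations are already evaluated; the join keys
    // are ignored and the errors of both sides are concatenated.
    public static Validation<V> Join<T, U, K, V>(this Validation<T> left,
            Validation<U> right, Func<T, K> leftKey, Func<U, K> rightKey,
            Func<T, U, V> project) {
        if (left.IsOk && right.IsOk)
            return Validation<V>.Ok(project(left.Value, right.Value));
        return Validation<V>.Error(left.Errors.Concat(right.Errors).ToArray());
    }
}

public static class Demo {
    static Validation<string> NonNull(string value, string err) {
        return value == null ? Validation<string>.Error(err) : Validation<string>.Ok(value);
    }

    // Collects the errors of both validations.
    public static string[] Applicative(string a, string b) {
        var r = from x in NonNull(a, "a is null")
                join y in NonNull(b, "b is null") on 1 equals 1
                select x + y;
        return r.Errors;
    }

    // Stops at the first failing validation.
    public static string[] Monadic(string a, string b) {
        var r = from x in NonNull(a, "a is null")
                from y in NonNull(b, "b is null")
                select x + y;
        return r.Errors;
    }
}
```

With two null inputs, Demo.Applicative() reports both errors while Demo.Monadic() reports only the first one.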

Now the tricky part: making this work on IEnumerable<Order>. I'll use explicit types and inline comments here so you can better see what's going on in each step:

static FSharpChoice<IEnumerable<Order>, Errors> ValidateOrders(IEnumerable<Order> orders) {
    // first we apply the validator to all orders
    IEnumerable<FSharpChoice<Order,Errors>> validatedOrders = orders.Select(ValidateOrder);

    // now we fold (Aggregate) over the list of validated orders...
    // ...to collect all orders (or their concatenated errors) in a single FSharpChoice

    // we need first an empty list of orders in the domain of validation:
    FSharpChoice<FSharpList<Order>, Errors> zero = ListModule.Empty<Order>().PureValidate();

    // now the actual fold
    return validatedOrders
        .Aggregate(zero, 
            (FSharpChoice<FSharpList<Order>, Errors> e, FSharpChoice<Order, Errors> c) => 
                from a in e
                join b in c on 1 equals 1
                select a.Cons(b))

        // finally we need to upcast the list to match the return type
        .Select(x => (IEnumerable<Order>)x);
}

Just as before with the conditional validator, we can (and should) abstract this to a higher-order function, and then we can express ValidateOrders() as:

static readonly Func<IEnumerable<Order>, FSharpChoice<IEnumerable<Order>, Errors>> ValidateOrders =
    EnumerableValidator<Order>(ValidateOrder);
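EnumerableValidator() itself isn't shown in this post. A minimal sketch of how such a function could be written follows, again using a hypothetical Validation<T> stand-in for FSharpChoice<T, Errors>. The Combine() helper and the Positive() sample validator are assumptions made for illustration. Unlike the fold above, this version appends each valid item to the accumulator, so the original order of the items is preserved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FSharpChoice<T, Errors>.
public sealed class Validation<T> {
    public readonly bool IsOk;
    public readonly T Value;
    public readonly string[] Errors;

    private Validation(bool isOk, T value, string[] errors) {
        IsOk = isOk; Value = value; Errors = errors;
    }
    public static Validation<T> Ok(T value) {
        return new Validation<T>(true, value, new string[0]);
    }
    public static Validation<T> Error(params string[] errors) {
        return new Validation<T>(false, default(T), errors);
    }
}

public static class EnumerableValidatorSketch {
    // Lifts a validator for a single item to a validator for a sequence.
    public static Func<IEnumerable<T>, Validation<IEnumerable<T>>> EnumerableValidator<T>(
            Func<T, Validation<T>> validator) {
        return items => items
            .Select(validator)
            .Aggregate(
                // the empty sequence, lifted into the domain of validation
                Validation<IEnumerable<T>>.Ok(Enumerable.Empty<T>()),
                (acc, next) => Combine<T>(acc, next));
    }

    // Applicative step of the fold: keep collecting items while everything
    // is valid, otherwise concatenate the errors seen so far.
    static Validation<IEnumerable<T>> Combine<T>(
            Validation<IEnumerable<T>> acc, Validation<T> next) {
        if (acc.IsOk && next.IsOk)
            return Validation<IEnumerable<T>>.Ok(acc.Value.Concat(new[] { next.Value }));
        return Validation<IEnumerable<T>>.Error(acc.Errors.Concat(next.Errors).ToArray());
    }

    // Sample item validator, only for trying the sketch out.
    public static Validation<int> Positive(int n) {
        return n > 0
            ? Validation<int>.Ok(n)
            : Validation<int>.Error(n + " is not positive");
    }
}
```

For example, EnumerableValidator<int>(Positive) applied to { 1, -2, -3 } yields a failure carrying one error per invalid item, while { 1, 2, 3 } comes back as a success with the items in their original order.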

Epilogue

Validating with applicative functors is nothing new. Haskell and Scala developers have been doing it for quite some time now.

This approach to validation is attractive because it's very simple. It all revolves around the FSharpChoice type, which is one of the basic building blocks in functional programming and a very simple type: it's either this or that, either the validated value or the list of errors.

There are no ad-hoc concepts about validation here: validators are functions returning FSharpChoice<T, Errors>. We use higher-order functions to abstract validators, and folding and mapping to manipulate validations. Validators are composed through an applicative functor or a monad. These are all very general concepts, so accidental difficulty (sometimes also called "accidental complexity") is low. The core that enables applicative functor validation can be expressed in around 30 lines of F# code. In contrast, commonly used validation libraries in .NET have lots of types representing non-generic, ad-hoc concepts and abstractions with ad-hoc interactions, so their concept count and accidental complexity are high.

Of course, applicative functors are best expressed and understood in functional languages, but C# 3 seems to be a good enough vehicle for them. They can be expressed even in Java (the Functional Java library implements the validation applicative functor) although it's quite verbose. If you want to learn more about applicative functors, see:

If you found this post interesting, stay tuned: Steffen, Ryan and I are planning to create a general FP library for C# and F#, where this code would be part of said library, among many other things. In the meantime, you can find the code here.
