Monday, August 29, 2011

Validating with applicative functors in F#

In my last post I showed how to validate using applicative functors in C#. LINQ helps a lot with making the syntax bearable.

In F# we have other tools to encode applicative functors: custom operators and type inference. And thanks to pattern matching it's much easier to define the validation applicative functor operations in F# (but I won't show that here).

So let's see what the C# code from the last post looks like in F#. Hopefully this article will serve both C# developers interested in learning F# and F# developers interested in learning more about applicative functors. F# programmers will probably find the first part boring and might want to go directly to the second part.

Defining the primitives

Starting from the bottom up, here's the function that builds a validator from a predicate:

static Func<T, FSharpChoice<T, Errors>> Validator<T>(Predicate<T> pred, string error) {
    return x => {
        if (pred(x))
            return FSharpChoice.Ok(x);
        return FSharpChoice.Error<T>(error);
    };
}

Same thing in F#:

let validator pred error x =
    if pred x
        then Choice1Of2 x
        else Choice2Of2 [error]

We don't need the Ok and Error constructors because F# correctly infers the appropriate types. In fact, we did not use a single type annotation!

We don't need to explicitly state that we're returning a function, as we'll see now:

static FSharpChoice<T, Errors> NonNull<T>(T value, string err) where T: class {
    return Validator<T>(x => x != null, err)(value);
}

Same in F# :

let (==) = LanguagePrimitives.PhysicalEquality
let inline (!=) a b = not (a == b)

let nonNull e = validator ((!=) null) e

Readers unfamiliar with F# might wonder about the definition of == and !=. The answer is: F# doesn't have a built-in reference equality operator, and with good reason: usually you want to compare things by their structure rather than by reference. We could have used F# structural equality here (the <> operator in this case) but this would have placed an additional equality type restriction. It doesn't really matter in this case but we'll try to keep things as close as possible to the equivalent C# code. If you want to learn more about structural equality I refer you to this very thorough article by Brian McNamara.

C# programmers that are still paying any attention to this code (gotcha! ;-) should have noticed that I defined the validator function with three parameters (pred, error, x), yet I only defined the first two arguments when defining nonNull. Where did the other parameter go? This is an example of partial application of a curried function. Currying and partial application are often confused with each other, but they're not the same thing. Ivan Towlson has a great series of articles about partial application in F# and comparing it to C#. I also recomend this article (and related) by Dustin Campbell.

Moving on:

static FSharpChoice<T, Errors> NotEqual<T>(T value, T other, string err) {
    return Validator<T>(v => !Equals(v, other), err)(value);
}

Same in F# :

let notEqual a = validator ((<>) a)

Here we did use structural equality, we certainly don't want to compare things by reference. Object.Equals is not the same as F#'s structural equality. Something probably a little closer would be requiring T to implement IEquatable<T> in the C# function, but still not the same.

static Func<Address, FSharpChoice<Address, Errors>> ValidateAddressLines = 
    Validator<Address>(x => x.Line1 != null || x.Line2 == null, 
                       "Line1 is empty but Line2 is not");

Same in F# :

let validateAddressLines =
    validator 
        (fun (a: Address) -> a.Line1 != null || a.Line2 == null) 
        "Line1 is empty but Line2 is not"

Nothing interesting here. Next:

static FSharpChoice<T?, Errors> GreaterThan<T>(T? value, T? other, string err) where T: struct, IComparable<T> {
    if (!value.HasValue && !other.HasValue || value.HasValue != other.HasValue || value.Value.CompareTo(other.Value) > 0)
        return FSharpChoice.Ok(value);
    return FSharpChoice.Error<T?>(err);
}

The bad news here is that F# doesn't have built-in support for nullable types. Fortunately, it's not too hard to write a few functions and get decent syntax. I already did that about a year ago so I'll just use it here:

let greaterThan o = validator ((<?) o)

No, that's not a typo: due to the way operators are defined, they have to be read backwards when partially applied. You can be a bit more explicit/verbose and instead say:

let greaterThan o = validator (fun a -> a >? o)

Combining functions through monads and applicative functors

Now that we have all primitives defined, let's get to the interesting part: combine them with an applicative functor and monad.

I've blogged about applicative functors in F# many times... very briefly, there are two operations: pure (puree in F#) and apply (<*> in F#). And we define this convenience function (this is standard for applicative functors):

let inline (<!>) f x = puree f <*> x

Here's ValidateAddress() in C# using LINQ to encode the applicative functor:

static FSharpChoice<Address, Errors> ValidateAddress(Address a) {
    return from x in NonNull(a.Postcode, "Post code can't be null")
           join y in ValidateAddressLines(a) on 1 equals 1
           select a;
}

Here's the equivalent in F# :

let validateAddress (a: Address) = 
    fun x y -> a
    <!> nonNull "Post code can't be null" a.Postcode
    <*> validateAddressLines a

If you squint a little, they're not that different really. We can go a bit further to avoid having those dummy values x and y that we end up ignoring anyway. We only need to define (these are also standard functions):

let inline lift2 f a b = f <!> a <*> b
let inline ( <*) a b = lift2 (fun z _ -> z) a b

Now we can say:

let validateAddress (a: Address) = 
    puree a
    <* nonNull "Post code can't be null" a.Postcode
    <* validateAddressLines a

Validating a single Order in C#:

static FSharpChoice<Order, Errors> ValidateOrder(Order o) {
    return
        from name in NonNull(o.ProductName, "Product name can't be null")
        from cost in GreaterThan(o.Cost, 0, string.Format("Cost for product '{0}' must be positive", name))
        select o;
}

Unlike the previous validators, this one is monadic, not applicative. In F# we have some alternatives. We can use a computation expression:

let validateOrder (o: Order) =
    validation {
        let! name = nonNull "Product name can't be null" o.ProductName
        let! _ = greaterThan (0m).n (sprintf "Cost for product '%s' must be positive" name) o.Cost
        return o
    }

Or we can use monadic operators directly:

let validateOrder (o: Order) =
    let nameNotNull = nonNull "Product name can't be null" o.ProductName
    let positiveCost n = greaterThan (0m).n (sprintf "Cost for product '%s' can't be negative" n) o.Cost
    nameNotNull >>= positiveCost |> map (fun _ -> o)

The higher-order function to make this operate on sequences of orders:

static FSharpChoice<FSharpList<Order>, Errors> ValidateOrders(IEnumerable<Order> orders) {
    var zero = ListModule.Empty<Order>().PureValidate();
    return orders
        .Select(ValidateOrder)
        .Aggregate(zero, (e, c) => from a in e
                                   join b in c on 1 equals 1
                                   select a.Cons(b));
}

Same in F# :

let inline flip f a b = f b a
let inline cons a b = a::b
let seqValidator f = 
    let zero = puree []
    Seq.map f >> Seq.fold (lift2 (flip cons)) zero
let validateOrders c = seqValidator validateOrder c

And the final validations with an sample application:

let customer = 
    Customer(
        Surname = "foo",
        Address = Address(Postcode = "1424"),
        Orders = ResizeArray([
                                Order(ProductName = "Foo", Cost = (5m).n)
                                Order(ProductName = "Bar", Cost = (-1m).n)
                                Order(ProductName = null , Cost = (-1m).n)
                 ]))
let result = 
    puree customer
    <* nonNull "Surname can't be null" customer.Surname
    <* notEqual "foo" "Surname can't be foo" customer.Surname
    <* validateAddress customer.Address
    <* validateOrders customer.Orders
match result with
| Choice1Of2 c -> printfn "Valid customer: %A" c
| Choice2Of2 errors -> printfn "Invalid customer. Errors:\n%A" errors

Wrap it up already!

I know, I know, this post is too long already. Hopefully this step-by-step, side-by-side comparison will help C# programmers see some of the power of F# through currying, partial application, custom operators and type inference, in a non-trivial example. And F# developers can see another use for applicative functors: validation.

Code is here.

Monday, August 22, 2011

Validating with applicative functors and LINQ

In my last post I introduced the basics of validation with applicative functors. I used a very simple example that didn't make it justice, so let's fix that now. I'll borrow a more complex example from the FluentValidation wiki:

class Address {
    public string Line1 { get; set; }
    public string Line2 { get; set; }
    public string Town { get; set; }
    public string County { get; set; }
    public string Postcode { get; set; }        
}

class Order {
    public string ProductName { get; set; }
    public decimal? Cost { get; set; }
}

class Customer {
    public int Id { get; set; }
    public string Surname { get; set; }
    public string Forename { get; set; }
    public decimal Discount { get; set; }
    public Address Address { get; set; }
    public IList<Order> Orders { get; set; }
}

In this domain, we'll do a few somewhat arbitrary validations:

  • A customer's surname can't be null.
  • A customer's surname can't be 'Foo'.
  • An address postcode can't be null
  • Address lines are optional, but content in the second line is not allowed if there is no content in the first line.
  • An order's product name can't be null
  • If the order's product name is valid, check that its cost is positive.

Let's do this top-down; starting with the customer:

var customer = new Customer { ... }
var result =
    from surname in NonNull(customer.Surname, "Surname can't be null")
    join surname2 in NotEqual(customer.Surname, "foo", "Surname can't be foo") on 1 equals 1
    join address in ValidateAddress(customer.Address) on 1 equals 1
    join orders in ValidateOrders(customer.Orders) on 1 equals 1
    select customer;

surname, surname2, etc, are all dummies here, we never really use them. Now we'll describe each of these functions, starting with NonNull() which is almost trivial:

static FSharpChoice<T, Errors> NonNull<T>(T value, string err) where T: class {
    if (value == null)
        return FSharpChoice.Error<T>(err);
    return FSharpChoice.Ok(value);
}

where Errors is just a type alias for FSharpList<string>, and FSharpChoice.Error() and FSharpChoice.Ok() are just less-verbose FSharpChoice constructors for this purpose.

NotEqual() is similar and, just as one would expect, validates that a value is not equal to some other value and also returns a FSharpChoice<T, Errors>

Now let's see how ValidateAddress() looks like:

static FSharpChoice<Address, Errors> ValidateAddress(Address a) {
    return from x in NonNull(a.Postcode, "Post code can't be null")
           join y in ValidateAddressLines(a) on 1 equals 1
           select a;
}

Ok, but what's in ValidateAddressLines() ?

static FSharpChoice<Address, Errors> ValidateAddressLines(Address a) {
    if (a.Line1 != null || a.Line2 == null)
        return FSharpChoice.Ok(a);
    return FSharpChoice.Error<Address>("Line1 is empty but Line2 is not");
}

Note how both NonNull() and ValidateAddressLines() use the form "if condition then ok else error". It's a pretty common pattern so let's abstract that to a higher-order function:

static Func<T, FSharpChoice<T, Errors>> Validator<T>(Predicate<T> pred, string err) {
    return x => {
        if (pred(x))
            return FSharpChoice.Ok(x);
        return FSharpChoice.Error<T>(err);
    };
}

and now we can write:

static readonly Func<Address, FSharpChoice<Address, Errors>> ValidateAddressLines =
    Validator<Address>(x => x.Line1 != null || x.Line2 == null, 
                       "Line1 is empty but Line2 is not");

Only ValidateOrders() is left to explain. Let's see first how to validate a single order, and then we'll figure out how to make that operate on a list of orders:

static FSharpChoice<Order, Errors> ValidateOrder(Order o) {
    return
        from name in NonNull(o.ProductName, "Product name can't be null")
        from cost in GreaterThan(o.Cost, 0, string.Format("Cost for product '{0}' can't be negative", name))
        select o;
}

Here we used monadic validation (from...from...) which, as explained before, causes the second validation to run only if the first was successful.

Now the tricky part: making this work on IEnumerable<Order>. I'll use explicit types here so you can better see what's going on in each step, and inline comments:

static FSharpChoice<IEnumerable<Order>, Errors> ValidateOrders(IEnumerable<Order> orders) {
    // first we apply the validator to all orders
    IEnumerable<FSharpChoice<Order,Errors>> validatedOrders = orders.Select(ValidateOrder);

    // now we fold (Aggregate) over the list of validated orders...
    // ...to collect all orders (or their concatenated errors) in a single FSharpChoice

    // we need first an empty list of orders in the domain of validation:
    FSharpChoice<FSharpList<Order>, Errors> zero = ListModule.Empty<Order>().PureValidate();

    // now the actual fold
    return validatedOrders
        .Aggregate(zero, 
            (FSharpChoice<FSharpList<Order>, Errors> e, FSharpChoice<Order, Errors> c) => 
                from a in e
                join b in c on 1 equals 1
                select a.Cons(b))

        // finally we need to upcast the list to match the return type
        .Select(x => (IEnumerable<Order>)x);
}

Just as before with the conditional validator, we can (and should) abstract this to a higher-order function, and then we can express ValidateOrders() as:

static Func<IEnumerable<Order>, FSharpChoice<IEnumerable<Order>, Errors>> ValidateOrders =
    EnumerableValidator<Order>(ValidateOrder);

Epilogue

Validating with applicative functors is nothing new. Haskell and Scala developers have been doing it for quite some time now.

This approach to validation is attractive because it's very simple. It all revolves around the FSharpChoice type which is one of the basic building blocks in functional programming, and a very simple type: it's either this or that, either the validated value or the list of errors.

There are no ad-hoc concepts about validation here: validators are functions returning FSharpChoice<T, Errors> . We use higher-order functions to abstract validators. Folding and mapping, to manipulate validations. Validators are composed through an applicative functor or monad. These are all very general concepts. Accidental difficulty (sometimes also called "accidental complexity") is low. The core that enables applicative functor validation can be expressed in around 30 lines of F# code. In contrast, commonly used validation libraries in .NET have lots of types representing non-generic, ad-hoc concepts, abstractions, with ad-hoc interactions. Therefore, concept count and accidental complexity are high. 

Of course, applicative functors are best expressed and understood in functional languages, but C# 3 seems to be a good enough vehicle for them. They can be expressed even in Java (the Functional Java library implements the validation applicative functor) although it's quite verbose. If you want to learn more about applicative functors, see:

If you found this post interesting, stay tuned: Steffen, Ryan and I are planning to create a general FP library for C# and F#, where this code would be part of said library, among many other things. In the meantime, you can find the code here.

Thursday, August 18, 2011

Refactoring to monadic C# - applicative functors

In the last part of this series, we "upgraded" the Maybe monad to an Either monad in order to add error messages when parsing two integers. To recap, the code using the Either monad via LINQ looked like this:

var somethingOrError =
    from userID in FSharpOption.TryParseInt(req_userID).ToFSharpChoice("Invalid User ID")
    from id in FSharpOption.TryParseInt(req_otherID).ToFSharpChoice("Invalid ID")
    select doSomething(userID, id);

somethingOrError.Match(Console.WriteLine, setError);

That code, however, only dealt with setting one error message for the whole computation. When it got the first error, it aborted the rest of the computation.

Sometimes we do want this shortcut evaluation, but more generally, when validating data, we want to validate the whole thing, and then display a list of errors, not abort at the first error we find.

To do this, we need to step down from monads into a weaker abstraction: applicative functors. I've already blogged about applicative functors in F# and C#, mostly in the context of formlets. In a nutshell:

  • Applicative functors are defined by two functions: pure and apply.
  • These functions must satisfy some laws, which I won't mention here.
  • Pure is basically a Func<T, M<T>> for some M, i.e. it puts some value into M
  • Apply is basically a Func<M<Func<A,B>>,M<A>,M<B>>, i.e. it applies a function within M
  • All monads are applicative functors (but not necessarily the other way around).
  • All applicative functors are functors (but not necessarily the other way around), which means you can Select() over them with LINQ.

But I don't want to get into much detail here. What's interesting here is that we won't use the "default" implementation of the Either applicative functor (i.e. the one derived from the monad) because it would only keep the first error message, so it would be pretty useless in this case! The Either applicative functor we'll use will of course collect all validation messages in a list.

But first, for comparison, here's some simple imperative code to parse two numbers and then do something with them or display all errors:

var errors = new List<string>();

int userID;
var userID_ok = int.TryParse(req_userID, out userID);
if (!userID_ok)
    errors.Add("Invalid user ID");

int id;
var id_ok = int.TryParse(req_otherID, out id);
if (!id_ok)
    errors.Add("Invalid ID");

if (errors.Count > 0)
    setErrors(errors);
else
    Console.WriteLine(doSomething(userID, id));

Just like with CsFormlets, this Either applicative functor can be expressed either "directly", or using LINQ as explained by Tomas Petricek here. In this post I'll skip the direct syntax and use LINQ instead. And using LINQ, the above imperative code is translated into this:

var userID = FSharpOption.TryParseInt(req_userID)
    .ToFSharpChoice(FSharpList.New("Invalid User ID"));
var id = FSharpOption.TryParseInt(req_otherID)
    .ToFSharpChoice(FSharpList.New("Invalid ID"));

var result =
    from a in userID
    join b in id on 1 equals 1
    select doSomething(a, b);

result.Match(Console.WriteLine, setErrors);

Note that it's very similar to the monadic code we were first using, only instead of from...from... (which compiles to SelectMany(), i.e. monadic bind) we do from...join... .
"on 1 equals 1" is the syntactic tax we have to pay for bending LINQ like this.

result is of type FSharpChoice<int, FSharpList<string>> , that is it contains either the result of doSomething() or a list of errors.

This was a trivial example of using applicative functors for validation; just an introduction. It doesn't show one of its biggest strengths: composability. In the next post we'll look into validating a more complex domain.

Code is here.

Monday, August 1, 2011

Refactoring to monadic C# - part 2

This is a follow-up to my previous article "Refactoring to monadic C#", in which I described how to refactor a typical imperative piece of code into functional style, and introduced and applied the Maybe monad.

To quickly recap, the code I refactored was a parser for NuGet's version ranges, e.g. [1.0] or (1.0,2.0), similar to math interval notation.

It's probably a good idea to keep both part 1 and part 2 of this article open side by side for reference and comparison.

Nesting options

After the refactoring we did in part 1, the function that parsed the upper bound of the interval was:

var maxVersion = L.F((string[] parts) =>  {
    var p = parts.Length == 2 ? parts[1] : parts[0];
    if (string.IsNullOrWhiteSpace(p))
        return FSharpOption<Version>.Some(null);
    return ParseVersion(p);
});

But... isn't Some(null) kind of awkward? Aren't option types supposed to save us from nulls?

A useful thing about option types I haven't mentioned is that they can be nested, that is, unlike Nullable<T> it's perfectly valid (and useful) to have an option of an option. Using nested options the code looks like this:

var maxVersion = L.F((string[] parts) =>  {
    var p = parts.Length == 2 ? parts[1] : parts[0];
    if (string.IsNullOrWhiteSpace(p))
        return FSharpOption<FSharpOption<Version>>.Some(FSharpOption<Version>.None);
    return ParseVersion(p).Select(v => v.ToOption());
});

Of course, now we have to change the monadic code in LINQ to handle this nested option (changes in bold):

var versionRange = L.F(() => from x in checkLength(value)
                             from isMin in minInclusive(value)
                             from isMax in maxInclusive(value)
                             let val = value.Substring(1, value.Length - 2)
                             let parts = val.Split(',')
                             from y in checkParts(parts)
                             from min in minVersion(parts)
                             from max in maxVersion(parts)
                             select (IVersionSpec)new VersionSpec {
                                 IsMinInclusive = isMin, 
                                 MinVersion = min.HasValue() ? min.Value : null, 
                                 IsMaxInclusive = isMax, 
                                 MaxVersion = max.HasValue() ? max.Value : null,
                             });

In this particular case, Some(null) wasn't so bad because we're not doing anything else with that value, but more generally, if you need to further process the value, nesting options is the better, cleaner way.

Error messages

Now let's see a different but closely related example: we want to parse some values (perhaps from a web request) and show an error if any parsing fails. If the values are parsed successfully, we do "something" with them (e.g. look them up in a database or whatever).

A typical imperative solution might look like this:

int doSomething(int userID, int id) {
    // fetch some other entity, do "stuff"
    return userID + id;
}

void setError(string e) {}

const string req_userID = "123";
const string req_otherID = "999";
int userID;
var userID_ok = int.TryParse(req_userID, out userID);
if (!userID_ok) {
    setError("Invalid User ID");
} else {
    int id;
    var id_ok = int.TryParse(req_otherID, out id);
    if (!id_ok) {
        setError("Invalid ID");
    } else {
        Console.WriteLine(doSomething(userID, id));
    }
}

If you try to refactor this to Options, like the previous example, you'll quickly see that Options are not quite enough. Options contain either a value or nothing, but in this case we want something that contains either a value or an error message. Such a type is precisely called Either in Haskell. More generally, both Option (called Maybe in Haskell) and Either are sum types (also called tagged union or discriminated union). The most commonly used sum type is bool. Sum types are in turn a special case of algebraic data types.

But I digress. Back to our example, we want to map from Option to Either, by attaching the corresponding error message instead of None where the parsing fails. Just like Option, writing an Either is not hard, but since we're already using F#'s option type we might as well also use F#'s generic sum type (called Choice). C# doesn't have the notion of algebraic data types so we'll add some syntax sugar around Choice to make it more usable.

Defining a monad over Choice is trivial once you understand how the Option monad works. We only have to choose by convention that one of the branches will work as the "error" branch, and we treat it like None in Option. So when we hit an error in the computation, we skip the rest of the computation.

Once we have everything in place, we can translate the code above to:

var somethingOrError =
    from userID in FSharpOption.TryParseInt(req_userID).ToFSharpChoice("Invalid User ID")
    from id in FSharpOption.TryParseInt(req_otherID).ToFSharpChoice("Invalid ID")
    select doSomething(userID, id);

where somethingOrError is of type FSharpChoice<int,string> , that is, it contains either the result of doSomething(...) if the values were parsed correctly, or a string with the error message for the first failed parsing.

But we're not done yet! We still need to either write the result to console or call setError if something went wrong.

if (somethingOrError.IsChoice1Of2) {
    var r = (FSharpChoice<int, string>.Choice1Of2)somethingOrError;
    Console.WriteLine(r.Item);
} else {
    var r = (FSharpChoice<int, string>.Choice2Of2)somethingOrError;
    setError(r.Item);
}

Ewwww. In languages with first-class support for sum types, extracting values is done through pattern matching, not ifs and downcasting (which are leaking implementation details here). Pattern matching, as explained in the Haskell wikibook, "is a convenient way to bind variables to different parts of a given value".

Of course, we don't have pattern matching in C#, but we can easily write an ad-hoc match, by abstracting the actions from the last snippet, so we can now write:

somethingOrError.Match(Console.WriteLine, setError);

Much cleaner, and it also ensures that both branches are covered.

More generic pattern matching can be encoded in C# (there are numerous examples on the web) but it's just too awkward, complex, unwieldy for my taste.

Conclusion

Even though C# lacks first-class support for sum types and pattern matching, they can still be gainfully encoded and used. Using FSharpChoice for sum types is convenient because it already implements equality semantics and you only have to wrap pattern matching once.

Of course, it's not without downsides: FSharpChoice is esentially an anonymous sum type (Choice1Of2 and Choice2Of2 have no real meaning), so you can't tell one FSharpChoice<int,int> from another FSharpChoice<int,int>, even though they may be representing widely different things. Also, as you add more "choices" to the type, it gets increasingly unwieldy, i.e. what does the third type represent in a FSharpChoice<int,string,int,bool> ? Without the ability to name things you have to keep track of this in your head (or add suitable functions, at the cost of convenience and conciseness). However, as shown in this post, the two-type generic Choice is quite simple and useful.

As with the first part of this post, I specifically tried to avoid getting into too much detail about the implementation or how it works under the covers, instead focusing on the end result, at the cost of some handwaving. You can see the entire implementation (it's almost trivial really) on github.