Wednesday, September 5, 2012

A non-empty list type for .NET

A simple immutable list like the fundamental list type in OCaml, F# or Haskell can be expressed as:

type 'a Alist = 
    | Nil 
    | Cons of 'a * 'a Alist 

That is, it's either empty, or it's non-empty. We could refactor the non-empty part to a record type:

type 'a Alist = 
    | Nil 
    | Cons of 'a NonEmptyList 
and 'a NonEmptyList = { Head: 'a; Tail: 'a Alist }

This NonEmptyList type is clearly not a new data structure (it's still an immutable list). It may seem silly at first to use this as a separate list type, but it actually goes a long way towards making illegal states unrepresentable.

Because it is guaranteed not to be empty by the type system, it has certain interesting properties. For one, obviously getting the head of a non-empty list will always work, for any instance of the type, while List.head throws an exception for an empty list (here's the proof of why it can't have any other behavior).

The F# List module has many such partial functions that are undefined for empty lists: head, tail, reduce, average, min, max... and of course the respective functions in the Seq module and System.Linq.Enumerable. If you want to use one of these functions on a regular list/IEnumerable, you either have to immediately check if the input is empty first, and return a 'default' value (many people wrap this in a function e.g. MaxOrDefault(defaultValue), effectively making it a total function); or catch the possible exception every time. Otherwise you're risking a possible failure. Another way to make these functions total is by simply removing the empty list from the domain, that is, operating on non-empty lists.

The Haskell community generally recommends avoiding such trivially avoidable partial functions [1] [2] [3], and this advice applies equally to most (all?) languages, especially typed languages. At the very least, it's useful to be aware of where and why you're using a partial function.

Applicative validation makes for a good example of NonEmptyLists. I've written about applicative functor validation before, in F# and in C#, with examples. The type I used to represent a validation was Choice<'a, string list>. This means: either the value (when it's correct), or a list of errors. But strictly speaking, this type is too "loose", since it allows the value Choice2Of2 [], which intuitively means "The input was invalid, but there is no error". This simply doesn't make any sense. If the input is invalid, there must be at least one error. Thus, the correct type to use here is Choice<'a, string NonEmptyList>.

Another occurrence of NonEmptyList recently popped up while I was writing a library to bind the Urchin Data API. In this API there's a parameter that is mandatory, but admits more than one value: a perfect match for a NonEmptyList.

In general, when you find yourself not knowing what to do with the empty list case, or when you think "this list can't possibly be empty here", it may be an indication that you need a NonEmptyList.

The code is currently in my fork of FSharpx, it includes the usual functions: cons, map, append, toList, rev, collect, etc.

It's also usable from C#, here are some tests showing this.

I briefly touched on the subject of totality here, which has deep connections to Turing completeness. Here's some recommended further reading about it:

No comments:

Post a Comment