Thursday, January 7, 2010

Translating Haskell to F# and other considerations

In my last post I showed one way to group consecutive integers in a list. When I realized I needed such a function I first went googling for it but all I found was this Haskell code:

-- | Group consecutive integers:
--
-- > groupConsec [1,2,3,5,6,8,9] == [[1,2,3],[5,6],[8,9]]
groupConsec :: [Int] -> [[Int]]
groupConsec = groupConsec' []
    where
      groupConsec' x   []    = x
      groupConsec' [] (y:ys) = groupConsec' [[y]] ys
      groupConsec' xs (y:ys) = if y - head (last xs) == length (last xs)
                               then groupConsec' (init xs ++ [last xs ++ [y]]) ys
                               else groupConsec' (     xs ++ [           [y]]) ys

As I was on a C# project it was easier to start from scratch than trying to port this. So I wrote what I showed on the last post and moved on.

But I thought it would be interesting to port it to F#, to see how much different it would be.

First, let's translate it literally, by making as little changes as possible to the original code:

let groupConsec =
    let head = List.head
    let last l = List.nth l (l.Length-1)
    let length = List.length
    let init l = List.ofSeq (Seq.take ((List.length l)-1) l)
    let rec groupConsec' a b =
        match a,b with
        | x, [] -> x
        | [], y::ys -> groupConsec' [[y]] ys
        | xs, y::ys -> if y - head (last xs) = length (last xs)
                       then groupConsec' (init xs @ [last xs @ [y]]) ys
                       else groupConsec' (     xs @ [          [y]]) ys
                       
    groupConsec' []

At the beginning there's a couple of functions that aren't built into F#, but the rest of it is syntactically pretty much the same.

BUT, syntactic similarity does not imply that they're semantically the same. There's a fundamental difference from the original code: Haskell is a lazy language, while F#, coming from the ML-family, does eager evaluation. This doesn't necessarily mean that lazy is always better than eager or eager better than lazy. Lazy languages seem to have their own issues.

So let's just replace all lists in the code with LazyLists and see what happens:

let lazyGroupConsec =
    let head = LazyList.head
    /// From Haskell prelude: http://www.haskell.org/onlinereport/standard-prelude.html
    let rec last =
            function
            | LazyList.Nil -> failwith "last: empty list"
            | LazyList.Cons(x,xs) -> 
            match xs with
            | LazyList.Nil -> x
            | _ -> last xs
            
    let length = LazyList.length
    let rec init = 
            function
            | LazyList.Nil -> failwith "init: empty list"
            | LazyList.Cons(x,xs) -> 
            match xs with
            | LazyList.Nil -> LazyList.empty()
            | _ -> LazyList.cons x (init xs)
            
    let (@) x y = LazyList.append x y
    let lazyList x = LazyList.cons x (LazyList.empty())
                
    let rec groupConsec' a b =
        match a,b with
        | x, LazyList.Nil -> x
        | LazyList.Nil, LazyList.Cons(y,ys) -> groupConsec' (lazyList (LazyList.ofList [y])) ys
        | xs, LazyList.Cons(y,ys) -> if y - head (last xs) = length (last xs)
                                     then groupConsec' (init xs @ lazyList (last xs @ (LazyList.ofList [y]))) ys
                                     else groupConsec' (     xs @ lazyList (          (LazyList.ofList [y]))) ys

    groupConsec' (LazyList.empty())

Ok, now the function is lazy, but the code took a serious hit in readability...

Let's take now a whole different approach: porting from the C# code, but using recursion instead of the loop to avoid the mutable list:

let lazyGroupConsec2 : int seq -> int list seq =
    let lastOf x = List.nth x ((List.length x) - 1)
    let rec groupConsec' group groups input =
            if Seq.isEmpty input
            then Seq.append groups [group]
            else
                let hd = Seq.head input
                let tl = Seq.skip 1 input
                if List.length group = 0 || hd - (lastOf group) <= 1
                then groupConsec' (group @ [hd]) groups tl
                else groupConsec' [hd] (Seq.append groups [group]) tl
    groupConsec' [] []

This one looks pretty neat, right? Too bad it needs a type annotation or it gets a value restriction error.

Now I'm too lazy (no pun intended) to properly analyze the complexity of each of these functions, but at least let's run a simple benchmark over a range of integers with some holes: let range = {1..2000} |> Seq.filter (fun i -> i % 5 <> 0)

Function Time
groupConsec 00:00.1383291
lazyGroupConsec 00:00.9355807
lazyGroupConsec2 02:03.2433208

In case it's not clear, the first, non-lazy one took about 138ms and the last one took over 2 minutes. The reason is that seq pretty much sucks when used recursively, as Brian explains in detail in this stackoverflow answer.

Let's try using a LazyList instead for the inner recursive function:

let lazyGroupConsec3 l =
    let lastOf x = List.nth x ((List.length x) - 1)
    let (+) x y = LazyList.append x (LazyList.ofList [y])
    let rec groupConsec' group groups input =
            if LazyList.isEmpty input
            then groups + group
            else
                let hd = LazyList.head input
                let tl = LazyList.tail input
                if List.length group = 0 || hd - (lastOf group) <= 1
                then groupConsec' (group @ [hd]) groups tl
                else groupConsec' [hd] (groups + group) tl
    groupConsec' [] (LazyList.empty()) (LazyList.ofSeq l)

This one took 95ms to process the same range of integers.

Conclusions:

  • Porting code from Haskell to F# might be tempting but beware of their fundamental differences.
  • Because of these differences, the best solution in Haskell might not be the best solution in F#.
  • You really need to know how seq, list and lazylist work or you'll have a big performance problem.
  • LazyLists are so cool I wish they were more tightly integrated to the language.

Homework:

  • Properly analyze big-O complexity, or at least measure it empirically.
  • Check that all recursive functions here get properly optimized for tail calls.

Grouping consecutive integers in C#

Quick post: an extension method to group consecutive integers:

/// <summary>
/// Groups consecutive integers. Requires asc-ordered list
/// </summary>
/// <param name="list">ASC-ordered list of integers</param>
/// <returns></returns>
public static IEnumerable<IEnumerable<int>> GroupConsecutive(this IEnumerable<int> list) {
    var group = new List<int>();
    foreach (var i in list) {
        if (group.Count == 0 || i - group[group.Count - 1] <= 1)
            group.Add(i);
        else {
            yield return group;
            group = new List<int> {i};
        }
    }
    yield return group;
}

Sample usage:

var groups = new[] {1, 2, 4, 5, 6, 9}.GroupConsecutive();
foreach (var g in groups)
    Console.WriteLine("group: {0}", string.Join(",", g.Select(x => x.ToString()).ToArray()));

which prints out:

group: 1,2
group: 4,5,6
group: 9