Thursday, June 20, 2013

Optimizing with the help of FsCheck

Recently I needed a function to transpose a jagged array in F#. As I knew Haskell probably had this in its standard library and I'm lazy, I hoogled “transpose” and followed the link to its source code, then translated it to F#:

module List =
    let tryHead = 
        | [] -> None
        | x::_ -> Some x

    let tryTail =
        | _::xs -> Some xs
        | _ -> None

    let rec transpose =
        | [] -> []
        | []::xs -> transpose xs
        | (x::xs)::xss -> (x::(List.choose tryHead xss))::transpose (xs::(List.choose tryTail xss))

The end.

Oh wait, I needed to process arrays, not lists. Well, I suppose we could convert the jagged array to a jagged list, and then back to an array:

let transpose x = x |> Array.toList |> Seq.toList |> List.transpose |> List.toArray |> Seq.toArray

It works, but it's a bit inefficient. For example, it does a lot of unnecessary allocations. Let's time it with a big array:

let bigArray = Array.init 3000 (fun i -> Array.init i id)
transpose bigArray |> ignore

Real: 00:00:01.369, CPU: 00:00:01.357, GC gen0: 79, gen1: 44, gen2: 1

We can do better. Since we're working with arrays, we could calculate the length of each array and preallocate it, then copy the correponding values. Problem is, that kind of code is very imperative, and tricky to get right.
Enter FsCheck. With FsCheck we can use the inefficient implementation as a model to ensure that the new, more efficient implementation is correct. It's very easy too:

let transpose2 (a: 'a[][]) = a // placeholder for the efficient implementation
FsCheck.Check.Quick ("transpose compare", fun (a: int[][]) -> transpose a = transpose2 a) 

Run this and FsCheck will generate jagged arrays to compare both implementations. As the new implementation is merely a placeholder for now, it won't take long to find a value that fails the comparison:

transpose compare-Falsifiable, after 1 test (0 shrinks) (StdGen (1547388233,295729162)):

Now that we have a simple but strong test harness we can confidently implement the optimized function. And many failed test runs later, this passes the 100 (by default) test cases generated by FsCheck:

let transpose2 (a: 'a[][]) =
    if a.Length = 0 then 
        let r = Array.zeroCreate (a |> Array.length |> Seq.max)
        for i in 0 .. r.Length-1 do
            let c = Array.zeroCreate (a |> Seq.filter (fun x -> Array.length x > i) |> Seq.length)
            let mutable k = 0
            for j in 0 .. a.Length-1 do
                if a.[j].Length > i then
                    let v = a.[j].[i]
                    c.[k] <- v
                    k <- k + 1
            r.[i] <- c

See, I told you it would be all imperative and messy! But is it more efficient?

transpose2 bigArray |> ignore

Real: 00:00:00.218, CPU: 00:00:00.218, GC gen0: 3, gen1: 2, gen2: 0

Yes it is (at least with this very unscientific benchmark).

BONUS: are you wondering why the original definition uses List.choose tryHead instead of List.head ? That is:

let rec transpose =
    | [] -> []
    | []::xs -> transpose xs
    | (x::xs)::xss -> (x::( List.head xss))::transpose (xs::( List.tail xss))

FsCheck can help with that too! Simply run this against the generated inputs and ignore the result (a trivial test):

FsCheck.Check.Quick("transpose", List.transpose >> ignore)

And watch it fail:

transpose-Falsifiable, after 1 test (6 shrinks) (StdGen (1278231111,295729178)):
[[true]; []]
with exception:
System.ArgumentException: The input list was empty.
Parameter name: list
   at Microsoft.FSharp.Collections.ListModule.Head[T](FSharpList`1 list)
   at[T,TResult](FSharpFunc`2 mapping, FSharpList`1 x)
Note how FsCheck shrinks the value to the simplest possible failing case. This usage is great to find unwanted partial functions.

One last comment: none of this is specific for F#. FsCheck has a nice C# interface! You could also use it from Fuchu.